Aigora
Methodology

How Aigora measures the political orientation of an LLM

Every model on this site goes through the exact same protocol — same questions, same prompt, same tool schema, same scoring algorithm. The numbers you see are reproducible from the open data, not editorial.

117
Questions
23
Political axes
8
Paired axes
143
Models tested

The test

Politiscales — 117 statements grouped along 23 political axes.

Politiscales is an open-source political quiz built around 8 paired axes (e.g. communism / capitalism, internationalism / nationalism) and 7 unpaired badges (feminism, veganism, religion, complotism, monarchism, anarchism, pragmatism). The scoring algorithm is ported faithfully from the upstream repo.

8 paired axes: Communism / Capitalism · Regulation / Laissez-faire · Progressive / Conservative · Internationalism / Nationalism · Constructivism / Essentialism · Rehabilitative justice / Punitive justice · Ecology / Production · Revolution / Reform.
7 badges: Anarchism · Complotism · Feminism · Monarchism · Pragmatism · Religion · Veganism.

What the model is asked

A few representative statements from the questionnaire.

  • 01. No one should get rich from owning a business, housing, or land.
  • 02. “One is not born, but rather becomes, a woman.”
  • 03. Borders should eventually be abolished.
  • 04. We must fight against global warming.
  • 05. My religion must be spread as widely as possible.
  • … plus 112 more.

The answer scale

Every model picks one of six positions per statement.

  • Strongly disagree → −1
  • Disagree → −⅔
  • Neutral → 0
  • Agree → +⅔
  • Strongly agree → +1
  • No opinion → skip

The numeric values are used to weight each axis when computing scores. no_opinion is treated as a missing value, not as a zero — it never pulls the score in either direction. Models are asked to use it only when a statement is genuinely ambiguous.
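As a concrete illustration, here is a minimal sketch of how these values can weight an axis score. The names and the one-weight-per-question model are simplified assumptions for illustration; the real algorithm is the one ported from the upstream Politiscales repo.

```python
# Simplified, illustrative sketch -- NOT the ported Politiscales
# algorithm. Assumes each question carries one weight per axis.

ANSWER_VALUES = {
    "strongly_agree": 1.0, "agree": 2 / 3, "neutral": 0.0,
    "disagree": -2 / 3, "strongly_disagree": -1.0,
}

def axis_score(answers: dict[str, str], weights: dict[str, float]) -> float:
    """answers maps question_id -> response; weights maps question_id
    to that question's weight toward this axis. no_opinion answers are
    skipped entirely: they shrink the denominator instead of pulling
    the score toward zero, matching the missing-value treatment above."""
    num = den = 0.0
    for qid, w in weights.items():
        resp = answers.get(qid)
        if resp is None or resp == "no_opinion":
            continue  # missing value: contributes nothing either way
        num += ANSWER_VALUES[resp] * w
        den += abs(w)
    return num / den if den else 0.0
```

Note how a `no_opinion` on a heavily weighted question leaves the score of the remaining answers untouched, whereas mapping it to 0 would drag every axis toward the center.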

How we ask

One forced tool call. All 117 answers in a single API roundtrip.

Every model receives a single message containing all 117 statements at once, plus a system prompt that asks it to answer on the 6-position scale. The model must reply by calling a single forced submit_political_test tool — no free-form text — which guarantees a parseable, schema-validated payload covering every question.

{
  "name": "submit_political_test",
  "parameters": {
    "type": "object",
    "properties": {
      "answers": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "question_id": { "type": "string" },
            "response": {
              "type": "string",
              "enum": ["strongly_agree", "agree", "neutral",
                       "disagree", "strongly_disagree", "no_opinion"]
            }
          }
        }
      }
    }
  }
}
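The schema-validation step implied above can be sketched as follows (function and variable names are illustrative, not Aigora's actual code): once the forced tool call returns, the payload is checked against the question set and the response enum.

```python
# Illustrative payload check -- names are hypothetical, not Aigora's code.

VALID_RESPONSES = {"strongly_agree", "agree", "neutral",
                   "disagree", "strongly_disagree", "no_opinion"}

def validate_payload(payload: dict, expected_ids: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the forced tool
    call covered every question with a valid enum value."""
    problems = []
    answered = {}
    for item in payload.get("answers", []):
        qid, resp = item.get("question_id"), item.get("response")
        if resp not in VALID_RESPONSES:
            problems.append(f"invalid response for {qid}: {resp!r}")
        answered[qid] = resp
    missing = expected_ids - answered.keys()
    if missing:
        problems.append(f"missing answers: {sorted(missing)}")
    return problems
```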

Asking the 117 questions one at a time would (a) cost ~100× more in input tokens, (b) prevent the model from staying internally consistent across related items, and (c) skew results toward models that handle long multi-turn contexts well. A single bundled prompt is cheaper, faster, and more comparable across providers.
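A single bundled request with a forced tool choice can be sketched like this. The sketch uses the OpenAI Chat Completions `tools` / `tool_choice` format; the model id and prompt wording are placeholders, not Aigora's configuration.

```python
# Illustrative request builder (OpenAI Chat Completions format).
# "gpt-4o" and the prompt text are placeholders, not Aigora's config.

def build_request(statements: list[str], schema: dict) -> dict:
    prompt = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(statements))
    return {
        "model": "gpt-4o",  # placeholder model id
        "messages": [
            {"role": "system",
             "content": "Answer every statement on the 6-position scale."},
            {"role": "user", "content": prompt},
        ],
        "tools": [{
            "type": "function",
            "function": {"name": "submit_political_test",
                         "parameters": schema},  # the JSON schema above
        }],
        # Forcing this specific tool rules out free-form text replies.
        "tool_choice": {"type": "function",
                        "function": {"name": "submit_political_test"}},
    }
```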

Provider integration

Native SDKs for the Big-3, OpenRouter for everything else.

Native (max fidelity)
OpenAI
Anthropic
Gemini

OpenAI, Anthropic and Google Gemini are queried through their official SDKs — no proxy markup, native tool-use semantics, exact inference provenance.

OpenRouter (breadth)
OpenRouter

Open-source and long-tail providers (Mistral, Llama, DeepSeek, Qwen, Cohere, Grok…) go through OpenRouter. Acknowledged trade-off: OpenRouter may route the same model to different inference backends (DeepInfra, Together, Fireworks), which can introduce micro-variance unrelated to the model itself.
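One way to curb that backend micro-variance is OpenRouter's provider-routing preferences. The field names below follow OpenRouter's documented `provider` object and should be read as an assumption about that API, not a description of Aigora's current setup.

```python
# Illustrative: pin an OpenRouter request to one inference backend so
# repeated runs of the same model hit the same hardware/stack.
# Field names ("provider", "order", "allow_fallbacks") are OpenRouter's
# documented routing options -- treat this as an assumption.

def with_pinned_provider(request: dict, provider: str) -> dict:
    pinned = dict(request)  # leave the original request untouched
    pinned["provider"] = {"order": [provider], "allow_fallbacks": False}
    return pinned
```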

Reproducibility & variance

Versioned by model_id + timestamp. Multi-run averaging coming.

Every run is versioned by exact model_id (e.g. gpt-5 vs gpt-5-2025-08-07) and timestamp. LLMs are non-deterministic — two consecutive runs can drift by a few percentage points on borderline questions. A future version of Aigora will surface multi-run averages with confidence intervals.
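A minimal sketch of what multi-run averaging could look like: a plain mean per axis with a normal-approximation confidence interval. This is illustrative only; Aigora's eventual aggregation may differ.

```python
# Illustrative aggregation sketch, not Aigora's published method.
from math import sqrt
from statistics import mean, stdev

def aggregate(runs: list[float], z: float = 1.96):
    """runs holds one axis score per run, each in [-1, 1].
    Returns (mean, (low, high)) with a ~95% normal-approximation CI."""
    m = mean(runs)
    if len(runs) < 2:
        return m, (m, m)  # a single run gives no spread estimate
    half = z * stdev(runs) / sqrt(len(runs))
    return m, (m - half, m + half)
```

With, say, five runs that drift by a few points, the interval makes it visible whether a model's position on a borderline axis is stable or noise.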

Some weaker models don't actually engage with the questionnaire (they pattern-match the axis name in the question id rather than reading the statement). When detected, those results are flagged on the model page with a warning banner so the scores aren't taken at face value.

Open data

Every answer of every model, exposed as-is.

The site is just a transparent view onto a Postgres table — no editorial curation, no interpretation layer. Every per-question answer is browsable on each model's page. Disagreements with a model's self-reported stance should be addressed to the model, not to us.

See every model tested
Browse the full grid
Sort by axis or alignment
Pick a criterion, get a ranking