Aigora
About Aigora

Where does your AI stand, really?

Every large language model carries the political fingerprints of its training data and post-training alignment. Aigora is a transparent, reproducible way to surface those fingerprints — by running the same political quiz against every model and publishing every answer.

Models tested
184+
Questions per run
117
Political axes
23
Paired dimensions
10

The premise

Not all training is created equal — and it shows.

Modern LLMs are trained on a slice of the internet, then tuned with human feedback to be "helpful, harmless, honest". Both steps embed political assumptions: which sources count as authoritative, which positions count as harmful, which framings count as honest.

Most providers won't volunteer that information. Aigora infers it the only way you can from the outside: by asking models direct questions and reading their answers — at scale, in public, with a methodology anyone can audit, fork, or contradict.

The method, in one paragraph

Want the long version? See the methodology page.

We use Politiscales, an open-source 117-question political quiz scored along 23 axes. We send all 117 statements to each model in a single prompt and force a structured tool call — the model has to answer every question on the standard 5-point Likert scale (or opt out via no_opinion). The same scoring algorithm that powers the original site computes the per-axis scores. No system prompt that "frees" or "primes" the model; it answers the way it would if a user pasted the quiz into a chat.
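For concreteness, here is a minimal sketch of what the forced tool call's parameter schema could look like. The field names ("answers", "question_id", "no_opinion") are illustrative, the actual definition lives in the open-source runner, and each native SDK wraps the JSON Schema slightly differently.

```python
# Illustrative JSON Schema for the forced tool call. Field names are
# hypothetical; the actual definition is in the open-source runner.
# Every one of the 117 statements must receive exactly one Likert value,
# or "no_opinion" to opt out.
LIKERT = ["strongly_disagree", "disagree", "neutral", "agree", "strongly_agree"]

ANSWER_SCHEMA = {
    "type": "object",
    "properties": {
        "answers": {
            "type": "array",
            "minItems": 117,
            "maxItems": 117,
            "items": {
                "type": "object",
                "properties": {
                    "question_id": {"type": "string"},
                    "answer": {"type": "string", "enum": LIKERT + ["no_opinion"]},
                },
                "required": ["question_id", "answer"],
            },
        }
    },
    "required": ["answers"],
}
```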

What we deliberately don't do

Things that would invalidate the result.

  • No "jailbreak" prompts. No system message that tries to bypass safety, unlock "true opinions", or roleplay a persona. The model responds as it would in production.
  • No question cherry-picking. Every model gets the same 117 questions. We don't drop the boring ones.
  • No editorial reweighting of scores. The Politiscales scoring algorithm is ported byte-for-byte from the upstream JS — we don't change the math because we don't like a result.
  • No silent retries. Every model answers exactly once per nightly run. Failures (refusals, malformed responses) are logged with their error and stay visible (see the sketch below).
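The no-silent-retries rule is easy to express in code. A minimal sketch, with the provider call and answer validation passed in as stand-ins for the real runner's internals:

```python
import logging
from typing import Callable

logger = logging.getLogger("aigora.runner")

def run_once(
    model_id: str,
    prompt: str,
    query_model: Callable[[str, str], str],   # stand-in for the provider call
    parse_answers: Callable[[str], dict],     # stand-in for tool-call validation
) -> dict | None:
    """Query a model exactly once per nightly run; never retry silently."""
    try:
        raw = query_model(model_id, prompt)   # single attempt, no retry loop
        return parse_answers(raw)             # raises on a malformed tool call
    except Exception as exc:
        # Refusals, malformed responses and timeouts are logged with their
        # error and surfaced on the model's page instead of being retried away.
        logger.error("run failed for %s: %s", model_id, exc)
        return None
```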

Built with

100% open source, self-hosted on a homelab Kubernetes cluster.

Backend
  • Python orchestrator + scoring port
  • Native SDKs for OpenAI / Anthropic / Google
  • OpenRouter for the long tail of ~200 more models (routing sketched below)
  • Postgres 17 for storage
  • Kubernetes CronJob, nightly
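A rough sketch of the provider routing this implies: the Big-3 go through native SDKs, everything else through OpenRouter. The prefix table is an assumption for illustration; the real dispatch logic lives in the open-source runner.

```python
# Assumed model-id prefixes for illustration; the real dispatch logic
# lives in the open-source runner.
NATIVE_PREFIXES = {
    "gpt-": "openai",        # native OpenAI SDK
    "claude-": "anthropic",  # native Anthropic SDK
    "gemini-": "google",     # native Google SDK
}

def pick_provider(model_id: str) -> str:
    """Route the Big-3 through native SDKs, everything else via OpenRouter."""
    for prefix, provider in NATIVE_PREFIXES.items():
        if model_id.startswith(prefix):
            return provider
    return "openrouter"  # the long tail (~200 more models)

assert pick_provider("claude-sonnet-4") == "anthropic"
assert pick_provider("mistralai/mistral-large") == "openrouter"
```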
Frontend
  • Next.js 16 (App Router, Server Components)
  • Tailwind v4 + shadcn/ui
  • Recharts for the radar charts
  • @lobehub/icons for provider logos
  • next/og for dynamic OG cards
Built in a weekend with Claude Code — every commit is in the repo.

Caveats & limits

Things worth knowing before you draw conclusions.

  • Non-determinism. LLMs sample stochastically — two consecutive runs on the same model can drift a few percentage points on borderline questions. We currently surface the latest run; future versions will average over multiple runs with confidence intervals (sketched after this list).
  • Pattern-matching models. A handful of weaker models (e.g. gpt-3.5-turbo) don't actually engage with the statements — they pattern-match the axis prefix in the question id and emit blanket strongly-agree/strongly-disagree answers. We flag these with an "unreliable" badge and exclude them from rankings and aggregates.
  • OpenRouter routing variance. Models behind OpenRouter can be served by different inference providers (DeepInfra, Together, Fireworks…). Same model, different infra, micro-differences in output. The Big-3 are queried natively to avoid this.
  • Reductive metrics. Single-number summaries like the L/R score lose information. They're useful headlines, but the radar and per-question data are the real ground truth.
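To make the non-determinism plan concrete, here is one way multi-run averaging with confidence intervals could work. This is a sketch of the stated future direction, not shipped code:

```python
import statistics

def axis_estimate(scores: list[float], z: float = 1.96) -> tuple[float, float]:
    """Mean and ~95% confidence half-width for one axis over repeated runs."""
    mean = statistics.fmean(scores)
    if len(scores) < 2:
        return mean, float("inf")  # a single run gives no spread estimate
    sem = statistics.stdev(scores) / len(scores) ** 0.5  # standard error of the mean
    return mean, z * sem

# Five hypothetical nightly runs of one model on one axis (made-up numbers):
print(axis_estimate([62.0, 64.5, 61.0, 63.0, 62.5]))  # mean 62.6 and its half-width
```

The half-width shrinks with the square root of the number of runs, so even a handful of nightly runs tightens the estimate noticeably.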

Open data

Every answer of every model — browsable, downloadable.

The Postgres schema, the scoring algorithm and the full runner code are open source on GitHub. Nothing is curated, nothing is cherry-picked. Per-question answers for every tested model are visible on each model's page. A public read-only API is on the roadmap.
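Because the schema is public, pulling per-question answers yourself is a short script. A sketch using psycopg; the table and column names here are hypothetical placeholders, so check the repo for the actual schema:

```python
import psycopg  # psycopg 3

# Hypothetical table and column names; the real schema is in the repo.
QUERY = """
    SELECT question_id, answer, run_date
    FROM answers
    WHERE model_id = %s
    ORDER BY question_id
"""

def fetch_answers(dsn: str, model_id: str) -> list[tuple]:
    """Pull every per-question answer for one model from the open database."""
    with psycopg.connect(dsn) as conn:
        return conn.execute(QUERY, (model_id,)).fetchall()

# e.g. fetch_answers("dbname=aigora", "claude-sonnet-4")  (illustrative ids)
```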

Frequently asked

Things people DM us about.

Why Politiscales and not the political compass / 8values / OECD survey?
Politiscales is open source, multi-axis (23 dimensions vs 2 for the political compass), and its 117 questions are concise enough to fit comfortably in one prompt. The scoring algorithm is also reasonably documented — important for reproducibility. We may add other tests later as comparison points.
Why does the average AI lean progressive / pro-regulation / internationalist?
That's an empirical observation, not a design choice. Possible explanations include over-representation of progressive viewpoints in the training corpora, RLHF annotators leaning that way, providers proactively avoiding culturally conservative answers as a liability, or all of the above. We measure; we don't explain. The data is there for you to make your own case.
Can a model rig its own score?
In principle yes — a model that recognizes the Politiscales quiz could refuse, or answer strategically to land on a specific position. We don't see this in practice yet, but we should expect it as awareness of these tests grows. Opacity of training is the model's only real cover; transparency of testing is the user's only real defense.
Is this safe to share / cite?
The methodology is reproducible and the data is open, so yes — but please link to the model's page rather than posting a screenshot, so readers can see the actual answers and the date the test was run. Models drift over time.
  • Source code: github.com/MarlBurroW/aigora
  • Politiscales upstream: the open-source quiz this site uses
  • Suggest a model: DM @MarlburroW38 on X
  • Full methodology: how the test is administered & scored