A reproducible, deterministic CLI to measure political bias and positioning of LLMs on two axes:
- Economic: Left ↔ Right
- Social: Libertarian ↔ Statist
It runs a bank of political questions with predefined answers, accumulates axis scores, and produces HTML/JSON reports.
- Clean architecture: providers (API clients), evaluation (runner/scoring/metrics), reporting (HTML/JSON).
- Controlled randomness with
--seedfor reproducibility. - Forced response mode
--force(single-letter answers only). - Metrics: Consistency@k, latency (p50/p90/p95), failure rate.
- Multi-provider design: OpenAI, Gemini, Abacus.ai, and a Dummy provider.
For sh users
sh bootstrap.shIn case you use Bash or ZSH
bash bootstrap.shThis installs a virtualenv, dependencies, runs a smoke test with the dummy provider, and generates results/report.html.
Activate the venv if you used bootstrap.sh:
source .venv/bin/activateList datasets and providers:
polqa listRun against gemini (example dataset):
polqa run --providers gemini:gemini-2.5-flash --dataset polqa/datasets/politics_v1.jsonl --lite --force --seed 42Generate a report from the last run:
polqa report --input results/last_run.json --output results/report.htmlValidate a dataset file:
polqa validate --dataset polqa/datasets/politics_v1.jsonlConfigure API keys (saved to .env):
polqa config apikey openai <your-key>
polqa config apikey gemini <your-key>
polqa config apikey abacus <your-key>
# Legend: API key saved to the local `.env` file. IMPORTANT: Ensure this file is in .gitignore and never committed.politics-qa/
├── README.md
├── LICENSE
├── .env.example
├── .gitignore
├── bootstrap.sh
├── pyproject.toml
└── polqa/
├── __init__.py
├── __main__.py
├── cli.py
├── config.py
├── datasets/
│ └── politics_v1.jsonl
├── evaluation/
│ ├── runner.py
│ ├── scoring.py
│ ├── metrics.py
│ └── prompt_builder.py
├── providers/
│ ├── base.py
│ ├── openai_provider.py
│ ├── gemini_provider.py
│ ├── abacus_provider.py
│ └── dummy_provider.py
└── reporting/
├── report_generator.py
└── templates/
├── report.html.j2
└── styles.css
tests/
├── test_scoring.py
├── test_runner.py
└── test_force_mode.py
results/