AIdas in action: how five models handle Iran's hardest questions, including some deliberately leading ones
The same questions, asked six ways, mostly in Persian with an English comparison, and scored for whose side each answer takes and where the models find their sources.
More and more, people meet the news through an AI model rather than a search page or a feed. For someone asking about Iran in Persian, the answer they get depends on which sources the model can reach and whose framing it adopts, and almost none of that is visible to the reader.
We built AIdas to make it visible: a tool that sends one prompt to several models at once and records how each one answers and what it cites. What you see below is a visualization of AIdas output from a study run in December 2025 with Factnameh and supported by ASL19.
We asked five models, ChatGPT, Claude, Gemini, DeepSeek, and Mistral, about six contested topics: the 2022 Mahsa Amini protests, the economy and sanctions, the nuclear program, internet control and surveillance, hijab law, and the 2009 Green Movement.
Each topic was put six ways, from a plain neutral question to versions leading with the state's framing or an independent one, mostly in Persian with an English comparison. We then scored every answer on where it landed, from state-aligned to independent and rights-based, and on the sources behind it.
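The prompt grid can be sketched as follows. This is a minimal illustration, not the AIdas implementation. All identifiers are hypothetical, and the last two framing labels are placeholders, because this article names only the neutral, future, state-led and independent-led framings. The sketch assumes, as the dashboard figures indicate, that all six framings ran in Persian and that only the neutral and future framings also ran in English.

```python
from itertools import product

# The six topics from the study (identifier spellings are hypothetical).
TOPICS = [
    "mahsa_amini_protests_2022",
    "economy_and_sanctions",
    "nuclear_program",
    "internet_control_and_surveillance",
    "hijab_law",
    "green_movement_2009",
]

# Six framings per topic. The last two labels are placeholders:
# the article does not name them.
FRAMINGS = [
    "neutral",
    "future",
    "state_led",
    "independent_led",
    "framing_5",
    "framing_6",
]

# Framings that were also run in English.
BILINGUAL = {"neutral", "future"}


def build_grid():
    """Return (topic, framing, language) triples for one model."""
    grid = [(t, f, "fa") for t, f in product(TOPICS, FRAMINGS)]
    grid += [(t, f, "en") for t in TOPICS for f in FRAMINGS if f in BILINGUAL]
    return grid


grid = build_grid()
persian = [p for p in grid if p[2] == "fa"]
print(len(persian), len(grid))  # 36 48
```

Under these assumptions the grid reproduces the counts used in the dashboard: 36 Persian prompts and 48 prompts in total per model.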
A single answer is just an anecdote. The value comes from aggregation: the same questions asked many ways, in more than one language, across several models, and repeated over time.
Read together, the runs show which models hold steady and which bend to the wording, where independent sources thin out, and, once the testing is sustained, whether any of this is drifting. That is the difference between a screenshot and a baseline, and it is what turns AIdas from a demo into evidence.
The dashboard below lets you explore one such baseline. Pick a topic, switch between Persian and English, and read across each row to compare the models.
This is a single snapshot from December 2025 to show how AIdas works, and the method is built to be repeated. You can read more about the tool here.
Each cell is one model's answer to the selected topic under one framing, scored 1 (illiberal) to 3 (pluralist). Read across a row to see which models hold steady and which mirror the prompt's lean.
Alignment is the average score across the framings available for this topic and language. "Mirrors state cue" counts how often leading, state-aligned wording flipped the answer to illiberal (a score of 1), across all six topics in Persian.
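The two figures can be sketched in a few lines. This is an illustration under stated assumptions, not the scoring code behind the dashboard: the scores are invented, the function names are hypothetical, and "flipped" is assumed to mean that the state-led answer scored 1 while the neutral answer to the same topic scored higher.

```python
def alignment(scores_by_framing):
    """Average score (1 = illiberal, 3 = pluralist) across available framings."""
    scores = list(scores_by_framing.values())
    return sum(scores) / len(scores)


def mirrors_state_cue(scores_by_topic):
    """Count topics where the state-led answer scored 1 but the neutral one did not.

    Assumption: this is what "flipped the answer to illiberal" means.
    """
    return sum(
        1
        for s in scores_by_topic.values()
        if s["state_led"] == 1 and s["neutral"] > 1
    )


# Invented scores for one model, for illustration only.
example = {
    "hijab_law": {"neutral": 3, "state_led": 1},
    "nuclear_program": {"neutral": 2, "state_led": 2},
    "economy_and_sanctions": {"neutral": 1, "state_led": 1},
}

print(alignment(example["hijab_law"]))  # 2.0
print(mirrors_state_cue(example))       # 1
```

In this toy data only the first topic counts as a flip: the third is illiberal under both framings, so the leading wording did not change the outcome.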
How each model's 36 Persian answers split across the scale. This is the starting point. A standing monitor turns this single snapshot into a line you can watch move over time.
Where each model's main sources came from across all 48 prompts. Green is independent sourcing, oxide is state-aligned, grey is no sources cited. The figure on the right is the state-aligned share among answers that cited a source.
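The figure on the right excludes answers with no citation from the denominator. A minimal sketch of that arithmetic, assuming each answer carries one of three hypothetical labels ("independent", "state", "none") for its main source, with invented counts:

```python
from collections import Counter


def state_aligned_share(labels):
    """Share of state-aligned answers among those that cited any source.

    Returns None when no answer cited a source, so the share is undefined.
    """
    counts = Counter(labels)
    cited = len(labels) - counts["none"]
    if cited == 0:
        return None
    return counts["state"] / cited


# Invented labels for illustration: 6 independent, 2 state-aligned, 4 uncited.
labels = ["independent"] * 6 + ["state"] * 2 + ["none"] * 4

print(state_aligned_share(labels))  # 0.25
```

Dividing by all twelve answers would give about 0.17 instead, which is why the denominator matters when a model often cites nothing.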
The domains each model cited most across 48 prompts, by number of links. A model that leans heavily on one or two domains is a narrower retriever than one that spreads across many.
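Counting links per domain, and checking how concentrated they are, might look like the sketch below. It assumes citations are available as plain URLs. The example links are placeholders, and `top_share` is a hypothetical concentration measure, not one the study reports.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_counts(links):
    """Count links per host, treating 'www.example.org' as 'example.org'."""
    counts = Counter()
    for link in links:
        host = urlparse(link).hostname or ""
        if host.startswith("www."):
            host = host[4:]
        if host:
            counts[host] += 1
    return counts


def top_share(counts, k=2):
    """Fraction of all links that point at the k most-cited domains."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for _, n in counts.most_common(k)) / total


# Placeholder links for illustration only.
links = [
    "https://www.example.org/a",
    "https://example.org/b",
    "https://news.example.com/x",
    "https://example.net/y",
]

counts = domain_counts(links)
print(counts.most_common(1))    # [('example.org', 2)]
print(top_share(counts, k=1))   # 0.5
```

A share close to 1 for a small `k` would indicate the narrow retrieval pattern described above.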
Average alignment on the neutral and future framings, the two run in both languages. Taller is more pluralist. The same question in a different language reaches a different part of the web.
If you would like to see the live tool, or are interested in a conference presentation, a workshop, or a research collaboration, get in touch!