Factuality Evaluator for News Articles

Why this matters

Misinformation and biased reporting make it hard to trust what we read. We built a tool that scores news articles on six factuality dimensions, including political bias, clickbait, toxicity, and headline–body alignment, and combines them into a single reliability score. The goal is to give a quick, interpretable signal of how trustworthy an article is, and to encourage people to look more closely when something seems off.

What we built

Our system uses multiple AI agents working together. One agent handles each dimension (for example, “How clickbait is the headline?” or “How balanced is the political framing?”). A final agent then weighs their answers and produces a combined veracity score from 0 to 10, where lower means more reliable, plus a short written assessment. You can try it yourself in the Evaluate tab by pasting an article title and content.
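The flow described above can be sketched in plain Python. This is a simplified illustration, not the project's actual code: in the real system both the per-dimension scorers and the final combiner are AI agents, whereas here the scorers are stub functions and the combiner is a weighted average. The names `Assessment`, `combine`, and `evaluate`, and the weights, are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A scorer takes (title, content) and returns a 0-10 score for one dimension.
# In the real system this would be an AI agent; here it is any callable.
DimensionScorer = Callable[[str, str], float]


@dataclass
class Assessment:
    dimension_scores: Dict[str, float]
    veracity: float  # 0-10, lower = more reliable


def combine(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of dimension scores, clamped to the 0-10 range.

    Assumption: a weighted mean stands in for the final agent's judgment.
    """
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    raw = sum(scores[name] * weights[name] for name in scores) / total_weight
    return max(0.0, min(10.0, raw))


def evaluate(
    title: str,
    content: str,
    scorers: Dict[str, DimensionScorer],
    weights: Dict[str, float],
) -> Assessment:
    """Run one scorer per dimension, then combine the results."""
    scores = {name: scorer(title, content) for name, scorer in scorers.items()}
    return Assessment(dimension_scores=scores, veracity=combine(scores, weights))


# Stub scorers with fixed outputs, for illustration only.
stub_scorers = {
    "clickbait": lambda title, content: 8.0,
    "toxicity": lambda title, content: 2.0,
}
result = evaluate("Some title", "Some body", stub_scorers,
                  {"clickbait": 1.0, "toxicity": 1.0})
print(result.veracity)  # 5.0
```

Keeping each dimension behind the same callable interface means a scorer can be swapped (for a different prompt or model) without touching the combining step.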

How we approached it

We experimented with different ways to prompt the AI, ranging from very simple instructions to structured “chain-of-thought” (step-by-step reasoning) prompts and a “fractal” variant that adds verification and self-correction. We also gave some agents access to web search so they could check claims against external sources. The pipeline is built with the Google Agent Development Kit (ADK) and Gemini.
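To make the difference between these strategies concrete, here is a minimal sketch of how one might build a prompt for each. The wording of every template is invented for illustration and is not taken from the project's prompts; the strategy keys (`simple`, `cot`, `fcot`) and the `build_prompt` helper are hypothetical names as well.

```python
# Hypothetical instruction templates, one per prompting strategy.
STRATEGY_INSTRUCTIONS = {
    # Minimal instruction: just ask for the score.
    "simple": (
        "Rate the article on {dimension} from 0 to 10. "
        "Reply with the number only."
    ),
    # Chain-of-thought: ask for explicit reasoning before the score.
    "cot": (
        "Rate the article on {dimension} from 0 to 10. "
        "Think step by step: list the evidence, weigh it, "
        "then give the score on the last line."
    ),
    # "Fractal" variant: reasoning plus a verification and correction pass.
    "fcot": (
        "Rate the article on {dimension} from 0 to 10. "
        "Think step by step, then verify each step against the article, "
        "correct any mistakes, and give the final score on the last line."
    ),
}


def build_prompt(strategy: str, dimension: str, title: str, content: str) -> str:
    """Return the full prompt text for one dimension under one strategy."""
    try:
        instructions = STRATEGY_INSTRUCTIONS[strategy]
    except KeyError:
        raise ValueError(f"unknown strategy: {strategy!r}") from None
    return (
        f"{instructions.format(dimension=dimension)}\n\n"
        f"Title: {title}\n\n"
        f"Article:\n{content}"
    )


print(build_prompt("cot", "clickbait", "You won't believe this", "Body text."))
```

Because only the instruction text changes between strategies, the same article and dimension can be run through each one and the outputs compared directly.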

Results and impact

We evaluated our prompting strategies on human-labeled articles. The main takeaway: the best strategy depends on what you're measuring. For example, a minimal “simple” prompt agreed most closely with human labels on toxicity, while adding web search gave the best agreement on whether the headline matches the body. Our more structured strategies, Full CoT (chain-of-thought) and Full FCoT (the fractal variant), did best on political affiliation and sensationalism, and they give interpretable step-by-step explanations. We therefore use them as the default in the app, while keeping the others available for comparison.
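A comparison of this kind can be sketched as follows. The agreement measure used in the report is not stated here, so this example assumes mean absolute error against the human labels (lower means closer agreement); the function names and the toy numbers are illustrative only and are not results from the evaluation.

```python
from statistics import mean
from typing import Dict, List


def mean_absolute_error(predicted: List[float], human: List[float]) -> float:
    """Average absolute gap between model scores and human labels."""
    if not human or len(predicted) != len(human):
        raise ValueError("predicted and human must be non-empty and equal length")
    return mean(abs(p - h) for p, h in zip(predicted, human))


def best_strategy_per_dimension(
    predictions: Dict[str, Dict[str, List[float]]],
    human_labels: Dict[str, List[float]],
) -> Dict[str, str]:
    """For each dimension, pick the strategy with the lowest error.

    predictions[strategy][dimension] is a list of scores aligned with
    human_labels[dimension].
    """
    best = {}
    for dimension, labels in human_labels.items():
        errors = {
            strategy: mean_absolute_error(by_dimension[dimension], labels)
            for strategy, by_dimension in predictions.items()
            if dimension in by_dimension
        }
        if errors:  # skip dimensions that no strategy scored
            best[dimension] = min(errors, key=errors.get)
    return best


# Toy numbers for illustration only, not results from the report.
human = {"toxicity": [1.0, 2.0, 3.0]}
preds = {
    "simple": {"toxicity": [1.0, 2.0, 4.0]},
    "cot": {"toxicity": [3.0, 4.0, 5.0]},
}
print(best_strategy_per_dimension(preds, human))  # {'toxicity': 'simple'}
```

Scoring each dimension separately is what lets different strategies come out ahead on different dimensions, rather than forcing a single overall ranking.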

Want the full numbers? See our report and code (links below) for tables and methodology, and to run your own evaluations.

Try it and learn more

Use the Evaluate tab to score any article. For the full write-up, methodology, and results tables, see:

Live demo — Try the Factuality Evaluator
Full report — Read the report
GitHub — View code and data

Built with Google ADK and Gemini. Data and licenses: see the repository.