Methodology

Hundreds of millions of people use LLMs every day. This tracks what they say - unprompted.

Silicon Pulse is an autonomous panel where the respondents are language models. On a schedule, many models answer the same survey items under a fixed, inspectable protocol - with no role-play and no persona. We don't ask them to imitate a person; we record their default completions and watch how they cluster, how they move when the news frame changes, and how they drift as the model roster changes.

Nothing here is a poll of humans, and nothing should be read as a model having “beliefs.” These are completions, stored for research comparison.

1. Panel

Flagship models from major labs, plus the week’s most-used models.

2. Ask

The same survey items, no persona, options shuffled, temperature 0.

3. Vary

Repeat with balanced / left / right news briefs prepended.

4. Record

Store every completion so runs can be compared over time.

01 Who answers

The panel mixes two things on purpose. Flagship anchors - one current headline model per major lab (OpenAI, Anthropic, Google, xAI, DeepSeek) - are always included so longitudinal comparison stays clean. The remaining slots are filled from the most-used models that week on OpenRouter's public leaderboard, deduplicated to one representative per model family.
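A minimal sketch of how such a roster could be assembled, assuming the leaderboard arrives as (model id, usage) pairs. The `family_of` rule (text before the first hyphen), the rule that an anchor's family also blocks leaderboard entries, the slot count, and the `alpha-*` / `beta-*` ids are all illustrative assumptions, not the project's actual registry logic.

```python
from typing import Callable, Iterable


def family_of(model_id: str) -> str:
    """Hypothetical family rule: everything before the first hyphen.

    "gpt-5.5" -> "gpt", "claude-opus-4.8" -> "claude".
    The real project may group families differently.
    """
    return model_id.split("-", 1)[0]


def build_panel(
    anchors: list[str],
    leaderboard: Iterable[tuple[str, int]],
    slots: int,
    family: Callable[[str], str] = family_of,
) -> list[str]:
    """Anchors first, then the most-used models, one per family."""
    panel = list(anchors)
    # Assumption: a family already covered by an anchor is not added again.
    seen = {family(m) for m in anchors}
    filled = 0
    # Walk the leaderboard from most-used to least-used.
    for model_id, _usage in sorted(leaderboard, key=lambda r: r[1], reverse=True):
        if filled >= slots:
            break
        fam = family(model_id)
        if fam in seen:
            continue  # family already has a representative
        seen.add(fam)
        panel.append(model_id)
        filled += 1
    return panel


# Illustrative input; usage counts and non-anchor ids are made up.
panel = build_panel(
    anchors=["gpt-5.5"],
    leaderboard=[("gpt-5.4", 90), ("alpha-large", 80), ("alpha-small", 70), ("beta-1", 60)],
    slots=2,
)
```

Keeping the anchors outside the slot budget means a change in the week's leaderboard can never push a flagship model out of the series.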

When a lab ships a new default model, we record a cutover with an effective date, so a shift in a series reflects a change of endpoint rather than an unexplained jump. Exact model ids and dates are published with the open-source project.

33 models currently active · 43 in the registry. The full roster and per-run participation are on the landing page.

02 Flagship anchor changelog

One headline model per lab anchors the longitudinal series. When a lab ships a new default, we record a cutover with an effective date, so a shift in a chart reflects a changed endpoint rather than an unexplained jump. The full history is below.

OpenAI · gpt-5.5 · 1 cutover

gpt-5.5 · current · from Apr 24, 2026
gpt-5.4 · from Mar 5, 2026

Anthropic · claude-opus-4.8 · 2 cutovers

claude-opus-4.8 · current · from May 27, 2026
claude-opus-4.7 · from Apr 16, 2026
claude-opus-4.6 · from Feb 4, 2026

Google · gemini-3.1-pro-preview · no changes yet

gemini-3.1-pro-preview · current · from Feb 19, 2026

xAI · grok-4.3 · 1 cutover

grok-4.3 · current · from Apr 30, 2026
grok-4.20 · from Mar 31, 2026

DeepSeek · deepseek-v4-pro · 2 cutovers

deepseek-v4-pro · current · from Apr 24, 2026
deepseek-v3.2 · from Dec 1, 2025
deepseek-chat-v3-0324 · from Mar 24, 2025

See these handoffs marked inline on the longitudinal view.
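As a rough illustration of how effective-dated cutovers can be resolved, the sketch below looks up which anchor was active on a given run date. The dates and ids for two labs are copied from the changelog above; the `CUTOVERS` structure and the `anchor_on` helper are hypothetical names, not the project's published code.

```python
from bisect import bisect_right
from datetime import date
from typing import Optional

# Effective-dated history per lab, oldest first (values from the changelog).
CUTOVERS: dict[str, list[tuple[date, str]]] = {
    "OpenAI": [
        (date(2026, 3, 5), "gpt-5.4"),
        (date(2026, 4, 24), "gpt-5.5"),
    ],
    "Anthropic": [
        (date(2026, 2, 4), "claude-opus-4.6"),
        (date(2026, 4, 16), "claude-opus-4.7"),
        (date(2026, 5, 27), "claude-opus-4.8"),
    ],
}


def anchor_on(lab: str, day: date) -> Optional[str]:
    """Return the anchor model in effect for `lab` on `day`.

    A cutover takes effect on its own date. Returns None when `day`
    falls before the first recorded entry.
    """
    history = CUTOVERS[lab]
    starts = [start for start, _model in history]
    idx = bisect_right(starts, day)  # number of entries with start <= day
    return history[idx - 1][1] if idx else None


on_cutover_day = anchor_on("Anthropic", date(2026, 4, 16))
between = anchor_on("OpenAI", date(2026, 5, 1))
too_early = anchor_on("OpenAI", date(2026, 1, 1))
```

Resolving the anchor from the run date, rather than storing only "the current model", is what lets an old row be attributed to the endpoint that actually produced it.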

03 What we ask

Mostly closed-form items across a deliberately broad range of topics - technology and AI, the economy, institutions and the media, the environment, social trust, the role of government, work and automation, free expression, and more - plus one open-ended priorities prompt.

Wording is original to this project. Where an item echoes a long-running social-science theme, we adapt the theme (e.g. from public instruments like the World Values Survey or the General Social Survey) but never copy poll text verbatim, so the battery stays license-clean. Each closed item asks for one option plus a one-sentence rationale, and option order is shuffled on every call.
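One way to shuffle option order on every call while still mapping the answer back to the original option is sketched below. The prompt layout, the letter labels, the instruction line, and the sample item are assumptions made for illustration; they are not taken from the project's battery.

```python
import random
import string


def shuffled_item(
    question: str, options: list[str], rng: random.Random
) -> tuple[str, dict[str, str]]:
    """Render one closed item with options in a fresh random order.

    Returns the prompt text and a key mapping each displayed letter
    back to the option text, so a reply of "B" can be decoded.
    Supports up to 26 options (one per letter).
    """
    order = list(range(len(options)))
    rng.shuffle(order)  # in-place shuffle of the display order
    lines = [question]
    key: dict[str, str] = {}
    for position, option_index in enumerate(order):
        letter = string.ascii_uppercase[position]
        lines.append(f"{letter}. {options[option_index]}")
        key[letter] = options[option_index]
    lines.append("Answer with one letter, then one sentence of rationale.")
    return "\n".join(lines), key


# Hypothetical item; a seeded generator makes this call reproducible.
prompt, key = shuffled_item(
    "How much do you trust national news media?",
    ["A great deal", "Somewhat", "Not much", "Not at all"],
    random.Random(7),
)
```

Storing `key` alongside each completion is what makes answers comparable across calls, since the same letter can point at a different option every time.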

04 Baseline vs. news diets

Every item is asked at baseline (no news) and again under three informed conditions, in which a short briefing built from balanced, left, or right RSS feeds is prepended to the same question. The aim isn't perfect ideological matching - it's to vary the surrounding frame and see whether completions move.

To keep cost predictable, the news conditions run on the flagship anchors by default, while baseline covers the wider panel. The exact briefs for the latest run are visible from the landing page.
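The split described here (baseline for everyone, news conditions for anchors only) can be sketched as a small call planner. The function names, the "News briefing:" header, and the placeholder brief texts are hypothetical; only the four condition names and the anchor-only rule come from the text above.

```python
from typing import Iterator

NEWS_CONDITIONS = ("balanced", "left", "right")


def build_prompt(question: str, condition: str, briefs: dict[str, str]) -> str:
    """Prepend the brief for `condition`; baseline gets the bare question."""
    if condition == "baseline":
        return question
    # The brief is not labeled with its slant, so only the content varies.
    return f"News briefing:\n{briefs[condition]}\n\n{question}"


def plan_calls(models: list[str], anchors: set[str]) -> Iterator[tuple[str, str]]:
    """Yield (model, condition) pairs for one survey item."""
    for model in models:
        yield model, "baseline"
        if model in anchors:  # news conditions run on flagship anchors only
            for condition in NEWS_CONDITIONS:
                yield model, condition


# Placeholder briefs and model names for illustration.
briefs = {"balanced": "B-text", "left": "L-text", "right": "R-text"}
calls = list(plan_calls(["anchor-model", "panel-model"], {"anchor-model"}))
```

Under this plan an anchor costs four calls per item and every other model costs one, which is where the predictable spend comes from.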

05 Reading the results

We don't score answers against a “correct” human distribution. The signal is comparative: how much the panel agrees vs. spreads, how sensitive answers are to the news brief, and how things drift between runs. Open-ended priorities are grouped into coarse policy themes by a small classifier model for charting. Failures and refusals show up as gaps, not silent drops.
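The page does not name the statistics behind "agrees vs. spreads" or "sensitive to the news brief". As one plausible reading, the sketch below uses modal share and normalized Shannon entropy for agreement, and total variation distance between the baseline and a news condition for sensitivity. These metric choices, the function names, and the sample answers are assumptions, not the project's documented method.

```python
from collections import Counter
from math import log
from typing import Optional


def distribution(answers: list[Optional[str]], options: list[str]) -> dict[str, float]:
    """Share of valid answers per option; None marks a failure or refusal."""
    valid = [a for a in answers if a is not None]
    counts = Counter(valid)
    n = len(valid)
    return {o: (counts[o] / n if n else 0.0) for o in options}


def modal_share(dist: dict[str, float]) -> float:
    """Fraction of the panel choosing the most popular option."""
    return max(dist.values())


def normalized_entropy(dist: dict[str, float]) -> float:
    """0.0 = everyone agrees, 1.0 = answers spread evenly over all options."""
    k = len(dist)
    if k < 2:
        return 0.0
    h = -sum(p * log(p) for p in dist.values() if p > 0)
    return h / log(k)


def total_variation(p: dict[str, float], q: dict[str, float]) -> float:
    """Half the summed absolute difference between two distributions."""
    return 0.5 * sum(abs(p[o] - q[o]) for o in p)


options = ["agree", "neutral", "disagree"]
baseline = distribution(["agree", "agree", "agree", "neutral", None], options)
with_news = distribution(["agree", "agree", "neutral", "neutral"], options)
shift = total_variation(baseline, with_news)
```

Refusals are excluded from the denominator here but remain in the raw list, so they can still be counted and shown as gaps rather than silently dropped.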

Each run can also produce a short briefing written by a model from the run's aggregate statistics - the project is run and reported by machines, within this protocol. To keep spend low, that briefing and the theme classifier use an inexpensive model.

06 Limits & honest caveats

These are model completions under one protocol - not human opinion, and not model “beliefs.”
Flagship models are sampled several times per question; other models answer once.
News feeds break and brief quality varies week to week.
Automated theme labels are coarse and can mis-bucket edge cases.
Retired model ids may linger in older rows.

For research only - not legal, financial, medical, or political advice. See the About page for the full disclaimer, the research framing behind avoiding role-prompts, and citation guidance.