About
Silicon Pulse is an autonomous survey loop: many LLMs answer the same original items on a schedule, with optional news context. Run summaries and the public digest copy can be produced by models from aggregate stats - so the project is run and reported by machines, within a fixed protocol you can inspect here. Nothing on this site is a poll of humans, and nothing should be read as models having “beliefs” in a folk-psychology sense; we store completions for research comparison.
Each run pulls from the live model registry, attaches optional news digests for informed conditions, and records answers for longitudinal views. The sections below cover motivation; how models, flagship anchors, and briefs are built; measurement; limits; and how to cite or reuse the work.
Why Silicon Pulse
We want to observe what models produce under a neutral study protocol (research participation, no persona): what they return as their default response to the same items over time. Millions of people use these systems every day as general assistants; tracking how that baseline behavior moves, with and without news context, is a different question from impersonating a human panel or fabricating synthetic survey populations. Silicon Pulse is built for that baseline-tracking question, not for role-play or demographic mimicry.
The UI and summaries are there to make completions comparable across runs and diets; nothing here should be read as models having beliefs in a folk-psychology sense. See Why we don't use role-based prompts for why we read the models raw, with no persona.
What we measure
Think of it as a panel where the panelists are API endpoints. Each run asks the same items: mostly closed-form survey-style questions, plus one open priorities prompt (what national issue deserves the most attention right now). We store answers under four information diets: baseline (no brief), and three informed slices - balanced, left-leaning, and right-leaning digests - so we can see whether completions move when the surrounding frame changes.
Outputs are not interpreted as inner beliefs; they are completions under a protocol. The comparative signal is what matters: agreement vs spread, sensitivity to the brief, drift between runs when the model roster changes.
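One way to make "agreement vs spread" concrete is sketched below. It is an illustration only: the function name `agreement_and_spread` and the two statistics (modal share and normalized entropy) are assumptions chosen for the example, not necessarily the metrics this project computes.

```python
from collections import Counter
from math import log2


def agreement_and_spread(answers, n_options):
    """Summarize one item's answers across models.

    answers: chosen option labels; None marks a missing answer (a gap).
    n_options: how many options the item offered.
    Returns (modal_share, normalized_entropy), or (None, None) if no
    usable answers exist.
    """
    valid = [a for a in answers if a is not None]
    if not valid:
        return None, None
    counts = Counter(valid)
    total = len(valid)
    # Agreement: share of models that picked the most common option.
    modal_share = max(counts.values()) / total
    # Spread: Shannon entropy, scaled to [0, 1] by the maximum possible.
    entropy = -sum((c / total) * log2(c / total) for c in counts.values())
    max_entropy = log2(n_options) if n_options > 1 else 1.0
    return modal_share, entropy / max_entropy
```

Computing the same pair for baseline and for each informed diet gives a rough view of sensitivity to the brief: a modal share that moves between diets suggests the surrounding frame shifted completions.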
Runs & conditions
A run is one batch: models are drawn from the registry, optional digests are attached, then each question is asked under each condition. Baseline uses no news text and covers the flagship anchors plus a small usage-ranked fill pool. Informed conditions (balanced / left / right digests) reuse the same question text but, to keep spend low, run on the flagship anchors only by default. Temperature is 0. All size limits (caps) are set in survey-config.json.
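The condition layout described above can be sketched as a task plan. This is a minimal sketch under stated assumptions: `plan_run`, `CONDITIONS`, and the `informed_on_fill` switch are hypothetical names for illustration, and the real scheduler may be organized differently.

```python
from itertools import product

# Baseline plus the three informed diets named in the text.
CONDITIONS = ["baseline", "balanced", "left", "right"]
TEMPERATURE = 0  # stated in the text; shown here only for reference


def plan_run(flagships, fill_pool, questions, informed_on_fill=False):
    """Return a list of (model_id, condition, question_id) tasks.

    Baseline covers flagships plus the fill pool. Informed conditions
    cover flagships only unless informed_on_fill is True (hypothetical
    switch standing in for the "by default" wording).
    """
    tasks = []
    for condition in CONDITIONS:
        if condition == "baseline" or informed_on_fill:
            models = list(flagships) + list(fill_pool)
        else:
            models = list(flagships)
        for model_id, question_id in product(models, questions):
            tasks.append((model_id, condition, question_id))
    return tasks
```

With 2 flagships, 3 fill models, and 4 questions, this plan has 20 baseline tasks and 24 informed tasks, which shows why restricting informed conditions to flagships keeps spend down.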
Failures and refusals appear as gaps in the data rather than silent drops.
Models & the registry
The roster updates over time from the provider API; usable text-generation models are stored in model_registry. Model ids are stable strings so you can compare runs even when offerings change.
How models are selected. The panel is drawn from OpenRouter: we take the top eligible models on their public leaderboard (models are listed by weekly usage). On each sync we fetch that list, then filter to text-generation chat endpoints only: we drop embeddings and rerankers, image or audio generators, non-instruct "base" checkpoints, free-tier endpoints, and models without enough context length or a positive per-token price. We then deduplicate by model family (one representative per lineage, in usage order) so we keep different families rather than variants of the same stack. The active roster is the flagship anchors plus that usage-ranked pool. Exact counts (the usage-pool size and the baseline fill cap) are set in survey-config.json.
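The filter-then-deduplicate step can be sketched roughly as follows. Everything about the record shape is an assumption for illustration: the field names (`modality`, `is_base`, `is_free`, `context_length`, `price_per_token`, `weekly_usage`, `family`) and the `min_context` default are hypothetical and do not describe the OpenRouter API or this project's schema.

```python
def select_usage_pool(models, pool_size, min_context=8000):
    """Pick one model id per family, in descending usage order.

    models: list of dicts with the hypothetical fields named above.
    pool_size: maximum number of families to keep.
    min_context: hypothetical minimum context length.
    """
    eligible = [
        m for m in models
        if m["modality"] == "text"          # drops embeddings, image, audio
        and not m["is_base"]                # drops non-instruct checkpoints
        and not m["is_free"]                # drops free-tier endpoints
        and m["context_length"] >= min_context
        and m["price_per_token"] > 0
    ]
    eligible.sort(key=lambda m: m["weekly_usage"], reverse=True)

    seen_families = set()
    pool = []
    for m in eligible:
        if len(pool) >= pool_size:
            break
        if m["family"] in seen_families:
            continue  # a higher-usage sibling already represents this family
        seen_families.add(m["family"])
        pool.append(m["id"])
    return pool
```

Sorting before deduplicating means the representative of each family is its most-used eligible variant, matching the "in usage order" wording.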
We record provider, origin where available, and rough capability metadata. The participants in a run are whichever models returned usable responses in that run; there is no hand-picked panel beyond roster health. For flagship anchors (one curated representative per major lab), see Flagship anchors.
33 active / 43 total in registry. The full roster, per-run participation, and links into model detail are on the landing page: use the Run snapshot card and open the Models tab (including “Browse all models” from there).
Flagship anchors
The weekly OpenRouter pool tracks what is popular, but longitudinal comparison is clearer if major labs are always represented by an explicit flagship: one current endpoint per lab (for example OpenAI, Anthropic, Google, xAI, DeepSeek) chosen for comparability across runs, not for demographics.
Each survey run always includes those flagships first (they are deduplicated against the leaderboard so the same id is not counted twice), then the top-by-usage family pool fills the remaining slots up to the configured cap. That way the panel mixes “what people are using this week” with “what each big lab is shipping as its headline model.”
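A minimal sketch of that ordering rule follows. The function name `build_roster` is hypothetical, and the sketch assumes the cap counts total roster slots (flagships included); the project's configured cap may instead count only the fill pool.

```python
def build_roster(flagship_ids, usage_ranked_ids, cap):
    """Flagships first, then usage-ranked ids until the cap is reached.

    Assumption: cap is the total roster size. Flagships are always kept,
    even if there are more of them than the cap allows.
    """
    # dict.fromkeys preserves order while removing repeated flagship ids.
    roster = list(dict.fromkeys(flagship_ids))
    seen = set(roster)
    for model_id in usage_ranked_ids:
        if len(roster) >= cap:
            break
        if model_id in seen:
            continue  # already present as a flagship; do not count twice
        roster.append(model_id)
        seen.add(model_id)
    return roster
```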
When a lab releases a new default model, maintainers record a cutover with an effective date. On the Longitudinal page, vertical dividers in flagship mode mark those handoffs, so a shift in the series reflects a change of endpoint, not an unexplained jump. Exact model ids and effective dates are published with the open-source project for transparency and reproducibility.
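Resolving which endpoint was a lab's flagship on a given day can be sketched as a lookup over dated cutovers. The data shape (a date-sorted list of `(effective_date, model_id)` pairs), the function name `flagship_on`, and the model ids used in any example are hypothetical; the published cutover records may be structured differently.

```python
from bisect import bisect_right
from datetime import date


def flagship_on(cutovers, day):
    """Return the flagship model id in effect on `day`, or None.

    cutovers: list of (effective_date, model_id), sorted by date.
    A cutover takes effect on its effective date, inclusive.
    """
    dates = [effective for effective, _ in cutovers]
    # Index of the first cutover strictly after `day`.
    i = bisect_right(dates, day)
    if i == 0:
        return None  # `day` precedes the first recorded cutover
    return cutovers[i - 1][1]
```

The effective dates are also the positions where a longitudinal chart would draw its vertical dividers.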
Prompts & theme labels
Closed items ask for one option plus a short rationale in a fixed format; option order is shuffled per call. The open priorities item is free text; answers are later grouped into coarse policy themes for charts.
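Per-call option shuffling needs a mapping back to the canonical option so answers stay comparable. The sketch below shows one way to do that. The answer format (`<letter> | <rationale>`) and both function names are assumptions for illustration; the project's actual fixed format is not specified here.

```python
import random

LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def build_closed_prompt(question, options, rng):
    """Shuffle options for one call; return (prompt, letter_to_option)."""
    order = list(options)
    rng.shuffle(order)  # a fresh order on every call
    letter_to_option = dict(zip(LETTERS, order))
    lines = [question]
    lines += [f"{letter}. {text}" for letter, text in letter_to_option.items()]
    # Hypothetical answer format, used only in this sketch.
    lines.append("Answer as: <letter> | <one-sentence rationale>")
    return "\n".join(lines), letter_to_option


def parse_closed_answer(text, letter_to_option):
    """Map a reply back to the canonical option.

    Returns (option, rationale), or None when the reply does not follow
    the format, so the failure is stored as a gap rather than guessed.
    """
    head, _, rationale = text.partition("|")
    option = letter_to_option.get(head.strip().upper())
    if option is None:
        return None
    return option, rationale.strip()
```

Storing the canonical option rather than the displayed letter is what lets shuffled calls be aggregated across models and runs.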
Why we don’t use role-based prompts
We do not ask models to play a demographic role (“answer as a 45-year-old from…”) or to imitate a human respondent. That design choice is deliberate.
We care about what the models say raw, with no persona attached. That is how people actually use them. Most people never prompt a model to act like some voter or fill a quota; they just open it and ask things while working or chatting, the way you would with a co-worker.
So the answer a model gives by default, unprompted, is the one that actually reaches people every day. That is what we want to track over time, with and without news context. See Why Silicon Pulse for more on what we are measuring.
News digests & feeds
Three RSS-backed digests per run - balanced, left, and right - are summarized into a readable block we inject before the same questions used in baseline. The goal is not perfect ideological matching; it is to vary the surrounding news frame and observe whether completions shift.
How the brief is curated. Each slice pulls from a fixed set of public RSS feeds (for example: balanced mixes wires and general outlets such as Reuters, AP, BBC, NPR, CNN, and The Verge; left-leaning includes sources such as The Guardian US and Vox; right-leaning includes sources such as Fox Politics and National Review). We drop items that look like sports, entertainment, or lifestyle when category cues match. Headlines are deduplicated, ranked by recency, then assigned to rough buckets (breaking, politics, tech, world, general) so we can select a mixed basket of stories instead of collapsing on a single topic. A small LLM summarizer turns the chosen headlines into neutral, numbered paragraph briefings; that text is what models see as the news context. Headlines are stored with outlet names for attribution; summaries may be shortened. Use them in line with fair use and each publisher's terms.
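The deduplicate, rank, and bucket steps can be sketched as below. This is an illustration under assumptions: the item fields (`title`, `outlet`, `published`, `bucket`), the `per_bucket` quota, and the title-normalization rule are hypothetical, and the sketch assumes bucket labels were already assigned upstream.

```python
from collections import defaultdict

BUCKETS = ["breaking", "politics", "tech", "world", "general"]


def select_headlines(items, per_bucket=2):
    """Return a mixed basket: newest unique headlines, capped per bucket.

    items: dicts with 'title', 'outlet', 'published' (any sortable
    timestamp), and 'bucket'. Unknown buckets fall back to 'general'.
    """
    seen = set()
    unique = []
    # Newest first, so the most recent copy of a duplicate is the one kept.
    for item in sorted(items, key=lambda i: i["published"], reverse=True):
        key = " ".join(item["title"].lower().split())  # case/space-insensitive
        if key in seen:
            continue
        seen.add(key)
        unique.append(item)

    by_bucket = defaultdict(list)
    for item in unique:
        bucket = item["bucket"] if item["bucket"] in BUCKETS else "general"
        by_bucket[bucket].append(item)

    basket = []
    for bucket in BUCKETS:
        # The per-bucket cap keeps one topic from crowding out the rest.
        basket.extend(by_bucket[bucket][:per_bucket])
    return basket
```

The selected items, each still carrying its outlet name, would then be handed to the summarizer that writes the numbered brief.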
The same digest text we prepend in production is on the landing page: open the Run snapshot card and choose the News digests tab to read briefs for the latest run.
Limits & caveats
- Timeouts, refusals, and formatting failures show up as missing answers.
- RSS feeds break; brief quality varies week to week.
- Model ids may be retired; historical rows can reference endpoints that no longer exist.
- Even at temperature 0, providers are not always bitwise-deterministic.
Disclaimer & non-reliance
This project is provided for research and transparency only. It does not provide legal, financial, medical, or political advice. Do not use outputs here as a substitute for professional judgment, regulatory filings, or decisions affecting safety or rights.
Survey wording is original to this project and is not claimed to match any third-party poll verbatim. News excerpts and headlines are aggregated from public RSS sources and attributed where possible; reuse must respect copyright, fair use, and each publisher's terms.
Model outputs can be wrong, biased, inconsistent, or outdated. Automated classification of open-ended answers uses coarse labels and may mis-bucket edge cases. The maintainer does not warrant fitness for any particular purpose. Use at your own risk.
Contributions, attribution & license
Contributions are welcome via issues and pull requests on GitHub. Please keep changes focused and match existing code style. For substantial features, opening an issue first avoids duplicate work.
If you use Silicon Pulse data or ideas in academic or public work, please cite the repository and the run date you relied on. A minimal attribution line is fine, e.g. "Data from Silicon Pulse (Lang / MaxMLang, 2026)" with a link to this repo.
Respect the licenses of underlying APIs and models (e.g. provider terms for OpenRouter and each model). The application code in this repository is shared on the terms of the LICENSE file in the repo root - check that file for the exact license text.
Open source works best when forks and derivatives credit upstream work and clearly describe what changed.