Thirty-Four Feeds and a Philosophy

May 25, 2026 · by Michael Morrison

The CDB’s source strategy is deliberately diverse across the political spectrum, carefully structured around editorial perspectives, and fully disclosed. Every editorial choice in the source list is a statement about what kind of information product this is.

The Citizen’s Daily Brief ingests news from thirty-four RSS feeds. That number started at twenty-seven and grew incrementally during testing. It grew because the original list had a gap that would have undermined its credibility: it didn’t include enough sources from across the political spectrum to claim it was seeing the full picture. The CDB isn’t a political product. It’s apolitical by design, and keeping it that way meant expanding the source feeds.

The source strategy is one of the most opinionated decisions in the project, and it’s worth explaining not just what the sources are but why the list looks the way it does, and what changed when I realized the initial philosophy was incomplete.

The List

The thirty-four feeds fall into three tiers based on what kind of content they provide:

Tier 1: Full-text, free access sources. These are outlets where the RSS feed provides substantial content — headlines, summaries, and often full article text — without requiring a subscription or scraping. (No scraping anywhere in the pipeline, by the way — more on that later.) This tier includes government sources (White House, Federal Reserve, Congress.gov), international outlets (BBC World and BBC US/Canada, The Guardian US and World editions, Al Jazeera), public broadcasting and national news (NPR, PBS NewsHour, ABC News, CBS News), specialist publications (SCOTUSblog, ProPublica, Defense One, STAT News, Carbon Brief), and tech (Ars Technica).

Tier 2: Metadata via Google News filters and direct feeds. For wire services and outlets that don’t offer rich RSS feeds, the pipeline pulls metadata — headlines, publication timestamps, and short summaries — through Google News topic filters or direct RSS. This covers AP, Reuters, Financial Times, Wall Street Journal, The Hill, Politico, War on the Rocks, The Verge, Fox News, New York Post, Washington Examiner, Daily Wire, Breitbart, National Review, and MSNBC.

Tier 3: Breadth signal. Google News Top Stories serves as a general-purpose signal for what the broader media landscape considers significant. It’s not a primary source for any assessment, but it helps the clustering algorithm identify stories that are getting widespread attention even if the Tier 1 sources aren’t covering them. That matters because a core requirement of the CDB is working out what actually qualifies as news you need to be aware of: separating signal from noise.

That’s it. Thirty-four feeds, publicly disclosed, using only RSS and public APIs. No scraping, no expensive subscriptions, no API keys to news providers, no behind-the-scenes data deals.

Ingest Broadly, Weight by Diversity

The original version of the CDB ingested from twenty-seven sources. The list was carefully curated but had a structural blind spot: it included no explicitly partisan outlets from either side. No Fox News, no Breitbart, no MSNBC. The reasoning was sound on its face: opinion-primary outlets add editorial framing that complicates the clustering step, and excluding them kept the input clean. Early on I thought the exclusion might even be a benefit. I later came to see it as a flaw.

The problem became clear when I thought about what happens downstream. The CDB claims to tell you “the most significant things that happened today.” But if the source pool doesn’t include outlets that millions of Americans actually read, the system can’t detect stories that those outlets emphasize. A story that Fox News, the Washington Examiner, and the Daily Wire all lead with — but that NPR and the Guardian ignore — is invisible to a pipeline that only ingests NPR and the Guardian. That’s not a clean input set. That’s a blind spot. And a key motivator of the CDB from the outset was to pierce the information bubbles we all too easily live inside.

So the principle shifted: ingest broadly, weight by editorial diversity.

Partisan sources belong in the ingestion pool. They help the pipeline see stories that only one side of the spectrum covers. But significance should not be inflated by volume from a single editorial perspective. When five outlets that share the same editorial stance all cover the same story, that’s not five independent signals — it’s one signal amplified five times.

This is where editorial perspectives come in.

Editorial Perspectives

Every source in the CDB’s feed list is tagged with an editorial_perspective, a label that identifies which editorial grouping the outlet belongs to. These aren’t bias ratings, although two of the groups (left_leaning and right_leaning) do describe an outlet’s political lean. They’re structural groupings based on observable editorial positioning and the kinds of stories each outlet tends to emphasize.

The twelve perspective groups are:

wire — AP, Reuters. Factual, non-editorial wire services focused on who/what/when/where.
public_media — NPR, PBS NewsHour. US publicly funded outlets with a mandate for balanced coverage.
broadcast — ABC News, CBS News. US commercial broadcast networks.
left_leaning — The Guardian, ProPublica, MSNBC. Outlets with left-of-center editorial positioning.
right_leaning — Fox News, New York Post, Washington Examiner, Daily Wire, Breitbart, National Review. Outlets with right-of-center editorial positioning.
business — Wall Street Journal, Financial Times. Business and financial press with market-oriented framing.
international — BBC, Al Jazeera. Non-US headquartered international outlets.
political_trade — The Hill, Politico. DC insider and political trade press.
official — White House, Federal Reserve, Congress.gov. Government primary sources.
specialist — SCOTUSblog, STAT News, Defense One, Carbon Brief, War on the Rocks. Domain-specific expert outlets.
tech — Ars Technica, The Verge. Technology-focused outlets.
aggregator — Google News Top Stories. Algorithm-curated, not counted as a perspective.

One thing worth flagging: the perspective list groups by outlet, but the “thirty-four feeds” count is by feed. The list above names thirty-two outlets. BBC and The Guardian each contribute two regional feeds (BBC World and BBC US/Canada, Guardian US and Guardian World), which accounts for the other two.
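To make the feed-versus-outlet arithmetic concrete, here is the feed list from this post expressed as data. It’s a sketch that assumes one record per feed. The Feed type and its field names are hypothetical, since the post doesn’t show the CDB’s actual configuration format, and feed URLs are left out rather than guessed. The outlet names, tiers, and perspective tags all come from the lists above. Counting the records gives thirty-four feeds, thirty-two outlets, and twelve perspective groups, eleven of which count toward significance.

```python
from collections import Counter
from typing import NamedTuple


class Feed(NamedTuple):
    outlet: str
    tier: int          # 1 = full text, 2 = metadata only, 3 = breadth signal
    perspective: str   # editorial_perspective tag
    edition: str = ""  # set only for outlets that contribute more than one feed


FEEDS = [
    # Tier 1: full-text, free access
    Feed("White House", 1, "official"),
    Feed("Federal Reserve", 1, "official"),
    Feed("Congress.gov", 1, "official"),
    Feed("BBC", 1, "international", "World"),
    Feed("BBC", 1, "international", "US/Canada"),
    Feed("The Guardian", 1, "left_leaning", "US"),
    Feed("The Guardian", 1, "left_leaning", "World"),
    Feed("Al Jazeera", 1, "international"),
    Feed("NPR", 1, "public_media"),
    Feed("PBS NewsHour", 1, "public_media"),
    Feed("ABC News", 1, "broadcast"),
    Feed("CBS News", 1, "broadcast"),
    Feed("SCOTUSblog", 1, "specialist"),
    Feed("ProPublica", 1, "left_leaning"),
    Feed("Defense One", 1, "specialist"),
    Feed("STAT News", 1, "specialist"),
    Feed("Carbon Brief", 1, "specialist"),
    Feed("Ars Technica", 1, "tech"),
    # Tier 2: metadata via Google News filters or direct feeds
    Feed("AP", 2, "wire"),
    Feed("Reuters", 2, "wire"),
    Feed("Financial Times", 2, "business"),
    Feed("Wall Street Journal", 2, "business"),
    Feed("The Hill", 2, "political_trade"),
    Feed("Politico", 2, "political_trade"),
    Feed("War on the Rocks", 2, "specialist"),
    Feed("The Verge", 2, "tech"),
    Feed("Fox News", 2, "right_leaning"),
    Feed("New York Post", 2, "right_leaning"),
    Feed("Washington Examiner", 2, "right_leaning"),
    Feed("Daily Wire", 2, "right_leaning"),
    Feed("Breitbart", 2, "right_leaning"),
    Feed("National Review", 2, "right_leaning"),
    Feed("MSNBC", 2, "left_leaning"),
    # Tier 3: breadth signal
    Feed("Google News Top Stories", 3, "aggregator"),
]

outlets = {f.outlet for f in FEEDS}
perspectives = {f.perspective for f in FEEDS}
counted = perspectives - {"aggregator"}  # the aggregator is not counted as a perspective

print(len(FEEDS), len(outlets), len(perspectives), len(counted))  # 34 32 12 11
print(sorted(Counter(f.tier for f in FEEDS).items()))  # [(1, 18), (2, 15), (3, 1)]
```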

The significance scoring system uses these perspective groups to measure editorial diversity. A story covered by four sources across four different perspectives (say, wire + right_leaning + public_media + business) is genuinely more significant than one covered by five sources that all share the same perspective (say, five right_leaning outlets). The pipeline counts distinct perspectives, not distinct URLs.

Even the most well-intentioned “I read everything” reader can’t realistically aggregate this breadth manually, let alone weight stories by perspective distinction on top of it. This is one of those places where AI has a real, structural advantage over a human trying to do the same job. Most of us who proudly claim to “listen to all sides” are quietly outmatched.

Why This Matters

Consider a story about a proposed regulation. If it’s covered by AP (wire), NPR (public_media), Fox News (right_leaning), and the Wall Street Journal (business), that’s four editorial perspectives. The story has cross-spectrum significance. Different kinds of outlets, with different audiences and different editorial priorities, all judged it worth covering.

Now consider a different story covered by Fox News, the Daily Wire, Breitbart, the New York Post, and the Washington Examiner. That’s five outlets but one editorial perspective: right_leaning. The story might be genuinely important — or it might be amplified within one editorial ecosystem without resonating across the broader media landscape.

The old system would have scored the second story higher (five sources beats four). The new system scores the first story higher (four perspectives beats one). That’s the correct behavior for a product that claims to assess significance rather than measure attention. The CDB isn’t concerned with attention. It’s concerned with significance, which, in theory at least, makes it unusual among aggregated information sources.
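The two stories above can be compared in a few lines of Python. This is a minimal sketch that isolates only the counting step described here: distinct perspectives rather than distinct sources. It is not the CDB’s full significance scoring. The PERSPECTIVE mapping is a hypothetical subset of the perspective list, covering just the outlets these examples need plus the aggregator.

```python
# Outlet -> editorial_perspective, per the perspective list in this post
# (only the outlets used in the two example stories, plus the aggregator).
PERSPECTIVE = {
    "AP": "wire",
    "NPR": "public_media",
    "Wall Street Journal": "business",
    "Fox News": "right_leaning",
    "Daily Wire": "right_leaning",
    "Breitbart": "right_leaning",
    "New York Post": "right_leaning",
    "Washington Examiner": "right_leaning",
    "Google News Top Stories": "aggregator",
}

# The aggregator group is tagged but never counted as a perspective.
NOT_COUNTED = {"aggregator"}


def source_count(outlets: list[str]) -> int:
    """Old approach: every covering outlet is one more signal."""
    return len(set(outlets))


def perspective_count(outlets: list[str]) -> int:
    """New approach: count distinct editorial perspectives, not distinct sources."""
    return len({PERSPECTIVE[o] for o in outlets} - NOT_COUNTED)


stories = {
    "regulation": ["AP", "NPR", "Fox News", "Wall Street Journal"],
    "single_ecosystem": ["Fox News", "Daily Wire", "Breitbart",
                         "New York Post", "Washington Examiner"],
}

for name, outlets in stories.items():
    print(name, source_count(outlets), perspective_count(outlets))
# regulation 4 4
# single_ecosystem 5 1

print(max(stories, key=lambda s: source_count(stories[s])))       # single_ecosystem
print(max(stories, key=lambda s: perspective_count(stories[s])))  # regulation
```

Ranking by source count puts the single-ecosystem story first. Ranking by perspective count reverses the order, which is the flip described above.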

Wire Service Syndication

There’s a subtlety worth naming in how wire services interact with perspective counting: syndication. When AP publishes a story, multiple outlets routinely republish AP’s copy, sometimes with their own byline, sometimes crediting AP explicitly. Without accounting for that, a single AP story republished by every outlet in the country would look like the most diverse coverage in the world. Suppose AP’s original, NPR’s republication, and CBS’s republication of the same story all land in one cluster. The pipeline shouldn’t count those as three perspectives (wire + public_media + broadcast). It’s one perspective (wire) with two republications, not three independent voices.

The pipeline detects wire service syndication through two methods: explicit wire credits in titles, summaries, or bylines (“[AP]”, “(Reuters)”, “By Associated Press”), and title similarity matching against known wire articles. Articles identified as syndicated wire copy are tagged so the perspective counter attributes them to the wire perspective rather than to their republishing outlet’s perspective.
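Here’s a minimal sketch of the two detection methods, assuming a simple article record. Several pieces are hypothetical choices for illustration: the credit regex, the 0.85 threshold, the use of difflib’s SequenceMatcher for title matching, and the made-up headlines. The post doesn’t say how the CDB actually measures title similarity.

```python
import re
from difflib import SequenceMatcher
from typing import NamedTuple


class Article(NamedTuple):
    outlet: str
    perspective: str  # the republishing outlet's own editorial_perspective tag
    title: str
    summary: str = ""
    byline: str = ""


# Hypothetical credit patterns modeled on the examples in the text:
# "[AP]", "(Reuters)", "By Associated Press".
WIRE_CREDIT = re.compile(
    r"\[AP\]|\(AP\)|\(Reuters\)|By (?:The )?Associated Press|By Reuters"
)

# Hypothetical cutoff for treating two headlines as the same wire story.
TITLE_SIMILARITY = 0.85


def has_wire_credit(a: Article) -> bool:
    """Method 1: an explicit wire credit in the title, summary, or byline."""
    return any(WIRE_CREDIT.search(text) for text in (a.title, a.summary, a.byline))


def matches_wire_title(a: Article, wire_titles: list[str]) -> bool:
    """Method 2: the headline closely matches a known wire headline."""
    title = a.title.lower()
    return any(
        SequenceMatcher(None, title, w.lower()).ratio() >= TITLE_SIMILARITY
        for w in wire_titles
    )


def effective_perspective(a: Article, wire_titles: list[str]) -> str:
    """Attribute syndicated wire copy to 'wire' instead of the republisher."""
    if has_wire_credit(a) or matches_wire_title(a, wire_titles):
        return "wire"
    return a.perspective


cluster = [
    Article("AP", "wire", "Senate passes stopgap funding bill hours before deadline"),
    Article("NPR", "public_media", "Senate passes stopgap funding bill hours before deadline",
            byline="By The Associated Press"),
    Article("CBS News", "broadcast", "Senate passes stopgap funding bill, hours before deadline"),
    Article("Fox News", "right_leaning", "Shutdown averted as Congress strikes last-minute deal"),
]
wire_titles = [a.title for a in cluster if a.perspective == "wire"]

naive = {a.perspective for a in cluster}
adjusted = {effective_perspective(a, wire_titles) for a in cluster}
print(len(naive), len(adjusted))  # 4 2
```

A fixed similarity threshold is a trade-off. Set it too low and independent stories with similar headlines get collapsed into wire. Set it too high and lightly edited rewrites of wire copy slip through as independent coverage.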

This prevents the most common source of inflated independence scores: widely syndicated wire copy being counted as independent editorial coverage from each outlet that ran it.

What’s Not in the List

The absences are as deliberate as the inclusions.

Paywalled sources as signals only. The Wall Street Journal and Financial Times are included as Tier 2 sources via Google News metadata. Their feeds provide headlines and short summaries — enough for the clustering algorithm to register that a story is getting business and financial coverage, but not enough for the pipeline to build an assessment on their reporting alone. In other words, they can contribute to whether a story counts as significant (via the business perspective), but not to what the assessment actually says. The New York Times and Washington Post remain excluded. The principle stands: the CDB’s assessments should be built on information that readers can independently verify.
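A sketch of that split, assuming each item in a story cluster carries a hypothetical signal_only flag. In this example the flag is set only for the two paywalled sources the paragraph names. This isn’t the CDB’s code. It just keeps the two roles separate: every perspective counts toward significance, but only items that are not signal-only are kept as material for the assessment.

```python
from typing import NamedTuple


class ClusterItem(NamedTuple):
    outlet: str
    perspective: str
    signal_only: bool  # hypothetical flag: paywalled, metadata-only source


def split_cluster(items: list[ClusterItem]) -> tuple[set[str], list[ClusterItem]]:
    """Separate a cluster's contribution to significance from its contribution to content.

    Every item's perspective counts toward significance (except the aggregator),
    but only items not marked signal_only are kept as evidence for the assessment.
    """
    perspectives = {i.perspective for i in items} - {"aggregator"}
    evidence = [i for i in items if not i.signal_only]
    return perspectives, evidence


cluster = [
    ClusterItem("NPR", "public_media", signal_only=False),
    ClusterItem("Wall Street Journal", "business", signal_only=True),
    ClusterItem("Financial Times", "business", signal_only=True),
]
perspectives, evidence = split_cluster(cluster)
print(sorted(perspectives))          # ['business', 'public_media']
print([i.outlet for i in evidence])  # ['NPR']
```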

No social media. Twitter/X, Reddit, and other social platforms are excluded entirely. There’s a real cost here. Social media is genuinely useful for early signal on breaking stories, and the CDB will sometimes be later to a story than someone scrolling X would be. But a trending topic on social media might be genuinely significant, or it might be algorithmically amplified noise, and often there’s no way to resolve which. For a product whose contract with the reader is confidence and verification, that ambiguity is disqualifying.

No aggregators beyond the breadth signal. Aside from the Google News Tier 3 feed, no aggregation services are included. Aggregators are redistributors, not sources. Including them would inflate source counts without adding genuine perspective diversity. Aggregating aggregators doesn’t add information; it just adds noise that looks like signal.

The RSS Constraint

The entire ingestion layer runs on RSS feeds and public APIs. No scraping. This is a philosophical choice as much as a technical one.

Scraping creates a legal and ethical gray area that undermines the CDB’s credibility. If the product is built on the premise of transparency and trust, building it on scraped data contradicts that premise. RSS feeds are an explicit invitation to consume content programmatically. Scraping is taking content that wasn’t offered for that purpose.

The RSS constraint also keeps the pipeline simple and reliable. RSS feeds are standardized, well-supported, and rarely change their format. Scrapers break constantly: every time a website redesigns, every time a class name changes, every time a CDN configuration shifts. A pipeline that depends on scraping is a pipeline that requires constant maintenance. A pipeline that depends on RSS feeds can run unattended for months.
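The post doesn’t show the CDB’s feed parser, but the standardization point is easy to see. The core fields of an RSS 2.0 item (title, link, pubDate, description) can be read with nothing beyond Python’s standard library. This is a minimal sketch, and the sample feed is made up. Real-world feeds need more handling than this, Atom feeds in particular, since they use different element names and an XML namespace.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# A made-up RSS 2.0 feed with a single item.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Outlet</title>
    <item>
      <title>Example headline</title>
      <link>https://example.com/story</link>
      <pubDate>Mon, 25 May 2026 12:00:00 GMT</pubDate>
      <description>Short summary of the story.</description>
    </item>
  </channel>
</rss>"""


def parse_rss(xml_text: str) -> list[dict]:
    """Extract the standard RSS 2.0 item fields from a feed document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        pub = item.findtext("pubDate")
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            # pubDate uses the RFC 822 date format, which email.utils can parse.
            "published": parsedate_to_datetime(pub) if pub else None,
            "summary": (item.findtext("description") or "").strip(),
        })
    return items


for entry in parse_rss(SAMPLE_RSS):
    print(entry["title"], entry["published"].isoformat())
# Example headline 2026-05-25T12:00:00+00:00
```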

Evolving the List

You might be wondering at this point how flexible the source list is. It certainly isn’t fixed. It grew from twenty-seven to thirty-four when the need for cross-spectrum coverage became clear, and it may grow again. But additions follow the same principles: maximize diversity across editorial perspectives, prefer full-text over metadata-only feeds, use RSS and public APIs only (never scraping), and keep the total count manageable enough that full disclosure remains meaningful.

The most likely future additions are non-English international sources (through translated feeds) and government sources from other major democracies. As sources evolve, each addition will be documented publicly as part of the methodology.

What won’t change is the core principle, the one that took a rewrite to actually land: ingest broadly so no perspective is invisible. Weight by editorial diversity so no single perspective dominates. Thirty-four feeds, twelve perspective groups (eleven of which count toward significance), each one chosen for a reason. That’s the source strategy.

The Citizen’s Daily Brief is a free daily intelligence briefing from Stalefish Labs.