The compression filter

A language model does not answer with your page. It answers with what survives compression: what remains after billions of tokens have been weighted, ranked, and mostly discarded to produce a single response of a few hundred words. The compression filter is brutal and largely invisible from the outside. It is, however, learnable.

What survives compression has properties. It tends to be definitionally clear in its first sentence. It tends to carry data points the model can quote without inventing them. It tends to be structurally legible — tables, lists, comparison frames — so the model can re-emit the structure rather than paraphrase around it. It tends to be self-contained: a paragraph that requires the rest of the article to make sense will lose to one that does not.

This is not a stylistic preference. It is what we observe, empirically, in eight months of measuring which content becomes source at four major LLM providers across thousands of buyer-intent prompts. The variance is not random. The patterns are stable across providers, though the weighting differs. And the patterns are buildable.

The page is not the unit of measurement. The function the page performs in the buyer's mental state is the unit of measurement.

Semantic coverage, defined

The dominant content strategy frame of the last decade was keyword density on the input side and topic coverage on the output side. Both frames assumed search engines as the reader. They have not aged well, because LLMs do not read the way search engines do.

What LLMs read for, when they read your content as a candidate source, is whether you have performed the cognitive functions that the user's question implicitly requires. A user asking what a concept means requires a definition. A user asking how two options differ requires a comparison structure. A user asking what to look for in a provider requires a criteria list. The page that performs the right function for the question becomes source. The page that tries to perform all functions at once performs none of them well enough to be selected.

Semantic coverage is the degree to which a page performs the specific functions required by the information situation it serves. It is not measured in words or keywords. It is measured in function-presence, function-completeness, and function-precision. Three pages of 1,500 words each, all attempting the same coverage, will lose to one page of 800 words that performs three precise functions and stops.

The eight situations

Systematic prompt analysis and grounding observation resolve the buyer-journey decomposition we introduced in Episode Three into eight distinct information situations, each with its own function requirements. We do not invent the situations from theory. We observe them in what users ask and what LLMs answer.

The Eight Information Situations and Their Function Anchors

Understand the Problem: describe_symptoms · contextualize_impact
Understand the Term: define_term · clarify_misconceptions
Discover the Topic: give_orientation · differentiate_terms
Explore Solutions: list_solution_types · describe_each_type
Compare Options: comparison_table · pros_cons_overview
Clarify Criteria: list_criteria · quality_indicators
Compare Providers: provider_fit_guide · show_credentials
Ready to Decide: show_requirements · offer_consultation
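As a data structure, the situation list above can be sketched as a plain lookup table. The situation names and function anchors are copied from the list; the dictionary layout and the helper name `situation_for` are illustrative assumptions, not the production schema.

```python
# Situation names and function anchors are taken from the list above.
# The dict layout and helper name are assumptions made for illustration.
SITUATION_FUNCTIONS = {
    "Understand the Problem": ("describe_symptoms", "contextualize_impact"),
    "Understand the Term": ("define_term", "clarify_misconceptions"),
    "Discover the Topic": ("give_orientation", "differentiate_terms"),
    "Explore Solutions": ("list_solution_types", "describe_each_type"),
    "Compare Options": ("comparison_table", "pros_cons_overview"),
    "Clarify Criteria": ("list_criteria", "quality_indicators"),
    "Compare Providers": ("provider_fit_guide", "show_credentials"),
    "Ready to Decide": ("show_requirements", "offer_consultation"),
}


def situation_for(function_name):
    """Return the situation a function anchors, or None if it is not listed."""
    for situation, functions in SITUATION_FUNCTIONS.items():
        if function_name in functions:
            return situation
    return None
```

Calling `situation_for("comparison_table")` returns `"Compare Options"`, which is the lookup a page audit would need when mapping detected functions back to situations.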

The point of this granularity is not academic. Every named function corresponds to a structural pattern an LLM can recognize and re-emit. define_term is a sentence beginning with the noun and ending with a precise predicate. comparison_table is a table with column headers naming the comparison dimension. list_criteria is an enumerated list where each item is a criterion plus its operationalization. These are not abstractions. They are detectable patterns in HTML, and they are exactly what models reach for when constructing answers.
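To make "detectable patterns in HTML" concrete, here is a rough sketch of what detection rules for three of the named functions might look like, using only the Python standard library. The thresholds, the regular expression, and the class and function names are assumptions for illustration; the actual detection rules are not published in this episode and are certainly more careful than this. The sketch also assumes paragraphs are explicitly closed with `</p>`.

```python
import re
from html.parser import HTMLParser


class FunctionDetector(HTMLParser):
    """Collects crude structural signals for three of the named functions."""

    def __init__(self):
        super().__init__()
        self.th_count = 0            # header cells seen inside any <table>
        self.ol_items = 0            # <li> elements seen inside any <ol>
        self.first_paragraph = None  # text of the first closed <p>
        self._in_table = False
        self._ol_depth = 0
        self._in_p = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._in_table = True
        elif tag == "th" and self._in_table:
            self.th_count += 1
        elif tag == "ol":
            self._ol_depth += 1
        elif tag == "li" and self._ol_depth > 0:
            self.ol_items += 1
        elif tag == "p" and self.first_paragraph is None:
            self._in_p = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "table":
            self._in_table = False
        elif tag == "ol" and self._ol_depth > 0:
            self._ol_depth -= 1
        elif tag == "p" and self._in_p:
            self._in_p = False
            self.first_paragraph = "".join(self._buffer).strip()

    def handle_data(self, data):
        if self._in_p:
            self._buffer.append(data)


def detect_functions(html):
    """Return the set of function names whose heuristic pattern is present."""
    parser = FunctionDetector()
    parser.feed(html)
    parser.close()
    found = set()
    if parser.th_count >= 2:   # assumed threshold: at least two named columns
        found.add("comparison_table")
    if parser.ol_items >= 2:   # assumed threshold: at least two enumerated items
        found.add("list_criteria")
    first = parser.first_paragraph or ""
    first_sentence = re.split(r"(?<=[.!?])\s+", first, maxsplit=1)[0]
    # Assumed proxy for "noun first, predicate after": "<Term> is/are ...".
    if re.match(r"^[A-Z][\w\s-]{0,60}? (is|are) \S", first_sentence):
        found.add("define_term")
    return found
```

The heuristics will misfire on real pages (a first sentence starting "This is..." passes the definition test, for instance), so treat the output as a starting signal for review rather than a verdict.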

The function inventory

From three years of pattern observation and eight months of focused LLM-grounding measurement, we maintain an evolving inventory of content functions. The current production set contains thirty-one named functions, each with operational detection rules and empirical evidence of its effect on citation rate.

A function is not a topic. It is what the page does when read. The same topic — say, pricing structure — can be addressed via at least four distinct functions: a cost comparison table, a list of cost drivers, a calculator, or a worked example. Each performs differently for different prompts. None is universally superior. The architectural decision is which function to assign to which page, given the information situation that page is meant to serve.

The function inventory has a corollary that surprises most content teams when they first see it: more is not better. A page that attempts to perform six functions simultaneously will perform none of them well, will fail every individual function-detection test, and will not become source for any prompt. We see this repeatedly in the wild. The instinct to make pages comprehensive is the instinct that destroys semantic coverage.

The principle

One page, one situation, one function-cluster. The semantic coverage of a site is the sum of focused pages, not the sum of comprehensive ones. A site of 150 precise pages outperforms a site of 30 long ones — measurably, at the citation level, across all four major providers.
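The principle can be expressed as a simple audit check: given the functions detected on a page, does the page stay inside one situation? The sketch below is an assumption about how such a check could be written. The helper names and the two-row `SUBSET` mapping are hypothetical, and the rows in `SUBSET` are copied from the situation list earlier in this episode.

```python
def situations_covered(page_functions, situation_functions):
    """Return the situations that share at least one function with the page."""
    return {
        situation
        for situation, functions in situation_functions.items()
        if set(functions) & set(page_functions)
    }


def is_focused(page_functions, situation_functions):
    """True when the page's recognised functions fall into exactly one situation."""
    return len(situations_covered(page_functions, situation_functions)) == 1


# Two rows from the situation list, enough to exercise the check.
SUBSET = {
    "Understand the Term": ("define_term", "clarify_misconceptions"),
    "Compare Options": ("comparison_table", "pros_cons_overview"),
}
```

A page with `["define_term", "clarify_misconceptions"]` passes; a page with `["define_term", "comparison_table"]` spans two situations and fails, which is the case the principle warns against.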

The silent citation pattern

In mid-2026, we observed a phenomenon at Google Gemini that quietly reframes the entire content-architecture question. Over a sustained measurement of roughly twenty-two thousand executions in a single B2B category, Gemini cited the client's URLs as grounding source approximately two thousand eight hundred times across nearly five hundred distinct URLs. Thirty-eight percent of the client's glossary pages were retrieved as source at least once. By any reasonable measure, the content was working.

And yet, in roughly sixty-one percent of those same answers, the brand was not named in the response text at all. The model used the content. It did not attribute to the source.

~2,800 grounding citations: on Gemini across a single B2B client domain, multi-week measurement window
~490 distinct URLs cited: across glossary, methodology, and product-explanation pages
61% silent share: of grounded answers in which the brand is not named in the response text

This is the silent citation pattern, and it is the most important strategic finding of our last twelve months. It means: content can be absorbed and used as authoritative source by a major LLM without the brand ever appearing in the answer the user sees. The content was good enough to ground the response. It was not anchored enough to brand the response.
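The silent share is a straightforward ratio: of the answers grounded on the client's URLs, the fraction whose response text never names the brand. A minimal sketch follows, assuming each answer is available as response text plus a list of cited URLs. The `Answer` record, the function names, and the case-insensitive substring match on the brand name are illustrative assumptions, not the measurement pipeline itself.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Answer:
    text: str          # response text shown to the user
    cited_urls: list   # grounding URLs reported for the answer


def on_domain(url, domain):
    """True if the URL's host is the domain itself or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return host == domain or host.endswith("." + domain)


def silent_share(answers, client_domain, brand_name):
    """Share of client-grounded answers whose text never names the brand."""
    grounded = [
        a for a in answers
        if any(on_domain(url, client_domain) for url in a.cited_urls)
    ]
    if not grounded:
        return 0.0
    silent = [a for a in grounded if brand_name.lower() not in a.text.lower()]
    return len(silent) / len(grounded)
```

A plain substring match will miss brand variants and abbreviations, so a production version would need an alias list per brand.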

The implication for architecture is sharp. Semantic coverage produces grounding. Branded-entity anchoring produces visible attribution. They are two different layers, and both must be designed for. A glossary page can be authoritative without being branded. A branded page can be visible without being authoritative. The art is to be both — and the technique is to embed the brand as a semantic anchor inside the function, not as a footer decoration outside it.

Concretely: a definition page that opens with a category-level definition gives the model a definition. A definition page that opens with the same category-level definition bound to the brand as the entity performing it gives the model a definition and a branded entity to attribute to. Same function, different anchoring. Different citation behaviour downstream. The technique is not to add brand mentions; it is to make the brand the grammatical subject of the function the page performs.

This also reframes the open question. In the thirty-nine percent of grounded answers that do attribute, what is different about the pages being cited? Our working hypothesis, supported by emerging pattern analysis: those pages bind their function to their entity inside the first paragraph. The pages that remain silent do not. The asymmetry between sixty-one percent silent and thirty-nine percent attributed is, in this reading, a measurable consequence of anchoring discipline — not of model behaviour alone.
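The working hypothesis is testable with very little machinery: tag each cited page by whether the brand appears in its first paragraph, then compare attribution rates between the two groups. The sketch below assumes page text with paragraphs separated by blank lines and a simple substring match; both function names are hypothetical, and presence in the first paragraph is only a proxy for the brand being the grammatical subject.

```python
def binds_brand_early(page_text, brand_name):
    """True if the brand is named in the first non-empty paragraph."""
    paragraphs = [p.strip() for p in page_text.split("\n\n") if p.strip()]
    if not paragraphs:
        return False
    return brand_name.lower() in paragraphs[0].lower()


def attribution_rate_by_anchoring(records):
    """Compare attribution rates for early-bound and unbound pages.

    records: iterable of (bound_early, attributed) boolean pairs,
    one pair per grounded answer.
    Returns {True: rate, False: rate}; a rate is None for an empty group.
    """
    totals = {True: 0, False: 0}
    hits = {True: 0, False: 0}
    for bound_early, attributed in records:
        totals[bound_early] += 1
        hits[bound_early] += int(attributed)
    return {k: (hits[k] / totals[k] if totals[k] else None) for k in totals}
```

If the hypothesis holds, the rate for the `True` group should sit clearly above the rate for the `False` group; if the two are close, anchoring position is not the explanation.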

Precision over density

The second architectural axis after coverage is precision. Coverage asks: does the page perform the required function. Precision asks: does it perform it in the form the model can reuse.

Models reuse what is quotable. They quote what is dense, specific, and traceable. Princeton research published in late 2025 — and replicated in our own client-side measurement — shows that proprietary statistics with named sources increase citation rate by approximately a factor of three over the equivalent claim made without numbers or attribution. The same finding holds for date markers, for explicit unit specifications, and for named methodology references.

The practical translation is uncomfortable for marketing-trained writers, because it inverts the instinct toward smooth, flowing copy. The first sentence of a glossary page should not invite the reader in. It should answer the question. The second sentence should anchor the answer with a number, a year, or a named reference. The third sentence may, finally, address context — but only if the context is itself dense with information the model can re-emit.
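The first-sentence and second-sentence rules lend themselves to a rough editorial lint. The sketch below checks only two things: whether the first sentence contains a defining verb, and whether the second sentence contains a digit (a number or a year). Named references are not detected. The verb list, the digit test, and the function names are assumptions chosen for illustration.

```python
import re

# Assumed proxy for "anchored": any digit, which covers numbers and years.
ANCHOR_PATTERN = re.compile(r"\d")
# Assumed proxy for "answers the question": a defining verb is present.
DEFINING_VERB = re.compile(r"\b(is|are|means|refers to)\b")


def split_sentences(paragraph):
    """Split on sentence-ending punctuation followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def opening_report(paragraph):
    """Report whether the first two sentences follow the pattern described above."""
    sentences = split_sentences(paragraph)
    first = sentences[0] if sentences else ""
    second = sentences[1] if len(sentences) > 1 else ""
    return {
        "first_sentence_defines": bool(DEFINING_VERB.search(first)),
        "second_sentence_anchored": bool(ANCHOR_PATTERN.search(second)),
    }
```

A lint like this cannot judge whether the definition is precise or the number is sourced; it only flags openings that skip the pattern entirely.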

| Sentence pattern | Common form | Citation-optimized form |
| --- | --- | --- |
| Opening | Atmospheric framing that invites the reader in | A first-sentence definition that names the concept and predicates it precisely, with a quantified anchor where one exists |
| Definition | Soft introduction acknowledging that the term has several uses | A specific operational definition, ideally tied to a named legal or industry frame, with the variants enumerated rather than gestured at |
| Comparison | Generalized statement of advantages without a comparison structure | A direct, dimensional comparison: each row a named dimension, each cell a numeric or categorical fact, with the relevant unit explicit |
| Criterion | Adjectival recommendation ("choose a reliable provider") | An enumerated list of criteria, each item paired with its operationalization: what to look for, how to verify it, what range or threshold matters |

None of this is stylistic preference. The right-hand column is what models reach for when constructing answers. The left-hand column is what models discard during compression. Style as it has been taught for forty years was optimized for human attention. The new optimization target is model retention. The two are not the same, and pretending otherwise is the most expensive habit in B2B content right now.

Architecture, explicit

The architectural answer to the LLM-authority problem has three layers, not one. Most content strategies address one and call it sufficient. Few address all three.

Coverage layer. One page per situation. One function-cluster per page. No page attempting more than it can perform precisely. The site as a portfolio of focused pages, mapped explicitly to the eight information situations, with named functions per page and detection rules for each.

Precision layer. First-sentence definitions. Numbers with sources. Date markers. Tables with named columns. Lists with criteria, not slogans. Sentence patterns the model can reuse without paraphrasing. The cost of this layer is editorial — it requires writers who will sacrifice flow for density. The reward is citation behaviour that compounds over months.

Anchoring layer. The brand as a semantic entity inside the function, not as a logo outside it. Named provider in the comparison row. Named methodology in the criterion list. Named case in the worked example. Each anchoring instance gives the model a branded reference it can — and over time, will — attribute to in answers.

The synthesis

Coverage produces eligibility. Precision produces retention. Anchoring produces attribution. A site that neglects any one of the three becomes invisible in one of three ways: never retrieved, retrieved but discarded, or used but unnamed. A site that does all three becomes the source LLMs reach for and the brand they name.

The closing argument

Five episodes. The first observed the closing window. The second diagnosed the measurement failure. The third decomposed the invisible journey. The fourth specified the measurement instrument. The fifth — this one — closes with the construction principle.

None of this is theory we are testing. Every claim in this episode is observable in production data across multiple clients, measured biweekly, traced URL-by-URL through grounding-source extraction at four providers. The function inventory is operational. The coverage measurement is operational. The silent-citation pattern was discovered, not predicted, and the architectural correction is in active deployment.

What the series has documented, episode by episode, is the operating instruction set for a category of content work that did not exist three years ago and that most agencies still describe in the vocabulary of the work that preceded it. The vocabulary that fits is not "SEO for AI." It is content architecture for absorption. Coverage, precision, anchoring. Three layers, all measurable, all buildable, all observable in their effects on citation rate.

The brands that build for absorption — explicitly, deliberately, with measurement closing the loop — will spend the next two years quietly accumulating presence in the LLM answers that increasingly precede every B2B purchase decision. The brands that do not will spend the same two years writing content that compresses to nothing.

That is the window. The architecture is no longer a guess. The closing argument of this series is also its operating instruction: build for compression survival, measure what survives, repeat.