A structured-data playbook for rich results and AI citations
Most structured data I audit is decoration. It validates, it sits in the page, and it earns nothing, because it was bolted on by a plugin that doesn't know what the page is actually about. This is the playbook I use to make markup load-bearing: the schema types that genuinely move the needle, the entity graph that ties a site together, and the discipline that keeps it from going stale or getting flagged as spam.
Structured data is a description, not a ranking lever
Let me set expectations before the tactics, because this is where most programmes go wrong. Structured data does not, on its own, improve your rankings. Google has said this repeatedly and my own experience backs it up: markup is an eligibility mechanism, not a boost. What it changes is how your result is presented and how confidently a machine can understand the entities on the page. Those two things have real downstream value, but neither is "rank higher because you added JSON-LD".
The value shows up in two places. First, rich results: review stars, FAQ accordions, breadcrumb trails, recipe cards, product pricing. These change your click-through rate and the visual real estate you occupy, which is a pipeline lever even when position is unchanged. Second, and increasingly the bigger story, machine comprehension. When a crawler, a Knowledge Graph pipeline, or a large language model is trying to work out what your organisation is, who wrote a piece, and how your products relate, explicit markup removes ambiguity that the model would otherwise have to guess at. Guessing is where citations get lost.
So the right mental model is this: you are writing a precise, machine-readable description of the things on your page, and you are connecting those things into a graph. Do that well and rich results follow where they're eligible. Do it badly and you've added weight and risk for nothing.
Which schema types actually earn rich results
There are hundreds of schema.org types. Google supports a couple of dozen for rich results, and within that set the practical hit rate varies enormously. I sort them into three buckets when I plan a deployment.
Types that reliably earn a visible feature, assuming the underlying content qualifies and the page is eligible:
- Product with Offer (and AggregateRating where you have genuine reviews) for pricing, availability and stars in commerce results.
- Recipe for the recipe card with cook time, ratings and the image carousel.
- BreadcrumbList for the breadcrumb trail that replaces the raw URL in the SERP.
- Event, JobPosting and VideoObject for their dedicated experiences and aggregator placements.
- HowTo and FAQPage, with a heavy caveat I'll come to: their SERP treatment has been sharply restricted.
Types that earn nothing visible but pull serious weight for comprehension: Organization, Person, WebSite (with SearchAction to describe on-site search; Google has retired the sitelinks search box, so don't expect a visible feature from it) and Article. You won't see a flashy SERP feature from Organization markup, but it is the spine of your entity graph and a primary input to your Knowledge Panel. I treat these as mandatory infrastructure regardless of rich-result eligibility.
Types people add out of habit that do essentially nothing for results: WebPage on its own, SiteNavigationElement, generic Thing, and the long tail of properties Google simply doesn't consume. Adding them isn't harmful in moderation, but don't mistake a green validator for an earned feature. Validation confirms the syntax is legal; it does not confirm Google will use it.
The FAQ and HowTo situation deserves a direct word, because it's the clearest recent lesson in not chasing features. Google rolled FAQ rich results back to a tiny set of authoritative government and health sites, and HowTo rich results were effectively retired on both desktop and mobile. If you built a content strategy around stuffing FAQ schema onto every page to grab SERP space, that ROI evaporated. I still use FAQPage where the page genuinely is a Q&A, because it remains a clean comprehension signal and is useful for AI grounding, but I no longer promise anyone an accordion in the results. That distinction, between what the markup is for and what it displays, is the whole game now.
JSON-LD is the only delivery format worth using
You can express structured data three ways: Microdata and RDFa inline in your HTML, or JSON-LD in a script block. Use JSON-LD. Google explicitly prefers it, and the practical reasons are decisive.
JSON-LD decouples your markup from your DOM. Your structured data lives in one <script type="application/ld+json"> block instead of being smeared across dozens of itemprop attributes that break the moment a designer reorders the template. It's far easier to generate server-side from your actual data model, easier to diff in code review, and easier to keep a single source of truth for. Inline Microdata, by contrast, rots: someone refactors the component, the attributes drift, and you don't notice because the page still looks fine to humans.
The one thing JSON-LD lets you do that you must resist is decoupling the markup from the visible content. Because the JSON isn't tied to the DOM, it's trivially easy to assert things in the markup that aren't on the page. That's exactly the behaviour Google's spam systems look for, and it's the subject of the next section. Use JSON-LD for the engineering benefits, but hold yourself to the rule that everything in the block is also visible to a human on the page.
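To make the "generate it server-side" point concrete, here is a minimal Python sketch of a helper that turns a dictionary from your data model into a JSON-LD script block. The function name `render_json_ld` and the sample organisation data are my own illustrations, not part of any framework; your templating layer may already offer an equivalent.

```python
import json


def render_json_ld(data: dict) -> str:
    """Serialise a dict as a JSON-LD script block for embedding in HTML."""
    payload = json.dumps(data, ensure_ascii=False, indent=2)
    # Escape "<" so a string value containing "</script>" cannot close the
    # block early; "\u003c" is valid JSON for the same character.
    payload = payload.replace("<", "\\u003c")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Hypothetical data; in a real template this comes from the data model
# that also renders the visible page.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "@id": "https://example.com/#organization",
    "name": "Example Labs",
    "url": "https://example.com/",
}

html_block = render_json_ld(organization)
```

Because the block is produced from a plain dictionary, it diffs cleanly in code review and never depends on the order of elements in the template.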
The connected entity graph is where the real work is
Here's the difference between markup that validates and markup that compounds. Most sites emit isolated islands: a Product here, an Article there, an Organization in the footer, none of them aware of each other. A connected graph uses @id to give each entity a stable, canonical identifier and then references those identifiers everywhere the entity reappears. You define your organisation once, then every article, product and breadcrumb points back to the same node instead of redefining it.
Two properties do the heavy lifting. @id is your internal join key: a canonical URI (I use the page URL plus a fragment, like https://example.com/#organization) that lets you say "the publisher of this article is that organisation, the one I already defined". sameAs is your external join key: an array of authoritative URLs that disambiguate the entity to the outside world. For a company that's your Wikipedia entry, Crunchbase, LinkedIn, and your verified social profiles. For a person it's their LinkedIn, ORCID, GitHub, or Wikidata entry. These are the breadcrumbs a Knowledge Graph pipeline follows to decide that your "Acme" is the same Acme it already knows about.
Practically, I build the graph around a handful of nodes that nearly every site needs:
- Organization (or LocalBusiness for a physical premises): name, logo, sameAs, contact points. Defined once, referenced as publisher everywhere.
- WebSite: the publisher link back to the Organization, plus the SearchAction if you support site search.
- Person for authors: real bylines with sameAs to their professional profiles, referenced as author on every Article they wrote.
- Article / BlogPosting: author and publisher by @id, plus datePublished and dateModified that match reality.
- BreadcrumbList: the real navigational path, item by item.
- Product with Offer, or FAQPage / HowTo where the page genuinely is one.
When these reference each other by @id, you've handed the crawler a small, internally consistent knowledge graph rather than a pile of unrelated assertions. That coherence is what lets a machine answer "who is behind this site, who wrote this, and is this the same entity I've seen elsewhere" without guessing.
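One way to check that a graph really is connected is to confirm that every @id reference points at a node the page defines. The Python sketch below is an illustration under my own assumptions: the helper names are hypothetical, and it treats any object containing only an "@id" key as a reference. Depending on your setup, a reference to a node defined on another page may be perfectly legitimate, so treat the output as a prompt to look, not a verdict.

```python
from typing import Any, Iterator


def iter_id_references(value: Any) -> Iterator[str]:
    """Yield the @id of every pure reference, i.e. a dict holding only "@id"."""
    if isinstance(value, dict):
        if set(value) == {"@id"}:
            yield value["@id"]
        else:
            for child in value.values():
                yield from iter_id_references(child)
    elif isinstance(value, list):
        for child in value:
            yield from iter_id_references(child)


def dangling_references(graph: list[dict]) -> set[str]:
    """Return referenced @id values that no node in the graph defines."""
    defined = {node["@id"] for node in graph if "@id" in node}
    referenced = {ref for node in graph for ref in iter_id_references(node)}
    return referenced - defined


# Hypothetical graph: the article points at an author node nobody defined.
graph = [
    {"@type": "Organization", "@id": "https://example.com/#organization"},
    {
        "@type": "Article",
        "@id": "https://example.com/insights/structured-data#article",
        "author": {"@id": "https://example.com/#luke"},
        "publisher": {"@id": "https://example.com/#organization"},
    },
]

missing = dangling_references(graph)
```

Here `missing` contains the author identifier, because the Person node was never emitted on this page.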
A worked snippet you can adapt
Here's the pattern in practice. Note the use of @graph to hold multiple connected nodes, the @id references rather than repeated definitions, and the fact that nothing here asserts anything that wouldn't also be on the page.
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "Organization",
      "@id": "https://example.com/#organization",
      "name": "Example Labs",
      "url": "https://example.com/",
      "logo": "https://example.com/logo.png",
      "sameAs": [
        "https://www.linkedin.com/company/example-labs",
        "https://www.crunchbase.com/organization/example-labs"
      ]
    },
    {
      "@type": "WebSite",
      "@id": "https://example.com/#website",
      "url": "https://example.com/",
      "name": "Example Labs",
      "publisher": { "@id": "https://example.com/#organization" }
    },
    {
      "@type": "Person",
      "@id": "https://example.com/#luke",
      "name": "Luke McLaughlin",
      "url": "https://example.com/about",
      "sameAs": [
        "https://www.linkedin.com/in/example",
        "https://github.com/example"
      ]
    },
    {
      "@type": "Article",
      "@id": "https://example.com/insights/structured-data#article",
      "headline": "A structured-data playbook for rich results",
      "datePublished": "2026-06-24",
      "dateModified": "2026-06-24",
      "author": { "@id": "https://example.com/#luke" },
      "publisher": { "@id": "https://example.com/#organization" },
      "isPartOf": { "@id": "https://example.com/#website" }
    },
    {
      "@type": "BreadcrumbList",
      "itemListElement": [
        { "@type": "ListItem", "position": 1, "name": "Home", "item": "https://example.com/" },
        { "@type": "ListItem", "position": 2, "name": "Insights", "item": "https://example.com/insights" },
        { "@type": "ListItem", "position": 3, "name": "Structured data playbook" }
      ]
    }
  ]
}
</script>
The last breadcrumb item deliberately omits the item property because it's the current page; Google accepts that for the final ListItem and falls back to the URL of the containing page. Small detail, but it's the kind of thing a careless template gets wrong. I generate blocks like this server-side from the same data that renders the visible page, so the byline, dates and breadcrumb in the JSON are literally the same variables that render the HTML. That's how you keep markup and content from diverging.
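As an illustration of generating the breadcrumb from the same data as the visible trail, here is a small Python sketch. The function name and the shape of the input, a list of (name, url) pairs running from home to the current page, are assumptions of mine; adapt them to however your templates hold the navigation path.

```python
def build_breadcrumb_list(trail: list[tuple[str, str]]) -> dict:
    """Build a BreadcrumbList from (name, url) pairs, home first."""
    items = []
    for position, (name, url) in enumerate(trail, start=1):
        item = {"@type": "ListItem", "position": position, "name": name}
        # Every crumb except the last carries a URL; the last one is the
        # current page, so its "item" is left out on purpose.
        if position < len(trail):
            item["item"] = url
        items.append(item)
    return {"@type": "BreadcrumbList", "itemListElement": items}


# Hypothetical trail; the same list would drive the visible breadcrumb.
trail = [
    ("Home", "https://example.com/"),
    ("Insights", "https://example.com/insights"),
    ("Structured data playbook", "https://example.com/insights/structured-data"),
]

breadcrumb = build_breadcrumb_list(trail)
```

Feeding the visible breadcrumb component and this function from one `trail` value means the two cannot drift apart.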
How clean markup gets penalised, and how to avoid it
Structured data can earn you a manual action under Google's spam policies, and the failure modes are predictable. I've cleaned up sites that lost their rich results overnight because they tripped one of these:
- Marked-up content that isn't visible to users. The cardinal sin. If your FAQPage markup contains questions that don't appear on the page, or your review stars come from reviews users can't see, that's a violation. Everything in the JSON-LD must be present and visible in the rendered page.
- Self-serving review markup. Adding AggregateRating to your own organisation or to a page you control, with ratings you wrote yourself, is against policy. Reviews must be genuine and, for many entity types, collected through a legitimate process rather than self-declared.
- Irrelevant or misleading types. Putting Recipe markup on a non-recipe page to grab the carousel, or marking a category page as a single Product. The type must honestly describe the content.
- Markup for hidden, blocked or gated content. If the marked-up content sits behind a tab, a paywall, or a robots.txt block, you're describing something the user and crawler can't reliably access.
The single rule that prevents almost all of this: markup describes what's on the page, never what you wish were on the page. If you find yourself wanting to assert something in JSON-LD that isn't visible, the fix is to add it to the page, not to the markup. This is also why I'm wary of third-party "schema injector" tools that layer markup on top of a page they don't control: they make divergence the default state.
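The "markup describes what's on the page" rule can be partly automated. Below is a rough Python sketch, using only the standard library, that reports FAQPage questions whose text does not appear anywhere in the page's text nodes. The class and function names are hypothetical, and the check is deliberately crude: it ignores CSS, so content hidden with styles still counts as present, and a question split across inline tags may be reported as missing.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes, skipping <script>, <style> and <template> contents."""

    SKIPPED = {"script", "style", "template"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def normalise(text: str) -> str:
    """Collapse whitespace and lowercase so comparisons are forgiving."""
    return " ".join(text.split()).lower()


def unseen_faq_questions(html: str, faq_page: dict) -> list[str]:
    """Return FAQPage question names that do not appear in the page text."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    visible = normalise(" ".join(parser.chunks))
    questions = [q.get("name", "") for q in faq_page.get("mainEntity", [])]
    return [q for q in questions if normalise(q) not in visible]
```

A non-empty result is the signal to fix the page or trim the markup, in that order of preference.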
Keeping markup in sync, and proving it stays that way
Drift is the quiet killer. The markup was correct on launch, then prices changed, an author left, a product went out of stock, articles got re-dated, and nobody updated the JSON-LD because it's invisible. Stale markup is worse than no markup: it's a credibility signal that's now lying.
The structural fix is to generate structured data from the same source of truth as the rendered content, never as a parallel hand-maintained artefact. If your price comes from the database, your Offer price must read the same field. If dateModified is in the markup, it should be wired to the CMS's actual last-modified timestamp, not a value someone typed once. When the markup and the page draw from one source, they can't disagree.
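Here is a minimal Python sketch of the single-source idea, assuming a hypothetical `ProductRecord` that stands in for a row from your product database. Both the visible price string and the Offer markup are derived from the same record, so a price change in the data flows to both at once.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class ProductRecord:
    """Hypothetical row from the product database."""
    name: str
    url: str
    price: Decimal
    currency: str
    in_stock: bool


def display_price(record: ProductRecord) -> str:
    """Price string rendered into the visible HTML."""
    return f"{record.price:.2f} {record.currency}"


def product_json_ld(record: ProductRecord) -> dict:
    """Product markup built from the same record as the visible page."""
    availability = "InStock" if record.in_stock else "OutOfStock"
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": record.name,
        "url": record.url,
        "offers": {
            "@type": "Offer",
            "price": f"{record.price:.2f}",
            "priceCurrency": record.currency,
            "availability": f"https://schema.org/{availability}",
        },
    }


record = ProductRecord(
    name="Example Widget",
    url="https://example.com/products/widget",
    price=Decimal("19.9"),
    currency="GBP",
    in_stock=False,
)
markup = product_json_ld(record)
shown = display_price(record)
```

The design choice is that neither function accepts a price argument of its own; the record is the only way in, which is what stops the two outputs from disagreeing.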
On top of that, monitor continuously rather than testing once:
- Rich Results Test for ad-hoc checks of a single URL, especially to confirm a new template renders eligible markup and to see the live, rendered JSON after JavaScript executes.
- The Schema Markup Validator (validator.schema.org) when you want pure schema.org conformance rather than Google's rich-result subset, useful for the comprehension-only types Google doesn't report on.
- Search Console enhancement reports as your real monitoring surface. These show valid, warning and error counts per type across your whole property over time, and the trend line is what matters. A sudden spike in errors usually means a template change broke the markup at scale, which is exactly the signal you can't get from spot-checking one URL.
- Validation in CI. On sites where it matters I add a build-time check that fetches representative page types and asserts the JSON-LD parses, contains the expected types, and that key fields are non-empty. It's the same instinct that drives the test suites in the local-first software I build: if something is load-bearing, a machine should verify it on every change rather than trusting that a human remembered.
Set a cadence: spot-check new templates with the Rich Results Test before they ship, watch the Search Console enhancement trend weekly, and treat any error spike as a regression to triage, not a nice-to-have.
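For the build-time check, a sketch along these lines is usually enough to catch a broken template. It is written in Python with the standard library only, and it takes the page HTML as a string, so fetching or rendering the representative pages is left to your own pipeline. The names `extract_nodes` and `check_markup` are mine, and the check assumes each node's @type is a single string rather than a list.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the raw text of every application/ld+json script block."""

    def __init__(self) -> None:
        super().__init__()
        self._in_json_ld = False
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_json_ld = False

    def handle_data(self, data):
        if self._in_json_ld:
            self.blocks[-1] += data


def extract_nodes(html: str) -> list[dict]:
    """Parse every JSON-LD block and flatten any @graph into a list of nodes."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    nodes: list[dict] = []
    for raw in parser.blocks:
        data = json.loads(raw)  # raises ValueError on malformed JSON
        for entry in data if isinstance(data, list) else [data]:
            nodes.extend(entry.get("@graph", [entry]))
    return nodes


def check_markup(html: str, required: dict[str, list[str]]) -> list[str]:
    """Return a list of problems; an empty list means the page passed."""
    nodes = extract_nodes(html)
    problems = []
    for schema_type, fields in required.items():
        matches = [n for n in nodes if n.get("@type") == schema_type]
        if not matches:
            problems.append(f"missing type: {schema_type}")
        for node in matches:
            for field in fields:
                if not node.get(field):
                    problems.append(f"{schema_type}: empty field {field}")
    return problems
```

In a CI job you would call `check_markup` once per representative template, with a `required` mapping such as `{"Article": ["headline", "author"]}`, and fail the build if the returned list is non-empty.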
Why this now matters for AI answers, not just blue links
The reason I've stopped treating structured data as a SERP-feature chase is that its highest-value use has shifted. Large language models grounding their answers in web content, whether through retrieval pipelines, AI Overviews, or assistant tools that browse, benefit from the same thing the Knowledge Graph always did: unambiguous, explicit statements about entities and their relationships.
When a model is assembling an answer and deciding what to cite, it's doing a comprehension and attribution task. Who published this? Who wrote it? Is this the authoritative source on the entity, or a thin aggregator? A page that clearly states its author as a real Person with sameAs links, its publisher as a defined Organization, and its facts in machine-readable form is dramatically easier to ground against and to attribute correctly than one where the model has to infer all of that from prose. This is the same problem I work on from the other side when building local-first AI tools and writing about how retrieval and grounding actually behave: clean, structured, self-consistent sources are the ones that survive the pipeline and get cited. Noisy or contradictory ones get dropped.
The connection between the two worlds is direct. The connected entity graph that helps Google build your Knowledge Panel is the same graph that helps an LLM decide your page is the authoritative answer worth quoting. sameAs disambiguation, consistent @id references, and honest markup that matches the page aren't separate tactics for two audiences. They're one discipline serving both.
So here's where I'd point your effort over the next year. Stop optimising for SERP ornaments that Google keeps quietly retiring, and start treating structured data as the canonical machine-readable description of your business and your content. Build the entity graph properly, wire it to your real data so it can't drift, monitor it like the production asset it is, and keep every assertion honest. Do that and you're well-positioned for whatever the next interface is, whether that's a rich result, a Knowledge Panel, or a sentence in an AI answer with your name on the citation.