Programmatic SEO: a practitioner's playbook

Programmatic SEO is building a large set of pages from a single template and a structured dataset, so one piece of engineering captures hundreds or thousands of long-tail queries. Done well, it is the highest-leverage growth channel I know. Done badly, it is a thin-content factory that drags the rest of your site down with it. The difference is almost never the code. It is judgement about what is worth building and the discipline to keep quality high at scale.

What it is, and when it is worth it

The mechanics are simple: a repeatable query pattern, a dataset that fills in the variables, and a template that renders one good page per row. Think "best [city] coworking spaces", "[product] vs [competitor]", "[job title] salary in [country]". Each combination is a real search someone types, and there are too many to write by hand. That is the premise: scale where hand-writing is uneconomic but demand is genuine.

It is worth it when three things are true at once: real, distributed search demand across the pattern (not one or two head terms with a long tail of zeroes), differentiated data for each page that a visitor cannot get more easily elsewhere, and intent that maps to your business so ranking feeds pipeline rather than vanity traffic. If any one is missing, programmatic SEO is the wrong tool. A pattern with demand but no unique data produces doorway pages; a pattern with great data but no demand produces a beautifully crafted archive nobody searches for.

Picking a scalable pattern with real intent

This decision determines whether the whole project earns or fails, and it happens before a line of code is written. I enumerate candidate patterns and validate each against demand and intent, not the other way round. The trap is falling in love with a pattern because it produces a satisfying number of URLs, then finding most of those URLs target searches with no volume.

So I pull search volume for a representative sample of the combinations, not the head term. If "[product] integrations" gets 50,000 searches a month but the specific tool combinations are zero for 90% of my list, the pattern justifies a few dozen pages, not ten thousand. I would rather ship 200 pages that each have demand than 10,000 where most are dead weight, because dead pages dilute perceived quality and waste crawl budget that should reach the pages that earn.
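As a rough illustration of that sampling check, here is a minimal sketch in Python. The KeywordSample type, the 10-searches-a-month floor and the example queries are all assumptions made for illustration; the volumes would come from whichever keyword tool you already trust.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeywordSample:
    query: str
    monthly_searches: int  # hypothetical figure exported from a keyword tool


def demand_coverage(samples: list[KeywordSample], min_searches: int = 10) -> float:
    """Share of sampled combinations that meet the minimum monthly volume."""
    if not samples:
        return 0.0
    live = sum(1 for s in samples if s.monthly_searches >= min_searches)
    return live / len(samples)


def pages_worth_building(total_combinations: int, coverage: float) -> int:
    """Scale the sampled coverage up to the full list of combinations."""
    return round(total_combinations * coverage)


# Illustrative sample only: two combinations with demand, two with none.
sample = [
    KeywordSample("acme slack integration", 320),
    KeywordSample("acme trello integration", 40),
    KeywordSample("acme foobar integration", 0),
    KeywordSample("acme bazqux integration", 0),
]
coverage = demand_coverage(sample)
print(f"{coverage:.0%} of sampled combinations have demand")
print(pages_worth_building(10_000, coverage), "pages justified out of 10,000")
```

A real sample should be far larger than four rows and drawn at random from the full list of combinations, otherwise the coverage estimate says little about the whole pattern.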

Intent is the second filter. A modifier like "vs", "alternatives", "pricing" or "for [use case]" signals someone close to a decision; a bare informational pattern may get traffic that never converts. I weight patterns by how directly the searcher's goal maps to something my client sells. The best programmatic projects sit one click from a transaction.
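One way to make that weighting explicit is to score each pattern by volume multiplied by an intent weight. The sketch below is only an assumption about how such a score could look: the weights, the default for informational patterns and the example volumes are invented for illustration and would need calibrating against real conversion data.

```python
# Hypothetical weights: how close each modifier puts the searcher to a decision.
INTENT_WEIGHTS = {"vs": 1.0, "alternatives": 0.9, "pricing": 0.9, "for": 0.7}
DEFAULT_WEIGHT = 0.2  # assumed weight for bare informational patterns


def intent_weight(pattern: str) -> float:
    """Return the highest weight of any known modifier word in the pattern."""
    words = pattern.lower().split()
    return max(
        (INTENT_WEIGHTS[w] for w in words if w in INTENT_WEIGHTS),
        default=DEFAULT_WEIGHT,
    )


def rank_patterns(volumes: dict[str, int]) -> list[tuple[str, float]]:
    """Sort patterns by monthly volume scaled by intent, best first."""
    scored = [(p, vol * intent_weight(p)) for p, vol in volumes.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Illustrative volumes only.
candidates = {
    "[product] vs [competitor]": 4_000,
    "what is [topic]": 12_000,
    "[product] pricing": 1_500,
}
for pattern, score in rank_patterns(candidates):
    print(f"{score:>8.0f}  {pattern}")
```

With these made-up numbers the "vs" pattern outranks the informational one despite a third of the raw volume, which is the point of the second filter.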

Data sources: the part that makes or breaks it

Unique value per page comes from data, and sourcing it honestly is the unglamorous core of the work. The options, roughly in order of how defensible they are:

  • Proprietary or first-party data. Usage statistics, aggregated anonymised customer data, your own pricing and inventory. The strongest position, because nobody can replicate it, and why marketplaces and SaaS products tend to win at programmatic SEO.
  • Public datasets, transformed. Government data, open APIs, registries. Available to everyone, so the value is in how you clean, combine and enrich them. Republishing a raw dataset is not a page; turning three datasets into a single answer nobody else has assembled is.
  • Computed or derived data. Comparisons, rankings and conversions you calculate from inputs. A "[salary] after tax in [country]" figure is generated but genuinely useful because it saves the visitor the work.
  • Editorial enrichment at scale. Even templated pages can carry a unique paragraph or curated list per row, often the line between "passes the quality bar" and "looks auto-generated".

Whatever the source, the data has to be accurate and current. A page showing a stale price or a defunct competitor is worse than no page, because it erodes trust on exactly the templates you are betting on. I build the refresh pipeline before the template, not after.
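A refresh pipeline needs some way to tell which rows are too old to publish. The following is a minimal sketch under stated assumptions: each row carries a last_verified timestamp (a hypothetical field name), and 30 days is an arbitrary freshness window that would differ by dataset.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=30)  # assumed freshness window; tune per dataset


def is_stale(last_verified: datetime, now: datetime,
             max_age: timedelta = MAX_AGE) -> bool:
    """True when the row has not been re-checked within the freshness window."""
    return now - last_verified > max_age


def split_by_freshness(rows: list[dict], now: datetime) -> tuple[list[dict], list[dict]]:
    """Separate rows that are safe to render from rows that need a refresh."""
    fresh, stale = [], []
    for row in rows:
        (stale if is_stale(row["last_verified"], now) else fresh).append(row)
    return fresh, stale


# Illustrative rows; "slug" and "last_verified" are hypothetical field names.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
rows = [
    {"slug": "acme-pricing", "last_verified": datetime(2024, 5, 25, tzinfo=timezone.utc)},
    {"slug": "initech-pricing", "last_verified": datetime(2024, 3, 1, tzinfo=timezone.utc)},
]
fresh, stale = split_by_freshness(rows, now)
print("render:", [r["slug"] for r in fresh])
print("refresh first:", [r["slug"] for r in stale])
```

Stale rows go back to the refresh queue rather than straight to the template, so an out-of-date price never reaches a live page.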

Building a template that is useful, not thin

The template is where good intentions go to die. The failure mode is a page where 90% of the content is identical boilerplate and 10% is a swapped-in variable, the textbook definition of thin, duplicate content. Google's systems are good at detecting near-duplicates across a pattern, and once they do, the whole template gets devalued, not just the weak pages.

My rule: each page must answer the specific query better than a generic page could, and a meaningful share of the visible content must be unique to that row. Practically, that means leading with the data, not the boilerplate. The comparison table, the figures, the curated list for this combination should be above the fold and substantial. Surrounding copy is fine, but it cannot be the bulk of the page or it reads as filler around a thin core.
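To put a number on how much of a page is unique to its row, one option is to compare rendered pages pairwise with word shingles and Jaccard similarity. This is a generic near-duplicate heuristic, not a description of how Google measures duplication, and the shingle size of three words is an assumption.

```python
def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Break text into overlapping runs of `size` lowercase words."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: str, b: str, size: int = 3) -> float:
    """Similarity from 0.0 (nothing shared) to 1.0 (identical shingle sets)."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0  # two empty pages are trivially identical
    return len(sa & sb) / len(sa | sb)


# Two pages that differ only in a swapped-in city name.
page_a = "best coworking spaces in Leeds ranked by price"
page_b = "best coworking spaces in York ranked by price"
print(f"similarity: {jaccard(page_a, page_b):.2f}")
```

Run over the visible text of a sample of pages from one template, consistently high scores suggest the boilerplate is outweighing the per-row data.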

I also build the template to fail gracefully. A combination with no data should not render a near-empty page that returns 200 OK; that is a soft 404 waiting to happen. Either it gets no URL, or it returns a proper status and stays out of the index. Let the data decide which pages deserve to exist. A genuinely useful page also tends to earn structured data naturally: product, FAQ or dataset markup that reflects what is actually on it. My structured-data playbook covers how to do that without tripping over Google's guidelines.
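Letting the data decide can be expressed as a small gate that runs before rendering. The sketch below assumes you can count the unique data points available for a row; the Decision names and the threshold of five are hypothetical and would depend on the template.

```python
from enum import Enum


class Decision(Enum):
    INDEXABLE = "serve 200 and allow indexing"
    NOINDEX = "serve 200 with a noindex robots directive"
    NOT_FOUND = "serve 404 and generate no URL"


def decide(unique_data_points: int, min_indexable: int = 5) -> Decision:
    """Map the amount of row-specific data to how the URL should behave."""
    if unique_data_points <= 0:
        return Decision.NOT_FOUND  # nothing to show: never a 200 on an empty page
    if unique_data_points < min_indexable:
        return Decision.NOINDEX  # some value for visitors, not enough to index
    return Decision.INDEXABLE


for count in (0, 2, 8):
    print(count, "->", decide(count).value)
```

Putting the check in one function means the router, the sitemap generator and the internal-linking code can all ask the same question and get the same answer.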

Internal linking and architecture at scale

You can render ten thousand perfect pages and have most of them earn nothing because nothing links to them. PageRank flows through internal links, and orphan pages with no links pointing in are starved no matter how good they are. At programmatic scale this is not a tidy-up at the end; it is part of the template design.

I think in hubs and spokes. The individual pages are spokes; they need curated hub pages that link to them in logical clusters, and the spokes should cross-link to their closest neighbours ("[product] vs A" links to "[product] vs B"). This helps discovery and builds topical relationships Google reads as expertise rather than a flat dump of similar URLs. The same discipline underpins ordinary content too, which I cover in topic clusters and internal linking; programmatic SEO is that pattern automated at scale. Keep click depth shallow, with the money pages of any template within three clicks of a strong hub, and build the linking as part of rendering so it cannot be forgotten on the ten-thousandth page.
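Click depth and orphans are easy to audit once the internal links are in a graph. The following sketch assumes a simple dictionary mapping each URL to the URLs it links to; it uses a breadth-first search from the hub, and the URLs and the three-click limit in the example mirror the rule of thumb above rather than any fixed standard.

```python
from collections import deque


def click_depths(links: dict[str, list[str]], start: str) -> dict[str, int]:
    """Breadth-first search: fewest clicks from `start` to each reachable page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def audit(links: dict[str, list[str]], start: str,
          max_depth: int = 3) -> tuple[list[str], list[str]]:
    """Return (pages unreachable from the hub, pages deeper than max_depth)."""
    depths = click_depths(links, start)
    all_pages = set(links) | {t for targets in links.values() for t in targets}
    orphans = sorted(all_pages - set(depths))
    too_deep = sorted(p for p, d in depths.items() if d > max_depth)
    return orphans, too_deep


# Hypothetical link graph: "/acme-vs-c" links out, but nothing links in to it.
links = {
    "/hub": ["/acme-vs-a", "/acme-vs-b"],
    "/acme-vs-a": ["/acme-vs-b"],
    "/acme-vs-b": ["/acme-vs-a"],
    "/acme-vs-c": ["/acme-vs-a"],
}
orphans, too_deep = audit(links, "/hub")
print("orphans:", orphans)
print("too deep:", too_deep)
```

Running an audit like this in the build, and failing the build when either list is non-empty, is one way to keep linking from being forgotten as the set grows.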

Managing crawl budget and the indexation trap

The moment you ship thousands of URLs, indexation stops being automatic and becomes something you actively manage. The most common outcome of a careless launch is the Search Console report filling with "Discovered, currently not indexed" and "Crawled, currently not indexed", and the two mean different things.

"Discovered, currently not indexed" usually means Google found the URL but has not yet crawled it. That is a crawl-budget signal: you created more URLs than Google judges worth its attention, often because pages are slow to serve or the site has not earned the authority to justify that crawl volume. "Crawled, currently not indexed" is harsher: Google crawled the page and decided it was not worth indexing, almost always a quality verdict on thin or duplicate content. A wall of the latter across a template means the template itself is the problem, and adding pages makes it worse.

The levers I reach for, in order: ship fewer, better pages so average quality stays high; keep XML sitemaps limited to canonical, indexable, 200 URLs so discovery effort is not wasted; make pages fast to serve so crawling is cheap; and keep low-value combinations out of the index with noindex rather than letting them dilute the pattern. Phasing the launch helps too: release a strong first tranche, confirm it indexes and ranks, then expand, rather than dumping the entire set on day one and hoping.
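The sitemap rule in particular is simple to enforce in code. This is a minimal sketch using Python's standard xml.etree.ElementTree; the Page record and its fields are assumptions about what a crawl or build step might already know about each URL.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


@dataclass(frozen=True)
class Page:
    url: str
    status: int      # HTTP status the URL returns
    canonical: str   # URL named in the page's rel=canonical
    noindex: bool    # True if the page carries a noindex directive


def sitemap_eligible(page: Page) -> bool:
    """Only self-canonical, indexable, 200 URLs belong in the sitemap."""
    return page.status == 200 and not page.noindex and page.canonical == page.url


def build_sitemap(pages: list[Page]) -> str:
    """Serialise the eligible pages as a sitemap <urlset> document."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        if sitemap_eligible(page):
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = page.url
    return ET.tostring(urlset, encoding="unicode")


# Illustrative pages: only the first passes all three checks.
pages = [
    Page("https://example.com/a", 200, "https://example.com/a", False),
    Page("https://example.com/b", 200, "https://example.com/a", False),  # canonicalised away
    Page("https://example.com/c", 200, "https://example.com/c", True),   # noindex
    Page("https://example.com/d", 404, "https://example.com/d", False),  # not a 200
]
print(build_sitemap(pages))
```

For a phased launch, the same filter can be applied to the current tranche only, so the sitemap grows as each batch proves it indexes.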

Quality guardrails against doorway pages

Programmatic SEO sits one bad decision away from violating Google's spam policies on scaled content abuse and doorway pages. Google's March 2024 spam policy update added scaled content abuse, meaning many pages generated primarily to manipulate search rankings rather than to help users, which is precisely what a thin programmatic build is. So I treat a few guardrails as non-negotiable.

  • Every page must stand on its own. If a reasonable person landing from search finds a unique, satisfying answer, it earns its place. If they feel they hit a near-empty template, it does not.
  • No page exists only to funnel. Doorway pages whose only purpose is to capture a query and push the visitor elsewhere are the classic penalty trigger. Each page must be a destination, not a turnstile.
  • Set a minimum data threshold per page and enforce it in code, so a row without enough unique content simply does not produce an indexable URL.
  • Sample and review by hand. I read a random sample across the pattern and ask: would I have written this page deliberately? If the honest answer is no, the template needs work before it scales.
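Drawing the review sample is worth scripting so that it is genuinely random rather than the pages you happen to remember. A minimal sketch, assuming you have the full list of generated URLs; the sample size of 25 and the fixed seed are arbitrary choices, and the URLs are placeholders.

```python
import random


def review_sample(urls: list[str], k: int = 25, seed: int = 0) -> list[str]:
    """Pick up to k distinct URLs at random; the seed makes the draw repeatable."""
    rng = random.Random(seed)
    pool = sorted(set(urls))  # de-duplicate and fix the order before sampling
    return rng.sample(pool, min(k, len(pool)))


# Hypothetical URL list for one template.
all_urls = [f"https://example.com/compare/acme-vs-{n}" for n in range(1, 1001)]
for url in review_sample(all_urls, k=5, seed=42):
    print(url)
```

Changing the seed for each review round gives a fresh set of pages while keeping any single round reproducible for a second reviewer.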

The mindset that keeps you safe is simple: build the pages you would build if Google did not exist, then scale them. Programmatic SEO is a way to produce genuinely useful pages efficiently, not a loophole for producing useless ones cheaply. The teams that win with it would be proud to show any single page to a sceptical user, multiplied a thousand times.

Where this fits in my work

Building and shipping programmatic templates that actually earn (picking the pattern, sourcing the data, designing the architecture and managing indexation) is core to how I grow sites. You can request my services, see how I work as a technical SEO consultant or an AI-growth consultant, or get in touch about a programmatic project. Related reading: topic clusters and internal linking, and a structured-data playbook for rich results and AI citations.
