Tracking parameters, sort orders, and filter combinations can multiply a single page into thousands of crawlable near-duplicates. Here's how to keep parameterized URLs from draining your crawl budget.
A URL parameter is anything that appears after a ? in a web address — ?utm_source=newsletter, ?sort=price_asc, ?color=blue&size=m. Parameters are useful: they let a single page template serve a tracking campaign, a sort order, or a filtered product list without creating a new physical page for every combination. The SEO problem starts when those combinations get crawled and indexed as if they were distinct pages, even though they show largely the same content as the URL without the parameter.
Three things go wrong when parameters multiply unchecked. First, crawl budget gets wasted — a site with a few hundred real pages can generate millions of theoretically reachable parameter combinations through faceted navigation (filter by color, then size, then brand, then price, in any order), and crawlers spend their limited visits on near-duplicates instead of your real content. Second, ranking signals split — if /shoes?color=red and /shoes?color=blue&color=red both get indexed separately, links and engagement that should consolidate on one canonical page get divided across several. Third, duplicate content dilutes relevance signals — Google has to do extra work to figure out which version is the "real" one, and sometimes guesses wrong.
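To make the scale of the crawl-budget problem concrete, here is a rough back-of-the-envelope count for a single category page. The facet names and the number of values per facet are invented for illustration, and the sketch assumes each facet is either unused or set to exactly one value.

```python
from itertools import combinations
from math import factorial, prod

# Hypothetical facet sizes: how many values each filter offers.
FACETS = {"color": 12, "size": 10, "brand": 40, "price": 6}


def count_filter_urls(facet_sizes, order_matters=True):
    """Count the distinct filtered URLs one category page can produce.

    Each facet is either left out or set to exactly one of its values.
    When order_matters is True, ?color=red&size=m and ?size=m&color=red
    are counted as two URLs, which is how a crawler sees them.
    """
    sizes = list(facet_sizes.values())
    total = 0
    for k in range(1, len(sizes) + 1):
        orderings = factorial(k) if order_matters else 1
        for chosen in combinations(sizes, k):
            # prod(chosen) = number of value choices for this set of facets
            total += prod(chosen) * orderings
    return total


per_category_ordered = count_filter_urls(FACETS)                       # 758812
per_category_unordered = count_filter_urls(FACETS, order_matters=False)  # 41040
```

Under these assumed numbers, one category page yields 41,040 filter combinations if parameter order is normalized and 758,812 if it is not. Multiply that by a few hundred categories and the total runs well into the millions.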
Not every parameter is a problem. It helps to sort them into categories:
Tracking parameters (utm_source, fbclid, gclid, ref): these never change page content, only attribution. They should always canonicalize to the clean URL.
Sort parameters (?sort=price_asc): same items, different order. These should almost always canonicalize to the unsorted version.
Filter parameters (?color=red, ?in_stock=true): genuinely different content (a subset of items), but usually too granular to be worth its own indexed page unless that specific filter combination has real standalone search demand.
Pagination parameters (?page=2): distinct content per page, generally worth letting search engines crawl. Each page should carry a canonical that points to itself, not to page 1.
Session parameters (?session_id=...): pure noise that should never be crawled or indexed.
The most common mistake is reaching for robots.txt as the first fix. Disallowing /*?* in robots.txt stops crawling, but if those parameterized URLs were already indexed, the block also prevents Google from re-crawling them to discover a noindex or canonical tag, so the stale parameterized version can linger in the index indefinitely with no description. The correct order of operations is the opposite:
Add <link rel="canonical" href="https://example.com/shoes" /> on every /shoes?... variant.
Build the canonical-tag logic into your page template once: every URL for the same underlying content should emit the identical canonical href, regardless of which parameters are present in the address bar.
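As a minimal sketch of what that template logic could look like, the helper below derives the canonical href from the requested URL. It assumes an allowlist approach in which only pagination survives into the canonical. The names KEEP_PARAMS, canonical_url, and canonical_link_tag are hypothetical, and the parameter list should be adapted to whatever your site actually uses.

```python
import html
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: pagination is the only parameter that changes the canonical.
# Tracking, sort, filter, and session parameters are all dropped.
KEEP_PARAMS = {"page"}


def canonical_url(url):
    """Return the canonical href for a requested URL.

    Every query parameter not listed in KEEP_PARAMS is removed, the
    remaining ones are sorted so parameter order cannot create
    variants, and any #fragment is discarded.
    """
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in KEEP_PARAMS
    )
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url):
    """Render the <link> element a page template would emit in <head>."""
    href = html.escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}" />'
```

With this in place, /shoes?utm_source=newsletter&sort=price_asc and /shoes?color=blue both emit a canonical of /shoes, while /shoes?page=2&sort=price_asc emits /shoes?page=2. An allowlist is used here instead of a blocklist so that newly introduced parameters are stripped by default.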
E-commerce and directory sites with multi-select filters are the worst-case scenario for parameter explosion. A practical rule: pick a small number of filter combinations that have genuine search demand (verified via keyword research, not guesswork) and make those crawlable, indexable, canonical pages in their own right — ideally with clean, static-looking URLs rather than query strings. Every other combination should canonicalize back to the unfiltered category page. This turns an infinite combinatorial problem into a deliberate, finite set of pages worth ranking.
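The rule above can be sketched as a lookup against a short allowlist. Everything in INDEXABLE_FILTERS below is a hypothetical example (the paths, facets, and values are invented), and the sketch assumes one selected value per facet.

```python
# Hypothetical allowlist: (category path, selected filters) pairs that keyword
# research showed to have real search demand, mapped to their clean URL path.
INDEXABLE_FILTERS = {
    ("/shoes", frozenset({("color", "red")})): "/shoes/red",
    ("/shoes", frozenset({("brand", "acme"), ("color", "black")})): "/shoes/acme/black",
}


def canonical_path(category_path, filters):
    """Choose the canonical path for a filtered category view.

    filters maps facet name to the single selected value. A frozenset
    key makes the lookup independent of the order in which filters
    were applied. Combinations that are not on the allowlist fall
    back to the unfiltered category page.
    """
    key = (category_path, frozenset(filters.items()))
    return INDEXABLE_FILTERS.get(key, category_path)


canonical_path("/shoes", {"color": "red"})                    # "/shoes/red"
canonical_path("/shoes", {"color": "black", "brand": "acme"})  # "/shoes/acme/black"
canonical_path("/shoes", {"color": "red", "size": "m"})        # "/shoes"
```

Keeping the allowlist as explicit data means the set of indexable filter pages stays finite and reviewable, which is the point of the rule.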
If you're restructuring URLs as part of this cleanup, run the before-and-after pairs through the 301 Redirect Generator so nothing 404s. If you're unsure when a 301 redirect is more appropriate than a canonical tag, double-check our canonical tags guide: the two solve overlapping but not identical problems.