TL;DR
Google discovers URLs from multiple sources: sitemaps, internal links, external backlinks, and its own historical index. Deleting a page doesn’t remove it from Google’s URL queue. Until Google crawls the deleted URL enough times and gets consistent 404 responses, it stays in the system. Old backlinks are the most common culprit, essentially telling Google “this URL matters” even when you’ve moved on.
The Conversation
Diane (owner of a 12-year-old recipe blog with 2,200 posts and $6K/month in ad revenue; runs WordPress, has basic SEO knowledge, and noticed GSC showing thousands of 404 errors for URLs she doesn’t recognize): I’m getting anxiety every time I log into Search Console. There are like 4,000 “Not Found” errors and the number keeps growing. I’ve never even heard of half these URLs.
SDC SEO Brain: Those 404 errors are Google telling you it tried to access URLs that don’t exist. The growing number isn’t necessarily bad on its own. Google doesn’t penalize sites for having 404s. But 4,000 is significant. Let’s figure out where these URLs are coming from. Click into a few of the 404 errors and tell me what the URL patterns look like.
Diane: Okay, here’s a weird one: /recipe/grandmas-apple-pie-2013-version. I did have an apple pie recipe but I consolidated all my apple pie posts into one “ultimate” guide three years ago. The old ones got deleted.
SDC SEO Brain: That’s your answer for at least some of these. When you consolidated and deleted old posts, Google didn’t forget they existed. Google had those URLs in its index, possibly had backlinks pointing to them from other sites, and will keep trying to crawl them periodically to check if they’ve come back. What did you do when you deleted the old posts? Just delete, or redirect to the new consolidated post?
Diane: Just deleted. There were like 200 posts I cleaned up during that consolidation project. Was I supposed to redirect all of them?
SDC SEO Brain: For posts that had traffic, rankings, or backlinks, yes. Redirects tell Google “this old URL permanently moved to this new URL, transfer the signals.” Deletion tells Google “this URL is gone” but doesn’t transfer anything. If your old apple pie post had 50 backlinks from food blogs, those backlinks are now pointing at nothing. That link equity evaporated.
Diane: I had no idea. But that was three years ago. Why is Google still crawling them?
SDC SEO Brain: Because those external backlinks still exist. Every time a food blogger’s old post gets crawled, Google discovers the link to your deleted URL again. It’s like getting a fresh signal saying “hey, this URL exists.” Google puts it back in the crawl queue. This cycle can continue indefinitely as long as the backlinks remain.
Diane: So those 50 backlinks that used to help me are now hurting me?
SDC SEO Brain: Not hurting in a penalty sense. 404s don’t damage your site. But you’re missing out on the ranking benefit those links would provide if they pointed somewhere useful. It’s an opportunity cost, not a punishment. The 404 errors in Search Console are Google saying “I tried to collect this value you’re owed and the address doesn’t exist.”
Diane: Can I still redirect them after three years?
SDC SEO Brain: Yes, absolutely. Redirects work regardless of how long the original URL has been gone. Create 301 redirects from your deleted recipe URLs to the consolidated posts they merged into. Google will follow the redirects, find the new destination, and start associating those backlinks with your current content.
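As a rough illustration of the redirect step, here is a minimal Python sketch that turns a mapping of deleted URLs into Apache `Redirect 301` lines that could be pasted into .htaccess. Only the first old path comes from the conversation; the destination path and the second entry are hypothetical placeholders, and a plugin such as Redirection would do the same job without touching .htaccess.

```python
# Hypothetical mapping: deleted URL path -> live consolidated post.
# Only the first key appears in the conversation; everything else is made up.
REDIRECTS = {
    "/recipe/grandmas-apple-pie-2013-version": "/recipe/ultimate-apple-pie-guide",
    "/recipe/easy-apple-pie-old": "/recipe/ultimate-apple-pie-guide",
}


def htaccess_lines(redirects):
    """Return one mod_alias 'Redirect 301' line per old path, sorted for stable output."""
    lines = []
    for old, new in sorted(redirects.items()):
        if old == new:
            continue  # a URL redirecting to itself would loop
        lines.append(f"Redirect 301 {old} {new}")
    return lines


for line in htaccess_lines(REDIRECTS):
    print(line)
```

Pointing each old URL straight at its final destination, rather than at another redirect, avoids building a chain.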
Diane: All 200 of them? That sounds like a nightmare.
SDC SEO Brain: You don’t need to redirect all 200. Check which deleted URLs actually have backlinks worth recovering. Use Ahrefs or Semrush, run your domain, look at broken backlinks or lost backlinks pointing to 404 pages. Prioritize redirecting URLs that have backlinks from real sites. A deleted URL with zero backlinks isn’t losing you anything.
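One way to do that prioritization is to export the broken-backlink report as CSV and sort it. The sketch below assumes a file with columns named `url` and `referring_domains`; those names are hypothetical, and real Ahrefs or Semrush exports use their own headers, so they would need adjusting.

```python
import csv
import io


def top_redirect_candidates(csv_text, min_domains=1, limit=30):
    """Rank deleted URLs by referring domains, highest first.

    Assumes columns named 'url' and 'referring_domains' (hypothetical names).
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        domains = int(row["referring_domains"])
        if domains >= min_domains:  # skip URLs with nothing to recover
            rows.append((row["url"], domains))
    # Sort by domain count descending, then by URL for a stable order.
    rows.sort(key=lambda r: (-r[1], r[0]))
    return rows[:limit]


sample = "url,referring_domains\n/recipe/old-pie,50\n/recipe/test-post,0\n/recipe/old-tart,12\n"
for url, domains in top_redirect_candidates(sample):
    print(domains, url)
```

The `min_domains` cutoff reflects the point above: a deleted URL with zero backlinks has nothing to recover.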
Diane: What about internal links? Could those be causing some of these?
SDC SEO Brain: Yes, that’s the second most common source. Run Screaming Frog on your site and filter for broken internal links. If your posts still link to deleted content with old URLs, you’re telling Google “this URL is important, go crawl it” and Google is trying. Those internal links are also wasting your crawl budget on URLs that don’t exist.
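Screaming Frog does this across the whole site, but the underlying check is simple enough to sketch. The Python below scans one page's HTML for links to paths already known to be deleted. The host `example.com` and the sample paths are placeholders, not values from the conversation.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def broken_internal_links(html, dead_paths, site_host="example.com"):
    """Return hrefs that point at known-deleted paths on this site."""
    collector = LinkCollector()
    collector.feed(html)
    broken = []
    for href in collector.hrefs:
        parts = urlparse(href)
        is_internal = parts.netloc in ("", site_host)  # relative or same host
        if is_internal and parts.path.rstrip("/") in dead_paths:
            broken.append(href)
    return broken


page = '<p><a href="/recipe/old-pie/">pie</a> <a href="/recipe/new-pie/">new</a></p>'
print(broken_internal_links(page, {"/recipe/old-pie"}))
```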
Diane: I’m looking at some of these 404 URLs and they have really strange query parameters. Like /recipe/chocolate-cake/?utm_source=pinterest&utm_medium=social&ref=2847361. The chocolate cake post exists but that specific URL with all the junk at the end doesn’t.
SDC SEO Brain: Query parameters create technically different URLs. Pinterest, email marketing tools, affiliates, they all append tracking parameters. Each variation is a separate URL from Google’s perspective. If Google found a link somewhere to that specific parameterized version and it doesn’t resolve correctly, you get a 404 error for that exact URL string.
Diane: But the base URL works fine. Why doesn’t Google just load that?
SDC SEO Brain: Google treats each URL literally. It doesn’t assume that /chocolate-cake and /chocolate-cake/?utm=whatever should resolve the same way. Your server might handle it fine for users, but if there’s any behavior difference, like a redirect chain or a soft 404, Google logs it as a separate URL issue.
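When triaging the report, it can help to collapse tracking variants back to their base URL so that one post does not show up as dozens of separate problems. A minimal Python sketch follows. It assumes the standard `utm_` prefix, and the other keys in the list (`ref`, `fbclid`, `gclid`) are assumptions about which parameters are tracking-only on this site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking parameters; adjust to whatever actually shows up in the report.
TRACKING_PREFIXES = ("utm_",)
TRACKING_KEYS = {"ref", "fbclid", "gclid"}


def strip_tracking(url):
    """Return the URL with tracking parameters removed and other parameters kept."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
        and key.lower() not in TRACKING_KEYS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


print(strip_tracking("/recipe/chocolate-cake/?utm_source=pinterest&utm_medium=social&ref=2847361"))
```

Parameters that change the page content, such as a pagination parameter, are deliberately left in place.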
Diane: I think I’m starting to understand. Google has a massive list of every URL it’s ever heard of related to my site, and it keeps checking them?
SDC SEO Brain: Exactly. Google’s URL database for your domain includes everything it ever discovered from any source. Sitemaps, crawling, external links, internal links, even links in JavaScript or PDFs. Deleting content from your side doesn’t delete Google’s memory of those URLs. You have to either redirect them to valid destinations or let Google eventually give up after enough 404 responses.
Diane: How long until Google gives up?
SDC SEO Brain: It depends on the URL’s history. A URL that once ranked well or has ongoing backlinks might get recrawled for years. Google has seen that URL matter before, so it keeps checking in case it comes back. A URL that was never important and has no external signals might get dropped from the crawl queue within months. There’s no fixed timeline.
Diane: Can I force Google to stop crawling them?
SDC SEO Brain: You have three options. First, return proper 404 or 410 status codes. A 410 means “gone permanently” and can get Google to stop retrying sooner than a 404 does. Second, redirect to relevant content, which is better if the URL has backlink value. Third, if the URL truly shouldn’t exist and you want it out of search results right away, you can use the Removals tool in Search Console, but this is temporary. It hides the URL from search results for about six months but doesn’t stop crawling.
Diane: What’s the difference between 404 and 410 in practice?
SDC SEO Brain: 404 means “not found right now.” Google interprets this as potentially temporary. 410 means “gone permanently, don’t come back.” Google treats 410 as a stronger signal to remove the URL from its index and reduce crawl frequency. For content you’ve intentionally deleted forever, 410 is the clearer signal. Most sites just use 404 for everything, which works but takes longer.
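The decision between the three responses can be written as a small lookup. This Python sketch shows the logic only; on a real WordPress site it would live in a redirect plugin or server config, and both the redirect map and the "gone" set here are hypothetical.

```python
def response_for(path, redirects, gone):
    """Pick a status for a path that no longer has a live page.

    Returns (status_code, location), where location is None unless redirecting.
    """
    if path in redirects:
        return 301, redirects[path]  # moved permanently: signals can transfer
    if path in gone:
        return 410, None  # intentionally deleted for good
    return 404, None  # unknown or never existed


# Hypothetical examples, not real site data.
redirects = {"/recipe/old-pie": "/recipe/ultimate-pie"}
gone = {"/recipe/retired-giveaway"}

for path in ("/recipe/old-pie", "/recipe/retired-giveaway", "/asdf-recipe-3847-test"):
    print(path, response_for(path, redirects, gone))
```

Checking the redirect map first means a URL with backlink value is never accidentally marked as gone.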
Diane: I’m also seeing some URLs with dates in them that don’t match my current permalink structure. Like /2015/03/chocolate-chip-cookies instead of just /chocolate-chip-cookies.
SDC SEO Brain: WordPress permalink changes. At some point you switched from date-based to post-name permalinks. Unless you set up redirects when making that change, Google still knows about the old date-based URLs from historical crawling and backlinks. Every time a site links to your old URL structure, Google tries to crawl it.
Diane: I changed permalinks like eight years ago. This is still haunting me?
SDC SEO Brain: Permalink changes without redirects create permanent URL fragmentation. Google now has two URL patterns for every post published before the switch. The old pattern returns 404, the new pattern works. If both patterns got backlinks over the years, you have link equity split across two URLs, one of which doesn’t resolve.
Diane: This is overwhelming. Where do I even start?
SDC SEO Brain: Prioritize by impact. First, identify deleted URLs with backlinks using Ahrefs. Set up 301 redirects for the top 20-30 that have the most referring domains. Second, fix internal broken links using Screaming Frog. This stops your own site from sending Google to dead URLs. Third, change your deleted pages to return 410 instead of 404 to accelerate Google dropping them. Fourth, add a wildcard redirect for your old date-based permalink structure pointing to the equivalent non-date URLs.
Diane: The wildcard redirect sounds complicated.
SDC SEO Brain: In WordPress, you can use a plugin like Redirection or add a rule to your .htaccess file. The pattern matches any URL starting with /year/month/ and redirects to the post-name version. Something like: RewriteRule ^([0-9]{4})/([0-9]{2})/(.*)$ /$3 [R=301,L]. This catches all your legacy date-based URLs in one rule.
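Before deploying a pattern like that, it is worth checking what it matches. The Python sketch below applies the same regular expression to a few sample paths. It strips the leading slash first, on the assumption that the rule lives in a per-directory .htaccess file, where Apache matches the pattern without it.

```python
import re

# Same pattern as the RewriteRule above: year / month / rest-of-path.
DATE_PERMALINK = re.compile(r"^([0-9]{4})/([0-9]{2})/(.*)$")


def redirect_target(path):
    """Return the post-name URL a date-based path would redirect to, or None."""
    match = DATE_PERMALINK.match(path.lstrip("/"))
    if match is None:
        return None  # not a legacy date-based URL; the rule does not apply
    return "/" + match.group(3)


for sample in ("/2015/03/chocolate-chip-cookies", "/chocolate-chip-cookies", "/2015/03/"):
    print(sample, "->", redirect_target(sample))
```

Note that a bare month archive such as /2015/03/ also matches and would be sent to the homepage, so it is worth deciding whether that is acceptable before relying on a single catch-all rule.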
Diane: Will fixing all this help my rankings or just clean up the error report?
SDC SEO Brain: Both. Cleaning up the errors reduces crawl waste, meaning Google spends more of its crawl budget on your actual content instead of chasing ghosts. Recovering backlink equity through redirects directly improves your rankings for the destination pages. A 12-year-old blog with years of deleted, never-redirected content probably has significant untapped link value sitting in those 404s.
Diane: How can I tell how much link value I’m missing?
SDC SEO Brain: In Ahrefs, go to your domain overview and look at “Best by links” filtered to 404 pages. Or check “Broken backlinks” under the backlinks section. Sort by referring domains. If you see deleted pages with 30, 50, 100 referring domains, that’s substantial link equity you’re not capturing. Each of those could be passing ranking value to a live page if redirected properly.
Diane: One last thing. Some of these 404 URLs look like they were never real. Like complete gibberish paths. /asdf-recipe-3847-test or whatever. I definitely never created those.
SDC SEO Brain: Spam or crawler noise. Bots probe random URLs looking for vulnerabilities or trying to find hidden content. If Google found evidence of those URLs somewhere, maybe from a spammy site that linked to random paths on your domain, it will try to crawl them. These you can ignore. Return 404, and eventually Google will stop. They have no value to recover.
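The URL patterns that came up in this conversation can be used for a first-pass triage of the exported 404 list. The Python sketch below buckets URLs by likely cause. The category names and the order of the checks are assumptions, and the final bucket still needs a manual look to separate deleted posts from bot noise.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

DATE_PREFIX = re.compile(r"^/[0-9]{4}/[0-9]{2}/")


def classify(url):
    """Guess why a URL is in the 404 report, based only on its shape."""
    parts = urlsplit(url)
    if parts.query:
        return "parameter variant"
    if DATE_PREFIX.match(parts.path):
        return "legacy date permalink"
    return "deleted or unknown"


def summarize(urls):
    """Count how many URLs fall into each bucket."""
    return Counter(classify(url) for url in urls)


report = [
    "/2015/03/chocolate-chip-cookies",
    "/recipe/chocolate-cake/?ref=2847361",
    "/recipe/grandmas-apple-pie-2013-version",
    "/asdf-recipe-3847-test",
]
for cause, count in summarize(report).items():
    print(count, cause)
```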
FAQ
Q: Does having lots of 404 errors hurt my SEO?
A: No. 404 errors are a normal part of how the web works and Google explicitly states they don’t harm rankings. What hurts is leaving 404 pages unredirected when they could capture backlink value, or leaving internal links pointed at deleted content, which wastes crawl budget and degrades the user experience. The 404 count itself is just informational.
Q: What’s the difference between 404 and 410 status codes for SEO?
A: Both tell Google the page doesn’t exist, but 410 means “gone permanently” while 404 means “not found.” Google treats 410 as a stronger signal to remove the URL from its index and reduce crawl frequency. For content you’ve intentionally deleted forever, 410 helps Google stop trying faster. Most sites use 404 for everything, which works but takes longer for Google to give up on.
Q: How long will Google keep crawling deleted URLs?
A: There’s no fixed timeline. URLs that ranked well or have ongoing external backlinks might be recrawled for years because Google has evidence they mattered. URLs with no historical importance or external signals typically drop from the crawl queue within months. Setting up redirects or using 410 status codes accelerates the process.
Q: Should I redirect all deleted pages or just ones with backlinks?
A: Prioritize pages with backlinks. A redirect transfers link equity from the old URL to the new destination. A deleted page with no backlinks has no equity to transfer, so redirecting it doesn’t provide ranking benefit. Focus your effort on identifying and redirecting the deleted URLs that actually have referring domains pointing to them.
Summary
Google maintains a URL database that persists long after you delete content. External backlinks are the primary reason deleted URLs keep getting crawled, because each time a linking page gets crawled, Google rediscovers the reference to your deleted URL and adds it back to the queue.
The solution hierarchy is: redirect high-value deleted URLs with backlinks to relevant live pages, fix internal broken links to stop your own site from requesting dead URLs, and use 410 status codes for permanently deleted content to signal that Google can stop retrying sooner.
Permalink structure changes create long-term fragmentation. Wildcard redirects can catch entire legacy URL patterns rather than requiring individual redirects for each old post.
404 errors don’t hurt rankings directly, but they represent missed opportunity. Every backlink pointing to a 404 is link equity that could be flowing to your live content through a redirect. For older sites with years of content changes, auditing and redirecting broken backlinks can unlock significant ranking potential.
Sources
- Google Search Central: Soft 404 errors documentation
- Google Search Central: Use 301 redirects for changed URLs
- Google Search Central: Crawl budget management
- Ahrefs: How to find and fix broken backlinks