TL;DR
Last-Modified headers tell browsers when a page was last updated, but Google largely ignores them for crawl prioritization. Google relies more on content change detection, sitemap lastmod dates, and crawl history patterns than server headers. If you’re using Last-Modified headers hoping to signal freshness to Google, you’re optimizing for a signal Google doesn’t meaningfully use.
The Conversation
Kevin, web developer at a news publisher, 200,000 articles, high crawl volume, intermediate SEO knowledge, trying to optimize crawl efficiency: We’ve implemented Last-Modified headers on all our pages to help Google know when content changed. But Google still seems to crawl old articles that haven’t been updated in years while missing fresh updates.
SDC SEO Brain: Last-Modified headers are useful for browsers and CDN caching, but Google has stated they don’t rely heavily on them for crawl decisions. Google has its own methods for detecting content freshness that don’t depend on server-reported timestamps.
Kevin: Why would Google ignore a perfectly good signal?
SDC SEO Brain: Three reasons, and they’re all about trust hierarchy. First, Last-Modified headers can be trivially gamed. Any site can set Last-Modified to the current timestamp on every request, making Google think every page is constantly updating. Second, headers are often inaccurate by accident. Many CMS platforms set Last-Modified from the server restart time, not from when the content was actually edited. Third, and most importantly, Google has a more reliable alternative: content fingerprinting.
Kevin: Content fingerprinting?
SDC SEO Brain: Google stores a hash of your rendered content on each crawl. When it crawls again, it computes a new hash. If the hashes differ, the content changed, period. No server header can lie about that. This is computationally more expensive than checking a header, but it’s the ground truth. Headers are claims; content comparison is evidence. Google trusts evidence.
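Google’s exact pipeline isn’t public, but the core idea can be sketched in a few lines of Python. The URLs and markup here are invented for illustration; this is the hashing concept, not Google’s actual implementation:

```python
import hashlib

def content_fingerprint(rendered_html: str) -> str:
    """Hash the rendered content; any change to the bytes alters the digest."""
    return hashlib.sha256(rendered_html.encode("utf-8")).hexdigest()

# Simulate two crawls of the same URL.
first_crawl = "<main><h1>Rate cuts expected</h1><p>Original story.</p></main>"
second_crawl = "<main><h1>Rate cuts expected</h1><p>Updated with new quotes.</p></main>"

# The hashes differ, so the content changed -- no header consulted.
changed = content_fingerprint(first_crawl) != content_fingerprint(second_crawl)
print(changed)  # True
```

The point of the sketch: the verdict comes entirely from the content bytes. A server could send any Last-Modified value it liked and the comparison would be unaffected.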
Kevin: So Google’s own content comparison is the real freshness signal?
SDC SEO Brain: Exactly. Google compares page content between crawls. If the rendered content changed since the last visit, Google knows the page was updated, regardless of what headers claim. Google also uses sitemap lastmod dates, though it has confirmed these are hints, not directives. Historical crawl patterns matter too: if a page has changed frequently in the past, Google is more likely to recrawl it frequently.
Kevin: Our sitemap has accurate lastmod dates. Are those being used?
SDC SEO Brain: Google considers sitemap lastmod as a signal, but with caveats. If your sitemap lastmod has been historically accurate, Google may give it more weight. If it’s frequently inaccurate, Google learns to ignore it. And even accurate lastmod only suggests a page changed, not that Google should prioritize crawling it.
Kevin: How do I know if Google trusts our sitemap lastmod?
SDC SEO Brain: There’s no direct way to know. But you can test correlation. Compare your sitemap lastmod updates to actual crawl dates in Search Console’s crawl stats. If pages with recent lastmod get crawled quickly after update, Google is likely using the signal. If there’s no correlation, Google may be ignoring your lastmod.
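One rough way to run that correlation check, assuming you’ve exported sitemap lastmod values and the next Googlebot hit per URL from your server logs or crawl stats (all data below is made up):

```python
from datetime import datetime, timedelta

# Hypothetical exports: sitemap lastmod per URL, and the first Googlebot
# request for each URL after that update, taken from server logs.
lastmod = {
    "/article-1": datetime(2024, 5, 1, 9, 0),
    "/article-2": datetime(2024, 5, 1, 9, 0),
}
next_crawl = {
    "/article-1": datetime(2024, 5, 1, 11, 30),  # recrawled ~2.5 hours later
    "/article-2": datetime(2024, 5, 9, 14, 0),   # recrawled 8 days later
}

def crawl_lag(url: str) -> timedelta:
    """Time between the sitemap's claimed update and Google's next visit."""
    return next_crawl[url] - lastmod[url]

# URLs recrawled within a day of their lastmod update suggest the
# signal is being used; a flat or random lag distribution suggests not.
fast = [u for u in lastmod if crawl_lag(u) < timedelta(hours=24)]
print(fast)  # ['/article-1']
```

Run over a few thousand URLs rather than two, the lag distribution is what tells the story: a cluster of short lags after lastmod updates is evidence Google trusts your timestamps.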
Kevin: We have 200,000 articles. Google can’t crawl all of them frequently. How do I prioritize?
SDC SEO Brain: Google prioritizes based on page importance signals: how much traffic a page gets, how many internal and external links it has, how often it’s historically changed, and how often users search for related queries. High-importance pages get crawled more often regardless of headers.
Kevin: Can we influence that prioritization?
SDC SEO Brain: To some extent. Internal linking affects perceived importance. Pages linked from your homepage and main navigation are signaled as more important than pages five clicks deep. External backlinks matter too. Pages with more referring domains are crawled more frequently.
Kevin: What about the Crawl Rate settings in Search Console?
SDC SEO Brain: Those settings only let you limit crawl rate to prevent server overload, not increase crawl frequency. You can tell Google to crawl slower, but you can’t tell Google to crawl faster; in fact, Google retired the crawl rate limiter from Search Console in early 2024 because its automatic rate logic made the setting redundant. Google determines crawl priority; you could only ever set a ceiling on rate.
Kevin: So there’s nothing we can do about old articles not getting crawled?
SDC SEO Brain: A few things help. First, if an old article gets significant updates, create internal links to it from fresh content. A new article linking to an old one signals that the old article might be relevant again. Second, submit the URL through Search Console’s URL inspection and request indexing. This works for individual URLs but doesn’t scale to thousands.
Kevin: We have article updates daily. We can’t manually submit each one.
SDC SEO Brain: For scale, focus on structural signals. Ensure your sitemap is segmented: news content in a news sitemap with real-time lastmod updates, evergreen content in a separate sitemap updated less frequently. Google handles different sitemap types differently.
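The segmentation above can be sketched with the standard library’s XML tools. URLs and dates are placeholders, and note that a real news sitemap additionally requires the Google News namespace and `<news:news>` child elements, omitted here for brevity:

```python
from xml.etree.ElementTree import Element, SubElement, tostring

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries: list[tuple[str, str]]) -> bytes:
    """Build a minimal <urlset> with a per-URL <lastmod>."""
    urlset = Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = SubElement(urlset, "url")
        SubElement(url, "loc").text = loc
        SubElement(url, "lastmod").text = lastmod
    return tostring(urlset)

# Segment 1: news articles, regenerated continuously with precise timestamps.
news = build_sitemap(
    [("https://example.com/news/rate-cuts", "2024-05-01T09:00:00+00:00")]
)

# Segment 2: evergreen guides, regenerated only when content actually changes.
evergreen = build_sitemap(
    [("https://example.com/guides/mortgages", "2024-01-15")]
)
```

Keeping the segments separate means the frequently-regenerated news file stays small and cheap to fetch, while the evergreen file’s stable lastmod values preserve your timestamp credibility.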
Kevin: We do have a news sitemap. But Google News has its own rules, right?
SDC SEO Brain: Yes, Google News prioritizes recent content heavily. For news sitemaps, recency and publication date matter more than for regular sitemaps. But even Google News looks at content signals, not just timestamps. A headline change with no content change may not trigger the same response as a substantive update.
Kevin: What signals does Google use to detect content changes?
SDC SEO Brain: Google compares rendered page content between crawls. Significant text changes, structural changes, and new content sections register as updates. Minor changes like footer dates or ad units typically don’t register as substantive changes. Google is trying to distinguish between meaningful updates and cosmetic changes.
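One plausible way to separate meaningful updates from cosmetic ones is to strip volatile regions before hashing. This is a toy sketch, not Google’s actual algorithm; the tags and regexes are illustrative only, and real boilerplate detection is far more involved:

```python
import hashlib
import re

def normalized_fingerprint(html: str) -> str:
    """Hash only the substantive content: drop regions that change on
    every load (footers with generated timestamps, ad slots) first."""
    html = re.sub(r"<footer>.*?</footer>", "", html, flags=re.S)
    html = re.sub(r'<div class="ad">.*?</div>', "", html, flags=re.S)
    return hashlib.sha256(html.encode("utf-8")).hexdigest()

# Same story, different footer timestamp: a cosmetic change only.
a = "<main>Story</main><footer>Generated 09:00</footer>"
b = "<main>Story</main><footer>Generated 09:05</footer>"

print(normalized_fingerprint(a) == normalized_fingerprint(b))  # True
```

After normalization, the two loads hash identically, so a change detector built this way would correctly report "no substantive update" even though the raw bytes differ on every request.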
Kevin: We update timestamps on every page load. Could that be confusing Google?
SDC SEO Brain: It might make your headers useless but likely doesn’t confuse Google’s content detection. Google is comparing actual page content, not timestamp metadata. If your content doesn’t change but your timestamp does, Google will see through that. The bigger problem is you’ve trained Google that your timestamps are meaningless.
Kevin: Should we stop sending Last-Modified headers entirely?
SDC SEO Brain: Keep them for browser caching and CDN efficiency, but don’t rely on them for SEO. Your browser cache and CDN can use Last-Modified to serve cached content efficiently. Just understand that Google has its own freshness detection that doesn’t depend on what you claim in headers.
Kevin: What’s the most effective thing we can do to improve crawl distribution?
SDC SEO Brain: Site architecture and link structure. Make sure important pages are well-linked, recent important content gets featured on high-traffic pages, and your sitemap accurately reflects content importance and recency. These structural signals carry more weight than any header or timestamp optimization.
FAQ
Q: Does Google use Last-Modified headers for crawling decisions?
A: Minimally, if at all. Google has confirmed they don’t rely heavily on server headers for freshness signals. Google detects content changes by comparing actual page content between crawls, which is more reliable than trusting server-reported timestamps.
Q: Are sitemap lastmod dates important for SEO?
A: They’re a signal, not a directive. Google considers lastmod as a hint that content changed, but only if your lastmod has been historically accurate. Inaccurate lastmod dates cause Google to learn to ignore your sitemap timestamps.
Q: How can I get Google to crawl updated content faster?
A: Structural signals work better than timestamps. Link to updated content from high-traffic pages, ensure updated pages are well-connected in your site architecture, and maintain accurate sitemap lastmod dates. For individual urgent pages, use Search Console’s URL inspection to request indexing.
Q: Can I increase Google’s crawl rate for my site?
A: No. Search Console’s crawl rate settings only let you limit crawl rate, not increase it. Google determines crawl priority based on page importance, historical change patterns, and search demand. You can influence perceived importance through internal linking and content quality.
Summary
Last-Modified headers are useful for browser and CDN caching, but Google doesn’t meaningfully rely on them for crawl prioritization. Server-reported timestamps can be gamed or misconfigured, so Google uses its own content change detection.
Google compares actual page content between crawls to determine if pages were updated. This method is more reliable than trusting what sites claim about themselves.
Sitemap lastmod dates are considered as hints, not directives. If your lastmod has been historically accurate, Google may give it some weight. Inaccurate lastmod dates cause Google to learn to ignore your timestamps.
Crawl prioritization is driven by page importance signals: internal linking, external backlinks, historical change frequency, and traffic patterns. These structural signals matter more than any timestamp or header optimization.
Site architecture is your primary crawl efficiency lever. Well-linked pages get crawled more frequently than isolated pages, regardless of freshness claims.
Sources
- Google Search Central: Sitemaps overview
- Google Search Central: Crawl budget management
- Google Search Central: URL inspection tool