TL;DR
“Not indexed” is a symptom, not a diagnosis. Google’s decision not to index a page can stem from dozens of causes: technical blocks preventing crawling, quality signals suggesting the page isn’t worth indexing, duplicate-content consolidation, crawl-budget exhaustion, or an intentional noindex directive you forgot about. Effective diagnosis requires systematic elimination: first verify Google can access the page, then check for explicit blocks, then assess quality signals, then examine site-wide patterns. Most indexing problems fall into predictable categories, but the specific cause determines the fix. Guessing wastes time; systematic diagnosis finds root causes.
Do This Today (3 Quick Checks)
- URL Inspection in GSC: Inspect your non-indexed page. Does it say “Crawled – not indexed” or “Discovered – not indexed”? These are completely different problems.
- Live test: In URL Inspection, click “Test Live URL.” Compare the live result to the indexed version. Discrepancies reveal rendering or blocking issues.
- Check robots.txt: Search your robots.txt for any rules that might block the page. A forgotten Disallow rule is a common culprit.
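If you'd rather not eyeball a long robots.txt by hand, Python's built-in `urllib.robotparser` can test a batch of URLs against it. A minimal sketch (note this parser does not fully support Google's wildcard extensions, so treat its answer as a first pass, not a final verdict):

```python
from urllib.robotparser import RobotFileParser

def blocked_urls(robots_txt: str, urls: list[str], agent: str = "Googlebot") -> list[str]:
    """Return the URLs that this robots.txt disallows for the given user-agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]

# Example robots.txt with a forgotten Disallow rule
robots = """User-agent: *
Disallow: /private/
Disallow: /search
"""
print(blocked_urls(robots, [
    "https://example.com/products/widget",
    "https://example.com/private/draft",
]))  # → ['https://example.com/private/draft']
```

Paste your live robots.txt into `robots` and your non-indexed URLs into the list; anything returned is a candidate for the "Blocked by robots.txt" status.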
The Indexing Decision Tree
Google moves every URL through four sequential stages:
DISCOVER → CRAWL → INDEX → RANK
Each stage can fail independently.
| Stage | What Happens | If It Fails |
|---|---|---|
| <strong>Discover</strong> | Google finds the URL exists | Page never enters Google's queue |
| <strong>Crawl</strong> | Googlebot requests the page | Page is "Discovered – not indexed" |
| <strong>Index</strong> | Google adds page to search index | Page is "Crawled – not indexed" |
| <strong>Rank</strong> | Google shows page for queries | Page indexed but invisible in search |
Critical insight: “Discovered – not indexed” means Google knows about the page but hasn’t bothered to crawl it. “Crawled – not indexed” means Google saw the content and decided it wasn’t worth indexing. These require completely different fixes.
Complete GSC Coverage Status Reference
| Status | Meaning | Primary Cause | Action |
|---|---|---|---|
| <strong>Indexed</strong> | In Google's index | Success | Monitor |
| <strong>Discovered – not indexed</strong> | Known but not crawled | Low priority signals | Improve internal linking, reduce crawl waste |
| <strong>Crawled – not indexed</strong> | Crawled, not indexed | Quality/value insufficient | Improve content quality, differentiate |
| <strong>Duplicate without user canonical</strong> | Duplicate detected, no canonical | Missing canonical tag | Add canonical |
| <strong>Duplicate, Google chose different</strong> | Your canonical ignored | Google disagrees | Review canonical target quality |
| <strong>Alternate page with proper canonical</strong> | Intentional duplicate | Working as intended | No action needed |
| <strong>Blocked by robots.txt</strong> | Can't crawl | robots.txt rule | Remove block if unintentional |
| <strong>Blocked due to unauthorized request</strong> | 401/403 response | Authentication required | Fix access or noindex intentionally |
| <strong>Not found (404)</strong> | Page doesn't exist | Deleted or broken | Fix URL or remove from sitemap |
| <strong>Soft 404</strong> | Page looks like error | Thin/empty content | Add substantial content |
| <strong>Page with redirect</strong> | URL redirects elsewhere | Redirect in place | Ensure the redirect target is correct; point the sitemap at the final URL |
| <strong>Excluded by noindex</strong> | noindex present | Intentional or accidental | Remove if unintentional |
Bulk URL Analysis Workflow
For sites with thousands of non-indexed pages:
Step 1: Export from GSC
- Indexing → Pages → Export affected URLs
- Get full list of “Discovered – not indexed” and “Crawled – not indexed”
Step 2: Enrich with data
| Data Source | What to Add | How |
|---|---|---|
| GSC Performance | Impressions, clicks | Export and VLOOKUP by URL |
| Analytics | Organic traffic | Export and VLOOKUP |
| Ahrefs/Semrush | Backlinks, DR | Batch analysis |
| Screaming Frog | Internal links, depth | Crawl and export |
| CMS/Database | Page type, category | Export from CMS |
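The enrichment step above is a series of left joins keyed on URL; pandas does the VLOOKUP work in one pass. A sketch with toy data (the column names are assumptions; rename to match your actual GSC, Screaming Frog, and Ahrefs exports):

```python
import pandas as pd

# GSC export of non-indexed URLs (assumed columns: url, status)
gsc = pd.DataFrame({
    "url": ["https://example.com/a", "https://example.com/b"],
    "status": ["Discovered - not indexed", "Crawled - not indexed"],
})
# Screaming Frog export (assumed columns: url, inlinks, crawl_depth)
frog = pd.DataFrame({
    "url": ["https://example.com/a"],
    "inlinks": [2],
    "crawl_depth": [4],
})
# Ahrefs batch export (assumed columns: url, backlinks)
ahrefs = pd.DataFrame({"url": ["https://example.com/b"], "backlinks": [7]})

# Left-join everything onto the GSC list; URLs missing from a source get 0
enriched = (
    gsc.merge(frog, on="url", how="left")
       .merge(ahrefs, on="url", how="left")
       .fillna(0)
)
print(enriched)
```

In practice each DataFrame comes from `pd.read_csv(...)` on the exported file rather than inline literals.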
Step 3: Create analysis spreadsheet
| Column | Source | Purpose |
|---|---|---|
| URL | GSC | Identifier |
| Status | GSC | Discovered vs Crawled |
| Last crawl date | GSC URL Inspection | Recency |
| Internal links | Screaming Frog | Priority signal |
| External backlinks | Ahrefs | Value indicator |
| Page type | CMS | Category analysis |
| Word count | Screaming Frog | Quality indicator |
| Priority score | Calculated | Triage decision |
Step 4: Calculate priority score
Priority Score =
(Has backlinks × 3) +
(Has impressions × 2) +
(Internal links > 5 × 2) +
(Word count > 500 × 1) +
(Strategic page type × 2)
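The score above is a weighted sum of boolean signals, which translates directly to code. Weights and thresholds below are the ones from the formula; the `strategic_types` set is a placeholder for whatever page types you deem strategic:

```python
def priority_score(backlinks: int, impressions: int, internal_links: int,
                   word_count: int, page_type: str,
                   strategic_types=frozenset({"category", "bestseller"})) -> int:
    """Weighted sum of boolean indexing-priority signals (weights from the formula above)."""
    return (
        3 * (backlinks > 0)          # has backlinks × 3
        + 2 * (impressions > 0)      # has impressions × 2
        + 2 * (internal_links > 5)   # internal links > 5 × 2
        + 1 * (word_count > 500)     # word count > 500 × 1
        + 2 * (page_type in strategic_types)  # strategic page type × 2
    )

print(priority_score(backlinks=4, impressions=120, internal_links=8,
                     word_count=650, page_type="category"))  # → 10
```

Apply it per row of the analysis spreadsheet to fill the "Priority score" column.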
Step 5: Batch triage
| Score | Count | Action |
|---|---|---|
| 7+ | Highest priority | Immediate improvement |
| 4-6 | Medium priority | Scheduled improvement |
| 1-3 | Low priority | Consolidate or remove |
| 0 | No signals | Remove or noindex |
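Once each URL has a score, the batch triage above is a simple bucketing step; a sketch using the thresholds from the table:

```python
def triage(score: int) -> str:
    """Map a priority score to the action tier from the triage table."""
    if score >= 7:
        return "Immediate improvement"
    if score >= 4:
        return "Scheduled improvement"
    if score >= 1:
        return "Consolidate or remove"
    return "Remove or noindex"

print(triage(8))  # → Immediate improvement
```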
Indexing Timeline Expectations
| Site Size | Typical Discovery Time | Typical Index Decision |
|---|---|---|
| Small (<1K pages) | Days to weeks | 2-4 weeks after crawl |
| Medium (1K-50K) | Weeks to months | 2-6 weeks after crawl |
| Large (50K-500K) | Months for full coverage | 4-8 weeks after crawl |
| Very large (500K+) | Ongoing, never complete | 4-12 weeks after crawl |
After making improvements:
| Change Type | Recrawl Time | Index Decision |
|---|---|---|
| Content enhancement | 1-4 weeks | 2-4 weeks after recrawl |
| Internal linking | 1-3 weeks | 2-4 weeks after recrawl |
| Technical fix | Days to weeks | 1-2 weeks after recrawl |
| Site-wide quality improvement | Weeks to months | Next core update (2-4 months) |
The Conversation
Yuki (SEO Manager at an e-commerce company with 50,000 product pages, of which only 28,000 are indexed according to GSC; has tried various fixes for months with no improvement; frustrated by conflicting advice; intermediate SEO knowledge): We have 50,000 product pages but only 28,000 are indexed. I’ve submitted sitemaps, requested indexing manually, improved page speed, added content. Nothing works. Some pages have been “Discovered – not indexed” for over a year. Others are “Crawled – not indexed.” I don’t understand why Google is ignoring 22,000 pages.
SDC SEO Brain: You’re treating two different problems as one. “Discovered – not indexed” and “Crawled – not indexed” have completely different causes and fixes. Let’s separate them first. In GSC, go to Indexing → Pages. What’s the breakdown between these two statuses?
Yuki: About 8,000 are “Discovered – currently not indexed” and 12,000 are “Crawled – currently not indexed.” The remaining 2,000 are various other issues.
SDC SEO Brain: Good. Now we have two distinct problems to solve.
Problem 1: 8,000 “Discovered – not indexed”
This means Google knows these URLs exist but hasn’t crawled them yet. Causes:
| Cause | How to Identify | Fix |
|---|---|---|
| <strong>Crawl budget exhaustion</strong> | High-traffic site, many URLs, slow server | Reduce waste, improve speed |
| <strong>Low perceived value</strong> | New pages, thin content signals, few internal links | Improve signals before crawl |
| <strong>Crawl queue backlog</strong> | Recently added pages, large site | Wait + improve priority signals |
| <strong>Technical crawl blocks</strong> | robots.txt, server errors during crawl attempts | Check logs for Googlebot access |
Let’s diagnose. How old are these 8,000 pages on average?
Yuki: Mixed. Some are new products from the last few months. But some have been live for over a year.
SDC SEO Brain: Pages stuck in “Discovered” for over a year are a strong signal of low crawl priority. Google is saying: “I know this exists but I have no urgency to look at it.” Let’s check why.
Diagnostic step 1: Internal link audit
Sample 20 of the year-old “Discovered – not indexed” pages. For each one:
- How many internal links point to it?
- Where do those links come from?
- What’s the click depth from homepage?
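Those three checks can be scripted rather than done page by page. A sketch assuming rows loaded from a Screaming Frog crawl export via `csv.DictReader` (the `Address`, `Inlinks`, and `Crawl Depth` headers match its standard internal export, but verify against your version):

```python
import csv
import io

def audit_sample(crawl_rows: list[dict], sample_urls: list[str]) -> list[dict]:
    """Report inlink count and click depth for each sampled URL, flagging weak signals."""
    by_url = {row["Address"]: row for row in crawl_rows}
    report = []
    for url in sample_urls:
        row = by_url.get(url)
        if row is None:
            report.append({"url": url, "note": "not in crawl: orphan or blocked"})
            continue
        inlinks, depth = int(row["Inlinks"]), int(row["Crawl Depth"])
        report.append({"url": url, "inlinks": inlinks, "depth": depth,
                       "weak_signals": inlinks <= 2 or depth >= 4})
    return report

# In practice: crawl_rows = list(csv.DictReader(open("internal_all.csv", newline="")))
sample_export = io.StringIO("Address,Inlinks,Crawl Depth\nhttps://example.com/p/widget,2,5\n")
rows = list(csv.DictReader(sample_export))
print(audit_sample(rows, ["https://example.com/p/widget"]))
```

The `weak_signals` flag (≤2 inlinks or depth ≥4) mirrors the pattern Yuki finds below; tune the thresholds to your site.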
Yuki: Checking a few… Most have only 1-2 internal links, usually from their category page. Click depth is 4-5 from homepage. Some are only linked from the sitemap.
SDC SEO Brain: There’s your problem. Pages with 1-2 internal links and high click depth are sending signals of low importance. Google prioritizes pages that:
- Have many internal links (votes of importance)
- Are close to homepage (architecturally prominent)
- Are linked from high-value pages (not just categories)
Sitemap-only discovery is the weakest signal. Google learns only that the URL exists, with no context about its importance.
Fix for “Discovered – not indexed”:
| Action | Impact | Effort |
|---|---|---|
| Add internal links from high-traffic pages | High | Medium |
| Reduce click depth (flatten architecture) | High | High |
| Improve category page prominence | Medium | Medium |
| Link from blog content to products | Medium | Low |
| Ensure products appear in filtered navigation | Medium | Low |
Yuki: What about the 12,000 “Crawled – not indexed”? That seems worse.
SDC SEO Brain: It is more concerning. “Crawled – not indexed” means Google visited the page, evaluated it, and decided: “This doesn’t deserve a spot in my index.” That’s an active rejection, not just neglect.
Problem 2: 12,000 “Crawled – not indexed”
Common causes:
| Cause | Signal in GSC | How to Verify |
|---|---|---|
| <strong>Thin content</strong> | No specific signal | Manual review – is content substantial? |
| <strong>Duplicate content</strong> | May show canonical elsewhere | Check if Google chose different canonical |
| <strong>Low quality</strong> | No specific signal | Compare to indexed competitors |
| <strong>Soft 404</strong> | Sometimes flagged separately | URL Inspection shows soft 404 |
| <strong>Recent page, pending decision</strong> | Recently crawled date | Check last crawl date |
| <strong>Noindex (rendered)</strong> | May not show in GSC | Check rendered HTML for noindex |
Let’s sample these. Pick 10 “Crawled – not indexed” products and send me their URLs (or describe them).
Yuki: Looking at 10 random ones… They’re products with:
- Product name as title
- 2-3 sentence description
- Price and specs in a table
- 1-2 product images
- No reviews
SDC SEO Brain: I see several potential issues:
Issue 1: Thin content
2-3 sentences is extremely thin for a product page. Compare to competitors ranking for these products. How much content do their pages have?
Yuki: Competitor product pages have longer descriptions, buyer guides, multiple customer reviews, Q&A sections, comparison tables.
SDC SEO Brain: That’s your gap. Your pages have minimal content; competitors have comprehensive pages. Google sees your pages as adding no unique value.
Issue 2: No differentiating signals
Without reviews, Q&A, or unique content, your pages might be near-duplicates of:
- Manufacturer product pages
- Other retailer pages selling the same product
- Your own similar product variants
Google might be thinking: “Why index this when identical information exists elsewhere?”
Yuki: But we have 28,000 pages indexed with similar content levels.
SDC SEO Brain: Good observation. Let’s compare indexed vs non-indexed. What’s different about the 28,000 that ARE indexed?
Check:
- Do indexed pages have more reviews?
- Are they older (established authority)?
- Do they have more internal links?
- Are they higher in the site hierarchy?
- Do they get more organic traffic historically?
Yuki: Indexed pages… most have at least some reviews. They’re generally older. They do seem to have more internal links from cross-sell widgets.
SDC SEO Brain: Pattern emerging:
- Reviews = unique content = indexed
- No reviews = thin/duplicate = not indexed
This is actually good news because it’s actionable.
Yuki: How do I fix 12,000 thin product pages?
SDC SEO Brain: You won’t write unique content for 12,000 pages. Options:
Option 1: Prioritize and tier
| Tier | Criteria | Action |
|---|---|---|
| Priority (top 1,000) | Highest margin, search volume | Manual content enhancement |
| Secondary (next 3,000) | Moderate potential | Template enhancement + review push |
| Long-tail (remaining) | Low individual value | Consider consolidation or noindex |
Option 2: Programmatic enhancement
- Aggregate reviews from manufacturer or third-party sources
- Add structured spec comparisons with similar products
- Pull in user-generated Q&A
- Generate comparison content with related products
Option 3: Strategic noindex
For pages with near-zero search potential:
- Noindex intentionally
- Reduces crawl waste
- Improves quality ratio of indexed pages
Yuki: Wait, intentionally noindex products?
SDC SEO Brain: Counterintuitive but sometimes correct. Consider:
- A product page with 0 monthly search demand
- Thin content, no reviews
- Google already refuses to index it
- It consumes crawl budget
- It dilutes your site’s average quality
Noindexing it:
- Formalizes Google’s decision (no more wasted crawl attempts)
- Improves your indexed content ratio
- Lets you focus resources on pages with potential
Yuki: How do I know which pages have no search potential?
SDC SEO Brain: Cross-reference with keyword data:
- Export your “Crawled – not indexed” URLs
- For each product, check search volume for product name/SKU
- Check GSC for any impressions (even if not indexed, you might have impression data from brief indexing)
- Categorize:
| Category | Criteria | Recommendation |
|---|---|---|
| Has search demand | >50 monthly searches | Invest in content, push for indexing |
| Has some signals | 10-50 searches OR has impressions | Template enhancement |
| No demand evidence | 0 searches, 0 impressions, no potential | Consider noindex or consolidation |
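This categorization is a two-signal decision, easy to apply in bulk once search volume and impressions are joined onto each URL. A sketch using the table's thresholds (URLs with 1-9 searches and no impressions fall to the lowest bucket here; adjust if you read the table differently):

```python
def categorize(search_volume: int, impressions: int) -> str:
    """Bucket a non-indexed URL by search-demand evidence (thresholds from the table)."""
    if search_volume > 50:
        return "Invest in content, push for indexing"
    if search_volume >= 10 or impressions > 0:
        return "Template enhancement"
    return "Consider noindex or consolidation"

# Hypothetical SKUs with (monthly search volume, GSC impressions)
for url, (vol, imp) in {"/sku-1": (120, 0), "/sku-2": (15, 0), "/sku-3": (0, 0)}.items():
    print(url, "->", categorize(vol, imp))
```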
Yuki: What about pages Google crawled recently? Should I wait?
SDC SEO Brain: Check last crawl date in URL Inspection. If Google crawled within the last 2-3 weeks, it might still be processing. But if it crawled 6+ months ago and remains “Crawled – not indexed,” Google has made a decision. Waiting won’t change it without changes to the page.
Yuki: How long until I see results from fixes?
SDC SEO Brain: Timeline expectations:
| Fix Type | Time to Crawl | Time to Index Decision |
|---|---|---|
| Content enhancement | Days to weeks | 2-4 weeks after crawl |
| Internal linking improvement | Days to weeks | 2-4 weeks after crawl |
| Technical fix (removing blocks) | Days | 1-2 weeks |
| Site-wide quality improvement | Weeks to months | Months |
For your 50,000 page site, don’t expect to go from 28,000 to 50,000 indexed quickly. A more realistic goal:
- Month 1-2: Diagnose and categorize all non-indexed pages
- Month 2-4: Implement fixes for priority tier
- Month 4-6: Expand to secondary tier
- Month 6-12: Evaluate if long-tail pages should remain
Yuki: What about requesting indexing in GSC?
SDC SEO Brain: “Request indexing” triggers a crawl, but it doesn’t guarantee indexing. If your page is thin or low-quality, requesting indexing just makes Google confirm its decision faster.
Use “Request indexing” for:
- Genuinely updated pages where content has improved
- Critical pages you need crawled urgently
- Pages where you’ve fixed a technical issue
Don’t use it for:
- Mass submission of unchanged pages
- Pages you haven’t improved
- Trying to “force” indexing
Google’s daily request limit and the marginal impact make it a poor strategy for indexing problems at scale.
Diagnostic Cheat Sheet
Step 1: Identify the status
| GSC Status | Meaning | Primary Fix Direction |
|---|---|---|
| Discovered – not indexed | Google hasn't crawled yet | Improve crawl priority signals |
| Crawled – not indexed | Google crawled, decided not to index | Improve content quality/uniqueness |
| Duplicate, Google chose different canonical | Google consolidated to another URL | Check canonical setup |
| Blocked by robots.txt | Can't crawl due to block | Remove block if unintentional |
| Noindex detected | Page has noindex directive | Remove noindex if unintentional |
| Soft 404 | Page looks like error page | Add substantial content |
| Redirect | Page redirects elsewhere | Check redirect chain |
| Not found (404) | Page doesn't exist | Fix if should exist |
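With the cheat sheet as a lookup table, a GSC Pages export can be triaged in bulk. A sketch; the status strings below approximate GSC's export wording, which varies by locale and over time, so verify them against your own export:

```python
from collections import Counter

# Cheat-sheet lookup: GSC status -> primary fix direction (strings are assumptions)
FIX_DIRECTION = {
    "Discovered - currently not indexed": "Improve crawl priority signals",
    "Crawled - currently not indexed": "Improve content quality/uniqueness",
    "Duplicate, Google chose different canonical than user": "Check canonical setup",
    "Blocked by robots.txt": "Remove block if unintentional",
    "Excluded by 'noindex' tag": "Remove noindex if unintentional",
    "Soft 404": "Add substantial content",
    "Page with redirect": "Check redirect chain",
    "Not found (404)": "Fix if the page should exist",
}

def summarize(rows: list[dict]) -> dict:
    """Count URLs per status and attach the cheat-sheet fix direction."""
    counts = Counter(row["status"] for row in rows)
    return {status: {"count": n, "fix": FIX_DIRECTION.get(status, "Investigate manually")}
            for status, n in counts.most_common()}

print(summarize([{"status": "Soft 404"}, {"status": "Soft 404"},
                 {"status": "Page with redirect"}]))
```

The summary gives you a per-status worklist instead of 22,000 undifferentiated URLs.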
Step 2: For “Discovered – not indexed”
| Check | How | If Problem Found |
|---|---|---|
| Internal links | Screaming Frog inlinks report | Add more internal links |
| Click depth | Screaming Frog crawl depth | Flatten architecture |
| Robots.txt | Test URL in robots.txt tester | Remove unintentional blocks |
| Server logs | Check if Googlebot attempted crawl | Investigate failures |
| Site speed | Is server slow when Googlebot visits? | Improve server response |
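The server-log check above can be automated: filter the access log for Googlebot hits and see whether the stuck URLs were ever requested, and with what status codes. A sketch assuming the common/combined log format (note that rigorous verification should also confirm the client IP reverse-resolves to googlebot.com, which this skips since any crawler can fake the user-agent string):

```python
import re

# One combined-log-format line looks like:
#   IP - - [date] "GET /path HTTP/1.1" 200 5120 "referrer" "user-agent"
LOG_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[^"]+" (?P<status>\d{3}) .*"(?P<ua>[^"]*)"$'
)

def googlebot_hits(log_lines: list[str]) -> dict[str, list[int]]:
    """Map each path a Googlebot UA requested to the response codes it received."""
    hits: dict[str, list[int]] = {}
    for line in log_lines:
        m = LOG_RE.search(line)
        if m and "Googlebot" in m.group("ua"):
            hits.setdefault(m.group("path"), []).append(int(m.group("status")))
    return hits

lines = [
    '66.249.66.1 - - [01/Jan/2025:00:00:00 +0000] "GET /products/widget HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '10.0.0.1 - - [01/Jan/2025:00:00:01 +0000] "GET /products/widget HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]
print(googlebot_hits(lines))  # → {'/products/widget': [200]}
```

Stuck URLs that never appear in the output were likely never crawled; URLs appearing with 5xx codes point to server failures during crawl attempts.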
Step 3: For “Crawled – not indexed”
| Check | How | If Problem Found |
|---|---|---|
| Content depth | Compare to indexed competitors | Enhance content |
| Uniqueness | Search key sentences in quotes | Add unique value |
| Canonical | Check rendered canonical tag | Fix canonical issues |
| Quality signals | Manual review vs quality guidelines | Improve E-E-A-T signals |
| Duplicate variants | Check for near-duplicate pages | Consolidate or differentiate |
FAQ
Q: How long should I wait before considering a page permanently not indexed?
A: If a page has been “Crawled – not indexed” for 6+ months without changes, Google has made a stable decision. If “Discovered – not indexed” for 12+ months, crawl priority is severely low. Both require action, not waiting.
Q: Does requesting indexing help?
A: It triggers a crawl, not indexing. Useful for fixed pages, useless for pages that haven’t changed.
Q: If Google chose a different canonical, is my page indexed?
A: No, your page is not indexed. The canonical target is indexed. Your page contributes signals to the canonical but doesn’t appear in search results itself.
Q: Can too many non-indexed pages hurt my site overall?
A: Yes. High ratio of thin or low-quality pages sends negative site-wide signals. This is why strategic noindexing can improve overall performance.
Q: Why are my competitors’ thin pages indexed when mine aren’t?
A: Domain authority matters. High-authority sites get more pages indexed with less content. Building authority is a long-term solution.
Summary
Indexing problems require diagnosis before solutions. “Not indexed” has many causes with different fixes.
Separate the statuses:
- Discovered – not indexed = crawl priority problem
- Crawled – not indexed = quality/value problem
For discovery problems:
- Improve internal linking
- Reduce click depth
- Fix technical crawl barriers
- Reduce crawl waste elsewhere
For indexing rejections:
- Enhance content depth
- Add unique value (reviews, Q&A, guides)
- Differentiate from duplicates
- Consider strategic noindexing for no-value pages
Prioritize by business value:
- Not all pages deserve indexing investment
- Focus resources on pages with search demand and revenue potential
- Long-tail consolidation often better than page-by-page fixes
Timeline reality:
- Diagnosis: 1-2 weeks
- Implementation for priority pages: 1-2 months
- Visible improvements: 2-4 months
- Full remediation: 6-12 months for large sites
Sources
- Google Search Central: Why pages aren’t indexed – https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers
- Google: URL Inspection tool – https://support.google.com/webmasters/answer/9012289
- Google Search Central: Indexing coverage report – https://support.google.com/webmasters/answer/7440203