TL;DR
“Not indexed” is a symptom, not a diagnosis. Google’s decision not to index a page can stem from dozens of causes: technical blocks preventing crawling, quality signals suggesting the page isn’t worth indexing, duplicate-content consolidation, crawl-budget exhaustion, or an intentional noindex directive you forgot about. Effective diagnosis requires systematic elimination: first verify Google can access the page, then check for explicit blocks, then assess quality signals, then examine site-wide patterns. Most indexing problems fall into predictable categories, but the specific cause determines the fix. Guessing wastes time; systematic diagnosis finds root causes.
Do This Today (3 Quick Checks)
- URL Inspection in GSC: Inspect your non-indexed page. Does it say “Crawled – not indexed” or “Discovered – not indexed”? These are completely different problems.
- Live test: In URL Inspection, click “Test Live URL.” Compare the live result to the indexed version. Discrepancies reveal rendering or blocking issues.
- Check robots.txt: Search your robots.txt for any rules that might block the page. A forgotten Disallow rule is a common culprit.
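If you'd rather not eyeball a long robots.txt by hand, Python's built-in `urllib.robotparser` can test a batch of URLs against it. A minimal sketch (note this parser does not fully support Google's wildcard extensions, so treat its answer as a first pass, not a final verdict):

```python
from urllib.robotparser import RobotFileParser

def blocked_urls(robots_txt: str, urls: list[str], agent: str = "Googlebot") -> list[str]:
    """Return the URLs that this robots.txt disallows for the given user-agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]

# Example robots.txt with a forgotten Disallow rule
robots = """User-agent: *
Disallow: /private/
Disallow: /search
"""
print(blocked_urls(robots, [
    "https://example.com/products/widget",
    "https://example.com/private/draft",
]))  # → ['https://example.com/private/draft']
```

Paste your live robots.txt into `robots` and your non-indexed URLs into the list; anything returned is a candidate for the "Blocked by robots.txt" status.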
The Indexing Decision Tree
Google moves every URL through four sequential stages:
DISCOVER → CRAWL → INDEX → RANK
Each stage can fail independently.
| Stage | What Happens | If It Fails |
|---|---|---|
| <strong>Discover</strong> | Google finds the URL exists | Page never enters Google's queue |
| <strong>Crawl</strong> | Googlebot requests the page | Page is "Discovered – not indexed" |
| <strong>Index</strong> | Google adds page to search index | Page is "Crawled – not indexed" |
| <strong>Rank</strong> | Google shows page for queries | Page indexed but invisible in search |
Critical insight: “Discovered – not indexed” means Google knows about the page but hasn’t bothered to crawl it. “Crawled – not indexed” means Google saw the content and decided it wasn’t worth indexing. These require completely different fixes.
Complete GSC Coverage Status Reference
| Status | Meaning | Primary Cause | Action |
|---|---|---|---|
| <strong>Indexed</strong> | In Google's index | Success | Monitor |
| <strong>Discovered – not indexed</strong> | Known but not crawled | Low priority signals | Improve internal linking, reduce crawl waste |
| <strong>Crawled – not indexed</strong> | Crawled, not indexed | Quality/value insufficient | Improve content quality, differentiate |
| <strong>Duplicate without user canonical</strong> | Duplicate detected, no canonical | Missing canonical tag | Add canonical |
| <strong>Duplicate, Google chose different</strong> | Your canonical ignored | Google disagrees | Review canonical target quality |
| <strong>Alternate page with proper canonical</strong> | Intentional duplicate | Working as intended | No action needed |
| <strong>Blocked by robots.txt</strong> | Can't crawl | robots.txt rule | Remove block if unintentional |
| <strong>Blocked due to unauthorized request</strong> | 401/403 response | Authentication required | Fix access or noindex intentionally |
| <strong>Not found (404)</strong> | Page doesn't exist | Deleted or broken | Fix URL or remove from sitemap |
| <strong>Soft 404</strong> | Page looks like error | Thin/empty content | Add substantial content |
| <strong>Page with redirect</strong> | URL redirects elsewhere | Redirect in place | Ensure the redirect target is correct; point the sitemap at the final URL |
| <strong>Excluded by noindex</strong> | noindex present | Intentional or accidental | Remove if unintentional |
Bulk URL Analysis Workflow
For sites with thousands of non-indexed pages:
Step 1: Export from GSC
- Indexing → Pages → Export affected URLs
- Get full list of “Discovered – not indexed” and “Crawled – not indexed”
Step 2: Enrich with data
| Data Source | What to Add | How |
|---|---|---|
| GSC Performance | Impressions, clicks | Export and VLOOKUP by URL |
| Analytics | Organic traffic | Export and VLOOKUP |
| Ahrefs/Semrush | Backlinks, DR | Batch analysis |
| Screaming Frog | Internal links, depth | Crawl and export |
| CMS/Database | Page type, category | Export from CMS |
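The enrichment step above is a series of left joins keyed on URL; pandas does the VLOOKUP work in one pass. A sketch with toy data (the column names are assumptions; rename to match your actual GSC, Screaming Frog, and Ahrefs exports):

```python
import pandas as pd

# GSC export of non-indexed URLs (assumed columns: url, status)
gsc = pd.DataFrame({
    "url": ["https://example.com/a", "https://example.com/b"],
    "status": ["Discovered - not indexed", "Crawled - not indexed"],
})
# Screaming Frog export (assumed columns: url, inlinks, crawl_depth)
frog = pd.DataFrame({
    "url": ["https://example.com/a"],
    "inlinks": [2],
    "crawl_depth": [4],
})
# Ahrefs batch export (assumed columns: url, backlinks)
ahrefs = pd.DataFrame({"url": ["https://example.com/b"], "backlinks": [7]})

# Left-join everything onto the GSC list; URLs missing from a source get 0
enriched = (
    gsc.merge(frog, on="url", how="left")
       .merge(ahrefs, on="url", how="left")
       .fillna(0)
)
print(enriched)
```

In practice each DataFrame comes from `pd.read_csv(...)` on the exported file rather than inline literals.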
Step 3: Create analysis spreadsheet
| Column | Source | Purpose |
|---|---|---|
| URL | GSC | Identifier |
| Status | GSC | Discovered vs Crawled |
| Last crawl date | GSC URL Inspection | Recency |
| Internal links | Screaming Frog | Priority signal |
| External backlinks | Ahrefs | Value indicator |
| Page type | CMS | Category analysis |
| Word count | Screaming Frog | Quality indicator |
| Priority score | Calculated | Triage decision |
Step 4: Calculate priority score
Priority Score =
(Has backlinks × 3) +
(Has impressions × 2) +
(Internal links > 5 × 2) +
(Word count > 500 × 1) +
(Strategic page type × 2)
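The score above is a weighted sum of boolean signals, which translates directly to code. Weights and thresholds below are the ones from the formula; the `strategic_types` set is a placeholder for whatever page types you deem strategic:

```python
def priority_score(backlinks: int, impressions: int, internal_links: int,
                   word_count: int, page_type: str,
                   strategic_types=frozenset({"category", "bestseller"})) -> int:
    """Weighted sum of boolean indexing-priority signals (weights from the formula above)."""
    return (
        3 * (backlinks > 0)          # has backlinks × 3
        + 2 * (impressions > 0)      # has impressions × 2
        + 2 * (internal_links > 5)   # internal links > 5 × 2
        + 1 * (word_count > 500)     # word count > 500 × 1
        + 2 * (page_type in strategic_types)  # strategic page type × 2
    )

print(priority_score(backlinks=4, impressions=120, internal_links=8,
                     word_count=650, page_type="category"))  # → 10
```

Apply it per row of the analysis spreadsheet to fill the "Priority score" column.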
Step 5: Batch triage
| Score | Count | Action |
|---|---|---|
| 7+ | Highest priority | Immediate improvement |
| 4-6 | Medium priority | Scheduled improvement |
| 1-3 | Low priority | Consolidate or remove |
| 0 | No signals | Remove or noindex |
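Once each URL has a score, the batch triage above is a simple bucketing step; a sketch using the thresholds from the table:

```python
def triage(score: int) -> str:
    """Map a priority score to the action tier from the triage table."""
    if score >= 7:
        return "Immediate improvement"
    if score >= 4:
        return "Scheduled improvement"
    if score >= 1:
        return "Consolidate or remove"
    return "Remove or noindex"

print(triage(8))  # → Immediate improvement
```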
Indexing Timeline Expectations
| Site Size | Typical Discovery Time | Typical Index Decision |
|---|---|---|
| Small (<1K pages) | Days to weeks | 2-4 weeks after crawl |
| Medium (1K-50K) | Weeks to months | 2-6 weeks after crawl |
| Large (50K-500K) | Months for full coverage | 4-8 weeks after crawl |
| Very large (500K+) | Ongoing, never complete | 4-12 weeks after crawl |
After making improvements:
| Change Type | Recrawl Time | Index Decision |
|---|---|---|
| Content enhancement | 1-4 weeks | 2-4 weeks after recrawl |
| Internal linking | 1-3 weeks | 2-4 weeks after recrawl |
| Technical fix | Days to weeks | 1-2 weeks after recrawl |
| Site-wide quality improvement | Weeks to months | Next core update (2-4 months) |
The Conversation
Yuki (SEO Manager at an e-commerce company with 50,000 product pages, of which only 28,000 are indexed according to GSC; has tried various fixes for months with no improvement; frustrated by conflicting advice; intermediate SEO knowledge): We have 50,000 product pages but only 28,000 are indexed. I’ve submitted sitemaps, requested indexing manually, improved page speed, added content. Nothing works. Some pages have been “Discovered – not indexed” for over a year. Others are “Crawled – not indexed.” I don’t understand why Google is ignoring 22,000 pages.
SDC SEO Brain: You’re treating two different problems as one. “Discovered – not indexed” and “Crawled – not indexed” have completely different causes and fixes. Let’s separate them first. In GSC, go to Indexing → Pages. What’s the breakdown between these two statuses?
Yuki: About 8,000 are “Discovered – currently not indexed” and 12,000 are “Crawled – currently not indexed.” The remaining 2,000 are various other issues.
SDC SEO Brain: Good. Now we have two distinct problems to solve.
Problem 1: 8,000 “Discovered – not indexed”
This means Google knows these URLs exist but hasn’t crawled them yet. Causes:
| Cause | How to Identify | Fix |
|---|---|---|
| <strong>Crawl budget exhaustion</strong> | High-traffic site, many URLs, slow server | Reduce waste, improve speed |
| <strong>Low perceived value</strong> | New pages, thin content signals, few internal links | Improve signals before crawl |
| <strong>Crawl queue backlog</strong> | Recently added pages, large site | Wait + improve priority signals |
| <strong>Technical crawl blocks</strong> | robots.txt, server errors during crawl attempts | Check logs for Googlebot access |
Let’s diagnose. How old are these 8,000 pages on average?
Yuki: Mixed. Some are new products from the last few months. But some have been live for over a year.
SDC SEO Brain: Pages stuck in “Discovered” for over a year are a strong signal of low crawl priority. Google is saying: “I know this exists but I have no urgency to look at it.” Let’s check why.
Diagnostic step 1: Internal link audit
Sample 20 of the year-old “Discovered – not indexed” pages. For each one:
- How many internal links point to it?
- Where do those links come from?
- What’s the click depth from homepage?
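Those three checks can be scripted rather than done page by page. A sketch assuming rows loaded from a Screaming Frog crawl export via `csv.DictReader` (the `Address`, `Inlinks`, and `Crawl Depth` headers match its standard internal export, but verify against your version):

```python
import csv
import io

def audit_sample(crawl_rows: list[dict], sample_urls: list[str]) -> list[dict]:
    """Report inlink count and click depth for each sampled URL, flagging weak signals."""
    by_url = {row["Address"]: row for row in crawl_rows}
    report = []
    for url in sample_urls:
        row = by_url.get(url)
        if row is None:
            report.append({"url": url, "note": "not in crawl: orphan or blocked"})
            continue
        inlinks, depth = int(row["Inlinks"]), int(row["Crawl Depth"])
        report.append({"url": url, "inlinks": inlinks, "depth": depth,
                       "weak_signals": inlinks <= 2 or depth >= 4})
    return report

# In practice: crawl_rows = list(csv.DictReader(open("internal_all.csv", newline="")))
sample_export = io.StringIO("Address,Inlinks,Crawl Depth\nhttps://example.com/p/widget,2,5\n")
rows = list(csv.DictReader(sample_export))
print(audit_sample(rows, ["https://example.com/p/widget"]))
```

The `weak_signals` flag (≤2 inlinks or depth ≥4) mirrors the pattern Yuki finds below; tune the thresholds to your site.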
Yuki: Checking a few… Most have only 1-2 internal links, usually from their category page. Click depth is 4-5 from homepage. Some are only linked from the sitemap.
SDC SEO Brain: There’s your problem. Pages with 1-2 internal links and high click depth are sending signals of low importance. Google prioritizes pages that:
- Have many internal links (votes of importance)
- Are close to homepage (architecturally prominent)
- Are linked from high-value pages (not just categories)
Sitemap-only discovery is the weakest signal. Google learns only that the URL exists, with no context about its importance.
Fix for “Discovered – not indexed”:
| Action | Impact | Effort |
|---|---|---|
| Add internal links from high-traffic pages | High | Medium |
| Reduce click depth (flatten architecture) | High | High |
| Improve category page prominence | Medium | Medium |
| Link from blog content to products | Medium | Low |
| Ensure products appear in filtered navigation | Medium | Low |
Yuki: What about the 12,000 “Crawled – not indexed”? That seems worse.
SDC SEO Brain: It is more concerning. “Crawled – not indexed” means Google visited the page, evaluated it, and decided: “This doesn’t deserve a spot in my index.” That’s an active rejection, not just neglect.
Problem 2: 12,000 “Crawled – not indexed”
Common causes:
| Cause | Signal in GSC | How to Verify |
|---|---|---|
| <strong>Thin content</strong> | No specific signal | Manual review – is content substantial? |
| <strong>Duplicate content</strong> | May show canonical elsewhere | Check if Google chose different canonical |
| <strong>Low quality</strong> | No specific signal | Compare to indexed competitors |
| <strong>Soft 404</strong> | Sometimes flagged separately | URL Inspection shows soft 404 |
| <strong>Recent page, pending decision</strong> | Recently crawled date | Check last crawl date |
| <strong>Noindex (rendered)</strong> | May not show in GSC | Check rendered HTML for noindex |
Let’s sample these. Pick 10 “Crawled – not indexed” products and send me their URLs (or describe them).
Yuki: Looking at 10 random ones… They’re products with:
- Product name as title
- 2-3 sentence description
- Price and specs in a table
- 1-2 product images
- No reviews
SDC SEO Brain: I see several potential issues:
Issue 1: Thin content
2-3 sentences is extremely thin for a product page. Compare to competitors ranking for these products. How much content do their pages have?
Yuki: Competitor product pages have longer descriptions, buyer guides, multiple customer reviews, Q&A sections, comparison tables.
SDC SEO Brain: That’s your gap. Your pages have minimal content; competitors have comprehensive pages. Google sees your pages as adding no unique value.
Issue 2: No differentiating signals
Without reviews, Q&A, or unique content, your pages might be near-duplicates of:
- Manufacturer product pages
- Other retailer pages selling the same product
- Your own similar product variants
Google might be thinking: “Why index this when identical information exists elsewhere?”
Yuki: But we have 28,000 pages indexed with similar content levels.
SDC SEO Brain: Good observation. Let’s compare indexed vs non-indexed. What’s different about the 28,000 that ARE indexed?
Check:
- Do indexed pages have more reviews?
- Are they older (established authority)?
- Do they have more internal links?
- Are they higher in the site hierarchy?
- Do they get more organic traffic historically?
Yuki: Indexed pages… most have at least some reviews. They’re generally older. They do seem to have more internal links from cross-sell widgets.
SDC SEO Brain: Pattern emerging:
- Reviews = unique content = indexed
- No reviews = thin/duplicate = not indexed
This is actually good news because it’s actionable.
Yuki: How do I fix 12,000 thin product pages?
SDC SEO Brain: You won’t write unique content for 12,000 pages. Options:
Option 1: Prioritize and tier
| Tier | Criteria | Action |
|---|---|---|
| Priority (top 1,000) | Highest margin, search volume | Manual content enhancement |
| Secondary (next 3,000) | Moderate potential | Template enhancement + review push |
| Long-tail (remaining) | Low individual value | Consider consolidation or noindex |
Option 2: Programmatic enhancement
- Aggregate reviews from manufacturer or third-party sources
- Add structured spec comparisons with similar products
- Pull in user-generated Q&A
- Generate comparison content with related products
Option 3: Strategic noindex
For pages with near-zero search potential:
- Noindex intentionally
- Reduces crawl waste
- Improves quality ratio of indexed pages
Yuki: Wait, intentionally noindex products?
SDC SEO Brain: Counterintuitive but sometimes correct. Consider:
- A product page with 0 monthly search demand
- Thin content, no reviews
- Google already refuses to index it
- It consumes crawl budget
- It dilutes your site’s average quality
Noindexing it:
- Formalizes Google’s decision (no more wasted crawl attempts)
- Improves your indexed content ratio
- Lets you focus resources on pages with potential
Yuki: How do I know which pages have no search potential?
SDC SEO Brain: Cross-reference with keyword data:
- Export your “Crawled – not indexed” URLs
- For each product, check search volume for product name/SKU
- Check GSC for any impressions (even if not indexed, you might have impression data from brief indexing)
- Categorize:
| Category | Criteria | Recommendation |
|---|---|---|
| Has search demand | >50 monthly searches | Invest in content, push for indexing |
| Has some signals | 10-50 searches OR has impressions | Template enhancement |
| No demand evidence | 0 searches, 0 impressions, no potential | Consider noindex or consolidation |
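This categorization is a two-signal decision, easy to apply in bulk once search volume and impressions are joined onto each URL. A sketch using the table's thresholds (URLs with 1-9 searches and no impressions fall to the lowest bucket here; adjust if you read the table differently):

```python
def categorize(search_volume: int, impressions: int) -> str:
    """Bucket a non-indexed URL by search-demand evidence (thresholds from the table)."""
    if search_volume > 50:
        return "Invest in content, push for indexing"
    if search_volume >= 10 or impressions > 0:
        return "Template enhancement"
    return "Consider noindex or consolidation"

# Hypothetical SKUs with (monthly search volume, GSC impressions)
for url, (vol, imp) in {"/sku-1": (120, 0), "/sku-2": (15, 0), "/sku-3": (0, 0)}.items():
    print(url, "->", categorize(vol, imp))
```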
Yuki: What about pages Google crawled recently? Should I wait?
SDC SEO Brain: Check last crawl date in URL Inspection. If Google crawled within the last 2-3 weeks, it might still be processing. But if it crawled 6+ months ago and remains “Crawled – not indexed,” Google has made a decision. Waiting won’t change it without changes to the page.
Yuki: How long until I see results from fixes?
SDC SEO Brain: Timeline expectations:
| Fix Type | Time to Crawl | Time to Index Decision |
|---|---|---|
| Content enhancement | Days to weeks | 2-4 weeks after crawl |
| Internal linking improvement | Days to weeks | 2-4 weeks after crawl |
| Technical fix (removing blocks) | Days | 1-2 weeks |
| Site-wide quality improvement | Weeks to months | Months |
For your 50,000 page site, don’t expect to go from 28,000 to 50,000 indexed quickly. A more realistic goal:
- Month 1-2: Diagnose and categorize all non-indexed pages
- Month 2-4: Implement fixes for priority tier
- Month 4-6: Expand to secondary tier
- Month 6-12: Evaluate if long-tail pages should remain
Yuki: What about requesting indexing in GSC?
SDC SEO Brain: “Request indexing” triggers a crawl, but it doesn’t guarantee indexing. If your page is thin or low-quality, requesting indexing just makes Google confirm its decision faster.
Use “Request indexing” for:
- Genuinely updated pages where content has improved
- Critical pages you need crawled urgently
- Pages where you’ve fixed a technical issue
Don’t use it for:
- Mass submission of unchanged pages
- Pages you haven’t improved
- Trying to “force” indexing
Google’s daily request limit and the marginal impact make it a poor strategy for indexing problems at scale.
Diagnostic Cheat Sheet
Step 1: Identify the status
| GSC Status | Meaning | Primary Fix Direction |
|---|---|---|
| Discovered – not indexed | Google hasn't crawled yet | Improve crawl priority signals |
| Crawled – not indexed | Google crawled, decided not to index | Improve content quality/uniqueness |
| Duplicate, Google chose different canonical | Google consolidated to another URL | Check canonical setup |
| Blocked by robots.txt | Can't crawl due to block | Remove block if unintentional |
| Noindex detected | Page has noindex directive | Remove noindex if unintentional |
| Soft 404 | Page looks like error page | Add substantial content |
| Redirect | Page redirects elsewhere | Check redirect chain |
| Not found (404) | Page doesn't exist | Fix if should exist |
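With the cheat sheet as a lookup table, a GSC Pages export can be triaged in bulk. A sketch; the status strings below approximate GSC's export wording, which varies by locale and over time, so verify them against your own export:

```python
from collections import Counter

# Cheat-sheet lookup: GSC status -> primary fix direction (strings are assumptions)
FIX_DIRECTION = {
    "Discovered - currently not indexed": "Improve crawl priority signals",
    "Crawled - currently not indexed": "Improve content quality/uniqueness",
    "Duplicate, Google chose different canonical than user": "Check canonical setup",
    "Blocked by robots.txt": "Remove block if unintentional",
    "Excluded by 'noindex' tag": "Remove noindex if unintentional",
    "Soft 404": "Add substantial content",
    "Page with redirect": "Check redirect chain",
    "Not found (404)": "Fix if the page should exist",
}

def summarize(rows: list[dict]) -> dict:
    """Count URLs per status and attach the cheat-sheet fix direction."""
    counts = Counter(row["status"] for row in rows)
    return {status: {"count": n, "fix": FIX_DIRECTION.get(status, "Investigate manually")}
            for status, n in counts.most_common()}

print(summarize([{"status": "Soft 404"}, {"status": "Soft 404"},
                 {"status": "Page with redirect"}]))
```

The summary gives you a per-status worklist instead of 22,000 undifferentiated URLs.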
Step 2: For “Discovered – not indexed”
| Check | How | If Problem Found |
|---|---|---|
| Internal links | Screaming Frog inlinks report | Add more internal links |
| Click depth | Screaming Frog crawl depth | Flatten architecture |
| Robots.txt | Test URL in robots.txt tester | Remove unintentional blocks |
| Server logs | Check if Googlebot attempted crawl | Investigate failures |
| Site speed | Is server slow when Googlebot visits? | Improve server response |
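The server-log check above can be automated: filter the access log for Googlebot hits and see whether the stuck URLs were ever requested, and with what status codes. A sketch assuming the common/combined log format (note that rigorous verification should also confirm the client IP reverse-resolves to googlebot.com, which this skips since any crawler can fake the user-agent string):

```python
import re

# One combined-log-format line looks like:
#   IP - - [date] "GET /path HTTP/1.1" 200 5120 "referrer" "user-agent"
LOG_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[^"]+" (?P<status>\d{3}) .*"(?P<ua>[^"]*)"$'
)

def googlebot_hits(log_lines: list[str]) -> dict[str, list[int]]:
    """Map each path a Googlebot UA requested to the response codes it received."""
    hits: dict[str, list[int]] = {}
    for line in log_lines:
        m = LOG_RE.search(line)
        if m and "Googlebot" in m.group("ua"):
            hits.setdefault(m.group("path"), []).append(int(m.group("status")))
    return hits

lines = [
    '66.249.66.1 - - [01/Jan/2025:00:00:00 +0000] "GET /products/widget HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '10.0.0.1 - - [01/Jan/2025:00:00:01 +0000] "GET /products/widget HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]
print(googlebot_hits(lines))  # → {'/products/widget': [200]}
```

Stuck URLs that never appear in the output were likely never crawled; URLs appearing with 5xx codes point to server failures during crawl attempts.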
Step 3: For “Crawled – not indexed”
| Check | How | If Problem Found |
|---|---|---|
| Content depth | Compare to indexed competitors | Enhance content |
| Uniqueness | Search key sentences in quotes | Add unique value |
| Canonical | Check rendered canonical tag | Fix canonical issues |
| Quality signals | Manual review vs quality guidelines | Improve E-E-A-T signals |
| Duplicate variants | Check for near-duplicate pages | Consolidate or differentiate |
FAQ
Q: How long should I wait before considering a page permanently not indexed?
A: If a page has been “Crawled – not indexed” for 6+ months without changes, Google has made a stable decision. If “Discovered – not indexed” for 12+ months, crawl priority is severely low. Both require action, not waiting.
Q: Does requesting indexing help?
A: It triggers a crawl, not indexing. Useful for fixed pages, useless for pages that haven’t changed.
Q: If Google chose a different canonical, is my page indexed?
A: No, your page is not indexed. The canonical target is indexed. Your page contributes signals to the canonical but doesn’t appear in search results itself.
Q: Can too many non-indexed pages hurt my site overall?
A: Yes. High ratio of thin or low-quality pages sends negative site-wide signals. This is why strategic noindexing can improve overall performance.
Q: Why are my competitors’ thin pages indexed when mine aren’t?
A: Domain authority matters. High-authority sites get more pages indexed with less content. Building authority is a long-term solution.
Summary
Indexing problems require diagnosis before solutions. “Not indexed” has many causes with different fixes.
Separate the statuses:
- Discovered – not indexed = crawl priority problem
- Crawled – not indexed = quality/value problem
For discovery problems:
- Improve internal linking
- Reduce click depth
- Fix technical crawl barriers
- Reduce crawl waste elsewhere
For indexing rejections:
- Enhance content depth
- Add unique value (reviews, Q&A, guides)
- Differentiate from duplicates
- Consider strategic noindexing for no-value pages
Prioritize by business value:
- Not all pages deserve indexing investment
- Focus resources on pages with search demand and revenue potential
- Long-tail consolidation often better than page-by-page fixes
Timeline reality:
- Diagnosis: 1-2 weeks
- Implementation for priority pages: 1-2 months
- Visible improvements: 2-4 months
- Full remediation: 6-12 months for large sites
Sources
- Google Search Central: Why pages aren’t indexed – https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers
- Google: URL Inspection tool – https://support.google.com/webmasters/answer/9012289
- Google Search Central: Indexing coverage report – https://support.google.com/webmasters/answer/7440203