How to Fix “Blocked by Robots.txt” in Google Search Console

TL;DR

“Blocked by robots.txt” means you told Google not to crawl specific URLs. Google obeys but can still index those URLs based on external signals like backlinks, showing them in search results with “No information is available for this page” descriptions. This is usually worse than not being indexed at all. The fix isn’t adding more robots.txt rules. It’s understanding that robots.txt blocks crawling (fetching content) while noindex blocks indexing (appearing in search results). Using the wrong tool creates the opposite of what you intended.


The Conversation

Hassan, a technical SEO at an e-commerce company whose site runs on Magento, discovered 15,000 URLs marked “Blocked by robots.txt” in GSC, many of them pages they want indexed: We have 15,000 URLs showing as “Blocked by robots.txt” in GSC. I checked our robots.txt and it’s blocking paths I don’t recognize. Some of these are product pages we definitely want indexed. How did this happen?

SDC SEO Brain: Magento’s default robots.txt is notoriously aggressive. It blocks a lot of paths by default to prevent crawling of admin pages, checkout flows, and internal search results. But it often catches legitimate pages too. Can you share what your robots.txt currently looks like for the product-related paths?

Hassan: There’s a line that says “Disallow: /*?”, which I think blocks all URLs with query parameters. But some of our product URLs have parameters.

SDC SEO Brain: That wildcard rule is extremely broad. “Disallow: /*?” blocks any URL containing a question mark, because the asterisk matches any characters before the “?”. That includes filtered category pages, paginated listings, product variants selected via parameters, search results, and any URL with tracking parameters. If your product URLs ever get linked with UTM parameters attached, those versions become blocked even though the base URL might not be.
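
To see how broad that is, here is a minimal sketch of Google-style pattern matching (an asterisk matches any run of characters), applied to a few hypothetical URLs and contrasting “Disallow: /?” with “Disallow: /*?”:

# Minimal sketch: how Google-style robots.txt patterns match URL paths.
# "*" matches any sequence of characters; otherwise a rule is a prefix
# match against path + query. Example URLs are hypothetical.
import re

def rule_matches(rule: str, path: str) -> bool:
    # Escape the rule, then restore the wildcard semantics.
    pattern = re.escape(rule).replace(r"\*", ".*").replace(r"\$", "$")
    return re.match(pattern, path) is not None

samples = [
    "/?utm_source=newsletter",          # query on the homepage
    "/shoes/nike-air?color=blue",       # product page with a parameter
    "/catalogsearch/result/?q=boots",   # internal search
    "/shoes/nike-air",                  # clean product URL
]

for rule in ("/?", "/*?"):
    blocked = [p for p in samples if rule_matches(rule, p)]
    print(f"Disallow: {rule!r} blocks {blocked}")

# "/?" blocks only paths that literally start with "/?" (the homepage query).
# "/*?" blocks every URL containing a question mark anywhere.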

Hassan: So Google can’t crawl any URL with parameters?

SDC SEO Brain: Correct. Google sees those URLs, attempts to crawl them, reads your robots.txt, and stops. Here’s the critical part: robots.txt blocks crawling, not indexing. Google can still index those URLs based on external signals. If someone links to your product page with “?color=blue” appended, Google might index that URL without ever seeing its content. The search result shows your URL with “No information is available for this page” because Google indexed the URL but was blocked from reading it.

Hassan: Wait, we have product pages appearing in search results with that “No information available” message?

SDC SEO Brain: Check GSC under Indexing, then Pages (the report formerly called Coverage) and look for the “Indexed, though blocked by robots.txt” status if it’s listed, or search Google directly using site:yoursite.com and look for results with blank descriptions. If you see them, those are pages Google indexed from links but couldn’t crawl due to robots.txt.
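
If you would rather check at scale than eyeball search results, the Search Console URL Inspection API can report crawl and index status per URL. A hedged sketch: it assumes an OAuth access token with the Search Console scope and a verified property, and the indexStatusResult field names used here (robotsTxtState, coverageState) are assumptions worth confirming against a live response:

# Hedged sketch: bulk-checking URLs with the Search Console URL Inspection API.
import requests

ACCESS_TOKEN = "ya29.example"            # hypothetical OAuth token
SITE = "sc-domain:yoursite.com"          # hypothetical verified property
ENDPOINT = "https://searchconsole.googleapis.com/v1/urlInspection/index:inspect"

def inspect(url: str) -> dict:
    resp = requests.post(
        ENDPOINT,
        headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
        json={"inspectionUrl": url, "siteUrl": SITE},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["inspectionResult"]["indexStatusResult"]

for url in ["https://yoursite.com/shoes/nike-air?color=blue"]:  # hypothetical URL
    status = inspect(url)
    # robotsTxtState: whether Googlebot is blocked; coverageState: the
    # human-readable status shown in the Pages report (assumed field names).
    print(url, status.get("robotsTxtState"), status.get("coverageState"))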

Hassan: I just searched. There are hundreds of results with no description. That looks terrible.

SDC SEO Brain: It is terrible. Users see your URL in search results with no snippet, no description, no indication of what the page contains. Click-through rates plummet. And if they do click, they land on a real page with real content that Google just wasn’t allowed to see. This is worse than not being indexed at all because you’re taking up search real estate with a poor experience.

Hassan: How do I fix this? Just remove the robots.txt rule?

SDC SEO Brain: Not that simple. You need to understand what that rule was trying to protect. Magento has that rule because without it, Googlebot can crawl thousands of parameter combinations: every filter, every sort option, every pagination state. That burns crawl budget and creates duplicate content. The rule exists for a reason. It’s just too broad.

Hassan: So I need a more specific rule?

SDC SEO Brain: You need a strategy. First, identify which parameter URLs should be indexed: possibly product pages with variant selectors (color, size), if each variant has unique content. Second, identify which should never be indexed: internal search results, filtered listings that duplicate category pages, checkout and cart URLs. Third, decide the right tool for each: robots.txt blocking for pages that shouldn’t even be crawled, noindex for pages that can be crawled but shouldn’t appear in search results, canonical tags for duplicate parameter versions that should consolidate to a master URL.

Hassan: What’s the difference between blocking with robots.txt versus using noindex?

SDC SEO Brain: Robots.txt says “don’t crawl this URL.” Google can’t fetch the page content. But Google might still index the URL based on external links.

Noindex says “you can crawl this, but don’t put it in search results.” Google fetches the content, reads the noindex directive, and excludes it from the index.

The key difference: noindex requires Google to crawl the page to see the instruction. Robots.txt prevents crawling entirely. If you block a page with robots.txt, Google never sees any noindex tag you put on it. The two directives can conflict.

Hassan: So if a page has both robots.txt blocking and a noindex tag, which wins?

SDC SEO Brain: Robots.txt wins in the sense that Google never sees the noindex. You’ve blocked Google from reading the page, so any instructions on the page are invisible. This is a common mistake: someone adds noindex to a page, it keeps appearing in search results, and they can’t figure out why. The reason is robots.txt is blocking Google from ever seeing the noindex.

Hassan: That might be happening to us. We added noindex to some pages months ago and they’re still indexed.

SDC SEO Brain: Check if those URLs match any robots.txt Disallow patterns. If they do, Google is blocked from seeing your noindex. You need to either remove the robots.txt block (so Google can crawl and see noindex) or use a different approach like the X-Robots-Tag HTTP header (which Google sees before the robots.txt check prevents the full crawl).

Hassan: Wait, HTTP headers get checked before robots.txt blocks the crawl?

SDC SEO Brain: No, I misspoke. Let me clarify the actual order: Google checks robots.txt first. If blocked, it stops and never makes the full HTTP request that would include headers. The X-Robots-Tag header doesn’t help if robots.txt already blocked the crawl. The only way to communicate noindex for a robots.txt-blocked URL is to remove the robots.txt block. There’s no workaround.
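
For Hassan’s earlier case (noindex added months ago, pages still indexed), the first check is whether the URLs actually serve a noindex that Google could obey. A minimal sketch that fetches each URL and looks for both forms of the directive, the robots meta tag and the X-Robots-Tag header; the URL list is hypothetical, and a production check should use a real HTML parser rather than a regex:

# Checks whether a URL serves a noindex directive in either of the two
# places Google reads it: the X-Robots-Tag HTTP header or a robots meta tag.
# Remember: neither matters if robots.txt blocks the crawl.
import re
import requests

urls = [
    "https://yoursite.com/shoes/nike-air?color=blue",  # hypothetical
]

META_RE = re.compile(
    r'<meta[^>]+name=["\']robots["\'][^>]+content=["\']([^"\']*)["\']',
    re.IGNORECASE,
)

for url in urls:
    resp = requests.get(url, timeout=30)
    header = resp.headers.get("X-Robots-Tag", "")
    meta = META_RE.search(resp.text)
    meta_content = meta.group(1) if meta else ""
    has_noindex = "noindex" in header.lower() or "noindex" in meta_content.lower()
    print(f"{url}\n  header: {header or '-'}\n  meta:   {meta_content or '-'}\n"
          f"  noindex served: {has_noindex}")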

Hassan: So for our 15,000 blocked URLs, what’s the action plan?

SDC SEO Brain: Step one: Export all blocked URLs from GSC. Categorize them by pattern. Product pages with parameters, category filters, internal search, admin paths, etc.

Step two: For each category, decide the correct handling. Product pages you want indexed: remove the robots.txt block and add a canonical to the non-parameter version if appropriate. Pages you never want indexed or crawled (admin, checkout, internal search): keep the robots.txt block, and accept they might appear with no description if heavily linked.

Step three: For the “indexed without content” problem, you have two options. Remove robots.txt block so Google can crawl and index properly, or use Google’s URL Removal Tool to request temporary removal of the broken listings (but this is temporary and requires fixing the underlying cause).
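
Step one’s categorization can be scripted. A minimal sketch, assuming the blocked URLs were exported to a CSV with a URL column; the bucket patterns are hypothetical and need tuning to your own URL structure:

# Buckets a GSC export of "Blocked by robots.txt" URLs by pattern so each
# group can get the right treatment (unblock, noindex, canonical, or keep blocked).
import csv
import re
from collections import defaultdict

BUCKETS = [  # (label, regex): hypothetical patterns, adjust to your site
    ("internal search",   re.compile(r"[?&]q=")),
    ("filters / sorting", re.compile(r"[?&](filter|sort)\w*=")),
    ("checkout / cart",   re.compile(r"^/(checkout|cart)/")),
    ("tracking params",   re.compile(r"[?&]utm_")),
    ("product + params",  re.compile(r"^/[\w/-]+\?")),
]

groups = defaultdict(list)
with open("blocked_urls.csv", newline="") as f:  # GSC export, "URL" column assumed
    for row in csv.DictReader(f):
        path = re.sub(r"^https?://[^/]+", "", row["URL"])  # strip scheme + host
        label = next((name for name, rx in BUCKETS if rx.search(path)), "uncategorized")
        groups[label].append(path)

for label, paths in sorted(groups.items(), key=lambda kv: -len(kv[1])):
    print(f"{len(paths):>6}  {label}")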

Hassan: What about the “Disallow: /*?” rule specifically? Should I remove it entirely?

SDC SEO Brain: Don’t remove it entirely. Replace it with more specific rules. Block specific parameter patterns you definitely don’t want crawled:

Disallow: /*?q= (internal search)
Disallow: /*filter= (filter combinations)
Disallow: /*sort= (sort options)
Disallow: /checkout/
Disallow: /cart/

Allow the parameters that matter:
Allow: /*color=
Allow: /*size=

Rule order doesn’t matter to Googlebot. The most specific (longest) matching rule wins, and when an Allow and a Disallow match with equal specificity, the Allow wins.

Hassan: How do I know if I’ve gotten it right?

SDC SEO Brain: Test before deploying. Use the robots.txt report in GSC (under Settings) to confirm which file Google has fetched, and the URL Inspection tool, or a standalone robots.txt parser, to check whether specific URLs are blocked or allowed. Test your product URLs with various parameter combinations. Test your filter and search URLs to ensure they’re still blocked. Test edge cases like UTM parameters on product pages.
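
For bulk testing of a proposed file, you can run candidate URLs through a parser that implements Google’s wildcard and precedence rules, for example the third-party protego package (pip install protego). A minimal sketch using the replacement rules above and hypothetical URLs, assuming protego’s Protego.parse / can_fetch interface:

# Dry-runs a proposed robots.txt against representative URLs before deploying.
from protego import Protego

PROPOSED = """
User-agent: *
Disallow: /*?q=
Disallow: /*filter=
Disallow: /*sort=
Disallow: /checkout/
Disallow: /cart/
Allow: /*color=
Allow: /*size=
"""

rp = Protego.parse(PROPOSED)

test_urls = [  # hypothetical URLs; swap in real ones from your GSC export
    "https://yoursite.com/shoes/nike-air?color=blue",
    "https://yoursite.com/shoes/nike-air?utm_source=newsletter",
    "https://yoursite.com/?q=running+shoes",
    "https://yoursite.com/shoes?sort=price",
    "https://yoursite.com/shoes?color=blue&sort=price",  # Allow vs Disallow edge case
    "https://yoursite.com/checkout/",
]

for url in test_urls:
    verdict = "allowed" if rp.can_fetch(url, "Googlebot") else "BLOCKED"
    print(f"{verdict:>8}  {url}")

Compare the output against what you intend for each pattern before you deploy.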

Hassan: After I deploy the fix, how long until the “indexed without content” problem resolves?

SDC SEO Brain: Two phases. First, Google needs to recrawl the now-unblocked URLs and fetch their actual content. For an active e-commerce site, this might take one to four weeks for most URLs. Second, Google updates its index to replace the “no information” entries with real snippets. Total timeline: typically four to eight weeks to see significant improvement. You can speed up critical URLs by requesting indexing in URL Inspection.

Hassan: One more thing. We have some staging URLs that somehow got indexed. They’re blocked by robots.txt now, but they still appear in search. How do I get rid of those?

SDC SEO Brain: Staging URLs are a special problem. If they’re truly on a staging domain, that domain should have a blanket robots.txt block and ideally require authentication. For already-indexed staging URLs, you need the URL Removal Tool in GSC. Request temporary removal. But if robots.txt still blocks them, they might reappear later because Google can’t see any instruction to stay away permanently. The real fix is: staging environments should never be publicly accessible without authentication.

Hassan: They’re on a subdomain that was accidentally public for a while. We’ve added password protection now.

SDC SEO Brain: Good. The password protection (401/403 response) will eventually lead to de-indexing as Google repeatedly fails to access the URLs. But “eventually” could be months. For faster removal: temporarily remove robots.txt blocking on the staging subdomain, add noindex meta tags to all pages, let Google crawl and see the noindex, then re-add password protection. The noindex instruction gets processed and those URLs drop from the index within days to weeks.
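
One way to implement the “serve noindex everywhere” step without touching every template is at the application layer. A minimal, framework-agnostic sketch as WSGI middleware; the placeholder app stands in for the real staging application:

# WSGI middleware that adds "X-Robots-Tag: noindex" to every response, so the
# whole staging host signals noindex while it is temporarily crawlable.
from wsgiref.simple_server import make_server

def noindex_everywhere(app):
    def middleware(environ, start_response):
        def start_with_header(status, headers, exc_info=None):
            headers = [h for h in headers if h[0].lower() != "x-robots-tag"]
            headers.append(("X-Robots-Tag", "noindex, nofollow"))
            return start_response(status, headers, exc_info)
        return app(environ, start_with_header)
    return middleware

def placeholder_app(environ, start_response):
    # Stand-in for the real staging application.
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<html><body>staging page</body></html>"]

if __name__ == "__main__":
    make_server("", 8000, noindex_everywhere(placeholder_app)).serve_forever()

Once the staging URLs start dropping out of the index, re-enable the password protection and the middleware becomes redundant.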


FAQ

Q: What’s the difference between “blocked by robots.txt” and “excluded by noindex”?
A: Robots.txt blocks crawling: Google can’t fetch the page content. Noindex blocks indexing: Google can fetch the page but won’t add it to search results. Critically, if robots.txt blocks a URL, Google never sees any noindex tag on that page. You can have URLs blocked by robots.txt that still appear in search results with “No information available” descriptions because Google indexed them from external links without seeing the content.

Q: Why do pages blocked by robots.txt still appear in Google search results?
A: Google can index URLs without crawling them based on external signals like backlinks. If many sites link to a URL, Google may add it to the index even if robots.txt prevents crawling. These listings show “No information is available for this page” because Google knows the URL exists but can’t access the content to generate a description.

Q: How do I remove pages that are “indexed though blocked by robots.txt”?
A: Two options. First, remove the robots.txt block so Google can crawl, then add noindex if you don’t want the page in search results. Google will see the noindex and remove the listing. Second, use the URL Removal Tool in GSC for temporary removal, but this is only temporary and doesn’t solve the underlying issue. The first approach is the permanent fix.

Q: Can I use both robots.txt and noindex on the same URL?
A: Technically yes, but they conflict. If robots.txt blocks the URL, Google never crawls it and never sees the noindex tag. The noindex is ignored because Google can’t access the page to read it. If you want to use noindex, you must allow crawling in robots.txt so Google can fetch the page and see the directive.

Q: How do I test robots.txt changes before deploying?
A: Use the robots.txt report in GSC (under Settings) to confirm the file Google sees, and the URL Inspection tool or a standalone robots.txt parser to check whether specific URLs are blocked or allowed under current and proposed rules. Test product URLs, parameter combinations, filter URLs, and edge cases. Test both URLs you want blocked and URLs you want crawled. Deploy only after confirming expected behavior for all critical URL patterns.


Summary

Robots.txt blocks crawling, not indexing. When you Disallow a URL pattern, Google can’t fetch the page content but might still index the URL based on external signals like backlinks. This creates search results showing your URLs with “No information is available for this page” descriptions, which is often worse than not being indexed at all.

“Disallow: /*?” is dangerously broad. This common rule blocks all URLs containing query parameters, including product variants, legitimate filtered pages, and any URL with tracking parameters. Magento and other platforms include aggressive default rules that often catch pages you want indexed.

Noindex requires crawling to work. If robots.txt blocks a URL, Google never sees any noindex tag on that page. The two directives conflict: robots.txt prevents Google from ever reading page-level instructions. Many “why is my noindex not working” problems trace back to robots.txt blocking the crawl.

Replace broad rules with specific patterns. Instead of blocking all parameters, block specific parameter patterns you don’t want crawled (internal search, filters, sort options) and explicitly allow parameters you do want crawled (product variants). Order doesn’t matter in Google’s robots.txt processing: the most specific (longest) matching rule wins, and when an Allow and a Disallow match equally, the Allow wins.

Test before deploying. Use GSC’s robots.txt report and URL Inspection tool, or a standalone robots.txt parser, to verify which URLs are blocked and which are allowed. Test product URLs with various parameters, filter URLs, and edge cases. Mistakes in robots.txt affect thousands of URLs and take weeks to recover from.

Recovery takes four to eight weeks. After fixing robots.txt, Google must recrawl previously-blocked URLs to fetch actual content, then update its index to replace “no information” entries. Request indexing for critical URLs in URL Inspection to speed up the process for high-priority pages.

Staging environments need authentication, not just robots.txt. Accidentally public staging URLs get indexed and are difficult to remove. Password protection (401/403) eventually leads to de-indexing but slowly. For faster removal, temporarily allow crawling, add noindex, let Google process it, then re-add authentication.


Sources