I understand that if I put an entry in robots.txt such as:
Disallow: /tmp/
then spiders will not crawl that directory. In older versions of Joomla, Disallow: /components/ was a standard line. I noticed (and I'm not sure when this happened) that this line no longer appears in the robots.txt of a vanilla Joomla install, and neither do the lines for /modules/ or /plugins/. I wonder why these were removed?
The issue I'm getting is that Google is now sending me emails about various component URLs being 404s - such as the URL https://www.domainname.com/component/convertforms. So I have been manually adding the "missing" entries (the 3 mentioned above) since I discovered this.
Today, Google sent me an email telling me these URLs are blocked by robots.txt:
https://www.domainname.com/component/convertforms
https://www.domainname.com/component/ajax/?format=json
Again, I know this is Google trying to be "helpful", but I already know what I've added to robots.txt and it's just an annoyance.
I doubt there's a way to stop Google from reporting "Blocked by robots.txt" (if there is, I'd love to know), but can anyone explain why /components/, /modules/, and /plugins/ were removed from a vanilla Joomla install?
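To see which URLs a given set of rules actually covers, the rules can be tested locally before waiting for Google's next email. The sketch below uses Python's standard `urllib.robotparser`; the rule set is an assumption (just the three lines named in the post, applied to all crawlers, not the poster's real file), `is_blocked` is a hypothetical helper, and the `/components/com_example/...` URL is made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed rule set: only the three lines mentioned in the post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /components/
Disallow: /modules/
Disallow: /plugins/
"""


def is_blocked(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text disallows `url` for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


# Disallow rules are matched as path prefixes, so the plural and singular
# forms of "component" are treated as unrelated paths.
for url in (
    "https://www.domainname.com/components/com_example/script.js",  # hypothetical
    "https://www.domainname.com/component/convertforms",
    "https://www.domainname.com/component/ajax/?format=json",
):
    print(url, "blocked" if is_blocked(ROBOTS_TXT, url) else "not blocked")
```

With these assumed rules, only the `/components/...` URL is blocked. Because matching is by prefix, `Disallow: /components/` does not cover the `/component/...` URLs from Google's emails, so if those are reported as blocked, some other rule in the real file would have to be matching them.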
Google and robots.txt
- trogladyte
- Joomla! Guru
- Posts: 635
- Joined: Sat May 03, 2008 9:27 pm
- Location: Phoenix, AZ
- Contact:
Google and robots.txt
Ian Shere - Phoenix Website Design & Hosting
http://www.citruskiwi.com
Survey Pilot - Surdex.com
“Learn from the mistakes of others. You won’t live long enough to make all of them yourself.”
- Webdongle
- Joomla! Master
- Posts: 44689
- Joined: Sat Apr 05, 2008 9:58 pm
Re: Google and robots.txt
A Google search for
Google from reporting "Blocked by robots.txt"
https://www.google.com/search?client=fi ... ots.txt%22
provided this AI result at the top:
When Google Search Console reports "Blocked by robots.txt", it means that Google was unable to crawl a URL on your website because of instructions in your robots.txt file. This can happen for a number of reasons, including:
- The robots.txt file is not configured correctly
- You accidentally blocked Googlebot from accessing the page
- You included a disallow directive in your robots.txt file
It's normal to prevent Googlebot from crawling some URLs, especially as your website gets bigger. However, improper use of disallow rules can severely damage a site's SEO.
To find the “Blocked by robots.txt” error in Google Search Console, you can:
- Go to the Pages section
- Click on the Not indexed section
To keep a URL out of the index entirely, you can use the "noindex" meta tag or HTTP header instead of blocking it in your robots.txt file; Googlebot has to be able to crawl the page in order to see the noindex rule.
http://www.weblinksonline.co.uk/
https://www.weblinksonline.co.uk/updating-joomla.html
"When I'm right no one remembers but when I'm wrong no one forgets".
- trogladyte
- Joomla! Guru
- Posts: 635
- Joined: Sat May 03, 2008 9:27 pm
- Location: Phoenix, AZ
- Contact:
Re: Google and robots.txt
I understand how to use robots.txt and what it is for. That really wasn't my question. I was interested in knowing why those particular lines had been removed from it, which is now causing this annoying issue.
All well and good if you have an actual page or menu item. In the 2 examples I quoted, neither exist so, other than what I did (Disallow: /components/) the only other option is also using robots.txt and adding:
Disallow: <specific URL>
Ian Shere - Phoenix Website Design & Hosting
http://www.citruskiwi.com
Survey Pilot - Surdex.com
“Learn from the mistakes of others. You won’t live long enough to make all of them yourself.”
-
- Joomla! Guru
- Posts: 551
- Joined: Sat Aug 20, 2005 3:15 pm
- Contact:
Re: Google and robots.txt
Hi,
The only actual good option is to do absolutely nothing. Having these links return 404s is 100% fine and totally expected as part of the natural life of a website. It won't hurt your SEO performance in any way.
So I'd suggest removing any additions you made to your robots.txt related to these pages.
One thing you could do is check why and where Google is finding these bad URLs on your site, but it's unlikely you'll be able to change the way ConvertForms operates, so I'm not sure it's worth the time.
4SEO, 4AI, 4Command, 4Podcast, 4Video, SEO and content extensions for Joomla 3, 4 & 5 - https://weeblr.com
I don't reply to PM anymore. Thanks for using our extensions.
-
- Joomla! Champion
- Posts: 6405
- Joined: Tue Aug 23, 2005 1:56 pm
- Location: South coast, UK
- Contact:
Re: Google and robots.txt
+1 @ shumisha
https://gadsolutions.biz Electrical services
https://electrical-testing-safety.co.uk Testing services
- Webdongle
- Joomla! Master
- Posts: 44689
- Joined: Sat Apr 05, 2008 9:58 pm
Re: Google and robots.txt
It does say that one of the causes of the "Blocked by robots.txt" report is:
You included a disallow directive in your robots.txt file
http://www.weblinksonline.co.uk/
https://www.weblinksonline.co.uk/updating-joomla.html
"When I'm right no one remembers but when I'm wrong no one forgets".
- Per Yngve Berg
- Joomla! Master
- Posts: 31344
- Joined: Mon Oct 27, 2008 9:27 pm
- Location: Romerike, Norway
Re: Google and robots.txt
Add "Allow: /components/ajax"
All CSS and JS should also be allowed.
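How an Allow exception interacts with a broader Disallow can be checked the same way. This is a minimal sketch, assuming a file that contains only the suggested Allow line plus `Disallow: /components/`; `can_crawl` is a hypothetical helper and the `com_example` path is invented. One caveat: Python's `urllib.robotparser` applies the first rule that matches and does not understand `*` or `$` wildcards, whereas Google picks the most specific (longest) matching rule, so the Allow line is listed first here to keep both interpretations in agreement.

```python
from urllib.robotparser import RobotFileParser

# Assumed rule set: one Allow exception ahead of the broader Disallow.
ROBOTS_WITH_EXCEPTION = """\
User-agent: *
Allow: /components/ajax
Disallow: /components/
"""


def can_crawl(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


for url in (
    "https://www.domainname.com/components/ajax",
    "https://www.domainname.com/components/com_example/helper.php",  # hypothetical
):
    print(url, "allowed" if can_crawl(ROBOTS_WITH_EXCEPTION, url) else "blocked")
```

Under these assumptions the ajax path stays crawlable while the rest of `/components/` remains blocked. Allowing CSS and JS by file extension would need wildcard patterns, which this standard-library parser cannot evaluate, so that part is better verified with Google's own robots.txt report in Search Console.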