Google Web Crawler not indexing site correctly!
Moderator: General Support Moderators
Forum rules
Forum Rules
Absolute Beginner's Guide to Joomla! <-- please read before posting, this means YOU.
Forum Post Assistant - If you are serious about wanting help, you will use this tool to help you post.
-
- Joomla! Apprentice
- Posts: 12
- Joined: Thu Dec 19, 2013 1:23 am
Google Web Crawler not indexing site correctly!
The website I am referring to is a graduate student association website I developed for my department. It can be accessed here: http://gsa.ece.umd.edu/
However, I am completely lost when it comes to robots.txt and Google's web crawler. I am using Google Webmaster Tools, and I made sure that my current robots.txt file passes its testing tool. I checked almost every page, and they are all accessible to the crawler, or at least that is what Webmaster Tools reported under the "Fetch as Google" tab.
However, on Google search, when I search for the phrase "gsa.ece.umd.edu" I see the following:
Also, I receive these errors and warnings on the Google Webmaster page on the "Sitemap" tab:
The sitemap can be found at: http://gsa.ece.umd.edu/sitemap.xml
Oddly enough, those specific links SHOULD be accessible based on the website's current robots.txt file that can be found at: http://gsa.ece.umd.edu/robots.txt
Note that I have let almost a month pass since I set up this robots.txt file and sitemap. Still, the problem persists with Google's crawler. And forget about Bing and Yahoo! The website is nowhere to be seen!
Also, under the Configuration tab in the administrator backend, the Robots option is set to "Index, Follow".
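Besides robots.txt, each page can carry a robots meta tag, and a `noindex` value there keeps a page out of the index even when robots.txt allows crawling. A minimal sketch, using only Python's standard `html.parser`, that lists the robots directives found in a page's HTML so you can confirm no page carries a conflicting directive; the names `RobotsMetaFinder` and `robots_meta` are hypothetical, and fetching the page is left out:

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the content of every <meta name="robots"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, but not their values.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            self.directives.append(attributes.get("content", "").lower())


def robots_meta(html):
    """Return the robots directives declared in html (empty list if none)."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return finder.directives


# Hypothetical page head, for illustration only.
page = '<html><head><meta name="robots" content="index, follow" /></head></html>'
print(robots_meta(page))
```

An empty list means the page has no robots meta tag at all, in which case crawlers fall back to their default of indexing the page and following its links.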
So, uhm, I am seriously lost. What am I doing wrong and why is Google adamantly not indexing my site!?
- dhuelsmann
- Joomla! Master
- Posts: 19659
- Joined: Sun Oct 02, 2005 12:50 am
- Location: Omaha, NE
- Contact:
Re: Google Web Crawler not indexing site correctly!
I suggest you simply remove these two lines from robots.txt:
Allow: /index.php
Allow: /index.php*
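In Google's robots.txt syntax a rule is a prefix match, `*` stands for any sequence of characters, and a trailing `$` marks the end of the URL. A minimal sketch of that matching in Python (the helpers `rule_to_regex` and `rule_matches` are hypothetical, and this is not Google's own parser) shows that the two Allow lines above match exactly the same URLs:

```python
import re


def rule_to_regex(pattern):
    """Translate a robots.txt path pattern into a regular expression.

    '*' matches any sequence of characters and a trailing '$' anchors the
    end of the URL; without '$' the rule is a plain prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def rule_matches(pattern, path):
    # re.match anchors at the start of the path only, giving prefix semantics.
    return rule_to_regex(pattern).match(path) is not None


# Hypothetical Joomla-style paths, for illustration only.
paths = ["/index.php", "/index.php/events", "/index.php?option=com_content", "/images/logo.png"]
for path in paths:
    print(path, rule_matches("/index.php", path), rule_matches("/index.php*", path))
```

For every path the two columns agree, which is why Google documents a trailing `*` as ignored: `Allow: /index.php*` adds nothing to `Allow: /index.php`.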
Regards, Dave
Past Treasurer Open Source Matters, Inc.
Past Global Moderator
http://www.kiwaniswest.org
-
- Joomla! Apprentice
- Posts: 12
- Joined: Thu Dec 19, 2013 1:23 am
Re: Google Web Crawler not indexing site correctly!
Okay, I will try that. I am also trying to understand why Google's crawler behaves the way it does because of my choice of robots.txt. What is it about those two "Allow" lines that is messing things up?
Also, I will let you know in a few days whether this changes things. I hope it does, but if not, I'll be back here until I sort this out! It really matters to me that Google indexes this site correctly, since it is newly built and needs all the exposure it can get.
- dhuelsmann
- Joomla! Master
- Posts: 19659
- Joined: Sun Oct 02, 2005 12:50 am
- Location: Omaha, NE
- Contact:
Re: Google Web Crawler not indexing site correctly!
Regards, Dave
Past Treasurer Open Source Matters, Inc.
Past Global Moderator
http://www.kiwaniswest.org
-
- Joomla! Apprentice
- Posts: 12
- Joined: Thu Dec 19, 2013 1:23 am
Re: Google Web Crawler not indexing site correctly!
I am slightly overwhelmed by the information on the Google developers page. I am not a web developer by trade; it's just a hobby. So could you break it down for me a little?
Also, I made some minor changes to the robots.txt file, which now reads as follows:
Code:
Sitemap: http://gsa.ece.umd.edu/sitemap.xml
User-agent: *
Allow: /
Allow: /index.php/
Disallow: /administrator/
Disallow: /cache/
Disallow: /components/
Disallow: /images/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /media/
Disallow: /modules/
Disallow: /plugins/
Disallow: /templates/
Disallow: /tmp/
Disallow: /xmlrpc/
Does that make sense, or is the line "Allow: /" a big no-no? I am still blocking the other directories, after all.
EDIT: By the way, I am sure you all know that Joomla comes with a default robots.txt file. The default file was basically the following:
Code:
# If the Joomla site is installed within a folder such as at
# e.g. www.example.com/joomla/ the robots.txt file MUST be
# moved to the site root at e.g. www.example.com/robots.txt
# AND the joomla folder name MUST be prefixed to the disallowed
# path, e.g. the Disallow rule for the /administrator/ folder
# MUST be changed to read Disallow: /joomla/administrator/
#
# For more information about the robots.txt standard, see:
# http://www.robotstxt.org/orig.html
#
# For syntax checking, see:
# http://www.sxw.org.uk/computing/robots/check.html
User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /images/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /logs/
Disallow: /media/
Disallow: /modules/
Disallow: /plugins/
Disallow: /templates/
Disallow: /tmp/
This file caused more or less the same problem I face now. I had it up for nearly 2 months of this site's existence, from mid-December last year until the end of February. At the beginning of March I changed it to include a few lines as follows:
Code:
Allow: /index.php
Allow: /index.php*
Disallow: /administrator*
Disallow: /administrator/*
All in all, none of that changed the way Google crawled my page.
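On the question of whether `Allow: /` is a no-no: under the precedence rule Google documents (and RFC 9309 standardizes), the most specific rule, meaning the longest matching path pattern, decides, and Allow wins a tie. A minimal sketch of that evaluation in Python, assuming a single `User-agent: *` group and using hypothetical helper names, applied to an excerpt of the modified file:

```python
import re


def compile_rule(pattern):
    """Turn a robots.txt path pattern ('*' wildcard, optional trailing '$') into a regex."""
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in core.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def parse_rules(robots_txt):
    """Collect (allow, pattern) pairs.

    Simplifying assumption: the file has a single 'User-agent: *' group, so
    every Allow/Disallow line applies and all other fields are skipped.
    """
    rules = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field in ("allow", "disallow") and value:
            rules.append((field == "allow", value))
    return rules


def is_allowed(rules, path):
    """Longest matching pattern wins; Allow wins a tie; no match means allowed."""
    best = None
    for allow, pattern in rules:
        if compile_rule(pattern).match(path):
            candidate = (len(pattern), allow)  # True > False, so Allow wins ties
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


# Excerpt of the modified robots.txt from this post.
modified = """\
User-agent: *
Allow: /
Allow: /index.php/
Disallow: /administrator/
Disallow: /images/
"""
rules = parse_rules(modified)
# Hypothetical paths, for illustration only.
for path in ["/", "/index.php/events", "/administrator/index.php", "/images/logo.png"]:
    print(path, is_allowed(rules, path))
```

Here `/` and `/index.php/events` come out allowed while `/administrator/index.php` and `/images/logo.png` stay blocked: the longer Disallow patterns outrank `Allow: /`, so under this rule it does not reopen the blocked folders. Python's own `urllib.robotparser` applies the first matching line in file order instead and would report every path as allowed for this file, which is why the sketch implements the longest-match rule itself.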
- dhuelsmann
- Joomla! Master
- Posts: 19659
- Joined: Sun Oct 02, 2005 12:50 am
- Location: Omaha, NE
- Contact:
Re: Google Web Crawler not indexing site correctly!
On the Webmaster Tools Dashboard, click Crawl.
Click Blocked URLs to find out the conflicts with your sitemap vs your robots.txt file.
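The Blocked URLs report compares the sitemap with robots.txt, and the same comparison can be approximated offline. A minimal sketch using only Python's standard library: it reads the `<loc>` entries of a sitemap and asks `urllib.robotparser` which of them the rules forbid. The helper names and sample URLs are hypothetical. Note that `urllib.robotparser` applies the first matching line and does not implement Google's `*`/`$` wildcards, so its answers only line up with Google's for plain Disallow-only prefix rules like the sample here.

```python
import urllib.robotparser
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml):
    """Return every <loc> URL listed in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


def blocked_sitemap_urls(sitemap_xml, robots_txt, user_agent="Googlebot"):
    """List the sitemap URLs that robots_txt forbids for user_agent."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in sitemap_urls(sitemap_xml) if not parser.can_fetch(user_agent, url)]


# Hypothetical inputs: a Disallow-only robots.txt and a three-entry sitemap.
robots_txt = """\
User-agent: *
Disallow: /administrator/
Disallow: /images/
"""
sitemap_xml = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/index.php/events</loc></url>
  <url><loc>http://www.example.com/images/flyer.pdf</loc></url>
</urlset>
"""
print(blocked_sitemap_urls(sitemap_xml, robots_txt))
```

Any URL this prints is one the sitemap asks Google to crawl while robots.txt forbids it, the kind of conflict the sitemap warnings describe.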
Regards, Dave
Past Treasurer Open Source Matters, Inc.
Past Global Moderator
http://www.kiwaniswest.org
-
- Joomla! Apprentice
- Posts: 12
- Joined: Thu Dec 19, 2013 1:23 am
Re: Google Web Crawler not indexing site correctly!
dhuelsmann wrote:
On the Webmaster Tools Dashboard, click Crawl.
Click Blocked URLs to find out the conflicts with your sitemap vs your robots.txt file.
Ok, I just did that. I'd kinda tried it before and was puzzled by the results. I'll show them to you in the hope of getting some help:
And compare it with this gem I showed you earlier!
At this point, I have a strong suspicion that Google has a personal vendetta against my website and is being deliberately obtuse. Has anyone ever had such a suspicion before?
On a more serious note, have you ever heard of people running into this kind of strange behavior with search engine indexing? I cannot find anyone who has reported similar errors. Joomla's default robots.txt and the one I modified both lead to this result. Very strange. I love the flexibility that Joomla provides for content management, but the search indexing is getting a tad too strange.
- dhuelsmann
- Joomla! Master
- Posts: 19659
- Joined: Sun Oct 02, 2005 12:50 am
- Location: Omaha, NE
- Contact:
Re: Google Web Crawler not indexing site correctly!
Try the blocked URLs once again, only this time remove the Allow statements from your robots.txt.
Regards, Dave
Past Treasurer Open Source Matters, Inc.
Past Global Moderator
http://www.kiwaniswest.org
-
- Joomla! Fledgling
- Posts: 1
- Joined: Mon Apr 14, 2014 7:03 am
Re: Google Web Crawler not indexing site correctly!
Google's web crawler is a very important tool for SEO; it fetches websites for indexing.
Sometimes it takes a while to get indexed; otherwise, fetch the URL individually and it gets indexed quickly.
Last edited by pe7er on Mon Apr 14, 2014 8:34 am, edited 1 time in total.
Reason: Manual signature has been removed.
-
- Joomla! Apprentice
- Posts: 12
- Joined: Thu Dec 19, 2013 1:23 am
Re: Google Web Crawler not indexing site correctly!
dhuelsmann wrote:
Try the blocked URLs once again, only this time remove the Allow statements from your robots.txt.
Hi there,
Sorry, I got caught up in some important work, so I put this issue aside. Coming back to it now, I removed all the Allow lines and tested the URL again, and this time it simply said "Allowed" on that page.
I will update my robots.txt to just the following:
Code:
Sitemap: http://gsa.ece.umd.edu/sitemap.xml
User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /components/
Disallow: /images/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /media/
Disallow: /modules/
Disallow: /plugins/
Disallow: /templates/
Disallow: /tmp/
Disallow: /xmlrpc/
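Because the simplified file contains only Disallow rules, rule order and precedence no longer matter: any matching line blocks the URL. That makes it safe to check with Python's standard `urllib.robotparser`. A minimal sketch (the helper `allowed` and the sample paths are hypothetical; `site_maps()` needs Python 3.8 or later):

```python
import urllib.robotparser

SITE = "http://gsa.ece.umd.edu"

# The simplified robots.txt from this post, verbatim.
FINAL_ROBOTS_TXT = """\
Sitemap: http://gsa.ece.umd.edu/sitemap.xml
User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /components/
Disallow: /images/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /media/
Disallow: /modules/
Disallow: /plugins/
Disallow: /templates/
Disallow: /tmp/
Disallow: /xmlrpc/
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(FINAL_ROBOTS_TXT.splitlines())


def allowed(path, user_agent="Googlebot"):
    """True if the parsed rules let user_agent fetch SITE + path."""
    return parser.can_fetch(user_agent, SITE + path)


print(parser.site_maps())  # sitemap URLs declared in the file
# Hypothetical paths, for illustration only.
for path in ["/", "/index.php", "/administrator/", "/images/logo.png"]:
    print(path, allowed(path))
```

The home page and `/index.php` come out allowed, the two folder paths are blocked, and `site_maps()` returns the declared sitemap URL.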
-
- Joomla! Fledgling
- Posts: 3
- Joined: Fri Apr 18, 2014 7:52 am
- Location: South Africa
- Contact:
Re: Google Web Crawler not indexing site correctly!
Don't use a robots.txt file if you are not used to it; you might end up preventing your website from being indexed, like you did.
-
- Joomla! Apprentice
- Posts: 12
- Joined: Thu Dec 19, 2013 1:23 am
Re: Google Web Crawler not indexing site correctly!
After days of trial, it is the exact same issue. It is very strange. I remember not having any issues with the default Joomla robots.txt on a different website. On this one, every combination, including the default Joomla robots.txt, keeps my site from being indexed. Could there be some conflicting configuration, either in my Joomla installation or on the domain the site is hosted on, that I am just not aware of? Is something else in Joomla's settings causing this robots.txt to misbehave, even though on paper it should not?
Please provide your comments. In the meantime I will also approach the IT staff in my department and see if they can sort it out.
-
- Joomla! Apprentice
- Posts: 10
- Joined: Thu Dec 26, 2013 12:07 am
- Location: Indonesia
- Contact:
Re: Google Web Crawler not indexing site correctly!
dhuelsmann wrote:
I suggest you simply remove these two lines from robots.txt:
Allow: /index.php
Allow: /index.php*
If those two lines are not removed, Google will not crawl your links or your blog.