Google Web Crawler not indexing site correctly!

Discuss Search Engine Optimization in relation to Joomla! 2.5. This forum will also have discussions on SEF/SEO Joomla! 2.5 extensions.

litetaker
Joomla! Apprentice
Posts: 12
Joined: Thu Dec 19, 2013 1:23 am

Google Web Crawler not indexing site correctly!

Post by litetaker » Sat Apr 12, 2014 7:11 am

The website I am referring to is a Grad student association group website I developed for my department. It can be accessed here: http://gsa.ece.umd.edu/

However, I am completely lost with regard to robots.txt and Google's web crawler. I am using Google Webmaster Tools, and I made sure that my current robots.txt file passes the testing tool. I checked almost all pages, and they are all accessible to the crawler, or at least that is what Webmaster Tools reports under the "Fetch as Google" tab.

However, on Google search, when I search for the phrase "gsa.ece.umd.edu" I see the following:
Image

Also, I receive these errors and warnings on the "Sitemap" tab of the Google Webmaster Tools page:
Image
The sitemap can be found at: http://gsa.ece.umd.edu/sitemap.xml


Oddly enough, those specific links SHOULD be accessible based on the website's current robots.txt file that can be found at: http://gsa.ece.umd.edu/robots.txt

Note that almost a month has passed since I set up this robots.txt file and sitemap. The problem with Google's crawler still persists. And forget about Bing and Yahoo! The website is nowhere to be seen!

Also, under the Configuration tab in the administrator backend, the Robots option is set to "Index, Follow".

So, uhm, I am seriously lost. What am I doing wrong and why is Google adamantly not indexing my site!? :(

dhuelsmann
Joomla! Master
Posts: 19659
Joined: Sun Oct 02, 2005 12:50 am
Location: Omaha, NE

Post by dhuelsmann » Sat Apr 12, 2014 2:20 pm

I suggest you simply remove these two lines from robots.txt:
Allow: /index.php
Allow: /index.php*
Regards, Dave
Past Treasurer Open Source Matters, Inc.
Past Global Moderator
http://www.kiwaniswest.org

Post by litetaker » Sat Apr 12, 2014 4:48 pm

Okay, I will try doing that. I am also trying to learn why Google's crawler behaves the way it does with my robots.txt. What is it about those two "Allow" lines that is messing things up?

Also, I will let you know in a few days if this changes things. I hope it does, but if not I'll be back here till I sort this out! It is really valuable for me if Google can index this site correctly, as it is a newly built one and it needs all the exposure it can get.

Post by dhuelsmann » Sat Apr 12, 2014 5:21 pm


Post by litetaker » Sat Apr 12, 2014 5:35 pm

I am slightly overwhelmed by the information on the Google developers page. I am not a web developer by trade; it's just a hobby. So could you break it down for me a little?

Also, I made some minor changes to the robots.txt file to be as follows:

Sitemap: http://gsa.ece.umd.edu/sitemap.xml

User-agent: *
Allow: /
Allow: /index.php/
Disallow: /administrator/
Disallow: /cache/
Disallow: /components/
Disallow: /images/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /media/
Disallow: /modules/
Disallow: /plugins/
Disallow: /templates/
Disallow: /tmp/
Disallow: /xmlrpc/

Does that make sense, or is the line "Allow: /" a big no-no? I ask because I am still blocking the other directories.
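
For anyone puzzling over the same "Allow: /" question, here is a minimal sketch of the precedence rule Google documents for robots.txt: the matching rule with the longest path wins, and on a tie the less restrictive rule (Allow) is used. This is an illustration only, not Google's actual implementation; it handles plain path prefixes and ignores wildcards such as `*` and `$`. The function name `is_allowed` and the test paths are made up for the example, and the rule list is a shortened copy of the file above.

```python
# Sketch of "most specific (longest) matching rule wins" precedence.
# Plain path prefixes only; wildcards (* and $) are NOT handled.

# Shortened copy of the rules from the robots.txt posted above.
RULES = [
    ("allow", "/"),
    ("allow", "/index.php/"),
    ("disallow", "/administrator/"),
    ("disallow", "/images/"),
    ("disallow", "/templates/"),
]


def is_allowed(path, rules=RULES):
    """Return True if `path` may be crawled under longest-match precedence."""
    best_len = -1
    best_allow = True  # no matching rule at all means the path is allowed
    for kind, prefix in rules:
        if not path.startswith(prefix):
            continue
        allow = kind == "allow"
        # A longer matching prefix is more specific and wins; on a tie
        # the less restrictive rule (allow) is kept.
        if len(prefix) > best_len or (len(prefix) == best_len and allow):
            best_len = len(prefix)
            best_allow = allow
    return best_allow


# Hypothetical paths, for illustration only.
for path in ["/", "/index.php/about", "/images/logo.png", "/administrator/index.php"]:
    print(path, "->", "allowed" if is_allowed(path) else "blocked")
```

Under this rule, "Allow: /" looks redundant rather than harmful: a longer Disallow such as "/images/" still wins for anything inside that directory, and everything else is allowed by default anyway.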

EDIT: Btw, I am sure you guys all know that Joomla comes with a default setting for the robots.txt file. The default file was basically the following:

# If the Joomla site is installed within a folder such as at
# e.g. www.example.com/joomla/ the robots.txt file MUST be
# moved to the site root at e.g. www.example.com/robots.txt
# AND the joomla folder name MUST be prefixed to the disallowed
# path, e.g. the Disallow rule for the /administrator/ folder
# MUST be changed to read Disallow: /joomla/administrator/
#
# For more information about the robots.txt standard, see:
# http://www.robotstxt.org/orig.html
#
# For syntax checking, see:
# http://www.sxw.org.uk/computing/robots/check.html

User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /images/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /logs/
Disallow: /media/
Disallow: /modules/
Disallow: /plugins/
Disallow: /templates/
Disallow: /tmp/

This file, too, caused more or less the same problem I face now. I had it up for nearly two months of this site's existence, from mid-December last year until about the end of February. Around the beginning of March I changed it to include a few lines as follows:

Allow: /index.php
Allow: /index.php*
Disallow: /administrator*
Disallow: /administrator/*

All in all, none of that helped change the way Google crawled my page.

Post by dhuelsmann » Sat Apr 12, 2014 6:14 pm

On the Webmaster Tools Dashboard, click Crawl.
Click Blocked URLs to find out the conflicts with your sitemap vs your robots.txt file.
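
The same sitemap-versus-robots.txt comparison can be roughed out offline. Below is a minimal sketch using Python's standard `urllib.robotparser` and `xml.etree.ElementTree`. The robots.txt text is a shortened Disallow-only file, the sitemap and its `www.example.com` URLs are invented placeholders, and `blocked_sitemap_urls` is a hypothetical helper name. One caveat: Python's parser applies the first matching rule in file order, not Google's longest-match rule, so results can differ from Google's when Allow and Disallow lines are mixed.

```python
import urllib.robotparser
import xml.etree.ElementTree as ET

# Shortened Disallow-only robots.txt, similar to the Joomla default.
ROBOTS_TXT = """\
User-agent: *
Disallow: /administrator/
Disallow: /images/
Disallow: /templates/
"""

# Placeholder sitemap; these URLs are invented for illustration.
SITEMAP_XML = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/index.php/about</loc></url>
  <url><loc>http://www.example.com/images/banner.jpg</loc></url>
</urlset>
"""

# Sitemap elements live in this XML namespace.
LOC_TAG = "{http://www.sitemaps.org/schemas/sitemap/0.9}loc"


def blocked_sitemap_urls(robots_txt, sitemap_xml, user_agent="Googlebot"):
    """Return the sitemap URLs that the given robots.txt text disallows."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    root = ET.fromstring(sitemap_xml)
    urls = [loc.text.strip() for loc in root.iter(LOC_TAG)]
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


for url in blocked_sitemap_urls(ROBOTS_TXT, SITEMAP_XML):
    print("listed in sitemap but blocked by robots.txt:", url)
```

Any URL this prints is one the sitemap asks crawlers to visit while robots.txt tells them to stay away, which is the kind of conflict the Blocked URLs page is meant to surface.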

Post by litetaker » Sun Apr 13, 2014 7:10 pm

dhuelsmann wrote: On the Webmaster Tools Dashboard, click Crawl. Click Blocked URLs to find out the conflicts with your sitemap vs your robots.txt file.

OK, I just did that. I had tried it before and was puzzled by the results. Here they are, in case they help:

Image
Image

And compare it with this gem I showed you earlier!
Image

At this point, I have a strong suspicion that Google has a personal vendetta against my website and is being deliberately obtuse. :eek: Has anyone ever had such a suspicion before?

On a more serious note, have you ever heard of people seeing this kind of strange behavior with search engine indexing? I cannot find anyone who has reported similar errors. Joomla's default robots.txt and the one I modified both lead to this result. Very strange. I love the flexibility that Joomla provides for content management, but the search indexing is getting a tad too strange.

Post by dhuelsmann » Sun Apr 13, 2014 7:51 pm

Try the Blocked URLs test once again, only this time remove the Allow statements from your robots.txt.

KIRAN549
Joomla! Fledgling
Posts: 1
Joined: Mon Apr 14, 2014 7:03 am

Post by KIRAN549 » Mon Apr 14, 2014 7:13 am

Google's web crawler is a very important tool for SEO; it fetches websites for indexing.

Sometimes indexing simply takes time. Otherwise, fetch each URL individually and it will get indexed quickly.
Last edited by pe7er on Mon Apr 14, 2014 8:34 am, edited 1 time in total.
Reason: Manual signature has been removed.

Post by litetaker » Tue Apr 15, 2014 12:48 pm

dhuelsmann wrote: Try the blocked url's once again, only this time remove the Allow statements from your robots.txt.

Hi there,

Sorry, I got caught up in some important work, so I put this issue aside. Coming back to it now, I removed all the Allow lines and tested the URL again, and this time the page simply said "Allowed".

I will update my robots.txt to simply the following:

Sitemap: http://gsa.ece.umd.edu/sitemap.xml

User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /components/
Disallow: /images/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /media/
Disallow: /modules/
Disallow: /plugins/
Disallow: /templates/
Disallow: /tmp/
Disallow: /xmlrpc/

And see if this problem persists.

mamahadija
Joomla! Fledgling
Posts: 3
Joined: Fri Apr 18, 2014 7:52 am
Location: South Africa

Post by mamahadija » Fri Apr 18, 2014 8:16 am

Don't use a robots.txt file if you are not used to it; you might end up preventing your website from being indexed, as you did.

Post by litetaker » Thu Apr 24, 2014 4:22 am

After days of trying, it is the exact same issue. It is very strange. I remember not having issues with the default Joomla robots.txt on a different website. On this one, every combination, including the default Joomla robots.txt, results in my site not being indexed. Is it possible that there is some conflicting configuration, either in my Joomla installation or on the domain the site is hosted on, that I am just not aware of? Is there something else in Joomla's settings that makes this robots.txt misbehave even though on paper it should not?

Please share your comments. In the meantime I shall also approach the IT staff in my department and see if they can sort it out.
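
One thing that can be checked outside robots.txt is whether a page itself carries a "noindex" directive, either in a `<meta name="robots">` tag (presumably what the backend's Robots option controls) or in an `X-Robots-Tag` HTTP response header. Below is a minimal sketch that inspects a set of headers and an HTML string that have already been fetched. The names `RobotsMetaParser` and `noindex_signals` and the sample data are assumptions made up for illustration; nothing here was taken from the actual site.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the content of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key.lower(): (value or "") for key, value in attrs}
        if attr.get("name", "").lower() == "robots":
            self.directives.append(attr.get("content", "").lower())


def noindex_signals(headers, html):
    """Return a list describing every noindex directive that was found."""
    found = []
    # Header names are case-insensitive, so compare in lowercase.
    for name, value in headers.items():
        if name.lower() == "x-robots-tag" and "noindex" in value.lower():
            found.append("header: " + value)
    parser = RobotsMetaParser()
    parser.feed(html)
    for content in parser.directives:
        if "noindex" in content:
            found.append("meta: " + content)
    return found


# Invented sample data, for illustration only.
sample_headers = {"Content-Type": "text/html", "X-Robots-Tag": "noindex"}
sample_html = '<html><head><meta name="robots" content="index, follow" /></head></html>'
print(noindex_signals(sample_headers, sample_html))
```

An empty list means neither signal was present in the data checked; a non-empty list would point to a setting on the server or in the template that overrides what robots.txt allows.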

bolobolojogja
Joomla! Apprentice
Posts: 10
Joined: Thu Dec 26, 2013 12:07 am
Location: Indonesia

Post by bolobolojogja » Sun May 25, 2014 10:00 am

dhuelsmann wrote: I suggest you simply remove these two lines from robots.txt:
Allow: /index.php
Allow: /index.php*

If those two lines are not removed, Google will not crawl your links or blog.

