The Joomla! Forum ™



This topic is locked, you cannot edit posts or make further replies.  [ 24 posts ]

Posted: Thu Apr 01, 2010 12:24 am
Joomla! Ace

Joined: Tue Sep 06, 2005 11:18 am
Posts: 1365
Location: Germany
Hi everybody,

I'd like to introduce a good robots.txt for Joomla: one that, I think, doesn't tell every damn bot out there that you are using Joomla, the way the standard Joomla robots.txt does.

So what do you have to do?

Simple:
Generate a sitemap.xml with a tool like this:

http://www.xml-sitemaps.com/

Save the generated sitemap.

DOUBLE-check in the sitemap file which links you want search robots like Google and co. to pick up.
Delete everything from your sitemap file that you don't want listed.
Copy the sitemap into your web root.

Then paste the following into your robots.txt:

Code:
Sitemap: http://www.yoursite.net/sitemap.xml

User-agent: *
Allow: /sitemap.xml
Allow: /index.php
Allow: /index.html
Allow: /index.htm
Disallow: /

User-agent: Googlebot-Image
Disallow: /
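
To see what these rules actually permit, a minimal sketch with Python's standard `urllib.robotparser` can help. The hostname is the placeholder from the snippet above, the test paths are made up for illustration, and real crawlers may apply their own matching rules, so treat the result as an approximation.

```python
from urllib.robotparser import RobotFileParser

# The rules exactly as posted above.
ROBOTS_TXT = """\
Sitemap: http://www.yoursite.net/sitemap.xml

User-agent: *
Allow: /sitemap.xml
Allow: /index.php
Allow: /index.html
Allow: /index.htm
Disallow: /

User-agent: Googlebot-Image
Disallow: /
"""

def check(robots_txt, agent, paths, base="http://www.yoursite.net"):
    """Return {path: bool} saying whether `agent` may fetch each path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(agent, base + path) for path in paths}

# Hypothetical paths, chosen only to exercise each rule.
PATHS = ["/", "/index.php", "/sitemap.xml", "/administrator/"]
result = check(ROBOTS_TXT, "Googlebot", PATHS)
for path, ok in result.items():
    print(f"{path:20} {'allowed' if ok else 'blocked'}")
```

Note that the bare `/` matches only `Disallow: /`, so the home page itself is reported as blocked unless it is requested as `/index.php` or one of the other allowed names.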

_________________
http://www.schrammen.net


Posted: Thu Apr 01, 2010 12:48 am
Joomla! Master

Joined: Fri Aug 12, 2005 12:38 am
Posts: 13379
Location: Sydney - Australia
That is a clever method. Thanks for sharing.

_________________
Brad Baker - Follow me on Google+
http://www.rochen.com - Joomla! Hosting, the correct way.
http://www.joomlatutorials.com <-- Joomla Help & Tutorials
^Now with Joomla 2.5 and Joomla 3.0 Tutorials


Posted: Thu Apr 01, 2010 12:50 am
Joomla! Master

Joined: Mon Mar 20, 2006 1:56 am
Posts: 11644
Location: The Girly Side of Joomla in Sussex
Can this be copied and stickied over at administration? With a flashing beacon?

_________________
HU2HY- Poor questions = Poor answer
Un requested Help PM's will be added to the foe list and possibly just deleted
{Community.Connect Administrator }{ Showcase & Security Moderator}


Posted: Thu Apr 01, 2010 1:28 am
Joomla! Apprentice

Joined: Tue May 22, 2007 4:15 am
Posts: 9
Location: Australia
So is there a disadvantage to telling every "damn bot outsite that u are using joomla" ?


Posted: Thu Apr 01, 2010 8:26 am
Joomla! Ace

Joined: Tue Sep 06, 2005 11:18 am
Posts: 1365
Location: Germany
hiddensphinx wrote:
So is there a disadvantage to telling every "damn bot outsite that u are using joomla" ?


Yes, it's like the fly thing...

[edit: sorry, what I wrote before did not make any sense, so:]
All the script kiddies who like targeting Joomla can find a good starting point for automatic scanning.
And if the bot finds anything useful, it will carry on with further checks.
[/edit]

_________________
http://www.schrammen.net


Last edited by fw116 on Thu Apr 01, 2010 1:25 pm, edited 1 time in total.

Posted: Thu Apr 01, 2010 8:46 am
Joomla! Exemplar

Joined: Fri Aug 12, 2005 7:19 am
Posts: 9206
Location: Leeds, UK
Why would you want to disallow Googlebot-Image?

Of course, if Joomla people adopt this robots.txt, then I can still identify your site as Joomla. Not that there aren't a hundred different methods for not only seeing that a site is Joomla but also finding the exact Joomla version. Sorry, but this really serves no purpose.

_________________
"Exploited yesterday... Hacked tomorrow"
Blog http://brian.teeman.net/
Joomla Hidden Secrets http://hiddenjoomlasecrets.com/


Posted: Thu Apr 01, 2010 9:00 am
Joomla! Guru

Joined: Tue Sep 04, 2007 3:16 pm
Posts: 749
Location: Ohio
It's possible that some bots looking for Joomla sites are only written to check robots.txt. If this file keeps a bot from recognizing the software, the creator of that bot isn't going to double-check the sites the bot reports as not Joomla, since there are millions of other Joomla sites out there that the bot will find easily. This isn't a foolproof method of keeping people from seeing what software you use, but it could keep a few maliciously built bots out.

_________________
AnotherGuy's Weblog - http://anotherguy.us/


Posted: Thu Apr 01, 2010 9:05 am
Joomla! Exemplar

Joined: Fri Aug 12, 2005 7:19 am
Posts: 9206
Location: Leeds, UK
If you look at the bots out there (and I've seen a lot of them), they are all using other methods to determine whether a site is Joomla and the exact version of Joomla. The value for a script kiddie is in finding the version of the product, not just the product.

_________________
"Exploited yesterday... Hacked tomorrow"
Blog http://brian.teeman.net/
Joomla Hidden Secrets http://hiddenjoomlasecrets.com/


Posted: Thu Apr 01, 2010 12:44 pm
Joomla! Ace

Joined: Tue Sep 06, 2005 11:18 am
Posts: 1365
Location: Germany
brian wrote:
Why would you want to disallow the googlebot-image

Of course if joomla people adopt this robots.txt then i can still identify your site as joomla - not that there aren't a hundred different methods for not only seeing that a site is Joomla but finding the exact joomla version. Sorry but this really serves no purpose.


That's not the point...

Sure, there are a million ways to check that your site is a Joomla site... but why should I give all the people and bots a first welcome message?

If I look into the server logs, the steps go like this:
GET / HTTP/1.1" 200

If this one isn't very useful (no IIS, no complete Apache version number, whatever...),
then shortly afterwards the request is:

GET /robots.txt HTTP/1.1

This is the first contact for the bots scanning your site, and IF such bots don't find anything useful at first sight, they'll move on.

This doesn't mean that they won't come back, but your site is out of the scanning front line.


The Googlebot-Image bot is blocked because I don't want to have all my images and stuff in Google. That's all.
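
The request sequence described above (a hit on `/` followed by `/robots.txt`) could be looked for in an access log roughly like this. This is only a sketch: it assumes Apache's common/combined log format, and the IP addresses and log lines are invented sample data, not taken from a real server.

```python
import re

# Start of an Apache common/combined log line (assumed format).
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)

def robots_probers(lines):
    """Return the set of client IPs whose request for "/" was
    immediately followed (per client) by a request for "/robots.txt"."""
    last_path = {}
    probers = set()
    for line in lines:
        match = LOG_RE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        ip, path = match["ip"], match["path"]
        if path == "/robots.txt" and last_path.get(ip) == "/":
            probers.add(ip)
        last_path[ip] = path
    return probers

# Invented sample data for illustration only.
SAMPLE = [
    '203.0.113.7 - - [01/Apr/2010:12:40:01 +0200] "GET / HTTP/1.1" 200 5120',
    '203.0.113.7 - - [01/Apr/2010:12:40:02 +0200] "GET /robots.txt HTTP/1.1" 200 310',
    '198.51.100.4 - - [01/Apr/2010:12:41:10 +0200] "GET /index.php HTTP/1.1" 200 8040',
]
print(sorted(robots_probers(SAMPLE)))
```

Legitimate crawlers fetch robots.txt too, so a match here only narrows down which clients are worth a closer look.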

_________________
http://www.schrammen.net


Posted: Fri Apr 02, 2010 8:59 am
Joomla! Ace

Joined: Sat Oct 21, 2006 8:53 am
Posts: 1334
So what you're actually saying is that this is part of a defence strategy against initial bot perusal. If this is a good method, then it needs to be put with the general strategy recommendations for site security. Is there such a set of recommendations? And would this be built into the Joomla! CMS download?

Come to think of it I have the "inverse" robots.txt:

User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /components/
Disallow: /images/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /media/
Disallow: /modules/
Disallow: /plugins/
Disallow: /templates/
Disallow: /tmp/
Disallow: /xmlrpc/

?
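
For comparison, a minimal sketch (again using Python's standard `urllib.robotparser`) of how this blacklist-style file behaves. The hostname and test paths are placeholders, and the `revealed` list simply shows what anyone reading the file learns about the directory layout.

```python
from urllib.robotparser import RobotFileParser

# Directory names taken from the robots.txt in the post above.
DIRS = ["administrator", "cache", "components", "images", "includes",
        "installation", "language", "libraries", "media", "modules",
        "plugins", "templates", "tmp", "xmlrpc"]

ROBOTS_TXT = "User-agent: *\n" + "".join(f"Disallow: /{d}/\n" for d in DIRS)

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(path, agent="Googlebot", base="http://www.example.com"):
    """True if `agent` may fetch `path` (placeholder host, illustration only)."""
    return parser.can_fetch(agent, base + path)

# Everything a visitor learns just by reading the file.
revealed = [line.split(":", 1)[1].strip()
            for line in ROBOTS_TXT.splitlines()
            if line.lower().startswith("disallow:")]

for path in ["/", "/index.php", "/administrator/index.php", "/images/logo.png"]:
    print(f"{path:28} {'allowed' if allowed(path) else 'blocked'}")
print("paths disclosed by the file:", revealed)
```

Here the home page stays crawlable, but the file itself lists the directory structure, which is the trade-off being discussed in this thread.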

_________________
Thanks for your time.


Posted: Fri Apr 02, 2010 2:28 pm
Joomla! Ace

Joined: Tue Sep 06, 2005 11:18 am
Posts: 1365
Location: Germany
cantthinkofanickname wrote:
So what you're actually saying that this is part of a defence strategy against initial bot perusal. If this is a good method then it needs to be put with the general strategy recomendations for site security. Is there such a set of recomendations? And would this be built into the Joomla! CMS download?

Come to think of it I have the "inverse" robots.txt:

User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /components/
Disallow: /images/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /media/
Disallow: /modules/
Disallow: /plugins/
Disallow: /templates/
Disallow: /tmp/
Disallow: /xmlrpc/

?


Well, I would use my robots.txt, because you deny everything except the sitemap (where you can place the links you want to have listed in Google, Bing, whatever...).

And you can add anything else you'd like to allow...

So YOU decide what gets listed and where the robot looks...

And yes, it should replace the standard Joomla robots.txt.

_________________
http://www.schrammen.net


Posted: Fri Apr 02, 2010 4:52 pm
Joomla! Ace

Joined: Sat Oct 21, 2006 8:53 am
Posts: 1334
And of course, the bot can ignore the robots.txt if it likes?

_________________
Thanks for your time.


Posted: Fri Apr 02, 2010 5:08 pm
Joomla! Ace

Joined: Tue Sep 06, 2005 11:18 am
Posts: 1365
Location: Germany
cantthinkofanickname wrote:
And of course, the bot can ignore the robots.txt if it likes?


Exactly... but such bots should be filtered with something like fail2ban, mod_security and so on...

_________________
http://www.schrammen.net


Posted: Wed Jan 26, 2011 3:27 pm
Joomla! Fledgling

Joined: Mon Nov 09, 2009 11:20 pm
Posts: 3
Thanks for the suggestions with the robots.txt file. I used your info and just went to my Google Webmaster Tools to check the indexing. Now something came up, and I am not sure if I should be concerned about it, but I think I may need to be (btw, I am an SEO noob).

So the robots.txt status is as follows:

200 (Success) Googlebot is blocked from http://www.deitynyc.com/

What concerns me is that, with your robots.txt setup, it is coming back saying Googlebot is blocked from my site... ???

Any info, suggestions, etc. on this?

Should I change the robots.txt file?

Thanks!!


Posted: Wed Jan 26, 2011 5:55 pm
Joomla! Ace

Joined: Tue Sep 06, 2005 11:18 am
Posts: 1365
Location: Germany
No, you don't need to change this.
Below this message you should see the crawler result listed with something like:
Line 1: Sitemap: www.yourdomain.net valid sitemap reference found
or similar.

So if you have an XML sitemap with your content listed in Webmaster Tools, everything is OK...

_________________
http://www.schrammen.net


Posted: Wed Jan 26, 2011 6:00 pm
Joomla! Fledgling

Joined: Mon Nov 09, 2009 11:20 pm
Posts: 3
Ok, great!

Thanks again!


Posted: Tue May 31, 2011 4:55 pm
Joomla! Apprentice

Joined: Fri Jun 19, 2009 1:08 am
Posts: 8
Thanks for this useful tip. One question/challenge: after using this robots.txt file I am getting a crawler error saying that the main page, http://totherescuedfw.com/, can't be accessed. Adding an explicit statement to allow crawling of that link has the effect of opening the entire site to the crawler.

I saw your comment above, but I'm not sure I understand it. My line 1 is:
Sitemap: http://totherescuedfw.com/sitemap.xml

I'd appreciate your comments.
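
One possible way around this, assuming the crawler follows the longest-match rule and the `$` end-of-URL anchor (as described in RFC 9309 and in Google's robots.txt documentation), is a rule such as `Allow: /$`, which matches only the bare home page. The sketch below is a hypothetical, simplified matcher written to illustrate that rule; it is not any crawler's actual code, and crawlers that do not support `$` will behave differently.

```python
import re

def _to_regex(pattern):
    """Translate a robots.txt path pattern ('*' wildcard, optional
    trailing '$' anchor) into a compiled regular expression."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_allowed(rules, path):
    """rules: list of ("allow" | "disallow", pattern) for one user-agent group.

    The longest matching pattern wins; on a tie "allow" wins;
    if nothing matches, the path is allowed.
    """
    best_len, allowed = -1, True
    for kind, pattern in rules:
        if not pattern or not _to_regex(pattern).match(path):
            continue
        length = len(pattern)
        if length > best_len or (length == best_len and kind == "allow"):
            best_len, allowed = length, (kind == "allow")
    return allowed

# The rules from the first post plus the hypothetical "Allow: /$" line.
RULES = [
    ("allow", "/$"),
    ("allow", "/sitemap.xml"),
    ("allow", "/index.php"),
    ("disallow", "/"),
]

for path in ["/", "/sitemap.xml", "/index.php", "/administrator/", "/images/logo.png"]:
    print(path, "->", "allowed" if is_allowed(RULES, path) else "blocked")
```

Under these assumptions a plain `Allow: /` would tie with `Disallow: /` and open the whole site, which matches the behaviour described in this post, while `Allow: /$` opens only the home page.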


Posted: Tue Dec 27, 2011 4:33 am
Joomla! Apprentice

Joined: Mon Nov 16, 2009 1:19 am
Posts: 14
Location: Greece
Will using this trick affect SEO?

_________________
http://levites.gr/


Posted: Tue Dec 27, 2011 10:02 am
Joomla! Master

Joined: Mon Mar 20, 2006 1:56 am
Posts: 11644
Location: The Girly Side of Joomla in Sussex
Fonias wrote:
Will it affect SEO by using this trick?

Please ask that in the SEO forum; this is a security forum.

_________________
HU2HY- Poor questions = Poor answer
Un requested Help PM's will be added to the foe list and possibly just deleted
{Community.Connect Administrator }{ Showcase & Security Moderator}


Posted: Sun Jan 22, 2012 10:08 am
Joomla! Guru

Joined: Fri Apr 07, 2006 4:02 pm
Posts: 893
Location: Egypt
I think this topic should be stickied?

_________________
Joomla! Fan
http://www.alfystudio.com


Posted: Fri Jan 27, 2012 11:59 pm
Joomla! Master

Joined: Mon Mar 20, 2006 1:56 am
Posts: 11644
Location: The Girly Side of Joomla in Sussex
ahmad wrote:
I think this topic should be stickied?

Given that most bad bots will NOT respect the robots.txt file anyway, what is the point of making an off-topic, non-security issue (see posting.php?mode=quote&f=432&p=2724750 and viewtopic.php?p=2094372#p2094372 for justification) a sticky topic?

_________________
HU2HY- Poor questions = Poor answer
Un requested Help PM's will be added to the foe list and possibly just deleted
{Community.Connect Administrator }{ Showcase & Security Moderator}


Posted: Sat Jan 28, 2012 12:11 am
Joomla! Guru

Joined: Mon Feb 21, 2011 4:02 pm
Posts: 951
Location: UK
The robots.txt file is part of the Robots EXCLUSION Protocol and, as such, "Allow" is NOT part of the original standard syntax.

Robots.txt is all about what to DISALLOW.

_________________
Online since 1995.


Posted: Tue Apr 03, 2012 8:36 pm
Joomla! Fledgling

Joined: Tue Apr 03, 2012 8:22 pm
Posts: 1
Now I know that hiding from the bots that you are using Joomla can be advantageous as well. We use Joomla in one of our sites.


Posted: Sat Apr 07, 2012 3:53 am
Joomla! Hero

Joined: Sat Oct 21, 2006 10:20 pm
Posts: 2694
Location: Wisconsin USA
You cannot hide from hack scripts. Hack bots, scripts, and other bad bots do not even look at the robots file. The robots file is only for telling legit bots what not to index on a site.

_________________
PhilD -- Unrequested PM's and/or emails may not get a response.
Security Moderator

