Is there a way to block HTTrack from copying my entire site?

Relax and enjoy The Lounge. For all Non-Joomla! topics or ones that don't fit anywhere else. Normal forum rules apply.
Locked
crazydiver
Joomla! Explorer
Posts: 377
Joined: Wed May 30, 2007 7:55 am
Location: Worldwide

Is there a way to block HTTrack from copying my entire site?

Post by crazydiver » Sat Feb 23, 2013 7:56 pm

Hello,

It appears that some people can scrape an entire site with HTTrack. I wonder whether Joomla has some kind of mechanism, via its .htaccess file, that can block such software.

I know such methods could be useless if someone tries hard enough, but I would like it as a first line of defense.

I look forward to your advice.

Thanks.

ranwilli
Joomla! Master
Posts: 19203
Joined: Sun Feb 19, 2006 6:47 pm
Location: Toledo, OH

Re: Is there a way to block HTTrack from copying my entire site?

Post by ranwilli » Sat Feb 23, 2013 11:42 pm

Any portion of your site that you want to be interesting and valuable to "guests" (visitors who are not logged in) will need to be set up very conventionally, with links to everything and easy one- or two-click navigation to all of it. Since you will want that content to be easily accessed and indexed by search engines, you should also make sure important data is "micro-formatted" with schema.org markup.

If you do all that, you will not want to discourage or prevent ANYONE from scraping your site as often and as deeply as they like.

If your website has content you don't wish to be freely available to guests, the same methods that prevent guests from seeing it will prevent scraping.
Don't HACK the Joomla! core; instead, "Extend" and/or "Override."
Stay ON the update path.
https://harpervance.com

mandville
Joomla! Master
Posts: 15161
Joined: Mon Mar 20, 2006 1:56 am
Location: The Girly Side of Joomla in Sussex

Re: Is there a way to block HTTrack from copying my entire site?

Post by mandville » Sun Feb 24, 2013 8:39 am

HU2HY- Poor questions = Poor answer
Unrequested help PMs will be reported, added to the foe list and possibly just deleted.
portable mini golf https://www.puttersminigolf.co.uk/

crazydiver
Joomla! Explorer
Posts: 377
Joined: Wed May 30, 2007 7:55 am
Location: Worldwide

Re: Is there a way to block HTTrack from copying my entire site?

Post by crazydiver » Sun Feb 24, 2013 9:27 am

Mandville, thank you. This will do for now:
RewriteCond %{HTTP_USER_AGENT} ^HTTrack.* [NC,OR]

Ranwilli, what you say is true, and thank you for the advice.
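The RewriteCond line quoted above is one entry lifted from a longer list: its [OR] flag chains it to a following condition, and on its own, without a RewriteRule, it blocks nothing. A minimal self-contained sketch for Joomla's .htaccess might look like the block below, assuming mod_rewrite is available (Joomla's stock htaccess.txt already switches on RewriteEngine). The pattern is left unanchored on the assumption that HTTrack's default browser ID contains "HTTrack" without starting with it (as far as I know it is of the form "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)"), so ^HTTrack would miss it.

```apache
# Turn on the rewrite engine. Joomla's htaccess.txt already does this;
# repeating the directive is harmless.
RewriteEngine On

# Match any User-Agent header that contains "HTTrack", case-insensitively.
# No ^ anchor: HTTrack's default browser ID (assumed above) does not begin with "HTTrack".
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC]

# Leave the URL unchanged ("-"), answer 403 Forbidden and stop processing rules.
RewriteRule .* - [F,L]
```

Place it before Joomla's own SEF rewrite rules so it is evaluated first. HTTrack lets its user change the browser ID, so this only stops people who leave the default in place.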

Alan Grift
Joomla! Apprentice
Posts: 15
Joined: Tue Mar 26, 2013 11:04 am

Re: Is there a way to block HTTrack from copying my entire site?

Post by Alan Grift » Tue Mar 26, 2013 11:31 am

Block the particular HTTrack bot in your robots.txt.
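A robots.txt entry along these lines is roughly what that advice amounts to. It is only a request, not an access control: it assumes the copier identifies itself as HTTrack, matches the named group, and has been left on its default of obeying robots.txt, which the person running it can switch off.

```text
# Ask HTTrack not to fetch anything (advisory only).
User-agent: HTTrack
Disallow: /

# Everyone else, including search engines, may crawl the whole site.
User-agent: *
Disallow:
```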
Last edited by Alan Grift on Tue Mar 26, 2013 1:08 pm, edited 1 time in total.
My Joomla Language Lessons Website: http://www.conversation-piece.co.uk
My SEO blog: http://www.velaseo.com

brian
Joomla! Master
Posts: 12813
Joined: Fri Aug 12, 2005 7:19 am
Location: Leeds, UK

Re: Is there a way to block HTTrack from copying my entire site?

Post by brian » Tue Mar 26, 2013 12:37 pm

It's only fair to point out that there are LOTS of tools that can be used to scrape a site. If someone really wants to, and they find that HTTrack is being blocked, they will just switch to another tool.
"Exploited yesterday... Hacked tomorrow"
Blog http://brian.teeman.net/
Joomla Hidden Secrets http://hiddenjoomlasecrets.com/

hashimji12
Joomla! Apprentice
Posts: 14
Joined: Tue Apr 30, 2013 11:12 am

Re: Is there a way to block HTTrack from copying my entire site?

Post by hashimji12 » Tue Apr 30, 2013 11:38 am

Micknick wrote: If you knew the IP address of the robot, you could block the IP address from cPanel.
This is the only way, and even then there is still a good chance of being hacked or scraped. There is no ultimate solution for it.
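Blocking by IP address can also be done directly in .htaccess rather than through cPanel. The sketch below assumes Apache 2.4 (mod_authz_core) and that the host allows Require directives in .htaccess (AllowOverride AuthConfig or All). The addresses are placeholders taken from the documentation ranges, not real scrapers; on Apache 2.2 the older Order/Deny from syntax would be needed instead.

```apache
# Apache 2.4: allow everyone except the listed addresses.
# 203.0.113.0/24 and 198.51.100.7 are placeholders; substitute the
# address(es) you actually see in your access logs.
<RequireAll>
    Require all granted
    Require not ip 203.0.113.0/24
    Require not ip 198.51.100.7
</RequireAll>
```

A scraper can change IP address about as easily as it changes its user-agent string, so this is a speed bump rather than a wall.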
