Hello,
It appears that some people are able to scrape an entire site with HTTrack. I wonder whether Joomla has some kind of mechanism that can block such software via its .htaccess file.
I know such methods could be useless if someone tries hard enough, but I would like it as a first line of defense.
I look forward to your advice.
Thanks.
Is there a way to block HTTrack from copying my entire site?
-
- Joomla! Explorer
- Posts: 377
- Joined: Wed May 30, 2007 7:55 am
- Location: Worldwide
- Contact:
- ranwilli
- Joomla! Master
- Posts: 19203
- Joined: Sun Feb 19, 2006 6:47 pm
- Location: Toledo, OH
- Contact:
Re: Is there a way to block HTTrack from copying my entire s
Any portion of your site that you want to be interesting and of value to "guests" (visitors who are not logged in) will need to be set up very conventionally, with links to everything, and easy one-click or two-click navigation to everything. Since you will need that stuff to be easily accessed and indexed by search engines, you'll want to be sure you have "micro-formatted" important data through the use of schema tags.
If you do all that, you will not want to discourage or prevent ANYONE from scraping your site as often and as deeply as they like.
If your website has content you don't wish to be freely available to guests, the same methods that prevent guests from seeing it will prevent scraping.
Don't HACK the Joomla! core, Instead "Extend" and/or "Override."
Stay ON the update path.
https://harpervance.com
- mandville
- Joomla! Master
- Posts: 15161
- Joined: Mon Mar 20, 2006 1:56 am
- Location: The Girly Side of Joomla in Sussex
Re: Is there a way to block HTTrack from copying my entire s
Interestingly, I googled and Binged "httrack block"
and found:
http://www.httrack.com/html/abuse.html
https://forums.digitalpoint.com/threads ... k-it.5532/
HU2HY- Poor questions = Poor answer
Un requested Help PM's will be reported, added to the foe list and possibly just deleted
portable mini golf https://www.puttersminigolf.co.uk/
-
- Joomla! Explorer
- Posts: 377
- Joined: Wed May 30, 2007 7:55 am
- Location: Worldwide
- Contact:
Re: Is there a way to block HTTrack from copying my entire s
Mandville, thank you. This will do for now:
RewriteCond %{HTTP_USER_AGENT} ^HTTrack.* [NC,OR]
Ranwilli, what you say is true and thank you for the advice.
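For anyone who wants to try the same thing, here is a minimal sketch of how that condition could sit inside a complete rule. It assumes Apache with mod_rewrite enabled and .htaccess overrides allowed, and it is not taken from the Joomla core file. Two details differ from the line quoted above and are my own assumptions: the pattern is left unanchored, because HTTrack's default user agent string typically starts with "Mozilla" and only mentions HTTrack further in, and the [OR] flag is dropped, because it only makes sense when another RewriteCond follows.

```apache
# Sketch only: assumes mod_rewrite is available on the server.
<IfModule mod_rewrite.c>
    # Joomla's stock .htaccess usually enables the engine already;
    # repeating the directive is harmless.
    RewriteEngine On

    # Match any request whose User-Agent header contains "HTTrack",
    # case-insensitively, wherever it appears in the string.
    RewriteCond %{HTTP_USER_AGENT} HTTrack [NC]

    # Leave the URL unchanged and answer with 403 Forbidden.
    RewriteRule ^ - [F,L]
</IfModule>
```

Placing this above Joomla's own SEF rewrite rules should keep it from being skipped. It only stops copies of HTTrack that announce themselves honestly, since the user agent can be changed in the tool's settings.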
-
- Joomla! Apprentice
- Posts: 15
- Joined: Tue Mar 26, 2013 11:04 am
- Contact:
Re: Is there a way to block HTTrack from copying my entire s
Block the particular HTTrack bot in your robots.txt.
Last edited by Alan Grift on Tue Mar 26, 2013 1:08 pm, edited 1 time in total.
My Joomla Language Lessons Website: http://www.conversation-piece.co.uk
My SEO blog: http://www.velaseo.com
- brian
- Joomla! Master
- Posts: 12813
- Joined: Fri Aug 12, 2005 7:19 am
- Location: Leeds, UK
- Contact:
Re: Is there a way to block HTTrack from copying my entire s
It's only fair to point out that there are LOTS of tools that you can use to scrape a site. If someone really wants to and they find that HTTrack is being blocked, they will just switch to another tool.
"Exploited yesterday... Hacked tomorrow"
Blog http://brian.teeman.net/
Joomla Hidden Secrets http://hiddenjoomlasecrets.com/
-
- Joomla! Apprentice
- Posts: 14
- Joined: Tue Apr 30, 2013 11:12 am
Re: Is there a way to block HTTrack from copying my entire s
Micknick wrote: If you knew the IP address of the robot, you could block the IP address from cPanel.
This is the only way. Even then, there is still a good chance of being hacked or scraped. There is no ultimate solution for it.