Duplicate content with sh404sef

Posted: Mon Jan 18, 2010 12:38 pm
by kensay
Hi guys

I have installed the sh404sef component. My problem is that I get shitloads of different URLs pointing to the same page, and it's very confusing to me how to avoid this. I've blocked many of them with the robots.txt file so that only the SEF-friendly URLs get indexed. But take the following example:

http://www.azurex.dk/Referencer.html
http://www.azurex.dk/referencer.html

Both have the exact same content.
I used http://www.azurex.dk/Referencer.html, but then I changed the URL to lower case and purged all URLs. The link with the upper-case R obviously still works, though. How do I avoid this using my current Joomla components?

Regards Mikael

Re: Duplicate content with sh404sef

Posted: Mon Jan 18, 2010 12:48 pm
by Leftfield
Don't put it in the sitemap and don't link to it from anywhere :). That's all. If it is linked from somewhere, make a 301 redirect in .htaccess or a custom redirect in sh404sef.
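
For a single known URL, the 301 in .htaccess could be sketched roughly like this. It assumes mod_rewrite is enabled, that the .htaccess file sits in the site root, and that the rule is placed before Joomla's own SEF rewrite rules; the two paths are the ones from the first post.

```apache
# Sketch only: redirect one known mixed-case URL to its lowercase form.
RewriteEngine On

# In .htaccess the pattern is matched without the leading slash.
# Matching is case-sensitive because no [NC] flag is set, so the
# lowercase target does not match again and cannot loop.
RewriteRule ^Referencer\.html$ /referencer.html [R=301,L]
```

This only covers the one spelling it names. Every other case variant would need its own rule, which is why it does not scale to arbitrary capitalization.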

Re: Duplicate content with sh404sef

Posted: Mon Jan 18, 2010 1:01 pm
by kensay
I have discovered that any case variant of the URL gives the same result:

ReFErenCER
referenCER

etc. etc. Making a canonical or a 301 for all of these versions seems impossible. Isn't there a feature that converts all URLs to lower case automatically? There is such a function in sh404sef, which I've tested, but with no luck.

Re: Duplicate content with sh404sef

Posted: Mon Jan 18, 2010 1:07 pm
by Leftfield
Leftfield wrote: Don't put it in the sitemap and don't link to that kind of URL from anywhere :). That's all.
You can relax.

Re: Duplicate content with sh404sef

Posted: Mon Jan 18, 2010 1:12 pm
by kensay
Ok great, but could you explain to me why I can relax?

As I see it, I have the potential for a massive duplicate content problem if, say, someone links to me with an uppercase letter. Then I suppose Google will index that URL and BOOM :(

Regards Mikael

Re: Duplicate content with sh404sef

Posted: Mon Jan 18, 2010 2:06 pm
by Leftfield
IMHO regarding SERP, there will be no boom. Anyway, this is not duplicate content.

Re: Duplicate content with sh404sef

Posted: Fri Jul 02, 2010 7:20 pm
by wmalcom
You must be on a Windows server. Windows servers do not take uppercase and lowercase in file/directory names into account; Linux servers do. Don't worry though, this is not considered dup content. :D

Re: Duplicate content with sh404sef

Posted: Sun Apr 01, 2012 7:07 pm
by fmmarzoa
Hi,

I am facing the same problem now. My site has been having that problem for a long time, but I just noticed it today since I was working on other projects.

It clearly seems like a sh404sef bug to me: it uses SEF URLs that are in fact NOT DEFINED anywhere to show content instead of giving a 404.

It is not a problem of the underlying OS and has nothing to do with it: it's sh404sef that receives the URL and should handle it. That is pretty clear, but anyway my site is on an Ubuntu Linux distribution with Apache, so it is also empirically tested.

You are right about that: IT IS a potential duplicate content issue, since even if you do not use those invalid links yourself, one mistake in one backlink is enough for Google to find them. And I have exactly that problem with one of those URLs, since I have found visits in my site stats that use uppercase and others that use lowercase.

Anyway, my site is on an old J!1.5 (as this post is also old) and I am planning to migrate it to J!1.7, or maybe even to a custom CMS that I am using for other pages.

In the meantime I am thinking of a possible workaround: forcing all URLs to lowercase with mod_rewrite in .htaccess, so that Apache's mod_rewrite takes care of it before the silly sh404sef gets the URL.

It does not seem to be very difficult:

http://www.chrisabernethy.com/force-low ... d_rewrite/
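
As a rough sketch of the lowercase-forcing idea (not necessarily the method from the linked article), one common approach uses mod_rewrite's built-in `tolower` map. It assumes access to the main server or virtual host configuration, because the `RewriteMap` directive is not allowed in .htaccess. The map name `lc` is an arbitrary choice.

```apache
# Sketch only: goes in the main server or <VirtualHost> configuration.
RewriteEngine On

# Define a map called "lc" (name chosen here) backed by the
# internal tolower function that ships with mod_rewrite.
RewriteMap lc int:tolower

# Act only when the requested path contains an uppercase letter...
RewriteCond %{REQUEST_URI} [A-Z]
# ...and the request is not for a real file whose name contains capitals
# (images, scripts and so on must keep their exact names on Linux).
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
# Redirect permanently to the same path in lowercase.
RewriteRule ^(.*)$ ${lc:$1} [R=301,L]
```

With rules like these, a request for /ReFErenCER.html would be answered with a 301 to /referencer.html before Joomla or sh404sef sees it. On shared hosting where only .htaccess is available, this particular variant will not work and a different technique is needed.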

This is probably not useful for you so long after your post, but I leave it here just for the record. It may be useful for someone later, even for myself.

Best regards,