VirtueMart Forum

VirtueMart 2 + 3 + 4 => Security (https) / Performance / SEO, SEF, URLs => Topic started by: Cococoder on June 07, 2018, 10:39:16 AM

Title: Any Path leads to an existing page...really any path
Post by: Cococoder on June 07, 2018, 10:39:16 AM
Hi Guys,

Let me briefly explain my issue. Let's say I have a category called test-category:
https://store.seobytes.eu/index.php/test-category

I can type anything between index.php and test-category and it will lead to the same page. Example:
https://store.seobytes.eu/index.php/banana/potatoes/test-category/

Examples are live.

This sounds quite bad for SEO, in my opinion. It should give a 404 or a redirect to the home page.

I tried with .htaccess and URL rewriting on, with the same result, and I also consulted various threads such as https://forum.virtuemart.net/index.php?topic=69544.0

Any feedback on the matter would be appreciated.
Title: Re: Any Path leads to an existing page...really any path
Post by: Jörgen on June 07, 2018, 11:01:24 AM
VM version, Joomla version, etc.?

Regards

Jörgen @ Kreativ Fotografi
Title: Re: Any Path leads to an existing page...really any path
Post by: Cococoder on June 07, 2018, 11:15:50 AM
Sorry for that!
Joomla! 3.8.8 Stable
VirtueMart 3.2.14
PHP Version 5.6.35
Currently on Beez3

Note: tested on another VirtueMart site with similar results.
Title: Re: Any Path leads to an existing page...really any path
Post by: Cococoder on June 07, 2018, 11:52:34 AM
Tested with and without the default .htaccess, with the same result.
Title: Re: Any Path leads to an existing page...really any path
Post by: jenkinhill on June 07, 2018, 16:35:32 PM
The VM 404 error handling is on by default, to avoid any potential loss of sales in case joe shopper should type in some stupid URL. The important URL is, of course, the canonical, which remains the same for that page.
Title: Re: Any Path leads to an existing page...really any path
Post by: Cococoder on June 08, 2018, 12:27:33 PM
Thanks for your clear and concise answer. I understand how useful this kind of error handling can be, although nowadays I don't know anyone who types full URLs. I realized that some odd URLs had been indexed by Google. The result is a page that shows a product with the home page layout under a URL like /404/productname, which leads to a bad user experience and a potential duplicate content issue. I will play with the error handling option and keep you posted.
Title: Re: Any Path leads to an existing page...really any path
Post by: Cococoder on June 08, 2018, 12:52:32 PM
OK, I did some further checks:
Enable VirtueMart 404 error handling: tested with and without on two different sites (in the VM config)
Use URL Rewriting: tested with and without on two different sites (in the Joomla config)

Same behavior.

I tested on a Joomla blog article and it returns the expected 404. So the behavior is indeed implemented in VirtueMart.

Any pointers on how to disable or "fix" this "all paths lead to Rome" behavior and get back to a normal "Page not found, how can we help you?" approach?

Title: Re: Any Path leads to an existing page...really any path
Post by: Studio 42 on June 08, 2018, 13:43:42 PM
Cococoder, Google should never see links that you type in manually, so it's not a real problem.
https://store.seobytes.eu/index.php/banana/potatoes/test-category/ does not give a 404 because test-category is a valid slug.
In your case, banana/potatoes should map to a menu ID in Joomla, so VirtueMart tries to look up the menu ID in the DB and falls back to the root category menu ID.
Title: Re: Any Path leads to an existing page...really any path
Post by: Cococoder on June 09, 2018, 13:41:19 PM
Hi, thanks for the reply,
I raised the concern because Google did index some wacky slugs.
/whatever/younameit/validPage is a valid slug only if the CMS treats it as a valid slug.
I'd like pointers on how to go back to the default Joomla behavior, which doesn't treat such slugs as valid, and other community advice on how to handle this case.

Hope I can get some help on that.

Thanks guys
Title: Re: Any Path leads to an existing page...really any path
Post by: Studio 42 on June 09, 2018, 15:04:18 PM
I don't mean that this can be invalidated directly.
But you can add some rules in .htaccess to redirect your bad links, using
RewriteRule ^/?whatever/(.*)$ newfolder/$1 [R=301,L]
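or to return a 404, served by your ErrorDocument page (Apache ignores the rewrite target for non-3xx status codes, so a dash is enough)
RewriteRule ^/?whatever/ - [R=404,L]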
Title: Re: Any Path leads to an existing page...really any path
Post by: jenkinhill on June 09, 2018, 16:28:53 PM
If Google has indexed those strange URLs, then it must be indexing your access log. An SE bot follows links; it should not "type in" stupid URLs, AFAIK.
Title: Re: Any Path leads to an existing page...really any path
Post by: GJC Web Design on June 09, 2018, 23:37:00 PM
Just to add my 2 pennies worth....

I have also seen Google index nonsense URLs for some sites I run. How they got indexed is not that interesting to me. IMHO, if a URL is not valid it should return a 404, so that it eventually drops out of the index.

But currently with VM these URLs just revert to the root category view, so Google thinks they are valid and keeps them indexed.

I added this snippet in the VM router.php around line 750 (a while ago now, so I don't remember the full scenario exactly).

Search for the string if (!isset($vars['virtuemart_category_id'])){

I added

/* GJC check that there is a category segment */
$catseg = '';
foreach ($segments as $segment) {
    if ($segment == 'category') {
        $catseg = '1';
    }
}
//if (!isset($vars['virtuemart_category_id'])){
if (!isset($vars['virtuemart_category_id']) && $catseg) {
/* GJC check that there is a category segment */


As the comment says, this checks whether there is a category segment in the non-SEF URL. From memory, the code works like this: after passing various tests, the default treats the first segment as a category, and if it is not found, sends the request to the root category.

Now nonsense URLs return a 404.

I haven't fully tested this, but it works for me. I especially had problems using a vmextended plugin where the new "view" was wrongly seen as a category, so the plugin wasn't usable with SEF on.
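
For anyone who wants to try the idea outside router.php first, here is a minimal standalone sketch of the same check. The function names hasCategorySegment and shouldEnterCategoryBranch are hypothetical helpers, not part of VirtueMart, and the sketch assumes $segments is a plain array of URL segment strings and $vars is the array of parsed query variables, as in the snippet above. It sticks to PHP 5.6 syntax.

```php
<?php
// Hypothetical helper (not a VirtueMart function): true when one of the
// URL segments is the literal string 'category'.
function hasCategorySegment(array $segments)
{
    return in_array('category', $segments, true);
}

// Hypothetical stand-in for the patched condition: the branch is entered
// only when no category ID has been resolved yet AND a 'category'
// segment is present in the URL.
function shouldEnterCategoryBranch(array $vars, array $segments)
{
    return !isset($vars['virtuemart_category_id']) && hasCategorySegment($segments);
}
```

With this check, a segment list such as array('banana', 'potatoes', 'test-category') no longer enters the branch, because none of its segments is 'category'.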
Title: Re: Any Path leads to an existing page...really any path
Post by: Cococoder on June 11, 2018, 11:28:38 AM
Hey Thanks GJC Web Design!

I am going to test that asap and give feedback here!
Title: Re: Any Path leads to an existing page...really any path
Post by: Cococoder on June 11, 2018, 11:33:48 AM
Hey!

Thanks GJC Web Design!

Well, I tested your solution, but it didn't change anything for me. It still reverts to the home page view, with or without SEF URLs activated and with VirtueMart 404 handling enabled or disabled.

Are there any other settings I should be aware of?

@Jenkinhill: That is an interesting point, but it would mean I had typed this URL in the first place, whereas it is the other way around.
Title: Re: Any Path leads to an existing page...really any path
Post by: Cococoder on June 18, 2018, 09:01:34 AM
Well, the solution was easy: just disable VirtueMart 404 error handling to fall back on Joomla's 404 handling, which returns a proper 404 for wacky URLs.
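
To confirm that a nonsense path really returns a 404 after changing the setting, a small script can read the status code instead of relying on what the browser shows. This is only a sketch: finalStatusCode is a hypothetical helper, and it assumes the header list has the shape returned by PHP's get_headers() in its default mode, where each status line looks like "HTTP/1.1 404 Not Found" and the last one is the final response after any redirects.

```php
<?php
// Hypothetical helper: returns the last HTTP status code found in a list
// of raw header lines, or null when no status line is present.
function finalStatusCode(array $headers)
{
    $code = null;
    foreach ($headers as $line) {
        // Each redirect hop adds its own status line; the last match wins.
        if (is_string($line) && preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

// Usage (needs network access, so it is left commented out):
// $headers = get_headers('https://store.seobytes.eu/index.php/banana/potatoes/test-category/');
// var_dump($headers !== false ? finalStatusCode($headers) : null);
```

Running the same check against a valid category URL and a made-up one shows whether the two are still treated alike.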