VirtueMart Forum

VirtueMart 2 + 3 + 4 => Virtuemart Development and bug reports => Topic started by: Sillero on March 29, 2023, 13:02:17 PM

Title: Random url in products
Post by: Sillero on March 29, 2023, 13:02:17 PM
Hi, this is a problem that I have been seeing for a long time.

Any published product can be reached under a different URL when the full category tree is used, although the canonical URL stays correct. For example, the correct URL is domain/category1/subcategory1/product1. If you change the category path in the URL, the page still shows the product and the canonical is right, but the breadcrumbs and the base URL are both wrong: domain/category99/subcategory99/product1.

You might think that, since the canonical is correct, there are no indexing problems, but that is not the case. Google is somehow indexing hundreds of URLs with the wrong path even though the canonical is correct (the canonical is a recommendation, not a directive).

I don't know why these URLs are built or how Google gets to them. I already pointed out in another post a problem with wrong URLs in product variants, and since then I have removed all Ajax updates of product variants and neighboring products. https://forum.virtuemart.net/index.php?topic=149438.0

I use an outdated version, VirtueMart 3.2.12, but as far as I understand that is not the cause.

Can someone guide me in the right direction?
Title: Re: Random url in products
Post by: Sillero on March 29, 2023, 19:38:42 PM
@GJC Web Design
You must have the same problem on this website that you gave as an example: https://www.escape-watersports.co.uk

If you change the route of any product, the page continues to exist and returns a 200 status. The canonical is correct, but Google seems to crawl these products under routes other than the ones they should have. You really don't have a problem with this? Anyone?

I think this should give a 404 error and not give access to the product.
Title: Re: Random url in products
Post by: GJC Web Design on March 29, 2023, 22:00:21 PM
Hi,

Please can you give some actual URLs as examples of what you mean? I'm struggling to understand the problem you're discussing.

Title: Re: Random url in products
Post by: pinochico on March 30, 2023, 07:02:29 AM
We had a problem with product URLs in Google Search Console:
- we hacked the canonical plugin from JSitemap Pro and set up a lot of URLs in JSitemap Pro for the sitemap
- we hacked the breadcrumbs module
- we hacked our rich snippets plugin for VirtueMart

Now all URLs in the sitemap and in GSC are right, and the product URL in the breadcrumb module stays the same (even when I come from different categories), with the right canonical URL (products, categories, articles).

We worked on this for about 40 hours, paid for 2 licences and 2 external developers, and rolled it out to our 8 shops.

Now you know the journey :)

You can check on www.zelenazeme.cz
Title: Re: Random url in products
Post by: Sillero on March 30, 2023, 12:57:18 PM
@GJC Web Design
This is what I mean:
https://www.escape-watersports.co.uk/clothing/drysuits/mens-drysuits/crewsaver-atacama-pro-suit-detail
https://www.escape-watersports.co.uk/equipment/helmets/crewsaver-atacama-pro-suit-detail
Both pages have the same canonical but the base URL is different. Somehow Google is crawling hundreds of pages with the wrong path. Can you check your GSC reports? Especially the indexed pages not submitted in the sitemap.

@pinochico
Thanks, now I'm worried a lot more...
I see that your product URLs all sit under the /eshop/ root while the breadcrumb still shows all the categories. It is a good example, and I can see that a lot of work went into it.

Both websites are awesome, congratulations.

So far the only major modification I have made has been to stop the crawling and possible indexing of URLs that contain parameters and other unwanted URLs, such as sorting and filtering URLs (product_name, product_price, dirDesc, keyword, manage, results...). I recommend setting these URLs to noindex and unsetting the canonical; you will get a better crawl budget from Google. Example: https://www.escape-watersports.co.uk/equipment/helmets/kayaking-helmets/by,price?keyword=
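
For anyone who wants to try the same thing, here is a minimal sketch of that check in plain PHP. The function name isSortOrFilterUrl and the list of markers (a "by," or "results," path segment, "dirDesc", or any query string) are assumptions based on the examples above, not something VirtueMart provides, so adjust them to the URLs your own shop generates.

```php
<?php
/**
 * Hypothetical helper: returns true when a URL looks like a sorting or
 * filtering variant of a category page that should not be indexed.
 * The markers below are assumptions taken from the examples in this thread.
 */
function isSortOrFilterUrl(string $url): bool
{
    // parse_url() returns null for a missing component; cast to '' for safety.
    $path  = (string) parse_url($url, PHP_URL_PATH);
    $query = (string) parse_url($url, PHP_URL_QUERY);

    // Any query string at all (e.g. "?keyword=") marks a parameterised URL.
    if ($query !== '') {
        return true;
    }

    foreach (explode('/', trim($path, '/')) as $segment) {
        // SEF segments such as "by,price", "results,1-24" or "dirDesc".
        if (strpos($segment, 'by,') === 0
            || strpos($segment, 'results,') === 0
            || $segment === 'dirDesc') {
            return true;
        }
    }

    return false;
}
```

In a Joomla template the result could gate a call such as JFactory::getDocument()->setMetaData('robots', 'noindex'), which is roughly how I mark these pages.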

I haven't been able to find much information on the forum, do you know if this has been discussed?
Title: Re: Random url in products
Post by: balai on March 30, 2023, 14:19:09 PM
Given that the URL of a product is based on the category, I cannot think of how you can solve that from within VM.
Also to my knowledge this is how it works in other e-commerce platforms as well.

Indeed canonical is a preference hint and not a directive. Also Google can choose another page as canonical.

Other methods to deal with duplicate content as proposed by Google are:
Redirects and Site Map Inclusion/Exclusion
https://developers.google.com/search/docs/crawling-indexing/consolidate-duplicate-urls
Title: Re: Random url in products
Post by: Studio 42 on March 30, 2023, 15:08:27 PM
Most of the time Google indexes the first page it finds and uses the canonical link afterwards.
So it should not be an SEO problem to have a wrong link for some weeks.
I have a customer shop using more than 20 categories per product and have had no report about duplicates from Google.
Title: Re: Random url in products
Post by: Sillero on March 30, 2023, 16:47:38 PM
@balai
Yes, this is how VM works if you use the full category tree option. I personally prefer working with the full path to the product, since it adds context to the product URL. Google understands this and has no problem with it, although it also recommends using short URLs, so as long as two category levels are not exceeded, there should be no problem.

On other platforms I have been able to see different solutions. In several, when you try to change the category path there is a 301 redirect to the correct url. In others, a 404 is simply generated. The first option seems the most appropriate to me and the second could be a solution.

@Studio 42
Yes, Google usually indexes the first page it hits (sometimes even if it's blocked by robots.txt). It may not be a problem initially if the correct URL is updated later, but a lot of crawl budget is wasted, as hundreds of URLs can be generated to crawl. Also, if the URL belonged to an alternative category that the product is also assigned to, I wouldn't care so much.

The main problem is that I don't understand why Google builds those URLs. Where do they come from? As far as I know, when I had the multivariants that refresh through Ajax enabled, these URLs could be built, since the category of the previously viewed product was used. But I no longer have that enabled. I now handle the multivariants with the products_horizon.php sublayout, and there is a normal HTTP request when you open the variant product.
Title: Re: Random url in products
Post by: Studio 42 on March 30, 2023, 17:37:53 PM
The problem with by,price?keyword= comes from Sort by Product Name, Product Price.
This generates an href to invert the sort order.
The link should have rel='nofollow' so that only 1 page is indexed.
But in any case the canonical link does not include this information, so it should be safe.
Title: Re: Random url in products
Post by: Sillero on March 30, 2023, 18:54:54 PM
Yes, I set this a long time ago, but again this is a recommendation, not a directive. Google continues to crawl hundreds of pages despite it. In my case I have set all the filtering URLs (product_name, product_price...) to noindex and removed the canonical. So yes, I think you should worry about it when you have a large product catalog. But it also has an easy solution.

What I'm concerned about is why those urls are being crawled and indexed if they don't exist and aren't linked (supposedly) from any other page.
Title: Re: Random url in products
Post by: pinochico on April 01, 2023, 00:44:55 AM
because your robots.txt is not set up right
because your views are not set up right for nofollow, noindex
because of a lot of other things :)

This is a complex problem, not just one URL.
We have been working hard on SEO with GSC for two years, and in that time Google has changed the rules 3 times :D
Title: Re: Random url in products
Post by: Sillero on April 01, 2023, 10:19:47 AM
Yes, the robots.txt rules can be tricky. Once you discover the problems you first have to deindex and then block by robots.txt. Every website is different. Your robots.txt file is very interesting.

It takes a lot of work to set nofollow links and to noindex some URLs in a new VirtueMart setup. I think this aspect should be improved by giving more consideration to SEO.

I was able to find a possible solution to my problem and I want to share it. I am not a programmer and it is possible that it will not work for other websites.
Since the canonical URL is always generated correctly and in my case I don't have the same product in several categories, I can do a 301 redirect from the base URL that was requested to the canonical URL. This is my code, implemented in the view .../templates/.../productdetails/default.php

I would like to hear your comments

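// Find the canonical link that was added to the document head.
$document = JFactory::getDocument();
$canonical = null;
foreach ($document->_links as $href => $attribs) {
    if ($attribs['relation'] == 'canonical') {
        // Remember the canonical href itself, not the last key of the loop.
        $canonical = $href;
        break;
    }
}

// Redirect only when the requested (base) URL differs from the canonical.
if ($canonical !== null && $canonical != $document->base) {
    //$document->setMetaData('robots', 'noindex');
    header('Location: ' . $canonical, true, 301);
    // Stop here so the product page is not rendered after the redirect.
    exit;
}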
Title: Re: Random url in products
Post by: pinochico on April 01, 2023, 18:37:26 PM
default.php is a front-end view. It is not the place for development logic, only for output.

The right place is the model or a system plugin; small things can go in view.html.php.

But the best option is to use a system plugin for the canonical URL and customize that plugin.

Quote: I don't have the same product in several categories

That is a special case; a lot of shops are different.

For views that we don't want indexed and that are not menu items, we set noindex, nofollow or noindex, follow in the template:


//SEO Analyse
$document = JFactory::getDocument();
$document->setMetaData('robots', "noindex, nofollow");
//END

Title: Re: Random url in products
Post by: Sillero on April 02, 2023, 11:23:14 AM
Thanks for the tip. Like I said, I'm not a programmer ;) but I will try.

That other code is the one I use in the category view when certain conditions are met and thus many unwanted urls are deindexed.

I see in GSC that many product variant URLs are being indexed with the wrong route. I repeat, the canonical is correct (the URL of the parent product). Can anyone point me to how I can make the base URL the correct one as well, since I can't prevent those URLs from being generated?
When I output all the data for $this and compare it with the right path, I only find that [Itemid] and [categoryId] are wrong.
Title: Re: Random url in products
Post by: pinochico on April 04, 2023, 14:15:41 PM
There must be only one Itemid, the top-level one, but we use ArtioSEF, and for the one shop without Artio we are developing a solution now :)
The category ID comes from the product's canonical URL.

Sorry, but it's complex and not something for a forum :(
And this is my job (work for money :)

I can describe the journey, but you have to develop it yourself.
Or buy some support on minijoomla.org

Title: Re: Random url in products
Post by: Sillero on April 04, 2023, 21:52:43 PM
Hi, thanks to you all.

@pinochico I understand, thanks for all your tips and explanations.

I want to give a small contribution back to the community for everything that has helped me in the past:

If someone has this same problem (many URLs crawled and indexed in GSC with the wrong route), you can do the following:

In simple products (without multivariants) you only have to check if the canonical url of the $document is the same as the base url. If not, apply the redirect to the canonical.

In products with multivariants, the same logic applies to the parent product. For the children, you need to check whether the base URL ($document->base) differs from the child product's own canonical, which is in $this->product->canonical. This is not the same as the canonical shown on the page when the option to use the parent as canonical URL is activated in the custom field. If they differ, apply the redirect to the canonical URL of the child (JRoute::_($this->product->canonical)).
In this way, random urls will not continue to be generated. The only catch is that if a product appears in several categories, when entering it you will be redirected to the canonical category. This is something I need to fix but it's a big step.
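
The logic above can be sketched roughly as follows. This is only an illustration in plain PHP, not tested inside VirtueMart: the function name redirectTarget and its parameters are hypothetical, and comparing only the path part of the URLs is my own assumption so that scheme, host and query string do not trigger a redirect.

```php
<?php
/**
 * Hypothetical helper: decides where (if anywhere) a product page should
 * redirect. Returns the target URL, or null when no redirect is needed.
 *
 * $baseUrl        URL the page was requested under ($document->base).
 * $pageCanonical  canonical link found in the document head.
 * $childCanonical the child's own routed canonical
 *                 ($this->product->canonical), or null for simple
 *                 and parent products.
 */
function redirectTarget(string $baseUrl, string $pageCanonical, ?string $childCanonical = null): ?string
{
    // For a child product, compare against the child's own canonical,
    // because the canonical in the head may point to the parent.
    $expected = $childCanonical !== null ? $childCanonical : $pageCanonical;

    // Assumption: compare paths only, ignoring scheme, host and query.
    $normalise = static function (string $url): string {
        return rtrim((string) parse_url($url, PHP_URL_PATH), '/');
    };

    return $normalise($baseUrl) === $normalise($expected) ? null : $expected;
}
```

If the function returns a URL, send the 301 redirect to it and stop; if it returns null, render the product page as usual.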
In my opinion Virtuemart should implement this somehow since a lot of crawl budget is spent and many unnecessary urls are indexed.
Thanks!!