Search crawlers try to index non-existent pagination pages

There is a site with ~15,000 products and several hundred categories and manufacturers.
I checked the access logs and found a huge number of records like this:

--- Code: ---
- - [06/Feb/2022:03:29:24 +0100] "GET /manufacturer/yelowstone/discs/new-discs/by,price/dirAsc/results,4841-4940?keyword= HTTP/1.1" 200 211370 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.80 Mobile Safari/537.36 (compatible; Googlebot/2.1; +" (-) 3929289
--- End code ---

As far as I can see, these are the products from one manufacturer in a specific category, which is fine. But the list is also sorted by price and shows products 4841-4940. The problem is that /manufacturer/yelowstone/discs/new-discs contains only two products, so these pagination values are not needed at all.
But I think I know where it comes from:
When you open a category, you get an orderby field, a manufacturer filter and pagination. If the category has a lot of products, you get lots of pages. Just open a page far down the list and you'll get something like results,4841-4940.
But if you then apply a manufacturer filter, the results parameter is kept even when there are no products in that range.

The list is created by the orderbymanu sublayout. What would be the proper way to handle this?
- remove the limitstart and limit parameters?
- add rel="nofollow"?
- something else?


You should first check whether you have a canonical link for those pages.

For example, my page is: /papiers/by,product_sku?language=fr-FR&keyword=
My canonical for this page is: /papiers

Because if not, the bots will index those pages, although in my experience Googlebot usually does not index pages with query parameters (?keyword=).
In any case, it knows how to recognize and handle them.
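A canonical URL like the one above is essentially the listing path with the sort/paging segments and the query string removed. A minimal Python sketch of that mapping, assuming VirtueMart's SEF URLs put sorting and paging in `by,...`, `dirAsc`/`dirDesc` and `results,N-M` path segments (inferred from the URLs in this thread, not from VirtueMart's code); `canonical_path` is a hypothetical helper:

```python
import re
from urllib.parse import urlsplit

# Path segments that only change sorting or paging, not which products are listed.
# Assumption: segment names inferred from the URLs quoted in this thread.
_VOLATILE_SEGMENT = re.compile(r"^(by,.*|dirAsc|dirDesc|results,\d+-\d+)$", re.IGNORECASE)


def canonical_path(url: str) -> str:
    """Return the listing path without sort/paging segments or query string."""
    path = urlsplit(url).path  # drops ?language=...&keyword=
    kept = [seg for seg in path.split("/") if not _VOLATILE_SEGMENT.match(seg)]
    return "/".join(kept) or "/"


print(canonical_path("/papiers/by,product_sku?language=fr-FR&keyword="))  # /papiers
```

Applied to the URL from the access log, the same function yields /manufacturer/yelowstone/discs/new-discs.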

But to stop bots from indexing those types of pages, you can simply add this to your .htaccess if you want:

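--- Code: ---# Assumes RewriteEngine On is already set (Joomla's default .htaccess does this) and mod_headers is enabled.
# Flag every request whose path contains a /by,... sort segment.
RewriteCond %{REQUEST_URI} (.*)/by,(.*)$ [NC]
RewriteRule ^.*$ - [ENV=NOINDFO:true]
# Send the noindex header only for flagged requests.
Header set X-Robots-Tag "noindex, follow" env=NOINDFO
--- End code ---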

So the bot will still follow the links, which is fine, but will not index those pages anymore.

You can add several conditions to the same block if you need to, joined with [OR]; for example, for your link:

--- Code: ---RewriteCond %{REQUEST_URI} (.*)/by,(.*)$ [NC,OR]
RewriteCond %{REQUEST_URI} (.*)/results,(.*)$ [NC]
RewriteRule ^.*$ - [ENV=NOINDFO:true]
Header set X-Robots-Tag "noindex, follow" env=NOINDFO
--- End code ---
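To double-check which requests these two conditions catch, the patterns can be tried against sample paths. This Python sketch only approximates Apache: it assumes `%{REQUEST_URI}` is the path without the query string, and mimics an unanchored match with `[NC]` via `re.search` and `re.IGNORECASE`; `gets_noindex` is a hypothetical helper name.

```python
import re

# The two RewriteCond patterns from the block above, combined with [OR].
# [NC] (no case) is mimicked with re.IGNORECASE.
NOINDEX_PATTERNS = [
    re.compile(r"(.*)/by,(.*)$", re.IGNORECASE),
    re.compile(r"(.*)/results,(.*)$", re.IGNORECASE),
]


def gets_noindex(request_uri: str) -> bool:
    """True if any condition matches, i.e. NOINDFO would be set."""
    return any(p.search(request_uri) for p in NOINDEX_PATTERNS)


for uri in [
    "/manufacturer/yelowstone/discs/new-discs/by,price/dirAsc/results,4841-4940",
    "/discs/new-discs/results,101-200",
    "/manufacturer/yelowstone/discs/new-discs",
]:
    print(uri, "->", "noindex" if gets_noindex(uri) else "no header")
```

Only the plain category path is left without the header, which is the intended result.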

And do the same for pages that you do want the bots to index and follow, for example:

--- Code: ---# Matches the category URL itself, with no sort or paging segments after it.
RewriteCond %{REQUEST_URI} (.*)/new-discs/?$ [NC]
RewriteRule ^.*$ - [ENV=INDFO:true]
Header set X-Robots-Tag "index, follow" env=INDFO
--- End code ---


Or you can create a template override of the view and add:

--- Code: ---// SEO: keep this view out of search engine indexes
$document = JFactory::getDocument();
$document->setMetaData('robots', 'noindex, nofollow');
--- End code ---

And for correct canonical links you can use the great Jmap extension.

Hi, thanks for the reply. Maybe the post's subject is misleading. The problem is not really about the crawlers; I already have canonical URLs. My question is why VirtueMart generates pagination URLs when they are not needed.
Again, the steps:

* open a category page with lots of products, with pagination set to e.g. 50 products per page
* open the 10th page; you get results,451-500
* select a manufacturer from the top filter. It filters the current category by the selected manufacturer. If the manufacturer has only e.g. 60 products in the category, you only need page 1 and page 2, but you land on page 10, which is empty and unnecessary. What is the point of that? Why link to a page far beyond the number of products?
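The arithmetic behind these steps: with 50 products per page, page n covers products 50(n-1)+1 to 50n, so page 10 is results,451-500, while 60 products need only ceil(60/50) = 2 pages. A small Python sketch; the `results,start-end` label format is taken from the URLs above, the function names are hypothetical, and the label does not cap the end at the product total:

```python
import math


def pages_needed(total: int, per_page: int) -> int:
    """Number of pages a listing of `total` products needs (at least 1)."""
    return max(1, math.ceil(total / per_page))


def results_label(page: int, per_page: int) -> str:
    """SEF-style label for a 1-based page number, e.g. 'results,451-500'."""
    start = (page - 1) * per_page + 1
    return f"results,{start}-{page * per_page}"


def page_is_empty(page: int, total: int, per_page: int) -> bool:
    """A page is empty when its 0-based start offset is at or past the total."""
    return (page - 1) * per_page >= total


print(results_label(10, 50))      # results,451-500
print(pages_needed(60, 50))       # 2
print(page_is_empty(10, 60, 50))  # True
```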

So in my opinion the manufacturer filter in the category list should open the first page of the list, which means the links in the manufacturer filter list should not contain the limit and limitstart parameters.
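A minimal sketch of that proposal: when building each manufacturer link, drop the paging state so the link always opens page 1 while keeping the sort order. This assumes paging can show up either as a `results,N-M` SEF path segment or as `limit`/`limitstart` query parameters; `manufacturer_filter_link` is a hypothetical Python helper, not the actual orderbymanu sublayout code.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Paging state that should not survive a filter change (assumed names).
_PAGING_SEGMENT = re.compile(r"^results,\d+-\d+$", re.IGNORECASE)
_PAGING_PARAMS = {"limit", "limitstart"}


def manufacturer_filter_link(url: str) -> str:
    """Return `url` with pagination removed, so it opens the first page."""
    parts = urlsplit(url)
    # Drop the results,N-M path segment but keep sorting (by,..., dirAsc).
    path = "/".join(
        seg for seg in parts.path.split("/") if not _PAGING_SEGMENT.match(seg)
    )
    # Drop limit/limitstart, keep every other query parameter (even empty ones).
    query = urlencode(
        [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
         if k.lower() not in _PAGING_PARAMS]
    )
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


print(manufacturer_filter_link(
    "/manufacturer/yelowstone/discs/new-discs/by,price/dirAsc/results,4841-4940?keyword="
))  # /manufacturer/yelowstone/discs/new-discs/by,price/dirAsc?keyword=
```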


--- Quote ---- open the 10th page, you get results,451-500
- select a manufacturer from the top filter.

--- End quote ---

Select manufacturer on 10th page?

The bug is in the manufacturer select: selecting a manufacturer should first clear the earlier filtering and pagination info stored in cookies or the session, and set pagination to the first page for that manufacturer.
I think this is work for the VM dev team.
But I haven't tested it on a clean VM installation :(
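The fix described here can be modelled as a state update: choosing a manufacturer resets the stored offset, and any stored offset past the end of the filtered list falls back to the first page. This is a hypothetical Python model of the cookie/session state (`ListState` and its fields are illustrative names), not VirtueMart's actual session handling:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ListState:
    """Hypothetical model of the list state kept in the cookie/session."""
    manufacturer_id: Optional[int] = None
    limitstart: int = 0  # 0-based offset of the first product on the page
    limit: int = 50      # products per page


def apply_manufacturer_filter(state: ListState, manufacturer_id: int) -> ListState:
    """Selecting a manufacturer keeps the page size but restarts at page 1."""
    return ListState(manufacturer_id=manufacturer_id, limitstart=0, limit=state.limit)


def clamp_offset(state: ListState, total: int) -> ListState:
    """Fall back to page 1 if the stored offset points past the last product."""
    if state.limitstart >= total:
        return ListState(state.manufacturer_id, 0, state.limit)
    return state


# Page 10 of the unfiltered category (results,451-500), then pick a manufacturer.
state = apply_manufacturer_filter(ListState(limitstart=450, limit=50), manufacturer_id=7)
print(state.limitstart)  # 0
```

Resetting on every filter change is the simpler rule; clamping is a fallback for offsets restored from an old cookie or session.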


