Hello, can someone help me understand why the crawler picks up strange addresses on my site? There are a lot of these links, and I would like to understand the reason.
I am using the latest Joomla and VM versions as of today.
<url><loc>https://www.mysite.lt/products?cff_3%5B0%5D=536175736169206f646169</loc><lastmod>2020-03-21</lastmod></url>
<url><loc>https://www.mysite.lt/products?cff_3%5B0%5D=2050726965c5a1206f646f732073656ec4976a696dc485</loc><lastmod>2020-03-21</lastmod></url>
<url><loc>https://www.mysite.lt/products?cff_3%5B0%5D=204d69c5a172696169204f646169</loc><lastmod>2020-03-21</lastmod></url>
<url><loc>https://www.mysite.lt/products?cff_3%5B0%5D=204272616e64c5be696169206f646169</loc><lastmod>2020-03-21</lastmod></url>
<url><loc>https://www.mysite.lt/products?cff_3%5B0%5D=20506c61756b616d73</loc><lastmod>2020-03-21</lastmod></url>
<url><loc>https://www.mysite.lt/products?cff_3%5B0%5D=2050726965c5a1207069676d656e746163696ac485</loc><lastmod>2020-03-21</lastmod></url>
<url><loc>https://www.mysite.lt/products?cff_3%5B0%5D=204f646f7320c5a176656974696d6173</loc><lastmod>2020-03-21</lastmod></url>
<url><loc>https://www.mysite.lt/products?cff_3%5B0%5D=2050726965c5a120616b6ec49920</loc><lastmod>2020-03-21</lastmod></url>
<url><loc>https://www.mysite.lt/products?cff_9%5B0%5D=20c5a0616d70c5ab6e6173</loc><lastmod>2020-03-21</lastmod></url>
<url><loc>https://www.mysite.lt/products?cff_9%5B0%5D=20536572756d6173</loc><lastmod>2020-03-21</lastmod></url>
<url><loc>https://www.mysite.lt/products?cff_9%5B0%5D=205069656e656c6973</loc><lastmod>2020-03-21</lastmod></url>
<url><loc>https://www.mysite.lt/products?cff_9%5B0%5D=4b72656d6173</loc><lastmod>2020-03-21</lastmod></url>
<url><loc>https://www.mysite.lt/products?cff_9%5B0%5D=204b61756bc497</loc><lastmod>2020-03-21</lastmod></url>
Thank you!
Seems that the crawler follows the filters, like on this page:
https://www.tuttoperlamoto.it/shop/by,product_sku/results,1-60.html?keyword=&viewmode=list&cff_3%5B0%5D=3630&cff_3%5B1%5D=3437&cff_3%5B2%5D=35584c2f36584c&cff_3%5B3%5D=5332584c&cff_3%5B4%5D=33584c2f34584c&cff_3%5B5%5D=31312e35202f203436&cff_3%5B6%5D=3034&cff_3%5B7%5D=3536&cff_3%5B8%5D=4c2d4c&cff_3%5B9%5D=3432&cff_3%5B10%5D=5853&cff_3%5B11%5D=382e35202f2034302e35
Is that yours? :)
Not mine, but the same template. :)
And it seems you are right, it's the filtering.
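For anyone wondering what the long values in those links are: they look like the filter option labels, hex-encoded as UTF-8. That is only inferred from the URLs above, not from the filter module's documentation. A minimal Python sketch to check it (the function name `decode_filter_values` is made up for this example):

```python
from urllib.parse import urlsplit, parse_qsl


def decode_filter_values(url: str) -> dict:
    """Decode cff_* query values, assuming they are hex-encoded UTF-8 text."""
    decoded = {}
    # parse_qsl also percent-decodes the keys, so cff_3%5B0%5D becomes cff_3[0].
    for key, value in parse_qsl(urlsplit(url).query):
        if not key.startswith("cff_"):
            continue
        try:
            text = bytes.fromhex(value).decode("utf-8")
        except ValueError:
            # Not valid hex or not valid UTF-8: keep the raw value.
            text = value
        decoded.setdefault(key, []).append(text)
    return decoded


print(decode_filter_values("https://www.mysite.lt/products?cff_9%5B0%5D=4b72656d6173"))
# {'cff_9[0]': ['Kremas']}
print(decode_filter_values("https://www.mysite.lt/products?cff_3%5B0%5D=536175736169206f646169"))
# {'cff_3[0]': ['Sausai odai']}
```

So each strange address is one filter option applied to the products page, which would explain why the crawler finds so many of them.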
Yes, it's the VirtualPlanet filter thing.
Try
Disallow: /*?cff
in your robots.txt.
Thanks GJC
I had a similar problem, so I use the following in robots.txt to keep crawlers away from the unwanted sorting, ordering and pagination URLs:
Disallow: /?start=*
Disallow: /*by,product_name*
Disallow: /*by,created_on*
Disallow: /*by,product_price*
Disallow: /*dirDesc*
Disallow: /*dirAsc*
Disallow: /*results,*
Of course, the by,xxxx entries depend on which sorting options are in use.
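To check which URLs a given Disallow pattern would actually cover, here is a rough Python sketch. It assumes the usual wildcard convention (`*` matches any run of characters, a trailing `$` anchors the end, otherwise the pattern is a prefix match on path plus query). The helper name `rule_matches` is made up, and real crawlers may differ in details, so treat it as a sanity check only. Python's built-in `urllib.robotparser` is not used because, as far as I know, it does not handle `*` wildcards.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path (with query)."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then turn each '*' into "any characters".
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start, which gives the prefix-match behaviour.
    return re.match(regex, path) is not None


print(rule_matches("/*?cff", "/products?cff_9%5B0%5D=4b72656d6173"))
# True
print(rule_matches("/*?cff", "/shop/results,1-60.html?keyword=&viewmode=list&cff_3%5B0%5D=3630"))
# False
print(rule_matches("/*results,*", "/shop/by,product_sku/results,1-60.html"))
# True
```

Under this reading, `/*?cff` only catches URLs where the filter is the first query parameter. When it comes after other parameters, as in the tuttoperlamoto.it link, a broader pattern such as `/*cff_` (my own suggestion, not from the posts above) would be needed. Likewise, `/?start=*` only covers pagination on the site root, not on pages like `/products?start=20`.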