VirtueMart Forum

VirtueMart 2 + 3 + 4 => Security (https) / Performance / SEO, SEF, URLs => Topic started by: NoOneLt on March 21, 2020, 09:49:27 AM

Title: ?cff_ pattern in crawled addresses for sitemap
Post by: NoOneLt on March 21, 2020, 09:49:27 AM
Hello, can someone help me understand why the crawler picks up strange addresses on my site? There are a lot of these links, and I would like to understand the reason.

I am using the latest Joomla and VM versions as of today.

<url><loc>https://www.mysite.lt/products?cff_3%5B0%5D=536175736169206f646169</loc><lastmod>2020-03-21</lastmod></url>
<url><loc>https://www.mysite.lt/products?cff_3%5B0%5D=2050726965c5a1206f646f732073656ec4976a696dc485</loc><lastmod>2020-03-21</lastmod></url>
<url><loc>https://www.mysite.lt/products?cff_3%5B0%5D=204d69c5a172696169204f646169</loc><lastmod>2020-03-21</lastmod></url>
<url><loc>https://www.mysite.lt/products?cff_3%5B0%5D=204272616e64c5be696169206f646169</loc><lastmod>2020-03-21</lastmod></url>
<url><loc>https://www.mysite.lt/products?cff_3%5B0%5D=20506c61756b616d73</loc><lastmod>2020-03-21</lastmod></url>
<url><loc>https://www.mysite.lt/products?cff_3%5B0%5D=2050726965c5a1207069676d656e746163696ac485</loc><lastmod>2020-03-21</lastmod></url>
<url><loc>https://www.mysite.lt/products?cff_3%5B0%5D=204f646f7320c5a176656974696d6173</loc><lastmod>2020-03-21</lastmod></url>
<url><loc>https://www.mysite.lt/products?cff_3%5B0%5D=2050726965c5a120616b6ec49920</loc><lastmod>2020-03-21</lastmod></url>
<url><loc>https://www.mysite.lt/products?cff_9%5B0%5D=20c5a0616d70c5ab6e6173</loc><lastmod>2020-03-21</lastmod></url>
<url><loc>https://www.mysite.lt/products?cff_9%5B0%5D=20536572756d6173</loc><lastmod>2020-03-21</lastmod></url>
<url><loc>https://www.mysite.lt/products?cff_9%5B0%5D=205069656e656c6973</loc><lastmod>2020-03-21</lastmod></url>
<url><loc>https://www.mysite.lt/products?cff_9%5B0%5D=4b72656d6173</loc><lastmod>2020-03-21</lastmod></url>
<url><loc>https://www.mysite.lt/products?cff_9%5B0%5D=204b61756bc497</loc><lastmod>2020-03-21</lastmod></url>

Thank you!
Title: Re: ?cff_ pattern in crawled addresses for sitemap
Post by: jjk on March 21, 2020, 10:57:00 AM
Seems that the crawler follows the filters, like on this page:
https://www.tuttoperlamoto.it/shop/by,product_sku/results,1-60.html?keyword=&viewmode=list&cff_3%5B0%5D=3630&cff_3%5B1%5D=3437&cff_3%5B2%5D=35584c2f36584c&cff_3%5B3%5D=5332584c&cff_3%5B4%5D=33584c2f34584c&cff_3%5B5%5D=31312e35202f203436&cff_3%5B6%5D=3034&cff_3%5B7%5D=3536&cff_3%5B8%5D=4c2d4c&cff_3%5B9%5D=3432&cff_3%5B10%5D=5853&cff_3%5B11%5D=382e35202f2034302e35

Is that yours? :)
Title: Re: ?cff_ pattern in crawled addresses for sitemap
Post by: NoOneLt on March 21, 2020, 11:24:18 AM
Not mine, but the same template. :)

And it seems you are right, it's the filtering.
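
For anyone wondering what the long values after cff_ are: they look like the selected filter option text, written as UTF-8 bytes in hex. Here is a minimal Python sketch for checking that, assuming the values really are hex-encoded UTF-8. The function name decode_cff_params is only made up for this example and is not part of VirtueMart or the filter module.

```python
from urllib.parse import urlsplit, parse_qsl

def decode_cff_params(url):
    """Return (parameter name, decoded text) pairs for cff_* query parameters."""
    decoded = []
    # parse_qsl percent-decodes names, so cff_3%5B0%5D becomes cff_3[0].
    for name, value in parse_qsl(urlsplit(url).query):
        if name.startswith("cff_"):
            # Assumption: the value is UTF-8 text written as hex digits.
            decoded.append((name, bytes.fromhex(value).decode("utf-8")))
    return decoded

urls = [
    "https://www.mysite.lt/products?cff_3%5B0%5D=536175736169206f646169",
    "https://www.mysite.lt/products?cff_9%5B0%5D=4b72656d6173",
]
for url in urls:
    print(decode_cff_params(url))
```

For the two addresses above this gives "Sausai odai" and "Kremas", which are filter option labels. That fits the explanation that the crawler is simply following every filter link on the page.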
Title: Re: ?cff_ pattern in crawled addresses for sitemap
Post by: GJC Web Design on March 21, 2020, 12:10:55 PM
yes .. the VirtualPlanet filter thing

try

Disallow: /*?cff

in your robots.txt
Title: Re: ?cff_ pattern in crawled addresses for sitemap
Post by: NoOneLt on March 21, 2020, 20:50:37 PM
Thanks GJC
Title: Re: ?cff_ pattern in crawled addresses for sitemap
Post by: Ventsi Genchev on March 24, 2020, 08:47:02 AM
I had a similar problem, so I use the following rules in robots.txt to keep crawlers away from the unwanted sorting and pagination URLs.

Disallow: /?start=*
Disallow: /*by,product_name*
Disallow: /*by,created_on*
Disallow: /*by,product_price*
Disallow: /*dirDesc*
Disallow: /*dirAsc*
Disallow: /*results,*


Of course, by,xxxx depends on the sorting options used.
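
Before relying on rules like these, it can help to test them against a few real addresses. Below is a rough Python sketch of the wildcard matching that the major search engines document for robots.txt ("*" matches any run of characters, a trailing "$" anchors the end, and a pattern is matched from the start of the path). It is a simplification: it ignores Allow lines, User-agent groups and rule precedence, and the helper names rule_to_regex and is_disallowed are made up for this example. As far as I know, Python's built-in urllib.robotparser does not handle these wildcards, hence the hand-written matcher.

```python
import re

def rule_to_regex(rule):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and let each '*' match any run of characters.
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))

def is_disallowed(path_and_query, rules):
    """True if any Disallow pattern matches from the start of the path."""
    return any(rule_to_regex(rule).match(path_and_query) for rule in rules)

# Patterns taken from this thread.
RULES = [
    "/*?cff",
    "/?start=*",
    "/*by,product_name*",
    "/*by,created_on*",
    "/*by,product_price*",
    "/*dirDesc*",
    "/*dirAsc*",
    "/*results,*",
]

for path in [
    "/products?cff_9%5B0%5D=4b72656d6173",
    "/shop/by,product_sku/results,1-60.html",
    "/?start=20",
    "/products?start=20",
    "/products",
]:
    print(path, is_disallowed(path, RULES))
```

One thing this makes visible: under these matching rules, /?start=* only covers a start parameter on the home page, because the pattern has to match from the beginning of the path. An address such as /products?start=20 is not covered, so a pattern like /*?start= may be closer to what is wanted if pagination on category pages should be blocked too.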