VirtueMart Forum

VirtueMart 2 + 3 + 4 => Security (https) / Performance / SEO, SEF, URLs => Topic started by: lliseil on August 18, 2014, 18:04:29 PM

Title: Google indexes lots of Virtuemart's code strings unseen in page's source
Post by: lliseil on August 18, 2014, 18:04:29 PM
Hi,
Here are the main keywords Google indexed on a Virtuemart 2 e-shop we migrated:
1.  virtuemart_category_id    
2.  category_child_id    
3.  ordering    
4.  xref    
5.  virtuemart_media_id
(I have attached a Webmaster Tools screenshot.)
I'm wondering where it gets them from, since they're _not_ in any page's source.
Does anyone have more of a clue than I do, please?

[attachment cleanup by admin]
Title: Re: Google indexes lots of Virtuemart's code strings unseen in page's source
Post by: jjk on August 18, 2014, 20:25:13 PM
I suppose you don't have SEF URLs enabled and Google is extracting keywords from your URLs.
Title: Re: Google indexes lots of Virtuemart's code strings unseen in page's source
Post by: lliseil on August 19, 2014, 19:57:58 PM
@jjk You'd think I would have specified that point, **kof kof** sorry about that. SEF is activated and GWT shows no duplicates: sitemap.xml (http://www.guixmodel.fr/sitemap.xml)
The main indexed URLs are like: domain/main-category/category/subcategory/product-detail
There are others with URL parameters appended, like {/by,product_price,results,1-0?filter_product=}. But none of them contain the SQL strings that Google indexed.

Nevertheless, the SERP for "virtuemart_category_id site:domain" displays no fewer than 169 results, and none of the pages I've checked manually contains any of the aforementioned VirtueMart strings in its source.
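
The manual check described above could be scripted. This is only a minimal sketch: the function name `find_suspect_strings` is made up for illustration, and the list simply repeats the five strings from the first post. It looks for each string as a whole word in a page's raw HTML, so that a word like "reordering" does not count as a hit for "ordering".

```python
import re

# The five strings listed in the first post of this thread.
SUSPECT_STRINGS = [
    "virtuemart_category_id",
    "category_child_id",
    "ordering",
    "xref",
    "virtuemart_media_id",
]


def find_suspect_strings(html, needles=SUSPECT_STRINGS):
    """Return the needles that occur as whole words in the raw HTML (hypothetical helper)."""
    found = []
    for needle in needles:
        # Reject matches that are glued to other letters, digits or underscores.
        pattern = r"(?<![A-Za-z0-9_])" + re.escape(needle) + r"(?![A-Za-z0-9_])"
        if re.search(pattern, html):
            found.append(needle)
    return found
```

The `html` argument would be the source of one indexed page, for example read with `urllib.request.urlopen(url).read().decode("utf-8", errors="ignore")`; an empty result for every URL in the sitemap would match what was seen by hand.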

Remember that, according to GWT, these strings are among this site's top 20 "keywords" :o
I'm really wondering where they come from.
Title: Re: Google indexes lots of Virtuemart's code strings unseen in page's source
Post by: jjk on August 19, 2014, 21:24:53 PM
Just a shot in the dark - looks like you are using an "Autocomplete search plugin for VirtueMart". I suppose this stores searches in a cache folder, which the Googlebots find and include in the index. If that's the case, you could disallow that folder for the robots.
Title: Re: Google indexes lots of Virtuemart's code strings unseen in page's source
Post by: lliseil on August 20, 2014, 19:30:45 PM
@jjk Thank you for looking into it with professional eyes! Nice find indeed.
However, the /cache and /plugins directories are disallowed in robots.txt (and have been since the site's birth). Also, I believe 'virtuemart_category_id', '<DB-prefix>virtuemart_categories_fr_fr' and the rest aren't printed in the results that VM Search Autocomplete displays via Ajax, are they? They don't show up in the source code anyway.
Still lookin ":-|
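
To double-check that the robots.txt rules really cover those folders, Python's standard `urllib.robotparser` can evaluate them offline. This is a rough sketch under assumptions: the real robots.txt was not posted, so `ROBOTS_TXT` below is an assumed equivalent, and `is_blocked` is a made-up helper name.

```python
from urllib.robotparser import RobotFileParser

# Assumed content: the site's actual robots.txt was not posted in this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cache/
Disallow: /plugins/
"""


def is_blocked(path, robots_txt=ROBOTS_TXT, agent="Googlebot"):
    """Return True if the given robots.txt rules forbid `agent` from fetching `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, path)
```

For instance, `is_blocked("/cache/some-file.html")` should come back `True` with the rules above, while a normal SEF product URL should come back `False`.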
Title: Re: Google indexes lots of Virtuemart's code strings unseen in page's source
Post by: jjk on August 20, 2014, 20:53:41 PM
Quote from: lliseil on August 20, 2014, 19:30:45 PM
Still lookin ":-|
Perhaps discuss it with the developer of the autocomplete plugin (Daycounts, I suppose).
Title: Re: Google indexes lots of Virtuemart's code strings unseen in page's source
Post by: lliseil on August 21, 2014, 18:00:54 PM
Thank you for the tips jjk.
Quote: Still lookin ":-|
I meant that I'm still looking, both in the page source and by grepping the website's cache ;D Sorry for being unclear.
Daycounts, yes. I tried to ask them, but their forum is locked (they're migrating to a ticket system) and contact details are nowhere to be seen. Trying Twitter.
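
For anyone who wants to repeat the cache search without shell access to `grep`, here is a small sketch of the same idea. The name `grep_tree` and the example path in the note are assumptions, not anything VirtueMart or Joomla ships.

```python
import os


def grep_tree(root, needles):
    """Map each file under `root` to the needles found in it; files without hits are omitted."""
    hits = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Ignore undecodable bytes so binary cache files do not abort the scan.
                with open(path, "r", encoding="utf-8", errors="ignore") as handle:
                    text = handle.read()
            except OSError:
                continue  # unreadable file, skip it
            found = [needle for needle in needles if needle in text]
            if found:
                hits[path] = found
    return hits
```

It could be called as `grep_tree("/path/to/site/cache", ["virtuemart_category_id", "xref"])`, where the path is a placeholder for wherever the site's cache folder lives. An empty dictionary means none of the strings were found.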
Title: Re: Google indexes lots of Virtuemart's code strings unseen in page's source
Post by: lliseil on August 22, 2014, 19:28:04 PM
According to Daycounts, who answered very quickly:
Quote: Cache is disabled in VM Autocomplete. The result is refreshed in Ajax as you type.
If I'm not mistaken, that rules out one possible reason for the VirtueMart code and SQL strings being indexed so thoroughly by search engines.
Unfortunately, my present knowledge leaves me with no clue as to where Google finds these VM strings, especially those beginning with the DB prefix ???
Title: Re: Google indexes lots of Virtuemart's code strings unseen in page's source
Post by: jjk on August 22, 2014, 20:14:39 PM
Maybe your site was visited by a Googlebot while you had Joomla's "Debug System" enabled.
Title: Re: Google indexes lots of Virtuemart's code strings unseen in page's source
Post by: lliseil on August 23, 2014, 11:20:47 AM
Hmmm, I checked, and Debug was enabled on this site for a few hours last week, but only for logged-in admins (as VirtueMart allows).
I don't know whether Google could see the debug strings. If it did, then I bet I just have to wait until it replaces the unwanted indexed strings with the site's content. Hopefully that will reveal the cause of this strange indexing behaviour. Thank you, jjk, for pointing it out!