Google indexes lots of VirtueMart code strings that are not in the pages' source

Started by lliseil, August 18, 2014, 18:04:29 PM


lliseil

Hi,
Here are the main keywords Google indexed on a Virtuemart 2 e-shop we migrated:
1.  virtuemart_category_id    
2.  category_child_id    
3.  ordering    
4.  xref    
5.  virtuemart_media_id
(I have attached a Webmaster Tools screenshot.)
I am wondering where it gets them from, since they're _not_ in any page's source.
Does anyone have more of a clue than I do, please?

[attachment cleanup by admin]
- A thread solved is a thread [SOLVED]! (a memo for myself)

jjk

I suppose you don't have SEF urls enabled and Google is extracting keywords from your urls.
Non-English Shops: Are your language files up to date?
http://virtuemart.net/community/translations

lliseil

@jjk You'd think I would have mentioned that point, **kof kof** sorry about that. SEF is enabled and GWT shows no duplicates: sitemap.xml
The main indexed URLs look like: domain/main-category/category/subcategory/product-detail
There are others with URL parameters appended, like {/by,product_price,results,1-0?filter_product=}. But none of them contains the SQL strings that Google indexed.

Nevertheless, a search for "virtuemart_category_id site:domain" returns no fewer than 169 results, and none of the pages I have checked manually contains any of the aforementioned VirtueMart strings in its source.
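
For anyone wanting to automate that manual check, here is a minimal sketch. It assumes the page source has already been saved or fetched as text. The function name `find_indexed_strings` and the sample HTML are hypothetical; the strings are the ones from the GWT keyword list in the first post.

```python
import re

# Strings that GWT reported as keywords (taken from the list in the first post).
SUSPECT_STRINGS = [
    "virtuemart_category_id",
    "category_child_id",
    "ordering",
    "xref",
    "virtuemart_media_id",
]


def find_indexed_strings(html, needles=SUSPECT_STRINGS):
    """Return the needles that occur as whole tokens in the raw HTML.

    Searching the raw source (not the rendered text) also catches
    occurrences hidden in attributes, comments or inline scripts.
    """
    found = []
    for needle in needles:
        # Token boundaries so that "ordering" does not match "reordering".
        pattern = r"(?<![A-Za-z0-9_])" + re.escape(needle) + r"(?![A-Za-z0-9_])"
        if re.search(pattern, html):
            found.append(needle)
    return found


if __name__ == "__main__":
    # Hypothetical page source, only for illustration.
    sample = '<div><!-- xref --><a href="/shop?virtuemart_category_id=3">Shoes</a></div>'
    print(find_indexed_strings(sample))
```

An empty list for every indexed URL would support the observation that the strings are not in the served source, at least not in what a normal browser request receives.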

Remember that, according to GWT, these strings are among this site's top 20 "keywords" :o
I am really wondering where they come from.

jjk

Just a shot in the dark - looks like you are using an "Autocomplete search plugin for VirtueMart". I suppose this stores searches in a cache folder, which the Googlebots find and include in the index. If that's the case, you could disallow that folder for the robots.
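
A quick way to verify whether a folder is actually disallowed for the bots is Python's standard `urllib.robotparser`. This is only a sketch: the helper name `is_blocked`, the rules and the example.com URLs are made up for illustration, so substitute your real robots.txt content.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt, url, user_agent="Googlebot"):
    """Return True if robots_txt forbids user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical rules and URLs, for illustration only.
    rules = "User-agent: *\nDisallow: /cache/\n"
    print(is_blocked(rules, "http://example.com/cache/search.json"))
    print(is_blocked(rules, "http://example.com/main-category/category"))
```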

lliseil

@jjk Thank you for looking into it with professional eyes! Nice find indeed.
However, the /cache and /plugins directories have been disallowed in robots.txt since the site's birth. Also, I believe 'virtuemart_category_id', '<DB-prefix>virtuemart_categories_fr_fr' and the rest aren't printed in the search results that VM Search Autocomplete displays via Ajax, are they? They don't show up in the source code anyway.
Still lookin ":-|

jjk

Quote from: lliseil on August 20, 2014, 19:30:45 PM
Still lookin ":-|
Perhaps discuss it with the developer of the autocomplete plugin (Daycounts, I suppose).

lliseil

Thank you for the tips jjk.
Quote: Still lookin ":-|
I meant that I have been looking through the page source and grepping the website's cache ;D Sorry for being unclear.
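
In case it helps someone else, the cache search amounts to something like the sketch below. It is a rough Python equivalent of a recursive grep; `grep_tree` and the "cache" path are hypothetical names, not anything VirtueMart or Joomla ships.

```python
import os


def grep_tree(root, needles):
    """Map each needle to the files under root whose content contains it."""
    hits = {needle: [] for needle in needles}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", encoding="utf-8", errors="ignore") as fh:
                    text = fh.read()
            except OSError:
                continue  # unreadable file, skip it
            for needle in needles:
                if needle in text:
                    hits[needle].append(path)
    return hits


if __name__ == "__main__":
    # "cache" is a hypothetical path relative to the site root.
    results = grep_tree("cache", ["virtuemart_category_id", "xref"])
    for needle, files in results.items():
        print(needle, len(files))
```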
Daycounts, yes. I tried to ask them, but their forum is locked (they're migrating to a ticket system) and a contact address is nowhere to be seen. Trying Twitter.

lliseil

According to Daycounts, who answered very fast:
Quote: Cache is disabled in VM Autocomplete. The result is refreshed in Ajax as you type.
If I'm not mistaken, that rules out one possible cause of the VirtueMart code and SQL strings being indexed so thoroughly by search engines.
Unfortunately, my present knowledge leaves me with no clue as to where Google finds these VM strings, especially those beginning with the DB prefix ???

jjk

Maybe your site was visited by a Googlebot while you had Joomla's "Debug System" enabled.

lliseil

Hmmm, I checked, and Debug was enabled on this site for a few hours last week, but only for logged-in admins (as VirtueMart allows).
Dunno whether Google could see the debug strings. If it did, then I bet I just have to wait until it replaces the unwanted indexed strings with the site's content. Hopefully that will reveal the cause of this strange indexing behaviour. Thank you jjk for pointing it out!