Search engine BUG!!

Started by DEVflorian, January 13, 2013, 13:48:38 PM

DEVflorian

BUG causes ranking problems! Absolutely not search-engine friendly.
Because of these problems we have no option left but to block the shop via robots.txt!!

Any product can be displayed in a different category simply by changing the category alias in the URL.
I'll explain the problem using your demo shop.

Here is the Apple iPhone 4 in the category Mobile Phones:
http://demo.virtuemart.net/index.php/2012-01-13-09-33-20/virtuemart-default-layout/mobile-phones/apple-iphone-4-detail

After changing the category to, e.g., Computer:
http://demo.virtuemart.net/index.php/2012-01-13-09-33-20/virtuemart-default-layout/computer/apple-iphone-4-detail

Conclusion: we have, for example, just under 2500 products in the shop with 80 categories.
Since crawlers can reach every product in every category, that is 2500 x 80 = 200,000 pages in the index!!!! Pages with duplicate content!!!

...............................................................
Problems also arise with the sitemap, because crawlers and sitemap generators that work on the crawler principle (which all search engines use as well) run endlessly.
The only approach that works is XMAP, because it accesses VirtueMart directly and places each product in the correct category in the sitemap. UNFORTUNATELY the SEARCH ENGINES also look outside the sitemap. AND XMAP only handles very few products before it gives up anyway!!
................................................................

Another big problem is that arbitrary aliases can be given for the category in the URL, and these can be crawled as well.
For example, the Apple iPhone 4 also works in the category Virtuemart-Mobilphonecenter:
http://demo.virtuemart.net/index.php/2012-01-13-09-33-20/virtuemart-default-layout/virtuemart-mobilphonecenter/apple-iphone-4-detail

Milbo

Should I fix your bug, please support the VirtueMart project and become a member
______________________________________
Extensions approved by the core team: http://extensions.virtuemart.net/

Peter Pillen

#2
Thiz izz a german bug  ;D

* VM is already complicated in English *

I think you need to look for a canonical URL solution to solve this problem. I have programmed a little snippet for myself to do this. Blocking your URLs will only cause a loss of traffic. If you can configure a canonical URL, then Google and other search engines will combine the traffic of both URLs and show only one in the index.

I'm still improving my code, but for now this is enough for me. It echoes a canonical tag where I want it. Pretty basic for the moment.

<?php
// Map of page URL => preferred (canonical) URL.
// The key is the page on which the canonical tag should appear;
// the value is the preferred URL that the tag points to.
$canonicals = array(
    "http://www.pillini.be/en/frequently-asked-questions" => "http://www.pillini.be/en/buying-shoes-and-bags-online/frequently-asked-questions",
    // other URLs removed for brevity
);

// JURI::current() returns the current page URL without the query string.
$page_url = JURI::current();
if (array_key_exists($page_url, $canonicals)) {
    // Escape the URL before writing it into the HTML attribute.
    echo '<link rel="canonical" href="' . htmlspecialchars($canonicals[$page_url], ENT_QUOTES, 'UTF-8') . '" />';
}
?>


In the future I want to adapt this so that the last part of the SEF URL is analysed, and if two or more URLs exist with the same ending, a canonical tag is echoed. But I will need the database for this.
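
A rough sketch of that idea, with the database replaced by a plain array of known URLs. The function names (lastSegment, buildCanonicalMap) and the rule that the longest URL wins are assumptions for illustration, not anything provided by VirtueMart or Joomla:

```php
<?php
/**
 * Sketch: derive canonical targets from a list of known SEF URLs.
 * Assumption: URLs that share the same final path segment show the same
 * product, and the longest URL is the preferred one.
 */
function lastSegment(string $url): string
{
    $path = (string) parse_url($url, PHP_URL_PATH);
    $parts = explode('/', trim($path, '/'));
    return end($parts);
}

function buildCanonicalMap(array $urls): array
{
    // Group all URLs by their final path segment.
    $groups = [];
    foreach ($urls as $url) {
        $groups[lastSegment($url)][] = $url;
    }

    $map = [];
    foreach ($groups as $group) {
        if (count($group) < 2) {
            continue; // unique ending: no canonical tag needed
        }
        // Pick the longest URL as the preferred one (hypothetical rule).
        $preferred = $group[0];
        foreach ($group as $candidate) {
            if (strlen($candidate) > strlen($preferred)) {
                $preferred = $candidate;
            }
        }
        // Every other URL in the group points to the preferred one.
        foreach ($group as $url) {
            if ($url !== $preferred) {
                $map[$url] = $preferred;
            }
        }
    }
    return $map;
}
```

The returned array has the same shape as $canonicals in the snippet above (page URL as key, preferred URL as value), so it could be fed into the same lookup.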

Milbo

No, it is not a German bug, and we already have canonical URLs.

So the question is what is wrong with them. It is also very important to know the version, because we changed the behaviour for VM 2.0.18a.

franzpeter

#4
I tried that out using VM 2.0.18a. It is true that with the alias number I can go to the product from every category. But:
If I enter, for example:
http://www.mydomain.de/2013-01-06-08-37-22/asus-eeepad-bundle-tf600tg-1b016r-detail.html

the product is shown, but VM creates the correct canonical link, for example:
<link href="http://www.mydomain.de/notebook/tablet-pc/asus-eeepad-bundle-tf600tg-1b016r-detail.html" rel="canonical" />



But indeed, it seems to be a bug in the router file. Every product can be made visible in every category, and even on the start page, by simply adding the last part of the URL (your product.html, for example), and that works with every language. Instead of showing an error page because the product was called the wrong way, VM 2.0.18a allows the product to be called with the wrong URL but adds the correct canonical URL. I agree with fl that this is absolutely bad for crawlers, because they see a lot of useless links which all point to the canonical URL. Big shops with a lot of categories and maybe 50 000 products can easily have >500 000 possible links.
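
The alternative behaviour (not serving a product under the wrong URL) could look roughly like the sketch below. This is not VirtueMart's router code: resolveProductRequest and the $canonicalPaths lookup are hypothetical names, and answering a wrong category with a 301 redirect rather than an error page is an assumption. A product that legitimately belongs to several categories would need a list of allowed paths instead of a single one.

```php
<?php
/**
 * Sketch: decide how to answer a request for a product URL.
 * $canonicalPaths is a hypothetical lookup (product slug => canonical path);
 * in a real shop this information would come from the database.
 */
function resolveProductRequest(string $requestPath, array $canonicalPaths): array
{
    // Normalise to a single leading slash and no trailing slash.
    $requestPath = '/' . trim($requestPath, '/');
    $segments = explode('/', trim($requestPath, '/'));
    $slug = end($segments);

    if (!isset($canonicalPaths[$slug])) {
        // Unknown product: error page.
        return ['status' => 404, 'location' => null];
    }

    $canonical = $canonicalPaths[$slug];
    if ($requestPath === $canonical) {
        // Correct category path: render the product page.
        return ['status' => 200, 'location' => null];
    }

    // Wrong or missing category: send a permanent redirect instead of
    // rendering a duplicate page.
    return ['status' => 301, 'location' => $canonical];
}
```

With this, a crawler that follows a made-up category path ends up on the one real URL instead of indexing a second copy of the page.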

Peter Pillen

I'm not sure about this, because the canonical url in VM2 is new to me. But in my opinion it is pretty wrong. I've chosen a product that is shown in two categories.

page 1

the page url is this: http://www.pillini.be/en/webshop/womens-shoes/boot-streep-zwart-9985-detail
canonical is <link rel="canonical" href="http://www.pillini.be/en/webshop/womens-shoes/boot-streep-zwart-9985-detail">

page url = canonical url

page 2 of the same product but in different category

the page url is: http://www.pillini.be/en/webshop/womens-shoes/short-boots/boot-streep-zwart-9985-detail
canonical is <link rel="canonical" href="http://www.pillini.be/en/webshop/womens-shoes/short-boots/boot-streep-zwart-9985-detail">

again...

page url = canonical url

Isn't it supposed to be like this?

url page 1 -> with canonical url directing to page 2 (because the url is longer)
url page 2 -> no canonical needed

Or am I missing the point here?

jjk

#6
My experience so far is that Google doesn't complain about duplicate content in shops (read Google's explanation of what it considers duplicate content), because Google knows that a product in a webshop can often be reached via several different URLs.
Original citation from Google: "Examples of non-malicious duplicate content could include: ... Store items shown or linked via multiple distinct URLs ..."

In the example given in the post above, the position of your product page in the SERPs will depend to a certain extent on how well the user's search term matches your URL. If the user searches, e.g., for 'Womens short boots', the second URL would rank much higher than the first, which would be good for you :-)

Crawling my live site, Google does a pretty good job of displaying only ONE of the available 'duplicates/different urls' (Google citation: "...we'll identify what we think is the best version"). After the site had been online for a while, I noticed, e.g., that Google indexes all child products which share the description with their parent product, but usually displays only the parent product. Overall, Google has indexed approximately 2000 URLs on my site and filtered out almost 280000 (which looks vastly exaggerated to me, since the site has only approximately 600 products x 2 languages), but obviously this doesn't hurt the ranking at all, since I've got plenty of No. 1 positions. If Google thinks that your URLs are malicious, they will tell you, provided you have registered with their webmaster tools.

PS: I currently don't use any sitemap generator for my shop, because imho none of them produces good sitemaps from the shop. XMAP does work if you make a change in one of its files (I have uninstalled it in the meantime and don't remember what I changed to make it work).
Non-English Shops: Are your language files up to date?
http://virtuemart.net/community/translations

franzpeter

@jjk,

but I hope that you agree that this is a bug. In a good ecommerce system it should not be possible to reach an item from anywhere (the start page, any category, the category pages and so on) just by using the last part of the URL (the item itself), even if VM 2 puts the correct canonical URL in the source code. Take a look at the Shopware, PrestaShop or OpenCart demos to see how it should work. Adding a canonical URL to cure a logical bug is not the right way; it simply should not happen. The question with VM 2 and this bug is: what are product categories for? Right now they have barely any purpose.
Google did a nice job if it filtered out about 280 000 URLs, but there are other web crawlers too, and it is not Google's job to correct existing bugs in VM 2.

Just my opinion!

jjk

Quote from: franzpeter on January 16, 2013, 10:42:38 AM
@jjk, ...but I hope that you agree that this is a bug.
If you mean that it is possible to replace your VM category name with "my-fantasy-catagory-name" in the url of your browser and VM still displays the product, I agree (but I'm not worried about that funny feature :-)

franzpeter

@jjk,
if you declare a bug as feature I agree with you. With a little bit of marketing idle talk we could even say:
VM 2 has an amazing feature, it finds every product anywhere by just entering the last part of the url.

DEVflorian

#10
Quote from: Milbo on January 13, 2013, 14:16:34 PM
http://forum.virtuemart.net/index.php?topic=79800.0
Affects all versions.
Nobody knows Google's ranking factors exactly. But shops with more than 100 categories are nevertheless likely to run into duplicate-content problems quickly. The fact is that the principle is fundamentally wrong. This is a VM core problem.
WD has made a small step toward improvement: http://www.wd-profi.de/virtuemart-erweiterungen/174-virtuemart-seo-
Unfortunately that solution would still need to be extended, because the problem exists there as well.
As I said, XMAP is the only sitemap program designed for this. And XMAP gives up at somewhere above 4000 articles and is no longer able to generate a sitemap. Other programs that work on the crawler principle, such as http://www.xml-sitemaps.com/standalone-google-sitemap-generator.html or Google itself, keep generating endlessly. The fact is that we are unfortunately no longer able to generate a sitemap, and we already have more than 60,000 pages in the Google index.
WTF. And the worst part: nobody at VirtueMart is interested in the problem or is making an effort to fix it. Not even a ticket has been created. Thx.

jjk

Quote from: fl on January 19, 2013, 14:32:54 PM
WD has made a small step toward improvement

That was old VirtueMart 1.x stuff and definitely is not compatible with VM2.

Concerning xmap:
xmap might work if you make two adjustments in the code:
In the file components\com_xmap\views\xml\view.html.php, around lines 44-45, change the max_execution_time.
I've got mine set to 10 minutes (you will probably need more), which is:
@ini_set('max_execution_time',600);

Also I've used a slightly modified version of the file ...\plugins\xmap\com_virtuemart\com_virtuemart.php
(file is attached below)

However, since that generated sitemap is not what I want, I'm not using it. (Google has indexed all my products anyway)

[attachment cleanup by admin]

franzpeter

#12
@fl,

Exactly. The only thing that helps so far (although it is impractical with many categories and many products) is a redirect via .htaccess. Basically we would need to trap all SEF URL errors with redirects. For product xyz in the subcategory product/subcategory we already need 2 redirects: one to prevent the product from being reachable from the start page without a category, and one to prevent it from being reachable from the main category without the subcategory. Both www.domain.com/xyz.html and www.domain.com/product/xyz.html would have to redirect to www.domain.com/product/subcategory/xyz.html. Just imagine writing redirects for all those misleading links with only 200 products in 4 categories!
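
To give an idea of the number of rules involved, here is a small sketch that generates the redirect lines for one product. redirectLinesFor is a hypothetical helper; it assumes Apache's mod_alias Redirect directive and only covers URLs that leave out category levels, not arbitrary made-up category aliases, which cannot be enumerated this way:

```php
<?php
/**
 * Sketch: build mod_alias "Redirect 301" lines for one product.
 * Covers only URLs that omit trailing category levels, as in the xyz example.
 */
function redirectLinesFor(string $baseUrl, string $canonicalPath): array
{
    $segments = explode('/', trim($canonicalPath, '/'));
    $product = array_pop($segments); // e.g. "xyz.html"; $segments now holds the categories
    $target = rtrim($baseUrl, '/') . '/'
        . implode('/', array_merge($segments, [$product]));

    $lines = [];
    // Keep 0, 1, ... (n-1) leading category levels; the full path is the target.
    for ($depth = 0; $depth < count($segments); $depth++) {
        $prefix = array_slice($segments, 0, $depth);
        $source = '/' . implode('/', array_merge($prefix, [$product]));
        $lines[] = "Redirect 301 $source $target";
    }
    return $lines;
}

// Example: print the lines for the xyz product.
foreach (redirectLinesFor('http://www.domain.com', '/product/subcategory/xyz.html') as $line) {
    echo $line, "\n";
}
```

For /product/subcategory/xyz.html this yields two lines, one redirecting /xyz.html and one redirecting /product/xyz.html, so the number of rules grows with every product and every category level.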

And it is not sufficient to add a canonical URL! I see a lot of warnings in webmaster tools about duplicate title tags, produced by the strange way VM 2 allows a product to be called from everywhere with nearly any URL.

Indeed, it is a no-go for a shopping cart!

I agree with jjk,
a sitemap should not be necessary if the VM 2 code would produce good results!!!

jjk

#13
Quote from: franzpeter on January 19, 2013, 15:44:44 PM
I agree with jjk,
a sitemap should not be necessary if the VM 2 code would produce good results!!!

If VM2 code produced bad results, then why do most of my products show up in Google in first-page positions, many of them at No. 1? (Usually ahead of my competitors' shops) ;D
I'm pretty sure part of that ranking is because of the canonical URL in the generated source code of the product pages.

Peter Pillen

@jjk ... also when you're not logged in to your Google account? If I'm logged in to Google, my page also shows up in the top 5 results, but logged out ... I drop back a few pages.