vBulletin

vBulletin FAQ is dedicated to helping the forum owner build, manage and profit from his vBulletin Forum




Google Sitemaps

sarahk
07-11-2005, 03:41 AM
Joey asked me to write something about sitemaps...

Google have recently introduced XML-based sitemaps to their range of webmaster tools. It's an exciting initiative and not before time; webmasters should be hoping the format gets adopted by the other search engines too.

So why is it so good?
Google doesn't need to crawl an entire site to find new content, saving your bandwidth and theirs.
Google can find new content even if it's buried deep within the site.
You can have a navigation system designed for humans and a sitemap for the bots.

So how did it work before?
In the really early days Google relied on the date a page was last modified, as reported by the web server. That was fine when all pages were static HTML, but webmasters learnt how to spoof the date, and dynamic sites gave the date of the main script file, not the date of the data it was presenting.
Google would follow each link it found on a page and crawl on from there. If it hit stop words, bad HTML, bad links or size limits, then your page wouldn't be indexed as well as it might have been.
Webmasters learnt to build sitemap pages to help both humans and bots, and there were a lot of compromises.
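The dynamic-site problem above comes down to which date the server reports. As a rough sketch (Python, with a made-up `last_modified_header` helper), a dynamic page could report the date of the newest data it actually shows rather than the date of the script file:

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def last_modified_header(data_timestamps):
    """Build a Last-Modified header value from the newest record a page shows.

    data_timestamps: timezone-aware datetimes of the rows rendered on the page
    (a hypothetical input; where these come from depends on your site).
    """
    newest = max(data_timestamps).astimezone(timezone.utc)
    # HTTP dates are written in GMT, e.g. "Mon, 11 Jul 2005 08:03:00 GMT".
    return format_datetime(newest, usegmt=True)
```

Whether your server or forum software lets you set this header is another matter. Treat this as an illustration of the idea only.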

Just what is a Google sitemap?

A Google sitemap is an XML document (or series of documents) which lists every page on your site that you want Google to know about. You generate it, so you only include the pages you want indexed. It doesn't replace the robots exclusion protocol and the robots.txt file; you still need that.

The XML document lists the site name and its modification date, then every page with its modification date, priority and expected change frequency. In just a few bytes Google can get an accurate picture of your site and set some priorities.
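For the curious, here's roughly what generating one looks like. This is a minimal Python sketch, not Michael's script or Google's tool. `build_sitemap` is a made-up helper, and the sketch assumes the later sitemaps.org namespace rather than the one Google used in 2005.

```python
import xml.etree.ElementTree as ET

# Namespace from the later sitemaps.org protocol (an assumption; Google's
# original 2005 format used its own 0.84 namespace).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML (as a str) for an iterable of page dicts.

    Each dict needs a "loc" key and may carry "lastmod" (W3C date,
    e.g. "2005-07-11"), "changefreq" (e.g. "daily") and "priority"
    (0.0 to 1.0).
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]  # & and < are escaped for us
        # Optional fields are only written when present.
        for field in ("lastmod", "changefreq", "priority"):
            if field in page:
                ET.SubElement(url, field).text = str(page[field])
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Each page is a dict, so `build_sitemap([{"loc": "http://www.mysite.com/", "changefreq": "daily"}])` gives you a one-page sitemap to upload.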

There is no naming convention for the files yet (that should come), so you need to log in to Google at https://www.google.com/webmasters/sitemaps/ and register your sitemap.

Will Google rank me higher if I provide a sitemap?

No, it would be unrealistic to expect Google to give preferential treatment to webmasters and site owners who provide sitemaps. What will work in your favour, though, is that Google will have a better idea of your site, will index new content more quickly and will have a better idea of when to return. Every time I log in to check my sitemaps, they have checked the sitemap within the last 24 hours. That doesn't necessarily equate to actual indexing, but it does mean your site stands a better chance of being indexed accurately and more often.

How do I get a Google sitemap?

Google have produced their own tool written in Python (http://www.python.org). What's that? Good question. Your webhost probably doesn't have it installed and isn't likely to install it either. Within the PHP world the simple answer is to write one yourself, or work from an existing script.

One of the reasons webmasters like using the big systems like vBulletin is that there is a strong community working together to provide the tools to make our sites better.

Michael Brandon (http://www.time2dine.co.nz/aboutus-michael-brandon.php), a fellow kiwi, has written a great vBulletin tool which seems to do it all. You can pick the script up from http://forum.time2dine.co.nz/showthread.php?t=3976 and it's truly plug and play.

I've installed it on v3.0.1 and v3.0.7 sites and after some tiny changes (which Michael has incorporated into the script) I ran the script and out popped the sitemap. He's created a handy-dandy user interface with all the links you need to submit the feed to Google.

He's taken a conservative line and splits the sitemap into files well under the 10 MB limit, all gzipped up and sorted nicely. Private forums are excluded and permissions are respected. I'd intended to write installation instructions but there's really no need.
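If you're writing your own, the splitting and gzipping is the fiddly part. Here's a hedged Python sketch of the same idea. The 10 MB and 50,000-URL caps are my assumption of Google's limits, and the function names and `sitemap` file prefix are made up for illustration. Once the files are uploaded, `sitemap_index` builds an index listing them, so you can register a single file.

```python
import gzip
from xml.sax.saxutils import escape

# Assumed caps: 10 MB uncompressed and 50,000 URLs per file.
# Lower them if you want an extra safety margin.
MAX_BYTES = 10 * 1024 * 1024
MAX_URLS = 50_000

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"  # assumed namespace
HEADER = '<?xml version="1.0" encoding="UTF-8"?>\n<urlset xmlns="%s">\n' % NS
FOOTER = "</urlset>\n"


def url_entry(loc):
    return "  <url><loc>%s</loc></url>\n" % escape(loc)


def split_sitemaps(locs, max_bytes=MAX_BYTES, max_urls=MAX_URLS):
    """Yield sitemap documents (bytes), starting a new one before a cap is hit."""
    head, foot = HEADER.encode("utf-8"), FOOTER.encode("utf-8")
    overhead = len(head) + len(foot)
    entries, size = [], overhead
    for loc in locs:
        entry = url_entry(loc).encode("utf-8")
        if entries and (size + len(entry) > max_bytes or len(entries) >= max_urls):
            yield head + b"".join(entries) + foot
            entries, size = [], overhead
        entries.append(entry)
        size += len(entry)
    if entries:
        yield head + b"".join(entries) + foot


def write_gzipped(locs, prefix="sitemap"):
    """Write prefix1.xml.gz, prefix2.xml.gz, ... and return the file names."""
    names = []
    for i, doc in enumerate(split_sitemaps(locs), start=1):
        name = "%s%d.xml.gz" % (prefix, i)
        with gzip.open(name, "wb") as fh:
            fh.write(doc)
        names.append(name)
    return names


def sitemap_index(sitemap_urls):
    """Return a sitemap index document listing the uploaded sitemap URLs."""
    items = "".join("  <sitemap><loc>%s</loc></sitemap>\n" % escape(u)
                    for u in sitemap_urls)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<sitemapindex xmlns="%s">\n' % NS + items + "</sitemapindex>\n")
```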

All that's left to do is remember to run it regularly - or set up a cron job to run it for you daily.

But I can't do Cron Jobs!

There are scripts out there to emulate standard cron jobs such as pseudo-cron (http://www.bitfolge.de/pseudocron-en.html) and PHP Cron (http://sarah.users.phpclasses.org/browse/package/2392.html) - but that's another article for another day.
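Roughly speaking, emulators like these piggyback on page views. Each request checks when the job last ran and runs it if it's overdue. Here's a minimal Python sketch of that general idea. The `maybe_run` helper and the `sitemap.lastrun` stamp file are made up, and this is not how either package above is actually implemented.

```python
import os
import time

STAMP_FILE = "sitemap.lastrun"  # hypothetical file recording the last run
INTERVAL = 24 * 60 * 60         # run at most once a day


def maybe_run(job, stamp_file=STAMP_FILE, interval=INTERVAL, now=None):
    """Run job() if at least `interval` seconds have passed since the last run.

    Call this on every page request; the first visitor after the interval
    triggers the job. Returns True if the job ran.
    """
    now = time.time() if now is None else now
    try:
        last = os.path.getmtime(stamp_file)
    except OSError:
        last = None  # stamp missing: the job has never run
    if last is not None and now - last < interval:
        return False
    # Update the stamp before running, so overlapping requests are less
    # likely to start the job twice (and a failed job waits a full interval).
    with open(stamp_file, "w"):
        pass
    os.utime(stamp_file, (now, now))
    job()
    return True
```

The catch is that a quiet site may go a while without a run, and the unlucky visitor who triggers the job waits for it to finish.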

What if my site isn't 100% vBulletin?

Don't worry, the vBulletin script lets you add other pages via its vbsitemap-xtrafile.php script, so you can incorporate your entire site.
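The format vbsitemap-xtrafile.php expects isn't covered here, so the following is not that script. If you're rolling your own, though, merging the forum pages with the rest of your site is simple enough. Here's a Python sketch with a made-up `merge_pages` helper that keeps one entry per URL along with the newest date:

```python
def merge_pages(*sources):
    """Merge page lists, keeping one entry per URL with the newest lastmod.

    Each source is an iterable of (loc, lastmod) pairs. lastmod is assumed
    to be an ISO date string such as "2005-07-11", so plain string
    comparison puts the dates in order.
    """
    merged = {}
    for source in sources:
        for loc, lastmod in source:
            if loc not in merged or lastmod > merged[loc]:
                merged[loc] = lastmod
    return sorted(merged.items())  # sorted by URL for a tidy sitemap
```

For example, you might call `merge_pages(forum_pages, extra_pages)`, where `forum_pages` comes from your forum and `extra_pages` is a hand-maintained list of your other pages.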

And what if I miss a page or 100?

Google tell us that they will still index our sites fully, and pages left out of the sitemap won't be penalised.

Other types of sites

If you are using a CMS or another standard system, go to that project's website, look for the forum or community links and see if someone has already submitted a Google sitemap generator.
Or you can use Xenu to generate its own sitemap and then search on Google for a converter. There are a few that use Excel.
And finally, sit down and write one which matches your site's requirements.
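For the "write one yourself" route, the core is small. Here's a hedged Python sketch. It assumes you've exported your crawled URLs (from Xenu or similar) as a plain text file with one URL per line, so check what your tool actually produces. `urls_to_sitemap` and the host name are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit
from xml.sax.saxutils import escape


def urls_to_sitemap(lines, host):
    """Turn a plain list of crawled URLs into sitemap XML.

    Keeps only http(s) URLs on `host` (lowercase), drops #fragments and
    duplicates, and preserves the order URLs were first seen.
    """
    seen, locs = set(), []
    for line in lines:
        url = line.strip()
        if not url:
            continue
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or parts.hostname != host:
            continue  # skip mailto:, off-site links and so on
        # A fragment points inside a page, never to a separate page.
        url = urlunsplit(parts._replace(fragment=""))
        if url not in seen:
            seen.add(url)
            locs.append(url)
    body = "".join("  <url><loc>%s</loc></url>\n" % escape(u) for u in locs)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            + body + "</urlset>\n")
```

Something like `urls_to_sitemap(open("urls.txt"), "www.mysite.com")` then gives you the XML to save and upload. The namespace is the later sitemaps.org one, which is an assumption on my part.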

And Finally, Ethics

Webmasters with generic sites are going to learn pretty quickly how to create enormous sitemaps full of dinky URLs like http://www.mysite.com/find/something-about-nothing.html, where a script parses the file name, calls a search engine for results and pretends the page actually exists. You know the type; you find them all the time. This won't be news to Google and I'm confident (fingers crossed) that they have a plan for dealing with it. After all, with this approach you could have a handful of scripts, an enormous sitemap and nothing else!
Modified dates will remain open to abuse by some webmasters. The script mentioned above uses the database record of when a post is actually saved. It doesn't adjust the date when a signature on the page is updated, when a displayed RSS feed is updated or when a blog entry is updated. That's good, because those are artificial page fresheners which have been used in the past to fool Googlebot into thinking there was fresh content. Unscrupulous webmasters will find ways to manipulate Google, but don't think for a minute that the geeks at Google aren't thinking just as hard.
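To make the "real date only" point concrete, here's a Python sketch that takes each thread's date from its most recently saved post and nothing else. The `Post` record and its field names are hypothetical, not vBulletin's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Post:
    # Hypothetical record; field names are illustrative only.
    thread_id: int
    saved_at: datetime  # when the post itself was saved (timezone-aware)


def thread_lastmod(posts):
    """Map thread_id to the W3C datetime of its most recently saved post.

    Only post save times count. Signature edits, embedded RSS feeds and
    other page decoration are deliberately ignored.
    """
    latest = {}
    for post in posts:
        current = latest.get(post.thread_id)
        if current is None or post.saved_at > current:
            latest[post.thread_id] = post.saved_at
    return {tid: ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S+00:00")
            for tid, ts in latest.items()}
```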

Joeychgo
07-11-2005, 08:03 AM
Very informative - Thanks Sarah!

chachi
07-11-2005, 10:50 AM
Sarah, thanks for the info. Question: have you tracked Gbot's behavior in relation to pages being updated or not updated in the sitemaps you send them? I have not really read much about how Gbot reacts, or doesn't, when a page is "updated" using this new setup.

sarahk
07-11-2005, 12:27 PM
Hi Chachi

I used to have a great tool to do just that, but after I'd been screamed at by various webhosts for crashing their servers I pulled the plug on that project. I'm not aware that they use a different Googlebot to index the pages, and most of my pages have AdSense (which brings Googlebot around quickly too, despite the "official line"), so I'd have to set up a very specific experiment. No time right now to do that, I'm afraid.

Sarah

z|x
07-17-2005, 01:04 PM
I have vbportal 3.0.4, which gives me an XML feed (syndicated content). Is it enough to just submit that link as my Google sitemap? That XML file covers all the latest posts.
You can check it out in the Syndicated Content Block: http://www.thetalkzone.org
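The thread doesn't settle whether submitting the feed on its own is enough. Since the feed only covers the latest posts, older threads wouldn't be listed. If you'd rather fold the feed's items into a fuller sitemap, here's a hedged Python sketch. It assumes the feed is plain RSS 2.0 with `<link>` and `<pubDate>` in each item, and `feed_to_pages` is a made-up helper.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def feed_to_pages(rss_text):
    """Extract (loc, lastmod) pairs from an RSS 2.0 feed's <item> elements.

    lastmod is an ISO date ("2005-07-17") taken from pubDate, or None
    when the item has no pubDate. Items without a link are skipped.
    """
    root = ET.fromstring(rss_text)
    pages = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if not link:
            continue
        pub = item.findtext("pubDate")  # RFC 822 style date in RSS 2.0
        lastmod = parsedate_to_datetime(pub).date().isoformat() if pub else None
        pages.append((link, lastmod))
    return pages
```

These pairs could then be merged with the rest of your pages before writing the sitemap.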

acers
07-20-2005, 09:04 AM
superb article mate...
5 stars



EZ Archive Ads Plugin for vBulletin Copyright 2006 Computer Help Forum