Jan Leow's Press Blog


XML Sitemap for Google and Yahoo

The sitemap.xml file helps search engines index your web site.
Search engines such as Google, Yahoo and MSN Live constantly send out robots, spiders or crawlers (whichever term they use for their automated indexing software) to index websites for their search databases. Listing your web pages in sitemap.xml gives these robots, spiders and crawlers a hand and can speed up the indexing of your web site.

The sitemap.xml file follows a standard convention so that indexing can proceed smoothly. Even without a sitemap.xml, the robots can still index your web site by following its links, although it may take them a little longer to find all your web pages. The sitemap simply makes the job of the robots, spiders and crawlers a little easier and faster, so it is a good idea to include a sitemap file in your web site.

How to go about creating a sitemap.xml

You can use your favourite web building software or any text editor, such as Notepad, to do the coding. Just save the file with the .xml extension when you are done.

The coding is fairly simple, though listing all your pages by hand gets tedious. The basic convention is as below:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

<url>
<loc>http://www.example.com</loc>
<lastmod>2007-02-25</lastmod>
<changefreq>weekly</changefreq>
<priority>1.0</priority>
</url>

</urlset>

Each web page is enclosed in a URL tag, with its exact URL enclosed in a LOC tag. The LASTMOD (last modification), CHANGEFREQ (change frequency) and PRIORITY tags are optional and may be omitted from your sitemap.xml file.

The conventions for the optional tags are as below:
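
To see how the required LOC tag and an optional tag look from a program's point of view, here is a minimal Python sketch that reads a sitemap and lists its pages. It assumes the sitemaps.org 0.9 namespace and uses only the standard xml.etree.ElementTree module; the function name read_entries and the sample pages are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed namespace: the sitemaps.org 0.9 schema.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample: the second page leaves out every optional tag.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2007-02-25</lastmod>
  </url>
  <url>
    <loc>http://www.example.com/about.html</loc>
  </url>
</urlset>"""


def read_entries(xml_text):
    """Return a list of (loc, lastmod) pairs; lastmod is None when omitted."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    entries = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=NS)
        entries.append((loc, lastmod))
    return entries


for loc, lastmod in read_entries(SAMPLE):
    print(loc, lastmod or "(no lastmod given)")
```

The second page has no LASTMOD tag at all, and the file is still a usable sitemap.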

  • lastmod = YYYY-MM-DD
  • changefreq = always/hourly/daily/weekly/monthly/yearly/never
  • priority = range 0.0-1.0, default 0.5; 1.0 is the highest priority.
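
If you generate your sitemap with a script, the allowed values above are easy to check before the file is written. The following is only a sketch of such a check in Python; the function name check_optional_tags is hypothetical and not part of any sitemap tool.

```python
# Allowed changefreq values, as listed above.
CHANGEFREQ_VALUES = {
    "always", "hourly", "daily", "weekly", "monthly", "yearly", "never",
}


def check_optional_tags(changefreq=None, priority=None):
    """Return a list of problems; an empty list means the values look valid."""
    problems = []
    if changefreq is not None and changefreq not in CHANGEFREQ_VALUES:
        problems.append(f"changefreq {changefreq!r} is not an allowed value")
    if priority is not None and not 0.0 <= priority <= 1.0:
        problems.append(f"priority {priority} is outside the range 0.0-1.0")
    return problems


print(check_optional_tags("weekly", 0.5))       # no problems expected
print(check_optional_tags("fortnightly", 1.5))  # two problems expected
```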

LASTMOD is the date when your web page was last modified, in YYYY-MM-DD format. You may also include the time of the modification, for example 2007-02-25T14:30:00+00:00.

CHANGEFREQ indicates how frequently your web page is changed or updated.

The PRIORITY tag gives the robots, spiders and crawlers a guide to the importance of each web page relative to the other pages on your site. However, giving all your pages a priority of 1.0 does not mean that they will all be treated as highly important or rank better in the search engines. Usually a web page that is updated frequently will be given more importance than one that is updated less regularly.
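
For scripted sitemaps, Python's datetime module can produce both forms of the LASTMOD value. This is a small sketch rather than a definitive helper: the name format_lastmod is made up, and it assumes that a time should only be written together with its timezone offset, as the W3C Datetime format used by sitemaps.org expects.

```python
from datetime import date, datetime, timezone


def format_lastmod(value):
    """Format a date or a timezone-aware datetime for a lastmod tag."""
    if isinstance(value, datetime):
        if value.tzinfo is None:
            # Assumption: refuse naive datetimes so the offset is never guessed.
            raise ValueError("give the datetime a timezone before formatting")
        return value.isoformat(timespec="seconds")
    return value.isoformat()


print(format_lastmod(date(2007, 2, 25)))
# prints 2007-02-25
print(format_lastmod(datetime(2007, 2, 25, 14, 30, tzinfo=timezone.utc)))
# prints 2007-02-25T14:30:00+00:00
```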

If you are coding the sitemap by hand, listing the lastmod, changefreq and priority for every page may be too tedious, and you may be better off leaving them out of your sitemap altogether. If you are using a CMS, try to find an automated sitemap generating module or plug-in. It will save you a lot of time compared with coding by hand.
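
If you have no CMS plug-in but do have a list of your pages, a short script can do the tedious part. Below is a minimal sketch in Python using only the standard library (Python 3.8 or later for the xml_declaration argument). The function name build_sitemap, the dictionary layout and the sample pages are assumptions for illustration, and it writes the sitemaps.org 0.9 namespace.

```python
import xml.etree.ElementTree as ET

# Assumed namespace: the sitemaps.org 0.9 schema.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML (as bytes) from a list of page dictionaries.

    Each dictionary needs a 'loc' key; 'lastmod', 'changefreq' and
    'priority' are written only when present.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in page:
                ET.SubElement(url, tag).text = str(page[tag])
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Hypothetical pages; replace with your own list.
pages = [
    {"loc": "http://www.example.com/", "lastmod": "2007-02-25", "priority": 1.0},
    {"loc": "http://www.example.com/about.html"},
]
print(build_sitemap(pages).decode("utf-8"))
```

To publish the result, write the returned bytes to sitemap.xml in binary mode.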

If you have more than one sitemap in your website

If your site is fairly large, you may split your sitemap into several files. For example, to point the search engines to another sitemap file in another directory containing another set of web pages, the coding convention is as follows:

<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>http://www.example.com/subdirectory/sitemap.xml</loc>
</sitemap>
</sitemapindex>

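Save the above code as a separate sitemap index file rather than placing it between the URLSET tags: under the sitemaps.org protocol, SITEMAPINDEX is a root element of its own and cannot be nested inside URLSET. There is no need for the sub sitemap file to point back to the index file, as these files exist only to help with the indexing of your web pages.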
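
A sitemap index can be generated by a script in the same way as a sitemap. The sketch below uses Python's standard library (3.8 or later) and produces a standalone file whose root element is sitemapindex; the function name build_sitemap_index and the second sample URL are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed namespace: the sitemaps.org 0.9 schema.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls):
    """Build sitemap index XML (as bytes) listing each sitemap file's URL."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_urls:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


# Hypothetical sitemap locations; replace with your own.
print(build_sitemap_index([
    "http://www.example.com/subdirectory/sitemap.xml",
    "http://www.example.com/blog/sitemap.xml",
]).decode("utf-8"))
```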

Reference

For more information about sitemaps, visit the www.sitemaps.org website, which documents the tagging protocol and its conventions.

