A site map (or sitemap) is a list of pages of a web site accessible to crawlers or users. It can be either a document in any form used as a planning tool for web design, or a web page that lists the pages on a web site, typically organized in hierarchical fashion. This helps visitors and search engine bots find pages on the site.
While some developers argue that "site index" is the more appropriate term, web visitors are used to seeing both terms and generally treat them as one and the same. However, a site index is often used to mean an A-Z index that provides access to particular content, while a site map provides a general top-down view of the overall site contents.
XML is a document structure and encoding standard used, among many other things, as the standard format in which web crawlers expect to find and parse sitemaps. An example of an XML sitemap is given below. A robots.txt file can direct crawler bots to the sitemap; an example of this is also given below. Site maps can improve search engine optimization of a site by making sure that all of its pages can be found. This is especially important if the site delivers content dynamically, for example through Adobe Flash or JavaScript menus that do not include HTML links.
They also act as a navigation aid [1] by providing an overview of a site's content at a single glance.
Below is an example of a valid XML sitemap for a simple three-page web site. Sitemaps are a useful tool for making sites built in Flash and other non-HTML technologies searchable. Because such a site's navigation is built with Adobe Flash, an automated crawler (bot) would probably find only the initial home page; the subsequent pages are unlikely to be found without an XML sitemap.
XML sitemap example:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/?id=who</loc>
    <lastmod>2009-09-22</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>http://www.example.com/?id=what</loc>
    <lastmod>2009-09-22</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.5</priority>
  </url>
  <url>
    <loc>http://www.example.com/?id=how</loc>
    <lastmod>2009-09-22</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.5</priority>
  </url>
</urlset>
Google introduced Google Sitemaps so that web developers can publish lists of links from across their sites. The basic premise is that some sites have a large number of dynamic pages that are available only through forms and user entries. The Sitemap files contain URLs to these pages so that web crawlers can find them.[2] Bing, Google, Yahoo! and Ask now jointly support the Sitemaps protocol.
Since Bing, Yahoo!, Ask, and Google use the same protocol, a single Sitemap gives the four biggest search engines the updated page information. Sitemaps do not guarantee that all links will be crawled, and being crawled does not guarantee indexing. However, a Sitemap is still the best insurance for getting a search engine to learn about an entire site.[3]
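For sites with very many pages, the Sitemaps protocol also allows the list of URLs to be split across several sitemap files that are then listed in a sitemap index file. A minimal sketch of such an index, using placeholder file names, looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://www.example.com/sitemap-part1.xml</loc>
    <lastmod>2009-09-22</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://www.example.com/sitemap-part2.xml</loc>
    <lastmod>2009-09-22</lastmod>
  </sitemap>
</sitemapindex>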
XML Sitemaps have replaced the older method of "submitting to search engines" by filling out a form on the search engine's submission page. Now web developers submit a Sitemap directly, or wait for search engines to find it.
XML (Extensible Markup Language) is much stricter than HTML: errors are not tolerated, so the syntax must be exact. It is advisable to run a sitemap through an XML syntax validator, such as the free one found at http://validator.w3.org.
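Besides the web-based validator, well-formedness can also be checked locally. The following is a minimal sketch in Python; it assumes the sitemap has been saved as a local file named sitemap.xml, and it only verifies XML syntax, not conformance to the Sitemaps schema.

# Minimal well-formedness check for a local sitemap file (assumed name: sitemap.xml).
# Parsing fails with ParseError if any tag, attribute or entity is malformed.
import sys
import xml.etree.ElementTree as ET

try:
    ET.parse("sitemap.xml")
    print("sitemap.xml is well-formed XML")
except ET.ParseError as err:
    print("XML syntax error:", err)
    sys.exit(1)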
There are automated XML site map generators available (both as software and web applications) for more complex sites.
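For a simple site, the kind of output such a generator produces can be sketched in a few lines of code. The example below is illustrative only: it uses Python and the placeholder URLs from the example above, whereas real generators typically crawl the site or read a content management system to build the page list.

# Illustrative sketch of what a sitemap generator does: build an <urlset>
# from a known list of pages and write it out as sitemap.xml.
import xml.etree.ElementTree as ET

pages = [
    ("http://www.example.com/?id=who",  "2009-09-22", "monthly", "0.8"),
    ("http://www.example.com/?id=what", "2009-09-22", "monthly", "0.5"),
    ("http://www.example.com/?id=how",  "2009-09-22", "monthly", "0.5"),
]

urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
for loc, lastmod, changefreq, priority in pages:
    url = ET.SubElement(urlset, "url")
    ET.SubElement(url, "loc").text = loc
    ET.SubElement(url, "lastmod").text = lastmod
    ET.SubElement(url, "changefreq").text = changefreq
    ET.SubElement(url, "priority").text = priority

ET.ElementTree(urlset).write("sitemap.xml", encoding="UTF-8", xml_declaration=True)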
The field definitions and other Sitemap options are documented at http://www.sitemaps.org (Sitemaps.org is maintained by Google, Inc., Yahoo!, Inc., and Microsoft Corporation).
See also Robots.txt, which can be used to identify sitemaps on the server.
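A minimal robots.txt, assuming the sitemap is published at the site root under the name sitemap.xml, needs only a Sitemap line to point crawlers to it:

User-agent: *
Disallow:

Sitemap: http://www.example.com/sitemap.xml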