Talk:Sitemaps
Merlinvicki: took the signature off; signatures are needed on Talk pages but not on article pages.
Jwestbrook 22:51, 19 October 2005 (UTC)
Pasted link to article on Merlinvicki
Jwestbrook 23:34, 24 October 2005 (UTC)
- oops
J\/\/estbrook 18:16, 6 November 2005 (UTC)
Addition of links tab
Does anyone know the exact date that the links tab became active in Google Sitemaps? Siralexf 17:15, 9 February 2007 (UTC)
Multiple links provided with Google result
I've noticed that in the last few months, some sites listed in Google results include sublinks. For example, this search for slashdot returns a link to the main Slashdot site along with links to Games - Login - Apple - Science beneath the description. Is this one of the benefits of submitting a site map to Google? If so, it would be worth mentioning in the article. mennonot 09:39, 19 November 2005 (UTC)
- Not related; Matt Cutts explained it was a Google search results improvement: [1] Ivan Bajlo 15:09, 27 December 2005 (UTC)
Wiki Sitemaps?!
Does MediaWiki have an extension that creates a Google Sitemap automatically? Or is it built with an integrated sitemap? F16
- Yes there is a script to make sitemaps in your wiki's /maintenance directory. Jidanni (talk) 04:25, 15 March 2008 (UTC)
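For anyone looking for it: the script is maintenance/generateSitemap.php in a standard MediaWiki install. A typical invocation looks something like the line below; the paths are placeholders and the option names may differ between MediaWiki versions, so check the script's --help output first.
 php maintenance/generateSitemap.php --fspath=/var/www/wiki/sitemap --server=http://wiki.example.com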
Sitemap generation tools
I'm removing a bunch of links to sites that claim to generate sitemaps... by spidering a web site. Can someone please explain how this is any different from letting the search engines spider your website? Seems pretty pointless and shady to me. --Imroy 20:53, 31 August 2006 (UTC)
==
True, but some tools (not exactly the formerly listed ones) do provide some added value, for example editing attributes like change frequency or priority. Crawling the site is then just a way to create the initial URL list. However, I don't think that listing every sitemaps tool is a good idea; providing links to lists of tools like those at code.google.com or sitemapstools.com is enough. That said, I do think that linking to a sitemap validator is a good thing. I provide such a free tool (along with tons of sitemaps info, FAQs, a Vanessa Fox interview ...) on my site at smart-it-consulting.com, and somebody linked to it a couple of months ago. Unfortunately, this link is gone too. --Sebastian September/21/2006
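For readers unfamiliar with those attributes: change frequency and priority are the optional <changefreq> and <priority> elements defined by the sitemaps.org protocol. A minimal sitemap entry using them (example.com is a placeholder) looks like:
 <?xml version="1.0" encoding="UTF-8"?>
 <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
     <loc>http://www.example.com/</loc>
     <lastmod>2006-09-21</lastmod>
     <changefreq>weekly</changefreq>
     <priority>0.8</priority>
   </url>
 </urlset>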
==
Why is ROR listed? All major search engines support RSS, but none (!) of them states support for the added ROR fields. If you don't understand what I mean, check this article: http://www.micro-sys.dk/developer/articles/website-sitemap-kinds-comparison.php You can see that Google and Yahoo mention a lot of formats, but none of them is ROR.
--Tom November/10/2007
==
Spidering a website is the only reliable way to create a sitemap, particularly for larger, dynamic websites. When search engines crawl your site, they do not produce a sitemap for you. The entire point of "Google Sitemaps" as well as Yahoo's sitemap program is that webmasters are asked to submit a sitemap. The search engines want sitemaps, which is why this page exists here. Besides this, a sitemap service can share its findings with the webmaster... which the search engines do not do very well, if at all. Not all pages on the web are coded very well, and despite the myriad of articles which explain how to write good code, for many it's easier to get a list of coding and HTTP protocol errors that are specific to their website (pages, server responses, HTTP status errors, etc.). What is shady about it? --MaxPowers 08:23, 25 January 2007 (UTC)
==
Particularly for larger, dynamic websites, the sitemaps should be generated from the underlying database. If dynamic sites use spidering tools to create the sitemap, then precisely the URLs not visible to SE crawlers will most probably not get included. Makes sense? --Sebastian February/06/2007
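To make the database approach concrete, here is a minimal sketch in Python; the pages table and its url/last_modified columns are hypothetical stand-ins for whatever schema a real site uses:
 import sqlite3
 from xml.sax.saxutils import escape

 # Emit a sitemaps.org urlset directly from the site's database instead
 # of spidering; the table and column names here are hypothetical.
 conn = sqlite3.connect("site.db")
 rows = conn.execute("SELECT url, last_modified FROM pages")

 print('<?xml version="1.0" encoding="UTF-8"?>')
 print('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">')
 for url, last_modified in rows:
     print("  <url>")
     print("    <loc>%s</loc>" % escape(url))
     print("    <lastmod>%s</lastmod>" % escape(last_modified))
     print("  </url>")
 print("</urlset>")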
- Removed another link. Please see WP:EL for more information, specifically, what is not accepted:
==
Spidering allows a realistic view of any size website and can be used to uncover errors on the page due to template or CMS errors. One typical example is on WordPress blogs where commenters do not leave a website address and the link is listed as href="http://" when the page is displayed to browsers and SE spiders. This is technically a broken link and is one example of how a spidering service can benefit a webmaster by sharing its findings. Which URLs would not normally be visible but need to be included in a sitemap? It stands to reason that if a site wants the SEs to see a page, it should be visible and should have at least some pages linking to it if that page is to do anything within any search engine. All SEs, including Google, will filter out orphaned pages.
The sitemap programs (not software 'programs') offered by the search engines allow webmasters to share URLs that are not generally spidered, such as multi-level navigation through categories and sub-categories, but if 'normal' navigation is broken to the point that spidering is "impossible", then it is generally a poor navigational structure to begin with. Some spidering services offer a means to get around this anyway using scripted images, but this is probably irrelevant for this discussion.
The biggest problem with db-based systems is that they are very specific to a particular application and do not cover other areas of comprehensive websites (forum, blog, cart, general CMS, static pages, etc. all on one site). I would agree that db-based sitemap generators could be more efficient, as they don't require a full page to load, but that efficiency comes at the price of sacrificing completeness in many cases and accuracy from a spider's point of view in all cases. MaxPowers 05:39, 8 February 2007 (UTC)
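As an illustration of the href="http://" case described above, detecting such empty links takes only a few lines; a sketch in Python using the standard html.parser module (the sample page is made up):
 from html.parser import HTMLParser

 # Flag anchors whose href is empty or the bare "http://" left behind
 # when a commenter gives no website address.
 class EmptyHrefFinder(HTMLParser):
     def handle_starttag(self, tag, attrs):
         if tag == "a":
             for name, value in attrs:
                 if name == "href" and value in ("", "http://"):
                     print("broken link found:", repr(value))

 page = '<p><a href="http://">anonymous commenter</a></p>'
 EmptyHrefFinder().feed(page)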
Robots.txt "Sitemap:" declaration.
The text both at sitemaps.org and here says:
"The <sitemap_location> should be the complete URL to the Sitemap, ..."
Note that "should" is not "must," and as other directives (namely, "Disallow:") use relative URLs, not absolute ones, the language used in the definition of the declaration implies that a relative URL for a site map declaration (only in the "/robots.txt" file) is valid and may be used. If the intent of the definition were to require only fully specified URLs, the language used to specify the declaration syntax needs to be changed. I have noted that some people think that only a fully specified URL can be used in "robots.txt" for a site map declaration; such a conclusion appears erroneous based on the diction used.
I assume that verbs such as "should", "must", and "may" have their usual meanings as in the popular Internet "request for comments" document series.
- D. Stussy, Los Angeles, CA, USA - 08:30, 31 May 2007 (UTC)
- Of course you can put it in there that way. It won't break robots.txt. However: I want sitemap-aware bots to figure out where my sitemap is, so I'll give them what they're expecting: a full URL. 198.49.180.40 19:17, 8 June 2007 (UTC)
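For illustration, the absolute form that sitemap-aware bots expect looks like this in robots.txt (example.com is a placeholder):
 User-agent: *
 Disallow: /private/
 Sitemap: http://www.example.com/sitemap.xml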
Submit site map URL
www.google.com/webmasters/tools/ping?sitemap= or http://google.com/webmasters/sitemaps/ping?sitemap= ?
See http://www.google.com/support/webmasters/bin/answer.py?answer=34609 —Preceding unsigned comment added by 87.119.120.23 (talk) 13:10, 22 February 2008 (UTC)
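Whichever endpoint is current, the ping mechanism that help page describes is the same: the sitemap's address goes in the sitemap query parameter and should be URL-encoded. A sketch in Python (the sitemap location is a placeholder):
 import urllib.parse

 # Build the ping URL; the sitemap address must be URL-encoded.
 sitemap = "http://www.example.com/sitemap.xml"
 ping = ("http://www.google.com/webmasters/tools/ping?sitemap="
         + urllib.parse.quote_plus(sitemap))
 print(ping)
 # To actually submit, fetch this URL, e.g. with urllib.request.urlopen(ping)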
Submission Sitemap Externals
I forgot to log in when I added those external links. They contain the "official" method from each search engine for submitting a valid XML sitemap. Neither of the pages attempts to sell a product, and both keep a neutral point of view. Please comment here on any change suggestions.
This can help clean the how-to information out of the article and let an external, neutral source supply the official how-to method for each search engine that supports the Sitemaps feature.
SDSandecki (talk) 06:21, 25 February 2008 (UTC)
Plain text OK too
Mention that sitemaps can also be in plain text format: sitemap.txt, and sitemap.txt.gz. See the Google webmaster tips if you don't believe me. Jidanni (talk) 04:27, 15 March 2008 (UTC)
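For reference, the plain text format is simply one URL per line in a UTF-8 encoded file, e.g. a sitemap.txt containing:
 http://www.example.com/
 http://www.example.com/about
 http://www.example.com/contact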