Talk:Sandbox Effect

Articles for deletion This article was nominated for deletion on 8 May 2006. The result of the discussion was no consensus.

Highly speculative

The article on the Google "sandbox effect" needs a great deal more work, and it needs to clarify in detail what the alleged effects are. In its current form it is highly speculative and offers no evidence to support the theory.

Issues that should be discussed in the article include:

  • how can you check if a site is "sandboxed" or not?
  • what length of time is the "sandbox effect" believed to last for?
  • Is a sandboxed site omitted from Google for all search results, or is a penalty applied to its ranking?
  • why is it that sites alleged to be "in the sandbox" are said to rank well in Google for obscure searches, but not for other search terms?
  • what distinguishes search terms that are penalized by the sandbox, and terms that have no penalty?
  • does the "sandbox effect" relate to new web domains or to new pages?
  • what evidence is there to support the assertion that the "level of search engine optimisation appears not to be a factor"?

This last point is critical, as most discussions I have seen related to the sandbox effect actually relate to a lack of search engine optimisation techniques.

Howard Wright


I'd be the first to condemn a conspiracy theory article on Wikipedia, but I know from personal experience that the Google sandbox effect is a real phenomenon. --Beachy 22:41, 23 May 2006 (UTC)

OK then, how do you determine if a particular site has been "sandboxed" or affected by this sandbox effect? It would help a great deal if all those who have observed effects that they think are due to the Google sandbox could put forward a way of checking if a webpage or website suffers from this effect or not. How do you test this? How do you know if your site has been affected by the sandbox? How do you know the symptoms aren't due to other factors (level of optimisation, possible use of "spammy" techniques etc)?

Any thoughts on any/all of the above questions would also be most useful - and if we want to end up with a sensible article, I think these issues must be tackled -- Howard


It would be in Google's interests to ensure their ranking/matching algorithms are not public knowledge, otherwise they would doubtless be abused by SEOs. Hence the sandbox effect is not a simple "flag" that we can pick up on for a given site. Think of it more as a "phenomenon" from our point of view. You know your site is being hit regularly by the Googlebot. You know it has a decent pagerank. You know it is included (and not blocked) in the index because it appears in the results if you search for your domain name. However, it does not appear anywhere in the results for relevant keyword queries. At least not until some period later (usually 6-9 months), when the site assumes a sensible position in all keyword searches. The "sandbox effect" is a perfectly plausible description of this phenomenon, and perfectly viable as a Wikipedia article. --Beachy 12:34, 26 May 2006 (UTC)

I think the article needs a much clearer description of exactly what this phenomenon is, and what the symptoms are. Once there are some specifics, it ought to be fairly easy for people to test individual pages to see if they think they have been "sandboxed". How does this sound:

Sandbox symptoms:

  • the page has a high Google ranking for some highly-specific searches, for example company name or domain name (where "high" means within the top 100 search results)
  • the page has poor Google rankings for most other searches (where "poor" means not showing up in the top 500 search results).
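
To make the two proposed symptoms concrete, here is a minimal Python sketch of the check. It is only an illustration, not an established test. It assumes the ranks have been collected by hand for each query (with None meaning the page was not found in the results examined), that the queries have already been split into "highly-specific" and "other" by whoever runs it, and that "most" means strictly more than half. The function and variable names are hypothetical.

```python
from typing import Dict, Optional

HIGH_RANK_CUTOFF = 100  # "high": within the top 100 results (from the proposal)
POOR_RANK_CUTOFF = 500  # "poor": not within the top 500 results (from the proposal)

# query -> 1-based rank, or None if the page was not found (hand-collected data)
Ranks = Dict[str, Optional[int]]


def is_high(rank: Optional[int]) -> bool:
    """A rank counts as high if the page was found within the top 100."""
    return rank is not None and rank <= HIGH_RANK_CUTOFF


def is_poor(rank: Optional[int]) -> bool:
    """A rank counts as poor if the page is missing or below position 500."""
    return rank is None or rank > POOR_RANK_CUTOFF


def matches_sandbox_symptoms(specific: Ranks, general: Ranks) -> bool:
    """True if the collected ranks fit both proposed symptoms."""
    if not specific or not general:
        return False  # nothing to judge on
    ranks_for_name = any(is_high(r) for r in specific.values())
    poor_count = sum(is_poor(r) for r in general.values())
    # Assumption: "most" means strictly more than half of the other searches.
    return ranks_for_name and poor_count * 2 > len(general)
```

Note that a rank between 101 and 500 is neither "high" nor "poor" under these definitions, and the sketch does nothing to settle which search terms belong in which group.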

The key question is, how do you distinguish search terms that fall into these two categories? I've had this discussion before, and as soon as you find examples of searches where a specific site ranks well (seeming to disprove the sandbox idea), the counter-argument comes that the search term you used "isn't relevant" or "isn't important". How do you tell if a given search phrase is "important" (i.e. will be penalized by the sandbox effect) or not?

Could someone provide some example URLs for pages they believe are currently sandboxed? Having some specific examples to discuss should help to shed some light on the issue. If necessary, the URL can be entered as plain text, not as a link, to prevent this page itself from influencing the rankings etc.

Howard


One site that I am certain is currently sandboxed: http://www.travel-insurance-now.co.uk/ The domain was registered back in August and a number of similar sites agreed to reciprocal links. The pagerank of the site is very reasonable but it doesn't appear for any relevant searches (e.g. "travel insurance now uk") --Beachy 19:07, 30 May 2006 (UTC)

Search results from Google.com (31st May) give a number of high rankings for this site:

  • "John Holman Sons" - ranked 18th
  • "aig europe uk" - ranked 13th
  • "gosure insurance" - ranked 9th

Doesn't look like it's sandboxed to me.

Do you have thoughts on which terms a sandboxed site can still be expected to rank well (<100) for? Is it just domain names or company names? I really think some responses to some of the points I first raised above are needed, to give the article a stronger foundation. Even if there are no definitive answers, a summary of what the mainstream views are is, I think, essential - especially on issues such as how to test if a page is sandboxed, how long the sandbox effect is believed to last for, whether the sandbox effect applies by default to all new domains (e.g. not to completely new sites put up on older domains). Howard


The above site should rank highly for a search like "travel insurance now uk". After all, on http://search.msn.co.uk/ it features in the #1 (and #2) spot for this search. I think many of your questions have been answered, but I look forward to seeing a further investigation of the sandbox effect in this article and its talk page. --Beachy 08:53, 1 June 2006 (UTC)

I can't help thinking you're avoiding some of the key concerns here. If the above site is sandboxed, why does it get top-20 and top-10 Google rankings for a number of search terms? This looks to be a complete contradiction of what the sandbox effect is claimed to be about! How can these results be explained, if the site really is sandboxed?

Claiming a site is sandboxed on the basis of results for a single search term, when other results directly contradict the claim, looks like very shaky evidence.

As for the other questions, many of these haven't been touched on yet, e.g. how do you test if a site is sandboxed or not (this is fundamental), and how do you distinguish the search terms that are penalised by the sandbox effect from those (e.g. company or domain names) that aren't? Howard

There are many combinations of keywords that should produce a good listing for the site, like "travel insurance now", "backpacker insurance now uk", "winter sports insurance now uk" that should logically put the site within the first ~25 results, judging by the pagerank and keyword occurrence of the site and its competitors. Distinguishing search terms that are penalised is easy - do a search for a relevant term, including some words that distinguish the site from its competitors (e.g. "now") and if the site doesn't appear in the top 100 results, it is obviously being penalised. --Beachy 15:56, 1 June 2006 (UTC)
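
As an illustration of the check described above, here is a minimal Python sketch. It assumes the ordered list of result URLs for a query has already been collected by hand; it does not query any search engine, and the function names are hypothetical. It only reports whether a domain appears in the top N positions, which by itself cannot tell a "sandbox" apart from any other cause of a low ranking.

```python
from typing import List, Optional
from urllib.parse import urlparse


def rank_of_domain(result_urls: List[str], domain: str) -> Optional[int]:
    """Return the 1-based position of the first result on `domain`, or None."""
    domain = domain.lower()
    for position, url in enumerate(result_urls, start=1):
        host = (urlparse(url).hostname or "").lower()
        # Match the bare domain and any subdomain such as "www.".
        if host == domain or host.endswith("." + domain):
            return position
    return None


def in_top_n(result_urls: List[str], domain: str, n: int = 100) -> bool:
    """True if `domain` appears within the first `n` collected results."""
    rank = rank_of_domain(result_urls, domain)
    return rank is not None and rank <= n
```

For example, with a hand-collected list `results`, calling `in_top_n(results, "travel-insurance-now.co.uk")` would report whether the site shows up in the first 100 entries of that list.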

Also, it looks like the links at the bottom of each page on that site are the likely cause of your problem in Google. The links (multiple links to the same site using different anchor text) look pretty suspicious, and my bet is that Google has (rightly) flagged these as suspicious, just as it probably has on all the partner sites that use the same tactic to link back to the travel insurance site. Most likely, MSN and Yahoo don't yet have algorithms as sophisticated as Google does for picking up this kind of thing, and haven't penalised the tactic, hence the site still ranks well in these search engines. In my view, Google is quite right to treat these kinds of links with suspicion. End result - the sites are penalised in the rankings.

In other words - nothing to do with the sandbox effect or the age of the domain. You can expect the same result for any site using these kinds of tactics, regardless of domain or site age.

All too often, it seems there are entirely logical and straightforward reasons why certain sites that claim they are "sandboxed" have poor Google rankings. Spammy or dubious linking etc, and poor search engine optimisation are the usual causes, as in this case. Howard

I agree that these link blocks could be viewed as suspicious, particularly as they are replicated throughout the site. However, we have emailed Google regarding the Sandbox Effect in the context of http://www.travel-insurance-now.co.uk/ , and whilst they have not acknowledged the effect, they clearly stated that the site was not being penalised. --Beachy 15:58, 1 June 2006 (UTC)
It's also worth noting that the top two results in Google for searches on "travel insurance" both use the link block technique. --Beachy 08:02, 5 June 2006 (UTC)

move from sandbox dab

I'm moving this sentence here from the sandbox dab page. -Quiddity 23:43, 4 June 2006 (UTC)

The Google "Sandbox" is a purgatorial state, in which a website is spidered by the Google bots, accumulates backlinks and page rank, but does not appear in Google searches for appropriate keywords.

Not verified

I have added the not verified tag to this article, as apparently recommended by the latest deletion review. As the above discussion makes clear, the facts are disputed, hence for now the disputed tag will remain. Whilst the authors have cited sources, there are a number of problems with these:

  • One such source appears both to refer back to Wikipedia (which is typically frowned upon) and to be the personal blog of one of the main authors. Whilst self-citing is allowed, it is undoubtedly bad form unless supported by primary sources.
  • Although the URL of one of the cited sources seems to give it credence, this alone is not actually enough for it to be considered a primary resource. A lesser known press agency is typically only trusted when they also publish their references, allowing the tracing of information back to its primary source.
  • Similarly to above, most of the articles do not give any reference at all to the sandbox claim.
  • The text on most of the references reads identically, leading me to believe that they are in fact based off a single common source (which may well be one of the articles alone). One article extends the section on coping with being sandboxed, but this is likely an editorial decision rather than due to a different true source being used.

Unless a stronger, ideally primary, source can be found, I would personally recommend removing this page. This page could be put up for deletion again, or alternatively the speculation merged into the article on Google's search technology. The not verified tag should allow sufficient fair time for better sources to be located. LinaMishima 17:35, 17 June 2006 (UTC)

To help with the above, I've reviewed the external links as of 21st June 2006:
  • seomoz article : Actually quite good! Well done to beach for adding it!
  • webpro news : Given the date of this article, I suspect this started a lot of the speculation. I personally think that it's a quite poor source.
  • Software Marketing Resource : Fairly detailed, but reads as if it's an expansion of the webpro news article.
  • Chrisbeach : As already stated, this article refers back to this wiki article, which is generally bad form. It also only deals with one person's problem, and hence is fairly lacking in content. I'd actually recommend removing this one, as it adds little additional information other than a suspected sandboxing.
  • Jumping over the sandbox : Although this article was written recently (but a few months ago), it too reads as if it's an expansion of the webpro news article. I'm not too impressed.
  • Forum discussion : blech. We should be very wary of using forum discussions as sources for articles, and this is generally discouraged.
Based on the above, it is clear that we should probably cull the source list; however, I would advise that we leave this until we have some better sources.
LinaMishima 13:01, 21 June 2006 (UTC)

As the issues raised in this discussion have not resulted in any changes to the article, and the evidence and citations for many claims remain weak, I have written a major update to the main article. I have tried to incorporate all the key claims in the original article, but have also emphasised that some issues are highly speculative and not universally accepted, while others have general agreement.

I have tried to highlight some of the confusions that often arise relating to the "sandbox", and search engine optimisation techniques. The article needs more work, particularly in reviewing the sources and links (I have left the external links section the same, but agree with above comments that some links are of little value).

I'm not sure of the standard procedures for this kind of thing, and have removed the "disputed" and "not verified" tags for now as the main article now makes it clear (I hope) which ideas and theories are still strongly disputed and speculative, and which are more widely accepted.

Further constructive comments or additions to the article welcome.

Howard


Could we please agree a way forward for this article? Despite raising a number of concerns and offering suggested changes and additions in this discussion, no changes have actually been made (prior to the update I made yesterday, reworking the whole article).

The main issue is that the article (before yesterday's update) makes many speculative claims with little or no evidence to back them up, and also fails to make clear that much of what people believe about the sandbox is hotly debated and disputed by many webmasters. There is no consensus. I believe the article needs to make it clear there are many views on what the sandbox theory is (including the view that the sandbox effect doesn't exist). The article should give some info on the different views, without attaching undue weight to any single idea unless this can be backed up with firm evidence.

The article also needs to refer to search engine optimisation, and the "spammy" techniques which can get a site penalised (this is a widely agreed effect), as this effect is commonly (and wrongly) attributed to the sandbox.

Howard

We already have agreed a way forward for the article, Howard. Other editors and I accept your concerns, and we too would welcome additional information. When I know more about the sandbox effect I will add information. When a notable source confirms / denies the existence of the sandbox, it will be added to the article. I don't see the value in rewording the whole article to include "but some people don't believe this" after each paragraph. There's no point introducing more weasel words. Let's focus on what we know, and work towards a unified view of the theory. If you don't believe the theory exists then that's fine, but please provide evidence to support your claim, rather than diluting the content of the article with "some don't believe.." weasels. It is made very clear above and below the article that it is considered speculative. --Beachy 19:04, 29 June 2006 (UTC)
Likewise, I felt that there was a general consensus to leave the article for now until better sources cropped up. I agree, though, that some mention of how the sandbox effect relates to SEO would be wise. However this should be no more than a few sentences linking to another article on SEO. Whilst much of the article is contested, it would be bad practice to state this repeatedly within the article (since Wikipedia is supposed to be based on facts, and hence would provide further grounds for deletion). We also must be careful to not give the sandbox undue publicity through this article, as we do not wish for it to become inspiration for possible sources to cite. LinaMishima 14:46, 13 July 2006 (UTC)

Nigel:

I found this article to be very accurate and useful. However, I wish to add an interesting observation. The Googlebot put my personal site into the sandbox for three months (September 2006 to December 2006). I suspect this was because I had broken one of the Google guidelines. During this period, I changed the site to try to make it more Google-friendly. My site has now been released from the sandbox, but the cached information that Google hold for the site is still dated September 2006. This suggests to me that releasing from the sandbox is done manually rather than automatically by the Googlebot.

--Nigeljbee 18:43, 14 December 2006 (UTC)

Contradiction

Beachy, I'm reverting the reversion of my addition of the 'contradict' template (i.e., putting it back). Please read the edit summary of an edit before reverting it. The edit summary completely explained the (already self-evident, if you just read the article) rationale for adding the 'contradict' warning.

Relevant part from my original edit summary:

Added 'Contradict' template - para. 2 ("only...Google") directly contradictions para. 1 and para. 3 ("search engines" plural, in both cases)

Your reversion edit summary:

Why discredit this article? Where is the discussion about contradictions??

Reply:

  1. I am not discrediting the article. Flawed articles discredit themselves. This article is already "in trouble" and has been for some time, barely surviving an AfD, and doubly-flagged as unreliable (I even undid the redundant warning templates on this, as the more serious one subsumed the concerns of the second one). I'm giving this article more faith than perhaps you are giving me credit for, and am simply labelling its problems as honestly and clearly as I can.
  2. The contradictions are so clear that they don't even need to be commented on in my opinion (as to their nature, they are clearly the result of someone editing the earlier article text without reading the entire article, with the result of changing the factual claims of the extent of the "sandbox effect" in only one place instead of all three related passages.) But, I was very specific about the nature and location of these contradictions anyway, in my edit summary, even quoting the exact wording at issue. What more would anyone want?
  3. There is no need to go into detail about trivial and obvious article issues on the Talk page when covering them in the edit summary will suffice. Unless people won't read edit summaries, I guess. :-)

PS: I did not simply resolve the contradiction with an edit one way or the other because I am not in possession of sufficient facts to say one way or the other whether the "sandbox effect" (if it is real, which is itself still an issue of dispute!) applies to "only...Google" or to "search engines" more generally. Again, those are direct, and directly contradictory quotes, from the article as of this writing, paragraphs 2 and 1+3, respectively. -- SMcCandlish [talk] [contrib] 03:56, 22 July 2006 (UTC)

SMcCandlish, I have removed the contradictions as it is generally accepted that this effect is limited to Google, both in my own experience and that of others (see external links section). --Beachy 19:21, 22 July 2006 (UTC)

Exactly because it is "generally agreed" but NOT a universally verified fact agreed by everyone that the sandbox applies only to Google, I have restored the para making this point to its original location and moved the speculative para about why Google introduced the sandbox to its original position. We need to work towards an article that is more widely agreed upon than at present. We need to start off with the common ground, explain that in as uncontroversial a way as possible, and then deal with more speculative points of view later. Since it is not a verifiable fact that the sandbox only applies to Google, it is misleading to state this. It is clear from the many search engine optimisation forums etc that there are many people (though still a minority) who believe that other search engines also have a sandbox. We need to phrase the article in such a way that it represents the overall picture fairly, without giving undue emphasis to specific points of view. Howard.


The Idea Exists

I really don't understand what all the debate is about deleting this article. I see no reason to delete it. I read this article because I wanted to know what people were talking about when they said "Google sandbox" and the article accurately told me what they were talking about. There is nothing in the article (anymore) which is controversial because the article very clearly says that the Google sandbox may or may not exist.

Even if the Google sandbox is an illusion, the idea of the sandbox is real and talked about and there should be an article about it. If an article that explicitly says the sandbox may not exist is in dispute, shouldn't the article on Santa Claus be in dispute too? Should I nominate the Santa Claus article for deletion? This article does not say that the sandbox is real. In fact, being someone who had never heard of the Google sandbox before today, I can tell you that this article practically convinced me that there is no sandbox, since it mentioned at least 16 times (that I counted) that no one can be sure that the sandbox exists and mentioned at least 4 very plausible alternative explanations for what causes people to think there is a sandbox.

I think that the "The Factual Accuracy of this Article is Disputed" could probably be removed with very little editing.



As far as I can tell, the following facts are not in dispute:

  • Some people believe that there is something called a "sandbox" which limits the appearance of pages on newly created domains.
  • This does not seem to happen on other search engines.
  • The sandbox probably doesn't really exist, although there is no way to be sure since Google would be unwise to publish their algorithms.

The facts that are in dispute are rightfully not included in the article:

  • The sandbox exists.
  • The sandbox does not exist.
  • Google policy limits new domains to protect against abuse.




As long as it doesn't say that the sandbox is real, there is not much else to be in dispute. Maybe it is time to remove the in dispute tag.--VegKilla 23:00, 30 October 2006 (UTC)

I just noticed that the "dispute" tag has been removed. The "does not cite sources" tag remains.--VegKilla 21:24, 2 November 2006 (UTC)


References at bottom

Maybe the reference to Searchguild.com should be at the bottom of the page instead of in the article.

Should we change "Searchguild.com has created a test to determine whether a site is in the sandbox,"

to "There is an online test to determine whether a site is in the sandbox,"

We could still have the word test be a link to the Searchguild.com site, and the URL could still appear, but at the bottom of the page instead of in the text.

Since Searchguild.com is not just the URL but also the name of the company, I might be wrong in thinking that it should be moved.--VegKilla 23:00, 30 October 2006 (UTC)

Article Biased

The article is from the point of view that the sandbox effect really exists. I mean when I read it I thought that it really existed. --Luke 01:16, 28 November 2006 (UTC)

Many of us know that the sandbox effect does in fact exist in practice, as we have seen it. Whether you believe it is due to "policy" or "algorithmic" effects is immaterial. This article attempts to explain a genuine phenomenon. --Beachy 12:05, 29 November 2006 (UTC)

External links

Here are the external links. These should be converted into references, as footnotes in the appropriate places. A list of links is virtually useless because it doesn't relate the facts stated in the article to the source. This stuff could be linkspam, or is at least an open invitation to more linkspam.

Cheers, Jehochman (Talk/Contrib) 21:59, 26 December 2006 (UTC)

If you're willing to convert the links to references then by all means do so - but don't do half a job and leave the article without any links! --Beachy 22:56, 26 December 2006 (UTC)


I don't want to be taken as a spam slinger, but I can provide proof of the "sandbox". I have a site that I have been able to find through Google. My site and that page of the site are indexed. After that I did a search for my site on Google.

The terms I used were all specific keywords that were on the specific page. I narrowed the search pattern to 300 and something pages that had those specific words on them. My page was NOT on any of the Google listings for those search terms.

Just to make sure that the search terms were on my specific page of my site, I did a search of www.mydomain.com and found the terms on my site cached in www.mysite.com/something/something_else.htm

If this test is in dispute I am willing to name my site and name the terms I used. That would validate the first bullet in the top post by oldafdfull on the 8th of May 2006. As for how long a sandbox lasts, I would say that is dependent on the type of site you have (adult vs not adult, .edu vs .gov, or .com vs .net). I have found that my site can only be found using the URL title. I have been found using non-words that don't have much to do with my site: for example, the type of camera I used to take pictures for my site actually gave me a hit. I would like to refer to oldafdfull's question in specific

"* why is it that sites alleged to be "in the sandbox" are said to rank well in Google for obscure searches, but not for other search terms? "

This is obvious when you think about it: how many pages have the specific word set crafted in such a way that no person would ever search for such a thing? Example... guarana, antarctica, rio, wholesaler, english, rio, 888, 355ml, "78963 10020", contact, tudo. But if I were selling Guarana (a drink from Brasil) I bet ya I would have all of that on my webpage if I were looking to export to someone in America. If I had this on my page, sure, I could be the "top ranked" for a search that nobody would ever do in the real world. But I look at my page and see that I have all this information on there and wow, I'm number one in a really silly search term. The point is, while there can be only one #1 for the word "Cars", many sites that have the word "Cars" in them are still not getting listed and can't be found without the above-mentioned silly search term.

  • what distinguishes search terms that are penalized by the sandbox, and terms that have no penalty?

I think this previous question is somewhat off the topic of what puts you into the sandbox. It may be a misunderstanding on my part, so please clarify.

  • does the "sandbox effect" relate to new web domains or to new pages?

Speculation here. You have all these spammers that know that a site with 10,000,000 internal pages has a better chance of getting higher PR than a simple 3-page site with a lot of content. In order to weed out the spammers, put things into a sandbox and see what shakes out. The 10,000,000-page site still has 10,000,000 pages after three months: "flag it". The site that started out with 3 now has 9 pages: let it go. This would show that the big G has a math equation for seeing if a site grows over time, and if it does, well, they must be adding quality content.
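
Purely to illustrate the speculation above, the rule could be sketched in Python as follows. This is a guess at what such a check might look like, not anything Google is known to do; the function name and the "no growth means flag" rule are assumptions taken directly from the two examples given.

```python
def should_flag(pages_at_start: int, pages_after_three_months: int) -> bool:
    """Hypothetical rule: flag a site whose page count has not grown."""
    # Assumption: any growth at all is treated as a sign of new content.
    return pages_after_three_months <= pages_at_start
```

Under this rule the site that stays at 10,000,000 pages is flagged, while the site that grows from 3 to 9 pages is let go.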

  • what evidence is there to support the assertion that the "level of search engine optimisation appears not to be a factor"?

I wrote my page without any regard for SEO. Just normal written text that I would write in an email to a friend. I'm in the "Box". At this time I have started to play the SEO game and I'm still in the "Box". So I have seen zero change from the view of the big G.

<soap-box> personal notes Once again I am willing to post my URL here but only if requested. I am only giving my view of what it is like to be in the "sandbox" and what I am trying to do to get out of the damn thing. The biggest frustration for me is I have a product site that nobody knows about, that nobody will know about. I have the choices to pay for AdWords to be seen (Hmmmmm... no, Google would never "make someone pay to be listed"), or spam boards (like this one) for a link or two that might get me onto the regular playing field where I can start to play the game like everyone else.

I take full credit for all spelling errors, I'm a Pragrme, I'm a Progrmere, I'm a .... damit I write code. </soap-box> P.I.C. dec 31 2006 0315 gmt -8

Weasel Words?

I'm relatively new to Wikipedia, but the first two instances in the "Community Response" section marked as in need of citations look like weasel words, such as "it is thought, by some..." or "Since Google...has distressed webmasters who...". Am I correct in assuming this? Thinboy00 02:22, 26 March 2007 (UTC)