Talk:Internet Archive/Archive 1

This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.

1 Accessibility
2 Amazon connection?
3 Phish
4 Text search
5 Controversy?
6 Removal by new domain owners
7 Who wrote this their PR department?
8 Website?
9 IRC channel
10 What nice people
11 Carbon copy
12 Wikimedia mention
13 Wikipedia template related to the Internet Archive
14 An error in 'Moving image collection'
15 Lawsuit
16 Full Text Search

Accessibility

I think that Wikimedia should allow itself to be accessible through the archive. Wikipedia is too valuable a website not to have in the archive. I understand that modifying the robots.txt file to allow Wikipedia to be crawled may not be such a good idea, but perhaps someone can find another way. -Edward Nardella

Amazon connection?

I'd be curious to know whether there is a connection between Amazon and archive.org. If so, are there alternatives?

I don't think there are any direct connections between the Archive and Amazon. However, Brewster Kahle, who founded the Archive, also founded Alexa Internet. Amazon acquired Alexa around 2000. Brewster may own Amazon stock and certainly knows Jeff Bezos, but I don't think the Archive does anything with Amazon, or vice versa. The closest I know of to a relationship between Amazon and the Archive is that Amazon subsidiary Alexa donates its webcrawl to the Archive. --Zippy 07:18, 17 January 2006 (UTC)

Phish

Phish has never allowed their shows to be hosted at the Archive, and Dave Matthews Band no longer allows their concerts to be hosted there.

Text search

Just noticed that, as of today, the Wayback Machine also has a full text search. Don't know if that's worth mentioning. --Aramgutang 12:00, 9 Aug 2004 (UTC)

Controversy?

There must have been controversy about publishing copies of everything on the web. The article should include info on this topic!

I believe this falls under the umbrella of "cached" information being legal to own and to display, in the same way Google can cache any page and translate any PDF or DOC file to HTML, while at the same time any page can opt out of being cached. To say it another way, the cached information is basically their distribution of the material to you, in the same way a newspaper is printed. The print (ink/data) is stored on a medium (paper/hard drive) for your use, and if you want to give it away in public, who can stop you? What shouldn't be legal is the redistribution of cached material, but how can this be stopped?

Removal by new domain owners

What I find very annoying is that when someone has let their domain registration lapse, or has had the domain stolen or purchased, the new owner can have the historic content removed using robots.txt. This includes content that the previous owners of the domain might wish to remain in archive.org. Since many people cannot afford to keep their domain indefinitely, any content in archive.org could conceivably be removed, given enough time. Shouldn't there be legal channels through which people can restore their own content, by proving ownership of a domain for a given time period? Would this be worth mentioning in the article? 64.162.10.162 06:09, 10 Mar 2005 (UTC)
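As a rough illustration of the mechanism described above, here is a minimal sketch using Python's standard urllib.robotparser. The domain and path are hypothetical, and the user-agent name ia_archiver is an assumption about what the Archive's crawl honours; this is not the Archive's own code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that a new domain owner might publish.
# "ia_archiver" is assumed here to be the relevant crawler name.
rules = """User-agent: ia_archiver
Disallow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Hypothetical URL of a page written by the previous owner.
old_page = "http://example.com/old-page.html"

# The named crawler is denied everything under the domain...
archive_allowed = parser.can_fetch("ia_archiver", old_page)
# ...while a crawler with no matching rule is still allowed.
other_allowed = parser.can_fetch("SomeOtherBot", old_page)

print(archive_allowed, other_allowed)
```

The point of the sketch is that the rule is keyed on the domain alone, so it says nothing about who owned the domain when the page was written.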

Who wrote this their PR department?

This page is symptomatic of several I've recently encountered on Wikipedia (Gmail comes to mind), only it's worse. The Gmail page, although it neglects the controversial aspects, at least mentions them. Why doesn't this page mention that archive.org not only complied with Scientology's demands to remove pages, but exceeded them and lied about it?

Fix this page please, it's currently a mandible-straining servicing of Mr. Kahle.

Please provide some links for further research on this (e.g. news reports, watchdog websites, etc.). That's the first step to getting it included. JesseW 18:42, 15 Mar 2005 (UTC)

Here are some links on the Scientology removal: CNET story, Forum post at archive.org, LawMeme article. JesseW 19:25, 15 Mar 2005 (UTC)

Website?

I dare not remove it, but WHAT is the link to the 'related' page on Website doing here? Shouldn't a link to Internet be posted here as well, then? And to Archive? I mean, this link to Website just seems somewhat ridiculous to me.

IRC channel

Don't they have an IRC channel? If anyone knows it, please add it.

Until someone can say whether there is an official one, you can join irc://irc.freenode.net/#archive

What nice people

Why would someone put archive.org on the internet for free? Who is behind it? Storing a petabyte or so of data AND providing free bandwidth is quite expensive.

--Abdull 8 July 2005 14:25 (UTC)

Information on ownership (governmental?) would be very useful. --194.66.208.11 16:52, 31 July 2005 (UTC)

It's a 501(c)(3) non-profit; it's not governmental. Much of their support comes from Alexa, in the form of most of the data they carry (the Wayback Machine is all content spidered by Alexa). Who funds them is not boasted about on the site, so finding out isn't all that easy. They do solicit donations, and their premises (in the Presidio of San Francisco) are likely to be free: the entire site is a national recreation area, and at least here (Ireland) that would imply state ownership. However, finding out who does fund them would probably need Original Research. --Kiand 19:15, 31 July 2005 (UTC)

There is a very small list of patrons however - Alexa, Hewlett Packard, the William and Flora Hewlett Foundation (see previous, I guess), LizardTech, the Library of Congress, and some others. --Kiand 19:17, 31 July 2005 (UTC)

Carbon copy

When they say Alexa donates content, does that mean that not every single webpage can be found on the Wayback Machine? Or is the Wayback Machine a carbon copy of the internet (or maybe just the web) up to a certain point? How exactly does it work?

For the Internet Archive (the one the Library of Congress has copies of), is it a carbon copy of the internet up to a certain point, or is it also just bits and pieces of the internet?

When Alexa crawls the web, they start from the list of web pages they already know about and retrieve copies of each of those. They then look at the pages they just got for any URLs that they didn't already have in their list. These then get added to the crawl, and the process continues.

What Alexa donates to the archive is this collection of stored webpages. There is no way that Alexa will ever have every page on the web, but they do about as good a job as Google with their web crawl. For examples of missing pages, try a frequently changing site that isn't too popular, like an obscure messageboard. The Archive probably will only have a few of the pages for such a site. --Zippy 07:23, 17 January 2006 (UTC)
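The crawl loop described above can be sketched roughly as follows. This is only an illustration of the general idea (start from known pages, fetch them, queue any newly discovered URLs), not Alexa's actual crawler; the in-memory FAKE_WEB table, the crawl function, and all URLs are hypothetical stand-ins for real HTTP fetching and link extraction.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the URLs its page links to.
FAKE_WEB = {
    "http://a.example/": ["http://b.example/", "http://c.example/"],
    "http://b.example/": ["http://a.example/", "http://d.example/"],
    "http://c.example/": [],
    "http://d.example/": ["http://c.example/"],
    # Nothing in the crawl's reach links to this page.
    "http://orphan.example/": ["http://a.example/"],
}


def crawl(seed_urls, fetch_links):
    """Visit every page reachable from seed_urls and return them in visit order."""
    known = set(seed_urls)    # URLs already on the crawl list
    queue = deque(seed_urls)  # URLs still waiting to be retrieved
    archived = []             # pages retrieved so far
    while queue:
        url = queue.popleft()
        archived.append(url)
        for link in fetch_links(url):
            if link not in known:  # only add URLs not already listed
                known.add(link)
                queue.append(link)
    return archived


snapshot = crawl(["http://a.example/"], lambda url: FAKE_WEB.get(url, []))
print(snapshot)
```

In this toy run the orphan page never enters the snapshot, because no page the crawl reaches links to it. That is one reason a link-following crawl cannot be a complete copy of the web.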

Wikimedia mention

"Many people consider the Internet Archive to be a sister project to the Wikimedia Foundation's various projects." This seems valid, but it sits so high up in the article that it seems too prominent. We should shy away from Wikimedia bias, and I believe that no other encyclopedia would mention the Wikimedia projects in the first couple of sentences of the article.

I'll go one further — it looks pompous, and (at least to this little black duck) doesn't make sense. In what way, exactly, can the Internet Archive be considered a "sister" to Wikipedia and its actual sister projects? Who are these "many people" that think this? I'll remove it from the article. --fuddlemark (fuddle me!) 22:01, 28 September 2005 (UTC)

Wikipedia template related to the Internet Archive

I knew something like this existed, but it was tough to find it: {{Waybackref}}. User:Ceyockey (talk to me) 00:04, 5 March 2006 (UTC)

What's supposed to go into the "work" part of that template? One of the examples has the original URL. Esquizombi 22:51, 3 April 2006 (UTC)

I've always been a bit confused about that myself .. 'work' is not always applicable. The related {{cite web}} states "work: If this item is part of a larger work, name of that work." I personally think that {{Waybackref}} and {{cite web}} should be merged. User:Ceyockey (talk to me) 00:59, 6 May 2006 (UTC)

FYI, that merger happened recently. You may find {{cite web}} more natural to use than {{waybackref}}, especially for a previously-online citation that now needs to be obtained from the Wayback Machine. RossPatterson 00:00, 28 June 2006 (UTC)

An error in 'Moving image collection'

I couldn't find an episode called "The Negro Soldier" in the "Why We Fight" propaganda series. It is said to have been produced in 1943, so I guess it is really The Battle of Russia?

Lawsuit

Is there anything better than a link regarding the Polska lawsuit? http://www.nyls.edu/pages/1819.asp Or should the article give a longer explanation of what happened there?

Full Text Search

Does anyone have any idea what happened to the full text search feature? Not only is it gone, there doesn't seem to be any explanation of _why_ it's gone. Baby fenris 06:00, 27 July 2006 (UTC)
