Wikipedia talk:Semantic Wikipedia
See: http://meta.wikimedia.org/wiki/Semantic_MediaWiki
Original post in the village pump
I've been thinking about this proposal for some time, but being a proposal, it of course requires comment and refining.
I've been thinking on how to make Wikipedia better, and while reading about Tim Berners-Lee's Semantic Web, it came to me. We need a Semantic Wikipedia. Perhaps I'm using the wrong term, but I like it. :)
The problem with Wikipedia right now is no sense of context. Everything related to something has to be manually added. For example, when we say that someone was born in 1932, that doesn't really mean anything. It doesn't alter the "people born in 1932" pages, it doesn't alter 1932, and it's nothing more than text. Likewise with saying they were born in Lubbock, Texas, or that they were an architect. Furthermore, saying that an object is in Edinburgh doesn't matter much until we tell Edinburgh that.
We need to automate and integrate these issues.
The idea I had was a mix of a template and meta data. For editing, it would take the appearance of a template, but the template would not be required. For example, a "biography" template would include birthday, deathday, primary occupation, secondary occupation, birthplace, deathplace, primary residence, etc. A planetary template would be appropriately different, as would an elemental one, and some things wouldn't use such templates at all, though they could still use meta data.
A major use of this would be for synopses. For example, "Why this person is famous" in one line. This would allow for quick browsing of entries without having to load the entire entry, and it would be automated based on the entry itself, rather than a manually maintained list. So a list of people born in 1932 could also list the synopsis as to why they're famous. This would also ease the process of picking notable anniversaries.
Some of this is managed by categories, but not nearly enough, and I'm not a fan of the categorization system as it presently stands. However, that is for another discussion.
Does everyone know just what I'm trying to get across here? Meta data to help organize and express the information. For example, the header to George W. Bush could include:
- Birthdate: July 6 1946
- Notable for: 43rd President of the United States.
- Secondary reason: Presided during September 11th Attacks and invasions of Afghanistan and Iraq.
- .. and any other possible metadata that could be used.
So, when looking at a list of people born on 7/6/46, it would list him, along with the reasons he's notable. This could allow for a quick list of automatically generated synopses.
A secondary advantage of using a template in this fashion would be to remove Categories and Interwikis from the main article space, making it easier to edit and comprehend. They would still be there, but exist as meta data, not in the main text area. Footer and header tables could be maintained the same way, removing them from the concern of the main text.
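The header fields above could be sketched in code roughly as follows. This is only an illustration in Python, assuming a hypothetical in-memory store in which each article carries a small dictionary of metadata. The field names (`type`, `birthdate`, `notable_for`), the `ARTICLES` store, and the function `born_on` are invented for this example and are not part of MediaWiki.

```python
from datetime import date

# Hypothetical article metadata; all field names are assumptions for illustration.
ARTICLES = {
    "George W. Bush": {
        "type": "biography",
        "birthdate": date(1946, 7, 6),
        "notable_for": "43rd President of the United States",
    },
    "Example Person": {  # made-up placeholder entry
        "type": "biography",
        "birthdate": date(1932, 1, 1),
        "notable_for": "Placeholder synopsis",
    },
    "Kyoto": {"type": "city"},  # non-biographies simply lack a birthdate
}


def born_on(articles, day):
    """Return sorted (title, synopsis) pairs for biographies born on `day`."""
    return sorted(
        (title, meta["notable_for"])
        for title, meta in articles.items()
        if meta.get("type") == "biography" and meta.get("birthdate") == day
    )


# The list is derived from the articles themselves, not maintained by hand.
for title, synopsis in born_on(ARTICLES, date(1946, 7, 6)):
    print(f"{title}: {synopsis}")
```

The point of the sketch is that the "born on this date" list is computed from the metadata, so editing the article's header is the only manual step.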
I've been pondering this for a while and finally put it in writing, so sorry if it rambles a bit. It focuses on biographies, but it could be useful for many things. A city template could include its coordinates, state and country, etc, allowing for people to get a quick list of townships in Ontario, for example, without having to rely on a manually maintained list. I guess what this is all about is automating many functions that should be automated.
Any comments? --Golbez 04:25, Nov 14, 2004 (UTC)
- This issue is handled to some degree by the existence of categories. When a cat tag is added to an article, the category is automatically updated - no need for "manually updated lists". So adding Category:1946 births to George W. Bush puts him on that page. Unfortunately, cats aren't always the most reader-friendly - they are just lists with a little header, and they are slightly out-of-the-way at the bottom of Monobook. I think you have a good suggestion with creating synopses based on these categories - maybe something along the lines of a WikiReader, but tied directly to categories. The only problem is that this would require the creation of metadata for every article - all 1,712,281 of them. Some stub articles you couldn't even make metadata from. So while I think this is a good idea, it might not be currently feasible, without a huge project to create vast wodges of metadata. Some pages already have semi-metadata - anything that was formatted by Wikipedia:WikiProject Elements, for instance, like Hydrogen, has an infobox. But this is the minority of pages. --Whosyourjudas (talk) 03:58, 14 Nov 2004 (UTC)
-
- Categories are inadequate, though, and contain no context. They don't say who the person was, or why that item is included in that category. And categories themselves are manually updated lists. Yes, all 1,712,281 of them, because this is a work in progress, and always will be. Just because there are already so many articles doesn't mean the process can't be refined. Elements has a good idea, but that's just a couple hundred articles - There are many, many more biographies.
-
- I guess my point here is, few articles have any context beyond their borders. We need to connect them. --Golbez 04:25, Nov 14, 2004 (UTC)
-
-
- I think it's a great idea and have had similar thoughts myself (for instance some standard format for someone's birth and death dates that can be automatically recognised, replacing Categories YYYY Births and YYYY Deaths, which do their job but are 'clumsy'). However, like the semantic web itself, I think it will probably be another 5 years before the underlying standards evolve and 'bed-down' before the MediaWiki software (or a successor) can incorporate them to the level discussed above. CheekyMonkey 14:57, 14 Nov 2004 (UTC)
-
-
-
-
- Then there's no harm in starting work on such now. :) --Golbez 23:19, Nov 14, 2004 (UTC)
-
-
- I am very much interested in the Semantic Web, and I have also been thinking about how Wikipedia can be made more "machine readable". The problem is that people are not that used to working directly with metadata, and doing so is not very wiki-friendly, at least if one goes back to the original thoughts behind wikis (granted, Wikipedia has already side-stepped a lot of the original guidelines where it made sense, and for good measure). We would have to make it as easy to edit and expand the metadata as it is to edit the article right now...I think it is a difficult problem to solve (look at how oblivious most people are to the metadata information stored in Microsoft Word documents. Most of the time they're not even aware it's there).
- As an aside, I think it is interesting how wikis and Wikipedia have already brought one of Tim Berners-Lee's original thoughts with the World Wide Web to life; that everyone would become a publisher. The easy editing of the Web was not implemented in the ground-breaking web browsers, and was long forgotten, but wikis bring that back. Wouldn't it be great if Wikipedia could be on the forefront of yet another web revolution; the Semantic Web? — David Remahl 23:27, 14 Nov 2004 (UTC)
-
- Thanks for the comments, and yeah, it was the articles on him and the Semantic Web that started this churning in my mind. My idea was, there would be templates (like "put date of birth here") but those wouldn't be required; the metadata would be stored with the article text just like categories and interwikis are, causing them to show up in a diff. A side benefit of this would be to also add a separate template text entry section of interwikis and categories, thus perhaps enforcing format rules as to where they belong in the article text. I wonder if Tim Berners-Lee is familiar with Wikipedia? Surely he's heard of it, but how familiar is he? --Golbez 10:30, Nov 15, 2004 (UTC)
-
-
- He certainly has heard of it, he even described Wikipedia as "The Font of All Knowledge" in a speech to the MIT Emerging Technologies Conference [1] (penultimate paragraph). CheekyMonkey 14:05, 15 Nov 2004 (UTC)
-
- I like the idea. It is essentially normalizing part of the data (in database terms). It would allow one to use that data programmatically. Morris 03:26, Nov 18, 2004 (UTC)
- Right. It would convert the information from pure text into meta data that can be used to ease organization, searching, browsing, and integration of data between articles. --Golbez 18:39, Nov 18, 2004 (UTC)
- Yes, I agree such a system is needed. I thought about it also, for country data, because statistics like population or GDP change quickly, so there should be some nice way to update all articles about countries with new data. Samohyl Jan 12:32, 19 Nov 2004 (UTC)
- Take a look at Web Ontology Language. There was a long discussion without conclusion, now archived at Wikipedia_talk:Categorization/Archive_3, just after categories were introduced. See the "Describing the relations" and "Ontologies and OWL" sections for my thoughts. -- Avaragado 19:05, 25 Nov 2004 (UTC)
- I have also thought about this and think it would be necessary. If some initiative is started I would like to contribute (in terms of design and coding), so please keep me informed :). -- Chrisgraf 16:04, 5 Dec 2004 (UTC)
On Data and tables
I agree with your end, Golbez, but not with your means. Yes, it is important to enable machines to access Wikipedia data so its legacy will go further than a website. But never at the expense of users.
It is not easy to write Wikipedia, but as Don Norman would put it, its complexity lies in the task, not in the tool. We're trying to write the biggest book ever, after all. So I believe that automating some tasks would certainly have big trade-offs. The first rule of Wikipedia:Usability guide for future improvements (which I just wrote up) is: Never mix human-readable with computer-readable, not if it makes things more difficult for humans.
Let's suppose there was an automatic summary tag. Writing {summary:history of haiti} or {intro:history of haiti} would display respectively the first paragraph or the first line of history of haiti.
Imagine how nice! This way you could put a small summary of haiti in many other articles, one that would be automatically updated. Imagine the ease with which editing one article would in fact change a lot of others. And then you would just need to put many summaries together and you would have a ready-made history of the Caribbean, history of Latin America, history of the 20th century and so on. Wouldn't it be great?
In fact, no. An article on the history of a region is way more than the sum of its parts. A good summary of an article is not the same as a summary that should fit in other articles, such as those on days. They may be good starting points, and many good articles have started from copy-pasting parts from others and then editing them together. The end result would be articles that are harder to read and harder to edit.
Keep Wikipedia on a human scale; do not artificially inflate it. It has been working.
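To make the supposed tags concrete, here is a rough sketch in Python of how such an expansion might behave. Everything in it is an assumption for illustration: the `PAGES` store, its placeholder text, and the `expand` function are hypothetical, and no such tag exists in MediaWiki.

```python
import re

# Hypothetical page store with placeholder text (not real article content).
PAGES = {
    "history of haiti": "Line one of the lead.\nLine two of the lead.\n\nSecond paragraph.",
}

# Matches {summary:title} and {intro:title}.
TAG = re.compile(r"\{(summary|intro):([^{}]+)\}")


def first_paragraph(text):
    """Everything up to the first blank line."""
    return text.split("\n\n", 1)[0].strip()


def first_line(text):
    """The first non-empty line, or an empty string for empty text."""
    lines = text.strip().splitlines()
    return lines[0] if lines else ""


def expand(wikitext, pages):
    """Replace each summary/intro tag with text pulled from the named page."""
    def repl(match):
        kind, title = match.group(1), match.group(2).strip().lower()
        text = pages.get(title)
        if text is None:
            return match.group(0)  # unknown page: leave the tag untouched
        return first_paragraph(text) if kind == "summary" else first_line(text)
    return TAG.sub(repl, wikitext)


print(expand("See also: {intro:history of haiti}", PAGES))
```

Because the expansion is purely mechanical, the pulled-in text is whatever the source page happens to start with, which is exactly the property being questioned here.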
- I don't think this is quite what I was proposing. I wasn't saying that entire articles should be cobbled together by machine from other articles; my first priority was dealing with things like lists. Who was born in a certain year, and why they're on Wikipedia. That could be done automatically, rather than having robots or people trying to handle it. Likewise, a list could be quickly generated of all landmarks in a certain US state. This is not meant to be a replacement for actual articles. --Golbez 22:18, Dec 12, 2004 (UTC)
But there is something to be said about semantics. Computers are able to read facts, not understand articles. Semantics are not in the body of text, but in those tables at the right.
In fact those tables themselves present problems. Each user creates them differently: some use HTML tables (<HTML><TR><TD>a totally unreadable code</TD></TR></HTML>), others use wiki tables (|------- a clever code, but it gets complicated when you want to add colors or control size), some even use . The result is a table that is hard to edit, hard to personalize, and mainly something that our user number one sees as plain noise.
So, I propose a different approach, the article should keep not a table, but just the info. I'll explain by example.
In the Kyoto article, instead of a table there should be a tag like that:
{{ City
Capital: [[Kyoto]]
Population: 2,644,331
Pop_density: 573/km²
Map: [[Image:Japan_kyoto_map_small.png]]
Region: [[Kinki region|Kinki]]
}}
This would act as two things. First, as a template, just as if the user were using Template:city by putting a {{city}} tag. Secondly, it would act as a variable setter, creating variables called “Population” or “Map” and assigning them the values. Maybe we should not even use the name Variable, as this is programming jargon and, as I said before, we do not want it to seem that Wikipedia is for programmers. Notice that I’m not proposing something like
Var: CityName.local= “Shacbiss”
But a simple syntax, in which the word between the line break and the : sign is the variable name and everything after the : and before the next line break is the value.
Thus, any occurrence in the document of something like $Name or $Population would be replaced by the corresponding value. This way, if in the aforementioned example the city template were a table filled in from those variables, not only would it be easier to give information about a given city, but it would also be easier to change the layout of the table in all of them. Of course, in the example above, one could use a general {{City}} template, a more specific {{Japanese City}}, or even a {{Kinki’s Citys}}, in a nice wikipedian style.
So it would be better for humans AND a computer could read all this data, and use it to draw a world map, design a game, or compute some statistics...
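The proposed syntax and the $Name substitution could be sketched like this. It is a minimal illustration in Python, assuming the block is written one field per line as the description of the syntax implies; the function names `parse_block` and `substitute` are hypothetical and do not correspond to anything in MediaWiki.

```python
import re

# The Kyoto block, written one "Name: value" pair per line (an assumption).
BLOCK = """{{ City
Capital: [[Kyoto]]
Population: 2,644,331
Pop_density: 573/km²
Map: [[Image:Japan_kyoto_map_small.png]]
Region: [[Kinki region|Kinki]]
}}"""

VAR = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def parse_block(block):
    """Return (template_name, fields) parsed from a {{ ... }} block."""
    body = block.strip()
    if body.startswith("{{"):
        body = body[2:]
    if body.endswith("}}"):
        body = body[:-2]
    lines = [ln.strip() for ln in body.splitlines() if ln.strip()]
    if not lines:
        return "", {}
    template, fields = lines[0], {}
    for ln in lines[1:]:
        # Name is everything before the first ':'; the value is the rest.
        name, sep, value = ln.partition(":")
        if sep:
            fields[name.strip()] = value.strip()
    return template, fields


def substitute(text, fields):
    """Replace $Name with its value; unknown names are left as written."""
    return VAR.sub(lambda m: fields.get(m.group(1), m.group(0)), text)


template, fields = parse_block(BLOCK)
print(template)
print(substitute("Population: $Population ($Pop_density)", fields))
```

Splitting on the first colon only is a deliberate choice in this sketch, so that values such as [[Image:...]] keep their own colons intact.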
Feedback please?--Alexandre Van de Sande 14:08, 12 Dec 2004 (UTC)
- I've just skimmed this so I can't supply an in-depth reply, but yes, "automating" the header and footer tables was part of my idea. That is, they would be semantic; they would mean more than they currently do. Their information would be transmitted back to other articles as needed, and there would be a specific (but by no means forced) format for each. The city table, and the information contained within it, is one example. A biography table would be another.
- Furthermore, I want a way to tell what an article is about. IS it about a city? IS it a biography? Right now, there is no such mechanism. --Golbez 22:18, Dec 12, 2004 (UTC)
Just to add another voice to the chorus, I'd love to see the ability to add semantic tags to wikipedia; I think they'd take off a lot more than people here are thinking. As for applications, I dream of projects like CYC being able to make use of our massive editor base. -- Rei 23:49, 12 May 2006 (UTC)