User:Darkov/CatData

From Wikipedia, the free encyclopedia

What is CatData?

CatData, or Category Data, is a simple, direct, participatory approach to extracting basic information about an article and formatting it in a structured way.

Why?

Attaching structured data to articles is very useful because it allows the data to be processed automatically in a more reliable way, yielding such benefits as much improved searching, query answering and statistical analysis.

There have been proposals for creating structured data, such as Wikidata and Semantic MediaWiki, but right now these are only proposals. In the spirit of Wikipedia, I thought the best approach was to start editing with something simple and intuitive and to see whether people maintain the data, because that is ultimately what the success or failure of such an initiative rests on.

How Does It Work?

CatData consists of three basic elements:

  • Types: These essentially lay out the data or attributes associated with the category and format them for display. They take the form of templates named Template:CatData_<type name>.
  • Entries: These are the type templates inserted into the articles to which the data belongs. One or more should be added to each article, corresponding to the categories the article belongs to.
  • Help: These pages explain what should be entered into each attribute in a simple but specific manner.
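Because entries are ordinary templates sitting in article wikitext, a bot could pull the data back out with very little machinery. The following is a minimal sketch of such an extractor, not part of CatData itself. It assumes entries are written as {{CatData_<type name>|attribute=value|...}} with an underscore in the name, named parameters only, no nested templates and no "|" inside values (so piped links like [[A|B]] are not handled). The function name extract_entries and the attribute names in the example (name, birth_date) are hypothetical.

```python
import re

# Hypothetical pattern for a CatData entry, e.g. {{CatData_Person|name=...}}.
# Assumes no nested templates ("}}" ends the entry) and no "|" inside values.
ENTRY_RE = re.compile(r"\{\{\s*CatData_(\w+)\s*((?:\|[^{}]*)?)\}\}")


def extract_entries(wikitext):
    """Return a list of (type_name, {attribute: value}) pairs found in wikitext."""
    entries = []
    for match in ENTRY_RE.finditer(wikitext):
        type_name, params = match.group(1), match.group(2)
        attributes = {}
        # params looks like "|key=value|key=value"; skip the empty piece before the first "|".
        for part in params.split("|")[1:]:
            if "=" in part:
                key, value = part.split("=", 1)
                attributes[key.strip()] = value.strip()
        entries.append((type_name, attributes))
    return entries


if __name__ == "__main__":
    text = "Ada was a mathematician.\n{{CatData_Person|name=Ada Lovelace|birth_date=1815-12-10}}"
    print(extract_entries(text))
```

Running the example prints [('Person', {'name': 'Ada Lovelace', 'birth_date': '1815-12-10'})]. A real bot would need a proper wikitext parser to cope with nested templates and piped links.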

Technicalities

Structure

Right now there is not much structure, and this is partially intentional. The types are intended to be typed feature terms: each type has a name, is a subtype of one or more other types and has one or more features or attributes, each of which has a type. This is intended as a goal more than a rigid structure. In the spirit of the wiki, I think it is best to let the actual structure evolve as people add data, and then apply structure and consistency later as it takes form. Trying to impose a formal structure on the general population risks it being ignored, or people not participating at all. Trying to explain some subtle ontological notion to someone who is making a casual edit (as most edits are) is pointless.
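To make the idea of typed feature terms concrete, here is a minimal sketch of how such types might be modelled. It is an illustration under my own assumptions, not an existing CatData component. The class CatType, the functions is_subtype and feature_accepts, and the example types (Entity, Place, City, Person with a birth_place feature) are all hypothetical. I also assume a single root type with no supertypes and no cycles in the hierarchy.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CatType:
    """A hypothetical typed feature term: a name, parent types and typed features."""
    name: str
    supertypes: list[str] = field(default_factory=list)   # names of parent types
    features: dict[str, str] = field(default_factory=dict)  # feature name -> type name


# Hypothetical example hierarchy; "Entity" is assumed to be the only root.
TYPES = {
    "Entity": CatType("Entity"),
    "Place": CatType("Place", ["Entity"]),
    "City": CatType("City", ["Place"]),
    "Person": CatType("Person", ["Entity"], {"birth_place": "Place"}),
}


def is_subtype(types, sub, sup):
    """True if `sub` is `sup` or reaches it by following supertypes (assumes no cycles)."""
    if sub == sup:
        return True
    return any(is_subtype(types, parent, sup) for parent in types[sub].supertypes)


def feature_accepts(types, owner, feature, value_type):
    """True if a value of `value_type` fits the type declared for `feature` on `owner`."""
    return is_subtype(types, value_type, types[owner].features[feature])
```

With these definitions, a City is acceptable as a Person's birth_place because City is a subtype of Place, while a Person is not. Checks like this are the kind of consistency I would only expect to apply once the structure has settled.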

The current choice of data types, or data categories, is meant to run roughly parallel to the category system. The categories are not entirely suitable as a type system, but I hope that the more rigorous data categories will influence the category structure while they inform the data structure.

Automation

A project like this screams out for automation, but providing tools before the system takes hold might bamboozle people with a rigid structure or other formalities. One thing I think is worthwhile is a system for maintaining the type lattice and inheriting attributes. Basically this would provide a tool for easily designing type inheritance relationships and attaching attributes to types as well as help text. Then the system could spit out the appropriate templates for use in the actual articles.
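A tool like this could be quite small at its core. The following is a minimal sketch, under my own assumptions, of a type lattice whose attributes are inherited by subtypes and from which the template source is generated. The LATTICE contents, the help texts and the function names (all_attributes, template_source, help_source) are hypothetical. The generated wikitext uses standard MediaWiki table markup and {{{parameter|}}} placeholders, but the table layout is only one possible display format.

```python
# Hypothetical lattice: type name -> (supertypes, {attribute: help text}).
LATTICE = {
    "Entity": ([], {"name": "The common name of the subject."}),
    "Person": (["Entity"], {"birth_date": "Date of birth, e.g. 1815-12-10."}),
    "Scientist": (["Person"], {"field": "Main area of research."}),
}


def all_attributes(lattice, type_name):
    """Collect attributes from the type and all its ancestors, ancestors first."""
    supertypes, own = lattice[type_name]
    merged = {}
    for parent in supertypes:
        for key, help_text in all_attributes(lattice, parent).items():
            merged.setdefault(key, help_text)  # first parent wins on duplicates
    merged.update(own)  # a subtype may override inherited help text
    return merged


def template_source(lattice, type_name):
    """Wikitext for Template:CatData_<type_name>: one table row per attribute."""
    rows = ['{| class="wikitable"']
    for key in all_attributes(lattice, type_name):
        rows += ["|-", "! " + key, "| {{{" + key + "|}}}"]
    rows.append("|}")
    return "\n".join(rows)


def help_source(lattice, type_name):
    """Wikitext definition list explaining each attribute of the type."""
    return "\n".join(
        "; " + key + " : " + text
        for key, text in all_attributes(lattice, type_name).items()
    )


if __name__ == "__main__":
    print(template_source(LATTICE, "Scientist"))
    print(help_source(LATTICE, "Scientist"))
```

Here Scientist picks up name from Entity and birth_date from Person without restating them, which is the main point of keeping the lattice outside the templates themselves.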

Meaning

I don't have a degree in semantics or semiotics, but I do know that words have no meaning other than what we give them. So rather than focus on a formal language for describing the structure or meaning of the data, I think we should concentrate on two areas, both from an intuitive point of view: the names we give types and attributes, and the explanations or help text we attach to them. This may not be a perfect approach, but the wiki will give us some sort of consensus and raw data, if not an executable specification of the meaning within Wikipedia.

Current Work

I'm just starting out. I am currently working on a Person template, which has a couple of links to attribute help pages. I intend to do more work on the help pages.