Talk:Metadata

From Wikipedia, the free encyclopedia

This article is within the scope of Computing WikiProject, an attempt to build a comprehensive and detailed guide to computers and computing. If you would like to participate, you can edit the article attached to this page, or visit the project page, where you can join the project and/or contribute to the discussion.
This article has been rated as Start-Class on the quality scale.
This article has been rated as High-importance on the importance scale.
This article is within the scope of WikiProject Databases.
This article has not yet received a rating on the assessment scale.
This article has been rated as Mid-importance on the assessment scale.

(Please add new discussion to the end of the document.)

From LeeHunter

The following chunk makes no sense to me and I think it should be deleted. --LeeHunter 21:23, 23 Aug 2004 (UTC)

  • would also be given the metadata keywords '4 wheel drives', '4WDs' and 'four wheel drives', as this is what they are known as in Australia. [it's debatable whether the metadata really needs to have every synonym]

The example of metadata given looks more like a symbolic link.

Is "computing" the right subheading?

Since this article (correctly in my view) begins with a non-computing example, I wonder if the title might not be better as "Metadata (information)". This would keep it in line with entries such as Table.

I think it would be fine simply at Metadata. Michael Z. 2005-05-23 06:19 Z

looking for info about image metadata - new feature?

I just noticed that photos I have been uploading recently contain metadata, but I can't find anything about this upgrade. Should I re-upload my old photos so that they contain metadata? Please direct me to the discussion of this new Wikipedia feature. Cacophony 01:49, September 7, 2005 (UTC)

The main encyclopaedia article is Exchangeable image file format. I couldn't find anything on the specific feature. Note that many image manipulation programs will lose the metadata (most images need manipulating before uploading). Older cameras won't generate it. --David Woolley 09:39, 12 November 2005 (UTC)

Link purge

This article has suffered from people adding their own external links which don't do a good job of providing further information on the topic. Remember that Wikipedia is not a web directory. If anyone should know the difference, it's metadata people. So here's a review of all the external links. Feel free to disagree with my assessments.

This was actually an interesting paper. Please restore. RayGates 02:31, 31 January 2006 (UTC)
This is extremely boring, but then I'm not a MAC user :). I'm surprised you kept it. RayGates 02:31, 31 January 2006 (UTC)
  • Meta Meta Data Data - Ralph Kimball is notable.
  • Rationales for using XMP metadata - Too specific. This is about digital photography metadata and nothing else. Deleted.
  • Metadatarisk.org - this site is run by a document security company. Wikipedia is not their free advertising. Deleted.
  • ISO/IEC JTC 1/SC 32 N 1102 - really, extremely boring ISO draft of something. Deleted. It could perhaps be restored if someone puts it in context and clarifies who would actually want or need to read it.
This was supporting documentation for the formal definition which you summarily removed previously. Its only purpose was to show the source of the definition. RayGates 02:31, 31 January 2006 (UTC)
  • All blog links: deleted for non-notability.

rspeer / ɹəədsɹ 20:11, 30 January 2006 (UTC)

Formal Definition

While I did not insert the formal definition originally, I did find it relevant. I am inclined to restore it, but would welcome other views. RayGates 02:31, 31 January 2006 (UTC)

desktop search programs and metadata

Since desktop search programs such as Google Desktop catalog metadata from files on one's computer, I would suggest that they be mentioned somewhere in the article, just as Spotlight and WinFS are. --Cab88 22:33, 12 March 2006 (UTC)


Ok, I added my definition at the beginning, along with those of Bracket, Marco, and Tannenbaum under "Warehouse metadata". --DaveHay 22:15 (CST) 23 March, 2006.

Added

I have added much and merged few. Someone do that for me, I have no more time at the moment. --Θ~ 17:44, 26 May 2006 (UTC)

Enterprise Metadata

I have rewritten the Enterprise Metadata section and incorporated it under General IT metadata. It had some good points once I was able to get past some of the grammatical issues. Apologies for my initial reactiveness.

Charles T. Betz 00:24, 27 May 2006 (UTC)

Digital Camera Metadata linking across all applications?

I just noticed my Netscape 7.2 program is now automatically displaying metadata for email message composition with digital camera images. By symbolic link?

John Zdralek what does god want with my quantum thoughts? 10:02 30 May 2006 (UTC)

Disambiguity of the MetaData virus listed on a webpage of symantec.com 31 May 2006?

Computer display view of virus list
John Zdralek what the heck? 17:52 04 June 2006 (UTC)

Although the majority of computer scientists see metadata as a chance for better interoperability, there are some critic voices whose main arguments must be taken seriously:

The above sounds like weasel words to me. --Greasysteve13 07:24, 6 August 2006 (UTC)

"must be taken seriously" has no business in Wikipedia. However, the criticisms belong if they are sourced. Unless the criticisms are clearly at the fringe, the benefits of metadata must also be qualified (e.g. "The benefits of metadata are" -> "Supporters assert that the benefits of metadata are") If the criticisms cannot be sourced, they do not belong at all. Simple! Notinasnaid 08:05, 6 August 2006 (UTC)
Ever thought about gettin' your own hands dirty and change anything yourselves? --Θ~ 21:25, 11 August 2006 (UTC)
I agree, each of these criticisms needs a citation.

Etymology?

In the first sentence, in the brackets, it is stated that "metadata" comes from Greek meta=after .... But meta can also mean "about", and I think "metadata" is much more "about information" than "after information".

On Etymology and Plural

Meta in Greek means after. Just that. About is not a correct translation. I noted that this was written in the Meta article as well, so I corrected it. Another issue we should consider is whether metadata is plural or singular. Data is the plural of datum (Latin). So my opinion is that the word metadata should be plural as well. Information is singular for different reasons. Metadata are different from information. Share your thoughts on this.

Although I agree with your reference to the Latin, I believe that the common use is based on considering data as the aggregate, i.e. a shorthand for a set of data. I don't feel that strongly about which way the article goes, but I would like to see some consistency. The second sentence states "Metadata are...". The third sentence states "In library science metadata is...".

151.118.160.214 20:59, 4 December 2006 (UTC) Hoyt L. Kesterson II

criticism of the article in the media

The January 12th issue of Computable contains a column by Rick van der Lans that criticizes this article:

  • the author didn't know the terms 'back room' and 'front room' metadata, and couldn't find anyone who did at a conference devoted to metadata (apparently he didn't Google it); the article lacks an explanation. The only online source I can find to attribute an explanation to is in Dutch, so maybe someone can find an English source and expand on this section?
  • "a reason given [for the drawback that metadata is 'too complex'] is that users don't create metadata because existing formats, MPEG-7 in particular, are too complex. Pardon me?" I have no idea how to place the author's surprise at this statement. Maybe he has never met anyone who's opined that users don't add metadata (though I'd be hard pressed to find a Word document with any of the semantic metadata fields filled). The criticisms listed on the page aren't sourced other than "some critics say", though; that could be fixed.
  • the author calls for 'someone' to 'fix' the article, but apparently hasn't done so himself, nor added a section on this talk page (the article's metadata), which I personally find somewhat ironic.
  • the author's byline describes him as "specialized in software development, datawarehousing and internet".

(The quotes attributed to Rick van der Lans are my crappy translations)

85.144.113.76 10:53, 13 January 2007 (UTC)

Where this page is now...

This page seems to me to be suffering, relatively speaking, but I'd like other inputs before trying to improve the whole (especially because I'm a wikipedia newbie). I see a lot of detail that takes away from the whole (partly just because it's detailed, and partly because it's not consistently written and presented). For example:

  • Definitions are sprinkled throughout the document in support of descriptions of particular domains of metadata. Some of these definitions contradict, duplicate, or support definitions in the Definitions section. Proposal: All definitions in the Definitions section.
  • There's a section for "Types of metadata" and a section for "Types". Both of them list types of metadata, though the first is more categorization and the second is more application domains.
    • Re the first: There are a lot of ways people subcategorize metadata ([example]). I'm willing to go there, but should we be trying to be comprehensive? Proposal: Only list those that are referenced to a paper.
    • Re the second: We could go on listing metadata applications until the cows come home, but will that help people understand what metadata is? Is there a criterion that can be applied to help figure out if a particular application adds value to the article? A lot of this information is good, but some of it is noticeably weaker or secondary.

Many other comments, but mostly more minor. I'd appreciate feedback on whether others see the same issues, and on the best way for me to contribute to addressing them (fell swoop or piecemeal or...?).

--Metajohng 21:34, 6 February 2007 (UTC)

Metadata trademark usage

Note: Metadata is not a trademark in France, nor in many other countries where the term is used. Please, Mr. Metadata Company, do not internationalize yourself wherever "metadata" are used in Wikipedia pages, which are not US pages, but international pages. Thank you. Jeansoulin 15:33, 7 February 2007 (UTC) University Marne La Vallee, France.


Two new sections - What is "Metadata" and "Levels"

I agree with the previous comment that the article is a bit disjointed. Much of the content is okay in isolation, but as a whole it tends to confuse rather than enlighten. To address this, I have rewritten the Introduction and added two new explanatory sections to get the key concepts across. Following this it is useful to talk about Definitions, Types, Uses, Issues etc., but the current content could usefully be revised to remove duplication and be made less disjointed. I might have a go at this later.

Pete S


The difference between data and information has no practical use?

"As for most people the difference between data and information is merely a philosophical one of no relevance in practical use, other definitions are:"

I believe this phrase is short-sighted at the least and should be rephrased. There is a huge difference between information content and data, and this has an incredible impact on practical applications, probably the most common and practical being compression. Dpser 10:14, 14 March 2007 (UTC)
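To illustrate the compression point, here is a minimal sketch (not taken from the article) comparing two byte strings of the same size. The sample strings and helper names are made up for illustration, and the byte-frequency entropy is only a rough, zeroth-order estimate of information content: it ignores repetition, which is why the repetitive string compresses far better than its entropy figure alone would suggest.

```python
import math
import os
import zlib
from collections import Counter


def entropy_bits_per_byte(data: bytes) -> float:
    """Empirical Shannon entropy of the byte distribution, in bits per byte."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def compressed_size(data: bytes) -> int:
    """Size in bytes after DEFLATE compression at the highest level."""
    return len(zlib.compress(data, 9))


# Two hypothetical samples: the same amount of data, very different information.
repetitive = b"12345" * 2000       # 10,000 bytes, highly redundant
random_like = os.urandom(10_000)   # 10,000 bytes, close to incompressible

for label, blob in (("repetitive", repetitive), ("random", random_like)):
    print(label, len(blob), compressed_size(blob), round(entropy_bits_per_byte(blob), 2))
```

Both samples hold 10,000 bytes of data, yet only the random one needs roughly that many bytes once compressed.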

ZIP Code: new example please

A data definition such as "ZIP Code" in the cited text is hardly a useful (first!) example of metadata. What is basically a column name is either edge-case metadata or arguably not metadata at all. Metadata does not exist to give data meaning; rather, it exists to describe data. Meaning is derived: from metadata, context, presentation, personal bias... whatever.

Metadata is easier to explain in unstructured data contexts. An article titled "Solar Power Generation Today" might be assigned the metadata "alternative energy", with a metadata categorization of, say, "subject". On the other hand, if "12345" must be chosen as our example, then a nice made-up metadata might be "Processing Center assigned 1987" or something like that.

I'd suggest the first of those two. And, in any case, the ZIP Code example should be archived. (Status:Deprecated) ;-)


  • Example: "12345" is data, and with no additional context is meaningless. When "12345" is given a meaningful name (metadata) of "ZIP code", one can understand (at least in the United States, and further placing "ZIP code" within the context of a postal address) that "12345" refers to the General Electric plant in Schenectady, New York.


67.149.104.192 02:14, 18 February 2007 (UTC)


Please be more explicit as to why/how 12345/Zip Code (I'm happy with the Aus Postal code example too) is a bad example. If it's no good, please provide a suitable substitute. I was attempting to offer a beginner's example from the systems perspective. Your "Solar Power Generation Today" comment is from the world of publishing (I think). As far as I know, neither the systems domain nor the publishing domain can claim exclusive ownership of metadata... it rather depends on one's perspective/context/experience. DEddy 02:46, 18 February 2007 (UTC)

I think the Zip Code example is perfectly good. The name and definition associated with a data element are the most important items of metadata. RayGates 21:54, 18 February 2007 (UTC)

Ray - Thanks for the vote of support for "my" simple definition/example of metadata. I'm sure I copied this example from someone else 10-15 years ago, & I've yet to see anything that comes remotely close to giving a view into how metadata fits into helping represent the real world via data. Total agreement that a good name goes a long way to resolving lots of metadata ambiguity. DEddy 00:52, 19 February 2007 (UTC)

Not sure why we are making this personal and defensive. It wasn't meant that way. I disagree with the example, I feel that it is not terribly illustrative (since nearly everyone already knows about data element names), and I disagree that there is any domain-specific definition of "metadata." I suggested some alternatives. None of these are personal attacks. 66.93.3.210 20:24, 23 February 2007 (UTC)


With all due respect, I too feel that the Zip Code example is inappropriate. This is because "zip code" tells us what 12345 means. It completes the statement "12345 is a ________", effectively turning a string of numbers (data) into information. This is almost the definition of that object within a specific context. Metadata should provide peripheral information about an object - information that is (generally) not critical to the existence/interpretation of the object.

Also, I'm confused as to why we need another example when we can stay with and explore the very good example provided earlier on in the article - the digital camera JPEG. The JPEG metadata stores the timestamp, shutter speed and aperture among other things. It does not try to store the fact that "This is a picture file". We derive that fact from the structure of the file or the extension of the file-name, both of which contribute to defining the context within which we look at the file. Both are also critical to the existence of the file. If there was absolutely no way of knowing what type of file it was, the only thing one can do with that object is destroy it. Ulric 16:43, 20 March 2007 (UTC)
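As a rough sketch of the point about deriving the file type from structure or file-name rather than from stored metadata: the two lookup tables and function names below are hypothetical, the byte signatures are the standard leading bytes of JPEG and PNG files, and the sample "file" and its properties are made up.

```python
from pathlib import PurePath
from typing import Optional

# Leading bytes ("magic numbers") for two common image formats.
SIGNATURES = {
    "jpeg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
}
EXTENSIONS = {".jpg": "jpeg", ".jpeg": "jpeg", ".png": "png"}


def type_from_structure(blob: bytes) -> Optional[str]:
    """Guess the file type from the first bytes of the content."""
    for name, magic in SIGNATURES.items():
        if blob.startswith(magic):
            return name
    return None


def type_from_name(filename: str) -> Optional[str]:
    """Guess the file type from the file-name extension."""
    return EXTENSIONS.get(PurePath(filename).suffix.lower())


# Hypothetical descriptive properties, kept apart from the bytes themselves.
photo_properties = {"timestamp": "2007-03-20T16:43:00", "aperture": "f/2.8"}
header = b"\xff\xd8\xff\xe1" + b"\x00" * 16   # made-up stand-in for a real file

print(type_from_structure(header), type_from_name("IMG_0001.JPG"))
```

Neither function consults photo_properties: under this reading, the timestamp and aperture describe the picture, while "this is a picture file" comes from the bytes or the name.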

Field name is not metadata IMHO

The reason why I think the ZIP code is a bad example the way it is now is the heart of the sentence "12345" is data, and with no additional context is meaningless. I am only a computer engineer, not a computer science theorist, but to me data has meaning; if it does not have meaning, then I call it a string of characters or bits or I call it garbage, or a cryptographic challenge (as you might try to decode meaning just by looking at the shape of data.) There's no difference between 12345 and ABCDE if I don't have any clue about what it means. A field name is what turns a string into data. Clearly, for me the field name isn't metadata, but what qualifies the string as data. Your mileage may vary. So, data has meaning, and metadata extends what I know about the data and may help me work it better. For instance, you may keep the example and make it less US-centric by saying that the Postal Code field may have the value 12345, and metadata about it would be that it refers to a USA ZIP code. So we already had a meaning for the field (it's a postal code), but we're extending it by knowing it's not a French postcode (which it could be). We can work with it better, because we won't try to validate it against a Portuguese postcode mask (9999-999), or Australian (AAA 9999), South African (9999), whatever. – Tintazul msg 10:00, 22 May 2008 (UTC)
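A minimal sketch of the mask idea above, assuming a made-up record layout and lookup table. The masks are illustrative only (five digits for the US and France, 9999-999 for Portugal, four digits for South Africa); real postal rules are more detailed.

```python
import re

# Illustrative masks only, keyed by a hypothetical country code.
POSTCODE_MASKS = {
    "US": re.compile(r"\d{5}"),
    "FR": re.compile(r"\d{5}"),
    "PT": re.compile(r"\d{4}-\d{3}"),
    "ZA": re.compile(r"\d{4}"),
}


def is_valid_postcode(value: str, country: str) -> bool:
    """Validate the value against the mask selected by the country metadata."""
    mask = POSTCODE_MASKS.get(country)
    if mask is None:
        raise KeyError(f"no mask known for country {country!r}")
    return mask.fullmatch(value) is not None


# The field name ("postal_code") gives the string its meaning; the country code
# is the extra metadata that tells us which mask to apply.
record = {"postal_code": "12345", "postal_code_country": "US"}
print(is_valid_postcode(record["postal_code"], record["postal_code_country"]))
print(is_valid_postcode(record["postal_code"], "PT"))
```

Note that the US and French masks are identical here, so the value alone cannot tell the two apart; only the country metadata can.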

Mild cleanup of talk page

Before adding my own comments, I moved the zip code discussion to the end of the document (consistent with Wikipedia guidance for Talk pages) and made a few other tweaks, hopefully considered minor. Apologies if anyone is vexed. --Metajohng 18:58, 6 April 2007 (UTC)

Quick Link Purge

I removed a couple of links that were pointing directly to a specific metadata removal tool from the "Document Metadata" category, changed around some of the wording and added a link to E-Discovery. Not sure what you guys think of that, but it seemed like the iScrub links were blatant advertising. I'm not sure if the document-metadata.com links should be removed too. Any thoughts? --TheDude813 19:55, 2 July 2007 (UTC)

Did you know...

..that The European Library has a handbook and it gives open access to the Metadata Registry it is developing?

greetings, 82.156.209.165 21:28, 1 August 2007 (UTC)

Need for section on HTML

It would be very helpful if someone knowledgeable added a section on HTML metadata. I was very surprised that a quick scan of the article seemed to find no reference to web searching. Soler97 (talk) 22:45, 31 December 2007 (UTC)

Poor wording in opening paragraph

"metadata about a title would typically include a description of the content..." - "title" can be a piece of metadata, so using the word "title" here instead of "book" or even "information object (such as a book)" is needlessly confusing. —Preceding unsigned comment added by 131.216.164.187 (talk) 18:49, 14 January 2008 (UTC)

Agreed. Even the topic of a camera's metadata is misleading or wrong, as the examples are really talking about the actual photograph's metadata. The camera's metadata would be items such as its dimensions, weight, manufacturer, et cetera.

Also, the old, popular "data about data" tag-line is rather lame, because there is also metadata about processes, motivations, etc. In a computing context, I've heard metadata better described as "information resource data". I think that could be attributed to a lecture by Larry English, an Information Quality consultant. He also drew some challenge to the word "about" in this context, as the Latin for meta is purported to mean "alongside of". --Preceding unsigned comment added by 69.140.220.35 (talk) 22:42, 8 February 2008 (UTC)

Can metadata have a useful definition?

The usage of the term recommended by this article is so broad as to render the term 'metadata' useless. When it started gaining traction in the 70s, the meaning was confined to data that constrained or organized attributes of entities (E. F. Codd coined the term relational database in 1970, so there was no practical implementation of a relational schema back then). Saying that a library card contains metadata seems highly suspect as librarians have a terminology for the descriptors of their books that was developed long before 'metadata' was coined.

It's disappointing to see that this article has grown to ensconce the prolific use of 'metadata' to include annotations of photographs. It's Jack Meyers' word, though his company doesn't seem to have the chutzpah to follow through on threats of trademark infringement suits. If we're appropriating someone's invention, it seems it should be because there is not a better (or perhaps adequate) term available.

The deep need was (is) for a term that denotes the information that must be present in order to have a definition of the properties of some entity. In the case of a library system, metadata stored would include the string 'title' to name the property of a book by which we call it, the string 'call number' to name the property used to locate the book, 'author' to name the person credited with the book, etc. The point is that the call number is not an example of metadata. The term becomes useless if one can choose some particular entity and then decide that some of its properties are data and some are metadata. The author is a property of a book. Of course, to be useful in a computing system, metadata must include more than just the name of a property: the metadata should encompass the domain and range of the property (characters with a maximum length of 100) or more elaborate datatype representations if appropriate.
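To make the distinction concrete, here is a small sketch under the narrow reading argued above. The class, schema and sample book are all hypothetical; only the property names and the 100-character limit come from the paragraph itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertyDef:
    """Definition of one property: its name, datatype and maximum length."""
    name: str
    datatype: type
    max_length: int


# Metadata in the narrow sense: definitions of a book's properties.
BOOK_SCHEMA = (
    PropertyDef("title", str, 100),
    PropertyDef("call number", str, 100),
    PropertyDef("author", str, 100),
)


def conforms(record: dict, schema=BOOK_SCHEMA) -> bool:
    """Check that a record has exactly the defined properties, within their limits."""
    if set(record) != {p.name for p in schema}:
        return False
    return all(
        isinstance(record[p.name], p.datatype) and len(record[p.name]) <= p.max_length
        for p in schema
    )


# The values below are data about one (made-up) book, not metadata under this reading.
book = {"title": "An Example Title", "call number": "QA76.9 .E9", "author": "A. N. Author"}
print(conforms(book))
```

Here BOOK_SCHEMA plays the role of the metadata store, while the call number "QA76.9 .E9" is simply a property value of one book.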

As mentioned in the article, data and document schemata constitute metadata stores as do ontologies, and this is the area in which the term metadata fills a real need. When it's claimed that the term should be applied to things like document properties that are stored and changed with the file (as in the section of 'Important Issues'), it loses value. The application creates property fields in the file for its own purposes (or for no good reason in the case of MS Word); why should those properties be called metadata? Those properties may have little to do with the 'real' content of the file or may be an innate part of the content and in either case can often be removed without any effect on the usefulness of the file.

In the section of 'Important Issues' we find another interesting example regarding digital photography: the need to include interesting properties along with the image data. The term 'metadata' would be most useful in identifying the data needed in order to recognize and separate (parse) or locate the various properties of the image in the image file. The exposure settings, date and time of the event are attributes that may be attached to the image in a number of ways - why should we apply the term metadata to these attributes? The terms 'properties', 'descriptors' and 'attributes' can be applied more meaningfully. On the other hand, the data needed to find these properties lacks a term if 'metadata' means the properties themselves. In the case of JPEG image files, the metadata includes the JFIF format rules and possibly the information about the location and format of descriptors such as the date, etc.

I think it's not too late to hold the line. The Metadata Company registered the trademark 'metadata' in 1986. Maybe the Wikipedia community can help it retain some meaning so that it doesn't become just another synonym for 'property', 'descriptor' or 'attribute'. JWBito (talk) 06:51, 17 May 2008 (UTC)