Knowledge Science
From Wikipedia, the free encyclopedia
Knowledge Science is the discipline of understanding the mechanics through which humans and software-based machines "know," "learn," "change," and "adapt" their own behaviors. Throughout recorded history, knowledge has been made explicit through symbols, text and graphics on media such as clay, stone, papyrus, paper and, most recently, digitally stored representations. The digital effort began in the early 1970s, when knowledge science emerged as a vigorous field of study with the development of natural-language learning programs funded by the National Science Foundation (NSF). Today, knowledge science experts are engaged in a debate between:
- Meaning as represented by language-based propositions that adhere to universal truth-conditions, and
- The quantum-relativist view, in which meaning exists under the conditions in which it can be verified and certified as acceptable, without regard to universal truth-conditions.
Knowledge science and knowledge representation encompass ontological, epistemological, and axiological considerations. This article presents an overview of this new field of study, relying on historical data to provide insight and understanding, while addressing the two schools of knowledge science.
Early Beginnings
The challenge of digitally representing knowledge (meaning) within the machine environment, as opposed to simply using language to describe who, what, when, where, and how much information, started in 1969/70 under the sponsorship of the National Science Foundation (NSF). Three initial projects were funded for the study of "natural language processing": the University of California, Irvine Physics Computer Development Project, headed by Alfred Bork with research assistant Richard L. Ballard; the MITRE TICCIT project, conducted at the University of Texas and later moved to the University of Utah; and the PLATO project at the University of Illinois at Urbana-Champaign. Over 140 natural language dialog programs were created between 1970 and 1978. UCI's Physics Computer Development Project produced approximately 55 educational programs and spearheaded development throughout the UC system. Initial projects were conducted on Teletype Model 33 paper-tape punch machines that operated at 110 baud. In 1976, the NSF cited Richard L. Ballard, then co-director of the Physics Computer Development Project at Irvine, for the "first application of artificial intelligence to conceptual learning" (expert systems).
In 1984, Doug Lenat started the Cyc Project with the objective of codifying, in machine-usable form, the millions of pieces of knowledge that comprise human common sense. CycL presented a proprietary knowledge representation schema that utilized first-order relationships. The fruits of the Cyc project[1] were spun out of MCC into Cycorp in 1994. In 1986, Lenat estimated that the effort to complete Cyc would require 250,000 rules and 350 man-years of effort[2]. As of 2006, Lenat continues his work on Cyc at Cycorp.
In 1989, this effort permitted the high-energy particle physics laboratory in Switzerland, the Conseil Européen pour la Recherche Nucléaire (CERN), to develop an instance of the Cyc project for its needs. This set into motion the work of Tim Berners-Lee, then at CERN, who conceived and developed anchor tags, also called hyperlinks, to link text and documents.
Thereafter, in August 1991, Tim Berners-Lee announced the World Wide Web project to the world on the Usenet newsgroup alt.hypertext. This development ultimately enabled the decentralized publishing and exchange of information through the Hypertext Markup Language (HTML) and its accompanying transmission protocol, the Hypertext Transfer Protocol (HTTP).
In the same year, Richard L. Ballard completed a new technology called Mark 2, a system that utilized a single relational database table with four fields designed to store unique identifier codes; the codes were pre-defined, stored, and later looked up to populate the value-model, object, attribute, and value table fields. Ballard called his value-based knowledge system "theory-based semantics," signaling a divide between language-based and theory-based knowledge representation systems. Mark 2 was deployed through "statements of work" on projects for NASA, the DoD, the U.S. Navy, and the U.S. Air Force, among others.
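The article gives only the four field names, not Ballard's actual schema. The following minimal sketch, in Python with the standard-library sqlite3 module, shows one way a single four-field fact table of pre-defined identifier codes, resolved through a lookup table, might be realized; the table names, column names, and sample codes are hypothetical.

```python
import sqlite3

# Illustrative schema: one four-field fact table whose cells hold pre-defined
# identifier codes, resolved against a lookup table of terms (hypothetical).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE terms (
        code  INTEGER PRIMARY KEY,
        label TEXT NOT NULL
    );
    CREATE TABLE facts (
        value_model INTEGER,   -- valuation context the fact belongs to
        object      INTEGER,   -- the thing being described
        attribute   INTEGER,   -- a property of that thing
        value       INTEGER    -- the property's value
    );
""")

# Pre-defined identifier codes (hypothetical sample data).
conn.executemany("INSERT INTO terms VALUES (?, ?)", [
    (1, "mission-planning"), (2, "aircraft-42"),
    (3, "max-range-km"), (4, "1200"),
])
conn.execute("INSERT INTO facts VALUES (1, 2, 3, 4)")

# Look the codes back up to reconstruct a readable statement.
row = conn.execute("""
    SELECT vm.label, o.label, a.label, v.label
    FROM facts f
    JOIN terms vm ON vm.code = f.value_model
    JOIN terms o  ON o.code  = f.object
    JOIN terms a  ON a.code  = f.attribute
    JOIN terms v  ON v.code  = f.value
""").fetchone()
print(row)   # ('mission-planning', 'aircraft-42', 'max-range-km', '1200')
```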
An Emerging Knowledge Science
In October 1994, the World Wide Web Consortium (W3C) was founded at the Massachusetts Institute of Technology Laboratory for Computer Science (MIT/LCS) in collaboration with CERN, where the World Wide Web originated. The effort received support from the Defense Advanced Research Projects Agency (DARPA) and the European Commission, and the organization would serve as a focal point for standardizing mechanisms of information exchange over the Internet. Thereafter, in 1995, Richard L. Ballard released Mark 2 v2.0, which utilized the earlier theory-based semantic representation technology but added an ad hoc, second-order "if this, then that" knowledge representation. Mark 2 v2.0 was deployed through "statements of work" with the DoD, U.S. Navy, NASA, DEA, FAA, the Office of the White House, and private enterprise.
Then, in 1995, the National Center for Supercomputing Applications (NCSA) and the Online Computer Library Center (OCLC) held a joint workshop in Dublin, Ohio, to discuss metadata semantics. At this event, called simply the "OCLC/NCSA Metadata Workshop," more than 50 people discussed how a core set of semantics for Web-based resources would be extremely useful for categorizing the Web for easier search and retrieval. The participants dubbed the result "Dublin Core metadata" after the location of the workshop. Shortly thereafter, in 1995, HTML, inspired by SGML and incorporating hyperlinks, was first standardized. This language allowed the transmission and viewing of web pages and led to an explosion in the popularity of the Internet.
Advancing previous efforts, the Extensible Markup Language (XML) was introduced in 1998, created by a W3C Working Group under the leadership of Jon Bosak. Its goal was to combine the approachable simplicity of HTML with the extensibility of SGML while avoiding the shortcomings of each. Its popularity sparked widespread interest in tagging text as a step toward machine-based knowledge representation.
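As an illustration of the kind of self-describing tagging XML enables, here is a small sketch using Python's standard xml.etree.ElementTree module; the element names are invented for the example and are not drawn from any particular schema.

```python
import xml.etree.ElementTree as ET

# A small, self-describing document using invented tags -- XML lets authors
# define their own vocabulary rather than relying on HTML's fixed tag set.
doc = """
<publication>
    <title>The Physics Computer Development Project</title>
    <author>Alfred Bork</author>
    <author>Richard Ballard</author>
    <year>1973</year>
</publication>
"""

root = ET.fromstring(doc)
# The tags carry enough structure for a program to extract fields directly.
print(root.findtext("title"))
print([a.text for a in root.findall("author")])
```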
At the same time, conducting parallel research and advanced engineering on the Mark 2 product, Richard L. Ballard completed the last of 50 projects, called the Avionics Prototype Tool. That project demonstrated that theory-based knowledge representation could support virtual integration of disparate database systems by modeling theory, the conditional reasoning power of the human brain, to understand and make sense of the data content stored within database systems. The project also indicated that pre-defined schemas, language-based conventions, and a reliance on self-consistent logic were the cause of conventional systems' complexity and interoperability problems. In response to these findings, Ballard redirected his work to the development of a successor technology, Mark 3, based on theory-based semantics. Mark 3 would demonstrate that knowledge could be explicitly represented within the machine environment and that software-based machines could reason with this content to answer value-based questions the way people do.
Language-based Knowledge Representation
Language-based knowledge representation was advanced in 1999 when the Resource Description Framework (RDF), an XML-based extension of the earlier Platform for Internet Content Selection (PICS) technology of 1996, was deployed to enhance content description. RDF drew upon submissions by Microsoft and Netscape and upon the Dublin Core/Warwick Framework. RDF is used primarily to organize and express document properties. The specific needs of different resource types, such as authorization structures or versioning, necessitated a schema language, similar in role to an XML DTD, called the RDF Schema (RDF-S) language.
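A minimal sketch of describing document properties as RDF triples follows, using the third-party rdflib library for Python and the Dublin Core element vocabulary discussed above; the document URI and property values are hypothetical.

```python
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DC

# Describe a document's properties as RDF triples using Dublin Core terms.
g = Graph()
doc = URIRef("http://example.org/reports/example-report")  # hypothetical URI

g.add((doc, DC.title, Literal("Example Report")))
g.add((doc, DC.creator, Literal("Example Author")))
g.add((doc, DC.date, Literal("1999")))

# Serialize the graph as RDF/XML, the syntax defined by the 1999 specification.
print(g.serialize(format="xml"))
```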
On January 12, 1999, Executive Order 13111 was signed, tasking the United States Department of Defense with taking the lead in working with other federal agencies and the private sector to develop common specifications and standards for technology-based learning. The Sharable Content Object Reference Model (SCORM) was developed as a way to integrate and connect the work of these organizations in support of the DoD's Advanced Distributed Learning (ADL) Initiative.
In 2001, a new markup language called DAML+OIL was specified, combining the DARPA Agent Markup Language (DAML) with the Ontology Inference Layer, also known as the Ontology Interchange Language (OIL). The specification was based on RDF and XML, both rooted in SGML, and provided a means for description logic to be integrated into the markup language for the purpose of extracting meaning from the ontological framework.
Within two years, the Web Ontology Language (OWL) emerged for defining and instantiating Web ontologies. OWL was designed for use by applications that need to process the content of information rather than just present information to humans. It facilitates greater machine interpretability of Web content than that supported by XML, RDF, and RDF Schema (RDF-S) by providing additional vocabulary along with a formal semantics. OWL has since been extended by related specifications such as OWL-S.
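A minimal sketch of a tiny OWL ontology expressed as RDF triples with rdflib follows; the ontology namespace, class names, and individual are invented for illustration, and the reasoner mentioned in the comments is assumed rather than shown.

```python
from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF, RDFS

# Hypothetical ontology namespace for the example.
EX = Namespace("http://example.org/ontology#")
g = Graph()

# Two classes, a subclass axiom, and one individual.
g.add((EX.Aircraft, RDF.type, OWL.Class))
g.add((EX.FighterAircraft, RDF.type, OWL.Class))
g.add((EX.FighterAircraft, RDFS.subClassOf, EX.Aircraft))
g.add((EX.f16, RDF.type, EX.FighterAircraft))

# A reasoner (not shown) could infer that EX.f16 is also an EX.Aircraft --
# the kind of machine interpretability OWL adds beyond plain RDF.
print(g.serialize(format="turtle"))
```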
Currently, language-based researchers are working on a numerical representation that uses integers to define a language-independent, structured knowledge exchange.
Theory-based Knowledge Representation
Building on previous research and development, Richard L. Ballard continued to develop and articulate a physical theory of knowledge and computation, delivered as a keynote presentation, "The Future of Publishing Technology - Part 1", at the Seybold conference of September 8-12, 2003. In this presentation, Ballard defined the bit capacity of human thought, described how theory and information produce situation awareness, and argued that software-based machines can faithfully represent every form of knowledge and reason with that knowledge the way people do.
References
- ^ Lenat, Douglas. "From 2001 to 2001: Common Sense and the Mind of HAL". In Hal's Legacy: 2001's Computer as Dream and Reality. Cycorp, Inc. Retrieved 2006-09-26.
- ^ The Editors of Time-Life Books (1986). Understanding Computers: Artificial Intelligence. Amsterdam: Time-Life Books, p. 84. ISBN 0-7054-0915-5.
Alfred M. Bork, Physics Computer Development Project (PCDP), Progress Report. February 19, 1971, non-journal, University of California, Irvine. (Contains a list of computer dialogues developed, with a short description of each. The types of dialogs developed are: (1) development of an interactive proof, (2) assistance in problem solving, (3) diagnosing and filling in limitations in the student's mathematical background, (4) simulations, and (5) quizzes containing minimum performance standards.)
Alfred Bork and Richard Ballard, The Physics Computer Development Project, Journal of College Science Teaching, Vol. II, No. 4, April 1973.
Alfred Bork, Current Status of the Physics Computer Development Project. January 3, 1975, non-journal. ("With support from the National Science Foundation and the University of California, the Physics Computer Development Projects have produced in the last six years computer based material in a wide variety of modes. Among the major products are science learning dialogs, graphic additions to APL (A Programming Language), the underlying software, and the authoring system.")