User:Davidkentsnyder

The following is the current version of a readme file from a project I've been working on for about four years.

"This is the root tree of the offogs project. This project is an attempt to create a series of programs that is capable of responding to linguistic events in a context appropriate manner using a database of previous linguistic events and contexts to determine the appropriate response. In the following, the word utterance means a sentence, question, headline, title, heading interjection or some other string of linguistic symbols. There are several parts to the system:

1) An English language parser
The English language parser takes in an utterance and parses it based on the verbs used in or implied by the utterance, including verbs used as adjectives, verbs in relative clauses, verbs that appear as gerunds in places where a noun would usually be expected, and others. It is primarily the verbs in English that determine the context of the other words. More to the point for the parser, parsing stops when each verb in an utterance has been parsed down to its verb phrase components and those components have been reduced by somewhat arbitrary transformations to basis set verbs (you might think of these as definitions, but the transformations provide somewhat more information). When the parser needs more information to complete a parse tree or transformation, it generates a question and returns it with the partial parse tree.
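The idea of returning a question together with a partial parse tree can be sketched roughly as follows. This is only an illustration, not the project's parser: the names ParseNode, ParseResult, VERB_SLOTS and parse are hypothetical, the two-verb lexicon is made up, and slots are filled purely by word order.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ParseNode:
    """One node of a (possibly partial) parse tree. Hypothetical structure."""
    label: str
    children: List["ParseNode"] = field(default_factory=list)


@dataclass
class ParseResult:
    tree: ParseNode
    question: Optional[str] = None  # set when the parser needs more information

    @property
    def complete(self) -> bool:
        return self.question is None


# Made-up toy lexicon: each verb lists the slots it expects to have filled.
VERB_SLOTS = {"give": ["giver", "gift", "recipient"], "sleep": ["sleeper"]}


def parse(utterance: str) -> ParseResult:
    words = utterance.lower().rstrip(".?!").split()
    verb = next((w for w in words if w in VERB_SLOTS), None)
    if verb is None:
        # No known verb: return an empty tree and ask about the verb.
        return ParseResult(ParseNode("utterance"),
                           question=f"What is the verb in '{utterance}'?")
    others = [w for w in words if w != verb]
    slots = VERB_SLOTS[verb]
    # Naive assumption: the remaining words fill the slots in order.
    verb_node = ParseNode(verb, [ParseNode(f"{s}={w}") for s, w in zip(slots, others)])
    tree = ParseNode("utterance", [verb_node])
    missing = slots[len(others):]
    if missing:
        return ParseResult(tree, question=f"Who or what is the {missing[0]} of '{verb}'?")
    return ParseResult(tree)
```

Calling parse("sleep") in this sketch returns a tree containing only the verb plus a question about the missing sleeper, while an utterance that fills every slot comes back with no question attached.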

2) A conversational command line interface
The conversational command line interface takes in what the user types, sends it to the English language parser, and tries to make sense of the results. If it cannot, it asks the user follow-up questions until it understands, and then responds with either an action (the running of some program) or an appropriate linguistic response. To support this, the English language parser needs a conversational command line mode whose root transformations may result in an action for certain kinds of utterances that the user might enter.
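A very rough sketch of the "action or linguistic response" decision is given below. It assumes, purely for illustration, that the first word of the input is the verb; the ACTIONS table and the respond function are hypothetical stand-ins and do not call the real parser or run any program.

```python
from typing import Callable, Dict

# Hypothetical table mapping a root verb to an action. The actions here only
# describe what they would do instead of running a program.
ACTIONS: Dict[str, Callable[[str], str]] = {
    "list": lambda arg: f"would list {arg or 'the current directory'}",
    "open": lambda arg: f"would open {arg}",
}


def respond(utterance: str) -> str:
    """Return either the result of an action or a follow-up question."""
    words = utterance.strip().rstrip(".?!").split()
    if not words:
        return "What would you like me to do?"
    # Simplifying assumption: the first word is the verb, the rest its object.
    verb, argument = words[0].lower(), " ".join(words[1:])
    action = ACTIONS.get(verb)
    if action is None:
        return f"I do not know how to '{verb}'. Can you rephrase?"
    if verb == "open" and not argument:
        return "What should I open?"
    return action(argument)
```

In the real system the follow-up questions would come from the parser rather than from hard-coded strings as they do here.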

3) A web crawling data grabber
The web crawling data grabber surfs the web, either to places the user specifies or by way of a search engine such as Google, Yahoo or Infoseek, in order to gather more complete information for its database. It will probably end up transforming HTML or XML files to text, and it may at some point also keep copies of the pictures it encounters (especially pictures with captions). The web crawling data grabber may develop questions to ask a user or to guide further data gathering, but it puts these questions in an internal queue, which is dumped to a database on a schedule or upon termination of the program.
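The internal question queue might look something like the sketch below. It assumes SQLite as the database and a single made-up table named questions; the class name QuestionQueue and its methods are hypothetical, and the readme does not say which database the project uses.

```python
import sqlite3
from collections import deque


class QuestionQueue:
    """Holds questions in memory until they are dumped to a database."""

    def __init__(self, db_path=":memory:"):
        self._pending = deque()
        self._conn = sqlite3.connect(db_path)
        # Assumed schema: one row per question, with the page that raised it.
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS questions "
            "(id INTEGER PRIMARY KEY, source_url TEXT, text TEXT)"
        )

    def add(self, source_url, text):
        self._pending.append((source_url, text))

    def dump(self):
        """Write all pending questions to the database and clear the queue."""
        with self._conn:  # commits on success, rolls back on error
            self._conn.executemany(
                "INSERT INTO questions (source_url, text) VALUES (?, ?)",
                list(self._pending),
            )
        count = len(self._pending)
        self._pending.clear()
        return count

    def stored(self):
        rows = self._conn.execute(
            "SELECT source_url, text FROM questions ORDER BY id"
        )
        return rows.fetchall()
```

The dump method could be called from a timer for the scheduled case and registered with atexit.register for the termination case.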

4) A block text processor
The block text processor is a wrapper for the parser and the context manager. It takes a text from a file and cuts through it, recognizing things like section headings, titles and headlines and cutting the rest up into utterances. It feeds each section heading, title, headline or utterance to the parser (I will hereafter use the word utterance to include things like questions and interjections) and sends the results to the context manager. It keeps track of the questions that might be asked.
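The cutting step could start from something as simple as the sketch below. The heuristics are assumptions made for illustration: a block separated by blank lines that does not end in terminal punctuation is treated as a heading, and everything else is split after '.', '?' or '!'. The function name split_block is hypothetical.

```python
import re


def split_block(text):
    """Return a list of (kind, utterance) pairs; kind is 'heading' or 'utterance'."""
    items = []
    for block in re.split(r"\n\s*\n", text.strip()):
        block = " ".join(block.split())  # collapse line breaks and extra spaces
        if not block:
            continue
        # Assumed heuristic: no terminal punctuation means heading, title or headline.
        if not block.endswith((".", "?", "!")):
            items.append(("heading", block))
            continue
        # Split after sentence-ending punctuation followed by whitespace.
        for sentence in re.split(r"(?<=[.?!])\s+", block):
            items.append(("utterance", sentence))
    return items
```

A splitter this naive breaks on abbreviations and initials such as "W.V. Quine", so a real block text processor would need a more careful boundary test.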

5) A context manager
A context is a set of propositions in a conversation which are stored under the name of an utterer. Each utterer's name is unique, but within a given context there may be some other string of linguistic symbols that deterministically references that unique name. A proposition is a basis verb with some, all or none of its slots filled, along with optional conditions. A condition of a proposition is, in general, an adverb of time, place, frequency or something else. A conversation is a set of utterances which may or may not be represented as propositions. The utterances in a conversation have at least one utterer.
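These definitions map fairly directly onto data structures. The sketch below is one possible reading, not the project's actual design: the class and field names (Proposition, Context, aliases and so on) are hypothetical, and an unfilled slot is represented by None.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Proposition:
    basis_verb: str
    # Slot name -> filler; None marks a slot that has not been filled yet.
    slots: Dict[str, Optional[str]] = field(default_factory=dict)
    # Conditions are adverbs of time, place, frequency and so on.
    conditions: List[str] = field(default_factory=list)


@dataclass
class Context:
    # In-context reference (for example "she") -> utterer's unique name.
    aliases: Dict[str, str] = field(default_factory=dict)
    # Utterer's unique name -> propositions stored under that name.
    propositions: Dict[str, List[Proposition]] = field(default_factory=dict)

    def resolve(self, reference: str) -> str:
        """Map an in-context reference to the unique name, if one is known."""
        return self.aliases.get(reference, reference)

    def add(self, utterer_reference: str, proposition: Proposition) -> None:
        name = self.resolve(utterer_reference)
        self.propositions.setdefault(name, []).append(proposition)
```

Because each alias maps to exactly one unique name, the lookup stays deterministic within a context, which is the property the definition above asks for.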

6) An algorithm analyzer
The algorithm analyzer reads C, C++, Python or Perl code (it may at some point read any script or code) and identifies functions. Each function is analyzed for its data structures, data names and actions. Input and output are identified, as well as the transformations involved in the changes in the data from input to output. This is necessary so that other parts of the system can implement user requests as quickly and efficiently as possible, and so that basis verbs and other verbs can receive a visceral definition in terms of the computer the system is running on. Where possible, algorithms are expressed in terms of basis verbs and mathematical constructs. The algorithm analyzer keeps a list of things that the system 'knows how' to do and algorithms which the system can use to do these things.
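For Python input only, the first step (identifying functions with their inputs and outputs) can be approximated with the standard ast module, as in the sketch below. The function name summarize_functions and the shape of the summary dictionary are assumptions for illustration; the sketch does not attempt the transformation analysis described above and does not handle C, C++ or Perl.

```python
import ast


def summarize_functions(source):
    """Map each function name in `source` to a small summary dictionary."""
    summaries = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.FunctionDef):
            continue
        inner = list(ast.walk(node))
        # Names of functions called directly by name inside this function.
        calls = sorted({
            n.func.id for n in inner
            if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
        })
        summaries[node.name] = {
            "inputs": [a.arg for a in node.args.args],
            "returns_value": any(
                isinstance(n, ast.Return) and n.value is not None for n in inner
            ),
            "calls": calls,
        }
    return summaries
```

Nested functions are counted both on their own and as part of the enclosing function, which is a known limitation of walking the tree this way.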

7) A logical analyzer
The logical analyzer analyzes the logic of a proposition set using predicate logic and keywords such as if, then, will, might and so on ('and so on' means that I have yet to compile a comprehensive list; I will base my list on the words used by W.V. Quine in at least some of his expositions on logic, especially the sections where he talks about predicate logic).
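As a much-reduced illustration of keyword-driven analysis, the sketch below recognizes only the pattern "if ... then ..." and applies modus ponens repeatedly until nothing new follows. It works on whole statements as opaque strings, so it is propositional rather than predicate logic, and it ignores words such as will and might. The names RULE and infer are hypothetical.

```python
import re

# Matches "if P then Q" or "if P, then Q" (case-insensitive).
RULE = re.compile(r"^if (.+?),? then (.+)$", re.IGNORECASE)


def infer(statements):
    """Return the set of facts that follow from the given statements."""
    facts, rules = set(), []
    for statement in statements:
        text = statement.strip().rstrip(".")
        match = RULE.match(text)
        if match:
            rules.append((match.group(1).lower(), match.group(2).lower()))
        else:
            facts.add(text.lower())
    # Forward chaining: keep applying rules until no new fact is added.
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            if premise in facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts
```

Premises must match a known fact word for word in this sketch; in the full system the comparison would presumably be made on propositions rather than on raw strings.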

TO DO
Add a list of databases and data structures that will be involved. Refine the explanation of the algorithm analyzer; it seems to be doing two things: analyzing algorithms and parsing propositional imperatives."