Lojban is a constructed, human-speakable and (theoretically) machine-speakable language, based on predicate logic. It is one of the more recent constructed languages, designed in 1987 with most of its grammar drawn from Loglan and some features from Láadan. Most of its root words are derived from six widely spoken natural languages: Arabic, Chinese, English, Hindi, Russian, and Spanish. The characteristic regularity, unambiguity, and versatility of Lojban grammar owe much to the fact that its creation followed the development of scientific linguistics and computer programming, the fruits of which Esperanto and Loglan could not have drawn on as their design principles. Its linguistic advantage can be summarized as follows: "Lojban moves beyond the restrictions of European grammar. It overtly incorporates linguistic universals, building in what is needed to support the expressivity of the whole variety of natural languages, including non-European ones."[1]
Lojban has 6 vowels and 21 consonants. The phonemes are meant to correspond one-to-one with graphemes, which means Lojban is to have 27 letters (lerfu), one for each sound in the language. Lojbanic graphemes can vary in mode; this article employs the Latin alphabet version, which is currently the most commonly used (see Orthography for more detail). The phonemes, on the other hand, are defined solely by the International Phonetic Alphabet.
The tables below show typical realizations of sounds and the corresponding Latin letters in Lojban. In all cases except the rhotic consonant the first phoneme represents the preferred pronunciation, while the rest are the permitted variants intended to cover differences in pronunciation by speakers of different linguistic backgrounds.
Lojban has 16 diphthongs (a kind of sound which consists of a vowel plus a glide, always constituting a single syllable). The combinations <ai>, <au>, <ei> and <oi>, for instance, are all realized as the corresponding falling diphthongs. To force these sounds to be pronounced separately as monophthongs, a comma can be put between them. Triphthongs do not exist in Lojban.
The vowels can be either rounded or unrounded and the consonants can be either aspirated or unaspirated, but generally not palatalized. The voiceless stops /p/, /t/ and /k/ are usually aspirated, but need not be. The affricates /d͡ʒ/ (the voiced postalveolar affricate), /d͡z/ (the voiced alveolar affricate), /t͡ʃ/ (the voiceless postalveolar affricate) and /t͡s/ (the voiceless alveolar affricate) also occur in Lojban, but each is considered a combination of the appropriate phonemes in the language (being the realization of <dj>, <dz>, <tc> and <ts>, respectively). All rhotic sounds are equally acceptable as realizations of the same phoneme. <l>, <m>, <n>, and <r> may be syllabic.[2]
For those who, given their native language background, may have trouble pronouncing (certain) consonant clusters, there is the option of inserting buffer vowels between them, as long as they differ sufficiently from the phonological vowels and are pronounced as short as possible. Possible choices include [ɪ], [ɨ], [ʊ] and [ʏ] (but not [y], which is the rounded counterpart of [i] and thus a valid realization of <i>). The resulting added syllables are completely ignored by the grammar, including for the purposes of stress determination.
Lojban may be written in different orthography systems as long as they meet the required regularity and unambiguity. Some of the reasons for such elasticity would be as follows:
Some Lojbanists extend this principle so as to claim that even an original orthography of the language is to be sought.[3]
Note: It is suggested that the Lojban term lerfu be used instead of the English word "letter", so that confusion with the kind of letter one writes to someone is avoided (James Cooke Brown's version was letteral by analogy with numeral).[4] This section follows that convention.
Lojban's Latin alphabet consists of 23 lerfu a b c d e f g i j k l m n o p r s t u v x y z plus 3 semi-lerfu ' , . . They are intentionally ordered in accordance with that of ASCII characters.
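The ordering claim can be checked mechanically. The following is a minimal sketch, not part of any Lojban tooling; the names `LERFU`, `SEMI_LERFU` and `is_ascii_ordered` are made up for this illustration. It confirms that the lerfu and the semi-lerfu, each taken as listed above, appear in ascending ASCII order, and shows which Latin letters the alphabet leaves out.

```python
# The 23 lerfu and 3 semi-lerfu of the Latin mode, as listed in the text.
LERFU = "a b c d e f g i j k l m n o p r s t u v x y z".split()
SEMI_LERFU = ["'", ",", "."]


def is_ascii_ordered(symbols):
    """Return True if the symbols appear in ascending ASCII code order."""
    codes = [ord(s) for s in symbols]
    return codes == sorted(codes)


def unused_latin_letters():
    """Return the basic Latin letters that are not lerfu."""
    return sorted(set("abcdefghijklmnopqrstuvwxyz") - set(LERFU))


if __name__ == "__main__":
    print(len(LERFU), is_ascii_ordered(LERFU))
    print(len(SEMI_LERFU), is_ascii_ordered(SEMI_LERFU))
    print(unused_latin_letters())
```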
Capitalization may be applied to mark a non-standard stressed syllable, as in cmene, but capital letters are not considered separate lerfu. Whether a single vowel or the entire syllable is capitalized is a matter of preference; for example, the name "Josephine" can be rendered as either DJOzefin. or djOzefin. (without the capitalization, the ordinary rules of Lojban stress will cause the 'ze' syllable to be stressed instead).
Punctuation marks are not mandatory; such notions as question or exclamation are expressed with words rather than unpronounceable symbols.[5]
This mode was conceived when the introductory Lojban brochure was translated into Russian. 23 lerfu а б в г д е ж з и к л м н о п р с т у ф х ш ъ plus 3 semi-lerfu ', . are used. The hard sign ъ is assigned to the open-mid vowel. Diphthongs are written as vowel pairs, as in the Roman mode.[6]
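Because both modes have 23 lerfu, converting between them is a one-to-one character substitution. The sketch below is illustrative only: the text gives the Cyrillic inventory and the assignment of ъ to the sixth vowel (written y in the Latin mode), while the pairing of the remaining letters is an assumption based on their usual sound values (for example c with ш, j with ж, x with х). The function name `to_cyrillic` is hypothetical.

```python
# Latin-mode lerfu in alphabetical order and their assumed Cyrillic partners.
# Only the pairing y -> ъ is stated in the text; the rest is inferred from
# the usual sound values of the letters and should be treated as an assumption.
LATIN = "abcdefgijklmnoprstuvxyz"
CYRILLIC = "абшдефгижклмнопрстувхъз"

_TO_CYRILLIC = str.maketrans(LATIN, CYRILLIC)


def to_cyrillic(text):
    """Transliterate lower-case Latin-mode Lojban into the Cyrillic mode.

    The semi-lerfu (apostrophe, comma, period) and spaces pass through
    unchanged, as do any characters outside the 23 lerfu.
    """
    return text.translate(_TO_CYRILLIC)


if __name__ == "__main__":
    print(to_cyrillic("mi prami do"))
    print(to_cyrillic("me'o"))
```

Since diphthongs are written as vowel pairs in both modes, no special handling is needed for them.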
Kena[7] argues for the Tengwar writing of Lojban, insisting that:
Exemplary mappings between the Tengwar system and the Lojban sounds are provided as follows: [1], [2].
Advocates of this include Eric S. Raymond.
A Japanese hiragana version of Lojban orthography has been proposed, in which case more than 80 lerfu may be used. This mode is not without technical issues: hiragana (and katakana too) are syllabaries whose characters, except for the "n" sound, always indicate an open syllable, so the writer needs to take special care when representing Lojbanic consonant clusters. Experimental transcription rules are given on Fa-Kuan's website. Examples of Lojban haiku compositions in this orthography can be found at the following links: [3] [4].
Lojban has 3 word-classes: brivla (predicate words), cmavo (structure words), and cmene (name words). Each of them has uniquely identifying properties, so that one can unambiguously recognize which word is of which part of speech in a string of the language. They may be further divided in sub-classes (discussed respectively below). There also exists a special form called rafsi assigned to some of the brivla and cmavo.[8][9]
brivla carry the content (semantic information) of an expression, which means their function may be roughly analogous to common noun, verb, adjective, or adverb in natural languages[10] (although some modal cmavo too may have adverbial purposes[11]). brivla may be identified by the following properties[12]:
A word such as lobypei is still considered a brivla, because the special gluing vowel y between b and p is ignored and the word therefore counts as containing a consonant cluster (b-p).
Unlike its natlang counterparts mentioned earlier, brivla do not inflect for tense, person, or number.
Brivla's sub-classes are as follows, with some examples.
The simplest brivla, which constitute the lexical base of the language, are called gismu. They are invariably five letters long, which distinguishes them from the other types of brivla, and take the form of either CVCCV or CCVCV (C stands for a consonant and V for a vowel). Because gismu always have two syllables, the general rule of penultimate stress always places the stress on the first syllable.
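The two gismu shapes are easy to test for. The following is a minimal sketch with hypothetical helper names (`cv_shape`, `has_gismu_shape`); it checks only the consonant/vowel pattern, so it is a necessary condition rather than a full validity test (other brivla, such as the word lujvo itself, can have the same shape). It assumes that the vowel y does not occur in gismu.

```python
VOWELS = set("aeiou")              # assumption: y never occurs in a gismu
CONSONANTS = set("bcdfgjklmnprstvxz")
GISMU_SHAPES = {"CVCCV", "CCVCV"}


def cv_shape(word):
    """Map each letter to C or V; anything else becomes '?'."""
    shape = []
    for ch in word:
        if ch in VOWELS:
            shape.append("V")
        elif ch in CONSONANTS:
            shape.append("C")
        else:
            shape.append("?")
    return "".join(shape)


def has_gismu_shape(word):
    """Return True if the word has one of the two gismu letter patterns."""
    return cv_shape(word) in GISMU_SHAPES


if __name__ == "__main__":
    for w in ["klama", "zarci", "prami", "gerku", "lobypei"]:
        print(w, cv_shape(w), has_gismu_shape(w))
```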
They have been chosen or added as root words because they a) represent concepts that are very familiar and basic, b) represent concepts the usage of which is equally frequent among different languages, c) would be helpful in constructing more complex words, or d) represent fundamental grammatical concepts of Lojban like cmavo and gismu.[13] The main source languages from which they were drawn are Arabic, Chinese, English, Hindi, Russian, and Spanish. Here is further explanation of the nature of gismu by Cowan:
The gismu do not represent any sort of systematic partitioning of semantic space. Some gismu may be superfluous, or appear for historical reasons: the gismu list was being collected for almost 35 years and was only weeded out once. Instead, the intention is that the gismu blanket semantic space: they make it possible to talk about the entire range of human concerns. [...] For a given concept, words in the six languages that represent that concept were written in Lojban phonetics. Then a gismu was selected to maximize the recognizability of the Lojban word for speakers of the six languages by weighting the inclusion of the sounds drawn from each language by the number of speakers of that language.
The Complete Lojban Language: 4.4
According to Robin Turner,[14] the creation was done by computer.
Approximately 1350 gismu exist, which is a relatively small number when compared to that of English words ranging from 450,000 up to 1,000,000.[15] Theoretically, by learning only these root words, as well as their fragmental forms and some major structure words (cmavo), one will be able to communicate effectively in Lojban. A list of picturable gismu with images is available on the Lojban Wikipedia.
The compound form of brivla is called lujvo.
A borrowed-word type of brivla. They usually refer to things that are culture-specific or to kinds of plants or animals, concepts which cannot be easily expressed as mere modifying-modified combinations of Lojban's internal root words.
fu'ivla can be subdivided into four types according to the extent to which they are modified, namely Stage 1, 2, 3, and 4 fu'ivla.
The longest form, quoting a foreign word/phrase while preserving its original spelling with particular structure words.
This stage involves lojbanizing the sound and spelling of the word.
At this stage a borrowed word is fully turned into a single brivla, having its own place structure. Since no brivla may have more than one meaning, such a word is often attached to a rafsi (with a hyphen like "-r-", "-n-", or "-l-") that categorizes or limits the semantic scope of the word (such a rafsi is called a "rafsi classifier"). Again they always start with a consonant and end with a vowel.
These are the borrowings which are so common or so important that they have become as short as possible, having no rafsi classifier. Unlike other brivla, they may begin with a vowel (preceded by a pause mark separating it from the previous word). Also, the word must not be of a form from which one can remove all the initial vowels (and apostrophes) and still have a valid word.
It is possible to absorb a fu'ivla into a lujvo, with principles varying among Lojbanists. Notable proponents are Pierre Abbat and Jorge Llambías. Here are some comparisons of their methods drawn from the Lojban mailing list (as of July 2007):
A group of two or more brivla (possibly with associated cmavo) is called tanru. They are always divisible into parts without any morphological breakage; they are a mere sequence of multiple gismu or lujvo or fu'ivla rather than a single distinctive morphological unit. See also: Syntax and semantics
Lojban structure words, cmavo, are recognized by the following properties:
And they display one of the following letter patterns: V, VV, V'V, CV, CVV, CV'V. The form generally does not indicate anything about its grammatical function.
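The six letter patterns can be matched directly. This is a minimal sketch under the assumption that all six vowels (including y) count as V and that the apostrophe is kept as itself in the pattern; the function name `cmavo_pattern` is invented for the example.

```python
VOWELS = set("aeiouy")
CONSONANTS = set("bcdfgjklmnprstvxz")
CMAVO_PATTERNS = {"V", "VV", "V'V", "CV", "CVV", "CV'V"}


def cmavo_pattern(word):
    """Return the letter pattern of a single cmavo, or None if none fits."""
    shape = []
    for ch in word:
        if ch in VOWELS:
            shape.append("V")
        elif ch in CONSONANTS:
            shape.append("C")
        elif ch == "'":
            shape.append("'")
        else:
            return None
    pattern = "".join(shape)
    return pattern if pattern in CMAVO_PATTERNS else None


if __name__ == "__main__":
    for w in ["a", "le", "cu", "vau", "me'o", "ki'u", "klama"]:
        print(w, cmavo_pattern(w))
```

As the text notes, the pattern says nothing about grammatical function: le and cu are both CV but play entirely different roles.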
cmavo can be sequenced without spaces and without any change to their meaning:
As far as the stress rules of Lojban are concerned, such compound cmavo are still separate words, so penultimate stress (e.g. paREci) is not obligatory.
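Because each cmavo has one of a few fixed shapes, a compound written without spaces can be cut back into its parts. The sketch below is a simplification with a hypothetical function name, `split_compound`: it assumes that every component after the first begins with a consonant, so a new word starts at each consonant, and it rejects any string containing a consonant cluster.

```python
import re

# One cmavo: an optional initial consonant, then vowels, optionally
# continued across apostrophes (covers V, VV, V'V, CV, CVV, CV'V).
_CMAVO = re.compile(r"[bcdfgjklmnprstvxz]?[aeiouy]+(?:'[aeiouy]+)*")


def split_compound(text):
    """Split a compound cmavo such as 'pareci' into its component cmavo.

    Raises ValueError if the text is not a plain sequence of cmavo shapes.
    """
    parts = _CMAVO.findall(text)
    if not parts or "".join(parts) != text:
        raise ValueError(f"not a sequence of cmavo: {text!r}")
    return parts


if __name__ == "__main__":
    for compound in ["pareci", "puku", "baiku", "naku"]:
        print(compound, split_compound(compound))
```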
Some cmavo have rafsi, which may help in converting tanru into lujvo:
cmene stand for things (including people) in descriptions or in direct address (cf. proper nouns). Mostly they can be in any form as long as they end in a consonant. The practice by which names in natural languages are modified to be used in Lojban is known as "lojbanization".
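The identifying properties of the three word classes allow a rough classifier to be written in a few lines. This is a sketch, not a full morphology algorithm, and `word_class` is a hypothetical name. It relies on two points from the text (cmene end in a consonant, and brivla contain a consonant cluster with y ignored) plus one assumption labeled here: that the cluster must fall within the first five letters, with y and the apostrophe not counted, as described in The Complete Lojban Language. Anything else is treated as a cmavo or compound cmavo.

```python
CONSONANTS = set("bcdfgjklmnprstvxz")


def word_class(word):
    """Guess the word class of a single Lojban word from its shape."""
    w = word.strip(".").lower()      # drop pause marks and stress capitals
    if not w:
        raise ValueError("empty word")
    if w[-1] in CONSONANTS:
        return "cmene"               # only names end in a consonant
    # Assumption: look for a consonant pair in the first five letters,
    # not counting y, the apostrophe, or the comma.
    letters = [ch for ch in w if ch not in "y',"]
    head = letters[:5]
    for first, second in zip(head, head[1:]):
        if first in CONSONANTS and second in CONSONANTS:
            return "brivla"
    return "cmavo"


if __name__ == "__main__":
    for w in ["klama", "lobypei", "me'o", "pareci", "djOzefin."]:
        print(w, word_class(w))
```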
A special fragmentary form of gismu and cmavo, from which a new word may be created, is called rafsi. brivla such as lujvo or fu'ivla are usually derived from them (this, in turn, means that lujvo and fu'ivla have no rafsi form of their own). rafsi cannot by themselves function as an individual word; they need to be in a combined form to be used.
The unambiguity of Lojban morphology, according to John Woldemar Cowan, gives rise to "significant clues to the meaning and the origin of the word, even if you have never heard the word before". He further says: "The same principle allows you, when speaking or writing, to invent new brivla for new concepts 'on the fly'; yet it offers people that you are trying to communicate with a good chance to figure out your meaning. In this way, Lojban has a flexible vocabulary which can be expanded indefinitely."
According to What Is Lojban?,[16] the language's grammatical structures are "defined by a set of rules that have been tested to be unambiguous using computers", which is called the "machine grammar". Hence the characteristics of the standard syntactic (not semantic) constructs in Lojban:
Such standards, however, are to be attained with certain carefulness:
It is important to note that new Lojbanists will not be able to speak 'perfectly' when first learning Lojban. In fact, you may never speak perfectly in 'natural' Lojban conversation, even though you achieve fluency in the language. No English speaker always speaks textbook English in natural conversation; Lojban speakers will also make grammatical errors when talking quickly. Lojbanists will, however, be able to speak or write unambiguously if they are careful, which is difficult if not impossible with a natural language.
Nick Nicholas and John Cowan. What Is Lojban? II.3
The computer-tested, unambiguous rules also include grammar for 'incomplete' sentences e.g. for narrative, quotational, or mathematical phrases.
Lojbanic expressions are modular; smaller constructs of words are assembled into larger phrases so that all incorporating pieces manifest as a possible grammatical unity. This mechanism allows for simplistic yet infinitely powerful phrasings; "a more complex phrase can be placed inside a simple structure, which in turn can be used in another instance of the complex phrase structure".
Being derived from predicate logic, the basic unit of Lojban expression is predication, a claim that some objects stand in some relationship, or that some single object has some property. bridi is the Lojban term for this type of unit. Just as a predication is formed by a predicate and arguments in formal logic, bridi are formed by selbri and sumti in Lojban. A construct of selbri and sumti produces a claim that something stands in a specified relationship to something else or has a specified property.
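The correspondence between a bridi and a predication can be made concrete with a small data structure. This is a sketch only: the class `Bridi` and the helper `parse_simple_bridi` are invented for illustration, and the parser handles nothing beyond a sentence of one-word sumti with the selbri in second position.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Bridi:
    """A predication: one selbri (relation) applied to ordered sumti."""
    selbri: str
    sumti: Tuple[str, ...]

    def as_predicate(self) -> str:
        """Render in the predicate(argument, ...) notation of formal logic."""
        return f"{self.selbri}({', '.join(self.sumti)})"


def parse_simple_bridi(text: str) -> Bridi:
    """Parse 'x1 selbri x2 x3 ...' where every sumti is a single word."""
    words = text.split()
    if len(words) < 2:
        raise ValueError("need at least one sumti and a selbri")
    return Bridi(selbri=words[1], sumti=(words[0], *words[2:]))


if __name__ == "__main__":
    print(parse_simple_bridi("mi prami do").as_predicate())
```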
Multiple bridi can be either sequenced across multiple sentences or compounded in one sentence:
A compound bridi can include multiple tenses and sumti:
The implicit grammatical divisions can be made explicit by separator words such as cu and vau, which are often elidable but sometimes need to be present to avoid ambiguity:
The places of cu and vau in the previous examples can be rendered as follows:
The ordered sets of sumti assigned to every selbri are known as "place structures". They are explicitly defined in dictionaries or word lists.
Some lujvo formations usually operate on the place structure in predictable ways. The rafsi {gau}, for instance, inserts one place for the agent and pushes all others down one. Thus brivla can have indefinitely many places. This contrasts with the accusative alignment or ergative alignment that most languages have, in which there is a small number of named places (subject, direct object, indirect object) and all others are expressed by prepositions.
The word order of Lojban is basically subject–verb–object, with subject–object–verb also common. However, it can practically be anything:
Such flexibility has to do with the language's intended capability to translate as many expressions of natural languages as possible, based on a unique positional case system. The meaning of the sentence mi prami do is determined by prami realizing, with its own predefined place structure, a specific semantic relation between mi and do; when the positional relation between mi and do changes, the meaning of the sentence changes too. As shown above, Lojban has particular devices to preserve such semantic structure of words while altering their order. Compare the following:
se swaps the x1 and x2 sumti places. fo tags the x4 place, and fi the x3. Such conversion and tagging are often used to emphasize a particular sumti by bringing it forward.
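Conversion and place tagging are both simple operations on an ordered list of places. The following sketch uses hypothetical function names (`convert`, `assign_places`) and placeholder role labels. The text names se, fi and fo; the remaining members of each series (te, ve, xe swapping x1 with x3, x4, x5, and fa, fe, fu tagging x1, x2, x5) are filled in from general knowledge of the language. The rule that an untagged sumti takes the place after the previous one is a simplifying assumption of this sketch.

```python
SE_SERIES = {"se": 2, "te": 3, "ve": 4, "xe": 5}            # swap x1 with xN
FA_SERIES = {"fa": 1, "fe": 2, "fi": 3, "fo": 4, "fu": 5}   # tag place xN


def convert(places, cmavo):
    """Return the place structure after swapping x1 with the given place."""
    n = SE_SERIES[cmavo]
    if len(places) < n:
        raise ValueError(f"{cmavo} needs at least {n} places")
    swapped = list(places)
    swapped[0], swapped[n - 1] = swapped[n - 1], swapped[0]
    return swapped


def assign_places(tokens):
    """Map place numbers to sumti for a flat list of tags and sumti.

    An untagged sumti fills the place after the previous one (starting
    at x1); a tag from the FA series jumps to the place it names.
    """
    filled = {}
    position = 1
    for token in tokens:
        if token in FA_SERIES:
            position = FA_SERIES[token]
        else:
            filled[position] = token
            position += 1
    return filled


if __name__ == "__main__":
    print(convert(["lover", "loved"], "se"))
    print(assign_places(["mi", "fo", "A", "fi", "B"]))
```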
Here are some collations of natural languages and Lojban:
It is important to note that Lojban selbri is not a real equivalent of verb in natural languages. A selbri can be either a verb, a noun, an adjective, or an adverb. Its function is determined syntactically, not morphologically. An analogy to natural language word orders by using such terms as "subject", "verb", and "object" cannot accurately describe the nature of Lojban bridi.
There are five kinds of simple sumti:
Descriptions have the most complicated syntax and usage. Closely interwoven with this kind are names.
Basic descriptions in Lojban consist of two units, LE/LA descriptors and a selbri:
Although le is quite close in meaning to English "the", it has particularly unique implications. In this example, le creates an argument which might occur in the x1 place of the belonging selbri zarci, namely a "market". le also specifies that the speaker 1) has one or more specific markets in mind (whether or not the listener knows which ones they are) and 2) is merely describing the things he/she has in mind as markets, without being committed to the truth of that description. Whereas English-speakers must differentiate between "the market" and "the markets", Lojban-speakers are not required to make such a choice (this rule does not mean that Lojban has no way of specifying the number of markets in such a case):
Since the construct le + selbri merely describes something or other which the speaker chooses to represent based on his/her observation, such an expression as follows is possible:
While le is specific, lo is not:
lo refers generally to one or more markets, without being specific about which. Unlike le zarci, lo zarci must refer to something which actually is a market (that is, which can appear in the x1 place of a truthful bridi whose selbri is zarci). lo nanmu cu ninmu is false as there are no objects in the real world which are both men and women.
la dissociates the subsequent selbri from its normal meaning, usually making a name (this usage should not be confused with the other usage before regular Lojbanized names). Like le descriptions, la descriptions are implicitly restricted to those the speaker has in mind:
All descriptions implicitly terminate with ku, which can almost always be omitted with no danger of ambiguity. The main exceptions are a) when relative clauses are involved and b) when a description immediately precedes the selbri (in which case using an explicit cu before the selbri makes the ku unnecessary). Other usages of ku include making a compound negator (naku) and terminating place-structure/tense/modal tags (puku, baiku).
The selbri is the logical predicate of a bridi. This is not to be confused with the meaning of predicate in English grammar; it is a predicate in the logical sense. Whereas a predicate in English contains everything that the subject is doing, a logical predicate is simply the relation between all involved parties. In this context, the selbri is roughly the equivalent of a verb in English. For instance:
The gismu nelci is being used as the selbri in this bridi. It describes the relationship between the sumti mi (I) and le gerku (the dog). The relationship is that of a liker and that which is liked. The roles in the relationship are determined by the sumti places inherent in the word being used as the selbri. The cmavo se/te/ve/xe are used to swap the first sumti place of the selbri with the second, third, fourth, and fifth sumti place, respectively. This functionality allows for flexibility in bridi. For instance, the gismu klama has the following sumti places:
Thus:
Selbri can also be tanru, where the sumti placements are determined by the last brivla that is part of the tanru. For instance:
Multiple brivla may be linked together so as to conceptualize the intended meaning more specifically. In the tanru of lo skami pilno "computer user(s)", the modifying brivla skami narrows the sense of the modified brivla pilno to form a more specific concept (in which case the modifier may resemble English adverbs or adjectives). Without skami, lo pilno will just mean "user". Other examples:
Part of speech: structure word
There are five articles: lo, le, la, li, and me'o; of which the first three inflect to show individual, mass, or set (though as far as the formal grammar is concerned, the inflected forms are separate words, not inflected forms).
Kind | Individual | Mass | Set | Typical |
---|---|---|---|---|
Indefinite | lo | loi | lo'i | lo'e |
Definite | le | lei | le'i | le'e |
Name | la | lai | la'i | - |
Number | li | - | - | - |
Mathematical expression | me'o | - | - | - |
The individual/mass distinction is similar to the distinction between mass nouns and count nouns, but things that are normally counted can be considered as a mass. The set articles consider the mathematical set of the referents.
The number and mathematical expression articles are used when talking about numbers and numerals or letters as themselves.
As befits a logical language, there is a large assortment of conjunctions.
There exist 16 possible different truth functions, the four fundamental ones of which are assigned four vowels in Lojban. These vowels are a component sound from which actual logical-connective cmavo are built up.
A | FIRST is true and/or SECOND is true (TTTF) |
E | FIRST is true and SECOND is true (TFFF) |
O | FIRST is true if and only if SECOND is true (TFFT) |
U | FIRST is true whether or not SECOND is true (TTFF) |
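The four truth tables above can be written out and checked mechanically. The sketch below (all names are invented for the illustration) encodes each vowel as a Boolean function, prints its table in the row order TT, TF, FT, FF, and then enumerates what can be built by optionally negating either operand and optionally exchanging the two operands.

```python
from itertools import product

# Row order used for the four-letter tables: (FIRST, SECOND).
ROWS = [(True, True), (True, False), (False, True), (False, False)]

VOWEL_FUNCTIONS = {
    "A": lambda p, q: p or q,     # and/or
    "E": lambda p, q: p and q,    # and
    "O": lambda p, q: p == q,     # if and only if
    "U": lambda p, q: p,          # whether or not
}


def table(f):
    """Return the truth table of f as a string such as 'TTTF'."""
    return "".join("T" if f(p, q) else "F" for p, q in ROWS)


def reachable_tables():
    """Tables obtainable from the vowels by negation and operand exchange."""
    found = set()
    for f in VOWEL_FUNCTIONS.values():
        for neg_first, neg_second, swap in product([False, True], repeat=3):
            def g(p, q, f=f, nf=neg_first, ns=neg_second, sw=swap):
                if sw:
                    p, q = q, p
                # 'x != flag' negates x exactly when flag is True.
                return f(p != nf, q != ns)
            found.add(table(g))
    return found


def unreachable_tables():
    """Tables among all 16 that the construction cannot produce."""
    every = {"".join(t) for t in product("TF", repeat=4)}
    return every - reachable_tables()


if __name__ == "__main__":
    for vowel, f in VOWEL_FUNCTIONS.items():
        print(vowel, table(f))
    print(len(reachable_tables()), sorted(unreachable_tables()))
```

Under these assumptions the enumeration yields 14 distinct tables; the two it cannot produce are the constant functions TTTT and FFFF.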
With the four vowels, the ability to negate either sentence, and the ability to exchange the sentences, as if their order had been reversed, Lojban can create all of the 16 possible truth functions except TTTT and FFFF (which are fairly useless anyway). In order to remain unambiguous, each place in the grammar of the language where logical connection is permitted has its appropriate set of connectives. If the connective suitable for sumti were used to connect selbri, ambiguity would result. Here are examples of connectives suitable for sumti:
Variations of these truth functions can be made as follows:
Connections between components other than sumti can be expressed as follows (note that their functions are in accordance with the assigned vowels):
Connections can be questioned:
Besides the logical connectives, there are several non-logical connectives. These do not change form depending on what they are connecting:
The ku is required by the LALR parser, but not by the PEG parser, which however is not official yet.
Attitudinals are a set of cmavo which allow speakers to express their emotional state or source of knowledge, or the present stage of discourse. In natural languages, attitudes are usually expressed by the tone of voice when speaking, and (very imperfectly) by punctuation when writing; in Lojban, such information is extensively expressible in words. Their meanings are to be understood separately from the main predicate.
They may be "scaled" by suffixes:
Combination is possible, and highly productive as well as creative:
Evidentials, derived from those of American Indian languages and the constructed language Láadan, show how the speaker came to say the utterance, i.e. the source of the information or the idea:
There are two kinds of prepositions (sumtcita, which refers to adpositions in general) in Lojban: tense markers and proper prepositions. The syntactic difference is that a proper preposition can be converted with se, whereas a tense marker cannot. All proper prepositions (except the vague one do'e) are formed from a brivla and mark their object semantically as being in a place of that brivla. Thus the following are equivalent:
Prepositions (including tense markers) can also be placed in .i ... bo to make sentence conjunctions. With most prepositions this makes no sense, but ki'u, ja'e, mu'i and ni'i are often used this way to express various kinds of "because" and "therefore":
Lojban has 63 unique tense words to express various aspects of both space and time as well as event (such a system is unusual among other languages in that it deals with spatial and temporal aspects in the same term). They can be roughly subdivided as follows:
Marking tenses is always optional in Lojban:
Where the tense information is not specified, the context resolves the interpretation.
Tense words are usually put right before the selbri:
They may be placed elsewhere with the additional terminator ku:
The terminator is used so that the tense word does not run directly into the following sumti and modify it. Compare the next sentences:
Tenses can be "layered up":
A tense can be made "sticky" by setting it with ki; it then continues in effect over more than a single bridi, until it is unset:
The second ki resets the tense to the implicit default time from the speaker's point of view, which is "now" (this means that ki may be used as a tense word by itself).
Using ki, equivalents of the layered tenses shown earlier can be produced:
The second pu is to be counted from the tense set by the last ki, so in effect it is equivalent to pupu.
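The behaviour of ki can be pictured as a small state machine that carries a reference tense from one bridi to the next. The following is a toy model only, with a hypothetical function name, `effective_tenses`. Each bridi is reduced to the list of tense cmavo it contains, an empty string stands for the unmarked default ("now"), and the rule that a tense marked with ki is itself counted from the previously set tense is an assumption of this sketch.

```python
def effective_tenses(bridi_tenses):
    """Return the effective tense of each bridi as a compound string.

    bridi_tenses is a list with one entry per bridi; each entry is the
    list of tense cmavo in that bridi, e.g. ["pu", "ki"], ["pu"] or [].
    """
    sticky = []     # tense words currently set by ki
    result = []
    for tokens in bridi_tenses:
        own = [t for t in tokens if t != "ki"]
        if "ki" in tokens and not own:
            sticky = []             # bare ki: back to the default time
            result.append("")
            continue
        effective = sticky + own    # own tense counts from the set one
        if "ki" in tokens:
            sticky = effective      # this tense stays for later bridi
        result.append("".join(effective))
    return result


if __name__ == "__main__":
    story = [["pu", "ki"], [], ["pu"], ["ki"], []]
    print(effective_tenses(story))
```

In this model the third bridi, marked only with pu, comes out as pupu because it is counted from the tense set by the first bridi, and the bare ki in the fourth bridi returns the following ones to the default.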
Lojban has a formal grammar which does not proscribe all the strings of words that a human would consider ungrammatical. One can say things like "*Either he and I will go". Some of these grammatical, but nonsensical, constructions are: