Talk:Polysynthetic language

From Wikipedia, the free encyclopedia

français

Re: je ne me le lui suis pas?
Is this really a correct French sentence? If it is, can somebody tell me what it is supposed to mean in English? (See also: Talk:Clitic) D.D. 20:09 28 Jun 2003 (UTC)

This, and the one on the clitic article, seems to be a made up example using as many pronouns as possible. I guess it means "I am not myself it to him"? Like the clitic one, I'm not sure it's supposed to make sense. And although I'm not a professional linguist or anything, I did take some linguistics classes for fun in university, and I have never heard of French being considered polysynthetic. If you look it up on Google, there are some hits, but not many... Adam Bishop 20:58 28 Jun 2003 (UTC)

Yes, the allegedly French phrase is gibberish. --Zoicon5

I've changed it to je ne le sais pas, which is enough to make the point. It is valid that French can be considered structurally parallel to Bantu in this regard. I've also drawn a distinction between synthetic and polysynthetic. Normally polysynthetic would be taken to mean incorporating, as in Mohawk or Chukchee, not just heavily synthetic, but I've left that other possibility open. - Gritchka 18:02 25 Jul 2003 (UTC)

Telpuilgoc

Telpuilgoc has no hits on Google. If an example of an artificial language is required there are much better ones. DJ Clayworth 21:46, 7 Aug 2003 (UTC)

Reply to Gritchka:

Polysynthetic does not necessarily entail incorporation. English has a limited amount of incorporation (in noun compounding), but it is not really polysynthetic. Polysynthetic simply refers to the morpheme-to-word ratio or the degree of synthesis, i.e. polysynthetic means a large number of morphemes per word. A nice summary is in Comrie (1989).

Ish ishwar 06:41, 24 Nov 2004 (UTC)

Sapir

Comment: This usage of the term polysynthetic must have originated earlier as it is used in Arthur Conan Doyle's The Lost World, serialized, then published, in 1912. See Chapter VIII, PP 15-16, Challenger's and Summerlee's discussion of the local tribes. Perhaps someone can correct the information on the origin of the term.

Wilhelm von Humboldt (1836) established a morphological typology with 4 language types (more or less):

isolating
agglutinating
fusional
polysynthetic (i.e. embodying)

Sapir (1921) pointed out the problems of a classification such as this and instead proposed that languages should be classified according to 2 parameters:

synthesis
technique (similar to fusion)

The term polysynthesis was first used in a linguistic sense by Pierre-Etienne Du Ponceau (a.k.a. Peter Stephen Duponceau) in 1819 (borrowed from chemistry terminology).

Cheers! - Ish ishwar 22:02, 2005 Feb 5 (UTC)

So what's the point?

Isn't the boundary between words completely arbitrary? Doesn't "Chukchi is a polysynthetic language" really just mean "We/Chukchi speakers have chosen to write Chukchi with a low word-to-morpheme ratio"? The Chukchi "word" quoted in the article is basically a sentence with no spaces. If so, the definition given in the article (and, presumably, everywhere else) completely misses the point. Simetrical 04:47, 30 Jan 2005 (UTC)

No. Generally the boundary between words is not arbitrary. But, let's be sure that we are speaking of the same "boundary".

You seem to be speaking of an orthographic word, which would be a word that is written in a given writing system, perhaps using an alphabet or a syllabary. In English and other languages there is a convention to separate orthographic words with spaces. Here I will give you that there is a bit of arbitrariness in things like compounds (which may be written as (1) one orthographic word, (2) two words with a hyphen between them, or (3) two words separated by a space), etc. Other writing systems do not use this convention of using spaces.

What is discussed in the article is not an orthographic word, but is rather a linguistic word, which may or may not correspond to the orthographic word. We should expect to find some correspondence between the two and also some differences. A linguistic word could perhaps be defined as the sequence of sounds found in a particular utterance in a given human context and which functions as a unit. More abstractly, we also might want to consider a word to be the abstract unit in our mind that is realized as a particular sequence of sounds (we might also want to distinguish between a lexeme and word-forms). So, this word is a different thing from an orthographic word.

Of course, there are different units in language, like morphemes, phrases, "sentences", etc., so we will need to clarify the above definition. A word is classically defined as a minimal potentially-free linguistic unit.

The boundaries of words can be determined by a few different criteria: for instance, syntactic, morphological, phonological, and/or psycholinguistic criteria. The syntactic and morphological criteria are perhaps the most convincing. For example, if we have an English sentence like Wilhelm kicked the carrot, by using syntactic tests we can state that kicked is a single word and not two words kick and ed. Another famous example is nitrate /najtret/ vs. night rate /najt + ret/. There are phonetic differences between the two because nitrate is one word and night rate is two (different phonological processes occur within words and at word boundaries). A psychological argument would come from speakers whose language has no writing system: most speakers seem to be able to sense what words are and extract them from a piece of spoken discourse.

But, this is just a general explanation. Your question is a very important question. It is hard to create a definition of word that can be used usefully in all languages. There are some cases where it is hard to determine if a particular linguistic chunk is a word or something bigger or smaller. This has been discussed a lot in the literature.

I am the one who provided the examples in the article. They are taken from some famous books on linguistics. I don't personally know these languages so I can't comment on them. But, I believe that there was some linguistic analysis of these words that led these linguists to consider them to be words and not something else like a sentence (even though they must be translated as sentences in English).

Maybe someone else can clarify further what I have written. I hope I have been clear. - Ish ishwar 07:44, 2005 Jan 30 (UTC)

You write,

For example, if we have an English sentence like Wilhelm kicked the carrot, by using syntactic tests we can state that kicked is a single word and not two words kick and ed.

What "syntactic tests" would these be? Also,

But, I believe that there was some linguistic analysis of these words that led these linguists to consider them to be words and not something else like a sentence (even though they must be translated as sentences in English).

Surely the idea of an encyclopedia article is to explain a concept. If we're to have an article on this topic, the ideas behind the definition of a word need to be explained—if not here, then perhaps at word, with a link from here (and a pointer to the fact that the definition is important). Actually, it looks to me like—astonishingly—there is no article for a linguistic word. I'll go create a stub now. —Simetrical (talk) 23:54, 30 Jan 2005 (UTC)

Stub created at word (linguistics). —Simetrical (talk) 00:09, 31 Jan 2005 (UTC)

syntactic constituent tests

Hi.

There are many syntactic tests that are used for different things. Many are used to determine constituent structure. A word is basically the smallest syntactic constituent. Higher level constituents include phrases and sentences. Since words are syntactic constituents, we can use these kinds of tests to determine what the words are in a given sentence.

One type of test is WH-substitution (WH- indicates WH-words which are traditionally called interrogative pronouns: what, who, where, etc.). Applied to our example Wilhelm kicked the carrot, we can get:

who kicked the carrot

Wilhelm kicked what

Wilhelm did what to the carrot

Wilhelm did what

We can substitute who for Wilhelm, what for the carrot, and did what to for kicked. We can also substitute did what for the whole verb phrase kicked the carrot.

But we cannot substitute a WH-word/phrase for only kick or ed:

?*Wilhelm what ed the carrot

*Wilhelm kick what the carrot

(note * = ungrammatical; ?* = ungrammatical, but maybe some people will accept)

Another test is pseudo-cleft constructions:

Wilhelm is who kicked the carrot

the carrot is what Wilhelm kicked

kicked the carrot is what Wilhelm did

kick is what Wilhelm did to the carrot

*ed is what Wilhelm kick the carrot

In our pseudo-cleft test, it seems that we can extract kick out of kicked and pseudocleft it, but we cannot pseudocleft ed.

Another issue is that we can imagine situations where a person could utter as one word response the following:

"Wilhelm?"

"the carrot?"

"kicked?"

"kick?"

"kicked the carrot?"

But, it is impossible to have as a one word response:

*"ed?"

So, these tests mentioned above plus many other tests seem to indicate that -ed is not a word, but an affix bound to kick. This is determined by virtue of their different syntactic behaviors. It also seems that kicked is a word even though it is composed of two different morphemes.

Note that we could also apply tests to the. If we did so, we might conclude that the is not really a word either but something similar to an affix. Because the does not behave like a true word, it is called a clitic (a clitic is sorta like an affix that is affixed to phrases instead of word forms).

Getting back to polysynthesis, in polysynthetic languages the affixes that make up a given polysynthetic word often cannot be moved around in different syntactic constructions. Words usually can be moved around, but not affixes.

noun incorporation

In some languages lexical items can occur as words and can also occur as affixed elements within other words. (Note that incorporation is often present in polysynthetic languages, but it is not always present, and incorporation is also found in non-polysynthetic languages. But this point is still relevant to your question of whether a particular thing is a word or not.)

Here is an example of two sentences in Lakhota with and without noun incorporation (i.e. the noun is incorporated into the verb):

(1)	wičháša	ki	čą́	ki	kaksáhe
	man	the	wood	the	chopping
	'the man is chopping the wood'

(2)	wičháša	ki	čą-kaksáhe
	man	the	wood-chopping
	'the man is woodchopping'			(Van Valin & LaPolla:1997 in Haspelmath:2002)

In sentence (1) the word 'wood' can occur with the determiner 'the'. When incorporated in (2), 'wood' is prefixed without 'the' and incorporating 'the' into the verb is impossible. Thus the element 'wood' in both sentences has different syntactic behaviors and different semantic readings (definite/specific vs. indefinite/generic) depending whether it occurs as a word or as a bound morpheme.

You can compare this to English babysit. This is a single word with baby incorporated into sit. You can't modify baby with a bunch of stuff when it is a part of babysit.

I babysat for Bilbo yesterday.

*I a very big and plump baby sat for Bilbo yesterday.

*I two babies sat for Bilbo yesterday.

Usually, baby acts like a normal noun and it can be modified with an indefinite article and/or adjectives and made plural, but in babysit it behaves differently.

References:

Haspelmath, Martin. (2002). Understanding morphology. London: Arnold (co-published by Oxford University Press). ISBN 0-340-76025-7 (hb); ISBN 0-340-76206-5 (pbk).
Van Valin, Robert D.; & LaPolla, Randy. (1997). Syntax: Structure, meaning and function. Cambridge: Cambridge University Press.

Hopefully, this makes sense. If not, just ask for clarification. Peace. - Ish ishwar 09:00, 2005 Feb 5 (UTC)

Baker (not to mention Mithun, Sadock) is pretty clear that verbs like babysit in English are not incorporated forms. Canonical incorporation is productive and the incorporated noun must be modifiable. English has nothing like this, so it seems that the term incorporation here is a misnomer.

Another point of contention I'd like to raise is the statement in the article that West Greenlandic does not have incorporation. Sadock and Baker both agree it does, so I'm going to change that in the article. Straughn 17:42, 25 May 2006 (UTC)

Hi. Yes, I agree with them. However, since it is the closest thing English has to it, I thought it would help understanding. It wouldn't hurt to say explicitly in the noun incorporation article that English does not have true noun incorporation and to give arguments why it does not.

Whether or not West Greenlandic has noun incorporation depends on your definition. Yes, Sadock & Baker think so, but Mithun & Rosen disagree. – ishwar (speak) 14:45, 15 November 2006 (UTC)

Distribution of polysynthetic languages

The section on distribution of polysynthetic languages references Finno-Ugric languages. The term polysynthetic might not be very precise, and some languages are more synthetic than others, but from what I have seen of grammars of these languages so far, I would not call them polysynthetic. The term should be reserved for the extreme cases of morphologically very complex languages. Why were these languages included here? If there is no evidence (a source explaining this), this reference should be removed from the section.

Nannus 20:39, 6 July 2006 (UTC)

These languages were included here precisely because the definition of "polysynthetic" is pretty vague. Anyway, while I don't know Finnish, the Finnish sentence juoksentelisinkohan from the Finnish language page, translated as I wonder if I should run around aimlessly, could qualify as polysynthesis. The question is, ultimately, how synthetic does a language have to be to qualify as polysynthetic? The way I see it, there are simply too many possibilities: average morpheme:word ratio, possible morpheme:word ratio, minimum morpheme:word ratio, or any or all of those in combination with some sort of restriction on the word-classes that "count" as polysynthetic, which would discount those famous German compound nouns and keep German a fundamentally fusional language. In order to accurately use the word "polysynthetic", I contend that it's necessary to define exactly what one means by it. Thefamouseccles 13:32, 2 August 2006 (UTC)
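To make the competing measures a little more concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that a text has already been segmented by hand, with hyphens between morphemes and spaces between words; the function names are hypothetical, and the sample is the Wilhelm kicked the carrot sentence from the discussion above, segmented as kick-ed. The observed maximum in a sample is only a rough stand-in for the "possible" ratio, which no finite sample can establish.

```python
import re


def morphemes_per_word(segmented: str) -> list[int]:
    """Count morphemes in each word of a hand-segmented text.

    Assumption: hyphens separate morphemes, whitespace separates words,
    and any other punctuation can be discarded.
    """
    tokens = (re.sub(r"[^\w\-]", "", t) for t in segmented.split())
    words = [t for t in tokens if t]
    return [w.count("-") + 1 for w in words]


def synthesis_measures(segmented: str) -> dict[str, float]:
    """Return average, maximum and minimum morphemes per word."""
    counts = morphemes_per_word(segmented)
    if not counts:
        raise ValueError("the sample contains no words")
    return {
        "average": sum(counts) / len(counts),
        "maximum": max(counts),
        "minimum": min(counts),
    }


print(synthesis_measures("Wilhelm kick-ed the carrot"))
```

The three numbers can diverge widely for the same language, which is one reason a definition has to say which of them it means.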

The article says that polysynthetic languages usually include agreement with object arguments as well as subject arguments in verbs. However, in Finnish verbs agree only with their subject, not object. Also, juoksentelisinkohan is a somewhat extreme example and was translated in a rather verbose manner. It would seem to me (I'm a native speaker) that Finnish is not a polysynthetic language in the sense that word is used in this article. However, I'm not a linguist, so don't take my word for it. Ossi 16:52, 4 January 2007 (UTC)

You are correct. Finnish is not considered polysynthetic by linguists working with polysynthesis. Maunus 21:23, 4 January 2007 (UTC)

Basque Polysynthetic?

I have never heard this claim before and there are no references for it. It also doesn't say so in the Basque article. Could someone provide a reliable source calling Basque polysynthetic please? Else I will remove it from among the examples. Maunus 08:45, 9 November 2006 (UTC)

I'm no linguist and therefore know of no sources that discuss the issue in depth. But, after reading the article I have no doubt it is. The Basque language can join morphemes to make complex words, like in German, fusing nouns, adjectives, declensional suffixes and other particles like the diminutive "x" of liburuxko. Basque verbs are extremely synthetic despite using auxiliary verbs in most cases. A single auxiliary verb (or fully declinable "synthetic" verb, which are the most usual ones) can express subject, D.O. and I.O. (person and number), in addition to tense and other possible particles, like declensions, if the verb is "nominalized", as happens in subordinate sentences.

Surely there are better examples than Basque, where polysynthesis is even more extreme, but it seems clear to me that Basque is quite polysynthetic.

That the Basque language article is imperfect and more focused on its diffusion, official/non-official status, etc. is surely a pity. The grammar section only dwells on its ergative-absolutive quality and little more (declensions), but one user says he's already working on an article on Basque grammar. --Sugaar 12:50, 9 November 2006 (UTC)

I have been asked to comment on this discussion as someone who knows "both about Basque and about polysynthesis", so I will try. I was also asked: "Are there any reliable sources calling Basque polysynthetic?"

This latter question is empirical, and I can help to find the answer by looking at a few books in my linguistics library, when I get a chance, so I'll come back on that.

Speaking off-hand from memory, I can say I am pretty sure that some Spanish-language works for general readers (non-scholarly), e.g. school textbooks, probably do describe Basque as polysynthetic. Consequently, "Basque is a polysynthetic language" is the sort of item of "general wisdom" that I think has been circulated among a generation of Spanish-speaking school children, who will go through life quite certain that it must be true (though few of them know what it means). I am emphasising these circumstances in order to draw a distinction between such "certainty" based on the doctrines of school teachers, and the kind of "reliable sources" I was asked about.

The trouble, I think, is that polysyntheticity is a matter of degree - doubly. In the first place, because if you look up Synthetic language you will see that this is itself already explained as a matter of degree. In the second place, polysynthetic is defined roughly as 'highly synthetic' - where 'highly' is another expression of degree. Again, if we take the definition of 'polysynthetic' in R.L. Trask's A dictionary of grammatical terms in linguistics (Routledge, 1993):

"A label sometimes applied to word forms, or to languages employing such word forms, consisting of an unusually large number of bound morphemes..."

again, the operative words are "unusually large (number)" - a matter of degree, and of relative degree at that (define "unusually"!). This means that when we start getting too nit-picking about which languages are or are not polysynthetic, we will very likely eventually reach a point where the discussion becomes sterile - a sort of "mine is bigger than yours is" level, at which point the argument becomes pointless because there is no standard way of quantifying how (poly)synthetic a given language is, so to some extent it is an impressionistic concept.

Another point that causes me some concern here is the distinction, which should not be forgotten, and which certainly Trask did not forget in his definition quoted above, between applying the term "polysynthetic" to word forms on the one hand and to languages on the other. A single example such as the one quoted for each "polysynthetic language" cited in the article does not, cannot, serve to "prove" how polysynthetic that language is, unless it can be asserted that the sample sentence is a typical sentence (or the sample word a typical word) in that language. In other words, rooting about to find the "longest" (i.e. most complex) word in the language that one can think of may be a way of impressing one's friends, but is not necessarily indicative of the overall nature of the language.

Now I must come to the Basque example in the article. Unfortunately, the first thing I need to say about it is that the supposedly Basque sentence cited is quite incorrect and shows every sign of having been made up by someone who does not know Basque. It would in fact be unintelligible to a Basque speaker as it stands. The first of the three words is possible but does not mean what the gloss says; the second doesn't exist; the third is a possible verb form, but apart from being a low-frequency form and its comprehension not being supported by the context (i.e. it seems pragmatically inappropriate), it is also syntactically wrong in the intended context (strict rules of information structure are violated). To put it more briefly, it's gibberish. So okay, let us look at the English gloss given and see how you would say that in Basque:

'If you had brought those of the small books to me I would have gone to them.'

As it turns out, again unfortunately, the proposed English gloss is only barely more intelligible than the Basque it purports to translate.

What I think has happened here is that the author of the example has attempted (with poor results, as it turns out) to go to every possible length to concoct a Basque sentence that would be maximally polysynthetic. Now there are words in Basque as long and as complex as the ones in the example, but there are also lots of short and, yes, monomorphemic words in Basque. Undoubtedly the morpheme-to-word ratio in Basque is higher than, say, in English, but it would be more reasonable, I think, to take real Basque (and English) sentences, of a type normally used (or at least ever used) by native speakers, into consideration when determining that. Examples such as those cited here might be better accommodated in a "Stranger Than Fiction" article.

As an off-the-bat exercise in this, here is the beginning of the "Our Father" in English (Revised Version) and Basque (Elizen arteko Biblia, 1994), with morphemes indicated:

English: Our Father which ar-t in heaven, hallow-ed be thy name. Thy king-dom come. Thy will be do-ne in earth, as it is in heaven. 28 morphemes, 24 words, morpheme-to-word ratio = 1.17.

Basque: Gu-re Aita zeru-ko-a, ager-tu santu zeu-re izen-a, e-torr-araz-i zeu-re errege-tza; bete-araz-i lurr-ean zeu-re nahi-a, zeru-an bete-tze-n d-en bezala. 38 morphemes, 18 words, morpheme-to-word ratio = 2.11.

Using real texts and presumably normal words, this example suggests that Basque is a more highly synthetic language than English (almost twice as highly, in fact). But since everything is relative, and polysynthetic languages are ones with an unusually high level of synthesis, we would still need to perform a statistical survey to find out whether Basque is unusually synthetic or not. Basque may be average, and English unusually analytic or isolating! --A R King 17:09, 9 November 2006 (UTC)
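The counts in this exercise can be reproduced mechanically. The sketch below is only an illustration of the arithmetic, not a linguistic tool: it assumes the hand segmentation given above is taken at face value, with each hyphen marking a morpheme boundary and each space a word boundary. The function name and constants are hypothetical.

```python
import re


def count_morphemes_and_words(segmented: str) -> tuple[int, int]:
    """Return (morphemes, words) for a hand-segmented text.

    Assumption: hyphens separate morphemes, whitespace separates words,
    and other punctuation is ignored.
    """
    tokens = (re.sub(r"[^\w\-]", "", t) for t in segmented.split())
    words = [t for t in tokens if t]
    morphemes = sum(w.count("-") + 1 for w in words)
    return morphemes, len(words)


# The two samples exactly as segmented in the comment above.
ENGLISH = (
    "Our Father which ar-t in heaven, hallow-ed be thy name. "
    "Thy king-dom come. Thy will be do-ne in earth, as it is in heaven."
)
BASQUE = (
    "Gu-re Aita zeru-ko-a, ager-tu santu zeu-re izen-a, "
    "e-torr-araz-i zeu-re errege-tza; bete-araz-i lurr-ean zeu-re nahi-a, "
    "zeru-an bete-tze-n d-en bezala."
)

for name, text in (("English", ENGLISH), ("Basque", BASQUE)):
    m, w = count_morphemes_and_words(text)
    print(f"{name}: {m} morphemes, {w} words, ratio = {m / w:.2f}")
# English: 28 morphemes, 24 words, ratio = 1.17
# Basque: 38 morphemes, 18 words, ratio = 2.11
```

The result depends entirely on where the hyphens are placed, so a different segmentation of the same text would give a different ratio.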

Agree. Basque would surely perform low in synthesis when compared with some Siberian or Native American languages, yet in the context of Europe and even West Eurasia it is highly polysynthetic, with few languages (Caucasian ones?) that can compare.

Yet, while English is particularly isolating because it always uses pronouns and almost always auxiliary verbs (of low synthesis, unlike Basque ones), it seems to me that you'd get similar results with most European (Indo-European) languages. The Spanish version of the prayer doesn't seem more word savvy than the English one for instance: Padre nuestro que est-ás en lo-s cielo-s, santific-ado se-a tú reino, ha-ga-se tu voluntad así en la tierra como en el cielo gives 22 words and 29 morphemes, a ratio of 1.3, not much higher than English.

Possibly German or even Latin would perform somewhat better but I doubt they can reach the level of polysynthesis of Basque. --Sugaar 21:30, 9 November 2006 (UTC)

Pater noster, qui es in cael-is, sancti-fic-et-ur nomen tu-um. Ad-veni-a-t regn-um tu-um. Fi-a-t volunta-s tu-a, sic-ut in cael-o, et in terr-a. 38 morphemes, 21 words, morpheme-to-word ratio = 1.81. Close to Basque! --A R King 22:08, 10 November 2006 (UTC)

I think you are both missing a quite important point, that polysynthesis is not "simply" a high count of morphemes per word. As the article mentions, other factors are usually included when deciding to call a language polysynthetic - for example, most of the canonically polysynthetic languages are extremely head-marking, and most mark agreement for both object and subject on the verb. While Basque and Latin can form long and multimorphemic words, they don't live up to those two (and other) criteria and consequently neither can be said to be a particularly "polysynthetic" language. I think that arguing for the "relativity" of the term is watering down its usefulness and also obscuring the way it is actually used in linguistic literature. Maybe Basque is called a polysynthetic language in some popular Spanish literature, but has any scholarly literature ever argued that this is the best way to classify its typology? I doubt it. And even if it has been argued, does that make it a noteworthy example of a polysynthetic language? Hardly. What has been shown here gives me no reason to believe that it is more synthetic than, say, Latin, Finnish or Turkish - none of which are ever referred to as polysynthetic - and consequently I don't believe it should be given as an example in this article. Maunus 22:38, 10 November 2006 (UTC)

I think you have a point, Maunus, but I think I have one too. What I am trying to say is that the term polysynthetic does have a certain meaning (and usefulness), but that this is not to be pushed too far because if we do so, it tends to become less meaningful. As for Basque, it is a little bit more synthetic than Latin - probably closer to Turkish. Latin, of course, is a highly fusional language; in such languages, there is synthetic morphology but the morpheme-to-word ratio is kept down not only by the use of portmanteau affixes which "stand for" combinations of two or more grammatical features (e.g. -um in tuum combines neuter, nominative and singular), but by the general non-sequentiality of such affixes (you can only use them one at a time, they don't allow combinations). Now before we get off on another race to nowhere, let's remember that fusionality can also be viewed as a more-or-less issue rather than a black-or-white one (e.g. in the Latin word ama-ba-nt-ur 'they were loved', we have a chain of three recognisable, discrete suffixes). Basque is much less fusional than Latin, although that does not preclude the existence of some fusion here too. In particular, there are more opportunities to combine segmentally discrete affixes in Basque word forms and form chains of morphemes in a way quite foreign to Latin. A form of recursion is even possible which can lead to the same morpheme occurring twice in a word such as gizon-a-ren-a (a perfectly ordinary, commonplace type of formation in Basque, meaning 'the one of the man', 'the man's one', which contains two occurrences of the article morpheme -a); theoretically at least you can go further and have gizon-a-ren-a-ren-a 'the one of the one of the man', for example, not to mention throwing in other case markers, so it is not that hard to concoct such long morpheme chains. Thus there quite definitely seems to be a qualitative difference on this level between Latin and Basque.
(Turkish probably would come out somewhere near Basque in this respect - it is equally non-fusional (or maybe more so), but may realize fewer categories through synthetic morphology and lacks the kind of recursivity mechanism just mentioned.) All in all, then, Basque is more polysynthetic than Latin. (I would actually expect this to be borne out more clearly by morpheme-to-word statistics based on a larger text sample - the little test shown here is obviously based on an inadequate corpus.)
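The recursive pattern described above (gizon-a-ren-a, gizon-a-ren-a-ren-a) can be sketched as a small generator. This is only an illustration of how the morpheme chain grows under that one pattern; it is not a model of Basque morphology, and the function name and its depth parameter are hypothetical.

```python
def genitive_chain(stem: str, depth: int) -> str:
    """Build stem-a followed by `depth` repetitions of -ren-a.

    Assumption: only the pattern quoted in the discussion is modelled,
    so depth must be at least 1 (gizon-a-ren-a).
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return "-".join([stem, "a"] + ["ren", "a"] * depth)


for depth in (1, 2):
    form = genitive_chain("gizon", depth)
    # Each extra level of recursion adds two morphemes to the single word.
    print(form, "->", form.count("-") + 1, "morphemes")
# gizon-a-ren-a -> 4 morphemes
# gizon-a-ren-a-ren-a -> 6 morphemes
```

Since the whole chain is one word, each level of recursion raises that word's morpheme count by two without adding any words.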

So, let us grant that Basque is more synthetic than Latin, but how polysynthetic is it? How much synthesis must there be to describe a language as polysynthetic? (How many freckles must a person have to be described as freckled?) Another question: how polysynthetic need a language be to make a good paradigm example of a canonically polysynthetic language worthy of being cited in an article on polysynthesis? As regards the usefulness of the term as a descriptor for certain languages, I should think that linguists will find it most useful to employ the term to refer to extremely polysynthetic languages (those near the top end of the scale), and that is, I believe, the sense in which it was originally proposed. Once again let us remember that we also have the term 'synthetic' (and 'fusional', 'inflecting' and several others) to describe languages, and that 'poly-' means many or multiply. Basque is somewhat (poly)synthetic, but it is not in the same league with some paradigm cases, and it is probably quite sufficient to refer to Basque as synthetic and leave it at that.

By the way, I should correct a couple of points of detail. Basque does quite definitely mark agreement for both object and subject on the verb. It also does have some head-marking features (most notably the indexing of up to three arguments on the verb), although it has dependent-marking ones too (such as case marking on noun phrases), making it a hybrid type in this respect. In other respects the dependent-marking aspect dominates, though. --A R King 08:37, 11 November 2006 (UTC)

I see your point, Alan. And I also suddenly had an idea. This article shouldn't present languages and say "this is an example of a polysynthetic language"; it should say "this language can be considered polysynthetic because of the following arguments ...". Languages aren't polysynthetic or fusional or agglutinative - they can be argued to be one or the other on the basis of the facts of their morphology. That means that I want the article to state explicitly what are to be considered the polysynthetic traits of Basque, of Chukchee and of the other example languages. This approach will be much more informative and allow people to learn what polysynthesis actually is, because it is argued in the article instead of just giving examples of languages that are polysynthetic without stating what makes them so. What do you think? Maunus 12:46, 11 November 2006 (UTC)

What you are saying there, Maunus, sums up what I believe is an evolution from a more naive to a more mature understanding of what linguistic typology is about. The naive idea sees typologists sticking labels on languages and forcing them into a sort of taxonomy. The mature idea is that there are a range of patterns that characterise some languages but not others, or some languages more than others. But be they labels, patterns or whatever, human beings seem to have this tendency to get fetishistic about names and categories. Ideally it would probably be better if linguists of an earlier generation had not accustomed us to the idea of talking about polysynthetic languages, this languages and that languages. Abstract nouns referring to the patterns would be better: 'In this language there is a considerable amount of polysynthesis' or something like that. It is right that we should use abstract nouns here because the concepts they denote are really abstractions, and nothing else. Yes, what you suggest sounds like a good idea to me. Cheers, --A R King 15:46, 11 November 2006 (UTC)

A very interesting discussion. I just wanted to add that I had never seen Basque described as polysynthetic before (but I'm no linguist), but that the description struck me as obvious when compared with other European languages. Notice that while normal Basque doesn't often make very large words with too many morphemes in them, it has the ability to do so easily, something that most living European languages can't. It's also a matter of potential.

In this sense, the prayer is surely not a good example; it would be best tested with very long synthetic words/phrases taken from the extremely polysynthetic languages. IE languages, including Latin, would have to divide such words into many smaller ones, but Basque is likely to be able to approximate them better.

After all "Our father..." is originally Latin and was translated into Basque from that origin. It may not express well the full synthetic potential of Basque. --Sugaar 20:27, 11 November 2006 (UTC)

Just for the record, Sugaar, the "Our father" is originally Hebrew or Aramaic (depending on which language Jesus recited the prayer to his disciples in), but the "original" language in which it has been transmitted to posterity is of course New Testament Greek. It was later translated into other languages, including Latin, from the "original Greek". Actually I don't think the fact of being a translation is likely to have affected the amount of synthesis displayed. What has, though, is the fact that by coincidence the passage chosen only contains one finite verb form (den), which furthermore happens to be one of the shortest finite verb forms in the language! That has happened because the "corpus" is too short to be representative. The very next word in the text, if I had continued, meaning 'Give us (it)...', is e-ma-gu-zu, with a morpheme-to-word ratio of four points! --A R King 21:18, 11 November 2006 (UTC)

You are right about that. The emaguzu example is a contrast (though ema- (eman) is a verbal root). The problem is that the third person singular morpheme is often absent in Basque (it should be represented by an "empty conjunct" symbol), so surely it also has an absent 4th morpheme, which would be visible if "bread" were "breads": emaguzute. That same phrase has a ratio of just 1 in English (give us [it]) but of 3 in Spanish (dá-nos-le). You definitely need a longer text or a deeper study to make such measurements, and you also need to look at the potential ability of each language for polysynthesis. In this sense, the example mentioned in the article is quite good because it is an example of (maybe) the maximal possible polysynthesis in Basque, which, when compared with other languages, shows the differences markedly.

Basque: Liburu-x-ko-aren-ak ba-l-en-ekar-z-ki-da-ke n-in-doa-ki-zue-ke-en. 20:3=6.67!

English: If you ha-d br-ought th-ose of the small book-s to me, I w-ould have go-ne to them. 23:17=1.35

Spanish: Si me h-ubiéra-s tra-ído es-os libr-illo-s, yo hab-rí-a i-do a donde el-los. 22:12=1.83
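The three ratios quoted above can be checked mechanically. The following is only a minimal sketch: it assumes that whitespace separates words and that every hyphen marks a morpheme boundary, and it takes the segmentation exactly as typed above without judging whether that segmentation (or the sentence itself) is linguistically correct. The function name synthesis_ratio is made up for this illustration.

```python
def synthesis_ratio(segmented):
    """Return (morphemes, words, morphemes per word) for a hyphen-segmented sentence."""
    # Words are whitespace-separated; strip sentence punctuation from their edges.
    words = [w.strip(".,;:!?") for w in segmented.split()]
    words = [w for w in words if w]
    # Inside a word, each hyphen is assumed to mark one morpheme boundary.
    morphemes = sum(len(w.split("-")) for w in words)
    return morphemes, len(words), morphemes / len(words)


# Segmentations copied from the three example lines above.
basque = "Liburu-x-ko-aren-ak ba-l-en-ekar-z-ki-da-ke n-in-doa-ki-zue-ke-en."
english = ("If you ha-d br-ought th-ose of the small book-s to me, "
           "I w-ould have go-ne to them.")
spanish = ("Si me h-ubiéra-s tra-ído es-os libr-illo-s, "
           "yo hab-rí-a i-do a donde el-los.")

for name, text in [("Basque", basque), ("English", english), ("Spanish", spanish)]:
    m, w, ratio = synthesis_ratio(text)
    print(f"{name}: {m}:{w} = {ratio:.2f}")
```

Under these assumptions the counts agree with the figures given above (20:3, 23:17 and 22:12). The result depends entirely on how the morpheme boundaries are drawn, so a different segmentation would give different ratios.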

The example actually shows well the difference of potential polysynthesis in Basque. It would be even better if, instead of the odd I would have gone to them, it had read I would have brought (carried?) them (the books) to them (they), which in Basque would be very clear and would have one or two more synthetic morphemes, increasing the synthesis ratio to 7 or more.

In this case, I don't know Turkish, but I think Latin wouldn't be close to Basque in polysynthesis because, despite having the same kind of somewhat complex verbs as Spanish, its additional synthesis in declensions and some suffixed particles like -que can't compete with the extreme synthesis of Basque verbs and the ability of Basque to stack several declensions in a single word.

Reflexive sentences like I'm watching myself, on the other hand, can be less polysynthetic in Basque because they require the phrase nire buruari (lit. "to my head").

But, in general, in the most polysynthetic phrases I can think of, English can hardly exceed a ratio of 2, Spanish 3... but Basque can well reach 7 (or maybe more). --Sugaar 09:22, 12 November 2006 (UTC)

I already made it quite clear (I thought) earlier in this discussion that the Basque example given in the article and which you are quoting here is not valid. It makes no sense and does not represent real Basque. I also said that in my opinion "hunting" desperately for any way to cram more morphemes into the words to "prove" a language's polysyntheticity is not the right way to go about this. (On the other hand, the "Our Father" text I analysed earlier is an authentic bit of corpus, although too brief, as we have admitted; and if it happens to present a rather low (for Basque) ratio, that is just an empirical fact, which does not invalidate the example.) By the way, if you will insist on citing that pseudo-Basque example in the article, I should also point out that the proposed analysis into morphemes is also erroneous. Anyway, getting back on track, I would like to add the following...

Postscript: To support and document what we have said about the objective of modern linguistic typology, I have copied a quote from a reliable textbook, Croft's Typology and universals. (I thought including it on the present page would perhaps be considered unjustified.) In the quotation we can see, in context, that Croft considers current typology to be concerned not, as in the nineteenth century, with "a classification of structural types" such that "a language is taken to belong to a single type", but rather with "linguistic patterns that are found cross-linguistically".

Regarding the parameter of morpheme-to-word ratios, based on nineteenth-century classifications, languages have been described as "analytic (one morpheme per word); synthetic (a small number of morphemes per word); and polysynthetic (a large number of morphemes, particularly multiple roots, per word)." (Quoted from Croft, p. 40, who is summarising Edward Sapir here.) The other parameter that relates closely to this one, as we have already seen above, is degree of fusion. I quote another standard textbook, Bernard Comrie's Language universals and linguistic typology (second edition of 1989), who briefly presents the two parameters thus (p. 46):

"One of these parameters will be the number of morphemes per word, and its two extremes will be isolating and polysynthetic. The other parameter will be the extent to which morphemes within the word are readily segmentable, its two extremes being agglutination (where segmentation is straightforward) and fusion (where there is no segmentability). We may refer to these two parameters as the index of synthesis and the index of fusion... What are traditionally called polysynthetic languages become languages with a high index of synthesis (in addition, they may or may not also have a high index of fusion; for reasons discussed below it is inevitable that a language with a very high index of synthesis will also have a low index of fusion, even though the two parameters are logically independent)."

And to answer your question, Maunus, for what it's worth, I have not found references to Basque as a polysynthetic language in the linguistic literature. Cheers! --A R King 09:44, 12 November 2006 (UTC)

The sentence is kind of odd but it does represent real Basque (at the greatest level of polysynthesis, of course). Similar sentences can be found and made in real conversations/texts. It's not really forced: if you want to say that, you need to use that sentence or a very similar one (unless you're speaking bad Basque, which also happens).

I really don't know if the morpheme division is perfectly done or not. To my inexpert eyes it seems ok, and in any case the difference would be minimal. --Sugaar 14:45, 12 November 2006 (UTC)

Sugaar, are you saying that that sentence is real Basque as somebody who really knows Basque???? --A R King 15:18, 12 November 2006 (UTC)

Well, I have only 7. maila (out of 8+2). The dezake, lezake, etc. verbal forms are still rather difficult for me. You had better ask someone with a higher level for a more precise answer. But, in principle, it looks ok. The more I read it the more ok it seems.

My question is: why are you questioning the reality of such a sentence? --Sugaar 16:04, 12 November 2006 (UTC)

I posted an "interwiki request for help" in the Basque wikipedia, in the entry on Basque language [1], asking for help in resolving this doubt. --Sugaar 16:27, 12 November 2006 (UTC)

Because as somebody who does know Basque (very well indeed) I can tell you (I already have, actually) that it is not correct Basque. You seem to be saying it is good Basque. But you admit that you do not know Basque that well. So what should we do about that? I already told you above what is wrong with the sentence. I don't think I should repeat myself, for the sake of other readers (hey, I'm really sorry about this, you guys!). But to start with, the second word (*balenekarzkidake) doesn't even exist, it's a totally impossible form. Surely that is a sufficient reason?? However, perhaps this is not the place to continue with this line of discussion (actually I'm sure it isn't). We have to respect others reading this, so maybe we should go elsewhere if you really want to carry on arguing about this. For goodness sake though, how can you tell me your knowledge of Basque is only so-so and yet still insist repeatedly that the sentence in question is "real Basque" when I'm telling you, as a Basque speaker, that it's wrong? --A R King 16:46, 12 November 2006 (UTC)

This problem can be solved easily by using only example sentences from actual Basque-language publications that are reliable and easily verifiable. Anyone can come here and say that a sentence is perfectly fine Basque, but if the example is constructed by a non-native speaker then chances are that it isn't really. If the Basque phrase in the article comes from a verifiable and reliable source then please list the source in the article; if it doesn't then please remove it and replace it with another that is verifiable. My feeling is that A. R. King is right and the example is probably not grammatical Basque. The argument for this is simply that he says he is good at Basque and he is a professional linguist, and that Sugaar admits to having less than perfect Basque language skills. Based on this alone I would call for the example to be removed, as well as any claims of Basque being polysynthetic, until we are presented with sources for both the example sentence and the claim of polysynthesis. Maunus 17:52, 13 November 2006 (UTC)

You're probably right that it is a forced verb. The 2000 dictionary has bazenekarzkete and balekarzkete, both of which have ba-zen*-ekar-z-ke-te* "only" 5-6 morphemes (are the * marked ones only one together? or does -te express plural?). Can you improve the example instead? --Sugaar 18:16, 13 November 2006 (UTC)

Hi again, Maunus and Sugaar! Sugaar, your "verb" balenekarzkidake is more than forced, it is non-existent, ungrammatical, incorrect. The two verbs I think you are referring to in the verb tables of the Hiztegi 2000 dictionary are actually zenekarzkete which means 'you (pl.) would bring them' and lekarzkete meaning 'they would bring them'. Unless you are referring to bazenekartzate 'if you (pl.) were to bring them' and balekartzate 'if they were to bring them'. Closer to your balenekarzkidake, the following do exist: zenekarzkidake 'you (sg.) would bring them to me', lekarzkidake 'he/she would bring them to me', bazenekarzkit 'if you were to bring them to me' and balekarzkit 'if he/she were to bring them to me'. But nowhere, not even in a verb table, will you ever find *balenekarzkidake because it plain doesn't exist, there's no such thing. Sorry! As a Basque learner, you should notice that there are no forms in your tables that have both the prefix ba- and the suffix -ke at the same time, and you also want to observe that there is never a prefix shape len- as in the would-be form you are citing. The other thing that must be understood is that presence of a verb form among pages and pages of Basque paradigm tables does not ensure that the form is a normally used one, they're just telling you what the form (theoretically at least) would be if you needed to know. In fact, these are not commonly used forms; the usual procedure is to express these concepts using periphrastic forms. I'm sorry to have had to be so insistent about this, but there is a moral to the story, which I am trying to put across in the most diplomatic way possible: it's one thing to "be bold" as the Wikipedia motto goes, but another to speak with confident insistence about things we are not all that sure about, and I don't think it's a good idea, at least not when somebody is trying to tell you they know better. Cheers, --A R King 21:01, 13 November 2006 (UTC)

Kaixo, Sugaar eta Alan (hi, Sugaar and Alan). I am Xabier Armendaritz, a Basque native and professional translator, and I have arrived here by means of the note you posted in the article "Euskara" of the Basque Wikipedia. Excuse my poor English, I have to finish a big job for tomorrow (I am translating several texts from Spanish and English into Basque), so I do not have much time to improve my wording.

I wholly agree with Alan: the sentence "Liburuxkoarenak balenekarzkidake nindoakizuekeen" is absolutely unnatural and quite ungrammatical. The second word does not exist, the third means "I would have gone to you (pl.)" (and not "to them"), and the main verb lacks an adverbial: "nindoakizuekeen" cannot be used alone.

A correct translation of the sentence "If you had brought those of the small book [note that it is "book," singular] to me, I would have gone to them" would be "Liburuxkoarenak bazenekarzkit, haiengana nindoake" -- this is correct, though it still sounds unnatural and forced: "bazenekarzkit" is a very rare form, the usual form would be "ekarriko bazenizkit."

Hope this helps.

Xabier

Kaixo Xabier eta eskerrik asko (Hi Xabier and thank you).

To Maunus I would like to say that I did have a quick flick through some "reliable" linguistics sources (typology textbooks) the other day but found no allusions to Basque as an example of a polysynthetic language in them (although that doesn't prove much on its own).

I also went to the trouble of looking at a longer Basque text (longer than the first half of the Lord's Prayer!), one that is not translated from another language (to satisfy Sugaar), and not a made-up example, but the text of a picture book that I believe will be published soon, and written by Basques. The text is over 3600 words long. I ran a concordancer programme on the text to discover the longest words and from these selected the finite verb forms and graded them by the number of morphemes in each. The result is that only five finite verb forms in this 3666-word text are composed of five or more morphemes. These five words are:

- d-ituz-te-la-ko 'because they have them'
- bait-z-u-te-n 'since they had it'
- d-ir-en-o-ta-n 'in these which are'
- z-i-gu-n-etik 'from the one that he had to us' or 'since he had it to us'
- z-i-o-te-n-ean 'in the one that they had to him' or 'when they had it to him'

A few observations about these forms and what they tell us: 1) These are all words frequently used as tense auxiliaries, that is why the glosses don't seem to make much sense out of context, but they do convey the (maximum) level of complexity shown by such forms. 2) This does not mean that Basque does not have the 'potential' to produce longer, more complex forms. But it gives a measure of how much of such complexity normally characterises authentic Basque texts. 3) I have chosen to look at verbal forms, but there are also some nominal forms that show a comparable degree of synthesis; the arguments would be similar. --A R King 21:06, 14 November 2006 (UTC)
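The grading step described above (count the morphemes in each form and keep those with five or more) can be illustrated with a short sketch. This is not the concordancer that was actually used. The function names and the threshold parameter are hypothetical, and the only data are forms that appear, already segmented, in this discussion.

```python
def morpheme_count(segmented_word):
    """Number of morphemes in a word whose morpheme boundaries are marked by hyphens."""
    return len(segmented_word.split("-"))


def forms_at_or_above(segmented_words, threshold=5):
    """Keep the forms with at least `threshold` morphemes, longest first."""
    kept = [w for w in segmented_words if morpheme_count(w) >= threshold]
    return sorted(kept, key=morpheme_count, reverse=True)


# The five forms listed above, plus two shorter forms mentioned earlier
# in this discussion (e-ma-gu-zu and den) for contrast.
forms = [
    "d-ituz-te-la-ko",
    "bait-z-u-te-n",
    "d-ir-en-o-ta-n",
    "z-i-gu-n-etik",
    "z-i-o-te-n-ean",
    "e-ma-gu-zu",
    "den",
]

for form in forms_at_or_above(forms):
    print(morpheme_count(form), form)
```

With the segmentations as written, two of the five forms have six morphemes and three have five, while e-ma-gu-zu (four) and den (one) fall below the threshold.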

First of all, thanks to Xabier for his help. That removes any doubt: the sentence must be removed.

A R King also seems to have done a great job of investigation. Still, nearly all Basque words have at least two or three morphemes (single-morpheme words are relatively rare: nearly every noun is declined, nearly every verb has three or more morphemes, and there are no prepositions, nor is there any need to use pronouns in most cases). This, if considered average, is still quite a bit higher than in other European languages. Surely a synthesis level of more than 5 (even if it does exist) is not frequent but, on average, it should be more than the 2.2 that the "Pater noster" gave us.

At this point I leave the discussion to the experts, as I believe that A R King is perfectly able to reach the best conclusions, as he's both a linguist and a good euskaldun.

Enjoy, --Sugaar 00:04, 15 November 2006 (UTC)

Incidentally, even Duponceau does not consider Basque to be polysynthetic like the "Indian" languages. – ishwar (speak) 14:31, 15 November 2006 (UTC)

Which languages are polysynthetic? (an experiment)

Considering the above Basque discussion concluded, I think we still need to come back to the basic issue (from the point of view of the article, I mean): which languages can/should be cited as examples of polysynthesis? Is the list given towards the end of the article correct? (This was the question raised about Basque, but it can apply to other languages on that list too!) Should there even be such a list, or is it likely to be too misleading?

Having a couple of hours to waste this morning (maybe I was just looking for an excuse not to do some "real" work), I've been carrying out a little informal experiment (notice the emphasis on informal). This was already implicitly suggested in the course of the (Basque-centred) discussion about word and morpheme counts. What I have done is the following: take some sample texts of the Lord's Prayer in a range of languages and count the number of words in each. Notice that I have not counted the morphemes. This is in part because it would have taken too much of my time (and space, if I wanted to lay out the data here); partly because my knowledge or expertise concerning all these languages is insufficient for me to be able to do so confidently in every case; but also because what I have done here is quicker, easier, easier for others to replicate if they wish, and (it seems to me) yields results that have at least some relevance to the issue.

The thinking behind this is that assuming that all the texts say the same thing (have the same semantic content), it is reasonable to suppose that the fewer words they use to say it, the more information is packed into each word (at least at some level or in some sense; I am aware of several simplifications involved in this supposition, but it's an informal experiment, remember!). Although we are not actually counting the number of morphemes in these words, it seems reasonable to suppose that the more morphemes a word contains, the more semantic or grammatical content the word is likely to contain, and vice-versa. These suppositions would lead us to predict that, roughly, the more polysynthesis there is in a language (or in a text in a language), the fewer word boundaries there will be in the text, and that the higher the number of words, the fewer morphemes there probably are per word. Intuitively also this expectation seems to make sense. So let's try it and see what happens.

Important notes about the texts on which the following data are based:

The passages start from "Our father" in the middle of Matthew 6:9 and end with "from evil" at the end of verse 13.
I can, if necessary, either give sources for each text or reproduce the texts in their entirety. I haven't done this because it would have been more time-consuming, but if I am challenged to do so I will (or else I can do so for any specific language mentioned). Since this is informal, I am assuming unless told otherwise that you trust me on this.
In the right-hand column I have given word counts for the first lines of the prayer that were referred to in the above Basque discussion (up to but not including "Give us this day our daily bread"). This is partly just for curiosity, but it does also give us a sort of control for margin of error.

NUMBER OF WORDS IN LORD'S PRAYER
LANGUAGE	WORDS	first lines
Greenlandic	°°°°°°°°°°°°°°°°°°°°°°° (23)	10
Swahili	°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°° (35)	14
Basque	°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°° (44)	18
Latin	°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°° (49)	21
Chinese (pinyin)	°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°° (49)	25
Classical Nahuatl	°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°° (50)	20
English (RV)	°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°° (52)	24
Old English	°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°° (53)	23
Spanish	°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°° (55)	25
New Testament Greek	°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°° (57)	24
Fijian	°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°° (67)	26
Tok Pisin	°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°° (70)	35
(Chinese character count)	°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°° (79)	37
Hawaiian (Bible spelling)	°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°° (95)	36
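For anyone who wants to replicate or extend the comparison, the figures in the table can be put into a small script. This is only a sketch: the numbers are copied from the table above, and the helper names and the idea of expressing each count relative to Greenlandic are illustrative additions, not part of the original experiment.

```python
# Word counts copied from the table above: (whole passage, first lines).
WORD_COUNTS = {
    "Greenlandic": (23, 10),
    "Swahili": (35, 14),
    "Basque": (44, 18),
    "Latin": (49, 21),
    "Chinese (pinyin)": (49, 25),
    "Classical Nahuatl": (50, 20),
    "English (RV)": (52, 24),
    "Old English": (53, 23),
    "Spanish": (55, 25),
    "New Testament Greek": (57, 24),
    "Fijian": (67, 26),
    "Tok Pisin": (70, 35),
    "(Chinese character count)": (79, 37),
    "Hawaiian (Bible spelling)": (95, 36),
}


def rank_by_word_count(counts):
    """Sort languages from fewest to most words in the whole passage."""
    return sorted(counts.items(), key=lambda item: item[1][0])


def relative_length(counts, baseline):
    """Express each whole-passage count as a multiple of the baseline language's count."""
    base = counts[baseline][0]
    return {lang: total / base for lang, (total, _first) in counts.items()}


ranked = rank_by_word_count(WORD_COUNTS)
relative = relative_length(WORD_COUNTS, "Greenlandic")
for lang, (total, first) in ranked:
    print(f"{lang:28s}{total:4d}{first:4d}{relative[lang]:6.2f}")
```

The last column shows how many times more words each text uses than the Greenlandic one. It is a word count only and says nothing directly about morphemes per word.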

Observations / concluding thoughts:

The result seems close to what we would have expected for a scale of polysynthesis as regards the general categories and therefore looks meaningful. Some specific issues are mentioned below (and we could no doubt have an interesting discussion about others!).
The languages at the bottom of the scale are ones that employ lots of particles. As long as these are considered 'words', then we're okay with that.
For Chinese I used a pinyin (romanised) transliteration which shows words as such, not separate syllables/characters, as this seems linguistically more relevant. I have given the result based on counting the written characters as well for comparison. Since most (not all) characters correspond to morphemes, statements that Chinese is isolating, if taken to refer to modern Mandarin Chinese at least, are not to be taken too dogmatically!
I find it interesting the way all the European languages examined bunch together statistically, despite apparent contrasts between e.g. English and Latin. Even Basque isn't that far from the Indo-European languages tested. Some probable expectations about ordering among European languages are not fulfilled. I'm not sure about that, but my interpretation is that the main lesson here is that in a world context, European languages all occupy one niche (compare for example Oceanic languages, which have their niche too!).
The position of Classical Nahuatl comes as a surprise, but is due to its use of particles in conjunction with some highly synthetic words. Hmmm...
In my opinion, the only language in this table that ought to be called polysynthetic is Greenlandic. It is clearly in a different class, and the table brings this out nicely.
Another good point to make is that classifying a given language (Basque for example) taking European languages as the baseline will not give the same results as a world-wide typological perspective does. We seem to keep forgetting that, so take note!
My recommendation is that in the article, the words "The list below gives some families that are stereotypically polysynthetic..." be altered to "The list below gives some families that include some highly synthetic languages..." or something similar. They should not all be called polysynthetic, so that this term may be reserved, as originally intended, for the more extreme type represented by Greenlandic. In my opinion neither Swahili nor Basque belong here (not sure about other Bantu languages, but I'm doubtful). --A R King 09:27, 15 November 2006 (UTC)

For an attempt to put the term "polysynthesis" into a theoretical framework, read Mark C. Baker's The Polysynthesis Parameter. Even if one doesn't agree with his definition of the word, it is the most comprehensive study of the properties of languages of the type that are normally called polysynthetic, including some that don't fall under his strict definition of polysynthesis. While the above study is very interesting, it is original research and as such cannot be included in the article; also, if it were to have scientific weight, it would need a sounder theoretical and methodological foundation. I agree that there is no need to say that languages are or aren't polysynthetic. Since this is an encyclopedia, we can simply say that these languages are often referred to as polysynthetic by the following people and for the following reasons. We do not need to have a solid theoretical framework underlying our presentation of the word polysynthesis; we only need to describe its usage, and apart from Baker, linguists do not use the term as a strictly defined concept but rather as an impressionistic one. Maunus 11:36, 15 November 2006 (UTC)

I think you should merely list languages that are generally considered polysynthetic, as the term doesn't have a very rigorous definition (despite Baker's redefinition). Incidentally, Greenberg's 1960 "A quantitative approach to the morphological typology of language" in IJAL 26 discusses morpheme-word ratios. I haven't read this, but he's supposed to give the following: Eskimo 3.72, English 1.68, Sanskrit 2.59. Here is a list by Fortescue (1994):

Eskimo-Aleutian
Algonquian
Iroquoian
Caddoan
Na-Dene
Uto-Aztecan
Wakashan
Salishan
Hokan
Totonac-Tepehua
Mixe-Zoque
Luorawetlan

"less typical, but at least mildy polysynthetic": Siouan, Gulf, northern Penutian. Other "doubtful" ones: some languages of northwest Caucacus (e.g. Abkhaz, although non-incorporating & limited number of "slot fillers"), some languages of Sepik River area in Papua New Guinea (with serial verbs). Can "exclude languages where relatively long words do occur but are the result either of purely derivational compounding...[or] agglutinating chaining of a limited number of successive affixes to single stems": German, Sanskrit, Uralic, Altaic, Bantu, Australian. – ishwar (speak) 14:02, 15 November 2006 (UTC)

Evans & Sasse (2002) disagree with Fortescue and add northern Australia (e.g. Bininj Gun-wok, Ngalakgan, Rembarrnga), and "probably" Ket. Also they say: "we have included two further languages — Georgian and Nivkh — where their position is less clear, in the case of Georgian because of the existence of alternate strategies in which external argument encoding must be used in certain circumstances, and in the case of Nivkh because, although it exhibits many parallels with canonical polysynthetic languages in terms of the complexity of its morphology, the ‘dependent-head’ synthesis technique it employs does not include the representation of subjects on the verb". – ishwar (speak) 14:14, 15 November 2006 (UTC)

Johanna Mattissen also mentions: Lowland South America, Tiwi (Non-Pama-Nyungan), Munda (e.g. So:ra, Gorum), Papua New Guinea (Yimas, Awtuw). She also says that German, Russian, or South Caucasian are "not convincing polysynthetic languages". – ishwar (speak) 14:19, 15 November 2006 (UTC)
