Acetylseryltyrosylseryliso...serine
From Wikipedia, the free encyclopedia
- The full name of this protein is 1185 letters long. It has been shortened above for practical reasons.
This 1185-letter "word" is the chemical name for "Coat Protein, Tobacco Mosaic Virus, Dahlemense Strain". It is supposedly the third-longest word in the English language.
The term was published in the American Chemical Society's Chemical Abstracts in 1972, and is considered by some to be the longest real word. It does hold the record for the longest word to appear in an English-language publication in a serious context, that is, for some reason other than to publish a very long word.
In its complete form, the 1185-letter word is:
- acetylseryltyrosylserylisoleucylthreonylserylprolylserylglutaminyl-
- phenylalanylvalylphenylalanylleucylserylserylvalyltryptophylalanyl-
- aspartylprolylisoleucylglutamylleucylleucylasparaginylvalylcysteinyl-
- threonylserylserylleucylglycylasparaginylglutaminylphenylalanyl-
- glutaminylthreonylglutaminylglutaminylalanylarginylthreonylthreonyl-
- glutaminylvalylglutaminylglutaminylphenylalanylserylglutaminylvalyl-
- tryptophyllysylprolylphenylalanylprolylglutaminylserylthreonylvalyl-
- arginylphenylalanylprolylglycylaspartylvalyltyrosyllysylvalyltyrosyl-
- arginyltyrosylasparaginylalanylvalylleucylaspartylprolylleucylisoleucyl-
- threonylalanylleucylleucylglycylthreonylphenylalanylaspartylthreonyl-
- arginylasparaginylarginylisoleucylisoleucylglutamylvalylglutamyl-
- asparaginylglutaminylglutaminylserylprolylthreonylthreonylalanylglutamyl-
- threonylleucylaspartylalanylthreonylarginylarginylvalylaspartylaspartyl-
- alanylthreonylvalylalanylisoleucylarginylserylalanylasparaginylisoleucyl-
- asparaginylleucylvalylasparaginylglutamylleucylvalylarginylglycyl-
- threonylglycylleucyltyrosylasparaginylglutaminylasparaginylthreonyl-
- phenylalanylglutamylserylmethionylserylglycylleucylvalyltryptophyl-
- threonylserylalanylprolylalanylserine
The letter combination yl appears in the word 166 times.
Etymology
While this term may seem daunting in its length, its construction is quite simple, because it describes a lengthy but structurally simple molecule: a single unbranched chain. Such a chain is built from many units connected end to end, in this case an acetyl group followed by amino acid residues. Its name is formed by stringing the names of those units together in the order in which they occur in the molecule. Every unit except the last is named using its base form with the suffix replaced by yl (serine becomes seryl, for example). Thus, to form the name, all one must do is:
- List the name of each unit in the molecule,
- Replace the suffixes of all units but the last with yl, and
- Connect all the strings together.
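The three steps above can be sketched in a few lines of Python. This is only an illustration: the function name peptide_name and the two lookup tables are assumptions made for this sketch, and the tables cover just a handful of units. A lookup table is used rather than a mechanical suffix rule because the stem is not formed uniformly (serine becomes seryl, but glutamine becomes glutaminyl).

```python
# "-yl" form of each unit, used for every position except the last.
# Partial table for illustration only.
YL_FORM = {
    "Acetyl": "acetyl",
    "Ser": "seryl",
    "Tyr": "tyrosyl",
    "Ile": "isoleucyl",
    "Gln": "glutaminyl",
    "Phe": "phenylalanyl",
}

# Unmodified name, used only for the last unit in the chain.
FREE_NAME = {
    "Ser": "serine",
    "Tyr": "tyrosine",
    "Ile": "isoleucine",
    "Gln": "glutamine",
    "Phe": "phenylalanine",
}


def peptide_name(units):
    """Join the -yl forms of all units but the last, then the last unit's full name."""
    *head, last = units
    return "".join(YL_FORM[u] for u in head) + FREE_NAME[last]


# The first four units of the formula, treated here as if they were a whole molecule.
print(peptide_name(["Acetyl", "Ser", "Tyr", "Ser"]))  # acetylseryltyrosylserine
```

The example fragment is not a name that appears in the source; it only shows how the opening of the long word is assembled.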
As the molecule becomes larger, the name becomes longer as well, and an abbreviated formula becomes more practical. The formula for this term can be written far more compactly in IUPAC notation, as shown next (where three-letter abbreviations for all the units present in the molecule are connected by hyphens):
- Acetyl-Ser-Tyr-Ser-Ile-Thr-Ser-Pro-Ser-Gln-Phe-Val-Phe-Leu-Ser-Ser-Val-
- Trp-Ala-Asp-Pro-Ile-Glu-Leu-Leu-Asn-Val-Cys-Thr-Ser-Ser-Leu-Gly-Asn-Gln-
- Phe-Gln-Thr-Gln-Gln-Ala-Arg-Thr-Thr-Gln-Val-Gln-Gln-Phe-Ser-Gln-Val-Trp-
- Lys-Pro-Phe-Pro-Gln-Ser-Thr-Val-Arg-Phe-Pro-Gly-Asp-Val-Tyr-Lys-Val-Tyr-
- Arg-Tyr-Asn-Ala-Val-Leu-Asp-Pro-Leu-Ile-Thr-Ala-Leu-Leu-Gly-Thr-Phe-Asp-
- Thr-Arg-Asn-Arg-Ile-Ile-Glu-Val-Glu-Asn-Gln-Gln-Ser-Pro-Thr-Thr-Ala-Glu-
- Thr-Leu-Asp-Ala-Thr-Arg-Arg-Val-Asp-Asp-Ala-Thr-Val-Ala-Ile-Arg-Ser-Ala-
- Asn-Ile-Asn-Leu-Val-Asn-Glu-Leu-Val-Arg-Gly-Thr-Gly-Leu-Tyr-Asn-Gln-Asn-
- Thr-Phe-Glu-Ser-Met-Ser-Gly-Leu-Val-Trp-Thr-Ser-Ala-Pro-Ala-Ser
Note that this molecule contains only 159 units, namely the acetyl group plus 158 amino acid residues (not 167, as the 166 occurrences of yl plus one terminal unit would suggest). The difference arises because phenylalanine, which occurs eight times, contributes the yl string twice per occurrence (in the form -phenylalanyl-).
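The counts can be checked directly from the abbreviated formula. The sketch below is an illustration under stated assumptions: the variable names and the lookup table are made up for this example, with the -yl spellings taken from the full word given earlier. It counts the acetyl group as a unit alongside the 158 amino acid residues, so it reports 159 units in total.

```python
# Three-letter formula copied from the text above.
FORMULA = """
Acetyl-Ser-Tyr-Ser-Ile-Thr-Ser-Pro-Ser-Gln-Phe-Val-Phe-Leu-Ser-Ser-Val-
Trp-Ala-Asp-Pro-Ile-Glu-Leu-Leu-Asn-Val-Cys-Thr-Ser-Ser-Leu-Gly-Asn-Gln-
Phe-Gln-Thr-Gln-Gln-Ala-Arg-Thr-Thr-Gln-Val-Gln-Gln-Phe-Ser-Gln-Val-Trp-
Lys-Pro-Phe-Pro-Gln-Ser-Thr-Val-Arg-Phe-Pro-Gly-Asp-Val-Tyr-Lys-Val-Tyr-
Arg-Tyr-Asn-Ala-Val-Leu-Asp-Pro-Leu-Ile-Thr-Ala-Leu-Leu-Gly-Thr-Phe-Asp-
Thr-Arg-Asn-Arg-Ile-Ile-Glu-Val-Glu-Asn-Gln-Gln-Ser-Pro-Thr-Thr-Ala-Glu-
Thr-Leu-Asp-Ala-Thr-Arg-Arg-Val-Asp-Asp-Ala-Thr-Val-Ala-Ile-Arg-Ser-Ala-
Asn-Ile-Asn-Leu-Val-Asn-Glu-Leu-Val-Arg-Gly-Thr-Gly-Leu-Tyr-Asn-Gln-Asn-
Thr-Phe-Glu-Ser-Met-Ser-Gly-Leu-Val-Trp-Thr-Ser-Ala-Pro-Ala-Ser
"""

# "-yl" form of every unit that occurs in the formula.
YL_FORM = {
    "Acetyl": "acetyl", "Ala": "alanyl", "Arg": "arginyl",
    "Asn": "asparaginyl", "Asp": "aspartyl", "Cys": "cysteinyl",
    "Gln": "glutaminyl", "Glu": "glutamyl", "Gly": "glycyl",
    "Ile": "isoleucyl", "Leu": "leucyl", "Lys": "lysyl",
    "Met": "methionyl", "Phe": "phenylalanyl", "Pro": "prolyl",
    "Ser": "seryl", "Thr": "threonyl", "Trp": "tryptophyl",
    "Tyr": "tyrosyl", "Val": "valyl",
}

# Only the final unit keeps its unmodified name; here that is serine.
FREE_NAME = {"Ser": "serine"}

# Drop the line breaks, then split on hyphens to recover the units in order.
units = [u for u in FORMULA.replace("\n", "").split("-") if u]
name = "".join(YL_FORM[u] for u in units[:-1]) + FREE_NAME[units[-1]]

n_units = len(units)         # acetyl group plus amino acid residues
n_yl = name.count("yl")      # occurrences of the letter pair "yl"
n_phe = units.count("Phe")   # each phenylalanyl carries "yl" twice

print(n_units, n_yl, n_phe, len(name))  # 159 166 8 1185

# Every unit but the last contributes one "yl"; each Phe adds one more.
assert n_yl == (n_units - 1) + n_phe
```

In other words, 166 = (159 - 1) + 8: one yl for each of the 158 non-terminal units, plus one extra for each of the eight phenylalanine residues.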