Acetylseryltyrosylseryliso...serine

From Wikipedia, the free encyclopedia

The correct title of this article is too long. Article title lengths must be less than 256 characters because of technical restrictions.
Structure of the Tobacco mosaic virus coat protein.

Acetylseryltyrosylseryliso...serine is the chemical name for "Coat Protein, Tobacco mosaic virus, Dahlemense Strain". In its complete form, the chemical name contains 1185 letters and is one of the longest words in English.

The term was published in the American Chemical Society's Chemical Abstracts in 1972, and is considered by some to be the longest real word. It holds the record for the longest word published in an English-language publication in a serious context (that is, for some reason other than to publish a very long word), although larger proteins would generate still longer words if their names were written out in full.

In its complete form, the 1185-letter word is:

acetylseryltyrosylserylisoleucylthreonylserylprolylserylglutaminyl-
phenylalanylvalylphenylalanylleucylserylserylvalyltryptophylalanyl-
aspartylprolylisoleucylglutamylleucylleucylasparaginylvalylcysteinyl-
threonylserylserylleucylglycylasparaginylglutaminylphenylalanyl-
glutaminylthreonylglutaminylglutaminylalanylarginylthreonylthreonyl-
glutaminylvalylglutaminylglutaminylphenylalanylserylglutaminylvalyl-
tryptophyllysylprolylphenylalanylprolylglutaminylserylthreonylvalyl-
arginylphenylalanylprolylglycylaspartylvalyltyrosyllysylvalyltyrosyl-
arginyltyrosylasparaginylalanylvalylleucylaspartylprolylleucylisoleucyl-
threonylalanylleucylleucylglycylthreonylphenylalanylaspartylthreonyl-
arginylasparaginylarginylisoleucylisoleucylglutamylvalylglutamyl-
asparaginylglutaminylglutaminylserylprolylthreonylthreonylalanylglutamyl-
threonylleucylaspartylalanylthreonylarginylarginylvalylaspartylaspartyl-
alanylthreonylvalylalanylisoleucylarginylserylalanylasparaginylisoleucyl-
asparaginylleucylvalylasparaginylglutamylleucylvalylarginylglycyl-
threonylglycylleucyltyrosylasparaginylglutaminylasparaginylthreonyl-
phenylalanylglutamylserylmethionylserylglycylleucylvalyltryptophyl-
threonylserylalanylprolylalanylserine

The letter combination yl appears in the word 166 times.

Etymology

Although the term is dauntingly long, its construction is simple, because it describes a lengthy but structurally simple organic molecule: a single chain of functional groups (here, an acetyl group followed by amino acid residues) joined end to end. The name of such a molecule is formed by stringing together the names of its constituent functional groups in the order in which they occur in the chain. Every group except the last is named by its base form with the suffix replaced by yl; serine, for example, becomes seryl. Thus, to form the name, one need only:

  • List the name of each functional group in the molecule,
  • Replace the suffixes of all functional groups but the last with yl, and
  • Connect all the strings together.
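The three steps above can be sketched in Python. This is a minimal illustration, not a general nomenclature tool: the lookup table YL_FORM and the function build_name are hypothetical names, the table covers only the groups used in the example, and the example chain is a shortened stand-in made from the first few groups of the full word.

```python
# "-yl" forms of a few group names, spelled as they appear in the full word.
# A lookup table is used because the suffix replacement is not uniform
# across all group names.
YL_FORM = {
    "serine": "seryl",
    "tyrosine": "tyrosyl",
    "isoleucine": "isoleucyl",
    "threonine": "threonyl",
}


def build_name(groups):
    """Join group names, using the -yl form for every group but the last."""
    *head, last = groups
    # Names already ending in yl (such as "acetyl") pass through unchanged.
    return "".join(YL_FORM.get(g, g) for g in head) + last


# Hypothetical shortened chain: the first five groups of the molecule,
# ended early at threonine.
name = build_name(
    ["acetyl", "serine", "tyrosine", "serine", "isoleucine", "threonine"]
)
print(name)  # acetylseryltyrosylserylisoleucylthreonine
```

The last group keeps its unmodified name, which is why the full word ends in serine rather than seryl.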

As the molecule grows, so does its name, so for large molecules an abbreviated formula is more practical. This term can be written far more compactly in IUPAC notation, in which the abbreviations for the functional groups in the molecule are joined by hyphens:

Acetyl-Ser-Tyr-Ser-Ile-Thr-Ser-Pro-Ser-Gln-Phe-Val-Phe-Leu-Ser-Ser-Val-
Trp-Ala-Asp-Pro-Ile-Glu-Leu-Leu-Asn-Val-Cys-Thr-Ser-Ser-Leu-Gly-Asn-Gln-
Phe-Gln-Thr-Gln-Gln-Ala-Arg-Thr-Thr-Gln-Val-Gln-Gln-Phe-Ser-Gln-Val-Trp-
Lys-Pro-Phe-Pro-Gln-Ser-Thr-Val-Arg-Phe-Pro-Gly-Asp-Val-Tyr-Lys-Val-Tyr-
Arg-Tyr-Asn-Ala-Val-Leu-Asp-Pro-Leu-Ile-Thr-Ala-Leu-Leu-Gly-Thr-Phe-Asp-
Thr-Arg-Asn-Arg-Ile-Ile-Glu-Val-Glu-Asn-Gln-Gln-Ser-Pro-Thr-Thr-Ala-Glu-
Thr-Leu-Asp-Ala-Thr-Arg-Arg-Val-Asp-Asp-Ala-Thr-Val-Ala-Ile-Arg-Ser-Ala-
Asn-Ile-Asn-Leu-Val-Asn-Glu-Leu-Val-Arg-Gly-Thr-Gly-Leu-Tyr-Asn-Gln-Asn-
Thr-Phe-Glu-Ser-Met-Ser-Gly-Leu-Val-Trp-Thr-Ser-Ala-Pro-Ala-Ser
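Because the abbreviated formula carries the same information as the full word, one can be expanded into the other mechanically. The Python sketch below does this with a lookup table whose -yl spellings are copied from the full word above. The names YL, FULL, FORMULA and expand are illustrative choices, and FULL lists only the one terminal group needed here.

```python
# Abbreviation -> "-yl" form, spelled as in the full word.
YL = {
    "Acetyl": "acetyl", "Ala": "alanyl", "Arg": "arginyl",
    "Asn": "asparaginyl", "Asp": "aspartyl", "Cys": "cysteinyl",
    "Gln": "glutaminyl", "Glu": "glutamyl", "Gly": "glycyl",
    "Ile": "isoleucyl", "Leu": "leucyl", "Lys": "lysyl",
    "Met": "methionyl", "Phe": "phenylalanyl", "Pro": "prolyl",
    "Ser": "seryl", "Thr": "threonyl", "Trp": "tryptophyl",
    "Tyr": "tyrosyl", "Val": "valyl",
}
# The final group keeps its unmodified name; in this molecule it is serine.
FULL = {"Ser": "serine"}

FORMULA = """
Acetyl-Ser-Tyr-Ser-Ile-Thr-Ser-Pro-Ser-Gln-Phe-Val-Phe-Leu-Ser-Ser-Val-
Trp-Ala-Asp-Pro-Ile-Glu-Leu-Leu-Asn-Val-Cys-Thr-Ser-Ser-Leu-Gly-Asn-Gln-
Phe-Gln-Thr-Gln-Gln-Ala-Arg-Thr-Thr-Gln-Val-Gln-Gln-Phe-Ser-Gln-Val-Trp-
Lys-Pro-Phe-Pro-Gln-Ser-Thr-Val-Arg-Phe-Pro-Gly-Asp-Val-Tyr-Lys-Val-Tyr-
Arg-Tyr-Asn-Ala-Val-Leu-Asp-Pro-Leu-Ile-Thr-Ala-Leu-Leu-Gly-Thr-Phe-Asp-
Thr-Arg-Asn-Arg-Ile-Ile-Glu-Val-Glu-Asn-Gln-Gln-Ser-Pro-Thr-Thr-Ala-Glu-
Thr-Leu-Asp-Ala-Thr-Arg-Arg-Val-Asp-Asp-Ala-Thr-Val-Ala-Ile-Arg-Ser-Ala-
Asn-Ile-Asn-Leu-Val-Asn-Glu-Leu-Val-Arg-Gly-Thr-Gly-Leu-Tyr-Asn-Gln-Asn-
Thr-Phe-Glu-Ser-Met-Ser-Gly-Leu-Val-Trp-Thr-Ser-Ala-Pro-Ala-Ser
"""


def expand(formula):
    """Turn a hyphen-separated formula into the concatenated chemical name."""
    # Drop line breaks and other whitespace, then split on the hyphens.
    symbols = "".join(formula.split()).split("-")
    *head, last = symbols
    return "".join(YL[s] for s in head) + FULL[last]


word = expand(FORMULA)
# The article gives 1185 letters and 166 occurrences of "yl".
print(len(word), word.count("yl"))
```

Running the expansion is also a convenient way to check that the abbreviated formula and the full word agree with each other.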

Note that the molecule contains only 159 functional groups (an acetyl group and 158 amino acid residues), not the 167 that the yl count plus one terminal functional group would predict. The difference comes from the eight occurrences of phenylalanine, each of which contributes the yl string twice (in the form -phenylalanyl-).
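The double counting can be seen on a short fragment taken from the full word. The Python sketch below uses illustrative variable names, and the figure of eight phenylalanines is obtained by counting Phe in the abbreviated formula. It first counts yl in the fragment, then applies the same correction to the whole-word figures, counting the leading acetyl group as one of the groups.

```python
# Four consecutive groups from the full word; two of them are phenylalanyl.
fragment_groups = ["phenylalanyl", "valyl", "phenylalanyl", "leucyl"]
fragment = "".join(fragment_groups)

yl_count = fragment.count("yl")                          # 6 occurrences
extra = sum(g.count("yl") - 1 for g in fragment_groups)  # 2 from "phenyl"
assert yl_count - extra == len(fragment_groups)          # 6 - 2 == 4 groups

# The same correction applied to the whole word.
YL_TOTAL = 166        # occurrences of "yl" stated in the article
PHENYLALANINES = 8    # occurrences of Phe in the abbreviated formula
groups_total = YL_TOTAL - PHENYLALANINES + 1  # +1 for the terminal serine
print(groups_total)   # 159: the acetyl group plus 158 amino acid residues
```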

See also