DNA digital data storage

DNA digital data storage is the process of storing digital data in the base sequence of DNA. The technology uses artificial DNA, made with commercially available oligonucleotide synthesis machines, for storage, and DNA sequencing machines for retrieval. Because of the data density of DNA, this type of storage system is more compact than current magnetic tape or hard drive storage systems: in 2017 it was reported that 215 petabytes (215 million gigabytes) could be stored in 1 gram of DNA.^[1] It also offers longevity, provided the DNA is kept in cold, dry and dark conditions, as shown by the study of woolly mammoth DNA up to 60,000 years old, and resistance to obsolescence, since DNA is a universal and fundamental data storage mechanism in biology. These features have led researchers involved in its development to call this method of data storage "apocalypse-proof" because "after a hypothetical global disaster, future generations might eventually find the stores and be able to read them."^[2] It is, however, a slow process, since the DNA must be sequenced to retrieve the data, so the method is intended for uses with a low access rate, such as long-term archival of large amounts of scientific data.^[2]^[3]
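The reported density can be turned into a rough size estimate. The following is a minimal arithmetic sketch in Python that takes the 215 petabytes per gram figure at face value and assumes decimal (SI) units; the function name `grams_needed` and the one-exabyte archive are hypothetical, and the calculation ignores any redundancy or overhead a real system would need.

```python
# Density quoted in the text: 215 petabytes per gram of DNA.
PETABYTE = 10**15  # bytes, decimal (SI) definition (an assumption)
GIGABYTE = 10**9   # bytes

BYTES_PER_GRAM = 215 * PETABYTE


def grams_needed(num_bytes: int) -> float:
    """Mass of DNA (in grams) needed to hold num_bytes at the quoted density."""
    return num_bytes / BYTES_PER_GRAM


# 215 PB expressed in gigabytes; this is the "215 million gigabytes" of the text.
gigabytes_per_gram = BYTES_PER_GRAM // GIGABYTE

# Hypothetical example: a 1 exabyte (10**18 byte) archive.
archive_grams = grams_needed(10**18)
print(f"{gigabytes_per_gram:,} GB per gram; 1 EB needs about {archive_grams:.2f} g")
```

Under these assumptions, an exabyte-scale archive would need only a few grams of DNA.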

History

The idea of recording, storing and retrieving information on DNA molecules, together with general considerations about its feasibility, was first put forward by Mikhail Neiman and published in 1964–65 in the journal Radiotekhnika (USSR). The technology may therefore be referred to as MNeimONics, and the storage device as MNeimON (Mikhail Neiman OligoNucleotides).^[4]

An early example of DNA data storage was a device created in 2007 at the University of Arizona,^[5] which used addressing molecules to encode mismatch sites within a DNA strand. The mismatches could then be read out by performing a restriction digest, thereby recovering the data. This system has a number of advantages over other methods. First, unlike methods in which bespoke molecules are synthesised for each new DNA encoding, a common set of molecules can be used to encode arbitrary data. Because DNA synthesis is currently expensive and laborious, this means that the investment in one set of DNA molecules can be reused to encode many different sets of data. Second, the encoded DNA is "bio-compatible", meaning that in principle it can be readily inserted into, and propagated within, an organism.

On August 16, 2012, the journal Science published research by George Church and colleagues at Harvard University in which DNA was encoded with digital information that included an HTML draft of a 53,400-word book written by the lead researcher, eleven JPG images and one JavaScript program. Multiple copies were added for redundancy, and the reported density was 5.5 petabits per cubic millimeter of DNA.^[6] The researchers used a simple code in which bits were mapped one-to-one to bases; its shortcoming was that it led to long runs of the same base, which are error-prone to sequence. The result showed that, besides its other functions, DNA can also serve as a storage medium, like hard drives and magnetic tapes.^[2]
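To illustrate why a one-bit-per-base code can produce long runs of the same base, here is a minimal Python sketch. The fixed mapping (0 to A, 1 to G) and the function names are assumptions chosen for illustration; this is not the code used in the published work.

```python
from itertools import groupby

# Assumed mapping, for illustration only: one bit per base, fixed choice.
BIT_TO_BASE = {"0": "A", "1": "G"}
BASE_TO_BIT = {base: bit for bit, base in BIT_TO_BASE.items()}


def bytes_to_bases(data: bytes) -> str:
    """Encode each bit of data as one base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BIT_TO_BASE[bit] for bit in bits)


def bases_to_bytes(bases: str) -> bytes:
    """Invert bytes_to_bases (length must be a multiple of 8 bases)."""
    bits = "".join(BASE_TO_BIT[base] for base in bases)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def longest_run(bases: str) -> int:
    """Length of the longest stretch of identical consecutive bases."""
    return max((len(list(group)) for _, group in groupby(bases)), default=0)


strand = bytes_to_bases(b"\x00\xff")  # eight 0 bits followed by eight 1 bits
print(strand, longest_run(strand))
```

With this naive mapping, any run of identical bits in the input becomes a run of identical bases of the same length, which is the weakness described above.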

An improved system was reported in the journal Nature in January 2013, in an article led by researchers from the European Bioinformatics Institute (EBI) and submitted at around the same time as the paper of Church and colleagues. More than five million bits of data, consisting of text and audio files and held in DNA that appeared to the researchers as a speck of dust, were successfully stored and then perfectly retrieved and reproduced. The encoded information consisted of all 154 of Shakespeare's sonnets, a twenty-six-second audio clip of the "I Have a Dream" speech by Martin Luther King, the well-known paper on the structure of DNA by James Watson and Francis Crick, a photograph of EBI headquarters in Hinxton, United Kingdom, and a file describing the methods used to convert the data. All the DNA files reproduced the information with between 99.99% and 100% accuracy.^[3] The main innovations in this research were the use of an error-correcting encoding scheme to ensure an extremely low data-loss rate, and the idea of encoding the data in a series of overlapping short oligonucleotides identifiable through a sequence-based indexing scheme.^[2] The sequences of the individual strands of DNA overlapped in such a way that each region of data was repeated four times to avoid errors. Two of these four strands were constructed backwards, also with the goal of eliminating errors.^[3] The costs per megabyte were estimated at $12,400 to encode data and $220 for retrieval. However, it was noted that the exponential decrease in DNA synthesis and sequencing costs, if it continues, should make the technology cost-effective for long-term data storage within about ten years.^[2]
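The overlapping layout described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the fragment length, the step of one quarter of a fragment, the reading of "backwards" as reverse complement, and all function names are choices made here, and the error-correcting code and indexing scheme are omitted.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def overlapping_fragments(seq: str, length: int = 8) -> list:
    """Cut seq into fragments that each start a quarter-fragment later.

    Every second fragment is stored as its reverse complement
    (an assumed reading of "constructed backwards").
    Returns a list of (index, fragment) pairs.
    """
    if length % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    step = length // 4
    fragments = []
    for index, start in enumerate(range(0, len(seq) - length + 1, step)):
        piece = seq[start:start + length]
        if index % 2 == 1:
            piece = reverse_complement(piece)
        fragments.append((index, piece))
    return fragments


def coverage_counts(seq_len: int, length: int) -> list:
    """Count how many fragments cover each position of the sequence."""
    step = length // 4
    counts = [0] * seq_len
    for start in range(0, seq_len - length + 1, step):
        for pos in range(start, start + length):
            counts[pos] += 1
    return counts


frags = overlapping_fragments("AACCGGTTAACCGGTT", length=8)
counts = coverage_counts(16, 8)
print(frags)
print(counts)
```

In this sketch, interior positions are covered by four fragments, while positions near the two ends are covered by fewer; a real scheme would need padding or a similar measure to give the ends the same protection.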

The long-term stability of data encoded in DNA was reported in February 2015, in an article by researchers from ETH Zurich. By adding redundancy via Reed–Solomon error correction coding and by encapsulating the DNA within silica glass spheres via sol-gel chemistry, the researchers predicted error-free information recovery after up to 1 million years at −18 °C and 2000 years if stored at 10 °C.^[7]^[8] Because the scheme can handle errors, the research team could reduce the cost of DNA synthesis to about $500/MB by choosing a more error-prone DNA synthesis method. In a news article in New Scientist, the team stated that if they were able to decrease the cost further, they would store an archive version of Wikipedia in DNA.

A group of researchers led by Boise State University is also working toward a better way to store digital information using nucleic acid memory (NAM). At the time of their 2016 report, the global flash memory market was predicted to reach $30.2 billion that year, potentially growing to $80.3 billion by 2025. They estimated that by 2040 global demand for memory will exceed the projected supply of silicon (the raw material used in flash memory), and that nucleic acid memory has a retention time far exceeding that of electronic memory. They discussed the longevity of DNA materials through first-principles theoretical calculations, published as a commentary article.^[9] According to their claims, "With information retention times that range from thousands to millions of years, volumetric density 10³ times greater than flash memory and energy of operation 10⁸ times less, we believe that DNA used as a memory-storage material in nucleic acid memory (NAM) products promises a viable and compelling alternative to electronic memory." and "Given exponentially increasing demands for safeguarded information worldwide, and the long retention times for DNA (ranging from thousands to millions of years), NAM can store the world's information for future generations using far less space and energy. NAM could thus be used as a time capsule for massive, infrequently accessed records in scientific, financial, governmental, historical, genealogical, personal and genetic domains."^[9]

The above methods of DNA storage have the disadvantage that the whole strand of synthetic DNA has to be sequenced in order to retrieve only one of several data sets previously encoded in it. In April 2016, researchers at the University of Washington published an encoding, storage, retrieval and decoding method that enables random access to any one of the data sets.^[10]

In March 2017, scientists at Columbia University and the New York Genome Center published a method known as DNA Fountain, which allows perfect retrieval of information at a density of 215 petabytes per gram of DNA. The technique approaches the Shannon capacity of DNA storage, achieving 85% of the theoretical limit. Using this method, they perfectly retrieved an operating system called KolibriOS, the French film Arrival of a Train at La Ciotat, a $50 Amazon gift card, a computer virus, a Pioneer plaque and a study by Claude Shannon, totalling 2.14 megabytes. They also tested a process that allows 2.18 × 10¹⁵ retrievals from the original DNA sample, and were again able to decode the data perfectly. The method is, however, not ready for large-scale use, as it costs $7000 to synthesize 2 megabytes of data and another $2000 to read it.^[11]^[12]^[13]^[14]
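The name DNA Fountain refers to fountain codes, in which data is split into segments and an open-ended stream of "droplets" is generated, each the XOR of a pseudo-random subset of segments. The Python sketch below shows a generic fountain code with a simple peeling decoder. It is not the published method, which includes further steps not modelled here; the uniform degree distribution, the segment size and all function names are assumptions made for brevity.

```python
import random


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def chosen_indices(seed: int, num_segments: int) -> set:
    """Segment indices combined in the droplet with this seed.

    Encoder and decoder both derive the subset from the seed alone.
    The degree is drawn uniformly (an assumption made for brevity).
    """
    rng = random.Random(seed)
    degree = rng.randint(1, num_segments)
    return set(rng.sample(range(num_segments), degree))


def make_droplet(segments: list, seed: int) -> tuple:
    """Return (seed, payload), where payload is the XOR of the chosen segments."""
    payload = bytes(len(segments[0]))
    for i in chosen_indices(seed, len(segments)):
        payload = xor_bytes(payload, segments[i])
    return seed, payload


def decode(droplets: list, num_segments: int):
    """Peeling decoder: return the list of segments, or None if more droplets are needed."""
    recovered = {}
    pending = [(chosen_indices(seed, num_segments), payload)
               for seed, payload in droplets]
    progress = True
    while progress and len(recovered) < num_segments:
        progress = False
        remaining = []
        for indices, payload in pending:
            # Remove the contribution of segments that are already known.
            for i in [i for i in indices if i in recovered]:
                payload = xor_bytes(payload, recovered[i])
                indices = indices - {i}
            if len(indices) == 1:
                (i,) = indices
                recovered[i] = payload
                progress = True
            elif len(indices) > 1:
                remaining.append((indices, payload))
        pending = remaining
    if len(recovered) < num_segments:
        return None
    return [recovered[i] for i in range(num_segments)]


SEGMENT_SIZE = 4  # bytes per segment (hypothetical)
data = bytes(range(32))
segments = [data[i:i + SEGMENT_SIZE] for i in range(0, len(data), SEGMENT_SIZE)]

# Keep producing droplets until the decoder can recover every segment.
droplets = []
decoded = None
for seed in range(10_000):
    droplets.append(make_droplet(segments, seed))
    decoded = decode(droplets, len(segments))
    if decoded is not None:
        break

print(len(droplets), "droplets used;", b"".join(decoded) == data)
```

The decoder does not depend on receiving any particular droplet, only on receiving enough of them, which is why losing some strands need not lose data. Practical fountain codes use a carefully designed degree distribution so that only slightly more droplets than segments are needed.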

References

1. http://www.sciencemag.org/news/2017/03/dna-could-store-all-worlds-data-one-room
2. Yong, E. (2013). "Synthetic double-helix faithfully stores Shakespeare's sonnets". Nature. doi:10.1038/nature.2013.12279.
3. Goldman, N.; Bertone, P.; Chen, S.; Dessimoz, C.; Leproust, E. M.; Sipos, B.; Birney, E. (2013). "Towards practical, high-capacity, low-maintenance information storage in synthesized DNA". Nature. 494 (7435): 77–80. PMC 3672958. PMID 23354052. doi:10.1038/nature11875.
4. https://sites.google.com/site/msneiman1905/eng
5. Skinner, Gary M.; Visscher, Koen; Mansuripur, Masud (2007-06-01). "Biocompatible Writing of Data into DNA". Journal of Bionanoscience. 1 (1): 17–21. doi:10.1166/jbns.2007.005.
6. Church, G. M.; Gao, Y.; Kosuri, S. (2012). "Next-Generation Digital Information Storage in DNA". Science. 337 (6102): 1628. PMID 22903519. doi:10.1126/science.1226355.
7. Grass, R. N.; Heckel, R.; Puddu, M.; Paunescu, D.; Stark, W. J. (2015). "Robust Chemical Preservation of Digital Information on DNA in Silica with Error-Correcting Codes". Angewandte Chemie International Edition. 54 (8): 2552. PMID 25650567. doi:10.1002/anie.201411378.
8. Jacobs, Angelika (February 13, 2015). "Data-storage for eternity". Eidgenössische Technische Hochschule (ETH) Zürich. Archived from the original on March 15, 2015. Retrieved March 15, 2015.
9. Zhirnov, V.; Zadegan, R. M.; Sandhu, G. S.; Church, G. M.; Hughes, W. L. (2016). "Nucleic acid memory". Nature Materials. 15 (4): 366–370. doi:10.1038/nmat4594.
10. "A DNA-Based Archival Storage System". http://doi.acm.org/10.1145/2872362.2872397
11. Yong, Ed. "This Speck of DNA Contains a Movie, a Computer Virus, and an Amazon Gift Card". The Atlantic. Retrieved 3 March 2017.
12. "Researchers store computer operating system and short movie on DNA". Phys.org. Retrieved 3 March 2017.
13. "DNA could store all of the world's data in one room". Science Magazine. 2 March 2017. Retrieved 3 March 2017.
14. Erlich, Yaniv; Zielinski, Dina (2 March 2017). "DNA Fountain enables a robust and efficient storage architecture". Science. 355 (6328): 950–954. doi:10.1126/science.aaj2038. Retrieved 3 March 2017.
