Is DNA the Next Step for Information Storage?

Our Digital Future

Is DNA the next medium for information storage? Scientists Nick Goldman and Ewan Birney seem to think it is. Researchers at the EMBL-European Bioinformatics Institute (EMBL-EBI), Goldman and Birney explain the details of their discovery in the current issue of Nature.

According to researchers at EMBL-EBI, DNA’s structure makes it possible to pack large amounts of biological information into a very small space. Theoretically, this characteristic could be extended to digital information as well, a process that is not only incredible, but essential. The 21st-century data crisis that archives and other organizations are facing is no surprise. The volume of information generated every day is growing exponentially, with researchers estimating that the average person spends about twelve hours per day consuming information. Using digital media measurements, this translates to around 34 gigabytes of information for the average person during an average day (Report on American Consumers, 2009). Much of this data is critical to understanding human history and intellectual development, and consequently is crucial to preserve. But how is this possible?

The inspiration for the breakthrough came from this exact question, which the scientists faced in their daily work. EMBL-EBI is responsible for creating, maintaining, storing, and providing access to some of the most complex biological databases in the world, but like digital archives everywhere, it is quickly running out of space. Financial concerns exacerbate the problem. While data creation is growing exponentially, budgets to store this information are not increasing at the same pace. This forces organizations to focus on what and how much content humans can afford to lose, rather than concentrating efforts on where to save the data. At the same time, attempting to save too much data yields additional challenges. To record large amounts of information, specialists are being forced to rely on "lossy" rather than lossless compression. While lossy compression makes files smaller, it cannot recreate the original file exactly. This creates difficulties for any audio, video, and images that need to be preserved. With pictures especially, lossy compression is very noticeable because it degrades the overall quality of the image.

What's the difference between lossy and lossless compression?
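One way to see the difference is a small Python sketch. It is an illustration only: `lossless_round_trip`, `lossy_pack`, and `lossy_unpack` are hypothetical helper names, and the "lossy" step is a crude stand-in (keeping only the top four bits of each byte) rather than a real audio or image codec.

```python
import zlib


def lossless_round_trip(data: bytes) -> bytes:
    """Compress and decompress with zlib; every byte comes back unchanged."""
    return zlib.decompress(zlib.compress(data))


def lossy_pack(samples: bytes) -> bytes:
    """Keep only the top 4 bits of each sample and store two samples per byte.

    Assumes an even number of samples. The output is half the size,
    but the low 4 bits of every sample are thrown away for good.
    """
    return bytes(
        (samples[i] & 0xF0) | (samples[i + 1] >> 4)
        for i in range(0, len(samples), 2)
    )


def lossy_unpack(packed: bytes) -> bytes:
    """Rebuild approximate samples; the discarded low bits come back as zeros."""
    out = bytearray()
    for byte in packed:
        out.append(byte & 0xF0)
        out.append((byte & 0x0F) << 4)
    return bytes(out)


samples = bytes([200, 37, 18, 255])
print(lossless_round_trip(samples) == samples)       # True: exact copy
print(list(lossy_unpack(lossy_pack(samples))))       # [192, 32, 16, 240]: close, not exact
```

The lossy version is smaller, but the original values 200, 37, 18, and 255 can never be recovered from it, which is exactly the problem for material that must be preserved.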

However, Goldman and Birney realized that a solution to these problems existed right in front of them. DNA, the material they worked with every day, was an extremely efficient way to store information. They hypothesized that files in any format could be stored in it as well. What they discovered was not only incredible, but true.

How it Works

The genetic structure of the human body relies on combinations of four primary nucleotides: adenine, cytosine, guanine, and thymine. Each of these chemicals is considered a “base,” and they are abbreviated as A, C, G, and T. These four letters are the foundation for the chemical language that spells out different genetic instructions for human cells. Scientists have postulated that sequences of these bases could be mapped to sequences of computer code, which serve as the building blocks of digital files. With a simple cipher, Goldman and Birney translated the zeroes and ones of basic computer language into the four letters of DNA. The challenge was in successfully swapping considerable amounts of genetic information with cultural material. The scientists tried it – and it worked.
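To make the idea of a cipher concrete, here is a minimal Python sketch. It assumes the simplest possible mapping, two bits per base, which is not the scheme Goldman and Birney published; the table `BITS_TO_BASE` and the function names are hypothetical and chosen only for illustration.

```python
# Hypothetical mapping for illustration: two bits of data per DNA base.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Write each byte as 8 bits, then turn every pair of bits into a base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(dna: str) -> bytes:
    """Run the cipher backwards: bases to bits, then bits to bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


print(bytes_to_dna(b"Hi"))        # CAGACGGC
print(dna_to_bytes("CAGACGGC"))   # b'Hi'
```

Any file is ultimately a string of bytes, so the same round trip applies equally to text, audio, or images.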

First, they started with a .txt file of all 154 of Shakespeare’s sonnets. Then, they encoded an .mp3 of Dr. Martin Luther King Jr.’s “I have a dream” speech, as well as a .jpg picture of their office, a .pdf of an early Watson and Crick paper, and a file that describes the encoding. They ran all of the instructions and the binary code through a computer program, before shipping off the code blueprint to Agilent Technologies, a biotechnology company that was spun off from Hewlett-Packard. Agilent Technologies synthesized the DNA and mailed it back.

What made Goldman and Birney’s research even more noteworthy was the fact that they included error-checking routines to enable reliable retrieval. Their discovery was not the first time a scientist had considered placing data on DNA, but it was the first study to propose a strategy that reduced mistakes in the data. Earlier, in August 2012, George Church, a professor of genetics at Harvard, had demonstrated a similar idea with the successful encoding of a book in DNA (both sides state that they were not aware of each other’s research).

For more information on George Church and his research please consult: http://newsfeed.time.com/2012/08/20/the-first-book-to-be-encoded-in-dna/

To help reduce the error rates in the data, the two scientists invented a method that uses strings of bases in which no base repeats. They made each base dependent on the one preceding it, and they broke the code into small, overlapping fragments that could be read both backwards and forwards. An index showed where each fragment belonged within the entire string of code. Because the code could not repeat bases, errors would be extremely infrequent. For the code to fail, the same error would have to occur in four different fragments, which is highly unlikely.
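The two ideas in this paragraph, a code that never repeats a base and overlapping indexed fragments, can be sketched in Python. This is a simplified illustration under stated assumptions, not the published method: each byte is written as six base-3 digits, the fragment length of 8 and step of 2 are arbitrary small values chosen so that most positions appear in four fragments, every other fragment is simply reversed, and all function names are hypothetical.

```python
from collections import Counter

BASES = "ACGT"
TRITS_PER_BYTE = 6  # assumption: 3**6 = 729 values, enough for one byte


def encode_no_repeat(data: bytes, start: str = "A") -> str:
    """Each base-3 digit picks one of the three bases that differ from the previous base."""
    prev, out = start, []
    for byte in data:
        trits = [(byte // 3 ** k) % 3 for k in reversed(range(TRITS_PER_BYTE))]
        for trit in trits:
            prev = [b for b in BASES if b != prev][trit]
            out.append(prev)
    return "".join(out)


def decode_no_repeat(dna: str, start: str = "A") -> bytes:
    """Recover the base-3 digits from each base and its predecessor, then rebuild bytes."""
    prev, trits = start, []
    for base in dna:
        trits.append([b for b in BASES if b != prev].index(base))
        prev = base
    out = bytearray()
    for i in range(0, len(trits), TRITS_PER_BYTE):
        value = 0
        for trit in trits[i:i + TRITS_PER_BYTE]:
            value = value * 3 + trit
        out.append(value)
    return bytes(out)


def split_fragments(dna: str, length: int = 8, step: int = 2):
    """Cut overlapping, indexed fragments; odd-numbered fragments are stored reversed.

    Assumes len(dna) >= length and (len(dna) - length) divisible by step.
    """
    fragments = []
    for index, pos in enumerate(range(0, len(dna) - length + 1, step)):
        piece = dna[pos:pos + length]
        fragments.append((index, piece[::-1] if index % 2 else piece))
    return fragments


def reassemble(fragments, total_length: int, step: int = 2) -> str:
    """Place fragments by index and take a majority vote at every position."""
    votes = [Counter() for _ in range(total_length)]
    for index, piece in fragments:
        if index % 2:
            piece = piece[::-1]
        for offset, base in enumerate(piece):
            votes[index * step + offset][base] += 1
    return "".join(v.most_common(1)[0][0] for v in votes)


dna = encode_no_repeat(b"DNA")
fragments = split_fragments(dna)
print(decode_no_repeat(reassemble(fragments, len(dna))))  # b'DNA'
```

Because most positions are covered by four fragments, a single misread base in one fragment is outvoted by the other three copies, which mirrors the reasoning in the paragraph above.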

Upon the DNA’s return, Goldman and Birney were surprised at how minuscule the files were. Because the DNA was composed of small fragments, the physical result was a collection of numerous, almost unnoticeable specks at the bottom of test tubes. The scientists sequenced the DNA and ran the cipher backwards. All of the files were found to be 100% intact and accurate.

Researcher Nick Goldman holds the DNA that encodes all of Shakespeare's sonnets, a photograph, and an mp3 clip of the famous "I have a dream" speech.

The Impact

The implications of the scientists’ research are vast. As Goldman and Birney explain in the February 2013 issue of Nature, it is possible to “store at least 100 million hours of high-definition video in about a cup of DNA,” and the process would be easy to scale up.

At the same time, DNA is considered a safe molecular form that scientists predict will last at least another 10,000 years. Hard drives, on the other hand, have nowhere near that lifespan. They are also expensive and require large amounts of storage space and a constant supply of electricity. Even archival materials that don’t require power sources, such as magnetic tape, degrade within a decade. DNA, however, is extremely small and dense, and it doesn’t require any electricity. It only needs to be stored in a cool, dry place. Consequently, storage and shipping needs are much less complicated.

While DNA storage is still expensive (scientists estimate that one megabyte of DNA storage costs about $12,000), this may no longer be the case in another ten years. DNA synthesis costs are dropping, and while the average person will not be able to afford their own DNA hard drive, companies may reach the break-even point soon. In another decade, the maintenance costs for a room full of hard drives might be more than the cost of purchasing DNA storage. The expensive, time-consuming, and difficult part is making the strands of DNA; reading the information is considerably less work, and sequencing technology continues to decrease in cost. Nevertheless, as PC Magazine states, this research carries enormous potential as humans are on their way to the next “commercially viable DNA storage model.”
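A quick back-of-the-envelope calculation in Python shows how far today's price is from everyday use. It combines two figures quoted in this article, about $12,000 per megabyte and about 34 gigabytes consumed per person per day. The 1,000 megabytes per gigabyte conversion and the function name `dna_storage_cost_usd` are assumptions made for illustration.

```python
COST_PER_MB_USD = 12_000   # estimate quoted in the article
DAILY_GIGABYTES = 34       # per-person daily figure quoted in the article
MB_PER_GB = 1_000          # assumption: decimal gigabytes


def dna_storage_cost_usd(megabytes: int) -> int:
    """Cost of storing the given number of megabytes at the quoted rate."""
    return megabytes * COST_PER_MB_USD


one_day = dna_storage_cost_usd(DAILY_GIGABYTES * MB_PER_GB)
print(f"${one_day:,}")  # $408,000,000
```

At the quoted rate, archiving a single person's daily 34 gigabytes would cost roughly $408 million, which is why falling synthesis costs matter so much to the break-even point.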