Scientists have developed a new approach to using DNA as a data storage medium, slashing the cost and time of writing to the biological substance.
For decades, scientists have been studying the potential of deoxyribonucleic acid, which contains the genetic instructions for the development and function of an organism, as a material for data storage owing to its phenomenal data density.
In theory, a single gram of DNA could store 215,000 terabytes of data, but since the first demonstration of its data storage potential, scientists have struggled to show how it could become part of a practical computing system.
Well-established techniques rely on synthesizing DNA sequences from scratch to create the storage medium. This process, known as “de novo” synthesis, is time-consuming, expensive, and prone to errors.
Researchers have already developed other techniques for storing data directly in the DNA of living bacterial cells by using an electromagnetic technique.
However, the current research goes one step further by employing naturally occurring methylation, a process that forms part of the epigenetic modification of DNA and occurs during an individual’s lifetime rather than between generations.
In a paper in Nature this week, Peking University associate professor Cheng Zhang and colleagues describe a demonstration of encoding data in so-called “epi-bits,” reproducing an image of a Chinese rubbing (16,833 bits) and a photo of a panda (252,504 bits) using their synthesis-free method. They argue it is capable of encoding and reading data at a vastly improved rate compared to earlier approaches.
In an accompanying article, University of Washington computer science researchers Carina Imburgia and Jeff Nivala said Zhang’s team had marked “a new direction in DNA-based data storage that has the potential to bypass the time and cost limitations of conventional approaches.”
In addition to improving the performance of DNA storage, the researchers also widened its usability. They created a platform called iDNAdrive and showed that 60 volunteers from diverse academic backgrounds could use it to manually encode approximately 5,000 bits of text data.
“Our framework presents a new modality of DNA data storage that is parallel, programmable, stable and scalable. Such an unconventional modality opens up avenues towards practical data storage and dual-mode data functions in biomolecular systems,” the authors said.
“With DNA data storage entering the dawn of commercialization, the epi-bit framework demonstrates potential directions in parallel molecular information storage with prefabricated modularity.”
However, there is a long way to go before the technique can compete with mainstream computing. For a start, there is speed. Even with efforts to improve speed using epi-bit barcodes instead of DNA-sequence barcodes and an automated liquid-handling platform, the rate for writing data was 40 bits per second. By comparison, an SSD might be expected to read and write at 200-550 MBps.
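To put that gap in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted in this article (40 bits per second, the two image sizes, and the 200-550 MBps SSD range). It assumes that MBps means 10^6 bytes per second, and it ignores any overhead such as error correction or barcoding, so the results are illustrative only. The function and constant names are made up for this example.

```python
# Figures quoted in the article
EPI_BIT_WRITE_BPS = 40          # epi-bit write rate, bits per second
RUBBING_BITS = 16_833           # Chinese rubbing image
PANDA_BITS = 252_504            # panda photo
SSD_MBPS_RANGE = (200, 550)     # typical SSD throughput, MBps

# Assumption (not from the article): 1 MB = 1,000,000 bytes, 8 bits per byte
BITS_PER_MB = 1_000_000 * 8


def write_seconds(bits: int, rate_bps: float) -> float:
    """Time in seconds to write `bits` at a constant `rate_bps`."""
    return bits / rate_bps


def ssd_bits_per_second(mbps: int) -> int:
    """Convert an SSD rate in MBps to bits per second."""
    return mbps * BITS_PER_MB


def speed_ratio(mbps: int, epi_bps: float = EPI_BIT_WRITE_BPS) -> float:
    """How many times faster an SSD at `mbps` is than the epi-bit write rate."""
    return ssd_bits_per_second(mbps) / epi_bps


if __name__ == "__main__":
    for name, bits in (("rubbing", RUBBING_BITS), ("panda", PANDA_BITS)):
        secs = write_seconds(bits, EPI_BIT_WRITE_BPS)
        print(f"{name}: {secs:.1f} s (about {secs / 60:.0f} min) at 40 bits/s")
    for mbps in SSD_MBPS_RANGE:
        print(f"SSD at {mbps} MBps is about {speed_ratio(mbps):.1e}x faster")
```

Under these assumptions, the panda photo alone would take well over an hour to write at the reported rate, while an SSD at the low end of the quoted range would be tens of millions of times faster.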
Meanwhile, Imburgia and Nivala point to questions about the long-term stability of methyl marks created using the technique.
Yet more challenges await in using the technique to create RAM, which allows random access to any part of stored data. “In the epi-bit system, the entire database would need to be sequenced to access any subset of the files, which would be inefficient using nanopore sequencing.”
While the costs of the epi-bit approach were greater than those of current DNA data systems, that hurdle might be overcome with further process optimization and automation, the commentators said. ®