First Friday afternoon panel on Hard Problems in DNA Computing and Molecular Programming
Beckman Institute Auditorium, 1:30pm to 2:15pm, Friday.
Back to Main_Page
What Can We Compute With Folding Pathways?
We've recently seen beautiful ways to program DNA molecules that compute as they fold. For example, sequences of strand displacement reactions can simulate the computation of a Boolean circuit. Strand displacement reactions can even simulate universal models of computation. Folding pathway computations are energy efficient and avoid complications such as the need for enzymes or thermal cycling.
The potential for energy-efficient computations with folding pathways may be much greater than the examples to date suggest. I'll describe two limitations of current folding pathway computations and suggest how new theoretical insights might help us compute with folding pathways in richer ways than we currently do.
The first limitation of folding pathway computations such as those mentioned above is that the number of molecules, and thus nucleotides, needed to simulate t computing steps grows at least linearly with t. Can we do interesting t-step computations using a number of molecules or nucleotides that is significantly less than t?
Is it possible, for example, to design a single DNA molecule that, while folding from a pre-specified initial configuration to its unique stable configuration, figures out who wins from an arbitrary given configuration of an n x n checkers board? Is this possible with a DNA molecule whose number of nucleotides is polynomial in n? If so, the molecule's folding pathway would, in effect, explore exponentially many game plays and thus would compute for time exponential in its number of nucleotides.
We can translate this question about the potential for computing with folding pathways into a complexity theory question. Inevitably, some important facets of the original question are lost in translation. Nevertheless, in the interest of progress, here's the translation. Is the following problem PSPACE-complete: Given a DNA sequence and two secondary structures for the strand, is there a low-energy-barrier folding pathway from one structure to the other?
The second limitation of current folding pathway computations applies to chemical reaction systems (CRS's) more generally. A CRS is a set of chemical reactions, with each reaction consuming some molecular species and producing others. CRS's can be realized by strand displacement systems and thus understanding their potential for computation informs our understanding of folding pathway computations too.
Some CRS's are deterministic with respect to an appropriate initial set of chemical species in the sense that exactly one reaction can be applied to the initial set of species, then another to the species resulting from the first reaction, and so on, with exactly one reaction being applicable to the set of species available at each step. (If reactions are reversible then two reactions are applicable at each step, one of which advances the system and one of which causes the system to take a step backwards.) Either the reactions continue ad infinitum or eventually no reaction applies to the available set of species, in which case the sequence of reactions ends. The sequence models a computation on the initial "input" species.
Here's the limitation: some examples of deterministic CRS's, notably Qian et al.'s CRS for simulating universal computations, only work if one copy of some input species are present initially. Roughly, this is because of "cross talk" between different copies of the system. Thus the question: can we design CRS's that perform universal computations and that work correctly even if multiple copies of the initial stand species are present?
Many complain that a hard problem for theoreticians in our field is communicating effectively with experimentalists, in order to help them understand what we do and to see the beauty of pure theory. It is true to say that people from different scientific backgrounds often have trouble understanding the deeper motivation for others' research. But it is the case that theoreticians in our field have been particularly well-inspired by some of the things experimentalists are doing. We don't know how to model all of the details of complicated molecular processes so we abstract and use probabilistic models. We know nature does not have a global clock, so we build asynchrony into our computational models. We try to optimize concentrations, minimize the number of distinct components used in our molecular algorithms, minimize time, and other resources that are inspired by experimental limitations.
Still, we often ask questions and prove theorems simply because the questions are beautiful and finding the results are fun. We admire nice proofs. But these are really good things, and it is what drives us. It also brings a kind of elegance and abstraction that we hope can be inspirational to experimentalists. This is one of the many things we give back.
Also, it seems to me that the division between theory and experiment is becoming less apparent as the field has evolved, and this is a good thing. Over the years the field has developed a personality of its own where we challenge each other and try to keep standards high. Since I'm rather new to the field, I guess I feel fine about saying that I want to give us all a collective pat on the back.
Before the panel discussion I will highlight one particular area that I find to be a particularly inspiring algorithmic process in biology: embryonic development. Every day, millions of biological systems run complicated embryonic developmental algorithms, with very low error rates, fast (even exponential) cellular growth rates, embedded in complicated systems (e.g. humans) that are constantly changing with their environment. The kind of growth processes seen here are in many ways far beyond what we can do in the lab today with DNA-based molecular programs, but as molecular programmers with a computer science background we have the tools to abstract some key principles in order to design our own developmental models and programs. We can inspire the creation of new molecular programs in the lab, and of course continue to have fun asking beautiful questions.
What are the potentialities and limitations of molecular programming?
Mathematical logicians have developed a comprehensive theory of what can and cannot be computed, and theoretical computer scientists have developed an elaborate, though provisional (i.e., assuming things like that P is not NP) theory of what can and cannot be computed efficiently. However, molecular programming is more than computation. It is the algorithmic construction and control of tangible molecular devices and systems with desired shapes and behaviors. And that is a huge difference. To date we have nothing like a theory of the potentialities and limitations of molecular programming.
Isolated results on specific models tell us that the situation is intriguing. For example, we know that the abstract Tile Assembly Model (aTAM) is Turing universal, but we know that there are simple, decidable structures that cannot self-assemble in it. How do we get past this to a systematic theory, even for the simplistic aTAM? Can we develop a theory of the "internal bandwidth" of a structure, the amount of information that must propagate through it in order for it to self-assemble? Can we develop a useful notion of one molecular programming problem being reducible to another?
Questions like these are important, but they only concern the "easy" case of molecular programming as an isolated process. Most of the applications envisioned for molecular programming involve embedding the programs in complex systems like living cells. Even in the simpler world of computer science, theorists and embedded systems people only speak to each other at lunch. We have a long, challenging, and fun way to go.
Toward Deeper Understanding of Assembly Processes and Reaction Processes for DNA Nanostructures
(1) A Deeper Understanding of Assembly of Complex DNA Nanostructures: At this point, the DNA Nanostructure community has succeeded in the experimental assembly of large, complex DNA Nanostructures. DNA origami and related assembly techniques using scaffold strands in have been particularly successful, with very low defect rates. In these self-assembly processes for DNA origami, are a very large number of partially complete, defective, as well as completed assemblies. These assemblies have a well-defined probability distribution that evolves over time, and the time-evolution of the probability distribution eventually may reach an equilibrium distribution. There is a well established theory of kinetics that provides some underlying theoretical basis for these processes, but the scale of their complexity swamps our ability to fully explain how a large DNA origami assembly actually occurs. We do not yet have a full theoretical understanding of even approximately how such assemblies evolve or time, and what the probabilistic landscape of assemblies. First steps may be developement of succinct representations that approximate the probability distributions of large groups of partially complete, defective, as well as completed assemblies, as well as methods for approximating the time-evolution of these assemblies.
(2) A Deeper Understanding of DNA hybridization Reaction Processes: A similar challenge exists to provide a theoretical understanding of reaction processes for DNA Nanostructures. The DNA computing community has succeeded in the experimental demonstrations of large groups of interacting autocatalytic hybridization reactions involving DNA nanostructures, for example, see-saw gates. There is a very well defined kinetic theory, and we can simulate them, and correlate well these simulations to experiments. In spite of this, the complexity and scale of these reaction systems again swamps our ability to fully explain and optimize the reactions, to devise methods to speed the reactions and fully eliminate leaks. Again, a first step is succinct representations that approximate the probability distributions of the molecules, and methods for approximating their time-evolution.
DNA17 participants, please enter comments and questions here.