The information required to produce a complex organism is encoded within its genome. A lens cell of the eye and an insulin-producing pancreatic cell contain identical genomic information yet access only a subset of that information. Thus, regulated expression of specific genes, in response to various cues, is what instructs cells to adopt defined fates in an organism. Inappropriate expression of genes can give rise to diseases, including cancer and diabetes.

The broad goals of the lab are to understand the mechanistic events that culminate in the expression of specific genes, and to develop artificial transcription factors capable of regulating the expression of targeted genes. In a multidisciplinary effort, we utilize chemical, biological, biophysical, computational, and genomic tools to address these goals.

Cognate Site Identification

Cognate Site Identification (CSI) is a strategy used to determine the comprehensive sequence specificity spectrum of a DNA-binding factor or molecule. In this in vitro assay, all possible permutations of 25-bp sequences (approximately 10^15 unique sequences) are incubated with the protein of interest. Bound DNA sequences are captured with an antibody specific to the protein, and the enriched sequences are PCR amplified. The enrichment can be repeated for multiple rounds, and the sequences captured in each round then undergo high-throughput sequencing. Computational analysis of the sequencing results provides sequence specificity landscapes (SSLs), which are comprehensive representations of binding preferences across the full range from low- to high-affinity binding sites. Position weight matrices (PWMs) can be generated from these data, aiding in the identification of binding sites across the genome.
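As a rough illustration of the last step, the sketch below builds a PWM from a handful of aligned sites and uses it to scan a longer sequence. It is a minimal textbook construction, not the lab's analysis pipeline: the example sites, the pseudocount, the uniform background, and all function names are assumptions made for illustration. It also confirms the size of the 25-bp sequence space quoted above.

```python
import math
from collections import Counter

BASES = "ACGT"


def sequence_space(length):
    """Number of distinct DNA sequences of a given length (4 choices per position)."""
    return 4 ** length


def position_frequency_matrix(sites, pseudocount=1.0):
    """Per-position base frequencies from equal-length, pre-aligned sites.

    The pseudocount (an assumed value) keeps unseen bases from getting zero frequency.
    """
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    matrix = []
    for i in range(width):
        counts = Counter(s[i] for s in sites)
        matrix.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return matrix


def log_odds_pwm(freqs, background=0.25):
    """Convert frequencies to log2 odds against a uniform background (an assumption)."""
    return [{b: math.log2(col[b] / background) for b in BASES} for col in freqs]


def score(pwm, seq):
    """Sum of per-position log-odds for a sequence as wide as the PWM."""
    return sum(col[b] for col, b in zip(pwm, seq))


def best_window(pwm, seq):
    """Return (score, start) of the highest-scoring window in a longer sequence."""
    width = len(pwm)
    windows = [(score(pwm, seq[i:i + width]), i)
               for i in range(len(seq) - width + 1)]
    return max(windows)


# Hypothetical enriched sites, invented for this example only.
example_sites = ["TGACGT", "TGACGT", "TGACGA", "TGATGT", "TGACGC"]
example_pwm = log_odds_pwm(position_frequency_matrix(example_sites))

print(sequence_space(25))                        # size of the 25-bp library
print(best_window(example_pwm, "AAATGACGTAAA"))  # best-scoring window and its start
```

A PWM treats each position as independent, so it compresses the binding data into a single motif; the SSLs described above retain the full range of affinities that this summary leaves out.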
The goal is to develop an integrated suite of computation, instrumentation, and synthetic methods for the design, synthesis, and fabrication of de novo designed genomes to bypass current choke points. This suite will enable fabrication of user-programmed genomes that encode desired functional properties. Four interwoven projects (Genetic Circuits, Virtual Foundry, Genome Assembly, and Gene Switches) are proposed to realize this goal. Pursued in parallel and in collaboration with labs across the UW-Madison campus (Reed Labs, Ramanathan Labs, and Schwartz Labs), each project provides solutions to seemingly intractable obstacles to genome engineering.

Designing Gene Switches

The goal of metabolic engineering is to harness an organism’s metabolism to create a product of interest, which is generally at odds with the cell’s primary objective of maximizing biomass production. In the branched-chain amino acid (BCAA) biosynthesis pathway, the overproduction of valine indirectly causes accumulation of a toxic byproduct in Escherichia coli cells. The modular design principles of natural transcription factors can be harnessed to create artificial transcription factors (ATFs), such as Transcription Activator-Like Effectors (TALEs), that target any specified sequence and perturb metabolic networks with temporal control. The ability to regulate gene expression circumvents current limitations of copying extant genes and transplanting them in near-identical recipients that can recognize related regulatory elements.
The overarching goal is to create synthetic molecules that mimic natural transcription factors and can be tuned to counteract the actions of transcription factors expressed in the cell. ATFs may be engineered to control any gene regulatory circuit in a predetermined manner. In the long term, we will create synthetic molecules that regulate chosen circuitry to dictate a desired biological outcome, for example, the fate decisions of human embryonic stem cells.

Artificial Transcription Factors

Transcriptional networks drive cell fate changes and maintain the cell at a homeostatic state. Artificial Transcription Factors (ATFs) can be tailor-made to perturb these transcriptional networks and initiate a conversion to a different cell type.

Activating The Pluripotency Network

We have developed a zinc finger ATF library, which can target an array of 9-bp sequences in the genome. We have used this ATF library, with a complexity of 2.62 x 10^6, as a genetic screen for activating the pluripotency network and inducing cardiomyocyte and hematopoietic differentiation. This strategy enables the de novo identification of master regulators, which control major cell fate decisions.


We are working on polyamides, a class of sequence-specific DNA-binding molecules. Pyrrole/imidazole-based polyamides can be rationally designed to target specific DNA sequences with exquisite precision in vitro; yet, the biological outcomes are often difficult to interpret using current models of binding energetics. To identify polyamide binding sites in cells, we have been developing and applying a method called COSMIC (crosslinking of small molecules for isolation of chromatin).

Carboxy-terminal Domain

The spatiotemporally controlled engagement of different protein complexes during the transcription cycle is choreographed via the carboxyl-terminal domain (CTD) of the largest subunit of Pol II. The CTD is an unusual domain consisting of a repeating hepta-peptide with the sequence Tyr1-Ser2-Pro3-Thr4-Ser5-Pro6-Ser7 (Y1S2P3T4S5P6S7). This domain is conserved across eukaryotes with increasing numbers of tandem YSPTSPS repeats in increasingly complex organisms (26 repeats in yeast and 52 in humans). Structural studies of Pol II place the hepta-peptide unit of the CTD near the RNA exit tunnel on the polymerase. Biophysical analyses further support the notion that the hepta-peptide is extremely flexible and can adopt different structures when bound to different proteins. The CTD is therefore optimally positioned and sufficiently malleable to function as a scaffold for the assembly of a plurality of protein complexes that act on nascent transcripts, on Pol II, and on chromatin. How the CTD orchestrates the dynamic association of relevant protein complexes at different classes of genes is a fundamentally important question that lies at the intersection of eukaryotic transcriptional regulation, co-transcriptional processing and establishment of stable chromatin landscapes.

As the importance of patterned CTD modifications comes into greater focus, so does the realization that many fundamental tenets of the hypothesis remain unexamined. It remains to be determined if specific modification patterns define a code that is deciphered in a deterministic manner or if the molecular recognition rules are less “code-like” and more contextual. Recent reports assert that certain modifications of the CTD only impact a specific class of genes, yet they are placed by kinases, acetyltransferases, and methyltransferases that are associated with most, if not all, Pol II transcribed genes. How might ubiquitous modifications be deciphered differently at different gene classes? Or can the set of four canonical CTD kinases, known to phosphorylate specific residues and facilitate specific steps of the transcription cycle, uncharacteristically place different patterns at different gene classes? Is the pattern of the CTD modifications read in concert with “class-specific” sequence elements in nascent transcripts or genomic DNA? Beyond the canonical kinases, are there other signal-responsive or cell-state responsive kinases that act on the CTD to remodel transcription of specific gene networks and facilitate transient RNA regulons? Do certain patterns have non-transcriptional roles or impact Pol II stability, assembly and cytoplasm? While we exemplify kinases and phosphorylation, these questions are relevant to every other modification of the CTD. Moreover, due to the multiplicity of near-identical CTD hepta-peptide repeats, it is unclear if all the repeats or a subset are modified at any time, and if the modifications are read in a combinatorial manner.
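To give a sense of scale for the combinatorial question, the sketch below counts the phosphorylation patterns that are possible in principle on an idealized CTD. It rests on several simplifying assumptions: every repeat is taken to be the consensus YSPTSPS heptad (real CTDs include non-consensus repeats), only phosphorylation is considered, and Tyr1, Ser2, Thr4, Ser5, and Ser7 are each treated as independently modifiable. The function names are illustrative.

```python
from itertools import product

HEPTAD = "YSPTSPS"
# Assumption: 1-based heptad positions treated as phosphorylatable
# (Tyr1, Ser2, Thr4, Ser5, Ser7); other modification types are ignored.
PHOSPHO_POSITIONS = (1, 2, 4, 5, 7)


def idealized_ctd(n_repeats):
    """Consensus-only CTD sequence; real CTDs contain divergent repeats."""
    return HEPTAD * n_repeats


def phospho_sites(n_repeats):
    """List (repeat, heptad position, residue) for every candidate site."""
    return [(r, p, HEPTAD[p - 1])
            for r in range(1, n_repeats + 1)
            for p in PHOSPHO_POSITIONS]


def states_per_repeat():
    """Each candidate site is either modified or not: 2 ** (number of sites)."""
    return 2 ** len(PHOSPHO_POSITIONS)


def enumerate_repeat_states():
    """Yield every phosphorylation pattern of one heptad, e.g. ('S2', 'S5')."""
    for mask in product((False, True), repeat=len(PHOSPHO_POSITIONS)):
        yield tuple(f"{HEPTAD[p - 1]}{p}"
                    for p, on in zip(PHOSPHO_POSITIONS, mask) if on)


def total_states(n_repeats):
    """Patterns across the whole CTD if repeats are modified independently."""
    return states_per_repeat() ** n_repeats


print(len(phospho_sites(26)), len(phospho_sites(52)))  # candidate sites: yeast, human
print(states_per_repeat())                             # patterns on a single heptad
print(len(str(total_states(52))))                      # digit count of the 52-repeat total
```

The theoretical pattern space grows exponentially with repeat number, which is one reason it matters whether all repeats or only a subset are modified at any time.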