Ben Mallory: 'You are not just asking whether a sequence drives expression; you are asking how, at the level of individual protein-DNA interactions.'
INTRODUCTORY NOTE: Benjamin J. Mallory, a Ph.D. candidate in the Stergachis and Starita Labs at BBI and the University of Washington, is the lead author of “Chromatin architectures underlying plasmid-based assays for regulatory variant effects,” published June 4 in the journal Molecular Cell. Here, Mallory discusses how the paper came about, its findings, and implications for future research.
Q: How did this paper come about?
Mallory: This started as my first-year rotation project in the labs of Lea Starita and Andrew Stergachis. The field relies heavily on a tool called massively parallel reporter assays, or MPRAs. The basic idea is that you take a DNA sequence you think might control how much a gene gets expressed, pull it out of the genome, and put it onto a small circular piece of DNA called a plasmid. You place it next to a reporter gene; a common one is GFP, which glows green under blue or ultraviolet light. If your sequence causes GFP to be expressed, that’s your signal that the sequence can function as a regulatory element. MPRAs let you test thousands of these sequences simultaneously, cheaply and efficiently.
The problem is that when you move a sequence from the genome onto a plasmid, you’re placing it in a very different environment from where it normally lives. We had no real way of knowing how similar or different those two environments actually were — and that distinction matters enormously for how you interpret your results.
Q: Why does that matter?
Mallory: There is a large international research consortium called ENCODE, the Encyclopedia of DNA Elements, whose goal is to build a comprehensive catalog of functional elements across the human genome. A major part of that work involves identifying sequences that may have a gene regulatory function, meaning they control when and how much a gene is expressed. Researchers identify these using chromatin markers, which act like signposts: characteristics of a DNA sequence that, when present, suggest it could be regulatory. These sequences are cataloged as “candidate regulatory elements,” and that word “candidate” matters. To move from candidate to confirmed, you have to functionally validate them.
Here’s the disconnect: That validation typically happens on plasmids. Researchers take a candidate regulatory element out of the genome, put it on a plasmid, and measure whether it drives gene expression. But the whole reason they flagged it as a candidate in the first place was based on its chromatin context in the genome — and nobody knew whether that same chromatin context was even present on the plasmid. Our paper asks, for the first time, how similar those two contexts actually are.
Q: What is plasmid Fiber-seq, and how did you use it here?
Mallory: Plasmid Fiber-seq builds on Fiber-seq, a sequencing technology developed in Andrew Stergachis’s lab. The best way to think about it is as a molecular stencil: An enzyme introduced into the cell nucleus chemically marks all the DNA that is not protected by proteins, essentially painting the DNA and leaving behind “footprints” wherever proteins were bound. Those proteins include nucleosomes, the spools that DNA wraps around, and transcription factors, the proteins that switch genes on and off. Together, these proteins and DNA make up chromatin. When you then sequence those painted molecules using long-read sequencing, you can read out exactly where each protein was sitting, at near single-nucleotide resolution. Applying that approach to plasmids gave us, for the first time, a detailed map of the chromatin architecture along individual transfected plasmid molecules.
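To make the stencil idea concrete, here is a toy sketch. It is not the Stergachis lab’s actual analysis pipeline. It assumes each sequenced molecule has already been reduced to a per-base list where True means the base was marked (accessible) and False means it was protected. It also uses hypothetical length cutoffs to separate nucleosome-sized footprints from shorter transcription factor footprints, and it reports nucleosomes per kilobase, a simple density measure of the kind behind the comparison of plasmid and genomic chromatin. All names and thresholds here are illustrative assumptions.

```python
from typing import List, Tuple

# Hypothetical size cutoffs in base pairs, chosen for illustration only.
NUCLEOSOME_MIN_BP = 100  # a nucleosome protects roughly 147 bp of DNA
FOOTPRINT_MIN_BP = 10    # shortest protected stretch called a candidate TF footprint
FOOTPRINT_MAX_BP = 60    # longest protected stretch still called a TF footprint


def protected_segments(marked: List[bool]) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals of consecutive unmarked bases."""
    segments: List[Tuple[int, int]] = []
    start = None
    for i, is_marked in enumerate(marked):
        if not is_marked and start is None:
            start = i  # a protected stretch begins
        elif is_marked and start is not None:
            segments.append((start, i))  # a protected stretch ends
            start = None
    if start is not None:  # the stretch runs to the end of the read
        segments.append((start, len(marked)))
    return segments


def classify_footprints(marked: List[bool]) -> List[Tuple[int, int, str]]:
    """Label each protected stretch by its length (hypothetical cutoffs)."""
    calls = []
    for start, end in protected_segments(marked):
        length = end - start
        if length >= NUCLEOSOME_MIN_BP:
            label = "nucleosome"
        elif FOOTPRINT_MIN_BP <= length <= FOOTPRINT_MAX_BP:
            label = "tf_footprint"
        else:
            label = "unclassified"
        calls.append((start, end, label))
    return calls


def nucleosomes_per_kb(marked: List[bool]) -> float:
    """Nucleosome-sized footprints per 1,000 bp of read."""
    n = sum(1 for _, _, label in classify_footprints(marked) if label == "nucleosome")
    return 1000 * n / len(marked)


# Toy read: 20 marked, 147 protected, 30 marked, 20 protected, 33 marked (250 bp).
read = [True] * 20 + [False] * 147 + [True] * 30 + [False] * 20 + [True] * 33
print(classify_footprints(read))  # [(20, 167, 'nucleosome'), (197, 217, 'tf_footprint')]
print(nucleosomes_per_kb(read))   # 4.0
```

Real data are messier. The marking enzyme used in Fiber-seq modifies adenines, so accessible DNA is marked sparsely rather than at every base, and a real caller has to tolerate those gaps instead of treating every unmarked base as protected. A read from a circular plasmid would also need its two ends joined before segmenting.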
Q: Are plasmids actually chromatinized?
Mallory: Yes — and that itself was a meaningful finding. The field had largely been operating under one of two assumptions: either that plasmids aren’t chromatinized at all, or that if they are, they don’t adopt the same chromatin structure as the genome. Neither had been rigorously tested. What we found is that proteins do bind plasmids in an organized, sequence-specific way. But the picture is more complicated than a simple “Yes.” Plasmids adopt what we call a heterogeneous and incomplete chromatin architecture; they have fewer nucleosomes per length of DNA than the nuclear genome does, and different copies of the same plasmid within the same cell can look quite different from one another. One of the more striking findings was how much the cell type matters. Even closely related cell lines — HEK293 and HEK293T, which differ by a single integrated gene — showed dramatically different chromatinization patterns for the same plasmid.
Q: Can a plasmid faithfully replicate the chromatin structure along a sequence from the genome?
Mallory: It depends on the specific sequence and its surrounding context, and for anyone using reporter assays, that’s the most important takeaway from this paper. Most reporter assay experiments use short fragments of around 300 base pairs, because that’s the practical length limit for commercially synthesizing the large pools of DNA these assays use. You cannot buy accurately synthesized pools of sequences longer than about 350 base pairs, and that is a technological constraint, not a financial one. We found that the surrounding sequence context plays an important role in whether a plasmid faithfully recapitulates the chromatin of that same sequence in the genome, but there’s no simple rule. In some cases, including a larger stretch of sequence helped; in others, a shorter fragment actually did a better job. What we can say is that about a quarter of the sequences we tested showed chromatin patterns on the plasmid that differed from those in the genome. That’s a meaningful error rate, and until now, researchers had no way to detect it.
Q: The paper also examines disease-causing genetic variants. What can plasmid Fiber-seq tell us there?
Mallory: This is where it gets really exciting. Think of it this way: Proteins that regulate gene expression each recognize a specific sequence in the DNA, almost like a unique address. When a protein finds its address, it binds to that sequence and carries out its regulatory function. If you change a single base pair, you can alter that address just enough that the protein no longer recognizes it. Now it does not bind, and gene expression changes, either rising or falling depending on what that protein normally does. Before this work, researchers could see that a variant changed expression levels but could not explain why; the molecular mechanism driving the change stayed hidden. Now, with plasmid Fiber-seq, you can see exactly which protein-DNA interactions changed, at near single-nucleotide resolution. We demonstrated this with a variant in the SLC39A4 gene that causes a rare zinc deficiency disorder, and with a large library of variants in the LDLR promoter, which is relevant to heart disease risk.
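The “address” analogy can be illustrated with a deliberately simplified sketch. It assumes a hypothetical seven-letter recognition sequence (MOTIF) and exact matching on one strand only; real transcription factors tolerate some mismatches and bind double-stranded DNA, so real analyses usually score sites more flexibly. The function names here are hypothetical. Note also that this predicts binding from sequence alone, whereas plasmid Fiber-seq observes directly whether a protein’s footprint is present.

```python
from typing import List

# Hypothetical recognition sequence ("address") for an imaginary regulatory protein.
MOTIF = "TGACTCA"


def find_sites(seq: str, motif: str = MOTIF) -> List[int]:
    """Return 0-based start positions of exact motif matches (forward strand only)."""
    return [i for i in range(len(seq) - len(motif) + 1)
            if seq[i:i + len(motif)] == motif]


def apply_snv(seq: str, pos: int, alt: str) -> str:
    """Return a copy of seq with a single-base substitution at 0-based position pos."""
    return seq[:pos] + alt + seq[pos + 1:]


def lost_sites(ref: str, pos: int, alt: str, motif: str = MOTIF) -> List[int]:
    """Sites present in the reference sequence but absent after the variant."""
    alt_seq = apply_snv(ref, pos, alt)
    return sorted(set(find_sites(ref, motif)) - set(find_sites(alt_seq, motif)))


ref = "GGCATGACTCAGCTT"
print(find_sites(ref))          # [4]
print(lost_sites(ref, 7, "T"))  # [4]: the C>T change erases the address
print(lost_sites(ref, 1, "A"))  # []: a change outside the site leaves it intact
```

A variant can also create a new address where none existed; running the same comparison in the other direction would catch that case, which this sketch ignores.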
Q: What does this mean for the field going forward?
Mallory: At minimum, it gives researchers a quality control step they did not have before. You can now check whether the sequence you are studying on a plasmid actually looks the way it does in the genome — before spending time and money on a full experiment. But it is more than a quality check. This adds an entirely new layer of information to these assays. You are not just asking whether a sequence drives expression; you are asking how, at the level of individual protein-DNA interactions. That is a meaningful shift in what these experiments can reveal.