WEST LAFAYETTE, IN. – Uncovering the complex associations among genes, proteins and the molecules they produce in a living organism can be a monumental task. Some scientists may devote entire careers to one particular protein and spend decades tracking a handful of metabolic pathways.

They’re often faced with mountains of data but few clues about how everything fits together. Precision tools like mass spectrometry, which measures the mass of each molecule in a sample, have helped, but identifying each molecule in a living thing still requires a needle-in-a-haystack search.

A Purdue University research collaboration has led to a new method of conducting those searches with the promise of significantly reducing the size of that haystack. The results of years of interdisciplinary collaboration, including innovative bioinformatics work from an undergraduate student, were published last week in the journal The Plant Cell.

“We have the tools to detect and quantify thousands of molecules even when we don’t know their structure and identity,” said Clint Chapple, distinguished professor of biochemistry and member of the Purdue Center for Plant Biology. “But now, we can determine amongst a whole set of compounds in a plant extract — or an extract from cancer cells or fungi — which of the compounds that we see are derived from a particular amino acid precursor. That’s a powerful tool.”


Cole Wunderlich was a sophomore back in 2012 when he looked for a Summer Undergraduate Research Fellowship and came across the Chapple lab. Wunderlich, who would earn a bachelor’s degree in biochemistry, had a passion for both computer programming and biology and thought it would be interesting to explore ways in which plants could be modified to produce biofuels.

“If you want to manipulate a plant to produce a biofuel, you need to understand its genetics and metabolism. Only then can you alter that metabolism to create a fuel compound,” said Wunderlich, now a doctoral candidate at Cold Spring Harbor Laboratory in New York. “I thought that was something really far out, but it turns out Dr. Chapple was already working on that.”

Chapple told Wunderlich he wasn’t taking on undergraduate researchers for his lab, but he met with him anyway. Seeing Wunderlich’s passion, Chapple asked if he had any programming experience. Wunderlich said he had a little.

“He was a whiz,” Chapple said.

Wunderlich joined Chapple’s lab and started working on the process of identifying molecules in plant extracts using statistics and computational algorithms. Scientists can look at the masses of the molecules a plant produces and then run similar experiments with mutants of the same plant. They can compare the results, but they don’t often know the identities of the molecules that have increased or decreased in abundance due to the genetic changes.

Wunderlich helped develop a method for identifying all of the molecules of interest in a plant by using a special molecular labeling technique. Wunderlich then developed a computer program that could automatically identify all of these labeled compounds from an experiment, essentially creating a magnet that could be used to pull out all of the interesting needles from each haystack. 

“We wanted to find a way so that at the push of a button, we could see how a mutated plant is responding,” Wunderlich said. “I found open-source software that did some of this, but I had to write some of my own code to detect the labeled molecules and then turn the whole thing into a pipeline so that we could do labeling experiments, collect data and see the things we were interested in.”

The process took three years, but it paid off.

“At a lab meeting, Dr. Chapple said, ‘With this one experiment, you found what it’s taken my lab years to find,’” Wunderlich said.


Biochemists like Chapple can analyze a plant sample by liquid chromatography and mass spectrometry to learn the chemical formulae and amounts of the molecules they have extracted. Chapple’s technique involves feeding the plants amino acids that carry isotopic tags before running those tests, then observing the changes in the small subset of molecules that the plants made from those amino acids.

Chapple has long worked on understanding the biosynthesis of lignin, a polymer that serves as a barrier to better biofuel production. Phenylalanine, an amino acid, is a precursor to the formation of lignin, so Chapple’s lab is interested in all the molecules that plants make from phenylalanine.

Running a normal experiment, Chapple may find that a particular molecule has a mass of 301. He can add phenylalanine to the sample with an isotopic tag that makes phenylalanine six mass units heavier, run the experiment again, and then look to see if a molecule with a mass of 307 increases in abundance. If so, it’s because it was made from the tagged phenylalanine he gave the plant.

“Whenever the plant incorporates a heavy phenylalanine into the molecule, we see that,” Chapple said. “It acts as a filter, if you will. It tags these molecules for us as being of interest.”
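
The mass-shift filter described above can be illustrated with a short sketch. This is not the pipeline Wunderlich wrote, which the article does not detail. It is a minimal illustration that assumes each detected peak has a mass and an abundance in both an unlabeled and a labeled run. The names Peak and find_labeled_pairs, the tolerance, and the fold-change cutoff are all hypothetical choices made for the example. Only the six-unit shift and the 301/307 masses come from the text.

```python
from dataclasses import dataclass

# Mass added by the tagged phenylalanine, as described in the article.
LABEL_SHIFT = 6.0


@dataclass
class Peak:
    mass: float      # measured mass of the molecule
    control: float   # abundance in the unlabeled experiment
    labeled: float   # abundance after feeding tagged phenylalanine


def find_labeled_pairs(peaks, shift=LABEL_SHIFT, tol=0.01, min_fold=2.0):
    """Return (light, heavy) peak pairs whose masses differ by `shift`
    and whose heavy peak rises at least `min_fold` after labeling.

    `tol` and `min_fold` are illustrative thresholds, not published values.
    """
    pairs = []
    for light in peaks:
        for heavy in peaks:
            # Keep only peaks that sit one label shift above the light peak.
            if abs(heavy.mass - light.mass - shift) > tol:
                continue
            # The heavy peak must increase in abundance in the labeled run.
            if heavy.labeled >= min_fold * max(heavy.control, 1e-9):
                pairs.append((light, heavy))
    return pairs


# Made-up abundances for illustration; only the masses 301 and 307
# mirror the example in the article.
peaks = [
    Peak(301.0, 1000.0, 950.0),
    Peak(307.0, 5.0, 400.0),   # rises sharply: made from tagged phenylalanine
    Peak(250.0, 800.0, 790.0),
    Peak(256.0, 20.0, 22.0),   # right mass offset, but no real increase
]

for light, heavy in find_labeled_pairs(peaks):
    print(f"{light.mass:.0f} -> {heavy.mass:.0f}")
```

With this toy data, only the 301/307 pair passes both checks. The 250/256 pair has the right mass offset but is rejected because the heavier peak barely changes after labeling.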


Determining which molecules are derived from specific amino acids is a major accomplishment. But Chapple and Brian Dilkes, a professor of biochemistry and fellow member of the Purdue Center for Plant Biology, took it a step further.

Dilkes uses genome-wide association to identify the tiny changes in a plant’s genome and tie them to the resulting changes seen in an organism. In this case, Dilkes can use all molecules and see how millions of different genetic changes alter the accumulation of each molecule. The process can uncover how differences in the molecular makeup of an organism are affected by a single change in the DNA of a particular gene. Because Wunderlich identified the phenylalanine-derived metabolites, the DNA changes that control these compounds can be analyzed as a set. That approach reveals groups of genes that are required to make these compounds of interest.

“By doing this, we can exploit the slight variations in the genetics of a species. A change in a gene affects the abundance of a certain molecule, and now that we know these are derived from the amino acid phenylalanine, like lignin,” Dilkes said, “we have a much better idea of what that molecule might be.

“I know three things about this compound — it came out at a particular time in liquid chromatography, has a particular mass and is affected by these alleles,” Dilkes added. “You’re triangulating different kinds of data to get large amounts of information about one molecule.”


The techniques have already caught the eyes of funding agencies that see the potential for quickly identifying metabolites.

Jeff Simpson, a postdoctoral researcher in Chapple’s lab and co-first author of the Plant Cell paper, obtained a two-year $165,000 U.S. Department of Agriculture fellowship to apply the work to maize.

“Two plants in the same species may have different metabolites, so looking at one genotype doesn’t give you all of the diversity of what a species produces,” Simpson said. “With these methods, we can survey a large population of plants and create a library of these associations so it can be out there for the plant science community to use. Using this information, you could potentially up-regulate certain genes to produce more of a beneficial metabolite to produce a crop that is more resistant to stress.”

The Department of Energy has also funded the Chapple/Dilkes collaboration for $2.5 million to identify molecules in the model plant Arabidopsis and in sorghum.

“We still really have a huge amount to learn about plant metabolism and plant genomes,” Chapple said. “This work opens doors to gathering and understanding that information in a more efficient manner, and that’s thanks to some outstanding work by my co-workers and undergraduates who brought great interdisciplinary skills to the table.”
