New DNA Tool Predicts Height, Shows Promise For Serious Illness Assessment

Oct 4, 2018

A new DNA tool created by Michigan State University can accurately predict people’s height, and more importantly, could potentially assess their risk for serious illnesses, such as heart disease and cancer. For the first time, the tool, or algorithm, builds predictors for human traits such as height, bone density and even the level of education a person might achieve, purely based on one’s genome. But the applications may not stop there.


Stephen Hsu, lead investigator of the study and vice president for research and graduate studies at MSU, discusses the research and talks about MSU's overall research mission and impact.

Stephen Hsu: Well, MSU has recently reached a milestone of $700 million a year in research expenditures, and that number's tracked each year by the National Science Foundation. So each research university in the country reports that number. We've grown from about, not so many years ago, maybe five years ago, around $500 million, and now we're at $700 million a year. That really reflects growth in a lot of different areas. Two of the big stories, I think, worth mentioning are - one is FRIB, so this huge Department of Energy accelerator facility. Normally, those go to national labs, and we were very lucky to win the competition to have it here. It's going to be the leading instrument for nuclear physics, for understanding what happens when you bang together the cores of atoms. These are things that haven't really happened except in the Big Bang, and then, also, when huge stars explode or collide with each other. That's a very exciting frontier area of physics.

The other area that we've been really building up on campus is biomedical research. We've been hiring really prominent people. We created a new institute, the Institute for Quantitative Health Science and Engineering, and we have a number of significant collaborations that we're planning with big health systems in the state.

Russ White: It's just so exciting, all the stuff that's going on, Stephen. Explain to people why basic scientific research, so much of which is done at universities like MSU, is important in a world where taxpayers want the results yesterday.

Hsu: Companies are generally run to please the shareholders, to make as much money as possible for the shareholders. So if they invest money in research and development, it has to be something that's going to pay off in a product and within the foreseeable future, which typically means just a few years. If you have any kind of problem or someone has a brilliant idea that could take 10 years of hard work or 20 years of hard work to bring to fruition, there are really only a limited number of places in our society where that research can be done. One that I mentioned just a minute ago is the national labs, but there are only something like a dozen national labs in the country, whereas there are, I would say, 50 or 100 world-class research universities in the United States. That is actually where most of the basic research takes place, the research that really is trying to advance far-out ideas, ideas that are so complex that they're going to take decades to work out. Society really benefits from those things. Almost every difference between the way we live now and the way we lived a few hundred years ago or even a thousand years ago is due to technological and scientific advances.

White: Steve, as you look out into the future, what are some of the research frontiers that excite you?

Hsu: I think the two areas that, with 100 percent certainty, are going to affect people's daily lives, transform people's lives are, number one, artificial intelligence. As computers get faster and faster and have access to bigger and bigger data sets and better and better algorithms, they can already perform better than people in any narrow area. You pick a game like chess or Go or poker or what have you, or driving a car, any kind of narrow task, machines are already comparable to, or possibly much better than, people. The real question is how long it will take before we can embody in machines the kind of broad, common-sense intelligence that we humans have evolved over billions of years of natural selection. It might be in our lifetimes. It might be somewhat more distant in the future, but I am confident that in the next 20, 30 years, we're going to see huge impacts on society from AI.

The other one that I'll mention is genomics technology. We're now able to read out, very inexpensively, people's genetic code, and once you can read out people's genetic code and once you understand how to interpret that code, how to decode the DNA, then you can actually say things about the person. You can predict who is most likely to have heart disease, who is more likely to have diabetes, who is most likely to have breast cancer, and you can have treatments which are informed by people's genomics, a particular drug which works well on a particular 10 percent, say, of the population that have a certain genetic profile but doesn't work in the 90 percent. All of these are things which we're making huge advances on right now.

White: Steve, would you describe for us, in general, your type of research and then your particular new research, which is ... What exactly is this thing you've developed? Is it an algorithm? A machine learning modeling system? How do we properly refer to it?

Hsu: Well, my main area of research is theoretical physics, and that generally has to do with things like quantum mechanics and general relativity and black holes. But an area that I started getting interested in about a decade ago, seven years ago, is the application of AI or machine learning to genomics. One of the questions was what I mentioned a few minutes ago. To what degree, given lots of data, can we learn to predict the nature of an organism from its DNA alone? We know that if you take, say, two identical twins, human twins, and you separate them at birth so they're raised by different families and they encounter different environments, still, there are a lot of similarities between them when they reach adulthood. Those similarities are most likely due to the impact of the fact that they have the same genetic code.

We know a lot of aspects of our nature, our disease risks, our physiognomy, all those things are influenced by our genetics, and it's been an open problem, actually, since the dawn of genetics, to figure out how to, from just reading the genetic code, predict what an organism will be like. Just in the last few years, because the cost of DNA sequencing and of genotyping have dropped so fast, researchers like me now have access to really big data sets, hundreds of thousands of people, their genomes, and also aspects of them. So how tall are they, what is their bone density, what is their red blood cell count? All of these things are traits that are, ultimately, predictable, and what we did in our recent research was actually build the first accurate complex trait predictors for humans.

A complex trait is a trait that depends on many different genes, many different loci or regions of your genome, and the one that we have the most accurate prediction capability for is actually human height. If you think about human height, it manifests over something like 20 years. People often don't reach their full height until they’re almost 20 years old. Your metabolism is involved. Your hormones are involved. Your mineral deposition in your bones is involved, all these things, how much you sleep at night. It's an incredibly complex biological system.

Here, at our supercomputing center at MSU, we have about 500,000 genomes. We have the algorithms look at those examples: look at the genome of person number one, how tall is person number one; look at the genome of person number two, how tall is person number two; and do that 500,000 times and learn from each example. Ultimately, we end up with a predictor which can predict from genotype alone. So from your, say, 23andMe genotype or from your Ancestry.com genotype, we can predict your adult height, plus or minus about an inch. It's incredible accuracy, and it was very surprising, I think, to most people in the field that that was possible, even with such a large data set to train on.
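The learning loop described above (look at one genome, compare the prediction with that person's height, adjust, repeat) can be illustrated with a toy sketch. The interview does not say which algorithm the MSU group used, so this assumes a plain additive linear model fitted by stochastic gradient descent on simulated, noise-free data. The effect sizes, sample sizes and function names are all hypothetical.

```python
import random


def simulate(n_people, effects, base_height, rng):
    """Simulate genotypes (0, 1 or 2 allele copies per site) and noise-free heights."""
    genotypes, heights = [], []
    for _ in range(n_people):
        # Each site carries two alleles, each present with probability 0.5.
        g = [(rng.random() < 0.5) + (rng.random() < 0.5) for _ in effects]
        genotypes.append(g)
        heights.append(base_height + sum(b * x for b, x in zip(effects, g)))
    return genotypes, heights


def train(genotypes, heights, epochs=50, lr=0.01):
    """Fit per-site weights by visiting one person at a time (stochastic gradient descent)."""
    n_sites = len(genotypes[0])
    # Center each site on its training mean so the intercept is the mean height.
    site_means = [sum(g[j] for g in genotypes) / len(genotypes) for j in range(n_sites)]
    intercept = sum(heights) / len(heights)
    weights = [0.0] * n_sites
    for _ in range(epochs):
        for g, h in zip(genotypes, heights):
            x = [gj - m for gj, m in zip(g, site_means)]
            error = intercept + sum(w * xi for w, xi in zip(weights, x)) - h
            # Nudge every weight against the error made on this one example.
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
    return intercept, site_means, weights


def predict(model, genotype):
    """Predicted height: intercept plus the weighted sum of centered allele counts."""
    intercept, site_means, weights = model
    return intercept + sum(w * (x - m) for w, x, m in zip(weights, genotype, site_means))


rng = random.Random(0)
true_effects = [2.0, -1.5, 1.0, 0.5, -0.5]  # hypothetical cm per allele copy
train_g, train_h = simulate(500, true_effects, 170.0, rng)
test_g, test_h = simulate(100, true_effects, 170.0, rng)

model = train(train_g, train_h)
max_error = max(abs(predict(model, g) - h) for g, h in zip(test_g, test_h))
print(f"largest held-out error: {max_error:.4f} cm")
```

Because the simulated heights contain no environmental noise and depend on only five sites, the fitted weights match the simulated effects almost exactly. With real data, many thousands of sites and non-genetic influences are involved, so a residual error such as the "plus or minus about an inch" mentioned above would remain.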

White: How does your work compare to the Broad Institute and Harvard University work that was published recently in the New York Times?

Hsu: There are sort of two, I mean, at least two big, I would say, advances that surprised a lot of people who study biomedicine or human genetics. Our paper was quite a surprise. We can predict height with the accuracy that I just mentioned. We can predict other traits like bone density, red blood platelet count, all kinds of weird things where, if you just happen to know the measurement of each of those 500,000 people, the algorithm learns how to predict it from the genotype. Just this summer, there was a very impactful paper that came out in Nature Genetics by a group primarily based at Harvard, at the Broad Institute, in which, for the first time, they could predict disease risk for a number of really major conditions like heart disease, diabetes, breast cancer, atrial fibrillation. And they could do that, again, from genome alone.

They could, by looking at the genomes of people, pick out, say, what one or two or 3 percent of the population was really at significantly higher risk for those conditions than the average person. This is really opening the door for what people have talked about for a long time, anticipated for a long time, but we were never sure when it was going to really arrive, and that is something we call precision medicine. Precision medicine means the medicine is really targeted toward the individual, and it's informed by things like the specific genotype or the specific DNA that that individual has.
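Picking out the few percent of people at highest risk amounts to ranking everyone by a genome-derived score and keeping the top slice. A minimal sketch, assuming each person already has a single numeric risk score; the scores and the `top_fraction` helper below are made up for illustration and are not taken from either study.

```python
def top_fraction(scores, fraction):
    """Return the indices of the people whose score falls in the top `fraction`."""
    n_flagged = max(1, round(len(scores) * fraction))
    # Rank people from highest to lowest score and keep the first n_flagged.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_flagged])


# Hypothetical risk scores for ten people.
risk_scores = [0.12, 0.95, 0.33, 0.48, 0.07, 0.81, 0.29, 0.55, 0.61, 0.18]

flagged = top_fraction(risk_scores, 0.2)
print(flagged)  # indices of the two highest-scoring people: [1, 5]
```

In a real health system the cutoff (1, 2 or 3 percent) would be a clinical decision; the ranking step itself is this simple.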

White: I think you just started to answer my next question, Steve. What will this technology allow us to do? How will it benefit people?

Hsu: I think the people who are aware of this research, and it really is a small number of people, are kind of the intersection of the set of people who know their way around genomes and know how to handle hundreds of thousands of genomes and the set who also understand machine learning and AI technology. In that intersection of those two sets of people are the researchers who can do this work, and that set of people has been anticipating precision medicine for some time. We actually made a prediction, in the case of height, that once we got past a few hundred thousand genomes to be analyzed, we would be able to crack that problem. What we have to look forward to is, I think, in the future, instead of just getting your blood lipid levels measured or your cholesterol levels or your PSA score from your doctor, your medical system will actually take your genotype, probably, maybe when you're first born.

At some point, you'll go in for your checkup, and your doctor will say, "Hey, we can actually do a lot in terms of figuring out health risks for you by looking at your genome. Do you mind if we take a cheek swab, a few cells from your cheek, or have you spit in this tube?" That will become standard. It will become normal for people to give up some DNA to their medical practitioner. That will become part of their medical record, and all kinds of things, like figuring out what drugs will work best for you, what specific health risks we need to monitor carefully for you, more than for an average person, all those things will become incorporated into precision medicine.

White: Wow. That is cool, Steve. Now, what are the next steps now in your research?

Hsu: We are pursuing this in a number of ways. This field is moving forward very fast, so there are breakthroughs by our group, our research group here at Michigan State, but also at other places, like at Harvard, in which a new condition or a new complex trait suddenly becomes predictable from genotype because we get enough data to really crack the problem. And that's just going on and on. I will predict within a year or two, there'll probably be a dozen serious disease conditions where one can reliably pick out the top few percent of people who are really at elevated risk. I think that's going to happen due to the efforts of researchers. Then what you're going to see is health systems start to move toward actually using these tools. Be on the lookout for, next time you're in for a checkup or some procedure, they may say, "Hey, Mrs. Smith, I notice we don't have your genotype yet. Do you mind if I take a few cells, a cheek swab, and genotype you?"

White: Well, Steve, is society ready for this?

Hsu: I think it's going to take a little bit of getting used to. In terms of disease risk, people understand that diseases can run in families. They're already familiar with the idea that there are single gene mutations that can literally cause a disease condition. We're familiar with Down Syndrome. We're familiar with specific mutations like BRCA that predispose you for breast cancer, et cetera, et cetera. People are a little bit familiar with it, but they're not prepared for the surprisingly strong predictive power that we're going to get. And predictive power is based not just on a single mutation but on combining the information from thousands of different locations in your genome. That's what the new technology permits, and so people are going to have to get used to that.

I think, gradually, people will appreciate it, because it would be nice to go to the doctor and say, "Hey, you know what? You're at bottom 1 percent risk for diabetes, so you can eat that dessert, but you're at top 1 percent risk for breast cancer. So we're going to actually monitor you two or three times more carefully than the average person." Everyone would be better off because the medical resources will be allocated much more efficiently in the future than they are today.

White: That's Stephen Hsu, Michigan State University's vice president for research and graduate studies. There's much more online at research.msu.edu.

MSU Today airs Sunday afternoons at 4:00 on 105.1 FM and AM 870.