Mentioned in Video:
- Simon's latest publication: PacBio ($PACB) Doubles Down on Accuracy by Acquiring Omniome: https://medium.com/@sbarnettARK/pacbio-doubles-down-on-accuracy-by-acquiring-omniome-875e64c943cf
- 🧬 ARKG | MASSIVE Breakthrough for ARK Invest's Genomics Stocks: https://www.youtube.com/watch?v=WCCv0H_fgU8
- 🤩 Cathie Wood's Favorite Stock | ARK Invest Buys This Over Tesla Stock (My Deep Dive into TDOC stock): https://www.youtube.com/watch?v=tBopWxiqgGs
- 💔 IT'S OVER! ARK Invest's $700M Breakup with Illumina (ILMN) – A Case Study in Conviction: https://www.youtube.com/watch?v=oH4llGAzi3Y
- 🧬🧪 ARKG: ARK Invest's BEST ETF | Editing Out Disease With ARK's Genomic Revolution Fund: https://www.youtube.com/watch?v=2dIHDOA8rRA
- 🧠 Deep Learning: This Disruptive Innovation Could Rewrite (or Erase) Us All: https://www.youtube.com/watch?v=J583J5Ts5uE
- The Genomics Age of Cancer Screening: Earlier Cancer Detection (Simon Barnett, ARK Invest): https://ark-invest.com/articles/analyst-research/the-genomics-age-of-cancer-screening-earlier-cancer-detection/
- Support the channel and get extra member-only benefits by joining us on Patreon: https://www.patreon.com/tickersymbolyou
🧬🧪 #SimonBarnett is an analyst at #ARKInvest on the genomic revolution strategy (found in #ARKG and #ARKK), where he focuses on next-generation DNA sequencing, #bioinformatics, molecular #diagnostics, and synthetic biology. We discuss what the future holds for these disruptive technologies and how Simon's focus areas at @ARK Invest are converging to change the #healthcare landscape.
I'm joined today by Simon Barnett, one of ARK Invest's analysts focused on the genomics revolution. So first of all, Simon, thanks so much for joining us; we're excited to have you. I'm going to ask you the biggest, most important question up front: how are you today, man? How's it going?
Thanks a lot. Yeah. No, thank you for having me. It's been a super exciting summer. You know, I think everybody is still suffering from a little bit of shell shock over the course of the year, you know, tons of deals. And I think it's an exciting, exciting time to be studying the healthcare space, for sure. I think as terrible as this whole entire situation has been broadly, I think there have been some interesting innovations and adaptations that have come kind of as a forcing function of what happened during the pandemic, and we can get into that a little bit, but, yeah, super busy, but happy to take some time out to talk about things.
Yeah. Awesome. We're really happy to have you. Why don't we start with telling us a little bit about yourself, your background and your focus areas at ARK Invest.
Sure. So I've been with ARK since the fall of 2018; I came straight out of undergrad at Johns Hopkins. I studied chemical and biomolecular engineering, which in retrospect was a good combination, especially given what I do now. It was an education that combined an understanding of the underlying chemistry on a molecular level, which is super important for the investment process, with the scale-up, right? Chemical engineering is all about taking something that happens on the micro level and scaling it up.
And oftentimes there are phenomena that happen as you try to do that which wouldn't necessarily be intuitive. For working in the genomic space, and we can talk a little bit about what that is, you have to be really attuned to what's happening in academia and in smaller research groups, and think about how that could actually get to commercial scale. So that's what I draw on from an education perspective. It's also great coming out of the Hopkins community, because I'm able to go back and talk to some professors and some of the folks who work at the hospital and get their insight and information.
And it's a really important part of our research ecosystem. So a little bit more about my background. I was involved in cancer research on the diagnostics as well as the therapeutic side. In college, I was working in kind of a handful of internship scenarios where I learned a lot about applying machine learning models specifically within computer vision. So I lean on those things a little bit. But in general, that's a little bit about me and kind of what led me to getting involved with ARK right after school.
Yeah, that's awesome. We definitely have a lot of overlap, especially in the machine learning end. So I'm really excited to geek out with you about that. So why don't we take a step back and talk a little bit about health care today? Right. So genomics is broadly tackling a whole bunch of challenges in health care today. So can you shed some light on the context of what's going on in healthcare? What are some of the big challenges today and how genomics broadly is solving some of those challenges?
Sure. So I think it's probably best to start with a brief 30,000-foot overview of genomics generally, because even the difference between something as simple as genetics versus genomics confuses people. People hear that and they're like, what is the difference? So the first time we really had any insight into the structure and the mechanisms of heredity on a molecular level was the 1950s. And we spent the second half of the 20th century developing all sorts of new tools, instruments, and methods to increase the volume of information that we could generate.
And as I'm sure many of the people watching this know, the Human Genome Project, which kicked off in the early 1990s, was essentially a global effort to map the first whole human genome. I think of it sort of like what was happening in the 1600s and then the 1700s: sending sailors out to chart totally unknown parts of the ocean and coming back with iteratively better maps that plotted an archipelago where there wasn't one before. So it was a little like that.
And the idea was to enable this whole ethereal concept of precision medicine. And we can talk about what that means because you hear it and you're like, oh, that sounds nice. But what does it actually mean? How does it work? The idea was like, okay, we have all these medicines and we can deploy them. But sometimes there is a difference in outcome. Someone who takes one medication for cancer could be really successful. And someone who's on the same exact medication, same dose might have a completely different outcome.
So we're like, why does that happen? And the key to this is really zooming in as much as we can to understand, again on a molecular and a genomic level, why those differences in outcome actually happen. There's a lot to be said about that. But to close the loop on genetics and genomics: back in the '50s, it was so expensive to look at genetic information. And through the '70s and beyond, looking at a single gene was this massive endeavor. Then the Human Genome Project literally took $3 billion and about a decade to put together, and even then it covered only about 90% of the human genome, our first draft sequence. So we went from studying single genes to studying whole genomes. That's part of the reason for the change in verbiage: the scale of what we're looking at is totally different. And it brings with it a whole new set of problems, because beforehand we had a dearth of information. There was no info out there.
But now that we can print genomes pretty trivially, with some nuances, you're working from this angle of: okay, how do I make sense of an entire sea of data and get to an answer that is packageable in such a way that a physician who may or may not be trained in molecular genetics can say, okay, this is what I need to do to augment care, and this is why? Because from the diagnostic perspective, if you're not altering clinical management, you're not adding value.
You're just adding costs, right? And I want to use this to kind of dovetail into your question, which is: okay, there's genetics and genomics, but how does it integrate into the healthcare system? There are three main areas that I want to draw out that make this happen. It's actually one of the ways that we mirror our own investment strategy and the way we divvy up responsibilities internally. So you have the instrumentation: sequencers, digital spatial biology instruments, and we can talk about all those different things, but essentially they give you a view on a molecular level of what's going on inside our bodies.
So that's the instrumentation layer, and it generates data. Stacked on top of that are the diagnostic ecosystem and the therapeutics ecosystem. Diagnostics is partially about the prevention of disease; an ounce of prevention is worth a pound of cure. That means detecting disease early, diagnosing it more accurately, and then surveilling over the long term to make sure the treatment, or whatever you're doing, stays efficacious. So that's molecular diagnostics. Then we have therapy. People may have heard about CRISPR gene editing or immunotherapies.
There's a whole long list of therapeutic modalities that have been enabled by sequencing and by diagnostics. It's sort of like an artillery strike: you have to have the coordinates fed over to the howitzer to actually launch an attack. So there has to be an interplay of information, and that exchange of data is where computational biology, bioinformatics, and all of the digital infrastructure come in, taking massive multimodal datasets that include your genetic information but also something that we call phenotype, which people may have heard about in their biology classes.
Phenotype is what more classical symptoms you have, or the ways the disease, or the lack of disease, is manifesting. So it's all the clinical information that gets collected when you have a physical, right? It's merging those datasets in a way that's useful and then, again, packaging up and reporting it. These are the different ways that genomics is broadly driving itself into the health care system, and we can talk a lot about it. There are interesting infrastructural barriers, definitely, and in health care we have to talk about incentives, because someone's always losing money. So we have to think about the flow of cash as part of the rationale for why certain business models are successful and others are not. But on a very high level, those are some of the broad areas that I would outline, and I'd be happy to dig into any one of them in particular.
Yeah, sure. I think we should start with the data, right? Why don't we tackle the stack in order? The first thing you do is gather lots and lots of data. Can you walk us through how that's changing from previous health care data, for example the sheer quality and quantity of data being collected, and the types of personalized decisions it's being used for in medicine?
Sure. I think it's best if we think about this with a specific disease area, so people can wrap their minds around it and think about the patient experience, because you want to talk about this from the perspective of the individual. And as an aside, this is one of the reasons why I love talking about this: of all these different areas, health care and genomics is the one that is guaranteed to affect you. At some point, it's guaranteed to affect your family, your friends, your loved ones.
And yet, when it works perfectly, you never see it. It has a different psychological charge than, say, buying a new Tesla, and not to say anything negative about that, it's awesome. You have a brand new car; it's an immediate value add to your life. Versus when I start talking about precision medicine and diagnostics, you're like, well, I don't want to get my blood drawn. That doesn't sound fun, and it's not, but it could potentially help you avert horrible situations later. So putting that aside, let's talk about it in the context of oncology. Cancer care is one of the biggest applications of genomics, across every part of the stack.
You mentioned the data, the diagnostics, the therapeutics; oncology is a main focus point for all of them, including from a data perspective. I think a good reference point would actually be the ancestry companies and the first wave of 2006/2007, because for many people, especially folks who aren't steeped in what's happening in molecular oncology, that may be their only reference point for what a genetic test is. So the first thing I can say is that medical genetics based on DNA sequencing is completely unlike ancestry-style testing, even though there might be some level of health data packaged alongside those ancestry results.
But I can tell you that in terms of comprehensiveness and accuracy, it's a night-and-day difference. What used to happen years ago is there really was no reimbursement in preventive medicine. Our health care system is built to spot and treat; it's built to reimburse acute disease, right? Disease that is certainly there, is causing symptoms, and is able to gather reimbursement. But there's really not a whole lot of money in preventive medicine, because when you're talking about prevention, you have to test or give therapies to huge numbers of people: for every person who's sick, there are a million healthy people.
Sorry, not to interrupt you, but not to mention false positives: diagnostics are often unreliable and inconsistent, with all the issues associated with probabilistic instead of deterministic diagnoses, right?
Exactly. And then there are the signals these tests can give, especially with something like a polygenic risk score, which I can talk about in a moment. These scores place individuals along a continuous distribution of risk, and there's not a clear or universally agreed-upon cutoff point for saying, oh, this person is two standard deviations above the norm for developing early-onset cardiovascular disease. Well, two standard deviations would be huge. But sometimes these tests yield results where the odds ratio is only something like 1.1 or 1.2, meaning the odds are just 10 or 20% higher than baseline, and it becomes ambiguous how to move people through the care pathway. So that's a piece of it. To close the loop on the whole idea around the volume of data we're generating.
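The two-standard-deviation and small-odds-ratio points can be made concrete with a few lines of Python. This is a minimal sketch that assumes risk scores follow a normal distribution (a simplification for illustration), and the 5% baseline risk is a hypothetical number, not one from the conversation.

```python
import math

def percentile_from_z(z: float) -> float:
    """Share of a normally distributed population scoring below z standard deviations."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def risk_after_odds_ratio(baseline_risk: float, odds_ratio: float) -> float:
    """Absolute risk implied by applying an odds ratio to a baseline risk."""
    baseline_odds = baseline_risk / (1.0 - baseline_risk)
    new_odds = odds_ratio * baseline_odds
    return new_odds / (1.0 + new_odds)

# Two standard deviations above the mean is roughly the top 2-3% of the population.
print(f"z = 2.0 -> percentile {percentile_from_z(2.0):.1%}")

# A modest odds ratio barely moves a hypothetical 5% baseline risk.
for ratio in (1.1, 1.2):
    print(f"odds ratio {ratio}: 5.0% -> {risk_after_odds_ratio(0.05, ratio):.1%}")
```

Under these assumptions, an odds ratio of 1.2 moves a 5% risk to just under 6%, which illustrates why results like that are hard to act on clinically.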
It wasn't too long ago that single-gene tests were doled out at $20,000, $30,000, or $40,000 a gene and would take half a year to return results to patients. Nowadays, there's a vanguard of companies investing aggressively in driving these costs to the floor while at the same time increasing quality. So we've gone from a time when these tests cost tens of thousands of dollars to the point where they are as low as $250. And instead of three, four, or five genes, the panels include 300 or 400 genes that are well studied and well characterized, and we're confident that the results delivered to individuals and to doctors are actionable.
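A back-of-envelope Python sketch of that cost decline, using the low end of the single-gene prices quoted above and a 300-gene, $250 panel. It ignores differences in what each test actually reports, so treat it as rough arithmetic rather than a like-for-like comparison.

```python
def cost_per_gene(test_price: float, genes_covered: int) -> float:
    """Price of a test divided by the number of genes it covers."""
    return test_price / genes_covered

old = cost_per_gene(20_000, 1)  # single-gene test at the low end of the quoted range
new = cost_per_gene(250, 300)   # $250 panel covering 300 genes
print(f"old ${old:,.0f}/gene, new ${new:.2f}/gene, roughly {old / new:,.0f}x cheaper per gene")
```

On these numbers the per-gene price falls by more than four orders of magnitude; plugging in $40,000 per gene and a 400-gene panel instead makes the gap larger still.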
That's what you really want: okay, I got this answer, but I need to redirect and course-correct, whether it's behavior or medication, because no one wants to be left stranded with these results without a clear pathway and a support system to guide them through it. So many of these companies are innovating not just on the molecular level but on the packaging of the reporting, for example by connecting an individual with a genetic counselor, someone who is essentially the broker: okay, I understand everything about genetics.
How do we translate that into clinical care for a physician or a primary care doc who may not have the background? So there's concomitant business-model and service-level innovation that is equally as important as the genetics. And that goes hand in hand with what I was saying about the digital infrastructure for brokering the relationship between diagnostics and treatment; there's a middle layer of innovation going on there. Broadly speaking, over the past few years there's been a huge increase in the volume of data contained in these tests, and we still try to ensure that they're as actionable, and the information as sturdy, as our cumulative understanding of genetics allows.
So that's really the direction things are trending. To forecast it out a little bit, something I think we will probably see in the modern healthcare system, certainly before the end of the decade, is routine whole genome assembly. I know we're bringing up a lot of topics, and I'm happy to talk about them all. But basically it's what's called de novo assembly, which means taking your cells, sequencing all 3.2 billion letters that make up your genome, and piecing them together from scratch rather than relying on a reference genome.
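As a toy illustration of the assembly idea (reconstructing a sequence from overlapping fragments without a reference), here is a greedy overlap sketch in Python. It assumes error-free reads, no repeated regions, and a tiny made-up sequence; production assemblers use graph-based methods and have to cope with sequencing errors and repetitive DNA, so this is a teaching example only.

```python
from typing import List

def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate spots where b's prefix appears in a
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1

def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair of reads with the largest overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # nothing overlaps: leave the remaining pieces as separate contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs

fragments = ["ATGCGTA", "CGTACGT", "ACGTTAG"]  # hypothetical reads from "ATGCGTACGTTAG"
print(greedy_assemble(fragments))
```

With these inputs the sketch prints `['ATGCGTACGTTAG']`, recovering the original toy sequence from its three overlapping pieces.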
And keep in mind, for folks who don't know, your genome and mine are about 99% identical. But that 1% of variability is the reason why we have different paths through life, why we look different, why the diseases we get are different. So it's really important to have super high accuracy and a lot of comprehensiveness. I think it's not going to be too much longer, given the cost declines we're seeing and the reimbursement momentum carrying over into the payer community, before full genome sequencing and full genome assembly at scale are fairly routine, at a price point that people could afford out of pocket even outside a reimbursement envelope. And I think all of the digital infrastructure that comes with that, for making sure it's translated efficaciously into clinical practice, is going to be fairly routine too.
Yeah. There's a lot of great stuff to unpack there, and I'm going to do it in reverse order. The last thing you mentioned was that we're riding down some really steep cost curves, right? One obvious one is the cost of genome sequencing. Can you speak to some of the other cost curves that are overlapping and making this industry more accessible and exploding with new opportunities?
Sure. So sequencing, you're right, is the bedrock behind all of it. There is an important nuance, though, to bring up before talking about some of the other cost declines. If people have seen the last Big Ideas paper that we put out, we talked about the different cost curve dynamics between something called short-read sequencing and long-read sequencing. These are two different methods of essentially doing the same thing, which is taking the genome, fracturing it into small fragments, and reading them. But the reads from short-read platforms are about two orders of magnitude shorter than long reads.
And I'll talk about why that's important later. But the point is, these are on different cost curves, and that matters for some applications, specifically the ones that have to do with heredity. So not the cancer diagnostic stuff, where you get a blood test and look just at your cancer, but the situations where you're assembling entire genomes from healthy people or from people with a suspicion of disease. Those cost curves historically have been coming down more slowly, but as of late, the long-read cost decline is really starting to accelerate, and we think that's going to be hugely important for hereditary applications of genetics.
So that's the first thing: both of those are coming down. The second one, and I'm sure this won't surprise you, is that with all of that data come decreasing costs for training and developing neural networks and machine learning algorithms. This goes back to what I was saying about making sense of large volumes of data and having systems that are interoperable with classical healthcare data, which is unstructured, like doctors' handwriting. Importing that, right, is a neural network task: reading the handwriting. So everything having to do with understanding data, and the software infrastructure behind it, is really important.
Some of the other work that we do might surprise people. My colleague Sam Korus does a ton of work on understanding the economics of collaborative and assisted robotics, so autonomous systems that are not necessarily used for something like driving, but are used in industrial-scale settings to perform menial tasks. One of the ways that has manifested is industrial-scale sequencing, and not just for diagnostic applications. Think of companies doing computer-aided drug discovery and design, or companies doing synthetic biology, using genetic engineering to create microorganisms that can print physical goods at a much better price point and throughput than legacy petrochemical methods.
All of these places use collaborative robotics. If you go inside these labs, you'll see really elegant, almost symphony-like orchestrations of collaborative robots running lab equipment and moving things around. As you scale, you're trying to remove the human element, because you want to suppress the inter-experiment or inter-test variability that can come with it. So I'd say broadly, it comes down to software cost declines, sequencing tech, both long and short read for different applications, and collaborative and assisted robotics.
All these things are coming together for certain to make these things go from completely cost prohibitive to something that you can pay for out of your own pocket.
Yeah. I had to literally hide my face there, because that is a vision of the future. That sounds super awesome. I'm excited to... Is there something you can point us to where we can learn more about which labs are doing that, just further reading for me and the audience?
Yeah. There's a company that we are an investor in, and I think they've done a really tremendous job of making this easy to watch and look at, and we want to make things tangible. It's called Ginkgo Bioworks, which I'm sure some people might be familiar with. It's one of the largest synthetic biology companies, and through its financing, it has a series of literally, like, biological factories; they're called Bioworks foundries. There are five of them, and I think there might be more by the time we finish this.
But they're all based in Cambridge, up in Boston. Each of these is an amazing interplay between some of the most sophisticated and experimental life science tools, things that are looking at single-cell biology in a massively parallel fashion and doing spatial biology, understanding where things are organized inside of cells, and robots that are all super low latency. The software and the code base are being updated so quickly that these systems are learning: by the end of every workday, they might be behaving slightly differently than they did at the beginning of it.
So it's really amazing to see that interplay between different… It could not be closer to how we think about the future of investing in general, which is the multiplicative effect of all these different exponential cost curves colliding in such a way that you get totally freakish and alien output. So I would encourage people to go on their website. They just did a kind of welcome: people have never heard of us, or maybe you have, but we're public now, so you should come watch all of our videos and learn about us.
They did a great explainer series on what the labs look like. And there's a lot of footage and yeah, I think that's probably a good one to check out.
That's awesome. Yeah. So in addition to these sort of large scale cost curve declines, what other metrics, performance metrics should we be thinking about where we can gauge these companies and their performance? Is it like imaging resolution? Is it some sort of throughput? Can you walk us through some of the other performance metrics we can be looking at?
Sure. It's a little different depending on which of the areas I outlined you're looking at. There's one rubric for sequencing companies and a different rubric for diagnostics companies. What I would say generally is, if you're an investor trying to understand how companies in the sequencing space are doing, you can look at the throughput of their instruments, for example. You can look at accuracy. I just published a piece a little while ago about why sequencing accuracy is everything in the clinical space, especially for applications where you're looking for a signal that is very, very rare relative to background noise.
An example of that could be early-stage cancer detection. Without going too far into detail, you have a cancer, a tumor, beginning to grow somewhere in the body, and it is quite literally shedding small pieces of cancer-derived DNA into the bloodstream. If you compare these fragments to the healthy background DNA that's constantly being shed and metabolized in your bloodstream, for early-stage cancer the ratio can sometimes be a million to one. So when you capture that one fragment, you need to be able to yield a confident diagnostic result off of it, and make sure you're not calling false positives on all the background fragments.
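A rough Python sketch of why accuracy dominates at a million-to-one ratio. The fragment count and error rates are hypothetical and the error model is deliberately crude (real assays are far more sophisticated), but the arithmetic shows how even a tiny per-fragment error rate can bury a handful of genuine tumor fragments.

```python
def expected_calls(n_fragments: int, tumor_fraction: float, error_rate: float):
    """Expected tumor fragments vs. expected false tumor calls from background DNA.

    Simplifying assumptions: every tumor fragment is detected, and each background
    fragment is independently miscalled as tumor-derived with probability error_rate.
    """
    tumor = n_fragments * tumor_fraction
    background = n_fragments - tumor
    false_calls = background * error_rate
    return tumor, false_calls

N_FRAGMENTS = 10_000_000  # hypothetical number of fragments examined in one blood draw
for err in (1e-5, 1e-6, 1e-7):
    true_hits, false_hits = expected_calls(N_FRAGMENTS, 1e-6, err)
    print(f"per-fragment error {err:.0e}: ~{true_hits:.0f} real vs ~{false_hits:.1f} false calls")
```

At a per-fragment error rate of 1e-5, the expected false calls outnumber the ten real fragments roughly ten to one; only around 1e-7 do they fall to about one.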
So accuracy is everything in what I would call the needle-in-a-haystack genomic applications, like those liquid biopsies. So there's accuracy, and we could talk about throughput, like I mentioned… To your point about resolution, there's this whole interesting new field of digital spatial biology. I'm sure people know that a lot of diagnostic workflows involve medical imaging, so literally a visual look at the pathology and physiology of tissue to understand what's wrong. Some of these approaches, like histochemistry, have been around for dozens of years, and they're fairly commoditized at this point.
And their resolution is pretty low compared to what we can get with these new molecular methods. So when we look at this new wave of digital spatial biology companies, resolution can be everything. Sometimes the resolution of these instruments is such that you can literally see whether an RNA molecule is still inside the nucleus or has been kicked out into the cytoplasm. I mean, you're talking about single molecules, not even single cells. To reach clinical scale and move from the research to the translational to the clinical setting, there need to be diagnostics that are actually approved and regulated and drawing reimbursement for that to carry forward.
So for the digital spatial biology companies, I would say resolution, and also something called multiplexing, which is another area that's really important. I like to use this analogy of muffins when I talk about multiplexing: if it costs me a fixed amount of money to run an oven, a fixed amount of energy, I want to bake as many muffins as I can per run of that oven, because as the diagnostic provider I'm getting paid by selling, well, each of those muffins, which would be like a patient sample.
So I'm billing per sample in that run. Multiplex, or just plex, generally refers to how many individual samples I can run in parallel and get reimbursed for. For some instruments, it's harder to multiplex; for others, it's very easy. And for those who may be interested, this is super, super cool. Here's one of the ways we do it on sequencers, especially at industrial scale. Imagine you want to run 100 patients' DNA at once, and you load it all into the sequencer through a cartridge called a flow cell.
How do you tell whose DNA is whose if it's all mixed together on this little glass wafer in the machine? What we do is literally anneal, or attach, a barcode, like a QR code, to the end of every DNA snippet. So when the sequencer reads it, it reads the barcode and goes, ah, I can bin all that data into person A, I can bin all this data into person B. It's called a unique molecular identifier, or UMI, and it's physically a small strand of DNA.
So you have to sequence it. And there's this problem where, if your sequencing accuracy is lower, the likelihood that the sequencer makes a mistake in that barcode region goes up. So you have to design these barcodes carefully: if you make them longer, you can do more patients, because there are more unique identifiers available in the batch. But you run the risk that if there's an error in one of them, you misattribute the data to the wrong patient, which could be pretty disastrous. So that's an example where the chemistry and the accuracy affect the multiplexing.
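A minimal Python sketch of the barcode trade-off and the muffin math described above. The barcode lengths, per-base error rates, toy reads, patient labels, and the $1,000 run cost are all hypothetical, and exact-match demultiplexing is a simplification of what real pipelines do.

```python
from typing import Dict, List, Tuple

def n_barcodes(length: int) -> int:
    """Distinct DNA barcodes of a given length (four possible bases per position)."""
    return 4 ** length

def p_barcode_misread(length: int, per_base_error: float) -> float:
    """Chance that at least one barcode base is misread, assuming independent errors."""
    return 1.0 - (1.0 - per_base_error) ** length

def cost_per_sample(run_cost: float, plex: int) -> float:
    """Fixed run cost (the oven) spread across every sample (muffin) in the run."""
    return run_cost / plex

def demultiplex(reads: List[str], barcode_to_patient: Dict[str, str],
                barcode_len: int) -> Tuple[Dict[str, List[str]], List[str]]:
    """Bin reads by their leading barcode; reads with unrecognized barcodes are set aside."""
    bins: Dict[str, List[str]] = {p: [] for p in barcode_to_patient.values()}
    unassigned: List[str] = []
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        patient = barcode_to_patient.get(barcode)
        if patient is None:
            unassigned.append(read)
        else:
            bins[patient].append(insert)
    return bins, unassigned

# Longer barcodes allow more samples but have more positions that can be misread.
for length in (8, 12):
    for err in (0.001, 0.01):
        print(f"{length}-base barcodes: {n_barcodes(length):,} possible, "
              f"P(misread) at {err:.1%}/base = {p_barcode_misread(length, err):.2%}")

print(f"$1,000 run: ${cost_per_sample(1000, 1):,.0f}/sample at 1-plex, "
      f"${cost_per_sample(1000, 100):,.0f}/sample at 100-plex")

barcodes = {"ACGT": "patient A", "TGCA": "patient B"}
reads = ["ACGTGGCTTA", "TGCAACCGTT", "ACGAGGCTTA"]  # last read has a misread barcode base
bins, unassigned = demultiplex(reads, barcodes, barcode_len=4)
print(bins, unassigned)
```

The misread barcode in this toy example is caught only because `ACGA` matches no patient; a misread that happened to produce another patient's barcode would be silently misattributed, which is the failure mode described above. Real barcode sets are typically designed so that any two barcodes differ at several positions, making single errors detectable.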
And I'm billing based on the multiplex. It might be kind of a rabbit hole to go down, but that's an example of one of the metrics we look at: multiplexability on some of these platforms, especially the clinical ones. Moving on outside of instruments, and feel free to cut me off for giving too many examples: in the diagnostic space, there are so many different types of diagnostics. You can divide them up by disease area. You can divide them up by the stage at which they're meant to be used.
Is it preventive? Is it screening? Is it prognostic, meaning, is this going to be an aggressive or non-aggressive disease? You can talk about them in the context of therapy optimization: which drug should I pick? And you can talk about the ones that are meant to be long-term surveillance tests, making sure that cancer doesn't come back, for instance. The reason I think it's a little challenging to drop a single metric in the bucket is that it really depends on the clinical question you're trying to answer.
Broadly speaking, and you actually mentioned this a few times, the easiest one that is common across the board is the analytical variables called sensitivity and specificity, basically the true positive and true negative rates. How good is a test at telling me I'm healthy when I'm healthy, or that I'm sick when I'm sick? Sometimes you want to tweak those on the back end, depending, again, on the clinical question. Take a cancer screening test, for example. One issue that I think is really important for the long tail of people interested in genetics and genomics to think about is the cancer screening problem in the post-genomic era.
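Both definitions reduce to simple ratios over a test's true and false calls. A minimal Python sketch with made-up counts from a hypothetical validation study (no real test is implied):

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """True positive rate: share of sick people the test calls positive."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg: int, false_pos: int) -> float:
    """True negative rate: share of healthy people the test calls negative."""
    return true_neg / (true_neg + false_pos)

# Hypothetical validation study: 100 sick and 10,000 healthy participants.
tp, fn = 90, 10
tn, fp = 9_900, 100
spec = specificity(tn, fp)
print(f"sensitivity {sensitivity(tp, fn):.1%}, specificity {spec:.1%}, "
      f"false positive rate {1 - spec:.1%}")
```

Note that the false positive rate is simply one minus specificity, so a "1% false positive rate" pitch is the same statement as "99% specificity".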
It's something that we've been working on for about a year and a half, and we continue to iterate on it because it is such a complex issue. You can talk about how great a technology is, but you have to ensure that, in addition to being a great technology, it fits into the clinical care ecosystem. So there's a pragmatic element to those two things. For cancer screening, something you'll hear sometimes is that the metric used as a pitch for these tests is the false positive rate.
So let's just say, arbitrarily, that some test, no test in particular, has a 1% false positive rate. You hear that and you're like, 1%, that sounds awesome. That's 99% true positive, I guess. That's 99 to one; I'll take those odds. That's a totally natural way to think. And with the marketing behind that, you have to be careful, because people who aren't super steeped in biostatistics can make the wrong decision for themselves based on hearing something like that.
So the added bit of information I would give here is that this number, the false positive rate (one minus what's called specificity), doesn't mean a whole lot until you understand it in the context of the disease you're looking for. Here's an example: say I've got a test that's literally a coin flip, 50% likely to come up either way. I take a room of a hundred people, and let's say 80 of them, 80%, are sick.
I flip the coin, which is totally random, on all of them and compute the statistics afterwards. My test is going to look pretty good, certainly better than if I had one sick person in the room, because the number of false positives I throw is going to be very different. So even though the 50% specificity never changed, the positive predictive value, meaning the percentage of people the test called positive who actually are positive, changes drastically. So the question you should be asking is: 1% sounds great,
but is it sufficient given the frequency of the disease in the population I'm studying? For asymptomatic cancer screening, we think a 1% false positive rate is not going to cut it. You need to be talking about specificity of something like 99.9%, and the more nines, the better, because there is real human harm in false positives. I know people like to think of a diagnostic as, oh, it's just a blood test. It's not, because the next thing that could happen to you is a more invasive procedure that may not be covered, or having to suffer through a more invasive biopsy.
So there is real, tangible harm from false positives as well as false negatives. Of course, there's a consequence for everything. But at the margin, sensitivity and specificity are big ones to think about. What you really want to look at, though, is the negative and positive predictive values, especially in the context of the disease you're looking for. I think those are pretty important metrics as well.
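To make the coin-flip argument concrete, here is a minimal Python sketch of positive and negative predictive value via Bayes' rule. The function name `predictive_values` is hypothetical, and the second loop uses made-up numbers (65% sensitivity, 0.5% prevalence) purely for illustration; they are not figures for any real test.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a test applied to a population with the given prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)  # share of positive calls that are truly sick
    npv = true_neg / (true_neg + false_neg)  # share of negative calls that are truly healthy
    return ppv, npv


# The coin-flip "test": 50% sensitivity and 50% specificity in both rooms.
for prevalence in (0.80, 0.01):
    ppv, npv = predictive_values(0.5, 0.5, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")

# Hypothetical screening test: 65% sensitivity and 0.5% prevalence are assumptions.
for specificity in (0.99, 0.999):
    ppv, _ = predictive_values(0.65, specificity, 0.005)
    print(f"specificity {specificity:.1%}: PPV {ppv:.1%}")
```

With the same coin, PPV is 80% in the room where 80 of 100 are sick and 1% in the room with one sick person. Under the made-up screening numbers, a 1% false positive rate means only about one positive call in four is a real cancer (PPV near 25%), while 99.9% specificity lifts that to roughly three in four (near 77%).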
Yeah. I think it's also important to note that many of those are improving at the same time. We've talked a lot about a single-cancer pipeline, but one of my favorite articles of yours is about multi-cancer screening. To tie that back to our muffin example a little: not only can we run one pipeline much better today, thanks to genomics and the other spaces we're talking about, we can also do things like multi-cancer screening. Can you talk a little bit about that?
Sure. For those who don't know: on the surface, these tests look exactly the same. To take a step back real quick as they start to commercialize: by 2030, we think these tests are going to be massively affordable. $50 might be the price point, and there will definitely be companies that drive the cost to the floor. There will be others, the specialty diagnostic companies, that aren't incentivized to lower prices, because they may have long-term contracts in place with payers, and because neither their shareholders, their business models, nor, frankly, the state of their technology gives them the flexibility to lower prices.
So it's going to be a mix. But regardless, on the surface, they look the same: a blood test. A single tube of blood is taken; in some instances, you could go to a Labcorp or any sort of clinic to get that done. In other cases, for people who are immunocompromised or otherwise unable to travel, mobile phlebotomy at home could be an option. Either way, it looks the same.
What's very different is what's happening under the hood. To your point, the difference between single-cancer and multi-cancer is whether the test is looking for one cancer type, such as colorectal cancer, or for multiple cancers, such as all cancers of the GI tract or all solid tumors; you can cut up the whole bolus of cancers any which way. I would even go so far as to make a third distinction, which technically falls under the multi-cancer umbrella.
What we've started to notice recently is that some companies are going after the rifle-shot, single-cancer approach (we'll talk about its merits), while others are pursuing the multi-cancer approach, which is, again, more than one cancer but generally fewer than a dozen, and there's a reason you go for that middle ground. And then there's the pan-cancer test, which basically includes everything you can: 50, 100, I don't know exactly how many, but certainly several dozen cancers packaged together.
You could look at that and say, well, why would you ever want a single-cancer or even a multi-cancer test if we have pan-cancer tests that can look for more cancers? On some level, I think that's a pretty sound way of thinking about it. Why wouldn't you always want to look for more cancers? But the devil is in the details. Part of my current understanding comes from what's actually happening at the clinical level: even if the algorithms the tests run on are performant and give you, like we talked about, good sensitivity and specificity,
that doesn't necessarily propagate downstream to how the test actually gets put into clinical practice. I'll give an example where this will make sense. We have to look not just at what a single diagnostic vendor can control, but at how these things actually diffuse into the health care system. The example I'll use is a single-cancer test versus a multi- or pan-cancer test. Let's say I'm designing a multi-cancer test that covers a couple dozen different cancers, and on the back end I'm trying to tweak the sensitivity and specificity like we talked about.
And there's a trade-off. For a given test, you can't push them both up at the same time; you have to balance them. And for some cancer types, the method or the assay design (an assay is just how you build a diagnostic test and what's included on it) may not generalize well, because different cancers are unique. Some cancers are louder than others, meaning they shed more signal. Other cancers have what's called a very low mutational burden, so they don't actually have many mutations that they can shed into the blood.
In those cases, we may have to rely on early warning signals from your adaptive immune system that could be harbingers that something is growing. All the different cancer types are variegated, so I think it's hard for one test to be one-size-fits-all here. The second big point is that, algorithmically, when you're optimizing around all these different conditions, it's very difficult to bring the sensitivity, which is, again, the true positive rate, up to the level
that a lot of specialized oncologists are used to. For example, say I'm an oncologist who typically sees people with colorectal cancer, and there's a multi-cancer test out there boasting a sensitivity, across the whole test, of maybe 65%. That could be pretty good for the intended use case of that multi-cancer test. But with something like Cologuard or a diagnostic colonoscopy, a single-cancer test optimized for only one cancer type, you can bring the sensitivity up to something like 95%.
In those cases, an oncologist may not consider their indication, meaning their cancer type, truly screened for unless they're given a test with the level of sensitivity they're used to. That's actually one of the reasons we think single-cancer, multi-cancer, and pan-cancer tests are going to coexist. It has to do with the complexity and differences in the biology of these tumors, the different patient use cases and intended-use populations, and the difficulty of adapting a single algorithm to satisfy all of those conditions simultaneously.
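One way to see why a headline number can underwhelm a specialist: a multi-cancer test's overall sensitivity is, roughly, a case-mix-weighted average of its per-cancer sensitivities. The sketch below assumes that simple weighting; the cancer labels other than colorectal, the shares, and every sensitivity value are made-up illustrations, not data from any real assay.

```python
# Hypothetical multi-cancer test: (share of cancers in the screened population, per-type sensitivity).
# All numbers are illustrative assumptions, not figures from any real assay.
PER_TYPE = {
    "colorectal": (0.25, 0.60),
    "type_B": (0.25, 0.80),
    "type_C": (0.25, 0.55),
    "type_D": (0.25, 0.65),
}


def headline_sensitivity(per_type: dict[str, tuple[float, float]]) -> float:
    """Case-mix-weighted average of per-type sensitivities."""
    total_share = sum(share for share, _ in per_type.values())
    return sum(share * sens for share, sens in per_type.values()) / total_share


print(f"headline sensitivity: {headline_sensitivity(PER_TYPE):.0%}")
print(f"colorectal-only sensitivity: {PER_TYPE['colorectal'][1]:.0%}")
```

Under these assumptions the test advertises 65% overall, but a colorectal specialist comparing it against a single-cancer test in the 90s would see only 60% for their indication.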
That leads us to the position that they're probably going to coexist; there's probably a market for all of them. You just have to size them relative to one another to understand how big the opportunity is for each. The last point I'll make on this is that we have to think about next steps, because no diagnostic lives in a vacuum. It's going to route you to the next node in your care pathway. So, for a cancer screening test,
one of the bells and whistles getting tacked on, which is currently pretty divisive, is the idea of molecular tissue of origin. Basically, that means that in addition to telling me whether I have cancer, the test tells me where it is in my body. Again, on paper, that sounds awesome. It sounds like it would remove complexity from the system. I understand that argument, and I'm willing and happy to hear arguments in its favor, but it's getting tougher for me to accept.
Essentially, the problem is this. Let's say you look at 30 cancers, you're doing the tissue-of-origin readout in addition, and 25 of those cancers are ones we've never been able to screen for in history. We've never had a test we could use. Some people know about mammography for breast cancer. There are other, lower-quality ways to screen for prostate cancer. Colorectal has Cologuard and colonoscopy. Those are cases where there's an established path: you're in the high-risk group because you screened positive,
so we're going to route you to a diagnostic colonoscopy to confirm. Then comes the diagnosis, then the pathologist and the oncologist, and on down the line, with cost accruing, complexity, and pain. You have to think about all these human sides as well. So if you have a test where 25 of the cancers don't have an existing care pathway, it can be difficult, especially in the community setting, where about 80% of cancer patients get treated, for the system to know
where to send you next, and whether your insurer is going to cover that next step, because, again, this is totally new territory. So part of the next decade is going to be not just technological development but understanding how to integrate it into best practices. This is part of the rationale for why we think a better method initially, even though it might not be as elegant or, in the very long term, as performant, is to take the positive and pair it, in every case, with something like a PET-CT scan.
Doing a whole-body PET-CT scan without a screen first can be kind of dangerous, because you surface all these incidental findings and it can get messy. But if you have a screen positive from one of these multi-cancer tests and then you image, you get a level of information that even a super-accurate tissue-of-origin readout couldn't give you. If it says, okay, you have kidney cancer: well, which kidney?
As far as I know, there's no biomarker for left kidney versus right kidney. We'll see. But that's the general idea behind where we currently are in our own understanding.
Sure. When you zoom all the way out and think about the ten-year future of care pipelines, do you imagine it being… I'm imagining there's some trade space between the breadth of a test and the accuracy of each individual part of it. So one way to couple them and have them all coexist is: if you're feeling crummy in general, we start with broad tests. As we narrow down what you may have, the tests get more accurate and individualized, but we're not spending resources giving you very accurate tests for things you have a low statistical probability of having.
And then we pair that with the most expensive imaging techniques only when we're really sure: there's a good chance you really have this, and your insurance has a really good chance of covering it. And there are real financial health care implications to only running all these expensive diagnostics once we're definitely in the territory of things you may really have. Can you describe how all those things fit together? Am I close? Am I way off?
It sounds like you already did it. Yeah, you're exactly right. If you zoom out just like you did, what you're describing is called the diagnostic funnel. The idea is that we start at the broad end, where we don't know anything about people, and we want to get to a state where we're not treating them as a one or a zero, sick or not sick, because that's so low-resolution and crude. It doesn't give the patient or the physician any line of sight on when someone is near a transition state between healthy and sick.
Ideally, to your point, what we want is to dole out diagnostics and preventive techniques that narrow the funnel, so that we only have to focus the more expensive treatments and tests on a smaller set of individuals who are much more likely to benefit. Otherwise, the system isn't scalable. It's not profitable. There's value extraction, and in many cases the cumulative burden of all this testing is too big for the US healthcare system to bear, let alone in other nations where the reimbursement system isn't as cohesive or widespread, and certainly in places with no reimbursement, where you have to finance it through patient pay or some other way.
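The funnel logic can be sketched as expected counts flowing through successive tests, where only positives advance. This is a toy model under stated assumptions: stage errors are independent given true disease status, the $50 screen echoes the price point mentioned earlier, and the imaging cost and every accuracy figure are hypothetical. The `Stage` and `run_funnel` names are illustrative, not from any real library.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    cost_per_test: float  # hypothetical cost in dollars
    sensitivity: float
    specificity: float


def run_funnel(population: float, prevalence: float, stages: list[Stage]) -> dict:
    """Push expected counts through the stages; only positives advance to the next one.

    Assumes each stage's errors are independent of earlier stages given true disease status.
    """
    sick = population * prevalence
    healthy = population - sick
    cost = 0.0
    for stage in stages:
        cost += (sick + healthy) * stage.cost_per_test
        sick *= stage.sensitivity  # true positives move on
        healthy *= 1 - stage.specificity  # false positives move on
    return {"true_pos": sick, "false_pos": healthy, "cost": cost}


screen = Stage("blood screen", 50, 0.65, 0.995)  # hypothetical accuracy
imaging = Stage("imaging", 2_000, 0.90, 0.95)  # hypothetical cost and accuracy

funnel = run_funnel(100_000, 0.005, [screen, imaging])
image_everyone = run_funnel(100_000, 0.005, [imaging])
print(funnel)
print(image_everyone)
```

With these made-up inputs, the funnel costs about $6.6M and yields about 25 false positives, versus $200M and about 5,000 false positives for imaging everyone. It also finds fewer cancers (about 293 versus 450), which is exactly the trade-off the funnel has to manage.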
In that case, you're talking about an entirely different problem, having to do with price elasticity of demand. But conceptually, you're pretty much exactly right. We don't want to over-test. We don't want to test everyone for everything all the time; it's not sustainable. We want to be very prudent about how we trim down that population. Examples are good, so I'll give you another one, since we're talking about this in the context of cancer. It's also a good lead-in to something I think a lot of folks want to hear me talk about, which is proteins. So, generally speaking, the way I see it,
given the current state of the diagnostics and tests we have right now, the best approach is to be very careful about something I mentioned, which is called overtreatment. We don't want to diagnose people with things they don't have and escalate them to tests and treatments that are going to hurt them or cost them money. That's overtreatment. But we also don't want to underdiagnose people and miss things.
So what I think is best, especially within oncology but also in other diseases like cardiovascular disease, areas where behavioral changes can actually improve outcomes over time, is to start with understanding hereditary genetics: the cards you were dealt when you were born. What does your genome say about your individual predisposition to a broad set of diseases? And that blueprint goes back to something I said at the very beginning, which is that before too long,
we really hope whole-genome sequencing for individuals, in healthcare systems with access to this type of technology, becomes something parents can afford and simply choose to do, even outside of a reimbursement scenario. Regardless, hereditary genetics is super important. People talk a lot about molecular oncology and using the genome to fight cancer, and it's hard for me to imagine a perfect steady state of that system that doesn't involve hereditary genetics. So I'd say that's the first part: understanding heredity and risk-stratifying the population into high-, medium-, and low-risk groups, so that you can give the correct interventions to the correct group and clamp down on costs or escalate them depending on the individual's needs.
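As a toy illustration of risk stratification into high, medium, and low groups, here is a Python sketch. Everything in it is a labeled assumption: the rule set, the `stratify` function, the two-gene list (BRCA1 and BRCA2 are the genes named in this conversation), and especially the screening intervals, which are placeholders and not clinical guidance.

```python
from enum import Enum


class RiskTier(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


# Illustrative gene list; a real panel would be curated by clinical geneticists.
HIGH_PENETRANCE_GENES = {"BRCA1", "BRCA2"}

# Hypothetical screening intervals in months, for illustration only (not clinical guidance).
SCREENING_INTERVAL_MONTHS = {RiskTier.HIGH: 6, RiskTier.MEDIUM: 12, RiskTier.LOW: 36}


def stratify(pathogenic_variants: set[str], family_history: bool) -> RiskTier:
    """Assign a coarse hereditary-risk tier from germline findings (toy rule set)."""
    if pathogenic_variants & HIGH_PENETRANCE_GENES:
        return RiskTier.HIGH
    if pathogenic_variants or family_history:
        return RiskTier.MEDIUM
    return RiskTier.LOW


for variants, history in [({"BRCA2"}, False), (set(), True), (set(), False)]:
    tier = stratify(variants, history)
    print(variants or "{}", history, tier.value, SCREENING_INTERVAL_MONTHS[tier], "months")
```

The design point is only that the tier, not the individual, drives how often screening happens, which is how higher-frequency screening gets focused on the people most likely to benefit.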
That's one thing. The next step could be something like the cancer screening I talked about, but with higher-frequency screening focused on individuals at higher risk of cancer. And in some cases, the resolution of this technology goes further. This goes back to what you were saying earlier about how much data we're generating and what we can actually glean from it. There are situations where we can tell you which organ system is likely to develop early-onset cancer if you have a specific pathogenic mutation in the genes responsible for maintaining that organ system. A classic example people may know about is the BRCA genes, BRCA1 and BRCA2, which are among the most highly studied genes. If you carry a pathogenic variant (whether you're a woman or a man, though the cancer types differ), you have a predisposition to breast cancer, for example.
We know there are ways to prevent this and get in front of it, and it's more likely to affect you earlier if you have a pathogenic BRCA mutation. So that's a very simple example of translating monogenic (meaning single-gene) risk into longitudinal monitoring with screening, focused solely on the person who has that genetic predisposition. Ultimately, in cases where it can't be avoided, you can say: look, it's starting, and we've detected it while it's still the size of a grain of rice, for example.
Then you can say, okay, the best treatment here might just be surgery. It might not be precision medicine at all. Or it might be: not blunt chemotherapy, but an immunotherapy or a precision cancer agent. That may be the best option, and then you can treat it almost like a chronic condition rather than a life-or-death surprise. One of the statistics I gave in that presentation about cancer is that solid tumors generally follow the same linear path, from localized to regional to distant, or metastatic.
And even though cancers that are caught late are a smaller fraction of new diagnoses (I can't remember the exact number, but I think it's in the teens, something like 17%), over a five-year period they account for the majority of deaths. Cancer is an elusive, very complex problem. But one of the blunt ways I think we can really cut overall mortality is just to catch it earlier: make sure we're not catching metastatic cancer, but regional or, even better, localized cancer.
This is actually a really interesting thing to talk about, too, because regardless of how good the system is, there will always be people who slip through the cracks. This may take 20 years to fully implement at scale, but there will always be people who show up with more advanced disease, and we have to have diagnostic as well as therapeutic solutions for them. I think it's super interesting how these things work. I like to think about cancer specifically through the same lens as an evolutionary biologist. In school, we learn that evolution is basically survival of the fittest: you have to be well suited to your environment to thrive, outcompete your neighbors, and have a differential rate of reproduction.
That's how a species propagates and grows. And cancer is in many ways very similar. You might have a single cell in your body (let's take lung cancer, for example) where genetic instability arises because you're taking in outside influences, like smoke or other carcinogens, that change the genome. You may end up with a cell that has a random genetic change giving it a survival advantage relative to its neighbors. Maybe it's better at collecting oxygen or nutrients, or something else gives it that advantage.
So it starts to reproduce very fast, and before long you have a localized tumor, a collection of cells descended from that original rogue cell and harboring the same genetic mutations. That's called oncogenesis, the creation, or genesis, of cancer. And treatments, things like immunotherapy, CAR-T, and broad chemotherapy (we can talk about different therapies as well), are in many ways all selection pressures. We're applying an external pressure to the environment, like a flood or the meteor strike that wiped out the dinosaurs, and it might kill 90% of the cells.
But the ones that survive may have had an adaptation that lets them survive that therapy. That's why it's so important to have liquid biopsies and diagnostic tests that let clinicians stay one step ahead of what is quite literally called the natural history, or evolution, of that cancer. It's kind of like whack-a-mole: you're literally targeting the mutation. That's how these targeted cancer drugs work. You target one mutation and get rid of the subpopulation of cancer cells that harbor it.
That subpopulation is called a clone. You get rid of it, but then another one springs up. That's one of the reasons why people on a given course of treatment may do well for a while, and then, two-thirds of the way through, it suddenly stops working: the somatic evolution of the disease. That's where monitoring becomes really important. And the great thing about these liquid biopsies, though we're still accruing evidence, is that their limit of detection, our ability to catch cancer when it's very, very small, is getting astoundingly better than medical imaging.
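A rough way to reason about the limit of detection is sampling: a blood draw contains a finite number of genome copies of cell-free DNA, so at a tiny tumor fraction you may simply not capture any tumor-derived fragment. The sketch below models that with a Poisson count. The 20,000 genome-equivalent figure and the idea of tracking 50 loci are hypothetical inputs, and the model deliberately ignores sequencing and PCR background error, which a real assay must handle.

```python
import math


def detection_probability(tumor_fraction: float, genome_equivalents: int, loci_tracked: int = 1) -> float:
    """Probability of sampling at least one tumor-derived fragment, modeled as Poisson.

    Idealized: assumes every tracked locus is covered by `genome_equivalents` molecules
    and ignores sequencing/PCR background error, which a real assay must model.
    """
    expected = tumor_fraction * genome_equivalents * loci_tracked
    return -math.expm1(-expected)  # equals 1 - exp(-expected), computed accurately for small values


# Hypothetical draw yielding 20,000 genome equivalents of cell-free DNA.
for fraction in (1e-4, 1e-5, 1e-6):
    for loci in (1, 50):
        p = detection_probability(fraction, 20_000, loci)
        print(f"tumor fraction {fraction:.0e}, {loci:>2} loci: P(detect) = {p:.1%}")
```

At a one-in-a-million tumor fraction, a single locus gives an expected 0.02 mutant fragments (about a 2% chance of seeing any), while tracking 50 independent loci raises the expectation to 1 (about 63%). That is one reason sampling depth and the number of tracked mutations both bound how small a cancer can be caught.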
So instead of having to go in for a scan to ask whether it's coming back, we can see fluctuations on the order of one in a million in what's called cancer burden. Is the burden going up or down in response to therapy? And beyond that: remember, late-stage or more advanced cancers can be very heterogeneous. There are different pockets of the tumor that are genetically related, for sure, but slightly different. They're called subclones, and in some cases each has slightly different adaptations to therapy.
So seeing which subclones are getting killed and which are expanding lets you stay one step ahead: should we switch to this treatment or that one, or how should we augment care? No amount of imaging is going to get you that level of molecular detail. So I think that's going to be another commonplace application of molecular oncology.
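The whack-a-mole dynamic can be shown with a deterministic toy model: each treatment cycle kills a therapy-specific fraction of each subclone and the survivors regrow. All the numbers (starting sizes, kill fractions, doubling per cycle) and the clone and drug names are hypothetical, and the model assumes a liquid biopsy could read out each subclone's share perfectly, which real assays only approximate.

```python
# Hypothetical subclones and therapies; every number here is an illustrative assumption.
START = {"clone_A": 1e9, "clone_B": 1e6}
KILL_FRACTION = {  # share of each subclone killed per treatment cycle
    "drug_1": {"clone_A": 0.9, "clone_B": 0.1},
    "drug_2": {"clone_A": 0.5, "clone_B": 0.9},
}
GROWTH_PER_CYCLE = 2.0  # surviving cells double between cycles


def simulate(start: dict[str, float], schedule: list[str]) -> list[dict]:
    """Apply one therapy per cycle and record total burden and subclone fractions."""
    sizes = dict(start)
    history = []
    for therapy in schedule:
        kill = KILL_FRACTION[therapy]
        sizes = {c: n * (1 - kill[c]) * GROWTH_PER_CYCLE for c, n in sizes.items()}
        burden = sum(sizes.values())
        history.append({
            "therapy": therapy,
            "burden": burden,
            "fractions": {c: n / burden for c, n in sizes.items()},
        })
    return history


for cycle, rec in enumerate(simulate(START, ["drug_1"] * 5 + ["drug_2"] * 2), start=1):
    print(cycle, rec["therapy"], f"{rec['burden']:.2e}", f"clone_B share {rec['fractions']['clone_B']:.0%}")
```

Under drug_1 the total burden falls for four cycles, then rises at cycle five as the resistant clone_B takes over; switching to drug_2 knocks the burden back down. Watching the subclone shares, not just the total, is what would flag the switch before the burden climbs much.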
You know what? Naively, when I think about getting sick, I think: I'm sick today, and I start getting better tomorrow. But there are chronic conditions, disabilities, and diseases, as well as long-lead diseases where you have opportunities to catch them early, and there's this ramping-up cycle. So being able to form medicine and care pipelines around those different types of diseases, where, as certain symptoms manifest, you introduce new treatments, new tests, and new ways to take whatever the most efficient measure is,
I think that's a really, really huge change from the way we do it today, right?
Yeah, absolutely. And you reminded me of something else I wanted to talk about. I want to be fair and talk about lots of different types of diseases in addition to cancer. Cancer is a unique problem because its genome is quite literally distinct from your own. The genome I inherited we can call the germline. If mutations happen somewhere in my body and manifest into a tumor, that tumor is related to me, sure, but it's so different that we say it has its own genome. Those are called somatic mutations, and they are not passed on to your children.
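The germline versus somatic distinction maps onto a simple idea: somatic variants are the ones seen in the tumor but not in the person's inherited genome. The Python sketch below reduces that to a set difference over hypothetical variant records with made-up coordinates. Real tumor-normal variant callers are far more involved (they weigh read counts, allele fractions, and sequencing error rather than doing exact matching), so treat this only as the concept.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def somatic_variants(tumor: set[Variant], germline: set[Variant]) -> set[Variant]:
    """Variants present in the tumor sample but absent from the inherited (germline) genome."""
    return tumor - germline


# Made-up coordinates for illustration.
germline = {Variant("chr1", 1_000, "A", "G"), Variant("chr2", 5_000, "C", "T")}
tumor = germline | {Variant("chr3", 42_000, "G", "A")}
print(somatic_variants(tumor, germline))
```

The tumor carries the person's germline variants too, which is why a matched normal sample is needed to tell the two apart.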
So yes, cancer is a disease of the genome, and we should be using genomic tools to solve that problem. Put that aside, though: lots of diseases are rooted in genetics in many ways. Another interesting area I'm really, really excited about is one we now have tools for. Like I said with genomics, it starts at the instrument layer. You have to have instrumentation that generates the data we can feed into our ecosystem of software tools to extract intelligence from it.
That's the workflow. So: proteomics. First of all, what the heck is a protein? Let's start there. There's something people may be familiar with called the central dogma of molecular biology. DNA is contained in the nuclei of the trillions of cells in your body. DNA codes for RNA, and RNA is translated into proteins. Proteins are like the pawns on a chessboard: they do everything. They're a little simpler, we have a lot of them, and they perform the vast majority of the functions in all the different tissues of your body.
So the flow of information in biological organisms, not just humans but all organisms, is DNA to RNA to protein. When we talk about studying DNA, that's genomics. When we talk about studying RNA, that's called transcriptomics, because DNA is transcribed into RNA, so those RNA molecules are called transcripts. The last step, the study of all the proteins in an organism, is proteomics. The challenge has been that as you move down the central dogma, from DNA to RNA to protein, things get way funkier and harder to study.
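The DNA to RNA to protein flow can be shown in a few lines of Python. This is a simplified sketch: it assumes the input is the coding strand (so transcription is just T to U), reads a single frame from position 0, and stops at the first stop codon; it ignores splicing and everything else cells actually do. The codon table is the standard genetic code, built from the classic 64-letter string with bases ordered U, C, A, G.

```python
from itertools import product

RNA_BASES = "UCAG"
# Standard genetic code, listed with codon positions ordered U, C, A, G ("*" marks a stop codon).
AMINO_ACID_BY_CODON = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(RNA_BASES, repeat=3), AMINO_ACID_BY_CODON)}


def transcribe(coding_strand_dna: str) -> str:
    """DNA -> mRNA, using the coding strand so only T needs to become U."""
    return coding_strand_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """mRNA -> protein, reading codons in frame from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


dna = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
mrna = transcribe(dna)
print(mrna)
print(translate(mrna))  # MAIVMGR: translation stops at the UGA stop codon
```

Each step adds a layer: the DNA string is static, while the RNA and protein layers are where the variation (and the measurement difficulty) piles up.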
Just a few examples: in DNA, we've got about 20,000 genes; in RNA, there can be several hundred thousand unique transcripts. At each layer there's more diversity, so you're working with more discrete units and a bigger data set. That's one thing. The second is that your DNA doesn't change over the course of your lifetime. It's static. That's not to say it's simple to figure out or irrelevant; it's just static.
For the most part, your DNA will be the same whether we sequence it when you're a newborn or when you're old. That's what I mean by no temporal variation. Proteins are changing all the time; they're constantly in flux. And what we've started to develop are tools that have, and you talked about the metrics that matter in the tool space, the throughput and resolution to enable a level of data generation amenable to modern machine learning algorithms.
So the cool thing about looking at the proteome is that you can take a blood sample and run it through a different instrument, one that studies proteins rather than DNA. The idea is that we can see what those proteins are doing and quantify them. Is a certain subset of proteins associated with cardiovascular disease, like a heart attack, present in the body at an abnormal level compared to a baseline population? We can run models around that. The really cool thing about proteomics is that doing a protein-based liquid biopsy longitudinally may help us monitor and predict those transition states from healthy to sick for a long tail of diseases that don't shed the same DNA markers into the blood that cancer does.
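Comparing a person's protein levels against a baseline population can be sketched as a z-score check. This is a minimal illustration, not how any particular proteomics platform scores samples: the protein names, levels, baseline values, and the 3-SD cutoff are all hypothetical, and real models would combine many proteins rather than flag them one at a time.

```python
from statistics import mean, stdev


def flag_abnormal(sample: dict[str, float], baseline: dict[str, list[float]],
                  z_cutoff: float = 3.0) -> dict[str, float]:
    """Return {protein: z-score} for levels more than z_cutoff SDs from the baseline mean."""
    flagged = {}
    for protein, level in sample.items():
        reference = baseline[protein]
        z = (level - mean(reference)) / stdev(reference)
        if abs(z) > z_cutoff:
            flagged[protein] = round(z, 2)
    return flagged


# Made-up protein names and levels (arbitrary units).
baseline = {
    "protein_A": [10, 12, 11, 9, 13, 10, 11, 12],
    "protein_B": [5, 5, 6, 4, 5, 6, 4, 5],
}
print(flag_abnormal({"protein_A": 20, "protein_B": 5.5}, baseline))
```

For longitudinal monitoring, the same function could take a person's own earlier measurements as the `baseline`, so the question becomes "is this unusual for you?" rather than "is this unusual for the population?"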
So this opportunity is still, literally, a liquid biopsy; you're still taking blood. An example would be non-alcoholic fatty liver disease, or its more severe form, NASH: a problem where fatty lipid buildup in your liver can cause all sorts of chronic, debilitating issues and a lot of cost. It's the same sort of problem, where we have legacy diagnostics that are very invasive. Oftentimes they require a puncture biopsy, which is just as fun as it sounds, and getting that done as many as 20 times a year.
For another great example (this one is DNA-based, but I'll bring it up anyway), take transplants. We can talk about readmission rates, or the difficulty of a transplanted organ being rejected, and not just for solid organ transplants: cell therapies are literally transplants too. They're allogeneic, meaning you're taking cells from another person and giving them to the patient, and then you have the problem of whether the body's immune system is going to reject this foreign thing you're putting into it.
We can monitor that with liquid biopsies and tell very early if the immune system is no longer quiescent, or stable, and this person may reject the graft. So you get ahead of it, because it's way more expensive to go in and get another kidney. And again, on the surface level it's all the same: when we hear about it, these are all just blood tests. But the technology and the pipelines they flow through are changing quarter over quarter, and watching the space is super exciting, because you get to learn what protein-based assays are doing for cardiovascular and neurologic conditions and others, like the liver example we talked about. That's one set of them.
And then there's using donor-derived DNA to study transplant efficacy, both for solid organs and for cancer treatments based on a cell transplant, and finally cancer itself: is a solid or liquid tumor growing in the body, and how do we catch it? It's pretty exciting, and it incorporates everything we're trying to invest in and study, from the hardware to the software to the chemistry. It's all important.
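Donor-derived DNA monitoring boils down to tracking what share of cell-free DNA comes from the donor over successive draws. The sketch below computes that fraction from read counts and raises an alert when it crosses a level or jumps sharply. The read counts are made up, and both the 1% threshold and the "doubled since last draw" rule are hypothetical cutoffs chosen for illustration, not validated clinical criteria.

```python
def donor_fraction(donor_reads: int, recipient_reads: int) -> float:
    """Share of cell-free DNA reads attributed to the donor at informative sites."""
    return donor_reads / (donor_reads + recipient_reads)


def rejection_alerts(draws: list[tuple[int, int]], threshold: float = 0.01,
                     rise_factor: float = 2.0) -> list[int]:
    """Indices of draws whose donor fraction exceeds `threshold` or jumped by
    `rise_factor` or more since the previous draw. Both cutoffs are hypothetical."""
    alerts = []
    previous = None
    for i, (donor, recipient) in enumerate(draws):
        fraction = donor_fraction(donor, recipient)
        rising = previous is not None and previous > 0 and fraction / previous >= rise_factor
        if fraction > threshold or rising:
            alerts.append(i)
        previous = fraction
    return alerts


# Made-up read counts (donor, recipient) from four successive blood draws.
draws = [(30, 9_970), (35, 9_965), (80, 9_920), (150, 9_850)]
print([round(donor_fraction(d, r), 4) for d, r in draws])
print(rejection_alerts(draws))  # the third draw trips the rise rule, the fourth the threshold
```

The rise rule is there because, as described above, the goal is to catch the trend early, before the absolute level looks alarming.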
Yeah, sure. I actually think this is a beautiful segue into the huge data pipeline. Can you talk to us about the DeepMind and DeepVariant breakthroughs that have been going on?
Oh, yeah. You teed me up; this is great. I love talking about this. Google is cooking up so many cool things. I know Google is a huge company involved in a lot of different areas, but I would be remiss not to talk about what it's doing in health care, specifically in molecular medicine, to your point. You mentioned DeepVariant and DeepMind, so let's take DeepMind first. To my knowledge, DeepMind was a separate company that Google acquired and brought in as a research division, and it focuses on applying deep neural networks to really hairy problems that may be amenable to that approach.
I think there are a handful of breakthroughs based on games like chess and Go (AlphaGo, for example), where applying neural networks to a seemingly intractable or very complex problem produced a new step-function improvement. Some people may have heard about AlphaFold, and I'm glad you brought it up now, because it leans on a lot of the central dogma I was describing: again, DNA to RNA to protein.
First of all, let's say what DeepMind built: what AlphaFold 2 is, what it does, and what it doesn't do. Plain and simple, AlphaFold is a software program that is now, I believe, open source. I don't know the licensing structure, but it's online, the initial paper has been published, and you can read about it if you'd like. So think of it as an artificial intelligence: it's a neural network, and it's designed to take an input and give you an output.
Okay. So what it takes in is the sequence, so the order, of amino acids, and amino acids are the building blocks of proteins, the same way that the nucleotides A, G, T, and C are the building blocks, the alphabet, of your DNA. That's actually another really good reason why studying proteins is a lot harder: you're working with 20 discrete letters. There are 20 amino acids versus only four bases in the human genome, so the alphabet is bigger. So say a protein is 1,000 amino acids long.
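To put a rough number on "the alphabet is bigger": the count of possible sequences of a given length is the alphabet size raised to that length, so 20 letters grows far faster than 4. A minimal sketch, working in log10 to avoid enormous integers; the 1,000-residue length is the example from the conversation, and the function name is just illustrative.

```python
import math


def log10_sequence_space(alphabet_size: int, length: int) -> float:
    """log10 of the number of distinct sequences, i.e. log10(alphabet_size ** length)."""
    return length * math.log10(alphabet_size)


protein = log10_sequence_space(20, 1000)  # 20 amino acids, 1,000 residues
dna = log10_sequence_space(4, 1000)       # 4 bases, 1,000 nucleotides
print(f"protein: ~10^{protein:.0f} sequences, DNA: ~10^{dna:.0f} sequences")
```

That prints roughly 10^1301 possible proteins versus 10^602 possible DNA strings of the same length.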
You would take in a text file, like a string of the different amino acids that make up that protein. So you take in sequence data. I'll describe how AlphaFold does it, but first I'll just tell you what the output is. What it spits out is the three-dimensional structure that the protein would take in its native conformation. What I mean by structure is, if you go online and Google "protein," you'll see these ribbony structures; some of them look like they have helixes, like little Cheetos that spiral.
Other ones have these big sheets, and they're twisted and turned and kinked up into this big hairball structure. And you're like, okay, well, what good is that? There's actually an axiom in biology, generally, that something's form dictates its function. So the shape of a protein determines what it does in the body and how good it is at that thing. There are many examples of this, but one kind of grotesque example that people are familiar with is carbon monoxide poisoning, right?
You've heard of people leaving their stove on and that happening. One of the reasons it happens is that carbon monoxide, a molecule of carbon and oxygen, has a very high affinity, once it gets into the body, for a protein complex called hemoglobin, which you may have heard of before. It's the protein complex whose job is to shuttle the oxygen you breathe throughout your body, to all of your tissues and muscles. It has a pocket, like a groove, that physically binds oxygen, holds on to it, and delivers it to tissue.
When you feed it carbon monoxide, the carbon monoxide is actually stickier than oxygen for that slot. Once it's stuck, it basically can't come back out, and you've turned a hemoglobin molecule whose native function was to shuttle oxygen into just a brick floating through your bloodstream. That's an example of how the physical structure of a protein dictates its function. Right. So if a change in the structure of the protein meant it no longer had that binding pocket or groove, its bioactivity, its function, would potentially be diminished.
That's actually the exact reason why mutations at the DNA level propagate downstream to create problems. Again, back to the central dogma. And this is one of the ways AlphaFold can be very helpful from a research perspective: mutations in DNA, so spelling errors, can cause a change in the RNA, which continues to propagate into the protein layer. If a mutation breaks a very key segment of the protein, it might not be able to do its job anymore, and if its job is really important to life, that's how disease manifests in some cases.
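A toy sketch of how one DNA "spelling error" propagates through RNA into the protein. The 30-letter sequence is made up for illustration, and the codon table is deliberately partial (only the codons used here; a real translation uses all 64). The single change below turns a GAG codon into GTG, glutamic acid into valine, which is the kind of substitution behind sickle cell disease.

```python
# Partial codon table (mRNA codon -> one-letter amino acid); only the codons used below.
CODON_TABLE = {
    "AUG": "M", "GUG": "V", "CAU": "H", "CUG": "L", "ACU": "T",
    "CCU": "P", "GAG": "E", "AAG": "K", "UAA": "*",
}


def transcribe(coding_dna: str) -> str:
    """DNA (coding strand) -> mRNA: same letters with T replaced by U."""
    return coding_dna.replace("T", "U")


def translate(mrna: str) -> str:
    """mRNA -> protein, reading one three-letter codon at a time until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


reference = "ATGGTGCATCTGACTCCTGAGGAGAAGTAA"  # hypothetical coding sequence
# One-letter change at index 19: the first GAG codon becomes GTG.
mutant = reference[:19] + "T" + reference[20:]

print(translate(transcribe(reference)))  # MVHLTPEEK
print(translate(transcribe(mutant)))     # MVHLTPVEK
```

Whether a swap like that matters depends on where it lands in the folded protein, which is exactly the kind of question a structure predictor helps with.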
So back to AlphaFold quickly: like I said, the algorithm takes in sequence, a linear text string, and outputs a three-dimensional structure. One of the reasons it got so much news and notoriety was that the structures it generated are freakishly good approximations of the structures from the most accurate atomic-resolution microscopy and crystallography, the most hardcore, expensive methods we have for determining a protein's three-dimensional structure. They gave an example of this in the original AlphaFold blog release.
If you superimpose them, they're almost exact carbon copies, and you never had to employ those expensive, low-throughput imaging techniques. Now, again, it's a model that could potentially suffer from biases in the training data, as you know with your background in ML. So there are certain protein families where its predictions may be less accurate than for the ones we know very well. I'm not at the point where I'm certain it could decipher every single sequence. There are probably families it's particularly good at, and others where it's not as good.
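"Superimposing" two structures is commonly quantified as a root-mean-square deviation (RMSD) after the best possible rigid alignment. A minimal NumPy sketch using the Kabsch algorithm, assuming both structures arrive as matched (N, 3) arrays of atom coordinates; the coordinates below are random stand-ins, not real AlphaFold or crystallography output.

```python
import numpy as np


def superimposed_rmsd(predicted: np.ndarray, experimental: np.ndarray) -> float:
    """RMSD between two matched (N, 3) coordinate sets after optimal rigid superposition (Kabsch)."""
    p = predicted - predicted.mean(axis=0)       # remove translation
    q = experimental - experimental.mean(axis=0)
    h = p.T @ q                                  # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))       # avoid returning a mirror image
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T  # rotates p onto q
    aligned = p @ rotation.T
    return float(np.sqrt(np.mean(np.sum((aligned - q) ** 2, axis=1))))


rng = np.random.default_rng(0)
experimental = rng.normal(size=(50, 3)) * 10.0   # hypothetical atom coordinates
theta = np.deg2rad(40.0)
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
# Same shape, just rotated and shifted: superposition should recover it exactly.
predicted = experimental @ rot_z.T + np.array([5.0, -3.0, 2.0])
print(superimposed_rmsd(predicted, experimental))  # ~0, up to floating-point noise

noisy = experimental + rng.normal(scale=0.5, size=experimental.shape)
print(superimposed_rmsd(noisy, experimental))      # small but nonzero
```

A near-zero RMSD is the numerical version of "almost exact carbon copies"; real comparisons also have to decide which atoms to match, which this sketch takes as given.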
But regardless, in general, that's the concept: sequence in, structure out. Now, I won't go too far into it, but I think it's super interesting how the heck it does this algorithmically. One of the ways I think about it is that each of those 20 amino acids, the 20 different letters that make up a protein, has a chemical branch off its backbone called a side chain, with its own functional group. And each of those has a different set of properties. Like, how electronegative is it?
What do the charge and the polarization look like around it? You can kind of think of it as a form of magnetism. So say I have 100 amino acids that are physically linked together by their backbones, with all these little branches coming off across the entire chain. Let's say two of them that are next to each other are both highly negatively charged: they're going to want to move away from each other, if that makes sense. Right. So it's like you have a bunch of magnets on a string, you throw it in a room, and it goes click, click, click.
It finds a thermodynamic energy state that is preferable. So it's a lot like gradient descent in machine learning, where you have this energy surface and you're running calculations to try to find the position of its most stable conformation. It's a spontaneous process. It's influenced by the environment, of course; if you put a protein in a different environment, you can denature it or cause it to change. But essentially, steric hindrance and the different electromagnetic interactions between the amino acids on that backbone get you to the final conformation.
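The "magnets on a string" picture can be sketched as a toy energy minimization. To be clear, this is not how AlphaFold works (it is a trained neural network, as described above); it only illustrates the gradient-descent analogy. Every modeling choice here is an assumption for illustration: beads in 2D, spring "bonds" between neighbors, a softened Coulomb-like term between every pair of beads, and made-up charges.

```python
import numpy as np


def chain_energy(coords, charges, k_bond=10.0, r0=1.0, soft=0.5):
    """Toy energy of a 2D bead chain: springs between neighbors plus
    softened Coulomb-like terms between every pair of beads."""
    energy = 0.0
    n = len(coords)
    for i in range(n - 1):  # backbone "bonds" prefer length r0
        d = np.linalg.norm(coords[i + 1] - coords[i])
        energy += k_bond * (d - r0) ** 2
    for i in range(n):  # like charges repel, opposite charges attract
        for j in range(i + 1, n):
            d2 = np.sum((coords[i] - coords[j]) ** 2)
            energy += charges[i] * charges[j] / np.sqrt(d2 + soft ** 2)
    return energy


def numerical_gradient(f, x, h=1e-5):
    """Central-difference gradient of a scalar function f at array x."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        step = np.zeros_like(x)
        step[idx] = h
        grad[idx] = (f(x + step) - f(x - step)) / (2 * h)
    return grad


def relax(coords, charges, n_steps=200, lr=0.05):
    """Gradient descent with a backtracking step size, so the energy never increases."""
    x = np.asarray(coords, dtype=float).copy()

    def energy_fn(c):
        return chain_energy(c, charges)

    energy = energy_fn(x)
    for _ in range(n_steps):
        grad = numerical_gradient(energy_fn, x)
        step = lr
        while step > 1e-8:
            candidate = x - step * grad
            candidate_energy = energy_fn(candidate)
            if candidate_energy < energy:
                x, energy = candidate, candidate_energy
                break
            step /= 2
        else:
            break  # no downhill step found: treat as converged
    return x, energy


rng = np.random.default_rng(0)
start = np.column_stack([np.arange(8.0), 0.1 * rng.standard_normal(8)])  # nearly straight chain
charges = [-1, -1, 0, 1, 0, -1, 1, 0]  # hypothetical side-chain charges
final_coords, final_energy = relax(start, charges)
print(chain_energy(start, charges), final_energy)
```

The backtracking loop is a design choice for robustness: it only accepts steps that lower the energy, which mirrors the "finds a preferable state" intuition without tuning a learning rate.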
And that's essentially part of the problem AlphaFold tries to tackle: computing all those pairwise relationships between amino acids and figuring out both the local and, eventually, the global conformation that is thermodynamically stable, if that makes sense. So that's roughly how it works. There are different ways of doing it, but that's what AlphaFold does. The reason it can be really cool is that lots of the drugs we make are proteins. And so if the drug development process can become more digital and lean more on software,
you can use these algorithms as point solutions along the medicinal chemistry workflow to improve your operating efficiency. And this is one of the things we think is happening broadly in the therapeutic space: therapy companies are leveraging really sophisticated tech stacks, and not just the software. One of the best examples of this on the therapeutic side, and again, we're a holder of it, is a company called Recursion. Recursion is super fascinating because what they've essentially done is stitch together two corpuses of expertise, the dry lab, so software, and the wet lab, the molecular biology, in quite a literal training loop of information.
You take your physical experiments, which are governed by physical and molecular laws, and you figure out what happened. Is the drug that was created more efficacious or less efficacious? And there are other parameters you have to optimize for in drug development, like toxicity, solubility, et cetera. You need to make sure that the drug is working, but also that it's not going to kill you or cause toxicity in your brain or your heart or whatever. So it's a multi-parameter problem, which is why it's so hard.
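One standard way to frame that kind of multi-parameter trade-off is a Pareto front: keep only the candidates that no other candidate beats on every parameter at once. A minimal sketch; the candidate names, the scores, and the choice of exactly three parameters are hypothetical, and this is not a description of Recursion's or anyone else's actual pipeline.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    efficacy: float    # higher is better
    toxicity: float    # lower is better
    solubility: float  # higher is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is at least as good on every parameter and strictly better on one."""
    at_least_as_good = (a.efficacy >= b.efficacy and a.toxicity <= b.toxicity
                        and a.solubility >= b.solubility)
    strictly_better = (a.efficacy > b.efficacy or a.toxicity < b.toxicity
                       or a.solubility > b.solubility)
    return at_least_as_good and strictly_better


def pareto_front(candidates):
    """Keep only candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


candidates = [
    Candidate("A", efficacy=0.9, toxicity=0.7, solubility=0.5),
    Candidate("B", efficacy=0.7, toxicity=0.2, solubility=0.6),
    Candidate("C", efficacy=0.6, toxicity=0.3, solubility=0.4),  # worse than B on all three
    Candidate("D", efficacy=0.5, toxicity=0.1, solubility=0.8),
]
print([c.name for c in pareto_front(candidates)])  # ['A', 'B', 'D']
```

The front does not pick a winner; it just removes candidates that are strictly worse, which is why choosing among A, B, and D still takes judgment or more experiments.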
And they measure these things quantitatively with a branch of omics called phenomics, which is massive-scale imaging and microscopy that looks at tissues to understand what happens. Then they take those learnings, package them, structure them in a digital format, and feed them back into the dry lab stack, saying, okay, we can get marginally better at inferring chemical space if we take those learnings, and then we'll take another shot at it. So you have a recursive training loop. And we think that's a really cool way to find drugs.
I mean, it's not a panacea, I want to say, and I don't want to dumb it down to the extent that it's just tying those two things together. There's a lot of nuance to their approach, and I really encourage people who are interested to go read their 10-K online. It's publicly available, and you can learn more about what I think they call Recursion OS, the operating system they've coined. It's super fascinating work, and I think it's illustrative of our broader investment strategy: combining these historically disparate areas of innovation and seeing what they can do when they stack together.
Yeah. I think you just did a great job teeing me up. That sounds awesome, and I'm definitely going to Google Recursion after this. But I'm interested in hearing how the other innovation platforms that ARK specifically focuses on overlap.
So one you brought to our attention already was robotics, this idea of robotics and genomics being a lot closer than we think. Do you see any other innovation platforms overlapping like that? We've talked extensively about artificial intelligence and robotics at this point, but is there a third or a fourth innovation platform you're really excited about?
Yeah, it's a good question. Maybe not ones that we've published a Wright's Law curve on yet, but there are definitely areas we're looking at. So I talked about sequencing and AI, but another one I would be remiss not to mention, on the therapeutic side, is how much more accessible gene editing has become: the cost declines associated with modifying DNA and editing it for discovery or therapeutic purposes. So with gene editing, it's conceptually very simple. If sequencing is reading DNA, right,
gene editing is editing, and I would say synthetic biology is the writing. That's probably another cost decline curve that we haven't published quite yet, but I'm going to mention it, so don't let me forget after I talk about gene editing. So with gene editing, right, it's not necessarily new. My colleague Ali is the real expert on the therapeutic part of our strategies, but to my knowledge, there were essentially two previous methods: one is called zinc finger nucleases, and the other is called TALENs. These are both gene editing approaches that were more tedious and difficult to work with and had a higher barrier to entry in terms of the expertise needed to really do them.
I don't believe they were as strong in terms of how much editing, the types of editing, or the quality of the edits they could introduce into cells or different genetic structures. And then along came CRISPR, which has become famous. CRISPR is a system adapted from bacterial immune systems that allows us to rewrite genetic code pretty much anywhere in the genome with very high fidelity. Of course, studying off-target effects is really, really important, and it's still an area we're learning about.
But in terms of what it did: I remember doing a CRISPR experiment when I was a senior in high school. So it went from a large pharmaceutical or nation-state-worthy endeavor to something you could drop in the hands of an 18-year-old, per se. The cheapness, the speed; these are all things you can look at. The publication volume for CRISPR is like a straight line up after 2014, and I think just putting it into the hands of more researchers enables new methods, techniques, and applications to be built by virtue of the critical mass you create. So with CRISPR and the ability to edit DNA,
and the fact that it's now starting to be translated into the clinic, this year has been amazing in terms of the clinical data we've gotten from phase one and two, and hopefully phase three data readouts. I'll talk about it in the context of, I think, the most recent release, from a company we're, again, a large owner of: Intellia. They're one of the foundational CRISPR patent holders. For folks who don't know, what they did was take on a disease called ATTR. I think there's a predominant hereditary form of the disease that won't cause you to present with symptoms until you're a little older.
I believe as an adolescent, or maybe older than that. It's a problem with your liver, again. So let's lean back on the cumulative learning we've had over this conversation. There's a mutation at the DNA level that is hereditary, so it's germline: you're born with it. And as you age, there's a point where the liver cells are printing a misfolded amyloid protein that I believe acts sort of like a plaque. As its level rises and outpaces your body's native ability to metabolize or break down this plaque, it starts to build up in parts of your body that are really critical to life, like your vasculature and circulatory system.
And I'm not exactly sure about the prognosis or the lifespan. It's a terminal condition, to my knowledge, and I think the lifespan has improved with time, but only through chronic, expensive treatments that you have to get perpetually, like injections. And at the end of the day, they're not curing you. They're just slowing the rate of progression of the disease. It's terrible quality of life, lower life expectancy, and chronic treatment costs. So not a situation anyone wants to be in.
So Intellia has come along, and again, I'll give the caveat that we're not through clinical trials, but the data we have seen is certainly revolutionary. Essentially, what they were able to do was give individuals a single injection of a medicine generated through CRISPR. Those molecules are encapsulated in what's called a lipid nanoparticle, which is like a microstructure made of fats, and it navigates to the correct tissue in the body. Right. That's the problem with drug delivery: we can pump it into you, but is it going to navigate to where it needs to go?
There's a whole set of engineering around that that's really fascinating, but I won't get into it. Long story short, the CRISPR payload is delivered to the liver tissue, leaves the nanoparticle, gloms on to genomic DNA, and navigates to the exact gene involved in ATTR, the TTR gene, one of roughly 20,000 genes among the 3.2 billion letters of the genome in every cell. It navigates there and effectively silences that gene so that it can't print the misfolded amyloid protein anymore. It's called a knockout.
Right. So we're knocking the gene out; we're turning it off with CRISPR. And then the amyloid protein, as measured through standard diagnostics, starts to drop and drop and drop. And I believe their phase two data showed a drop in the amyloid protein that actually outperformed the current standard of chronic care with a single dose. Right. So these people are effectively cured, because you're treating what the disease actually is: the ATTR mutation. You're silencing it.
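The "navigates to the exact gene" step is programmed by a short guide sequence. For the widely used SpCas9 enzyme, a target is a roughly 20-letter stretch (the protospacer) sitting immediately next to an "NGG" motif called the PAM. A minimal sketch that scans one strand of a made-up sequence for such sites; real guide design also scans the reverse strand and scores possible off-target matches, and nothing here reflects Intellia's actual guide or delivery system.

```python
import re


def find_spcas9_sites(dna: str, protospacer_len: int = 20):
    """Return (start, protospacer, PAM) for every stretch of protospacer_len letters
    immediately followed by an NGG PAM, on the given (forward) strand only."""
    dna = dna.upper()
    # A zero-width lookahead lets overlapping candidate sites all be reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]


protospacer = "GACGTTACCGATAGCTAGCA"  # hypothetical 20-letter target
seq = "AAAA" + protospacer + "TGG" + "AAAA"
print(find_spcas9_sites(seq))  # [(4, 'GACGTTACCGATAGCTAGCA', 'TGG')]
```

In a real genome the same 20 letters (or near matches) can occur elsewhere, which is where the off-target concerns discussed later come from.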
Right. So it's really awesome. Similar things have happened with sickle cell, which is another example of a previously intractable condition. And there are a lot of different genetic diseases, right, and a lot of them are caused by hereditary mutations in genes that later give rise to the trait and cause the disease to manifest. Patterns of inheritance can sometimes be simple, as with monogenic diseases like ATTR: one gene, one mutation, one disease. Then there are also polygenic disorders that result from the cumulative effect of lots of mutations scattered throughout the genome in different places.
Those are a little harder, or a lot harder in some cases, to go after. But there are, I don't know the exact number, thousands of monogenic diseases that, if treated through CRISPR, could potentially add a lot of value to the healthcare system. So that was a long soliloquy on gene editing, but generally speaking, I'd say that one is central to the genomic thesis. The other one is synthetic biology. Synthetic biology is our ability to literally create genetic material de novo, so writing DNA. The genome is a sequence of those four letters, like I mentioned, and they're created in biological structures and cells.
But we also have the ability to synthesize them on silicon wafers, in processes where, step by step, base by base, we create the DNA polymer, and we can use it for a vast array of different things. I don't know if people know this, but in so many different types of diagnostics, that's how you tell a sequencer to focus on a specific part of the genome. So let's say I'm doing a liquid biopsy again, and I want to look at the mutation status of 20 oncogenes, genes that are associated with cancer.
How do I tell the sequencer to focus on those specific regions and not the other regions? Because if I'm sequencing the other regions and generating data, I'm wasting money, because I'm just going to throw that information away. Right. So how do you physically get that focusing to happen? One of the ways you do it is with synthetic biology. If here's the sequence you want, and I can synthesize the complementary sequence and combine those two things together, they will attach to each other and get pulled out of solution.
And then I can amplify them. Right. So make copies and copies and copies to basically amplify their signal so that it's easier to detect with a sequencer. Those amplified copies are called amplicons. And in order to do that, you have to print a DNA snippet complementary to the target, the locus you're trying to look at. We don't think about it a whole lot, but that's central to the diagnostic world: targeted analysis requires synthetic biology. So the cost decline of printing DNA has been huge for doing this.
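The "complementary sequence" a probe needs is the reverse complement of the target: A pairs with T, C pairs with G, and the two strands run in opposite directions. A minimal sketch with a made-up 14-letter target; real capture probes and primers are designed with additional constraints, such as melting temperature and uniqueness in the genome, which are omitted here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Sequence that base-pairs with `dna` (A<->T, C<->G), read in the opposite direction."""
    return dna.upper().translate(COMPLEMENT)[::-1]


target = "ATGCGTACCTTAGG"  # hypothetical target region
probe = reverse_complement(target)
print(probe)  # CCTAAGGTACGCAT
```

Applying the function twice returns the original target, a quick sanity check that the pairing rule is symmetric.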
And the applications go way beyond just the diagnostics realm. We can use synthetic DNA to essentially write code. This goes back to what Ginkgo does, which is treating cells like computers. A computer takes code, compiles it, and then executes it; the same way, a cell can take DNA code that we write with a synthetic biology instrument and execute on that code by converting it into proteins, doing exactly what the cell always does, except we gave it the instructions.
And that matters when you're engineering microbes with synthetic DNA that's, like, 5,000 base pairs; these are big structures. Normally, when we're talking about the diagnostic applications, those little snippets can be like 20 base pairs. Right. They're tiny. So 5,000 is pretty big. Right. So not only did the technology have to come down in cost to where we could engineer entire gene pathways or genomes, but the error rate also had to come down. And this goes back to what I said earlier about sample misattribution, where you have an error that happens to statistically fall in a region that was a barcode.
The same thing is true when you're creating DNA. Okay, well, if I need to create a 5,000-base-pair fragment to engineer this microbe, and the system I'm using has a one-in-1,000 error rate,
how reliably can I do it? Right?
You're going to have about five errors, and depending on where they are, your process could break.
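The arithmetic behind "five errors": with 5,000 bases and a 1-in-1,000 per-base error rate, the expected number of errors is 5,000 × 0.001 = 5. A short sketch, assuming errors occur independently from base to base (a simplification), also shows how rarely a full-length molecule comes out perfect.

```python
def expected_errors(length: int, error_rate: float) -> float:
    """Average number of wrong bases in one synthesized molecule."""
    return length * error_rate


def prob_error_free(length: int, error_rate: float) -> float:
    """Chance that every base is correct, assuming independent per-base errors."""
    return (1.0 - error_rate) ** length


length, error_rate = 5000, 1 / 1000
print(expected_errors(length, error_rate))           # 5.0
print(f"{prob_error_free(length, error_rate):.4f}")  # 0.0067
```

Under that assumption, fewer than 1 in 100 full-length molecules would be error-free, which is why error rate matters as much as cost once constructs get long.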
Talking about single-gene expression, mono… what's the word? Monogenetic?
Monogenic. Yeah. So it seems like those are today, right? Those are what we're tackling, and polygenic is what we're looking to tomorrow. What kind of chasm do we have to cross technologically to start curing polygenic diseases, or at least treating them, as opposed to everything we're tackling now still being monogenic?
Yeah. So my reflex is to say there are two big things that come to mind. The first one is actually understanding the patterns of inheritance, the manifestation, and the history of the disease itself. That whole process sometimes gets referred to as disease penetrance, meaning: how do the mistakes happening at the genome level eventually propagate? How, when, and why? Right. So that's part of it. And sometimes you can open a textbook on heredity.
And there are all these different interactions, like epistasis and pleiotropy, where a mutation in one gene balloons out and has four different phenotypic effects, or the inverse of that, where you need four different mutations to happen in tandem to give rise to one phenotypic effect. You can represent these relationships through mathematical models and data structures, and actually a lot of graph theory. But generally speaking, that's one of the problems: just the pure science.
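One simple data structure for those relationships is a bipartite graph with genes on one side and traits on the other: a gene with many edges is pleiotropic, and a trait fed by many genes is polygenic. A minimal sketch with placeholder gene and trait names (not real biology); it deliberately ignores epistasis, where the effect depends on specific combinations of mutations, which would need a richer model than plain edges.

```python
from collections import defaultdict

# Hypothetical gene -> trait links (edges of a bipartite graph); names are placeholders.
GENE_TO_TRAITS = {
    "GENE_A": {"trait_1", "trait_2", "trait_3", "trait_4"},  # one gene, four effects
    "GENE_B": {"trait_5"},
    "GENE_C": {"trait_5"},
    "GENE_D": {"trait_5"},
    "GENE_E": {"trait_5", "trait_1"},
}


def traits_to_genes(gene_to_traits):
    """Invert the edges: for each trait, the set of genes that feed into it."""
    inverse = defaultdict(set)
    for gene, traits in gene_to_traits.items():
        for trait in traits:
            inverse[trait].add(gene)
    return inverse


def pleiotropic_genes(gene_to_traits, min_traits=2):
    """Genes linked to at least min_traits traits."""
    return sorted(g for g, t in gene_to_traits.items() if len(t) >= min_traits)


def polygenic_traits(gene_to_traits, min_genes=2):
    """Traits linked to at least min_genes genes."""
    inverse = traits_to_genes(gene_to_traits)
    return sorted(t for t, g in inverse.items() if len(g) >= min_genes)


print(pleiotropic_genes(GENE_TO_TRAITS))  # ['GENE_A', 'GENE_E']
print(polygenic_traits(GENE_TO_TRAITS))   # ['trait_1', 'trait_5']
```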
How the pattern of heredity works is, for certain, a nontrivial problem, and we need larger data sets. We need them to be better curated and matched with labeled data, because training on low-quality data… This is one of the things about healthcare data generally: a lot of it is junk, and it's very low quality. And again, if you know anything about machine learning, that's kind of a non-starter. So that's another problem we're working through.
So there's an infrastructure problem. I mean, it sounds like a really boring problem to cite, but unfortunately, a lot of the stuff you have to worry about is the boring, devil-in-the-details type of thing. So that's one thing. The other thing I would mention: we talked about multiplexing from a diagnostic standpoint, but we can talk about plexity from an editing standpoint as well. Can you make multiple edits inside a cell and still have the cell be viable?
Right. So say you have a polygenic disorder that's caused by mutations in, like, ten different locations scattered throughout the genome. Not only do you need to deliver a sufficient quantity of the CRISPR, the in vivo editing stack, to the right tissues, you also need to edit a sufficient number of cells, because what is your conversion rate? Of the cells you want to edit, what fraction are you actually editing? And then there's another problem: okay, you've delivered to some of the cells and edited some of the cells, but how many have you edited completely, right?
Do you need all ten edits to happen to all of those cells in order for the therapeutic properties to take hold? Does that make sense?
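A back-of-the-envelope sketch of the "how many have you edited completely" question. Assuming, hypothetically, that each target site is edited independently with the same 80% efficiency in every cell the payload reaches, the fraction of cells carrying all of the edits shrinks geometrically with the number of sites.

```python
def fraction_fully_edited(per_site_efficiency: float, n_sites: int) -> float:
    """Fraction of cells carrying all n edits, assuming each site is edited independently."""
    return per_site_efficiency ** n_sites


for n in (1, 2, 5, 10):
    print(n, f"{fraction_fully_edited(0.8, n):.3f}")
# 1 0.800
# 2 0.640
# 5 0.328
# 10 0.107
```

So even a respectable per-site efficiency leaves only about one cell in ten with all ten edits, before accounting for delivery or the cumulative damage discussed next.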
That's multiplexing edits, and the problem there can be that every time you edit a genome, there's a risk of scarring. There's a risk of creating off-target edits that are essentially collateral damage, and when you edit multiple times, especially at adjacent locations on the genome, the cumulative damage can be pretty severe in some cases and cause cell death or other sorts of issues.
I think we have a lot of fundamental biological learning to do, a lot of curation to do from a database standpoint, and we certainly need a more evolved, more efficacious toolkit for the actual editing, to ensure that we can properly plex to the level we need. So I think the glide path is going to be monogenic first. Frankly, monogenic might keep us busy for decades, right? Yeah. And that's not to say that the technology isn't going anywhere or that it takes a long time.
For me, it might get to the point where we're able to… I mean, there has to be commitment and innovation in the clinical trial process, in the patient matching and cohort building process. With drugs, there are a lot of things that have to go right to get them to market, and that's one of them. So, yeah, it's going to be an intermixing of different things, but those are the big areas I'd point out.
Sure. Yeah. It sounds like what you're really saying is that it's such a big, broad space worth tackling that it just takes a while to get through. And it's specialized enough that you have to think about that path separately from the path for polygenic diseases. So for people who are aggressively tackling monogenic diseases, that's going to take a really long time. It's not just a stepping stone to polygenic; it's an industry all of its own, right?
Yeah. Totally. And I mean, the timelines may overlap, right. There might be specific low-hanging-fruit polygenic disorders that are simple enough that a couple of them get figured out while we're still firmly in the monogenic era. That's totally possible. So I don't think it's going to be like, okay, we finished number 20,000 of 20,000 monogenic diseases, let's start on the first polygenic one. I don't think of it like that. And there are also so many new companies working on it that I think we'll have a critical mass of effort.
And hopefully there's a fluid exchange of ideas, learnings, and publications that helps everyone, a rising-tide-lifts-all-boats sort of situation. Another thing we think about is the likelihood that drugs get approved. This is one of the things we consider in the way we value therapeutics companies, and sometimes it's very difficult when you have a pipeline with 20 assets in different stages of development. Right. Historically, pharmaceutical models are built to take into account the likelihood of phase transitions.
So what is the likelihood that an early, discovery-stage drug candidate gets through phase one and then through phase two? Again, it's this funnel process, and its productivity has actually been in secular decline for quite a while. An incremental dollar of biotech R&D has not created the same level of value that it did in the 80s. And what we're seeing in some of the early trends is that the companies really at the vanguard of this are starting to seriously increase their return on biotech R&D.
And it's partly because we're not working with the same technology we did in the 80s, or even ten years ago. Right. We're taking high volumes of high-quality data and combining them with super sophisticated machine learning models. We have these training loops in place. Biotech companies can lean on hyperscaler molecular diagnostic companies that can source patients and effectively match them to the right drugs. There are lots of reasons, and lots of shots on goal, for increasing that probability of success.
So one of the reasons we think earlier-stage assets are being undervalued is that the same legacy phase-transition assumptions are being applied to them without going back and recognizing that they're not being built the same way they were in 2010 or 2015. So you should assume a higher baseline likelihood of success. And sure, it's an art, not a science, and there are specific cases where the likelihood of success might be higher or lower depending on the indication or the exact drug.
But regardless, I think that's a generally true statement.
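The phase-transition logic can be sketched as a funnel: the chance an early asset reaches market is the product of its stage-by-stage success probabilities, and a very simplified risk-adjusted value scales the asset's value by that product. Every probability and the 1,000 value below are hypothetical placeholders, not ARK's model or published industry benchmarks, and the sketch ignores discounting and development costs; the point is only how sensitive the output is to the baseline.

```python
from math import prod


def probability_of_approval(transition_probs):
    """Probability of clearing every remaining stage, assuming the stage odds simply multiply."""
    return prod(transition_probs)


def risk_adjusted_value(value_if_approved: float, transition_probs) -> float:
    """Very simplified: expected value = value if approved x probability of approval."""
    return value_if_approved * probability_of_approval(transition_probs)


# Hypothetical stage-by-stage success rates: phase 1, phase 2, phase 3, approval.
legacy = [0.60, 0.30, 0.55, 0.90]
higher_baseline = [0.70, 0.40, 0.60, 0.90]  # same funnel, modestly higher odds per stage

for label, probs in (("legacy", legacy), ("higher baseline", higher_baseline)):
    print(label, round(probability_of_approval(probs), 3),
          round(risk_adjusted_value(1000.0, probs), 1))
```

With these made-up inputs, modest per-stage improvements raise the cumulative odds from about 8.9% to about 15.1%, which is the kind of gap that shows up when legacy assumptions are applied to assets built differently.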
Yeah. That's huge, and that's widely important. I think a lot of people underestimate, across all sorts of innovation platforms, the rate at which people are able to succeed today because of the quality and quantity of tools at their disposal. And it's so interesting to hear that mirrored even in something as specific as healthcare and clinical trials. Besides passing through the different phases of trials, do you see any other regulatory risks or other industry-specific risks? When I think about Tesla, for example, tax credits can positively or negatively affect the whole EV industry.
Right. A change in tax policy around that industry. Is there something like that for the genomic space?
Yeah. Totally. I think in therapy… and I'll stick with talking about it from a CRISPR perspective first. I talked a little bit about off-target or unintended edits to genetic material, and that's a real concern. Right. And I think there's been an increased level of regulatory scrutiny on the manufacturing principles, the scaling up, the product quality control, and what it really boils down to is: are you introducing changes into DNA that you didn't intend to, when we don't know the long-term effects of doing that?
And so I think there's pressure on manufacturers, designers, and developers of drugs to have more rigorous procedures for understanding the quality of the drugs they're making. Again, this is one of the applications where we think long-read sequencing is really valuable: understanding the quality of genetic medicines, cell and gene therapies. In many cases, the off-target effects are types of genetic variation that are very difficult to detect with short-read sequencing platforms, whether it's because the off-target effects are very large, which we call structural variants, or because it's an insertion or deletion of genetic material, or even a very small, single-letter change that happens to fall in a region of the genome where short-read sequencers have trouble reaching high levels of confidence.
Broadly speaking, those are stretches of the genome that are very repetitive, where the same three-letter genetic motif is repeated over and over, like GGC, GGC, GGC. Right. Say you have 1,000 copies of that in a row; those can be really hard for some sequencers to look at. When I look at some of these papers, I see therapeutics companies saying, hey, we're applying short-read genome sequencing in our quality control standard operating procedures. At the same time, I see a lot of really interesting research coming out on the application of long-read sequencing, both fluorescence-based sequencing and nanopore sequencing, for looking at the efficacy of these changes, and we're finding out that we may not be as good at detecting off-target effects as we thought. And you're never going to catch what you don't look for.
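Why repeats are hard for short reads can be shown with a toy exact-matching sketch: a read that falls entirely inside a repeat matches at many positions, so its origin is ambiguous, while a read long enough to span the repeat and its unique flanks has exactly one placement. The sequences and lengths are made up and scaled down, and real aligners tolerate mismatches and use indexes rather than this brute-force scan.

```python
def count_exact_placements(genome: str, read: str) -> int:
    """Number of positions where `read` matches `genome` exactly (overlaps allowed)."""
    k = len(read)
    return sum(1 for i in range(len(genome) - k + 1) if genome[i:i + k] == read)


left_flank = "ATTCGATACGTTAGCA"    # hypothetical unique sequence
right_flank = "TTAGCCATGACTAGAT"   # hypothetical unique sequence
genome = left_flank + "GGC" * 100 + right_flank  # a 300-letter GGC repeat

short_read = "GGC" * 10                              # 30 letters, entirely inside the repeat
long_read = left_flank + "GGC" * 100 + right_flank   # spans the repeat and both flanks

print(count_exact_placements(genome, short_read))  # 91
print(count_exact_placements(genome, long_read))   # 1
```

The same ambiguity is what lets an unintended change inside a repeat slip past a short-read quality-control step.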
So I think that could be an interesting shift on the therapeutic side: are these companies transitioning to more comprehensive quality control measures to make sure what they're sending out the door is actually what they say it is? I think that's going to be a downside regulatory risk these folks will have to contend with as time goes on.
Yeah. No, that makes a lot of sense. And that's super interesting to think about all the things that go into not just developing a solution, but quality controlling it. And we can spend all day talking about the implications and the software and the hardware and the other tools that go into refining, testing and quality controlling a solution. What I'd like to do for the rest of this interview is kind of turn to the future. Right. So when I look at Tesla five or ten years ago versus today, it's very easy to kind of point to where they were and where they are today and all the leaps that got made. The same with Bitcoin and cryptocurrencies and blockchain technology.
Can you help us understand some of the biggest strides that have been made, ones that may not be obvious to the individual investor or tech-savvy person, compared to where we were ten years ago? Besides the fact that it cost $3 billion and ten years to sequence a genome, what else can we point to that shows how much advancement has been made in this space?
I think the best way to look at it is to start, understandably, with the patient experience, something that is tangible for the people watching and listening to this. Right. You gave a good springboard there: DNA sequencing coming down from this hundred-million-dollar run rate in the early 2000s to $600 or so today is a herculean achievement in biology. Right. And it's crazy, too, because it's not just the orders of magnitude that we crossed.
It's also the objective price level. So you can say, well, we came down four orders of magnitude, and this is now like $10,000 a pop. It's like, great, who cares? Right. Because you haven't unlocked any of the market at that price point. Below $1,000, it's like, wow. Okay. Hang on a second. Now certain iterations of this are becoming cheaper than legacy imaging-based approaches or things like that. So that's the objective threshold that we passed, and that's the really key one. Right. So I see what you're saying.
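As a quick check on the orders-of-magnitude framing: a drop of n orders of magnitude is a factor of 10^n, so the count is just the base-10 logarithm of the price ratio. This minimal sketch uses the figures quoted in the conversation (roughly $100 million per genome in the early 2000s and about $600 today); the $10,000 value is the hypothetical midpoint used above to make the point about absolute price levels.

```python
import math


def orders_of_magnitude(old_price: float, new_price: float) -> float:
    """How many factors of ten separate two prices: log10(old / new)."""
    return math.log10(old_price / new_price)


# Figures quoted in the conversation (US dollars per genome).
early_2000s = 100_000_000
today = 600
hypothetical_midpoint = 10_000  # the "$10,000 a pop" example

print(round(orders_of_magnitude(early_2000s, hypothetical_midpoint), 2))  # 4.0
print(round(orders_of_magnitude(early_2000s, today), 2))                  # 5.22
```

So going from about $100 million to about $600 is a bit more than five orders of magnitude, but as the answer stresses, the absolute price level, not the number of zeros removed, decides which markets open up.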
I want to mention that there's also an objective level that becomes really critical from a patient-pay standpoint. The next iteration of that is partly technological, but it's also partly going to require industry evolution or perhaps new approaches. But I'm really excited to see a genome that costs $10 or less. I hope we're at that point by the end of the decade. I certainly think that we could be. And that's another profound threshold, because why does the cost of a genome matter if I only have one? Well, if you're studying something like cancer that exists at very low levels, you may have to sequence the equivalent of 30 genomes' worth of data in a very extreme example.
But that would be intractable even at today's $600 price point. So there's still a lot of room to go to unlock certain markets. But to your point on things that are amazing, I lean back on something I said earlier about the cost to the patient. Think about where genomic medicine was in 2006, right? Back then, there was only one incarnation of, quote-unquote, genetics that a person living in our healthcare system, and in many others around the globe, could just go on their phone and order.
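To put numbers on the "extreme example" of sequencing roughly 30 genomes' worth of data for a single low-level cancer sample, here is a back-of-the-envelope sketch. It assumes cost scales linearly with the amount of data, which ignores sample preparation and analysis costs, and simply multiplies by the $600 and $10 genome prices mentioned in the conversation.

```python
def sample_cost(genome_equivalents: float, price_per_genome: float) -> float:
    """Sequencing cost, assuming cost scales linearly with the amount of data."""
    return genome_equivalents * price_per_genome


GENOME_EQUIVALENTS = 30  # the "extreme example" quoted for low-level cancer signals

for price in (600, 10):
    cost = sample_cost(GENOME_EQUIVALENTS, price)
    print(f"${price}/genome -> ${cost:,.0f} per sample")
# $600/genome -> $18,000 per sample
# $10/genome -> $300 per sample
```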
In 2006, the only flavor of that we had was ancestry tests, which we've gone on record as saying have virtually no health care value. That same price point is now getting you a comprehensive service that includes clinical-grade, state-of-the-art molecular biology and sequencing, plus the service layer that surrounds it, with things like genetic counseling and integration into clinical decision support tools. I think the fact that that happened should not go unappreciated, because it points to another secular trend that is going to play out as these costs continue to get driven to the floor.
We're excited to see the business model innovation that happens with this, including some totally alien ideas. Yes, we can do it at a smaller scale, with things like hereditary cancer risk and cardiovascular risk tests. But what happens if we activate the patient in settings where there's a really high unmet need, like cancer care, where you actually have the disease and you're going through treatment? What if we can take another huge leap, so that the same price point that in 2006 just told you whether you like cilantro or not now gets you a liquid biopsy test that is integrated into your health care system, that you can get monthly, and that takes nothing more than a blood draw?
And it's telling you, you know, you've got ten more cells of this clone than that clone of your cancer. Translating the tech to the front edge of patient-pay health care like that would be a herculean achievement, because I know that's the big issue. We have a large uninsured and underinsured population. We have another group of people that doesn't like to tap their insurer, or they're concerned about engaging their insurer. They're concerned about genetic data and its security, and about whether their insurer or life insurance provider is going to have an issue, or about prior authorization or things like that.
And they're just like, no, I just want to pay for it, but I want it to cost the same as a pair of AirPods. I think that's a legitimate possibility that we're starting to see for more disease areas. And I think by 2025, and certainly by 2030, the areas where it's going to show up are going to be pretty insane compared to now. So that's another big one: just the translation of it. I think some other really amazing improvements have actually happened in the algorithmic world.
Right. So we talked about this day-to-day use, and I'll give another proof point on numbers and CPU hours. So there's this task I've been alluding to, genome assembly. I think I've mentioned it a few times. Genome assembly is a very special computational problem. The reason it's special is that most of the time, when we look at genetic data, we take all the reads, whether they're long or short, and we look at a reference genome, which is sort of like a standard ruler.
And we look at each read and align it to where it should go on the reference genome. So we're leaning on that reference; it's kind of like a crutch to map every read to its correct location and then look for mutations. Right. That's called variant calling. Assembly is another flavor of analysis, one that has historically been extremely computationally prohibitive. The big difference between assembly and variant calling is that there's no reference. All right. So imagine trying to solve a jigsaw puzzle where I take away the front of the box and say, solve this puzzle, and in the case of a genome, it could be millions of pieces.
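The reference-guided workflow described here (place each read against the "ruler", then look for disagreements) can be sketched in a few lines. This is a deliberately naive toy, assuming equal-length reads, no insertions or deletions, and a brute-force sliding comparison that counts mismatches; real aligners use indexed search and real variant callers use statistical models. The reference, the sample, and its single-letter change are all hypothetical.

```python
def best_offset(reference: str, read: str) -> tuple[int, int]:
    """Slide the read along the reference; return (offset, mismatches) with the fewest mismatches."""
    best = (0, len(read) + 1)
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset : offset + len(read)]
        mismatches = sum(1 for a, b in zip(window, read) if a != b)
        if mismatches < best[1]:  # keep the first offset that achieves the minimum
            best = (offset, mismatches)
    return best


def call_variants(reference: str, reads: list[str]) -> list[tuple[int, str, str]]:
    """Align each read to its best position, then report (position, ref base, read base) disagreements."""
    variants = set()
    for read in reads:
        offset, _ = best_offset(reference, read)
        for i, base in enumerate(read):
            if base != reference[offset + i]:
                variants.add((offset + i, reference[offset + i], base))
    return sorted(variants)


reference = "ACGTTGCATGACCTGAAGTC"
# Hypothetical sample whose base at position 9 is T instead of G.
sample = reference[:9] + "T" + reference[10:]
reads = [sample[0:8], sample[5:13], sample[12:20]]
print(call_variants(reference, reads))  # [(9, 'G', 'T')]
```

The reference does all the heavy lifting here: every read is placed by comparing it to a known sequence, which is exactly the crutch that assembly has to do without.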
Okay. And remember the really big, repetitive chunks: it would be like the puzzle I gave you is actually just a picture of the blue sky, and there are no context clues. So you can't say, oh, well, there's a rock over here, and look at the pieces and say, okay, these have the same shading. You can start to imagine algorithmically how you'd do that, and it works the same way for genome assembly. Right. You look for pairwise overlaps between reads, and you can represent those relationships using, as we talked about, graph theory, which is this very visual mathematical discipline of essentially representing objects, like sequencing reads, and saying, well, this object is related to this object and they kind of tie together. Right. So it's very similar to that. And I think that a couple of years ago, it took several hundred thousand CPU hours, and you'd have to have a massive server to do that. And I'm actually really close to finishing and publishing another blog post I've written, which shows that we can now assemble at the same level of quality.
So we're not losing any quality: the same level of quality, assembling a genome built from really accurate long-read data. You can potentially assemble it in ten minutes on a MacBook. Right. That advancement is entirely an algorithmic development, right? It's not like it's a new type of CPU. It's like ten cores or whatever, you know, running on the new M1 Apple MacBook. You feed it high-quality long-read data, and ten minutes later you have a reference-grade assembly of your genome. It's ridiculous. So I think that's going to be another big area.
It's about looking at algorithmic development and appreciating the fact that if we're committing to the data, we've got to have the software to make sense of it. And we've really done a lot in the past few years, and I think it's just accelerating now.
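The jigsaw-puzzle description of assembly (find pairwise overlaps between reads, represent them as a graph, and stitch pieces together without the picture on the box) can be illustrated with a toy greedy assembler. This is a sketch under strong assumptions: error-free reads, no repeats, and a simple "merge the pair with the largest suffix/prefix overlap" rule. It is not the algorithm behind the laptop-scale assembly mentioned above, which isn't named in the conversation. The reads are hypothetical fragments of a short made-up sequence, supplied out of order.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Longest proper suffix of `a` that equals a prefix of `b`; 0 if shorter than min_len."""
    for length in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def overlap_graph(reads: list[str], min_len: int) -> dict[tuple[int, int], int]:
    """Directed edges i -> j, weighted by how much read i's end overlaps read j's start."""
    edges = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                olen = overlap(a, b, min_len)
                if olen:
                    edges[(i, j)] = olen
    return edges


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge along the heaviest overlap edge until no overlaps remain; return the contigs."""
    contigs = list(reads)
    while len(contigs) > 1:
        edges = overlap_graph(contigs, min_len)
        if not edges:
            break  # remaining pieces share no overlap, so they stay separate contigs
        (i, j), olen = max(edges.items(), key=lambda item: item[1])
        merged = contigs[i] + contigs[j][olen:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]
    return contigs


# Hypothetical reads from the made-up sequence "ATTAGACCTGCCGGAATAC", given out of order.
reads = ["TGCCGGAATAC", "ATTAGACCTG", "GACCTGCCGG"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

No reference genome appears anywhere; the order of the pieces is recovered purely from their overlaps. Long repeats like the GGC runs described earlier are exactly what breaks this naive approach, because a read from inside a repeat overlaps many others equally well, which is one reason long, accurate reads make assembly so much easier.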
Do you see other exciting industries starting to emerge? So you mentioned that ten years ago, it was just this sort of infotainment industry of, hey, let me get my ancestry done, and we were sequencing genomes for that. Do you see other industries, not necessarily healthcare-specific ones, starting to bud as a result of that decline in cost and increase in quality and throughput?
Yeah. I think, actually, on the provider level, you're going to see innovative new incarnations of digital and virtual care broadly, this shift that happened as a consequence of the pandemic but that we think is here to stay, not something that's transient. Genetics exists and is becoming so much more digitally interoperable, and we think that's going to get pulled into this new wave of digital primary care, digital longitudinal and chronic care, and value-based care, where you're part of a risk-sharing entity whose profits are generated from the savings you've actually made for patients, and you have a take rate on that.
So genetics is a great tool to, as we talked about, narrow down that diagnostic funnel and find inefficiencies that we previously wouldn't have been able to find. So I think there's going to be a lot of partnerships, deal making, business development, and corporate connections that happen there. And I hope to see that start to show up in the very near future.
That's awesome. Yeah. So one of my favorite questions from my audience: they'd love to hear more about your journey to ARK Invest, what you've been doing the last couple of years, and how that's changing looking forward. We know you're publishing more on Medium, for example. Walk us through some of the things that a typical researcher would do but that you do differently at ARK Invest, and what's happening there in the future.
Sure. So the short statement is I love my job and I love talking about it. I feel very lucky to be working here with the people I'm working with, everybody really, but especially folks on the investment team, who have been great mentors. To answer the last part of the question about what's different: I think the big thing is being a genomics analyst, and I know we've covered everything from proteomics to transcriptomics, so we'll lump all of that stuff in. Okay. So given that, I think the main difference for me is that the biology is kind of my north star.
So when I'm thinking about how an invention or publication or a new technology will propagate into the investable universe, I'm not necessarily thinking about it in the way that it's being presented to me. Say a new application comes out that lets you look at DNA methylation in the bloodstream. It doesn't matter exactly what that is. What matters is, let's say the paper says, here's a great cancer application of this thing. But you know, from understanding the central dogma and the genetics and the molecular biology, that this methylation signal is also relevant to another disease type, and maybe even to some problem that's happening over in synthetic biology, having to do with agricultural or fertilizer production, or something that is otherwise orthogonal to how it was presented.
I think that is how I map out my own thinking at the firm. You're looking at it from a systems biology and evolutionary standpoint. So that's the lens I think you have to look at everything through. You still have to be a health care analyst. You still have to understand reimbursement and the health care system and incentive structures and the different actors and what they do and how they interrelate and the geopolitical components, all of this, because you're still operating within the same system. So there's the pragmatic piece, and then there's the biological piece.
So I would say, at some level it's the same thing. But on another level, you're really looking at everything through the lens of biology, and also through the ecosystem that we have, where I'm constantly working with people who are vastly more knowledgeable than I am when it comes to problems that machine learning can and can't fix, and sometimes they raise very nuanced points that I wouldn't necessarily appreciate but that could be relevant for my space. So it's partly also the collaboration between those different groups. And as you point out, there are places in the modeling where the cost declines come together, and they are very real and really important for the way that we model and build these things out.
And that can become really important in terms of doing the modeling work. And I guess the last thing that's been really awesome for me is just the connections, going out on Twitter and sometimes taking flak and being polarizing, other times people agreeing, and whatnot. It's great to be in that position. It's a very fortunate position to be in. And I love learning from everyone who is interested in the space and wants to ask questions, because I think at the same time you have to not take yourself too seriously, and you have to let go of any hubris around thinking that you're always going to be right. If you can be right the majority of the time, that's great.
I mean, what is it, a .300 batting average is awesome. Yeah. Right. If you can do that, you're in a good spot. Being able to grow and mature, take that feedback well, admit when you're wrong about things, and augment your opinion in the face of new information is really, really important, because the field moves fast and you're going to be wrong. You're going to be right, and you just kind of have to brush it off and enjoy it at the end of the day.
One of the things I told myself when I was looking for jobs, being truthful with myself, was that I would be disappointed not to be able to do the tech and do the biology, because that's what I went to school for. That's what I really like. And I'm glad that I can be in an environment that gives me the freedom and the firepower and the autonomy to just go and do that and have fun with it.
And sometimes the results you get to are very unexpected, but it always keeps things interesting.
I think your whole paradigm is spot on. Keeping up with the pace of innovation and doing research at the speed of innovation requires an iterative approach, where you quickly get something out there that's mostly right and then incorporate the feedback you get, as opposed to a linear, publish-all-the-way-through, once-and-done model. I think we're going to keep seeing this type of approach, a cross-innovation platform with a speedy, iterative process, pay off hugely in terms of building not just a thesis about the future, but an investment thesis in that future.
Right. So I definitely know that when I read your papers on Medium or your articles on ARKInvest.com, I get the sense that they were refined over time. They keep track of what's been changing, what's been evolving in the space. And increasingly, I think regular academic publications are losing that. Right. So I think that's a big differentiator.
Yeah. Totally. I mean, what do they say? The enemy of good enough is perfect. And if it doesn't come out the door, what good is it if it's perfect? Yeah, exactly. Right. Like you said, you've got to be willing to just get a minimum viable product or minimum viable insight out the door, get feedback, and be mature enough to say, I've screwed up here, here, and here, let me augment it, and be transparent about that, because there are a lot of smart people out there. If piling PhDs into the room was always going to get you to the best investment or the best idea, I'm sure that would have been done by now. But you have to open yourself up to the fact that the field is moving at breakneck speed, and that you need to incorporate all the opinions you can to even come close to contending with that.
We think it's the right way to go.
Yeah. I mean, I can't wait to read your next piece. I'm super excited to see how that plays out, too. So we're at 2 hours. Is there anything else you would like to say to the Ticker Symbol: You community? The floor is yours, whatever you want to talk about.
So I had a blast. You're a tremendous host, and I enjoyed all the questions that you and your community came up with, and I'm hopeful that later on down the road we can keep bumping into each other. I think the thing that I always say about this space is that, again, it's guaranteed to impact you personally, and your family and your friends. And so I understand and sympathize with the fact that it's also super intimidating. There's a lot of esoteric language and a lot of, you know, big egos in the space, and we want to try to do what we can to dismantle that a little bit.
And so I would encourage people on Twitter or on any other platform to always ask questions and be aggressive about learning and not really care about what other people think of your questions or your ideas, because I guarantee you it's all valid. Just stay active and stay curious. And hopefully, like I said, we'll be able to keep bumping into each other and carry on the conversation wherever it takes us.
Yeah. I would love that. I'm going to encourage my community to keep reaching out. And obviously, I would love to have you back on Ticker Symbol: You whenever you would like. This has been super informative and wildly entertaining for me as well. And thank you so much for taking so much time out of your busy day to chat with me and my community. I really appreciate it.
Happy to be here. Thanks a lot. Okay. Take care.