Gut microbiome predicts cholera susceptibility

Hassan uz-Zaman
Feb 16, 2020

From where I’m standing, this paper seems to have blown up: “Human gut microbiota predicts susceptibility to Vibrio cholerae infection”. The view metrics are quite high, and it’s been posted on Duke University’s news site and on the icddr,b blog. I wanted to talk about it not only because I find the topic fascinating, but also because it hits close to home: icddr,b is the institute I’m currently affiliated with (although the authors are from a different lab than mine).

The setup is this. Not all people are equally susceptible to cholera, even after you take factors like age, blood group, or nutritional status into account. In fact, you could have two people with virtually identical circumstances (ecological niches, one wants to say) be exposed to the pathogen, and they might not have the same clinical outcome. Even when everything else is seemingly equal, people show intrinsic differences in their susceptibility to the disease.

Of course, this in itself is a rather banal observation. Disease susceptibility cannot be exhaustively boiled down to environmental and demographic factors alone; there are also traits intrinsic to individuals. Genetics, for example, does determine aspects of one’s immunological preparedness against certain pathogens, so some disparity can be attributed to differences at that level. The interesting question, then, is teasing out these and other specific “intrinsic” factors that might have a say in clinical outcome. The paper under discussion chooses to investigate gut microbiome profile as a candidate.

The game plan is fairly simple. Find a group of people who are at high risk of cholera exposure (in this case, household contacts of cholera patients), keep tallies on who gets a cholera infection and who does not, and compare gut microbiota between these two groups to see if variations are associated with cholera infection (or lack thereof).

Here’s how the sampling went. Let’s say, one morning, Timmy shows up to the icddr,b hospital with the tell-tale signs of cholera. Cholera patients are, incidentally, not hard to spot: their stool has a very characteristic “look”. Imagine chicken stock, but browner. Anyway, the doctors can’t rely on their stool senses alone to pronounce Timmy a cholera patient; his stool must also test positive on a V. cholerae culture. Which it does. Timmy is now our index patient.

We then recruit his household contacts for our study. There are some inclusion/exclusion criteria though: they must live within the city, mustn’t be too young (<2) or old (>60), and mustn’t have some other disease. We also exclude people with recent (in the past week) antibiotic use. The general logic of these criteria is, of course, to make sure we only choose people with otherwise representative microbiota, as opposed to those whose gut flora has been ravaged by antibiotics or modified by other disease conditions. The individuals passing these criteria are now enrolled into the study, and they will be subjected to copious amounts of rectal swabbing in the days to come. I wonder how they managed to get consent for this study. Anyway, Timmy’s household contacts are now in the study population, and we extract information about their nutritional status, age, blood group, and other relevant epidemiological factors as per protocol. The authors managed to recruit 66 Timmys and their households for the study.

For the next 9 days, we continue to collect data from Timmy’s household contacts. Our researchers visit them every day, get information about whether they have developed diarrhea, and probe their rectums with cotton-tipped swab sticks. In addition, we also collect blood- but this is done more sparingly, only on days 2, 7, and 30 (where day 1 is the day Timmy was identified as a cholera patient at the hospital, and day 2 is the day we started stalking his family for rectum rubs). These three prongs of monitoring serve the same purpose of detecting cholera infection, but do so in different ways:

  1. A report of diarrhea indicates a possible symptom of cholera
  2. A rectal swab being positive for V. cholerae indicates cholera infection (could be symptomatic or asymptomatic)
  3. A four-fold increase in vibriocidal antibody titer (between days 2 and 7, 2 and 30, or 7 and 30) indicates a recent cholera infection (the antibody response by itself doesn’t tell you whether there were symptoms)

If 1 is accompanied by 2 or 3, that indicates a symptomatic cholera infection, while 2 alone indicates the asymptomatic variety. The authors had to exclude 9 people from their study who showed “ambiguous clinical outcomes”, i.e. 1 without being accompanied by 2 or 3.
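To make the decision rule concrete, here is a minimal Python sketch of how the three prongs could combine into an outcome label. This is my own reading of the rule, not the authors’ code: the `Contact` class, the function names, and the label strings are all made up for illustration, and treating a titer rise without diarrhea as an asymptomatic infection is an assumption on my part.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    """Hypothetical record for one household contact (names are illustrative)."""
    diarrhea: bool          # prong 1: any reported diarrhea during follow-up
    culture_positive: bool  # prong 2: any rectal swab culture positive for V. cholerae
    titers: dict            # prong 3: vibriocidal titer by study day, e.g. {2: 40, 7: 160, 30: 80}


def fourfold_rise(titers):
    """True if any later titer is at least 4x an earlier one (days 2->7, 2->30, 7->30)."""
    days = sorted(titers)
    return any(
        titers[earlier] > 0 and titers[later] >= 4 * titers[earlier]
        for i, earlier in enumerate(days)
        for later in days[i + 1:]
    )


def classify(contact):
    """Combine the three prongs into one outcome label."""
    # Assumption: either a positive culture or a four-fold titer rise counts as infection.
    infected = contact.culture_positive or fourfold_rise(contact.titers)
    if infected and contact.diarrhea:
        return "symptomatic infection"
    if infected:
        return "asymptomatic infection"
    if contact.diarrhea:
        # Diarrhea with no microbiological or serological support: the ambiguous cases.
        return "ambiguous (excluded)"
    return "uninfected"
```

For example, `classify(Contact(True, False, {2: 40, 7: 40, 30: 40}))` lands in the ambiguous bucket: diarrhea, but nothing to pin it on V. cholerae.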

So we have settled on a study population at risk of cholera exposure, and we’ve set up a system by which we can distinguish the people who develop cholera in the next 30 days from the ones who do not. Now comes the simple matter of profiling their gut microbiota. On the same days that blood samples were collected from Timmy’s household contacts (days 2, 7 and 30, with day 1 being the day we met Timmy the cholera patient), another set of rectal swabs was collected for subsequent 16S sequencing.

Now, this study claims to be about gut microbiota, but the sampling is done via rectal swabs. Does the rectal swab population accurately represent gut microbiota? For starters, the rectum is quite a ways south of the intestinal lumen. This is a question the authors themselves bring up in the discussion. However, they also point out in one of the Supplementary files that their own research conducted earlier on the same population “demonstrated that rectal swab samples approximate 16S sequence results from stool samples”. I don’t know if that sufficiently allays the concern.

So after all was said and done, only 76 contacts (from 124 households) managed to escape the tough exclusion criteria, and it was now time to pull all the DNA from their rectal swabs and commence sequencing. This is a community composition analysis, meaning you only need to know which bacteria are present, not what specific genes they encode; so a region of the 16S rRNA gene suffices for our purposes.

Up until this point, I wouldn’t blame you if the study design sounds underwhelming. Given institutional support, none of this seems super difficult to organize. The results might have been mundane as well. In fact, in their first volley, when they used the more garden-variety approaches like univariate statistical tests and ANOVA, the authors found no association between cholera and gut microbiota. The sequence data was clustered into 4181 Operational Taxonomic Units (OTUs) at 97% similarity, but none of them were individually associated with susceptibility. (Why “OTUs”? Remember, at this point you only have clusters of DNA sequences that are at least 97% similar to each other; you don’t know if these clusters correspond to species, or to any other taxonomic category that exists in reality. So provisionally, and for heuristic purposes, we choose to call them OTUs.)
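If “clustering at 97% similarity” sounds abstract, here is a toy Python sketch of the idea. It is emphatically not the pipeline the authors used (real tools align sequences and are far smarter about it); I’m assuming equal-length reads and plain position-by-position identity, and the function names are made up for illustration.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this toy version only compares equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(reads, threshold=0.97):
    """Greedy clustering: each read joins the first cluster whose founding
    read (its centroid) it matches at >= threshold, else it founds a new cluster."""
    centroids = []  # one founding read per OTU
    clusters = []   # clusters[i] holds all reads assigned to centroids[i]
    for read in reads:
        for i, centroid in enumerate(centroids):
            if identity(read, centroid) >= threshold:
                clusters[i].append(read)
                break
        else:
            # No existing centroid was similar enough, so this read starts a new OTU.
            centroids.append(read)
            clusters.append([read])
    return clusters
```

Notice that nothing in here knows what a species is. The clusters are defined purely by a similarity cutoff, which is exactly why we hedge and call them OTUs.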

What makes the paper unique is what comes next. This is how I like to imagine the situation went down.

Remember, the sampling was done back in 2012–2014, so quite a ways back. The study design made eminent sense, the researchers had followed the protocol to a T, and the microbial community structure data seemed to make sense as well. None of this was helping, though: the tried-and-true statistical analysis methods had come up short. There seemed to be no way of predicting cholera susceptibility using gut microbiota. The researchers were sitting on a pile of valuable data, without a clue as to what to do with it. The government was getting ticked off (I don’t know who funded the study, but imagine the government was involved).

Ok so I started writing this setup with a lot of enthusiasm, but now I’m getting frustrated at my sheer inability to write fiction so let me just make this short: one of the researchers got into contact with Bruce Willis, brought him out of retirement despite his initial protests, and he and his ragtag band of outsiders solved the problem by utilizing their very particular, very obscure set of skills:

Machine learning.

Megan Fox may or may not have been involved. Bruce Willis dies at the end. For America.

Anyway, the machine learning model developed by the authors did indeed succeed at what they set out to do. Without going into too much gritty detail, their first machine learning approach split up the study population into two groups. Data from the first group was used to train the model to predict cholera susceptibility based on particular gut microbiome profiles, and the model was validated by testing its predictions on the second group. This initial group split was kind of fortuitous: since there were two temporally separated cohorts in the study, the first group could conveniently be used as a “training” group, while the second could be used to test predictions. In subsequent analyses, however, the authors tinkered with their model by splitting the groups in other, random, ways. The predictive capacity was still retained, and this latter “cross-validation” scheme zeroed in on 88 Operational Taxonomic Units as being associated with cholera.
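For the curious, the two validation schemes described above look roughly like this in Python. Big caveat: this is a sketch under my own assumptions, not the authors’ model. I’ve swapped in a dead-simple nearest-centroid classifier as a stand-in for whatever they actually trained, and every name here is hypothetical. Each row of `X` is one person’s OTU abundance profile, and `y` holds the matching outcome labels.

```python
import random


def centroid(rows):
    """Column-wise mean of a list of equal-length numeric rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def fit(X, y):
    """Stand-in 'model': the mean OTU profile of each outcome class."""
    return {
        label: centroid([x for x, lab in zip(X, y) if lab == label])
        for label in set(y)
    }


def predict(model, x):
    """Predict the class whose mean profile is closest (squared Euclidean distance)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(model, key=lambda label: dist2(model[label]))


def accuracy(model, X, y):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(predict(model, x) == lab for x, lab in zip(X, y)) / len(y)


def holdout_accuracy(X_train, y_train, X_test, y_test):
    """Scheme 1: train on one cohort, test on the other."""
    return accuracy(fit(X_train, y_train), X_test, y_test)


def k_fold_accuracy(X, y, k=5, seed=0):
    """Scheme 2: shuffle, split into k folds, hold each fold out in turn.
    Assumes k is no larger than the number of samples."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(accuracy(model, [X[i] for i in held_out], [y[i] for i in held_out]))
    return sum(scores) / k
```

The point of both schemes is the same: the model is always scored on people it never saw during training, so a good score can’t just be memorization.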

I can imagine the researchers being encouraged by this initial success. This showed that if you’re using the right toolkit, it is possible to find correlations between clinical outcomes and gut microbiome profile. Given how difficult it often is to attribute outcomes to the microbiome (read chapter 5 in Ed Yong’s book, thank me later. Here’s my review in case you missed it), this is an important conclusion, as far as it goes. In the next important step of their analysis, the authors pick the 100 most important Operational Taxonomic Units (the earlier predictive 88, plus 12 more; better safe than sorry) and investigate their distribution between infected and uninfected individuals. Bacteria from the Bacteroidetes phylum were more prevalent among the uninfected, particularly members of the Prevotella genus, which is a common gut bacterium among healthy Bangladeshis.

At the tail end of this study, the team tries to build on the fruits of this analysis, and uses wet lab experiments to see if some of these predictive bacteria actually influence Vibrio cholerae growth. They do this via spent culture supernatant experiments, which consist of letting bacteria-X grow in a medium for a specified time until all the nutrients are depleted (hence “spent culture”), filtering out the bacteria-X cells, and growing bacteria-Y in this spent culture supernatant. This is effectively growing bacteria-Y in the juices of bacteria-X. Since all the metabolites and secretions of bacteria-X are in the liquid, growing bacteria-Y in it simulates growing it in a bacteria-X “environment”, as if the two are growing together. This is thought to approximate the in vivo method of injecting a cocktail of two bacteria into (gnotobiotic) mice and seeing how one affects the other. None of the predictive bacteria, however, were shown to affect V. cholerae growth one way or the other. There was one bacterium, not strictly predictive but still associated with ongoing V. cholerae infection, that stimulated V. cholerae growth. When the researchers investigated whether this effect was mediated by influencing biofilm production, the experiment showed no effect, and the lead went cold.

So this paper is an example of how sophisticated analyses can really blow up an otherwise mundane experimental setup. The study design itself is simple and straightforward, and the 16S sequencing does not involve any high-end mumbo-jumbo. The raw data generated up to this point wouldn’t look super impressive or “out there”. The paper seems to owe its significance to the analytical tools adopted after the raw data was generated.

I don’t know if modern biology has a bottleneck at the level of analysis. I mean, how difficult is it to develop machine learning models? If not particularly difficult, and there’s no significant bottleneck there, then I’m not sure why we’re not seeing more studies of this kind.
