As a graduate student using next-generation sequencing techniques, I ask myself 100 technical questions a day about which method to use; the number of decisions you have to make to get robust results is humongous. Usually, when you read recent papers and they all (or almost all) talk about OTUs at 97% similarity, you do the same without thinking. However, I hear more and more concern about the robustness of OTUs as proxies for ecological similarity in bacterial communities. On top of that, I noticed that 9 OTUs account for 32.6% of one of our datasets. This raised a red flag for me and needed to be investigated… What are these OTUs and how do they behave across our samples?
Damn it, more questions…
Here for the blog: http://meren.github.io/
Here for the paper: http://onlinelibrary.wiley.com/doi/10.1111/2041-210X.12114/epdf
Oligotyping is a “supervised computational method”, based on canonical techniques, that enables researchers to go beyond OTUs and investigate the sub-structure of their sequences in environmental data sets of 16S rRNA gene data. An oligotype is identified by the nucleotides present at the most information-rich (highest Shannon entropy) positions in the reads. It therefore allows us to split an OTU into groups of sequences that differ by one or more nucleotides. With these oligotypes, one can test whether their behavior changes across samples (species/time/location) and better understand the dynamics of the bacterial communities. Indeed, 97% similarity OTUs could be masking a huge part of bacterial ecology and dynamics across samples and weakening studies' conclusions. Figure 3a from the Oligotyping paper illustrates this.
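To make the entropy idea concrete, here is a minimal sketch in Python. It is not the Oligotyping pipeline itself (which is supervised and iterative); it only assumes a handful of made-up, already-aligned reads of equal length, computes the Shannon entropy of each position, and groups reads by the nucleotides found at the highest-entropy positions. The function names and the toy reads are my own assumptions for illustration.

```python
from collections import Counter
from math import log2


def column_entropy(reads):
    """Shannon entropy (in bits) of each position across aligned, equal-length reads."""
    entropies = []
    for i in range(len(reads[0])):
        counts = Counter(read[i] for read in reads)
        total = sum(counts.values())
        entropies.append(-sum((c / total) * log2(c / total) for c in counts.values()))
    return entropies


def group_by_high_entropy_positions(reads, n_positions):
    """Group reads by their nucleotides at the n_positions highest-entropy positions."""
    entropies = column_entropy(reads)
    ranked = sorted(range(len(entropies)), key=lambda i: entropies[i], reverse=True)
    positions = sorted(ranked[:n_positions])
    groups = Counter("".join(read[i] for i in positions) for read in reads)
    return positions, groups


# Hypothetical aligned reads that would all fall into the same OTU.
reads = ["ACGTAC", "ACGTAC", "ACGTTC", "ACGTTC", "ACCTAC", "ACGTAC"]
positions, groups = group_by_high_entropy_positions(reads, n_positions=2)
print(positions)  # indices of the two most variable positions
print(groups)     # how many reads carry each nucleotide combination
```

In this toy case only two positions vary, so each group of reads plays the role of one candidate oligotype. On real data you would have to decide how many positions to use and filter out noise, which is where the "supervised" part comes in.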
Here for the paper: http://www.nature.com/ismej/journal/v9/n1/pdf/ismej2014117a.pdf
In comparison, there is also this paper from Tikhonov et al. (2015), where they present a clustering-free approach that lets researchers resolve sub-OTU structure into what they call “subpopulations”, independently of the similarity of the 16S tag sequences. They use time series to demonstrate that it is possible to delineate sub-OTU groups by combining error-model-based denoising with systematic cross-sample comparisons. The biggest difference from Oligotyping is that the method is unsupervised and needs no input from the researcher at each step. This method compares the dynamics of pairs of sequences over time through the Pearson correlation of their measured abundance traces (normalized by the maximum possible correlation). As shown by their results, two sequences sharing 100% similarity can behave differently through time (so one could infer that they belong to separate ecological populations), whereas two sequences at 81% similarity can behave identically. These results suggest that we should not rely only on OTUs to understand bacterial community dynamics. Part of Figure 2 from Tikhonov et al. (2015) illustrates this.
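As a rough illustration of the comparison step, here is a minimal Python sketch of the plain Pearson correlation between abundance traces. It leaves out the denoising and the normalization by maximum possible correlation described in the paper, and the three traces are invented numbers, so treat it as a sketch of the idea rather than the authors' method.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation between two equal-length abundance traces."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical read counts for three sequences over five time points.
seq_a = [10, 20, 30, 40, 50]
seq_b = [12, 22, 33, 41, 52]  # rises together with seq_a
seq_c = [50, 40, 30, 20, 10]  # declines while seq_a rises

r_ab = pearson(seq_a, seq_b)
r_ac = pearson(seq_a, seq_c)
print(round(r_ab, 3), round(r_ac, 3))
```

A correlation close to 1 means two sequences rise and fall together, while a value close to -1 means they move in opposite directions, whatever their sequence similarity happens to be.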
I don’t mean to say that there is no value in looking at OTUs but rather that it appears beneficial to compare the trends seen at the OTU level with those at the sub-OTU level. For my fellow graduate students trying to find their way in analyzing 16S sequences without going crazy, I definitely suggest you read these two papers and consider going beyond OTUs to understand the ongoing dynamics in your samples. Good luck and I hope this was useful to you! Cheers!