Extreme habitats such as hot springs are known to harbor special life forms, for instance bacteria that thrive in water as hot as 120 degrees Celsius. Scientists and industry expect that studying these bacteria will yield new substances, such as heat-resistant proteins. These could be used in the production of cosmetics or foodstuffs, for example in manufacturing processes that require high temperatures. For this reason, scientists are now searching on land and at sea for such extraordinary new microbes.
The simplest solution would be to culture these single-celled organisms in the laboratory and screen them there for new substances. However, many bacteria cannot be grown in test tubes. Researchers have therefore been rolling up their sleeves for some time now, collecting soil samples by the shovelful and directly analyzing the entire genetic material of their microbial inhabitants. The hope is to find promising genes that carry the blueprint for new superproteins.
The problem, however, is that metagenomic analysis usually yields thousands of tiny genetic fragments, only a few of which can be assigned to a particular organism. This is where McHardy's method comes into play. The researcher first trained the support vector machine on genetic fragments from known bacterial groups, teaching it one particular feature of the bacterial genome: short, recurring base sequences known as oligomers, such as ACTGAT. Certain oligomers are typical of particular bacterial groups, much like a fingerprint.
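To make the oligomer "fingerprint" concrete, one can count how often each possible oligomer of a given length occurs in a fragment. The sketch below is an illustration, not McHardy's code: the function name `oligomer_profile` and the default length k = 4 are assumptions. It turns a fragment into relative oligomer frequencies, the kind of numerical description a support vector machine can be trained on.

```python
from collections import Counter
from itertools import product


def oligomer_profile(sequence: str, k: int = 4) -> dict[str, float]:
    """Return the relative frequency of every possible length-k oligomer in a DNA fragment.

    Hypothetical helper for illustration; k = 4 is an assumed default.
    """
    sequence = sequence.upper()
    valid = set("ACGT")
    # Slide a window of width k along the fragment; skip windows containing
    # ambiguous bases such as N, which are common in sequencing reads.
    counts = Counter(
        sequence[i:i + k]
        for i in range(len(sequence) - k + 1)
        if set(sequence[i:i + k]) <= valid
    )
    total = sum(counts.values()) or 1  # avoid division by zero for very short fragments
    # Enumerate all 4**k oligomers in a fixed order, so every fragment is
    # described by a vector of the same length, whatever its size.
    return {"".join(p): counts["".join(p)] / total for p in product("ACGT", repeat=k)}
```

For the fragment `ACGTACGT` with k = 2, the seven overlapping windows give AC, CG and GT a frequency of 2/7 each and TA a frequency of 1/7; the other twelve dinucleotides get 0. Relative frequencies rather than raw counts are used here so that fragments of different lengths remain comparable.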
These oligomers occur not just at one but at many locations along the DNA strand, so even a short fragment carries several of them. This makes oligomers a highly suitable means of bringing order to the metagenomic puzzle. After the support vector machine had learned which oligomers are associated with which bacterial groups, McHardy fed it new metagenome sequences of unknown origin, for example sequences from microbe-rich sewage sludge. Once again, the data floated through the vast number space, past the numerical values learned in training. The experiment was successful: “Based on the characteristic oligomers, the program was able to assign many of the short metagenome sequences to certain bacteria.”
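The train-then-assign workflow can be illustrated with a toy sketch. This is not McHardy's program. It trains a linear support vector machine with the Pegasos sub-gradient method, a generic textbook procedure swapped in here for illustration, on oligomer profiles of fragments from two invented groups, A and B. It then assigns "unknown" fragments according to the side of the learned boundary on which their profiles fall. The oligomer length `K = 2`, the regularization parameter `lam`, the fragments, and all function names are hypothetical.

```python
import random
from collections import Counter
from itertools import product

K = 2  # hypothetical oligomer length, kept tiny so the toy example stays readable
OLIGOMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def oligomer_profile(seq: str) -> list[float]:
    """Relative frequencies of all 4**K oligomers in a fragment, in a fixed order."""
    windows = (seq[i:i + K] for i in range(len(seq) - K + 1))
    counts = Counter(w for w in windows if set(w) <= set("ACGT"))
    total = sum(counts.values()) or 1
    return [counts[o] / total for o in OLIGOMERS]


def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Pegasos-style sub-gradient training of a linear SVM (hinge loss, no bias term).

    X: list of feature vectors; y: labels, +1 for group A and -1 for group B.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)  # visit every fragment once per epoch, in random order
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * dot(w, X[i])  # computed with the current weights
            # Regularization: shrink the weights toward zero ...
            w = [(1.0 - eta * lam) * wj for wj in w]
            # ... and, if the fragment lies inside the margin, move the
            # hyperplane toward classifying it correctly.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def assign(w: list[float], seq: str) -> int:
    """Return +1 (group A) or -1 (group B): the side of the hyperplane the fragment falls on."""
    return 1 if dot(w, oligomer_profile(seq)) > 0 else -1


# Invented training fragments from two hypothetical groups: A is AT-rich, B is GC-rich.
training = [("ATTATAAT", 1), ("TAATATTA", 1), ("AATTATAT", 1),
            ("GCGGCCGC", -1), ("CGCCGGCG", -1), ("GGCGCCGC", -1)]
X = [oligomer_profile(s) for s, _ in training]
y = [label for _, label in training]
w = train_linear_svm(X, y)

# "Unknown" fragments are assigned by their oligomer profile alone.
print(assign(w, "TATTAATA"), assign(w, "CCGCGGCG"))  # prints "1 -1"
```

Real metagenome binning must distinguish many groups and works with longer oligomers and longer fragments. One common way to extend a two-class SVM is to train one classifier per group ("one-vs-rest") and pick the group with the highest score. Dropping the bias term is a further simplification of this sketch.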