Exploiting Big Data to create innovative materials
Twelve Max Planck Society facilities are bundling their expertise in the data-driven materials sciences
Which alloying constituents lend a steel unique bending strength, extreme hardness and non-rusting properties? Are semiconductors that promise greater efficiencies for solar modules available, and do they offer greater flexibility than silicon? What would be the best catalyst for a very specific chemical reaction? Or: How should a surface be coated to achieve the best possible thermal protection? To find answers to these typical problems facing materials scientists more easily in the future, researchers from twelve Max Planck Society facilities hope to better exploit the opportunities presented by analyzing large volumes of data. To this end, they are cooperating in the MaxNet on Big-Data-Driven Materials Science or, simply, BigMax.
When scientists search for a new material for a specific purpose, they have so far generally had to rely on the results of experiments on selected materials. And yet they never know whether there is a better solution out there. How practical would it be, then, if researchers from both academia and industry could simply refer to a table to find the optimal material for their purpose? However, this is still far from reality. "To date, around 240,000 inorganic materials alone are known; yet we have knowledge of only some of the properties of fewer than 100 of these substances", says Matthias Scheffler, Director at the Max Planck Society's Fritz Haber Institute in Berlin. As a theoretical physicist, he is certain that the large volumes of data being universally collected, also referred to as Big Data, can help to move closer to such a table. He imagines this table, however, more as a kind of multi-dimensional materials map.
Scheffler is a co-initiator of the cross-institutional alliance MaxNet on Big-Data-Driven Materials Science within the Max Planck Society. The declared aim of BigMax is to make innovative use of the large volumes of data, some of which already exist, and thereby make them a driving force in materials research. In addition to the Fritz Haber Institute, another eleven MPG facilities are collaborating: the Max Planck Institutes for Dynamics of Complex Technical Systems (Magdeburg), Colloids and Interfaces (Potsdam-Golm), Microstructure Physics (Halle), Polymer Research (Mainz), Eisenforschung GmbH (Düsseldorf), Biogeochemistry (Jena), Physics of Complex Systems (Dresden), Structure and Dynamics of Matter (Hamburg), Intelligent Systems (Tübingen) and Informatics (Saarbrücken), and the Max Planck Computing and Data Facility (Garching).
Patterns in large data volumes reveal completely new information
"All of these facilities are already working with large data volumes collected during experiments or computer simulations", explains Peter Benner. At the Max Planck Institute for Dynamics of Complex Technical Systems in Magdeburg, the mathematician leads the Computational Methods in Systems and Control Theory Research Group. For example, Benner says, procedures such as x-ray structural analysis or atom probe tomography alone deliver millions of data values per minute; data from which researchers gain insights into the configuration of atoms in solids, for example. Enormous data volumes also result from the quantum mechanics analyses commonplace in solid-state physics and chemistry. The researchers can now draw conclusions on material properties from these data.
However, the new alliance aims to gain even more insights from these data. New methods will be developed to this end, and existing methods refined. "For example, in materials research the data present highly specific challenges to the computer algorithms", explains Benner, who coordinates the new collaboration together with Matthias Scheffler. "This can all be achieved better jointly", says Benner. "Because although we research in different disciplines, the methodological problems are the same for the respective data analyses".
One of the central objectives is to investigate the data for particular structures or patterns, which will then allow completely new information to be extracted in addition to what is already known. Matthias Scheffler from Berlin points to other disciplines where this is already the case. Epidemiologists, for example, were able to determine the regions in which the flu was prevalent from user queries in Internet search engines. On this basis, they were able to follow the outbreak's propagation and even forecast its future spread. As Scheffler says, one only needs to recognize the patterns in the data.
A new paradigm in the materials sciences
The cooperating Max Planck scientists are consequently hopeful that, in the future, materials researchers can gain new insights from their existing data. The network aims to concentrate joint activities on five different topics. The objective is to be able to theoretically predict the properties of metals and alloys, determine the causal relationships between material properties and data structures, develop data diagnostics methodologies to convert collected experimental data even more quickly to image information, and facilitate the design of polymer materials with specific, desired properties. In the fifth topic area, the network aims to continue the Materials Encyclopedia that has already been started. The Novel Materials Discovery Laboratory (NOMAD Centre of Excellence) had previously worked on this encyclopedia, using exclusively theoretically computed entries. Experimental data will now also be included as part of BigMax.
For Peter Benner, there is no question that the cross-institutional cooperation will integrate complementary information and thus substantially simplify the researchers' work. One example he sees is data diagnostics, in which his Magdeburg-based group collaborates with its colleagues in Potsdam-Golm. "In Golm, they are researching imaging methods that allow new insights into the nanostructures of biomaterials such as bones", explains Benner. "Here, we mathematicians can help to suitably compress the accrued data such that they can be quickly converted to informative images."
There is still a long way to go until the dream of the multi-dimensional materials map, in which one simply looks up the best material to use, is fulfilled. But Matthias Scheffler does not doubt that Big Data will help reach this target. Here, he sees a new paradigm in the materials sciences: "Previously, researchers have investigated selected systems and developed models based on a general theoretical understanding", says Scheffler. "I believe that the future quest in terms of Big Data analyses will be the search for structures and patterns in large data volumes. And once we have finally developed the equations to describe them, we can then apply them to materials that we have not even analyzed yet."
New thermoelectric materials based on data from solar cells
The physicist believes that unconventional solutions can also be reached much more easily using this method. "In individual experiments one usually begins with established criteria", says Scheffler. "This means: one predominantly searches for superconductors in the substance group in which one was previously successful." But it is exactly this that makes revolutionary developments more difficult. Here, the structural analysis of large data volumes is much more impartial. Matthias Scheffler can therefore readily envisage new thermoelectric materials – that is, materials that convert undesirable waste heat into useful electricity – being discovered in the future, for example in data generated during solar cell research.
If, one day, it is finally possible to theoretically derive material properties, Peter Benner also sees an additional advantage. "This would save the time and money expended on some experiments", says the mathematician from the Max Planck Institute in Magdeburg. And the patience of the researchers, who currently are often forced to approach solutions using the trial-and-error method, would also be less taxed.
BigMax research topics:
1. Structure and plasticity of metals: The aim is to achieve a scientific understanding of methods of alloying and treating steel to achieve specific properties by exploiting the enormous volumes of data on the positions of the individual atoms.
2. Data diagnostics in x-ray spectroscopy and tomography: Researchers aim to develop methods for converting the immense volumes of data accrued in what is known as small-angle x-ray scattering tomography (SAXS tomography) more quickly to 3D images. In this way, superfluous recognition processes can be more readily identified as such – and then be aborted. At the moment it takes hours, or even days, to generate an image.
3. Discovering interpretable patterns, correlations and causality: Using newly developed methods, the scientists aim to identify and interpret structures within the data. Here, one aim is to recognize relationships in structures with specific material properties – and identify the reasons.
4. Learning thermodynamic properties of soft-matter materials: Some of the properties of both organic and inorganic materials are fundamentally governed by thermodynamic variables such as entropy. Researchers would like to gain a deeper understanding of this and find ways of describing the influence of temperature, for example, on these relationships. Here, Big Data-based techniques such as machine learning play a major role.
5. The NOMAD Encyclopedia: The development of a materials encyclopedia, compiled as part of the NOMAD Laboratory (Novel Materials Discovery) Centre of Excellence (also see https://nomad-coe.eu), will be continued. The encyclopedia has previously been developed using only computed data, i.e. the results of millions of complex simulations. In the future, it will be expanded using experimental data.
KH