Machine Learning and Big Data in materials science: How big is Big?
Institutskolloquium
- Date: Jan 9, 2026
- Time: 10:30 AM - 12:00 PM (Local Time Germany)
- Speaker: Prof. Claudia Draxl
- Location: IPP
- Room: Günter-Grieger Lecture Hall (Greifswald) and Zoom
- Host: Dmitry Moseev
- Contact: dmitry.moseev@ipp.mpg.de
- Region: Mecklenburg-Vorpommern
- Topic: Discussion and debate formats, lectures
The term "big data" governs not only social media and online stores but also most modern research fields. It also applies to materials science, where it is revolutionizing many aspects of the field. But what does "big" mean in the context of typical materials-science machine-learning problems? This question involves not only data volume but also data quality and veracity, as well as infrastructure issues. We ask how models generalize to similar datasets and how high-quality datasets can be gathered from heterogeneous sources. Likewise, we explore how the feature set and complexity of a model can affect its expressivity. And what requirements does all this impose on data infrastructures for creating and hosting large datasets and training models? Through selected examples, I will demonstrate that big data presents unique challenges in many respects that are often overlooked but deserve more attention. I will also discuss how a scalable data infrastructure can make our research data AI-ready and thus contribute to solving the problem.