Machine Learning and Big Data in materials science: How big is Big?

Institute Colloquium

  • Date: Jan 9, 2026
  • Time: 10:30 AM - 12:00 PM (Local Time Germany)
  • Speaker: Prof. Claudia Draxl
  • Location: IPP
  • Room: Günter-Grieger Lecture Hall (Greifswald) and Zoom
  • Host: Dmitry Moseev
  • Contact: dmitry.moseev@ipp.mpg.de
  • Region: Mecklenburg-Vorpommern
  • Topic: Discussion and debate formats, lectures
The term "big data" governs not only social media and online stores but also most modern research fields. It also applies to materials science, where it is revolutionizing many aspects of the field. But what does "big" mean in the context of typical materials-science machine-learning problems? This question involves not only data volume, but also data quality and veracity, as well as infrastructure issues. We ask how models generalize to similar datasets and how high-quality datasets can be gathered from heterogeneous sources. Likewise, we explore how the feature set and complexity of a model can affect its expressivity. And what requirements does all this impose on data infrastructures for creating and hosting large datasets and training models? Through selected examples, I will demonstrate that big data presents unique challenges in many respects that are often overlooked but deserve more attention. I will also discuss how a scalable data infrastructure can make our research data AI-ready and thus contribute to solving the problem.

About the speaker


Claudia Draxl is Einstein Professor at the Humboldt-Universität zu Berlin. Her research interests cover theoretical concepts and methodology to gain insight into a variety of materials and their properties. She is a developer of the all-electron full-potential package "exciting", which implements density-functional theory (DFT) and methods beyond it, with a focus on theoretical spectroscopy. A recently developed package is the cluster-expansion code CELL. Current research projects concern organic/inorganic hybrid structures, wide-gap oxides, thermoelectricity, solar-cell materials, film growth, excitation dynamics, and more. She is the spokesperson of the NFDI consortium FAIRmat, which is developing NOMAD (Novel Materials Discovery), an open-access library and research data management service for collecting, organizing, sharing, analyzing, and publishing FAIR materials-science data. Based on this, her data-driven research aims at finding structure in the Big Data of materials science.