Student Projects / Thesis topics / Student Assistants (m/f/d) | Chemical Literature Data Structuring and Search

Young Researchers Magdeburg
Structural and Cell Biology, Chemistry, Complex Systems
Job Offer from October 12, 2021

Background

For trained process engineers working in the field of chemical engineering, it is difficult to search through specialized chemical literature (papers, textbooks, patents, databases, etc.) and harvest specific information on chemical synthesis pathways, alternative catalyst systems or suitable solvents in order to design the best production process. This is because chemical nomenclature uses numerous names for one specific substance (e.g. ethanol, (absolute) alcohol, ethyl alcohol, ethyl hydrate, hydroxyethane) or one reaction (e.g. hydroformylation, oxo-synthesis, oxo-process, Roelen-reaction). As a result, a successful search requires fundamental chemical knowledge and understanding, along with time-consuming manual labor.
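To illustrate the synonym problem, the following minimal Python sketch maps the alternative names listed above to one canonical name. The table contents come from the examples in the text; the names SYNONYMS, build_lookup and normalize are hypothetical and only serve this illustration. A real system would need far larger dictionaries or a learned model.

```python
# Hypothetical synonym table built from the examples given in the text.
SYNONYMS = {
    "ethanol": [
        "ethanol", "alcohol", "absolute alcohol",
        "ethyl alcohol", "ethyl hydrate", "hydroxyethane",
    ],
    "hydroformylation": [
        "hydroformylation", "oxo-synthesis", "oxo-process", "Roelen-reaction",
    ],
}


def build_lookup(synonyms):
    """Invert the table: every lower-cased synonym points to its canonical name."""
    lookup = {}
    for canonical, names in synonyms.items():
        for name in names:
            lookup[name.lower()] = canonical
    return lookup


LOOKUP = build_lookup(SYNONYMS)


def normalize(term):
    """Return the canonical name for a term, or None if the term is unknown."""
    return LOOKUP.get(term.strip().lower())


print(normalize("Ethyl alcohol"))
print(normalize("Roelen-reaction"))
print(normalize("water"))
```

A plain dictionary lookup like this fails for any name that is not listed, which is one reason why manual searches remain laborious.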

With the growing capabilities of automated named-entity recognition (NER [1]) and machine learning techniques, other research groups have already addressed this issue and developed specialized tools for extracting chemical information (e.g. CHEMDNER [2], ChemDataExtractor [3]) or chemical reactions (e.g. chemie-turk [4]). As these tools do not fulfill our specific requirements, the research groups of Prof. Nürnberger (chair of Data and Knowledge Engineering, Computer Science Faculty of the Otto von Guericke University in Magdeburg) and Prof. Sundmacher (chair of Process Systems Engineering, Process and Systems Engineering Faculty of the Otto von Guericke University and director of the Process Systems Engineering group at the Max Planck Institute for Dynamics of Complex Technical Systems) have teamed up to develop our own software.

Research Problem / Task

The idea is to use advanced machine learning methods to automate the detection of chemical entities and reactions, along with their relations, in order to develop a prototype software. As a first step, we want to identify the existing tools and evaluate them on known benchmark datasets. We are therefore looking for motivated students with:

  1. hands-on experience in programming with Python (scikit-learn, NumPy) or Java and a basic understanding of machine learning;
  2. strong chemical knowledge to extract information from literature manually in order to generate a benchmark dataset. (Programming experience in Python is not mandatory but helpful.)

For the first task, we plan to define thesis topics or student projects. The second task could also be carried out under a student assistant (HiWi) contract.
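As a rough idea of what evaluating an existing tool against a benchmark dataset could look like, the sketch below compares predicted entity spans with manually annotated ones and computes precision, recall and F1 using exact span matching. The span format (start, end, label), the labels and the function name evaluate_spans are assumptions made for this illustration and are not part of any of the tools mentioned above.

```python
def evaluate_spans(gold, predicted):
    """Exact-match precision, recall and F1 for (start, end, label) spans."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations for one sentence (character offsets are made up).
gold = [(0, 7, "CHEMICAL"), (25, 41, "REACTION")]
predicted = [(0, 7, "CHEMICAL"), (12, 18, "CHEMICAL")]

precision, recall, f1 = evaluate_spans(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Exact matching is the strictest choice; a relaxed variant that accepts overlapping spans would be a reasonable alternative when tools differ in how they set entity boundaries.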

Application

Interested students can contact:

The Max Planck Society and the Otto von Guericke University are committed to increasing the number of individuals with disabilities in their workforce and therefore encourage applications from such qualified individuals. Furthermore, the Max Planck Society and the Otto von Guericke University seek to increase the number of women in those areas where they are underrepresented and therefore explicitly encourage women to apply.

Please note the information regarding storage of personal data:

 

[1]  https://en.wikipedia.org/wiki/Named-entity_recognition

[2]  https://jcheminf.biomedcentral.com/articles/10.1186/1758-2946-7-S1-S1

[3]  http://chemdataextractor.org/

[4]  https://pubs.acs.org/doi/10.1021/acs.jcim.1c00284
