PHYS-E0550 - Project in Machine Learning for Materials Science D, Lecture, 24.10.2023-28.11.2023
Description - Project in machine learning for materials science
Machine learning (ML) techniques enable us to infer relationships from large amounts of seemingly uncorrelated input data. Their predictive power has made them central to product development in IT, and we already use them in daily life (Amazon, Netflix, etc.). The physical sciences have been slow to capitalize on the promise of ML, even though their computational implementation is well suited to modern simulation techniques. Materials science has recently benefited from a number of ML applications to materials discovery and design (featuring neural networks, genetic algorithms, regression methods, compressed sensing and Bayesian optimisation) that promise to accelerate the development of novel technologies. Machine learning for materials science is an exciting new discipline that is now being taught at Aalto University.
"Project in machine learning for materials science" is a project-led lecture course for graduate students who wish to acquire key skills in this cross-disciplinary research field. The project work will be carried out in mixed teams (free choice of topic) and provides ideal opportunities for learning on realistic materials science datasets (experimental or computational). The aim of the project work is to gain hands-on experience in this field and to explore the performance of different machine learning methods on different datasets, which already constitutes an important contribution to the field and to the careers of course participants. This course follows Introduction to Machine Learning in Materials Science; however, that course is not a prerequisite for this project-led course.
Credits
3 ECTS credits are awarded for the course.
Assessment
The course grade is pass/fail. The passing criteria are to complete the project and to participate in the three project check-point sessions.
Course structure and workload
The course is taught in Period 2:
- 2 x 2 h introduction and formation of project teams
- 3 x 2 h project check-point sessions
- 2 x 2 h project consultation sessions
- 70 h of independent project work
Learning outcomes
After completion of the course you will be able to:
- Identify research questions in materials science that can be solved by machine learning.
- Understand different types of materials science datasets for machine learning.
- Perform basic data analysis of datasets.
- Select a suitable materials science data representation as input for machine learning.
- Consider which machine learning methods might be best for tackling different materials science problems.
- Assess and improve the performance of machine learning models.
- Carry out a machine learning project in materials science.
- Critically comment on machine learning applications in materials science (on the quality of data analysis, suitability of chosen methods, quality of performance assessment, etc.).
Teachers
Course dates
24.10.2023-28.11.2023
Project work spans approximately 5 weeks. It is up to the groups to set up meetings and organise their project work. Three project-related presentations are scheduled to showcase project progress for each group.
Before the check-point presentations start, please fill in the Project Team Management form. At each project check-point, you will also fill in a peer-review rubric to help you evaluate and comment on machine learning applications in materials science.
Project work kick-off presentation (31 October)
Each group will give a 10 min slide presentation followed by 5 min of question time. It is desirable that all group members present during the course. You can divide the presentations between team members (e.g. 2 for kick-off, 2 for midterm and all members for the final presentation, or any reasonable combination). The presentation should include:
- title slide: presentation and group name, group members, course name and date
- introduction: brief description of the materials science question to be solved, project objectives, background of similar ML applications (if applicable).
- methodology: introduction to the dataset and brief outline of which machine learning method(s) you plan to apply
- implementation: work plan, distribution of work and timeline.
Project mid-point presentation (14 November)
Each group will give a 10 min slide presentation followed by 5 min of question time. Please indicate which group members worked on which part of the project work. The presentation should include:
- title slide: presentation and group name, group members, course name and date
- brief introduction and timeline of your project: how far have you progressed with your work?
- data analysis: What data analysis have you performed? What does your data look like?
- preliminary tests: What machine learning have you done already?
- next steps: Based on your data analysis and preliminary tests, how will you continue your project?
- all references should be fully cited and/or linked to the resource.
Project final presentation (28 November)
Each group will give a 10 min slide presentation followed by 5 min of question and discussion time. Please indicate which group members worked on which part of the project work. The presentation should include:
- title slide: presentation and group name, group members, course name and date
- brief recap of your project objectives: what did you set out to do?
- machine learning update: which machine learning methods have you applied since the mid-point check-point? How did you evaluate performance and how did the methods perform?
- final results and conclusions: did you answer the research question? What could be done in future work?
- all references should be fully cited and/or linked to the resource.
Textbooks
Good introductory books on machine learning are: An Introduction to Statistical Learning (with Applications in R), by G. James, D. Witten, T. Hastie, and R. Tibshirani; and Pattern Recognition and Machine Learning, by C. Bishop.
Data sources
Nature Scientific Data is a scientific journal that specialises in publishing data sets.
Zenodo is an open access data platform on which you can find many data sets.
The article Data-Driven Materials Science: Status, Challenges, and Perspectives reviews data infrastructures in materials science and contains a list of the infrastructures available as of mid-2019.
The Open Catalyst Project provides computational data for catalysts and machine learning models that operate on this data.
Collection of data resources in materials science.
List of databases in inorganic chemistry by Information Resources on Inorganic Chemistry.
Machine learning in polymer informatics (2021) lists data sources in polymer science
Recent advances and applications of deep learning methods in materials science (2022) reviews deep learning in materials science and provides suitable data sources
Repositories of machine learning models:
DLHub: Simplifying publication, discovery, and use of machine learning models in science describes the DLHub repository of machine learning models.
Review and overview articles:
The following articles are listed in roughly chronological order.
Tutorial article, "Machine learning for quantum mechanics in a nutshell", M. Rupp, 2015 (includes dataset)
Big data and deep data in scanning and electron microscopies: deriving functionality from multidimensional data sets, 2015 (review focussing on microscopy)
Machine learning: Trends, perspectives, and prospects, 2015 (early review in Science)
Machine learning in materials informatics: recent applications and prospects, 2017
Nature Physics Editorial, "Machine learning: New tool in the box", 2017 (fundamental materials science applications)
Recent advances and applications of machine learning in solid-state materials science, 2019
Artificial Intelligence to Power the Future of Materials Science and Engineering, 2020 (review that includes material design, performance prediction, and synthesis)
Perspective article on digitalization (2021): Digital Transformation in Materials Science: A Paradigm Change in Material's Development
Gaussian Process Regression for Materials and Molecules (2021) - clear review of the mathematical foundation of Gaussian process regression
The materials tetrahedron has a “digital twin”, 2022 (advocating for data science approach in materials science)
Perspective article on Machine Learning: A New Paradigm in Computational Electrocatalysis (2022)
Machine Learning for Electrocatalyst and Photocatalyst Design and Discovery review (2022)
Recent advances and applications of deep learning methods in materials science (2022) reviews deep learning in materials science and provides suitable data sources
Prof. Patrick Rinke (patrick.rinke@aalto.fi)
Dr. Joakim Löfgren (joakim.lofgren@aalto.fi)
Nitik Bhatia (nitik.bhatia@aalto.fi)