Topic outline

  • Description - Introduction to machine learning in materials science

    Machine learning (ML) techniques enable us to infer relationships from large amounts of seemingly uncorrelated input data. Their predictive power has made them central to product development in IT, and we already use them in daily life (Amazon, Netflix, etc.). The physical sciences have been slow to capitalize on the promise of ML, even though ML methods are computationally well suited to modern simulation techniques. Materials science has recently benefited from a number of ML applications to materials discovery and design (featuring neural networks, genetic algorithms, regression methods, compressed sensing and Bayesian optimisation), which promise to accelerate the development of novel technologies. Machine learning for materials science is an exciting new discipline that is now being taught at Aalto University.

    "Introduction to Machine learning in materials science" is a project-led lecture course for graduate students who wish to acquire key skills in this cross-disciplinary research field. Introductory lectures on materials science and machine learning are followed by tutorial exercises. The course introduces different machine learning methods and provides examples of their application in materials science, and the tutorials provide hands-on experience with each method. In the follow-up course, Project in Machine Learning for Materials Science, you can apply your newly acquired knowledge to your own data.

    Course level

    The course is intended for students who have completed their Bachelor's degree, have a basic understanding of machine learning or materials science, and have a keen interest in interdisciplinary science. Some programming experience or Python knowledge is required to take the course.


    3 ECTS credits are awarded for the course.


    The course grade is pass/fail. The passing criterion is to attend at least 5 of the 6 tutorial sessions.

    Course structure and workload 

    The course is taught in Period 1 and consists of:

    • 6 × 2 h lectures on machine learning in materials science
    • 6 × 2 h hands-on tutorial sessions

    There is no homework for the course and no final exam.

    Learning outcomes

    After completion of the course you:

    • have learned about the importance of machine learning in materials science.
    • have gained an overview of different machine learning methods.
    • have hands-on experience with Python notebooks.
    • have used different machine learning methods in Python.
    • can approach a range of different problems with suitable machine learning methods.
    • can follow a presentation (e.g. conference or seminar) on machine learning in materials science.


    Course dates


  • In machine learning, we write programs that the machine executes to process data and to learn. Understanding machine learning therefore also means understanding how these instructions to the machine are composed. Python has become the standard programming language for machine learning, and we use it in this course for the tutorials. The first tutorial provides a gentle introduction to Python, but it is advantageous to have some programming experience (not necessarily in Python) before taking the course. We have devised a short pre-assessment notebook so that you can test whether your programming knowledge is sufficient. Please go through this pre-assessment before you decide to sign up for the course.


    Here is the link to a short Google Colab notebook that we have designed for you to test your Python knowledge. If you can easily complete the tasks, your knowledge is sufficient for the course. If you know how to complete the tasks in a different language (e.g. C, Fortran, Scala, Matlab) but are unsure how to do them in Python, you can brush up on your Python before the course (see below) and sign up. If you have no programming experience at all, we advise you to first acquire basic Python skills and take this course next year instead.

    Python and machine learning resources

    The University of Helsinki has developed the free online course Elements of AI. This is an excellent resource for familiarising yourself with machine learning and its practical aspects, and it can be taken at your own pace. It is not a prerequisite for this course, but its second part, "Building AI", might be useful if you are unsure about your Python knowledge.

    CSC - IT Center for Science provides a Beginner Python course (about 10 h to complete), which is also available as Jupyter notebooks. Many other Python learning resources are available online, and we encourage you to explore the options that work best for you.



    Good introductory books on machine learning are: An Introduction to Statistical Learning (with Applications in R) by G. James, D. Witten, T. Hastie, and R. Tibshirani; and Pattern Recognition and Machine Learning by C. Bishop.

    Data sources

    Nature Scientific Data is a scientific journal that specialises in publishing data sets.

    Zenodo is an open access data platform on which you can find many data sets. 

    The article Data-Driven Materials Science: Status, Challenges, and Perspectives reviews data infrastructures in materials science and lists the infrastructures available as of mid-2019.

    The Open Catalyst Project provides computational data for catalysts and machine learning models that operate on this data.

    Collection of data resources in materials science.

    List of databases in inorganic chemistry by Information Resources on Inorganic Chemistry.

    Machine learning in polymer informatics (2021) lists data sources in polymer science.

    Recent advances and applications of deep learning methods in materials science (2022) reviews deep learning in materials science and provides suitable data sources.

    Repositories of machine learning models:

    DLHub: Simplifying publication, discovery, and use of machine learning models in science describes the DLHub repository of machine learning models.

    Review and overview articles: 

    The following articles are listed in roughly chronological order.

    Tutorial article, "Machine learning for quantum mechanics in a nutshell", M. Rupp, 2015 (includes dataset)

    Big data and deep data in scanning and electron microscopies: deriving functionality from multidimensional data sets, 2015 (review focussing on microscopy)

    Machine learning: Trends, perspectives, and prospects, 2015 (early review in Science)

    Machine learning in materials informatics: recent applications and prospects, 2017

    Nature Physics Editorial, "Machine learning: New tool in the box", 2017 (fundamental materials science applications)

    Recent advances and applications of machine learning in solid-state materials science, 2019

    Artificial Intelligence to Power the Future of Materials Science and Engineering, 2020 (review that includes material design, performance prediction, and synthesis)

    Perspective article on digitalization (2021): Digital Transformation in Materials Science: A Paradigm Change in Material's Development

    Gaussian Process Regression for Materials and Molecules (2021) - clear review of the mathematical foundation of Gaussian process regression

    Toward autonomous design and synthesis of novel inorganic materials (2021)

    The materials tetrahedron has a “digital twin”, 2022 (advocating a data science approach in materials science)

    Perspective article on Machine Learning: A New Paradigm in Computational Electrocatalysis (2022)

    Machine Learning for Electrocatalyst and Photocatalyst Design and Discovery review (2022)

    Recent advances and applications of deep learning methods in materials science (2022) reviews deep learning in materials science and provides suitable data sources


    In this course we use Google Colab notebooks for the tutorials. On Tuesdays we post the link to each tutorial notebook here, in the corresponding folder, and on Tuesday evening we post the solution notebook.

    ❗ NOTE ❗ 

    1. You might see a warning saying that the notebook was not authored by Google. You can safely ignore this warning.
    2. Please save the notebook to your Google Drive to ensure that your work is kept. To save the notebook, click File -> Save to Drive.

    Some useful resources:

    🔥 Colab introduction: If you are not familiar with Colab, you can find a quick introduction in this video (approximately the first 10 minutes). It shows you how to run code in Colab and write text in text cells.

    📚 Here is also a link to the Colab documentation, which covers similar content to the video above.

    🤔 Colab is like a Jupyter notebook, but runs on Google's servers. A more in-depth introduction to Jupyter notebooks can be found in the following video.


    Prof. Patrick Rinke (
    Dr. Armi Tiihonen (
    Dr. Matthias Stosiek (