Topic outline

  • There are two groups in the Student Project: (1) project submission deadline 17.5, peer-grading deadline 31.5, and (2) project submission deadline 31.5, peer-grading deadline 14.6. The first group is meant for students who learn at a faster pace and wish to complete the course and get a grade before summer. The second group is for Python beginners who might need more time to complete the project.

    In the student project, your task is to utilize machine learning methods to solve a problem of your choice. 

    In order to participate in the project, you must submit a project report on MyCourses. The report is submitted as a Python notebook (.ipynb format) on the MyCourses page and should follow the required outline presented below. You can fetch the project template "R7_StudentProject", which contains instructions and tips, in Jupyter Hub (there is no need to submit the project in JHub). There is also an example of what a student project could look like.

    The submitted report should contain the Python code used in the project (early prototyping and "scrapbooking" can be excluded). If your code includes large class or function implementations, these can be written in separate .py files. The notebook should be arranged so that the reader can replicate your workflow by running the cells in the notebook in order (see example). If you need to include a data file, please put the notebook and the data file in one folder and upload the folder here as one zip file. Note that the upload file size limit is 400 MB.
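
    The packaging step described above can be sketched in Python as follows. This is only a minimal illustration using the standard library: the function names, the folder name in the usage note, and the reading of "400 MB" as 400 * 1024 * 1024 bytes are assumptions, not course requirements.

```python
import zipfile
from pathlib import Path

# Assumption: the 400 MB upload limit is interpreted as binary megabytes.
MAX_UPLOAD_BYTES = 400 * 1024 * 1024


def zip_project_folder(folder, archive_path):
    """Pack every file under `folder` into one zip archive; return the archive size in bytes."""
    folder = Path(folder).resolve()
    archive_path = Path(archive_path).resolve()
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(folder.rglob("*")):
            # Skip directories, and skip the archive itself if it lives inside the folder.
            if path.is_file() and path != archive_path:
                # Store paths relative to the parent so the zip contains the folder itself.
                archive.write(path, arcname=path.relative_to(folder.parent).as_posix())
    return archive_path.stat().st_size


def fits_upload_limit(size_bytes, limit_bytes=MAX_UPLOAD_BYTES):
    """Return True if the archive is small enough to upload."""
    return size_bytes <= limit_bytes
```

    For example, with a hypothetical folder named my_project that holds the notebook and the data file, calling zip_project_folder("my_project", "my_project.zip") and passing the returned size to fits_upload_limit tells you whether the archive can be uploaded.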

    In addition to submitting the project report, you will be required to grade 3 reports by other students after the deadline for project submission (see criteria below). The final student project grade is the average of the points given by the peer reviewers.

    Below is the rough outline that is required for the project report. Note that the contents listed under the sections are not a comprehensive list of requirements, but rather a brief description of the purpose of each section.

    Required outline of the project report

    1. Introduction: Explain the application domain, which might be a particular research question, a study assignment, a work-related aspect, or just some aspect of everyday life (e.g. predicting the waiting time at the bus stop).
    2. Problem Formulation: Formulate the application as an ML problem by explaining what data points are, what features and labels characterize data points, and what metric is used to assess the performance of the models. 
    3. Method: Explain how you applied ML methods to solve the problem. How did you obtain the data (from wikidata.org or from your own data files)? How did you split the data into training and validation sets? How did you learn the predictor (which Python library did you use)?
    4. Results: Discuss the results obtained from the methods. What is the training/validation error? How do the results depend on the hyper-parameters of the methods? 
    5. Conclusion: Summarize the main findings of the project work and outline avenues for future work. Do the results suggest that the problem is solved satisfactorily, or might there be room for improvement?
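
    To make the Method and Results items more concrete, here is a minimal sketch of the workflow they describe: split the data into training and validation sets, learn a predictor, and report the training and validation errors. Everything specific in it is an assumption made for illustration: the data are synthetic (a made-up bus waiting time example), the model is a simple least-squares line, the metric is mean squared error, and all function names are hypothetical. It uses only the standard library; in an actual project you would typically use an ML library of your choice instead.

```python
import random


def split_train_validation(data_points, validation_fraction=0.2, seed=0):
    """Shuffle the data points and split them into training and validation sets."""
    shuffled = list(data_points)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def fit_line(points):
    """Least-squares fit of label = w * feature + b for (feature, label) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b


def mean_squared_error(points, w, b):
    """Average squared difference between the labels and the predictions w * x + b."""
    return sum((y - (w * x + b)) ** 2 for x, y in points) / len(points)


# Synthetic data, made up for illustration only.
# Feature: minutes since the last bus left. Label: waiting time in minutes, with noise.
rng = random.Random(42)
data = []
for _ in range(200):
    x = rng.uniform(0, 15)
    data.append((x, 15 - x + rng.gauss(0, 1)))

train, val = split_train_validation(data)
w, b = fit_line(train)
train_error = mean_squared_error(train, w, b)
val_error = mean_squared_error(val, w, b)
print(f"training MSE: {train_error:.3f}, validation MSE: {val_error:.3f}")
```

    Comparing the two printed errors is the kind of evidence the Results section asks for: a validation error much larger than the training error would suggest overfitting, while two similar values suggest the model generalizes to unseen data points.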