{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "### MS-E2191 - Graduate Seminar on Operations Research\n", "#### Value iteration for solving Markov Decision Processes\n", "## 1 Craft beer company\n", "Using the provided documentation and presentation material, fill in the 4 missing parts and solve the example from the presentation using Value Iteration. Use a discount factor of 0.9.\n", "\n", "We use MDPtoolbox, a package available for Python, R and Matlab. Its documentation can be found at https://cran.r-project.org/web/packages/MDPtoolbox/MDPtoolbox.pdf\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Preparations" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "STUDENT_NAME = ## FILL YOUR NAME ##" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Run this cell to install and load the package\n", "install.packages(\"MDPtoolbox\")\n", "library(MDPtoolbox)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Solving the problem\n", "See the documentation for examples" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Transition probabilities: P[,,a] is the 2x2 transition matrix for action a\n", "P <- array(0, c(2,2,2))\n", "P[,,1] <- matrix(c( 0.5, ## FILL PROBABILITIES ## ), 2, 2, byrow=TRUE) # for action 1\n", "P[,,2] <- matrix(c( ## FILL PROBABILITIES ## ), 2, 2, byrow=TRUE) # for action 2\n", "\n", "# Rewards\n", "R <- ## FILL REWARDS (there are two ways to do this) ##\n", "\n", "# Solve using Value Iteration\n", "## SOLVE THE PROBLEM USING CORRECT FUNCTION (see documentation) ##" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2 Forest management\n", "\n", "Let's consider a forest management problem, where we can either keep the forest as it is or cut it down. Every three years, we choose either the Keep or the Cut action. 
Each time, we get 1 unit of money from cutting part of the forest and 0 from leaving it as it is. In the final year, we get 3 units of money either by cutting the rest of the forest or by conserving it.\n", "\n", "We start growing the forest after a wildfire. It may burn again with probability p.\n", "\n", "#### Task\n", "Solve the problem using value iteration, as in the first exercise.\n", "\n", "#### A\n", "Start with a discount factor of 0.9. How do different values of the discount factor change the policy? How do they affect the convergence of Value Iteration?\n", "\n", "#### B\n", "With a discount factor of 0.9, the probability of wildfire increases to 0.3. How does the optimal policy change?\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Values" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "S <- 10 # number of states\n", "r1 <- 3 # reward for conserving the forest in its oldest state\n", "r2 <- 3 # reward for cutting the rest of the forest\n", "p <- 0.05 # wildfire probability\n", "\n", "values <- mdp_example_forest(S, r1, r2, p) # generates P and R for this example\n", "P <- values$P # transition probabilities\n", "R <- values$R # rewards" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "R" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Solving the problem" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "## SOLVE THE PROBLEM USING CORRECT FUNCTION (see documentation) ##" ] } ], "metadata": { "kernelspec": { "display_name": "R", "language": "R", "name": "ir" }, "language_info": { "codemirror_mode": "r", "file_extension": ".r", "mimetype": "text/x-r-source", "name": "R", "pygments_lexer": "r", "version": "3.6.3" } }, "nbformat": 4, "nbformat_minor": 4 }