Sajil C. K.

I am an inquisitive person with interests falling at the intersection of mathematics, programming, and research. I like to understand concepts to the core by building things from scratch.

Iterative Policy Evaluation

Iterative Policy Evaluation for Estimating Value Function

Introduction: In this tutorial, I am going to code the iterative policy evaluation algorithm from the book “Reinforcement Learning: An Introduction” by Richard S. Sutton and Andrew G. Barto. I am going to take the pseudocode, images, and examples from this text. The example I am taking for this tutorial is the grid world maze from Chapter …
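As a rough illustration of what iterative policy evaluation looks like in code, here is a minimal Python sketch. It makes several assumptions that are not taken from the post: a hypothetical 4x4 grid with terminal states in two opposite corners, a reward of -1 per move, an equiprobable random policy, and in-place value updates. The function name and parameters (`size`, `gamma`, `theta`) are illustrative only, and the grid used in the full tutorial may differ.

```python
def iterative_policy_evaluation(size=4, gamma=1.0, theta=1e-6):
    """Estimate state values for an equiprobable random policy on a grid world.

    Assumptions (hypothetical, for illustration): terminal states sit in the
    top-left and bottom-right corners, and every move yields a reward of -1.
    """
    terminals = {(0, 0), (size - 1, size - 1)}
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
    V = {(r, c): 0.0 for r in range(size) for c in range(size)}

    while True:
        delta = 0.0  # largest value change seen in this sweep
        for state in V:
            if state in terminals:
                continue  # terminal values stay at 0
            new_value = 0.0
            for dr, dc in moves:
                nr, nc = state[0] + dr, state[1] + dc
                if not (0 <= nr < size and 0 <= nc < size):
                    nr, nc = state  # moving off the grid leaves the state unchanged
                # Each of the four actions is taken with probability 0.25.
                new_value += 0.25 * (-1.0 + gamma * V[(nr, nc)])
            delta = max(delta, abs(new_value - V[state]))
            V[state] = new_value  # in-place update
        if delta < theta:
            return V


values = iterative_policy_evaluation()
for r in range(4):
    print([round(values[(r, c)], 1) for c in range(4)])
```

The loop sweeps over all states repeatedly and stops once the largest change in a sweep falls below `theta`, which is the usual stopping rule for this kind of evaluation.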


discount factor dynamics

Discount Factor in Reinforcement Learning

This article presents two key visual intuitions behind the use of a discount factor in reinforcement learning, with images, code, and video. Introduction: Most of the advances in science and technology happened in the last 100 years. We can see mind-boggling progress in the automotive industry, medicine, communication, energy, etc. Among these advances, some technologies shake …


Automated PDF creation with Python and LaTeX

LaTeX Tutorial

This tutorial covers the typesetting language LaTeX. LaTeX is useful for creating complex documents such as theses, project reports, and research articles. We will see the basic use cases in around 22 examples. LaTeX Installation Instructions (Windows): On Windows, you need to install two programs called MiKTeX and TeXstudio. Please be careful to install …


epsilon greedy algorithm rewards

Epsilon Greedy Algorithm in Bandit Problems

Introduction: Bandit problems are the simplest possible reinforcement learning scenario. Here the bandit machine has k arms, and pulling an arm gives the user a reward. One of the arms gives higher rewards in the long run, and this pattern may change over time. Think of the scenario …
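To make the k-armed setup concrete, here is a minimal Python sketch of an epsilon-greedy agent. It is only an illustration under stated assumptions, not the code from the full post: rewards are drawn from Gaussians with fixed (stationary) means and unit variance, value estimates are sample averages, and the names `epsilon_greedy_bandit` and `true_means` are hypothetical. The changing-reward case mentioned above is not handled here.

```python
import random


def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=1000, seed=0):
    """Run epsilon-greedy on a k-armed bandit and return (estimates, counts).

    Assumption (hypothetical): pulling arm a yields a reward drawn from a
    Gaussian with mean true_means[a] and standard deviation 1.
    """
    rng = random.Random(seed)
    k = len(true_means)
    estimates = [0.0] * k  # running average reward per arm
    counts = [0] * k       # number of pulls per arm

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: pick any arm uniformly
        else:
            best = max(estimates)   # exploit: pick a best arm, ties broken randomly
            arm = rng.choice([a for a in range(k) if estimates[a] == best])
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental sample-average update of the chosen arm's estimate.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts


estimates, counts = epsilon_greedy_bandit([0.0, 0.5, 2.0], steps=5000)
print("pulls per arm:", counts)
```

With a small `epsilon`, most pulls go to the arm whose estimate is currently highest, while the occasional random pull keeps the other estimates from going stale.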
