\section{Results}
% The results of applying the methods to the data set.
% Also discuss why the results make sense and possible implications.
After sufficient training, our model consistently completed the first stage of \textit{Celeste}; reaching this point required 4000 training episodes.
\vspace{2mm}
The figure below summarizes our model's performance during training. The color of each pixel in the plot is determined by the action with the highest predicted value, and the path the agent takes through the stage is shown in white. The agent completes the stage in the \say{4000 Episodes} plot, and fails to complete it within the allocated time limit in all the rest. Training the model on more than 4000 episodes did not have a significant effect on the agent's behavior.
\begin{center}
\includegraphics[width=\textwidth]{plots}
\end{center}
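For reference, the sketch below shows one way such a best-action map can be produced, assuming a trained network \texttt{model} that maps a normalized $(x, y)$ position to a vector of per-action values; the names, coordinate handling, and resolution are illustrative and not the project's actual interface.
\begin{verbatim}
import numpy as np
import torch

# Illustrative sketch: color each pixel by the index of the action with the
# highest predicted value at that position. `model` is assumed to map a
# normalized (x, y) state to a vector of per-action values.
def best_action_map(model, width, height):
    actions = np.zeros((height, width), dtype=int)
    with torch.no_grad():
        for y in range(height):
            for x in range(width):
                state = torch.tensor([x / width, y / height],
                                     dtype=torch.float32)
                values = model(state)        # predicted value of each action
                actions[y, x] = int(values.argmax())
    return actions
\end{verbatim}
The resulting integer grid can be rendered with a categorical colormap (for example \texttt{matplotlib}'s \texttt{imshow}), with the agent's recorded path drawn on top in white, to produce plots like the ones above.
\vspace{2mm}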
A few things about these results are interesting. First, the best-action patterns in the above graphs do not resemble the shape of the stage. At points the agent does not visit, the predicted best action does not resemble the action an intelligent human player would take. This is because the model is never trained on these points; the predictions there are a side effect of the training steps applied to the points along the agent's path.
\vspace{2mm}
Second, the plots above clearly depict the effect of our modified explore/exploit policy. We can see that the first few segments of the agent's path are identical in each graph, and that this repeated portion grows longer the more the agent trains. This is a direct result of our explore/exploit policy: the agent stops exploring sections of the stage it can reliably complete, and therefore repeats paths that work.
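For concreteness, one way to realize such a policy is to disable exploration in segments of the stage the agent has already completed reliably and fall back to greedy actions there. The sketch below illustrates the idea; the class name, success-rate threshold, and bookkeeping are our own illustration rather than the exact implementation used in training.
\begin{verbatim}
import random

# Illustrative sketch: epsilon-greedy exploration, except in segments the
# agent has already completed reliably, where only the greedy action is used.
class SegmentedEpsilonGreedy:
    def __init__(self, epsilon=0.1, reliability_threshold=0.9):
        self.epsilon = epsilon
        self.threshold = reliability_threshold
        self.attempts = {}    # segment id -> total attempts
        self.successes = {}   # segment id -> completed attempts

    def record(self, segment, completed):
        self.attempts[segment] = self.attempts.get(segment, 0) + 1
        self.successes[segment] = (self.successes.get(segment, 0)
                                   + int(completed))

    def choose(self, segment, greedy_action, n_actions):
        attempts = self.attempts.get(segment, 0)
        rate = self.successes.get(segment, 0) / attempts if attempts else 0.0
        # Exploit reliably completed segments; elsewhere explore with
        # probability epsilon.
        if rate >= self.threshold or random.random() >= self.epsilon:
            return greedy_action
        return random.randrange(n_actions)
\end{verbatim}
Because exploration is suppressed exactly where the agent already succeeds, the early portion of each path is replayed unchanged, which is the repetition visible in the plots above.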
\vfill
\pagebreak