\section{Conclusion}
% What is the answer to the question?
Using the methods described above, we successfully trained a Q-learning agent to play \textit{Celeste Classic}.

\vspace{2mm}

The greatest limitation of our model is its slow training speed: it took the model 4000 episodes, or about 8 hours of training time, to complete the first stage. A simple evolutionary algorithm, such as the one presented in \textit{AI Learns to Speedrun Celeste} \cite{aispawn}, would likely outperform our Q-learning agent, since evolutionary search is far better suited to incremental tasks such as this one. One possible shape of such an algorithm is sketched below.
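This sketch is our own illustration of the general idea, not code from \cite{aispawn} or from our project; \texttt{run\_episode} and \texttt{mutate} are hypothetical stand-ins for emulator-specific helpers.

\begin{verbatim}
# Illustrative (1+1) hill climber over fixed input sequences.
# run_episode(seq): replays a list of button inputs in the emulator
#   and returns a score, e.g. how far the player progressed.
# mutate(seq): returns a copy of seq with part of its tail changed.
import random

ACTIONS = ["left", "right", "jump", "dash", "none"]  # illustrative

def evolve(run_episode, mutate, frames=600, generations=1000):
    best = [random.choice(ACTIONS) for _ in range(frames)]
    best_score = run_episode(best)
    for _ in range(generations):
        candidate = mutate(best)
        score = run_episode(candidate)
        if score > best_score:  # keep only strictly better sequences
            best, best_score = candidate, score
    return best
\end{verbatim}

Because each generation keeps the best input sequence found so far and only perturbs it, progress through a stage accumulates directly, which is what makes this style of search a good fit for incremental tasks.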
\vspace{2mm}

We could further develop this model by making it more autonomous, specifically by training it on raw pixel data rather than on curated \texttt{(player\_x, player\_y)} tuples. This modification would \textit{significantly} slow down training, and is therefore best left out of a project with a ten-week time limit.
\vspace{5mm}

While developing our model, we encountered a few questions that we could not resolve. The first of these is the effect of position scaling, which is visible in the graphs below. Note that the colors are inconsistent between the two graphs because we refactored our graphing tools after the right graph was generated.

\vspace{5mm}
\begin{minipage}{0.5\textwidth}
\begin{center}
\includegraphics[width=0.9\textwidth]{goodprediction}
\vspace{1mm}
\begin{minipage}{0.9\textwidth}
\raggedright
\say{Best-action} plot after 500 training episodes with position rescaled to the range $[0, 1]$.
\end{minipage}
\end{center}
\end{minipage}
\hfill
\begin{minipage}{0.5\textwidth}
\begin{center}
\includegraphics[width=0.9\textwidth]{badprediction}
\vspace{1mm}
\begin{minipage}{0.9\textwidth}
\raggedright
\say{Best-action} plot after 500 training episodes with position in the original range $[0, 128]$.
\end{minipage}
\end{center}
\end{minipage}
\vspace{5mm}

In these graphs, we see that, without any change to the model itself, the scaling of the input values has a \textit{significant} effect on its performance. Large inputs cause a \say{zoomed-out linear fanning} effect in the right graph, while the left graph, with rescaled values, shows a much more reasonable \say{blob} pattern. The rescaling itself is trivial, as the sketch below shows.
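The following is an illustrative version of that preprocessing step, not our exact implementation; \texttt{SCREEN\_SIZE} matches the $[0, 128]$ pixel range quoted above.

\begin{verbatim}
# Illustrative position rescaling (not our exact code).
# Raw positions lie in [0, 128]; dividing by SCREEN_SIZE maps them
# into [0, 1], the range used for the left graph.
SCREEN_SIZE = 128.0

def make_state(player_x, player_y, rescale=True):
    if rescale:
        return (player_x / SCREEN_SIZE, player_y / SCREEN_SIZE)
    return (float(player_x), float(player_y))
\end{verbatim}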
\vspace{2mm}

In addition, we found that re-centering the game's coordinate system, so that \texttt{(0, 0)} lies at the center of the screen rather than at the top-left corner, also has a significant effect on the model's performance. Without centering, the model performs perfectly well; with centering, the loss grows uncontrollably and the model fails to converge.
\vspace{5mm}

In both of these cases, the results are surprising. In theory, rescaling or re-centering the data should not affect the performance of the model: both are affine transformations of the input, which the weights and biases of the network's first layer should absorb during training, as the identity below shows. We do not have an explanation for this behavior, and would be glad to find one.
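Concretely, suppose the input is transformed affinely as $x' = (x - c)/s$ for some shift $c$ and scale $s > 0$ (centering uses $c \neq 0$, rescaling uses $s \neq 1$). A first layer that originally computes $Wx + b$ reproduces its output on the transformed input by taking
\begin{equation*}
W' = sW, \qquad b' = b + Wc,
\qquad\text{so that}\qquad
W'x' + b' = W(x - c) + b + Wc = Wx + b.
\end{equation*}
Since gradient descent is free to find these adjusted weights, neither transformation should change what the network can ultimately learn.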
\vfill
\pagebreak