Convert "Regex" to typst

2025-01-22 21:19:29 -08:00
parent 5b245b9e16
commit 08eefd8d0b
2 changed files with 135 additions and 153 deletions
--- a/src/Warm-Ups/Regex/main.tex
+++ b/src/Warm-Ups/Regex/main.tex
@@ -1,153 +0,0 @@
 \documentclass[
 	solutions,
 	hidewarning,
 ]{../../../lib/tex/ormc_handout}
 \usepackage{../../../lib/tex/macros}
 \usepackage{xcolor}
 \usepackage{soul}
 \usepackage{hyperref}
 \definecolor{Light}{gray}{.90}
 \sethlcolor{Light}
 \newcommand{\htexttt}[1]{\texttt{\hl{#1}}}
 \title{The Regex Warm-Up}
 \uptitler{\smallurl{}}
 \subtitle{Prepared by Mark on \today}
 \begin{document}
 	\maketitle
 	Last time, we discussed Deterministic Finite Automata. One interesting application of these mathematical objects is found in computer science: Regular Expressions. \par
 	This is often abbreviated \say{regex}, which is pronounced like \say{gif.}
 	\vspace{2mm}
 	Regex is a language used to specify patterns in a string. You can think of it as a concise way to define a DFA, using text instead of a huge graph. \par
 	Often enough, a clever regex pattern can do the work of a few hundred lines of code.
 	\vspace{2mm}
 	Like the DFAs we've studied, a regex pattern \textit{accepts} or \textit{rejects} a string. However, we don't usually use this terminology with regex, and instead say that a string \textit{matches} or \textit{doesn't match} a pattern.
 	\vspace{5mm}
 	Regex strings consist of characters, quantifiers, sets, and groups.
 	\vspace{5mm}
 	\textbf{Quantifiers} \par
 	Quantifiers specify how many of a character to match. \par
 	There are four of these: \htexttt{+}, \htexttt{*}, \htexttt{?}, and \htexttt{\{ \}}
 	\vspace{2mm}
 	\htexttt{+} means \say{match one or more of the preceding token} \par
 	\htexttt{*} means \say{match zero or more of the preceding token}
 	For example, the pattern \htexttt{ca+t} will match the following strings:
 	\begin{itemize}
 		\item \texttt{cat}
 		\item \texttt{caat}
 		\item \texttt{caaaaaaaat}
 	\end{itemize}
 	\htexttt{ca+t} will \textbf{not} match the string \texttt{ct}. \par
 	The pattern \htexttt{ca*t} will match all the strings above, including \texttt{ct}.
 	\vspace{2mm}
 	\htexttt{?} means \say{match one or none of the preceding token} \par
 	The pattern \htexttt{linea?r} will match only \texttt{linear} and \texttt{liner}.
 	\vspace{2mm}
 	Braces \htexttt{\{min,max\}} are the most flexible quantifier. \par
 	They specify exactly how many tokens to match: \par
 	\htexttt{ab\{2\}a} will match only \texttt{abba}. \par
 	\htexttt{ab\{1,3\}a} will match only \texttt{aba}, \texttt{abba}, and \texttt{abbba}. \par
 	% spell:off
 	\htexttt{ab\{2,\}a} will match any \texttt{ab...ba} with at least two \texttt{b}s.
 	% spell:on
 	\vspace{5mm}
 	\problem{}
 	Write the patterns \htexttt{a*} and \htexttt{a+} using only \htexttt{\{ \}}.
 	\vfill
 	\problem{}
 	Draw a DFA equivalent to the regex pattern \htexttt{01*0}.
 	\vfill
 	\pagebreak
 	\textbf{Characters, Sets, and Groups} \par
 	In the previous section, we saw how we can specify characters literally: \par
 	\texttt{a+} means \say{one or more \texttt{a} characters}
 	\vspace{2mm}
 	There are, of course, other ways we can specify characters.
 	\vspace{2mm}
 	The first such way is the \textit{set}, denoted \htexttt{[ ]}. A set can pretend to be any character inside it. \par
 	For example, \htexttt{m[aoy]th} will match \texttt{math}, \texttt{moth}, or \texttt{myth}. \par
 	\htexttt{a[01]+b} will match \texttt{a0b}, \texttt{a111b}, \texttt{a1100110b}, and any other similar string. \par
 	You may negate a set with a \htexttt{\textasciicircum}. \par
 	\htexttt{[\textasciicircum abc]} will match any character except \texttt{a}, \texttt{b}, or \texttt{c}, including symbols and spaces.
 	\vspace{2mm}
 	If we want to keep characters together, we can use the \textit{group}, denoted \htexttt{( )}. \par
 	Groups work exactly as you'd expect, representing an atomic\footnotemark{} group of characters. \par
 	\htexttt{a(01)+b} will match \texttt{a01b} and \texttt{a010101b}, but will \textbf{not} match \texttt{a0b}, \texttt{a1b}, or \texttt{a1100110b}.
 	\footnotetext{In other words, \say{unbreakable}}
 	\problem{}<regex>
 	You are now familiar with most of the tools regex has to offer. \par
 	Write patterns that match the following strings:
 	\begin{enumerate}[itemsep=1mm]
 		\item An ISO-8601 date, like \texttt{2022-10-29}. \par
 		\hint{Invalid dates like \texttt{2022-13-29} should also be matched.}
 		\item An email address. \par
 		\hint{Don't forget about subdomains, like \texttt{math.ucla.edu}.}
 		\item A UCLA room number, like \texttt{MS 5118} or \texttt{Kinsey 1220B}.
 		\item Any ISBN-10 of the form \texttt{0-316-00395-7}. \par
 		\hint{Remember that the check digit may be an \texttt{X}. Dashes are optional.}
 		\item A word of even length. \par
 		\hint{The set \texttt{[A-Za-z]} contains every English letter, uppercase and lowercase. \\
 		\texttt{[a-z]} will only match lowercase letters.}
 		\item A word with exactly 3 vowels. \par
 		\hint{The special token \texttt{\textbackslash w} will match any word character. It is equivalent to \texttt{[A-Za-z0-9\_]}. \\ \texttt{\_} stands for a literal underscore.}
 		\item A word that has even length and exactly 3 vowels.
 		\item A sentence that does not start with a capital letter.
 	\end{enumerate}
 	\vfill
 	\problem{}
 	If you'd like to know more, check out \url{https://regexr.com}. It offers an interactive regex prompt, as well as a cheatsheet that explains every other regex token there is. \par
 	You will find a nice set of challenges at \url{https://alf.nu/RegexGolf}.
 	I especially encourage you to look into this if you are interested in computer science.
 \end{document}
--- a/src/Warm-Ups/Regex/main.typ
+++ b/src/Warm-Ups/Regex/main.typ
@@ -0,0 +1,135 @@
 #import "@local/handout:0.1.0": *
 #show: handout.with(
  title: [The Regex Warm-Up],
  by: "Mark",
 )
 Last time, we discussed Deterministic Finite Automata. One interesting application of these mathematical objects is found in computer science: Regular Expressions. \
 This is often abbreviated "regex," which is pronounced like "gif."
 #v(2mm)
 Regex is a language used to specify patterns in a string. You can think of it as a concise way to define a DFA, using text instead of a huge graph. \
 Often enough, a clever regex pattern can do the work of a few hundred lines of code.
 #v(2mm)
 Like the DFAs we've studied, a regex pattern _accepts_ or _rejects_ a string. However, we don't usually use this terminology with regex, and instead say that a string _matches_ or _doesn't match_ a pattern.
 #v(5mm)
 Regex strings consist of characters, quantifiers, sets, and groups.
 #v(5mm)
 *Quantifiers* \
 Quantifiers specify how many times the preceding token may repeat. \
 There are four of these: `+`, `*`, `?`, and `{ }`.
 #v(4mm)
 `+` means "match one or more of the preceding token" \
 `*` means "match zero or more of the preceding token" \
 For example, the pattern `ca+t` will match the following strings:
 - `cat`
 - `caat`
 - `caaaaaaaat`
 `ca+t` will *not* match the string `ct`. \
 The pattern `ca*t` will match all the strings above, including `ct`.
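 These claims are easy to check with a regex engine. The sketch below assumes Python's `re` module (this handout does not name an engine) and uses `re.fullmatch`, which requires the entire string to match, much like a DFA accepting or rejecting its whole input. The helper name `matches` is made up for illustration.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the whole string matches the pattern."""
    return re.fullmatch(pattern, text) is not None

strings = ["ct", "cat", "caat", "caaaaaaaat"]

# `+` needs at least one `a`; `*` also allows zero of them.
plus_results = {s: matches(r"ca+t", s) for s in strings}
star_results = {s: matches(r"ca*t", s) for s in strings}

for s in strings:
    print(f"{s}: ca+t={plus_results[s]}, ca*t={star_results[s]}")
```

 Only the `ct` row should differ between the two patterns.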
 #v(4mm)
 `?` means "match one or none of the preceding token" \
 The pattern `linea?r` will match only `linear` and `liner`.
 #v(4mm)
 Braces `{min,max}` are the most flexible quantifier. \
 They specify exactly how many tokens to match: \
 `ab{2}a` will match only `abba`. \
 `ab{1,3}a` will match only `aba`, `abba`, and `abbba`. \
 `ab{2,}a` will match any `ab...ba` with at least two `b`s. // spell:disable-line
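 The same kind of check works for `?` and the counted quantifiers. This is a minimal sketch, again assuming Python's `re` module; the helper `matches` and the candidate lists are made up for illustration.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the whole string matches the pattern."""
    return re.fullmatch(pattern, text) is not None

candidates = ["aa", "aba", "abba", "abbba", "abbbba"]

# Filter the candidates through each counted quantifier.
exactly_two = [s for s in candidates if matches(r"ab{2}a", s)]
one_to_three = [s for s in candidates if matches(r"ab{1,3}a", s)]
two_or_more = [s for s in candidates if matches(r"ab{2,}a", s)]

# `?` makes the preceding `a` optional, but never allows two of them.
optional_a = [s for s in ["liner", "linear", "lineaar"] if matches(r"linea?r", s)]

print(exactly_two, one_to_three, two_or_more, optional_a)
```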
 #problem()
 Write the patterns `a*` and `a+` using only `{ }`.
 #v(1fr)
 #problem()
 Draw a DFA equivalent to the regex pattern `01*0`.
 #v(1fr)
 #pagebreak()
 *Characters, Sets, and Groups* \
 In the previous section, we saw how we can specify characters literally: \
 `a+` means "one or more `a` characters" \
 There are, of course, other ways we can specify characters.
 #v(4mm)
 The first such way is the _set_, denoted `[ ]`. A set can pretend to be any character inside it. \
 For example, `m[aoy]th` will match `math`, `moth`, or `myth`. \
 `a[01]+b` will match `a0b`, `a111b`, `a1100110b`, and any other string made of an `a`, one or more `0`s and `1`s, and a final `b`.
 #v(4mm)
 We can negate a set with a `^`. \
 `[^abc]` will match any single character except `a`, `b`, or `c`, including symbols and spaces.
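 Sets and negated sets can be tried out the same way. The sketch below assumes Python's `re` module; the helper `matches` and the test strings are made up for illustration.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the whole string matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# A set stands in for exactly one of the characters listed inside it.
set_hits = [w for w in ["math", "moth", "myth", "meth"] if matches(r"m[aoy]th", w)]

# A quantifier after a set repeats the whole set, not one fixed character.
binary_hits = [s for s in ["ab", "a0b", "a111b", "a1100110b", "a012b"]
               if matches(r"a[01]+b", s)]

# A negated set matches any single character that is NOT listed.
negated_hits = [c for c in ["a", "b", "c", "d", "Z", "7", " ", "!"]
                if matches(r"[^abc]", c)]

print(set_hits, binary_hits, negated_hits)
```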
 #v(4mm)
 If we want to keep characters together, we can use the _group_, denoted `( )`. \
 Groups work exactly as you'd expect, representing an atomic#footnote([In other words, "unbreakable"]) group of characters. \
 `a(01)+b` will match `a01b` and `a010101b`, but will *not* match `a0b`, `a1b`, or `a1100110b`.
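 To see the difference between a group and a set, we can run the same strings through both patterns. This is a sketch assuming Python's `re` module; the helper `matches` is a made-up name.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True if the whole string matches the pattern."""
    return re.fullmatch(pattern, text) is not None

strings = ["ab", "a01b", "a010101b", "a0b", "a1b", "a1100110b"]

# (01)+ repeats the two-character unit "01" as a whole.
group_hits = [s for s in strings if matches(r"a(01)+b", s)]

# [01]+ repeats a single character that may be either 0 or 1.
set_hits = [s for s in strings if matches(r"a[01]+b", s)]

print(group_hits)
print(set_hits)
```

 Every string the group accepts is also accepted by the set, but not the other way around.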
 #problem()
 You are now familiar with most of the tools regex has to offer. \
 Write patterns that match the following strings:
 - An ISO 8601 date, like `2022-10-29`. \
  #hint([Invalid dates like `2022-13-29` should also be matched.])
 - An email address. \
  #hint([Don't forget about subdomains, like `math.ucla.edu`.])
 - A UCLA room number, like `MS 5118` or `Kinsey 1220B`.
 - Any ISBN-10 of the form `0-316-00395-7`. \
  #hint([Remember that the check digit may be an `X`. Dashes are optional.])
 - A word of even length. \
  #hint([
    The set `[A-Za-z]` contains every English letter, uppercase and lowercase. \
    `[a-z]` will only match lowercase letters.
  ])
 - A word with exactly 3 vowels. \
  #hint([
    The special token `\w` will match any word character. \
    It is equivalent to `[A-Za-z0-9_]`. `_` represents a literal underscore.
  ])
 - A word that has even length and exactly 3 vowels.
 - A sentence that does not start with a capital letter.
 #v(1fr)
 #problem()
 If you'd like to know more, check out `https://regexr.com`.
 It offers an interactive regex prompt,
 as well as a cheatsheet that explains every other regex token there is. \
 You can find a nice set of challenges at `https://alf.nu/RegexGolf`.