Mark 2023-06-27 21:23:37 -07:00
parent a87e4417ce
commit 3bc44ed867
Signed by: Mark
GPG Key ID: AD62BB059C2AAEE4


@ -6,6 +6,8 @@
\usepackage{xcolor} \usepackage{xcolor}
\usepackage{soul} \usepackage{soul}
\usepackage{hyperref}
\usepackage[T1]{fontenc} % Fixes texttt braces
\definecolor{Light}{gray}{.90} \definecolor{Light}{gray}{.90}
\sethlcolor{Light} \sethlcolor{Light}
@ -20,30 +22,33 @@
\maketitle \maketitle
Last time, we discussed Deterministic Finite Automata. One interesting application of these mathematical objects is found in computer science: Regular Expressions. \\ Last time, we discussed Deterministic Finite Automata. One interesting application of these mathematical objects is found in computer science: Regular Expressions. \par
(abbreviated \say{regex}, which is pronounced like \say{gif}) This is often abbreviated \say{regex}, which is pronounced like \say{gif.}
\vspace{2mm} \vspace{2mm}
Regex is a language used to specify patterns in a string. You can think of it as a concise way to define a DFA, using text instead of a huge graph. \\ Regex is a language used to specify patterns in a string. You can think of it as a concise way to define a DFA, using text instead of a huge graph. \par
Often enough, a clever regex pattern can do the work of a few hundred lines of code. \\ Often enough, a clever regex pattern can do the work of a few hundred lines of code.
\vspace{2mm} \vspace{2mm}
Like the DFAs we have studied, a regex pattern \textit{accepts} or \textit{rejects} a string. However, we don't usually use this terminology when discussing regex, instead opting to say a pattern \textit{matches} or \textit{doesn't match} a string. \\ Like the DFAs we've studied, a regex pattern \textit{accepts} or \textit{rejects} a string. However, we don't usually use this terminology with regex, and instead say that a string \textit{matches} or \textit{doesn't match} a pattern.
\vspace{5mm} \vspace{5mm}
\textbf{Quantifiers} \\ Regex strings consist of characters, quantifiers, sets, and groups.
Quantifiers tell us how many of a character to match. \\
There are four of them: \vspace{5mm}
\htexttt{+}, \htexttt{*}, \htexttt{?}, and \htexttt{\{ \}}
\textbf{Quantifiers} \par
Quantifiers specify how many of a character to match. \par
There are four of these: \htexttt{+}, \htexttt{*}, \htexttt{?}, and \htexttt{\{ \}}
\vspace{2mm} \vspace{2mm}
\htexttt{+} means \say{match one or more of the preceding token} \\ \htexttt{+} means \say{match one or more of the preceding token} \par
\htexttt{*} means \say{match zero or more of the preceding token} \\ \htexttt{*} means \say{match zero or more of the preceding token}
For example, the pattern \htexttt{ca+t} will match the following strings: For example, the pattern \htexttt{ca+t} will match the following strings:
\begin{itemize} \begin{itemize}
@ -51,19 +56,19 @@
\item \texttt{caat} \item \texttt{caat}
\item \texttt{caaaaaaaat} \item \texttt{caaaaaaaat}
\end{itemize} \end{itemize}
\htexttt{ca+t} will \textbf{not} match the string \texttt{ct}. \\ \htexttt{ca+t} will \textbf{not} match the string \texttt{ct}. \par
The pattern \htexttt{ca*t} will match all the strings above, including \texttt{ct}. The pattern \htexttt{ca*t} will match all the strings above, including \texttt{ct}.
\vspace{2mm} \vspace{2mm}
\htexttt{?} means \say{match one or none of the preceding token} \\ \htexttt{?} means \say{match one or none of the preceding token} \par
The pattern \htexttt{linea?r} will match only \texttt{linear} and \texttt{liner}. \\ The pattern \htexttt{linea?r} will match only \texttt{linear} and \texttt{liner}.
\vspace{2mm} \vspace{2mm}
Brackets \htexttt{\{min, max\}} are the most flexible quantifier. \\ Brackets \htexttt{\{min, max\}} are the most flexible quantifier. \par
They specify exactly how many tokens to match: \\ They specify exactly how many tokens to match: \par
\htexttt{ab\{2\}a} will match only \texttt{abba}. \\ \htexttt{ab\{2\}a} will match only \texttt{abba}. \par
\htexttt{ab\{1,3\}a} will match only \texttt{aba}, \texttt{abba}, and \texttt{abbba}. \\ \htexttt{ab\{1,3\}a} will match only \texttt{aba}, \texttt{abba}, and \texttt{abbba}. \par
\htexttt{ab\{2,\}a} will match any \texttt{ab...ba} with at least two \texttt{b}s. \htexttt{ab\{2,\}a} will match any \texttt{ab...ba} with at least two \texttt{b}s.
\vspace{5mm} \vspace{5mm}
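The quantifier examples in this hunk can be checked mechanically. Below is a minimal sketch, assuming Python's standard `re` module as the regex engine (the handout does not name one). The helper `matches` is a hypothetical name introduced here; it uses `re.fullmatch` so that the whole string must match, mirroring how a DFA accepts or rejects its entire input.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True when the entire string matches the pattern."""
    return re.fullmatch(pattern, text) is not None


# + : one or more of the preceding token
assert matches(r"ca+t", "cat")
assert matches(r"ca+t", "caaaaaaaat")
assert not matches(r"ca+t", "ct")

# * : zero or more of the preceding token
assert matches(r"ca*t", "ct")
assert matches(r"ca*t", "caat")

# ? : zero or one of the preceding token
assert matches(r"linea?r", "linear")
assert matches(r"linea?r", "liner")
assert not matches(r"linea?r", "lineaar")

# {n}, {min,max}, {min,} : explicit repetition counts
assert matches(r"ab{2}a", "abba")
assert not matches(r"ab{2}a", "aba")
candidates = ["aa", "aba", "abba", "abbba", "abbbba"]
assert [s for s in candidates if matches(r"ab{1,3}a", s)] == ["aba", "abba", "abbba"]
assert matches(r"ab{2,}a", "abbbbbba")
assert not matches(r"ab{2,}a", "aba")
```

Note that `re.search` would instead report a match anywhere inside a longer string, which is a different question from the accept/reject framing used in the text.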
@ -83,52 +88,52 @@
\textbf{Characters, Sets, and Groups} \\ \textbf{Characters, Sets, and Groups} \par
We specify characters literally, as shown above: \\ In the previous section, we saw how we can specify characters literally: \par
\texttt{a+} means \say{one or more \texttt{a} characters} \\ \texttt{a+} means \say{one or more \texttt{a} characters}
\vspace{2mm} \vspace{2mm}
There are, however, other ways we can specify characters. \\ There are, of course, other ways we can specify characters.
\vspace{2mm} \vspace{2mm}
The first such way is the \textit{set}, denoted \htexttt{[ ]}. A set can pretend to be any character inside it. \\ The first such way is the \textit{set}, denoted \htexttt{[ ]}. A set can pretend to be any character inside it. \par
For example, \htexttt{m[aoy]th} will match \texttt{math}, \texttt{moth}, or \texttt{myth}. \\ For example, \htexttt{m[aoy]th} will match \texttt{math}, \texttt{moth}, or \texttt{myth}. \par
\htexttt{a[01]+b} will match \texttt{a0b}, \texttt{a111b}, \texttt{a1100110b}, and any other similar string. \\ \htexttt{a[01]+b} will match \texttt{a0b}, \texttt{a111b}, \texttt{a1100110b}, and any other similar string. \par
You may negate a set with a \htexttt{\textasciicircum}. \\ You may negate a set with a \htexttt{\textasciicircum}. \par
\htexttt{[\textasciicircum abc]} will match any character except \texttt{a}, \texttt{b}, or \texttt{c}, including symbols and spaces. \htexttt{[\textasciicircum abc]} will match any character except \texttt{a}, \texttt{b}, or \texttt{c}, including symbols and spaces.
\vspace{2mm} \vspace{2mm}
If we want to keep characters together, we can use the \textit{group}, denoted \htexttt{( )}. \\ If we want to keep characters together, we can use the \textit{group}, denoted \htexttt{( )}. \par
Groups work exactly as you'd expect, representing an atomic\footnotemark{} group of characters. \\ Groups work exactly as you'd expect, representing an atomic\footnotemark{} group of characters. \par
\htexttt{a(01)+b} will match \texttt{a01b} and \texttt{a010101b}, but will \textbf{not} match \texttt{a0b}, \texttt{a1b}, or \texttt{a1100110b}. \\ \htexttt{a(01)+b} will match \texttt{a01b} and \texttt{a010101b}, but will \textbf{not} match \texttt{a0b}, \texttt{a1b}, or \texttt{a1100110b}.
\footnotetext{In other words, \say{unbreakable}} \footnotetext{In other words, \say{unbreakable}}
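The set and group examples above can be verified the same way. This is a sketch under the assumption that Python's `re` module is the engine. `matches` is a hypothetical helper defined here, not something taken from the handout.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True when the entire string matches the pattern."""
    return re.fullmatch(pattern, text) is not None


# A set stands in for exactly one of the characters it lists.
assert all(matches(r"m[aoy]th", w) for w in ["math", "moth", "myth"])
assert not matches(r"m[aoy]th", "meth")

# A quantifier after a set repeats the whole set, so 0s and 1s may mix freely.
assert all(matches(r"a[01]+b", s) for s in ["a0b", "a111b", "a1100110b"])

# A negated set matches any single character that is NOT listed.
assert matches(r"[^abc]", " ")
assert matches(r"[^abc]", "#")
assert not matches(r"[^abc]", "a")

# A group is repeated as a unit: only whole copies of "01" are allowed.
assert matches(r"a(01)+b", "a01b")
assert matches(r"a(01)+b", "a010101b")
assert not any(matches(r"a(01)+b", s) for s in ["a0b", "a1b", "a1100110b"])
```

Comparing `a[01]+b` with `a(01)+b` on the string `a1100110b` shows the difference between the two constructs: the set accepts it and the group rejects it.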
\problem{}<regex> \problem{}<regex>
You are now familiar with most of the tools regex has to offer. \\ You are now familiar with most of the tools regex has to offer. \par
Write patterns that match the following strings: Write patterns that match the following strings:
\begin{enumerate}[itemsep=1mm] \begin{enumerate}[itemsep=1mm]
\item An ISO-8601 date, like \texttt{2022-10-29}. \\ \item An ISO-8601 date, like \texttt{2022-10-29}. \par
\hint{Invalid dates like \texttt{2022-13-29} should also be matched.} \hint{Invalid dates like \texttt{2022-13-29} should also be matched.}
\item An email address. \\ \item An email address. \par
\hint{Don't forget about subdomains, like \texttt{math.ucla.edu}.} \hint{Don't forget about subdomains, like \texttt{math.ucla.edu}.}
\item A UCLA room number, like \texttt{MS 5118} or \texttt{Kinsey 1220B}. \item A UCLA room number, like \texttt{MS 5118} or \texttt{Kinsey 1220B}.
\item Any ISBN-10 of the form \texttt{0-316-00395-7}. \\ \item Any ISBN-10 of the form \texttt{0-316-00395-7}. \par
\hint{Remember that the check digit may be an \texttt{X}. Dashes are optional.} \hint{Remember that the check digit may be an \texttt{X}. Dashes are optional.}
\item A word of even length. \\ \item A word of even length. \par
\hint{The set \texttt{[A-Za-z]} contains every English letter, uppercase and lowercase. \\ \hint{The set \texttt{[A-Za-z]} contains every English letter, uppercase and lowercase. \\
\texttt{[a-z]} will only match lowercase letters.} \texttt{[a-z]} will only match lowercase letters.}
\item A word with exactly 3 vowels. \\ \item A word with exactly 3 vowels. \par
\hint{The special token \texttt{\textbackslash w} will match any word character. It is equivalent to \texttt{[A-Za-z0-9\_]}. \\ \texttt{\_} stands for a literal underscore.} \hint{The special token \texttt{\textbackslash w} will match any word character. It is equivalent to \texttt{[A-Za-z0-9\_]}. \\ \texttt{\_} stands for a literal underscore.}
\item A word that has even length and exactly 3 vowels. \item A word that has even length and exactly 3 vowels.
@ -145,6 +150,7 @@
\problem{} \problem{}
If you'd like to know more, check out \texttt{regexr.com}. It offers an interactive regex prompt, as well as a cheatsheet that explains every other regex token there is. You will find a nice set of challenges at \texttt{http://regex.alf.nu}. \\ If you'd like to know more, check out \url{https://regexr.com}. It offers an interactive regex prompt, as well as a cheatsheet that explains every other regex token there is. \par
You will find a nice set of challenges at \url{https://alf.nu/RegexGolf}.
I especially encourage you to look into this if you are interested in computer science. I especially encourage you to look into this if you are interested in computer science.
\end{document} \end{document}
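A note on the character ranges used in the hints of the first problem: a range in a set covers every character between its endpoints in character-code order, so a range running from `A` to `z` also picks up the six ASCII symbols that sit between `Z` and `a`. The sketch below assumes Python's `re` module and plain ASCII input. The variable name `between` is introduced here for illustration only.

```python
import re

# Every non-letter character whose code lies between "A" and "z".
between = [chr(c) for c in range(ord("A"), ord("z") + 1) if not chr(c).isalpha()]
assert between == ["[", "\\", "]", "^", "_", "`"]

# The single range A-z accepts all six of these symbols...
assert all(re.fullmatch(r"[A-z]", ch) for ch in between)

# ...while two explicit ranges accept letters only.
assert not any(re.fullmatch(r"[A-Za-z]", ch) for ch in between)
assert all(re.fullmatch(r"[A-Za-z]", ch) for ch in "AZaz")

# With the re.ASCII flag, \w is the same set as [A-Za-z0-9_].
for ch in ["q", "Q", "7", "_"]:
    assert re.fullmatch(r"\w", ch, flags=re.ASCII)
    assert re.fullmatch(r"[A-Za-z0-9_]", ch)
assert not re.fullmatch(r"\w", "-", flags=re.ASCII)
```

Without `re.ASCII`, Python's `\w` also matches non-English letters such as `é`, so the equivalence with `[A-Za-z0-9_]` holds only for ASCII text.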