Convert "Regex" to typst
								src/Warm-Ups/Regex/main.typ
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
							| @ -0,0 +1,135 @@ | ||||
| #import "@local/handout:0.1.0": * | ||||
|  | ||||
| #show: handout.with( | ||||
|   title: [The Regex Warm-Up], | ||||
|   by: "Mark", | ||||
| ) | ||||
|  | ||||
|  | ||||
| Last time, we discussed Deterministic Finite Automata. One interesting application of these mathematical objects is found in computer science: Regular Expressions. \ | ||||
| This is often abbreviated "regex," which is pronounced like "gif." | ||||
|  | ||||
| #v(2mm) | ||||
|  | ||||
| Regex is a language used to specify patterns in a string. You can think of it as a concise way to define a DFA, using text instead of a huge graph. \ | ||||
|  | ||||
| Often enough, a clever regex pattern can do the work of a few hundred lines of code. | ||||
|  | ||||
| #v(2mm) | ||||
|  | ||||
| Like the DFAs we've studied, a regex pattern _accepts_ or _rejects_ a string. However, we don't usually use this terminology with regex, and instead say that a string _matches_ or _doesn't match_ a pattern. | ||||
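To make the link between a DFA and a pattern concrete, here is a minimal Python sketch that is not part of the original handout. It compares a hand-built DFA with the pattern `ca+t`, where `+` means "one or more" as explained under Quantifiers below. The state names and the helper names `dfa_accepts` and `regex_matches` are made up for illustration.

```python
import re

# Hypothetical hand-built DFA for the pattern ca+t (state names are made up).
# A missing transition means "reject".
TRANSITIONS = {
    ("start", "c"): "saw_c",
    ("saw_c", "a"): "saw_a",
    ("saw_a", "a"): "saw_a",  # loop: any number of extra a's
    ("saw_a", "t"): "done",
}
ACCEPTING = {"done"}


def dfa_accepts(text: str) -> bool:
    """Run the DFA over text and report whether it ends in an accepting state."""
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:  # no transition defined: reject
            return False
    return state in ACCEPTING


def regex_matches(text: str) -> bool:
    """Ask the same question with a regex that must cover the whole string."""
    return re.fullmatch(r"ca+t", text) is not None


for word in ["cat", "caaat", "ct", "cats"]:
    print(word, dfa_accepts(word), regex_matches(word))
```

Both functions should agree on every input. The regex says in four characters what the transition table says in four entries.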
|  | ||||
| #v(5mm) | ||||
|  | ||||
| Regex strings consist of characters, quantifiers, sets, and groups. | ||||
|  | ||||
| #v(5mm) | ||||
|  | ||||
|  | ||||
|  | ||||
| *Quantifiers* \ | ||||
| Quantifiers specify how many times to match the preceding token. \ | ||||
| There are four of these: `+`, `*`, `?`, and `{ }`. | ||||
|  | ||||
| #v(4mm) | ||||
|  | ||||
| `+` means "match one or more of the preceding token" \ | ||||
| `*` means "match zero or more of the preceding token" | ||||
|  | ||||
| For example, the pattern `ca+t` will match the following strings: | ||||
| - `cat` | ||||
| - `caat` | ||||
| - `caaaaaaaat` | ||||
| `ca+t` will *not* match the string `ct`. \ | ||||
| The pattern `ca*t` will match all the strings above, including `ct`. | ||||
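The claims about `ca+t` and `ca*t` can be checked in a few lines of Python. This is a sketch using the standard `re` module, with `re.fullmatch` chosen so that the pattern has to cover the entire string. The variable names are illustrative.

```python
import re

words = ["cat", "caat", "caaaaaaaat", "ct"]

# fullmatch requires the pattern to cover the entire string.
plus_matches = [w for w in words if re.fullmatch(r"ca+t", w)]
star_matches = [w for w in words if re.fullmatch(r"ca*t", w)]

print(plus_matches)  # ['cat', 'caat', 'caaaaaaaat']
print(star_matches)  # ['cat', 'caat', 'caaaaaaaat', 'ct']
```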
|  | ||||
|  | ||||
| #v(4mm) | ||||
|  | ||||
|  | ||||
| `?` means "match zero or one of the preceding token" \ | ||||
| The pattern `linea?r` will match only `linear` and `liner`. | ||||
|  | ||||
| #v(4mm) | ||||
|  | ||||
| Braces `{min,max}` are the most flexible quantifier. \ | ||||
| They specify exactly how many times to match the preceding token: \ | ||||
| `ab{2}a` will match only `abba`. \ | ||||
| `ab{1,3}a` will match only `aba`, `abba`, and `abbba`. \ | ||||
| `ab{2,}a` will match any `ab...ba` with at least two `b`s. // spell:disable-line | ||||
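The `?` and `{ }` examples can be tried out the same way in Python. The helper `matching` below is a hypothetical name introduced only for this sketch; it keeps the candidates that a pattern matches in full.

```python
import re


def matching(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]


candidates = ["aa", "aba", "abba", "abbba", "abbbba"]

exactly_two = matching(r"ab{2}a", candidates)
one_to_three = matching(r"ab{1,3}a", candidates)
two_or_more = matching(r"ab{2,}a", candidates)
optional_a = matching(r"linea?r", ["linear", "liner", "lineaar"])

print(exactly_two)   # ['abba']
print(one_to_three)  # ['aba', 'abba', 'abbba']
print(two_or_more)   # ['abba', 'abbba', 'abbbba']
print(optional_a)    # ['linear', 'liner']
```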
|  | ||||
| #problem() | ||||
| Write the patterns `a*` and `a+` using only `{ }`. | ||||
| #v(1fr) | ||||
|  | ||||
| #problem() | ||||
| Draw a DFA equivalent to the regex pattern `01*0`. | ||||
| #v(1fr) | ||||
|  | ||||
| #pagebreak() | ||||
|  | ||||
|  | ||||
|  | ||||
|  | ||||
|  | ||||
|  | ||||
| *Characters, Sets, and Groups* \ | ||||
| In the previous section, we saw how we can specify characters literally: \ | ||||
| `a+` means "one or more `a` characters" \ | ||||
| There are, of course, other ways we can specify characters. | ||||
|  | ||||
| #v(4mm) | ||||
|  | ||||
| The first such way is the _set_, denoted `[ ]`. A set can pretend to be any character inside it. \ | ||||
| For example, `m[aoy]th` will match `math`, `moth`, or `myth`. \ | ||||
| `a[01]+b` will match `a0b`, `a111b`, `a1100110b`, and any other similar string. \ | ||||
|  | ||||
| #v(4mm) | ||||
|  | ||||
| We can negate a set with a `^`. \ | ||||
| `[^abc]` will match any single character except `a`, `b`, or `c`, including symbols and spaces. | ||||
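Sets and negated sets can be explored with a short Python sketch like the one below. The candidate strings are arbitrary test inputs chosen for illustration.

```python
import re

# A set stands in for exactly one character from the list inside it.
vowel_swap = [w for w in ["math", "moth", "myth", "meth"]
              if re.fullmatch(r"m[aoy]th", w)]

# A quantifier after a set applies to the set as a whole.
binary_body = [s for s in ["a0b", "a111b", "a1100110b", "ab", "a2b"]
               if re.fullmatch(r"a[01]+b", s)]

# A negated set matches any single character that is NOT listed.
not_abc = [c for c in ["a", "d", " ", "!", "dd"]
           if re.fullmatch(r"[^abc]", c)]

print(vowel_swap)   # ['math', 'moth', 'myth']
print(binary_body)  # ['a0b', 'a111b', 'a1100110b']
print(not_abc)      # ['d', ' ', '!']
```

Note that `dd` is rejected by `[^abc]`: a set, negated or not, still matches only one character.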
|  | ||||
| #v(4mm) | ||||
|  | ||||
| If we want to keep characters together, we can use the _group_, denoted `( )`. \ | ||||
|  | ||||
| Groups work exactly as you'd expect, representing an atomic#footnote([In other words, "unbreakable"]) group of characters. \ | ||||
| `a(01)+b` will match `a01b` and `a010101b`, but will *not* match `a0b`, `a1b`, or `a1100110b`. | ||||
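The difference between a group and a set shows up clearly when both are run over the same inputs. This is a small Python sketch with illustrative variable names.

```python
import re

candidates = ["a01b", "a010101b", "a0b", "a1b", "a1100110b"]

# (01)+ repeats the two-character unit "01" as a whole.
grouped = [s for s in candidates if re.fullmatch(r"a(01)+b", s)]

# [01]+ repeats a single character that may be either 0 or 1.
in_set = [s for s in candidates if re.fullmatch(r"a[01]+b", s)]

print(grouped)  # ['a01b', 'a010101b']
print(in_set)   # ['a01b', 'a010101b', 'a0b', 'a1b', 'a1100110b']
```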
|  | ||||
| #problem() | ||||
| You are now familiar with most of the tools regex has to offer. \ | ||||
| Write patterns that match the following strings: | ||||
|  | ||||
| - An ISO-8601 date, like `2022-10-29`. \ | ||||
|   #hint([Invalid dates like `2022-13-29` should also be matched.]) | ||||
|  | ||||
| - An email address. \ | ||||
|   #hint([Don't forget about subdomains, like `math.ucla.edu`.]) | ||||
|  | ||||
| - A UCLA room number, like `MS 5118` or `Kinsey 1220B`. | ||||
|  | ||||
| - Any ISBN-10 of the form `0-316-00395-7`. \ | ||||
|   #hint([Remember that the check digit may be an `X`. Dashes are optional.]) | ||||
|  | ||||
| - A word of even length. \ | ||||
|   #hint([ | ||||
|     The set `[A-Za-z]` contains every English letter, uppercase and lowercase. \ | ||||
|     `[a-z]` will only match lowercase letters. | ||||
|   ]) | ||||
|  | ||||
| - A word with exactly 3 vowels. \ | ||||
|   #hint([ | ||||
|     The special token `\w` will match any word character. \ | ||||
|     For ASCII text, it is equivalent to `[A-Za-z0-9_]`. `_` represents a literal underscore. | ||||
|   ]) | ||||
|  | ||||
| - A word that has even length and exactly 3 vowels. | ||||
|  | ||||
| - A sentence that does not start with a capital letter. | ||||
| #v(1fr) | ||||
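A caution when writing letter ranges: a range follows character-code order, so `[A-z]` covers more than letters. In ASCII, the six characters between `Z` and `a` are punctuation, including the underscore and the caret. The following Python sketch, with arbitrary probe characters, shows the difference from `[A-Za-z]`.

```python
import re

probe = ["Q", "q", "_", "^", "[", "7"]

# [A-z] spans ASCII codes 65 through 122, which includes some punctuation.
loose = [c for c in probe if re.fullmatch(r"[A-z]", c)]

# [A-Za-z] is two separate ranges and contains letters only.
strict = [c for c in probe if re.fullmatch(r"[A-Za-z]", c)]

print(loose)   # ['Q', 'q', '_', '^', '[']
print(strict)  # ['Q', 'q']
```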
|  | ||||
| #problem() | ||||
| If you'd like to know more, check out `https://regexr.com`. | ||||
| It offers an interactive regex prompt, | ||||
| as well as a cheatsheet that explains every other regex token there is. \ | ||||
| You can find a nice set of challenges at `https://alf.nu/RegexGolf`. | ||||