Regular expressions are a powerful language for matching text patterns. This section will explain the basic syntax of regular expressions, and demonstrate how they can be used to search text for patterns.
-
It's used instead of manually searching for a string or pattern in a text
# Manually (BAD) if '@' in email and '.' in email: print('Valid email') # Regex (GOOD) import re if re.search(r'@.*\.', email): # '\' before '.' to escape it (e.g. ".com") print('Valid email')
rin front of the string means that it's a raw string and not a normal string. which means that it will ignore the special characters like\nand\tand treat them as normal characters
-
It's done using
relibrary
They are a set of rules that are used to match a pattern in a string.
- It can be used to match a single character or a group of characters (e.g. a word) and the position of the match in the string
Examples:
-
.: matches any character except a newline# any character followed by '@' re.search(r'.@', 'Hello World') # false re.search(r'.@', 'Hello@World') # true
-
If you want to use literal
.you have to escape it using\(e.g.\.)# any character followed by '.' ✅ re.search(r'.\.', 'Hello World') # false re.search(r'.\.', 'Hello.World') # true # any character followed by '.' ❌ re.search(r'..', 'Hello World') # true -> "He
-
-
+: matches one or more repetitions of the preceding regexre.search(r'.+', 'Hello World') # "Hello World" (1 or more characters) # any character followed by '@' followed by any character re.search(r'.+@.+', 'Hello') # false re.search(r'.+@.+', 'Hello@') # false re.search(r'.+@.+', 'Hello@World') # true
-
It's equivalent to
{1,}or..*re.search(r'.+', 'Hello World') # "Hello World" (1 or more characters) re.search(r'.{1,}', 'Hello World') # "Hello World" (1 or more characters) re.search(r'..*', 'Hello World') # "Hello World" (1 character followed by 0 or more characters) (equivalent to .+)
-
-
*: matches zero or more repetitions of the preceding regex-
When mixed with
.it will match any number of characters and not just onere.search(r'..', 'Hello World') # true -> "He" # Matches zero or more repetitions of the preceding regex re.search(r'..*', 'Hello World') # "Hello World" (0 or more characters) re.search(r'..*', ' Hello World') # " " (0 or more characters)
-
-
?: matches zero or one repetitions of the preceding regex# Matches zero or one repetitions of the preceding regex re.search(r'..?', 'Hello World') # "He" (0 or 1 characters) re.search(r'..?', ' Hello World') # " H" (0 or 1 characters)
-
{m}: matches exactlymrepetitions of the preceding regex# Matches exactly m repetitions of the preceding regex re.search(r'.{2}', 'Hello World') # "He" re.search(r'.{2}', ' Hello World') # " H"
-
{m, n}: matches any number of repetitions of the preceding regex frommton# Matches any number of repetitions of the preceding regex from m to n -
^and$are used to match the start and end of a string respectively# Matches the start of a string re.search(r'^Hello', 'Hello World') # "Hello" # Matches the end of a string re.search(r'World$', 'Hello World') # "World"
-
[]: matches any character inside the brackets-
It's used when you want to look for a specific set of characters (e.g. vowels, numbers, etc.), instead of using
.which matches any characterre.search(r'[xzy]', 'Hello World') # means any character from x, z, or y -> false re.search(r'[aeiou]', ' Hello World') # means any character from a, e, i, o, or u -> true because of "e"
-
[a-z]: matches any character fromatozre.search(r'[a-z]', 'Hello World') # means any character from a to z re.search(r'[a-z]+', 'Hello World') # means any number of characters from a to z (equivalent to [a-z]*)
-
[^]: matches any character not inside the bracketsre.search(r'[^a-z]', 'Hello World') # means any character not from a to z -> false because of "H" re.search(r'[^a-z]', ' Hello World') # means any character not from a to z -> true because of " "
-
-
|: matches either the regex before or after it -> (OR)re.search(r'Hello|World', 'Hello World') # "Hello" re.search(r'Hello|World', 'Hello') # "Hello" re.search(r'Hello|World', 'World') # "World"
-
\w: matches any word character (a-z, A-Z, 0-9, _)# Matches any word character (a-z, A-Z, 0-9, _) re.search(r'\w', 'Hello World') # "H" re.search(r'\w\w', 'Hello World') # "He"
-
\W: matches any non-word character (spaces, symbols, etc.)# Matches any non-word character (spaces, symbols, etc.) re.search(r'\W', 'Hello World') # " " re.search(r'\W\W', 'Hello World') # " H" re.search(r'^.+@.+\.com$', 'test@test.com') # true (email) re.search(r'^.+@.+\.com$', 'test@test') # false (doesn't end with .com)
-
\s: matches any whitespace character (spaces, tabs, newlines, etc.)# Matches any whitespace character (spaces, tabs, newlines, etc.) re.search(r'\s', 'Hello World') # " " re.search(r'\s\s', 'Hello World') # " "
-
?: matches zero or one repetitions of the preceding regex (to its left) -> (optional)# Matches zero or one repetitions of the preceding regex re.search(r'\s?', 'Hello World') # " " re.search(r'\s?\s?', 'Hello World') # " " re.search(r'^https?://', 'https://google.com') # "https://" re.search(r'^https?://', 'http://google.com') # "http://" # instead of: re.search(r'^(https|http)://', 'https://google.com') # "https://" re.search(r'^(https|http)://', 'http://google.com') # "http://"
-
It's usually used with
()to group a regex and make it optionalre.search(r'(\s)?', 'Hello World') # " "
-
-
(): is used to group a regex or capture a group of characters# is used to group a regex re.search(r'(\s)?', 'Hello World') # " " # is used to capture a group of characters match = re.search(r'(\w+)@(\w+)\.com', 'test@test.com') print(match.group(0)) # test@test print(match.group(1)) # test print(match.group(2)) # test last, first = match.groups() # tuple unpacking
-
Example of using
()instead of.split()to capture a group of characters# Using .split() name = 'John, Smith' first, last = name.split(', ') # tuple unpacking # Using () regex name = 'John, Smith' match = re.search(r'^(.+), (.+)$', name) first, last = match.groups() # tuple unpacking
-
