To define a regular expression, you provide a sequence of characters, the pattern, that will match sequences of characters in a target.
Here are several places to look for help:
- Python Library Reference: 4.2.1 Regular Expression Syntax
- Regular Expression HOWTO
The patterns or regular expressions can be defined as follows:
Literal characters must match exactly. For example, “a” matches “a”.
Concatenated patterns match concatenated targets. For example, “ab” (“a” followed by “b”) matches “ab”.
Alternate patterns, separated by a vertical bar, match either of the alternative patterns. For example, “(aaa)|(bbb)” will match either “aaa” or “bbb”.
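The three rules above (literals, concatenation, and alternation) can be tried out with Python's re module. This is only a minimal sketch: it assumes Python 3.4 or later, and it uses re.fullmatch so that a pattern has to account for the whole target. The variable names are made up for illustration.

```python
import re

# Literal: the pattern "a" matches the target "a".
literal = re.fullmatch(r"a", "a")

# Concatenation: "ab" matches "a" followed by "b".
concat = re.fullmatch(r"ab", "ab")

# Alternation: either "aaa" or "bbb" is accepted, but not a mixture.
alt_pattern = re.compile(r"(aaa)|(bbb)")
alt_results = [bool(alt_pattern.fullmatch(s)) for s in ("aaa", "bbb", "aab")]

print(bool(literal), bool(concat), alt_results)
# prints: True True [True, True, False]
```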
Repeating and optional items:
- “abc*” matches “ab” followed by zero or more occurrences of “c”, for example, “ab”, “abc”, “abcc”, etc.
- “abc+” matches “ab” followed by one or more occurrences of “c”, for example, “abc”, “abcc”, etc., but not “ab”.
- “abc?” matches “ab” followed by zero or one occurrence of “c”, for example, “ab” or “abc”.
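As a quick check of the repetition operators, the sketch below runs each pattern against the same small list of targets. The helper name matching and the target list are hypothetical, chosen only for this example, and re.fullmatch (Python 3.4 or later) is assumed so that the whole target must match.

```python
import re

targets = ["ab", "abc", "abcc"]

def matching(pattern):
    """Return the targets that the pattern matches in full."""
    return [t for t in targets if re.fullmatch(pattern, t)]

star = matching(r"abc*")      # zero or more "c"
plus = matching(r"abc+")      # one or more "c"
optional = matching(r"abc?")  # zero or one "c"

print(star)      # ['ab', 'abc', 'abcc']
print(plus)      # ['abc', 'abcc']
print(optional)  # ['ab', 'abc']
```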
Sets of characters – Characters and sequences of characters in square brackets form a set; a set matches any character in the set or range. For example, “[abc]” matches “a” or “b” or “c”. And, for example, “[_a-z0-9]” matches an underscore or any lower-case letter or any digit.
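The two sets mentioned above can be tested one character at a time. The following is a small sketch, assuming Python 3.4 or later for re.fullmatch. The sample strings are arbitrary.

```python
import re

set_abc = re.compile(r"[abc]")       # "a" or "b" or "c"
set_ident = re.compile(r"[_a-z0-9]")  # underscore, lower-case letter, or digit

# Keep only the characters that each set accepts.
abc_hits = [ch for ch in "abcd" if set_abc.fullmatch(ch)]
ident_hits = [ch for ch in "_q7Q-" if set_ident.fullmatch(ch)]

print(abc_hits)    # ['a', 'b', 'c']
print(ident_hits)  # ['_', 'q', '7']
```

Note that the upper-case “Q” and the hyphen are rejected by the second set, since neither falls within the listed characters or ranges.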
Groups – Parentheses indicate a group within a pattern. For example, “ab(cd)*ef” is a pattern that matches “ab” followed by any number of occurrences of “cd” followed by “ef”, for example, “abef”, “abcdef”, “abcdcdef”, etc.
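Here is one way the group example might look in code. It is a sketch only, assuming Python 3.4 or later for fullmatch. The extra target “abcef” is added here just to show a non-match.

```python
import re

pattern = re.compile(r"ab(cd)*ef")

# The repetition operator applies to the whole group "cd", not just to "d".
results = {t: bool(pattern.fullmatch(t))
           for t in ["abef", "abcdef", "abcdcdef", "abcef"]}
print(results)
# {'abef': True, 'abcdef': True, 'abcdcdef': True, 'abcef': False}

# The text matched by a group can be retrieved from the match object.
# For a repeated group, group(1) holds the last repetition.
last_repeat = pattern.fullmatch("abcdcdef").group(1)
no_repeat = pattern.fullmatch("abef").group(1)
print(last_repeat, no_repeat)  # cd None
```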
There are special names for some sets of characters, for example “\d” (any digit), “\w” (any alphanumeric character or underscore), “\W” (any character not matched by “\w”), etc. See Python Library Reference: 4.2.1 Regular Expression Syntax for more.
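The special set names are written with a leading backslash. As a rough illustration, the sketch below applies \d, \w, and \W to one arbitrary sample string using re.findall, which returns every non-overlapping match.

```python
import re

sample = "a1_b2!"

digits = re.findall(r"\d", sample)      # any digit
word_chars = re.findall(r"\w", sample)  # letters, digits, and underscore
non_word = re.findall(r"\W", sample)    # everything that \w does not match

print(digits)      # ['1', '2']
print(word_chars)  # ['a', '1', '_', 'b', '2']
print(non_word)    # ['!']
```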
Because patterns often contain backslashes, you are usually better off defining regular expressions with raw strings, e.g. r"abc".
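To see why raw strings help, consider the word-boundary pattern \b. In an ordinary Python string, "\b" is the backspace character, so the regular expression engine never sees the backslash. The sketch below compares the two spellings. The sample text and variable names are arbitrary.

```python
import re

# In a raw string the backslash is kept as-is: two characters, "\" and "b".
raw_pattern = r"\bcat\b"

# In an ordinary string "\b" is a single backspace character.
plain_pattern = "\bcat\b"

print(len(r"\b"), len("\b"))  # 2 1

text = "a cat sat"
raw_found = re.search(raw_pattern, text) is not None
plain_found = re.search(plain_pattern, text) is not None
print(raw_found, plain_found)  # True False

# Without a raw string, each backslash has to be doubled.
same = ("\\bcat\\b" == raw_pattern)
print(same)  # True
```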