Python Regex Cheat Sheet
This guide is the “just enough to be dangerous” version of Python regex. It shows how to use the re module for real tasks—search, match, fullmatch, capture groups, and flags—without turning simple jobs into unreadable puzzles. You’ll also see direct, plain-Python alternatives (in, split(), partition(), slicing) so you don’t reach for regex when a one-liner will do.
We’ll keep it practical:
- When to use python and regex vs. when to stick with string methods.
- A small set of patterns you’ll actually remember.
- Clear python regex search example and python match regex example snippets.
- How to read and use a python regex capture group and a named python regex match group.
- Common pitfalls (greedy vs. lazy, match vs. search) and how to avoid them.
If you’re looking for a python regex cheat sheet (or “regex cheat sheet python” if that’s what you typed), this article favors legible solutions first. When regex is the right tool, we’ll show the minimal pattern that solves the problem—no “clever” one-liners, no unnecessary abstraction, just code you’ll still understand next week.
What is Regex in Python?
Regex (regular expressions)—sometimes written regexp—is a compact mini-language for describing text patterns. In Python, regex lives in the standard-library re module (“python re regex”). If you’ve ever asked “what is regex in Python?” or “what is regexp?”, the short answer is: it’s how you search, match, extract, and replace structured text with precision. Use it when simple methods (in, split, startswith, slicing) aren’t enough—like validating formats, pulling fields with python regex match and capture groups, or normalizing messy input. Keep it pragmatic: start with built-ins; reach for regex when you need character classes, repetition, alternation, or context (lookarounds).
Regex Rules in Python
If you’re new to python regex, this is a compact, practical guide to python and regex with clear examples using the re module—and, critically, simple non-regex alternatives (e.g., in, split(), slicing) for when regex is overkill. Think of this as a “regex cheat sheet python” that favors readability.
- Try built-ins first: in, startswith/endswith, split/partition, replace, and slicing (s[:-5]).
- Reach for python re regex when you need character classes, repetition, alternation, or context (lookarounds).
Python ‘re’ Regular Expression
import re # standard library
# Use raw strings to avoid escape-hell:
pattern = r"\d{4}-\d{2}-\d{2}"
- Raw strings (r""): keep backslashes literal, so r"\d+" means “digits” instead of “escape d”.
Core re APIs (with minimal examples)
re.search — find the first match anywhere (python regex search example)
m = re.search(r"\d+", "Order #12345 shipped")
if m:
    print(m.group())  # '12345'
re.match — match only at the start (python match regex example)
m = re.match(r"[A-Z]{2}\d{3}", "AB123-xyz")
bool(m) # True
Tip: Prefer search() unless you specifically want “start of string.” For whole-string validation, use fullmatch().
re.fullmatch — match the entire string
bool(re.fullmatch(r"[A-F0-9]{8}", "9A3B00FF")) # True
re.findall — return all non-overlapping matches (strings)
re.findall(r"\b\w+\b", "alpha beta") # ['alpha', 'beta']
re.finditer — iterate Match objects (more control)
for m in re.finditer(r"\d+", "a12 b345"):
    print(m.group(), m.start(), m.end())
re.sub / re.subn — substitution
re.sub(r"\s+", " ", "a b\tc\n") # 'a b c'
re.subn(r"\s+", " ", "a b\tc\n") # ('a b c', 3) -> also returns count
re.split — split on a pattern
re.split(r"[;,]\s*", "a, b; c") # ['a', 'b', 'c']
re.compile — precompile when reusing a pattern in a loop
pat = re.compile(r"\b\d{4}\b")
bool(pat.search("year 2025"))
re.escape — treat arbitrary text as literal
needle = re.escape(".cfg[1]")
re.search(needle, "x.cfg[1] y") # safe literal match
Pattern syntax you’ll actually use
- Literals: abc
- Any char: .
- Character classes:
  - predefined: \d (digit), \w (word), \s (space); uppercase = inverse (\D, \W, \S)
  - custom set: [A-Za-z_], ranges a-z, negate with [^...]
- Quantifiers: * (0+), + (1+), ? (0/1), {m}, {m,n}
  - Greedy by default; add ? for lazy: +?, *?, {m,n}?
- Anchors & boundaries: ^ (start), $ (end), \b (word boundary)
- Alternation: foo|bar
- Groups:
  - capturing: ( ... ) → python regex capture group
  - non-capturing: (?: ... )
  - named: (?P<name> ... ) → python regex match group by name
  - backref: \1, (?P=name)
- Lookarounds (zero-width): (?=...) ahead, (?!...) negative ahead, (?<=...) behind, (?<!...) negative behind
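To see a few of these pieces working together, here is a small sketch (the log-style line is invented) that combines a character class, a lazy quantifier, and a lookahead:

```python
import re

entry = 'user="alice" action="login" ip="10.0.0.7"'

# Lazy quantifier: grab the shortest run between quotes.
values = re.findall(r'"(.*?)"', entry)

# Lookahead: match a key only when it is followed by =" (zero-width check).
keys = re.findall(r'\w+(?==")', entry)

print(values)  # ['alice', 'login', '10.0.0.7']
print(keys)    # ['user', 'action', 'ip']
```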
Groups & Match objects (practical)
m = re.search(r"(?P<user>[A-Za-z_]\w*)@(?P<host>[\w.-]+)", "id: user@site.com")
m.group(0) # full match: 'user@site.com'
m.group(1) # 'user'
m.group('host') # 'site.com'
m.groups() # ('user', 'site.com')
m.groupdict() # {'user': 'user', 'host': 'site.com'}
m.span() # (4, 17)
This is the canonical pattern for working with a python regex match and its match groups.
Flags you’ll need
re.search(r"abc", "ABC", flags=re.IGNORECASE)
re.search(r"^item", "a\nitem", flags=re.MULTILINE) # ^ and $ per line
re.search(r".+", "line1\nline2", flags=re.DOTALL) # . matches newline
re.compile(r"""
^\s*([A-Za-z_]\w*) # name
\s*=\s*
(.+) # value
$""", flags=re.VERBOSE | re.MULTILINE)
- re.IGNORECASE / I
- re.MULTILINE / M
- re.DOTALL / S
- re.VERBOSE / X (allow comments & whitespace—great for readability)
- re.ASCII / A (make \w, \d, etc. ASCII-only)
Small, real-world patterns
1) Extract the first integer (python regex search example)
m = re.search(r"\d+", "v=42; threshold=100")
num = int(m.group()) if m else None
2) Validate a simple SKU (python match regex example)
bool(re.fullmatch(r"[A-Z]{2}-\d{4}", "AB-2048"))
3) Split on multiple delimiters
parts = re.split(r"[|;,]\s*", "a|b; c, d") # ['a','b','c','d']
4) Normalize whitespace
clean = re.sub(r"\s+", " ", "a b\tc\n").strip()
5) Parse a date with named groups
m = re.fullmatch(r"(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})", "2025-08-11")
(y, m_, d) = (int(m['y']), int(m['m']), int(m['d'])) if m else (None, None, None)
Non-regex alternatives (simpler is better)
Often, you don’t need python and regex at all. These are cleaner and less error-prone.
Contains / starts / ends
if "ERROR" in line: ...
if path.endswith(".csv"): ...
if slug.startswith("user-"): ...
Split / partition (safer than split(':') + indexing)
key, _, value = line.partition(":")
left, sep, right = text.partition("|") # 'sep' is '' if not found
Replace
msg = msg.replace("\t", " ").replace("  ", " ")  # tabs to spaces, collapse doubles
Slicing
basename = filename[:-4] # drop '.txt'
last5 = s[-5:] # last 5 chars
first_token = s.split()[0] if s.split() else ""
Simple numeric scrape without regex
digits = []
for ch in s:
    if ch.isdigit():
        digits.append(ch)
    elif digits:
        break
num = int("".join(digits)) if digits else None
Common pitfalls (and fixes)
- Forgetting raw strings: write r"\bword\b", not "\\bword\\b".
- Greedy vs. lazy: ".*" will eat too much; prefer ".*?" when you mean “shortest.”
- Using match() when you meant search(): match() is anchored at start.
- Whole-string validation: use fullmatch() (not ^...$ plus search()—just use the right API).
- Reusing patterns in loops: pat = re.compile(r"...") for speed and clarity.
- Escaping user input: wrap with re.escape() before mixing into patterns.
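The match-vs-search and validation pitfalls are easy to see side by side:

```python
import re

s = "id: 42"

# match() is anchored at the start, so it finds nothing here...
print(re.match(r"\d+", s))                  # None
# ...while search() scans the whole string.
print(re.search(r"\d+", s).group())         # 42

# For validation, fullmatch() rejects trailing junk that search() accepts.
print(bool(re.search(r"\d+", "42abc")))     # True (partial hit)
print(bool(re.fullmatch(r"\d+", "42abc")))  # False (whole string must match)
```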
Minimal “toolbelt” you’ll rely on
- search, fullmatch, finditer, sub, split, compile
- \d \w \s, sets [...], quantifiers {m,n}, groups (), named groups (?P<name>...)
- Flags: IGNORECASE, MULTILINE, DOTALL, VERBOSE
- Raw strings r""
With these, you can handle almost every python regex task you’ll meet as a beginner—while still choosing the simpler Pythonic alternative when it’s clearer.
When Should You Use Regex in Python?
Most of the time you should skip using regex in Python for small, one-off text (string) jobs. The runtime difference is noise; readability dominates.
Here’s a crisp way to think about it.
When to Typically Avoid Regex in Python
Prefer built-ins because they’re faster to read and harder to get subtly wrong:
- Trim stuff → str.strip, str.removeprefix, str.removesuffix
- Split and iterate → str.split, str.splitlines
- Cut around a token → str.partition / rpartition
- Simple find → str.find / str.index
- Replace → str.replace
- Keep/drop last/first N → slicing s[:-5], s[-5:]
- Normalize whitespace → ' '.join(s.split())
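As a quick sketch of those built-ins working together (the filename and values are invented):

```python
s = "  report_2025.csv  "

trimmed = s.strip()                    # trim stuff
base = trimmed.removesuffix(".csv")    # drop a known suffix (Python 3.9+)
name, _, year = base.partition("_")    # cut around a token
spaced = " ".join("a  b\t c".split())  # normalize whitespace

print(base, name, year, spaced)  # report_2025 report 2025 a b c
```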
When to Use RegEx in Python
- You need character classes (digits, words, unicode categories) or repetition.
- You need alternation across many tokens that would be clumsy with replace.
- You need overlapping or context-sensitive matches.
- You must handle ill-formed input where structure isn’t reliable.
If you do reach for regex: use raw strings (r""), non-capturing groups (?:...), and re.finditer with named groups only when you truly need them.
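A sketch of those habits in one place (the log string is made up): a raw-string pattern, a non-capturing group for the alternation, and finditer with a single named group.

```python
import re

log = "GET /a 200; POST /b 404; GET /c 500"

# (?:GET|POST) groups the alternation without creating a capture;
# the named group keeps only the piece we care about.
pat = re.compile(r"(?:GET|POST) \S+ (?P<status>\d{3})")
statuses = [m.group("status") for m in pat.finditer(log)]
print(statuses)  # ['200', '404', '500']
```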
Minimal, legible patterns (no “show-off” one-liners)
1) Key-value line into a dict (no regex)
line = "Name: John Doe | Age: 34 | City: St. Paul"
result = {}
for part in line.split(" | "):
    key, _, value = part.partition(": ")
    result[key] = value
2) Drop suffix/prefix (Py3.9+)
s = "report_2025.csv"
s = s.removesuffix(".csv")
s = s.removeprefix("report_")
3) Take everything but last 5 chars
s = s[:-5]
4) Normalize spaces and trim
clean = " ".join(s.split())
5) Split on either comma or semicolon (still no regex)
text = "a,b;c,d"
for ch in ",;":
    text = text.replace(ch, "|")
parts = [p for p in text.split("|") if p]
(If the delimiter list gets long or has patterns, switch to re.split(r"[;,|/]\s*").)
6) Extract the first integer
Readable without regex:
num = None
buf = []
for ch in s:
    if ch.isdigit():
        buf.append(ch)
    elif buf:
        break
if buf:
    num = int("".join(buf))
If you don’t care about decimals/negatives and just want the first run of digits, regex is simpler and still readable:
import re
m = re.search(r"\d+", s)
num = int(m.group()) if m else None
7) Before/after a marker
head, sep, tail = s.partition("TOKEN")
before = head
after = tail if sep else ""
8) “CSV-ish” text
If commas can be quoted/escaped, use the stdlib and avoid cleverness:
import csv
rows = list(csv.reader(s.splitlines()))
Python RegEx Examples
#!/usr/bin/env python3
"""
python_regex_cheatsheet_demo.py
A single-file, runnable tour of practical Python regex patterns
using the standard-library `re` module—paired with simple non-regex
alternatives when they're clearer.
The script favors:
- raw strings for patterns (r"...")
- minimal patterns you'll remember
- readable multi-line code (no show-off one-liners)
"""
import re
from typing import Iterable, Iterator, Optional, Tuple
def show(title: str, value) -> None:
    """Small helper to keep output readable."""
    print(f"\n=== {title}")
    print(value)

def demo_search_match_fullmatch() -> None:
    s = "Order #12345 shipped"
    # search(): find a match anywhere in the string (first occurrence)
    m = re.search(r"\d+", s)
    show("search() first run of digits", m.group() if m else None)
    # match(): anchored at the START of the string
    m2 = re.match(r"Order", s)
    show("match() at start", bool(m2))
    m3 = re.match(r"\d+", s)  # won't match: string doesn't start with a digit
    show("match() wrong anchor", bool(m3))
    # fullmatch(): the *entire* string must match the pattern
    sku = "AB-2048"
    is_valid_sku = bool(re.fullmatch(r"[A-Z]{2}-\d{4}", sku))
    show("fullmatch() SKU", is_valid_sku)

def demo_findall_finditer() -> None:
    s = "IDs: a12 b345 c6"
    # findall(): returns a list of strings (or tuples if groups present)
    ids = re.findall(r"\d+", s)
    show("findall() digits", ids)
    # finditer(): yields Match objects (start/end, groups, etc.)
    spans = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\d+", s)]
    show("finditer() digits with spans", spans)

def demo_groups_and_named_groups() -> None:
    email = "id: user@site.com"
    m = re.search(r"(?P<user>[A-Za-z_]\w*)@(?P<host>[\w.-]+)", email)
    if m:
        show("group(0) full", m.group(0))
        show("group(1) user", m.group(1))
        show("group('host') host", m.group("host"))
        show("groups() tuple", m.groups())
        show("groupdict()", m.groupdict())
    else:
        show("groups", None)

def demo_sub_and_subn() -> None:
    messy = "a b\tc\n d"
    # Collapse any run of whitespace to a single space and trim
    clean = re.sub(r"\s+", " ", messy).strip()
    show("sub() whitespace normalize", clean)
    # subn(): same but also returns the substitution count
    clean2, count = re.subn(r"\s+", " ", messy)
    show("subn() normalized, count", (clean2.strip(), count))
    # Use a function to transform matches (capitalize words)
    text = "hello world\tpython"

    def titlecase(m: re.Match) -> str:
        return m.group(0).capitalize()

    titled = re.sub(r"\b[a-z]+\b", titlecase, text)
    show("sub() function: titlecase words", titled)
def demo_split() -> None:
    s = "a|b; c, d"
    parts = re.split(r"[|;,]\s*", s)
    show("split() on multiple delimiters", parts)

def demo_flags() -> None:
    s = "Line1\nline2\nLINE3"
    # IGNORECASE makes casing irrelevant
    ic = bool(re.search(r"line3", s, flags=re.IGNORECASE))
    show("IGNORECASE", ic)
    # MULTILINE makes ^ and $ work per line instead of whole string
    # Find lines starting with "line" (case-insensitive)
    starts = re.findall(r"^line\w*", s, flags=re.IGNORECASE | re.MULTILINE)
    show("MULTILINE ^ per line", starts)
    # DOTALL makes '.' match newline too
    dotall = re.search(r"Line1.+LINE3", s, flags=re.DOTALL | re.IGNORECASE)
    show("DOTALL spans newlines", bool(dotall))
    # VERBOSE lets you write readable multi-line patterns with comments
    pat = re.compile(
        r"""
        ^\s*                    # start of line, optional whitespace
        (?P<name>[A-Za-z_]\w*)  # variable name
        \s*=\s*                 # equals with optional spaces
        (?P<value>.+?)          # value (non-greedy)
        \s*$                    # trailing whitespace to end of line
        """,
        flags=re.VERBOSE | re.MULTILINE,
    )
    config = "user_name = user\nthreads= 8\n debug = true "
    matches = [m.groupdict() for m in pat.finditer(config)]
    show("VERBOSE pattern parse", matches)
def demo_lookarounds() -> None:
    s = "price:$19, tax:$2, total:$21"
    # Positive lookbehind/ahead: find numbers surrounded by `$` and comma or end
    nums = re.findall(r"(?<=\$)\d+(?=,|$)", s)
    show("lookarounds extract dollar amounts", nums)
    # Negative lookahead: find 'cat' not followed by 'erpillar'
    animals = "cat caterpillar catfish cat cart"
    cats = re.findall(r"cat(?!erpillar)", animals)
    show("negative lookahead", cats)

def demo_escape_user_input() -> None:
    # If user gives a literal string that may contain regex metacharacters,
    # escape it before building a pattern.
    user_literal = ".cfg[1]"
    safe = re.escape(user_literal)
    text = "x.cfg[1] y.cfg[2]"
    m = re.search(safe, text)  # safe literal match
    show("re.escape() literal search", m.group(0) if m else None)

def demo_compile_reuse() -> None:
    # If you reuse a pattern in a loop, compile it once.
    lines = [
        "AB-0001 ok",
        "ZZ-9999 fail",
        "XY-0420 ok",
        "BAD LINE",
    ]
    pat = re.compile(r"(?P<pfx>[A-Z]{2})-(?P<num>\d{4})")
    parsed = []
    for line in lines:
        m = pat.search(line)
        if not m:
            continue
        parsed.append((m["pfx"], int(m["num"])))
    show("compile() reuse in loop", parsed)
def demo_non_regex_alternatives() -> None:
    s = "report_2025.csv"
    # Prefer simple string methods when the task is simple.
    has_csv = s.endswith(".csv")
    base = s.removesuffix(".csv")  # Python 3.9+
    base = base.removeprefix("report_")
    show("non-regex endswith/removesuffix/removeprefix", (has_csv, base))
    line = "Name: John Doe | Age: 34 | City: St. Paul"
    result = {}
    for part in line.split(" | "):
        key, _, value = part.partition(": ")
        result[key] = value
    show("non-regex split/partition key-value", result)
    # Extract first integer without regex (readable loop)
    stream = "v=42; threshold=100"
    digits: list[str] = []
    for ch in stream:
        if ch.isdigit():
            digits.append(ch)
        elif digits:
            break
    num = int("".join(digits)) if digits else None
    show("non-regex first integer", num)

def demo_match_objects_api() -> None:
    s = "Item X23 cost $9 on 2025-08-11"
    m = re.search(r"(?P<code>[A-Z]\d{2}).*?\$(?P<price>\d+)", s)
    if not m:
        show("match object", None)
        return
    show("match.group(0) full", m.group(0))
    show("match.groupdict()", m.groupdict())
    show("match.span('price')", m.span("price"))
def demo_validation_examples() -> None:
    tests = [
        "9A3B00FF",
        "A0Z19",
        "FFFFFFFF",
    ]
    # Hex token: exactly 8 uppercase hex chars
    pat = re.compile(r"[A-F0-9]{8}$")
    results = [bool(pat.fullmatch(t)) for t in tests]
    show("validation: 8-char hex", list(zip(tests, results)))
    emails = ["a@b.co", "bad@", "@oops.com", "x@y.z"]
    # Simple email (not RFC-perfect; good enough for demo)
    email_pat = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")
    show("validation: email", [bool(email_pat.fullmatch(e)) for e in emails])

def demo_greedy_vs_lazy() -> None:
    s = "<tag>alpha</tag><tag>beta</tag>"
    greedy = re.findall(r"<tag>.*</tag>", s)  # one big match
    lazy = re.findall(r"<tag>.*?</tag>", s)   # two small matches
    show("greedy vs lazy", {"greedy": greedy, "lazy": lazy})

def demo_multiline_block_extraction() -> None:
    log = """
[INFO] start
[WARN] low disk
  stack:
    line1
    line2
[INFO] end
""".strip()
    # Extract blocks that start with [WARN] and include following indented lines.
    # MULTILINE anchors ^ at each line; the non-greedy quantifier keeps it tight.
    warn_block = re.search(r"^\[WARN\].*?(?:\n\s+.*)*", log, flags=re.MULTILINE)
    show("block extraction after [WARN]", warn_block.group(0) if warn_block else None)
def demo_backreferences() -> None:
    s = "word1=foo; word2=foo; word3=bar"
    # Find pairs like x=VALUE; ... x=VALUE (same VALUE) using a backreference
    same = re.findall(r"word1=(\w+).*?word2=\1", s)
    show("backreference same value", same)

def demo_unicode_word_boundaries() -> None:
    s = "naïve café resume résumé"
    # \w with default (Unicode) includes letters with accents
    words = re.findall(r"\b\w+\b", s)
    show(r"Unicode \w word extraction", words)
    # If you want ASCII-only behavior, pass re.ASCII
    words_ascii = re.findall(r"\b\w+\b", s, flags=re.ASCII)
    show(r"ASCII-only \w", words_ascii)

def demo_error_handling() -> None:
    # Demonstrate safe handling when no match exists
    s = "no numbers here"
    m = re.search(r"\d+", s)
    value = int(m.group()) if m else None
    show("safe no-match handling", value)

def iterate_file_like(lines: Iterable[str], pat: re.Pattern) -> Iterator[Tuple[int, str]]:
    """Yield (lineno, line) where pattern matches (useful in grep-like tools)."""
    for idx, line in enumerate(lines, 1):
        if pat.search(line):
            yield idx, line.rstrip("\n")
def demo_compile_and_scan_text() -> None:
    text = [
        "2025-08-11 INFO Server started",
        "2025-08-11 WARN Disk 90%",
        "2025-08-11 ERROR Out of space",
        "2025-08-12 INFO Cleanup done",
    ]
    # Simple "grep" for ERROR or WARN
    pat = re.compile(r"\b(?:ERROR|WARN)\b")
    hits = list(iterate_file_like(text, pat))
    show("grep-like scan", hits)

def main() -> None:
    demo_search_match_fullmatch()
    demo_findall_finditer()
    demo_groups_and_named_groups()
    demo_sub_and_subn()
    demo_split()
    demo_flags()
    demo_lookarounds()
    demo_escape_user_input()
    demo_compile_reuse()
    demo_non_regex_alternatives()
    demo_match_objects_api()
    demo_validation_examples()
    demo_greedy_vs_lazy()
    demo_multiline_block_extraction()
    demo_backreferences()
    demo_unicode_word_boundaries()
    demo_error_handling()
    demo_compile_and_scan_text()

if __name__ == "__main__":
    main()