Python Regex Cheat Sheet
This guide is the “just enough to be dangerous” version of Python regex. It shows how to use the re module for real tasks—search, match, fullmatch, capture groups, and flags—without turning simple jobs into unreadable puzzles. You’ll also see direct, plain-Python alternatives (in, split(), partition(), slicing) so you don’t reach for regex when a one-liner will do.
We’ll keep it practical:
- When to use python and regex vs. when to stick with string methods.
- A small set of patterns you’ll actually remember.
- Clear python regex search example and python match regex example snippets.
- How to read and use a python regex capture group and a named python regex match group.
- Common pitfalls (greedy vs. lazy, match vs. search) and how to avoid them.
If you’re looking for a python regex cheat sheet (or “regex cheat sheet python” if that’s what you typed), this article favors legible solutions first. When regex is the right tool, we’ll show the minimal pattern that solves the problem—no “clever” one-liners, no unnecessary abstraction, just code you’ll still understand next week.
What is Regex in Python?
Regex (regular expressions)—sometimes written regexp—is a compact mini-language for describing text patterns. In Python, regex lives in the standard-library re module (“python re regex”). If you’ve ever asked “what is regex in Python?” or “what is regexp?”, the short answer is: it’s how you search, match, extract, and replace structured text with precision. Use it when simple methods (in, split, startswith, slicing) aren’t enough—like validating formats, pulling fields with python regex match and capture groups, or normalizing messy input. Keep it pragmatic: start with built-ins; reach for regex when you need character classes, repetition, alternation, or context (lookarounds).
Regex Rules in Python
If you’re new to python regex, this is a compact, practical guide to python and regex with clear examples using the re module—and, critically, simple non-regex alternatives (e.g., in, split(), slicing) for when regex is overkill. Think of this as a “regex cheat sheet python” that favors readability.
- Try built-ins first: in, startswith/endswith, split/partition, replace, and slicing (s[:-5]).
- Reach for python re regex when you need character classes, repetition, alternation, or context (lookarounds).
Python ‘re’ Regular Expression
import re # standard library
# Use raw strings to avoid escape-hell:
pattern = r"\d{4}-\d{2}-\d{2}"
- Raw strings (r""): keep backslashes literal, so r"\d+" means “digits” instead of “escape d”.
Core re APIs (with minimal examples)
re.search — find the first match anywhere (python regex search example)
m = re.search(r"\d+", "Order #12345 shipped")
if m:
    print(m.group())  # '12345'
re.match — match only at the start (python match regex example)
m = re.match(r"[A-Z]{2}\d{3}", "AB123-xyz")
bool(m) # True
Tip: Prefer search() unless you specifically want “start of string.” For whole-string validation, use fullmatch().
re.fullmatch — match the entire string
bool(re.fullmatch(r"[A-F0-9]{8}", "9A3B00FF")) # True
re.findall — return all non-overlapping matches (strings)
re.findall(r"\b\w+\b", "alpha beta") # ['alpha', 'beta']
re.finditer — iterate Match objects (more control)
for m in re.finditer(r"\d+", "a12 b345"):
    print(m.group(), m.start(), m.end())
re.sub / re.subn — substitution
re.sub(r"\s+", " ", "a b\tc\n") # 'a b c'
re.subn(r"\s+", " ", "a b\tc\n") # ('a b c', 3) -> also returns count
re.split — split on a pattern
re.split(r"[;,]\s*", "a, b; c") # ['a', 'b', 'c']
re.compile — precompile when reusing a pattern in a loop
pat = re.compile(r"\b\d{4}\b")
bool(pat.search("year 2025"))
re.escape — treat arbitrary text as literal
needle = re.escape(".cfg[1]")
re.search(needle, "x.cfg[1] y") # safe literal match
Pattern syntax you’ll actually use
- Literals: abc
- Any char: .
- Character classes:
  - predefined: \d (digit), \w (word), \s (space); uppercase = inverse (\D, \W, \S)
  - custom set: [A-Za-z_], ranges a-z, negate with [^...]
- Quantifiers: * (0+), + (1+), ? (0/1), {m}, {m,n}
  - Greedy by default; add ? for lazy: +?, *?, {m,n}?
- Anchors & boundaries: ^ (start), $ (end), \b (word boundary)
- Alternation: foo|bar
- Groups:
  - capturing: ( ... ) → python regex capture group
  - non-capturing: (?: ... )
  - named: (?P<name> ... ) → python regex match group by name
  - backref: \1, (?P=name)
- Lookarounds (zero-width): (?=...) ahead, (?!...) negative ahead, (?<=...) behind, (?<!...) negative behind
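To see a few of these pieces working together, here is a small sketch (the log-style line is invented) that combines a character class, a lazy quantifier, and a lookahead:

```python
import re

entry = 'user="alice" action="login" ip="10.0.0.7"'

# Lazy quantifier: grab the shortest run between quotes.
values = re.findall(r'"(.*?)"', entry)

# Lookahead: match a key only when it is followed by =" (zero-width check).
keys = re.findall(r'\w+(?==")', entry)

print(values)  # ['alice', 'login', '10.0.0.7']
print(keys)    # ['user', 'action', 'ip']
```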
Groups & Match objects (practical)
m = re.search(r"(?P<user>[A-Za-z_]\w*)@(?P<host>[\w.-]+)", "id: user@site.com")
m.group(0) # full match: 'user@site.com'
m.group(1) # 'user'
m.group('host') # 'site.com'
m.groups() # ('user', 'site.com')
m.groupdict() # {'user': 'user', 'host': 'site.com'}
m.span() # (4, 17)
This is the canonical pattern for working with a python regex match and its match groups.
Flags you’ll need
re.search(r"abc", "ABC", flags=re.IGNORECASE)
re.search(r"^item", "a\nitem", flags=re.MULTILINE) # ^ and $ per line
re.search(r".+", "line1\nline2", flags=re.DOTALL) # . matches newline
re.compile(r"""
^\s*([A-Za-z_]\w*) # name
\s*=\s*
(.+) # value
$""", flags=re.VERBOSE | re.MULTILINE)
- re.IGNORECASE / I
- re.MULTILINE / M
- re.DOTALL / S
- re.VERBOSE / X (allow comments & whitespace—great for readability)
- re.ASCII / A (make \w, \d, etc. ASCII-only)
Small, real-world patterns
1) Extract the first integer (python regex search example)
m = re.search(r"\d+", "v=42; threshold=100")
num = int(m.group()) if m else None
2) Validate a simple SKU (python match regex example)
bool(re.fullmatch(r"[A-Z]{2}-\d{4}", "AB-2048"))
3) Split on multiple delimiters
parts = re.split(r"[|;,]\s*", "a|b; c, d") # ['a','b','c','d']
4) Normalize whitespace
clean = re.sub(r"\s+", " ", "a b\tc\n").strip()
5) Parse a date with named groups
m = re.fullmatch(r"(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})", "2025-08-11")
(y, m_, d) = (int(m['y']), int(m['m']), int(m['d'])) if m else (None, None, None)
Non-regex alternatives (simpler is better)
Often, you don’t need python and regex at all. These are cleaner and less error-prone.
Contains / starts / ends
if "ERROR" in line: ...
if path.endswith(".csv"): ...
if slug.startswith("user-"): ...
Split / partition (safer than split(':') + indexing)
key, _, value = line.partition(":")
left, sep, right = text.partition("|") # 'sep' is '' if not found
Replace
msg = msg.replace("\t", " ").replace("  ", " ")  # tabs to spaces, collapse doubles
Slicing
basename = filename[:-4] # drop '.txt'
last5 = s[-5:] # last 5 chars
first_token = s.split()[0] if s.split() else ""
Simple numeric scrape without regex
digits = []
for ch in s:
    if ch.isdigit():
        digits.append(ch)
    elif digits:
        break
num = int("".join(digits)) if digits else None
Common pitfalls (and fixes)
- Forgetting raw strings: write r"\bword\b", not "\\bword\\b".
- Greedy vs. lazy: ".*" will eat too much; prefer ".*?" when you mean “shortest.”
- Using match() when you meant search(): match() is anchored at start.
- Whole-string validation: use fullmatch() (not ^...$ plus search()—just use the right API).
- Reusing patterns in loops: pat = re.compile(r"...") for speed and clarity.
- Escaping user input: wrap with re.escape() before mixing into patterns.
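The match-vs-search and validation pitfalls are easy to see side by side:

```python
import re

s = "id: 42"

# match() is anchored at the start, so it finds nothing here...
print(re.match(r"\d+", s))                  # None
# ...while search() scans the whole string.
print(re.search(r"\d+", s).group())         # 42

# For validation, fullmatch() rejects trailing junk that search() accepts.
print(bool(re.search(r"\d+", "42abc")))     # True (partial hit)
print(bool(re.fullmatch(r"\d+", "42abc")))  # False (whole string must match)
```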
Minimal “toolbelt” you’ll rely on
- search, fullmatch, finditer, sub, split, compile
- \d \w \s, sets [...], quantifiers {m,n}, groups (), named groups (?P<name>...)
- Flags: IGNORECASE, MULTILINE, DOTALL, VERBOSE
- Raw strings r""
With these, you can handle almost every python regex task you’ll meet as a beginner—while still choosing the simpler Pythonic alternative when it’s clearer.
When Should You Use Regex in Python?
Most of the time you should skip using regex in Python for small, one-off text (string) jobs. The runtime difference is noise; readability dominates.
Here’s a crisp way to think about it.
When to Typically Avoid Regex in Python
Prefer built-ins because they’re faster to read and harder to get subtly wrong:
- Trim stuff → str.strip, str.removeprefix, str.removesuffix
- Split and iterate → str.split, str.splitlines
- Cut around a token → str.partition / rpartition
- Simple find → str.find / str.index
- Replace → str.replace
- Keep/drop last/first N → slicing s[:-5], s[-5:]
- Normalize whitespace → ' '.join(s.split())
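As a quick sketch of those built-ins working together (the filename and values are invented):

```python
s = "  report_2025.csv  "

trimmed = s.strip()                    # trim stuff
base = trimmed.removesuffix(".csv")    # drop a known suffix (Python 3.9+)
name, _, year = base.partition("_")    # cut around a token
spaced = " ".join("a  b\t c".split())  # normalize whitespace

print(base, name, year, spaced)  # report_2025 report 2025 a b c
```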
When to Use RegEx in Python
- You need character classes (digits, words, unicode categories) or repetition.
- You need alternation across many tokens that would be clumsy with replace.
- You need overlapping or context-sensitive matches.
- You must handle ill-formed input where structure isn’t reliable.
If you do reach for regex: use raw strings (r""), non-capturing groups (?:...), and re.finditer with named groups only when you truly need them.
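A sketch of those habits in one place (the log string is made up): a raw-string pattern, a non-capturing group for the alternation, and finditer with a single named group.

```python
import re

log = "GET /a 200; POST /b 404; GET /c 500"

# (?:GET|POST) groups the alternation without creating a capture;
# the named group keeps only the piece we care about.
pat = re.compile(r"(?:GET|POST) \S+ (?P<status>\d{3})")
statuses = [m.group("status") for m in pat.finditer(log)]
print(statuses)  # ['200', '404', '500']
```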
Minimal, legible patterns (no “show-off” one-liners)
1) Key-value line into a dict (no regex)
line = "Name: John Doe | Age: 34 | City: St. Paul"
result = {}
for part in line.split(" | "):
    key, _, value = part.partition(": ")
    result[key] = value
2) Drop suffix/prefix (Py3.9+)
s = "report_2025.csv"
s = s.removesuffix(".csv")
s = s.removeprefix("report_")
3) Take everything but last 5 chars
s = s[:-5]
4) Normalize spaces and trim
clean = " ".join(s.split())
5) Split on either comma or semicolon (still no regex)
text = "a,b;c,d"
for ch in ",;":
    text = text.replace(ch, "|")
parts = [p for p in text.split("|") if p]
(If the delimiter list gets long or has patterns, switch to re.split(r"[;,|/]\s*").)
6) Extract the first integer
Readable without regex:
num = None
buf = []
for ch in s:
    if ch.isdigit():
        buf.append(ch)
    elif buf:
        break
if buf:
    num = int("".join(buf))
If you don’t care about decimals/negatives and just want the first run of digits, regex is simpler and still readable:
import re
m = re.search(r"\d+", s)
num = int(m.group()) if m else None
7) Before/after a marker
head, sep, tail = s.partition("TOKEN")
before = head
after = tail if sep else ""
8) “CSV-ish” text
If commas can be quoted/escaped, use the stdlib and avoid cleverness:
import csv
rows = list(csv.reader(s.splitlines()))
Python RegEx Examples
#!/usr/bin/env python3
"""
python_regex_cheatsheet_demo.py
A single-file, runnable tour of practical Python regex patterns
using the standard-library `re` module—paired with simple non-regex
alternatives when they're clearer.
The script favors:
- raw strings for patterns (r"...")
- minimal patterns you'll remember
- readable multi-line code (no show-off one-liners)
"""
import re
from typing import Iterable, Iterator, Optional, Tuple
def show(title: str, value) -> None:
    """Small helper to keep output readable."""
    print(f"\n=== {title}")
    print(value)

def demo_search_match_fullmatch() -> None:
    s = "Order #12345 shipped"
    # search(): find a match anywhere in the string (first occurrence)
    m = re.search(r"\d+", s)
    show("search() first run of digits", m.group() if m else None)
    # match(): anchored at the START of the string
    m2 = re.match(r"Order", s)
    show("match() at start", bool(m2))
    m3 = re.match(r"\d+", s)  # won't match: string doesn't start with a digit
    show("match() wrong anchor", bool(m3))
    # fullmatch(): the *entire* string must match the pattern
    sku = "AB-2048"
    is_valid_sku = bool(re.fullmatch(r"[A-Z]{2}-\d{4}", sku))
    show("fullmatch() SKU", is_valid_sku)

def demo_findall_finditer() -> None:
    s = "IDs: a12 b345 c6"
    # findall(): returns a list of strings (or tuples if groups present)
    ids = re.findall(r"\d+", s)
    show("findall() digits", ids)
    # finditer(): yields Match objects (start/end, groups, etc.)
    spans = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\d+", s)]
    show("finditer() digits with spans", spans)

def demo_groups_and_named_groups() -> None:
    email = "id: user@site.com"
    m = re.search(r"(?P<user>[A-Za-z_]\w*)@(?P<host>[\w.-]+)", email)
    if m:
        show("group(0) full", m.group(0))
        show("group(1) user", m.group(1))
        show("group('host') host", m.group("host"))
        show("groups() tuple", m.groups())
        show("groupdict()", m.groupdict())
    else:
        show("groups", None)

def demo_sub_and_subn() -> None:
    messy = "a b\tc\n d"
    # Collapse any run of whitespace to a single space and trim
    clean = re.sub(r"\s+", " ", messy).strip()
    show("sub() whitespace normalize", clean)
    # subn(): same but also returns the substitution count
    clean2, count = re.subn(r"\s+", " ", messy)
    show("subn() normalized, count", (clean2.strip(), count))
    # Use a function to transform matches (capitalize words)
    text = "hello world\tpython"

    def titlecase(m: re.Match) -> str:
        return m.group(0).capitalize()

    titled = re.sub(r"\b[a-z]+\b", titlecase, text)
    show("sub() function: titlecase words", titled)
def demo_split() -> None:
    s = "a|b; c, d"
    parts = re.split(r"[|;,]\s*", s)
    show("split() on multiple delimiters", parts)

def demo_flags() -> None:
    s = "Line1\nline2\nLINE3"
    # IGNORECASE makes casing irrelevant
    ic = bool(re.search(r"line3", s, flags=re.IGNORECASE))
    show("IGNORECASE", ic)
    # MULTILINE makes ^ and $ work per line instead of whole string
    # Find lines starting with "line" (case-insensitive)
    starts = re.findall(r"^line\w*", s, flags=re.IGNORECASE | re.MULTILINE)
    show("MULTILINE ^ per line", starts)
    # DOTALL makes '.' match newline too
    dotall = re.search(r"Line1.+LINE3", s, flags=re.DOTALL | re.IGNORECASE)
    show("DOTALL spans newlines", bool(dotall))
    # VERBOSE lets you write readable multi-line patterns with comments
    pat = re.compile(
        r"""
        ^\s*                    # start of line, optional whitespace
        (?P<name>[A-Za-z_]\w*)  # variable name
        \s*=\s*                 # equals with optional spaces
        (?P<value>.+?)          # value (non-greedy)
        \s*$                    # trailing whitespace to end of line
        """,
        flags=re.VERBOSE | re.MULTILINE,
    )
    config = "user_name = user\nthreads= 8\n debug = true "
    matches = [m.groupdict() for m in pat.finditer(config)]
    show("VERBOSE pattern parse", matches)
def demo_lookarounds() -> None:
    s = "price:$19, tax:$2, total:$21"
    # Positive lookbehind/ahead: find numbers surrounded by `$` and comma or end
    nums = re.findall(r"(?<=\$)\d+(?=,|$)", s)
    show("lookarounds extract dollar amounts", nums)
    # Negative lookahead: find 'cat' not followed by 'erpillar'
    animals = "cat caterpillar catfish cat cart"
    cats = re.findall(r"cat(?!erpillar)", animals)
    show("negative lookahead", cats)

def demo_escape_user_input() -> None:
    # If user gives a literal string that may contain regex metacharacters,
    # escape it before building a pattern.
    user_literal = ".cfg[1]"
    safe = re.escape(user_literal)
    text = "x.cfg[1] y.cfg[2]"
    m = re.search(safe, text)  # safe literal match
    show("re.escape() literal search", m.group(0) if m else None)

def demo_compile_reuse() -> None:
    # If you reuse a pattern in a loop, compile it once.
    lines = [
        "AB-0001 ok",
        "ZZ-9999 fail",
        "XY-0420 ok",
        "BAD LINE",
    ]
    pat = re.compile(r"(?P<pfx>[A-Z]{2})-(?P<num>\d{4})")
    parsed = []
    for line in lines:
        m = pat.search(line)
        if not m:
            continue
        parsed.append((m["pfx"], int(m["num"])))
    show("compile() reuse in loop", parsed)
def demo_non_regex_alternatives() -> None:
    s = "report_2025.csv"
    # Prefer simple string methods when the task is simple.
    has_csv = s.endswith(".csv")
    base = s.removesuffix(".csv")  # Python 3.9+
    base = base.removeprefix("report_")
    show("non-regex endswith/removesuffix/removeprefix", (has_csv, base))
    line = "Name: John Doe | Age: 34 | City: St. Paul"
    result = {}
    for part in line.split(" | "):
        key, _, value = part.partition(": ")
        result[key] = value
    show("non-regex split/partition key-value", result)
    # Extract first integer without regex (readable loop)
    stream = "v=42; threshold=100"
    digits: list[str] = []
    for ch in stream:
        if ch.isdigit():
            digits.append(ch)
        elif digits:
            break
    num = int("".join(digits)) if digits else None
    show("non-regex first integer", num)

def demo_match_objects_api() -> None:
    s = "Item X23 cost $9 on 2025-08-11"
    m = re.search(r"(?P<code>[A-Z]\d{2}).*?\$(?P<price>\d+)", s)
    if not m:
        show("match object", None)
        return
    show("match.group(0) full", m.group(0))
    show("match.groupdict()", m.groupdict())
    show("match.span('price')", m.span("price"))
def demo_validation_examples() -> None:
    tests = [
        "9A3B00FF",
        "A0Z19",
        "FFFFFFFF",
    ]
    # Hex token: exactly 8 uppercase hex chars
    pat = re.compile(r"[A-F0-9]{8}$")
    results = [bool(pat.fullmatch(t)) for t in tests]
    show("validation: 8-char hex", list(zip(tests, results)))
    emails = ["a@b.co", "bad@", "@oops.com", "x@y.z"]
    # Simple email (not RFC-perfect; good enough for demo)
    email_pat = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")
    show("validation: email", [bool(email_pat.fullmatch(e)) for e in emails])

def demo_greedy_vs_lazy() -> None:
    s = "<tag>alpha</tag><tag>beta</tag>"
    greedy = re.findall(r"<tag>.*</tag>", s)  # one big match
    lazy = re.findall(r"<tag>.*?</tag>", s)   # two small matches
    show("greedy vs lazy", {"greedy": greedy, "lazy": lazy})

def demo_multiline_block_extraction() -> None:
    log = """
[INFO] start
[WARN] low disk
  stack:
    line1
    line2
[INFO] end
""".strip()
    # Extract blocks that start with [WARN] and include following indented lines.
    # MULTILINE anchors ^ at each line; the non-greedy quantifier keeps it tight.
    warn_block = re.search(r"^\[WARN\].*?(?:\n\s+.*)*", log, flags=re.MULTILINE)
    show("block extraction after [WARN]", warn_block.group(0) if warn_block else None)
def demo_backreferences() -> None:
    s = "word1=foo; word2=foo; word3=bar"
    # Find pairs like x=VALUE; ... x=VALUE (same VALUE) using a backreference
    same = re.findall(r"word1=(\w+).*?word2=\1", s)
    show("backreference same value", same)

def demo_unicode_word_boundaries() -> None:
    s = "naïve café resume résumé"
    # \w with default (Unicode) includes letters with accents
    words = re.findall(r"\b\w+\b", s)
    show(r"Unicode \w word extraction", words)
    # If you want ASCII-only behavior, pass re.ASCII
    words_ascii = re.findall(r"\b\w+\b", s, flags=re.ASCII)
    show(r"ASCII-only \w", words_ascii)

def demo_error_handling() -> None:
    # Demonstrate safe handling when no match exists
    s = "no numbers here"
    m = re.search(r"\d+", s)
    value = int(m.group()) if m else None
    show("safe no-match handling", value)

def iterate_file_like(lines: Iterable[str], pat: re.Pattern) -> Iterator[Tuple[int, str]]:
    """Yield (lineno, line) where pattern matches (useful in grep-like tools)."""
    for idx, line in enumerate(lines, 1):
        if pat.search(line):
            yield idx, line.rstrip("\n")
def demo_compile_and_scan_text() -> None:
    text = [
        "2025-08-11 INFO Server started",
        "2025-08-11 WARN Disk 90%",
        "2025-08-11 ERROR Out of space",
        "2025-08-12 INFO Cleanup done",
    ]
    # Simple "grep" for ERROR or WARN
    pat = re.compile(r"\b(?:ERROR|WARN)\b")
    hits = list(iterate_file_like(text, pat))
    show("grep-like scan", hits)

def main() -> None:
    demo_search_match_fullmatch()
    demo_findall_finditer()
    demo_groups_and_named_groups()
    demo_sub_and_subn()
    demo_split()
    demo_flags()
    demo_lookarounds()
    demo_escape_user_input()
    demo_compile_reuse()
    demo_non_regex_alternatives()
    demo_match_objects_api()
    demo_validation_examples()
    demo_greedy_vs_lazy()
    demo_multiline_block_extraction()
    demo_backreferences()
    demo_unicode_word_boundaries()
    demo_error_handling()
    demo_compile_and_scan_text()

if __name__ == "__main__":
    main()