Intro to Bash Regular Expressions

Regular expressions (regex or regexp) are patterns for searching, matching, and manipulating text in Bash scripts. Bash's =~ operator uses POSIX extended regular expressions (ERE), so the syntax is similar to that of other programming languages, but there are some specific nuances to be aware of.

Literal characters: These match themselves, e.g., "a" matches the letter "a".

Metacharacters: These have special meanings, like:
  1. .: Matches any single character.
  2. ^: Matches the beginning of the line (the beginning of the string in [[ ... =~ ... ]]).
  3. $: Matches the end of the line (the end of the string in [[ ... =~ ... ]]).
  4. []: Matches one character from a set (e.g., [aeiou] matches vowels).
  5. *: Matches the preceding character or group zero or more times.
  6. +: Matches the preceding character or group one or more times.
  7. ?: Matches the preceding character or group zero or one time.
  8. \: Escapes the special meaning of the following character.

Here are some basic concepts and examples to help you understand regular expressions in Bash:

Basic Matching

Syntax:
[[ string =~ regex ]]
Example:
string="Hello, World!"
if [[ $string =~ Hello ]]; then
  echo "Match found!"
fi
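
One nuance worth seeing in action: since Bash 3.2, quoting the right-hand side of =~ makes the quoted part match as a literal string instead of a regex. The sketch below is only an illustration; the helper name matches is hypothetical. It stores the pattern in a variable and expands it unquoted, a common way to avoid quoting surprises.

```shell
#!/usr/bin/env bash
# matches STRING REGEX: succeed if STRING matches REGEX.
# (matches is a hypothetical helper name used only for this sketch.)
matches() {
  local string=$1 regex=$2
  # $regex is deliberately left unquoted so it is treated as a regex.
  [[ $string =~ $regex ]]
}

string="Hello, World!"

# Unquoted pattern held in a variable: "." acts as a metacharacter.
if matches "$string" 'H.llo'; then
  echo "regex match"
fi

# Quoted pattern: "." is matched literally, so this does not match.
if [[ $string =~ "H.llo" ]]; then
  echo "literal match"
else
  echo "no literal match"
fi
```

Keeping the pattern in a single-quoted variable also keeps characters such as backslashes and parentheses away from the shell's own parsing.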

Character Classes

Syntax:
.: Matches any character.
[]: Matches any one of the characters inside the brackets.
[^]: Matches any character not inside the brackets.
Example:
string="abc123"
if [[ $string =~ [0-9] ]]; then
  echo "Digit found!"
fi
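
Bracket expressions can also contain POSIX named classes such as [:digit:] or [:upper:], and a leading ^ negates the set. The following is a minimal sketch of both ideas; classify is a hypothetical helper name, not a Bash builtin.

```shell
#!/usr/bin/env bash
# classify STRING: print which kinds of characters STRING contains.
# (classify is a hypothetical helper name used only for this sketch.)
classify() {
  local string=$1 result=""
  [[ $string =~ [[:digit:]] ]] && result+="digit "
  [[ $string =~ [[:upper:]] ]] && result+="upper "
  # [^...] negates the set: any character that is not a letter or digit.
  [[ $string =~ [^[:alnum:]] ]] && result+="other "
  # Strip the trailing space before printing.
  echo "${result% }"
}

classify "abc123"     # prints: digit
classify "Abc 123"    # prints: digit upper other
```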

Anchors

Syntax:
^: Anchors the regex at the beginning of the string.
$: Anchors the regex at the end of the string.
Example:
string="Start with this"
if [[ $string =~ ^Start ]]; then
  echo "Start found at the beginning!"
fi
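
Using both anchors together forces the regex to cover the entire string, which is a typical way to validate input. As a rough sketch (is_integer is a hypothetical helper name), the function below accepts only an optionally signed run of digits:

```shell
#!/usr/bin/env bash
# is_integer STRING: succeed only if the whole string is an optionally
# signed run of digits. (is_integer is a hypothetical helper name.)
is_integer() {
  [[ $1 =~ ^[+-]?[0-9]+$ ]]
}

for value in "42" "-7" "4x2" "12 "; do
  if is_integer "$value"; then
    echo "'$value' is an integer"
  else
    echo "'$value' is not an integer"
  fi
done
```

Without the anchors, "4x2" would pass, because [0-9]+ would be satisfied by the "4" alone.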

Quantifiers

Syntax:
*: Matches 0 or more occurrences.
+: Matches 1 or more occurrences.
?: Matches 0 or 1 occurrence.
{n}: Matches exactly n occurrences.
{n,}: Matches n or more occurrences.
{n,m}: Matches between n and m occurrences.
Example:
string="aaaab"
if [[ $string =~ a{3,} ]]; then
  echo "Three or more 'a' found!"
fi
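
The interval forms {n} and {n,m} are handy for fixed-width fields. Here is a small sketch, assuming a date shape of YYYY-MM-DD with optional leading zeros; looks_like_date is a hypothetical helper name, and it checks the shape only, not whether the date exists.

```shell
#!/usr/bin/env bash
# looks_like_date STRING: succeed if STRING has the shape YYYY-M-D or
# YYYY-MM-DD. Only the shape is checked, not calendar validity.
# (looks_like_date is a hypothetical helper name.)
looks_like_date() {
  local regex='^[0-9]{4}-[0-9]{1,2}-[0-9]{1,2}$'
  [[ $1 =~ $regex ]]
}

looks_like_date "2024-03-15" && echo "2024-03-15: ok"
looks_like_date "2024-3-5"   && echo "2024-3-5: ok"
looks_like_date "24-03-15"   || echo "24-03-15: rejected"
```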

Grouping and Alternation

Syntax:
(): Groups expressions.
|: Alternation (matches either of the patterns).
Example:
string="apple"
if [[ $string =~ (apple|orange) ]]; then
  echo "Fruit found!"
fi
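
Groups do more than scope alternation: after a successful =~ match, Bash stores the text captured by each parenthesized group in the BASH_REMATCH array. The sketch below assumes a three-part version string as input; split_version is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# split_version STRING: print the major, minor and patch parts of a
# version such as "1.24.3", one per line.
# (split_version is a hypothetical helper name.)
split_version() {
  local regex='^([0-9]+)\.([0-9]+)\.([0-9]+)$'
  if [[ $1 =~ $regex ]]; then
    # BASH_REMATCH[0] is the whole match; [1], [2], ... are the groups.
    echo "major=${BASH_REMATCH[1]}"
    echo "minor=${BASH_REMATCH[2]}"
    echo "patch=${BASH_REMATCH[3]}"
  else
    return 1
  fi
}

split_version "1.24.3"
```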

Escape Characters

Syntax:
\: Escapes a special character.
Example:
string="This is a dot: ."
if [[ $string =~ \. ]]; then
  echo "Dot found!"
fi
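
To see why the escape matters, compare a string that contains a real dot with one that does not. This is a minimal sketch; has_decimal is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# has_decimal STRING: succeed if STRING contains digits, a literal dot,
# and more digits, e.g. "3.14". (has_decimal is a hypothetical helper name.)
has_decimal() {
  local regex='[0-9]+\.[0-9]+'
  [[ $1 =~ $regex ]]
}

has_decimal "pi is 3.14" && echo "decimal found"
has_decimal "3x14"       || echo "no decimal in 3x14"

# A bracket expression is an alternative to the backslash: inside [],
# "." has no special meaning.
[[ "3.14" =~ [0-9]+[.][0-9]+ ]] && echo "bracket form also matches"
```

With an unescaped dot, the pattern [0-9]+.[0-9]+ would also accept "3x14", since "." would match the "x".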

Case-Insensitive Matching

Syntax:
shopt -s nocasematch: Enables case-insensitive matching.
Example:
shopt -s nocasematch
string="CaseInsensitive"
if [[ $string =~ caseinsensitive ]]; then
  echo "Case-insensitive match found!"
fi
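
Because shopt -s nocasematch stays in effect for the rest of the shell session, it can silently change later comparisons. One possible way to contain it, sketched here with a hypothetical helper named imatch, is to run the match in a subshell:

```shell
#!/usr/bin/env bash
# imatch STRING REGEX: case-insensitive match that leaves the caller's
# nocasematch setting untouched. (imatch is a hypothetical helper name.)
imatch() (
  # The function body is a subshell, so the shopt change stays inside it.
  shopt -s nocasematch
  [[ $1 =~ $2 ]]
)

imatch "CaseInsensitive" "caseinsensitive" && echo "matched, ignoring case"

# Outside the helper, matching is case-sensitive again.
[[ "CaseInsensitive" =~ caseinsensitive ]] || echo "no match, case matters"
```

The trade-off of the subshell is that BASH_REMATCH is not visible to the caller. If captured groups are needed, an alternative is to enable the option and then turn it off again with shopt -u nocasematch.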

Extract email addresses from a file

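grep -E '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$' emails.txt

Because the pattern is anchored with ^ and $, grep prints only the lines of emails.txt that consist entirely of one email address.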

This regex matches:

^: Beginning of line.
[a-zA-Z0-9._%+-]+: One or more alphanumeric characters, ".", "_", "%", "+", or "-".
@: The "@" symbol.
[a-zA-Z0-9.-]+: One or more alphanumeric characters, ".", or "-".
\.[a-zA-Z]{2,}$: A dot followed by 2 or more letters at the end of the line.
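
To pull addresses out of running text rather than print whole lines, one option is to drop the anchors and use the -o flag, which GNU and BSD grep provide to print only the matched part. The sketch below is illustrative: extract_emails, the temporary file, and the sample text are assumptions made for the example.

```shell
#!/usr/bin/env bash
# Same pattern without ^ and $, so it can match inside a longer line.
email_regex='[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'

# extract_emails FILE: print every email-like substring, one per line.
# (extract_emails is a hypothetical helper name.)
extract_emails() {
  grep -oE "$email_regex" "$1"
}

# Build a small sample file for the demonstration.
sample=$(mktemp)
printf '%s\n' \
  'Contact alice@example.com or bob.smith@mail.example.org.' \
  'No address on this line.' > "$sample"

extract_emails "$sample"
rm -f "$sample"
```

This pattern is a practical approximation, not a full validator: it accepts some strings that are not valid addresses and rejects some unusual ones that are.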

Conclusion

Regular expressions (regex) are patterns used for string matching and manipulation. In Bash they are applied with the =~ operator inside [[ ]] and built from metacharacters such as [], ^, $, *, +, ?, (), |, and \, allowing for powerful text processing within scripts. Practicing with these basics is the quickest way to put regex to work in your own Bash scripts.