Regex / grep: Learning Regex Fast! ++IP Whois CheatSheets!

In this guide we go over the power of regex for all your pattern-matching needs!

Regex - Stands for 'Regular Expression' and is used in pattern matching.

  • Regex has flavors. Python regex does not exactly match Java regex, which does not exactly match the regex used by Linux tools, and so on (you need to study each one).
  • The flavors can be subtly close in their expressions.
  • People often struggle with regex for some time because they look up one flavor of regex, not realizing it doesn't match the regex engine that they are using.
  • The fastest way to learn regex is in a live feedback loop, where your matches are highlighted in the output as you type.
  • A very good engine for live testing is regex101.com
  • Start with a simple regex and build it up, adding a bit at a time.

For our testing text we will use a snippet of the log from an nginx reverse proxy, namely:

"Mozilla/5.0 (Windows NT 10.0; rv:128.0) Gecko/20100101 Firefox/128.0" "https://www.hotconfig.com/ghost/"
[15/Jun/2025:17:53:11 +0000] - 200 200 - GET https www.hotconfig.com "/ghost/api/admin/snippets/?limit=all" [Client 76.39.61.17] [Length 106] [Gzip -] [Sent-to 107.152.41.231] "Mozilla/5.0 (Windows NT 10.0; rv:128.0) Gecko/20100101 Firefox/128.0" "https://www.hotconfig.com/ghost/"
[15/Jun/2025:17:54:20 +0000] - 201 201 - POST https www.hotconfig.com "/ghost/api/admin/images/upload/" [Client 76.39.61.17] [Length 120] [Gzip -] [Sent-to 107.152.41.231] "Mozilla/5.0 (Windows NT 10.0; rv:128.0) Gecko/20100101 Firefox/128.0" "https://www.hotconfig.com/ghost/"
[15/Jun/2025:17:54:21 +0000] - 200 200 - GET https www.hotconfig.com "/content/images/2025/06/Screenshot-at-2025-06-15-07-54-08.png" [Client 76.39.61.17] [Length 94673] [Gzip -] [Sent-to 107.152.41.231] "Mozilla/5.0 (Windows NT 10.0; rv:128.0) Gecko/20100101 Firefox/128.0" "https://www.hotconfig.com/ghost/"
[15/Jun/2025:17:54:21 +0000] - 200 200 - GET https www.hotconfig.com "/ghost/api/admin/slugs/post/Re/" [Client 76.39.61.17] [Length 25] [Gzip -] [Sent-to 107.152.41.231] "Mozilla/5.0 (Windows NT 10.0; rv:128.0) Gecko/20100101 Firefox/128.0" "https://www.hotconfig.com/ghost/"
[15/Jun/2025:17:54:21 +0000] - 201 201 - POST https www.hotconfig.com "/ghost/api/admin/posts/" [Client 76.39.61.17] [Length 901] [Gzip -] [Sent-to 107.152.41.231] "Mozilla/5.0 (Windows NT 10.0; rv:128.0) Gecko/20100101 Firefox/128.0" "https://www.hotconfig.com/ghost/"
[15/Jun/2025:17:54:22 +0000] - 200 200 - GET https www.hotconfig.com "/ghost/api/admin/settings/?group=site%2Ctheme%2Cprivate%2Cmembers%2Cportal%2Cnewsletter%2Cemail%2Camp%2Clabs%2Cslack%2Cunsplash%2Cviews%2Cfirstpromoter%2Coauth%2Ceditor" [Client 76.39.61.17] [Length 2666] [Gzip -] [Sent-to 107.152.41.231] "Mozilla/5.0 (Windows NT 10.0; rv:128.0) Gecko/20100101 Firefox/128.0" "https://www.hotconfig.com/ghost/"
[15/Jun/2025:17:54:22 +0000] - 200 200 - GET https www.hotconfig.com "/ghost/api/admin/newsletters/?filter=status%3Aactive&order=sort_order%20ASC" [Client 76.39.61.17] [Length 741] [Gzip -] [Sent-to 107.152.41.231] "Mozilla/5.0 (Windows NT 10.0; rv:128.0) Gecko/20100101 Firefox/128.0" "https://www.hotconfig.com/ghost/"

Python Regex (Ref / pythex.org)

  • Paste in the above text, then start typing out a regular expression:
You can see as you type your Regex - it will show you your matches.

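We can quickly see the matches. So a match to get the client IP addresses becomes simply:

\[Client [0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\]

We can watch the matches update as we type, and quickly discern which characters require escaping. Here the brackets '[' and ']' and each dot '.' need a backslash to be matched literally; an unescaped '.' would match any character.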
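To try the same idea outside a live tester, here is a minimal sketch using Python's standard `re` module. The sample line is copied from the log above and shortened, and the variable names are made up for illustration. The dots are written as `\.` so that each one matches a literal period.

```python
import re

# One line from the sample log above (shortened; the trailing fields are omitted).
line = ('[15/Jun/2025:17:53:11 +0000] - 200 200 - GET https www.hotconfig.com '
        '"/ghost/api/admin/snippets/?limit=all" [Client 76.39.61.17] [Length 106]')

# \[ and \] match literal brackets; \. matches a literal dot.
# The parentheses capture only the address, without "[Client " and "]".
client_ip = re.compile(r"\[Client ([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})\]")

match = client_ip.search(line)
if match:
    print(match.group(0))  # the whole match, brackets included
    print(match.group(1))  # just the captured address
```

The capture group is optional; it is only there to show how to pull out the address without the surrounding text.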

Some simple expression examples

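[0-9]           - Matches a single character in the range 0-9
[0-9a-z]        - Matches a single character 0-9 or a-z
[0-9a-zA-Z:]    - Matches a single character 0-9, a-z, A-Z or :
                  (no commas between the ranges: a comma inside [] is matched as a literal ',')
[0-9]{1,5}:     - Matches characters 0-9 repeating 1-5 times followed by :
[.abc]          - Matches the literal character '.' or a, b, c (inside a set the dot is not a wildcard)
.               - Means a wildcard for any single character (only outside the brackets [])
?               - Means 0 or 1 occurrence of the preceding item (it makes the item optional)
*               - Means 0 or more occurrences of the preceding item
+               - Means 1 or more occurrences of the preceding item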
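A few of these rules are easy to check in Python's `re` module. The following is a small sketch with made-up test strings; it shows that a comma inside a set is matched literally, that the dot changes meaning inside and outside a set, and how the `?`, `*` and `+` quantifiers differ.

```python
import re

# A comma inside [] is one more literal character in the set, not a separator.
with_comma = re.findall(r"[0-9,a-z]", "a,1")
without_comma = re.findall(r"[0-9a-z]", "a,1")
print(with_comma)     # the comma shows up as a match
print(without_comma)  # the comma is skipped

# Inside [] the dot is literal; outside it matches any single character.
dot_in_set = re.findall(r"[.abc]", "a.x")
dot_wildcard = re.findall(r"a.c", "abc a-c")
print(dot_in_set)
print(dot_wildcard)

# ? allows 0 or 1, * allows 0 or more, + requires at least 1.
optional = re.fullmatch(r"ab?", "a") is not None
star = re.fullmatch(r"ab*", "abbb") is not None
plus = re.fullmatch(r"ab+", "a") is not None
print(optional, star, plus)
```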

For instance the Python regex pattern

200?

Would match '20' and '200', because the final zero is optional. In '205' it matches only the leading '20' and leaves the '5' unmatched.
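The difference between matching a whole string and finding a match somewhere inside it can be checked with a short Python sketch. The `results` dictionary is only there for illustration; `fullmatch` and `search` are standard methods of a compiled pattern.

```python
import re

pattern = re.compile(r"200?")  # "20" followed by an optional "0"

results = {}
for text in ("20", "200", "205"):
    whole_string = pattern.fullmatch(text) is not None  # must consume all of text
    first_hit = pattern.search(text).group(0)           # what a live tester would highlight
    results[text] = (whole_string, first_hit)
    print(text, whole_string, first_hit)
```

A live tester highlights what `search` finds, which is why '205' still shows a partial highlight there.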

Grep Regex (Command Line / IP filtering)

  • grep is the Linux command-line pattern matching tool. There isn't really any 'live testing' equivalent, and it stands on its own.
  • grep / egrep / fgrep / rgrep are all really part of grep
egrep = grep -E
fgrep = grep -F
rgrep = grep -r
  • grep can be considered to have 4 levels of pattern matching depending on the options passed. In GNU grep these are -F (fixed strings), -G (basic regex), -E (extended regex) and -P (Perl-compatible regex).
  • If none is specified, -G (basic regex) is the default. "Fixed Strings" matching has to be requested with -F.
  • Never feel bad if you don't know or use it. I've known Linux administrators with a decade of experience who cannot use grep. Others rock at it.
cat *.log | grep "GET"

Will display the entire line if the word "GET" is found anywhere in it.

cat *.log | grep -o -F "[warn]"

Will return only the matching portion of each line (-o), treating the pattern as "Fixed Strings" (-F) so that the brackets are matched literally:

[warn]
[warn]
...
cat *.log | grep -o -G "\[warn\]"

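Will return the same thing, however -G means standard (basic) regex matching, and now we must escape the '[' character with a '\[' so that it is not read as the start of a character set. In basic regex a repetition count is likewise written with backslashed braces, '\{' and '\}':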
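The same fixed-string versus regex distinction can be sketched in Python. The log line below is hypothetical, and `re.escape` is used here as a rough stand-in for what `grep -F` does; it is not how grep is implemented.

```python
import re

line = "[warn] disk full"  # hypothetical log line, for illustration only

# Unescaped, [warn] is a set: it matches any ONE of the characters w, a, r, n.
as_set = re.findall(r"[warn]", line)

# re.escape() backslash-escapes the special characters, which gives a
# similar effect to grep -F: the text is matched literally.
literal = re.findall(re.escape("[warn]"), line)

print(as_set)
print(literal)
print(re.escape("[warn]"))
```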

cat proxy-host-1_error.log | grep -o -G "client: [0-9]\{1,3\}"

Will return:

client: 216
client: 216
...

However if we use extended regex (-E), the braces are written without backslashes, as in:

cat proxy-host-1_error.log | grep -o -E "client: [0-9]{1,3}"

Will return:

client: 216
client: 216
...

So we can see how to build up our regex slowly.

Matching an IP..

We can use grep to match all IP addresses in a log with:

cat proxy-host-2_access.log | grep -o -e "[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}"

Will return an unsorted list of addresses, in the order it finds them.

We can now use sort and uniq to quickly give us an ordered list of IP addresses!

cat proxy-host-2_access.log | grep -o -e "[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}" | sort | uniq

Which is nicely sorted and listed without duplicates.
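For comparison, a rough Python equivalent of the `grep -o ... | sort | uniq` pipeline might look like the sketch below. The function name `unique_ips` is made up for illustration, and the two sample lines are shortened from the log above. Like plain `sort`, the result is ordered as text, not numerically.

```python
import re

# Same idea as the grep pattern: four groups of 1-3 digits joined by literal dots.
IP_PATTERN = re.compile(r"[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}")

def unique_ips(lines):
    """Return every distinct IP-looking string found in lines, sorted as text."""
    found = set()                      # a set drops duplicates, like uniq
    for line in lines:
        found.update(IP_PATTERN.findall(line))
    return sorted(found)               # text ordering, like plain sort

# Two lines shortened from the sample log above.
sample = [
    "[15/Jun/2025:17:53:11 +0000] - 200 200 - GET [Client 76.39.61.17] [Sent-to 107.152.41.231]",
    "[15/Jun/2025:17:54:20 +0000] - 201 201 - POST [Client 76.39.61.17] [Sent-to 107.152.41.231]",
]
ips = unique_ips(sample)
print(ips)
```

To run it against a real file you could pass an open file object, since iterating over a file yields its lines.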

Feeding grep filtering back into whois:

In one command you can filter an entire IP log and look up all the addresses with whois! The whois servers may get upset if you run this too much, so use it sparingly:

cat proxy-host-2_access.log | grep -o -e "[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}" | sort | uniq | xargs  -I{} whois {} >> collect.txt

Going over this very powerful command:

cat - concatenate or print the file to the console.

| - pipe operator, take the output of the previous command and pipe it into the next command.

grep -o -e "[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}"

- means grep only the matching portion of a line (-o), using the basic regex pattern given after -e for IP addresses. Each '\.' matches a literal dot; an unescaped '.' would match any character.

Note the difference if we used 'extended-regex', where the braces are not escaped:

grep -o -E "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"

sort - Sort the results as they are passed in through the pipe.

uniq - Only allow a line to show up once. You must run sort before this, because uniq only removes duplicates that sit on adjacent lines!

xargs -I{} whois {} - xargs reads the incoming lines and runs one command per line, substituting the line wherever {} appears. What this effectively becomes:

whois 9.1.2.3
...
repeat for each returned IP address

>> - Redirection operator (append). Each run adds to the end of the file. If you wanted to overwrite the file each time, you would use (>).

collect.txt - where to put the redirection.
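The xargs and append steps can also be sketched in Python. This is an illustration only: it assumes a `whois` command is installed and on the PATH, and the function names `run_whois` and `collect` are made up here. The `lookup` parameter exists so the file handling can be tried with a stand-in function, without querying any whois server.

```python
import subprocess

def run_whois(ip):
    """Run the system whois command for one address and return its text output.

    Assumes a whois binary is installed; this is not checked here.
    """
    result = subprocess.run(["whois", ip], capture_output=True, text=True, check=False)
    return result.stdout

def collect(ips, out_path, lookup=run_whois):
    """Call lookup once per address, like xargs -I{}, and append the results."""
    # Mode "a" appends like >>; mode "w" would overwrite like >.
    with open(out_path, "a", encoding="utf-8") as out:
        for ip in ips:
            out.write(lookup(ip))
```

A call such as `collect(["9.1.2.3"], "collect.txt")` would then append the whois output for that address, one lookup per address in the list.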

grep stacking:

grep stacking can easily be accomplished by piping the output of one grep command into another grep command, consider:

cat *.log | grep -E "GET" | grep -v -F "1.1.1.1"
  • This command cats every file that ends with '.log' to the console, which is piped '|' into the first grep command.
  • The first grep command will filter and pass a line only if "GET" is somewhere in it.
  • The second grep command is an inverse filter (-v). If '1.1.1.1' is found, the line is not passed to the output. The -F makes the dots literal; without it each '.' matches any character, so an address such as 21.121.1.5 would also be dropped.
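The stacking idea, where each filter consumes the output of the previous one, can be sketched in Python with generators. The helper name `grep` and the three sample lines are hypothetical, and the matching is plain substring matching (closest to `grep -F`), not regex.

```python
lines = [
    'GET "/ghost/" [Client 76.39.61.17]',      # hypothetical, shortened lines
    'POST "/ghost/api/" [Client 76.39.61.17]',
    'GET "/ghost/" [Client 1.1.1.1]',
]

def grep(lines, text, invert=False):
    """Yield lines that contain text (or, with invert=True, lines that do not)."""
    for line in lines:
        if (text in line) != invert:
            yield line

# Each stage consumes the previous one, like commands joined with |.
stage1 = grep(lines, "GET")
stage2 = grep(stage1, "1.1.1.1", invert=True)
result = list(stage2)
print(result)
```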

Java / PHP / Golang / .NET / Rust regex is very well covered at regex101.com: simply live-type your regex string against your test text.

grep power trick (combining with tail -f)

  • Say you want to follow a log. Typically it is done with:
tail -f mylog.log
  • To grep filter this we need to turn on line buffering, so that each matching line is written out immediately instead of being held in a buffer, thus:
tail -f mylog.log | grep --line-buffered "myfilter"

If that does not work you may also do it in this fashion:

tail -f proxy-host-2_access.log | stdbuf -o0 grep -v "76.39.61.17"
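The buffering point carries over to any filter you write yourself. Below is a minimal Python sketch of an exclude filter that flushes after every line; the function name `follow_filter` is made up, and an in-memory `io.StringIO` stands in for the live log so the example finishes on its own.

```python
import io
import sys

def follow_filter(stream, exclude, out=sys.stdout):
    """Copy lines from stream to out, skipping any line that contains exclude."""
    for line in stream:
        if exclude not in line:
            out.write(line)
            out.flush()  # push each line out immediately, like --line-buffered

# Stand-in for a live log; with a real pipe you would pass sys.stdin instead.
fake_log = io.StringIO(
    "GET / [Client 76.39.61.17]\n"
    "GET / [Client 9.1.2.3]\n"
)
captured = io.StringIO()
follow_filter(fake_log, "76.39.61.17", out=captured)
print(captured.getvalue(), end="")
```

Without the flush, output sent to a pipe or file may sit in a buffer until it fills, which is the same delay the grep options above are working around.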

Summary: Linux is crazy powerful, but you have to work at it persistently to get its real power out. Judicious and prodigious documentation will help you down the road, or at least a good solid list of bookmarks that show you where to look. It is typical that you will have a giant burst of grep work and then go several years without looking at it. Competent system administrators take the time to be organized in their documentation so that this information is quickly retrieved!

Linux Rocks Every Day