Splunk> SPL REGEX

A Regular Expression, or REGEX, is a way of specifying a sequence of characters as a search pattern. These patterns are used to find/match text within a data set and return the matched results. Regular expressions can simply be thought of as “wildcards on steroids”, as they allow very deep and specific pattern matching.
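To make the “wildcards on steroids” idea concrete, here is a small sketch in Python. Python is used here only to try patterns outside Splunk, and the user names are made up. A wildcard can only say “starts with user”, while a regex can also insist that what follows is made of digits.

```python
import fnmatch
import re

# Hypothetical user names, purely for illustration.
names = ["user1", "user42", "userX", "admin"]

# Wildcard: "user" followed by anything at all.
wildcard_hits = [n for n in names if fnmatch.fnmatchcase(n, "user*")]

# Regex: "user" followed by one or more digits, and nothing else.
regex_hits = [n for n in names if re.fullmatch(r"user\d+", n)]

print(wildcard_hits)  # ['user1', 'user42', 'userX']
print(regex_hits)     # ['user1', 'user42']
```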

Do you need to learn or use regex within Splunk? For basic searches, the answer is “NO”. By default Splunk does a good job of automatically extracting interesting fields and displaying them on the left of the search panel, but this only works if Splunk can recognise the field as a key/value pair, e.g. key=value. So something like username=xxx is fine, but something like “password wrong for user xxx” would not be extracted. Splunk’s Interactive Field Extractor (IFX) can help build a regex for cases like this, but this is really where regex steps in and takes a bow…
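The difference between the two kinds of event can be sketched in Python. This is a rough illustration under my own assumptions, not how Splunk implements its automatic extraction. The action=login pair is hypothetical. One generic key=value pattern finds the pairs in the first event, but finds nothing in the free-text one. For free text you have to write a pattern that describes the context around the value.

```python
import re

kv_event = "2021/04/08 11:21:39 username=xxx action=login"
text_event = "password wrong for user xxx"

# One generic key=value pattern (a rough stand-in for automatic extraction).
generic_kv = re.compile(r"(\w+)=(\S+)")
kv_fields = dict(generic_kv.findall(kv_event))   # {'username': 'xxx', 'action': 'login'}
text_kv = dict(generic_kv.findall(text_event))   # {} because there is no key=value pair

# Free text needs a pattern that describes what surrounds the value.
user_match = re.search(r"for user (?P<user>\w+)", text_event)
print(user_match.group("user"))  # xxx
```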


REGEX – The Basics

A regex is built from a few basic elements (shown below), which work together to specify the search pattern. This is not an in-depth regex tutorial; it’s a basic overview of some of the elements available and how they integrate into Splunk.

Control Characters
^        start of line, or NOT when it is the first character inside [ ]
$        end of line
.        any character (excluding newline)
*        match 0 or more times
+        match 1 or more times
?        match 0 or 1 times, or, placed after another repetition (e.g. *?), take the shortest match
|        alternative
\        escape character
[ ]      set of characters
{ }      repetition modifier

Character Types
\s       white space
\S       NOT white space
\d       digit (numeric 0-9)
\D       NOT a digit
\w       word character (letter, digit or _)
\W       NOT a word character
\b       word boundary (e.g. the end of a word followed by a space or comma)
\B       NOT a word boundary (e.g. between two letters inside a word)

Repetition
a*       0 or more a’s
a+       1 or more a’s
a?       0 or 1 a’s (optional a)
a{m}     exactly m a’s
a{m,}    at least m a’s
a{m,n}   at least m, but at most n a’s

Basic Elements of REGEX
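You can try these elements outside Splunk with Python’s re module. Splunk uses PCRE (Perl Compatible Regular Expressions), and for plain ASCII text the elements in the table above behave the same way in both. The sample line below is made up. Note that without a multiline flag, ^ and $ anchor to the start and end of the whole string, which for a single-line event is the same as the line.

```python
import re

line = "user bob logged in 3 times"

# ^ and $ anchor the pattern to the start and end of the string.
starts_with_user = re.search(r"^user", line) is not None   # True
ends_with_times = re.search(r"times$", line) is not None   # True

# \d matches a digit; \w+ matches a run of word characters.
digits = re.findall(r"\d", line)   # ['3']
words = re.findall(r"\w+", line)   # ['user', 'bob', 'logged', 'in', '3', 'times']

# ^ as the first character inside [ ] negates the set.
not_letter_or_space = re.findall(r"[^a-z ]+", line)   # ['3']

# {m,} repetition: runs of at least 5 word characters.
long_words = re.findall(r"\w{5,}", line)   # ['logged', 'times']

# ? after another repetition asks for the shortest match instead of the longest.
tagged = "<b>bold</b>"
greedy = re.search(r"<.+>", tagged).group()   # '<b>bold</b>'
lazy = re.search(r"<.+?>", tagged).group()    # '<b>'
```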

The following samples show how these basic elements work together to match a pattern.

Examples:

Expression    Matches
abc           exact match “abc” anywhere in the search string
^abc          exact match “abc” at the start of the search string
xyz$          exact match “xyz” at the end of the search string
a|b|c         match either “a”, “b” or “c” anywhere in the search string
^abc|xyz$     match either “abc” at the start or “xyz” at the end of the search string
ab{2}c        exact match an “a”, followed by exactly 2 x “b”, followed by a “c” (i.e. “abbc”)
ab{2,}c       match an “a”, followed by at least 2 x “b”, followed by a “c”
ab{2,4}c      match an “a”, followed by 2, 3 or 4 x “b”, followed by a “c”
a.c           match an “a”, followed by any character, followed by a “c”
a\.c          exact match an “a”, followed by a “.” (dot), followed by a “c”
[Aa]bc        match either “Abc” or “abc”
[abc]+        match a non-empty string made up only of a’s, b’s and/or c’s
[^abc]+       match a non-empty string which does not contain any a’s, b’s or c’s
[^"]+         match a run of characters that are not double quotes, so in "abc" it returns abc, i.e. it drops the quotes
\d\d          match any 2 digits, same as \d{2}
abc\b         matches “abc” followed by a word boundary, so “abc,” but not “abcd”
abc\B         matches “abc” NOT followed by a word boundary, so “abcd”, but not “abc,”
\w+           matches any non-empty word (a sequence of letters, digits or underscores)
\w{7}         matches any 7 word characters in a row, including the first 7 of a longer word
\w{7}\b       matches 7 word characters followed by a word boundary, i.e. a 7-letter word or the last 7 letters of a longer one (use \b\w{7}\b for whole 7-letter words only)
\bis\b        matches “is” as a whole word, because it’s preceded and followed by a boundary
is\b          matches “is” at the end of a word, e.g. “is” or “this”, because it’s followed by a boundary
REGEX examples and meanings
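As a sanity check, several rows of the table above can be tested with Python’s re.search, which, like the table, looks for a match anywhere in the string. Python is only a stand-in for Splunk’s PCRE engine here, and the test strings are my own. The last lines show the quote-stripping and 7-letter-word rows in action.

```python
import re

# (pattern, text, should_match), taken from rows of the table.
cases = [
    (r"^abc", "abcdef", True),
    (r"^abc", "xabc", False),
    (r"xyz$", "wxyz", True),
    (r"ab{2}c", "abbc", True),
    (r"ab{2}c", "abbbc", False),
    (r"ab{2,4}c", "abbbbc", True),
    (r"a.c", "a-c", True),
    (r"a\.c", "abc", False),
    (r"[Aa]bc", "Abc", True),
    (r"abc\b", "abc,", True),
    (r"abc\b", "abcd", False),
    (r"abc\B", "abcd", True),
    (r"\bis\b", "this", False),
    (r"is\b", "this", True),
]

# Collect any row whose behaviour differs from what the table says.
mismatches = [
    (pattern, text)
    for pattern, text, expected in cases
    if (re.search(pattern, text) is not None) != expected
]
print(mismatches)  # []

# [^"]+ returns the text inside the quotes, without them.
inside_quotes = re.search(r'[^"]+', '"abc"').group()   # 'abc'

# \w{7} also matches inside longer words; \b\w{7}\b only whole 7-letter words.
seven = re.findall(r"\w{7}", "example examples")           # ['example', 'example']
seven_exact = re.findall(r"\b\w{7}\b", "example examples") # ['example']
```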

Using REGEX in Splunk with REX

So you have some basic knowledge of regex, and now you want to use it… One of the ways in which Splunk uses regex allows the result of the regex to be stored in a field, which can then be used in other parts of the search and in the output (I prefer this method; no reason, I just do).

rex

rex is an SPL (Search Processing Language) command that extracts fields from the raw data based on a pattern specified using a regular expression (regex).

The rex command does the extraction, and the field that receives the output is named inside the regex itself, using a named capture group (?<fieldname>pattern). An example of the syntax is below:

rex field=_raw "invalid\suser\s(?<faileduserid>\w+)\s"

The above shows how a simple regex, “invalid\suser\s(?<faileduserid>\w+)\s”, is included in the Splunk search query. This regex selects a word of any length that comes after the string “invalid user ” and before the next white space character, and places it in the field “faileduserid”.

Against the log entry for a failed password, “Failed password for invalid user appserver from 194.8.74.23 port 3351 ssh2”, it extracts appserver.
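The same extraction can be reproduced outside Splunk as a quick check of the pattern. Here is a sketch using Python’s re module on the sample log entry. One assumption to note: Python writes a named group as (?P<name>...) rather than the (?<name>...) form used in the rex command, but the rest of the pattern is unchanged.

```python
import re

event = ("Failed password for invalid user appserver "
         "from 194.8.74.23 port 3351 ssh2")

# Same pattern as the rex example, with Python's (?P<name>...) group syntax.
pattern = re.compile(r"invalid\suser\s(?P<faileduserid>\w+)\s")

match = pattern.search(event)
faileduserid = match.group("faileduserid") if match else None
print(faileduserid)  # appserver
```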

The new field can then be searched on, for example by adding: | where faileduserid="appserver"
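To see what the extract-then-filter step does across several events, here is a rough Python analogue of rex followed by where. Only the first event comes from the sample data quoted above. The other two are hypothetical. Events the regex does not match simply get no faileduserid value, so the filter drops them.

```python
import re

# Hypothetical events; only the first is from the sample data.
events = [
    "Failed password for invalid user appserver from 194.8.74.23 port 3351 ssh2",
    "Failed password for invalid user admin from 10.0.0.5 port 4000 ssh2",
    "Accepted password for bob from 10.0.0.6 port 4001 ssh2",
]

pattern = re.compile(r"invalid\suser\s(?P<faileduserid>\w+)\s")

# Step 1, roughly "| rex ...": add the field to each event (None when no match).
rows = []
for raw in events:
    m = pattern.search(raw)
    rows.append({"_raw": raw, "faileduserid": m.group("faileduserid") if m else None})

# Step 2, roughly '| where faileduserid="appserver"': keep matching events only.
appserver_rows = [r for r in rows if r["faileduserid"] == "appserver"]
print(len(appserver_rows))  # 1
```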

The rex command can be run against the raw data (field=_raw, which is also the default when no field is specified). Alternatively, if your events are large, you can target rex at a particular field within the event using rex field=<field>, which can aid performance because the regex has less text to scan.


Example Search for User Authentication Failure

The following shows an example search for a user authentication failure.

This splits the error message into fields faileduserid, failedfromIP and failedport.

Splunk SPL REGEX Search and Select
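The search itself is shown as a screenshot in the original post and isn’t reproduced here. The idea can be sketched in Python, assuming one pattern per field, the way three separate rex commands would each extract one field. The patterns for failedfromIP and failedport are my own assumptions, not necessarily the ones in the screenshot.

```python
import re

event = "Failed password for invalid user appserver from 194.8.74.23 port 3351 ssh2"

# One hypothetical pattern per field, mirroring three separate rex commands.
patterns = {
    "faileduserid": r"invalid\suser\s(?P<faileduserid>\w+)\s",
    "failedfromIP": r"from\s(?P<failedfromIP>\d{1,3}(?:\.\d{1,3}){3})\s",
    "failedport": r"port\s(?P<failedport>\d+)",
}

fields = {}
for name, regex in patterns.items():
    m = re.search(regex, event)
    if m:
        fields[name] = m.group(name)

print(fields)
# {'faileduserid': 'appserver', 'failedfromIP': '194.8.74.23', 'failedport': '3351'}
```

In Splunk, each pattern would sit in its own rex command, written with (?<name>...) instead of Python’s (?P<name>...).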

All the above fields can be seen in the field viewer on the left, and they can also be used in the rest of the search itself.

The search above could also be written with a single regex, giving the same results.
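A single pattern with three named groups does the whole job in one pass. Here is a Python sketch of that idea. The combined pattern is my own assumption, and it relies on the fields appearing in this order in the event. In Splunk, the same pattern would go into one rex command using (?<name>...) groups.

```python
import re

event = "Failed password for invalid user appserver from 194.8.74.23 port 3351 ssh2"

# One hypothetical pattern with three named groups, in the order they appear.
combined = re.compile(
    r"invalid\suser\s(?P<faileduserid>\w+)\s"
    r"from\s(?P<failedfromIP>\d{1,3}(?:\.\d{1,3}){3})\s"
    r"port\s(?P<failedport>\d+)"
)

match = combined.search(event)
print(match.groupdict())
# {'faileduserid': 'appserver', 'failedfromIP': '194.8.74.23', 'failedport': '3351'}
```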


Remember, with regex there’s always more to learn… so enjoy this basic overview.


Footnote – The Future of Logging Data

With the advent of Splunk, if you control your logs (by that I mean you write your own log data during code execution), use this as a guide to make sure they contain easily identifiable and relevant information. When you write out a log entry, don’t just write out the error; also include key/value pairs.

Instead of writing out:
2021/04/08 11:21:39,TESTUSER,1234567,192.168.1.100,WEBSVR-LIVE1-LB1,AUTH,AUTH-FAIL,"failed authentication","invalid username/password","161787719632038420","INFO"

Think of writing out:
2021/04/08 11:21:39, username=TESTUSER, clientip=192.168.1.100, server=WEBSVR-LIVE1-LB1, system=AUTH, err-type=AUTH-FAIL, err-msg1="failed authentication", err-msg2="invalid username/password", sessionid="161787719632038420", severity="INFO"
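If you control the code that writes the log, a tiny helper can produce the key/value form for you. Here is a minimal sketch in Python. kv_line is a hypothetical name, and the quoting rule is my own assumption: any value containing a space, slash, comma or double quote gets quoted. This keeps values like “failed authentication” in one piece.

```python
def kv_line(timestamp: str, fields: dict) -> str:
    """Hypothetical helper: build 'timestamp, key=value, key="quoted value", ...'."""
    parts = [timestamp]
    for key, value in fields.items():
        text = str(value)
        # Quote values that contain separators so each pair stays unambiguous.
        if any(ch in text for ch in ' /,"'):
            text = '"' + text.replace('"', '\\"') + '"'
        parts.append(f"{key}={text}")
    return ", ".join(parts)


line = kv_line("2021/04/08 11:21:39", {
    "username": "TESTUSER",
    "clientip": "192.168.1.100",
    "err-msg1": "failed authentication",
})
print(line)
# 2021/04/08 11:21:39, username=TESTUSER, clientip=192.168.1.100, err-msg1="failed authentication"
```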

This will allow Splunk to identify the fields automatically, meaning that from the above you already have the username, client IP address, internal server name, system name, error type, error messages 1/2, session ID and message severity without doing any additional work.


I’d appreciate your feedback, as this is one of my first blog posts:
How does it read? Is it pitched right, or is it too technical?
Have you tried the code? Does it work for you?


* For any SQL provided, I’ve tried to be quite generic; I tend to use SNOWFLAKE, T-SQL and DB2-SQL.
** Splunk is available free of charge for use on a local machine, which means you can download it and use it locally to familiarise yourself with the syntax, as well as to try out the above examples.
*** Any data shown in this blog is taken from the sample data provided with the local test/free version of Splunk.
