
Splunk: Zero to Power User

Introduction

What is Splunk?

  • Security Information and Event Management (SIEM)

  • Network analysis tool that serves as a platform to conduct your big data analytics

    • We bring our data into Splunk, read the raw events, and then structure those events to make sense of the data we're looking at

    • Takes your information in whatever structure it's in, parses it, and creates raw event logs for you to then search

    • Can create a bunch of different visual displays with the data you bring in and generate reports

    • Lets you tell the story of what's happening on your network and can give you insight into the intelligence indicators you put into it and the traffic traversing your network


What Makes Up Splunk?

Big three components:

  1. Forwarders

    • going to forward off your data

  2. Indexers

    • going to index and process your data

  3. Search Heads

    • going to allow you to query and search your environment


Forwarder

Three kinds:

  1. Universal forwarder

  2. Heavy forwarder

  3. Intermediary forwarder

  • going to forward the raw log data from the machine it resides on to an indexer

Indexer

  • going to take the raw data and process it

  • think of an indexer as a page

    • write one line at a time starting at the top going to the bottom until the page is full

  • the raw logs it processes are going to get stored in the form of buckets

    • for now, think of a bucket as a stored directory of data that lives on the indexer; events are processed into time groupings, so each bucket is grouped by the time of its data

Search Head

  • leverage your search head by searching by time

  • time is the most efficient delimiter to set, as it tells the indexer exactly where to pull the data from disk and where to search on the indexer's page

  • main interface for querying your data that resides in your environment

  • craft your SPL "spells" here, then send those search requests off to the indexers to be executed

Types of Splunk Deploys

  1. Stand-alone

    • download Splunk on your local computer

    • Splunk server would function as the search head and the indexer

    • handle all those search requests and processing of the data

    • no need to deploy forwarders, so your inputs would reside with whatever configurations you make on that single server or your laptop

  2. Basic

    • start utilizing forwarders that reside on remote machines to forward the data from those machines back to the Splunk server

    • the Splunk server is still going to be our search head and our indexer, but the inputs can now be handled by setting up forwarder agents out on our remote machines

  3. Multi-instance

    • common for how most companies utilize Splunk for their large production environments

    • key here is functional separation: the search head, indexers, and forwarders each handle their own roles

      • search head search only

      • indexer index only

      • forwarders only forward

Clustering

  • can increase your search capacity when we have a clustered search head

  • users can collaborate, sharing resources and knowledge objects with each other

  • each search head in a clustered environment should be a one-for-one replica of the other search heads in that environment, and you need a minimum of three search heads to form a search head cluster

  • a deployer is what you would use to manage your search head cluster environment

  • clustering your indexers can increase your data availability by doing data replication

    • the replication factor determines how many copies of the data are kept

  • if you were to have hundreds of forwarders out in your environment, you would need to manage these through a deployment server

Getting Data into Splunk

  1. Forwarders

    • have the data and forward it off; the data arrives as streams

    • If it's not coming from a forwarder, it may be coming from local logs, a TCP port monitoring some kind of network traffic, event generation, etc.

    • almost anything can be input into the SIEM

  2. Parsing

    • handled by indexer

    • data is turned from streams into events

  3. Indexing

    • compressed and written to disk

  4. Search

    • query and display results

Input Types

  • HTTP Event Collector

  • log files

  • network traffic

  • etc.

Metadata

  • host: who sent the data

  • source: path to the data

  • source type: how the data will be formatted

  • index: where the data is stored

App vs Add-on

App

  • something that can be launched and has a GUI component

  • usually reside on the search head and visibly displayed in the drop-down in the app's menu

Add-on or technology add-on (TA)

  • can also reside in the drop-down in the app's menu, but need to change the visibility settings for add-ons to be displayed there

  • added to Splunk instance for additional functionality

  • usually runs in the background and also usually vendor specific for the type of data involved

  • workstation display does not change

  • can land on indexers, search head, or forwarder

The Basics of Searching

Search Types

  • Keywords and phrases

    • designate phrases inside ""

  • File paths

Wildcards

  • LIKE function with the percent ( % ) symbol as a wildcard for matching multiple characters

  • underscore ( _ ) character to match a single character

  • asterisk ( * ) character as a wildcard to match an unlimited number of characters in a string
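
Minimal sketches of each wildcard in use (the index and field names here are assumptions, not from the course):

index=web clientip=10.0.0.*

index=web | where like(clientip, "10.0.0._")

index=web | where like(uri_path, "/cart%")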

Boolean Operators

  • AND

  • OR

  • NOT (exclude results from your search)

Comparison Operators

| Operator | Example | Result |
|----------|---------|--------|
| = | field=foo | Multivalued field values that exactly match "foo" |
| != | field!=foo | Multivalued field values that don't exactly match "foo" |
| < | field<x | Numerical field values that are less than x |
| > | field>x | Numerical field values that are greater than x |
| <= | field<=x | Numerical field values that are less than or equal to x |
| >= | field>=x | Numerical field values that are greater than or equal to x |

Knowledge Objects

  • tools and useful things to take advantage of when conducting analysis

What are Knowledge Objects?

  • a set of user-defined searches, fields, and reports that enrich your data and give it structure

  1. Tools

    • conduct analysis, enrich your events

  2. Fields, field extractions, lookups, tags, field aliases, data models, or saved searches

  3. Teamwork

    • shareable, reusable, and searchable based on permission sets

How are they managed?

  1. Knowledge Manager

    • Ruler of the KOs

    • a person who provides centralized oversight and maintenance of KOs for a Splunk environment

    • Ex. Owner of a dashboard

  2. Naming conventions

    • <Group name>_<type>_<description>

    • Ex. SOC_alert_LoginFailures

Permissions

  1. Private

    • only the person who created the object can use or edit it

  2. This app only

    • objects persist in the context of a specific app

  3. All apps

    • objects persist globally across all apps

Show Me the Fields

What are fields?

  • key-value pairs

  • searchable by name

  • ability to search multiple fields at once or exclude fields from a search

  • created by Splunk or recognized from an Add-On

Meta-fields

  1. Source

  2. Sourcetype

  3. Host

Making Use of Your Fields

  • you can mark more fields as selected fields so they display with your events

!= vs NOT

index=web sourcetype=access_combined categoryId!=SPORTS

  • This will tell Splunk to search for everything that does not contain the field value of sports for that field

index=web sourcetype=access_combined NOT categoryId=SPORTS

  • will tell Splunk to search for everything that does not contain the field value of sports and all events where the category ID field doesn't exist

Search Processing Language (SPL)

Splunk syntax and colors

  • Orange - command modifiers

    • tell the search what you are looking for

    • can include your boolean operators, your keywords, or your phrases with as or by clauses set

      • OR, NOT, AND, as, by

  • Blue - commands

    • tell Splunk what you want to do with the results

      • Stats, Table, Rename, Dedup, Sort, Timechart

  • Green - arguments

    • these are the variables that you apply to the search, usually to a function

      • Limit, Span

  • Purple - functions

    • tell your search to do things such as perform mathematical functions or calculate fields

      • Tostring, Sum, Values, Min, Max, Avg

Building effective SPLs

index=web OR index=security | stats sum(bytes) as Total_Bytes | eval Total_Bytes = tostring(Total_Bytes, "commas")

index=web OR index=security

  • pull all data from disk

    • name your indexes and meta-fields

stats sum(bytes) as Total_Bytes

  • set your command

    • what are we trying to do

eval Total_Bytes = tostring(Total_Bytes, "commas")

  • determine your functions

    • do we need to calculate results?

  • call your arguments

    • what fields are needed?

The search was built from left to right: start by determining where the data resides, then set the calculations, and then format how the results should be displayed

Table, rename, fields, dedup, sort

  • table

    • make a table of the results based off the variables and arguments you set in your search.

  • rename

    • rename the fields that currently exist in the data or rename fields that you've calculated and built in your searches

  • fields

    • allows you to call on fields you want to include or exclude in your results

  • dedup

    • short for de-duplicate; it will remove duplicate values from the results, based on the fields you select to dedup

  • sort

    • will sort your results based off the arguments you set
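
A sketch chaining all five commands; the index, sourcetype, and field names assume the course's web access data:

index=web sourcetype=access_combined | fields clientip, action, status | dedup clientip | sort - status | table clientip, action, status | rename clientip as "Client IP"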

What is a Transforming Command?

  • search command that orders the results into a data table

  • transform the specified cell value for each event into numerical values that Splunk can use for statistical purposes

  • searches that use transforming commands are called transforming searches

Three Transforming Commands

  1. Top

    • finds the most common values of a field in a table

    • top 10 results by default

    • can use with arguments

  2. Rare

    • finds the least common values of a field

    • opposite of top

  3. Stats

    • calculate statistics

    • count, dc, sum, avg, list, values, etc.
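
Minimal sketches of each transforming command (index and field names assumed):

index=web | top limit=5 categoryId

index=web | rare limit=5 useragent

index=web | stats count by status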

What are the Events Telling Me?

Transaction Command

  • Events can be grouped into transactions based on the associated and related identified fields

  • helps enumerate that relation

Arguments

  • maxspan

    • Max time between all related events

    • Ex: maxspan=15m

  • maxpause

    • Max time between each event

    • Ex: maxpause=1m

  • startswith & endswith

    • Set your variables for keywords, Windows EventIDs, or other searches of interest

    • Ex: startswith=4624 & endswith=4647
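
A sketch tying a Windows logon (4624) to its logoff (4647) per user; the index and field names are assumptions. transaction adds the duration and eventcount fields automatically:

index=wineventlog | transaction user startswith="EventCode=4624" endswith="EventCode=4647" maxspan=15m maxpause=1m | table user, duration, eventcount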

Investigating your events

  • Events that span time

    • they can come from multiple hosts, relate to one host of interest

  • Grouping of events

    • show the entire conversation, from start to finish in one view

  • Aid investigations

    • relate user activity for logins, session lengths, browsing history, etc.

  • Log validation

    • check to see if data is related to network logs of interest, website traffic, emails, etc.

Transaction vs stats

| Transaction | Stats |
|-------------|-------|
| slow and will tax your environment | faster, more efficient searching |
| granular analysis (logs, user behavior, conversations) | looking at larger pools of events for trend analysis (no limit on number of events returned) |
| small scope on one item of interest | broad searching and grouping events |
| correlations need to be found from start to end | mathematical functions needed |

Manipulating Your Data

Eval command

  • Calculates fields

    • does the math you ask: +, -, *, /, AND, XOR, >=, ==

  • Functions friendly

    • just like stats, it takes plenty of functional arguments

      • if

      • null

      • cidrmatch

      • like

      • lookup

      • tostring

      • md5

      • now

      • strftime

    • if the field already exists, it will overwrite that field, but it won't modify the underlying data already written to the disk

  • Create new fields

    • eval will write the result of its expression into an existing field, or create a new one

  • Converting data

    • tell Splunk to display a field value of bytes as megabytes by providing the math in an eval statement; strftime and strptime handle converting time formats
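
A sketch of the bytes-to-megabytes conversion described above (the rounding is my own choice):

index=web sourcetype=access_combined | eval megabytes = round(bytes/1024/1024, 2) | stats sum(megabytes) as Total_MB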

where and search commands

| where | search |
|-------|--------|
| can't place it before the first pipe in the SPL | can place it anywhere in the SPL |
| comparing values, or searching for a matching value | search on a keyword, or a matching value |
| use with functions | search with wildcards |
| think boolean operators = where | think expressional searches = search |
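
The same idea expressed both ways (field names assumed): where evaluates an expression with functions, while search filters like the terms before the first pipe do:

index=web | where bytes > 5000 AND cidrmatch("10.0.0.0/8", clientip)

index=web | search status=404 uri_path=*cart*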

Fields, Part 2

Field extraction methods

  • Regex - unstructured data

  • Delimiters - structured data

  • Commands - work with rex & erex in SPL

erex & rex commands

  • rex

    • regex pro

    • using regex to create a new field out of an existing field

    • you have to tell it what field to extract the data from

  • erex

    • aids in generating the regex for the extraction

    • must provide examples
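
Minimal sketches; the username field and the example values are hypothetical:

index=web | rex field=_raw "user=(?<username>\w+)"

index=web | erex username examples="jsmith,adoe"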

Lookups

What is a lookup?

  • A file

    • mostly static data that is not in an index

    • Ex: csv of all employees

  • A tool

    • add additional fields to search for

    • fields will be added to the fields bar menu

How to use one

  • Data enrichment

    • add information and store it in a table/file format to then search

  • Commands

    • Lookup

      • used to load the results contained in the lookup

      • can be used to just view the data

      • can be used as a form of validation

    • inputlookup

      • used to search the contents of a lookup table

    • outputlookup

      • used to write to that lookup table

    • OUTPUT

      • this argument when added will overwrite existing fields

    • OUTPUTNEW

      • this argument when added will not overwrite existing fields

  • Create or Upload

    • select a file to upload or make one to reference
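
Sketches of the commands above, assuming a lookup table file named employees.csv has been uploaded:

| inputlookup employees.csv

index=security | lookup employees.csv user OUTPUT department

index=security | stats count by user | outputlookup user_counts.csv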

Making a lookup

  • Navigate to Settings > Lookup table files

  • Click New Lookup Table File

Visualize Your Data

Types of visualizations

  • Tables

  • Charts

  • Maps

Visualization commands

  • timechart

    • time series will display statistical trends over time

    • single or multi-series

      • to get multi-series, you need to have chart or timechart command in the search

    • ex: span=1d

  • chart

    • line, area, bar, bubble, pie, scatter, etc.

    • stacking available

    • remove empty values

      • useother=f usenull=f

  • stats

    • can easily alter any stats table
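
A sketch of a multi-series timechart using the arguments mentioned above (index and field names assumed):

index=web | timechart span=1d count by categoryId useother=f usenull=f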

Options for panels

  • stacking

    • On = events are vertically stacked (top to bottom)

    • Off = counts are horizontally stacked (left to right)

  • overlay

    • ex: add two line charts over each other

  • Trellis

    • display multiple charts at once

  • Multi-series

    • On = the y-axis splits for each value

    • Off = all fields share the y-axis

Visualizations, Part 2

Additional commands

  • iplocation

    • add location information to visualizations

    • can be towns, cities, countries or just lat and long

  • geostats

    • calculate functions to display a cluster map

    • must be used with lat and long fields

    • all other arguments are optional

    • latitude, longitude, globallimit, locallimit

  • addtotals

    • add multiple values together on a chart, compute total sums of values

    • Fieldname, label, labelfield

  • trendline

    • overlay on a chart to show the moving average

    • sma (simple moving average), ema (exponential moving average), wma (weighted moving average)

    • the command needs the trend type and the field you're calculating the trend from

    • you also need to define an integer value for the period you want to set
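
Sketches of these commands, again assuming web access data; iplocation's lat and lon output feeds geostats directly:

index=web | iplocation clientip | geostats count

index=web | chart count over host by status | addtotals fieldname=Total

index=web | timechart span=1d count | trendline sma5(count) as "5-day moving average"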

Reports & Drill Downs

Reports

What are reports?

  • a saved search

    • anything that is a search can be saved as a report

  • live results

    • re-run a report or set it to run on a schedule

  • Shareable knowledge object

    • let anyone view your reports, or add them to a dashboard for people to reference

    • ex: Audit_Report_LicenseUsage

Drill-down functionality

  • Actions

    • link to search

    • link to dashboard

    • link to report

  • $tokens$

    • tokens play a key role in passing variables from panel to panel

    • values that we can pass within a dashboard or search to optimize the shared values of what we want to search

    • used to allow for user input to be taken and then searched against

  • Export

    • export as a PDF, print, or include a report

Make a home dashboard

Navigate to Settings > Dashboards > Edit > Set as Home Dashboard

  • change in your preferences what you launch into after login

Alerts

What are alerts?

  • saved searches

    • run on a schedule

    • run real-time

  • content matches

    • fire when a condition is matched

  • create trigger actions

    • log

    • send email

    • webhook

    • custom action

  • create trigger conditions

    • per result

    • no. of results

    • no. of sources

    • custom

    • throttle

Welcome, Tags, & Events

What is a tag?

  • quick reminder

    • what was it that I was trying to see again?

  • aid for reading data

    • create as many tags as you want

  • case sensitive

    • typing matters when searching

What are event types

  • highlighter

    • give them colors to mark events with similar criteria

  • like a report, but not

    • save searches as specific event types, sort into categories, no time range

    • ex: status=404 can be saved as "Not Found"

  • more specific

    • set strings, field values, & tags
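
Once saved, an event type is searchable by name; assuming the example above was saved under the name not_found:

index=web eventtype=not_found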

Macros

What are macros?

  • shortcuts

    • fast, saved off searches to run by name

  • Repeatable

    • macros never change unless you edit them

  • expandable

    • CTRL+SHIFT+E on Windows

    • CMD+SHIFT+E on Macs

  • macroname

    • run with the use of backticks, not single quotes

  • macros can take one or more arguments

    • if you want to use arguments, you must surround them with parentheses; the macro name then includes the argument count, e.g. macroname(2)
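
A sketch of a one-argument macro; the name count_by(1) and its definition (stats count by $field$) are hypothetical. It would run as:

index=web `count_by(status)`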

Making a macro

  • Navigate to Settings > Advanced search > Search macros

  • Click Add new to create one

Workflows to Save You Time

Introduction to workflow actions

  • Assess actions

    • depending on use case, there are three available workflow actions which provide different functionalities

  • Create workflow action

    • using Splunk web, create a new workflow action to either push, pull or search data

  • Configure workflow action

    • within the web GUI, configure the previously determined action type with a 3rd party source

  • Validation

    • check to see if data is being pushed, pulled, or searched for after configuration

Splunk provides two main workflow actions:

  1. GET

    • create HTML links to interact with sites

    • ex: Google searches, query WHOIS databases

  2. POST

    • generate an HTTP POST request to a specific URI

    • ex: create entries in management systems, forums

Another workflow action is Search

  • Launch secondary searches using field values

    • ex: occurrences of IP addresses over events

GET Workflow action

  • Navigate to Settings > Fields > Workflow Actions

  • Click New to open up a workflow action form
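
As a sketch, the form's Link URI might pass a field value to an external site as a token (the src_ip field name is an assumption):

https://www.google.com/search?q=$src_ip$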

Data Normalization and Troubleshooting

Field aliases

  • ex: src, ip_address, and source can all be aliased to one common field

  • normalize your data

  • apply multiple fields to the same field alias

  • make searching and training easier amongst users

  • think CIM

  • Navigate to Settings > Fields > Field Aliases

  • click New Field Alias to create one

Calculated fields

  • like a macro but for fields

  • save off quick math to output fields using the eval command, then use it in a search

  • Navigate to Settings > Fields > Calculated fields

  • click New Calculated Field to create one
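
A sketch: save the eval expression round(bytes/1024/1024, 2) as a calculated field named megabytes (both hypothetical), and it can then be used like any extracted field:

index=web megabytes>100 | table clientip, megabytes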

Buckets

  • Hot

    • data is being actively written to the bucket by the indexer

    • *only writable bucket

    • data is searchable

  • Warm

    • data is getting older

    • rolled from hot > warm

    • data is searchable

  • Cold

    • data is even older

    • data is searchable

  • Based on the retention policy with Splunk, the data will eventually roll over to the frozen bucket and the data will either get archived or deleted

  • Frozen buckets are not searchable

Job inspector

  • Tool: allows you to troubleshoot your search's efficiency, or the reason it failed

  • Information: gives you information about how the search completed and the time it took to run

  • Tips: if you are using a KO wrong, it will suggest how to correct your search

Datamodels

What are datamodels?

  • Hierarchical

    • parent and child relationship

    • root dataset

  • Dataset search

    • select the specific datamodel and dataset you want to search

  • Normalization tool

    • CIM compliant

    • data mapping to a model that fits that type of data

  • Large data searches

    • search larger amounts of data faster, with tstats and accelerated datamodels

Commands

  1. datamodel

  2. tstats

  3. pivot

Syntax

| datamodel <Data_Model> <Data_Model_Dataset> search | search sourcetype=<your:sourcetype> | table * | fields - <List any statistics columns you do not want to display> | fieldsummary

| tstats <stats-function> from datamodel=<datamodel-name> where <where-conditions> by <field-list>

Examples:

| datamodel Network_Traffic All_Traffic search | search sourcetype=cisco:* | stats count by sourcetype

| tstats count from datamodel=Web

| tstats summariesonly=true count from datamodel=Intrusion_Detection.IDS_Attacks where IDS_Attacks.severity=high OR IDS_Attacks.severity=critical by IDS_Attacks.src, IDS_Attacks.dest, IDS_Attacks.signature, IDS_Attacks.severity

The Common Information Model (CIM)

What is the CIM?

  • A Model

    • a model to use and reference a common standard of operations for how all data is handled

  • An Application

    • provides 22 pre-configured data models for you to use, build off of, tune, and map your data to

    • CIM Add-On and CIM Add-On Builder are available for free

  • Data Normalizer

    • in the end, all fields can have the same name

    • all apps can coexist together

How to leverage its features

  • Normalize data

  • Assistance

    • leverage it when creating field extractions, aliases, tags, etc.

  • Datamodel command

    • be able to run common searches that span larger amounts of data

Why is it important

  • Splunk Premium Apps

    • Splunk ES relies heavily on CIM compliant data

  • Health Check Tool

    • perform faster, more efficient searches that leverage searching data models instead of raw events

  • Ease of Use

    • find commonality among Splunkers

  • Audit

    • check to see if all our data going into Splunk is CIM compliant

Last modified: 10 March 2024