
Splunk: Zero to Power User

Introduction

What is Splunk?

  • Security Information and Event Management (SIEM)

  • Network analysis tool that serves as a platform to conduct your big data analytics

    • We bring our data into Splunk, read the raw events, and then structure those events to make sense of the data we're looking at

    • Takes your information in whatever structure it's in, parses it, and creates raw event logs for you to then search

    • Can create a bunch of different visual displays with the data you bring in and generate reports

    • Lets you tell the story of what's happening on your network and can give you insight into the intelligence indicators you put into it and the traffic traversing your network


What Makes Up Splunk?

Big three components:

  1. Forwarders

    • going to forward off your data

  2. Indexers

    • going to index and process your data

  3. Search Heads

    • going to allow you to query and search your environment


Forwarder

Three kinds:

  1. Universal forwarder

  2. Heavy forwarder

  3. Intermediary forwarder

  • going to forward the raw log data from the machine it resides on to an indexer

Indexer

  • going to take the raw data and process it

  • think of an indexer as a page

    • write one line at a time starting at the top going to the bottom until the page is full

  • the raw logs it processes are going to get stored in the form of buckets

    • for now, think of a bucket as a stored directory of data that lives on the indexer; events are processed into time groupings, so each bucket is grouped by the time of its data

Search Head

  • leverage your search head by searching by time

  • time is the most efficient delimiter to set, as it tells the indexer exactly where to pull the data from disk and where to search on the indexer's page

  • main interface for querying your data that resides in your environment

  • craft your SPL "spells" here, then send those search requests off to the indexers to be executed

Types of Splunk Deploys

  1. Stand-alone

    • download Splunk on your local computer

    • Splunk server would function as the search head and the indexer

    • handle all those search requests and processing of the data

    • no need to deploy forwarders, so your inputs would reside with whatever configurations you make on that single server or your laptop

  2. Basic

    • start utilizing forwarders that reside on remote machines to forward the data from those machines back to the Splunk server

    • the Splunk server is still going to be our search head and our indexer, but the inputs can now be handled by setting up forwarder agents out on our remote machines

  3. Multi-instance

    • common for how most companies utilize Splunk for their large production environments

    • key here is functional separation: the search head, indexers, and forwarders each handle their own roles

      • search head search only

      • indexer index only

      • forwarders only forward

Clustering

  • can increase your search capacity when we have a clustered search head

  • users can collaborate, sharing resources and knowledge objects with each other

  • each search head in a clustered environment should be a one-for-one replica of the other search heads in that environment, and you need a minimum of three search heads to form a search head cluster

  • a deployer is what you would use to manage your search head cluster environment

  • clustering your indexers can increase your data availability by doing data replication

    • the replication factor determines how many copies of the data are kept

  • if you were to have hundreds of forwarders out in your environment, you would need to manage these through a deployment server

Getting Data into Splunk

  1. Forwarders

    • have the data and forward it off; the data arrives as streams

    • If it's not coming from a forwarder, it may be coming from local logs, a TCP port monitoring some kind of network traffic, event generation, etc.

    • almost anything can be input into the SIEM

  2. Parsing

    • handled by indexer

    • data is turned from streams into events

  3. Indexing

    • compressed and written to disk

  4. Search

    • query and display results

Input Types

  • HTTP Event Collector

  • log files

  • network traffic

  • etc.

Metadata

  • host: who sent the data

  • source: path to the data

  • source type: how the data will be formatted

  • index: where the data is stored

App vs Add-on

App

  • something that can be launched and has a GUI component

  • usually reside on the search head and visibly displayed in the drop-down in the app's menu

Add-on or technology add-on (TA)

  • can also reside in the drop-down in the app's menu, but need to change the visibility settings for add-ons to be displayed there

  • added to Splunk instance for additional functionality

  • usually runs in the background and also usually vendor specific for the type of data involved

  • workstation display does not change

  • can land on indexers, search head, or forwarder

The Basics of Searching

Search Types

  • Keywords and phrases

    • designate phrases inside ""

  • File paths

Wildcards

  • LIKE function with the percent ( % ) symbol as a wildcard for matching multiple characters

  • underscore ( _ ) character to match a single character

  • asterisk ( * ) character as a wildcard to match an unlimited number of characters in a string
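
Minimal sketches of each wildcard in use (the index and field names here are assumptions, not from the course):

index=web clientip=10.0.0.*

index=web | where like(clientip, "10.0.0._")

index=web | where like(uri_path, "/cart%")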

Boolean Operators

  • AND

  • OR

  • NOT (exclude results from your search)

Comparison Operators

| Operator | Example | Result |
|----------|---------|--------|
| = | field=foo | Multivalued field values that exactly match "foo" |
| != | field!=foo | Multivalued field values that don't exactly match "foo" |
| < | field<x | Numerical field values that are less than x |
| > | field>x | Numerical field values that are greater than x |
| <= | field<=x | Numerical field values that are less than or equal to x |
| >= | field>=x | Numerical field values that are greater than or equal to x |

Knowledge Objects

  • tools and useful things to take advantage of when conducting analysis

What are Knowledge Objects?

  • a set of user-defined searches, fields, and reports that enrich your data and give it structure

  1. Tools

    • conduct analysis, enrich your events

  2. Fields, field extractions, lookups, tags, field aliases, data models, or saved searches

  3. Teamwork

    • shareable, reusable, and searchable based on permission sets

How are they managed?

  1. Knowledge Manager

    • Ruler of the KOs

    • a person who provides centralized oversight and maintenance of KOs for a Splunk environment

    • Ex. Owner of a dashboard

  2. Naming conventions

    • <Group name>_<type>_<description>

    • Ex. SOC_alert_LoginFailures

Permissions

  1. Private

    • only the person who created the object can use or edit it

  2. This app only

    • objects persist in the context of a specific app

  3. All apps

    • objects persist globally across all apps

Show Me the Fields

What are fields?

  • key-value pairs

  • searchable by name

  • ability to search multiple fields at once or exclude fields from a search

  • created by Splunk or recognized from an Add-On

Meta-fields

  1. Source

  2. Sourcetype

  3. Host

Making Use of Your Fields

  • you can mark more fields as selected fields so they display with your events

!= vs NOT

index=web sourcetype=access_combined categoryId!=SPORTS

  • This will tell Splunk to search for everything that does not contain the field value of sports for that field

index=web sourcetype=access_combined NOT categoryId=SPORTS

  • will tell Splunk to search for everything that does not contain the field value of sports and all events where the category ID field doesn't exist

Search Processing Language (SPL)

Splunk syntax and colors

  • Orange - command modifiers

    • tell the search what you are looking for

    • can include your boolean operators, your keywords, or your phrases with as or by clauses set

      • OR, NOT, AND, as, by

  • Blue - commands

    • tell Splunk what you want to do with the results

      • Stats, Table, Rename, Dedup, Sort, Timechart

  • Green - arguments

    • these are the variables that you apply to the search, usually to a function

      • Limit, Span

  • Purple - functions

    • tell your search to do things such as perform mathematical functions or calculate fields

      • Tostring, Sum, Values, Min, Max, Avg

Building effective SPLs

index=web OR index=security | stats sum(bytes) as Total_Bytes | eval Total_Bytes = tostring(Total_Bytes, "commas")

index=web OR index=security

  • pull all data from disk

    • name your indexes and meta-fields

stats sum(bytes) as Total_Bytes

  • set your command

    • what are we trying to do

eval Total_Bytes = tostring(Total_Bytes, "commas")

  • determine your functions

    • do we need to calculate results?

  • call your arguments

    • what fields are needed?

The search was built from left to right: start by determining where the data resides, then set the calculations, and then format how the results should be displayed

Table, rename, fields, dedup, sort

  • table

    • make a table of the results based off the variables and arguments you set in your search.

  • rename

    • rename the fields that currently exist in the data or rename fields that you've calculated and built in your searches

  • fields

    • allows you to call on fields you want to include or exclude in your results

  • dedup

    • short for de-duplicate; it will remove duplicate values from the results, based on the fields you select to dedup

  • sort

    • will sort your results based off the arguments you set
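
A sketch chaining all five commands; the index, sourcetype, and field names assume the course's web access data:

index=web sourcetype=access_combined | fields clientip, action, status | dedup clientip | sort - status | table clientip, action, status | rename clientip as "Client IP"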

What is a Transforming Command?

  • search command that orders the results into a data table

  • transform the specified cell value for each event into numerical values that Splunk can use for statistical purposes

  • searches that use transforming commands are called transforming searches

Three Transforming Commands

  1. Top

    • finds the most common values of a field in a table

    • top 10 results by default

    • can use with arguments

  2. Rare

    • finds the least common values of a field

    • opposite of top

  3. Stats

    • calculate statistics

    • count, dc, sum, avg, list, values, etc.
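
Minimal sketches of each transforming command (index and field names assumed):

index=web | top limit=5 categoryId

index=web | rare limit=5 useragent

index=web | stats count by status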

What are the Events Telling Me?

Transaction Command

  • Events can be grouped into transactions based on the associated and related identified fields

  • helps enumerate that relation

Arguments

  • maxspan

    • Max time between all related events

    • Ex: maxspan=15m

  • maxpause

    • Max time between each event

    • Ex: maxpause=1m

  • startswith & endswith

    • Set your variables for keywords, Windows EventIDs, or other searches of interest

    • Ex: startswith=4624 & endswith=4647
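
A sketch tying a Windows logon (4624) to its logoff (4647) per user; the index and field names are assumptions. transaction adds the duration and eventcount fields automatically:

index=wineventlog | transaction user startswith="EventCode=4624" endswith="EventCode=4647" maxspan=15m maxpause=1m | table user, duration, eventcount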

Investigating your events

  • Events that span time

    • they can come from multiple hosts, relate to one host of interest

  • Grouping of events

    • show the entire conversation, from start to finish in one view

  • Aid investigations

    • relate user activity for logins, session lengths, browsing history, etc.

  • Log validation

    • check to see if data is related to network logs of interest, website traffic, emails, etc.

Transaction vs stats

| Transaction | Stats |
|-------------|-------|
| slow and will tax your environment | faster, more efficient searching |
| granular analysis (logs, user behavior, conversations) | looking at larger pools of events for trend analysis (no limit on number of events returned) |
| small scope on one item of interest | broad searching and grouping events |
| correlations need to be found from start to end | mathematical functions needed |

Manipulating Your Data

Eval command

  • Calculates fields

    • does the math you ask: +, -, *, /, AND, XOR, >=, ==

  • Functions friendly

    • just like stats, it takes plenty of functional arguments

      • if

      • null

      • cidrmatch

      • like

      • lookup

      • tostring

      • md5

      • now

      • strftime

    • if the field already exists, it will overwrite that field, but it won't modify the underlying data already written to the disk

  • Create new fields

    • eval will write the result of its expression into an existing field, or create a new one

  • Converting data

    • tell Splunk to display a field value of bytes as megabytes by providing the math in an eval statement; strftime and strptime handle converting time formats
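
A sketch of the bytes-to-megabytes conversion described above (the rounding is my own choice):

index=web sourcetype=access_combined | eval megabytes = round(bytes/1024/1024, 2) | stats sum(megabytes) as Total_MB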

where and search commands

| where | search |
|-------|--------|
| can't place it before the first pipe in the SPL | can place it anywhere in the SPL |
| comparing values, or searching for a matching value | search on a keyword, or a matching value |
| use with functions | search with wildcards |
| think boolean operators = where | think expressional searches = search |
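
The same idea expressed both ways (field names assumed): where evaluates an expression with functions, while search filters like the terms before the first pipe do:

index=web | where bytes > 5000 AND cidrmatch("10.0.0.0/8", clientip)

index=web | search status=404 uri_path=*cart*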

Fields, Part 2

Field extraction methods

  • Regex - unstructured data

  • Delimiters - structured data

  • Commands - work with rex & erex in SPL

erex & rex commands

  • rex

    • regex pro

    • using regex to create a new field out of an existing field

    • you have to tell it what field to extract the data from

  • erex

    • aids in generating the regex for the extraction

    • must provide examples
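
Minimal sketches; the username field and the example values are hypothetical:

index=web | rex field=_raw "user=(?<username>\w+)"

index=web | erex username examples="jsmith,adoe"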

Lookups

What is a lookup?

  • A file

    • mostly static data that is not in an index

    • Ex: csv of all employees

  • A tool

    • add additional fields to search for

    • fields will be added to the fields bar menu

How to use one

  • Data enrichment

    • add information and store it in a table/file format to then search

  • Commands

    • Lookup

      • used to load the results contained in the lookup

      • can be used to just view the data

      • can be used as a form of validation

    • inputlookup

      • used to search the contents of a lookup table

    • outputlookup

      • used to write to that lookup table

    • OUTPUT

      • this argument when added will overwrite existing fields

    • OUTPUTNEW

      • this argument when added will not overwrite existing fields

  • Create or Upload

    • select a file to upload or make one to reference
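
Sketches of the commands above, assuming a lookup table file named employees.csv has been uploaded:

| inputlookup employees.csv

index=security | lookup employees.csv user OUTPUT department

index=security | stats count by user | outputlookup user_counts.csv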

Making a lookup

  • Navigate to Settings > Lookup table files

  • Click New Lookup Table File

Visualize Your Data

Types of visualizations

  • Tables

  • Charts

  • Maps

Visualization commands

  • timechart

    • time series will display statistical trends over time

    • single or multi-series

      • to get multi-series, you need to have chart or timechart command in the search

    • ex: span=1d

  • chart

    • line, area, bar, bubble, pie, scatter, etc.

    • stacking available

    • remove empty values

      • useother=f usenull=f

  • stats

    • can easily alter any stats table
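
A sketch of a multi-series timechart using the arguments mentioned above (index and field names assumed):

index=web | timechart span=1d count by categoryId useother=f usenull=f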

Options for panels

  • stacking

    • On = events are vertically stacked (top to bottom)

    • Off = counts are horizontally stacked (left to right)

  • overlay

    • ex: add two line charts over each other

  • Trellis

    • display multiple charts at once

  • Multi-series

    • On = the y-axis splits for each value

    • Off = all fields share the y-axis

Visualizations, Part 2

Additional commands

  • iplocation

    • add location information to visualizations

    • can be towns, cities, countries or just lat and long

  • geostats

    • calculate functions to display a cluster map

    • must be used with lat and long fields

    • all other arguments are optional

    • latitude, longitude, globallimit, locallimit

  • addtotals

    • add multiple values together on a chart, compute total sums of values

    • Fieldname, label, labelfield

  • trendline

    • overlay on a chart to show the moving average

    • sma (simple moving average), ema (exponential moving average), wma (weighted moving average)

    • the command needs the trend type and the field you're calculating the trend from

    • you also need to define an integer value for the period you want to set
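
Sketches of these commands, again assuming web access data; iplocation's lat and lon output feeds geostats directly:

index=web | iplocation clientip | geostats count

index=web | chart count over host by status | addtotals fieldname=Total

index=web | timechart span=1d count | trendline sma5(count) as "5-day moving average"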

Reports & Drill Downs

Reports

What are reports?

  • a saved search

    • anything that is a search can be saved as a report

  • live results

    • re-run a report or set it to run on a schedule

  • Shareable knowledge object

    • let anyone view your reports, or add them to a dashboard for people to reference

    • ex: Audit_Report_LicenseUsage

Drill-down functionality

  • Actions

    • link to search

    • link to dashboard

    • link to report

  • $tokens$

    • tokens play a key role in passing variables from panel to panel

    • values that we can pass within a dashboard or search to optimize the shared values of what we want to search

    • used to allow for user input to be taken and then searched against

  • Export

    • export as a PDF, print, or include a report

Make a home dashboard

Navigate to Settings > Dashboards > Edit > Set as Home Dashboard

  • change in your preferences what you launch into after login

Alerts

What are alerts?

  • saved searches

    • run on a schedule

    • run real-time

  • content matches

    • fire when a condition is matched

  • create trigger actions

    • log

    • send email

    • webhook

    • custom action

  • create trigger conditions

    • per result

    • no. of results

    • no. of sources

    • custom

    • throttle

Welcome, Tags, & Events

What is a tag?

  • quick reminder

    • what was it that I was trying to see again?

  • aid for reading data

    • create as many tags as you want

  • case sensitive

    • typing matters when searching

What are event types

  • highlighter

    • give them colors to mark events with similar criteria

  • like a report, but not

    • save searches as specific event types, sort into categories, no time range

    • ex: status=404 can be saved as "Not Found"

  • more specific

    • set strings, field values, & tags
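
Once saved, an event type is searchable by name; assuming the example above was saved under the name not_found:

index=web eventtype=not_found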

Macros

What are macros?

  • shortcuts

    • fast, saved off searches to run by name

  • Repeatable

    • macros never change unless you edit them

  • expandable

    • CTRL+SHIFT+E on Windows

    • CMD+SHIFT+E on Macs

  • macroname

    • run with the use of backticks, not single quotes

  • macros can take one or more arguments

    • if you want to use arguments, you must surround them with parentheses; the macro name then includes the argument count, e.g. macroname(2)
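
A sketch of a one-argument macro; the name count_by(1) and its definition (stats count by $field$) are hypothetical. It would run as:

index=web `count_by(status)`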

Making a macro

  • Navigate to Settings > Advanced search > Search macros

  • Click Add new to create one

Workflows to Save You Time

Introduction to workflow actions

  • Assess actions

    • depending on use case, there are three available workflow actions which provide different functionalities

  • Create workflow action

    • using Splunk web, create a new workflow action to either push, pull or search data

  • Configure workflow action

    • within the web GUI, configure the previously determined action type with a 3rd party source

  • Validation

    • check to see if data is being pushed, pulled, or searched for after configuration

Splunk provides two main workflow actions:

  1. GET

    • create HTML links to interact with sites

    • ex: Google searches, query WHOIS databases

  2. POST

    • generate an HTTP POST request to a specific URI

    • ex: create entries in management systems, forums

Another workflow action is Search

  • Launch secondary searches using field values

    • ex: occurrences of IP addresses over events

GET Workflow action

  • Navigate to Settings > Fields > Workflow Actions

  • Click New to open up a workflow action form
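
As a sketch, the form's Link URI might pass a field value to an external site as a token (the src_ip field name is an assumption):

https://www.google.com/search?q=$src_ip$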

Data Normalization and Troubleshooting

Field aliases

  • ex: src, ip_address, and source can all be aliased to one common field

  • normalize your data

  • apply multiple fields to the same field alias

  • make searching and training easier amongst users

  • think CIM

  • Navigate to Settings > Fields > Field Aliases

  • click New Field Alias to create one

Calculated fields

  • like a macro but for fields

  • save off quick math to output fields using the eval command, then use it in a search

  • Navigate to Settings > Fields > Calculated fields

  • click New Calculated Field to create one
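
A sketch: save the eval expression round(bytes/1024/1024, 2) as a calculated field named megabytes (both hypothetical), and it can then be used like any extracted field:

index=web megabytes>100 | table clientip, megabytes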

Buckets

  • Hot

    • data is being actively written to the bucket by the indexer

    • *only writable bucket

    • data is searchable

  • Warm

    • data is getting older

    • rolled from hot > warm

    • data is searchable

  • Cold

    • data is even older

    • data is searchable

  • Based on the retention policy with Splunk, the data will eventually roll over to the frozen bucket and the data will either get archived or deleted

  • Frozen buckets are not searchable

Job inspector

  • Tool: allows you to troubleshoot your search's efficiency, or the reason it failed

  • Information: gives you information about how the search completed and the time it took to run

  • Tips: if you are using a KO wrong, it will suggest how to correct your search

Datamodels

What are datamodels?

  • Hierarchical

    • parent and child relationship

    • root dataset

  • Dataset search

    • select the specific datamodel and dataset you want to search

  • Normalization tool

    • CIM compliant

    • data mapping to a model that fits that type of data

  • Large data searches

    • search larger amounts of data faster, with tstats and accelerated datamodels

Commands

  1. datamodel

  2. tstats

  3. pivot

Syntax

| datamodel <Data_Model> <Data_Model_Dataset> search | search sourcetype=<your:sourcetype> | table * | fields - <List any statistics columns you do not want to display> | fieldsummary

| tstats <stats-function> from datamodel=<datamodel-name> where <where-conditions> by <field-list>

Examples:

| datamodel Network_Traffic All_Traffic search | search sourcetype=cisco:* | stats count by sourcetype

| tstats count from datamodel=Web

| tstats summariesonly=true count from datamodel=Intrusion_Detection.IDS_Attacks where IDS_Attacks.severity=high OR IDS_Attacks.severity=critical by IDS_Attacks.src, IDS_Attacks.dest, IDS_Attacks.signature, IDS_Attacks.severity

The Common Information Model (CIM)

What is the CIM?

  • A Model

    • a model to use and reference a common standard of operations for how all data is handled

  • An Application

    • provides 22 pre-configured data models for you to use, build off of, tune, and map your data to

    • CIM Add-On and CIM Add-On Builder are available for free

  • Data Normalizer

    • in the end, all fields can have the same name

    • all apps can coexist together

How to leverage its features

  • Normalize data

  • Assistance

    • leverage it when creating field extractions, aliases, tags, etc.

  • Datamodel command

    • be able to run common searches that span larger amounts of data

Why is it important

  • Splunk Premium Apps

    • Splunk ES relies heavily on CIM compliant data

  • Health Check Tool

    • perform faster, more efficient searches that leverage searching data models instead of raw events

  • Ease of Use

    • find commonality among Splunkers

  • Audit

    • check to see if all our data going into Splunk is CIM compliant

Last modified: 10 March 2024