Note: I wrote this guide for my own home lab setup. It does not use the latest version of ELK, and there are many other great ways to stand up an ELK stack, including Docker, prebuilt virtual machines (VMs), and other community projects.
Building the stack manually will test your troubleshooting skills and give you a much better understanding of how these components work together, so that you can apply that knowledge to other versions and even other products. Some sections could be scripted and automated; as much as it pains me, for the sake of this article I kept things as manual as possible for easier troubleshooting and, hopefully, better understanding.
Building a cluster this way has served me well in my career and I hope that it can do the same for you! Let’s get started.
"ELK" is the acronym for three open source projects: Elasticsearch, Logstash, and Kibana. Elasticsearch is a search and analytics engine. Logstash is a server‑side data processing pipeline that ingests data from multiple sources simultaneously, transforms it, and then sends it to a "stash" like Elasticsearch. Kibana lets users visualize data with charts and graphs in Elasticsearch.
We’ll start by dissecting this statement through the lens of cybersecurity, though it applies just as well to other areas such as development and engineering. When we as defenders are trying to find a potential incident in our own or a customer’s network, we need one thing at a minimum: visibility.
For our purposes, visibility == logs. Whether they’re network logs coming from our network devices such as routers, switches, and firewalls or logs showing what’s running on a user’s workstation, to piece things together effectively, we need logs. This is where the log shippers or agents come in.
Let’s take a look at the diagram below for context:
In the diagram, a Windows machine has either a beats agent or the elastic-agent installed. The beats agent is in charge of collecting logs on a machine or device that we want to monitor, and we tell it what to collect through its configuration. If we forgo the beats agent for its successor, the elastic-agent, we also get other great features like endpoint monitoring and alerts!
The question we need to ask ourselves is, “What data do I want to collect from which devices?” We can then task our collectors (beats or elastic-agent) to fetch those logs.
So, we have our little fetchers that are collecting logs on a given system. Where do we send them for storage?
This is where the elasticsearch component, or the “E” in our ELK stack, comes into play. Think of this like a database that will properly process, index, and store the data collected from our log collectors for easy retrieval. I find it helps to picture this like a library with tons of librarians taking the books (our log data) and sorting them onto the proper shelves.
If a book has no title or author, or is written in an entirely different language, those librarians may struggle to index it properly. This is where the “L” of our ELK stack comes in: Logstash. Some logs have standard data structures that elasticsearch reads and knows how to index based on how it’s configured. If we wrote our own custom logs or collected them from a lesser-known data source (or self-published book), however, we may need to send them to someone who can identify and translate those logs so that elasticsearch can understand them.
That’s the role of logstash: it takes in data, and based on how we configure it, will transform that data into a form that elasticsearch will then be able to index properly.
This is cool and all, but what good does a library of all the relevant logs in our network do us when no one can read anything in it? This is where the “K” of our ELK stack comes in with Kibana.
I like to think of Kibana as the librarian at the resource desk. We may be looking for a specific book, a genre, or all published authors within a certain time period. We’d take this information and ask or query the librarian at the resource desk, who will interact with the elasticsearch librarians to return the data we requested.
Between these three, we have a lot of power at our fingertips. Combine these search functions with alerting, visualizations, and even an endpoint detection agent, and you have a formidable Security Information and Event Management (SIEM) system that gives you hands-on experience querying and using these tools as if you were in an enterprise environment! Talk about a good resume bullet…
So... let’s get started! First, you’ll need a hypervisor; whether it’s VirtualBox, VMware, or something else doesn’t matter. You’ll also want to check the requirements below to make sure your host machine has enough resources to spare:
MINIMUM Hardware Requirements:
1 core processor
Internet connection (for install)
RECOMMENDED Minimum Hardware Requirements:
2+ core processors
Internet connection (for install)
Section 1: Set up Ubuntu ISO VM
Note: If not explicitly mentioned in this walkthrough, continue with the install on the default setting.
echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-7.x.list
We should see an OK to indicate that the GPG key was added, followed by an echo of our new /etc/apt/sources.list.d entry. This lets us pull from the elasticsearch repository when we install the ELK components, and lets apt confirm that the packages come from a trusted source by checking them against the GPG key!
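As a quick sanity check, the repository entry can also be confirmed from the shell. This is only a minimal sketch: the helper name `repo_entry_present` is my own invention (it is not part of apt), and the example call assumes the list file path and URL used in this walkthrough.

```shell
#!/bin/sh
# Hypothetical helper (not part of apt): succeeds if the given sources
# file contains an uncommented "deb" line that mentions the given URL.
repo_entry_present() {
    file="$1"
    url="$2"
    # First keep only active "deb" lines, then look for the URL as a
    # fixed string so the dots in it are not treated as regex wildcards.
    grep -E '^deb[[:space:]]' "$file" | grep -Fq "$url"
}

# Example call, assuming the path and URL from this guide:
# repo_entry_present /etc/apt/sources.list.d/elastic-7.x.list \
#     "https://artifacts.elastic.co/packages/7.x/apt" && echo "repo entry found"
```

If the helper reports nothing, the entry is either missing or still commented out, and the install step would likely fail to find the packages.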
3. Update the recently added elasticsearch repository and pull from it to install elasticsearch.
sudo apt update -y
sudo apt install elasticsearch -y
4. Now we’ll reload systemd’s unit files so that the newly installed elasticsearch service is picked up.
sudo systemctl daemon-reload
5. Use systemctl to verify that elasticsearch is installed but not yet running:
sudo systemctl status elasticsearch.service
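For a check that is easier to script, `systemctl is-active` and `systemctl is-enabled` each print a one-word state. The sketch below wraps both; the helper name `service_state` is hypothetical, and on a fresh install I would expect it to report something like inactive and disabled, though that depends on your system.

```shell
#!/bin/sh
# Hypothetical helper: print the active and enabled state of a systemd unit.
# "systemctl is-active" and "is-enabled" exit non-zero for inactive or
# disabled units, so "|| true" keeps the helper from failing in that case.
service_state() {
    unit="$1"
    active=$(systemctl is-active "$unit" 2>/dev/null || true)
    enabled=$(systemctl is-enabled "$unit" 2>/dev/null || true)
    echo "$unit: active=$active enabled=$enabled"
}

# Example call:
# service_state elasticsearch.service
```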
Section 4: Edit the Elasticsearch YML for Configuration and Start the Service
Now that we’ve installed the elasticsearch application, we need to configure it so that it knows how to run, what ports it needs to open, and where it needs to communicate. We can do this by editing the YAML file associated with it.
Be careful though! YAML syntax can get finicky with unintentional spaces and other characters.
1. Open the /etc/elasticsearch/elasticsearch.yml file using your favorite text editor:
sudo vim /etc/elasticsearch/elasticsearch.yml
2. Edit the /etc/elasticsearch/elasticsearch.yml file to add the following (NOTE: You'll want to be sure to remove the # in front of the cluster.name and other entries. These are comment characters telling elasticsearch not to read those lines when applying settings.)
Note: The discovery.type entry does not exist in the default yaml file and will need to be added:
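Once the edits are saved, it can help to list only the lines that elasticsearch will actually read, since any entry still prefixed with # is ignored. The following is a rough sketch, and the helper name `active_settings` is my own; it simply filters out blank lines and comments from whatever is piped into it.

```shell
#!/bin/sh
# Hypothetical helper: read a config file on stdin and print only the
# lines that are neither blank nor comments, i.e. the settings that apply.
active_settings() {
    grep -Ev '^[[:space:]]*(#|$)'
}

# Example call (the file is usually only readable with elevated rights):
# sudo cat /etc/elasticsearch/elasticsearch.yml | active_settings
```

Every entry you uncommented or added, including discovery.type, should show up in the output. If one is missing, it is probably still commented out.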
3. After you’ve updated the configuration file, start and enable the elasticsearch.service service. This may take a minute or two.
sudo systemctl start elasticsearch
sudo systemctl enable elasticsearch
4. Test to make sure elasticsearch is running properly by browsing to our VM’s IP address over port 9200.
curl -XGET "192.168.1.150:9200"
With the JSON response and tagline “You Know, for Search” we know that our elasticsearch service is running properly on port 9200 and is accessible! This is the port that our beats shippers/agents will use to send data to the elasticsearch database for indexing.
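Because the service can take a minute or two to come up, a single curl right after starting it may fail even though nothing is wrong. A small retry loop around the same request is one way to handle that. This is a sketch under my own assumptions: the helper name `wait_for_es` and its default of 30 attempts, 5 seconds apart, are arbitrary choices, and the address is the example IP used in this guide.

```shell
#!/bin/sh
# Hypothetical helper: poll an HTTP endpoint until it answers or we run
# out of attempts. Arguments: URL, number of attempts, seconds between tries.
wait_for_es() {
    url="$1"
    attempts="${2:-30}"
    delay="${3:-5}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -s silences progress output, -f makes curl fail on HTTP errors
        if curl -sf "$url" >/dev/null; then
            echo "Elasticsearch answered after $i attempt(s)"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "No response from $url after $attempts attempts" >&2
    return 1
}

# Example call, using the address from this guide:
# wait_for_es "http://192.168.1.150:9200" 30 5
```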
Section 5: Install Kibana and Edit the Kibana YML for Configuration
Now we need to install the other component we’ll be using to interact with the elasticsearch database: Kibana.
1. Install Kibana.
sudo apt install kibana -y
2. Edit the /etc/kibana/kibana.yml file with your favorite text editor to add the following:
3. Start and enable the Kibana service and restart elasticsearch (this is always a good practice when restarting either service as Kibana and elasticsearch are very dependent on each other).
sudo systemctl start kibana
sudo systemctl enable kibana
sudo systemctl restart elasticsearch
4. Check the status of both the elasticsearch and Kibana services before moving on to the next step (NOTE: If you run the command as pictured in the screenshot, you will have to press “Q” to see the next output):
sudo systemctl status kibana
sudo systemctl status elasticsearch
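If you would rather not press “Q” between outputs, systemctl’s `--no-pager` flag prints the status straight to the terminal. The sketch below loops over whichever services you pass in; the helper name `show_status` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the status of each named service in turn.
# --no-pager stops systemctl from opening a pager, so there is nothing
# to quit out of between services.
show_status() {
    for svc in "$@"; do
        # status exits non-zero for units that are not active; note it and keep going
        systemctl --no-pager status "$svc" || echo "$svc is not running (exit code $?)"
    done
}

# Example call:
# show_status kibana elasticsearch
```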
Section 6 (Optional): Install Logstash
Now that we’ve established the Kibana and elasticsearch services on our machine, we can add Logstash to help with any logs that elasticsearch may not know how to index by default, and Filebeat to get some other cool features and monitoring on our VM itself. This section is optional but encouraged!
1. Install and enable Logstash.
sudo apt install logstash -y
sudo systemctl start logstash
sudo systemctl enable logstash
This lays the foundation for our Logstash service; however, configuring pipelines with Logstash is outside the scope of this article.
Section 7: Install Filebeat
1. Install Filebeat.
sudo apt install filebeat -y
2. Edit the /etc/filebeat/filebeat.yml file with your favorite text editor to add the following entry to connect to our elasticsearch service.
# Configure what output to use when sending the data collected by the beat.
# ---------------------------- Elasticsearch Output ----------------------------
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["192.168.1.150:9200"]
Now we’ve set up some basic logging to send logs to elasticsearch over port 9200! We’ll want to test it now to make sure it’s collecting logs and sending them for proper indexing. (NOTE: We will have to change this once we start enabling security features.)
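One hedged way to run that test is with Filebeat’s built-in `test` subcommands: `filebeat test config` validates the YAML file, and `filebeat test output` tries to connect to the configured output. The wrapper below is only a sketch; the helper name `check_filebeat` and its messages are my own.

```shell
#!/bin/sh
# Hypothetical helper: validate the Filebeat config file, then try to
# reach the configured output (elasticsearch on port 9200 in this guide).
check_filebeat() {
    sudo filebeat test config || { echo "filebeat.yml failed validation" >&2; return 1; }
    sudo filebeat test output || { echo "could not reach the configured output" >&2; return 1; }
    echo "Filebeat config and output look good"
}

# Example call:
# check_filebeat
```

If the config check fails, stray spaces or indentation in the YAML are the usual suspects. If only the output check fails, confirm that elasticsearch is running and reachable at the host and port in the hosts entry.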
3. Run the Filebeat setup commands with the following arguments to tell Filebeat we don’t need Logstash logging and to ensure we have the proper host configured in our yml file: