Python and Luigi: The Ultimate Tools for Building an Automated Recon Pipeline - Part I
In this article, you will learn how to build an automated recon pipeline with Python and Luigi. Recon, or reconnaissance, is the process of gathering information about a target system, network, or application. It is an essential step in any security assessment, penetration test, or bug bounty hunt. Done manually, however, recon is tedious, time-consuming, and error-prone. Automating it saves time and effort and makes your results more consistent from one run to the next.
But how do you automate recon? There are many tools and frameworks available for recon automation, but in this article, we will focus on two: Python and Luigi. Python is a popular programming language with a rich set of libraries for web scraping, network programming, data analysis, and more. Luigi is a Python package that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, failure handling, and command line integration.
With Python and Luigi, you can create a recon pipeline made up of multiple tasks that run in sequence or in parallel, depending on their dependencies. Each task performs a specific recon activity, such as domain enumeration, subdomain discovery, port scanning, or directory brute-forcing, and you can customize each task to suit your needs. Each task declares what it depends on and what it produces, so Luigi can pass the output of one task to the next, skip work that is already complete, and report failures and log messages as it goes. You can also monitor the progress and status of your pipeline with a web interface that Luigi provides.
In this article, we will cover the following topics:
How to install Python and Luigi on your system
How to install recon-pipeline, a ready-made recon pipeline that uses Python and Luigi
How to configure recon-pipeline for your target
How to run the recon pipeline with the luigi command
How to monitor the recon pipeline with Luigi's web interface
How to analyze the results of the recon pipeline
This article is part one of a series on how to build an automated recon pipeline with Python and Luigi. In part two, we will dive deeper into the code and logic behind each task in the recon pipeline. We will also learn how to extend and modify the recon pipeline to add new features.
Setup and Scope
Before we can start building our recon pipeline, we need to set up our environment and define our scope. In this section, we will cover how to install Python and Luigi on your system, how to install recon-pipeline from GitHub, how to configure it for your target domain, and how to define the scope of your recon.
Installing Python and Luigi
To run our recon pipeline, we need Python 3.6 or higher installed on our system. You can check your Python version by running python --version in your terminal (on systems where python is missing or still points to Python 2, run python3 --version instead). If you don't have Python installed or have an older version, you can download it from https://www.python.org/downloads/ and follow the instructions for your operating system.
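The same check can be made from inside Python, which is handy at the top of a script that should refuse to run on an old interpreter. This is a small sketch; the helper name meets_minimum is our own, and the 3.6 minimum is the one stated above.

```python
import sys

# Minimum interpreter version stated in this article.
MIN_VERSION = (3, 6)


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least the minimum."""
    return tuple(version_info[:2]) >= minimum
```

Calling meets_minimum() with no arguments checks the interpreter that is currently running.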
Once you have Python installed, you can install Luigi with pip, the Python package manager. Pip should be included with your Python installation, but if not, you can install it from https://pip.pypa.io/en/stable/installing/. To install Luigi, run pip install luigi in your terminal. This will install the latest stable version of Luigi from PyPI, the Python Package Index.
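After the install finishes, you may want to confirm that the package is visible to the interpreter you plan to use. One way, sketched below with a hypothetical helper named is_installed, is to ask the standard library's importlib whether it can locate the module without actually importing it.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found by this interpreter."""
    return importlib.util.find_spec(module_name) is not None
```

For example, is_installed("luigi") should return True once pip install luigi has succeeded for this interpreter. If it returns False, pip may have installed the package for a different Python installation.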
Installing recon-pipeline
Recon-pipeline is an open-source project that provides a ready-made recon pipeline that uses Python and Luigi. It was created by epi052, a security researcher and bug bounty hunter, and you can find it on GitHub at https://github.com/epi052/recon-pipeline. Recon-pipeline consists of several tasks that perform different recon activities, such as:
Domain enumeration with Amass and crt.sh
Subdomain discovery with Sublist3r, massdns, and dnsgen
Port scanning with masscan and nmap
Directory brute-forcing with gobuster
Content discovery with LinkFinder and Waybackurls
Screenshotting with Aquatone
Vulnerability scanning with Nuclei
To install recon-pipeline, you can clone the GitHub repository to your local machine by running git clone https://github.com/epi052/recon-pipeline.git in your terminal. This will create a folder called recon-pipeline in your current directory. Alternatively, you can download the zip file from GitHub and extract it to your desired location.
After cloning or downloading recon-pipeline, you need to install its dependencies. Recon-pipeline requires some external tools to run, such as Amass, masscan, nmap, etc. You can find the list of required tools and their installation instructions in the README file of the GitHub repository. You can also use the install script provided by recon-pipeline to automate the installation process. To run the install script, navigate to the recon-pipeline folder and run ./install.sh in your terminal. This will install all the required tools and their dependencies for you.
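Whichever way you install the external tools, it is worth checking that they are actually on your PATH before starting a long run. The sketch below uses shutil.which from the standard library. The tool list is taken from the names mentioned in this article and is an assumption about your setup, not recon-pipeline's authoritative requirements, and the binary names on your system may differ.

```python
import shutil

# Tool names as mentioned in this article; adjust to match your installation.
REQUIRED_TOOLS = ["amass", "masscan", "nmap", "gobuster"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from the list that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

An empty list from missing_tools() means every tool in the list was found; anything else tells you what still needs to be installed.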
Configuring recon-pipeline
After installing recon-pipeline and its dependencies, you need to configure it for your target domain. Recon-pipeline uses a configuration file called config.toml to store the settings and parameters for each task in the pipeline. You can find this file in the recon-pipeline folder and edit it with any text editor.
The configuration file has several sections, each corresponding to a task in the pipeline. Each section has some options that you can modify to suit your needs and preferences. For example, you can change the number of threads, the wordlist, the timeout, etc. You can also enable or disable certain tasks by setting their enabled option to true or false.
The most important option that you need to set is the target-domain option in the [general] section. This is where you specify the domain that you want to run recon against. For example, to target gitlab.com, you would set target-domain = "gitlab.com". You can also specify multiple domains by separating them with commas, such as target-domain = "gitlab.com, github.com".
You can also set some global options in the [general] section, such as output-directory, which specifies where to store the results of the pipeline, and nmap-top-ports, which specifies how many ports to scan with nmap.
You can find more details about each option and its meaning in the comments of the configuration file. You can also refer to the documentation of each tool that recon-pipeline uses for more information.
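To illustrate how a comma-separated target-domain value like the one described above can be turned into a clean list of domains, here is a small sketch. The function name parse_target_domains is hypothetical and this is not recon-pipeline's own parsing code; it only shows one reasonable way to normalize such a value (trimming whitespace, lowercasing, and dropping empty or repeated entries).

```python
def parse_target_domains(raw_value):
    """Split a comma-separated domain string into a normalized list."""
    domains = []
    for part in raw_value.split(","):
        # Trim whitespace, lowercase, and drop any trailing dot.
        name = part.strip().lower().rstrip(".")
        if name and name not in domains:
            domains.append(name)
    return domains
```

With the value from the example above, parse_target_domains("gitlab.com, github.com") yields the two domains as separate list entries.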
Defining the scope of the recon
The last step before running our recon pipeline is to define the scope of our recon. The scope is the set of targets that we want to include in or exclude from our recon activities. For example, if we are participating in a bug bounty program for gitlab.com, we might want to include only subdomains that belong to gitlab.com and exclude any third-party domains or IP addresses.
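A scope rule like this one can be expressed as a simple hostname check. The sketch below is an assumption about how you might filter results yourself, not a description of how recon-pipeline enforces scope, and the function names in_scope and filter_scope are hypothetical. The comparison is made on a dot boundary so that look-alike names such as notgitlab.com are not treated as subdomains.

```python
def in_scope(hostname, root_domain):
    """Return True if hostname is the root domain or one of its subdomains."""
    host = hostname.strip().lower().rstrip(".")
    root = root_domain.strip().lower().rstrip(".")
    # Match on a dot boundary so "notgitlab.com" does not match "gitlab.com".
    return host == root or host.endswith("." + root)


def filter_scope(hostnames, root_domain):
    """Keep only the hostnames that fall inside the given root domain."""
    return [name for name in hostnames if in_scope(name, root_domain)]
```

Bare IP addresses and third-party hostnames fail the check and are dropped, which matches the include-only-subdomains rule described above.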