Support tasks often involve replicating situations that occur in production and are not easy to simulate in a local environment. The most effective way to replicate them is to emulate, as precisely as possible, the context in which the issue was found: configuration, data, external agents/integrations, etc. While there is no way to reproduce an environment with 100% precision, there are techniques that help considerably with this task.
One technique that can be used on the data front is to download a dump of the production database(s) to replicate the same dataset locally. In a local environment we can leverage tools like a debugger, refine logging and simulate situations without worrying about side effects on third parties. It's important to take into account that, for the sake of security and privacy, we need to ensure that no Personally Identifiable Information (PII) is transferred. Here we are going to show how we can do this when dealing with a relational MySQL database.
The easiest way of generating a dump from a production environment is to use the tools each DBMS provides. In MySQL, we can use mysqldump with a handful of parameters (host, username, password, DB name), which is enough to obtain an entire dump of a database that we can import locally. However, this approach has two issues:

- a production database is usually large, so a full dump takes a long time to generate, transfer and import; and
- it contains the data of every customer, including PII that must not leave the production environment.
To solve the aforementioned issues, we can rely on one feature that mysqldump provides: the --where parameter. This parameter allows us to write a condition that filters the rows included in the dump. Since the condition usually differs per table, we need to run the dump table by table, passing the corresponding --where value on each invocation, as shown below.
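As a minimal illustration (the connection values, database name, table and column names here are placeholders, not the ones from our actual setup), a filtered dump of a single table looks like this:

```bash
# Dump only the rows of the "orders" table that belong to a given customer
mysqldump \
  --host=prod-db.example.com \
  --user=readonly_user \
  --password \
  --single-transaction \
  --where="customer_id = 1234" \
  production_db orders > orders.sql
```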
To structure this in a modular way in a Bash script, we can read a JSON file (we will name it table_conditions.json) that holds the specific filter for each table (if needed). In this example we'll use the customer id as the cut-off variable; we'll refer to that id as $ENTITY_ID and replace it with the concrete id later, during the script execution:
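A possible shape for table_conditions.json is sketched below; the table and column names are purely illustrative, and tables that don't need filtering can simply use a condition that is always true:

```json
{
  "customer": "id = $ENTITY_ID",
  "orders": "customer_id = $ENTITY_ID",
  "invoice": "customer_id = $ENTITY_ID",
  "country": "1 = 1"
}
```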
With this spec, we can iterate through the JSON file, retrieving all the tables and their conditions and storing them in an associative array to simplify their handling.
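A sketch of that step, assuming jq is available and that the customer id is passed as the first argument to the script, could be:

```bash
#!/usr/bin/env bash
set -euo pipefail

ENTITY_ID="$1"            # the customer id used to filter the data

declare -A TABLE_CONDITIONS

# Read each table/condition pair and replace the $ENTITY_ID placeholder
while IFS=$'\t' read -r table condition; do
  TABLE_CONDITIONS["$table"]="${condition//\$ENTITY_ID/$ENTITY_ID}"
done < <(jq -r 'to_entries[] | [.key, .value] | @tsv' table_conditions.json)
```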
Then we can run mysqldump individually for each table to generate one dump per table, applying the corresponding filter. If we are connecting to the DB via a Bastion Host (as we do), we need to store the dumps temporarily on that host so they can be downloaded after they are generated. In this case, it's also recommended to compress the dumps to speed up their transfer.
The core of the script is a function that creates the backup for a single table.
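A minimal sketch follows; DB_HOST, DB_USER, DB_PASS, DB_NAME and DUMP_DIR are assumed to be defined earlier in the script (they are placeholders, not our actual variable names), and the script is assumed to run on the Bastion Host itself:

```bash
# Function to create the backup for a table, compressing it on the fly
backup_table() {
  local table="$1"
  # Fall back to a condition that is always true if no filter was defined
  local condition="${TABLE_CONDITIONS[$table]:-1=1}"

  echo "Dumping table ${table}..."
  mysqldump \
    --host="$DB_HOST" \
    --user="$DB_USER" \
    --password="$DB_PASS" \
    --single-transaction \
    --where="$condition" \
    "$DB_NAME" "$table" | gzip > "${DUMP_DIR}/${table}.sql.gz"
}

# Generate one compressed dump per table
for table in "${!TABLE_CONDITIONS[@]}"; do
  backup_table "$table"
done
```

Once the script finishes, the compressed dumps can be copied from the Bastion Host to the local machine with a tool like scp or rsync.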
After we have all the dumps stored locally, we can easily import them (with on-the-fly decompression) using a function that decompresses each .sql file and feeds it to the local MySQL instance.
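A sketch of the import side, assuming the local connection variables (LOCAL_DB_USER, LOCAL_DB_PASS, LOCAL_DB_NAME) are defined and the compressed dumps sit in DUMP_DIR:

```bash
# Function to decompress and import the .sql file for a table
import_table() {
  local table="$1"

  echo "Importing table ${table}..."
  gunzip -c "${DUMP_DIR}/${table}.sql.gz" | mysql \
    --user="$LOCAL_DB_USER" \
    --password="$LOCAL_DB_PASS" \
    "$LOCAL_DB_NAME"
}

# Import every table that was dumped
for table in "${!TABLE_CONDITIONS[@]}"; do
  import_table "$table"
done
```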
Obtaining dumps of remote data is often a very effective way to replicate production issues locally, which gives us access to powerful tools like debuggers and fine-grained logging. In this article, we've explored a practical way to generate these dumps by relying on existing tools, using an RDBMS, MySQL, as an example. At Ensolvers, we have applied this technique to other databases and search engines like PostgreSQL and ElasticSearch.