
Reducing tech support issue resolution time by automating data dumping

November 28, 2023

Introduction

Support tasks often involve replicating situations found in production that are not easy to simulate in a local environment. The most effective way to reproduce these situations is to emulate, as precisely as possible, the context in which the issue was found: configuration, data, external agents/integrations, etc. While there is no way to simulate an environment with 100% accuracy, there are techniques that help considerably with this task.

One technique that can be used on the data front is to download a dump of the production database(s) to replicate the same dataset locally. In a local environment we can leverage tools like a debugger, refine logging and simulate situations without worrying about third-party effects. It's important to take into account that, for the sake of security and privacy, we need to ensure that no Personally Identifiable Information (PII) is transferred. Here we are going to show how we can do this when dealing with a relational MySQL database.

Generating an entire dump

The easiest way of generating a dump from a production environment is to use the tools each DBMS provides. In MySQL, we can use mysqldump with a couple of parameters (host, username, password, DB name), and that should be enough to obtain an entire dump of a database that we can import locally (a basic invocation is sketched after the list below). However, this approach has two issues:

  • As a project scales, the dump can grow large enough that downloading and importing it takes a significant amount of time.
  • There is no way of pruning sensitive data that we don't want to export and import locally.
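
As a reference, the whole process boils down to two commands similar to the following (host, user and database names here are just placeholders):

# Full dump of the remote database (prompts for the password)
mysqldump -h prod-db.example.com -u readonly_user -p production_db > production_db.sql

# Import it into a local database
mysql -u root -p local_db < production_db.sql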

Refining the information 

To solve the aforementioned issues, we can rely on one feature that mysqldump provides: the --where parameter. This parameter allows us to write conditions that filter the information included in each dump. In this case, we need to apply it table by table, passing the condition alongside the table name.
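
For example, a single table could be filtered and dumped like this (connection values, table and column names are illustrative):

mysqldump -h prod-db.example.com -u readonly_user -p production_db orders \
  --where="customer_id = 1234" > orders.sql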

To structure this in a modular way in a Bash script, we can read a JSON file (we will name it table_conditions.json) which holds the specific filter for each table (if needed). In this example, we'll use the customer id as the cut-off variable; we'll refer to that id as $ENTITY_ID and replace it with the specific id later, during script execution:
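
A possible shape for that file, with hypothetical table names, could be the following (an empty condition means the whole table is dumped):

{
  "customers": "id = $ENTITY_ID",
  "orders": "customer_id = $ENTITY_ID",
  "invoices": "customer_id = $ENTITY_ID",
  "countries": ""
}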

With this spec, we can iterate through the JSON file, retrieving all the tables and their conditions and storing them in an associative array to simplify their management:
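
A sketch of that loop, assuming jq is installed and Bash 4+ is available (for associative arrays):

# Load table_conditions.json into an associative array
declare -A TABLE_CONDITIONS

while IFS=$'\t' read -r table condition; do
  TABLE_CONDITIONS["$table"]="$condition"
done < <(jq -r 'to_entries[] | [.key, .value] | @tsv' table_conditions.json)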

And then we can run mysqldump individually for each table to generate one dump per table, applying the filters. If we are connecting to the DB via a Bastion Host (as we do), we would need to store the dumps temporarily on that host so they can be downloaded after they are generated. In this case, it's also recommended to compress the dumps to speed up their transfer.

# Function to create the backup for a table
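# Sketch only: DB_HOST, DB_USER, DB_PASS, DB_NAME, ENTITY_ID, DUMP_DIR and the
# TABLE_CONDITIONS array from the previous step are assumed to be defined.
backup_table() {
  local table="$1"
  # Replace the $ENTITY_ID placeholder in the condition with the concrete id
  local condition="${TABLE_CONDITIONS[$table]//\$ENTITY_ID/$ENTITY_ID}"
  local where_args=()

  if [ -n "$condition" ]; then
    where_args=(--where="$condition")
  fi

  # Dump the table and compress it on the fly to speed up the transfer
  mysqldump -h "$DB_HOST" -u "$DB_USER" -p"$DB_PASS" "${where_args[@]}" \
    "$DB_NAME" "$table" | gzip > "$DUMP_DIR/${table}.sql.gz"
}

# Generate one compressed dump per table
mkdir -p "$DUMP_DIR"
for table in "${!TABLE_CONDITIONS[@]}"; do
  backup_table "$table"
done

# The compressed dumps can then be downloaded from the Bastion Host, e.g. with
#   scp "bastion-host:$DUMP_DIR/*.sql.gz" ./dumps/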

After we have all the dumps stored locally, we can easily import them (with on-the-fly decompression) using a small helper function like the following.

# Function to decompress and import the .sql file
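# Sketch only: LOCAL_DB_USER, LOCAL_DB_PASS and LOCAL_DB_NAME are assumed to
# point to the local environment.
import_dump() {
  local file="$1"
  # Decompress on the fly and pipe the SQL straight into the local database
  gunzip -c "$file" | mysql -u "$LOCAL_DB_USER" -p"$LOCAL_DB_PASS" "$LOCAL_DB_NAME"
}

# Import every compressed dump previously downloaded
for file in ./dumps/*.sql.gz; do
  import_dump "$file"
done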


Conclusion

Obtaining dumps of remote information is often a very effective way to locally replicate problems found in production environments, which gives us access to powerful tools like debuggers and fine-grained logging. In this article, we've explored a practical way to generate these dumps by relying on existing tools, using an RDBMS, MySQL, as an example. At Ensolvers, we have applied this technique to other databases and search engines like PostgreSQL and Elasticsearch.

Joaquin Franciscutti
Software Engineer & Solver
