Opvizor Blog

How to Extract PostgreSQL Data from VCSA Backup

Written by blog | Aug 27, 2024 2:56:12 PM

Introduction

Recently, one of our valued customers came to us with an interesting query:

“We want to investigate an issue in our infrastructure. For this, we need some historic vCenter Event data. Since retention for events is set to the default of 30 days, we don't have the historic data we need in our running vCenter anymore. However, we have several backups created with the VMware Appliance Management (VAMI - port 5480). Can you extract the events from these backups directly, or do we need to restore all these backups to search for the events data we need?”

Given that the vCenter Server Appliance (vCSA) uses PostgreSQL, extracting the required data seemed manageable. Michael, a core developer of Opvizor Metrics and Logs, eagerly took on this challenge. Below is a detailed account of how he successfully extracted the data.

Understanding the vCSA Backup Structure

Before diving into data extraction, it’s essential to understand the anatomy of a typical vCSA backup:

  • Backups are usually stored on a network share.
  • The top-level folder of the network share will have a folder named 'vCenter'.
  • Inside the 'vCenter' folder, you will find separate folders for each vCenter instance being backed up to the network share.
  • Each vCenter folder contains subfolders for individual backup runs. These subfolders are named using the following convention: an initial letter ('M' for manually created or 'S' for scheduled backups), followed by the vCenter version, and ending with a timestamp.
  • Inside these subfolders, you’ll find various files related to different vCSA components. For our purposes, we need specific files to restore the internal PostgreSQL database.

Setting Up an Environment to Read the Backup

To simplify the process and ensure isolation of the data, we decided to use Docker. This setup allows for multiple backup instances to be processed with minimal effort. Here's how we did it:

  1. Requirements: The only requirement is a workstation with Docker installed.
  2. Data Handling: Backup data is copied into a dedicated folder, with all operations confined to this folder and its subdirectories.
  3. PostgreSQL Setup: We used a PostgreSQL Docker image, eliminating the need for a complex setup.

Step 1: Create a Project Directory and Prepare the Data

Start by creating a directory under your home directory for the project:

mkdir ~/vcsa_recovery
cd ~/vcsa_recovery

Copy the necessary backup files into this directory:

  • database_full_backup.tar.gz
  • wal_backup_1.tar.gz
  • wal_dir_struct.tar.gz

Extract the contents of these files into a new pg_data folder:

mkdir pg_data
cd pg_data
tar xzf ../database_full_backup.tar.gz
tar xzf ../wal_backup_1.tar.gz
tar xzf ../wal_dir_struct.tar.gz

Next, extract the WAL (Write-Ahead Logging) files from the backup and move them to their correct locations:

cd storage/archive/vpostgres/
gunzip *
mv * ../../dblog/vpostgres/pg_xlog/

Edit the postgresql.conf file using a text editor such as nano:

nano storage/db/vpostgres/postgresql.conf

Make the following changes to the postgresql.conf file:

Change the ownership of the database files to match the PostgreSQL Docker image’s user ID and group ID:

sudo chown -R 999:999 *
cd ..

Step 2: Start PostgreSQL

Now, start PostgreSQL using Docker with the following command:

docker run -t -i --rm --name=vcsa_recovery_db -e PGDATA=/storage/db/vpostgres -v ./pg_data/storage:/storage postgres:14

Step 3: Explore the Restored vCenter Database

With the PostgreSQL server running, you can now explore the restored database. Open a new shell and run:

docker exec -t -i vcsa_recovery_db psql -U postgres VCDB

This command will take you into psql, the PostgreSQL command-line interface. To find the event data, you can run:

select * from vpx_event_partition_lookup;

This query will provide the index for the events table. From here, you can start exploring the vpx_event_<index> tables to find the specific event data you're looking for.

Conclusion

By following this guide, you can set up a working PostgreSQL server to access the vCenter internal database from backups within minutes. This setup is versatile and can be used for various purposes, such as:

  • Exploring historical tasks and events.
  • Investigating the historical state of managed objects.
  • Conducting forensic investigations.

Using Docker provides the flexibility to isolate and manage multiple instances without complex setups, making it an efficient solution for dealing with vCSA backups.