At 2:23 AM last Tuesday, something went wrong with our database. Our most recent backup was a few days old, so recovery took longer than it should have. That’s when I realized our backup system had to be better.
Managed databases can be expensive to run. Going self-managed is the alternative, but it comes with its own responsibilities to achieve resiliency. Part of that for me involves building a backup system that:
- Runs automatically at 2 AM
- Retries up to 10 times if something fails
- Sends Slack notifications on success or failure
- Stores backups in S3
- Makes restoration a single command
No database was harmed in the making of this system. Here's how I did it.
The Stack
- Database: PostgreSQL 17 in Docker
- Storage: AWS S3 with lifecycle policies
- Orchestration: Bash scripts + cron
- Notifications: Slack webhooks
- Backup tool: pg_dump (PostgreSQL's built-in tool)
Architecture Overview
The backup system has three main components:
- The Backup Script - Handles the actual `pg_dump`, S3 upload, and local cleanup. Includes retry logic for resilience.
- Cron Jobs - Schedule backups (production daily, staging weekly).
- Notification Webhook - POSTs status updates directly to a Slack webhook.
The flow is straightforward: cron triggers the script → the script backs up the database → uploads to S3 → sends a notification → cleans up old local files.
Implementation
Step 1: Configure AWS and Docker
AWS Setup:
First, configure the AWS CLI on your host machine and create S3 buckets for your backups. Install the AWS CLI first if it isn’t already installed.
```bash
aws configure
# Enter your credentials and region

# Test access
aws s3 ls
```
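The stack lists S3 lifecycle policies, but the commands for creating the buckets aren't shown here. As a rough sketch, assuming the `myapp-backups` bucket name that appears later in the script and a hypothetical 30-day retention (no retention period is stated in this post), the bucket and an expiry rule could be set up along these lines. The `lifecycle.json` file name, the rule ID, and both function names are placeholders.

```bash
# Hypothetical values: adjust the bucket name and retention to your setup.
S3_BUCKET="myapp-backups"
RETENTION_DAYS=30   # assumption: the post does not state a retention period

# Write a lifecycle rule that expires every object after RETENTION_DAYS days.
write_lifecycle_policy() {
    cat > "$1" <<EOF
{
  "Rules": [
    {
      "ID": "expire-old-backups",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Expiration": { "Days": ${RETENTION_DAYS} }
    }
  ]
}
EOF
}

# Create the bucket and attach the rule (needs working AWS credentials).
create_backup_bucket() {
    aws s3 mb "s3://${S3_BUCKET}"
    aws s3api put-bucket-lifecycle-configuration \
        --bucket "${S3_BUCKET}" \
        --lifecycle-configuration "file://$1"
}

write_lifecycle_policy lifecycle.json
# create_backup_bucket lifecycle.json   # run this once `aws s3 ls` works
```

Letting S3 expire old objects keeps storage costs bounded without the backup script having to prune remote copies itself.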
Docker Setup:
Add a volume mount to your database service in compose.yml:
```yaml
services:
  db:
    image: postgres:17.2-bookworm
    volumes:
      - db_data:/var/lib/postgresql/data
      - ./backups:/backups # Add this for backup files
    environment:
      POSTGRES_DB: myapp_db
      POSTGRES_USER: myapp_user
      POSTGRES_PASSWORD: ${DB_PASSWORD}

volumes:
  db_data:
```
Restart your containers to apply the changes:
```bash
docker compose up -d
```
> Production Note: I actually use Docker Swarm in production/staging for better orchestration. The backup strategy is pretty much the same: just update the container filter to `docker ps -qf "name=stackname_db.1"` instead of `"name=projectname_db"`. I'll cover this in the Production Considerations section.
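To make the filter swap in that note concrete, here is a small sketch of a lookup helper. `find_db_container` is a hypothetical name introduced for illustration; it wraps the same `docker ps -qf` call and returns a non-zero status when nothing matches, so a backup can stop early instead of running `docker exec` against an empty ID.

```bash
# Print the ID of the first running container whose name matches the filter.
# Returns non-zero if nothing matches, so callers can fail early.
find_db_container() {
    local name_filter=$1
    local container_id
    container_id=$(docker ps -qf "name=${name_filter}" | head -n 1)
    if [ -z "$container_id" ]; then
        echo "No running container matches name=${name_filter}" >&2
        return 1
    fi
    echo "$container_id"
}

# Compose:  find_db_container "projectname_db"
# Swarm:    find_db_container "stackname_db.1"
```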
Step 2: The Backup Script
Create ~/scripts/backup.sh with three key pieces:
- `run_backup()` - Executes `pg_dump` via `docker exec`, uploads to S3, cleans up old local files
- `send_notification()` - POSTs backup status directly to Slack
- Retry logic - Attempts the backup up to 10 times at 10-minute intervals
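As a hedged sketch of the first and last items, the dump-and-upload step and the retry wrapper might look roughly like this. It assumes `DB_NAME`, `DB_USER`, `S3_BUCKET`, `BACKUP_FILE`, `MAX_RETRIES`, and `RETRY_INTERVAL` are set by the surrounding script. `CONTAINER_FILTER`, `BACKUP_DIR`, `LOCAL_RETENTION_DAYS`, and the `backup_with_retries` name are hypothetical, and the custom dump format (`-Fc`) is only inferred from the `.dump` file extension.

```bash
# Hypothetical settings with defaults; override them to match your setup.
CONTAINER_FILTER="${CONTAINER_FILTER:-projectname_db}"
BACKUP_DIR="${BACKUP_DIR:-./backups}"          # host side of the /backups mount
LOCAL_RETENTION_DAYS="${LOCAL_RETENTION_DAYS:-7}"

run_backup() {
    local container_id
    container_id=$(docker ps -qf "name=${CONTAINER_FILTER}" | head -n 1)
    [ -n "$container_id" ] || return 1

    # Custom-format dump written into the mounted /backups directory.
    docker exec "$container_id" \
        pg_dump -U "$DB_USER" -d "$DB_NAME" -Fc -f "/backups/${BACKUP_FILE}" || return 1

    # Upload, then prune local copies older than the retention window.
    aws s3 cp "${BACKUP_DIR}/${BACKUP_FILE}" "s3://${S3_BUCKET}/${BACKUP_FILE}" || return 1
    find "$BACKUP_DIR" -name 'backup_*.dump' -mtime +"${LOCAL_RETENTION_DAYS}" -delete
}

# Call run_backup until it succeeds or MAX_RETRIES attempts are used up.
backup_with_retries() {
    local attempt
    for (( attempt = 1; attempt <= MAX_RETRIES; attempt++ )); do
        if run_backup; then
            echo "Backup succeeded on attempt ${attempt}"
            return 0
        fi
        echo "Attempt ${attempt}/${MAX_RETRIES} failed" >&2
        # Sleep between attempts, but not after the last one.
        if (( attempt < MAX_RETRIES )); then
            sleep "$RETRY_INTERVAL"
        fi
    done
    return 1
}
```

Calling `run_backup` inside an `if` keeps a failed attempt from aborting the whole script under `set -e`, which is what lets the loop move on to the next try.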
Here's the simplified structure:
```bash
#!/bin/bash
set -euo pipefail

ENV="${1:-prod}"
MAX_RETRIES=10
RETRY_INTERVAL=600 # 10 minutes

# load configs (Slack webhook, AWS region, etc.)
source ~/.backup_env

# environment-specific config
if [ "$ENV" == "prod" ]; then
    DB_NAME="myapp_prod"
    DB_USER="myapp_user"
    S3_BUCKET="myapp-backups"
else
    DB_NAME="myapp_staging"
    DB_USER="myapp_user"
    S3_BUCKET="myapp-staging-backups"
fi

BACKUP_FILE="backup_$(date +%Y-%m-%dT%H-%M-%S).dump"

send_notification() {
    local status=$1
    local error_msg=${2:-""}

    # determine color
    local color="good"
    [[ "$status" == *"failure"* ]] && color="danger"

    # build the Slack payload and POST it to "$SLACK_WEBHOOK_URL"
    # ...
}

# run_backup() and the retry loop follow
# ...
```

The script sources ~/.backup_env, so create that file with your Slack webhook URL and lock down its permissions:

```bash
cat > ~/.backup_env << 'EOF'
SLACK_WEBHOOK_URL="https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
EOF
chmod 600 ~/.backup_env # only the file's owner has read and write access
```
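For the notification step, here is one way `send_notification()` could build and send its payload. Treat it as a sketch under assumptions: it uses Slack's attachment format because the script sets a `good`/`danger` color, the message wording is made up for illustration, and it expects `ENV`, `BACKUP_FILE`, and `SLACK_WEBHOOK_URL` to be set as described above.

```bash
send_notification() {
    local status=$1
    local error_msg=${2:-""}

    local color="good"
    [[ "$status" == *"failure"* ]] && color="danger"

    # Illustrative message text; not the wording of the original script.
    local text="Backup ${status} for ${ENV}: ${BACKUP_FILE}"
    [ -n "$error_msg" ] && text="${text} (${error_msg})"

    # Escape backslashes and double quotes so the text is safe inside JSON.
    # Newlines in the message are not handled by this sketch.
    text=$(printf '%s' "$text" | sed 's/\\/\\\\/g; s/"/\\"/g')

    local payload
    payload=$(cat <<EOF
{"attachments": [{"color": "${color}", "text": "${text}"}]}
EOF
)
    curl -sS -X POST -H 'Content-Type: application/json' \
        --data "$payload" "$SLACK_WEBHOOK_URL"
}

# Example calls:
#   send_notification "success"
#   send_notification "failure" "upload to S3 failed"
```

Declaring `payload` before assigning it keeps `local` from masking the exit status of the command substitution, which matters when the script runs under `set -e`.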
Step 3: Automation with Cron
Schedule the backups to run automatically:
```bash
crontab -e
# add these lines:

# Production - daily at 2 AM
0 2 * * * /home/username/scripts/backup.sh prod >> /home/username/.db_backup_logs/backup-prod.log 2>&1

# Staging - Sundays at 2 AM
0 2 * * 0 /home/username/scripts/backup.sh staging >> /home/username/.db_backup_logs/backup-staging.log 2>&1

# Cleanup old logs - daily at 3 AM
0 3 * * * find /home/username/.db_backup_logs -name "*.log" -mtime +30 -delete
```

Create the log directory so the redirects have somewhere to write:

```bash
mkdir -p ~/.db_backup_logs
```
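One thing to watch with the cleanup job: a log file that gets appended to on every run always has a fresh modification time, so `-mtime +30` will not match it and the file keeps growing. A possible variation, sketched here with a hypothetical `daily_log_path` helper, is to write one log file per day so that old files actually age out.

```bash
# Build a per-day log path such as <log_dir>/backup-prod-2025-12-13.log
daily_log_path() {
    local log_dir=$1 env=$2
    echo "${log_dir}/backup-${env}-$(date +%F).log"
}

# Crontab equivalent (note that % must be escaped as \% inside a crontab):
# 0 2 * * * /home/username/scripts/backup.sh prod >> /home/username/.db_backup_logs/backup-prod-$(date +\%F).log 2>&1
```

With date-stamped names, the existing `find ... -mtime +30 -delete` line removes each day's file once it is more than 30 days old.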
Step 4: The Restoration Script
Backups are useless if you can't restore them. You could do it manually, but why not just use Bash scripting as well? Create ~/scripts/db-restore.sh:
```bash
#!/bin/bash
set -euo pipefail

# Usage: ./db-restore.sh prod backup_2025-12-13T02-00-00.dump [--full]
ENV="${1}"
BACKUP_FILE="${2}"
FULL_RESTORE="${3:-}"

# Download from S3
aws s3 cp "s3://myapp-backups/$BACKUP_FILE" ~/restore/

# Copy to container
CONTAINER_ID=$(docker ps -qf "name=db")
docker cp ~/restore/"$BACKUP_FILE" "$CONTAINER_ID":/tmp/

# Restore (with confirmation prompt)
if [ "$FULL_RESTORE" == "--full" ]; then
    # Drop and recreate database
    docker exec -i "$CONTAINER_ID" psql -U postgres ...
fi
# ...
```

Originally published at blog.theolujay.dev