# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Overview
This is a home server Infrastructure as Code (IAC) repository that manages a K3s Kubernetes cluster deployment with various self-hosted applications. The setup combines Ansible for system configuration and Kubernetes manifests for application deployment.
**Quick Links:**
- Storage and Backup Guide - Detailed storage, bcache, backup, and drive health information
- Ansible Review - Ansible configuration review and updates (October 2025)
- Ansible Maintenance Roles - Automated setup for maintenance, backups, and monitoring
- Certificate Management - K3s certificate rotation
- System Maintenance - Automated maintenance tasks
## Architecture

### Core Infrastructure
- K3s: Lightweight Kubernetes distribution running on home server hardware
- Ansible: System configuration and K3s cluster setup
- Traefik: Ingress controller (default with K3s)
- Cert-manager: TLS certificate management
### Application Stack
- Core services: Bitwarden (password manager), Nextcloud (file storage), Home Assistant
- Media stack: Jellyfin, Sonarr, Deluge, Openbooks, Jackett, Transmission
- Development: Gitea (git hosting), Gitea Runner (CI/CD), Homarr (dashboard)
- Fun/personal: Horchposten, JNR-Web (locally built images)
- Infrastructure: Cloudflare Tunnel for external access
### Storage Architecture

- **NVMe SSD**: Kingston SNV2S250G (233GB)
  - `/dev/nvme0n1p1` (512MB) → `/boot/efi`
  - `/dev/nvme0n1p2` (70GB) → `/` (root filesystem)
  - `/dev/nvme0n1p3` (163GB) → bcache cache device
- **HDD 1**: Seagate IronWolf ST4000VN006 (4TB)
  - `/dev/sda1` → `/mnt/backup-mirror` (backup mirror)
- **HDD 2**: Seagate IronWolf ST4000VN006 (4TB)
  - `/dev/sdb` → bcache backing device
  - Combined with NVMe cache → `/dev/bcache0` → `/mnt/bcache` (main data storage)
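To confirm the layout above matches the live system, a quick read-only check with standard util-linux tooling:

```shell
# Show the block-device tree with sizes and mount points (read-only)
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT /dev/nvme0n1 /dev/sda /dev/sdb
```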
**Bcache Configuration:**

- Cache mode: `writearound` (writes go to the backing device, reads are cached)
- Cache set UUID: `74a7d177-65f4-4902-9fe5-e596602c28d4`
- Provides SSD-accelerated storage for Kubernetes persistent volumes
**Bcache Management:**

```bash
# Check bcache status
cat /sys/block/bcache0/bcache/state

# Check cache mode
cat /sys/block/bcache0/bcache/cache_mode

# View cache statistics
cat /sys/block/bcache0/bcache/stats_total/cache_hits
cat /sys/block/bcache0/bcache/stats_total/cache_misses

# If the cache is detached (state shows "no cache"), re-attach it:
echo "74a7d177-65f4-4902-9fe5-e596602c28d4" | sudo tee /sys/block/bcache0/bcache/attach

# Verify the cache is attached (state should show "clean" or "dirty")
cat /sys/block/bcache0/bcache/state
```
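The cache mode can also be changed at runtime through the same sysfs interface. A sketch (note: `writeback` caches writes too, but risks data loss if the SSD fails before dirty data is flushed, which is presumably why this setup uses `writearound`):

```shell
# Switch cache mode at runtime (takes effect immediately; not persistent
# across cache re-registration unless set again)
echo writeback | sudo tee /sys/block/bcache0/bcache/cache_mode

# Revert to the mode used in this setup
echo writearound | sudo tee /sys/block/bcache0/bcache/cache_mode
```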
## Common Commands

### Ansible Operations
**Note:** Ansible configuration was last reviewed and updated in October 2025. See `Ansible/ANSIBLE_REVIEW_2025.md` for details.

**Maintenance Automation:** All automated maintenance, backups, and monitoring can now be deployed via Ansible. See `Ansible/MAINTENANCE_ROLES_README.md` for details.
```bash
# Setup automated maintenance, backups, and monitoring
ansible-playbook -i Ansible/inventory.ini Ansible/setup_maintenance.yml --check --diff
ansible-playbook -i Ansible/inventory.ini Ansible/setup_maintenance.yml

# Setup only specific components
ansible-playbook -i Ansible/inventory.ini Ansible/setup_maintenance.yml --tags backup
ansible-playbook -i Ansible/inventory.ini Ansible/setup_maintenance.yml --tags smart
ansible-playbook -i Ansible/inventory.ini Ansible/setup_maintenance.yml --tags maintenance

# Deploy complete home server setup (system + K3s)
ansible-playbook -i Ansible/inventory.ini Ansible/setup_home_server.yml --check --diff
ansible-playbook -i Ansible/inventory.ini Ansible/setup_home_server.yml

# Update K3s cluster
ansible-playbook -i Ansible/inventory.ini Ansible/update_k3s.yaml

# Check bcache status
ansible-playbook -i Ansible/inventory.ini Ansible/setup_maintenance.yml --tags bcache
```
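Before running a playbook, connectivity and syntax can be verified first. These are standard Ansible invocations (adjust the inventory path if yours differs):

```shell
# Verify all inventory hosts are reachable
ansible -i Ansible/inventory.ini all -m ping

# Validate playbook syntax without executing anything
ansible-playbook -i Ansible/inventory.ini Ansible/setup_maintenance.yml --syntax-check

# List the tags a playbook supports
ansible-playbook -i Ansible/inventory.ini Ansible/setup_maintenance.yml --list-tags
```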
### Kubernetes Operations

```bash
# Apply all manifests in a directory
kubectl apply -f k8s/core/bitwarden/

# Deploy Nextcloud via Helm
helm upgrade --install nextcloud ./k8s/nextcloud/

# Check cluster status
kubectl get nodes
kubectl get pods -A

# Access secrets (use with caution)
./k8s/secret-dump.sh
```
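When changing existing manifests, it can be safer to preview first; `kubectl diff` and server-side dry-runs are built into kubectl:

```shell
# Preview what would change before applying
kubectl diff -f k8s/core/bitwarden/

# Validate against the API server without persisting anything
kubectl apply -f k8s/core/bitwarden/ --dry-run=server
```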
### Infrastructure Management

```bash
# Install Helm (if needed)
./get_helm.sh
```
## Key Directories

- `Ansible/`: System configuration, K3s deployment, user management
- `k8s/core/`: Essential cluster services (cert-manager, bitwarden, etc.)
- `k8s/media/`: Media server applications
- `k8s/lab/`: Development and experimental services (Gitea, Gitea Runner, Multus)
- `k8s/fun/`: Personal projects (Horchposten, JNR-Web)
- `k8s/nextcloud/`: Helm chart for Nextcloud deployment
- `certs/`: TLS certificates for services
## Inventory & Configuration

- Primary server: `kimchi` (192.168.178.55) - x86_64 K3s master
- Secondary: `pi-one` (192.168.178.11) - ARM device
- Ansible inventory: `Ansible/inventory.ini`
- K3s config location: `/var/lib/rancher/k3s/server/manifests/`
## Important Notes

- K3s automatically deploys manifests placed in `/var/lib/rancher/k3s/server/manifests/`
- Traefik ingress controller is pre-installed with K3s
- SSH port configuration may vary (see ssh_juggle_port.yml)
- Persistent volumes use local storage with specific node affinity
- Some services have hardcoded passwords/credentials (should be moved to secrets)
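One way to migrate the hardcoded credentials noted above into Kubernetes Secrets (a sketch; `app-credentials`, the key, and the output path are hypothetical placeholders, not objects that exist in this repo):

```shell
# Generate a Secret manifest locally; --dry-run=client never contacts the cluster
kubectl create secret generic app-credentials \
  --from-literal=ADMIN_PASSWORD='change-me' \
  --dry-run=client -o yaml > k8s/core/app-credentials.yaml

# Review the generated manifest, then apply it
kubectl apply -f k8s/core/app-credentials.yaml
```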
## Certificate Management

K3s certificates expire and need periodic rotation. If kubectl commands fail with authentication errors like:

```
x509: certificate has expired or is not yet valid
```
**Solution:**

```bash
# Rotate certificates
sudo k3s certificate rotate

# Restart K3s service
sudo systemctl restart k3s

# Update kubeconfig with new certificates
sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
sudo chown $USER:$USER ~/.kube/config
```
**Signs of certificate expiration:**
- kubectl commands return "the server has asked for the client to provide credentials"
- K3s logs show x509 certificate expired errors
- Unable to access cluster API even from the master node
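Certificate expiry can also be inspected directly with openssl. The path below is where a default K3s install keeps its API server serving certificate (verify on your node):

```shell
# Print the expiry date of the K3s API server certificate
sudo openssl x509 -noout -enddate \
  -in /var/lib/rancher/k3s/server/tls/serving-kube-apiserver.crt
```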
### Automated Certificate Management

Automatic rotation is now configured using systemd timers:
**Scripts Available:**

```bash
# Automated rotation script (runs monthly via systemd timer)
./scripts/k3s-cert-rotate.sh [--force] [--dry-run]

# Certificate monitoring script
./scripts/k3s-cert-check.sh
```
**Manual Management:**

```bash
# Check systemd timer status
sudo systemctl status k3s-cert-rotation.timer

# See next scheduled run
systemctl list-timers k3s-cert-rotation.timer

# Manual rotation (if needed)
sudo ./scripts/k3s-cert-rotate.sh --force

# Check certificate status
sudo ./scripts/k3s-cert-check.sh
```
**Configuration:**

- Automatic rotation: Monthly via systemd timer
- Rotation threshold: 30 days before expiration
- Backup location: `/var/lib/rancher/k3s/server/cert-backups/`
- Logs: `/var/log/k3s-cert-rotation.log`
The automation includes:
- Certificate expiration checking
- Automatic backup before rotation
- Service restart and kubeconfig updates
- Health verification after rotation
- Logging and notifications
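The timer unit behind `k3s-cert-rotation.timer` likely resembles the following sketch (assumed, not the deployed file; the `OnCalendar` value is inferred from the "1st of month, 12:39 AM" schedule above):

```ini
# /etc/systemd/system/k3s-cert-rotation.timer (sketch)
[Unit]
Description=Monthly K3s certificate rotation

[Timer]
OnCalendar=*-*-01 00:39:00
Persistent=true

[Install]
WantedBy=timers.target
```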
## Drive Health Monitoring

### SMART Monitoring with smartmontools

Automatic SMART monitoring is configured for all drives:

**Configuration:** `/etc/smartd.conf`

**Monitored Drives:**

- `/dev/sda` - Seagate IronWolf 4TB (backup mirror)
- `/dev/sdb` - Seagate IronWolf 4TB (bcache backing device)
- `/dev/nvme0n1` - Kingston SNV2S250G 233GB (cache + system)
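The smartd.conf entries probably resemble this sketch (assumed, not the deployed file): `-s (S/../.././02|L/../../6/03)` encodes short tests daily at 02:00 and long tests Saturdays at 03:00, and `-W` sets temperature info/critical thresholds:

```
# /etc/smartd.conf (sketch)
/dev/sda     -a -o on -S on -s (S/../.././02|L/../../6/03) -W 4,40,45 -m root
/dev/sdb     -a -o on -S on -s (S/../.././02|L/../../6/03) -W 4,40,45 -m root
/dev/nvme0n1 -a -s (S/../.././02|L/../../6/03) -W 4,55,60 -m root
```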
**Monitoring Features:**
- Daily short self-tests at 2:00 AM
- Weekly long self-tests on Saturdays at 3:00 AM
- Temperature monitoring (warns at 45°C for HDDs, 60°C for NVMe)
- Automatic alerts to syslog for any SMART failures
- Tracks reallocated sectors and pending sectors
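When scripting checks on top of smartd, attribute values can be pulled out of `smartctl -a` output with awk. A small helper (hypothetical, shown with a canned sample line so it runs without a real drive):

```shell
# Extract the raw value of a named SMART attribute from `smartctl -a` output
smart_attr() {
  awk -v attr="$1" '$2 == attr {print $NF}'
}

# Sample line in the format smartctl prints for ATA attribute tables
sample='  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0'

# On a live system: sudo smartctl -a /dev/sda | smart_attr Reallocated_Sector_Ct
echo "$sample" | smart_attr Reallocated_Sector_Ct
```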
**Useful Commands:**

```bash
# Check drive health status
sudo smartctl -H /dev/sda
sudo smartctl -H /dev/sdb
sudo smartctl -H /dev/nvme0n1

# View full SMART attributes
sudo smartctl -a /dev/sda

# Check service status
systemctl status smartmontools

# View SMART logs
sudo journalctl -u smartmontools
```
**Monitored Drives (continuous):**

- `/dev/sdb`: ~20,000 power-on hours, 42°C, 0 reallocated sectors ✓
- `/dev/nvme0n1`: 4% wear level, 58°C, 100% spare available ✓
**Backup Drive (`/dev/sda`):**
- Not continuously monitored - spins down after 10 minutes to save energy
- Health checked automatically during weekly backup runs
- Expected stats: ~18,000 power-on hours, 37°C, 0 reallocated sectors ✓
- Power savings: ~35-50 kWh/year
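The spin-down behaviour is typically configured with hdparm (a sketch; `-S 120` means 120 × 5 s = 10 minutes, matching the timeout above):

```shell
# Set /dev/sda to spin down after 10 minutes of inactivity
sudo hdparm -S 120 /dev/sda

# Check the drive's current power state without waking it
sudo hdparm -C /dev/sda
```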
## Networking
- Cloudflare Tunnel provides external access
- Gitea uses custom SSH port 55522
- Internal cluster networking via Traefik ingress
- TLS termination handled by cert-manager + Let's Encrypt
## System Maintenance

### Automated Tasks Summary

The home server runs the following automated maintenance tasks:
| Task | Schedule | Purpose | Log Location |
|---|---|---|---|
| K3s Certificate Rotation | Monthly (1st of month, 12:39 AM) | Rotates K3s certificates if expiring within 30 days | `/var/log/k3s-cert-rotation.log` |
| System Maintenance | Quarterly (Jan/Apr/Jul/Oct 1, 3:00 AM) | Prunes images, cleans logs, runs apt cleanup | `/var/log/k3s-maintenance.log` |
| Backup Mirror Sync | Weekly (Sundays, 2:00 AM) | Syncs /mnt/bcache to /mnt/backup-mirror | `/var/log/backup-mirror-sync.log` |
| SMART Self-Tests | Daily short (2:00 AM), Weekly long (Sat 3:00 AM) | Tests drive health | `journalctl -u smartmontools` |
**Check all timers:**

```bash
systemctl list-timers
```
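systemd can confirm what a given `OnCalendar` expression resolves to, which is handy when auditing the schedules above:

```shell
# Show the next elapse time for the quarterly maintenance schedule
systemd-analyze calendar "*-01,04,07,10-01 03:00:00"

# And for the weekly backup window
systemd-analyze calendar "Sun *-*-* 02:00:00"
```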
### Automated Quarterly Maintenance

Automatic maintenance is now configured using systemd timers:

**Scripts Available:**

```bash
# Quarterly maintenance script (runs Jan 1, Apr 1, Jul 1, Oct 1 at 3:00 AM)
/usr/local/bin/k3s-maintenance.sh

# Check maintenance timer status
systemctl list-timers k3s-maintenance.timer
systemctl status k3s-maintenance.timer
```
**What it does:**

- Prunes unused container images
- Cleans journal logs (keeps 30 days)
- Runs apt autoremove and autoclean
- Logs to `/var/log/k3s-maintenance.log`
**Configuration:**

- Schedule: Quarterly (January 1, April 1, July 1, October 1 at 3:00 AM)
- Script location: `/usr/local/bin/k3s-maintenance.sh`
- Service files: `/etc/systemd/system/k3s-maintenance.{service,timer}`
- Log location: `/var/log/k3s-maintenance.log`
### Manual Maintenance Tasks

**System Updates (recommended monthly):**

```bash
sudo apt update && sudo apt upgrade -y
sudo apt autoremove -y
```
**Note:** The old Kubernetes APT repository (`apt.kubernetes.io`) has been removed as it was deprecated. K3s provides its own kubectl via symlink at `/usr/local/bin/kubectl` → `/usr/local/bin/k3s`.
**Disk Cleanup (as needed):**

```bash
# Clean old journal logs
sudo journalctl --vacuum-time=30d

# Prune container images
sudo crictl rmi --prune

# Check disk usage
df -h /
```
**Important Locations:**

- Root partition: `/dev/nvme0n1p2` (69GB total)
- Data storage: `/mnt/bcache` (3.6TB bcache)
- Backup mirror: `/mnt/backup-mirror` (3.6TB)
- K3s data: `/var/lib/rancher/k3s/`
- Journal logs: `/var/log/journal/`
## Backup System

### Automated Weekly Backups

Automatic backups are configured using rsync and systemd timers:
**Scripts Available:**

```bash
# Weekly backup script (runs Sundays at 2:00 AM)
/usr/local/bin/backup-mirror-sync.sh

# Check backup timer status
systemctl list-timers backup-mirror-sync.timer
systemctl status backup-mirror-sync.timer

# View backup logs
sudo tail -f /var/log/backup-mirror-sync.log

# Run manual backup
sudo /usr/local/bin/backup-mirror-sync.sh
```
**Configuration:**

- Source: `/mnt/bcache/` (main data storage with bcache)
- Destination: `/mnt/backup-mirror/` (4TB mirror on `/dev/sda1`)
- Schedule: Weekly on Sundays at 2:00 AM
- Method: Rsync with incremental sync and deletion of removed files
- Script location: `/usr/local/bin/backup-mirror-sync.sh`
- Service files: `/etc/systemd/system/backup-mirror-sync.{service,timer}`
- Log location: `/var/log/backup-mirror-sync.log`
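The script presumably follows the usual guarded-rsync pattern; a minimal sketch (assumed structure, not the deployed `backup-mirror-sync.sh`). The mountpoint guard matters: without it, an unmounted backup disk would make rsync fill the root filesystem instead.

```shell
# Sketch of a guarded mirror sync (assumed; the deployed script may differ)
sync_mirror() {
  src="$1" dst="$2"
  # Refuse to run if the destination is not a real mount point
  mountpoint -q "$dst" || { echo "destination not mounted" >&2; return 1; }
  # Mirror, deleting files that were removed from the source
  rsync -avh --delete "$src/" "$dst/"
}

# Run on the server as root: sync_mirror /mnt/bcache /mnt/backup-mirror
```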
**What it backs up:**
- All Kubernetes persistent volume data
- Nextcloud files (2TB)
- Jellyfin media library
- Gitea repositories
- Bitwarden data
- Home Assistant configuration
- All other application data on `/mnt/bcache`
**Restore Process:**

In case of main drive failure, data can be restored from `/mnt/backup-mirror`:

```bash
# Verify backup mount
mountpoint /mnt/backup-mirror

# Restore all data (if bcache is rebuilt)
sudo rsync -avh --delete /mnt/backup-mirror/ /mnt/bcache/

# Or restore specific directories
sudo rsync -avh /mnt/backup-mirror/nextcloud/ /mnt/bcache/nextcloud/
```
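Before a full restore it is worth previewing the transfer, since `--delete` on a reversed sync will remove anything on the target that is not in the backup:

```shell
# Dry run first: -n lists what would be transferred/deleted without doing it
sudo rsync -avhn --delete /mnt/backup-mirror/ /mnt/bcache/
```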