CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Overview

This is a home server Infrastructure as Code (IAC) repository that manages a K3s Kubernetes cluster deployment with various self-hosted applications. The setup combines Ansible for system configuration and Kubernetes manifests for application deployment.

Architecture

Core Infrastructure

  • K3s: Lightweight Kubernetes distribution running on home server hardware
  • Ansible: System configuration and K3s cluster setup
  • Traefik: Ingress controller (default with K3s)
  • Cert-manager: TLS certificate management

Application Stack

  • Core services: Bitwarden (password manager), Nextcloud (file storage), Home Assistant
  • Media stack: Jellyfin, Sonarr, Deluge, Openbooks, Jackett, Transmission
  • Development: Gitea (git hosting), Gitea Runner (CI/CD), Homarr (dashboard)
  • Fun/personal: Horchposten, JNR-Web (locally built images)
  • Infrastructure: Cloudflare Tunnel for external access

Storage Architecture

  • NVMe SSD: Kingston SNV2S250G (233GB)
    • /dev/nvme0n1p1 (512MB) → /boot/efi
    • /dev/nvme0n1p2 (70GB) → / (root filesystem)
    • /dev/nvme0n1p3 (163GB) → bcache cache device
  • HDD 1: Seagate IronWolf ST4000VN006 (4TB)
    • /dev/sda1 → /mnt/backup-mirror (backup mirror)
  • HDD 2: Seagate IronWolf ST4000VN006 (4TB)
    • /dev/sdb → bcache backing device
    • Combined with NVMe cache → /dev/bcache0, mounted at /mnt/bcache (main data storage)

Bcache Configuration:

  • Cache mode: writearound (writes go to backing device, reads are cached)
  • Cache set UUID: 74a7d177-65f4-4902-9fe5-e596602c28d4
  • Provides SSD-accelerated storage for Kubernetes persistent volumes

Bcache Management:

# Check bcache status
cat /sys/block/bcache0/bcache/state

# Check cache mode
cat /sys/block/bcache0/bcache/cache_mode

# View cache statistics
cat /sys/block/bcache0/bcache/stats_total/cache_hits
cat /sys/block/bcache0/bcache/stats_total/cache_misses

# If cache is detached (shows "no cache"), re-attach it:
echo "74a7d177-65f4-4902-9fe5-e596602c28d4" | sudo tee /sys/block/bcache0/bcache/attach

# Verify cache attached (should show "clean" or "dirty")
cat /sys/block/bcache0/bcache/state
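The check-and-warn steps above can be wrapped in a small helper for ad-hoc use (a hypothetical script, not part of the repo; the sysfs path argument exists so it can be pointed at a test fixture):

```shell
# Hypothetical helper: report whether the bcache cache set is attached.
# Defaults to the live sysfs path; pass another directory for testing.
bcache_check() {
  sysdir="${1:-/sys/block/bcache0/bcache}"
  state=$(cat "$sysdir/state" 2>/dev/null) || { echo "bcache: no state file in $sysdir"; return 1; }
  case "$state" in
    clean|dirty) echo "bcache OK: $state" ;;                     # cache attached
    *) echo "bcache WARNING: state '$state' (cache may be detached)" ;;
  esac
}

# Example: bcache_check                      # live system
# Example: bcache_check /tmp/fake-bcache    # fixture directory
```

If the warning fires, re-attach the cache set with the `tee .../attach` command shown above.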

Common Commands

Ansible Operations

Note: Ansible configuration was last reviewed and updated in October 2025. See Ansible/ANSIBLE_REVIEW_2025.md for details.

Maintenance Automation: All automated maintenance, backups, and monitoring can now be deployed via Ansible. See Ansible/MAINTENANCE_ROLES_README.md for details.

# Setup automated maintenance, backups, and monitoring
ansible-playbook -i Ansible/inventory.ini Ansible/setup_maintenance.yml --check --diff
ansible-playbook -i Ansible/inventory.ini Ansible/setup_maintenance.yml

# Setup only specific components
ansible-playbook -i Ansible/inventory.ini Ansible/setup_maintenance.yml --tags backup
ansible-playbook -i Ansible/inventory.ini Ansible/setup_maintenance.yml --tags smart
ansible-playbook -i Ansible/inventory.ini Ansible/setup_maintenance.yml --tags maintenance

# Deploy complete home server setup (system + K3s)
ansible-playbook -i Ansible/inventory.ini Ansible/setup_home_server.yml --check --diff
ansible-playbook -i Ansible/inventory.ini Ansible/setup_home_server.yml

# Update K3s cluster
ansible-playbook -i Ansible/inventory.ini Ansible/update_k3s.yaml

# Check bcache status
ansible-playbook -i Ansible/inventory.ini Ansible/setup_maintenance.yml --tags bcache

Kubernetes Operations

# Apply all manifests in a directory
kubectl apply -f k8s/core/bitwarden/

# Deploy Nextcloud via Helm
helm upgrade --install nextcloud ./k8s/nextcloud/

# Check cluster status
kubectl get nodes
kubectl get pods -A

# Access secrets (use with caution)
./k8s/secret-dump.sh

Infrastructure Management

# Install Helm (if needed)
./get_helm.sh

Key Directories

  • Ansible/: System configuration, K3s deployment, user management
  • k8s/core/: Essential cluster services (cert-manager, bitwarden, etc.)
  • k8s/media/: Media server applications
  • k8s/lab/: Development and experimental services (Gitea, Gitea Runner, Multus)
  • k8s/fun/: Personal projects (Horchposten, JNR-Web)
  • k8s/nextcloud/: Helm chart for Nextcloud deployment
  • certs/: TLS certificates for services

Inventory & Configuration

  • Primary server: kimchi (192.168.178.55) - x86_64 K3s master
  • Secondary: pi-one (192.168.178.11) - ARM device
  • Ansible inventory: Ansible/inventory.ini
  • K3s config location: /var/lib/rancher/k3s/server/manifests/
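Given the hosts above, Ansible/inventory.ini likely looks something like this (a sketch only; group names and variable choices are assumptions, not copied from the repo):

```ini
; Assumed layout, not the actual file contents
[homeserver]
kimchi ansible_host=192.168.178.55

[pi]
pi-one ansible_host=192.168.178.11
```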

Important Notes

  • K3s automatically deploys manifests placed in /var/lib/rancher/k3s/server/manifests/
  • Traefik ingress controller is pre-installed with K3s
  • SSH port configuration may vary (see ssh_juggle_port.yml)
  • Persistent volumes use local storage with specific node affinity
  • Some services have hardcoded passwords/credentials (should be moved to secrets)

Certificate Management

K3s certificates expire and need periodic rotation. If kubectl commands fail with authentication errors like:

x509: certificate has expired or is not yet valid

Solution:

# Rotate certificates
sudo k3s certificate rotate

# Restart K3s service
sudo systemctl restart k3s

# Update kubeconfig with new certificates
sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
sudo chown $USER:$USER ~/.kube/config

Signs of certificate expiration:

  • kubectl commands return "the server has asked for the client to provide credentials"
  • K3s logs show x509 certificate expired errors
  • Unable to access cluster API even from the master node

Automated Certificate Management

Automatic rotation is now configured using systemd timers:

Scripts Available:

# Automated rotation script (runs monthly via systemd timer)
./scripts/k3s-cert-rotate.sh [--force] [--dry-run]

# Certificate monitoring script 
./scripts/k3s-cert-check.sh

Manual Management:

# Check systemd timer status
sudo systemctl status k3s-cert-rotation.timer

# See next scheduled run
systemctl list-timers k3s-cert-rotation.timer

# Manual rotation (if needed)
sudo ./scripts/k3s-cert-rotate.sh --force

# Check certificate status
sudo ./scripts/k3s-cert-check.sh

Configuration:

  • Automatic rotation: Monthly via systemd timer
  • Rotation threshold: 30 days before expiration
  • Backup location: /var/lib/rancher/k3s/server/cert-backups/
  • Logs: /var/log/k3s-cert-rotation.log

The automation includes:

  • Certificate expiration checking
  • Automatic backup before rotation
  • Service restart and kubeconfig updates
  • Health verification after rotation
  • Logging and notifications
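The timer half of that automation can be sketched as a systemd unit (assumed contents derived from the documented monthly 1st-of-month 12:39 AM schedule; the real unit ships with the repo's scripts):

```ini
# k3s-cert-rotation.timer (sketch)
[Unit]
Description=Monthly K3s certificate rotation

[Timer]
OnCalendar=*-*-01 00:39:00
Persistent=true

[Install]
WantedBy=timers.target
```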

Drive Health Monitoring

SMART Monitoring with smartmontools

Automatic SMART monitoring is configured for all drives:

Configuration: /etc/smartd.conf

Monitored Drives:

  • /dev/sda - Seagate IronWolf 4TB (backup mirror)
  • /dev/sdb - Seagate IronWolf 4TB (bcache backing device)
  • /dev/nvme0n1 - Kingston SNV2S250G 233GB (cache + system)

Monitoring Features:

  • Daily short self-tests at 2:00 AM
  • Weekly long self-tests on Saturdays at 3:00 AM
  • Temperature monitoring (warns at 45°C for HDDs, 60°C for NVMe)
  • Automatic alerts to syslog for any SMART failures
  • Tracks reallocated sectors and pending sectors
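The schedule and thresholds above correspond to smartd.conf directives along these lines (assumed, not copied from the server; the -s expression is standard smartd syntax for a daily short test at 02:00 and a long test Saturdays at 03:00):

```ini
# Sketch of matching /etc/smartd.conf entries
/dev/sda     -a -s (S/../.././02|L/../../6/03) -W 0,45,50
/dev/sdb     -a -s (S/../.././02|L/../../6/03) -W 0,45,50
/dev/nvme0n1 -a -s (S/../.././02|L/../../6/03) -W 0,60,65
```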

Useful Commands:

# Check drive health status
sudo smartctl -H /dev/sda
sudo smartctl -H /dev/sdb
sudo smartctl -H /dev/nvme0n1

# View full SMART attributes
sudo smartctl -a /dev/sda

# Check service status
systemctl status smartmontools

# View SMART logs
sudo journalctl -u smartmontools

Monitored Drives (continuous):

  • /dev/sdb: ~20,000 power-on hours, 42°C, 0 reallocated sectors ✓
  • /dev/nvme0n1: 4% wear level, 58°C, 100% spare available ✓

Backup Drive (/dev/sda):

  • Not continuously monitored - spins down after 10 minutes to save energy
  • Health checked automatically during weekly backup runs
  • Expected stats: ~18,000 power-on hours, 37°C, 0 reallocated sectors ✓
  • Power savings: ~35-50 kWh/year

Networking

  • Cloudflare Tunnel provides external access
  • Gitea uses custom SSH port 55522
  • Internal cluster networking via Traefik ingress
  • TLS termination handled by cert-manager + Let's Encrypt

System Maintenance

Automated Tasks Summary

The home server runs four automated maintenance tasks:

  • K3s Certificate Rotation: monthly (1st of month, 12:39 AM); rotates K3s certificates if expiring within 30 days; logs to /var/log/k3s-cert-rotation.log
  • System Maintenance: quarterly (Jan/Apr/Jul/Oct 1, 3:00 AM); prunes images, cleans logs, runs apt cleanup; logs to /var/log/k3s-maintenance.log
  • Backup Mirror Sync: weekly (Sundays, 2:00 AM); syncs /mnt/bcache to /mnt/backup-mirror; logs to /var/log/backup-mirror-sync.log
  • SMART Self-Tests: daily short (2:00 AM) and weekly long (Saturdays, 3:00 AM); tests drive health; logs via journalctl -u smartmontools

Check all timers:

systemctl list-timers

Automated Quarterly Maintenance

Automatic maintenance is now configured using systemd timers:

Scripts Available:

# Quarterly maintenance script (runs Jan 1, Apr 1, Jul 1, Oct 1 at 3:00 AM)
/usr/local/bin/k3s-maintenance.sh

# Check maintenance timer status
systemctl list-timers k3s-maintenance.timer
systemctl status k3s-maintenance.timer

What it does:

  • Prunes unused container images
  • Cleans journal logs (keeps 30 days)
  • Runs apt autoremove and autoclean
  • Logs to /var/log/k3s-maintenance.log
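The script's core is likely little more than the following (an assumed shape, not the repo's actual /usr/local/bin/k3s-maintenance.sh; the DRY_RUN flag is illustrative):

```shell
# Print commands instead of running them when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "would run: $*"; else "$@"; fi; }

k3s_maintenance() {
  run crictl rmi --prune            # drop unused container images
  run journalctl --vacuum-time=30d  # keep 30 days of journal logs
  run apt-get -y autoremove         # remove orphaned packages
  run apt-get -y autoclean          # clear stale package cache
}

# Preview what would happen:
# DRY_RUN=1 k3s_maintenance
```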

Configuration:

  • Schedule: Quarterly (January 1, April 1, July 1, October 1 at 3:00 AM)
  • Script location: /usr/local/bin/k3s-maintenance.sh
  • Service files: /etc/systemd/system/k3s-maintenance.{service,timer}
  • Log location: /var/log/k3s-maintenance.log

Manual Maintenance Tasks

System Updates (recommended monthly):

sudo apt update && sudo apt upgrade -y
sudo apt autoremove -y

Note: The old Kubernetes APT repository (apt.kubernetes.io) has been removed as it was deprecated. K3s provides its own kubectl via symlink at /usr/local/bin/kubectl -> /usr/local/bin/k3s.

Disk Cleanup (as needed):

# Clean old journal logs
sudo journalctl --vacuum-time=30d

# Prune container images
sudo crictl rmi --prune

# Check disk usage
df -h /

Important Locations:

  • Root partition: /dev/nvme0n1p2 (69GB total)
  • Data storage: /mnt/bcache (3.6TB bcache)
  • Backup mirror: /mnt/backup-mirror (3.6TB)
  • K3s data: /var/lib/rancher/k3s/
  • Journal logs: /var/log/journal/

Backup System

Automated Weekly Backups

Automatic backups are configured using rsync and systemd timers:

Scripts Available:

# Weekly backup script (runs Sundays at 2:00 AM)
/usr/local/bin/backup-mirror-sync.sh

# Check backup timer status
systemctl list-timers backup-mirror-sync.timer
systemctl status backup-mirror-sync.timer

# View backup logs
sudo tail -f /var/log/backup-mirror-sync.log

# Run manual backup
sudo /usr/local/bin/backup-mirror-sync.sh

Configuration:

  • Source: /mnt/bcache/ (main data storage with bcache)
  • Destination: /mnt/backup-mirror/ (4TB mirror on /dev/sda1)
  • Schedule: Weekly on Sundays at 2:00 AM
  • Method: Rsync with incremental sync and deletion of removed files
  • Script location: /usr/local/bin/backup-mirror-sync.sh
  • Service files: /etc/systemd/system/backup-mirror-sync.{service,timer}
  • Log location: /var/log/backup-mirror-sync.log
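The sync presumably reduces to a guarded rsync (an assumed shape of backup-mirror-sync.sh, not the actual script; the mountpoint check stops a failed mount from letting rsync fill the root filesystem):

```shell
# Sync src into dst, deleting files removed from src. Refuses to run if
# dst is not an actual mount point.
sync_mirror() {
  src="$1"; dst="$2"
  if ! mountpoint -q "$dst"; then
    echo "error: $dst is not mounted, aborting" >&2
    return 1
  fi
  rsync -avh --delete "$src/" "$dst/"
}

# Example: sync_mirror /mnt/bcache /mnt/backup-mirror
```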

What it backs up:

  • All Kubernetes persistent volume data
  • Nextcloud files (2TB)
  • Jellyfin media library
  • Gitea repositories
  • Bitwarden data
  • Home Assistant configuration
  • All other application data on /mnt/bcache

Restore Process: In case of main drive failure, data can be restored from /mnt/backup-mirror:

# Verify backup mount
mountpoint /mnt/backup-mirror

# Restore all data (if bcache is rebuilt)
sudo rsync -avh --delete /mnt/backup-mirror/ /mnt/bcache/

# Or restore specific directories
sudo rsync -avh /mnt/backup-mirror/nextcloud/ /mnt/bcache/nextcloud/