# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Overview
This is a home server Infrastructure as Code (IAC) repository that manages a K3s Kubernetes cluster deployment with various self-hosted applications. The setup combines Ansible for system configuration and Kubernetes manifests for application deployment.
**Quick Links:**
- Storage and Backup Guide - Detailed storage, bcache, backup, and drive health information
- Ansible Review - Ansible configuration review and updates (October 2025)
- Ansible Maintenance Roles - Automated setup for maintenance, backups, and monitoring
- Certificate Management - K3s certificate rotation
- System Maintenance - Automated maintenance tasks
## Architecture

### Core Infrastructure
- K3s: Lightweight Kubernetes distribution running on home server hardware
- Ansible: System configuration and K3s cluster setup
- Traefik: Ingress controller (default with K3s)
- Cert-manager: TLS certificate management
### Application Stack
- Core services: Bitwarden (password manager), Nextcloud (file storage), Home Assistant
- Media stack: Jellyfin, Sonarr, Deluge, Openbooks, Jackett, Transmission
- Development: Gitea (git hosting), Gitea Runner (CI/CD), Homarr (dashboard)
- Fun/personal: Horchposten, JNR-Web (locally built images)
- Infrastructure: Cloudflare Tunnel for external access
### Storage Architecture

- **NVMe SSD**: Kingston SNV2S250G (233GB)
  - `/dev/nvme0n1p1` (512MB) → `/boot/efi`
  - `/dev/nvme0n1p2` (70GB) → `/` (root filesystem)
  - `/dev/nvme0n1p3` (163GB) → bcache cache device
- **HDD 1**: Seagate IronWolf ST4000VN006 (4TB)
  - `/dev/sda1` → `/mnt/backup-mirror` (backup mirror)
- **HDD 2**: Seagate IronWolf ST4000VN006 (4TB)
  - `/dev/sdb` → bcache backing device
  - Combined with NVMe cache → `/dev/bcache0` → `/mnt/bcache` (main data storage)
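To confirm the layout above matches the live system, a quick read-only check with standard util-linux tooling:

```shell
# Show the block-device tree with sizes and mount points (read-only)
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT /dev/nvme0n1 /dev/sda /dev/sdb
```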
**Bcache Configuration:**

- Cache mode: `writearound` (writes go to the backing device, reads are cached)
- Cache set UUID: `74a7d177-65f4-4902-9fe5-e596602c28d4`
- Provides SSD-accelerated storage for Kubernetes persistent volumes
**Bcache Management:**

```bash
# Check bcache status
cat /sys/block/bcache0/bcache/state

# Check cache mode
cat /sys/block/bcache0/bcache/cache_mode

# View cache statistics
cat /sys/block/bcache0/bcache/stats_total/cache_hits
cat /sys/block/bcache0/bcache/stats_total/cache_misses

# If the cache is detached (state shows "no cache"), re-attach it:
echo "74a7d177-65f4-4902-9fe5-e596602c28d4" | sudo tee /sys/block/bcache0/bcache/attach

# Verify the cache is attached (state should show "clean" or "dirty")
cat /sys/block/bcache0/bcache/state
```
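The cache mode can also be changed at runtime through the same sysfs interface. A sketch (note: `writeback` caches writes too, but risks data loss if the SSD fails before dirty data is flushed, which is presumably why this setup uses `writearound`):

```shell
# Switch cache mode at runtime (takes effect immediately; not persistent
# across cache re-registration unless set again)
echo writeback | sudo tee /sys/block/bcache0/bcache/cache_mode

# Revert to the mode used in this setup
echo writearound | sudo tee /sys/block/bcache0/bcache/cache_mode
```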
## Common Commands

### Ansible Operations
**Note:** Ansible configuration was last reviewed and updated in October 2025. See `Ansible/ANSIBLE_REVIEW_2025.md` for details.

**Maintenance Automation:** All automated maintenance, backups, and monitoring can now be deployed via Ansible. See `Ansible/MAINTENANCE_ROLES_README.md` for details.
```bash
# Setup automated maintenance, backups, and monitoring
ansible-playbook -i Ansible/inventory.ini Ansible/setup_maintenance.yml --check --diff
ansible-playbook -i Ansible/inventory.ini Ansible/setup_maintenance.yml

# Setup only specific components
ansible-playbook -i Ansible/inventory.ini Ansible/setup_maintenance.yml --tags backup
ansible-playbook -i Ansible/inventory.ini Ansible/setup_maintenance.yml --tags smart
ansible-playbook -i Ansible/inventory.ini Ansible/setup_maintenance.yml --tags maintenance

# Deploy complete home server setup (system + K3s)
ansible-playbook -i Ansible/inventory.ini Ansible/setup_home_server.yml --check --diff
ansible-playbook -i Ansible/inventory.ini Ansible/setup_home_server.yml

# Update K3s cluster
ansible-playbook -i Ansible/inventory.ini Ansible/update_k3s.yaml

# Check bcache status
ansible-playbook -i Ansible/inventory.ini Ansible/setup_maintenance.yml --tags bcache
```
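Before running a playbook, connectivity and syntax can be verified first. These are standard Ansible invocations (adjust the inventory path if yours differs):

```shell
# Verify all inventory hosts are reachable
ansible -i Ansible/inventory.ini all -m ping

# Validate playbook syntax without executing anything
ansible-playbook -i Ansible/inventory.ini Ansible/setup_maintenance.yml --syntax-check

# List the tags a playbook supports
ansible-playbook -i Ansible/inventory.ini Ansible/setup_maintenance.yml --list-tags
```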
### Kubernetes Operations

```bash
# Apply all manifests in a directory
kubectl apply -f k8s/core/bitwarden/

# Deploy Nextcloud via Helm
helm upgrade --install nextcloud ./k8s/nextcloud/

# Check cluster status
kubectl get nodes
kubectl get pods -A

# Access secrets (use with caution)
./k8s/secret-dump.sh
```
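When changing existing manifests, it can be safer to preview first; `kubectl diff` and server-side dry-runs are built into kubectl:

```shell
# Preview what would change before applying
kubectl diff -f k8s/core/bitwarden/

# Validate against the API server without persisting anything
kubectl apply -f k8s/core/bitwarden/ --dry-run=server
```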
### Infrastructure Management

```bash
# Install Helm (if needed)
./get_helm.sh
```
## Key Directories

- `Ansible/`: System configuration, K3s deployment, user management
- `k8s/core/`: Essential cluster services (cert-manager, bitwarden, etc.)
- `k8s/media/`: Media server applications
- `k8s/lab/`: Development and experimental services (Gitea, Gitea Runner, Multus)
- `k8s/fun/`: Personal projects (Horchposten, JNR-Web)
- `k8s/nextcloud/`: Helm chart for Nextcloud deployment
- `certs/`: TLS certificates for services
## Inventory & Configuration

- Primary server: `kimchi` (192.168.178.55) - x86_64 K3s master
- Secondary: `pi-one` (192.168.178.11) - ARM device
- Ansible inventory: `Ansible/inventory.ini`
- K3s config location: `/var/lib/rancher/k3s/server/manifests/`
## Important Notes

- K3s automatically deploys manifests placed in `/var/lib/rancher/k3s/server/manifests/`
- Traefik ingress controller is pre-installed with K3s
- SSH port configuration may vary (see ssh_juggle_port.yml)
- Persistent volumes use local storage with specific node affinity
- Some services have hardcoded passwords/credentials (should be moved to secrets)
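One way to migrate the hardcoded credentials noted above into Kubernetes Secrets (a sketch; `app-credentials`, the key, and the output path are hypothetical placeholders, not objects that exist in this repo):

```shell
# Generate a Secret manifest locally; --dry-run=client never contacts the cluster
kubectl create secret generic app-credentials \
  --from-literal=ADMIN_PASSWORD='change-me' \
  --dry-run=client -o yaml > k8s/core/app-credentials.yaml

# Review the generated manifest, then apply it
kubectl apply -f k8s/core/app-credentials.yaml
```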
## Certificate Management

K3s certificates expire and need periodic rotation. If kubectl commands fail with authentication errors like:

```
x509: certificate has expired or is not yet valid
```
**Solution:**

```bash
# Rotate certificates
sudo k3s certificate rotate

# Restart K3s service
sudo systemctl restart k3s

# Update kubeconfig with new certificates
sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
sudo chown $USER:$USER ~/.kube/config
```
**Signs of certificate expiration:**
- kubectl commands return "the server has asked for the client to provide credentials"
- K3s logs show x509 certificate expired errors
- Unable to access cluster API even from the master node
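Certificate expiry can also be inspected directly with openssl. The path below is where a default K3s install keeps its API server serving certificate (verify on your node):

```shell
# Print the expiry date of the K3s API server certificate
sudo openssl x509 -noout -enddate \
  -in /var/lib/rancher/k3s/server/tls/serving-kube-apiserver.crt
```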
### Automated Certificate Management

Automatic rotation is now configured using systemd timers:
**Scripts Available:**

```bash
# Automated rotation script (runs monthly via systemd timer)
./scripts/k3s-cert-rotate.sh [--force] [--dry-run]

# Certificate monitoring script
./scripts/k3s-cert-check.sh
```
**Manual Management:**

```bash
# Check systemd timer status
sudo systemctl status k3s-cert-rotation.timer

# See next scheduled run
systemctl list-timers k3s-cert-rotation.timer

# Manual rotation (if needed)
sudo ./scripts/k3s-cert-rotate.sh --force

# Check certificate status
sudo ./scripts/k3s-cert-check.sh
```
**Configuration:**

- Automatic rotation: Monthly via systemd timer
- Rotation threshold: 30 days before expiration
- Backup location: `/var/lib/rancher/k3s/server/cert-backups/`
- Logs: `/var/log/k3s-cert-rotation.log`
The automation includes:
- Certificate expiration checking
- Automatic backup before rotation
- Service restart and kubeconfig updates
- Health verification after rotation
- Logging and notifications
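The timer unit behind `k3s-cert-rotation.timer` likely resembles the following sketch (assumed, not the deployed file; the `OnCalendar` value is inferred from the "1st of month, 12:39 AM" schedule above):

```ini
# /etc/systemd/system/k3s-cert-rotation.timer (sketch)
[Unit]
Description=Monthly K3s certificate rotation

[Timer]
OnCalendar=*-*-01 00:39:00
Persistent=true

[Install]
WantedBy=timers.target
```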
## Drive Health Monitoring

### SMART Monitoring with smartmontools

Automatic SMART monitoring is configured for all drives:

**Configuration:** `/etc/smartd.conf`

**Monitored Drives:**

- `/dev/sda` - Seagate IronWolf 4TB (backup mirror)
- `/dev/sdb` - Seagate IronWolf 4TB (bcache backing device)
- `/dev/nvme0n1` - Kingston SNV2S250G 233GB (cache + system)
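The smartd.conf entries probably resemble this sketch (assumed, not the deployed file): `-s (S/../.././02|L/../../6/03)` encodes short tests daily at 02:00 and long tests Saturdays at 03:00, and `-W` sets temperature info/critical thresholds:

```
# /etc/smartd.conf (sketch)
/dev/sda     -a -o on -S on -s (S/../.././02|L/../../6/03) -W 4,40,45 -m root
/dev/sdb     -a -o on -S on -s (S/../.././02|L/../../6/03) -W 4,40,45 -m root
/dev/nvme0n1 -a -s (S/../.././02|L/../../6/03) -W 4,55,60 -m root
```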
**Monitoring Features:**
- Daily short self-tests at 2:00 AM
- Weekly long self-tests on Saturdays at 3:00 AM
- Temperature monitoring (warns at 45°C for HDDs, 60°C for NVMe)
- Automatic alerts to syslog for any SMART failures
- Tracks reallocated sectors and pending sectors
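When scripting checks on top of smartd, attribute values can be pulled out of `smartctl -a` output with awk. A small helper (hypothetical, shown with a canned sample line so it runs without a real drive):

```shell
# Extract the raw value of a named SMART attribute from `smartctl -a` output
smart_attr() {
  awk -v attr="$1" '$2 == attr {print $NF}'
}

# Sample line in the format smartctl prints for ATA attribute tables
sample='  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0'

# On a live system: sudo smartctl -a /dev/sda | smart_attr Reallocated_Sector_Ct
echo "$sample" | smart_attr Reallocated_Sector_Ct
```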
**Useful Commands:**

```bash
# Check drive health status
sudo smartctl -H /dev/sda
sudo smartctl -H /dev/sdb
sudo smartctl -H /dev/nvme0n1

# View full SMART attributes
sudo smartctl -a /dev/sda

# Check service status
systemctl status smartmontools

# View SMART logs
sudo journalctl -u smartmontools
```
**Monitored Drives (continuous):**

- `/dev/sdb`: ~20,000 power-on hours, 42°C, 0 reallocated sectors ✓
- `/dev/nvme0n1`: 4% wear level, 58°C, 100% spare available ✓
**Backup Drive (`/dev/sda`):**
- Not continuously monitored - spins down after 10 minutes to save energy
- Health checked automatically during weekly backup runs
- Expected stats: ~18,000 power-on hours, 37°C, 0 reallocated sectors ✓
- Power savings: ~35-50 kWh/year
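The spin-down behaviour is typically configured with hdparm (a sketch; `-S 120` means 120 × 5 s = 10 minutes, matching the timeout above):

```shell
# Set /dev/sda to spin down after 10 minutes of inactivity
sudo hdparm -S 120 /dev/sda

# Check the drive's current power state without waking it
sudo hdparm -C /dev/sda
```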
## Networking
- Cloudflare Tunnel provides external access
- Gitea uses custom SSH port 55522
- Internal cluster networking via Traefik ingress
- TLS termination handled by cert-manager + Let's Encrypt
## System Maintenance

### Automated Tasks Summary

The home server runs the following automated maintenance tasks:
| Task | Schedule | Purpose | Log Location |
|---|---|---|---|
| K3s Certificate Rotation | Monthly (1st of month, 12:39 AM) | Rotates K3s certificates if expiring within 30 days | `/var/log/k3s-cert-rotation.log` |
| System Maintenance | Quarterly (Jan/Apr/Jul/Oct 1, 3:00 AM) | Prunes images, cleans logs, runs apt cleanup | `/var/log/k3s-maintenance.log` |
| Backup Mirror Sync | Weekly (Sundays, 2:00 AM) | Syncs /mnt/bcache to /mnt/backup-mirror | `/var/log/backup-mirror-sync.log` |
| SMART Self-Tests | Daily short (2:00 AM), Weekly long (Sat 3:00 AM) | Tests drive health | `journalctl -u smartmontools` |
**Check all timers:**

```bash
systemctl list-timers
```
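systemd can confirm what a given `OnCalendar` expression resolves to, which is handy when auditing the schedules above:

```shell
# Show the next elapse time for the quarterly maintenance schedule
systemd-analyze calendar "*-01,04,07,10-01 03:00:00"

# And for the weekly backup window
systemd-analyze calendar "Sun *-*-* 02:00:00"
```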
### Automated Quarterly Maintenance

Automatic maintenance is now configured using systemd timers:

**Scripts Available:**

```bash
# Quarterly maintenance script (runs Jan 1, Apr 1, Jul 1, Oct 1 at 3:00 AM)
/usr/local/bin/k3s-maintenance.sh

# Check maintenance timer status
systemctl list-timers k3s-maintenance.timer
systemctl status k3s-maintenance.timer
```
**What it does:**

- Prunes unused container images
- Cleans journal logs (keeps 30 days)
- Runs apt autoremove and autoclean
- Logs to `/var/log/k3s-maintenance.log`
**Configuration:**

- Schedule: Quarterly (January 1, April 1, July 1, October 1 at 3:00 AM)
- Script location: `/usr/local/bin/k3s-maintenance.sh`
- Service files: `/etc/systemd/system/k3s-maintenance.{service,timer}`
- Log location: `/var/log/k3s-maintenance.log`
### Manual Maintenance Tasks

**System Updates (recommended monthly):**

```bash
sudo apt update && sudo apt upgrade -y
sudo apt autoremove -y
```
**Note:** The old Kubernetes APT repository (`apt.kubernetes.io`) has been removed as it was deprecated. K3s provides its own kubectl via symlink at `/usr/local/bin/kubectl` → `/usr/local/bin/k3s`.
**Disk Cleanup (as needed):**

```bash
# Clean old journal logs
sudo journalctl --vacuum-time=30d

# Prune container images
sudo crictl rmi --prune

# Check disk usage
df -h /
```
**Important Locations:**

- Root partition: `/dev/nvme0n1p2` (69GB total)
- Data storage: `/mnt/bcache` (3.6TB bcache)
- Backup mirror: `/mnt/backup-mirror` (3.6TB)
- K3s data: `/var/lib/rancher/k3s/`
- Journal logs: `/var/log/journal/`
## Backup System

### Automated Weekly Backups

Automatic backups are configured using rsync and systemd timers:
**Scripts Available:**

```bash
# Weekly backup script (runs Sundays at 2:00 AM)
/usr/local/bin/backup-mirror-sync.sh

# Check backup timer status
systemctl list-timers backup-mirror-sync.timer
systemctl status backup-mirror-sync.timer

# View backup logs
sudo tail -f /var/log/backup-mirror-sync.log

# Run manual backup
sudo /usr/local/bin/backup-mirror-sync.sh
```
**Configuration:**

- Source: `/mnt/bcache/` (main data storage with bcache)
- Destination: `/mnt/backup-mirror/` (4TB mirror on `/dev/sda1`)
- Schedule: Weekly on Sundays at 2:00 AM
- Method: Rsync with incremental sync and deletion of removed files
- Script location: `/usr/local/bin/backup-mirror-sync.sh`
- Service files: `/etc/systemd/system/backup-mirror-sync.{service,timer}`
- Log location: `/var/log/backup-mirror-sync.log`
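The script presumably follows the usual guarded-rsync pattern; a minimal sketch (assumed structure, not the deployed `backup-mirror-sync.sh`). The mountpoint guard matters: without it, an unmounted backup disk would make rsync fill the root filesystem instead.

```shell
# Sketch of a guarded mirror sync (assumed; the deployed script may differ)
sync_mirror() {
  src="$1" dst="$2"
  # Refuse to run if the destination is not a real mount point
  mountpoint -q "$dst" || { echo "destination not mounted" >&2; return 1; }
  # Mirror, deleting files that were removed from the source
  rsync -avh --delete "$src/" "$dst/"
}

# Run on the server as root: sync_mirror /mnt/bcache /mnt/backup-mirror
```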
**What it backs up:**
- All Kubernetes persistent volume data
- Nextcloud files (2TB)
- Jellyfin media library
- Gitea repositories
- Bitwarden data
- Home Assistant configuration
- All other application data on `/mnt/bcache`
**Restore Process:**

In case of main drive failure, data can be restored from `/mnt/backup-mirror`:

```bash
# Verify backup mount
mountpoint /mnt/backup-mirror

# Restore all data (if bcache is rebuilt)
sudo rsync -avh --delete /mnt/backup-mirror/ /mnt/bcache/

# Or restore specific directories
sudo rsync -avh /mnt/backup-mirror/nextcloud/ /mnt/bcache/nextcloud/
```
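Before a full restore it is worth previewing the transfer, since `--delete` on a reversed sync will remove anything on the target that is not in the backup:

```shell
# Dry run first: -n lists what would be transferred/deleted without doing it
sudo rsync -avhn --delete /mnt/backup-mirror/ /mnt/bcache/
```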