# Monitoring Stack Resource Analysis

**Date:** October 23, 2025
**System:** kimchi homelab server
**Status:** Planning/analysis only — monitoring stack has not been deployed yet
## Current System Status

**System Specifications:**
- CPU: 4 cores
- Memory: 7.6 GB total
- Root Disk: 69 GB NVMe (`/dev/nvme0n1p2`)
- Data Storage: 3.6 TB bcache (`/mnt/bcache`)

**Current Usage:**
- Load: 0.52 (13% CPU on 4 cores)
- Memory: 3.3 GB / 7.6 GB used (43%)
- Available: 3.8 GB
- Disk: 47 GB / 69 GB used (72% on root)
- Running pods: 29 total

**Top Memory Consumers:**
- K3s server: 687 MB (8.6%)
- Jellyfin: 458 MB (5.7%)
- MariaDB (Nextcloud): 330 MB (4.1%)
- Home Assistant: 306 MB (3.8%)
## Prometheus + Grafana Resource Impact

For a minimal monitoring stack in this homelab setup:

**Expected Resource Usage:**

| Component | Memory | CPU | Notes |
|---|---|---|---|
| Prometheus | 400-600 MB | 200-400m (5-10%) | Main metrics database |
| Grafana | 150-250 MB | 100-200m (2-5%) | Visualization UI |
| Node Exporter | 20-50 MB | 50-100m (1-2%) | Per-node metrics |
| kube-state-metrics | 50-100 MB | 50-100m (1-2%) | K8s cluster metrics |
| AlertManager (optional) | 50-100 MB | 50m (<1%) | Alert routing |
| **Total (minimal)** | **~700-1100 MB** | **~450-800m (11-20%)** | |
**Impact on System:**

**CPU Load Increase:**
- Current: 13% (0.52 load average)
- After monitoring: 24-33% (0.96-1.32 load average)
- Estimated increase: +11-20 percentage points (well within headroom)

**Memory Impact:**
- Current: 3.3 GB used / 3.8 GB available
- After monitoring: 4.0-4.4 GB used / 2.7-3.1 GB available
- Estimated increase: +700-1100 MB (manageable, but less buffer)

**Disk Impact:**
- Prometheus data: 2-5 GB for 15-day retention with ~30 pods
- Root partition: already at 72% (47 GB used of 69 GB)
- Recommendation: store Prometheus data on `/mnt/bcache` instead of the root partition
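Note that k3s's built-in `local-path` provisioner writes volumes to a directory on the root disk by default (`/var/lib/rancher/k3s/storage`), so pointing it at `/mnt/bcache` takes an explicit step. A hedged sketch — the ConfigMap and deployment names below follow stock k3s conventions; verify them on this cluster first:

```bash
# Inspect where the local-path provisioner currently writes volumes
kubectl -n kube-system get configmap local-path-config -o jsonpath='{.data.config\.json}'

# Option 1: edit the ConfigMap so "paths" lists "/mnt/bcache/local-path-provisioner",
# then restart the provisioner to pick up the change
kubectl -n kube-system edit configmap local-path-config
kubectl -n kube-system rollout restart deployment local-path-provisioner

# Option 2: start the k3s server with
#   --default-local-storage-path=/mnt/bcache/local-path-provisioner
```

Either way, only PVCs created after the change land on `/mnt/bcache`; existing volumes stay where they were provisioned.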
## Recommended Configuration

### Minimal kube-prometheus-stack Setup

Helm chart: `prometheus-community/kube-prometheus-stack`
**values.yaml (optimized for homelab):**

```yaml
# Prometheus configuration
prometheus:
  prometheusSpec:
    retention: 15d            # 15 days of metrics
    resources:
      requests:
        memory: 512Mi
        cpu: 250m
      limits:
        memory: 1Gi
        cpu: 500m
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: local-path   # Uses /mnt/bcache
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 5Gi

# Grafana configuration
grafana:
  resources:
    requests:
      memory: 128Mi
      cpu: 100m
    limits:
      memory: 256Mi
      cpu: 200m
  persistence:
    enabled: true
    storageClassName: local-path
    size: 1Gi

# Node exporter (per-node metrics)
prometheus-node-exporter:
  resources:
    requests:
      memory: 30Mi
      cpu: 50m
    limits:
      memory: 50Mi
      cpu: 100m

# Kube-state-metrics (cluster metrics)
kube-state-metrics:
  resources:
    requests:
      memory: 64Mi
      cpu: 50m
    limits:
      memory: 128Mi
      cpu: 100m

# AlertManager (optional - disable if not needed)
alertmanager:
  enabled: false   # Can enable later if needed
```
### Installation Commands

```bash
# Add Prometheus community Helm repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Create monitoring namespace
kubectl create namespace monitoring

# Install with custom values
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  -n monitoring \
  -f values.yaml

# Check installation
kubectl get pods -n monitoring
kubectl get svc -n monitoring

# Access Grafana (port-forward)
kubectl port-forward -n monitoring svc/kube-prometheus-stack-grafana 3000:80

# Default Grafana credentials
# Username: admin
# Password: prom-operator (verify with:
#   kubectl get secret -n monitoring kube-prometheus-stack-grafana \
#     -o jsonpath="{.data.admin-password}" | base64 --decode)
```
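Given the root partition is already at 72%, it is worth confirming after install that the Prometheus PVC bound via `local-path` and its volume actually lives under `/mnt/bcache`. Resource names below follow the chart's defaults and may differ on this cluster:

```bash
# Confirm the PVC bound and which storage class it used
kubectl get pvc -n monitoring

# List each PV's on-disk path and check it lives under /mnt/bcache
# (local-path volumes expose either .spec.hostPath.path or .spec.local.path
# depending on provisioner version, so print both)
kubectl get pv -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.hostPath.path}{.spec.local.path}{"\n"}{end}'
```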
## What You'll Get

**Features:**
- Real-time CPU/memory/disk metrics for all pods and nodes
- 15 days of historical data
- Pre-built dashboards for Kubernetes cluster overview
- Pod resource usage tracking
- Node health monitoring
- Ability to troubleshoot performance issues
- Optional alert notifications

**Useful Dashboards:**
- Kubernetes Cluster Overview (ID: 315)
- Kubernetes Pods Resource Usage (ID: 6336)
- Node Exporter Full (ID: 1860)
- K8s Cluster RAM and CPU Utilization (ID: 16734)
## Alternatives to Consider

**If Resources Are Tight:**

- **Metrics Server Only**
  - Resource usage: ~50 MB memory, minimal CPU
  - Provides the `kubectl top nodes` and `kubectl top pods` commands
  - No historical data, no dashboards

  ```bash
  kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
  ```

- **Netdata**
  - Resource usage: ~100-200 MB total
  - Lighter weight, simpler setup
  - Good for single-node clusters
  - Built-in web UI

- **Prometheus + Remote Write**
  - Run Prometheus locally but send metrics to external Grafana Cloud
  - Free tier available (10k series, 14-day retention)
  - Saves local resources
## Monitoring Best Practices

**Resource Tuning:**
- Start with conservative limits and increase if needed
- Monitor Prometheus memory usage; it grows with the number of active time series
- Use metric relabeling to drop unnecessary metrics
- Adjust the retention period based on actual needs
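As one hedged example of metric relabeling, kube-prometheus-stack exposes per-scrape relabeling hooks in its values; the snippet below appends a rule dropping two illustrative high-cardinality cAdvisor metrics (the `cAdvisorMetricRelabelings` key follows the chart's conventions, and the metric names are examples — confirm both against the chart and your actual series before relying on it):

```shell
# Append an illustrative metric-relabeling rule to the Helm values file.
# Key name and metric names are assumptions; verify against the chart's values.
cat >> values.yaml <<'EOF'
kubelet:
  serviceMonitor:
    cAdvisorMetricRelabelings:
      - sourceLabels: [__name__]
        regex: container_(network_tcp_usage_total|tasks_state)
        action: drop
EOF
```

Re-running `helm upgrade` with the updated values applies the rule; dropped series stop being ingested but already-stored samples remain until they age out of retention.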
**Storage Considerations:**
- Prometheus needs fast I/O; bcache is ideal
- Plan for ~300-500 MB per day of metrics with 30 pods
- Enable persistent volumes so data survives pod restarts

**Query Optimization:**
- Use recording rules for frequently used queries
- Avoid long time ranges in dashboards
- For downsampled long-term history, note that vanilla Prometheus has no built-in downsampling; recording rules or an external system (e.g. Thanos) fill that role
## Prometheus Metrics Retention Calculation

**Formula:** Storage = Retention × Ingestion Rate × Bytes per Sample (after compression)

For this cluster:
- ~30 pods × ~1000 metrics per pod = 30k time series
- Scraped every 15 s → 5,760 samples/day per series
- Compressed: ~1-2 bytes per sample
- 15-day retention: ~2.5-5 GB
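The estimate above can be reproduced with shell arithmetic (same assumptions as the bullet list: 30k series, 15 s scrape interval, 15-day retention; 2 bytes/sample taken as the conservative end of the 1-2 byte range):

```shell
# Back-of-envelope Prometheus TSDB size for this cluster
series=30000
samples_per_day=$((86400 / 15))        # 5760 samples/day per series
bytes_per_sample=2                     # conservative end of the 1-2 B compressed range
days=15
total=$((series * samples_per_day * bytes_per_sample * days))
echo "$total bytes (~$((total / 1024 / 1024 / 1024)) GiB)"
# prints: 5184000000 bytes (~4 GiB)
```

At 1 byte/sample the same arithmetic gives ~2.4 GiB, which brackets the ~2.5-5 GB figure used in the disk-impact section.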
## Useful Prometheus Queries

**CPU Usage:**

```promql
# CPU usage by pod
sum(rate(container_cpu_usage_seconds_total{namespace!=""}[5m])) by (pod, namespace)

# Node CPU usage
100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
```

**Memory Usage:**

```promql
# Memory usage by pod
sum(container_memory_working_set_bytes{namespace!=""}) by (pod, namespace)

# Memory usage percentage
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100
```

**Disk Usage:**

```promql
# Disk usage by mountpoint
(node_filesystem_size_bytes - node_filesystem_avail_bytes) / node_filesystem_size_bytes * 100

# Bcache hit rate
rate(bcache_cache_hits_total[5m]) / (rate(bcache_cache_hits_total[5m]) + rate(bcache_cache_misses_total[5m]))
```
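These queries can also be run outside Grafana against the Prometheus HTTP query API, which is handy for scripting. A sketch assuming a port-forward to the Prometheus service (the service name follows the chart's defaults; check `kubectl get svc -n monitoring`):

```bash
# Forward the Prometheus API locally (run in another terminal)
kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090

# Instant query: per-pod CPU usage, URL-encoded by curl
curl -sG 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=sum(rate(container_cpu_usage_seconds_total{namespace!=""}[5m])) by (pod, namespace)'
```

The response is JSON with a `status` field and a `data.result` array of series, so it pipes cleanly into `jq` for ad-hoc checks.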
## Bottom Line

**Verdict:** Yes, you can run Prometheus + Grafana with current resources.

**Impact Summary:**
- CPU load: 13% → 24-33% ✓ Acceptable
- Memory: 43% → 53-58% ✓ Acceptable (but less buffer)
- Disk: Need to use `/mnt/bcache` ⚠️ Root partition too full

**Critical Requirement:**
- Ensure Prometheus stores data on `/mnt/bcache` using the `local-path` storage class
- Do NOT store data on the root partition (already at 72%)

**Next Steps:**
- Create `values.yaml` with the resource limits above
- Install kube-prometheus-stack via Helm
- Monitor actual resource usage for 1 week
- Tune retention period and limits as needed
- Set up ingress for Grafana access (optional)