Monitoring Stack Resource Analysis

Date: October 23, 2025
System: kimchi homelab server
Status: Planning/analysis only — monitoring stack has not been deployed yet

Current System Status

System Specifications:

  • CPU: 4 cores
  • Memory: 7.6 GB total
  • Root Disk: 69 GB NVMe (/dev/nvme0n1p2)
  • Data Storage: 3.6 TB bcache (/mnt/bcache)

Current Usage:

  • Load: 0.52 (13% CPU on 4 cores)
  • Memory: 3.3 GB / 7.6 GB used (43%)
  • Available: 3.8 GB
  • Disk: 47 GB / 69 GB used (72% on root)
  • Running pods: 29 total

Top Memory Consumers:

  • K3s server: 687 MB (8.6%)
  • Jellyfin: 458 MB (5.7%)
  • MariaDB (Nextcloud): 330 MB (4.1%)
  • Home Assistant: 306 MB (3.8%)
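The figures above can be re-checked at any time with standard tools; a minimal sketch (the mount points and process names are this host's, adjust for yours):

```shell
# Snapshot of current usage on a Linux host
uptime                                    # load average
free -h                                   # memory used / available
df -h /                                   # root partition usage
ps -eo rss,comm --sort=-rss | head -n 5   # top memory consumers by RSS
```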

Prometheus + Grafana Resource Impact

For a minimal monitoring stack in this homelab setup:

Expected Resource Usage:

| Component | Memory | CPU | Notes |
|---|---|---|---|
| Prometheus | 400-600 MB | 200-400m (5-10%) | Main metrics database |
| Grafana | 150-250 MB | 100-200m (2-5%) | Visualization UI |
| Node Exporter | 20-50 MB | 50-100m (1-2%) | Per-node metrics |
| kube-state-metrics | 50-100 MB | 50-100m (1-2%) | K8s cluster metrics |
| AlertManager (optional) | 50-100 MB | 50m (<1%) | Alert routing |
| Total (minimal) | ~700-1100 MB | ~450-800m (11-20%) | |

Impact on System:

CPU Load Increase:

  • Current: 13% (0.52 load average)
  • After monitoring: 24-33% (0.96-1.32 load average)
  • Estimated increase: +11-20% (well within headroom)

Memory Impact:

  • Current: 3.3 GB used / 3.8 GB available
  • After monitoring: 4.0-4.4 GB used / 2.7-3.1 GB available
  • Estimated increase: +700-1100 MB (manageable, but less buffer)

Disk Impact:

  • Prometheus data: 2-5 GB for 15-day retention with ~30 pods
  • Root partition: Already at 72% (47 GB used of 69 GB)
  • Recommendation: Store Prometheus data on /mnt/bcache instead of root
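Note that k3s's bundled local-path provisioner writes to /var/lib/rancher/k3s/storage on the root partition by default, so it has to be pointed at the bcache mount first. One way is editing the config.json key of the local-path-config ConfigMap in kube-system; a sketch (the target directory is an assumption for this host):

```json
{
  "nodePathMap": [
    {
      "node": "DEFAULT_PATH_FOR_NON_LISTED_NODES",
      "paths": ["/mnt/bcache/k3s-storage"]
    }
  ]
}
```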

Minimal kube-prometheus-stack Setup

Helm chart: prometheus-community/kube-prometheus-stack

values.yaml (optimized for homelab):

# Prometheus configuration
prometheus:
  prometheusSpec:
    retention: 15d  # 15 days of metrics
    resources:
      requests:
        memory: 512Mi
        cpu: 250m
      limits:
        memory: 1Gi
        cpu: 500m
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: local-path  # local-path must be reconfigured to target /mnt/bcache
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 5Gi

# Grafana configuration
grafana:
  resources:
    requests:
      memory: 128Mi
      cpu: 100m
    limits:
      memory: 256Mi
      cpu: 200m
  persistence:
    enabled: true
    storageClassName: local-path
    size: 1Gi

# Node exporter (per-node metrics)
prometheus-node-exporter:
  resources:
    requests:
      memory: 30Mi
      cpu: 50m
    limits:
      memory: 50Mi
      cpu: 100m

# Kube-state-metrics (cluster metrics)
kube-state-metrics:
  resources:
    requests:
      memory: 64Mi
      cpu: 50m
    limits:
      memory: 128Mi
      cpu: 100m

# AlertManager (optional - disable if not needed)
alertmanager:
  enabled: false  # Can enable later if needed

Installation Commands

# Add Prometheus community Helm repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Create monitoring namespace
kubectl create namespace monitoring

# Install with custom values
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  -n monitoring \
  -f values.yaml

# Check installation
kubectl get pods -n monitoring
kubectl get svc -n monitoring

# Access Grafana (port-forward)
kubectl port-forward -n monitoring svc/kube-prometheus-stack-grafana 3000:80

# Default Grafana credentials
# Username: admin
# Password: prom-operator (check with: kubectl get secret -n monitoring kube-prometheus-stack-grafana -o jsonpath="{.data.admin-password}" | base64 --decode)

What You'll Get

Features:

  • Real-time CPU/memory/disk metrics for all pods and nodes
  • Historical data for 15 days
  • Pre-built dashboards for Kubernetes cluster overview
  • Pod resource usage tracking
  • Node health monitoring
  • Ability to troubleshoot performance issues
  • Optional alert notifications

Useful Dashboards:

  • Kubernetes Cluster Overview (ID: 315)
  • Kubernetes Pods Resource Usage (ID: 6336)
  • Node Exporter Full (ID: 1860)
  • K8s Cluster RAM and CPU Utilization (ID: 16734)
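With the Helm chart, these dashboards can be provisioned declaratively via the Grafana subchart instead of imported by hand; a values.yaml sketch (the provider name and folder path are assumptions):

```yaml
grafana:
  dashboardProviders:
    dashboardproviders.yaml:
      apiVersion: 1
      providers:
        - name: default
          orgId: 1
          type: file
          options:
            path: /var/lib/grafana/dashboards/default
  dashboards:
    default:
      node-exporter-full:
        gnetId: 1860        # pulled from grafana.com by dashboard ID
        datasource: Prometheus
```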

Alternatives to Consider

If Resources Are Tight:

  1. Metrics Server Only

    • Resource usage: ~50 MB memory, minimal CPU
    • Provides: kubectl top nodes and kubectl top pods commands
    • No historical data, no dashboards
    kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
    
  2. Netdata

    • Resource usage: ~100-200 MB total
    • Lighter weight, simpler setup
    • Good for single-node clusters
    • Built-in web UI
  3. Prometheus + Remote Write

    • Run Prometheus locally but send metrics to external Grafana Cloud
    • Free tier available (10k series, 14-day retention)
    • Saves local resources
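For option 3, the kube-prometheus-stack values would gain a remoteWrite entry; a sketch, assuming you store Grafana Cloud credentials in a Secret (the URL and Secret name are placeholders):

```yaml
prometheus:
  prometheusSpec:
    remoteWrite:
      - url: https://prometheus-prod-01-example.grafana.net/api/prom/push  # placeholder endpoint
        basicAuth:
          username:
            name: grafana-cloud-credentials  # Secret you create in the monitoring namespace
            key: username
          password:
            name: grafana-cloud-credentials
            key: password
```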

Monitoring Best Practices

Resource Tuning:

  • Start with conservative limits and increase if needed
  • Monitor Prometheus memory usage; it grows with the number of active time series
  • Use metric relabeling to drop unnecessary metrics
  • Adjust retention period based on actual needs
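Dropping unneeded metrics is done per scrape target via metricRelabelings on a ServiceMonitor; a sketch (the target name and metric are only examples):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app            # hypothetical scrape target
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: example-app
  endpoints:
    - port: metrics
      metricRelabelings:
        # Drop a high-cardinality histogram before it reaches the TSDB
        - sourceLabels: [__name__]
          regex: apiserver_request_duration_seconds_bucket
          action: drop
```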

Storage Considerations:

  • Prometheus needs fast I/O - bcache is ideal
  • Plan for ~300-500 MB per day of metrics with 30 pods
  • Enable persistent volumes to survive pod restarts

Query Optimization:

  • Use recording rules for frequently-used queries
  • Avoid long time ranges in dashboards
  • Use downsampling for historical data
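The recording rules mentioned above precompute an expression on a schedule so dashboards query the stored result instead of re-evaluating it; with the operator this is a PrometheusRule resource (the names and label are assumptions matching a default install):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: homelab-recording-rules
  namespace: monitoring
  labels:
    release: kube-prometheus-stack  # so the operator's rule selector picks it up
spec:
  groups:
    - name: pod-usage.rules
      interval: 1m
      rules:
        - record: namespace_pod:container_cpu_usage:sum_rate5m
          expr: sum(rate(container_cpu_usage_seconds_total{namespace!=""}[5m])) by (pod, namespace)
```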

Prometheus Metrics Retention Calculation

Formula: Storage ≈ Series Count × Samples per Day × Retention (days) × Bytes per Sample

For this cluster:

  • ~30 pods × ~1000 metrics per pod = 30k time series
  • Sample every 15s = 5760 samples/day per series
  • Compressed: ~1-2 bytes per sample
  • 15-day retention: ~2.5-5 GB
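The arithmetic above as a quick script (all inputs are the assumed figures from this analysis, not measurements):

```python
# Rough Prometheus TSDB storage estimate (assumed inputs from the analysis above)
series = 30 * 1000                # ~30 pods x ~1000 series each
samples_per_day = 86400 // 15     # one sample per series every 15s -> 5760
retention_days = 15
GiB = 1024 ** 3

def storage_gib(bytes_per_sample):
    """Total storage for the retention window at a given compression ratio."""
    return series * samples_per_day * retention_days * bytes_per_sample / GiB

low, high = storage_gib(1), storage_gib(2)   # typical 1-2 bytes/sample after compression
print(f"~{low:.1f}-{high:.1f} GiB for {retention_days}-day retention")
# prints "~2.4-4.8 GiB for 15-day retention"
```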

Useful Prometheus Queries

CPU Usage:

# CPU usage by pod
sum(rate(container_cpu_usage_seconds_total{namespace!=""}[5m])) by (pod, namespace)

# Node CPU usage
100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

Memory Usage:

# Memory usage by pod
sum(container_memory_working_set_bytes{namespace!=""}) by (pod, namespace)

# Memory usage percentage
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100

Disk Usage:

# Disk usage by mountpoint
(node_filesystem_size_bytes - node_filesystem_avail_bytes) / node_filesystem_size_bytes * 100

# Bcache hit rate (node_exporter's bcache collector prefixes these with node_)
rate(node_bcache_cache_hits_total[5m]) / (rate(node_bcache_cache_hits_total[5m]) + rate(node_bcache_cache_misses_total[5m]))

Bottom Line

Verdict: Yes, you can run Prometheus + Grafana with current resources.

Impact Summary:

  • CPU load: 13% → 24-33% ✓ Acceptable
  • Memory: 43% → 53-58% ✓ Acceptable (but less buffer)
  • Disk: Need to use /mnt/bcache ⚠️ Root partition too full

Critical Requirement:

  • Ensure Prometheus stores data on /mnt/bcache using local-path storage class
  • Do NOT store on root partition (already at 72%)

Next Steps:

  1. Create values.yaml with resource limits above
  2. Install kube-prometheus-stack via Helm
  3. Monitor actual resource usage for 1 week
  4. Tune retention period and limits as needed
  5. Set up ingress for Grafana access (optional)
