Kubernetes Infrastructure on Proxmox

A comprehensive Infrastructure as Code (IaC) solution for deploying a production-ready Kubernetes cluster on Proxmox with monitoring, backup, and network scanning capabilities.

🏗️ Architecture Overview

This project deploys a complete Kubernetes infrastructure stack including:

  • Kubernetes Cluster: K3s-based cluster with control plane and worker nodes
  • Load Balancing: MetalLB for bare-metal load balancing
  • Monitoring Stack: Prometheus, Grafana, Loki, and Mimir for comprehensive observability
  • Log Aggregation: Syslog receiver for OPNsense and external device logs
  • Backup System: Automated backup solution with NFS storage
  • Ingress: Nginx ingress controller with host-based routing
  • Automation: N8N workflow automation platform

🚀 Quick Start

Prerequisites

  • Proxmox VE 7.0+ with API access
  • Terraform 1.0+
  • SSH key pair for VM access
  • NFS server for backup storage (optional)
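
The local prerequisites above can be checked before cloning. The following is a minimal sketch, not one of the repository's scripts: `version_ge` and `check_prereqs` are hypothetical helper names, the default key path is taken from the example configuration further down, and the version comparison assumes a `sort` that supports `-V` (GNU coreutils or BusyBox).

```shell
# Hypothetical prerequisite check; not part of this repository's scripts.

# version_ge A B: succeeds when version A >= version B (relies on `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# check_prereqs [PUBLIC_KEY_PATH]: reports missing local prerequisites and
# returns non-zero if any check fails.
check_prereqs() {
  local key="${1:-$HOME/.ssh/id_ed25519.pub}"
  local status=0
  local tf_version

  if command -v terraform >/dev/null 2>&1; then
    # The first line of `terraform version` looks like "Terraform v1.5.7".
    tf_version="$(terraform version | head -n1 | sed 's/^Terraform v//')"
    if ! version_ge "$tf_version" "1.0.0"; then
      echo "Terraform $tf_version is older than 1.0"
      status=1
    fi
  else
    echo "terraform not found on PATH"
    status=1
  fi

  if [ ! -r "$key" ]; then
    echo "SSH public key not readable: $key"
    status=1
  fi

  return "$status"
}
```

Calling `check_prereqs` with no arguments checks the default key path; pass another path if your key lives elsewhere. Proxmox API access is not covered here and still needs to be verified separately.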

1. Clone and Configure

git clone <repository-url>
cd kubernetes-proxmox-infrastructure
cp terraform.tfvars.example terraform.tfvars

2. Configure Variables

Edit terraform.tfvars with your environment settings:

# Proxmox Configuration
proxmox_api_url          = "https://your-proxmox:8006/api2/json"
proxmox_api_token_id     = "your-token-id"      # Proxmox token IDs have the form user@realm!token-name
proxmox_api_token_secret = "your-token-secret"

# VM Configuration
ssh_public_key_path  = "~/.ssh/id_ed25519.pub"
ssh_private_key_path = "~/.ssh/id_ed25519"

# Network Configuration
vm_network_bridge = "vmbr0"
vm_network_vlan   = 100

# NFS Backup Configuration (optional)
nfs_server_ip   = "192.168.1.100"
nfs_backup_path = "/data/kubernetes/backups"
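
Before deploying, it can help to confirm that no example placeholders remain in the file. This is a minimal sketch that assumes every placeholder value contains `your-`, as in the example above; `has_placeholders` is a hypothetical helper and not one of the repository's scripts.

```shell
# has_placeholders FILE: succeeds if FILE still has an uncommented line with a
# quoted example value such as "your-token-id".
# Hypothetical helper; assumes placeholders contain `your-`.
has_placeholders() {
  grep -v '^[[:space:]]*#' "$1" | grep -q '"[^"]*your-'
}
```

For example, `has_placeholders terraform.tfvars && echo "terraform.tfvars still contains example values"` prints a warning only while placeholders are left.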

3. Deploy Infrastructure

# Pre-flight checks
./scripts/deployment/pre-flight-check.sh

# Deploy full stack
./scripts/deployment/deploy-full-stack.sh
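
Once the deployment script finishes, the cluster state can be checked by hand. The sketch below assumes `kubectl` is installed and already pointed at the new cluster's kubeconfig, which this README does not describe; `count_not_ready` and `verify_cluster` are hypothetical helpers rather than repository scripts.

```shell
# count_not_ready: reads `kubectl get nodes --no-headers` output on stdin and
# prints how many nodes report a STATUS other than exactly "Ready"
# (so "NotReady" and "Ready,SchedulingDisabled" are both counted).
count_not_ready() {
  awk 'NF && $2 != "Ready" { n++ } END { print n + 0 }'
}

# verify_cluster: hypothetical post-deployment check. Fails if kubectl cannot
# list nodes, if no nodes exist, or if any node is not Ready.
verify_cluster() {
  local nodes
  local not_ready

  nodes="$(kubectl get nodes --no-headers)" || return 1
  [ -n "$nodes" ] || { echo "no nodes found"; return 1; }

  not_ready="$(printf '%s\n' "$nodes" | count_not_ready)"
  if [ "$not_ready" -gt 0 ]; then
    echo "$not_ready node(s) not Ready"
    return 1
  fi
  echo "all nodes Ready"
}
```

Run `verify_cluster` after the deployment completes; a non-zero exit status means at least one node needs attention.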

📁 Project Structure

├── terraform/                    # Terraform infrastructure code
│   ├── infrastructure/          # Proxmox VMs, networking
│   ├── kubernetes/              # K8s clusters, storage, ingress
│   ├── applications/            # Application deployments
│   ├── platform/                # Monitoring, backup, logging
│   ├── main.tf                  # Main configuration
│   ├── providers.tf             # Provider configurations
│   ├── variables.tf             # Variable definitions
│   ├── outputs.tf               # Output definitions
│   └── terraform.tfvars         # Environment-specific variable values
├── docs/                        # Documentation
│   ├── deployment/              # Deployment guides
│   ├── backup/                  # Backup documentation
│   ├── monitoring/              # Monitoring setup
│   └── troubleshooting/         # Troubleshooting guides
├── scripts/                     # Automation scripts
│   ├── deployment/              # Deployment scripts
│   ├── backup/                  # Backup and restore scripts
│   ├── maintenance/             # Maintenance scripts
│   └── troubleshooting/         # Troubleshooting scripts
├── configs/                     # Configuration files
│   ├── grafana/                 # Grafana dashboards
│   ├── prometheus/              # Prometheus configs
│   └── backup/                  # Backup configurations
└── README.md                    # This file

🔧 Components

Infrastructure (Terraform)

The Terraform configuration is organized into logical subdirectories for better maintainability:

Directory Structure

  • infrastructure/: Proxmox VMs, networking, and base infrastructure
  • kubernetes/: Kubernetes clusters, storage, and ingress configuration
  • applications/: Application deployments (Immich, media apps, automation)
  • platform/: Platform services (monitoring, backup, logging)

Core Files

  • main.tf: Main configuration and resource orchestration
  • providers.tf: Provider configurations (Proxmox, Kubernetes, Helm)
  • variables.tf: Input variable declarations
  • outputs.tf: Output value declarations
  • backend.tf: Backend configuration for state management
  • versions.tf: Terraform and provider version constraints

See terraform/README.md for detailed structure documentation.

Key Features

🔍 Monitoring & Observability

  • Prometheus: Metrics collection and alerting
  • Grafana: Visualization and dashboards
  • Loki: Log aggregation and analysis
  • Mimir: Long-term metrics storage

💾 Backup & Recovery

  • Automated Backups: Scheduled etcd and application data backups
  • Manual Backup Triggers: On-demand backup capabilities
  • Restoration Testing: Comprehensive restore validation
  • NFS Storage: Centralized backup storage with redundancy
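
Scheduled backups are only useful if they keep arriving. As an illustration, the following sketch checks whether a backup directory has received a file recently. `backup_is_fresh` is a hypothetical helper, the directory layout on the NFS share is an assumption, and `find -mmin` is a GNU/BSD/BusyBox extension rather than POSIX.

```shell
# backup_is_fresh DIR MAX_AGE_MINUTES: succeeds if DIR contains at least one
# regular file modified within the last MAX_AGE_MINUTES minutes.
# Hypothetical helper; not part of this repository's backup scripts.
backup_is_fresh() {
  [ -d "$1" ] || return 1
  [ -n "$(find "$1" -type f -mmin "-$2")" ]
}
```

For a daily schedule, `backup_is_fresh /path/to/mounted/backups 1440 || echo "no backup in the last 24 hours"` would flag a missed run; the mount path is a placeholder for wherever the NFS export is mounted.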

🌐 Networking

  • MetalLB: Layer 2 load balancing for bare-metal
  • Traefik Ingress: HTTP/HTTPS routing with automatic SSL
  • Network Policies: Secure inter-pod communication
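
For orientation, Layer 2 mode in MetalLB 0.13 and later is configured with an `IPAddressPool` and an `L2Advertisement` resource. The sketch below only prints such a pair. `render_metallb_l2` is a hypothetical helper, the `metallb-system` namespace is MetalLB's usual install namespace and is assumed here, and the address range in the usage note is an example, not this project's actual pool.

```shell
# render_metallb_l2 POOL_NAME ADDRESS_RANGE: prints an IPAddressPool and a
# matching L2Advertisement (CRD-based configuration, MetalLB 0.13+).
# Hypothetical helper; the namespace below is an assumption.
render_metallb_l2() {
  cat <<EOF
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: $1
  namespace: metallb-system
spec:
  addresses:
    - $2
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: $1
  namespace: metallb-system
spec:
  ipAddressPools:
    - $1
EOF
}
```

The output can be reviewed first and then piped to the cluster, for example `render_metallb_l2 default 192.168.1.240-192.168.1.250 | kubectl apply -f -`. Choose a range outside your DHCP scope.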

📖 Documentation

  • Deployment: docs/deployment/
  • Backup & Recovery: docs/backup/
  • Monitoring: docs/monitoring/
  • Troubleshooting: docs/troubleshooting/

🛠️ Common Operations

Deployment

# Navigate to terraform directory
cd terraform

# Full stack deployment
terraform init
terraform plan
terraform apply

# Or, from the repository root (cd .. first), run the deployment script
./scripts/deployment/deploy-full-stack.sh
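
When running Terraform by hand, saving the plan to a file ensures that what gets applied is exactly what was reviewed. This is a minimal sketch using standard Terraform CLI flags; `save_plan`, `apply_saved_plan`, and the default file name `tfplan` are hypothetical and not part of this repository.

```shell
# save_plan [PLAN_FILE]: writes the execution plan to PLAN_FILE (default: tfplan).
# Run from the directory that holds the Terraform configuration.
save_plan() {
  terraform plan -input=false -out="${1:-tfplan}"
}

# apply_saved_plan [PLAN_FILE]: applies exactly the plan stored in PLAN_FILE.
# Terraform does not prompt for approval when given a saved plan file.
apply_saved_plan() {
  terraform apply -input=false "${1:-tfplan}"
}
```

A typical sequence is `save_plan`, then `terraform show tfplan` to review the changes, then `apply_saved_plan`.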

Backup Operations

# Manual backup (all components)
./scripts/backup/trigger-manual-backup.sh

# Test backup restoration
./scripts/backup/test-backup-restoration.sh dry-run

# Test restoring a specific component (here: grafana)
./scripts/backup/test-individual-restore.sh grafana

Maintenance

# Check NFS permissions
./scripts/maintenance/test-nfs-permissions.sh

# Update Grafana dashboards
./scripts/maintenance/update-grafana-dashboards.sh

Troubleshooting

# Diagnose NFS access issues
./scripts/troubleshooting/diagnose-nfs-access.sh

# Fix kubeconfig secret encoding
./scripts/troubleshooting/fix-kubeconfig-secret.sh

🔐 Security Considerations

  • SSH Key Authentication: Password authentication disabled by default
  • Network Segmentation: VLANs and network policies for isolation
  • Secret Management: Kubernetes secrets for sensitive data
  • Backup Encryption: Consider encrypting backup data at rest
  • Access Control: RBAC policies for service accounts

📊 Monitoring & Alerting

Default Dashboards

  • Kubernetes Cluster Overview: Node and pod metrics
  • Backup Monitoring: Backup status and performance
  • Application Metrics: Component-specific dashboards

Key Metrics

  • Cluster resource utilization
  • Backup success/failure rates
  • Network device discovery status
  • Application performance metrics

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Test thoroughly
  5. Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Support

  • Documentation: Check the docs/ directory for detailed guides
  • Issues: Report bugs and feature requests via GitHub issues
  • Troubleshooting: Use the troubleshooting scripts in scripts/troubleshooting/

🏷️ Version

Current version: 1.0.0

📝 Changelog

See CHANGELOG.md for version history and updates.


Note: This infrastructure is designed for production use but should be thoroughly tested in your environment before deployment. Always follow your organization's security and operational guidelines.