Infra Architecture — bm01-prod

K8s 1.33.11 • Cilium 1.19 • ArgoCD GitOps • Longhorn storage • Authentik SSO • Forgejo Actions CI

Auto-deployed via Forgejo Actions → registry → ArgoCD Image Updater → ArgoCD sync. Commit version: v3 — K8s details.

Contents

  1. Physical layout
  2. K8s cluster topology
  3. K8s control plane components
  4. Cilium CNI + network model
  5. Admission pipeline (security)
  6. Namespaces composition
  7. Storage flow (Longhorn)
  8. Cluster apps + data flows
  9. CI/CD lifecycle

1. Physical layout

flowchart LR
  Internet[/"Internet (22, 80, 443)"/]
  subgraph VPS["VPS (Ubuntu 24.04, 78.109.17.180)"]
    direction TB
    KB["/root/infra/
KB + secrets + tofu + ansible"] Gatus[Gatus uptime monitor] Restic[(Restic SFTP repo)] end subgraph BM01["bm01 (Proxmox VE 9.1.5, AMD Ryzen 9 9950X, 180GB RAM, NVMe RAID 1)"] direction TB subgraph K8S["K8s 1.33.11 cluster — 6 VMs on vmbr1 10.10.0.0/24"] direction TB CP[3× control plane VMs
cp-1/2/3, 4GB RAM each] W[3× worker VMs
w-1/2/3, 12GB RAM each] end end Internet -->|SSH 22 / HTTPS 80,443 DNAT → k8s-w-1:30080/30443| BM01 VPS -->|SSH key-only ProxyJump| BM01 BM01 -->|SFTP backup daily 03:00 MSK| Restic Gatus -->|TCP 22 / ICMP probe 60s| BM01
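
The only ingress path from the Internet is the bm01 DNAT rule targeting NodePorts 30080/30443 on k8s-w-1 (10.10.0.20). Inside the cluster those ports terminate on the ingress-nginx controller Service. A minimal sketch of such a Service, assuming the chart-default name, namespace and selector labels (not copied from the live manifests):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller   # assumed name; the real Service comes from the ingress-nginx chart
  namespace: ingress-nginx
spec:
  type: NodePort
  selector:
    app.kubernetes.io/name: ingress-nginx   # assumed selector labels
  ports:
    - name: http
      port: 80
      targetPort: http
      nodePort: 30080   # matches the bm01 DNAT rule :80 → 10.10.0.20:30080
    - name: https
      port: 443
      targetPort: https
      nodePort: 30443   # matches the bm01 DNAT rule :443 → 10.10.0.20:30443
```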

2. K8s cluster topology

flowchart TB
  subgraph cp["Control Plane (3 VMs, stacked etcd)"]
    direction LR
    CP1["k8s-cp-1
10.10.0.10
apiserver+etcd+sched+cm"] CP2["k8s-cp-2
10.10.0.11
apiserver+etcd+sched+cm"] CP3["k8s-cp-3
10.10.0.12
apiserver+etcd+sched+cm"] end subgraph w["Worker nodes (3 VMs)"] direction LR W1["k8s-w-1
10.10.0.20
zone=vm-w-1"] W2["k8s-w-2
10.10.0.21
zone=vm-w-2"] W3["k8s-w-3
10.10.0.22
zone=vm-w-3"] end CP1 <-->|etcd raft| CP2 CP2 <-->|etcd raft| CP3 CP1 <-->|etcd raft| CP3 W1 -->|kubelet| CP1 W2 -->|kubelet| CP2 W3 -->|kubelet| CP3 note["Zone labels topology.kubernetes.io/zone используются
для HA-imitation topologySpreadConstraints
(ADR-0043) — replicas spread cross-VM"] W1 -.- note
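
The zone labels let single-host workloads imitate multi-AZ spreading. A minimal sketch of how a 2-replica Deployment might consume them via topologySpreadConstraints, trimmed to the spread constraint itself (the workload name, labels and image are illustrative, not taken from ADR-0043):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: arch-viewer               # illustrative; any multi-replica workload applies
spec:
  replicas: 2
  selector:
    matchLabels:
      app: arch-viewer
  template:
    metadata:
      labels:
        app: arch-viewer
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone   # zone=vm-w-1/2/3, one per worker VM
          whenUnsatisfiable: DoNotSchedule           # hard requirement: replicas land on different VMs
          labelSelector:
            matchLabels:
              app: arch-viewer
      containers:
        - name: nginx
          image: nginx:1.27        # placeholder image
```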

3. K8s control plane components

flowchart TB
  Client["kubectl / ArgoCD / Image Updater / etc."]

  subgraph cp_pod["Per cp VM (static pods)"]
    direction TB
    APISERVER["kube-apiserver :6443
—audit-log /var/log/kubernetes/audit/
—authentication, authorization
—admission webhooks"] ETCD["etcd :2379, :2380
data /var/lib/etcd
k=v cluster state"] SCHED["kube-scheduler :10259
watches Pod (no node assigned)
+ NodeAffinity, TopologySpread, PDB"] CM["kube-controller-manager :10257
Deployment, ReplicaSet, Job
Node lifecycle, Endpoint, GC"] end subgraph node_pod["Per worker VM"] direction TB KUBELET["kubelet
watches /etc/kubernetes/manifests + apiserver Pods
creates containers via CRI (containerd 2.2)"] CILIUM["cilium-agent (DS)
kube-proxy replacement
NetworkPolicy enforcement
Hubble metrics :9965"] LONGHORN["longhorn-manager / instance-manager
(iSCSI :3260)"] ALLOY["grafana-alloy (DS)
scrape pod logs + journald + audit"] end Client -->|HTTPS :6443| APISERVER APISERVER <--> ETCD SCHED -->|watch| APISERVER CM -->|watch + write| APISERVER KUBELET -->|watch + node status| APISERVER CILIUM -->|watch CiliumNetworkPolicy| APISERVER ALLOY -->|push| LOKI[(Loki
/loki/api/v1/push)] APISERVER -->|audit-log fd| FILE[(/var/log/kubernetes/audit/audit.log)] FILE --> ALLOY
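
The apiserver writes its audit trail to /var/log/kubernetes/audit/, which Alloy ships to Loki. A minimal sketch of an audit policy that would produce the Metadata-level entries referenced in section 5; the actual policy file on the control plane nodes may differ, and the read-verb exclusion is an assumption:

```yaml
apiVersion: audit.k8s.io/v1
kind: Policy
# Skip the noisy RequestReceived stage to keep audit.log small
omitStages:
  - RequestReceived
rules:
  - level: None
    verbs: ["get", "list", "watch"]   # assumed exclusion of read-heavy verbs
  - level: Metadata                   # records who did what and when, without request/response bodies
```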

4. Cilium CNI + network model

flowchart TB
  Internet[/"Internet"/]

  subgraph BM01_NET["bm01 nftables"]
    EXT[eth1 193.39.168.159]
    VMBR1[vmbr1 10.10.0.1/24]
    NAT["DNAT prerouting:
:80→10.10.0.20:30080
:443→10.10.0.20:30443
+ hairpin для cert-manager self-check"] MASQ["postrouting masquerade:
vmbr1→eth1 (egress)
vmbr1→vmbr1 (hairpin)"] end subgraph CL["K8s cluster network (Cilium 1.19)"] direction TB POD_CIDR["Pod CIDR 10.233.64.0/18
per-node /24 alloc"] SVC_CIDR["Service CIDR 10.233.0.0/18
kube-proxy replacement (eBPF)"] INGNX["ingress-nginx
NodePort 30080/30443
2 replicas → spread"] NP["NetworkPolicy: default-deny ingress
12/12 platform namespaces
+ allow-rules per ns"] HUBBLE["Hubble
flow observability
UI hubble.georgeops.online"] end Internet --> EXT EXT --> NAT NAT --> INGNX INGNX --> POD_CIDR POD_CIDR <--> SVC_CIDR POD_CIDR -.->|policy denied| NP POD_CIDR -.->|flows| HUBBLE VMBR1 <-->|outbound NAT| MASQ MASQ --> EXT

5. K8s admission pipeline (security)

sequenceDiagram
  participant U as User/Controller
  participant API as kube-apiserver
  participant Auth as Authentication
  participant Authz as Authorization
  participant Mut as Mutating Webhooks
  participant Val as Validating Webhooks
  participant ETCD as etcd

  U->>API: Create Pod
  API->>Auth: x509 cert / SA token / OIDC
  Auth-->>API: user identity
  API->>Authz: RBAC (Role/ClusterRole)
  Authz-->>API: allow/deny
  API->>Mut: cert-manager-cainjector, kyverno-mutate (if configured)
  Mut-->>API: mutated object
  API->>Val: PodSecurity (PSA labels), Kyverno ClusterPolicies (5 Enforce + verify-images Audit)
  Val-->>API: pass/deny
  Note over Val: disallow-latest-tag<br/>require-resource-requests<br/>require-pod-probes<br/>require-app-name-label<br/>disallow-host-namespaces<br/>verify-upstream-images
  API->>ETCD: persist
  API->>U: response (201 / 403 deny / etc.)
  API->>API: audit-log entry (Metadata level)
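
The six policy names in the note above are Kyverno ClusterPolicies; five run in Enforce mode and image verification runs in Audit. A minimal sketch of what disallow-latest-tag likely looks like (rule name and message text are assumptions, not copied from the repo):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-latest-tag
spec:
  validationFailureAction: Enforce     # the request is rejected, not just reported
  background: true                     # also flag pre-existing resources in reports
  rules:
    - name: require-pinned-image-tag   # assumed rule name
      match:
        any:
          - resources:
              kinds: ["Pod"]
      validate:
        message: "Images must use a pinned tag, not ':latest'."
        pattern:
          spec:
            containers:
              - image: "!*:latest"     # deny any container image ending in :latest
```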

6. Namespaces composition

flowchart LR
  subgraph platform["Platform (GitOps + Auth + Storage)"]
    direction TB
    NS_argocd["argocd<br/>PSA: restricted/baseline<br/>30 Applications + image-updater"]
    NS_authentik["authentik<br/>PSA: restricted/baseline<br/>IdP + Postgres + Redis"]
    NS_forgejo["forgejo<br/>PSA: restricted/baseline<br/>git + registry + Actions"]
    NS_oauth["oauth2-proxy<br/>PSA: restricted/baseline<br/>forward-auth 2 replicas"]
    NS_openbao["openbao<br/>PSA: restricted/baseline<br/>vault + audit + auto-unseal"]
    NS_ext["external-secrets<br/>PSA: baseline<br/>ESO controller + webhook"]
  end
  subgraph storage_sec["Storage + Security"]
    direction TB
    NS_lh["longhorn-system<br/>PSA: privileged<br/>CSI + manager + UI + 24 pods"]
    NS_cert["cert-manager<br/>PSA: baseline<br/>controller + webhook + cainjector"]
    NS_kyv["kyverno<br/>PSA: baseline<br/>4 controllers + admission/cleanup"]
    NS_trivy["trivy-system<br/>PSA: privileged<br/>operator + node-collector"]
  end
  subgraph obs_ingress["Observability + Ingress"]
    direction TB
    NS_mon["monitoring<br/>PSA: restricted/baseline<br/>Prometheus + AM + Grafana + Loki + KSM + node-exporter"]
    NS_ing["ingress-nginx<br/>PSA: baseline<br/>2 controller replicas"]
  end
  subgraph workload["Workload + CI"]
    direction TB
    NS_arch["arch-viewer<br/>PSA: baseline<br/>2 nginx replicas (this app!)"]
    NS_runner["forgejo-runner<br/>PSA: privileged<br/>act_runner + DinD sidecar"]
  end
  NS_argocd -->|deploys| NS_authentik
  NS_argocd -->|deploys| NS_forgejo
  NS_argocd -->|deploys| NS_openbao
  NS_argocd -->|deploys| NS_arch
  NS_runner -->|build push| NS_forgejo
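
"PSA: restricted/baseline" in the diagram refers to Pod Security Admission labels on the namespace, typically enforcing one level while warning/auditing at a stricter one. A minimal sketch for the argocd namespace; which level is enforced versus only warned about is an assumption here, check the namespace manifests in infra-k8s.git:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: argocd
  labels:
    # "PSA: restricted/baseline" expressed as Pod Security Admission labels;
    # the enforce/warn split below is assumed, not copied from the repo.
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
```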

7. Storage flow (Longhorn)

flowchart TB
  subgraph app["App pod (e.g. openbao)"]
    APP[container]
    PVC[(PVC openbao-data 5Gi RWO)]
    APP --> PVC
  end

  subgraph longhorn["Longhorn control plane (ns longhorn-system)"]
    LH_MGR["longhorn-manager<br/>orchestrates volumes"]
    CSI["longhorn-csi-plugin<br/>per-node<br/>:9808 gRPC"]
  end
  subgraph data_plane["Data plane (per-worker)"]
    direction LR
    E1["instance-manager-engine<br/>volume head"]
    R1["instance-manager-replica<br/>worker-1<br/>/var/lib/longhorn/replicas"]
    R2["instance-manager-replica<br/>worker-2"]
  end
  subgraph snap["Snapshots"]
    SD["daily 03:00 UTC<br/>retain 7"]
    SW["weekly 04:00 UTC Sun<br/>retain 4"]
  end
  PVC -.->|provisioning| LH_MGR
  LH_MGR --> CSI
  CSI -->|iSCSI :3260| E1
  E1 -->|sync write replicas| R1
  E1 -->|sync write replicas| R2
  E1 -.->|snapshot metadata| SD
  E1 -.->|snapshot metadata| SW
  note["Off-site backup target is the user's manual responsibility.<br/>Snapshots stay local to the replica node (NOT off-host)."]
  R1 -.- note
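
The snapshot schedules in the diagram map to Longhorn RecurringJob resources. A minimal sketch of the daily job; the resource name and group are assumptions, and the weekly job would differ only in cron and retain count:

```yaml
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: snapshot-daily           # assumed name
  namespace: longhorn-system
spec:
  task: snapshot                 # local snapshot, not an off-host backup
  cron: "0 3 * * *"              # daily 03:00 UTC, as in the diagram
  retain: 7
  concurrency: 1
  groups:
    - default                    # assumed: applies to volumes in the default group
```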

8. Cluster apps + data flows

flowchart TB
  USER[Browser] --> INGNX[ingress-nginx]

  subgraph SSO["SSO layer"]
    AUTH[Authentik OIDC IdP]
    OAUTH[oauth2-proxy forward-auth]
  end

  subgraph GitOps["GitOps platform"]
    direction TB
    FORGEJO["Forgejo<br/>git + registry + Actions"]
    ARGOCD[ArgoCD 33 Applications]
    RUNNER["forgejo-runner<br/>CI workflows"]
    IMGUPD["ArgoCD Image Updater<br/>polls registry"]
  end
  subgraph Secrets["Secrets"]
    OPENBAO[OpenBao vault]
    ESO[ESO per-app SecretStore]
  end
  subgraph Observability["Observability"]
    PROM[Prometheus 30d]
    LOKI[Loki 14d]
    ALLOY[Alloy DS]
    GRAF[Grafana 12+ dashboards]
    AM[Alertmanager pull-based]
  end
  subgraph Storage["Storage"]
    LH[Longhorn replica=2]
  end
  INGNX --> AUTH
  INGNX --> OAUTH
  OAUTH --> AUTH
  INGNX --> FORGEJO & ARGOCD
  ARGOCD -->|polls 180s| FORGEJO
  FORGEJO --> RUNNER
  RUNNER -->|kaniko push| FORGEJO
  IMGUPD -->|registry poll 2m| FORGEJO
  IMGUPD -->|patch Application kustomize.images| ARGOCD
  ESO --> OPENBAO
  ALLOY --> LOKI
  PROM --> AM
  GRAF --> PROM & LOKI
  ARGOCD -->|deploys| FORGEJO & PROM & LOKI & OPENBAO & ESO & AUTH
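
The ESO → OpenBao edge means each app declares a SecretStore pointing at the vault and ExternalSecrets that pull individual keys into native Secrets. A minimal sketch; the store name, KV path and key names are illustrative, not taken from the repo:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: app-db-credentials       # illustrative
  namespace: arch-viewer
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: openbao                # assumed per-app SecretStore name
    kind: SecretStore
  target:
    name: app-db-credentials     # resulting native Secret consumed by the pod
  data:
    - secretKey: password
      remoteRef:
        key: apps/arch-viewer/db # assumed KV path in OpenBao
        property: password
```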

9. CI/CD lifecycle (this app!)

sequenceDiagram
  participant Dev as Developer
  participant Git as Forgejo git
  participant Run as Forgejo Runner (kaniko)
  participant Reg as Forgejo Registry
  participant IU as Image Updater
  participant Argo as ArgoCD
  participant K8s as Kubernetes

  Dev->>Git: git push (commit SHA)
  Git->>Run: trigger workflow
  Run->>Git: wget tarball /api/archive/SHA.tar.gz
  Run->>Run: kaniko/executor build
  Run->>Reg: docker push image:SHA + :latest
  Note over Reg: ~1 min
  Reg-->>IU: registry poll (every 2m)
  IU->>Argo: patch Application kustomize.images[].newTag = SHA
  Argo->>K8s: sync — render kustomize with new tag
  K8s->>K8s: rolling restart (PDB minAvailable=1)
  Note over Dev,K8s: Hands-off, total ~3-5 min
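
The hand-off from registry to ArgoCD is driven by Image Updater annotations on the Application. A minimal sketch of how this app's Application might be annotated; the alias, registry path, update strategy and source path are assumptions, not copied from infra-k8s.git:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: arch-viewer
  namespace: argocd
  annotations:
    # Image Updater polls the Forgejo registry and patches kustomize.images on change.
    argocd-image-updater.argoproj.io/image-list: app=forgejo.georgeops.online/infra/arch-viewer   # assumed registry path
    argocd-image-updater.argoproj.io/app.update-strategy: newest-build   # assumed; could also pin by tag regex
    argocd-image-updater.argoproj.io/write-back-method: argocd           # patches the Application, no git commit
spec:
  project: default                                                       # assumed
  source:
    repoURL: https://forgejo.georgeops.online/infra/infra-k8s.git        # repo from the footer; path is assumed
    path: apps/arch-viewer
    targetRevision: main
  destination:
    server: https://kubernetes.default.svc
    namespace: arch-viewer
  syncPolicy:
    automated: {}
```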

Source: forgejo.georgeops.online/infra/arch-viewer • Managed by ArgoCD from infra-k8s.git • See ADR-0045, RB-021