1. Symptoms and Typical Errors

When GitLab Runner uses the Kubernetes executor, errors like the following commonly appear while registering the Runner or running a pipeline:

# Example error during registration
ERROR: Failed to connect to Kubernetes cluster: Get "https://k8s-api-server:6443/version": x509: certificate signed by unknown authority

# Example error while a pipeline is running
Job failed (system failure): prepare environment: 
failed to create pod: pods "runner-xxxx" is forbidden: 
error looking up service account default/default: serviceaccount "default" not found

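2. Root Cause Analysis

2.1 Certificate Trust Issues (TLS Handshake Error)

When the Kubernetes API Server uses a self-signed certificate, GitLab Runner cannot verify it unless it is told which CA to trust. A typical misconfiguration looks like this:

# Misconfigured example (the ca_file parameter is missing)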
[[runners]]
  executor = "kubernetes"
  [runners.kubernetes]
    host = "https://192.168.1.100:6443"
    bearer_token = "xxxxxxxx"

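2.2 Insufficient RBAC Permissions (Forbidden Error)

When the ServiceAccount used by the Runner does not exist in the target namespace, or is not bound to a role that allows it to manage Pods, the API Server rejects Pod creation. The error below is the missing-ServiceAccount variant:

# Symptom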
Error creating: pods "runner-1234" is forbidden: 
error looking up service account default/gitlab-runner: serviceaccount "gitlab-runner" not found

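2.3 Network Policy Restrictions (NetworkPolicy Blocking)

When NetworkPolicy is enforced in the cluster, a restrictive policy can block communication between the Runner and the API Server:

# Problematic policy (no rule allows the IP range of the Runner's nodes, so all ingress and egress is denied)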
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-except-whitelist
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress

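3. Complete Solutions with Examples

3.1 Certificate Verification Setup (OpenSSL + kubectl)

Step 1: Extract the cluster's CA certificate

# Extract the CA certificate from kubeconfig (clusters[0] is the first cluster entry; add --minify to limit the output to the current context)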
kubectl config view --raw -o jsonpath='{.clusters[0].cluster.certificate-authority-data}' | base64 -d > ca.crt
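
The same extraction can be scripted when it has to run in automation. The sketch below assumes the kubeconfig was exported as JSON with `kubectl config view --raw -o json`; the function name extract_ca_pem is illustrative, not part of any tool.

```python
import base64
import json


def extract_ca_pem(kubeconfig_json: str, index: int = 0) -> bytes:
    """Decode the embedded CA certificate of the cluster entry at `index`."""
    config = json.loads(kubeconfig_json)
    cluster = config["clusters"][index]["cluster"]
    data = cluster.get("certificate-authority-data")
    if data is None:
        # Some kubeconfigs reference a file via certificate-authority instead.
        raise KeyError("cluster has no embedded certificate-authority-data")
    return base64.b64decode(data)
```

Writing the returned bytes to ca.crt gives the same file as the shell pipeline above. The explicit KeyError makes the case visible where the kubeconfig points at a CA file on disk rather than embedding the data.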

Step 2: Configure the Runner's config.toml

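[[runners]]
  name = "k8s-runner"
  url = "https://gitlab.example.com"
  token = "RUNNER_AUTHENTICATION_TOKEN"  # the token issued when the Runner is registered, not the registration token
  executor = "kubernetes"

  [runners.kubernetes]
    host = "https://k8s-api-server:6443"
    bearer_token = "xxxxxx"  # taken from the ServiceAccount's Secret
    ca_file = "/etc/ssl/certs/ca.crt"  # read by the Runner process, so the path must exist where the Runner runs
    namespace = "gitlab-runner"
    service_account = "gitlab-runner-sa"  # must match the RBAC configuration

  # Mount the certificates into the job containers
  [[runners.kubernetes.volumes.host_path]]
    name = "k8s-certs"
    mount_path = "/etc/ssl/certs"  # note: this hides the image's own CA bundle at the same path
    read_only = true
    host_path = "/opt/k8s-certs"  # certificate directory on the node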

3.2 RBAC Configuration Example

# gitlab-runner-rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: gitlab-runner-sa
  namespace: gitlab-runner
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: gitlab-runner-clusterrole
rules:
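- apiGroups: [""]
  # pods/attach and pods/log are included on the assumption that the executor attaches to job Pods and reads their logs
  resources: ["pods", "pods/exec", "pods/attach", "pods/log", "secrets"]
  # explicit verbs instead of "*"; trim further to what your Runner version actually needs
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]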
- apiGroups: ["apps"]
  resources: ["deployments", "statefulsets"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: gitlab-runner-clusterrole-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: gitlab-runner-clusterrole
subjects:
- kind: ServiceAccount
  name: gitlab-runner-sa
  namespace: gitlab-runner
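
After applying the manifest, it helps to confirm that the ServiceAccount really has the expected permissions. The sketch below only builds `kubectl auth can-i` command lines that impersonate the ServiceAccount. The helper name can_i_command and the list of checks are illustrative, and running the commands assumes your own user is allowed to impersonate.

```python
from typing import Optional


def can_i_command(verb: str, resource: str, service_account: str,
                  namespace: str, subresource: Optional[str] = None) -> list:
    """Build a `kubectl auth can-i` argv that impersonates a ServiceAccount."""
    cmd = ["kubectl", "auth", "can-i", verb, resource]
    if subresource:
        # Subresources such as exec are passed with a flag; "pods/exec"
        # would be read as a Pod named "exec".
        cmd.append(f"--subresource={subresource}")
    cmd += ["--as", f"system:serviceaccount:{namespace}:{service_account}",
            "--namespace", namespace]
    return cmd


# Permissions this article's ClusterRole is meant to grant.
CHECKS = [("create", "pods", None), ("create", "pods", "exec"), ("get", "secrets", None)]

for verb, resource, sub in CHECKS:
    print(" ".join(can_i_command(verb, resource, "gitlab-runner-sa", "gitlab-runner", sub)))
```

Each printed command answers yes or no when run against the cluster, which narrows a Forbidden error down to the specific verb and resource that is missing.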

3.3 Debugging Network Policies

# Temporarily allow all traffic (for testing only)
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-all-test
spec:
  podSelector: {}
  ingress:
  - {}
  egress:
  - {}
  policyTypes:
  - Ingress
  - Egress
EOF

# Remove the temporary policy once testing is done, so the original policies apply again
kubectl delete networkpolicy allow-all-test
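
To tell whether a policy change actually made a difference, a plain TCP probe before and after is often enough. The helper below is a minimal sketch using only the standard library; the name can_connect is illustrative. Because NetworkPolicy applies to Pod traffic, the probe is only meaningful when run from a Pod in the namespace the Runner uses.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For example, `can_connect("k8s-api-server", 6443)` returning False with the original policies and True with allow-all-test applied points at NetworkPolicy as the cause.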

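4. Related Technical Details

4.1 ServiceAccount Authentication Flow

  1. On Kubernetes versions before 1.24, creating a ServiceAccount automatically generates a token Secret. From 1.24 onward no Secret is generated automatically: create a Secret of type kubernetes.io/service-account-token for the ServiceAccount, or request a short-lived token with kubectl create token
  2. The Secret contains the ca.crt and token fields
  3. The token stored in the Secret is Base64-encoded and must be decoded before it is used as the bearer token:
# Get the actual token value
kubectl get secret gitlab-runner-sa-token-xxxx -n gitlab-runner -o jsonpath='{.data.token}' | base64 -d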
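
Once decoded, a ServiceAccount token is a JWT, so its claims can be inspected to confirm it belongs to the expected ServiceAccount. The sketch below reads the payload only and does not verify the signature. The function name decode_token_claims is illustrative.

```python
import base64
import json


def decode_token_claims(token: str) -> dict:
    """Return the (unverified) claims of a JWT such as a ServiceAccount token."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a JWT: expected three dot-separated parts")
    payload = parts[1]
    # JWTs strip base64 padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))
```

For a ServiceAccount token the sub claim has the form system:serviceaccount:<namespace>:<name>, which makes it easy to spot a token copied from the wrong namespace.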

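4.2 Certificate Chain of Trust Verification

graph LR
    A[GitLab Runner] -->|1. Sends HTTPS request| B(K8s API Server)
    B -->|2. Returns server certificate| A
    A -->|3. Checks the issuing CA| C[Local CA store]
    C -->|4. Verification succeeds| D[Encrypted connection established]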
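
The same verification steps can be reproduced outside the Runner with Python's standard library, which is useful for checking that a given CA file really validates the API Server. This is a sketch of the general TLS client flow, not of the Runner's internal implementation. The function names and the 5-second timeout are illustrative.

```python
import json
import ssl
import urllib.request
from typing import Optional


def build_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Create a verifying TLS context; with ca_file set, only that CA is trusted."""
    # create_default_context enables certificate and hostname verification.
    return ssl.create_default_context(cafile=ca_file)


def fetch_version(api_url: str, token: str, ca_file: str) -> dict:
    """Call <api_url>/version, verifying the server certificate against ca_file."""
    request = urllib.request.Request(
        f"{api_url}/version",
        headers={"Authorization": f"Bearer {token}"},
    )
    context = build_context(ca_file)
    with urllib.request.urlopen(request, context=context, timeout=5) as response:
        return json.load(response)
```

If the CA file does not match the server certificate, the call fails at step 3 with a certificate verification error, the Python counterpart of the x509 message shown at the start of the article.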

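5. Application Scenarios

5.1 Dynamic CI/CD Environments

  • Characteristics: each pipeline job runs in its own automatically created Pod
  • Benefits: keeps the host environment clean and isolates jobs from one another
  • Key configuration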
    [runners.kubernetes]
      cpu_limit = "1"
      memory_limit = "2Gi"
      service_cpu_limit = "500m"
    

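5.2 Distributing Jobs Across Multiple Clusters

  • Approach: configure multiple Kubernetes executors, one runner entry per cluster
  • Example configuration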
    [[runners]]
      name = "cluster-1"
      [runners.kubernetes]
        host = "https://cluster1-api:6443"
    
    [[runners]]
      name = "cluster-2" 
      [runners.kubernetes]
        host = "https://cluster2-api:6443"
    

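6. Pros and Cons

| Aspect | Kubernetes executor | Shell executor |
|---|---|---|
| Environment isolation | Full isolation (Pod level) | Depends on the host environment |
| Resource overhead | Higher (one Pod per job) | Very low (shared processes) |
| Startup speed | Slower (the Pod must be scheduled) | Immediate |
| Debugging | Requires kubectl skills | Can log in to the host and inspect directly |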

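7. Key Considerations

  1. Version compatibility

    • GitLab Runner 15.x requires Kubernetes 1.21+
    • Check the cluster version with kubectl version (older clients also accept --short; the flag has been removed from recent kubectl releases)
  2. Security hardening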

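    # PodSecurityPolicy example (policy/v1beta1 was removed in Kubernetes 1.25; newer clusters use Pod Security Admission instead)
    apiVersion: policy/v1beta1
    kind: PodSecurityPolicy
    metadata:
      name: gitlab-runner-psp
    spec:
      privileged: false
      allowPrivilegeEscalation: false
      requiredDropCapabilities: ["ALL"]  # drop capabilities instead of allowing "*"
      # the four strategy fields below are required by the PSP schema
      seLinux: {rule: RunAsAny}
      runAsUser: {rule: RunAsAny}
      supplementalGroups: {rule: RunAsAny}
      fsGroup: {rule: RunAsAny}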
    
  3. Resource limits

    [runners.kubernetes]
      cpu_request = "500m"
      memory_request = "1Gi"
      helper_cpu_limit = "200m"  # limit for the helper container
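
Requests must not exceed the matching limits, or Kubernetes rejects the Pod. The sketch below parses a subset of the Kubernetes quantity syntax (the m suffix for CPU, and Ki/Mi/Gi/Ti and k/M/G/T for memory) so the values in config.toml can be compared before deployment. The function names are illustrative and the parser is not a complete implementation of the quantity format.

```python
from decimal import Decimal

# Multipliers for the memory suffixes handled by this sketch.
_FACTORS = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
}


def parse_cpu(value: str) -> Decimal:
    """Convert a CPU quantity such as "500m" or "1" to cores."""
    if value.endswith("m"):
        return Decimal(value[:-1]) / 1000
    return Decimal(value)


def parse_memory(value: str) -> int:
    """Convert a memory quantity such as "2Gi" to bytes."""
    for suffix, factor in _FACTORS.items():
        if value.endswith(suffix):
            return int(Decimal(value[: -len(suffix)]) * factor)
    return int(Decimal(value))


# Values taken from the configuration snippets in this article.
print(parse_cpu("500m") <= parse_cpu("1"))          # cpu_request vs cpu_limit
print(parse_memory("1Gi") <= parse_memory("2Gi"))   # memory_request vs memory_limit
```

Both comparisons print True for the values used in this article, so the request and limit settings shown are consistent with each other.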
    

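8. Troubleshooting Flowchart

graph TD
    A[Connection failed] --> B{Certificate error?}
    B -->|Yes| C[Set the ca_file parameter]
    B -->|No| D{Insufficient permissions?}
    D -->|Yes| E[Check the RBAC configuration]
    D -->|No| F{Network reachable?}
    F -->|No| G[Check firewall rules]
    F -->|Yes| H[Test API Server connectivity]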
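
The first decisions in the flowchart can be automated as a rough triage of the error text. The sketch below uses simple keyword matching. The function name classify_error and the keyword lists are assumptions chosen to match the messages quoted in this article, and real errors may need more patterns.

```python
def classify_error(message: str) -> str:
    """Map a Runner error message to the flowchart branch it most likely belongs to."""
    text = message.lower()
    # Same order as the flowchart: certificate, then permissions, then network.
    if "x509" in text or "certificate" in text:
        return "certificate"
    if "forbidden" in text or "serviceaccount" in text:
        return "rbac"
    if any(marker in text for marker in ("timeout", "connection refused", "no route to host")):
        return "network"
    return "unknown"


print(classify_error('x509: certificate signed by unknown authority'))  # → certificate
print(classify_error('pods "runner-xxxx" is forbidden'))                # → rbac
```

Anything classified as unknown still needs the manual checks described in the debugging section.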

9. Practical Debugging Techniques

9.1 Verifying the API Connection Manually

# Test the connection with curl
curl --cacert ./ca.crt -H "Authorization: Bearer $TOKEN" https://k8s-api:6443/version

# Expected response (abridged)
{
  "major": "1",
  "minor": "25",
  "gitVersion": "v1.25.3"
}
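
The /version response can also be checked against the minimum Kubernetes version noted under version compatibility. The sketch below parses gitVersion and compares it with a (major, minor) tuple. The function name meets_minimum is illustrative, and the default of 1.21 is simply the figure quoted in this article.

```python
import json
import re


def meets_minimum(version_json: str, minimum: tuple = (1, 21)) -> bool:
    """Return True if the gitVersion in a /version response is at least `minimum`."""
    info = json.loads(version_json)
    match = re.match(r"v(\d+)\.(\d+)", info["gitVersion"])
    if match is None:
        raise ValueError(f"unrecognized gitVersion: {info['gitVersion']!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


# The abridged response shown above.
SAMPLE_RESPONSE = '{"major": "1", "minor": "25", "gitVersion": "v1.25.3"}'

print(meets_minimum(SAMPLE_RESPONSE))  # → True
```

gitVersion is parsed rather than the major and minor fields because some distributions append a suffix such as "+" to minor.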

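9.2 Viewing Detailed Runner Logs

# Start the Runner with debug logging (--debug is a global option, so it goes before the subcommand)
gitlab-runner --debug run

# Log line to look for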
DEBU[0001] Trying to connect to Kubernetes cluster...  executor=kubernetes

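10. Summary

This article examined the typical failures that occur when GitLab Runner connects to a Kubernetes cluster and worked through solutions along three dimensions: certificate configuration, permission management, and network policy. In practice, pay particular attention to version compatibility checks and to applying the principle of least privilege. A staged approach works well: first confirm that command-line tools can reach the cluster, then debug the Runner configuration, and finally apply security hardening.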