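1. Symptoms and Typical Errors
When using the Kubernetes executor of GitLab Runner, you will often run into errors like the following during registration or while a pipeline is running:
# Example error during registration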
ERROR: Failed to connect to Kubernetes cluster: Get "https://k8s-api-server:6443/version": x509: certificate signed by unknown authority
# Example error while a pipeline is running
Job failed (system failure): prepare environment:
failed to create pod: pods "runner-xxxx" is forbidden:
error looking up service account default/default: serviceaccount "default" not found
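2. Root Cause Analysis
2.1 Certificate Trust Issues (TLS Handshake Error)
When the Kubernetes API Server presents a certificate signed by a self-signed (private) CA, GitLab Runner cannot validate it against the system trust store, and the TLS handshake fails. A typical misconfiguration:
# Misconfiguration example (the ca_file parameter is missing)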
[[runners]]
executor = "kubernetes"
[runners.kubernetes]
host = "https://192.168.1.100:6443"
bearer_token = "xxxxxxxx"
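2.2 Insufficient RBAC Permissions (Forbidden Error)
The Runner cannot create Pods in two cases: the ServiceAccount it uses does not exist in the target namespace, or that ServiceAccount is not bound to a suitable ClusterRole. The message below is the "ServiceAccount not found" variant. A pure RBAC denial instead reports that the service account cannot create resource "pods":
# Error output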
Error creating: pods "runner-1234" is forbidden:
error looking up service account default/gitlab-runner: serviceaccount "gitlab-runner" not found
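2.3 Network Policy Restrictions (NetworkPolicy Blocking)
When NetworkPolicies are enforced in the cluster, they can block traffic between the Runner (when it runs as a Pod) and the API Server, or between job Pods and GitLab:
# Problematic policy: it selects every Pod in the namespace and declares both Ingress and Egress without any rules, so all traffic is denied (despite its name, nothing is whitelisted)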
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-except-whitelist
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
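Why does this policy block everything? A direction listed in `policyTypes` that has no matching rule list allows no traffic for the selected Pods. When `policyTypes` is omitted, Kubernetes assumes `Ingress`, plus `Egress` only if an `egress` section is present. Below is a minimal Python sketch of that rule. It works on the policy as an already-loaded dict, because parsing YAML would need a third-party library. The function name `denied_directions` is hypothetical.

```python
def denied_directions(policy: dict) -> set[str]:
    """Return the directions ("Ingress"/"Egress") this policy denies entirely
    for the Pods it selects.

    A direction is fully denied when it is in policyTypes but its rule list
    is missing or empty. Other NetworkPolicies selecting the same Pods can
    still allow traffic, because policies are additive.
    """
    spec = policy.get("spec", {})
    types = spec.get("policyTypes")
    if types is None:
        # Kubernetes default: Ingress always, Egress only if egress rules exist.
        types = ["Ingress"] + (["Egress"] if "egress" in spec else [])
    denied = set()
    for direction in types:
        rules = spec.get(direction.lower())  # "ingress" or "egress" key
        if not rules:
            denied.add(direction)
    return denied
```

Applied to the deny-all-except-whitelist policy above, the function reports both directions as denied. An allow-all policy with `ingress: [{}]` and `egress: [{}]` reports none.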
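3. Complete Solution with Examples
3.1 Certificate Verification Setup (OpenSSL + kubectl)
Step 1: Extract the cluster CA certificate
# Extract the CA certificate from kubeconfig (--minify keeps only the current context, so clusters[0] is the cluster in use)
kubectl config view --raw --minify -o jsonpath='{.clusters[0].cluster.certificate-authority-data}' | base64 -d > ca.crt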
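In a kubeconfig, `certificate-authority-data` is the PEM certificate encoded once more in base64. You may want to confirm that the decoded value really is a certificate before handing it to the Runner. Below is a minimal Python sketch; the function name `decode_ca_data` is hypothetical. It decodes the value and uses the standard library's `ssl.PEM_cert_to_DER_cert` to check the PEM framing. It assumes a single certificate (not a bundle) and does not check the certificate's contents or expiry.

```python
import base64
import ssl


def decode_ca_data(b64_value: str) -> str:
    """Decode kubeconfig certificate-authority-data into PEM text.

    Raises ValueError if the value is not valid base64 or the result is
    not framed as a single PEM certificate.
    """
    pem = base64.b64decode(b64_value, validate=True).decode("ascii")
    # PEM_cert_to_DER_cert checks the BEGIN/END lines and decodes the body;
    # it raises ValueError when the framing is wrong.
    ssl.PEM_cert_to_DER_cert(pem)
    return pem
```

`decode_ca_data(value)` returns the same text that the shell pipeline writes to `ca.crt`.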
Step 2: Configure the Runner's config.toml
[[runners]]
name = "k8s-runner"
url = "https://gitlab.example.com"
token = "RUNNER_AUTHENTICATION_TOKEN" # runner authentication token written at registration, not the registration token
executor = "kubernetes"
[runners.kubernetes]
host = "https://k8s-api-server:6443"
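bearer_token = "xxxxxx" # obtained from the ServiceAccount's Secret
ca_file = "/etc/ssl/certs/ca.crt" # path to the CA certificate, read by the Runner process itself
namespace = "gitlab-runner"
service_account = "gitlab-runner-sa" # must match the ServiceAccount in the RBAC configuration
# Mount the certificate directory into the job Pods
# Note: host_path volumes are mounted into the Pods the Runner creates, not into the Runner process, so ca_file must also be readable where the Runner itself runs.
# Mounting over /etc/ssl/certs also hides the image's system CA bundle inside the job containers.
[[runners.kubernetes.volumes.host_path]]
name = "k8s-certs"
mount_path = "/etc/ssl/certs"
read_only = true
host_path = "/opt/k8s-certs" # certificate directory on the node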
3.2 RBAC Configuration Example
# gitlab-runner-rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: gitlab-runner-sa
  namespace: gitlab-runner
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: gitlab-runner-clusterrole
rules:
- apiGroups: [""]
  resources: ["pods", "pods/exec", "secrets"]
  verbs: ["*"]
- apiGroups: ["apps"]
  resources: ["deployments", "statefulsets"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: gitlab-runner-clusterrole-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: gitlab-runner-clusterrole
subjects:
- kind: ServiceAccount
  name: gitlab-runner-sa
  namespace: gitlab-runner
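To see whether a ClusterRole like the one above covers a given request, the core of RBAC rule matching can be sketched in Python. A rule applies when the request's API group, resource and verb each appear in the rule's lists, with `*` matching anything, and RBAC is purely additive. This is a simplified model with hypothetical names. It ignores `resourceNames`, `nonResourceURLs`, subresource wildcards such as `pods/*`, and role aggregation. On a live cluster, `kubectl auth can-i create pods --as=system:serviceaccount:gitlab-runner:gitlab-runner-sa -n gitlab-runner` gives the authoritative answer.

```python
def rule_allows(rule: dict, api_group: str, resource: str, verb: str) -> bool:
    """True if a single PolicyRule matches the request ("*" is a wildcard)."""
    def matches(values: list, wanted: str) -> bool:
        return "*" in values or wanted in values

    return (matches(rule.get("apiGroups", []), api_group)
            and matches(rule.get("resources", []), resource)
            and matches(rule.get("verbs", []), verb))


def role_allows(rules: list[dict], api_group: str, resource: str, verb: str) -> bool:
    """RBAC has no deny rules: any matching rule grants the request."""
    return any(rule_allows(r, api_group, resource, verb) for r in rules)


# Rules copied from gitlab-runner-clusterrole above ("" is the core API group).
RUNNER_RULES = [
    {"apiGroups": [""], "resources": ["pods", "pods/exec", "secrets"], "verbs": ["*"]},
    {"apiGroups": ["apps"], "resources": ["deployments", "statefulsets"],
     "verbs": ["get", "list", "watch"]},
]
```

With these rules, creating `pods` is allowed, but `pods/attach` and `configmaps` are not covered. Whether a given Runner version needs them depends on its configuration, so compare against the permission list in the GitLab Runner documentation.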
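3.3 Network Policy Debugging
# Temporarily allow all traffic in the Runner namespace (testing only; NetworkPolicies are additive, so this overrides the deny-all policy for the selected Pods)
kubectl apply -n gitlab-runner -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-all-test
spec:
  podSelector: {}
  ingress:
  - {}
  egress:
  - {}
  policyTypes:
  - Ingress
  - Egress
EOF
# After testing, delete the temporary policy so the original policies apply again
kubectl delete networkpolicy allow-all-test -n gitlab-runner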
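When testing with and without the temporary policy, it helps to tell a blocked TCP port apart from a TLS or RBAC failure. Below is a minimal Python sketch; the function name `tcp_reachable` is hypothetical. It assumes you can run Python inside a Pod in the affected namespace, for example via `kubectl exec`, and that you substitute your own API Server address. It only attempts a TCP connection with a timeout. If the connection fails without the allow-all policy but succeeds with it, a NetworkPolicy is the likely culprit.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only shows that the port is reachable; it says nothing about TLS
    certificates or RBAC permissions.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures,
        # all of which are OSError subclasses.
        return False
```

For example, `tcp_reachable("192.168.1.100", 6443)` checks the API Server address used in section 2.1.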
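4. Related Technical Details
4.1 ServiceAccount Authentication Flow
- On Kubernetes versions before 1.24, creating a ServiceAccount automatically generates a token Secret. From 1.24 on this no longer happens, so either create a Secret of type kubernetes.io/service-account-token manually or issue a short-lived token with kubectl create token gitlab-runner-sa -n gitlab-runner
- The Secret contains the ca.crt and token fields
- Like all Secret data, the token is stored Base64-encoded and must be decoded before it can be used as the Bearer Token:
# Get the actual token value
kubectl get secret gitlab-runner-sa-token-xxxx -n gitlab-runner -o jsonpath='{.data.token}' | base64 -d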
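ServiceAccount tokens are JWTs. After decoding the Secret, you can check which identity a token carries before putting it into `bearer_token`. Below is a minimal Python sketch with hypothetical function names. It decodes only the JWT payload and does not verify the signature, so use it for inspection and never for authentication. For ServiceAccount tokens, the `sub` claim has the form `system:serviceaccount:<namespace>:<name>`.

```python
import base64
import json


def jwt_payload(token: str) -> dict:
    """Decode the (unverified) payload segment of a JWT."""
    try:
        _header, payload, _signature = token.split(".")
    except ValueError:
        raise ValueError("not a JWT: expected three dot-separated segments") from None
    # JWT segments use unpadded base64url; restore the padding before decoding.
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def service_account_of(token: str) -> tuple[str, str]:
    """Return (namespace, name) from a ServiceAccount token's sub claim."""
    sub = jwt_payload(token).get("sub", "")
    prefix = "system:serviceaccount:"
    if not sub.startswith(prefix):
        raise ValueError(f"not a ServiceAccount token: sub={sub!r}")
    namespace, _, name = sub[len(prefix):].partition(":")
    return namespace, name
```

If the result is not `("gitlab-runner", "gitlab-runner-sa")`, the token belongs to a different account than the one bound in section 3.2.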
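4.2 Certificate Trust Chain Verification
graph LR
A[GitLab Runner] -->|1. Send HTTPS request| B(K8s API Server)
B -->|2. Return server certificate| A
A -->|3. Check against CA certificate| C[Local CA store]
C -->|4. Verification passes| D[Encrypted connection established]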
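The four steps in the diagram are what any TLS client does with a custom CA file. Below is a minimal Python sketch of the same check that `ca_file` enables in the Runner and `--cacert` enables in curl. It builds a context that trusts only the cluster CA, keeps certificate and hostname verification on, and then requests `/version`. The function names, the timeout and the optional token handling are assumptions made for illustration. Hostname verification only passes if the address you connect to appears in the API Server certificate's SANs.

```python
import json
import ssl
import urllib.request
from typing import Optional


def cluster_ssl_context(ca_file: Optional[str]) -> ssl.SSLContext:
    """Step 3: trust only the given CA file; with None, the system store.

    create_default_context keeps CERT_REQUIRED and hostname checking on.
    """
    return ssl.create_default_context(cafile=ca_file)


def get_version(api_server: str, ca_file: Optional[str],
                token: Optional[str] = None) -> dict:
    """GET <api_server>/version over verified TLS (steps 1 to 4)."""
    request = urllib.request.Request(api_server.rstrip("/") + "/version")
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    ctx = cluster_ssl_context(ca_file)
    # An untrusted certificate surfaces as urllib.error.URLError whose reason
    # is an ssl.SSLCertVerificationError ("certificate verify failed").
    with urllib.request.urlopen(request, context=ctx, timeout=10) as resp:
        return json.load(resp)
```

A call such as `get_version("https://k8s-api-server:6443", "ca.crt", token)` plays the same role as the curl command in section 9.1.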
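5. Use Cases
5.1 Dynamic CI/CD Environments
- Characteristics: a separate Pod is created automatically for every job
- Benefit: avoids polluting the host environment and isolates jobs from each other
- Key settings:
[runners.kubernetes]
cpu_limit = "1"
memory_limit = "2Gi"
service_cpu_limit = "500m"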
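These limits use Kubernetes quantity notation. The suffix `m` means thousandths of a CPU core, so `500m` is half a core. `Gi` is a binary unit of 2^30 bytes, whereas `G` is decimal (10^9). Below is a minimal Python sketch with hypothetical names that converts the common suffixes to plain numbers, for example to check that a request does not exceed its limit. It covers only the suffixes in the table (no `P`/`E`).

```python
from decimal import Decimal

# Binary (Ki, Mi, ...) and decimal (k, M, ...) suffixes; "m" means milli.
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "m": Decimal("0.001"),
}


def parse_quantity(value: str) -> Decimal:
    """Convert a quantity such as "500m", "2Gi" or "1" to a plain number."""
    # Try two-letter suffixes first so "Gi" is never mistaken for "G".
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if value.endswith(suffix):
            return Decimal(value[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(value)
```

For example, `parse_quantity("500m")` gives `Decimal("0.500")` (half a core), and `parse_quantity("2Gi")` gives 2147483648 bytes.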
5.2 Distributing Jobs Across Multiple Clusters
- Approach: configure several Kubernetes executors, one per cluster
- Example:
[[runners]]
name = "cluster-1"
[runners.kubernetes]
host = "https://cluster1-api:6443"
[[runners]]
name = "cluster-2"
[runners.kubernetes]
host = "https://cluster2-api:6443"
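6. Pros and Cons Compared
Aspect | Kubernetes executor | Shell executor |
---|---|---|
Environment isolation | Full isolation (per Pod) | Depends on the host environment |
Resource overhead | Higher (a separate Pod per job) | Very low (shared processes) |
Startup speed | Slower (Pods must be scheduled) | Starts immediately |
Debugging difficulty | Requires kubectl skills | Log in to the host directly |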
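7. Key Considerations
Version compatibility matrix:
- GitLab Runner 15.x requires Kubernetes 1.21+
- Verify compatibility with kubectl version --short (kubectl 1.28 and later removed the --short flag, and plain kubectl version already prints the short form)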
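Security hardening:
PodSecurityPolicy (policy/v1beta1) was removed in Kubernetes 1.25. On older clusters, a hardening PSP should drop capabilities rather than allow them, because allowedCapabilities: ["*"] permits every capability. On 1.25 and later, label the Runner namespace for Pod Security Admission instead:
# Pod Security Admission example (Kubernetes 1.25+)
apiVersion: v1
kind: Namespace
metadata:
  name: gitlab-runner
  labels:
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/warn: restricted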
Resource limits:
[runners.kubernetes]
cpu_request = "500m"
memory_request = "1Gi"
helper_cpu_limit = "200m" # limit for the helper container
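8. Troubleshooting Flowchart
graph TD
A[Connection failed] --> B{Certificate error?}
B -->|Yes| C[Configure the ca_file parameter]
B -->|No| D{Insufficient permissions?}
D -->|Yes| E[Check the RBAC configuration]
D -->|No| F{Network reachable?}
F -->|No| G[Check firewall rules and NetworkPolicies]
F -->|Yes| H[Test the API Server with curl /version]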
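The flowchart can also be applied mechanically to error text. Below is a minimal Python sketch with hypothetical names that follows the same order: certificate, then permissions, then network. The certificate and permission patterns are taken from the error messages quoted in section 1. The network patterns are typical Go network error strings and are an assumption. The list is far from exhaustive, so anything unmatched falls through to the API Server test.

```python
# (substrings to look for, suggested next step), checked in flowchart order.
_CHECKS = [
    (("x509:", "certificate signed by unknown authority"),
     "Certificate error: configure ca_file in [runners.kubernetes]"),
    (("is forbidden", "serviceaccount", "cannot create resource"),
     "Permission error: check the ServiceAccount and RBAC bindings"),
    (("i/o timeout", "connection refused", "no route to host"),
     "Network error: check firewall rules and NetworkPolicies"),
]


def triage(error_text: str) -> str:
    """Return the suggested next troubleshooting step for an error message."""
    lowered = error_text.lower()
    for needles, advice in _CHECKS:
        if any(needle.lower() in lowered for needle in needles):
            return advice
    return "No known pattern: test the API Server with curl /version"
```

Extend `_CHECKS` with messages from your own Runner logs as you encounter them.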
9. Hands-on Debugging Tips
9.1 Manually Verifying the API Connection
# Test the connection with curl
curl --cacert ./ca.crt -H "Authorization: Bearer $TOKEN" https://k8s-api:6443/version
# Expected response (the version fields depend on your cluster)
{
"major": "1",
"minor": "25",
"gitVersion": "v1.25.3"
}
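The `/version` response can be checked against the compatibility requirement from section 7, which states Kubernetes 1.21+ for GitLab Runner 15.x. Below is a minimal Python sketch; the function name `meets_minimum` is hypothetical. It keeps only the leading digits of `major` and `minor`, because some managed distributions report values such as `"25+"`.

```python
import json
import re


def meets_minimum(version_json: str, minimum: tuple[int, int] = (1, 21)) -> bool:
    """Return True if the /version response reports at least `minimum`."""
    info = json.loads(version_json)
    # Keep only the leading digits, e.g. "25+" -> 25.
    major = int(re.match(r"\d+", info["major"]).group())
    minor = int(re.match(r"\d+", info["minor"]).group())
    return (major, minor) >= minimum
```

For the expected response above (major 1, minor 25), the function returns True.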
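9.2 Viewing Detailed Runner Logs
# Start the Runner with debug logging (--debug is a global option and goes before the command)
gitlab-runner --debug run
# Key log line to look for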
DEBU[0001] Trying to connect to Kubernetes cluster... executor=kubernetes
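10. Summary
This article walked through the typical failures seen when connecting GitLab Runner to a Kubernetes cluster. It gave a complete solution along three dimensions: certificate configuration, permission management and network policy. In practice, pay particular attention to version compatibility checks and to the principle of least privilege. A staged approach is recommended: first make sure the command-line tools can reach the cluster, then debug the Runner configuration, and finally apply security hardening.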