# Deployment and Operations Detailed Design

> **Document version**: v1.0
> **Author**: DevOps Engineer
> **Created**: September 11, 2024

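## 1. Architecture Overview

We adopt a cloud-native stack built around containers, with Kubernetes orchestrating the services, to obtain a deployment architecture that is highly available, scalable, and easy to maintain.

- **Cloud provider**: A mainstream vendor such as Tencent Cloud or Alibaba Cloud is recommended, for stable infrastructure and a broad range of managed cloud products.
- **Containerization**: Docker
- **Container orchestration**: Kubernetes (K8s)
- **CI/CD**: GitHub Actions
- **Monitoring and alerting**: Prometheus + Grafana + Alertmanager
- **Log management**: ELK Stack (Elasticsearch, Logstash, Kibana) or Loki
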
## 2. Containerization (Docker)

### 2.1 Backend Service Dockerfile

An example of an optimized, multi-stage `Dockerfile`:

```dockerfile
# ---- Base Stage ----
FROM node:18-alpine AS base
WORKDIR /app
COPY package*.json ./

# ---- Dependencies Stage ----
FROM base AS dependencies
# npm ci installs exactly the versions pinned in package-lock.json
RUN npm ci

# ---- Build Stage ----
FROM dependencies AS build
COPY . .
RUN npm run build

# ---- Production Stage ----
FROM node:18-alpine AS production
WORKDIR /app
COPY package*.json ./
# Install runtime dependencies only, so devDependencies stay out of the final image
RUN npm ci --omit=dev
COPY --from=build /app/dist ./dist

EXPOSE 8080
CMD ["node", "dist/main.js"]
```

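- **Multi-stage build**: Keeps build tooling out of the final image, which contains only the compiled output and the dependencies needed at runtime.
- **`alpine` base image**: Built on lightweight Alpine Linux, which further reduces image size.
- **Cache optimization**: Copying `package*.json` and installing dependencies in their own layers lets Docker reuse the cached layers, so dependencies are reinstalled only when they change.
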
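### 2.2 Image Registry

- **Recommendation**: Use the container registry service offered by the cloud vendor (e.g. Tencent Cloud TCR, Alibaba Cloud ACR).
- **CI/CD integration**: After a successful build, GitHub Actions automatically pushes the Docker image to the designated registry and tags it with a version (e.g. the Git commit hash).
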
## 3. Kubernetes (K8s) Deployment

### 3.1 Deployment Manifests

All K8s resources are defined in YAML files.

#### a) `deployment.yaml` (Backend Service)
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: game-server-deployment
spec:
  replicas: 3 # Initial replica count of 3; scales automatically with load
  selector:
    matchLabels:
      app: game-server
  template:
    metadata:
      labels:
        app: game-server
    spec:
      containers:
        - name: game-server
          image: your-registry.com/game-server:latest # Image address
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: "250m"
              memory: "512Mi"
            limits:
              cpu: "500m"
              memory: "1Gi"
          envFrom:
            - configMapRef:
                name: game-server-config
            - secretRef:
                name: game-server-secrets
          livenessProbe: # Liveness probe
            httpGet:
              path: /api/v1/healthz
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 20
          readinessProbe: # Readiness probe
            httpGet:
              path: /api/v1/healthz
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
```

#### b) `service.yaml` (Service Exposure)
```yaml
apiVersion: v1
kind: Service
metadata:
  name: game-server-service
spec:
  type: LoadBalancer # Expose the service through the cloud vendor's load balancer
  selector:
    app: game-server
  ports:
    - protocol: TCP
      port: 80 # The load balancer listens on port 80
      targetPort: 8080
```

#### c) `hpa.yaml` (Horizontal Pod Autoscaler)
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: game-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: game-server-deployment
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 75 # Scale out when average CPU utilization exceeds 75% of the requested CPU
```

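To make the 75% target concrete: as documented for the Kubernetes HorizontalPodAutoscaler, the controller derives the desired replica count from the ratio of observed to target utilization, where utilization is measured against each container's CPU request (`250m` in the Deployment above). The 90% load figure in the worked example is an illustrative assumption, not a measurement.

$$
\text{desiredReplicas} = \left\lceil \text{currentReplicas} \times \frac{\text{currentUtilization}}{\text{targetUtilization}} \right\rceil
$$

For instance, if the 3 initial replicas average 90% of their CPU request:

$$
\left\lceil 3 \times \frac{90}{75} \right\rceil = \lceil 3.6 \rceil = 4
$$

so the HPA would scale the Deployment to 4 Pods, and never beyond `maxReplicas: 10`.
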
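### 3.2 Configuration and Secret Management

- **ConfigMap**: Stores non-sensitive configuration such as the database address, Redis address, and log level.
- **Secret**: Stores sensitive information such as the database password and JWT secret. Values under `data` must be Base64-encoded; note that Base64 is an encoding, not encryption.
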
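As a rough sketch, the two objects referenced by `envFrom` in the Deployment might look like the following. The object names come from the Deployment manifest; every key and value is a hypothetical placeholder. The Secret here uses `stringData`, which accepts plain text that the API server encodes on write, so this form needs no manual Base64 step.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: game-server-config
data:
  # Hypothetical keys; each one becomes an environment variable via envFrom
  DB_HOST: "db.example.internal"
  REDIS_HOST: "redis.example.internal"
  LOG_LEVEL: "info"
---
apiVersion: v1
kind: Secret
metadata:
  name: game-server-secrets
type: Opaque
stringData:
  # Hypothetical keys with placeholder values; real secrets should never be committed to Git
  DB_PASSWORD: "change-me"
  JWT_SECRET: "change-me"
```
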
## 4. CI/CD (GitHub Actions)

### 4.1 Workflow (`.github/workflows/deploy.yml`)

```yaml
name: Deploy to Production
on:
  push:
    branches:
      - main

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Set up Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '18'

      - name: Install dependencies
        run: npm ci

      - name: Run tests
        run: npm test

      - name: Log in to Docker Registry
        uses: docker/login-action@v2
        with:
          registry: your-registry.com
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}

      - name: Build and push Docker image
        uses: docker/build-push-action@v4
        with:
          context: .
          push: true
          tags: your-registry.com/game-server:${{ github.sha }}

      - name: Set up Kubeconfig
        uses: azure/k8s-set-context@v3
        with:
          method: kubeconfig
          kubeconfig: ${{ secrets.KUBECONFIG }}

      - name: Deploy to Kubernetes
        uses: azure/k8s-deploy@v4
        with:
          action: 'deploy'
          manifests: |
            k8s/deployment.yaml
            k8s/service.yaml
          images: |
            your-registry.com/game-server:${{ github.sha }}
```

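- **Trigger**: Runs automatically whenever code is pushed to the `main` branch.
- **Flow**: Check out code -> install dependencies -> run tests -> log in to the image registry -> build and push the image -> deploy to K8s.
- **Secret management**: All sensitive values (such as passwords and the kubeconfig) are stored in GitHub Secrets.
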
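## 5. Monitoring and Alerting

- **Prometheus**:
  - Deployed into the K8s cluster via `kube-prometheus-stack`.
  - Automatically discovers and scrapes metrics from K8s Pods and Nodes.
  - The backend service must expose a `/metrics` endpoint that serves custom business metrics (e.g. online player count, active game count, API response latency).
- **Grafana**:
  - Provides visual monitoring dashboards.
  - Prebuilt dashboards cover cluster resources, Node.js application performance, and so on.
  - Custom dashboards display the core business metrics.
- **Alertmanager**:
  - Sends notifications by email, DingTalk, WeCom, and similar channels when alert rules defined in Prometheus fire (e.g. high CPU usage, frequent Pod restarts, rising API error rate).
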
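The alert rules mentioned above could be declared as a `PrometheusRule` resource, the custom resource from which the Prometheus Operator bundled in `kube-prometheus-stack` loads rules. This is a minimal sketch: the rule names, thresholds, durations, and the `release: kube-prometheus-stack` label (which has to match whatever rule selector the installation uses) are all assumptions. `kube_pod_container_status_restarts_total` is the restart counter exported by kube-state-metrics, and `container_cpu_usage_seconds_total` comes from cAdvisor.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: game-server-alerts # Hypothetical name
  labels:
    release: kube-prometheus-stack # Assumption: must match the Prometheus ruleSelector
spec:
  groups:
    - name: game-server.rules
      rules:
        - alert: GameServerPodRestartingFrequently
          # More than 3 container restarts within 15 minutes (illustrative threshold)
          expr: increase(kube_pod_container_status_restarts_total{container="game-server"}[15m]) > 3
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "game-server Pod {{ $labels.pod }} is restarting frequently"
        - alert: GameServerHighCpu
          # Average usage above 0.45 cores, i.e. 90% of the 500m limit (illustrative threshold)
          expr: avg(rate(container_cpu_usage_seconds_total{container="game-server"}[5m])) > 0.45
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "game-server average CPU usage is above 90% of its limit"
```
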
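## 6. Log Management

- **Approach**: **Loki + Promtail**
  - **Promtail**: The log agent, deployed on every K8s node; it collects container logs and ships them to Loki.
  - **Loki**: A lightweight log aggregation system that stores the logs and indexes only their labels rather than the full text.
- **Integration**: With Loki configured as a Grafana data source, logs can be queried and analyzed directly in Grafana, alongside the monitoring metrics, which simplifies troubleshooting.
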
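One way to wire up this integration is a Grafana data source provisioning file. A minimal sketch, assuming Loki is reachable inside the cluster at `http://loki:3100` (a hypothetical Service name; 3100 is Loki's default HTTP port):

```yaml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy # Grafana's backend proxies the queries to Loki
    url: http://loki:3100 # Assumption: in-cluster Service name for Loki
```

With the data source in place, a LogQL query such as `{app="game-server"} |= "error"` would list the log lines containing "error", assuming Promtail attaches the Pod's `app` label as a stream label.
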
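## 7. Deployment Strategy (Blue-Green Deployment)

To achieve zero-downtime updates, we use a blue-green deployment strategy.

1. **Current version (Blue)**: `game-server-deployment-blue` is running and serves traffic through `game-server-service`.
2. **Deploy the new version (Green)**:
   - Create a new Deployment, `game-server-deployment-green`, containing the new version of the code.
   - Wait until all Pods in the `green` environment are `Ready`.
3. **Switch traffic**:
   - Change the `selector` of `game-server-service` so that it points at the `green` Pods (`app: game-server-green`).
   - K8s then routes new connections to the new version automatically.
4. **Observation period**:
   - Watch whether the new version runs stably and whether the core metrics look normal.
5. **Retire the old version**:
   - If everything is normal, delete `game-server-deployment-blue`.
   - If problems appear, switch the Service `selector` back to the `blue` environment for a rollback within seconds.

This process can be automated with CI/CD scripts.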
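
The traffic switch in step 3 amounts to a one-field change in the Service manifest. A minimal sketch, reusing the Service from section 3.1 and the `app: game-server-green` label named above; it assumes the blue Pods carry the label `app: game-server-blue`, which the text does not state explicitly:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: game-server-service
spec:
  type: LoadBalancer
  selector:
    app: game-server-green # Set back to game-server-blue (assumed label) to roll back
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
```

Applying this manifest (for example with `kubectl apply -f k8s/service.yaml`) updates the selector in place; the Service object itself stays the same, so only the set of backing Pods changes.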