Deploying a Hugging Face Text Classification Model as a REST API with Docker Compose

1. Business Scenario and Goals

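Companies often need to deploy a pretrained text classification model (such as BERT) quickly as an API service that internal applications can call. For example, a customer service system needs to analyze the sentiment (positive/negative) of user feedback in real time. Goal: use Docker Compose for one-command deployment of a REST API that is highly available, handles concurrent requests, and is easy to version and scale.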

2. Environment Setup (uv + dependencies)

Use uv to manage the Python environment so that dependencies stay consistent.

# Install uv (if it is not installed yet)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create the project directory and initialize it
mkdir text-classification-api && cd text-classification-api
uv init

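# Add the runtime dependencies (this updates pyproject.toml and uv.lock).
# Docker Compose ships with Docker and is not a Python dependency.
uv add fastapi uvicorn transformers torch
# requests is only used by the client script in section 5
uv add --dev requests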

# Sync the environment (uv add already does this, so the step is optional)
uv sync

3. Data Description (real data definition or simulated data generation logic)

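Task type: binary classification (sentiment analysis: positive/negative). A public dataset such as IMDB movie reviews could be used for simulation, but here we directly use the Hugging Face pretrained model distilbert-base-uncased-finetuned-sst-2-english, which has been fine-tuned on SST-2 and suits general English sentiment analysis. The input is a text string; the output is the predicted label (POSITIVE or NEGATIVE) together with its probability.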
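
To make the output format concrete, here is a minimal sketch of how two raw class scores (logits) turn into the label and score the API returns. It assumes the label order 0 = NEGATIVE, 1 = POSITIVE and uses made-up logits; it mirrors the softmax step a text-classification pipeline applies for a two-label model and does not call the model itself.

```python
import math

# Assumed label order (0 = NEGATIVE, 1 = POSITIVE); check the model's id2label mapping.
LABELS = ["NEGATIVE", "POSITIVE"]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_prediction(logits):
    """Pick the most probable class and report its probability as the score."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": LABELS[best], "score": probs[best]}


# Made-up logits, for illustration only
example = to_prediction([-2.0, 3.0])
print(example)
```

Because only the winning class is reported, the probability of the other class is 1 minus the score.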

4. Training/Implementation Steps (complete code)

Training is skipped; the pretrained model is deployed directly. Create the FastAPI application and the Docker configuration.

app/main.py

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification
import torch
import logging

# Initialize the FastAPI application
app = FastAPI(title="Text Classification API", version="1.0")

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Load the model and tokenizer once at import time so they are not reloaded on every request
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
try:
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
    logger.info(f"Model {model_name} loaded successfully.")
except Exception as e:
    logger.error(f"Failed to load model: {e}")
    raise

# Define the request/response models
class TextRequest(BaseModel):
    text: str

class PredictionResponse(BaseModel):
    label: str
    score: float

@app.get("/health")
async def health_check():
    return {"status": "healthy"}

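# Plain def (not async def): FastAPI runs sync endpoints in a thread pool, so the
# blocking model call does not stall the event loop.
@app.post("/predict", response_model=PredictionResponse)
def predict(request: TextRequest):
    """
    Predict the sentiment (positive/negative) of a text.
    """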
    if not request.text.strip():
        raise HTTPException(status_code=400, detail="Text cannot be empty.")

    try:
        # Run the prediction through the pipeline
        result = classifier(request.text, truncation=True, max_length=512)
        # Example result: [{'label': 'POSITIVE', 'score': 0.9998}]
        prediction = result[0]
        return PredictionResponse(label=prediction['label'], score=prediction['score'])
    except Exception as e:
        logger.error(f"Prediction error: {e}")
        raise HTTPException(status_code=500, detail="Internal server error during prediction.")

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)

Dockerfile

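# Use the Python 3.9 slim image to keep the image small.
# The tag must satisfy requires-python in pyproject.toml; adjust it if uv init chose a newer version.
FROM python:3.9-slim

# Set the working directory
WORKDIR /app

# Copy the dependency manifest and the lockfile (uv sync --frozen needs uv.lock)
COPY pyproject.toml uv.lock ./

# Install uv with pip, then install the locked dependencies into /app/.venv.
# The packages used here ship prebuilt wheels on common platforms, so no compiler is installed.
RUN pip install --no-cache-dir uv \
    && uv sync --frozen --no-dev --no-cache \
    && pip uninstall -y uv

# Put the virtual environment first on PATH so that uvicorn resolves to it
ENV PATH="/app/.venv/bin:$PATH"

# Copy the application code
COPY app/ ./app/

# Expose the port
EXPOSE 8000

# Start command
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]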

docker-compose.yml

version: '3.8'

services:
  text-classification-api:
    build: .
    container_name: huggingface-text-classifier
    ports:
      - "8000:8000"
    environment:
      - HF_HOME=/app/model-cache  # Hugging Face cache root (replaces the deprecated TRANSFORMERS_CACHE)
    volumes:
      - model-cache:/app/model-cache  # Persist the model cache to avoid repeated downloads
    restart: unless-stopped
    healthcheck:
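      # The slim Python image does not ship curl, so probe with the Python standard library instead
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"]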
      interval: 30s
      timeout: 10s
      retries: 3

volumes:
  model-cache:
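
The healthcheck above retries a probe at a fixed interval. A client script that starts right after the containers may want the same behavior before sending traffic. The following is a minimal sketch using only the standard library; the function names probe_health and wait_until_healthy are hypothetical, and the defaults simply copy the interval, timeout, and retries values from the Compose file.

```python
import time
import urllib.error
import urllib.request


def probe_health(url="http://localhost:8000/health", timeout=10.0):
    """Return True if the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response
        return False


def wait_until_healthy(probe=probe_health, retries=3, interval=30.0, sleep=time.sleep):
    """Call probe up to `retries` times, sleeping `interval` seconds after each failure."""
    for attempt in range(1, retries + 1):
        if probe():
            return True
        if attempt < retries:
            sleep(interval)
    return False


# Typical use after starting the stack (blocks until healthy or out of retries):
# if not wait_until_healthy():
#     raise SystemExit("API did not become healthy")
```

The probe and sleep functions are passed in as parameters so the retry logic can be exercised without a running container.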

5. Usage (offline batch + single-request example)

Single-request example (using curl)

# Start the service (Compose V2 syntax; with the legacy standalone binary, run docker-compose instead)
cd text-classification-api
docker compose up --build

# Call the API from another terminal
curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{"text": "I love this product, it works perfectly!"}'

# Expected response (the exact score may differ slightly)
# {"label":"POSITIVE","score":0.9998}

Offline batch calls (Python script example)

import requests
import json

# API endpoint
url = "http://localhost:8000/predict"

# Batch of texts
texts = [
    "This movie is terrible.",
    "Amazing experience, highly recommend!",
    "It's okay, not great but not bad."
]

# Send the requests one at a time (the timeout keeps a stalled call from hanging the script)
for text in texts:
    response = requests.post(url, json={"text": text}, timeout=30)
    if response.status_code == 200:
        result = response.json()
        print(f"Text: {text} -> Label: {result['label']}, Score: {result['score']}")
    else:
        print(f"Error for text '{text}': {response.status_code}")
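
The loop above sends one request at a time. Since the goal includes concurrent requests, a batch client could also issue several calls in parallel. The sketch below is one possible way to do that with the standard library only; post_text and classify_many are hypothetical helper names, and the worker count of 4 is an arbitrary assumption rather than a tuned value.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

API_URL = "http://localhost:8000/predict"  # same endpoint as in the script above


def post_text(text, url=API_URL, timeout=30.0):
    """POST one text to the API and return the decoded JSON response."""
    payload = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def classify_many(texts, post=post_text, max_workers=4):
    """Classify texts in parallel; returns (text, result, error) tuples in input order."""

    def safe_call(text):
        try:
            return text, post(text), None
        except Exception as exc:  # keep going when a single request fails
            return text, None, exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(safe_call, texts))


# Example use against a running service:
# for text, result, error in classify_many(["This movie is terrible.", "Amazing experience!"]):
#     print(text, result if error is None else f"failed: {error}")
```

Client-side parallelism only helps as long as the server has spare capacity; on a single CPU-bound worker the requests still queue up behind each other.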

6. Metrics

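Task type: binary classification.

AUC (Area Under the ROC Curve): ranges from 0 to 1; 0.5 corresponds to random guessing, and values closer to 1 are better. It measures how well the model separates the positive and negative classes across all thresholds, which is useful when classes are imbalanced (for example, when negative reviews are rare).
F1 Score: the harmonic mean of precision and recall, ranging from 0 to 1. Use it when false positives and false negatives both matter (for example, a customer service system should neither flag feedback as negative by mistake nor miss genuinely negative feedback).
Accuracy: the proportion of correct predictions. It is simple but can mislead on imbalanced data. This example uses a pretrained model whose metrics were already evaluated during its original training (for example, about 91% accuracy on SST-2), so deployment focuses more on API performance.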
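
As an illustration of the three metrics, the sketch below computes them in plain Python from a tiny hypothetical labeled sample. One detail matters for AUC: the API returns the score of the predicted label, so it has to be converted into the probability of the positive class first. The sample data, the 0.5 threshold, and the helper names are assumptions made for this example.

```python
def positive_probability(label, score):
    """Convert the API's (label, score) pair into P(positive)."""
    return score if label == "POSITIVE" else 1.0 - score


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, pos_scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, pos_scores) if t == 1]
    neg = [s for t, s in zip(y_true, pos_scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical ground truth (1 = positive) and API outputs for six texts
y_true = [1, 0, 1, 1, 0, 0]
api_outputs = [
    ("POSITIVE", 0.9), ("NEGATIVE", 0.8), ("POSITIVE", 0.6),
    ("NEGATIVE", 0.6), ("POSITIVE", 0.7), ("NEGATIVE", 0.9),
]
p_pos = [positive_probability(label, score) for label, score in api_outputs]
y_pred = [1 if p >= 0.5 else 0 for p in p_pos]

metrics = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, p_pos)
print(metrics, auc)
```

The pairwise AUC is quadratic in the sample size, which is fine for a small weekly test set; a library implementation would be the usual choice for larger data.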

7. Post-Launch Evaluation (offline monitoring, online metrics, retraining triggers)

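Offline monitoring: periodically (for example, weekly) test the API against labeled data, compute AUC/F1, and raise an alert if either drops by more than 5% from the baseline.
Online metrics: monitor API latency (P95 < 500 ms), error rate (< 1%), and throughput (QPS), using Prometheus + Grafana.
Retraining triggers: 1) offline metrics keep declining; 2) the business data distribution drifts (for example, text from a new domain); 3) the model is updated (Hugging Face publishes a new version).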
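
The thresholds above can be expressed as a few small checks. The sketch below is only an illustration in plain Python: it assumes the 5% drop is measured relative to the baseline, uses a nearest-rank percentile, and the function names are hypothetical. In practice these rules would more likely live in Prometheus alerting rules than in application code.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile for q in (0, 100]."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def check_service(latencies_ms, n_errors, n_requests, p95_limit_ms=500.0, error_limit=0.01):
    """Return the names of the online targets that are currently violated."""
    alerts = []
    if percentile(latencies_ms, 95) >= p95_limit_ms:
        alerts.append("p95_latency")
    if n_requests and n_errors / n_requests >= error_limit:
        alerts.append("error_rate")
    return alerts


def metric_dropped(baseline, current, max_relative_drop=0.05):
    """True if an offline metric fell by more than the allowed share of its baseline."""
    return (baseline - current) / baseline > max_relative_drop


# Hypothetical numbers: 100 requests with latencies of 1..100 ms and 2 errors
online_alerts = check_service(list(range(1, 101)), n_errors=2, n_requests=100)
f1_alert = metric_dropped(baseline=0.90, current=0.84)
print(online_alerts, f1_alert)
```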

8. Common Pitfalls and Troubleshooting

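Docker image too large: use a slim base image and clean up package caches in the same RUN layer that created them (for example, apt-get autoremove); a multi-stage build can reduce the size further.
Slow model loading: the first start downloads the model (about 250 MB); persist the cache with a volume to avoid repeated downloads.
Insufficient concurrency: FastAPI is asynchronous by default, but model inference is bound by the CPU/GPU. Options: use a GPU, batch requests, or increase the number of workers (for example, uvicorn --workers 4). Each worker process loads its own copy of the model, so memory use grows with the worker count.
Out of memory: long texts or high concurrency can cause OOM. Limit the input length (the code already sets max_length=512 tokens) and monitor the container's memory usage.
Version management chaos: the model name distilbert-base-uncased-finetuned-sst-2-english alone does not pin a version. Pin a specific revision (a commit hash or tag) through the revision argument of from_pretrained, and use tags to distinguish image versions.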
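
For the batching option mentioned above, a rough sketch: a transformers pipeline accepts a list of strings and returns one result per input, so texts can be grouped into fixed-size chunks and passed in one call per chunk. The code below uses a stub in place of the real classifier so that it runs without the model; classify_in_batches, chunked, and the batch size of 8 are assumptions for illustration.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def classify_in_batches(texts, classifier, batch_size=8, **kwargs):
    """Call `classifier` once per chunk and return the results in input order."""
    results = []
    for batch in chunked(texts, batch_size):
        results.extend(classifier(batch, **kwargs))
    return results


def stub_classifier(batch, **kwargs):
    """Stand-in for the real pipeline: one {'label', 'score'} dict per input text."""
    return [
        {"label": "POSITIVE" if "good" in text else "NEGATIVE", "score": 1.0}
        for text in batch
    ]


sample = ["good service", "bad delivery", "good price", "bad support", "good overall"]
batched_results = classify_in_batches(sample, stub_classifier, batch_size=2)
print(batched_results)
```

With the real pipeline, the same keyword arguments as in app/main.py (truncation=True, max_length=512) would be forwarded through kwargs.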