[LLM in Practice] Deploying a BGE-Reranker-base Rerank Service
1. Download the BGE-reranker-base model
Download the model weight files from ModelScope (https://www.modelscope.cn/models/BAAI/bge-reranker-base/summary):
https://www.modelscope.cn/models/BAAI/bge-reranker-base/files
Model files:
total 2.1G
-rw-r--r-- 1 root root 799 May 26 10:26 config.json
-rw-r--r-- 1 root root 77 May 26 10:26 configuration.json
-rw-r--r-- 1 root root 1.1G May 26 10:28 model.safetensors
drwxr-xr-x 2 root root 4.0K May 26 10:28 onnx
-rw-r--r-- 1 root root 1.1G May 26 10:28 pytorch_model.bin
-rw-r--r-- 1 root root 34K May 26 10:26 README.md
-rw-r--r-- 1 root root 4.9M May 26 10:26 sentencepiece.bpe.model
-rw-r--r-- 1 root root 279 May 26 10:26 special_tokens_map.json
-rw-r--r-- 1 root root 443 May 26 10:26 tokenizer_config.json
-rw-r--r-- 1 root root 17M May 26 10:26 tokenizer.json
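If you prefer scripting the download instead of pulling the files by hand, the ModelScope Python SDK can fetch the weights. The following is a minimal sketch, assuming modelscope is pip-installed; /data/model is the example host path that gets mounted into the container in section 2.6, and the exact on-disk layout may vary with the modelscope version:

# download_model.py: pull bge-reranker-base via the ModelScope SDK
from modelscope import snapshot_download

model_dir = snapshot_download("BAAI/bge-reranker-base", cache_dir="/data/model")
print(model_dir)  # expected to land under /data/model/BAAI/bge-reranker-base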
2. Build the image locally
2.1 build_image.sh
#!/bin/bash
registry=example.cn/enterprise
version=${1:-0.0.1}     # image version
publish=${2:-false}     # whether to push to the image registry

docker build -f ./Dockerfile -t ${registry}/rerank:${version} .
docker tag ${registry}/rerank:${version} ${registry}/rerank:latest

if [ "${publish}" = "true" ]; then
    docker push ${registry}/rerank:${version}
fi
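Usage example: bash build_image.sh 0.0.2 true builds the image tagged 0.0.2 (also retagging it as latest) and pushes the versioned tag to the registry; running the script with no arguments builds 0.0.1 locally without pushing.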
2.2 Dockerfile
FROM swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/pytorch/pytorch:2.1.2-cuda12.1-cudnn8-devel

# install pip
RUN apt-get update && \
    apt-get install -y --no-install-recommends python3-pip && \
    rm -rf /var/lib/apt/lists/*

RUN pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple

RUN apt-get update && \
    apt-get install -y --no-install-recommends vim && \
    rm -rf /var/lib/apt/lists/*

# project directory
WORKDIR /app

# copy project files
ADD src /app/src
ADD *.sh /app/
ADD README.md /app/
ADD pyproject.toml /app/

# install the project
RUN pip install build && \
    pip install .

# run the service
ENTRYPOINT ["bash", "/app/start_service.sh"]
2.3 pyproject.toml
[project]
name = "rerank_service"
version = "0.0.1"
description = "rerank_service"
readme = "README.md"
requires-python = ">=3.10"
dependencies = [
    "fastapi>=0.115.0",
    "uvicorn>=0.31.0",
    "FlagEmbedding>=1.2.10",
    "peft==0.13.2",
    "transformers==4.41.1",
]
A version-compatibility error came up here earlier:
cannot import name 'EncoderDecoderCache' from 'transformers'
The root cause is a mismatch between the peft and transformers package versions; installing the following versions resolves it:
peft==0.13.2
transformers==4.41.1
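To confirm the image actually picked up the pinned versions, a quick check from inside the container (a small sketch, nothing more):

# check_versions.py: verify the pinned peft/transformers versions are installed
import peft
import transformers

print("peft:", peft.__version__)                  # expect 0.13.2
print("transformers:", transformers.__version__)  # expect 4.41.1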
2.4 start_service.sh
#!/bin/bash
PORT=${1:-7099}
WORKERS=${2:-1}
cmd="uvicorn src.service:app --host 0.0.0.0 --port $PORT --workers $WORKERS"
echo "Running command: $cmd"
eval "$cmd"
2.5 src/service.py
import os

import uvicorn
from fastapi import FastAPI, Request
from loguru import logger
from pydantic import BaseModel, Field


def load_model(model_path):
    logger.info(f"loading model from: {model_path}")
    from FlagEmbedding import FlagReranker
    reranker = FlagReranker(model_path, use_fp16=True)
    return reranker


# load the reranker once at startup; MODEL_HOME is injected via the container environment
MODEL_HOME = os.environ["MODEL_HOME"]
MODEL = load_model(f"{MODEL_HOME}/BAAI/bge-reranker-base")

app = FastAPI()


class Response(BaseModel):
    code: int = Field(description="return code, 200 means success", default=200)
    msg: str = Field(description="return message", default="success")
    data: dict | list | BaseModel = Field(description="return data", default={})

    class Config:
        schema_extra = {
            "example": {
                "code": 200,
                "msg": "success",
            }
        }


@app.get("/health")
def health() -> Response:
    resp = Response(data={"status": "OK"})
    return resp


class Req(BaseModel):
    text1: str
    text2: str


# score a single (text1, text2) pair passed as query parameters
@app.post("/get_sim_score/")
def get_sim_score(text1: str, text2: str):
    logger.info(f"text1={text1}, text2={text2}")
    score = MODEL.compute_score([text1, text2])
    if isinstance(score, list):
        score = score[0]
    logger.info(f"score={score}")
    return Response(data={"score": score})


# score a batch of [query, passage] pairs passed in the JSON body
@app.post("/get_sim_score_v1")
async def get_sim_score_v1(request: Request):
    json_post = await request.json()
    data = {}
    try:
        if 'rerank_pairs' in json_post:
            contents = json_post['rerank_pairs']
            logger.info(f"rerank_pairs={contents}")
            score = MODEL.compute_score(contents)
            logger.info(f"score={score}")
            return Response(data={"score": score})
        else:
            return Response(data=data, code=400, msg="rerank_pairs not found")
    except Exception as e:
        return Response(data=data, code=500, msg=str(e))


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=7099)
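Before baking the service into an image, the reranker itself can be sanity-checked in a plain Python session (a minimal sketch; the model path is an example matching the MODEL_HOME layout used above):

# rerank_smoke_test.py: load the reranker directly and score a few pairs
from FlagEmbedding import FlagReranker

reranker = FlagReranker("/data/model/BAAI/bge-reranker-base", use_fp16=True)

# a single (query, passage) pair: depending on the FlagEmbedding version this
# returns a float or a one-element list, which is why the service handles both
print(reranker.compute_score(["ball", "tennis"]))

# a batch of pairs returns a list of scores, one per pair
print(reranker.compute_score([["ball", "tennis"], ["ball", "weather"]]))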
2.6 Run the container
server_name='rerank_service'
docker stop $server_name
docker rm $server_name

# run the container
sudo docker run -itd --restart=always --name $server_name \
--gpus \"device=1\" \
--cap-add SYS_TIME \
--shm-size 32g -p 7099:7099 \
-v /data/model:/model \
-v /etc/localtime:/etc/localtime:ro \
-v /etc/timezone:/etc/timezone:ro \
-e MODEL_HOME=/model \
example.cn/enterprise/rerank:latest 7099 1

# view the logs
docker logs -f --tail 10 $server_name
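Once the container is up, a quick health check such as curl http://localhost:7099/health should return a JSON body with code 200, msg "success" and data {"status": "OK"}, confirming the model loaded and the service is accepting requests.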
3. Test scripts
3.1 get_sim_score
curl --location --request POST 'http://IP:7099/get_sim_score/?text1=ball&text2=tennis'
3.2 get_sim_score_v1
curl --location --request POST 'http://IP:7099/get_sim_score_v1' \
--header 'Content-Type: application/json' \
--data-raw '{"rerank_pairs": [["ball", "tennis"]]
}'
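The same batch call can be made from Python with the requests library (a sketch; replace IP with the host running the container):

# test_rerank.py: call the batch endpoint with requests
import requests

resp = requests.post(
    "http://IP:7099/get_sim_score_v1",
    json={"rerank_pairs": [["ball", "tennis"], ["ball", "weather"]]},
    timeout=30,
)
print(resp.json())  # {"code": 200, "msg": "success", "data": {"score": [...]}}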