
[LLM in Practice] BGE-Rerank-base Reranking Service Deployment Tutorial

1. Download the BGE-reranker-base model

Download the model weight files from ModelScope (https://www.modelscope.cn/models/BAAI/bge-reranker-base/summary):

https://www.modelscope.cn/models/BAAI/bge-reranker-base/files

Model file listing:

total 2.1G
-rw-r--r-- 1 root root  799 May 26 10:26 config.json
-rw-r--r-- 1 root root   77 May 26 10:26 configuration.json
-rw-r--r-- 1 root root 1.1G May 26 10:28 model.safetensors
drwxr-xr-x 2 root root 4.0K May 26 10:28 onnx
-rw-r--r-- 1 root root 1.1G May 26 10:28 pytorch_model.bin
-rw-r--r-- 1 root root  34K May 26 10:26 README.md
-rw-r--r-- 1 root root 4.9M May 26 10:26 sentencepiece.bpe.model
-rw-r--r-- 1 root root  279 May 26 10:26 special_tokens_map.json
-rw-r--r-- 1 root root  443 May 26 10:26 tokenizer_config.json
-rw-r--r-- 1 root root  17M May 26 10:26 tokenizer.json
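
If you prefer to script the download, the ModelScope Python SDK can fetch the whole repository in one call. The sketch below is illustrative, assuming the SDK is installed (pip install modelscope); the /data/model target directory is an assumption chosen to match the volume mounted in section 2.6.

# Download BAAI/bge-reranker-base with the ModelScope SDK.
# cache_dir is an assumption: the weights typically end up under
# <cache_dir>/BAAI/bge-reranker-base -- check the returned path.
from modelscope import snapshot_download

local_dir = snapshot_download(
    "BAAI/bge-reranker-base",
    cache_dir="/data/model",
)
print(local_dir)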

2. Build the image locally

2.1 build_image.sh

registry=example.cn/enterprise
version=${1:-0.0.1}   # image version
publish=${2:-false}   # whether to push to the image registry

docker build -f ./Dockerfile -t ${registry}/rerank:${version} .
docker tag ${registry}/rerank:${version} ${registry}/rerank:latest

if [ "${publish}" = "true" ]; then
    docker push ${registry}/rerank:${version}
fi

2.2 Dockerfile

FROM swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/pytorch/pytorch:2.1.2-cuda12.1-cudnn8-devel

# Install pip
RUN apt-get update && \
    apt-get install -y --no-install-recommends python3-pip && \
    rm -rf /var/lib/apt/lists/*
RUN pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
RUN apt-get update && \
    apt-get install -y --no-install-recommends vim && \
    rm -rf /var/lib/apt/lists/*

# Create the project directory
WORKDIR /app

# Copy project files
ADD src /app/src
ADD *.sh /app/
ADD README.md /app/
ADD pyproject.toml /app/

# Install the project
RUN pip install build && \
    pip install .

# Run the service
ENTRYPOINT ["bash", "/app/start_service.sh"]

2.3 pyproject.toml

[project]
name = "rerank_service"
version = "0.0.1"
description = "rerank_service"
readme = "README.md"
requires-python = ">=3.10"
dependencies = [
    "fastapi>=0.115.0",
    "uvicorn>=0.31.0",
    "FlagEmbedding>=1.2.10",
    "loguru",  # used by src/service.py for logging
    "peft==0.13.2",
    "transformers==4.41.1",
]

A version-compatibility error came up here earlier:

cannot import name 'EncoderDecoderCache' from 'transformers'

It is caused by a mismatch between the peft and transformers package versions; installing the following versions resolves it:

 peft==0.13.2
 transformers==4.41.1

2.4 start_service.sh

#!/bin/bash
PORT=${1:-7099}
WORKERS=${2:-1}
cmd="uvicorn src.service:app --host 0.0.0.0 --port $PORT --workers $WORKERS"
echo "Running command: $cmd"
eval "$cmd"

2.5 src/service.py

import os

import uvicorn
from fastapi import FastAPI, Request
from loguru import logger
from pydantic import BaseModel, Field


def load_model(model_path):
    logger.info(f"loading model from: {model_path}")
    from FlagEmbedding import FlagReranker
    reranker = FlagReranker(model_path, use_fp16=True)
    return reranker


MODEL_HOME = os.environ["MODEL_HOME"]
MODEL = load_model(f"{MODEL_HOME}/BAAI/bge-reranker-base")

app = FastAPI()


class Response(BaseModel):
    code: int = Field(description="status code, 200 means success", default=200)
    msg: str = Field(description="status message", default="success")
    data: dict | list | BaseModel = Field(description="response data", default={})

    class Config:
        schema_extra = {
            "example": {
                "code": 200,
                "msg": "success",
            }
        }


@app.get("/health")
def health() -> Response:
    resp = Response(data={"status": "OK"})
    return resp


class Req(BaseModel):
    text1: str
    text2: str


@app.post("/get_sim_score/")
def get_sim_score(text1: str, text2: str):
    logger.info(f"text1={text1}, text2={text2}")
    score = MODEL.compute_score([text1, text2])
    if isinstance(score, list):
        score = score[0]
    logger.info(f"score={score}")
    return Response(data={"score": score})


@app.post("/get_sim_score_v1")
async def get_sim_score_v1(request: Request):
    json_post = await request.json()
    data = {}
    try:
        if "rerank_pairs" in json_post:
            contents = json_post["rerank_pairs"]
            logger.info(f"rerank_pairs={contents}")
            score = MODEL.compute_score(contents)
            logger.info(f"score={score}")
            return Response(data={"score": score})
        else:
            return Response(data=data, code=400, msg="rerank_pairs not found")
    except Exception as e:
        return Response(data=data, code=500, msg=str(e))


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=7099)
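
Before containerizing, it can be worth sanity-checking FlagReranker directly against the downloaded weights. A minimal sketch follows; the local model path is an assumption matching section 1, and the exact score values will vary.

# Standalone check of the reranker outside the FastAPI service.
# compute_score takes a single [query, passage] pair or a list of pairs;
# raw scores are unbounded relevance logits -- higher means more relevant.
from FlagEmbedding import FlagReranker

reranker = FlagReranker("/data/model/BAAI/bge-reranker-base", use_fp16=True)

single = reranker.compute_score(["what is tennis?", "Tennis is a racket sport."])
batch = reranker.compute_score([
    ["what is tennis?", "Tennis is a racket sport."],
    ["what is tennis?", "Pandas eat bamboo."],
])
print(single)  # one float
print(batch)   # list of floats, one per pair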

2.6 Run the container

server_name='rerank_service'
docker stop $server_name
docker rm $server_name

# Run the container
sudo docker run -itd --restart=always --name $server_name \
--gpus \"device=1\" \
--cap-add SYS_TIME \
--shm-size 32g -p 7099:7099 \
-v /data/model:/model \
-v /etc/localtime:/etc/localtime:ro \
-v /etc/timezone:/etc/timezone:ro \
-e MODEL_HOME=/model \
example.cn/enterprise/rerank:latest 7099 1

# Follow the logs
docker logs -f --tail 10 $server_name
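
Once the container is running, the /health endpoint defined in service.py gives a quick readiness check (the model takes a moment to load after startup). A small sketch using the requests package; the localhost address assumes you are testing from the same host.

# Poll /health until the service answers, then print the response.
import time

import requests

url = "http://127.0.0.1:7099/health"
for _ in range(30):
    try:
        resp = requests.get(url, timeout=3)
        print(resp.status_code, resp.json())
        break
    except requests.RequestException:
        time.sleep(2)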

3. Test scripts

3.1 get_sim_score

curl --location --request POST 'http://IP:7099/get_sim_score/?text1=ball&text2=tennis'

3.2 get_sim_score_v1

curl --location --request POST 'http://IP:7099/get_sim_score_v1' \
--header 'Content-Type: application/json' \
--data-raw '{
    "rerank_pairs": [["ball", "tennis"]]
}'
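
The same tests can also be driven from Python. A sketch assuming the requests package is available and the service is reachable at IP:7099 (keep IP as a placeholder for the actual host):

import requests

base = "http://IP:7099"

# 3.1 equivalent: query-parameter endpoint, one text pair per call
r1 = requests.post(f"{base}/get_sim_score/", params={"text1": "ball", "text2": "tennis"})
print(r1.json())  # {"code": 200, "msg": "success", "data": {"score": ...}}

# 3.2 equivalent: JSON body carrying a batch of [query, passage] pairs
r2 = requests.post(f"{base}/get_sim_score_v1", json={"rerank_pairs": [["ball", "tennis"]]})
print(r2.json())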

