
MNN, a Lightweight High-Performance Inference Engine: Study Notes 02. Main MNN APIs

ds 2025/7/2 9:45:34

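1. Main MNN APIs

Note: these study notes cover only the APIs I used most often while learning MNN. For the rest of the MNN API, see the official documentation.

1.1. Inference workflow

Create an Interpreter: createFromFile()
Create a Session from the Interpreter: createSession()
Set the input data: getSessionInput(), map(), unmap(), copyFromHostTensor()
Run inference with the Session: runSession()
Get the inference results: getSessionOutput(), map(), unmap(), copyToHostTensor()
Release the Interpreter: delete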

1.2. Interpreter

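MNN inference uses two levels of abstraction: the Interpreter and the Session. The Interpreter holds the model data; a Session is created from an Interpreter and holds the inference data. Multiple inferences can share one model, that is, multiple Sessions can share a single Interpreter.

Once the Sessions have been created, and you will neither create further Sessions nor update the model's trained data, the Interpreter can free the model data with releaseModel to save memory.

1.2.1. Creating an Interpreter

Creating from a file on disk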

/**
 * @brief create net from file.
 * @param file  given file.
 * @return created net if success, NULL otherwise.
 */
static Interpreter* createFromFile(const char* file);

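The Interpreter instance returned by this function is allocated with new. Be sure to release it with delete once it is no longer needed, otherwise it leaks memory.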
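Because the caller owns the returned pointer, one way to avoid a forgotten delete is to hand it to std::unique_ptr straight away. The sketch below is only an illustration of that ownership pattern: FakeInterpreter and loadModel are hypothetical names, not part of MNN, and the stand-in class copies nothing but the "static factory returns a new-allocated object or NULL" shape of createFromFile so that the example builds without the library.

```cpp
#include <memory>

// Hypothetical stand-in for MNN::Interpreter; only the ownership shape matters here.
class FakeInterpreter {
public:
    // Mimics createFromFile: returns a heap object the caller must delete,
    // or nullptr on failure (here: a null or empty path).
    static FakeInterpreter* createFromFile(const char* file) {
        if (file == nullptr || file[0] == '\0') {
            return nullptr;
        }
        return new FakeInterpreter();
    }

    ~FakeInterpreter() { --liveCount(); }

    // Number of instances currently alive, used only to observe the release.
    static int& liveCount() {
        static int count = 0;
        return count;
    }

private:
    FakeInterpreter() { ++liveCount(); }
};

// Wraps the raw pointer so that delete runs automatically at scope exit.
inline std::unique_ptr<FakeInterpreter> loadModel(const char* file) {
    return std::unique_ptr<FakeInterpreter>(FakeInterpreter::createFromFile(file));
}
```

With the real library the same idea would presumably be std::unique_ptr<MNN::Interpreter> wrapped around the result of MNN::Interpreter::createFromFile, on the assumption that a plain delete is the intended way to release it, as these notes state.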

1.3. Session

A Session is normally created with Interpreter::createSession:

/**
 * @brief create session with schedule config. created session will be managed in net.
 * @param config session schedule config.
 * @return created session if success, NULL otherwise.
 */
Session* createSession(const ScheduleConfig& config);

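The returned Session instance is managed by the Interpreter and is released when the Interpreter is destroyed, so it usually needs no attention. You can also call Interpreter::releaseSession once it is no longer needed to reduce memory usage.

Creating a Session is generally slow, while a Session can be reused across many inferences, so create it once and reuse it.

1.4. ScheduleConfig

Simple mode: no extra scheduling configuration is needed; the function works out the scheduling path, inputs, and outputs from the model structure. For example: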

ScheduleConfig conf;
Session* session = interpreter->createSession(conf); // create the Session

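In this mode, inference runs on the CPU.

Advanced mode: set the scheduling configuration explicitly through the fields of ScheduleConfig: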

/** session schedule config */
struct ScheduleConfig {
    /** which tensor should be kept */
    std::vector<std::string> saveTensors;
    /** forward type */
    MNNForwardType type = MNN_FORWARD_CPU;
    /** CPU:number of threads in parallel , Or GPU: mode setting*/
    union {
        int numThread = 4;
        int mode;
    };

    /** subpath to run */
    struct Path {
        std::vector<std::string> inputs;
        std::vector<std::string> outputs;

        enum Mode {
            /**
             * Op Mode
             * - inputs means the source op, can NOT be empty.
             * - outputs means the sink op, can be empty.
             * The path will start from source op, then flow when encounter the sink op.
             * The sink op will not be compute in this path.
             */
            Op = 0,

            /**
             * Tensor Mode
             * - inputs means the inputs tensors, can NOT be empty.
             * - outputs means the outputs tensors, can NOT be empty.
             * It will find the pipeline that compute outputs from inputs.
             */
            Tensor = 1
        };

        /** running mode */
        Mode mode = Op;
    };
    Path path;

    /** backup backend used to create execution when designated backend do NOT support any op */
    MNNForwardType backupType = MNN_FORWARD_CPU;

    /** extra backend config */
    BackendConfig* backendConfig = nullptr;
};

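At inference time the primary backend is selected by type and defaults to CPU. If the model contains operators the primary backend does not support, those operators run on the fallback backend given by backupType.

The inference path consists of all operators on the way from path.inputs to path.outputs; when left unspecified, it is identified automatically from the model structure. To save memory, MNN reuses the memory of every tensor other than the outputs. To keep the result of an intermediate tensor, list it in saveTensors so that its memory is not reused.

For CPU inference, numThread adjusts the concurrency and thread count. numThread sets the degree of concurrency, but the actual thread count and parallel efficiency do not depend on numThread alone:

On iOS, the thread count is decided by the system's GCD;
With MNN_USE_THREAD_POOL enabled, the thread count is taken from the first configured numThread that is greater than 1;
With OpenMP, the thread count is a global setting, and the actual count is taken from the most recently configured numThread.

For GPU inference, mode selects several GPU runtime options (currently OpenCL only). The GPU mode values are: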

typedef enum {
    // choose one tuning mode Only
    MNN_GPU_TUNING_NONE   = 1 << 0, /* Forbidden tuning, performance not good */
    MNN_GPU_TUNING_HEAVY  = 1 << 1, /* heavily tuning, usually not suggested */
    MNN_GPU_TUNING_WIDE   = 1 << 2, /* widely tuning, performance good. Default */
    MNN_GPU_TUNING_NORMAL = 1 << 3, /* normal tuning, performance may be ok */
    MNN_GPU_TUNING_FAST   = 1 << 4, /* fast tuning, performance may not good */

    // choose one opencl memory mode Only
    /* User can try OpenCL_MEMORY_BUFFER and OpenCL_MEMORY_IMAGE both, then choose the better one according to performance*/
    MNN_GPU_MEMORY_BUFFER = 1 << 6, /* User assign mode */
    MNN_GPU_MEMORY_IMAGE  = 1 << 7, /* User assign mode */
} MNNGpuMode;

Currently the tuning level and the GPU memory mode can be set freely by the user. For example:

MNN::ScheduleConfig config;
config.mode = MNN_GPU_TUNING_NORMAL | MNN_GPU_MEMORY_IMAGE;

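The higher the tuning level, the longer the first initialization takes and the better the inference performance. If long initialization is a concern, choose MNN_GPU_TUNING_FAST or MNN_GPU_TUNING_NONE; you can also combine this with the cache mechanism described below, so that only the first run is slow. For GPU memory, you can specify MNN_GPU_MEMORY_BUFFER or MNN_GPU_MEMORY_IMAGE, whichever performs better. If neither is set, the framework picks one using its default heuristic, which is not guaranteed to give the best performance.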
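Since the tuning and memory options are bit flags combined with |, the "choose one of each group" rule can be checked with a little bit arithmetic. The following is a minimal sketch under stated assumptions: the flag values are copied from the MNNGpuMode listing into a local enum so that the code builds without MNN, and DemoGpuMode, kTuningMask, kMemoryMask, atMostOneBit and isValidGpuMode are hypothetical names that do not exist in the library.

```cpp
#include <cstdint>

// Local copy of the flag values from the MNNGpuMode listing,
// so the sketch builds without the MNN headers.
enum DemoGpuMode : std::uint32_t {
    TUNING_NONE   = 1u << 0,
    TUNING_HEAVY  = 1u << 1,
    TUNING_WIDE   = 1u << 2,
    TUNING_NORMAL = 1u << 3,
    TUNING_FAST   = 1u << 4,
    MEMORY_BUFFER = 1u << 6,
    MEMORY_IMAGE  = 1u << 7,
};

constexpr std::uint32_t kTuningMask =
    TUNING_NONE | TUNING_HEAVY | TUNING_WIDE | TUNING_NORMAL | TUNING_FAST;
constexpr std::uint32_t kMemoryMask = MEMORY_BUFFER | MEMORY_IMAGE;

// True when at most one bit of `bits` is set (zero bits also counts).
constexpr bool atMostOneBit(std::uint32_t bits) {
    return (bits & (bits - 1)) == 0;
}

// Checks the "one tuning mode only, one memory mode only" rule.
// An empty group is accepted, since the framework then picks a default.
constexpr bool isValidGpuMode(std::uint32_t mode) {
    return atMostOneBit(mode & kTuningMask) && atMostOneBit(mode & kMemoryMask);
}
```

For instance, TUNING_NORMAL | MEMORY_IMAGE passes the check, while TUNING_FAST | TUNING_HEAVY does not, because it sets two bits in the tuning group.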

The CPU numThread and the GPU mode are members of a union and share the same memory. Set only one of numThread and mode; do not set both.
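The effect of the union can be observed with a cut-down struct that keeps only the two overlapping members. This is an illustrative sketch, not MNN code: DemoConfig and sharesStorage are hypothetical names, and the struct mirrors just the anonymous union from the ScheduleConfig listing.

```cpp
// Hypothetical miniature of ScheduleConfig that keeps only the anonymous union.
struct DemoConfig {
    union {
        int numThread = 4;  // meaningful for CPU backends
        int mode;           // meaningful for GPU backends
    };
};

// The two members overlap, so the whole struct is exactly one int wide.
static_assert(sizeof(DemoConfig) == sizeof(int), "numThread and mode share one int");

// Returns true when the two members occupy the same storage.
inline bool sharesStorage(const DemoConfig& cfg) {
    return static_cast<const void*>(&cfg.numThread) ==
           static_cast<const void*>(&cfg.mode);
}
```

Because both names refer to the same storage, assigning mode after numThread overwrites the thread count, which is why only one of the two should be set for a given config.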

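To address slow GPU initialization, MNN provides a cache mechanism: later runs can load the cache directly to speed up initialization.

For usage, see how setCacheFile sets up the cache in tools/cpp/MNNV2Basic.cpp.
When the model has a finite set of input sizes, call updateCacheFile after each resizeSession to update the cache file.
When the input size varies arbitrarily, set config.mode to 1 (that is, MNN_GPU_TUNING_NONE) to turn tuning off.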

1.5. Input data

1.5.1. Getting the input tensor

/**
 * @brief get input tensor for given name.
 * @param session   given session.
 * @param name      given name. if NULL, return first input.
 * @return tensor if found, NULL otherwise.
 */
Tensor* getSessionInput(const Session* session, const char* name);

/**
 * @brief get all input tensors.
 * @param session   given session.
 * @return all input tensors mapped with name.
 */
const std::map<std::string, Tensor*>& getSessionInputAll(const Session* session) const;

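Interpreter provides two methods for getting input tensors: getSessionInput returns a single input tensor, and getSessionInputAll returns the map of input tensors by name.

When there is only one input tensor, pass NULL to getSessionInput to get it.

1.5.2. [Recommended] Filling data through mapping

Map the input Tensor's memory; some backends can then avoid a data copy.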

auto input = interpreter->getSessionInput(session, NULL);
void* host = input->map(MNN::Tensor::MAP_TENSOR_WRITE, input->getDimensionType());
// fill host memory data
input->unmap(MNN::Tensor::MAP_TENSOR_WRITE,  input->getDimensionType(), host);

1.5.3. [Not recommended] Filling data by copying

NCHW example, for models converted from ONNX / Caffe / TorchScript:

auto inputTensor = interpreter->getSessionInput(session, NULL);
auto nchwTensor = new Tensor(inputTensor, Tensor::CAFFE);
// nchwTensor->host<float>()[x] = ...
inputTensor->copyFromHostTensor(nchwTensor);
delete nchwTensor;

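With this copy-based approach, you only need to care about the data layout of the tensor you created yourself; copyFromHostTensor handles any layout conversion and any copy between backends that is needed.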
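Caring about the layout of your own tensor matters in practice because decoded images usually arrive as interleaved HWC data, whereas a Tensor::CAFFE host tensor expects NCHW. The following is a minimal sketch of that rearrangement using plain std::vector, independent of MNN; hwcToChw is a hypothetical helper name, and it assumes a single image (N = 1) that is already in float.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Converts one interleaved HWC image into planar CHW order (NCHW with N = 1).
std::vector<float> hwcToChw(const std::vector<float>& hwc,
                            std::size_t height, std::size_t width,
                            std::size_t channels) {
    assert(hwc.size() == height * width * channels);
    std::vector<float> chw(hwc.size());
    for (std::size_t h = 0; h < height; ++h) {
        for (std::size_t w = 0; w < width; ++w) {
            for (std::size_t c = 0; c < channels; ++c) {
                // HWC offset: (h * W + w) * C + c
                // CHW offset: (c * H + h) * W + w
                chw[(c * height + h) * width + w] =
                    hwc[(h * width + w) * channels + c];
            }
        }
    }
    return chw;
}
```

The resulting buffer could then be copied into nchwTensor->host<float>() before calling copyFromHostTensor, provided its element count matches the tensor's shape.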

1.6. Running the session

In MNN, Interpreter provides three interfaces for running a Session, but the simple run is enough for the vast majority of scenarios.

1.6.1. Simple run

/**
 * @brief run session.
 * @param session   given session.
 * @return result of running.
 */
ErrorCode runSession(Session* session) const;

1.7. Getting the output tensor

/**
 * @brief get output tensor for given name.
 * @param session   given session.
 * @param name      given name. if NULL, return first output.
 * @return tensor if found, NULL otherwise.
 */
Tensor* getSessionOutput(const Session* session, const char* name);

/**
 * @brief get all output tensors.
 * @param session   given session.
 * @return all output tensors mapped with name.
 */
const std::map<std::string, Tensor*>& getSessionOutputAll(const Session* session) const;

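Interpreter provides two methods for getting output tensors: getSessionOutput returns a single output tensor, and getSessionOutputAll returns the map of output tensors by name.

When there is only one output tensor, pass NULL to getSessionOutput to get it.

1.7.1. [Recommended] Reading output data through mapping

Map the output Tensor's memory; some backends can then avoid a data copy.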

auto outputTensor = interpreter->getSessionOutput(session, NULL);
void* host = outputTensor->map(MNN::Tensor::MAP_TENSOR_READ,  outputTensor->getDimensionType());
// use host memory by yourself
outputTensor->unmap(MNN::Tensor::MAP_TENSOR_READ,  outputTensor->getDimensionType(), host);

1.7.2. [Not recommended] Reading output data by copying

NCHW example, for models converted from ONNX / Caffe / TorchScript:

auto outputTensor = interpreter->getSessionOutput(session, NULL);
auto nchwTensor = new Tensor(outputTensor, Tensor::CAFFE);
outputTensor->copyToHostTensor(nchwTensor);
auto score = nchwTensor->host<float>()[0];
auto index = nchwTensor->host<float>()[1];
// ...
delete nchwTensor;

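With this copy-based approach, you only need to care about the data layout of the tensor you created yourself; copyToHostTensor handles any layout conversion and any copy between backends that is needed.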
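What happens to the host data afterwards depends on the model. The snippet above reads elements 0 and 1 as a score and an index; for a model whose output is instead a flat vector of class scores (an assumption made for this sketch, not something these notes specify), a typical next step is an argmax. The helper name argmax is hypothetical and the code does not depend on MNN.

```cpp
#include <cassert>
#include <cstddef>

// Returns the index of the largest value in scores[0..count).
// On ties, the first occurrence wins.
std::size_t argmax(const float* scores, std::size_t count) {
    assert(scores != nullptr && count > 0);
    std::size_t best = 0;
    for (std::size_t i = 1; i < count; ++i) {
        if (scores[i] > scores[best]) {
            best = i;
        }
    }
    return best;
}
```

With a host tensor this might be called on nchwTensor->host<float>() and the tensor's element count, before the tensor is deleted.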

enum

MNNForwardType

The default is MNN_FORWARD_CPU = 0, meaning inference runs on the CPU backend.

typedef enum {
    MNN_FORWARD_CPU = 0,

    /*
     Firstly find the first available backends not equal to CPU
     If no other backends, use cpu
     */
    MNN_FORWARD_AUTO = 4,

    /*Hand write metal*/
    MNN_FORWARD_METAL = 1,

    /*NVIDIA GPU API*/
    MNN_FORWARD_CUDA = 2,

    /*Android / Common Device GPU API*/
    MNN_FORWARD_OPENCL = 3,
    MNN_FORWARD_OPENGL = 6,
    MNN_FORWARD_VULKAN = 7,

    /*Android 8.1's NNAPI or CoreML for ios*/
    MNN_FORWARD_NN = 5,

    /*User can use API from Backend.hpp to add or search Backend*/
    MNN_FORWARD_USER_0 = 8,
    MNN_FORWARD_USER_1 = 9,
    MNN_FORWARD_USER_2 = 10,
    MNN_FORWARD_USER_3 = 11,

    MNN_FORWARD_ALL = 12,

    /* Apply arm extension instruction set to accelerate some Ops, this forward type
       is only used in MNN internal, and will be active automatically when user set forward type
       to be MNN_FORWARD_CPU and extension instruction set is valid on hardware.
    */
    MNN_FORWARD_CPU_EXTENSION = 13,

    // use for shared memory on android device
    MNN_MEMORY_AHARDWAREBUFFER = 14
} MNNForwardType;