ATC Model Conversion: Dynamic Shape Problem Cases

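ATC (Ascend Tensor Compiler) is the model conversion tool in the CANN heterogeneous computing architecture. It converts network models from open-source frameworks, as well as single-operator description files defined in IR, into offline models supported by Ascend AI processors. During conversion, ATC performs operator scheduling optimization, weight data rearrangement, memory usage optimization, and similar steps. This further tunes the original deep learning model so that it meets the high-performance requirements of deployment and runs efficiently on Ascend AI processors.
This article walks through several typical dynamic shape problems in ATC model conversion, with a cause analysis and a fix for each: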
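01 The original network model has an unfixed dimension in its shape, and no shape information is set at conversion time
02 Dynamic batch / dynamic resolution / dynamic dimension scenarios: only one profile is set, and conversion fails
03 With the dynamic batch parameter, axes other than N are also set to -1, and conversion fails
04 With the dynamic resolution parameter, axes other than H and W are also set to -1, and conversion fails

01 The original network model has an unfixed dimension in its shape, and no shape information is set at conversion time
Problem description
Obtain the original network model and run the following command to convert it: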
atc --model=./resnet_shape.pb --framework=3 --output=./out/resnet_shape --soc_version=Ascend310
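The error message is as follows:
ATC run failed, Please check the detail log, Try 'atc --help' for more information
E10001: Value [-1] for parameter [Inputs] is invalid. Reason: maybe you should set input_shape to specify its shape
Solution: Try again with a valid argument.
Cause analysis
The input shape of the original model contains an unfixed dimension value, "-1", and the conversion command did not assign a value to it.
Solution
Assign a value to the unfixed dimension at conversion time, in one of the following ways.
Set a fixed shape with --input_shape: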
atc --model=./resnet_shape.pb --framework=3 --output=./out/resnet_shape --soc_version=Ascend310 --input_shape="Inputs:1,224,224,3"
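Combine --input_shape with the dynamic batch parameter (--dynamic_batch_size), so that the converted model can process several different numbers of images per inference. For example: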
atc --model=./resnet_shape.pb --framework=3 --output=./out/resnet_shape --soc_version=Ascend310 --input_shape="Inputs:-1,224,224,3" --dynamic_batch_size="1,2,4,8"
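The offline model converted this way can process 1, 2, 4, or 8 images per inference, so there is no need to run four separate conversions.
Set the dimension to a range at conversion time. For example: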
atc --model=./resnet_shape.pb --framework=3 --output=./out/resnet_shape --soc_version=Ascend910 --input_shape="Inputs:1~10,224,224,3"

The offline model converted this way can process any number of images from 1 to 10 per inference.
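The E10001 failure above can be caught before atc is invoked. The following is a minimal sketch of such a pre-check, not part of ATC itself: the function name has_unset_dim is hypothetical, and it assumes a single input written in the "name:d0,d1,..." form used in this article.

```shell
# Hypothetical helper (not an ATC feature): succeeds (exit status 0) if any
# dimension in an --input_shape value such as "Inputs:-1,224,224,3" is -1.
# Assumes one input in the "name:d0,d1,..." form.
has_unset_dim() {
    dims=${1##*:}              # drop the "name:" prefix, keep "d0,d1,..."
    case ",$dims," in
        *,-1,*) return 0 ;;    # at least one dimension is still -1
        *)      return 1 ;;    # every dimension is a fixed value or a range
    esac
}
```

A wrapper script could call has_unset_dim on the intended --input_shape value and, when it succeeds, refuse to run atc unless a dynamic parameter such as --dynamic_batch_size is also supplied.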
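02 Dynamic batch / dynamic resolution / dynamic dimension scenarios: only one profile is set, and conversion fails
Problem description
This class of problem is illustrated here with the --dynamic_batch_size parameter.
When converting a model with ATC, --dynamic_batch_size is used to produce a model that supports multiple batch sizes. A sample conversion command: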

atc --model=./resnet50_tensorflow_1.7.pb --input_shape="Placeholder:-1,224,224,3" --dynamic_batch_size="2" --soc_version=Ascend310 --output=./out/test --framework=3

The error message is as follows:

ATC run failed, Please check the detail log, Try 'atc --help' for more information
E10035: [--dynamic_batch_size], [--dynamic_image_size], or [--dynamic_dims] has [1] profiles, which is less than the minimum ([2]).
Solution: Ensure that the number of profiles configured in [--dynamic_batch_size], [--dynamic_image_size], or [--dynamic_dims] is at least the minimum.
TraceBack (most recent call last):
[GraphOpt][Prepare] Failed to run multi-dims-process for graph[test].[FUNC:OptimizeAfterGraphNormalization][FILE:fe_graph_optimizer.cc][LINE:639]
Call OptimizeAfterGraphNormalization failed, engine_name:AIcoreEngine, graph_name:test[FUNC:OptimizeAfterGraphNormalization][FILE:graph_optimize.cc][LINE:224]
build graph failed, graph id:0, ret:1343225857[FUNC:BuildModelWithGraphId][FILE:ge_generator.cc][LINE:1656]

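Cause analysis
When converting a model with ATC using any of the dynamic shape parameters --dynamic_batch_size, --dynamic_image_size, or --dynamic_dims, the number of profiles must fall in the range (1, 100]. That is, at least 2 profiles must be set, and at most 100 are supported.
The conversion command above sets only one profile, which does not meet this requirement.
Solution
Set the profile information again with at least 2 profiles, separated by ASCII commas. The corrected command: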

atc --model=./resnet50_tensorflow_1.7.pb --input_shape="Placeholder:-1,224,224,3" --dynamic_batch_size="2,4" --soc_version=Ascend310 --output=./out/test --framework=3
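
The profile-count rule can also be checked before conversion. The sketch below is an illustration under stated assumptions, not ATC functionality: count_batch_profiles and profiles_in_range are hypothetical names, and the counting applies only to the comma-separated format of --dynamic_batch_size.

```shell
# Hypothetical helpers (not part of ATC) for a --dynamic_batch_size value.

# Print the number of comma-separated profiles, e.g. "2,4" prints 2.
count_batch_profiles() {
    old_ifs=$IFS
    IFS=,
    set -- $1                  # split the value on commas into $1, $2, ...
    IFS=$old_ifs
    echo "$#"
}

# Succeed only if the profile count lies in (1, 100], i.e. 2 to 100.
profiles_in_range() {
    n=$(count_batch_profiles "$1")
    [ "$n" -ge 2 ] && [ "$n" -le 100 ]
}
```

With these definitions, profiles_in_range "2" fails, which mirrors the E10035 case above, while profiles_in_range "2,4" succeeds.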

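03 With the dynamic batch parameter, axes other than N are also set to -1, and conversion fails
Problem description
When converting a model with ATC, --dynamic_batch_size is used to produce a model that supports multiple batch sizes. A sample conversion command: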

atc --model=./resnet50_tensorflow_1.7.pb --input_shape="Placeholder:-1,-1,-1,3" --dynamic_batch_size="2,4,8" --soc_version=Ascend310 --output=./out/test --framework=3

The error message is as follows:


ATC run failed, Please check the detail log, Try 'atc --help' for more information
E10018: Value [-1] for shape [1] is invalid. When [--dynamic_batch_size] is included, only batch size N can be –1 in [--input_shape].
Possible Cause: When [--dynamic_batch_size] is included, only batch size N can be –1 in the shape.
Solution: Try again with a valid [--input_shape] argument. Make sure that non-batch size axes are not –1.
TraceBack (most recent call last):
[--dynamic_batch_size] is included, but none of the nodes specified in [--input_shape] have a batch size equaling –1.

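Cause analysis
When converting a model with ATC using --dynamic_batch_size, only N in the shape may be set to "-1", and N must be the first axis of the shape. In other words, only the first position of the shape is set to "-1". If N is not the first axis, use the --dynamic_dims parameter instead.
The conversion command above sets N, H, and W all to "-1", which does not meet this requirement.
Solution
Set the parameters again so that only N in the shape is "-1". The corrected command: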

atc --model=./resnet50_tensorflow_1.7.pb --input_shape="Placeholder:-1,224,224,3" --dynamic_batch_size="2,4,8" --soc_version=Ascend310 --output=./out/test --framework=3
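
The constraint behind E10018 can be expressed as a small pre-check. This is a sketch only: batch_shape_ok is a hypothetical name, and it assumes a single input in the "name:d0,d1,..." form with N as the first axis, as in the commands above.

```shell
# Hypothetical pre-check (not part of ATC) for use with --dynamic_batch_size:
# succeeds only if the first axis is -1 and no other axis is -1.
batch_shape_ok() {
    dims=${1##*:}                      # "Placeholder:-1,224,224,3" -> "-1,224,224,3"
    first=${dims%%,*}                  # value of the first axis
    case $dims in
        *,*) rest=${dims#*,} ;;        # the remaining axes
        *)   rest= ;;                  # one-dimensional shape: nothing remains
    esac
    [ "$first" = "-1" ] || return 1    # the batch axis must be dynamic
    case ",$rest," in
        *,-1,*) return 1 ;;            # a non-batch axis is -1: rejected
    esac
    return 0
}
```

Here batch_shape_ok "Placeholder:-1,-1,-1,3" fails, matching the failing command, while batch_shape_ok "Placeholder:-1,224,224,3" succeeds.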

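04 With the dynamic resolution parameter, axes other than H and W are also set to -1, and conversion fails
Problem description
When converting a model with ATC, --dynamic_image_size is used to produce a model that supports multiple resolutions. A sample conversion command: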

atc --model=./resnet50_tensorflow_1.7.pb --input_shape="Placeholder:-1,-1,-1,3" --dynamic_image_size="448,448;224,224" --soc_version=Ascend310 --output=./out/test --framework=3

The error message is as follows:

ATC run failed, Please check the detail log, Try 'atc --help' for more information
E10019: When [--dynamic_image_size] is included, only the height and width axes can be –1 in [--input_shape].
Possible Cause: When [--dynamic_image_size] is included, only the height and width axes can be –1 in the shape.
Solution: Try again with a valid [--input_shape] argument. Make sure that axes other than height and width are not –1.

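Cause analysis
When converting a model with ATC using --dynamic_image_size, only H and W in the shape may be set to "-1", and only the NCHW and NHWC formats are supported. In other scenarios, set the resolution with the --dynamic_dims parameter.
The conversion command above sets N, H, and W all to "-1", which does not meet this requirement.
Solution
Set the parameters again so that only H and W in the shape are "-1". The corrected command: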

atc --model=./resnet50_tensorflow_1.7.pb --input_shape="Placeholder:1,-1,-1,3" --dynamic_image_size="448,448;224,224" --soc_version=Ascend310 --output=./out/test --framework=3
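
The E10019 rule can be sketched as a pre-check in the same style. The function name image_shape_ok and its optional layout argument are hypothetical. The sketch assumes a single 4-D input in NHWC (the default here) or NCHW layout, and it assumes that both H and W are marked -1, as in the corrected command.

```shell
# Hypothetical pre-check (not part of ATC) for use with --dynamic_image_size:
# succeeds only if H and W are -1 and N and C are not.
# Usage: image_shape_ok "name:d0,d1,d2,d3" [NHWC|NCHW]
image_shape_ok() {
    dims=${1##*:}                  # strip the "name:" prefix
    layout=${2:-NHWC}              # read the layout before "set --" overwrites $2
    old_ifs=$IFS
    IFS=,
    set -- $dims                   # split the four axes into $1..$4
    IFS=$old_ifs
    [ "$#" -eq 4 ] || return 1     # only 4-D shapes are handled in this sketch
    case $layout in
        NHWC) n=$1 h=$2 w=$3 c=$4 ;;
        NCHW) n=$1 c=$2 h=$3 w=$4 ;;
        *)    return 1 ;;
    esac
    [ "$h" = "-1" ] && [ "$w" = "-1" ] && [ "$n" != "-1" ] && [ "$c" != "-1" ]
}
```

With these assumptions, image_shape_ok "Placeholder:-1,-1,-1,3" fails and image_shape_ok "Placeholder:1,-1,-1,3" succeeds, in line with the two commands in this case.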