The Road to AI Deployment | Model Selection, Local Deployment, Server Deployment, and Model Conversion, the Full Stack! (Part 3)


A personal suggestion: if you want a model you trained to be easier for others to use, or if you plan to open-source it as a contribution, consider writing a similar [model card]() to introduce your model.
Back to the main topic. The first thing to do once you get a model is, of course, to look at its structure. Inspecting the structure mainly tells you which architecture the model is based on and which framework it was trained with, so you have a rough picture of it and know what to expect.
In general, you want to look at things like the model's structure and the types of ops it uses; an experienced algorithm engineer can usually tell at a glance, from the structure and op types alone, whether a model is going to be a pain to deploy (a quick inspection sketch follows below).
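For example, with a model exported to ONNX, a few lines of Python are enough for a first look. This is a minimal sketch assuming a hypothetical local file `model.onnx`; it prints the input/output shapes and an op-type histogram (rare or custom ops are the usual deployment red flags):

```python
# A minimal sketch, assuming the model has already been exported to ONNX
# and saved as "model.onnx" (a hypothetical path).
from collections import Counter

import onnx

model = onnx.load("model.onnx")

# Inputs/outputs: names and (possibly symbolic) shapes.
for tensor in list(model.graph.input) + list(model.graph.output):
    dims = [d.dim_value or d.dim_param or "?"
            for d in tensor.type.tensor_type.shape.dim]
    print(tensor.name, dims)

# Op-type histogram: exotic or custom ops are where deployment usually hurts.
op_counts = Counter(node.op_type for node in model.graph.node)
for op_type, count in op_counts.most_common():
    print(f"{op_type}: {count}")
```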
Evaluation falls into two parts. The first is the model's accuracy metrics. There are plenty of common ones, but I won't list them all here: different tasks are evaluated with different metrics, and you get the idea (a toy example of one such metric follows below).
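Just to ground the idea, here is a purely illustrative example of one such metric, top-1 classification accuracy; detection would use mAP, segmentation mIoU, and so on. The helper below is hypothetical, not from any particular library:

```python
# A toy illustration of one accuracy metric (top-1 classification accuracy).
import numpy as np

def top1_accuracy(logits: np.ndarray, labels: np.ndarray) -> float:
    """logits: (N, num_classes) scores; labels: (N,) ground-truth class ids."""
    return float((logits.argmax(axis=1) == labels).mean())

logits = np.array([[0.1, 0.9, 0.0, 0.0],   # predicts class 1, label 1: hit
                   [0.8, 0.1, 0.1, 0.0],   # predicts class 0, label 0: hit
                   [0.2, 0.2, 0.5, 0.1]])  # predicts class 2, label 3: miss
labels = np.array([1, 0, 3])
print(top1_accuracy(logits, labels))  # 0.666...
```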
The second part is the model's performance. Once you've confirmed that accuracy meets the requirements, the next step is to look at a few performance metrics. The relevant concepts:
Latency refers to the amount of time it takes for the model to make a prediction after receiving an input. In other words, it's the time between the input arriving and the prediction coming out. Latency is an important metric for real-time applications where fast responses are required, such as in self-driving cars or robotics.

Throughput refers to the number of predictions that a model can make in a given amount of time. It's measured in predictions per second (PPS). Throughput is an important metric for large-scale applications where the model needs to process a large amount of data, such as in image classification or recommendation systems.
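To make these two numbers concrete, here is a minimal, framework-agnostic sketch of how latency percentiles and throughput are usually measured. `infer` and `sample` are hypothetical stand-ins for your model's forward call and a prepared input:

```python
# A minimal sketch of measuring latency and throughput for any Python
# inference callable (ONNX Runtime session, TorchScript module, etc.).
import time
import statistics

def benchmark(infer, sample, warmup: int = 10, runs: int = 100) -> None:
    for _ in range(warmup):          # warm up caches, JIT, GPU clocks
        infer(sample)
    latencies_ms = []
    t_start = time.perf_counter()
    for _ in range(runs):
        t0 = time.perf_counter()
        infer(sample)
        latencies_ms.append((time.perf_counter() - t0) * 1000.0)
    wall = time.perf_counter() - t_start
    latencies_ms.sort()
    print(f"mean latency  : {statistics.mean(latencies_ms):.3f} ms")
    print(f"median latency: {latencies_ms[len(latencies_ms) // 2]:.3f} ms")
    print(f"p99 latency   : {latencies_ms[min(runs - 1, int(runs * 0.99))]:.3f} ms")
    print(f"throughput    : {runs / wall:.1f} inferences/sec")
```

One caveat: with asynchronous GPU frameworks the callable must synchronize (e.g. `torch.cuda.synchronize()` in PyTorch) before each timestamp is taken, otherwise you are timing kernel launches rather than the actual compute.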
Here's a concrete example: the benchmarking output of TensorRT's trtexec tool (produced by a command along the lines of `trtexec --loadEngine=model.plan`; exact flags vary by version):
```
[01/08/2023-10:47:32] [I] Average on 10 runs - GPU latency: 4.45354 ms - Host latency: 5.05334 ms (enqueue 1.61294 ms)
[01/08/2023-10:47:32] [I] Average on 10 runs - GPU latency: 4.46018 ms - Host latency: 5.06121 ms (enqueue 1.61682 ms)
[01/08/2023-10:47:32] [I] Average on 10 runs - GPU latency: 4.47092 ms - Host latency: 5.07136 ms (enqueue 1.61714 ms)
[01/08/2023-10:47:32] [I] Average on 10 runs - GPU latency: 4.48318 ms - Host latency: 5.08337 ms (enqueue 1.61753 ms)
[01/08/2023-10:47:32] [I] Average on 10 runs - GPU latency: 4.49258 ms - Host latency: 5.09268 ms (enqueue 1.61719 ms)
[01/08/2023-10:47:32] [I] Average on 10 runs - GPU latency: 4.50391 ms - Host latency: 5.10193 ms (enqueue 1.43665 ms)
[01/08/2023-10:47:32] [I]
[01/08/2023-10:47:32] [I] === Performance summary ===
[01/08/2023-10:47:32] [I] Throughput: 223.62 qps
[01/08/2023-10:47:32] [I] Latency: min = 5.00714 ms, max = 5.33484 ms, mean = 5.06326 ms, median = 5.05469 ms, percentile(90%) = 5.10596 ms, percentile(95%) = 5.12622 ms, percentile(99%) = 5.32755 ms
[01/08/2023-10:47:32] [I] Enqueue Time: min = 0.324463 ms, max = 1.77274 ms, mean = 1.61379 ms, median = 1.61826 ms, percentile(90%) = 1.64294 ms, percentile(95%) = 1.65076 ms, percentile(99%) = 1.66541 ms
[01/08/2023-10:47:32] [I] H2D Latency: min = 0.569824 ms, max = 0.60498 ms, mean = 0.58749 ms, median = 0.587158 ms, percentile(90%) = 0.591064 ms, percentile(95%) = 0.592346 ms, percentile(99%) = 0.599182 ms
[01/08/2023-10:47:32] [I] GPU Compute Time: min = 4.40759 ms, max = 4.73703 ms, mean = 4.46331 ms, median = 4.45447 ms, percentile(90%) = 4.50464 ms, percentile(95%) = 4.5282 ms, percentile(99%) = 4.72678 ms
[01/08/2023-10:47:32] [I] D2H Latency: min = 0.00585938 ms, max = 0.0175781 ms, mean = 0.0124573 ms, median = 0.0124512 ms, percentile(90%) = 0.013855 ms, percentile(95%) = 0.0141602 ms, percentile(99%) = 0.0152588 ms
[01/08/2023-10:47:32] [I] Total Host Walltime: 3.01404 s
[01/08/2023-10:47:32] [I] Total GPU Compute Time: 3.00827 s
[01/08/2023-10:47:32] [W] * GPU compute time is unstable, with coefficient of variance = 1.11717%.
```
There are also many different scenarios and ways of doing this kind of testing. Besides local offline benchmarking, load testing the cloud-deployed service is needed too. Here's a load-test example against Triton Inference Server (the output below is from Triton's perf_analyzer tool):
```
*** Measurement Settings ***
  Batch size: 1
  Service Kind: Triton
  Using "time_windows" mode for stabilization
  Measurement window: 45000 msec
  Using synchronous calls for inference
  Stabilizing using average latency

Request concurrency: 1
  Client:
    Request count: 32518
    Throughput: 200.723 infer/sec
    Avg latency: 4981 usec (standard deviation 204 usec)
    p50 latency: 4952 usec
    p90 latency: 5044 usec
    p95 latency: 5236 usec
    p99 latency: 5441 usec
    Avg HTTP time: 4978 usec (send/recv 193 usec + response wait 4785 usec)
  Server:
    Inference count: 32518
    Execution count: 32518
    Successful request count: 32518
    Avg request latency: 3951 usec (overhead 46 usec + queue 31 usec + compute 3874 usec)
  Composing models:
  centernet, version:
      Inference count: 32518
      Execution count: 32518
      Successful request count: 32518
      Avg request latency: 3512 usec (overhead 14 usec + queue 11 usec + compute input 93 usec + compute infer 3370 usec + compute output 23 usec)
  centernet-postprocess, version:
      Inference count: 32518
      Execution count: 32518
      Successful request count: 32518
      Avg request latency: 96 usec (overhead 15 usec + queue 8 usec + compute input 7 usec + compute infer 63 usec + compute output 2 usec)
  image-preprocess, version:
      Inference count: 32518
      Execution count: 32518
      Successful request count: 32518
      Avg request latency: 340 usec (overhead 14 usec + queue 12 usec + compute input 234 usec + compute infer 79 usec + compute output 0 usec)
  Server Prometheus Metrics:
    Avg GPU Utilization:
      GPU-6d31bfa8-5c82-a4ec-9598-ce41ea72b7d2 : 70.2407%
    Avg GPU Power Usage:
      GPU-6d31bfa8-5c82-a4ec-9598-ce41ea72b7d2 : 264.458 watts
    Max GPU Memory Usage:
      GPU-6d31bfa8-5c82-a4ec-9598-ce41ea72b7d2 : 1945108480 bytes
    Total GPU Memory:
      GPU-6d31bfa8-5c82-a4ec-9598-ce41ea72b7d2 : 10504634368 bytes
Inferences/Second vs. Client Average Batch Latency
Concurrency: 1, throughput: 200.723 infer/sec, latency 4981 usec
```
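For completeness, here's a minimal sketch of what a single client request to such a Triton deployment looks like in Python, using the official `tritonclient` package. The server address, tensor names, and input shape below are hypothetical and must match your model's config.pbtxt; only the model name `centernet` is taken from the log above:

```python
# A minimal sketch of one HTTP inference request to a running Triton server,
# assuming it listens on localhost:8000. Tensor names "input"/"output" and
# the 1x3x512x512 shape are hypothetical placeholders.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

image = np.random.rand(1, 3, 512, 512).astype(np.float32)  # dummy NCHW input
inp = httpclient.InferInput("input", list(image.shape), "FP32")
inp.set_data_from_numpy(image)

result = client.infer(model_name="centernet", inputs=[inp])
print(result.as_numpy("output").shape)
```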