Performance Benchmarks
💎 Key Points
All measurements were taken in the following test environment: PyTorch 2.7.0, Python 3.12.4, CUDA 12.6.
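You can check how closely your own machine matches this environment by printing the relevant versions with a few lines of Python. This is a minimal sketch. `describe_environment` is a hypothetical helper name, and it relies only on standard PyTorch attributes (`torch.__version__`, `torch.version.cuda`, `torch.cuda.get_device_name`). `torch.version.cuda` reports the CUDA version PyTorch was built against, which can differ from the version supported by the installed driver.

```python
import platform


def describe_environment() -> dict:
    """Collect Python/PyTorch/CUDA versions and visible GPU names (hypothetical helper)."""
    info = {"python": platform.python_version(), "torch": None, "cuda": None, "gpus": []}
    try:
        import torch  # optional dependency; the rest is skipped if PyTorch is not installed
    except ImportError:
        return info
    info["torch"] = torch.__version__
    info["cuda"] = torch.version.cuda  # CUDA version PyTorch was built with (None on CPU builds)
    if torch.cuda.is_available():
        info["gpus"] = [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```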
Test Results
4090-48G performance data
Iteration 0, 4454.28 images/s in 0.588s (7 runs, 374 images/run).
Iteration 1, 4455.73 images/s in 0.588s (7 runs, 374 images/run).
Iteration 2, 4452.13 images/s in 0.588s (7 runs, 374 images/run).
Iteration 3, 4451.93 images/s in 0.588s (7 runs, 374 images/run).
Iteration 4, 4449.36 images/s in 0.588s (7 runs, 374 images/run).
Summary - ResNet50 float16 (performance mode):
Batch size: 374
Repeats per iter: 7
Peak memory: 1.98GB (4.2% of total VRAM)
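Each log line reports one throughput measurement. An iteration runs the model a fixed number of times on a fixed batch, and images/s equals runs × images/run divided by the elapsed time. For Iteration 0 above, 7 × 374 / 0.588 s ≈ 4452 images/s, which matches the logged 4454.28 once you allow for the elapsed time being rounded to three decimals. The harness below is a hypothetical sketch that produces lines in the same format. It is not the script that produced these logs (that script is not shown here), and it does not model how "performance mode" picks the batch size or the repeat count.

```python
import time
from typing import Callable, List


def benchmark(step: Callable[[], object], batch_size: int, repeats: int,
              iterations: int = 5, sync: Callable[[], object] = lambda: None) -> List[float]:
    """Time `repeats` calls of `step` per iteration and return images/s for each iteration.

    `sync` should block until queued GPU work has finished (for example
    torch.cuda.synchronize); the default does nothing, which is only correct
    for synchronous CPU workloads.
    """
    throughputs = []
    for i in range(iterations):
        sync()  # keep earlier queued work out of this timing window
        start = time.perf_counter()
        for _ in range(repeats):
            step()
        sync()  # wait for the last batch to actually finish
        elapsed = time.perf_counter() - start
        ips = repeats * batch_size / elapsed
        throughputs.append(ips)
        print(f"Iteration {i}, {ips:.2f} images/s in {elapsed:.3f}s "
              f"({repeats} runs, {batch_size} images/run).")
    return throughputs


if __name__ == "__main__":
    # Stand-in CPU workload so the sketch runs anywhere.
    benchmark(step=lambda: sum(range(10_000)), batch_size=374, repeats=7)
```

With PyTorch, `step` would wrap one forward pass (or training step) on a pre-allocated batch, and `sync` would be `torch.cuda.synchronize`. Without that synchronization, CUDA kernel launches return asynchronously, and the timings would look faster than they really are.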
Iteration 0, 2386.04 images/s in 0.566s (7 runs, 193 images/run).
Iteration 1, 2386.20 images/s in 0.566s (7 runs, 193 images/run).
Iteration 2, 2386.36 images/s in 0.566s (7 runs, 193 images/run).
Iteration 3, 2386.06 images/s in 0.566s (7 runs, 193 images/run).
Iteration 4, 2385.55 images/s in 0.566s (7 runs, 193 images/run).
Summary - ResNet50 float32 (performance mode):
Batch size: 193
Repeats per iter: 7
Peak memory: 2.09GB (4.4% of total VRAM)
Iteration 0, 3214.64 images/s in 0.549s (14 runs, 126 images/run).
Iteration 1, 3259.15 images/s in 0.541s (14 runs, 126 images/run).
Iteration 2, 3231.10 images/s in 0.546s (14 runs, 126 images/run).
Iteration 3, 3243.55 images/s in 0.544s (14 runs, 126 images/run).
Iteration 4, 3244.36 images/s in 0.544s (14 runs, 126 images/run).
Summary - ViT Transformer float16 (performance mode):
Batch size: 126
Repeats per iter: 14
Peak memory: 0.64GB (1.3% of total VRAM)
Iteration 0, 945.65 images/s in 0.584s (6 runs, 92 images/run).
Iteration 1, 955.15 images/s in 0.578s (6 runs, 92 images/run).
Iteration 2, 942.71 images/s in 0.586s (6 runs, 92 images/run).
Iteration 3, 948.69 images/s in 0.582s (6 runs, 92 images/run).
Iteration 4, 943.47 images/s in 0.585s (6 runs, 92 images/run).
Summary - ViT Transformer float32 (performance mode):
Batch size: 92
Repeats per iter: 6
Peak memory: 1.01GB (2.1% of total VRAM)
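The averages in the comparison table further down can be reproduced from these logs. The sketch does three things:

- It parses the Iteration lines with a regular expression. The pattern is written for the exact line format shown here and may not match other output.
- It checks that each reported throughput agrees with runs × images/run ÷ seconds.
- It averages the iterations.

Applied to the ViT Transformer float16 block above, it returns 3238.56, the table's value for the 4090-48G. The function names are hypothetical.

```python
import re
from statistics import mean

# Matches lines such as:
# Iteration 0, 3214.64 images/s in 0.549s (14 runs, 126 images/run).
LINE_RE = re.compile(
    r"Iteration (?P<it>\d+), (?P<ips>[\d.]+) images/s in (?P<secs>[\d.]+)s "
    r"\((?P<runs>\d+) runs, (?P<batch>\d+) images/run\)"
)


def parse_iterations(text: str) -> list[dict]:
    """Extract one dict per 'Iteration ...' line."""
    return [
        {
            "iteration": int(m["it"]),
            "ips": float(m["ips"]),
            "seconds": float(m["secs"]),
            "runs": int(m["runs"]),
            "batch": int(m["batch"]),
        }
        for m in LINE_RE.finditer(text)
    ]


def average_throughput(rows: list[dict], tolerance: float = 0.005) -> float:
    """Check each row's internal consistency, then return the mean images/s (2 decimals)."""
    if not rows:
        raise ValueError("no Iteration lines found")
    for r in rows:
        implied = r["runs"] * r["batch"] / r["seconds"]
        # Seconds are printed with 3 decimals, so allow a small relative error.
        if abs(implied - r["ips"]) / r["ips"] > tolerance:
            raise ValueError(f"inconsistent line: {r}")
    return round(mean(r["ips"] for r in rows), 2)


VIT_FP16_4090_48G = """\
Iteration 0, 3214.64 images/s in 0.549s (14 runs, 126 images/run).
Iteration 1, 3259.15 images/s in 0.541s (14 runs, 126 images/run).
Iteration 2, 3231.10 images/s in 0.546s (14 runs, 126 images/run).
Iteration 3, 3243.55 images/s in 0.544s (14 runs, 126 images/run).
Iteration 4, 3244.36 images/s in 0.544s (14 runs, 126 images/run).
"""

if __name__ == "__main__":
    print(average_throughput(parse_iterations(VIT_FP16_4090_48G)))
```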
4090-24G performance data
Iteration 0, 4607.13 images/s in 0.565s (14 runs, 186 images/run).
Iteration 1, 4604.64 images/s in 0.566s (14 runs, 186 images/run).
Iteration 2, 4608.12 images/s in 0.565s (14 runs, 186 images/run).
Iteration 3, 4606.81 images/s in 0.565s (14 runs, 186 images/run).
Iteration 4, 4606.74 images/s in 0.565s (14 runs, 186 images/run).
Summary - ResNet50 float16 (performance mode):
Batch size: 186
Repeats per iter: 14
Peak memory: 1.02GB (4.3% of total VRAM)
Iteration 0, 2452.09 images/s in 0.587s (15 runs, 96 images/run).
Iteration 1, 2452.83 images/s in 0.587s (15 runs, 96 images/run).
Iteration 2, 2451.25 images/s in 0.587s (15 runs, 96 images/run).
Iteration 3, 2450.24 images/s in 0.588s (15 runs, 96 images/run).
Iteration 4, 2451.44 images/s in 0.587s (15 runs, 96 images/run).
Summary - ResNet50 float32 (performance mode):
Batch size: 96
Repeats per iter: 15
Peak memory: 1.09GB (4.6% of total VRAM)
Iteration 0, 3402.21 images/s in 0.588s (23 runs, 87 images/run).
Iteration 1, 3374.88 images/s in 0.593s (23 runs, 87 images/run).
Iteration 2, 3371.34 images/s in 0.594s (23 runs, 87 images/run).
Iteration 3, 3385.16 images/s in 0.591s (23 runs, 87 images/run).
Iteration 4, 3365.35 images/s in 0.595s (23 runs, 87 images/run).
Summary - ViT Transformer float16 (performance mode):
Batch size: 87
Repeats per iter: 23
Peak memory: 0.49GB (2.1% of total VRAM)
Iteration 0, 993.71 images/s in 0.584s (10 runs, 58 images/run).
Iteration 1, 992.78 images/s in 0.584s (10 runs, 58 images/run).
Iteration 2, 992.30 images/s in 0.584s (10 runs, 58 images/run).
Iteration 3, 991.81 images/s in 0.585s (10 runs, 58 images/run).
Iteration 4, 992.31 images/s in 0.584s (10 runs, 58 images/run).
Summary - ViT Transformer float32 (performance mode):
Batch size: 58
Repeats per iter: 10
Peak memory: 0.76GB (3.2% of total VRAM)
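The percentage in each Summary line is relative to the total VRAM the benchmark detected, so you can recover the total by working backwards. Both numbers are printed rounded (GB to two decimals, percent to one decimal), so the sketch returns a range rather than a single value. It also assumes that "GB" means the same unit for the peak and for the total. For the ResNet50 float32 line above (1.09 GB at 4.6%), the range comes out to roughly 23.3 to 24.1 GB, consistent with a 24 GB card. The helper name is hypothetical.

```python
import re

# Matches lines such as: Peak memory: 1.09GB (4.6% of total VRAM)
MEM_RE = re.compile(r"Peak memory: (?P<gb>[\d.]+)GB \((?P<pct>[\d.]+)% of total VRAM\)")


def implied_total_vram(line: str) -> tuple[float, float]:
    """Return (low, high) bounds in GB on the total VRAM implied by one Summary line.

    The peak is printed to 2 decimals and the percentage to 1 decimal, so each
    may be off by half of its last digit; the bounds account for both.
    """
    m = MEM_RE.search(line)
    if m is None:
        raise ValueError(f"not a peak-memory line: {line!r}")
    gb, pct = float(m["gb"]), float(m["pct"])
    if pct <= 0.05:
        raise ValueError("percentage too small to bound the total")
    low = (gb - 0.005) / ((pct + 0.05) / 100)
    high = (gb + 0.005) / ((pct - 0.05) / 100)
    return low, high


if __name__ == "__main__":
    low, high = implied_total_vram("Peak memory: 1.09GB (4.6% of total VRAM)")
    print(f"total VRAM between {low:.1f} and {high:.1f} GB")
```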
4090-24G-DDR5 performance data
Iteration 0, 4645.05 images/s in 0.561s (14 runs, 186 images/run).
Iteration 1, 4644.38 images/s in 0.561s (14 runs, 186 images/run).
Iteration 2, 4645.39 images/s in 0.561s (14 runs, 186 images/run).
Iteration 3, 4645.01 images/s in 0.561s (14 runs, 186 images/run).
Iteration 4, 4644.87 images/s in 0.561s (14 runs, 186 images/run).
Summary - ResNet50 float16 (performance mode):
Batch size: 186
Repeats per iter: 14
Peak memory: 1.02GB (4.3% of total VRAM)
Iteration 0, 2467.26 images/s in 0.545s (14 runs, 96 images/run).
Iteration 1, 2467.40 images/s in 0.545s (14 runs, 96 images/run).
Iteration 2, 2467.53 images/s in 0.545s (14 runs, 96 images/run).
Iteration 3, 2467.44 images/s in 0.545s (14 runs, 96 images/run).
Iteration 4, 2467.41 images/s in 0.545s (14 runs, 96 images/run).
Summary - ResNet50 float32 (performance mode):
Batch size: 96
Repeats per iter: 14
Peak memory: 1.09GB (4.6% of total VRAM)
Iteration 0, 3426.37 images/s in 0.584s (23 runs, 87 images/run).
Iteration 1, 3404.94 images/s in 0.588s (23 runs, 87 images/run).
Iteration 2, 3422.09 images/s in 0.585s (23 runs, 87 images/run).
Iteration 3, 3422.40 images/s in 0.585s (23 runs, 87 images/run).
Iteration 4, 3401.27 images/s in 0.588s (23 runs, 87 images/run).
Summary - ViT Transformer float16 (performance mode):
Batch size: 87
Repeats per iter: 23
Peak memory: 0.49GB (2.1% of total VRAM)
Iteration 0, 994.42 images/s in 0.583s (10 runs, 58 images/run).
Iteration 1, 1006.83 images/s in 0.576s (10 runs, 58 images/run).
Iteration 2, 1001.10 images/s in 0.579s (10 runs, 58 images/run).
Iteration 3, 1001.71 images/s in 0.579s (10 runs, 58 images/run).
Iteration 4, 1006.54 images/s in 0.576s (10 runs, 58 images/run).
Summary - ViT Transformer float32 (performance mode):
Batch size: 58
Repeats per iter: 10
Peak memory: 0.76GB (3.2% of total VRAM)
3090-24G performance data
Iteration 0, 3146.71 images/s in 0.591s (5 runs, 372 images/run).
Iteration 1, 3153.64 images/s in 0.590s (5 runs, 372 images/run).
Iteration 2, 3147.99 images/s in 0.591s (5 runs, 372 images/run).
Iteration 3, 3145.41 images/s in 0.591s (5 runs, 372 images/run).
Iteration 4, 3139.09 images/s in 0.593s (5 runs, 372 images/run).
Summary - ResNet50 float16 (performance mode):
Batch size: 372
Repeats per iter: 5
Peak memory: 2.11GB (8.9% of total VRAM)
Iteration 0, 1685.09 images/s in 0.573s (5 runs, 193 images/run).
Iteration 1, 1683.83 images/s in 0.573s (5 runs, 193 images/run).
Iteration 2, 1681.13 images/s in 0.574s (5 runs, 193 images/run).
Iteration 3, 1681.29 images/s in 0.574s (5 runs, 193 images/run).
Iteration 4, 1681.70 images/s in 0.574s (5 runs, 193 images/run).
Summary - ResNet50 float32 (performance mode):
Batch size: 193
Repeats per iter: 5
Peak memory: 2.09GB (8.9% of total VRAM)
Iteration 0, 1584.39 images/s in 0.549s (10 runs, 87 images/run).
Iteration 1, 1584.61 images/s in 0.549s (10 runs, 87 images/run).
Iteration 2, 1587.71 images/s in 0.548s (10 runs, 87 images/run).
Iteration 3, 1582.08 images/s in 0.550s (10 runs, 87 images/run).
Iteration 4, 1584.56 images/s in 0.549s (10 runs, 87 images/run).
Summary - ViT Transformer float16 (performance mode):
Batch size: 87
Repeats per iter: 10
Peak memory: 0.49GB (2.1% of total VRAM)
Iteration 0, 418.50 images/s in 0.559s (6 runs, 39 images/run).
Iteration 1, 418.37 images/s in 0.559s (6 runs, 39 images/run).
Iteration 2, 419.16 images/s in 0.558s (6 runs, 39 images/run).
Iteration 3, 417.34 images/s in 0.561s (6 runs, 39 images/run).
Iteration 4, 418.92 images/s in 0.559s (6 runs, 39 images/run).
Summary - ViT Transformer float32 (performance mode):
Batch size: 39
Repeats per iter: 6
Peak memory: 0.62GB (2.6% of total VRAM)
Overall Comparison and Selection Recommendations
- Performance comparison (average throughput over the 5 benchmark iterations, images/s)
GPU model | resnet50 (float16) | resnet50 (float32) | vit_base (float16) | vit_base (float32) |
---|---|---|---|---|
4090-48G | 4,452.69 | 2,386.04 | 3,238.56 | 947.13 |
4090-24G | 4,606.69 | 2,451.57 | 3,379.79 | 992.58 |
4090-24G-DDR5 | 4,644.94 | 2,467.41 | 3,415.41 | 1,002.12 |
4090D-24G | 4,464.12 | 2,362.95 | 3,091.78 | 918.95 |
3090-24G | 3,146.57 | 1,682.61 | 1,584.67 | 418.46 |
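The table is easier to compare as ratios. The snippet below copies the averages above into a dictionary and computes two illustrative ratios: the float16 over float32 speedup on each card, and each card relative to a chosen baseline card. The dictionary keys and helper names are hypothetical. The numbers are exactly those in the table.

```python
# Average throughput (images/s) copied from the performance table above.
THROUGHPUT = {
    "4090-48G":      {"resnet50_fp16": 4452.69, "resnet50_fp32": 2386.04, "vit_fp16": 3238.56, "vit_fp32": 947.13},
    "4090-24G":      {"resnet50_fp16": 4606.69, "resnet50_fp32": 2451.57, "vit_fp16": 3379.79, "vit_fp32": 992.58},
    "4090-24G-DDR5": {"resnet50_fp16": 4644.94, "resnet50_fp32": 2467.41, "vit_fp16": 3415.41, "vit_fp32": 1002.12},
    "4090D-24G":     {"resnet50_fp16": 4464.12, "resnet50_fp32": 2362.95, "vit_fp16": 3091.78, "vit_fp32": 918.95},
    "3090-24G":      {"resnet50_fp16": 3146.57, "resnet50_fp32": 1682.61, "vit_fp16": 1584.67, "vit_fp32": 418.46},
}


def fp16_speedup(gpu: str, model: str) -> float:
    """float16 throughput divided by float32 throughput for one model on one card."""
    row = THROUGHPUT[gpu]
    return row[f"{model}_fp16"] / row[f"{model}_fp32"]


def relative_to(gpu: str, baseline: str, workload: str) -> float:
    """Throughput of `gpu` as a multiple of `baseline` for one workload column."""
    return THROUGHPUT[gpu][workload] / THROUGHPUT[baseline][workload]


if __name__ == "__main__":
    for gpu in THROUGHPUT:
        print(
            f"{gpu:14s} fp16/fp32 resnet50 {fp16_speedup(gpu, 'resnet50'):.2f}x, "
            f"vit {fp16_speedup(gpu, 'vit'):.2f}x, "
            f"vit fp16 vs 3090 {relative_to(gpu, '3090-24G', 'vit_fp16'):.2f}x"
        )
```

The ratios show a few clear patterns:

- On every card, float16 is about 1.9x faster than float32 for ResNet50 and roughly 3.4x to 3.8x faster for ViT.
- The 4090-24G reaches about 2.1x the 3090's ViT float16 throughput.
- The 4090-48G runs about 3% to 5% behind the 4090-24G, so its advantage is memory capacity rather than speed.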
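- Memory usage comparison (peak VRAM, GB). The batch size differs between cards: for example, ResNet50 float16 ran with batch 374 on the 4090-48G but 186 on the 4090-24G. These figures therefore mostly reflect the batch size used, not how efficiently a card uses memory.
GPU model | resnet50 (float16) | resnet50 (float32) | vit_base (float16) | vit_base (float32) |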
---|---|---|---|---|
4090-48G | 1.98 | 2.09 | 0.64 | 1.01 |
4090-24G | 1.02 | 1.09 | 0.49 | 0.76 |
4090-24G-DDR5 | 1.02 | 1.09 | 0.49 | 0.76 |
4090D-24G | 1.02 | 1.09 | 0.54 | 0.76 |
3090-24G | 2.11 | 2.09 | 0.49 | 0.62 |
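One rough way to compare these rows despite the different batch sizes is to divide peak memory by the batch size reported in each Summary. This is only an approximation, because peak memory also includes fixed costs such as model weights. The values come from the logs above. The 4090D-24G is omitted because its batch sizes are not listed in this section. The names are hypothetical.

```python
# (peak memory in GB, batch size) from the Summary blocks above.
PEAK_AND_BATCH = {
    ("4090-48G", "resnet50_fp16"): (1.98, 374),
    ("4090-24G", "resnet50_fp16"): (1.02, 186),
    ("3090-24G", "resnet50_fp16"): (2.11, 372),
    ("4090-48G", "vit_fp16"): (0.64, 126),
    ("4090-24G", "vit_fp16"): (0.49, 87),
    ("3090-24G", "vit_fp16"): (0.49, 87),
}


def mb_per_image(peak_gb: float, batch: int) -> float:
    """Peak memory per image in MB (1 GB taken as 1000 MB; only ratios matter here)."""
    return peak_gb * 1000 / batch


if __name__ == "__main__":
    for (gpu, workload), (gb, batch) in PEAK_AND_BATCH.items():
        print(f"{gpu:9s} {workload:14s} {mb_per_image(gb, batch):.2f} MB/image")
```

The per-image figures fall between roughly 5.1 and 5.7 MB on all three cards. That is consistent with batch size, rather than the card, driving most of the spread in the table.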
- Selection recommendations
Use case | Recommended GPU |
---|---|
Beginners | RTX 3090-24G |
Best value for money | RTX 4090D-24G |
Top-end experience (largest VRAM, 48 GB) | RTX 4090-48G |