
Performance Benchmarks

💎 Key point

All measurements were taken in the following test environment: PyTorch 2.7.0, Python 3.12.4, CUDA 12.6.
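When comparing your own numbers against the ones below, it helps to record the same three versions first. The following is a minimal sketch, not part of the original benchmark; the helper name `describe_environment` is hypothetical, and PyTorch is treated as optional so the script still runs where it is not installed.

```python
import platform


def describe_environment() -> dict:
    """Collect the version strings relevant to a benchmark run."""
    info = {"python": platform.python_version()}
    try:
        import torch  # optional: only reported when PyTorch is installed
    except ImportError:
        info["torch"] = None
        info["cuda"] = None
    else:
        info["torch"] = torch.__version__
        # torch.version.cuda is None on CPU-only builds
        info["cuda"] = torch.version.cuda
    return info


if __name__ == "__main__":
    for name, version in describe_environment().items():
        print(f"{name}: {version}")
```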

Test Results

4090-48G Performance Data

"Iteration 0, 4454.28 images/s in 0.588s (7 runs, 374 images/run).
Iteration 1, 4455.73 images/s in 0.588s (7 runs, 374 images/run).
Iteration 2, 4452.13 images/s in 0.588s (7 runs, 374 images/run).
Iteration 3, 4451.93 images/s in 0.588s (7 runs, 374 images/run).
Iteration 4, 4449.36 images/s in 0.588s (7 runs, 374 images/run).
Summary - ResNet50 float16 (performance mode):
  Batch size: 374
  Repeats per iter: 7
  Peak memory: 1.98GB (4.2% of total VRAM)"

"Iteration 0, 2386.04 images/s in 0.566s (7 runs, 193 images/run).
Iteration 1, 2386.20 images/s in 0.566s (7 runs, 193 images/run).
Iteration 2, 2386.36 images/s in 0.566s (7 runs, 193 images/run).
Iteration 3, 2386.06 images/s in 0.566s (7 runs, 193 images/run).
Iteration 4, 2385.55 images/s in 0.566s (7 runs, 193 images/run).
Summary - ResNet50 float32 (performance mode):
  Batch size: 193
  Repeats per iter: 7
  Peak memory: 2.09GB (4.4% of total VRAM)"

"Iteration 0, 3214.64 images/s in 0.549s (14 runs, 126 images/run).
Iteration 1, 3259.15 images/s in 0.541s (14 runs, 126 images/run).
Iteration 2, 3231.10 images/s in 0.546s (14 runs, 126 images/run).
Iteration 3, 3243.55 images/s in 0.544s (14 runs, 126 images/run).
Iteration 4, 3244.36 images/s in 0.544s (14 runs, 126 images/run).
Summary - ViT Transformer float16 (performance mode):
  Batch size: 126
  Repeats per iter: 14
  Peak memory: 0.64GB (1.3% of total VRAM)"

"Iteration 0, 945.65 images/s in 0.584s (6 runs, 92 images/run).
Iteration 1, 955.15 images/s in 0.578s (6 runs, 92 images/run).
Iteration 2, 942.71 images/s in 0.586s (6 runs, 92 images/run).
Iteration 3, 948.69 images/s in 0.582s (6 runs, 92 images/run).
Iteration 4, 943.47 images/s in 0.585s (6 runs, 92 images/run).
Summary - ViT Transformer float32 (performance mode):
  Batch size: 92
  Repeats per iter: 6
  Peak memory: 1.01GB (2.1% of total VRAM)"
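Each iteration line appears to report throughput as runs × images per run ÷ elapsed time. The sketch below parses one line from the log above and recomputes the rate as a sanity check. The regular expression and the helper names `parse_iteration` and `recomputed_rate` are assumptions made for illustration, not part of the benchmark tool.

```python
import re

# Matches lines such as:
# Iteration 0, 4454.28 images/s in 0.588s (7 runs, 374 images/run).
LINE_RE = re.compile(
    r"Iteration (\d+), ([\d.]+) images/s in ([\d.]+)s "
    r"\((\d+) runs, (\d+) images/run\)"
)


def parse_iteration(line: str) -> dict:
    """Extract the numeric fields from one iteration line."""
    match = LINE_RE.search(line)
    if match is None:
        raise ValueError(f"not an iteration line: {line!r}")
    iteration, rate, seconds, runs, batch = match.groups()
    return {
        "iteration": int(iteration),
        "images_per_s": float(rate),
        "seconds": float(seconds),
        "runs": int(runs),
        "batch": int(batch),
    }


def recomputed_rate(rec: dict) -> float:
    """Throughput implied by the run count, batch size and elapsed time."""
    return rec["runs"] * rec["batch"] / rec["seconds"]


line = "Iteration 0, 4454.28 images/s in 0.588s (7 runs, 374 images/run)."
rec = parse_iteration(line)
print(f"reported:   {rec['images_per_s']:.2f} images/s")
print(f"recomputed: {recomputed_rate(rec):.2f} images/s")
```

The two values differ slightly because the elapsed time in the log is rounded to three decimal places; they agree to within about 0.1%.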

4090-24G Performance Data

"Iteration 0, 4607.13 images/s in 0.565s (14 runs, 186 images/run).
Iteration 1, 4604.64 images/s in 0.566s (14 runs, 186 images/run).
Iteration 2, 4608.12 images/s in 0.565s (14 runs, 186 images/run).
Iteration 3, 4606.81 images/s in 0.565s (14 runs, 186 images/run).
Iteration 4, 4606.74 images/s in 0.565s (14 runs, 186 images/run).
Summary - ResNet50 float16 (performance mode):
  Batch size: 186
  Repeats per iter: 14
  Peak memory: 1.02GB (4.3% of total VRAM)"

"Iteration 0, 2452.09 images/s in 0.587s (15 runs, 96 images/run).
Iteration 1, 2452.83 images/s in 0.587s (15 runs, 96 images/run).
Iteration 2, 2451.25 images/s in 0.587s (15 runs, 96 images/run).
Iteration 3, 2450.24 images/s in 0.588s (15 runs, 96 images/run).
Iteration 4, 2451.44 images/s in 0.587s (15 runs, 96 images/run).
Summary - ResNet50 float32 (performance mode):
  Batch size: 96
  Repeats per iter: 15
  Peak memory: 1.09GB (4.6% of total VRAM)"

"Iteration 0, 3402.21 images/s in 0.588s (23 runs, 87 images/run).
Iteration 1, 3374.88 images/s in 0.593s (23 runs, 87 images/run).
Iteration 2, 3371.34 images/s in 0.594s (23 runs, 87 images/run).
Iteration 3, 3385.16 images/s in 0.591s (23 runs, 87 images/run).
Iteration 4, 3365.35 images/s in 0.595s (23 runs, 87 images/run).
Summary - ViT Transformer float16 (performance mode):
  Batch size: 87
  Repeats per iter: 23
  Peak memory: 0.49GB (2.1% of total VRAM)"

"Iteration 0, 993.71 images/s in 0.584s (10 runs, 58 images/run).
Iteration 1, 992.78 images/s in 0.584s (10 runs, 58 images/run).
Iteration 2, 992.30 images/s in 0.584s (10 runs, 58 images/run).
Iteration 3, 991.81 images/s in 0.585s (10 runs, 58 images/run).
Iteration 4, 992.31 images/s in 0.584s (10 runs, 58 images/run).
Summary - ViT Transformer float32 (performance mode):
  Batch size: 58
  Repeats per iter: 10
  Peak memory: 0.76GB (3.2% of total VRAM)"

4090-24G-DDR5 Performance Data

"Iteration 0, 4645.05 images/s in 0.561s (14 runs, 186 images/run).
Iteration 1, 4644.38 images/s in 0.561s (14 runs, 186 images/run).
Iteration 2, 4645.39 images/s in 0.561s (14 runs, 186 images/run).
Iteration 3, 4645.01 images/s in 0.561s (14 runs, 186 images/run).
Iteration 4, 4644.87 images/s in 0.561s (14 runs, 186 images/run).
Summary - ResNet50 float16 (performance mode):
  Batch size: 186
  Repeats per iter: 14
  Peak memory: 1.02GB (4.3% of total VRAM)"

"Iteration 0, 2467.26 images/s in 0.545s (14 runs, 96 images/run).
Iteration 1, 2467.40 images/s in 0.545s (14 runs, 96 images/run).
Iteration 2, 2467.53 images/s in 0.545s (14 runs, 96 images/run).
Iteration 3, 2467.44 images/s in 0.545s (14 runs, 96 images/run).
Iteration 4, 2467.41 images/s in 0.545s (14 runs, 96 images/run).
Summary - ResNet50 float32 (performance mode):
  Batch size: 96
  Repeats per iter: 14
  Peak memory: 1.09GB (4.6% of total VRAM)"

"Iteration 0, 3426.37 images/s in 0.584s (23 runs, 87 images/run).
Iteration 1, 3404.94 images/s in 0.588s (23 runs, 87 images/run).
Iteration 2, 3422.09 images/s in 0.585s (23 runs, 87 images/run).
Iteration 3, 3422.40 images/s in 0.585s (23 runs, 87 images/run).
Iteration 4, 3401.27 images/s in 0.588s (23 runs, 87 images/run).
Summary - ViT Transformer float16 (performance mode):
  Batch size: 87
  Repeats per iter: 23
  Peak memory: 0.49GB (2.1% of total VRAM)"

"Iteration 0, 994.42 images/s in 0.583s (10 runs, 58 images/run).
Iteration 1, 1006.83 images/s in 0.576s (10 runs, 58 images/run).
Iteration 2, 1001.10 images/s in 0.579s (10 runs, 58 images/run).
Iteration 3, 1001.71 images/s in 0.579s (10 runs, 58 images/run).
Iteration 4, 1006.54 images/s in 0.576s (10 runs, 58 images/run).
Summary - ViT Transformer float32 (performance mode):
  Batch size: 58
  Repeats per iter: 10
  Peak memory: 0.76GB (3.2% of total VRAM)"

3090-24G Performance Data

"Iteration 0, 3146.71 images/s in 0.591s (5 runs, 372 images/run).
Iteration 1, 3153.64 images/s in 0.590s (5 runs, 372 images/run).
Iteration 2, 3147.99 images/s in 0.591s (5 runs, 372 images/run).
Iteration 3, 3145.41 images/s in 0.591s (5 runs, 372 images/run).
Iteration 4, 3139.09 images/s in 0.593s (5 runs, 372 images/run).
Summary - ResNet50 float16 (performance mode):
  Batch size: 372
  Repeats per iter: 5
  Peak memory: 2.11GB (8.9% of total VRAM)"

"Iteration 0, 1685.09 images/s in 0.573s (5 runs, 193 images/run).
Iteration 1, 1683.83 images/s in 0.573s (5 runs, 193 images/run).
Iteration 2, 1681.13 images/s in 0.574s (5 runs, 193 images/run).
Iteration 3, 1681.29 images/s in 0.574s (5 runs, 193 images/run).
Iteration 4, 1681.70 images/s in 0.574s (5 runs, 193 images/run).
Summary - ResNet50 float32 (performance mode):
  Batch size: 193
  Repeats per iter: 5
  Peak memory: 2.09GB (8.9% of total VRAM)"

"Iteration 0, 1584.39 images/s in 0.549s (10 runs, 87 images/run).
Iteration 1, 1584.61 images/s in 0.549s (10 runs, 87 images/run).
Iteration 2, 1587.71 images/s in 0.548s (10 runs, 87 images/run).
Iteration 3, 1582.08 images/s in 0.550s (10 runs, 87 images/run).
Iteration 4, 1584.56 images/s in 0.549s (10 runs, 87 images/run).
Summary - ViT Transformer float16 (performance mode):
  Batch size: 87
  Repeats per iter: 10
  Peak memory: 0.49GB (2.1% of total VRAM)"

"Iteration 0, 418.50 images/s in 0.559s (6 runs, 39 images/run).
Iteration 1, 418.37 images/s in 0.559s (6 runs, 39 images/run).
Iteration 2, 419.16 images/s in 0.558s (6 runs, 39 images/run).
Iteration 3, 417.34 images/s in 0.561s (6 runs, 39 images/run).
Iteration 4, 418.92 images/s in 0.559s (6 runs, 39 images/run).
Summary - ViT Transformer float32 (performance mode):
  Batch size: 39
  Repeats per iter: 6
  Peak memory: 0.62GB (2.6% of total VRAM)"
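The averages reported in the comparison table can be reproduced by taking the mean of the five iteration values in each log. A minimal sketch, using the 4090-48G ResNet50 float16 values copied from the log above; the relative spread is an extra illustration and does not appear in the original results.

```python
from statistics import mean

# images/s values copied from the 4090-48G ResNet50 float16 log
rates = [4454.28, 4455.73, 4452.13, 4451.93, 4449.36]

avg = mean(rates)
# Relative run-to-run variation: (max - min) / mean
spread = (max(rates) - min(rates)) / avg

print(f"mean: {avg:.2f} images/s, spread: {spread:.2%}")
```

The mean rounds to 4452.69 images/s, which matches the table entry for this GPU and workload.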

Overall Comparison and Selection Recommendations

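  1. Performance comparison (average throughput, images/s)

| GPU model | resnet50 (float16) | resnet50 (float32) | vit_base (float16) | vit_base (float32) |
| --- | --- | --- | --- | --- |
| 4090-48G | 4,452.69 | 2,386.04 | 3,238.56 | 947.13 |
| 4090-24G | 4,606.69 | 2,451.57 | 3,379.79 | 992.58 |
| 4090-24G-DDR5 | 4,644.94 | 2,467.41 | 3,415.41 | 1,002.12 |
| 4090D-24G | 4,464.12 | 2,362.95 | 3,091.78 | 918.95 |
| 3090-24G | 3,146.57 | 1,682.61 | 1,584.67 | 418.46 |
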
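  2. Memory usage comparison (peak VRAM, GB)

| GPU model | resnet50 (float16) | resnet50 (float32) | vit_base (float16) | vit_base (float32) |
| --- | --- | --- | --- | --- |
| 4090-48G | 1.98 | 2.09 | 0.64 | 1.01 |
| 4090-24G | 1.02 | 1.09 | 0.49 | 0.76 |
| 4090-24G-DDR5 | 1.02 | 1.09 | 0.49 | 0.76 |
| 4090D-24G | 1.02 | 1.09 | 0.54 | 0.76 |
| 3090-24G | 2.11 | 2.09 | 0.49 | 0.62 |
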
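  3. Selection recommendations

| Use case | Recommended GPU |
| --- | --- |
| Recommended for beginners | RTX 3090-24G |
| Best value for money | RTX 4090D-24G |
| Top-tier experience | RTX 4090-48G |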
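To read the performance table in relative terms, it can help to compute ratios such as the float16 speedup over float32 on one card, or one card's throughput relative to another. The sketch below uses values copied from the performance comparison table; the dictionary layout and the helper names `fp16_speedup` and `relative_to` are hypothetical, chosen only for this example.

```python
# Average throughput (images/s), copied from the performance comparison table.
THROUGHPUT = {
    "4090-48G": {"resnet50_fp16": 4452.69, "resnet50_fp32": 2386.04,
                 "vit_fp16": 3238.56, "vit_fp32": 947.13},
    "4090-24G": {"resnet50_fp16": 4606.69, "resnet50_fp32": 2451.57,
                 "vit_fp16": 3379.79, "vit_fp32": 992.58},
    "4090-24G-DDR5": {"resnet50_fp16": 4644.94, "resnet50_fp32": 2467.41,
                      "vit_fp16": 3415.41, "vit_fp32": 1002.12},
    "4090D-24G": {"resnet50_fp16": 4464.12, "resnet50_fp32": 2362.95,
                  "vit_fp16": 3091.78, "vit_fp32": 918.95},
    "3090-24G": {"resnet50_fp16": 3146.57, "resnet50_fp32": 1682.61,
                 "vit_fp16": 1584.67, "vit_fp32": 418.46},
}


def fp16_speedup(gpu: str, model: str) -> float:
    """float16 throughput divided by float32 throughput on one GPU."""
    row = THROUGHPUT[gpu]
    return row[f"{model}_fp16"] / row[f"{model}_fp32"]


def relative_to(gpu: str, baseline: str, workload: str) -> float:
    """Throughput of `gpu` divided by that of `baseline` for one workload."""
    return THROUGHPUT[gpu][workload] / THROUGHPUT[baseline][workload]


if __name__ == "__main__":
    for gpu in THROUGHPUT:
        print(f"{gpu}: resnet50 fp16/fp32 = {fp16_speedup(gpu, 'resnet50'):.2f}x, "
              f"vit fp16/fp32 = {fp16_speedup(gpu, 'vit'):.2f}x, "
              f"vit_fp32 vs 3090-24G = {relative_to(gpu, '3090-24G', 'vit_fp32'):.2f}x")
```

These ratios only restate the measured throughput. They do not account for price or availability, which the selection recommendations above presumably also weigh.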