
Title: [TVM Tutorial] Auto-scheduling a Neural Network for NVIDIA GPU

Poster: 北冰洋以北    Time: two days ago, 04:25
Apache TVM is an end-to-end deep learning compiler framework for CPUs, GPUs, and machine learning accelerators. More TVM documentation in Chinese is available at →https://tvm.hyper.ai/
Author: Lianmin Zheng
Auto-tuning for a specific device and workload is critical to getting the best performance. This tutorial shows how to use the auto-scheduler to tune a whole neural network for an NVIDIA GPU.
To auto-tune a neural network, the network is partitioned into small subgraphs that are tuned independently. Each subgraph is treated as one search task. A task scheduler slices the tuning time and dynamically allocates it to these tasks: it predicts the impact of each task on the end-to-end execution time and prioritizes the tasks that can reduce that time the most.
For each subgraph, we use the compute declarations in tvm/python/topi to get the compute DAG in tensor-expression form. We then use the auto-scheduler to construct a search space over this DAG and search for good schedules (low-level optimizations).
Unlike template-based AutoTVM, which relies on manual templates to define the search space, the auto-scheduler does not require any schedule template. In other words, the auto-scheduler uses only the compute declarations in tvm/python/topi and does not use existing schedule templates.
Note that this tutorial does not run on Windows or recent versions of macOS. To get it to run, wrap the body of this tutorial in an if __name__ == "__main__": block.
```python
import numpy as np

import tvm
from tvm import relay, auto_scheduler
import tvm.relay.testing
from tvm.contrib import graph_executor
```
Define a Network

First, define the network with the Relay frontend API. We can load some pre-defined networks from tvm.relay.testing. We can also load models from MXNet, ONNX, PyTorch, and TensorFlow (see the frontend tutorials).
For convolutional neural networks, although the auto-scheduler works correctly with any layout, it achieves the best performance with the NHWC layout. The auto-scheduler implements many layout-specific optimizations for NHWC, so it is recommended to convert models to NHWC layout to make full use of the auto-scheduler. The ConvertLayout pass can perform the layout conversion in TVM.
```python
def get_network(name, batch_size, layout="NHWC", dtype="float32"):
    """Get the symbol definition and random weight of a network"""

    # auto-scheduler prefers the NHWC layout
    if layout == "NHWC":
        image_shape = (224, 224, 3)
    elif layout == "NCHW":
        image_shape = (3, 224, 224)
    else:
        raise ValueError("Invalid layout: " + layout)

    input_shape = (batch_size,) + image_shape
    output_shape = (batch_size, 1000)

    if name.startswith("resnet-"):
        n_layer = int(name.split("-")[1])
        mod, params = relay.testing.resnet.get_workload(
            num_layers=n_layer,
            batch_size=batch_size,
            layout=layout,
            dtype=dtype,
            image_shape=image_shape,
        )
    elif name.startswith("resnet3d-"):
        n_layer = int(name.split("-")[1])
        mod, params = relay.testing.resnet_3d.get_workload(
            num_layers=n_layer,
            batch_size=batch_size,
            layout=layout,
            dtype=dtype,
            image_shape=image_shape,
        )
    elif name == "mobilenet":
        mod, params = relay.testing.mobilenet.get_workload(
            batch_size=batch_size, layout=layout, dtype=dtype, image_shape=image_shape
        )
    elif name == "squeezenet_v1.1":
        assert layout == "NCHW", "squeezenet_v1.1 only supports NCHW layout"
        mod, params = relay.testing.squeezenet.get_workload(
            version="1.1",
            batch_size=batch_size,
            dtype=dtype,
            image_shape=image_shape,
        )
    elif name == "inception_v3":
        input_shape = (batch_size, 3, 299, 299) if layout == "NCHW" else (batch_size, 299, 299, 3)
        mod, params = relay.testing.inception_v3.get_workload(batch_size=batch_size, dtype=dtype)
    elif name == "mxnet":
        # an example for an MXNet model
        from mxnet.gluon.model_zoo.vision import get_model

        assert layout == "NCHW"

        block = get_model("resnet18_v1", pretrained=True)
        mod, params = relay.frontend.from_mxnet(block, shape={"data": input_shape}, dtype=dtype)
        net = mod["main"]
        net = relay.Function(
            net.params, relay.nn.softmax(net.body), None, net.type_params, net.attrs
        )
        mod = tvm.IRModule.from_expr(net)

    return mod, params, input_shape, output_shape


# Define the neural network and compilation target
network = "resnet-18"
batch_size = 1
layout = "NHWC"
target = tvm.target.Target("cuda")
dtype = "float32"
log_file = "%s-%s-B%d-%s.json" % (network, layout, batch_size, target.kind.name)
```
Extract Search Tasks

Next, extract the search tasks and their weights from the network. A task's weight is the number of times the task's subgraph appears in the whole network. Using these weights, the end-to-end latency of the network can be approximated as sum(latency[t] * weight[t]), where latency[t] is the latency of a task and weight[t] is its weight. The task scheduler optimizes only this objective.
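The objective above can be illustrated with a few lines of plain Python (a toy sketch with made-up numbers, not TVM code):

```python
# Toy illustration of the task scheduler's objective: the end-to-end latency
# is approximated as a weighted sum of per-task (per-subgraph) latencies.
def estimated_total_latency(latency, weight):
    return sum(latency[t] * weight[t] for t in range(len(latency)))

# Three hypothetical tasks, e.g. a conv2d subgraph that appears 4 times.
latency = [0.12, 0.05, 0.30]  # ms, best measured latency per task so far
weight = [4, 2, 1]            # how often each subgraph appears in the network

print(round(estimated_total_latency(latency, weight), 3))  # 0.88

# A crude proxy for "which task to tune next": the largest weighted term,
# i.e. the biggest contributor to the estimated end-to-end latency.
next_task = max(range(len(latency)), key=lambda t: latency[t] * weight[t])
print(next_task)  # 0
```

The real scheduler is more sophisticated (it predicts how much each task can still improve), but the weighted sum is the quantity it tries to minimize.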
```python
# Extract tasks from the network
print("Extract tasks...")
mod, params, input_shape, output_shape = get_network(network, batch_size, layout, dtype=dtype)
tasks, task_weights = auto_scheduler.extract_tasks(mod["main"], params, target)

for idx, task in enumerate(tasks):
    print("========== Task %d  (workload key: %s) ==========" % (idx, task.workload_key))
    print(task.compute_dag)
```
Output:
```
Extract tasks...
/workspace/python/tvm/driver/build_module.py:268: UserWarning: target_host parameter is going to be deprecated. Please pass in tvm.target.Target(target, host=target_host) instead.
  "target_host parameter is going to be deprecated. "

========== Task 0 (workload key: ["8654f16aeddf785bad9f028164b3a48d", [1, 56, 56, 64], [1, 1, 64, 64], [1, 56, 56, 64]]) ==========
placeholder = PLACEHOLDER [1, 56, 56, 64]
pad_temp(i0, i1, i2, i3) = placeholder[i0, i1, i2, i3]
placeholder = PLACEHOLDER [1, 1, 64, 64]
conv2d_nhwc(nn, yy, xx, ff) += (pad_temp[nn, (yy + ry), (xx + rx), rc]*placeholder[ry, rx, rc, ff])

========== Task 1 (workload key: ["c4500b4e2fd04e695c32d2f31bbdc14a", [1, 28, 28, 128], [4, 4, 128, 128], [1, 28, 28, 128], [1, 1, 1, 128], [1, 28, 28, 128]]) ==========
placeholder = PLACEHOLDER [1, 28, 28, 128]
data_pad(i0, i1, i2, i3) = tir.if_then_else(((((i1 >= 1) && (i1 < 29)) && (i2 >= 1)) && (i2 < 29)), placeholder[i0, (i1 - 1), (i2 - 1), i3], 0f)
input_tile(eps, nu, p, ci) = data_pad[floordiv(p, 196), ((floormod(floordiv(p, 14), 14)*2) + eps), ((floormod(p, 14)*2) + nu), ci]
B(i, j) = select(((floormod(i, 4) == 3) && (floormod(j, 4) == 3)), 1f, select(((floormod(i, 4) == 3) && (floormod(j, 4) == 2)),  ..(OMITTED).. ormod(i, 4) == 0) && (floormod(j, 4) == 1)), 0f, select(((floormod(i, 4) == 0) && (floormod(j, 4) == 0)), 1f, 0f))))))))))))))))
data_pack(eps, nu, p, ci) += ((input_tile[r_a, r_b, p, ci]*B[r_a, eps])*B[r_b, nu])
placeholder = PLACEHOLDER [4, 4, 128, 128]
bgemm(eps, nu, p, co) += (data_pack[eps, nu, p, ci]*placeholder[eps, nu, co, ci])
A(i, j) = select(((floormod(i, 4) == 3) && (floormod(j, 2) == 1)), 1f, select(((floormod(i, 4) == 3) && (floormod(j, 2) == 0)),  ..(OMITTED).. ct(((floormod(i, 4) == 0) && (floormod(j, 2) == 1)), 0f, select(((floormod(i, 4) == 0) && (floormod(j, 2) == 0)), 1f, 0f))))))))
inverse(vh, vw, p, co) += ((bgemm[r_a, r_b, p, co]*A[r_a, vh])*A[r_b, vw])
conv2d_winograd(n, h, w, co) = inverse[floormod(h, 2), floormod(w, 2), ((((n*14)*14) + (floordiv(h, 2)*14)) + floordiv(w, 2)), co]
placeholder = PLACEHOLDER [1, 28, 28, 128]
T_add(ax0, ax1, ax2, ax3) = (conv2d_winograd[ax0, ax1, ax2, ax3] + placeholder[ax0, ax1, ax2, ax3])
placeholder = PLACEHOLDER [1, 1, 1, 128]
T_add(ax0, ax1, ax2, ax3) = (T_add[ax0, ax1, ax2, ax3] + placeholder[ax0, 0, 0, ax3])
T_relu(ax0, ax1, ax2, ax3) = max(T_add[ax0, ax1, ax2, ax3], 0f)

========== Task 2 (workload key: ["06f578e6519a86e85028eecf4de64b25", [1, 56, 56, 64], [1, 1, 64, 128], [1, 28, 28, 128]]) ==========
placeholder = PLACEHOLDER [1, 56, 56, 64]
pad_temp(i0, i1, i2, i3) = placeholder[i0, i1, i2, i3]
placeholder = PLACEHOLDER [1, 1, 64, 128]
conv2d_nhwc(nn, yy, xx, ff) += (pad_temp[nn, ((yy*2) + ry), ((xx*2) + rx), rc]*placeholder[ry, rx, rc, ff])

========== Task 3 (workload key: ["b8b52b9be9df6102466a22a014c44c1f", [1, 14, 14, 256], [4, 4, 256, 256], [1, 1, 1, 256], [1, 14, 14, 256]]) ==========
placeholder = PLACEHOLDER [1, 14, 14, 256]
data_pad(i0, i1, i2, i3) = tir.if_then_else(((((i1 >= 1) && (i1 < 15)) && (i2 >= 1)) && (i2 < 15)), placeholder[i0, (i1 - 1), (i2 - 1), i3], 0f)
input_tile(eps, nu, p, ci) = data_pad[floordiv(p, 49), ((floormod(floordiv(p, 7), 7)*2) + eps), ((floormod(p, 7)*2) + nu), ci]
B(i, j) = select(((floormod(i, 4) == 3) && (floormod(j, 4) == 3)), 1f, select(((floormod(i, 4) == 3) && (floormod(j, 4) == 2)),  ..(OMITTED).. ormod(i, 4) == 0) && (floormod(j, 4) == 1)), 0f, select(((floormod(i, 4) == 0) && (floormod(j, 4) == 0)), 1f, 0f))))))))))))))))
data_pack(eps, nu, p, ci) += ((input_tile[r_a, r_b, p, ci]*B[r_a, eps])*B[r_b, nu])
placeholder = PLACEHOLDER [4, 4, 256, 256]
bgemm(eps, nu, p, co) += (data_pack[eps, nu, p, ci]*placeholder[eps, nu, co, ci])
A(i, j) = select(((floormod(i, 4) == 3) && (floormod(j, 2) == 1)), 1f, select(((floormod(i, 4) == 3) && (floormod(j, 2) == 0)),  ..(OMITTED).. ct(((floormod(i, 4) == 0) && (floormod(j, 2) == 1)), 0f, select(((floormod(i, 4) == 0) && (floormod(j, 2) == 0)), 1f, 0f))))))))
inverse(vh, vw, p, co) += ((bgemm[r_a, r_b, p, co]*A[r_a, vh])*A[r_b, vw])
conv2d_winograd(n, h, w, co) = inverse[floormod(h, 2), floormod(w, 2), ((((n*7)*7) + (floordiv(h, 2)*7)) + floordiv(w, 2)), co]
placeholder = PLACEHOLDER [1, 1, 1, 256]
T_add(ax0, ax1, ax2, ax3) = (conv2d_winograd[ax0, ax1, ax2, ax3] + placeholder[ax0, 0, 0, ax3])
T_relu(ax0, ax1, ax2, ax3) = max(T_add[ax0, ax1, ax2, ax3], 0f)

========== Task 4 (workload key: ["e4cdf917b876dbdd64488c3818d9c141", [1, 28, 28, 128], [4, 4, 128, 128], [1, 1, 1, 128], [1, 28, 28, 128]]) ==========
placeholder = PLACEHOLDER [1, 28, 28, 128]
data_pad(i0, i1, i2, i3) = tir.if_then_else(((((i1 >= 1) && (i1 < 29)) && (i2 >= 1)) && (i2 < 29)), placeholder[i0, (i1 - 1), (i2 - 1), i3], 0f)
input_tile(eps, nu, p, ci) = data_pad[floordiv(p, 196), ((floormod(floordiv(p, 14), 14)*2) + eps), ((floormod(p, 14)*2) + nu), ci]
B(i, j) = select(((floormod(i, 4) == 3) && (floormod(j, 4) == 3)), 1f, select(((floormod(i, 4) == 3) && (floormod(j, 4) == 2)),  ..(OMITTED).. ormod(i, 4) == 0) && (floormod(j, 4) == 1)), 0f, select(((floormod(i, 4) == 0) && (floormod(j, 4) == 0)), 1f, 0f))))))))))))))))
data_pack(eps, nu, p, ci) += ((input_tile[r_a, r_b, p, ci]*B[r_a, eps])*B[r_b, nu])
placeholder = PLACEHOLDER [4, 4, 128, 128]
bgemm(eps, nu, p, co) += (data_pack[eps, nu, p, ci]*placeholder[eps, nu, co, ci])
A(i, j) = select(((floormod(i, 4) == 3) && (floormod(j, 2) == 1)), 1f, select(((floormod(i, 4) == 3) && (floormod(j, 2) == 0)),  ..(OMITTED).. ct(((floormod(i, 4) == 0) && (floormod(j, 2) == 1)), 0f, select(((floormod(i, 4) == 0) && (floormod(j, 2) == 0)), 1f, 0f))))))))
inverse(vh, vw, p, co) += ((bgemm[r_a, r_b, p, co]*A[r_a, vh])*A[r_b, vw])
conv2d_winograd(n, h, w, co) = inverse[floormod(h, 2), floormod(w, 2), ((((n*14)*14) + (floordiv(h, 2)*14)) + floordiv(w, 2)), co]
placeholder = PLACEHOLDER [1, 1, 1, 128]
T_add(ax0, ax1, ax2, ax3) = (conv2d_winograd[ax0, ax1, ax2, ax3] + placeholder[ax0, 0, 0, ax3])
T_relu(ax0, ax1, ax2, ax3) = max(T_add[ax0, ax1, ax2, ax3], 0f)

========== Task 5 (workload key: ["d730bcd28f0920f6b97245e2a11bd8d6", [1, 7, 7, 512], [4, 4, 512, 512], [1, 7, 7, 512], [1, 7, 7, 512]]) ==========
placeholder = PLACEHOLDER [1, 7, 7, 512]
data_pad(i0, i1, i2, i3) = tir.if_then_else(((((i1 >= 1) && (i1 < 8)) && (i2 >= 1)) && (i2 < 8)), placeholder[i0, (i1 - 1), (i2 - 1), i3], 0f)
input_tile(eps, nu, p, ci) = data_pad[floordiv(p, 16), ((floormod(floordiv(p, 4), 4)*2) + eps), ((floormod(p, 4)*2) + nu), ci]
B(i, j) = select(((floormod(i, 4) == 3) && (floormod(j, 4) == 3)), 1f, select(((floormod(i, 4) == 3) && (floormod(j, 4) == 2)),  ..(OMITTED).. ormod(i, 4) == 0) && (floormod(j, 4) == 1)), 0f, select(((floormod(i, 4) == 0) && (floormod(j, 4) == 0)), 1f, 0f))))))))))))))))
data_pack(eps, nu, p, ci) += ((input_tile[r_a, r_b, p, ci]*B[r_a, eps])*B[r_b, nu])
placeholder = PLACEHOLDER [4, 4, 512, 512]
bgemm(eps, nu, p, co) += (data_pack[eps, nu, p, ci]*placeholder[eps, nu, co, ci])
A(i, j) = select(((floormod(i, 4) == 3) && (floormod(j, 2) == 1)), 1f, select(((floormod(i, 4) == 3) && (floormod(j, 2) == 0)),  ..(OMITTED).. ct(((floormod(i, 4) == 0) && (floormod(j, 2) == 1)), 0f, select(((floormod(i, 4) == 0) && (floormod(j, 2) == 0)), 1f, 0f))))))))
inverse(vh, vw, p, co) += ((bgemm[r_a, r_b, p, co]*A[r_a, vh])*A[r_b, vw])
conv2d_winograd(n, h, w, co) = inverse[floormod(h, 2), floormod(w, 2), ((((n*4)*4) + (floordiv(h, 2)*4)) + floordiv(w, 2)), co]
placeholder = PLACEHOLDER [1, 7, 7, 512]
T_add(ax0, ax1, ax2, ax3) = (conv2d_winograd[ax0, ax1, ax2, ax3] + placeholder[ax0, ax1, ax2, ax3])

========== Task 6 (workload key: ["b818b53148cd450f86569dfc3e04cb8a", [1, 56, 56, 64], [6, 6, 64, 64], [1, 1, 1, 64], [1, 56, 56, 64]]) ==========
placeholder = PLACEHOLDER [1, 56, 56, 64]
data_pad(i0, i1, i2, i3) = tir.if_then_else(((((i1 >= 1) && (i1 < 57)) && (i2 >= 1)) && (i2 < 57)), placeholder[i0, (i1 - 1), (i2 - 1), i3], 0f)
input_tile(eps, nu, p, ci) = data_pad[floordiv(p, 196), ((floormod(floordiv(p, 14), 14)*4) + eps), ((floormod(p, 14)*4) + nu), ci]
B(i, j) = select(((floormod(i, 6) == 5) && (floormod(j, 6) == 5)), 1f, select(((floormod(i, 6) == 5) && (floormod(j, 6) == 4)),  ..(OMITTED).. (floormod(j, 6) == 1)), 0f, select(((floormod(i, 6) == 0) && (floormod(j, 6) == 0)), 1f, 0f))))))))))))))))))))))))))))))))))))
data_pack(eps, nu, p, ci) += ((input_tile[r_a, r_b, p, ci]*B[r_a, eps])*B[r_b, nu])
placeholder = PLACEHOLDER [6, 6, 64, 64]
bgemm(eps, nu, p, co) += (data_pack[eps, nu, p, ci]*placeholder[eps, nu, co, ci])
A(i, j) = select(((floormod(i, 6) == 5) && (floormod(j, 4) == 3)), 1f, select(((floormod(i, 6) == 5) && (floormod(j, 4) == 2)),  ..(OMITTED).. 6) == 0) && (floormod(j, 4) == 1)), 0f, select(((floormod(i, 6) == 0) && (floormod(j, 4) == 0)), 1f, 0f))))))))))))))))))))))))
inverse(vh, vw, p, co) += ((bgemm[r_a, r_b, p, co]*A[r_a, vh])*A[r_b, vw])
conv2d_winograd(n, h, w, co) = inverse[floormod(h, 4), floormod(w, 4), ((((n*14)*14) + (floordiv(h, 4)*14)) + floordiv(w, 4)), co]
placeholder = PLACEHOLDER [1, 1, 1, 64]
T_add(ax0, ax1, ax2, ax3) = (conv2d_winograd[ax0, ax1, ax2, ax3] + placeholder[ax0, 0, 0, ax3])
T_relu(ax0, ax1, ax2, ax3) = max(T_add[ax0, ax1, ax2, ax3], 0f)

========== Task 7 (workload key: ["ad6cecbf5d85cb1cda3c2bb7af170211", [1, 7, 7, 512], [4, 4, 512, 512], [1, 7, 7, 512], [1, 1, 1, 512], [1, 1, 1, 512], [1, 7, 7, 512]]) ==========
placeholder = PLACEHOLDER [1, 7, 7, 512]
data_pad(i0, i1, i2, i3) = tir.if_then_else(((((i1 >= 1) && (i1 < 8)) && (i2 >= 1)) && (i2 < 8)), placeholder[i0, (i1 - 1), (i2 - 1), i3], 0f)
input_tile(eps, nu, p, ci) = data_pad[floordiv(p, 16), ((floormod(floordiv(p, 4), 4)*2) + eps), ((floormod(p, 4)*2) + nu), ci]
B(i, j) = select(((floormod(i, 4) == 3) && (floormod(j, 4) == 3)), 1f, select(((floormod(i, 4) == 3) && (floormod(j, 4) == 2)),  ..(OMITTED).. ormod(i, 4) == 0) && (floormod(j, 4) == 1)), 0f, select(((floormod(i, 4) == 0) && (floormod(j, 4) == 0)), 1f, 0f))))))))))))))))
data_pack(eps, nu, p, ci) += ((input_tile[r_a, r_b, p, ci]*B[r_a, eps])*B[r_b, nu])
placeholder = PLACEHOLDER [4, 4, 512, 512]
bgemm(eps, nu, p, co) += (data_pack[eps, nu, p, ci]*placeholder[eps, nu, co, ci])
A(i, j) = select(((floormod(i, 4) == 3) && (floormod(j, 2) == 1)), 1f, select(((floormod(i, 4) == 3) && (floormod(j, 2) == 0)),  ..(OMITTED).. ct(((floormod(i, 4) == 0) && (floormod(j, 2) == 1)), 0f, select(((floormod(i, 4) == 0) && (floormod(j, 2) == 0)), 1f, 0f))))))))
inverse(vh, vw, p, co) += ((bgemm[r_a, r_b, p, co]*A[r_a, vh])*A[r_b, vw])
conv2d_winograd(n, h, w, co) = inverse[floormod(h, 2), floormod(w, 2), ((((n*4)*4) + (floordiv(h, 2)*4)) + floordiv(w, 2)), co]
placeholder = PLACEHOLDER [1, 7, 7, 512]
T_add(ax0, ax1, ax2, ax3) = (conv2d_winograd[ax0, ax1, ax2, ax3] + placeholder[ax0, ax1, ax2, ax3])
placeholder = PLACEHOLDER [1, 1, 1, 512]
T_multiply(ax0, ax1, ax2, ax3) = (T_add[ax0, ax1, ax2, ax3]*placeholder[ax0, 0, 0, ax3])
placeholder = PLACEHOLDER [1, 1, 1, 512]
T_add(ax0, ax1, ax2, ax3) = (T_multiply[ax0, ax1, ax2, ax3] + placeholder[ax0, 0, 0, ax3])
T_relu(ax0, ax1, ax2, ax3) = max(T_add[ax0, ax1, ax2, ax3], 0f)

========== Task 8 (workload key: ["f3b6c10fcc6ce01ff01add933e4d21e9", [1, 14, 14, 256], [4, 4, 256, 256], [1, 14, 14, 256], [1, 1, 1, 256], [1, 14, 14, 256]]) ==========
placeholder = PLACEHOLDER [1, 14, 14, 256]
data_pad(i0, i1, i2, i3) = tir.if_then_else(((((i1 >= 1) && (i1 < 15)) && (i2 >= 1)) && (i2 < 15)), placeholder[i0, (i1 - 1), (i2 - 1), i3], 0f)
input_tile(eps, nu, p, ci) = data_pad[floordiv(p, 49), ((floormod(floordiv(p, 7), 7)*2) + eps), ((floormod(p, 7)*2) + nu), ci]
B(i, j) = select(((floormod(i, 4) == 3) && (floormod(j, 4) == 3)), 1f, select(((floormod(i, 4) == 3) && (floormod(j, 4) == 2)),  ..(OMITTED).. ormod(i, 4) == 0) && (floormod(j, 4) == 1)), 0f, select(((floormod(i, 4) == 0) && (floormod(j, 4) == 0)), 1f, 0f))))))))))))))))
data_pack(eps, nu, p, ci) += ((input_tile[r_a, r_b, p, ci]*B[r_a, eps])*B[r_b, nu])
placeholder = PLACEHOLDER [4, 4, 256, 256]
bgemm(eps, nu, p, co) += (data_pack[eps, nu, p, ci]*placeholder[eps, nu, co, ci])
A(i, j) = select(((floormod(i, 4) == 3) && (floormod(j, 2) == 1)), 1f, select(((floormod(i, 4) == 3) && (floormod(j, 2) == 0)),  ..(OMITTED).. ct(((floormod(i, 4) == 0) && (floormod(j, 2) == 1)), 0f, select(((floormod(i, 4) == 0) && (floormod(j, 2) == 0)), 1f, 0f))))))))
inverse(vh, vw, p, co) += ((bgemm[r_a, r_b, p, co]*A[r_a, vh])*A[r_b, vw])
conv2d_winograd(n, h, w, co) = inverse[floormod(h, 2), floormod(w, 2), ((((n*7)*7) + (floordiv(h, 2)*7)) + floordiv(w, 2)), co]
placeholder = PLACEHOLDER [1, 14, 14, 256]
T_add(ax0, ax1, ax2, ax3) = (conv2d_winograd[ax0, ax1, ax2, ax3] + placeholder[ax0, ax1, ax2, ax3])
placeholder = PLACEHOLDER [1, 1, 1, 256]
T_add(ax0, ax1, ax2, ax3) = (T_add[ax0, ax1, ax2, ax3] + placeholder[ax0, 0, 0, ax3])
T_relu(ax0, ax1, ax2, ax3) = max(T_add[ax0, ax1, ax2, ax3], 0f)

========== Task 9 (workload key: ["d7b65649a4dd54becea0a52aabbc5af5", [1, 1000], [1, 1000]]) ==========
placeholder = PLACEHOLDER [1, 1000]
T_softmax_maxelem(i0) max= placeholder[i0, k]
T_softmax_exp(i0, i1) = tir.exp((placeholder[i0, i1] - T_softmax_maxelem[i0]))
T_softmax_expsum(i0) += T_softmax_exp[i0, k]
T_softmax_norm(i0, i1) = (T_softmax_exp[i0, i1]/T_softmax_expsum[i0])

========== Task 10 (workload key: ["69115f188984ae34ede37c3b8ca40b43", [1, 7, 7, 512], [1, 1, 1, 512]]) ==========
placeholder = PLACEHOLDER [1, 7, 7, 512]
tensor(ax0, ax1, ax2, ax3) += placeholder[ax0, ((ax1*7) + rv0), ((ax2*7) + rv1), ax3]
tensor(ax0, ax1, ax2, ax3) = (tensor[ax0, ax1, ax2, ax3]/(float32((select((bool)1, ((ax1 + 1)*7), (((ax1 + 1)*7) + 1)) - (ax1*7)))*float32((select((bool)1, ((ax2 + 1)*7), (((ax2 + 1)*7) + 1)) - (ax2*7)))))

========== Task 11 (workload key: ["3a69f9fbc63760d99e36b4c17b3bfc57", [1, 7, 7, 512], [4, 4, 512, 512], [1, 1, 1, 512], [1, 7, 7, 512]]) ==========
placeholder = PLACEHOLDER [1, 7, 7, 512]
data_pad(i0, i1, i2, i3) = tir.if_then_else(((((i1 >= 1) && (i1 < 8)) && (i2 >= 1)) && (i2 < 8)), placeholder[i0, (i1 - 1), (i2 - 1), i3], 0f)
input_tile(eps, nu, p, ci) = data_pad[floordiv(p, 16), ((floormod(floordiv(p, 4), 4)*2) + eps), ((floormod(p, 4)*2) + nu), ci]
B(i, j) = select(((floormod(i, 4) == 3) && (floormod(j, 4) == 3)), 1f, select(((floormod(i, 4) == 3) && (floormod(j, 4) == 2)),  ..(OMITTED).. ormod(i, 4) == 0) && (floormod(j, 4) == 1)), 0f, select(((floormod(i, 4) == 0) && (floormod(j, 4) == 0)), 1f, 0f))))))))))))))))
data_pack(eps, nu, p, ci) += ((input_tile[r_a, r_b, p, ci]*B[r_a, eps])*B[r_b, nu])
placeholder = PLACEHOLDER [4, 4, 512, 512]
bgemm(eps, nu, p, co) += (data_pack[eps, nu, p, ci]*placeholder[eps, nu, co, ci])
A(i, j) = select(((floormod(i, 4) == 3) && (floormod(j, 2) == 1)), 1f, select(((floormod(i, 4) == 3) && (floormod(j, 2) == 0)),  ..(OMITTED).. ct(((floormod(i, 4) == 0) && (floormod(j, 2) == 1)), 0f, select(((floormod(i, 4) == 0) && (floormod(j, 2) == 0)), 1f, 0f))))))))
inverse(vh, vw, p, co) += ((bgemm[r_a, r_b, p, co]*A[r_a, vh])*A[r_b, vw])
conv2d_winograd(n, h, w, co) = inverse[floormod(h, 2), floormod(w, 2), ((((n*4)*4) + (floordiv(h, 2)*4)) + floordiv(w, 2)), co]
placeholder = PLACEHOLDER [1, 1, 1, 512]
T_add(ax0, ax1, ax2, ax3) = (conv2d_winograd[ax0, ax1, ax2, ax3] + placeholder[ax0, 0, 0, ax3])
T_relu(ax0, ax1, ax2, ax3) = max(T_add[ax0, ax1, ax2, ax3], 0f)

========== Task 12 (workload key: ["06f578e6519a86e85028eecf4de64b25", [1, 28, 28, 128], [1, 1, 128, 256], [1, 14, 14, 256]]) ==========
placeholder = PLACEHOLDER [1, 28, 28, 128]
pad_temp(i0, i1, i2, i3) = placeholder[i0, i1, i2, i3]
placeholder = PLACEHOLDER [1, 1, 128, 256]
conv2d_nhwc(nn, yy, xx, ff) += (pad_temp[nn, ((yy*2) + ry), ((xx*2) + rx), rc]*placeholder[ry, rx, rc, ff])

========== Task 13 (workload key: ["96daaa9daa1b41bc383b7c05ce8b58de", [1, 14, 14, 256], [3, 3, 256, 512], [1, 1, 1, 512], [1, 7, 7, 512]]) ==========
placeholder = PLACEHOLDER [1, 14, 14, 256]
pad_temp(i0, i1, i2, i3) = tir.if_then_else(((((i1 >= 1) && (i1 < 15)) && (i2 >= 1)) && (i2 < 15)), placeholder[i0, (i1 - 1), (i2 - 1), i3], 0f)
placeholder = PLACEHOLDER [3, 3, 256, 512]
conv2d_nhwc(nn, yy, xx, ff) += (pad_temp[nn, ((yy*2) + ry), ((xx*2) + rx), rc]*placeholder[ry, rx, rc, ff])
placeholder = PLACEHOLDER [1, 1, 1, 512]
T_add(ax0, ax1, ax2, ax3) = (conv2d_nhwc[ax0, ax1, ax2, ax3] + placeholder[ax0, 0, 0, ax3])
T_relu(ax0, ax1, ax2, ax3) = max(T_add[ax0, ax1, ax2, ax3], 0f)

========== Task 14 (workload key: ["dac19035dd5fe9424ee8617421b9c817", [1, 28, 28, 128], [4, 4, 128, 128], [1, 28, 28, 128], [1, 28, 28, 128]]) ==========
placeholder = PLACEHOLDER [1, 28, 28, 128]
data_pad(i0, i1, i2, i3) = tir.if_then_else(((((i1 >= 1) && (i1 < 29)) && (i2 >= 1)) && (i2 < 29)), placeholder[i0, (i1 - 1), (i2 - 1), i3], 0f)
input_tile(eps, nu, p, ci) = data_pad[floordiv(p, 196), ((floormod(floordiv(p, 14), 14)*2) + eps), ((floormod(p, 14)*2) + nu), ci]
B(i, j) = select(((floormod(i, 4) == 3) && (floormod(j, 4) == 3)), 1f, select(((floormod(i, 4) == 3) && (floormod(j, 4) == 2)),  ..(OMITTED).. ormod(i, 4) == 0) && (floormod(j, 4) == 1)), 0f, select(((floormod(i, 4) == 0) && (floormod(j, 4) == 0)), 1f, 0f))))))))))))))))
data_pack(eps, nu, p, ci) += ((input_tile[r_a, r_b, p, ci]*B[r_a, eps])*B[r_b, nu])
placeholder = PLACEHOLDER [4, 4, 128, 128]
bgemm(eps, nu, p, co) += (data_pack[eps, nu, p, ci]*placeholder[eps, nu, co, ci])
A(i, j) = select(((floormod(i, 4) == 3) && (floormod(j, 2) == 1)), 1f, select(((floormod(i, 4) == 3) && (floormod(j, 2) == 0)),  ..(OMITTED).. ct(((floormod(i, 4) == 0) && (floormod(j, 2) == 1)), 0f, select(((floormod(i, 4) == 0) && (floormod(j, 2) == 0)), 1f, 0f))))))))
inverse(vh, vw, p, co) += ((bgemm[r_a, r_b, p, co]*A[r_a, vh])*A[r_b, vw])
conv2d_winograd(n, h, w, co) = inverse[floormod(h, 2), floormod(w, 2), ((((n*14)*14) + (floordiv(h, 2)*14)) + floordiv(w, 2)), co]
placeholder = PLACEHOLDER [1, 28, 28, 128]
T_add(ax0, ax1, ax2, ax3) = (conv2d_winograd[ax0, ax1, ax2, ax3] + placeholder[ax0, ax1, ax2, ax3])

========== Task 15 (workload key: ["96daaa9daa1b41bc383b7c05ce8b58de", [1, 28, 28, 128], [3, 3, 128, 256], [1, 1, 1, 256], [1, 14, 14, 256]]) ==========
placeholder = PLACEHOLDER [1, 28, 28, 128]
pad_temp(i0, i1, i2, i3) = tir.if_then_else(((((i1 >= 1) && (i1 < 29)) && (i2 >= 1)) && (i2 < 29)), placeholder[i0, (i1 - 1), (i2 - 1), i3], 0f)
placeholder = PLACEHOLDER [3, 3, 128, 256]
conv2d_nhwc(nn, yy, xx, ff) += (pad_temp[nn, ((yy*2) + ry), ((xx*2) + rx), rc]*placeholder[ry, rx, rc, ff])
placeholder = PLACEHOLDER [1, 1, 1, 256]
T_add(ax0, ax1, ax2, ax3) = (conv2d_nhwc[ax0, ax1, ax2, ax3] + placeholder[ax0, 0, 0, ax3])
T_relu(ax0, ax1, ax2, ax3) = max(T_add[ax0, ax1, ax2, ax3], 0f)

========== Task 16 (workload key: ["1e3c4211ffd2f2db91078ae4d04b779d", [1, 56, 56, 64], [6, 6, 64, 64], [1, 56, 56, 64], [1, 1, 1, 64], [1, 56, 56, 64]]) ==========
placeholder = PLACEHOLDER [1, 56, 56, 64]
data_pad(i0, i1, i2, i3) = tir.if_then_else(((((i1 >= 1) && (i1 < 57)) && (i2 >= 1)) && (i2 < 57)), placeholder[i0, (i1 - 1), (i2 - 1), i3], 0f)
input_tile(eps, nu, p, ci) = data_pad[floordiv(p, 196), ((floormod(floordiv(p, 14), 14)*4) + eps), ((floormod(p, 14)*4) + nu), ci]
B(i, j) = select(((floormod(i, 6) == 5) && (floormod(j, 6) == 5)), 1f, select(((floormod(i, 6) == 5) && (floormod(j, 6) == 4)),  ..(OMITTED).. (floormod(j, 6) == 1)), 0f, select(((floormod(i, 6) == 0) && (floormod(j, 6) == 0)), 1f, 0f))))))))))))))))))))))))))))))))))))
data_pack(eps, nu, p, ci) += ((input_tile[r_a, r_b, p, ci]*B[r_a, eps])*B[r_b, nu])
placeholder = PLACEHOLDER [6, 6, 64, 64]
bgemm(eps, nu, p, co) += (data_pack[eps, nu, p, ci]*placeholder[eps, nu, co, ci])
A(i, j) = select(((floormod(i, 6) == 5) && (floormod(j, 4) == 3)), 1f, select(((floormod(i, 6) == 5) && (floormod(j, 4) == 2)),  ..(OMITTED).. 6) == 0) && (floormod(j, 4) == 1)), 0f, select(((floormod(i, 6) == 0) && (floormod(j, 4) == 0)), 1f, 0f))))))))))))))))))))))))
inverse(vh, vw, p, co) += ((bgemm[r_a, r_b, p, co]*A[r_a, vh])*A[r_b, vw])
conv2d_winograd(n, h, w, co) = inverse[floormod(h, 4), floormod(w, 4), ((((n*14)*14) + (floordiv(h, 4)*14)) + floordiv(w, 4)), co]
placeholder = PLACEHOLDER [1, 56, 56, 64]
T_add(ax0, ax1, ax2, ax3) = (conv2d_winograd[ax0, ax1, ax2, ax3] + placeholder[ax0, ax1, ax2, ax3])
placeholder = PLACEHOLDER [1, 1, 1, 64]
T_add(ax0, ax1, ax2, ax3) = (T_add[ax0, ax1, ax2, ax3] + placeholder[ax0, 0, 0, ax3])
T_relu(ax0, ax1, ax2, ax3) = max(T_add[ax0, ax1, ax2, ax3], 0f)

========== Task 17 (workload key: ["96daaa9daa1b41bc383b7c05ce8b58de", [1, 224, 224, 3], [7, 7, 3, 64], [1, 1, 1, 64], [1, 112, 112, 64]]) ==========
placeholder = PLACEHOLDER [1, 224, 224, 3]
pad_temp(i0, i1, i2, i3) = tir.if_then_else(((((i1 >= 3) && (i1 < 227)) && (i2 >= 3)) && (i2 < 227)), placeholder[i0, (i1 - 3), (i2 - 3), i3], 0f)
placeholder = PLACEHOLDER [7, 7, 3, 64]
conv2d_nhwc(nn, yy, xx, ff) += (pad_temp[nn, ((yy*2) + ry), ((xx*2) + rx), rc]*placeholder[ry, rx, rc, ff])
placeholder = PLACEHOLDER [1, 1, 1, 64]
T_add(ax0, ax1, ax2, ax3) = (conv2d_nhwc[ax0, ax1, ax2, ax3] + placeholder[ax0, 0, 0, ax3])
T_relu(ax0, ax1, ax2, ax3) = max(T_add[ax0, ax1, ax2, ax3], 0f)

========== Task 18 (workload key: ["3ea73fb9b0364374730d09e068821f95", [1, 56, 56, 64], [6, 6, 64, 64], [1, 56, 56, 64], [1, 56, 56, 64]]) ==========
placeholder = PLACEHOLDER [1, 56, 56, 64]
data_pad(i0, i1, i2, i3) = tir.if_then_else(((((i1 >= 1) && (i1 < 57)) && (i2 >= 1)) && (i2 < 57)), placeholder[i0, (i1 - 1), (i2 - 1), i3], 0f)
input_tile(eps, nu, p, ci) = data_pad[floordiv(p, 196), ((floormod(floordiv(p, 14), 14)*4) + eps), ((floormod(p, 14)*4) + nu), ci]
B(i, j) = select(((floormod(i, 6) == 5) && (floormod(j, 6) == 5)), 1f, select(((floormod(i, 6) == 5) && (floormod(j, 6) == 4)),  ..(OMITTED).. (floormod(j, 6) == 1)), 0f, select(((floormod(i, 6) == 0) && (floormod(j, 6) == 0)), 1f, 0f))))))))))))))))))))))))))))))))))))
data_pack(eps, nu, p, ci) += ((input_tile[r_a, r_b, p, ci]*B[r_a, eps])*B[r_b, nu])
placeholder = PLACEHOLDER [6, 6, 64, 64]
bgemm(eps, nu, p, co) += (data_pack[eps, nu, p, ci]*placeholder[eps, nu, co, ci])
A(i, j) = select(((floormod(i, 6) == 5) && (floormod(j, 4) == 3)), 1f, select(((floormod(i, 6) == 5) && (floormod(j, 4) == 2)),  ..(OMITTED).. 6) == 0) && (floormod(j, 4) == 1)), 0f, select(((floormod(i, 6) == 0) && (floormod(j, 4) == 0)), 1f, 0f))))))))))))))))))))))))
inverse(vh, vw, p, co) += ((bgemm[r_a, r_b, p, co]*A[r_a, vh])*A[r_b, vw])
conv2d_winograd(n, h, w, co) = inverse[floormod(h, 4), floormod(w, 4), ((((n*14)*14) + (floordiv(h, 4)*14)) + floordiv(w, 4)), co]
placeholder = PLACEHOLDER [1, 56, 56, 64]
T_add(ax0, ax1, ax2, ax3) = (conv2d_winograd[ax0, ax1, ax2, ax3] + placeholder[ax0, ax1, ax2, ax3])

========== Task 19 (workload key: ["d374e472bd9d8164892b9e28a0a8cb59", [1, 14, 14, 256], [4, 4, 256, 256], [1, 14, 14, 256], [1, 14, 14, 256]]) ==========
placeholder = PLACEHOLDER [1, 14, 14, 256]
data_pad(i0, i1, i2, i3) = tir.if_then_else(((((i1 >= 1) && (i1 < 15)) && (i2 >= 1)) && (i2 < 15)), placeholder[i0, (i1 - 1), (i2 - 1), i3], 0f)
input_tile(eps, nu, p, ci) = data_pad[floordiv(p, 49), ((floormod(floordiv(p, 7), 7)*2) + eps), ((floormod(p, 7)*2) + nu), ci]
B(i, j) = select(((floormod(i, 4) == 3) && (floormod(j, 4) == 3)), 1f, select(((floormod(i, 4) == 3) && (floormod(j, 4) == 2)),  ..(OMITTED).. ormod(i, 4) == 0) && (floormod(j, 4) == 1)), 0f, select(((floormod(i, 4) == 0) && (floormod(j, 4) == 0)), 1f, 0f))))))))))))))))
data_pack(eps, nu, p, ci) += ((input_tile[r_a, r_b, p, ci]*B[r_a, eps])*B[r_b, nu])
placeholder = PLACEHOLDER [4, 4, 256, 256]
bgemm(eps, nu, p, co) += (data_pack[eps, nu, p, ci]*placeholder[eps, nu, co, ci])
A(i, j) = select(((floormod(i, 4) == 3) && (floormod(j, 2) == 1)), 1f, select(((floormod(i, 4) == 3) && (floormod(j, 2) == 0)),  ..(OMITTED).. ct(((floormod(i, 4) == 0) && (floormod(j, 2) == 1)), 0f, select(((floormod(i, 4) == 0) && (floormod(j, 2) == 0)), 1f, 0f))))))))
inverse(vh, vw, p, co) += ((bgemm[r_a, r_b, p, co]*A[r_a, vh])*A[r_b, vw])
conv2d_winograd(n, h, w, co) = inverse[floormod(h, 2), floormod(w, 2), ((((n*7)*7) + (floordiv(h, 2)*7)) + floordiv(w, 2)), co]
placeholder = PLACEHOLDER [1, 14, 14, 256]
T_add(ax0, ax1, ax2, ax3) = (conv2d_winograd[ax0, ax1, ax2, ax3] + placeholder[ax0, ax1, ax2, ax3])

========== Task 20 (workload key: ["64b98c71af70a904fdbb81d7d4188d84", [1, 112, 112, 64], [1, 1, 1, 64], [1, 56, 56, 64]]) ==========
placeholder = PLACEHOLDER [1, 112, 112, 64]
pad_temp(ax0, ax1, ax2, ax3) = tir.if_then_else(((((ax1 >= 1) && (ax1 < 113)) && (ax2 >= 1)) && (ax2 < 113)), placeholder[ax0, (ax1 - 1), (ax2 - 1), ax3], -3.40282e+38f)
tensor(ax0, ax1, ax2, ax3) max= pad_temp[ax0, ((ax1*2) + rv0), ((ax2*2) + rv1), ax3]
placeholder = PLACEHOLDER [1, 1, 1, 64]
T_add(ax0, ax1, ax2, ax3) = (tensor[ax0, ax1, ax2, ax3] + placeholder[ax0, 0, 0, ax3])
T_relu(ax0, ax1, ax2, ax3) = max(T_add[ax0, ax1, ax2, ax3], 0f)

========== Task 21 (workload key: ["06f578e6519a86e85028eecf4de64b25", [1, 14, 14, 256], [1, 1, 256, 512], [1, 7, 7, 512]]) ==========
placeholder = PLACEHOLDER [1, 14, 14, 256]
pad_temp(i0, i1, i2, i3) = placeholder[i0, i1, i2, i3]
placeholder = PLACEHOLDER [1, 1, 256, 512]
conv2d_nhwc(nn, yy, xx, ff) += (pad_temp[nn, ((yy*2) + ry), ((xx*2) + rx), rc]*placeholder[ry, rx, rc, ff])

========== Task 22 (workload key: ["7d44c6e3c81cd80f61ff2265b2bae89a", [1, 512], [1000, 512], [1, 1000], [1, 1000]]) ==========
placeholder = PLACEHOLDER [1, 512]
placeholder = PLACEHOLDER [1000, 512]
T_matmul_NT(i, j) += (placeholder[i, k]*placeholder[j, k])
placeholder = PLACEHOLDER [1, 1000]
T_add(ax0, ax1) = (T_matmul_NT[ax0, ax1] + placeholder[ax0, ax1])

========== Task 23 (workload key: ["96daaa9daa1b41bc383b7c05ce8b58de", [1, 56, 56, 64], [3, 3, 64, 128], [1, 1, 1, 128], [1, 28, 28, 128]]) ==========
placeholder = PLACEHOLDER [1, 56, 56, 64]
pad_temp(i0, i1, i2, i3) = tir.if_then_else(((((i1 >= 1) && (i1 < 57)) && (i2 >= 1)) && (i2 < 57)), placeholder[i0, (i1 - 1), (i2 - 1), i3], 0f)
placeholder = PLACEHOLDER [3, 3, 64, 128]
conv2d_nhwc(nn, yy, xx, ff) += (pad_temp[nn, ((yy*2) + ry), ((xx*2) + rx), rc]*placeholder[ry, rx, rc, ff])
placeholder = PLACEHOLDER [1, 1, 1, 128]
T_add(ax0, ax1, ax2, ax3) = (conv2d_nhwc[ax0, ax1, ax2, ax3] + placeholder[ax0, 0, 0, ax3])
T_relu(ax0, ax1, ax2, ax3) = max(T_add[ax0, ax1, ax2, ax3], 0f)
```
Begin Tuning

Next, set some options for tuning and launch the search tasks.

```python
def run_tuning():
    print("Begin tuning...")
    measure_ctx = auto_scheduler.LocalRPCMeasureContext(repeat=1, min_repeat_ms=300, timeout=10)

    tuner = auto_scheduler.TaskScheduler(tasks, task_weights)
    tune_option = auto_scheduler.TuningOptions(
        num_measure_trials=200,  # change this to 20000 to achieve the best performance
        runner=measure_ctx.runner,
        measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
    )

    tuner.tune(tune_option)


# We do not run the tuning on the webpage server because it takes too long.
# Uncomment the following line to run it yourself.
# run_tuning()
```
  Note
Explaining the information printed during tuning
During tuning, a lot of information is printed on the console for debugging purposes. The most important part is the output of the task scheduler; the following table is a sample output.


```
------------------------------  [ Task Scheduler ]
----------------------------------------------------------------
|  ID  | Latency (ms) | Speed (GFLOPS) | Trials |
-------------------------------------------------
|    0 |        0.005 |           0.88 |     64 |
|    1 |        0.010 |          99.10 |     64 |
|    2 |        0.006 |           0.00 |     64 |
|    3 |        0.145 |         979.78 |    384 |
|    4 |        0.130 |        1097.02 |    384 |
|    5 |        0.143 |         992.69 |    384 |
|    6 |        0.076 |        1526.86 |    192 |
|    7 |        0.115 |         999.44 |    320 |
|    8 |        0.079 |        1449.39 |    320 |
|    9 |        0.122 |         938.73 |    384 |
|   10 |        0.063 |        1832.98 |    192 |
|   11 |        0.072 |        1763.62 |    256 |
|   12 |        0.062 |        2036.40 |    192 |
|   13 |        0.068 |        1874.44 |    192 |
|   14 |        0.049 |        2346.50 |    128 |
|   15 |        0.076 |        1694.31 |    256 |
|   16 |        0.067 |        1933.30 |    448 |
|   17 |        0.076 |        1680.90 |    256 |
|   18 |        0.022 |          98.43 |     64 |
|   19 |        0.076 |        3112.55 |    192 |
|   20 |        0.013 |        2026.44 |     64 |
|   21 |        0.011 |        1136.69 |     64 |
|   22 |        0.013 |         992.47 |     64 |
|   23 |        0.020 |         627.56 |     64 |
-------------------------------------------------
Estimated total latency: 1.587 ms  Trials: 4992  Used time: 13296 s  Next ID: 3
```
  This table lists the latency and (estimated) speed of all tasks, as well as the trial allocation for each task. The last line prints the total weighted latency of these tasks, which serves as a rough estimate of the network's end-to-end execution time. The last line also prints the total number of measurement trials, the total time spent on auto-tuning, and the ID of the next task to tune.
You may also see some "tvm::Error"s, because the auto-scheduler will try some invalid schedules. You can safely ignore them as long as tuning continues to run, because these errors are isolated from the main process.
    Note
Terminating tuning early
You can terminate tuning early by forcibly killing this process. As long as the log file contains at least one valid schedule for each task, you will be able to do the compilation (the section below).
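To check whether that condition holds, you can count the valid records per workload key in the log. The sketch below assumes the auto-scheduler's JSON-lines log format, where each record stores the workload key under `["i"][0][0]` and a measurement error code under `["r"][1]` (0 means success); treat these paths as an implementation detail that may change between TVM versions.

```python
import json
from collections import Counter

def valid_records_per_task(log_path):
    """Count valid measurement records per workload key in a
    JSON-lines auto-scheduler tuning log."""
    counts = Counter()
    with open(log_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            rec = json.loads(line)
            workload_key = rec["i"][0][0]  # assumed record layout
            error_no = rec["r"][1]         # 0 means a successful measurement
            if error_no == 0:
                counts[workload_key] += 1
    return counts

# Demo on a tiny synthetic log (real logs are written by RecordToFile).
with open("demo_log.json", "w") as f:
    f.write(json.dumps({"i": [["task_a", "cuda"]], "r": [[0.1], 0]}) + "\n")
    f.write(json.dumps({"i": [["task_a", "cuda"]], "r": [[0.2], 2]}) + "\n")  # failed trial
    f.write(json.dumps({"i": [["task_b", "cuda"]], "r": [[0.3], 0]}) + "\n")

print(valid_records_per_task("demo_log.json"))
```

If any task's count is zero, let the tuner run a little longer before compiling.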
Compile and Evaluate

After auto-tuning, compile the network with the best schedules found. All measurement records are dumped into the log file during auto-tuning, so we can read the log file and load the best schedules.
```python
# Compile with the history best
print("Compile...")
with auto_scheduler.ApplyHistoryBest(log_file):
    with tvm.transform.PassContext(opt_level=3, config={"relay.backend.use_auto_scheduler": True}):
        lib = relay.build(mod, target=target, params=params)

# Create graph executor
dev = tvm.device(str(target), 0)
module = graph_executor.GraphModule(lib["default"](dev))
data_tvm = tvm.nd.array((np.random.uniform(size=input_shape)).astype(dtype))
module.set_input("data", data_tvm)

# Evaluate
print("Evaluate inference time cost...")
print(module.benchmark(dev, repeat=3, min_repeat_ms=500))
```
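For reference, the summary that module.benchmark prints boils down to ordinary statistics over the repeated timed runs. A plain-Python illustration with made-up numbers (not real measurements):

```python
import statistics

# Hypothetical per-run latencies (ms), standing in for the repeated
# measurements that module.benchmark collects.
runs_ms = [10.03, 9.97, 10.00]

print("mean   %.4f ms" % statistics.mean(runs_ms))
print("median %.4f ms" % statistics.median(runs_ms))
print("max    %.4f ms" % max(runs_ms))
print("min    %.4f ms" % min(runs_ms))
print("std    %.4f ms" % statistics.stdev(runs_ms))
```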
Output:
```
Compile...
/workspace/python/tvm/driver/build_module.py:268: UserWarning: target_host parameter is going to be deprecated. Please pass in tvm.target.Target(target, host=target_host) instead.
  "target_host parameter is going to be deprecated. "
Evaluate inference time cost...
Execution time summary:
 mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)
   10.0003       9.9944      10.0327       9.9738       0.0244
```
Other Tips

Download Python source code: tune_network_cuda.py
Download Jupyter Notebook: tune_network_cuda.ipynb
