Following the vigorous growth of neural networks (NNs), the deep learning compiler Tensor Virtual Machine (TVM) offers excellent deployment, compilation, and optimization capabilities with broad industry support. Its unified intermediate representation (IR) format provides efficient compilation and portability. However, its high operational complexity requires considerable development effort, so beginners with a programming background need a new, easy-to-use design approach. This paper proposes a visual concept approach that can execute artificial intelligence (AI) computing using block-based tools together with AI knowledge. This research also develops a web-based NNBlocks framework that uses this approach to integrate with TVM, and we conduct experiments to evaluate the approach.
Keywords: Blockly; Visualization; Visual block; Scheduling optimization; Intermediate representation; Neural network
Deep learning frameworks[
Deep learning compiler tool, Tensor Virtual Machine (TVM) [[
This research proposes a novel, easy-to-use design approach and creates a web-based NNBlocks framework that integrates TVM. The target users are beginners with a programming background. This approach uses the block-based Blockly[
This contribution focuses on proposing a visual design approach for deep learning compiler tools. The approach uses the extensible features of Blockly to create a design covering the Relay IR, the inference framework, and optimization. In addition, it emphasizes the integration of logical abstraction concepts in the design, rather than following traditional programming behaviors. The NNBlocks framework constructed with this approach follows the same abstraction to integrate TVM, a piece of system software with high operational complexity. Therefore, this approach can serve as a reference for the interface design of complex system software in the future.
This paper proposes a visual concept design approach that enables users with a programming background and deep learning knowledge to operate AI computing through Blockly. We create a web-based visualization framework, NNBlocks, that uses this approach and integrates it with TVM to implement an easy-to-use AI learning tool. This paper answers the following research questions through further evaluation:
- Intuition: Is the framework easy to learn?
- Usability: How usable is the framework?
- Significance: How significant is the framework for AI learning?
- Impact: Does the framework burden the system?
The remainder of this paper is organized as follows: Sect. 2 describes related work that includes a background overview of AI platforms and visual blocks, Sect. 3 introduces the design approach and framework, Sect. 4 introduces the evaluation methodology, Sect. 5 presents assessment results, Sect. 6 presents discussion, and conclusions are drawn in Sect. 7.
Currently, owing to the development of AI applications, AI platform designs fall into two major categories. One is the deep learning frameworks that use tensor libraries provided by specific vendors to support existing CPU or GPU environments [[
As illustrated in Fig. 1 (referenced from [[
Graph: Fig. 1 TVM stack with Relay IR [[
In the development of the deep learning compiler stack, the TVM and the Tensor Comprehensions [[
In addition, TVM can optimize accelerator devices in the supercomputing field and, through the Relay IR features, bring the best performance advantages to the automated optimization of high-performance computing libraries (such as linear algebra libraries). Tensor Comprehensions, by contrast, integrates only with the deep learning frameworks Caffe2 and PyTorch. Moreover, TVM can be deployed on diverse back-end hardware platforms such as CPUs, GPUs, and accelerators, whereas Tensor Comprehensions mainly targets CUDA and CPU hardware. Therefore, the support for optimizing supercomputing services and back-end hardware is the second reason to choose TVM as the back-end platform of our visualization tools.
The third reason for choosing TVM is performance evaluation. In TVM's paper [[
Previous research has investigated visualization methods combined with AI applications [[
TensorBoard[
Similarly, Netron[
Since deep learning involves a high degree of domain-specific knowledge, it is worth discussing how GUI operations can be made easy to use while still meeting professional needs. In this work, we choose TVM as the back-end platform of our GUI tools to further explore the visual design of Relay IR presentation and optimization functions. This paper provides a GUI tool solution for AI computing, and the effectiveness of AI learning can be explored through the use of block-based tools.
Graph: Fig. 2 The Blockly and environment
In the history of visual programming language development, two presentation methods have gradually emerged. One is block-based, stacking blocks through drag and drop, such as Blockly and Scratch [[
Since block-based tools are easy to use, they can further promote usability for young learners or beginners in applying domain-specific languages or development tools and help improve the effectiveness of learning [[
Blockly, proposed by Google in 2012 and illustrated in Fig. 2, is a web-based open-source editor for creating block-based visual programming languages. Blocks are stacked by dragging and dropping to generate code. From an application perspective, Blockly can be used to implement JavaScript, Python, PHP, Lua, Dart, XML (as indicated by red box 1 in Fig. 2), or other custom-defined languages. The categories of the design blocks (red box 2) include logic, loops, math, text, lists, color, variables, and functions. Because blocks can be self-defined, Blockly can be applied to various scenarios, such as providing programming environments [[
Portability, usability, and open-source availability are the primary considerations in this paper when choosing a visualization tool as the basis of our framework. BlocklyDuino[
Currently, the advantages of block-based applications are being applied to research involving highly domain-specific knowledge, such as GPUBlock proposed by Hwang et al. [[
Next, we further discuss how block-based applications are currently applied to machine learning, a field of highly specialized knowledge, such as Jatzlau et al. [[
Alturayeif et al. [[
After discussing the above works in the literature, this paper proposes an AI framework with a block-based GUI that allows switching windows to view the text-format content of the blocks. For the AI learning process, we need to provide highly customizable blocks and advanced optimization services for this framework, and we highlight its importance through usability surveys.
This section presents NNBlocks, which comprises the block design approach and framework architecture we designed for AI computing. The design approach is based on the TVM Relay IR and its system flow, and the blocks are designed to support mainstream NN models.
In this section, we describe the design principles of NNBlocks and give an overview of the blocks. In addition, the framework is extensible: it can be expanded in accordance with the functions provided by TVM, such as advanced functions for scheduling optimization.
Graph: Fig. 3 NNBlocks screenshot: a Blockly framework for TVM AI computing
Figure 3 shows the screen display of NNBlocks for TVM AI computing. Red box 1 in the figure indicates that the framework is based on three groups: neural network Relay IR, inference framework, and advanced blocks. The block shapes of these items are designed using the Google Blockly Developer Tools[
Graph: Fig. 4 The concept of BNF grammars for the Relay IR
We now introduce the key component of the system, the format of the Relay IR, which is a highly conceptualized IR-format language. This IR format can be described in terms of closures, recursion, conditionals, operators, and tensors using the Backus–Naur Form (BNF) grammar. Figure 4 illustrates the essential grammar in BNF. An NN model definition consists of a model declaration and its body. The model declaration includes a list of identifiers and type expressions, while the body describes the content of the model as a list of operators, where the left-hand side of each operator is a left-value expression and the right-hand side is a right-value expression. The full BNF grammar can be found in [[
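To make the shape of this grammar concrete, the following toy recognizer accepts a small Relay-IR-like text in plain Python. The sample text, the regular expressions, and the accepted subset are illustrative only; they do not reproduce the full BNF grammar of the Relay IR.

```python
import re

# Toy recognizer for a tiny subset of the grammar sketched in Fig. 4:
# a model declaration followed by a body of statements whose left-hand
# side is an identifier (%name) and whose right-hand side is an operator
# call. Both patterns and the sample text are illustrative only.
DECL = re.compile(r"^def @(\w+)\((.*)\)$")
STMT = re.compile(r"^(%\w+) = (\w[\w.]*)\((.*)\)$")

def parse_model(lines):
    """Return (model_name, [(lhs, operator)]) or raise ValueError."""
    m = DECL.match(lines[0])
    if not m:
        raise ValueError("missing model declaration")
    body = []
    for line in lines[1:]:
        s = STMT.match(line)
        if not s:
            raise ValueError("bad statement: " + line)
        body.append((s.group(1), s.group(2)))
    return m.group(1), body

sample = [
    "def @alexnet(%data: Tensor[(1, 3, 224, 224), float32])",
    "%conv0 = nn.conv2d(%data, %conv0_weight)",
    "%bias0 = nn.bias_add(%conv0, %conv0_bias)",
]
name, body = parse_model(sample)
```

Each recognized statement maps naturally to one visual block, which is the idea behind the block categories described next.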
The purpose is to display the structure of the Relay IR through visual blocks. Based on the Relay IR format, we briefly describe the following categories of neural network blocks:
- Graph and I/O blocks support initial data and I/O descriptions.
- Activation blocks support mathematical equations of the activation function. The purpose of these blocks is to make the output of the NN model as close as possible to the actual expected value after processing by the activation function.
- Arithmetic blocks support arithmetic operations. A tensor comprises multidimensional array data on which arithmetic operations are performed using operators such as addition, division, matrix multiplication, and multiplication.
- Tensor blocks support tensor shapeshifting and data construction.
- Computation blocks support convolution and pooling. From Cong and Xiao [12], we found that the convolution layers accounted for 90.7% of the total computation of an NN model, while the pooling layers contributed 9.15%.
- Normalization blocks support technical methods to improve accuracy for NN model inferencing.
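The dominance of convolution reported above can be sanity-checked with a rough multiply-accumulate (MAC) count. The layer shapes below are illustrative, loosely AlexNet-like, and are not taken from the paper's measurements.

```python
# Back-of-the-envelope cost of one conv layer versus one pooling layer,
# showing why convolution dominates total NN computation.
def conv_macs(out_h, out_w, out_c, in_c, k):
    # one MAC per (output position, output channel, input channel, kernel cell)
    return out_h * out_w * out_c * in_c * k * k

def pool_ops(out_h, out_w, c, k):
    # roughly one comparison/add per window cell per output position
    return out_h * out_w * c * k * k

conv = conv_macs(55, 55, 96, 3, 11)  # an AlexNet-like first conv layer
pool = pool_ops(27, 27, 96, 3)       # a following 3x3 pooling layer
```

With these hypothetical shapes, the convolution layer is more than two orders of magnitude heavier than the pooling layer, which is consistent with the 90.7% versus 9.15% split cited above.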
In the design of blocks, we provide a way to improve block stacking. To avoid long blocks like those in Fig. 5, we declare %alexnet1_conv0_bias as a variable. Figure 6A shows the blocks declared by the variable %alexnet1_conv0_bias (red box 1), and Fig. 6B illustrates the concept of input forwarding: this variable is a source imported into data2 of the bias-add block (red box 2). For visual consistency, we group our variable declarations in the same area, as illustrated by red box 1 in Fig. 7.
Graph: Fig. 5 Shapeshifting of long blocks
Graph: Fig. 6 Input forwarding through variable declarations
In addition, Blockly provides tooltip and help-URL designs, which supply auxiliary information about the meaning of the blocks themselves. Because neural networks involve deep domain knowledge that cannot be fully conveyed through block design alone, these two mechanisms provide the information users need. Figure 8 shows a screenshot of the ImageNet block with Blockly's tooltip and help URL. When users hover the mouse over the block, a small tip window appears, as indicated by red box 1; when users right-click the block, a help item linked to a specific help page is available in the menu, as indicated by red box 2.
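Tooltips and help URLs are part of Blockly's standard JSON block-definition format. As a hedged illustration, a definition of a hypothetical ImageNet block might set the two fields as follows; the message text, colour, and URL are placeholders, not NNBlocks's actual definition:

```json
{
  "type": "imagenet_dataset",
  "message0": "ImageNet data set, batch size %1",
  "args0": [{ "type": "field_number", "name": "BATCH", "value": 10 }],
  "output": "Dataset",
  "colour": 230,
  "tooltip": "Loads the ImageNet data set as a source input for the NN model.",
  "helpUrl": "https://example.com/nnblocks/help/imagenet"
}
```

The "tooltip" string produces the hover window of red box 1, and "helpUrl" backs the right-click help item of red box 2.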
Graph: Fig. 7 The example of variable declarations for the AlexNet
Graph: Fig. 8 A screenshot of Tooltips and help URL for Blockly
The purpose is to realize the TVM inferencing model through visual blocks, reducing the development effort of users.
- Framework block is used to set up the execution environment of the TVM.
- Data set blocks support three data sets that serve as source inputs for NN models.
- NNModel blocks collect general NN models from both the ONNX and MXNet model zoos.
- Hardware block is currently implemented only for x86 hardware platforms. However, for platforms such as ARM or NVIDIA, we can flexibly provide corresponding blocks for users to choose from.
Graph: Fig. 9 An example of AlexNet model using the inference framework blocks.
Figure 9 shows an example of an AlexNet model using the inference framework blocks. We place the data set (ImageNet, default batch size 10), NN model (AlexNet, from MXNet), and hardware platform (x86) on this block according to its interface requirements, so that the NN model can be executed and the inference result generated.
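As a sketch of how such a stacked inference-framework block could be serialized before being sent to the server side, the snippet below builds a configuration object. All field names are hypothetical, since the paper does not specify NNBlocks's actual wire format.

```python
import json

# Hypothetical configuration a stacked inference-framework block (Fig. 9)
# could emit before being forwarded to the server-side TVM runner. The
# schema is illustrative only.
def build_config(dataset, model, source, target, batch_size=10,
                 schedule="Baseline"):
    return json.dumps({
        "dataset": dataset,        # e.g. the ImageNet block
        "model": model,            # e.g. the AlexNet block
        "frontend": source,        # MXNet or ONNX model zoo
        "target": target,          # hardware block, currently x86
        "batch_size": batch_size,  # Fig. 9 uses a default of 10
        "schedule": schedule,      # Baseline / TVMDefault / self_define
    }, sort_keys=True)

cfg = build_config("ImageNet", "AlexNet", "mxnet", "x86")
```

The "schedule" field anticipates the advanced scheduling blocks described next.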
Advanced blocks use the TVM scheduling API to provide further performance tuning of loop optimizations in the computation of NN model operations. In our work, we provide abstract advanced block designs for the scheduling API so that users can easily optimize each operation of the NN model through block options or by dragging and dropping blocks. In our design, these advanced blocks with scheduling are forwarded to TVM through a Python program containing the optimization configuration to complete the optimization flow. Hence, the purpose is to realize TVM inference optimization while reducing the user's workload through visual scheduling blocks. Below we further describe the advanced blocks that NNBlocks provides for scheduling optimization.
Advanced blocks with scheduling are illustrated in Fig. 10. We support three block types: First, TVM Baseline, with no schedule optimization (Fig. 10(
Graph: Fig. 10 Advanced blocks of a TVM
Graph: Fig. 11 A scalable and flexible for advanced blocks
TVM Schedule self_define, which supports an extended block for users to use the name self_define (Fig. 10(
In this section, we describe the overall system architecture of the NNBlocks framework and the flow of translating an NN model into blocks, and we propose a running example of NNBlocks with a TVM scheduling operation.
NNBlocks is a web-based visualization application that includes a client-side user mode and connects to the server-side architecture. Figure 12 shows the system architecture of NNBlocks. The key prerequisite and underlying software are described as follows:
- Blockly is a prerequisite block-based Web environment for NNBlocks. It provides corresponding visual block objects for NN model structure, TVM execution flow, data set loading, and optimization functions. In Fig. 12, the configuration of its block is sent by NodeJS to the server-side TVM for execution.
- TVM is a prerequisite server-side AI execution system for NNBlocks. In Fig. 12, the entire NN model's execution and identification are performed by it. It can execute the NN model inferencing, load the data set, and provide the optimization function through the GUI interface provided by Blockly. Finally, the results of TVM execution are sent back to the browser via NodeJS.
- NodeJS is a necessary runtime environment for NNBlocks. It can execute JavaScript on the server side to receive the data sent from the client side. In Fig. 12, the result of using the browser to stack the blocks of Blockly can be sent to the server through NodeJS for TVM execution.
- Python is the underlying interpreted programming language of NNBlocks. In Fig. 12, more than 50% of TVM itself is developed in Python, so the operating system must support Python to operate it.
- LLVM is the underlying compiler infrastructure for TVM to generate CPU executable files. In Fig. 12, when TVM executes the NN model, LLVM will be launched to compile NN operations and generate executable files that can be executed on the hardware.
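The client-to-server path in Fig. 12 (the browser sends the block configuration, NodeJS relays it, and the server returns the TVM result) can be sketched as a minimal HTTP round trip. Here both the NodeJS relay and the TVM execution are faked with Python's standard library, so only the message flow is illustrated.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

# Minimal stand-in for the Fig. 12 flow: the client POSTs the stacked-block
# configuration; the server would hand it to TVM and reply with the
# inference result. The "TVM" step below is a fake echo.
class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        cfg = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        result = {"model": cfg["model"], "status": "ok"}  # fake TVM output
        body = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: send a configuration and read the reply.
req = Request(f"http://127.0.0.1:{server.server_port}",
              data=json.dumps({"model": "AlexNet"}).encode(),
              headers={"Content-Type": "application/json"})
reply = json.loads(urlopen(req).read())
server.shutdown()
```

In the real system the relay is NodeJS and the payload drives a TVM run, but the request/response shape of the architecture is the same.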
Graph: Fig. 12 The system architecture of NNBlocks
NNBlocks supports a translation flow in which an NN model is translated into the Relay IR block categories described above. Figure 13 is a flowchart of how NNBlocks performs the translation process. When a trained NN model file, such as AlexNet, is processed by calling the TVM Relay front-end API, the AlexNet Relay IR content is generated. Our RelayJSON2Block algorithm traverses the graph content automatically, exploiting the graph structure of the Relay IR, and exports it into a Relay IR text file in JSON format. Next, when the NNBlocks web interface needs to display the blocks of the NN model, the text file is loaded automatically by this mechanism and mapped to each Relay IR block category to generate the complete Relay IR blocks of AlexNet. Finally, the Relay IR blocks of AlexNet are displayed on the NNBlocks web interface.
Graph: Fig. 13 Translation flow from NN model into blocks
Graph: Algorithm 1 RelayJSON2Block
We further elaborate on the RelayJSON2Block algorithm of Fig. 13. Algorithm 1, lines 1 to 18, describes the process flow of RelayJSON2Block. F is the input source, defined as the Relay IR JSON text file of the NN model. R is the output result, defined as the Relay IR blocks of the NN model. First, in lines 1 to 2, when the user triggers the spread-model button on the web interface of the client side, the GetRelayModelFile() function is initialized and emits a request to the server side. Next, in lines 3 to 6, the content of the existing F is read through the server-side ServerSocketGetModel() function to obtain the model name N and model data D. Then, in line 7, the JSON content
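The mapping step at the heart of RelayJSON2Block can be sketched in plain Python: each node of the Relay IR JSON is matched to one of the block categories of Sect. 3. The field names ("nodes", "op") and the category table below are hypothetical, since the actual NNBlocks JSON schema is not reproduced here.

```python
import json

# Hypothetical sketch of the category-mapping step of RelayJSON2Block.
# Operator names follow Relay conventions; the category table is ours.
CATEGORY = {
    "nn.conv2d": "computation",
    "nn.max_pool2d": "computation",
    "nn.relu": "activation",
    "nn.bias_add": "arithmetic",
    "nn.batch_norm": "normalization",
    "reshape": "tensor",
}

def relay_json_to_blocks(text):
    """Return a list of (operator, block category) pairs for a model file."""
    model = json.loads(text)
    return [(n["op"], CATEGORY.get(n["op"], "unknown")) for n in model["nodes"]]

sample = json.dumps({"nodes": [
    {"op": "nn.conv2d"}, {"op": "nn.bias_add"}, {"op": "nn.relu"},
]})
blocks = relay_json_to_blocks(sample)
```

In the real algorithm, each resulting (operator, category) pair drives the instantiation of one visual block on the NNBlocks workspace.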
We propose a running example of NNBlocks with a TVM scheduling operation. Figure 14 provides an example of blocks for scheduling optimization. Red box a in Fig. 14 indicates a target block of the TVM framework that can import a data set, NN model, and hardware platform. We import the ImageNet block as the data set and AlexNet from MXNet as the NN model, and select the run-x86 block as the hardware platform. Red box b in Fig. 14 illustrates choosing a TVM scheduling optimization option. In this example, we enabled the FPV (fusion, parallel, and vectorize) options simultaneously; in other words, the convolution layers of AlexNet are optimized by these three options to improve the execution time.
Graph: Fig. 14 An example of blocks for scheduling optimization with their corresponding scheduling program snippets
We further illustrate the program snippets of the three TVM scheduling optimization options in Fig. 14b. Figure 14(1–3) shows the program snippets corresponding to these three options. The fusion option consists of three scheduling APIs (comprising about five key lines of code, as shown in Fig. 14(
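At the loop level, the effect of the fusion option can be sketched without TVM: two nested loops over an output are fused into a single loop over a flattened axis, which can then be parallelized or vectorized as one axis. This is a conceptual stand-in for the transformation, not the TVM scheduling API itself; the shapes are illustrative.

```python
# Library-free sketch of loop fusion: nested (y, x) loops become one loop
# over the fused index y*W + x, mirroring what a fuse primitive does to a
# schedule before parallelize/vectorize are applied to the fused axis.
H, W = 4, 8
a = [[y * W + x for x in range(W)] for y in range(H)]

# Before fusion: two nested loops.
out_nested = [[0] * W for _ in range(H)]
for y in range(H):
    for x in range(W):
        out_nested[y][x] = a[y][x] * 2

# After fusion: one loop over the fused axis.
out_fused = [[0] * W for _ in range(H)]
for yx in range(H * W):
    y, x = divmod(yx, W)
    out_fused[y][x] = a[y][x] * 2

assert out_fused == out_nested  # the transformation preserves the result
```

Fusing first is what makes a single parallel or vectorize directive cover the whole iteration space, which is why the FPV options are applied together in Fig. 14.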
This section presents the usability tutorial used to evaluate the proposed design approach and the research questions raised in Sect. 1. It includes the assessment steps, the interviewees' backgrounds, and a scenario.
The usability tutorial we designed contains five steps: concept introduction, target pretest, tutorial, target posttest, and measurement. Each step is described as follows:
Concept Introduction Step The target of this step is to give interviewees an introduction that helps them understand TVM operations. Since NNBlocks is designed according to the concepts of TVM, we introduced the basic AI concepts, architecture, and execution process behind TVM to the interviewees. This ensures that the interviewees build basic knowledge before operating the framework.
Target Pretest Step The target of this step is to have interviewees use NNBlocks to complete a basic topic we specify. Before this, they had never used the framework. As soon as they have understood only the basic concepts of TVM, we let them operate the framework to stack simple AI execution content. While they use the framework, we record how long it takes to complete the topic, to check whether the visual content we designed is intuitive.
Tutorial Step The target of this step is to make interviewees understand the functions and operations of NNBlocks. Since the design of the framework is derived from TVM, our detailed explanation enables them to understand the principles of its design. In addition, Q&A interaction helps us understand the user experience.
Target Posttest Step The target of this step is to require interviewees to complete an advanced topic we specify, whose content must include the optimization blocks. When they encounter technical problems during the topic, they can ask questions for guidance. At the same time, we record how long they take to complete the topic. This checks whether their operation improves once they understand the functions of NNBlocks.
Measure Step This step consists of two items. The first item is for interviewees to fill out a questionnaire with seven questions. The survey consists of two parts. The first part is a UMUX questionnaire with four questions, proposed in Kraig Finstad's paper [[
Table 1 Four questions of the UMUX questionnaire
UMUX questionnaire (rated 1 to 7, Strongly Disagree to Strongly Agree):
Q1 NNBlocks's capabilities meet my requirements
Q2 Using NNBlocks is a frustrating experience
Q3 NNBlocks is easy to use
Q4 I have to spend too much time correcting things with NNBlocks
Table 2 Three questions of the theme survey
Theme survey questions:
Q5 For NNBlocks, which simplifies complex software such as TVM, its operation has a positive learning significance (rated 1 to 7, Strongly Disagree to Strongly Agree)
Q6 NNBlocks provides block operations that support advanced optimization, which will help you engage in AI-related work in the future (rated 1 to 7, Strongly Disagree to Strongly Agree)
Q7 Feedback (free-text answer)
The usability tutorial we designed was conducted in the CS540400 (2020 Advanced Compiler) course of the Department of Computer Science, National Tsing Hua University, Taiwan. A total of 20 students who took this course participated in the tutorial. These students were majoring in computer science. Since NNBlocks itself embodies basic concepts of computing and logic, users must have basic programming ability and GUI operation experience. Therefore, these students, who had not been exposed to any AI platform (such as TensorFlow or TVM), met the requirements for this tutorial.
We allotted 100 min for this usability tutorial, and the 20 students used their own laptops to complete it. During the tutorial, we first introduced the concept, architecture, and execution flow of TVM. The students then opened a browser on their laptops, connected to the NNBlocks webpage, and completed the designated block topic without any prompts. Next, we introduced the purpose, architecture, and functions of NNBlocks. Finally, we assigned an advanced block topic for the students to complete; when they encountered technical problems on this advanced topic, they could ask for guidance.
This section explains the research results and discussion of this framework for the four questions listed in Sect. 1.
Is the framework easy to learn?
For intuition, we report the results through usage status in two parts: quantity and time. First, for quantity, we take the aforementioned Fig. 14 as an example, in which the user completes the scheduling optimization for the convolution layers of AlexNet. This example requires stacking only seven blocks (this number depends on the design of the blocks and is not absolute). In traditional handwritten programming, Fig. 14(1–3) shows that at least 28 lines of code are needed, and a complete TVM execution and optimization must also include other declarations and execution-flow programming. The easy-to-use blocks of NNBlocks make it possible to evaluate performance under different TVM scheduling optimization options directly, compared with the effort of handwriting the code that implements the scheduling policies. Therefore, compared with users (especially beginners) actually developing optimization programs, NNBlocks is relatively intuitive and can greatly reduce the handwritten programming effort in deploying the NN model and optimization configurations.
Second, for the results of time, we asked a beginner with a CS background to record the time required to operate TVM for the first time to execute an NN model. This time includes the TVM environment setup, Python programming, and building the NN model for inferencing. Next, as mentioned in Sect. 4.1, the 20 interviewees were asked to record the time required for the pretest and posttest steps of operating NNBlocks. Figure 15 records the results; 60 s is used as the minimum unit, so times under 1 min are recorded as 60 s. For TVM, the beginner took 1800 s (30 min) to complete the task. For the NNBlocks pretest step (finishing a basic topic such as Fig. 9), the average time was 135 s (2 min 15 s); the fastest was 60 s (1 min) and the slowest 300 s (5 min). For the NNBlocks posttest step (finishing an advanced topic similar to Fig. 14), the average time was 96 s (1 min 36 s); the fastest was 60 s (1 min) and the slowest 300 s (5 min). Because the visual application of NNBlocks takes much less time than TVM, the intuition results meet our expectation that the framework is easy to learn.
Graph: Fig. 15 The results of time for NNBlocks pretest step and posttest step
How usable is the framework?
For usability analysis, we report the results of the UMUX questionnaire (Q1–Q4) (see Table 3). The UMUX paper [[
Table 3 records the Q1–Q4 results of the UMUX questionnaire. We obtained a score of 75.00 from the 20 interviewees mentioned in Sect. 4.2. The standard deviation is 14.56, with a 95% confidence interval (
Table 3 Results of UMUX questionnaire Q1–Q4
Likert scale (1 to 7); overall UMUX score: Mean 75.00, S.D. 14.560, C.I. 6.38 [68.62–81.38]
Q1 NNBlocks's capabilities meet my requirements
Q2 Using NNBlocks is a frustrating experience
Q3 NNBlocks is easy to use
Q4 I have to spend too much time correcting things with NNBlocks
(n = 20,
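For reference, UMUX scoring works as follows: the odd (positively worded) items contribute (response - 1), the even (negatively worded) items contribute (7 - response), and the resulting 0 to 24 sum is rescaled to 0 to 100. The single respondent's answers below are made up for illustration; the aggregate line reuses the mean, S.D., and n reported in Table 3.

```python
from math import sqrt

# UMUX score for one respondent's four 7-point answers (Finstad's scheme).
def umux_score(q1, q2, q3, q4):
    total = (q1 - 1) + (7 - q2) + (q3 - 1) + (7 - q4)  # 0..24
    return total / 24 * 100                             # rescale to 0..100

one_respondent = umux_score(6, 2, 7, 1)  # made-up answers

# The paper's aggregate: mean 75.00, S.D. 14.56, n = 20 give the reported
# 95% confidence half-width of Table 3.
half_width = 1.96 * 14.56 / sqrt(20)
```

The half-width evaluates to about 6.38, matching the [68.62, 81.38] interval around the mean of 75.00 reported above.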
Table 4 Results of theme survey question Q5–Q6
Likert scale (1 to 7):
Q5 For NNBlocks, which simplifies complex software such as TVM, its operation has a positive learning significance: Mean 5.40, S.D. 1.046
Q6 NNBlocks provides block operations that support advanced optimization, which will help you engage in AI-related work in the future: Mean 5.05, S.D. 1.276
(n = 20)
How significant is the framework for AI learning?
For significance, we report the results of the theme survey questions (Q5–Q7) (see Table 4). We evaluated Q5 and Q6 independently; their assessment method differs from the UMUX metric model. Instead, the seven-point Likert scores are averaged directly to obtain the mean and standard deviation. Q7 is a feedback question that interviewees can answer freely. Table 4 records the Q5–Q6 results (Q5 = 5.40, Q6 = 5.05), showing that interviewees consider the framework significant and meaningful for AI learning. In addition, we list positive and constructive suggestions from interviewees' answers to the feedback question:
- "If it develops towards productization, more detailed help functions can be provided."
- "Through this, beginners can understand the basic concepts of AI."
- "The visualization of the AI TVM execution flow is good."
- "The interface is easy to operate, making it easier for AI beginners to understand."
- "AWESOME work!"
Does the framework burden the system?
For system impact, we report the results of related experiments. First, we produced statistical data to illustrate the status of the components with the NN models and the server-side/client-side configuration. Then, with the basic configuration of NNBlocks, we generated experimental data by executing the NN models with the scheduling optimization options and used the data to illustrate the impact on the system.
On the server side, the hardware used in the experimental environment was a 3.50-GHz, 12-core Intel Core i7-5930K CPU with 64 GB of RAM, while the software was Ubuntu 14.04.5 and TVM version 0.6, which executed inference with multiple threads at run time. On the client side, the hardware was a 2.30-GHz, 2-core Intel Core i5-6200U CPU with 8 GB of RAM, running Ubuntu 18.04.
As indicated in Table 5, the 11 test samples that are the objects of our experiment comprise three models (MobileNet, ResNet18v2, and SqueezeNet) taken from the ONNX model zoo[
Table 5 Description of these 11 test samples in the experiments
Network: Description (Source)
MobileNet: Lightweight NN model suitable for mobile and embedded environments (ONNX model zoo)
ResNet18v2: An 18-layer CNN model that uses shortcuts to generate residuals to improve the accuracy of image recognition (ONNX model zoo)
SqueezeNet: Lightweight NN model that simplifies network complexity while maintaining accuracy (ONNX model zoo)
AlexNet: CNN model used to classify images (MXNet model zoo)
Inception_v3: Modified version based on GoogleNet whose main feature is the factorization of convolution, which aims to reduce the number of connections/parameters without reducing NN efficiency (MXNet model zoo)
ResNet50_v1: A 50-layer CNN model; increasing the number of layers while controlling the computation and the number of parameters improves the recognition accuracy (MXNet model zoo)
SqueezeNet1.1: A modified version of SqueezeNet with more-compressed data and a simplified NN (MXNet model zoo)
VGG-16: A 16-layer CNN model using smaller convolution filters to improve the accuracy of image classification (MXNet model zoo)
VGG-16 (BN): Same as VGG-16 except for supporting the normalization of small batches of data (MXNet model zoo)
VGG-19: Same as VGG-16 except for the addition of three convolution layers (MXNet model zoo)
VGG-19 (BN): Same as VGG-19 except for supporting the normalization of small batches of data (MXNet model zoo)
Graph: Fig. 16 Usage statistics of NN operations of these 11 test samples in the cache
Graph: Fig. 17 Maximum resident set size of these 11 test samples for the process during the lifetime of the model
First, we illustrate the status of the components in our system with the test samples on the server side, using the TVM cache API and the command parameters provided by the server system. Figure 16 shows the usage statistics of the NN operations of the 11 test samples in the server-side cache. From Fig. 16, the NN operation usage of two models, ONNX_MobileNet and MXNet_Inception_v3, is obviously higher than that of the other models. Figure 17 shows the maximum resident set size of the 11 test samples in server-side RAM; the VGG-series test samples occupy more RAM than the others. Furthermore, Fig. 18 shows the time needed to automatically translate the Relay IR format of the 11 test samples into blocks. This automatic translation is supported by NNBlocks on the client side and examines the performance of the NNBlocks operation and the client side. Because ONNX_MobileNet and MXNet_Inception_v3 have many more blocks than the other test samples, their translation times are correspondingly longer.
Graph: Fig. 18 Translation time for the Relay IR of these 11 test samples into blocks
Graph: Fig. 19 Execution times of these 11 test samples with our NNBlocks representations for these three different scheduling options
Then, Fig. 19 presents the performance results for the execution times of the 11 test samples listed in Table 5 on the server side. The scheduling optimization used the three options (Baseline, TVMDefault, and FPV) listed in Table 6 (taken from the advanced blocks of Fig. 10), and the data set was ImageNet with a batch size of 100. Figure 19 indicates that, with the Baseline option as the benchmark, the FPV and TVMDefault options accelerate execution by up to 16 times and 32.8 times, respectively. Hence, these two options can provide substantial optimization in terms of accelerated execution times for the 11 test samples.
Table 6 Descriptions of the three scheduling options used in the test-sample experiments
Scheduling   Description
Baseline     No scheduling optimization
TVMDefault   Hardware-dependent. The default optimized setting already provided by TVM, which specifically supports the x86 hardware platform
FPV          Hardware-independent. Scheduling optimization comprising the FPV (fusion, parallel, and vectorize) options
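The core idea behind the FPV option's fusion component can be illustrated without TVM itself. In TVM's tensor-expression API, fusion, parallelization, and vectorization are applied with schedule primitives such as `fuse`, `parallel`, and `vectorize`; the plain-Python sketch below shows only the fusion idea, namely that two element-wise operators can be applied in a single loop pass instead of materializing an intermediate buffer. The function names and the add/ReLU operator pair are illustrative assumptions, not the paper's benchmark code.

```python
# Hypothetical sketch of the "fusion" idea behind the FPV option.
# Unfused: two loop passes with an intermediate buffer between operators.
def unfused(a, b):
    tmp = [x + y for x, y in zip(a, b)]   # add -> intermediate buffer
    return [max(0, t) for t in tmp]       # relu -> second pass over tmp

# Fused: both operators applied in one pass, no intermediate buffer.
def fused(a, b):
    return [max(0, x + y) for x, y in zip(a, b)]

a, b = [1, -2, 3], [-4, 5, -6]
print(unfused(a, b) == fused(a, b))  # identical result, fewer memory passes
```

The speedup in real compilers comes from removing the intermediate buffer traffic and exposing the single fused loop to further parallelization and vectorization.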
Whether measured by the number of NN operations in the cache (Fig. 16), the maximum resident size in system RAM (Fig. 17), or the time to translate Relay IR into blocks (Fig. 18), the NN model execution time, the metric of most concern for AI, can be improved significantly through the advanced blocks provided by the framework (Fig. 19). In other words, the optimization results greatly reduce the burden on the system. From a tool perspective, shortening NN model execution time through easy-to-use optimization blocks also strongly increases the user's learning motivation.
This research proposes a visual concept design approach and uses it to build a block-based deep learning compiler framework, so that beginners with a programming background can use it to perform AI computing. This paper further conducts user evaluation surveys of the framework to demonstrate the effectiveness of the approach.
Because the framework embodies highly specialized AI domain knowledge, the target users selected in this paper need a background in basic computer logic and programming concepts. The interviewees were computer science majors who had not yet taken AI-related courses, which ensures they could explore the tool using only their computer science background. This premise aligns the tool design and questionnaire survey with a realistic usage scenario.
The approach proposed in this paper does not claim to be the best possible design for the visual blocks of TVM. However, compared with TensorFlow and Netron, it focuses more on abstract concepts and further explores intuitiveness and usability, which is appropriate for its purpose. From a tool perspective, abstraction highlights the important features of a deep learning compiler, such as the presentation of NN models (using abstract IR content) and the performance of optimization functions. The intuitiveness and usability survey results are therefore consistent with the hypothesis of this research. At the same time, the approach follows the operating model of a deep learning compiler, which conforms to the concept of system software and suits beginners with a programming background. The survey results on the importance of AI learning and on system impact show that the interviewees agree on the benefits for AI learning and perceive no adverse impact on system execution, which is also consistent with our hypothesis. In addition, existing block-based tools combined with AI mainly target applications and teaching courses built around specific NN models [[
Because deep learning compiler tools involve AI knowledge with high operational complexity and steep learning thresholds, this paper proposed a visual design approach that defines and simplifies the complex operation of a deep learning compiler tool. The approach enables beginners with a programming background to quickly understand how to optimize deep learning deployment. This research also developed the NNBlocks framework, which uses the approach to integrate TVM. The framework provides abstract blocks for executing the deep learning flow and its optimization features, making it easy for beginners to experience different levels of AI inference efficiency. The evaluation showed that, comparing conventional program development with NNBlocks operation, the results for lines of code, number of blocks, and development time all met our expectations for the ease of operating the framework. The UMUX questionnaire showed that users who operated the framework for the first time with AI knowledge evaluated it quite positively. The theme survey indicated that users considered the blocks significant for AI learning and expected them to be helpful for AI-related work in the future. The system-impact experiments indicated that accelerating NN models with the optimization function not only improves the execution efficiency of the framework but also reduces the burden on the system.
This approach is applicable not only to the deep learning compiler tool but potentially also to deep learning frameworks such as Theano, TensorFlow, and MXNet in the future. Even though the core technology of each framework differs, the structure of an entire NN model is always expressed through program design. Therefore, we have the opportunity to use this approach to design abstract blocks for the program structures of these frameworks, making their block-based interfaces easier to use and improving the effectiveness of AI learning.
The work is supported by the Taiwan Ministry of Science and Technology under Grant No.: MOST 107-2221-E-007-005-MY3.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
By Tai-Liang Chen; Yi-Ru Chen; Meng-Shiun Yu and Jenq-Kuen Lee