Stochastic computing system hardware design for convolutional neural networks optimized for accuracy area and energy efficiency

Abstract
Stochastic computing (SC) is an alternative computing paradigm that can yield designs with lower area and power consumption than conventional binary-encoded (BE) deterministic computing. In SC, numbers are encoded as bit-streams of ‘0’s and ‘1’s, and SC computation elements (or functions) operate on one or more bit-streams. To obtain accurate results, some functions require the bit-streams to be correlated, while others require uncorrelated bit-streams or a combination of both. The relationship between SC function accuracy and correlation has not been well studied in previous works. Thus, managing correlation across the SC system is a key challenge in achieving optimum accuracy. In addition, to perform SC computation, the input values are converted from the BE domain to SC and, on completion of the computation, converted back to BE to obtain the results. The conversion processes require circuitry that typically consumes over 80% of the overall SC system area, which is another key challenge. To address these challenges, this thesis proposes a framework for end-to-end system design optimized for accuracy and area. The framework provides guidelines for designing an effective SC function or system that exploits correlation. This framework is applied in designing the SC functional units and the complete SC system for a convolutional neural network (CNN), which is the dominant approach in the implementation of recognition systems. This thesis shows that although CNN is a compute-intensive and resource-demanding algorithm, the proposed SC design framework makes it possible to implement a CNN in an embedded system with a limited area and power budget. Several novel SC-based functions are proposed that outperform previous works and obtain significant area savings and high accuracy to replace the BE equivalent functions.
These functions include inner product, max pooling, ReLU activation function, and average pooling. Then, some training considerations are specified to enable low error rates for the SC-based CNN. Experimental results show that the SC-based CNN incurs little or no accuracy degradation compared to its BE counterpart. The SC-based CNN achieves 99.6% and 96.25% classification accuracy on the MNIST digit classification and AT&T face recognition datasets, respectively. Moreover, the SC-based CNN of the ResNet-20 model achieves 86.5% classification accuracy on the CIFAR-10 object dataset. To rapidly map an SC system onto an FPGA, a generic design strategy for high-level synthesis of SC computation engines is proposed. The SC-based CNN hardware on FPGA obtains the lowest resource utilization compared to previous works on FPGA-based CNN accelerators. In addition, the proposed hardware architecture achieves 277.46 GOP/s/W energy efficiency, which outperforms previous works.
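As a rough illustration of the abstract's point about correlation, the following Python sketch simulates textbook unipolar SC in software. It is not taken from the thesis and does not reproduce its proposed functions. The helper names (`encode`, `encode_shared`, `decode`), the stream length, and the input values are assumptions chosen for the example. It shows that an AND gate approximates multiplication when the two bit-streams are uncorrelated, while an OR gate yields the maximum when the streams are generated from the same random sequence (maximally correlated).

```python
import random

def encode(p, n, rng):
    """Unipolar encoding: each bit is 1 with probability p (hypothetical helper)."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def encode_shared(p, rands):
    """Encode p against a shared random sequence, giving correlated streams."""
    return [1 if r < p else 0 for r in rands]

def decode(bits):
    """Convert a bit-stream back to a value: the fraction of 1s."""
    return sum(bits) / len(bits)

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 4096                 # stream length (assumed for illustration)
a, b = 0.5, 0.75         # example input values in [0, 1]

# Uncorrelated streams: each value is compared against independent random numbers.
sa, sb = encode(a, n, rng), encode(b, n, rng)
product = decode([x & y for x, y in zip(sa, sb)])   # approximates a * b

# Correlated streams: both values are compared against the same random numbers.
rands = [rng.random() for _ in range(n)]
ca, cb = encode_shared(a, rands), encode_shared(b, rands)
maximum = decode([x | y for x, y in zip(ca, cb)])   # approximates max(a, b)
minimum = decode([x & y for x, y in zip(ca, cb)])   # approximates min(a, b)

print(f"AND, uncorrelated: {product:.3f} (a*b = {a * b})")
print(f"OR,  correlated:   {maximum:.3f} (max = {max(a, b)})")
print(f"AND, correlated:   {minimum:.3f} (min = {min(a, b)})")
```

The same AND gate computes a product or a minimum depending only on how the input streams are generated, which is why the degree of correlation has to be controlled at each function in an SC system.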
Description
Thesis (PhD. (Electrical Engineering))
Keywords
Stochastic analysis, Computable functions
Citation