Mind the structure: adopting structural information for deep neural network compression

Abstract

Deep neural networks have a huge number of parameters and require a large number of bits for representation. This hinders their adoption in decentralized environments where models must be transferred among different parties over limited communication bandwidth. Parameter quantization is a compression approach that addresses this challenge by reducing the number of bits required to represent a model, e.g., a neural network. However, the majority of existing neural network quantization methods do not exploit the structural information of layers and parameters during quantization. In this paper, focusing on Convolutional Neural Networks (CNNs), we present a novel quantization approach that employs the structural information of neural network layers and their corresponding parameters. Starting from a pre-trained CNN, we categorize network parameters into different groups based on the similarity of their layers and their spatial structure. The parameters of each group are clustered independently, and the centroid of each cluster is used as the representative of all parameters in that cluster. Finally, the centroids and the cluster indices of the parameters are used as a compact representation of the parameters. Experiments on two different tasks, i.e., acoustic scene classification and image compression, demonstrate the effectiveness of the proposed approach.
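
As a rough illustration of the pipeline described in the abstract (not the authors' exact implementation), the sketch below groups the weight tensors of a pre-trained network, runs k-means within each group, and keeps only the centroids plus a per-parameter cluster index. The grouping criterion (shared spatial kernel shape), the cluster count, and all function names are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def quantize_by_groups(layers, n_clusters=256):
    """Cluster parameters group-wise and return a compact representation.

    `layers` is assumed to be a dict mapping layer names to weight arrays
    (e.g., conv kernels of shape [out_ch, in_ch, k, k]). Layers with the
    same spatial kernel shape are treated as one group, a stand-in for the
    paper's similarity-based grouping.
    """
    # Group layers by spatial kernel shape (illustrative grouping criterion).
    groups = {}
    for name, w in layers.items():
        groups.setdefault(w.shape[-2:], []).append((name, w))

    compact = {}
    for shape, members in groups.items():
        # Pool all scalar parameters of the group and cluster them jointly.
        flat = np.concatenate([w.ravel() for _, w in members]).reshape(-1, 1)
        km = KMeans(n_clusters=min(n_clusters, len(flat)), n_init=10).fit(flat)
        centroids = km.cluster_centers_.ravel()

        # Store the centroids once per group, plus one index per parameter.
        offset = 0
        for name, w in members:
            idx = km.labels_[offset:offset + w.size].reshape(w.shape)
            compact[name] = {"centroids": centroids,
                             "indices": idx.astype(np.uint8)}
            offset += w.size
    return compact

def dequantize(compact):
    """Reconstruct quantized weights from the compact representation."""
    return {name: rec["centroids"][rec["indices"]]
            for name, rec in compact.items()}
```

With 256 clusters per group, each parameter is stored as an 8-bit index instead of a 32-bit float, roughly a 4x size reduction before any further entropy coding of the index stream.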

Publication
International Conference on Image Processing