2) Uses depthwise separable convolutions, which factorize a standard convolution into a depthwise convolution (DW) and a pointwise convolution (PW). The main benefit is a large reduction in both parameter count and computation, as the sketch below illustrates.
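To make the saving concrete, here is a minimal MATLAB sketch (the channel counts M = 32 in, N = 64 out are assumed for illustration): a standard 3×3 convolution holds 3×3×32×64 = 18,432 weights, while the depthwise-plus-pointwise pair holds 3×3×32 + 1×1×32×64 = 2,336, roughly 8× fewer.

% Standard convolution: one 3x3xMxN kernel bank
standardConv = convolution2dLayer([3 3],64,"Padding","same");
% Depthwise separable: a per-channel 3x3 filter, then a 1x1 pointwise mix
separableConv = [
    groupedConvolution2dLayer([3 3],1,"channel-wise","Padding","same")  % depthwise
    convolution2dLayer([1 1],64,"Padding","same")];                     % pointwise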
Compared with MobileNet V1, MobileNet V2:
1) Adds a 1×1 "expansion" layer (PW) before the DW convolution. A DW convolution cannot change the number of channels by itself: it outputs exactly as many channels as it receives. If the preceding layer has few channels, the DW convolution can only extract features in a low-dimensional space, which limits its effectiveness. To address this, MobileNet V2 places a PW layer before every DW convolution specifically to expand the dimensionality, so that features can be extracted in a higher-dimensional space.
2) Replaces the ReLU6 activation after the second PW with a linear activation. Nonlinearity is beneficial in high-dimensional spaces but destroys information in low-dimensional ones, where a linear mapping works better. Since the second PW's job is precisely to reduce dimensionality, ReLU6 should not be applied after it.
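Combining 1) and 2) gives V2's inverted residual block: expand, filter, then linearly project. A minimal MATLAB sketch of one such block (BN layers omitted for brevity; the 16/96 channel counts and layer names are illustrative, mirroring the full code in Section 4):

blockLayers = [
    convolution2dLayer([1 1],96,"Name","expand","Padding","same")    % PW up: 16 -> 96
    clippedReluLayer(6,"Name","expand_relu")                         % ReLU6
    groupedConvolution2dLayer([3 3],1,"channel-wise","Name","dw","Padding","same")
    clippedReluLayer(6,"Name","dw_relu")                             % ReLU6
    convolution2dLayer([1 1],16,"Name","project","Padding","same")]; % PW down: linear, no ReLU6
% When the stride is 1 and input/output channels match, a shortcut is added
% around the block with additionLayer + connectLayers, as in Section 4.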
MobileNet V3:
Micro-architecture
[attach]291[/attach]
Compared with MobileNet V2, MobileNet V3:
1) Introduces a lightweight attention module based on the Squeeze-and-Excitation (SE) structure, placed after the DW convolution. In effect it computes a weight for each feature map. The computation proceeds as follows: an average pooling step first produces a one-dimensional vector with as many elements as there are feature maps; this then passes through a fully connected layer with ReLU activation followed by a fully connected layer with h-sigmoid activation, and the resulting weights rescale the feature maps.
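A minimal functional sketch of that computation (plain MATLAB, not a toolbox API; F is an H×W×C feature map and W1/b1/W2/b2 are hypothetical learned weights, with W1 reducing to C/r elements for a reduction ratio r):

function Fout = seBlock(F, W1, b1, W2, b2)
    C = size(F,3);
    s = squeeze(mean(F,[1 2]));          % squeeze: global average pool, C-by-1
    z = max(W1*s + b1, 0);               % FC + ReLU, reduces to C/r elements
    a = min(max(W2*z + b2 + 3,0),6)/6;   % FC + h-sigmoid, per-channel weights in [0,1]
    Fout = F .* reshape(a,1,1,C);        % excitation: reweight each feature map
end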
2) Introduces a new activation function, h-swish: h-swish(x) = x · ReLU6(x + 3) / 6.
[attach]292[/attach]
Two reasons motivate this. First, optimized implementations of ReLU6 are available in virtually every software and hardware framework, so ReLU6 is used to approximate the swish function, making swish "hard". Second, in quantized mode it eliminates the potential numerical precision loss that different approximate implementations of sigmoid would otherwise introduce.
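In MATLAB the same approximation is one line, since ReLU6 is just min/max (a sketch; the functionLayer wrapper requires R2021b or later):

relu6  = @(x) min(max(x,0),6);
hswish = @(x) x .* relu6(x + 3) / 6;                                       % h-swish(x) = x*ReLU6(x+3)/6
hswishLayer = functionLayer(@(X) X.*min(max(X+3,0),6)/6,"Name","hswish");  % as a network layer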
3) Redesigns the final stage of the MobileNet V2 network. The authors observed experimentally that the final 1×1 feature-extraction convolution achieves the same accuracy whether it operates on a 7×7 receptive field or a 1×1 one (i.e., after average pooling), which makes the last computation-reducing bottleneck redundant. Removing those layers substantially lowers both the total computation and the latency.
[attach]293[/attach]
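A sketch of that redesigned last stage (layer names and the functionLayer-based h-swish are our choices, not code released by the authors): average pooling moves in front of the 1280-channel feature convolution, so it runs on a 1×1 map instead of 7×7, and the old bottleneck's depthwise and projection layers disappear.

lastStage = [
    convolution2dLayer([1 1],960,"Name","last_expand")            % still on the 7x7 map
    functionLayer(@(X) X.*min(max(X+3,0),6)/6,"Name","hswish_1")
    globalAveragePooling2dLayer("Name","gap")                     % 7x7 -> 1x1
    convolution2dLayer([1 1],1280,"Name","last_features")         % now cheap: 1x1 input
    functionLayer(@(X) X.*min(max(X+3,0),6)/6,"Name","hswish_2")
    convolution2dLayer([1 1],numClasses,"Name","classifier")
    softmaxLayer("Name","softmax")];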
4) Reduces the channel count of the initial convolution. Because replacing ReLU6 with h-swish preserves accuracy, the first layer can be cut from 32 3×3 filters to 16, saving about 3 ms of latency with no loss of accuracy.
5) For the network architecture search, combines two techniques: resource-constrained, platform-aware NAS for the block-level search, and NetAdapt for layer-wise refinement (exploring this in detail requires a compute budget few of us have, so it is not covered here).
3. Experimental results. The Google team compared MobileNet V1/V2/V3 on the ImageNet classification task in terms of Top-1 accuracy (the fraction of images whose highest-ranked prediction matches the ground-truth class), multiply-accumulate operations (MAdds), and parameter count (Params):
MobileNet V2/V3 use a width-multiplier parameter alpha to scale the width of each layer; it controls the number of input channels M and output channels N. When alpha is applied, the values actually used in computation become alpha × M and alpha × N, as sketched below.
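A minimal sketch of how alpha is applied (the round-to-a-multiple-of-8 step follows the common reference implementations and is our assumption):

alpha = 0.75;
scaleWidth = @(C) max(8, round(alpha*C/8)*8);   % scaled channel count, kept divisible by 8
scaleWidth(32)                                  % ans = 24: a 32-channel layer shrinks to 24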
4. MATLAB implementation. Since MATLAB's deep learning libraries do not yet include a built-in MobileNet V3, MATLAB code for MobileNet V2 serves as the example here. The dataset contains two classes of images, coal and gangue: 408 coal images and 396 gangue images, 804 samples in total, randomly shuffled and split 8:2 for training and test classification. Because the Adam optimizer is used, no separate validation set is configured here.
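A plausible sketch of the data preparation just described (the folder layout, the 224×224 resize target, and the variable names are our assumptions; only XTrain is referenced by the network code below):

imds = imageDatastore("coal_gangue","IncludeSubfolders",true,"LabelSource","foldernames");
[imdsTrain,imdsTest] = splitEachLabel(imds,0.8,"randomized");   % 8:2 random split
XTrain = zeros(224,224,3,numel(imdsTrain.Files));
for i = 1:numel(imdsTrain.Files)
    XTrain(:,:,:,i) = imresize(im2double(readimage(imdsTrain,i)),[224 224]);
end
YTrain = imdsTrain.Labels;                                      % categorical labels: coal/gangue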
numClasses = 2;
lgraph = layerGraph();

% Stem (Conv1) followed by the first bottleneck. Every bottleneck repeats
% the pattern: 1x1 expand conv + ReLU6 -> 3x3 depthwise conv + ReLU6 ->
% 1x1 linear projection conv (note: no ReLU6 after the projection BN).
tempLayers = [
    imageInputLayer([size(XTrain,1) size(XTrain,2) size(XTrain,3)],"Name","input_1","Normalization","zscore")
    convolution2dLayer([3 3],32,"Name","Conv1","Padding","same","Stride",[2 2])
    batchNormalizationLayer("Name","bn_Conv1","Epsilon",0.001)
    clippedReluLayer(6,"Name","Conv1_relu")
    groupedConvolution2dLayer([3 3],1,32,"Name","expanded_conv_depthwise","Padding","same")
    batchNormalizationLayer("Name","expanded_conv_depthwise_BN","Epsilon",0.001)
    clippedReluLayer(6,"Name","expanded_conv_depthwise_relu")
    convolution2dLayer([1 1],16,"Name","expanded_conv_project","Padding","same")
    batchNormalizationLayer("Name","expanded_conv_project_BN","Epsilon",0.001)
    convolution2dLayer([1 1],96,"Name","block_1_expand","Padding","same")
    batchNormalizationLayer("Name","block_1_expand_BN","Epsilon",0.001)
    clippedReluLayer(6,"Name","block_1_expand_relu")
    groupedConvolution2dLayer([3 3],1,96,"Name","block_1_depthwise","Padding","same","Stride",[2 2])
    batchNormalizationLayer("Name","block_1_depthwise_BN","Epsilon",0.001)
    clippedReluLayer(6,"Name","block_1_depthwise_relu")
    convolution2dLayer([1 1],24,"Name","block_1_project","Padding","same")
    batchNormalizationLayer("Name","block_1_project_BN","Epsilon",0.001)];
lgraph = addLayers(lgraph,tempLayers);

% Stride-1 bottlenecks (blocks 2, 4, 5, 7, 8, 9, 11, 12, 14, 15) carry
% residual connections; these are wired up with connectLayers at the end.
tempLayers = [
    convolution2dLayer([1 1],144,"Name","block_2_expand","Padding","same")
    batchNormalizationLayer("Name","block_2_expand_BN","Epsilon",0.001)
    clippedReluLayer(6,"Name","block_2_expand_relu")
    groupedConvolution2dLayer([3 3],1,144,"Name","block_2_depthwise","Padding","same")
    batchNormalizationLayer("Name","block_2_depthwise_BN","Epsilon",0.001)
    clippedReluLayer(6,"Name","block_2_depthwise_relu")
    convolution2dLayer([1 1],24,"Name","block_2_project","Padding","same")
    batchNormalizationLayer("Name","block_2_project_BN","Epsilon",0.001)];
lgraph = addLayers(lgraph,tempLayers);

tempLayers = [
    additionLayer(2,"Name","block_2_add")
    convolution2dLayer([1 1],144,"Name","block_3_expand","Padding","same")
    batchNormalizationLayer("Name","block_3_expand_BN","Epsilon",0.001)
    clippedReluLayer(6,"Name","block_3_expand_relu")
    groupedConvolution2dLayer([3 3],1,144,"Name","block_3_depthwise","Padding","same","Stride",[2 2])
    batchNormalizationLayer("Name","block_3_depthwise_BN","Epsilon",0.001)
    clippedReluLayer(6,"Name","block_3_depthwise_relu")
    convolution2dLayer([1 1],32,"Name","block_3_project","Padding","same")
    batchNormalizationLayer("Name","block_3_project_BN","Epsilon",0.001)];
lgraph = addLayers(lgraph,tempLayers);

tempLayers = [
    convolution2dLayer([1 1],192,"Name","block_4_expand","Padding","same")
    batchNormalizationLayer("Name","block_4_expand_BN","Epsilon",0.001)
    clippedReluLayer(6,"Name","block_4_expand_relu")
    groupedConvolution2dLayer([3 3],1,192,"Name","block_4_depthwise","Padding","same")
    batchNormalizationLayer("Name","block_4_depthwise_BN","Epsilon",0.001)
    clippedReluLayer(6,"Name","block_4_depthwise_relu")
    convolution2dLayer([1 1],32,"Name","block_4_project","Padding","same")
    batchNormalizationLayer("Name","block_4_project_BN","Epsilon",0.001)];
lgraph = addLayers(lgraph,tempLayers);
tempLayers = additionLayer(2,"Name","block_4_add");
lgraph = addLayers(lgraph,tempLayers);

tempLayers = [
    convolution2dLayer([1 1],192,"Name","block_5_expand","Padding","same")
    batchNormalizationLayer("Name","block_5_expand_BN","Epsilon",0.001)
    clippedReluLayer(6,"Name","block_5_expand_relu")
    groupedConvolution2dLayer([3 3],1,192,"Name","block_5_depthwise","Padding","same")
    batchNormalizationLayer("Name","block_5_depthwise_BN","Epsilon",0.001)
    clippedReluLayer(6,"Name","block_5_depthwise_relu")
    convolution2dLayer([1 1],32,"Name","block_5_project","Padding","same")
    batchNormalizationLayer("Name","block_5_project_BN","Epsilon",0.001)];
lgraph = addLayers(lgraph,tempLayers);

tempLayers = [
    additionLayer(2,"Name","block_5_add")
    convolution2dLayer([1 1],192,"Name","block_6_expand","Padding","same")
    batchNormalizationLayer("Name","block_6_expand_BN","Epsilon",0.001)
    clippedReluLayer(6,"Name","block_6_expand_relu")
    groupedConvolution2dLayer([3 3],1,192,"Name","block_6_depthwise","Padding","same","Stride",[2 2])
    batchNormalizationLayer("Name","block_6_depthwise_BN","Epsilon",0.001)
    clippedReluLayer(6,"Name","block_6_depthwise_relu")
    convolution2dLayer([1 1],64,"Name","block_6_project","Padding","same")
    batchNormalizationLayer("Name","block_6_project_BN","Epsilon",0.001)];
lgraph = addLayers(lgraph,tempLayers);

tempLayers = [
    convolution2dLayer([1 1],384,"Name","block_7_expand","Padding","same")
    batchNormalizationLayer("Name","block_7_expand_BN","Epsilon",0.001)
    clippedReluLayer(6,"Name","block_7_expand_relu")
    groupedConvolution2dLayer([3 3],1,384,"Name","block_7_depthwise","Padding","same")
    batchNormalizationLayer("Name","block_7_depthwise_BN","Epsilon",0.001)
    clippedReluLayer(6,"Name","block_7_depthwise_relu")
    convolution2dLayer([1 1],64,"Name","block_7_project","Padding","same")
    batchNormalizationLayer("Name","block_7_project_BN","Epsilon",0.001)];
lgraph = addLayers(lgraph,tempLayers);
tempLayers = additionLayer(2,"Name","block_7_add");
lgraph = addLayers(lgraph,tempLayers);

tempLayers = [
    convolution2dLayer([1 1],384,"Name","block_8_expand","Padding","same")
    batchNormalizationLayer("Name","block_8_expand_BN","Epsilon",0.001)
    clippedReluLayer(6,"Name","block_8_expand_relu")
    groupedConvolution2dLayer([3 3],1,384,"Name","block_8_depthwise","Padding","same")
    batchNormalizationLayer("Name","block_8_depthwise_BN","Epsilon",0.001)
    clippedReluLayer(6,"Name","block_8_depthwise_relu")
    convolution2dLayer([1 1],64,"Name","block_8_project","Padding","same")
    batchNormalizationLayer("Name","block_8_project_BN","Epsilon",0.001)];
lgraph = addLayers(lgraph,tempLayers);
tempLayers = additionLayer(2,"Name","block_8_add");
lgraph = addLayers(lgraph,tempLayers);

tempLayers = [
    convolution2dLayer([1 1],384,"Name","block_9_expand","Padding","same")
    batchNormalizationLayer("Name","block_9_expand_BN","Epsilon",0.001)
    clippedReluLayer(6,"Name","block_9_expand_relu")
    groupedConvolution2dLayer([3 3],1,384,"Name","block_9_depthwise","Padding","same")
    batchNormalizationLayer("Name","block_9_depthwise_BN","Epsilon",0.001)
    clippedReluLayer(6,"Name","block_9_depthwise_relu")
    convolution2dLayer([1 1],64,"Name","block_9_project","Padding","same")
    batchNormalizationLayer("Name","block_9_project_BN","Epsilon",0.001)];
lgraph = addLayers(lgraph,tempLayers);

tempLayers = [
    additionLayer(2,"Name","block_9_add")
    convolution2dLayer([1 1],384,"Name","block_10_expand","Padding","same")
    batchNormalizationLayer("Name","block_10_expand_BN","Epsilon",0.001)
    clippedReluLayer(6,"Name","block_10_expand_relu")
    groupedConvolution2dLayer([3 3],1,384,"Name","block_10_depthwise","Padding","same")
    batchNormalizationLayer("Name","block_10_depthwise_BN","Epsilon",0.001)
    clippedReluLayer(6,"Name","block_10_depthwise_relu")
    convolution2dLayer([1 1],96,"Name","block_10_project","Padding","same")
    batchNormalizationLayer("Name","block_10_project_BN","Epsilon",0.001)];
lgraph = addLayers(lgraph,tempLayers);

tempLayers = [
    convolution2dLayer([1 1],576,"Name","block_11_expand","Padding","same")
    batchNormalizationLayer("Name","block_11_expand_BN","Epsilon",0.001)
    clippedReluLayer(6,"Name","block_11_expand_relu")
    groupedConvolution2dLayer([3 3],1,576,"Name","block_11_depthwise","Padding","same")
    batchNormalizationLayer("Name","block_11_depthwise_BN","Epsilon",0.001)
    clippedReluLayer(6,"Name","block_11_depthwise_relu")
    convolution2dLayer([1 1],96,"Name","block_11_project","Padding","same")
    batchNormalizationLayer("Name","block_11_project_BN","Epsilon",0.001)];
lgraph = addLayers(lgraph,tempLayers);
tempLayers = additionLayer(2,"Name","block_11_add");
lgraph = addLayers(lgraph,tempLayers);

tempLayers = [
    convolution2dLayer([1 1],576,"Name","block_12_expand","Padding","same")
    batchNormalizationLayer("Name","block_12_expand_BN","Epsilon",0.001)
    clippedReluLayer(6,"Name","block_12_expand_relu")
    groupedConvolution2dLayer([3 3],1,576,"Name","block_12_depthwise","Padding","same")
    batchNormalizationLayer("Name","block_12_depthwise_BN","Epsilon",0.001)
    clippedReluLayer(6,"Name","block_12_depthwise_relu")
    convolution2dLayer([1 1],96,"Name","block_12_project","Padding","same")
    batchNormalizationLayer("Name","block_12_project_BN","Epsilon",0.001)];
lgraph = addLayers(lgraph,tempLayers);

tempLayers = [
    additionLayer(2,"Name","block_12_add")
    convolution2dLayer([1 1],576,"Name","block_13_expand","Padding","same")
    batchNormalizationLayer("Name","block_13_expand_BN","Epsilon",0.001)
    clippedReluLayer(6,"Name","block_13_expand_relu")
    groupedConvolution2dLayer([3 3],1,576,"Name","block_13_depthwise","Padding","same","Stride",[2 2])
    batchNormalizationLayer("Name","block_13_depthwise_BN","Epsilon",0.001)
    clippedReluLayer(6,"Name","block_13_depthwise_relu")
    convolution2dLayer([1 1],160,"Name","block_13_project","Padding","same")
    batchNormalizationLayer("Name","block_13_project_BN","Epsilon",0.001)];
lgraph = addLayers(lgraph,tempLayers);

tempLayers = [
    convolution2dLayer([1 1],960,"Name","block_14_expand","Padding","same")
    batchNormalizationLayer("Name","block_14_expand_BN","Epsilon",0.001)
    clippedReluLayer(6,"Name","block_14_expand_relu")
    groupedConvolution2dLayer([3 3],1,960,"Name","block_14_depthwise","Padding","same")
    batchNormalizationLayer("Name","block_14_depthwise_BN","Epsilon",0.001)
    clippedReluLayer(6,"Name","block_14_depthwise_relu")
    convolution2dLayer([1 1],160,"Name","block_14_project","Padding","same")
    batchNormalizationLayer("Name","block_14_project_BN","Epsilon",0.001)];
lgraph = addLayers(lgraph,tempLayers);
tempLayers = additionLayer(2,"Name","block_14_add");
lgraph = addLayers(lgraph,tempLayers);

tempLayers = [
    convolution2dLayer([1 1],960,"Name","block_15_expand","Padding","same")
    batchNormalizationLayer("Name","block_15_expand_BN","Epsilon",0.001)
    clippedReluLayer(6,"Name","block_15_expand_relu")
    groupedConvolution2dLayer([3 3],1,960,"Name","block_15_depthwise","Padding","same")
    batchNormalizationLayer("Name","block_15_depthwise_BN","Epsilon",0.001)
    clippedReluLayer(6,"Name","block_15_depthwise_relu")
    convolution2dLayer([1 1],160,"Name","block_15_project","Padding","same")
    batchNormalizationLayer("Name","block_15_project_BN","Epsilon",0.001)];
lgraph = addLayers(lgraph,tempLayers);

tempLayers = [
    additionLayer(2,"Name","block_15_add")
    convolution2dLayer([1 1],960,"Name","block_16_expand","Padding","same")
    batchNormalizationLayer("Name","block_16_expand_BN","Epsilon",0.001)
    clippedReluLayer(6,"Name","block_16_expand_relu")
    groupedConvolution2dLayer([3 3],1,960,"Name","block_16_depthwise","Padding","same")
    batchNormalizationLayer("Name","block_16_depthwise_BN","Epsilon",0.001)
    clippedReluLayer(6,"Name","block_16_depthwise_relu")
    convolution2dLayer([1 1],320,"Name","block_16_project","Padding","same")
    batchNormalizationLayer("Name","block_16_project_BN","Epsilon",0.001)
    convolution2dLayer([1 1],1280,"Name","Conv_1")   % classification head starts here
    batchNormalizationLayer("Name","Conv_1_bn","Epsilon",0.001)
    clippedReluLayer(6,"Name","out_relu")
    globalAveragePooling2dLayer("Name","global_average_pooling2d_1")
    fullyConnectedLayer(numClasses,"Name","Logits")
    softmaxLayer("Name","Logits_softmax")
    classificationLayer("Name","ClassificationLayer_Logits")];
lgraph = addLayers(lgraph,tempLayers);

% Residual (shortcut) connections around the stride-1 bottlenecks.
lgraph = connectLayers(lgraph,"block_1_project_BN","block_2_expand");
lgraph = connectLayers(lgraph,"block_1_project_BN","block_2_add/in2");
lgraph = connectLayers(lgraph,"block_2_project_BN","block_2_add/in1");
lgraph = connectLayers(lgraph,"block_3_project_BN","block_4_expand");
lgraph = connectLayers(lgraph,"block_3_project_BN","block_4_add/in2");
lgraph = connectLayers(lgraph,"block_4_project_BN","block_4_add/in1");
lgraph = connectLayers(lgraph,"block_4_add","block_5_expand");
lgraph = connectLayers(lgraph,"block_4_add","block_5_add/in2");
lgraph = connectLayers(lgraph,"block_5_project_BN","block_5_add/in1");
lgraph = connectLayers(lgraph,"block_6_project_BN","block_7_expand");
lgraph = connectLayers(lgraph,"block_6_project_BN","block_7_add/in2");
lgraph = connectLayers(lgraph,"block_7_project_BN","block_7_add/in1");
lgraph = connectLayers(lgraph,"block_7_add","block_8_expand");
lgraph = connectLayers(lgraph,"block_7_add","block_8_add/in2");
lgraph = connectLayers(lgraph,"block_8_project_BN","block_8_add/in1");
lgraph = connectLayers(lgraph,"block_8_add","block_9_expand");
lgraph = connectLayers(lgraph,"block_8_add","block_9_add/in2");
lgraph = connectLayers(lgraph,"block_9_project_BN","block_9_add/in1");
lgraph = connectLayers(lgraph,"block_10_project_BN","block_11_expand");
lgraph = connectLayers(lgraph,"block_10_project_BN","block_11_add/in2");
lgraph = connectLayers(lgraph,"block_11_project_BN","block_11_add/in1");
lgraph = connectLayers(lgraph,"block_11_add","block_12_expand");
lgraph = connectLayers(lgraph,"block_11_add","block_12_add/in2");
lgraph = connectLayers(lgraph,"block_12_project_BN","block_12_add/in1");
lgraph = connectLayers(lgraph,"block_13_project_BN","block_14_expand");
lgraph = connectLayers(lgraph,"block_13_project_BN","block_14_add/in2");
lgraph = connectLayers(lgraph,"block_14_project_BN","block_14_add/in1");
lgraph = connectLayers(lgraph,"block_14_add","block_15_expand");
lgraph = connectLayers(lgraph,"block_14_add","block_15_add/in2");
lgraph = connectLayers(lgraph,"block_15_project_BN","block_15_add/in1");
The parameter settings used for training and testing the network are as follows:
[attach]300[/attach]
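The exact values are shown in the figure above; a representative Adam setup consistent with the text would look like this (the specific numbers here are illustrative assumptions):

options = trainingOptions("adam", ...
    "InitialLearnRate",1e-3, ...
    "MaxEpochs",10, ...
    "MiniBatchSize",32, ...
    "Shuffle","every-epoch", ...
    "Plots","training-progress", ...
    "Verbose",false);
net = trainNetwork(XTrain,YTrain,lgraph,options);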
As the figure below shows, MobileNet V2's training-set classification accuracy and loss curves reach a steady state around the sixth training epoch.
[attach]294[/attach]
The confusion matrix for the trained MobileNet V2 model's predictions on the test set is shown below; the accuracy reaches 88.2%. A simple two-layer convolutional network previously achieved over 99% on the same task, so with this rather small dataset the deeper MobileNet V2 may be overfitting. The goal here is not to demonstrate MobileNet V2's performance, though, only to provide a reference implementation; tuning MobileNet V2 to improve its classification accuracy on this dataset is not discussed further.
[attach]295[/attach]
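The matrix above can be reproduced along these lines (assuming XTest/YTest were built the same way as XTrain/YTrain):

YPred = classify(net,XTest);              % predicted labels for the test images
accuracy = mean(YPred == YTest)           % ~0.882 for the run reported above
confusionchart(YTest,YPred);              % plot the confusion matrix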
MATLAB's activations function can be used to inspect the intermediate-layer features that the trained MobileNet V2 model extracts from the test set. The feature maps of the eighth layer, expanded_conv_project, for one coal image and one gangue image are shown in the two figures below:
[attach]296[/attach]
[attach]297[/attach]
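A sketch of how such feature maps are extracted and displayed (the choice of test image and the montage tiling are ours):

act = activations(net,XTest(:,:,:,1),"expanded_conv_project");   % H x W x 16 feature maps
act = rescale(act);                                              % scale into [0,1] for display
montage(reshape(act,size(act,1),size(act,2),1,[]))               % tile the 16 channels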
A t-SNE plot of the features from layer 151 of MobileNet V2 gives a fairly intuitive view of the dataset's class structure: the samples roughly cluster into two groups, but some are nearly inseparable, which is consistent with the final classification accuracy.
[attach]298[/attach]
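A sketch of that visualization (in the layer graph above, layer 151 is global_average_pooling2d_1, whose output is the 1280-dimensional feature vector; tsne requires the Statistics and Machine Learning Toolbox):

feat = activations(net,XTest,"global_average_pooling2d_1","OutputAs","rows");
Y2 = tsne(feat);                          % embed the 1280-D features in 2-D
gscatter(Y2(:,1),Y2(:,2),YTest)           % one color per class: coal vs gangue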
References:
Paper: MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Paper: MobileNetV2: Inverted Residuals and Linear Bottlenecks
Paper: Searching for MobileNetV3