ResNet


method

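ResNet introduces the idea of residual connections: for each block of layers, instead of learning the desired mapping $\mathcal{H}(x)$ directly, we learn the residual $\mathcal{F}(x) = \mathcal{H}(x) - x$, and the block outputs $\mathcal{F}(x) + x$. In other words, a block does not have to relearn what the earlier layers already represent; it only has to learn the difference between the two.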
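A minimal sketch of this reformulation, assuming a toy residual branch made of a single 3x3 convolution (the Residual wrapper is a hypothetical helper, not part of torchvision). The block outputs F(x) + x, so when every weight in the branch is zero the block is exactly the identity mapping. This is the intuition behind residual learning: driving F toward zero is easier than fitting an identity with a stack of nonlinear layers.

```python
import torch
import torch.nn as nn


class Residual(nn.Module):
    """Hypothetical wrapper computing y = F(x) + x for a shape-preserving branch F."""

    def __init__(self, fn: nn.Module):
        super().__init__()
        self.fn = fn

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fn(x) + x  # the branch only has to model F(x) = H(x) - x


# toy residual branch: a single 3x3 conv that keeps the shape [N, 8, H, W]
branch = nn.Conv2d(8, 8, kernel_size=3, padding=1)
block = Residual(branch)

# zero the branch's parameters so that F(x) == 0 for every input
nn.init.zeros_(branch.weight)
nn.init.zeros_(branch.bias)

x = torch.randn(2, 8, 16, 16)
y = block(x)
print(torch.allclose(y, x))  # True: with F == 0 the block is the identity mapping
```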

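Note that ResNet uses two kinds of shortcut connections. When the input and output dimensions are equal, the identity shortcut is used:

$$ y = \mathcal{F}(x, \{W_i\}) + x $$

When they differ, a linear projection $W_s$ on the shortcut matches the dimensions:

$$ y = \mathcal{F}(x, \{W_i\}) + W_s x $$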

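When a shortcut crosses a change in dimensions (more channels, smaller feature map), there are two options for resolving the mismatch: 1) zero-pad the extra channels of the identity, or 2) use a projection shortcut. The ablation experiments in the paper show that the choice makes no essential difference: projection is only marginally better, so projections are not essential.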
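The two options for a shortcut that goes from C_in to C_out channels while halving the resolution can be sketched as below. This is a minimal sketch assuming stride 2 and C_out >= C_in; zero_pad_shortcut and projection_shortcut are hypothetical helper names. Option 1 subsamples the identity spatially and fills the extra channels with zeros, adding no parameters; option 2 is the projection W_s, implemented as a 1x1 convolution with stride 2 followed by BatchNorm, which matches the downsample module torchvision builds.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def zero_pad_shortcut(x: torch.Tensor, out_channels: int, stride: int = 2) -> torch.Tensor:
    """Option 1 (parameter-free): subsample spatially, then zero-pad the channel dim."""
    x = x[:, :, ::stride, ::stride]      # keep every `stride`-th pixel in H and W
    extra = out_channels - x.shape[1]    # number of channels to add
    # F.pad pads from the last dim backwards: (W_left, W_right, H_top, H_bottom, C_front, C_back)
    return F.pad(x, (0, 0, 0, 0, 0, extra))


def projection_shortcut(in_channels: int, out_channels: int, stride: int = 2) -> nn.Module:
    """Option 2: learned projection W_s as a 1x1 conv (+ BatchNorm, as in torchvision)."""
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False),
        nn.BatchNorm2d(out_channels),
    )


x = torch.randn(2, 64, 56, 56)
a = zero_pad_shortcut(x, 128)
b = projection_shortcut(64, 128)(x)
print(a.shape, b.shape)                       # both torch.Size([2, 128, 28, 28])
print(torch.count_nonzero(a[:, 64:]).item())  # 0: the padded channels are all zeros
```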

The paper tabulates the configurations of ResNets of different depths (18, 34, 50, 101 and 152 layers).
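The block type and the number of blocks in each of the four stages for the standard variants are summarized below as a small Python table (the values match the paper and torchvision's resnet18 ... resnet152 constructors; CONFIGS and depth are illustrative names). Counting the stem convolution, the convolutions inside all blocks, and the final fc layer recovers the depth in each model's name.

```python
# block type and blocks per stage (layer1..layer4) for the standard variants
CONFIGS = {
    "resnet18": ("BasicBlock", [2, 2, 2, 2]),
    "resnet34": ("BasicBlock", [3, 4, 6, 3]),
    "resnet50": ("Bottleneck", [3, 4, 6, 3]),
    "resnet101": ("Bottleneck", [3, 4, 23, 3]),
    "resnet152": ("Bottleneck", [3, 8, 36, 3]),
}

# weighted layers per block: BasicBlock has two 3x3 convs, Bottleneck has 1x1-3x3-1x1
CONVS_PER_BLOCK = {"BasicBlock": 2, "Bottleneck": 3}


def depth(block: str, layers: list) -> int:
    """Stem conv1 + convs in all blocks + final fc.
    The 1x1 projections on the shortcuts are not counted, matching the model names."""
    return 1 + CONVS_PER_BLOCK[block] * sum(layers) + 1


for name, (block, layers) in CONFIGS.items():
    print(name, block, layers, depth(block, layers))
```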

code

BasicBlock

See resnet.py.

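ResNet provides two original block types: BasicBlock and Bottleneck. We won't go into the differences between them here and use BasicBlock (the block used by ResNet-18/34) as the example: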

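import torch.nn as nn


def conv3x3(in_planes, out_planes, stride=1, groups=1, dilation=1):
    """3x3 convolution with padding"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=dilation, groups=groups, bias=False, dilation=dilation)


def conv1x1(in_planes, out_planes, stride=1):
    """1x1 convolution (used by the downsample shortcut)"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, bias=False)


class BasicBlock(nn.Module):
    expansion = 1  # output channels = planes * expansion

    def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1,
                 base_width=64, dilation=1, norm_layer=None):
        super(BasicBlock, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        if groups != 1 or base_width != 64:
            raise ValueError('BasicBlock only supports groups=1 and base_width=64')
        if dilation > 1:
            raise NotImplementedError("Dilation > 1 not supported in BasicBlock")
        # Both self.conv1 and self.downsample layers downsample the input when stride != 1
        self.conv1 = conv3x3(in_planes=inplanes, out_planes=planes, stride=stride)
        self.bn1 = norm_layer(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes)
        self.bn2 = norm_layer(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        # residual branch F(x): conv-bn-relu-conv-bn
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            identity = self.downsample(x) # project the shortcut to match out's shape

        out += identity  # residual connection: y = F(x) + x
        out = self.relu(out)

        return out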

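Each BasicBlock corresponds to one residual building block of the paper: two 3x3 convolutions, each followed by BatchNorm, with a ReLU after the first convolution and another after the shortcut addition.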

In __init__, the parameters inplanes and planes are the number of input channels and output channels, respectively. A simple example:

>>> basicblockimpl = BasicBlock(64, 128)
>>> print(basicblockimpl)
BasicBlock(
  (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)

Its forward method is as follows:

def forward(self, x):
    identity = x

    out = self.conv1(x)
    out = self.bn1(out)
    out = self.relu(out)

    out = self.conv2(out)
    out = self.bn2(out)

    if self.downsample is not None:
        identity = self.downsample(x) # project the shortcut to match out's shape

    out += identity
    out = self.relu(out)

    return out

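Note the residual connection (out += identity). To make the dimensions match, forward also contains self.downsample: a 1x1 Conv2d (followed by BatchNorm2d) whose output channel count differs from its input channel count and which, in torchvision, uses the block's stride. It is explained in more detail in the ResNet section below. For example, the downsample of the first block of layer4 is: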

(downsample): Sequential(
    (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
    (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)

ResNet

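Looking at its forward function (_forward_impl), we go through the modules one by one (the shape comments assume a batch of two 224x224 RGB images):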

def _forward_impl(self, x): # [2, 3, 224, 224]
    x = self.conv1(x) # [2, 64, 112, 112]
    x = self.bn1(x) 
    x = self.relu(x) 
    x = self.maxpool(x)  # [2, 64, 56, 56]

    x = self.layer1(x) # [2, 64, 56, 56]
    x = self.layer2(x) # [2, 128, 28, 28]
    x = self.layer3(x) # [2, 256, 14, 14]
    x = self.layer4(x) # [2, 512, 7, 7]

    x = self.avgpool(x) # [2, 512, 1, 1]
    x = torch.flatten(x, 1) # [2, 512]
    x = self.fc(x) # [2, 1000]

    return x

self.conv1(x) uses a 7x7 kernel with stride 2, changing the number of channels from 3 to 64 and halving the height and width of the feature map.

(conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)

self.maxpool(x) (3x3 window, stride 2) halves the height and width of the feature map again.

(maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
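The halving in both steps follows from the output-size formula given in the PyTorch Conv2d and MaxPool2d documentation, out = floor((in + 2*padding - dilation*(kernel - 1) - 1) / stride) + 1. A small pure-Python helper (conv_out is a hypothetical name) reproduces the numbers and also shows that, in the first block of layer2, the 3x3 stride-2 convolution and the 1x1 stride-2 downsample both produce 28x28.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0, dilation: int = 1) -> int:
    """Output length of Conv2d / MaxPool2d along one spatial dim (floor mode, as in PyTorch)."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


stem = conv_out(224, kernel=7, stride=2, padding=3)   # conv1
pooled = conv_out(stem, kernel=3, stride=2, padding=1)  # maxpool
branch = conv_out(56, kernel=3, stride=2, padding=1)  # layer2[0].conv1
shortcut = conv_out(56, kernel=1, stride=2)            # layer2[0].downsample
print(stem, pooled, branch, shortcut)                  # 112 56 28 28
```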

Next comes the main body of the model:

self.layer1 = self._make_layer(block, 64, layers[0])
self.layer2 = self._make_layer(block, 128, layers[1], stride=2,
                               dilate=replace_stride_with_dilation[0])
self.layer3 = self._make_layer(block, 256, layers[2], stride=2,
                               dilate=replace_stride_with_dilation[1])
self.layer4 = self._make_layer(block, 512, layers[3], stride=2,
                               dilate=replace_stride_with_dilation[2])

First, what the four layers have in common: 64/128/256/512 are the output channel counts, and layers[i] is the number of blocks in the (i+1)-th layer.

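Note how the arguments of layer1 differ from those of the other layers: no stride is passed, so it defaults to 1. This determines two things.

The first is the form of the first BasicBlock in each layer: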

## layer 1
(0): BasicBlock(
  (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)

## layer 2
(0): BasicBlock(
  (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (downsample): Sequential(
    (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
    (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
)

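Within a layer, the first block shrinks the feature map and increases the number of channels (except when stride=1, as in layer1, where both stay unchanged). The remaining blocks keep the feature map size and channel count unchanged: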

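layers = []
# first block: may change stride and channels, and carries the downsample shortcut
layers.append(block(self.inplanes, planes, stride, downsample, self.groups,
                    self.base_width, previous_dilation, norm_layer))
self.inplanes = planes * block.expansion
# remaining blocks: stride 1 and inplanes == planes * expansion, so no downsample
for _ in range(1, blocks):
    layers.append(block(self.inplanes, planes, groups=self.groups,
                        base_width=self.base_width, dilation=self.dilation,
                        norm_layer=norm_layer))

return nn.Sequential(*layers)

For example, layer2 of resnet34 (layers[1] = 4 blocks) is printed as:
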
(layer2): Sequential(
(0): BasicBlock(
  (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (downsample): Sequential(
    (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
    (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
)
(1): BasicBlock(
  (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(2): BasicBlock(
  (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(3): BasicBlock(
  (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)

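It also determines whether a block contains a downsample layer. In torchvision one is created only when stride != 1 or self.inplanes != planes * block.expansion, which with BasicBlock means only the first block of layer2, layer3 and layer4: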

downsample = nn.Sequential(
    conv1x1(self.inplanes, planes * block.expansion, stride),
    norm_layer(planes * block.expansion),
)

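When a residual connection spans a change of dimensions, the residual branch and the shortcut no longer have the same shape, so the identity must be downsampled to match. For the first block of layer2, the residual branch output and the block input before and after downsample are: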

out: torch.Size([2, 128, 28, 28])
in before: torch.Size([2, 64, 56, 56])
in after: torch.Size([2, 128, 28, 28])

self.avgpool is a torch.nn.AdaptiveAvgPool2d; you only need to specify the target size (H_0, W_0) of the last two dimensions, which ResNet sets to (1, 1).
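With output size (1, 1), AdaptiveAvgPool2d is a global average pool: it reduces any H x W feature map to 1 x 1, so the fc layer always receives 512 features regardless of the input resolution. A quick check using only torch (the second feature-map size is an arbitrary example):

```python
import torch
import torch.nn as nn

avgpool = nn.AdaptiveAvgPool2d((1, 1))  # same setting as torchvision's ResNet

results = {}
for hw in [(7, 7), (10, 13)]:
    feat = torch.randn(2, 512, *hw)
    out = avgpool(feat)
    # with output size (1, 1) this is simply the mean over H and W
    matches_mean = torch.allclose(out, feat.mean(dim=(2, 3), keepdim=True), atol=1e-6)
    results[hw] = (tuple(out.shape), matches_mean)
    print(hw, results[hw])  # ((2, 512, 1, 1), True) for both sizes
```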

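self.fc is a linear layer with 512 * block.expansion inputs that maps the pooled features to scores for the 1000 ImageNet classes (num_classes=1000 by default).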

Finally, a summary of how to use the model (taking resnet34 as an example):

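# excerpt from resnet.py (older torchvision; since 0.13, weights= replaces pretrained=)
def resnet34(pretrained=False, progress=True, **kwargs):
    return _resnet('resnet34', BasicBlock, [3, 4, 6, 3], pretrained, progress, **kwargs)

# layer1 to layer4 contain 3, 4, 6 and 3 BasicBlocks respectively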

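import torch
from torchvision.models import resnet34

resnet34impl = resnet34()  # randomly initialized, no pretrained weights
img = torch.randn([2, 3, 224, 224])
out = resnet34impl(img)
print(out.shape) # torch.Size([2, 1000])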