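Source files

Both the training script train_net.py and the test script test_net.py call build_detection_model(cfg) to create the model. This function hides the details of the model definition, so different kinds of models can be assembled just by editing a configuration file. To understand those details we need to know how the individual components are defined and how they are combined, which means reading the modeling folder of MaskrcnnBenchmark. Its layout is shown below (everything lives under ./maskrcnn_benchmark/modeling/):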

backbone
detector
- detectors.py
- generalized_rcnn.py
roi_heads
- box_head
- mask_head
  - inference.py
  - loss.py
  - mask_head.py
  - roi_mask_feature_extractors.py
rpn
balanced_positive_negative_sampler.py
box_coder.py
matcher.py
poolers.py
registry.py
utils.py

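Below we walk through the model definition module of MaskrcnnBenchmark in detail, following the logical relationships between files and functions rather than the file order listed above. Reading the sections in the order given here is enough to get a thorough picture of this part of the code.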

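detector: the model definition entry point

The first part is the detector folder. Its two files define the entry point of the whole modeling module and are analysed below.

detectors.py walkthrough

The first file, detectors.py, is only a few lines long. Its main job is to instantiate a GeneralizedRCNN object from the given configuration: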

# ./maskrcnn_benchmark/modeling/detector/detectors.py

from .generalized_rcnn import GeneralizedRCNN

_DETECTION_META_ARCHITECTURES = {"GeneralizedRCNN": GeneralizedRCNN}

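# Entry point for building a model, and the only model-building function
def build_detection_model(cfg):
    # Look up the meta-architecture class in the dict above. The dict has a single
    # entry for now, but it makes later extension easy.
    meta_arch = _DETECTION_META_ARCHITECTURES[cfg.MODEL.META_ARCHITECTURE]
    # With the default config, the statement below is equivalent to
    # return GeneralizedRCNN(cfg)
    return meta_arch(cfg)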

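The code above uses the configuration cfg to instantiate the GeneralizedRCNN class, which is defined in ./maskrcnn_benchmark/modeling/detector/generalized_rcnn.py. That file is covered in the next section.

generalized_rcnn.py walkthrough

This file defines the GeneralizedRCNN class of MaskrcnnBenchmark, which represents the object detection model obtained from any combination of components: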

import torch
from torch import nn

# This function is defined in ./maskrcnn_benchmark/structures/image_list.py
from maskrcnn_benchmark.structures.image_list import to_image_list

from ..backbone import build_backbone
from ..rpn.rpn import build_rpn
from ..roi_heads.roi_heads import build_roi_heads

# Class definition
class GeneralizedRCNN(nn.Module):
    # Common abstraction for all models in MaskrcnnBenchmark; it currently supports
    # boxes and masks as labels.
    # It has three main parts:
    # - backbone
    # - rpn (optional)
    # - heads: take the features and proposals produced upstream and compute detections / masks.

    def __init__(self, cfg):  # build the model from the config
        super(GeneralizedRCNN, self).__init__()

        # build the backbone network from the config
        self.backbone = build_backbone(cfg)

        # build the RPN from the config
        self.rpn = build_rpn(cfg)

        # build the roi_heads from the config
        self.roi_heads = build_roi_heads(cfg)

    def forward(self, images, targets=None):  # forward pass
        # images (list[Tensor] or ImageList)
        # targets (list[BoxList])
        # returns: result (list[BoxList] or dict[Tensor])
        # During training the dict of losses is returned; during testing the predictions are returned.

        # targets are required when self.training is True
        if self.training and targets is None:
            raise ValueError("In training mode, targets should be passed")

        images = to_image_list(images)  # convert the images to an ImageList

        # extract features with the backbone
        features = self.backbone(images.tensors)

        # get the proposals and the corresponding losses from the RPN
        proposals, proposal_losses = self.rpn(images, features, targets)
        if self.roi_heads:  # if roi_heads exist, compute their outputs
            x, result, detector_losses = self.roi_heads(features, proposals, targets)
        else:
            # RPN-only models don't have roi_heads
            x = features
            result = proposals
            detector_losses = {}
        if self.training:  # in training mode, return the losses
            losses = {}
            losses.update(detector_losses)
            losses.update(proposal_losses)
            return losses

        return result  # outside training mode, return the predictions
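The branching in forward can be exercised without PyTorch. The sketch below is not the library code: run_detector and the three stub callables are hypothetical stand-ins, and the loss names and values are purely illustrative. It only shows that training mode returns the merged loss dict while evaluation returns predictions, and that an RPN-only model returns the proposals directly.

```python
def run_detector(backbone, rpn, roi_heads, images, targets=None, training=False):
    """Mimic the branching of GeneralizedRCNN.forward with plain callables."""
    if training and targets is None:
        raise ValueError("In training mode, targets should be passed")
    features = backbone(images)
    proposals, proposal_losses = rpn(images, features, targets)
    if roi_heads:
        _, result, detector_losses = roi_heads(features, proposals, targets)
    else:
        # RPN-only: the proposals are the final result
        result, detector_losses = proposals, {}
    if training:
        losses = {}
        losses.update(detector_losses)
        losses.update(proposal_losses)
        return losses
    return result


# Hypothetical stubs; real modules return tensors and BoxList objects.
def fake_backbone(images):
    return ["feat"]


def fake_rpn(images, features, targets):
    losses = {"loss_objectness": 0.5} if targets is not None else {}
    return ["proposal"], losses


def fake_heads(features, proposals, targets):
    losses = {"loss_classifier": 1.0} if targets is not None else {}
    return features, ["detection"], losses


train_out = run_detector(fake_backbone, fake_rpn, fake_heads, ["img"],
                         targets=["gt"], training=True)
eval_out = run_detector(fake_backbone, fake_rpn, fake_heads, ["img"])
rpn_only_out = run_detector(fake_backbone, fake_rpn, None, ["img"])
```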

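In the code above, to_image_list comes from the structures module of MaskrcnnBenchmark; see the structures write-up for details. The listing also shows that building a MaskrcnnBenchmark model relies on three functions: build_backbone(cfg), build_rpn(cfg) and build_roi_heads(cfg). We now go through their implementations in the order in which the model is defined.

backbone: the model backbone

The backbone/ folder under modeling/ holds the code for the model backbone. It contains three main files: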

backbone.py
fpn.py
resnet.py

backbone.py walkthrough

The build_backbone(cfg) function used to define the backbone lives in ./maskrcnn_benchmark/modeling/backbone/backbone.py, so we start with that file.

from collections import OrderedDict  # ordered dictionary

from torch import nn

# Registry that manages module registration, so builders can be looked up like dict entries
from maskrcnn_benchmark.modeling import registry

from . import fpn as fpn_module  # file in the same folder, covered later
from . import resnet  # file in the same folder, covered later

# Build a ResNet backbone; called by build_backbone() below depending on the config
@registry.BACKBONES.register("R-50-C4")
def build_resnet_backbone(cfg):
    body = resnet.ResNet(cfg)  # class ResNet in resnet.py
    model = nn.Sequential(OrderedDict([("body", body)]))  # wrap the body in nn.Sequential
    return model

# Build a ResNet + FPN backbone; called by build_backbone() below depending on the config
@registry.BACKBONES.register("R-50-FPN")
@registry.BACKBONES.register("R-101-FPN")
def build_resnet_fpn_backbone(cfg):
    body = resnet.ResNet(cfg)  # build the ResNet first

    # channel parameters needed by the FPN
    in_channels_stage2 = cfg.MODEL.RESNETS.RES2_OUT_CHANNELS
    out_channels = cfg.MODEL.BACKBONE.OUT_CHANNELS
    fpn = fpn_module.FPN(  # build the FPN with class FPN from fpn.py
        in_channels_list=[
            in_channels_stage2,
            in_channels_stage2 * 2,
            in_channels_stage2 * 4,
            in_channels_stage2 * 8,
        ],
        out_channels=out_channels,
        top_blocks=fpn_module.LastLevelMaxPool(),
    )
    model = nn.Sequential(OrderedDict([("body", body), ("fpn", fpn)]))
    return model

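The two functions above define how a ResNet backbone and a ResNet + FPN backbone are built. build_backbone then selects one of them by looking up cfg.MODEL.BACKBONE.CONV_BODY in the registry: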

def build_backbone(cfg):
    assert cfg.MODEL.BACKBONE.CONV_BODY in registry.BACKBONES, \
        "cfg.MODEL.BACKBONE.CONV_BODY: {} are not registered in registry".format(
            cfg.MODEL.BACKBONE.CONV_BODY
        )
    return registry.BACKBONES[cfg.MODEL.BACKBONE.CONV_BODY](cfg)
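The decorator-based lookup can be illustrated with a stripped-down registry. This is a minimal sketch, not the class from registry.py: SimpleRegistry, build_fpn_stub and build_backbone_stub are hypothetical names, and the config is a plain dict instead of a yacs node. It shows how stacking two register decorators maps two config strings to the same builder.

```python
class SimpleRegistry(dict):
    """A dict whose register() method can be used as a decorator."""

    def register(self, name):
        def decorator(fn):
            self[name] = fn
            return fn  # hand the function back so decorators can be stacked
        return decorator


BACKBONES = SimpleRegistry()


@BACKBONES.register("R-50-FPN")
@BACKBONES.register("R-101-FPN")
def build_fpn_stub(cfg):
    # A real builder would construct nn.Module objects here.
    return "fpn backbone for " + cfg["CONV_BODY"]


def build_backbone_stub(cfg):
    name = cfg["CONV_BODY"]
    assert name in BACKBONES, "{} is not registered".format(name)
    return BACKBONES[name](cfg)


built = build_backbone_stub({"CONV_BODY": "R-101-FPN"})
```

Adding a new backbone then only requires decorating a new builder function; build_backbone_stub itself never changes.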

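resnet.py: the network body (feature extractor)

Both functions in backbone.py from the previous section, build_resnet_backbone() and build_resnet_fpn_backbone(), use body = resnet.ResNet(cfg) to create the body of the network. That code is defined in ./maskrcnn_benchmark/modeling/backbone/resnet.py, which we analyse next. Because the file is long, we first look at its overall structure: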

# ./maskrcnn_benchmark/modeling/backbone/resnet.py

# Imports
# ...
from maskrcnn_benchmark.layers import FrozenBatchNorm2d
# ...

# ResNet stage specification
StageSpec = ...  # a namedtuple, shown in the next section

# ResNet
class ResNet(nn.Module):
    def __init__(self, cfg):
        super(ResNet, self).__init__()
        # initialisation
        # ...

    def _freeze_backbone(self, freeze_at):
        # set requires_grad = False on the selected parameters
        # ...

    def forward(self, x):
        # forward pass of the ResNet
        # ...

# ResNetHead
class ResNetHead(nn.Module):
    def __init__(...):
        # initialisation
        # ...

    def forward(self, x):
        # forward pass of the ResNetHead
        # ...

def _make_stage(...):
    # build the residual blocks of one ResNet stage
    # ...

class BottleneckWithFixedBatchNorm(nn.Module):
    # uses frozen BN
    def __init__(...):
        # initialisation
        # ...
    def forward(self, x):
        # forward pass
        # ...

class StemWithFixedBatchNorm(nn.Module):
    def __init__(self, cfg):
        # initialisation
        # ...

    def forward(self, x):
        # forward pass
        # ...

_TRANSFORMATION_MODULES = Registry({...})

_STEM_MODULES = Registry({...})

_STAGE_SPECS = Registry({...})

ResNet Stage Specification

The file starts by describing the blocks in each ResNet stage. It uses a namedtuple (a tuple whose elements can be accessed by name), as shown below:

StageSpec = namedtuple(
    "StageSpec",
    [
        "index",  # stage index, e.g. 1, 2, 3, 4 (index 1 corresponds to conv2_x)
        "block_count",  # number of blocks in the stage
        "return_features",  # bool; if True, the feature map of the last layer of this stage is returned
    ],
)

# Standard ResNet configurations

# ResNet-50 full stages: conv2_x~conv5_x contain 3, 4, 6, 3 bottleneck blocks
ResNet50StagesTo5 = tuple(  # every element of the tuple is a StageSpec
    StageSpec(index=i, block_count=c, return_features=r)
    for (i, c, r) in ((1, 3, False), (2, 4, False), (3, 6, False), (4, 3, True))
)

# ResNet-50-C4: only the feature map output by the fourth stage is used
ResNet50StagesTo4 = tuple(
    StageSpec(index=i, block_count=c, return_features=r)
    for (i, c, r) in ((1, 3, False), (2, 4, False), (3, 6, True))
)

# ResNet-50-FPN full stages: FPN needs the feature map of every stage, so return_features is True everywhere
ResNet50FPNStagesTo5 = tuple(
    StageSpec(index=i, block_count=c, return_features=r)
    for (i, c, r) in ((1, 3, True), (2, 4, True), (3, 6, True), (4, 3, True))
)

# ResNet-101-FPN full stages: the block counts are 3, 4, 23, 3
ResNet101FPNStagesTo5 = tuple(
    StageSpec(index=i, block_count=c, return_features=r)
    for (i, c, r) in ((1, 3, True), (2, 4, True), (3, 23, True), (4, 3, True))
)
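The block counts above also explain the names ResNet-50 and ResNet-101. As a small worked example (conv_layer_count is a hypothetical helper, not part of resnet.py): each bottleneck block has three convolutions, and the classification network adds the stem convolution and a final fully connected layer, which the detection backbone drops.

```python
from collections import namedtuple

StageSpec = namedtuple("StageSpec", ["index", "block_count", "return_features"])


def conv_layer_count(block_counts):
    # 3 convolutions per bottleneck block, plus the stem conv and the final fc layer
    return 3 * sum(block_counts) + 2


resnet50_blocks = (3, 4, 6, 3)
resnet101_blocks = (3, 4, 23, 3)
depth50 = conv_layer_count(resnet50_blocks)
depth101 = conv_layer_count(resnet101_blocks)

# Rebuild an FPN-style spec and list the stages whose features are returned.
fpn_specs = tuple(
    StageSpec(index=i, block_count=c, return_features=True)
    for i, c in enumerate(resnet50_blocks, 1)
)
returned = [spec.index for spec in fpn_specs if spec.return_features]
```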

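The ResNet class

To avoid confusion while reading the code, we first show the registries defined at the end of the file. The strings in the configuration file select which class or specification is used through these registries: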

_TRANSFORMATION_MODULES = Registry({
    "BottleneckWithFixedBatchNorm": BottleneckWithFixedBatchNorm
})

_STEM_MODULES = Registry({"StemWithFixedBatchNorm": StemWithFixedBatchNorm})

_STAGE_SPECS = Registry({
    "R-50-C4": ResNet50StagesTo4,
    "R-50-C5": ResNet50StagesTo5,
    "R-50-FPN": ResNet50FPNStagesTo5,
    "R-101-FPN": ResNet101FPNStagesTo5,
})

With the block counts of each ResNet variant defined, we can look at the implementation of the ResNet class:

# ./maskrcnn_benchmark/modeling/backbone/resnet.py

class ResNet(nn.Module):
    def __init__(self, cfg):
        super(ResNet, self).__init__()

        # If cfg were needed inside forward, a copy should be stored for that purpose:
        # self.cfg = cfg.clone()

        # Turn the strings in the config into concrete implementations. The three lookups
        # below use the registries defined at the end of the file.

        # stem implementation, i.e. the first stage of ResNet (conv1)
        # cfg.MODEL.RESNETS.STEM_FUNC = "StemWithFixedBatchNorm"
        stem_module = _STEM_MODULES[cfg.MODEL.RESNETS.STEM_FUNC]

        # specification of conv2_x ~ conv5_x
        # e.g. cfg.MODEL.BACKBONE.CONV_BODY = "R-50-FPN"
        stage_specs = _STAGE_SPECS[cfg.MODEL.BACKBONE.CONV_BODY]

        # residual transformation function
        # cfg.MODEL.RESNETS.TRANS_FUNC = "BottleneckWithFixedBatchNorm"
        transformation_module = _TRANSFORMATION_MODULES[cfg.MODEL.RESNETS.TRANS_FUNC]

        # With the implementations of all parts in hand, the model can be built

        # build the stem module (stage 1 of ResNet, i.e. conv1)
        self.stem = stem_module(cfg)

        # gather the parameters needed to build the remaining stages

        # num_groups=1 gives ResNet, >1 gives ResNeXt
        num_groups = cfg.MODEL.RESNETS.NUM_GROUPS

        # bottleneck width of each group
        width_per_group = cfg.MODEL.RESNETS.WIDTH_PER_GROUP

        # in_channels is the number of channels fed into stage 2,
        # i.e. the number of channels output by the stem, 64 by default
        in_channels = cfg.MODEL.RESNETS.STEM_OUT_CHANNELS

        # bottleneck (compressed) channels of stage 2
        stage2_bottleneck_channels = num_groups * width_per_group

        # output channels of stage 2; for the standard ResNet family the later stages follow from it:
        # with the default of 256 they are 512, 1024, 2048; with 64 they are 128, 256, 512
        stage2_out_channels = cfg.MODEL.RESNETS.RES2_OUT_CHANNELS

        # empty list of stage names and a dict of return_features flags
        self.stages = []
        self.return_features = {}

        for stage_spec in stage_specs:  # stage_specs is described in the previous section
            name = "layer" + str(stage_spec.index)

            # channel multiplier relative to stage 2; channels double with every stage
            stage2_relative_factor = 2 ** (stage_spec.index - 1)

            # bottleneck channels of this stage
            bottleneck_channels = stage2_bottleneck_channels * stage2_relative_factor

            # output channels of this stage
            out_channels = stage2_out_channels * stage2_relative_factor

            # With all parameters available, call `_make_stage` from this file,
            # which builds the module for this stage (a module, not a whole model)
            module = _make_stage(
                transformation_module,
                in_channels,  # input channels
                bottleneck_channels,  # channels after compression
                out_channels,  # output channels
                stage_spec.block_count,  # number of blocks in this stage
                num_groups,  # 1 for ResNet, >1 for ResNeXt
                cfg.MODEL.RESNETS.STRIDE_IN_1X1,
                # stages 3~5 start with stride=2 to downsample
                first_stride=int(stage_spec.index > 1) + 1,
            )

            # the input channels of the next stage are the output channels of this one
            in_channels = out_channels

            # register the stage module on the model
            self.add_module(name, module)

            # remember the stage name
            self.stages.append(name)

            # remember whether this stage should return its features
            self.return_features[name] = stage_spec.return_features

        # optionally freeze some layers (requires_grad=False) according to the config
        self._freeze_backbone(cfg.MODEL.BACKBONE.FREEZE_CONV_BODY_AT)

    def _freeze_backbone(self, freeze_at):
        # stop parameter updates for the first freeze_at stages
        for stage_index in range(freeze_at):
            if stage_index == 0:
                m = self.stem  # the first stage of ResNet is the stem
            else:
                m = getattr(self, "layer" + str(stage_index))
            # mark every parameter of m as not trainable
            for p in m.parameters():
                p.requires_grad = False

    # forward pass of the ResNet
    def forward(self, x):
        outputs = []
        x = self.stem(x)  # stem first (stage 1)

        # then stages 2~5 in order
        for stage_name in self.stages:
            x = getattr(self, stage_name)(x)
            if self.return_features[stage_name]:
                # collect the feature map of every stage flagged with return_features
                outputs.append(x)

        # outputs is a list of per-stage feature maps, which is exactly what FPN takes as input
        return outputs
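The channel arithmetic inside the loop is easier to follow with concrete numbers. The sketch below re-implements only that arithmetic in plain Python; stage_plan is a hypothetical helper, and the default arguments are assumed to match the usual ResNet-50 configuration (stem output 64, stage-2 output 256, one group of width 64).

```python
def stage_plan(stage_indices, num_groups=1, width_per_group=64,
               stem_out_channels=64, res2_out_channels=256):
    """Return (name, in, bottleneck, out, first_stride) for each stage."""
    in_channels = stem_out_channels
    plan = []
    for index in stage_indices:
        factor = 2 ** (index - 1)  # channels double with every stage
        bottleneck = num_groups * width_per_group * factor
        out_channels = res2_out_channels * factor
        first_stride = int(index > 1) + 1  # 1 for the first stage, 2 afterwards
        plan.append(("layer" + str(index), in_channels, bottleneck,
                     out_channels, first_stride))
        in_channels = out_channels  # feeds the next stage
    return plan


plan = stage_plan([1, 2, 3, 4])
for row in plan:
    print(row)
```

The output channels of the four stages are the values that build_resnet_fpn_backbone passes to FPN as in_channels_list.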

The ResNetHead class

Next comes the implementation of the ResNetHead class:

class ResNetHead(nn.Module):
    def __init__(
        self,
        block_module,
        stages,
        num_groups=1,
        width_per_group=64,
        stride_in_1x1=True,
        stride_init=None,
        res2_out_channels=256,
    ):
        super(ResNetHead, self).__init__()

        # channel multiplier of the first stage relative to stage 2
        stage2_relative_factor = 2 ** (stages[0].index - 1)

        # bottleneck (compressed) channels of stage 2
        stage2_bottleneck_channels = num_groups * width_per_group

        # output channels
        out_channels = res2_out_channels * stage2_relative_factor

        # input channels
        in_channels = out_channels // 2

        # bottleneck channels
        bottleneck_channels = stage2_bottleneck_channels * stage2_relative_factor

        # look up the block module by name;
        # _TRANSFORMATION_MODULES currently only contains "BottleneckWithFixedBatchNorm"
        block_module = _TRANSFORMATION_MODULES[block_module]

        # empty list of stage names
        self.stages = []

        # initial stride
        stride = stride_init

        for stage in stages:
            name = "layer" + str(stage.index)
            if not stride:
                # stages 3~5 start with stride=2 to downsample
                stride = int(stage.index > 1) + 1
            module = _make_stage(
                block_module,
                in_channels,
                bottleneck_channels,
                out_channels,
                stage.block_count,
                num_groups,
                stride_in_1x1,
                first_stride=stride,
            )
            stride = None
            self.add_module(name, module)
            self.stages.append(name)

    # forward pass
    def forward(self, x):
        for stage in self.stages:
            x = getattr(self, stage)(x)
        return x

_make_stage

Both classes above use _make_stage() to create a stage. Its implementation is shown below:

# ./maskrcnn_benchmark/modeling/backbone/resnet.py

def _make_stage(
    transformation_module,
    in_channels,
    bottleneck_channels,
    out_channels,
    block_count,
    num_groups,
    stride_in_1x1,
    first_stride,
):
    blocks = []
    stride = first_stride
    for _ in range(block_count):
        blocks.append(
            transformation_module(
                in_channels,
                bottleneck_channels,
                out_channels,
                num_groups,
                stride_in_1x1,
                stride,
            )
        )
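        stride = 1  # only the first block of a stage may downsample
        in_channels = out_channels  # later blocks take the stage's output width as input
    return nn.Sequential(*blocks)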
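What _make_stage hands to each block can be traced in plain Python. This is a sketch under the assumption that the block adds a projection shortcut exactly when its input and output channel counts differ, as BottleneckWithFixedBatchNorm does; plan_blocks is a hypothetical helper.

```python
def plan_blocks(in_channels, out_channels, block_count, first_stride):
    """Return (in_channels, out_channels, stride, needs_projection) per block."""
    blocks = []
    stride = first_stride
    for _ in range(block_count):
        needs_projection = in_channels != out_channels
        blocks.append((in_channels, out_channels, stride, needs_projection))
        stride = 1  # only the first block downsamples
        in_channels = out_channels  # later blocks keep the channel count
    return blocks


# Second residual stage of a ResNet-50: 4 blocks, 256 -> 512 channels, stride 2.
layer2 = plan_blocks(256, 512, 4, 2)
```

Only the first block of the stage changes the resolution and the channel count; the remaining blocks use plain identity shortcuts.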

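The StemWithFixedBatchNorm class

This class builds the stem module of ResNet, which can be regarded as the first stage (or the zeroth stage) of the network. In ResNet-50 this stage mainly consists of a 7×7 convolution. To make the per-stage code easy to reuse, MaskrcnnBenchmark also moves the 3×3 max pooling layer that opens the second stage into the forward function of the stem (layers without parameters are usually applied in forward). The implementation is shown below: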

# ./maskrcnn_benchmark/modeling/backbone/resnet.py

class StemWithFixedBatchNorm(nn.Module):
    def __init__(self, cfg):
        super(StemWithFixedBatchNorm, self).__init__()

        # resnet-50, out_channels=64
        out_channels = cfg.MODEL.RESNETS.STEM_OUT_CHANNELS

        # 3 input channels, 64 output channels
        self.conv1 = Conv2d(
            3, out_channels, kernel_size=7, stride=2, padding=3, bias=False
        )

        # BN layer with frozen parameters
        self.bn1 = FrozenBatchNorm2d(out_channels)

    # forward pass
    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = F.relu_(x)  # in-place activation; it has no parameters, so it is applied in forward rather than defined as a layer
        x = F.max_pool2d(x, kernel_size=3, stride=2, padding=1)
        return x

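The Conv2d used above is the class Conv2d(nn.Conv2d) wrapper from ./maskrcnn_benchmark/layers/misc.py. It chooses its return value based on the number of elements in the input tensor: when x.numel() > 0 it behaves exactly like the ordinary torch.nn.Conv2d, and the wrapper exists to handle empty inputs. The code also uses class FrozenBatchNorm2d(nn.Module) from ./maskrcnn_benchmark/layers/batch_norm.py, which implements a BN layer whose parameters are fixed rather than updated during training.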
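The stem reduces the spatial resolution by a factor of 4, which can be checked with the standard output-size formula for convolution and pooling, floor((n + 2p - k) / s) + 1. The helpers below are hypothetical and only reproduce that arithmetic for the kernel, stride and padding values used in StemWithFixedBatchNorm.

```python
def conv_out_size(size, kernel, stride, padding):
    """Output length of a conv/pool layer along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def stem_out_size(size):
    size = conv_out_size(size, kernel=7, stride=2, padding=3)  # conv1
    return conv_out_size(size, kernel=3, stride=2, padding=1)  # max_pool2d


small = stem_out_size(224)
large = stem_out_size(800)
```

For a 224-pixel side the stem yields 56, and for an 800-pixel side it yields 200, so the first residual stage works at 1/4 of the input resolution.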

The BottleneckWithFixedBatchNorm class

After the stem (stage 1), stages 2~5 of the ResNet have to be built. These stages share a very similar structure: each is a stack of basic ResNet bottleneck blocks, and only the number of blocks differs. For ResNet-50 the stages contain 3, 4, 6 and 3 bottleneck blocks respectively. The channel count doubles between adjacent stages, so the channels of one stage are easily derived from those of another. The bottleneck block is implemented as follows:

# ./maskrcnn_benchmark/modeling/backbone/resnet.py

class BottleneckWithFixedBatchNorm(nn.Module):
    def __init__(
        self,
        in_channels,  # input channels of the bottleneck
        bottleneck_channels,  # channels after compression inside the bottleneck
        out_channels,  # output channels of the current stage
        num_groups=1,
        stride_in_1x1=True,
        stride=1,
    ):
        super(BottleneckWithFixedBatchNorm, self).__init__()

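        # downsample: when the input and output channels of the bottleneck differ, a shortcut strategy is needed.
        # The ResNet paper describes three options, A, B and C; the code here uses option B,
        # i.e. projection shortcuts are used only when the input and output channels differ,
        # mapping the input to the output channel count with a learned projection
        self.downsample = None

        # when the channel counts differ, add a 1×1 conv that maps input channels to output channels
        if in_channels != out_channels:
            self.downsample = nn.Sequential(
                Conv2d(
                    in_channels, out_channels, kernel_size=1, stride=stride, bias=False
                ),
                FrozenBatchNorm2d(out_channels),  # followed by a BN layer with frozen parameters
            )

        # The original ResNet models put the stride of conv3_1, conv4_1 and conv5_1 in the first 1×1 conv,
        # while the fb.torch.resnet and Caffe2 implementations put it in the following 3×3 conv.
        # stride defaults to 1 here, but callers pass 2 explicitly for the first block of stages 3~5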
        stride_1x1, stride_3x3 = (stride, 1) if stride_in_1x1 else (1, stride)

        # with the parameters of this stage known, create the conv layers following the ResNet-50 definition
        self.conv1 = Conv2d(
            in_channels,
            bottleneck_channels,
            kernel_size=1,
            stride=stride_1x1,
            bias=False,
        )
        self.bn1 = FrozenBatchNorm2d(bottleneck_channels)  # followed by a BN layer with frozen parameters

        # second conv layer of the bottleneck
        self.conv2 = Conv2d(
            bottleneck_channels,
            bottleneck_channels,
            kernel_size=3,
            stride=stride_3x3,
            padding=1,
            bias=False,
            groups=num_groups,
        )
        self.bn2 = FrozenBatchNorm2d(bottleneck_channels)  # followed by a frozen BN layer

        # last conv layer of the bottleneck: a 1×1 conv, so padding keeps its default of 0
        self.conv3 = Conv2d(
            bottleneck_channels, out_channels, kernel_size=1, bias=False
        )
        self.bn3 = FrozenBatchNorm2d(out_channels)

    def forward(self, x):
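        # One forward call runs one bottleneck block.
        # By default it has three conv layers and one shortcut connection; every conv is followed
        # by BN, and ReLU follows the first two.
        # Note that the last activation is applied after the shortcut is added.

        residual = x  # identity shortcut: the residual is simply x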

        # conv1, bn1
        out = self.conv1(x)
        out = self.bn1(out)
        out = F.relu_(out)

        # conv2, bn2
        out = self.conv2(out)
        out = self.bn2(out)
        out = F.relu_(out)

        # conv3, bn3
        out0 = self.conv3(out)  # the separate name out0 is not strictly needed
        out = self.bn3(out0)

        if self.downsample is not None:
            # if the input and output channels differ, project the input so that they match
            residual = self.downsample(x)

        out += residual  # H = F + x
        out = F.relu_(out)  # final activation

        return out  # conv result with the residual term added

fpn.py: the feature pyramid network

For ResNet-50-C4 the ResNet model above is enough for feature extraction, but ResNet-50-FPN additionally needs an FPN to obtain stronger features. In backbone.py, build_resnet_fpn_backbone(cfg) creates an FPN instance with fpn = fpn_module.FPN(...), combines the ResNet and the FPN into one model with nn.Sequential(), and returns it. The FPN implementation lives in ./maskrcnn_benchmark/modeling/backbone/fpn.py and is analysed below:

# ./maskrcnn_benchmark/modeling/backbone/fpn.py

import torch
import torch.nn.functional as F
from torch import nn

class FPN(nn.Module):
    # Adds an FPN on top of a list of feature maps (in practice the last-layer outputs of stages 2~5).
    # The depth of the feature maps is assumed to increase, and the maps must come from consecutive stages.

    def __init__(self, in_channels_list, out_channels, top_blocks=None):
        # in_channels_list (list[int]): number of channels of each feature map fed to the FPN
        # out_channels (int): number of channels of the FPN representation; every feature map is converted to it
        # top_blocks (nn.Module or None): if provided, an extra op is applied to the last FPN
        # output and its result is appended to the result list
        super(FPN, self).__init__()

        # two empty lists for the module names
        self.inner_blocks = []
        self.layer_blocks = []

        # With the ResNet-50-FPN config, in_channels_list is:
        # [256, 512, 1024, 2048]
        for idx, in_channels in enumerate(in_channels_list, 1):  # indices start at 1
            # names built from the index: fpn_inner1, fpn_inner2, fpn_inner3, fpn_inner4
            inner_block = "fpn_inner{}".format(idx)

            # fpn_layer1, fpn_layer2, fpn_layer3, fpn_layer4
            layer_block = "fpn_layer{}".format(idx)

            # inner_block module: in_channels is the channel count output by each stage,
            # out_channels is 256 and comes from the user config.
            # The kernel size is 1; this conv maps the channels to out_channels (dimension reduction)
            inner_block_module = nn.Conv2d(in_channels, out_channels, 1)

            # after the channel change, a 3×3 conv is applied on each level; the channel count is unchanged
            layer_block_module = nn.Conv2d(out_channels, out_channels, 3, 1, 1)

            for module in [inner_block_module, layer_block_module]:
                # The Caffe2 implementation uses XavierFill,
                # which corresponds to kaiming_uniform_ in PyTorch
                nn.init.kaiming_uniform_(module.weight, a=1)
                nn.init.constant_(module.bias, 0)

            # register the FPN modules for the current level
            self.add_module(inner_block, inner_block_module) #name, module
            self.add_module(layer_block, layer_block_module)

            # store the names of this level's FPN modules in the corresponding lists
            self.inner_blocks.append(inner_block)
            self.layer_blocks.append(layer_block)

        # keep top_blocks as a member of the FPN class
        self.top_blocks = top_blocks

    def forward(self, x):
        # x (list[Tensor]): feature maps for each feature level.
        # The ResNet output matches this input format, which is why the two can be chained with nn.Sequential.
        # results (tuple[Tensor]): feature maps after the FPN, highest resolution first

        # start with the FPN result of the last (lowest-resolution) feature map
        last_inner = getattr(self, self.inner_blocks[-1])(x[-1])

        # empty result list
        results = []

        # add the result for the last level
        results.append(getattr(self, self.layer_blocks[-1])(last_inner))

        # [:-1] takes all but the last item (the first three here); [::-1] slices the whole
        # list with step -1, i.e. reverses it.
        # For example, self.inner_blocks[:-1][::-1] evaluates to
        # [fpn_inner3, fpn_inner2, fpn_inner1]
        for feature, inner_block, layer_block in zip(
            x[:-1][::-1], self.inner_blocks[:-1][::-1], self.layer_blocks[:-1][::-1]
        ):
            # resize the feature map by the given scale factor; scale_factor=2 means upsampling
            inner_top_down = F.interpolate(last_inner, scale_factor=2, mode="nearest")

            # result of the inner_block (lateral connection)
            inner_lateral = getattr(self, inner_block)(feature)

            # sum the two; the sum is the output of this level and the input to the next one
            last_inner = inner_lateral + inner_top_down

            # add this level's output to the results after applying the layer_block conv;
            # inserting at position 0 keeps the highest resolution first
            results.insert(0, getattr(self, layer_block)(last_inner))

        # if top_blocks is set, run the extra op
        if self.top_blocks is not None:
            last_results = self.top_blocks(results[-1])
            results.extend(last_results)  # append the new results to the list

        # return as a (read-only) tuple
        return tuple(results)

# max pool layer applied to the last level
class LastLevelMaxPool(nn.Module):
    def forward(self, x):
        return [F.max_pool2d(x, 1, 2, 0)]
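The ordering logic of forward can be traced on spatial sizes alone. The sketch below is a plain-Python simulation, not the FPN module: fpn_output_sizes is a hypothetical helper that tracks one side length per level and relies on the fact that the 1×1 and padded 3×3 convolutions keep the spatial size. The input sizes 200, 100, 50, 25 are assumed, corresponding to an 800-pixel side at strides 4 to 32.

```python
def fpn_output_sizes(sizes):
    """sizes: side lengths of the input maps, highest resolution first."""
    last_inner = sizes[-1]
    results = [last_inner]
    for lateral in sizes[:-1][::-1]:  # walk from low to high resolution
        top_down = last_inner * 2  # nearest-neighbour upsampling, scale_factor=2
        assert top_down == lateral, "upsampled map must match the lateral map"
        last_inner = lateral  # the element-wise sum keeps the lateral size
        results.insert(0, last_inner)  # highest resolution stays first
    # LastLevelMaxPool: max_pool2d(kernel_size=1, stride=2, padding=0)
    results.append((results[-1] - 1) // 2 + 1)
    return tuple(results)


names = ["fpn_inner1", "fpn_inner2", "fpn_inner3", "fpn_inner4"]
reversed_names = names[:-1][::-1]
sizes = fpn_output_sizes([200, 100, 50, 25])
```

With top_blocks set to LastLevelMaxPool, four input levels therefore produce five output levels.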

roi_heads

In detector/generalized_rcnn.py the model is defined as follows:

def __init__(self, cfg):
    super(GeneralizedRCNN, self).__init__()

    self.backbone = build_backbone(cfg)
    self.rpn = build_rpn(cfg, self.backbone.out_channels)
    self.roi_heads = build_roi_heads(cfg, self.backbone.out_channels)

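So once the backbone and the RPN have been built to produce feature maps and proposals, the corresponding RoIs have to be processed on those feature maps. The entry point of this module is the build_roi_heads function in roi_heads/roi_heads.py, which we analyse next. (This listing passes self.backbone.out_channels to build_rpn and build_roi_heads, unlike the earlier listing of the same constructor, so it appears to come from a later revision of the code.)

roi_heads

First, the entry function build_roi_heads: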

def build_roi_heads(cfg, in_channels):
    # individually create the heads, that will be combined together
    # afterwards
    roi_heads = []
    if cfg.MODEL.RETINANET_ON:  # RetinaNet does not need RoI heads
        return []

    # Conceptually the heads below can be enabled together and do not affect each other
    if not cfg.MODEL.RPN_ONLY:  # not an RPN-only model, so add the box head
        roi_heads.append(("box", build_roi_box_head(cfg, in_channels)))
    if cfg.MODEL.MASK_ON:  # mask head
        roi_heads.append(("mask", build_roi_mask_head(cfg, in_channels)))
    if cfg.MODEL.KEYPOINT_ON:  # keypoint head
        roi_heads.append(("keypoint", build_roi_keypoint_head(cfg, in_channels)))

    # combine individual heads in a single module
    if roi_heads:
        roi_heads = CombinedROIHeads(cfg, roi_heads)

    return roi_heads

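Depending on the kind of head, build_roi_heads calls build_roi_box_head, build_roi_mask_head or build_roi_keypoint_head, and then merges the results with CombinedROIHeads. The first three are functions, while CombinedROIHeads is a class. We cover them one by one.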
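Which heads are built for a given set of flags can be tabulated with a small stand-in. This is only a sketch of the branching in build_roi_heads: enabled_heads is a hypothetical helper that takes the four config switches as keyword arguments and returns head names instead of modules.

```python
def enabled_heads(rpn_only=False, mask_on=False, keypoint_on=False,
                  retinanet_on=False):
    """Return the names of the heads build_roi_heads would create."""
    if retinanet_on:
        return []  # RetinaNet has no RoI heads
    heads = []
    if not rpn_only:
        heads.append("box")
    if mask_on:
        heads.append("mask")
    if keypoint_on:
        heads.append("keypoint")
    return heads


faster_rcnn = enabled_heads()
mask_rcnn = enabled_heads(mask_on=True)
rpn_only = enabled_heads(rpn_only=True)
```

An empty list is falsy in Python, which is why GeneralizedRCNN.forward can test `if self.roi_heads:` to detect an RPN-only model.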

box_head

build_roi_box_head()

This function lives in roi_heads/box_head/box_head.py. Its implementation is:

def build_roi_box_head(cfg, in_channels):
    """
    Constructs a new box head.
    By default, uses ROIBoxHead, but if it turns out not to be enough, just register a new class
    and make it a parameter in the config
    """
    return ROIBoxHead(cfg, in_channels)

The function returns an instance of ROIBoxHead, which is defined in the same file, roi_heads/box_head/box_head.py:

class ROIBoxHead(torch.nn.Module):
    """
    Generic Box Head class.
    """

    def __init__(self, cfg, in_channels):
        super(ROIBoxHead, self).__init__()
        self.feature_extractor = make_roi_box_feature_extractor(cfg, in_channels)  # from roi_box_feature_extractors.py
        self.predictor = make_roi_box_predictor(  # from roi_box_predictors.py
            cfg, self.feature_extractor.out_channels)
        self.post_processor = make_roi_box_post_processor(cfg)  # from inference.py
        self.loss_evaluator = make_roi_box_loss_evaluator(cfg)  # from loss.py

    def forward(self, features, proposals, targets=None):
        """
        Arguments:
            features (list[Tensor]): feature-maps from possibly several levels
            proposals (list[BoxList]): proposal boxes
            targets (list[BoxList], optional): the ground-truth targets.

        Returns:
            x (Tensor): the result of the feature extractor
            proposals (list[BoxList]): during training, the subsampled proposals
                are returned. During testing, the predicted boxlists are returned
            losses (dict[Tensor]): During training, returns the losses for the
                head. During testing, returns an empty dict.
        """

        if self.training:
            # Faster R-CNN subsamples during training the proposals with a fixed
            # positive / negative ratio
            with torch.no_grad():
                proposals = self.loss_evaluator.subsample(proposals, targets)

        # extract features that will be fed to the final classifier. The
        # feature_extractor generally corresponds to the pooler + heads
        x = self.feature_extractor(features, proposals)
        # final classifier that converts the features into predictions
        class_logits, box_regression = self.predictor(x)

        if not self.training:
            result = self.post_processor((class_logits, box_regression), proposals)
            return x, result, {}

        loss_classifier, loss_box_reg = self.loss_evaluator(
            [class_logits], [box_regression]
        )
        return (
            x,
            proposals,
            dict(loss_classifier=loss_classifier, loss_box_reg=loss_box_reg),
        )

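ROIBoxHead therefore consists of four parts: feature_extractor, predictor, post_processor and loss_evaluator. The feature_extractor extracts the features of each RoI, and the predictor turns them into class_logits and box_regression for each RoI. At test time the post_processor converts these predictions into the final detections; during training the loss_evaluator first subsamples the proposals and then computes the classification and box regression losses. The four parts live in different files under roi_heads/box_head/, and we analyse them in turn.

make_roi_box_feature_extractor

This function lives in roi_heads/box_head/roi_box_feature_extractors.py and is defined as follows: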

def make_roi_box_feature_extractor(cfg, in_channels):
    func = registry.ROI_BOX_FEATURE_EXTRACTORS[
        cfg.MODEL.ROI_BOX_HEAD.FEATURE_EXTRACTOR
    ]
    return func(cfg, in_channels)

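The function uses cfg.MODEL.ROI_BOX_HEAD.FEATURE_EXTRACTOR from the user's configuration file to pick the class to instantiate, and each choice corresponds to a different RoI head. roi_box_feature_extractors.py defines three ROI_BOX_HEAD.FEATURE_EXTRACTOR options, shown below: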

ResNet50Conv5ROIFeatureExtractor

@registry.ROI_BOX_FEATURE_EXTRACTORS.register("ResNet50Conv5ROIFeatureExtractor")
class ResNet50Conv5ROIFeatureExtractor(nn.Module):
    def __init__(self, config, in_channels):
        super(ResNet50Conv5ROIFeatureExtractor, self).__init__()

        resolution = config.MODEL.ROI_BOX_HEAD.POOLER_RESOLUTION
        scales = config.MODEL.ROI_BOX_HEAD.POOLER_SCALES
        sampling_ratio = config.MODEL.ROI_BOX_HEAD.POOLER_SAMPLING_RATIO
        pooler = Pooler(  # defined in modeling/poolers.py
            output_size=(resolution, resolution),
            scales=scales,
            sampling_ratio=sampling_ratio,
        )

        stage = resnet.StageSpec(index=4, block_count=3, return_features=False)
        head = resnet.ResNetHead(
            block_module=config.MODEL.RESNETS.TRANS_FUNC,
            stages=(stage,),
            num_groups=config.MODEL.RESNETS.NUM_GROUPS,
            width_per_group=config.MODEL.RESNETS.WIDTH_PER_GROUP,
            stride_in_1x1=config.MODEL.RESNETS.STRIDE_IN_1X1,
            stride_init=None,
            res2_out_channels=config.MODEL.RESNETS.RES2_OUT_CHANNELS,
            dilation=config.MODEL.RESNETS.RES5_DILATION
        )

        self.pooler = pooler
        self.head = head
        self.out_channels = head.out_channels

    def forward(self, x, proposals):
        x = self.pooler(x, proposals)
        x = self.head(x)
        return x

FPN2MLPFeatureExtractor

@registry.ROI_BOX_FEATURE_EXTRACTORS.register("FPN2MLPFeatureExtractor")
class FPN2MLPFeatureExtractor(nn.Module):
    """
    Heads for FPN for classification
    """

    def __init__(self, cfg, in_channels):
        super(FPN2MLPFeatureExtractor, self).__init__()

        resolution = cfg.MODEL.ROI_BOX_HEAD.POOLER_RESOLUTION
        scales = cfg.MODEL.ROI_BOX_HEAD.POOLER_SCALES
        sampling_ratio = cfg.MODEL.ROI_BOX_HEAD.POOLER_SAMPLING_RATIO
        pooler = Pooler(
            output_size=(resolution, resolution),
            scales=scales,
            sampling_ratio=sampling_ratio,
        )
        input_size = in_channels * resolution ** 2
        representation_size = cfg.MODEL.ROI_BOX_HEAD.MLP_HEAD_DIM
        use_gn = cfg.MODEL.ROI_BOX_HEAD.USE_GN
        self.pooler = pooler
        self.fc6 = make_fc(input_size, representation_size, use_gn)
        self.fc7 = make_fc(representation_size, representation_size, use_gn)
        self.out_channels = representation_size

    def forward(self, x, proposals):
        x = self.pooler(x, proposals)
        x = x.view(x.size(0), -1)

        x = F.relu(self.fc6(x))
        x = F.relu(self.fc7(x))

        return x
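The size of the two fully connected layers follows directly from input_size = in_channels * resolution ** 2. The numbers below are a worked example under assumed values (256 FPN channels, a 7×7 pooler output, a 1024-dimensional representation), and they assume that make_fc builds a plain linear layer with a bias when use_gn is off; fc_params is a hypothetical helper.

```python
def fc_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


# Assumed values; the real ones come from the configuration file.
in_channels, resolution, representation_size = 256, 7, 1024

input_size = in_channels * resolution ** 2  # flattened RoI feature length
fc6_params = fc_params(input_size, representation_size)
fc7_params = fc_params(representation_size, representation_size)
```

Under these assumptions fc6 holds roughly twelve times as many parameters as fc7, because it consumes the flattened 7×7×256 RoI feature.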

FPNXconv1fcFeatureExtractor

@registry.ROI_BOX_FEATURE_EXTRACTORS.register("FPNXconv1fcFeatureExtractor")
class FPNXconv1fcFeatureExtractor(nn.Module):
    """
    Heads for FPN for classification
    """

    def __init__(self, cfg, in_channels):
        super(FPNXconv1fcFeatureExtractor, self).__init__()

        resolution = cfg.MODEL.ROI_BOX_HEAD.POOLER_RESOLUTION
        scales = cfg.MODEL.ROI_BOX_HEAD.POOLER_SCALES
        sampling_ratio = cfg.MODEL.ROI_BOX_HEAD.POOLER_SAMPLING_RATIO
        pooler = Pooler(
            output_size=(resolution, resolution),
            scales=scales,
            sampling_ratio=sampling_ratio,
        )
        self.pooler = pooler

        use_gn = cfg.MODEL.ROI_BOX_HEAD.USE_GN
        conv_head_dim = cfg.MODEL.ROI_BOX_HEAD.CONV_HEAD_DIM
        num_stacked_convs = cfg.MODEL.ROI_BOX_HEAD.NUM_STACKED_CONVS
        dilation = cfg.MODEL.ROI_BOX_HEAD.DILATION

        xconvs = []
        for ix in range(num_stacked_convs):
            xconvs.append(
                nn.Conv2d(
                    in_channels,
                    conv_head_dim,
                    kernel_size=3,
                    stride=1,
                    padding=dilation,
                    dilation=dilation,
                    bias=False if use_gn else True
                )
            )
            in_channels = conv_head_dim
            if use_gn:
                xconvs.append(group_norm(in_channels))
            xconvs.append(nn.ReLU(inplace=True))

        self.add_module("xconvs", nn.Sequential(*xconvs))
        for modules in [self.xconvs,]:
            for l in modules.modules():
                if isinstance(l, nn.Conv2d):
                    torch.nn.init.normal_(l.weight, std=0.01)
                    if not use_gn:
                        torch.nn.init.constant_(l.bias, 0)

        input_size = conv_head_dim * resolution ** 2
        representation_size = cfg.MODEL.ROI_BOX_HEAD.MLP_HEAD_DIM
        self.fc6 = make_fc(input_size, representation_size, use_gn=False)
        self.out_channels = representation_size

    def forward(self, x, proposals):
        x = self.pooler(x, proposals)
        x = self.xconvs(x)
        x = x.view(x.size(0), -1)
        x = F.relu(self.fc6(x))
        return x
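
The stacked convolutions above use kernel_size=3, stride=1, and padding=dilation, which keeps the spatial size of the pooled feature unchanged; this is why input_size for fc6 can be computed as conv_head_dim * resolution ** 2. The sketch below checks this with the standard convolution output-size formula. The helper conv2d_output_size and the resolution of 7 are assumptions for illustration.

```python
def conv2d_output_size(size, kernel_size, stride, padding, dilation):
    # Standard output-size formula for a convolution along one spatial axis
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


resolution = 7  # assumed pooler resolution
sizes = []
for dilation in (1, 2, 3):
    out = conv2d_output_size(resolution, kernel_size=3, stride=1,
                             padding=dilation, dilation=dilation)
    sizes.append(out)
    print("dilation", dilation, "-> output size", out)
```

For a 3x3 kernel the dilated kernel spans 2 * dilation + 1 pixels, so padding by dilation on each side exactly compensates for the border the kernel would otherwise consume.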

mask_head

Next is the build_roi_mask_head function, located in roi_heads/mask_head/mask_head.py and defined as follows:

def build_roi_mask_head(cfg, in_channels):
    return ROIMaskHead(cfg, in_channels)

ROIMaskHead is defined as follows:

class ROIMaskHead(torch.nn.Module):
    def __init__(self, cfg, in_channels):
        super(ROIMaskHead, self).__init__()
        self.cfg = cfg.clone()
        self.feature_extractor = make_roi_mask_feature_extractor(cfg, in_channels)
        self.predictor = make_roi_mask_predictor(
            cfg, self.feature_extractor.out_channels)
        self.post_processor = make_roi_mask_post_processor(cfg)
        self.loss_evaluator = make_roi_mask_loss_evaluator(cfg)

    def forward(self, features, proposals, targets=None):
        """
        Arguments:
            features (list[Tensor]): feature-maps from possibly several levels
            proposals (list[BoxList]): proposal boxes
            targets (list[BoxList], optional): the ground-truth targets.

        Returns:
            x (Tensor): the result of the feature extractor
            proposals (list[BoxList]): during training, the original proposals
                are returned. During testing, the predicted boxlists are returned
                with the `mask` field set
            losses (dict[Tensor]): During training, returns the losses for the
                head. During testing, returns an empty dict.
        """

        if self.training:
            # during training, only focus on positive boxes
            all_proposals = proposals
            proposals, positive_inds = keep_only_positive_boxes(proposals)
        if self.training and self.cfg.MODEL.ROI_MASK_HEAD.SHARE_BOX_FEATURE_EXTRACTOR:
            x = features
            x = x[torch.cat(positive_inds, dim=0)]
        else:
            x = self.feature_extractor(features, proposals)
        mask_logits = self.predictor(x)

        if not self.training:
            result = self.post_processor(mask_logits, proposals)
            return x, result, {}

        loss_mask = self.loss_evaluator(proposals, mask_logits, targets)

        return x, all_proposals, dict(loss_mask=loss_mask)
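
During training, the mask head only processes positive (foreground) proposals, and when the box feature extractor is shared it reuses the matching rows of the already computed features instead of pooling again. The sketch below mimics that selection with plain Python lists. It assumes that a label greater than 0 marks a foreground proposal and that features holds one row per proposal, concatenated across images; keep_only_positive and the toy data are hypothetical stand-ins, not the library's implementation.

```python
def keep_only_positive(labels_per_image):
    """Return per-image positive indices and boolean masks (label > 0 = foreground)."""
    positive_indices, positive_masks = [], []
    for labels in labels_per_image:
        mask = [label > 0 for label in labels]
        positive_masks.append(mask)
        positive_indices.append([i for i, keep in enumerate(mask) if keep])
    return positive_indices, positive_masks


# Toy batch: two images with 4 and 2 proposals; 0 is background
labels_per_image = [[0, 3, 0, 1], [2, 0]]
# One (placeholder) feature row per proposal, concatenated over both images
box_features = ["f0", "f1", "f2", "f3", "f4", "f5"]

indices, masks = keep_only_positive(labels_per_image)
# Counterpart of torch.cat(positive_inds, dim=0): one flat mask over all proposals
flat_mask = [keep for mask in masks for keep in mask]
# Counterpart of x[flat_mask]: keep only the rows of positive proposals
shared = [feat for feat, keep in zip(box_features, flat_mask) if keep]

print(indices)
print(shared)
```

Note that the head still returns all_proposals rather than the filtered list, so later stages see the full set of proposals.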

As with the box head, ROIMaskHead consists of four parts: feature_extractor, predictor, post_processor, and loss_evaluator. They are defined in the following four files; see the source code for details.

roi_mask_feature_extractors.py, roi_mask_predictors.py, inference.py, loss.py

keypoint_head

The build_roi_keypoint_head function is defined in roi_heads/keypoint_head/keypoint_head.py, as shown below:

def build_roi_keypoint_head(cfg, in_channels):
    return ROIKeypointHead(cfg, in_channels)


class ROIKeypointHead(torch.nn.Module):
    def __init__(self, cfg, in_channels):
        super(ROIKeypointHead, self).__init__()
        self.cfg = cfg.clone()
        self.feature_extractor = make_roi_keypoint_feature_extractor(cfg, in_channels)
        self.predictor = make_roi_keypoint_predictor(
            cfg, self.feature_extractor.out_channels)
        self.post_processor = make_roi_keypoint_post_processor(cfg)
        self.loss_evaluator = make_roi_keypoint_loss_evaluator(cfg)

    def forward(self, features, proposals, targets=None):
        """
        Arguments:
            features (list[Tensor]): feature-maps from possibly several levels
            proposals (list[BoxList]): proposal boxes
            targets (list[BoxList], optional): the ground-truth targets.

        Returns:
            x (Tensor): the result of the feature extractor
            proposals (list[BoxList]): during training, the original proposals
                are returned. During testing, the predicted boxlists are returned
                with the `keypoints` field set
            losses (dict[Tensor]): During training, returns the losses for the
                head. During testing, returns an empty dict.
        """
        if self.training:
            with torch.no_grad():
                proposals = self.loss_evaluator.subsample(proposals, targets)

        x = self.feature_extractor(features, proposals)
        kp_logits = self.predictor(x)

        if not self.training:
            result = self.post_processor(kp_logits, proposals)
            return x, result, {}

        loss_kp = self.loss_evaluator(proposals, kp_logits)

        return x, proposals, dict(loss_kp=loss_kp)