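Faster R-CNN is a landmark model for object detection and is widely used in the field. Its architecture consists of the following main components:

BackBone: VGG, ResNet, etc.
RoIPool: region-of-interest pooling layer
RPN: region proposal network, the main contribution of FasterRCNN

Following this breakdown, we walk through a concrete implementation of FasterRCNN module by module (source code: https://github.com/jwyang/faster-rcnn.pytorch).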

RPN

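The RPN was first proposed in FasterRCNN and is the core of the model.

rpn.py file:

This file defines the network structure of the RPN.
Since the RPN is FasterRCNN's key contribution, and implementing FasterRCNN requires implementing the RPN first, we start by looking at how to build the RPN in PyTorch.

init function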

# ./lib/model/rpn/rpn.py
import torch.nn as nn

from model.utils.config import cfg
from .proposal_layer import _ProposalLayer
from .anchor_target_layer import _AnchorTargetLayer


class _RPN(nn.Module):
    def __init__(self, din):  # din: depth (channels) of the input feature map, e.g. 512
        super(_RPN, self).__init__()

        self.din = din
        self.anchor_scales = cfg.ANCHOR_SCALES
        self.anchor_ratios = cfg.ANCHOR_RATIOS
        self.feat_stride = cfg.FEAT_STRIDE[0]  # downsampling factor from the image to the feature map

        # Convolution layer of the RPN.
        # Kernel size 3 with stride 1 and padding 1, so the spatial size of the output is unchanged.
        self.RPN_Conv = nn.Conv2d(self.din, 512, 3, 1, 1, bias=True)

        # Foreground/background classification layer.
        # Output depth is 2 (foreground/background) * 9 (9 anchors).
        self.nc_score_out = len(self.anchor_scales) * len(self.anchor_ratios) * 2
        # Spatial size unchanged, depth becomes 2*9: foreground/background scores for the 9 anchor boxes.
        self.RPN_cls_score = nn.Conv2d(512, self.nc_score_out, 1, 1, 0)

        # Layer that predicts the coordinate offsets of the anchor boxes.
        # Output depth (channels): 4 coordinates for each of the 9 anchor boxes.
        self.nc_bbox_out = len(self.anchor_scales) * len(self.anchor_ratios) * 4
        self.RPN_bbox_pred = nn.Conv2d(512, self.nc_bbox_out, 1, 1, 0)

        # Proposal layer
        self.RPN_proposal = _ProposalLayer(self.feat_stride, self.anchor_scales, self.anchor_ratios)

        # Anchor target layer (matches anchors to the ground truth)
        self.RPN_anchor_target = _AnchorTargetLayer(self.feat_stride, self.anchor_scales, self.anchor_ratios)

        self.rpn_loss_cls = 0
        self.rpn_loss_box = 0

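As the initializer of the RPN class shows, apart from the two convolution layers that predict the classification scores and the box coordinates, the two most important lines come from the classes _ProposalLayer and _AnchorTargetLayer. The former is defined in proposal_layer.py and the latter in anchor_target_layer.py. Before continuing with the other functions of the RPN, I recommend reading the sections on these two classes first.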
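The channel counts and the size-preserving convolution in the initializer can be checked with plain arithmetic. The sketch below assumes 3 scales and 3 ratios (the 9 anchors mentioned in the code comments) and a made-up 38x50 feature map; conv_out_size is an illustrative helper, not part of the repository.

```python
def conv_out_size(size, kernel, stride, padding):
    """Standard convolution output-size formula (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1

# Assumed config values: 3 scales x 3 ratios = 9 anchors per location.
anchor_scales = [8, 16, 32]
anchor_ratios = [0.5, 1, 2]
num_anchors = len(anchor_scales) * len(anchor_ratios)

nc_score_out = num_anchors * 2  # foreground/background score per anchor
nc_bbox_out = num_anchors * 4   # 4 box offsets per anchor

# Hypothetical feature map size, for illustration only.
feat_h, feat_w = 38, 50
h_after_3x3 = conv_out_size(feat_h, kernel=3, stride=1, padding=1)
w_after_3x3 = conv_out_size(feat_w, kernel=3, stride=1, padding=1)
h_after_1x1 = conv_out_size(h_after_3x3, kernel=1, stride=1, padding=0)

print(num_anchors, nc_score_out, nc_bbox_out)   # 9 18 36
print((h_after_3x3, w_after_3x3), h_after_1x1)  # (38, 50) 38
```

Both the 3x3 convolution (padding 1) and the 1x1 convolutions (padding 0) keep the spatial size, so only the channel depth changes between the layers.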

reshape function

Changes the dimensions of a given tensor.

# ./lib/model/rpn/rpn.py
@staticmethod
def reshape(x, d):
    input_shape = x.size()
    x = x.view(
        input_shape[0],  # batch size, unchanged
        int(d),          # new number of channels
        int(float(input_shape[1] * input_shape[2]) / float(d)),  # channels * height / d
        input_shape[3]   # width, unchanged
    )
    return x
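To see what this view() does to the shape, here is a small sketch that repeats the same arithmetic on a plain shape tuple. The helper reshaped_shape and the example shape (batch 1, 18 channels, 38x50) are assumptions for illustration.

```python
def reshaped_shape(shape, d):
    """Mirror the view() arithmetic of reshape on an (N, C, H, W) shape tuple."""
    n, c, h, w = shape
    return (n, int(d), int(float(c * h) / float(d)), w)

# Assumed example: batch 1, 18 score channels (9 anchors x 2), 38 x 50 feature map.
score_shape = (1, 18, 38, 50)

two_class = reshaped_shape(score_shape, 2)  # fold the 9 anchors into the height axis
restored = reshaped_shape(two_class, 18)    # undo the fold

print(two_class)  # (1, 2, 342, 50)
print(restored)   # (1, 18, 38, 50)
```

With d=2 the channel axis holds exactly the two foreground/background scores, which is what allows a softmax over dimension 1; reshaping with d=18 then restores the original layout.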

forward function

# ./lib/model/rpn/rpn.py
def forward(self, base_feat, im_info, gt_boxes, num_boxes):
    # requires: import torch.nn.functional as F
    batch_size = base_feat.size(0)

    # Feature map after the first convolution layer of the RPN
    rpn_conv1 = F.relu(self.RPN_Conv(base_feat), inplace=True)
    # RPN classification scores from the 1x1 convolution
    rpn_cls_score = self.RPN_cls_score(rpn_conv1)

    # Reshape so that dimension 1 holds the 2 foreground/background scores
    rpn_cls_score_reshape = self.reshape(rpn_cls_score, 2)
    # Softmax over dimension 1 (the foreground/background dimension) turns the scores into probabilities
    rpn_cls_prob_reshape = F.softmax(rpn_cls_score_reshape, 1)
    # Reshape back to nc_score_out channels (2 probabilities per anchor)
    rpn_cls_prob = self.reshape(rpn_cls_prob_reshape, self.nc_score_out)

    # Predicted offsets for the anchor boxes
    rpn_bbox_pred = self.RPN_bbox_pred(rpn_conv1)

    # Proposal layer: extracts the candidate boxes
    cfg_key = "TRAIN" if self.training else "TEST"

    # The forward function of RPN_proposal takes a single 4-element tuple as input
    rois = self.RPN_proposal((rpn_cls_prob.data, rpn_bbox_pred.data, im_info, cfg_key))

    self.rpn_loss_cls = 0
    self.rpn_loss_box = 0

    # During training, build the anchor targets and the RPN losses
    if self.training:
        assert gt_boxes is not None
        # Use RPN_anchor_target to obtain the RPN training targets
        rpn_data = self.RPN_anchor_target((rpn_cls_score.data, gt_boxes, im_info, num_boxes))

        # TODO: loss computation
        # classification loss


        # bbox regression loss
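The loss computation is left as a TODO above. As a rough sketch of the box-regression part only, the smooth L1 function described in the Fast R-CNN and Faster R-CNN papers can be written in plain Python as follows. The function name and the sample values are assumptions for illustration; this is not the repository's implementation.

```python
def smooth_l1(pred, target):
    """Element-wise smooth L1: quadratic near zero, linear for large errors."""
    losses = []
    for p, t in zip(pred, target):
        diff = abs(p - t)
        losses.append(0.5 * diff * diff if diff < 1.0 else diff - 0.5)
    return losses

# Made-up predicted and target offsets for one anchor.
pred_deltas = [0.2, -0.5, 2.0, 0.0]
target_deltas = [0.0, 0.0, 0.0, 0.0]

per_coord = smooth_l1(pred_deltas, target_deltas)
print(per_coord)
print(sum(per_coord))
```

Small errors are penalized quadratically, while large errors grow only linearly, which makes the regression less sensitive to outlier boxes than a plain squared error.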

proposal_layer.py file

This file defines the class _ProposalLayer, which turns the regular anchor boxes into the corresponding proposal boxes.

init() function

# ./lib/model/rpn/proposal_layer.py
import numpy as np
import torch
import torch.nn as nn

from .generate_anchors import generate_anchors


class _ProposalLayer(nn.Module):
    """
    Outputs object detection proposals by applying estimated
    bounding-box transformations to a set of regular boxes
    (the anchor boxes).
    """
    def __init__(self, feat_stride, scales, ratios):
        super(_ProposalLayer, self).__init__()

        self._feat_stride = feat_stride  # downsampling factor of the feature map
        self._anchors = torch.from_numpy(generate_anchors(scales=np.array(scales), ratios=np.array(ratios))).float()
        self._num_anchors = self._anchors.size(0)

The initializer above calls the generate_anchors function (defined in generate_anchors.py) to produce the standard boxes (anchor boxes).
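The base anchors are defined around a single cell at the origin. To cover the whole image they are typically copied to every feature map location, shifted by feat_stride pixels per cell. A minimal sketch under that assumption (shift_anchors is a hypothetical helper, and the two base anchors are made-up values):

```python
def shift_anchors(base_anchors, feat_h, feat_w, feat_stride):
    """Copy every base anchor to every feature-map cell, offset in image pixels."""
    all_anchors = []
    for y in range(feat_h):      # rows of the feature map
        for x in range(feat_w):  # columns of the feature map
            dx, dy = x * feat_stride, y * feat_stride
            for x1, y1, x2, y2 in base_anchors:
                all_anchors.append((x1 + dx, y1 + dy, x2 + dx, y2 + dy))
    return all_anchors

base = [(0, 0, 15, 15), (-8, -8, 23, 23)]
anchors = shift_anchors(base, feat_h=2, feat_w=3, feat_stride=16)

print(len(anchors))  # 2 rows * 3 cols * 2 base anchors = 12
print(anchors[2])    # first base anchor at row 0, col 1: (16, 0, 31, 15)
```

The total number of anchors is H * W * A, which is why the score and offset layers of the RPN output 2A and 4A channels per location.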

forward() function

Next, let us look at how the forward() function of this class is implemented. It shows how the RoIs are generated.

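# ./lib/model/rpn/proposal_layer.py
def forward(self, input):
    # Algorithm:
    # for each location i on the (H, W) feature map:
    #       generate A anchor boxes centered on cell i
    #       apply the predicted bbox deltas to each of the A anchors
    # clip the predicted boxes to the image
    # remove predicted boxes whose width or height does not meet the requirement
    # sort the proposals by score from high to low
    # apply NMS to the proposals
    # take N of the proposals that survive NMS
    # return the top N proposals


    # The second half of the channels holds the foreground probabilities
    scores = input[0][:, self._num_anchors:, :, :]
    bbox_deltas = input[1]
    im_info = input[2]
    cfg_key = input[3]

    pre_nms_topN = cfg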
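The last steps of the algorithm listed in the comments (sort by score, apply NMS, keep the top N) can be sketched in plain Python as below. This is a simplified greedy NMS on Python tuples, not the repository's GPU implementation; select_proposals, its parameter names, and the sample boxes are assumptions for illustration.

```python
def iou(a, b):
    """IoU of two boxes (x1, y1, x2, y2), using the +1 pixel convention."""
    iw = min(a[2], b[2]) - max(a[0], b[0]) + 1
    ih = min(a[3], b[3]) - max(a[1], b[1]) + 1
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / float(area_a + area_b - inter)

def select_proposals(boxes, scores, pre_nms_top_n, nms_thresh, post_nms_top_n):
    # Sort by score, high to low, and keep the best pre_nms_top_n candidates.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    order = order[:pre_nms_top_n]
    keep = []
    for i in order:
        # Greedy NMS: drop a box if it overlaps an already kept box too much.
        if all(iou(boxes[i], boxes[j]) <= nms_thresh for j in keep):
            keep.append(i)
    return keep[:post_nms_top_n]

boxes = [(0, 0, 9, 9), (0, 1, 9, 10), (20, 20, 29, 29), (0, 0, 9, 8)]
scores = [0.9, 0.8, 0.7, 0.6]
kept = select_proposals(boxes, scores, pre_nms_top_n=4, nms_thresh=0.7, post_nms_top_n=2)
print(kept)  # [0, 2]
```

Boxes 1 and 3 overlap box 0 with an IoU above 0.7 and are suppressed, so only the highest-scoring box of that cluster and the separate box 2 remain.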

Other functions

# ./lib/model/rpn/proposal_layer.py
def backward(self, ):

# ./lib/model/rpn/proposal_layer.py
def reshape(self, bottom, top):

# ./lib/model/rpn/proposal_layer.py
def _filter_boxes():
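The body of _filter_boxes is not shown here. Going by the algorithm outline (remove predicted boxes whose width or height does not meet the requirement), a minimal sketch could look like this. The signature, the min_size parameter, and the sample boxes are assumptions, not the repository's code.

```python
def filter_boxes(boxes, min_size):
    """Return indices of boxes whose width and height are both >= min_size."""
    keep = []
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        w = x2 - x1 + 1
        h = y2 - y1 + 1
        if w >= min_size and h >= min_size:
            keep.append(i)
    return keep

boxes = [(0, 0, 31, 31), (5, 5, 9, 40), (10, 10, 40, 12)]
print(filter_boxes(boxes, min_size=16))  # [0]
```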

generate_anchors.py file

This file generates the regular anchor boxes.
As seen in the initializer of class _ProposalLayer, generate_anchors() is called to produce the standard boxes (anchor boxes). It is implemented as follows:

# ./lib/model/rpn/generate_anchors.py
import numpy as np

# ** is the power operator, so scales = [2^3, 2^4, 2^5] = [8, 16, 32]
def generate_anchors(base_size=16, ratios=[0.5, 1, 2], scales=2**np.arange(3, 6)):
    """The base anchor is 16x16, with coordinates (0, 0, 15, 15)."""
    base_anchor = np.array([1, 1, base_size, base_size]) - 1  # base_anchor = array([ 0,  0, 15, 15])
    # _ratio_enum is defined in this file; for a given anchor it enumerates the anchor
    # boxes of all aspect ratios. (The size of base_anchor is only an intermediate
    # value; the next statement rescales it using the scales argument.)
    ratio_anchors = _ratio_enum(base_anchor, ratios)
    # For each ratio anchor, enumerate the anchor boxes of all scales
    anchors = np.vstack([_scale_enum(ratio_anchors[i, :], scales) for i in range(ratio_anchors.shape[0])])
    return anchors

whctrs function

# ./lib/model/rpn/generate_anchors.py
def _whctrs(anchor):
    # Return the width, height, and center (x, y) of an anchor
    w = anchor[2] - anchor[0] + 1  # e.g. 15 - 0 + 1 = 16
    h = anchor[3] - anchor[1] + 1  # e.g. 15 - 0 + 1 = 16
    x_ctr = anchor[0] + 0.5 * (w - 1)  # e.g. 0 + 0.5 * (16 - 1) = 7.5
    y_ctr = anchor[1] + 0.5 * (h - 1)  # e.g. 0 + 0.5 * (16 - 1) = 7.5
    return w, h, x_ctr, y_ctr
mkanchors function

# ./lib/model/rpn/generate_anchors.py
def _mkanchors(ws, hs, x_ctr, y_ctr):
    # Given sequences of widths (ws) and heights (hs) around the center (x_ctr, y_ctr), output the corresponding anchors
    ws = ws[:, np.newaxis]
    hs = hs[:, np.newaxis]
    # Each anchor is stored as the top-left corner followed by the bottom-right corner
    anchors = np.hstack((x_ctr - 0.5 * (ws - 1),
                         y_ctr - 0.5 * (hs - 1),
                         x_ctr + 0.5 * (ws - 1),
                         y_ctr + 0.5 * (hs - 1)))
    return anchors

ratio_enum function

For a given anchor, enumerate the anchors for all of its aspect ratios.

# ./lib/model/rpn/generate_anchors.py
def _ratio_enum(anchor, ratios):
    w, h, x_ctr, y_ctr = _whctrs(anchor)  # width, height, and center (x, y) of the anchor
    size = w * h
    size_ratios = size / ratios
    ws = np.round(np.sqrt(size_ratios))
    hs = np.round(ws * ratios)
    anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
    return anchors

scale_enum() function

# ./lib/model/rpn/generate_anchors.py
def _scale_enum(anchor, scales):
    # For a given anchor (box), enumerate the anchors (boxes) for all of its scales
    w, h, x_ctr, y_ctr = _whctrs(anchor)
    ws = w * scales
    hs = h * scales
    anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
    return anchors
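To make the combined effect of these helpers concrete, the following is a pure-Python restatement (no NumPy) of the same arithmetic, run with the default base_size=16, ratios (0.5, 1, 2) and scales (8, 16, 32). The function names here are illustrative stand-ins for the originals; Python's built-in round uses the same round-half-to-even rule as np.round.

```python
import math

def whctrs(a):
    """Width, height, and center of a box (x1, y1, x2, y2)."""
    w = a[2] - a[0] + 1
    h = a[3] - a[1] + 1
    return w, h, a[0] + 0.5 * (w - 1), a[1] + 0.5 * (h - 1)

def mkanchor(w, h, x_ctr, y_ctr):
    """Box of size w x h centered on (x_ctr, y_ctr)."""
    return [x_ctr - 0.5 * (w - 1), y_ctr - 0.5 * (h - 1),
            x_ctr + 0.5 * (w - 1), y_ctr + 0.5 * (h - 1)]

def generate_anchors_py(base_size=16, ratios=(0.5, 1, 2), scales=(8, 16, 32)):
    base = [0, 0, base_size - 1, base_size - 1]
    w, h, x_ctr, y_ctr = whctrs(base)
    anchors = []
    for r in ratios:
        rw = round(math.sqrt(w * h / r))  # width for this ratio at roughly constant area
        rh = round(rw * r)                # height = width * ratio
        aw, ah, ax, ay = whctrs(mkanchor(rw, rh, x_ctr, y_ctr))
        for s in scales:
            anchors.append(mkanchor(aw * s, ah * s, ax, ay))
    return anchors

anchors = generate_anchors_py()
for a in anchors:
    print(a)
# [-84.0, -40.0, 99.0, 55.0]
# [-176.0, -88.0, 191.0, 103.0]
# [-360.0, -184.0, 375.0, 199.0]
# [-56.0, -56.0, 71.0, 71.0]
# [-120.0, -120.0, 135.0, 135.0]
# [-248.0, -248.0, 263.0, 263.0]
# [-36.0, -80.0, 51.0, 95.0]
# [-80.0, -168.0, 95.0, 183.0]
# [-168.0, -344.0, 183.0, 359.0]
```

All nine anchors share the center (7.5, 7.5) of the 16x16 base cell. Each group of three rows is one aspect ratio at the three scales.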

anchor_target_layer.py file

This file defines class _AnchorTargetLayer(nn.Module), which matches anchors to the ground truth and then generates the corresponding classification labels and bounding-box regression targets.
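As a rough illustration of the matching step, the sketch below labels anchors by their best IoU with any ground-truth box. The thresholds 0.7 and 0.3 are the values given in the Faster R-CNN paper, and the 1 / 0 / -1 label convention is an assumption here; the paper's additional rule (the anchor with the highest IoU for each ground-truth box is also positive) is left out for brevity.

```python
def iou(a, b):
    """IoU of two boxes (x1, y1, x2, y2), using the +1 pixel convention."""
    iw = min(a[2], b[2]) - max(a[0], b[0]) + 1
    ih = min(a[3], b[3]) - max(a[1], b[1]) + 1
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / float(area_a + area_b - inter)

def label_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Label each anchor: 1 = foreground, 0 = background, -1 = ignored."""
    labels = []
    for a in anchors:
        best = max(iou(a, g) for g in gt_boxes)
        if best >= pos_thresh:
            labels.append(1)
        elif best < neg_thresh:
            labels.append(0)
        else:
            labels.append(-1)
    return labels

gt_boxes = [(10, 10, 49, 49)]
anchors = [(10, 10, 49, 49), (100, 100, 139, 139), (10, 10, 49, 29)]
labels = label_anchors(anchors, gt_boxes)
print(labels)  # [1, 0, -1]
```

The third anchor covers half of the ground-truth box (IoU 0.5), so it is neither a clear positive nor a clear negative and is ignored during training.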

init() function

# ./lib/model/rpn/anchor_target_layer.py, in class _AnchorTargetLayer
def __init__(self, feat_stride, scales, ratios):

forward() function

# ./lib/model/rpn/anchor_target_layer.py, in class _AnchorTargetLayer
def forward(self, input):

backward() function

# ./lib/model/rpn/anchor_target_layer.py, in class _AnchorTargetLayer
def backward():

reshape() function

# ./lib/model/rpn/anchor_target_layer.py, in class _AnchorTargetLayer
def reshape():

unmap() function

# ./lib/model/rpn/anchor_target_layer.py, module-level function (not a method of the class)
def _unmap():
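The body of _unmap is not shown here. Judging by its name and its place in the anchor target layer, it presumably maps values computed for a subset of anchors (for example, those inside the image) back onto the full anchor set. A minimal sketch under that assumption, with a hypothetical signature:

```python
def unmap(data, count, inds, fill=0):
    """Scatter `data` (one value per selected anchor) into a list of length `count`."""
    full = [fill] * count
    for value, i in zip(data, inds):
        full[i] = value
    return full

inside_inds = [1, 3, 4]     # indices of the anchors that were kept
labels_inside = [1, 0, -1]  # labels computed only for those anchors
full_labels = unmap(labels_inside, count=6, inds=inside_inds, fill=-1)
print(full_labels)  # [-1, 1, -1, 0, -1, -1]
```

Anchors that were not selected receive the fill value, here -1, so they can be ignored later.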

compute_targets_batch() function

# ./lib/model/rpn/anchor_target_layer.py, module-level function (not a method of the class)
def _compute_targets_batch():

bbox_transform.py file

bbox_transform() function

bbox_transform_batch() function

bbox_transform_inv() function
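These functions are not shown here. As a sketch of the idea only, the box parameterization from the R-CNN family of papers encodes a ground-truth box relative to an anchor as (dx, dy, dw, dh), and the inverse applies such deltas to an anchor. The version below works on single boxes and, for simplicity, uses w = x2 - x1 without the +1 pixel convention seen in the anchor code; the function bodies are assumptions, not the repository's code.

```python
import math

def _size_and_center(box):
    w = box[2] - box[0]
    h = box[3] - box[1]
    return w, h, box[0] + 0.5 * w, box[1] + 0.5 * h

def bbox_transform(anchor, gt):
    """Encode gt relative to anchor as (dx, dy, dw, dh)."""
    aw, ah, ax, ay = _size_and_center(anchor)
    gw, gh, gx, gy = _size_and_center(gt)
    return ((gx - ax) / aw, (gy - ay) / ah, math.log(gw / aw), math.log(gh / ah))

def bbox_transform_inv(anchor, deltas):
    """Decode (dx, dy, dw, dh) into a box (x1, y1, x2, y2) around the anchor."""
    aw, ah, ax, ay = _size_and_center(anchor)
    dx, dy, dw, dh = deltas
    px, py = dx * aw + ax, dy * ah + ay        # shifted center
    pw, ph = math.exp(dw) * aw, math.exp(dh) * ah  # rescaled size
    return (px - 0.5 * pw, py - 0.5 * ph, px + 0.5 * pw, py + 0.5 * ph)

anchor = (0.0, 0.0, 16.0, 16.0)
gt = (4.0, 2.0, 20.0, 34.0)
deltas = bbox_transform(anchor, gt)
decoded = bbox_transform_inv(anchor, deltas)
print(deltas)
print(decoded)
```

Encoding and then decoding returns the original ground-truth box (up to floating-point error), which is the property the regression targets rely on.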

BackBone

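FasterRCNN mainly uses two backbones: the classic and simple VGG16, and ResNet, which has stronger feature extraction ability. Below we go through the code for both.

VGG16

First, we can load VGG16 from PyTorch's torchvision.models (you could also implement it yourself, but that is outside the scope of this article). Judging from details such as the kernel sizes, this is an optimized VGG16 whose layer parameters differ slightly from the original, while the overall structure is similar:

from torchvision import models
vgg = models.vgg16()

We can inspect the internal structure of the vgg16 network (it can also serve as a reference for re-implementing vgg16):

print(vgg)

The output is: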

VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (18): ReLU(inplace)
    (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (20): ReLU(inplace)
    (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (22): ReLU(inplace)
    (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (25): ReLU(inplace)
    (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (27): ReLU(inplace)
    (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (29): ReLU(inplace)
    (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU(inplace)
    (2): Dropout(p=0.5)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace)
    (5): Dropout(p=0.5)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)
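The in_features=25088 of the first Linear layer follows directly from the features block. The sketch below traces the spatial size through the printed layers, assuming a 224x224 input (the usual ImageNet size; the input size is not stated in the output). VGG16_PLAN and trace_features are illustrative names.

```python
# Layer plan read off the printed `features` block: numbers are conv output
# channels (3x3, stride 1, padding 1), "M" is a 2x2 max pool with stride 2.
VGG16_PLAN = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"]

def trace_features(size, plan):
    """Return (channels, spatial size) after running `plan` on a size x size input."""
    channels = 3
    for item in plan:
        if item == "M":
            size = (size - 2) // 2 + 1          # MaxPool2d(kernel_size=2, stride=2)
        else:
            size = (size + 2 * 1 - 3) // 1 + 1  # Conv2d(3x3, stride 1, padding 1)
            channels = item
    return channels, size

channels, size = trace_features(224, VGG16_PLAN)
print(channels, size, channels * size * size)  # 512 7 25088

# Without the last pooling layer the map is 14x14, i.e. a total stride of 16.
print(trace_features(224, VGG16_PLAN[:-1]))    # (512, 14)
```

Five pooling layers give a total stride of 32. Cutting the network before the last pool gives a stride of 16, which would match the 16-pixel base anchor size used earlier; whether the backbone is cut at that point is not shown in this section.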

ResNet

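The structure of ResNet is somewhat more complex, so it is not printed here. As with VGG, it is loaded through the torchvision.models module.

Faster RCNN model structure

With both backbones covered, we first create the overall structure of Faster RCNN (including RoIPool and the RPN; for now they are only placeholders, and the concrete implementation comes later).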

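class _fasterRCNN(nn.Module):  # the leading underscore marks the class as internal
    """faster RCNN"""
    def __init__(self, classes, class_agnostic):
        super(_fasterRCNN, self).__init__()
        self.classes = classes  # list of classes
        self.n_classes = len(self.classes)  # number of classes
        self.class_agnostic = class_agnostic  # whether the model is class-agnostic; defaults to False (class-aware)
        # loss
        self.RCNN_loss_cls = 0  # classification loss
        self.RCNN_loss_bbox = 0  # bounding-box regression loss

        # Define the RPN
        # from model.rpn.rpn import _RPN
        # NOTE: self.dout_base_model is not set in this snippet and must be defined before this line runs
        self.RCNN_rpn = _RPN(self.dout_base_model)
        self.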
