Torch+SSD实现目标检测口罩分类

导读

参考论文:《高效视频智能分类系统的设计与实现》

本文是通过目标检测实现对视频中人们是否戴着口罩的分类,从这一角度对深度学习在该方向上的算法进行总结。

在充分调研前期的基础上,该系统的模型包括:训练集获取模块,深度学习系统训练模块,单张图像标注模块实时视频检测模块。

系统关键问题在于反复将小权重矩阵应用于视频中的所有帧,由于这些模型查看视频中的每个帧,因此即使内存占用量很小,浮动点运算 (FLOP) 的数量仍然很大。因此专注于构建具有较少 FLOP 的计算效率模型,SSD采用金字塔结构,即利用了conv4-3/conv-7/conv6-2/conv7-2/conv8_2/conv9_2这些大小不同的feature maps,在多个feature maps上同时进行softmax分类和位置回归,待检测图像作为输入候选框,采用滑动窗口方法;基于颜色/纹理/形状的方法,将PCA语义特征进行降维,最后通过NMS先选一个置信度最高的框,然后和剩下的候选框都计算一个IOU,如果IOU大于某个阈值,说明这个框和候选框重叠区域很大,可以将该候选框删除,置信度直接降为0,从而过滤候选框重叠的部分,以确保最终视频分类训练结果的准确度。

Pytorch各版本下载链接:https://download.pytorch.org/whl/torch_stable.html

本系统环境:

cuda10

python3.6

torch1.1

torchvison0.4

摘要

近年来,面部识别一直都是机器视觉领域中的一个热门方向。其中YOLO、SSD、R-CNN都是应用于目标检测的模型,在人群密集的公共场景检测精确度低,为了有效解决候选框遮挡严重的场景下人脸检测的问题,使用CNN共享卷积特征,一次性提取图像特征得到SSD候选框规格。分类检测采用神经网络共享卷积层一次性提取图像特征,位置检测采用线性回归固定候选框,通过结合这两个目标检测的结果获得最终的目标。最终结果显示,该算法可以准确地检测出脸部的坐标值信息,有效地解决遮挡问题,从而提升了目标是否佩戴口罩的分类检测精度。

数据集构建

1. 制作/获取数据集

WIDER Face Dataset 数据集是针对人们脸部的识别检测,如图1所示,它总共包含282692 张脸部图像和21192 张图像,并且在表述、装修和灯照等因素具有不一样的影响。其次它将近30万个人的脸部,它是当今世界图像数量最多、情景是最有难度以及挑战性非常高的面部识别数据集。因为它具有最精准的标注信息和最高的挑战性,故Wider Face Dataset已经超越了FDDB作为各大企业和研究所相互竞争的人脸识别的范例,很多著名的高校研究所都已经在该数据集的基础上计算出了各自的算法成效。

本系统使用 Wider Face Dataset对SSD模型进行多次的测试和验证。该数据集中所标人脸尺度不一、姿态各异,人脸的尺度多分布于 40 ~ 140 像素之间,并存在不同程度的物体遮挡和光照差异,是目前人脸检测领域广为认可的基准数据集。在系统中通过对Wider Face Dataset中 的3114张图像进行模型验证,其中验证集共包含780张图像,并且对测试集中不同复杂度的子集进行模型识别测试。

 

2.打标签

标注流程如图2所示:

面部:使用正方形候选框标记,不计小于三十二像素、不清晰的图像或图像扭曲等因素的候选框,这些因素的候选框会被删除掉,不会对它们进行继续分类。

眼睛:面部中一双眼睛的坐标值应该标记出来。

遮挡面部的位置:遮挡面部的东西要用长方形的候选框标注出来,比如眼镜。

面部朝向:图像中的面部要定义五中走向:左、右、前、左前,右前。

遮挡度:整个面部主要为四种类型:下巴、嘴巴、鼻子、眼睛。

遮挡类型:分为4种遮挡类型:人与人之间的遮挡、地理环境的遮挡、被人的某个特殊部位的遮挡、带着眼镜的遮挡。

总结: MAFA Dataset是一个人们脸部被遮挡的目标识别数据集,他的图像大多数都是从互联网上得到的。如图3所示, MAFA Dataset数据集一共包括和29700张普通的图像和24705张被不同类型和不同程度遮挡面部的图像,其中被遮挡面部的图像中有上面所述的六种特征,并且这几千张图像全是由公司员工手工进行标注的。

3.生成.txt文件(make_txt.py)

#!/opt/conda/envs/python35-paddle120-env/bin/python
import os
import random
import sys 
sys.path.append('/home/aistudio/ssd.pytorch-master/data/')
# 训练集:训练90%,验证集10%
trainval_percent = 0.1
# 训练集90%,测试集10%
train_percent = 0.9
xmlfilepath = 'mask_or_not/Annotations'
txtsavepath = 'mask_or_not/ImagesSets'
total_xml = os.listdir(xmlfilepath)

num = len(total_xml)
print(num)
list = range(num)
train_val_num = int(num * train_percent)
print(train_val_num)
val = int(train_val_num * trainval_percent)
print(val)
trainval = random.sample(list, train_val_num)
print(len(trainval))
validation = random.sample(trainval, val)

ftrainval = open('/home/aistudio/ssd.pytorch-master/data/mask_or_not/ImageSets/trainval.txt', 'w')
ftest = open('/home/aistudio/ssd.pytorch-master/data/mask_or_not/ImageSets/test.txt', 'w')
ftrain = open('/home/aistudio/ssd.pytorch-master/data/mask_or_not/ImageSets/train.txt', 'w')
fval = open('/home/aistudio/ssd.pytorch-master/data/mask_or_not/ImageSets/val.txt', 'w')

for i in list:
    name = total_xml[i][:-4] + '\n'
    if i in trainval:
        ftrainval.write(name)
        if i in validation:
            fval.write(name)
        else:
            ftrain.write(name)
    else:
        ftest.write(name)

ftrainval.close()
ftrain.close()
fval.close()
ftest.close()

txt内容就是各个图片的索引,意味着哪些图片用做训练,哪些用做测试。

SSD模型训练流程

1.修改配置文件(config.py),根据自己的情况进行修改或者添加。

#!/opt/conda/envs/python35-paddle120-env/bin/python
import sys 

# config.py
import os.path

# gets home dir cross platform

HOME = os.path.expanduser("/home/aistudio/ssd.pytorch-master")

# for making bounding boxes pretty
COLORS = ((255, 0, 0, 128), (0, 255, 0, 128), (0, 0, 255, 128),
          (0, 255, 255, 128), (255, 0, 255, 128), (255, 255, 0, 128))

MEANS = (104, 117, 123)


# 新加的
mask = {
    'num_classes': 3,
    'lr_steps': (80000, 100000, 120000), # 学习率变化步数
    # 训练12万次
    'max_iter': 120000,
    # 6个特征图
    'feature_maps': [38, 19, 10, 5, 3, 1],
    'min_dim': 300,
    'steps': [8, 16, 32, 64, 100, 300], # 特征图对应的缩放比
    'min_sizes': [30, 60, 111, 162, 213, 264],
    'max_sizes': [60, 111, 162, 213, 264, 315],
    'aspect_ratios': [[2], [2, 3], [2, 3], [2, 3], [2], [2]], # 先验框宽高比
    'variance': [0.1, 0.2],
    'clip': True,
    'name': 'MASK',
}

# SSD300 CONFIGS
voc = {
    'num_classes': 21,
    'lr_steps': (80000, 100000, 120000),
    'max_iter': 120000,
    'feature_maps': [38, 19, 10, 5, 3, 1],
    'min_dim': 300,
    'steps': [8, 16, 32, 64, 100, 300],
    'min_sizes': [30, 60, 111, 162, 213, 264],
    'max_sizes': [60, 111, 162, 213, 264, 315],
    'aspect_ratios': [[2], [2, 3], [2, 3], [2, 3], [2], [2]],
    'variance': [0.1, 0.2],
    'clip': True,
    'name': 'VOC',
}

coco = {
    'num_classes': 201,
    'lr_steps': (280000, 360000, 400000),
    'max_iter': 400000,
    'feature_maps': [38, 19, 10, 5, 3, 1],
    'min_dim': 300,
    'steps': [8, 16, 32, 64, 100, 300],
    'min_sizes': [21, 45, 99, 153, 207, 261],
    'max_sizes': [45, 99, 153, 207, 261, 315],
    'aspect_ratios': [[2], [2, 3], [2, 3], [2, 3], [2], [2]],
    'variance': [0.1, 0.2],
    'clip': True,
    'name': 'COCO',
}

2.对voc0712.py进行修改(mask.py)

#!/opt/conda/envs/python35-paddle120-env/bin/python

from .config import HOME
import os.path as osp
import sys 
import os
sys.path.append('/home/aistudio')
import torch
import torch.utils.data as data
import cv2
import numpy as np
if sys.version_info[0] == 2:
    import xml.etree.cElementTree as ET
else:
    import xml.etree.ElementTree as ET


# 类别,这里有两类,一类为face_mask:戴口罩,另一类为face:不带口罩
MASK_CLASSES = (  # always index 0
   'face','face_mask'
   )

# note: if you used our download scripts, this should be right
home = os.path.abspath(os.path.dirname(__file__))

#MASK_ROOT = osp.join(HOME, "data/mask_or_not")
MASK_ROOT = osp.join(home, "mask_or_not")

class MASKAnnotationTransform(object):
    """Transforms a MASK annotation into a Tensor of bbox coords and label index
    Initilized with a dictionary lookup of classnames to indexes

    Arguments:
        class_to_ind (dict, optional): dictionary lookup of classnames -> indexes
            (default: alphabetic indexing of MASK's 2 classes)
        keep_difficult (bool, optional): keep difficult instances or not
            (default: False)
        height (int): height
        width (int): width
    获取xml里面的坐标值和label,并将坐标值转换成0到1
    """

    def __init__(self, class_to_ind=None, keep_difficult=False):
        # 将类别名字转换成数字label
        self.class_to_ind = class_to_ind or dict(
            zip(MASK_CLASSES, range(len(MASK_CLASSES))))
        # 在xml里面,有个difficult的参数,这个表示特别难识别的目标,一般是小目标或者遮挡严重的目标
        # 因此,可以通过这个参数,忽略这些目标
        self.keep_difficult = keep_difficult

    def __call__(self, target, width, height):
        """
        Arguments:
            target (annotation) : the target annotation to be made usable
                will be an ET.Element
        Returns:
            a list containing lists of bounding boxes  [bbox coords, class name]
        将一张图里面包含若干个目标,获取这些目标的坐标值,并转换成0到1,并得到其label
        :param target: xml格式
        :return: 返回List,每个目标对应一行,每行包括5个参数[xmin, ymin, xmax, ymax, label_ind]
        """
        res = []
        for obj in target.iter('object'):
            difficult = int(obj.find('difficult').text) == 1
            if not self.keep_difficult and difficult:
                continue
            name = obj.find('name').text.lower().strip() # text是获得目标的名称,lower将字符转换成小写,strip去除前后空格
            bbox = obj.find('bndbox')

            pts = ['xmin', 'ymin', 'xmax', 'ymax']
            bndbox = []
            for i, pt in enumerate(pts):
                cur_pt = int(bbox.find(pt).text) - 1
                # scale height or width
                # 将坐标转换成[0,1],这样图片尺寸发生变化的时候,真实框也随之变化,即平移不变形
                cur_pt = cur_pt / width if i % 2 == 0 else cur_pt / height
                bndbox.append(cur_pt)
            label_idx = self.class_to_ind[name]
            bndbox.append(label_idx)
            res += [bndbox]  # [xmin, ymin, xmax, ymax, label_ind]
            # img_id = target.find('filename').text[:-4]

        return res  # [[xmin, ymin, xmax, ymax, label_ind], ... ]


class MASKDetection(data.Dataset):
    """VOC Detection Dataset Object

    input is image, target is annotation

    Arguments:
        root (string): filepath to VOCdevkit folder.
        image_set (string): imageset to use (eg. 'train', 'val', 'test')
        transform (callable, optional): transformation to perform on the
            input image
        图片预处理的方式,这里使用了大量数据增强的方式
        target_transform (callable, optional): transformation to perform on the
            target `annotation`
            (eg: take in caption string, return tensor of word indices)
        真实框预处理的方式
        dataset_name (string, optional): which dataset to load
            (default: 'VOC2007')
    """
    # image_sets=[('2007', 'trainval'), ('2012', 'trainval')],
    def __init__(self, root,
                 image_sets='trainval',
                 transform=None, target_transform=MASKAnnotationTransform(),
                 dataset_name='MASK'):
        self.root = root
        self.image_set = image_sets
        self.transform = transform   # 定义图像转换方法
        self.target_transform = target_transform   #定义标签的转换方法
        self.name = dataset_name
        self._annopath = osp.join('%s', 'Annotations', '%s.xml')
        self._imgpath = osp.join('%s', 'JPEGImages', '%s.jpg')
        self.ids = list()
        for line in open(MASK_ROOT+'/ImageSets/'+self.image_set+'.txt'):
            self.ids.append((MASK_ROOT, line.strip()))

    def __getitem__(self, index):
        im, gt, h, w = self.pull_item(index)

        return im, gt

    def __len__(self):
        return len(self.ids)

    def pull_item(self, index):
        img_id = self.ids[index]

        target = ET.parse(self._annopath % img_id).getroot()
        img = cv2.imread(self._imgpath % img_id)
        height, width, channels = img.shape

        if self.target_transform is not None:
            # 真实框处理
            target = self.target_transform(target, width, height)

        if self.transform is not None:
            # 图像预处理,进行数据增强,只在训练进行数据增强,测试的时候不需要
            target = np.array(target)
            # 前四个和最后一个类别
            img, boxes, labels = self.transform(img, target[:, :4], target[:, 4])
            # to rgb
            img = img[:, :, (2, 1, 0)]  #opencv读入图像的顺序是BGR,该操作将图像转为RGB
            # img = img.transpose(2, 0, 1)
            target = np.hstack((boxes, np.expand_dims(labels, axis=1)))
        return torch.from_numpy(img).permute(2, 0, 1), target, height, width
        # return torch.from_numpy(img), target, height, width

    def pull_image(self, index):
        '''Returns the original image object at index in PIL form

        Note: not using self.__getitem__(), as any transformations passed in
        could mess up this functionality.

        Argument:
            index (int): index of img to show
        Return:
            PIL img
        测试时候用,返回opencv读取的图像
        '''
        img_id = self.ids[index]
        return cv2.imread(self._imgpath % img_id, cv2.IMREAD_COLOR)

    def pull_anno(self, index):
        '''Returns the original annotation of image at index

        Note: not using self.__getitem__(), as any transformations passed in
        could mess up this functionality.

        Argument:
            index (int): index of img to get annotation of
        Return:
            list:  [img_id, [(label, bbox coords),...]]
                eg: ('001718', [('dog', (96, 13, 438, 332))])
        测试时候用,返回图片名称和原始未经缩放的GT框
        '''
        img_id = self.ids[index]
        anno = ET.parse(self._annopath % img_id).getroot()
        gt = self.target_transform(anno, 1, 1)
        return img_id[1], gt

    def pull_tensor(self, index):
        '''Returns the original image at an index in tensor form

        Note: not using self.__getitem__(), as any transformations passed in
        could mess up this functionality.

        Argument:
            index (int): index of img to show
        Return:
            tensorized version of img, squeezed
        '''
        return torch.Tensor(self.pull_image(index)).unsqueeze_(0)

3.修改SSD框架的ssd.py和train.py

4.添加训练好的模型到eval.py,对模型进行验证,得到较好的训练结果ssd300_MASK_88000.pth

单张图像检测

图片测试代码:

#!/opt/conda/envs/python35-paddle120-env/bin/python
from ssd import build_ssd
from data import MASKDetection, MASKAnnotationTransform, MASK_ROOT
from data import MASK_CLASSES as labels
import os
import sys
# module_path = os.path.abspath(os.path.join('..'))
# if module_path not in sys.path:
#     sys.path.append(module_path)

import sys 
sys.path.append('/home/aistudio')
import torch
import torch.nn as nn
import torch.backends.cudnn as cudnn
from torch.autograd import Variable
import numpy as np
import cv2
if torch.cuda.is_available():
    torch.set_default_tensor_type('torch.cuda.FloatTensor')

net = build_ssd('test', 300, len(labels)+1)    # initialize SSD
net.load_weights('./weights/ssd300_MASK_88000.pth')

import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

image = cv2.imread('mm.jpg', cv2.IMREAD_COLOR)
# testset = MASKDetection(MASK_ROOT, 'test', None, MASKAnnotationTransform())
# img_id = 1
# image = testset.pull_image(img_id)
rgb_image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
# View the sampled input image before transform
plt.figure(figsize=(10,10))

plt.imshow(rgb_image)

plt.show()

# 对于SSD,在测试时,我们使用一个自定义的BaseTransform调用将图像大小调整为300x300,减去数据集的平均rgb值,并将颜色通道交换为SSD300的输入。
# Pre-process the input.
x = cv2.resize(image, (300, 300)).astype(np.float32)
x -= (104.0, 117.0, 123.0)
x = x.astype(np.float32)
x = x[:, :, ::-1].copy()
plt.imshow(x)
x = torch.from_numpy(x).permute(2, 0, 1)

# 现在只需将图像包装在一个变量中,这样它就可以被PyTorch autograd识别
# SSD Forward Pass
xx = Variable(x.unsqueeze(0))     # wrap tensor in Variable
if torch.cuda.is_available():
    xx = xx.cuda()
y = net(xx)


# Parse the Detections and View Results

# Filter outputs with confidence scores lower than a threshold Here we choose 60%

top_k=10

plt.figure(figsize=(10,10))
colors = plt.cm.hsv(np.linspace(0, 1, 21)).tolist()
plt.imshow(rgb_image)  # plot the image for matplotlib
currentAxis = plt.gca()

detections = y.data
# scale each detection back up to the image
scale = torch.Tensor(rgb_image.shape[1::-1]).repeat(2)
for i in range(detections.size(1)):
    j = 0
    while detections[0,i,j,0] >= 0.6:
        score = detections[0,i,j,0]
        label_name = labels[i-1]
        display_txt = '%s: %.2f'%(label_name, score)
        pt = (detections[0,i,j,1:]*scale).cpu().numpy()
        coords = (pt[0], pt[1]), pt[2]-pt[0]+1, pt[3]-pt[1]+1
        color = colors[i]
        currentAxis.add_patch(plt.Rectangle(*coords, fill=False, edgecolor=color, linewidth=2))
        currentAxis.text(pt[0], pt[1], display_txt, bbox={'facecolor':color, 'alpha':0.5})
        j+=1
plt.savefig('112.png')

运行结果:

非口罩类:

口罩类:

监控视频实时检测

#!/opt/conda/envs/python35-paddle120-env/bin/python

from __future__ import print_function
import sys 
sys.path.append('/home/aistudio')


import torch
from torch.autograd import Variable
import cv2
import time
from imutils.video import FPS, WebcamVideoStream
import argparse

parser = argparse.ArgumentParser(description='Single Shot MultiBox Detection')
parser.add_argument('--weights', default='../weights/ssd300_MASK_88000.pth',
                    type=str, help='Trained state_dict file path')
parser.add_argument('--cuda', default=False, type=bool,
                    help='Use cuda in live demo')
args = parser.parse_args()

COLORS = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
FONT = cv2.FONT_HERSHEY_SIMPLEX


def cv2_demo(net, transform):
    def predict(frame):
        height, width = frame.shape[:2]
        x = torch.from_numpy(transform(frame)[0]).permute(2, 0, 1)
        x = Variable(x.unsqueeze(0))
        y = net(x)  # forward pass
        detections = y.data
        # scale each detection back up to the image
        scale = torch.Tensor([width, height, width, height])
        for i in range(detections.size(1)):
            j = 0
            while detections[0, i, j, 0] >= 0.6:
                score = detections[0,i,j,0]
                label_name = labels[i-1]
                display_txt = '%s: %.2f'%(label_name, score)
                pt = (detections[0, i, j, 1:] * scale).cpu().numpy()
                cv2.rectangle(frame,
                              (int(pt[0]), int(pt[1])),
                              (int(pt[2]), int(pt[3])),
                              COLORS[i % 3], 2)
                #cv2.putText(frame, labelmap[i - 1], (int(pt[0]), int(pt[1])),
                  #          FONT, 2, (255, 255, 255), 2, cv2.LINE_AA)
                color = COLORS[i]
                cv2.putText(frame, display_txt,(int(pt[0]), int(pt[1])),
                            FONT, 1, (255, 255, 255), 2, cv2.LINE_AA)
                j += 1
        return frame

    # start video stream thread, allow buffer to fill
    print("[INFO] starting threaded video stream...")
    global stream
    stream = WebcamVideoStream(src=0).start()  # default camera
    time.sleep(1.0)
    # start fps timer
    # loop over frames from the video file stream
    while True:
        # grab next frame
        frame = stream.read()
        key = cv2.waitKey(1) & 0xFF

        # update FPS counter
        fps.update()
        frame = predict(frame)

        # keybindings for display
        if key == ord('p'):  # pause
            while True:
                key2 = cv2.waitKey(1) or 0xff
                cv2.imshow('frame', frame)
                if key2 == ord('p'):  # resume
                    break
        cv2.imshow('frame', frame)
        if key == 27:  # exit
            break


if __name__ == '__main__':
    import sys
    from os import path
    sys.path.append(path.dirname(path.dirname(path.abspath(__file__))))
    from data import MASKDetection, MASKAnnotationTransform, MASK_ROOT
    from data import MASK_CLASSES as labels 
    from data import BaseTransform, VOC_CLASSES as labelmap
    from ssd import build_ssd

    net = build_ssd('test', 300, 3)    # initialize SSD
    #net.load_state_dict(torch.load('../weights/ssd300_MASK_88000.pth'))
    net.load_weights('../weights/ssd300_MASK_88000.pth')

    transform = BaseTransform(net.size, (104/256.0, 117/256.0, 123/256.0))

    fps = FPS().start()
    cv2_demo(net.eval(), transform)
    # stop the timer and display FPS information
    fps.stop()

    print("[INFO] elasped time: {:.2f}".format(fps.elapsed()))
    print("[INFO] approx. FPS: {:.2f}".format(fps.fps()))

    # cleanup
    cv2.destroyAllWindows()
    stream.stop()

运行结果截图:

代码(包括数据集)可关注微信公众号:IamZLT,后台回复:口罩,即可获取。

微信公众号:

 

点赞
  1. mingxinbiji.cn说道:
    Google Chrome Windows 10

    个人认为这个就可以发会议论文了
    最好做与其他算法的比较与优化

    1. ZLT说道:
      Google Chrome Windows 10

      这都是参考的文章开篇的论文里的内容,用于学习来着,只加了自己的理解和一些解释

发表评论

电子邮件地址不会被公开。必填项已用 * 标注