Published November 28, 2023 (author: 沃尔沃车型价格)

License plate detection and recognition with deep learning (PyTorch) (ResNet + Transformer)

License plate recognition

Overview

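This project implements license plate recognition with deep learning. Vehicles are first detected directly with a YOLO detector; only then are dedicated networks used to locate the plate and read the plate number.

The plate detection network is a ResNet18 that outputs the affine transformation matrix of the detected box, so plates are located as quadrilaterals in arbitrary orientation rather than as axis-aligned rectangles. (Strictly speaking, an affine map of a square produces a parallelogram.)

The plate number sequence model is a ResNet18 + Transformer that outputs the plate number sequence directly.

For data, plate detection uses the CCPD 2019 dataset. While training the detection model, synthetic plates are generated programmatically and pasted over the dataset images to strengthen detection.

The sequence recognition model is trained directly on programmatically generated plate images, combined with suitable image augmentation. Training is end to end: an image goes in, the plate number sequence comes out, and the loss is CTCLoss.

I. Network models

1. Plate detection network

The network is defined as follows: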

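from torch import nn
from torchvision.models import resnet18
from einops import rearrange


class WpodNet(nn.Module):

    def __init__(self):
        """
        Plate detection network: a plain ResNet18 with only the output layer changed.
        """
        super(WpodNet, self).__init__()
        resnet = resnet18(True)  # True = load ImageNet-pretrained weights
        backbone = list(resnet.children())
        self.backbone = nn.Sequential(
            nn.BatchNorm2d(3),   # normalize the raw input image
            *backbone[:3],       # conv1, bn1, relu (the maxpool at index 3 is skipped)
            *backbone[4:8],      # layer1 to layer4
        )
        # 8 output channels per cell: 2 class logits + 6 affine parameters
        self.detection = nn.Conv2d(512, 8, 3, 1, 1)

    def forward(self, x):
        features = self.backbone(x)
        out = self.detection(features)
        out = rearrange(out, 'n c h w -> n h w c')  # move the channel axis last
        return out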

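This network effectively divides the image into cells: each cell covers a 16×16-pixel patch, because the backbone skips the maxpool layer and therefore has a total stride of 16. A plate is detected per cell, and the output for a cell is the affine transformation matrix of that plate's bounding quadrilateral.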
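To make the cell size concrete, the sketch below traces the spatial size through the strided layers of the backbone using the standard convolution output-size formula. The 208-pixel input is an assumption taken from the `augment_detect(image, points, 208)` call later in the article; the helper names are hypothetical.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a convolution along one axis (floor convention, as in PyTorch)."""
    return (size + 2 * padding - kernel) // stride + 1


def backbone_out(size):
    """Spatial size after ResNet18's conv1 and layer1..layer4 with the maxpool removed."""
    size = conv_out(size, kernel=7, stride=2, padding=3)      # conv1
    # layer1 keeps the resolution; layer2, layer3 and layer4 each halve it
    for _ in range(3):
        size = conv_out(size, kernel=3, stride=2, padding=1)
    return size


grid = backbone_out(208)
print(grid, 208 // grid)  # 13 cells per side, 16 pixels per cell
```

The same arithmetic maps a 48×144 input to a 3×9 feature map, which is consistent with the `AbsPosEmb((3, 9), 512)` used by the recognition network, although the article does not state the recognition input size explicitly.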

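2. Plate number sequence recognition network

The backbone of the sequence recognition network is ResNet18 + Transformer: ResNet18 encodes the image, and the Transformer then decodes the features into the corresponding characters.

The network is defined as follows: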

from torch import nn
from torchvision.models import resnet18
import torch
from einops import rearrange


class OcrNet(nn.Module):

    def __init__(self, num_class):
        super(OcrNet, self).__init__()
        resnet = resnet18(True)  # True = load ImageNet-pretrained weights
        backbone = list(resnet.children())
        self.backbone = nn.Sequential(
            nn.BatchNorm2d(3),
            *backbone[:3],
            *backbone[4:8],
        )  # build the ResNet18 encoder
        self.decoder = nn.Sequential(
            Block(512, 8, False),
            Block(512, 8, False),
        )  # decoder made of Transformer blocks
        self.out_layer = nn.Linear(512, num_class)  # linear output layer
        self.abs_pos_emb = AbsPosEmb((3, 9), 512)  # absolute position embedding

    def forward(self, x):
        x = self.backbone(x)                      # N,C,H,W feature map
        x = rearrange(x, 'n c h w -> n (w h) c')  # flatten to a sequence: N,S,C
        x = x + self.abs_pos_emb()
        x = self.decoder(x)
        x = rearrange(x, 'n s v -> s n v')        # sequence-first layout: S,N,V
        return self.out_layer(x)
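The pattern `'n c h w -> n (w h) c'` decides the order in which feature-map positions become sequence steps. In einops, the first name inside a parenthesised group is the outer (slow) axis, so the map is read column by column. The pure-Python sketch below (the function name is hypothetical) lists that order for the 3×9 feature map.

```python
def flatten_w_h(height, width):
    """Return the (row, column) visited at each sequence index for a '(w h)' flattening."""
    order = []
    for col in range(width):        # w is the outer (slow) axis
        for row in range(height):   # h is the inner (fast) axis
            order.append((row, col))
    return order


order = flatten_w_h(3, 9)
print(len(order))   # 27 sequence steps
print(order[:4])    # [(0, 0), (1, 0), (2, 0), (0, 1)]
```

Reading columns from left to right keeps the sequence in the same order as the characters on the plate, which suits the left-to-right alignment that CTC assumes.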

The Block class is defined as follows:

class Block(nn.Module):
    r"""
    Args:
        embed_dim: number of features per token vector.
        num_head: number of attention heads.
        is_mask: whether to apply a mask. If True, each token can attend only to positions up to its own, not to later ones.
    Shape:
        - Input: N,S,V (batch, sequence length, features per token)
        - Output: same shape as the input
    Examples::
        # >>> m = Block(720, 12, False)
        # >>> x = torch.randn(4, 13, 720)
        # >>> output = m(x)
        # >>> print(output.shape)
        # torch.Size([4, 13, 720])
    """

    def __init__(self, embed_dim, num_head, is_mask):
        super(Block, self).__init__()
        self.ln_1 = nn.LayerNorm(embed_dim)
        self.attention = SelfAttention(embed_dim, num_head, is_mask)
        self.ln_2 = nn.LayerNorm(embed_dim)
        self.feed_forward = nn.Sequential(
            nn.Linear(embed_dim, embed_dim * 6),
            nn.ReLU(),
            nn.Linear(embed_dim * 6, embed_dim)
        )

    def forward(self, x):
        # multi-head self-attention on the normalized input
        attention = self.attention(self.ln_1(x))
        # residual connection
        x = attention + x
        x = self.ln_2(x)
        # feed-forward part
        h = self.feed_forward(x)
        x = h + x  # residual connection
        return x

The position embedding is defined as follows:

class AbsPosEmb(nn.Module):
    def __init__(
            self,
            fmap_size,
            dim_head
    ):
        super().__init__()
        height, width = fmap_size
        scale = dim_head ** -0.5
        # one learnable vector per row and one per column
        self.height = nn.Parameter(torch.randn(height, dim_head) * scale)
        self.width = nn.Parameter(torch.randn(width, dim_head) * scale)

    def forward(self):
        # broadcast-add row and column embeddings into an (h, w, d) grid
        emb = rearrange(self.height, 'h d -> h () d') + rearrange(self.width, 'w d -> () w d')
        # flatten in the same (w h) order used by OcrNet.forward
        emb = rearrange(emb, ' h w d -> (w h) d')
        return emb

The self-attention used by the Block class is defined as follows:

class SelfAttention(nn.Module):
    r"""Multi-head self-attention.

    Args:
        embed_dim: number of features per token vector.
        num_head: number of attention heads.
        is_mask: whether to apply a mask. If True, each token can attend only to positions up to its own, not to later ones.
    Shape:
        - Input: N,S,V (batch, sequence length, features per token)
        - Output: same shape as the input
    Examples::
        # >>> m = SelfAttention(720, 12)
        # >>> x = torch.randn(4, 13, 720)
        # >>> output = m(x)
        # >>> print(output.shape)
        # torch.Size([4, 13, 720])
    """

    def __init__(self, embed_dim, num_head, is_mask=True):
        super(SelfAttention, self).__init__()
        assert embed_dim % num_head == 0
        self.num_head = num_head
        self.is_mask = is_mask
        self.linear1 = nn.Linear(embed_dim, 3 * embed_dim)
        self.linear2 = nn.Linear(embed_dim, embed_dim)

    def forward(self, x):
        # x has shape N,S,V
        x = self.linear1(x)  # shape becomes N,S,3V
        n, s, v = x.shape
        # split into heads: shape N,S,H,3V/H
        x = x.reshape(n, s, self.num_head, -1)
        # swap axes: shape N,H,S,3V/H
        x = torch.transpose(x, 1, 2)
        # split into Q, K, V, each of shape N,H,S,V/H
        query, key, value = torch.chunk(x, 3, -1)
        dk = value.shape[-1] ** 0.5
        # scaled dot-product attention scores
        w = torch.matmul(query, key.transpose(-1, -2)) / dk  # w has shape N,H,S,S
        if self.is_mask:
            # build the lower-triangular mask
            mask = torch.tril(torch.ones(w.shape[-1], w.shape[-1])).to(w.device)
            w = w * mask - 1e10 * (1 - mask)
        w = torch.softmax(w, dim=-1)  # softmax normalization
        attention = torch.matmul(w, value)  # weighted sum of the value vectors, shape N,H,S,V/H
        # swap axes back: shape N,S,H,V/H
        attention = attention.permute(0, 2, 1, 3)
        n, s, h, v = attention.shape
        # merge the head and feature axes, i.e. concatenate the per-head results: shape N,S,V
        attention = attention.reshape(n, s, h * v)
        return self.linear2(attention)  # output after the final linear layer
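The masking line `w = w * mask - 1e10 * (1 - mask)` is easiest to follow on a small example. The sketch below reproduces the idea in pure Python for a single 3×3 score matrix; it is an illustration with a hypothetical function name, not the project's code. Note that `OcrNet` builds its blocks with `is_mask=False`, so this path is not active in the recognition model.

```python
import math


def masked_softmax_rows(scores):
    """Apply a lower-triangular mask to a square score matrix, then softmax each row."""
    size = len(scores)
    result = []
    for i, row in enumerate(scores):
        # positions after i get -1e10, so their softmax weight underflows to 0
        masked = [row[j] if j <= i else -1e10 for j in range(size)]
        peak = max(masked)                        # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in masked]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result


weights = masked_softmax_rows([[0.0, 1.0, 2.0],
                               [1.0, 1.0, 5.0],
                               [0.5, 0.5, 0.5]])
for row in weights:
    print([round(v, 3) for v in row])
```

The first row puts all its weight on position 0, the second row splits evenly between positions 0 and 1 despite the large score at position 2, and only the last row sees all three positions.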

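II. Data loading

1. Loading plate number data

A set of license plates is generated programmatically and then augmented. The augmentations mainly include:

Random smudging

Gaussian blur

Affine transformation, with the result pasted onto a larger image

Cropping after randomly offsetting the four corners of the quadrilateral by a small amount

The plate number sequence recognition network is then trained directly:

loss_func = nn.CTCLoss(blank=0, zero_infinity=True)
optimizer = torch.optim.Adam(self.net.parameters(), lr=0.00001)

The optimizer is Adam and the loss function is CTCLoss.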
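With `blank=0`, class index 0 is reserved for the CTC blank, so real characters have to map to indices starting at 1. The sketch below shows one way such an encoding could look. The alphabet is hypothetical and only for illustration, since the article does not list the project's character set.

```python
# Hypothetical alphabet for illustration; the project's real character set is not given here.
BLANK = "*"
ALPHABET = [BLANK] + list("京沪苏ABCDE0123456789")


def encode(plate):
    """Map a plate string to class indices; index 0 stays reserved for the CTC blank."""
    return [ALPHABET.index(ch) for ch in plate]


target = encode("苏E12345")
print(target)  # [3, 8, 10, 11, 12, 13, 14]
```

`nn.CTCLoss` expects log-probabilities of shape (T, N, C), which matches the sequence-first `s n v` layout returned by `OcrNet.forward` once a log-softmax is applied over the last axis.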

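2. Loading plate detection data

The data comes from the CCPD dataset. During loading, a generated plate is randomly pasted over the plate position in the original image to train the network's ability to detect plates.

if random.random() < 0.5:
    plate, _ = self.draw()
    plate = cv2.cvtColor(plate, cv2.COLOR_RGB2BGR)
    plate = self.smudge(plate)  # random smudging
    image = enhance.apply_plate(image, points, plate)  # paste the plate image into the data image
# reorder the corner coordinates from interleaved x, y to all x values followed by all y values
[x1, y1, x2, y2, x4, y4, x3, y3] = points
points = [x1, x2, x3, x4, y1, y2, y3, y4]
image, pts = enhance.augment_detect(image, points, 208)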

III. Training

The two networks are simply trained separately.

The loss of the detection network is computed as follows:

def count_loss(self, predict, target):
    condition_positive = target[:, :, :, 0] == 1  # select cells by label
    condition_negative = target[:, :, :, 0] == 0
    predict_positive = predict[condition_positive]
    predict_negative = predict[condition_negative]
    target_positive = target[condition_positive]
    target_negative = target[condition_negative]
    n, v = predict_positive.shape
    # classification loss: channels 0:2 are the object / no-object logits
    if n > 0:
        loss_c_positive = self.c_loss(predict_positive[:, 0:2], target_positive[:, 0].long())
    else:
        loss_c_positive = 0
    loss_c_negative = self.c_loss(predict_negative[:, 0:2], target_negative[:, 0].long())
    loss_c = loss_c_negative + loss_c_positive
    if n > 0:
        # channels 2:8 hold the six affine parameters
        affine = predict_positive[:, 2:8]
        trans_m = affine.reshape(-1, 2, 3)
        # corners of the unit square in homogeneous coordinates, shape (3, 4)
        unit = torch.tensor([[-0.5, -0.5, 1], [0.5, -0.5, 1], [0.5, 0.5, 1], [-0.5, 0.5, 1]]).transpose(0, 1).to(
            trans_m.device).float()
        point_pred = torch.einsum('n j k, k d -> n j d', trans_m, unit)  # shape (n, 2, 4)
        point_pred = rearrange(point_pred, 'n j k -> n (j k)')  # x1..x4 followed by y1..y4
        loss_p = self.l1_loss(point_pred, target_positive[:, 1:])
    else:
        loss_p = 0
    return loss_c, loss_p

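The detection network outputs an affine transformation matrix, but the plate position labels are the positions of the four corner points, so the prediction must be converted accordingly before the loss is computed. Whether a cell contains an object is trained with CrossEntropyLoss, while the plate position loss uses L1Loss.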
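The conversion from six affine parameters to four corner points can be written out without tensors. The sketch below is a pure-Python illustration with hypothetical names: it applies a 2×3 matrix to the corners of the unit square centered at the origin and returns the coordinates in the `x1..x4, y1..y4` layout that the loss compares against.

```python
UNIT_SQUARE = [(-0.5, -0.5), (0.5, -0.5), (0.5, 0.5), (-0.5, 0.5)]


def affine_to_corners(params):
    """Apply a 2x3 affine matrix (row-major, 6 values) to the unit-square corners.

    Returns [x1, x2, x3, x4, y1, y2, y3, y4].
    """
    a, b, tx, c, d, ty = params
    xs = [a * u + b * v + tx for u, v in UNIT_SQUARE]
    ys = [c * u + d * v + ty for u, v in UNIT_SQUARE]
    return xs + ys


# Scale the unit square to 4 x 2 units and move its center to (3, 1).
corners = affine_to_corners([4.0, 0.0, 3.0, 0.0, 2.0, 1.0])
print(corners)  # [1.0, 5.0, 5.0, 1.0, 0.0, 0.0, 2.0, 2.0]
```

Non-zero off-diagonal terms (`b` and `c` here) shear or rotate the square, which is how the network can describe tilted plates.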

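IV. Inference

1. Inference with the detection network

Inference follows the usual procedure for a detection network, with one extra step that converts the affine transformation matrix into box corner positions.

In addition, one extra operation is applied when the vehicle image detected by YOLO is passed to this stage for plate detection. As the code below shows, the image inside the vehicle detection box is cropped out and then resized so that both its height and width are integer multiples of 16.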

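h, w, c = image.shape
# scale factor: the short side becomes 288 * aspect ratio, capped at 608
f = min(288 * max(h, w) / min(h, w), 608) / min(h, w)
# scale each side, then round it up to the next multiple of 16
_w = int(w * f)
_w += (0 if _w % 16 == 0 else 16 - _w % 16)
_h = int(h * f)
_h += (0 if _h % 16 == 0 else 16 - _h % 16)
image = cv2.resize(image, (_w, _h), interpolation=cv2.INTER_AREA)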

$$f = \frac{\min\left(288 \cdot \dfrac{\max(h, w)}{\min(h, w)},\ 608\right)}{\min(h, w)}$$
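A worked example may help with the scale factor. The sketch below computes the target size for two hypothetical vehicle crops, rounding each scaled side up to a multiple of 16. The function and parameter names are illustrative, and the rounding is applied to the scaled size, which is what the stated goal of 16-aligned sides requires.

```python
def resize_shape(h, w, step=16, short_side=288, max_side=608):
    """Scaled (height, width), each rounded up to a multiple of `step`."""
    f = min(short_side * max(h, w) / min(h, w), max_side) / min(h, w)
    new_h, new_w = int(h * f), int(w * f)
    new_h += -new_h % step   # distance to the next multiple of step (0 if already aligned)
    new_w += -new_w % step
    return new_h, new_w


print(resize_shape(250, 333))  # (384, 512)
print(resize_shape(100, 400))  # (608, 2432)
```

In the first case the short side grows to about 383.6 pixels and is rounded up to 384; in the second the cap of 608 applies because 288 times the aspect ratio of 4 would exceed it.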

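2. Inference with the sequence recognition network

The sequence output by the network only needs to be deduplicated. For example, when the separator (blank) symbol is "*":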

def deduplication(self, c):
    """Collapse repeated symbols and drop the '*' separator."""
    temp = ''
    new = ''
    for i in c:
        if i == temp:
            continue
        else:
            if i == '*':
                temp = i
                continue
            new += i
            temp = i
    return new
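The same collapse rule can be applied directly to the per-step class indices (the argmax of the network output at each sequence position) before they are mapped to characters. The following is a minimal standalone sketch; the alphabet and the index sequence are hypothetical.

```python
def ctc_greedy_decode(indices, alphabet, blank=0):
    """Collapse consecutive repeats, drop blanks, and map the rest to characters."""
    chars = []
    previous = None
    for idx in indices:
        if idx != previous and idx != blank:
            chars.append(alphabet[idx])
        previous = idx
    return "".join(chars)


# Hypothetical alphabet; index 0 is the blank.
alphabet = ["*", "A", "B", "1", "2"]
# Index form of the raw output "A*AAB*11*2".
steps = [1, 0, 1, 1, 2, 0, 3, 3, 0, 4]
decoded = ctc_greedy_decode(steps, alphabet)
print(decoded)  # AAB12
```

The blank between the first two `A` steps is what lets a genuinely doubled character survive, while the repeated `1` without a blank in between collapses to a single character.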

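V. Complete code

/HibikiJie/LicensePlate

The YOLO part is not included. The files include a test image that can be used for testing. To use the full pipeline, be sure to add a vehicle detection model and its code yourself.

Weight file:

Link: /s/1r1ymtv0RHG87O4Yut1oUiQ

Extraction code: 6yoj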

