License Plate Detection and Recognition with Deep Learning (PyTorch, ResNet + Transformer)
Overview
This project implements license plate recognition with deep learning. Vehicle detection is handled directly by YOLO; the networks described here then locate the plate and recognize the plate number.
The plate detection network is based on ResNet18 and outputs the affine transformation matrix of the detection box, so it can detect quadrilaterals of arbitrary shape.
The plate-number sequence model uses ResNet18 + Transformer and outputs the character sequence directly.
For data, plate detection uses the CCPD 2019 dataset; while training the detection model, synthetic plates are generated programmatically and pasted onto the dataset images to strengthen the detector.
Plate-number sequence recognition is trained purely on programmatically generated plate images, with suitable image augmentation. Training is end-to-end: the input is an image, the output is the plate-number sequence, and the loss is CTCLoss.
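At a high level, inference chains the three stages together. The sketch below only illustrates that flow; the three stage callables are placeholders, not APIs from this repository:

def recognize_plates(frame, detect_vehicles, detect_plate, read_plate):
    """Chain the three stages; the callables stand in for YOLO vehicle detection,
    WpodNet plate detection, and OcrNet sequence recognition."""
    plate_numbers = []
    for car_crop in detect_vehicles(frame):      # stage 1: vehicle boxes -> crops
        plate_crop = detect_plate(car_crop)      # stage 2: locate and warp the plate
        if plate_crop is not None:
            plate_numbers.append(read_plate(plate_crop))  # stage 3: sequence recognition + CTC decoding
    return plate_numbers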
I. Network Models
1. Plate detection network
The network is defined as follows:
import torch
from torch import nn
from torchvision.models import resnet18
from einops import rearrange

class WpodNet(nn.Module):
    def __init__(self):
        """
        Plate detection network: ResNet18 with only the output layer changed.
        """
        super(WpodNet, self).__init__()
        resnet = resnet18(True)  # pretrained ResNet18
        backbone = list(resnet.children())
        self.backbone = nn.Sequential(
            nn.BatchNorm2d(3),
            *backbone[:3],
            *backbone[4:8],
        )
        self.detection = nn.Conv2d(512, 8, 3, 1, 1)  # 8 outputs per cell: 2 objectness logits + 6 affine parameters

    def forward(self, x):
        features = self.backbone(x)
        out = self.detection(features)
        out = rearrange(out, 'n c h w -> n h w c')  # reshape to N,H,W,C
        return out
This network effectively divides the image into a grid of cells: each 16×16 cell predicts whether it contains a plate, and outputs the affine transformation matrix of that plate's bounding quadrilateral.
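As a quick shape check (a minimal sketch; the 208×208 input matches the crop size used by the detection data loader below):

net = WpodNet()
x = torch.randn(1, 3, 208, 208)
print(net(x).shape)  # torch.Size([1, 13, 13, 8]): one 8-dim prediction per 16x16 cell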
2. Plate-number sequence recognition network
The backbone of the sequence recognizer is ResNet18 + Transformer: ResNet18 encodes the image, and the Transformer decodes the features into the corresponding characters.
The network is defined as follows:
from torch import nn
from torchvision.models import resnet18
import torch
from einops import rearrange

class OcrNet(nn.Module):
    def __init__(self, num_class):
        super(OcrNet, self).__init__()
        resnet = resnet18(True)
        backbone = list(resnet.children())
        self.backbone = nn.Sequential(
            nn.BatchNorm2d(3),
            *backbone[:3],
            *backbone[4:8],
        )  # ResNet18 encoder
        self.decoder = nn.Sequential(
            Block(512, 8, False),
            Block(512, 8, False),
        )  # decoder built from Transformer blocks
        self.out_layer = nn.Linear(512, num_class)  # linear output layer
        self.abs_pos_emb = AbsPosEmb((3, 9), 512)  # absolute position embedding

    def forward(self, x):
        x = self.backbone(x)
        x = rearrange(x, 'n c h w -> n (w h) c')
        x = x + self.abs_pos_emb()
        x = self.decoder(x)
        x = rearrange(x, 'n s v -> s n v')
        return self.out_layer(x)
The Block class is defined as follows:
class Block(nn.Module):
    r"""
    Args:
        embed_dim: feature size of the token embeddings.
        num_head: number of attention heads.
        is_mask: whether to apply a causal mask. If True, each position can only
            attend to the content before it, not after it.
    Shape:
        - Input: N, S, V (batch, sequence length, embedding size)
        - Output: same shape as the input
    Examples::
        >>> m = Block(720, 12, False)
        >>> x = torch.randn(4, 13, 720)
        >>> output = m(x)
        >>> print(output.shape)
        torch.Size([4, 13, 720])
    """
    def __init__(self, embed_dim, num_head, is_mask):
        super(Block, self).__init__()
        self.ln_1 = nn.LayerNorm(embed_dim)
        self.attention = SelfAttention(embed_dim, num_head, is_mask)
        self.ln_2 = nn.LayerNorm(embed_dim)
        self.feed_forward = nn.Sequential(
            nn.Linear(embed_dim, embed_dim * 6),
            nn.ReLU(),
            nn.Linear(embed_dim * 6, embed_dim)
        )

    def forward(self, x):
        # multi-head self-attention
        attention = self.attention(self.ln_1(x))
        # residual connection
        x = attention + x
        x = self.ln_2(x)
        # feed-forward part
        h = self.feed_forward(x)
        x = h + x  # another residual connection
        return x
The position embedding is defined as follows:
class AbsPosEmb(nn.Module):
    def __init__(
        self,
        fmap_size,
        dim_head
    ):
        super().__init__()
        height, width = fmap_size
        scale = dim_head ** -0.5
        self.height = nn.Parameter(torch.randn(height, dim_head) * scale)
        self.width = nn.Parameter(torch.randn(width, dim_head) * scale)

    def forward(self):
        emb = rearrange(self.height, 'h d -> h () d') + rearrange(self.width, 'w d -> () w d')
        emb = rearrange(emb, 'h w d -> (w h) d')
        return emb
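The embedding is one vector per feature-map position, flattened in the same '(w h)' order used in OcrNet.forward, so it broadcasts over the batch when added:

pos = AbsPosEmb((3, 9), 512)
print(pos().shape)  # torch.Size([27, 512]), added to features of shape (N, 27, 512)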
The self-attention used by the Block class is defined as follows:
class SelfAttention(nn.Module):
    r"""Multi-head self-attention
    Args:
        embed_dim: feature size of the token embeddings.
        num_head: number of attention heads.
        is_mask: whether to apply a causal mask. If True, each position can only
            attend to the content before it, not after it.
    Shape:
        - Input: N, S, V (batch, sequence length, embedding size)
        - Output: same shape as the input
    Examples::
        >>> m = SelfAttention(720, 12)
        >>> x = torch.randn(4, 13, 720)
        >>> output = m(x)
        >>> print(output.shape)
        torch.Size([4, 13, 720])
    """
    def __init__(self, embed_dim, num_head, is_mask=True):
        super(SelfAttention, self).__init__()
        assert embed_dim % num_head == 0
        self.num_head = num_head
        self.is_mask = is_mask
        self.linear1 = nn.Linear(embed_dim, 3 * embed_dim)
        self.linear2 = nn.Linear(embed_dim, embed_dim)

    def forward(self, x):
        # x has shape N,S,V
        x = self.linear1(x)  # shape becomes N,S,3V
        n, s, v = x.shape
        # split out the heads, shape becomes N,S,H,3V/H
        x = x.reshape(n, s, self.num_head, -1)
        # swap axes, shape becomes N,H,S,3V/H
        x = torch.transpose(x, 1, 2)
        # split out Q, K, V
        query, key, value = torch.chunk(x, 3, -1)
        dk = value.shape[-1] ** 0.5
        # compute scaled dot-product attention scores
        w = torch.matmul(query, key.transpose(-1, -2)) / dk  # w has shape N,H,S,S
        if self.is_mask:
            # build the causal mask
            mask = torch.tril(torch.ones(w.shape[-1], w.shape[-1])).to(w.device)
            w = w * mask - 1e10 * (1 - mask)
        w = torch.softmax(w, dim=-1)  # softmax normalization
        attention = torch.matmul(w, value)  # merge the values by attention score, shape N,H,S,V/H
        # swap axes back to N,S,H,V/H
        attention = attention.permute(0, 2, 1, 3)
        n, s, h, v = attention.shape
        # merge H and V, i.e. concatenate the heads; shape becomes N,S,V
        attention = attention.reshape(n, s, h * v)
        return self.linear2(attention)  # final linear projection
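With all the pieces in place, a minimal end-to-end shape check (assuming a 48×144 plate crop, which the ResNet backbone downsamples by 16 to the 3×9 feature map expected by AbsPosEmb((3, 9), 512); num_class = 70 is just a placeholder for the size of the character set plus the blank):

net = OcrNet(num_class=70)
x = torch.randn(2, 3, 48, 144)
print(net(x).shape)  # torch.Size([27, 2, 70]): (sequence, batch, classes), the layout CTCLoss expects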
II. Data Loading
1. Data loading for plate-number recognition
Plate numbers are generated programmatically and rendered as plate images, which then go through data augmentation. The augmentation mainly includes:
random smudging;
Gaussian blur;
an affine transformation that pastes the plate onto a larger background image;
slight random jitter of the quadrilateral's four corner positions before the plate region is cropped back out.
The plate-number sequence recognition network is then trained directly on these images:
loss_func = nn.CTCLoss(blank=0, zero_infinity=True)
optimizer = torch.optim.Adam(self.net.parameters(), lr=0.00001)
The optimizer is Adam and the loss function is CTCLoss.
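A minimal sketch of one training step under these settings (the batch variables and the label encoding, with index 0 reserved as the CTC blank, are assumptions for illustration):

# images: N,3,48,144 batch of generated plate crops
# targets: 1-D tensor of encoded plate characters, all plates concatenated
# target_lengths: number of characters in each plate
predict = net(images)                    # shape S,N,num_class (here S = 27)
log_probs = predict.log_softmax(dim=-1)  # CTCLoss expects log-probabilities
input_lengths = torch.full((predict.shape[1],), predict.shape[0], dtype=torch.long)
loss = loss_func(log_probs, targets, input_lengths, target_lengths)
optimizer.zero_grad()
loss.backward()
optimizer.step()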
2. Data loading for plate detection
The data comes from the CCPD dataset. During loading, a generated plate is randomly pasted over the plate region of the original image, to strengthen the network's plate detection ability:
if random.random() < 0.5:
    plate, _ = self.draw()
    plate = cv2.cvtColor(plate, cv2.COLOR_RGB2BGR)
    plate = self.smudge(plate)  # random smudging
    image = enhance.apply_plate(image, points, plate)  # paste the plate image onto the data image
[x1, y1, x2, y2, x4, y4, x3, y3] = points
points = [x1, x2, x3, x4, y1, y2, y3, y4]
image, pts = enhance.augment_detect(image, points, 208)
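The corner points themselves come from the CCPD annotations, which are encoded in each image's filename. A minimal parsing sketch (per the CCPD README, the fourth '-'-separated field holds the four vertices as 'x&y' pairs, starting from the right-bottom vertex; treat the exact field layout as an assumption to verify against your copy of the dataset):

def parse_ccpd_vertices(filename):
    """Extract the four plate corner points from a CCPD filename."""
    fields = filename.split('-')
    vertices = fields[3]  # e.g. '386&473_177&454_154&383_363&402'
    return [tuple(int(v) for v in pair.split('&')) for pair in vertices.split('_')]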
III. Training
The two networks are trained separately.
The detection network's loss is computed as follows:
def count_loss(self, predict, target):
    condition_positive = target[:, :, :, 0] == 1  # cells that contain a plate
    condition_negative = target[:, :, :, 0] == 0  # cells that do not
    predict_positive = predict[condition_positive]
    predict_negative = predict[condition_negative]
    target_positive = target[condition_positive]
    target_negative = target[condition_negative]
    n, v = predict_positive.shape
    # classification loss (self.c_loss is CrossEntropyLoss) on the objectness logits
    if n > 0:
        loss_c_positive = self.c_loss(predict_positive[:, 0:2], target_positive[:, 0].long())
    else:
        loss_c_positive = 0
    loss_c_negative = self.c_loss(predict_negative[:, 0:2], target_negative[:, 0].long())
    loss_c = loss_c_negative + loss_c_positive
    if n > 0:
        # the six affine parameters predicted by each positive cell
        affine = torch.cat(
            (
                predict_positive[:, 2:3],
                predict_positive[:, 3:4],
                predict_positive[:, 4:5],
                predict_positive[:, 5:6],
                predict_positive[:, 6:7],
                predict_positive[:, 7:8]
            ),
            dim=1
        )
        trans_m = affine.reshape(-1, 2, 3)
        # corners of a unit square, mapped through the affine matrix to the predicted plate corners
        unit = torch.tensor([[-0.5, -0.5, 1], [0.5, -0.5, 1], [0.5, 0.5, 1], [-0.5, 0.5, 1]]).transpose(0, 1).to(
            trans_m.device).float()
        point_pred = torch.einsum('n j k, k d -> n j d', trans_m, unit)
        point_pred = rearrange(point_pred, 'n j k -> n (j k)')
        # position loss (self.l1_loss is L1Loss) against the labelled corner points
        loss_p = self.l1_loss(point_pred, target_positive[:, 1:])
    else:
        loss_p = 0
    return loss_c, loss_p
The detection network outputs an affine transformation matrix, but the plate-position labels are given as four corner points, so the prediction has to be converted accordingly before the loss is computed. Whether a cell contains a target is scored with CrossEntropyLoss, while the plate-position loss uses L1Loss.
IV. Inference
1. Detection network inference
Inference follows the usual procedure for a detection network, with one extra step: converting the predicted affine transformation matrix into the corner positions of the plate box.
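The conversion mirrors the one used in the loss above: each positive cell's six regression outputs form a 2×3 affine matrix that maps the corners of a unit square to the plate's corners in cell-relative coordinates. A minimal sketch (confidence thresholding and the mapping back to absolute pixel coordinates are omitted):

def affine_to_corners(cell_pred):
    """cell_pred: one cell's 8-dim prediction; indices 2:8 hold the affine parameters."""
    trans_m = cell_pred[2:8].reshape(2, 3)
    unit = torch.tensor([[-0.5, -0.5, 1.],
                         [0.5, -0.5, 1.],
                         [0.5, 0.5, 1.],
                         [-0.5, 0.5, 1.]]).transpose(0, 1)  # shape 3 x 4
    corners = trans_m @ unit        # shape 2 x 4: x and y of the four corners
    return corners.transpose(0, 1)  # shape 4 x 2, cell-relative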
In addition, when a vehicle crop detected by YOLO is passed into this stage for plate detection, one extra preprocessing step is applied. As shown below, the vehicle detection box is cropped out and resized so that both width and height are integer multiples of 16:
h, w, c = image.shape
f = min(288 * max(h, w) / min(h, w), 608) / min(h, w)
_w = int(w * f) + (0 if int(w * f) % 16 == 0 else 16 - int(w * f) % 16)  # round the scaled width up to a multiple of 16
_h = int(h * f) + (0 if int(h * f) % 16 == 0 else 16 - int(h * f) % 16)  # same for the height
image = cv2.resize(image, (_w, _h), interpolation=cv2.INTER_AREA)
That is, the scale factor is f = min(288 × max(h, w) / min(h, w), 608) / min(h, w).
2. Sequence recognition network inference
The sequence output by the network only needs deduplication. For example, with '*' as the blank separator:
def deduplication(self, c):
    """collapse repeated symbols and drop the blank"""
    temp = ''
    new = ''
    for i in c:
        if i == temp:
            continue
        else:
            if i == '*':
                temp = i
                continue
            new += i
            temp = i
    return new
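For example, a raw per-frame argmax string (a made-up plate, with '*' as the blank) collapses to the final plate number; repeated characters merge unless a blank separates them:

deduplication('**沪沪*AAA*D*888*81*1')  # -> '沪AD8811'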
V. Full Code
/HibikiJie/LicensePlate
The YOLO part is not included. The repository contains test images that can be used to try it out; for full use, you must add the vehicle detection model and code yourself.
Weight files:
Link: /s/1r1ymtv0RHG87O4Yut1oUiQ
Extraction code: 6yoj