On Wine Label Image Data Augmentation Through Viewpoint Based Transformation

LI Xiaoqing; ZHANG Xiaochang; CAI Zijia; MA Jinwen

doi:10.16798/j.issn.1003-0530.2022.01.006

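1. Department of Information Science, School of Mathematical Sciences, Peking University, Beijing 100871, China
2. Institute of Military Medical Research, Beijing 100850, China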

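Funding:

National Key R&D Program of China (Ministry of Science and Technology), Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project, topic "Interpretability of Neural Networks", Grant No. 2018AAA0100205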

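Author biographies:
LI Xiaoqing, female, born in 1993 in Shandong. Ph.D. candidate in the Department of Information Science, School of Mathematical Sciences, Peking University. Her main research interests are image processing and pattern recognition. E-mail: xiaoqing_li@pku.edu.cn

ZHANG Xiaochang, male, born in 1996 in Henan. Master's student at the Institute of Military Medical Research, working mainly on bioinformatics. E-mail: zxcsms@pku.edu.cn

CAI Zijia, male, born in 1996 in Hebei. Master's student in the Department of Information Science, School of Mathematical Sciences, Peking University, working mainly on machine learning, pattern recognition, and natural language processing. E-mail: caizijia@pku.edu.cn

MA Jinwen, male, born in 1962 in Shaanxi. Professor and doctoral supervisor at the School of Mathematical Sciences, Peking University, and Vice Chair of the Signal Processing Society of the Chinese Institute of Electronics. His main research interests are intelligent information processing, neural computation, pattern recognition, and bioinformatics. E-mail: jwma@math.pku.edu.cn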

CLC number: TP183
Publication history
- Received: 2021-02-28
- Published: 2022-01-24


Abstract

With rising living standards and the growth of wine culture, an efficient automated wine label image retrieval system is becoming increasingly important. However, real wine label image datasets are typically class-imbalanced, and many classes contain only a few samples, which makes it difficult to train deep learning based retrieval models effectively. Data augmentation for wine label images is therefore both necessary and urgent. To address this problem, we propose a data augmentation algorithm designed specifically for wine label images. We model the wine label as lying on the surface of a cylindrical bottle and treat the wine label image as the projection of that label, from a viewpoint, onto a plane tangent to the cylinder. Under this model, the cylindrical label can be reconstructed from a single image, and new wine label images can be generated by moving the viewpoint up and down, left and right, or nearer and farther, and then re-projecting the cylindrical label. Experimental results on a large-scale wine label image dataset show that this viewpoint transformation based strategy effectively increases the number of essentially different images of the same wine label and significantly improves the retrieval performance of the wine label retrieval model.

Keywords:
- wine label image
- deep learning
- data augmentation
- viewpoint transformation
- cylinder modeling
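
The projection described in the abstract can be sketched as follows. This is an illustrative model only, under assumptions that are not taken from the paper: the bottle is a vertical cylinder of radius `r` centred on the origin, the image plane is the tangent plane `x = r`, and the viewpoint sits at `(d, 0, 0)` with `d > r`. The function names are hypothetical.

```python
import math


def project_to_tangent_plane(theta, height, r=1.0, d=5.0):
    """Project the cylinder point (angle theta, given height) onto the plane x = r.

    Assumed setup (not from the paper): viewpoint at (d, 0, 0) with d > r.
    Returns the (y, z) coordinates of the projected point on the tangent plane.
    """
    if d <= r:
        raise ValueError("viewpoint must lie outside the cylinder (d > r)")
    # Beyond the tangency angle the surface point faces away from the viewer.
    if math.cos(theta) < r / d:
        raise ValueError("point is hidden from this viewpoint")
    px, py = r * math.cos(theta), r * math.sin(theta)
    # Parameter t at which the ray V + t * (P - V) crosses the plane x = r.
    t = (d - r) / (d - px)
    return t * py, t * height


def angle_from_plane_coordinate(y, r=1.0, d=5.0):
    """Invert the horizontal part of the projection: plane coordinate y -> angle theta."""
    # Solve (d - r) * r * sin(theta) + y * r * cos(theta) = y * d
    # and keep the front-facing solution.
    amplitude = r * math.hypot(d - r, y)
    ratio = y * d / amplitude
    if abs(ratio) > 1.0:
        raise ValueError("ray misses the cylinder")
    return math.asin(ratio) - math.atan2(y, d - r)
```

Under these assumptions, recovering `theta` for every pixel column of a single image is what makes it possible to unroll the label from the cylinder and re-project it from a different viewpoint.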

Figure 1. Flow of the wine label image data augmentation algorithm based on viewpoint transformation

Figure 2. Extraction of the wine label region with the FCN model

Figure 3. Three-dimensional view of the spatial model for left-right viewpoint movement

Figure 4. Top view of the spatial model for left-right viewpoint movement
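
In a top view such as Figure 4, the bottle reduces to a circle and each viewing ray to a line, so a sideways move of the viewpoint changes which point of the circle each ray hits first. The sketch below is a generic ray-circle intersection, not the paper's derivation. The coordinate frame (circle `x^2 + y^2 = r^2`, tangent line `x = r`) and the function name are assumptions.

```python
import math


def first_hit_angle(viewpoint, plane_y, r=1.0):
    """Angle of the first intersection of a viewing ray with the circle x^2 + y^2 = r^2.

    viewpoint: (vx, vy) in the top view, assumed outside the circle with vx > r.
    plane_y:   where the ray crosses the tangent line x = r.
    Returns None when the ray misses the circle.
    """
    vx, vy = viewpoint
    dx, dy = r - vx, plane_y - vy  # ray direction: viewpoint -> point on the tangent line
    a = dx * dx + dy * dy
    b = 2.0 * (vx * dx + vy * dy)
    c = vx * vx + vy * vy - r * r
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None
    t = (-b - math.sqrt(disc)) / (2.0 * a)  # smaller root = intersection nearest the viewer
    return math.atan2(vy + t * dy, vx + t * dx)
```

Calling it with a shifted viewpoint, for example `first_hit_angle((5.0, 0.8), 0.2)`, gives the label angle seen through the same plane point after a lateral move.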

Figure 5. Flowchart of new image generation for left-right viewpoint movement
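
A generation step like the one in Figure 5 is commonly implemented by inverse warping: every pixel of the new image is traced back to a real-valued location in the source image and filled by interpolation. The sketch below uses bilinear interpolation on a grayscale array. Whether the paper uses exactly this scheme is an assumption, and `inverse_map` is a hypothetical callback standing in for the viewpoint geometry.

```python
import numpy as np


def bilinear_sample(image, x, y):
    """Sample a 2-D grayscale image at real-valued (x, y); locations outside read as 0 (black)."""
    h, w = image.shape
    if x < 0 or y < 0 or x > w - 1 or y > h - 1:
        return 0.0
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = (1 - fx) * image[y0, x0] + fx * image[y0, x1]
    bottom = (1 - fx) * image[y1, x0] + fx * image[y1, x1]
    return float((1 - fy) * top + fy * bottom)


def warp_image(image, inverse_map, out_shape):
    """Build a new image by pulling each output pixel (u, v) from inverse_map(u, v)."""
    out = np.zeros(out_shape, dtype=float)
    for v in range(out_shape[0]):
        for u in range(out_shape[1]):
            src_x, src_y = inverse_map(u, v)
            out[v, u] = bilinear_sample(image, src_x, src_y)
    return out
```

Output pixels whose source location falls outside the original image stay black in this sketch, which is one plausible origin of black borders in generated images.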

Figure 6. Three-dimensional view of the spatial model for near-far viewpoint movement

Figure 7. Top view of the spatial model for near-far viewpoint movement
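
For the near-far case, a simple consequence of the cylinder model is that the visible arc of the label grows as the viewpoint moves away. The sketch below assumes a viewpoint at distance `d` from the axis of a cylinder of radius `r`, with the image on the tangent plane nearest the viewer. These symbols and function names are illustrative and not taken from the paper.

```python
import math


def visible_half_angle(d, r=1.0):
    """Half-angle (radians) of the arc visible from a viewpoint at distance d from the axis."""
    if d <= r:
        raise ValueError("viewpoint must be outside the cylinder")
    # The limiting rays are tangent to the circle, where cos(theta) = r / d.
    return math.acos(r / d)


def projected_half_width(d, r=1.0):
    """Half-width of the visible arc after projection onto the tangent plane."""
    if d <= r:
        raise ValueError("viewpoint must be outside the cylinder")
    # Obtained by projecting the tangency point onto the plane x = r.
    return r * math.sqrt((d - r) / (d + r))
```

Both quantities increase with `d`, and the projected half-width approaches `r` as the viewpoint recedes, which matches the intuition that a distant camera sees almost half of the bottle.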

Figure 8. Three-dimensional view of the spatial model for up-down viewpoint movement

Figure 9. Top view of the spatial model for up-down viewpoint movement
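
When the viewpoint moves up or down, only the vertical coordinate of each projected point changes in the simple cylinder model. The sketch below assumes a viewpoint at `(d, 0, viewpoint_height)` and the tangent plane `x = r`. The parameter names are hypothetical and the paper's own formulation may differ.

```python
import math


def project_with_viewpoint_height(theta, height, viewpoint_height, r=1.0, d=5.0):
    """Project a cylinder point onto the plane x = r from a viewpoint raised to viewpoint_height."""
    px, py = r * math.cos(theta), r * math.sin(theta)
    # Same ray parameter as for a viewpoint at height 0: it depends only on the top view.
    t = (d - r) / (d - px)
    y = t * py
    # The vertical coordinate is interpolated between viewpoint and surface point.
    z = viewpoint_height + t * (height - viewpoint_height)
    return y, z
```

Points on the tangency line (`theta = 0`) do not move, while points further around the cylinder shift vertically by an amount that grows with their depth behind the plane, which bends horizontal label edges into curves.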

Figure 10. Removing the black edges from the generated wine label image and normalizing the result
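
The post-processing named in Figure 10 can be approximated with a bounding-box crop followed by a resize to a fixed size. The sketch below works on a 2-D grayscale array and uses nearest-neighbour resizing for brevity. The threshold, the target size, and both function names are assumptions, not the paper's settings.

```python
import numpy as np


def remove_black_border(image, threshold=0):
    """Crop away outer rows and columns whose pixels are all <= threshold."""
    content = image > threshold
    rows = np.where(content.any(axis=1))[0]
    cols = np.where(content.any(axis=0))[0]
    if rows.size == 0:
        raise ValueError("image is entirely black")
    return image[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]


def resize_nearest(image, out_h, out_w):
    """Resize a 2-D image to (out_h, out_w) with nearest-neighbour sampling."""
    h, w = image.shape[:2]
    row_idx = np.arange(out_h) * h // out_h
    col_idx = np.arange(out_w) * w // out_w
    return image[row_idx][:, col_idx]
```

A typical use would be `resize_nearest(remove_black_border(generated), 224, 224)`, where `224` is only a placeholder for whatever input size the retrieval model expects.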

Figure 11. Sample images from the wine label retrieval dataset

Figure 12. Data augmentation example for left-right viewpoint movement

Figure 13. Data augmentation example for near-far viewpoint movement

Figure 14. Data augmentation example for up-down viewpoint movement

Table 1. Main brand information of the wine label retrieval dataset

Samples per main brand	Number of main brands
1	5391
2	11464
3	7265
4	8653
5	5691
6	4717
7	2405
8	2644
9	2290
10	1789
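
The imbalance visible in Table 1 can be summarised with a few lines of arithmetic. The sketch below uses only the ten rows listed in the table, so main brands with more than 10 samples, if the dataset contains any, are not counted. The variable names are illustrative.

```python
# Rows of Table 1: samples per main brand -> number of main brands.
brand_counts = {
    1: 5391, 2: 11464, 3: 7265, 4: 8653, 5: 5691,
    6: 4717, 7: 2405, 8: 2644, 9: 2290, 10: 1789,
}

# Number of main brands covered by the listed rows.
total_brands = sum(brand_counts.values())

# Number of images those brands contribute.
total_images = sum(size * count for size, count in brand_counts.items())

# Share of the listed brands that have at most three samples.
small_brands = sum(count for size, count in brand_counts.items() if size <= 3)
share_small = small_brands / total_brands

print(total_brands, total_images, round(share_small, 3))
```

Within the listed rows, 52309 main brands account for 217970 images, and about 46% of those brands have three or fewer samples.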

Table 2. Evaluation results on the test set for wine label retrieval models trained with different data augmentation strategies

Data augmentation strategy	Main-brand accuracy
Data duplication	0.716
Flipping	0.692
Rotation	0.725
Gaussian blur	0.723
Added noise	0.731
Contrast and brightness change	0.735
Viewpoint transformation	0.757
DAGAN	0.719
Fast AutoAugment	0.744
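
To read Table 2 at a glance, the strategies can be ranked and compared against a reference row. The sketch below treats data duplication as the baseline, which is an assumption made here for illustration. The strategy labels are English renderings of the table rows.

```python
# Main-brand accuracy per augmentation strategy, copied from Table 2.
accuracy = {
    "Data duplication": 0.716,
    "Flipping": 0.692,
    "Rotation": 0.725,
    "Gaussian blur": 0.723,
    "Added noise": 0.731,
    "Contrast and brightness change": 0.735,
    "Viewpoint transformation": 0.757,
    "DAGAN": 0.719,
    "Fast AutoAugment": 0.744,
}

# Assumed baseline: plain duplication adds samples without adding variation.
baseline = accuracy["Data duplication"]

# Strategies sorted from highest to lowest accuracy.
ranked = sorted(accuracy.items(), key=lambda item: item[1], reverse=True)

# Absolute gain of each strategy over the baseline, rounded to three decimals.
gains = {name: round(value - baseline, 3) for name, value in accuracy.items()}

for name, value in ranked:
    print(f"{name}: {value:.3f} ({gains[name]:+.3f})")
```

Read this way, viewpoint transformation ranks first with a gain of 0.041 over duplication and 0.013 over Fast AutoAugment, the next best entry, while flipping is the only strategy that falls below the baseline.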

