Predicting Viewports for Multi-User Panoramic Streams Using an Attention Mechanism

Abstract: As immersive technologies such as virtual reality mature, the applications of panoramic video continue to expand. Panoramic video offers a realistic, immersive experience, but it also places heavy pressure on network bandwidth. Reducing its transmission bandwidth has therefore become a research focus, and viewport prediction is one of the most active topics in this area. Mainstream viewport prediction methods typically feed the viewer's viewpoint trajectory and the video content into a neural network and evaluate its output. Most existing methods perform poorly in long-term prediction and do not make full use of the data available in multi-user scenarios. To address this, this paper draws on the attention mechanism of the Transformer network and proposes a method for predicting viewports over a relatively long future horizon in multi-user scenarios. First, because different users watching the same video follow similar viewpoint trajectories, the paper proposes a multi-user viewport trajectory similarity comparison scheme that uses the target user's viewport trajectory data together with historical users' trajectory data to predict the target user's future viewport trajectory. Second, because panoramic video viewport trajectories are discontinuous, the paper applies a mapping to the discontinuous trajectories so that the trajectory data within a single prediction is continuous; in the experiments, this processing worked well on the dataset. Finally, the proposed method is compared experimentally with two recently proposed models that produce similar outputs. The results show lower error in terms of mean absolute error, Manhattan distance, and the angular distance error metric proposed in this paper, with some metrics reduced by more than 10%. This indicates that the proposed method achieves higher accuracy in long-term viewport prediction, and that introducing the attention mechanism and multi-user similarity comparison helps improve model performance and generalization ability.
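The abstract does not spell out the trajectory mapping or the angular distance metric, so the following is only a minimal sketch under stated assumptions. It assumes viewpoints are given as (longitude, latitude) in degrees with longitude wrapped to [-180, 180), that the discontinuity comes from this wrap-around, and that a great-circle angle can stand in for an angular error. The function names `unwrap_longitude`, `wrap_longitude`, and `angular_distance_deg` are hypothetical and are not taken from the paper.

```python
import math


def unwrap_longitude(lons_deg):
    """Turn a wrapped longitude sequence (degrees) into a continuous one.

    Assumption: consecutive samples never move more than 180 degrees,
    so the shortest signed step is always the true head movement.
    """
    if not lons_deg:
        return []
    out = [float(lons_deg[0])]
    for lon in lons_deg[1:]:
        # Shortest signed step from the previous sample, in [-180, 180).
        step = (lon - out[-1] + 180.0) % 360.0 - 180.0
        out.append(out[-1] + step)
    return out


def wrap_longitude(lon_deg):
    """Map any longitude back to the wrapped range [-180, 180)."""
    return (lon_deg + 180.0) % 360.0 - 180.0


def angular_distance_deg(lon1, lat1, lon2, lat2):
    """Great-circle angle (degrees) between two viewing directions.

    This is a stand-in for an angular error; the paper's own metric
    may be defined differently.
    """
    l1, p1, l2, p2 = map(math.radians, (lon1, lat1, lon2, lat2))
    # Dot product of the two unit direction vectors on the sphere.
    dot = (math.sin(p1) * math.sin(p2)
           + math.cos(p1) * math.cos(p2) * math.cos(l1 - l2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))


if __name__ == "__main__":
    wrapped = [170, 179, -175, -168]      # crosses the +/-180 seam
    continuous = unwrap_longitude(wrapped)
    print(continuous)
    print([wrap_longitude(x) for x in continuous])
    print(angular_distance_deg(179, 0, -179, 0))
```

In this sketch a model would be trained on the continuous sequence and its predictions wrapped back afterwards. Measuring error as an angle on the sphere avoids the artificial 358-degree gap that a plain difference of wrapped longitudes would report across the seam.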

     
