Parallel Flight Delay Prediction Model Based on Fusion of Meteorological Data

    Abstract: Aviation data are becoming increasingly high-dimensional and massive, and traditional models running on a single machine lack the computing resources to process them. To address this problem, this paper proposes a parallel flight delay prediction model based on Spark that integrates meteorological data. The model uses DataFrames to fuse flight data with meteorological data, appending weather data from different hours to each flight record. Feature splitting and tree generation in the random forest are then carried out in parallel, so that flight delays can be predicted quickly. Experimental results show that both recall and accuracy improve after meteorological data are integrated and that, when delays are predicted under different delay-time thresholds, prediction accuracy is higher for larger thresholds. In addition, the parallel model converges faster than the single-machine model and achieves a strong speedup.
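The fusion step described in the abstract can be illustrated with a small sketch. The abstract does not give the schema or the hours used, so the field names (`airport`, `dep_time`, `obs_time`, `wind_speed`, `visibility`) and the hour offsets below are hypothetical. The sketch is written in plain Python to show only the join logic; it is not the paper's Spark code, where the same idea would presumably be expressed as DataFrame joins keyed on airport and hour.

```python
from datetime import datetime, timedelta


def fuse_flight_with_weather(flights, weather, hour_offsets=(0, 1, 2)):
    """Append weather from several hours before departure to each flight.

    flights: list of dicts with 'airport' and 'dep_time' (datetime).
    weather: list of dicts with 'airport', 'obs_time' (datetime on the hour),
             'wind_speed' and 'visibility'.
    All field names are assumptions made for this illustration.
    """
    # Index observations by (airport, hour); this plays the role of the join key.
    index = {(w["airport"], w["obs_time"]): w for w in weather}

    fused = []
    for f in flights:
        row = dict(f)
        # Truncate the departure time to the hour so it matches hourly observations.
        dep_hour = f["dep_time"].replace(minute=0, second=0, microsecond=0)
        for h in hour_offsets:
            obs = index.get((f["airport"], dep_hour - timedelta(hours=h)))
            for name in ("wind_speed", "visibility"):
                # A missing observation yields None, like a left outer join.
                row[f"{name}_h{h}"] = obs[name] if obs else None
        fused.append(row)
    return fused


if __name__ == "__main__":
    flights = [
        {"flight": "F001", "airport": "PEK", "dep_time": datetime(2016, 1, 1, 9, 40)},
    ]
    weather = [
        {"airport": "PEK", "obs_time": datetime(2016, 1, 1, 9), "wind_speed": 5.0, "visibility": 8.0},
        {"airport": "PEK", "obs_time": datetime(2016, 1, 1, 8), "wind_speed": 3.0, "visibility": 10.0},
    ]
    for row in fuse_flight_with_weather(flights, weather):
        print(row)
```

Each fused row keeps the original flight fields and gains one set of weather columns per hour offset, which is the widened feature vector that a random forest would then be trained on.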

     
