Rethinking Automatic Exposure Control: A Physics-Aware Multi-Stream Framework with Semantic Guidance

王超; 谭旭东; 袁家康; 陈涛

doi:10.12466/xhcl.2026.04.010


Abstract

Abstract: Auto-exposure (AE) is a core front-end stage of an imaging system that directly determines how well image brightness is balanced and how accurately downstream high-level vision tasks perform. Existing techniques still face serious challenges: traditional rule-based algorithms are limited by the "semantic gap" and struggle with semantic ambiguity under complex lighting, while end-to-end deep learning approaches often act as "black boxes" without physical constraints and show marked temporal instability. To address these issues, this paper proposes PhysAEC (Physical Auto-Exposure Control), a physics-aware, white-box auto-exposure framework. Rather than treating AE as parameter regression or image enhancement, we recast its core challenge as a "multi-target luma prediction" task whose goal is to give the image signal processor (ISP) control loop optimal exposure anchors that are both semantically adaptive and physically interpretable. PhysAEC uses a three-stream decoupled architecture to fuse heterogeneous information: an RGB semantic stream extracts high-level scene priors to resolve semantic ambiguity in scenes such as backlighting, while a raw-domain spatial grid stream and a global histogram stream supply precise local intensity distributions and dynamic-range boundary constraints, respectively. To counter temporal oscillation during continuous inference, we further propose a tolerance-aware loss (TAL) that builds on the hysteresis characteristics of photometric control; by applying physical regularization at the level of the optimization objective, TAL suppresses parameter jumps triggered by minor fluctuations. Experiments on our Balanced-AE-Dataset of 10,000 high-quality samples show that PhysAEC reaches an overall prediction accuracy of 94.05% under standard conditions, with the mean absolute error (MAE) falling from 21.12 to 2.53. In complex high-dynamic-range scenes, the reconstructed images reach a peak signal-to-noise ratio (PSNR) of 38.98 dB and a structural similarity index (SSIM) of 0.994. These results indicate that the method unifies semantic understanding with robust physical control, establishing a new paradigm for low-level ISP control tasks.
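The abstract does not give the formula for the tolerance-aware loss, so the following is only a minimal sketch of one plausible reading: a dead-zone L1 loss in which luma errors inside a tolerance band contribute nothing. The function names, the `tolerance` value of 2.0, and the sample luma targets are all hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: an assumed dead-zone form of a tolerance-aware
# loss. The paper's actual TAL definition is not stated in the abstract.

def tolerance_aware_loss(predicted, target, tolerance=2.0):
    """Mean dead-zone L1 loss over several luma targets.

    Errors with magnitude <= tolerance cost nothing, so small fluctuations
    in the prediction do not push the controller to change exposure.
    The tolerance value is a hypothetical placeholder.
    """
    if len(predicted) != len(target) or not predicted:
        raise ValueError("predicted and target must be non-empty and equal in length")
    total = 0.0
    for p, t in zip(predicted, target):
        total += max(0.0, abs(p - t) - tolerance)
    return total / len(predicted)


def plain_l1_loss(predicted, target):
    """Ordinary mean absolute error, shown for comparison."""
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(predicted)


# Hypothetical multi-target luma values on an 8-bit scale.
targets = [118.0, 64.0, 200.0]
jittery = [119.0, 63.5, 201.5]   # every error lies inside the tolerance band
off = [130.0, 64.0, 200.0]       # the first target is missed by 12

jitter_tal = tolerance_aware_loss(jittery, targets)
jitter_l1 = plain_l1_loss(jittery, targets)
off_tal = tolerance_aware_loss(off, targets)
```

Under this assumed form, the jittery prediction incurs zero tolerance-aware loss while plain L1 still penalizes it, which is the behavior one would want if the goal is to avoid reacting to minor fluctuations. A large miss is still penalized, reduced only by the width of the band.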
