‍AN Ping, LIU Yiyao, ZHOU Min, et al. A lightweight method for human body and hand mesh reconstruction[J]. Journal of Signal Processing, 2024, 40(7): 1185-1196. DOI: 10.16798/j.issn.1003-0530.2024.07.001

A Lightweight Method for Human Body and Hand Mesh Reconstruction

3D human body reconstruction has substantial potential in domains such as film and television production and virtual reality. Prevailing reconstruction methods focus mainly on improving reconstruction accuracy and texture detail, and often require high-performance computing or sophisticated acquisition equipment; low-cost, lightweight reconstruction techniques have received little attention. To reduce the usage cost and hardware requirements of human body reconstruction, this study decouples the body and the hands on the basis of a parameterized human body model and designs a separate reconstruction network for each, tailored to the distinct movement characteristics of the body and hands, to balance computational cost against performance. Both the body and hand reconstruction modules adopt an encoder-decoder architecture. The encoder of the body reconstruction module has a two-stage design. Because it is difficult for a lightweight backbone network to extract adequate features directly from RGB images, the first stage uses Lite-HRNet and the Canny edge detector to derive heatmaps and edge maps, which serve as proxy representations of the RGB image; preliminary features are then obtained by downsampling and concatenating them. In the second stage, ShuffleNet extracts global features, with its activation function modified to improve performance. To reduce the parameter count while preserving reconstruction quality, low-dimensional MLPs estimate the parameters based on probability distributions.
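As an illustration of the first encoder stage, the following minimal sketch shows how joint heatmaps and an edge map might be brought to a common resolution and stacked channel-wise. It is not the authors' implementation: the function names `avg_pool` and `build_proxy_input`, the use of average pooling for downsampling, and the channel order are all assumptions, and plain nested lists stand in for tensors.

```python
def avg_pool(img, factor):
    """Downsample a 2D list by averaging non-overlapping factor x factor blocks."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(0, h - factor + 1, factor):
        row = []
        for x in range(0, w - factor + 1, factor):
            block = [img[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def build_proxy_input(heatmaps, edge_map):
    """Stack K heatmaps and one edge map into a (K + 1)-channel input.

    Assumption: the edge map is at image resolution and is pooled down
    to the (lower) heatmap resolution before concatenation.
    """
    target_h = len(heatmaps[0])
    factor = len(edge_map) // target_h
    if factor > 1:
        pooled_edge = avg_pool(edge_map, factor)
    else:
        pooled_edge = [row[:] for row in edge_map]
    # Check that the spatial sizes agree before concatenating channels.
    assert len(pooled_edge) == target_h
    assert len(pooled_edge[0]) == len(heatmaps[0][0])
    return list(heatmaps) + [pooled_edge]


if __name__ == "__main__":
    # Hypothetical toy data: two 2x2 heatmaps and one 4x4 binary edge map.
    heatmaps = [[[0.1, 0.2], [0.3, 0.4]], [[0.5, 0.6], [0.7, 0.8]]]
    edge = [[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0], [1, 1, 0, 0]]
    stacked = build_proxy_input(heatmaps, edge)
    print(len(stacked), "channels of size",
          len(stacked[0]), "x", len(stacked[0][0]))
```

In a real pipeline the edge map would come from a Canny detector and the heatmaps from a pose network; here both are hand-written toy inputs.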
Shape parameters are predicted by a single MLP based on a Gaussian distribution, and pose parameters are estimated sequentially, joint by joint, using cascaded low-dimensional MLPs based on the matrix Fisher distribution. The hand reconstruction branch is based on vertex regression: hand vertices are regressed first, and the parameters are then obtained from them. The encoder of the hand branch uses Lite-HRNet to produce multi-resolution feature branches. High-resolution, shallow features express fine detail better, whereas low-resolution features provide better global perception; to combine the two, we use interpolation for pose pooling and fuse the high- and low-resolution features. The decoder then uses a DSConv and upsampling network to derive the hand vertices. Shape parameters are estimated by an MLP from the hand vertices, and joint rotation parameters are derived from the vertex coordinates using inverse topology mathematics. Compared with existing methods, the proposed method substantially reduces parameter count and computation, with an overall parameter count of 6.12M and a computational load of 433M. On the Human3.6M dataset, the body reconstruction branch achieves an MPJPE of 86.7 mm, outperforming the classical method HMR (88.0 mm) with only 11.6% of HMR's parameters. The hand reconstruction branch achieves a reconstructed-mesh PA-MPJPE of 10.8 mm, surpassing regression-based full-body reconstruction methods such as ExPose and PIXIE with a parameter count of only 4.7% and 3.1% of theirs, respectively. Finally, the model is deployed on mobile devices using Android Studio and PyTorch Android, where inference takes 79.7 ms per frame (12.5 fps) on a Snapdragon 8 Gen 3, meeting the requirements of real-time inference applications.
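To make the lightweight claims more concrete, the sketch below works through two pieces of arithmetic. It assumes that "DSConv" denotes a depthwise separable convolution (a depthwise k x k convolution followed by a 1 x 1 pointwise convolution), which the abstract does not state explicitly; the layer sizes are hypothetical and not taken from the paper. It also converts the reported 79.7 ms latency into frames per second.

```python
def conv_params(kernel, c_in, c_out):
    """Weight count of a standard kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def dsconv_params(kernel, c_in, c_out):
    """Weight count of a depthwise separable convolution (bias ignored)."""
    depthwise = kernel * kernel * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out             # 1 x 1 convolution mixing channels
    return depthwise + pointwise


def fps_from_latency_ms(latency_ms):
    """Frames per second implied by a per-frame latency in milliseconds."""
    return 1000.0 / latency_ms


if __name__ == "__main__":
    k, c_in, c_out = 3, 64, 128  # hypothetical layer, not from the paper
    full = conv_params(k, c_in, c_out)
    separable = dsconv_params(k, c_in, c_out)
    print(f"standard conv weights:  {full}")
    print(f"separable conv weights: {separable}")
    print(f"ratio: {separable / full:.3f}")
    print(f"fps at 79.7 ms per frame: {fps_from_latency_ms(79.7):.1f}")
```

For a 3 x 3 kernel, the separable form needs roughly 1/c_out + 1/9 of the weights of the standard convolution, which is why such layers are a common choice in lightweight decoders.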