TL;DR. A Transformer that estimates 3D Gaussian splatting from two 360° panorama (omnidirectional) images,
using the quasi-uniform, structured Yin-Yang grid for stable attention.
The overall process of OmniSplat.
The two reference omnidirectional images are decomposed into Yin-Yang images, and cross-view attention is performed across the grids along epipolar lines to build a cost volume.
The 3DGS parameters are then estimated, and Yin-Yang images are rasterized from the novel view.
Finally, the two rasterized images are combined to synthesize the final omnidirectional image.
In the cross-view attention, we mark red and yellow points and their corresponding sphere-sweep curves in the same color.
Each image performs cross-attention to the Yin-Yang images of the other view, following geometric constraints.
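To make the sphere-sweep matching in the caption concrete, below is a minimal sketch of a sphere-sweep cost volume between two panoramic feature maps. It is a simplified stand-in, not the paper's method: it operates on plain equirectangular features rather than Yin-Yang patches, assumes both cameras share the world orientation, and uses a scaled dot-product cost instead of learned cross-attention; all names and conventions are illustrative assumptions.

```python
import math
import torch
import torch.nn.functional as F

def sphere_sweep_cost_volume(feat_a, feat_b, c_a, c_b, depths):
    """Toy sphere-sweep cost volume between two equirectangular feature maps.

    feat_a, feat_b: (C, H, W) feature maps (shared world orientation assumed).
    c_a, c_b: (3,) camera centers. depths: iterable of candidate depths.
    Returns a (D, H, W) matching-cost volume.
    """
    C, H, W = feat_a.shape
    # Ray directions for every pixel of view A (ERP parameterization).
    lat = torch.linspace(math.pi / 2, -math.pi / 2, H)
    lon = torch.linspace(-math.pi, math.pi, W)
    lat, lon = torch.meshgrid(lat, lon, indexing="ij")
    rays = torch.stack([torch.cos(lat) * torch.cos(lon),
                        torch.cos(lat) * torch.sin(lon),
                        torch.sin(lat)], dim=-1)               # (H, W, 3)

    costs = []
    for d in depths:
        p = c_a + d * rays                                     # points on the sweep sphere
        v = p - c_b
        v = v / v.norm(dim=-1, keepdim=True).clamp_min(1e-8)   # direction seen from view B
        lat_b = torch.asin(v[..., 2].clamp(-1.0, 1.0))
        lon_b = torch.atan2(v[..., 1], v[..., 0])
        # Normalized ERP coordinates for grid_sample (seam wrap-around ignored here).
        grid = torch.stack([lon_b / math.pi,
                            -lat_b / (math.pi / 2)], dim=-1).unsqueeze(0)
        fb = F.grid_sample(feat_b.unsqueeze(0), grid,
                           align_corners=True, padding_mode="border")[0]
        costs.append((feat_a * fb).sum(dim=0) / C ** 0.5)      # scaled dot-product cost
    return torch.stack(costs, dim=0)                           # (D, H, W)
```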
Feed-forward 3D Gaussian Splatting (3DGS) models have gained significant popularity because they can generate scenes immediately, without per-scene optimization. Although omnidirectional images are becoming more popular, since they capture a holistic scene without the computational cost of stitching many perspective images, existing feed-forward models are designed only for perspective images. The unique optical properties of omnidirectional images make it difficult for feature encoders to correctly understand the image context and make the Gaussians non-uniform in space, which degrades the quality of images synthesized from novel views. We propose OmniSplat, a pioneering work on fast feed-forward 3DGS generation from a few omnidirectional images. We introduce the Yin-Yang grid and decompose images based on it to reduce the domain gap between omnidirectional and perspective images. The Yin-Yang grid works with existing CNN structures as they are, and its quasi-uniform characteristic makes each decomposed image resemble a perspective image, so the network can exploit the strong prior knowledge of the learned feed-forward network. OmniSplat demonstrates higher reconstruction accuracy than existing feed-forward networks trained on perspective images. Furthermore, we enhance the segmentation consistency between omnidirectional images by leveraging attention from the encoder of OmniSplat, providing fast and clean 3DGS editing results.
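As an illustration of the Yin-Yang decomposition described above, the following is a minimal sketch that splits an equirectangular panorama into Yin and Yang patches. It assumes the standard Yin-Yang layout (|lat| ≤ π/4, |lon| ≤ 3π/4) and the involutive coordinate transform (x, y, z) → (-x, z, y); patch sizes and sampling conventions are illustrative and not the paper's exact settings.

```python
import math
import torch
import torch.nn.functional as F

def yinyang_decompose(erp, patch_hw=(256, 768)):
    """Split an equirectangular (ERP) panorama into Yin and Yang patches.

    erp: (C, H, W) tensor; rows span latitude [+pi/2, -pi/2], columns span
    longitude [-pi, +pi]. Returns two (C, h, w) patches.
    """
    h, w = patch_hw
    lat = torch.linspace(math.pi / 4, -math.pi / 4, h)           # patch latitudes
    lon = torch.linspace(-3 * math.pi / 4, 3 * math.pi / 4, w)   # patch longitudes
    lat, lon = torch.meshgrid(lat, lon, indexing="ij")

    def sample(lat_w, lon_w):
        # Map world (lat, lon) to normalized ERP coordinates for grid_sample.
        u = lon_w / math.pi                      # x in [-1, 1]
        v = -lat_w / (math.pi / 2)               # y in [-1, 1], top row = +pi/2
        grid = torch.stack([u, v], dim=-1).unsqueeze(0)
        return F.grid_sample(erp.unsqueeze(0), grid, align_corners=True,
                             padding_mode="border")[0]

    # Yin patch: its local spherical coordinates coincide with world coordinates.
    yin = sample(lat, lon)

    # Yang patch: local -> world via the involutive map (x, y, z) -> (-x, z, y).
    x = torch.cos(lat) * torch.cos(lon)
    y = torch.cos(lat) * torch.sin(lon)
    z = torch.sin(lat)
    xw, yw, zw = -x, z, y
    lat_w = torch.asin(zw.clamp(-1.0, 1.0))
    lon_w = torch.atan2(yw, xw)
    yang = sample(lat_w, lon_w)
    return yin, yang
```

Because each patch covers only low latitudes of its own rotated frame, its pixels are close to uniformly spaced on the sphere, which is why the decomposed images can be fed to a standard perspective-image encoder.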
@article{lee2024omnisplat,
title={OmniSplat: Taming Feed-Forward 3D Gaussian Splatting for Omnidirectional Images with Editable Capabilities},
author={Suyoung Lee and Jaeyoung Chung and Kihoon Kim and Jaeyoo Huh and Gunhee Lee and Minsoo Lee and Kyoung Mu Lee},
journal={arXiv preprint arXiv:2412.00000},
year={2024}
}