
Street View Synthesis with Gaussian Splatting and Diffusion Prior (Study Notes)

Driving simulation in street scenes plays a vital role in the development of autonomous driving systems. By building digital twins of urban streets, we can continuously augment an autonomous driving system with simulated data, significantly reducing its dependence on data collected in real scenes and making it possible to build a robust system at much lower cost in both time and money.

Early attempts at autonomous driving simulation [7,28,30] used computer graphics (CG) engines to render images. This not only requires a time-consuming process to reconstruct virtual scenes, but the rendered results also fall short in realism and fidelity. In recent years, neural rendering techniques for novel view synthesis (NVS), such as Neural Radiance Fields (NeRF) [18] and 3D Gaussian Splatting (3DGS) [12], have been introduced to synthesize photorealistic street views. Current research [9,17,20,24,33,37,41,45,53] mainly addresses two challenges of street view synthesis: reconstructing unbounded scenes and modeling dynamic objects. Block-NeRF [33] proposes partitioning the scene into multiple blocks to strengthen the model's ability to represent large-scale unbounded street scenes. NSG [20] and follow-up methods [37,41,43,45,53] model the static background and the dynamic foreground separately, achieving higher-quality background rendering while reducing motion blur on foreground vehicles.

Despite this exciting progress, existing work has not fully examined a key issue when evaluating reconstruction quality. An ideal scene simulation system should support high-quality free-viewpoint rendering. Current work typically evaluates on views captured from the vehicle that were held out during training (the red viewpoints in Fig. 1), while ignoring novel views that deviate substantially from the training views (the blue and green viewpoints in Fig. 1). On such novel views, the rendering quality of existing methods degrades noticeably, with blur and artifacts, as shown in Fig. 1. This problem stems from the inherently limited viewpoints of vehicle-collected images: training images are captured along the driving direction and concentrated around the vehicle's lane, and because the vehicle moves quickly, consecutive frames overlap only slightly, so objects in the scene are never observed from a comprehensive set of viewpoints. Street view synthesis for autonomous driving can therefore be understood as a reconstruction-from-sparse-views problem.

Prior neural rendering methods for NVS under sparse views fall into two main branches. The first [6,32,38,42,48] incorporates scene priors such as depth [6,25], normals [38], or features extracted by deep networks [48] to explicitly regularize model training. The second [16,21,29,31,40] tries to exploit pretrained diffusion models for NVS. These methods typically fine-tune a text-to-image diffusion model on large multi-view datasets [3,5,23,49], turning it into an image-to-image diffusion model conditioned on relative camera poses, and then use the diffusion model to regularize the training of a neural rendering model. However, there is a significant domain gap between these multi-view datasets [3,5,23,49] and street scenes, and relative camera poses alone are insufficient for learning the more complex geometric details of street scenes. To address this, we use 3D geometric information obtained from multi-modal data to control the diffusion model, which lets us fine-tune it directly on autonomous driving datasets without encoding relative camera poses.
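
To make the first branch concrete, here is a minimal sketch of an explicit depth-prior regularizer in PyTorch. It assumes the rendered depth and a sparse LiDAR depth map are already aligned in the same camera; the function name, tensor shapes, and the plain L1 form are illustrative assumptions, not the exact loss of any cited method.

```python
import torch

def depth_prior_loss(rendered_depth, lidar_depth, valid_mask):
    """L1 penalty between rendered depth and sparse LiDAR depth.

    rendered_depth : (H, W) depth rendered by the NVS model
    lidar_depth    : (H, W) depth from LiDAR points projected into the view
    valid_mask     : (H, W) 1.0 where a LiDAR return exists, else 0.0
    """
    # Supervise only pixels that actually received a LiDAR return;
    # elsewhere the prior carries no information.
    diff = (rendered_depth - lidar_depth).abs() * valid_mask
    return diff.sum() / valid_mask.sum().clamp(min=1.0)
```

A total training loss would add this term, scaled by a small weight, to the usual photometric loss on the training views.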

Building on this idea, this paper proposes a novel view synthesis method for street scenes based on 3D Gaussian Splatting and a fine-tuned diffusion-model prior. We first fine-tune a diffusion model on an autonomous driving dataset [14]. For each input image, we use its neighboring frames as the condition and depth information from the LiDAR point cloud as the control. The fine-tuned diffusion model then assists 3DGS training by providing priors for unseen views. On the KITTI [8] and KITTI-360 [14] datasets, our method is competitive with state-of-the-art (SOTA) methods [1,12,41] given dense training views and outperforms them in sparse-view settings. Notably, it maintains high-quality rendering even at viewpoints far from the training views. Moreover, because our method is applied only during training, it does not affect the real-time inference capability of 3DGS. Our model therefore offers efficient rendering and flexible viewpoint control for autonomous driving simulation systems.
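
Since the paragraph above compresses the entire pipeline into a few sentences, a short sketch may help. The PyTorch-style step below shows one plausible way a fine-tuned diffusion prior could supervise 3DGS at unseen viewpoints; `renderer`, `diffusion.refine`, the view/camera objects, and the weight `lambda_prior` are all hypothetical placeholders for illustration, not the paper's actual interfaces.

```python
import torch
import torch.nn.functional as F

def training_step(renderer, gaussians, diffusion, optimizer,
                  train_view, novel_camera, lambda_prior=0.1):
    """One hypothetical 3DGS optimization step with a diffusion prior.

    train_view   : a captured frame (image, camera, neighbor_image, lidar_depth)
    novel_camera : an unseen viewpoint, e.g. the training camera shifted off-lane
    """
    # 1) Standard photometric loss on the real, vehicle-captured view.
    rendered = renderer(gaussians, train_view.camera)
    loss = F.l1_loss(rendered, train_view.image)

    # 2) Diffusion prior on the unseen viewpoint: render it, then let the
    #    fine-tuned diffusion model, conditioned on a neighboring frame and
    #    controlled by a LiDAR depth map projected into the novel camera,
    #    produce a plausible clean image to serve as pseudo ground truth.
    novel_render = renderer(gaussians, novel_camera)
    with torch.no_grad():
        pseudo_gt = diffusion.refine(        # hypothetical API
            novel_render,
            condition=train_view.neighbor_image,
            control=train_view.lidar_depth,
        )
    loss = loss + lambda_prior * F.l1_loss(novel_render, pseudo_gt)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)
```

The key design point, consistent with the paper's claim, is that the diffusion model only generates supervision during training; inference remains plain real-time 3DGS rendering.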

In summary, our contributions are as follows:

– We propose a novel view synthesis framework for street scenes that increases the freedom of viewpoint control while preserving rendering efficiency, making it well suited to autonomous driving simulation.
– To the best of our knowledge, our method is the first to approach street view synthesis as a reconstruction-from-sparse-views problem, and it tackles this challenge by combining 3D Gaussian Splatting with a tailored diffusion model.
– We propose a new strategy for fine-tuning a diffusion model on autonomous driving datasets that endows it with novel view synthesis capability, removing the traditional reliance on multi-view datasets and relative camera poses.

References:

1. Barron, J.T., Mildenhall, B., Verbin, D., Srinivasan, P.P., Hedman, P.: Zip-NeRF: Anti-aliased grid-based neural radiance fields. ICCV (2023)
2. Chan, E.R., Nagano, K., Chan, M.A., Bergman, A.W., Park, J.J., Levy, A., Aittala, M., De Mello, S., Karras, T., Wetzstein, G.: Generative novel view synthesis with 3D-aware diffusion models. arXiv preprint arXiv:2304.02602 (2023)
3. Chang, A.X., Funkhouser, T., Guibas, L., Hanrahan, P., Huang, Q., Li, Z., Savarese, S., Savva, M., Song, S., Su, H., et al.: ShapeNet: An information-rich 3D model repository. arXiv preprint arXiv:1512.03012 (2015)
4. Chen, Y., Gu, C., Jiang, J., Zhu, X., Zhang, L.: Periodic vibration Gaussian: Dynamic urban scene reconstruction and real-time rendering. arXiv preprint arXiv:2311.18561 (2023)
5. Deitke, M., Schwenk, D., Salvador, J., Weihs, L., Michel, O., VanderBilt, E., Schmidt, L., Ehsani, K., Kembhavi, A., Farhadi, A.: Objaverse: A universe of annotated 3D objects. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 13142–13153 (2023)
6. Deng, K., Liu, A., Zhu, J.Y., Ramanan, D.: Depth-supervised NeRF: Fewer views and faster training for free. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 12882–12891 (2022)
7. Dosovitskiy, A., Ros, G., Codevilla, F., Lopez, A., Koltun, V.: CARLA: An open urban driving simulator. In: Conference on Robot Learning. pp. 1–16. PMLR (2017)
8. Geiger, A., Lenz, P., Stiller, C., Urtasun, R.: The KITTI vision benchmark suite. URL: http://www.cvlibs.net/datasets/kitti 2(5) (2015)
9. Guo, J., Deng, N., Li, X., Bai, Y., Shi, B., Wang, C., Ding, C., Wang, D., Li, Y.: StreetSurf: Extending multi-view implicit surface reconstruction to street views. arXiv preprint arXiv:2306.04988 (2023)
10. Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Advances in Neural Information Processing Systems 30 (2017)
11. Huang, T., Dong, B., Yang, Y., Huang, X., Lau, R.W., Ouyang, W., Zuo, W.: CLIP2Point: Transfer CLIP to point cloud classification with image-depth pre-training. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 22157–22167 (2023)
12. Kerbl, B., Kopanas, G., Leimkühler, T., Drettakis, G.: 3D Gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics 42(4) (2023)
13. Kwak, M.S., Song, J., Kim, S.: GeCoNeRF: Few-shot neural radiance fields via geometric consistency. arXiv preprint arXiv:2301.10941 (2023)
14. Liao, Y., Xie, J., Geiger, A.: KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2D and 3D. IEEE Transactions on Pattern Analysis and Machine Intelligence 45(3), 3292–3310 (2022)
15. Liu, J.Y., Chen, Y., Yang, Z., Wang, J., Manivasagam, S., Urtasun, R.: Real-time neural rasterization for large scenes. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 8416–8427 (2023)
16. Liu, R., Wu, R., Van Hoorick, B., Tokmakov, P., Zakharov, S., Vondrick, C.: Zero-1-to-3: Zero-shot one image to 3D object. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 9298–9309 (2023)
17. Lu, F., Xu, Y., Chen, G., Li, H., Lin, K.Y., Jiang, C.: Urban radiance field representation with deformable neural mesh primitives. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 465–476 (2023)
18. Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: NeRF: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM 65(1), 99–106 (2021)
19. Mittal, A., Moorthy, A.K., Bovik, A.C.: No-reference image quality assessment in the spatial domain. IEEE Transactions on Image Processing 21(12), 4695–4708 (2012)
20. Ost, J., Mannan, F., Thuerey, N., Knodt, J., Heide, F.: Neural scene graphs for dynamic scenes. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 2856–2865 (2021)
21. Poole, B., Jain, A., Barron, J.T., Mildenhall, B.: DreamFusion: Text-to-3D using 2D diffusion. In: The Eleventh International Conference on Learning Representations (2022)
22. Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al.: Learning transferable visual models from natural language supervision. In: International Conference on Machine Learning. pp. 8748–8763. PMLR (2021)
23. Reizenstein, J., Shapovalov, R., Henzler, P., Sbordone, L., Labatut, P., Novotny, D.: Common objects in 3D: Large-scale learning and evaluation of real-life 3D category reconstruction. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 10901–10911 (2021)
24. Rematas, K., Liu, A., Srinivasan, P.P., Barron, J.T., Tagliasacchi, A., Funkhouser, T., Ferrari, V.: Urban radiance fields. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 12932–12942 (2022)
25. Roessle, B., Barron, J.T., Mildenhall, B., Srinivasan, P.P., Nießner, M.: Dense depth priors for neural radiance fields from sparse input views. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 12892–12901 (2022)
26. Roessle, B., Müller, N., Porzi, L., Bulò, S.R., Kontschieder, P., Nießner, M.: GANeRF: Leveraging discriminators to optimize neural radiance fields. arXiv preprint arXiv:2306.06044 (2023)
27. Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 10684–10695 (2022)
28. Rong, G., Shin, B.H., Tabatabaee, H., Lu, Q., Lemke, S., Možeiko, M., Boise, E., Uhm, G., Gerow, M., Mehta, S., et al.: LGSVL Simulator: A high fidelity simulator for autonomous driving. In: 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC). pp. 1–6. IEEE (2020)
29. Sargent, K., Li, Z., Shah, T., Herrmann, C., Yu, H.X., Zhang, Y., Chan, E.R., Lagun, D., Fei-Fei, L., Sun, D., et al.: ZeroNVS: Zero-shot 360-degree view synthesis from a single real image. arXiv preprint arXiv:2310.17994 (2023)
30. Shah, S., Dey, D., Lovett, C., Kapoor, A.: AirSim: High-fidelity visual and physical simulation for autonomous vehicles. In: Field and Service Robotics: Results of the 11th International Conference. pp. 621–635. Springer (2018)
31. Shi, R., Chen, H., Zhang, Z., Liu, M., Xu, C., Wei, X., Chen, L., Zeng, C., Su, H.: Zero123++: A single image to consistent multi-view diffusion base model. arXiv preprint arXiv:2310.15110 (2023)
32. Somraj, N., Soundararajan, R.: ViP-NeRF: Visibility prior for sparse input neural radiance fields (August 2023). https://doi.org/10.1145/3588432.3591539
33. Tancik, M., Casser, V., Yan, X., Pradhan, S., Mildenhall, B., Srinivasan, P.P., Barron, J.T., Kretzschmar, H.: Block-NeRF: Scalable large scene neural view synthesis. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 8248–8258 (2022)
34. Tancik, M., Weber, E., Ng, E., Li, R., Yi, B., Kerr, J., Wang, T., Kristoffersen, A., Austin, J., Salahi, K., Ahuja, A., McAllister, D., Kanazawa, A.: Nerfstudio: A modular framework for neural radiance field development. In: ACM SIGGRAPH 2023 Conference Proceedings. SIGGRAPH '23 (2023)
35. Tonderski, A., Lindström, C., Hess, G., Ljungbergh, W., Svensson, L., Petersson, C.: NeuRAD: Neural rendering for autonomous driving. arXiv preprint arXiv:2311.15260 (2023)
36. Turki, H., Ramanan, D., Satyanarayanan, M.: Mega-NeRF: Scalable construction of large-scale NeRFs for virtual fly-throughs. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 12922–12931 (2022)
37. Turki, H., Zhang, J.Y., Ferroni, F., Ramanan, D.: SUDS: Scalable urban dynamic scenes. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 12375–12385 (2023)
38. Verbin, D., Hedman, P., Mildenhall, B., Zickler, T., Barron, J.T., Srinivasan, P.P.: Ref-NeRF: Structured view-dependent appearance for neural radiance fields. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 5481–5490. IEEE (2022)
39. Wang, G., Chen, Z., Loy, C.C., Liu, Z.: SparseNeRF: Distilling depth ranking for few-shot novel view synthesis. arXiv preprint arXiv:2303.16196 (2023)
40. Wu, R., Mildenhall, B., Henzler, P., Park, K., Gao, R., Watson, D., Srinivasan, P.P., Verbin, D., Barron, J.T., Poole, B., Holynski, A.: ReconFusion: 3D reconstruction with diffusion priors. arXiv (2023)
41. Wu, Z., Liu, T., Luo, L., Zhong, Z., Chen, J., Xiao, H., Hou, C., Lou, H., Chen, Y., Yang, R., et al.: MARS: An instance-aware, modular and realistic simulator for autonomous driving. In: CAAI International Conference on Artificial Intelligence. pp. 3–15. Springer (2023)
42. Wynn, J., Turmukhambetov, D.: DiffusioNeRF: Regularizing neural radiance fields with denoising diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 4180–4189 (2023)
43. Xie, Z., Zhang, J., Li, W., Zhang, F., Zhang, L.: S-NeRF: Neural radiance fields for street views. arXiv preprint arXiv:2303.00749 (2023)
44. Xiong, H., Muttukuru, S., Upadhyay, R., Chari, P., Kadambi, A.: SparseGS: Real-time 360° sparse view synthesis using Gaussian splatting. arXiv preprint arXiv:2312.00206 (2023)
45. Yan, Y., Lin, H., Zhou, C., Wang, W., Sun, H., Zhan, K., Lang, X., Zhou, X., Peng, S.: Street Gaussians for modeling dynamic urban scenes. arXiv preprint arXiv:2401.01339 (2024)
46. Yang, J., Pavone, M., Wang, Y.: FreeNeRF: Improving few-shot neural rendering with free frequency regularization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 8254–8263 (2023)
47. Yang, Z., Chen, Y., Wang, J., Manivasagam, S., Ma, W.C., Yang, A.J., Urtasun, R.: UniSim: A neural closed-loop sensor simulator. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 1389–1399 (2023)
48. Yu, A., Ye, V., Tancik, M., Kanazawa, A.: pixelNeRF: Neural radiance fields from one or few images. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 4578–4587 (2021)
49. Yu, X., Xu, M., Zhang, Y., Liu, H., Ye, C., Wu, Y., Yan, Z., Zhu, C., Xiong, Z., Liang, T., et al.: MVImgNet: A large-scale dataset of multi-view images. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 9150–9161 (2023)
50. Zhang, L., Rao, A., Agrawala, M.: Adding conditional control to text-to-image diffusion models. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 3836–3847 (2023)
51. Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 586–595 (2018)
52. Zhang, Y., Guo, X., Poggi, M., Zhu, Z., Huang, G., Mattoccia, S.: CompletionFormer: Depth completion with convolutions and vision transformers. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 18527–18536 (2023)
53. Zhou, X., Lin, Z., Shan, X., Wang, Y., Sun, D., Yang, M.H.: DrivingGaussian: Composite Gaussian splatting for surrounding dynamic autonomous driving scenes. arXiv preprint arXiv:2312.07920 (2023)
54. Zhou, Z., Tulsiani, S.: SparseFusion: Distilling view-conditioned diffusion for 3D reconstruction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 12588–12597 (2023)

