
[Paper Reading] [Skim] S-NERF: NEURAL RADIANCE FIELDS FOR STREET VIEWS

Table of Contents

    • 0. Abstract
    • 1. Introduction
    • 2. Related work
    • 3. Methods-NERF FOR STREET VIEWS
      • 3.1 CAMERA POSE PROCESSING
      • 3.2 REPRESENTATION OF STREET SCENES
      • 3.3 DEPTH SUPERVISION
      • 3.4 Loss function
    • 4. EXPERIMENTS
    • 5. Conclusion
    • Reference

0. Abstract

Problem introduction:

However, we conjecture that this paradigm does not fit the nature of the street views that are collected by many self-driving cars from large-scale unbounded scenes. Also, the onboard cameras perceive the scenes without much overlap.

Solutions:

  • Consider novel view synthesis of both the large-scale background scenes and the foreground moving vehicles jointly.

  • Improve the scene parameterization function and the camera poses.

  • We also use the noisy and sparse LiDAR points to boost the training, and learn a robust geometry and reprojection-based confidence to address the depth outliers.

  • Extend our S-NeRF for reconstructing moving vehicles

Effect:

Reduces 7~40% of the mean-squared error in street-view synthesis and achieves a 45% PSNR gain for moving-vehicle rendering.

1. Introduction

Overcome the shortcomings of the work of predecessors:

  • MipNeRF-360 (Barron et al., 2022) is designed for training on unbounded scenes, but it still needs enough intersected camera rays.

  • BlockNeRF (Tancik et al., 2022) proposes a block-combination strategy with refined poses, appearances, and exposure on the MipNeRF (Barron et al., 2021) base model for processing large-scale outdoor scenes. But it needs a special platform to collect the data, and it is hard to utilize existing datasets.

  • Urban-NeRF (Rematas et al., 2022) takes accurate dense LiDAR depth as supervision for the reconstruction of urban scenes. But accurate dense LiDAR is too expensive; S-NeRF only needs noisy and sparse LiDAR signals.

2. Related work

There is a large body of work on large-scale NeRFs and depth-supervised NeRFs.


This paper belongs to both of these lines of work.

3. Methods-NERF FOR STREET VIEWS

3.1 CAMERA POSE PROCESSING

The SfM pipeline used in previous NeRFs fails on street views. Therefore, two different methods are proposed to reconstruct the camera poses for the static background and the foreground moving vehicles.

  1. Background scenes

    For the static background, we use the camera parameters provided by the sensor-fusion SLAM and IMU of the self-driving cars (Caesar et al., 2019; Sun et al., 2020) and further reduce the inconsistency between multiple cameras with a learning-based pose refinement network.

  2. Moving vehicles

    We now transform the coordinate system by setting the target object’s center as the coordinate system’s origin.
    $$\hat{P}_i=(P_iP_b^{-1})^{-1}=P_bP_i^{-1},\qquad P^{-1}=\begin{bmatrix}R^T & -R^TT\\ \mathbf{0}^T & 1\end{bmatrix}.$$

    And $P=\begin{bmatrix}R & T\\ \mathbf{0}^T & 1\end{bmatrix}$ represents the old position of the camera or the target object.

    After the transformation, only the camera is moving, which is favorable for training NeRFs.

    The key point is how to convert the pose matrices into this object-centric frame; a small code sketch follows below.
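
As a minimal sketch of this transform (assuming 4x4 homogeneous camera-to-world pose `P_i` and object-to-world pose `P_b` expressed in the same world frame; the function names are illustrative, not from the paper):

```python
import numpy as np

def invert_pose(P: np.ndarray) -> np.ndarray:
    """Invert a 4x4 rigid pose [R T; 0 1] using [R^T  -R^T T; 0 1]."""
    R, T = P[:3, :3], P[:3, 3]
    P_inv = np.eye(4)
    P_inv[:3, :3] = R.T
    P_inv[:3, 3] = -R.T @ T
    return P_inv

def object_centric_camera_pose(P_i: np.ndarray, P_b: np.ndarray) -> np.ndarray:
    """Camera pose in the object's frame: P_hat_i = (P_i P_b^{-1})^{-1} = P_b P_i^{-1}."""
    return P_b @ invert_pose(P_i)
```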

3.2 REPRESENTATION OF STREET SCENES

  1. Background scenes

    constrain the whole scene into a bounded range


    This part follows the scene contraction of mip-NeRF 360 (a sketch is given after this list).

  2. Moving Vehicles

    Compute the dense depth maps for the moving cars as an extra supervision

    • We follow GeoSim (Chen et al., 2021b) to reconstruct coarse mesh from multi-view images and the sparse LiDAR points.

    • After that, a differentiable neural renderer (Liu et al., 2019) is used to render the corresponding depth map with the camera parameter (Section 3.2).

    • The backgrounds are masked during the training by an instance segmentation network (Wang et al., 2020).

    Note that this depth-rendering pipeline builds on three prior works.
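
The note only records that the background parameterization comes from mip-NeRF 360, so the following is a minimal sketch of that paper's contraction function, under the assumption that S-NeRF uses the same radius-2 contraction; function and argument names are illustrative:

```python
import numpy as np

def contract(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """mip-NeRF 360 style contraction: identity inside the unit ball,
    points outside are squashed into the shell between radius 1 and 2."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    safe_norm = np.maximum(norm, eps)
    contracted = (2.0 - 1.0 / safe_norm) * (x / safe_norm)
    return np.where(norm <= 1.0, x, contracted)
```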

3.3 DEPTH SUPERVISION


To provide credible depth supervision from defective LiDAR depths, we first propagate the sparse depths and then construct a confidence map to address the depth outliers (a combined sketch of the confidence terms is given at the end of this subsection).

  1. LiDAR depth completion

    Use NLSPN (Park et al., 2020) to propagate the depth information from LiDAR points to surrounding pixels.

  2. Reprojection confidence

    Measure the accuracy of the depths and locate the outliers.

    The warping operation can be represented as:

    $$\mathbf{X}_t=\psi(\psi^{-1}(\mathbf{X}_s,P_s),P_t)$$

    And we use three features to measure the similarity:

    $$\mathcal{C}_{\mathrm{rgb}}=1-|\mathcal{I}_{s}-\hat{\mathcal{I}}_{s}|,\quad\mathcal{C}_{\mathrm{ssim}}=\mathrm{SSIM}(\mathcal{I}_{s},\hat{\mathcal{I}}_{s}),\quad\mathcal{C}_{\mathrm{vgg}}=1-\|\mathcal{F}_{s}-\hat{\mathcal{F}}_{s}\|.$$

  3. Geometry confidence

    Measure the geometry consistency of the depths and flows across different views.

    The depth:

    $$\mathcal{C}_{\mathrm{depth}}=\gamma\left(|d_t-\hat{d}_t|/d_s\right),\qquad\gamma(x)=\begin{cases}0, & \text{if } x\geq\tau,\\ 1-x/\tau, & \text{otherwise}.\end{cases}$$

    The flow:

    $$\mathcal{C}_{\mathrm{flow}}=\gamma\left(\frac{\|\Delta_{x,y}-f_{s\rightarrow t}(x_{s},y_{s})\|}{\|\Delta_{x,y}\|}\right),\qquad\Delta_{x,y}=(x_{t}-x_{s},\,y_{t}-y_{s}).$$

  4. Learnable confidence combination

    The final confidence map can be learned as $\hat{\mathcal{C}}=\sum_{i}\omega_{i}\mathcal{C}_{i}$, where $\sum_i\omega_i=1$ and $i\in\{\mathrm{rgb},\mathrm{ssim},\mathrm{vgg},\mathrm{depth},\mathrm{flow}\}$.
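
A minimal sketch of the γ-based geometry confidences and the weighted combination. The thresholds `tau_depth` and `tau_flow`, the softmax parameterization of the learnable weights, and the tensor shapes are assumptions for illustration; the paper only constrains the weights to sum to 1:

```python
import numpy as np

def gamma(x: np.ndarray, tau: float) -> np.ndarray:
    """gamma(x) = 1 - x/tau for x < tau, and 0 for x >= tau."""
    return np.clip(1.0 - x / tau, 0.0, 1.0)

def geometry_confidence(d_t, d_t_hat, d_s, delta_xy, flow_s_to_t,
                        tau_depth=0.1, tau_flow=0.5, eps=1e-8):
    """Per-pixel depth / flow consistency scores between source and target views.

    d_t, d_t_hat, d_s: (H, W) target depth, warped depth, and source depth.
    delta_xy, flow_s_to_t: (H, W, 2) pixel displacement from warping vs. optical flow.
    """
    c_depth = gamma(np.abs(d_t - d_t_hat) / (d_s + eps), tau_depth)
    flow_err = np.linalg.norm(delta_xy - flow_s_to_t, axis=-1)
    c_flow = gamma(flow_err / (np.linalg.norm(delta_xy, axis=-1) + eps), tau_flow)
    return c_depth, c_flow

def combine_confidences(conf_maps: np.ndarray, logits: np.ndarray) -> np.ndarray:
    """Weighted sum of the five confidence maps; weights are softmax-normalized to sum to 1.

    conf_maps: (5, H, W) for {rgb, ssim, vgg, depth, flow}; logits: (5,), learnable in practice.
    """
    w = np.exp(logits - logits.max())
    w = w / w.sum()
    return np.tensordot(w, conf_maps, axes=1)  # (H, W)
```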

3.4 Loss function

An RGB loss, a depth loss, and an edge-aware smoothness constraint that penalizes large variances in depth:
$$\begin{aligned} \mathcal{L}_{\mathrm{color}}&=\sum_{\mathbf{r}\in\mathcal{R}}\|I(\mathbf{r})-\hat{I}(\mathbf{r})\|_{2}^{2} \\ \mathcal{L}_{\mathrm{depth}}&=\sum\hat{\mathcal{C}}\cdot|\mathcal{D}-\hat{\mathcal{D}}| \\ \mathcal{L}_{\mathrm{smooth}}&=\sum|\partial_{x}\hat{\mathcal{D}}|\,e^{-|\partial_{x}I|}+|\partial_{y}\hat{\mathcal{D}}|\,e^{-|\partial_{y}I|} \\ \mathcal{L}_{\mathrm{total}}&=\mathcal{L}_{\mathrm{color}}+\lambda_{1}\mathcal{L}_{\mathrm{depth}}+\lambda_{2}\mathcal{L}_{\mathrm{smooth}} \end{aligned}$$
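
A minimal PyTorch-style sketch of these three terms, assuming per-image tensors `rgb_pred`/`rgb_gt` of shape (H, W, 3) and `depth_pred`/`depth_gt`/`conf` of shape (H, W); the mean reductions and the caller-supplied weights `lambda1`, `lambda2` are simplifications, not the paper's exact settings:

```python
import torch

def snerf_total_loss(rgb_pred, rgb_gt, depth_pred, depth_gt, conf,
                     lambda1: float, lambda2: float) -> torch.Tensor:
    # RGB reconstruction: squared error over all rendered pixels (rays)
    l_color = ((rgb_pred - rgb_gt) ** 2).sum(dim=-1).mean()

    # Confidence-weighted L1 depth loss against the completed LiDAR depth
    l_depth = (conf * (depth_pred - depth_gt).abs()).mean()

    # Edge-aware smoothness: allow depth discontinuities only where the image has edges
    dx_d = (depth_pred[:, 1:] - depth_pred[:, :-1]).abs()
    dy_d = (depth_pred[1:, :] - depth_pred[:-1, :]).abs()
    dx_i = (rgb_gt[:, 1:] - rgb_gt[:, :-1]).abs().mean(dim=-1)
    dy_i = (rgb_gt[1:, :] - rgb_gt[:-1, :]).abs().mean(dim=-1)
    l_smooth = (dx_d * torch.exp(-dx_i)).mean() + (dy_d * torch.exp(-dy_i)).mean()

    return l_color + lambda1 * l_depth + lambda2 * l_smooth
```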

4. EXPERIMENTS

  1. Dataset: nuScenes and Waymo
  • For the foreground vehicles, we extract car crops from nuScenes and Waymo video sequences.
  • For the large-scale background scenes, we use scenes with 90∼180 images.
  • In each scene, the ego vehicle moves around 10∼40 meters, and the whole scene spans more than 200 m.
  2. Foreground Vehicles

    Separate experiments are run on static vehicles and moving vehicles, compared with the original NeRF and GeoSim (a recent non-NeRF car reconstruction method).


    There is still large room for improvement in PSNR.

  3. Background scenes

    Compared with the state-of-the-art methods Mip-NeRF (Barron et al., 2021), Urban-NeRF (Rematas et al., 2022), and Mip-NeRF 360.
    The paper also shows a 360-degree panorama to emphasize some details.

  4. BACKGROUND AND FOREGROUND FUSION
    Depth-guided placement and inpainting (e.g. GeoSim Chen et al. (2021b)) and joint NeRF rendering (e.g. GIRAFFE Niemeyer & Geiger (2021)) heavily rely on accurate depth maps and 3D geometry information.
    It seems a new fusion method is used here without a proper introduction?

  5. Ablation study

    The ablation splits the components into the RGB loss, the depth confidence, and the smoothness loss.

5. Conclusion

In the future, we will use block merging as proposed in Block-NeRF (Tancik et al., 2022) to learn a larger, city-level neural representation.

Reference

[1] S-NeRF (ziyang-xie.github.io)

