Multiview Neural Surface Reconstruction by Disentangling Geometry and Appearance

Jianfei Guo, published in paper_reading

2020-12-03, about 1820 words, estimated reading time 4 minutes

<IDR> Multiview neural surface reconstruction by disentangling geometry and appearance

NeurIPS2020 Advances in Neural Information Processing Systems

Lior Yariv, Yoni Kasten, Dror Moran, Meirav Galun, Matan Atzmon, Ronen Basri, Yaron Lipman

Weizmann Institute of Science

multi-view, unposed images, single masked object image, SDF


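Editor's note

Training requires multi-view, segmented (masked) images; known camera poses are not strictly required
IDR = implicit differentiable renderer
It has a flavor of combining SDF with NeRF, since the color is computed jointly from the position, the geometry parameters, and the viewing direction
The derivations are fairly detailed, because every quantity needs a derivative expression with respect to the camera parameters as well as the geometry parameters
The main baseline for comparison is DVR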

Motivation

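A differentiable renderer for SDFs, used to reconstruct 3D surfaces from multi-view images when the camera parameters are unknown
DVR is very similar to this paper, but DVR cannot handle general appearance, and it cannot handle unknown or very noisy camera positions
Advantages of SDF
- ray casting can be done efficiently with sphere tracing
- smooth, realistic surfaces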
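
As a rough illustration of the sphere-tracing point above, here is a minimal Python sketch that marches a ray against an analytic sphere SDF. The names `sphere_sdf` and `sphere_trace`, the step limit, and the tolerances are illustrative assumptions and are not taken from the IDR code.

```python
import math

def sphere_sdf(p, radius=1.0):
    """Signed distance from point p to a sphere centered at the origin."""
    return math.sqrt(sum(x * x for x in p)) - radius

def sphere_trace(sdf, c, v, max_steps=64, eps=1e-6, t_max=10.0):
    """March along the ray c + t*v (v unit length).

    Returns the hit distance t, or None if the ray leaves the region of interest.
    """
    t = 0.0
    for _ in range(max_steps):
        p = [ci + t * vi for ci, vi in zip(c, v)]
        d = sdf(p)
        if d < eps:
            return t
        t += d  # safe step: no surface is closer than d
        if t > t_max:
            return None
    return None

t_hit = sphere_trace(sphere_sdf, c=[0.0, 0.0, -3.0], v=[0.0, 0.0, 1.0])
t_miss = sphere_trace(sphere_sdf, c=[0.0, 2.0, -3.0], v=[0.0, 0.0, 1.0])
```

The step size is the SDF value itself, which is why a valid distance field lets the ray advance quickly through empty space without overshooting the surface.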

Overview

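The 3D surface is represented as the zero level set of a deep implicit field $f$
$\mathcal{S}_{\theta}=\lbrace \boldsymbol{x}\in\mathbb{R}^3 \vert f(\boldsymbol{x},\theta)=0 \rbrace$
- To avoid the everywhere-zero solution, $f$ is usually regularized, e.g. with an SDF regularization; this paper uses implicit geometric regularization (IGR)
Three unknowns (which are also the optimized quantities): geometry $\theta\in\mathbb{R}^m$, appearance $\gamma\in\mathbb{R}^n$, and camera parameters $\tau\in\mathbb{R}^k$
- Note that in this paper the camera parameters are also unknown and optimized, so every quantity needs a derivative expression with respect to the camera parameters $\tau$ (i.e. the camera center $\boldsymbol{c}$ and view direction $\boldsymbol{v}$) in addition to the geometry parameters $\theta$
The color/radiance at a pixel is modeled as a function of the ray-surface intersection point $\boldsymbol{\hat x}_p$, the surface normal $\boldsymbol{\hat n}_p$, the view direction $\boldsymbol{\hat v}_p$, the geometry feature $\boldsymbol{\hat z}_p$, and the appearance parameters $\gamma$
$L_p(\theta,\gamma,\tau)=M(\boldsymbol{\hat x}_p, \boldsymbol{\hat n}_p, \boldsymbol{\hat z}_p, \boldsymbol{\hat v}_p;\gamma)$
- Somewhat like NeRF
- The intersection point, surface normal, geometry feature, and view direction all depend on the geometry $\theta$ and the camera parameters $\tau$, because $\boldsymbol{\hat x}_p=\boldsymbol{\hat x}_p(\theta,\tau)$
- $M$ is yet another MLP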
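losses
- RGB loss: L1 norm, per pixel
- MASK loss: an approximate, differentiable mask can be rendered along with the image, so a cross-entropy loss can be applied directly, per pixel
- reg loss: Eikonal regularization, which pushes $f$ toward an SDF, i.e. the norm of the network's gradient is 1; points are sampled uniformly in a bbox
  - ${\rm loss}_E(\theta)=\mathbb{E}_{\boldsymbol{x}}(\lVert \nabla_{\boldsymbol{x}}f(\boldsymbol{x};\theta) \rVert -1)^2$, where $\boldsymbol{x}$ is uniformly distributed in a bounding box of the scene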
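
To make the Eikonal term concrete, the sketch below estimates it by Monte Carlo over a bounding box. It is only an illustration: the gradient is approximated with central finite differences instead of autodiff, the field is an analytic sphere instead of an MLP, and the names `grad_norm`, `eikonal_loss`, and `scaled_field` are hypothetical.

```python
import math
import random

def grad_norm(f, x, h=1e-4):
    """Central finite-difference estimate of ||grad_x f(x)|| (stand-in for autodiff)."""
    g = []
    for i in range(3):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return math.sqrt(sum(gi * gi for gi in g))

def eikonal_loss(f, n_samples=2000, half_extent=1.5, seed=0):
    """Mean of (||grad f|| - 1)^2 over points drawn uniformly from a cubic bbox."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = [rng.uniform(-half_extent, half_extent) for _ in range(3)]
        total += (grad_norm(f, x) - 1.0) ** 2
    return total / n_samples

def sphere_sdf(x):
    """A true SDF: unit sphere at the origin."""
    return math.sqrt(sum(xi * xi for xi in x)) - 1.0

def scaled_field(x):
    """Same zero level set, but not an SDF: the gradient norm is 2 everywhere."""
    return 2.0 * sphere_sdf(x)

loss_sdf = eikonal_loss(sphere_sdf)       # close to 0
loss_scaled = eikonal_loss(scaled_field)  # close to (2 - 1)^2 = 1
```

Both fields describe the same surface, yet only the first has a near-zero penalty, which is the sense in which the regularizer selects a distance-like $f$ among all fields sharing a zero level set.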

Differentiable intersections of view directions and geometry

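Write the intersection point as $\boldsymbol{\hat x}_p(\theta,\tau)=\boldsymbol{c}+t(\theta,\boldsymbol{c},\boldsymbol{v})\boldsymbol{v}$; the key point is that the scalar $t$ is a function of $\theta$, the camera center $\boldsymbol{c}$, and the view direction $\boldsymbol{v}$
$\boldsymbol{\hat x}_p(\theta,\tau)=\boldsymbol{c}+t_0\boldsymbol{v} - \frac {\boldsymbol{v}}{\nabla_{\boldsymbol{x}} f(\boldsymbol{x}_0;\theta_0) \cdot \boldsymbol{v}_0} f(\boldsymbol{c}+t_0\boldsymbol{v};\theta)$, where $\boldsymbol{x}_0=\boldsymbol{c}_0+t_0\boldsymbol{v}_0$ is the intersection point at the current parameters $\theta_0, \tau_0$
- This expression is exact in value and in first derivatives with respect to $\theta$ and $\tau$ at $\theta=\theta_0, \tau=\tau_0$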
- Q: what?
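This is derived by implicit differentiation;
For an SDF, the normal vector at a point is simply its gradient, because the gradient has unit norm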
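
One way to read the "exact in value and first derivatives" claim: differentiating $f(\boldsymbol{c}+t\boldsymbol{v};\theta)=0$ implicitly gives $\partial t/\partial\theta = -(\partial f/\partial\theta)/(\nabla_{\boldsymbol{x}} f\cdot\boldsymbol{v})$, and the correction term in the formula reproduces exactly this at $\theta_0,\tau_0$. The sketch below checks that numerically on a toy case, assuming a sphere SDF whose radius `r` plays the role of $\theta$ and using finite differences in place of autodiff; all function names are hypothetical.

```python
import math

def f(x, r):
    """Sphere SDF; the radius r stands in for the geometry parameter theta."""
    return math.sqrt(sum(xi * xi for xi in x)) - r

def exact_hit(c, v, r):
    """Closed-form first ray-sphere intersection (v must be unit length)."""
    b = sum(ci * vi for ci, vi in zip(c, v))
    q = sum(ci * ci for ci in c) - r * r
    t = -b - math.sqrt(b * b - q)
    return [ci + t * vi for ci, vi in zip(c, v)]

# Current parameters (theta_0, tau_0) and the intersection found there.
r0, c0, v0 = 1.0, [0.3, 0.2, -3.0], [0.0, 0.0, 1.0]
x0 = exact_hit(c0, v0, r0)
t0 = sum((xi - ci) * vi for xi, ci, vi in zip(x0, c0, v0))
norm_x0 = math.sqrt(sum(xi * xi for xi in x0))
n0 = [xi / norm_x0 for xi in x0]                # grad_x f(x0; theta_0)
denom = sum(ni * vi for ni, vi in zip(n0, v0))  # frozen constant n0 . v0

def idr_hit(c, v, r):
    """x_hat = c + t0*v - v / (n0 . v0) * f(c + t0*v; r), with t0, n0, v0 frozen."""
    p = [ci + t0 * vi for ci, vi in zip(c, v)]
    s = f(p, r) / denom
    return [pi - s * vi for pi, vi in zip(p, v)]

def max_abs_diff(a, b):
    return max(abs(ai - bi) for ai, bi in zip(a, b))

def d_dr(hit, h=1e-5):
    """Finite-difference derivative of the hit point with respect to the radius."""
    hi, lo = hit(c0, v0, r0 + h), hit(c0, v0, r0 - h)
    return [(a - b) / (2 * h) for a, b in zip(hi, lo)]

def d_dcx(hit, h=1e-5):
    """Finite-difference derivative with respect to the camera center's x-coordinate."""
    cp = [c0[0] + h, c0[1], c0[2]]
    cm = [c0[0] - h, c0[1], c0[2]]
    hi, lo = hit(cp, v0, r0), hit(cm, v0, r0)
    return [(a - b) / (2 * h) for a, b in zip(hi, lo)]

value_err = max_abs_diff(idr_hit(c0, v0, r0), x0)
grad_r_err = max_abs_diff(d_dr(idr_hit), d_dr(exact_hit))
grad_c_err = max_abs_diff(d_dcx(idr_hit), d_dcx(exact_hit))
```

All three errors come out at the level of numerical noise, so at the current parameters the formula agrees with the true intersection in value and in its derivatives with respect to both the geometry and the camera center, even though `t0`, `n0`, and `v0` are treated as constants.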

approximation of the surface light field

masked rendering

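==[*]== An additional differentiable, approximately binary mask is rendered along with the image
Consider how to render the mask:
- The exact but non-differentiable expression ($R(\tau)$ is the ray through the pixel):
  $$ S(\theta, \tau) = \begin{cases} 1 & R(\tau) \cap \mathcal{S}_{\theta} \neq \emptyset \\ 0 & \text{otherwise} \end{cases} $$
- A differentiable, approximately binary expression: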
  $$ S_{\alpha}(\theta, \tau)=\text{sigmoid} \left( -\alpha \min_{t \geq 0} f(\mathbf{c}+t\mathbf{v}; \theta) \right) $$
  - $ S_{\alpha}(\theta, \tau) \stackrel{ \alpha \rightarrow \infty }{\longrightarrow} S(\theta, \tau)$
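  - SDF convention: SDF < 0 inside, SDF > 0 outside
  - By the envelope theorem¹, the gradient of $S_{\alpha}$ with respect to the parameters $\mathbf{c}, \mathbf{v}, \theta$ can be obtained
  - Note that only the sample point $\mathbf{c} + t^\ast \mathbf{v}$, where $t^\ast$ is the value at which $f$ attains its minimum, receives a gradient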
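
The soft mask $S_{\alpha}$ can be sketched in a few lines. This is a toy version under stated assumptions: the minimum over $t$ is approximated on a uniform grid, the field is an analytic sphere SDF, and `soft_mask`, `alpha=50.0`, and the grid resolution are illustrative choices, not values from the paper or its code.

```python
import math

def sphere_sdf(x, r=1.0):
    return math.sqrt(sum(xi * xi for xi in x)) - r

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_mask(sdf, c, v, alpha=50.0, t_max=6.0, n_samples=600):
    """S_alpha = sigmoid(-alpha * min_{t >= 0} f(c + t*v)).

    The min is approximated on a uniform grid of t; returns (S_alpha, t_star).
    """
    best_f, t_star = float("inf"), 0.0
    for i in range(n_samples + 1):
        t = t_max * i / n_samples
        val = sdf([ci + t * vi for ci, vi in zip(c, v)])
        if val < best_f:
            best_f, t_star = val, t
    return sigmoid(-alpha * best_f), t_star

# A ray through the sphere's center and a ray that passes 1 unit away from it.
s_hit, t_star_hit = soft_mask(sphere_sdf, [0.0, 0.0, -3.0], [0.0, 0.0, 1.0])
s_miss, t_star_miss = soft_mask(sphere_sdf, [0.0, 2.0, -3.0], [0.0, 0.0, 1.0])
```

Because the output depends on the field only through its value at `t_star`, a gradient taken through this expression touches only that one sample point, which matches the note above.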

results

Appearance transfer is possible

supp

Disentangling shape and appearance: note the effect of conditioning on the normal vector

Implementation

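⚠️ The camera parameters are re-normalized so that the cameras' visual hulls are contained in a unit sphere
- we re-normalize the cameras so that their visual hulls are contained in the unit sphere
- Description in the code repository: note that it is the visual hull of the observed object that should lie roughly inside the unit sphere
  - The ‘scaled_mat_{i}’ matrices in the ‘camera.npz’ file are the normalization matrices, intended to re-normalize the cameras so that the visual hull of the observed object is approximately inside the unit sphere.
  - This normalization matrix is computed from the object mask annotations and the camera projection matrices; the script is at code/preprocess/preprocess_camera.py
  - Roughly, the script takes the image and mask of camera_0 as the reference, then for each pixel inside the mask finds its minimum/maximum depth with respect to the other views; this yields a set of min/max points, whose mean is taken as the centroid and whose standard deviation is taken as the per-dimension scale
- 🤔 Thought: this kind of re-normalization is probably only suitable for camera setups that surround the object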
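
The last step of that description (mean as centroid, standard deviation as per-dimension scale) might look roughly like the sketch below. It follows only the summary in these notes, not the actual preprocessing script: the input points are made up, the function names are hypothetical, and any extra scaling factor the real script may apply is not modeled.

```python
import statistics

def normalization_from_points(points):
    """Build (centroid, scale, 4x4 matrix) from a list of 3D points.

    The matrix maps normalized homogeneous coordinates back to world
    coordinates: world = scale * normalized + centroid, per dimension.
    """
    dims = list(zip(*points))
    centroid = [statistics.mean(d) for d in dims]
    scale = [statistics.pstdev(d) for d in dims]
    mat = [
        [scale[0], 0.0, 0.0, centroid[0]],
        [0.0, scale[1], 0.0, centroid[1]],
        [0.0, 0.0, scale[2], centroid[2]],
        [0.0, 0.0, 0.0, 1.0],
    ]
    return centroid, scale, mat

def to_normalized(p, centroid, scale):
    """Inverse of the matrix above, applied to a single world-space point."""
    return [(pi - ci) / si for pi, ci, si in zip(p, centroid, scale)]

# Made-up stand-ins for the min/max-depth points described above.
pts = [(1.0, 2.0, 3.0), (3.0, 2.0, 5.0), (1.0, 6.0, 3.0), (3.0, 6.0, 5.0)]
centroid, scale, mat = normalization_from_points(pts)
first_normalized = to_normalized(pts[0], centroid, scale)
```

A centroid and scale estimated this way only make sense when the sampled points surround the object reasonably evenly, which is consistent with the remark above about camera setups that circle the object.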

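¹ Envelope theorem: the gradient of the min/max value of a function with respect to its parameters is the partial derivative with respect to those parameters evaluated at the optimum; the dependence of the optimal argument on the parameters contributes nothing.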
