Contents

DL methods for shape as implicit surfaces


overview

  • Since an implicit surface can be expressed by an implicit function \(f(x,y,z)=0\),
  • one can naturally fit a spatial scalar field \(f(x,y,z)\) with some neural network (typically an MLP with ReLU), where the scalar usually means occupancy probability, distance to the surface, an inside/outside indicator, etc., and then train that network
  • Once trained, if a mesh needs to be extracted from the implicit function, marching-cubes-style spatial sampling is typically used
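The pipeline above can be sketched in a few lines of numpy; the weights here are toy values hand-set so the "network" reproduces an L1-ball distance field (real methods train the MLP, and the sampled grid would then be handed to a marching-cubes implementation such as `skimage.measure.marching_cubes`):

```python
import numpy as np

def mlp_sdf(pts, W1, b1, W2, b2):
    """Evaluate a one-hidden-layer MLP+ReLU f(x,y,z) -> scalar at N points (N,3)."""
    h = np.maximum(pts @ W1 + b1, 0.0)  # ReLU hidden layer
    return h @ W2 + b2                  # scalar field value per point

# Toy weights: hidden features are ReLU(+-x), ReLU(+-y), ReLU(+-z), so the
# output is |x|+|y|+|z| - 1, whose zero level set is the L1 ball.
W1 = np.repeat(np.eye(3), 2, axis=1) * np.tile([1.0, -1.0], 3)  # (3,6)
b1 = np.zeros(6)
W2 = np.ones(6)
b2 = -1.0

# Marching-cubes-style extraction starts from exactly this kind of dense grid query:
xs = np.linspace(-1.5, 1.5, 32)
grid = np.stack(np.meshgrid(xs, xs, xs, indexing="ij"), axis=-1).reshape(-1, 3)
values = mlp_sdf(grid, W1, b1, W2, b2)  # feed these into marching cubes to get a mesh
```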

Basic representations

<IM-Net> Learning implicit fields for generative shape modeling
 

Motivation

  • inside / outside indicator
  • Essentially a category-level continuous-function implicit shape representation, similar to Occupancy Networks;
    input: a code + the coordinates of one point; output: inside or outside the shape (similar in spirit to an SDF)
  • https://longtimenohack.com/posts/paper_reading/2019cvpr_chen_learning/image-20201203174748033.png
<Occupancy Networks> Occupancy networks: Learning 3d reconstruction in function space
 

Motivation

  • Uses an implicit function to express occupancy probability, which enables representation at arbitrary resolution
    https://longtimenohack.com/posts/paper_reading/2019cvpr_mescheder_occupancy/image-20201203153023230.png

Main framework

  • Multiresolution IsoSurface Extraction (MISE)
    https://longtimenohack.com/posts/paper_reading/2019cvpr_mescheder_occupancy/image-20201203153114826.png
Convolutional occupancy networks
 

Motivation

  • Moves from Occupancy Networks' continuous feature function to voxelized features + 3D convolutions
    https://longtimenohack.com/posts/paper_reading/2020eccv_peng_convolutional/image-20201222145923538.png
Dynamic plane convolutional occupancy networks
 

Motivation

  • Occupancy Networks use a continuous function;
    Convolutional Occupancy Networks use voxelized features;
    this paper uses features on a set of dynamic planes
  • https://longtimenohack.com/posts/paper_reading/2021wcav_lionar_dynamic/image-20201222161043946.png
<DeepSDF> Deepsdf: Learning continuous signed distance functions for shape representation
 

Motivation

  • The SDF is yet another shape representation from computer graphics; this paper is the first to use deep SDF functions to model shapes
  • https://longtimenohack.com/posts/paper_reading/2019cvpr_park_deepsdf/image-20201210100006815.png
  • https://longtimenohack.com/posts/paper_reading/2019cvpr_park_deepsdf/image-20201210095931233.png

overview

  • A single SDF network per shape; a whole category is handled by code conditioning
    https://longtimenohack.com/posts/paper_reading/2019cvpr_park_deepsdf/image-20201210100154456.png
  • Uses an auto-decoder
    https://longtimenohack.com/posts/paper_reading/2019cvpr_park_deepsdf/image-20201210100244493.png
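A toy numpy sketch of the auto-decoder idea: the decoder stays frozen (here a hypothetical one-parameter "decoder" whose code is just a sphere radius), and test-time inference is gradient descent on the latent code alone:

```python
import numpy as np

def decoder(code, pts):
    """Stand-in for a trained DeepSDF decoder: a sphere of radius `code`."""
    return np.linalg.norm(pts, axis=1) - code

def auto_decode(pts, sdf_gt, code0=0.1, lr=0.5, steps=100):
    """Auto-decoder inference: freeze the decoder, optimize only the latent
    code against observed SDF samples (analytic gradient for this toy case)."""
    code = code0
    for _ in range(steps):
        err = decoder(code, pts) - sdf_gt
        grad = np.mean(2.0 * err * (-1.0))   # d(mean squared error)/d(code)
        code -= lr * grad
    return code

rng = np.random.default_rng(0)
pts = rng.normal(size=(256, 3))
sdf_gt = np.linalg.norm(pts, axis=1) - 0.7   # observations from a radius-0.7 sphere
code = auto_decode(pts, sdf_gt)              # recovers ~0.7
```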
<SAL> Sal: Sign agnostic learning of shapes from raw data
 
<BSP-Net> Bsp-net: Generating compact meshes via binary space partitioning
 

review

  • A follow-up by the IM-Net authors
  • Works very well, but performs poorly on thin structures

Motivation

  • Takes inspiration from binary space partitions to learn a more compact / low-poly mesh representation
    https://longtimenohack.com/posts/paper_reading/2020cvpr_chen_bsp/image-20201229112704532.png

overview

  • As before, the input is a point coordinate + a shape code condition, and the output is an inside / outside indicator;
  • The difference is the internal model: n plane equations, so the shape is represented by combining n such binary space partitions
  • How the combination of binary partitions expresses a shape:
    first combine partitions into individual convexes, then combine the convexes into the whole shape
    • Essentially this makes explicit the linear partitioning of space performed by MLP+ReLU, though the notion of convexes here is worth thinking about
Network architecture diagrams
https://longtimenohack.com/posts/paper_reading/2020cvpr_chen_bsp/image-20201229113148367.png
https://longtimenohack.com/posts/paper_reading/2020cvpr_chen_bsp/image-20201229113341539.png

few shot segmentation

  • Since correspondence is already established between the convex decompositions of shapes within a category, one only needs to manually label the part label of each convex id on a few shapes; the correspondence then transfers these annotations to all other shapes of the category
    https://longtimenohack.com/posts/paper_reading/2020cvpr_chen_bsp/image-20201229113640866.png

results

  • https://longtimenohack.com/posts/paper_reading/2020cvpr_chen_bsp/image-20201229113556212.png
  • https://longtimenohack.com/posts/paper_reading/2020cvpr_chen_bsp/image-20201229114054936.png
<CvxNet> Cvxnet: Learnable convex decomposition
 

review

  • Conceptually very similar to BSP-Net: the surface is defined by convexes bounded by planes; the input is a point coordinate, the output an inside/outside indicator
  • Is the difference from BSP-Net that a softmax is used here?

Motivation

  • from hyperplanes to occupancy
    https://longtimenohack.com/posts/paper_reading/2020cvpr_deng_cvxnet/image-20201230092051009.png
  • https://longtimenohack.com/posts/paper_reading/2020cvpr_deng_cvxnet/CvxNet.gif
<Neural-Pull> Neural-pull: Learning signed distance functions from point clouds by learning to pull space onto surfaces
 

Motivation

  • Trains a neural network to "pull" query 3D locations onto their nearest neighbors on the surface;
    the pull direction is the network gradient at the query location and the step size is the network's SDF value there, both computed from the network itself
  • This allows updating the SDF value and its gradient simultaneously
  • https://longtimenohack.com/posts/paper_reading/2020arxiv_ma_neural/image-20201228162639806.png

overview

  • The loss functions are defined directly on the ground-truth point cloud itself, instead of regressing against a ground-truth SDF;
    https://longtimenohack.com/posts/paper_reading/2020arxiv_ma_neural/image-20201228163704020.png
    https://longtimenohack.com/posts/paper_reading/2020arxiv_ma_neural/image-20201228163648881.png
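The pull operation itself is a one-liner; a minimal numpy sketch, using an exact unit-sphere SDF as a stand-in for the trained network (the real method obtains the SDF and its gradient from the network via autograd):

```python
import numpy as np

def sdf(q):
    """Stand-in for the trained network: exact SDF of the unit sphere."""
    return np.linalg.norm(q, axis=-1) - 1.0

def sdf_grad(q):
    """Network gradient at q (analytic here; via autograd in practice)."""
    return q / np.linalg.norm(q, axis=-1, keepdims=True)

def pull(q):
    """Neural-Pull's pull operation: move a query point onto the surface,
    direction = gradient at q, step length = SDF value at q."""
    g = sdf_grad(q)
    return q - sdf(q)[..., None] * g / np.linalg.norm(g, axis=-1, keepdims=True)

q = np.array([[0.0, 0.0, 2.5], [0.3, -0.4, 0.1]])  # one outside, one inside query
p = pull(q)                                        # both land on the unit sphere
```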
DUDE: Deep unsigned distance embeddings for hi-fidelity representation of complex 3D surfaces
 

Motivation

  • Existing deep-network methods for implicit surfaces can only represent topologically closed shapes;
    as a result, training often requires clean watertight meshes
  • This paper proposes unsigned distance embeddings to alleviate these problems
    • an unsigned distance field (uDF) expresses proximity to the surface
    • a normal vector field (nVF) expresses surface orientation
    • uDF + nVF can represent high-fidelity shapes of arbitrary open/closed topology
    • can be learned from noisy triangle soups, with no need for watertight meshes
    • additionally provides new methods for extracting and rendering iso-surfaces from the learned representation
  • https://longtimenohack.com/posts/paper_reading/2020arxiv_venkatesh_dude/image-20201222150310307.png

overview

  • uDF+nVF
    https://longtimenohack.com/posts/paper_reading/2020arxiv_venkatesh_dude/image-20201222151610880.png
<DIF> Deformed implicit field: Modeling 3d shapes with learned dense correspondence
 

Motivation

  • Expresses each concrete shape instance as a deformation of a template shape
  • The deformation field establishes shape correspondence, which enables texture transfer, label transfer, etc.
  • https://longtimenohack.com/posts/paper_reading/2021cvpr_deng_deformed/image-20201222155438709.png

overview

  • A hypernetwork predicts the parameters of the DeformNet \(D\) from a code;
    then, at every location in space, starting from the same template SDF, DeformNet \(D\) produces a positional offset \(v\) and a scalar distance correction \(\Delta s\) (4 output dimensions in total),
    so the final SDF value at point \(p\) is \(s=T(p+v)+\Delta s=T(p+D^v_{\omega}(p))+D^{\Delta s}_{\omega}(p)\).
    Note that the deformation vector \(v\) actually maps from the shape-instance field to the template field.
    https://longtimenohack.com/posts/paper_reading/2021cvpr_deng_deformed/image-20201222153322051.png
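A minimal numpy sketch of this composition, with a unit-sphere template and a hypothetical constant shift standing in for the learned DeformNet:

```python
import numpy as np

def template_sdf(p):
    """Template field T: a unit sphere standing in for the learned template."""
    return np.linalg.norm(p, axis=-1) - 1.0

def deform_v(p):
    """Positional offset D^v(p): a hypothetical rigid shift of +0.2 along x,
    standing in for the hypernetwork-predicted DeformNet."""
    v = np.zeros_like(p)
    v[..., 0] = 0.2
    return v

def deform_ds(p):
    """Scalar correction D^{Δs}(p): zero here (pure deformation)."""
    return np.zeros(p.shape[:-1])

def dif_sdf(p):
    """DIF's composed field: s = T(p + D^v(p)) + D^{Δs}(p)."""
    return template_sdf(p + deform_v(p)) + deform_ds(p)

# The instance surface is the template shifted by -0.2 in x:
p = np.array([[0.8, 0.0, 0.0]])
s = dif_sdf(p)   # T([1.0, 0, 0]) + 0 = 0: p lies on the instance surface
```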

losses

SDF loss

  • Trained quantities: the deformation-field hypernetwork \(\Psi\), the output SDF field \(\Phi\), the template field \(T\), and the learned latent codes \(\{\alpha_j\}\). \(\Phi_i(p)\) denotes the predicted SDF value \(\Phi_{\Psi(\alpha_i)}(p)\); \(\Omega\) denotes 3D space and \(\mathcal{S}_i\) the shape surface
    • \(\Phi_{\Psi(\alpha)}(p)=T(p+D_{\Psi(\alpha)}^v(p)) + D_{\Psi(\alpha)}^{\Delta s}(p)\)
  • \(L_{sdf}=\underset {i}{\sum} \left( L_1 + L_2 + L_3 + L_4 \right)\)
    • \(\underset {p \in \Omega}{\sum} \lvert \Phi_i(p)-\overline{s}\rvert\): error between the predicted and ground-truth SDF
      • \(p \in \Omega\): sampled throughout 3D space
    • \(\underset{p\in \mathcal{S}_i}{\sum} (1-\langle \nabla\Phi_i(p), \overline{n} \rangle)\): error between the predicted and ground-truth normals (an angular error, expressed as the cosine of the angle being close to 1)
      • \(p \in \mathcal{S}_i\): sampled on the surface
    • \(\underset{p\in\Omega}{\sum} \lvert \lVert \nabla\Phi_i(p) \rVert_2 - 1 \rvert\): the predicted gradient should have unit norm (since \(\Phi\) is an SDF)
      • \(p \in \Omega\): sampled throughout 3D space
    • \(\underset{p\in\Omega \backslash \mathcal{S}_i}{\sum} \rho(\Phi_i(p)), \;{\rm where} \; \rho(s)=\exp(-\delta \cdot \lvert s \rvert), \delta \gg 1\): penalizes off-surface points whose SDF value is close to 0;
      • \(\delta \gg 1\) means this loss term only has weight when the value is close to 0
        • Q: similar to a negative L0-norm?
      • see the (SIREN) paper: Implicit neural representations with periodic activation functions, NeurIPS 2020

Regularization

  • A regularization loss to constrain the learned latent codes: \(L_{reg}=\underset{i}{\sum} \lVert \alpha_i \rVert_2^2\)
  • Stronger regularizers could be used instead, e.g. minimizing the KL divergence between the latent-code posterior and a Gaussian, as in VAE training

normal consistency prior

  • Surface points are highly correlated with semantics: e.g. (under a canonical-space assumption) a car roof always points to the sky, the left door always points to the left
  • Therefore, make the normals of corresponding points consistent with each other
    • encourage the normal at a point in the template field to agree with the normal at the corresponding point of every given shape instance
    • \(L_{normal}=\underset{i}{\sum} \underset{p\in\mathcal{S}_i}{\sum} \left( 1 - \langle \nabla T(p+D_{\omega_i}^v (p)), \overline{n} \rangle \right)\)
    • i.e. keep the template-field normal at the corresponding position of \(p\) consistent with the ground-truth normal
    • \(p \in \mathcal{S}_i\): sampled on the surface
    • Without the scalar correction field, the template-field normal at the corresponding position would equal the normal of the final output field, the same as the 2nd term of \(L_{sdf}\)
      • Q: the following is this reader's conjecture, to be verified against the code.
      • a point of the deformed shape-instance field has coordinate \(p\); the corresponding point in the template field is \(p+D_{\omega_i}^v (p)\)
      • the normal at the corresponding point is actually \(\nabla_{p+D_{\omega_i}^v (p)} T(p+D_{\omega_i}^v (p))\), not \(\nabla_p T(p+D_{\omega_i}^v (p))\)
      • the 2nd term of \(L_{sdf}\) uses \(\nabla_p\Phi_i(p)=\nabla_p \left( T(p+D_{\omega_i}^v (p)) + D_{\omega_i}^{\Delta s}(p) \right)\)
      • so this loss mainly enforces consistency between the normals of the template field and of the deformed shape-instance field at corresponding points
      • strictly it should be the angle between \(\nabla_{p+D_{\omega_i}^v (p)} T(p+D_{\omega_i}^v (p))\) and \(\nabla_p\Phi_i(p)\), not the angle with \(\overline{n}\);
        but since \(\nabla_p\Phi_i(p)\) approximates \(\overline{n}\), using \(\overline{n}\) also works

deformation smoothness prior

  • To encourage smooth deformations and prevent large shape distortions, a smoothness loss on the deformation field is introduced
  • \(L_{smooth}=\underset{i}{\sum} \underset{p\in\Omega}{\sum} \underset{d\in\{X,Y,Z\}}{\sum} \lVert \nabla D_{\omega_i}^v \vert_d (p) \rVert_2\)
    • ✔️ \(\begin{pmatrix} \frac{\partial v_x}{\partial x} \\ \frac{\partial v_x}{\partial y} \\ \frac{\partial v_x}{\partial z} \end{pmatrix}\), \(\begin{pmatrix} \frac{\partial v_y}{\partial x} \\ \frac{\partial v_y}{\partial y} \\ \frac{\partial v_y}{\partial z} \end{pmatrix}\), \(\begin{pmatrix} \frac{\partial v_z}{\partial x} \\ \frac{\partial v_z}{\partial y} \\ \frac{\partial v_z}{\partial z} \end{pmatrix}\)
      • treating \(v = \begin{pmatrix} v_x \\ v_y \\ v_z \end{pmatrix} =D_{\omega_i}^v(p)\) as a vector-valued function made of 3 scalar functions, each scalar function has its own gradient
    • \(\begin{pmatrix} \frac{\partial v_x}{\partial x} \\ \frac{\partial v_y}{\partial x} \\ \frac{\partial v_z}{\partial x} \end{pmatrix}\), \(\begin{pmatrix} \frac{\partial v_x}{\partial y} \\ \frac{\partial v_y}{\partial y} \\ \frac{\partial v_z}{\partial y} \end{pmatrix}\), \(\begin{pmatrix} \frac{\partial v_x}{\partial z} \\ \frac{\partial v_y}{\partial z} \\ \frac{\partial v_z}{\partial z} \end{pmatrix}\)
  • Penalizes the spatial gradient of the deformation field along the X, Y, and Z directions
  • \(p \in \Omega\): sampled throughout 3D space
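A numpy sketch of \(L_{smooth}\) at a set of sample points, estimating the per-component spatial gradients by central differences (a hypothetical smooth warp stands in for \(D^v\); the real method differentiates the network):

```python
import numpy as np

def deform_v(p):
    """Hypothetical deformation field v(p) (stands in for D^v): a gentle warp."""
    return 0.1 * np.sin(p)

def smoothness_loss(pts, h=1e-4):
    """L_smooth at sampled points: sum over components d in {x,y,z} of the
    2-norm of the spatial gradient of v_d, estimated by central differences."""
    total = 0.0
    for q in pts:
        J = np.zeros((3, 3))                      # J[d, k] = d v_d / d p_k
        for k in range(3):
            e = np.zeros(3); e[k] = h
            J[:, k] = (deform_v(q + e) - deform_v(q - e)) / (2 * h)
        total += np.linalg.norm(J, axis=1).sum()  # sum_d || grad v_d ||_2
    return total

pts = np.random.default_rng(0).uniform(-1, 1, size=(16, 3))
loss = smoothness_loss(pts)   # small for this gentle warp
```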

minimal correction prior

  • Encourages the shape to be represented mainly by the deformation field rather than by the scalar correction
  • \(L_c=\underset{i}{\sum} \underset{p\in\Omega}{\sum} \lvert D_{\omega_i}^{\Delta s}(p) \rvert\): penalizes the L1 magnitude of the scalar correction
  • \(p \in \Omega\): sampled throughout 3D space

total

  • \(\underset{\{\alpha_j\}, \Psi, T }{\arg\min} L_{sdf} + w_1 L_{normal}+w_2 L_{smooth}+w_3 L_c + w_4 L_{reg}\)
    the 4 terms inside \(L_{sdf}\) are weighted 3e3, 1e2, 5e1, 5e2;
    \(w_1=1{\rm e}2, w_2=\{1,2,5\}, w_3=\{1{\rm e}2, 5{\rm e}1\}, w_4 = 1{\rm e}2\)

correspondence & uncertainty measurement

  • The correspondence between two objects \(\mathcal{O}_i\) and \(\mathcal{O}_j\) can be established by nearest-neighbor search in template space;
    • Q: won't nearest-neighbor search produce wrong correspondences?
      • recall that the scalar correction is encouraged to be zero, so shape changes are expressed mainly by the positional offset; the scalar correction only kicks in for shape changes caused by structural differences that the offset truly cannot express
      • the paper's minimal correction prior is already doing exactly this
  • Given a point \(p_i\) on (the surface of) object \(\mathcal{O}_i\), nearest-neighbor search finds its corresponding (surface) point \(p_j\) on object \(\mathcal{O}_j\)
    • The uncertainty of this correspondence can then be evaluated with a simple yet surprisingly-effective uncertainty metric:
    • \(u(p_i,p_j)=1-\exp(-\gamma \lVert (p_i+v_i) - (p_j+v_j) \rVert_2^2)\)
      • where \(v_i=D_{\omega_i}^v(p_i)\) is the deformation vector at the point, i.e. the \(\Delta\) from shape-instance space to template space
      • \(\lVert (p_i+v_i) - (p_j+v_j) \rVert_2\) is simply the distance between the corresponding points \(p_i\) and \(p_j\) in template space
    • Regions of high uncertainty conform well to the structure discrepancy between the shapes.
      The figure below shows, for points on (the surface of) shape A, the uncertainty of the corresponding points found on (the surface of) shape B; red = high uncertainty, blue = low
      https://longtimenohack.com/posts/paper_reading/2021cvpr_deng_deformed/image-20210104200614483.png
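The metric is straightforward to compute once the deformation vectors are known; a small numpy sketch (the value of `gamma` here is a free choice, not from the paper):

```python
import numpy as np

def correspondence_uncertainty(p_i, v_i, p_j, v_j, gamma=10.0):
    """DIF's uncertainty metric: u = 1 - exp(-gamma * ||(p_i+v_i) - (p_j+v_j)||^2),
    i.e. based on the distance between the matched points in template space."""
    d2 = np.sum(((p_i + v_i) - (p_j + v_j)) ** 2, axis=-1)
    return 1.0 - np.exp(-gamma * d2)

# Two surface points whose deformations map them to the same template location:
p_i = np.array([0.5, 0.0, 0.0]); v_i = np.array([-0.1, 0.0, 0.0])
p_j = np.array([0.3, 0.0, 0.0]); v_j = np.array([ 0.1, 0.0, 0.0])
u = correspondence_uncertainty(p_i, v_i, p_j, v_j)  # both map to x=0.4 -> u = 0
```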

results

  • texture transfer
    https://longtimenohack.com/posts/paper_reading/2021cvpr_deng_deformed/image-20201222155357538.png
    https://longtimenohack.com/posts/paper_reading/2021cvpr_deng_deformed/image-20210104173728589.png
  • label transfer: note that even structures that come and go, like chair armrests, are handled
    https://longtimenohack.com/posts/paper_reading/2021cvpr_deng_deformed/image-20201222155611605.png

Ablation study / discussions

  • The positional offset alone already constitutes a deformation field; but this paper finds it insufficient, and adding the scalar correction:
    • ① helps generate the desired shape
      • https://longtimenohack.com/posts/paper_reading/2021cvpr_deng_deformed/image-20210104165611897.png
    • ② experiments show it is also important for learning high-quality correspondences
      • Q: why?
        Attempted explanation: the scalar correction can absorb part of the shape variation (inflating structures / topology changes?), making it easier to learn simple, plausible correspondences?
        • Q: similar to how CGAN uses a random noise z to absorb some "unwanted" features?
        • Q: besides the scalar correction, could other mechanisms be introduced to absorb other "unwanted" features?
  • template implicit field ≠ template shape
    • the template implicit field is not a template shape; it is not even a valid SDF
    • instead, the template implicit field captures the shape structure shared by different objects of a category
    • experiments show that with ill-suited losses, the template implicit field degenerates to a valid SDF representing a certain shape, lowering both reconstruction accuracy and correspondence quality
  • Effect of the several training losses on the results
    https://longtimenohack.com/posts/paper_reading/2021cvpr_deng_deformed/image-20210104120743115.png

implementation details

  • Network architecture
    https://longtimenohack.com/posts/paper_reading/2021cvpr_deng_deformed/image-20210104121544233.png
<DIT> Deep implicit templates for 3D shape representation
 

review

  • The biggest problem with this family of deformation-field methods is probably that, when the hierarchy / topology changes significantly, a correspondence largely determined by position may fail to reflect the structural change accurately, leading to degenerate behavior
  • Very similar in spirit to Deformed Implicit Field (DIF), which is also from Tsinghua
    • DIF has a scalar \(\Delta s\) correction besides the positional offset; this paper has only the positional offset
      • in DIF, a surface point is not necessarily still on the surface after deformation; a nearest-neighbor search is needed to find the corresponding points of the deformed shape
      • in this paper, a surface point is guaranteed to remain on the surface after deformation (the SDF value is 0 both before and after)
    • DIF is a hypernetwork that maps a code to the parameters of the positional-offset / \(\Delta s\)-correction networks; this paper is an LSTM that takes code + p and outputs a positional offset
    • The understanding of the template is completely different from DIF:
      • DIF treats the template as a capture/"storage" of what is common across shapes of the category; the template itself need not even be a valid SDF
      • this paper treats the template as a valid shape; one may even choose a concrete object shape from the dataset as the template (user-defined templates)
    • 📌 Regarding structure discrepancy, this paper falls short of DIF.
      • DIF uses a scalar correction to cover a certain amount of structural change, so the positional offset covers only geometric change
      • this paper covers both structural and geometric changes with positional change alone
        • e.g. in the figure below, look closely at the keypoints of the chairs in the top row: on the leftmost chair the yellow points mark the sittable region / edge of the seat, while on the rightmost chair they mark the edge of the sofa armrest; these are clearly **not semantically corresponding points**
          https://longtimenohack.com/posts/paper_reading/2021cvpr_zheng_deep/image-20210111155948737.png
  • Given the many careful designs (1. LSTM warping instead of MLP warping; 2. regularization toward the canonical space; 3. regularization of spatial distortion), do the transfer results look somewhat better than DIF's?
    In fact the results fall short of DIF's
Comparison: this paper (Deep Implicit Templates) vs. Deformed Implicit Field
  • texture transfer
    https://longtimenohack.com/posts/paper_reading/2021cvpr_zheng_deep/image-20201229180439537.png
    https://longtimenohack.com/posts/paper_reading/2021cvpr_zheng_deep/image-20201229180759966.png
  • label transfer / keypoint detection PCK accuracy
    https://longtimenohack.com/posts/paper_reading/2021cvpr_zheng_deep/image-20210104120316992.png
  • label transfer IoU benchmark
    https://longtimenohack.com/posts/paper_reading/2021cvpr_zheng_deep/image-20210104114757341.png
  • Detail comparison: this paper produces wrong semantic correspondences
    https://longtimenohack.com/posts/paper_reading/2021cvpr_zheng_deep/image-20210116172741451.png
    https://longtimenohack.com/posts/paper_reading/2021cvpr_zheng_deep/image-20210116172801005.png
    https://longtimenohack.com/posts/paper_reading/2021cvpr_zheng_deep/image-20210116173201127.png

Motivation

  • Represents a concrete shape as conditional deformations of a template, establishing category-level dense correspondence
    note the deformations are conditional, analogous to the Deformed NeRF paper: there is a deformation code
    https://longtimenohack.com/posts/paper_reading/2021cvpr_zheng_deep/image-20201229171836408.png
  • Decomposes a conditional spatial transform into several affine transforms
  • The training losses are carefully designed to guarantee, without supervision, both reconstruction accuracy and a plausible template

overview

  • https://longtimenohack.com/posts/paper_reading/2021cvpr_zheng_deep/image-20201229171755900.png
  • https://longtimenohack.com/posts/paper_reading/2021cvpr_zheng_deep/image-20201229173256894.png
  • The warping function first maps a point p to a canonical position, then queries the template SDF at this canonical position to obtain the SDF value
  • Simply copying the original DeepSDF training does not work: it tends to learn an overly simple template plus an overfit, overly complex (spatial) transformer, ultimately giving inaccurate correspondence
  • Goals:
    • an optimal template that expresses the common structure of a set of objects
    • together with a spatial transformer that establishes accurate dense correspondence
    • the learned model should preserve DeepSDF's expressiveness and generalization, thus supporting mesh interpolation and shape completion

spatial warping LSTM

  • In practice an MLP turns out to be a poor fit for the warping function:
    • Q: what is the theoretical reason?
    • MLP vs. LSTM warping: interpolation of the warping
      https://longtimenohack.com/posts/paper_reading/2021cvpr_zheng_deep/image-20201229172212443.png
  • The spatial transform of a point is expressed as multiple steps of affine transforms:
    • \((\alpha^{(i)},\beta^{(i)},\phi^{(i)},\psi^{(i)})={\rm LSTMCell}(c,p^{(i-1)},\phi^{(i-1)},\psi^{(i-1)})\)
    • where \(\phi\) and \(\psi\) are the output and the cell state, \(\alpha\) and \(\beta\) are the affine parameters, and the superscript \((i)\) denotes the i-th iteration step
    • update of \(p\): \(p^{(i)}=p^{(i-1)}+(\alpha^{(i)} p^{(i-1)}+\beta^{(i)})\)
    • the iteration is repeated S=8 times to produce the final warping output
  • Training losses
    • reconstruction loss
      • since the warping function is iterative, a progressive reconstruction loss is used, inspired by Curriculum DeepSDF (Yueqi Duan et al., 2020)
      • https://longtimenohack.com/posts/paper_reading/2021cvpr_zheng_deep/image-20201229175034719.png
    • regularization loss
      • point-wise regularization
        • assumes all meshes are normalized into a unit sphere and aligned to a canonical pose
        • a point-wise loss therefore implements this regularization by ==constraining how much each point changes under warping==
        • https://longtimenohack.com/posts/paper_reading/2021cvpr_zheng_deep/image-20201229175243291.png
        • Huber kernel: quadratic near the origin, linear beyond
        • Q: this seems to guarantee only canonical-pose alignment, not that the canonical space has unit size
          • A (this reader's guess): the overall magnitude of positional change loosely constrains all object representations to lie in a canonical pose;
      • point pair regularization, limiting the amount of spatial distortion
        • although some spatial distortion is unavoidable when deforming, extreme distortion can still be avoided
        • https://longtimenohack.com/posts/paper_reading/2021cvpr_zheng_deep/image-20201229175618093.png
        • where \(\Delta p=T(p,c)-p\) is the position shift of point p,
          and \(\epsilon = 0.5\) is a parameter controlling the distortion tolerance, critical for preventing shape collapse (learning an overly simple shape template)
        • this reader's understanding: the closer a pair of points, the smaller the difference (in magnitude) of their position shifts should be; i.e. the closer two points are, the more similar their deformations should be
          • Q: only the difference in magnitude is considered here? If the difference in direction were considered too, would that also constrain the normals somewhat?
            A: note this would be the direction difference of the "displacement vectors", not of the "normals"
        • the figure below shows the templates learned with and without this loss;
          clearly, without point pair regularization an overly simple template is learned
          https://longtimenohack.com/posts/paper_reading/2021cvpr_zheng_deep/image-20201229175909233.png
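The iterative affine warping can be sketched as follows; `lstm_cell_stub` is a hypothetical stand-in for the trained LSTMCell, returning fixed affine parameters just to make the update rule \(p^{(i)}=p^{(i-1)}+(\alpha^{(i)} p^{(i-1)}+\beta^{(i)})\) concrete:

```python
import numpy as np

def lstm_cell_stub(c, p, phi, psi):
    """Stand-in for the paper's LSTMCell: returns per-step affine parameters
    (alpha, beta) plus the recurrent state. Here a fixed contraction toward c,
    purely hypothetical, so the iteration has a visible fixed point."""
    alpha, beta = -0.5, 0.5 * c
    return alpha, beta, phi, psi

def warp(p, c, steps=8):
    """DIT warping: p^(i) = p^(i-1) + (alpha^(i) * p^(i-1) + beta^(i)),
    iterated S=8 times (each step is an affine update of the point)."""
    phi = psi = np.zeros(3)
    for _ in range(steps):
        alpha, beta, phi, psi = lstm_cell_stub(c, p, phi, psi)
        p = p + (alpha * p + beta)
    return p

c = np.array([0.2, 0.0, -0.1])            # latent code (abused here as a target)
q = warp(np.array([1.0, 1.0, 1.0]), c)    # converges geometrically toward c
```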

results

  • Shape interpolation:
    https://longtimenohack.com/posts/paper_reading/2021cvpr_zheng_deep/image-20201229180146944.png
  • Since shape correspondence is established, keypoint-detection transfer is possible
    https://longtimenohack.com/posts/paper_reading/2021cvpr_zheng_deep/image-20201229180103315.png
  • Applications: texture transfer, etc.
    https://longtimenohack.com/posts/paper_reading/2021cvpr_zheng_deep/image-20201229172342590.png
<IGR> Implicit geometric regularization for learning shapes
 

Motivation

  • Learns a DeepSDF directly from raw point clouds, with or without normal data
  • An implicit shape prior alone already yields plausible solutions;
    in fact it is just a simple loss: encourage zero function value at the input points, and unit-norm gradients at points scattered in space
    https://longtimenohack.com/posts/paper_reading/2020icml_gropp_implicit/image-20201228164827136.png

overview

  • Given a raw input point cloud \(\mathcal{X}=\{x_i\}_{i\in I} \subset \mathbb{R}^3\), with or without normal data \(\mathcal{N}=\{n_i\}_{i\in I} \subset \mathbb{R}^3\), learn a plausible surface \(\mathcal{M}\)
  • The usual losses for learning an SDF:
    zero function value (and ground-truth normals) where there is data;
    unit 2-norm gradients at points distributed in space (where there is no data)
    https://longtimenohack.com/posts/paper_reading/2020icml_gropp_implicit/image-20201228172709924.png
  • However, these losses alone are problematic
    • first, there is no guarantee that an SDF is learned
    • second, even if an SDF is learned, there is no guarantee it is a plausible one
  • This paper proves theoretically that running gradient descent on the above loss avoids bad critical solutions
    • the analysis starts from the linear (planar) problem, and this property is called plane reduction
      https://longtimenohack.com/posts/paper_reading/2020icml_gropp_implicit/image-20201228173842422.png
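The two loss ingredients are easy to write down; a numpy sketch using the exact unit-sphere SDF as the "network" (so both terms should be near zero), with gradients by central differences instead of autograd:

```python
import numpy as np

def f(p):
    """Stand-in for the network: exact unit-sphere SDF (a 'good' solution)."""
    return np.linalg.norm(p, axis=-1) - 1.0

def grad_f(p, h=1e-5):
    """Gradient by central differences (autograd in the real method)."""
    g = np.zeros_like(p)
    for k in range(3):
        e = np.zeros(3); e[k] = h
        g[:, k] = (f(p + e) - f(p - e)) / (2 * h)
    return g

def igr_loss(x_data, x_space):
    """IGR's two ingredients: zero value at the data points,
    unit-norm gradient (Eikonal term) at points scattered in space."""
    data_term = np.mean(np.abs(f(x_data)))
    eik_term = np.mean((np.linalg.norm(grad_f(x_space), axis=1) - 1.0) ** 2)
    return data_term + eik_term

rng = np.random.default_rng(0)
x_data = rng.normal(size=(64, 3))
x_data /= np.linalg.norm(x_data, axis=1, keepdims=True)   # points on the sphere
x_space = rng.uniform(-1.5, 1.5, size=(64, 3))
loss = igr_loss(x_data, x_space)   # ~0 for an exact SDF
```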
<PatchNets> PatchNets: Patch-based generalizable deep implicit 3D shape representations
 

Motivation

  • mid-level patch-based SDF
  • At the patch level, objects of different categories share similarities; exploiting this similarity yields a more generalizable model
  • These patch-based representations are learned in a canonical space
  • A representation learned from one ShapeNet category can represent very detailed shapes of any other category, and can be trained with fewer shapes
  • https://longtimenohack.com/posts/paper_reading/2020eccv_tretschk_patchnets/image-20201217115545052.png

Overview

  • auto-decoder
  • losses: a reconstruction loss, a guidance loss on the patch extrinsics, plus regularization

extrinsic loss

  • This loss ensures all patches contribute to the surface and lie in the canonical space
  • Patch extrinsics of the i-th object: \(\boldsymbol{e}_i=[\boldsymbol{e}_{i,0},\boldsymbol{e}_{i,1},\ldots,\boldsymbol{e}_{i,N_P-1}]\)
  • \(\mathcal{L}_{ext}(\boldsymbol{e}_i) = \mathcal{L}_{sur}(\boldsymbol{e}_i) + \mathcal{L}_{cov}(\boldsymbol{e}_i) + \mathcal{L}_{rot}(\boldsymbol{e}_i) + \mathcal{L}_{scl}(\boldsymbol{e}_i) + \mathcal{L}_{var}(\boldsymbol{e}_i) \)
  • \(\mathcal{L}_{sur}(\boldsymbol{e}_i)\) keeps every patch close to the surface
    • \(\underset{\rm patches}{\max}\big[\underset{p\,\in\,{\rm surface}}{\min}\ {\rm dist}(p,{\rm patch})\big]\), i.e. for each patch take the minimum distance from the surface points, then take the worst patch
  • \(\mathcal{L}_{cov}(\boldsymbol{e}_i)\): symmetric coverage loss, encourages every surface point to be covered by at least one patch
  • \(\mathcal{L}_{rot}(\boldsymbol{e}_i)\) aligns patches with surface normals
  • \(\mathcal{L}_{scl}(\boldsymbol{e}_i)\) encourages patches to be reasonably small, preventing significant overlap between patches
  • \(\mathcal{L}_{var}(\boldsymbol{e}_i)\) encourages all patches to have similar sizes

result

  • https://longtimenohack.com/posts/paper_reading/2020eccv_tretschk_patchnets/image-20201217122000974.png
  • https://longtimenohack.com/posts/paper_reading/2020eccv_tretschk_patchnets/image-20201217122618035.png
<OverfitSDF> Overfit neural networks as a compact shape representation
 

Motivation

  • Current DeepSDF work tends toward category-level generalization/generation;
  • This paper argues that overfitting the SDF of one concrete shape yields a more compact representation of its mesh, taking less space than the explicit mesh
  • It also performs many shape-specific optimizations, e.g. importance-based sampling, some biased points, etc.
  • https://longtimenohack.com/posts/paper_reading/2020arxiv_davies_overfit/image-20201223092223714.png

initialization / priors for auto-decoders

<MetaSDF> Metasdf: Meta-learning signed distance functions
 

Review

  • DeepSDF / deep implicit field methods tend to favor auto-decoders, because set encoders suffer from underfitting
  • An auto-decoder also needs inference at test time, which takes many optimization iterations, so a single inference is slow;
    hence meta-learning (MAML-style) is used to find a good initial code for the auto-decoder optimization,
    so that test-time inference reaches good results within only a few iterations
  • https://longtimenohack.com/posts/paper_reading/2020arxiv_sitzmann_metasdf/image-20201210102054355.png
Learned initializations for optimizing coordinate-based neural representations
 

Review

  • A general analysis of the meta-learning behavior of various coordinate-based methods

Motivation

  • Uses a meta-learned initialization when auto-decoding coordinate-based neural representations
  • https://longtimenohack.com/posts/paper_reading/2021cvpr_tancik_learned/image-20201215193905125.png
  • Difference from MetaSDF: extends further to many more kinds of neural coordinate-based signals, and develops the power of using initial weight settings into a form of prior information
Deep optimized priors for 3d shape modeling and reconstruction
 

Motivation

  • Many existing methods start test-time optimization from fixed trained priors
  • This paper proposes to keep optimizing the learned prior from physical measurements even after training

better iso-surface

<MeshSDF> Meshsdf: Differentiable iso-surface extraction
 

highlight

  • differentiable iso-surface extraction

review

  • Similar in spirit to DVR: first derive by hand the gradient of the surface point coordinates w.r.t. the network parameters; at computation time, obtain the point coordinates with a sampling-based method, then plug them into the hand-derived gradient expression to form the complete backpropagation chain
  • The hand derivation of this gradient uses a special property of SDFs (the gradient of the function value at a point is the normal at that point), so it does not apply to general implicit occupancy fields
<Iso-Points> Iso-points: Optimizing neural implicit surfaces with hybrid representations
 

Motivation

  • For current methods that learn surfaces as deep implicit fields from input point clouds, accurate and robust reconstruction during optimization remains very challenging
  • This paper proposes points on the iso-surface as an additional explicit representation, computed and updated on-the-fly, which effectively improves the convergence rate and final quality

overview

  • Goal: given a neural implicit function \(f_t(\boldsymbol{\rm p};\theta_t)\) at the t-th iteration, efficiently generate and utilize a dense, uniformly distributed set of iso-points (points on the zero level set)
    • these iso-points can be used to
      • improve the sampling of the training data
      • provide regularization during optimization

iso-surface sampling: how to obtain uniformly distributed points on the iso-surface

  • https://longtimenohack.com/posts/paper_reading/2021cvpr_yifan_iso/image-20201223112811012.png
  • projection: projecting a point onto the iso-surface can be viewed as estimating a root of an equation with Newton's method from a given starting point
    • note this is very similar to how Kui Jia's Analytic Marching finds its initial surface point
    • given the implicit function \(f(\boldsymbol{\rm p}): \mathbb{R}^3\rightarrow \mathbb{R}\) and an initial point \(\boldsymbol{\rm q}_0\in\mathbb{R}^3\),
      Newton root finding: \(\boldsymbol{\rm q}_{k+1}=\boldsymbol{\rm q}_{k}-J_f(\boldsymbol{\rm q}_k)^+ f(\boldsymbol{\rm q}_k)\), where \(J_f(\boldsymbol{\rm q}_k)^+\) is the Moore-Penrose pseudo-inverse of the Jacobian
    • \(J_f\) is a row 3-vector, so \(J_f(\boldsymbol{\rm q}_k)^+ = \frac {J_f^{\top}(\boldsymbol{\rm q}_k)} {\lVert J_f(\boldsymbol{\rm q}_k) \rVert^2}\), where \(J_f(\boldsymbol{\rm q}_k)\) can be computed directly by backpropagation
    • however, since contemporaneous work often adopts sine activation functions or positional encoding, the SDF is very noisy and its gradient highly non-smooth; applying Newton's method directly leads to overshooting and oscillation
    • more sophisticated line-search algorithms could of course be used, but a simple clipping operation is used here instead
    • initialization of the set \(\mathcal{Q}_t\): a unit sphere shape at the very beginning, then \(\mathcal{Q}_{t-1}\) afterwards
    • at most 10 Newton iterations, with the stopping threshold gradually shrinking from \(10^{-4}\) to \(10^{-5}\)
  • uniform resampling
    • iteratively moves points away from high-density regions
    • this step no longer involves f; the move directions are defined entirely by neighboring points
  • upsampling
    • based on EAR (edge-aware resampling)
      • Edge-aware point set resampling, SIGGRAPH Asia 2013
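The projection step above can be sketched in numpy: the network is replaced by an exact sphere SDF, the Jacobian by central differences, and the paper's clipping is approximated by a simple step-norm cap (the iteration budget and threshold follow the numbers quoted above):

```python
import numpy as np

def f(q):
    """Implicit function (stand-in for the network): unit-sphere SDF."""
    return np.linalg.norm(q) - 1.0

def jac(q, h=1e-5):
    """Row Jacobian J_f(q) by central differences (backprop in practice)."""
    g = np.zeros(3)
    for k in range(3):
        e = np.zeros(3); e[k] = h
        g[k] = (f(q + e) - f(q - e)) / (2 * h)
    return g

def project(q, max_iter=10, tau=1e-5, clip=0.2):
    """Iso-Points projection: Newton steps q <- q - J^+ f(q), with
    J^+ = J^T / ||J||^2 for a row 3-vector, plus simple step clipping
    (the paper's cheap alternative to a line search)."""
    for _ in range(max_iter):
        val = f(q)
        if abs(val) < tau:
            break
        J = jac(q)
        step = J * val / np.dot(J, J)   # J^+ f(q)
        n = np.linalg.norm(step)
        if n > clip:                    # clip to avoid overshooting
            step *= clip / n
        q = q - step
    return q

q = project(np.array([0.2, 1.7, -0.4]))   # lands on the zero level set
```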

results

  • https://longtimenohack.com/posts/paper_reading/2021cvpr_yifan_iso/image-20201229153051697.png

analytic exact solution

Analytic marching: An analytic meshing solution from deep implicit surface networks
 

Motivation

  • Much work in deep learning implements implicit surface functions with MLP+ReLU
  • The goal is meshing (exactly recovering meshes) from the learned implicit functions (MLP+ReLU)
    • existing methods all effectively adopt the standard marching cubes sampling algorithm; it works reasonably well, but loses the precision of the learned MLP due to its inherently discrete nature
    • based on the fact that a ReLU-based MLP partitions the input space into many linear regions, this paper identifies these regions as analytic cells and analytic faces associated with the zero level set of the implicit function
    • it derives the theoretical conditions under which the identified analytic faces are guaranteed to form a closed, piecewise planar surface
    • based on these derivations, it proposes a parallelizable algorithm that marches over the analytic cells to ==exactly recover== the mesh learned by the MLP

overview

  • Initialization of the algorithm: first find one point on the surface by SGD on \(\underset {\boldsymbol{x}\in\mathbb{R}^3}{\min} \lvert F(\boldsymbol{x}) \rvert\)
  • https://longtimenohack.com/posts/paper_reading/2020icml_lei_analytic/image-20201223105803235.png

Results: an analytic solution is simply in a different league: unlimited precision (exact solution), and runs on a CPU more than ten times faster than others on a GPU

  • https://longtimenohack.com/posts/paper_reading/2020icml_lei_analytic/image-20201209113035559.png
  • https://longtimenohack.com/posts/paper_reading/2020icml_lei_analytic/image-20201209111706863.png
  • https://longtimenohack.com/posts/paper_reading/2020icml_lei_analytic/image-20201209105817256.png
  • https://longtimenohack.com/posts/paper_reading/2020icml_lei_analytic/image-20201209105846197.png

reconstruction from view

<DISN> DISN: Deep implicit surface network for high-quality single-view 3D reconstruction
 

review

  • Training does use ground-truth SDF data of the 3D shape; the image feature merely provides an auxiliary code input

Motivation

  • The learned shape should have not only good global features but also fine-grained local details

overview

  • Uses both global and local features to infer the SDF
    https://longtimenohack.com/posts/paper_reading/2019nips_xu_disn/image-20201209122023941.png
Learning to infer implicit surfaces without 3D supervision
 

Motivation

  • https://longtimenohack.com/posts/paper_reading/2019nips_liu_learning/image-20201207222950762.png
  • implicit occupancy field
    https://longtimenohack.com/posts/paper_reading/2019nips_liu_learning/image-20201207195643392.png

differentiable renderer

<DVR> Differentiable volumetric rendering: Learning implicit 3d representations without 3d supervision
 

review

  • Idea: first derive by hand the gradient of the intersection point of each camera ray with the implicit surface w.r.t. the network parameters; at computation time, the intersection coordinates can be found by sampling along the camera ray (similar to bisection), then plugged into the hand-derived expression to form the complete backpropagation chain
<DIST> Dist: Rendering deep implicit signed distance function with differentiable sphere tracing
 

Review

  • The paper contains very detailed explanations of many technical points; well worth reading

  • sphere tracing
    https://longtimenohack.com/posts/paper_reading/2020cvpr_liu_dist/image-20201215111200177.png

  • Trains one neural network that predicts both signed distance and color for every 3D location

  • Requires ground-truth silhouettes

Motivation

  • Adds a differentiable renderer on top of SDFs, building a bridge between inverse-graphics models and deep implicit surface fields
  • Solving vision problems as an inverse graphics process is one of the fundamental approaches, where the solution is the visual structure that best explains the given observations
    • long used in the field of 3D geometry understanding (1974, 1999, etc.)
    • this usually requires an efficient renderer to accurately simulate the observations (e.g. depth maps) from an optimizable 3D structure, and it must be differentiable so that errors of local observations can be backpropagated
    • (first) a differentiable renderer for learning-based SDF
  • Uses a differentiable renderer to differentiably render a learning-based SDF into depth images, surface normals, and silhouettes, from arbitrary camera viewpoints
  • Applications: inferring 3D shape from various inputs, e.g. multi-view images and a single depth image

overview

  • https://longtimenohack.com/posts/paper_reading/2020cvpr_liu_dist/image-20201215111010407.png

  • [auto-decoder] Given a pre-trained generative model, e.g. DeepSDF, search the latent code space for the 3D shape most consistent with the given observations

  • https://longtimenohack.com/posts/paper_reading/2020cvpr_liu_dist/image-20210105074006003.png

  • [sphere tracing] Uses a sphere-tracing-like framework for differentiable rendering

    • applying sphere tracing directly requires repeated queries of the network and produces a recursive computation graph during backpropagation (note: just like SRN), which costs time and memory; both the forward and backward passes therefore need optimization
    • the sphere-traced results (i.e. distances along camera rays) can produce various outputs such as depth maps, surface normals, and silhouettes, so losses can conveniently be formed in an end-to-end manner
    • forward pass
    • https://longtimenohack.com/posts/paper_reading/2020cvpr_liu_dist/image-20210105073826931.png
  • https://longtimenohack.com/posts/paper_reading/2020cvpr_liu_dist/image-20201215164421303.png

    • a coarse-to-fine approach saves computation in the initial steps
      • in the first few sphere-tracing steps, the rays of different pixels are very close to each other
      • tracing starts from 1/4 of the image resolution, and every 3 steps each pixel is split into 4
      • after 6 steps, every pixel at full resolution has its own ray, which keeps marching until convergence
    • an aggressive strategy accelerates ray marching
      • the marching step length is \(\alpha=1.5\) times the queried SDF value
      • marches toward the surface faster when far from it
      • accelerates convergence in ill-posed cases (when the angle between the surface normal and the ray direction is small)
        • Q: what?
      • rays can pierce the surface and sample its interior (SDF<0), so supervision can be applied on both sides of the surface
    • dynamic synchronized inference
    • a safe convergence criterion avoids unnecessary network queries while preserving resolution
  • Backward pass

    • uses an approximation of the SDF gradient; this barely affects training while significantly reducing computation and memory usage
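The aggressive marching rule (step = 1.5 × the queried SDF value) can be sketched as follows, with an exact sphere SDF standing in for the network; note how a negative query value pulls the ray back after it pierces the surface:

```python
import numpy as np

def sdf(p):
    """Stand-in for the network: sphere of radius 0.5 at the origin."""
    return np.linalg.norm(p) - 0.5

def sphere_trace(origin, direction, alpha=1.5, max_steps=50, eps=1e-5):
    """DIST-style marching: step length = alpha * queried SDF value.
    alpha=1.5 is aggressive, so the ray may pierce the surface (SDF<0),
    which the paper exploits to supervise both sides of the surface."""
    direction = direction / np.linalg.norm(direction)
    t = 0.0
    for _ in range(max_steps):
        d = sdf(origin + t * direction)
        if abs(d) < eps:
            break
        t = t + alpha * d   # overshoot is possible; negative d marches back
    return t                # depth along the ray

t = sphere_trace(np.array([0.0, 0.0, -2.0]), np.array([0.0, 0.0, 1.0]))
# the first hit is at depth 2.0 - 0.5 = 1.5
```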

Experiments

  • Convergence speed
    https://longtimenohack.com/posts/paper_reading/2020cvpr_liu_dist/image-20201215112912115.png
  • Texture Re-rendering
    https://longtimenohack.com/posts/paper_reading/2020cvpr_liu_dist/image-20201215114347510.png
  • Shape Completion from Sparse Depths
    https://longtimenohack.com/posts/paper_reading/2020cvpr_liu_dist/image-20201215114703816.png
  • Shape Completion over Different Sparsity
    https://longtimenohack.com/posts/paper_reading/2020cvpr_liu_dist/image-20201215114227972.png
  • Inverse Optimization over Camera Extrinsics
    https://longtimenohack.com/posts/paper_reading/2020cvpr_liu_dist/image-20201215113343946.png
  • Multi-view Reconstruction from Video Sequences
    https://longtimenohack.com/posts/paper_reading/2020cvpr_liu_dist/image-20201215115145581.png
<SDF-SRN> SDF-SRN: Learning signed distance 3D object reconstruction from static images
 

Review

  • Training requires single-view silhouettes

Motivation

  • Past methods for single-view 3D object reconstruction usually rely on ground-truth 3D shapes
  • Recent methods can do without 3D supervision, but still require training-time multi-view silhouette annotations of the same instance, so most of them can only handle synthetic datasets
  • This paper proposes SDF-SRN, which requires only single-view image input (+silhouettes, at training time only)
    https://longtimenohack.com/posts/paper_reading/2020nips_lin_sdf/image-20201221153940813.png

overview

  • Single-view settings generally need an encoder
    https://longtimenohack.com/posts/paper_reading/2020nips_lin_sdf/image-20201215173359177.png

Results

  • The learned shapes look rather odd; still, the input is images only, and silhouettes are needed only at training time
    https://longtimenohack.com/posts/paper_reading/2020nips_lin_sdf/image-20201221153429857.png
  • Color reconstruction quality is also mediocre
    https://longtimenohack.com/posts/paper_reading/2020nips_lin_sdf/image-20201221155058978.png
<SDFDiff> Sdfdiff: Differentiable rendering of signed distance fields for 3d shape optimization
 

Review

  • Requires segmented multi-view images

Motivation

  • image-based shape optimization using differentiable rendering of 3D shapes represented by SDF
    • advantages of SDF as a shape representation: it can represent shapes of arbitrary topology, and guarantees watertightness

Overview

  • learn SDF on a 3D grid

  • perform ray-casting via sphere tracing

  • differentiable renderer

    • a voxelized SDF is learned; the SDF at any continuous position is then obtained by (tri)linear interpolation
    • the derivative of a given pixel value depends only on the 8 neighboring voxels used in the interpolation
      • in other words, sphere tracing itself does not need to be differentiable
      • only the local computation over the 8 local neighbors needs to be differentiable
  • energy function & losses

    • from the geometry, camera position, etc. \(\Theta\), an image \(I\) can be rendered: \(I=R(\Theta)\);
      inverse rendering is then \(\Theta=R^{-1}(I)\);
      but rendering is not directly invertible, so the problem is modeled as an energy minimization problem
      \(\Theta^*=\underset{\Theta}{\arg\min} \mathcal{L}_{img}(R(\Theta),I)\)
    • the key is a differentiable renderer; this paper emphasizes shape: input camera pose and shape, output a rendered image
    • \(\mathcal{L}_{img}\) measures the difference between the rendered image and \(I\)
    • \(\mathcal{L}_{reg}\) is a regularization term ensuring \(\Theta\) is a valid signed distance field (i.e. unit-norm gradients);
      in practice the gradient is approximated by finite differences (\(\Delta\))
  • single view: encode the image into a sparse voxelized SDF, refine with some 3D convolutions, then pass through the differentiable renderer to an image
    https://longtimenohack.com/posts/paper_reading/2020cvpr_jiang_sdfdiff/image-20201222103114471.png

  • multi view: trained directly as an auto-decoder

results

  • single view
    https://longtimenohack.com/posts/paper_reading/2020cvpr_jiang_sdfdiff/image-20201222110938605.png
<IDR> Multiview neural surface reconstruction by disentangling geometry and appearance
 

Review

  • Training requires multi-view segmented images, but not necessarily camera poses (unposed images)
  • IDR = implicit differentiable renderer
  • Has a bit of an SDF-meets-NeRF flavor, since the color comes jointly from the coordinate position, the geometry parameters, and the viewing direction
  • The derivations are quite thorough, since the values have derivative expressions not only w.r.t. the geometry parameters but also w.r.t. the camera parameters
  • The main point of comparison is DVR

Motivation

  • A differentiable renderer for SDFs, used to reconstruct 3D surfaces from multi-view images, with unknown camera parameters
  • DVR is very similar to this paper, but DVR cannot handle generalized appearance, nor unknown, heavily noisy camera positions
  • Advantages of SDF
    • ray casting can be done efficiently via sphere tracing
    • smooth, realistic surfaces
  • https://longtimenohack.com/posts/paper_reading/2020nips_yariv_multiview/image-20201222112206328.png

Overview

  • Represent the 3D surface as the zero level set of a deep implicit field \(f\)
    \(\mathcal{S}_{\theta}=\{ \boldsymbol{x}\in\mathbb{R}^3 \vert f(\boldsymbol{x},\theta)=0 \}\)
    • To avoid the everywhere-zero solution, \(f\) is usually regularized, e.g. as an SDF; this paper uses implicit geometric regularization (IGR)
  • Three unknowns (all of them optimized): geometry \(\theta\in\mathbb{R}^m\), appearance \(\gamma\in\mathbb{R}^n\), and camera parameters \(\tau\in\mathbb{R}^k\)
    • Note that in this paper the camera parameters are also unknowns being optimized, so every quantity needs a derivative expression not only w.r.t. the geometry parameters \(\theta\) but also w.r.t. the camera parameters \(\tau\) (i.e. the camera center \(\boldsymbol{c}\) and view direction \(\boldsymbol{v}\))
  • Model the color/radiance at a pixel as a mapping of the ray-surface intersection \(\boldsymbol{\hat x}_p\), surface normal \(\boldsymbol{\hat n}_p\), view direction \(\boldsymbol{\hat v}_p\), geometry feature \(\boldsymbol{\hat z}_p\), and appearance parameters \(\gamma\)
    \(L_p(\theta,\gamma,\tau)=M(\boldsymbol{\hat x}_p, \boldsymbol{\hat n}_p, \boldsymbol{\hat z}_p, \boldsymbol{\hat v}_p;\gamma)\)
    • Somewhat NeRF-like
    • The intersection point, surface normal, geometry feature, and view direction depend on the geometry \(\theta\) and camera parameters \(\tau\), e.g. \(\boldsymbol{\hat x}_p=\boldsymbol{\hat x}_p(\theta,\tau)\)
    • \(M\) is yet another MLP
  • losses
    • RGB loss: per-pixel L1 norm
    • mask loss: an approximate differentiable mask can be produced during rendering, so a per-pixel cross-entropy loss can be applied directly
    • reg loss: Eikonal regularization, ensuring \(f\) is an SDF, i.e. the network gradient has unit norm; points are sampled uniformly in a bbox
      • \({\rm loss}_E(\theta)=\mathbb{E}_{\boldsymbol{x}}(\lVert \nabla_{\boldsymbol{x}}f(\boldsymbol{x};\theta) \rVert -1)^2\), where \(\boldsymbol{x}\) is uniformly distributed in a bounding box of the scene
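
The Eikonal term can be sanity-checked numerically. A minimal sketch, assuming an analytic sphere SDF and central finite differences in place of autograd (all names here are illustrative):

```python
import math
import random

def sdf_sphere(x, y, z, r=0.7):
    """Exact SDF of a sphere: its gradient has unit norm everywhere but the center."""
    return math.sqrt(x * x + y * y + z * z) - r

def eikonal_loss(f, n_samples=1000, h=1e-4, seed=0):
    """Monte-Carlo estimate of E_x (||grad f(x)|| - 1)^2 over the [-1,1]^3 bbox,
    with central finite differences standing in for network autograd."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x, y, z = (rng.uniform(-1, 1) for _ in range(3))
        gx = (f(x + h, y, z) - f(x - h, y, z)) / (2 * h)
        gy = (f(x, y + h, z) - f(x, y - h, z)) / (2 * h)
        gz = (f(x, y, z + h) - f(x, y, z - h)) / (2 * h)
        total += (math.sqrt(gx * gx + gy * gy + gz * gz) - 1.0) ** 2
    return total / n_samples

loss_true_sdf = eikonal_loss(sdf_sphere)                              # ~0
loss_scaled = eikonal_loss(lambda x, y, z: 2.0 * sdf_sphere(x, y, z))  # ~1
```

A true SDF gives a loss near zero, while a field with gradient norm 2 pays \((2-1)^2=1\) per sample, which is exactly why minimizing this term pushes the network toward a valid signed distance field.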

Differentiable intersections of view directions and geometry

  • Write the intersection point as \(\boldsymbol{\hat x}_ p(\theta,\tau)=\boldsymbol{c}+t(\theta,\boldsymbol{c},\boldsymbol{v})\boldsymbol{v}\); the key is that the scalar \(t\) is a function of \(\theta\), the camera center \(\boldsymbol{c}\), and the view direction \(\boldsymbol{v}\)
  • \(\boldsymbol{\hat x}_p(\theta,\tau)=\boldsymbol{c}+t_0\boldsymbol{v} - \frac {\boldsymbol{v}}{\nabla_x f(\boldsymbol{x}_0;\theta_0) \cdot \boldsymbol{v}_0} f(\boldsymbol{c}+t_0\boldsymbol{v};\theta)\)
    • and this expression is exact in value and first derivatives of \(\theta\) and \(\tau\) at \(\theta=\theta_0, \tau=\tau_0\)
    • Q: what?
  • Derived via implicit function differentiation
  • The normal of an SDF at a point is its gradient, because the gradient's norm is 1
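
The "exact in value and first derivatives" claim can be checked numerically in a hypothetical setup: a sphere SDF with the radius as the single geometry parameter \(\theta=r\), and a fixed axis-aligned ray (here the true intersection is known in closed form, so the formula can be compared against it):

```python
import math

def norm(p):
    return math.sqrt(sum(c * c for c in p))

def f(p, r):
    """SDF of a sphere of radius r centered at the origin."""
    return norm(p) - r

# Fixed ray: camera center c, unit view direction v.
c = (0.0, 0.0, -2.0)
v = (0.0, 0.0, 1.0)
r0 = 0.5                 # current geometry parameter theta_0

# t0: ray distance to the surface at theta_0 (closed form for this geometry).
t0 = 2.0 - r0
x0 = tuple(c[i] + t0 * v[i] for i in range(3))

# grad_x f at x0 for a sphere is x0 / ||x0||.
g = tuple(x0[i] / norm(x0) for i in range(3))
gdotv = sum(g[i] * v[i] for i in range(3))

def x_hat(r):
    """The paper's linearized intersection: c + t0 v - v/(grad f . v) f(c + t0 v; r)."""
    fv = f(tuple(c[i] + t0 * v[i] for i in range(3)), r)
    return tuple(c[i] + t0 * v[i] - v[i] / gdotv * fv for i in range(3))

# True intersection for this ray is (0, 0, -r), so d z / d r = -1.
h = 1e-6
dz = (x_hat(r0 + h)[2] - x_hat(r0 - h)[2]) / (2 * h)
```

At \(r=r_0\) the formula returns exactly \((0,0,-r_0)\), and its derivative w.r.t. the radius matches the true intersection's derivative \(-1\): the value and first derivatives agree, even though `t0`, `g`, and `gdotv` are held fixed (detached) at \(\theta_0\).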

approximation of the surface light field

masked rendering

  • ==[*]== During rendering, additionally render a differentiable, approximately binary mask

  • How to render the mask:

    • The exact but non-differentiable expression:

      \[S(\theta, \tau) = \begin{cases} 1 & R(\tau) \cap \mathcal{S}_{\theta} \neq \emptyset \\ 0 & \text{otherwise} \end{cases} \]

    • A differentiable, approximately binary expression:

      \[S_{\alpha}(\theta, \tau)=\text{sigmoid} \left( -\alpha \min_{t \geq 0} f(\mathbf{c}+t\mathbf{v}; \theta) \right) \]

      • \( S_{\alpha}(\theta, \tau) \stackrel{ \alpha \rightarrow \infty }{\longrightarrow} S(\theta, \tau)\)

      • With the SDF convention: SDF < 0 inside, SDF > 0 outside

      • By the envelope theorem1, the gradients of \(S_{\alpha}\) w.r.t. the parameters \(\mathbf{c}, \mathbf{v}, \theta\) can be obtained

      • Note that this only produces gradients at the sample point \(\mathbf{c} + t^* \mathbf{v}\) corresponding to the minimizer \(t^*\) of \(f\)
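
The soft mask can be sanity-checked with a dense-sampling stand-in for the minimum over \(t\). A toy sketch assuming a sphere SDF; `alpha` and the sampling density are arbitrary illustrative choices:

```python
import math

def sphere_sdf(p, r=0.5):
    return math.sqrt(sum(c * c for c in p)) - r

def min_sdf_along_ray(f, c, v, t_max=4.0, n=512):
    """Dense sampling stand-in for min_{t >= 0} f(c + t v)."""
    return min(f(tuple(c[i] + (t_max * k / (n - 1)) * v[i] for i in range(3)))
               for k in range(n))

def soft_mask(f, c, v, alpha):
    """S_alpha = sigmoid(-alpha * min_t f(c + t v)); approaches {0,1} as alpha grows."""
    m = min_sdf_along_ray(f, c, v)
    return 1.0 / (1.0 + math.exp(alpha * m))

c = (0.0, 0.0, -2.0)
hit = soft_mask(sphere_sdf, c, (0.0, 0.0, 1.0), alpha=50.0)    # ray through the sphere
miss = soft_mask(sphere_sdf, c, (0.6, 0.0, 0.8), alpha=50.0)   # ray passing outside
soft_hit = soft_mask(sphere_sdf, c, (0.0, 0.0, 1.0), alpha=5.0)
```

A ray that pierces the surface has \(\min_t f < 0\), so the sigmoid saturates toward 1; a missing ray has \(\min_t f > 0\) and saturates toward 0; a smaller `alpha` gives a softer (more gradient-friendly) mask value in between.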

results

  • https://longtimenohack.com/posts/paper_reading/2020nips_yariv_multiview/image-20201222121157452.png
  • Appearance transfer is possible
    https://longtimenohack.com/posts/paper_reading/2020nips_yariv_multiview/image-20201222122008369.png

supp

Disentangling shape and appearance: note the effect of conditioning on the normal vector

https://longtimenohack.com/posts/paper_reading/2020nips_yariv_multiview/image-20210727211732861.png

Implementation

  • ⚠️ Camera re-normalization, so that the visual hulls are contained within a unit sphere

    • we re-normalize the cameras so that their visual hulls are contained in the unit sphere

    • As described in the code repo: note that it is the visual hull of the observed object that should lie approximately inside the unit sphere

      • The ‘scaled_mat_{i}’ matrices in the ‘camera.npz’ file are the normalization matrices, meant to re-normalize the cameras so that the visual hull of the **observed object** is approximately inside the unit sphere.

      • This normalization matrix is computed from the object mask annotations and the camera projection matrices; the script lives at code/preprocess/preprocess_camera.py

      • Roughly, the code uses camera_0's image and mask as reference, then computes for each mask pixel the min/max depth across the other views; from the resulting set of min/max points, the mean is taken directly as the centroid and the per-dimension standard deviation as the scale

    • 🤔 Consideration: this re-normalization presumably only suits surround-style (inward-facing) camera rigs
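
The centroid/scale idea described above might look like the following sketch. This is hypothetical illustration only; the actual point gathering, script, and matrix convention live in the repo's preprocessing code:

```python
import math

def normalization_from_points(points):
    """Mean of the gathered min/max depth points as the centroid,
    per-axis standard deviation as the scale (as the note above describes)."""
    n = len(points)
    centroid = [sum(p[i] for p in points) / n for i in range(3)]
    scale = [math.sqrt(sum((p[i] - centroid[i]) ** 2 for p in points) / n)
             for i in range(3)]
    return centroid, scale

def normalize(p, centroid, scale):
    """Map a world point into the (roughly unit-sphere) normalized frame."""
    return [(p[i] - centroid[i]) / scale[i] for i in range(3)]

# Toy example: four corner points of a box around (3, 1, 6).
pts = [(2.0, 0.0, 5.0), (4.0, 0.0, 5.0), (2.0, 2.0, 7.0), (4.0, 2.0, 7.0)]
centroid, scale = normalization_from_points(pts)
```

With this convention the object's surface points end up within roughly unit distance of the origin along each axis; the real preprocessing packs the same centroid/scale into a 4x4 normalization matrix.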


  1. The gradient of a function's minimum/maximum value w.r.t. the parameters involves only the direct dependence on those parameters, not the dependence through the minimizer/maximizer ↩︎

compositional

Semi-supervised learning of multi-object 3D scene representations
 

Review

  • Only uses CLEVR-style datasets, and even then only simple low-resolution renderings; the experiments are fairly basic

Motivation

  • Represents the scene as multiple objects
  • Input: an RGB image; a recurrent encoder regresses each object's shape, pose, and texture; shape is represented via an SDF
  • The semi-supervision: training uses RGB-D, while testing only needs RGB
  • A single view sees all objects; the number of objects is known
    https://longtimenohack.com/posts/paper_reading/2020arxiv_elich_semi/image-20201222094044348.png

Overview

  • First, the SDF (decoder) is trained with supervision on example shapes;
  • then the differentiable renderer and recurrent encoder are trained self-supervised from RGB-D
  • Q: can the recurrent module really be designed this way?
    https://longtimenohack.com/posts/paper_reading/2020arxiv_elich_semi/image-20201222090334334.png
  • The recurrence's main purpose is to produce the object codes iteratively, one object at a time, somewhat like the earlier *Multi-object representation learning with iterative variational inference* paper
    Each object outputs a depth estimate, an image estimate, and an occlusion mask
    https://longtimenohack.com/posts/paper_reading/2020arxiv_elich_semi/image-20201222091509810.png

results

  • https://longtimenohack.com/posts/paper_reading/2020arxiv_elich_semi/image-20201222090527412.png