DL methods for shape as parametric surfaces

DL methods that treat [shape] as a parametric surface in space


learning parametric surface

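  • keywords
    • neural parametric surface
    • parametric surface generation/generative
  • overview
    • a surface is expressed by a parametric equation \([x(s,t),y(s,t),z(s,t)]\)
    • the mapping from \((s,t)\) to \((x,y,z)\) can be constructed explicitly by hand, or learned implicitly by a neural network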
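As an illustration of the explicit, hand-built case, here is a minimal sketch (assuming Python 3.9+; the sphere and the grid sampler are hypothetical examples, not taken from any of the papers below). It maps \((s,t)\in[0,1]^2\) to a point on a sphere and evaluates the map on a grid:

```python
import math

def sphere(s: float, t: float, r: float = 1.0) -> tuple[float, float, float]:
    """Explicit parametric surface: (s, t) in [0,1]^2 -> point on a sphere of radius r."""
    theta = math.pi * s          # polar angle in [0, pi]
    phi = 2.0 * math.pi * t      # azimuth in [0, 2*pi]
    x = r * math.sin(theta) * math.cos(phi)
    y = r * math.sin(theta) * math.sin(phi)
    z = r * math.cos(theta)
    return (x, y, z)

def sample_grid(f, n: int):
    """Evaluate the parametric map f on an n x n grid of (s, t) parameters."""
    return [f(i / (n - 1), j / (n - 1)) for i in range(n) for j in range(n)]

points = sample_grid(sphere, 8)
```

A learned parametric surface replaces `sphere` with a neural network whose output depends on \((s,t)\), and possibly on a shape code.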

continuous patches

<AtlasNet> A papier-mâché approach to learning 3d surface generation
 
  • https://longtimenohack.com/posts/paper_reading/2018cvpr_groueix_papier/image-20201208004500075.png

Motivation

  • represents a surface as a collection of parametric surface elements
  • the learned family of mappings from the unit square to local 2-manifolds closely resembles an atlas of the surface
  • every generated 3D point therefore comes with a 2D UV value

overview

  • https://longtimenohack.com/posts/paper_reading/2018cvpr_groueix_papier/image-20201208004950236.png
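  • the point-cloud baseline decodes a latent shape code directly into a set of points
  • this method additionally takes as input a 2D point sampled uniformly from the unit square and uses it to generate a single point on the surface
    • it learns the parameterization of such 2-manifolds (i.e. two-dimensional manifolds) from point clouds / data
    • it belongs to the parametric-approaches branch
    • ==In essence, this is a mapping from a uniform 2D distribution to a distribution on a 2-manifold in space, conditioned on a shape code==
  • it extends easily to several such mappings, representing a 3D shape as the union of several surface elements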
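A minimal sketch of this decoder idea, assuming Python and random (untrained) weights; `make_mlp`, `atlas_decode` and all sizes are hypothetical, and training with a Chamfer loss is omitted. Each of the K patches gets its own MLP, which takes a uniformly sampled \((u,v)\) concatenated with the shape code and outputs one 3D point:

```python
import random

def make_mlp(sizes, rng):
    """Random, untrained weights for an MLP with layer widths `sizes` (stand-in for a trained decoder)."""
    return [([[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out)
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def mlp_forward(layers, x):
    """ReLU after every hidden layer, linear output layer."""
    for k, (W, b) in enumerate(layers):
        x = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
        if k < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x

def atlas_decode(patches, z, n_per_patch, rng):
    """Each patch MLP maps a uniform sample (u, v) in [0,1]^2, concatenated with shape code z, to a 3D point."""
    points = []
    for mlp in patches:
        for _ in range(n_per_patch):
            uv = [rng.random(), rng.random()]
            points.append(mlp_forward(mlp, uv + z))
    return points

rng = random.Random(0)
latent_dim, n_patches = 4, 3
z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]        # shape code (would come from an encoder)
patches = [make_mlp([2 + latent_dim, 16, 16, 3], rng) for _ in range(n_patches)]
cloud = atlas_decode(patches, z, n_per_patch=50, rng=rng)   # union of the patches = the shape
```

Because \((u,v)\) is a continuous input, each patch can be resampled at any density after training.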

locally parameterized surface generation

  • treat the surface as a generalized 2-manifold (self-intersections and disjoint sets are allowed) and consider local parameterizations:
    consider a 2-manifold \(\mathcal{S}\), a point \(\boldsymbol{p} \in \mathcal{S}\), and a parameterization \(\varphi\) of \(\mathcal{S}\) in a local neighborhood of \(\boldsymbol{p}\)
  • assume this local parameterization is a mapping \(\varphi_{\theta}(x)\) from the unit square \([0,1]^2\) to a 2-manifold \(\mathcal{S}_{\theta}\), i.e. \(\mathcal{S}_\theta=\varphi_{\theta}([0,1]^2)\);
    \(\mathcal{S}_{\theta}\) is used to approximate the local 2-manifold \(\mathcal{S}_{loc}\)
  • i.e. find the parameters \(\theta\) that minimize the objective \(\underset{\theta}{\min}\,\mathcal{L}(\mathcal{S}_\theta,\mathcal{S}_{loc})+\lambda\mathcal{R}(\theta)\)
    where \(\mathcal{L}\) is a loss between two 2-manifolds and \(\mathcal{R}\) is a regularization term on the parameters \(\theta\);
    in practice the loss is not computed between the two 2-manifolds themselves but between point sets sampled from them, using the Chamfer and Earth Mover's distances
  • shows that an MLP with ReLU activations can produce 2-manifolds
  • shows that the 2-manifolds produced by MLP+ReLU can be learned to approximate target 2-manifolds well,
    using the universal approximation theorem:
    Approximation capabilities of multilayer feedforward networks. Neural Networks, 1991
  • polygon mesh
  • establishing a correspondence between a 3D shape and a 2D domain is a long-standing problem in geometry processing, with applications such as texture mapping, re-meshing and shape correspondence
  • previous methods required the input data to be parameterized already; this paper learns the parameterization directly from point clouds
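The practical loss described above can be sketched in Python as follows. The brute-force EMD is only meant for tiny, equal-size point sets (real implementations use approximate assignment solvers), and taking \(\mathcal{R}\) as a squared L2 norm is our assumption, since the notes do not specify it:

```python
import itertools
import math

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def chamfer(A, B):
    """Symmetric Chamfer distance: mean squared distance to the nearest neighbour, in both directions."""
    a_to_b = sum(min(sq_dist(p, q) for q in B) for p in A) / len(A)
    b_to_a = sum(min(sq_dist(q, p) for p in A) for q in B) / len(B)
    return a_to_b + b_to_a

def emd_bruteforce(A, B):
    """Earth Mover's distance between equal-size sets: mean distance under the best
    one-to-one matching. Brute force over permutations, so only for tiny sets."""
    assert len(A) == len(B)
    best = min(sum(math.dist(A[i], B[j]) for i, j in enumerate(perm))
               for perm in itertools.permutations(range(len(B))))
    return best / len(A)

def objective(pred_pts, target_pts, theta, lam=1e-3):
    """L(S_theta, S_loc) + lam * R(theta), with L = Chamfer on sampled points and
    R = squared L2 norm of the parameters (assumption)."""
    return chamfer(pred_pts, target_pts) + lam * sum(w * w for w in theta)
```

Chamfer only looks at nearest neighbours, so many predicted points may collapse onto one target point; EMD forces a one-to-one matching, which is why the two are often used together.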
<Deep geometric prior> Deep geometric prior for surface reconstruction
 

Motivation

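  • first split the input point cloud into several overlapping parts, then fit each part with a manifold parameterized by an MLP;
  • each local manifold is learned with a 2-Wasserstein loss / EMD loss,
    and consistency is enforced between all the manifolds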
  • https://longtimenohack.com/posts/paper_reading/2019cvpr_walliams_deep/image-20201228171157982.png
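One simple way to obtain overlapping parts is to take all points within a fixed radius of well-spread seed points, so that neighbouring parts share points. The Python sketch below rests on our own assumptions (farthest point sampling for the seeds, a fixed ball radius); the paper's actual partitioning may differ, and the per-part MLP fitting with an EMD loss is not shown:

```python
import math
import random

def farthest_point_seeds(points, k, rng):
    """Pick k well-spread seed indices by farthest point sampling."""
    seeds = [rng.randrange(len(points))]
    dist = [math.dist(p, points[seeds[0]]) for p in points]
    while len(seeds) < k:
        nxt = max(range(len(points)), key=lambda i: dist[i])
        seeds.append(nxt)
        dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(dist, points)]
    return seeds

def overlapping_patches(points, k, radius, rng):
    """Each part = indices of all points within `radius` of a seed; neighbouring balls overlap."""
    seeds = farthest_point_seeds(points, k, rng)
    return [[i for i, p in enumerate(points) if math.dist(p, points[s]) <= radius] for s in seeds]

rng = random.Random(1)
pts = [(rng.random(), rng.random(), 0.0) for _ in range(200)]   # toy planar point cloud
parts = overlapping_patches(pts, k=8, radius=0.75, rng=random.Random(2))
```

The shared points in the overlaps are what a consistency term between neighbouring manifolds can act on.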

results

  • https://longtimenohack.com/posts/paper_reading/2019cvpr_walliams_deep/image-20201228174443654.png
<Pix2Surf> Pix2surf: Learning parametric 3d surface models of objects from images
 

Result

  • comment: the learned surfaces are not necessarily closed
  • https://longtimenohack.com/posts/paper_reading/2020eccv_lei_pix2surf/image-20201207204146033.png https://longtimenohack.com/posts/paper_reading/2020eccv_lei_pix2surf/image-20201207204206853.png

Motivation

  • learning to generate 3D parametric surface representations for novel object instances, as seen from one or more views
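  • uses 2D patches as the UV parameterization, handles multiple non-adjacent views, and establishes correspondences between 2D pixels and 3D surface points
  • surfaces represented by implicit functions need an expensive post-processing step, such as Marching Cubes, to obtain an explicit surface; this paper learns to generate explicit surfaces directly

Main contributions

  • high-quality parametric surfaces that are consistent across multiple views
  • the generated 3D surfaces preserve exact correspondences between image pixels and 3D surface points, so texture information can be lifted to reconstruct shapes with rich geometry and appearance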

Cited works that directly reconstruct a parametric representation of a shape's surface

  • class-specific templates (canonical template / mean shape in canonical space)
    shape templates designed manually for each category
    • [ECCV2018] Learning category-specific mesh reconstruction from image collections.
    • [ICCV2019] Canonical surface mapping via geometric cycle consistency
  • general structured templates
    methods that learn generic shape templates applicable to any category (to cope with different shapes and topologies)
    • [ICCV2019] Learning shape templates with structured implicit functions.
  • more generic surface representations
    • mesh deformation
      • [ECCV2018] Pixel2mesh: Generating 3d mesh models from single rgb images.
      • [ICCV2019] Pixel2mesh++: Multi-view 3d mesh generation via deformation
      • [CVPR2019] 3DN: 3d deformation network.
    • differentiable mesh renderer + image supervision
      • [CVPR2018] Neural 3d mesh renderer
      • [2019] Soft rasterizer: A differentiable renderer for image-based 3d reasoning
      • [2019] Pix2vex: Image-to-geometry reconstruction using a smooth differentiable renderer.
      • [CVPR2019] Learning view priors for single-view 3d reconstruction.
    • ==continuous 2D patches== similar to this paper: uses 2D patches as the UV parameterization
      • [CVPR2018] Atlasnet: A papier-mâché approach to learning 3d surface generation.
      • AtlasNet for video clips
        [CVPR2019] Photometric mesh optimization for video-aligned 3d object reconstruction.
      • introduces topology modification into AtlasNet
        [ICCV2019] Deep mesh reconstruction from single rgb images via topology modification networks

preliminaries

  • NOCS
    • predicts the NOCS map and the mask of an image
  • surface parameterization
    • a UV parameterization of a surface is a chart
    • a set of fully connected networks learns multiple charts
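For reference, NOCS (introduced in the NOCS paper of Wang et al., CVPR 2019, not in Pix2Surf) places each canonically oriented object inside a unit cube. A minimal Python sketch, assuming the normalization is "centre the tight bounding box and scale its diagonal to length 1", as we recall it from that paper:

```python
import math

def to_nocs(points):
    """Normalize canonical-pose object points into the NOCS unit cube: the tight
    bounding box is centred at (0.5, 0.5, 0.5) and scaled so its diagonal has length 1."""
    lo = [min(p[k] for p in points) for k in range(3)]
    hi = [max(p[k] for p in points) for k in range(3)]
    centre = [(a + b) / 2.0 for a, b in zip(lo, hi)]
    diag = math.dist(lo, hi)
    return [tuple((p[k] - centre[k]) / diag + 0.5 for k in range(3)) for p in points]

pts = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 4.0, 0.0), (2.0, 4.0, 4.0)]
nocs = to_nocs(pts)
# A NOCS map stores, for each object pixel, the NOCS coordinate of the visible surface point.
```

Since every bounding-box side is at most the diagonal, all normalized coordinates fall inside \([0,1]^3\), which is why they can be visualized directly as RGB colors.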

overview

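  • ==Note==: unlike AtlasNet, the UV values are not sampled uniformly; they come from a learned network, the UV predictor.
    So a UV value is first predicted for every pixel of the image; then the UV values of the pixels belonging to the object, concatenated with the image feature, are mapped to a set of 3D points (the 3D coordinates of points on a 2-manifold)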
  • graph LR
    	img[image coordinate] -.per index prediction.-> uv[uv value] --> MLP
    	image --> z[global latent code z] --> MLP
    	MLP --> 3d[3D surface coordinate]
    

  • https://longtimenohack.com/posts/paper_reading/2020eccv_lei_pix2surf/image-20201208103708582.png

single view single chart pix2surf

  • NOCS-UV branch
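    • adds two extra channels to the usual NOCS output to predict UV values
    • the UV values are not sampled uniformly; a 2-channel UV image is predicted directly from the input image
      https://longtimenohack.com/posts/paper_reading/2020eccv_lei_pix2surf/image-20201208101930111.png
    • a chart is observed to emerge, and this chart is already nearly consistent across views and across objects
      • i.e. the network learns by itself how to unwrap an object's shape into a flat space
    • code extractor: a small CNN
      • takes a single image as input and outputs a global latent code z
    • UV amplifier
      • UV coordinates have only 2 dimensions, while the global latent code z is high-dimensional, so the two inputs are unbalanced
      • a set of MLPs therefore first lifts UV to a higher dimension
  • SP (surface parameterization) branch
    • like AtlasNet, it takes the concatenation of the lifted UV and the global latent code as input and outputs 3D point coordinates
    • differences from AtlasNet:
      • UV is lifted to a higher dimension
      • there is a learned chart that directly links image coordinates to 3D surface coordinates
      • UV is not uniformly sampled but predicted by a network (the NOCS-UV branch above)
    • the output 3D point coordinates lie in NOCS space
  • loss / training
    • ground-truth NOCS maps
    • ground-truth 3D surface points (obtained directly from the ShapeNet 3D models)
    • everything else is trained end to end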
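A minimal Python sketch of the single-view SP branch, assuming random (untrained) weights and toy sizes. The UV image, mask and latent code stand in for the outputs of the NOCS-UV branch and the code extractor; every name and dimension here is hypothetical:

```python
import random

def rand_layer(n_in, n_out, rng):
    """Random, untrained weights: a stand-in for trained parameters."""
    return [[rng.gauss(0.0, 0.3) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out

def dense(layer, x, relu=True):
    W, b = layer
    y = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
    return [max(0.0, v) for v in y] if relu else y

def sp_branch(uv_image, mask, z, amplifier, sp_mlp):
    """For each object pixel: lift its predicted (u, v) with the UV amplifier, concatenate
    the global code z, and regress a 3D point (a NOCS-space coordinate in Pix2Surf)."""
    points = {}
    for pixel, inside in mask.items():
        if not inside:
            continue
        h = list(uv_image[pixel])
        for layer in amplifier:
            h = dense(layer, h)
        h = h + z
        for k, layer in enumerate(sp_mlp):
            h = dense(layer, h, relu=k < len(sp_mlp) - 1)
        points[pixel] = h  # the pixel -> 3D point correspondence is kept explicitly
    return points

rng = random.Random(0)
H = W_IMG = 4
uv_image = {(r, c): (rng.random(), rng.random()) for r in range(H) for c in range(W_IMG)}
mask = {(r, c): 1 <= r <= 2 and 1 <= c <= 2 for r in range(H) for c in range(W_IMG)}
z = [rng.gauss(0.0, 1.0) for _ in range(8)]
amplifier = [rand_layer(2, 16, rng), rand_layer(16, 16, rng)]
sp_mlp = [rand_layer(16 + 8, 32, rng), rand_layer(32, 3, rng)]
surface = sp_branch(uv_image, mask, z, amplifier, sp_mlp)
```

Keying the output by pixel is what lets texture be lifted from the image onto the 3D surface.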

multi view atlas pix2surf

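  • the latent codes of the different views are max-pooled, and the max-pooled code is concatenated with each view's own code
  • for a pixel in one view, its ground-truth NOCS value is used to find the exactly corresponding pixel in another view;
    minimizing the distance between the 3D points predicted at these two pixels defines the multi-view consistency loss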
    https://longtimenohack.com/posts/paper_reading/2020eccv_lei_pix2surf/image-20201208105212477.png
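The two multi-view ingredients above can be sketched in Python as follows. Matching pixels by the nearest ground-truth NOCS value within a tolerance `tol`, and averaging plain Euclidean distances over the matches, are our assumptions; the exact matching and weighting in Pix2Surf may differ. Pixel-to-value maps are plain dictionaries:

```python
import math

def fuse_codes(codes):
    """Element-wise max over the views' latent codes, concatenated to each view's own code."""
    pooled = [max(vals) for vals in zip(*codes)]
    return [list(c) + pooled for c in codes]

def consistency_loss(nocs_a, pred_a, nocs_b, pred_b, tol=1e-2):
    """nocs_*: pixel -> ground-truth NOCS value; pred_*: pixel -> predicted 3D point.
    Each pixel of view A is matched to the view-B pixel with the closest ground-truth NOCS
    value (accepted only within `tol`); the loss is the mean distance between the predictions."""
    total, n = 0.0, 0
    for pix_a, gt in nocs_a.items():
        pix_b = min(nocs_b, key=lambda p: math.dist(nocs_b[p], gt))
        if math.dist(nocs_b[pix_b], gt) <= tol:
            total += math.dist(pred_a[pix_a], pred_b[pix_b])
            n += 1
    return total / n if n else 0.0
```

Because the ground-truth NOCS value identifies the same physical surface point in every view, the loss is zero exactly when both views predict the same 3D point for it.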
<Meshlet> Meshlet priors for 3d mesh reconstruction
 

Motivation

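  • input: a point cloud; output: a mesh
  • previous shape-learning methods use one of two kinds of priors:
    • object-level priors, which are not disentangled from pose;
    • smoothness-regularizer priors, which lose local detail
  • this paper instead learns local natural meshlets in a canonical pose; these meshlets are fully shared across objects and categories, and a complete mesh is assembled from such purely local priors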
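https://longtimenohack.com/posts/paper_reading/2020cvpr_badki_meshlet/image-20201216162000056.png
P: the test object's pose lies within the pose distribution of the dataset objects; red P: it does not
N: low noise; red N: moderate noise
T: an object category seen in the training set; red T: a category not seen in the training set
As the figure shows, the paper focuses on learning pose-disentangled local meshlets and assembling a complete mesh from them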

geodesic parameterization

  • Geodesic polar coordinates on polygonal meshes.
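  • maps a vertex and its surrounding points to coordinates on that vertex's tangent plane, then transforms the tangent plane into a canonical pose (the vertex moved to the origin, the plane normal aligned with the z-axis, and the plane's u, v axes aligned with the x, y axes)
  • this disentangles pose and makes it possible to learn a wide variety of local meshlets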
  • https://longtimenohack.com/posts/paper_reading/2020cvpr_badki_meshlet/image-20201216163007426.png
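A minimal Python sketch of the canonicalization step, assuming the tangent plane is given by a vertex normal and that neighbours are simply projected onto it. True geodesic polar coordinates, as in the cited reference, would use geodesic distances and angles instead; the in-plane orientation of the u axis here is our own arbitrary choice:

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def canonicalize(vertex, normal, neighbours):
    """Express neighbours in a local right-handed frame: vertex -> origin, normal -> +z,
    tangent axes u, v -> x, y. The rotation about the normal is arbitrary here."""
    n = normalize(normal)
    helper = [1.0, 0.0, 0.0] if abs(n[0]) < 0.9 else [0.0, 1.0, 0.0]  # any vector not parallel to n
    u = normalize(cross(helper, n))
    v = cross(n, u)
    local = []
    for p in neighbours:
        d = [pi - vi for pi, vi in zip(p, vertex)]
        local.append((dot(d, u), dot(d, v), dot(d, n)))  # (x, y, height above the tangent plane)
    return local
```

In this frame a meshlet's shape no longer depends on where or how the object is placed, which is the pose disentanglement the notes describe.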

VAE

  • a VAE compresses the various meshlets into a latent space
  • when fitting it to a point cloud, the encoder first extracts an initial latent code, which is then refined for a few steps in auto-decoder fashion
  • https://longtimenohack.com/posts/paper_reading/2020cvpr_badki_meshlet/image-20201216162658302.png
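The fitting procedure above (encoder initialization followed by a few auto-decoder steps) can be sketched in Python with a toy linear decoder. The decoder matrix, the zero "encoder" guess and the step size are hypothetical stand-ins; in the paper the decoder is the trained VAE decoder and the fitting target comes from the input point cloud:

```python
def decode(A, mean, code):
    """Hypothetical linear decoder: latent code -> flattened meshlet coordinates."""
    return [m + sum(a * c for a, c in zip(row, code)) for row, m in zip(A, mean)]

def sq_err(A, mean, target, code):
    return sum((p - t) ** 2 for p, t in zip(decode(A, mean, code), target))

def refine_code(A, mean, target, code, steps=50, lr=0.05):
    """Auto-decoder refinement: the decoder stays fixed and gradient descent
    updates only the latent code to reduce the squared error to the target."""
    code = list(code)
    for _ in range(steps):
        residual = [p - t for p, t in zip(decode(A, mean, code), target)]
        grad = [2.0 * sum(A[i][j] * residual[i] for i in range(len(A))) for j in range(len(code))]
        code = [c - lr * g for c, g in zip(code, grad)]
    return code

A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
mean = [0.0, 0.0, 0.0, 0.0]
true_code = [0.3, -0.2]
target = decode(A, mean, true_code)
init = [0.0, 0.0]                        # stand-in for the encoder's initial guess
refined = refine_code(A, mean, target, init)
```

The encoder only has to land close to a good code; the few decoder-only gradient steps then adapt the code to the specific observation.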

overall optimization

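  • first initialize an arbitrary rough mesh and extract meshlets from it, making sure every vertex is covered by at least 3 meshlets
    • note that two quantities are then optimized iteratively: the mesh and a set of meshlets;
    • each meshlet consists of its vertices and a shape code
  • iterate: update each local shape
    • use the loss between the point cloud and the mesh "assembled" from the meshlets to update the shape of each meshlet
  • iterate: then bring the local shapes into global consistency
    • minimize the discrepancy between the updated meshlet shapes and the "assembled" mesh
    • first fix the meshlet shape codes and update the mesh vertices
    • then fix the mesh vertices and update the meshlet shape codes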
  • https://longtimenohack.com/posts/paper_reading/2020cvpr_badki_meshlet/image-20201216163105638.png
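The global-consistency stage can be sketched as block-coordinate descent. This Python toy is entirely our own simplification: vertices are 1D, a meshlet's "shape code" is a single offset added to a fixed template, canonical frames are ignored, and the point-cloud data term of the local step is left out. Each half-step is an exact least-squares minimization, so the disagreement cannot increase:

```python
import random

def coverage_ok(n_vertices, meshlets, k=3):
    """Check that every mesh vertex is covered by at least k meshlets."""
    counts = [0] * n_vertices
    for m in meshlets:
        for i in m["idx"]:
            counts[i] += 1
    return all(c >= k for c in counts)

def predict(m, i):
    """Position of vertex i according to meshlet m (toy model: template offset + scalar code)."""
    return m["code"] + m["offs"][i]

def disagreement(x, meshlets):
    """Squared mismatch between the mesh vertices and every meshlet covering them."""
    return sum((x[i] - predict(m, i)) ** 2 for m in meshlets for i in m["idx"])

def update_vertices(x, meshlets):
    """Codes fixed: the least-squares vertex position is the mean of its meshlets' predictions."""
    preds = [[] for _ in x]
    for m in meshlets:
        for i in m["idx"]:
            preds[i].append(predict(m, i))
    return [sum(p) / len(p) if p else xi for xi, p in zip(x, preds)]

def update_codes(x, meshlets):
    """Vertices fixed: each meshlet's code becomes its least-squares fit to the current vertices."""
    for m in meshlets:
        m["code"] = sum(x[i] - m["offs"][i] for i in m["idx"]) / len(m["idx"])

rng = random.Random(0)
n = 6
x = [float(i) + rng.gauss(0.0, 0.3) for i in range(n)]     # rough initial mesh (1D for brevity)
meshlets = []
for j in range(n):                                           # cyclic windows of 3 vertices
    idx = [j, (j + 1) % n, (j + 2) % n]
    meshlets.append({"idx": idx,
                     "offs": {i: float(i) + rng.gauss(0.0, 0.1) for i in idx},
                     "code": rng.gauss(0.0, 1.0)})
history = [disagreement(x, meshlets)]
for _ in range(10):
    x = update_vertices(x, meshlets)     # fix codes, update vertices
    history.append(disagreement(x, meshlets))
    update_codes(x, meshlets)            # fix vertices, update codes
    history.append(disagreement(x, meshlets))
```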
Shape reconstruction by learning differentiable surface representations
 

Motivation

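  • some existing methods learn an ensemble of parametric representations
    • but they do not control how the surface patches deform, so nothing prevents patches from overlapping each other or collapsing to a point or a line
    • in such cases, computing surface normals becomes difficult and unreliable
  • this paper proposes to exploit, during training, the inherent differentiability of deep neural networks
    • to use the differential properties of the surface to prevent patch collapse and significantly reduce mutual overlap
    • which also makes it possible to reliably compute surface normals, curvature, etc.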
  • https://longtimenohack.com/posts/paper_reading/2020cvpr_bednarik_shape/image-20201224164231425.png
  • Learning to Reconstruct Texture-Less Deformable Surfaces. 3DV2018
  • Marr Revisited: 2D-3D Model Alignment via Surface Normal Prediction. CVPR2016
  • A Two-Stream Network for Fast and Accurate 3D Cloth Draping. ICCV2019
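To make "differential properties of the patch" concrete, a minimal Python sketch computes, for a parametric patch \(f(u,v)\), the partial derivatives \(f_u, f_v\), the first fundamental form \(E=f_u\cdot f_u,\ F=f_u\cdot f_v,\ G=f_v\cdot f_v\), and the unit normal \(f_u\times f_v/\lVert f_u\times f_v\rVert\). Finite differences stand in for the autodiff a network would use, and `collapse_penalty` is a hypothetical illustration, not the regularizer defined in the paper:

```python
import math

def jacobian(f, u, v, h=1e-5):
    """Central finite differences of a parametric patch f(u, v) -> (x, y, z)."""
    fu = [(a - b) / (2 * h) for a, b in zip(f(u + h, v), f(u - h, v))]
    fv = [(a - b) / (2 * h) for a, b in zip(f(u, v + h), f(u, v - h))]
    return fu, fv

def first_fundamental_form(fu, fv):
    E = sum(a * a for a in fu)
    F = sum(a * b for a, b in zip(fu, fv))
    G = sum(b * b for b in fv)
    return E, F, G

def unit_normal(fu, fv):
    n = [fu[1] * fv[2] - fu[2] * fv[1], fu[2] * fv[0] - fu[0] * fv[2], fu[0] * fv[1] - fu[1] * fv[0]]
    length = math.sqrt(sum(c * c for c in n))
    return [c / length for c in n]

def collapse_penalty(fu, fv, eps=1e-3):
    """Illustrative only: the area element sqrt(EG - F^2) vanishes when the patch
    degenerates to a line or a point, so values below eps are penalized."""
    E, F, G = first_fundamental_form(fu, fv)
    area = math.sqrt(max(E * G - F * F, 0.0))
    return max(0.0, eps - area)

def sphere_patch(u, v):
    return (math.sin(u) * math.cos(v), math.sin(u) * math.sin(v), math.cos(u))

fu, fv = jacobian(sphere_patch, 1.0, 0.5)
normal = unit_normal(fu, fv)   # for this unit-sphere patch the normal equals the point itself
```

When the area element is zero the cross product vanishes and the normal is undefined, which is exactly why collapsed patches make normal estimation unreliable.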

overview

  • https://longtimenohack.com/posts/paper_reading/2020cvpr_bednarik_shape/image-20201224193837533.png

results

  • the main baseline compared against is AtlasNet
  • Pointcloud Autoencoding (PCAE)
    https://longtimenohack.com/posts/paper_reading/2020cvpr_bednarik_shape/image-20201224175724685.png
  • single-view reconstruction (SVR)
    https://longtimenohack.com/posts/paper_reading/2020cvpr_bednarik_shape/image-20201224175853915.png
Better patch stitching for parametric surface reconstruction
 

Motivation

  • for current multiple-patch-based parametric surface representations (atlases), improve the global consistency of the patches (i.e. prevent **holes** and regions where several patches intersect incorrectly and look "jagged"/**serrated**)
  • https://longtimenohack.com/posts/paper_reading/20203dv_deng_better/image-20201224174834579.png
  • a typical stitching problem (illustrated in 1D)
    https://longtimenohack.com/posts/paper_reading/20203dv_deng_better/image-20201224175209265.png
  • FoldingNet: Foldingnet: Point Cloud Auto-Encoder via Deep Grid Deformation. CVPR2018
    the first deep-neural-network-based work to learn a parametric function that embeds a 2D manifold in 3D space
  • later works shifted to ensembles of such learned functions for patch-wise representations:
    • learning (encoder)
      • Atlasnet: A papier-mâché approach to learning 3d surface generation. CVPR2018
      • Learning elementary structures for 3d shape generation and matching. NeurIPS2019
      • Shape reconstruction by learning differentiable surface representations. CVPR2020; the authors' previous work, which uses regularization to reduce surface distortion and overlap
      • Tearingnet: Point cloud autoencoder to learn topology-friendly representations. arXiv, 2020.
    • optimization (auto-decoder)
      • Deep geometric prior for surface reconstruction. CVPR2019
      • Meshlet priors for 3d mesh reconstruction. CVPR2020
    • 2D output domain
      • Deep parametric shape predictions using distance fields. CVPR2020
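    • because continuous patches can be sampled at arbitrary resolution, they can fit the target very accurately
    • main shortcomings of current methods
      • the learned surfaces are highly distorted and overlap heavily; this can only be mitigated with suitable regularization (i.e. the authors' previous work, Shape reconstruction by learning differentiable surface representations)
      • a more pressing problem: global inconsistency in the placement of the individual patches, which causes surface artifacts such as holes, or regions where several patches intersect incorrectly
        • Meshlet and Deep geometric prior for surface reconstruction address this problem to some extent, but only in an optimization setting, which is slow and also requires geometric observations (e.g. a noisy point cloud) at test time;
      • this paper mainly builds on the learning-based (encoder-based) previous work, exploiting its low-distortion, low-overlap properties, and improves the global consistency of the patches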