Pix2Surf: Learning Parametric 3D Surface Models of Objects from Images


<Pix2Surf> [ECCV2020] Pix2Surf: Learning Parametric 3D Surface Models of Objects from Images

Result

  • Comment: note that the learned surfaces are not necessarily closed
  • https://longtimenohack.com/posts/paper_reading/2020eccv_lei_pix2surf/image-20201207204146033.png https://longtimenohack.com/posts/paper_reading/2020eccv_lei_pix2surf/image-20201207204206853.png

Motivation

  • learning to generate 3D parametric surface representations for novel object instances, as seen from one or more views
  • uses 2D patches as the UV parameterization, handles multiple non-adjacent views, and establishes a correspondence between 2D pixels and 3D surface points
  • surfaces represented with implicit functions need an expensive post-processing step (e.g. Marching Cubes) to obtain an explicit surface; this paper instead learns to generate explicit surfaces directly

Main contributions

  • high-quality parametric surfaces that respect multi-view consistency
  • the generated 3D surfaces preserve an accurate correspondence from image pixels to 3D surface points, making it possible to lift texture information and reconstruct shapes with rich geometry and appearance

Cited work that directly reconstructs a parametric representation of a shape's surface

  • class-specific templates (canonical template / mean shape in canonical space)
    shape templates designed manually for each category
    • [ECCV2018] Learning category-specific mesh reconstruction from image collections.
    • [ICCV2019] Canonical surface mapping via geometric cycle consistency
  • general structured templates
    methods that learn a generic shape template applicable to many categories (coping with different shapes and topologies)
    • [ICCV2019] Learning shape templates with structured implicit functions.
  • more generic surface representations
    • mesh deformation
      • [ECCV2018] Pixel2mesh: Generating 3d mesh models from single rgb images.
      • [ICCV2019] Pixel2mesh++: Multi-view 3d mesh generation via deformation
      • [CVPR2019] 3DN: 3d deformation network.
    • differentiable mesh renderer + image supervision
      • [CVPR2018] Neural 3d mesh renderer
      • [ICCV2019] Soft rasterizer: A differentiable renderer for image-based 3d reasoning.
      • [2019] Pix2vex: Image-to-geometry reconstruction using a smooth differentiable renderer.
      • [CVPR2019] Learning view priors for single-view 3d reconstruction.
    • ==continuous 2D patches== similar to this paper: uses 2D patches as the UV parameterization
      • [CVPR2018] Atlasnet: A papier-mâché approach to learning 3d surface generation.
      • AtlasNet applied to video clips
        [CVPR2019] Photometric mesh optimization for video-aligned 3d object reconstruction.
      • introduces topology modification into AtlasNet
        [ICCV2019] Deep mesh reconstruction from single rgb images via topology modification networks

Preliminaries

  • NOCS (Normalized Object Coordinate Space)
    • a NOCS map and an object mask can be predicted from a single image
  • surface parameterization
    • a UV parameterization of a surface is a chart
    • multiple charts are learned with a set of fully connected networks (see the sketch after this list)
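
As a toy illustration of "one chart = one fully connected network", here is a minimal PyTorch sketch (my own code, not the paper's implementation; all layer widths and the chart count are made-up assumptions) of charts mapping UV values to 3D surface points:

```python
import torch
import torch.nn as nn

class Chart(nn.Module):
    """One chart: a small MLP mapping a 2D UV value to a 3D surface point."""
    def __init__(self, uv_dim=2, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(uv_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),       # (x, y, z)
        )

    def forward(self, uv):              # uv: (N, 2)
        return self.mlp(uv)             # (N, 3)

# "Multiple charts learned with a set of fully connected networks":
charts = nn.ModuleList([Chart() for _ in range(4)])
uv = torch.rand(1024, 2)                        # UV samples in [0, 1]^2
points = torch.stack([c(uv) for c in charts])   # (4, 1024, 3), one patch per chart
```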

Overview

  • ==Note==: unlike AtlasNet, the UV values do not come from uniform sampling but from a learned network, the UV predictor.
    So the UV value of every image pixel is predicted first; then the UV values of the pixels that belong to the object, together with the image feature, are used to produce the set of 3D points (the 3D point coordinates of the 2D manifold); see the code sketch after the diagram
  • Dataflow (mermaid):
    ```mermaid
    graph LR
      img[image coordinate] -. per-pixel prediction .-> uv[uv value] --> MLP
      image --> z[global latent code z] --> MLP
      MLP --> 3d[3D surface coordinate]
    ```

  • https://longtimenohack.com/posts/paper_reading/2020eccv_lei_pix2surf/image-20201208103708582.png
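
To make the tensor flow concrete, here is a hedged sketch of the forward pass (all shapes, channel counts, and module names are my assumptions, not the paper's implementation): a convolutional head predicts a 2-channel UV image, the object pixels are selected with the mask, and each pixel's UV is decoded into a 3D point conditioned on the global latent code z.

```python
import torch
import torch.nn as nn

# Toy sizes; the real backbone, resolution, and channel counts differ.
B, H, W = 1, 240, 320
feat = torch.randn(B, 64, H, W)       # stand-in for backbone features

# Per-pixel UV prediction: a 2-channel UV image.
uv_head = nn.Conv2d(64, 2, kernel_size=1)
uv_img = torch.sigmoid(uv_head(feat))            # (B, 2, H, W), UV in [0, 1]

z = torch.randn(B, 256)               # stand-in for the global latent code

# Keep only the pixels that belong to the object (stand-in mask).
mask = torch.rand(B, 1, H, W) > 0.5
uv = uv_img.permute(0, 2, 3, 1)[mask[:, 0]]      # (N, 2) UVs of object pixels

# Concat each pixel's UV with the global code and decode to a 3D point.
decoder = nn.Sequential(nn.Linear(2 + 256, 256), nn.ReLU(), nn.Linear(256, 3))
z_per_pixel = z[0].expand(uv.shape[0], -1)       # broadcast z to all N pixels
xyz = decoder(torch.cat([uv, z_per_pixel], dim=-1))   # (N, 3) surface points
```

Since each 3D point originates at a specific image pixel, the pixel-to-surface correspondence highlighted in the contributions falls out of this design for free.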

Single-view single-chart Pix2Surf

  • NOCS-UV branch
    • two extra channels are added on top of the original NOCS outputs to predict the UV values
    • the UVs are not uniformly sampled; instead a 2-channel UV image is predicted directly from the image
      https://longtimenohack.com/posts/paper_reading/2020eccv_lei_pix2surf/image-20201208101930111.png
    • they observe the emergence of a chart, and this chart is already almost multi-view consistent and multi-object consistent
      • i.e. the network learns on its own how to unwrap an object's shape onto a flat space
    • code extractor: a small CNN
      • takes a single image as input and outputs a global latent code z
    • UV amplifier
      • the UV coordinate is only 2D while the global latent code z is high-dimensional, so the two pieces of information are unbalanced
      • a set of MLPs therefore first lifts the UV to a higher dimension
  • SP (surface parameterization) branch
    • similar to AtlasNet: takes the concatenation of the amplified UV and the global latent code as input and outputs 3D point coordinates
    • differences from AtlasNet:
      • the UV is lifted to a higher dimension
      • there is a learned chart that establishes a direct correspondence between image coordinates and 3D surface coordinates
      • the UVs are not uniformly sampled but learned by a network (the NOCS-UV branch above)
    • the output 3D point coordinates live in NOCS space (a module sketch of the amplifier and decoder follows after this list)
  • loss / training
    • ground-truth NOCS maps
    • ground-truth 3D surface points (obtained directly from the ShapeNet 3D models)
    • everything else is trained end-to-end (a loss sketch also follows below)
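
Putting the branch descriptions together, here is a hedged sketch of the UV amplifier and SP branch as modules (my own code; depths and widths are assumptions): the 2D UV is first lifted so it is not drowned out by the high-dimensional code z, then decoded to a NOCS-space point.

```python
import torch
import torch.nn as nn

class UVAmplifier(nn.Module):
    """Lifts the 2D UV to a higher dimension so it can balance the
    high-dimensional global code z (widths are made up)."""
    def __init__(self, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, 64), nn.ReLU(),
            nn.Linear(64, out_dim), nn.ReLU(),
        )

    def forward(self, uv):              # (N, 2)
        return self.net(uv)             # (N, out_dim)

class SPBranch(nn.Module):
    """AtlasNet-style decoder: amplified UV + global code -> 3D point.
    The sigmoid keeping outputs inside the NOCS unit cube is an assumption."""
    def __init__(self, uv_dim=128, z_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(uv_dim + z_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 3),
        )

    def forward(self, uv_feat, z):      # (N, uv_dim), (N, z_dim)
        return torch.sigmoid(self.net(torch.cat([uv_feat, z], dim=-1)))

amp, sp = UVAmplifier(), SPBranch()
uv = torch.rand(500, 2)                 # UVs from the NOCS-UV branch (stand-in)
z = torch.randn(500, 256)               # global code broadcast per pixel
points = sp(amp(uv), z)                 # (500, 3) NOCS-space surface points
```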
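
And a minimal sketch of the supervision described above (my own code; the L1/L2 distance choices and loss weights are assumptions): the predicted NOCS map is compared against its ground truth on object pixels, and the predicted 3D points against ground-truth surface points, all differentiable end-to-end.

```python
import torch
import torch.nn.functional as F

def single_view_losses(pred_nocs, gt_nocs, mask, pred_xyz, gt_xyz,
                       w_nocs=1.0, w_xyz=1.0):
    """pred_nocs/gt_nocs: (B, 3, H, W); mask: (B, 1, H, W) bool;
    pred_xyz/gt_xyz: (N, 3) points for the object pixels, in NOCS space."""
    # NOCS map supervision, only on object pixels.
    m = mask.expand_as(pred_nocs)
    loss_nocs = F.l1_loss(pred_nocs[m], gt_nocs[m])
    # 3D surface point supervision (ground truth from the ShapeNet model).
    loss_xyz = F.mse_loss(pred_xyz, gt_xyz)
    return w_nocs * loss_nocs + w_xyz * loss_xyz

loss = single_view_losses(torch.rand(1, 3, 8, 8), torch.rand(1, 3, 8, 8),
                          torch.rand(1, 1, 8, 8) > 0.5,
                          torch.rand(50, 3), torch.rand(50, 3))
```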

Multi-view atlas Pix2Surf

  • the latent codes of the different views are max-pooled, and the max-pooled code is concatenated with each view's own code
  • from the ground-truth NOCS value of a pixel in one view, find the exactly corresponding pixel position in another view;
    the multi-view consistency loss is defined as minimizing the distance between the 3D points predicted from those two pixels (see the sketch below)
    https://longtimenohack.com/posts/paper_reading/2020eccv_lei_pix2surf/image-20201208105212477.png
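
A hedged sketch of both multi-view ingredients (toy code; function names and the use of an L2 distance are my assumptions): code fusion by max pooling plus concatenation, and the consistency loss between 3D points predicted at NOCS-matched pixels of two views.

```python
import torch

def fuse_codes(codes):
    """codes: (V, D) per-view latent codes. Returns (V, 2D): each view's own
    code concatenated with the max-pooled code shared by all views."""
    pooled = codes.max(dim=0).values
    return torch.cat([codes, pooled.expand_as(codes)], dim=-1)

def multiview_consistency_loss(xyz_a, xyz_b):
    """xyz_a, xyz_b: (N, 3) points predicted at corresponding pixels of two
    views (pairs found by matching ground-truth NOCS values). Minimizing the
    distance pulls both predictions onto the same 3D surface point."""
    return (xyz_a - xyz_b).norm(dim=-1).mean()

codes = torch.randn(3, 256)                  # stand-in codes for 3 views
fused = fuse_codes(codes)                    # (3, 512) per-view decoder inputs
loss = multiview_consistency_loss(torch.rand(100, 3), torch.rand(100, 3))
```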