Pix2Surf: Learning Parametric 3D Surface Models of Objects from Images
Result
- Comment: note that the learned surfaces need not be closed
Motivation
- learning to generate 3D parametric surface representations for novel object instances, as seen from one or more views
- Uses 2D patches as the UV parameterization, handles multiple non-adjacent views, and establishes correspondences between 2D pixels and 3D surface points
- Surfaces represented with implicit functions require an expensive post-processing step such as Marching Cubes to obtain an explicit surface; this work directly learns to generate explicit surfaces
Main contributions
- High-quality parametric surfaces that obey multi-view consistency
- The generated 3D surfaces preserve accurate correspondences from image pixels to 3D surface points, which makes it possible to lift texture information and reconstruct shapes with rich geometry and appearance
Cited work that directly reconstructs a parametric representation of a shape's surface
- class-specific templates (canonical template / mean shape in canonical space)
Shape templates designed by hand for each category
- [ECCV2018] Learning category-specific mesh reconstruction from image collections.
- [ICCV2019] Canonical surface mapping via geometric cycle consistency
- general structured templates
A generic shape-template learning method that works across categories (handling varying shapes and topologies)
- [ICCV2019] Learning shape templates with structured implicit functions.
- more generic surface representations
- mesh deformation
- [ECCV2018] Pixel2mesh: Generating 3d mesh models from single rgb images.
- [ICCV2019] Pixel2mesh++: Multi-view 3d mesh generation via deformation
- [CVPR2019] 3DN: 3d deformation network.
- differentiable mesh renderer + image supervision
- [CVPR2018] Neural 3d mesh renderer
- [2019] Soft rasterizer: A differentiable renderer for image-based 3d reasoning
- [2019] Pix2vex: Image-togeometry reconstruction using a smooth differentiable renderer.
- [CVPR2019] Learning view priors for single-view 3d reconstruction.
- ==continuous 2D patches== (similar to this paper: 2D patches serve as the UV parameterization)
- [CVPR2018] Atlasnet: A papier-mâché approach to learning 3d surface generation.
- AtlasNet applied to video clips
- [CVPR2019] Photometric mesh optimization for video-aligned 3d object reconstruction.
- topology modification added to AtlasNet
- [ICCV2019] Deep mesh reconstruction from single rgb images via topology modification networks
preliminaries
- NOCS
- Can predict the NOCS map and mask of an image
- surface parameterization
- A UV parameterization of the surface is a chart
- A set of fully connected networks is used to learn multiple charts
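In my own notation (not copied from the paper), a chart and the atlas it belongs to can be written as:

$$
\varphi_i : U_i \subset [0,1]^2 \;\to\; \mathbb{R}^3,
\qquad
S = \bigcup_i \varphi_i(U_i)
$$

Each chart $\varphi_i$ is realized as a small fully connected network, so learning several of them gives a multi-chart covering (an atlas) of the surface.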
overview
- ==Note==: unlike AtlasNet, the UV values do not come from uniform sampling but from a learned network, the UV predictor
So the network first predicts a UV value for every image pixel, then concatenates the UV values of the pixels belonging to the object with the image feature and outputs the set of 3D points (the 3D coordinates of the 2D manifold):

```mermaid
graph LR
  img[image coordinate] -. per index prediction .-> uv[uv value] --> MLP
  image --> z[global latent code z] --> MLP
  MLP --> 3d[3D surface coordinate]
```
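A minimal PyTorch-style sketch of this flow, assuming the per-pixel UV image, the object mask, and the global latent code z have already been produced by the NOCS-UV branch and the code extractor; the module names and layer sizes here are my own placeholders, not the paper's architecture:

```python
import torch
import torch.nn as nn

class UVAmplifier(nn.Module):
    """Lifts the 2-D uv coordinate to a higher dimension before it is
    concatenated with the global latent code (layer sizes are assumptions)."""
    def __init__(self, out_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2, 64), nn.ReLU(),
            nn.Linear(64, out_dim), nn.ReLU(),
        )

    def forward(self, uv):                 # uv: (N, 2)
        return self.mlp(uv)                # (N, out_dim)


class SPBranch(nn.Module):
    """Surface-parameterization MLP: amplified uv + global code -> 3D point in NOCS space."""
    def __init__(self, uv_dim=256, code_dim=1024):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(uv_dim + code_dim, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, 3), nn.Sigmoid(),   # NOCS coordinates lie in [0, 1]^3
        )

    def forward(self, uv_feat, z):         # uv_feat: (N, uv_dim), z: (N, code_dim)
        return self.mlp(torch.cat([uv_feat, z], dim=-1))


def pix2surf_forward(uv_image, mask, z, uv_amplifier, sp_branch):
    """uv_image: (2, H, W) per-pixel uv from the NOCS-UV branch,
    mask: (H, W) boolean object mask, z: (code_dim,) global latent code."""
    uv = uv_image.permute(1, 2, 0)[mask]             # (N, 2) uv values of the object pixels
    uv_feat = uv_amplifier(uv)                       # (N, uv_dim) amplified uv
    z_rep = z.unsqueeze(0).expand(uv.shape[0], -1)   # share z across all object pixels
    return sp_branch(uv_feat, z_rep)                 # (N, 3) one surface point per pixel
```

Because one 3D point is produced per object pixel, the pixel-to-surface-point correspondence mentioned in the contributions comes for free.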
single view single chart pix2surf
- NOCS-UV branch
- Two extra channels are added on top of the original NOCS outputs to predict the UV values (see the output-head sketch after this list)
- The UVs are not obtained by uniform sampling; a 2-channel UV image is predicted directly from the input image
- The authors observe the emergence of a chart, and this chart is already nearly multi-view consistent and multi-object consistent
- In other words, the network learns on its own how to unwrap an object's shape into a flat space
- Code extractor: a small CNN
- Takes a single image as input and outputs a global latent code z
- UV amplifier
- The UV coordinate is only 2-dimensional while the global latent code z is high-dimensional, so the two sources of information are unbalanced
- A set of MLPs therefore first lifts the UV to a higher dimension
- SP(surface parameterization) branch
- Similar to AtlasNet: takes the concatenation of the amplified UV and the global latent code as input and outputs 3D point coordinates
- Differences from AtlasNet:
- The UV is lifted to a higher dimension
- There is a learned chart, which establishes a direct correspondence between image coordinates and 3D surface coordinates
- The UVs are not uniformly sampled but learned by a network (the NOCS-UV branch above)
- The output 3D point coordinates lie in NOCS space
- Loss / training (see the loss sketch below)
- Ground-truth NOCS maps
- Ground-truth 3D surface points (obtained directly from the ShapeNet 3D models)
- Everything else is trained end-to-end
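As referenced in the NOCS-UV bullet above, a hedged sketch of what the branch's output head could look like; the 1x1 conv and the channel ordering are my assumptions:

```python
import torch
import torch.nn as nn

class NOCSUVHead(nn.Module):
    """Last layer of the NOCS-UV branch: the usual NOCS map (3 channels) and mask
    (1 channel) plus two extra channels for the per-pixel uv image."""
    def __init__(self, in_ch=64):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, 3 + 1 + 2, kernel_size=1)

    def forward(self, feat):                  # feat: (B, in_ch, H, W)
        out = self.conv(feat)
        nocs = torch.sigmoid(out[:, 0:3])     # NOCS map in [0, 1]
        mask_logit = out[:, 3:4]              # object mask (logits)
        uv = torch.sigmoid(out[:, 4:6])       # learned uv image in [0, 1]^2
        return nocs, mask_logit, uv
```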
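And a sketch of how the supervision terms listed under loss / training could be combined; only the NOCS-map and 3D-surface-point ground truths are stated in the notes, so the smooth-L1 choice, the mask term, and the weights are assumptions:

```python
import torch
import torch.nn.functional as F

def single_view_loss(pred_nocs, gt_nocs, pred_mask_logit, gt_mask,
                     pred_points, gt_points,
                     w_nocs=1.0, w_mask=1.0, w_surf=1.0):
    """pred_nocs / gt_nocs: (B, 3, H, W) NOCS maps; pred_mask_logit / gt_mask: (B, 1, H, W);
    pred_points / gt_points: (N, 3) surface points in NOCS space for the object pixels
    (the ground-truth points come from the ShapeNet models).
    All terms back-propagate jointly, matching the end-to-end training noted above."""
    valid = (gt_mask > 0.5).float()                       # supervise NOCS only on object pixels
    loss_nocs = F.smooth_l1_loss(pred_nocs * valid, gt_nocs * valid)
    loss_mask = F.binary_cross_entropy_with_logits(pred_mask_logit, gt_mask)
    loss_surf = F.smooth_l1_loss(pred_points, gt_points)
    return w_nocs * loss_nocs + w_mask * loss_mask + w_surf * loss_surf
```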
multi view atlas pix2surf
- The latent codes from the different views are max-pooled, and the max-pooled code is concatenated with each view's own code
- Using the ground-truth NOCS map, a pixel in one view is matched to the pixel in another view that has exactly the same ground-truth NOCS value
- Minimizing the distance between the 3D points predicted for these two corresponding pixels defines the multi-view consistency loss (see the sketch below)
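A minimal sketch of the two ingredients of the multi-view variant, max-pooled code fusion and the pairwise consistency loss. The correspondence lookup through the ground-truth NOCS maps is abstracted into precomputed index pairs; shapes and names are assumptions:

```python
import torch

def fuse_view_codes(codes):
    """codes: (V, D) per-view latent codes.  Max-pool over the views and concatenate
    the pooled code back onto each view's own code, giving (V, 2D) fused codes."""
    pooled = codes.max(dim=0).values                  # (D,)
    return torch.cat([codes, pooled.unsqueeze(0).expand_as(codes)], dim=-1)

def multiview_consistency_loss(points_a, points_b, idx_a, idx_b):
    """points_a / points_b: (N_a, 3), (N_b, 3) predicted 3D points of two views.
    idx_a / idx_b: (K,) long tensors of pixel indices matched through the ground-truth
    NOCS maps, i.e. pixels known to observe the same physical surface point.
    The loss penalizes the distance between the two predictions of that point."""
    return (points_a[idx_a] - points_b[idx_b]).pow(2).sum(dim=-1).mean()
```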