Pix2Surf: Learning Parametric 3D Surface Models of Objects from Images


<Pix2Surf> [ECCV2020] Pix2Surf: Learning Parametric 3D Surface Models of Objects from Images

Result

  • Comment: note that the learned surfaces are not necessarily closed
  • https://longtimenohack.com/posts/paper_reading/2020eccv_lei_pix2surf/image-20201207204146033.png https://longtimenohack.com/posts/paper_reading/2020eccv_lei_pix2surf/image-20201207204206853.png

Motivation

  • learning to generate 3D parametric surface representations for novel object instances, as seen from one or more views
  • uses 2D patches as the UV parameterization, handles multiple non-adjacent views, and establishes a correspondence between 2D pixels and 3D surface points
  • surfaces represented with implicit functions need an expensive post-processing step (e.g. Marching Cubes) to obtain an explicit surface; this paper instead learns to generate explicit surfaces directly

Main contributions

  • high-quality parametric surfaces that respect multi-view consistency
  • the generated 3D surfaces preserve an accurate correspondence from image pixels to 3D surface points, making it possible to lift texture information and reconstruct shapes with rich geometry and appearance

Cited work that directly reconstructs a parametric representation of a shape's surface

  • class-specific templates (canonical template / mean shape in canonical space)
    shape templates designed manually for each category
    • [ECCV2018] Learning category-specific mesh reconstruction from image collections.
    • [ICCV2019] Canonical surface mapping via geometric cycle consistency
  • general structured templates
    methods that learn a generic shape template applicable to many categories (coping with different shapes and topologies)
    • [ICCV2019] Learning shape templates with structured implicit functions.
  • more generic surface representations
    • mesh deformation
      • [ECCV2018] Pixel2mesh: Generating 3d mesh models from single rgb images.
      • [ICCV2019] Pixel2mesh++: Multi-view 3d mesh generation via deformation
      • [CVPR2019] 3DN: 3d deformation network.
    • differentiable mesh renderer + image supervision
      • [CVPR2018] Neural 3d mesh renderer
      • [ICCV2019] Soft rasterizer: A differentiable renderer for image-based 3d reasoning.
      • [2019] Pix2vex: Image-to-geometry reconstruction using a smooth differentiable renderer.
      • [CVPR2019] Learning view priors for single-view 3d reconstruction.
    • ==continuous 2D patches== similar to this paper: uses 2D patches as the UV parameterization
      • [CVPR2018] Atlasnet: A papier-mâché approach to learning 3d surface generation.
      • AtlasNet applied to video clips
        [CVPR2019] Photometric mesh optimization for video-aligned 3d object reconstruction.
      • introduces topology modification into AtlasNet
        [ICCV2019] Deep mesh reconstruction from single rgb images via topology modification networks

Preliminaries

  • NOCS (Normalized Object Coordinate Space)
    • a NOCS map and an object mask can be predicted from a single image
  • surface parameterization
    • a UV parameterization of a surface is a chart
    • multiple charts are learned with a set of fully connected networks (see the sketch after this list)
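
As a toy illustration of "one chart = one fully connected network", here is a minimal PyTorch sketch (my own code, not the paper's implementation; all layer widths and the chart count are made-up assumptions) of charts mapping UV values to 3D surface points:

```python
import torch
import torch.nn as nn

class Chart(nn.Module):
    """One chart: a small MLP mapping a 2D UV value to a 3D surface point."""
    def __init__(self, uv_dim=2, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(uv_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),       # (x, y, z)
        )

    def forward(self, uv):              # uv: (N, 2)
        return self.mlp(uv)             # (N, 3)

# "Multiple charts learned with a set of fully connected networks":
charts = nn.ModuleList([Chart() for _ in range(4)])
uv = torch.rand(1024, 2)                        # UV samples in [0, 1]^2
points = torch.stack([c(uv) for c in charts])   # (4, 1024, 3), one patch per chart
```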

Overview

  • ==Note==: unlike AtlasNet, the UV values do not come from uniform sampling but from a learned network, the UV predictor.
    So the UV value of every image pixel is predicted first; then the UV values of the pixels that belong to the object, together with the image feature, are used to produce the set of 3D points (the 3D point coordinates of the 2D manifold); see the code sketch after the diagram
  • Dataflow (mermaid):
    ```mermaid
    graph LR
      img[image coordinate] -. per-pixel prediction .-> uv[uv value] --> MLP
      image --> z[global latent code z] --> MLP
      MLP --> 3d[3D surface coordinate]
    ```

  • https://longtimenohack.com/posts/paper_reading/2020eccv_lei_pix2surf/image-20201208103708582.png
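
To make the tensor flow concrete, here is a hedged sketch of the forward pass (all shapes, channel counts, and module names are my assumptions, not the paper's implementation): a convolutional head predicts a 2-channel UV image, the object pixels are selected with the mask, and each pixel's UV is decoded into a 3D point conditioned on the global latent code z.

```python
import torch
import torch.nn as nn

# Toy sizes; the real backbone, resolution, and channel counts differ.
B, H, W = 1, 240, 320
feat = torch.randn(B, 64, H, W)       # stand-in for backbone features

# Per-pixel UV prediction: a 2-channel UV image.
uv_head = nn.Conv2d(64, 2, kernel_size=1)
uv_img = torch.sigmoid(uv_head(feat))            # (B, 2, H, W), UV in [0, 1]

z = torch.randn(B, 256)               # stand-in for the global latent code

# Keep only the pixels that belong to the object (stand-in mask).
mask = torch.rand(B, 1, H, W) > 0.5
uv = uv_img.permute(0, 2, 3, 1)[mask[:, 0]]      # (N, 2) UVs of object pixels

# Concat each pixel's UV with the global code and decode to a 3D point.
decoder = nn.Sequential(nn.Linear(2 + 256, 256), nn.ReLU(), nn.Linear(256, 3))
z_per_pixel = z[0].expand(uv.shape[0], -1)       # broadcast z to all N pixels
xyz = decoder(torch.cat([uv, z_per_pixel], dim=-1))   # (N, 3) surface points
```

Since each 3D point originates at a specific image pixel, the pixel-to-surface correspondence highlighted in the contributions falls out of this design for free.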

Single-view single-chart Pix2Surf

  • NOCS-UV branch
    • two extra channels are added on top of the original NOCS outputs to predict the UV values
    • the UVs are not uniformly sampled; instead a 2-channel UV image is predicted directly from the image
      https://longtimenohack.com/posts/paper_reading/2020eccv_lei_pix2surf/image-20201208101930111.png
    • they observe the emergence of a chart, and this chart is already almost multi-view consistent and multi-object consistent
      • i.e. the network learns on its own how to unwrap an object's shape onto a flat space
    • code extractor: a small CNN
      • takes a single image as input and outputs a global latent code z
    • UV amplifier
      • the UV coordinate is only 2D while the global latent code z is high-dimensional, so the two pieces of information are unbalanced
      • a set of MLPs therefore first lifts the UV to a higher dimension
  • SP (surface parameterization) branch
    • similar to AtlasNet: takes the concatenation of the amplified UV and the global latent code as input and outputs 3D point coordinates
    • differences from AtlasNet:
      • the UV is lifted to a higher dimension
      • there is a learned chart that establishes a direct correspondence between image coordinates and 3D surface coordinates
      • the UVs are not uniformly sampled but learned by a network (the NOCS-UV branch above)
    • the output 3D point coordinates live in NOCS space (a module sketch of the amplifier and decoder follows after this list)
  • loss / training
    • ground-truth NOCS maps
    • ground-truth 3D surface points (obtained directly from the ShapeNet 3D models)
    • everything else is trained end-to-end (a loss sketch also follows below)
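
Putting the branch descriptions together, here is a hedged sketch of the UV amplifier and SP branch as modules (my own code; depths and widths are assumptions): the 2D UV is first lifted so it is not drowned out by the high-dimensional code z, then decoded to a NOCS-space point.

```python
import torch
import torch.nn as nn

class UVAmplifier(nn.Module):
    """Lifts the 2D UV to a higher dimension so it can balance the
    high-dimensional global code z (widths are made up)."""
    def __init__(self, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, 64), nn.ReLU(),
            nn.Linear(64, out_dim), nn.ReLU(),
        )

    def forward(self, uv):              # (N, 2)
        return self.net(uv)             # (N, out_dim)

class SPBranch(nn.Module):
    """AtlasNet-style decoder: amplified UV + global code -> 3D point.
    The sigmoid keeping outputs inside the NOCS unit cube is an assumption."""
    def __init__(self, uv_dim=128, z_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(uv_dim + z_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 3),
        )

    def forward(self, uv_feat, z):      # (N, uv_dim), (N, z_dim)
        return torch.sigmoid(self.net(torch.cat([uv_feat, z], dim=-1)))

amp, sp = UVAmplifier(), SPBranch()
uv = torch.rand(500, 2)                 # UVs from the NOCS-UV branch (stand-in)
z = torch.randn(500, 256)               # global code broadcast per pixel
points = sp(amp(uv), z)                 # (500, 3) NOCS-space surface points
```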
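
And a minimal sketch of the supervision described above (my own code; the L1/L2 distance choices and loss weights are assumptions): the predicted NOCS map is compared against its ground truth on object pixels, and the predicted 3D points against ground-truth surface points, all differentiable end-to-end.

```python
import torch
import torch.nn.functional as F

def single_view_losses(pred_nocs, gt_nocs, mask, pred_xyz, gt_xyz,
                       w_nocs=1.0, w_xyz=1.0):
    """pred_nocs/gt_nocs: (B, 3, H, W); mask: (B, 1, H, W) bool;
    pred_xyz/gt_xyz: (N, 3) points for the object pixels, in NOCS space."""
    # NOCS map supervision, only on object pixels.
    m = mask.expand_as(pred_nocs)
    loss_nocs = F.l1_loss(pred_nocs[m], gt_nocs[m])
    # 3D surface point supervision (ground truth from the ShapeNet model).
    loss_xyz = F.mse_loss(pred_xyz, gt_xyz)
    return w_nocs * loss_nocs + w_xyz * loss_xyz

loss = single_view_losses(torch.rand(1, 3, 8, 8), torch.rand(1, 3, 8, 8),
                          torch.rand(1, 1, 8, 8) > 0.5,
                          torch.rand(50, 3), torch.rand(50, 3))
```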

Multi-view atlas Pix2Surf

  • the latent codes of the different views are max-pooled, and the max-pooled code is concatenated with each view's own code
  • from the ground-truth NOCS value of a pixel in one view, find the exactly corresponding pixel position in another view;
    the multi-view consistency loss is defined as minimizing the distance between the 3D points predicted from those two pixels (see the sketch below)
    https://longtimenohack.com/posts/paper_reading/2020eccv_lei_pix2surf/image-20201208105212477.png
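
A hedged sketch of both multi-view ingredients (toy code; function names and the use of an L2 distance are my assumptions): code fusion by max pooling plus concatenation, and the consistency loss between 3D points predicted at NOCS-matched pixels of two views.

```python
import torch

def fuse_codes(codes):
    """codes: (V, D) per-view latent codes. Returns (V, 2D): each view's own
    code concatenated with the max-pooled code shared by all views."""
    pooled = codes.max(dim=0).values
    return torch.cat([codes, pooled.expand_as(codes)], dim=-1)

def multiview_consistency_loss(xyz_a, xyz_b):
    """xyz_a, xyz_b: (N, 3) points predicted at corresponding pixels of two
    views (pairs found by matching ground-truth NOCS values). Minimizing the
    distance pulls both predictions onto the same 3D surface point."""
    return (xyz_a - xyz_b).norm(dim=-1).mean()

codes = torch.randn(3, 256)                  # stand-in codes for 3 views
fused = fuse_codes(codes)                    # (3, 512) per-view decoder inputs
loss = multiview_consistency_loss(torch.rand(100, 3), torch.rand(100, 3))
```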