Canonical Capsules: Unsupervised Capsules in Canonical Pose


Motivation

  • unsupervised capsule architecture for 3D point clouds
  • https://longtimenohack.com/posts/paper_reading/2020arxiv_sun_canonical/image-20201216170228754.png

Overview

  • https://longtimenohack.com/posts/paper_reading/2020arxiv_sun_canonical/image-20201216171806453.png

Decomposition

  • An encoder maps the point cloud \(\boldsymbol{P} \in \mathbb{R}^{P \times D}\) to a K-fold attention map \(\boldsymbol{A} \in \mathbb{R}^{P \times K}\) and per-point features \(\boldsymbol{F} \in \mathbb{R}^{P \times C}\)
  • Then compute the pose \(\boldsymbol{\theta}_k \in \mathbb{R}^3\) of the \(k\)-th capsule and the corresponding capsule descriptor \(\boldsymbol{\beta}_k \in \mathbb{R}^C\)
    • \(\boldsymbol{\theta}_k = \frac {\sum_p A_{p,k}P_p} {\sum_p A_{p,k}}\)
    • \(\boldsymbol{\beta}_k=\frac {\sum_p A_{p,k}F_p} {\sum_p A_{p,k}}\)
    • In other words, these are simply the attention-weighted average of the point coordinates and the attention-weighted average of the point features
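
The two weighted averages above can be sketched in a few lines of NumPy. This is only an illustration under assumptions: the function name `decompose`, the random inputs, the softmax stand-in for the encoder's attention output, and the small `eps` guard are not from the paper.

```python
import numpy as np

def decompose(points, attention, features):
    """Attention-weighted capsule poses and descriptors.

    points:    (P, D) point coordinates
    attention: (P, K) non-negative attention map from the encoder
    features:  (P, C) per-point features from the encoder
    """
    # Normalizer sum_p A[p, k], shape (K, 1). eps is an added safeguard
    # against empty capsules and is not part of the formulas above.
    eps = 1e-8
    weight = attention.sum(axis=0)[:, None] + eps
    theta = attention.T @ points / weight    # (K, D) capsule poses
    beta = attention.T @ features / weight   # (K, C) capsule descriptors
    return theta, beta

rng = np.random.default_rng(0)
P, D, K, C = 128, 3, 4, 16
points = rng.normal(size=(P, D))
features = rng.normal(size=(P, C))
# Stand-in for the encoder output (assumption): softmax over K capsules per point.
logits = rng.normal(size=(P, K))
attention = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

theta, beta = decompose(points, attention, features)
print(theta.shape, beta.shape)  # (4, 3) (4, 16)
```

Because each pose is a normalized weighted average of coordinates, translating every input point by the same vector translates every \(\boldsymbol{\theta}_k\) by that vector, provided the attention map stays the same.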

Canonicalization

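  • Guaranteeing invariance and equivariance alone is not enough to learn an object-centric 3D representation, because there is no (unsupervised) mechanism to ==bring information into a shared “object-centric” reference frame==
  • Moreover, a "suitable" canonical frame is really just a convention, so we need a mechanism that makes the network commit to a **choice**, and that choice must be consistent across all objects
    • For example, an airplane aligned with the +z axis and an airplane aligned with the +y axis are **equally good**
  • To achieve this: link the capsule descriptors to the capsule poses in canonical space; i.e. ask that objects with similar appearance be located in similar Euclidean neighborhoods in canonical space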
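    • Concretely, a fully connected layer regresses each capsule's pose directly from the descriptors
    • \(\overline{\theta}=\mathcal{K}(\beta)\)
      \(\overline{\theta} \in \mathbb{R}^{K\times 3}\) are the canonical poses,
      \(\mathcal{K}\) is a fully connected neural network,
      \(\beta \in \mathbb{R}^{K \times C}\) are the capsule descriptors
    • Q: Why? Surprisingly, the K canonical poses are regressed directly from the K capsule descriptors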
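
As a rough illustration of \(\overline{\theta}=\mathcal{K}(\beta)\), the sketch below pushes the flattened descriptors through a small fully connected network. The layer count, the hidden width `H`, the ReLU activation, the flattening of all K descriptors into one vector, and the random (untrained) weights are all assumptions made for the example; the notes only state that \(\mathcal{K}\) is a fully connected network.

```python
import numpy as np

def canonical_poses(beta, params):
    """Regress K canonical poses from K capsule descriptors with a small MLP.

    beta:   (K, C) capsule descriptors
    params: list of (W, b) pairs; hypothetical, untrained weights
    """
    K = beta.shape[0]
    h = beta.reshape(-1)                  # flatten all descriptors: (K*C,)
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)        # ReLU on hidden layers only
    return h.reshape(K, 3)                # (K, 3) canonical poses

rng = np.random.default_rng(0)
K, C, H = 4, 16, 32                       # H is an assumed hidden width
params = [
    (rng.normal(scale=0.1, size=(K * C, H)), np.zeros(H)),
    (rng.normal(scale=0.1, size=(H, K * 3)), np.zeros(K * 3)),
]
beta = rng.normal(size=(K, C))
theta_bar = canonical_poses(beta, params)
print(theta_bar.shape)  # (4, 3)
```

Since the output depends only on \(\beta\), two inputs that produce the same descriptors receive the same canonical poses regardless of how they were oriented, which is one way to read the "similar appearance, similar neighborhood" requirement.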