Canonical Capsules: Unsupervised Capsules in Canonical Pose
Motivation
- unsupervised capsule architecture for 3D point clouds
overview
decomposition
- An encoder maps the point cloud $\boldsymbol{P} \in \mathbb{R}^{P \times D}$ to a $K$-fold attention map $\boldsymbol{A} \in \mathbb{R}^{P \times K}$ and per-point features $\boldsymbol{F} \in \mathbb{R}^{P \times C}$
- From these, compute the pose $\boldsymbol{\theta}_k \in \mathbb{R}^3$ of the $k$-th capsule and the corresponding capsule descriptor $\boldsymbol{\beta}_k \in \mathbb{R}^C$
- $\boldsymbol{\theta}_k = \frac{\sum_p A_{p,k} \boldsymbol{P}_p}{\sum_p A_{p,k}}$
- $\boldsymbol{\beta}_k = \frac{\sum_p A_{p,k} \boldsymbol{F}_p}{\sum_p A_{p,k}}$
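- In other words, $\boldsymbol{\theta}_k$ and $\boldsymbol{\beta}_k$ are simply the attention-weighted averages of the point coordinates and of the per-point features, respectively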
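The two pooling formulas can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the paper's code: the attention map is filled with random values normalized by a softmax over the $K$ capsules (in the model it comes from the encoder), point coordinates are taken to be 3D, and the helper name `capsule_pose_and_descriptor` is hypothetical.

```python
import numpy as np

def capsule_pose_and_descriptor(A, P, F, eps=1e-8):
    """Attention-weighted averages per capsule (hypothetical helper).

    A: (num_points, K) non-negative attention map
    P: (num_points, 3) point coordinates
    F: (num_points, C) per-point features
    Returns theta: (K, 3) capsule poses, beta: (K, C) capsule descriptors.
    """
    weight_sum = A.sum(axis=0)[:, None] + eps  # (K, 1): sum_p A[p, k], eps avoids division by zero
    theta = (A.T @ P) / weight_sum             # (K, 3): weighted average of coordinates
    beta = (A.T @ F) / weight_sum              # (K, C): weighted average of features
    return theta, beta

rng = np.random.default_rng(0)
num_points, K, C = 1024, 10, 128
P = rng.normal(size=(num_points, 3))
F = rng.normal(size=(num_points, C))
logits = rng.normal(size=(num_points, K))
# Assumption: softmax over the K capsules stands in for the encoder's attention output
A = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
theta, beta = capsule_pose_and_descriptor(A, P, F)
print(theta.shape, beta.shape)  # (10, 3) (10, 128)
```

Because each capsule's weights are divided by their column sum, $\boldsymbol{\theta}_k$ is (up to the small `eps`) a convex combination of the input points, so it always lies inside their convex hull.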
canonicalization
- Enforcing invariance and equivariance alone is not enough to learn an object-centric 3D representation, because there is no (unsupervised) mechanism to ==bring information into a shared “object-centric” reference frame==
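- Moreover, a "suitable" canonical frame is really just a convention, so we need a mechanism that lets the network make a **choice**, and that choice must be consistent across all objects
  - For example, an airplane aligned with the +z axis and an airplane aligned with the +y axis are **equally good**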
- To achieve this: link the capsule descriptors to the capsule poses in canonical space, i.e., ask that objects with similar appearance be located in similar Euclidean neighborhoods in canonical space
- Concretely, a fully connected network regresses each capsule's pose directly from the descriptors
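- $\overline{\boldsymbol{\theta}}=\mathcal{K}(\boldsymbol{\beta})$, where
  - $\overline{\boldsymbol{\theta}} \in \mathbb{R}^{K\times 3}$ are the canonical poses
  - $\mathcal{K}$ is a fully connected neural network
  - $\boldsymbol{\beta} \in \mathbb{R}^{K \times C}$ are the capsule descriptors
- Q: why? Surprisingly, all $K$ canonical poses are regressed directly from the $K$ capsule descriptors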
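A rough NumPy sketch of $\overline{\boldsymbol{\theta}}=\mathcal{K}(\boldsymbol{\beta})$ follows. It is an illustration under assumptions, not the paper's architecture: the weights are random and untrained, the hidden sizes (256, 256) and ReLU activations are hypothetical choices, and the $K$ descriptors are flattened and fed jointly, so every canonical pose may depend on all $K$ descriptors. The helper names `init_mlp` and `regress_canonical_poses` are hypothetical.

```python
import numpy as np

def init_mlp(sizes, rng):
    """Random (untrained) weights for a fully connected network; sizes are hypothetical."""
    return [(rng.normal(scale=1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def regress_canonical_poses(beta, params):
    """Map capsule descriptors beta (K, C) to canonical poses theta_bar (K, 3).

    Assumption: the K descriptors are flattened and processed jointly.
    """
    K = beta.shape[0]
    h = beta.reshape(-1)                 # (K * C,)
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)       # ReLU on hidden layers only
    return h.reshape(K, 3)               # (K, 3) canonical poses

rng = np.random.default_rng(0)
K, C = 10, 128
beta = rng.normal(size=(K, C))
params = init_mlp([K * C, 256, 256, K * 3], rng)
theta_bar = regress_canonical_poses(beta, params)
print(theta_bar.shape)  # (10, 3)
```

One way to read this design choice: $\overline{\boldsymbol{\theta}}$ is a function of the descriptors only, so if $\boldsymbol{\beta}$ is rotation-invariant (the invariance requirement mentioned in the canonicalization notes), every rotated copy of an object gets the same canonical poses. And since a fully connected network with finite weights is continuous, similar descriptors map to nearby poses, which matches the "similar appearance, similar Euclidean neighborhood" requirement.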