Survey: capsule networks


Stacked capsule autoencoders
 
Motivation

  • https://longtimenohack.com/posts/paper_reading/2019nips_kosiorek_stacked/image-20201216165444435.png
Canonical capsules: Unsupervised capsules in canonical pose
 

Motivation

  • unsupervised capsule architecture for 3D point clouds
  • https://longtimenohack.com/posts/paper_reading/2020arxiv_sun_canonical/image-20201216170228754.png

overview

  • https://longtimenohack.com/posts/paper_reading/2020arxiv_sun_canonical/image-20201216171806453.png

decomposition

  • an encoder maps the point cloud $\boldsymbol{P} \in \mathbb{R}^{P \times D}$ to a K-fold attention map $\boldsymbol{A} \in \mathbb{R}^{P \times K}$ and per-point features $\boldsymbol{F} \in \mathbb{R}^{P \times C}$
  • then compute the pose $\boldsymbol{\theta}_k \in \mathbb{R}^3$ of the $k$-th capsule and the corresponding capsule descriptor $\boldsymbol{\beta}_k \in \mathbb{R}^C$
    • $\boldsymbol{\theta}_k = \frac{\sum_p A_{p,k} P_p}{\sum_p A_{p,k}}$
    • $\boldsymbol{\beta}_k = \frac{\sum_p A_{p,k} F_p}{\sum_p A_{p,k}}$
    • in other words, these are just the attention-weighted averages of the point coordinates and of the point features
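
The two weighted averages above can be sketched in a few lines. This is only a minimal illustration, not the paper's implementation: the encoder is skipped entirely, the attention map and features are hand-written toy values, and the names `weighted_mean` and `decompose` are hypothetical. Plain Python lists are used to keep the sketch dependency-free.

```python
def weighted_mean(rows, weights):
    """Average the rows (lists of floats), weighting row p by weights[p]."""
    total = sum(weights)
    dim = len(rows[0])
    return [sum(w * row[d] for w, row in zip(weights, rows)) / total
            for d in range(dim)]


def decompose(points, attention, features):
    """Return (poses, descriptors) for the K capsules.

    points:    P x D point coordinates
    attention: P x K non-negative attention weights (assumed given)
    features:  P x C per-point features (assumed given)
    """
    num_capsules = len(attention[0])
    poses, descriptors = [], []
    for k in range(num_capsules):
        column = [row[k] for row in attention]               # A[:, k]
        poses.append(weighted_mean(points, column))          # theta_k
        descriptors.append(weighted_mean(features, column))  # beta_k
    return poses, descriptors


# Toy input: 4 points in 3D, 2 capsules, 2 feature channels.
points = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 4.0, 2.0]]
attention = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
features = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]

poses, descriptors = decompose(points, attention, features)
print(poses)        # one D-dimensional pose per capsule
print(descriptors)  # one C-dimensional descriptor per capsule
```

With this hard (one-hot) toy attention, each capsule's pose is simply the centroid of the points assigned to it; a soft attention map would blend the points instead.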

canonicalization

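  • invariance and equivariance alone are not enough to learn an object-centric 3D representation, because there is no (unsupervised) mechanism to ==bring information into a shared “object-centric” reference frame==
  • moreover, a "suitable" canonical frame is really just a convention, so we need a mechanism that lets the network make a **choice**, and that choice must be consistent across all objects
    • e.g. an airplane aligned with the +z axis and an airplane aligned with the +y axis are **equally good**
  • to achieve this: link the capsule descriptors to the capsule poses in canonical space; i.e. ask that objects with similar appearance be located in similar Euclidean neighborhoods in canonical space
    • concretely, a fully connected layer regresses each capsule's pose directly from the descriptors
    • $\overline{\theta}=\mathcal{K}(\beta)$
      $\overline{\theta} \in \mathbb{R}^{K\times 3}$ are the canonical poses,
      $\mathcal{K}$ is a fully connected neural network,
      $\beta \in \mathbb{R}^{K \times C}$ are the capsule descriptors
    • Q: why? Surprisingly, the K canonical poses are regressed directly from the K capsule descriptors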
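
As a rough sketch of the mapping $\overline{\theta}=\mathcal{K}(\beta)$, the snippet below assumes $\mathcal{K}$ is a single linear layer applied to the concatenation of all K descriptors, with random untrained weights. The actual architecture and training loss are not given in these notes, so the layer shape and the names `make_linear` and `canonical_poses` are purely hypothetical.

```python
import random


def make_linear(in_dim, out_dim, seed=0):
    """Create weights and bias for one linear layer (random, untrained)."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
               for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def canonical_poses(descriptors, weights, bias, pose_dim=3):
    """Map K descriptors (K x C) to K canonical poses (K x pose_dim)."""
    # Assumption: the K descriptors are concatenated into one K*C vector.
    flat = [value for beta_k in descriptors for value in beta_k]
    out = [sum(w * x for w, x in zip(row, flat)) + b
           for row, b in zip(weights, bias)]
    # Split the K*pose_dim outputs back into one pose per capsule.
    return [out[i:i + pose_dim] for i in range(0, len(out), pose_dim)]


# Toy descriptors: K = 2 capsules, C = 2 channels.
descriptors = [[2.0, 0.0], [0.0, 3.0]]
weights, bias = make_linear(in_dim=2 * 2, out_dim=2 * 3)
theta_bar = canonical_poses(descriptors, weights, bias)
print(theta_bar)  # K x 3 canonical poses
```

The function never sees the input point coordinates, only the descriptors, so in this sketch any two inputs that produce the same descriptors receive the same canonical poses.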
Unsupervised part representation by flow capsules
 

Motivation

  • capsule networks cannot efficiently learn low-level part descriptions
  • exploit motion as a powerful perceptual cue for part definition

results

  • recovers the original triangles, squares, circles, etc. from cluttered backgrounds
    https://longtimenohack.com/posts/paper_reading/2021icml_sabour_unsupervised/image-20201216170936463.png
  • parts learned for moving people
    https://longtimenohack.com/posts/paper_reading/2021icml_sabour_unsupervised/image-20201216171020883.png