Annals of Mathematical Sciences and Applications
Volume 3 (2018)
Number 1
Special issue in honor of Professor David Mumford, dedicated to the memory of Jennifer Mumford
Guest Editors: Stuart Geman, David Gu, Stanley Osher, Chi-Wang Shu, Yang Wang, and Shing-Tung Yau
Sparse and deep generalizations of the FRAME model
Pages: 211 – 254
DOI: https://dx.doi.org/10.4310/AMSA.2018.v3.n1.a7
Authors
Abstract
In the pattern-theoretic framework developed by Grenander and advocated by Mumford for computer vision and pattern recognition, different patterns are represented by statistical generative models. The FRAME (Filters, Random fields, And Maximum Entropy) model is such a generative model for texture patterns. It is a Markov random field model (or a Gibbs distribution, or an energy-based model) of stationary spatial processes. The log probability density function of the model (or the energy function of the Gibbs distribution) is the sum of translation-invariant potential functions that are one-dimensional non-linear transformations of linear filter responses. In this paper, we review two generalizations of this model. One is a sparse FRAME model for non-stationary patterns such as objects, in which the potential functions are location-specific and are non-zero only at a selected collection of locations. The other generalization is a deep FRAME model in which the filters are defined by a convolutional neural network (CNN or ConvNet). This leads to a deep convolutional energy-based model. The local modes of the energy function satisfy an auto-encoder, which we call the Hopfield auto-encoder. The model can be learned by an “analysis by synthesis” algorithm that iterates a sampling step for synthesis and a learning step for analysis. The algorithm admits an adversarial interpretation in which the learning step and the sampling step play a minimax game based on a value function. We can recruit a generator model as a direct and approximate sampler of the deep energy-based model to speed up the sampling step, and the two models can be learned simultaneously by a cooperative learning algorithm.
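For reference, the FRAME density described verbally above is conventionally written in the following form, where $F_k$ denotes the $k$-th linear filter, $[F_k * I](x)$ its response at location $x$ of the image domain $D$, $\lambda_k$ the corresponding one-dimensional potential function, and $Z(\lambda)$ the normalizing constant:

$$ p(I; \lambda) = \frac{1}{Z(\lambda)} \exp\left( \sum_{k=1}^{K} \sum_{x \in D} \lambda_k\big([F_k * I](x)\big) \right). $$

The “analysis by synthesis” loop mentioned in the abstract can be illustrated, in highly simplified form, by the sketch below. It uses a toy quadratic energy on 2D points rather than a filter-based or CNN-based energy on images, and the step sizes, initial values, and variable names are illustrative assumptions, not the paper's implementation; the point is only the alternation between a Langevin sampling step (synthesis) and a parameter update that matches model statistics to data statistics (analysis).

```python
import numpy as np

# Toy sketch of "analysis by synthesis" for an energy-based model
# p(x; theta) proportional to exp(-U(x; theta)).
# Here U(x; theta) = 0.5 * theta * ||x||^2, so the model is a Gaussian
# with variance 1/theta per dimension; all constants are illustrative.

rng = np.random.default_rng(0)

def grad_U_x(x, theta):
    # Gradient of the energy with respect to x, used by Langevin sampling.
    return theta * x

def suff_stat(x):
    # Derivative of the energy with respect to theta, averaged over examples.
    return 0.5 * np.mean(np.sum(x ** 2, axis=-1))

data = rng.normal(size=(512, 2))      # observed examples (unit variance)
synth = rng.normal(size=(512, 2))     # persistent synthesized examples
theta = 0.2                           # initial model parameter
step, lr = 0.01, 0.05                 # Langevin step size and learning rate

for _ in range(300):
    # Sampling step (synthesis): a few Langevin updates on the synthesized examples.
    for _ in range(10):
        noise = rng.normal(size=synth.shape)
        synth = synth - step * grad_U_x(synth, theta) + np.sqrt(2.0 * step) * noise
    # Learning step (analysis): follow the log-likelihood gradient, i.e. the
    # difference between synthesized-example statistics and data statistics.
    theta += lr * (suff_stat(synth) - suff_stat(data))

print("learned theta (should be near 1):", theta)
```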
Keywords
adversarial interpretation, convolutional neural network, cooperative learning, energy-based model, generator model, Hopfield auto-encoder, sparse coding
This work was supported by NSF DMS 1310391, DARPA SIMPLEX N66001-15-C-4035, ONR MURI N00014-16-1-2007, and DARPA ARO W911NF-16-1-0579.
Received 26 June 2017
Published 27 March 2018