Probabilistic programming is a mechanism for compactly and compositionally representing a large set of richly expressive generative models as well as performing general-purpose automatic algorithms to do inference over variables in such models.
Higher-order probabilistic programming languages open up possibility of doing inference over generative models themselves. The goal is to find an efficient robust way to automatically produce and tune models based on training data and knowledge.
"Probabilistic programming systems represent generative models as programs in a language that has specialized syntax for definition and conditioning of random variables. A backend provides one or more general-purpose inference methods for any program in this language, resulting in an abstraction boundary between model and inference algorithm design. Models specified as programs are often concise, modular, and easy to modify or extend. This allows definition of structured priors that are specifically tailored to an application domain in a manner that is efficient in terms of the dimensionality of its latent variables, albeit at the expense of performing inference with generic methods that may not take advantage of model-specific optimizations."
"A probabilistic programming language provides syntax to instantiate random variables, as well as syntax to impose conditions on these random variables. The goal of inference in a probabilistic program is to characterize the distribution on its random variables, subject to the imposed conditions. When conditions define hard constraints (such as a=10, or a>5, for some variable a), execution of a probabilistic program can be interpreted as rejection sampling. Expectation values can, in principle, be calculated by repeatedly running the program and excluding executions that violate one or more conditions. In practice, the language design often implicitly or explicitly guarantees that conditions can be satisfied with a computable non-zero probability in any execution. Probabilistic programs in such languages can be seen as importance samplers; repeated execution yields samples from a prior, which are assigned an importance weight according to the probability that its conditions are satisfied. From a modeling point of view, a probabilistic program P defines a probability distribution. In a language that provides if-statements, recursion, higher-order functions, and primitive operations such as matrix inversion, this distribution can have an almost arbitrary structure. This means that from an inference point of view, a program P is a sequence of opaque deterministic operations that is interrupted by (1) sample statements, which require generation of a value for a random variable and (2) conditioning statements, which change the importance weight of the execution history."
"Probabilistic models provide a framework for describing abstract prior knowledge and using it to reason under uncertainty. Probabilistic programs are a powerful tool for probabilistic modeling. A probabilistic programming language is a deterministic programming language augmented with random sampling and Bayesian conditioning operators. Performing inference on these programs then involves reasoning about the space of executions which satisfy some constraints, such as observed values. A universal probabilistic programming language, one built on a Turing-complete language, can represent any computable probability distribution, including open-world models, Bayesian non-parameterics, and stochastic recursion. Distribution is a meaning of program."
"One of the key characteristics of higher-order probabilistic programming languages equiped with eval is that program text both can be generated and evaluated. In higher-order languages (Lisp, Scheme, Church, Anglican and Venture) functions are first class objects - evaluating program text that defines a valid procedure returns a procedure that can be applied to arguments. The means that, among other things, program text can be programmatically generated by a program and then evaluated. In a probabilistic programming context this means that we can do inference about the program text that gave rise to an observed output or output relationship. In short, we can get computers to program themselves."
"Probabilistic programs express generative models, i.e., formal descriptions of processes that generate data. To the extent that we can mirror mechanisms “out there in the world” in a formal language, we can pose queries within this language and expect the answer to line up with what happens in the real world. For example, if we can accurately describe how genes give rise to proteins, this helps us in predicting which genetic changes lead to desirable results and which lead to harm. Traditionally, Bayesian networks have been used to formalize and reason about such processes. While extremely useful, this formalism is limited in its expressive power compared to programming languages in general. Probabilistic programming languages like Church provide a substrate for computational modeling that brings together Bayesian statistics and Turing-complete programming languages. What is the structure of thought? The language of thought hypothesis proposes that thoughts are expressions in a mental language, and that thinking consists of syntactic manipulation of these expressions. The probabilistic language of thought hypothesis suggests that these representations have probabilistic meaning. If we treat concepts as stochastic programs and thinking as approximate Bayesian inference, can we predict human behavior in experimental settings better than using other approaches, e.g., prototype theory? If we learn more about the primitives and composition mechanisms used in the brain to represent abstract concepts, can this inform how we construct tools for reasoning?"
(Andreas Stuhlmueller)
"If our goal is to build knowledge representations sufficient for human-level AI, it is natural to draw inspiration from the content and form of knowledge representations in human minds. Cognitive science has made significant relevant progress in the last several decades, and I will talk about several lessons AI researchers might take from it. First, rather than beginning by trying to extract common-sense knowledge from language, we should begin (as humans do) by capturing in computational terms the core of common sense that exists in prelinguistic infants, roughly 12 months and younger. This is knowledge about physical objects, intentional agents, and their causal interactions -- an intuitive physics with concepts analogous to force, mass and the dynamics of motion, and an intuitive psychology with concepts analogous to beliefs, desires and intentions, or probabilistic expectations, utilities and plans -- which is grounded in perception but has at its heart powerful abstractions that can extend far beyond our direct perceptual experience. Second, we should explore approaches for linguistic meaning, language understanding and language-based common-sense reasoning that build naturally on top of this common-sense core. Both of these considerations strongly favor an approach to knowledge representation based on probabilistic programs, a framework that combines the assets of traditional probabilistic (though not neural) and symbolic approaches."
(Joshua Tenenbaum)
"Probabilistic graphical models provide a formal lingua franca for modeling and a common target for efficient inference algorithms. However, many of the most innovative and useful probabilistic models published by the AI, machine learning, and statistics community far outstrip the representational capacity of graphical models and associated inference techniques. Models are communicated using a mix of natural language, pseudo code, and mathematical formulae and solved using special purpose, one-off inference methods. Rather than precise specifications suitable for automatic inference, graphical models typically serve as coarse, high-level descriptions, eliding critical aspects such as fine-grained independence, abstraction and recursion. Probabilistic programming languages aim to close this representational gap, unifying general purpose programming with probabilistic modeling; literally, users specify a probabilistic model in its entirety (e.g., by writing code that generates a sample from the joint distribution) and inference follows automatically given the specification. These languages provide the full power of modern programming languages for describing complex distributions, and can enable reuse of libraries of models, support interactive modeling and formal verification, and provide a much-needed abstraction barrier to foster generic, efficient inference in universal model classes."
(Pedro Domingos)
"The class of samplable distributions is, in a sense, the richest class you might hope to deal with. The question we asked was: is there an algorithm that, given a samplable distribution on two variables X and Y, represented by a program that samples values for both variables, can compute the conditional distribution of, say, Y given X=x, for almost all values for X? When X takes values in a finite, discrete set, e.g., when X is binary valued, there is a general algorithm, although it is inefficient. But when X is continuous, e.g., when it can take on every value in the unit interval [0,1], then problems can arise. In particular, there exists a distribution on a pair of numbers in [0,1] from which one can generate perfect samples, but for which it is impossible to compute conditional probabilities for one of the variables given the other. As one might expect, the proof reduces the halting problem to that of conditioning a specially crafted distribution. This pathological distribution rules out the possibility of a general algorithm for conditioning (equivalently, for probabilistic inference). The paper ends by giving some further conditions that, when present, allow one to devise general inference algorithms. Those familiar with computing conditional distributions for finite-dimensional statistical models will not be surprised that conditions necessary for Bayes’ theorem are one example."
(Daniel Roy)
"For me there are two types of generalisation, which I will refer to as Symbolic and Connectionist generalisation. If we teach a machine to sort sequences of numbers of up to length 10 or 100, we should expect them to sort sequences of length 1000 say. Obviously symbolic approaches have no problem with this form of generalisation, but neural nets do poorly. On the other hand, neural nets are very good at generalising from data (such as images), but symbolic approaches do poorly here. One of the holy grails is to build machines that are capable of both symbolic and connectionist generalisation."
(Nando de Freitas)
introduction by Beau Cronin
introduction by Daniel Roy
introduction by Michael Hicks
introduction by Alexey Popov in russian
"Programs and Probability" by Brian Hayes
"The State of Probabilistic Programming" by Mohammed AlQuraishi
"Bayesian Deep Learning" by Thomas Wiecki
"Edward, and some motivations" by Dustin Tran
overview by Vikash Mansinghka video
overview by Boris Yangel video in russian
tutorial by Frank Wood video
turorial by Frank Wood video
"Engineering and Reverse Engineering Intelligence with Probabilistic Programs, Program Induction, and Deep Learning" by Josh Tenenbaum and Vikash Mansinghka video
PROBPROG 2018 conference (videos)
"An Introduction to Probabilistic Programming" by Meent, Paige, Yang, Wood book
"The Design and Implementation of Probabilistic Programming Languages" by Goodman and Stuhlmuller book
Facebook Prophet (uses Stan)
Microsoft Office 365 (uses Infer.NET)
Microsoft Excel plugin (uses Infer.NET)
graphics in reverse
machine teaching
http://probabilistic-programming.org
Forest - a repository for generative models
Probability Zoo - a community repository for pre-trained probabilistic models
- languages for models & systems that simplify / automate aspects of inference (Edward, Stan, PyMC, PyRo, WebPPL, BLOG)
- models and queries defined in terms of complex stochastic computations (BayesDB)
- programs and languages as formal representations of probabilistic objects (Venture)
-
TensorFlow Distributions (paper by Dillon et al.)
BayesFlow -
"Edward, and some motivations" by Dustin Tran
"Deep Probabilistic Programming" by Tran et al.papersummary -
overview by Bob Carpenter
video
overview by Bob CarpentervideoProphet from Facebook
-
"Probabilistic Programming and Bayesian Methods for Hackers" by Cam Davidson-Pilon
book -
"Pyro: Deep Universal Probabilistic Programming" by Bingham et al.
paper -
overview by Stuart Russell
video -
overview by Vikash Mansinghka
video
overview by Vikash Mansinghkavideo -
BayesDB project
summary
interesting papers - bayesian inference and learning
interesting papers - bayesian deep learning
"Probabilistic Programming" Gordon, Henzinger, Nori, Rajamani
"Probabilistic programs are usual functional or imperative programs with two added constructs: (1) the ability to draw values at random from distributions, and (2) the ability to condition values of variables in a program via observations. Models from diverse application areas such as computer vision, coding theory, cryptographic protocols, biology and reliability analysis can be written as probabilistic programs. Probabilistic inference is the problem of computing an explicit representation of the probability distribution implicitly specified by a probabilistic program. Depending on the application, the desired output from inference may vary - we may want to estimate the expected value of some function f with respect to the distribution, or the mode of the distribution, or simply a set of samples drawn from the distribution. In this paper, we describe connections this research area called “Probabilistic Programming” has with programming languages and software engineering, and this includes language design, and the static and dynamic analysis of programs. We survey current state of the art and speculate on promising directions for future research."
"Probabilistic (Logic) Programming Concepts" De Raedt, Kimmig
"A multitude of different probabilistic programming languages exists today, all extending a traditional programming language with primitives to support modeling of complex, structured probability distributions. Each of these languages employs its own probabilistic primitives, and comes with a particular syntax, semantics and inference procedure. This makes it hard to understand the underlying programming concepts and appreciate the differences between the different languages. To obtain a better understanding of probabilistic programming, we identify a number of core programming concepts underlying the primitives used by various probabilistic languages, discuss the execution mechanisms that they require and use these to position and survey state-of-the-art probabilistic languages and their implementation. While doing so, we focus on probabilistic extensions of logic programming languages such as Prolog, which have been considered for over 20 years."
"First, there is an ongoing quest for efficient inference approaches for languages that support a broad range of programming concepts. Promising directions include lifted inference, which aims at exploiting symmetries and abstraction over individuals to speed up inference, knowledge compilation, which has contributed many data structures for compactly representing and efficiently querying various types of knowledge, and approximate methods such as MCMC, which is used in many probabilistic programming languages, but still requires proposal functions to be custom made for the program at hand. There also is a need for a clear understanding of the relative computational complexity of the various probabilistic languages and concepts that exist to date. Another question that has only seen partial answers so far is how to efficiently deal with evidence and constraints in different inference techniques. Adapting and extending program transformation and analysis techniques to the probabilistic setting promises opportunities to recognize and exploit program parts that are amenable to more efficient inference. Concepts such as time and dynamics require inference approaches that on the one hand exploit repeated structure, but on the other hand can also deal with changing structure over time. Last but not least, it still is a challenge to learn probabilistic programs, although a wide variety of learning techniques for probabilistic programming has already been developed. Many key challenges for both parameter and structure learning remain, many of which are related to efficient inference, as learning requires inference."
videohttps://youtube.com/watch?v=5g0Z5b77rOs (Kimmig)videohttps://youtube.com/watch?v=3lnVBqxjC88 (De Raedt)videohttps://youtube.com/watch?v=oNEtju9hE78 (De Raedt)slideshttps://lirias.kuleuven.be/bitstream/123456789/504183/1/pp-tutorial-ijcai15.pdf
"Deep Probabilistic Programming" Tran, Hoffman, Saurous, Brevdo, Murphy, Blei
"We propose Edward, a Turing-complete probabilistic programming language. Edward builds on two compositional representations - random variables and inference. By treating inference as a first class citizen, on a par with modeling, we show that probabilistic programming can be as flexible and computationally efficient as traditional deep learning. For flexibility, Edward makes it easy to fit the same model using a variety of composable inference methods, ranging from point estimation, to variational inference, to MCMC. In addition, Edward can reuse the modeling representation as part of inference, facilitating the design of rich variational models and generative adversarial networks. For efficiency, Edward is integrated into TensorFlow, providing significant speedups over existing probabilistic systems. For example, on a benchmark logistic regression task, Edward is at least 35x faster than Stan and PyMC3."
"Automatic Variational Inference in Stan" Kucukelbir, Ranganath, Gelman, Blei
"Variational inference is a scalable technique for approximate Bayesian inference. Deriving variational inference algorithms requires tedious model-specific calculations; this makes it difficult to automate. We propose an automatic variational inference algorithm, automatic differentiation variational inference. The user only provides a Bayesian model and a dataset; nothing else. We make no conjugacy assumptions and support a broad class of models. The algorithm automatically determines an appropriate variational family and optimizes the variational objective. We implement ADVI in Stan (code available now), a probabilistic programming framework. We compare ADVI to MCMC sampling across hierarchical generalized linear models, nonconjugate matrix factorization, and a mixture model. We train the mixture model on a quarter million images. With ADVI we can use variational inference on any model we write in Stan."
"We develop automatic differentiation variational inference in Stan. ASVI leverages automatic transformations, an implicit non-Gaussian variational approximation, and automatic differentiation. This is a valuable tool. We can explore many models, and analyze large datasets with ease."
videohttp://research.microsoft.com/apps/video/default.aspx?id=259601 (18:30) (Kucukelbir)
"Automatic Differentiation Variational Inference" Kucukelbir, Tran, Ranganath, Gelman, Blei
"Probabilistic modeling is iterative. A scientist posits a simple model, fits it to her data, refines it according to her analysis, and repeats. However, fitting complex models to large data is a bottleneck in this process. Deriving algorithms for new models can be both mathematically and computationally challenging, which makes it difficult to efficiently cycle through the steps. To this end, we develop automatic differentiation variational inference. Using our method, the scientist only provides a probabilistic model and a dataset, nothing else. ADVI automatically derives an efficient variational inference algorithm, freeing the scientist to refine and explore many models. ADVI supports a broad class of models - no conjugacy assumptions are required. We study ADVI across ten different models and apply it to a dataset with millions of observations. ADVI is integrated into Stan, a probabilistic programming system; it is available for immediate use."
"Deep Armotized Inference for Probabilistic Programs" Ritchie, Horsfall, Goodman
"Probabilistic programming languages are a powerful modeling tool, able to represent any computable probability distribution. Unfortunately, probabilistic program inference is often intractable, and existing PPLs mostly rely on expensive, approximate sampling-based methods. To alleviate this problem, one could try to learn from past inferences, so that future inferences run faster. This strategy is known as amortized inference; it has recently been applied to Bayesian networks and deep generative models. This paper proposes a system for amortized inference in PPLs. In our system, amortization comes in the form of a parameterized guide program. Guide programs have similar structure to the original program, but can have richer data flow, including neural network components. These networks can be optimized so that the guide approximately samples from the posterior distribution defined by the original program. We present a flexible interface for defining guide programs and a stochastic gradient-based scheme for optimizing guide parameters, as well as some preliminary results on automatically deriving guide programs. We explore in detail the common machine learning pattern in which a ‘local’ model is specified by ‘global’ random values and used to generate independent observed data points; this gives rise to amortized local inference supporting global model learning."
"In this paper, we presented a system for amortized inference in probabilistic programs. Amortization is achieved through parameterized guide programs which mirror the structure of the original program but can be trained to approximately sample from the posterior. We introduced an interface for specifying guide programs which is flexible enough to reproduce state-of-the-art variational inference methods. We also demonstrated how this interface supports model learning in addition to amortized inference. We developed and proved the correctness of an optimization method for training guide programs, and we evaluated its ability to optimize guides for Bayesian networks, topic models, and deep generative models."
"Nonstandard Interpretations of Probabilistic Programs for Efficient Inference" Wingate, Goodman, Stuhlmuller, Siskind
"Probabilistic programming languages allow modelers to specify a stochastic process using syntax that resembles modern programming languages. Because the program is in machine-readable format, a variety of techniques from compiler design and program analysis can be used to examine the structure of the distribution represented by the probabilistic program. We show how nonstandard interpretations of probabilistic programs can be used to craft efficient inference algorithms: information about the structure of a distribution (such as gradients or dependencies) is generated as a monad-like side computation while executing the program. These interpretations can be easily coded using special-purpose objects and operator overloading. We implement two examples of nonstandard interpretations in two different languages, and use them as building blocks to construct inference algorithms: automatic differentiation, which enables gradient based methods, and provenance tracking, which enables efficient construction of global proposals."
"We have shown how nonstandard interpretations of probabilistic programs can be used to extract structural information about a distribution, and how this information can be used as part of a variety of inference algorithms. The information can take the form of gradients, Hessians, fine-grained dependencies, or bounds. Empirically, we have implemented two such interpretations and demonstrated how this information can be used to find regions of high likelihood quickly, and how it can be used to generate samples with improved statistical properties versus random-walk style MCMC. There are other types of interpretations which could provide additional information. For example, interval arithmetic could be used to provide bounds or as part of adaptive importance sampling. Each of these interpretations can be used alone or in concert with each other; one of the advantages of the probabilistic programming framework is the clean separation of models and inference algorithms, making it easy to explore combinations of inference algorithms for complex models. More generally, this work begins to illuminate the close connections between probabilistic inference and programming language theory. It is likely that other techniques from compiler design and program analysis could be fruitfully applied to inference problems in probabilistic programs."
"With an outline of probabilistic programming in hand, we now turn to nonstandard interpretations. The idea of nonstandard interpretations originated in model theory and mathematical logic, where it was proposed that a set of axioms could be interpreted by different models. For example, differential geometry can be considered a nonstandard interpretation of classical arithmetic. In programming, a nonstandard interpretation replaces the domain of the variables in the program with a new domain, and redefines the semantics of the operators in the program to be consistent with the new domain. This allows reuse of program syntax while implementing new functionality. For example, the expression “a ∗ b” can be interpreted equally well if a and b are either scalars or matrices, but the “∗” operator takes on different meanings. Practically, many useful nonstandard interpretations can be implemented with operator overloading: variables are redefined to be objects with operators that implement special functionality, such as tracing, reference counting, or profiling."
"Human-level Concept Learning Through Probabilistic Program Induction" Lake, Salakhutdinov, Tenenbaum
"People learning new concepts can often generalize successfully from just a single example, yet machine learning algorithms typically require tens or hundreds of examples to perform with similar accuracy. People can also use learned concepts in richer ways than conventional algorithms - for action, imagination, and explanation. We present a computational model that captures these human learning abilities for a large class of simple visual concepts: handwritten characters from the world’s alphabets. The model represents concepts as simple programs that best explain observed examples under a Bayesian criterion. On a challenging one-shot classification task, the model achieves human-level performance while outperforming recent deep learning approaches. We also present several “visual Turing tests” probing the model’s creative generalization abilities, which in many cases are indistinguishable from human behavior."
"Vision program outperformed humans in identifying handwritten characters, given single training example"
"This work brings together three key ideas -- compositionality, causality, and learning-to-learn --- challenging (in a good way) the traditional deep learning approach"
videohttp://youtube.com/watch?v=kzl8Bn4VtR8 (Lake)videohttp://youtu.be/quPN7Hpk014?t=21m5s (Tenenbaum)videohttp://techtalks.tv/talks/one-shot-learning-of-simple-fractal-concepts/63049/ (Lake)noteshttps://casmls.github.io/general/2017/02/08/oneshot.htmlcodehttps://github.com/brendenlake/BPL
"Picture: A Probabilistic Programming Language for Scene Perception" Kulkarni, Kohli, Tenenbaum, Mansinghka
"Recent progress on probabilistic modeling and statistical learning, coupled with the availability of large training datasets, has led to remarkable progress in computer vision. Generative probabilistic models, or “analysis-by-synthesis” approaches, can capture rich scene structure but have been less widely applied than their discriminative counterparts, as they often require considerable problem-specific engineering in modeling and inference, and inference is typically seen as requiring slow, hypothesize-and-test Monte Carlo methods. Here we present Picture, a probabilistic programming language for scene understanding that allows researchers to ex- press complex generative vision models, while automatically solving them using fast general-purpose inference machinery. Picture provides a stochastic scene language that can express generative models for arbitrary 2D/3D scenes, as well as a hierarchy of representation layers for comparing scene hypotheses with observed images by matching not simply pixels, but also more abstract features (e.g., contours, deep neural network activations). Inference can flexibly integrate advanced Monte Carlo strategies with fast bottom-up datadriven methods. Thus both representations and inference strategies can build directly on progress in discriminatively trained systems to make generative vision more robust and efficient. We use Picture to write programs for generative 3D face analysis, 3D human pose estimation, and 3D object reconstruction – each in under 50 lines of code, and each competitive with specially engineered baselines."
videohttps://youtube.com/watch?v=quPN7Hpk014 (Tenenbaum)videohttps://youtu.be/-8QMqSWU76Q?t=44m8s (Mansinghka)videohttps://youtu.be/Rte-y6ThwAQ?t=5m18s (Mansinghka)videohttps://facebook.com/nipsfoundation/videos/1552060484885185?t=5988 (Reed)
"Practical Optimal Experiment Design with Probabilistic Programs" Ouyang, Tessler, Ly, Goodman
"Scientists often run experiments to distinguish competing theories. This requires patience, rigor, and ingenuity - there is often a large space of possible experiments one could run. But we need not comb this space by hand - if we represent our theories as formal models and explicitly declare the space of experiments, we can automate the search for good experiments, looking for those with high expected information gain. Here, we present a general and principled approach to experiment design based on probabilistic programming languages. PPLs offer a clean separation between declaring problems and solving them, which means that the scientist can automate experiment design by simply declaring her model and experiment spaces in the PPL without having to worry about the details of calculating information gain. We demonstrate our system in two case studies drawn from cognitive psychology, where we use it to design optimal experiments in the domains of sequence prediction and categorization. We find strong empirical validation that our automatically designed experiments were indeed optimal. We conclude by discussing a number of interesting questions for future research."
"Semantic Parsing to Probabilistic Programs for Situated Question Answering" Krishnamurthy, Tafjord
"Situated question answering is the problem of answering questions about an environment such as an image. This problem requires interpreting both a question and the environment, and is challenging because the set of interpretations is large, typically superexponential in the number of environmental objects. Existing models handle this challenge by making strong -- and untrue -- independence assumptions. We present Parsing to Probabilistic Programs (P3), a novel situated question answering model that utilizes approximate inference to eliminate these independence assumptions and enable the use of global features of the question/environment interpretation. Our key insight is to treat semantic parses as probabilistic programs that execute nondeterministically and whose possible executions represent environmental uncertainty. We evaluate our approach on a new, publicly-released data set of 5000 science diagram questions, finding that our approach outperforms several competitive baselines."
"We present Parsing to Probabilistic Programs (P3), a novel model for situated question answering that embraces approximate inference to enable the use of arbitrary features of the language and environment. P3 trains a semantic parser to predict logical forms that are probabilistic programs whose possible executions represent environmental uncertainty. We demonstrate this model on a challenging new data set of 5000 science diagram questions, finding that it outperforms several competitive baselines and that its global features improve accuracy. P3 has several advantageous properties. First, P3 can be easily applied to new problems: one simply has to write an initialization program and define the execution features. Second, the initialization program can be used to encode a wide class of assumptions about the environment. For example, the model can assume that every noun refers to a single object. The combination of semantic parsing and probabilistic programming makes P3 an expressive and flexible model with many potential applications."
"TerpreT: A Probabilistic Programming Language for Program Induction" Gaunt, Brockschmidt, Singh, Kushman, Kohli, Taylor, Tarlow
"We study machine learning formulations of inductive program synthesis; that is, given input-output examples, we would like to synthesize source code that maps inputs to corresponding outputs. Our aims in this work are to develop new machine learning approaches to the problem based on neural networks and graphical models, and to understand the capabilities of machine learning techniques relative to traditional alternatives, such as those based on constraint solving from the programming languages community. Our key contribution is the proposal of TerpreT, a domain-specific language for expressing program synthesis problems. TerpreT is similar to a probabilistic programming language: a model is composed of a specification of a program representation (declarations of random variables) and an interpreter that describes how programs map inputs to outputs (a model connecting unknowns to observations). The inference task is to observe a set of input-output examples and infer the underlying program. TerpreT has two main benefits. First, it enables rapid exploration of a range of domains, program representations, and interpreter models. Second, it separates the model specification from the inference algorithm, allowing proper like-to-like comparisons between different approaches to inference. From a single TerpreT specification we can automatically perform inference using four different back-ends that include machine learning and program synthesis approaches. These are based on gradient descent (thus each specification can be seen as a differentiable interpreter), linear program relaxations for graphical models, discrete satisfiability solving, and the Sketch program synthesissystem. We illustrate the value of TerpreT by developing several interpreter models and performing an extensive empirical comparison between alternative inference algorithms on a variety of program models. Our key, and perhaps surprising, empirical finding is that constraint solvers dominate the gradient descent and LP-based formulations. We conclude with some suggestions on how the machine learning community can make progress on program synthesis."
"These works raise questions of (a) whether new models can be designed specifically to synthesize interpretable source code that may contain looping and branching structures, and (b) whether searching over program space using techniques developed for training deep neural networks is a useful alternative to the combinatorial search methods used in traditional IPS. In this work, we make several contributions in both of these directions."
"Shows that differentiable interpreter-based program induction is inferior to discrete search-based techniques used by the programming languages community. We are then left with the question of how to make progress on program induction using machine learning techniques."
videohttps://youtu.be/vzDuVhFMB9Q?t=2m40s (Gaunt)
"Black-Box Policy Search with Probabilistic Programs" Meent, Paige, Tolpin, Wood
"In this work, we explore how probabilistic programs can be used to represent policies in sequential decision problems. In this formulation, a probabilistic program is a black-box stochastic simulator for both the problem domain and the agent. We relate classic policy gradient techniques to recently introduced black-box variational methods which generalize to probabilistic program inference. We present case studies in the Canadian traveler problem, Rock Sample, and a benchmark for optimal diagnosis inspired by Guess Who. Each study illustrates how programs can efficiently represent policies using moderate numbers of parameters."
"In this paper we put forward the idea that probabilistic programs can be a productive medium for describing both a problem domain and the agent in sequential decision problems. Programs can often incorporate assumptions about the structure of a problem domain to represent the space of policies in a more targeted manner, using a much smaller number of variables than would be needed in a more general formulation. By combining probabilistic programming with black-box variational inference we obtain a generalized variant of well-established policy gradient techniques that allow us to define and learn policies with arbitrary levels of algorithmic sophistication in moderately high-dimensional parameter spaces. Fundamentally, policy programs represent some form of assumptions about what contextual information is most relevant to a decision, whereas the policy parameters represent domain knowledge that generalizes across episodes."