STL Research Projects

Additional information about the SystemsThatLearn@CSAIL research projects is available to initiative members at http://cap.csail.mit.edu/STLCSAIL_research. You must be logged into your CSAIL Alliances account to access the page.

Computational Model

Computational Understanding of Visualizations and Graphic Designs

Researchers: Zoya Bylinskii, Fredo Durand, Aude Oliva

Armed with knowledge about how humans perceive visualizations, we are building computational tools to reason about posters, graphs, and visualizations, with applications to design, advertising, and user interfaces. We use state-of-the-art deep learning approaches to automatically detect and parse text inside posters, to scan an image for representative pictographs or icons, and to predict the topics or concepts being communicated. We can also predict where people look on posters and graphs, and use this information to automatically generate textual and visual summaries and thumbnails. Our approaches can be made to work in real time within interactive design applications.
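
As an illustrative sketch only (not the project's actual models), the snippet below shows how a small fully convolutional network in PyTorch could map a poster image to a saliency heatmap and crop a thumbnail around the most salient region; the network is untrained and all names are hypothetical.

import torch
import torch.nn as nn

class TinySaliencyNet(nn.Module):
    """Untrained, illustrative saliency predictor: image -> single-channel heatmap."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1),             # 1x1 conv -> saliency logits
        )
    def forward(self, x):
        return torch.sigmoid(self.features(x))   # heatmap in [0, 1]

def crop_thumbnail(image, heatmap, size=64):
    """Crop a size x size thumbnail centered on the heatmap's peak."""
    _, h, w = image.shape
    idx = torch.argmax(heatmap.squeeze())
    cy, cx = divmod(idx.item(), heatmap.shape[-1])
    top = min(max(cy - size // 2, 0), h - size)
    left = min(max(cx - size // 2, 0), w - size)
    return image[:, top:top + size, left:left + size]

if __name__ == "__main__":
    poster = torch.rand(3, 256, 256)                  # stand-in for a poster image
    heat = TinySaliencyNet()(poster.unsqueeze(0))[0]
    thumb = crop_thumbnail(poster, heat)
    print(thumb.shape)                                # torch.Size([3, 64, 64])

A trained model of this shape could run inside an interactive design tool, re-predicting the heatmap as the layout changes.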

Visual Navigation

Lifelong Adaptive Learning for Visual Navigation Networks

Researchers: Mathew Monfort, Aude Oliva

Robust training of deep neural networks for visual navigation requires a large amount of data. However, gathering and labeling this data can be expensive, and demonstration data may not cover the distribution of experiences an agent (network) will encounter in application. We address this by growing a training set throughout an agent's life from its own experiences and mistakes. This allows the agent to efficiently adjust to changing circumstances (motion dynamics, visual features, etc.) in complex environments. We apply our method to the task of navigating 3D mazes in Minecraft with randomly changing block types.
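
A minimal toy sketch of the growing-training-set loop, assuming made-up environment and policy interfaces (this is not the project's code or its Minecraft setup): every episode, including failures, is appended to the training set, and the policy is periodically "retrained" on the accumulated experience.

import random

class ToyMazeEnv:
    """Stand-in environment: a 1-D corridor whose goal position changes between episodes."""
    def __init__(self):
        self.goal = random.randint(0, 9)
    def run_episode(self, policy):
        pos, trajectory = 0, []
        for _ in range(20):
            action = policy(pos)
            pos = max(0, min(9, pos + action))
            trajectory.append((pos, action))
            if pos == self.goal:
                return True, trajectory
        return False, trajectory

def make_policy(training_set):
    """Illustrative 'retraining': head toward the most frequent goal seen in past successes."""
    goals = [traj[-1][0] for success, traj in training_set if success]
    target = max(set(goals), key=goals.count) if goals else 9
    return lambda pos: 1 if pos < target else -1

training_set = []                              # grows over the agent's lifetime
policy = make_policy(training_set)
for episode in range(50):
    env = ToyMazeEnv()                         # circumstances change between episodes
    success, traj = env.run_episode(policy)
    training_set.append((success, traj))       # keep both successes and mistakes
    if episode % 5 == 0:
        policy = make_policy(training_set)     # periodic retraining on accumulated experience

In the actual project the policy is a deep navigation network and the environment is a 3D Minecraft maze; the toy above only illustrates the lifelong data-collection pattern.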

Understanding Videos

Moments: A Large-scale Database for Learning to Understand Videos

Researchers: Aude Oliva, Mathew Monfort

Moments is a large-scale human-annotated dataset of ~1 million (and growing) labelled videos corresponding to real-world actions, motions, and events unfolding within three seconds. Modeling the spatial-audio-temporal dynamics even of atomic actions occurring in three-second videos poses daunting challenges: many meaningful events involve not only people but also objects, animals, and nature; visual and auditory events can be symmetrical or not in time ("opening" means "closing" in reverse order), and transient or sustained. This project requires artificial systems to jointly learn three modalities, spatial, temporal, and auditory, in order to recognize activities at a human level, predict future activities and sequences of actions, and understand causal relationships between actions and agents. Moments, designed to have broad coverage and diversity of events in both the visual and auditory modalities, can serve as a new challenge for developing models that scale to the level of complexity and abstract reasoning that a human processes on a daily basis.
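
As a rough sketch of what joint spatial-temporal-auditory learning can look like in code (a generic late-fusion baseline, not the Moments models themselves; all dimensions are made up), per-frame visual features are pooled over time and concatenated with an audio embedding before classification.

import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Illustrative baseline: pool per-frame visual features over time, fuse with audio."""
    def __init__(self, visual_dim=512, audio_dim=128, num_classes=300):
        super().__init__()
        self.temporal_pool = nn.AdaptiveAvgPool1d(1)          # average over the time axis
        self.classifier = nn.Linear(visual_dim + audio_dim, num_classes)

    def forward(self, frame_feats, audio_feat):
        # frame_feats: (batch, time, visual_dim); audio_feat: (batch, audio_dim)
        pooled = self.temporal_pool(frame_feats.transpose(1, 2)).squeeze(-1)
        return self.classifier(torch.cat([pooled, audio_feat], dim=1))

model = LateFusionClassifier()
frames = torch.randn(4, 90, 512)   # e.g. 90 frames from a 3-second clip at 30 fps
audio = torch.randn(4, 128)
logits = model(frames, audio)      # (4, 300) class scores

Average pooling discards temporal order ("opening" vs. "closing"), which is exactly the kind of limitation the dataset is designed to expose in stronger spatiotemporal models.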

Software GPU

XNN: A Software GPU for Machine Learning

Researchers: Nir Shavit, Alex Matveev, Justin Kopinsky

We are developing XNN, a breakthrough machine learning execution engine that runs on commodity CPUs and can execute convolutional neural network algorithms (CNNs) at speeds comparable to the fastest GPUs. Moreover, because CPU memory is not limited like that of GPUs, XNN allows execution of large 3D CNN models, such as the ones used for analysis of video or medical image stacks, which are not effectively deployable on today's GPUs.

Deep Neuroscience Learning

Deep Neuroscience Learning: Transferring Human Brain Processes to AI

Researchers: Aude Oliva, Dimitrios Pantazis

Using a new software technology combining the strengths of MEG (magnetoencephalography) and fMRI (functional magnetic resonance imaging), we are able to characterize the spatiotemporal dynamics of perceived or imagined events at the level of the whole human brain. The approach could be suited to developing functional biomarkers to aid clinicians in diagnosing disorders, pinpointing impairments as a precursor to therapeutic interventions, studying how to enhance or maintain a perceptual or cognitive function in the healthy brain, developing novel deep learning architectures inspired by the human brain, and developing adaptive learning algorithms that combine the respective strengths of human and AI prediction processes.
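
One published way to combine MEG and fMRI is representational similarity analysis, in which a representational dissimilarity matrix (RDM) computed from MEG at each time point is correlated with RDMs from fMRI regions. The sketch below assumes that framing and uses random data, so treat it as illustrative rather than the project's actual pipeline.

import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_conditions, n_times, n_sensors, n_voxels = 20, 100, 306, 500

# Stand-ins for condition-averaged MEG sensor patterns and fMRI voxel patterns
meg = rng.standard_normal((n_times, n_conditions, n_sensors))
fmri = {"EVC": rng.standard_normal((n_conditions, n_voxels)),
        "IT":  rng.standard_normal((n_conditions, n_voxels))}

fmri_rdms = {roi: pdist(pat, metric="correlation") for roi, pat in fmri.items()}

# Fuse: correlate the MEG RDM at each time point with each region's fMRI RDM
fusion = {roi: np.array([spearmanr(pdist(meg[t], metric="correlation"), rdm)[0]
                         for t in range(n_times)])
          for roi, rdm in fmri_rdms.items()}
print(fusion["EVC"].shape)   # (100,) time course of MEG-fMRI similarity for one region

With real data, the resulting time courses indicate when the whole-brain MEG signal resembles the representation in a given region, yielding a millisecond-resolved, spatially localized picture of processing.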

Machine Learning

Machine Learning-Guided 3D Modeling

Researchers: Justin Solomon, Yu Wang, Yue Wang

Current systems for modeling 3D surfaces and volumes are extremely labor-intensive and restricted to experts. To alleviate the challenges of modeling for engineering, artistic, and manufacturing applications, we propose leveraging large datasets of CAD, mesh, and point cloud models to design new generative tools for 3D design. Our focus will be on the engineering and underlying theory of a modeling system that generates detailed models informed on the fly by datasets of millions of previously designed artifacts. The development of such a system will lead to progress not only in 3D modeling, a key application across industries, but also in 3D shape recognition and vision. We will concentrate on *representational* concerns involving navigation and inference in datasets whose elements are surfaces rather than images; this will require reconsideration of the basic "deep convolutional learning" pipeline, since convolution does not exist on these heterogeneous domains. A considerable systems component will involve organization and navigation of extremely large datasets of shapes that may or may not be accompanied by semantic labels.
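
To make the "convolution does not exist on these domains" point concrete, here is a minimal PointNet-style sketch (a standard permutation-invariant architecture, offered as an illustration rather than the group's proposed system): a shared per-point MLP followed by a symmetric max-pool, so the output does not depend on the ordering of the points.

import torch
import torch.nn as nn

class TinyPointNet(nn.Module):
    """Per-point shared MLP + max pooling: permutation-invariant over unordered points."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
        )
        self.head = nn.Linear(128, num_classes)

    def forward(self, points):                  # points: (batch, n_points, 3)
        feats = self.point_mlp(points)          # (batch, n_points, 128)
        global_feat = feats.max(dim=1).values   # symmetric pooling over the point set
        return self.head(global_feat)

cloud = torch.rand(2, 1024, 3)                  # two unordered point clouds
logits = TinyPointNet()(cloud)                  # (2, 10); unchanged if the points are shuffled

Unlike image convolution, nothing here assumes a regular grid or a canonical neighbor ordering, which is the basic representational issue the project addresses for surfaces and point clouds.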

GPS Traces

Automated Map Construction from GPS Traces

Researchers: Favyen Bastani, Songtao He, Mohammad Alizadeh, David DeWitt, Hari Balakrishnan, Samuel Madden

In this project, we are developing new algorithms and scalable systems architectures for converting GPS traces captured from smartphones, along with sensor data from those phones (such as accelerometer traces) and satellite imagery, into detailed road maps. Our goal is to automate what today is a costly and time-consuming process that involves teams of drivers and manual curators updating road maps. The maps we produce will capture not only the geometry and connectivity of roads, but also additional roadside features, including locations of signs, lights, obstructions, parking spaces, bus stops, stores, and more.
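
As a deliberately simplified illustration of trace-to-map inference (a naive grid-counting heuristic, not the algorithms under development here), GPS points can be rasterized into cells, and cells visited by many traces linked into a road graph.

from collections import defaultdict

def infer_road_graph(traces, cell_size=0.0005, min_count=3):
    """Naive map inference: count GPS points per grid cell, link consecutive busy cells."""
    counts, edges = defaultdict(int), defaultdict(set)
    to_cell = lambda lat, lon: (round(lat / cell_size), round(lon / cell_size))
    for trace in traces:
        cells = [to_cell(lat, lon) for lat, lon in trace]
        for c in cells:
            counts[c] += 1
        for a, b in zip(cells, cells[1:]):
            if a != b:
                edges[a].add(b)
                edges[b].add(a)
    busy = {c for c, n in counts.items() if n >= min_count}
    return {c: {n for n in edges[c] if n in busy} for c in busy}

# Toy traces along roughly the same street
traces = [[(42.3601 + 0.0005 * i, -71.0942) for i in range(10)] for _ in range(5)]
graph = infer_road_graph(traces)
print(len(graph), "road cells inferred")

Real GPS data is noisy, sparse, and ambiguous at intersections and parallel roads, which is why the project pursues learned, scalable methods rather than heuristics like this one.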

Diverse Subsets

Researchers: Chengtao Li, Stefanie Jegelka, Suvrit Sra

Scalable Optimization

Scalable Bayesian Black-Box Optimization

Researchers: Zi Wang, Stefanie Jegelka, Chengtao Li, Clement Gehring, Leslie Kaelbling, Tomas Lozano-Perez, Pushmeet Kohli

Optimizing an unknown (nonconvex) function is a common problem in robotics (e.g., finding good configurations or planning paths), in parameter tuning for machine learning, and in many areas of science and engineering. Often, each data point (observation) corresponds to the outcome of an experiment to be conducted.

In Bayesian optimization, we aim to simultaneously learn and optimize this unknown function by sequentially selecting one or a set of experiments or measurements, incorporating the outcomes to update the learned model, and selecting new experiments based on the updated model. While Bayesian optimization has become a popular tool, its practical applicability has been hindered by several drawbacks: (1) limited power of estimation and optimization in high dimensions, (2) the cost of selecting the next query points, and (3) limited ability to exploit parallel computing. In this project, we address these drawbacks via improved selection strategies, via learning underlying decomposition structure, and via new, parallelizable variants that scale to the large number of queries necessary for complex optimization problems.
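
For readers unfamiliar with the basic loop, here is a generic Bayesian optimization sketch with a Gaussian process surrogate and an upper-confidence-bound acquisition rule (standard textbook ingredients, not the improved strategies developed in this project).

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def objective(x):                      # unknown black-box function (known here only for the demo)
    return -np.sin(3 * x) - x**2 + 0.7 * x

candidates = np.linspace(-1.0, 2.0, 500).reshape(-1, 1)
X = np.array([[-0.5], [1.5]])          # initial experiments
y = objective(X).ravel()

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-6)
for _ in range(15):
    gp.fit(X, y)                       # update the learned model with all outcomes so far
    mu, sigma = gp.predict(candidates, return_std=True)
    ucb = mu + 2.0 * sigma             # acquisition: favor high predicted mean and high uncertainty
    x_next = candidates[np.argmax(ucb)].reshape(1, -1)
    X = np.vstack([X, x_next])         # run the selected "experiment"
    y = np.append(y, objective(x_next).ravel())

print("best x found:", X[np.argmax(y)].item(), "value:", y.max())

The drawbacks listed above show up directly in this loop: the GP and the argmax over candidates scale poorly with dimension, selecting the next point is itself an optimization, and the loop as written is strictly sequential.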

Allocation

Robust Budget Allocation via Continuous Submodularity

Researchers: Matthew Staib, Stefanie Jegelka

The optimal allocation of resources for maximizing influence, spread of information, or coverage has gained attention in recent years, in particular in machine learning and data mining. But in applications, the parameters of the problem are rarely known exactly, and using the wrong parameters can lead to undesirable outcomes. We hence revisit a continuous version of the Budget Allocation or Bipartite Influence Maximization problem introduced by Alon et al. (2012) from a robust optimization perspective, where an adversary may choose the least favorable parameters within a confidence set. The resulting problem is a nonconvex-concave saddle point problem (or game). We show that this nonconvex problem can be solved exactly by leveraging connections to continuous submodular functions and by solving a constrained submodular minimization problem. Although constrained submodular minimization is hard in general, here we establish conditions under which such a problem can be solved to arbitrary precision.
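
To make the setup concrete, one way to write the continuous budget allocation objective and its robust counterpart (our notation, simplified; treat it as a sketch of the formulation rather than a restatement of the paper) is:

I(x; p) = \sum_{t \in T} \Bigl[ 1 - \prod_{s \in S} (1 - p_{st})^{x_s} \Bigr],
\qquad
\max_{x \ge 0,\ \sum_s x_s \le B} \; \min_{p \in \mathcal{U}} \; I(x; p),

where S is the set of channels, T the set of customers, x_s the budget allocated to channel s, p_{st} the per-unit probability that channel s influences customer t, B the total budget, and \mathcal{U} the adversary's confidence set over the probabilities p. The adversary's inner minimization over p is where the connection to continuous submodular minimization is exploited.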

Communicating Machines

Machines that Learn to See and Communicate

Researchers: Boris Katz, Andrei Barbu, Guy Ben-Yosef, Candace Ross, Yen-Ling Kuo

In the past, data was scarce; today we are overwhelmed by it. In the past, it was hard to get an image or a description of an event; today, hundreds of thousands of cameras blanket cities, and getting a news feed of myriad events worldwide is easy. Big data has allowed us to organize and store this information, but a real understanding of videos and text eludes existing machine learning methods. This understanding would enable many novel applications, such as answering natural-language questions about events and videos, searching for actions and actors, summarizing videos and stories, and home robotics. The limitations of current approaches to these problems are stark: they are unable to understand social interactions, unable to recognize events from new viewpoints, unable to recognize that the same event can be described in different ways depending on the listener, and unable to adapt to new scenarios. We intend to develop new integrated approaches to understanding events in videos and text by combining machine learning techniques with insights from cognitive science and neuroscience. By learning how humans perform so well at these tasks and how they adapt to new scenarios, we can modify machine learning methods to do the same. Early results include an approach that learns to search videos for events described by sentences and that can answer simple questions about videos. We intend to add new capabilities to these approaches by modeling social interactions and physics, in order to recognize a wide range of events and discuss a wide range of topics. A deep understanding of massive video and text data that adapts to new scenarios will enable applications of AI and machine learning well beyond what can be achieved today, and we are excited to see where these techniques can be applied.

Model Compression

Model Compression

Researchers: Piotr Indyk

As data gets larger, models are getting larger too, which reduces efficiency, especially on mobile devices. Our goal is to compress models while approximately preserving model quality (focusing on metrics such as retrieval and classification performance). The basic approach, called "pruned quantization", is based on multi-dimensional quadtree sketches.
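
As a generic illustration of the prune-then-quantize idea (magnitude pruning plus 1-D k-means weight sharing, a standard baseline and not the multi-dimensional quadtree sketch approach itself):

import numpy as np

def prune_and_quantize(weights, sparsity=0.5, n_clusters=16, n_iters=20):
    """Zero out the smallest weights, then quantize the survivors to a small codebook."""
    flat = weights.ravel().copy()
    threshold = np.quantile(np.abs(flat), sparsity)
    flat[np.abs(flat) < threshold] = 0.0                       # magnitude pruning

    nonzero = flat[flat != 0.0]
    centroids = np.linspace(nonzero.min(), nonzero.max(), n_clusters)
    for _ in range(n_iters):                                   # 1-D k-means on surviving weights
        assign = np.argmin(np.abs(nonzero[:, None] - centroids[None, :]), axis=1)
        for k in range(n_clusters):
            if np.any(assign == k):
                centroids[k] = nonzero[assign == k].mean()

    quantized = flat.copy()
    quantized[flat != 0.0] = centroids[assign]                 # replace weights with codebook entries
    return quantized.reshape(weights.shape), centroids

W = np.random.randn(256, 256).astype(np.float32)
W_compressed, codebook = prune_and_quantize(W)
print("unique nonzero values after compression:", len(np.unique(W_compressed[W_compressed != 0])))

After pruning and weight sharing, the matrix can be stored as a sparse index structure plus a tiny codebook, which is where most of the compression comes from.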