Adaptation-Based Programming

Welcome to the home page of the Adaptation-Based Programming (ABP) project. The goal of this project is to enable a programmer to write code that learns to "do the right thing" over repeated executions. This ability is necessary when the program is complex and optimal program behavior is hard to encode by hand. In other words, we want to make it easy for programmers to write adaptive code that learns to solve complex problems optimally. Our focus is on usability and performance, which for the programmer means getting the benefits of cutting-edge ideas from machine learning without having to learn the underlying techniques.

Our ABP library implemented for Java is available here. To learn more about the ABP paradigm, read on...

Why do we need ABP?

Standard programming paradigms expect programmers to write complete programs, in the sense that the program's behavior at every moment must be fully specified. This is problematic whenever it is not obvious how best to solve a problem. Consider, for example, the design of intelligent opponents for real-time strategy games. Computer-controlled opponents are typically quite weak and hardly adaptive compared to an experienced human. This is not surprising, since it is very difficult for a programmer to design a complete program for such complex, dynamic environments. The same difficulty arises when developing intelligent agents for other kinds of problems, for instance, writing network control protocols that achieve close to optimal performance.

An alternative to writing complete programs arises from the field of machine learning under the name reinforcement learning (RL). Rather than specifying a complete program, an RL practitioner specifies an objective function to be optimized (for example, the score in a real-time strategy game or the throughput of a networking application) along with a set of control actions that can be executed at any moment. RL algorithms then interact with the environment automatically in order to learn a policy for selecting actions that best optimizes the objective. This can be viewed as a form of automatic programming, where the learned policy is the program. RL has been used to create world-championship-level backgammon programs and to build complex helicopter controllers, among many other things. However, it can require significant expertise and debugging to formulate a complex problem in such a way that RL will be successful.
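To make the RL recipe above concrete, here is a minimal sketch in Java. It is not taken from the ABP library. The toy problem (a five-cell corridor with a goal at one end), the class name `CorridorQLearning`, and all parameter values are assumptions chosen for illustration, and tabular Q-learning is just one of many RL algorithms. The programmer supplies only the actions (`LEFT`, `RIGHT`) and the objective (`reward`); the policy is learned by interaction.

```java
import java.util.Random;

/** Hypothetical toy problem: tabular Q-learning on a five-cell corridor. */
class CorridorQLearning {
    static final int CELLS = 5;           // cells 0..4; cell 4 is the goal
    static final int GOAL = CELLS - 1;
    static final int LEFT = 0, RIGHT = 1; // the two control actions

    /** Environment: move one cell, clamped to the corridor. */
    static int step(int cell, int action) {
        int next = (action == RIGHT) ? cell + 1 : cell - 1;
        return Math.max(0, Math.min(GOAL, next));
    }

    /** Objective: reward 1 for arriving at the goal, 0 otherwise. */
    static double reward(int nextCell) {
        return nextCell == GOAL ? 1.0 : 0.0;
    }

    /** Epsilon-greedy choice; exact ties are also broken at random. */
    static int choose(double[] q, Random rng, double epsilon) {
        if (rng.nextDouble() < epsilon || q[LEFT] == q[RIGHT]) {
            return rng.nextInt(2);
        }
        return q[RIGHT] > q[LEFT] ? RIGHT : LEFT;
    }

    /** Returns the learned greedy action for each non-goal cell. */
    static int[] learnPolicy(long seed) {
        Random rng = new Random(seed);
        double[][] q = new double[CELLS][2]; // value estimates, initially 0
        double alpha = 0.5, gamma = 0.9, epsilon = 0.2; // assumed settings

        for (int episode = 0; episode < 500; episode++) {
            int cell = 0;
            for (int t = 0; t < 100 && cell != GOAL; t++) {
                int action = choose(q[cell], rng, epsilon);
                int next = step(cell, action);
                // The goal is terminal, so no future value is added there.
                double future = (next == GOAL)
                        ? 0.0
                        : gamma * Math.max(q[next][LEFT], q[next][RIGHT]);
                q[cell][action] += alpha * (reward(next) + future - q[cell][action]);
                cell = next;
            }
        }

        int[] policy = new int[GOAL];
        for (int cell = 0; cell < GOAL; cell++) {
            policy[cell] = q[cell][RIGHT] > q[cell][LEFT] ? RIGHT : LEFT;
        }
        return policy;
    }

    public static void main(String[] args) {
        for (int action : learnPolicy(42L)) {
            System.out.print(action == RIGHT ? "RIGHT " : "LEFT ");
        }
        System.out.println();
    }
}
```

Nothing in the code says "walk right"; that behavior emerges from the reward alone. Even in this tiny case the practitioner still had to pick a state representation, a learning rate, and an exploration rate, which hints at the expertise a realistic problem can demand.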

Bridging the two extremes

We observe that traditional programming and RL are the extremes of a programming spectrum. The former often asks too much of the programmer, forcing them to make choices that they are unsure about, while the latter often does not ask enough of the programmer, attempting to construct a program or policy from scratch. ABP explores the full spectrum by having the programmer write adaptive programs, in which they completely specify the parts of the program that they are certain about and leave unspecified the aspects that they are unsure about. At runtime, RL methods are applied automatically to adapt the uncertain parts of the program, attempting to optimize a programmer-specified objective function. The programmer can specify as much or as little of the program as is appropriate or possible for the specific problem, spanning the spectrum from traditional programming to pure RL.
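The idea of a partially specified program can be sketched as follows. This is an illustration only and does not show the ABP library's actual API: `AdaptiveGuard`, `ChoicePoint`, `suggest`, `reward`, `best`, and the toy `outcome` function are all hypothetical names invented here. The learner is a simple epsilon-greedy average of observed rewards, which is far simpler than full RL. The programmer hard-codes the rule they are sure of (retreat when health is low) and leaves the attack-or-defend decision open.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Illustrative only: every name here is hypothetical, not the ABP library's API. */
class AdaptiveGuard {

    /** A decision the programmer leaves open; learned by epsilon-greedy averaging. */
    static class ChoicePoint<C, A> {
        private final List<A> actions;
        private final double epsilon;
        private final Random rng;
        private final Map<C, double[]> values = new HashMap<>(); // mean reward per action
        private final Map<C, int[]> counts = new HashMap<>();    // times each action was tried
        private C lastContext;
        private int lastIndex = -1;

        ChoicePoint(List<A> actions, double epsilon, long seed) {
            this.actions = actions;
            this.epsilon = epsilon;
            this.rng = new Random(seed);
        }

        /** Index of the highest estimate; ties go to the earliest action. */
        private int argmax(double[] v) {
            int best = 0;
            for (int i = 1; i < v.length; i++) {
                if (v[i] > v[best]) best = i;
            }
            return best;
        }

        /** Picks an action for this context, exploring with probability epsilon. */
        A suggest(C context) {
            double[] v = values.computeIfAbsent(context, k -> new double[actions.size()]);
            counts.computeIfAbsent(context, k -> new int[actions.size()]);
            lastIndex = rng.nextDouble() < epsilon ? rng.nextInt(actions.size()) : argmax(v);
            lastContext = context;
            return actions.get(lastIndex);
        }

        /** Credits the most recent suggestion with the observed reward. */
        void reward(double r) {
            if (lastIndex < 0) return; // nothing has been suggested yet
            int n = ++counts.get(lastContext)[lastIndex];
            double[] v = values.get(lastContext);
            v[lastIndex] += (r - v[lastIndex]) / n; // incremental running mean
        }

        /** The action currently believed best, with no exploration. */
        A best(C context) {
            double[] v = values.getOrDefault(context, new double[actions.size()]);
            return actions.get(argmax(v));
        }
    }

    /** Hypothetical objective: attacking pays off only against a weak enemy. */
    static double outcome(boolean enemyStrong, String action) {
        if (action.equals("attack")) return enemyStrong ? -1.0 : 1.0;
        return 0.0; // defending is safe but gains nothing
    }

    static ChoicePoint<Boolean, String> train(long seed) {
        ChoicePoint<Boolean, String> tactic =
                new ChoicePoint<>(List.of("attack", "defend"), 0.1, seed);
        Random world = new Random(seed + 1); // simulated game situations
        for (int round = 0; round < 1000; round++) {
            boolean lowHealth = world.nextInt(4) == 0;
            boolean enemyStrong = world.nextBoolean();
            if (lowHealth) {
                continue; // fully specified part: always retreat, nothing to learn
            }
            String action = tactic.suggest(enemyStrong); // unspecified part
            tactic.reward(outcome(enemyStrong, action));
        }
        return tactic;
    }

    public static void main(String[] args) {
        ChoicePoint<Boolean, String> tactic = train(7L);
        System.out.println("weak enemy   -> " + tactic.best(false));
        System.out.println("strong enemy -> " + tactic.best(true));
    }
}
```

Moving along the spectrum amounts to changing how much logic sits outside the choice point. Hard-coding both branches gives a traditional program, while routing every decision through `suggest` approaches pure RL.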

Click here to see ABP for Java. For a more detailed description of this library including its usage, click here.