Rademacher Complexity

22 Jan 2016

A few years ago I was sitting with a bunch of grad students in a data analytics class joking about R package names and ML terminology: e1071, partykit, confusion matrix, artifical nueral network, Perceptron, Belief Nets, Maximum a Posteriori. Take them of context and pretend you don’t know what they mean. (If you don’t, you’re probably a lot saner), and you’ll realize these words sound really cool. My vote for coolest name goes to Random Forest, but Confusion Matrix sounds awesome, too. My grad friends and I started nominating these for band names and song titles.

Perceptron comes out with it’s latest album, Random Forest, featuring its hit single Tangled in the Neural Net

Thus, I’m writing about Rademacher Complexity because of the cool title. Also, to further the liberation of machine learning knowledge from graduate school textbooks.

RC solves the problem of estimating over-fitting. In a PAC learning scenario, where we pick an Empirical Risk Minimizer $g$, our out of sample error can be thought of as how well we fit our training set plus a penalty on how much we overfit.

[1] $E_{out}(g) = E_{in}(g) + overfitpenalty$

The Rademacher penalty estimates the overfit penalty for classification tasks. For a more general estimate, see the permutation bound. The insight to the Rademacher penalty is that it estimates the overfit penalty by computing how much we overfit a dataset for which we can compute the optimism.

In a binary classification task, we copy the data and replace each original label with a label drawn from the “Rademacher Distribution,” a distribution where each label has an equal chance of being drawn $P (y_n =1/2)$. From there, we train the model on the data, minimize the in-sample error, and see how well we fit the noise. We call that error $E_r$.

\[\hat E_{out} = E_{in} + (1/2 - E_r)\]

What’s nice about the Rademacher penalty is that it is simple, intuitive, and data dependent. It accounts for how well the training set lends itself to over-fitting. That makes it tighter than the worst-case VC Bound.

\[P \left(\text{test error} \leq \text{training error} + \sqrt{h(\log(2N/h)+1)-\log(\eta/4)\over N} \right) = 1 - \eta\]

These two bounds are related. For a constant $C$ and VC dimenion $d$, the Rademacher Complexity is bounded by

\[C\sqrt{\frac{d}{n}}\]

Other tools

Rademacher Complexity is a good tool for model selection and controlling over-fitting in classification scenarios. It’s great because it directly estimates $E_{out}$ without a test set.

There are some other approaches you could take to estimate $E_{out}$. If you did the same thing using normally distributed variables, it’s called the Gaussian Complexity.

You can also compute a similar estimate by permuting the labels of the training data. It works for regression as well as classification, and has a tighter bound because it takes the original output labels into account. It’s called the permutation estimate and has an associated bound [3].

Confusion Matrix’s, new music video Probably Approximately Correct goes viral.

Abu-Mostafa, Magdon-Ismail, Lin: Nov-2014, Page 30, e-chapter 9, Learning From Data.
Peter L. Bartlett, Shahar Mendelson (2002) Rademacher and Gaussian Complexities: Risk Bounds and Structural Results. Journal of Machine Learning Research 3 463-482
Magdon-Ismail, Permutation Complexity Bound on Out-Sample Error, http://www.cs.rpi.edu/~magdon/ps/conference/PermCompNIPS2010.pdf

What I Like About Bacon.js

11 Jan 2016

These are notes from a class I taught on bacon.js

Event Listeners become Event Streams

Wrap your events in bacon! This common pattern

$("#clickme").on('click', function(event){ alert(event.target)})

becomes

var clicked = $("#btn").asEventStream('click');
clicked.onValue(function(event){ alert(event.target) });

You can give asEventStream the same arguments as jQuery.on.

$("div").asEventStream("click", ".specific-selector",function(e){})

Promises

Bacon can also create a stream by wrapping a promise object.

Bacon.fromPromise($.ajax({ url : "/sup.json"})) ;

or an event

Bacon.fromEvent(document.body, "click");

Properties

Bacon provides an abstraction for the common pattern in which you want to update a value asynchronously. It does this by allowing streams to become Properties, which are like streams but take the last value passed as the current value.

stream.toProperty()

You can give .toPoperty a starting value until data gets sent down the stream.

stream.toProperty(1)

This will contain the default value of true until it changes, then keep the the latest value.

var checked = $("#checkbox").asEventStream('click')
			.map(function(e){ return e.target.value })
			.toProperty(true);

Observables

EventStreams and Properties are both “Observables” and share some methods that also return observables for chaining.

.subscribe - Attaches a handler and returns an unsubscribe function.
.onValue - assigns a function to be called for each new value and returns an unsubscribe. These are good for adding side-effects. The difference between onValue and subscribe is that subscribe handlers get events while onValue gets the value of the stream.
.onValues is like the above, but if the value is an array it passes the it as function arguments
.toPromise - takes a promise constructor if given followign the ESS6 implementation. The promise is resolved with the last event from the observable
.firstToPromise - returns a promise with the first event from the observable

This is really cool because of the power this abstraction affords. You can now manipulate events as they travel through your code using all those functional tools you love.

$("#input1").asEventStream("keyup")
.map(function(event) { return $(event.target).val() })

You may be familiar with group, flatMap, join, reduce, combine, transform, skip, filter, take, merge, first, transform.

See http://www.cheatography.com/proloser/cheat-sheets/bacon-js/ for a complete list with descriptions.

Defining your own event sources

You can define your sources using Bacon.fromBinder. It takes a function that gets passed a function to send events and returns a function that is called to unsubscribe from the event stream.

var event = Bacon.Next("This is a way to create a Bacon event");

var myStream = Bacon.fromBinder(function(send){
	send(1); // can send values
	send(event); // can take an event object
	send(Bacon.Error("fail")); // or an error
	send(new Bacon.End()); // returning Bacon.End indicates the stream is over
});


  var rsvps = Bacon.fromBinder(function(sink) {
    must.Rsvps(sink);
    return $.noop;
  });

Repeat

Repeat is interesting for asynchronous patterns.

From the docs

Bacon.repeat(fn) Calls generator function which is expected to return an observable. The returned EventStream contains values and errors from the spawned observable. When the spawned observable ends, the generator is called again to spawn a new observable.

This is repeated until the generator returns a falsy value (such as undefined or false).

We can use this for a long-polling application where we want to keep a local mirror of a remote resource.

var ajaxParams = { url: "/resource.json", timeout: 1000000 }

var localResource = Bacon.repeat(function(){
	return Bacon.fromPromise($.ajax(ajaxParams)));
}).toProperty();

Warning: Lazy Evaluation

LE is a good thing because it can minimize the amount of iteration necessary, but it can lead to traps. Remember that functions are not evaluated when they’re called like in _ , they are evaluated when necessary, so try to not use mutable variables from another scope in a closure.

Function construction

Bacon has some jazz that saves you from always having to create functions manually for certain tasks.

At least onValue, onError, onEnd, doAction, map, filter, assign, takeWhile, mapError support function construction.

You can pass a handler an object and a method name.

Bacon.fromPromise($.get('/name.txt'))
	.toProperty("loading")
	.onValue($("#name"), 'text');

That’s equivalent to

Bacon.fromPromise($.get('/name.txt'))
	.toProperty("loading")
	.onValue(function(val){
		$("#name").text(val);
	})

var name = $('#name');
Bacon.fromPromise($.get('/name.txt'))
	.toProperty("loading")
	.onValue(name.text.bind(name))

You can also access attributes declarativly.

stream.map('.box.1')
// is the same as
stream.map(function(val){val.box[1]});
// you can map things to constants

Currying

When setting a handler, the remaining arguments gets passed to the function.

stream.map(function(text, value){ return text+value; }, "hello");

Buses

These are special EventStreams with a push method that adds events to the stream. They can also be composed with other EventStreams.

var bus = Bacon.Bus(); // creates bus
bus.push(1); // add value to stram
bus.end() // ends the stream
bus.error(error) // sends an error to all subscribers
bus.plug(stream) // funnels an event stream into the bus

Merging

Lastly, event streams can me merged. This is really useful.

bus.merge(eventStream)

Testing the Random-walk Hypothesis with a Dickey-Fuller Test

07 Oct 2015

I came across this excellent article on using NIST tests designed for random number generators to investigate the random-walk hypothesis in the stock market.

It reminded me of other ways to test for random walks from Econometrics.

The Dickey-Fuller Test

Suppose you model a time series with an autoregressive model.

\[y_t = \beta y_{-t} + u_t\]

where $y_t $ is the value of $y$ at time $t$, $\beta$ is a constant and $u_t$ is the random error term. Then the processes is called a random-walk if $\beta=1$, because then it’s next move only depends on the random error term. This hypothesis test is conducted under the null-hypothesis that $\beta=1$. If we fail to reject the null hypothesis, it is said that there is a “unit autoregressive root” or just “unit root”.

This is a very simplistic test. Some other tests for a unit root are the Augmented Dickey-Fuller and the Phillips–Perron tests.

Estimation Problems

A unit root can cause estimation problems because it implies that the underlying distribution is non-stationary, since the variance increases over time. If $\beta = 1$ in the following model,

\[y_t = \beta y_{-t} + u_t = y_{t-1} + u_t\]

then

\[y_t = y_0 \sum_{j=1}^t u_t\]

The variance is given by

\[\operatorname{Var}(y_t) = \sum_{j=1}^t \sigma^2=t \sigma^2\]

The variance diverges to infinite with time. When the stochastic process is non-stationary, OLS can produce invalid results despite high t-statistics and $R^2$. Granger and Newbold called such results “spurrious regressions.”

Notes

This was my first technical post! I’ll follow up with example code and more time-series techniques.

References

Dicky and Fuller https://www.jstor.org/stable/2286348?&seq=1

Granger and Newbold http://wolfweb.unr.edu/~zal/STAT758/Granger_Newbold_1974.pdf

Also Stata documentation for the Dickey-Fullet test

Introductions