A few years ago I was sitting with a bunch of grad students in a data analytics class joking about R package names and ML terminology: e1071, partykit, confusion matrix, artifical nueral network, Perceptron, Belief Nets, Maximum a Posteriori. Take them of context and pretend you don’t know what they mean. (If you don’t, you’re probably a lot saner), and you’ll realize these words sound really cool. My vote for coolest name goes to Random Forest, but Confusion Matrix sounds awesome, too. My grad friends and I started nominating these for band names and song titles.

Perceptron comes out with it’s latest album, Random Forest, featuring its hit single Tangled in the Neural Net

Thus, I’m writing about Rademacher Complexity because of the cool title. Also, to further the liberation of machine learning knowledge from graduate school textbooks.

RC solves the problem of estimating over-fitting. In a PAC learning scenario, where we pick an Empirical Risk Minimizer $g$, our out of sample error can be thought of as how well we fit our training set plus a penalty on how much we overfit.

[1] $$E_{out}(g) = E_{in}(g) + overfitpenalty$$

The Rademacher penalty estimates the overfit penalty for classification tasks. For a more general estimate, see the permutation bound. The insight to the Rademacher penalty is that it estimates the overfit penalty by computing how much we overfit a dataset for which we can compute the optimism.

In a binary classification task, we copy the data and replace each original label with a label drawn from the “Rademacher Distribution,” a distribution where each label has an equal chance of being drawn $P (y_n =1/2)$. From there, we train the model on the data, minimize the in-sample error, and see how well we fit the noise. We call that error $E_r$.

$\hat E_{out} = E_{in} + (1/2 - E_r)$

What’s nice about the Rademacher penalty is that it is simple, intuitive, and data dependent. It accounts for how well the training set lends itself to over-fitting. That makes it tighter than the worst-case VC Bound.

$P \left(\text{test error} \leq \text{training error} + \sqrt{h(\log(2N/h)+1)-\log(\eta/4)\over N} \right) = 1 - \eta$

These two bounds are related. For a constant $C$ and VC dimenion $d$, the Rademacher Complexity is bounded by

$C\sqrt{\frac{d}{n}}$

### Other tools

Rademacher Complexity is a good tool for model selection and controlling over-fitting in classification scenarios. It’s great because it directly estimates $E_{out}$ without a test set.

There are some other approaches you could take to estimate $E_{out}$. If you did the same thing using normally distributed variables, it’s called the Gaussian Complexity.

You can also compute a similar estimate by permuting the labels of the training data. It works for regression as well as classification, and has a tighter bound because it takes the original output labels into account. It’s called the permutation estimate and has an associated bound [3].

Confusion Matrix’s, new music video Probably Approximately Correct goes viral.

1. Abu-Mostafa, Magdon-Ismail, Lin: Nov-2014, Page 30, e-chapter 9, Learning From Data.
2. Peter L. Bartlett, Shahar Mendelson (2002) Rademacher and Gaussian Complexities: Risk Bounds and Structural Results. Journal of Machine Learning Research 3 463-482
3. Magdon-Ismail, Permutation Complexity Bound on Out-Sample Error, http://www.cs.rpi.edu/~magdon/ps/conference/PermCompNIPS2010.pdf

# What I Like About Bacon.js

These are notes from a class I taught on bacon.js

## Event Listeners become Event Streams

Wrap your events in bacon! This common pattern

$("#clickme").on('click', function(event){ alert(event.target)})  becomes var clicked =$("#btn").asEventStream('click');


You can give asEventStream the same arguments as jQuery.on.

$("div").asEventStream("click", ".specific-selector",function(e){})  ## Promises Bacon can also create a stream by wrapping a promise object. Bacon.fromPromise($.ajax({ url : "/sup.json"})) ;


or an event

Bacon.fromEvent(document.body, "click");


## Properties

Bacon provides an abstraction for the common pattern in which you want to update a value asynchronously. It does this by allowing streams to become Properties, which are like streams but take the last value passed as the current value.

stream.toProperty()


You can give .toPoperty a starting value until data gets sent down the stream.

stream.toProperty(1)


This will contain the default value of true until it changes, then keep the the latest value.

.map(function(event) { return $(event.target).val() })  You may be familiar with group, flatMap, join, reduce, combine, transform, skip, filter, take, merge, first, transform. See http://www.cheatography.com/proloser/cheat-sheets/bacon-js/ for a complete list with descriptions. ## Defining your own event sources You can define your sources using Bacon.fromBinder. It takes a function that gets passed a function to send events and returns a function that is called to unsubscribe from the event stream. var event = Bacon.Next("This is a way to create a Bacon event"); var myStream = Bacon.fromBinder(function(send){ send(1); // can send values send(event); // can take an event object send(Bacon.Error("fail")); // or an error send(new Bacon.End()); // returning Bacon.End indicates the stream is over }); var rsvps = Bacon.fromBinder(function(sink) { must.Rsvps(sink); return$.noop;
});


## Repeat

Repeat is interesting for asynchronous patterns.

From the docs

Bacon.repeat(fn) Calls generator function which is expected to return an observable. The returned EventStream contains values and errors from the spawned observable. When the spawned observable ends, the generator is called again to spawn a new observable.

This is repeated until the generator returns a falsy value (such as undefined or false).

We can use this for a long-polling application where we want to keep a local mirror of a remote resource.

var ajaxParams = { url: "/resource.json", timeout: 1000000 }

var localResource = Bacon.repeat(function(){
return Bacon.fromPromise($.ajax(ajaxParams))); }).toProperty();  ## Warning: Lazy Evaluation LE is a good thing because it can minimize the amount of iteration necessary, but it can lead to traps. Remember that functions are not evaluated when they’re called like in _ , they are evaluated when necessary, so try to not use mutable variables from another scope in a closure. ## Function construction Bacon has some jazz that saves you from always having to create functions manually for certain tasks. At least onValue, onError, onEnd, doAction, map, filter, assign, takeWhile, mapError support function construction. You can pass a handler an object and a method name. Bacon.fromPromise($.get('/name.txt'))
.onValue($("#name"), 'text');  That’s equivalent to Bacon.fromPromise($.get('/name.txt'))
.onValue(function(val){
$("#name").text(val); })  Or var name =$('#name');

## Notes

This was my first technical post! I’ll follow up with example code and more time-series techniques.

## References

Dicky and Fuller https://www.jstor.org/stable/2286348?&seq=1

Granger and Newbold http://wolfweb.unr.edu/~zal/STAT758/Granger_Newbold_1974.pdf