Greg Dennis

Greg Dennis - Drej

Drej - A Java Regression Library

About
Drej (pronounced "dredge") is a simple, open-source, Java library for linear and non-linear least-squares regression and regularized least-squares classification. Clients can easily plug-in arbitrary kernels and data sets, and Drej will fit the optimal curve.

Download
Current Version: 1.2 (11/12/2005)
Drej requires the javax.vecmath package in the Java3D library.
Drej is licensed under the GNU General Public License.
Source and binaries available here.
View Javadoc API Specification here or download it here.

Example
Let's say you have n data points, each of which have m features. Your code will probably look something like this:

 GMatrix data = new GMatrix(m, n);
 GVector values = new GVector(n);

 // here you would put all your data points into the data matrix
 // each data point goes into a column of the matrix
 // put the actual values for those data points in the values vector
 // the data point in the ith column of the data matrix should have
 // the value in the ith entry in the values vector.
 // I believe some kernels only work when your range of possible values has
 // zero as a midpoint. for instance, if you're classifying data points into "yes"
 // and "no", best to choose their values as 1 and -1, as opposed to 1 and 0.

 // construct the kernel you want to use:
 Kernel kernel = . . .;

 // choose a penalty factor on the complexity of the solution
 // this helps to prevent overfitting the data
 // I was told me this number should be between
 // 10^-3 and 1, I often choose 0.5, but you can play with it
 double lambda = 0.5;

 // do the regression, which returns a function fit to the data
 Representer representer = Regression.solve(data, values, kernel, lambda);

That's basically it. What happens next depends on what you want to use it for. If you'd like to use the regression to predict the value of a data point y, just feed y into the representer function:

 double predictedValue = representer.eval(y);

If you'd like to calculate how well the function fits the data, you can first calculate the vector of values the representer would predict for your data points, subtract from that the vector of actual values, and take the norm squared of that difference. Let's call this the "cost". The lower the cost, the better the function fits the data. You can try out different kernels, and see which one yields the best-fit curve (the lowest cost):

 GVector predictedValues = Matrices.mapCols(representer, data);
 predictedValues.sub(values);
 double cost = predictedValues.normSquared();