Friday, September 9, 2011

Create pluggable REST endpoints in elasticsearch

A quick introduction on how to create a plugin in elasticsearch that allows you to define new REST endpoints.

Monday, August 29, 2011

Machine Learning Ex2 - Benchmarks

In my previous post, I implemented the algorithm for linear regression using gradient descent in Scala using two different methods: standard builtin mathematical methods and Scalala, a Scala linear algebra library.

Shortly after writing the solution, I started wondering whether using Scalala had any impact on the solution's runtime. While Scalala does have the overhead of object creation, it also makes heavy use of specialized classes, which should provide a considerable improvement.

I decided to do some naive benchmarking. These benchmarks are nowhere near scientific, but should provide a general sense of the solution's runtime. Since I was benchmarking the two Scala solutions, I decided to also look at the MATLAB/Octave and R solutions.

Sunday, August 21, 2011

Machine Learning Ex2 - Linear Regression

Implementing linear regression using gradient descent in Scala based on Andrew Ng's machine learning course.
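
As a rough illustration of the technique, here is a minimal sketch of batch gradient descent for a one-variable linear regression. It is written in Python for brevity and is not the Scala code from the post; the function name, learning rate, iteration count, and toy data are all assumptions made for this example.

```python
def gradient_descent(xs, ys, alpha=0.07, iterations=1500):
    """Fit y ~ theta0 + theta1 * x by batch gradient descent.

    alpha and iterations are illustrative defaults (assumptions),
    not values taken from the post.
    """
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        # Prediction error for every training example.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error cost.
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Update both parameters simultaneously.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1


# Hypothetical noise-free toy data generated from y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
theta0, theta1 = gradient_descent(xs, ys)
```

On this toy data the fitted parameters should approach theta0 = 1 and theta1 = 2. A learning rate that is too large would make the updates diverge, so alpha usually needs tuning for real data.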

Tuesday, August 9, 2011

Steal this database? Don't mind if I do.

A while back, Meetup.com issued a pseudo-challenge: steal their database. Nothing that would result in the FBI knocking on your door, mind you, but a look into their streaming API. Meetup.com streams all their public events and RSVPs via HTTP streaming or HTML5 WebSockets, so all that is required to steal their database is a connection to a stream and the ability to save the content.

Sunday, July 24, 2011

Next NoSQL Meetup: Real time processing with In-Memory-Data-Grid and NoSQL

The NoSQL NYC Meetup has been enjoying quite a year, with great talks from across the NoSQL world: document stores, graph databases, key-value stores, you name it. What I have learned from all these talks is that nothing wows a crowd more than a live demo. Marko Rodriguez's demonstration of Gremlin using OpenRDF Sail data not even on his computer was particularly fascinating.

That is why I am excited to see what Shay Hassidim from GigaSpaces brings this week. He is promising live demos working with big data, the type of data that makes NoSQL systems a necessity and not merely a premature optimization. The talk will cover data access patterns, MapReduce, Cassandra, and MongoDB.

If you are in New York City, stop by!
http://www.meetup.com/nosql-nyc/events/25379481/