Monday, January 15, 2018

Whither NoSQL

When I first started the NoSQL NYC meetup back in 2010, it was the first NoSQL group of its kind on Meetup.com. I did not create the group because I was an expert on the subject; I created it because I knew nothing and wanted to learn more. So many new database types were being released around this time. Hadoop started small in 2006, CouchDB became an Apache subproject in 2008 and MongoDB was released in 2009. Developers looking to move beyond whatever limitations they had with SQL databases now had choices. However, developers experienced in these technologies were few, so the meetup was created to bring like-minded people together.

By the time I hosted my last meetup before moving to California almost two years later, group membership was over a thousand of these like-minded developers. Interest was high and events were full. Document stores, graph databases, distributed filesystems, not a line of SQL in sight.

It was sad to leave the group behind, but it was left in the capable hands of a new organizer. Not long after, the number of meetups began to dwindle. It saddened me even more that the group I created and grew over those two years was being neglected. Should I have picked a different organizer? Should I have maintained an active role in organizing? The truth is that the new organizer was not negligent in his duties; NoSQL had simply become the new norm. There was a better chance that a new startup was using MongoDB than MySQL. There was no longer any need to lump all these new databases under one convenient term.

NoSQL was never against the SQL syntax, but simply an alternative to relational databases. Over time, even though the software development world fully embraced these new concepts, the term NoSQL simply went away. You will hear references to the CAP theorem, to eventual consistency, to document stores, but not NoSQL.

While the name is gone, the spirit will continue. Thanks for the memories.

Saturday, February 15, 2014

Document boosting in Elasticsearch

There has been some discussion on the Elasticsearch mailing list lately about applying index-time boosts at the document level, aka document boosting. The practice has been discouraged (in fact, the Elasticsearch team has officially deprecated document boosting) in favor of query-time scoring, but without any detailed explanation why. Instead of repeating myself each time the question is asked, I have decided to detail the various reasons why document boosts should no longer be used.

Tuesday, July 24, 2012

How private is your online job profile?


If you have been in the workforce for a considerable amount of time, you might have a job profile on a job site or two. Although using one's network is perhaps one of the best ways to secure a new position, I found myself once again using a job site to find a new position in an area where I had no connections.

For this job hunt, I posted my resume on Monster.com and Dice.com, two of the most popular job sites in the US. Once I secured a new job, I set all my online job profiles to Private in order to stop receiving emails about other opportunities. However, after a few months, I received a curious email that started off:

"I found your resume on Monster.com and wanted to run a new Java Developer position in Dulles, VA by you."

The email was sent via Monster using their anonymous mailer <Monster> anonredir@route.monster.com and the recruiter introduced themselves using my first name.

I quickly assumed that I had left my profile as Public on Monster. However, my resume was in fact Private. Here is what my profile looks like:


How would this situation be possible? How can a recruiter contact me via Monster if my resume is set to Private? I regarded the email as a fluke and thought nothing of it. Then I received two more emails from the same recruiter. I quickly sent an email to Monster to understand how it could be possible to receive emails via my private resume. Their response was as follows:

The reason employers may be contacting you is that at one time your resume was set to Public or you applied to an employers job posting. In this case, a resume that you used to apply online for a job or that was searchable, employers, recruiters, and others who have paid for access to the Monster resume database, or have paid to obtain a copy of that database, as well as parties who have otherwise gained access, may have retained a copy of your resume in their own files or databases. Monster is not responsible for the retention, use, or privacy of resumes in these instances.

Let us break down the different scenarios on how a recruiter might have my information:

1) "at one time your resume was set to Public"

My job hunt was not a secret. I was moving to a new area and I had already left my previous position. Setting my resume to Public helped me identify companies I might not have found myself, since they would be able to contact me (nothing interesting came up, but that's another story). Simply setting my resume to Public meant any employer or recruiter with access to Monster could save my profile ad infinitum, regardless of any privacy changes I might make later.

2) "or you applied to an employers job posting."

Not applicable in my case, but it is an acceptable scenario for someone's profile to be viewable.

3) "and others who have paid for access to the Monster resume database or have paid to obtain a copy of that database"

Is Monster.com telling me that even if someone sets their profile to Private, Monster.com can still sell their resume database to anyone with a checkbook? There is no opt-out from having your information sold. According to Monster.com's own online FAQ, a Private profile is defined as:

"If you select private, your resume will not be seen by employers conducting resume searches. However, you can use your private resume to apply for jobs."
"If you select private as your resume status, your resume will not be seen by employers conducting resume searches (but you can still use your private resume to apply for specific jobs online)."

Monster.com has two conflicting viewpoints on how Private is defined. Further in the FAQ, regarding the deletion of a resume, Monster.com states:

"If you delete a resume that you used to apply online for a job or that was searchable, employers, recruiters, and others who have paid for access to the Monster resume database, or have paid to obtain a copy of that database, as well as parties who have otherwise gained access, may have retained a copy of your resume in their own files or databases. Monster is not responsible for the retention, use, or privacy of resumes in these instances."

This policy is similar to their Private resume policy. If a resume was deleted or set to Private, why does Monster.com allow someone to email the candidate in question? The email was not sent directly, but via Monster.com's email system. Monster.com can shut off access at any time, but chooses not to. Nowadays, many of us have a virtual resume via a LinkedIn profile, but LinkedIn gives its users granular control over what is publicly available. How much information does Monster.com actually sell? How much privacy are we entitled to when searching for a new job?

Friday, September 9, 2011

Create pluggable REST endpoints in elasticsearch

A quick introduction on how to create a plugin in elasticsearch that allows you to define new REST endpoints.

Monday, August 29, 2011

Machine Learning Ex2 - Benchmarks

In my previous post, I implemented the algorithm for linear regression using gradient descent in Scala using two different methods: standard builtin mathematical methods and Scalala, a Scala linear algebra library.

Shortly after writing the solution, I started wondering whether using Scalala had any impact on the runtime cost of the solution. While Scalala does have the overhead of object creation, it also makes heavy use of specialized classes, which should provide a considerable improvement.

I decided to do some naive benchmarking. These benchmarks are nowhere near scientific, but they should provide a general sense of each solution's runtime. Since I was benchmarking the two Scala solutions, I decided to also look at the MATLAB/Octave and R solutions.
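To give a rough idea of what such a naive benchmark can look like, here is a minimal sketch. It is written in Python rather than Scala and is not the code measured in this post; the function names (`gradient_descent`, `naive_benchmark`), the toy data, and the warmup and run counts are all assumptions chosen for illustration.

```python
import time


def gradient_descent(xs, ys, alpha=0.1, iterations=1500):
    """Fit y = theta0 + theta1 * x with batch gradient descent (illustrative only)."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        # Prediction error for every training example under the current parameters.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error cost.
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Update both parameters simultaneously.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1


def naive_benchmark(fn, warmup=2, runs=5):
    """Time fn a few times and return (best, mean) wall-clock seconds."""
    for _ in range(warmup):
        fn()  # untimed runs so one-off startup costs are not measured
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings), sum(timings) / len(timings)


# Hypothetical toy data generated from y = 2 + 3x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 + 3.0 * x for x in xs]

theta0, theta1 = gradient_descent(xs, ys)
best, mean = naive_benchmark(lambda: gradient_descent(xs, ys))
print(f"theta0={theta0:.4f} theta1={theta1:.4f}")
print(f"best={best:.6f}s mean={mean:.6f}s")
```

Wall-clock timings like these vary with machine load and runtime warmup, which is part of why they give only a general sense of relative cost.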