A few weeks ago, the first-ever RJMetrics hackathon took place at our Philadelphia headquarters. I decided to throw my hat into the ring with a project I’d been thinking about for a while: a MySQL to MongoDB query translator.
This was a unique challenge because MongoDB and MySQL are very different technologies that store data in very different ways. To some, translating between them might seem like a non-sequitur. However, I knew there was a use case because of my personal experience learning MongoDB. I would often think about queries in terms of SQL syntax, and a translator like this would have greatly softened the learning curve.
The final product is available at our Query Mongo site, and I encourage you to give it a try. It’s not perfect, but we hope it will be a helpful learning tool for the many people who have SQL experience and are getting started with MongoDB.
In this blog post, I’ll provide some insights into how this tool works.
My Starting Point
Going into the hackathon, there were a few pre-built pieces of our source code that would prove critical to this project. (RJMetrics helps online businesses make smarter decisions using their data, so we’ve built up quite a few libraries related to query management.)
The first piece is our “query object” system, which allows us to represent relational database queries as objects that can be interpreted by our code. These objects allow us to represent most SQL-style queries independent of the platform on which they will ultimately be run. In other words, the same exact query object could be built by our code, and then used to extract data from MySQL, PostgreSQL or Microsoft SQL Server.
Obviously, these query objects require a “query renderer” to translate them into the SQL language of our choice. Our rendering system is pretty robust for the SQL platforms I mentioned, but was extremely weak for MongoDB (only the simplest SELECT queries would work).
The Missing Pieces
First and foremost, I knew that one of the keys to actually finishing this project was to limit its scope to the most common use cases. I decided that my translator would only support SELECT queries, would not have support for JOINs across multiple tables and would have limited support for complex WHERE clauses including embedded parentheses. These limitations may change over time but, in the scope of the hackathon, they were a necessity.
With a focused scope, I was ready to start coding. First, I significantly improved the query renderer’s support of MongoDB queries to include support for query properties such as:
- aggregate functions like COUNT(), SUM(), AVG(), MIN(), and MAX()
- WHERE clauses, including IN statements and the OR operator
- GROUP BY clauses
- ORDER BY clauses
- LIMIT clauses
Next, I needed a system for taking a MySQL query string and translating it into one of our query objects. Before the hackathon, these query objects could only be built programmatically. Now, we needed to be able to parse human input so that we could build query objects that then rendered into MongoDB query strings.
I found a sweet open-source MySQL parser written in PHP that served as a starting point for tokenizing MySQL strings. Once I had a traversable representation of the query string, I had to interpret each of the query components, translate them into our query object components and build in robust error handling for query properties our objects don’t support.
The Final Product
You can play with the translator by visiting Query Mongo. I did my best to issue helpful error messages when the translator doesn’t support something or your input query is malformed. Comments or suggestions are always welcome, and I’ll do my best to continue improving on the translator over time. Happy querying!
Pingback: Geek Reading December 12, 2012 | Regular Geek