Why a Graph Database?
Experiments
In typical developer fashion, I used a project as an excuse to play with new tools. In the process I ended up trying three different databases that are compatible with Java: OrientDB with the TinkerPop Blueprints API, Neo4j using both its web (REST) and embedded Java APIs, and Apache Jena's implementation of RDF.
OrientDB with TinkerPop Blueprints
OrientDB is a fantastic document store that is especially notable for its embedded Java implementation. If you ever need to store and query semi-structured data within an application, either in memory or in a local cache, OrientDB is a great way to go.
When it comes to graph storage, things get a little more complicated. It is entirely possible to use the native OrientDB API for graph storage, but that approach makes querying the data more difficult. To get around this, I tried the Blueprints plugin for OrientDB, which gave me a more standardized wrapper for creating and querying nodes in the graph. The idea was that I could switch data stores if OrientDB turned out not to be performant enough. Unfortunately, the Blueprints plugin for OrientDB is not yet very stable, which made it very difficult to do things such as indexing nodes by an attribute. This led to some very ugly code and an external index kept in Redis.
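The external index boiled down to a separate lookup from attribute values to vertex ids, maintained outside the graph. Below is a minimal sketch of that bookkeeping, assuming a plain in-memory HashMap standing in for Redis and a hypothetical ExternalIndex class. It does not touch the Blueprints API at all, so it illustrates the idea rather than the actual integration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical stand-in for the Redis-backed external index: maps
 * (attribute, value) pairs to the ids of vertices that carry them.
 * A plain HashMap replaces Redis here, so nothing is persisted and
 * the class is not thread-safe.
 */
class ExternalIndex {
    private final Map<String, Set<Object>> index = new HashMap<>();

    // Compose a single lookup key, similar to a Redis key such as "email:a@example.com".
    private static String key(String attribute, Object value) {
        return attribute + ":" + value;
    }

    /** Record that the vertex with this id has attribute = value. */
    void put(String attribute, Object value, Object vertexId) {
        index.computeIfAbsent(key(attribute, value), k -> new HashSet<>()).add(vertexId);
    }

    /** Drop a stale entry, e.g. after the vertex's attribute changes. */
    void remove(String attribute, Object value, Object vertexId) {
        String k = key(attribute, value);
        Set<Object> ids = index.get(k);
        if (ids != null) {
            ids.remove(vertexId);
            if (ids.isEmpty()) {
                index.remove(k);
            }
        }
    }

    /** Return a copy of the ids of all vertices with attribute = value (possibly empty). */
    Set<Object> lookup(String attribute, Object value) {
        return new HashSet<>(index.getOrDefault(key(attribute, value), Set.of()));
    }
}
```

Much of the ugliness comes from the fact that every vertex write now has to be mirrored by put and remove calls on the side index, and the two can drift apart if either write fails.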
So I gave up using OrientDB.
Neo4j
Neo4j is a popular graph database that benefits from a relatively expressive query language, Cypher, which allows for some very complex queries and update operations. Neo4j can run either as a standalone server or as an embedded database within a Java application.
When running as a standalone server, Neo4j provides a very clean REST API for adding, querying and updating nodes. Unfortunately, there are no wrapper APIs that let a Java application access the database remotely; you must write your own. This is especially problematic because there is no binary remote protocol, so communication is more verbose and slower than it needs to be.
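Writing your own wrapper mostly means building JSON by hand and posting it over HTTP. Here is a minimal sketch of what that might look like, assuming Java 11's java.net.http client and the transactional Cypher endpoint (/db/data/transaction/commit) exposed by Neo4j 2.x servers; check the path and payload shape against the documentation for the version you run. The class name Neo4jRestClient is hypothetical, and only string parameters are supported so that no JSON library is needed.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Map;
import java.util.StringJoiner;

/**
 * Hypothetical minimal client for Neo4j's REST interface. Assumes the
 * Neo4j 2.x transactional Cypher endpoint and string-valued parameters only.
 */
class Neo4jRestClient {
    private final HttpClient http = HttpClient.newHttpClient();
    private final URI commitUri;

    Neo4jRestClient(String baseUrl) {
        this.commitUri = URI.create(baseUrl + "/db/data/transaction/commit");
    }

    /** Escape a Java string as a JSON string literal. */
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    // Other control characters must be hex-escaped in JSON.
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    /** Build the JSON body for a single Cypher statement with its parameters. */
    static String body(String cypher, Map<String, String> params) {
        StringJoiner p = new StringJoiner(",", "{", "}");
        params.forEach((k, v) -> p.add(jsonString(k) + ":" + jsonString(v)));
        return "{\"statements\":[{\"statement\":" + jsonString(cypher)
                + ",\"parameters\":" + p + "}]}";
    }

    /** Build (but do not send) the POST request for one statement. */
    HttpRequest request(String cypher, Map<String, String> params) {
        return HttpRequest.newBuilder(commitUri)
                .header("Content-Type", "application/json")
                .header("Accept", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body(cypher, params)))
                .build();
    }

    /** Send the statement and return the raw JSON response text. */
    String execute(String cypher, Map<String, String> params)
            throws IOException, InterruptedException {
        return http.send(request(cypher, params), HttpResponse.BodyHandlers.ofString()).body();
    }
}
```

A call such as execute("MATCH (p:Person {name: {name}}) RETURN p", Map.of("name", "Ada")) uses the {name} parameter syntax of Cypher in Neo4j 2.x. Even this one statement travels as text-encoded JSON in both directions, which is the verbosity problem described above.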
Because of this limitation, I ended up using Neo4j as an embedded database within my application. To simplify querying, I used a 2.0 preview release so that I could assign labels to nodes and query them by label (essentially by "class"). Everything seemed to go very smoothly, and writing queries in Cypher proved relatively easy.
Unfortunately, I ran into some stability issues in the multi-threaded environment of my application. Eventually I was able to iron these out by upgrading to the next preview milestone and restricting all updates and queries to a single thread. This made high-volume updates very slow, but I mitigated the problem by putting a Redis cache in front of the graph storage, which let me detect data changes and only perform updates when something had actually changed.
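The combination of a single writer thread and a change-detection cache can be sketched in plain Java. This is a minimal sketch under stated assumptions: the class SerializedGraphWriter is hypothetical, a ConcurrentHashMap stands in for Redis, and the actual graph write is passed in as a callback rather than calling any Neo4j API.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiConsumer;

/**
 * Hypothetical sketch: all graph writes go through one thread, and a
 * change-detection cache (a ConcurrentHashMap standing in for Redis)
 * skips writes whose value has not changed since the last successful update.
 */
class SerializedGraphWriter implements AutoCloseable {
    // A single worker thread, so graph updates never run concurrently.
    private final ExecutorService graphThread = Executors.newSingleThreadExecutor();
    // Last value written for each key; Redis played this role in the real setup.
    private final Map<String, String> lastWritten = new ConcurrentHashMap<>();
    // The actual graph update (e.g. running a Cypher statement), supplied by the caller.
    private final BiConsumer<String, String> graphUpdate;

    SerializedGraphWriter(BiConsumer<String, String> graphUpdate) {
        this.graphUpdate = graphUpdate;
    }

    /**
     * Queue an update for key -> value. The Future yields true if the graph
     * was written, or false if the value was unchanged and the write was skipped.
     */
    Future<Boolean> submit(String key, String value) {
        return graphThread.submit(() -> {
            if (Objects.equals(lastWritten.get(key), value)) {
                return false; // unchanged: skip the slow graph write
            }
            graphUpdate.accept(key, value);
            lastWritten.put(key, value); // record only after the write succeeded
            return true;
        });
    }

    @Override
    public void close() {
        graphThread.shutdown();
    }
}
```

Doing the comparison on the writer thread keeps check-and-write atomic. Checking the cache before submitting would keep unchanged items out of the queue entirely, but two concurrent callers could then race on the same key.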