Press "Enter" to skip to content

Neo4j? What is it?

 

What is Neo4j? Well, it is a database, but a different type of database, it is a graph database to be exact! And that is a really cool thing.

But what is this "graph database"? Let’s dig into it.

What is it?

A database is a piece of software that takes care of storing data and finding a specific data from its contents. Simple as that. Most of you probably know the relational databases, like Oracle, MySQL, MS SQL, PostreSQL and all other SomethingSQL. These are examples of traditional databases, their root design comes back to 1974 when SQL language first appeared (at least according to Wikipedia). Their common denominator is that they use some flavor of SQL language and they store data in a relational way.

Neo4j is something different, it’s a NoSQL database, precisely, a graph database. How does it differ? First of all, it doesn’t use SQL language, as the name suggests. But there are much more differences.

In a relational database, we have Tables. Tables are the data store of the relational database. They define what data can be stored in them and in what format. So, for example, we can have table User and it has defined columns: Id, Login, FirstName, LastName, Email, RegistrationDate. Each of these columns must have defined data type, it’s format, it’s precision etc. Also, we can define a relationship between tables using constraints that link two columns in tables. We can add requirements and validation on values etc. All this definition is called a Schema. It’s is a really rigid definition of data format. The database will not let us store a data in a format other than allowed by the schema.

Now, Neo4j is different, a lot different. First of all, there is no schema and no constraints. We can add whatever we want. Here there are two most important parts. Nodes and Edges. It shouldn’t be a surprise to you because every graph consists of these two. Each node can have multiple Labels, that act like a tag. Each node can have any numbers of Labels. So, for example, you can mark a node as a Person and Author to indicate that it belongs to those two groups. In Neo4j, relationships are defined by edges, each edge can only connect two nodes in one direction. If you want to connect it in both directions, then you have to create another edge. Edge also must have a label, only one label, that describes it. Our data can look like that for example:

Our data can look like that for example:

 

Nodes and edges may be not enough to store all our needed data. Fortunately, Neo4j supports storing additional data. Every node and every edge can have any number of simple properties. It really is just like a flat JSON. We can’t have here any complex nested structures, so if we have requirements like that, we need to store them somehow else – for example, another database! and link them together somehow. But that’s a totally different topic.

Queries

So now you know what Neo4j is, but how to use it? Well, if you want to get some data from it, you need to run a query like in any other database. In the case of relational databases, most of the time we use SQL language, in Neo4j we can use Cypher. Cypher is a new query language that was developed for Neo4j.

With SQL queries, we have to specify how to find the desired data, what data we want and from where to get it:

We can also select only a part of data with specific values:

With Cypher we define what we want to find, not how:

We define a MATCH, the thing that we are looking for, we are looking for a node with the label :User, assign it to the variable user and return it as a result. We can also query for specific property values. We have two options here. Either we define property values on node:

Or we define a WHERE clause:

Ok, these were simple examples, but how do more complicated queries?

In SQL, most of the time we have to use joins to extract data from several tables. Something like that:

Above query returns a list of post titles with author’s name. As you see, we had to join both tables using a foreign key on Posts table. In Cypher we do it a bit different:

Here we define that we are looking for a relation IS_AUTHOR between nodes User and Post, and we return only desired properties. Easy, isn’t it? 🙂

Here are few more simple examples in Cypher:

As you see, querying with Cypher is not hard, it’s very visual, you can clearly see what we are looking for. Of course, these are really simple queries, and the more complicated stuff you need, the more complex queries do get.

How to add data?

In SQL first, we have to create the schema, create tables, constraints, define columns, data types. That is a lot of work. Only after we will create this foundation, we will be able to add some data to the database. In SQL we do this with INSERT expression. We define what we want to put in and into where.

Now, in Cypher we do it like that:

Or if we want to create few nodes with a relationship in one go:

If we want to add a relationship to existing nodes, we first have to find them with MATCH:

As you see, creating nodes and edges in Cypher is very easy, we don’t have to worry about data types, each node can even have completely different properties.

What about updates?

Ok, we know how to find data, we know how to create new nodes, but what about updating existing? In SQL we have an UPDATE statement, that works like that:

We specify which table to update, what property to set and on which row to do it. It’s not really complicated. In Cypher, it is similar:

First, we have to find our node or edge anyway we want, and then we just SET a property that we want to change.

Can I play with it?

Yes, you can. Using Neo4j is really easy. You just have to download it and install from their website or use a docker image to run it. Personally, I prefer docker, you can use command bellow to run it:

When it’ll finish booting up, you can head to http://localhost:7474 in your browser to open server console. You have to set an admin password and you are all set and ready to go.

Summary

As you see, Neo4j, at its core, has a really simple concept. We have just nodes and edges that connect them. We can add properties anywhere we need them – we are not constrained with the schema. Using Cypher is also a breeze, it’s really easy to learn, at least the basics. But this allows you to jump into graph databases with ease. As far as I know, there are some technologies that support Java development with Neo4j like Spring Data Neo4j and Querydsl. I plan to explore these technologies in future posts, so stay tuned for more graph goodness.


Also published on Medium.