Week #4 – Big problems

It is the fourth week of work on the Commutee project. Well, technically it is the fifth week, but I didn’t have much time to work on this project, so let’s pretend that it is week number four 😉

I have started thinking about the Neo4j data model for all the public transport data that I have been parsing lately, and I have a big problem here…

The problem is that I don’t know how to model it! And that is a really big problem.

The model should let me easily query for connections between two points in space and, as a result, get back a list of bus stops, departure times and bus lines that I should use to get to my destination.

My initial idea is the following. Each bus stop is a separate node, and each of these nodes is connected to every other bus stop node that can be reached from it directly. There should be a relation for each possible type of connection: for each bus line, each tram line, the subway, or even walking if it is not far. For example, we have bus stops A, B, C, D and E. We can get from A to B and from B to C using bus line 175, and from D to E using bus line 205. Also, we can get from C to D by walking.

Something like that, in my pseudo Cypher:
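
A minimal sketch of the example graph, written as a small Python helper that prints the Cypher so the result can be checked. The `Stop` label and the `name`, `type` and `line` properties are names I am assuming here, nothing about them is settled yet.

```python
# Example connections from the text: (from stop, to stop, connection type, line).
# The "Stop" label and the "name", "type" and "line" properties are assumed names.
CONNECTIONS = [
    ("A", "B", "bus", "175"),
    ("B", "C", "bus", "175"),
    ("C", "D", "walk", None),
    ("D", "E", "bus", "205"),
]


def to_cypher(connections):
    """Build the Cypher CREATE clauses for the stops and their connections."""
    stops = sorted({name for a, b, _, _ in connections for name in (a, b)})
    lines = [f"CREATE ({s.lower()}:Stop {{name: '{s}'}})" for s in stops]
    for src, dst, kind, line in connections:
        props = f"type: '{kind}'"
        if line is not None:
            props += f", line: '{line}'"
        lines.append(
            f"CREATE ({src.lower()})-[:CONNECTS_TO {{{props}}}]->({dst.lower()})"
        )
    return "\n".join(lines)


print(to_cypher(CONNECTIONS))
```

Each stop becomes one node and each way of getting from one stop to the next becomes one CONNECTS_TO relation, so two lines serving the same pair of stops would simply be two relations.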

With a model like this, I should be able to query for a connection. Given two points in space, each a pair of coordinates, I would be able to find the starting and ending bus stops, as each stop has its own coordinates (I didn’t include them in the example above). Now, with those two nodes, I could search for a list of nodes connected by the CONNECTS_TO relation. Having a list like that, I could later analyze it some more, taking into account other factors like the number of stops, the distance to walk etc. This part looks sort of OK. I have not tried it yet, but I think that it should work as intended.
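
To check the idea without a database, the search between two stops can be sketched in plain Python as a breadth-first search over the CONNECTS_TO relations. In Neo4j this would presumably be a path query instead. The tuple layout below is my assumption, and the sketch only minimises the number of stops, it knows nothing about times or distances.

```python
from collections import deque

# Same example graph as in the text; every tuple is one CONNECTS_TO relation:
# (from stop, to stop, how to travel). The tuple layout is an assumption.
CONNECTIONS = [
    ("A", "B", "bus 175"),
    ("B", "C", "bus 175"),
    ("C", "D", "walk"),
    ("D", "E", "bus 205"),
]


def find_route(connections, start, end):
    """Breadth-first search: returns the legs of a route with the fewest stops."""
    outgoing = {}
    for src, dst, how in connections:
        outgoing.setdefault(src, []).append((dst, how))

    queue = deque([(start, [])])
    visited = {start}
    while queue:
        stop, legs = queue.popleft()
        if stop == end:
            return legs
        for nxt, how in outgoing.get(stop, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, legs + [(stop, nxt, how)]))
    return None  # no connection between the two stops


print(find_route(CONNECTIONS, "A", "E"))
```

The returned list of legs is the raw material for the later analysis: counting stops, summing walking distance and so on.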

The problem

The problem comes when I take timetables into account.

Each timetable, for each line, is valid only in a certain date range and on certain days of the week. That means that I should have a version of all these connections for each possible day, or at least for today and the next few days. That adds some complexity. But let’s say that I could add more attributes to the CONNECTS_TO relation, for example a pair of timestamps, validSince and validTill. That would somehow do it, but it is not a solution that I really like. I don’t know why, it just feels bad.
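
A rough sketch of what the validity check could look like if the attributes lived on the relation. `valid_since` and `valid_till` mirror validSince and validTill from above. The `weekdays` set is an extra attribute I am assuming to cover the "certain days of the week" part, and the dates are made up.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Connection:
    src: str
    dst: str
    line: str
    valid_since: date      # first day the timetable applies (validSince)
    valid_till: date       # last day the timetable applies (validTill)
    weekdays: frozenset    # assumed extra attribute: 0 = Monday ... 6 = Sunday


def is_valid_on(conn, day):
    """True if the connection's timetable applies on the given day."""
    in_range = conn.valid_since <= day <= conn.valid_till
    return in_range and day.weekday() in conn.weekdays


# Made-up example values, only for illustration.
workdays = Connection("A", "B", "175", date(2017, 3, 1), date(2017, 6, 30),
                      frozenset({0, 1, 2, 3, 4}))

print(is_valid_on(workdays, date(2017, 3, 27)))  # a Monday inside the range
print(is_valid_on(workdays, date(2017, 4, 1)))   # a Saturday inside the range
```

Every relation would have to pass a filter like this before the path search even starts, which is part of why the approach feels heavy.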

Another problem with timetables is the… timetable itself. A timetable consists of a list of departure times for a line. At each stop, each line has a bunch of different departure times: some have gaps in them, some depart every 5 minutes, some every 30 minutes etc. It is a lot of data, and I think that I shouldn’t store it in the graph. I can’t see a good way to do it. Of course, I could just dump it there somehow, or create a CONNECTS_TO relation for each departure time. But that would create a massive number of relations. And this also feels bad to me.

Another option

As a possible solution for storing all this data, I was thinking about using another database that would store just the timetables. I could use some kind of NoSQL document database here, so that each line at each stop would have its own document.

That way I could first use Neo4j to find all the connections and stops, then use the other database to fetch all the needed time data. It doesn’t sound that bad, but it also feels bad to me.
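
The two-step lookup could be sketched like this, with a plain dictionary standing in for the document database. The key layout, the `departures` field and all the times are assumptions made up for illustration, and travel time between stops is ignored to keep it short.

```python
from bisect import bisect_left

# Stand-in for the document database: one "document" per (stop, line) pair.
# The key layout and field names are assumptions; times are minutes after
# midnight and the values are made up for illustration.
TIMETABLES = {
    ("A", "175"): {"departures": [360, 390, 420, 450]},
    ("B", "175"): {"departures": [365, 395, 425, 455]},
}


def next_departure(timetables, stop, line, not_before):
    """Return the first departure at or after `not_before`, or None."""
    doc = timetables.get((stop, line))
    if doc is None:
        return None
    times = doc["departures"]          # must be sorted ascending
    i = bisect_left(times, not_before)
    return times[i] if i < len(times) else None


# Step 1 (graph database) would give the legs; they are hard-coded here.
legs = [("A", "175"), ("B", "175")]

# Step 2: walk the legs and look up a departure for each one in turn.
now = 400
for stop, line in legs:
    departure = next_departure(TIMETABLES, stop, line, now)
    print(stop, line, departure)
    if departure is None:
        break
    now = departure
```

Keeping the departure lists sorted makes each lookup a binary search, so the graph only has to answer "which stops and lines" and never "when".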


So what’s the solution to the above problem? I don’t know yet. I have to try it one way, then another, and see how it works. But here comes another problem, a really big one: lack of time. Yes, I can’t find enough free time and strength to work on this. One third of the time has passed and all I have is the parsing of input data. That’s not good. Taking this into account, I have decided to limit the scope of the project. The Angular client is out, there is no way that I will create it. The Android client is out too: there is only a minimal chance that I will find the time to make it, but I will try if time allows. For now, the project plan is cut to a minimum, that is, only the backend REST service. We will see how it goes, as even this is going really slowly 🙁

Also published on Medium.