
Week #8 – Reading 7zip archive

Commutee is coming along pretty slowly, but it is moving forward! This time I focused on reading real data.

This week I decided that it was time to start doing some real work on this project. I have already created a parser, the business logic for processing timetable data, and all the needed infrastructure (well, except the database, which is mocked ;)). Now is a good time to test it on real data.

I downloaded the timetable data from the official site of ZTM (Warsaw’s public transport company). It is a pretty small compressed file, 4-5 MB in 7-zip format. I thought that I would feed it into ZipInputStream from the JDK and just read the uncompressed text file (which is 200+ MB). So how did it go? Well, it went badly, it just didn’t work 🙂 As it happens, ZipInputStream cannot read a 7-zip archive, because it is a different format! So I thought, OK, let’s try another way, and this time I tried to decompress it with the JDK’s Deflate classes. And I failed again.
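One cheap way to see the format mismatch up front is to look at the first bytes of the file. The sketch below is not part of Commutee; `ArchiveSniffer` and `detect` are hypothetical names, and it only distinguishes the well-known ZIP and 7z signatures.

```java
// Hypothetical helper (not from Commutee): guesses the archive format
// from the leading "magic" bytes of a file.
final class ArchiveSniffer {
    // A non-empty ZIP file starts with a local file header: 'P', 'K', 0x03, 0x04.
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};
    // A 7z archive starts with: '7', 'z', 0xBC, 0xAF, 0x27, 0x1C.
    private static final byte[] SEVEN_ZIP_MAGIC =
            {0x37, 0x7A, (byte) 0xBC, (byte) 0xAF, 0x27, 0x1C};

    // Returns "7z", "zip" or "unknown" for the given leading bytes.
    static String detect(byte[] header) {
        if (startsWith(header, SEVEN_ZIP_MAGIC)) {
            return "7z";
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return "zip";
        }
        return "unknown";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] sample = {0x37, 0x7A, (byte) 0xBC, (byte) 0xAF, 0x27, 0x1C, 0x00, 0x04};
        System.out.println(ArchiveSniffer.detect(sample));
    }
}
```

A check like this, run on the first six bytes of the downloaded file, would have told me right away that ZipInputStream was the wrong tool.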

This was the moment where I had to do what most developers do when they don’t know how to do something: I googled it!

A quick note here to all junior developers: there is nothing bad in not knowing stuff. It is normal, you can’t know everything. If you don’t know how to do something, google it, search on Stack Overflow, ask a fellow developer, or try to find a solution in the documentation. Seek this knowledge. The ability to find a solution to a problem is one of the most important skills in software development!

7-zip

My googling took me to the 7zip binding project, which could solve my problem. I dug into it a bit, found some examples on Stack Overflow, and coded myself a solution that worked. Unfortunately, this library didn’t offer operations on InputStream, which is a bit sad because it makes my app a bit more memory hungry. I had to use ByteArrayOutputStream and ByteArrayInputStream to funnel the data out of the archive and produce an InputStream that I would be able to use later. Here is the complete code for reading the data:

It’s not so bad, it’s not so good, but it works, and for now, that is the most important part.
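The funnelling idea itself can be shown with plain JDK classes. This is only an illustration, not the project code: `ChunkSink`, `ChunkSource` and `ArchiveFunnel` are hypothetical stand-ins for a library that pushes decompressed bytes into a callback instead of handing out an InputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;

// Hypothetical callback type: receives one chunk of decompressed bytes
// and reports how many bytes it consumed.
interface ChunkSink {
    int write(byte[] data);
}

// Hypothetical extractor: pushes all of its data into the given sink.
interface ChunkSource {
    void extractTo(ChunkSink sink);
}

final class ArchiveFunnel {
    // Collects every chunk in memory, then exposes the result as an InputStream.
    // The whole uncompressed content lives on the heap, which is the
    // memory cost mentioned above.
    static InputStream toInputStream(ChunkSource source) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        source.extractTo(data -> {
            buffer.write(data, 0, data.length);
            return data.length;
        });
        return new ByteArrayInputStream(buffer.toByteArray());
    }
}
```

Note that `toByteArray()` copies the buffer, so for a short while the data is held twice. For a 200+ MB file that is worth keeping in mind when sizing the heap.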

Errors!

The moment I started reading real data, I started seeing exceptions everywhere. Stuff like NumberFormatException, NullPointerException, etc. Why? Well, the documentation wasn’t as good as it looked. Some data, like coordinates, was sometimes not filled in; that is, instead of getting a number like 51.2345, I got xx.xxxx. There were similar problems in a few other places. I fixed each of them as it appeared, but it took some time. Remember, if you write documentation, write about stuff like that in it 😉
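A defensive way to handle placeholder values like xx.xxxx is to treat anything that is not a number as "missing" instead of letting the exception escape. The sketch below is one possible approach, not the fix used in Commutee; `CoordinateParser` is a hypothetical name.

```java
import java.util.OptionalDouble;

// Hypothetical helper (not from Commutee): parses a coordinate field that
// may be null, blank, or filled with a placeholder such as "xx.xxxx".
final class CoordinateParser {
    // Returns the parsed value, or an empty OptionalDouble when the field
    // does not contain a finite number.
    static OptionalDouble parse(String raw) {
        if (raw == null) {
            return OptionalDouble.empty();
        }
        String trimmed = raw.trim();
        if (trimmed.isEmpty()) {
            return OptionalDouble.empty();
        }
        try {
            double value = Double.parseDouble(trimmed);
            // Reject "NaN" and "Infinity", which parseDouble accepts.
            return Double.isFinite(value) ? OptionalDouble.of(value) : OptionalDouble.empty();
        } catch (NumberFormatException e) {
            return OptionalDouble.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(CoordinateParser.parse("51.2345").isPresent());
        System.out.println(CoordinateParser.parse("xx.xxxx").isPresent());
    }
}
```

Returning OptionalDouble forces the caller to decide what a stop without coordinates means, rather than hiding the gap behind a default like 0.0.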

Summary

So this week was pretty good when it comes to progress on Commutee. I have created a simple data loading mechanism and I have started reading the data that interests me into the data store. That is a big step.

There are now two steps left to do for the backend app:

  1. Store this data in the graph database
  2. Query the database through the REST API

There is not much time left to do all this, and these are not simple things, but I’ll try to do it somehow 🙂

You can find the current state of Commutee in my GitHub repository.


Also published on Medium.