Flo’s Blog

This blog is an opportunity to stay in touch with the relevant news around here (mostly English, some German entries every now and then). For friends I also provide a feed of this blog and things that I share on the web.

  • Efficient Filtering and Conversion of OpenStreetMap Files with Osmium

    Oct. 14, 2011

    The OpenStreetMap database currently contains over 1.2 billion points, 110 million lines and one million relations (composite objects). This data is made available in the so called planet dump, a file that contains all the objects in sequential order (first all nodes (points), then all ways (lines) followed by all relations) and is currently – depending on the format – between 9 and 18 gigabytes large when compressed.

    Processing a full planet dump is a task that will let you face your computer’s hardware limitations, even if you have a reasonably powerful machine by general standards. Very simple tasks can, depending on the tools you use, take days for completion. For this reason, smaller extracts of the data are made available, constrained to individual countries or even cities, sufficient for most users.

    For creating the world maps for Maps Without Borders, I am currently creating thematic extracts of the global OSM data, containing for example all motorways or train routes of the world. In other words, the task is trivial, but on a huge scale: scan through the data, keep a tiny fraction (for example, only 0.6% of all ways in the database concern railways according to OSM’s TagInfo) and throw away everything else. I want to be able to do this on my 3 year old laptop, which sports a dual-core 2 GHz processor and 2 gigabytes of RAM (at the time I bought the laptop this was a top-end machine).

    After initial unsuccessful attempts using the standard, Java-based OSM processing tool Osmosis, I discovered Osmium, a C++ library for writing your own tools to work with OSM data. Osmium comes with an example that already does exactly what I want: keep all ways with a certain tag and export them as a Shapefile. This is done very efficiently: since each way needs the coordinate information of all of its nodes, which are stored earlier in the file, it uses an optimized datastructure to first store the coordinates of all nodes using 8 bytes per coordinate pair, and then writes the ways to the file using these previously stored coordinates. For storing the coordinates, you have two options: either use a google::sparsetable, which uses only 1 bit for empty records, or use mmap, a Unix system call that provides your program with a block of virtual memory which can be transparently stored in a (possibly huge) file on disk.

    For the 1.2 billion points in the database, you need 9 gigabytes of memory to just keep the coordinates in memory before you come to processing the ways. So this was not an option for me. Unfortunately, my hopes of using the second method were soon shattered, too: although I have enough free diskspace on my SSD, the transparent virtual memory mapping done by mmap means that you have to be able to address that memory - and on a 32 bit system you can only address 4 gigabytes of memory, no matter how much virtual memory you may be able to provide. Of course I could have looked for or written a different disk-based storage mechanism, but integrating it into Osmium and making it perform reasonably seemed too big of an effort.

    In the end I settled for a two-pass approach: In the first pass I skip all nodes, and analyze only the ways to generate a list of ids of nodes that we need to construct the ways that we are actually interested in. Using a sparse table of integer values (storing a “1” for every node id we need), we end up needing just 1 bit for all nodes (a total of ~143 MB) plus one byte for every node that should actually be stored (if we assume we end up storing less than 1% of all points this should be below 11 MB). Of course this could be further optimized by using a static bitfield, but given the sparseness of the data I decided to not bother and use Osmium’s on-board means. So our first pass looks like this:

    typedef Osmium::Storage::SparseTable storage_ids_t;
    
    class Pass1Handler : public Osmium::Handler::Base {
    
    public:
    
        storage_ids_t store_node_ids;
    
        void way(Osmium::OSM::Way *way) {
            const char *railway = way->tags().get_tag_by_key("highway");
            if (railway && !strcmp(railway, "motorway")) {
                if (way->id() >= 0) {
                    for (Osmium::OSM::WayNodeList::iterator it = way->nodes().begin(); it != way->nodes().end(); ++it) {
                        const int64_t id = it->ref();
                        store_node_ids.set( id, 1 );
                    }                    
                }
                // TODO: negative ids (used for temporary data, not found in planet dumps) not implemented
            }
        }
    };
    

    After the first pass, the procedure for the second pass is more or less the original approach: for each node, look up if we actually need its coordinates (using the sparsetable generated in the first pass), and if so store them in another sparesetable. Afterwards we construct the ways a ususal and can be sure that all the required coordinates have been stored.

    typedef Osmium::Storage::SparseTable storage_sparsetable_t;
    typedef Osmium::Handler::CoordinatesForWays coord_handler_t;
    
    class Pass2Handler : public Osmium::Handler::Base {
    
        storage_ids_t* store_node_ids;
    
        storage_sparsetable_t store_pos;
        storage_sparsetable_t store_neg;
        coord_handler_t* coord_handler;
    
        Osmium::Export::LineStringShapefile *shapefile_linestring;
    
    public:
    
        // pass in stored node ids from 1st pass
        Pass2Handler(storage_ids_t* node_ids) {
    
            store_node_ids = node_ids;
            coord_handler = new coord_handler_t(store_pos, store_neg);
    
            shapefile_linestring = new Osmium::Export::LineStringShapefile("highways");
            shapefile_linestring->add_field("id", FTInteger, 10);
        }
    
        ~Pass2Handler() {
            delete shapefile_linestring;
        }
    
        void node(Osmium::OSM::Node *node) {
            // we have to make sure id is in the range of stored ids and id is in table
            if (node->id() < store_node_ids->size() && (*store_node_ids)[node->id()]) {
                coord_handler->node(node);
            }
        }
    
        void way(Osmium::OSM::Way *way) {
            const char *tag = way->tags().get_tag_by_key("highway");
            if (tag && !strcmp(tag, "motorway")) {
                coord_handler->way(way);
                try {
                    Osmium::Geometry::LineString linestring(*way);
                    shapefile_linestring->add_geometry(linestring.create_shp_object());
                    shapefile_linestring->add_attribute(0, way->id());
                } catch (Osmium::Exception::IllegalGeometry) {
                    std::cerr << "Ignoring illegal geometry for way " << way->id() << ".\n";
                }
            }
        }
    
    };
    

    Originally I combined the two handlers classes in one, using an internal flag to distinguish the first and second pass. Investigating the Osmium source code it turned out that Osmium adapts its parsing by skipping parts that are not actually processed by the handler. To make this work (and gain quite a performance benefit), the corresponding method must not be implemented in the handler, so I had to split the two passes to benefit from skipping the unprocessed nodes in the first pass.

    Finally, add the glue to turn the two handlers into an executable program:

    int main(int, char *argv[]) {
        Osmium::init(true);
    
        Osmium::OSMFile infile(argv[1]);
        Pass1Handler handler1;
        Pass2Handler handler2(&handler1.store_node_ids);
    
        infile.read(handler1);
        infile.read(handler2);
    }
    

    Filtering planet.osm.pbf in this way takes approximately 40 minutes on my laptop, and peaks at 900 MB memory usage. Still serious processing, but orders of magnitude faster and more efficient than the alternatives.

    Obviously this could be extended much more (I actually simplified the code for this post) but is already a lifesaver for generating full planet extracts in this hardcoded form.

    Permalink
  • A map of the Alps – Without borders, South up

    June 13, 2011

    For an upcoming research project, Maps without borders, I am rendering maps that do not resort to national or administrative boundaries to structure the space on the map. After all, within the Schengen area, borders have become largely invisible, and somewhat irrelevant, to most travellers, which has so far not been reflected in the way these borders are depicted on maps.

    The goal of the project will be to develop alternative graphical features that will allow us to still know what is where on the map. I am starting out by creating a few base maps based entirely on open data, and I like to turn maps upside down in the process to free us from the preconceptions we have when looking at a well-known map and see the space in a fresh way. So here’s a first teaser for you, a map of the Alps, without borders, “upside down”. Take some time to look at it – it really provides some fresh insight into this geographical space (Click on the image for the high-res version).

    The final maps should be more abstract, but as stated above this will be a medium term research project. Rendering South-up oriented map using opensource tools still involves a few small hurdles currently – I want to provide some more details on the technical workflow I am using in a future post. The data used for this map comes from Natural Earth and OpenStreetMap, with a few local corrections to the data that will be published on the Maps without borders website.

    Comments are not yet implemented on this blog so if you have questions or want to get in touch concerning the project, drop me an email.

    Permalink
  • An interview on Vivir Bien, solidarity economy in Europe and informal ontologies

    June 7, 2011

    A few weeks ago I was interviewed by Mariana Malta of Universidade do Minho on the topic of Vivir Bien, a platform for solidarity economy I am involved with. While the interview was initially done for research in the context of the RIPESS-EU process of linking solidarity economy networks across Europe, I found it may be interesting for a wider audience, so I edited the transcript she sent me to publish it here.

    Read on...
  • Welcome to my new, self-hosted blog and website!

    June 6, 2011

    After years with a Blogspot blog and a single static html page as a website, I am upgrading my online presence to a full-featured dynamic web publishing outlet. While it may lack the fancy understatement of the old site for now, it will allow me to publish projects, ideas and experiments in a much more versatile way. I am also “eating my own dogfood” here, as the infrastructure this site runs on is a self written content-management-system I have sold to several clients.

    Please bear with me as the site may still be a bit rough around the edges while I put all the content in and polish the design and functionality. If you notice any crass errors, please drop me a line.

    I separated the blog into three main sections, which you can subscribe to individually. So if you’re only interested in one specific aspect of my writings, you can got to the categories mapping, technology or general observations, respectively.

    Permalink
  • The Beauty Of The Earth’s Surface

    Dec. 18, 2009

    NASA’s Shuttle Radar Topography Mission measured the earth’s surface elevation in February 2000 using radar interferometry. The resulting data is available to the public and can be used for various purposes. I have been playing with the data over the last days and experimented with different methods for visualizing terrains in 2d and 3d, primarily in the context of Openstreetmap rendering. However, nothing has impressed me more than this very first plain grayscale image I produced (showing the area around and to the north of Vienna).

    I have never seen so clearly the details and variation of the earth’s relief. On most maps, data like this is rendered in the background to give the viewer a sense of depth, but map graphics and surface textures are usually layered on top and take away a lot of the clarity that is so stunning here. One has to consider that the image shows an area that is usually considered to be rather “flat” - but look at how one can see the thousands and thousands of years of wind and rain working on the “Weinviertel” area to the north. Or look at the sharp edge of the “Wienerwald” area, suddenly dropping down to the really flat plains of the “Tullner Becken” and “Wiener Becken”.

    I considered adding some labels and paths as means of orientation, but then I liked the idea to leave it like it is as a puzzle for you to figure out the locations of places you know in the image.

    Permalink
  • Augmented Reality in „Print 2.0“

    Oct. 26, 2009

    It’s nice to see a project I have been involved with going live: The October issue of Red Bull’s “Red Bulletin” magazine features augmented reality content that supplements the printed articles. But see for yourself…

    Read on...
  • Playing with Arduino

    June 4, 2009

    Finally I found some time to do a little hardware hacking on the Arduino platform. As a quick first project I came up with something that a) makes sense and b) can be done with a minimum of money (€10 if you already have an Arduino) and a main focus on programming: A two-joint articulated plotting robot.

    Read on...
  • Empty Inbox!

    July 29, 2008

    Usually I sort incoming mails into folders immediately, besides the ones that require some kind of action or a reply, which I leave in my inbox (I sometimes even pull automatically sorted emails back into the inbox if I want to remind myself to act upon them). This means there are always some messages in my inbox, usually between 10 and 30, depending on my workload and other factors. Some of these emails can get very old in there, years in some cases where I just don’t find the right mindset for a reply.

    I am leaving for holiday tomorrow and wanted to answer some of these long waiting emails before that. The basis was laid already a few months ago when I decided to declare as abandoned some private email threads that have been waiting for multiple years, but now the impossible has happened: Empty Inbox! I can’t remember if this has ever happened before since I started using email in … 1995 I guess.

    How do you handle your inbox?

    Permalink