Creating Centerlines with PostGIS and ArcGIS

High voltage electricity tower at sunset

A Walk in the Park

A client came to us with an interesting problem a few weeks ago. They are an ArcGIS shop and had a long-running geoprocessing workflow to generate centerlines from conductors running inside trenches.

Input: 

Centerlines Input

Output: 

Centerlines Output

Should be a breeze, no problem for a GIS, right? What was that, 2,000,000 features? Sounds like a walk in the park. As in, you set the GP tool running and go take a long-ish walk in the park, and when you get back, a freshly-squeezed featureclass is awaiting your return and you give thanks to Redlands.

Not So Fast: ArcPy GP-Based Solution

In lieu of a “CreateCenterlines” piece of GP magic, the client had created centerlines by tiling out the data and :

1. Buffering the lines

2. Dissolving the Buffers

3. Densifying/Creating vertices from the polygon edges

4. Creating Thiessen Polygons

5. Clipping the polygons

6. Polygons to Lines

7. SymDiff

8. Spatial Join

9. Select

10. Trim

My favorite screenshot of this process exemplifies the main issue of this approach; an explosion of data. Millions of vertices got turned into billions.

Data explosion - billions of vertices
Battarang-style Thiessens 

99 Problems…in 32 Bits

Turns out that 32 bit processes run out of oomph pretty quickly as they head towards the 1.3GB RAM glass ceiling. Using ArcPY this was a “big data” problem, by which I mean something that had to be (manually) scaled out to handle processing:

• Required 4 machines for 4 days

• Sometimes partially completed, required manually re-running

• ArcPY Error 99999. 

• Required manually eyeballing the results to see if any centerlines were missing.

Several Ways to Skin a Cat

Let me say straightaway that I would never skin a cat; the skin adds all the flavor. Poor-tasting feline gags aside, I looked at the following mechanisms:

• ArcGIS Polygon To Centerline Tool

• FME

• PostGIS

Skinning a Cat is Always Messy

In looking at this problem it became apparent that there is no silver bullet; no algorithm will get 100% accuracy. My first instinct was to use an OOTB toolbox tool to get this job done inside the GIS. 

FME Centerlines
FME Centerlines

FME is basically yet-another-gp-toolbox, and so I had high hopes for its ability to deliver via its Medial Axis parameter. The first time I ran it, the process ran out of memory. Aha, change the source from ESRI Geodatabase Reader -> File Geodatabase Reader and use the full memory space. This worked — albeit slowly — and I believed that I was done. Then I ran a slightly longer buffer, and a day later I had to wave the white flag as the process was still running at 60GB RAM.

ST_ Functions FTW

Ever since using Oracle Spatial for an integration project, I have pushed for spatial-processing-in-the-database wherever possible. It’s almost always nearly an order of magnitude faster than ArcObjects/ArcPy, with less code & complexity involved. The simplest apples-to-apples comparison for this is ST_Buffer vs arcpy BufferAnalysis on 60k lines. 10 seconds in Postgres vs 75 seconds in ArcPy/Catalog for the same output.

ST Approximate Medial Axis

What I didn’t realize was that, with v2.3 of PostGIS, the SFCGAL functions provide more geospatial pizzaz. Whoever that gal from SFC is, she has written some fine functions including ST_ApproximateMedialAxis. No soup for you, SQL Server / Oracle!

ST_ApproximateMedialAxis did almost everything for me; all I needed to do was create a round buffer end for my polygon to extend the medial axis to each end. [Note, if you’re in an SDE schema in PostGIS you’ll need to use the PG_GEOMETRY default geometry type (aka geometry, not ST_GEOMETRY)].

Using ST Approximate Medial Axis 

Dealing with an O(n log n) Algorithm

The ST_ApproximateMedialAxis, whether it’s on FME or PostGIS, does not seem to scale in a linear fashion. One of the nice things about working in SQL vs ArcPy is that it’s easy / quick to be inquisitive about shapes. In this case, st_npoints was useful.

select objectid,st_npoints(geom) from buffer_table order by st_npoints(geom) desc;

This gave me an ordered list of shapes that were likely to be tricky, and a candidate list to check out the time cost of the st_approximatemedialaxis algorithm. So I chose a few complex features to fire the centerline algorithm at.

select st_linemerge(st_approximatemedialaxis(geom)) from buffer_table where objectid=1213; 

–100 67ms

# of Points Time (ms)

100

67

1000

1100

2000

3800

4000

28000

Once a feature goes above a few thousand points the algorithm slows to a crawl. A feature over 10,000 points takes over 5 minutes, chews through 100% of a CPU and consumes an ever-increasing amount of memory — there is no memory limit to this query, and it will consume as much as it wants until the host OS chokes. A buffer with 25k points used 30GB RAM but a 50k buffer killed my 64GB RAM laptop in spectacular fashion.

The solution was to tile each large input feature (>20k points) around a fixed grid size. I adapted a version of ST_CreateFishnet to use a fixed-width/height grid size:

CREATE OR REPLACE FUNCTION ST_CreateFishnetByGridSize(

        geom geometry, gridsize integer,

        OUT “row” integer, OUT col integer,

        OUT geom geometry)

    RETURNS SETOF record AS

$$

SELECT y + 1 AS row, x + 1 AS col, ST_Translate(cell, x, y) AS geom

FROM 

generate_series(floor(st_xmin($1))::int, ceiling(st_xmax($1))::int, $2) as x,

generate_series(floor(st_ymin($1))::int, ceiling(st_ymax($1))::int, $2) as y,

(

SELECT ST_PolygonFromText(‘POLYGON((0 0,0 ‘||$2||’,’||$2||’ ‘||$2||’,’||$2||’ 0,0 0))’) AS cell

) AS foo;

$$ LANGUAGE sql IMMUTABLE STRICT;

Fishnet on the red feature
Fishnet on the red feature

In order to prevent hanging chads at the borders of each tile, the tile was expanded before the medial axis was created and then clipped back to the original tile extent.

Slicing Slivers with the ST_ Swiss Army Knife

After dissolving buffers I was left with “slivers” between my polygons which doubled my centerlines in various places. Postgis gives the ability to deal with this, once you can wrap your head around the generate_series/array functions

Removing slivers between polygons in PostGIS
Removing slivers

— This will desconstruct/explode the polygon and then reconstruct it without the interior rings in the last where clause below

with ipolys as

(select ST_InteriorRingN(rec_buf.geom, generate_series(1, ST_NumInteriorRings(rec_buf.geom))) as geom,rec_buf.objectid),

opoly as (select st_exteriorring(rec_buf.geom) as geom,rec_buf.objectid,rec_buf.mapno,rec_buf.grid_id)

insert into ug_buffer_expanded(geom, grid_id, mapno) select st_makepolygon(o.geom, ARRAY(select geom from ipolys where 

— last where clause to exclude shapes of a certain size and/or shape

st_area(st_makepolygon(geom)) > MAXIMUM_SLIVER_AREA or (st_area(st_makepolygon(geom)) > MINIMUM_AREA and ((st_perimeter(st_makepolygon(geom)) – |/(@((st_perimeter(st_makepolygon(geom)) * st_perimeter(st_makepolygon(geom))) – (16 * st_area(st_makepolygon(geom))))))/4) >= MAXIMUM_SLIVER_WIDTH) and o.objectid=objectid)),o.grid_id,o.mapno from opoly o;

ArcBarf

After creating my first set of centerlines and displaying just fine in QGIS, my heart sank as I attempted to render the table in ArcGIS — only to be greeted by this mystifying error. 

ArcGIS Rendering Error Notification
ArcGIS Rendering Error

Hey, at least it wasn’t a “999999” error — this was a meaningful description with an actionable response. Although ST_Dump -> ST_Collect is still a bit of a headscratcher (meaning I stackoverflowed this, it worked and I walked away), there was a solution. It removes linear segments that are probably smaller than the tolerance (and treated as zero-length).

update tline_oh_centerline set geom = (SELECT ST_Collect(f.the_geom) as singlegeom 

    FROM (SELECT (ST_Dump(rec_ug.geom)).geom As the_geom from tline_oh_centerline where id = rec_ug.id) as f

    where st_length(f.the_geom) > 1) where id = rec_ug.id;

The Law of ‘Nothing is Perfect’

Sad to say but even ST_ApproximateMedialAxis could be foiled now and then, and always with the same sort of error; touching interior rings. So I had a two-fold response. In the exception handler, iteratively remove a single interior ring and try function on the amended geometry until it worked. If that failed then just use the exterior ring, log it and move on. Out of 350k polygons there were only 4 exceptions. Yay!

Our Final Takeaway and Results

Why would anyone use ArcGIS? Well, I still used venerable Arcpy for a number of operations which were either faster or missing in PostGIS. A special shout-out goes to Dissolve in ArcPy, which must have some special sauce to beat ST_Union/ST_MemUnion hands-down. Unsplit and Trim were also key GP tools without (easy/obvious) PostGIS equivalents.

Ultimately PostGIS delivered a solution on a single machine which completes in less than a day. Highlights included:

• The Granularity of ArcObjects

• The Coarseness of ArcPy

• Performance beyond client-based technology’s reach

• Robustness

• ST_ApproximateMedialAxis delivered acceptable data quality for the client

For non-versioned, non-geometric network workflows, PostGIS is a compelling addition to any gisdev’s toolkit. In addition to QGIS, PostGIS works well in a mixed environment with ArcGIS. Having used QGIS as a visualization tool, it seems like it has a ton of GP-style functionality.

Phil Penn headshot

10 years at UDC / 31 years in GIS

Phillip Penn

Phil is a Senior Software Engineer and Technical Consultant for UDC. His background and expertise is geospatial technologies and systems integration for utilities and the public sector. Phil also works with UDC clients and vendors in providing Solution Architecture services.