Getting geodatabase features with arcpy and heapq Python module

If you have ever needed to merge multiple spatial datasets into a single one using ArcGIS, you have probably used the Merge geoprocessing tool. This tool can take multiple datasets and create a single one by merging all the features together. However, when your datasets are stored on disk as multiple files and you only want to get a subset of features from them, running the Merge tool to get all the features together into a single feature class may not be very smart.

First, merging features will take some time, particularly if your datasets are large and there are a few of them. Second, even after you have merged the features together into a single feature class, you still need to iterate over it to get the features you really need.

Let’s say you have a number of feature classes and each of them stores cities (as points) in a number of states (one feature class per state). Your task is to find the 10 most populated cities across all of the feature classes. You could definitely run the Merge tool and then use arcpy.da.SearchCursor with the sql_clause argument to iterate over the sorted cities (the sql_clause argument can contain an ORDER BY SQL clause). Alternatively, you could chain multiple cursor objects and then use the built-in sorted function to get only the top 10 items. I have already blogged about using chains to combine multiple arcpy.da.SearchCursor objects in this post.
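To make the chain-and-sort alternative concrete, here is a minimal sketch. Plain lists stand in for the arcpy.da.SearchCursor objects so it can be tried without ArcGIS, and the city names and populations are made up for illustration.

```python
from itertools import chain

# Stand-ins for arcpy.da.SearchCursor objects (an assumption for this sketch):
# each one yields (city, state, population) rows. The values are invented.
state_a = [("Alpha", "A", 500), ("Beta", "A", 1200)]
state_b = [("Gamma", "B", 900), ("Delta", "B", 300)]
cursors = [iter(state_a), iter(state_b)]

# sorted() pulls every row from every cursor into one list before slicing,
# so memory use grows with the total number of features.
top_two = sorted(chain(*cursors), key=lambda row: row[2], reverse=True)[:2]
print(top_two)
```

This works, but the whole combined dataset ends up in memory just to keep a handful of rows.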

However, this can also be done without the Merge geoprocessing tool or the sorted function (which constructs a list object in memory), solely with the help of arcpy.da.SearchCursor and the built-in Python heapq module. Arguably, the most important advantage of using the heapq module lies in the ability to avoid constructing lists in memory, which can be critical when operating on many large datasets.

The heapq module is present in Python 2.7, which makes it available to ArcGIS Desktop users. However, in Python 3.5, heapq.merge gained two new optional arguments, key and reverse, which made it very similar to the built-in sorted function. So, ArcGIS Pro users have a certain advantage because they can choose to sort the iterator items in a custom way.
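As a small illustration of the key and reverse arguments, the sketch below merges two already-sorted inputs by population, largest first. The data is invented, and it needs a Python 3 version that has these arguments. Note that heapq.merge does not sort anything itself: each input has to arrive in the same order that the merged output should have.

```python
import heapq

# Each input is already sorted in descending order of population
# (made-up numbers), which is what reverse=True expects.
state_a = [("Beta", 1200), ("Alpha", 500)]
state_b = [("Gamma", 900), ("Delta", 300)]

# merge() returns a lazy iterator; rows are pulled from the inputs on demand.
merged = heapq.merge(state_a, state_b, key=lambda row: row[1], reverse=True)

result = list(merged)  # materialized here only to show the order
print(result)
```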

Here is some sample code that showcases the efficiency of using heapq.merge over constructing a sorted list in memory. Please note that the key and reverse arguments are used, so this code can be run only with Python 3.
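What follows is a sketch of how such code might look, not a definitive implementation. The helper names top_rows and open_sorted_cursors are hypothetical, and the plain lists in the demo stand in for real cursors. The approach assumes each cursor returns its rows already sorted by population in descending order, which the sql_clause argument of arcpy.da.SearchCursor can request with an ORDER BY clause on geodatabase data.

```python
import heapq
from itertools import islice


def top_rows(sorted_iterables, n, key):
    """Return the n largest rows from iterables that are each
    already sorted in descending order of key."""
    merged = heapq.merge(*sorted_iterables, key=key, reverse=True)
    # islice stops after n rows, so the remaining rows are never read.
    return list(islice(merged, n))


def open_sorted_cursors(feature_classes, fields, order_field):
    """Hypothetical helper: one SearchCursor per feature class,
    each sorted in descending order of order_field."""
    import arcpy  # imported here so top_rows() can be tried without ArcGIS

    postfix = "ORDER BY {} DESC".format(order_field)
    return [
        arcpy.da.SearchCursor(fc, fields, sql_clause=(None, postfix))
        for fc in feature_classes
    ]


# Demo with invented data: lists standing in for sorted cursors.
demo = [
    [("Beta", "A", 1200), ("Alpha", "A", 500)],
    [("Gamma", "B", 900), ("Delta", "B", 300)],
    [("Epsilon", "C", 1000)],
]
top = top_rows(demo, 3, key=lambda row: row[2])
print(top)
```

With ArcGIS Pro at hand, the call could look like top_rows(open_sorted_cursors(fcs, ['CITY_NAME', 'STATE_NAME', 'POP1990'], 'POP1990'), 10, key=lambda row: row[2]), where fcs is assumed to be a list of feature class paths. Only the first 10 merged rows are turned into a list, and heapq.merge itself holds just one pending row per input while it works.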

 


One thought on “Getting geodatabase features with arcpy and heapq Python module”

  1. Another direction: use counters:

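    ```
    from collections import Counter
    from itertools import chain

    import arcpy

    # fcs is assumed to be the list of per-state feature class paths
    cursors = [
        arcpy.da.SearchCursor(fc, ['CITY_NAME', 'STATE_NAME', 'POP1990'])
        for fc in fcs
    ]

    # Map each (city, state) pair to its population
    city_pops = Counter({(city, state): pop for city, state, pop in chain(*cursors)})

    # The 10 entries with the largest populations
    top_ten = city_pops.most_common(10)
    ```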
