Getting geodatabase features with arcpy and heapq Python module

If you have ever needed to merge multiple spatial datasets into a single one using ArcGIS, you have probably used the Merge geoprocessing tool. This tool can take multiple datasets and create a single one by merging all the features together. However, when your datasets are stored on disk as multiple files and you only want to get a subset of features from them, running the Merge tool just to get all the features into a single feature class may not be the best approach.

First, merging features will take some time, particularly if your datasets are large and there are many of them. Second, even after you have merged the features into a single feature class, you still need to iterate over it to get the features you really need.

Let’s say you have a number of feature classes and each of them stores cities (as points) in a number of states (one feature class per state). Your task is to find the 10 most populated cities across all of the feature classes. You could definitely run the Merge tool and then use arcpy.da.SearchCursor with the sql_clause argument to iterate over sorted cities (the sql_clause argument can contain an ORDER BY SQL clause). Alternatively, you could chain multiple cursor objects and then use the sorted built-in function to get only the top 10 items. I have already blogged about chaining multiple arcpy.da.SearchCursor objects in this post.

However, this can also be done without using the Merge geoprocessing tool or the sorted function (which will construct a list object in memory), solely with the help of arcpy.da.SearchCursor and the built-in Python heapq module. Arguably, the most important advantage of using the heapq module lies in its ability to avoid constructing lists in memory, which can be critical when operating on many large datasets.

The heapq module is present in Python 2.7, which makes it available to ArcGIS Desktop users. However, in Python 3.5, its merge function got two new optional key and reverse arguments, which made it very similar to the built-in sorted function. So, ArcGIS Pro users have a certain advantage because they can sort the iterator items in a custom way.

Here is a sample code that showcases the efficiency of using heapq.merge over constructing a sorted list in memory. Note that the key and reverse arguments are used, so this code can be run only with Python 3.
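A minimal sketch of the idea, with plain Python lists standing in for already-sorted arcpy.da.SearchCursor streams (the population figures are made up for illustration):

```python
import heapq
from itertools import islice

# Stand-ins for per-state feature classes; with real data each stream would
# be an arcpy.da.SearchCursor sorted via sql_clause=(None, "ORDER BY POP DESC").
california = [("Los Angeles", 3990456), ("San Diego", 1425976), ("San Jose", 1030119)]
texas = [("Houston", 2325502), ("San Antonio", 1532233), ("Dallas", 1345047)]
arizona = [("Phoenix", 1660272), ("Tucson", 545975), ("Mesa", 508958)]

# heapq.merge consumes the streams lazily and never builds one big list;
# the key and reverse arguments require Python 3.5+.
merged = heapq.merge(california, texas, arizona,
                     key=lambda city: city[1], reverse=True)

top_cities = list(islice(merged, 5))
for name, population in top_cities:
    print(name, population)
```

Each input stream must already be sorted in the same order for heapq.merge to produce correct results, which is why the ORDER BY in each cursor's sql_clause is essential.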



Printing pretty tables with Python in ArcGIS

This post would be of interest to ArcGIS users authoring custom Python script tools who need to print out tables in the tool dialog box. You would also benefit from the following information if you need to print out some information in the Python window of ArcMap while doing ad hoc data exploration.

Fairly often your only way to communicate the results of the tool execution is to print out a table that the user could look at. It is possible to create an Excel file using a Python package such as xlsxwriter, or by exporting an existing data structure such as a pandas data frame into an Excel or .csv file which the user could open. Keep in mind that it is possible to start Excel with the file open using the os.system command:

os.system('start excel.exe {0}'.format(excel_path))

However, if you only need to print out some simple information into a table format within the dialog box of the running tool, you could construct such a table using built-in Python. This is particularly helpful in those cases where you cannot guarantee that the end user will have the 3rd party Python packages installed or where the output table is really small and it is not supposed to be analyzed or processed further.

However, as soon as you try to build something flexible with varying column widths, or when you don’t know beforehand what columns and what data the table will contain, it gets very tedious. You need to manipulate multiple strings and tuples, making sure everything draws properly.

In these cases, it is so much nicer to take advantage of external Python packages where all these concerns have already been taken care of. I have been using tabulate, but there are a few others, such as PrettyTable and texttable, both of which will generate a formatted text table using ASCII characters.

To give you a sense of the tabulate package, look at the code necessary to produce a nice table using the ugly formatted strings (the first part) and using the tabulate package (the second part):
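As a rough sketch (the table contents are made up), the first part might look like this, with the tabulate call shown for comparison at the end:

```python
# Building a table with built-in str.format only (the tedious way).
headers = ["City", "State", "Population"]
rows = [
    ["Portland", "Oregon", 647805],
    ["Salem", "Oregon", 169798],
]

# Compute each column's width from its widest cell or header.
widths = [max(len(str(cell)) for cell in col)
          for col in zip(headers, *rows)]
template = "  ".join("{{:<{}}}".format(w) for w in widths)

lines = [template.format(*headers),
         "  ".join("-" * w for w in widths)]
lines += [template.format(*row) for row in rows]
table = "\n".join(lines)
print(table)

# The same table with the tabulate package is a one-liner:
# from tabulate import tabulate
# print(tabulate(rows, headers=headers))
```

Every new column or change in alignment means touching the width computation and the template, which is exactly the tedium the external packages remove.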

The output of the table produced using the built-in modules only:


The output of the table produced using the tabulate module:




Multiple Ring Buffer with PostGIS and SQL Server

Recently I needed to generate multiple ring buffers around some point features. This can be done with a number of tools: the Multiple Ring Buffer geoprocessing tool in ArcGIS, arcpy (generating multiple buffer polygons with the buffer() method of the arcpy.Geometry object and merging them into a single feature class), or open source GIS tools such as QGIS. It is also possible to achieve this using a relational database that supports spatial functions. In this post, I would like to show you how this can be done using the ST_Buffer spatial function in PostGIS and SQL Server.

In order to generate multiple buffer distance values (for instance, from 100 to 500 with a step of 100) in SQL Server, I would probably need to use a CTE or just create a plain in-memory table using declare; in other words, this is the equivalent of running range(100, 501, 100) in Python.

In the gist below, there are two ways to generate multiple buffers – using the plain table and the CTE.

Generating a sequence of distances in Postgres is a lot easier thanks to the presence of the generate_series function, which provides much the same semantics as range in Python.
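To make the range() analogy concrete, here is a sketch of how the distance sequence and the per-ring SELECT statements could be assembled on the Python side (the table and column names are made up for illustration):

```python
# Build a multi-ring buffer query for a hypothetical "points" table
# with columns id and geom.
distances = list(range(100, 501, 100))  # like generate_series(100, 500, 100)

selects = [
    "SELECT id, {d} AS distance, ST_Buffer(geom, {d}) AS geom "
    "FROM points".format(d=d)
    for d in distances
]
sql = "\nUNION ALL\n".join(selects)
print(sql)
```

Each UNION ALL branch produces one ring distance, which mirrors what the CTE or generate_series does entirely inside the database.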

Using Python start up script for all Python interpreters

This post would be helpful for users of desktop GIS software such as ArcMap who need to use Python inside those applications.

There is a not so well known trick to trigger execution of a Python script before any interactive Python interpreter session on your system starts.

Note: If you are a QGIS user, there is a special way of achieving this. Please see the question Script that runs automatically from the QGIS Python Console when QGIS starts, using schedule tasks on Windows 10 for details.

The way to do this is to set up an environment variable called PYTHONSTARTUP in your operating system. You need to do two things:

  1. Create an environment variable that would point to the path of a valid Python script file (.py) with the code that you would like to get executed before any Python interactive interpreter starts. Look at the question Installing pythonstartup file for details.
  2. Write Python code that you would like to get executed.

A very important thing to consider is that

The file is executed in the same namespace where interactive commands are executed so that objects defined or imported in it can be used without qualification in the interactive session.

This means that you can do a bunch of imports and define multiple variables which would be available to you directly at the start up of your GIS application. This is very handy because I often need to import the os and sys modules, import the arcpy.mapping module, and create an mxd variable pointing to the current map document I have open in ArcMap.

Here is the code of my startup Python script, which you can modify to suit your needs. If your workflow relies on having some data at hand, you might need to expose more variables. I have ArcMap and ArcGIS Pro users in mind.
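A minimal, ArcMap-flavored sketch of what such a startup script might contain (the guarded import and the sql() helper are my illustration, not the original gist; ArcGIS Pro users would use arcpy.mp instead of arcpy.mapping):

```python
# startup.py -- the file PYTHONSTARTUP points to; everything defined
# here lands directly in the interactive namespace.
import os
import sys
import datetime

print("Python {0} started at {1}".format(
    sys.version.split()[0], datetime.datetime.now().strftime("%H:%M:%S")))

# arcpy exists only inside an ArcGIS Python environment, so the import is
# guarded; outside ArcMap the rest of the script still works.
try:
    import arcpy
    # "CURRENT" refers to the map document open in the running session
    mxd = arcpy.mapping.MapDocument("CURRENT")
except Exception:
    arcpy = None
    mxd = None

def sql(query, conn):
    """Run a SQL query against an enterprise geodatabase; conn is a path
    to a .sde connection file (illustrative helper, requires arcpy)."""
    if arcpy is None:
        raise RuntimeError("arcpy is not available")
    return arcpy.ArcSDESQLExecute(conn).execute(query)
```

Because the file is executed in the interactive namespace, os, sys, mxd, and sql() are all available the moment the Python window opens.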

I have included in the example above a more specific workflow where you would like to be able to quickly execute SQL queries against an enterprise geodatabase (SDE). So, when ArcMap has started, you only need to create a variable conn pointing to a database connection file (.sde) and then use the sql() function running your query. Thanks to the tabulate package, the list of lists you get back is drawn in a nice table format.




Visualizing computational geometry concepts using JTS TestBuilder

In this post, I would like to let you know about an excellent piece of software, Java Topology Suite (JTS).

JTS is an open source library of spatial predicates and functions for processing geometries. It provides a complete, consistent, and robust implementation of fundamental algorithms for processing linear geometry on the 2-dimensional Cartesian plane.

A funny thing about it is that JTS

is used by most java based Open Source geospatial applications, and GEOS, which is a C++ port of JTS, is used by most C based applications.

So, all the downstream projects built on GEOS, such as the shapely Python wrapper and even the PostgreSQL extension PostGIS, really work against JTS algorithms, with GEOS acting as the bridge. In other words, JTS is a very powerful Java library.

If you are not a Java developer, though, this might be of little interest to you. However, there is another little application, called JTS TestBuilder, which provides a GUI for geometry exploration and is an interface into the JTS API. It is not as well known as other pieces of the open source GIS stack, such as QGIS or GRASS, though. Its documentation is also outdated and scarce, so you would need to figure out how to use the application on your own.

Nevertheless, it is an indispensable tool for anyone who spends a fair amount of time working with computational geometry or spatial data processing applications. It would also serve as a great visualization tool for GIS instructors who need to visually explain how GIS algorithms operate. I have used it to show how a convex hull is created from a set of points, for instance. One obvious advantage of JTS TestBuilder is that you do not need to run any heavy GIS applications, and the “geometry modification – running analysis – seeing the result” cycle is really short.

Here I’ve loaded cities of California along with the state boundary and created a convex hull for the boundary geometry.


With that in mind, you can work in the following manner:

  • Use your favorite GIS database management tool to get WKT of a geometry you would like to inspect or analyze.
  • Use the JTS TestBuilder to draw the features.
  • Run JTS Geometry Functions constructing new geometries or answering spatial questions.
  • Load the results of the analysis back into your GIS (either for ad hoc exploration or for storage).

The code to read the features into WKT and write back from WKT using arcpy:
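A sketch of what that round trip might look like (the function names are mine; the SHAPE@WKT cursor token handles the geometry/WKT conversion in both directions):

```python
try:
    import arcpy
except ImportError:
    arcpy = None  # running outside an ArcGIS Python environment

def features_to_wkt(feature_class):
    """Yield the WKT string of every feature, ready to paste into
    the JTS TestBuilder geometry panel."""
    with arcpy.da.SearchCursor(feature_class, ["SHAPE@WKT"]) as cursor:
        for (wkt,) in cursor:
            yield wkt

def wkt_to_features(wkt_strings, feature_class):
    """Write WKT strings produced in JTS TestBuilder back as features
    into an existing feature class of a matching geometry type."""
    with arcpy.da.InsertCursor(feature_class, ["SHAPE@WKT"]) as cursor:
        for wkt in wkt_strings:
            cursor.insertRow([wkt])
```

The target feature class for wkt_to_features must already exist with the right geometry type and spatial reference; creating it is left out of the sketch.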


JTS TestBuilder can also help you learn something new with regard to GIS theory. If you think that you are a well-seasoned GIS professional who can amaze others by mentioning a few cool names like Voronoi or Thiessen, I encourage you to explore the geometry functions JTS TestBuilder provides. I am pretty sure only a few of you have heard of:

  • the Koch snowflake, which is used a lot in space-filling as well as cartographic simplification algorithms;
  • the Sierpinski carpet, which is not used extensively in GIS yet, although there are some emerging applications in urban pattern analysis.

If you would like to take advantage of the computational geometry algorithms implemented in JTS, there are also ports to .NET and JavaScript.

Another very similar application that is particularly popular among math teachers is GeoGebra. I have been using it for a while, too, but it lacks export of result geometries into WKT which can be put into a geospatial database or drawn directly in a desktop GIS application such as ArcMap or QGIS. You can try GeoGebra online or by installing a desktop application. It is also available as an app for iOS, Android, and Windows Phone.

Changing selection color in exporting ArcMap layout with ArcPy

If you are an ArcMap user and you export your map layouts into images or .pdf files using arcpy, you may have noticed that the selection symbol always stays the same (the default light green color). I was recently looking for a way to change the selection color that would be respected when exporting map layouts programmatically using arcpy.mapping.ExportToPNG().

Changing the selection color in ArcMap (Selection menu > Selection Options) does change the selection color, but it is only respected when exporting the layout manually from the File menu or when exporting the layout in the Python window (the latter respects the selection color set in ArcMap since you are inside a live ArcMap session). Exporting a layout using arcpy, though, always uses the default light green color for selection.

I have found out that the arcpy.mapping.Layer object does not expose the selection symbol property, according to the Layer docs:

Not all layer properties are accessible through the Layer object. There are many properties available in the ArcMap Layer Properties dialog box that are not exposed to the arcpy scripting environment (for example, display properties, field aliases, selection symbology, and so on).

Adding duplicate layers pointing to the same data source and setting the definition queries manually sounds really cumbersome, particularly when I don’t know beforehand which layers I will need to run selections on.

I have solved it in another way.

  1. I pre-create three layers, one for each shape type (point, polyline, and polygon), using the necessary selection symbology (done manually).
  2. Before the export of the map document’s layout, I duplicate the layer that I will run selections on and update its symbology using the .lyr file.
  3. Then I set definition query on the dummy layer that will serve as a selection symbol layer only.

All of this is done programmatically, so the only thing I really need to supply is the name of the layer that selections will run on. It works great and gives me flexibility to use a custom color for selections on any layer inside any map document.

The script source code:
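A condensed sketch of the three steps (the paths, names, and overall structure are illustrative, not the original script):

```python
try:
    import arcpy
except ImportError:
    arcpy = None  # running outside an ArcGIS Python environment

# 1. Pre-created .lyr files holding the custom selection symbology
SELECTION_LYR = {
    "Point": r"C:\lyr\selection_point.lyr",
    "Polyline": r"C:\lyr\selection_polyline.lyr",
    "Polygon": r"C:\lyr\selection_polygon.lyr",
}

def export_with_selection(mxd_path, layer_name, where_clause, out_png):
    """Export a layout drawing 'selected' features with a custom symbol."""
    mxd = arcpy.mapping.MapDocument(mxd_path)
    df = arcpy.mapping.ListDataFrames(mxd)[0]
    source = arcpy.mapping.ListLayers(mxd, layer_name, df)[0]

    # 2. Duplicate the layer and take its symbology from the .lyr file
    arcpy.mapping.AddLayer(df, source, "TOP")
    dummy = arcpy.mapping.ListLayers(mxd, layer_name, df)[0]  # the copy
    shape_type = arcpy.Describe(source).shapeType
    selection_lyr = arcpy.mapping.Layer(SELECTION_LYR[shape_type])
    arcpy.mapping.UpdateLayer(df, dummy, selection_lyr, True)  # symbology only

    # 3. The definition query narrows the dummy layer to the selection
    dummy.definitionQuery = where_clause

    arcpy.mapping.ExportToPNG(mxd, out_png)
```

The only input that really varies is the layer name and the where clause; everything else is derived at run time.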

Working with SQL in Jupyter notebook and dumping pandas into a SQL database

I have posted previously an example of using the SQL magic inside Jupyter notebooks. Today, I will show you how to execute a SQL query against a PostGIS database, get the results back into a pandas DataFrame object, manipulate it, and then dump the DataFrame into a brand new table inside the very same database. This can be very handy if some of your operations are better done using plain SQL, for instance with the help of spatial SQL functions, while others are easier to perform using pandas, for instance adding and calculating new columns when the data you need to access is not stored inside the database.

The steps are as below:

  • Connect to a database using the SQL magic syntax
  • Execute a SELECT SQL query getting back a result set
  • Read the result set into a pandas DataFrame object
  • Do stuff with the DataFrame object
  • Dump it to a new table in the database you are connected to using the PERSIST command

Again, this is very handy for prototyping: when you need to produce some tables while doing database design, when you need a temporary table created for some application to read, or when running integration tests that need a mock-up table to work with.
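The steps above can be sketched without the notebook magics; here the stdlib sqlite3 module stands in for the PostGIS connection, and a plain list of tuples stands in for the DataFrame (the table and data are made up):

```python
import sqlite3

# A stand-in for the PostGIS database the notebook connects to.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (name TEXT, population INTEGER)")
conn.executemany("INSERT INTO cities VALUES (?, ?)",
                 [("Portland", 647805), ("Salem", 169798)])

# 1-2. Execute a SELECT query and get a result set back
rows = conn.execute(
    "SELECT name, population FROM cities ORDER BY population DESC").fetchall()

# 3-4. In the notebook this would be a DataFrame; here we "calculate
# a new column" on a list of tuples instead.
enriched = [(name, pop, pop > 500000) for name, pop in rows]

# 5. Dump it into a brand new table in the same database
# (this is what the PERSIST magic does for a DataFrame).
conn.execute(
    "CREATE TABLE big_cities (name TEXT, population INTEGER, is_big INTEGER)")
conn.executemany("INSERT INTO big_cities VALUES (?, ?, ?)", enriched)

print(conn.execute("SELECT COUNT(*) FROM big_cities").fetchone()[0])  # 2
```

The real notebook version replaces the sqlite3 plumbing with %sql cells and result.DataFrame(), but the round trip is the same: query, manipulate, write back.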

The sample notebook is available as a gist: