Removing redundant polyline vertices using arcpy

If you maintain a polyline feature class of some entities such as city roads, you may have seen that some of your features have redundant vertices. These vertices are not necessary to preserve the feature’s shape. Not only they increase the size of your dataset, but they also may slow down processing time as it takes longer time to process geometries with a larger number of vertices.

You can see the redundant vertices (small green points) highlighted on a source polyline (thin green line). The cleaned shape (thick blue line) has only the essential vertices (large blue points).

collinear_image

To automate the process of removing the redundant vertices, I’ve written a Python script. The tricky part was to find out whether the points one iterates over are collinear.

Take a backup of your data before running this script as it modifies the features in place!

  1. Script works for single part as well as multi-part lines.
  2. You can choose what to do with the curves: they are either preserved without removing any redundant vertices this particular feature might have OR densified with the specified distance value. Look up the .densify() method on arcpy.Geometry object. You could alternatively use the Densify GP tool to fix your curves before running this script if you would like to experiment with the deviation values.
  3. Only polyline feature classes are supported (no points/polygons). Could be extended to support polygons though.

Advertisements

Static code analysis and linting for Python developers

If you have spent some time writing Python code, you may have met some of the most common error types that can occur when building a program: we may try to use an undefined variable, call a function that was not imported from another module, or access a non-existing property of a class instance. Today, I would like to explore how these types of errors could be found when programming with a very dynamic language such as Python.

Regardless of what Python IDE you use – Wing IDE, PyCharm, Visual Studio or Visual Studio Code – you may use PyLint which seems to be the most commonly used Python linter. This tool will report errors it founds in your code. Some IDEs such as PyCharm and VSCode (with Python extension) have built-in support for PyLint which means you will see the errors highlighted in your code as you type.

There are quite a few resources online (such as this, this, and this) that showcase the linting and type hinting for Python developers. In this post, I’d like to mention some tricks for arcpy developers because there are things we need to tweak. I use Wing IDE, but you need to find out whether your Python IDE supports static code analysis.

Working with arcpy objects

import arcpy
from arcpy.arcobjects import Result
fc = r’TemplateData\TemplateData.gdb\USA\states’
result = arcpy.GetCount_management(fc) # type: Result

By defining the type of the variable result, Wing IDE will be aware of the output variable type before the actual execution of the program. The type of variables can be indicated by a comment that follows it. When typing result.get, Wing gives you properties that Result object has, such as getOutput() and getMessages().

Note that if would not add the inline comment, the type of result variable would not be known to the IDE.

import arcpy
fc = r’TemplateData\TemplateData.gdb\USA\states’
result = arcpy.GetCount_management(fc) # not defining the output type

When you would type result. to inspect the variable, Wing tries to guess the type of the object, and you can see the properties of int, dict, list, and ​tuple types. In other words, Wing is sure that what it gets back as a result variable is not a set and not a ​str object, but it can be of some other data type.

This means that you might annotate your variables helping Wing analyzing your code. This will save you time because you wouldn’t need to jump to the ArcGIS Desktop Help to see what properties an object has.

Taking care of missing members in arcpy.da module

If you will run PyLint for this Python code,

import arcpy
fc = r’TemplateData\TemplateData.gdb\USA\states’
print [f[0] for f in arcpy.da.SearchCursor(fc, ‘OID@’)]

you will get this error message:

Line 3 Column 21 E1101: Module 'arcpy.da' has no 'SearchCursor' member

Not all members are exposed in arcpy​; this is applicable even to UpdateCursor​ and InsertCursor. Obviously, you would like to filter this error out. This can be done by using a PyLint configuration file. I prefer to have a pylintrc file in the folder with my Python modules with this content:

[TYPECHECK]
generated-members = arcpy.da.SearchCursor, arcpy.da.UpdateCursor

You can read more about other PyLint options here.

Type hints for Python 2.7

As most of the Python development still targets Python 2.7 (ArcGIS Desktop rather than ArcGIS Pro), you might look into how type hints can be used with Python 2.7 code. Below is a .gif animation to demonstrate how type hints can be used for a function definition or for a variable definition.

static_typing1

Adding configuration to Python programs

I had to use various configuration options when writing Python, everything from creating a global variable of the directory path used throughout the module to initializing the context class instance and accessing its properties. In this post, I would like to systematize the configuration options I have used lately when writing Python programs. Each option will have an associate gist with a sample code.

Using constant variables

When you write a small Python script, for example, the one that will read a text file stored in a folder, do some processing, and write some new files to disk, it may be enough just to define a constant that you would put to the top of your module and then re-use throughout the Python module. When you define these variables, you won’t need to update the strings in multiple places in your code should you need to re-run your Python code for another folder. From the PEP-8:

Constants are usually defined on a module level and written in all capital letters with underscores separating words. Examples include MAX_OVERFLOW and TOTAL.

If the number of constants or configuration values gets high, you may want to move them into a separate Python module and then import only those constants you need.

Using JSON file

JSON configuration settings can be very complex and large, but Python has a built-in module json for reading and writing JSON data. This approach also has the advantage of being very portable. There are some cons when using JSON as a main configuration file though: it doesn’t support comments and it doesn’t support any Python code (for instance, using os.path.join). Nevertheless, the JSON is a very popular choice in terms of configuration.

Using ConfigParser module

Python has a built-in module ConfigParser. From the Python help:

This module provides the ConfigParser class which implements a basic configuration language which provides a structure similar to what’s found in Microsoft Windows INI files. You can use this to write Python programs which can be customized by end users easily.

Pretty similar to reading a JSON file. Some would argue that it’s easier to update .ini files than JSON files because you don’t need to make sure you keep the curly braces and the quotes in the right place. Definitely true if the configuration file will be updated by the end users of your programs.

Using global variables

The need for global variables that can be used among multiple Python modules comes often in situations when you don’t want to pass around a constant and would rather like have a variable that all modules are aware of. I would argue that you are better of with a class, but I think it really depends on how large your programs are. There is a very interesting approach to the global variables initialization here on StackOverflow.

Using operating system environment variables

This can be very handy when your module runs on different machines and you need to use different connections or user accounts depending on where your code runs. When you move your Python files between the environments (from staging to production, for instance), it will soon become a hustle to update the configuration to point it to the right files and data.

A much more flexible solution could be using operating system environments. They can be created once on each machine you know you will run your code and then use the os.getenv() Python function to read the value of the variable and use it to decide what values should be used. I find this approach to be very handy and I use it all the time: you never need to update the configuration as the correct settings will be picked regardless on what machine your code runs.

Using a class constructor

This approach would work best when you build a large program and you would like to make sure you don’t have any global variables cluttering your namespace and all your context variables come from a single point, namely the class interface. This obviously works best when you embrace the OOP approach writing a Python program as in this case the class instance will be aware of its own properties and there is no need to keep track of the global variables. If you are ready to explore some of the more complex approaches, take a look at dependency injections.

There are many excellent posts on StackOverflow regarding the configuration, such as this one and this one.

Some of the useful snippets to illustrate the configuration options are below.

 

 

 

 

Access ArcMap UI from Python comtypes

I have blogged earlier about the possibility to submit calls to methods defined in classes stored within a .NET library either using clr (with the help of pythonnet) or using native ctypes depending on how your libraries have been compiled. I have also shared some of the examples on how one can use comtypes to access COM-objects of ArcGIS Desktop to be able interact with ArcObjects via Python without writing any C#, Java, or C++ code.

This time, I need combine these approaches. I have to execute a tool on a toolbar in ArcMap (this can be a custom .NET add-in or a built-in tool such as Full Extent) from Python code while working in an open ArcMap session. This is different from what I’ve done before because I have always worked with comtypes in a stand-alone mode accessing existing map documents stored on the disk. This time, I have to use comtypes within an ArcMap session hooking into the currently open map document.

Fortunately, this wasn’t as hard as I expected. I just had to use the standard interface to get into the application, and then into the map document, and then into the command bars available. The last thing you would need to do is to find the tool you need and then execute it. I’ve published a snippet that would get you started.

The first one is for calling the Full Extent button on the Tools toolbar in ArcMap. The second one is for calling a button on a custom toolbar that is part of a custom .NET add-in in ArcMap. You can both of these samples as a part of a custom Python script tool or a Python add-in.

 

Dividing a polygon into a given number of equal areas with arcpy

I was recently searching the Internet trying to find any tool that would let me split an arbitrary polygon inside a geodatabase feature class into multiple polygons of equal area. ArcGIS does provide this functionality as a part of the parcel fabric functionality. Unfortunately, there is a lot of work involved in setting up the parcel fabric and there is a lot to learn before you will be able to divide your parcels. So I was looking for a simpler solution that would work directly with the geometry of a polygon.

However, I was not able to find any solution that would work and most helpful posts on the forums were pointing either at parcel fabrics or providing some ideas on the implementation of the workflow using multiple geoprocessing tools and some custom code. I found a nice ArcGIS custom script tool called Polygon Bisector which

computes a line that bisects, or divides in half, a polygon area along a line of constant latitude or longitude.

So, this would work great if you need to split a polygon creating a number that follows the exponent of two (2, 4, 8, 16, 32 and so forth). This is because after dividing a polygon into two polygons of equal area (now you have 2 polygons), you could divide each of them into two parts again (now you have 4 polygons), and so on. Since I want to be able to divide a polygon into an arbitrary number of areas, I had to write my own tool.

I have solved this problem this way. Say I want to have the polygon of area 1000 sq. m. divided into 5 equal areas:

  1. Get an extent of a polygon.
  2. Construct a polyline using the vertices of the polygon’s extent with a tiny shift of coordinates.
  3. Cut the polygon into two halves using this line.
  4. Find what is the area of the smallest polygon.
  5. If the area is smaller than the 200 sq. m. (that is, fifth part of the polygon), the shift the line again and re-run steps 2-4.
  6. If the area is 200 sq. m. or larger, than leave this part and keep working with the polygon that is left essentially running through the steps 2-5.
  7. When the original polygon has been successfully divided into equal areas, they are inserted into a new feature class along with the source polygon attributes.

The illustration of the cutting lines with the extent polygon is below.

extent_figure

This approach has several disadvantages, though.
First, if your polygon is very large and you want the parts of the polygon to have the same area with the minimal difference, the tool execution will take a lot of time as you will need to shift the cutting lines a few centimeters, cut the polygon into halves, and evaluate the result.
Second, you can only choose between the North-South or West-East direction of the cutting lines. You won’t be able to specify the angle yourself.

However, this tool works great for the target use I kept in mind when writing it. Using 0.5 meters as the step for moving the cutting line, the difference between the largest and the smallest sub-polygons was just around 1%. Running the same code for the same polygon with the shift value set to 0.05 meters (5 cm), I have observed the difference value to be around 0.1%.

The illustration of the polygon subdivision is below (West-East to the left, North-South to the right).

divide_polys_figure

The code is available at the GitHub Gist as usual. Run this code inside the Python window in ArcMap while having a single polygon selected.

 

 

Implementing Copy Parallel with ArcObjects

In ArcMap, in the Editor toolbar, you can find a command Copy Parallel which  will create a copy of a selected line feature at an offset distance you specify. This command is unique to the Editor toolbar and is not available as a geoprocessing tool. This makes it impossible to create parallel lines in batch (say, create lines to the both sides of the selected line feature starting with 10 meters up to 100 meters with the step of 20 meters).

A neat thing that exists in ArcMap is that you can assign a shortcut key to any command you can find in the Customize menu > Commands tab window. So, your workflow could look like this: you start the editing session, select a feature, use your shortcut key (e.g., Ctrl-Alt-P) and enter the offset value in the Copy Parallel window that appears. However, this implies that you would need to type an offset value for every copy to be created and you might also need to change other options in the window.

It’s important to understand that creating a parallel copy of a feature in the context of ArcGIS is not the same as creating a feature with a specified offset (because I thought it was). Compare the features produced using the Copy Parallel command (to the left) and using Python code (that just moves each vertex of a feature using an offset value) (to the right):

parallel

As you can see, the length of line segments can change when using the Copy Parallel command where as just moving vertices with offset preserves length of the segments.

You can call the Copy Parallel command using ArcObjects (just like any other command) but you won’t be able to supply any input parameters such as the offset value. So, it’s no different from assigning a shortcut key. The only solution left is to implement an own version of Copy Parallel that will mimic the built-in command. I have written an ArcGIS .NET add-in that can be called from a custom toolbar.

One can specify the intervals in a text box and then click the button.

parallelpara1

This is the code for the combo box item and for the button item:

 

 

Installing GDAL with Anaconda on Windows

It has been historically fairly hard to install GDAL along with all its C dependencies and make it play nicely with existing Python libraries leaving alone having multiple environments with multiple versions.

I have blogged a year ago on how to set up a nice open-source GIS sandbox using Anaconda. Nowadays, it got even easier. When installing Anaconda 4, an application called Anaconda Navigator will be installed. From the web site linked:

Anaconda Navigator is a desktop graphical user interface (GUI) that is included in Anaconda® and allows you to launch applications and easily manage conda packages, environments and channels without using command-line commands. You can configure Navigator to search for packages on Anaconda Cloud or in a local Anaconda Repository. It is available for Windows, macOS and Linux.

Using the Anaconda Navigator, you can install GDAL without even starting a terminal!

Here are the steps you need to follow:

  1. Download and install Anaconda 4.x, 64-bit, Python 2.7, Windows version.
  2. Make sure to install Microsoft Visual C++ 2008 Service Pack 1 Redistributable Package. If you don’t do that, you may get weird errors when trying to import gdal in your Python code.
  3. Install gdal 2.1.0 using the Navigator application.
  4. Optionally, add a system environment for specifying the gdal folder in your Anaconda installation folder –GDAL_DATA =  C:\Users\dev\Anaconda2\Library\share\gdal – if you would need to work with spatial references, for instance, use pyproj to re-project coordinates.
  5. Start a Python command prompt and run import gdal and import ogr. If both have been successfully imported, you are good to go!

It is very easy to manage your packages using Navigator. Make sure to add some other Conda channels such as conda-forge because you won’t be able to find all geospatial packages you might need in the default channel. Thereafter, you can easily add pyprojand other geospatial Python packages such as pysal, descartes, basemap and cartopy.