Writing helper Python functions with arcpy

If you have been working with arcpy for some time, you have probably found that you re-use the same chunks of code in various modules, and that changing one of them means hunting down and modifying the same logic in every other file involved.

At this stage, you should consider wrapping those bits of code into functions. This will make it easier to organize your code and to make modifications as your project grows. Another thing with arcpy is that you might find yourself going through the same procedure when starting work on a project. For instance, you set up a workspace, supply a feature class path, use an arcpy.da cursor to get values from a column, and then start doing some analysis based on that. You may of course have multiple projects, each of which will require a different setup, but this one is fairly common among many developers.
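
Just to illustrate, a minimal sketch of that recurring setup (the geodatabase path, feature class, and field name below are made-up placeholders):

import arcpy

# the typical start of an arcpy project: workspace, feature class, cursor
arcpy.env.workspace = r"C:\GIS\Scratch.gdb"  # hypothetical geodatabase
fc = "Roads"  # hypothetical feature class
# pull the values of one column and start the analysis from there
values = [row[0] for row in arcpy.da.SearchCursor(fc, ["SPEED_LIMIT"])]  # hypothetical field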

It may be a good time to compile a site package, or just a helper module, that will contain the functions and methods you use regularly.

An excellent example of such a package of arcpy helper functions can be found here on GitHub; the package is called arcapi. Caleb and Filip have done a great job putting together dozens of useful functions that nearly every arcpy developer uses daily. Take a look at the amazing work they have done.

So, for instance, you often need a sorted list of unique values within a column in a feature class. You could write a helper function that returns such a sorted list and keep it in a helper module, apart from the project where you focus on implementing the business logic.

Provided this function is stored within a separate module named arcpy_helper.py, it is just a matter of importing this module and calling

arcpy_helper.select_distinct("path_to_fc", "input_field_name")

Much less code, much more productive.
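
A minimal sketch of what such a select_distinct helper could look like (just one possible implementation, following the module and function names used in the call above):

import arcpy

def select_distinct(fc, field_name):
    """Return a sorted list of the unique non-null values found in a column."""
    with arcpy.da.SearchCursor(fc, [field_name]) as cursor:
        # a set drops the duplicates, sorted() turns it into an ordered list
        return sorted({row[0] for row in cursor if row[0] is not None})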

Sometimes it’s not about bringing in a new piece of functionality though, but rather about writing less code.

Compare two function calls:

fields = [field.name for field in arcpy.ListFields(in_fc, field_type="DOUBLE")]

and

fields = get_fields(in_fc, "DOUBLE")

Both of these calls will give you a list of fields of double type found in a feature class or a table. However, the second one is much shorter. If you need to run this piece of code just once, you are OK. But if you call it fairly often throughout the module, you can save a lot of space by using a wrapper function (sometimes referred to as syntactic sugar) that is more concise and can improve code readability.
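
get_fields is not part of arcpy itself; a sketch of what such a thin wrapper might look like:

import arcpy

def get_fields(fc, field_type="All"):
    """Shortcut around arcpy.ListFields that returns only the field names."""
    return [field.name for field in arcpy.ListFields(fc, field_type=field_type)]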

However, be cautious when trying to implement functionality that you think is missing in arcpy, because there is a chance that a ready-to-use geoprocessing tool already exists. For instance, when you need to find out how many times a certain value is found in a column, the Summary Statistics GP tool can be used. When trying to remove duplicate rows or features, the Delete Identical GP tool can be used.

But in other cases, it might be worth wrapping a chunk of code you think you will need "just for this project" into a helper function. You can collect those, and at some point you will realize how much time you save by calling them instead of searching through your projects' code looking for the snippet you need.

I will leave you with a helper function I’ve written for listing out fields where there are no values stored (i.e., all rows have NULL).

 

Accessing ArcObjects in Python

If you have used arcpy in ArcGIS for some time, you might have noticed that not all of those operations that are accessible to you via the user interface in ArcMap are exposed as functions and methods in arcpy.

From the Esri help:

It was not designed to be a complete replacement for ArcObjects or an attempt at creating a function, method, or property for every conceivable button, dialog box, menu choice, or context item in the ArcMap interface (that is what ArcObjects provides).

So, there are no plans to make the whole ArcGIS platform available via arcpy, which is why it is sometimes referred to as a coarse-grained API into ArcGIS. For those situations when you need finer control over GIS data management and maintenance, Esri recommends using ArcObjects. This usually concerns advanced data management operations where support in arcpy is very limited, such as LIDAR data management, network dataset generation, write access to the properties of workspaces and data repositories, or metadata management. All of this can be done in ArcObjects, and hence the name: fine-grained API into ArcGIS.

Here are just a few of the things you cannot do in arcpy.

  • You cannot create a new network dataset with arcpy, and there are no geoprocessing tools for that either; this can be done solely by using the New Network Dataset wizard.
  • You cannot create a new empty ArcMap map document with arcpy. This means that if your workflows rely on generating map documents and adding layers into them, you need to pre-create an empty map document to be used as a template.
  • You cannot create new bookmarks in a map document, nor can you import existing bookmarks into one. This can be done manually from the ArcMap user interface only.

However, if you do need to automate some of those workflows, either for one-time job when you need to process some data really quickly or when building a script that will be run on a regular basis, the only option you have is to use ArcObjects.

Learning ArcObjects can be hard due to its complexity. You would also need to learn Java or .NET (C# or VB) if you want to write ArcGIS add-ins or develop stand-alone applications. If you are not comfortable with those languages and have most of the workflows written in Python, I have good news for you. It is possible to access ArcObjects from Python.

This means that you can write your Python code using arcpy and other packages and incorporate some of the ArcObjects-based operations right into it. This comes in very handy when you lack just a few minor operations in arcpy and need to use ArcObjects without leaving your existing Python code.

To get started, please review this GIS.SE post: How do I access ArcObjects from Python? It has enough information to let you set up everything needed.

Then follow these steps:

  1. Install the comtypes package (I recommend using pip; see How to install pip on Windows? on how to get it).
  2. Download the snippets.py file to get examples and some helper functions.
  3. Change the "10.2" in the snippets file to "10.3" and adjust the ArcGIS installation paths accordingly.
  4. You are ready to access the ArcObjects from your Python code! Look here for a sample that will create a new map document.

There is no need to install the ArcObjects SDK; the only thing you need to have installed is ArcGIS Desktop.
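
To give a feel for what this looks like, here is a rough sketch of creating an empty map document through the esriCarto type library with comtypes. The installation path below is an assumption for a 10.3 machine, and in a stand-alone script you would also need to initialize a license first (the snippets.py file mentioned above contains helper functions for that):

from comtypes.client import GetModule, CreateObject

# generate Python wrappers for the Carto type library (adjust the path to your install)
GetModule(r"C:\Program Files (x86)\ArcGIS\Desktop10.3\com\esriCarto.olb")
import comtypes.gen.esriCarto as esriCarto

# create the MapDocument coclass and talk to it through the IMapDocument interface
map_doc = CreateObject(esriCarto.MapDocument, interface=esriCarto.IMapDocument)
map_doc.New(r"C:\GIS\empty.mxd")  # hypothetical output path
map_doc.Save()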

You will need to play around with the ArcObjects reference to find out which assembly and which interface you need to import. Here is the place where you can start exploring the object model diagrams. Here is the section I recommend reviewing, Learning ArcObjects. Skip the parts you find irrelevant, though, as it covers nearly all ArcGIS Desktop operations.

In the API Reference part of the Help, you will find detailed information about the namespaces used in ArcObjects. Reading through it and visiting it often is an excellent way to learn ArcObjects. Here is an example of the Carto namespace.

Even though you do not need to know C#, VB, or Java, it is still worth being able to read the code, as there are tons of useful snippets and code samples available in the Help system. Those will help you find out which interface should be used, what data type casting is needed, and more.

To learn more about ArcObjects, listen to a recorded live-training seminar from Esri. Please share in the comments what kind of operations you are missing in arcpy and would need ArcObjects to implement.

 

Some thoughts on testing in GIS: part 2

GIS operations tests

In this post, I continue to share some of the insights into GIS testing. Here I would like to share a kind of cheat sheet for anyone involved in GIS testing.

Remember the interfaces exposed. If the data in your geodatabases will be updated by various clients (purely with SQL, from a web application, and in a GIS desktop client), it is important to do the testing from those various interfaces. What might look like a valid feature in ArcMap could be an invalid feature in terms of DBMS spatial functionality, and vice versa. If you do data entry via the web interface, always check that the information was really preserved in the database. There is a risk that features are drawn in the web browser but are stored only within the browser cache without being submitted to the database table. It is usually helpful to add a feature via a feature service using the Esri REST API from Python (check ArcREST for that) or through a web application with Selenium, and then use any method to connect to the database and make sure the feature was actually stored. The browser cache will bite you at some point, so always look at the database tables.
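
As a sketch of that idea, the snippet below posts a feature to a made-up feature service endpoint with plain requests and then checks a made-up database table directly; any HTTP and database libraries of your choice would do:

import json
import requests
import pyodbc  # just one way to reach the database; use whatever driver fits your DBMS

layer_url = "https://myserver/arcgis/rest/services/Parcels/FeatureServer/0"  # hypothetical
feature = {"geometry": {"x": 17.64, "y": 59.85},
           "attributes": {"NAME": "Test parcel"}}

# 1. add the feature through the web interface (the REST API addFeatures operation)
resp = requests.post(layer_url + "/addFeatures",
                     data={"features": json.dumps([feature]), "f": "json"})
object_id = resp.json()["addResults"][0]["objectId"]

# 2. verify it really made it into the database table, not just the browser or service cache
conn = pyodbc.connect("DSN=gisdb")  # hypothetical connection string
count = conn.execute("SELECT COUNT(*) FROM PARCELS WHERE OBJECTID = ?", object_id).fetchone()[0]
assert count == 1, "the feature was not persisted in the database table"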

Know your data domains. What are the min/max values for the point coordinates you want to let users digitize? Apart from coordinates having to be within the geographic domain ([-90, +90] for latitude and [-180, +180] for longitude in a geographic coordinate system with degrees as units), you might impose other limits on the geographical span of the input features. Make sure you have a test suite verifying that it is impossible to supply invalid coordinates for the data entered. There can be a lot of confusion when coordinates are specified in spatial reference units that differ from the units of the coordinate system used for the feature class (such as providing decimal degrees for a projected coordinate system that has meters as its units).
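
A tiny unit test for such a rule might look like this (validate_point and its limits are made up for the sake of the example):

import unittest

def validate_point(lon, lat):
    """Hypothetical check run before a digitized point is accepted."""
    return -180 <= lon <= 180 and -90 <= lat <= 90

class TestCoordinateDomain(unittest.TestCase):
    def test_latitude_outside_geographic_domain_is_rejected(self):
        self.assertFalse(validate_point(10.0, 95.0))

    def test_point_within_domain_is_accepted(self):
        self.assertTrue(validate_point(18.06, 59.33))

if __name__ == "__main__":
    unittest.main()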

Check geometry. Again, what could be valid in terms of SQL Server geometry spatial type may lead to inconsistencies when accessing the same data from a GIS client. With SQL Server, you can have features of different geometry types stored in the same table; with ArcGIS, you cannot. With Esri file geodatabase, you can store spatial objects larger than a hemisphere; with SQL Server geography type, you cannot.

You might also end up having features with null geometries that are present in the table but don't really have any valid geometry defined. Make sure you have a test suite that can crunch the geometries of your features to find any outliers. Pay attention to sliver polygons and dangling nodes; set up a proper topology wherever applicable. Establish custom business rules for your line features' connectivity policies by developing custom data controls using methods from graph theory whenever dealing with transportation or utility network data (the Python networkx package may be very helpful).
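
A minimal sketch of a null geometry check with an arcpy.da cursor and the SHAPE@ token (the feature class path is a placeholder):

import arcpy

fc = r"C:\GIS\Scratch.gdb\Parcels"  # hypothetical feature class

# OID@ returns the object id, SHAPE@ returns the geometry object (None for null geometries)
with arcpy.da.SearchCursor(fc, ["OID@", "SHAPE@"]) as cursor:
    null_geometry_oids = [oid for oid, shape in cursor if shape is None]

print("Features with null geometry: {}".format(null_geometry_oids))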

Keep an eye on the field data types. Text sent from a web form might be truncated when saved into a table if the field length is too small. Floats can be rounded or stored with unnecessary precision. Are nulls allowed for a field? What is the valid value range? What about zero, positive, or negative values?
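
Much of this can be inspected up front with arcpy.ListFields; a short sketch (the path is again made up):

import arcpy

fc = r"C:\GIS\Scratch.gdb\Addresses"  # hypothetical feature class

for field in arcpy.ListFields(fc):
    # field.length matters for text fields: longer web form input gets truncated;
    # field.isNullable tells you whether missing values are even possible
    print("{}: type={}, length={}, nulls allowed={}".format(
        field.name, field.type, field.length, field.isNullable))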

Remember that hardware fails. No one wants a backup, everyone wants a restore. How does your GIS application behave when network connectivity is lost? Remember that most geoprocessing operations are done outside of DBMS transactions, so make sure you have a safe way to roll back the changes if needed. Establish smart rules for when the system goes offline or when the GIS map services lose their connection to the database.

Develop smart data filters. Not everything saved into a database should be shown in a web application. Apply both attribute and spatial filters; these can be DBMS-based, such as views, or GIS-based, such as an SOI. Don't make it complicated; it might be easier to create a view in the database that does the data filtering for a web client.

Be skeptical about the data level of detail. Don't store or serve data with unnecessary detail – generalize the geometry when needed. Use the best practices for cartographic generalization, which can be done beforehand in the database or on the fly while serving a request from a client. Remember that data transfer over the network is an expensive operation, and web browsers will have a hard time processing thousands of heavy vector features.

Try to automate testing wherever possible. You can use Python for almost any kind of testing. There is the Selenium framework for web interface testing, and you can call Selenium functions from Python as well. Try to have discrete, measurable unit tests and performance evaluations. Think in terms of the number of requests served and response times. If you are comfortable with Python, use ArcREST to submit ExportMap requests and emulate Apache JMeter-style server stress tests.
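
As an illustration, a very naive sketch that times a series of export map requests against a map service (the URL and parameters are placeholders; a real stress test would fire many concurrent requests, which is what JMeter is good at):

import time
import requests

export_url = "https://myserver/arcgis/rest/services/Basemap/MapServer/export"  # hypothetical
params = {"bbox": "-120,30,-100,45", "size": "800,600", "f": "json"}

timings = []
for _ in range(20):
    start = time.time()
    requests.get(export_url, params=params)
    timings.append(time.time() - start)

print("average response time: {:.2f} s".format(sum(timings) / len(timings)))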

Keep historical views of your data. Develop an archiving workflow that will let you keep information about how your features looked at a certain point in time. It might be relevant only for attributes, only for geometry, or for both. Make sure you don't add redundancy to your data; be careful about choosing the archiving interval. Preserving the state of the features every week or month instead of every day might suffice for your needs.

Some thoughts on testing in GIS: part 1

It surprises me that when searching for GIS testing on the Internet, you don't get many relevant results. Testing is an essential part of any IT system or workflow; the GIS industry, however, doesn't seem to take testing seriously. It is quite common for organizations to have poorly defined testing plans, or no plans at all. This might be partially because of the background that people in the role of GIS administrator often have. Such a person can be a self-taught geographer who performs the duties of a GIS analyst; there is nothing wrong with that, but without a proper understanding of IT operations there is a great chance that only some quick ad hoc data integrity checks and smoke tests are done on the production data and workflows. However, without having and following a rigorous GIS testing plan, an organization puts itself at great risk: an incorrect GIS workflow or inconsistent datasets might lead to incorrect interpretation of GIS analysis results, and they might also have a financial impact should data errors lead to direct profit losses.

In this series of posts, I would like to structure some of the GIS testing best practices I have collected over the years. I strongly believe that developing a solid methodology for GIS testing is essential for success of any organization in the GIS industry. There is a dedicated section on this in the System Design Strategies from Esri.

To be more specific, I would like to split GIS testing into several fairly independent topics. Let’s start with classifying those topics.

GIS software testing

This kind of testing implies that we want to verify that the GIS software operates correctly and that the operations it exposes execute successfully, producing the expected and correct results.

Regardless of the GIS software's origin – whether it's a commercial product, an open-source program, or an in-house built system – we want to make sure that, with a new release, established workflows will continue to run normally and previously resolved issues will not reappear. This implies that no upgrade of the software should occur without verifying that the established workflows will continue to work. You might think that the software vendor should take care of that and verify that the program behavior won't change with the new release. Indeed, there is no need to test every feature and function of the software; however, running through all your workflows in a staging environment before upgrading your production machines is a must. A certain function or tool that is part of a larger process might be deprecated or modified, which might break your workflow.

This may sound like regression testing, where you verify that issues resolved in an earlier release do not appear again in the coming one. But it is also about making sure that things that worked in the previous release will continue to work in the new one, and that the workflows can still be executed successfully. Of course, if the success of your workflow depended on a bug fix or a patch in a previous version, it is a good idea to check whether that issue can be reproduced again in the new release.

The software or scripting framework used for running the tests doesn't really matter, as long as you are able to run them in an automated manner and schedule the runs.

GIS data testing

GIS datasets are arguably among the most valuable assets an organization might have; hence, it is crucially important that the spatial data are accurate, topologically correct, and consistent.

There is an endless number of business rules that an organization might enforce on its GIS data. Some of them may be unique to the organization, but most others will be relevant for any organization that uses GIS for data collection, management, or analysis.

The generic rules are often related to geographical constraints enforced by reality and common sense. That is, point features in the Schools feature class may never be located geographically within a lake feature from the Lakes feature class; the number of lanes for a road cannot be negative; and speed restrictions on roads should probably not exceed a certain value. It is up to you and your organization to define the business rules that your GIS data must follow.
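
Two of those generic rules could be checked with plain arcpy along these lines (the geodatabase path and the LANES field are made up; the feature class names follow the example above):

import arcpy

arcpy.env.workspace = r"C:\GIS\City.gdb"  # hypothetical geodatabase

# rule 1: no school point may fall inside a lake polygon
arcpy.MakeFeatureLayer_management("Schools", "schools_lyr")
arcpy.SelectLayerByLocation_management("schools_lyr", "WITHIN", "Lakes")
schools_in_lakes = int(arcpy.GetCount_management("schools_lyr").getOutput(0))

# rule 2: the number of lanes for a road can never be negative
with arcpy.da.SearchCursor("Roads", ["OID@", "LANES"]) as cursor:
    negative_lanes = [oid for oid, lanes in cursor if lanes is not None and lanes < 0]

print("schools inside lakes: {}, roads with negative lanes: {}".format(
    schools_in_lakes, len(negative_lanes)))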

If you are an ArcGIS user, there is a great extension called Data Reviewer. It will let you define those kinds of rules in a very convenient way; it's a great QA/QC tool that I have used over the last few years. Take a look at the instructor-led Esri course on QA/QC for GIS data. If this is not something you have access to, you might consider building your own tests based on the geoprocessing framework within ArcGIS, taking advantage of ModelBuilder or Python. Keep in mind that it is possible to publish ArcGIS for Server services based on Data Reviewer batch jobs, where each job represents a group of checks. You will be able to run your scripts on a schedule (say, every midnight), and there is even a Data Reviewer widget for the Web AppBuilder, which can make the data verification performed by GIS operators very streamlined and intuitive. The Esri Data Reviewer team is doing an excellent job; this is a great piece of software that provides a rich interface for automating your GIS data QA/QC workflows. There is also a free Esri training seminar on using the Data Reviewer extension, worth taking a look at.

LICEcap – recording the screen to animated GIF

If you have ever needed to explain to someone a sequence of steps to perform in an application, you have probably ended up writing a long list of instructions. This takes a lot of time, and when even slight changes are made in the interface, you need to rewrite the whole thing.

I remember the nice visual animations in the ArcGIS Desktop Help files, where one could follow the clicks recorded in the ArcMap GUI. One of the ways to record the screen without creating large video files is to use a GIF recorder. I've looked at different ones, and LICEcap seems to suit me well.


LICEcap is GPL free software, so the source is available. It's fast and has options to adjust the FPS value, make pauses, and add text boxes. A great piece of software you could use to make a quick recording of a certain workflow – keep it yourself for future reference or send the GIF to someone when helping them out with the software.

Stream Android screen to a PC with TeamViewer

I was looking for quite some time for an easy way to stream an Android tablet screen onto a PC (Windows) to demo how applications run on Android (such as Collector and other ArcGIS apps). Of course you can use a portable cam or a webcam, but what if you just want to show something really quickly without setting up cameras?

In case you ever need to do demos with mobile apps, read below to find out how.

There are tons of ways to do this (VNC servers, paid apps, rooting the device), but the most efficient I've found so far is to use TeamViewer. There is also a free version of this software.

  1. Download TeamViewer on your Windows machine. You don't need to install it; it is possible to just run the .exe once.
  2. Install TeamViewer QuickSupport on your tablet.
  3. When running TeamViewer QuickSupport on your tablet, you will be asked whether you want to install an add-on for remote control of the device. The tablet will automatically find the add-on for your manufacturer (Lenovo, Sony, Samsung, HTC, HP, etc.) and install it.
  4. Then just connect from the Windows machine to the Android tablet (both should be connected to the Internet of course).

There may be some lag on a poor Internet connection, but generally people understand this and tend to focus on the functionality rather than on the performance of the screen sharing. Keep in mind that you can also do remote control, so troubleshooting directly on the customer's tablet is an option if you work in tech support.

So, you can not only see the tablet's screen, but actually perform clicks to simulate touches on it. An amazing piece of functionality that I'll now use often for demonstrations to our customers.

Find fields in a feature class that have only nulls with arcpy

A friend asked me to help write a tiny script that would list fields that contain only nulls (i.e., no data at all) – basically, find any columns that have never been populated with a value for any feature since they were created.

Quite easy and quick to do with arcpy.


import arcpy

fc = r"C:\Users\user\Documents\ArcGIS\Default.gdb\_DeleteColumns"

# get the list of fields that allow nulls (fields that don't allow nulls
# cannot contain nulls, so we can skip them right away)
not_null_fields = [field.name for field in arcpy.ListFields(fc) if field.isNullable]

# build a dict {field_name : [list of unique values found in that field]}
fields_dict = {field: list(set([feature[not_null_fields.index(field)]
                                for feature in arcpy.da.SearchCursor(fc, not_null_fields)]))
               for field in not_null_fields}

# find out which fields contain only None (null in Python terms)
fields_to_remove = [k for k, v in fields_dict.iteritems() if v == [None]]

# remove the fields that contain only null values from the feature class
for field in fields_to_remove:
    arcpy.DeleteField_management(fc, field)