Some thoughts on testing in GIS: part 1

It surprises me that searching the Internet for GIS testing returns so few relevant results. Testing is an essential part of any IT system or workflow; the GIS industry, however, doesn’t seem to take it seriously. Many organizations have poorly defined testing plans, or no plans at all. This may be partly due to the background of the people who play the role of GIS administrator. Such a person may be a self-taught geographer who performs the duties of a GIS analyst. There is nothing wrong with that, but without a proper understanding of IT operations, chances are that production data and workflows only get a few quick ad hoc data integrity checks and smoke tests. Without having and following a rigorous GIS testing plan, an organization puts itself at great risk: an incorrect GIS workflow or inconsistent datasets can lead to incorrect interpretation of GIS analysis results, and they can also have a financial impact should data errors lead to direct profit losses.

In this series of posts, I would like to organize some of the GIS testing best practices I have collected over the years. I strongly believe that developing a solid methodology for GIS testing is essential for the success of any organization in the GIS industry. Esri’s System Design Strategies has a dedicated section on this topic.

To be more specific, I would like to split GIS testing into several fairly independent topics. Let’s start with classifying those topics.

GIS software testing

With this kind of testing, we want to verify that the GIS software operates correctly and that the operations we run complete successfully and produce the expected, correct results.

Regardless of where the GIS software comes from (a commercial product, an open-source program, or a system built in-house), we want to make sure that with a new release, established workflows will continue to run normally and previously resolved issues will not reappear. This means that the software should never be upgraded without first verifying that the established workflows still work. You might think that the software vendor should take care of that and verify that the program’s behavior won’t change with the new release. Indeed, there is no need to test every feature and function of the software; however, running through all of your workflows in a staging environment before upgrading your production machines is a must. A function or tool that is part of a larger process might have been deprecated or modified, which can break your workflow.

This may sound like regression testing, where you verify that issues resolved in an earlier release do not reappear in the upcoming one. But it is also about making sure that things that worked in the previous release continue to work in the new one and that the workflows still execute successfully. Of course, if the success of your workflow depended on a bug fix or a patch in a previous version, it is a good idea to verify that the issue cannot be reproduced in the new release.

The software or scripting framework used for running the tests doesn’t really matter, as long as you are able to run them in an automated manner and schedule the runs.
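
As an illustration of what such an automated check might look like, here is a minimal sketch using Python’s built-in unittest module. The workflow function `total_length_by_class`, the sample segments, and the baseline values are all made up for this example; in practice you would call your own workflow and compare its output with results captured from the release currently running in production.

```python
import math
import unittest


def total_length_by_class(segments):
    """Hypothetical stand-in for one step of an established workflow:
    sum the straight-line length of road segments per road class."""
    totals = {}
    for road_class, (x1, y1), (x2, y2) in segments:
        length = math.hypot(x2 - x1, y2 - y1)
        totals[road_class] = totals.get(road_class, 0.0) + length
    return totals


# Made-up input data and the results recorded from the known-good release.
SEGMENTS = [
    ("highway", (0, 0), (6, 8)),
    ("local", (0, 0), (3, 4)),
]
BASELINE = {"highway": 10.0, "local": 5.0}


class WorkflowRegressionTest(unittest.TestCase):
    def test_totals_match_baseline(self):
        result = total_length_by_class(SEGMENTS)
        # Same set of road classes as before the upgrade.
        self.assertEqual(set(result), set(BASELINE))
        # Same values, within a small numeric tolerance.
        for road_class, expected in BASELINE.items():
            self.assertAlmostEqual(result[road_class], expected, places=6)


def run_regression_suite():
    """Run the suite programmatically; return True if every test passed."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(WorkflowRegressionTest)
    outcome = unittest.TextTestRunner(verbosity=0).run(suite)
    return outcome.wasSuccessful()
```

A script that calls `run_regression_suite()` can then be run by whatever scheduler you already use in the staging environment, and its return value can decide whether the upgrade gets promoted to production.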

GIS data testing

GIS datasets are arguably one of the most valuable assets an organization can have, so it is crucial that the spatial data are accurate, topologically correct, and consistent.

There is an endless number of business rules that an organization might enforce on its GIS data. Some of them may be unique to the organization, but most others will be relevant for any organization that uses GIS for data collection, management, or analysis.

The generic rules are often geographical rules dictated by reality and common sense. For example, point features in the Schools feature class should never be located within a lake feature from the Lakes feature class; the number of lanes for a road cannot be negative; and speed restrictions on roads should probably not exceed a certain value. It is up to you and your organization to define the business rules that your GIS data must follow.
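
To make these rules concrete, here is a minimal sketch of how they could be expressed as plain Python checks. Everything in it is an assumption made for illustration: the record layout (dictionaries with `id`, `lanes`, `speed_limit`, and `location` keys), the 130 km/h threshold, and the simple ray-casting point-in-polygon test, which stands in for whatever spatial operators your GIS software provides.

```python
MAX_SPEED_KMH = 130  # assumed upper bound; pick the value that fits your data


def point_in_polygon(point, polygon):
    """Ray-casting test. `polygon` is a list of (x, y) vertices."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def check_roads(roads):
    """Return a list of (road id, message) for attribute rule violations."""
    errors = []
    for road in roads:
        if road["lanes"] < 0:
            errors.append((road["id"], "negative number of lanes"))
        if road["speed_limit"] > MAX_SPEED_KMH:
            errors.append((road["id"], "speed limit above maximum"))
    return errors


def check_schools(schools, lakes):
    """Return the ids of schools whose location falls inside any lake."""
    return [
        school["id"]
        for school in schools
        if any(point_in_polygon(school["location"], lake) for lake in lakes)
    ]


# Made-up sample data.
LAKES = [[(0, 0), (10, 0), (10, 10), (0, 10)]]
SCHOOLS = [
    {"id": "A", "location": (5, 5)},
    {"id": "B", "location": (20, 5)},
]
ROADS = [
    {"id": 1, "lanes": 2, "speed_limit": 50},
    {"id": 2, "lanes": -1, "speed_limit": 50},
    {"id": 3, "lanes": 2, "speed_limit": 200},
]
```

With this sample data, `check_schools(SCHOOLS, LAKES)` flags school A, and `check_roads(ROADS)` flags roads 2 and 3. Each check returns the offending features rather than stopping at the first problem, so a scheduled run can report all violations at once.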

If you are an ArcGIS user, there is a great extension called Data Reviewer. It lets you define these kinds of rules in a very convenient way; it’s a great QA/QC tool which I have used over the last few years. Take a look at the instructor-led Esri course on QA/QC for GIS data. If you don’t have access to Data Reviewer, you might consider building your own tests on top of the geoprocessing framework within ArcGIS, taking advantage of ModelBuilder or Python. Keep in mind that it is possible to publish ArcGIS for Server services based on Data Reviewer batch jobs, where each job represents a group of checks. You will be able to run the checks on a schedule (say, every midnight), and there is even a Data Reviewer widget for Web AppBuilder, which can make data verification by GIS operators very streamlined and intuitive. The Esri Data Reviewer team is doing an excellent job; this is a great piece of software that provides a rich interface for automating your GIS data QA/QC workflows. There is also a free Esri training seminar on using the Data Reviewer extension that is worth a look.
