Some thoughts on testing in GIS: part 2

GIS operations tests

In this post, I continue to share some of the insights into GIS testing. Here I would like to share a kind of cheat sheet for anyone involved in GIS testing.

Remember the interfaces exposed. If data in your geodatabases will be updated by various clients (purely with SQL, from a web application, and in a GIS desktop client), it is important to do the testing from various interfaces. What might look like a valid feature in ArcMap could be an invalid feature in terms of DBMS spatial functionality, and vice versa. If you do the data entry via the web interface, always check that the information was really preserved in the database. There is a risk that features are drawn in the web browser, but are stored only within the browser cache without being submitted into the database table. It’s usually helpful to add a feature via a feature service with Python with the Esri REST API (check ArcREST for that) or in a web application with Selenium, and then use any method to connect to the database and make sure the feature was actually stored. Web browser cache will bite you at some point of time, so always look at the database tables.

Know your data domains. What are max/min values for the point coordinates you want to let users to digitize? Apart from coordinates being within the geographic domain ([-90,+90] for latitude and [-180;+180] for longitude for a geographic coordinate system with degrees as units), you might expose other limits on the geographical span of the input features. Make sure you have a test suite for verifying that it’s impossible to supply invalid coordinates for the data entered. There can be many confusions when the coordinates are specified with the spatial reference units that are different from the units of the coordinate system used for the feature class (such as providing decimal degrees for a projected coordinate system that has meters as the spatial reference units).

Check geometry. Again, what could be valid in terms of SQL Server geometry spatial type may lead to inconsistencies when accessing the same data from a GIS client. With SQL Server, you can have features of different geometry types stored in the same table; with ArcGIS, you cannot. With Esri file geodatabase, you can store spatial objects larger than a hemisphere; with SQL Server geography type, you cannot.

You might also end up having features with null geometries that present in the table, but don’t really have any valid geometry defined. Make sure you have a test suite that can crunch the geometries of your features to find any outliers. Pay attention to sliver polygons and dangling nodes; set up a proper topology wherever applicable. Establish custom business rules for your line features connectivity policies by developing custom data controls using methods of the graph theory whenever dealing with transportation or network utilities data (Python networkx package may be very helpful).

Keep an eye on the field data types. Text sent from a web form might be truncated when trying to save into a table if the field length is too small. Floats can be rounded or stored with an unnecessary precision. Are nulls allowed for a field? What is the valid value range? What’s about zero, positive, or negative values?

Remember that hardware fails. No one wants a backup, everyone wants a restore. How does your GIS application behave when the network connectivity is lost? Remember that most of the geoprocessing operations are done outside of DBMS transactions, so make sure you have a safe way to roll back the changes in case needed. Establish smart rules for the time when the system goes offline or when the GIS map services lose connections to the database.

Develop smart data filters. Not everything saved into a database should be shown in a web application. Apply both attribute and spatial filters; these can be DBMS based such as views or GIS based such as SOI. Don’t make it complicated; it might be easier to create a view in a database that do the data filtering in a web client.

Be skeptic about the data level of detail. Don’t store or serve data with unnecessary details – generalize the geometry when needed. Use the best practices for cartographical generalization that can be done beforehand in a database or on-the-fly while submitting request from a client. Remember that data transfer via the network is an expensive operation and web browsers will have a hard time processing thousands of heavy vector features.

Try to automate testing wherever possible. You can use Python for almost any kind of testing. There is Selenium framework for the web interface testing. And you can also call Selenium functions from Python. Try to have discrete measurable unit tests and performance evaluation. Think in terms of the number of requests served and response times. Use ArcREST for submitting the ExportMap request to emulate the Apache JMeter server stress tests if you are comfortable with Python.

Keep historical views of your data. Develop an archiving workflow that will let you keep the information about how your features looked at a certain point of time. It might be relevant either only for attributes or geometry, or both. Make sure you don’t add redundancy to your data; you will want to be cautious about the archiving interval. Preserving only states of the features every week or month instead of days might suffice your needs.


Some thoughts on testing in GIS: part 1

It surprises me that when searching for GIS testing on the Internet, you don’t get many relevant results. Testing is an essential part of any IT system or workflow; GIS industry, however, doesn’t seem to be taking testing seriously. It is quite common for many organizations to have poorly defined testing plans or to not have any plans at all. This is something that might be partially because of the background that people playing role of GIS administrators might have. Such a person can be a self-taught geographer who performs duties of a GIS analyst; there is nothing wrong with that, but without a proper understanding of the IT operations there is a great chance that just some quick ad hoc tests for data integrity and smoke tests are done for the production data and workflows. However, without having and following a rigorous GIS testing plan, an organization puts itself into a great risk: incorrect GIS workflow or inconsistent datasets might lead to incorrect interpretation of GIS analysis results, and they also might have financial impact should data errors lead to direct profit losses.

In this series of posts, I would like to structure some of the GIS testing best practices I have collected over the years. I strongly believe that developing a solid methodology for GIS testing is essential for success of any organization in the GIS industry. There is a dedicated section on this in the System Design Strategies from Esri.

To be more specific, I would like to split GIS testing into several fairly independent topics. Let’s start with classifying those topics.

GIS software testing

This kind of testing implies that we want to verify that the GIS software operates correctly and operations that can be run execute successfully producing expected and correct results.

Regardless of the GIS software origin – whether it’s a commercial product, an open-source program, or an in-house built system – we want to make sure that with the new release, established workflows will continue to run normally and previously resolved issues will not occur. This implies that no upgrades of the software should occur without verifying that the established workflows will continue to work. You might think that the software vendor should take care of that and verify that the program behavior won’t change with the new release. Indeed, there is no need to test all of the features and functions within a software; however, running through all your workflows in a staging environment before upgrading your production machines is a must. A certain function or a tool that is a part of the larger process might be deprecated or modified which might break your workflow.

This may sound like regression testing when you verify that the previously existed issues that was resolved in earlier release do not appear again in the coming release. But it is also about making sure that the things that have worked in the previous will continue to work in a new release and the workflows might be executed successfully. Of course, if your workflow success was dependent on the bug fix or a patch in a previous version, it might be a good idea to investigate whether this issue cannot be reproduced again in the new release.

The software or scripting framework that is used for running the tests, doesn’t really matter as long as you are able to run them in an automated manner and schedule the run.

GIS data testing

GIS datasets are arguably one the most valuable assets an organization might have, hence it is crucially important that the spatial data are accurate, topologically correct, and consistent.

There is an endless number of business rules that an organization might enforce on its GIS data. Some of them may be unique to the organization, but most others will be relevant for any organization that uses GIS for data collection, management, or analysis.

Those rules that are generic are often related to the geographical rules enforced by the reality and common sense. That is, point features in the Schools feature class may never be located geographically within a lake feature from the Lakes feature class; number of lanes for a road cannot be negative, and speed restrictions on the roads perhaps should not exceed a certain value. It is up to you and your organization to define those business rules that your GIS data must follow.

If you are an ArcGIS user, there is a great extension called Data Reviewer. This will let you define those kinds of rules in a very convenient way; it’s a great tool for QA/QC which I have used over last years. Take a look at an instructor-led Esri course on QA/QC for GIS data. If this is not something you have access to, you might consider building own tests based on the geoprocessing framework within ArcGIS taking advantage of ModelBuilder or Python. Keep in mind that it is possible to publish ArcGIS for Server services based on Data Reviewer batch jobs where each job represents a group of checks. You will be able to run your scripts in a scheduled manner (say every midnight) and there is even a Data Reviewer widget for the Web App Builder which can make data verification performed by GIS operators very streamlined and intuitive. Esri Data Reviewer team is doing an excellent job; this is a great piece of software which provides a rich interface for automating your GIS data QA/QC workflows. There is also a free Esri training seminar on using Data Reviewer extension, worth taking a look.