Some thoughts on testing in GIS: part 2

GIS operations tests

In this post, I continue to share some of the insights into GIS testing. Here I would like to share a kind of cheat sheet for anyone involved in GIS testing.

Remember the interfaces exposed. If data in your geodatabases will be updated by various clients (purely with SQL, from a web application, and in a GIS desktop client), it is important to test through each of those interfaces. What might look like a valid feature in ArcMap could be an invalid feature in terms of DBMS spatial functionality, and vice versa. If you do the data entry via the web interface, always check that the information was really preserved in the database. There is a risk that features are drawn in the web browser but are stored only within the browser cache without being submitted to the database table. It’s usually helpful to add a feature via a feature service from Python using the Esri REST API (check ArcREST for that) or in a web application with Selenium, and then use any method to connect to the database and make sure the feature was actually stored. The web browser cache will bite you at some point, so always look at the database tables.
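As a minimal sketch of the "look at the database table" step, the check below counts rows with a given id. It uses an in-memory SQLite table as a stand-in for the real geodatabase; the table name `poles`, the column `objectid`, and the helper `feature_is_stored` are hypothetical names chosen for illustration, and the `?` placeholder style is specific to the sqlite3 driver.

```python
import sqlite3


def feature_is_stored(conn, table, id_column, feature_id):
    """Return True if a row with the given id exists in the table."""
    # Table and column names cannot be bound as parameters, so they must
    # come from trusted test configuration, never from user input.
    query = f"SELECT COUNT(*) FROM {table} WHERE {id_column} = ?"
    (count,) = conn.execute(query, (feature_id,)).fetchone()
    return count > 0


# Stand-in for the real geodatabase: an in-memory table with one feature.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE poles (objectid INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO poles (objectid, name) VALUES (1, 'added via web app')")
conn.commit()

print(feature_is_stored(conn, "poles", "objectid", 1))  # True
print(feature_is_stored(conn, "poles", "objectid", 2))  # False
```

In a real test, the connection would point at the production-like DBMS and the id would be the one returned by the web application or the feature service after the edit was submitted.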

Know your data domains. What are the max/min values for the point coordinates you want to let users digitize? Apart from coordinates being within the geographic domain ([-90, +90] for latitude and [-180, +180] for longitude in a geographic coordinate system with degrees as units), you might impose other limits on the geographical span of the input features. Make sure you have a test suite for verifying that it’s impossible to supply invalid coordinates for the data entered. A lot of confusion can arise when coordinates are specified in units that differ from the units of the coordinate system used for the feature class (such as providing decimal degrees for a projected coordinate system that has meters as the spatial reference units).
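A minimal sketch of such domain checks could look like the following. The function names and the `EDIT_EXTENT` values are hypothetical; the extent stands for whatever editing area your organization allows, expressed in the units of the feature class.

```python
def is_valid_lon_lat(lon, lat):
    """True if the pair lies inside the geographic domain (degrees)."""
    return -180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0


def is_within_extent(x, y, extent):
    """True if (x, y) falls inside extent = (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = extent
    return xmin <= x <= xmax and ymin <= y <= ymax


# Hypothetical editing extent in a projected system with meters as units.
EDIT_EXTENT = (400000.0, 6100000.0, 900000.0, 7700000.0)

print(is_valid_lon_lat(18.07, 59.33))   # True
print(is_valid_lon_lat(59.33, 118.07))  # False: latitude out of range
# Decimal degrees sent to a meter-based feature class fall outside the extent.
print(is_within_extent(18.07, 59.33, EDIT_EXTENT))  # False
```

The extent check is what catches the unit mix-up: a degree pair is perfectly valid as a geographic coordinate, but it lands far outside any realistic extent given in meters.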

Check geometry. Again, what could be valid in terms of SQL Server geometry spatial type may lead to inconsistencies when accessing the same data from a GIS client. With SQL Server, you can have features of different geometry types stored in the same table; with ArcGIS, you cannot. With Esri file geodatabase, you can store spatial objects larger than a hemisphere; with SQL Server geography type, you cannot.

You might also end up having features with null geometries: rows that are present in the table but don’t have any valid geometry defined. Make sure you have a test suite that can crunch the geometries of your features to find any outliers. Pay attention to sliver polygons and dangling nodes; set up a proper topology wherever applicable. Establish custom business rules for the connectivity of your line features by developing custom data controls based on graph theory methods whenever you deal with transportation or utility network data (the Python networkx package may be very helpful).
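To illustrate the dangling node idea without any dependencies, here is a small sketch that uses only the standard library rather than networkx. It assumes each line is a list of (x, y) vertices and treats an end point touched by exactly one line as a candidate; the function name and the rounding precision are my own choices.

```python
from collections import Counter


def find_dangling_nodes(lines, precision=6):
    """Return end points that are touched by exactly one line."""
    endpoint_counts = Counter()
    for line in lines:
        # Only the first and last vertex of a line can connect to other lines.
        for x, y in (line[0], line[-1]):
            endpoint_counts[(round(x, precision), round(y, precision))] += 1
    return sorted(node for node, count in endpoint_counts.items() if count == 1)


lines = [
    [(0, 0), (1, 0)],
    [(1, 0), (2, 0)],
    [(1, 0), (1, 1)],
]
print(find_dangling_nodes(lines))  # [(0, 0), (1, 1), (2, 0)]
```

Not every dangling node is an error (a dead-end street is legitimate), so the output is a list of candidates for review rather than a list of defects. For anything beyond this, such as checking that the whole network is connected, a graph library is the better tool.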

Keep an eye on the field data types. Text sent from a web form might be truncated when saved into a table if the field length is too small. Floats can be rounded or stored with unnecessary precision. Are nulls allowed for a field? What is the valid value range? What about zero, positive, or negative values?

Remember that hardware fails. No one wants a backup, everyone wants a restore. How does your GIS application behave when the network connectivity is lost? Remember that most geoprocessing operations are done outside of DBMS transactions, so make sure you have a safe way to roll back the changes if needed. Establish smart rules for the time when the system goes offline or when the GIS map services lose connections to the database.

Develop smart data filters. Not everything saved into a database should be shown in a web application. Apply both attribute and spatial filters; these can be DBMS based, such as views, or GIS based, such as SOI. Don’t make it complicated; it might be easier to create a view in the database than to do the data filtering in a web client.
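As a rough illustration of a DBMS-based attribute filter, the sketch below creates a view that exposes only approved rows. SQLite stands in for the real DBMS, and the `roads` table, the `status` column, and the `roads_public` view are hypothetical names; a spatial filter would additionally need the spatial functions of your DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE roads (objectid INTEGER PRIMARY KEY, name TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO roads (name, status) VALUES (?, ?)",
    [("Main St", "approved"), ("Draft Rd", "draft"), ("Oak Ave", "approved")],
)

# The view hides both the draft rows and the internal status column.
conn.execute(
    "CREATE VIEW roads_public AS "
    "SELECT objectid, name FROM roads WHERE status = 'approved'"
)

public_names = [
    row[0] for row in conn.execute("SELECT name FROM roads_public ORDER BY name")
]
print(public_names)  # ['Main St', 'Oak Ave']
```

Publishing the view instead of the table means the web client never receives the rows it is not supposed to show, so there is no filtering logic to test on the client side.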

Be skeptical about the data level of detail. Don’t store or serve data with unnecessary details – generalize the geometry when needed. Use the best practices for cartographic generalization, which can be done beforehand in a database or on the fly while submitting a request from a client. Remember that data transfer via the network is an expensive operation and web browsers will have a hard time processing thousands of heavy vector features.
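To make the idea of geometry generalization concrete, here is a minimal pure-Python sketch of the Douglas-Peucker line simplification algorithm, which is one common technique and not necessarily the one your GIS or DBMS uses. It assumes planar coordinates and measures distance to the segment between the end points; for production data, the generalization tools of your DBMS or GIS are the safer choice.

```python
import math


def _point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, tolerance):
    """Douglas-Peucker simplification of a list of (x, y) vertices."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        dist = _point_segment_distance(points[i], first, last)
        if dist > max_dist:
            index, max_dist = i, dist
    if max_dist <= tolerance:
        # Every intermediate vertex is close enough to be dropped.
        return [first, last]
    # Keep the farthest vertex and simplify both halves recursively.
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right


line = [(0, 0), (1, 0.05), (2, 0), (3, 2), (4, 0)]
print(simplify(line, 0.1))  # [(0, 0), (2, 0), (3, 2), (4, 0)]
```

The tolerance is expressed in the units of the coordinates, so the unit mix-ups mentioned earlier apply here as well: 0.1 means something very different in degrees than in meters.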

Try to automate testing wherever possible. You can use Python for almost any kind of testing. There is the Selenium framework for web interface testing, and you can call Selenium functions from Python. Try to have discrete, measurable unit tests and performance evaluation. Think in terms of the number of requests served and response times. If you are comfortable with Python, use ArcREST to submit ExportMap requests and emulate the kind of server stress tests you would run with Apache JMeter.
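A minimal sketch of measuring response times from Python is shown below. It deliberately takes any callable as the "request", so the same helper works whether the call goes through ArcREST, an HTTP library, or something else; `measure_response_times`, `summarize`, and `fake_request` are hypothetical names, and the requests are sent sequentially, which is much simpler than the concurrent load JMeter can generate.

```python
import statistics
import time


def measure_response_times(send_request, repeats=10):
    """Call send_request repeatedly and collect the elapsed seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        send_request()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Reduce raw timings to the numbers usually tracked between releases."""
    return {
        "count": len(timings),
        "mean": statistics.mean(timings),
        "max": max(timings),
    }


def fake_request():
    # Placeholder for a real call, for example an ExportMap request.
    time.sleep(0.01)


report = summarize(measure_response_times(fake_request, repeats=5))
print(report["count"], round(report["mean"], 3), round(report["max"], 3))
```

Storing the summary for every run makes it possible to compare response times before and after a software upgrade or a data model change.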

Keep historical views of your data. Develop an archiving workflow that will let you keep the information about how your features looked at a certain point in time. It might be relevant for attributes only, for geometry only, or for both. Make sure you don’t add redundancy to your data; you will want to be cautious about the archiving interval. Preserving the states of the features only every week or month instead of every day might suffice for your needs.

Some thoughts on testing in GIS: part 1

It surprises me that when searching for GIS testing on the Internet, you don’t get many relevant results. Testing is an essential part of any IT system or workflow; the GIS industry, however, doesn’t seem to be taking testing seriously. It is quite common for organizations to have poorly defined testing plans or to not have any plans at all. This might be partly due to the background of the people who play the role of GIS administrators. Such a person can be a self-taught geographer who performs the duties of a GIS analyst; there is nothing wrong with that, but without a proper understanding of IT operations there is a great chance that only some quick ad hoc data integrity checks and smoke tests are done for the production data and workflows. However, without having and following a rigorous GIS testing plan, an organization puts itself at great risk: an incorrect GIS workflow or inconsistent datasets might lead to incorrect interpretation of GIS analysis results, and they might also have a financial impact should data errors lead to direct profit losses.

In this series of posts, I would like to structure some of the GIS testing best practices I have collected over the years. I strongly believe that developing a solid methodology for GIS testing is essential for the success of any organization in the GIS industry. There is a dedicated section on this in the System Design Strategies from Esri.

To be more specific, I would like to split GIS testing into several fairly independent topics. Let’s start with classifying those topics.

GIS software testing

This kind of testing implies that we want to verify that the GIS software operates correctly and that the operations we run execute successfully and produce the expected, correct results.

Regardless of the GIS software origin – whether it’s a commercial product, an open-source program, or an in-house built system – we want to make sure that with a new release, established workflows will continue to run normally and previously resolved issues will not occur. This implies that no software upgrade should take place without verifying that the established workflows will continue to work. You might think that the software vendor should take care of that and verify that the program behavior won’t change with the new release. Indeed, there is no need to test all of the features and functions within a piece of software; however, running through all your workflows in a staging environment before upgrading your production machines is a must. A certain function or tool that is a part of a larger process might be deprecated or modified, which might break your workflow.

This may sound like regression testing, where you verify that previously existing issues that were resolved in an earlier release do not appear again in the coming release. But it is also about making sure that the things that worked in the previous release will continue to work in the new release and that the workflows can be executed successfully. Of course, if the success of your workflow depended on a bug fix or a patch in a previous version, it is a good idea to verify that the issue cannot be reproduced in the new release.

The software or scripting framework that is used for running the tests doesn’t really matter as long as you are able to run them in an automated manner and schedule the run.

GIS data testing

GIS datasets are arguably one of the most valuable assets an organization might have; hence it is crucially important that the spatial data are accurate, topologically correct, and consistent.

There is an endless number of business rules that an organization might enforce on its GIS data. Some of them may be unique to the organization, but most others will be relevant for any organization that uses GIS for data collection, management, or analysis.

Those rules that are generic are often related to the geographical rules dictated by reality and common sense. That is, point features in the Schools feature class may never be located geographically within a lake feature from the Lakes feature class; the number of lanes for a road cannot be negative, and speed restrictions on the roads perhaps should not exceed a certain value. It is up to you and your organization to define those business rules that your GIS data must follow.
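The attribute rules from the road example can be expressed as a very small check. This is only a sketch: the field names `lanes` and `speed_limit` and the `MAX_SPEED_LIMIT_KMH` threshold are hypothetical, and a spatial rule such as "no school inside a lake" would need the spatial functions of your GIS or DBMS on top of this.

```python
MAX_SPEED_LIMIT_KMH = 130  # hypothetical threshold, set by your organization


def check_road(road):
    """Return a list of rule violations for one road record (a dict)."""
    errors = []
    lanes = road.get("lanes")
    if lanes is None or lanes < 0:
        errors.append("lanes must be a non-negative number")
    speed = road.get("speed_limit")
    # A missing speed limit is allowed here; a present one must be in range.
    if speed is not None and not 0 < speed <= MAX_SPEED_LIMIT_KMH:
        errors.append("speed_limit must be within (0, %d]" % MAX_SPEED_LIMIT_KMH)
    return errors


print(check_road({"lanes": 2, "speed_limit": 90}))  # []
print(check_road({"lanes": -1, "speed_limit": 250}))
```

Returning all violations at once, instead of stopping at the first one, makes the report more useful for the person who has to fix the data.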

If you are an ArcGIS user, there is a great extension called Data Reviewer. It lets you define those kinds of rules in a very convenient way; it’s a great tool for QA/QC which I have used over the last few years. Take a look at the instructor-led Esri course on QA/QC for GIS data. If this is not something you have access to, you might consider building your own tests based on the geoprocessing framework within ArcGIS, taking advantage of ModelBuilder or Python. Keep in mind that it is possible to publish ArcGIS for Server services based on Data Reviewer batch jobs where each job represents a group of checks. You will be able to run your scripts on a schedule (say, every midnight), and there is even a Data Reviewer widget for the Web App Builder which can make data verification performed by GIS operators very streamlined and intuitive. The Esri Data Reviewer team is doing an excellent job; this is a great piece of software which provides a rich interface for automating your GIS data QA/QC workflows. There is also a free Esri training seminar on using the Data Reviewer extension that is worth taking a look at.

Automate web testing with Selenium

For some months, I have been using Selenium WebDriver a lot for automating web testing procedures. If you haven’t heard of this piece of software yet, I highly recommend it to anyone in the software testing industry and to anyone who often performs repetitive tasks in a web browser.

The basic idea of this software is that it is able to execute commands to manipulate the web browser pretty much like a user would; in addition, it can execute JavaScript commands to simulate certain events. You can start a web browser, navigate to a predefined URL, fill in some forms, press a button, and evaluate the contents of the web page shown. You are free to write the sequence of operations in any supported language, such as Java, Python, or Ruby, among others.

This means that you will be able to automate a lot of the work you used to do manually. Think of a web application that you work on. You probably have certain tests you perform over and over with every build or release. When the number of things to check gets large, it is also easy to miss a certain workflow, which could have negative consequences in QA/QC terms. Having a script for the tests will also serve as documentation for your colleagues. The script is easy to extend as you accumulate regression test cases, and it can be executed on a schedule without you doing anything at all, which might be a huge time saver.

Sounds interesting? Start at the Selenium home page. Navigate to the Download page and download the Selenium Client & WebDriver Language Binding of your choice (I am using the Python binding). After downloading and installing the module, fire up your IDE (I use Wing IDE and like it very much) and let’s build a simple script.

Navigate to this example. Remember to choose the language you want to see examples in within the Programming Language Preference section, which stays visible on the screen as you scroll. Then you will see the code in one language only. The official Selenium documentation is great, and the API itself is very intuitive and easy to get started with. After finishing the tutorials and playing a bit, consider writing some small unit tests for the web application you work on. If you use Python, the unittest module is probably the best one to start with; you could wrap your Selenium script into individual test cases and run them from the command line or the IDE of your choice.

If you have access to Pluralsight, there is a great course by John Sonmez called Automated Web Testing with Selenium. It helped me to get on track in no time at all and I highly recommend this one.

If you want to perform some tests that involve interacting with web map components and GIS functionality, you might find it hard to find any useful information on any best practices or helpful tips where to start.

I was interested in testing web apps built with the ArcGIS API for JavaScript by using Selenium, too. From what I’ve found, I could perform certain light interactions with pure Selenium (testing changing the basemap and zooming in/out by invoking keystrokes and mouse events). But it gets harder to get into deep interaction with the Esri JS API. I’ve searched for the information here at GIS.SE. There you can find more information on what kind of JS libraries can be used for web testing.

Here are some useful features of Selenium you should be aware of:

  • You can take screenshots of the web browser while the program is executed. This implies that you will be able to open the saved images later on and review the web application visually if there are some parts that are hard to test automatically (such as the layout design). Having an image of how the web application looked earlier, you could use any image comparison tool (such as the PIL library in Python) to find what has changed in the web application.
  • You can execute JavaScript code when running a browser. This means you can interact with the web map component and get its properties, such as the map extent:
console.log(map.extent.toJson());
  • It is possible to resize the browser window so that it will not obstruct other open windows.
driver.set_window_size(500, 500)
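The features listed above can be combined into a couple of small helpers. This is only a sketch: `get_map_extent` and `capture_page` are hypothetical names, the page is assumed to expose the map as a global JavaScript variable named `map`, and the URL in the usage comment is a placeholder. The WebDriver methods used (`execute_script`, `set_window_size`, `save_screenshot`) are part of the Selenium Python binding.

```python
def get_map_extent(driver):
    """Return the current map extent as a dict by running JavaScript."""
    # Assumes the page exposes the map as a global variable named `map`.
    return driver.execute_script("return map.extent.toJson();")


def capture_page(driver, path, width=500, height=500):
    """Resize the browser window and save a screenshot to path."""
    driver.set_window_size(width, height)
    return driver.save_screenshot(path)


# Usage with a real browser (requires Selenium and a browser driver):
#
#     from selenium import webdriver
#     driver = webdriver.Firefox()
#     driver.get("https://example.com/webmap")  # placeholder URL
#     extent = get_map_extent(driver)
#     capture_page(driver, "webmap.png")
#     driver.quit()
```

Note the `return` in the JavaScript snippet: without it, `execute_script` gives back `None` instead of the extent. Keeping the helpers free of any browser setup also makes it easy to call them from unittest test cases that create and close the driver in `setUp` and `tearDown`.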

Performance Testing with JMeter 2.9 book

I have recently skimmed through a book published by Packt Publishing, Performance Testing with JMeter 2.9. Since I have used this piece of software for a while for ArcGIS Server tests, I thought it might be worth reading a book to get a deeper understanding of some basic concepts related to testing per se.

I think the book is very well structured; the amount of information is just enough to get anyone started, yet you are not overwhelmed by the unnecessary details and minor things that usually make reference books on programming hard to read. I liked very much that the book has lots of step-by-step tutorials in each chapter, so readers can try out for themselves the things described and shown in the figures. Another useful thing is that you get instructions on how to set up JMeter, including configuring the Java settings. The last chapter, Helpful Tips, is something any JMeter user would benefit from, and it seems that this chapter contains more helpful tips than one can find on the Internet (blogs and web sites). The book is not supposed to replace the official software documentation from Apache, yet it is a really good starting point to get you on track with JMeter and to see some examples of how you can benefit from using this software.

I can highly recommend the book and suggest you take a look at it when you get a chance. A sample chapter on submitting forms is available at the Packt web site.

Desktop virtualization

At my work, I use virtualization software extensively for various reasons. I have virtual machines with ESRI software that I use for testing purposes, for example, when installing newer software versions like ArcGIS Desktop 9.4 (now renamed to ArcGIS Desktop 10). Sometimes it is very useful to have several copies of the same virtual machine, i.e., identical clones, for instance when experimenting with operating system settings. Moreover, it is very convenient to share a virtual machine with a colleague so he/she could take a look at it and help to fix a problem. As far as I know, many companies (Microsoft, Oracle, and IBM, just to mention a few) use virtualization techniques to distribute installed and configured software to their clients. In those cases a client has access to an FTP server, and all one has to do is copy a folder with the virtual machine files to a local server.

I have been using VMware for about 5 years and I like it very much. The software interface is very intuitive, the documentation is very well written, and there is a big user community. Of course, there are nice free alternatives like VirtualBox (which I can highly recommend for desktop virtualization, although only a couple of vendors provide tools with real enterprise-level support).

Recently, while working with an old virtual machine I have had for a long time, I realized that its disk space was not sufficient. Thus, I needed to extend the virtual disk size. After searching on Google for a while, I found dozens of suggested methods. Most of them, however, were not very straightforward and quite often did not work as users expected (as displeased user comments testified!).

An elegant and free solution is actually provided directly by the vendor. The utility is called VMware vCenter Converter and is available for download from the VMware web site. It worked fine for me, although extending a virtual machine disk from 10 GB to 15 GB took around 1 hour. Nevertheless, it is the best solution I have seen so far. This is really a must-have utility for anyone who uses VMware machines extensively in day-to-day work.