Some thoughts on testing in GIS: part 2

GIS operations tests

In this post, I continue sharing insights into GIS testing. Below is a kind of cheat sheet for anyone involved in testing GIS operations.

Remember the interfaces exposed. If the data in your geodatabases will be updated by various clients (purely with SQL, from a web application, and in a desktop GIS client), it is important to test from each of those interfaces. What looks like a valid feature in ArcMap could be an invalid feature in terms of DBMS spatial functionality, and vice versa. If you do the data entry via a web interface, always check that the information was really preserved in the database. There is a risk that features are drawn in the web browser but are stored only within the browser cache without ever being submitted to the database table. It is usually helpful to add a feature via a feature service, either from Python using the Esri REST API (check ArcREST for that) or in a web application with Selenium, and then connect to the database by any means and make sure the feature was actually stored. The browser cache will bite you at some point, so always look at the database tables.
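To illustrate, here is a minimal sketch of such a round-trip check. The feature service URL, the dbo.TEST_POINTS table, and the connection string are all hypothetical and would have to be replaced with your own; the point is to push a feature through the REST API and then verify in the database that the row really exists:

```python
# A minimal sketch: add a feature via the ArcGIS REST API, then confirm it
# reached the underlying table. Service URL, table name, and connection
# string are hypothetical placeholders.
import json
import requests
import pyodbc

FS_URL = "https://example.com/arcgis/rest/services/Test/FeatureServer/0"  # hypothetical
DB_CONN = "DRIVER={ODBC Driver 17 for SQL Server};SERVER=gisdb;DATABASE=gis;Trusted_Connection=yes"

def add_point_via_rest(x, y, name):
    """Add a point feature through the addFeatures REST operation."""
    feature = [{"geometry": {"x": x, "y": y, "spatialReference": {"wkid": 4326}},
                "attributes": {"NAME": name}}]
    resp = requests.post(FS_URL + "/addFeatures",
                         data={"features": json.dumps(feature), "f": "json"})
    resp.raise_for_status()
    return resp.json()

def feature_exists_in_db(name):
    """Verify the feature was persisted in the table, not just in the browser."""
    with pyodbc.connect(DB_CONN) as conn:
        cursor = conn.cursor()
        row = cursor.execute(
            "SELECT COUNT(*) FROM dbo.TEST_POINTS WHERE NAME = ?", name).fetchone()
        return row[0] > 0

result = add_point_via_rest(16.37, 48.21, "qa_point_001")
assert result.get("addResults", [{}])[0].get("success"), result
assert feature_exists_in_db("qa_point_001"), "Feature was not persisted in the database"
```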

Know your data domains. What are the max/min values for the point coordinates you want to let users digitize? Apart from coordinates having to fall within the geographic domain ([-90, +90] for latitude and [-180, +180] for longitude in a geographic coordinate system with degrees as units), you might impose other limits on the geographic extent of the input features. Make sure you have a test suite verifying that it is impossible to supply invalid coordinates for the data entered. Much confusion can arise when coordinates are specified in units different from the units of the coordinate system used for the feature class (such as providing decimal degrees for a projected coordinate system that has meters as its linear unit).
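A minimal sketch of such a test suite with pytest, assuming a hypothetical validate_point function in your own code that is supposed to reject out-of-domain coordinates by raising ValueError:

```python
# A minimal sketch of coordinate-domain tests; validate_point and the
# yourapp.validation module are hypothetical stand-ins for your own code.
import pytest
from yourapp.validation import validate_point  # hypothetical

@pytest.mark.parametrize("x, y", [
    (0, 91),      # latitude above +90
    (0, -91),     # latitude below -90
    (181, 0),     # longitude above +180
    (-181, 0),    # longitude below -180
])
def test_out_of_domain_coordinates_are_rejected(x, y):
    with pytest.raises(ValueError):
        validate_point(x, y)

@pytest.mark.parametrize("x, y", [(0, 0), (-180, -90), (180, 90)])
def test_domain_boundaries_are_accepted(x, y):
    validate_point(x, y)  # should not raise
```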

Check geometry. Again, what is valid in terms of the SQL Server geometry spatial type may lead to inconsistencies when the same data is accessed from a GIS client. With SQL Server, you can have features of different geometry types stored in the same table; with ArcGIS, you cannot. With an Esri file geodatabase, you can store spatial objects larger than a hemisphere; with the SQL Server geography type, you cannot.
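As a quick way to catch such discrepancies on the DBMS side, here is a minimal sketch that asks SQL Server itself which geometries it considers invalid; the table and column names are hypothetical:

```python
# A minimal sketch, assuming a hypothetical dbo.PARCELS table with a
# geometry column named SHAPE; STIsValid() flags rows that the DBMS
# considers invalid even if a GIS client displays them without complaint.
import pyodbc

DB_CONN = "DRIVER={ODBC Driver 17 for SQL Server};SERVER=gisdb;DATABASE=gis;Trusted_Connection=yes"

sql = """
SELECT OBJECTID, SHAPE.IsValidDetailed() AS reason
FROM dbo.PARCELS
WHERE SHAPE.STIsValid() = 0
"""

with pyodbc.connect(DB_CONN) as conn:
    for objectid, reason in conn.cursor().execute(sql):
        print(f"Invalid geometry in OBJECTID {objectid}: {reason}")
```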

You might also end up having features with null geometries that are present in the table but don't really have any valid geometry defined. Make sure you have a test suite that can crunch the geometries of your features to find any outliers. Pay attention to sliver polygons and dangling nodes; set up a proper topology wherever applicable. Establish custom business rules for the connectivity policies of your line features by developing custom data controls based on graph theory whenever you deal with transportation or utility network data (the Python networkx package may be very helpful).
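A minimal sketch of two such checks, assuming a hypothetical file geodatabase with a Roads feature class: finding rows with null geometries, and a rough connectivity check built on networkx:

```python
# A minimal sketch; the geodatabase path and feature class name are
# hypothetical, and arcpy and networkx need to be available.
import arcpy
import networkx as nx

GDB = r"C:\GIS\qa.gdb"  # hypothetical

# 1. Find features whose geometry is null (the row exists, the shape does not).
with arcpy.da.SearchCursor(GDB + r"\Roads", ["OID@", "SHAPE@"]) as cursor:
    null_oids = [oid for oid, shape in cursor if shape is None]
print("Features with null geometry:", null_oids)

# 2. Rough connectivity check for a line network: build a graph from the
#    start/end points of every line and look for disconnected parts.
graph = nx.Graph()
with arcpy.da.SearchCursor(GDB + r"\Roads", ["OID@", "SHAPE@"]) as cursor:
    for oid, shape in cursor:
        if shape is None:
            continue
        start = (round(shape.firstPoint.X, 3), round(shape.firstPoint.Y, 3))
        end = (round(shape.lastPoint.X, 3), round(shape.lastPoint.Y, 3))
        graph.add_edge(start, end, oid=oid)

components = list(nx.connected_components(graph))
if len(components) > 1:
    print(f"Network is split into {len(components)} disconnected parts")
```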

Keep an eye on the field data types. Text sent from a web form might be truncated when saved to a table if the field length is too small. Floats can be rounded or stored with unnecessary precision. Are nulls allowed for a field? What is the valid value range? What about zero, positive, or negative values?
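A minimal sketch of schema assertions with arcpy, assuming a hypothetical Addresses feature class and made-up expectations about its fields; the idea is to assert field types, lengths, and nullability instead of trusting the web form to behave:

```python
# A minimal sketch; the feature class path and the expected schema are
# hypothetical placeholders for your own data model.
import arcpy

FC = r"C:\GIS\qa.gdb\Addresses"  # hypothetical

EXPECTED = {
    "STREET_NAME": {"type": "String", "length": 100, "nullable": False},
    "HOUSE_NO":    {"type": "Integer", "nullable": False},
    "ELEVATION":   {"type": "Double", "nullable": True},
}

fields = {f.name: f for f in arcpy.ListFields(FC)}
for name, rules in EXPECTED.items():
    field = fields.get(name)
    assert field is not None, f"Missing field {name}"
    assert field.type == rules["type"], f"{name}: expected {rules['type']}, got {field.type}"
    if "length" in rules:
        assert field.length >= rules["length"], f"{name} is too short for the web form input"
    assert field.isNullable == rules["nullable"], f"{name}: unexpected nullability"
```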

Remember that hardware fails. No one wants a backup; everyone wants a restore. How does your GIS application behave when network connectivity is lost? Remember that most geoprocessing operations are done outside of DBMS transactions, so make sure you have a safe way to roll back the changes if needed. Establish smart rules for the time when the system goes offline or when the GIS map services lose their database connections.
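A minimal sketch of wrapping edits in a geodatabase edit session so they can be discarded if something fails midway; the workspace path and the attribute update are hypothetical:

```python
# A minimal sketch of a guarded edit session with arcpy.da.Editor;
# the workspace, feature class, and field update are hypothetical.
import arcpy

workspace = r"C:\GIS\qa.gdb"  # hypothetical
edit = arcpy.da.Editor(workspace)
edit.startEditing(False, True)   # no undo stack, multiuser mode
edit.startOperation()
try:
    with arcpy.da.UpdateCursor(workspace + r"\Parcels", ["STATUS"]) as cursor:
        for row in cursor:
            row[0] = "REVIEWED"
            cursor.updateRow(row)
except Exception:
    edit.abortOperation()        # discard the half-finished changes
    edit.stopEditing(False)      # do not save
    raise
else:
    edit.stopOperation()
    edit.stopEditing(True)       # save the changes
```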

Develop smart data filters. Not everything saved into a database should be shown in a web application. Apply both attribute and spatial filters; these can be DBMS-based, such as views, or GIS-based, such as server object interceptors (SOIs). Don't make it complicated: it might be easier to create a view in the database that does the filtering than to filter the data in a web client.
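A minimal sketch of pushing the filtering into a database view instead of the web client; the table, the attribute filter, and the area of interest are all hypothetical:

```python
# A minimal sketch: create a filtered view so every client reads the same
# subset. Table names, filter, and polygon are hypothetical.
import pyodbc

DB_CONN = "DRIVER={ODBC Driver 17 for SQL Server};SERVER=gisdb;DATABASE=gis;Trusted_Connection=yes"

create_view = """
CREATE OR ALTER VIEW dbo.PUBLIC_INCIDENTS AS
SELECT OBJECTID, SHAPE, CATEGORY, REPORTED_AT
FROM dbo.INCIDENTS
WHERE STATUS = 'PUBLISHED'
  AND SHAPE.STIntersects(geometry::STGeomFromText(
        'POLYGON((16.2 48.1, 16.6 48.1, 16.6 48.3, 16.2 48.3, 16.2 48.1))', 4326)) = 1
"""

with pyodbc.connect(DB_CONN, autocommit=True) as conn:
    conn.cursor().execute(create_view)
```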

Be skeptical about the data's level of detail. Don't store or serve data with unnecessary detail; generalize the geometry when needed. Use the best practices for cartographic generalization, which can be done beforehand in the database or on the fly while serving a request from a client. Remember that data transfer over the network is an expensive operation and that web browsers will have a hard time processing thousands of heavy vector features.
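A minimal sketch of generalization with shapely; the tolerance and the sample line are made up and would have to come from your own cartographic requirements:

```python
# A minimal sketch of geometry generalization; the sample geometry and
# tolerance are hypothetical.
from shapely.geometry import LineString

detailed = LineString([(0, 0), (0.1, 0.05), (0.2, 0.02), (0.3, 0.07), (1, 0)])
generalized = detailed.simplify(tolerance=0.1, preserve_topology=True)

print(len(detailed.coords), "vertices before,", len(generalized.coords), "after")
```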

Try to automate testing wherever possible. You can use Python for almost any kind of testing. There is the Selenium framework for web interface testing, and you can call Selenium functions from Python. Try to have discrete, measurable unit tests and performance evaluations. Think in terms of the number of requests served and the response times. If you are comfortable with Python, you can use ArcREST to submit ExportMap requests and emulate an Apache JMeter-style server stress test.
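A minimal sketch of such a stress test using plain requests and a thread pool rather than ArcREST or JMeter; the export URL, bounding box, and request count are hypothetical:

```python
# A minimal sketch of a simple load test against a map service export
# operation; the service URL and parameters are hypothetical.
import time
from concurrent.futures import ThreadPoolExecutor
import requests

EXPORT_URL = "https://example.com/arcgis/rest/services/Basemap/MapServer/export"  # hypothetical
PARAMS = {"bbox": "16.2,48.1,16.6,48.3", "size": "800,600", "format": "png", "f": "image"}

def timed_export(_):
    start = time.perf_counter()
    resp = requests.get(EXPORT_URL, params=PARAMS)
    resp.raise_for_status()
    return time.perf_counter() - start

with ThreadPoolExecutor(max_workers=10) as pool:
    timings = list(pool.map(timed_export, range(100)))

print(f"{len(timings)} requests, avg {sum(timings)/len(timings):.2f}s, max {max(timings):.2f}s")
```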

Keep historical views of your data. Develop an archiving workflow that lets you see how your features looked at a certain point in time. It might be relevant only for attributes, only for geometry, or for both. Make sure you don't add redundancy to your data; be cautious about the archiving interval. Preserving the state of the features every week or month instead of every day might be enough for your needs.
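A minimal sketch of a periodic snapshot into a history table, with hypothetical table names; if geodatabase archiving or DBMS temporal tables are available to you, they are usually a better fit, and the snapshot job below would be scheduled weekly or monthly by your own scheduler:

```python
# A minimal sketch: copy the current state of a table into a history table
# with a timestamp. Table and column names are hypothetical.
import pyodbc

DB_CONN = "DRIVER={ODBC Driver 17 for SQL Server};SERVER=gisdb;DATABASE=gis;Trusted_Connection=yes"

snapshot_sql = """
INSERT INTO dbo.PARCELS_HISTORY (OBJECTID, SHAPE, OWNER, SNAPSHOT_AT)
SELECT OBJECTID, SHAPE, OWNER, SYSUTCDATETIME()
FROM dbo.PARCELS
"""

with pyodbc.connect(DB_CONN, autocommit=True) as conn:
    conn.cursor().execute(snapshot_sql)
```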
