How to be efficient as a GIS professional (part 3)

6. Automate, automate, automate

Whatever you are doing, take a second to think about whether you will need to run the sequence of steps you’ve just completed again. At first it may seem very unlikely that you will run the same sequence of steps again, but you may well find yourself performing them over and over later on.

Automating is not only about saving time. It is also about quality assurance. When you do something manually, there is always a chance of forgetting a certain step or detail, which can lead to an error. When the workflow is automated, you can always see what steps are being performed. An automated workflow is already a piece of documentation that you can share with others or use yourself as a reference.

Don’t trust your memory: you think you know what columns you’ve added to the table and why. Come back in two weeks and you will be surprised by how little of that you remember. If you leave the job and hand the work over to a new person, they will be happy to inherit well-maintained documentation and a clear description of the workflow they will be responsible for.

When it comes to desktop GIS automation, consider using Python for geospatial operations (for example, truncating tables, appending new data and performing data checks). For database automation, use SQL (for example, adding new columns or altering column data types). Feel free to build SQL scripts with commands for adding, deleting and calculating columns and copying data, too. By preserving those scripts, you will always be able to re-run them on another table or in another database, or modify them to match your needs. They also give you a record of the changes performed in your database. The alternative is adding a field manually and then writing down that you added a field of type X to table Y at time Z; building a SQL script is so much easier.
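As a minimal sketch of the "preserve your SQL scripts" idea, the Python snippet below keeps a schema-change script as text and re-runs it against any table. It uses the standard library sqlite3 module purely as a stand-in for your production DBMS; the table and column names (roads, road_class, checked_on) are hypothetical, and the exact SQL dialect for SQL Server, Oracle or PostgreSQL will differ slightly.

```python
import sqlite3

# Hypothetical schema-change script kept under version control.
# {table} is filled in with the target table name before running.
SCHEMA_CHANGES = """
ALTER TABLE {table} ADD COLUMN road_class TEXT;
ALTER TABLE {table} ADD COLUMN checked_on TEXT;
UPDATE {table} SET road_class = 'unknown' WHERE road_class IS NULL;
"""


def apply_changes(conn, table):
    """Run the preserved SQL script against the given table.

    Table names cannot be bound as SQL parameters, so they are formatted
    into the text; only use trusted names here.
    """
    # executescript runs several semicolon-separated statements in one call
    conn.executescript(SCHEMA_CHANGES.format(table=table))


def column_names(conn, table):
    """Return the column names of a table (a simple post-change data check)."""
    # PRAGMA table_info rows are (cid, name, type, notnull, default, pk)
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE roads (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO roads (name) VALUES ('Main St')")
    apply_changes(conn, "roads")
    print(column_names(conn, "roads"))
```

The same script file can then be pointed at another table or another database simply by changing the arguments, which is exactly what makes it reusable documentation.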

7. SQL, SQL, SQL

Another advantage of SQL for data processing is that it is largely vendor neutral: scripts can be executed as is, or with minor adjustments, on most DBMS platforms. This applies to the SQL spatial functions, which provide ISO- and OGC-compliant access to the geodatabase and database, too. Being able to execute SQL queries and perform data management operations is a real advantage when you work in a large IT environment. You won’t always have a network connection to the production environment for data updates, and using ArcGIS might not be possible. Running a Python script requires a Python installation on some machine and, if you use arcpy, ArcGIS Desktop as well. Running SQL code, which has no such dependencies, might be your only alternative.

Many folks don’t know that you can use pure SQL with an enterprise geodatabase stored in any supported DBMS. This is just a short list of what you can do with SQL:

8. Python, Python, Python

I have blogged about using the spatial functions of SQL Server earlier. Remember that you can also execute SQL from Python code by using the arcpy.ArcSDESQLExecute class. Here is the SQL reference for query expressions used in ArcGIS, some of which you can use in the where clauses of arcpy.da cursors. Learn some of the useful Python libraries that could save you time. Look at:

  • Selenium for automating data downloads (for example, from an FTP site) if you do this often and have to browse through a set of pages;
  • the scipy.spatial module for spatial analysis, such as building Voronoi diagrams, computing distances between arrays of points, constructing convex hulls in N dimensions and doing many other things;
  • NumPy, a fundamental package for scientific computing with Python, for handling huge GIS datasets (both vectors and rasters) with arcpy.
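To give a feel for what scipy.spatial computes, here is a plain-Python sketch of a 2D convex hull using Andrew's monotone chain algorithm. Note that this is not the SciPy implementation (scipy.spatial.ConvexHull wraps the Qhull library and works in N dimensions); it is a swapped-in, dependency-free illustration of the idea for small 2D point sets.

```python
def cross(o, a, b):
    """Z component of the cross product OA x OB; positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return 2D hull vertices in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    # Build the lower chain from left to right, dropping non-left turns
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    # Build the upper chain from right to left
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one
    return lower[:-1] + upper[:-1]
```

For example, convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]) drops the interior point (1, 1) and returns the four corners of the square.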

Read more in “What are the Python tools/modules/add-ins crucial in GIS?” and watch the Esri video “Python: Useful Libraries for the GIS Professional”.

Take the chance to learn more about SQL and Python and how you could take advantage of them in your work!

Build ArcGIS network dataset from OpenStreetMap

I have blogged previously on how you can get street data for use in ArcGIS Network Analyst. If you have obtained TomTom or Nokia (Navstreets) data, you can easily build a network dataset (hereafter ND) by using the Esri SDP toolbox, which I have also blogged about.

If you don’t have any other source for the data, consider using OpenStreetMap (OSM) data if it is applicable to your business case. I have blogged earlier on how to get OSM data into an ArcGIS network dataset, but that approach is outdated and I now recommend another way to build the network. The overall workflow is fairly straightforward:

Download OSM data

Go to the Export tab on the OSM home page and choose the area to download. You can either draw a rectangle or specify the bounding box coordinates. If the area you choose is too large, you will have to use one of the sources listed in the left panel for bulk data downloads. Clicking the Overpass API link will download the map file with no extension. Rename it by adding the .osm extension.
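If you download the same area repeatedly, the export can be scripted. Below is a minimal sketch assuming the OSM API v0.6 map call (https://api.openstreetmap.org/api/0.6/map?bbox=left,bottom,right,top), which is meant for small areas only; the API limits the area and number of nodes, so larger extracts still need a bulk source or the Overpass API. The bounding box and output file name in the usage comment are hypothetical.

```python
import urllib.request

OSM_API_MAP = "https://api.openstreetmap.org/api/0.6/map"


def osm_map_url(left, bottom, right, top):
    """Build an OSM API v0.6 'map' request URL for a bounding box in WGS84 degrees."""
    if not (left < right and bottom < top):
        raise ValueError("bounding box must satisfy left < right and bottom < top")
    return f"{OSM_API_MAP}?bbox={left},{bottom},{right},{top}"


def download_osm(url, out_path):
    """Save the response to a file, making sure it has the .osm extension."""
    if not out_path.lower().endswith(".osm"):
        out_path += ".osm"
    urllib.request.urlretrieve(url, out_path)
    return out_path


# Usage (performs a network request; hypothetical bounding box):
# url = osm_map_url(13.38, 52.51, 13.40, 52.52)
# download_osm(url, "map")
```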

Install ArcGIS Editor for OSM

Next, download ArcGIS Editor for OSM (the 10.0, 10.1 or 10.2.x Desktop version). The installer sets up the required libraries as well as a geoprocessing (hereafter GP) toolbox whose tools you will use later on. Read through the documentation on how to build an ND from OSM data on the ArcGIS OSM Editor home page. After installing, you should find the OpenStreetMap Toolbox in ArcToolbox in ArcGIS.

Load OSM file into geodatabase

Start by running the Load OSM File GP tool. Please enable the Conserve Memory option if you have a large OSM file (larger than the amount of RAM), because all nodes are fetched during this process; if you don’t, the process might crash. I had a hard time processing some large files on an 8 GB virtual machine, partly because of the 2 GB memory limit for 32-bit processes on Windows. Running the processing from 64-bit Python might help, but I have not tested this yet. I remember that some network data processing algorithms I developed failed when building an adjacency matrix for a network with 15 million edges under 32-bit Python, but completed with no problems under 64-bit Python, taking almost 10 GB of RAM on my machine.
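Before a long run, it can help to check whether your Python interpreter is 32-bit or 64-bit and whether the OSM file is larger than the machine's RAM. The sketch below uses only the standard library; the RAM figure is passed in by hand because the standard library has no portable call for it, and the rule of thumb (enable Conserve Memory when the file is larger than RAM) is the one from the paragraph above. The path in the usage comment is hypothetical.

```python
import os
import struct


def python_bitness():
    """Return 32 or 64, based on the pointer size of the running interpreter."""
    return struct.calcsize("P") * 8


def should_conserve_memory(osm_path, ram_bytes):
    """True when the OSM file is larger than the given amount of RAM."""
    return os.path.getsize(osm_path) > ram_bytes


if __name__ == "__main__":
    print(f"Running {python_bitness()}-bit Python")
    # Hypothetical file on an 8 GB machine:
    # print(should_conserve_memory(r"C:\data\map.osm", 8 * 1024**3))
```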

Build a network dataset

Once the data is loaded into a feature dataset, you are ready to build a network dataset. You will need the Create OSM Network Dataset GP tool for that. You will also need to provide a Network Configuration File, which you can find in the C:\Program Files (x86)\ArcGIS\Desktop10.1\ArcToolbox\Toolboxes\ND_ConfigFiles folder. This is an XML file that provides parameters for translating your road type data into edge cost evaluators. DriveGeneric.xml is for a generic motorcar routing network, and there is another one that can be used for cycling networks. There is one more file there, DriveMeters.xml. This configuration offers faster runtime performance (fewer script evaluators), but will only work with coordinate systems that have a linear unit of meters. Let the tool run; it might take a lot of time if you have a large dataset. After the ND is built, feel free to modify its properties and test how it works.

I suggest starting by downloading a small area to verify that the tools work as expected. The map.osm file you download should not be larger than 20 MB. After you have verified the workflow, feel free to try larger datasets. There are some other useful tools in the ArcGIS OSM Editor toolbox that you might want to explore, for example tools for designing maps based on OSM data and for loading data into a PostgreSQL database.

Building custom UI tools for ArcGIS with Python

I often see people looking for a way to extend ArcGIS software: some need an extra tool that is missing in the core product; for others it is about integration with an existing system or application. Many users want custom dialogs and UI elements embedded in the geoprocessing tool dialog window. In this post, I have tried to summarize the options you have for customizing ArcGIS, including developing new features on top of the core product.

ArcGIS-based solutions (script tools + Python add-ins)

If you develop a geoprocessing tool as a Python script, you can make a custom script tool that has the same GUI as any core geoprocessing tool: panels and boxes with Browse buttons, drop-down lists, check boxes, multi-value tables and many other controls. Read through all the available parameter types (you can let users click on the map, draw features and use those features in the analysis, among many other advanced features). I suspect quite a few of you were not aware of this rich functionality.

You can embed your script either as a script tool in a custom geoprocessing toolbox or as a Python toolbox. There are two great posts to review, Comparing custom and Python toolboxes and Why learn/use Python Toolboxes over Python Script Tools?, to learn more about when to use which. If you are just starting with ArcGIS, consider trying script tools before playing with Python toolboxes; setting up a script tool without a Python toolbox might be much easier for a beginner.

As a last resort, if you want your end users to get a custom dialog box when they run your tools, plus some additional parameter handling, consider embedding your Python script tool into a custom C++/.NET tool. This might provide some additional GUI features, but you will still be limited to the scope of the GP tool GUI. I would argue that it is not a good idea to invest in developing with ArcObjects, since this technology has a very steep learning curve and will eventually become obsolete as ArcGIS Pro and its .NET SDK gain popularity. Moving ArcObjects code into ArcGIS Pro is not supported, so in my opinion it is better to stay with Python unless you really have to develop something special on top of ArcGIS right now.

Keep in mind that there are also Python add-ins, which provide additional functionality with windows, messages and dialogs. They are easy to build and distribute, and if you are familiar with Python and arcpy, you can start developing them in no time at all.

Desktop app / embed external GUI into a toolbox tool

If you want to develop a stand-alone application (such as an .exe file for Windows), you would need to convert your Python script into an .exe file with a utility such as py2exe. For this executable to run, ArcGIS must be installed on the machine, because it needs the arcpy site package that is installed with ArcGIS.

As for a custom GUI, there are various Python libraries, such as Tkinter (which ships with the core Python installation), PyQt/PySide (free Qt bindings), wxPython (bindings for the wxWidgets C++ library) and Kivy (a great cross-platform library with a rich UI). I have tried them all and liked PyQt most. Here are a couple of GIS.SE resources to learn from:

Because of the ArcMap architecture, you might have trouble running a custom GUI in the same process as ArcMap. I’ve seen some examples with Tkinter, but in general there are many issues to tackle.

From my experience, it is probably better either to stay with the core GUI, which most of the time provides everything you’d ever need, or to develop a custom application that imports arcpy (with some extra tuning in configuration) and uses a custom GUI (such as one developed with PyQt) without starting any ArcGIS application at all. There is an ArcGIS Idea, Form Builder for Python Tools, but it is hard to say whether it will be implemented any time soon, so you had better look for alternatives.

I’ve done some tests in ArcGIS Pro 1.0, embedding a custom Python script that invokes a PyQt4 GUI into a toolbox, and there were no problems setting it up and running it. If using ArcGIS Pro is an option for you, consider it: it is much easier to embed custom Python tools with their own GUI into Pro than into ArcMap. One of the gotchas is that Pro uses 64-bit Python 3.4, which has certain implications for compatibility with PyQt or any other GUI framework of your choice.

SQL Server spatial functions for GIS users

If you have been using SQL Server for some time, you’ve probably heard of its spatial data support. This might be particularly interesting for anyone who uses a desktop GIS for data management and analysis. If you are an ArcGIS user with enterprise geodatabases stored in SQL Server databases, you might have wondered whether it is possible to interact with the spatial data directly with SQL. This is useful when you can’t use ArcMap to access the database due to some restrictions (permissions, network connections or software compatibility).

Well, you can actually do a whole lot with your geographic data using just SQL. It is important that the Shape field is defined as the Geometry or Geography data type. For most GIS work, you’d probably choose the Geometry type, which represents data in a Euclidean (flat) coordinate system. As soon as you have a geodatabase feature class whose Shape field is of the Geometry type, you can use native SQL Server tools to interact with both the feature class attributes and the geometry.

Beginning with ArcGIS 10.1, feature classes created in geodatabases in SQL Server use the Microsoft Geometry type by default. To move your existing feature classes to the Geometry storage type, use the Migrate Storage geoprocessing tool or a Python script.

Alright, so after you have copied your file geodatabase feature class into a SQL Server geodatabase, you are ready to use native SQL to interact with the spatial data.

Let’s select all the features from the Parcels feature class.

SELECT * FROM dbo.PARCELS

Because we have a SHAPE column of the Geometry type, we get another tab in the results grid, Spatial results. There you can see your geometries visualized.

Let’s see what coordinate system our feature class was defined in.

DECLARE @srid INT = (SELECT TOP 1 shape.STSrid FROM dbo.PARCELS)
SELECT @srid AS SRID,
srtext AS Name FROM sde.SDE_spatial_references WHERE auth_srid = @srid

Here we use <GeometryColumnName>.STSrid to get the spatial reference ID (SRID) of the coordinate system of the first feature. Because our data is stored in a projected coordinate system (and as the Geometry type), we cannot get its name from the core SQL Server spatial references table, sys.spatial_reference_systems.

Here is why:

The coordinate systems in this table are for the geography type only as it contains information about the ellipsoid that is required to perform calculations. No such information is required to perform calculations for projected coordinate systems on the plane used by the geometry type, so you are free to use any reference system you like. For the calculations done by the geometry type, it is the same no matter what you use.

Let us explore next what kind of geometry is stored within a table. It is possible to store different types of geometry (such as polygon and polyline) within one table in SQL Server.

Let us see if it is true:

SELECT Id,GeomData AS Geometry,GeomData.STAsText() AS GeometryData
FROM [testgdb].[dbo].[GeneralizedData]

Yes, indeed: one table stores features of different geometry types, and SQL Server has no problems with that. To visualize this table in ArcMap, though, you would need to use a query layer, which is basically a layer defined by a SQL query. ArcGIS can only handle one type of geometry per feature class, which is why you will be asked to pick the type of geometry you want to look at.

After adding this layer to ArcMap, you will be able to see the polygons (provided you’ve chosen polygons). The query layer is read-only, so you cannot edit its features in ArcMap. If you have a SQL Server table (not registered with the geodatabase) with multiple types of geometry stored, you can easily switch between them by adding multiple query layers to ArcMap, each defining the kind of geometry you want to work with.

Let us keep working with a geodatabase feature class that has only polygons. Let’s verify that:

SELECT Shape.STGeometryType() AS GeometryType FROM dbo.PARCELS

Alright, so we know already what kind of coordinate system the data is stored in and we know that there are polygons. Let us get the perimeter (length) and the area of those polygons.

SELECT PARCEL_ID, SHAPE.STArea() AS Area,
SHAPE.STLength() AS Perimeter
FROM dbo.PARCELS 
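For a Geometry (planar) instance, the area and perimeter come down to the shoelace formula and the sum of the segment lengths. A minimal Python sketch of that arithmetic, assuming a single closed exterior ring with no holes and coordinates in a projected system; it illustrates the math and is not how SQL Server implements STArea() or STLength().

```python
import math


def ring_area(coords):
    """Planar area of a closed ring (first point == last point) via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        total += x1 * y2 - x2 * y1
    # The sign depends on the ring orientation, so take the absolute value
    return abs(total) / 2.0


def ring_length(coords):
    """Perimeter of a closed ring: the sum of its segment lengths."""
    return sum(math.hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(coords, coords[1:]))
```

For the 3-4-5 right triangle [(0, 0), (4, 0), (0, 3), (0, 0)], this gives an area of 6.0 and a perimeter of 12.0.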

We can also get the coordinates of each polygon within our feature class. Note that the start point and the end point are identical: that is because each polygon ring is closed:

SELECT Shape.STAsText() AS GeometryWKT FROM dbo.PARCELS

This is what we will get:
POLYGON ((507348.9687482774 687848.062502546, 507445.156252367 687886.06251058145, 507444.18750036607 687888.56250258372, 507348.9687482774 687848.062502546))
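To run the same checks outside SQL Server, the WKT string can be parsed in Python. The sketch below handles only a simple single-ring POLYGON like the one above (no holes, no MULTIPOLYGON); for real work, a full WKT parser such as the one in Shapely would be a better choice.

```python
import re


def parse_simple_polygon(wkt):
    """Parse 'POLYGON ((x y, x y, ...))' with a single ring into (x, y) tuples."""
    match = re.fullmatch(r"\s*POLYGON\s*\(\(\s*(.+?)\s*\)\)\s*", wkt, re.IGNORECASE)
    if match is None or "(" in match.group(1):
        # Inner parentheses mean interior rings (holes), which are not supported here
        raise ValueError("only single-ring POLYGON WKT is supported")
    coords = []
    for pair in match.group(1).split(","):
        x, y = pair.split()
        coords.append((float(x), float(y)))
    return coords


def is_closed(coords):
    """A valid polygon ring has at least 4 points and starts and ends at the same point."""
    return len(coords) >= 4 and coords[0] == coords[-1]
```

Here len(coords) gives the vertex count of the ring, including the repeated closing point.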

There are similar functions such as .STAsBinary() which returns the Open Geospatial Consortium (OGC) Well-Known Binary (WKB) representation of a geometry instance and .AsGml() which returns the Geography Markup Language (GML) representation of a geometry instance.
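WKB is a compact binary layout defined in the OGC Simple Features specification. As an illustration, a 2D point is one byte-order flag (1 = little-endian, 0 = big-endian), a 4-byte geometry type code (1 = Point) and two 8-byte doubles. A minimal sketch using Python's struct module; it covers 2D points only.

```python
import struct

WKB_POINT = 1  # OGC geometry type code for Point


def point_to_wkb(x, y):
    """Encode a 2D point as little-endian WKB (21 bytes)."""
    return struct.pack("<BIdd", 1, WKB_POINT, x, y)


def wkb_to_point(data):
    """Decode a 2D WKB point written in either byte order."""
    order = "<" if data[0] == 1 else ">"
    (geom_type,) = struct.unpack(order + "I", data[1:5])
    if geom_type != WKB_POINT:
        raise ValueError(f"expected Point (1), got type {geom_type}")
    return struct.unpack(order + "dd", data[5:21])
```

For example, point_to_wkb(1.0, 2.0).hex() gives "0101000000000000000000f03f0000000000000040", the familiar hex form of POINT (1 2).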

We can also check the number of vertices per polygon:

SELECT PARCEL_id,
Shape.STAsText() AS GeometryDesc,
Shape.STNumPoints() AS NumVertices
FROM dbo.PARCELS
ORDER BY NumVertices DESC

Alright, that is probably enough querying. Let us check what kind of GIS analysis is available to us with native SQL. The easiest way to get started is probably to process the features of a feature class and then write the resulting geometries into a new table.

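-- Drop the table only if it already exists (a bare DROP TABLE fails on the first run)
IF OBJECT_ID('dbo.ParcelEnvelope', 'U') IS NOT NULL
    DROP TABLE dbo.ParcelEnvelope;
-- STArea() returns a float, so store the area as float rather than int
CREATE TABLE [dbo].[ParcelEnvelope]([Id] [int] NOT NULL,
[PolyArea] float,[GeomData] [geometry] NOT NULL) ON [PRIMARY]

INSERT INTO ParcelEnvelope (Id,GeomData,PolyArea)
SELECT PARCEL_ID AS Id,
SHAPE.STEnvelope() AS GeomData,
SHAPE.STArea() AS PolyArea
FROM dbo.PARCELS
ORDER BY OBJECTID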

This creates a new table and writes the envelope of each parcel polygon into it.
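STEnvelope() returns the minimum axis-aligned bounding rectangle of a geometry. The same idea in plain Python, for a list of (x, y) coordinates, returning the rectangle as a closed ring; the vertex order chosen here is an illustration and not necessarily the order SQL Server uses.

```python
def envelope(coords):
    """Return the axis-aligned bounding box of the coordinates as a closed ring."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    xmin, xmax, ymin, ymax = min(xs), max(xs), min(ys), max(ys)
    # Counter-clockwise from the lower-left corner, closed by repeating the first point
    return [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax), (xmin, ymin)]
```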

Let us create some buffers on a road centerline geodatabase feature class:

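-- Drop the table only if it already exists (a bare DROP TABLE fails on the first run)
IF OBJECT_ID('dbo.RoadBuffer', 'U') IS NOT NULL
    DROP TABLE dbo.RoadBuffer;
CREATE TABLE [dbo].[RoadBuffer]([Id] [int] NOT NULL,
[GeomData] [geometry] NOT NULL) ON [PRIMARY]

-- Buffer each centerline by 50 units of the data's coordinate system
INSERT INTO [RoadBuffer] (Id,GeomData)
SELECT OBJECTID AS Id,
SHAPE.STBuffer(50)
FROM dbo.Road_cl
ORDER BY OBJECTID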

You can of course write newly generated features into a geodatabase feature class, not just a SQL Server database table. You need to create a new polygon feature class and then run the SQL below. This will create buffer zones for every line found in the Road_cl feature class.

DELETE FROM FC_ROADBUFFERS
INSERT INTO FC_ROADBUFFERS(OBJECTID,SHAPE)
SELECT OBJECTID AS OBJECTID,
SHAPE.STBuffer(50) AS SHAPE
FROM dbo.Road_cl
ORDER BY OBJECTID

Please refer to the Microsoft Geometry Data Type Method Reference to get a full list of available functions and more detailed description.

Try doing some other analysis, such as finding which features intersect or overlap, or how many points are located within a certain polygon. There is so much you can do! To learn more, get the book Beginning Spatial with SQL Server 2008, which has tons of examples and will also help you understand the basics of spatial data structures. I have read this book and really liked it. I think it is a must-read for anyone using spatial SQL.
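For the “how many points fall within a polygon” question, in SQL Server you would typically join the tables on STContains() or STIntersects(). The plain-Python sketch below shows the ray-casting (even-odd) test behind this kind of check. It is a simplified stand-in: it ignores holes and does not handle points lying exactly on the boundary consistently.

```python
def point_in_ring(x, y, ring):
    """Even-odd ray casting: toggle on each edge crossed by a ray cast right from (x, y)."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_points_in_ring(points, ring):
    """Count how many (x, y) points fall inside a closed ring."""
    return sum(1 for x, y in points if point_in_ring(x, y, ring))
```

With the square ring [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)] and the points [(5, 5), (1, 1), (15, 5), (-1, 3)], two points are counted as inside.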

I hope this short introduction to what you, as a GIS user, can do with SQL Server will help you take advantage of native SQL functions wherever using a desktop GIS is not an option.

Automate web testing with Selenium

For the past few months, I have been using Selenium WebDriver a lot for automating web testing procedures. If you haven’t heard of this software yet, I highly recommend it to anyone in the software testing industry and to anyone who often performs repetitive tasks in a web browser.

The basic idea of this software is that it can execute commands to manipulate the web browser pretty much like a user would, and it can also execute JavaScript commands to simulate certain events. You can start a web browser, navigate to a predefined URL, fill in some forms, press a button and evaluate the contents of the web page shown. You are free to write the sequence of operations in any supported language, such as Java, Python, Ruby and some others.

This means that you will be able to automate a lot of the work you used to do manually. Think of a web application that you work on. You probably have certain tests you perform for every build or release, and you find yourself doing the same thing again and again. When the number of things to check gets large, it is also easy to miss a certain workflow, which could have negative consequences in QA/QC terms. A test script also serves as documentation for your colleagues, and it is easy to extend as you accumulate regression test cases. The script can be executed on a schedule without you doing anything at all, which might be a huge time saver.

Sounds interesting? Start at the Selenium home page. Navigate to the Download page and download the Selenium Client & WebDriver Language Binding of your choice (I am using the Python binding). After downloading and installing the module, fire up your IDE (I use Wing IDE and like it very much) and let’s build a simple script.

Navigate to this example. Remember to choose the language you want to see examples in, using the Programming Language Preference section that follows you on the screen; you will then see the code in one language only. The official Selenium documentation is great, and the API itself is very intuitive and easy to get started with. After finishing the tutorials and playing a bit, consider writing some small unit tests for the web application you work on. If you use Python, the unittest module is probably the best place to start: you can wrap your Selenium script into individual test cases and run them from the command line or the IDE of your choice.
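Here is a minimal sketch of wrapping a Selenium check in unittest, as described above. It assumes the Selenium Python bindings and a Firefox driver are installed; the URL and expected title are hypothetical placeholders for your own application, and the test is skipped when Selenium is not available.

```python
import unittest

try:
    from selenium import webdriver
except ImportError:  # Selenium bindings are not installed
    webdriver = None

APP_URL = "http://localhost:8080/"  # hypothetical address of the app under test
EXPECTED_TITLE = "My Web Map"       # hypothetical page title


@unittest.skipIf(webdriver is None, "Selenium is not installed")
class SmokeTest(unittest.TestCase):
    def setUp(self):
        # A fresh browser per test keeps the test cases independent
        self.driver = webdriver.Firefox()
        self.driver.set_window_size(800, 600)

    def tearDown(self):
        self.driver.quit()

    def test_title(self):
        self.driver.get(APP_URL)
        self.assertIn(EXPECTED_TITLE, self.driver.title)
        # Keep a screenshot for visual review later
        self.driver.save_screenshot("home_page.png")


if __name__ == "__main__":
    unittest.main()
```

Each new regression check becomes another test_ method, so the suite grows together with the application.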

If you have access to Pluralsight, there is a great course by John Sonmez called Automated Web Testing with Selenium. It helped me to get on track in no time at all and I highly recommend this one.

If you want to perform tests that involve interacting with web map components and GIS functionality, you might find it hard to find useful information on best practices or tips on where to start.

I was interested in testing web apps built with the ArcGIS API for JavaScript by using Selenium, too. From what I’ve found, I could perform certain light interactions with pure Selenium (testing basemap switching and zooming in/out by invoking keystrokes and mouse events), but it gets harder to interact deeply with the Esri JS API. I’ve searched for information on GIS.SE, where you can find more on what kind of JS libraries can be used for web testing.

Here are some useful features of Selenium you should be aware of:

  • You can take screenshots of the web browser while the program is executed. This means you can open the saved images later and review the web application visually where some parts are hard to test automatically (such as the layout design). Having an image of how the web application looked earlier, you could use an image comparison algorithm (for example, with PIL in Python) to find what has changed in the web application.
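  • You can execute JavaScript code in the browser. This means you can interact with the web map component and get its properties, such as the map extent (assuming the page exposes the map as a global map variable; the script must return the value for Selenium to receive it):
extent = driver.execute_script("return map.extent.toJson();")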
  • You can resize the browser window so that it does not obstruct other open windows:
driver.set_window_size(500, 500)

How to be efficient as a GIS professional (part 2)

This is the second post on how to stay efficient as a GIS professional. Please see my first post on that here.

3. Learn how to search the Internet.

Well, you probably think of Google. But how many people do you know who use Google’s search syntax? It can be of tremendous help to be able to search within a certain website only.
I used to search the Esri Forums a lot; the syntax for this is “site:”, then the site name, then the search string, for example, “site:forums.arcgis.com import arcpy”. This is very useful because the built-in search on the Esri Forums is not as good as Google’s. To search Esri GeoNet, use “site:geonet.esri.com import arcpy”. “site:gis.stackexchange.com” is another helpful place to search.
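If you run the same kind of site-restricted search often, the query URL can be built programmatically. A small sketch with the standard library; it only builds a URL of the form https://www.google.com/search?q=... and does not call any Google API.

```python
from urllib.parse import urlencode


def site_search_url(site, query):
    """Build a Google search URL restricted to one site with the site: operator."""
    # urlencode percent-encodes the colon and turns spaces into '+'
    return "https://www.google.com/search?" + urlencode({"q": f"site:{site} {query}"})
```

For example, site_search_url("gis.stackexchange.com", "arcpy cursor") gives a link you can bookmark or open from a script.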

If you are a veteran, then you’d probably enjoy searching the Archived Esri forums, too: “site:forums.esri.com import arcpy”. By using advanced search techniques you’ll be able to find the answers to your questions much faster.

4. Get yourself a decent suite of useful utilities.

You have gone through the list “What free programs should every GIS user have installed?”, haven’t you? There are tons of useful programs that could save you hours at work.

I also recommend installing a proper file manager if you find yourself managing files often. Total Commander is used by many people I know, but my favorite is Far Manager. That is because I grew up using Norton and Volkov Commander, so I am a bit biased. I love using the keyboard for managing files, and Far Manager gives me this kind of control. Navigating around without using the mouse, editing files and running DOS commands without having to start a CMD window is just so great.

5. Learn programming.

Being able to operate software or an operating system programmatically can be a huge time saver. It helps make sure that the operation can be re-run and that you have full control over its execution. You don’t have to be a computer scientist, but being able to script a few map export workflows or list all your shapefiles in a folder can be very helpful. Most modern GIS products provide a scripting language, most often Python, for automation and customization.
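As a small example of the kind of scripting meant here, the snippet below lists all shapefiles in a folder using only the standard library. The folder name in the usage comment is hypothetical. Note that glob matching is case-sensitive on Linux, so files named *.SHP would be missed there.

```python
from pathlib import Path


def list_shapefiles(folder, recursive=False):
    """Return sorted paths of .shp files in a folder (optionally including subfolders)."""
    pattern = "**/*.shp" if recursive else "*.shp"
    return sorted(Path(folder).glob(pattern))


# Usage (hypothetical folder):
# for shp in list_shapefiles(r"C:\GIS\data", recursive=True):
#     print(shp)
```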

I cannot stress enough how important it is to be comfortable using one of the popular programming languages. My favorite is Python; I use it for lots of things: automating desktop GIS workflows, building add-ins for desktop GIS, managing files, administering GIS servers and authoring geoprocessing web services.

If you want to learn more about Python in GIS, consider looking at this GIS.SE post: Resources for learning Python programming with generic GIS goals in mind? And if you are an ArcGIS user: What are some resources for learning ArcPy?

More tips will be published soon!

How to be efficient as a GIS professional (part 1)

There has always been wide interest in how to stay effective regardless of what kind of work you do. There are tons of blogs on personal effectiveness, lots of books on getting things done, and lists of useful pieces of software that can be awesome time-savers. My idea was to summarize here what I’ve learned so far about staying efficient as a GIS professional, to make the topic more relevant for everyone involved in the GIS industry. There will be multiple posts on that.

1. Get to know the GIS products you work with.
No matter what GIS software you use, you will spend most of your day operating an application or two, producing maps, converting datasets or solving geographical problems. Therefore, it is crucially important to be familiar with the software. One of the first things I do when learning any new application is going through the top menu and checking what options are available to me. I hate spending most of my time in an application without understanding the purpose of the settings and options available in its menus.

Maybe you will find a cool tool that would save you some time. Or there could be customization options, so you could enjoy interacting with the application much more by changing the background color or the toolbar layout.

2. Always think how you can do things faster.
It is always tempting just to get things done and forget about them. This urge is hard to beat. Maybe you need to run a data processing tool by going through a couple of menus. If you do it several times per day, this may add up to half a minute per day, which comes to 15 minutes a month. Add the other things you do that could be done faster, and you’ll get a whole hour a month. You could spend it on something valuable rather than clicking through menus, right?

Let’s start with operating system shortcuts. Most of you are on Windows, so learn its shortcuts well. Sometimes you just won’t have a mouse and will need to start Windows Explorer with just the keyboard. Pressing keys is way faster than moving the mouse cursor around.

Then learn the GIS software shortcuts. If you edit geographic data a lot, learn the Keyboard shortcuts that can be used while editing (ArcGIS); learn shortcuts for getting around the application, too, by Assigning shortcut keys. You will be amazed how much faster you can operate the application. Search the ArcGIS documentation for the shortcuts that are already built in.

3. Master touch typing.
You should be able to type fast without looking at your keyboard (touch typing). If you are brave, why not get a Dvorak keyboard and learn to type on it? Some research suggests that the Dvorak layout is more ergonomic and nicer to work with. I’ve got mine from TypeMatrix and quite like it, even though it was painful to accept typing really slowly at first. It usually takes around a month to switch to Dvorak completely.

If you run many Windows programs, why not assign keyboard shortcuts to them? Windows can create shortcuts for nearly everything, from an installed program to a Control Panel item.

All these things take time to set up initially, but it is worth it in the long run. Just think of this as an investment that will pay off rather soon.

More tips will be published soon!