Open source community in 2018

I was recently searching for a virtual machine with open source GIS software pre-installed, knowing that one has been available for many years; I blogged about it 8 years ago. It was funny to find and read my 8-year-old blog post, which I started by saying that “I am not a big fan of Linux and open source software.” How much has changed since then!

True, back then the open source GIS community was not what it is today: QGIS has really grown into a fully-fledged desktop GIS, quite a few Python geospatial packages have been written, and it has become a whole lot easier to start using open-source software. Today, I love open-source. Just like proprietary software, it has its own pros and cons, though. But to illustrate the beauty of open-source, I’d like to share a couple of personal stories.

One evening, I was playing with QGIS and noticed an annoying bug: pasting data from the clipboard into the Python console moved the text cursor to the end of the row. What is funny about this bug is that the same behavior can be seen in the ArcMap Python console. I decided to report the QGIS bug on their issues web page. You have no idea how surprised I was when, a couple of hours later, I received a notification that the issue had been fixed in the QGIS source code and that the next release would not have it. Some time later, I reported a typo in the installation dialog text, and it was fixed within an hour of being reported. OK, I do understand that those were not very complicated issues; however, I found it astonishing to be able to get fixes in a fairly large and complex desktop application that quickly. This is how the open-source community operates: the next time I upgraded QGIS, those bugs were no longer there.

Another night, I was playing with mypy, generating Python interface files. I found a bug, which I reported on the mypy GitHub page. Later the same day, Guido van Rossum himself confirmed the bug and suggested a fix. I forked the mypy repo and fixed the issue; Guido reviewed the change and suggested refactoring; I refactored the code; Guido reviewed it again and merged my pull request. It took just a few hours to fix an issue in a package used daily by thousands of users. In addition, having this personal interaction with the author of Python and having him approve the code you write is very inspiring. This is what I love about the Python community. This is what I love about open-source.

If you have not done so yet, I encourage you to find a product, a project, or a program that is open-sourced and start contributing. You have no idea how much you will learn by reading code written by other people and how fast you will grow as a developer by working in a virtual team with your peers. If you are not a programmer, you can always work on finding and reporting issues, improving the docs, or writing a tutorial. Answering or improving questions on the GIS StackExchange website is another great way to contribute to the public knowledge base available to all GIS professionals.

I have myself authored a few programs with open source code published on GitHub. It is hard to describe what a joy it is to hear from the users of those fairly simple programs that they found my programs to be helpful in their work. Yes, you may be writing software as a part of your job you get paid for and this software is then used by your happy customers, but having a complete stranger praising the program you have written and shared is a whole different story. Give it a try!

Getting geodatabase features with arcpy and heapq Python module

If you have ever needed to merge multiple spatial datasets into a single one using ArcGIS, you have probably used the Merge geoprocessing tool. This tool can take multiple datasets and create a single one by merging all the features together. However, when your datasets are stored on disk as multiple files and you only want to get a subset of features from them, running the Merge tool to get all the features together into a single feature class may not be very smart.

First, merging features will take some time, particularly if your datasets are large and there are a few of them. Second, even after you have merged the features together into a single feature class, you still need to iterate over it to get the features you really need.

Let’s say you have a number of feature classes and each of them stores cities (as points) in a number of states (one feature class per state). Your task is to find the 10 most populated cities across all of the feature classes. You could definitely run the Merge tool and then use arcpy.da.SearchCursor with the sql_clause argument to iterate over sorted cities (the sql_clause argument can have an ORDER BY SQL clause). Alternatively, you could chain multiple cursor objects and then use the sorted built-in function to get only the top 10 items. I have already blogged about using chains to combine multiple arcpy.da.SearchCursor objects in this post.
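The chaining alternative can be sketched roughly as follows. To keep the snippet runnable without ArcGIS, plain lists of made-up (name, population) tuples stand in for arcpy.da.SearchCursor objects, and top_n_sorted is just a name I picked for illustration.

```python
from itertools import chain
from operator import itemgetter

# Made-up rows standing in for SearchCursor rows: (city name, population).
state_a = [("A1", 500), ("A2", 120), ("A3", 900)]
state_b = [("B1", 300), ("B2", 950)]


def top_n_sorted(cursors, n):
    """Chain all row sources and return the n rows with the largest population.

    Note that sorted() builds the complete list of rows in memory first.
    """
    rows = chain.from_iterable(cursors)
    return sorted(rows, key=itemgetter(1), reverse=True)[:n]


top_cities = top_n_sorted([state_a, state_b], 3)
print(top_cities)
```

This works fine for small inputs, but every row from every source ends up in one list before the top items are sliced off.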

However, this can also be done without using the Merge geoprocessing tool or the sorted function (which will construct a list object in memory), solely with the help of arcpy.da.SearchCursor and the built-in Python heapq module. Arguably, the most important advantage of using the heapq module lies in the ability to avoid constructing lists in memory, which can be critical when operating on many large datasets.

The heapq module is present in Python 2.7, which makes it available to ArcGIS Desktop users. However, in Python 3.5, the heapq.merge function got two new optional arguments, key and reverse, which made it very similar to the built-in sorted function. So, ArcGIS Pro users have a certain advantage because they can choose to sort the iterator items in a custom way.

Here is sample code that showcases the efficiency of using heapq.merge over constructing a sorted list in memory. Please mind that the key and reverse arguments are used, so this code can be run only with Python 3.
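A minimal sketch of the heapq.merge approach, under a few assumptions: plain iterators over made-up (name, population) tuples stand in for cursors, and each source is already sorted by population from largest to smallest, because heapq.merge expects pre-sorted inputs. The function name top_n_merged and the field names in the comment are hypothetical.

```python
import heapq
from itertools import islice
from operator import itemgetter

# Made-up rows, each source already sorted by population (largest first).
# With arcpy, every source would be a cursor created along the lines of
# arcpy.da.SearchCursor(fc, ["NAME", "POP"],
#                       sql_clause=(None, "ORDER BY POP DESC"))
# where NAME and POP are hypothetical field names.
state_a = [("A3", 900), ("A1", 500), ("A2", 120)]
state_b = [("B2", 950), ("B1", 300)]


def top_n_merged(sorted_sources, n):
    """Lazily merge pre-sorted sources and take only the first n rows."""
    merged = heapq.merge(*sorted_sources, key=itemgetter(1), reverse=True)
    return list(islice(merged, n))


top_cities = top_n_merged([iter(state_a), iter(state_b)], 3)
print(top_cities)
```

Because the merge is lazy, only as many rows as are needed to produce the first n items are pulled from the sources, and no combined list of all rows is ever built.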

 

Printing pretty tables with Python in ArcGIS

This post would be of interest to ArcGIS users authoring custom Python script tools who need to print out tables in the tool dialog box. You would also benefit from the following information if you need to print out some information in the Python window of ArcMap while doing some ad hoc data exploration.

Fairly often your only way to communicate the results of the tool execution is to print out a table that the user could look at. It is possible to create an Excel file using a Python package such as xlsxwriter or by exporting an existing data structure such as a pandas data frame into an Excel or .csv file which the user could open. Keep in mind that it is possible to start Excel with the file open using the os.system command:

import os
os.system('start excel.exe "{0}"'.format(excel_path))

However, if you only need to print out some simple information into a table format within the dialog box of the running tool, you could construct such a table using built-in Python. This is particularly helpful in those cases where you cannot guarantee that the end user will have the 3rd party Python packages installed or where the output table is really small and it is not supposed to be analyzed or processed further.

However, as soon as you try to build something flexible with varying column widths, or when you don’t know beforehand which columns and what data the table will be printed with, it gets very tedious. You need to manipulate multiple strings and tuples, making sure everything draws properly.

In these cases, it is so much nicer to be able to take advantage of external Python packages where all these concerns have already been taken care of. I have been using tabulate, but there are a few others such as PrettyTable and texttable, both of which will generate a formatted text table using ASCII characters.

To give you a sense of the tabulate package, look at the code necessary to produce a nice table using the ugly formatted strings (the first part) and using the tabulate package (the second part):
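As a rough illustration of the difference, here is a sketch with made-up data. The format_table helper is a hypothetical function written with built-in Python only; the tabulate call is guarded because the package is third-party and may not be installed.

```python
def format_table(headers, rows):
    """Build a plain-text table with built-in Python only."""
    table = [[str(value) for value in row] for row in [headers] + list(rows)]
    # Each column is as wide as its longest cell.
    widths = [max(len(row[i]) for row in table) for i in range(len(headers))]

    def line(cells):
        return " | ".join(c.ljust(w) for c, w in zip(cells, widths)).rstrip()

    separator = "-+-".join("-" * w for w in widths)
    return "\n".join([line(table[0]), separator] + [line(r) for r in table[1:]])


# Made-up data for illustration.
headers = ["Name", "Count"]
rows = [["roads", 1250], ["parcels", 87]]

print(format_table(headers, rows))

try:
    from tabulate import tabulate  # third-party package
    print(tabulate(rows, headers=headers, tablefmt="grid"))
except ImportError:
    print("tabulate is not installed")
```

Inside a script tool you would likely pass the resulting string to arcpy.AddMessage instead of print so that it shows up in the tool dialog box.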

The output of the table produced using the built-in modules only:

builtin

The output of the table produced using the tabulate module:

tabulate

 

 

Warning: new GDB_GEOMATTR_DATA column in ArcGIS geodatabase 10.5

This post would be of interest to ArcGIS users who are upgrading enterprise geodatabases from ArcGIS 10.1-10.4 to 10.5 or later. According to the Esri documentation and resources (link1, link2, link3):

Feature classes created in an ArcGIS 10.5 or 10.5.1 geodatabase using a 10.5 or 10.5.1 client use a new storage model for geometry attributes, which stores them in a new column (GDB_GEOMATTR_DATA). The purpose of this column is to handle complex geometries such as curves. Since a feature class can have only one shape column, the storage of circular geometries must be stored separately and then joined to the feature class when viewed in ArcGIS.

This means that if you create a new feature class in an enterprise geodatabase (either manually or by using a geoprocessing tool), three fields will be created: the OID field (OBJECTID), the geometry field (SHAPE), and this special GDB_GEOMATTR_DATA field. It is very important to be aware of this because you will not be able to see this column when working in ArcGIS Desktop or when using arcpy.

The GDB_GEOMATTR_DATA field is not shown when accessing a feature class using arcpy.

[f.name for f in arcpy.ListFields('samplefc')]
[u'OBJECTID', u'SHAPE', u'SHAPE.STArea()', u'SHAPE.STLength()']

Querying the table using SQL, however, does show the field.

select * from dbo.samplefc
[OBJECTID],[SHAPE],[GDB_GEOMATTR_DATA]
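One way to spot such hidden columns is to compare the two lists. The sketch below simply hard-codes the two outputs shown above; in practice you would fill them from arcpy.ListFields and from a query against your database. The hidden_columns helper is a hypothetical name.

```python
# Field names as reported by arcpy.ListFields (copied from the output above).
arcpy_fields = ["OBJECTID", "SHAPE", "SHAPE.STArea()", "SHAPE.STLength()"]

# Column names as reported by the SQL query (copied from the output above).
sql_columns = ["OBJECTID", "SHAPE", "GDB_GEOMATTR_DATA"]


def hidden_columns(sql_columns, arcpy_fields):
    """Return database columns that arcpy does not expose (case-insensitive)."""
    visible = {name.upper() for name in arcpy_fields}
    return [column for column in sql_columns if column.upper() not in visible]


print(hidden_columns(sql_columns, arcpy_fields))
```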

If you are working with your enterprise geodatabase only using ArcGIS tools, you may not notice anything. However, if you have existing SQL scripts that work with the feature class schema, it is a good time to check that those scripts will not remove the GDB_GEOMATTR_DATA column from the feature class. This could happen if you are re-constructing the schema based on another table and have previously needed to keep the OBJECTID and the SHAPE columns. After moving to 10.5, you would also keep the GDB_GEOMATTR_DATA column.

Keep in mind that deleting the GDB_GEOMATTR_DATA column will make the feature class unusable in ArcGIS. Moreover, if this feature class stores any complex geometries such as curves, deleting the GDB_GEOMATTR_DATA column would result in data loss.

Trying to preview a feature class without the GDB_GEOMATTR_DATA column in ArcCatalog would show the following error:

database.schema.SampleFC: Attribute column not found [42S22:[Microsoft][SQL Server Native Client 11.0][SQL Server]Invalid column name ‘GDB_GEOMATTR_DATA’.] [database.schema.SampleFC]

Even though very unlikely to happen, trying to add a new field called exactly GDB_GEOMATTR_DATA to a valid feature class using ArcGIS tools would also result in an error:

ERROR 999999: Error executing function.
Underlying DBMS error [Underlying DBMS error [[Microsoft][SQL Server Native Client 11.0][SQL Server]Column names in each table must be unique. Column name ‘GDB_GEOMATTR_DATA’ in table ‘SDE.SAMPLE1’ is specified more than once.][database.schema.sample1.GDB_GEOMATTR_DATA]]
Failed to execute (AddField).

Obviously, trying to add the GDB_GEOMATTR_DATA column using plain SQL would not work either:

ALTER TABLE sde.samplefc
ADD GDB_GEOMATTR_DATA varchar(100)

Column names in each table must be unique. Column name ‘GDB_GEOMATTR_DATA’ in table ‘sde.samplefc’ is specified more than once.

Multiple Ring Buffer with PostGIS and SQL Server

Recently I needed to generate multiple ring buffers around some point features. This can be done using any of a dozen tools: the Multiple Ring Buffer geoprocessing tool in ArcGIS, arcpy (generating multiple buffer polygons with the buffer() method of the arcpy.Geometry() object and merging them into a single feature class), or open source GIS tools such as QGIS. It is also possible to achieve this using a relational database that supports spatial functions. In this post, I would like to show you how this can be done using the ST_Buffer spatial function in PostGIS and the STBuffer method in SQL Server.

In order to generate multiple buffer distance values (for instance, from 100 to 500 with a step of 100) in SQL Server, I would probably need to use a CTE or just create a plain in-memory table using declare; in other words, this is what it takes to run range(100, 501, 100) in Python.

In the gist below, there are two ways to generate multiple buffers – using the plain table and the CTE.

Generating a sequence of distances in Postgres is a lot easier thanks to the generate_series function, whose syntax is very similar to that of range in Python (note that, unlike range, generate_series includes the stop value).
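To make the comparison with range concrete, here is a small Python sketch. It is not the SQL from the gists: buffer_distances and ring_buffer_query are hypothetical helpers, and the table and column names passed to them are made up. The query text it builds yields one buffer polygon per distance (overlapping discs rather than donut-shaped rings).

```python
def buffer_distances(start, stop, step):
    """Mimic generate_series(start, stop, step) for positive integer steps.

    Unlike range, the stop value is included in the result.
    """
    return list(range(start, stop + 1, step))


def ring_buffer_query(table, geom_column, start, stop, step):
    """Build PostGIS query text; table and column names must be trusted input."""
    return (
        "SELECT d AS distance, ST_Buffer(t.{geom}, d) AS geom "
        "FROM {table} AS t "
        "CROSS JOIN generate_series({start}, {stop}, {step}) AS d"
    ).format(geom=geom_column, table=table, start=start, stop=stop, step=step)


distances = buffer_distances(100, 500, 100)
print(distances)

# "cities" and "geom" are made-up names used only for illustration.
print(ring_buffer_query("cities", "geom", 100, 500, 100))
```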

Using Python start up script for all Python interpreters

This post would be helpful for users of desktop GIS software such as ArcMap who need to use Python inside those applications.

There is a not so well known trick to trigger execution of a Python script before any interactive Python interpreter on your system starts.

Note: If you are a QGIS user, there is a special way of achieving this. Please see the question Script that runs automatically from the QGIS Python Console when QGIS starts, using schedule tasks on Windows 10 for details.

The way to do this is to set up an environment variable called PYTHONSTARTUP in your operating system. You need to do two things:

  1. Create an environment variable that would point to a path of a valid Python script file (.py) with the code that you would like to get executed before any Python interactive interpreter starts. Look at the question [Installing pythonstartup file](https://stackoverflow.com/questions/5837259/installing-pythonstartup-file) for details.
  2. Write Python code that you would like to get executed.

A very important thing to consider is that

The file is executed in the same namespace where interactive commands are executed so that objects defined or imported in it can be used without qualification in the interactive session.

This means that you can do a bunch of imports and define multiple variables which would be available to you directly at the start up of your GIS application. This is very handy because I often need to import the os and sys modules as well as import arcpy.mapping module and create mxd variable pointing to the current map document I have open in ArcMap.

Here is the code of my startup Python script which you can modify to suit your needs. If your workflow relies on having some data at hand, then you might need to add more variables exposed. I have ArcMap and ArcGIS Pro users in mind.
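A startup script along these lines might look like the sketch below. It is only an approximation under several assumptions: every import is guarded so the file does not break interpreters where arcpy or tabulate is missing, the variable names mxd and aprx are my own choice, and the sql helper relies on arcpy.ArcSDESQLExecute to run the query.

```python
import os
import sys

try:
    import arcpy
except ImportError:  # interpreter without ArcGIS
    arcpy = None

try:
    from tabulate import tabulate  # third-party, optional
except ImportError:
    tabulate = None

mxd = None   # current map document (ArcMap)
aprx = None  # current project (ArcGIS Pro)
if arcpy is not None:
    try:
        if hasattr(arcpy, "mapping"):
            mxd = arcpy.mapping.MapDocument("CURRENT")
        elif hasattr(arcpy, "mp"):
            aprx = arcpy.mp.ArcGISProject("CURRENT")
    except Exception:
        # "CURRENT" is only valid inside the running application.
        pass


def sql(conn, query):
    """Run a SQL query via a connection file (.sde) and print the result."""
    if arcpy is None:
        raise RuntimeError("arcpy is not available in this interpreter")
    result = arcpy.ArcSDESQLExecute(conn).execute(query)
    if isinstance(result, list) and tabulate is not None:
        print(tabulate(result))
    else:
        print(result)
    return result
```

Usage would be something like conn = r'C:\connections\my_db.sde' (a made-up path) followed by sql(conn, 'select * from dbo.samplefc').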

I have included in the example above a more specific workflow where you would like to be able to quickly execute SQL queries against an enterprise geodatabase (SDE). So, when ArcMap has started, you only need to create a variable conn pointing to a database connection file (.sde) and then use the sql() function running your query. Thanks to the tabulate package, the list of lists you get back is drawn in a nice table format.

2018-03-15 12_55_03-PythonWindowArcMap

 

 

Visualizing computational geometry concepts using JTS TestBuilder

In this post, I would like to let you know about an excellent piece of software, Java Topology Suite (JTS).

JTS is an open source library of spatial predicates and functions for processing geometries. It provides a complete, consistent, and robust implementation of fundamental algorithms for processing linear geometry on the 2-dimensional Cartesian plane.

A funny thing about it is that JTS

is used by most java based Open Source geospatial applications, and GEOS, which is a C++ port of JTS, is used by most C based applications.

So, all the downstream projects using GEOS, such as Python wrappers like shapely and even the PostgreSQL extension PostGIS, ultimately rely on algorithms that originate in JTS, with GEOS serving as the C++ port they actually call. So JTS is a very, very powerful Java library.

If you are not a Java developer, though, this might be of little interest to you. However, there is another little application, called JTS TestBuilder, which provides a GUI for geometry exploration and is an interface into the JTS API. It is not as well known as other pieces of the open source GIS stack, such as QGIS or GRASS, though. Also, its documentation is outdated and scarce, so you would need to find out how to use the application on your own.

Nevertheless, it is an indispensable tool for anyone who spends a fair amount of time working with computational geometry or spatial data processing applications. It would also serve as a great visualization tool for GIS instructors who need to visually explain how GIS algorithms operate. I have used it to show how Convex Hull is created from a set of points, for instance. One obvious advantage of JTS TestBuilder is that you do not need to run any heavy GIS applications and the “geometry modification – running analysis – seeing the result” cycle is really short.

Here I’ve loaded cities of California along with the state boundary and created a convex hull for the boundary geometry.

2018-03-14 17_41_18-JTS TestBuilder

Having said that, you can work in the following manner:

  • Use your favorite GIS database management tool to get WKT of a geometry you would like to inspect or analyze.
  • Use the JTS TestBuilder to draw the features.
  • Run JTS Geometry Functions constructing new geometries or answering spatial questions.
  • Load the results of the analysis back into your GIS (either for ad hoc exploration or for storage).

The code to read the features into WKT and write back from WKT using arcpy:
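A rough sketch of such code is shown below. The function names are hypothetical, and arcpy is imported inside the functions so that the file can be loaded on a machine without ArcGIS. The last helper does not use arcpy at all; it wraps several WKT strings into a single GEOMETRYCOLLECTION so that they can be pasted into JTS TestBuilder in one go.

```python
def read_wkt(feature_class):
    """Return the WKT of every feature in a feature class."""
    import arcpy  # imported lazily; requires ArcGIS
    with arcpy.da.SearchCursor(feature_class, ["SHAPE@WKT"]) as cursor:
        return [row[0] for row in cursor]


def write_wkt(wkt_strings, out_feature_class, spatial_reference):
    """Create a feature class from WKT strings of the same geometry type."""
    import arcpy  # imported lazily; requires ArcGIS
    geometries = [arcpy.FromWKT(wkt, spatial_reference) for wkt in wkt_strings]
    arcpy.CopyFeatures_management(geometries, out_feature_class)


def as_geometry_collection(wkt_strings):
    """Wrap individual WKT strings into one GEOMETRYCOLLECTION string."""
    return "GEOMETRYCOLLECTION ({0})".format(", ".join(wkt_strings))


# Made-up points for illustration.
print(as_geometry_collection(["POINT (1 2)", "POINT (3 4)"]))
```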

 

JTS TestBuilder can also help you to learn something new with regard to GIS theory. If you think that you are a well-seasoned GIS professional who can amaze others by mentioning a few cool names like Voronoi or Thiessen, I encourage you to explore the geometry functions JTS TestBuilder provides. I am pretty sure only a few of you have heard of:

  • Koch snowflake, which is used a lot in space-filling as well as cartographic simplification algorithms.
  • Sierpinski carpet, which is not used extensively in GIS yet, but there are some emerging applications regarding urban pattern analysis.

If you would like to take advantage of the computational geometry algorithms implemented in JTS, there are also ports to .NET and JavaScript.

Another very similar application that is particularly popular among math teachers is GeoGebra. I have been using it for a while, too, but it lacks export of result geometries into WKT which can be put into a geospatial database or drawn directly in a desktop GIS application such as ArcMap or QGIS. You can try GeoGebra online or by installing a desktop application. It is also available as an app for iOS, Android, and Windows Phone.