Drop multiple PostgreSQL databases matching name pattern

I once had a long data processing job that created multiple database snapshots on a PostgreSQL instance. However, this process was crashing sometimes leaving the databases behind. I had to automate the process of dropping the database where databases with names matching a certain pattern would be dropped. Bash script came in handy!

In the script, I write to a text file a series of SQL drop database statements and then execute each of the line using psql. It tool me a while to get it right as I wasn’t able to find a ready-to-use Bash script online, so this may be helpful to others.

Handy Bash scripts to run OGR command line tools

If you are a user of GDAL installation (which includes OGR), then you could take advantage of running OGR command line tools such as ogrinfo and ogr2ogr which are very handy and often more efficient in data processing comparing to Python scripts written using shapely or fiona. As a rule of thumb, if your Bash script is within 20-30 lines of code then you are doing okay. If it gets longer, it is worth switching to Python for readability and maintainability of the code.

Make sure to review the Python GDAL/OGR Cookbook!, it has a ton of useful examples. Below are some snippets you can use; they will also help you learn Bash if you are not familiar with it yet.

If you will be writing a lot of Bash, I suggest using an IDE that supports it. I have been using PyCharm with an amazing plugin BashSupport. It takes the experience of writing Bash scripts to a new level. It provides syntax highlight, auto-completion, and hover hints.

Multiple Ring Buffer with PostGIS and SQL Server

Recently I needed to generate multiple ring buffers around some point features. This can be done using a dozen of tools – Multiple Ring Buffer geoprocessing tool in ArcGIS, using arcpy to generate multiple buffer polygons and merging them into a single feature class using the buffer() method of arcpy.Geometry() object, or by using open source GIS tools such as QGIS. This is also possible to achieve using relational database that has support for the spatial functions. In this post, I would like to show you how this can be done using the ST_Buffer spatial function in PostGIS and SQL Server.

In order to generate multiple buffer distance values (for instance, from 100 to 500 with the step of 100) in SQL Server, I would probably need use CTE or just create a plain in-memory table using declare; in other words, this is what it takes to run range(100, 501, 100) in Python.

In the gist below, there are two ways to generate multiple buffers – using the plain table and the CTE.

Generating a sequence of distances in Postgres is a lot easier thanks to the presence of the generate_series function which provides the same syntax as range in Python.

Generating a database schema report using SchemaSpy

I recently wanted to generate a visual report over a database to see what tables, relationships, and other objects are present. I also wanted to see the schema of all the tables. The registrant Python package I’ve written does give me most of the things I want but it was not designed to fetch any relational structure such as primary/foreign keys and constraints.

I found an excellent free command line tool tool, SchemaSpy. It requires Java so it may be an extra thing to fix if you do not have it installed. The syntax for Microsoft SQL Server:

java.exe -jar .\schemaspy-6.0.0-rc2.jar -t mssql05 -dp .\sqljdbc42.jar -db non_spat -host localhost -port 1433 -u user -p password -o C:\Temp\sqlschema

You would need to download the Microsoft JDBC Driver for SQL Server if you work with MS SQL Server (I used the version 6).

Working with SQL in Jupyter notebook and dumping pandas into a SQL database

I have posted previously an example of using the SQL magic inside Jupyter notebooks. Today, I will show you how to execute a SQL query against a PostGIS database, get the results back into a pandas DataFrame object, manipulate it, and then dump the DataFrame into a brand new table inside the very same database. This can be very handy if some of your operations are better done using plain SQL, for instance, with the help of spatial SQL functions, but other ones are easier to perform using pandas​, for instance, adding and calculating new columns when the data you need to access is not stored inside the database.

The steps are as below:

  • Connect to a database using the SQL magic syntax
  • Execute a SELECT SQL query getting back a result set
  • Read the result set into a pandas DataFrame object
  • Do stuff with the DataFrame object
  • Dump it to a new table in the database you are connected to using the PERSISTcommand

Again, this is very handy for prototyping when you need to produce some tables doing a database design or when you need to have a new temporary table created for some application to read or when running some integration tests and needing to have a mock-up table to work with.

The sample notebook is available as a gist:


Esri enterprise geodatabase and PostGIS database

PostGIS is an amazing extension to PostgreSQL which makes it possible to manage geospatial data very efficiently. Just in several mouse clicks, after installing the PostgreSQL for your operating system and the PostGIS extension, you will be able to create databases and load geospatial datasets into it for managing later on.

If you are just starting learning about PostGIS, consider going through this excellent tutorial from Boundless (frankly, the best one I’ve seen so far).

You can do pretty amazing things right in your PostGIS database using ST SQL spatial functions. This is handy if your data does reside in the database and you are more comfortable using SQL rather than Python or any other language to access the database and then do the analysis using a geospatial library such as geopandas. Just for your reference, though, there is a Python package GeoAlchemy that can let you work with the PostGIS data and ST functions right from the Python ecosystem. Another Python package, geopandas, can read PostGIS spatial table into a GeoDataFrame so you can work with your data in Python just like you would work with a pandas Data Frame.

For ArcGIS users, it might be also handy to be able to create an enterprise multiuser geodatabase in PostgreSQL database (with PostGIS extension enabled) and then use SQL to do the analysis when you prefer scripting certain workflows without using geoprocessing tools or arcpy scripting. An important note: be sure to create your geodatabase feature classes using a configuration keyword PG_GEOMETRY. Storing your data with ST_GEOMETRY type won’t let you run PostGIS SQL spatial functions on the data.

Please refer to this help page for details on this. You essentially need to create a PostGIS database first (using psql shell or pgAdmin GUI) and then run the Create Enterprise Geodatabase geoprocessing tool. Again, you will be able to load your Esri geodatabase feature class stored in the PostGIS database using the PG_GEOMETRY keyword into a geopandas data frame. Keep in mind though, that to create an ArcGIS enterprise geodatabase in PostGIS, you would need to have an ArcGIS Server license.

I think having an ArcGIS multiuser geodatabase in PostGIS is very appealing because you might have a bunch of other software either serving the spatial datasets (such as GeoServer / ArcGIS Server) or accessing it via SQL in a desktop GIS (such as QGIS / ArcGIS Desktop). Having all the data stored in one place will make it easier to manage and maintain it as you don’t need to propagate the updates among multiple data repositories. Having an Esri complied geodatabase repository with all the powerful features it has to offer (such as geometric networks, linear referencing, geocoding) along with a chance to be able to pull all the data with pure SQL is a very agile concept many would find suitable for their organizations.

PostgreSQL and ArcSDE

ESRI ArcGIS 9.3 version has support for one more database management system (DBMS) for storing our spatial datasets. In addition to the ability of using Microsoft SQL Server, Oracle, IBM DB2 and Informix, PostgreSQL can be used now as well. In this post, I would like to give a brief review of this DBMS and outline some results of its integration with ESRI ArcGIS family products.

PostgreSQL is an open source DBMS and it was developed by an online community. This product is licensed under BSD license (which makes it basically free and moves very close to public domain use). This DBMS conforms to SQL 92/99 standards (see details for database language grammar). PostgreSQL can be treated as an adequate alternative for most used commercial DBMSs; it provides support for user defined data types (UDT), indexing, using stored procedures (OK, there are no really stored procedures there, but kind of), views, indices and many other things we used to use in everyday work with databases. Furthermore, the DBMS has client library interface in several programming languages including, but not limited to, C++, Java, and Python.

When I am talking about using a DBMS with ESRI GIS solutions, I mean mostly using a DBMS for managing geographical information and having an interface that could provide an effective means for data storing, indexing, and retrieving. As to ESRI solutions, ArcSDE provides such a gateway to a DBMS and is used for loading and extracting spatial data from DBMS. Thus, multiple questions are raised. Is it easy to install and administer the DBMS software? Does it work relatively fast comparing to other commercial DBMS? How good is the product documentation? Well, let’s try to answer at least some of those questions.

Requirements and installation of PostgreSQL

The full list of requirements is provided at ESRI Wiki.

The process of installation is very simple, straightforward, wizard-based, and well-documented. A detailed step-by-step guide is here, at the ESRI support knowledge base article #35128.

Most of these steps are given in the ArcGIS Server Help as well.

PostgreSQL distribution files are shipped with ArcSDE binary files, thus no any additional steps for downloading any PostgreSQL files are required. Instructions on PostgreSQL and ArcSDE configuration are shipped with binary files at <ArcSDE installation folder>\windows\documentation_server\ARCSDE93POSTGRESQLINSTALLGUIDES\install _gd_postgresql.htm.

Using a spatial database

After performing installation, post-installation, and PostgreSQL configuration, one is ready to use ArcCatalog application or PostgreSQL administrative application (pgAdmin III) in order to start creating or loading data into a spatial database. In general, not many significant differences can be noticed while working with SDE database stored in PostgreSQL as to Oracle or SQL Server. Both three-tier (sde service) and two-tier architecture (direct connection) can be used to establish a connection with a database (however, a direct connection has been the recommended way to go for a while by ESRI).

Using PostGIS extension

PostGIS is an optional spatial database extension for PostgreSQL that is not installed during PostgreSQL installation and should be installed and enabled independently via Application Stack Builder. PostGIS provides support for geographic objects storage into the PostgreSQL object-relational database. Thus, PostGIS, basically, “spatially enables” the PostgreSQL to be used as a spatial database for ESRI ArcSDE platform by using a PG_GEOMETRY format. Similar solutions are provided by Oracle (Oracle Spatial), IBM DB2 (Spatial Extender), Informix (Spatial DataBlade Module), and Microsoft SQL Server 2008 (out-of-the-box GEOMETRY and GEOGRAPHY data types). PostGIS complies with the OpenGIS “Simple Features Specification for SQL” and has been certified as compliant with the “Types and Functions“. PostGIS installation can be found and downloaded from the Internet and is free as well. For details, see ESRI Support knowledge base article 36389.

In a word, what PostGIS gives us, is that it makes it possible to store geometry of features by using another format, PG_GEOMETRY. Nevertheless, we can use spatial SQL functions with ST_GEOMETRY as well as PG_GEOMETRY types to manipulate features from DBMS to any other application, much like how we can do that in Oracle. The list of available SQL functions can be found here: http://webhelp.esri.com/arcgisserver/9.3.1/dotNet/geodatabases/an_overv315896832.htm. That is, users are able to perform SQL queries and operations on spatial data directly within the PostgreSQL.

Some testing

Inside the PostgreSQL, the interface is very simple and intuitive, looking a bit similar to SQL Server Management Studio. Product documentation is very good and the help system is available both from the desktop installation and from the web (there is a help with user comments as well)

Regarding performance of data retrieving, just a few tests have been done. Undoubtedly, the level of performance will vary greatly among different machines depending upon hardware resources, spatial datasets complexity, and finally, performance assessment tools. In our case, the assessment of the time required for retrieving and drawing features from a PostgreSQL database of PostGIS spatial type and SQL Server 2008 Geometry type has been done. For this purpose, an ESRI mxdperfstat utility has been used.

To my knowledge, this free utility is the most appropriate ArcGIS-compatible solution for map rendering performance evaluation (apart from Geodatabase Toolset (GDBT) for ArcCatalog, which will be available soon). This application from ESRI Testing Service is widely used to help diagnose typical performance problems with ArcGIS map documents such as:

  • Inefficient scale dependency options use
  • Slow symbology rendering
  • Using projection on-the-fly
  • Potential database tuning needs.

Two MXD map documents were created and each of them included the same set of feature classes from two different databases, from a PostgreSQL 8.3 of PostGIS data type and SQL Server 2008 (PG_GEOMETRY and Geometry data types, respectively). As to my tests, the speed of retrieving and drawing features from those two databases were different. I have used a direct connection to both PostgreSQL database and SQL Server 2008 database.

SQL Server feature classes were retrieved and drawn faster on smaller scales (from 1:1,000,000 and smaller). However, on large scales, 1:500,000 and larger, feature classes with PostGIS geometry are retrieved significantly faster. As I did not set any scale dependencies, i.e., in smaller scales more features are drawn, since we cover a bigger geographical area. The problem could be with the indices in SQL Server 2008, which are recommended to be changed by ESRI. In order to understand whether the tuning of ArcSDE configuration parameters, PostgreSQL database tuning, and SQL Server 2008 indices tuning can influence the performance of PostgreSQL, more detailed research should be done.

Conclusion and future prospects

With the version ArcGIS Server 9.3, users are now provided with additional support for PostgreSQL and PostGIS framework. This allows users to access multiple types of geographic datasets from virtually any commercial or free software programs (you could check zigGIS if you have PostGIS data and ArcGIS Desktop, but ArcSDE is not accessible). The ability of using PostgreSQL can be even more alluring given that in the ArcGIS 9.3 Data Interoperability extension direct read, data import and data export options for PostgreSQL and PostGIS were added. Finally, as proposed, at ArcGIS 10, ESRI will release an open API for the geodatabase, thus further enabling non-ESRI software to interoperate with the ArcGIS platform smoothly. In the release 10, which seems to be very promising, ArcGIS will introduce a new functionality that is called “query layers”. This option will allow users to access data of different spatial types stored in a database that is not an ESRI geodatabases directly. That will be done in order to address many ESRI users’ needs to support direct access to the spatial types within a relational database (i.e. SQL 2008, Oracle Spatial, etc) that are not geodatabases per se (i.e. does not utilize ESRI ArcSDE technology). As ESRI states, these so called “query layers” “will allow users to use SQL queries to access spatial type data and create a new (read-only) layer in ArcMap. This layer will allow users to map, query, and analyze data from spatially enabled databases.” Users will be able to use PostgreSQL without registering the spatial information in an ESRI geodatabase or installing and configuring ArcSDE. Moreover, geodatabase users who store their spatial information in spatial types can use this functionality to work with their data using complex SQL queries. All of those arguments make it worthwhile to take a closer look at PostgreSQL and its possible applications to ESRI solutions market.

Further reading

All available technical articles on the PostgreSQL at ESRI

Implementation of ArcSDE for PostgreSQL (pdf)

2009 ESRI User Conference Technical Workshop. Introduction to ArcSDE for PostgreSQL (pdf)

ESRI User forum: ArcSDE for PostgreSQL

Building a Spatial Database in PostgreSQL (ppt)

Some blog posts worth checking out