Warning: new GDB_GEOMATTR_DATA column in ArcGIS geodatabase 10.5

This post will be of interest to ArcGIS users who are upgrading enterprise geodatabases from ArcGIS 10.1-10.4 to 10.5+. According to the Esri documentation and resources (link1, link2, link3):

Feature classes created in an ArcGIS 10.5 or 10.5.1 geodatabase using a 10.5 or 10.5.1 client use a new storage model for geometry attributes, which stores them in a new column (GDB_GEOMATTR_DATA). The purpose of this column is to handle complex geometries such as curves. Since a feature class can have only one shape column, circular geometries must be stored separately and then joined to the feature class when viewed in ArcGIS.

This means that if you create a new feature class in an enterprise geodatabase (either manually or by using a geoprocessing tool), three fields will be created: the OID field (OBJECTID), the geometry field (SHAPE), and this special GDB_GEOMATTR_DATA field. It is important to be aware of this because the column is not visible when working in ArcGIS Desktop or when using arcpy.

The GDB_GEOMATTR_DATA field is not shown when accessing a feature class using arcpy.

import arcpy
[f.name for f in arcpy.ListFields('samplefc')]
[u'OBJECTID', u'SHAPE', u'SHAPE.STArea()', u'SHAPE.STLength()']

Querying the table using SQL, however, does show the field.

select * from dbo.samplefc
[OBJECTID],[SHAPE],[GDB_GEOMATTR_DATA]

If you work with your enterprise geodatabase only through ArcGIS tools, you may not notice anything. However, if you have existing SQL scripts that work with the feature class schema, now is a good time to check that those scripts will not remove the GDB_GEOMATTR_DATA column from the feature class. This could happen if you re-construct the schema based on another table: previously you needed to keep only the OBJECTID and SHAPE columns, but after moving to 10.5 you also need to keep the GDB_GEOMATTR_DATA column.
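
If you want to guard against this, a quick pre-flight check could look like the following; a minimal sketch using arcpy's ArcSDESQLExecute class, where the connection file path and table name are made up for illustration:

import arcpy

# Hypothetical .sde connection file and table name - replace with your own.
egdb = arcpy.ArcSDESQLExecute(r'C:\connections\gis.sde')
sql = """
SELECT COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'samplefc'
AND COLUMN_NAME = 'GDB_GEOMATTR_DATA'
"""

# execute() returns the matching rows for a SELECT (or True when the
# statement produces no result set); if the column is missing, there is
# nothing for a schema-rebuilding script to preserve.
result = egdb.execute(sql)
print(result)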

Keep in mind that deleting the GDB_GEOMATTR_DATA column will make the feature class unusable in ArcGIS. Moreover, if this feature class stores any complex geometries such as curves, deleting the GDB_GEOMATTR_DATA column would result in data loss.

Trying to preview a feature class without the GDB_GEOMATTR_DATA column in ArcCatalog produces the following error:

database.schema.SampleFC: Attribute column not found [42S22:[Microsoft][SQL Server Native Client 11.0][SQL Server]Invalid column name 'GDB_GEOMATTR_DATA'.] [database.schema.SampleFC]

Though very unlikely to happen, trying to add a new field called exactly GDB_GEOMATTR_DATA to a valid feature class using ArcGIS tools also results in an error:

ERROR 999999: Error executing function.
Underlying DBMS error [Underlying DBMS error [[Microsoft][SQL Server Native Client 11.0][SQL Server]Column names in each table must be unique. Column name 'GDB_GEOMATTR_DATA' in table 'SDE.SAMPLE1' is specified more than once.][database.schema.sample1.GDB_GEOMATTR_DATA]]
Failed to execute (AddField).

Obviously, trying to add the GDB_GEOMATTR_DATA column using plain SQL would not work either:

ALTER TABLE sde.samplefc
ADD GDB_GEOMATTR_DATA varchar(100)

Column names in each table must be unique. Column name 'GDB_GEOMATTR_DATA' in table 'sde.samplefc' is specified more than once.

Working with SQL in Jupyter notebook and dumping pandas into a SQL database

I have previously posted an example of using the SQL magic inside Jupyter notebooks. Today, I will show you how to execute a SQL query against a PostGIS database, get the results back into a pandas DataFrame object, manipulate it, and then dump the DataFrame into a brand new table inside the very same database. This can be very handy if some of your operations are better done using plain SQL, for instance, with the help of spatial SQL functions, while others are easier to perform using pandas, for instance, adding and calculating new columns when the data you need to access is not stored inside the database.

The steps are as follows:

  • Connect to a database using the SQL magic syntax
  • Execute a SELECT SQL query getting back a result set
  • Read the result set into a pandas DataFrame object
  • Do stuff with the DataFrame object
  • Dump it to a new table in the database you are connected to using the PERSIST command (see the sketch below)
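
A minimal sketch of the whole workflow in notebook cells; the connection string, table, and column names below are made up for illustration:

%load_ext sql

# Hypothetical PostGIS connection string - use your own credentials.
%sql postgresql://user:password@localhost:5432/gisdb

# The line magic returns a result set object that can be captured.
result = %sql SELECT id, ST_Area(geom) AS area FROM parcels LIMIT 100

# Convert the result set into a pandas DataFrame and manipulate it.
df = result.DataFrame()
df['area_acres'] = df['area'] / 4046.86

# PERSIST dumps the DataFrame into a new table named after the variable.
%sql PERSIST df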

Again, this is very handy for prototyping: when you need to produce some tables while doing database design, when you need a temporary table created for some application to read, or when you are running integration tests and need a mock-up table to work with.

The sample notebook is available as a gist.

Adding IPython SQL magic to Jupyter notebook

If you do not use the %%sql magic in your Jupyter notebook, the output of your SQL queries will be just a plain list of tuples. A better way to work with the returned result sets is to render them as a table with headers. This is where the IPython SQL magic gets very handy. You can install it using pip install ipython-sql. Refer to its GitHub repository for details of the implementation.

You have to connect to a database first; all subsequent SQL queries will then use this connection, and their result sets are drawn nicely as tables. Another neat feature of the %%sql magic is that you can get the result of your SQL query as a pandas DataFrame object if you would like to continue working with the result set in Python. The result object of a SQL query execution can be accessed via the variable _. This is because IPython's output caching system defines several global variables; _ (a single underscore) stores the previous output, just like in the IPython interpreter.
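
For instance, in a notebook that is already connected to a database, picking up the last result set could look like this (the query itself is arbitrary):

# Run a query; the rendered result set is also cached by IPython in _.
%sql SELECT table_name FROM information_schema.tables LIMIT 5

# In the next cell, pick up the previous output and convert it.
result = _
df = result.DataFrame()
df.head()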

Look into this sample Jupyter notebook for illustration.

Desktop PyQt application for executing SQL queries against Esri file geodatabase

As I have always been interested in building GUIs for GIS operations, I thought that exploring PyQt more deeply would be fun. The project started as an experimental playground to see what functionality PyQt provides. As I spent more time working with PyQt, I started wondering what it would take to build a useful desktop application.

Because I often find myself in need of querying a file geodatabase's datasets, I decided to build a GUI-based SQL editor that would let me execute SQL queries against a table or a feature class and draw the result set in a table form for further visual inspection. I thought that other GIS users and developers might find this application useful, so I have started a GitHub repository to let others take advantage of my work. Here it comes, check it out!

GDBee is a PyQt5 desktop application which you can use to write and execute SQL queries against tables and feature classes stored inside an Esri file geodatabase. It provides a tabbed interface that lets you connect to multiple geodatabases within a single session. It has a rich code editor featuring auto-completion (with suggestions), syntax highlighting, and result set export.

If you are a QGIS Desktop user, you are already able to execute SQL against file geodatabases using the QGIS DBManager plugin, but GDBee has some extra features that the DBManager is missing (for instance, you do not need to add your datasets as layers first, and you can choose to copy individual cells from the result table instead of the whole row).

Because Python is so widely used in the GIS community, I thought it would make sense to take advantage of the Python bindings of GDAL/OGR to connect to a file geodatabase and execute SQL queries. Working with a file geodatabase via OGR makes it possible to take advantage of SQL spatial functions that are otherwise inaccessible to an ArcGIS user!
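
To give an idea of the underlying approach, here is a minimal sketch of executing SQL against a file geodatabase with the GDAL/OGR Python bindings; the geodatabase path, table, and field names are made up:

from osgeo import ogr

# Hypothetical file geodatabase; the OpenFileGDB driver ships with GDAL.
ds = ogr.Open(r'C:\data\sample.gdb')

# The SQLITE dialect exposes SpatiaLite ST_* spatial functions.
layer = ds.ExecuteSQL(
    'SELECT name, ST_Area(shape) AS area FROM parcels',
    dialect='SQLITE')
for feature in layer:
    print(feature.GetField('name'), feature.GetField('area'))
ds.ReleaseResultSet(layer)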

The application provides multiple features:

  • Working with multiple geodatabases using multiple tabs (single geodatabase connection per tab)
  • Exporting result sets into various formats (WKT strings to paste into QGIS using QuickWKT plugin, arcpy code to paste into ArcMap Python window, pandas data frame via .csv file (which can be taken into geopandas), and Markdown table via .md file or plain text)
  • Executing SQL query with respect to the user selection (only selected text is executed)
  • Loading/saving SQL queries from and to text files on disk
  • Convenient keyboard shortcuts for query execution (F5 and Ctrl-Enter) and tab interaction (Ctrl-N and Ctrl-W for opening and closing tabs)
  • Copying data from the result set table (either individual cell values or row(s) with the headers preserved) – ready to paste properly into an Excel sheet
  • Choosing whether you want to have the geometry column in the result set as WKT

You can look at its GitHub repository: GDBee. You may find this PyQt desktop application useful if:

  • You would like to be able to interrogate your file geodatabase datasets using SQL (instead of a Python-based interface such as Esri arcpy or open-source ogr)
  • You are an ArcGIS user that does not want to have QGIS Desktop installed just to be able to execute SQL against a file geodatabase
  • You use SQL on a daily basis working with spatial databases (such as PostgreSQL or Microsoft SQL Server) and want to be able to execute ad hoc SQL queries against file geodatabase datasets without loading them into a proper DBMS database
  • You already have a lot of SQL code targeting tables stored in a DBMS spatial database and you would like to be able to reuse this code when targeting a file geodatabase

Do you think there is some other functionality that should be added? Please let me know by submitting an issue in the repository.

Load Esri geodatabase tables into SQLite

I have blogged earlier about loading Esri file geodatabase feature classes into a SQLite database using standard geoprocessing tools. This time, I have written a short sample that creates a SQLite database, reads rows from a file geodatabase feature class, and then creates a new table and inserts the features into it. Everything except reading the file geodatabase (I use arcpy to access features with a cursor) is done using pure Python and SQL. The logic is rather simple: you read the schema of a geodatabase table and mirror it inside a SQLite database table. Then you insert the features' geometry and attributes into the right columns.
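
A minimal sketch of that logic, with a made-up feature class and a simplified schema that stores geometry as WKT text (the original sample uses the ST_Geometry type instead):

import sqlite3
import arcpy

fc = r'C:\data\sample.gdb\parcels'  # hypothetical feature class

# Pass ':memory:' instead of a file path to build the database in RAM.
conn = sqlite3.connect(r'C:\temp\sample.sqlite')
cur = conn.cursor()

# Mirror a (simplified) schema of the source feature class.
cur.execute('CREATE TABLE parcels (objectid INTEGER, name TEXT, shape TEXT)')

# Read the rows with an arcpy data access cursor and insert them;
# the NAME field is an assumption about the source schema.
with arcpy.da.SearchCursor(fc, ['OID@', 'NAME', 'SHAPE@WKT']) as rows:
    cur.executemany('INSERT INTO parcels VALUES (?, ?, ?)', rows)
conn.commit()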

I was pleased to observe excellent performance of data insertion and query processing. Of course, the initial data load is an overhead that will be present in the script, but the efficiency of data manipulation using SQL and spatial analysis using ST_Geometry functions outweighs it, in my opinion. If you work with rather large datasets, you should consider tuning the performance of SQLite spatial indices because using ST_Geometry may be initially slow.

It is also worth noting that with some minor modifications, it would be possible to load a non-spatial file geodatabase table into a SQLite database table if you want to perform some SQL-based data aggregation. A unique feature of SQLite is that a database can be generated completely in memory, without writing anything to disk. This is suitable for scripts where you want to do the processing on the fly without creating any additional scratch workspaces. Use the syntax for creating SQLite databases in memory (:memory:). I have done tests loading the spatial extension and some data, and it worked fine.
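
A short sketch of that setup, assuming a spatial extension such as SpatiaLite's mod_spatialite is available on the system (the original tests loaded Esri's ST_Geometry library in the same way):

import sqlite3

# Nothing is written to disk; the whole database lives in RAM.
conn = sqlite3.connect(':memory:')

# Extension loading must be compiled into sqlite3 for this to work.
conn.enable_load_extension(True)
conn.load_extension('mod_spatialite')  # assumed to be on the library path

print(conn.execute('SELECT spatialite_version()').fetchone())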