Warning: new GDB_GEOMATTR_DATA column in ArcGIS geodatabase 10.5

This post will be of interest to ArcGIS users who are upgrading enterprise geodatabases from versions 10.1–10.4 to 10.5+. According to the Esri documentation and resources (link1, link2, link3):

Feature classes created in an ArcGIS 10.5 or 10.5.1 geodatabase using a 10.5 or 10.5.1 client use a new storage model for geometry attributes, which stores them in a new column (GDB_GEOMATTR_DATA). The purpose of this column is to handle complex geometries such as curves. Since a feature class can have only one shape column, circular geometries must be stored separately and then joined to the feature class when viewed in ArcGIS.

This means that if you create a new feature class in an enterprise geodatabase (either manually or with a geoprocessing tool), three fields are created: the OID field (OBJECTID), the geometry field (SHAPE), and the special GDB_GEOMATTR_DATA field. It is important to be aware of this because you will not be able to see this column when working in ArcGIS Desktop or when using arcpy.

The GDB_GEOMATTR_DATA field is not shown when accessing a feature class using arcpy.

[f.name for f in arcpy.ListFields('samplefc')]
[u'OBJECTID', u'SHAPE', u'SHAPE.STArea()', u'SHAPE.STLength()']

Querying the table using SQL, however, does show the field.

select * from dbo.samplefc
[OBJECTID],[SHAPE],[GDB_GEOMATTR_DATA]

If you work with your enterprise geodatabase only through ArcGIS tools, you may not notice anything. However, if you have existing SQL scripts that work with the feature class schema, now is a good time to check that those scripts will not remove the GDB_GEOMATTR_DATA column from the feature class. This could happen if you re-construct the schema based on another table: where you previously needed to keep only the OBJECTID and SHAPE columns, after moving to 10.5 you also need to keep the GDB_GEOMATTR_DATA column.
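A simple safeguard for such scripts is to verify that the required columns survive a schema rebuild. A minimal sketch in Python (the column list would come from a query such as SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'samplefc'; the helper function and names here are illustrative):

```python
# Columns every 10.5+ feature class must keep
REQUIRED = {"OBJECTID", "SHAPE", "GDB_GEOMATTR_DATA"}

def missing_required(columns):
    """Return the required geodatabase columns absent from a column list."""
    return REQUIRED - set(columns)

# A rebuilt schema that dropped GDB_GEOMATTR_DATA is flagged:
print(missing_required(["OBJECTID", "SHAPE"]))  # {'GDB_GEOMATTR_DATA'}
print(missing_required(["OBJECTID", "SHAPE", "GDB_GEOMATTR_DATA"]))  # set()
```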

Keep in mind that deleting the GDB_GEOMATTR_DATA column will make the feature class unusable in ArcGIS. Moreover, if this feature class stores any complex geometries such as curves, deleting the GDB_GEOMATTR_DATA column would result in data loss.

Trying to preview a feature class without the GDB_GEOMATTR_DATA column in ArcCatalog shows the following error:

database.schema.SampleFC: Attribute column not found [42S22:[Microsoft][SQL Server Native Client 11.0][SQL Server]Invalid column name 'GDB_GEOMATTR_DATA'.] [database.schema.SampleFC]

Although very unlikely, trying to add a new field called exactly GDB_GEOMATTR_DATA to a valid feature class using ArcGIS tools also results in an error:

ERROR 999999: Error executing function.
Underlying DBMS error [Underlying DBMS error [[Microsoft][SQL Server Native Client 11.0][SQL Server]Column names in each table must be unique. Column name 'GDB_GEOMATTR_DATA' in table 'SDE.SAMPLE1' is specified more than once.][database.schema.sample1.GDB_GEOMATTR_DATA]]
Failed to execute (AddField).

Obviously, trying to add the GDB_GEOMATTR_DATA column using plain SQL does not work either:

ALTER TABLE sde.samplefc
ADD GDB_GEOMATTR_DATA varchar(100)

Column names in each table must be unique. Column name 'GDB_GEOMATTR_DATA' in table 'sde.samplefc' is specified more than once.

Multiple Ring Buffer with PostGIS and SQL Server

Recently I needed to generate multiple ring buffers around some point features. This can be done with a number of tools: the Multiple Ring Buffer geoprocessing tool in ArcGIS, arcpy (generating multiple buffer polygons with the buffer() method of the arcpy.Geometry() object and merging them into a single feature class), or open-source GIS tools such as QGIS. It is also possible to do this in a relational database that supports spatial functions. In this post, I would like to show you how this can be done with the ST_Buffer function in PostGIS and the STBuffer method in SQL Server.

In order to generate multiple buffer distance values in SQL Server (for instance, from 100 to 500 with a step of 100), I would need to use a CTE or create a plain in-memory table using DECLARE; in other words, this is what it takes to reproduce range(100, 501, 100) from Python.
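To sketch the CTE approach in a runnable form, here is the same recursive pattern in SQLite (via Python's stdlib sqlite3 module) rather than T-SQL; the T-SQL version differs only in minor syntax:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
rows = conn.execute("""
    WITH RECURSIVE distances(d) AS (
        SELECT 100            -- anchor member: the first distance
        UNION ALL
        SELECT d + 100        -- recursive member: add the step
        FROM distances
        WHERE d < 500         -- stop once the last distance is reached
    )
    SELECT d FROM distances
""").fetchall()

print([d for (d,) in rows])  # [100, 200, 300, 400, 500]
```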

In the gist below, there are two ways to generate multiple buffers – using the plain table and the CTE.

Generating a sequence of distances in Postgres is a lot easier thanks to the generate_series function, which is very close to Python's range (though its stop value is inclusive).
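One detail worth noting is that generate_series includes its stop value while Python's range excludes it, so the equivalents line up like this (the Postgres call is shown in the comment):

```python
# Postgres: SELECT generate_series(100, 500, 100);  -- yields 100, 200, ..., 500
# Python needs stop + step, because range excludes the stop value:
distances = list(range(100, 501, 100))
print(distances)  # [100, 200, 300, 400, 500]
```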

Generating a database schema report using SchemaSpy

I recently wanted to generate a visual report over a database to see what tables, relationships, and other objects are present, along with the schema of each table. The registrant Python package I've written gives me most of what I want, but it was not designed to fetch relational structure such as primary/foreign keys and constraints.

I found an excellent free command-line tool, SchemaSpy. It requires Java, so that may be an extra thing to install if you do not have it. The syntax for Microsoft SQL Server:

java.exe -jar .\schemaspy-6.0.0-rc2.jar -t mssql05 -dp .\sqljdbc42.jar -db non_spat -host localhost -port 1433 -u user -p password -o C:\Temp\sqlschema

You will need to download the Microsoft JDBC Driver for SQL Server if you work with MS SQL Server (I used version 6).

Working with SQL in Jupyter notebook and dumping pandas into a SQL database

I have previously posted an example of using the SQL magic inside Jupyter notebooks. Today, I will show you how to execute a SQL query against a PostGIS database, get the results back into a pandas DataFrame object, manipulate it, and then dump the DataFrame into a brand new table inside the very same database. This can be very handy when some of your operations are better done in plain SQL (for instance, with the help of spatial SQL functions) while others are easier to perform with pandas (for instance, adding and calculating new columns when the data you need is not stored inside the database).

The steps are as below:

  • Connect to a database using the SQL magic syntax
  • Execute a SELECT SQL query getting back a result set
  • Read the result set into a pandas DataFrame object
  • Do stuff with the DataFrame object
  • Dump it to a new table in the database you are connected to using the PERSIST command

Again, this is very handy for prototyping: when you need to produce some tables while doing a database design, when some application needs a new temporary table to read, or when you are running integration tests and need a mock-up table to work with.
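The same round trip can be sketched without the notebook machinery, using the stdlib sqlite3 module in place of the SQL magic and plain Python rows in place of the DataFrame (table and column names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (name TEXT, pop INTEGER)")
conn.executemany("INSERT INTO cities VALUES (?, ?)",
                 [("Oslo", 700000), ("Bergen", 280000)])

# 1-2. Run a SELECT and get a result set back
rows = conn.execute("SELECT name, pop FROM cities").fetchall()

# 3-4. The "DataFrame" stage: manipulate the rows in Python,
#      here by adding a derived column
enriched = [(name, pop, pop > 500000) for name, pop in rows]

# 5. Dump the result into a brand new table in the same database
#    (the role PERSIST plays for a DataFrame)
conn.execute("CREATE TABLE cities_big (name TEXT, pop INTEGER, big INTEGER)")
conn.executemany("INSERT INTO cities_big VALUES (?, ?, ?)", enriched)

print(conn.execute("SELECT name FROM cities_big WHERE big").fetchall())
# [('Oslo',)]
```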

The sample notebook is available as a gist:

 

SQL Server spatial functions for GIS users: part 2

I was curious to see which posts on my blog have been visited the most over time. The most popular one is SQL Server spatial functions for GIS users, which I published 3 years ago. It seems that many GIS users need SQL Server spatial functions, since searches such as "sql server spatial functions" appear fairly often in the search engines. Today, I will share more examples of using SQL Server spatial functions for GIS users.

Below you will find just a few sample SQL queries that have been executed against a SQL Server database containing the datasets from the PostGIS workshop materials by Boundless, which can be downloaded from this link.

This post is an extraction of a Jupyter notebook published on GitHub. Open the notebook to see the result tables, or jump to the end of the post where it is embedded.

Find points that have the same coordinates

To find coincident points (often referred to as duplicates), you could use various SQL Server spatial functions.

Table1.Shape.STDistance(Table2.Shape) < 1000 --distance value
Table1.Shape.STEquals(Table2.Shape) = 1 --whether shapes are identical
Table1.SHAPE.STX and Table1.SHAPE.STY --compare points XY coordinates

All of these methods give the same result but differ in execution performance.

Find using STDistance

--Find duplicate points within a certain borough
SELECT CAST(T1.ID AS INT) AS FirstPoint,
CAST(T2.ID AS INT) SecondPoint,
T1.Shape.STDistance(T2.Shape) Distance
FROM
dbo.Homicides T1
JOIN
dbo.Homicides T2
ON
T1.ID < T2.ID
and
T1.Shape.STDistance(T2.Shape) = 0
and
T1.BORONAME = 'Queens'
ORDER BY
Distance, FirstPoint

Find using STEquals

SELECT CAST(T1.ID AS INT) AS FirstPointId,
CAST(T2.ID AS INT) SecondPointId,
T1.Shape.STDistance(T2.Shape) Distance
FROM
dbo.Homicides T1
JOIN
dbo.Homicides T2
ON
T1.ID < T2.ID
and
T1.Shape.STEquals(T2.Shape) = 1
and
T1.BORONAME = 'Queens'
ORDER BY
FirstPointId, SecondPointId

Find using STX and STY

SELECT T1.SHAPE.STX, T1.SHAPE.STY, COUNT(*) AS COUNT
FROM
dbo.Homicides T1
WHERE
T1.BORONAME = 'Queens'
GROUP BY
T1.SHAPE.STX, T1.SHAPE.STY
HAVING
COUNT(*) > 1
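Because the STX/STY approach only compares plain coordinate values, it translates to any SQL engine. Here is the same GROUP BY pattern in SQLite via the stdlib sqlite3 module, with made-up points:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE homicides (id INTEGER, x REAL, y REAL)")
conn.executemany("INSERT INTO homicides VALUES (?, ?, ?)",
                 [(1, 10.0, 20.0), (2, 10.0, 20.0), (3, 30.0, 40.0)])

# Group on the raw coordinates, exactly like grouping on STX, STY
dupes = conn.execute("""
    SELECT x, y, COUNT(*) AS cnt
    FROM homicides
    GROUP BY x, y
    HAVING COUNT(*) > 1
""").fetchall()

print(dupes)  # [(10.0, 20.0, 2)]
```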

Calculating distances between points stored in tables

The STDistance function could be used to find the plain distance between the points stored within the same table.

SELECT CAST(T1.ID AS INT) AS FirstPointId
,CAST(T2.ID AS INT) SecondPointId
,T1.Shape.STDistance(T2.Shape) Distance
FROM
dbo.Homicides T1
JOIN dbo.Homicides T2
ON T1.ID < T2.ID
and
T1.Shape.STDistance(T2.Shape) BETWEEN 10 AND 20
and
T1.BORONAME = 'Queens'
ORDER BY
Distance
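The self-join condition T1.ID < T2.ID is what makes each pair appear only once. The same idiom in plain Python, with math.dist standing in for STDistance on made-up planar points:

```python
from math import dist

points = {1: (0.0, 0.0), 2: (12.0, 0.0), 3: (100.0, 100.0)}

# Mirror "T1.ID < T2.ID AND STDistance BETWEEN 10 AND 20"
pairs = sorted(
    (id1, id2, dist(p1, p2))
    for id1, p1 in points.items()
    for id2, p2 in points.items()
    if id1 < id2 and 10 <= dist(p1, p2) <= 20
)
print(pairs)  # [(1, 2, 12.0)]
```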

Calculating distances between points stored in two tables

Find homicide points that are located within a specified number of meters of the subway station points. The equivalent ArcGIS tool: Point Distance (Analysis)

SELECT * FROM
(SELECT CAST(Homi.ID AS int) AS HomicideId
,Homi.WEAPON AS Weapon
,CAST(Subway.ID AS int) AS SubwayId
,Subway.NAME AS Name
,Homi.Shape.STDistance(Subway.Shape) AS Distance
FROM dbo.HOMICIDES Homi
cross join dbo.Subway_stations Subway
where Homi.BORONAME = 'Manhattan' AND Subway.BOROUGH = 'Manhattan')
AS
data
WHERE
Distance < 20
ORDER BY
Distance

Counting points in polygons

To count points in polygons, you can use the STContains method: Table1.Shape.STContains(Table2.Shape) returns 0 or 1.
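For intuition, the test STContains performs for a simple polygon can be sketched with the classic ray-casting algorithm (a simplification: it assumes a single non-self-intersecting ring and ignores points exactly on the boundary):

```python
def contains(polygon, point):
    """Ray casting: a point is inside if a ray going right from it
    crosses the polygon boundary an odd number of times."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does edge (x1,y1)-(x2,y2) straddle the ray's y level?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside
    return inside

square = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(contains(square, (5, 5)))   # True
print(contains(square, (15, 5)))  # False
```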

Find neighborhoods with the largest number of crimes committed (count number of homicides in each neighborhood)

SELECT TOP 10
Polys.Name AS NeighborhoodName, Count(*) AS CrimeCount
FROM
dbo.Homicides AS Points
JOIN
dbo.Neighborhoods AS Polys
ON
Polys.Shape.STContains(Points.Shape) = 1
GROUP BY
Polys.Name
ORDER BY
CrimeCount DESC

We can also calculate a column in the Neighborhoods table to contain the number of points within each neighborhood. For that, we first need to add a new column to the table, then populate it, and then drop it to leave the data clean for further queries.

This query, adding a new field and calculating the number of points located within each polygon, does what the ArcGIS Spatial Join GP tool does.

ALTER TABLE dbo.Neighborhoods
ADD PointCount int;

UPDATE
Polys
SET
[PointCount] = COUNTS.CrimeCount
FROM
dbo.Neighborhoods AS Polys
JOIN
(
SELECT
Polys.Name AS NeighborhoodName, Count(*) AS CrimeCount
FROM
dbo.Homicides AS Points
JOIN
dbo.Neighborhoods AS Polys
ON
Polys.Shape.STContains(Points.Shape) = 1
GROUP BY
Polys.Name
) AS COUNTS
ON
Polys.Name = COUNTS.NeighborhoodName

Add polygon name to points located within the polygon

To enrich the points layer with the name of the polygon each point is located within, you can use the STWithin method. In this example, we will add a new column to the homicides table so that we know in which neighborhood each crime was committed.

Again, this query, adding a new field and calculating the neighborhood name for the points located within each polygon, does what the ArcGIS Spatial Join GP tool does.

SELECT TOP 10
Points.OBJECTID
,Points.INCIDENT_D
,Points.BORONAME
,Points.NUM_VICTIM
,Points.PRIMARY_MO
,Points.ID
,Points.WEAPON
,Points.LIGHT_DARK
,Points.YEAR
,Polys.Name AS NeighborhoodName
FROM
dbo.Neighborhoods AS Polys
JOIN
dbo.Homicides AS Points
ON
Points.Shape.STWithin(Polys.Shape) = 1
ORDER BY
OBJECTID

ALTER TABLE dbo.Homicides
ADD NeighborhoodName varchar(50);

UPDATE
Points
SET
[NeighborhoodName] = PointsInPolys.NeighborhoodName
FROM
dbo.Homicides AS Points
JOIN
(
SELECT
Points.OBJECTID
,Points.INCIDENT_D
,Points.BORONAME
,Points.NUM_VICTIM
,Points.PRIMARY_MO
,Points.ID
,Points.WEAPON
,Points.LIGHT_DARK
,Points.YEAR
,Polys.Name AS NeighborhoodName
FROM
dbo.Neighborhoods AS Polys
JOIN
dbo.Homicides AS Points
ON
Points.Shape.STWithin(Polys.Shape) = 1
) AS PointsInPolys
ON
PointsInPolys.ID = Points.ID

Summary statistics and frequency

This is a simplistic implementation of the Frequency GP tool in ArcGIS using a T-SQL stored procedure.

CREATE PROCEDURE dbo.FrequencyTable
@Table varchar(100),
@Columns varchar(500)
AS
BEGIN
EXEC ('SELECT COUNT(*) AS ' + @Columns +
' FROM ' + @Table +
' WHERE WEAPON <> '''' AND LIGHT_DARK <> ''''
GROUP BY WEAPON,
LIGHT_DARK ORDER BY FREQUENCY DESC;');
END
GO
EXEC dbo.FrequencyTable 'dbo.HOMICIDES', 'FREQUENCY, WEAPON, LIGHT_DARK';
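For comparison, the same frequency calculation is a few lines of plain Python with collections.Counter (the records here are made up):

```python
from collections import Counter

records = [
    {"WEAPON": "gun", "LIGHT_DARK": "D"},
    {"WEAPON": "gun", "LIGHT_DARK": "D"},
    {"WEAPON": "knife", "LIGHT_DARK": "L"},
    {"WEAPON": "gun", "LIGHT_DARK": ""},  # empty values filtered out, as in the WHERE clause
]

freq = Counter(
    (r["WEAPON"], r["LIGHT_DARK"])
    for r in records
    if r["WEAPON"] and r["LIGHT_DARK"]
)
for (weapon, light_dark), count in freq.most_common():
    print(count, weapon, light_dark)
# 2 gun D
# 1 knife L
```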

The Jupyter notebook with the result sets:

 

Adding IPython SQL magic to Jupyter notebook

If you do not use the %%sql magic in your Jupyter notebook, the output of your SQL queries will be just a plain list of tuples. A better way to work with the returned result sets is to render them as a table with headers. This is where the IPython SQL magic comes in very handy. You can install it with pip install ipython-sql; refer to its GitHub repository for implementation details.

You connect to a database once, and then all your subsequent SQL queries are aware of this connection, with their result sets drawn nicely as tables. Another neat feature of the %%sql magic is that you can get the result of a SQL query as a pandas data frame object if you would like to continue working with the result set in Python. The result of the last SQL query execution can be accessed through the variable _, because IPython's output caching system defines several global variables: _ (a single underscore) stores the previous output, just as in the IPython interpreter.

Look into this sample Jupyter notebook for illustration.

 

Using SQL Server constraints and geodatabase domains

Attribute constraints

Many ArcGIS users use geodatabase domains to restrict the values data editors can enter for certain attributes, either to a range or to a set of coded values. This functionality streamlines data editing and is very helpful indeed. However, having a geodatabase domain set for a field doesn't actually prevent users from typing in a value that is either outside the range or not in the list of coded values.

Illegal values can be entered either programmatically, using arcpy or SQL, or by editing the attribute table of a feature class with the Field Calculator. To find out which features have illegal attribute values, you need to select all of your features in an editing session and click Editor > Validate Features; this selects the features with illegal values.

But what if you would like to let your users pick only certain values when editing the features and prohibit entering any illegal values? To do this, you could use database constraints such as a foreign key constraint. In fact, I have already answered this exact question on GIS.SE: Restrict values to domain codes (beyond the attribute table).

At the end of the post, please look at the code snippet of what should be done in SQL.
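To give an idea of the pattern involved, here it is in SQLite syntax via the stdlib sqlite3 module (the CityType/Cities names mirror the example discussed below; everything else is illustrative). With a foreign key in place, the database itself rejects any value not present in the coded-values table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # FK enforcement is off by default in SQLite

# The lookup table plays the role of the geodatabase coded-value domain
conn.execute("CREATE TABLE CityType (Type TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO CityType VALUES (?)",
                 [("capital",), ("town",), ("village",)])

conn.execute("""
    CREATE TABLE Cities (
        Name TEXT,
        Type TEXT REFERENCES CityType(Type)
    )
""")

conn.execute("INSERT INTO Cities VALUES ('Oslo', 'capital')")  # legal value: accepted
try:
    conn.execute("INSERT INTO Cities VALUES ('Atlantis', 'myth')")  # illegal value
except sqlite3.IntegrityError as e:
    print("rejected:", e)  # rejected: FOREIGN KEY constraint failed
```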

Now you can use the Table To Domain GP tool, which lets you create a geodatabase domain from the dbo.CityType table as well as add the coded values into it. Then you can assign this domain to the Type field in the Cities feature class using the Assign Domain To Field GP tool.

Now users will get an error window in ArcMap (returned from SQL Server) every time they try to enter an illegal value into the field and save the feature. One thing to keep in mind when embracing this workflow: you need to go to Editor toolbar > Options > Attributes tab and enable the option Display the attributes dialog before storing new features. This is necessary if you don't specify a default value for this field among those stored in the dbo.CityType table; otherwise, newly created features will have no value for the Type attribute and you won't be able to digitize a feature on the map without getting the error message.

Spatial constraints

Another thing that may bug you is geodatabase topology. It's very handy when you have inherited a large suite of feature classes and would like to enforce some integrity rules concerning the spatial relationships between features in those feature classes. Using ArcGIS geodatabase topology, however, it is still possible to create a feature that is invalid in terms of its relationship with another feature (say, a school point inside a lake polygon); the only way to find this out is to validate the topology on the existing features. If your data is stored in a multi-user geodatabase, though, you can create your own rules that prohibit users from creating features that break them.

Using SQL Server triggers, it is possible to specify spatial rules and prevent the creation of features that don't follow them. Below is a simple example of a trigger that won't let ArcMap users digitize a point on the map to create a point feature located outside the boundaries of the state of California.
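As a schematic stand-in for such a trigger (in SQLite rather than T-SQL, with a plain bounding-box check on x/y columns replacing the real STWithin test against the California polygon; the bounds are rough illustrative numbers, not the actual state boundary):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (id INTEGER, x REAL, y REAL)")

# Reject inserts outside the box; a SQL Server trigger would instead
# test the inserted shape with STWithin against the boundary polygon
conn.execute("""
    CREATE TRIGGER points_in_bounds
    BEFORE INSERT ON points
    WHEN NEW.x NOT BETWEEN -125 AND -114
      OR NEW.y NOT BETWEEN 32 AND 42
    BEGIN
        SELECT RAISE(ABORT, 'point is outside the allowed area');
    END
""")

conn.execute("INSERT INTO points VALUES (1, -120.0, 37.0)")  # inside: accepted
try:
    conn.execute("INSERT INTO points VALUES (2, 0.0, 0.0)")  # outside: rejected
except sqlite3.IntegrityError as e:
    print(e)  # point is outside the allowed area
```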