Using SQL Server constraints and geodatabase domains

Attribute constraints

Many ArcGIS users rely on geodatabase domains to let data editors enter only certain values for certain attributes, either within a range or from a set of coded values. This functionality streamlines data editing and is very helpful indeed. However, having a geodatabase domain set for a field doesn’t actually prevent users from entering a value that is out of range or not in the list of coded values.

Illegal values can still be entered programmatically, using arcpy or SQL, or by editing the attribute table of a feature class with the Field Calculator. To find out which features have illegal attribute values, you would need to select all of your features in the editing session and click Editor > Validate Features, which selects the features with illegal values.

But what if you would like to let your users pick only certain values when editing features and prohibit entering any illegal values? To do this, you could use database constraints such as a foreign key constraint. In fact, I have already answered this exact question on GIS.SE: Restrict values to domain codes (beyond the attribute table).

The SQL code below shows the gist of what needs to be done.
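
A minimal sketch of the idea, assuming the Cities feature class has a Type column of a matching string type (the table and column names follow the rest of this post; the exact DDL can be found in the linked answer):

-- Lookup table that holds the legal values for the Type attribute
CREATE TABLE dbo.CityType (
    Type NVARCHAR(50) NOT NULL PRIMARY KEY
);

-- Sample coded values (illustrative)
INSERT INTO dbo.CityType (Type)
VALUES ('Capital'), ('Large city'), ('Small city');

-- From now on, any value written to Cities.Type must exist in dbo.CityType
ALTER TABLE dbo.Cities
ADD CONSTRAINT fk_cities_citytype
FOREIGN KEY (Type) REFERENCES dbo.CityType (Type);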

Now you can just use the Table To Domain GP tool, which lets you create a geodatabase domain from the dbo.CityType table and add the coded values into it. Then you can assign this domain to the Type field in the Cities feature class using the Assign Domain To Field GP tool.

Now users will get an error window in ArcMap (returned from SQL Server) every time they try to enter an illegal value into the field and save the feature. One thing to keep in mind when embracing this workflow is that you’d need to go to Editor toolbar > Options > Attributes tab and enable the option Display the attributes dialog before storing new features. This is necessary if you don’t specify a default value for this field that is stored within the dbo.CityType table; otherwise, newly created features would have no value in the Type attribute and you wouldn’t be able to digitize a feature on the map without getting the error message.

Spatial constraints

Another thing worth a look is geodatabase topology. It’s very handy when you have inherited a large suite of feature classes and would like to enforce integrity rules concerning the spatial relationships between features in those feature classes. With an ArcGIS geodatabase topology, though, it is still possible to create a feature that would be considered invalid in terms of its relationship with another feature (say, a school point inside a lake polygon); the only way to find this out is to validate the topology on the existing features. However, if your data is stored in a multi-user geodatabase, you can create your own database rules that prohibit users from creating features that break them.

Using SQL Server triggers, it is possible to specify such spatial rules and prevent the creation of features that don’t follow them. Below is a simple example of a trigger that won’t let ArcMap users digitize a point feature located outside the boundary of the state of California.
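
A minimal sketch, assuming a dbo.Schools point feature class and a dbo.States polygon feature class with a NAME column (both names are illustrative), both using the Geometry storage type and edited non-versioned (versioned edits go to the delta tables, so a trigger on the base table would not see them):

CREATE TRIGGER dbo.EnforceCaliforniaBoundary
ON dbo.Schools
AFTER INSERT, UPDATE
AS
BEGIN
    -- The boundary polygon the edited points are checked against
    DECLARE @california geometry =
        (SELECT TOP 1 Shape FROM dbo.States WHERE NAME = 'California');

    -- Reject the edit if any inserted or updated point falls outside
    IF EXISTS (SELECT 1 FROM inserted AS i
               WHERE i.Shape.STWithin(@california) = 0)
    BEGIN
        RAISERROR ('Point features must be located within California.', 16, 1);
        ROLLBACK TRANSACTION;
    END
END

The text passed to RAISERROR is what the editor will see in the ArcMap error window.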

Changing the order of fields in a geodatabase feature class

At some point, it might be necessary to change the order of fields in a feature class. Perhaps a new field has been added and you want all users who add this feature class to their map documents to see the field in the right place when opening the attribute table.

An easy solution that could work is to create a layer file (.lyr) that contains, among other things, the order of fields that was set up for the map layer at the moment the layer file was exported. Your users can then add this layer file into an ArcMap session to see the fields in the needed order when opening the attribute table or when editing the feature class. Creating and updating layer files can be done with Python and arcpy scripting.

If your data doesn’t reside in an enterprise geodatabase such as SQL Server, your only choice is to copy your feature class to a new feature class, providing a field mapping object that contains the information on the needed field order, and then replace the source feature class with the newly created one. This approach works for a standalone dataset that is not part of any complex container such as a topology, network dataset, or geometric network, and that doesn’t have any domains specified for its fields. Otherwise, you would need to take the feature class out of its containers (for instance, you cannot delete a feature class that is part of a network dataset), replace it with the feature class that has the right field order, and then reassign the domains, add the feature class back to its containers, and so forth. Many of those operations cannot be automated, which makes re-ordering the fields in a feature class quite complicated.

If your data resides in a DBMS such as SQL Server, though, you can use your DBMS tools to define the order of the fields. If you want to give your users access to a feature class with a particular field order, you could create a view using SQL or ArcGIS geoprocessing tools. However, a view cannot be edited in ArcMap (you won’t be able to update the underlying feature class through a view).
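
A quick sketch of such a view (all column names except OBJECTID and SHAPE are illustrative):

CREATE VIEW dbo.PARCELS_ordered AS
SELECT OBJECTID,
       PARCEL_ID,   -- columns listed in the order users should see them
       OWNER_NAME,
       SHAPE
FROM dbo.PARCELS;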

If you do need to change the field order of the original feature class and let users edit it, you can use SQL Server itself. Detailed instructions are available in this answer on SO; it can all be done in SSMS without writing any SQL queries.

I have tested this on a feature class that was part of a network dataset and a topology and also had domains associated with its fields. After changing the column order in SQL Server, I was able to use the feature class in ArcMap, and all behaviors associated with its containers seemed to be intact and working properly. The domains were left in place, too.

If you want to take a look at the SQL query that is being run behind the scenes, have a look at this question on the DBA site of SE.

To see the changes in the field order, restart ArcMap, as it seems to cache the order of fields in a feature class within the session. I’m not sure whether changing the order of fields for a feature class is officially supported by Esri, so be careful if you decide to do this – always back your data up first.

When writing any code that works with feature class columns, always refer to columns by name and not by index. If you are using arcpy and Python, it is super easy to use named tuples (I blogged about this earlier).

Then you no longer depend on the order of fields in your feature class: if the order changes, your code won’t break as long as the fields keep their names.

Geoprocessing history logging in ArcGIS: performance review

If you have used geoprocessing (GP) tools in the ArcGIS framework, you are probably familiar with the concept of GP history logging. Essentially, all GP operations that you perform using ArcToolbox GP tools are written to disk in a special folder. You probably already know that you can access the results of your GP tool executions (which are saved only within the map document) in the Results window in ArcMap. However, this information is also written to disk with all sorts of execution metadata, including the tool’s name, its parameters, the output information, and the environment settings. Please review the Help page Viewing tool execution history to learn more.

When working in an ArcMap session, whether the GP history is logged is determined by the GP settings (Geoprocessing menu > Geoprocessing Options). Don’t expect a performance boost just from disabling this setting, though, as it doesn’t cost much to write some data to disk – running a GP tool 100 times in isolation produces a log file of only about 1MB.

According to the Esri docs, however, it is worth noting that:

Logging can affect tool performance, especially in models that use iterators. For example, you may have a model that runs the Append tool thousands of times, appending features to the same feature class. If logging is enabled, each iteration in the model updates the metadata of the feature class, which slows down the performance of the model.

So, if your models or scripts execute GP tools many thousands of times, this will affect performance, as this information has to be collected internally by the software and then written to disk. In short, it is worth disabling this option if you don’t need to preserve any metadata of your operations. This can be particularly helpful when authoring large arcpy-based scripts where GP tools are run many times.

Keep in mind that according to the Esri docs,

for script tools and stand-alone scripts (scripts run outside of an ArcGIS application—from the operating system prompt, for example), history logging is enabled by default.

Another thing that is good to know is when exactly the writing occurs:

A session is defined by all the work performed from the time you open the application to the time you exit.

So when you run your Python script, a log file is created, named after the date and time at which the script started running, but the actual data is written only when the script has finished.

Luckily, there is an arcpy function that lets you disable geoprocessing history logging, arcpy.SetLogHistory(False).

The GP history is also written within the geodatabase itself, regardless of whether it’s a file-based geodatabase or an enterprise geodatabase (ArcSDE). Fortunately, the same setting controls whether it’s written – arcpy.SetLogHistory().

When the data you process is stored within a file-based geodatabase, the geoprocessing history is saved within the metadata stored for the associated geodatabase object, such as a feature class. Right-clicking the feature class and choosing Item Description won’t let you see the GP history, because the standard ArcGIS Item Description metadata style gives you only a brief overview. In order to view the GP history, go to the Customize > ArcMap Options window > Metadata tab and choose another Metadata Style, such as FGDC. Then, when right-clicking the feature class and choosing Item Description, you will be able to see all the GP tools that were run on this feature class under the Geoprocessing History section.

As for practical numbers: I created an empty file geodatabase, loaded a polygon feature class with a dozen polygons into it, and then ran the Calculate Field GP tool 1,000 times, calculating the same field over and over again. I ran this loop multiple times and saw a stable increase in the file geodatabase size of about 300KB per 1,000 GP tool executions.

When processing data stored within an enterprise geodatabase (backed by any supported DBMS), sometimes referred to as an ArcSDE geodatabase, keep in mind that when running any GP tool on a geodatabase object such as a table or a feature class, the metadata of the tool execution is also written as XML into the GDB_Items database table, in the Documentation column (of XML type). This XML metadata can get fairly large, as it contains information about the tool name, the input and output parameters, and more. A single execution of a GP tool adds one chunk of XML; so again, if you run a lot of GP tools on your feature class, the stored XML metadata can get very large, and queries against this database table will take longer as it takes more and more time to write and read the XML data. To give you a feeling for the size increase, running the Calculate Field GP tool 1,000 times made the GDB_Items table 500KB larger (I am on SQL Server and ran exec sp_spaceused 'sde.GDB_Items'). I ran this loop multiple times and saw roughly the same increase in size each time.
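
If you want to see which items carry the heaviest metadata payload, a query along these lines can help (a sketch; it assumes the geodatabase repository tables live in the sde schema, as in my setup):

SELECT Name,
       DATALENGTH(Documentation) AS DocumentationBytes
FROM sde.GDB_Items
ORDER BY DocumentationBytes DESC;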

This logging of the feature class metadata can be easily disabled, either in ArcMap by switching off this option in the Geoprocessing Options window or by using the arcpy.SetLogHistory(False) function in your Python scripts. Esri has a KB article, HowTo: Automate the process of deleting geoprocessing history, that can help you automate cleaning the GP history metadata out of your feature classes. The basic workflow is to export the metadata excluding the GP history and then import it back. This is the only workflow I can think of if you are using a file-based geodatabase (apart from re-creating the feature class, which also drops the GP history metadata). With an ArcSDE geodatabase, you can use SQL to clean up the GDB_Items table, deleting the relevant content of the Documentation column for the chosen feature classes. You would need to parse the XML and remove only the metadata/Esri/DataProperties/lineage/Process elements, as you might want to preserve the other metadata information.
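
Here is a hedged sketch of that SQL cleanup, using the modify() method of the XML data type (the feature class name is hypothetical; test this on a copy of the table first):

UPDATE sde.GDB_Items
SET Documentation.modify('delete /metadata/Esri/DataProperties/lineage/Process')
WHERE Name = 'mygdb.DBO.Parcels'  -- the item whose GP history should be dropped
  AND Documentation IS NOT NULL;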

Remember as always to test your workflows in the sandbox environment before applying any changes in the production database.

Design of WebGIS back-end: architecture considerations

I have spent the last two years doing a lot of Python development and designing and implementing a Web GIS built on ArcGIS Server, geoprocessing services, and an ArcGIS API for JavaScript (JS) web client. What I would like to do is share an idea I have come to like.

If you need to do something, try doing it at the back-end

Imagine you have a JS web application where users work with some feature services via a web map. They can select multiple features and calculate the sum of the values those features have in a field (or several fields). Let’s go through the alternatives you have.

  1. Pre-calculate the values you think your users will query and store them in the database.
    This works fine when you know your users will generate reports on certain fields often and performance is crucial. It can make sense to calculate certain values beforehand and store them. The disadvantages are the additional storage required and the need to keep the values updated – the calculated field depends on other fields whose values can change, which implies re-calculating the report field as part of a daily or weekly routine, depending on the workflow.
  2. Get the features’ data from the ArcGIS Server feature service and calculate the requested value on the fly in the client.
    Unless you are retrieving complex geometry, this operation won’t cost you much. The problem is that the volume of JS (or TypeScript) code will grow, and every upcoming modification implies a new release, which can be a painful process if you need to compress your code and move things around. Another thing is that if the amount of data you work with is rather large, there is a good chance the web browser will get slow and the performance will degrade significantly.
  3. Use the database server to calculate the values.
    This has become my favorite over the last years; the approach has multiple advantages (see the sketch after this list).
    First, the operation runs on the database server machine, which has enough RAM and CPU resources, so you are not limited by the web browser’s capacity. Database servers are very good at calculating values: this kind of operation is very inexpensive because in most cases it does not involve cursors. You also get to work within a transaction, which provides a higher level of data integrity (it is hard to mess up the database since you can always roll back).
    Second, you can use SQL. It might not sound like an advantage at first, but remember that code is written once and read many times. Readability counts. SQL is a clean way of communicating a workflow, and database code (such as stored procedures) is very easy to maintain. Unlike JS, you work with just one database object and don’t really have any dependencies on the system, provided you have a database server of a certain version and the privileges required to create and execute stored procedures.
    Finally, by letting the database server do the work, you expose the procedure to any other client that can work with it. You don’t need to modify the client code: by updating the SQL code in one place, you automatically make it available to all the applications that use it.
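
As a minimal sketch of alternative 3, here is a stored procedure that sums a numeric field over a set of selected features. The table and column names are hypothetical, and STRING_SPLIT assumes SQL Server 2016 or later:

-- Sums a numeric column over the features the client has selected
CREATE PROCEDURE dbo.SumParcelArea
    @objectIds NVARCHAR(MAX)  -- comma-separated OBJECTIDs sent by the web client
AS
BEGIN
    SET NOCOUNT ON;
    SELECT SUM(p.AREA_HA) AS TotalArea
    FROM dbo.PARCELS AS p
    WHERE p.OBJECTID IN (SELECT CAST(value AS INT)
                         FROM STRING_SPLIT(@objectIds, ','));
END

A geoprocessing service or a thin web service endpoint can call this procedure and return a single number to the JS client.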

How to be efficient as a GIS professional (part 3)

6. Automate, automate, automate

Whatever you are doing, take a second to think whether you will need to run the sequence of steps you’ve just completed again. It may seem at first that you are very unlikely to run the same sequence of steps again, but in fact you may find yourself performing it over and over again later on.

Automation is not only about saving time; it is also about quality assurance. When you do something manually, there is always a chance of forgetting a certain step or detail, which can potentially lead to an error. When the workflow is automated, you can always see exactly which steps are performed. An automated workflow is also a piece of documentation which you can share with others or use yourself as a reference.

Don’t trust your memory: you think you know what columns you’ve added to the table and why. Get back to it in two weeks and you will be surprised by how little of those memories is left. If you leave the job and hand the work over to a new person, they will be happy to inherit well-maintained documentation and a clear description of the workflows they will be responsible for.

For desktop GIS automation, think about using Python for geospatial operations (think truncating tables + appending new data + performing data checks). For database automation, use SQL (adding new columns + altering column data types). Feel free to build SQL scripts with commands for adding/deleting/calculating columns and copying data, too. By preserving those scripts, you will always be able to re-run them on another table or in another database, or modify a script to match your needs. This also gives you a record of the changes performed in your database – just like adding a field manually and then writing down that you added a field of type X to table Y at time Z, except that building a SQL script saves you the note-taking.
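
A tiny sketch of such a repeatable script (the table and column names are made up; GO is the SSMS batch separator, needed here because the new column is referenced later in the same script):

-- Add a new column, backfill it, then tighten the constraint
ALTER TABLE dbo.Parcels ADD LandUseCode NVARCHAR(10) NULL;
GO

UPDATE dbo.Parcels
SET LandUseCode = 'UNK'  -- placeholder for rows that have no code yet
WHERE LandUseCode IS NULL;
GO

ALTER TABLE dbo.Parcels ALTER COLUMN LandUseCode NVARCHAR(10) NOT NULL;

Kept under version control, scripts like this double as a changelog of your database schema.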

7. SQL, SQL, SQL

Another advantage of SQL for data processing is that it is largely vendor-neutral and can be executed as is, or with really minor adjustments, on most DBMS platforms. This also applies to the SQL spatial functions, which provide ISO- and OGC-compliant access to the geodatabase and database alike. Being able to execute SQL queries and perform data management operations is a real advantage when you work in a large IT environment, because you won’t always have a network connection to the production environment for data updates, and using ArcGIS might not be possible. Running a Python script requires a Python installation on some machine, and if you use arcpy, ArcGIS Desktop as well. Running SQL code, which has no such dependencies, might be your only alternative.

Many folks don’t know that one can use pure SQL with an enterprise geodatabase stored in any supported DBMS – from querying attributes and geometries to creating tables and managing data; the SQL Server section below walks through a number of examples.

8. Python, Python, Python

I have blogged about using the spatial functions of SQL Server earlier. Remember that you can also execute some SQL from Python code using the arcpy.ArcSDESQLExecute class. Here is the SQL reference for query expressions used in ArcGIS, some of which you can use in the where clauses of the arcpy.da cursors. Learn some of the useful Python libraries which could save you time. Look at:

  • Selenium for automating FTP data downloads if this happens often and you have to browse through a set of pages;
  • the scipy.spatial module for spatial analysis, such as building Voronoi diagrams, finding distances between arrays of points, constructing convex hulls in N dimensions, and many other things;
  • NumPy, a fundamental package for scientific computing with Python, for handling huge GIS datasets (both vectors and rasters) with arcpy.

Read more in What are the Python tools/modules/add-ins crucial in GIS and watch the Esri video Python: Useful Libraries for the GIS Professional.

Get a chance to learn more about SQL and Python and how you can take advantage of them in your work!

SQL Server spatial functions for GIS users

If you have been using SQL Server for some time, you’ve probably heard of its spatial data support. This might be particularly interesting for anyone who uses a desktop GIS for data management and analysis. If you are an ArcGIS user with enterprise geodatabases stored in SQL Server databases, you might have wondered whether it is possible to interact with the spatial data directly. This is useful when you don’t have a chance to use ArcMap to access the database due to restrictions (permissions, network connections, or software compatibility).

Well, you can actually do a whole lot with your geographic data just with SQL. It is important that the Shape field is defined as the Geometry or Geography data type. For most GIS work, you’d probably choose the Geometry type, which represents data in a Euclidean (flat) coordinate system. As soon as you have a geodatabase feature class whose Shape field is of the Geometry type, you can use native SQL Server tools to interact with both the feature class attributes and its geometry.

Beginning with ArcGIS 10.1, feature classes created in geodatabases in SQL Server use the Microsoft Geometry type by default. To move your existing feature classes to the Geometry storage type, use the Migrate Storage geoprocessing tool or a Python script.

Alright, so after you have copied your file geodatabase feature class into a SQL Server geodatabase, you are ready to use native SQL to interact with the spatial data.

Let’s select all the features from the Parcels feature class.

SELECT * FROM dbo.PARCELS

Because we have a SHAPE column of the Geometry type, we get another tab in the results grid – Spatial results – where you can see your geometries visualized.


Let’s see what coordinate system our feature class was defined in.

DECLARE @srid INT = (SELECT TOP 1 shape.STSrid FROM dbo.PARCELS)
SELECT @srid AS SRID,
srtext AS Name FROM sde.SDE_spatial_references WHERE auth_srid = @srid

Here we use <GeometryColumnName>.STSrid to get the spatial reference id (SRID) of the coordinate system of the first feature. Because our geographic data is stored in a projected coordinate system (and uses the Geometry type), we cannot get its name from the core SQL Server spatial references table, sys.spatial_reference_systems.

Here is why:

The coordinate systems in this table are for the geography type only as it contains information about the ellipsoid that is required to perform calculations. No such information is required to perform calculations for projected coordinate systems on the plane used by the geometry type, so you are free to use any reference system you like. For the calculations done by the geometry type, it is the same no matter what you use.

Let us explore next what kind of geometry is stored within a table. It is possible to store different types of geometry (such as polygon and polyline) within one table in SQL Server.

Let us see if it is true:

SELECT Id,GeomData AS Geometry,GeomData.STAsText() AS GeometryData
FROM [testgdb].[dbo].[GeneralizedData]


Indeed, we store features of different geometry types in one table, and SQL Server has no problem with that. To be able to visualize this table in ArcMap, though, you would need to use a query layer, which is basically a stand-alone table defined by a SQL query. ArcGIS can handle only one type of geometry per feature class, which is why you will be asked to choose which type of geometry you want to look at.

After adding this layer into ArcMap, you will be able to see the polygons (provided you’ve chosen the polygons). The query layer is read-only, so you cannot edit its features in ArcMap. If you have a SQL Server table (not registered with the geodatabase) with multiple types of geometries stored, you can switch easily between them by adding multiple query layers into ArcMap, each defining the kind of geometry you want to work with.

Let us keep working with a geodatabase feature class that has only polygons. Let’s check that this is true:

SELECT Shape.STGeometryType() AS GeometryType FROM dbo.PARCELS

Alright, so we know already what kind of coordinate system the data is stored in and we know that there are polygons. Let us get the perimeter (length) and the area of those polygons.

SELECT PARCEL_ID, SHAPE.STArea() AS Area,
SHAPE.STLength() AS Perimeter
FROM dbo.PARCELS 

We can also get the coordinates of each polygon within our feature class. Note that the start point and the end point are identical – that is because each polygon is considered to be closed:

SELECT Shape.STAsText() AS GeometryWKT FROM dbo.PARCELS

This is what we will get:
POLYGON ((507348.9687482774 687848.062502546, 507445.156252367 687886.06251058145, 507444.18750036607 687888.56250258372, 507348.9687482774 687848.062502546))

There are similar functions such as .STAsBinary() which returns the Open Geospatial Consortium (OGC) Well-Known Binary (WKB) representation of a geometry instance and .AsGml() which returns the Geography Markup Language (GML) representation of a geometry instance.
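
For example, if you ever need to hand geometries over to another system, you can pull several representations in one query (a quick sketch):

SELECT TOP 1 Shape.STAsText() AS ShapeWKT,
       Shape.STAsBinary() AS ShapeWKB,
       Shape.AsGml() AS ShapeGML
FROM dbo.PARCELS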

We can also check the number of vertices per polygon:

SELECT PARCEL_id,
Shape.STAsText() AS GeometryDesc,
Shape.STNumPoints() AS NumVertices
FROM dbo.PARCELS
ORDER BY NumVertices DESC

Alright, that was probably enough querying data. Let us check what kind of GIS analysis is available to us with native SQL. The easiest way to get started is probably to process the features of a feature class and write the resulting geometries into a new table.

DROP TABLE [ParcelEnvelope]

CREATE TABLE [dbo].[ParcelEnvelope](
    [Id] [int] NOT NULL,
    [PolyArea] [int],
    [GeomData] [geometry] NOT NULL
) ON [PRIMARY]

INSERT INTO ParcelEnvelope (Id, GeomData, PolyArea)
SELECT PARCEL_ID AS Id,
    SHAPE.STEnvelope() AS GeomData,
    SHAPE.STArea() AS PolyArea
FROM dbo.PARCELS
ORDER BY OBJECTID

This will create a new table where the envelopes of each parcel polygon will be written to.

Let us now create some buffers from a road centerlines geodatabase feature class:

DROP TABLE [RoadBuffer]

CREATE TABLE [dbo].[RoadBuffer](
    [Id] [int] NOT NULL,
    [GeomData] [geometry] NOT NULL
) ON [PRIMARY]

INSERT INTO [RoadBuffer] (Id, GeomData)
SELECT OBJECTID AS Id,
    SHAPE.STBuffer(50) AS GeomData
FROM dbo.Road_cl
ORDER BY OBJECTID

You can, of course, write newly generated features into a geodatabase feature class, not just a SQL Server database table. You need to create a new polygon feature class first and then run the SQL below, which creates buffer zones for every line found in the Road_cl feature class.

DELETE FROM FC_ROADBUFFERS
INSERT INTO FC_ROADBUFFERS(OBJECTID,SHAPE)
SELECT OBJECTID AS OBJECTID,
SHAPE.STBuffer(50) AS SHAPE
FROM dbo.Road_cl
ORDER BY OBJECTID

Please refer to the Microsoft Geometry Data Type Method Reference to get a full list of available functions and more detailed description.

Try doing some other analysis, such as finding which features intersect or overlap, or counting how many points are located within a certain polygon (see the sketch below) – there is so much you can do! To learn more, get the book Beginning Spatial with SQL Server 2008, which has tons of examples and will also help you understand the basics of spatial data structures. I have read this book and really liked it; I think it is a must-read for anyone using spatial SQL.
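
As a taste of that kind of analysis, here is a hedged sketch that counts points inside each polygon; dbo.PARCELS is the feature class used above, while dbo.WELLS is a hypothetical point feature class:

SELECT p.PARCEL_ID,
       COUNT(w.OBJECTID) AS WellCount
FROM dbo.PARCELS AS p
JOIN dbo.WELLS AS w
    ON p.SHAPE.STContains(w.SHAPE) = 1
GROUP BY p.PARCEL_ID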

I hope this short introduction to what you as a GIS user can do with SQL Server will help you take advantage of the native SQL functions whenever using a desktop GIS is not an option.

More useful SQL Server commands

When you modify a table in SQL Server (I still mostly use SQL Server 2005) and try to uncheck the Allow Nulls option for a certain field (thus prohibiting NULL values in this column), the following error may occur:

Cannot insert the value NULL into column '<columnname>', table '<dbname>.dbo.Tmp_<tablename>'; column does not allow nulls. INSERT fails. The statement has been terminated.

This error message appears because there are already null values in the column. So, what you should do is update the data in the table so that this column no longer contains any null values. If the column with null values is of the integer (int) type, you might update it with this SQL statement:

UPDATE <databasename>.dbo.<tablename>

SET <integercolumnname> = -1

Then you can go to the table’s design mode again and uncheck the Allow Nulls option. To prohibit NULL values using an SQL command, you could type:

ALTER TABLE <tablename>

ALTER COLUMN <columnname> <datatype> NOT NULL

Example:

ALTER TABLE Employees

ALTER COLUMN EmpName nvarchar(40) NOT NULL

This will force users not to leave this field in the table empty.

Dealing with null values in geodatabase feature classes

It is impossible to change the Allow NULL values option for a field in a feature class that already contains data. The only possible solution, to my knowledge, is to create a new empty feature class, define all required fields and data types, set Allow NULL values appropriately, and then load the data from the original feature class into the newly created one. More info in ESRI KB #34782.

How to add UNIQUE constraint for a field in SQL Server table

Sometimes it is necessary to allow users to add only unique values to a certain column. This can be done by running this command:

alter table <tablename>

add unique (<columnname>)
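
If you prefer to control the constraint name (which makes it easier to drop or alter the constraint later), you can name it explicitly; a small sketch with made-up names:

alter table Employees

add constraint uq_employees_empcode unique (EmpCode)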

It is worth noting that in ArcMap, when you edit attributes of an SDE feature class for which you have set a unique constraint in SQL Server and try to enter an existing value into the constrained field, you will get an error message and the value will not be stored.

Adding a non-unique value

Thus, you can enforce the integrity of your data at the ArcMap and DBMS levels at the same time.

Update: use the Attribute Inspector (available from the Editor toolbar > Options > Attributes tab). Check the Display the attributes dialog before storing new features option. This will allow you to enter attributes before the rows are written to the database at the DBMS level.

If you are unsure what constraints have been defined for a field, you might like using the sp_help command, which will give you all the information about the table.

Just start a new query and run

sp_help '<tablename>'

Example:

sp_help 'Employees'

How to select only certain rows from a table

If you need to select rows that match a specific query, you might like using this syntax:

select * from <table>

where <columnname> in (value1, value2, value3)

Example:

select * from Employees

where EmplRegion in ('South', 'Central', 'North')

If you need to perform a more complex query, then it might be useful to give this syntax a try:

select * from <table>

where <columnname> in (select <columnname> from <tablename> where <columnname> is not null)

Example:

select * from Employees

where EmplRegion in (select Districts from NorthDistricts where ReportedBy is not null)

This query will select all the rows from the Employees table for employees whose region matches a district where a report has been submitted.

To get more tips on using SQL Server, I recommend visiting http://www.sql-server-helper.com – a great web site containing a lot of useful information.