Using SQL Server constraints and geodatabase domains

Attribute constraints

Many ArcGIS users rely on geodatabase domains to let data editors enter only certain values for certain attributes, either within a range or from a set of coded values. This functionality streamlines data editing and is very helpful indeed. However, having a geodatabase domain set for a field doesn’t actually prevent users from entering a value that is either out of range or not in the list of coded values.

Illegal values can be entered either programmatically, using arcpy or SQL, or by editing the attribute table of a feature class with the Field Calculator. To find out which features have illegal attribute values, you would need to select all of your features in the editing session and click Editor > Validate Features. This will select the features with illegal values.

But what if you would like to let your users pick only certain values when editing the features and prohibit entering any illegal values? To do this, you could use database constraints such as a foreign key constraint. In fact, I have already answered this exact question on GIS.SE: Restrict values to domain codes (beyond the attribute table).

At the end of the post, please look at the code snippet of what should be done in SQL.
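For reference, a minimal sketch of such a setup in T-SQL, assuming a dbo.CityType lookup table and a Cities business table with a Type column (the table, column, and coded values here are illustrative and should be adjusted to your schema):

```sql
-- Create a lookup table holding the legal values for the Type attribute
CREATE TABLE dbo.CityType (
    Type NVARCHAR(50) NOT NULL PRIMARY KEY
);

INSERT INTO dbo.CityType (Type)
VALUES (N'city'), (N'town'), (N'village');

-- Add a foreign key so the Cities table only accepts values
-- present in the lookup table
ALTER TABLE dbo.Cities
ADD CONSTRAINT FK_Cities_CityType
FOREIGN KEY (Type) REFERENCES dbo.CityType (Type);
```

Any attempt to save a feature with a Type value not present in dbo.CityType will then be rejected by SQL Server itself.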

Now you can just use the Table To Domain GP tool, which lets you create a geodatabase domain from the dbo.CityType table and add the coded values into it. Then you can assign this domain to the Type field in the Cities feature class using the Assign Domain To Field GP tool.

Now users will get an error window in ArcMap (returned from SQL Server) every time they try to enter an illegal value into the field and save the feature. One thing to keep in mind when embracing this workflow is that you’d need to go to Editor toolbar > Options > Attributes tab and enable the option Display the attributes dialog before storing new features. This is necessary if you don’t specify a default value for this field that is stored within the dbo.CityType table. Otherwise, newly created features will have no value associated with the Type attribute, and you won’t be able to digitize a feature on the map without getting the error message.

Spatial constraints

Another thing that may bug you is geodatabase topology. It’s very handy when you have inherited a large suite of feature classes and would like to enforce integrity rules concerning the spatial relationships between features in those feature classes. With ArcGIS geodatabase topology, however, it is still possible to create a feature that would be considered invalid in terms of its relationship with another feature (say, a school point inside a lake polygon); the only way to find this out is to validate the topology on existing features. If your data is stored in a multi-user geodatabase, though, you can create your own rules that prohibit users from creating features that break them.

Using SQL Server triggers, it is possible to specify spatial rules and prevent the creation of features that don’t follow them. Below is a simple example of a trigger that won’t let ArcMap users digitize a point feature located outside the boundary of the state of California.
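A sketch of such a trigger in T-SQL, assuming a dbo.Points feature class and a dbo.States polygon feature class stored with the Microsoft geometry type (the table, column, and state names are illustrative):

```sql
-- Reject inserts of point features that fall outside the California polygon
CREATE TRIGGER dbo.PointsInsideCalifornia
ON dbo.Points
AFTER INSERT
AS
BEGIN
    IF EXISTS (
        SELECT 1
        FROM inserted AS i
        CROSS JOIN (SELECT Shape
                    FROM dbo.States
                    WHERE StateName = N'California') AS s
        WHERE i.Shape.STWithin(s.Shape) = 0
    )
    BEGIN
        RAISERROR (N'Point features must be located inside California.', 16, 1);
        ROLLBACK TRANSACTION;
    END
END;
```

When the trigger rolls back the transaction, ArcMap surfaces the error message to the editor and the feature is not stored.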


Load Esri geodatabase tables into SQLite

I have blogged earlier about loading Esri file geodatabase feature classes into a SQLite database using standard geoprocessing tools. This time, I’ve written a short sample that creates a SQLite database, reads the rows of a file geodatabase feature class, and then creates a new table and inserts the features into it. Everything except reading the file geodatabase (I use arcpy to access features with a cursor) is done using pure Python and SQL. The logic is rather simple: you read the schema of a geodatabase table and then mirror it inside a SQLite database table. Then you insert the features’ geometry and attributes into the right columns.
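The mirroring logic can be sketched in pure Python with the stdlib sqlite3 module. Here a small list of tuples stands in for the rows an arcpy.da.SearchCursor would return, and WKT strings stand in for the feature geometries; the field names and data are made up for illustration:

```python
import os
import sqlite3
import tempfile

# Schema as it might be read from a geodatabase table
# (field name, SQLite type); geometry is carried as WKT text here
schema = [("objectid", "INTEGER"), ("name", "TEXT"),
          ("pop", "INTEGER"), ("wkt", "TEXT")]

# Rows as they might come from an arcpy.da.SearchCursor with
# ["OID@", "NAME", "POP", "SHAPE@WKT"]; hard-coded for illustration
rows = [
    (1, "Springfield", 167000, "POINT (10 20)"),
    (2, "Shelbyville", 74000, "POINT (30 40)"),
]

# Create the SQLite database and mirror the schema in a new table
db_path = os.path.join(tempfile.mkdtemp(), "features.sqlite")
conn = sqlite3.connect(db_path)
cols = ", ".join("{0} {1}".format(n, t) for n, t in schema)
conn.execute("CREATE TABLE cities ({0})".format(cols))

# Insert the attributes and geometry into the right columns
placeholders = ", ".join("?" for _ in schema)
conn.executemany("INSERT INTO cities VALUES ({0})".format(placeholders), rows)
conn.commit()

# The data is now queryable with plain SQL
total_pop = conn.execute("SELECT SUM(pop) FROM cities").fetchone()[0]
print(total_pop)  # 241000
```

In a real script, the schema list would be built from arcpy.ListFields and the rows would come straight from the cursor.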

I was pleased to observe excellent performance of data insertion and query processing. Of course, the initial data load is an overhead that will be present in the script, but the efficiency of data manipulation using SQL and spatial analysis using ST_Geometry functions outweighs it, in my opinion. If you work with rather large datasets, you should consider tuning the performance of SQLite spatial indices, because using ST_Geometry may be slow initially.

It is also worth noting that with some minor modifications, it would be possible to load a non-spatial file geodatabase table into a SQLite database table if you want to perform some SQL-based data aggregation. A particularly handy feature of SQLite is that such a database can be generated completely in memory without writing anything to disk. This is suitable for scripts where you want to do the processing on the fly without creating any additional scratch workspaces. Use the syntax for creating SQLite databases in memory (:memory:). I’ve done tests loading the spatial extension and some data, and it worked fine.
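A minimal sketch of the in-memory variant for non-spatial aggregation (the table and values are made up):

```python
import sqlite3

# The special :memory: path keeps the whole database in RAM;
# nothing is written to disk
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parcels (zone TEXT, area REAL)")
conn.executemany("INSERT INTO parcels VALUES (?, ?)",
                 [("residential", 120.0), ("residential", 80.0),
                  ("industrial", 300.0)])

# SQL-based aggregation without any scratch workspace on disk
result = dict(conn.execute(
    "SELECT zone, SUM(area) FROM parcels GROUP BY zone"))
print(result)  # {'industrial': 300.0, 'residential': 200.0}
```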


Changing the order of fields in a geodatabase feature class

At some point, it might be necessary to change the order of fields in a feature class. Perhaps a new field has been added, and you want all users who add this feature class to their map documents to see the field in the right place when opening the attribute table.

An easy solution that could work is to create a layer file (.lyr) that contains, among other things, the order of fields that was set up for the map layer at the moment of the layer file export. Your users can then add this layer file into an ArcMap session to see the fields in the needed order when opening the attribute table or editing features. Creating and updating layer files can be done with Python and arcpy scripting.

If your data doesn’t reside in an enterprise geodatabase, such as SQL Server, your only choice is to copy your feature class to a new feature class, providing a field mapping object that contains the needed field order, and then replace the source feature class with the newly created one. This approach works for a standalone dataset that is not part of any complex container such as a topology, a network dataset, or a geometric network, and that doesn’t have any domains specified for its fields. Otherwise, you would need to take the feature class out of its containers (for instance, you cannot delete a feature class that is part of a network dataset), replace it with the feature class that has the right field order, and then reassign the domains, add the feature class back to its containers, and so forth. Many of those operations cannot be automated, which makes re-ordering the fields in a feature class quite complicated.

If your data resides in a DBMS such as SQL Server, though, you can use your DBMS tools to define the order of the fields. If you want to provide your users access to a feature class with a particular field order, you could create a view using SQL or ArcGIS geoprocessing tools. However, a view cannot be edited in ArcMap (you won’t be able to update the underlying feature class through a view).
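Such a view can be as simple as the following T-SQL (the table and column names are illustrative):

```sql
-- Expose the columns of the Cities feature class in the desired order
CREATE VIEW dbo.Cities_ordered AS
SELECT OBJECTID, Name, Type, Population, Shape
FROM dbo.Cities;
```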

If you do need to change the field order of the original feature class and let users edit it, you can use SQL Server itself. The detailed instructions are available in this answer on SO. This can be done in SSMS without writing any SQL queries.

I have tested this on a feature class that was part of a network dataset and a topology and also had domains associated with its fields. After changing the column order in SQL Server, I was able to use the feature class in ArcMap, and all behaviors associated with its containers seemed to be intact and working properly. The domains were left in place, too.

If you want to take a look at the SQL query that is being run behind the scenes, look at this answer on the DBA site of SE.

To see the changes in the field order, restart ArcMap, as it seems to cache the order of fields in a feature class within the session. I’m not sure whether changing the order of fields in a feature class is officially supported by Esri, so be careful if you decide to do this, and always back your data up first.

When writing any code that works with feature class columns, always refer to columns by name, not by index. If you are using arcpy and Python, it is super easy to use named tuples (I blogged about this earlier).

Then you no longer depend on the order of fields in your feature class: if the order changes, as long as a field keeps the same name, your code won’t break.
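The idea can be sketched in pure Python. Here a plain list of tuples stands in for the rows an arcpy.da.SearchCursor would yield, and the field names mirror what you would pass to the cursor (all names and values are illustrative):

```python
from collections import namedtuple

# Field names as you would pass them to arcpy.da.SearchCursor
fields = ["objectid", "name", "pop"]
Row = namedtuple("Row", fields)

# Rows as returned by the cursor; hard-coded here for illustration
cursor = [(1, "Springfield", 167000), (2, "Shelbyville", 74000)]

# Wrap each tuple so attributes are addressed by name, not by index
rows = [Row(*t) for t in cursor]
largest = max(rows, key=lambda r: r.pop)
print(largest.name)  # Springfield
```

If the field order in the feature class changes later, only the `fields` list needs to match the cursor; every `r.name`-style access stays valid.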

Geoprocessing history logging in ArcGIS: performance review

If you have used geoprocessing (further GP) tools in the ArcGIS framework, you are probably familiar with the concept of GP history logging. Essentially, all GP operations you perform using ArcToolbox GP tools are written to disk in a special folder. You probably already know that you can access the results of your GP tool executions (which are saved only within the map document) in the Results window in ArcMap. However, this information is also written to disk with all sorts of execution metadata, including the tool’s name, its parameters, the output information, and the environment settings. Please review the Help page – Viewing tool execution history – to learn more.

When working in an ArcMap session, whether the GP history is logged is determined by the GP settings (Geoprocessing menu > Geoprocessing Options). Yet after disabling this setting you may not see a performance boost, as it doesn’t cost much to write some data to disk – running a GP tool 100 times in isolation produces a log file of about 1 MB.

According to the Esri docs, however, it might be worth noting that:

Logging can affect tool performance, especially in models that use iterators. For example, you may have a model that runs the Append tool thousands of times, appending features to the same feature class. If logging is enabled, each iteration in the model updates the metadata of the feature class, which slows down the performance of the model.

So, if your models or scripts execute GP tools many thousands of times, this will affect performance, as this information has to be collected internally by the software and then written to disk. In a word, it might be worth disabling this option if you don’t need to preserve any metadata of your operations. This might be particularly helpful when authoring large arcpy-based scripts where GP tools are run many times.

Keep in mind that according to the Esri docs,

for script tools and stand-alone scripts (scripts run outside of an ArcGIS application—from the operating system prompt, for example), history logging is enabled by default.

Another thing that is good to know is when exactly the writing occurs:

A session is defined by all the work performed from the time you open the application to the time you exit.

So when you run your Python script, a log file is created, named after the date and time when you started running your script, but the actual data is written only when the script has finished running.

Luckily, there is an arcpy function that lets you disable geoprocessing history logging: arcpy.SetLogHistory(False).

Another thing is that the GP history is also written within the geodatabase itself, regardless of whether it’s a file-based geodatabase or an enterprise geodatabase (ArcSDE). Fortunately, the same setting controls whether it’s being written – arcpy.SetLogHistory().

When the data you process is stored within a file-based geodatabase, the geoprocessing history is saved within the metadata stored for the associated geodatabase object, such as a feature class. Right-clicking the feature class and choosing Item Description won’t let you see the GP history. This is because the standard ArcGIS Item Description metadata style gives you only a simplified view. In order to view the GP history, go to the Customize > ArcMap Options window > Metadata tab and choose some other Metadata Style, such as FGDC. Then, when right-clicking the feature class and choosing Item Description, you will see all the GP tools that were run on this feature class under the Geoprocessing History section.

As for practical numbers: I’ve created an empty file geodatabase, loaded in a polygon feature class with a dozen polygons, and then run the Calculate Field GP tool 1,000 times, calculating the same field over and over again. I’ve run this loop multiple times and have seen a steady increase in the file geodatabase size of about 300 KB per 1,000 GP tool executions.

When processing data stored within an enterprise geodatabase (backed by any supported DBMS), sometimes referred to as an ArcSDE geodatabase, keep in mind that when running any GP tool on a geodatabase object such as a table or a feature class, the metadata of the tool execution is also written as XML into the GDB_Items database table, in the Documentation column (of XML type). This XML metadata can get fairly large, as it contains information about the tool name, input and output parameters, and more. A single execution of a GP tool adds one line of XML; so again, if you run a lot of GP tools on your feature class, the stored XML metadata can get very large, and queries against this database table will take longer to process because it takes more and more time to write/read the XML data. To give you a feeling for the size increase, running the Calculate Field GP tool 1,000 times made the GDB_Items table 500 KB larger (I am on SQL Server and ran exec sp_spaceused 'sde.GDB_Items'). I’ve run this loop multiple times and have seen roughly the same increase in size.

This logging of the feature class metadata can also be easily disabled, either in ArcMap by switching off this option in the Geoprocessing Options window, or by using the arcpy.SetLogHistory(False) function in your Python scripts. Esri has a KB article, HowTo: Automate the process of deleting geoprocessing history, that can help you automate cleaning up the GP history metadata from your feature classes. The basic workflow is to export the metadata excluding the GP history and then import it back. This is the only workflow I can think of if you are using a file-based geodatabase (apart from re-creating the feature class, which will also drop the GP history metadata). With an ArcSDE geodatabase, you can use SQL to clean up the GDB_Items table, deleting the content of the Documentation column for the chosen feature classes. You would need to parse the XML to clean up the metadata/Esri/DataProperties/lineage/Process element, as you might want to preserve other metadata information.
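A sketch of that cleanup in T-SQL using XML DML; the item name in the WHERE clause is illustrative, and the element path may differ between repository versions, so test this on a backup first:

```sql
-- Remove only the lineage Process elements from a feature class's metadata,
-- keeping the rest of the XML document intact
UPDATE sde.GDB_Items
SET Documentation.modify(
    'delete /metadata/Esri/DataProperties/lineage/Process')
WHERE Name = 'sde.dbo.Cities';
```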

Remember as always to test your workflows in the sandbox environment before applying any changes in the production database.

ArcPy geoprocessing package

In ArcGIS 10, Esri has introduced a new geoprocessing package, ArcPy, which is based on Python – a popular open scripting language. This package is supposed to replace the arcgisscripting module used in ArcGIS 9.3 and contains plenty of new functions that can help you be more efficient when managing your GIS workflows.

Today, I would like to focus on using the arcpy.mapping module. This module is included in the ArcPy package, and its primary use is to manage map documents (.mxd) and layer files (.lyr). It can be used very effectively when data sources in .mxd map documents need to be updated or redirected to other locations.

To learn more about the arcpy.mapping module, please refer to the Esri help page: Updating and fixing data sources with arcpy.mapping

Of course, you can change the source paths using ArcCatalog, yet it does not provide the same level of access to the underlying objects as the ArcPy package. Please refer to the Esri Knowledge Base article and Help page to learn more about how to change data sources in your .mxd map documents:

HowTo: Use the ‘Set Data Source’ tool for map documents in ArcCatalog and Setting data sources, respectively.

Below is a script which can be used for changing an SDE data source in one .mxd map document.

import arcpy

# Define the path to the .mxd map document you want to update
mxd = arcpy.mapping.MapDocument(r"C:\Temp\Esri_sde_service.mxd")

# Define which connection strings will be changed. Here just one connection
# string is changed; it is possible to add more findAndReplaceWorkspacePaths
# calls for the same map document. The .sde paths below are examples only.
mxd.findAndReplaceWorkspacePaths(r"C:\Temp\Production@sde.sde",
                                 r"C:\Temp\Development@sde.sde")

# The path where the new .mxd map document will be saved to
mxd.saveACopy(r"C:\Temp\Esri_sde_service_updated.mxd")
del mxd

Here is a script which can be used for changing multiple SDE data sources in multiple .mxd map documents.

import arcpy, os

# Define the path to the folder where multiple .mxd map documents are stored
folderPath = r"C:\GIS\Maps"

for filename in os.listdir(folderPath):
    fullpath = os.path.join(folderPath, filename)
    if os.path.isfile(fullpath):
        basename, extension = os.path.splitext(fullpath)
        if extension.lower() == ".mxd":
            mxd = arcpy.mapping.MapDocument(fullpath)

            # Define which connection strings will be changed. Here two
            # connection strings are changed: this one for vector data
            mxd.findAndReplaceWorkspacePaths(r"C:\Users\username\AppData\Roaming\ESRI\Desktop10.0\ArcCatalog\Production@sde.sde", r"C:\Users\username\AppData\Roaming\ESRI\Desktop10.0\ArcCatalog\Development@sde.sde")

            # and this one for raster data, which may have used another
            # connection string
            mxd.findAndReplaceWorkspacePaths(r"C:\Documents and Settings\username\Application Data\ESRI\ArcCatalog\raster_data.sde", r"C:\Documents and Settings\username\Application Data\ESRI\Desktop10.0\ArcCatalog\raster_dataV10.sde")

            # You can add even more definitions of connection strings to be
            # changed. Note that an update from 9.3 to 10 is going on here,
            # which you can see since the paths to the .sde files differ
            mxd.findAndReplaceWorkspacePaths(r"C:\Documents and Settings\username\Application Data\ESRI\ArcCatalog\TopologyData.sde", r"C:\Documents and Settings\username\Application Data\ESRI\Desktop10.0\ArcCatalog\TopologyData_manager.sde")

            # Save the changes back to the map document
            mxd.save()
            del mxd

Configuring ArcGIS Server for SQL Server course

Recently I have been teaching the ArcGIS Server Enterprise Configuration and Tuning for SQL Server course, which Esri now also offers for ArcGIS 10. Originally, it was designed for previous versions, but it has been rewritten and updated to reflect the new interface and functionality of version 10.

The course is basically about how to manage GIS data in a SQL Server database efficiently, as well as how to configure and optimize ArcSDE for Microsoft SQL Server to achieve the best possible performance. I found the course to be concise, yet very informative. It gives a very good understanding of what ArcSDE and Microsoft SQL Server are, what functionality each of them provides, and how they are integrated in the ArcGIS Server system. I think one would learn quite a lot by taking this course coupled with the Data Management in the Multiuser Geodatabase course, because the latter covers certain topics that are not covered in the exercises of the former – such as archiving, replication, setting permissions on datasets, and managing versions.

Here are some additional reflections that I have concerning the course material, which I think might be valuable to others:

1. Quite an important topic when working with ArcSDE is how the spatial database connection files (which are available in the Catalog tree under the Spatial Database Connections folder in ArcCatalog) are stored on disk. These files are stored in the system user folder and can be copied or moved to another place and then used for adding data or setting up source paths in other applications like ArcMap.

2. Even though Esri is supposed to provide a graphical interface for managing ArcSDE instances in the ArcSDE 10.1 release, which will be a great time-saver, I still think it is worth familiarizing yourself with the most commonly used ArcSDE commands.

3. Another issue that might arise after data has been registered as versioned is how to unregister it. There is no option to do this directly in the context menu, so one has to go to the Customize menu and add the Unregister As Versioned button (under the Geodatabase Tools category) to the user interface and use it when needed.

4. Being aware of spatial reference management in the geodatabase is very important too. A feature dataset can contain feature classes in only one coordinate system, so it is impossible to store feature classes with different coordinate systems in the same feature dataset. When importing a feature class from another geodatabase or a stand-alone shapefile into a feature dataset, it will be projected on the fly to match the feature dataset’s spatial reference system.

Creating ArcSDE services

Today I would like to discuss a fairly typical scenario: you have worked with an ArcSDE database on a development server, and then have backed it up and restored it on a production server. After running all the required scripts and doing the user mappings, you will have reached the stage of ArcSDE service creation.

In the ArcSDE for SQL Server post installation, you choose the Custom > Create ArcSDE service option only if you have installed the repository and created an sde user. The tricky part here is choosing a name for the ArcSDE service. You cannot have several sde services with the same name on one machine. Please check the services.sde file in <ArcGIS installation folder>\ArcSDE\sqlexe\etc\services.sde and the Windows services file (<diskdrive>:\<Windows installation folder>\system32\drivers\etc\services).

Be aware that if you already have a service named “Project3”, for instance, you cannot create a service named “Project”, since the software will parse the services file and find the “Project” string there. ArcSDE will inform you if there are any similar entries in the services file by returning a message window.