Geoprocessing history logging in ArcGIS: performance review

If you have used geoprocessing (hereafter GP) tools in the ArcGIS framework, you are probably familiar with the concept of GP history logging. Essentially, every GP operation you perform using ArcToolbox GP tools is written to disk in a special folder. You probably already know that you can access the results of your GP tool executions (which are saved only within the map document) in the Results window in ArcMap. However, this information is also written to disk along with all sorts of execution metadata including the tool's name, its parameters, the output information, and the environment settings. Please review the Help page Viewing tool execution history to learn more.

When working in an ArcMap session, whether the GP history will be logged is determined by the GP settings (Geoprocessing menu > Geoprocessing Options). Yet after disabling this setting you may not see much of a performance boost, since writing this data to disk does not cost much – running a GP tool 100 times in isolation produces a log file of only about 1MB.

According to the Esri docs, however, it is worth noting that:

Logging can affect tool performance, especially in models that use iterators. For example, you may have a model that runs the Append tool thousands of times, appending features to the same feature class. If logging is enabled, each iteration in the model updates the metadata of the feature class, which slows down the performance of the model.

So, if your models or scripts execute GP tools many thousands of times, this will affect performance, as this information has to be collected internally by the software and then written to disk. In short, it is worth disabling this option if you don't need to preserve any metadata about your operations. This can be particularly helpful when authoring large arcpy-based scripts where GP tools are run many times.

Keep in mind that, according to the Esri docs,

for script tools and stand-alone scripts (scripts run outside of an ArcGIS application—from the operating system prompt, for example), history logging is enabled by default.

Another thing worth knowing is when exactly the writing occurs:

A session is defined by all the work performed from the time you open the application to the time you exit.

So when you run your Python script, a log file named after the date and time when you started the script is created, but the actual data is written only when the script has finished running.

Luckily, there is an arcpy function that lets you disable geoprocessing history logging: arcpy.SetLogHistory(False).
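As a minimal sketch, you would typically call it once at the top of your script, before running any GP tools (the Calculate Field call in the comment is just an illustration, with a hypothetical feature class and field):

```python
import arcpy

# Disable GP history logging for this session so that repeated
# tool runs don't write execution metadata to disk.
arcpy.SetLogHistory(False)

# ...then run your GP tools as usual, e.g.:
# arcpy.CalculateField_management(fc, "test_field", "1", "PYTHON")
```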

Another thing to note is that the GP history is also written within the geodatabase itself, regardless of whether it is a file-based geodatabase or an enterprise geodatabase (ArcSDE). Fortunately, the same setting controls whether it is written – arcpy.SetLogHistory().

When the data you process is stored within a file-based geodatabase, the geoprocessing history is saved within the metadata of the associated geodatabase object, such as a feature class. Right-clicking the feature class and choosing Item Description won't let you see the GP history, because the standard ArcGIS Item Description metadata style gives you only a simplified view. In order to view the GP history, go to the Customize > ArcMap Options window > Metadata tab and change the Metadata Style to another style such as FGDC. Then, when right-clicking the feature class and choosing Item Description, you will be able to see all the GP tools that were run on this feature class under the Geoprocessing History section.

As for practical numbers, I've created an empty file geodatabase, loaded a polygon feature class with a dozen polygons into it, and then run the Calculate Field GP tool 1,000 times, calculating the same field over and over again. I've run this loop multiple times and have seen a steady increase in the file geodatabase size of about 300KB per 1,000 GP tool executions.
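For reference, a minimal sketch of that benchmark loop might look like this (the geodatabase path, feature class, and field names are hypothetical, and the syntax assumes the ArcMap-era arcpy API):

```python
import arcpy

# Hypothetical file geodatabase and feature class used for the test.
arcpy.env.workspace = r"C:\Temp\test.gdb"

# Run Calculate Field 1,000 times on the same field; with logging
# enabled, each execution appends a GP history entry to the
# feature class metadata, inflating the geodatabase on disk.
for i in range(1000):
    arcpy.CalculateField_management("polygons", "test_field", "1", "PYTHON")
```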

When processing data stored within an enterprise geodatabase (backed by any supported DBMS), sometimes referred to as an ArcSDE geodatabase, keep in mind that when running any GP tool on a geodatabase object such as a table or a feature class, the metadata of the tool execution is also written as XML into the GDB_Items database table, in the Documentation column (of XML type). This XML metadata can get fairly large, as it contains information about the tool name, the input and output parameters, and more. A single execution of a GP tool adds one line of XML; so again, if you run a lot of GP tools on your feature class, the stored XML metadata can get very large, and queries against this table will take longer to process because it takes more and more time to write/read the XML data. To give you a sense of the size increase, running the Calculate Field GP tool 1,000 times made the GDB_Items table 500KB larger (I am on SQL Server and ran exec sp_spaceused 'sde.GDB_Items'). I've run this loop multiple times and have seen roughly the same increase in size.

This logging of the feature class metadata can also be easily disabled, either in ArcMap by switching off this option in the Geoprocessing Options window or by calling arcpy.SetLogHistory(False) in your Python scripts. Esri has a KB article, HowTo: Automate the process of deleting geoprocessing history, that can help you automate cleaning up the GP history metadata from your feature classes. The basic workflow is to export the metadata excluding the GP history and then import it back, as sketched below. This is the only workflow I can think of if you are using a file-based geodatabase (apart from re-creating the feature class, which will also drop the GP history metadata). With an ArcSDE geodatabase, you can use SQL to clean up the GDB_Items table, deleting the content of the Documentation column for the chosen feature classes. You would need to parse the XML to clean up the metadata/Esri/DataProperties/lineage/Process resource, as you might want to preserve other metadata information.
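For a file geodatabase, a rough sketch of the export/import workflow could look like the following. All paths are hypothetical: it assumes the remove geoprocessing history.xslt stylesheet that ships with ArcGIS Desktop (the install path varies by version) and uses the XSLT Transformation and Metadata Importer conversion tools:

```python
import arcpy

# Hypothetical paths -- adjust to your data and your ArcGIS install.
fc = r"C:\Temp\test.gdb\polygons"
xslt = (r"C:\Program Files (x86)\ArcGIS\Desktop10.5\Metadata"
        r"\Stylesheets\gpTools\remove geoprocessing history.xslt")
stripped = r"C:\Temp\polygons_no_gp_history.xml"

# Export the feature class metadata with the GP history stripped out...
arcpy.XSLTransform_conversion(fc, xslt, stripped)

# ...and import the cleaned metadata back onto the feature class.
arcpy.MetadataImporter_conversion(stripped, fc)
```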

Remember, as always, to test your workflows in a sandbox environment before applying any changes to the production database.
