Changing the order of fields in a geodatabase feature class

At some point, it might be necessary to change the order of fields in a feature class. Perhaps a new field has been added and you want all users who add this feature class to their map documents to see the field in the right place when they open the attribute table.

An easy solution that could work is to create a layer file (.lyr) that contains, among other things, the field order that was set up for the map layer at the moment the layer file was exported. Your users can then add this layer file to an ArcMap session to see the fields in the needed order when opening the attribute table or editing features. Creating and updating layer files can be automated with Python and arcpy, as the sketch below shows.
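Here is a minimal sketch of how saving such a layer file could be automated; the map document path, layer name, and output path are placeholders, and the field order is assumed to have already been arranged on the layer’s Fields tab in ArcMap:

import arcpy

# hypothetical map document whose layer already has the desired field order
mxd = arcpy.mapping.MapDocument(r"C:\maps\parcels.mxd")
layer = arcpy.mapping.ListLayers(mxd, "Parcels")[0]

# save the layer, including its field configuration, to a .lyr file
arcpy.SaveToLayerFile_management(layer, r"C:\maps\Parcels.lyr")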

If your data don’t reside in an enterprise geodatabase (such as one in SQL Server), your only choice is to copy the feature class to a new one, providing a field mapping object that contains the needed field order, and then replace the source feature class with the newly created one. This approach works for a standalone dataset that is not part of any complex container such as a topology, a network dataset, or a geometric network, and that doesn’t have any domains assigned to its fields. Otherwise, you would need to take the feature class out of its containers (for instance, you cannot delete a feature class that is part of a network dataset), replace it with the feature class that has the right field order, and then reassign the domains, add the feature class back to its containers, and so forth. Many of those operations cannot be automated, which makes re-ordering the fields in such a feature class quite complicated.
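For the standalone case, the copy step might look like this sketch; the paths and field names are placeholders:

import arcpy

src = r"C:\data\scratch.gdb\parcels"        # hypothetical feature class
new_order = ["parcel_id", "owner", "area"]  # hypothetical desired field order

# build a field mappings object that lists the fields in the new order
fms = arcpy.FieldMappings()
for field_name in new_order:
    fm = arcpy.FieldMap()
    fm.addInputField(src, field_name)
    fms.addFieldMap(fm)

# copy into a new feature class that will have the fields in the new order
arcpy.FeatureClassToFeatureClass_conversion(
    src, r"C:\data\scratch.gdb", "parcels_reordered", field_mapping=fms)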

If your data reside in a DBMS such as SQL Server, though, you can use the DBMS tools to define the field order. If you want to give your users access to a feature class with a particular field order, you could create a view using SQL or ArcGIS geoprocessing tools. However, a view cannot be edited in ArcMap (you won’t be able to update the underlying feature class through the view).

If you do need to change the original feature class field order and let users edit the data, you can do it in SQL Server itself. Detailed instructions are available in this answer on Stack Overflow; the whole operation can be done in SSMS without writing any SQL queries.

I have tested this on a feature class that was part of a network dataset and a topology and also had domains associated with its fields. After changing the column order in SQL Server, I was able to use the feature class in ArcMap, and all behaviors associated with its containers seemed to be intact and working properly. The domains were left in place, too.

If you want to take a look at the SQL query that is being run behind the scenes, see this question on the Database Administrators site of Stack Exchange.

To see the changes in the field order, restart ArcMap, as it seems to cache the field order of a feature class within the session. I’m not sure whether changing the order of fields for a feature class is officially supported by Esri, so be careful if you decide to do this, and always back your data up first.

When writing any code that works with feature class columns, always refer to columns by name and not by index. If you are using arcpy and Python, named tuples make this very easy (I blogged about this earlier); see the sketch after the next paragraph.

Then you no longer depend on the order of fields in your feature class: if the order changes, your code won’t break as long as the fields keep their names.
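A minimal sketch of the named-tuple approach; the feature class path and field names are placeholders:

from collections import namedtuple

import arcpy

fc = r"C:\data\scratch.gdb\parcels"      # hypothetical feature class
fields = ["parcel_id", "owner", "area"]  # hypothetical field names

Row = namedtuple("Row", fields)
with arcpy.da.SearchCursor(fc, fields) as cursor:
    for row in map(Row._make, cursor):
        # values are accessed by name, so the physical column order is irrelevant
        print(row.owner, row.area)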

Esri enterprise geodatabase and PostGIS database

PostGIS is an amazing extension to PostgreSQL that makes it possible to manage geospatial data very efficiently. After installing PostgreSQL for your operating system and enabling the PostGIS extension, just a few mouse clicks let you create databases and load geospatial datasets into them for later management.

If you are just starting to learn PostGIS, consider going through this excellent tutorial from Boundless (frankly, the best one I’ve seen so far).

You can do pretty amazing things right in your PostGIS database using the ST SQL spatial functions. This is handy if your data already reside in the database and you are more comfortable using SQL than Python or another language to access the database and then do the analysis with a geospatial library such as geopandas. For your reference, though, there is a Python package, GeoAlchemy, that lets you work with PostGIS data and the ST functions right from the Python ecosystem. Another Python package, geopandas, can read a PostGIS spatial table into a GeoDataFrame, so you can work with your data in Python just as you would work with a pandas DataFrame.
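For example, reading a PostGIS table into a GeoDataFrame might look like this sketch; the connection string, table name, and geometry column name are placeholders:

import geopandas as gpd
from sqlalchemy import create_engine

# hypothetical connection parameters
engine = create_engine("postgresql://gisuser:secret@localhost:5432/gisdb")

# read a spatial table into a GeoDataFrame; geom_col names the geometry column
gdf = gpd.read_postgis("SELECT * FROM parcels", engine, geom_col="geom")
print(gdf.head())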

For ArcGIS users, it might also be handy to create an enterprise multiuser geodatabase in a PostgreSQL database (with the PostGIS extension enabled) and then use SQL for the analysis when you prefer scripting certain workflows without geoprocessing tools or arcpy. An important note: be sure to create your geodatabase feature classes using the configuration keyword PG_GEOMETRY; storing your data with the ST_GEOMETRY type won’t let you run PostGIS SQL spatial functions on it.

Please refer to this help page for the details. You essentially need to create a PostGIS database first (using the psql shell or the pgAdmin GUI) and then run the Create Enterprise Geodatabase geoprocessing tool. Again, you will be able to load an Esri geodatabase feature class stored in the PostGIS database with the PG_GEOMETRY keyword into a geopandas data frame. Keep in mind, though, that to create an ArcGIS enterprise geodatabase in PostGIS you need an ArcGIS Server license.
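As an illustration, creating a feature class with the PG_GEOMETRY keyword might look like this sketch; the connection file and names are placeholders:

import arcpy

sde_conn = r"C:\connections\gisdb.sde"  # hypothetical database connection file

# the PG_GEOMETRY configuration keyword stores the shapes as native PostGIS
# geometry instead of ST_GEOMETRY, so PostGIS spatial functions work on them
arcpy.CreateFeatureclass_management(
    sde_conn, "parcels", "POLYGON",
    spatial_reference=arcpy.SpatialReference(4326),
    config_keyword="PG_GEOMETRY")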

I think having an ArcGIS multiuser geodatabase in PostGIS is very appealing because you might have a bunch of other software either serving the spatial datasets (such as GeoServer or ArcGIS Server) or accessing them via SQL in a desktop GIS (such as QGIS or ArcGIS Desktop). Having all the data stored in one place makes it easier to manage and maintain, as you don’t need to propagate updates among multiple data repositories. An Esri compliant geodatabase repository with all the powerful features it has to offer (such as geometric networks, linear referencing, and geocoding), combined with the ability to pull all the data with pure SQL, is a very agile concept many would find suitable for their organizations.

Using R scripts as geoprocessing script tools in ArcGIS

If you are a user of both R and ArcGIS, you might have wanted to execute R scripts on GIS datasets stored within geodatabases as feature classes. Recently, Esri released an ArcGIS R binding (namely a package, arcgisbinding) that makes it possible to create a custom ArcGIS toolbox with a script tool whose source points not to a .py file (as for a Python script tool) but to an .R file (for an R script).

The installation process is outlined on the Esri GitHub page. If you follow the defaults during the installation, make sure you copy the installed arcgisbinding package to the R installation folder; otherwise you will be able to access your ArcGIS datasets within R and RStudio, but your R scripts won’t execute when run in ArcMap. The detailed steps for fixing this are here.

After you have set everything up, you will be able to do a whole lot of things, including but not limited to reading Esri file geodatabase data directly from R and loading its datasets into R data frames.

You will also be able to load the geometry of features and access their coordinates, which means you can use your favourite R packages for working with spatial data (such as sp or rgdal) and then write the output of your operations back into an Esri file geodatabase.

Here is a sample R script which you can point to from a script tool in ArcGIS. Create a script tool with two input parameters (an input feature class and an input string for the state sub region) and one output parameter (a .pdf file with the plot).

So, you can run an R script on your Esri based datasets directly from ArcMap. This is a very handy package that many ArcGIS professionals who use R would love to have in their arsenal.

Using conda for geospatial Python development

In the summer, I got a chance to learn more about various Python packages that are useful to anyone working with geospatial data or GIS. ArcGIS is no different in this respect, as all the GIS data (such as file geodatabases) accessed with ArcGIS Desktop can be converted to open formats (such as GeoJSON or shapefiles) and used outside of the ArcGIS ecosystem.

To make managing Python packages easier, a whole new framework called Conda has been released. It is available for download for free, and it is a great way to manage not only your Python packages (which you have probably done before using pip), but also your Python environments (which you have probably done before using virtualenv, or just by re-installing ArcGIS because you broke something in the base Python installation). Let me elaborate on that.

Say you have a fresh installation of ArcGIS Desktop 10.4, which sets up Python 2.7.10 on your system. If you would like to have a third-party Python package such as networkx, you would install it into the Python installation used by ArcGIS Desktop. Now you can access arcpy and networkx from your favorite Python IDE. But what if you need to install a Python package that depends on a different version of numpy than the one arcpy depends on? You cannot have both installed side by side in one Python installation. The virtualenv framework solved this problem by isolating installed packages into separate environments, but you still had to manage the Python installations themselves on your own.

Conda provides similar functionality, making it possible to create isolated environments that include both a Python installation and packages. Because you control the Python installation, you can have multiple versions of Python (such as 2.7 and 3.5) on the same system and switch between them as needed, as shown below. This is helpful not only for users who need existing Python packages that require different environments, but also for package developers who can easily test how their packages behave when installed in various environments.
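For instance, a second environment with Python 3.5 can be created and activated like this (the environment name is arbitrary; activate is the Windows syntax):

conda create -n py35 python=3.5
activate py35
python --version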

I wanted to test Conda to see how easily GIS related packages can be installed in a fresh Python 2.7.10 environment. This is the workflow I’ve gone through:

1. Download and install Anaconda (which contains the Conda manager plus many Python packages). As I have ArcGIS installed on the same machine, I unchecked the check boxes (1) make Anaconda the default Python and (2) add Anaconda’s Python to the Windows PATH.
2. To learn how to get started with Anaconda, go through this 30-minute test drive.
3. Keep the conda cheat sheet open.
4. I created a Python 2.7.10 environment with the following packages:

  • fiona
  • geopy
  • descartes
  • shapely
  • cartopy
  • pysal
  • pyproj
  • basemap
  • geopandas
  • bokeh
  • vincent
  • folium
  • scipy
  • networkx

4.1. Create the environment:

conda create -n spatialbox python=2.7.10

4.2. Install fiona:

conda install fiona

4.3. Install packages listed in a text file:

conda install -c conda-forge --file custom_packages.txt

custom_packages.txt contents:

geopy
descartes
shapely
cartopy
pysal
pyproj
basemap
bokeh
folium
seaborn
networkx
ipykernel

5. Lastly, I ran:

conda install -c conda-forge geopandas

After you have installed these packages, you will likely want to verify that all of them were installed properly. I have shared a Jupyter notebook with Python code that contains primitive calls to each of these packages so you can see that they work.
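For a very quick check, a minimal import smoke test along these lines can be run inside the new environment (note that basemap is imported as mpl_toolkits.basemap):

import importlib

packages = ["fiona", "geopy", "descartes", "shapely", "cartopy", "pysal",
            "pyproj", "mpl_toolkits.basemap", "geopandas", "bokeh",
            "folium", "seaborn", "networkx"]

# try importing each package and report the ones that fail
for name in packages:
    try:
        importlib.import_module(name)
        print("OK      " + name)
    except ImportError as err:
        print("FAILED  " + name + ": " + str(err))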

This suite contains practically everything you might need when dealing with geospatial data. As there is a lot to cover for each individual package, I decided to write separate posts on each of them.

Now say you would like to use the arcpy package along with the geopy package. As these two packages are installed in separate Python environments, you have to tell each Python installation where to look for packages in addition to its core site-packages folder. This is where the instructions written by Curtis Price on how to get to arcpy from Anaconda, and the other way around, are very helpful; please refer to step 3 of that document. Curtis’ contribution on this topic has been invaluable to me. Another blog post will also guide you through the installation of Anaconda and the environments.

Now you can create a Python project in your Python IDE and specify your newly created Python environment, with all the geospatial packages, as your Python executable. Because the .pth file in your environment tells Python to also look inside the site-packages of the ArcGIS Python folder, you will be able to import arcpy.
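As an illustration, such a .pth file (dropped into the conda environment’s Lib\site-packages folder) might contain paths like these; the exact paths depend on your ArcGIS version and install location:

# example contents of a file such as arcgis.pth; lines starting with # are skipped
C:\Python27\ArcGIS10.4\Lib\site-packages
C:\Program Files (x86)\ArcGIS\Desktop10.4\bin
C:\Program Files (x86)\ArcGIS\Desktop10.4\arcpy
C:\Program Files (x86)\ArcGIS\Desktop10.4\ArcToolbox\Scripts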

If installing a separate Anaconda sounds like too much for you, consider another alternative: the ArcGIS Pro 1.3 installation comes with Conda, which means you will be able to use Conda directly after installing ArcGIS Pro. Learn more about Conda in ArcGIS Pro. There is a great video from the Dev Summit on how Conda works within the ArcGIS Pro installation; the talk’s pdf is available here.

Just in case you are an R user, you can install R and its packages with conda.

Give conda a try and I am sure you will love it.

Geoprocessing history logging in ArcGIS: performance review

If you have used geoprocessing (hereafter GP) tools in the ArcGIS framework, you are probably familiar with the concept of GP history logging. Essentially, all GP operations you perform using ArcToolbox GP tools are written to disk in a special folder. You probably already know that the results of your GP tool executions are accessible in the Results window in ArcMap (they are saved only within the map document). However, this information is also written to disk with all sorts of execution metadata, including the tool’s name, its parameters, the output information, and the environment settings. Please review the Help page Viewing tool execution history to learn more.

When working in an ArcMap session, whether the GP history is logged is determined by the GP settings (Geoprocessing menu > Geoprocessing Options). Yet after disabling this setting you may not see a performance boost, because writing some data to disk costs little: running a GP tool 100 times in isolation produces a log file of only about 1 MB.

According to the Esri docs, however, it is worth noting that:

Logging can affect tool performance, especially in models that use iterators. For example, you may have a model that runs the Append tool thousands of times, appending features to the same feature class. If logging is enabled, each iteration in the model updates the metadata of the feature class, which slows down the performance of the model.

So, if your models or scripts execute GP tools many thousands of times, this will affect performance, as the information has to be collected internally by the software and then written to disk. In short, it might be worth disabling this option if you don’t need to preserve any metadata of your operations. This can be particularly helpful when authoring large arcpy based scripts where GP tools run many times.

Keep in mind that according to the Esri docs,

for script tools and stand-alone scripts (scripts run outside of an ArcGIS application—from the operating system prompt, for example), history logging is enabled by default.

Another thing that is good to know is when exactly the writing occurs:

A session is defined by all the work performed from the time you open the application to the time you exit.

So when you run a Python script, a log file is created, named after the date and time when you started the script, but the actual data is written only when the script has finished running.

Luckily, there is an arcpy function that lets you disable geoprocessing history logging: arcpy.SetLogHistory(False).

Another thing to know is that the GP history is also written within the geodatabase itself, regardless of whether it is a file based geodatabase or an enterprise geodatabase (ArcSDE). Fortunately, the same setting controls whether it is written: arcpy.SetLogHistory().
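A minimal sketch of using it in a script; the feature class path and the field calculation are placeholders:

import arcpy

# disable GP history logging before running any tools; this suppresses both
# the XML log files on disk and the history written into the geodatabase
arcpy.SetLogHistory(False)

fc = r"C:\data\scratch.gdb\parcels"  # hypothetical feature class
for i in range(1000):
    arcpy.CalculateField_management(fc, "area", "!shape.area!", "PYTHON")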

When the data you process are stored within a file based geodatabase, the geoprocessing history is saved within the metadata of the associated geodatabase object, such as a feature class. Right-clicking the feature class and choosing Item Description won’t show you the GP history, because the standard ArcGIS Item Description metadata style gives you only a simplified view. To see the GP history, go to Customize > ArcMap Options > Metadata tab and choose another Metadata Style, such as FGDC. Then, when you right-click the feature class and choose Item Description, you will see all the GP tools that were run on this feature class under the Geoprocessing History section.

As for practical numbers, I created an empty file geodatabase, loaded a polygon feature class with a dozen polygons into it, and then ran the Calculate Field GP tool 1,000 times, calculating the same field over and over again. I ran this loop multiple times and saw a stable increase in the file geodatabase size of about 300 KB per 1,000 GP tool executions.

When processing data stored within an enterprise geodatabase (backed by any supported DBMS), sometimes referred to as an ArcSDE geodatabase, keep in mind that running any GP tool on a geodatabase object, such as a table or a feature class, also writes the tool execution metadata as XML into the GDB_Items database table, in the Documentation column (of XML type). This XML metadata can get fairly large, as it contains the tool name, input and output parameters, and more. A single execution of a GP tool adds one line of XML; so again, if you run a lot of GP tools on your feature class, the stored XML metadata can get very large, and queries to this database table will take longer to process because it takes more and more time to write and read the XML data. To give you a feel for the size increase, running the Calculate Field GP tool 1,000 times made the GDB_Items table 500 KB larger (I am on SQL Server and ran exec sp_spaceused 'sde.GDB_Items'). I ran this loop multiple times and saw roughly the same increase in size each time.

This logging of the feature class metadata can also be easily disabled, either in ArcMap by switching off this option in the Geoprocessing Options window or by calling arcpy.SetLogHistory(False) in your Python scripts. Esri has a KB article, HowTo: Automate the process of deleting geoprocessing history, that can help you automate cleaning up the GP history metadata from your feature classes. The basic workflow is to export the metadata excluding the GP history and then import it back. This is the only workflow I can think of if you are using a file based geodatabase (apart from re-creating the feature class, which also drops the GP history metadata). With an ArcSDE geodatabase, you can use SQL to clean up the GDB_Items table, deleting the content of the Documentation column for the chosen feature classes. You would need to parse the XML to clean up the metadata/Esri/DataProperties/lineage/Process element if you want to preserve the other metadata information.
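A minimal sketch of that export/import workflow, using the stylesheet that ships with ArcGIS Desktop; the paths are placeholders and depend on your install location:

import arcpy

fc = r"C:\data\scratch.gdb\parcels"  # hypothetical feature class
xslt = (r"C:\Program Files (x86)\ArcGIS\Desktop10.4\Metadata"
        r"\Stylesheets\gpTools\remove geoprocessing history.xslt")
tmp_xml = r"C:\temp\parcels_no_gp_history.xml"

# export the metadata with the GP history stripped out by the stylesheet...
arcpy.XSLTransform_conversion(fc, xslt, tmp_xml)
# ...and import it back, overwriting the feature class metadata
arcpy.MetadataImporter_conversion(tmp_xml, fc)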

Remember as always to test your workflows in the sandbox environment before applying any changes in the production database.