Using Python in FME: WorkspaceRunner and fmeobjects

Are you an FME user and fond of Python? If so, you might like learning more about using Python in FME. FME is a great ETL software package that can help streamline various GIS workflows. One of the features I like most about FME is that it's possible to call FME workbenches (think of workbenches as ArcGIS ModelBuilder models) from Python, supplying input arguments.

I have been using this approach for some of my workflows and it works really well. It's fast and easy to customize. For some of the workflows, there might be a way to implement this purely in an FME workbench, but I like calling things from Python because this lets me integrate FME workbenches into other non-GUI-based operations.

A sample of code for creating multiple buffers (basically the Multiple Ring Buffer GP tool in ArcGIS) for a single input shapefile is shown after the steps below.

Calling an FME workbench from Python is super easy. You need to do just a couple of things:

* add fmeobjects to your path: this can be done in various ways; the one I found most useful is to use sys.path.append, specifying the path to the FME Python installation. The file we have to import from this folder is fmeobjects.pyd, which is essentially a Windows .dll file. Check this question on Stack Overflow to learn more;
* initialize the FMEWorkspaceRunner class;
* run the workbench, supplying values for the published parameters as needed: the parameters are supplied as a dictionary with the parameter names as dict keys and the parameter values as dict values.
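Here is a minimal sketch of those steps. The FME installation path, the workbench path, and the published parameter names (SRC_SHAPEFILE, BUFFER_DISTANCES, DEST_FOLDER) are placeholders you would replace with your own:

```python
import sys

# Path to the folder of your FME install that contains fmeobjects.pyd
# (adjust to match your FME version and Python version)
sys.path.append(r'C:\Program Files\FME\fmeobjects\python27')
import fmeobjects

# Initialize the workspace runner
runner = fmeobjects.FMEWorkspaceRunner()

# Published parameter names are made up for this example; they must
# match the parameters defined in your workbench
parameters = {
    'SRC_SHAPEFILE': r'C:\GIS\Data\parcels.shp',
    'BUFFER_DISTANCES': '100 200 500',
    'DEST_FOLDER': r'C:\GIS\Output',
}

try:
    runner.runWithParameters(r'C:\GIS\Workbenches\multi_buffer.fmw', parameters)
except fmeobjects.FMEException as ex:
    print(ex)
```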

To take advantage of IntelliSense and code completion in an IDE, I usually execute code that imports fmeobjects and then pause, staying in debug mode. This makes it possible to access the properties and methods of fmeobjects in the debugging window.

If you need to create a workbench that would be more flexible in terms of input parameters, you might consider learning about dynamic workspaces in FME if you aren’t familiar with them.

When you create a workbench with a dynamic workspace, you are able to take any input data. This is what most authors of ArcGIS script tools would find useful. One would usually use FME for processing the same input on a regular basis, but I find it very helpful to have a workspace that is not tied to any particular dataset or data schema and lets me process any kind of input dataset. Plus, you might want to use other transformers, not only those that have been added to the workbench.

This is a sample of code that illustrates re-projecting multiple shapefiles in an input folder (think of this as Batch mode for the Project GP tool in ArcGIS). The workbench being run has a Reprojector transformer added.
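A rough sketch, again with made-up paths and parameter names:

```python
import os
import sys

sys.path.append(r'C:\Program Files\FME\fmeobjects\python27')  # adjust to your install
import fmeobjects

runner = fmeobjects.FMEWorkspaceRunner()
src_folder = r'C:\GIS\Data\Shapefiles'  # made-up input folder

for shp in os.listdir(src_folder):
    if not shp.endswith('.shp'):
        continue
    # Parameter names are placeholders; they must match the published
    # parameters of the workbench that contains the Reprojector transformer
    parameters = {
        'SRC_DATASET': os.path.join(src_folder, shp),
        'DEST_COORDSYS': 'LL84',
        'DEST_FOLDER': r'C:\GIS\Output',
    }
    runner.runWithParameters(r'C:\GIS\Workbenches\reproject.fmw', parameters)
```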

Unfortunately, FME transformers are not exposed as Python functions; this is where FME differs from ArcGIS, which has all of its GP tools exposed as arcpy package functions. This means you won't be able to build all your workflows purely in Python, because you won't be able to use the transformers as they are. However, you can do a whole lot with fmeobjects purely in Python.

After importing fmeobjects, you will be able to read any supported data format (i.e., create a reader) and then write its features into any supported format (i.e., create a writer). The reader object is similar to arcpy.da.SearchCursor(): it's an iterator that you can go through just once, reading features. Here is what is exposed to you, among many other things:

* all the input dataset fields, their names and properties (similar to the arcpy.ListFields() function);
* all the data rows (similar to the arcpy.da.SearchCursor() cursor);
* geometry objects with lots of properties, including the coordinates of the vertices and the coordinate system (similar to the arcpy.Geometry() class).

You will be able to manipulate the reader, changing its schema and processing the features, which you will then write into an output dataset. The feature object also has a bunch of methods for processing; one of them is reproject(), which re-projects the feature from its current coordinate system to a specified one. Let's build a script that can project an input shapefile without calling any FME workbenches.
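Here is a rough sketch of such a script using the FMEUniversalReader and FMEUniversalWriter classes; treat the format names, constructor arguments, and paths as assumptions to verify against the fmeobjects documentation for your FME version:

```python
import sys

sys.path.append(r'C:\Program Files\FME\fmeobjects\python27')  # adjust to your install
import fmeobjects

# Constructor signatures and the 'ESRISHAPE' format short name should be
# double-checked against the fmeobjects docs for your FME version
reader = fmeobjects.FMEUniversalReader('ESRISHAPE', False, [])
writer = fmeobjects.FMEUniversalWriter('ESRISHAPE', False, [])

reader.open(r'C:\GIS\Data\parcels.shp', [])  # made-up input shapefile
writer.open(r'C:\GIS\Output', [])            # made-up output folder

feature = reader.read()
while feature:
    # reproject() moves the feature from its current coordinate
    # system to the one specified ('LL84' is FME's name for WGS84)
    feature.reproject('LL84')
    writer.write(feature)
    feature = reader.read()

reader.close()
writer.close()
```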

Stress testing ArcGIS Server with Python

I’ve blogged earlier about using Apache JMeter for stress testing ArcGIS Server and for performance tests. It’s a terrific tool that can do so much more than that though.

If you are not able to use JMeter, you can perform a similar stress test on an ArcGIS Server service by using the Python multiprocessing module together with the ArcREST Python package, an open-source package developed by Esri and the user community and hosted on GitHub.

In this code, a pool of workers is created, and each of them exports a map image; this triggers the creation of a service instance if none are available, each of which uses a certain amount of RAM and CPU. To learn more about ArcREST, please visit the project's wiki page. To learn more about Python multiprocessing, check the help page and some SO pages here and here.
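Since the exact ArcREST calls aren't reproduced here, below is a sketch of the same idea using only the standard library: a multiprocessing pool firing concurrent requests at the map service's REST export endpoint. The server URL is a placeholder:

```python
import multiprocessing
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen         # Python 2

# Point this at your own map service; the host below is made up
SERVICE_URL = ('http://yourserver:6080/arcgis/rest/services/'
               'SampleWorldCities/MapServer/export'
               '?bbox=-180,-90,180,90&f=json')

def export_map(call_id):
    """Request one export of the map image from the service."""
    response = urlopen(SERVICE_URL)
    return call_id, response.getcode()

if __name__ == '__main__':
    # Concurrent export requests force ArcGIS Server to spin up
    # additional service instances, consuming RAM and CPU
    pool = multiprocessing.Pool(processes=8)
    results = pool.map(export_map, range(100))
    pool.close()
    pool.join()
    print(results[:5])
```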

For live monitoring of the busy instances of an ArcGIS Server service, a sample from the Esri ArcGIS Server development team could be used. Another application built on top of that sample can be found here on GitHub.

Data analysis for ArcGIS Pro 1.2 users

If you need to do any advanced data analysis, consider using the Python libraries numpy and pandas. If you are a user of ArcGIS Pro 1.2, the Python 3.4 distribution that can be installed with Pro contains those libraries, along with scipy and matplotlib. You will be able to perform most of your data analysis work with those libraries, and you won't need to download any package managers such as conda or pip. Jupyter Notebook, formerly known as IPython Notebook, is another great tool for interactive data analysis and visualization. In case you are not familiar with installing external libraries in Python, check this post on how to install pip; it's very easy, just a matter of downloading and executing a Python file. Then, to get started with data analysis in Python and ArcGIS Pro:

1. Install ArcGIS Pro 1.2.
2. Install Python 3.4.3 for ArcGIS Pro.
3. Add C:\Python34 to the PATH system variable.
4. Run pip install ipython.
5. Run pip install jupyter.
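Once that's done, a quick check (just a minimal sketch) that the scientific stack is importable and working:

```python
import numpy
import pandas

# Build a tiny DataFrame and print summary statistics to confirm
# that numpy and pandas are wired up correctly
data = pandas.DataFrame({'value': numpy.random.rand(5)})
print(data.describe())
```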

And now you have everything you will need to become a data scientist!

Working with csv files in Python

Have you ever needed to process some large datasets? Well, it all depends on how you define "large". In this post, I will refer to large datasets as relational tables or delimited text files several GB in size containing 10+ million rows.

When I have to do some data processing or filtering, I often start by analyzing what kind of toolset or framework I should use to perform the work. Say you have a large .csv file and you want to create a new .csv file that is filtered by a column (so you basically want to create a new .csv based on a SQL where clause).

Well, you can do that with ArcGIS. It is possible to add text files, including .csv files, into ArcMap and open their attribute table. You will be able to make selections based on attributes, calculate statistics, and more. After a selection is created, you can export the selected rows into a new comma-separated file. However, in order to make the selection work on the whole dataset (and not only on the first 2,000 rows visible in the attribute table), it seems you need to move to the end of the table. This can take a rather long time, so it's probably best to use the Definition Query tab in the Table Properties dialog box instead. This will make visible only those rows that match your criteria. Now you can export the whole table, with the query on, to a new text file; pay particular attention to what default delimiter will be used when exporting a text file from ArcMap and whether any new columns, such as OBJECTID, are added to the source rows. Another caveat is to watch out carefully for any decimal delimiters that are used, because they can cause problems in comma-delimited text tables. Refer to this KB article to learn more.

Another piece of software used by many GIS professionals is FME, and this task can be done within that framework, too. Probably the easiest way is to use the AttributeFilter transformer to obtain only those rows that meet your criteria. You will be able to supply a separator character and character encoding, among other options. Very handy and easy to build. Filtering a 20-million-row .csv file with two columns and writing the result to a new .csv file with FME 2015 shouldn't take more than several minutes on a decent machine.

Most DBMS will let you import and export .csv files, too. SQL Server, for instance, is capable of importing a .csv file using the SQL Server Import and Export Wizard. You will be able to load a .csv file into a SQL Server table, then create a new table with SELECT INTO and a where clause to keep only the needed rows. You will also be able to script this process using SSIS packages or BULK INSERT to load the .csv rows into a table.

And… you can do this in Python! There are many ways to do it, such as reading the file line by line, using the built-in csv module, or using external ones such as unicodecsv. Be careful when reading large files with Python; very often I see people reading all the file rows into a single object, which can cause an out-of-memory error even on powerful machines. Learn more about iterators in Python: the reader objects that are created are iterators that you can go through just once.

Here are a couple of samples of how to read and write .csv files with Python to help you get started:
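Here is one such sample (written for Python 3): filtering a large .csv row by row without loading it into memory. The file paths and the filter column are made up for illustration:

```python
import csv

SRC = 'data.csv'        # made-up input file
DST = 'filtered.csv'    # made-up output file

with open(SRC, newline='') as src, open(DST, 'w', newline='') as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    # Iterate row by row instead of reading the whole file into one object
    for row in reader:
        if row['STATE'] == 'WA':  # the filter column and value are placeholders
            writer.writerow(row)
```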

OOP approach for ArcGIS development with arcpy

I have blogged earlier about creating helper functions for arcpy that can make you more productive. The basic idea is that you can wrap some of the chunks of code you call often in various projects, and then store and maintain them in just one place. A great example of this would be using arcpy.Describe():
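A minimal sketch of such a helper; the function name and the selection of properties are just for illustration:

```python
import arcpy

def describe_gdb(gdb_path):
    """Return selected geodatabase properties as a dictionary."""
    desc = arcpy.Describe(gdb_path)
    return {
        'name': desc.name,
        'path': desc.path,
        'workspaceType': desc.workspaceType,
        'release': desc.release,
    }
```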

Let's say you have a geodatabase, and as a part of your workflow you need to find out some of its properties. You could get individual properties of the geodatabase; alternatively, you could construct a dictionary to access the various attributes stored within the same Python object, just as you see in the sample above.

Today I want to show you how to implement this using the object-oriented programming (OOP) paradigm. You probably haven't seen that many examples of building new classes for use with arcpy. This is just a really primitive sample that can help you get a sense of what it is like to build a class and what advantages it provides. It's important to understand that you might benefit from building your own classes that are not present in arcpy. Even more, a new class might inherit all of its properties from an existing arcpy class such as arcpy.SpatialReference(), but be extended with extra properties and methods that you have created yourself.
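Here is a primitive sketch of such a class; the class name and the extra properties are invented for this example:

```python
import arcpy

class Geodatabase(object):
    """A primitive wrapper around a geodatabase."""

    def __init__(self, path):
        self.path = path
        desc = arcpy.Describe(path)
        # Properties obtained straight from arcpy.Describe
        self.name = desc.name
        self.workspace_type = desc.workspaceType
        self.release = desc.release
        # Our own property built on top of the Describe information:
        # the names of the feature classes stored in the geodatabase
        arcpy.env.workspace = path
        self.feature_classes = arcpy.ListFeatureClasses()
```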

As you see, we started just by obtaining the geodatabase properties via the Describe properties. But then we went further and created our own properties to hold some useful information you would like to have at hand while working with the geodatabase. It might be super helpful to have a JSON-like representation of all the features in a feature class, for instance. Let's see now how we can quickly access various objects inside the geodatabase.
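A sketch of that, reusing the Geodatabase class from above (the geodatabase path and the field handling are illustrative):

```python
import os
import arcpy

gdb = Geodatabase(r'C:\GIS\Data\Samples.gdb')  # made-up geodatabase
print(gdb.feature_classes)

# Build a JSON-like representation of the features in one feature class
fc_path = os.path.join(gdb.path, gdb.feature_classes[0])
fields = [f.name for f in arcpy.ListFields(fc_path)
          if f.type not in ('Geometry', 'Blob')]
with arcpy.da.SearchCursor(fc_path, fields) as cursor:
    features = [dict(zip(fields, row)) for row in cursor]
print(features[0])
```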

There is a ton of other things you could do now that you have all the feature classes and their features available. One thing to keep in mind is that it takes some time to read the features and feature class properties, so creating an instance from a large geodatabase can take quite a long time. Unless you need to access all the features within all the feature classes in the geodatabase, try to construct only those objects that are relevant for your work.

Export a feature class attribute table into Excel 2007 with xlsxwriter

Did you ever need to export the attribute table of a feature class into an Excel workbook? I am fairly sure you did. Maybe someone without ArcMap needed to take a look at a piece of data or build a graph in Excel based on the attributes. Or you just needed to load some of your data into an enterprise system that requires an Excel file as input.

Previously, before ArcGIS 10.2 was released, you had to build custom Python scripts for exporting the attributes. There are some custom scripts that were made available for 10.0+ users, too. Alternatively, you could export the attribute table into a .txt/.csv file, which you could later import into Excel. Take a look at the Esri KB article HowTo: Export an attribute table to Microsoft Excel to learn more.

At 10.2, the script tools Table to Excel and Excel to Table were made available in ArcGIS for Desktop at all license levels. These tools are based on the xlwt Python site package. However, they can work only with .xls files (Excel 97-2003 format), which means you won't be able to generate an Excel file from a feature class with more than 65,536 rows.

I've recently needed to generate multiple Excel files from rather large feature classes with some fancy styling; this is where another Python site package, xlsxwriter, came in handy. It provides a very clean and fast interface for creating new Excel files. In fact, generating an Excel file from a feature class (much like what the Table to Excel geoprocessing tool does) can be done in a very small snippet of code.
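A sketch of that snippet; the feature class path and output file are placeholders:

```python
import arcpy
import xlsxwriter

fc = r'C:\GIS\Data\Samples.gdb\Parcels'  # made-up feature class
fields = [f.name for f in arcpy.ListFields(fc)
          if f.type not in ('Geometry', 'Blob')]

workbook = xlsxwriter.Workbook(r'C:\GIS\Output\parcels.xlsx')
worksheet = workbook.add_worksheet('Parcels')

# Write the header row with the field names
for col, name in enumerate(fields):
    worksheet.write(0, col, name)

# Write the attribute rows; a where clause in the cursor could filter them
with arcpy.da.SearchCursor(fc, fields) as cursor:
    for row_idx, row in enumerate(cursor, start=1):
        for col, value in enumerate(row):
            worksheet.write(row_idx, col, value)

workbook.close()
```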

Keep in mind that it would be very easy to filter out the rows you want to export by using a where clause within the cursor definition. xlsxwriter has an amazing suite of features for doing very fancy things within Excel workbooks, such as generating charts, using formulas, and providing data validation rules. The project is very active and has many users, so it should be safe to include it in your production workflows.

Writing helper Python functions with arcpy

If you have been working with arcpy for some time now, you might have found at some point that you reuse the same chunks of code in various modules, and that making a change in one place requires modifications in every other file involved.

At this stage, you should consider wrapping those bits of code into functions. This will make it easier to organize your code and make modifications as your project grows. Another thing with arcpy is that you might find yourself going through the same procedure when starting work on a project. For instance, you set up a workspace, supply a feature class path, use an arcpy.da cursor to get values from a column, and then start doing some analysis based on that. You may of course have multiple projects, each of which will require a different setup, but this one is fairly common among many developers.

This may be a good time to compile a site package, or just a helper module, that will contain the functions and methods you need to use regularly.

An excellent example of a compiled package of arcpy helper functions can be found here on GitHub; the package is called arcapi. Caleb and Filip have done a great job putting together dozens of useful functions that nearly every arcpy developer uses daily. Take a look at the amazing work they have done.

So, for instance, you often need a sorted list of the unique values within a column in a feature class. You could write a helper function that returns a sorted list and keep it in a helper module, apart from the project where you focus on implementing the business logic.
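A minimal sketch of such a function:

```python
import arcpy

def select_distinct(fc, field_name):
    """Return a sorted list of the unique values found in a field."""
    with arcpy.da.SearchCursor(fc, [field_name]) as cursor:
        # Skip NULLs so sorting mixed types doesn't fail
        return sorted({row[0] for row in cursor if row[0] is not None})
```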

Provided this function is stored within a separate module named arcpy_helper.py, it's just a matter of importing that module and calling:

arcpy_helper.select_distinct("path_to_fc", "input_field_name")

Much less code, much more productive.

Sometimes it’s not about bringing in a new piece of functionality though, but rather about writing less code.

Compare two function calls:

fields = [field.name for field in arcpy.ListFields(in_fc, field_type="DOUBLE")]

and

fields = get_fields(in_fc, "DOUBLE")

Both of these calls will give you a list of the fields of double type found in a feature class or table. However, the second one is much shorter. If you need to run this piece of code just once, you are OK. But if you call it fairly often throughout a module, you can save a lot of space by using a wrapper function (sometimes referred to as syntactic sugar) that is more concise and can improve code readability.
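The wrapper itself is tiny; a sketch:

```python
import arcpy

def get_fields(in_fc, field_type='All'):
    """Wrapper around arcpy.ListFields that returns just the field names."""
    return [field.name for field in arcpy.ListFields(in_fc, field_type=field_type)]
```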

However, be cautious when trying to implement functionality that you think is missing in arcpy, because there is a chance a ready-to-use geoprocessing tool exists for that. For instance, when you need to find out how many times a certain value is found in a column, the Summary Statistics GP tool can be used. When trying to remove duplicate rows or features, the Delete Identical GP tool can be used.

But in other cases, it might be worth wrapping a chunk of code you think you will need "just for this project" into a helper function. You can collect those, and at some point you will realize how much time you save by calling them instead of searching through your projects' code looking for the snippet you need.

I will leave you with a helper function I’ve written for listing out fields where there are no values stored (i.e., all rows have NULL).
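A minimal sketch of that function (it checks each field with a separate cursor, which is fine for moderately sized feature classes):

```python
import arcpy

def get_empty_fields(fc):
    """Return the names of fields where every row is NULL."""
    fields = [f.name for f in arcpy.ListFields(fc)
              if f.type not in ('Geometry', 'OID')]
    empty_fields = []
    for field in fields:
        with arcpy.da.SearchCursor(fc, [field]) as cursor:
            if all(row[0] is None for row in cursor):
                empty_fields.append(field)
    return empty_fields
```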