Publishing Python scripts as geoprocessing services: best practices

Why Python instead of models?

If you have been publishing your ModelBuilder models as geoprocessing (hereafter GP) services, you have probably realized that it can be quite cumbersome. If you haven’t moved to Python yet, I think you really should. Authoring Python scripts has serious advantages over authoring models in the context of publishing GP services. During the publishing process, ArcGIS Server turns data paths and anything else that may need to change into variables, which can mess up the model if you haven’t followed the guidelines on authoring GP services. My rule of thumb was that once a model contains more than 10 objects, it is a good time to switch to Python. Another advantage is that you can easily make modifications in the Python code without republishing; in contrast, you need to republish the model each time you want to release an updated version of the GP service. Finally, since you don’t need to restart the GP service when updating the Python file (whereas republishing the model requires restarting the service), there is no downtime and users won’t notice anything.

What happens after publishing?

Let’s take a look at what is going on under the hood. You have run your script tool in ArcMap and got the result published as a service. Now you can find your service and all the accompanying data inside the arcgisserver folder somewhere on your disk drive. The path would be: C:\arcgisserver\directories\arcgissystem\arcgisinput\%GPServiceName%.GPServer

You will find a bunch of files within the folder. Let’s inspect some of them:

  • serviceconfiguration.json – provides an overview of all the properties of the service, including its execution type, enabled capabilities, output directory and many others. Here you will see all the settings you usually see in the Service Editor window.
  • manifest.xml and manifest.json – provide an overview of the system settings that were used while publishing the service. These are not files you would usually need to inspect.

Inside the esriinfo/metadata folder there is a file named metadata.xml which is really helpful because it records when the service was published. Two tags you should look at are:

  • <CreaDate>20141204</CreaDate>
  • <CreaTime>15443700</CreaTime>

Since this information is not exposed from the GUI in ArcGIS Desktop or ArcGIS Server Manager, this is the only way to find out what time the service was created. This information may be very handy when you are unsure about the release versions.
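
For example, a short script like this could pull those tags out (a minimal sketch: the service name in the path is a placeholder for your own):

import xml.etree.ElementTree as ET

# hypothetical service name -- adjust the path to your own setup
metadata_path = r"C:\arcgisserver\directories\arcgissystem\arcgisinput\MyService.GPServer\esriinfo\metadata\metadata.xml"

tree = ET.parse(metadata_path)
# CreaDate/CreaTime can sit at different depths, so search the whole tree
crea_date = tree.getroot().find(".//CreaDate").text  # e.g. 20141204
crea_time = tree.getroot().find(".//CreaTime").text  # e.g. 15443700
print("Service created on {0} at {1}".format(crea_date, crea_time))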

Inside the extracted/v101 folder, you will find the result file and the toolbox you worked with when publishing the GP service. Here you will also find a folder named after the folder where your source Python file was stored, containing the source Python file itself.

How do I organize the Python code and files?

Let’s look inside the Python file. You might have noticed that when publishing, some of the variables you declared were renamed to g_ESRI_variable_%id%. The rule of thumb is that you shouldn’t leave literal strings inline in your code; turn paths to datasets and names into variables declared in one place. Of course you don’t have to do this, since Esri will convert those inline strings to variables for you, but it is so much harder to refactor code full of generated variable names, so you’d better organize your code correctly from the beginning.
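
As a hedged illustration (the geodatabase path and dataset names here are made up):

import os

# declare paths and names once at the top instead of spreading literal
# strings through the code that ArcGIS Server would rename on publishing
data_gdb = r"C:\data\production.gdb"
roads_fc = os.path.join(data_gdb, "Roads")
buffered_fc = os.path.join(data_gdb, "RoadsBuffered")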

When running the script tool in ArcGIS Desktop, the scratch geodatabase is located at C:\Users\%user%\AppData\Local\Temp\scratch.gdb. However, after publishing the tool, the service will get a new scratch geodatabase. If you need to inspect the intermediate data created, go to the scratch geodatabase (the path can be retrieved with arcpy.env.scratchGDB); a new file geodatabase is created for each run of the GP service with the following notation: c:\arcgisserver\directories\arcgisjobs\%service%_gpserver\%jobid%\scratch\scratch.gdb.
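
For instance (the feature class name below is arbitrary):

import os
import arcpy

# resolves to the job-specific scratch.gdb when run as a GP service,
# and to the local Temp scratch.gdb when run as a script tool in ArcMap
scratch_gdb = arcpy.env.scratchGDB
intermediate_fc = os.path.join(scratch_gdb, "IntermediateResult")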

Keep in mind that a GP service will always use its local server jobs folder for writing intermediate data, and this behavior cannot be changed. But having the service write to the scratch workspace is actually a lot safer than writing to a designated location on disk, because there is no chance of multiple GP service instances trying to write to the same location at the same time, which can result in dead-locking and concurrency issues. Remember that each submitted GP job is assigned its own unique scratch folder and geodatabase in the arcgisjobs folder.

Make sure you don’t use arcpy.env.workspace in your code; always declare a path variable and assign it the folder or geodatabase connection. For dataset paths, use os.path.join() instead of concatenating strings. For performance reasons, use the in_memory workspace for intermediate data with the following notation:

var = os.path.join("in_memory", "FeatureClassName")

You can take advantage of the in_memory workspace, but for troubleshooting purposes it might be better to write something to disk to inspect later on. In this case, it is handy to create a variable called something like gpTempPlace which you can switch between “in_memory” and a local file geodatabase, depending on whether you are running clean code in production or troubleshooting the service in the staging environment, as in the sketch below.
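
A minimal sketch of that toggle (the debug geodatabase path is hypothetical):

import os

# flip between a file geodatabase (troubleshooting) and in_memory (production)
troubleshooting = False
gpTempPlace = r"C:\temp\debug.gdb" if troubleshooting else "in_memory"
buffered_fc = os.path.join(gpTempPlace, "RoadsBuffered")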

Make sure you don’t use the same name for a variable and a feature class/field name. This sometimes leads to unexpected results when running the service. It might be helpful to add “_fc” at the end of a feature class variable and “_field” at the end of a field variable. This way, you will also be able to tell them apart much more easily. The same applies to feature class and feature layer (created with the Make Feature Layer GP tool) names.
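
For example (the dataset and field names are made up):

import os
import arcpy

counties_fc = os.path.join(r"C:\data\production.gdb", "Counties")  # feature class
name_field = "NAME"  # field
# suffix the layer name as well so it never clashes with the feature class
counties_lyr = arcpy.MakeFeatureLayer_management(counties_fc, "counties_lyr")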

Remember that you can adjust the logging level for ArcGIS Server as a whole (in Manager) or for the GP service only (in the Service Editor window) for troubleshooting purposes. It is often useful to set the Message Level setting to Info before publishing the GP service into production, because this will give you detailed information about what exactly went wrong when running the GP service. You can access this information either from the Results window in ArcMap or from the ArcGIS Server logs in Manager.
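
On the code side, the messages you emit with the standard arcpy functions are what surfaces at those levels, for example:

import arcpy

arcpy.AddMessage("Step 1 of 3: buffering input features")      # Info level
arcpy.AddWarning("Input contained empty geometries; skipped")  # Warning level
arcpy.AddError("Required dataset is missing, aborting")        # Error level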

How do I update the published service?

A special word should go to those users who need to publish GP services often while making changes in the code. It is important to understand that after publishing the GP service, the copied Python code file and toolbox don’t maintain any connection to the source Python file and toolbox you authored. This implies that after making edits to the script tool, you need to push those changes to the published service on the server.

There are two types of changes you can make to your source project: the tool parameters in the script tool properties, and the Python code. Keep in mind that you cannot edit the published toolbox; so if you added a new parameter or modified an existing parameter’s data type, you need to republish the service. However, if you have only modified the Python script source code, there is no need to republish the whole service; you only need to replace the contents of the Python file.

Even though you can automate the service publishing workflow with Python, it still takes time to move the toolbox and the Python code files. Therefore, you can save a lot of time by finding a way to replace the Python code in the published Python code file. To update it, you really have just two options: either copy and replace the published file with the updated source file, or copy/paste the source code. This approach may work if you have a tiny Python script and all the paths to the data are the same on your machine and on the server, which is plausible when you have Desktop and Server on the same machine. If the Python source code gets all its paths and dataset names from a configuration file, you could also safely replace the published Python file. However, this is still an extra thing to do.
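
If you do go the copy-and-replace route, it can be scripted in a couple of lines (a sketch: both paths below are placeholders for your own setup):

import shutil

source_py = r"C:\projects\mytool\mytool.py"
published_py = r"C:\arcgisserver\directories\arcgissystem\arcgisinput\MyService.GPServer\extracted\v101\mytool_folder\mytool.py"
shutil.copyfile(source_py, published_py)  # overwrite the published copy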

The best practice I arrived at after working with GP services for two years is to split the code files from the tool itself. Let me explain.

How do I separate the toolbox logic and Python code?

Create a Python file (a caller script) which will contain all the import statements for the Python modules and your own files. By appending the path to your Python files to sys.path, you will be able to import the Python files you are working on; this is very useful when your GP service consists not just of one file but of multiple modules.

import sys
import socket
import arcpy

# append the UNC path to the shared folder that holds your code files;
# "path to code files" is a placeholder for the share name, e.g. r'\mycodefolder'
sys.path.append(r'\\' + socket.gethostname() + r'\path to code files')
import codefile1  # your Python file with business logic
import codefile2  # your Python file with business logic

This file should also include all the parameters which are exposed in the script tool.

Param1 = arcpy.GetParameterAsText(0)
Param2 = arcpy.GetParameterAsText(1)
Param3 = arcpy.GetParameterAsText(2)

Then you define a main function which will be executed when the GP service runs; it calls the functions defined within the files you imported.

def mainworkflow(Param1, Param2):
    Result = codefile1.functionName(Param1, Param2)
    return Result

It is also handy to add some parameter handling logic. When you run the script tool in ArcMap before publishing, the values you supply become the default values users will see when executing the GP service from ArcMap or any other custom interface. To avoid that, you can leave those parameters empty when running the tool for publishing purposes and just return an empty output.

if Param1 == "" and Param2 == "":
    Result = ""
else:
    Result = mainworkflow(Param1, Param2)

Create a script tool from this caller Python file, defining the parameters and their data types. After the tool is published as a GP service, you can keep working on the Python files which contain only the code that actually does the job.

After making and saving changes in the code, feel free to run the GP service directly: the caller Python file (published as a GP service) will import codefile1 from the folder you specified and run the code. There is no need to restart the GP service or re-import your updated module.
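
To make the split concrete, here is a minimal sketch of what a business-logic module such as codefile1 might contain (functionName matches the call in the caller script above; the buffer operation is just an illustrative placeholder):

# codefile1.py -- business logic only, no GetParameterAsText calls here
import os
import arcpy

def functionName(Param1, Param2):
    # parameters arrive already read by the caller script
    out_fc = os.path.join(arcpy.env.scratchGDB, "Buffered")
    arcpy.Buffer_analysis(Param1, out_fc, Param2)
    return out_fc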

11 thoughts on “Publishing Python scripts as geoprocessing services: best practices”

  1. “Make sure you don’t use the same name for a variable and a feature class/field name. This sometimes leads to unexpected results when running the service. It might be helpful to add “_fc” at the end of a feature class variable and “_field” at the end of a field variable.”

    Absolutely true!

    I was struggling to get my GP service working and kept receiving:
    “ERROR 000622: Failed to execute (Make Feature Layer). Parameters are not valid.
    ERROR 000628: Cannot set input into parameter out_layer.”

    Apparently, the problem was that my variable had the same name as the output parameter. When I appended “_lyr” to the output name, the service ran successfully.
    Thanks for the post, this is really great advice for publishing scripts onto ArcGIS Server!

  2. Hi,
    I have a question: when you talk about using the chunk below, what do you mean by “path to code files”? Do you mean the full path on my local machine, e.g. “C:\ScriptFolder”, or a relative path to codefile1 and codefile2? I am just confused about how this “path to code files” will be resolved after publishing the service, since the location will change, of course.

    Thanks!

    import sys
    import socket
    sys.path.append(r'\\' + socket.gethostname() + "path to code files")
    import codefile1 #your Python file with business logic
    import codefile2 #your Python file with business logic

    1. Hi! If your server machine is named `server1` and you have shared a folder `mycodefolder` on it, then you would have sys.path.append(r'\\' + socket.gethostname() + r'\mycodefolder'). Always use a shared folder path.

      1. @Alex, by “shared folder” on the server you mean a folder that has been registered in the Data Store related to that server, i.e. a folder that is both recognized by Desktop on my PC and Server on a remote machine?
        Thanks!

  3. I’m struggling with the following error:

    “ERROR 000732: Input Features: Dataset C:\\arcgisserver\\directories\\arcgissystem\\arcgisinput\\GP\\ExhibitsWebPrinting.GPServer\\extracted\\v101\\tbx_egdb.sde\\BDWMD_COUNTIES does not exist or is not supported”

    I’ve tried both the os.path.join method and simply quoting the entire path plus feature class name, but it keeps failing on arcpy.MakeFeatureLayer_management() with no resolution.

    Attempt 1 fails:

    ws_egdb = r'\\…\agsdev\SrcData\Feature\SDE\tbx_egdb.sde'
    fc_AdminBndy3_name = r'egdb.WMEGD.BDWMD_COUNTIES'
    fc_AdminBndy3 = os.path.join(ws_egdb, fc_AdminBndy3_name)
    fl_AdminBndy3 = os.path.join("in_memory", "fl_AdminBndy3")
    arcpy.MakeFeatureLayer_management(fc_AdminBndy3, fl_AdminBndy3)

    Attempt 2 fails:

    fc_AdminBndy3 = r'\\…\agsdev\SrcData\Feature\SDE\tbx_egdb.sde\egdb.WMEGD.BDWMD_COUNTIES'
    fl_AdminBndy3 = os.path.join("in_memory", "fl_AdminBndy3")
    arcpy.MakeFeatureLayer_management(fc_AdminBndy3, fl_AdminBndy3)

  4. Alex, thank you for this post. I’m working on a project that has three environments (Development, Staging, and Production). Each environment has multiple ArcGIS Servers and more than 20 GP services. The services get updated frequently. This technique will save us hours every time the Python code for a GP service is changed.

    1. Hey Larry, this sounds great. One of the projects I have is just like yours with multiple envs. Being able to skip re-publishing the GP services is very nice. Another thing I suggest putting some thought into is configuration of the constants and environment variables (that is, the context). Try to have a single configuration file that is available in all three environments and picks its values based on where the Python code is being executed. This will free you from updating the config values every time you move between the envs. For this, I use os.getenv(): I have a Windows env variable that I can read from Python. When Python knows which environment it’s running in, it’s just a matter of picking the right item from the list. For instance: db_name = ['a', 'b', 'c'][idx] where idx comes from {"Dev": 0, "Staging": 1, "Prod": 2} and os.getenv('MyEnv'). Hope this makes sense. Good luck, and feel free to share some of the best practices you’ve found useful while working with GP services and Python.
