How to print line being executed when running a Python script

If you have ever needed to run a rather large data processing script, you know it can be difficult to track the progress of the script's execution. If you copy a number of files from one directory into another, you can easily show progress by figuring out the total size of all the files and then printing how much has already been copied or how much is left. However, if your program does many things and executes code from third-party packages, there is a risk you won't have a clue how much time is left, or at least where you are in the program, that is, what line of code is currently being executed.

A simple solution is to spread print statements around the program to show the progress. This approach works great when you have just a few key checkpoints to pay attention to; however, as your program grows, it may become vital to be able to tell exactly what line of code is being executed. This can come in handy if the program no longer seems to be doing anything and you would like to re-execute it, but do not want to re-run the code that has already completed successfully. Adding a print statement after each line of code quickly becomes impractical.

Fortunately, there is a built-in module, trace, available in both Python 2 and 3, which you can use to show the progress of your program's execution. To learn more about the trace module, take a look at the standard library docs page and the Python MOTW page.

To provide a simple example, if you have a Python script containing the following:

import math
import time

print(math.pow(2, 3))
print(math.pow(2, 4))
print(math.pow(2, 5))

then you can run python -m trace -t --ignore-dir C:\Python36 .\ to see live updates on which line of your program is being executed. This means you can start your long-running script in a terminal and come back to it now and then to check its progress, because it prints each line as it is executed. The handy --ignore-dir option lets you filter out calls to internal Python modules so your terminal won't be polluted with unnecessary details.

On Windows, be aware of a bug in CPython that breaks this because of how directory comparison works on case-insensitive file systems (such as NTFS). Be sure to specify the path to the Python interpreter directory using the right case (C:\Python36 would work, but c:\python36 would not).

You can also provide multiple directories to ignore, but be aware of which environment you run your Python script in on Windows, because the syntax differs:

  • Git Bash: $ python -m trace -t --ignore-dir 'C:/Python36;C:/Util'
  • cmd: python -m trace -t --ignore-dir C:\Python36;C:\Util .\
  • PowerShell: python -m trace -t --ignore-dir 'C:\Python36;C:\Util' .\

On Linux, it seems you don't have to provide the --ignore-dir argument at all to filter out the system calls:

linuxuser@LinuxMachine:~/Development$ python -m trace -t run.py
 --- modulename: run, funcname: <module>
run.py(1): import math
run.py(2): import time
run.py(4): print(math.pow(2, 3))
8.0
run.py(5): print(math.pow(2, 4))
16.0
run.py(6): print(math.pow(2, 5))
32.0
 --- modulename: trace, funcname: _unsettrace
sys.settrace(None)

The trace module also has other uses, such as generating code coverage and branching reports, which can be useful if you would like to see which branch of an if-else was taken during program execution. However, you wouldn't use the trace module only to generate code coverage, because there are dedicated Python packages that provide much richer functionality for this, so be sure to use those instead.
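Beyond the command line, the module can also be driven from code. Below is a minimal sketch of line-count mode using the stdlib trace.Trace class; the demo function is made up for illustration:

```python
import trace

def demo():
    # a few lines whose execution counts we want to collect
    total = 0
    for i in range(3):
        total += i
    return total

# count executed lines instead of printing them (count=1, trace=0);
# trace=1 would print each line as it runs, like `python -m trace -t`
tracer = trace.Trace(count=1, trace=0)
result = tracer.runfunc(demo)

# results().counts maps (filename, lineno) -> number of executions
counts = tracer.results().counts
```

Calling tracer.results().write_results() would dump annotated .cover files, which is the same idea the dedicated coverage packages implement in a much richer way.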


Python static code analysis and linting

Over the past years, I have been using various tools that have helped me write better Python code and catch common errors before committing it. A linter is a piece of software that can help with that, and there are a few Python linters capable of finding and reporting issues with your code. I would like to split the types of issues a linter can report into three groups:

Obvious code errors that would cause runtime errors.

Those are easy ones. To mention a few:

  • you forgot to declare a variable before using it;
  • you supplied the wrong number of arguments to a function;
  • you tried to access a non-existent class property or method.

Linters help you catch these errors, so it is a good idea to run a linter on your Python modules before executing them. You would still need to fix your code manually; PyLint or flake8 could be used to find the issues.
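As a tiny illustration (all names are made up), each function below contains one of the three error classes; a linter flags all of them statically, before the code ever runs:

```python
def undeclared_variable():
    # pylint E0602 (undefined-variable): 'counter' is never assigned
    return counter + 1

def wrong_argument_count():
    def add(a, b):
        return a + b
    # pylint E1120 (no-value-for-parameter): 'b' is missing
    return add(1)

def missing_attribute():
    class Point:
        def __init__(self):
            self.x = 0
    # pylint E1101 (no-member): Point has no attribute 'y'
    return Point().y
```

Each of these would raise only at call time (NameError, TypeError, AttributeError respectively), which is exactly why catching them statically is valuable.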

Style related issues that do not cause runtime errors.

Those are easy ones, too. To mention a few:

  • a code line is too long, making it difficult to read;
  • a docstring uses single quotes (instead of double quotes);
  • you have two statements on the same line of code (separated by a semicolon);
  • you have too many spaces around certain operators, such as assignment.

Linters can also help you catch these issues. They are less critical, as you won't get any runtime errors because of them; fixing them, however, will make your code more consistent and easier to work with.

Making such changes as separating function arguments with a space or breaking a long line manually can be tedious. It becomes even more difficult when you are working with a legacy Python module that was written without any style guidelines: you could need to reformat every single line of the code, which is very impractical.

Fortunately, there are a couple of Python code formatters (such as autopep8 and yapf) that can reformat a Python module in place, meaning you don't have to do it manually. Formatters depend on a configuration that defines how certain issues should be handled, for instance, the maximum line length or whether all function arguments should each be supplied on a separate line. The configuration file is read every time the formatter runs, which makes it possible to share the same code style; this is of utmost importance when you are a team of Python developers.

The general style guidelines can be found in PEP-8, which is the de facto standard for code formatting in the Python community. However, if the PEP-8 suggestions don't work for your team, you can tweak them; the important thing is that everyone agrees on and sticks to the standard you decide to use.

Code quality, maintainability, and compliance to best practices

Those are more complex, mainly because it is much harder to identify less obvious issues in code. To mention a few:

  • a class has too many methods and properties;
  • there are too many nested if and else statements;
  • a function is too complex and should be split into multiple ones.

There are just a few linters that can give you hints on these, and you would obviously need to modify your code manually.

As I used linters more and more, I was exposed to various issues I had not thought of earlier. At that point I realized that the linters I used could not catch every kind of issue, leaving some of them for me to find while debugging. So I started searching for other Python linters and Python rulesets that I could learn from to write better code.

There are many Python linters and programs that can help you with static analysis; a dedicated list is maintained in the awesome-static-analysis repository.

I would like to share a couple of helpful tools not listed there. The obvious errors and stylistic issues are less interesting, because there are so many Python linters that report those, such as pylint and flake8. For more complex programs, however, getting some insight into possible refactoring can often be more important. Knowing the best practices and idioms of the language can also make your Python code easier to understand and maintain. There are even companies that develop products to check the quality of your code and report possible issues; reading through their rulesets is also very helpful.

wemake-python-styleguide is a flake8 plugin that aggregates many other flake8 plugins, reporting a huge number of issues across all three categories discussed above. Its custom rules (not reported by any other flake8 plugin) can be found on its docs page.

SonarSource linters are available in multiple IDEs and can report all kinds of bugs, code smells, and pieces of code that are too complex and should be refactored. Make sure to read through the ruleset; it is a great one.

Semmle is not an open-source product, but its ruleset is very helpful and worth reviewing.

SourceMeter is not an open-source product either (though there is a free version), but its ruleset is also very helpful and worth reviewing.

Printing pretty tables with Python in ArcGIS

This post will be of interest to ArcGIS users authoring custom Python script tools who need to print tables in the tool dialog box. You will also benefit from the following information if you need to print out some information in the Python window of ArcMap while doing ad hoc data exploration.

Fairly often, your only way to communicate the results of a tool's execution is to print out a table the user can look at. It is possible to create an Excel file using a Python package such as xlsxwriter, or by exporting an existing data structure such as a pandas data frame into an Excel or .csv file the user can open. Keep in mind that it is possible to start Excel with the file open using the os.system command:

os.system('start excel.exe {0}'.format(excel_path))

However, if you only need to print some simple information in a table format within the dialog box of the running tool, you can construct such a table using built-in Python. This is particularly helpful in cases where you cannot guarantee that the end user will have third-party Python packages installed, or where the output table is really small and is not supposed to be analyzed or processed further.

However, as soon as you try to build something flexible, with varying column widths, or when you don't know beforehand what columns and data the table will be printed with, it gets very tedious. You need to manipulate multiple strings and tuples, making sure everything draws properly.

In these cases, it is so much nicer to take advantage of external Python packages where all these concerns have already been taken care of. I have been using tabulate, but there are a few others, such as PrettyTable and texttable, both of which will generate a formatted text table using ASCII characters.

To give you a sense of the tabulate package, compare the code necessary to produce a nice table using hand-formatted strings (the first part) with the code using the tabulate package (the second part):
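As a minimal sketch of the contrast (the layer names and numbers below are made up), the built-in approach has to measure columns and build format strings by hand, while tabulate reduces it to one call:

```python
rows = [("Roads", 1500, 27.3), ("Parcels", 2280, 0.0)]
headers = ("Layer", "Count", "Length")

# built-in approach: measure each column, then build a format template
widths = [max(len(str(value)) for value in column)
          for column in zip(headers, *rows)]
template = "  ".join("{{:<{}}}".format(width) for width in widths)
table = "\n".join([template.format(*headers)] +
                  [template.format(*row) for row in rows])
print(table)

# with the tabulate package, the same becomes a one-liner:
# from tabulate import tabulate
# print(tabulate(rows, headers=headers))
```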

The output of the table produced using the built-in modules only:


The output of the table produced using the tabulate module:




Using Python start up script for all Python interpreters

This post would be helpful for users of desktop GIS software such as ArcMap who need to use Python inside those applications.

There is a not-so-well-known trick to trigger the execution of a Python script before any Python interpreter on your system starts.

Note: If you are a QGIS user, there is a special way of achieving this. Please see the question Script that runs automatically from the QGIS Python Console when QGIS starts, using schedule tasks on Windows 10 for details.

The way to do this is to set up an environment variable called PYTHONSTARTUP in your operating system. You need to do two things:

  1. Create an environment variable that points to the path of a valid Python script file (.py) containing the code you would like to be executed before any Python interactive interpreter starts. Look at the question Installing pythonstartup file for details.
  2. Write Python code that you would like to get executed.

A very important thing to consider is that

The file is executed in the same namespace where interactive commands are executed so that objects defined or imported in it can be used without qualification in the interactive session.

This means you can do a bunch of imports and define multiple variables that will be available to you directly at the start-up of your GIS application. This is very handy because I often need to import the os and sys modules, import the arcpy.mapping module, and create an mxd variable pointing to the current map document I have open in ArcMap.

Here is the code of my startup Python script, which you can modify to suit your needs. If your workflow relies on having some data at hand, you might need to expose more variables. I have ArcMap and ArcGIS Pro users in mind.

I have included in the example above a more specific workflow for quickly executing SQL queries against an enterprise geodatabase (SDE). When ArcMap has started, you only need to create a conn variable pointing to a database connection file (.sde) and then use the sql() function to run your query. Thanks to the tabulate package, the list of lists you get back is drawn in a nice table format.
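A trimmed-down sketch of such a startup script might look roughly like this; the .sde path and the sql() helper name are illustrative, and the arcpy-specific parts only take effect inside an ArcGIS session:

```python
# startup.py -- point the PYTHONSTARTUP environment variable at this file
import os
import sys

try:
    # inside ArcMap these imports succeed and mxd points to the open document
    import arcpy
    import arcpy.mapping as mapping
    mxd = mapping.MapDocument("CURRENT")
except ImportError:
    arcpy = None  # plain Python interpreter outside ArcGIS

def sql(query, conn=r"C:\connections\prod.sde"):
    """Run a SQL query against an enterprise geodatabase (path is illustrative)."""
    if arcpy is None:
        raise RuntimeError("arcpy is required to execute SDE queries")
    return arcpy.ArcSDESQLExecute(conn).execute(query)
```

On Windows, the environment variable can be set with, for example, setx PYTHONSTARTUP C:\scripts\startup.py, after which every interactive interpreter (including the ArcMap Python window) runs this file first.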




Changing selection color in exporting ArcMap layout with ArcPy

If you are an ArcMap user and you export your map layouts to images or .pdf files using arcpy, you may have noticed that the selection symbol always stays the same (the default light green color). I was recently looking for a way to change the selection color so that it would be respected when exporting map layouts programmatically with arcpy.mapping.ExportToPNG().

Changing the selection color in ArcMap (Selection menu > Selection Options) does change it, but the change is only respected when exporting the layout manually from the File menu or when exporting it from the Python window (which respects the selection color set in ArcMap since you are inside a live ArcMap session). Exporting a layout using arcpy, though, always uses the default light green selection color.

I have found out that the arcpy.mapping.Layer object does not expose the selection symbol property, according to the Layer docs:

Not all layer properties are accessible through the Layer object. There are many properties available in the ArcMap Layer Properties dialog box that are not exposed to the arcpy scripting environment (for example, display properties, field aliases, selection symbology, and so on).

Adding duplicate layers pointing to the same data source and setting the definition queries manually sounds really cumbersome, particularly when I don't know beforehand which layers I will need to run selections on.

I have solved it in another way.

  1. I pre-create three layers, one for each shape type (point, polyline, and polygon), with the necessary selection symbology (done manually).
  2. Before exporting the map document's layout, I duplicate the layer I will run selections on and update its symbology using the .lyr file.
  3. Then I set a definition query on the dummy layer, which serves only as a selection symbol layer.

All of this is done programmatically, so the only thing I really need to supply is the name of the layer that selections will run on. It works great and gives me the flexibility to use a custom selection color on any layer inside any map document.

The script source code:
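As a rough sketch of the three steps above (rather than the full script): it assumes arcpy.mapping from an ArcGIS Desktop install, and the layer name, .lyr path, and where clause are all hypothetical:

```python
def export_with_selection_color(mxd_path, layer_name, where_clause, out_png,
                                selection_lyr=r"C:\styles\selection_polygon.lyr"):
    """Export a layout drawing 'selected' features with a custom symbol."""
    import arcpy
    import arcpy.mapping as mapping

    mxd = mapping.MapDocument(mxd_path)
    df = mapping.ListDataFrames(mxd)[0]
    source = mapping.ListLayers(mxd, layer_name, df)[0]

    # step 2: add a dummy layer carrying the pre-created selection symbology
    # and repoint it at the same data source as the target layer
    dummy = mapping.Layer(selection_lyr)
    mapping.AddLayer(df, dummy, "TOP")
    dummy = mapping.ListLayers(mxd, dummy.name, df)[0]
    dummy.replaceDataSource(source.workspacePath, "NONE", source.datasetName)

    # step 3: the definition query restricts the dummy layer to the rows
    # that should appear "selected" in the export
    dummy.definitionQuery = where_clause

    mapping.ExportToPNG(mxd, out_png)
```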

Working with SQL in Jupyter notebook and dumping pandas into a SQL database

I have previously posted an example of using the SQL magic inside Jupyter notebooks. Today, I will show you how to execute a SQL query against a PostGIS database, get the results back into a pandas DataFrame object, manipulate it, and then dump the DataFrame into a brand new table inside the very same database. This can be very handy if some of your operations are better done using plain SQL, for instance with the help of spatial SQL functions, while others are easier to perform using pandas, for instance adding and calculating new columns when the data you need to access is not stored inside the database.

The steps are as follows:

  • Connect to a database using the SQL magic syntax
  • Execute a SELECT SQL query getting back a result set
  • Read the result set into a pandas DataFrame object
  • Do stuff with the DataFrame object
  • Dump it to a new table in the database you are connected to using the PERSIST command

Again, this is very handy for prototyping: when you need to produce some tables while doing database design, when you need a temporary table created for some application to read, or when you are running integration tests and need a mock-up table to work with.
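The same round trip can be sketched without the notebook magics; here the stdlib sqlite3 module stands in for PostGIS, and the table and column names are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE roads (name TEXT, length_km REAL)")
conn.executemany("INSERT INTO roads VALUES (?, ?)",
                 [("M4", 192.0), ("A40", 412.0)])

# SELECT -> result set -> manipulate in Python -> dump into a new table,
# which mirrors what %sql / result.DataFrame() / PERSIST do in the notebook
rows = conn.execute("SELECT name, length_km FROM roads").fetchall()
rows_miles = [(name, round(km * 0.621371, 1)) for name, km in rows]
conn.execute("CREATE TABLE roads_miles (name TEXT, length_mi REAL)")
conn.executemany("INSERT INTO roads_miles VALUES (?, ?)", rows_miles)
```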

The sample notebook is available as a gist:


Adding IPython SQL magic to Jupyter notebook

If you do not use the %%sql magic in your Jupyter notebook, the output of your SQL queries will be just a plain list of tuples. A better way to work with the returned result sets is to draw them as a table with headers. This is where the IPython SQL magic gets very handy. You can install it with pip install ipython-sql; refer to its GitHub repository for details of the implementation.

You connect to a database once, and then all your subsequent SQL queries will be aware of this connection and their result sets will be drawn nicely as tables. Another neat feature of the %%sql magic is that you can get the result of your SQL query as a pandas data frame object if you would like to continue working with the result set in Python. The result object of the SQL query execution can be accessed through the variable _. This is because IPython's output caching system defines several global variables; _ (a single underscore) stores the previous output, just as in the IPython interpreter.

Look into this sample Jupyter notebook for illustration.