Geospatial careers: list of companies

I have published a curated list of companies in geospatial industry which you can use when looking for a job. The list is sorted by country and with some extent by industry domain. I have compiled it over many years while I was a part of GIS industry.

https://github.com/AlexArcPy/geospatial-careers

Each company is documented in format of [company website]: very brief description of what they do. Obviously quite a few companies work with a variety things so they can be hard to categorize. A company may have jobs for GIS developers, digital mappers, photogrammetry experts, machine learning professionals, and so forth. I suggest exploring their home page to learn more about them. It is also possible that a company did just one thing when this list was published, but then they expanded and now advertize for other job titles, too.

It is also possible that a company is located under a certain country section, but may have offices in other countries, too. However, keep in mind that they may look for geospatial positions only in a particular country/office. Company description is not comprehensive: a company may do many other things apart from geospatial related operations, but I won’t mention them as they are irrelevant in this context. Some companies may permit working remotely. Again, please explore the company website to double-check. Due to the dynamic nature of the Internet, if the URL is broken, just use a web search engine to find a company’s website.

This page won’t be updated on a regular basis so it is pretty static. If you know a company in geospatial sector, by all means, please do submit a pull request so we could expand this list. The geospatial industry is fairly small so I thought sharing this list with the community would benefit both the companies looking for talent and peer professionals looking for a job.

Good luck with job hunting!

Advertisements
Featured

Leaving the GIS industry: bye for now

After working for more than 10 years in geospatial industry, I have decided to change the field and focus on pure software engineering. I have quite enjoyed using GIS tools to solve various computational problems and software development tools to build useful GIS products and services for my customers. However, as my interests starting shifting towards clean code and programming per se – I have noticed I was reading in the bed Refactoring by Martin Fowler way more often than The ESRI guide to GIS analysis by Andy Mitchell – so I thought it would be useful to try changing the career path.

As I won’t be using any of the GIS software any longer, I won’t be able to post any new practical material that would be useful to GIS analysts and developers. However, in this post I would like to share some of the last thoughts I have which could be of interest to peer GIS professionals.

This blog becomes an archive of hopefully useful resources. Good luck!

Skills

SQL

I cannot stress enough how important it is for anyone using GIS data to master SQL. It is such an integral part of nearly any GIS processing workflow and being able to process or manage data stored in a DBMS using SQL becomes crucial. As GIS datasets grow in size, writing scripts for data processing won’t suffice as there are still no efficient spatial data processing packages that could make the process consistent, fast, and reliable. Using PostGIS or SQL Server native spatial types can get you often farther than any open source Python package. Don’t stop mastering SQL after learning the basics as there is so much more.

I have two massive posts about using SQL functions for GIS:

Math

Don’t bother too much studying math. I am not sure why many people I have spoken to think that in order to be good at GIS one has to be good at math. Being able to operate high school algebra and geometry terms is definitely useful (angle between intersecting roads, area of a lake, percentage of parcel covered by forest), but there is no need to study more sophisticated math. For a casual GIS analyst work, having the basics covered will be more than enough.

Of course if you are building a new routing algorithm that will be more efficient that A*, then you would have to review a lot of graph theory materials or if you are building a new Voronoi diagram generation procedure, you will find yourself reading the computational geometry books. If you are doing a lot of GIS analysis with spatial statistics stuff, then you should definitely get this amazing book Elementary Statistics for Geographers Third Edition.

If you do a lot of computational intensive programming and would like to catch up on math specifically for GIS, review

Testing in GIS

Review these two posts:

Python

Learn Python. It is the most widely used programming language in GIS industry and its usage will only expand. I have written the A progression path for GIS analyst which could be used a development road map. You should be very comfortable using Python; having Python skills will let you have a stronger influence on any operations within the organization and potentially automate more manual workflows leading to a better workplace.

Linux and bash

Learn Linux and bash. I think I should have started using Linux earlier. There are a few (1, 2) ready to use VirtualBox images with a ton of open-source GIS software installed, configured and ready-to-use. Using those machines will save you a lot of time. Learning bash is extremely helpful because it would let you be much more productive executing smaller commands and building pipelines for data processing than you would normally do on Windows using a programming language. Obviously learning bash, Linux, and Python are part of the industry agnostic skill set you could benefit from having at any later point of time.

Readings

There are so many excellent GIS books that I would like to recommend. You can find most popular titles online. What I’d like to do instead is to share of the hidden gems I have discovered and have really enjoyed reviewing. You can find those in the post Useful resources in computer science/math for GIS Analysts.

Programming

Ad-hoc mentality is very difficult to fight. It is 7 pm. You have a job you have to get done before releasing the data to a customer tomorrow morning. You are adding a missing domain to a geodatabase that your colleague’s Python script failed to add. Then you are changing a data type for a field because you have just realized that you need to store text instead of numbers. And… you find a few other things you are so glad to have spotted before the release. You fix them, zip the database, and upload it onto an FTP site. It is 10 pm, you are tired but happy and are heading off to the bed.

Success! … Or is it? The next thing tomorrow morning you want to document the manual changes you’ve introduced yesterday, but you are being dragged into some other urgent job… and you never do. A week after, a customer sends an email telling you she’s not able to run their in-house tools using your database you’ve prepared for them, but the one you’ve prepared a month ago works. Now it is 9 pm again and you are writing some oddly looking script trying to compare the databases and recalling what have you done on that evening… You are in a mess.

Doing what you have done may look natural because you just want to get stuff done. However, I want you to look at this from another perspective. You want your steps to be reproducible. You want to be able to track the changes you have done. Not only you, but any colleague of yours should be able to pick up the updates that have been made to any piece of data or a script. So resist the urge to get stuff done, pace yourself, and track your work with one of the following methods.

Documenting manually

If you are not comfortable programming or scripting at all, you should document each step you are taking while making modifications to a dataset. At least you could see what has been done in written form. I cannot stress that enough – you should document what you are doing, not what you have done. So write down what you have done after each change operation, not after you have done all the work. This is how it can look:

  1. You added field Area of Double type to the table dbo.Parcels.
  2. You write: “Add field Area of Double type to the table dbo.Parcels.”
  3. You dropped field OldArea of Double type in the table dbo.Parcels.
  4. You write: “Drop field OldArea of Double type in the table dbo.Parcels.”

One of the disadvantages of this approach is that it is possible to get the changes out of sync with the documentation. You could have made an error documenting a data type or a field name. Another thing is that the very same step can be done in many ways – what if you add field to a database using some GIS application and a colleague of yours uses a DBMS command line tool? Documenting exactly the procedure of making changes soon becomes tedious and you end up with tons of instructions that easily becomes misleading or plain obsolete. However, if you are vigorous, it is still possible to maintain a decent level of changes tracking with sufficiently rigid discipline.

Simply programming with VCS

Another approach is to write a program that will make the changes. When you write code, you don’t need to document what you are doing because the reader familiar with the syntax of this programming language will understand what happens. You can add some comments of course explaining why adding certain fields is required though. So, if you are building a database with a few tables, you can write a SQL script that can be re-run recreating your database at any point of time. If you never make any manual changes to a database and only write and keep SQL commands, your main SQL data compilation script will never get out of sync.

This leads us to a concept of version tracking where it is possible to track how your SQL script changed since the last version. Who is not guilty of having at some point of our career a dozen of files with some scripts named “production_final_compilation_truly_final_12.sql“? To avoid this mess, you should really use a VCS.

The main argument against this approach is that it setting up all this version control tools look like an overkill for someone doing simple GIS work. However, you will see how much safer your work will be in the long run. It will pay off very soon. Invest some time in learning about VCS such as Git for managing the source code. All major players – BitBucket, GitLab, and GitHub – provide free private repositories. Find out whether there is a VCS solution deployed in-house within your organization, such as Microsoft TFS, which you could use to check in the code. Should you like to dive deeper into Git, read the Git Pro book for free online. If you are not comfortable putting anything into the cloud (which is just someone’s else computer), use Git locally on your machine or a local server where you can securely check in your code and ask your system administrator to take backups of those repositories.

Open source vs proprietary software

Throughout your GIS career, you most likely will be exposed to both proprietary and open source software. You will have Windows on your machine with QGIS; or a Linux machine with Esri ArcGIS Server. It would be naive to think that either of these technologies is superior to another. You should be able to get the job done whatever tools you have available because you will not always be able to decide what your employer will be using.

I suggest instead being comfortable with both of them and widening your toolset as much as possible. As you become exposed to different tools, you will soon realize that commercial software can be much better for certain jobs rather than open-source or free one. For instance, certain types of spatial joins can run faster in ArcGIS Desktop rather than PostGIS, but some GDAL based raster masking may outperform ArcGIS Spatial Analyst tools. Creating a map layout with data driven pages is a pleasure in ArcMap, but can be tedious in QGIS. Always do the benchmarking to understand what tools work best and document your findings. Keep in mind that the very same tool can take 1 second to process 1,000 features and 5 minutes to process 10,000 features. Review briefly the Big O notation to avoid surprises.

I have always encouraged people using a particular tool to understand what it really does. Having a clear understanding of the underlying process will make it possible for you to extend an existing tool or write your own one. For instance, if you are applying some raster masking, you should understand what a matrix is. If you do a spatial join, you should understand how having a spatial index helps.

Always look to expand your toolset and be prepared to apply a tool that you think would be right for a particular job. A customer only has PostGIS and needs to do some polygon intersection? You use ST_Intersects native function. Don’t have access to QGIS? Know which ArcToolbox tool does the same job. You have to process a file on Linux machine you SSH into (so no Excel-like software)? Use bash or pandas to wrangle the data as needed. You shouldn’t be constrained by the environment you are in and what tools you have available at your disposal. You should be able to get the job done no matter what.

Keeping up with the industry

I have been a user of GIS StackExchange since 2013 and have blogged about my experience and why is it useful to be active on a forum in the post 4 years with GIS at StackExchange. Make a habit of reading the weekly most popular questions, for instance, every weekend. If you see a question you know the answer to, post it. It also helps to ask a question you had yourself and then you spent a week solving it and then finally found a solution. Please post an answer to your own question. You will save some effort to a peer GIS professional and you can also find this answer later when you will be doing a web search for the same issue in a few years time. If you have some time, you can review the most popular questions using the most voted questions option; there is so much to learn there.

Edit files in a mounted Linux directory in Windows

Sometimes it is very useful to be able to edit files stored on a Linux machine in a Windows application. This can be a handy setup when you want to store your source code on Linux to be able to execute it against a Linux Python interpreter but you would like to edit it in a rich GUI application such as PyCharm or Eclipse. To achieve this, you can use an open source framework that mounts a Linux directory as a Windows drive from which you can add your files to a PyCharm project.

Another use case is when an application you need to use is available under Windows only, but copying the files from Windows to Linux upon every edit is tedious.

To mount a Linux directory in Windows:

  1. Install https://www.microsoft.com/en-US/download/details.aspx?id=40784 (install x64).

  2. Install https://github.com/dokan-dev/dokany/releases/tag/v0.7.4 (ignore a message saying do you want to download the VS runtime – this is because it installs libraries for x86 you will not need, just click Cancel).

  3. Download and run https://github.com/feo-cz/win-sshfs/releases/tag/1.5.12.8. It will be available in your tray.

  4. Mount a drive as described in the section Using Win-SSHFS to Mount Remote File Systems on Windows at https://www.digitalocean.com/community/tutorials/how-to-use-sshfs-to-mount-remote-file-systems-over-ssh.

You can optionally choose to mount a Linux directory on Windows start up. Extremely handy.

Drop multiple PostgreSQL databases matching name pattern

I once had a long data processing job that created multiple database snapshots on a PostgreSQL instance. However, this process was crashing sometimes leaving the databases behind. I had to automate the process of dropping the database where databases with names matching a certain pattern would be dropped. Bash script came in handy!

In the script, I write to a text file a series of SQL drop database statements and then execute each of the line using psql. It tool me a while to get it right as I wasn’t able to find a ready-to-use Bash script online, so this may be helpful to others.

Handy Bash scripts to run OGR command line tools

If you are a user of GDAL installation (which includes OGR), then you could take advantage of running OGR command line tools such as ogrinfo and ogr2ogr which are very handy and often more efficient in data processing comparing to Python scripts written using shapely or fiona. As a rule of thumb, if your Bash script is within 20-30 lines of code then you are doing okay. If it gets longer, it is worth switching to Python for readability and maintainability of the code.

Make sure to review the Python GDAL/OGR Cookbook!, it has a ton of useful examples. Below are some snippets you can use; they will also help you learn Bash if you are not familiar with it yet.

If you will be writing a lot of Bash, I suggest using an IDE that supports it. I have been using PyCharm with an amazing plugin BashSupport. It takes the experience of writing Bash scripts to a new level. It provides syntax highlight, auto-completion, and hover hints.

How to print line being executed when running a Python script

If you have ever needed to run a rather large data processing script, you know that it may be rather difficult to track the progress of the script execution. If you copy a number of files from a directory into another one you can easily show the progress by figuring out the total size of all files and then print how much has already been copied or how much is left. However, if your program does many things and executes code from some 3rd party packages, there is a risk you won’t have a clue about how much time is left or at least where you are in the program, that is what line of code is currently being executed.

A simple solution to this is to spread the print statements around the program to show the progress. This approach works great when you have just a few key breakpoints you are paying attention to, however, as your program grows it may become vital to be able to tell exactly what line of code is being executed. This may come in handy if the program does not seem to do anything any longer and you would like to re-execute it, but do not want to run the code that has been run successfully. Adding a print statement after each line of code will soon become impractical.

Fortunately, there is a built-in module trace available both in Python 2 and 3 which you can use to show the progress of your program execution. To learn more about the trace module, take a look at the standard library docs page and on the Python MOTW page.

To provide a simple example, if you have a Python script run.py containing the following:

import math
import time

print(math.pow(2, 3))
print(math.pow(2, 4))
print(math.pow(2, 5))

then you can run python -m trace -t --ignore-dir C:\Python36 .\run.py to see the live updates on what line of your program is being executed. This means you can run your long time taking script in a terminal and then get back to it now and then to see its progress because it will print each line that is currently being executed. The handy --ignore-dir option lets you filter out calls to the internal Python modules so your terminal won’t be polluted with unnecessary details.

On Windows, be aware of the bug in CPython which breaks because of how directories comparison works incorrect on case-insensitive file systems (such as NTFS on Windows). So be sure to specify the path to the Python interpreter directory using the right case (C:\Python36 would work, but c:\python36 would not).

You can also provide multiple directories to ignore, but be aware of what environment you run your Python script on Windows, because you would need to use different syntax.

  • Git Bash: $ python -m trace -t --ignore-dir 'C:/Python36;C:/Util' run.py
  • cmd: python -m trace -t --ignore-dir C:\Python36;C:\Util .\run.py
  • PowerShell: python -m trace -t --ignore-dir 'C:\Python36;C:\Util' .\run.py

In Linux, it seems like you don’t have to provide the ignore-dir argument at all to filter out the system calls:

linuxuser@LinuxMachine:~/Development$ python -m trace -t run.py
— modulename: run, funcname: <module>
run.py(1): import math
run.py(2): import time
run.py(4): print(math.pow(2, 3))
8.0
run.py(5): print(math.pow(2, 4))
16.0
run.py(6): print(math.pow(2, 5))
32.0
— modulename: trace, funcname: _unsettrace
trace.py(80): sys.settrace(None)

The trace module also has other usages such as generating the code coverage and branching which can be useful if you would like to see what branch of your if-else was picked during the program execution. However, you wouldn’t use the trace module only to generate the code coverage, because there is Python package called coverage.py that provides much richer functionality for this, so be sure to use that instead.

Python static code analysis and linting

Over the past years, I have been using various tools that helped me to write better Python code and catch common errors before committing the code. Linter is a piece of software that can help with that and there are a few Python linters that are capable of finding and reporting issues with your code. I would like to split the types of issues a linter can report into three groups:

Obvious code errors that would cause runtime errors.

Those are easy ones. To mention a few:

  • you forgot to declare a variable before it is used;
  • you supplied wrong number of arguments to a function;
  • you try to access a non-existing class property or method.

Linters help you catch those errors so it is great to run the linter on your Python modules before executing them. You would need to modify your code manually. PyLint or flake8 could be used.

Style related issues that do not cause runtime errors.

Those are easy ones, too. To mention a few:

  • a code line is too large making it difficult to read;
  • a docstring has single quotes (instead of double quotes);
  • you have two statements on the same line of code (separated with semicolon ;);
  • you have too many spaces around certain operators such as assignment.

Linters can also help you catch those issues so it is great to run the linter on your Python modules before executing them. They are less critical as you won’t get any runtime errors due to those issues found. Fixing those issues, however, will make your code more consistent and easier to work with.

Making such changes as separating function arguments with a space or breaking a longer line manually could be tedious. It becomes even more difficult if you are working with a legacy Python module that was written without any style guidelines. You could need to reformat every single line of the code which is very impractical.

Fortunately, there are a couple of Python code formatters (such as autopep8 and yapf) that could reformat Python module in place meaning you don’t have do it manually. Formatters depend on the configuration that would define how certain issues should be handled, for instance, the maximum length line or whether all function arguments should be supplied each on a separate line. The configuration files is read every time formatter runs and makes it possible to use the same code style which is of utter importance when you are a team of Python developers.

The general style guidelines can be found at PEP-8 which is the de-facto standard for code formatting used by Python community. However, if PEP-8 suggestions don’t work for your team, you can tweak it; the important thing is that everyone agrees and sticks to the standard you decide to use.

Code quality, maintainability, and compliance to best practices

Those are more complex mainly because it is way harder to identify less obvious issues in the code. To mention a few:

  • a class has too many methods and properties;
  • there are two many nesting if and else statements;
  • a function is too complex and should be split into multiple ones.

There are just a few linters that can give some hints on those. You would obviously need to modify your code manually.

As I used linters more and more, I’ve been exposed to various issues I have not thought of earlier. At that point of time I realized that linters I used could not catch all kinds of issues that could exist leaving some of them for me for debugging. So, I’ve started searching for other Python linters and Python rulesets that I could learn from to write better code.

There are so many Python linters and programs that could help you with static analysis – a dedicated list is maintained under awesome-static-analysis repository.

I would like to share a couple of helpful tools not listed there that could be of great help. The obvious errors and stylistic issues are less interesting because there are so many Python linters that would report those such as pylint and flake8. However, for more complex programs, to be able to get some insights on possible refactoring can often be more important. Knowing the best practices of the language and idioms can also make your Python code easier to understand and maintain. There are even some companies that develop products that check the quality of your code and report any possible issues. Reading through their rulesets is also very helpful.

wemake-python-styleguide is a flake8 plugin that aggregates many other flake8 plugins reporting a huge number of issues of all three categories we have discussed above. Custom rules (not reported by any flake8 plugin) can be found at the docs page.

SonarSource linter is available in multiple IDEs and can report all kinds of bugs, code smells, and pieces of code that are too complex and should be refactored. Make sure to read through the ruleset, it is a great one.

Semmle ruleset is an open-source product (as part of LGTM.com), and their ruleset is very helpful and should be reviewed.

SourceMeter ruleset is not an open-source product (however, there is a free version) but their ruleset is also very helpful and should be reviewed.