Design of WebGIS back-end: architecture considerations

I have spent the last two years doing a lot of Python development and designing and implementing a web GIS that included ArcGIS Server, geoprocessing services, and an ArcGIS API for JavaScript (further JS) web client. What I would like to do is share an idea I have come to like.

If you need to do something, try doing it at the back-end

Imagine you have a JS web application where users work with feature services via a web map. They can select multiple features and calculate the sum of the values those features have in a field (or fields). Let's go through the alternatives you have.

  1. Pre-calculate the values you think your users will query and store them in the database.
    This works fine when you know that your users will generate reports on certain fields often and performance is crucial; it can make sense to calculate such values beforehand and store them. The disadvantages are the additional storage and the need to keep the values updated: the calculated field depends on other fields whose values can change. This implies re-calculating the report field as part of a daily or weekly routine, depending on the workflow.
  1. Get the feature's data from the ArcGIS Server feature service and calculate the requested value on-the-fly in the client.
    Unless you are retrieving complex geometry, this operation won't cost you much. The problem is that the volume of JS (or TypeScript) code will grow, and every modification to it implies a new release, which can be a painful process if you need to compress your code and move things around. Also, if the amount of data you work with is rather large, there is a good chance the web browser will get slow and performance will degrade significantly.
  1. Use the database server to calculate the values.
    This has become my favorite over the last years. The approach has multiple advantages.
    First, the operation runs on the database server machine with plenty of RAM and CPU resources, so you are not limited by the web browser's capacity. Database servers are very good at calculating values: this kind of operation is inexpensive because in most cases it does not involve cursors. You also have the privilege of working in a transaction, which provides a higher level of data integrity (it is hard to mess up the database since you can roll back).
    Second, you can use SQL. It might not sound like an advantage at first, but remember that code is written once and read many times. Readability counts. SQL is a clean way of communicating the workflow, and database code (such as stored procedures) is very easy to maintain. Unlike JS, you work with just one database object and don't really have any dependencies on the system, provided that you have a database server of a certain version and the privileges required to create and execute stored procedures.
    Finally, by letting the database server do the work, you expose the procedure to any other client that can call it. You don't need to modify the client code: by updating the SQL code in one place, you automatically make it available to all the applications that work with it (a minimal sketch follows below).
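
To make this concrete, here is a minimal sketch of the third approach called from a Python back-end, assuming a SQL Server database and pyodbc. Everything specific here is hypothetical: the DSN, the credentials, and the dbo.SumFieldForFeatures procedure with its parameters are stand-ins for whatever your database actually exposes.

import pyodbc

# Connect via an assumed ODBC DSN; swap in your real connection string.
conn = pyodbc.connect("DSN=gisdb;UID=web_app;PWD=secret")
cursor = conn.cursor()

# dbo.SumFieldForFeatures is an assumed stored procedure that takes a
# comma-separated list of feature IDs and a field name, and returns the sum.
cursor.execute("EXEC dbo.SumFieldForFeatures ?, ?", "101,102,107", "AREA_HA")
total = cursor.fetchone()[0]
print(total)

The web client then only needs to pass feature IDs and a field name; the summing logic lives in one place in the database.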

ArcREST: Python package for administering ArcGIS Server and ArcGIS Online/Portal

ArcREST is a great toolset I found some time ago. It is for anyone who administers ArcGIS Online, ArcGIS Portal or ArcGIS Server. In short, it is a Python wrapper for the Esri REST API. I used to write many Python scripts for updating the properties of ArcGIS Server services in batch, but I don't need to write anything like that anymore: ArcREST already covers everything I did on my own. If you are an ArcGIS Online / Portal admin, you should definitely take a look at this module, since it can save you a lot of time and spare you authoring your own scripts for managing ArcGIS Online content and organization settings.

This Python package is authored by the Esri Solutions team and is publicly available on GitHub. You can download the source code, optionally install the package, and then use it on your local machine just like any other Python package. If you don't want to install it, you can add the path to the arcrest and arcresthelper folders to the Python path by adding this to your Python file:

import sys
sys.path.append(r"path to arcrest folder")  # e.g. C:\GIS\Tools

Provided that you have a folder named arcrest inside the example Tools folder, when you run the Python file it will be able to import the arcrest package and access its modules.

To get an overview of this Python package, take a look at this excellent DevSummit 2015 video where developers of ArcREST talked about it.

Even though this is not a full implementation of the Esri REST API, it covers most of it, and Esri developers update the code to include the latest changes in the REST API. It is a good idea to clone the repository and pull the changes now and then to get the latest code if you will be using it on a daily basis.

I felt kind of sad at first that all the Python code I wrote for administering ArcGIS Server won't be used any longer, but at the same time I am so glad that ArcREST was developed. It is a great piece of software that will let you get started in no time and access all your server/online resources with Python.

Caveat: it does have some dependencies on the arcpy package, which is used for converting feature sets into JSON and back, but apart from that you should be able to run the tools on a machine with no ArcGIS software installed whatsoever.

Useful resources in computer science/math for GIS Analysts

A lot of people who are studying GIS at school, or who are already working as GIS analysts or consultants, often wonder what kind of competence will make them attractive to employers and what domains of expertise are going to be in demand in the foreseeable future.

Usually the question GIS professionals ask is how much a GIS analyst should learn from other domains: how much math, statistics, programming, and computer science should GIS analysts know? Naturally, knowing what kind of GIS-specific expertise is in demand is also very helpful. I have several posts on how to get better at GIS here, here, and here.

Knowing which GIS tools can do which jobs is definitely helpful. This is much like a woodworker who should know what tools are in his toolbox and what else is available in the woodworking shop. Finding an appropriate tool for a certain job is not so hard nowadays with Internet search engines and Q&A sites. However, the ability to understand how data processing tools work and what happens behind the scenes, so that you can interpret the analysis results, is indispensable.

What is often true for many GIS analysts is that during their studies the main focus was on GIS techniques and tools, while math and CS courses were supplementary. This makes sense, and the graduates are indeed most often competent GIS professionals capable of operating various GIS software suites, providing user support, and performing all kinds of spatial analysis. It is also possible, however, that after a career change a person who hasn't studied GIS at all is working as a GIS analyst and needs to catch up a bit. For those who feel that they lack the background competence they should have had a chance to learn during their studies, or for those who just want something that gives a broader view and a deeper understanding of GIS, I have compiled a list of useful links and books. Please enjoy!

There are lots of great questions answered on the GIS.SE web site; here are just a few:

Great books:

Spatial Mathematics: Theory and Practice through Mapping (2013)
This book provides a gentle introduction to some mathematical concepts with a focus on mapping, and might be a good place to start learning math in GIS. No advanced background in math is required; high-school math competence will be sufficient.

Table of contents

  • Geometry of the Sphere
  • Location, Trigonometry, and Measurement of the Sphere
  • Transformations: Analysis and Raster/Vector Formats
  • Replication of Results: Color and Number
  • Scale
  • Partitioning of Data: Classification and Analysis
  • Visualizing Hierarchies
  • Distribution of Data: Selected Concepts
  • Map Projections
  • Integrating Past, Present, and Future Approaches

Mathematical Techniques in GIS, Second Edition (2014)
This book gives you a fairly deep understanding of the math concepts that are applicable in GIS. To follow the first five chapters, you don't need anything beyond high school math. Later on, the book assumes good knowledge of math at the level of a college Algebra II course. If you feel that it gets hard to read, take an Algebra II course online at Khan Academy or watch some videos from MIT to catch up first, and then get back to the book. What I really liked about this book is that there are plenty of applicable examples of how to implement certain mathematical algorithms to solve basic GIS problems, such as the point in polygon problem, finding whether lines intersect, and calculating the area of overlap between two polygons. This could be particularly useful for GIS analysts who are trying to develop their own GIS tools and are looking for some background on where to get started with the theory behind the spatial algorithms.
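
To give a flavor of the algorithms covered, here is a minimal pure-Python sketch of the classic ray casting solution to the point in polygon problem (the function name and setup are mine, not the book's):

def point_in_polygon(x, y, polygon):
    # Cast a ray to the right from (x, y) and count edge crossings:
    # an odd number of crossings means the point is inside.
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y level
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

print(point_in_polygon(0.5, 0.5, [(0, 0), (1, 0), (1, 1), (0, 1)]))  # True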

Table of contents

  • Characteristics of Geographic Information
  • Numbers and Numerical Analysis
  • Algebra: Treating Numbers as Symbols
  • The Geometry of Common Shapes
  • Plane and Spherical Trigonometry
  • Differential and Integral Calculus
  • Matrices and Determinants
  • Vectors
  • Curves and Surfaces
  • 2D/3D Transformations
  • Map Projections
  • Basic Statistics
  • Correlation and Regression
  • Best-Fit Solutions

GIS: A Computing Perspective, Second Edition (2004)
The book is a bit dated, but it is probably the best computer science book for a GIS professional. It provides a very deep understanding of the computational concepts that are used in GIS.

Table of contents

  • Introduction
  • Fundamental database concepts
  • Fundamental spatial concepts
  • Models of geospatial information
  • Representation and algorithms
  • Structures and access methods
  • Architectures
  • Interfaces
  • Spatial reasoning and uncertainty
  • Time

Practical GIS Analysis (2002)
This book is a unique example of a book for GIS professionals who want to see how the basic GIS algorithms and tools work. The exercises give readers a chance to execute many common GIS algorithms by hand, which lets you truly understand even some complex operations such as generating a TIN or finding the shortest path on a street network. The software used as a reference is ArcView GIS 3, but the book is still relevant as the GIS concepts haven't changed much since then.

Table of contents

  • GIS Data Models
  • GIS Tabular Analysis
  • Point Analysis
  • Line Analysis
  • Network Analysis
  • Dynamic Segmentation
  • Polygon Analysis
  • Grid Analysis
  • Image Analysis Basics
  • Vector Exercises
  • Grid Exercises
  • Saving Time in GIS Analysis

Maths for Map Makers (2004)
I haven't read this book, so I don't have anything to say about it. Sorry!

Table of contents

  • Plane Geometry
  • Trigonometry
  • Plane Coordinates
  • Problems in Three Dimensions
  • Areas and Volumes
  • Matrices
  • Vectors
  • Conic Sections
  • Spherical Trigonometry
  • Solution of Equations
  • Least Squares Estimation
  • References
  • Least Squares models for the general case
  • Notation for Least Squares

Exploring Spatial Analysis in GIS (1996)
I haven't read this book either. I guess this one might be hard to find, but I have listed it here just in case.

Good luck with the reading!

Publishing Python scripts as geoprocessing services: best practices

Why Python instead of models?

If you have been publishing your ModelBuilder models as geoprocessing (further GP) services, you have probably realized that it can be quite cumbersome. If you haven't moved to Python, I think you really should. Authoring Python scripts has serious advantages over authoring models in the context of publishing GP services, because during the publishing process ArcGIS Server turns data paths and anything else that might need to change into variables, which can mess up the model if you haven't followed the guidelines on authoring GP services. My rule of thumb was that once there are more than 10 objects in the model, it is a good time to switch to Python. Another thing is that you can easily make modifications to the Python code without republishing; in contrast, you need to republish the model each time you want to release an updated version of the GP service. Finally, since you don't need to restart the GP service when updating the Python file (unlike republishing the model, which requires restarting the service), there is no down-time and users won't notice anything.

What happens after publishing?

Let's take a look at what is going on under the hood. You have run your script tool in ArcMap and got the result published as a service. Now you can find your service and all the accompanying data inside the arcgisserver folder somewhere on your disk. The path would be: C:\arcgisserver\directories\arcgissystem\arcgisinput\%GPServiceName%.GPServer

You will find a bunch of files within the folder. Let’s inspect some of them:

  • serviceconfiguration.json – provides an overview of all the properties of the service, including its execution type, enabled capabilities, output directory and many others. Here you will see all the settings you usually see in the Service Editor window.
  • manifest.xml and manifest.json – provide an overview of the system settings that were used while publishing the service. These are not files you usually need to inspect.

Inside the folder esriinfo/metadata there is a file named metadata.xml which is really helpful because it shows on what date the service was published. The two tags to look at are:

  • <CreaDate>20141204</CreaDate>
  • <CreaTime>15443700</CreaTime>

Since this information is not exposed in the GUI of ArcGIS Desktop or ArcGIS Server Manager, this is the only way to find out when the service was created. It can be very handy when you are unsure about release versions.
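
If you need this for many services, it is easy to read those tags programmatically; a minimal sketch, with a made-up service path:

import xml.etree.ElementTree as ET

# Point this at the metadata.xml of the service you are interested in.
meta = (r"C:\arcgisserver\directories\arcgissystem\arcgisinput"
        r"\MyService.GPServer\esriinfo\metadata\metadata.xml")
root = ET.parse(meta).getroot()
print(root.findtext(".//CreaDate"), root.findtext(".//CreaTime"))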

Inside the extracted/v101 folder, you will find the result file and the toolbox you worked with when publishing the GP service. Here you will also find a folder, named after the folder where your source Python file was stored, containing a copy of that source file.

What are the best practices for organizing the Python code and files?

Let's look inside the Python file. You might have noticed that during publishing some of the string literals you used were turned into variables named g_ESRI_variable_%id%. The rule of thumb is that you shouldn't hard-code literal strings: turn paths to datasets and names into variables yourself. You don't strictly have to, since Esri will create those inline variables for you, but it is so much harder to refactor with those generated names, so you'd better organize your code correctly from the beginning.

When running the script tool in ArcGIS, the scratch geodatabase is located at C:\Users\%user%\AppData\Local\Temp\scratch.gdb. After publishing, however, the service will get a new scratch geodatabase. If you need to inspect the intermediate data created, go to the scratch geodatabase (the path can be retrieved with arcpy.env.scratchGDB), which will be a new file geodatabase for each run of the GP service with the following notation: c:\arcgisserver\directories\arcgisjobs\%service%_gpserver\%jobid%\scratch\scratch.gdb.
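
A handy trick while troubleshooting is to write the current scratch path into the service messages so you can locate the job's data afterwards; a minimal sketch (the dataset name is a made-up example):

import os
import arcpy

# Report where this run's scratch geodatabase lives and build a path in it.
arcpy.AddMessage("Scratch GDB: {0}".format(arcpy.env.scratchGDB))
intermediate_fc = os.path.join(arcpy.env.scratchGDB, "intermediate_result")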

Keep in mind that a GP service will always use its local server jobs folder for writing intermediate data, and this behavior cannot be changed. Having the service write to the scratch workspace is actually a lot safer than writing to a designated location on disk, because there is no chance of multiple GP service instances trying to write to the same location at the same time, which can result in deadlocking and concurrency issues. Remember that each submitted GP job is assigned a new unique scratch folder and geodatabase in the arcgisjobs folder.

Make sure you don't use arcpy.env.workspace in your code; always declare a path variable and assign it the folder or geodatabase connection. For dataset paths, use os.path.join() instead of concatenating strings. For performance reasons, use the in_memory workspace for intermediate data with the following notation:

var = os.path.join("in_memory", "FeatureClassName")

You can take advantage of the in_memory workspace, but for troubleshooting purposes it might be better to write something to disk to inspect later. In this case, it is handy to create a variable called something like gpTempPlace which you can switch between "in_memory" and a local file geodatabase, depending on whether you run clean code in production or troubleshoot the service in the staging environment.
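
A minimal sketch of that switch; the paths and the Clip step are made-up examples:

import os
import arcpy

DEBUG = False  # flip to True in staging to keep intermediate data on disk
gpTempPlace = r"C:\GIS\Temp\debug.gdb" if DEBUG else "in_memory"

# Any intermediate dataset is then addressed through the switch.
clipped_fc = os.path.join(gpTempPlace, "clipped_roads")
arcpy.Clip_analysis(r"C:\GIS\data.gdb\roads", r"C:\GIS\data.gdb\aoi", clipped_fc)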

Make sure you don't use the same name for a variable and a feature class or field. This sometimes leads to unexpected results when running the service. It might be helpful to append "_fc" to feature class variables and "_field" to field variables; this way you will also be able to tell them apart much more easily. The same applies to feature class and feature layer (created with the Make Feature Layer GP tool) names.

Remember that you can adjust the logging level of ArcGIS Server (in Manager) or of the GP service only (in the Service Editor window) for troubleshooting purposes. It is often useful to set the Message Level setting to Info before publishing the GP service into production, because this will give you detailed information about what exactly went wrong when running the GP service. You can access this information either from the Results window in ArcMap or from the ArcGIS Server logs in Manager.

How do I update the published service?

A special word should go to those users who need to publish GP services often while making changes in the code. It is important to understand that after publishing the GP service, the copied Python code file and toolbox don't maintain any connection to the source Python file and toolbox you authored. This implies that after making edits to the script tool, you need to push those changes to the published service on the server.

There are two types of changes you can make to your source project: the tool parameters in the script tool properties, and the Python code. Keep in mind that you cannot edit the published toolbox, so if you add a new parameter or modify an existing parameter's data type, you need to republish the service. However, if you have only modified the Python source code, there is no need to republish the whole service; you only need to replace the contents of the Python file.

Even though you can automate the service publishing workflow with Python, it still takes time to move the toolbox and the Python code files. Therefore, you can save a lot of time by finding a way to replace the Python code in the published Python code file. To update it, you really have just two options: either copy and replace the published file with the updated source file, or copy/paste the source code. This works if you have a tiny Python script and all the paths to the data are the same on your machine and on the server, which can be a plausible setup when you have Desktop and Server on the same machine. If you have a configuration file from which the Python source code gets all the paths and dataset names, you could also safely replace the published Python file. However, this is still an extra thing to do.
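
The first option is easy to script; here is a minimal sketch with made-up paths (remember that the folder under extracted/v101 is named after the folder where your source file was stored, as noted above, so the destination will differ on your server):

import shutil

src = r"C:\Projects\GP\tool_code.py"  # your updated source file
dst = (r"C:\arcgisserver\directories\arcgissystem\arcgisinput"
       r"\MyService.GPServer\extracted\v101\gp\tool_code.py")  # published copy
shutil.copyfile(src, dst)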

The best practice I arrived at while working for two years on GP services is to separate the code files from the tool itself. Let me explain.

How do I separate the toolbox logic and Python code?

Create a Python file (a caller script) which will contain all the import statements for the Python modules and your own files. By appending the path to your Python files, you will be able to import the files you are working on; this is very useful when your GP service consists not of just one file but of multiple modules.

import sys
import socket
sys.path.append(r"\\" + socket.gethostname() + r"\path to code files")
import codefile1  # your Python file with business logic
import codefile2  # your Python file with business logic

This file should also include all the parameters which are exposed in the script tool.

import arcpy

Param1 = arcpy.GetParameterAsText(0)
Param2 = arcpy.GetParameterAsText(1)
Param3 = arcpy.GetParameterAsText(2)

Then you define a main function which will be executed when running the GP service. You call the functions defined within the files you imported.

def mainworkflow(Param1, Param2):
    Result = codefile1.functionName(Param1, Param2)
    return Result

It is also handy to add some parameter handling logic. When you run the script tool in ArcMap to publish it, the values you supply become the default values users see when executing the GP service from ArcMap or any other custom interface. To avoid that, you can leave those parameters empty and return an empty output when the tool is run only for publishing purposes:

if Param1 == "" and Param2 == "":
    Result = ""  # publishing run: skip the actual work
else:
    Result = mainworkflow(Param1, Param2)

Create a script tool from this caller Python file, defining the parameters and their data types. After the tool is published as a GP service, you can keep working on the Python files which contain only the code that actually does the job.

After performing and saving the changes in the code, feel free to run the GP service directly: the caller Python file (published as a GP service) will import codefile1 from the folder you specified and run the code. There is no need to restart the GP service or re-import your updated module.

Which IDE should I choose for Python development for ArcGIS?

A bit of history…

I started writing Python scripts in 2011 and my first IDE was PyScripter. I quite liked it at first because it is so much better than Notepad… but after some time I got frustrated over some things, such as the inability to keep working on my code while the debug process was running, and crashes that happened now and then. Some other useful features I was used to in Visual Studio were not available either.

Choosing an IDE for Python might be hard, especially if you haven't used one before. However, I heard that Wing IDE is being used at Esri by quite a few Python developers. So, I switched to Wing IDE in 2014 and really liked it from the very beginning; it has a very clean UI and it is very easy to customize its appearance.

The intellisense (code autocompletion) works fine and I get most of my arcpy module objects in the suggestions. Some modules, such as arcpy.da, are implemented in C with no Python source wrapping them, so IDEs cannot provide autocompletion for those; but overall, while working with arcpy I get all the autocompletion I really need. It is also very easy to switch which Python interpreter should be used for a certain file or project. 32-bit Python will fail to process large datasets; after choosing 64-bit Python (if you have it installed) as your Python executable, you will be able to handle large GIS data with no problem, provided that you have enough RAM on your machine. I've done some processing of routing data (many GBs) where the Python process was eating up around 12GB of RAM.
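
If you are ever unsure which interpreter a script actually runs under, a quick standalone check like this tells you:

import sys

# A 64-bit interpreter reports a pointer size larger than 2**32.
print(sys.version)
print("64-bit:", sys.maxsize > 2 ** 32)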

Organizing your projects is way easier because you can add many files of other formats, such as SQL or HTML; Wing provides a great way to organize your datasets, documentation, and code. It is also capable of finding differences between two Python files in interactive mode, which is something you will definitely want to use when comparing several versions of the same file or script you were working on.

As you see, it has many useful features which make coding a lot more efficient and, in fact, pleasant. Here are some of them I use all the time:

Wing Source browser

PyScripter also has a similar window, called Code Explorer, but it is not as robust as Wing's Source Browser. I could build a very nice function call tree using Wing while working with a large Python module I inherited. You can jump to the place in the code where an object was used, which can be really helpful when refactoring some legacy code or when trying to build reference materials for yourself or fellow developers.

Wing Debug probe

I use this window when writing new code and when trying to find a bug in existing code. I usually run the program until the very end, leaving a couple of clean-up rows, then fire up the Debug Probe and start working forward. Because your code is evaluated on-the-fly, there is no need to restart the whole debugging process by re-running the Python file.

Because the workflow you are debugging may be executed on some large datasets, it is very efficient not to run the program too often. After I've verified that the code I wrote is correct, I copy-paste it into the Python file. You can also run just a portion of your code in the Debug Probe, which can be very useful for evaluating just a few rows.

You bring up the Debug Probe and press the + icon at the top right, near the Options menu, to lock a range of lines into being the Active Range, which you can then execute by pressing the cog icon that appears in the shell.

Wing Source assistant

Using Wing, I found that I don't go to the ArcGIS Help as often as I did when using PyScripter. Wing provides an interactive way to get the syntax and usage tips for a function or class from any imported module. As you type the name of a tool, the Source Assistant window gets updated and you can see all the information about the tool; all the GP tool help, with input parameters and valid options, is available right in the IDE window.

Some other key points
  • It is not expensive. This is a reasonable price for a very good piece of software.
  • Wingware provides excellent and fast support. Always helpful and prompt.
  • It is very easy to authorize the software offline without any hassle; it is just about copying/pasting the license code. You are allowed to install the Wing you purchase on multiple machines; read more about the licensing terms here.
  • The company trusts its users. You can just install it on a new laptop or on a virtual machine you code on, using the same license you have. When your license expires, Wing will run for 10 minutes at a time without any license or activation at all, or a trial license can be used until any license problem is resolved.
  • Wing has a UI that just works. It starts fast and has easy-to-navigate panels and popups. The software has always been responsive and has never crashed since I started using it last year.

Come on, go and get yourself the Wings to fly and code like a pro!

How to be efficient as a GIS professional (part 3)

6. Automate, automate, automate

Whatever you are doing, take a second to think whether you will need to run the sequence of steps you've just completed again. It may seem at first that you are very unlikely to run the same sequence of steps again, but in fact you may find yourself performing them over and over again later on.

Automating is not only about saving time; it is also about quality assurance. When you do something manually, there is always a chance of forgetting a step or a detail, which can potentially lead to an error. With an automated workflow, you can always see what steps are being performed. An automated workflow is also a piece of documentation which you can share with others or use yourself as a reference.

Don't trust your memory: you think you know what columns you've added to the table and why. Get back to it in two weeks and you will be surprised by how little of those memories is left. If you leave the job and hand the work over to a new person, they will be happy to inherit well-maintained documentation and a concise description of the workflow they will be responsible for.

For desktop GIS automation, think about using Python for geospatial operations (think truncating tables + appending new data + performing data checks). For database automation, use SQL (adding new columns + altering column data types). Feel free to build SQL scripts with commands for adding/deleting/calculating columns and copying data, too. By preserving those scripts, you will always be able to re-run them on another table, in another database, or modify them to match your needs. This gives you a record of the changes performed in your database. It is just like adding a field manually and then writing down that you added a field of type X to table Y at time Z; it is just so much easier to build a SQL script and avoid doing that by hand.
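
As a small illustration of the Python side, here is a sketch of the truncate-and-append refresh mentioned above; the geodatabase paths are made-up examples:

import arcpy

prod_table = r"C:\GIS\prod.gdb\Parcels"         # table to refresh
staging_fc = r"C:\GIS\staging.gdb\Parcels_new"  # freshly prepared data

arcpy.TruncateTable_management(prod_table)  # empty the production table
arcpy.Append_management(staging_fc, prod_table, schema_type="NO_TEST")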


Another advantage of SQL for data processing is that it is very vendor-neutral and can be executed as-is, or with really minor adjustments, on most DBMS platforms. This also applies to the SQL spatial functions, which provide ISO- and OGC-compliant access to the geodatabase and database. Being able to execute SQL queries and perform data management operations is really advantageous when you work in a large IT environment: you won't always have a network connection to the production environment for data updates, and using ArcGIS might not be possible. Running a Python script would require a Python installation on some machine, and if you use arcpy, ArcGIS Desktop as well. Running SQL code, which has no dependencies, might be your only alternative.

Many folks don't know that one can use pure SQL with an enterprise geodatabase stored in any supported DBMS. Here is just a short list of what you can do with SQL:

8. Python, Python, Python

I have blogged about using the spatial functions of SQL Server earlier. Remember that you can also execute some SQL from Python code using the arcpy.ArcSDESQLExecute class (see the sketch after the list below). Here is the SQL reference for query expressions used in ArcGIS, some of which you can use in the where clauses of arcpy.da cursors. Learn some of the useful Python libraries which could save you time. Look at:

  • Selenium for automating FTP data download, if this happens often and you have to browse through a set of pages;
  • the scipy.spatial module for spatial analysis, such as building Voronoi diagrams, finding distances between arrays, constructing convex hulls in N dimensions, and many other things (a small sketch follows this list);
  • NumPy, a fundamental package for scientific computing with Python, for handling huge GIS datasets (both vectors and rasters) with arcpy.
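
Here is a small sketch of the kind of job scipy.spatial does well: a nearest-neighbour query with cKDTree (the facility/incident setting is a made-up example):

import numpy as np
from scipy.spatial import cKDTree

facilities = np.random.rand(100, 2)   # hypothetical x/y coordinates
incidents = np.random.rand(1000, 2)

tree = cKDTree(facilities)
distances, indices = tree.query(incidents)  # nearest facility for each incident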

Read more in What are the Python tools/modules/add-ins crucial in GIS and watch the Esri video Python: Useful Libraries for the GIS Professional.
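
And here is the promised arcpy.ArcSDESQLExecute sketch; the connection file and table are made-up examples:

import arcpy

# Execute SQL directly against the database behind an .sde connection file.
conn = arcpy.ArcSDESQLExecute(r"C:\GIS\connections\prod.sde")
result = conn.execute("SELECT COUNT(*) FROM PARCELS WHERE STATUS = 'ACTIVE'")
print(result)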

Take the chance to learn more about SQL and Python and how you can take advantage of them in your work!

Build ArcGIS network dataset from OpenStreetMap

I have blogged previously on how to get street data for use in the ArcGIS Network Analyst. If you have obtained TomTom or Nokia (Navstreets) data, you can easily build a network dataset (further ND) by using the Esri SDP toolbox, which I have blogged about earlier.

If you don't have any other sources for the data, consider using OpenStreetMap (OSM) data if it is applicable in your business case. I have blogged earlier on how to get OSM data into an ArcGIS network dataset, but that approach is outdated and I now recommend another way to build the network. The overall workflow is fairly straightforward:

Download OSM data

Go to the Export tab on the OSM home page and choose an area to download. You can either draw a rectangle or specify the bounding box coordinates. If the area you choose is too large, you will have to use one of the sources listed in the left panel for bulk data downloads. Clicking the Overpass API link will trigger downloading a map file with no extension; rename it by adding the .osm extension.
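
If you download the same extent often, the Overpass API call is easy to script; a minimal sketch with a hypothetical bounding box (Python 3 urllib shown):

import urllib.request

# bbox order is lon_min,lat_min,lon_max,lat_max
bbox = "18.03,59.31,18.09,59.35"
url = "http://overpass-api.de/api/map?bbox=" + bbox
urllib.request.urlretrieve(url, "map.osm")  # save with the .osm extension right away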

Install ArcGIS Editor for OSM

Now you have to download the ArcGIS Editor for OSM: there are Desktop versions for 10.0, 10.1, and 10.2.x. The installation file will install the required libraries as well as a geoprocessing (further GP) toolbox whose tools you will access later on. Read through the documentation on how to build an ND from OSM data on the ArcGIS OSM Editor home page. After installing, you should find the OpenStreetMap Toolbox in your ArcToolbox folder in ArcGIS.

Load OSM file into geodatabase

Start by running the Load OSM File GP tool. Please activate the Conserve Memory option if you have a large OSM file (larger than the amount of RAM), because during this process all nodes are fetched; if you fail to do so, the process might crash. I had a hard time processing some large files on an 8GB virtual machine, partly because of the 2GB Windows paging limit. Running the processing from 64-bit Python might help, but this is something I have not tested yet. I remember that some of the network data processing algorithms I developed failed to build an adjacency matrix for a network with 15 million edges when running under 32-bit Python, but completed with no problems under 64-bit Python, taking almost 10GB of RAM on my machine.

Build a network dataset

When the data has been loaded into a feature dataset, you are ready to build a network dataset with the Create OSM Network Dataset GP tool. You will need to provide a Network Configuration File, which you can find in C:\Program Files (x86)\ArcGIS\Desktop10.1\ArcToolbox\Toolboxes\ND_ConfigFiles. This is an XML file which provides the parameters for interpreting your road type data into edge cost evaluators. DriveGeneric.xml is for a generic motorcar routing network, and there is another one which can be used for cycling networks. There is one more file there, DriveMeters.xml: this configuration offers faster runtime performance (fewer script evaluators) but will only work with coordinate systems that have a linear unit of meters. Let the tool run; it might take a long time if you have a large dataset. After the ND is built, feel free to modify its properties and test how it works.

I suggest starting by downloading a small area to verify that the tools work as expected; the map.osm file you download should not be larger than 20MB. After you have verified the workflow, feel free to try larger datasets. There are some other useful tools in the ArcGIS OSM Editor toolbox which you might want to explore, such as tools for designing maps based on OSM data and for loading data into a PostgreSQL database.