Introduction


This week my boss asked me to gather and produce some LIDAR layers for the island that his family has a cabin on, and I thought it would be a great opportunity to show the workflow for processing compressed LAS (LAZ) point cloud data.

You can download all of the files used in this tutorial by cloning this repository. Included in the repo are the JSON pipelines, the LAZ tiles I used, and some scripts to run some of the steps I cover below.

> git clone https://github.com/hadallen/savary-pdal

In this post, I will talk about:

  • Software used
  • Finding the data
  • Processing the point cloud data into raster files with PDAL
  • Using QGIS to load and further analyze the raster files
  • Using GDAL directly to analyze the raster files

Interactive Map

Below is an interactive map of the Savary Island LIDAR data, made with Leaflet. The vector & raster data and legend entries are being served from a QGIS Server WMS instance.

Note: I’ve noticed that I sometimes have to reload the page to get the map to show - if you don’t see it below this message, try refreshing.


You can switch through the layers by clicking on the layer switcher in the top right corner of the map.

  • The Crown Height Model (CHM) shows the difference (in meters) between the Digital Surface Model (DSM) and Digital Terrain Model (DTM). It is overlaid on top of a hillshade made from the DSM for better visualization.
  • The DTM Hillshade is a hillshade made from the Digital Terrain Model.
  • The Slope Percent shows the slope (rise/run) of the terrain in percent. For example, a slope of 50% means that a horizontal distance of 50 meters would result in a vertical rise of 25 meters.

Software used

  • PDAL 2.4.2 - The main software used to process the LAZ point clouds:

PDAL is Point Data Abstraction Library. It is a C/C++ open source library and applications for translating and processing point cloud data. 1

  • GDAL 3.5.1 - An extremely useful C/C++ library and set of command line tools for reading and writing raster and vector data.

GDAL is a translator library for raster and vector geospatial data formats that is released under an MIT style Open Source License by the Open Source Geospatial Foundation. 2

  • QGIS 3.26.1-Buenos Aires - Used to view the resulting raster files; it can also simplify running GDAL algorithms.

QGIS is a user friendly Open Source Geographic Information System (GIS) licensed under the GNU General Public License. QGIS is an official project of the Open Source Geospatial Foundation (OSGeo). It runs on Linux, Unix, Mac OSX, Windows and Android and supports numerous vector, raster, and database formats and functionalities. 3


Gathering the data

Getting your hands on point cloud data can be pretty hit or miss. You may be able to find an online resource, like LidarBC. This is a service that allows you to browse & download tiles from the province of British Columbia’s collection of LIDAR point cloud data (some derivative raster data is also available).

There may be similar services for your province, county, state, city, or country. Some countries have regularly updated, freely accessible country-wide datasets; others offer more limited access, but you may still be able to find data for your region of interest.

Check out this article or do a quick internet search for “(Your area of interest) LIDAR data” to see if you can track down open data for the area you are interested in.

Be sure to check the licensing information for the data you download, and follow it accordingly.

Once I found the tiles that I was interested in, I downloaded each of them and put them in the folder savary-pdal/tiles. These will be read by PDAL and processed into raster data.

Processing Workflow


Setting up the pipelines

Although you can use PDAL command line options to provide all of the information needed to process the data, I find it a lot simpler to use PDAL’s pipeline option. This allows you to save the configuration and use it for multiple datasets, or share it with others.

Pipelines allow you to lay out your workflow in a way that is easy to understand and share. In the pipeline file, you define the steps that will be executed in order, along with the input and output files and the parameters used for reading and writing them.

Below is an example pipeline from PDAL’s documentation. It is a simple workflow that reads input.las using readers.las, applies the filters.crop filter to extract the points within the specified bounds, given in the format "([xmin, xmax], [ymin, ymax])", and then writes the result to output.bpf using writers.bpf.

[
    "input.las",
    {
        "type":"filters.crop",
        "bounds":"([0,100],[0,100])" 
    },
    "output.bpf"
]
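
Once saved to a file - say crop-pipeline.json (a name I’ve made up for this example) - the pipeline can be executed with the pdal pipeline command:

> pdal pipeline crop-pipeline.json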

Refer to PDAL’s pipeline documentation for more information.


Creating the Digital Surface Model (DSM) pipeline

To create a digital surface model from the point clouds, we are only interested in the points that reflected off the top of the trees, buildings, and other objects - not the ones that penetrated the canopy and bounced off the ground.

Although I don’t in this example, you could potentially use the filters.outlier filter (like we do in the DTM pipeline) to remove the points that may be “noise” or “static”.

Pipeline example: generate-dsm.json

[
    {
        "type":"readers.las"
    },
    {
        "type":"filters.reprojection",
        "in_srs":"EPSG:6653",
        "out_srs":"EPSG:3157"
    },
    {
        "type":"filters.range",
        "limits":"returnnumber[1:1]"
    },
    {
        "type": "writers.gdal",
        "output_type":"idw",
        "gdaldriver":"GTiff",
        "resolution": 0.5,
        "radius": 1
    }
]

In the generate-dsm.json pipeline above, you can see:

  • readers.las is used to set the input file type. If you are only using a single file, you can use the filename option here.
  • filters.reprojection is used to reproject the data from one CRS to another (EPSG:6653 to EPSG:3157). CRSs define the coordinate system of the data and are a common source of confusion and errors. Make sure you know the CRS of the data you are working with and that all data is reprojected to the same CRS, if needed (see the pdal info check after this list).
  • filters.range is used to keep ONLY the first-return points - the ones that reflected off the tops of the trees, buildings, and other objects. Only these filtered points will be passed on to the next stage.
  • writers.gdal is used to set the options for the output file.
    • "gdaldriver":"GTiff": the gdal driver is set to GTiff for a GeoTIFF. I suggest translating it to a Cloud Optimized GeoTIFF (COG) as an additional step after analysis is finished, as it is internally tiled and has better compression.
    • "resolution": 0.5: the resolution of the is set to 0.5m, which is very detailed. This is the height and width of each pixel in the output raster. The resolution can greatly affect the size of the output file, so for larger datasets, you may need to use a lower resolution.

You may notice that although I specify the readers and writers, I do not specify the filename. This is because we need to add an additional step to analyze our tiled LAZ files. If I were running a single file through the pipeline, I could specify the filenames in the pipeline or use the following command:

pdal pipeline ./generate-dsm.json \
    --readers.las.filename=tiles/bc_092f096_2_3_2_xyes_8_utm10_2019.laz \
    --writers.gdal.filename=out/dsm/dsm_bc_092f096_2_3_2_xyes_8_utm10_2019.tif

Creating the Digital Terrain Model (DTM) pipeline

Now that our pipeline for the Digital Surface Model (DSM) is ready, we can create the pipeline for the Digital Terrain Model (DTM).

This takes a bit more work, because we need to do some classification and further filtering of our data to clean it up and extract the ground information.

Pipeline example: generate-dtm.json

[
    {
        "type":"readers.las"
    },
    {
        "type":"filters.reprojection",
        "in_srs":"EPSG:6653",
        "out_srs":"EPSG:3157"
    },
    {
        "type":"filters.assign",
        "assignment":"Classification[:]=0"
    },
    {
        "type":"filters.elm"
    },
    {
        "type":"filters.outlier"
    },
    {
        "type":"filters.smrf",
        "ignore":"Classification[7:7]",
        "slope":0.2,
        "window":16,
        "threshold":0.45,
        "scalar":1.2
    },
    {
        "type":"filters.range",
        "limits":"Classification[2:2]"
    },
    {
        "type": "writers.gdal",
        "output_type":"idw",
        "gdaldriver":"GTiff",
        "resolution": 0.5,
        "radius": 1
    }
]

You will notice that the readers.las, filters.reprojection, and writers.gdal stages are the same as in the DSM pipeline. However, a few more stages have been added.

Cleaning up the data (denoise, remove outliers, etc.)

Point cloud data can often contain points that will throw off your analysis. A few easy filters can be used to clean up the data, removing this “noise” and resulting in a smoother terrain surface.

  • First, the filters.assign stage is used to clear the classification values. Your data may already be classified, but you may not know the accuracy of the classification or you may need to do it yourself with specific options.

  • Next is the filters.elm stage, the Extended Local Minimum (ELM) filter, which marks low points as noise (classification 7). Similarly, the filters.outlier stage flags outliers - points that are too far from the average of the points in their neighborhood - as classification 7 as well. These will be ignored in the next step.
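
Both stages run with their defaults in my pipeline. If you need to tune them, the key options can be set explicitly - a sketch using values close to PDAL’s documented defaults (check the filters.elm and filters.outlier documentation for your version):

    {
        "type":"filters.elm",
        "cell":10.0,
        "threshold":1.0,
        "class":7
    },
    {
        "type":"filters.outlier",
        "method":"statistical",
        "mean_k":8,
        "multiplier":2.0
    }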

Classifying the ground points

Now that we have cleaned up the point cloud data, we can begin to classify the points as ground or non-ground using filters.smrf (the Simple Morphological Filter). This filter assigns ground points the classification value 2; we then keep only those points using filters.range and pass them on to writers.gdal to write the output file.

The options for filters.smrf can be found in the documentation and tweaked to better suit your needs.
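
If you want to inspect the classified ground points before rasterizing, one option (a side check, not part of my workflow) is to run the same classification stages through pdal translate into a new file and summarize it:

> pdal translate tiles/bc_092f096_2_3_2_xyes_8_utm10_2019.laz ground.laz \
    assign elm outlier smrf range \
    --filters.assign.assignment="Classification[:]=0" \
    --filters.range.limits="Classification[2:2]"

> pdal info --summary ground.laz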

Similar to the DSM, if I were running a single file through the pipeline, I could specify the filenames in the pipeline or use the following command:

pdal pipeline ./generate-dtm.json \
    --readers.las.filename=tiles/bc_092f096_2_3_2_xyes_8_utm10_2019.laz \
    --writers.gdal.filename=out/dtm/dtm_bc_092f096_2_3_2_xyes_8_utm10_2019.tif

Running batch pipelines using parallel

PDAL has a great write-up on batch processing in the Workshop section. I will walk through the process I used with ls and parallel in bash or zsh; if you are on Windows, you can follow the instructions in the write-up.


Creating the list of files with ls

To begin, I needed an ls command that lists ONLY the tiles that I was interested in processing.

Since I have all of my tiles in a single directory, I can use the ls command from inside the tiles directory to list all of the files inside.

> cd tiles         # modify this to the directory where your tiles are

> ls               # lists all the files in the current directory
bc_092f096_2_3_2_xyes_8_utm10_2019.laz  bc_092f096_2_4_4_xyes_8_utm10_2019.laz
bc_092f096_2_3_4_xyes_8_utm10_2019.laz  bc_092f097_1_3_1_xyes_8_utm10_2019.laz
bc_092f096_2_4_1_xyes_8_utm10_2019.laz  bc_092f097_1_3_3_xyes_8_utm10_2019.laz
bc_092f096_2_4_2_xyes_8_utm10_2019.laz  bc_092f097_1_3_4_xyes_8_utm10_2019.laz
bc_092f096_2_4_3_xyes_8_utm10_2019.laz  bc_092f097_3_1_2_xyes_8_utm10_2019.laz

If I had many tiles in that directory but only wanted to process some of them, I could just use the ls bc_*.laz command to list all of the files that match bc_*.laz. Adjust this to your dataset as needed.

> cd tiles         # modify this to the directory where your tiles are

> ls bc_*.laz      # lists all files that match the pattern 'bc_*.laz'
bc_092f096_2_3_2_xyes_8_utm10_2019.laz  bc_092f096_2_4_4_xyes_8_utm10_2019.laz
bc_092f096_2_3_4_xyes_8_utm10_2019.laz  bc_092f097_1_3_1_xyes_8_utm10_2019.laz
bc_092f096_2_4_1_xyes_8_utm10_2019.laz  bc_092f097_1_3_3_xyes_8_utm10_2019.laz
bc_092f096_2_4_2_xyes_8_utm10_2019.laz  bc_092f097_1_3_4_xyes_8_utm10_2019.laz
bc_092f096_2_4_3_xyes_8_utm10_2019.laz  bc_092f097_3_1_2_xyes_8_utm10_2019.laz

Running the list of files through parallel

Now that I have the list of files, I can use parallel to run each pipeline on each tile:

> cd tiles         # modify this to the directory where your tiles are

> ls bc_092f09*.laz | parallel -I{} \
    pdal pipeline ../generate-dsm.json \
    --readers.las.filename={} \
    --writers.gdal.filename=../out/dsm/dsm_{/.}.tif   # adjust paths if needed

> ls bc_092f09*.laz | parallel -I{} \
    pdal pipeline ../generate-dtm.json \
    --readers.las.filename={} \
    --writers.gdal.filename=../out/dtm/dtm_{/.}.tif   # adjust paths if needed

In this command, each file from the list is passed to parallel as an argument. The {} is a placeholder for the file name, and {/.} is a placeholder for the file name with the directory and extension removed.

Since the filename option is not set in our pipeline, we have to define it in each command for the reader and writer:

  • readers.las.filename is set to {}, which will pass the entire filename

    • e.g. bc_092f096_2_3_2_xyes_8_utm10_2019.laz will be passed to PDAL
  • writers.gdal.filename is set to ../out/dsm/dsm_{/.}.tif

    • Make sure that the output directory exists
    • e.g. ../out/dsm/dsm_bc_092f096_2_3_2_xyes_8_utm10_2019.tif will be passed to PDAL
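
Before kicking off a long batch, you can verify the substitutions with GNU parallel’s --dry-run flag, which prints each command instead of executing it:

> ls bc_092f09*.laz | parallel --dry-run -I{} \
    pdal pipeline ../generate-dsm.json \
    --readers.las.filename={} \
    --writers.gdal.filename=../out/dsm/dsm_{/.}.tif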

Checking the output

How long each command takes depends on the density of the point cloud, the size & number of tiles, and the resolution option in the writers.gdal stage of your pipeline. Once each batch pipeline is complete, check the folder that you set as the output directory.

> cd ..

> ls out/dsm
dsm_bc_092f096_2_3_2_xyes_8_utm10_2019.tif  dsm_bc_092f096_2_4_4_xyes_8_utm10_2019.tif
dsm_bc_092f096_2_3_4_xyes_8_utm10_2019.tif  dsm_bc_092f097_1_3_1_xyes_8_utm10_2019.tif
dsm_bc_092f096_2_4_1_xyes_8_utm10_2019.tif  dsm_bc_092f097_1_3_3_xyes_8_utm10_2019.tif
dsm_bc_092f096_2_4_2_xyes_8_utm10_2019.tif  dsm_bc_092f097_1_3_4_xyes_8_utm10_2019.tif
dsm_bc_092f096_2_4_3_xyes_8_utm10_2019.tif  dsm_bc_092f097_3_1_2_xyes_8_utm10_2019.tif

> ls out/dtm
dtm_bc_092f096_2_3_2_xyes_8_utm10_2019.tif  dtm_bc_092f096_2_4_4_xyes_8_utm10_2019.tif
dtm_bc_092f096_2_3_4_xyes_8_utm10_2019.tif  dtm_bc_092f097_1_3_1_xyes_8_utm10_2019.tif
dtm_bc_092f096_2_4_1_xyes_8_utm10_2019.tif  dtm_bc_092f097_1_3_3_xyes_8_utm10_2019.tif
dtm_bc_092f096_2_4_2_xyes_8_utm10_2019.tif  dtm_bc_092f097_1_3_4_xyes_8_utm10_2019.tif
dtm_bc_092f096_2_4_3_xyes_8_utm10_2019.tif  dtm_bc_092f097_3_1_2_xyes_8_utm10_2019.tif
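
You can also verify that a tile was written with the expected CRS and resolution using gdalinfo:

> gdalinfo out/dsm/dsm_bc_092f096_2_3_2_xyes_8_utm10_2019.tif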

Now that we have our DSM and DTM, we will view, organize, and further process them using QGIS and GDAL.


Working with the GeoTIFF Rasters in QGIS and GDAL

Open up QGIS and create a new project. Set the CRS of your project to the same CRS that you reprojected your data into in the pipeline - in my case, EPSG:3157.

A screenshot of QGIS showing the tiles of DSM raster data.

The QGIS window is shown above, with all of the DSM tiles loaded into a new project.

There are several ways to add the raster files to your project. You can drag one or all of the TIFF files from your file explorer into QGIS, or use the Browser panel to navigate to the folder that you set as the output directory and drag them into the canvas area. You can also use the Data Source Manager to add any type of data.


Creating a virtual raster layer

After you’ve loaded all the raster tiles, you can create a virtual raster layer by going to:
Raster > Miscellaneous > Build Virtual Raster...

A screenshot of QGIS showing the raster menu.

Location of the Build Virtual Raster... tool

A screenshot of QGIS showing the Build Virtual Raster tool.

The Build Virtual Raster tool and the layer selection window

  1. Click the ... button and add all of the tiles of one set (i.e. the 10 individual dsm_bc_..._x_y_x.tif tiles) to the layer selection window.
  2. You can use this window to add layers that are already in the project, or you can add an entire directory using the buttons on the right hand side.
  3. You can choose a location to save the virtual raster by clicking the ... button and selecting a location. If you leave it blank, it will be added to the project as a temporary layer.
  4. Click the OK button and your virtual raster will be created.

A screenshot of QGIS showing the DSM virtual raster layer.

Once the Build Virtual Raster tool runs, you will see the entire mosaic combined into one continuous layer. In the foreground, you can see that the Layers panel shows the dsm virtual raster at the top, and a group of the tiles below.

If you don’t see the virtual raster on your canvas, you may have to rearrange the layers in the layer panel.

Do this for both the DSM and DTM raster tile sets and take a look at each of them. Style them if you’d like - styling won’t be covered in this article.


If you look at the bottom of the Build Virtual Raster tool, you can see the GDAL command that will be run. A similar command can be seen at the bottom of each raster tool we cover in this article.

These are the GDAL command line tools that QGIS is running, and you can use them yourself to run these tools from the command line. I will include them with each analysis step in this article.

gdalbuildvrt command

# build virtual raster commands
> gdalbuildvrt -overwrite -resolution average -r nearest \
    -input_file_list /path/to/DSM_Tile_List \
    /home/hadallen/web/savary-pdal/out/dsm/dsm.vrt

> gdalbuildvrt -overwrite -resolution average -r nearest \
    -input_file_list /path/to/DTM_Tile_List \
    /home/hadallen/web/savary-pdal/out/dtm/dtm.vrt
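
If you would rather not maintain a tile list file, gdalbuildvrt also accepts the input rasters directly as arguments, so a shell glob works too:

> gdalbuildvrt -overwrite out/dsm/dsm.vrt out/dsm/dsm_*.tif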

Filling gaps in the raster

If you look closely, you can see that there are areas in each raster with large gaps of no data. One solution for this is to run the Raster > Analysis > Fill nodata... tool.

A screenshot of QGIS showing the `Fill nodata` tool.

Fill Nodata tool location in QGIS

Use the Fill nodata tool for each virtual raster layer. Take a look at the new layer and compare it to the original. You can see that most of the gaps have been filled in, but some remain. If needed, you can increase the Maximum distance (in pixels) to search out for values to interpolate option to fill in more gaps.

A screenshot of QGIS showing the `Fill nodata` tool.

Fill Nodata tool in QGIS


gdal_fillnodata.py command

# fill nodata commands
> gdal_fillnodata.py -md 10 -b 1 -of GTiff \
    /home/hadallen/web/savary-pdal/out/dsm/dsm.vrt \
    /home/hadallen/web/savary-pdal/out/finals/dsm_filled.tif

> gdal_fillnodata.py -md 10 -b 1 -of GTiff \
    /home/hadallen/web/savary-pdal/out/dtm/dtm.vrt \
    /home/hadallen/web/savary-pdal/out/finals/dtm_filled.tif

Generating Crown Height Model (CHM) using gdal_calc.py

Now that we have filled in the gaps in the rasters, we can use the gdal_calc.py command to generate the CHM.

> gdal_calc.py -A out/finals/dtm_filled.tif -B out/finals/dsm_filled.tif \
    --calc="B-A" --outfile out/finals/chm.tif --extent intersect

Generating Hillshade and Slope rasters

In the same menu in QGIS, you can go to Raster > Analysis > Slope or Hillshade to produce the respective raster layers.

In the Slope tool, be sure to select the Slope expressed as percent instead of degrees option in order to get the slope in percent (rise/run) rather than in degrees.

A screenshot of QGIS showing the `Slope` tool.

Slope tool in QGIS

Run the Hillshade tool on both the dsm_filled.tif and dtm_filled.tif rasters. We will use the DSM hillshade with the Crown Height Model for better visualization. The DTM hillshade can be used to view the topography of the bare terrain.

A screenshot of QGIS showing the `Hillshade` tool.

Hillshade tool in QGIS


gdaldem command for hillshade and slope

# hillshade commands
> gdaldem hillshade /home/hadallen/web/savary-pdal/out/finals/dsm_filled.tif \
    /home/hadallen/web/savary-pdal/out/finals/dsm_hillshade.tif \
    -of GTiff -b 1 -z 1.0 -s 1.0 -az 315.0 -alt 45.0

> gdaldem hillshade /home/hadallen/web/savary-pdal/out/finals/dtm_filled.tif \
    /home/hadallen/web/savary-pdal/out/finals/dtm_hillshade.tif \
    -of GTiff -b 1 -z 1.0 -s 1.0 -az 315.0 -alt 45.0


# slope command
> gdaldem slope /home/hadallen/web/savary-pdal/out/finals/dtm_filled.tif  \
    /home/hadallen/web/savary-pdal/out/finals/slope.tif \
    -of GTiff -b 1 -s 1.0 -p
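
As an aside, gdaldem hillshade also supports a -multidirectional option (it cannot be combined with -az), which blends light from several azimuths and often looks softer. The output name below is just an example:

> gdaldem hillshade /home/hadallen/web/savary-pdal/out/finals/dtm_filled.tif \
    /home/hadallen/web/savary-pdal/out/finals/dtm_hillshade_multi.tif \
    -of GTiff -b 1 -multidirectional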

Additional Resources

Clipping the rasters

Another tool that you may want to use is the gdalwarp command line tool. I won’t cover it here, but you can use it to clip the rasters to a polygon and clean up their edges.

gdalwarp -of GTiff -cutline ../savary_polygon.shp -cl savary_polygon \
  -crop_to_cutline out/finals/dtm_filled.tif out/finals/dtm_filled_cropped.tif
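
Earlier I suggested converting the finished GeoTIFFs to Cloud Optimized GeoTIFFs (COGs). With GDAL 3.1 or newer, this is a single gdal_translate call using the COG driver (the output name is just an example):

gdal_translate -of COG out/finals/dtm_filled.tif out/finals/dtm_filled_cog.tif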

Other Sources