Explore the data

We have downloaded the ICESat-2 data and saved them as ./download/processed_ATL03_20190805232841_05940403_004_01.h5. Before diving into a specific analysis routine, let us see whether we can get a general overview of the data using the Jupyter tools. At this stage, we want tools that provide quick access to the data, preferably in several ways, as well as some functionality for manually navigating to different parts of the data. Jupyter’s language-agnostic nature (i.e., it is not bound to any specific programming language) and its support for interactive plotting widgets are designed to address these needs.


Explore the data file, including its data structure, size, geospatial information, and so on.


Check data structure

Since the data are stored in the HDF5 format (as indicated by the file extension), we need suitable tools to read them. For example, we can use the h5ls command-line tool to take a quick look at the data structure. In a Jupyter notebook, we can prefix a line with the ! character to run any command-line tool, and we can even pass a variable defined in another cell (Python or shell) to the h5ls command.

filename = 'download/processed_ATL03_20190805232841_05940403_004_01.h5'

This string variable is now available to both the Python kernel and shell commands: IPython substitutes its value wherever $filename appears in a line that starts with !. Note that we also use grep here because the full, nested data structure is very long, and we are only interested in the gt1l beam (which [HTL+21] use in their study).

!h5ls -r $filename | grep ^/gt1l/heights
!h5ls -r $filename | grep ^/gt1l/geolocation/segment
/gt1l/heights            Group
/gt1l/heights/delta_time Dataset {312012/Inf}
/gt1l/heights/dist_ph_across Dataset {312012/Inf}
/gt1l/heights/dist_ph_along Dataset {312012/Inf}
/gt1l/heights/h_ph       Dataset {312012/Inf}
/gt1l/heights/lat_ph     Dataset {312012/Inf}
/gt1l/heights/lon_ph     Dataset {312012/Inf}
/gt1l/heights/pce_mframe_cnt Dataset {312012/Inf}
/gt1l/heights/ph_id_channel Dataset {312012/Inf}
/gt1l/heights/ph_id_count Dataset {312012/Inf}
/gt1l/heights/ph_id_pulse Dataset {312012/Inf}
/gt1l/heights/quality_ph Dataset {312012/Inf}
/gt1l/heights/signal_conf_ph Dataset {312012/Inf, 5}
/gt1l/geolocation/segment_dist_x Dataset {993/Inf}
/gt1l/geolocation/segment_id Dataset {993/Inf}
/gt1l/geolocation/segment_length Dataset {993/Inf}
/gt1l/geolocation/segment_ph_cnt Dataset {993/Inf}
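
The same kind of listing can also be produced from Python. The following is a minimal sketch, assuming h5py is installed. list_datasets is a hypothetical helper name (it is not part of h5py), and the small demo file is synthetic, created only so that the example runs on its own.

```python
import os
import tempfile

import h5py
import numpy as np


def list_datasets(path, prefix):
    """Return (name, shape) for every dataset whose path starts with prefix.

    list_datasets is a hypothetical helper, not an h5py function.
    """
    found = []

    def visit(name, obj):
        # visititems passes paths relative to the root, without a leading '/'
        full_name = '/' + name
        if isinstance(obj, h5py.Dataset) and full_name.startswith(prefix):
            found.append((full_name, obj.shape))

    with h5py.File(path, 'r') as f:
        f.visititems(visit)
    return found


# Synthetic stand-in for the real granule (values are made up).
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.h5')
with h5py.File(demo_path, 'w') as f:
    f.create_dataset('gt1l/heights/h_ph', data=np.zeros(5))
    f.create_dataset('gt1l/geolocation/segment_length', data=np.ones(2))

demo_listing = list_datasets(demo_path, '/gt1l/heights')
for name, shape in demo_listing:
    print(name, shape)
```

With the real file, list_datasets(filename, '/gt1l/heights') should report the same datasets as the h5ls call above.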

Load the data

Now we use h5py (the Python library for working with HDF5 files) and numpy to open the file and load the data we want.

import h5py
import numpy as np
with h5py.File(filename, 'r') as f:
    lon_ph = f['gt1l']['heights']['lon_ph'][:]    # photon longitude (x)
    lat_ph = f['gt1l']['heights']['lat_ph'][:]    # photon latitude  (y)
    h_ph = f['gt1l']['heights']['h_ph'][:]        # photon elevation (z), in m
    dist_ph = f['gt1l']['heights']['dist_ph_along'][:]            # photon horizontal distance from the beginning of the parent segment, in m
    seg_length = f['gt1l']['geolocation']['segment_length'][:]    # horizontal length of each segment, in m
    seg_ph_count = f['gt1l']['geolocation']['segment_ph_cnt'][:]  # photon count in each segment (dimensionless)

We can easily check the content and statistical information of each variable. For example:

print(h_ph.shape[0])        # this should equal the sum of the per-segment photon counts (seg_ph_count)
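
As a sketch of what such checks could look like, the snippet below prints basic statistics of an array and compares the number of photons with the sum of the per-segment counts. summarize is a hypothetical helper, and the demo_ arrays are made-up stand-ins for h_ph and seg_ph_count so that the example runs without the data file.

```python
import numpy as np


def summarize(name, values):
    """Print and return basic statistics of a 1-D array (hypothetical helper)."""
    stats = {
        'size': int(values.size),
        'min': float(np.min(values)),
        'max': float(np.max(values)),
        'mean': float(np.mean(values)),
    }
    print(name, stats)
    return stats


# Synthetic stand-ins for h_ph and seg_ph_count (values are made up).
demo_h_ph = np.array([330.0, 331.5, 329.0, 340.5, 335.0])
demo_seg_ph_count = np.array([2, 0, 3])

demo_stats = summarize('h_ph', demo_h_ph)
# The per-segment photon counts are expected to add up to the number of photons.
counts_match = demo_h_ph.shape[0] == int(demo_seg_ph_count.sum())
print('photon counts consistent:', counts_match)
```

With the real arrays, the same check would read h_ph.shape[0] == seg_ph_count.sum().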

Prepare the data

We often need to apply a few processing steps to the raw data before visualizing or analyzing them. Take these ICESat-2 data for example: if we want to plot the elevation along this track, we need the distance along the track as x values. This distance is not provided directly, but we can calculate it from the segment lengths, the photon count of each segment, and each photon's distance from the beginning of its parent segment. In a Jupyter notebook, we can quickly write a Python function to compute this variable.

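def make_dist_alongtrack(ph_count, seg_length, dist_ph):
    """For detailed explanation of each variable and reasoning of the code, see ICESat-2 ATL03 documentation."""
    # cumulative segment length, shifted by the first segment's length so that the first segment starts at 0
    repeat_num = np.cumsum(seg_length) - seg_length[0]
    # give every photon the start offset of its parent segment
    dist_alongtrack = np.repeat(repeat_num, ph_count)
    # add each photon's distance from the beginning of its parent segment
    dist_alongtrack += dist_ph
    return dist_alongtrack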
dist_alongtrack = make_dist_alongtrack(seg_ph_count, seg_length, dist_ph)   # distance along track for each photon, in m
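
To see how the function expands per-segment offsets to per-photon distances, here is a small worked example. The function is repeated from above so that the sketch runs on its own; the toy_ arrays are made up (three 20 m segments holding 2, 0, and 3 photons) and are not taken from the granule.

```python
import numpy as np


def make_dist_alongtrack(ph_count, seg_length, dist_ph):
    """Same logic as the function above, repeated to keep the sketch self-contained."""
    repeat_num = np.cumsum(seg_length) - seg_length[0]
    dist_alongtrack = np.repeat(repeat_num, ph_count)
    dist_alongtrack += dist_ph
    return dist_alongtrack


# Made-up toy input: three segments of 20 m with 2, 0, and 3 photons.
toy_seg_length = np.array([20.0, 20.0, 20.0])
toy_ph_count = np.array([2, 0, 3])
# Distance of each photon from the beginning of its parent segment, in m.
toy_dist_ph = np.array([1.0, 5.0, 2.0, 10.0, 19.0])

toy_dist = make_dist_alongtrack(toy_ph_count, toy_seg_length, toy_dist_ph)
print(toy_dist)   # segment offsets 0, 20, 40 m give [ 1.  5. 42. 50. 59.]
```

The empty second segment contributes no photons, so np.repeat skips its offset and the third segment's photons start at 40 m.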

Plot the data

Plotting the data is a great way to obtain a brief overview. Using interactive matplotlib figures powered by ipympl, we can speed up the exploration and quickly focus on the key elements of our data.

Use the following notebook command to activate the interactive matplotlib environment:

%matplotlib widget

And then we import matplotlib.

import matplotlib.pyplot as plt

Now every figure comes with a control panel, and we can use its buttons to pan, zoom, and save the figure. This is especially helpful in our case because crevasses are small-scale features that do not appear everywhere along the track. We have to zoom in to a specific area in order to see them.

fig, ax = plt.subplots(1, 1, figsize=(7, 3))
ax.plot(dist_alongtrack, h_ph, '.', markersize=1)
[<matplotlib.lines.Line2D at 0x7f80ec072fd0>]

After a careful check, we are able to locate the segment shown in Figure 6a of [HTL+21]. (Note that the along-track distance readings are different because we are using a subset of the original data granule.)

fig, ax2 = plt.subplots(1, 1, figsize=(7, 3))
ax2.plot(dist_alongtrack, h_ph, '.', markersize=1)
ax2.set_xlim(15090, 16090)
ax2.set_ylim(320, 363)
(320.0, 363.0)
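
Setting the axis limits only changes the view. If the photons inside that window are needed for a closer look, one option is to extract them with a boolean mask. This is a minimal sketch rather than part of the original workflow: select_window is a hypothetical helper, and the demo_ arrays are made-up stand-ins for dist_alongtrack and h_ph.

```python
import numpy as np


def select_window(dist, height, x_min, x_max):
    """Return the distances and heights of photons with x_min <= dist <= x_max."""
    mask = (dist >= x_min) & (dist <= x_max)
    return dist[mask], height[mask]


# Synthetic stand-ins for dist_alongtrack and h_ph (values are made up).
demo_dist = np.array([15000.0, 15100.0, 15500.0, 16000.0, 16200.0])
demo_h = np.array([310.0, 325.0, 350.0, 340.0, 370.0])

win_dist, win_h = select_window(demo_dist, demo_h, 15090, 16090)
print(win_dist.size, win_h.min(), win_h.max())
```

With the real arrays, select_window(dist_alongtrack, h_ph, 15090, 16090) would return the photons that fall within the x-range of the figure above.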


The Jupyter ecosystem provides multiple interactive approaches to access and explore the data.


[HTL+21] Ute C. Herzfeld, Thomas Trantow, Matthew Lawson, Jacob Hans, and Gavin Medley. Surface heights and crevasse morphologies of surging and fast-moving glaciers from ICESat-2 laser altimeter data - Application of the density-dimension algorithm (DDA-ice) and evaluation using airborne altimeter and Planet SkySat data. Science of Remote Sensing, 3(May 2020):100013, 2021. URL: https://doi.org/10.1016/j.srs.2020.100013, doi:10.1016/j.srs.2020.100013.