Analyze the data

We are ready to conduct a simple analysis for crevass morphology. At this research stage, some of the key needs include

  1. choosing a good data structure for the following analysis and result storage

  2. quick visualization of processing results

  3. comparing different test results efficiently

Here we will use two tools, one from the Pangeo ecosystem (xarray) and the other from the Jupter ecosystem (Jupyter widgets), and show how they assist data analysis by providing solutions to these needs.


Analyze the ICESat-2 data we have downloaded and identify crevass locations on the glacier surface using a pretty basic algorithm.


Load and prepare the data

This step is replicated from the previous section. See Get data for details.

import h5py
import numpy as np

filename = 'download/processed_ATL03_20190805232841_05940403_004_01.h5'

with h5py.File(filename, 'r') as f:
    lon_ph = f['gt1l']['heights']['lon_ph'][:]    # photon longitude (x)
    lat_ph = f['gt1l']['heights']['lat_ph'][:]    # photon latitude  (y)
    h_ph = f['gt1l']['heights']['h_ph'][:]        # photon elevation (z), in m
    dist_ph = f['gt1l']['heights']['dist_ph_along'][:]            # photon horizontal distance from the beginning of the parent segment, in m
    seg_length = f['gt1l']['geolocation']['segment_length'][:]    # horizontal of each segment, in m
    seg_ph_count = f['gt1l']['geolocation']['segment_ph_cnt'][:]  # photon count in each segment, in m
    quality_ph = f['gt1l']['heights']['signal_conf_ph'][:, 0]     # photon quality, 0 = noise, 2 = low, 3 = medium, 4 = high 

def make_dist_alongtrack(ph_count, seg_length, dist_ph):
    For detailed explanation of each variable and reasoning of the code, see ICESat-2 ATL03 documentation.
    repeat_num = np.cumsum(seg_length) - seg_length[0]
    dist_alongtrack = np.repeat(repeat_num, ph_count)
    dist_alongtrack += dist_ph
    return dist_alongtrack

dist_alongtrack = make_dist_alongtrack(seg_ph_count, seg_length, dist_ph)   # distance along track for each photon, in m

Use xarray to handle the data

There are many closely related variables with different names. Since the variables we are interested in share the same array size (including lat, lon, dist_alongtrack, h, and quality), we can use xarray to group them together for the ease of the following analysis and plotting tasks.

The DataArray class from xarray comes with handy methods for various tasks. Its plotting functionality is build on matplotlib, so we can use ipympl widget again to generate interactive plots.

%matplotlib widget
import xarray as xr
import matplotlib.pyplot as plt

Now we can make an xarray.DataArray object using the variables above and some additional metadata:

data_grouped = np.vstack((h_ph, lon_ph, lat_ph, quality_ph))
labels = ['h', 'lon', 'lat', 'q']
attrs = {'dataset': 'ATL03', 'glacier': 'Negribreen', 'RGT': 594, 'date': '2019-08-05'}
da = xr.DataArray(data_grouped, coords=[labels, dist_alongtrack], dims=["labels", "dist_alongtrack"], attrs=attrs)
<xarray.DataArray (labels: 4, dist_alongtrack: 312012)>
array([[ 819.36749268,  826.67175293,  462.56271362, ...,  293.02194214,
         -22.02213097, -205.87425232],
       [  18.47193949,   18.47194186,   18.47182369, ...,   18.29993139,
          18.29982727,   18.2997665 ],
       [  78.50008076,   78.50008053,   78.50009213, ...,   78.67477398,
          78.67478393,   78.67478974],
       [   0.        ,    0.        ,    4.        , ...,    0.        ,
           0.        ,    0.        ]])
  * labels           (labels) <U3 'h' 'lon' 'lat' 'q'
  * dist_alongtrack  (dist_alongtrack) float64 0.1413 0.1058 ... 1.987e+04
    dataset:  ATL03
    glacier:  Negribreen
    RGT:      594
    date:     2019-08-05

You can see that the DataArray class supports a rich representation, which shows not only data but also metadata we just enetered when simply typing the object name on the cell.

Slice the data

From the previous stage, we know that the crevasses are located at x values (dist_alongtrack) between 15090 and 16090. We can slice and retrieve only the data in between that range. The other advantage of using xarray for this task is that the metadata will be kept and passed to the new data subset.

da_crevasse = da.where((15090 < da.dist_alongtrack) & (da.dist_alongtrack < 16090), drop=True)
<xarray.DataArray (labels: 4, dist_alongtrack: 13686)>
array([[-159.31948853, -156.37167358, -294.38366699, ...,  896.60644531,
         986.18243408,  896.69598389],
       [  18.34177156,   18.34177253,   18.34172716, ...,   18.33334928,
          18.33337263,   18.3333432 ],
       [  78.63275821,   78.63275811,   78.63276249, ...,   78.64154323,
          78.64154661,   78.64154945],
       [   0.        ,    0.        ,    0.        , ...,    0.        ,
           0.        ,    0.        ]])
  * labels           (labels) <U3 'h' 'lon' 'lat' 'q'
  * dist_alongtrack  (dist_alongtrack) float64 1.509e+04 1.509e+04 ... 1.609e+04
    dataset:  ATL03
    glacier:  Negribreen
    RGT:      594
    date:     2019-08-05

Examine the data using interactive plots

Let’s quickly revisit the plot we did at the end of the previous stage, but this time using xarray and ipympl’s interactive plotting capabilities.

fig, ax = plt.subplots(1, 1, figsize=(7, 3))
da_crevasse.loc['h'].plot.line('.', ax=ax, markersize=1)
ax.set_ylim(320, 363)
(320.0, 363.0)

We can plot them using different colors based on the photon quality:

fig, ax2 = plt.subplots(1, 1, figsize=(7, 3))

for q_value, q_label in zip([0, 2, 3, 4], ['noise', 'low', 'medium', 'high']):
    da_crevasse.where(da_crevasse.loc['q'] == q_value, drop=True).loc['h'].plot.line('.', ax=ax2, label=q_label, markersize=1)
ax2.set_ylim(320, 363)
<matplotlib.legend.Legend at 0x7fdba5ab9a30>

Apparently, there are two groups with a high photon quality: the first group has an elevation between 340 and 350 m (ice surface), and the second group is lower, at 330 to 340 m (bottom of crevasses).

Analysis: point density and crevasse identification

DDA-ice uses point density to detect crevasses [HTL+21]. We adopt this idea and want to set up a simple strategy to identify crevass locations. Here’s the plan:

  1. Calculate kernel denstiry estimate using Scipy’s gaussian_kde module.

  2. Groups with a high kernel density and below the ice surface are identified as crevasses.

from scipy.stats import gaussian_kde

The gaussian_kde module has an optinal argument called bw_method, which controls the estimator bandwidth. The ideal bw_method value generates a density peak at each bottom of crevasses. Let us start with an arbitrary value
0.005 and see what happens.

x = da_crevasse.dist_alongtrack
y = da_crevasse.loc['h']
xy = np.vstack([x, y])
z = gaussian_kde(xy, bw_method=0.005)(xy)      # kernel density estimate at each ICESat-2 photon point
fig, ax3 = plt.subplots(1, 1, figsize=(7, 3))
ax3.scatter(x, y, c=z, s=1)   # s is marker size
ax3.set_xlim(15090, 16090)
ax3.set_ylim(320, 363)
(320.0, 363.0)

Now points are rendered based on a color map. Brighter points means there are more points clustered together, suggesting a real detection at the air-ice boundary. Our question is, however, that whether bw_method = 0.005 is the best option for such a detection.

Use ipywidgets to compare tests

To test how bw_method affects the density peak detection, the traditional way is to make as many subplots as possible, assign a different bw_method for each subplot, and compare the results on one giant figure. With ipywidgets,

import ipywidgets as widgets

we have a more flexible and interactive way as this code cell shows:

fig, ax4 = plt.subplots(1, 1, figsize=(7, 3))
ax4.scatter(x, y, c=z, s=1)
ax4.set_xlim(15090, 16090)
ax4.set_ylim(320, 363)

slider = widgets.FloatLogSlider(value=0.01, min=-4, max=0, step=0.01)

def update_crevasse_kde():
    z = gaussian_kde(xy, bw_method=slider.value)(xy)
    ax4.scatter(x, y, c=z, s=1)
    ax4.set_xlim(15090, 16090)
    ax4.set_ylim(320, 363)

widgets.interact_manual(update_crevasse_kde, i=slider)

There is a slider and a button along with this new figure. We can adjust the value of the slider, and when we click the “Run Interact” button, the slider value will be passed to bw_method, and the figure will be updated using the new kernel density estimates. We can repeat this step as many times as needed until we find satisfying results.

Save results

Suppose we think 0.007 is a good value for bw_method and run gaussian_kde one more time.

z = gaussian_kde(xy, bw_method=0.007)(xy)

This notebook will preserve every steps of the analysis, from the begining to this decision. We can concatenate the results with the original DataArray object and add additional metadata:

da_z = xr.DataArray([z], coords=[['kde'], da_crevasse.dist_alongtrack], dims=['labels', 'dist_alongtrack'])
da_results = xr.concat([da_crevasse, da_z], dim='labels')
da_results.attrs['kde_bw_method'] = 0.007
<xarray.DataArray (labels: 5, dist_alongtrack: 13686)>
array([[-1.59319489e+02, -1.56371674e+02, -2.94383667e+02, ...,
         8.96606445e+02,  9.86182434e+02,  8.96695984e+02],
       [ 1.83417716e+01,  1.83417725e+01,  1.83417272e+01, ...,
         1.83333493e+01,  1.83333726e+01,  1.83333432e+01],
       [ 7.86327582e+01,  7.86327581e+01,  7.86327625e+01, ...,
         7.86415432e+01,  7.86415466e+01,  7.86415494e+01],
       [ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00, ...,
         0.00000000e+00,  0.00000000e+00,  0.00000000e+00],
       [ 3.89002717e-06,  3.89002767e-06,  2.91905388e-06, ...,
         5.66009124e-06,  2.91904350e-06,  5.66008990e-06]])
  * labels           (labels) <U3 'h' 'lon' 'lat' 'q' 'kde'
  * dist_alongtrack  (dist_alongtrack) float64 1.509e+04 1.509e+04 ... 1.609e+04
    dataset:        ATL03
    glacier:        Negribreen
    RGT:            594
    date:           2019-08-05
    kde_bw_method:  0.007

Now we can save this final object to disk (as a NetCDF file) for future use.



Using Pangeo and Jupyter tools, we show a data analysis workflow that is assisted with easy visualization and interactive comparison.


Ute C. Herzfeld, Thomas Trantow, Matthew Lawson, Jacob Hans, and Gavin Medley. Surface heights and crevasse morphologies of surging and fast-moving glaciers from ICESat-2 laser altimeter data - Application of the density-dimension algorithm (DDA-ice) and evaluation using airborne altimeter and Planet SkySat data. Science of Remote Sensing, 3(May 2020):100013, 2021. URL:, doi:10.1016/j.srs.2020.100013.