Analyze the data¶
We are ready to conduct a simple analysis for crevass morphology. At this research stage, some of the key needs include
choosing a good data structure for the following analysis and result storage
quick visualization of processing results
comparing different test results efficiently
Here we will use two tools, one from the Pangeo ecosystem (xarray
) and the other from the Jupter ecosystem (Jupyter widgets), and show how they assist data analysis by providing solutions to these needs.
Goals
Analyze the ICESat-2 data we have downloaded and identify crevass locations on the glacier surface using a pretty basic algorithm.
Steps¶
Load and prepare the data¶
This step is replicated from the previous section. See Get data for details.
import h5py
import numpy as np
filename = 'download/processed_ATL03_20190805232841_05940403_004_01.h5'
with h5py.File(filename, 'r') as f:
lon_ph = f['gt1l']['heights']['lon_ph'][:] # photon longitude (x)
lat_ph = f['gt1l']['heights']['lat_ph'][:] # photon latitude (y)
h_ph = f['gt1l']['heights']['h_ph'][:] # photon elevation (z), in m
dist_ph = f['gt1l']['heights']['dist_ph_along'][:] # photon horizontal distance from the beginning of the parent segment, in m
seg_length = f['gt1l']['geolocation']['segment_length'][:] # horizontal of each segment, in m
seg_ph_count = f['gt1l']['geolocation']['segment_ph_cnt'][:] # photon count in each segment, in m
quality_ph = f['gt1l']['heights']['signal_conf_ph'][:, 0] # photon quality, 0 = noise, 2 = low, 3 = medium, 4 = high
def make_dist_alongtrack(ph_count, seg_length, dist_ph):
'''
For detailed explanation of each variable and reasoning of the code, see ICESat-2 ATL03 documentation.
'''
repeat_num = np.cumsum(seg_length) - seg_length[0]
dist_alongtrack = np.repeat(repeat_num, ph_count)
dist_alongtrack += dist_ph
return dist_alongtrack
dist_alongtrack = make_dist_alongtrack(seg_ph_count, seg_length, dist_ph) # distance along track for each photon, in m
Use xarray
to handle the data¶
There are many closely related variables with different names. Since the variables we are interested in share the same array size (including lat
, lon
, dist_alongtrack
, h
, and quality
), we can use xarray
to group them together for the ease of the following analysis and plotting tasks.
The DataArray
class from xarray
comes with handy methods for various tasks. Its plotting functionality is build on matplotlib
, so we can use ipympl
widget again to generate interactive plots.
%matplotlib widget
import xarray as xr
import matplotlib.pyplot as plt
Now we can make an xarray.DataArray
object using the variables above and some additional metadata:
data_grouped = np.vstack((h_ph, lon_ph, lat_ph, quality_ph))
labels = ['h', 'lon', 'lat', 'q']
attrs = {'dataset': 'ATL03', 'glacier': 'Negribreen', 'RGT': 594, 'date': '2019-08-05'}
da = xr.DataArray(data_grouped, coords=[labels, dist_alongtrack], dims=["labels", "dist_alongtrack"], attrs=attrs)
da
<xarray.DataArray (labels: 4, dist_alongtrack: 312012)> array([[ 819.36749268, 826.67175293, 462.56271362, ..., 293.02194214, -22.02213097, -205.87425232], [ 18.47193949, 18.47194186, 18.47182369, ..., 18.29993139, 18.29982727, 18.2997665 ], [ 78.50008076, 78.50008053, 78.50009213, ..., 78.67477398, 78.67478393, 78.67478974], [ 0. , 0. , 4. , ..., 0. , 0. , 0. ]]) Coordinates: * labels (labels) <U3 'h' 'lon' 'lat' 'q' * dist_alongtrack (dist_alongtrack) float64 0.1413 0.1058 ... 1.987e+04 Attributes: dataset: ATL03 glacier: Negribreen RGT: 594 date: 2019-08-05
- labels: 4
- dist_alongtrack: 312012
- 819.4 826.7 462.6 461.8 462.3 462.2 461.9 ... 4.0 4.0 0.0 0.0 0.0 0.0
array([[ 819.36749268, 826.67175293, 462.56271362, ..., 293.02194214, -22.02213097, -205.87425232], [ 18.47193949, 18.47194186, 18.47182369, ..., 18.29993139, 18.29982727, 18.2997665 ], [ 78.50008076, 78.50008053, 78.50009213, ..., 78.67477398, 78.67478393, 78.67478974], [ 0. , 0. , 4. , ..., 0. , 0. , 0. ]])
- labels(labels)<U3'h' 'lon' 'lat' 'q'
array(['h', 'lon', 'lat', 'q'], dtype='<U3')
- dist_alongtrack(dist_alongtrack)float640.1413 0.1058 ... 1.987e+04
array([1.413464e-01, 1.057770e-01, 1.876492e+00, ..., 1.987020e+04, 1.987173e+04, 1.987262e+04])
- dataset :
- ATL03
- glacier :
- Negribreen
- RGT :
- 594
- date :
- 2019-08-05
You can see that the DataArray
class supports a rich representation, which shows not only data but also metadata we just enetered when simply typing the object name on the cell.
Slice the data¶
From the previous stage, we know that the crevasses are located at x values (dist_alongtrack
) between 15090 and 16090. We can slice and retrieve only the data in between that range. The other advantage of using xarray
for this task is that the metadata will be kept and passed to the new data subset.
da_crevasse = da.where((15090 < da.dist_alongtrack) & (da.dist_alongtrack < 16090), drop=True)
da_crevasse
<xarray.DataArray (labels: 4, dist_alongtrack: 13686)> array([[-159.31948853, -156.37167358, -294.38366699, ..., 896.60644531, 986.18243408, 896.69598389], [ 18.34177156, 18.34177253, 18.34172716, ..., 18.33334928, 18.33337263, 18.3333432 ], [ 78.63275821, 78.63275811, 78.63276249, ..., 78.64154323, 78.64154661, 78.64154945], [ 0. , 0. , 0. , ..., 0. , 0. , 0. ]]) Coordinates: * labels (labels) <U3 'h' 'lon' 'lat' 'q' * dist_alongtrack (dist_alongtrack) float64 1.509e+04 1.509e+04 ... 1.609e+04 Attributes: dataset: ATL03 glacier: Negribreen RGT: 594 date: 2019-08-05
- labels: 4
- dist_alongtrack: 13686
- -159.3 -156.4 -294.4 326.2 330.4 329.6 ... 0.0 0.0 2.0 0.0 0.0 0.0
array([[-159.31948853, -156.37167358, -294.38366699, ..., 896.60644531, 986.18243408, 896.69598389], [ 18.34177156, 18.34177253, 18.34172716, ..., 18.33334928, 18.33337263, 18.3333432 ], [ 78.63275821, 78.63275811, 78.63276249, ..., 78.64154323, 78.64154661, 78.64154945], [ 0. , 0. , 0. , ..., 0. , 0. , 0. ]])
- labels(labels)<U3'h' 'lon' 'lat' 'q'
array(['h', 'lon', 'lat', 'q'], dtype='<U3')
- dist_alongtrack(dist_alongtrack)float641.509e+04 1.509e+04 ... 1.609e+04
array([15090.214811, 15090.200686, 15090.871387, ..., 16088.333564, 16088.605281, 16089.040432])
- dataset :
- ATL03
- glacier :
- Negribreen
- RGT :
- 594
- date :
- 2019-08-05
Examine the data using interactive plots¶
Let’s quickly revisit the plot we did at the end of the previous stage, but this time using xarray
and ipympl
’s interactive plotting capabilities.
fig, ax = plt.subplots(1, 1, figsize=(7, 3))
da_crevasse.loc['h'].plot.line('.', ax=ax, markersize=1)
ax.set_ylim(320, 363)
(320.0, 363.0)
We can plot them using different colors based on the photon quality:
fig, ax2 = plt.subplots(1, 1, figsize=(7, 3))
for q_value, q_label in zip([0, 2, 3, 4], ['noise', 'low', 'medium', 'high']):
da_crevasse.where(da_crevasse.loc['q'] == q_value, drop=True).loc['h'].plot.line('.', ax=ax2, label=q_label, markersize=1)
ax2.set_ylim(320, 363)
ax2.legend()
<matplotlib.legend.Legend at 0x7fdba5ab9a30>
Apparently, there are two groups with a high photon quality: the first group has an elevation between 340 and 350 m (ice surface), and the second group is lower, at 330 to 340 m (bottom of crevasses).
Analysis: point density and crevasse identification¶
DDA-ice uses point density to detect crevasses [HTL+21]. We adopt this idea and want to set up a simple strategy to identify crevass locations. Here’s the plan:
Calculate kernel denstiry estimate using Scipy’s gaussian_kde module.
Groups with a high kernel density and below the ice surface are identified as crevasses.
from scipy.stats import gaussian_kde
The gaussian_kde
module has an optinal argument called bw_method
, which controls the estimator bandwidth. The ideal bw_method
value generates a density peak at each bottom of crevasses. Let us start with an arbitrary value
0.005
and see what happens.
x = da_crevasse.dist_alongtrack
y = da_crevasse.loc['h']
xy = np.vstack([x, y])
z = gaussian_kde(xy, bw_method=0.005)(xy) # kernel density estimate at each ICESat-2 photon point
fig, ax3 = plt.subplots(1, 1, figsize=(7, 3))
ax3.scatter(x, y, c=z, s=1) # s is marker size
ax3.set_xlim(15090, 16090)
ax3.set_ylim(320, 363)
(320.0, 363.0)
Now points are rendered based on a color map. Brighter points means there are more points clustered together, suggesting a real detection at the air-ice boundary. Our question is, however, that whether bw_method = 0.005
is the best option for such a detection.
Use ipywidgets
to compare tests¶
To test how bw_method
affects the density peak detection, the traditional way is to make as many subplots as possible, assign a different bw_method
for each subplot, and compare the results on one giant figure. With ipywidgets
,
import ipywidgets as widgets
we have a more flexible and interactive way as this code cell shows:
fig, ax4 = plt.subplots(1, 1, figsize=(7, 3))
ax4.scatter(x, y, c=z, s=1)
ax4.set_xlim(15090, 16090)
ax4.set_ylim(320, 363)
slider = widgets.FloatLogSlider(value=0.01, min=-4, max=0, step=0.01)
def update_crevasse_kde():
z = gaussian_kde(xy, bw_method=slider.value)(xy)
ax4.clear()
ax4.scatter(x, y, c=z, s=1)
ax4.set_xlim(15090, 16090)
ax4.set_ylim(320, 363)
widgets.interact_manual(update_crevasse_kde, i=slider)
slider
There is a slider and a button along with this new figure. We can adjust the value of the slider, and when we click the “Run Interact” button, the slider value will be passed to bw_method
, and the figure will be updated using the new kernel density estimates. We can repeat this step as many times as needed until we find satisfying results.
Save results¶
Suppose we think 0.007
is a good value for bw_method
and run gaussian_kde
one more time.
z = gaussian_kde(xy, bw_method=0.007)(xy)
This notebook will preserve every steps of the analysis, from the begining to this decision. We can concatenate the results with the original DataArray
object and add additional metadata:
da_z = xr.DataArray([z], coords=[['kde'], da_crevasse.dist_alongtrack], dims=['labels', 'dist_alongtrack'])
da_results = xr.concat([da_crevasse, da_z], dim='labels')
da_results.attrs['kde_bw_method'] = 0.007
da_results
<xarray.DataArray (labels: 5, dist_alongtrack: 13686)> array([[-1.59319489e+02, -1.56371674e+02, -2.94383667e+02, ..., 8.96606445e+02, 9.86182434e+02, 8.96695984e+02], [ 1.83417716e+01, 1.83417725e+01, 1.83417272e+01, ..., 1.83333493e+01, 1.83333726e+01, 1.83333432e+01], [ 7.86327582e+01, 7.86327581e+01, 7.86327625e+01, ..., 7.86415432e+01, 7.86415466e+01, 7.86415494e+01], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, ..., 0.00000000e+00, 0.00000000e+00, 0.00000000e+00], [ 3.89002717e-06, 3.89002767e-06, 2.91905388e-06, ..., 5.66009124e-06, 2.91904350e-06, 5.66008990e-06]]) Coordinates: * labels (labels) <U3 'h' 'lon' 'lat' 'q' 'kde' * dist_alongtrack (dist_alongtrack) float64 1.509e+04 1.509e+04 ... 1.609e+04 Attributes: dataset: ATL03 glacier: Negribreen RGT: 594 date: 2019-08-05 kde_bw_method: 0.007
- labels: 5
- dist_alongtrack: 13686
- -159.3 -156.4 -294.4 326.2 ... 3.715e-06 5.66e-06 2.919e-06 5.66e-06
array([[-1.59319489e+02, -1.56371674e+02, -2.94383667e+02, ..., 8.96606445e+02, 9.86182434e+02, 8.96695984e+02], [ 1.83417716e+01, 1.83417725e+01, 1.83417272e+01, ..., 1.83333493e+01, 1.83333726e+01, 1.83333432e+01], [ 7.86327582e+01, 7.86327581e+01, 7.86327625e+01, ..., 7.86415432e+01, 7.86415466e+01, 7.86415494e+01], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, ..., 0.00000000e+00, 0.00000000e+00, 0.00000000e+00], [ 3.89002717e-06, 3.89002767e-06, 2.91905388e-06, ..., 5.66009124e-06, 2.91904350e-06, 5.66008990e-06]])
- labels(labels)<U3'h' 'lon' 'lat' 'q' 'kde'
array(['h', 'lon', 'lat', 'q', 'kde'], dtype='<U3')
- dist_alongtrack(dist_alongtrack)float641.509e+04 1.509e+04 ... 1.609e+04
array([15090.214811, 15090.200686, 15090.871387, ..., 16088.333564, 16088.605281, 16089.040432])
- dataset :
- ATL03
- glacier :
- Negribreen
- RGT :
- 594
- date :
- 2019-08-05
- kde_bw_method :
- 0.007
Now we can save this final object to disk (as a NetCDF file) for future use.
da_results.to_netcdf('download/ATL03_Negribreen_RGT594_2019-08-05.nc')
Summary
Using Pangeo and Jupyter tools, we show a data analysis workflow that is assisted with easy visualization and interactive comparison.
- HTL+21
Ute C. Herzfeld, Thomas Trantow, Matthew Lawson, Jacob Hans, and Gavin Medley. Surface heights and crevasse morphologies of surging and fast-moving glaciers from ICESat-2 laser altimeter data - Application of the density-dimension algorithm (DDA-ice) and evaluation using airborne altimeter and Planet SkySat data. Science of Remote Sensing, 3(May 2020):100013, 2021. URL: https://doi.org/10.1016/j.srs.2020.100013, doi:10.1016/j.srs.2020.100013.