Jupyter Meets the Earth - A toy research workflow

This is a simple workflow showing how Jupyter and Pangeo tools help develop a research project and share scientific results, with a focus on geoscience.

Summary

We use ICESat-2 data (satellite laser altimetry) to conduct a glacier crevasse analysis. We use tools in the Jupyter and Pangeo ecosystem, including icepyx, to query data, explore ideas, visualize findings, and publish results. This workflow provides a generic scheme for earth science research projects, although some of the tools used here may be specific to cryospheric studies. We aim to tackle the following challenge: a researcher has some data and ideas for scientific analysis and wants to make the results as reproducible as possible for other researchers. Ideally, the work should also serve educational purposes. Here we show that Jupyter and Pangeo tools offer a straightforward way to achieve these goals.

Seven stages of a research workflow

We break this toy workflow down into seven conceptual stages, which are present in most research activities:

  1. Search for ideas: You have read extensively on a topic that interests you, and now you have your own ideas to test. How do you develop a solid, practical plan for testing them? What other challenges arise at this stage?

  2. Get data: You have pinned down a strategy and a data set to use. If the data set is large and contains much information unrelated to your ideas, how can you efficiently query, access, slice, and retrieve the parts most valuable to your research project?

  3. Explore the data: Before carrying out a detailed analysis, it is often necessary to evaluate the data first to understand their potential. Is there a way to dissect the data quickly and get a broad picture of them?

  4. Analyze the data: The choice of tools determines how you work with the data. The Jupyter and Pangeo projects provide an arsenal of software packages that work well together; this is what we call an “ecosystem.” For each software “niche” (the set of tools best suited to a particular research case), the Jupyter ecosystem aims to make that work simple to support.

  5. Generate publication-ready results: It is nearly always necessary to revisit an analysis, update the results, and modify figures and tables. How can this step be made as painless as possible? Going one step further, how can you present your results interactively so that readers can make sense of them?

  6. Write a report: Academic writing is challenging, partly because you have to organize all the material you have, such as text, figures, tables, equations, code, and references. How can Jupyter help with this?

  7. Publish and reproduce the work: This final stage is essential for advancing scientific exploration. Once your work is published, it becomes part of other researchers' knowledge base. When they want to test your results or build their own workflows on your work, how can you expedite this knowledge sharing on the Jupyter Landscape?