NSDF Tutorial: Using NSDF for End-to-End Analysis of Scientific Data

Overview

This tutorial introduces OpenVisus, an NSDF service that improves the way scientific data is accessed, analyzed, and visualized using cloud technologies. The tutorial provides step-by-step guidance using a module of the SOMOSPIE engine to collect raw data from a public source, like the USDA portal. This data is then converted into multiple files for analysis with NSDF and for storage on both public and private platforms. The module used is GEOtiled, the first step in the SOMOSPIE process, which efficiently calculates terrain attributes over large areas from digital elevation models (DEMs), maintaining high accuracy.

Throughout the tutorial, you will learn how to:

  • Construct a modular workflow that combines your application components with NSDF services.

  • Upload, download, and stream data to and from public and private storage solutions.

  • Deploy the NSDF dashboard for large-scale data access, visualization, and analysis.

The slides introducing this tutorial can be downloaded here. This tutorial follows the steps in Figure 1.


Figure 1. Workflow diagram illustrating the tutorial's process of data collection, transformation, analysis, and storage using the SOMOSPIE engine and NSDF services.


Table of content

  1. Prerequisites
  2. Running the Tutorial with Docker
  3. APPENDIX: Installing the Tutorial from the Beginning
  4. Community and Resources
  5. Related Publications
  6. Copyright and License
  7. Authors
  8. Acknowledgments

Prerequisites

Note: To follow this tutorial you must have a computer with minimum 8 GB of RAM and 5 GB of free disk

Before starting this tutorial, ensure you have installed Git and Docker Desktop on your computer.

  • To install Git: Follow the installation instructions for your operating system (Linux, Windows, or Mac).
  • To install Docker Desktop: Follow the installation instructions for your operating system (Linux, Windows, or Mac). Be sure you are running the most recent version of Docker! Previous versions to 4.15.10 may not work.

After installation, confirm that both tools are correctly set up by executing the following commands in your terminal.

Note: For Windows users, we recommend using the PowerShell terminal for these verifications.

  • To verify the GitHub installation:
# Check the Git version
git --version

Expected output (NOTE: git version can be different):

git version 3.12.0
  • To verify Docker Desktop installation: Make sure you open the Docker Desktop application before running Docker commands.
# Check the Docker installation information
docker info

Expected output:

Client:
 Version:    24.0.5
 Context:    default
 Debug Mode: false

Server:
 Containers: 120
  Running: 0
  Paused: 0
  Stopped: 120
 Images: 48

Note: The specific numbers in the output might vary based on your installation details and additional information may also appear.

Running the Tutorial with Docker

To run this tutorial, we have prepared a Docker container named globalcomputinglab/somospie_openvisus that includes all necessary software. Ensure you have installed Docker Desktop as outlined in the Prerequisites.

** Note: Before following the next steps, make sure to open the Docker Desktop Application.**

Now open the terminal and follow the next steps to deploy the tutorial in the Docker container:

# Clone the tutorial repository:
git clone https://github.com/nsdf-fabric/NSDF_Tutorial.git

# Navigate to the tutorial directory:
cd NSDF_Tutorial

# Launch the Docker environment:
docker-compose up -d

Note: If you get a permission denied error, please add sudo before the command. For example, sudo docker-compose up -d

After executing the above command, open your preferred web browser (such as Google Chrome, Firefox, or Safari) and enter the following URL to access Jupyter Lab and the tutorial notebook (Tutorial.ipynb): http://127.0.0.1:5000/lab/tree/Tutorial.ipynb

When you have finished the tutorial, ensure to stop the Docker container to free up resources. Do this by entering the following command in your terminal:

# Stop the Docker container
docker-compose down

APPENDIX: Installing the Tutorial from the Beginning

This session provides detailed instructions for setting up and running the workflow from the beginning. You have two options: you can set up a Docker container or configure your local machine for deployment. These instructions are designed for users with more advanced technical skills, and they can be customized to incorporate your application with GEOtiled.

Using a Docker container

To build the docker image in your local machine:

docker build --platform linux/amd64 -t globalcomputinglab/somospie_openvisus .

To pull the image from Dockerhub:

docker pull --platform linux/amd64 globalcomputinglab/somospie_openvisus:tutorial

To run:

docker run -d -p 5000:5000 -p 8989:8989 --name tutorial --platform linux/amd64 globalcomputinglab/somospie_openvisus

Follow this URL to run the Jupyter Notebook Tutorial.ipynb:

http://localhost:5000/

Using your local machine

Conda is used to control all the dependencies in this project; the file environment.yml contains the list of required versions:

# environment.yml

name: somospie
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.10
  - gdal
  - pip
  - pip:
      - pandas
      - pyspark
      - findspark
      - scikit-learn
      - matplotlib
      - grass-session
      - bokeh
      - xmltodict
      - requests
      - colorcet
      - geopandas
      - tqdm
      - panel
      - jupyterlab
      - grass-session
      - tifffile
      - rasterio
      - imagecodecs

To install the dependencies in your local machine, use the following command:

Note: Conda is mandatory in this step, use this link to install it

conda env create -f environment.yml

Activate the virtual environment:

conda activate somospie

Install OpenVisus dependencies:

# use this file to install openvisus in your local machine
./setup_openvisuspy.sh

Run the Jupyter Notebook and follow the internal instructions:

jupyter notebook Tutorial.ipynb

Community and Resources

NSDF and SOMOSPIE are open-source projects. Questions, discussion, and contributions are welcome. Contributions can be anything from new packages to bug fixes, documentation, or even new core features.

NSDF Resources:

OpenVisus Resources:

SOMOSPIE Resources:

[1] Roa, C., Olaya, P., Llamas, R., Vargas, R., Taufer, M. GEOtiled: A Scalable Workflow for Generating Large Datasets of High-Resolution Terrain Parameters. Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing (2023). link

[2] Olaya, Paula and Luettgau, Jakob and Roa, Camila and Llamas, Richardo and Vargas, Rodrigo and Wen, Sophia and Chung, I-Hsin and Seelam, Seetharami and Park, Yoonho and Lofstead, Jay and others Enabling Scalability in the Cloud for Scientific Workflows: An Earth Science Use Case. IEEE International Conference on Cloud Computing (2023). link

[3] D. Rorabaugh, M. Guevara, R. Llamas, J. Kitson, R. Vargas, and M. Taufer. SOMOSPIE: A modular SOil MOisture SPatial Inference Engine based on data-driven decisions. In Proceedings of the 2019 15th International Conference on eScience (eScience) (2019). link

[4] V. Pascucci and R. J. Frank, “Global Static Indexing for Real-Time Exploration of Very Large Regular Grids,” SC ‘01: Proceedings of the 2001 ACM/IEEE Conference on Supercomputing, Denver, CO, USA, 2001, pp. 45-45, link

[5] Pascucci, Valerio, et al. “The ViSUS visualization framework.” High Performance Visualization. Chapman and Hall/CRC, 2012. 439-452. link

[6] Brian Summa, Giorgio Scorzelli, Ming Jiang, Peer-Timo Bremer, and Valerio Pascucci. 2011. Interactive editing of massive imagery made simple: Turning Atlanta into Atlantis. ACM Trans. Graph. 30, 2, Article 7 (April 2011), 13 pages. link

Copyright (c) 2024, Global Computing Lab

Catalog Comparison Tool is distributed under terms of the Apache License, Version 2.0 with LLVM Exceptions. See LICENSE for more details.

Authors

This project was created by the NSDF team and the SOMOSPIE team. To reach out email us at info@nationalsciencedatafabric.org and Dr. Michela Taufer mtaufer@utk.edu.

Acknowledgments

The authors of this tutorial would like to express their gratitude to:

  • NSF through the awards 2138811, 2103845, 2334945, 2138296, and 2331152.
  • The Dataverse team link
  • The Seal Storage team link
  • Vargas Lab led by Dr. Rodrigo Vargas link

Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.


This material is based upon work supported by the National Science Foundation under Grant No. 2138811.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Copyright © 2021 National Science Data Fabric