NSDF Tutorial: Using NSDF for End-to-End Analysis of Scientific Data
Overview
This tutorial introduces OpenVisus, an NSDF service that improves the way scientific data is accessed, analyzed, and visualized using cloud technologies. The tutorial provides step-by-step guidance using a module of the SOMOSPIE engine to collect raw data from a public source, like the USDA portal. This data is then converted into multiple files for analysis with NSDF and for storage on both public and private platforms. The module used is GEOtiled, the first step in the SOMOSPIE process, which efficiently calculates terrain attributes over large areas from digital elevation models (DEMs), maintaining high accuracy.
Throughout the tutorial, you will learn how to:
-
Construct a modular workflow that combines your application components with NSDF services.
-
Upload, download, and stream data to and from public and private storage solutions.
-
Deploy the NSDF dashboard for large-scale data access, visualization, and analysis.
The slides introducing this tutorial can be downloaded here. This tutorial follows the steps in Figure 1.
Figure 1. Workflow diagram illustrating the tutorial's process of data collection, transformation, analysis, and storage using the SOMOSPIE engine and NSDF services.
Table of content
- Prerequisites
- Running the Tutorial with Docker
- APPENDIX: Installing the Tutorial from the Beginning
- Community and Resources
- Related Publications
- Copyright and License
- Authors
- Acknowledgments
Prerequisites
Note: To follow this tutorial you must have a computer with minimum 8 GB of RAM and 5 GB of free disk
Before starting this tutorial, ensure you have installed Git and Docker Desktop on your computer.
- To install Git: Follow the installation instructions for your operating system (Linux, Windows, or Mac).
- To install Docker Desktop: Follow the installation instructions for your operating system (Linux, Windows, or Mac). Be sure you are running the most recent version of Docker! Previous versions to 4.15.10 may not work.
After installation, confirm that both tools are correctly set up by executing the following commands in your terminal.
Note: For Windows users, we recommend using the PowerShell terminal for these verifications.
- To verify the GitHub installation:
# Check the Git version
git --version
Expected output (NOTE: git version can be different):
git version 3.12.0
- To verify Docker Desktop installation: Make sure you open the Docker Desktop application before running Docker commands.
# Check the Docker installation information
docker info
Expected output:
Client:
Version: 24.0.5
Context: default
Debug Mode: false
Server:
Containers: 120
Running: 0
Paused: 0
Stopped: 120
Images: 48
Note: The specific numbers in the output might vary based on your installation details and additional information may also appear.
Running the Tutorial with Docker
To run this tutorial, we have prepared a Docker container named globalcomputinglab/somospie_openvisus
that includes all necessary software. Ensure you have installed Docker Desktop as outlined in the Prerequisites.
** Note: Before following the next steps, make sure to open the Docker Desktop Application.**
Now open the terminal and follow the next steps to deploy the tutorial in the Docker container:
# Clone the tutorial repository:
git clone https://github.com/nsdf-fabric/NSDF_Tutorial.git
# Navigate to the tutorial directory:
cd NSDF_Tutorial
# Launch the Docker environment:
docker-compose up -d
Note: If you get a permission denied
error, please add sudo
before the command. For example, sudo docker-compose up -d
After executing the above command, open your preferred web browser (such as Google Chrome, Firefox, or Safari) and enter the following URL to access Jupyter Lab and the tutorial notebook (Tutorial.ipynb): http://127.0.0.1:5000/lab/tree/Tutorial.ipynb
When you have finished the tutorial, ensure to stop the Docker container to free up resources. Do this by entering the following command in your terminal:
# Stop the Docker container
docker-compose down
APPENDIX: Installing the Tutorial from the Beginning
This session provides detailed instructions for setting up and running the workflow from the beginning. You have two options: you can set up a Docker container or configure your local machine for deployment. These instructions are designed for users with more advanced technical skills, and they can be customized to incorporate your application with GEOtiled.
Using a Docker container
To build the docker image in your local machine:
docker build --platform linux/amd64 -t globalcomputinglab/somospie_openvisus .
To pull the image from Dockerhub:
docker pull --platform linux/amd64 globalcomputinglab/somospie_openvisus:tutorial
To run:
docker run -d -p 5000:5000 -p 8989:8989 --name tutorial --platform linux/amd64 globalcomputinglab/somospie_openvisus
Follow this URL to run the Jupyter Notebook Tutorial.ipynb
:
http://localhost:5000/
Using your local machine
Conda is used to control all the dependencies in this project; the file environment.yml
contains the list of required versions:
# environment.yml
name: somospie
channels:
- conda-forge
- defaults
dependencies:
- python=3.10
- gdal
- pip
- pip:
- pandas
- pyspark
- findspark
- scikit-learn
- matplotlib
- grass-session
- bokeh
- xmltodict
- requests
- colorcet
- geopandas
- tqdm
- panel
- jupyterlab
- grass-session
- tifffile
- rasterio
- imagecodecs
To install the dependencies in your local machine, use the following command:
Note: Conda is mandatory in this step, use this link to install it
conda env create -f environment.yml
Activate the virtual environment:
conda activate somospie
Install OpenVisus dependencies:
# use this file to install openvisus in your local machine
./setup_openvisuspy.sh
Run the Jupyter Notebook and follow the internal instructions:
jupyter notebook Tutorial.ipynb
Community and Resources
NSDF and SOMOSPIE are open-source projects. Questions, discussion, and contributions are welcome. Contributions can be anything from new packages to bug fixes, documentation, or even new core features.
NSDF Resources:
- Slack workspace: nsdf-workspace.
- Github Discussions: issues: Discussions and Q&A.
- Mailing list: https://groups.google.com/g/nsdf - nsdf@googlegroups.com
- Twitter: @FabricNsdf
OpenVisus Resources:
- Github: Open Source distribution of the ViSUS capabilities
- Webpage: VISUS - High performance Big Data Analysis and Visualization Solutions
SOMOSPIE Resources:
- GitHub: SOMOSPIE software
- Webpage: SOMOSPIE overview
- Questions: Michela Taufer mtaufer@utk.edu
Related Publications
[1] Roa, C., Olaya, P., Llamas, R., Vargas, R., Taufer, M. GEOtiled: A Scalable Workflow for Generating Large Datasets of High-Resolution Terrain Parameters. Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing (2023). link
[2] Olaya, Paula and Luettgau, Jakob and Roa, Camila and Llamas, Richardo and Vargas, Rodrigo and Wen, Sophia and Chung, I-Hsin and Seelam, Seetharami and Park, Yoonho and Lofstead, Jay and others Enabling Scalability in the Cloud for Scientific Workflows: An Earth Science Use Case. IEEE International Conference on Cloud Computing (2023). link
[3] D. Rorabaugh, M. Guevara, R. Llamas, J. Kitson, R. Vargas, and M. Taufer. SOMOSPIE: A modular SOil MOisture SPatial Inference Engine based on data-driven decisions. In Proceedings of the 2019 15th International Conference on eScience (eScience) (2019). link
[4] V. Pascucci and R. J. Frank, “Global Static Indexing for Real-Time Exploration of Very Large Regular Grids,” SC ‘01: Proceedings of the 2001 ACM/IEEE Conference on Supercomputing, Denver, CO, USA, 2001, pp. 45-45, link
[5] Pascucci, Valerio, et al. “The ViSUS visualization framework.” High Performance Visualization. Chapman and Hall/CRC, 2012. 439-452. link
[6] Brian Summa, Giorgio Scorzelli, Ming Jiang, Peer-Timo Bremer, and Valerio Pascucci. 2011. Interactive editing of massive imagery made simple: Turning Atlanta into Atlantis. ACM Trans. Graph. 30, 2, Article 7 (April 2011), 13 pages. link
Copyright and License
Copyright (c) 2024, Global Computing Lab
Catalog Comparison Tool is distributed under terms of the Apache License, Version 2.0 with LLVM Exceptions. See LICENSE for more details.
Authors
This project was created by the NSDF team and the SOMOSPIE team. To reach out email us at info@nationalsciencedatafabric.org and Dr. Michela Taufer mtaufer@utk.edu.
Acknowledgments
The authors of this tutorial would like to express their gratitude to:
- NSF through the awards 2138811, 2103845, 2334945, 2138296, and 2331152.
- The Dataverse team link
- The Seal Storage team link
- Vargas Lab led by Dr. Rodrigo Vargas link
Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.