NSDF Distinguished Speaker Series - Apr 10, 2024, at 12:30 pm ET / 11:30 am CT / 10:30 am MT / 9:30 am PT

Date: Wednesday, Apr 10, 2024, at 12:30 pm ET / 11:30 am CT / 10:30 am MT / 9:30 am PT

Title: National Science Data Fabric (NSDF) Webinar Tutorial: Using the NSDF Services for End-to-End Analysis and Visualization of Large Scientific Data

Speakers: Dr. Michela Taufer, University of Tennessee, Knoxville, and Dr. Valerio Pascucci, University of Utah

Seminar Recording



Abstract: Scientific research often involves dealing with vast amounts of data stored in various public and private remote locations. Researchers frequently prefer to review all the available data remotely before deciding which segments to download, transferring only specific portions of this data to their local computer for closer analysis and visualization. However, every step of this process is challenging: it is difficult to stream the data, identify and deploy tools for data visualization, interact dynamically with the data, explore multiple datasets simultaneously, and decide which relevant segment of data to download.

This tutorial targets scientists who need to visualize and analyze large scientific datasets interactively. It demonstrates how the services of the National Science Data Fabric (NSDF) enable accessible, flexible, and customizable workflows for multi-faceted analysis and visualization of various datasets. The tutorial walks through the workflow steps of generating large datasets with modular applications, storing this data remotely, and analyzing and visualizing the data locally to draw scientific conclusions. NSDF services allow users to stream data from public storage platforms such as Dataverse or private storage platforms such as Seal Storage, and to access an easy-to-use NSDF dashboard for immediate interaction with the data.

The tutorial highlights how to move through every step of the modular workflow, how to handle data formats that are efficient for streaming, and how to use visualization for scientific inference on selected subsets of data. Using an Earth science use case, the tutorial shows how a modular workflow can create a dataset, gather fine-resolution terrain parameters across the United States, and visualize selected regions of interest.

Come join us to learn about using NSDF services to empower your research!

For more information, contact us at info@nationalsciencedatafabric.org



All Hands Meeting - February 2024

Date February 28-29 and March 1, 2024

Speaker NSDF Team

NSDF AHM February 2024

The fourth National Science Data Fabric (NSDF) in-person meeting, held in San Diego, California.



NSDF Distinguished Speaker Series - Dec 12, 2023, at 2pm ET / 1pm CT / 12pm MT / 11am PT

Date Tuesday, December 12, 2023, 2pm ET / 1pm CT / 12pm MT / 11am PT

Title Big Data at Synchrotron X-ray User Facilities: Challenges and Opportunities

Speaker Dr. Joel Brock, Director, Cornell High Energy Synchrotron Source (CHESS)

Seminar Recording



Abstract: Modern synchrotron x-ray facilities provide unique techniques for studying the structure and behavior of matter at the microstructural, molecular, and atomic levels. The Cornell High Energy Synchrotron Source (CHESS), a ring-shaped synchrotron 768 meters in circumference located five stories beneath the Cornell University campus, delivers highly collimated x-ray beams over a billion times more intense than a conventional laboratory source to seven experimental stations. Each station has unique instrumentation, opening new research vistas in condensed-matter physics, materials research, structural biology, chemistry, geology, structural and functional materials, plant science and agriculture, and cultural heritage. In this talk, Dr. Brock will introduce synchrotrons, the synchrotron-based x-ray characterization techniques available at CHESS, and a few of their broad scientific applications. He will then focus on the challenges and opportunities that big data, cyberinfrastructure, machine learning, and artificial intelligence present.

Bio: Dr. Brock received his B.S. in Physics from Stanford and completed his Ph.D. with J. David Litster and a postdoc with Robert J. Birgeneau (both at M.I.T.). He joined the A&EP faculty in 1989 and was an NSF Young Investigator from 1992-97. He is a member of the graduate fields of Applied Physics and Materials Science & Engineering. He served as Director of Graduate Studies for the Graduate Field of Applied Physics from 1993-99 and as Director of the School of Applied & Engineering Physics from 2000-2007. He became Director of CHESS in 2012. He currently serves on the International Advisory Committee (IAC) for the RIKEN SPring-8 Center, Japan; the IAC for the National Synchrotron Radiation Research Center (NSRRC), Taiwan; and the External Advisory Committee for the National High Magnetic Field Laboratory, Tallahassee. He also chairs the Scientific Advisory Committee for the Stanford Synchrotron Radiation Lightsource (SSRL) at the SLAC National Accelerator Laboratory. In addition to CHESS, Dr. Brock is affiliated with the Cornell Center for Materials Research (CCMR), the Center for Alkaline-Based Energy Solutions (CABES), and the Cornell Energy Systems Institute (CESI). He is a Faculty Fellow of the Cornell Atkinson Center for Sustainability and a Fellow of the American Physical Society.



NSDF Distinguished Speaker Series - Oct 19, 2023, at 9AM HST / 12pm PT / 1pm MT / 2pm CT / 3pm ET

Date Oct 19, 2023, at 9AM HST / 12pm PT / 1pm MT / 2pm CT / 3pm ET

Title Cyberinfrastructure in Paradise: Enabling Science and Community Impact at the University of Hawaii

Speaker Gwen Jacobs, Director of Cyberinfrastructure, University of Hawaii

Seminar Recording



Abstract: The University of Hawaiʻi is a world-class research institution with strengths in astronomy, ocean and atmospheric science, biomedicine, indigenous science, and more across its 10-campus system. Over the last 10 years, strategic investments in cyberinfrastructure, data science, team science, and multidisciplinary research have had a transformative impact on the research mission and on education and workforce development, and have brought many benefits to the state. This talk will highlight the strategies, lessons learned, and opportunities for the future of research in Hawaiʻi.

Bio: Dr. Gwen Jacobs serves as the Director of Cyberinfrastructure for the University of Hawaiʻi System, where she leads efforts to support data-intensive research with advanced cyberinfrastructure for the University of Hawaiʻi research community. She directs the State of Hawaiʻi EPSCoR Program and serves as Co-Director of the Hawaiʻi Data Science Institute. Her research accomplishments and interests span computational neuroscience, informatics, software tools for data management, analysis, and visualization, campus cyberinfrastructure, and regional and national networking initiatives. Prior to coming to UH, she served as Professor of Neuroscience, Department Head of Cell Biology and Neuroscience, Director of the Howard Hughes Medical Institute Undergraduate Biology Education program, and Assistant Chief Information Officer and Director of Research Computing at Montana State University. Her work has been continuously funded by the National Science Foundation and the National Institutes of Health for more than 30 years, and she has been actively engaged in science policy at the national level throughout her career. Recently, she served as member and Chair of the NSF Advisory Committee for Cyberinfrastructure (2016-2020), member of the NSF Committee of Visitors for the Office of Advanced Cyberinfrastructure, member and chair of the external advisory board for NSF EarthCube, and General Chair of PEARC20, and she is an active member of the Campus Research Computing Consortium.



NSDF Distinguished Speaker Series - May 23, 2023 - 8AM PT / 9AM MT / 10AM CT / 11AM ET / 5PM CEST

Date May 23, 2023, at 8AM PT / 9AM MT / 10AM CT / 11AM ET / 5PM CEST

Title The Pathway to Implementing the UNESCO Recommendation on Open Science

Speaker Ana Persic, UNESCO

Video Link

Abstract: The United Nations Educational, Scientific, and Cultural Organization (UNESCO) led a consultative and collaborative process to develop an international standard-setting instrument on open science in the form of the UNESCO Recommendation on Open Science, which was adopted by 193 countries in November 2021. This talk will highlight the components of the Recommendation, including key definitions, values and guiding principles, and areas of action. With the release of the Nelson memo in the US and the OSTP naming 2023 the Year of Open Science, it is timely to revisit UNESCO’s efforts, which also include an open science toolkit and checklists to support the implementation of open science worldwide.

Bio: Dr. Ana Persic is a Programme Specialist in the Science Policy and Innovation Policy Section at UNESCO headquarters in Paris. An ecologist by training with a Ph.D. in Ecotoxicology, Dr. Persic joined UNESCO in April 2006 within the framework of UNESCO’s Man and the Biosphere program in the Division of Ecological and Earth Sciences in Paris. She then served as a Science Specialist at the UNESCO Liaison Office in New York from 2011 to 2018. Her work focuses on strengthening the science-policy interface and promoting science, technology, and innovation in implementing the United Nations 2030 Agenda for Sustainable Development and the Sustainable Development Goals (SDGs). She coordinated the development of the UNESCO Recommendation on Open Science and is currently working on its implementation.


All Hands Meeting - April 2023

Date April 12 and April 13, 2023

Speaker NSDF Team

NSDF AHM April 2023

In-person meeting of the National Science Data Fabric on April 12-13 in San Diego, California


NSDF Distinguished Speaker Series - March 23, 2023 - 10:30 Mountain Time

Date March 23, 2023 - 10:30 Mountain Time

Title Large (Hadron Collider) and Big (Data Science)

Speaker Federica Legger, National Institute for Nuclear Physics

Seminar Recording

Abstract: Since the start of data taking at the Large Hadron Collider (LHC) at CERN in 2009, the four LHC experiments (ALICE, ATLAS, CMS, and LHCb) have collected more than an exabyte of physics data. Storing and processing such a large amount of data requires a distributed computing infrastructure, the Worldwide LHC Computing Grid (WLCG), made up of almost 150 computing facilities spread across 42 countries. The current computing infrastructures are expected to grow by an order of magnitude in size and complexity in the HL-LHC era (the high-luminosity upgrade of the LHC, from 2030 onward). In this talk, I will review the challenges of designing, deploying, and operating a distributed and heterogeneous computing infrastructure composed of on-premises data centers, public and private clouds, and HPC centers. We will explore how machine learning and artificial intelligence techniques can be exploited to address such complex challenges, from data taking to data processing to data analysis in WLCG.

Bio: Dr. Federica Legger is an associate researcher at INFN (National Institute for Nuclear Physics). She studied Physics at the University of Turin in Italy and graduated from EPFL (École Polytechnique Fédérale de Lausanne) in Switzerland with a thesis on the data acquisition electronics of the LHCb experiment at CERN. She currently participates in distributed computing activities for the CMS experiment at the LHC (Large Hadron Collider) and for the Virgo experiment at EGO (European Gravitational Observatory). She leads the Operational Intelligence initiative for WLCG (Worldwide LHC Computing Grid), a cross-experiment effort from the HEP (High Energy Physics) community that aims to reduce the operational cost of large scientific computing infrastructures through AI-powered automation. At the University of Turin, she teaches the graduate course Big Data and Machine Learning. Within CMS, she coordinates the Monitoring and Analytics working group, which is responsible for managing the monitoring infrastructure, integrating new data sources, and coordinating analytics tasks. She previously held the same role for the ATLAS experiment, where she also held coordination roles in both distributed computing (Distributed Analysis coordinator) and the physics groups searching for supersymmetry.


All Hands Meeting - October 2022

Date October 11 and October 12, 2022

Speaker NSDF Team

NSDF AHM October 2022

In-person meeting of the National Science Data Fabric on October 11-12 in conjunction with the eScience conference in Salt Lake City


NSDF Distinguished Speaker Series - May 2022

Date 12:30 PM ET, May 26, 2022

Title A Global Research Data Platform: How Globus Services Enable Scientific Discovery

Speaker Ian Foster, University of Chicago and Argonne National Laboratory

Seminar Recording

Abstract: The Globus team has spent more than a decade developing software-as-a-service methods for research data management, available at globus.org. Globus transfer, sharing, search, publication, identity and access management (IAM), automation, and other services enable reliable, secure, and efficient managed access to exabytes of scientific data on tens of thousands of storage systems. For developers, flexible and open platform APIs greatly reduce the cost of developing and operating customized data distribution, sharing, and analysis applications. With 200,000 registered users at more than 2,000 institutions, more than 1.5 exabytes and 100 billion files handled, and hundreds of registered applications and services, the services that comprise the Globus platform have become essential infrastructure for many researchers, projects, and institutions. I describe the design of the Globus platform, present illustrative applications, and discuss lessons learned for cyberinfrastructure software architecture, dissemination, and sustainability.

Bio: Dr. Ian Foster is Senior Scientist and Distinguished Fellow, and also director of the Data Science and Learning Division, at Argonne National Laboratory, and the Arthur Holly Compton Distinguished Service Professor of Computer Science at the University of Chicago. Ian received a BSc degree from the University of Canterbury, New Zealand, and a PhD from Imperial College, United Kingdom, both in computer science. His research deals with distributed, parallel, and data-intensive computing technologies, and innovative applications of those technologies to scientific problems in such domains as materials science, climate change, and biomedicine. Foster is a fellow of the AAAS, ACM, BCS, and IEEE, and an Office of Science Distinguished Scientists Fellow.



NSDF Distinguished Speaker Series - April 2022

Date 12:30 pm ET, April 28, 2022

Title Pangeo Forge - Crowdsourcing Analysis Ready Data in the Cloud

Speaker Ryan Abernathey, Columbia University, Department of Earth and Environmental Science

Seminar Recording

Abstract: Analysis-ready, cloud-optimized (ARCO) scientific data is essential for scalable big data analytics in the cloud. ARCO can massively accelerate statistical analysis, visualization, and machine learning workflows on large-scale scientific datasets. However, most scientific data is distributed in archival formats that are not optimized for large-scale analysis.

Pangeo Forge (https://pangeo-forge.org/) is an open source framework for data Extraction, Transformation, and Loading (ETL) of scientific data. The goal of Pangeo Forge is to make it easy to extract data from traditional data archives and deposit it in cloud object storage in ARCO format.

Pangeo Forge is made of two main components:

  • Pangeo Forge Recipes: an open source Python package that allows you to create ETL pipelines (“recipes”) and run them on your own computer.
  • Pangeo Forge Cloud: a cloud-based automation framework which executes these recipes in the cloud from code stored in GitHub and deposits the data into cloud object storage.

By storing data recipes in version-controlled GitHub repositories, we can maintain perfect provenance information from archival repository to ARCO copy. Using Pangeo Forge, we are collaboratively populating a petabyte-scale library of open ARCO climate data distributed across multiple cloud storage services, including the Open Storage Network.

Pangeo Forge is inspired directly by Conda Forge, a community-led collection of recipes for building conda packages. We hope that Pangeo Forge can eventually play the same role for datasets, encouraging open, interdisciplinary collaboration around data curation.

Bio: Ryan is a computational physical oceanographer who leads the Ocean Transport Group, whose mission is to advance scientific understanding of how stuff moves around the ocean and how this transport influences Earth’s large-scale climate and ecosystems. This research involves working with satellite data, numerical simulations, and observational datasets. Ryan is an enthusiastic advocate for open source scientific software and is an active contributor to the Pangeo Project, a community platform for Big Data geoscience.


All Hands Meeting - February 2022

Date February 22-23, 2022

Speaker NSDF Team

NSDF AHM February 2022

The meeting had multiple goals:

  • developing and sharpening together the long-term NSDF vision and uniqueness
  • getting to know each other in person to strengthen the connections among team members and work more effectively together remotely
  • sharing with everyone the plans and the technical progress of the working groups
  • identifying NSDF strengths and weaknesses to improve our plans for the future.




This material is based upon work supported by the National Science Foundation under Grant No. 2138811.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Copyright © 2021 National Science Data Fabric