Catalog Updates & Must-Have Datasets With Community Catalog Release 2.8.0
Think 490 TB, 2,500+ datasets, and 600,000+ visits per month. Release 2.8.0 comes with a complete rewrite of the look and feel of the community catalog and some must-have datasets.
Release 2.8.0 🚀 starts with a focus on the rewrite of the community catalog 📚, which was based on a lot of user feedback 🗣️ and some great discussions about how to allow better navigation 🧭 and an easier user flow 💨 as the catalog grows further. We didn’t forget the datasets 📊: this release adds a few interesting 👀 and some must-have datasets to the community catalog. We also reached a milestone of more than 2,500 datasets 📁 and expanded to about 490 TB of data.
You can read about the improved new community catalog look and feel in our earlier post or just head over to the catalog page. This blog focuses on some of the key datasets that were added to enrich the scope and variety of datasets present.
Global Mangrove Canopy Height Maps (12m)
The Global Mangrove Canopy Height Maps dataset offers high-resolution global mangrove canopy height maps for 2015, with a 12-meter resolution. The canopy height estimates were derived from TanDEM-X digital surface models, calibrated and validated using GEDI Lidar data. Covering a circum-equatorial band between 34°N and 39°S latitude, the dataset spans the majority of the world's mangrove ecosystems. It includes 1,443 GeoTIFF files, each representing a 1° by 1° tile, with filenames indicating the latitude and longitude of the tile's southwest corner.
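As a rough illustration of the 1° by 1° tiling described above, the sketch below finds the southwest corner and bounds of the tile containing a given point. The label format produced by `hypothetical_tile_label` is an assumption made up for this example; the actual filename pattern should be checked against the dataset documentation.

```python
import math


def tile_southwest_corner(lat, lon):
    """Return the (lat, lon) of the southwest corner of the 1-degree tile containing a point."""
    return math.floor(lat), math.floor(lon)


def tile_bounds(lat, lon):
    """Return (west, south, east, north) of the 1-degree tile containing a point."""
    sw_lat, sw_lon = tile_southwest_corner(lat, lon)
    return (sw_lon, sw_lat, sw_lon + 1, sw_lat + 1)


def hypothetical_tile_label(lat, lon):
    """Build an illustrative label from the southwest corner.

    The 'N25W081' style is an assumption for this sketch, not the
    dataset's confirmed naming convention.
    """
    sw_lat, sw_lon = tile_southwest_corner(lat, lon)
    ns = "N" if sw_lat >= 0 else "S"
    ew = "E" if sw_lon >= 0 else "W"
    return f"{ns}{abs(sw_lat):02d}{ew}{abs(sw_lon):03d}"


# A point in the Florida Everglades, used purely as an example location.
print(tile_bounds(25.3, -80.9))
print(hypothetical_tile_label(25.3, -80.9))
```

Using `math.floor` rather than `int` matters for negative coordinates, since truncation toward zero would pick the wrong tile in the southern and western hemispheres.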
These canopy height maps are invaluable for studying local-scale geophysical and environmental conditions that influence forest structure and carbon cycle dynamics in mangrove ecosystems. The data, sourced from the German Aerospace Center (DLR) TanDEM-X mission, the Global Mangrove Watch map, and GEDI Lidar data, provides critical insights for researchers and conservationists.
Global 30 m Wetland Map with a Fine Classification System (2000-2022)
The Global 30 m Wetland Map with a Fine Classification System (GWL_FCS30) dataset is a high-resolution global wetland map that offers detailed insights into wetland ecosystems across the world. With a 30-meter spatial resolution, this dataset spans from 2000 to 2022 and classifies wetlands into eight distinct subcategories, including both coastal tidal and inland wetlands. The classification system distinguishes among types such as mangroves, salt marshes, tidal flats, permanent water, swamps, marshes, flooded flats, and saline wetlands. This detailed categorization is crucial for understanding the diverse ecological functions and management needs of various wetland types.
The dataset utilizes Landsat reflectance data, Sentinel-1 SAR imagery, and a stratified classification strategy with local adaptive random forest models to ensure high precision at a global scale. The resulting data, presented in square kilometers, serves as a valuable resource for ecological research, wetland management, and conservation efforts, providing a comprehensive view of wetland distribution and changes over the past two decades. Developed using an innovative approach that integrates automatic sample extraction from existing global wetland products with multi-temporal satellite imagery, GWL_FCS30 captures the complex dynamics of wetland environments.
Cyanobacteria Aggregated Manual Labels (CAML)
The Cyanobacteria Aggregated Manual Labels (CAML) dataset offers a comprehensive resource for monitoring and analyzing cyanobacteria blooms in inland water bodies across the United States. Traditional in-situ sampling and analysis face challenges due to the vast number of water bodies, the variability of algal growth, and inconsistencies in sampling methods and techniques. CAML addresses these challenges by providing a large dataset of in-situ cyanobacteria measurements, enabling researchers to investigate cyanobacteria detection and severity classification systematically. The dataset can be integrated with relevant satellite imagery from publicly available sources, enhancing the ability to identify regions with potential harmful algal growth over large areas.
CAML includes ground measurements of cyanobacteria cell counts at 23,570 points in U.S. inland water bodies from 2013 to 2021. This data, provided in CSV format, can be used to train algorithms for estimating cyanobacteria cell counts, supporting timely water quality assessments and public health interventions. The severity levels in the dataset are based on World Health Organization (WHO) thresholds, categorizing cyanobacteria density into low, moderate, and high levels. However, users can adjust these thresholds as needed to suit specific research or monitoring goals. This dataset is a valuable tool for understanding the environmental and anthropogenic factors that contribute to cyanobacteria incidence and proliferation.
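To make the idea of adjustable severity thresholds concrete, here is a minimal sketch that bins a cell count into low, moderate, or high. The default cut-offs of 20,000 and 100,000 cells/mL are assumptions for illustration only; confirm the exact values used by CAML in the dataset documentation before relying on them.

```python
def classify_severity(cells_per_ml, moderate_threshold=20_000, high_threshold=100_000):
    """Bin a cyanobacteria cell count (cells/mL) into a severity level.

    The default thresholds are illustrative assumptions and can be
    overridden to suit a specific research or monitoring goal.
    """
    if cells_per_ml < 0:
        raise ValueError("cell count cannot be negative")
    if moderate_threshold >= high_threshold:
        raise ValueError("moderate_threshold must be below high_threshold")
    if cells_per_ml >= high_threshold:
        return "high"
    if cells_per_ml >= moderate_threshold:
        return "moderate"
    return "low"


# Made-up sample counts, not values from the CAML CSV.
sample_counts = [1_500, 45_000, 250_000]
print([classify_severity(count) for count in sample_counts])

# The same counts with stricter, user-chosen thresholds.
print([classify_severity(count, 1_000, 40_000) for count in sample_counts])
```

Keeping the thresholds as parameters means the same function can reproduce the default labels or a custom scheme without editing the data.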
Insiders Datasets
For those who are part of the Insiders Program of the Community Catalog, two new datasets have been added, and they are available while they continue to be updated and ingested into the catalog. Together they add large troves of data to the catalog.
gNATSGO (gridded National Soil Survey Geographic Database)
The gNATSGO (gridded National Soil Survey Geographic Database) database offers comprehensive coverage of the most detailed and accurate soils information available across the United States and Island Territories. This dataset, sourced specifically for raster data, is made accessible through the Planetary Computer STAC catalog due to the proprietary nature of the original data format. gNATSGO integrates soil data from three key sources: the Soil Survey Geographic Database (SSURGO), State Soil Geographic Database (STATSGO2), and Raster Soil Survey Databases (RSS), ensuring that users have access to the most reliable soil information available.
The gNATSGO database is primarily composed of SSURGO data, which represents over a century of field-validated, detailed soil mapping for more than 90 percent of the U.S. and Island Territories. STATSGO2 provides a general soil map covering the entire U.S., filling in areas not detailed by SSURGO. The next-generation RSSs, developed with advanced digital soil mapping methods, are also incorporated into gNATSGO, though their coverage is currently more limited. As the extent of RSS data expands, it will further enhance the gNATSGO database. Users can leverage the map unit values in the mukey raster asset by joining them with tables in the gNATSGO Tables Collection, with additional raster assets encoding commonly used values for ease of access.
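The join between the mukey raster and the attribute tables can be pictured as a lookup from each pixel's map unit key to a row in a table. The sketch below uses a tiny made-up grid and table; the mukey values, the field name `aws_cm`, and its numbers are all hypothetical and stand in for real raster data and the gNATSGO tables.

```python
# Hypothetical 2 x 3 mukey raster, represented as nested lists.
mukey_grid = [
    [100, 100, 205],
    [100, 310, 205],
]

# Hypothetical attribute table keyed by mukey (field name and values are made up).
attribute_table = {
    100: {"aws_cm": 12.5},
    205: {"aws_cm": 7.0},
}


def join_attribute(grid, table, field, nodata=None):
    """Replace each mukey in the grid with the chosen attribute value.

    Map units missing from the table, or rows lacking the field,
    are filled with the nodata value.
    """
    return [
        [table.get(mukey, {}).get(field, nodata) for mukey in row]
        for row in grid
    ]


aws_grid = join_attribute(mukey_grid, attribute_table, "aws_cm")
print(aws_grid)
```

With real data the same idea would typically be applied with an array library and a reclassification or remap step rather than nested lists.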
This dataset is being added in stages and the work is ongoing; so far, the Available Water Storage (AWS) and mukey layers have been made available.
Overture Maps Foundation Building Footprints
The Overture Maps Foundation's buildings dataset, part of the 2024-07-22.0 data release and version 1.0.0 of the schema, is now available for use. The release has reached General Availability (GA) for its base, buildings, divisions, and places themes, while the transportation theme remains in beta and may undergo further changes. At present, the dataset includes data only for the CONUS region, offering users access to detailed information about human-made structures across this area.
The buildings theme within the Overture Maps dataset provides a comprehensive description of structures with roofs or interior spaces that are permanently or semi-permanently located in one place, following the OSM building definition. This theme includes two main feature types: "Building," which represents the outer footprint or roofprint of a structure, and "Building Part," which details individual parts of a building. Each building is marked with a boolean attribute, "has_parts," indicating whether it has associated building parts. These building parts share properties with the main building and are linked to it via a unique building_id.
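The relationship between buildings and building parts can be sketched as a simple grouping by `building_id`. The records below are simplified, made-up examples rather than real Overture features, and the `id` field name is an assumption for this sketch; only `has_parts` and `building_id` come from the description above.

```python
from collections import defaultdict

# Simplified, hypothetical records (real features carry geometry and many more properties).
buildings = [
    {"id": "b1", "has_parts": True},
    {"id": "b2", "has_parts": False},
]
building_parts = [
    {"id": "p1", "building_id": "b1"},
    {"id": "p2", "building_id": "b1"},
]


def group_parts_by_building(buildings, parts):
    """Return a mapping of building id to the ids of its associated parts."""
    parts_by_building = defaultdict(list)
    for part in parts:
        parts_by_building[part["building_id"]].append(part["id"])
    return {b["id"]: parts_by_building.get(b["id"], []) for b in buildings}


def flags_are_consistent(buildings, grouped):
    """Check that has_parts is True exactly when parts were found for a building."""
    return all(b["has_parts"] == bool(grouped[b["id"]]) for b in buildings)


grouped = group_parts_by_building(buildings, building_parts)
print(grouped)
print(flags_are_consistent(buildings, grouped))
```

A check like `flags_are_consistent` can be a quick sanity test after creating a regional extract, where parts may be clipped away from their parent building.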
Network and Network Datasets
Last month I took a deep dive into some of the globally available network datasets covering internet connectivity, network speeds, and coverage. You can read that blog here, including how to create your own data extracts.
Ookla's Global Fixed Broadband, Mobile, and 5G Datasets
Ookla’s Open Data Initiative provides quarterly updated datasets on global fixed broadband and mobile network performance. The community catalog already contained the global fixed broadband and mobile (cellular) network datasets, and these have been updated to 2024 Q1. The recent addition of the Ookla 5G Map further enriches this resource, tracking 5G deployments across 241 global providers with nearly 150,000 features in the feature collection.
The Ookla 5G data was processed into a valid GeoJSON and further converted to an Earth Engine feature collection.
Measurement Lab Network Data Extracts (M-Lab)
M-Lab, the largest open-source internet measurement effort globally, provides a wealth of data through its Network Diagnostic Tool (NDT). This dataset offers valuable insights into real-world internet performance, reflecting user-initiated tests that capture download/upload speeds, latency, and packet loss during periods of network issues. This makes the NDT dataset an essential resource for analyzing internet health and understanding user experiences.
This is a very small sample extract of 15,000 download and upload records from a single day of data, 2024-06-01.
For more information on how this data was obtained and processed, see our earlier blog post. If you are like me, you will enjoy the adventure of creating your own extracts.
As the Community Catalog grows, I am excited to see the innovative ways in which researchers, practitioners, and curious minds will leverage these powerful datasets to unlock new discoveries and drive positive change.
Let's connect! Reach out on LinkedIn and GitHub to share dataset ideas, provide feedback, and join the conversation.
If you appreciate these efforts, please consider giving the GitHub repository a star ⭐️. This simple gesture helps increase the visibility of our work and spreads awareness about the community catalog.
As always, check the changelog for direct links and more updates. 💡 Help out by becoming a sponsor of the community catalog and join me in building a vibrant community 🌍
💡 Stay tuned for more updates as we continue to curate and refine this incredible resource!