AI Enhanced Search, Datasets & More in Community Catalog Release 3.0.0
Inspired by a busy September, release 3.0.0 of the GEE Community Catalog brings AI-driven geospatial search. Discover new datasets spanning crop yields to coastlines more easily than before.
This month the focus is again on feature improvements and additions. It has been a busy month, with travel to the Geo for Good Mini Summit in Dublin for talks, hackathons, and meeting amazing people; in my book, it was a September that lived up to expectations. But you are here for updates to the Awesome GEE Community Catalog, so let us dive right into those. For release 3.0.0 I am bringing a Large Language Model (LLM) to the catalog: think Gemini meets the Community Catalog. What makes this search different is that its search space is only the Community Catalog, so the results are grounded with links to the source.
The search is in beta. It came out of a discussion during the hackathon at Geo for Good, and I hope it continues to grow as I get more feedback and explore the art of the possible with generative models that support summarization and cite the sources behind what they generate. We will get to this in a bit, but let's start with the basics: how did we do in numbers?
We served over 607K requests to the Community Catalog over the last 30 days, and the numbers back up the catalog's continued growth. As always, you can find the latest stats, and the changelog gives you details on the new datasets and features we have added.
I know you might be busy playing with the new search but we also added a few datasets this release.
Generative AI Search for the Community Catalog
The Community Catalog has always had a built-in search from mkdocs-material, which works in static environments and is great for quickly getting to datasets. One of the ideas that came out of our hackathon last week was to create grounded generative AI searches, and I wanted to build one for the Community Catalog, which is why we have this now. You can head over to the search directly here: https://gee-community-catalog.org/search
Want to try something really cool? This search can summarize results in English, and it can also automatically detect the language a question or prompt was written in and produce the summary in that language. As I mentioned, this is less than a week old, so improvements will come as I get to experiment more. If a prompt doesn't return something, try to be more specific, or rewrite it and try again. I love the learning involved in asking the right questions, which goes hand in hand with getting the right response.
I will do a deeper dive on how I set this up, including a gentle introduction to retrieval-augmented generation (RAG), some of which you can find in my earlier post. For now, have fun searching and asking for results and summaries.
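To make the idea of a grounded search a little more concrete, here is a minimal sketch in Python. It is not how the catalog search is actually built; it only illustrates the general pattern of restricting retrieval to a fixed set of documents and passing the matches, with their links, to a model as the only allowed sources. The corpus entries, the example.org URLs, and the function names are all hypothetical placeholders, and the keyword-overlap scoring stands in for whatever retrieval method a real system would use.

```python
import re
from collections import Counter

# Hypothetical mini-corpus: titles, URLs, and text are placeholders,
# not real Community Catalog pages.
CATALOG = [
    {"title": "Crop yield maps", "url": "https://example.org/yield",
     "text": "30 m crop yield maps for corn soy and winter wheat"},
    {"title": "Forest roads", "url": "https://example.org/roads",
     "text": "monthly forest road maps from radar and optical imagery"},
    {"title": "Coastlines", "url": "https://example.org/coastlines",
     "text": "annual shorelines and rates of coastal change"},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query, docs, top_k=2):
    """Rank docs by simple token overlap with the query (toy scoring)."""
    query_counts = Counter(tokenize(query))
    scored = []
    for doc in docs:
        doc_counts = Counter(tokenize(doc["title"] + " " + doc["text"]))
        score = sum(min(query_counts[t], doc_counts[t]) for t in query_counts)
        if score > 0:
            scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def build_grounded_prompt(query, hits):
    """Build a prompt that limits the model to the retrieved sources."""
    sources = "\n".join(
        f"[{i}] {doc['title']} ({doc['url']}): {doc['text']}"
        for i, doc in enumerate(hits, start=1)
    )
    return (
        "Answer using only the sources below and cite them by number.\n"
        f"Sources:\n{sources}\n\nQuestion: {query}"
    )


hits = retrieve("coastal change shorelines", CATALOG)
prompt = build_grounded_prompt("coastal change shorelines", hits)
```

The prompt string would then go to whichever generative model you use. Because every source carries its URL, the summary can link back to where each statement came from.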
QDANN 30m Yield Map for Corn, Soy, and Winter Wheat in the U.S.
The QDANN 30m Yield Map for Corn, Soy, and Winter Wheat dataset introduces an advanced method for estimating crop yields at the subfield level using satellite imagery and machine learning. By leveraging the Quantile Loss Domain Adversarial Neural Network (QDANN) framework, the dataset addresses the challenges of obtaining fine-scale yield data. QDANN applies an unsupervised domain adaptation strategy, using labeled county-level data along with unlabeled subfield data to map crop yields accurately without relying on subfield-specific yield information. This breakthrough allows for more precise yield mapping in regions where ground truth data is limited, overcoming a key obstacle in crop yield estimation.
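The "quantile loss" in the QDANN name refers to a family of loss functions that penalize over- and under-prediction unequally. As a rough illustration, here is the generic quantile (pinball) loss in plain Python. This is the textbook definition, not the authors' training code, and the function name and the choice of tau below are assumptions for the example.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean quantile (pinball) loss for a quantile level tau in (0, 1).

    Under-predictions are weighted by tau and over-predictions by (1 - tau).
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must be strictly between 0 and 1")
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for observed, predicted in zip(y_true, y_pred):
        diff = observed - predicted
        total += max(tau * diff, (tau - 1.0) * diff)
    return total / len(y_true)


# With tau = 0.9, predicting too low costs more than predicting too high.
loss_under = pinball_loss([10.0], [8.0], tau=0.9)
loss_over = pinball_loss([10.0], [12.0], tau=0.9)
```

At tau = 0.5 the loss is half the mean absolute error, so the median is the special case where both directions are penalized equally.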
This dataset uses Landsat imagery and gridMET weather data, and is validated against yield monitor records from approximately one million field-year observations for maize, soybeans, and winter wheat. QDANN outperforms existing models, achieving R² scores of 48% for maize, 32% for soybean, and 39% for winter wheat at the subfield level. When aggregated to the county level, accuracy improves significantly, with R² scores reaching 78% for maize, 62% for soybean, and 53% for winter wheat. The dataset offers publicly available 30-meter resolution yield maps for key U.S. states since 2008, providing valuable insights for precision agriculture and yield prediction research.
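For readers wondering why the county-level scores are higher than the subfield ones, the toy example below may help. It assumes R² here means the usual coefficient of determination (the source does not spell out the exact definition) and uses made-up numbers, not values from the dataset. Pixel-level errors that point in opposite directions cancel out when predictions are averaged over a larger unit.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


def group_means(values, group_size):
    """Average consecutive chunks of group_size values (toy aggregation)."""
    return [
        sum(values[i:i + group_size]) / group_size
        for i in range(0, len(values), group_size)
    ]


# Made-up yields: two "counties" with two "subfields" each.
observed = [1.0, 3.0, 5.0, 7.0]
predicted = [3.0, 1.0, 7.0, 5.0]  # each subfield is off by 2, in opposite directions

subfield_r2 = r_squared(observed, predicted)
county_r2 = r_squared(group_means(observed, 2), group_means(predicted, 2))
```

In this contrived case the subfield R² is low while the aggregated R² is perfect. Real data will not cancel this cleanly, but the direction of the effect is the same.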
Congo Basin Forest Roads Dataset
The Congo Basin Forest Roads dataset offers detailed maps of road development within the tropical forests of the Congo Basin, using satellite data from Sentinel-1 and Sentinel-2 combined with deep learning techniques. This dataset provides openly available, up-to-date road maps that are critical for forest conservation, sustainable management, and guiding policy decisions. With road construction in these forests mainly driven by selective logging, the dataset sheds light on the extensive road networks that have emerged, particularly in remote areas, offering a vital tool to monitor logging activities and assess human impact on these sensitive ecosystems.
By integrating Sentinel-1 radar, which can penetrate clouds, and Sentinel-2 optical imagery, the dataset ensures precise monthly updates, even during the rainy season, making it a reliable source of information on narrow and often overlooked roads. Covering the six countries of the Congo Basin—Cameroon, Central African Republic, Democratic Republic of the Congo, Equatorial Guinea, Gabon, and Republic of the Congo—the data helps track road development since 2019. This valuable resource aids in understanding the ecological effects of road expansion and supports efforts to mitigate illegal activities and preserve the forest’s integrity.
Digital Earth Australia Coastlines
Digital Earth Australia Coastlines is a comprehensive dataset that tracks annual shorelines and rates of coastal change across Australia's entire coastline from 1988 to the present. By combining satellite data from Geoscience Australia's Digital Earth Australia program with tidal modeling, the dataset maps shoreline positions at mean sea level for each year. This allows for the monitoring of coastal retreat and growth at both local and continental scales, providing insights into long-term trends. With data updated regularly, scientists and policymakers can analyze how the coastline has evolved, comparing current rates of change to past decades, which is crucial for planning and managing the impacts of environmental factors on the coast.
In August 2024, the DEA Coastlines product was updated to version 2.2.0, which includes the addition of shoreline data for the year 2023. This update enables users to track the most recent shoreline changes and assess ongoing patterns of coastal erosion or growth. The dataset is a valuable tool for understanding whether these changes are driven by specific events or represent gradual shifts over time, aiding in decision-making and forecasting for future coastal management strategies.
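Once you have one shoreline position per year, a rate of coastal change can be estimated in several ways. The sketch below shows one common approach, an ordinary least-squares slope of shoreline distance against year. It is an illustration under that assumption, not a description of the DEA Coastlines processing chain, and the transect distances are invented for the example.

```python
def rate_of_change(years, distances):
    """Least-squares slope of distance (m) against year, in metres per year.

    A negative value means the distance shrinks over time (retreat along
    the transect); a positive value means it grows.
    """
    if len(years) != len(distances) or len(years) < 2:
        raise ValueError("need at least two matching year/distance pairs")
    n = len(years)
    mean_year = sum(years) / n
    mean_dist = sum(distances) / n
    s_xy = sum((y - mean_year) * (d - mean_dist) for y, d in zip(years, distances))
    s_xx = sum((y - mean_year) ** 2 for y in years)
    if s_xx == 0:
        raise ValueError("years must not all be identical")
    return s_xy / s_xx


# Invented distances from a fixed baseline to the annual shoreline.
years = [2019, 2020, 2021]
distances = [10.0, 8.0, 6.0]
rate = rate_of_change(years, distances)  # metres per year
```

Comparing the slope for recent years with the slope for the full record is one simple way to check whether a change looks like a gradual trend or a response to a specific event.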
Curious about what's coming next? Our changelog is your window into the ever-evolving world of the GEE Community Catalog. It's where you'll find all the latest updates, from new datasets to exciting features, complete with direct links to explore further. And if you're feeling inspired to take your support to the next level, think about becoming a sponsor. If you’re enjoying the project, consider giving the GitHub repository a star⭐️—it’s a small but impactful way to support the growth and visibility of our community catalog.
Feel free to reach out on LinkedIn and GitHub to share your ideas, offer feedback, or dive into the discussion.🌍✨