Measure Twice, Compress Once: Tales of Compression in the Community Catalog
Lost in compression? Learn from the lessons and checks I perform on raster images that feed the GEE Community Catalog. With over 200GB added last week, what better time to share some pro tips and tricks?
The Google Earth Engine (GEE) Community Catalog recently welcomed several massive datasets, totaling over 200GB. These additions include high-resolution imagery from Urban Sky and Wyvern hyperspectral data, along with the Global Annual Simulated NPP-VIIRS Nighttime Light Dataset (1992-2023), all part of release 3.2.0.
A few of these offered interesting takeaways from the point of view of image compression, relevant to anyone bringing their own data into such platforms or hosting it elsewhere.
Satellite Data: When Compression Calls
The need to efficiently transmit and store images is a challenge that has driven innovation for over a century. For satellite imagery, where data volumes are particularly massive, efficient compression is absolutely critical, whether that means downlinking images to ground stations in a matter of seconds or streaming them to your phone or laptop. Image compression saves space and time, and provides a better experience.
In the context of satellite imagery, lossless compression methods also play a vital role. For applications where absolute data fidelity is crucial, such as scientific research or change detection studies, algorithms like LZW and DEFLATE are essential. These methods ensure that no information is lost during compression, allowing for accurate and reliable analysis. The challenge has always been to balance the need for high compression ratios with the requirements for data fidelity, processing speed, and compatibility with existing systems. As the volume of satellite data continues to grow exponentially, efficient image compression will remain a critical technology.
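The lossless guarantee is easy to see in code. A minimal sketch with Python's standard `zlib` module (DEFLATE, the same compression family offered by GeoTIFF) on synthetic pixel values, not actual satellite data:

```python
import zlib

# Synthetic "band" of pixel values; real imagery would come from a raster file.
pixels = bytes(range(256)) * 1000  # 256,000 bytes of sample data

compressed = zlib.compress(pixels, level=9)  # DEFLATE at maximum compression
restored = zlib.decompress(compressed)

# Lossless: every byte survives the round trip.
assert restored == pixels
print(f"original: {len(pixels)} bytes, compressed: {len(compressed)} bytes")
```

The round trip is exact by construction, which is precisely what change-detection work depends on.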
Measuring the Uncompressed: Global Annual Simulated NPP-VIIRS Nighttime Light Dataset
Uncompressed imagery can get really large really quickly, and not only does it need more storage space, it is unruly to consume and requires end users to spend compute cycles compressing the imagery on their end. As we move toward cloud-native workflows, formats like Cloud-Optimized GeoTIFF (COG) are transforming data access. COGs enable efficient reading of only the necessary portions of a file, leading to faster and more cost-effective usage in cloud environments while still allowing compression to be applied to the imagery.
To start, let's talk about the Global Annual Simulated NPP-VIIRS Nighttime Light Dataset (1992-2023). A simple look at the files shows that the total size on disk was about 369GB uncompressed, which, after conversion to COGs with LZW compression, came down to about 8.5GB.
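Back of the envelope, that works out to roughly a 43:1 compression ratio, or about 98% space savings. A quick sketch of the arithmetic:

```python
uncompressed_gb = 369.0
compressed_gb = 8.5

ratio = uncompressed_gb / compressed_gb        # how many times smaller
savings = 1 - compressed_gb / uncompressed_gb  # fraction of space saved

print(f"compression ratio: {ratio:.1f}:1")   # 43.4:1
print(f"space savings: {savings:.1%}")       # 97.7%
```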
So here's a thought: before uploading your data to storage, look at the image metadata. Even a simple
```
gdalinfo "path to your file"
```
should provide a look into your geospatial data and yield some really useful information on what you need to do to optimize it, or whether it is already optimized for performance. If the files are uncompressed, you can easily compress them using GDAL or other tools like rasterio, which wraps GDAL and has optimization built in. There are a lot of great blogs on doing this; here is a simple one, and a good read here too.
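As one way to do this, here is a minimal sketch that builds a `gdal_translate` invocation producing an LZW-compressed COG. The filenames are placeholders, and the command only runs if GDAL is actually installed and the input exists:

```python
import os
import shutil
import subprocess

def cog_command(src, dst):
    """Build a gdal_translate invocation that writes an LZW-compressed COG."""
    return [
        "gdal_translate", src, dst,
        "-of", "COG",           # Cloud-Optimized GeoTIFF driver (GDAL >= 3.1)
        "-co", "COMPRESS=LZW",  # lossless LZW compression
    ]

cmd = cog_command("input.tif", "output_cog.tif")
print(" ".join(cmd))

# Only run when GDAL is on PATH and the placeholder input actually exists.
if shutil.which("gdal_translate") and os.path.exists("input.tif"):
    subprocess.run(cmd, check=True)
```

The same creation options can of course be passed straight to `gdal_translate` on the command line; the Python wrapper just makes the choice of options explicit and scriptable.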
I am also creating an open source collection of some of the tools I use or have built so you can do this on your own. Look for a future blog on that one, but if you are curious you can always try it right now here.
Lossy and Lossless: The Urban Sky RGB
So what does the world look like at 10cm? Urban Sky recently provided open and freely available high-resolution aerial and thermal imagery captured from stratospheric Microballoons over the LA fires. The dataset includes RGB imagery at 10cm resolution, with coverage focused on wildfire events and urban areas. The image covers a portion of Los Angeles (118°44'W to 118°21'W, 34°13'N to 33°57'N) and boasts a resolution of 352,256 x 294,912 pixels, compressed using lossy JPEG with YCbCr color encoding.
The YCbCr color encoding scheme is particularly well-suited for high-resolution imagery like 10cm RGB data. This is because YCbCr leverages the human eye's higher sensitivity to brightness variations compared to color variations. By maintaining high-quality brightness information (Y component) while applying stronger compression to the color components (Cb and Cr), YCbCr JPEG achieves significant file size reductions while preserving overall visual quality. This makes YCbCr JPEG ideal for a wide range of applications where minor color variations aren't critical, such as visual analysis, mapping, web service delivery, and large-scale urban monitoring.
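To make the split between brightness and color concrete, here is the standard full-range BT.601 RGB-to-YCbCr matrix that JFIF/JPEG uses, as a small sketch. Note how a neutral gray carries no chroma signal at all, which is why the Cb/Cr planes compress (and subsample) so well:

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range RGB -> YCbCr using the BT.601 matrix used by JFIF/JPEG."""
    y  = 0.299 * r + 0.587 * g + 0.114 * b            # luma: kept at high quality
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b  # blue-difference chroma
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b  # red-difference chroma
    return y, cb, cr

# Neutral gray: all the information lands in Y; Cb and Cr sit at the
# 128 midpoint, so aggressive chroma subsampling (e.g. 4:2:0) costs little.
print(rgb_to_ycbcr(128, 128, 128))
```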
Transcoding Trap!
Earth Engine does not store ingested images with lossy compression. This probably stems from the system's philosophy that very little should be done to the original pixels, and a lossy codec does not preserve information. The result is that Earth Engine recodes/transcodes the lossy-compressed image into a lossless one.
The key pitfall: never transcode between lossy and lossless formats! Those JPEG compression artifacts become baked into the lossless version as actual pixel data.
Not to mention that the size of these images after transcoding from lossy to lossless is much larger; in this case, in Earth Engine, we move from 11GB to 97GB in total size.
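The trap is easy to demonstrate with a toy model: simulate the "lossy" step as coarse quantization (a stand-in for JPEG's quantization, not actual JPEG), then store the result losslessly. The lossless step faithfully preserves the already-damaged pixels, and the original values are gone for good:

```python
import zlib

original = bytes(range(256))  # pretend these are pixel values

# "Lossy" step: quantize to multiples of 16, a crude stand-in for JPEG.
quantized = bytes((v // 16) * 16 for v in original)

# "Lossless" step: DEFLATE round-trip of the lossy result.
restored = zlib.decompress(zlib.compress(quantized))

assert restored == quantized  # lossless faithfully stores...
assert restored != original   # ...the artifacts, not the original pixels
```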
Convert to lossless only when you have uncompressed source data and lossless is truly needed downstream. Avoiding lossy-to-lossless transcoding preserves bandwidth and reduces the overall storage required to save these images.
The way to keep the lossy compression while working with GEE is to convert the files to COGs (unless they already are), then register them or load them directly into Google Earth Engine using
```javascript
var image = ee.Image.loadGeoTIFF(uri);
```
You can now also register them as collections too. Here is a gotcha that may not be obvious: as the description suggests, the buckets have to be in specific regions, or else you will get this error:
```
Layer 1: Layer error: Image.loadGeoTIFF: Not allowed to read from bucket 'catalog-datasets' that is: not located in the US multi-region, not located in a dual-region including the US-CENTRAL1 region, not located in the US-CENTRAL1 region, or not accessible (check the storage.buckets.get permission).
```
Geospatial Data Compression Checklist
Pre-Compression Analysis
Run `gdalinfo` on your files to examine image metadata
Assess current storage size and compression status
Determine if data is already optimized for performance
Review the intended use case (scientific analysis, visual display, etc.)
Check if source data is uncompressed or already compressed
Compression Strategy Selection
Data Fidelity Requirements
Scientific research or change detection: Choose lossless (LZW, DEFLATE)
Visual analysis or web delivery: Consider lossy (JPEG, YCbCr)
Mixed use cases: Evaluate hybrid approaches or maintain multiple versions
Format Compatibility
Verify software ecosystem support
Consider Cloud-Optimized GeoTIFF (COG) for cloud environments
Check regional requirements for cloud storage (e.g., GEE bucket location requirements)
Performance Considerations
Balance encoding/decoding speed vs. file size and evaluate bandwidth constraints
Assess storage capacity limitations and consider processing overhead at the user end
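The fidelity branch of the checklist above can be sketched as a tiny decision helper. The category strings are my own and purely illustrative:

```python
def recommend_compression(use_case):
    """Map an intended use case to a compression family, per the checklist."""
    lossless_uses = {"scientific research", "change detection"}
    lossy_uses = {"visual analysis", "web delivery"}
    if use_case in lossless_uses:
        return "lossless (LZW, DEFLATE)"   # full data fidelity required
    if use_case in lossy_uses:
        return "lossy (JPEG, YCbCr)"       # visual quality is what matters
    return "evaluate hybrid approaches or maintain multiple versions"

print(recommend_compression("change detection"))
```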
Key practices to embrace
The journey to effective geospatial data compression begins with understanding fundamental best practices that can make or break your data management strategy.
Conduct thorough testing of compression options on representative sample datasets
Maintain detailed documentation of compression decisions and their rationale
Regularly validate compressed data against original sources
Implement version control for compression configurations
There are always lessons, small and big, to consider as I download, process, and clean datasets for the Awesome GEE Community Catalog. Best practices and simple checks let you save precious time and optimize your workflow while remaining platform agnostic in the choices you make. Until next time, as we keep adding more datasets.
Ready to dive deeper? The changelog tracks every exciting addition to the GEE Community Catalog – from massive datasets to game-changing features. Take a peek and explore what catches your eye! Support our mission by sponsoring the project or dropping a ⭐️ on GitHub. These small gestures make a big impact in growing our community and my efforts in continuing to build the catalog.
Got thoughts to share? Find me on LinkedIn and GitHub and send me a direct message, tell me about your data story. 🌍✨