Data Commons and Connectivity: Exploring Global Data and Tools That Measure Internet Access
This week’s double feature: discover SpeedCheck 🛠️, a new tool for internet speed testing 📈, and explore two new global datasets 🌍 from Ookla and Measurement Lab in the GEE Community Catalog 📊
Like millions around the world 🌍, I recently found myself unexpectedly disconnected from the internet. My ISP faced issues allocating IP addresses to modems, and I spent an evening creatively cloning my MAC address, trying to convince my modem it was a computer 💻. While this was resolved in what felt like an eternity (approximately 12 hours ⏳), I found myself curious to learn more about networks. However, researching without an internet connection proved challenging.
In this blog post, I'll share my journey with a deep dive into network connectivity and internet speed. I start with a new tool I've developed called SpeedCheck, which streamlines internet speed testing in your terminal, and then delve into two new global network connectivity datasets in the community catalog, from Ookla and Measurement Lab.
Understanding connectivity as a utility that is not equitably distributed allows us to grasp how access varies significantly across regions. While some areas enjoy high-speed internet as a given, others struggle with unreliable connections or lack access altogether. This disparity affects everything from education and work opportunities to healthcare and social connectivity. By examining global connectivity data, we can identify these patterns and work towards more equitable internet access for everyone.
Building tools of the trade 🛠️
With my internet restored, next on my agenda was understanding internet speed. Measuring internet speed seems simple: data transferred divided by the time it took should give the speed, right? However, there's no single consistent way to measure it. Services like Ookla's Speedtest.net, Measurement Lab (M-Lab), and Netflix's Fast.com all provide different results. These tools measure bandwidth to determine what data stream you can handle, but each has its nuances.
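As a back-of-the-envelope illustration of that naive definition (not how any of these services actually measure), the arithmetic looks like this in Python; the byte count and duration below are made-up numbers:

```python
# Naive throughput: bits transferred divided by elapsed time.
# The numbers below are made up purely for illustration.
bytes_transferred = 250_000_000   # 250 MB moved during the test
elapsed_seconds = 16.4            # how long the transfer took

throughput_mbps = bytes_transferred * 8 / elapsed_seconds / 1_000_000
print(f"Naive download speed: {throughput_mbps:.1f} Mbps")  # ~122 Mbps
```

Real services refine this idea with multiple parallel streams, warm-up periods that get discarded, and nearby test servers, which is part of why their numbers rarely agree.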
While browser-based speed tests are readily available, running tests directly from the command line offers a cleaner approach, avoiding potential interference from open tabs. On top of that, getting accurate internet speeds is challenging; influencing factors range from the connection type (wireless or wired) to the number of hops from your modem, just to name a few.
To address my need for a unified internet speed measurement tool, I developed SpeedCheck, a Python-based command line tool designed to simplify internet speed measurement by consolidating results from multiple providers. SpeedCheck currently supports testing across services including Ookla, Fast.com, Cloudflare Speedtest, M-Lab, Open Speedtest, and Speed Smart.
You can install it today from PyPI, where you will also find additional instructions, by simply typing:
pip install speedcheck
playwright install # For Playwright to get set up properly
playwright install-deps # May be necessary on some systems
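Playwright shows up as a dependency because some providers only expose their results inside a browser page. As a rough sketch of that pattern (this is not SpeedCheck's actual code, and the element IDs are assumptions about Fast.com's page structure), a headless-browser test could look like this:

```python
# A minimal headless-browser speed test sketch; not SpeedCheck's implementation.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://fast.com")
    # The selectors below are assumptions about Fast.com's page and may need adjusting.
    page.wait_for_selector("#speed-value.succeeded", timeout=120_000)
    speed = page.text_content("#speed-value")
    units = page.text_content("#speed-units")
    print(f"Download speed: {speed} {units}")
    browser.close()
```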
You can also list the available run types from the command line using the tool's built-in help.
Finally, the moment of truth: running some tests and getting results. I am not wired directly to the modem, and I have quite a few other devices connected to the internet, so these numbers are not bad and will change depending on the many factors we discussed earlier.
Global Connectivity Patterns: Ookla and M-Lab Datasets 🌍📊
Understanding internet speed is just one piece of the puzzle. To gain deeper insights into global connectivity, we turn to two valuable resources: Ookla and Measurement Lab. The existing Ookla dataset in the community catalog received an update, and two new datasets were added alongside it.
1. Ookla's Global Fixed Broadband, Mobile, and 5G Datasets
Ookla's Open Data Initiative provides quarterly updated datasets on global fixed broadband and mobile network performance. The community catalog already contained the global fixed broadband and mobile (cellular) network dataset, which has now been updated to 2024 Q1. The recent addition of the Ookla 5G Map further enriches this resource, tracking 5G deployments across 241 global providers with nearly 150,000 features in the feature collection.
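If you want to explore the underlying performance tiles yourself, Ookla publishes them as quarterly Parquet files through its open data program on AWS. Here is a minimal sketch with pandas; the S3 path layout and column names reflect my understanding of the open data bucket, so treat them as assumptions and verify against the current release:

```python
import pandas as pd

# Assumed layout of Ookla's open-data bucket; verify the key for the quarter
# you need. Reading directly from S3 requires s3fs to be installed.
url = (
    "s3://ookla-open-data/parquet/performance/type=fixed/"
    "year=2024/quarter=1/2024-01-01_performance_fixed_tiles.parquet"
)

tiles = pd.read_parquet(url, storage_options={"anon": True})
# avg_d_kbps / avg_u_kbps are average download/upload speeds per tile in kbps.
print(tiles[["quadkey", "avg_d_kbps", "avg_u_kbps", "tests"]].head())
```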
The Ookla 5G data was processed into a valid GeoJSON file and then converted to an Earth Engine feature collection.
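To give a sense of what that conversion step looks like, here is a minimal sketch using the Earth Engine Python API on a small sample of features. The file name is illustrative, and a collection of this size would in practice be uploaded as an Earth Engine asset rather than built client-side:

```python
import json
import ee

ee.Initialize()

# Illustrative file name; the full 5G map (~150,000 features) would normally be
# uploaded as an Earth Engine asset rather than constructed client-side.
with open("ookla_5g_map_sample.geojson") as f:
    gj = json.load(f)

features = [
    ee.Feature(ee.Geometry(feat["geometry"]), feat.get("properties", {}))
    for feat in gj["features"]
]
fc = ee.FeatureCollection(features)
print("Feature count:", fc.size().getInfo())
```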
2. Measurement Lab's Network Diagnostic Tool (NDT) Dataset
As I built the SpeedCheck tool, I realized it would be great to look at how the rest of the world runs its own speed tests, this time from a more open source effort whose data can be queried in multiple ways. It turns out that Measurement Lab (M-Lab), one of the speed test providers, is also the largest open source internet measurement effort in the world. I focused on the M-Lab Network Diagnostic Tool (NDT), which is what runs when you use their speed test tool. This became the second network dataset I added to the community catalog.
For those who are curious, NDT (Network Diagnostic Tool) is a single-stream performance measurement of a connection's capacity for "bulk transport". It reports upload and download speeds along with latency metrics, and it is what the speed test at M-Lab actually runs.
I experimented with this data, fetching raw records from Google Cloud Storage buckets. A single day's worth of compressed data exceeded 30 GB, so I focused on a sample.
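The raw archives can be pulled with the Google Cloud Storage client. Below is a sketch of grabbing a slice of one day's NDT data; the bucket name and prefix reflect my understanding of M-Lab's public archive layout, so treat them as assumptions and confirm against the M-Lab documentation:

```python
from google.cloud import storage

# Assumed public archive bucket and an NDT date prefix; confirm both against
# the M-Lab documentation before relying on them.
BUCKET = "archive-measurement-lab"
PREFIX = "ndt/ndt7/2024/03/01/"

client = storage.Client.create_anonymous_client()

# List a handful of the day's compressed archives and download them locally.
for blob in client.list_blobs(BUCKET, prefix=PREFIX, max_results=5):
    print(blob.name, f"{blob.size / 1e6:.1f} MB")
    blob.download_to_filename(blob.name.replace("/", "_"))
```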
The runs contained IP addresses but no location data, so I coded up something to fetch latitude, longitude, city, country, and ISP information, which could then be used to enrich the extract and convert it into a geospatial table. There are obvious limitations to estimating location from an IP address, and I will leave it to you to Google those.
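A minimal sketch of that kind of enrichment, using the free GeoLite2 databases via the geoip2 package together with GeoPandas (one of several possible approaches, not necessarily the one behind the catalog dataset), might look like this:

```python
import geoip2.database
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Illustrative IP enrichment with the GeoLite2 City and ASN databases;
# city-level IP geolocation carries real accuracy caveats.
city_reader = geoip2.database.Reader("GeoLite2-City.mmdb")
asn_reader = geoip2.database.Reader("GeoLite2-ASN.mmdb")

def locate(ip: str) -> dict:
    city = city_reader.city(ip)
    asn = asn_reader.asn(ip)
    return {
        "ip": ip,
        "lat": city.location.latitude,
        "lon": city.location.longitude,
        "city": city.city.name,
        "country": city.country.iso_code,
        "isp": asn.autonomous_system_organization,
    }

# In practice the IPs would come from the parsed NDT measurement records.
ips = ["8.8.8.8", "1.1.1.1"]
df = pd.DataFrame([locate(ip) for ip in ips])
gdf = gpd.GeoDataFrame(
    df, geometry=[Point(lon, lat) for lon, lat in zip(df.lon, df.lat)], crs="EPSG:4326"
)
print(gdf.head())
```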
Our interconnected world still faces significant digital divides 🌍, and internet access, often taken for granted, varies greatly. These datasets not only highlight these challenges but also illustrate changes over the years 📅.
What’s going to be your rabbit hole of the week? 🕳️ Reach out on LinkedIn and GitHub to share dataset ideas, provide feedback, and join the conversation as you explore all of these datasets, now part of the community catalog. Enjoy the simple CLI to run all your speed tests in one place.
As always, your involvement is crucial for keeping this project thriving. If you love what we're doing, why not give our GitHub repository a star ⭐️ to help spread the word and boost visibility?
Don't forget to check the changelog for direct links and more updates 💡. Much more to come, stay tuned 🚀