Getting things done .. or not.
Have you ever had a day when nothing you started actually got anywhere? I've had a fortnight like that. Several weeks ago I wrote a couple of articles about emergency communications and its tenuous relationship with our hobby. As a result I managed to get a week ahead of myself and started using that week to do some long overdue analysis of the WSPR, or Weak Signal Propagation Reporter, data set. I've started this process several times and I finally had a whole fortnight to come to grips with 6.7 billion rows of data. Spoiler alert, it hasn't happened yet.
The data contains a record of every reception report uploaded to WSPRnet.org since Tuesday 11 March 2008 at 22:02 UTC. It's published as compressed comma-separated value text files and, after weeks of wrangling, I previously managed to convert each one into an sqlite3 database. This wrangling was required because some amateurs used commas in their callsigns or grid squares, or backslashes, or both, and SQLite import isn't smart enough to deal with this. After doing this conversion, I could actually query 191 different databases. I could collect the results and three weeks later I'd have an answer, just in time to download the next month of data.
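To illustrate the kind of wrangling described above, here is a minimal sketch that separates rows with the expected number of fields from rows that picked up a stray comma. It is not the actual conversion process. The function name `split_rows`, the four-column layout and the sample rows are all invented for illustration, and the real files have a different layout.

```python
import csv
import io

def split_rows(lines, expected_fields):
    """Separate rows with the expected field count from rows needing repair."""
    clean, suspect = [], []
    # QUOTE_NONE treats quote characters as ordinary data; with no escapechar
    # set, a backslash is also kept as a literal character.
    reader = csv.reader(lines, quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) == expected_fields:
            clean.append(row)
        else:
            suspect.append(row)
    return clean, suspect

# Hypothetical sample data: id, callsign, grid, power (layout is an assumption).
sample = io.StringIO(
    "1,VK6FLAB,OF78,23\n"
    "2,VK,6XYZ,OF78,23\n"   # stray comma inside the callsign
    "3,VK6\\ABC,OF78,23\n"  # backslash survives as literal data
)
clean, suspect = split_rows(sample, expected_fields=4)
print(len(clean), "clean rows,", len(suspect), "suspect rows")
```

Setting the suspect rows aside, rather than guessing where the extra comma belongs, keeps the original text available for a later manual or rule-based repair.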
Garth VK2TTY suggested that I look into parquet as an alternative. No joke, This Changed My Life. I managed to convert all the compressed CSV files to parquet, a process that took a day, rather than a week with SQLite, and then I could start playing. If you're going to do this yourself, make sure you have a big empty hard disk. After a few false starts, the report that previously took three weeks returned in three hours. If we're getting technical, since I know this will make at least somebody laugh, the parquet files are stored on a USB drive connected to an iMac that has the directory mounted via sshfs to a virtual Linux desktop machine that's running the duckdb binary inside a Docker container running on a different virtual Docker machine. If you're keeping track, the data travels across USB via two SSHFS mounts to duckdb and it still only takes three hours. So, impressed doesn't even begin to describe my elation. If you're asking "why?" - the answer is that I don't run untrusted binary executables on my host machine.
This allowed me to start doing what-if queries, which is when I discovered a fun issue. A chart I generated with minimum, average and maximum power levels over time showed that at least one station was claiming to transmit with 103 dBm, or about 20 megawatts. For context, that's multiple times the power of HAARP, the High-frequency Active Auroral Research Program, which in 2012 was the most powerful shortwave station using "only" 95.5 dBm, or 3,600 kilowatts, and only 2 dB shy of the 105 dBm or 32 megawatts used by AN/FPS-85, part of the US Space Force's Space Surveillance Network, able to track a basketball-sized object 41,000 km from Earth.
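The power comparisons above follow from the definition of dBm as decibels relative to one milliwatt. As a quick check of the arithmetic, here is a small sketch; the helper names `dbm_to_watts` and `watts_to_dbm` are made up for this example.

```python
import math

def dbm_to_watts(dbm):
    """Convert a power level in dBm (dB relative to 1 mW) to watts."""
    return 10 ** (dbm / 10) / 1000

def watts_to_dbm(watts):
    """Convert a power in watts to dBm."""
    return 10 * math.log10(watts * 1000)

# Power levels mentioned in the text.
for label, dbm in [("claimed WSPR report", 103), ("HAARP", 95.5), ("AN/FPS-85", 105)]:
    print(f"{label}: {dbm} dBm is roughly {dbm_to_watts(dbm) / 1e6:.1f} MW")

# A difference between two dBm values is a ratio, expressed in dB.
ratio = dbm_to_watts(103) / dbm_to_watts(95.5)
print(f"103 dBm is about {ratio:.1f} times 95.5 dBm")
```

Because the scale is logarithmic, every 10 dB is a factor of ten in power, which is why a gap of a few dB between these stations still amounts to megawatts.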
In other words, 103 dBm is less of a whisper and more of a roar. Funnily enough, not every receiver on the planet reported these transmissions, but more than one did, so the issue is at the transmitter. Unfortunately, when I started looking for reports using more than 60 dBm, or one kilowatt, there were plenty to choose from, over 18 thousand. While that's less than 0.0003% of the data set, it made me wonder how much of the data is dirty and what I should do about it.
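As a rough illustration of that check, the sketch below counts reports above a power limit and works out their share of the total. It is not the query that was actually run against the parquet files. The function names and the sample power values are invented, and the 18,000 and 6.7 billion figures are the rounded numbers quoted above.

```python
def count_over_limit(powers_dbm, limit_dbm=60):
    """Count reported power levels strictly above the given limit."""
    return sum(1 for p in powers_dbm if p > limit_dbm)

def share_percent(part, total):
    """Express part as a percentage of total."""
    return 100 * part / total

# Hypothetical handful of reported power levels in dBm.
sample_powers = [23, 37, 30, 103, 20, 63, 60]
print(count_over_limit(sample_powers), "of", len(sample_powers), "exceed 60 dBm")

# Rounded figures from the text: about 18 thousand out of 6.7 billion rows.
print(f"{share_percent(18_000, 6_700_000_000):.6f}% of all reports")
```

Counting the outliers is the easy part; as the rest of the episode points out, deciding whether to drop, keep or correct them is the real question.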
There are other examples of dirty data. My beacon has been reported on 24 MHz, which is odd, since my licence conditions do not permit me to use that band. Odder still is that several other beacons, normally on 28 MHz like me, were also reported on 24 MHz by the same station. How often does that happen?
I've previously reported the missing data from the hybrid solar eclipse in 2023: just under two hours and 12 minutes before the eclipse and the 38 minutes following it were missing. I've not yet checked to see if it has magically reappeared.
Then there are the faulty decodes. I've talked about this before. Different WSPR versions are better or worse at decoding and the point at which decoding breaks down varies. In other words, some decoded data is inevitably wrong.
I have previously charted activated grid squares. Apparently, all of Earth, yes, all of it, has at one time or another been used as either a transmission or a reception site. That includes Point Nemo, the top of Mount Everest, all of the Arctic and Antarctic and plenty more out of the way places, like say the Surveyor Generals Corner located in the Ngaanyatjarraku shire - look it up. Interesting patterns emerge when you split activations down per band. It's not clear if those are decoding artefacts or man-made claims.
I've asked the HamSci community for guidance, since dropping incorrect data on the floor doesn't seem to be the right way to go about things, and whilst correcting data seems obvious, what do you change it to and how do you know what's correct?
So, no progress to show for two weeks of work and barely enough to whet your appetite to get on air and make some noise.
Some days are like that.
I'm Onno VK6FLAB