Well, that ain’t good

Further to Scraping a third party site:

Woke up this morning to an email from Linode saying my Linode server has been running at 200% CPU. Logged in, and sure enough CPU is high, and there are a bazillion “firefox-esr” processes running. Did a killall on them, and the CPU immediately dropped to reasonable numbers. There was still a “geckodriver” process running, and when I killed that it went into zombie state instead of going away. I had to do an /etc/init.d/apache2 reload to make that one go away.

Did some tests, both from the nightly “scrape the whole damn site” cron job and from the web site’s “scrape one flight plan when the user clicks on that flight plan” feature, and I’m not currently seeing any orphaned firefox-esr or geckodriver processes. So it appears that when I scrape, I correctly close the WebDriver connection, which correctly stops the firefox and geckodriver processes.
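The general pattern for making sure the connection gets closed even when a scrape dies part way through can be sketched in Python like this. It is only a sketch: managed_driver and make_driver are names made up for this example, not part of Selenium and not necessarily how the scraper is actually structured.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(make_driver):
    """Create a driver via make_driver() and always quit() it afterwards.

    make_driver is any zero-argument callable that returns an object with
    a quit() method, for example: lambda: webdriver.Firefox(options=options)
    """
    driver = make_driver()
    try:
        yield driver
    finally:
        # quit() ends the whole WebDriver session; close() would only
        # close the current browser window.
        driver.quit()
```

Used as "with managed_driver(...) as driver:", an exception in the middle of a scrape still ends the session, and ending the session is what lets Firefox and geckodriver exit.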

So I guess I need to keep an eye on this and see if I can figure out what the client is doing to make the website fail to close the WebDriver connection. Or maybe I left some turdlets around on the system when I was doing testing? I don’t know.
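One way to keep an eye on it would be a small check, run from cron, that lists any firefox-esr or geckodriver processes that have been alive far longer than a scrape should take. A rough sketch, assuming a Linux ps from procps that supports the etimes (elapsed seconds) column; the one hour cutoff and the function names are made up for illustration.

```python
import subprocess

WATCHED = ("firefox-esr", "geckodriver")


def find_stale(ps_output, names=WATCHED, min_age_seconds=3600):
    """Return (pid, age_seconds, command) for watched processes past the cutoff.

    ps_output is text with one "PID ELAPSED_SECONDS COMMAND" row per line.
    """
    stale = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) != 3 or not parts[0].isdigit() or not parts[1].isdigit():
            continue  # skip header rows and anything unparseable
        pid, age, command = int(parts[0]), int(parts[1]), parts[2].strip()
        if command in names and age >= min_age_seconds:
            stale.append((pid, age, command))
    return stale


def current_ps_output():
    """Ask ps for pid, elapsed seconds and command name, with no header row."""
    result = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "etimes=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling find_stale(current_ps_output()) from the cron job and emailing anything it returns would at least show when the leftovers appear, which might help match them up with what the client was doing at the time.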

Scraping a third party site

For my sins, I wrote a website for a friend’s company that relies on scraping information off another company’s website. The company I’m doing this for does have a paid account on the third party’s website so there’s nothing ethically dubious going on here – I’m basically taking off information my clients had put into the third party site.

I couldn’t figure out the third party site’s authentication system, so instead of pulling in a page and parsing it using BeautifulSoup, I use Selenium to attach to it like a web browser.

The third party site, however, is utterly terribly written. It’s full of tables nested within tables, and missing closing tags and everything else that reminds you of the old “FrontPage” designed sites that only worked on IE. They don’t consistently use ids or names or anything else to help me find the right bits of data, so I’ve had to wing it and parse things out using regular expressions all over the place. But worse is that every now and then they change things around a bit in a way that breaks my scraping.
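To give a flavour of what that regular expression parsing looks like, here is a minimal sketch in Python. The HTML fragment, the field names and the values are all invented for illustration (this is not the real site's markup), and the pattern is just one way to cope with closing tags that may or may not be there.

```python
import re

# Hypothetical fragment in the style described: nested tables, no ids,
# and closing tags that are sometimes missing.
SAMPLE = """
<table><tr><td><table>
<tr><td><b>Departure:</b><td>AAA
<tr><td><b>Destination:</b></td><td> BBB </td></tr>
</table></table>
"""

# A label inside <b>...</b> ending in a colon, an optional </td>, then the
# text of the next cell up to whatever tag (or end of input) comes next.
FIELD = re.compile(
    r"<b>\s*([^<:]+):\s*</b>\s*(?:</td>\s*)?<td[^>]*>\s*([^<]*?)\s*(?=<|$)",
    re.IGNORECASE,
)


def extract_fields(html):
    """Return a dict mapping each label to the text of the cell after it."""
    return {label.strip(): value for label, value in FIELD.findall(html)}
```

Keying on the visible label text rather than on the table structure means a missing closing tag doesn't matter, but it also means the pattern breaks as soon as the label wording changes, which is pretty much the failure mode described above.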

In the past I scraped using the Selenium “standalone” jar, attaching to a Java process that pretends to be a web browser without actually being a browser. Which is important, because I run the scraping process on my web server, which is a headless Linode, and like most web servers doesn’t even have X11 running on it. (Some components of X11 got installed on it a while back because something needed something that needed something that needed fonts, and voila, suddenly I’ve got bits of X11 installed.)

This method has worked great for several years – when I’m debugging at home I use the version of the Selenium webdriver that fires up a Chrome or a Firefox instance and scrapes, but then when it’s working fine I switch over to the version that connects to a java -jar selenium-standalone.jar process. I don’t know what the official term is, so I’m just going to call it “the selenium process”.

A couple of years ago they (the third party) made a change to their website that caused the selenium process to die with JavaScript errors. Like I said, their website is crap, and I shouldn’t be surprised it has crappy JavaScript. Fortunately at the same time they introduced these JavaScript errors, they put a big “submit” button on the page that would go to the selected page even if you disabled JavaScript, and so that’s what I did with my scraper back then.

Flash forward to now, and they’ve changed the site again. They still have the broken JavaScript that freaks out the selenium process, but now you can’t navigate the site if you turn off JavaScript. So I tried turning on JavaScript in my scraper and the selenium process, and as expected from 2 years ago it failed spectacularly. So I tried updating the selenium process jar, and it doesn’t even connect at all – even though my Python selenium API version number and selenium process jar version number are the same (3.141.59, I had been using selenium jar version 3.12.0 before). I did some googling and found the names of the arguments had changed a bit, so I changed that and I still couldn’t get anything working.

I tried a bunch of different ideas, and followed a bazillion web links and tried a bunch of things from those places. Nothing worked. Eventually I had to give up and install Firefox on my web server, along with an optional piece of the Selenium API called “geckodriver” that launches Firefox. Fortunately Selenium knows how to launch Firefox in a headless manner (although installing it did drag in even more bits of X11 that I don’t actually want or need). That actually worked on the site, once I figured out how to put the geckodriver file somewhere on the path and get the geckodriver.log file put somewhere useful. So I’ve got it done for now. Until the next gratuitous change.
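For reference, a minimal sketch of launching headless Firefox from the Python Selenium 3.x API (the 3.141.59 line mentioned above). It passes the geckodriver location explicitly as an alternative to putting it on the path, and says where the log should go. The function names and any paths you pass in are placeholders, and later Selenium releases moved these two arguments onto a separate Service object, so treat this as specific to 3.x.

```python
import os


def firefox_kwargs(geckodriver_path, log_dir):
    """Keyword arguments for webdriver.Firefox() in Selenium 3.x."""
    return {
        # Explicit location of the geckodriver binary (instead of PATH).
        "executable_path": geckodriver_path,
        # Where geckodriver.log ends up (default: the current directory).
        "service_log_path": os.path.join(log_dir, "geckodriver.log"),
    }


def make_headless_firefox(geckodriver_path, log_dir):
    """Start Firefox with no display; the caller must quit() the driver."""
    # Imported here so firefox_kwargs() works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("-headless")  # Firefox's own headless switch
    return webdriver.Firefox(
        options=options, **firefox_kwargs(geckodriver_path, log_dir)
    )
```

Pointing the log at a directory the web server user can write to matters when the scrape is launched from the website rather than from cron.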

YouTube versus 360 Video again

So as I mentioned in YouTube versus 360 degree cameras, I had problems getting the full resolution version of a 5K 360 video. Subsequently, I uploaded a 360 4K video on the 2nd, and YouTube told me it had finished processing SD, HD and 4K versions in a short time, but it wouldn’t show me the 4K version. Even 6 days later, still no 4K. I just uploaded it again, and it finished processing SD, HD and 4K versions in less than an hour. And I could immediately see the 4K version of that one. Very annoying.

I don’t know if it’s significant or not, but the first time I uploaded was on Firefox and the second was on Chrome.

YouTube versus 360 degree cameras

As anybody who has been watching my videos knows, I’m really in love with 360 degree cameras these days. Specifically I’m in love with the Garmin VIRB360, which is a shame because it doesn’t love me back. The camera hasn’t been updated in a number of years, and the image quality isn’t as good as some of the newer ones. And the VIRB Edit editing app frankly kind of sucks, except for the telemetry overlay. To the point where I sometimes put the telemetry overlay on, then export it, and bring it into Final Cut Pro to do the rest of the editing. But more importantly, it’s an orphan and you can’t get parts for it. I got the last replacement lenses for it after my boat blew off my car in a parking lot after I’d attached the camera, and I had to get them from a shop in Calgary. I saw some replacement lenses on eBay and they were going for over $250!

The VIRB360 has three things which no other 360 camera has:

  • Telemetry capture, including not just GPS in the camera, but also heart rate via ANT+ or Bluetooth. And other ANT+ or Bluetooth inputs as you like. Lots of cyclists like to connect their power meters or cadence meters, for instance.
  • An external power connector that’s waterproof, or at least water resistant enough for kayak racing.
  • And related to that, the ability to record for hours at a time without overheating. GoPro struggles to make a camera that can record for the full life of the battery in a single go, telling anybody who complains that some huge percentage of their users only record for a few minutes at a time anyway and it sucks to be you.

So while I’d like to get the higher image quality and better editing software of say, an Insta360 X2 or whatever GoPro has announced they’re going to be announcing this year, I’m kind of stuck with the Garmin.

Slight aside here – a 360 camera has two lenses and two CCDs. The process of putting the two images together is called “stitching” and can either be done in the camera or it can require desktop or mobile software to do it. What comes out is an equirectangular image that a 360 degree viewer or editor can do the fun pan around stuff in.

The Garmin’s “normal” mode is to stitch in the camera and produce a 4K (3840×2160) equirectangular image on the microSD card. But there’s also a “raw” mode where you have two files on the microSD card, and VIRB Edit stitches them into a 5.7K (4992×2496) equirectangular as it sucks the image in from the camera/microSD card. So as an experiment I did a recording a couple of days ago in the raw mode. The stitching wasn’t too terribly time consuming, and I just did my usual hacked-up edit and exported the file. It’s a little bigger – about 1.17 GB per minute, versus 0.92 GB per minute for a 4K one I did a few days previously. Then I uploaded it to YouTube.
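To put those file sizes in perspective, here is the arithmetic as a quick Python sketch. It assumes 1 GB means 10**9 bytes, and it uses the exported file sizes, so the result says as much about the export settings as about the camera.

```python
def mbit_per_second(gb_per_minute):
    """Convert GB per minute to megabits per second (1 GB = 10**9 bytes)."""
    return gb_per_minute * 1e9 * 8 / 60 / 1e6


raw_pixels = 4992 * 2496        # stitched from raw mode by VIRB Edit
in_camera_pixels = 3840 * 2160  # stitched in the camera

pixel_ratio = raw_pixels / in_camera_pixels
size_ratio = 1.17 / 0.92

print(f"raw mode:  {mbit_per_second(1.17):.0f} Mbit/s")
print(f"in-camera: {mbit_per_second(0.92):.0f} Mbit/s")
print(f"pixels x{pixel_ratio:.2f}, file size x{size_ratio:.2f}")
```

That works out to roughly 156 Mbit/s versus 123 Mbit/s: about 50% more pixels per frame for only about 27% more data.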

And this is where it gets frustrating. The 4K one took about a day or so to process on YouTube before I could see it in full res. It says it’s 4K, and the text and telemetry gauges look very sharp on a 5K monitor.

The 5K one, on the other hand, said it had finished processing a few hours after uploading, but on a 5K monitor at full screen it says it’s only 1080 resolution, and it looks like it’s only 1080 resolution.

The text and the gauges look like crap at full screen.

So it looks like at least as far as YouTube goes, going for a higher resolution was a complete waste of time. (BTW: I can’t try Vimeo because it says one video is more than the free tier total upload limit.) So now I’m looking to see if there are good 360 video players for embedding in WordPress. Expect to see some test posts here shortly.

Heart Rate shouldn’t be this hard

In the 13-odd years I’ve been racing kayaks, I’ve come to rely on having my speed and heart rate displayed in front of me to help with pacing, both during training and racing. And usually that’s been done with the combination of some model of Garmin Forerunner GPS “watch” and a heart rate chest strap – starting with a Forerunner 301 (which nobody except Garmin would call a watch) and the strap that came with it, going through several generations of Forerunner and occasionally replacing the strap because Garmin uses these tiny little screws to hold in the battery cover and they strip easily. I’ve got 2 or 3 of them in my drawer with stripped screws. A few years ago I replaced my Garmin chest strap with a Wahoo TICKR and it worked great. Not only does it have a battery compartment that you can open with a quarter (or the corner of your CrashTag) but it also broadcasts on both ANT+ for your watch and Bluetooth so you can display it on your phone (with the help of the Wahoo Fitness app).

Fast forward a bit. After a couple of years of the TICKR working great, I lost it. No idea what happened to it, it’s not in any of the places I’d normally put my strap between workouts, or any of the places where Vicki would throw it on “cleaning lady day” to get it out of the way, nor even any of the 3 gym bags or rolly bags that I normally use for travel. It just vanished. I bought a new one, which has been redesigned and is now in “stealth black” instead of blue and white. And it just never worked right – it displayed ridiculously high numbers all the time, both on my watch and on my phone. I returned it for a new one, and the same problem. After a lot of troubleshooting, I got Vicki to put it on and it reads right on her, so I know it’s not the strap. But meanwhile I’ve got no reliable heart rate.

So I bought a Garmin HRM:Dual, which is their newest heart rate strap – the “Dual” meaning that they too now broadcast on Bluetooth as well as ANT+. It worked pretty well for about a year on ANT+, but it’s *never* worked right on Bluetooth, at least not on either Wahoo Fitness or Kinomap. Ok, Wahoo Fitness might just be that the Wahoo app doesn’t work right with Garmin straps, and Kinomap is… quirky. Also, in the meantime I’ve got a Garmin Fenix 6X Sapphire watch – it does everything the newest Forerunners do, and then some. And it reads my heart rate 24×7 on my wrist. Which is great, except for kayaking I really need to put a watch on the footstrap of my boat so I can see it. I can hardly see something on my wrist when it’s flashing past my eyes 40 times a minute.

A few weeks ago my HRM:Dual started giving garbage results towards the beginning and end of workouts. The beginning I can understand, sometimes it takes time to work up enough sweat that it makes good contact, even if you remember to spit on the pads before you start. I have some electrode gel I bought a few years ago and that helped a bit, but I was still getting garbage numbers part way through a workout. I replaced the battery, and it didn’t help.

And that’s when I started a game I call “Permutations and combinations”. Using my Fenix on my wrist and my old Forerunner 920XT on my desk or on my boat’s footstrap, I started experimenting. I tried the TICKR, still garbage (shows a number, but the number keeps rising up to around 122 while my Fenix and manual counts say I’m at 44), HRM:Dual, still garbage (says it’s connected, doesn’t show numbers). Replaced both batteries, both still garbage. Used electrode gel, both still garbage. Tried the HRM:Dual with the strap from one of the older Garmin heart rate monitors – hmmm, seeing some signs of life, but still not reliable numbers. Eventually I tried shaving the strip of hair on my chest under where the strap goes. And then I got good numbers on my HRM:Dual. At least on ANT+, still nothing on Bluetooth. But I did a workout yesterday with the HRM:Dual paired with the 920XT on my footstrap, and the Fenix 6 on my wrist, and both numbers stayed pretty amazingly in sync.

Heart rate comparison
(Purple line is HRM:Dual on Forerunner 920XT, Blue line is Fenix 6X on wrist)

And I guess that’s where I’ll leave it – I’ll pair the HRM:Dual with my Fenix 6 again, and use the Fenix on my footstrap. And try to remember to shave that strip on my chest when I shave my head.