Scraping a third party site

For my sins, I wrote a website for a friend’s company that relies on scraping information off another company’s website. The company I’m doing this for does have a paid account on the third party’s website so there’s nothing ethically dubious going on here – I’m basically taking off information my clients had put into the third party site.

I couldn’t figure out the third party site’s authentication system, so instead of pulling in a page and parsing it using BeautifulSoup, I use Selenium to attach to it like a web browser.

The third party site, however, is utterly terribly written. It’s full of tables nested within tables, and missing closing tags and everything else that reminds you of the old “FrontPage” designed sites that only worked on IE. They don’t consistently use ids or names or anything else to help me find the right bits of data, so I’ve had to wing it and parse things out using regular expressions all over the place. But worse is that every now and then they change things around a bit in a way that breaks my scraping.
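For the curious, here is a minimal sketch of the kind of regular expression wrangling I mean. The HTML fragment, the labels and the `cell_after_label` helper are all made up for illustration and are not taken from the real site or my real scraper. The idea is simply to anchor on the visible label text, since there are no ids to hang on to, and skip whatever tags happen to sit between it and the value.

```python
import re

# Hypothetical fragment in the style described above: nested tables,
# missing closing tags, and no ids or names to hook onto.
SAMPLE_HTML = """
<table><tr><td><table>
<tr><td><b>Member Name:</b><td>Jane Doe
<tr><td><b>Balance:</b></td><td> $42.50 </td>
</table></table>
"""


def cell_after_label(html, label):
    """Return the text that follows `label`, skipping any tags in between.

    Returns None when the label does not appear in the page at all.
    """
    pattern = re.compile(
        re.escape(label)        # the visible label text, matched literally
        + r"(?:\s*<[^>]*>)*"    # any run of tags: </b>, </td>, <td>, ...
        + r"\s*([^<]*)",        # the value: everything up to the next tag
        re.IGNORECASE,
    )
    match = pattern.search(html)
    return match.group(1).strip() if match else None


if __name__ == "__main__":
    print(cell_after_label(SAMPLE_HTML, "Member Name:"))
    print(cell_after_label(SAMPLE_HTML, "Balance:"))
```

Because the pattern never relies on closing tags being present, it copes with both the tidy row and the sloppy one. It will still break the moment they rename a label, which is exactly the problem.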

In the past I scraped using the Selenium “standalone” jar, attaching to a Java process that pretends to be a web browser without actually being one. Which is important, because I run the scraping process on my web server, which is a headless Linode, and like most web servers doesn’t even have X11 running on it. (Some components of X11 got installed on it a while back because something needed something that needed something that needed fonts, and voila, suddenly I’ve got bits of X11 installed.)

This method has worked great for several years – when I’m debugging at home I use the version of the Selenium webdriver that fires up a Chrome or a Firefox instance and scrapes, but then when it’s working fine I switch over to the version that connects to a java -jar selenium-standalone.jar process. I don’t know what the official term is, so I’m just going to call it “the selenium process”.

A couple of years ago they (the third party) made a change to their website that caused the selenium process to die with JavaScript errors. Like I said, their website is crap, and I shouldn’t be surprised it has crappy JavaScript. Fortunately at the same time they introduced these JavaScript errors, they put a big “submit” button on the page that would go to the selected page even if you disabled JavaScript, and so that’s what I did with my scraper back then.

Flash forward to now, and they’ve changed the site again. They still have the broken JavaScript that freaks out the selenium process, but now you can’t navigate the site if you turn off JavaScript. So I tried turning on JavaScript in my scraper and the selenium process, and as expected from 2 years ago it failed spectacularly. So I tried updating the selenium process jar, and it doesn’t even connect at all – even though my Python selenium API version number and selenium process jar version number are the same (3.141.59, I had been using selenium jar version 3.12.0 before). I did some googling and found the names of the arguments had changed a bit, so I changed that and I still couldn’t get anything working.

I tried a bunch of different ideas, and followed a bazillion web links and tried a bunch of things from those places. Nothing worked. Eventually I had to give up and install Firefox on my web server, and an optional piece of the selenium api called “geckodriver” that launches Firefox. Fortunately selenium knows how to launch Firefox in a headless manner (although installing it did drag in even more bits of X11 that I don’t actually want or need). That actually worked on the site, after I figured out how to put the geckodriver file somewhere on the path and get the geckodriver.log file put somewhere useful. But I’ve got it done for now. Until the next gratuitous change.
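Roughly, the headless Firefox setup looks like the sketch below. It assumes the Selenium 3.141.59 Python API mentioned above, where `webdriver.Firefox` accepts `executable_path` and `service_log_path` directly. As far as I know, newer Selenium 4 releases move those settings onto a Service object, so treat the keyword names as version specific. The directories and both function names are hypothetical placeholders, not my actual configuration.

```python
import os


def firefox_launch_kwargs(driver_dir, log_dir):
    """Build keyword arguments for webdriver.Firefox (Selenium 3.141.59 style).

    driver_dir and log_dir are hypothetical locations chosen by the caller.
    """
    return {
        # Where the geckodriver binary lives, so it need not be on PATH.
        "executable_path": os.path.join(driver_dir, "geckodriver"),
        # Where geckodriver.log gets written instead of the working directory.
        "service_log_path": os.path.join(log_dir, "geckodriver.log"),
    }


def make_headless_firefox(driver_dir="/usr/local/bin", log_dir="/var/log/scraper"):
    """Start Firefox without a display. Needs selenium, Firefox and geckodriver."""
    # Imported here so the helper above works even where selenium is absent.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.headless = True  # run Firefox with no visible window
    return webdriver.Firefox(
        options=options, **firefox_launch_kwargs(driver_dir, log_dir)
    )
```

Calling `make_headless_firefox()` gives back a driver that behaves like the desktop one I use when debugging at home, so the rest of the scraping code doesn’t need to know which machine it is running on.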

YouTube versus 360 Video again

So as I mentioned in YouTube versus 360 degree cameras, I had problems getting the full resolution version of a 5K 360 video. Subsequently, I uploaded a 360 4K video on the 2nd, and YouTube told me it had finished processing the SD, HD and 4K versions in a short time, but it wouldn’t show me the 4K version. Even 6 days later, still no 4K. I just uploaded it again, and it finished processing the SD, HD and 4K versions in less than an hour. And I could immediately see the 4K version of that one. Very annoying.

I don’t know if it’s significant or not, but the first time I uploaded was on Firefox and the second was on Chrome.

YouTube versus 360 degree cameras

As anybody who has been watching my videos knows, I’m really in love with 360 degree cameras these days. Specifically I’m in love with the Garmin VIRB360, which is a shame because it doesn’t love me back. The camera hasn’t been updated in a number of years, and the image quality isn’t as good as some of the newer ones. And the VIRB Edit editing app frankly kind of sucks, except for the telemetry overlay. To the point where I sometimes put the telemetry overlay on, then export it, and bring it into Final Cut Pro to do the rest of the editing. But more importantly, it’s an orphan and you can’t get parts for it. I got the last replacement lenses for it after my boat blew off my car in a parking lot after I’d attached the camera, and I had to get them from a shop in Calgary. I saw some replacement lenses on eBay and they were going for over $250!

The VIRB360 has three things which no other 360 camera has:

  • Telemetry capture, including not just GPS in the camera, but also heart rate via ANT+ or Bluetooth. And other ANT+ or Bluetooth inputs as you like. Lots of cyclists like to connect their power meters or cadence meters, for instance.
  • An external power connector that’s waterproof, or at least water resistant enough for kayak racing.
  • And related to that, the ability to record for hours at a time without overheating. GoPro struggles to make a camera that can record for the full life of the battery in a single go, telling anybody who complains that some huge percentage of their users only record for a few minutes at a time anyway and it sucks to be you.

So while I’d like to get the higher image quality and better editing software of say, an Insta360 X2 or whatever GoPro has announced they’re going to be announcing this year, I’m kind of stuck with the Garmin.

Slight aside here – a 360 camera has two lenses and two CCDs. The process of putting the two images together is called “stitching” and can either be done in the camera or it can require desktop or mobile software to do it. What comes out is an equirectangular image that a 360 degree viewer or editor can do the fun pan around stuff in.

The Garmin’s “normal” mode is to stitch in the camera and produce a 4K (3840×2160) equirectangular image on the microSD card. But there’s also a “raw” mode where you have two files on the microSD card, and VIRB Edit stitches them into a 5.7K (4992×2496) equirectangular as it sucks the image in from the camera/microSD card. So as an experiment I did a recording a couple of days ago in the raw mode. The stitching wasn’t too terribly time consuming, and I did my usual hacked up edit and exported the file. It’s a little bigger – about 1.17 GB per minute, versus 0.92 GB per minute for a 4K one I did a few days previously. Then I uploaded it to YouTube.
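As a back of the envelope check on those numbers, here is a tiny sketch comparing the pixel counts with the file sizes. It only uses the figures quoted above; the variable and function names are made up for illustration.

```python
def pixels(width, height):
    """Number of pixels in one frame."""
    return width * height


stitched_4k = pixels(3840, 2160)  # in-camera stitched frame
raw_stitched = pixels(4992, 2496)  # frame stitched by VIRB Edit from raw mode

# How many more pixels per frame the raw-mode export has.
pixel_ratio = raw_stitched / stitched_4k

# How much bigger the exported file is per minute (GB per minute figures above).
size_ratio = 1.17 / 0.92

if __name__ == "__main__":
    print(f"pixels per frame: {stitched_4k:,} vs {raw_stitched:,}")
    print(f"pixel ratio: {pixel_ratio:.2f}, file size ratio: {size_ratio:.2f}")
```

So the raw-mode export carries about 50% more pixels per frame for about 27% more data per minute. That comparison says nothing about the export settings, which I haven’t tried to account for here.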

And this is where it gets frustrating. The 4K one took about a day or so to process on YouTube before I could see it in full res. It says it’s 4K, and the text and telemetry gauges look very sharp on a 5K monitor.

But the 5K one said it had finished processing a few hours after uploading, but on a 5K monitor at full screen, it says it’s only 1080 resolution, and it looks like it’s only 1080 resolution.

The text and the gauges look like crap at full screen.

So it looks like at least as far as YouTube goes, going for a higher resolution was a complete waste of time. (BTW: I can’t try Vimeo because it says one video is more than the free tier total upload limit.) So now I’m looking to see if there are good 360 video players for embedding in WordPress. Expect to see some test posts here shortly.

First cross country ski of the season

I’m trying to remember when was the last time I really skied. I had pretty much quit by the end of university in 1985, because skate technique hurt my knees so much. I know I had one winter where I got out 4 or 5 times sometime between Shani and I breaking up and me moving south, so I guess 1992-3 or sometime around then? Then a few years ago I tried to ski at Mendon Ponds with my now ancient ski equipment, and my boots (bought in 1981 at great sacrifice) both completely separated from their soles within a few hundred meters of the parking lot.

Last year I found out about Cummings Nature Center, and the fact that they rent there. I tried it out once and immediately fell back in love with skiing. Unfortunately I discovered it pretty late in the season so I didn’t get back out. So I’ve been itching for a chance to go out again this year. First we didn’t have snow, then we got fresh snow and the temps immediately plummeted to around 0F. Not good for starting out. But today the weather finally cooperated. It was 26F and lightly snowing when I set out for the nearly hour long drive down to Cummings.

Driving for an hour meant the return of the painful butt. I’m still making the rounds of doctors to try and get some relief from that, whatever it is, and that means I spent half the drive trying to sit only on one buttock or lift myself out of the seat.

By the time I got there, it was snowing quite a bit harder, although the roads were well plowed. I was hoping they’d still be plowed when I finished. I got there just on the dot of 9am and there was one other car in the lot. They were skiing but not renting (I could tell because they’d skied from the parking lot to the chalet). The rental form asked what level skier you are. They didn’t have a spot for “I used to be quite good, but that was before you were born”, so I ticked “intermediate”. I was sure that when they saw that I’d put my e-mail address at xcski.com they’d accuse me of giving a fake address, but they didn’t say anything.

The equipment was quite good quality, and new this year, they told me. The new bindings are so much better than they were when I was a skier. And the ski lengths aren’t multiples of 5cm for some odd reason. I got a pair of Madshus Actives at 207cm because I used to race on 215s and I was a lot lighter back then. The waxless system felt like a combination of steps and skins. It worked pretty well at first.

Felt like old times. Set off and hey, my diagonal stride isn’t too bad in the grooves, but the muscles you use to keep your skis in a straight line when you aren’t in the grooves, or to skate around corners, or snowplow turn on a downhill, are all completely atrophied. Oh well, I’ll get this back.

My heart was pounding pretty hard, but the values displayed on my watch were ridiculously low. Stupid heart rate strap had had problems last time I’d erged. I didn’t think it had been long enough to need a new battery, so I hoped it would start reading right after I’d worked up a sweat. I figured it was probably in the high 140s or more because I’d had to stop to catch my breath on a couple of climbs.

I did the yellow trail out to the blue and did the blue loop, and when I got back to the yellow I thought “I don’t need to go back to the lodge yet” and set out around the blue trail again. Even though it was only 1.5 or 2 kilometers, it felt like a victory. And when I got to the junction with the orange trail, I took that one. Half way through the orange trail I got a notice on my watch that the heart rate strap had a low battery. I stopped to take it off, hoping that the watch would revert to the built in optical heart rate. I’m not sure what it did, because it was still giving me numbers around 100 bpm when the pounding in my chest was telling me it was actually over 140. I wonder if the strap was continuing to broadcast crappy data in my backpack.

When I finished the orange and blue, this time I took the yellow trail back to the lodge. I didn’t note the actual distance, but I think it was somewhere between 4.5 and 5.5 kilometers. My goal for the day had been to make it for 5 kilometers total, so I was feeling pretty good. And after having a brief sit down in the chalet to drink some water and eat a banana I’d brought, I was feeling good enough to go out and do the blue trail loop again.

This time, I think the wax they’d put on the skis to improve the glide had worn off, because my skis stopped abruptly instead of gliding a few times, once pitching me onto my face. I had to stop a few times to do the old “scrape the ski over the edge of the other ski” trick to get my glide back. I was also definitely tired now. But my heart rate was now showing up properly on my watch, and I was seeing numbers in the very high 140s and low 150s.

I finished up back at the lodge with a total of 6.6 kilometers. Goal exceeded! But I was really done – I don’t think I could have done even the yellow loop again. So I returned my rentals, suggested they renew the glide wax, and headed off to the car. It was barely 10:30. And it was snowing quite hard.

The first part of the drive was plowed but not bare, but after taking it easy on that I soon got back to bare road and headed home. Once again, the sore butt problem “reared” its ugly head but it was an excuse to stop for a Coke at least.

I can’t wait to do it again.

Ending on a high note

Today I went into my place of work, and picked up all the stuff I’d left in and around my desk. Then I spent a few hours making sure none of my non-work info was left on my laptop, especially my password manager and iCloud account. Left my keyfob on my desk. Then I took my laptop to FedEx Office and sent it back to our head office in Connecticut. And that is it. Forty years of work as a professional computer programmer is over.

I counted it up a few months ago when I was writing my resignation letter, and I make it somewhere between 20 and 22 different jobs depending how you count it. That includes 1 month contracts and two 6 year long permanent jobs and everything in between. It doesn’t include two occasions where I was unemployed for several months in a row. Sometimes it sucked, sometimes it was great, but I’m never sorry that I chose this path.

Early on in the history of this blog, I started a series of “bad job experiences” posts. I stopped that after one of the people I’d mentioned in a post found the blog and disputed some of the things I said about it. I realized these posts might show up when I’m looking for work and potential employers Google my name, and that might be harming me. I’d much rather they found my 100,000 plus Stack Overflow points or even my pathetic GitHub profile than that.

Weirdly, even though I had fodder for that series even at the best jobs I had, I am hard pressed to find anything like that to write about my last job. I started at Skillsoft on 5 January 2020. By late March, we very quickly transitioned to working from home. Skillsoft management were great – one of the first things they did was immediately give us a day off to recover from the “stress” of the change. I’d had 7 years of previous experience with working from home and I thrive in that environment, but I took the day off, of course. They then put two weeks of “special leave” in our online time manager that we could take for COVID related emergencies, like providing support for sick family members or needing time to arrange things for your children. I think our sick leave was officially “use as much as you need, but we’ll probably need a doctor’s note if it drags on too long”.

I loved just about everything at this job. It was fast paced without being frenetic, you weren’t pressured to meet unreasonable deadlines, the tech stack was good, and the other developers were very approachable. Pat, the team leader, was always willing to get on a Slack call and walk you through any problems you had. Usually I tried to call my team mate Daquanne rather than Pat because Pat had so many other calls on his time and Daquanne was great at explaining things. I kind of hated sprint demo day, as I did at my previous Agile jobs as well, but I got through them ok. And when we were in the office, Michelle would make cookies on demo day.

Other than the stress of demo day, the only nit I could pick was my co-worker Uyen who wore a lot of perfume. I’m over sensitive to perfume, and it would frequently make me sneeze even when she was at her desk and I was at mine. I bought a little USB powered fan to try to blow air towards her desk, and I guess it worked but I only had it for a week when we went to full work from home. Anybody need a cheap fan? She also had an accent which made it hard to understand her over Teams, so I didn’t go to her for help unless it was something where she was the subject matter expert, like our Fastly configuration.

We had a small team, and everybody got to work on front end and back end as per our own inclinations. Everybody had their areas of comfort but they also didn’t seem to mind if you picked up a story in their area or suggested a different approach in a code review. I can honestly say this was the best team I’ve ever been on – I’ve worked with other very smart very good programmers, but every other team had a person or two who you just hoped they’d go away and stop dragging down the rest of you. I’m hoping that doesn’t mean I was the drag.

I’ve been looking forward to retiring for a long time. I’m not going to stop programming – I’ve got a couple of projects I want to work on, and maybe I’ll do some bug fixing for open source projects. It sounds like log4j could use some help?

But also, I’ve been looking forward to having more time for paddling and biking. With more time to train, I was hoping I could try to do the Adirondack Canoe Classic. Unfortunately I’ve been having massive problems with pain in my hips and butt. This summer, I actually had to stop paddling during races to lift my butt out of the seat a few times to relieve the pain. And that pain has gotten worse over the last few months. I can’t paddle, or even sit in a car or a desk chair for more than 45 minutes without being in intense pain. In our recent trip to BC, there were several times I thought I was going to scream I was in so much pain. If I can’t find a solution for the pain, I’m not sure what I’m going to do.

That’s also going to impact my other major goal of retirement – traveling with Vicki. Again, I’m not looking forward to long car rides. Flying business class seems acceptable, especially those amazing pods we got on the flight home from BC. And let’s not even think about what the new COVID variant might mean to our booked Viking cruise.

So I guess task # 1 of the new year will be pounding the desk at my doctor until I get a solution to my pain problems or medication to manage them.