A weird thought

I had a weird thought the other night. There are a couple of programming tasks on my massive “to do” list that I figured I’d power through in the first few months of retirement before I started spending hours and hours alternating between training in my kayak and touring on bikes with my wife.

Well, life doesn’t always work out the way you intended, and none of my to-do items has been checked off. The pain that last year made it uncomfortable to sit for too long, and that was just on the “barely tolerable” end of things by the end of a normal-length kayak race, has now progressed to absolutely intolerable for even short stints in a desk chair or a kayak. I’ve spent about 15 minutes total all winter on my erg, and I haven’t even put my kayaks in the water. Normally by this time of year I’d have 30 or 40 hours on the erg and about the same on the water. I limit my time at my desk chair to short periods for dealing with bills and taxes and the like. Even the library’s easy chairs are uncomfortable, verging on painful, these days.

But back to my weird thought. I have a new iPad. I can’t afford a new laptop. So I was thinking that for those programming tasks, I might try installing “code-server”, a hosted version of Visual Studio Code, on my Linux server. That gives you the full power of a pretty extensive IDE through the iPad’s web browser. I could try coding up one of those projects that way, maybe using the git integration to push the app to a free Heroku instance for testing and debugging. I wonder if that’s doable?

Well, in order to find out, first I’d have to install code-server and make it available through my web server. Oh, what’s this? It appears you need to use nginx as your web server rather than Apache to do that. Well, no problem, I’ve been intending to make that switch anyway, to make it easier to use Let’s Encrypt and put everything behind HTTPS like I should have done years ago. Oh wait, one of my sites uses Perl FastCGI. Looks like there are some extra hoops to jump through to configure that. And I’d also have to convert all my .htaccess files into clauses in the nginx configuration files.
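
For my own reference, here is roughly what I think the nginx side would look like: one server block proxying through to code-server (which listens on 127.0.0.1:8080 by default) and one handing the Perl scripts off to a FastCGI wrapper like fcgiwrap. The domain names, paths, and socket location below are placeholders, not a tested configuration.

    # Sketch only: domains, paths and the fcgiwrap socket are placeholders.
    server {
        listen 443 ssl;
        server_name code.example.com;
        ssl_certificate     /etc/letsencrypt/live/code.example.com/fullchain.pem;
        ssl_certificate_key /etc/letsencrypt/live/code.example.com/privkey.pem;

        location / {
            proxy_pass http://127.0.0.1:8080;          # code-server's default port
            proxy_http_version 1.1;
            proxy_set_header Host $host;
            proxy_set_header Upgrade $http_upgrade;    # the editor needs websockets
            proxy_set_header Connection upgrade;
        }
    }

    server {
        listen 443 ssl;
        server_name www.example.com;                   # the existing Perl site
        ssl_certificate     /etc/letsencrypt/live/www.example.com/fullchain.pem;
        ssl_certificate_key /etc/letsencrypt/live/www.example.com/privkey.pem;
        root /var/www/example;

        # nginx has no mod_perl or mod_fcgid, so Perl scripts go through a
        # wrapper process (fcgiwrap here) listening on a unix socket.
        location ~ \.pl$ {
            include fastcgi_params;
            fastcgi_pass unix:/var/run/fcgiwrap.socket;
            fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        }

        # and each .htaccess rule gets re-expressed here, e.g. a RewriteRule
        # becomes something like: rewrite ^/old/(.*)$ /new/$1 permanent;
    }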

Sigh, this is going to be a full-on yak shaving exercise, isn’t it? I just wish the painkillers I take to be able to sleep at night didn’t leave me dizzy and disoriented all day, or that they actually killed the pain instead of just knocking me out.

Well that ain’t good

Further to Scraping a third party site:

Woke up this morning to an email from Linode saying my Linode server has been running at 200% CPU. Logged in, and sure enough the CPU was high and there were a bazillion “firefox-esr” processes running. Did a killall on them, and the CPU immediately dropped to reasonable numbers. There was still a “geckodriver” process running, and when I killed that it went into a zombie state instead of going away. I had to do an /etc/init.d/apache2 reload to make that one go away.

Did some tests, both with the nightly “scrape the whole damn site” cron job and with the website’s “scrape one flight plan when the user clicks on it” path, and I’m not currently seeing any orphaned firefox-esr or geckodriver processes. So it appears that when I scrape, I correctly close the WebDriver connection, which correctly stops the firefox and geckodriver processes.
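
For what it’s worth, the pattern I’m relying on is sketched below; the URL and the parsing routine are hypothetical stand-ins for the real code. The point is just that driver.quit() runs no matter how the scrape goes, since quit() is what tells geckodriver to shut down the Firefox it launched (close() alone only closes the window).

    # Sketch of the cleanup pattern; the URL and scrape_flight_plan() are
    # hypothetical stand-ins for the real code.
    from selenium import webdriver

    def scrape_one(url):
        driver = webdriver.Firefox()           # starts geckodriver + firefox-esr
        try:
            driver.get(url)
            return scrape_flight_plan(driver)  # hypothetical parsing routine
        finally:
            driver.quit()                      # ends the session and kills both processes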

So I guess I need to keep an eye on this and see if I can figure out what the client is doing that makes the website fail to close the WebDriver connection. Or maybe I left some turdlets around on the system when I was doing testing? I don’t know.
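
In the meantime, something like the following, run from cron, would at least tell me when orphans start piling up again. It’s a rough sketch using the psutil module; the process names are the ones I saw this morning and the threshold is an arbitrary guess.

    # Hypothetical watchdog: count leftover browser/driver processes and
    # complain (or optionally reap them) when there are more than a handful.
    import psutil

    SUSPECTS = ("firefox-esr", "geckodriver")
    LIMIT = 5  # arbitrary threshold

    leftovers = [p for p in psutil.process_iter(["name"])
                 if p.info["name"] in SUSPECTS]

    if len(leftovers) > LIMIT:
        print("%d leftover scraper processes:" % len(leftovers))
        for p in leftovers:
            print("  pid %d: %s" % (p.pid, p.info["name"]))
            # p.kill()  # uncomment to reap them automatically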

Scraping a third party site

For my sins, I wrote a website for a friend’s company that relies on scraping information off another company’s website. The company I’m doing this for has a paid account on that third-party site, so there’s nothing ethically dubious going on here; I’m basically pulling back out information my clients had put into the third-party site.

I couldn’t figure out the third-party site’s authentication system, so instead of pulling down pages and parsing them with BeautifulSoup, I use Selenium to drive the site the way a web browser would.
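
The gist of it, with a made-up URL and made-up form field names, is just filling in the third party’s login form the way a person would and then letting the driver carry the session around:

    # Sketch only: the URL and the form field names are placeholders, not the
    # real site's, and the credentials would come from a config file.
    from selenium import webdriver

    driver = webdriver.Firefox()
    driver.get("https://thirdparty.example.com/login")
    driver.find_element_by_name("username").send_keys("client_account")
    driver.find_element_by_name("password").send_keys("not_the_real_password")
    driver.find_element_by_name("submit").click()
    # from here on, every driver.get() happens inside the authenticated session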

The third-party site, however, is terribly written. It’s full of tables nested within tables, missing closing tags, and everything else that reminds you of the old FrontPage-designed sites that only worked in IE. They don’t consistently use ids or names or anything else that would help me find the right bits of data, so I’ve had to wing it and parse things out with regular expressions all over the place. Worse, every now and then they change things around a bit in a way that breaks my scraping.
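
So most of the extraction ends up looking something like this; the pattern below is an illustrative made-up example, not one of the real ones:

    # Illustrative only: with no reliable ids or classes, I grab the rendered
    # HTML and pull fields out with regular expressions. This pattern is made up.
    import re

    html = driver.page_source
    m = re.search(r"Departure\s*point\s*:?\s*</td>\s*<td[^>]*>([^<]+)</td>",
                  html, re.IGNORECASE | re.DOTALL)
    departure = m.group(1).strip() if m else None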

The way I scraped in the past was with the Selenium “standalone” jar, attaching to a java process that pretends to be a web browser without actually being one. That’s important, because I run the scraping process on my web server, which is a headless Linode and, like most web servers, doesn’t even have X11 running on it. (Some components of X11 got installed a while back because something needed something that needed something that needed fonts, and voila, suddenly I’ve got bits of X11 installed.)

This method has worked great for several years – when I’m debugging at home I use the version of the Selenium webdriver that fires up a Chrome or a Firefox instance and scrapes, but then when it’s working fine I switch over to the version that connects to a java -jar selenium-standalone.jar process. I don’t know what the official term is, so I’m just going to call it “the selenium process”.
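
If I understand it right, what I’m really doing in that second mode is talking to the jar’s browser-less HtmlUnit-style “browser” over a Remote WebDriver connection. The switch between debugging at home and running on the server looks roughly like this; the port is the standalone jar’s default endpoint, the flag is hypothetical, and the capability tweak at the end is the JavaScript workaround I get to below:

    # Sketch of the two modes; 127.0.0.1:4444/wd/hub is the standalone jar's
    # default endpoint, and DEBUG_AT_HOME is a hypothetical flag of mine.
    from selenium import webdriver
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

    DEBUG_AT_HOME = False

    if DEBUG_AT_HOME:
        driver = webdriver.Firefox()   # a real browser window I can watch
    else:
        caps = DesiredCapabilities.HTMLUNIT.copy()   # the jar's browser-less "browser"
        caps["javascriptEnabled"] = False            # the workaround described below
        driver = webdriver.Remote(
            command_executor="http://127.0.0.1:4444/wd/hub",
            desired_capabilities=caps)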

A couple of years ago they (the third party) made a change to their website that caused the selenium process to die with JavaScript errors. Like I said, their website is crap, and I shouldn’t be surprised it has crappy JavaScript. Fortunately, at the same time they introduced those JavaScript errors, they added a big “submit” button that would take you to the selected page even with JavaScript disabled, so that’s what my scraper did back then: it ran with JavaScript turned off.

Flash forward to now, and they’ve changed the site again. They still have the broken JavaScript that freaks out the selenium process, but now you can’t navigate the site with JavaScript turned off. So I tried turning JavaScript back on in my scraper and the selenium process, and as expected from two years ago, it failed spectacularly. Then I tried updating the selenium process jar, and it wouldn’t even connect at all, even though my Python selenium API version and the jar’s version were now the same (3.141.59; I had been using selenium jar version 3.12.0 before). Some googling turned up that the names of some of the arguments had changed, so I changed those, and I still couldn’t get anything working.

I tried a bunch of different ideas, followed a bazillion web links, and tried the suggestions I found there. Nothing worked. Eventually I gave up and installed Firefox on my web server, along with “geckodriver”, the separate program Selenium uses to launch and control Firefox. Fortunately Selenium knows how to launch Firefox in a headless manner (although installing it did drag in even more bits of X11 that I don’t actually want or need). That actually worked on the site, once I figured out how to put the geckodriver binary somewhere on the path and get the geckodriver.log file written somewhere useful. So I’ve got it working for now. Until the next gratuitous change.
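
For the record, the working setup boils down to something like the following; the paths are just where I happened to put things, not requirements:

    # More or less what finally worked; the paths are my choices, not requirements.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    opts = Options()
    opts.headless = True    # no X11 display needed

    driver = webdriver.Firefox(
        options=opts,
        executable_path="/usr/local/bin/geckodriver",           # somewhere on $PATH
        service_log_path="/var/log/selenium/geckodriver.log")   # not ./geckodriver.log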

Maybe back to the drawing board…

I had this idea for an app to handle registration and results for kayak races. I had the following requirements in mind:

  • It must work when off-line
  • It must work on laptops and tablets
  • Preferably, it will sync up with a server when it is on-line
  • It must not require any installation or other technical futzing around because my target audience (the people who run kayak races) are not all very technically sophisticated.

After that, my idea was to make a proof of concept and incrementally improve it as I got more ideas and maybe some interest from others. I also wanted it to be a plain web page (with supporting JavaScript and CSS files) that I could hand to people as a zip file: unzip it, open index.html in your browser, and you’re good to go.

I discovered PouchDB, which would take care of storing information locally in the browser while off-line and also syncing it to a server when it came time to do that. And so off I went, programming away. My little proof of concept was humming along: it could accept registrations and display and edit existing ones, and I was all set to add results entry and display when I thought to try it on the bane of every web developer’s life, Internet Explorer.
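
The PouchDB plumbing itself really is only a few lines of JavaScript. This is a rough sketch with placeholder names and a placeholder server URL; the sync() call is the part I’m counting on to reconcile everything once a laptop gets back on-line:

    // Rough sketch; the database name, document fields and server URL are placeholders.
    // (For IE, the fetch and Promise polyfills have to be loaded in <script> tags first.)
    var db = new PouchDB('registrations');          // stored locally in the browser

    // save a registration while off-line
    db.put({
      _id: 'reg:' + Date.now(),
      paddler: 'Jane Doe',
      boatClass: 'K1',
      distance: '10 miles'
    });

    // when there's a connection, push/pull changes to the central database
    db.sync('https://example.com/db/registrations', {
      live: true,    // keep syncing as changes happen
      retry: true    // back off and retry when the network drops
    });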

First problem: IE reports that ‘fetch’ is not a valid function. Fortunately, the documentation for PouchDB warns you about that and says to install a polyfill. So I install it, and now IE reports that ‘Promise’ is not a valid function. Hmm, no mention of that in the PouchDB docs that I can find.

Can I just mention as an aside that the PouchDB docs do say that it supports IE 10 and IE 11? Yeah, about that…

Thanks to an answer on Stack Overflow, I find another polyfill for Promise. Now IE reports that you can’t use IndexedDB on web pages that are loaded as files rather than as URLs. Not sure what to do about that except tell people to stop using IE. It appears that with my polyfills and such, it does work in Edge, at least. Small mercies.

2018 Look Back

2018 started out pretty shitty. I was unemployed, and my unemployment insurance had run out. Depressed due to the long employment search and other things, I started the year out of shape and overweight, only to be hit with two massive bouts of sickness that pretty much wiped out my winter training and dieting, meaning I hit the racing season with very few miles under my belt and a lot more fat under there.

I got a job in February, and while it was interesting, the pay was quite low – I’d actually earned more as a full-timer with benefits in 2001 than I was earning as an hourly contractor with no benefits at this job. So midway through the year I left that job for another that paid much better. I hate to be a job hopper like that, but the difference in pay was hard to believe.

Because of the reduced financial circumstances this year, I didn’t do a lot of the “away” things I’ve done in previous years – no TC Surfski Immersion Weekend, no Canadian Surfski Champs, no Gorge, no Lighthouse to Lighthouse. Instead I concentrated on doing as many NYMCRA races as possible, even camping out to save money instead of getting hotels for away races. I did several races I’ve never done before, including the two days of Madrid and the lovely Blue Mountain Lake race.

Even better, the USCA national championship races were held in Syracuse. I had two really good 10-mile races – unfortunately, both races were 12 miles. Both times I led a pack of racers for the first 10 miles, then faded and got passed by all of them in the last 2 miles. Definitely something to work on this year.

I started the season completely out of shape with the intention of racing my way into shape, hoping to peak with the USCA Champs. It worked pretty well, and in spite of my tactical errors there, I had a really good race at Long Lake. I was hoping to continue with the final race of the season, the Seneca Monster, but it got cancelled.

In other good news, I really dialed in my video production workflow, aided by the fact that I now have a high-end iMac. I also got a really amazing carbon fibre GoPro mount for the front of my kayak: not only lighter than my old aluminum one, but also more aerodynamic. After the end of the season, GoPro released a new camera, the Hero 7 Black, with a much-touted “HyperSmooth” stabilization mode. I bought one and tried it out, and it is pretty amazing. I can’t wait to use it for races next year.

I also bought a new boat – I did some side work for a pilot friend of mine and used part of the money to buy a V8 Pro, a more stable boat than my V10 Sport, but still pretty fast. During interval workouts on the bay, I found I could just put the power down instead of bracing and trying to keep upright.

One of my daughters got engaged this year. I really like her fiancé, and they seem really good together.

Both of my parents had health setbacks this year. I think this coming year’s travel plans will have to mostly involve visiting them.