New disks set-up

Having established that there is *something* making the disks run slowly on my colo box, I am resolved to fix it. One of my xen “tenants” generously donated two new Hitachi Deathstar^WDeskstar disks. In order to save some downtime, I’m setting up the new disks on the server that I replaced because I thought it was causing hardware problems (but which may or may not have been due to the crappy disks I was using). Setting up essentially a new server means I also have a chance to try out Debian 6, which became the “Stable” release a few weeks ago but which I haven’t had the nerve to upgrade the colo box to.

Fortunately, I have my previous post, “Another try at setting up the new server”, to act as a checklist.

Day 1

  • Downloaded and burned the Debian 6 NetInst disk for AMD64.
  • Installed the new disks in the old box and booted from the NetInst disk
  • Just in case they fixed the problems with LVM and software RAID and GRUB not playing nice together, tried installing as a two-disk software RAID-1 with LVM on top of that
  • Installed with 4GB root partition and 2GB swap on LVM
  • One of the install options was “SSH Server”, and so I chose that one
  • Success! It boots with GRUB with that configuration.
  • Discovered that ssh installation dragged in xauth and a bunch of X11 libraries, so removed those.
  • Installed smartmontools and enabled them in /etc/default/smartmontools
  • Installed xen-utils and kernel and all the stuff that drags in.
  • Rebooted and discovered to my relief that it boots the xen kernel.
  • Installed rsync for backups.
  • Installed munin-node and munin-plugins-extra.
  • Installed vim and removed vim-tiny.
  • xm list isn’t working. Tried to manually start xend and got a screen full of errors. Tried to start it with /etc/init.d/xend start and nothing happened.
  • Discovered it’s not starting xend because Grub is booting the xen kernel without the Hypervisor. If I choose the correct entry off the grub list, I get it. Now to figure out how to change the boot order in this new version of Grub.
  • Took my backup disk and added it to the third drive sled so I’ll have SATA speeds when I restore from it.
  • Edited /etc/default/grub and changed the GRUB_DEFAULT value to 4 (remember they’re numbered from 0) and then ran update-grub.
  • Copy ssh configuration in /etc/ssh/ and ~root/.ssh from backup.
  • Copy munin-node configuration in /etc/munin/
  • Uninstalled exim4 and installed postfix because I know how to configure postfix.
  • Copy postfix configuration from backup.
  • Oops. Need the hostname configuration to match the hostname in postfix.
  • Create lvm volumes with lvcreate -L 150G -n xen1-disk xen-disk.
  • Create file systems on them with mkfs.ext3 /dev/xen-disk/xen1-disk.
  • Create swap with mkswap /dev/xen-disk/xen1-swap.
  • Installed ntp
  • Copy backups with rsync -aSurvx --numeric-ids --delete /mnt/sdc1/mp3s/ /mnt/mp3s/.
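For the GRUB step in the list above, the value of GRUB_DEFAULT is just the zero-based position of the wanted entry in the generated menu. Here is a small sketch for listing the entries with their indexes. It assumes the menu is in /boot/grub/grub.cfg, that it is flat (no submenus), and that each entry starts with a line like menuentry 'title'. The name list_grub_entries is a made-up helper, not part of GRUB.

```shell
# list_grub_entries [FILE]
# Print each top-level menuentry title with its 0-based index.
# FILE defaults to /boot/grub/grub.cfg (an assumption about the layout).
list_grub_entries() {
    # Split on either quote character so $2 is the entry title.
    awk -F"['\"]" '/^menuentry /{ printf "%d: %s\n", n++, $2 }' \
        "${1:-/boot/grub/grub.cfg}"
}

# Example: list_grub_entries
# Then put the index of the Xen hypervisor entry in GRUB_DEFAULT in
# /etc/default/grub and run update-grub.
```

Titles that themselves contain a quote character would be cut short by this, so treat the output as a quick aid rather than a parser.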

And at this point, while restoring the data from the backup to the disks, it started throwing SMART errors. Which at least vindicates our purchase of new hardware to replace this box. I was starting to worry that the problems we’d seen on this hardware were entirely due to the same disk problems we were seeing on the new hardware.

Continuing on:

  • Reformat the partitions with mkfs.ext3 -c.
  • Still get the error on restoring the backup.
  • Deleted the lv that was causing the problems, and tried creating a bunch of smaller ones.
  • Made file systems on the smaller (50G) lvs and rsynced about 45GB of data onto each one. Didn’t get any errors, so wondered if the errors were coming from the source disk.
  • Did a tar cvfz /dev/null . of the backup that was throwing the errors. That didn’t give any errors either.
  • Removed the “junk” lvs and created the big one again. Did a mkfs.ext3 -c on it.
  • rsyncing the data over got the error on the same file again. And this time I’m almost sure it’s the backup disk, not the destination.
  • Tried to copy the offending file to /tmp, and got the same error. So yes, it’s the backup disk.
  • At this point, I have enough of the system restored that it’s painless to do the rest of the rsyncs from last night’s backups on my home server. So that’s what I’m doing. I’ve done rsync -aSurvx --numeric-ids --delete xen1/Sun 192.168.1.119:/mnt/xen1 and it transferred about 10 files and deleted a couple of postgresql log files.
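One caveat on the tar check in the list above: the GNU tar manual says tar minimizes input and output when the archive is /dev/null, so that run may not have read the file contents at all (it is not clear whether that also applies with the z flag). A more direct way to test the source disk is to read every file and report the ones that fail. This is a minimal sketch; find_unreadable is a hypothetical helper name.

```shell
# find_unreadable DIR
# Read every regular file under DIR (staying on one file system) and
# print the path of each file that cannot be read to the end.
find_unreadable() {
    find "$1" -xdev -type f -exec sh -c '
        for f; do
            # cat fails with a non-zero status on an I/O error.
            cat -- "$f" > /dev/null 2>&1 || printf "%s\n" "$f"
        done
    ' sh {} +
}

# Example: find_unreadable /mnt/sdc1
```

An empty result means every file could be read through. Any path it prints is a candidate for the kind of read error described above.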

With that all done, it was time to get serious about setting up xen and running the domUs.

  • Copied the domU configuration files from backup to /etc/xen.
  • Modified them for the new kernel version (hey, is this the version with no global locks? That could be a huge win). Copied the appropriate /lib/modules/ into each of the domU directories
  • Tried to start a domU. It complained about being unable to start the network. Copied a line out of the backup of /etc/xen/xend-config.sxp to the new one.
  • Tried to start a domU. Ran out of memory.
  • Remembered that the live site has 8GB but this only has 4GB, so reduced the size of the memory allocated to each domU.
  • Tried to start a domU. It gave a bunch of errors about being unable to start the raid and the lvm. Thought about it for a while, and realized that since I’m specifying an initrd in the config file, and that initrd is the one I use to start the host OS, it thinks it needs to start a raid and lvm in order to mount any disks. Oh oh.
  • In desperation, installed xen-tools to see what it did when it created a configuration file. It used the same kernel and initrd as I had, but instead of calling the virtual disks “hda1” etc, it called them “xvda1”.
  • Modified all my xen configuration files and fstabs and was able to bring up all three domUs.
  • When I attempted to reboot, the computer threw a bunch of errors and locked up. It appears that it was trying to save the xen configuration in /var/lib/xen/save. I’ve seen that before. So I modified /etc/default/xendomains to change the XENDOMAINS_SAVE variable to prevent it from saving. Now it’s shutting down correctly.
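The hda-to-xvda rename described in the list above has to be made in both the domU configuration files and each domU's fstab. One way to do it in bulk is a small sed filter. This is a sketch only: to_xvd is a made-up name, and the sample lines in the comments are illustrative rather than copied from the real files.

```shell
# to_xvd
# Filter: rewrite device names of the form hdaN to xvdaN, stdin to stdout.
# Only matches hdaN when it is not preceded by a letter or digit, so it
# leaves words that merely contain "hda" alone.
to_xvd() {
    sed -E 's/(^|[^[:alnum:]])hda([0-9]+)/\1xvda\2/g'
}

# Illustrative use on a config line and an fstab line:
#   echo "disk = ['phy:/dev/xen-disk/xen1-disk,hda1,w']" | to_xvd
#   echo "/dev/hda1 / ext3 defaults 0 1" | to_xvd
```

Running it as a filter and reviewing the output before overwriting anything is safer than editing in place, since a domU with a wrong root device will not boot.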

Houston, we have a problem

Since putting in the new colo box, we (myself and the two “tenants” on the Xen user domains (domU)) have noticed it being very slow. At times it seems like the first time you try something it will be very slow, but if you try again immediately it will run quickly. For instance, sometimes a page load will time out, but you hit refresh and it will load quite quickly. I’ve started to suspect the problem is the disks, because the CPU is pretty fast. In order to pinpoint the problem, I’ve decided to try to benchmark the colo box against my home computer. Both computers have SATA 3Gb/s disks in a software RAID-1 (mirror). Both computers have dual core CPUs (although the home one is a Core2 Duo at 1.86GHz and the colo is a Xeon at 3.0GHz). However, the colo is also running tons of other stuff and it’s running in a Xen domU, so that might slow things down a bit.

Mild disappointment

Bought two new hard drives to add to my Linux box. Could only find one of the two SATA cables that I thought I had, so I went to FrozenCPU.com today to pick up some new ones. Got home, opened up the computer and found the missing SATA cable, and also discovered that there is only one power connector free. So tomorrow I’ll have to stop by FrozenCPU.com again to buy an adaptor. Fortunately they’re in East Rochester, and so is my physiotherapist, so it won’t be a wasted trip. But it does mean another day of failed backup jobs because I don’t have the extra disk space.

Some observations on Facebook’s “Phonebook”

Facebook has a personal “Phonebook” for your account. A couple of people have seen this and thought “Oh my God, Facebook has information I never gave it”. I’m not so sure this is correct. As far as I can tell, the information there is a combination of information other people have added to their account plus information I have shared. Based on my observations, it appears the information they’re showing me is either

  • Phone numbers I already had
  • Phone numbers that my Facebook friends share with their friends
  • Phone numbers that people who aren’t Facebook friends share with the public
  • And in some cases, phone numbers I already had, correlated with the FB profiles of people who have put their phone number in the protected part of their profile

I cannot find a single instance of it divulging a phone number to me from a stranger. But I can see why people might be a little surprised about that last part. I’m not.

I use a phone operating system, WebOS, that integrates all my contacts from Google, from information I imported from my old Palm Treo, from LinkedIn, and also information it downloads from Facebook. This is kind of cool, because when I get a phone call from a Facebook friend I get their Facebook profile picture showing up on the screen. It also means I don’t have to grovel through multiple sources to get all the information I know about somebody. I suppose I shouldn’t be too surprised that some of that information made it up to Facebook. I can’t recall for sure, but I might have also used one of Facebook’s “Find your friends, upload your contacts” things. I’ve also set up various links between Google contacts and Apple Address Book and the like, so it’s damn near impossible to find where the data came from.

So here are some observations on what data they have, and what data they don’t have.

Case 1: My FB friend Dennis Mike was worried because when he looks at his phonebook, it shows his cell phone number, which he doesn’t think he’s ever shared with FB. I don’t have his phone number in my phone, and he doesn’t show in the FB Phonebook for me. So it’s not sharing his phone number with people who didn’t already have it, even FB friends.

Case 2: It shows my daughter’s phone number (which is information I already had) linked to my daughter’s FB profile with an “Add as friend” link. (She unfriended me a while ago, long story.) I assume that this is combining information I already had (her phone number) with information that Facebook got from her profile and decided “aha, this Facebook profile is a person you already know”. She may not be disclosing her phone number to non-friends, but FB decided that’s information I already have.

Case 3: It shows my brother’s phone number, but it’s not linked to his profile, in spite of the fact that we’re FB friends. I think that means that he never linked his phone to FB, and because his name on my phone is different than his name on FB (Dave versus David) it doesn’t manage to find the link.

Case 4: It shows a guy who I had some business dealings with, full name, linked to his Facebook profile with an “Add as a friend” link. In my phone, I have his number and his name as “Dave @ [company name]”. So I guess this is another example of FB correlating a phone number in my phone with a phone number that somebody put in the protected part of his profile.

I guess my point of this investigation is that Facebook has some information, and they’re able to do some correlations on this. What’s visible seems pretty innocuous, and it really does help you find the FB profiles of people you know. I see this feature as a good thing, both because of the way it helps you and because it gives you some insight into the sorts of correlations that it’s possible they’re doing behind your back and not exposing to the general public. As they say on Angry Mac Bastards, if a business isn’t charging you money, it’s because you’re not the customer, you’re the product.

How did Google find that?

Google has a blog post showing how they set up some fake search results, and then a short time later Bing started returning the same fake results, and therefore they suspect IE8’s “Suggested Sites” and/or Bing’s “Customer Experience Improvement Program” is spying on what you click and sending the results off to Microsoft.

But before Google gets all high and mighty, I want to tell you about what happened to me. I did some documentation for a customer I was doing some work for. I did it in the form of a TiddlyWiki and stuck it up on a brand new, never used before subdomain of my main domain. Well, she hated it and asked that I do it as a Word document instead, which I did. But I forgot to take it down. No problem, I thought, after all nothing links to it or mentions it in any public place, so how would a crawler find it?

Imagine my surprise when the customer called me up some time later saying that this old version of the documentation, in a subdirectory on an unlinked site, was showing up in Google searches for her product’s name. How did that happen? Using the advanced search, I couldn’t find anything that linked to it. There was one mention of that domain in a forum post, but in that case I was using the :8080 port because I was referring to the Tomcat server that was also running on that domain.

So as I see it, the choices are:

  • Google saw the mention of the domain in the middle of a forum post, recognized it as a URL (it wasn’t a link), stripped out the :8080 and crawled the site
  • Google saw the URL in a link I sent in a Gmail message to the customer and used that as an excuse to crawl the site
  • IE reported the link to Bing when the customer clicked on it, and then Google stole it from Bing somehow
  • Chrome reported the link to Google when I clicked on it

Whichever it was, they’re crawling things that aren’t public links. Methinks Google doth protest too much.