Fun with statistical clustering

I have a 60 GB iPod with 10,000+ songs on it. Every day when I get to work I hit “Shuffle Songs” and probably hear 100 to 125 of them during the course of the day. You’d think that with such a vast library to draw from, it would be rare indeed to hear two songs from the same album in one day. But that doesn’t appear to be the case. Monday, I heard at least three different songs from Shonen Knife, even though they make up only 46 of the 10,000 songs. Tuesday it seems like I heard a lot of classical, including at least two Bach piano concertos (although only one was Glenn Gould). I’ve definitely heard at least one Johnny Cash song on each of the past three days.

Very odd. The mind is pretty amazing at finding patterns in randomness, isn’t it?

4 thoughts on “Fun with statistical clustering”

  1. Just looking at one of those examples, I figure the odds of hearing two Shonen Knife songs in one day, given you’ve heard one (got to have one before you can discern a “pattern” after all) thus:

    The chance a song is not by Shonen Knife is going to be (10,000 – 46) / 10,000 = 0.9954.

    The chance that all of the other 99 songs are not by Shonen Knife is going to be 0.9954^99 = 0.63, so the chance of hearing at least one more is 1 – 0.63 = 0.37.

    If I’m thinking straight, the odds you’ll hear three are 0.37 * (1 – 0.9954^98) = 0.13.

    If Shonen Knife’s tally of 46 tracks were representative of other artists on your ying-tong-iddle-iPod, roughly similar odds would apply to every other artist. In other words, once you’d heard one song by X, the chances of hearing two more are about one in seven. With presumably over 200 different artists represented…

    …no, I can’t remember how to work it out, but the odds are better than you think.

  2. I think the iPods just prefer to play certain artists or songs. Mine almost always goes through my entire Showtunes playlist before it plays anything from Rent.

  3. I don’t think it’s you. My iPod has a definite preference for some songs/artists/albums. I ended up defining all my playlists as “sort by least played”, and it’s helped a bit, but shuffle still isn’t random for me.

    I hear the new iTunes makes shuffle more configurable.
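
The back-of-the-envelope estimate in the first comment, including the multi-artist step it gives up on, can be checked with an exact count. What follows is only a sketch under stated assumptions: a day is taken to be exactly 100 songs (the low end of the post's 100 to 125), shuffle is taken to be a uniform draw without replacement, and every artist is assumed to have 46 songs like Shonen Knife. The function and variable names are made up for illustration, and the final step treats artists as independent, which is only approximately true.

```python
from math import comb

def prob_at_least(k_min, artist_songs, library, played):
    """Chance that at least k_min of an artist's songs turn up among
    `played` songs drawn without replacement from `library` songs
    (the upper tail of a hypergeometric distribution)."""
    total = comb(library, played)
    below = sum(
        comb(artist_songs, k) * comb(library - artist_songs, played - k)
        for k in range(k_min)
    )
    return 1 - below / total

LIBRARY = 10_000   # from the post
ARTIST_SONGS = 46  # Shonen Knife's share, from the post
PLAYED = 100       # assumption: low end of "100 to 125" songs a day

# Chance that one particular artist shows up three or more times in a day.
p_three = prob_at_least(3, ARTIST_SONGS, LIBRARY, PLAYED)

# The commenter's framing: one song by the artist has already been heard,
# so ask for at least two more among the other 99 plays.
p_two_more = prob_at_least(2, ARTIST_SONGS - 1, LIBRARY - 1, PLAYED - 1)

# Assumption: every artist has 46 songs, giving about 217 artists.
n_artists = LIBRARY // ARTIST_SONGS

# Rough chance that at least one artist shows up three or more times,
# treating artists as independent (an approximation).
p_any_artist = 1 - (1 - p_three) ** n_artists

print(f"one given artist, 3+ songs:     {p_three:.3f}")
print(f"two more, given one heard:      {p_two_more:.3f}")
print(f"some artist, 3+ songs in a day: {p_any_artist:.2f}")
```

Under these assumptions the chance for any one named artist comes out at roughly one percent, and the "two more, given one" figure lands below the comment's estimate. The chance that some artist or other turns up three times in a day is around nine in ten, which fits the post's point that such clusters are the norm and not a sign of a biased shuffle.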
