How did Google find that? – Rants and Revelations

Google has a blog post showing how they set up some fake search results, and a short time later Bing started returning the same fake results. Google therefore suspects that IE8's "Suggested Sites" and/or Bing's "Customer Experience Improvement Program" is spying on what users click and sending the results off to Microsoft.

But before Google gets all high and mighty, I want to tell you about what happened to me. I wrote some documentation for a customer I was doing work for. I did it in the form of a TiddlyWiki and put it up on a brand new, never-used-before subdomain of my main domain. Well, she hated it and asked me to do it as a Word document instead, which I did. But I forgot to take the wiki down. No problem, I thought: nothing links to it or mentions it in any public place, so how would a crawler find it?

Imagine my surprise when the customer called me some time later to say that this old version of the documentation, sitting in a subdirectory on an unlinked site, was showing up in Google searches for her product's name. How did that happen? Using advanced search, I couldn't find anything that linked to it. There was one mention of that domain in a forum post, but there I had used port :8080, because I was referring to the Tomcat server that was also running on that domain.

So as I see it, the choices are:

Google saw the mention of the domain in the middle of a forum post, recognized it as a URL (it wasn't a link), stripped out the :8080, and crawled the site.
Google saw the URL in a link I sent to the customer through Gmail and used that as an excuse to crawl the site.
IE reported the link to Bing when the customer clicked on it, and then Google somehow stole it from Bing.
Chrome reported the link to Google when I clicked on it.

Whichever it was, they're crawling things that aren't public links. Methinks Google doth protest too much.
