The terms people arriving at our site through search.live.com are just... weird, though. Most are outright vulgar, searching for obscure pornography or celebrity names, drugs, sex aids... Here's an few examples:
/log/trunk Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR
220.127.116.11 http://search.live.com/result.aspx?q=breast+enhancement&mrt=en-us&FORM=LVSP /
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; Win64; x64; SV1)
/browser/media/tutorials Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2;
.NET CLR 1.1.4322)
These hits were to small obscure pages on our site, such as svn changelogs, and then I noticed the source IP addresses were in just a few IP ranges, so I ran a whois to see if one ISP tied connected them all;
arc@sobek ~/work/pysoy $ whois 18.104.22.168
OrgName: Microsoft Corp
Address: One Microsoft Way
NetRange: 22.214.171.124 - 126.96.36.199
You read that correctly. Microsoft, in a desperate attempt to make themselves seem more important, or perhaps just to flood free software project's websites with unwanted traffic, is running bots which act like normal web crawlers. Indeed, over 97% of the hits we got from search.live.com were from Microsoft's own IP subnets. Searching Google, I found this story was previously covered by others more observant of their logs.
In response, I'm adding a special rule to block all future traffic from the offending netblocks, including MICROSOFT 188.8.131.52 - 184.108.40.206 and MICROSOFT-1BLK 220.127.116.11 - 18.104.22.168.