Contributing to Project Honeypot

Spammers are one of the most annoying natural enemies of the blogging community. They waste the time of site administrators who must install anti-spam systems and dig through suspicious comments to pick out real ones. They waste the time of users who are forced to jump through hoops like site registration and CAPTCHAs.

One way to help fight spam is to participate in Project Honeypot. If you run a website, the project will give you a script to install on it. Then, you add links to the script that robots will follow, but people will not. This allows the project to catalogue the IP addresses of robots, as well as track the general spam problem globally. People who run websites but don’t control the hosting (for instance, people with blogs on Blogger.com or WordPress.com) can add ‘QuickLinks’ which serve a similar function.

People running WordPress blogs can also use the http:BL WordPress Plugin to take advantage of Project Honeypot’s data and block spammers and harvesters of email addresses.
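
For the curious, here is a rough sketch of the kind of lookup such a plugin performs. It rests on my understanding of the http:BL service, which should be checked against Project Honeypot's own documentation: a DNS query is made for the access key, followed by the visitor's IPv4 address with its octets reversed, followed by dnsbl.httpbl.org, and the answer is a pseudo-address of the form 127.days.threat.type. The function names and the sample key below are made up for illustration, and no network request is made.

```python
import ipaddress


def httpbl_query_name(access_key: str, ip: str) -> str:
    """Build the DNS name to look up for a visitor's IPv4 address.

    Assumed format: <key>.<reversed octets>.dnsbl.httpbl.org
    """
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join([access_key, *reversed(octets), "dnsbl.httpbl.org"])


def parse_httpbl_answer(answer: str) -> dict:
    """Decode an answer of the assumed form 127.<days>.<threat>.<type>."""
    first, days, threat, visitor_type = (int(part) for part in answer.split("."))
    if first != 127:
        raise ValueError("not a valid http:BL answer")
    return {
        "days_since_last_activity": days,
        "threat_score": threat,
        # The type octet is assumed to be a bitmask; 0 means a search engine,
        # in which case the other octets carry different meanings.
        "is_search_engine": visitor_type == 0,
        "is_suspicious": bool(visitor_type & 1),
        "is_harvester": bool(visitor_type & 2),
        "is_comment_spammer": bool(visitor_type & 4),
    }


if __name__ == "__main__":
    # "abcdefghijkl" is a placeholder, not a real access key.
    print(httpbl_query_name("abcdefghijkl", "203.0.113.7"))
    print(parse_httpbl_answer("127.3.20.6"))
```

A real plugin would pass the query name to a DNS resolver and block or challenge the visitor depending on the decoded threat score and type.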

Setting up a honeypot only takes a couple of minutes, and gives the satisfaction of knowing you are helping to make the internet a slightly more civil place. In addition to running a honeypot and using the http:BL plugin, this site has a wiki protected with Bad Behaviour, a blog protected with Akismet, and spam defences built into .htaccess.

Threaded comments

WordPress now supports the option of threaded comments, where people can respond to a specific comment in a sub-thread, rather than just adding to the bottom of a single list.

Do people think incorporating this feature would improve this site, or make it less functional?

I would have no objections to giving it a whirl if doing so was easily reversible, but it seems certain that any switch back to linear comments would turn threaded conversations into confusing messes. As such, I would have to be pretty certain the shift would be beneficial in order to make it.

Microsoft’s imitation Google

Microsoft’s new Bing search engine is a bit bewildering. To call it an homage to Google is an understatement: it comes complete with ‘Web,’ ‘Images,’ ‘News,’ ‘Maps,’ etc. across the top bar. While the bird’s eye feature in Bing Maps is a bit neat (it seems like it might be based on HDR images), one cannot easily shake the feeling that Microsoft decided to respond to Google’s approach by outright copying it. The only oddity is that, because I have my Windows language set to British English (so it knows how to spell ‘colour’), Bing thinks I am in the UK, and the site offers me no option for showing Canadian results or news. Not very clever, given the ease with which an IP address can be turned into a location.

Has anybody discovered any Bing feature that is either quite different from or better than a Google offering? Hotmail certainly cannot begin to touch the searchable glory that is Gmail.

Hashing with Wolfram Alpha

Separately, I have discussed both the Wolfram Alpha computational knowledge engine and the practice of hashing information. The fact that WA lets anyone compute a hash easily is relevant to things like making bets online, in situations where players want to conceal their guesses until everyone else has put theirs up.

Here is an example. Say you want to place bets on who will win the next Republican presidential primary. You don’t want those who post later to have the advantage of knowing what others have already posted, so you do the following:

  1. Choose a hash algorithm (MD5 should be fine, but SHA is more secure)
  2. Have each participant put their guess into WA. Say I think it will be Sarah Palin. I would enter ‘SHA “I think the primary winner will be Sarah Palin, though I fear what she will do with the country”’ into Wolfram Alpha, and it would spit out something like “f7ca 4adf 11c7 5b56 f355 1635 5b50 2eca 5950 5349”.
  3. Note that the supplementary text, in addition to the name, is vital. Otherwise, it would be trivially easy for the other players to check the hashes for likely guesses and learn what people have chosen. Incorporating a salt into the hashing algorithm would be ideal, but WA doesn’t seem to have that capability.
  4. Have each participant post the hash of their response, saving the exact text somewhere secure to them.
  5. When the outcome is known, those who guessed correctly can confirm that fact, by providing text that hashes into their original post.
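
The same commit-and-reveal steps can be tried offline. What follows is a minimal sketch using Python's standard hashlib module, assuming SHA-1 (the example output above is 160 bits long, which is consistent with it) and UTF-8 encoding. The function names are my own, and the digest it prints will not necessarily match what Wolfram Alpha returns for the same sentence, since that depends on exactly how WA encodes the input.

```python
import hashlib


def commit(guess_text: str) -> str:
    """Return the hex digest a participant would post publicly."""
    return hashlib.sha1(guess_text.encode("utf-8")).hexdigest()


def verify(posted_hash: str, revealed_text: str) -> bool:
    """Check that the revealed text hashes to what was posted earlier."""
    return commit(revealed_text) == posted_hash


if __name__ == "__main__":
    text = ("I think the primary winner will be Sarah Palin, "
            "though I fear what she will do with the country")
    posted = commit(text)   # step 4: post this, keep `text` private
    print(posted)
    print(verify(posted, text))                               # step 5: reveal
    print(verify(posted, "I think it will be someone else"))  # altered claim
```

Any change to the revealed text, even a single character, produces a different digest, which is why the exact wording has to be saved.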

A somewhat roundabout and nerdy solution to a relatively unimportant problem, perhaps, but it illustrates some of the ways hashes can be used to prove what you said earlier, without having the content of your earlier message immediately accessible – a general ability with many applications.

One more fact about salts: they are the most straightforward way to foil attacks using rainbow tables.
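
To make the idea concrete, here is a small sketch of a salted commitment, again with Python's standard library. The scheme shown (a random hex salt joined to the guess with a colon, hashed with SHA-256) is just one reasonable convention I have assumed for illustration, not something Wolfram Alpha provides.

```python
import hashlib
import secrets


def commit_with_salt(guess: str) -> tuple[str, str]:
    """Return (salt, digest). Post the digest; keep the salt and guess private."""
    salt = secrets.token_hex(16)  # 128 random bits, hex-encoded
    digest = hashlib.sha256(f"{salt}:{guess}".encode("utf-8")).hexdigest()
    return salt, digest


def verify_with_salt(posted_digest: str, salt: str, guess: str) -> bool:
    """Recompute the digest from the revealed salt and guess."""
    digest = hashlib.sha256(f"{salt}:{guess}".encode("utf-8")).hexdigest()
    return digest == posted_digest


if __name__ == "__main__":
    salt, digest = commit_with_salt("Sarah Palin")
    print(digest)
    print(verify_with_salt(digest, salt, "Sarah Palin"))
```

Because the salt is random, two people who make the same short guess post different digests, and a precomputed table of hashes for likely names is of no use.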

Colour-based Google image searches

Google Image Search now lets you restrict results to images dominated by one of twelve different colours. For instance, the set of all photos from my site they have indexed can be restricted to just those with red highlights or those dominated by blue.

All told, Google currently includes 204 images from my site in their index. Here is the colour breakdown:

  • Red: 10
  • Teal: 7
  • White: 11
  • Orange: 17
  • Blue: 25 (lots of the sky)
  • Grey: 41 (many of them in black and white)
  • Yellow: 2
  • Purple: 2
  • Black: 47
  • Green: 8
  • Pink: 0
  • Brown: 45

You can also search for various image types: news content, faces, clip art, line drawings, and photo content.

As ever, Google Image Search is a somewhat perplexing creation. It’s not clear why it selects the photos it does or how it ranks them. I look forward to further improvements in the service.

Playing with Wolfram Alpha

Wolfram Alpha is based on a rather neat idea: making a website that can actually deal with information in an intelligent way, rather than simply search for words or phrases in existing pages. Put in ‘1 kg gold‘ and it will tell you that it would form a sphere 2.313cm in radius, a cube 3.73cm to a side, and cost US$32,520. Put in ‘running 10 km/h 60 minutes 6’0″ 185lbs age 25 male‘ and it will estimate the number of calories expended. It doesn’t know about cycling yet, unfortunately. It doesn’t seem to be able to do calculations on greenhouse gas emissions yet, either, though it will tell you that mixtures of air and methane in which the methane is between 5% and 15% of the total will explode if exposed to a temperature of 595˚C. It also knows that Apple has 35,100 employees and a current P/E ratio of 23.1. It can search for base pair sequences within the human genome.
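
The gold figures are easy to check by hand. The sketch below assumes a density of 19.3 g/cm³ for gold (my figure, not one quoted by Wolfram Alpha) and works out the volume of one kilogram, the side of the equivalent cube, and the radius of the equivalent sphere. With that density, the 3.73 cm cube side is reproduced, and 2.313 cm turns out to be the sphere's radius; its diameter is about 4.63 cm.

```python
import math

# Assumed density of gold in grams per cubic centimetre.
GOLD_DENSITY_G_PER_CM3 = 19.3


def gold_shapes(mass_g: float) -> tuple[float, float, float]:
    """Return (volume in cm^3, cube side in cm, sphere radius in cm)."""
    volume = mass_g / GOLD_DENSITY_G_PER_CM3
    cube_side = volume ** (1 / 3)
    # Sphere volume is (4/3) * pi * r^3, solved here for r.
    sphere_radius = (3 * volume / (4 * math.pi)) ** (1 / 3)
    return volume, cube_side, sphere_radius


if __name__ == "__main__":
    volume, side, radius = gold_shapes(1000.0)
    print(f"volume: {volume:.1f} cm^3")
    print(f"cube side: {side:.2f} cm")
    print(f"sphere radius: {radius:.3f} cm, diameter: {2 * radius:.2f} cm")
```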

The biggest limitation of the site is phrasing things in a way it interprets properly. Indeed, most of the searches I try produce only the message: “Wolfram|Alpha isn’t sure what to do with your input.” For the time being, Wolfram Alpha is less of an open-ended vehicle for computations and data access, and more a set of discrete, purpose-built tools for existing tasks. If you know which tools exist and how to format input for them, it works well. If you are trying to get it to do something its designers didn’t anticipate, it probably won’t work.

In short, Wolfram Alpha is a fun thing for statistics nerds to play around with, and could be genuinely useful for research. The best way to appreciate its current capabilities is to watch this introductory screencast.

Downloading from YouTube and Megavideo

There are lots of sites out there that present videos in a non-downloadable way, wrapped in flash video players. For instance, there are Megavideo and YouTube. For those running Safari, there is an easy way to download these videos:

  1. On the page with the video, click ‘Window’ and then ‘Activity.’
  2. Scroll through the list until you find something plausibly large (at least a few megabytes).
  3. Select that item and copy it, either with the hotkey or from ‘Edit > Copy.’
  4. Click ‘Window’ and then ‘Downloads.’
  5. Paste the item, with either the menu or the hotkey.
  6. The file will download to your desktop.
  7. Add the extension .flv to the file.
  8. The file can then be played in video players like VideoLAN.

The same trick can be used to grab other forms of embedded media from all sorts of websites, and it’s much easier than digging around in search of temporary internet files.

Down the west coast by public transit

As reported on Tristan’s blog, my friend Mike is in the process of travelling from Vancouver to Portland, by public transit alone. Apparently, this is possible because the transit systems of successive places overlap.

You can follow the journey via Twitter, or through the blog they have been updating several times a day: The I-5 Chronicles