Random numbers

Truly random numbers are hard to find, as patterns tend to abound everywhere. This is problematic, because there are times when a completely random string of digits is necessary: whether you are choosing the winner of a raffle or generating the one-time pad that secures the line from the White House to the Kremlin.

Using random radio crackle, random.org promises to deliver random data in a number of convenient formats (though one should naturally be skeptical about the security of such services). Another page, by Jon Callas, provides further information on why random numbers are both necessary and surprisingly tricky to get.
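For anyone who would rather not trust a website, here is a minimal sketch of the two examples mentioned above (the raffle and the one-time pad), assuming Python's standard secrets module is available. It draws on the operating system's cryptographic random source, which is not the same thing as radio crackle, and the function names are hypothetical ones chosen for illustration.

```python
import secrets


def pick_raffle_winner(entrants):
    """Choose one entrant using the operating system's random source."""
    return secrets.choice(entrants)


def one_time_pad(length):
    """Generate a pad of random bytes, one byte per byte of message."""
    return secrets.token_bytes(length)


def xor_bytes(data, pad):
    """Combine data with the pad; applying the same pad twice undoes it."""
    if len(pad) < len(data):
        raise ValueError("pad must be at least as long as the message")
    return bytes(d ^ p for d, p in zip(data, pad))


# Illustrative data only: a made-up list of entrants and a short message.
entrants = ["Alice", "Bob", "Carol"]
winner = pick_raffle_winner(entrants)

message = b"MEET AT NOON"
pad = one_time_pad(len(message))
ciphertext = xor_bytes(message, pad)
recovered = xor_bytes(ciphertext, pad)
```

The pad is only as good as its secrecy and its single use; reusing it for a second message defeats the purpose.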

This comic amusingly highlights another aspect of the issue.

Facebook ecosystem

Milan’s Facebook ecosystem

It is nearly always interesting to see complex data presented in a new way – particularly as a visualization. The way this one arose was actually very mathematical, based on equations for modeling the strength of electromagnetic fields.

The dense cluster on the left is a tangle of high school and undergrad. The much smaller grouping on the right is Wadham College, Oxford. All around the edges and bottom are relatively or entirely isolated people – evidence of how many people I meet at random and at a sufficient level to warrant a Facebook linkage.

Encrypting personal communication

Statue outside the National Archives

Personal use of encrypted communication is yet another example of so-called ‘network effects.’ (These have been mentioned previously: 1, 2, 3.) The basic idea is that the more widespread certain technologies become, the more useful they are to everyone using them. The most commonly cited examples are telephones and fax machines; back when only a few people had them, they had limited utility. You would need alternative channels of communication and you would waste time deciding which one to use and exchanging instructions about that with other parties. Once telephones became ubiquitous, each one was a lot more powerful and convenient. The same can be said for email addresses.

Good free software exists that allows the encryption of emails at a level where it would challenge major organizations to read them. While this may not protect an individual message that falls under scrutiny, it changes the dynamic of the whole system. It is no longer possible to filter every email passing along a fibre-optic cable for certain keywords, for instance. You would need to crack every one of them first.

Making the transition to the routine use of encryption, however, requires more effort than the adoption of telephones or email. While those technologies were more convenient than their predecessors, encryption adds a layer of difficulty to communication. You need to have the required software, key pairs generated, and passphrases. It is possible to make mistakes and encrypt things such that you can never access them again.

As such, there is a double barrier to the adoption of widespread communication encryption: people must deal with the added difficulties involved in communicating in this way and with the problem that hardly anyone uses such systems now. If there is nobody out there with whom you can exchange PGP encrypted messages, you aren’t too likely to bother with acquiring and using the software. It is entirely possible that those two constraints will prevent widespread adoption for the foreseeable future.

One nice exception to this rule is Skype. Users may not know it, but calls made over Skype are transmitted in encrypted form, very considerably increasing the difficulty of intercepting them. The fact that users do not know this is happening greatly increases the level of usage (you cannot avoid using it). While such systems may well not be as secure as explicit encryption efforts undertaken by senders and recipients, they may be a useful way to increase overall adoption of privacy technology. Such ‘invisible encryption’ could also be usefully incorporated into stores of personal data, such as the contents of GMail accounts.

PS. For anyone who decides to give PGP a try, my public key is available here.

Footprints all over the web – Google Web History

Red brick facade and fire escapes

When I am online, I usually have at least one Google service open. At home, I usually have a Google Mail window open at all times, as well as Google Calendar. At work, it is only the latter. What I didn’t know until today is that whenever you are logged into your Google account, Google is tracking your web usage through a system called Web History. Accessing the system allows you to ‘pause’ the recording and even delete what is already there. While the listings disappear from your screen, there is good reason to doubt whether they vanish from Google’s records.

It is common knowledge that Google saves every search query that gets input into it, and does so in a way that can be linked to an individual computer. The web history service, however, has more troubling implications. Whether you are at work, at home, or at an internet cafe, you just need to be logged into any Google service for it to be operating. Since more than one computer can be logged into a Google account at once, and there is no indication on either machine that this is happening, anybody who gets your password can monitor your web usage, as well as your email and any other Google services you use. Given how common keyloggers have become, this should worry people.

One very helpful feature Google could implement would be the option to show when and where you last logged into your account. That way, if someone has been peeking at your email from London while you have been in Seattle, you know that it may be time to change your password. Also desirable, but much less likely to happen, would be a requirement that services like GMail store your information as an encrypted archive. Even if the encryption were based on your password and a relatively weak cipher, it would make it impractical for either Google or malicious agents with access to their information storage systems to undertake the wholesale mining of the information therein.

The final reason this is concerning has to do with cooperation between companies and governments. It is widely rumoured that companies including Microsoft and Yahoo have helped the Chinese government to track down and prosecute dissidents, by turning over electronic records held outside China. Given the increasingly bold snooping of both democratic and authoritarian governments, a few more layers of durable protection built into the system would be prudent and encouraging.

Grist for the mill

Fire at Booth, near Somerset

Here is an interesting article about the ongoing debates about ethical food and climate change: “The Eat-Local Backlash.” Such articles demonstrate how fiendishly complicated it can be to make personal environmental decisions. Questions about which of two options has the lesser environmental effect can rarely be definitively answered, not least because there are so many different types of environmental effects, ranging from air and water pollution to climate change and loss of biodiversity. This article is from a site called Grist, which has recently joined the ranks of those I consult most frequently and read most carefully. Their analysis isn’t always terrific, but the place has a lot of life.

Indeed, the site itself demonstrates the benefits of aggregation (one argument against local food). Rather than having the attention of a few hundred people spread between a few dozen environmental blogs, each getting a couple hundred hits a day, this provides a much more concentrated conversation. I encourage those interested in environmental issues to join and start commenting.

Transitioning from transition

After a month on the job, this no longer feels like a “weblog in transition.” As such, I need to come up with a new secondary title. Given how it is the first piece of information most people absorb about the site – after a general appreciation for the layout and style – it is important to tune it correctly. Given the diverse areas of interest explored here, I am not sure what would be most suitable. What I do know is that I don’t want it to mention my area of employment, because I do not want that to be an important feature of what happens here.

Do people have any suggestions? The cleverer the better. Work is also being done on a new banner.

Ottawa blogs

Within a few months of arriving in Oxford, I had sorted out which blogs were worth reading. So far, I have not stumbled across any good Ottawa blogs. Does anybody know of any? Environment blogs, photo blogs, food blogs, travel blogs – all of these are potentially interesting. Personal blogs are better than pundit blogs. High quality writing is the key factor, along with some local information.

Strengthening substitution ciphers

Fountain in Gatineau

The biggest problem with substitution ciphers (those that replace each letter with a particular other letter or symbol) is that they are vulnerable to frequency analysis. In any language, some letters are more common than others. By matching up the most common symbols with what you know the most common letters are, you can begin deciphering the message. Likewise, you can use rules like ‘a rare letter that almost always appears to the left of one specific more common letter is probably a Q.’ What is needed to strengthen such ciphers is a language in which words have no such ‘personality.’ Here is how to do it:

First, take all the short words (fewer than three letters) and assign each of them a random three letter code. Lengthening very short words further strengthens this approach because short words are the most vulnerable to frequency analysis; a single letter sitting with spaces on either side is probably ‘a’ or ‘i.’ Using three letter groups and 26 letters, you can assign 17,576 words. Now, take as many words from the whole language as you want to be able to use. For the sake of completeness, let’s use the entire Oxford English Dictionary. The 456,976 possible four letter groups more than suffice to cover every word in it, leaving some space for technical terms that we may want to encrypt but which might not be included. If we need even more possibilities, there are 11,881,376 five letter combinations.
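The arithmetic is easy to check, and the codebook itself is simple to sketch. What follows is a rough illustration rather than a finished tool: the function names and the tiny word list are hypothetical, and it assumes a plain 26 letter uppercase alphabet.

```python
import secrets
import string
from itertools import product

ALPHABET = string.ascii_uppercase  # the 26 letters assumed above


def group_count(length):
    """Number of distinct letter groups of the given length."""
    return len(ALPHABET) ** length


def build_codebook(words, length):
    """Pair each distinct word with a distinct, randomly chosen letter group."""
    words = sorted(set(words))
    if len(words) > group_count(length):
        raise ValueError("not enough groups of this length for every word")
    # Enumerate every possible group, then shuffle using the OS random source.
    # Fine for lengths 3 and 4; length 5 means nearly twelve million strings.
    groups = ["".join(letters) for letters in product(ALPHABET, repeat=length)]
    secrets.SystemRandom().shuffle(groups)
    return dict(zip(words, groups))


# Illustrative word list only: a handful of the short words discussed above.
short_words = ["a", "i", "of", "to", "in", "is", "it"]
short_codebook = build_codebook(short_words, 3)

three_letter_groups = group_count(3)
four_letter_groups = group_count(4)
five_letter_groups = group_count(5)
```

The three counts come out to 17,576, then 456,976, then 11,881,376, matching the figures in the text.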

This approach is cryptographically valuable for a number of reasons. Since the codes representing words have a random collection of letters, the letter frequency in a ‘translated’ message is also random. You no longer need to worry that some English letters are more common than others. Just as important, there are none of the ‘Q’ type rules by which to later attack the substitution cipher. The dictionary of equivalencies would not need to be secret; indeed, it should be widely available. Having the dictionary does not make encrypted messages more vulnerable, since they will have passed through a substitution cipher before being distributed and are fundamentally more robust to the cryptanalysis of substitution ciphers than a message enciphered from standard English would be.
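To make the two stages concrete, here is a minimal sketch of the whole pipeline: words become code groups, and the groups then pass through an ordinary substitution cipher. Several things here are assumptions made for the sake of a short demonstration. Every word gets a four letter group (rather than three for the short ones), the groups are run together without spaces, the sample sentence is invented, and a seeded generator is used so the demo is repeatable, which would be unsuitable for real use.

```python
import random
import string
from collections import Counter

ALPHABET = string.ascii_uppercase
GROUP_LEN = 4  # simplification: one fixed group length for every word


def make_codebook(words, rng):
    """Assign each distinct word a distinct random letter group (public)."""
    codebook, used = {}, set()
    for word in sorted(set(words)):
        group = "".join(rng.choice(ALPHABET) for _ in range(GROUP_LEN))
        while group in used:  # redraw on the rare collision
            group = "".join(rng.choice(ALPHABET) for _ in range(GROUP_LEN))
        used.add(group)
        codebook[word] = group
    return codebook


def make_substitution_key(rng):
    """Build a random letter-for-letter substitution table (secret)."""
    shuffled = list(ALPHABET)
    rng.shuffle(shuffled)
    return str.maketrans(ALPHABET, "".join(shuffled))


def encrypt(text, codebook, key):
    """Replace words with code groups, then apply the substitution cipher."""
    groups = "".join(codebook[word] for word in text.lower().split())
    return groups.translate(key)


def decrypt(ciphertext, codebook, key):
    """Undo the substitution, split into groups, and look each word up."""
    inverse_key = {cipher: plain for plain, cipher in key.items()}
    groups = ciphertext.translate(inverse_key)
    reverse_book = {group: word for word, group in codebook.items()}
    chunks = [groups[i:i + GROUP_LEN] for i in range(0, len(groups), GROUP_LEN)]
    return " ".join(reverse_book[chunk] for chunk in chunks)


def letter_counts(text):
    """Tally letters, the raw material of frequency analysis."""
    return Counter(ch for ch in text.upper() if ch in ALPHABET)


rng = random.Random(2007)  # seeded for a repeatable demo only
text = "the cat sat on the mat and the dog sat on the log"
codebook = make_codebook(text.split(), rng)
key = make_substitution_key(rng)
ciphertext = encrypt(text, codebook, key)
plain_counts = letter_counts(text)
cipher_counts = letter_counts(ciphertext)
```

Comparing plain_counts with cipher_counts on a longer text shows how the letter tallies change. One thing the sketch makes visible is that a repeated word always produces the same group, so repetition at the level of whole words survives the translation.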

In the era of modern algorithms like AES, I doubt there is any need for the above system. Still, I wonder if there are any historical examples of this approach being used. If you have a computer to do the code-for-word and word-for-code substitutions, it would be quite a low effort mechanism to increase security.