Unicity distance

Sky, moon, and wires

In order to decipher a secret message through cryptanalysis, you need a sufficient quantity of data to evaluate whether it has been done properly. If all a cryptanalyst has to work with is enciphered text (say, in the form of an intercepted message), the attempt to decipher it is called a ciphertext-only attack. For a variety of reasons, these are very tricky things to accomplish. The element described below is one of the most basic.

In order to understand why a message of sufficient length is important, consider a message that consists only of a single enciphered phone number: “724-826-5363.” These numbers could have been modified in any of a great number of ways: for instance, adding or subtracting a certain amount from each digit (or alternating between adding and subtracting). Without knowing more, or being willing to test lots of candidate phone numbers, we have no way of learning whether we have deciphered the message properly. On the basis of the ciphertext alone, 835-937-6474 is just as plausible as 502-604-3141.
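The ambiguity is easy to demonstrate. Here is a minimal sketch of the digit-shifting scheme described above; the two candidate keys shown are simply illustrative choices that happen to reproduce the two numbers from the example:

```python
def shift_digits(number: str, key: int) -> str:
    """Shift every digit in a phone number by `key`, modulo 10,
    leaving non-digit characters (like dashes) untouched."""
    return "".join(
        str((int(c) + key) % 10) if c.isdigit() else c
        for c in number
    )

ciphertext = "724-826-5363"
# Two different keys yield two equally plausible "plaintexts":
print(shift_digits(ciphertext, 1))   # → 835-937-6474
print(shift_digits(ciphertext, -2))  # → 502-604-3141
```

Nothing in the ciphertext itself tells us which key (if either) is the right one.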

Obviously, this is only a significant problem for short messages. One could imagine ways in which BHJG could mean ‘HIDE’ or ‘TREE’ or ‘TRAP.’ The use of different keys with the same algorithm could generate any four-letter word from that ciphertext. Once we have a long enough enciphered message, however, it becomes a lot more obvious when we have deciphered it properly. If I know that the ciphertext:

UUEBJQPWZAYIVMNAZSUQPYJVOMDGZIQHWZCX

has been produced using the Vigenère cipher, and I find that it deciphers to:

IAMTHEVERYMODELOFAMODERNMAJORGENERAL

when I use the keyword MUSIC, it is highly likely that I have found both the key and the unenciphered text.
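This check is easy to reproduce. A minimal sketch of Vigenère decryption in Python, assuming the ciphertext is uppercase A–Z with no spaces or punctuation:

```python
def vigenere_decrypt(ciphertext: str, keyword: str) -> str:
    """Decrypt a Vigenère ciphertext (A-Z only) with the given keyword:
    each letter is shifted back by the corresponding keyword letter."""
    plain = []
    for i, c in enumerate(ciphertext):
        shift = ord(keyword[i % len(keyword)]) - ord("A")
        plain.append(chr((ord(c) - ord("A") - shift) % 26 + ord("A")))
    return "".join(plain)

ciphertext = "UUEBJQPWZAYIVMNAZSUQPYJVOMDGZIQHWZCX"
print(vigenere_decrypt(ciphertext, "MUSIC"))
# → IAMTHEVERYMODELOFAMODERNMAJORGENERAL
```

With thirty-six characters of English emerging cleanly, the odds that MUSIC is the wrong key are vanishingly small.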

This concept is formalized in the idea of unicity distance, introduced by Claude Shannon in the 1940s. Unicity distance describes the amount of ciphertext that we must have in order to be confident that we have found the right plaintext. This is a function of two things: the entropy of the plaintext message (something written in proper English is far less random than a phone number) and the length of the key being used for encryption.

To calculate the unicity distance for a message written in English, divide the length of the key in bits (say, 128 bits) by 6.8 (which is a measure of the level of redundancy in English). With about eighteen characters of ciphertext, we can be confident that we have found the correct message and not simply one of a number of possibilities, as in the phone number example. By definition, compressed files have redundancy removed; as such, you may want to divide the key length by about 2.5 to get their unicity distance. For truly random data, the level of redundancy is zero, so the unicity distance is infinite. If I encipher a random number and send it to you, a person who intercepts it will never be able to determine – on the basis of the ciphertext alone – whether they have deciphered it properly.
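The arithmetic is simple enough to sketch (the 6.8 and 2.5 figures are the rough per-character redundancy estimates used above):

```python
def unicity_distance(key_bits: float, redundancy_bits_per_char: float) -> float:
    """Shannon's unicity distance: key entropy in bits divided by
    the redundancy of the plaintext, in bits per character."""
    if redundancy_bits_per_char == 0:
        return float("inf")  # truly random plaintext: no amount of ciphertext suffices
    return key_bits / redundancy_bits_per_char

print(round(unicity_distance(128, 6.8), 1))  # English text: ≈ 18.8 characters
print(round(unicity_distance(128, 2.5), 1))  # compressed data: ≈ 51.2 characters
print(unicity_distance(128, 0))              # random data: inf
```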

For many types of data files, the unicity distance is comparable to that in normal English text. This holds for word processor files, spreadsheets, and many databases. Actually, many types of computer files have significantly smaller unicity distances because they have standardized beginnings. If I know that a file sent each morning begins with: “The following is the weather report for…” I can determine very quickly if I have deciphered it correctly.

Actually, the last example is particularly noteworthy. When cryptanalysts are presented with a piece of ciphertext using a known cipher (say Enigma) and which is known to include a particular string of text (such as the weather report introduction), it can become enormously easier to determine the encryption key being used. These bits of probable text are called ‘cribs’ and they played an important role in Allied codebreaking efforts during the Second World War. The use of the German word ‘wetter’ at the same point in messages sent at the same time each day was quite useful for determining what that day’s key was.

Secrets and Lies

Ottawa church

Computer security is an arcane and difficult subject, constantly shifting in response to societal and technological forces. A layperson hoping to get a better grip on the fundamental issues involved can scarcely do better than to read Bruce Schneier’s Secrets and Lies: Digital Security in a Networked World. The book is at the middle of the spectrum of his work, with Beyond Fear existing at one end of the spectrum as a general primer on all security related matters and Applied Cryptography providing far more detail than non-experts will ever wish to absorb.

Secrets and Lies takes a systematic approach, describing types of attacks and adversaries, stressing how security is a process rather than a product, and explaining a great many offensive and defensive strategies in accessible ways and with telling examples. Schneier emphasizes the impossibility of preventing all attacks, and hence the importance of maintaining detection and response capabilities. He also demonstrates strong awareness of how security products and procedures interact with the psychology of system designers, attackers, and ordinary users. Most surprisingly, the book is consistently engaging and even entertaining. You would not expect a book on computer security to be so lively.

One critical argument Schneier makes is that the overall security of computing can only increase substantially if vendors become liable for security flaws in their products. When a bridge collapses, the construction and engineering firms end up in court. When a ten-year-old bug in Windows NT causes millions of dollars in losses for a company using it, Microsoft may see fit to finally issue a patch. Using regulation to structure incentives to shape behaviour is an approach that works in a huge number of areas. Schneier shows how it can be made to work in computer security.

Average users probably won’t want to read this book – though elements of it would probably entertain and surprise them. Those with an interest in security, whether it is principally in relation to computers or not, should read it mostly because of the quality of Schneier’s thought processes and analysis. The bits about technology are quite secondary and pretty easily skimmed. Most people don’t need to know precisely how smart cards or the Windows NT kernel are vulnerable; they need to know what those vulnerabilities mean in the context of how those technologies are used. Reading this book will leave you wiser in relation to an area of ever-growing importance. Those with no special interest in computers are still strongly encouraged to read Beyond Fear: especially if they are legislators working on anti-terrorism laws.

On technology and vulnerability

The first episode of James Burke’s Connections is very thought provoking. It demonstrates the inescapable downside of Adam Smith’s pin factory: while an assembly line can produce far more pins than individual artisans, each of the assembly line workers becomes unable to produce anything without the industrial network that supports their work.

See this prior entry on Burke’s series.

Protecting sources and methods

Rusty metal wall

By now, most people will have read about the Canadian pedophile from Maple Ridge who is being sought in Thailand. The story is a shocking and lamentable one, but I want to concentrate here on the technical aspect. INTERPOL released images of the man, claiming they had undone the Photoshop ‘twirl’ effect that had been used to disguise him in compromising photos. While this claim has been widely reported in the media, there is at least some reason to question it. It is possible that INTERPOL is concealing the fact that it received unaltered photos from another source, which could have been anything from intercepted emails to files recovered from an improperly erased camera memory card. An unaltered image could even have been recovered from the EXIF metadata thumbnails many cameras produce. It is also possible that this particular effect is so easy to reverse (and that the technique is so widely known to exist) that INTERPOL saw no value in keeping their methods secret. A quick Google search suggests that the ‘twirl’ effect is a plausible candidate for easy reversal.

Providing an alternative story to explain the source of information is an ancient intelligence tactic. For instance, during the Second World War an imaginary spy ring was created by the British and used to justify how they had some of the information that had actually been obtained through cracked ENIGMA transmissions at Bletchley Park. Some have argued that the Coventry Bombing was known about in advance by British intelligence due to deciphered messages, but they decided not to evacuate the city because they did not want to reveal to the enemy that their ciphers had been compromised. While this particular example may or may not be historically accurate, it illustrates the dilemma of somebody in possession of important intelligence acquired in a sensitive manner.

Cover stories can conceal sources and methods in other ways. A few years ago, it was claimed that Pervez Musharraf had escaped a bombing of his motorcade thanks to a radio jammer. While that is certainly possible, it seems unlikely that his guards would have reported the existence of the system if it had played such a crucial role. More likely, they got tipped off by an informant in the group responsible, an agent they had implanted in it, or some sort of communication intercept. Given how it is now widely known that email messages and phone calls worldwide are regularly intercepted by governments, I imagine a lot of spies and informants are being protected by false stories about communication intercepts.

In short, it is fair to say that any organization concerned with intelligence gathering will work diligently to protect its sources and methods. After all, these are what ensure its future access to privileged information. While there is a slim chance INTERPOL intentionally revealed their ability to unscramble photographs as some sort of deterrent, it seems unlikely. This situation will simply encourage people to use more aggressive techniques to conceal their faces in the future. It is also possible that, in this case, they felt that getting the man’s image out was more important than protecting their methods. In my opinion, it seems most likely that ‘twirl’ really is easy to unscramble and that they saw little value in not publicizing this fact. That said, it remains possible that a more complex collection of tactics and calculations has been applied.

Mac security tips

Gatineau Park, Quebec

During the past twelve months, 23.47% of visits to this blog have been from Mac users. Since there are so many of them out there, I thought I would share a few tips on Mac security. Out of the box, OS X does beat Windows XP on security – partly for design reasons and partly because it isn’t as worthwhile to come up with malware that attacks an operating system with a minority of users. Even so, taking some basic precautions is worthwhile. The number one tip is behavioural, rather than technical. Be cautious in the websites and emails you view, the files you download, and the software you install.

Here are more detailed guides from a company called Corsair (which I know nothing about) and from the American National Security Agency (who knew they used Macs?). The first is specific to Tiger (10.4), while the second is about the older Panther (10.3). I expect they will both remain largely valid for the upcoming Leopard (10.5).

Some more general advice I wrote earlier: Protecting your computer.

PS. I am curious about the one person in the last orbit who accessed this site using OS/2 Warp, back on February 17th. I hope it was one of the nuns from the ads.

Once more on the importance of backups

As mentioned before, the best defence against data loss from viruses or hardware damage is to make comprehensive, frequent backups. As such, I propose the following rule of thumb:

If a piece of data is worth more than the drive space it occupies, a second copy should exist somewhere else.

Nowadays, you can easily pick up hard drives for less than $1 per gigabyte. At those prices, it probably isn’t just personal photos and messages that are worth saving, but any bulk data (movies, songs, etc) that would take more than $1 per gigabyte in effort to find and download again.

Mac users should consider downloading Carbon Copy Cloner. It produces bootable byte-for-byte copies of entire drives. That means that even if the hard drive in your computer dies completely and irreparably, you can run your system off an external hard drive, with all the data and functionality it possessed when you made the most recent copy.

One nice perk of having one or more such copies is that they let you undo mistakes. If you accidentally erase or corrupt an important file, you can go back and grab it. Likewise, if you install a software update that proves problematic, you can shift your entire system back to an earlier state.

[Update: 22 January 2010] Since I wrote this article, Apple released new versions of OS X with their excellent Time Machine backup software built-in. I strongly encourage all Mac users to take advantage of it.

A suggestion to Google

One cool feature of Google is that it performs unit conversions. It makes it easy to learn that 1000 rods is the same as 2750 fathoms. One useful addition would be the calculation of carbon dioxide equivalents: you could plunk in “250 tonnes of methane in CO2 equivalent” and have it generate the appropriate output, based on the methodology of the IPCC. The gases for which the calculator should work would also include nitrous oxide, SF6, HCFCs, HFCs, CFCs, and PFCs.
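As a sketch of what such a calculator would do under the hood, here is the conversion using the 100-year global warming potentials from the IPCC’s Fourth Assessment Report (the exact values differ between assessment reports, so treat these as illustrative):

```python
# 100-year global warming potentials (GWP) from the IPCC Fourth
# Assessment Report (AR4); other reports give somewhat different values.
GWP_100 = {
    "co2": 1,
    "methane": 25,
    "nitrous oxide": 298,
    "sf6": 22800,
}

def co2_equivalent(tonnes: float, gas: str) -> float:
    """Convert a mass of a greenhouse gas to tonnes of CO2 equivalent
    by multiplying by its 100-year global warming potential."""
    return tonnes * GWP_100[gas.lower()]

print(co2_equivalent(250, "methane"))  # 250 t of CH4 → 6250.0 t CO2e
```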

Sure, this feature would only be useful for less than one person in a million, but Google has often shown itself willing to cater to the needs of techie minorities.

The true price of nuclear power

Maple leaf

Several times this blog has discussed whether climate change is making nuclear power a more acceptable option (1, 2, 3). One element of the debate that bears consideration is the legacy of contamination at sites that form part of the nuclear fuel cycle: from uranium mines to post-reactor fuel processing facilities. The Rocky Flats Plant in the United States is an especially sobering example.

Insiders at the plant started “tipping” the FBI about the unsafe conditions sometime in 1988. Late that year the FBI started clandestinely flying light aircraft over the area and noticed that the incinerator was apparently being used late into the night. After several months of collecting evidence both from workers and by direct measurement, they informed the DOE on June 6, 1989 that they wanted to meet about a potential terrorist threat. When the DOE officers arrived, they were served with papers. Simultaneously, the FBI raided the facilities and ordered everyone out. They found numerous violations of federal anti-pollution laws including massive contamination of water and soil, though none of the original charges that led to the raid were substantiated.

In 1992, Rockwell was charged with minor environmental crimes and paid an $18.5 million fine.

Accidents and contamination have been a feature of facilities handling nuclear materials worldwide. Of course, this does not suffice to show that nuclear energy is a bad option. Coal mines certainly produce more than their share of industrial accidents and environmental contamination.

The trickiest thing, when it comes to evaluating the viability of nuclear power, is disentangling exactly which governmental subsidies exist now, have existed in the past, and will exist in the future. These subsidies are both direct (paid straight to operators) and more indirect (soft loans for construction, funding for research and development). They also include guarantees that the nuclear industry is only responsible for a set amount of money in the event of a catastrophic accident, as well as the implicit cost that any contamination that corporations cannot be legally forced to correct after the fact will either fester or be fixed at taxpayer expense. Plenty of sources claim to have a comprehensive reckoning of these costs and risks, but the various analyses seem to be both contradictory and self-serving.

Before states make comprehensive plans to embrace or reject nuclear power as a climate change mitigation option, some kind of extensive, comprehensive, and impartial study of the caliber of the Stern Review would be wise.

The Storm Worm

The Storm Worm is scary for a number of good reasons. It acts patiently, slowly creating a massive network of drone machines and control systems, communicating through peer-to-peer protocols. It gives little evidence that a particular machine has been compromised. Finally, it creates a malicious network that is particularly hard (maybe impossible, at this time) to map or shut down.

This is no mere spam-spread annoyance. If it takes over very large numbers of computers and remains in the control of its creators, it could be quite a computational force. The only question is what they (or someone who rents the botnet) will choose to use it for, and whether such attacks can be foiled by technical or law-enforcement means. Hopefully, this code will prove a clever exception to the norm, rather than a preview of what the malware of the future will resemble.

Normally, I don’t worry too much about viruses. I use a Mac, run anti-virus software, use other protective programs, make frequent backups, and use the internet cautiously. While those things are likely to keep my own system free of malware, I naturally remain vulnerable to malware running on other people’s machines. That’s where most spam comes from. Also, there is the danger that a network of malicious computers will crash or blackmail some website or service that I use. With distributed systems like Storm, the protection of an individual machine isn’t adequate to prevent harm.

Dr. Strangelove in a nuclear bunker

Marc Gurstein rides the bomb

After today’s orientation, I went with some friends to see Dr. Strangelove in the Diefenbunker – the infamous Canadian nuclear shelter, built to protect top Canadian military and civilian leadership in the event of nuclear war. Diefenbunker is actually a general term for shelters of the type: the one near Ottawa is called CFS Carp. Apparently, there is also one in Nanaimo, B.C. One odd thing is that the shelter has a multi-room suite for the Governor General. Presumably, Canada would not have much need for a local representative of the Queen, after the actual Queen’s entire realm is reduced to a burnt, radioactive plain.

Tonight’s film was followed up by Pho with three fellow employees of the federal government. It was all a distinct social step forward, and Ashley Thorvaldson deserves credit for organizing the expedition.

You can read about the Cold War movies events on the website of the Diefenbunker Museum.