Two useful WordPress hacks

Doorway at Pembroke College, Oxford

By the time anyone is reading this, I will be well on my way to Snowdonia with the Walking Club. Rather than make this the longest pause in blogging in recent memory (four days!), I have queued up some short entries with images.

This post is a fairly esoteric one, of interest only to people who are either using WordPress or thinking of setting up a WordPress blog. It details two little programming tricks that improve the WordPress experience.

Afterlife for web pages

One research tool that surprisingly few people seem to know about is the Wayback Machine, at the Internet Archive. If you are looking for the old corporate homepage of the disbanded mercenary firm Executive Outcomes, or want to see something that used to be posted on a governmental site, but is no longer available there, it is worth a try.

Obviously, they cannot archive everything that is online, but the collection is complete enough to have helped out more than a couple of my friends. If you operate a site, you may also be interested in having a look at what data of yours the Archive has collected.

Of group sizes and word counts

Lincoln College, Oxford

According to Malcolm Gladwell, something fundamental happens to human organizations once they grow beyond 150 people. This is called Dunbar’s Number. If you take the size of a primate’s neocortex, relative to the rest of its brain, you will find a close correlation with the expected maximum group size for that species.[1] This number corresponds to village sizes, as seen around the world, to the sizes of effective military units, and to the size at which Hutterite communities split up. It seems that, above this size, organizations require complex hierarchies, rules, regulations, and formal measures to operate efficiently.

I think that something very similar happens to pieces of academic writing once they get beyond about 5,000 words. That is the point where my ability to hold the entire thing in my mind at once fails, often leading to duplication and confusion. Even with two levels of sub-divisions, things simply become unmanageable at that point, and I go from feeling total control over a piece of writing (2,500 words) to feeling that it has sprawled a bit (3,000 to 4,000 words) to feeling rather daunted by the whole thing. With my revised second chapter at 5,700 words and three to seven hours left prior to submission, I am certainly feeling as though things have grown beyond the bounds of good sense and comprehensibility.

[1] Gladwell, Malcolm. The Tipping Point: How Little Things Can Make a Big Difference. New York: Back Bay Books, 2000. p. 179.

Bad software rage

Be warned: EndNote 9.0 for Macintosh is not a well-coded piece of software. There is nothing quite like being in the middle of adding some complex footnotes to your thesis draft when EndNote crashes, taking Word, Firefox, Entourage, the Dock, and the Finder down with it. Then all the various error-reporting windows crop up and waste even more time and sanity. That I haven’t lost any important data because of this so far is largely the product of extremely frequent backups.

Framing, selection, and presentation issues

Harris Manchester College, Oxford

One of the major issues that arises when examining the connections between science and policy is the way information is framed. You can say that the rate of skin cancer caused by a particular phenomenon has increased from one in ten million cases to one in a million cases. You can say that the rate has increased tenfold, or that it has risen by 900%. Finally, you could say that an individual’s chances of getting skin cancer from this source have gone up from one tiny figure to a larger, but still tiny-seeming, figure. People seem to perceive the risks involved in each presentation differently, and people pushing for one policy or another can manipulate that. This is especially true when the situations being described are not comparably rare: having your chances of being killed through domestic violence reduced by 1% is a much greater absolute reduction than having your chances of dying in a terrorist attack reduced by 90%.
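To make the arithmetic concrete, here is a minimal sketch in Python. The baseline risks are invented purely for illustration; the point is only that a small relative cut in a common risk can dwarf a large relative cut in a rare one.

```python
# Illustrative figures only: the baseline risks below are invented.

def absolute_reduction(baseline_risk, relative_cut):
    """Absolute drop in risk when a baseline risk is cut by a relative fraction."""
    return baseline_risk * relative_cut

# The skin cancer framing from above: 1 in 10,000,000 rising to 1 in 1,000,000.
old, new = 1 / 10_000_000, 1 / 1_000_000
print(f"Relative increase: {new / old:.0f}x, or {(new - old) / old:.0%}")
print(f"Absolute increase: {new - old:.7f}")

# Hypothetical baselines: a common risk cut by 1% versus a rare risk cut by 90%.
common_risk = 1 / 1_000       # invented baseline for the common risk
rare_risk = 1 / 1_000_000     # invented baseline for the rare risk
print(absolute_reduction(common_risk, 0.01))  # 1e-05
print(absolute_reduction(rare_risk, 0.90))    # 9e-07, an order of magnitude smaller
```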

Graphing

When talking about presentation of information, graphs are an important case. Normally, they are a great boon to understanding. A row of figures means very little to most people, but a graph provides a wealth of comprehensible information. You can see whether there is a trend, what direction it is in, and approximately how strong it is. The right sort of graph, properly presented, can immediately illuminate the meaning of a dataset. Likewise, it can provide a compelling argument, at least among people who disagree more about what is going on than about how it would be appropriate to respond.

People see patterns intuitively, though sometimes they see order in chaos (the man in the moon, images of the Virgin Mary in cheese sandwiches). Even better, they have an intuitive grasp of calculus. People who couldn’t tell you a thing about concavity and the second derivative can immediately see when a slope is upward and growing ever steeper, or when something is increasing or decreasing at a diminishing rate. They can see which trends will level off, and which ones will explode off the scale. My post on global warming damage curves illustrates this.

Naturally, it is possible to use graphs in a manipulative way. You can tweak the scale, use a broken scale, or use a logarithmic scale without making clear what that means. You can position pie charts so that one part or another is emphasized, as well as abuse colour and three-dimensional effects. That said, the advantages of graphs clearly outweigh the risks.
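As a rough illustration of the scale point, here is a small matplotlib sketch with invented data: the same series can be made to look explosive or tame simply by switching the vertical axis between linear and logarithmic.

```python
# Invented data: the same roughly exponential series plotted two ways.
import matplotlib.pyplot as plt

years = list(range(2000, 2011))
values = [1.5 ** (year - 2000) for year in years]  # made-up growth figures

fig, (ax_linear, ax_log) = plt.subplots(1, 2, figsize=(8, 3))
ax_linear.plot(years, values)
ax_linear.set_title("Linear axis: growth looks explosive")
ax_log.plot(years, values)
ax_log.set_yscale("log")
ax_log.set_title("Log axis: the same data look steady")
fig.tight_layout()
plt.show()
```

Neither view is dishonest in itself; the manipulation lies in choosing one without telling the reader why.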

It is interesting to note how central a role one graph seems to have played in the debate about CFCs and ozone: the one showing the concentration of chlorine in the stratosphere. Since that is what CFCs break down to produce, and that is what causes the breakdown of ozone, the concentration is clearly important. The graph clearly showing that concentrations would continue to rise, even under the original Montreal Protocol, seems to have had a big impact on the two rounds of further tightening. Perhaps the graph used so prominently by Al Gore in An Inconvenient Truth (the trends on display literally dwarfing him) will eventually have a similar effect.

Stats in recent personal experience

My six-month-old Etymotic ER6i headphones are being returned to the manufacturer tomorrow, because of the problems with the connector I reported earlier. Really not something you expect from such a premium product, but I suppose there are always going to be some defects that arise in a manufacturing process. Of course, being without good noise-isolating headphones for the time it will take them to be shipped to the US, repaired or replaced, and returned means that reading in coffee shops is not a possibility. Their advantage over libraries only exists when you are capable of excluding the great majority of outside noise and drowning the rest in suitable music.

Speaking of trends, I do wonder why so many of my electronics seem to run into problems. I think this is due to a host of selection effects. I (a) have more electronics than most people, (b) use them a great deal, (c) know how they are meant to work, (d) know what sort of warranties they have and for how long, (e) treat them so carefully that manufacturers can never claim they were abused, and (f) maintain a willingness to return defective products as many times as is necessary and possible under the warranty. Given all that, it is not surprising that my own experience with electronics failing and being replaced under warranty is a lot greater than what you might estimate the background rate of such activity to be.

Two other considerations are also relevant. It is cheaper for manufacturers to rely upon consumers to test whether a particular item is defective, especially since some consumers will lose the item, abuse it, or simply not bother to return it even if it is defective. Secondly, it is almost always cheaper to simply replace consumer electronics than to fix them, because of the economies of scale in manufacturing compared with repair. From one perspective, that seems wasteful. From another, it is the more frugal option. A bit of a paradox, really.

[14 March 2007] My replacement Etymotic headphones arrived today. Reading in coffee shops is possible again, and none too soon.

Making a hash of things

The following is the article I submitted as part of my application for the Richard Casement internship at The Economist. My hope was to demonstrate an ability to deal with a very technical subject in a comprehensible way. This post will be automatically published once the contest has closed in all time zones.

Cryptography
Making a hash of things

Oxford
A contest to replace a workhorse of computer security is announced

While Julius Caesar hoped to prevent the hostile interception of his orders through the use of a simple cipher, modern cryptography has far more applications. One of the key drivers behind that versatility is an important but little-known tool called a hash function. Hash functions are algorithms that take a particular collection of data and generate a smaller ‘fingerprint’ from it. The fingerprint can later be used to verify the integrity of the data in question, which could be anything from a password to digital photographs collected at a crime scene. Hash functions are used to protect against accidental changes to data, such as those caused by file corruption, as well as intentional efforts at fraud. Cryptographer and security expert Bruce Schneier calls hash functions “the workhorse of cryptography” and explains: “Every time you do something with security on the internet, a hash function is involved somewhere.” As techniques for digital manipulation become more accessible and sophisticated, the importance of such verification tools becomes greater. At the same time, the emergence of a significant threat to the most commonly used hashing algorithm in existence has prompted a search for a more secure replacement.

Hash functions modify data in ways subject to two conditions: that it be impossible to work backward from the transformed or ‘hashed’ version to the original, and that multiple originals not produce the same hashed output. As with standard cryptography (in which unencrypted text is passed through an algorithm to generate encrypted text, and vice versa), the standard of ‘impossibility’ is really one of impracticability, given available computing resources and the sensitivity of the data in question. The hashed ‘fingerprint’ can be compared with a file and, if they still correspond, the integrity of the file is affirmed. Also, computer systems that store hashed versions of passwords do not pose the risk of yielding all user passwords in plain text form, if the files containing them are accidentally exposed or maliciously infiltrated. When users enter passwords to be authenticated, they can be hashed and compared with the stored version, without the need to store the unencrypted form. Given the frequency of ‘insider’ attacks within organizations, such precautions benefit both the users and owners of the systems in question.
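A minimal sketch of both uses, written in Python with the standard hashlib module. SHA-256, the example password, and the salt-free scheme are illustrative assumptions of mine, not a recommendation and not the algorithm discussed below.

```python
# Sketch of the two uses described above: fingerprinting data and
# checking a password against a stored hash, using Python's hashlib.
import hashlib

# 1. Integrity check: record a fingerprint now, compare it later.
document = b"photograph collected at a crime scene"
fingerprint = hashlib.sha256(document).hexdigest()
assert hashlib.sha256(document).hexdigest() == fingerprint           # unchanged data matches
assert hashlib.sha256(document + b"!").hexdigest() != fingerprint    # any change is visible

# 2. Password storage: keep only the hash, never the plain text.
# (Real systems also add a per-user salt and use a deliberately slow
# hash; both are omitted here for brevity.)
stored_hash = hashlib.sha256(b"my secret password").hexdigest()

def authenticate(attempt: bytes) -> bool:
    """Hash the attempt and compare it with the stored fingerprint."""
    return hashlib.sha256(attempt).hexdigest() == stored_hash

print(authenticate(b"my secret password"))  # True
print(authenticate(b"a wrong guess"))       # False
```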

Given their wide range of uses, the integrity of hash functions has become important for many industries and applications. For instance, they are used to verify the integrity of software security updates distributed automatically over the Internet. If malicious users were able to modify a file in a way that did not change the ‘fingerprint,’ as verified through a common algorithm, it could open the door to various kinds of attack. Alternatively, malicious users who could work backward from hashed data to the original form could compromise systems in other ways. They could, for instance, gain access to the unencrypted form of all the passwords in a large database. Since most people use the same password for several applications, such an attack could lead to further breaches. The SHA-1 algorithm, which has been widely used since 1995, was significantly compromised in February 2005. This was achieved by a team led by Xiaoyun Wang and primarily based at China’s Shandong University. The team had previously demonstrated attacks against MD5 and SHA, hash functions that preceded SHA-1. Their success has prompted calls for a more durable replacement.

The need for such a replacement has now led the U.S. National Institute of Standards and Technology to initiate a contest to devise a successor. The competition is to begin in the fall of 2008 and continue until 2011. Contests of this kind have a promising history in cryptography. Notably, the Advanced Encryption Standard, devised as a more secure replacement for the earlier Data Encryption Standard, was decided upon by means of an open competition among fifteen teams of cryptographers between 1997 and 2000. At least some of those disappointed in that contest are now hard at work on what they hope will become one of the standard hash functions of the future.

Go

Anyone who has ever been curious about the game of Go should try this interactive tutorial. Wikibooks also has an introduction, though it does not seem to have been fully written yet. The game is an attractive-looking and tricky one, employed as a dramatic device in the film A Beautiful Mind. Notably, Go is also a game in which the best human players can consistently beat very powerful computers. Unlike in chess, it would appear, sheer number-crunching ability is not enough to succeed at Go.

Go is normally played on a grid of 19×19 intersecting lines, and the objectives are to capture enemy stones while also surrounding territory. Players take turns placing stones on the board, in any position except one where the stone would be immediately captured. Stones or groups of stones that are encircled, such that there are no clear paths or ‘liberties’ extending from them, are captured. Finally, a player may not make a move that would return the board to how it was immediately before their opponent’s last move (the ‘ko’ rule). The rules according to which stones are placed are very simple, making it initially surprising that the complexity of the game can be so great. Eventually, the game ends when both players pass their turn, indicating that neither sees a possibility for further gains.
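As a rough illustration of how liberties determine capture, here is a small Python sketch. The board representation and position are my own invention, not taken from any Go software: a flood fill collects a group of connected stones and counts the empty points adjacent to it.

```python
# Sketch: find the group containing a given stone and count its liberties.
# '.' is an empty point; 'B' and 'W' are black and white stones (invented position).

def group_and_liberties(board, row, col):
    """Return the connected group at (row, col) and the set of its liberties."""
    colour = board[row][col]
    size = len(board)
    group, liberties, to_visit = set(), set(), [(row, col)]
    while to_visit:
        r, c = to_visit.pop()
        if (r, c) in group:
            continue
        group.add((r, c))
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < size and 0 <= nc < size:
                if board[nr][nc] == '.':
                    liberties.add((nr, nc))       # adjacent empty point
                elif board[nr][nc] == colour:
                    to_visit.append((nr, nc))     # same-coloured neighbour joins the group
    return group, liberties

board = [list(row) for row in (".BW..",
                               "BWW..",
                               ".BW..",
                               ".....",
                               ".....")]
group, liberties = group_and_liberties(board, 0, 2)        # the white group near the top edge
print(len(group), "stones,", len(liberties), "liberties")  # captured once liberties reach zero
```

A group whose last liberty is filled by the opponent is removed from the board, which is also why a stone may not normally be played onto a point where it would have no liberties of its own.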

For a beginner, playing on a 13×13 board is recommended. The standard-size board has more than twice as many intersections to contest, and is probably too much for someone without a developed sense of the game to manage. Learning how to play decently is one project that I will need to suspend until more pressing tasks are complete.

Studio photography on the (very) cheap

Antonia Mansel-Long, bounce-lit

Something useful learned tonight: using standard-height white ceilings, a glossy white St. Anthony’s College laundry card, and the on-camera flash of a Canon PowerShot A510 digital camera, you can pull off some tolerable bounce-lit flash photography. A hand-held mirror is even better, though I would recommend using a relatively matte ceiling with that arrangement. The flash is only really adequate for this role in the wide-angle range, due to its low power rating, but bouncing does make it dramatically less unflattering, through the dual benefit of eliminating completely overexposed white patches and removing the unnatural shadows that arise from a flash too close to the lens.

Attempts to make diffusers out of Sainsbury’s receipts, onion-skin paper, and other miscellaneous translucent materials were less successful. I look forward to eventually having a proper off-camera flash with a diffuser, not to mention the chance to do some real studio work. If only this pesky thesis wasn’t getting in the way of various hobbies.

Visual programming tools for non-coders

Using Yahoo Pipes, a neat visual tool for making simple web applications, I made an RSS feed that aggregates new blog posts, blog comments, changes to the wiki, and 43(places/things/people) contributions. While this particular feed is probably only of use to me, people may well find the architecture useful for doing other things.
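For those who would rather do the same thing in code, here is a rough sketch of the idea in Python using the feedparser library; the feed URLs are placeholders, not the real addresses of this site’s feeds.

```python
# Sketch: merge several RSS/Atom feeds into one reverse-chronological list.
# Requires the feedparser library; the URLs below are placeholders.
import time
import feedparser

FEEDS = [
    "https://example.com/blog/feed",      # blog posts (placeholder)
    "https://example.com/comments/feed",  # blog comments (placeholder)
    "https://example.com/wiki/feed",      # wiki changes (placeholder)
]

entries = []
for url in FEEDS:
    entries.extend(feedparser.parse(url).entries)

def published(entry):
    """Sort key: seconds since the epoch, or zero if the feed gave no date."""
    parsed = entry.get("published_parsed")
    return time.mktime(parsed) if parsed else 0.0

entries.sort(key=published, reverse=True)

for entry in entries[:20]:
    print(entry.get("published", "undated"), "-", entry.get("title", "untitled"))
```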

While it will probably never be the case that you can do serious computer engineering without knowing how to write code, tools like this are a good way to deal with the fact that the vast majority of computer users will never write Java or Perl. Designing interfaces that are both flexible and comprehensible to non-experts is quite a challenge, but certainly one worth taking up. Much of the momentum behind blogs is simply the result of the fact that they can be set up and operated by people who have never needed to deal with a command prompt or the configuration of a web server.