Seeking new Oxford bloggers

Oxford is positively laden with newly arriving students. At least some of them must be bloggers. If you are among them, please leave a comment with a link back to your site (if you want it added to my listing of Oxford blogs). Likewise, if anyone has found such a fresher blog, please leave a comment that links back to it.

I will not link blogs immediately. Rather, I will wait to see that they:

  1. have at least some real content
  2. have been around for at least a few weeks

Otherwise, maintaining the list would take far too long, and too many items in it would be without much value.

All Oxford bloggers should remember that the fourth OxBloggers gathering is happening on Wednesday of 4th week, November 1st.

PS. Making a link in a blog comment is easy. Just use the following format, replacing the square brackets with pointy ones (the ones that look like this shape ^ turned on either side):

[a href="http://www.thesiteyouarelinking.com"]the text you want for the link[/a]

That will make a string of blue text that says: “the text you want for the link.” When clicked, it will take the browser to www.thesiteyouarelinking.com. Every bit of the formatting is important, including the quotation marks, so be careful.

Basic problems with biometric security

You have to wonder whether anything other than having watched too many James Bond films feeds the idea that biometrics are a good means of achieving security. Nowadays, Canadians are not allowed to smile when they are having their passport photos taken, in hopes that computers will be able to read the images more easily. Of course, any computer matching system foiled by something as simple as smiling is not exactly likely to be useful for much.

Identification v. authentication

Biometrics can be used in two very distinct ways: as a means of authentication, and as a means of identification. Using a biometric (say, a fingerprint) to authenticate is akin to using a password in combination with a username. The username tells the system who you claim to be; the password attempts to verify that claim. In general, such verification relies on something you have (like a keycard), something you know (like a password), or something you are (like a fingerprint scan). Using a biometric for identification, by contrast, attempts to determine who you are, within a database of possibilities, using biometric information alone.

Using a fingerprint scan for identification is much more problematic than using it for authentication. This is a bit like telling people to enter a password and, if it matches any password in the system, allowing them into that person’s account. It isn’t quite that bad, because fingerprints are more unique and secure than passwords, but the problem remains that as the size of the database increases, the probability of false matching increases.
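To put rough numbers on that last point, here is a minimal sketch. It assumes that each comparison against a database entry has the same small chance of a false match, and that the comparisons are independent of one another; both are simplifications, and the rate of one in 100,000 is invented for illustration rather than taken from any real fingerprint system.

```python
def prob_any_false_match(false_match_rate: float, database_size: int) -> float:
    """Chance that someone NOT in the database falsely matches at least one entry.

    Assumes every comparison is independent and has the same false match rate.
    """
    chance_of_no_false_match = (1.0 - false_match_rate) ** database_size
    return 1.0 - chance_of_no_false_match


if __name__ == "__main__":
    hypothetical_rate = 1e-5  # illustrative figure only
    for size in (1, 1_000, 100_000, 1_000_000):
        chance = prob_any_false_match(hypothetical_rate, size)
        print(f"{size:>9} entries: {chance:.4f}")
```

With a single entry (the authentication case) the chance of a false match is just the per-comparison rate. As the number of entries grows, the chance that an innocent person matches somebody climbs towards certainty.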

For another example, imagine you are trying to identify the victim of a car wreck using dental records. If person X is the registered owner and hasn’t been heard from since the crash, we can use dental records to authenticate that a badly damaged body almost certainly belongs to person X. This is like using biometrics for authentication. Likewise, if we know the driver could be one of three people, we can ascertain with a high degree of certainty which it is, by comparing dental x-rays from the body with records for the three possible matches. The trouble arises when we have no idea who person X is, so we try running the x-rays against the whole collection that we have. Not only is this likely to be resource intensive, it is likely to generate lots of mistakes, for reasons I will detail shortly.

The big database problem in security settings

The problem of a big matching database is especially relevant when you are considering the implementation of wholesale surveillance. Ethical issues aside, imagine a database of the faces of thousands of known terrorists. You could then scan the face of everyone coming into an airport or other public place against that set. Both false positive and false negative matches are potentially problematic. With a false negative, a terrorist in the database could walk through undetected. For any scanning system, some probability (which statisticians call Beta, or the Type II Error Rate) attaches to that outcome. Conversely, there is the possibility of identifying someone not on the list as being one of the listed terrorists: a false positive. The probability of this is Alpha (Type I Error Rate), and it is in setting that threshold that the relative danger of false positives and negatives is established.

A further danger is somewhat akin to ‘mission creep’ – the logic that, since we are already here, we may as well do X in addition to Y, where Y is our original purpose. This is a very frequent security issue. For example, think of driver’s licenses. Originally, they were meant to certify to a police officer that someone driving a car is licensed to do so. Some types of people would try to attack that system and make fake credentials. But once having a driver’s license lets you get credit cards, rent expensive equipment, secure other government documents, and the like, a system that existed for one purpose is vulnerable to attacks from people trying to do all sorts of other things. When that broadening of purpose is not anticipated, a serious danger exists that the security applied to the original task will prove inadequate.

A similar problem exists with potential terrorist matching databases. Once we have a system for finding terrorists, why not throw in the faces of teenage runaways, escaped convicts, people with outstanding warrants, etc, etc? Again, putting ethical issues aside, think about the effect of enlarging the match database on the possibility of false positive results. Now, if we can count on security personnel to behave sensibly when such a result occurs, there may not be too much to worry about. Numerous cases of arbitrary detention, and even the use of lethal force, demonstrate that this is a serious issue indeed.

The problem of rare properties

In closing, I want to address a fallacy that relates to this issue. When applying an imperfect test to a rare case, you are almost always more likely to get a false positive than a legitimate result. It seems counterintuitive, but it makes perfect sense. Consider this example:

I have developed a test for a hypothetical rare disease. Let’s call it Panicky Student Syndrome (PSS). In the whole population of students, one in a million is afflicted. My test has an accuracy of 99.99%. More specifically, it gives the correct result 99.99% of the time, whether or not the student being tested has PSS. That means that if the test is administered to a random collection of students, there is a one in 10,000 chance that a particular student will test positive, but will not have PSS. Remember that the odds of actually having PSS are only one in a million. There will be about 100 false positives for every real one – a situation that will arise in any circumstance where the probability of the person having that trait (whether having a rare disease or being a terrorist) is low relative to the error rate of the test.
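The arithmetic can be checked with a few lines of Python. This is only a sketch, and it assumes that ‘an accuracy of 99.99%’ means the test is right 99.99% of the time for afflicted and unafflicted students alike; the function and variable names are my own.

```python
def positive_test_breakdown(prevalence: float,
                            sensitivity: float,
                            false_positive_rate: float) -> tuple[float, float]:
    """Return (false positives per true positive, P(has PSS | positive test))."""
    # Fraction of all students who have PSS and test positive.
    true_positives = prevalence * sensitivity
    # Fraction of all students who do not have PSS but test positive anyway.
    false_positives = (1.0 - prevalence) * false_positive_rate
    ratio = false_positives / true_positives
    # Bayes' Theorem: true positives as a share of all positive results.
    posterior = true_positives / (true_positives + false_positives)
    return ratio, posterior


if __name__ == "__main__":
    ratio, posterior = positive_test_breakdown(
        prevalence=1e-6,           # one student in a million has PSS
        sensitivity=0.9999,        # afflicted students test positive 99.99% of the time
        false_positive_rate=1e-4,  # unafflicted students test positive 0.01% of the time
    )
    print(f"False positives per true positive: {ratio:.1f}")
    print(f"Chance a positive result is real: {posterior:.4f}")
```

Under those assumptions, a student who tests positive has only about a one percent chance of actually having PSS, despite the impressive-sounding accuracy of the test.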

Given that the reliability of even very expensive biometrics is far below that of my hypothetical PSS test, the ratio of false positives to real ones is likely to be even worse. This is something to consider when governments start coming after fingerprints, iris scans, and the like in the name of increased security.

PS. Those amazed by Bond’s ability to circumvent high-tech seeming security systems using gadgets of his own should watch this MythBusters clip, in which an expensive biometric lock is opened using a licked black and white photocopy of the correct fingerprint.

PPS. I did my first Wikipedia edit today, removing someone’s childish announcement from the bottom of the biometrics entry.

[Update: 3 October 2006] For a more mathematical examination of the disease testing example, using Bayes’ Theorem, look here.

On being an inept and reluctant webmaster

A website I am managing (not this one) is proving exceptionally frustrating. When I disabled the ‘what you see is what you get’ (WYSIWYG) editor in WordPress, I did so because its name was a filthy lie. In truth, what you code, and check, and then check again in every other browser you care to support is what you get. Well, the content management system (CMS) for the other site is like the WYSIWYG editor writ large: nothing you do actually shows on the site in the way it showed in the editor. As with the WordPress editor, hundreds of useless tags get added in opening and closing pairs. What’s more, the CMS has added many layers of complexity to what is, in essence, a very simple site. The only way I have been able to edit tables in one part of the site has been to grab the HTML, edit it using jEdit, then paste it back into the site. This is clearly not the kind of thing you should have to do when you are running an elaborate CMS.

The simplicity of the content, versus the complexity of the management, is tempting me to copy the whole site over to a new CMS that is more comprehensible. Right now, we are using a system called Mambo. In many ways, it is a lot like WordPress. It uses an SQL database to store content, then displays it on dynamically generated pages. I am pretty sure WordPress could actually handle everything this website does, though having it look like a blog would not be acceptable.

Does anybody know of a free CMS that can be hosted using Apache and MySQL that might be easier to work with than Mambo?

Laptop RAM for sale

Before it goes up on eBay, I thought I should privately advertise the availability of a 256 meg stick of laptop RAM. I originally bought it directly from Apple, along with my 14″ G4 iBook, and have since replaced it with a 1GB stick. It is in perfect working order, and should work with any laptop that takes 200-pin PC2700 RAM. This includes all G3 and G4 iBooks and Powerbooks.


State of the iBook

According to iStat Pro, a system monitoring Dashboard widget, the battery in my 14″ G4 iBook only has 31% of the endurance that it shipped with, a bit more than a year ago. No wonder I have been unplugging it from the wall recently only to find less than an hour’s worth of power available. Of course, the remaining-time figure it gives is untrue. With somewhere between ten and fifteen minutes supposedly remaining, the computer will simply turn off – hopefully in a way that seeks to avert file corruption. Every little click of my hard drive now makes me fearful of losing this vital academic and personal tool. The experience of the succession of iPods has made me wary. Backups as frequent as I can bear to run them seem the best option.

Since it would cost at least US$129.00 to replace my iBook battery, I must simply tolerate the lack of stamina until the time comes (probably once I have tunneled my way out of student debt) to strip this machine of most of its RAM and move to something snazzier.

[Update: 13 October 2008] My original iBook battery has now failed completely. It cannot run the computer for even a fraction of a second, the LED charge display on the bottom of the battery doesn’t work, and the computer often cannot detect that the battery is present.

Thesis document organization strategies

A practical question to those who have walked the path of grad school before me: when working on a major research project, how did you take notes on books, articles, and the rest? How did you file those notes? Also, how did you file documents and photocopies that served as sources? All the archivist readers of this blog out there, now is your time to show your colours.

I will be using EndNote for citation purposes, largely to save myself from the need to deal with the formatting of hundreds of distinct footnotes (for substantive asides) and endnotes (for simple citation). While the EndNote program does have facilities for note organization, there are two problems. One is the clunky interface, which does not strike me as useful for much beyond the aforementioned auto-citing. The other is the fact that I can only access EndNote on the departmental terminal server; I do not have a copy of my own, but have to use it on a virtual desktop of Windows Server 2003. That said, acquiring my own copy of the program might prove a necessary expense, both for the thesis and subsequent research projects. I certainly wish I had been using it when I wrote the fish paper.

The first big choice for overall organization seems to be pen and paper versus electronic, though the variety of sources will always make the whole library somewhat hybrid, hopefully with 90% in the dominant medium and a well-sorted 10% in the other. I find taking notes on the computer likely to be overly distracting, though my handwritten notes can be far from elegant. At the same time, my computer files are generally both very well organized and easily searchable. As such, the ideal option might be to write notes by hand, then type and print them. Of course, there are time and financial limitations on that approach. The whole blog constellation is also a good organizational tool for me.

Perhaps most important, did anyone try a system that completely failed to work, and should be avoided? I expect the thesis to eventually involve hundreds of sources. Most of them will be books that I have access to but do not own, and journal articles which I can print or photocopy. I have a big hanging file box to sort such articles, and perhaps photocopied sections from books, but I need to devise a system to coordinate the hundreds of pages of my own notes that this project will ultimately rest upon.

First impressions of Tiger

Installation, performance, and headline features

Installing Mac OS 10.4 (Tiger) was a breeze – certainly the only time I have been surprised by the ease of installing a major OS upgrade on top of an existing installation. Because of the simultaneity of my RAM upgrade and the installation of Tiger, I can’t discuss the performance effect of the new OS. In aggregate, the iBook runs like a whole new machine. While it used to creak and complain when running just Fetch and iPhoto, it now runs Firefox, iTunes, iPhoto, Photoshop, TextEdit, and Fetch without any trouble. Indeed, 321 megs of RAM are still inactive with that collection, as a Dashboard widget informs me (along with weather forecasts for Oxford, and the all-important Canada-England exchange rate).

Another widget (pearLyrics, suggested by Jessica) is adding lyrics to my iTunes tracks as they are played, though it is oddly myopic when it comes to absurdly popular songs not released in the last ten years (most songs by Pink Floyd and The Beatles seem to stump it). This is a particularly useful function for me, as I am prone to either ignore lyrics entirely, focusing on the tone of a song, or spectacularly misinterpret them. I will leave the specific examples to my friends, for use in mocking me at parties.

Spotlight hasn’t proved terribly useful to me yet, mostly because I already have all my information sorted for easy location. That is not too unusual, however. All advanced search tools – from Google to Oxford’s OLIS – take time to become familiar enough to be really beneficial. Next time I am digging around for a particular file not viewed in months, it may prove its worth.

I have not used Automator yet, but I eventually want to create a workflow that takes an image file from iPhoto, opens it in Photoshop, resizes it to either 1024 pixels across for horizontal shots or 768 from top to bottom for vertical ones, lets me adjust the levels, lets me apply an unsharp mask, then creates a thumbnail version that is 320 pixels across the long axis, and uploads both versions to my server using Fetch. Ideally, it would then create an appropriate block of code to paste into WordPress, but this is all a project for a less hectic time.
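The resizing rules, at least, are simple enough to sketch in Python. This only works out the target pixel dimensions; it does not drive Automator, Photoshop, or Fetch, and the function names are hypothetical. I have assumed that a square image counts as horizontal and that dimensions are rounded to the nearest pixel.

```python
def web_dimensions(width: int, height: int) -> tuple[int, int]:
    """Scale to 1024 pixels across (horizontal) or 768 pixels tall (vertical)."""
    if width >= height:
        # Horizontal shot (square images are treated as horizontal here).
        scale = 1024 / width
    else:
        # Vertical shot.
        scale = 768 / height
    return round(width * scale), round(height * scale)


def thumbnail_dimensions(width: int, height: int) -> tuple[int, int]:
    """Scale so that the longer side is 320 pixels."""
    scale = 320 / max(width, height)
    return round(width * scale), round(height * scale)


if __name__ == "__main__":
    for original in ((3072, 2048), (2048, 3072)):
        print(original, web_dimensions(*original), thumbnail_dimensions(*original))
```

Both functions preserve the aspect ratio of the original, so the only real decision is which side of the image gets pinned to the target size.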

Newly usable software

One of the reasons I decided to upgrade to Tiger was the desire to use the free application WriteRoom. The essence of simplicity, it is just a black screen onto which you type. Unlike almost all Mac applications, it can be made to take up the whole screen. There is no formatting – though there is optional spell-checking – and the simplicity seems to contribute to my ability to concentrate on the topic at hand. At a stroke, it becomes more like writing a letter than writing an email, which is clearly a valuable transition in a world where enough attention is rarely paid to written expression.

As soon as I can get my Airport card to work in passive mode with packet injection capabilities, the new version of KisMAC seems likely to be quite useful. It is already rather better than the standard Mac OS WiFi interface, in terms of detecting networks and revealing their characteristics.

Less obvious improvements

Another unexpectedly good feature of Mac OS 10.4 is the Grapher utility, which does both two and three dimensional graphing in a useful and attractive way. I may not have enormously much cause to use a graphing calculator these days, but it can be good to play around in an attempt to remember some fraction of what once I knew about trigonometric functions and calculus.

A few welcome improvements have also been made to Safari, though I have yet to open iCal (I use Google Calendar, though iCal would be grabbing the data from there and copying it to my iPod, if my iPod wasn’t broken again), Mail (I use Entourage and GMail), or Address Book (same). The improved PDF functions built into every Print dialog definitely look useful for anyone who does any sort of document publishing or collaboration.

PS. Unrelated to Tiger, but Apple-related: If you call Apple about your broken iPod using a scratchy enough Skype connection, they will call you back at their expense. Since spending time on hold at 30p a minute cell phone fees is among the most tooth-grinding of human experiences, this is good to know.

Hey Tiger… wake up

Now that I have enough RAM to be confident of being able to run Dashboard without making my iBook run like molasses, I picked up a copy of Mac OS 10.4 (Tiger) from the UBC Bookstore today. Wish me luck – and no file corruption – for the cross-over. I probably won’t have time to carry out such an operation in Vancouver.

[Update: 6:00pm] Easiest operating system update ever. I started the installer, went for coffee, and came back to find my computer at the login screen. All my files and applications seem to have passed through the transition intact.

[Update: 7:00pm] Existing Tiger users: which widgets do you use, and why?