Monday, June 11, 2007

Bogo-Permute

So here's today's random question, apropos of, er, nothing. What's the performance of the following incredibly-poor algorithm for randomly permuting a list of numbers?

def bogo_permute(in_list):
    N = len(in_list)
    out_list = []
    # Keep track of whether we have dealt with each element of the input list
    seen = [False] * N
    for iteration in range(N):
        while True:
            which = randrange(N)
            if not seen[which]:
                out_list.append(in_list[which])
                seen[which] = True
                break
    return out_list

There are N selections needed for the output list. The first time through, the randomly selected entry from the input list will definitely not have been processed already, so it can be transferred across to the output list. The second time through, there's a chance that the random selection process will select the same entry again; for later iterations, this chance of selecting something that has already been selected will get higher and higher. So, on average, how many of these repeated picks are needed?

Given a coin that comes up heads with probability p (and so tails with probability q=1-p), what's the average number of coin tosses needed before the first heads comes up?

As an example, imagine that the first head turns up on the fourth toss. This means that the sequence of tosses is TTTH. The probability of this sequence of coin tosses is then q3p = (1-p)3p. This is easy to generalize:

Number of TossesSequenceProbability
1Hp
2THqp
3TTHq2p
4TTTHq3p
mT…THqm-1p

We want the average number of tosses needed to get a head, also known as the expected value. This is a sum of the each possible value multiplied by the probability of that value occurring. In other words, the average number of tosses needed is:

(1*(probability of heads on first toss) + 2*(probability of first heads on second toss) + 3*(probability of heads on third toss) + …)

Put into maths, this means that the average number is an infinite sum: Σm=1 m (qm-1p). From a trusty binomial expansion, the Σm=1 m qm-1 part pops out as (1-q)-2, and since q=1-p this means that the whole thing simplifies down to just 1/p.

(This seems reasonable: for a fair coin, it takes an average of 2 throws to get a head; for a coin that only comes up heads one time in a hundred, it takes an average of a hundred throws.)

So back to the original problem. For the i-th selection (with i running from 1 to N), the chance of picking an array entry that is still there is p=(N+1-i)/N, and so the average number of picks needed is N/(N+1-i). The first selection needs N/N = 1 attempt, the second selection needs N/(N-1) attempts, the third selection needs N/(N-2) attempts, …, the last selection needs N/1 = N attempts (on average).

Taken together, this means that the average number of random picks for the whole process is Σi=1N N/(N+1-i). Rearranging (with j=N+1-i) this is the same as N Σj=1N1/j, which includes a partial sum of the harmonic series, known as a harmonic number HN. In the large N limit, the harmonic numbers tend to HN ≈ ln(N) + γ (where γ is a constant), and so the asymptotic behaviour of the algorithm is O(N ln(N)).

So it's bad, but not quite as bad as I'd expected (I'd guessed it would be O(N2)).

Friday, May 04, 2007

inline_related_objects

[Edit, Feb 2008: fixed typo—changed extra_content to extra_context]

Today's neat Django trick: getting the create_update generic views for a model to display entries from a subsidiary model in-line—like the admin interface does when you add edit_inline=models.TABULAR.

The (oldforms) Form object that gets generated by the create_update view already has most of the required gubbins inside it; we just need a few extra steps to get at it:

  1. In urls.py, set an extra_context argument for the view to be a dictionary that contains inline_related_objects:
     {'inline_related_objects': model._meta.get_followed_related_objects(None) } 
  2. In the template, include a loop to pull in all of the related objects:
     {% for related_object in inline_related_objects %}{% edit_inline related_object %}{% endfor %} 
    within the <form> tag.
  3. Since the edit_inline template tag is not a built-in Django tag, the template also needs to load up the admin_modify extension:
     {% load admin_modify %} 
  4. Make local copies of the relevant admin templates (such as admin/edit_inline_tabular.html and widget/foreign.html) and tweak appropriately.

Of course, I didn't figure this out until after I'd spent a couple of hours dismantling the admin interface code with a view to stealing the relevant bits. Roll on the Django book.

Saturday, April 14, 2007

The Joy of UNIX

UNIX credo: you can do anything from a single command line (as long as the line is allowed to be arbitrarily long).

Case in point: generating a white noise audio file

sox -t sl -r 44100 -c 2 /dev/zero \
-r 44100 -c 2 -w whitenoise.wav synth 10:00 whitenoise vol 0.6 fade q 10 10:00 10

Friday, March 16, 2007

Magic Number

And the number is: 978-1-84753-300-5.

Sunday, February 25, 2007

The Spine is Evil

[reading: Skippy Blair, "Disco to Tango and Back"; recently: Simon Lovell, "How To Cheat At Everything"]

Drat; forgot to add extra space on the inside margins of my current typesetting project, so I had to go back through all of the chapters and re-set everything. Like the man said, the spine is evil.

Wednesday, December 13, 2006

Django Su

It's always a sign of good design when adding a new feature turns out to be easy.

In the Django authentication system, I wanted a way for an administrator to view the site as if they were a particular user; effectively an equivalent to su in UNIX-land.

The first easy step was to invent a URL to correspond to this action, which gets encoded in urls.py:

  (r'^su/(?P<username>.*)/$', 'qlockweb.accounts.views.su', {'redirect_url': '/qlockdata/'}),   

That done, the second and final step is to write some view code.

  @user_passes_test(lambda u: u.is_staff)
  def su(request, username, redirect_url='/'):
      su_user = get_object_or_404(User, username=username)
      if su_user.is_active:
          request.session[SESSION_KEY] = su_user.id
      return HttpResponseRedirect(redirect_url)

Seven lines of code and we're done (modulo a bunch of import statements).

Expanding what's going on here:

    (r'^su/(?P<username>.*)/$', 'qlockweb.accounts.views.su', {'redirect_url': '/qlockdata/'}), 
  • When an HTTP request arrives at the framework, Django goes through its list of URLs until it finds a match. In this case, going to http://mysite/accounts/su/fred/ ends hitting the urls.py line above; the /(?P<username>.*)/ part of the regexp pulls out "fred" and this gets passed as a parameter named username into the function su in qlockweb/accounts/views.py. This function also gets passed a parameter called redirect_url with value '/qlockdata/'.
  • @user_passes_test(lambda u: u.is_staff)
  • Actually, we need to rewind one step before we get into the su function. The line before the function definition is a Python decorator: some extra code wrapping the function that gets executed just before the function itself is executed. This decorator needs some expansion of its own:
      @user_passes_test(lambda u: u.is_staff)
    • The @ sign is the syntactic sugar that indicates that this line is a decorator for the function that comes immediately afterwards.
    • @user_passes_test(lambda u: u.is_staff)
    • This is the decorator function (from contrib/auth/decorators.py); its first argument test_func is a function that does the test. This test function is given a single parameter: the current User. If the test function returns true, the wrapped view code is called; if not, then the user gets redirected to a login page.
    • @user_passes_test(lambda u: u.is_staff)
    • More syntactic sugar. We want a function is_this_a_staff_user(u) that checks whether its argument u is an administrator. However, as this is the only place that the function is used, we don't bother to give it a name—we just use a lambda expression to give the definition right here and now.
    • @user_passes_test(lambda u: u.is_staff)
    • Finally, the body of the lambda expression just uses the method of the User class that indicates whether the user is an administrator or not.
  • def su(request, username, redirect_url='/'):
  • So now we're in the su function itself, and if we've got this far we're guaranteed that the person viewing the page is logged in as an is_staff user. The function has the username and redirect_url parameters mentioned earlier; it also has a request parameter that holds all of the information about the original web request (in a HttpRequest object).
  •   su_user = get_object_or_404(User, username=username)
  • The next line of code gets a User object for the username that was specified—fred in other words. If there isn't a user called fred, then a Http404 exception gets raised, which will percolate up the stack and display a (surprise, surprise) 404 page.
  •   if su_user.is_active:
  • This particular version of our code only allows impersonation of active users, helpfully provided by the is_active field in the standard User model.
  •     request.session[SESSION_KEY] = su_user.id
  • The next line of code is the one that actually does the work. The requesting user's session is modified so that its user ID is the impersonated user's.
  •   return HttpResponseRedirect(redirect_url)
  • The final line of code redirects the web browser off to the redirect_url page.

(Statutory disclaimer: I am not a security expert, nor do I play one on TV. Adding this to a production system is probably not a good idea.)


Another Django snippet: I finally discovered that the follow argument to the standard Manipulators allows you to list fields in the model that the form should leave untouched. Very helpful: the end result is more compact and less brittle than the code I'd put together to manually override all of the hidden fields.

[A:37385 B:3278 C:346 D:9187 E:62544 Total:112740]

Saturday, December 02, 2006

Mind Hack #39

[reading: Jón Árnason, Alan Boucher (translator), "Icelandic Folk Tales"]

One of the MindHacks that doesn't include an easy demonstration is Hack 39, where you're less likely to notice a trigger event if it occurs soon (say, <0.5s) after another trigger event.

(Now that I've got into the content rather than being distracted by the typesetting, the book is turning out to be rather good.)

My first attempt to test this out is below, but I have to say that the effect didn't seem all that strong to me—so maybe there's a bug or I've implemented it wrong. Or maybe I'm just too impatient to run it for long enough to get statistically significant data.

After hitting "Go", this applet will display letters for a tenth of a second each. You should hit a key whenever the letter displayed is either "X" or is displayed in white. To stop the applet, hit "Stop" or press the Escape key.

After running the applet, the bar graph shows the time between triggers (on the X axis) against how many triggers there were with that delay; the white is the total number shown, the black is the subset of them that were missed. Hitting "Go" again accumulates more data.

(Download source code)

Technical Details: letters are picked uniformly (so there's a 1/26 chance of a letter X trigger) and white is used for the colour 1/20 of the time. A trigger is considered hit if there's a keypress within 1 second after the trigger, but each keypress only counts once (so "trigger, trigger, keypress, letters…" would count as one hit then one missed trigger, with a delay of 1).

[A:37385 B:3278 C:346 D:9187 E:46445 Total:96641]

Thursday, November 30, 2006

*knolp*

[reading: William Faulkner, "As I Lay Dying"; recently Neil Strauss & Bernard Chang, "How To Make Money Like a Porn Star"]

Now that I've got Django installed and running, I've been setting up my first web application with it.

I spent a bunch of time yesterday trying to figure out how to get extra parameters passed through a generic view; in the end I had to UTSL to get a method that worked.

So of course today I find a nice page that concisely and coherently explains it, rather more quickly than the couple of hours it took me to figure out.

Perhaps I can suggest *knolp* (the reverse of *plonk*) as the sound of an RSS feed hitting my aggregator. (It looks like I'm not the first person to think of this.)

[A:37385 B:3278 C:346 D:9187 E:46320 Total:96516]

Saturday, November 25, 2006

&numl;

[reading: Derek Young, "Rock'n'Roll Dancing"]

Here's a question: is there a Unicode character for an N with an umlaut?

(Triggered by eating a splendid dessert of brownies and Häagen-Dazs ice cream, which made me think of that most famous of fake umlauts, the one in the name of "Spinal Tap").

Oh, and compounds like U+006E U+0208 (i.e. 0x6e 0xcc 0x88 in UTF-8) don't count.

[A:37385 B:3278 C:346 D:9187 E:39180 Total:89376]

Tuesday, November 21, 2006

Editing Hacks

[reading: Tom Stafford & Matt Webb, "Mind Hacks"]

Started reading "Mind Hacks", which looked interesting in the bookshop but the formatting is already starting to annoy me. In that respect, it's the worst book-that's-just-a-printout-of-a-cool-website ever, and that's up against some pretty stiff competition.

  • A large fraction of the hacks refer the reader off to movie files or Flash animations on the web. Not very helpful for a physical book that you might want to read on a train.
  • "Color: The second color is used to indicate a cross-reference within the text". Except that there is no second color—the relevant parts just come out in a hard-to-read light gray.
  • The halftoning for photographs and some of the diagrams is poor—it looks like the output of an 1980's laserprinter.
  • I don't know what system they used to produce the book, but it's generated some real oddities in linebreaking (mostly around URLs). Favourite so far: a line break after the decimal point in "3.3" (page 144).

Still, the content might be OK once I get into it, and at least they include lots of references to the original literature.

[25-Nov-06] Edited to clarify that I'm complaining about the formatting rather than the content. Feeling slightly guilty given that one of the authors came over and commented. You'd think I'd have learnt my lesson about who finds what on tha intarweb.

[A:37385 B:3278 C:346 D:9187 E:34361 Total:84557]

Thursday, November 09, 2006

One Year In

[reading: Seamus Heaney, "North"]

Just noticed: it's a year since the original idea for the company.

Thursday, August 17, 2006

The Truth Mines

In the Truth Mines, though, the tags weren't just references; they included complete statements of the particular definitions, axioms, or theorems the objects represented.

Every tunnel in the Mines was built from the steps of a watertight proof; every theorem, however deeply buried, could be traced back to every one of its assumptions. And to pin down exactly what was meant by 'proof', every field of mathematics used its own collection of formal systems: sets of axioms, definitions, and rules of deduction, along with the specialised vocabulary needed to state theorems and conjectures precisely.*

I came across a cool maths site today (via Good Math, Bad Math). I'd actually been thinking about doing it myself for the last few years, but it always looked a bit too much like hard work (even when I had Copious Free Time), so I never quite got around to it.

It's kind of eerie, though—the proof pages are almost exactly like the ones I'd mocked up and had in my head when I was playing with the project (although less XML/MathML-y). I guess there aren't really that many different ways you could present it, but it does feel like some sort of noodly appendage has sucked the idea right out of my head and served it up as a completed web page (conveniently skipping the tedious business of doing all the work).


* Greg Egan, "Diaspora", 1997