Copious Free Time: Bogo-Permute

So here's today's random question, apropos of, er, nothing. What's the performance of the following incredibly-poor algorithm for randomly permuting a list of numbers?

def bogo_permute(in_list):
    N = len(in_list)
    out_list = []
    # Keep track of whether we have dealt with each element of the input list
    seen = [False] * N
    for iteration in range(N):
        while True:
            which = randrange(N)
            if not seen[which]:
                out_list.append(in_list[which])
                seen[which] = True
                break
    return out_list

There are N selections needed for the output list. The first time through, the randomly selected entry from the input list will definitely not have been processed already, so it can be transferred across to the output list. The second time through, there's a chance that the random selection process will select the same entry again; for later iterations, this chance of selecting something that has already been selected will get higher and higher. So, on average, how many of these repeated picks are needed?

Given a coin that comes up heads with probability p (and so tails with probability q=1-p), what's the average number of coin tosses needed before the first heads comes up?
As an example, imagine that the first head turns up on the fourth toss. This means that the sequence of tosses is TTTH. The probability of this sequence of coin tosses is then q³p = (1-p)³p. This is easy to generalize:

Number of Tosses Sequence Probability

1 H p

2 TH qp

3 TTH q²p

4 TTTH q³p

⋮ ⋮ ⋮

m T…TH q^m-1p

We want the average number of tosses needed to get a head, also known as the expected value. This is a sum of the each possible value multiplied by the probability of that value occurring. In other words, the average number of tosses needed is:

(1*(probability of heads on first toss) + 2*(probability of first heads on second toss) + 3*(probability of heads on third toss) + …)

Put into maths, this means that the average number is an infinite sum: Σ_m=1^∞ m (q^m-1p). From a trusty binomial expansion, the Σ_m=1^∞ m q^m-1 part pops out as (1-q)^-2, and since q=1-p this means that the whole thing simplifies down to just 1/p.

(This seems reasonable: for a fair coin, it takes an average of 2 throws to get a head; for a coin that only comes up heads one time in a hundred, it takes an average of a hundred throws.)

Number of Tosses	Sequence	Probability
1	H	p
2	TH	qp
3	TTH	q²p
4	TTTH	q³p
⋮	⋮	⋮
m	T…TH	q^m-1p

So back to the original problem. For the i-th selection (with i running from 1 to N), the chance of picking an array entry that is still there is p=(N+1-i)/N, and so the average number of picks needed is N/(N+1-i). The first selection needs N/N = 1 attempt, the second selection needs N/(N-1) attempts, the third selection needs N/(N-2) attempts, …, the last selection needs N/1 = N attempts (on average).

Taken together, this means that the average number of random picks for the whole process is Σ_i=1^N N/(N+1-i). Rearranging (with j=N+1-i) this is the same as N Σ_j=1^N1/j, which includes a partial sum of the harmonic series, known as a harmonic number H_N. In the large N limit, the harmonic numbers tend to H_N ≈ ln(N) + γ (where γ is a constant), and so the asymptotic behaviour of the algorithm is O(N ln(N)).

So it's bad, but not quite as bad as I'd expected (I'd guessed it would be O(N²)).

Copious Free Time

Monday, June 11, 2007

Bogo-Permute

No comments:

Blog Archive

Tags

Links