### Bogo-Permute

So here's today's random question, apropos of, er, nothing. What's the performance of the following incredibly-poor algorithm for randomly permuting a list of numbers?

def bogo_permute(in_list): N = len(in_list) out_list = [] # Keep track of whether we have dealt with each element of the input list seen = [False] * N for iteration in range(N): while True: which = randrange(N) if not seen[which]: out_list.append(in_list[which]) seen[which] = True break return out_list

There are *N* selections needed for the output list. The first time through, the randomly selected
entry from the input list will definitely not have been processed already, so it can be transferred
across to the output list. The second time through, there's a chance that the random selection
process will select the same entry again; for later iterations, this chance of selecting something
that has already been selected will get higher and higher. So, on average, how many of these repeated
picks are needed?

Given a coin that comes up heads with probability

p(and so tails with probabilityq=1-p), what's the average number of coin tosses needed before the first heads comes up?As an example, imagine that the first head turns up on the fourth toss. This means that the sequence of tosses is TTTH. The probability of this sequence of coin tosses is then

q=^{3}p(1-p). This is easy to generalize:^{3}p

Number of Tosses Sequence Probability 1 H p2 TH qp3 TTH q^{2}p4 TTTH q^{3}p⋮ ⋮ ⋮ m T…TH q^{m-1}pWe want the average number of tosses needed to get a head, also known as the expected value. This is a sum of the each possible value multiplied by the probability of that value occurring. In other words, the average number of tosses needed is:

(1*(probability of heads on first toss) + 2*(probability of first heads on second toss) + 3*(probability of heads on third toss) + …)Put into maths, this means that the average number is an infinite sum:

Σ. From a trusty binomial expansion, the_{m=1}^{∞}m (q^{m-1}p)Σpart pops out as_{m=1}^{∞}m q^{m-1}(1-q), and since^{-2}q=1-pthis means that the whole thing simplifies down to just.1/p(This seems reasonable: for a fair coin, it takes an average of 2 throws to get a head; for a coin that only comes up heads one time in a hundred, it takes an average of a hundred throws.)

So back to the original problem. For the *i*-th selection (with *i* running from 1 to N),
the chance of picking an array
entry that is still there is *p=(N+1-i)/N*, and so the average number of picks needed is
*N/(N+1-i)*. The first selection needs *N/N = 1* attempt, the second selection
needs *N/(N-1)* attempts, the third selection needs *N/(N-2)* attempts, …, the last
selection needs *N/1 = N* attempts (on average).

Taken together, this means that the average number of random picks for the whole process is
*Σ _{i=1}^{N} N/(N+1-i)*. Rearranging (with

*j=N+1-i*) this is the same as

*N Σ*, which includes a partial sum of the harmonic series, known as a harmonic number

_{j=1}^{N}1/j*H*. In the large

_{N}*N*limit, the harmonic numbers tend to

*H*(where

_{N}≈ ln(N) + γ*γ*is a constant), and so the asymptotic behaviour of the algorithm is

**.**

*O(N ln(N))*
So it's bad, but not quite as bad as I'd expected (I'd guessed it would be *O(N ^{2})*).

## No comments:

Post a Comment