So here's today's random question, apropos of, er, nothing. What's the performance of the following
incredibly-poor
algorithm for randomly permuting a list of numbers?
def bogo_permute(in_list):
N = len(in_list)
out_list = []
# Keep track of whether we have dealt with each element of the input list
seen = [False] * N
for iteration in range(N):
while True:
which = randrange(N)
if not seen[which]:
out_list.append(in_list[which])
seen[which] = True
break
return out_list
There are N selections needed for the output list. The first time through, the randomly selected
entry from the input list will definitely not have been processed already, so it can be transferred
across to the output list. The second time through, there's a chance that the random selection
process will select the same entry again; for later iterations, this chance of selecting something
that has already been selected will get higher and higher. So, on average, how many of these repeated
picks are needed?
Given a coin that comes up heads with probability p (and so tails with probability q=1-p),
what's the average number of coin tosses needed before the first heads comes up?
As an example, imagine that the first head turns up on the fourth toss. This means that the sequence
of tosses is TTTH. The probability of this sequence of coin tosses is then
q3p = (1-p)3p.
This is easy to generalize:
Number of Tosses | Sequence | Probability |
1 | H | p |
2 | TH | qp |
3 | TTH | q2p |
4 | TTTH | q3p |
⋮ | ⋮ | ⋮ |
m | T…TH | qm-1p |
We want the average number of tosses needed to get a head, also known as the
expected value.
This is a sum of the each possible value multiplied by the probability of that value occurring. In other
words, the average number of tosses needed is:
(1*(probability of heads on first toss) + 2*(probability of first heads on second toss) + 3*(probability of
heads on third toss) + …)
Put into maths, this means that the average number is an infinite sum:
Σm=1∞ m (qm-1p). From a trusty
binomial
expansion, the
Σm=1∞ m qm-1 part pops out
as (1-q)-2, and since q=1-p this means that the whole thing
simplifies down to just 1/p.
(This seems reasonable: for a fair coin, it takes an average of 2 throws to get a head; for a coin
that only comes up heads one time in a hundred, it takes an average of a hundred throws.)
So back to the original problem. For the i-th selection (with i running from 1 to N),
the chance of picking an array
entry that is still there is p=(N+1-i)/N, and so the average number of picks needed is
N/(N+1-i). The first selection needs N/N = 1 attempt, the second selection
needs N/(N-1) attempts, the third selection needs N/(N-2) attempts, …, the last
selection needs N/1 = N attempts (on average).
Taken together, this means that the average number of random picks for the whole process is
Σi=1N N/(N+1-i). Rearranging (with j=N+1-i) this is the
same as N Σj=1N1/j, which includes a partial sum of the
harmonic series,
known as a
harmonic number HN.
In the large N limit, the harmonic numbers tend to HN ≈ ln(N) + γ
(where γ is a constant),
and so the asymptotic behaviour of the algorithm is O(N ln(N)).
So it's bad, but not quite as bad as I'd expected (I'd guessed it would be O(N2)).