Jeff Atwood recently posted an amusing treatise about shuffling cards demonstrating how sometimes the simplest algorithm isn't the right one.
Remember those open-ended interview questions that make you think deeply about a problem? Sometimes you actually need to do that in your job; it's not just an interview trick!
Card shuffling made me think of a similar, more common, and more complex problem of the "Shuffle Playlist" or "Random Slideshow."
(In the 90's we used to joke that "Every application grows until it becomes its own operating system." During the bubble we used to joke that "Every application grows until it is capable of sending and receiving email." Today I think the joke should be: "Every physical device grows until it is capable of playing mp3's or displaying pictures at random.")
The goal of the shuffled playlist isn't the same as shuffling cards. With cards it's sufficient to have a "randomized deck," meaning that every possible ordering of cards has an equal likelihood of being output by the shuffle algorithm.
The reason playlists and slideshows are more complex is: The goal is to make a human feel like the order is random, not to have it actually be random.
What's the difference? Imagine you have 5 songs. The algorithm for choosing the next song is to select one of the songs with equal probability and play it. Obviously this is completely random mathematically. Chi-squared will be happy.
But a human won't like it. It won't be uncommon (20% of the time) for the same song to be played twice in a row. Sometimes (1%) it will be played three times in a row! But that's not really what you want when you put songs on shuffle.
With an image slideshow it's worse. You're expecting images to change every 5 seconds (say), and sometimes it seems to get "stuck" (because the same image comes up twice in a row). Crappy slideshow player! Get a new one.
The usual fix for this is to take the entire playlist and randomize the order of the songs, rather than randomizing each selection of song. This ensures we have the perception of random distribution of songs. In fact it's not random at all by any mathematical definition. It's an even distribution, but not random one.
But that's OK, it's perception that counts. But we're not done! What happens when we go through the playlist once? You'd probably propose we just repeat the algorithm -- reshuffle the playlist and keep going another round -- but it's now still possible to repeat a song. The last song from the first shuffle might be the same as the first song of the second shuffle. A repeat!
In fact, the perception is probably marred even if any songs match near the "crease" between two shuffles. For example, if the last song of the first shuffle matches the second song of the second shuffle, that's probably still too close to "sound random." "I just heard this song!" exclaims the disgruntled, indignant user. And rightly so you bastards!
So now what? With larger playlists, you could start with a full shuffle, then take the last N (or N%?) of the first shuffle and check the first N of the second, and if there are overlaps you could swap songs from that initial "unsafe" region into the "safe" region, at random, being careful of course to swap only with songs allowed in the unsafe region.
"Wait," you cry, after having digested Jeff's excellent reasoning for why selective swapping is clearly for dummies, "that's not random!" No, it's not, and no one cares, because it's doing the Right Thing.
OK, so this swapping idea isn't great -- you can't do my version in place (because you have to remember the unsafe songs) and it's complex. Here's a simple, in-place algorithm:
- S == number of songs, N == number of "unsafe" songs, meaning the last N of the first shuffle must not match any of the first N of the second shuffle.
- Starting with the first shuffle, shuffle songs 0 through S-N. (This ensures that, at least, the first N songs are evenly selected from the set of songs allowed to be there.)
- Now shuffle songs N through S. (This ensures that the unsafe songs are evenly distributed in the safe area.)
The keen eye will notice that if N > S/2, this algorithm fails. Indeed, you cannot satisfy the constraints in that case no matter what the algorithm because there's not enough "safe" songs to fill the "unsafe" region. As a simple example, if N=9 and S=10, we have only one song that is safe and nine unsafe slots to fill!
It's even worse for small playlists. With two items, the best you can do is alternate. With three, I would argue the best technique is to just keep any back-to-back's from happening -- anything more strict and you end up with zero variation, which doesn't feel like a "shuffle."
And don't think I'm being pedantic in calling out the cases of 2 and 3! That electronic picture frame on Grandma's sideboard might very well be displaying just 2 pictures, especially when you're first setting it up for her. First impressions are important!
So what's the lesson?
In most applications, human perception is more important than mathematical correctness. And sometimes it's worth digging into your interviewee skills to cover a problem as deeply as you can. In this case, a user unsatisfied with "shuffle" might not buy, might return the item, or might call tech support. A little thought and common sense could save many thousands of dollars.
Oh yeah,
that's why great developers are worth a lot of money....