How I Defeated Wordle with Python
05 January 2022
Wordle is a simple online word game. It is played by similar rules to the classic game Mastermind, which is known in the United States through the version licensed to Hasbro. Wordle is played with words instead of arbitrary configurations of pegs or numbers, and there is one added twist: each of your guesses must also be a word in Wordle’s dictionary, not any arbitrary configuration of letters.
For those unfamiliar with Mastermind, I recommend clicking through to try the game or reading this brief description of the rules: every day, the game randomly chooses a five-letter word from a predetermined list of possible words. On each turn, you guess a word, which must be in the game’s dictionary. If you guess the word within six turns, you win. The game tracks your streak of wins and the number of guesses it takes you to solve each puzzle.
For each guess, the game will tell you whether each letter
- does not exist in the correct answer at all
- exists in the correct answer, but is at the incorrect location
- is the correct letter and is at the correct location
Each letter in the guess has a one-to-one correspondence with the same letter in the answer, if there is one. This means that if the correct answer is “means”, and you guess “green”, you would see the following: “g” and “r” in grey, the first “e” in yellow, the second “e” in grey, and “n” in yellow.
The yellow color indicates that the chosen letter appears in the solution, but that it is in the wrong place in our guess. The second “e” appears in grey, because there is no second “e” in the solution. There must be a one-to-one correspondence between each guessed letter and a solution letter; if both “e”s were highlighted in yellow, the single “e” in the solution would correspond to two letters in the guess.
Likewise, a green letter, indicating a letter in the correct location, takes precedence over a letter in the incorrect location. So if the solution was “rules” and you again guessed “green”, you would see the following: “g” in grey, “r” in yellow, the first “e” in grey, the second “e” in green, and “n” in grey.
Here, as before, only one “e” is highlighted, but this time it is the second “e”, in green, rather than the first, in yellow.
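To make the two rules above concrete, here is a minimal sketch of the scoring logic. It is an illustration written for this explanation, not an excerpt from my game code; the function name score_guess and the letters C, P and A (for correct, present and absent) are choices made for the example.

```python
from collections import Counter

def score_guess(guess: str, solution: str) -> str:
    """Return one mark per letter: C (correct), P (present), A (absent)."""
    result = ["A"] * len(guess)
    # First pass: mark exact matches, and count the solution letters
    # that were NOT matched in place. Greens take precedence.
    unmatched = Counter()
    for i, (g, s) in enumerate(zip(guess, solution)):
        if g == s:
            result[i] = "C"
        else:
            unmatched[s] += 1
    # Second pass: hand out "present" marks from left to right, but only
    # while an unmatched copy of that letter remains in the solution.
    for i, g in enumerate(guess):
        if result[i] != "C" and unmatched[g] > 0:
            result[i] = "P"
            unmatched[g] -= 1
    return "".join(result)

print(score_guess("green", "means"))  # AAPAP
print(score_guess("green", "rules"))  # APACA
```

Doing the exact matches first is what gives green its precedence, and the per-letter counter is what keeps one “e” in the solution from lighting up two letters in the guess.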
I’d like to solve the puzzle
When I come across a puzzle like this, I’m immediately compelled to think about optimal ways to play it. For example, the player is in the same knowledge position at the beginning of the game every time. Since the opponent is not adversarial (a random five-letter word is picked, not a word intended to be problematic for any particular guessing strategy), the optimal strategy necessarily means guessing the same word first every time. Few human players likely play this way.
In fact, because the solution word is drawn from an unchanging list of words before the game begins, you can pre-calculate the optimal guess for each sequence of previous guesses and responses from the game, laying out every possible game state in a tree-shaped diagram.
I have written a solver for Wordle in Python which searches for an optimal (per several rules, see below) game tree and saves it in a JSON file, which comes to only about 15 KB when compressed. The optimal game tree for Wordle therefore turns out to be pretty small!
How the solver works
The ideal game tree would satisfy three constraints:
- Every guess is a word that the game will accept
- The maximum path to a solution is less than seven guesses
- The average path to a solution is minimized
To get a reasonable solution quickly, I have simplified these to two constraints:
- Every guess is a word that the game will consider using as a solution
- Each guess results in the smallest number of live possible solutions, on average.
A constraint to keep the maximum path under seven guesses turns out not to be necessary, because a solver with just the two constraints above will never take more than five.
Likewise, always guessing the word that will result in (on average) the smallest number of possible solutions is only an approximation of optimal guessing. There are 2314^6 = 153525361154699100736 different possible routes through the game, so brute forcing your way to an optimal solution, while not unthinkable as it is in chess, is probably unworkable in Python. Clever pruning of the search tree should help (and I’m going to look at this at some point), but is not as easy as it is in games with an opponent. In chess, you can simplify the tree by assuming that the opponent will always make the move that is worst for you. In Wordle, because the solution is random, you always have to consider all possibilities and try to find the guess that is optimal on average.
Still, though, reducing the average number of live solutions as much as possible is a very good approximation of ideal play. Consider the following situation:
After the second guess, the computer makes the interesting choice “pygmy”. This may seem counter-intuitive. Not only does it ignore the two vowels already confirmed to be in the word, “i” and “e”, it also contains “y” twice, so it tests only four distinct letters. In fact, the guess is rather astute. Whatever the response to this query, the computer will find the solution on the fourth guess.
Using ‘A’ to mean absent, ‘P’ to mean present, and ‘C’ to mean correct:
- AAAAA → liver
- AAAPA → timer
- AAPAA → giver
- AACAA → tiger
- PAAAA → viper
- CAAAA → piper
Being at a point in the game tree where there are six descendant guesses means that there are six possible solutions. Clearly, optimal behavior would be to always find the solution on the next turn. A brute force search would find this solution, but so would the heuristic of eliminating as many possibilities as possible. With “pygmy”, we always eliminate five possible solutions, the best result possible in this case.
Using a heuristic like this is much faster than a brute force search, because it generates a guess in each situation without needing any recursion at all.
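As a rough sketch of what such a heuristic can look like (this is not the code from solver.py, and the names feedback, expected_remaining and best_guess are invented for the example), the candidates can be grouped by the response each would produce. Assuming every live solution is equally likely, a group of size n is reached with probability n/N and leaves n live solutions, so the average is the sum of n² over all groups, divided by N.

```python
from collections import Counter

def feedback(guess, solution):
    """Score a guess as a string of C (correct), P (present), A (absent)."""
    # Solution letters not matched in place are available for "present" marks.
    spare = Counter(s for g, s in zip(guess, solution) if g != s)
    marks = []
    for g, s in zip(guess, solution):
        if g == s:
            marks.append("C")
        elif spare[g] > 0:
            spare[g] -= 1
            marks.append("P")
        else:
            marks.append("A")
    return "".join(marks)

def expected_remaining(guess, candidates):
    """Average number of live solutions left after this guess."""
    groups = Counter(feedback(guess, s) for s in candidates)
    return sum(n * n for n in groups.values()) / len(candidates)

def best_guess(pool, candidates):
    """Pick the guess from the pool with the lowest average; no recursion."""
    return min(pool, key=lambda g: expected_remaining(g, candidates))

# The six live solutions from the example above.
live = ["liver", "timer", "giver", "tiger", "viper", "piper"]
print(expected_remaining("pygmy", live))   # 1.0
print(expected_remaining("liver", live))   # 2.0
print(best_guess(live + ["pygmy"], live))  # pygmy
```

The guess pool here is a tiny stand-in for the full word list. Even so, it shows why “pygmy” wins: every response it can get points to exactly one word, while guessing “liver” leaves “timer”, “tiger” and “piper” indistinguishable.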
This was just a quick little project for me, taking a few hours, so the code is relatively unoptimized. Determining the best first guess requires searching every combination of possible guess and possible solution, and this takes several hours. The complete tree is generated in only a few minutes after that. Because it is stored as JSON, a player for the game is included that doesn’t have to do any searching: it simply reads the next guess out of the game tree.
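To illustrate how little work such a player has to do, here is a minimal sketch. The JSON layout shown (a "guess" key plus a "next" mapping from response to subtree) is a hypothetical one chosen for the example and is not necessarily the format of solution.json; the function name play is likewise made up.

```python
import json

# Hypothetical layout: each node holds a guess and, keyed by the game's
# response (A = absent, P = present, C = correct), the node to visit next.
TREE_JSON = """
{"guess": "pygmy",
 "next": {"AAAAA": {"guess": "liver"},
          "AAAPA": {"guess": "timer"},
          "AAPAA": {"guess": "giver"},
          "AACAA": {"guess": "tiger"},
          "PAAAA": {"guess": "viper"},
          "CAAAA": {"guess": "piper"}}}
"""

def play(tree, get_feedback):
    """Walk the tree, asking get_feedback(guess) for the game's response."""
    node = tree
    guesses = []
    while True:
        guess = node["guess"]
        guesses.append(guess)
        response = get_feedback(guess)
        if response == "CCCCC":
            return guesses
        # No searching: the next guess is simply looked up.
        node = node["next"][response]

tree = json.loads(TREE_JSON)
# Canned responses standing in for a game whose answer is "tiger".
responses = {"pygmy": "AACAA", "tiger": "CCCCC"}
print(play(tree, responses.get))  # ['pygmy', 'tiger']
```

In an interactive player, get_feedback would prompt the user to type in the colors the game showed. A response missing from the tree raises a KeyError here, which a real player would want to report more gracefully.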
The solution is found, on average, in 3.51 guesses. In the worst-case scenario, the solution is found in 5 guesses. The histogram of outcomes (number of guesses: number of puzzles) is as follows:
- 1: 1 (0.0%)
- 2: 65 (2.8%)
- 3: 1085 (46.9%)
- 4: 1074 (46.4%)
- 5: 90 (3.9%)
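The average and the percentages follow directly from these counts. The short sketch below recomputes them; it is a stand-alone check rather than the contents of stats.py, and the variable names are made up for the example.

```python
# Number of guesses -> number of puzzles solved in that many guesses,
# copied from the histogram above.
histogram = {1: 1, 2: 65, 3: 1085, 4: 1074, 5: 90}

total = sum(histogram.values())
average = sum(guesses * count for guesses, count in histogram.items()) / total

print(total)             # 2315
print(f"{average:.2f}")  # 3.51
for guesses, count in histogram.items():
    # The .1% format multiplies by 100 and keeps one decimal place.
    print(f"{guesses}: {count} ({count / total:.1%})")
```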
If anyone improves on my solution by utilizing the complete dictionary or achieving a brute force solve of the game, I would be curious to hear how much you manage to improve on these statistics.
Jan 11: I discovered that the program was treating all guesses that resulted in the same average reduction to the live possibilities as equivalent. We can, without violating the constraints given above, choose to prefer guesses that are also possible solutions to the current puzzle. Adopting this change reduced the average solution distance from 3.68 to 3.51 guesses.
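One way to express this kind of tie-break, sketched here under assumptions rather than taken from solver.py, is to compare guesses on a pair: the average number of live solutions first, then whether the guess is itself a live solution. The function name pick_guess and the shape of its arguments are invented for the example.

```python
def pick_guess(scores, live):
    """Choose the guess with the lowest score, preferring live solutions.

    scores maps each guess to the average number of live solutions it leaves;
    live is the set of words that could still be the answer.
    """
    # Tuples compare element by element, and False sorts before True,
    # so among equal scores a guess that is in `live` wins.
    return min(scores, key=lambda g: (scores[g], g not in live))

# Two live solutions remain. Both guesses below tell them apart, so each
# leaves one live solution on average, but only "tiger" can win immediately.
live = {"tiger", "timer"}
scores = {"pygmy": 1.0, "tiger": 1.0}
print(pick_guess(scores, live))  # tiger
```

Without the second element of the key, min would return whichever tied guess comes first, which in this example is “pygmy”, costing a guaranteed extra turn.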
All of my code for this project is open source, under the GPLv3 license, on GitHub. This includes:
- words.txt - a list of the words used in possible solutions on Wordle
- solver.py - uses words.txt to generate the solution (solution.json)
- solution.json - a pregenerated copy of the solution
- stats.py - analyzes solution.json to generate the stats given above
- player.py - an interactive prompt that uses the solution to play the game
- game.py - in case you want to get addicted to Wordle rather than play once a day, this game will generate endless puzzles for you to solve
All the Python code should run on recent versions of Python 3, including the PyPy implementation for additional speed.