How I defeated Wordle with Python

05 January 2022 · Adam Fontenot

Wordle is a simple online word game. It is played by similar rules to the classic game Mastermind, which is known in the United States through the version licensed to Hasbro. Wordle is played with words instead of arbitrary configurations of pegs or numbers, and there is one added twist: each of your guesses must also be a word in Wordle’s dictionary, not any arbitrary configuration of letters.

For those unfamiliar with Mastermind, I recommend clicking through to try the game or reading this brief description of the rules: every day, the game randomly chooses a five-letter word from a predetermined list of possible words. On each of your turns, you guess a word, which must be in the game’s dictionary. If you guess the word within 6 turns, you win. The game tracks your streak of wins and the number of guesses it takes you to solve each puzzle.

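For each guess, the game will tell you whether each letter is in the correct location (green), appears in the solution but in a different location (yellow), or does not appear in the solution (grey).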

Each letter in the guess has a 1-1 correspondence with the same letter in the answer, if there is one. This means that if the correct answer is “means”, and you guess “green”, you would see the following:

the word "green" with the first "e" and the "n" highlighted in yellow

The yellow color indicates that the chosen letter appears in the solution, but that it is in the wrong place in our guess. The second “e” appears in grey, because there is no second “e” in the solution. There must be a 1-1 correspondence between each guessed letter and a solution letter, and if they were both highlighted in yellow, the “e” in the solution would correspond to two letters in the guess.

Likewise, a green letter, indicating a letter in the correct location, takes precedence over a letter in the incorrect location. So if the solution was “rules”, we would see the following:

the word "green" with the "r" highlighted in yellow, and the second "e" highlighted in green

Here as before only one “e” is highlighted, but the second is in green rather than the first being in yellow, since the letter in a correct location takes precedence over a letter in an incorrect location.
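The two highlighting rules above can be sketched in a few lines of Python. This is only an illustration, not the code from my solver: the function name `score` is hypothetical, and the output uses the letters ‘C’ (correct, green), ‘P’ (present, yellow) and ‘A’ (absent, grey) that appear later in this post.

```python
from collections import Counter

def score(guess: str, solution: str) -> str:
    """Return one letter of feedback per guessed letter: C, P or A."""
    feedback = ["A"] * len(guess)
    # Solution letters that are not matched by a green remain available
    # to be paired (1-1) with yellow letters in the guess.
    unmatched = Counter(s for g, s in zip(guess, solution) if g != s)
    # First pass: greens take precedence.
    for i, (g, s) in enumerate(zip(guess, solution)):
        if g == s:
            feedback[i] = "C"
    # Second pass: hand out yellows left to right while letters remain.
    for i, g in enumerate(guess):
        if feedback[i] == "A" and unmatched[g] > 0:
            feedback[i] = "P"
            unmatched[g] -= 1
    return "".join(feedback)

print(score("green", "means"))  # AAPAP
print(score("green", "rules"))  # APACA
```

Doing the greens in a separate first pass is what makes a correctly placed letter win over an earlier misplaced copy of the same letter.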

I’d like to solve the puzzle

When I come across a puzzle like this, I’m immediately compelled to think about optimal ways to play it. For example, the player is in the same knowledge position at the beginning of the game every time. Since the opponent is not adversarial (a random five-letter word is picked, not a word intended to be problematic for any particular guessing strategy), the optimal strategy necessarily means guessing the same word first every time. Few human players likely play this way.

In fact, given that the solution word is drawn from an unchanging list of words before the game begins, this means you can just pre-calculate the optimal guess given each previous guess and response from the game, laying out every possible game state in a tree-shaped diagram.

I have written a solver for Wordle in Python which searches for an optimal (per several rules, see below) game tree and saves it in a JSON file, which comes to only about 15 KB when compressed. The optimal game tree for Wordle therefore turns out to be pretty small!

How the solver works

The ideal game tree would have 3 constraints:

I have made several simplifications in order to quickly get a reasonable solution:

A constraint to keep the maximum path under seven guesses turns out not to be necessary, because a solver with just the two constraints above will never take more than five.

As it turns out, the game actually has two word lists: one is the set of words it will consider using for solutions, just over 2000 words. The other is the full dictionary of words it will accept as guesses. I found both these lists in the JavaScript source code of the game. Clever use of the full list of guessable words would in some cases allow faster solves, but my solver is so efficient even without this that I haven’t seen fit to implement it yet. So my program only guesses words that could, in theory, be used by the game as solutions.

Likewise, always guessing the word that will result in (on average) the smallest number of possible solutions is only an approximation of optimal guessing. There are 2314^6 = 153525361154699100736 different possible routes through the game, and so brute forcing your way to an optimal solution, while not unthinkable as it is in chess, is probably unworkable in Python. Clever pruning of the search tree should help (and I’m going to look at this at some point), but is not as easy as it is in games with an opponent. In chess, you can simplify the tree by assuming that the opponent will always make the move that is worst for you. In Wordle, because the solution is random, you have to always consider all possibilities and try to find the guess that is optimal on average.

Still, though, reducing the average number of live solutions as much as possible is a very good approximation of ideal play. Consider the following situation:

A series of Wordle guesses: "raise", "fiend", "pygmy", followed by the correct solution, "tiger"

After the second guess, the computer makes the interesting choice “pygmy”. This may seem counter-intuitive: we already have two vowels confirmed to be in the word, “i” and “e”, and the guess contains “y” twice, which reduces the number of distinct letters it tests. As a matter of fact, however, this guess is rather astute. Whatever the outcome of this query, the computer will always find the solution on the fourth guess.

Using ‘A’ to mean absent, ‘P’ to mean present, and ‘C’ to mean correct:

AAAAA → liver
AAAPA → timer
AAPAA → giver
AACAA → tiger
PAAAA → viper
CAAAA → piper

Being at a point in the game tree where there are six descendant guesses means that there are six possible solutions. Clearly, optimal behavior would be to always find the solution on the next turn. A brute force search would find this solution, but so would the heuristic of eliminating as many possibilities as possible. With “pygmy”, we always eliminate five possible solutions, the best result possible in this case.
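To check the table above, here is a small sketch that groups the six remaining words by the feedback “pygmy” would receive against each of them. The helper names `score` and `partition` are hypothetical and chosen for this illustration; they are not taken from the solver’s source.

```python
from collections import Counter, defaultdict

def score(guess, solution):
    """Feedback per letter: C = correct, P = present, A = absent."""
    feedback = ["A"] * len(guess)
    unmatched = Counter(s for g, s in zip(guess, solution) if g != s)
    for i, (g, s) in enumerate(zip(guess, solution)):
        if g == s:
            feedback[i] = "C"
    for i, g in enumerate(guess):
        if feedback[i] == "A" and unmatched[g] > 0:
            feedback[i] = "P"
            unmatched[g] -= 1
    return "".join(feedback)

def partition(guess, candidates):
    """Group the candidate solutions by the feedback the guess would get."""
    groups = defaultdict(list)
    for word in candidates:
        groups[score(guess, word)].append(word)
    return dict(groups)

candidates = ["liver", "timer", "giver", "tiger", "viper", "piper"]

# "pygmy" puts every candidate in its own group, so one more guess suffices.
for pattern, words in partition("pygmy", candidates).items():
    print(pattern, words)

# A guess drawn from the candidates, such as "tiger", leaves a group of three.
print(max(len(words) for words in partition("tiger", candidates).values()))  # 3
```

Guessing “tiger” instead cannot distinguish “liver”, “viper” and “piper”, which is why a word that cannot itself be the answer is the better choice here.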

Using a heuristic like this is much faster than a brute force search, because it generates a guess in each situation without needing any recursion at all.
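A minimal sketch of such a heuristic follows. It rests on assumptions that may not match my actual code: the names `expected_remaining` and `best_guess` are hypothetical, the average is computed as the sum of squared group sizes divided by the number of candidates (each group is weighted by how likely a random solution is to fall into it), and ties are broken in favor of words that could still be the solution.

```python
from collections import Counter

def score(guess, solution):
    """Feedback per letter: C = correct, P = present, A = absent."""
    feedback = ["A"] * len(guess)
    unmatched = Counter(s for g, s in zip(guess, solution) if g != s)
    for i, (g, s) in enumerate(zip(guess, solution)):
        if g == s:
            feedback[i] = "C"
    for i, g in enumerate(guess):
        if feedback[i] == "A" and unmatched[g] > 0:
            feedback[i] = "P"
            unmatched[g] -= 1
    return "".join(feedback)

def expected_remaining(guess, candidates):
    """Average number of live candidates after seeing the feedback."""
    sizes = Counter(score(guess, word) for word in candidates).values()
    return sum(n * n for n in sizes) / len(candidates)

def best_guess(guess_pool, candidates):
    """Pick the guess with the smallest expected remainder, no recursion."""
    # Tuple key: lower average first; on ties, prefer a possible solution
    # (False sorts before True).
    return min(
        guess_pool,
        key=lambda g: (expected_remaining(g, candidates), g not in candidates),
    )

candidates = ["liver", "timer", "giver", "tiger", "viper", "piper"]
guess_pool = candidates + ["pygmy"]  # toy pool, not the real word list

print(expected_remaining("pygmy", candidates))  # 1.0
print(best_guess(guess_pool, candidates))       # pygmy
```

Each call to `best_guess` is a single pass over guesses and candidates, which is what keeps tree generation fast compared with a recursive search.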

Results

This was just a quick little project for me, taking a few hours, so the code is relatively unoptimized. Determining the best first guess requires searching every combination of possible guess and possible solution, and this takes several hours. The complete tree is generated in only a few minutes after that. Because it is stored as JSON, a player for the game is included that doesn’t have to do any searching; it simply reads the next guess out of the game tree.
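Reading a guess out of a stored tree could look something like the sketch below. The JSON layout shown here (a `"guess"` key plus a `"next"` mapping from feedback pattern to subtree) is an assumption made for illustration and may not match the file my solver writes; the fragment only covers the “pygmy” branch discussed earlier.

```python
import json

# Hypothetical tree fragment: each node holds the guess to make and,
# under "next", the subtree to follow for each feedback pattern.
TREE_JSON = """
{
  "guess": "pygmy",
  "next": {
    "AAAAA": {"guess": "liver"},
    "AAAPA": {"guess": "timer"},
    "AAPAA": {"guess": "giver"},
    "AACAA": {"guess": "tiger"},
    "PAAAA": {"guess": "viper"},
    "CAAAA": {"guess": "piper"}
  }
}
"""

def next_guess(tree, feedback_so_far):
    """Walk the tree along the feedback received and return the next guess."""
    node = tree
    for pattern in feedback_so_far:
        node = node["next"][pattern]
    return node["guess"]

tree = json.loads(TREE_JSON)
print(next_guess(tree, []))         # pygmy
print(next_guess(tree, ["AACAA"]))  # tiger
```

The player only does dictionary lookups, so playing from the tree is instant no matter how long the tree took to build.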

Stats:

The solution is found, on average, in 3.51 guesses. In the worst case, the solution is found in 5 guesses. The histogram of outcomes is as follows:

1: 1 (0.0%)
2: 65 (2.8%)
3: 1085 (46.9%)
4: 1074 (46.4%)
5: 90 (3.9%)
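The average and the percentages can be recomputed directly from these counts. The snippet below is just that arithmetic; the variable names are chosen for this illustration.

```python
# Number of solution words solved in each number of guesses (from the histogram).
histogram = {1: 1, 2: 65, 3: 1085, 4: 1074, 5: 90}

total = sum(histogram.values())
mean = sum(guesses * count for guesses, count in histogram.items()) / total

print(total)           # 2315
print(round(mean, 2))  # 3.51
for guesses, count in histogram.items():
    print(f"{guesses}: {count} ({count / total:.1%})")
```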

If anyone improves on my solution by utilizing the complete dictionary or achieving a brute force solve of the game, I would be curious to hear how much you manage to improve on these statistics.

Updates

Jan 11: I discovered that the program was treating all guesses that resulted in the same average reduction to the live possibilities as equivalent. We can, without violating the constraints given above, choose to prefer guesses that are also possible solutions to the current puzzle. Adopting this change reduced the average solution distance from 3.68 to 3.51 guesses.

Code

All of my code for this project is open source, under a GPLv3 license, on GitHub. This includes:

All the Python code should run on recent versions of Python 3, including the PyPy implementation for additional speed.

©2024 Adam Fontenot. Licensed under CC BY-SA.