A case study in innovation

Innovation starts with solving personal problems. Every chess player writes down their moves on a chess scoresheet during a tournament game to analyze soon after. The annoying part is you have to record your moves once during the game, and then again into a computer to analyze the moves you played. On top of this, it’s super easy to keep highly disorganized computer files (keeping track of moves in our minds, unfortunately, doesn’t translate well to keeping track of papers in real life). Instead, we store loose scoresheets in the musty depths of a backpack — crumpled up to oblivion.

One year ago, my friend Alex and I were in the final round of a team chess tournament. With five minutes on the clock, humongous cameras from a local TV station flooded my surroundings. After a blitzkrieg of moves, I had him! Check… mate?

Turns out, I had sacrificed a rook for nothing, and the match was lost. Alex said, “Looks like another night in a dark room with Stockfish.” (A common chess engine). I didn’t have the will to open my laptop, type those moves in, and relive my darkest moments. But my drooping eyes suddenly jolted up: why not make a system that scans scoresheets and digitizes them? Alex was equally excited, and the quest began.

Introducing Reine

Today, we present Reine: an open-source chess scoresheet scanner that can easily be adapted for other handwriting scanning purposes. Chess players currently have two options: one is to write moves down while playing a game, then manually type them into a computer. This manual entry can take up to 30 minutes for a whole tournament, or 4–5 hours for a tournament director to enter all the games. The other option is to purchase a $500 digital chess recorder that has been approved by the US Chess Federation (MonRoi or Plycounter). Now, Reine allows you to scan a single game in less than 10 seconds.

The goal of this article is to teach how a complete, ready-to-ship machine learning and computer vision project can be put together. Also, while the program currently works in one specific domain for one custom scoresheet, our hope is that it will be clear how you can generalize this for any kind of high-accuracy automated form processing.

Preliminary Planning

Starting out, we asked ourselves a few key questions: Why hasn’t our idea been done before? What’s difficult about it? What can we do to mitigate those issues?

With scannable scoresheets for chess, the problem is accuracy. We didn’t want to go through the trouble of recognizing connected handwriting scrawled over the constipated boxes of conventional scoresheets like this one:

That would be difficult to build software for — and it’s just one variant of a conventional scoresheet. Creating software for multiple types would be nearly impossible. For these reasons, we decided to split up the moves into characters in a custom scoresheet, also allowing more vertical space for each character and some QR-esque boxes that will later be used for alignment. This is what it looks like:

This way, characters are written individually (and mostly) within the gridlines. A clean copy of our scoresheet is available here.

Building the Model

If you hadn’t guessed already, we’ll be using a CNN trained on EMNIST to recognize characters. But how can we maximize accuracy? With 62 classes, high accuracy is very difficult to achieve!

If you don’t know how chess notation works, please read an article like this one.

How did we decide that five characters per ply* were enough? A database of 20,000 Lichess games was analyzed for the number of characters per move, and we came up with this distribution:

*A ply is one move for one side, or half a move (e.g. “d4”)

We’re able to get 99.9% of moves with a maximum of five characters. We don’t want to ignore the longer moves, but this can be solved by looking at the general possible move formats for each number of characters:

One very useful pattern arises: if we ignore checks, the last two characters of any move MUST notate a square on the chessboard — a lowercase letter, followed by a number. Also, moves that are 6 characters long become only 5 characters long. Thus, we decided: do not write a “+” for checks on a Reine scoresheet!

Also, 99% of pawn promotions are to queens, so we assume that any pawn reaching either end of the board is promoted to a queen. This brings 7 character moves down to 4 character moves — and no move takes more than 5 characters to notate on a Reine scoresheet. With these adjustments, the longer moves won’t be ignored.

This is one example of the difference between copy pasting and new work. Think about new techniques and optimizations from a broader perspective of your end goal, with the whole project in mind, then get help from StackOverflow as you run into execution problems. There will be more examples of this kind of thinking throughout in the steps below.

Let’s get from a smartphone picture to a list of predicted characters first — the important part.

Step 1: Align Image

We want to use the “ArUco” markers in the four corners here to align this image, so that all we see are the moves. Let’s import everything we’ll need for the whole project:

We use the ArUco markers for pose estimation (finding the direction the paper is pointing in 3D space), and then correct for those values to make the scoresheet look face-on. We got some help with alignment from here, an attempt at a similar project. This is our two-step process:

  1. Find ArUco markers with a built-in function, 4x4 markers in this case. This function returns multiple values per marker (coordinates and marker ID), so the function get_data() gives us just coordinates.
  2. The OpenCV docs describe the get_transform() function: “For perspective transformation, you need a 3x3 transformation matrix…. To find this transformation matrix, you need 4 points on the input image and corresponding points on the output image.” The function cv.getPerspectiveTransform() finds the matrix, and cv.warpPerspective() uses it to adjust the image perspective.

Step 2: Remove Shadows

Now, we’ll use OpenCV to remove shadows from the image. We make a copy image and from this copy subtract only the pixels we want to keep (i.e. not shadows), and then subtract this copy image from the original image to get a de-shadowed image.

  1. Dilate & median blur the image to get rid of thin lines (text, handwriting, gridlines).
  2. Subtract this new image from the original image, essentially subtracting out shadows.
  3. Step 1 isn’t perfect, so everything will be lighter than the original image. Normalize to increase contrast.

Here is the code we used.

This is what we have now:

Step 3: Cut Up Boxes

Thankfully, others have come up with a method to detect gridlines and find contours after. Here is an overview and the edits we had to make.

1. Define horizontal and vertical kernels that you’ll essentially run over the whole image, looking for horizontal and vertical lines.

2. Create a new image out of those kernels combined.

3. Dilate and then Erode (i.e. close) all the lines. Contours that have bumpy borders or missing pixels won’t be detected properly by the findContours() function otherwise. This is our addition — because we cannot read what we cannot see, it’s imperative that ALL 500 boxes are accounted for (the number of possible characters on our scoresheet). We add these lines to our image before the findContours() function is used.

4. Finally, sort the boxes left-to-right, top-to-bottom. We sort the left and right halves of the scoresheet separately, of course. This is a replacement for the given get_contour_precedence() function in the Medium article, which only sorts top-to-bottom.

5. Sorting: The top-to-bottom sorting is done first with the function shown directly below. The variable row_y has 25 y-values (for 25 moves/rows) — the y-values of the contours will be close to exactly one of these values. From there, sorting is done left-to-right using boundaries given by cv.boundingRect() and separately for each half of the sheet.

And now, we’re here:

But EMNIST looks like this!

Step 4: Preprocessing Our Data

We’re lucky — the paper that summarizes the creation of the EMNIST dataset from the NIST government dataset (which contains natural characters similar to the ones we had cut up) explains how to preprocess each character:

We do the following:

  1. Resize characters to 138x138 Because boxes are taller than they are wide, we add horizontal whitespace to maintain character proportion.
  2. Invert the image.
  3. Threshold the input image, making it only black and white.
  4. Apply a Gaussian blur to reduce bumpiness and noise.
  5. Cut off some lines on the outside so that any of the gridlines aren’t included in the input to the model. Now we have a 128x128 image.
  6. Center the image by mass.
  7. Repeatedly delete one pixel from each side as long as all border pixels are black. Note: because less whitespace is removed for larger characters, they will have thinner strokes after we resize to 28x28, and vice versa. This is the reason for step 8.
  8. We add one step not done in the EMNIST paper: after removing black border pixels, we erode smaller images and dilate larger images so that stroke width is proportional to the size of the image. (This significantly improved results.)
  9. Add 2px border padding, as per the EMNIST paper.
  10. Resize the image to 28x28. Use the “bicubic interpolation” method to create variance in whites (i.e. the edge of a stroke is lighter than its center). This is important in character recognition because information about the edge of a stroke tells the model where the character ends.

This chunk of code contains all of the preprocessing, along with comments:


Step 5: Recognition

With 62 classes, the best accuracies for EMNIST we found were under 90%, even for the validation set. Real handwriting would be worse. Looking at the generalized possibilities ignoring checks, we can tackle this problem by limiting the classes to only the characters that could actually appear at each position in a chess move:

Here are the models we need, and the improvements from the original 4% random classification accuracy you would have with 25 classes:

Each model now has a heavily reduced number of classes — and therefore, higher accuracy. (Note: the “Letter” model does not include “x,” and the “number” model includes “O” to account for castling.)

Now we have characters that look like the EMNIST dataset, and we know exactly what classes to use for our CNN. We just have to train them. You can use any model you’d like: an ensemble, a single CNN, or experiment with different numbers of epochs.

Two aspects were particularly important for us, though: choosing only the necessary characters of EMNIST to train, and data augmentation. We used this Kaggle kernel, but with the following necessary adjustments:

1. Remove unnecessary characters from the training data. Recall that we need five different nets: Letters, Numbers, Pieces, Pieces or Letters, and Letters or X’s or Numbers. We illustrate how you might make the Letters or X’s or Numbers net only, since it’s the most complicated of the required models.

2. Data augmentation. Since people will write erratically during a chess game, we don’t know that the postprocessing will give us a perfect EMNIST-type fit. This is especially true if people write at angles or heavily weight a character to one side (messing with the mass-centering in preprocessing). So, we want augmentation to look something like this:

Tweak these numbers to figure out what amount of augmentation works best for your project.

Our code with heavy commenting can be found here, but other than the above changes it’s not much different from a standard EMNIST+CNN procedure.


Step 6: Postprocessing (last one!)

Our postprocessing is best explained by example:

The model gives us a confidence for each possible character in each box. It might say, N-90%, K-10%, B/R/Q-0% for one of the “piece” boxes. We check each possible move composed by characters to which our model assigns a high enough confidence. (We check at least the top 2 characters, regardless of confidence). For example, consider the string of moves 1. d4 d2. Black can’t play d2 on move 1! But we look at the next most likely predictions, perhaps d3 for white and d5 for black. The correct move should be one of the following:

1.d4 d2

1.d4 d5

1.d3 d2

1.d3 d5

Only 1.d4 d5 and 1.d3 d5 are valid, so we ignore the other two and continue to the next move.

Now, let’s examine the code, which can be found here:


Postprocessing consists of two main steps:

  1. Translate the list of character probabilities into real moves.
  2. Determine which combinations of moves form a valid game.

Step 1 is fairly intuitive. We start by filtering out unlikely characters for each box (as explained previously). We then use the itertools.product() function to find all the possible combinations of these characters (meaning, the moves).

Before we start checking whether these moves are valid, we further decrease the number of moves we have to check: we used five distinct models due to restrictions on which characters can be in a given position for a move, and there are additional restrictions based on the other characters in the move. (While “Bbc5” and “bxc5” are valid moves, “bbc5” is not!) The following function eliminates such illegal moves:

Next is step 2: finding all possible legal games. The Python Chess library [LINK] is useful here, and we take advantage of its Board objects, which represent a single chess position (chess players know these as FENs). We had mercy on our RAM (but sacrificed a little speed) by repeatedly calling a function to update global variables on a move-by-move basis.

In the following code, boards contains lists of “moves”, which are lists of all the boards that are valid for that move number (boards[0][0] was initialized to the starting position). The global pgn is structured similarly but contains the lists of moves made to arrive at those boards (Board objects store only position data).

Run check_iterative() once for each ply, and the result is your PGN. All that’s left is the formality of reformatting the moves into a nice PGN string, and finally a .pgn file, which looks like this:

It turned out we had two mistakes on this scoresheet, giving us over 96% accuracy. Fix a few moves, open in a chess engine GUI, and…


Thanks for reading! Email us at [email protected] or [email protected] if you’d like to get involved or have any questions. Our company email is [email protected], if you have any business inquiries. Good luck with your next revolutionary idea!

Here’s the website for the project:

Reine - Chess Scoresheet Scanner

You can find the open-source code here:


Chess.com blog post: chess.com/blog/ReineChess/scannable-scoresheets-free

Scannable Scoresheets (free!)

Hacker News post (#1 on Show HN!):

Show HN: Scannable Chess Scoresheets | Hacker News

~ Rithwik Sudharsan & Alex Fung

Editor’s Note: Ready to dive into some code? Check out Fritz on GitHub. You’ll find open source, mobile-friendly implementations of the popular machine and deep learning models along with training scripts, project templates, and tools for building your own ML-powered iOS and Android apps.

Join us on Slack for help with technical problems, to share what you’re working on, or just chat with us about mobile development and machine learning. And follow us on Twitter and LinkedIn for the all the latest content, news, and more from the mobile machine learning world.