What is this?

Have you ever wanted to share a crossword with someone and solve it together? My friends and I did, and we used to periodically send pictures of a page from a newspaper or book to each other. We'd message each other answers and have to ask them to take another picture of the crossword so we could more easily solve other clues.

This works, but your friends sometimes take a while to send you the picture. Sometimes, they can't send you another picture because they're too busy, but you want to spend more time on the puzzle. Waiting for them to send you another picture many hours later can break your solving stride.

This is where this project comes in. There are two main aspects to flow above. You need to be able to take a picture of a puzzle you have, and you need to be able to solve it in real-time with your friends. You can begin solving puzzles from the main page, and you can start digitising your puzzles with the Android client.

The main parts of the project are an Android client that lets you take a picture of a crossword, digitise, share and solve it and a server made from scratch (mostly) using Rust.

Creating an Android app for digitising crosswords

I split the process of digitising the crosswords into determining the structure of the crossword grid and detecting the hints. The structural information and the hint form a "clue". Users first capture the grid with one photograph with the help of a viewfinder that gives them live feedback by way of a contour which highlights where in the image the crossword may be located, and then are given an interface for extracting clues where they select a region of interest to scan. While it was conceivable that with one button press, I could digitise a crossword, in reality, it was much more reliable to let the user have a chance to redo the optical character recognition part of the algorithm if there were inaccurate scans.

I created a classical image processing pipeline using OpenCV to process a photograph of a crossword grid into a data structure. This algorithm was prototyped in Python and eventually translated into Kotlin (original repo) for the Android App. The diagram below gives an overview of how this algorithm works. This part of the process for digitising the crossword works fairly robustly and produces a "hint-less" version of the final puzzle.

The processing algorithm
Schematic of the processing pipeline for extracting the grid. First, the grid is found by looking for the largest square in the frame. Next, the cells within the grid are located. Finally, the cells are grouped into clues.

To digitise the hints for the clues, I used a ready-made implementation of optical character recognition (OCR) developed by Google. This was combined with regexes to determine how to map the text on a page with the clues in the data structure representing the puzzle. The hints often take a form like 10. Argue Score? (8,4) [1]. Since OCR sometimes mistakes characters like "5" for "S", the regexes for digitising the hints needed to account for several characters that were found to be problematic.

The Android client was made with the Jetpack compose framework and followed an MVVM architecture. The repo is here. The app has screens for scanning the puzzle, authentication/sign-up screens, and a screen for solving puzzles. A demo of the app is shown below.

Digitising crosswords with the app.

Creating the server for solving crosswords collaboratively in real-time.

There are plenty of frameworks with which you can make a server, but I decided to make my own from scratch using Rust. The Rust Book gives you a good starting point for making a multithreaded web server, so I based my project on this. On top of the server described in the Rustbook, I implemented a significant portion of the WebSocket protocol, a route handler to make it easy for me to add an endpoint with error handling, a response builder to make creating responses easy, a cookie-based authentication system, and a way to manage multiple users interacting with the same puzzle.

The source of truth for the puzzle is located on the server. An overview of the interactions between the client and the server is shown in the diagram below. If the puzzle is not being solved, it is stored on disk. If someone has established a WebSocket connection with the server for that puzzle, then that puzzle will be loaded into memory. Whenever the user updates one of the letters in the grid, they are sending a small message to the server over the WebSocket. The server receives this data, and it acquires a mutex on the crossword data structure, updates it, and broadcasts the change to all the connected clients. When the server detects that there are no more clients attached to the puzzle, the puzzle is serialised, and the on-disk puzzle is updated.

Each WebSocket connection established three threads. One for receiving messages from the client, one for sending messages to the client from the other clients, and one for sending ping/pong messages. It is implemented with threads and multi-sender-single-consumers rather than using an asynchronous Rust runtime. The client is removed from the puzzle pool if the ping/pong messages are not responded to or if a close message is received.

A doodle showing how the client and the server work.

The following types summarise how the route handler works.

type HandlerFn = fn(&HttpRequest, Arc<Tera>, TcpStream) -> Result<(), HandlerError>;
type RouteMapping = HashMap<&'static str, HandlerFn>;

The HandlerFn takes an enum that is designed to represent the HTTP verbs (but I never implemented them all), a Tera struct for rendering HTML and the TCP stream. It returns a result which is used to provide default error handling like 400/500 responses. The route mapping lets you specify a route with a regex. The interaction between an incoming TCP stream and the endpoint is managed by an API struct. A simple route (e.g. this page) can be set up with a relative amount of ease.

fn main() {
    ...
    let mut routes: RouteMapping = HashMap::new();
    routes.insert(r"^/about$", about_html);

    ...
    let tera_arc = Arc::new(tera);
    let api: Api = Api::register_routes(routes, tera_arc);
    ...
}
    
fn about_html(_req: &HttpRequest, tera: Arc<Tera>, mut stream: TcpStream)  -> Result<(), HandlerError> {
    let context = tera::Context::new();
    let contents = match tera.render("about.html", &context){
        Ok(contents) => contents,
        Err(error) => return Err(HandlerError::new(stream, Error::new(ErrorKind::Other, format!("{}",error))))
    };
    
    let response = ResponseBuilder::new()
        .set_status_code(StatusCode::Ok)
        .set_html_content(contents)
        .build();

    match stream.write_all(response.as_bytes()) {
        Ok(_) => Ok(()),
        Err(error) => Err(HandlerError::new(stream, error))
    }
}

That was a very brief overview of how the server works. Check out the repo if you're interested in looking at the code in more detail.

Closing words

This project is probably always going to be 80% done since I can always think of some other nice feature to add, some detail to finesse or a refactor that would improve the quality of the code. A non-exhaustive list in no particular order:

However, this project already has enough of the features I want, and I've got other projects I want to work on, so this project will be put into the feature-freeze mode for the time being.

[1] The answer is question mark.