Maru version 8.1

Maru is a computer Go program trained by deep reinforcement learning, starting from randomly generated game records. Its deep learning model combines large-kernel depthwise convolutions with multi-head attention, enabling it to efficiently grasp the overall state of the board. Its reinforcement learning procedure is inspired by the approaches used in KataGo and Gumbel AlphaZero, allowing it to learn a wide range of patterns efficiently. This page shows the improvement in playing strength through reinforcement learning and the results of self-play matches.

Maru is a sibling program of the computer Shogi program Gokaku. Maru shares the same deep learning model architecture, search algorithm, and reinforcement learning methodology as Gokaku.

This page shows the changes in playing strength during the reinforcement learning process of the latest version, Maru version 8.1, which is run with 500 visits. The Elo rating of each model is calculated from matches against 60 models of nearby generations. The Elo ratings of the baseline programs are calculated from their match results against Maru, so they are not objective indicators of each program’s actual strength.
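The page does not state the exact rating procedure, so the following is only a minimal sketch of one common way to turn match results into an Elo rating. It assumes the standard logistic Elo model with a 400-point scale and solves for the rating at which the expected total score equals the actual total score. The function names `expected_score` and `performance_rating`, the search bounds, and the input layout are hypothetical and are not taken from Maru's code.

```python
def expected_score(rating_a, rating_b):
    """Expected score of A against B under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def performance_rating(results, lo=-4000.0, hi=8000.0, tol=1e-6):
    """Estimate a rating from matches against opponents with known ratings.

    results: list of (opponent_rating, games, score) tuples, where score
    counts a win as 1 and a draw as 0.5 (an assumed input layout).
    Returns the rating whose expected total score matches the actual one.
    """
    total_games = sum(games for _, games, _ in results)
    total_score = sum(score for _, _, score in results)
    if not 0 < total_score < total_games:
        raise ValueError("rating is unbounded for a 0% or 100% score")

    # The expected total score grows monotonically with the rating,
    # so bisection finds the matching rating.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        expected = sum(
            games * expected_score(mid, opponent)
            for opponent, games, _ in results
        )
        if expected < total_score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

For example, `performance_rating([(1000, 110, 100)])` returns a value close to 1400, because a score of 100 out of 110 corresponds to a 400-point advantage in this model. Ratings obtained this way are relative to the chosen opponents, which is consistent with the caveat above about the baseline programs.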

The executable program and model files for Maru are available on GitHub. The progress of reinforcement learning for the previous version, Maru version 8.0, is shown here. If you have any questions, please contact Atsushi Takeda.

Ratings

Self-Play Games

History

2025/08/04: Reinforcement learning of Maru 8.1 has started from game records with random moves.

2025/08/04: Training of the 1st model, which consists of 4 blocks and 128 channels, has started.

2025/08/26: Training of the 1st model was stopped when the number of generated game records reached 2.5M.

2025/08/26: Training of the 2nd model, which consists of 8 blocks and 192 channels, has started.

2025/09/25: Training of the 2nd model was stopped when the number of generated game records reached 4M.

2025/09/25: Training of the 3rd model, which consists of 12 blocks and 256 channels, has started.

2025/10/27: Training of the 3rd model was stopped when the number of generated game records reached 5M.

2025/10/27: Training of the 4th model, which consists of 20 blocks and 384 channels, has started.