Understanding Engine Evaluations

What you’ll learn: What +1.5 actually means, when to trust engine scores, and when they mislead. Reading time: 8 minutes

When an engine shows +1.47 or -0.83, what does that actually mean? The short answer is “the position is worth about 1.5 pawns to White” or “Black is slightly better.” But engine evaluations are more nuanced than that, and misunderstanding them leads to bad decisions.

This guide explains what the numbers mean, when to trust them, and when the evaluation is misleading.


The basics: centipawns and pawns

Engine evaluations are measured in centipawns. One pawn = 100 centipawns = 1.00 in the display.

  • +0.00 to +0.30: roughly equal; either side could be slightly better
  • +0.30 to +1.00: slight advantage; one side has an edge but the position is playable
  • +1.00 to +2.00: clear advantage; one side is better with correct play
  • +2.00 to +4.00: winning advantage; objectively won with accurate play
  • +4.00 and above: decisive; the game should be over soon

Negative numbers mean Black is better. #5 means White delivers mate in 5 moves; #-5 means Black delivers mate in 5.
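
If you script your own analysis, mapping raw scores onto these bands takes only a few lines. Here is a minimal Python sketch; the function name and band labels are illustrative, and the cut-offs simply mirror the list above.

    def describe_eval(cp=None, mate=None):
        """Turn an engine score into a human-readable label.

        cp   -- centipawns from White's point of view, e.g. 147 for +1.47
        mate -- signed mate distance, e.g. 5 for "#5", -5 for "#-5"
        """
        if mate is not None:
            side = "White" if mate > 0 else "Black"
            return f"{side} mates in {abs(mate)}"

        pawns = cp / 100.0                      # 100 centipawns = 1.00 pawns
        side = "White" if pawns >= 0 else "Black"
        magnitude = abs(pawns)
        if magnitude < 0.30:
            band = "roughly equal"
        elif magnitude < 1.00:
            band = f"slight advantage for {side}"
        elif magnitude < 2.00:
            band = f"clear advantage for {side}"
        elif magnitude < 4.00:
            band = f"winning advantage for {side}"
        else:
            band = f"decisive for {side}"
        return f"{pawns:+.2f} ({band})"

    print(describe_eval(cp=147))    # +1.47 (clear advantage for White)
    print(describe_eval(cp=-83))    # -0.83 (slight advantage for Black)
    print(describe_eval(mate=-5))   # Black mates in 5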

Where these numbers come from

Traditional engines assign values to various positional factors:

  • Material: Queen = 9-10, Rook = 5, Bishop = 3-3.5, Knight = 3, Pawn = 1
  • King safety: Exposed king, weak squares, attacking pieces nearby
  • Pawn structure: Passed pawns (very valuable), doubled pawns (slightly bad), isolated pawns (moderate weakness)
  • Piece activity: Centralised pieces, control of open files, bishop pair
  • Space and control: Territorial advantage, mobility

NNUE-based engines such as Stockfish learn their evaluation function from data with a small neural network rather than relying on hand-coded weights for these factors, but the output is still expressed in the same centipawn units.
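
To get a feel for how a hand-crafted evaluation sums weighted terms, here is a deliberately tiny sketch built on the python-chess library. It counts only material and a bishop-pair bonus; the piece values and the bonus are the rough textbook numbers above, chosen for illustration, and a real engine adds dozens more terms and tunes everything carefully.

    import chess

    # Rough textbook values in centipawns (assumed for illustration).
    PIECE_VALUES = {
        chess.PAWN: 100,
        chess.KNIGHT: 300,
        chess.BISHOP: 325,
        chess.ROOK: 500,
        chess.QUEEN: 900,
    }
    BISHOP_PAIR_BONUS = 30  # illustrative weight

    def toy_eval(board: chess.Board) -> int:
        """Material plus bishop pair, in centipawns from White's point of view."""
        score = 0
        for piece_type, value in PIECE_VALUES.items():
            score += value * len(board.pieces(piece_type, chess.WHITE))
            score -= value * len(board.pieces(piece_type, chess.BLACK))
        if len(board.pieces(chess.BISHOP, chess.WHITE)) >= 2:
            score += BISHOP_PAIR_BONUS
        if len(board.pieces(chess.BISHOP, chess.BLACK)) >= 2:
            score -= BISHOP_PAIR_BONUS
        return score

    board = chess.Board()
    board.remove_piece_at(chess.E7)    # give White a one-pawn material edge
    print(toy_eval(board) / 100.0)     # 1.0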


Evaluations are not predictions

Here’s the critical thing: an evaluation is not a probability of winning.

When Stockfish says +2.00, it means “this position is roughly equivalent to being two pawns ahead.” It does not mean “White will win.” The position might be extremely difficult to convert. It might require 40 moves of perfect technique. For humans, it might even be losing if the winning plan is too subtle.

Conversely, +0.50 might be an easy technical win if the advantage is simple to exploit—for example, a passed pawn in a rook endgame where the winning technique is well-known.

Win/Draw/Loss (WDL) percentages

To address this, modern engines also display WDL probabilities—estimated chances of Win, Draw, or Loss assuming both sides play at a high level.

For example, Stockfish might show:

  • Evaluation: +1.20
  • WDL: 55% / 40% / 5%

This means: “From this position, White wins about 55% of the time, draws 40%, and loses 5%—assuming strong play from both sides.”

WDL is more useful than raw evaluation for understanding practical chances, but it’s calibrated for engine-level play. Against a weaker opponent, your actual winning chances might be higher (or lower, if you’re misplaying the advantage).
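
The mapping from centipawns to expected results is roughly logistic: each extra pawn of advantage matters less the further ahead you already are. The Python sketch below shows the shape of the idea only; the constant k is an assumption chosen for illustration, and this is not Stockfish's own WDL model, which also takes factors such as remaining material into account.

    import math

    def expected_score(cp: float, k: float = 0.004) -> float:
        """Map a centipawn eval to an expected score: 1 = win, 0.5 = draw, 0 = loss.

        k controls how quickly the curve saturates; its value here is an
        assumption for illustration, not a calibrated engine parameter.
        """
        return 1.0 / (1.0 + math.exp(-k * cp))

    for cp in (0, 50, 120, 300, 600):
        print(f"{cp / 100:+.2f} -> expected score {expected_score(cp):.2f}")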


When evaluations are misleading

Closed positions

In locked pawn structures where the game will be decided by slow manoeuvring, engines are unreliable at lower depths. The evaluation might swing wildly as the engine calculates deeper and finds different long-term plans.

[Diagram: Closed position with locked pawns]

A position might show +1.00 at depth 20 and +0.20 at depth 40 because the engine eventually realises that neither side can make progress.

Opposite-coloured bishops

These endgames are notorious for producing wrong evaluations. A position might show +2.00 but be completely drawn because the defender can set up a blockade the attacker can never break.

[Diagram: Opposite-coloured bishops endgame]

The engine “knows” it’s up two pawns, but if those pawns can’t advance, the advantage is theoretical only. When you see a significant plus score with opposite-coloured bishops, check whether there’s actually a way to make progress.

Fortresses

A fortress is a defensive setup that can’t be broken even with a large material advantage. Classic example: the wrong-coloured bishop with a rook pawn. The defending king parks on the promotion corner, a square the bishop can never control, so even several extra pawns on that file can’t force a win.

[Diagram: Fortress position]

Engines often don’t recognise fortresses until very deep analysis. The evaluation might show +5.00 for dozens of moves before finally settling to 0.00 when the engine admits there’s no way through.

Opening tabiya positions

In well-known opening positions, the engine’s evaluation reflects objective assessment—but objective assessment isn’t everything. A position evaluated as +0.30 might be easy for White to play and difficult for Black. Another position at +0.30 might offer Black all the practical chances despite the slight disadvantage.

Opening evaluations are where you should trust statistics (how the position scores in practice) more than engine numbers.


Depth and stability

Engine evaluations become more reliable at higher depth, but there are diminishing returns.

  • Depth 15-20: Quick initial assessment. Good for blunder-checking.
  • Depth 20-25: Solid analysis for most positions.
  • Depth 25-35: Deep analysis; sufficient for almost everything.
  • Depth 35+: Only useful for truly critical positions or theoretical analysis.

Watch for evaluation instability. If the score is bouncing around significantly as depth increases, the position is probably complex and the “true” evaluation is uncertain. A stable evaluation that holds across depths is more trustworthy.
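
One practical way to check stability is to re-analyse the same position at several depths and look at the spread. Here is a sketch using python-chess; it assumes a local Stockfish binary at ENGINE_PATH, and both the path and the chosen depths are placeholders to adjust for your setup.

    import chess
    import chess.engine

    ENGINE_PATH = "/usr/local/bin/stockfish"   # assumed location of your engine binary

    def eval_by_depth(fen, depths=(15, 20, 25, 30)):
        """Return the centipawn eval (White's view) at several search depths."""
        board = chess.Board(fen)
        engine = chess.engine.SimpleEngine.popen_uci(ENGINE_PATH)
        try:
            results = {}
            for depth in depths:
                info = engine.analyse(board, chess.engine.Limit(depth=depth))
                # Mate scores are folded into a large centipawn value for comparison.
                results[depth] = info["score"].white().score(mate_score=100_000)
            return results
        finally:
            engine.quit()

    scores = eval_by_depth("r1bqkbnr/pppp1ppp/2n5/4p3/2B1P3/5N2/PPPP1PPP/RNBQK2R b KQkq - 3 3")
    spread = max(scores.values()) - min(scores.values())
    print(scores)
    print("unstable" if spread > 50 else "stable")   # more than half a pawn of drift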

Horizon effects

Sometimes an engine prefers a slightly worse position because the worse outcome is beyond its search horizon. This is less common with modern engines and high depths, but it still happens.

Classic example: the engine might avoid a sacrifice that leads to a slightly worse but completely drawn endgame, because the drawing method lies beyond its search depth and the draw isn’t visible yet. Instead it prefers a complicated position that is objectively worse, but whose bad outcome only appears deeper than it can currently see.


Practical advice

For game analysis

  • Don’t obsess over small differences. +0.23 vs +0.35 is noise.
  • Look for swings. Where did the evaluation change by more than 0.5? Those are the key moments (see the sketch after this list).
  • Check the engine’s line. A strong evaluation means nothing if you don’t understand why the position is better.
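
To make the swing check concrete, here is a small Python sketch that scans a list of per-move evaluations (in pawns, from White’s point of view, exported from whatever tool you analyse with) and flags jumps of more than half a pawn. The function name and the example numbers are illustrative.

    def find_swings(evals, threshold=0.5):
        """Return (move_number, before, after) for each eval jump above the threshold.

        evals -- evaluations in pawns from White's point of view, one per
                 half-move, starting after White's first move.
        """
        swings = []
        for i in range(1, len(evals)):
            if abs(evals[i] - evals[i - 1]) > threshold:
                move_number = i // 2 + 1           # half-move index -> move number
                swings.append((move_number, evals[i - 1], evals[i]))
        return swings

    # Illustrative evaluations from an imaginary game.
    game_evals = [0.2, 0.3, 0.1, 0.2, 1.4, 1.3, 1.2, -0.6]
    for move, before, after in find_swings(game_evals):
        print(f"Move {move}: {before:+.2f} -> {after:+.2f}")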

For opening preparation

  • Use evaluations as tiebreakers, not primary selection criteria.
  • A line evaluated at -0.20 might be fine if it’s easy to play and leads to positions you understand.
  • Check multiple engines. If Stockfish and Leela disagree significantly, the position is probably complex. Chessmate shows both side-by-side so you can see where the lines diverge.

For endgames

  • Use tablebases for positions with 7 or fewer pieces. These are mathematically solved; the evaluation is exact (see the sketch after this list).
  • Be sceptical of large plus scores in endings with reduced material. Check for fortress potential.
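
If you have Syzygy tablebase files downloaded locally, python-chess can probe them directly. In the sketch below the directory path is an assumption for your setup; the position is the wrong-coloured-bishop fortress mentioned earlier, where White is material up but the tablebase reports a draw.

    import chess
    import chess.syzygy

    TABLEBASE_DIR = "/path/to/syzygy"   # assumed location of your .rtbw/.rtbz files

    # Wrong-coloured bishop with doubled rook pawns: White is two pawns up,
    # but Black's king sits in the corner the bishop can never control.
    board = chess.Board("k7/8/8/8/8/P7/P7/K1B5 w - - 0 1")

    with chess.syzygy.open_tablebase(TABLEBASE_DIR) as tablebase:
        wdl = tablebase.probe_wdl(board)   # side to move: 2 win, 0 draw, -2 loss
        names = {2: "win", 1: "cursed win", 0: "draw", -1: "blessed loss", -2: "loss"}
        print(names[wdl])                  # draw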

For time trouble

  • Trust the evaluation more when you’re out of time and need quick decisions.
  • But trust your calculation if you see a concrete winning line the engine might miss due to horizon effects.

Summary

Engine evaluations are powerful tools, but they’re not truth. They’re estimates that become more reliable with depth but remain imperfect—especially in closed positions, fortress scenarios, and opposite-coloured bishop endings.

The number tells you who stands better, not who will win. Understanding the difference makes you a better analyst.