Engineering

The Graph Neural Network Behind Draft Sense

By Josh Brehm · 9 min read

Draft Sense watches your Magic: The Gathering Arena draft and recommends every pick in real time, then assembles your deck when the draft ends. Underneath the overlay is a graph neural network that reads an entire set as a web of relationships. Here’s how it works — and why the design choices behind it matter.

Drafting is a sequencing problem

A draft is three packs of cards, opened one at a time. You take a card, pass the rest, and repeat — around forty-five picks in all — until you have a pool to build a forty-card deck from. Every pick depends on two things: what you’ve already taken (your colors, your curve, the synergies you’re committing to) and what the packs imply about what the other seven drafters aren’t taking.

That is why a static pick-order list — “take card A over card B” — only goes so far. It is context-free. It can’t know that the eighth-best card in a vacuum is your best pick here because it is on-color and fills a hole in your curve, or that an unremarkable common is a powerhouse in the exact archetype you’ve fallen into. And on the day a new set releases, there is no list at all.

Draft Sense treats each pick as a decision conditioned on everything you’ve seen so far. To do that, it first needs a representation of the cards that understands how they relate to one another. That representation is a graph.

A set is a graph

A Magic set is not a flat list of a few hundred cards. It is a dense web. This creature wants that aura; this removal spell answers that bomb; these two cards share a keyword that defines an entire archetype; this land fixes the colors those gold cards demand. The information that matters lives in the relationships.

Graphs are the natural way to represent things and the connections between them, and graph neural networks (GNNs) are the natural way to learn over graphs. So for every set, Draft Sense builds one — and trains a network to reason on it.

The nodes

Every card in the set is a node. There is also one extra node: a single global set node connected to all the others, which we’ll come back to. Each card node carries a feature vector in two parts.

What the card does

A 384-dimensional embedding of the card’s rules text, type line, mana cost, and keywords, produced by a sentence-transformer language model. Two cards that do similar things land near each other in this space even when they share no exact wording — the model has read enough Magic text to know that “destroy target creature” and “exile target creature” are cousins.

What the card is

About sixty hand-built numbers describing the card’s structure: its mana value, its colored-mana requirements (fractional and hybrid-aware, so a cost of 1BB reads differently from 3B), its pip density, its types, its power and toughness, the keywords it carries, whether it produces mana.

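Notice what is missing from both parts: anything derived from play data. Every feature is structural and can be computed from the card itself. Giving up play data sounds like a handicap. It is actually the most valuable decision in the whole system; more on that shortly.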
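To make the colored-mana features concrete, here is a minimal sketch of how a cost might be turned into numbers. It assumes costs written in brace notation (`{1}{B}{B}` for 1BB), splits a two-color hybrid symbol evenly between its colors, and defines pip density as colored pips divided by mana value. The function name, the output keys, and those definitions are illustrative assumptions, not the actual Draft Sense feature code.

```python
import re
from collections import Counter

COLORS = set("WUBRG")

def mana_features(cost):
    """Structural features from a cost like '{1}{B}{B}' or '{2}{W/U}'.

    Assumption: only generic, single-color, and two-color hybrid symbols
    are handled. Symbols such as X are skipped, and monocolored hybrid
    ({2/W}) is not modeled correctly in this sketch.
    """
    symbols = re.findall(r"\{([^}]+)\}", cost)
    mana_value = 0
    pips = Counter()
    for sym in symbols:
        if sym.isdigit():
            mana_value += int(sym)  # generic mana
            continue
        halves = [part for part in sym.split("/") if part in COLORS]
        if not halves:
            continue  # unsupported symbol, ignored here
        mana_value += 1
        for color in halves:
            # a hybrid pip counts as half a pip of each color
            pips[color] += 1 / len(halves)
    colored = sum(pips.values())
    return {
        "mana_value": mana_value,
        "pips": {c: pips[c] for c in "WUBRG"},
        "pip_density": colored / mana_value if mana_value else 0.0,
    }

heavy_black = mana_features("{1}{B}{B}")   # 1BB
light_black = mana_features("{3}{B}")      # 3B
hybrid = mana_features("{2}{W/U}")
```

Under these definitions, 1BB has a pip density of two thirds while 3B has one quarter, which is the kind of difference the features are meant to expose.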

The global set node

One node stands for the entire set, with edges running to and from every card. As the network reasons, this node gathers a composition-aware summary of everything in the set — how much removal it holds, how much mana-fixing, how aggressive it skews — and broadcasts that context back out to every card.

So no card is judged in a vacuum. The same two-mana 2/2 means one thing in a slow, grindy set and another in a hyper-aggressive one, and the set node is how the model feels that difference.

The edges

Relationships are typed, and the model handles each kind with its own learned transformation, so “synergizes with” and “answers” never get confused. There are a handful of types:

  • Mechanical synergy — cards that share keywords or themes built to play together.
  • Semantic synergy — cards whose rules text sits close in that language-model space; a softer “these belong together.”
  • Interaction — directed edges for answers: which cards counter which, which removal handles which threats.
  • Membership — the set-node edges above, in both directions.
        +-----------------------------+
        |          SET node           |   aggregates the whole set,
        +-----------------------------+   broadcasts context to every card
           ^            ^            ^
           | in_set     | in_set     | in_set
           v            v            v
        +------+     +------+     +------+
        | Card |<--->| Card |<--->| Card |
        +------+     +------+     +------+
             synergy / counters / removal edges

Two node types and several edge types. The set node links to every card; cards link to each other through synergy and interaction edges.
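As a rough illustration of the edge types above, the sketch below builds a typed edge list for a toy set. It assumes mechanical synergy means "shares at least one keyword" and semantic synergy means "cosine similarity of the text embeddings above a threshold". The card names, the 0.8 threshold, the three-dimensional vectors, and the `has_card` label for the set-to-card direction are all hypothetical. Interaction edges are left out because deriving them needs rules-text analysis the post does not describe.

```python
import math
from itertools import combinations

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_edges(cards, sim_threshold=0.8):
    """cards: list of dicts with 'name', 'keywords' (set), 'text_vec' (list).

    Returns {edge_type: [(source, target), ...]}.
    """
    edges = {"mechanical": [], "semantic": [], "in_set": [], "has_card": []}
    for a, b in combinations(cards, 2):
        if a["keywords"] & b["keywords"]:
            # undirected synergy, stored as two directed edges
            edges["mechanical"] += [(a["name"], b["name"]), (b["name"], a["name"])]
        if cosine(a["text_vec"], b["text_vec"]) >= sim_threshold:
            edges["semantic"] += [(a["name"], b["name"]), (b["name"], a["name"])]
    for card in cards:
        edges["in_set"].append((card["name"], "SET"))    # card -> set node
        edges["has_card"].append(("SET", card["name"]))  # set node -> card
    return edges

toy_cards = [
    {"name": "flier_a", "keywords": {"flying"}, "text_vec": [1.0, 0.0, 0.0]},
    {"name": "flier_b", "keywords": {"flying", "lifelink"}, "text_vec": [0.0, 1.0, 0.0]},
    {"name": "spell_c", "keywords": set(), "text_vec": [0.9, 0.1, 0.0]},
]
toy_edges = build_edges(toy_cards)
```

Keeping each edge type in its own list is what lets a later layer apply a separate learned transformation per type.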

How the network reasons: message passing

A graph neural network works in rounds. In each round — one layer — every node looks at its neighbors, pulls a message from each one along each type of edge, pools those messages together, and updates its own vector. Sketched in pseudocode:

for each message-passing layer:
    for each card v:
        # gather one message per edge type, averaged over neighbors
        m = sum over edge types r of:
              mean over neighbors u of v along r:  message_r(h[u])

        # update the card's vector, keeping its own identity
        h[v] = h[v] + update(h[v], m)        # the + is a skip connection

One message-passing layer. message_r and update are small neural nets; the skip connection keeps each card's own identity from washing out as information mixes in.
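For readers who want to run the idea, here is a small pure-Python version of one layer. It is a sketch under simplifying assumptions: `message_r` is a single weight matrix per edge type, and `update` is `tanh(W_self · h[v] + m)`. The real message and update functions are small neural nets whose exact form the pseudocode does not specify. The node names and two-dimensional vectors are toy values.

```python
import math

def matvec(W, x):
    """Multiply a square weight matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def message_passing_layer(h, edges, W_rel, W_self):
    """One round of typed message passing.

    h:      {node: feature vector}
    edges:  {edge_type: [(source, target), ...]}
    W_rel:  {edge_type: weight matrix}, one transform per edge type
    W_self: weight matrix applied to the node's own vector
    """
    d = len(W_self)
    new_h = {}
    for v in h:
        m = [0.0] * d
        for r, pairs in edges.items():
            # message_r(h[u]) for every neighbor u with an r-edge into v
            msgs = [matvec(W_rel[r], h[u]) for u, target in pairs if target == v]
            if msgs:
                for i in range(d):
                    # mean over neighbors, summed across edge types
                    m[i] += sum(msg[i] for msg in msgs) / len(msgs)
        own = matvec(W_self, h[v])
        # skip connection: the update is added to the old vector
        new_h[v] = [h[v][i] + math.tanh(own[i] + m[i]) for i in range(d)]
    return new_h

identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]

h0 = {"SET": [0.0, 0.0], "card_a": [1.0, 0.0], "card_b": [0.0, 1.0]}
edges = {
    "synergy": [("card_a", "card_b"), ("card_b", "card_a")],
    "in_set": [("card_a", "SET"), ("card_b", "SET")],
    "has_card": [("SET", "card_a"), ("SET", "card_b")],
}
W_rel = {"synergy": identity, "in_set": identity, "has_card": identity}

h1 = message_passing_layer(h0, edges, W_rel, W_self=zeros)  # one hop
h2 = message_passing_layer(h1, edges, W_rel, W_self=zeros)  # two hops
```

After the first call, `card_a` keeps its own first coordinate and picks up a second coordinate from `card_b`, while the set node holds an average of both cards. The second call pushes that set-level summary back out to the cards.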

Stack a couple of these layers and information travels. After the first round, a card knows about the cards it directly connects to; after the second, about its neighbors’ neighbors — the local cluster, the shape of an archetype forming around it. What comes out the other end is a single vector per card that blends four things: what the card does, what it is, the set it lives in, and the web of cards surrounding it. Everything downstream reads that vector.

The trick that makes a brand-new set work

Here is the decision that ties the design together. Every learnable transformation in the network is pointwise: it operates on one fixed-width feature vector at a time, and nothing in it depends on how many cards the set has or which cards they happen to be. The message function takes one neighbor’s vector in and sends one message out, whether the set has two hundred cards or three hundred, whether it has ever seen those particular cards or not.

That makes the learned weights portable. Train the network across a couple dozen past sets, and the very same weights run, unchanged, on the graph of a set that released this morning. Combined with the structural-only features — every one of them printable on day one — this is what lets Draft Sense draft a new set the day it drops, with no play data in existence yet. As the community generates data over the following weeks, we fine-tune for that specific set; but the assistant is useful from the first hour.
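The portability claim is easy to check in miniature. In the sketch below, a single weight matrix `W` stands in for a learned message function, and the same `W` is applied to a 200-card set and a 300-card set. The width of 4, the random vectors, and the function names are placeholders chosen for illustration; the point is only that the parameter count depends on the feature width and not on the number of cards.

```python
import random

def message(W, h_u):
    """Pointwise: one fixed-width vector in, one fixed-width message out."""
    return [sum(w * x for w, x in zip(row, h_u)) for row in W]

def mean_message(W, neighbors):
    """Average the messages from any number of neighbors."""
    msgs = [message(W, h) for h in neighbors]
    d = len(W)
    return [sum(m[i] for m in msgs) / len(msgs) for i in range(d)]

rng = random.Random(0)
D = 4  # toy feature width; the real vectors are much wider
W = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(D)]

old_set = [[rng.random() for _ in range(D)] for _ in range(200)]
new_set = [[rng.random() for _ in range(D)] for _ in range(300)]

out_old = mean_message(W, old_set)  # weights applied to a 200-card set
out_new = mean_message(W, new_set)  # same weights, 300 unseen cards
n_params = sum(len(row) for row in W)
```

Because pooling is a mean over however many neighbors exist, nothing in `W` has to be resized or retrained when the card count changes.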

From card vectors to decisions

The GNN is the shared foundation. Two models read its output.

The picker

It builds a “what do I want right now” query from your current pool and where you are in the draft — your colors, your curve, your removal count, which pick this is — and then scores the pack. The important part is that it scores the pack as a whole, with attention across every card at once, so it can reason relatively: this is the best card here; or, I’d take A unless B were also in the pack. A model that rated each card in isolation could never express that. This one can.
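A stripped-down way to see why pack-level scoring matters: score each card against the query with a scaled dot product, then normalize with a softmax over the whole pack. This is a simplified stand-in for the attention described above, not the picker's actual architecture, and the query, card vectors, and names are made up for the example.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def score_pack(query, pack):
    """pack: {card_name: vector}. Returns {card_name: pick probability}."""
    scale = math.sqrt(len(query))
    logits = [sum(q * c for q, c in zip(query, vec)) / scale for vec in pack.values()]
    return dict(zip(pack, softmax(logits)))

query = [1.0, 0.0]  # hypothetical "what do I want right now" vector
pack_without_b = {"A": [1.0, 0.0], "C": [0.0, 1.0]}
pack_with_b = {"A": [1.0, 0.0], "C": [0.0, 1.0], "B": [3.0, 0.0]}

p1 = score_pack(query, pack_without_b)
p2 = score_pack(query, pack_with_b)
```

Card A is the top pick in the first pack and drops well behind B in the second, even though A's own vector never changed. A model that rated cards one at a time would give A the same score in both packs.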

The deck builder

When the draft ends, a Transformer reads your entire pool and predicts two things at once: which cards belong in the maindeck, and how many of each basic land to run. That turns a messy pile of forty-odd picks into a tuned forty-card deck, mana base and all.

How it learns

Both models train on a large corpus of real human drafts and the game results that followed, drawn from public data. The loss is outcome-weighted: drafts that went on to win count for more, though every draft contributes something. The result is a single generalist trained across many sets, which can be sharpened set-by-set as fresh data arrives.
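One plausible shape for an outcome-weighted loss is a weighted cross-entropy, where each draft's weight rises with its win rate but never falls below a floor. The sketch below assumes exactly that. The linear weighting, the 0.25 floor, and the function names are illustrative guesses, since the actual weighting scheme is not specified here.

```python
import math

def draft_weight(wins, losses, floor=0.25):
    """Weight for one draft: at least `floor`, rising linearly with win rate."""
    games = wins + losses
    win_rate = wins / games if games else 0.0
    return floor + (1.0 - floor) * win_rate

def weighted_pick_loss(examples):
    """examples: list of (model_prob_of_human_pick, wins, losses).

    Returns the weight-normalized negative log-likelihood.
    """
    total = 0.0
    weight_sum = 0.0
    for prob, wins, losses in examples:
        w = draft_weight(wins, losses)
        total += w * -math.log(prob)
        weight_sum += w
    return total / weight_sum

# Missing the pick from a 7-0 draft costs more than missing one from an 0-3 draft.
miss_winner = weighted_pick_loss([(0.1, 7, 0), (0.9, 0, 3)])
miss_loser = weighted_pick_loss([(0.9, 7, 0), (0.1, 0, 3)])
```

The floor is what keeps losing drafts in the training signal: their picks still count, just less than picks from drafts that went on to win.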

Why this shape

Three properties fall out of the design. It is context-aware — it sees your whole pool and the whole pack, never one card in isolation. It is synergy-aware — the graph is built from how cards actually relate. And it is launch-day-ready — structural features plus portable weights mean it works before the community has played a single game.

More than that, it is how we like to build at Mentalis: find the hidden structure in something that looks like noise — a pack of cards, a set of a few hundred — and make it legible. Draft Sense is that idea turned into software.

Machine Learning · Graph Neural Networks · Magic: The Gathering