# Approximate Nash Equilibrium by Regret Matching

Two-player games can be solved by following a very intuitive algorithm called Regret Matching [1]. Players modify their action probabilities according to so-called regrets, or advantages, which can be thought of as the consequences of alternative choices. For a good overview of the topic, see the friendly yet detailed introduction by Neller and Lanctot [2].
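The core update is simple enough to sketch in a few lines of plain Python (illustrative names only, not the implementation below): positive cumulative regrets are normalised into action probabilities, and if no regret is positive the player falls back to the uniform strategy.

```python
def regret_matching(cum_regrets):
    """Map cumulative regrets to a mixed strategy (regret matching)."""
    # keep only positive regrets
    positive = [max(r, 0.0) for r in cum_regrets]
    total = sum(positive)
    if total > 0:
        # play each action proportionally to its positive regret
        return [p / total for p in positive]
    # no positive regret: play uniformly
    n = len(cum_regrets)
    return [1.0 / n] * n

# Regretting action 1 twice as much as action 0, no regret for action 2:
print(regret_matching([1.0, 2.0, -0.5]))  # → [0.333..., 0.666..., 0.0]
```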

The following code snippet demonstrates an efficient PyTorch implementation for two-player zero-sum finite games and tests it on the game Rock-Paper-Scissors with Crash. In this variant, scissors crashes against rock and the loss is doubled. This shifts the equilibrium in favour of “paper”, to the exact proportions 25%/50%/25%. Remarkably, the regrets themselves can be used to estimate the approximation accuracy, which approaches the perfect solution under mild conditions [3]. This is also demonstrated in the snippet.

```python
import torch

## Game: Rock-Paper-Scissors with Crash, the equilibrium achieved with 25%/50%/25%

N_ACTIONS = 3  # 0: rock, 1: paper, 2: scissors
PAYOFF = torch.tensor([[0,-1,2], [1,0,-1], [-2,1,0]], dtype=torch.float).cuda()  # gain of row player / loss of col player

## Utils

def getStrategy(cfr):
    """Return a strategy corresponding to observed regrets

    Args:
        cfr (array): counter-factual regret, shape (N_ACTIONS,)

    Returns:
        weights (array): strategy, shape (N_ACTIONS,)
    """
    weights = torch.clip(cfr, 0, torch.inf)
    weights = weights / torch.sum(weights)
    N_ACTIONS = weights.shape[0]
    weights = torch.nan_to_num(weights, nan=1/N_ACTIONS)  # uniform if no positive regret
    return weights

#@torch.jit.script
@torch.compile(mode='max-autotune')
def getEquilibrium(PAYOFF, N_ITER: int = 500, stochastic_advantage: bool = False):
    # auxiliary variables
    N_ACTIONS = PAYOFF.shape[0]
    cumCFR1 = torch.zeros(N_ACTIONS,).cuda()
    cumCFR2 = torch.zeros(N_ACTIONS,).cuda()
    cumStrategy1 = torch.zeros(N_ACTIONS,).cuda()
    cumStrategy2 = torch.zeros(N_ACTIONS,).cuda()
    strategy1 = torch.ones(N_ACTIONS,).cuda() / N_ACTIONS
    strategy2 = torch.ones(N_ACTIONS,).cuda() / N_ACTIONS
    # training loop
    for _ in range(N_ITER):
        if stochastic_advantage:
            # a) stochastic variant, often implemented in tutorials:
            # sample actions and observe regrets
            action1 = torch.multinomial(strategy1, num_samples=1).squeeze()
            action2 = torch.multinomial(strategy2, num_samples=1).squeeze()
            cfr1 = PAYOFF[:, action2] - PAYOFF[action1, action2]
            cfr2 = -(PAYOFF[action1, :] - PAYOFF[action1, action2])
        else:
            # b) averaged variant
            PAYOFF_avg = strategy1.view(1,-1).mm(PAYOFF).mm(strategy2.view(-1,1))
            cfr1 = (PAYOFF.mm(strategy2.view(-1,1)) - PAYOFF_avg).squeeze()
            cfr2 = (strategy1.view(1,-1).mm(PAYOFF) - PAYOFF_avg).squeeze() * (-1)
        # update strategies proportionally to regrets
        strategy1 = getStrategy(cumCFR1)
        strategy2 = getStrategy(cumCFR2)
        # track cumulated regrets and strategies
        cumCFR1 += cfr1
        cumCFR2 += cfr2
        cumStrategy1 += strategy1
        cumStrategy2 += strategy2

    # averaged strategies converge to Nash Equilibrium
    avgStrategy1 = cumStrategy1 / cumStrategy1.sum()
    avgStrategy2 = cumStrategy2 / cumStrategy2.sum()
    # estimate approximation error (upper bound)
    eps = 2 * torch.max(cumCFR1.max(), cumCFR2.max()) / N_ITER
    torch.cuda.synchronize()
    return (avgStrategy1, avgStrategy2, eps)

getEquilibrium(PAYOFF)  # eps < 0.03
```
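The stated 25%/50%/25% equilibrium can also be verified analytically with a few lines of plain Python: at a Nash equilibrium of this symmetric zero-sum game, every pure action of the opponent must yield the same expected payoff (here zero).

```python
# The Crash payoff matrix (row player's gain) and the claimed equilibrium
PAYOFF = [[0, -1, 2], [1, 0, -1], [-2, 1, 0]]
eq = [0.25, 0.5, 0.25]

# expected row-player gain against each pure column action
values = [sum(eq[i] * PAYOFF[i][j] for i in range(3)) for j in range(3)]
print(values)  # each entry is 0.0, so no pure deviation is profitable
```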

Regrets are expensive to compute in large games, but they can be approximated using modern machine-learning techniques. This approach has recently found many applications, including solvers for Poker and even larger card games [4,5].

1. Hart S, Mas-Colell A. A Simple Adaptive Procedure Leading to Correlated Equilibrium. Econometrica. 2000:1127-1150. doi:10.1111/1468-0262.00153
2. Neller TW, Lanctot M. An Introduction to Counterfactual Regret Minimization. Teaching Materials. Published 2013. Accessed 2023. https://www.ma.imperial.ac.uk/~dturaev/neller-lanctot.pdf
3. Waugh K. Abstraction in Large Extensive Games. University of Alberta Libraries. 2009. doi:10.7939/R3CM74
4. Brown N, Lerer A, Gross S, Sandholm T. Deep Counterfactual Regret Minimization. 2019.
5. Adams D. The Feasibility of Deep Counterfactual Regret Minimisation for Trading Card Games. AI 2022: Advances in Artificial Intelligence. 2022:145-160. doi:10.1007/978-3-031-22695-3_11