Two-player zero-sum games can be solved by following a very intuitive algorithm called **Regret Matching** ^{1}. Players adjust their action probabilities according to so-called regrets or advantages, which can be thought of as the consequences of alternative choices: how much better a player would have done by picking a different action. For a good overview of the topic, see the friendly yet detailed introduction by Neller and Lanctot ^{2}.
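In its simplest form, the rule keeps a running sum of regrets per action and plays each action with probability proportional to the positive part of its cumulated regret. A minimal standalone sketch of this update rule (the names are illustrative; the full implementation follows below):

```
import torch

def regret_matching(cum_regret):
    # play each action proportionally to its positive cumulated regret
    positive = torch.clamp(cum_regret, min=0)
    total = positive.sum()
    if total > 0:
        return positive / total
    # no positive regret yet: fall back to the uniform strategy
    return torch.full_like(cum_regret, 1.0 / cum_regret.numel())

print(regret_matching(torch.tensor([2.0, -1.0, 1.0])))  # tensor([0.6667, 0.0000, 0.3333])
```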

The following code snippet demonstrates an efficient PyTorch implementation for two-player zero-sum finite games and tests it on a game called **Rock-Paper-Scissors with Crash**. In this variant, scissors crashes against rock and the loss is doubled. This shifts the equilibrium in favour of “paper”, to the exact proportions 25%/50%/25%. Remarkably, the cumulated regrets also yield an upper bound on the approximation error, which vanishes under mild conditions as the averaged strategies approach the exact solution ^{3}. This, too, is demonstrated in the snippet.

```
import torch

## Game: Rock-Paper-Scissors with Crash, the equilibrium is 25%/50%/25%
N_ACTIONS = 3  # 0: rock, 1: paper, 2: scissors
# gain of the row player = loss of the column player
PAYOFF = torch.tensor([[0, -1, 2], [1, 0, -1], [-2, 1, 0]], dtype=torch.float).cuda()

## Utils
def getStrategy(cfr):
    """Return the strategy corresponding to observed regrets.

    Args:
        cfr (Tensor): counterfactual regret, shape (N_ACTIONS,)
    Returns:
        weights (Tensor): strategy, shape (N_ACTIONS,)
    """
    weights = torch.clip(cfr, 0, torch.inf)
    weights = weights / torch.sum(weights)
    # if no regret is positive, 0/0 yields NaN: fall back to the uniform strategy
    n_actions = weights.shape[0]
    weights = torch.nan_to_num(weights, nan=1 / n_actions)
    return weights

#@torch.jit.script
@torch.compile(mode='max-autotune')
def getEquilibrium(PAYOFF, N_ITER: int = 500, stochastic_advantage: bool = False):
    # auxiliary variables
    N_ACTIONS = PAYOFF.shape[0]
    cumCFR1 = torch.zeros(N_ACTIONS,).cuda()
    cumCFR2 = torch.zeros(N_ACTIONS,).cuda()
    cumStrategy1 = torch.zeros(N_ACTIONS,).cuda()
    cumStrategy2 = torch.zeros(N_ACTIONS,).cuda()
    strategy1 = torch.ones(N_ACTIONS,).cuda() / N_ACTIONS
    strategy2 = torch.ones(N_ACTIONS,).cuda() / N_ACTIONS
    # training loop
    for _ in range(N_ITER):
        # sample actions and observe regrets
        if stochastic_advantage:
            # a) stochastic variant, often implemented in tutorials
            action1 = torch.multinomial(strategy1, num_samples=1).squeeze()
            action2 = torch.multinomial(strategy2, num_samples=1).squeeze()
            cfr1 = PAYOFF[:, action2] - PAYOFF[action1, action2]
            cfr2 = -(PAYOFF[action1, :] - PAYOFF[action1, action2])
        else:
            # b) averaged variant: regrets against the expected payoff
            PAYOFF_avg = strategy1.view(1, -1).mm(PAYOFF).mm(strategy2.view(-1, 1))
            cfr1 = (PAYOFF.mm(strategy2.view(-1, 1)) - PAYOFF_avg).squeeze()
            cfr2 = (strategy1.view(1, -1).mm(PAYOFF) - PAYOFF_avg).squeeze() * (-1)
        # update strategies proportionally to the positive cumulated regrets
        strategy1 = getStrategy(cumCFR1)
        strategy2 = getStrategy(cumCFR2)
        # track cumulated regrets and strategies
        cumCFR1 += cfr1
        cumCFR2 += cfr2
        cumStrategy1 += strategy1
        cumStrategy2 += strategy2
    # the averaged strategies converge to a Nash equilibrium
    avgStrategy1 = cumStrategy1 / cumStrategy1.sum()
    avgStrategy2 = cumStrategy2 / cumStrategy2.sum()
    # estimate the approximation error (upper bound on exploitability)
    eps = 2 * torch.max(cumCFR1.max(), cumCFR2.max()) / N_ITER
    torch.cuda.synchronize()
    return (avgStrategy1, avgStrategy2, eps)

getEquilibrium(PAYOFF)  # eps < 0.03
```
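The 25%/50%/25% equilibrium can also be verified by hand: against the mixed strategy (0.25, 0.5, 0.25), every pure action of the opponent has an expected payoff of zero, so no deviation is profitable. A quick check on a CPU copy of the payoff matrix:

```
payoff_cpu = torch.tensor([[0, -1, 2], [1, 0, -1], [-2, 1, 0]], dtype=torch.float)
equilibrium = torch.tensor([0.25, 0.50, 0.25])
# expected gain of each row action against the equilibrium strategy
print(payoff_cpu @ equilibrium)  # tensor([0., 0., 0.])
```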

Regrets are expensive to compute in large games, but they can be approximated using modern machine-learning techniques. This approach has recently found many applications, including solvers for poker and even larger card games ^{4,5}.
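As a rough illustration of the idea (a minimal sketch, not the architecture of the cited solvers; the feature size and layer widths are invented for the example), the regret tables can be replaced by a small network that predicts regrets from an encoding of the game state, and its outputs plug into the same regret-matching rule as above:

```
import torch
import torch.nn as nn

class AdvantageNet(nn.Module):
    """Predicts cumulated regrets (advantages) from a state encoding."""
    def __init__(self, n_features: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )
    def forward(self, state):
        return self.net(state)

net = AdvantageNet(n_features=16, n_actions=3)  # sizes are illustrative
cfr = net(torch.randn(16)).detach()             # predicted regrets for one state
strategy = getStrategy(cfr)                     # same rule as in the snippet above
```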

- 1. Hart S, Mas-Colell A. A Simple Adaptive Procedure Leading to Correlated Equilibrium. *Econometrica*. Published online September 2000:1127-1150. doi:10.1111/1468-0262.00153
- 2. Neller TW, Lanctot M. An Introduction to Counterfactual Regret Minimization. Teaching Materials. Published 2013. Accessed 2023. https://www.ma.imperial.ac.uk/~dturaev/neller-lanctot.pdf
- 3. Waugh K. Abstraction in Large Extensive Games. *University of Alberta Libraries*. Published online 2009. doi:10.7939/R3CM74
- 4. Brown N, Lerer A, Gross S, Sandholm T. Deep Counterfactual Regret Minimization. In: *Proceedings of the 36th International Conference on Machine Learning (ICML)*; 2019.
- 5. Adams D. The Feasibility of Deep Counterfactual Regret Minimisation for Trading Card Games. *AI 2022: Advances in Artificial Intelligence*. Published online 2022:145-160. doi:10.1007/978-3-031-22695-3_11