Pluribus AI dominates… by losing $70K to top pro poker players (bonus: Pluribus VPIP)

The Pluribus poker AI is the biggest AI and poker news of 2019, and we can actually find out how Pluribus plays poker. To great fanfare, Facebook and Carnegie Mellon University announced the domination of AI at 6-max NLHE poker. With marketing savvy, the article in the venerable Science magazine was published at the peak of the WSOP, on 11 July 2019. Domination was demonstrated in mbb/100 using AIVAT, a sophisticated variance (luck) removal algorithm that works for AI machines, not for humans. So luck removed for Pluribus, kept in for the humans, sort of thing.
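
For anyone fuzzy on the unit: mbb/100 means milli-big-blinds per 100 hands (1 big blind = 1,000 mbb). Here is a minimal Python sketch of the raw metric only; AIVAT itself goes much further and strips out luck by subtracting a counterfactual value baseline, which is not reproduced here. The $100 big blind in the example is my assumption for illustration, not a figure from the paper.

```python
# Raw win rate in mbb/100 (milli-big-blinds per 100 hands).
# This only shows the reporting unit; AIVAT additionally removes variance
# by subtracting a value baseline, which is not shown here.

def mbb_per_100(hand_results, big_blind):
    """hand_results: per-hand profit/loss in $ (or chips)."""
    total_bb = sum(hand_results) / big_blind        # total profit in big blinds
    return total_bb * 100_000 / len(hand_results)   # 1 bb = 1,000 mbb, per 100 hands

# Example: a $70,000 loss spread evenly over 10,000 hands at a $100 big blind
print(mbb_per_100([-7.0] * 10_000, big_blind=100))  # -7000.0 mbb/100, i.e. -7 bb/100
```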

Funnily enough, the researchers omitted to publish the actual results, i.e. the $70k loss this time, but did not hesitate to publish them in their previous venture, when Libratus beat the pros at heads-up NLHE two years ago. So although a bot is playing, we witness very human behaviour. It's like any random poker player: when they win, they brag about the results; when they lose, they blame it on variance or bad beats! I guess the difference is that these guys are actually able to prove it.

Anyway, the thing is, AIVAT is way beyond my understanding but widely recognised as valid in AI academic circles. So let's admit it: Pluribus AI does dominate.

It just shows the extraordinary variance of poker. In other words, over the 10k hands of the experiment (roughly 50 live sessions or 16 online sessions of a reasonable 6 hours each), you can be the best poker player in the world and still easily find yourself down $70k, wondering whether it is bad play or bad luck. Yeah, don't quit your day job.
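
To put a rough number on that, here's a back-of-the-envelope normal approximation in Python. The win rate, standard deviation and stakes are my assumptions for a very good 6-max player, not figures from the paper.

```python
# Back-of-the-envelope normal approximation of a 10,000-hand sample.
# All inputs are assumptions for illustration (a genuine 5 bb/100 winner,
# ~100 bb/100 standard deviation, $100 big blind), not figures from the paper.
from math import sqrt
from statistics import NormalDist

winrate_bb100 = 5.0       # assumed true win rate, bb per 100 hands
stdev_bb100 = 100.0       # assumed standard deviation per 100 hands
big_blind = 100           # $ per big blind
hands = 10_000

blocks = hands / 100                                   # number of 100-hand blocks
mean_usd = winrate_bb100 * blocks * big_blind          # expected $ result
sd_usd = stdev_bb100 * sqrt(blocks) * big_blind        # $ standard deviation

p_down_70k = NormalDist(mean_usd, sd_usd).cdf(-70_000)
print(f"EV ${mean_usd:,.0f}, std dev ${sd_usd:,.0f}, "
      f"P(down $70k or worse) = {p_down_70k:.1%}")     # ~11.5%
```

With those assumptions, a genuine 5 bb/100 winner still ends up $70k or more in the hole over 10k hands about one time in nine.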

The thing is, the previous advances in AI, the University of Alberta's Cepheus for heads-up limit and DeepStack for heads-up no-limit, plus CMU's Libratus, were a revolution for all decent poker players, introducing GTO and the Nash equilibrium to the masses and yielding innumerable threads on GTO vs exploitative strategies.

So the cool thing is that CMU did post the 10k hands, and they're all in my database, so I'll be writing a series of posts analysing how Pluribus plays poker (and how the pros play). 10k hands is not that many, but believe me, there are learnings: opening ranges, bet sizings, an easy SB opening strategy, whether the c-bet even matters, etc.

By the way, Pluribus doesn't play GTO strictly speaking, and its algorithms don't provably converge to a Nash equilibrium (there is no such guarantee in six-player games). Still, it is trying not to be exploited rather than to exploit. I do have some doubts about the post-blueprint real-time search: if it were tuned on hands played against humans, that might be slightly exploitative, but this is total speculation, and the researchers say Pluribus doesn't adapt to the other players.

In the first post, I'll focus on LJ RFI ranges (raise-first-in from the lojack, the first seat to speak pre-flop: what do we open with?).

And for now, just one stat: Pluribus's VPIP (voluntarily put money in pot, i.e. the percentage of hands where Pluribus decided to play/risk money) is 26.4%. No big surprise here; many of us are around 24%, and if we played post-flop perfectly, sure, we'd open a bit more…
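
For reference, this is roughly how a tracker arrives at that number. The hand-record format in the sketch is made up for illustration; it is not the format of the published Pluribus hand histories.

```python
# How a tracker counts VPIP. The hand-record format below is invented for
# illustration; it is NOT the format of the published Pluribus hand histories.

def vpip(hands, player):
    """Share of hands where `player` voluntarily put money in preflop.
    Calls and raises count; checking one's big blind or merely posting
    a blind does not."""
    voluntary = sum(
        1 for hand in hands
        if any(a in ("call", "raise", "bet")
               for a in hand["preflop_actions"].get(player, []))
    )
    return voluntary / len(hands)

sample = [
    {"preflop_actions": {"Pluribus": ["raise"]}},         # open-raise       -> counts
    {"preflop_actions": {"Pluribus": ["fold"]}},          # fold             -> no
    {"preflop_actions": {"Pluribus": ["check"]}},         # BB checks        -> no
    {"preflop_actions": {"Pluribus": ["call", "call"]}},  # call, call 3-bet -> counts once
]
print(f"VPIP: {vpip(sample, 'Pluribus'):.1%}")            # 50.0%
```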

6 thoughts on “Pluribus AI dominates… by losing $70K to top pro poker players (bonus: Pluribus VPIP)”

  1. Thanks for the article! Reading the full series. If the developers decided to use Pluribus on an online poker site, with such huge variance, do you think they would make tons of cash, or would it just not be worth it?

  2. So Pluribus learns in an empirical way and doesn't try to exploit other players… but does it recognise when a player is loose, for example? Would it adjust its ranges against Player 3 on the BN vs Player 4 on the BN, or does it just look at its own cards and apply the same strategy within its own standard context? It doesn't have an internal HUD that tells it how to adjust?

    1. The article in Science magazine says it doesn't adjust to other players, so I guess no. I don't totally believe it, because as Pluribus plays against humans it solves new decision points it hasn't seen before, and hence somehow takes other people's behaviour into account a bit.

      Anyway, according to the creators, the answer is nope.

      1. Considering how much inherent value there is in making these adjustments, particularly at a static 6-max table where you are playing the same players for a number of hands, it's even more remarkable that it was able to achieve that EV. I found it puzzling that in so many hands it played a huge overbet when it got to the river with the effective nuts, even though the humans normally responded by folding; bluffing with that bet size seems less than optimal. I know the theory is that the few times you get called make it worthwhile, and the fact that it kept doing it means it was profitable, but it seemed to get so many folds.

        1. I think this is a hard-earned, well-tested variance adjustment by the pros, i.e. most very good players fold the river more than theory says. Calling is +EV long term, but if it kills, destroys, damages the bankroll significantly, it is not worth the long-term gain.

          In the 2/5 live games and 100NL games I play, calling a big river bet is rarely a good thing for me, so I end up folding to more bluffs than I should. I think this is the case for most winning players. I don't really know how it goes at high stakes with the best players, of course, but I suspect it's similar… especially in huge games with $1M in the pot. It's just: OK, we lose a small edge, but we'll gain it back in other spots that hurt less, sort of thing. (Some rough numbers on the overbet maths below.)
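
To put the theory being discussed in figures, here are the standard toy-game numbers for a 2×-pot river overbet (pure indifference arithmetic, nothing measured from the Pluribus hands):

```python
# Standard river indifference numbers for an overbet, here 2x pot.
# Pure toy-game arithmetic, nothing measured from the Pluribus logs.
pot, bet = 1.0, 2.0

# How often the field must fold for an any-two-cards bluff to break even:
break_even_fold = bet / (bet + pot)        # risk `bet` to win `pot` -> 2/3 ~ 66.7%

# Equity the caller needs against the bet:
equity_to_call = bet / (pot + 2 * bet)     # call `bet` into a final pot of pot + 2*bet -> 40%

# Bluff share of a balanced betting range (keeps the caller indifferent):
bluff_share = bet / (pot + 2 * bet)        # also 40% for a 2x-pot sizing

print(f"folds needed: {break_even_fold:.1%}, equity to call: {equity_to_call:.1%}, "
      f"balanced bluff share: {bluff_share:.1%}")
```

So if opponents fold to the 2×-pot overbet clearly more often than ~67%, even a pure bluff at that size is immediately profitable, which fits what was noticed above; and over-folding by a few points only gives up a small edge, which is the trade-off I'm describing.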
