The Problem of RoShamBo

Aaron Davidson

CMPUT 499/605

October 1st, 1999

I. The Problem of RoShamBo

     In RoShamBo there is really only one problem to solve — predicting what your opponent will do next. However, for many opponents this will be either impossible (if they use any randomness) or next to impossible (if they are very complex). This inability to achieve a perfect prediction creates several sub-problems that must be solved. Is an opponent using randomness in some way, or have we simply not identified their deeper strategy? Dealing with a random or quasi-random opponent is very different from dealing with a non-random one. Failing to identify this characteristic in an opponent leads to a non-maximal strategy, and misidentifying an opponent as one type or the other can be very costly.

     In this paper, I will present two different approaches to this problem. The first uses a traditional approach, while the second is an attempt to evolve a solution with genetic programming.

*

II. Granite

     Granite went through three iterations of design and improvement. I will describe each version in detail, leading up to the final version 1.3 entered in the RoShamBo contest.

Granite I

      The first version of Granite was a very simple frequency counter which examined only a few obvious contexts. Each frequency was then added to a final frequency count to give a triple {r,p,s}, where each value was the sum of the observed games in which, in the current context, rock, paper, or scissors was played by the opponent. The maximum value of the triple was chosen as the prediction of the opponent's most likely next action, and Granite would then counter that predicted action.

      Granite I kept counters for a small handful of contexts. At depth zero, there were the overall totals for each type of action. At a depth of one, there was a total for each type of action given the opponent's last action, as well as one given Granite's last action. At a depth of two, there were counters for each action given the last two actions made by either Granite or the opponent. Even with such a simple prediction module, Granite scored fairly well, clobbering most simple bots.
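The counting scheme above can be sketched in Python. The class and method names here are mine, not Granite's, and the context set is simplified to depths zero through two; this is an illustration of the technique, not Granite's actual code:

```python
from collections import defaultdict

BEATS = {"r": "p", "p": "s", "s": "r"}  # action that beats each action

class FrequencyPredictor:
    def __init__(self):
        # counts[context][action] = times the opponent played `action`
        # when `context` had just been observed
        self.counts = defaultdict(lambda: {"r": 0, "p": 0, "s": 0})

    def contexts(self, my_hist, opp_hist):
        """Yield every context that applies to the current histories."""
        yield ("depth0",)                       # overall totals
        if opp_hist:
            yield ("opp1", opp_hist[-1])        # opponent's last action
        if my_hist:
            yield ("my1", my_hist[-1])          # Granite's last action
        if len(opp_hist) >= 2:
            yield ("opp2", tuple(opp_hist[-2:]))
        if len(my_hist) >= 2:
            yield ("my2", tuple(my_hist[-2:]))

    def observe(self, prev_contexts, opp_action):
        # credit the opponent's move to every context that was active
        for c in prev_contexts:
            self.counts[c][opp_action] += 1

    def act(self, my_hist, opp_hist):
        # sum every applicable context into one {r,p,s} triple
        total = {"r": 0, "p": 0, "s": 0}
        for c in self.contexts(my_hist, opp_hist):
            for a in "rps":
                total[a] += self.counts[c][a]
        predicted = max(total, key=total.get)
        return BEATS[predicted]  # counter the predicted action
```

Against a bot that always plays the same move, the summed counts quickly lock onto that move and the predictor counters it every game.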

Granite II

      With Granite I, the bottleneck was not in predicting simple opponents, but in being too predictable itself and thus open to attack by smarter opponents. Granite II added an elegant solution which retained the ability to relentlessly clobber predictable opponents while allowing Granite to defend itself moderately well against the tougher ones. This method is discussed in parts (1) and (3) of Granite III.

Granite III

     Granite III is a simple but fairly successful program consisting of three core systems. The first evaluates Granite's own play, the second attempts to analyze the opponent's play, and the third puts everything together and chooses an action.

(1) Self-Evaluation

      Self-evaluation consists of an analysis of our performance against the opponent. Granite II calculated its overall success against the opponent; Granite III was modified to observe only the success during the last twenty games, which gives a more accurate view of the current success rate. From this evaluation a noise factor is created. The noise factor is inversely proportional to Granite's success rate — as Granite's success at predicting the opponent decreases, the noise factor increases. The noise factor will be used later to introduce randomness into Granite's strategy. This has the effect of making Granite's play defensively random when its ability to predict the opponent suffers.
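One way to realize this sliding window is sketched below. The exact formula is an assumption on my part (noise = losses / window size), chosen because it matches the worked example in part (3), where losing 4 of the last 20 games gives a noise of 0.2:

```python
from collections import deque

class NoiseTracker:
    """Track losses over a sliding window and derive a noise factor.

    Assumed formula: noise = losses / window, so noise rises as the
    recent success rate falls (names are illustrative, not Granite's).
    """
    def __init__(self, window=20):
        self.results = deque(maxlen=window)  # True where we lost a game

    def record(self, lost):
        self.results.append(bool(lost))

    def noise(self):
        if not self.results:
            return 0.0
        return sum(self.results) / len(self.results)
```

The `deque(maxlen=...)` automatically discards results older than twenty games, so only current performance drives the noise level.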

(2) Contextual Frequency Calculation

      The contextual frequency calculations are done as described in Granite I, although Granite III adds a depth of three, as well as two other simple context counters of depth two which mix the last two observed moves between the opponent and Granite.

      Contexts are created from depth zero to three and are based on both the opponent's last three actions and Granite's own last three actions. For each context, the frequencies over all games are determined, and each context is summed equally to create a final probability triple for the likelihood of the opponent's next action. This module could be improved substantially by extending the contexts further and by weighting each context by significance rather than giving each an equal vote.
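One reading of "summed equally" is that each context's counts are normalized before being added, so every context casts one equal vote regardless of how many games it has seen. That interpretation, with an illustrative function name, looks like this:

```python
def combine_contexts(context_counts):
    """Combine per-context counts into one probability triple.

    context_counts: list of {action: count} dicts, one per context.
    Each context is normalized first, so every context gets an equal
    vote (an assumed reading of the paper's "summed equally").
    """
    total = {"r": 0.0, "p": 0.0, "s": 0.0}
    for counts in context_counts:
        n = sum(counts.values())
        if n == 0:
            continue  # a context with no observations casts no vote
        for a in "rps":
            total[a] += counts[a] / n
    norm = sum(total.values()) or 1.0
    return {a: total[a] / norm for a in "rps"}
```

Weighting by significance, as the paper suggests, would amount to multiplying each context's vote by a confidence factor before summing instead of weighting all votes at 1.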

(3) Action Generator

     In order to choose an action, a probability triple is created from the totals of each action. This is where we combine the noise factor from (1) with the final probability triple. Instead of choosing the most probable action immediately, the two lesser probabilities are multiplied by the noise factor, and the highest-probability action is assigned 1 minus the sum of the new lesser probabilities. A move is then selected from this reweighted triple with the help of a random spinner.

      Note that if the noise factor were zero, the strategy would be to always choose the move that beats the opponent's most likely action, just as Granite I did.

Example:

  (1) We see that we have lost 4 of the last 20 games.
      noise = 0.2

  (2) Triple generated:
      (r = 0.45, p = 0.30,  s = 0.25) 
 
  (3) The two smallest values are multiplied by the noise:
      (r = 0.45, p = 0.06,  s = 0.05) 
      The highest value is set to 1 - (0.06 + 0.05):
      (r = 0.89, p = 0.06,  s = 0.05) 
 
   Granite would then choose good ol' rock 89% of the time, 
   paper 6% of the time, and scissors 5% of the time.
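The reweighting step in the example above can be written directly (the helper name is illustrative):

```python
import random

def reweight(triple, noise):
    """Scale the two lesser probabilities by the noise factor and give
    the remainder to the most probable action."""
    best = max(triple, key=triple.get)
    out = {a: p * noise for a, p in triple.items() if a != best}
    out[best] = 1.0 - sum(out.values())
    return out

def spin(triple):
    """The 'random spinner': sample an action from the triple."""
    actions = list(triple)
    return random.choices(actions, weights=[triple[a] for a in actions])[0]
```

With the example's numbers, `reweight({"r": 0.45, "p": 0.30, "s": 0.25}, 0.2)` yields the triple (r = 0.89, p = 0.06, s = 0.05), and with noise = 0 the spinner would always land on the most probable action.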

(4) Experiments

      A few other experimental features were tested briefly before the RoShamBo International Contest deadline. Any experimental additions were removed if there was no observed improvement in my test trials.

     One of the first experimental features was a quick-and-dirty analysis of the significance of patterns detected in each context. Each counter was multiplied by a factor calculated from how many intervals (the interval size was a parameter) away from a random distribution the results were. It was surprising that this yielded no change in Granite's strength (in my tests run against older versions of Granite). Whether this was due to an error in my code, to my using ad hoc values rather than true standard deviation calculations, or to the change simply not addressing the overall weak link in Granite, is unknown.

      Another change which also surprised me by not improving Granite's performance was a smarter way to generate noise. Granite III generates its noise based on its success rate. The problem with this method is that if the noise level rises early on, due to difficulty in the first few games, it may never drop to a low level again, even if further observations would allow much more accurate predictions. With a high noise value, we have a high probability of not choosing our prediction; if our prediction ability has been growing but we have been playing with a high level of noise, we are likely to be only breaking even, which perpetuates the high noise level. The solution was to measure our success rate in predictions rather than in actual wins. By keeping track of our predictions regardless of what noise may have had us choose, the actual ability to predict our opponent is measured. In this case, high noise would make Granite play randomly until enough observations had been gathered to make successful predictions, at which point the noise level would drop.
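The variant described above can be sketched as follows: the raw prediction is scored against the opponent's actual move, not against the (possibly noise-perturbed) move that was actually played. The class name and the choice to start at full noise are my assumptions:

```python
from collections import deque

class PredictionTracker:
    """Derive noise from prediction accuracy rather than game wins.

    Scoring predictions directly means high noise cannot mask an
    improving predictor: accuracy rises, so noise falls.
    """
    def __init__(self, window=20):
        self.hits = deque(maxlen=window)  # True where prediction was right

    def record(self, predicted, opponent_played):
        # compare the raw prediction, ignoring what noise made us play
        self.hits.append(predicted == opponent_played)

    def noise(self):
        if not self.hits:
            return 1.0  # no evidence yet: play fully randomly
        return 1.0 - sum(self.hits) / len(self.hits)
```

Under the win-based scheme, the feedback loop runs through the noisy move choice; here it bypasses it, which is exactly why the noise can recover after a bad start.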

*

III. Critter

      Although the results for this project are not yet established, I will briefly go over what has been done to date. Critters are small programs written in a simple assembly language which runs on a virtual computer — a RoShamBo computer. Each critter has a memory space and a genetic code which, when executed on this virtual machine, has the effect of playing RoShamBo. A critter can even be thought of as a RoShamBo game function mapping game input into game actions.

The Critter Virtual Machine

      The virtual machine has a number of assembly instructions — everything from flow control, simple memory manipulation, and arithmetic to game-specific operations such as accessing the game histories. When a critter is executed, an associated energy value is initialized to the maximum energy. Each operation in the virtual machine uses up energy, and upon reaching an energy level of zero the program is forced to terminate. Programs may also terminate before their energy is used up, by explicitly returning an action or by running out of statements to execute. In the latter case, a special register is examined and translated into an action.
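A minimal sketch of such an energy-limited interpreter is shown below. The instruction set here (`SET`, `RETURN`, `JMP`) is invented for illustration; the real machine's opcodes, memory layout, and history-access operations are not specified in this paper:

```python
def run_critter(program, max_energy=100):
    """Run a critter program and return an action in {0, 1, 2}.

    program: list of (opcode, arg) tuples. Every executed operation
    costs one unit of energy; at zero energy the run is cut off.
    """
    action_register = 0   # examined if the program falls off the end
    pc = 0
    energy = max_energy
    while pc < len(program) and energy > 0:
        op, arg = program[pc]
        energy -= 1                  # every operation uses up energy
        if op == "SET":              # load a value into the action register
            action_register = arg % 3
        elif op == "RETURN":         # explicit return terminates early
            return arg % 3
        elif op == "JMP":            # flow control: jump to a statement
            pc = arg
            continue
        pc += 1
    # ran out of statements (or energy): translate the special register
    return action_register % 3
```

The energy budget is what makes evolution safe: even a critter whose genetic code is an infinite loop terminates, and (as in the fitness function below) wasteful programs pay for the energy they burn.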

The Fitness Function

      The evaluation criterion is simple and takes only a few forms. In self-play mode, the current population of Critters must play in tournaments against themselves. This type of evaluation is very computationally expensive, so it is disabled for most experiments. Instead, Critters are pitted against various hand-coded opponents. Each Critter's score is also adjusted by subtracting the length of its code and its average energy usage per game. Thus, not only are the strongest players selected, but ties are broken in favour of the smallest, fastest program. An objective evaluation of the population's fitness can be measured at each iteration by tracking the hand-coded opponents' overall scores.
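The adjustment amounts to a penalized score. The weights below are assumptions (the paper does not give them; equal unit weights are the simplest reading of "subtracting"):

```python
def fitness(game_score, code_length, avg_energy_per_game,
            length_weight=1.0, energy_weight=1.0):
    """Penalized fitness: raw tournament score minus code-size and
    energy-usage penalties. Unit weights are an assumption."""
    return (game_score
            - length_weight * code_length
            - energy_weight * avg_energy_per_game)
```

With this shape, two critters that score equally against the opponents are separated by their penalties, so selection drifts toward the smallest, fastest program among the strongest players.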

The Future

     It is too early to announce results — fine-tuning of the parameters, as well as some program infrastructure to save and load runs, remains to be finished. Only small trial runs have been completed. Early results indicate that it is very easy to quickly evolve Critters which can easily detect and distinguish a RockBot from a PaperBot from a ScissorsBot and play accordingly, but it will take much more time to evolve solutions that beat the more advanced opponents. Measures will also need to be taken to avoid the common situation of the population getting caught in local maxima. Once some of these issues are worked out, Critters will be set for the “big run” of several thousand generations with a large population.

*

IV. Summary

      RoShamBo was a completely surprising game to work with — games do not get much simpler than RoShamBo, yet the variety of tactics available in a tournament environment was staggering. The coding was trivial, but developing insight into the game and coming up with winning strategies was a challenge.


