Instructions:
You have a total of 100 plays to maximize your reward.
At each play, select an arm to play and receive a reward of 0 or 1.
To help your decision, you have access to the following information:
- Rewards: The total reward from each arm.
- Plays: The total number of times you have played each arm.
- Estimated Probs: The estimated success probability of each arm, computed as total reward divided by total plays.
- UCBs: The upper confidence bound (UCB) of each arm, computed according to Auer et al. (2002).
Note that the 'Show UCBs' box needs to be checked in order for the UCBs to be shown and updated.