Exploration vs Exploitation

A central concept in reinforcement learning (RL from now on) is the balancing of exploration and exploitation. In concrete terms; when your RL chess agent is about to make its next move should it

Pick the move it thinks is the best
Admit it might not know everything and pick another, possibly very different move

Always exploiting and picking the best move according to your current knowledge leads to local minima. The agent knows a couple of good moves and it will try to bend the world to fit its one good move.

Always exploring leads to fragmented and distracted play. The agent will appear to make no progress because it has no way to connect and make use of its previous knowledge.

The two stereotypes listed above are of RL agents, but most of us have run into humans with the same characteristic. It appears the balancing of exploration and exploitation is relevant to us as well. Maybe we could learn a thing or two from reinforcement learning.

The field of reinforcement learning have long figured out good values for this tradeoff. The simple strategy is to exploit 95% of the time. If you have to pick a single number this is the neighborhood you will want to be in. I guess it makes sense that Google abandoned their 20% time. Too much exploration.

A more nuanced strategy is to put the exploitation percentage even higher initially and then gradually decrease it to the 95%. The reasoning for this is that very exploitative agents tend to do well early one, they get good knowledge of reasonably effective strategies fast. They just tend to get stuck in them, hence the gradual decrease in exploitation percentage.

Translated to human terms this means that you should spend 5% of your time exploring new options. This comes out to one day per month. Or two weeks per year if that is your preference. Does your workplace support 5% exploration?

The second takeaway for humans is to “do what works” when learning something new. Don't try to reinvent the wheel before you understand the status quo. However once you do understand the status quo please try to reinvent the wheel. At least 5% of the time!