[!info] title: Hidden Games: The Surprising Power of Game Theory to Explain Irrational Human Behaviour author: Moshe Hoffman, Erez Yoeli published: 2023 edition: 1 ISBN: 978-1529376845

Introduction

  • Identifying hidden games (Game Theory) to explain seemingly irrational human behaviours.
  • My objective is to form an intuitive understanding of how human behaviours can be explained using Game Theory, so I will skip over the algebra and calculations, and even some of the experimental evidence.
  • The book lays down the assumptions of each hidden game, then challenges each assumption to see whether it holds in reality, which helps to justify the model. The book also explores the Nash equilibrium and subgame perfect equilibrium of each game.

Some background concepts related to human behaviour:

  • Learning
    • Reinforcement learning helps us acquire specific skills
    • Social learning occurs through interacting with the community
    • Behaviours, beliefs, and preferences can all be shaped by learning
    • Lags: learned attributes persist over time, even when the conditions that made them appropriate no longer exist
    • Spillover: learned attributes are applied to other situations where the conditions that made them appropriate do not exist
  • Primary vs Secondary rewards
    • Primary rewards are things we have evolved to be motivated to pursue (food, shelter, health, comfort, safety, effort, time, prestige, power, sex), and they are often social.
    • Primary rewards are usually universally liked and evolutionary sensible.
    • Secondary rewards are things we find rewarding but we are not evolved to like them (fitness, conscious goals, psychological rewards, financial incentives).
    • For secondary rewards, we can learn to like them, therefore we can also unlearn them, making them suspiciously flexible.
    • Although financial incentives are powerful, they are not social.
  • Proximate-Ultimate distinction
    • Proximate answers are the explanations or justifications people give themselves for why they behave in a certain way or make certain decisions. You can dig deeper to find underlying explanations beneath proximate answers.
    • Ultimate answers are the final level of answers after digging deeper, or the good enough answers to justify the behaviours and decisions convincingly.
  • Emic-Etic distinction
    • Emic explanations are used by people within a community to justify the practices of the community
    • Etic explanations are used by people outside the community, to objectively justify the practices of the community

We are interested in understanding human behaviours driven by primary rewards explained using ultimate and etic answers.

  • Nash equilibrium
    • A state in which no player can gain by unilaterally deviating from their current strategy
    • Each player has optimised their actions, taking into account that the other players are also optimising theirs

Sex Ratio: The Gold Standard of Game Theory

  • In the game, assume a population of 100 players
  • Two types of players: male and female
  • Objective of players: each player wants to have as many grand-offspring (grandchildren) as possible
  • Actions players can take: Choose the sex of their offspring (simplified assumption)
  • We are interested in finding what will be the final sex ratio of the population
  • Assumptions:
    • Both male and female offspring are equally costly to produce
    • Mating is exogamous (no inbreeding)
    • Every male and female is equally likely to be chosen as mate
  • Nash equilibrium:
    • Occurs when the population maintains 1:1 sex ratio
  • Initial sex ratio not 1:1
    • If this were the starting state, the minority sex would be in higher demand and have more opportunities to mate
    • Parents will therefore choose to produce the minority sex, since it is more competitive
    • Over a few generations, the ratio will equalise at 1:1
  • Parents cannot choose sex of offspring (reality)
    • I don’t fully understand the book’s explanation of why the sex ratio still equalises at 1:1
    • I think the book says that the minority sex is more competitive and has more offspring
    • Therefore, those offspring inherit the innate ability/gene/attribute that gives a higher chance of producing the minority sex
  • If one sex is more costly to produce
    • If e.g. a female is 3x more costly to produce than a male, then each female must be expected to have 3x more offspring than each male to balance the tradeoff
    • This is observed in some animal populations, like ants (e.g. 1 queen: 8 male ants)
  • Optimisation process
    • The force is natural selection, so the currency is fitness (proxied by number of offspring)
    • If this game is about conscious decisions, then currency is likely pleasure or other factors
    • If the force is learning, then currency is primary reward
    • Lags and spillovers are more likely for behaviours shaped by learning or evolution than for conscious decisions.
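
The 1:1 argument can be checked with a little arithmetic. This is a minimal sketch, assuming every child has exactly one father and one mother, so the males as a group and the females as a group each account for all children of the next generation. The function names and the numbers are illustrative assumptions, not taken from the book.

```python
def children_per_parent(n_males, n_females, n_next_gen=100):
    """Average number of children per male and per female.

    Every child has one father and one mother, so each sex as a whole
    parents all n_next_gen children of the next generation.
    """
    per_male = n_next_gen / n_males
    per_female = n_next_gen / n_females
    return per_male, per_female


def best_sex_to_produce(n_males, n_females, cost_male=1.0, cost_female=1.0):
    """Which sex yields more grandchildren per unit of cost (assumed model)."""
    per_male, per_female = children_per_parent(n_males, n_females)
    son_value = per_male / cost_male          # grandchildren via a son, per unit cost
    daughter_value = per_female / cost_female  # grandchildren via a daughter, per unit cost
    if abs(son_value - daughter_value) < 1e-9:
        return "indifferent"                   # no parent gains by deviating
    return "male" if son_value > daughter_value else "female"


if __name__ == "__main__":
    print(best_sex_to_produce(30, 70))                    # males are the minority
    print(best_sex_to_produce(50, 50))                    # 1:1 ratio
    print(best_sex_to_produce(75, 25, cost_female=3.0))   # females cost 3x
```

With 30 males and 70 females the better choice is "male"; at 50:50 parents are indifferent, which is the equilibrium. When females cost 3x as much, the indifference point in this sketch moves to 75 males and 25 females, where each female has 3x the offspring of each male.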

Hawk-Dove Game (Rights)

  • Two players, contesting over some kind of resource
  • Each with two actions: Hawk (H) or Dove (D)
  • Payoffs for players
    • If both players choose Hawk, they may split the resource, but both pay the cost of aggressive contesting
    • If one player chooses Hawk and the other Dove, the Hawk player gains the resource, and the Dove player avoids paying the cost of contesting
    • If both players choose Dove, they split the resource
  • Nash equilibrium
    • If contesting is costly (more costly than the share gained by splitting the resource with the other player), then (H,D) and (D,H) are the only pure-strategy equilibria
    • Which player gets the resource may be based on arbitrary events (like who was at the location first), but that player is expected to play Hawk
    • Knowing that the first player will play Hawk, the second player will play Dove to avoid the cost of contesting
    • This practice then becomes a self-fulfilling expectation
  • Uncorrelated asymmetry
    • The arbitrary event that players condition on to decide who plays Hawk depends on many factors, like context, culture, precedent, efficiency
    • But the event has no relation with the payoffs of the game (uncorrelated)
    • And the event is asymmetrical because it can be used to differentiate between the players
  • Real World Asymmetries
    • Who was there first
      • Property rights in many countries and cultures are determined by who was there first
      • Society is conditioned to accept that those who were at a location first have the rights to that piece of land and its resources
      • Society also accepts that those who were there first will fight aggressively for their rights
    • Who has possession of the resource
    • Who built it
    • Some of these asymmetries were built into law and enforced as rights
  • Shared Expectations
    • People need to recognise the rights and expect them to be enforced, in order to behave as predicted by the theory
    • However, shared expectations depend on culture and environment (laws and rights differ across countries)
  • At the Workplace
    • If you are a leader or manager, you might be expected to play Hawk or Dove depending on who you are working with (your team members or your management)
    • If you find that you are stuck at work, you might not be grasping the shared expectation, or you might be playing the wrong move
  • Internalised Racism/Sexism and Stockholm Syndrome
    • People may be playing Dove as a form of self-preservation
    • The power asymmetries are real and can hurt the player
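
The equilibrium claim for Hawk-Dove can be verified by brute force. This is a minimal sketch, assuming a resource worth `v` and a contest cost `c` (both symbols are my own labels), with Hawk vs Hawk paying each player `v/2 - c`. It checks every action pair for a profitable unilateral deviation.

```python
from itertools import product


def hawk_dove_payoffs(v, c):
    """Payoff table {(p1_action, p2_action): (p1_payoff, p2_payoff)}.

    Assumed payoffs: Hawk vs Hawk splits the resource and both pay c;
    Hawk vs Dove gives the Hawk everything; Dove vs Dove splits.
    """
    return {
        ("H", "H"): (v / 2 - c, v / 2 - c),
        ("H", "D"): (v, 0),
        ("D", "H"): (0, v),
        ("D", "D"): (v / 2, v / 2),
    }


def pure_nash_equilibria(payoffs, actions=("H", "D")):
    """Action pairs from which neither player gains by deviating alone."""
    equilibria = []
    for a1, a2 in product(actions, repeat=2):
        u1, u2 = payoffs[(a1, a2)]
        # Player 1 considers every alternative while player 2 stays put.
        p1_stays = all(payoffs[(d, a2)][0] <= u1 for d in actions)
        # Player 2 considers every alternative while player 1 stays put.
        p2_stays = all(payoffs[(a1, d)][1] <= u2 for d in actions)
        if p1_stays and p2_stays:
            equilibria.append((a1, a2))
    return equilibria


if __name__ == "__main__":
    print(pure_nash_equilibria(hawk_dove_payoffs(v=10, c=8)))  # costly contest
    print(pure_nash_equilibria(hawk_dove_payoffs(v=10, c=2)))  # cheap contest
```

With a costly contest (`v=10, c=8`) the only pure-strategy equilibria are (H, D) and (D, H). When the contest is cheap (`v=10, c=2`), both players fight and (H, H) is the only one.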

Costly Signalling (Aesthetics)

  • Two players: Sender and Receiver
    • Sender may be of High type (desirable) or Low type, which is predetermined with some probability
  • Payoffs for players
    • Sender wants to be accepted by Receiver. Fixed payoff regardless of type.
    • Receiver wants to accept Sender of High type. The payoff is lower for accepting a Low type.
  • Actions for Sender
    • Sender can send a costly signal to Receiver
    • The cost is higher for a Low type than a High type Sender
  • Nash equilibrium
    • The Sender sends a signal only if it is the High type
    • The Receiver accepts a Sender only if the Sender sends a signal
  • Analysis
    • The signal must be easier to send for the High type, or more difficult to send for the Low type, for it to be effective
    • The Low type Sender will avoid sending the costly signal as the payoff of getting accepted is not worth the cost
    • If the signal becomes easier to send for all Senders, it will cease being used as it loses its effectiveness
  • Peacock and Peahen
    • The peacock’s large tail is a costly signal, which poses a danger to the peacock (affects hunting and hygiene)
    • It signals to the peahen that a peacock with a larger tail is fitter and more adept at surviving (High type)
    • Which is why the peahen chooses to mate with the peacock with the larger tail
    • A less fit peacock unlucky enough to be born with a large tail will have a hard time surviving long enough to reach maturity and mate
  • Luxury goods
    • People infer desirable traits from someone carrying luxury goods; those traits are otherwise hard to observe
    • The signal is wasteful (e.g. a Rolex watch doesn’t functionally perform better than a Casio G-Shock, and leather bags are less durable than plastic bags)
    • The signal is less costly for the High type (supposedly rich and wealthy people)
    • People like the signal less if it becomes easier for others to send the same signal
  • Fattening
    • Some cultures in South Nigeria practise fattening the bride before a marriage
    • It signals that the family can afford to fatten the bride in an environment where food is scarce
  • Sugar and Spice
    • In late 17th century Europe, only wealthy families could afford to use sugar and spices in their meals
    • But as the prices of sugar and spices fell and they became easily accessible to everyone, the wealthy families also changed their preferences and tastes
  • Long pinky nail
    • A practice in olden days China, Thailand, Northeast India and Egypt
    • It signals that these men don’t need to be involved in hard labour (must be wealthy or of certain status in the society)
  • Pale skin
    • Valued in East and Southeast Asia
    • Women who can maintain pale and fair skin signal that they don’t need to work in the sun
  • White dress shirt
    • Signals a white-collar job or wealth
    • The environment allows these people to easily keep their shirts clean and white
    • This contrasts with blue-collar jobs, where people dress in blue overalls and blue jeans to hide dirt and oil stains
  • Authenticity in art
    • Objectively speaking, there isn’t much functional difference between authentic and imitation art
    • But being able to consume or possess authentic art signals wealth, as it is not accessible to most people
  • Etiquette
    • Signals that you have been brought up well, which usually only occurs in well-to-do families
  • Wine connoisseurship
    • Signals that you have class, not just cash
    • Even if you can afford the wine, you may not be able to afford the training to appreciate the wine
  • Rhyming
    • Signals talent and cleverness of the songwriter
    • It is a self-imposed constraint which makes writing the song harder, and not everyone can achieve that
  • Religious rules and worship
    • Joining a religious community will unlock a lot of benefits, which may create a free-rider problem
    • The rites and rituals become a costly signal, so that only those who observe the rules may join the community and share in the benefits
    • The community can then trust that those who joined will stick around when times are hard
    • The more onerous the obligations, the longer the community will survive
    • The greater the need for cooperation, the more onerous the practices
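
The equilibrium of the costly signalling game boils down to two inequalities on the Sender's side. This is a minimal sketch, assuming a rejected Sender gets 0 and that the Receiver strictly prefers to reject Low types; `benefit`, `cost_high` and `cost_low` are hypothetical parameter names of my own.

```python
def is_separating_equilibrium(benefit, cost_high, cost_low):
    """Check the profile 'only High types signal, only signallers are accepted'.

    benefit   : Sender's payoff from being accepted (same for both types)
    cost_high : cost of sending the signal for a High type
    cost_low  : cost of sending the signal for a Low type
    """
    # A High type must prefer signalling and being accepted to staying silent.
    high_signals = benefit - cost_high >= 0
    # A Low type must find imitation not worth the cost.
    low_stays_silent = benefit - cost_low <= 0
    return high_signals and low_stays_silent


if __name__ == "__main__":
    print(is_separating_equilibrium(benefit=10, cost_high=4, cost_low=15))
    print(is_separating_equilibrium(benefit=10, cost_high=4, cost_low=6))
```

The first call returns True: the signal is affordable for High types only. The second returns False: once the signal is cheap enough for Low types to imitate, it stops separating the two types, which matches the sugar and spice example above.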

Buried Signal (Modesty)

  • This game is a modification of the costly signalling game
  • Some examples
    • Anonymous giving
    • Concealing excitement (call me maybe?)
    • Japanese Shibui (subtle virtuosity) and art where “less is more”
  • Possible explanations
    • Many positive signals: the Sender is capable of sending so many different signals, that this particular signal in question is muted
    • Long-term relationships: the Sender is expecting the Receiver to find out about the signal at a later time
    • Outside options: the Sender is not interested in this particular Receiver picking up the signal
    • Devoted fans: the Sender is only expecting a particular group of Receivers (e.g. devoted fans) to pick up the subtle signal
    • Specific observers: the Sender only wants their targeted Receiver to receive the signal (e.g. the identities of anonymous donors are actually known by the charity organisation)

Evidence Game (Spin)

Three variants of a similar evidence game. These games are commonly observed in news and in announcements from governments and organisations.

Evidence Revelation

  • The world has two true states: High or Low
  • There are two players: Sender and Receiver
    • Sender has a persuasive motive
    • Sender wants Receiver to believe that the world state is High
    • The Sender’s payoff increases with the Receiver’s (posterior) perceived probability that the world state is High
  • Conditions
    • Sender may or may not receive Evidence that reveals the world state
    • Evidence is considered Supportive if it is more likely to be received when the world state is High than when it is Low
  • Actions
    • Sender can choose to reveal Evidence, if they receive it
    • Receiver will update their (prior) belief about the world state, based on
      • Whether Evidence is supportive
      • Whether Receiver expects Sender to reveal Evidence or not
      • Whether Sender indeed reveals Evidence or not
    • There are no payoffs for Receiver
  • Nash equilibrium
    • Sender will only reveal Evidence that is supportive
    • Receiver expects this as the Sender’s strategy and updates their belief accordingly
    • If Sender does not reveal any Evidence, Receiver will simply assume it wasn’t obtained, without considering if non-supportive Evidence was suppressed
  • Analysis
    • People will likely present biased evidence when they have a persuasion motive, as that is their dominant strategy
    • Any evidence revealed will therefore be supportive
    • It is easier to withhold evidence than to fabricate it
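
Why the Sender only reveals supportive Evidence follows from Bayes' rule. This is a minimal sketch of the Receiver's update after a piece of Evidence is revealed; the function and parameter names are my own, and the probabilities are made-up numbers.

```python
def is_supportive(p_evidence_if_high, p_evidence_if_low):
    """Evidence is supportive if it is more likely to arise in the High state."""
    return p_evidence_if_high > p_evidence_if_low


def posterior_high(prior_high, p_evidence_if_high, p_evidence_if_low):
    """Receiver's belief that the state is High after seeing the evidence."""
    joint_high = prior_high * p_evidence_if_high        # P(High and evidence)
    joint_low = (1 - prior_high) * p_evidence_if_low    # P(Low and evidence)
    return joint_high / (joint_high + joint_low)        # Bayes' rule


if __name__ == "__main__":
    prior = 0.5
    # Supportive evidence: three times as likely in the High state.
    print(posterior_high(prior, 0.6, 0.2))
    # Non-supportive evidence: three times as likely in the Low state.
    print(posterior_high(prior, 0.2, 0.6))
```

Starting from a prior of 0.5, the supportive Evidence raises the posterior to about 0.75 while the non-supportive Evidence drops it to about 0.25, so a Sender who is paid by the posterior has every reason to reveal the first and withhold the second.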

Evidence Search

  • The game is based on the Evidence Revelation game above (game 1), but with some modifications
  • Actions
    • Sender can choose to search for Evidence or not (instead of being handed Evidence as in game 1)
    • Sender can choose to search with minimal effort (no cost), but will obtain Evidence with lower probability
    • Sender can choose to search with maximum effort (with some cost), but will obtain Evidence with higher probability
    • Then Sender can still choose to reveal the Evidence or not
    • Receiver does not observe whether a search has been conducted or not, but is aware of these search options
    • Receiver updates their belief based on all the same factors in game 1
      • And on their expectation of how hard the Sender has searched for the Evidence
  • Nash equilibrium
    • Sender will search maximally for supportive Evidence only if the cost is small relative to the resulting increase in the Receiver’s posterior
    • Receiver always expects the Sender to have searched for the Evidence maximally, whether it is revealed or not
    • So even when shown supportive evidence, Receiver’s posterior may not increase much due to the expected biased search
  • Analysis
    • Sender will never search for non-supportive evidence
    • Sender will always search maximally if it is not too costly
    • Sender will always be biased in their search when they have a persuasion motive, and that is what Receiver expects

Testing

  • The game is based on game 1, but with some modifications
  • Actions
    • Sender can choose to perform a Test to check the world state (from a set of possible Tests, each Test has some probability of generating Evidence)
    • Tests are considered Confirmatory, if there is a high probability of obtaining Evidence related to world state
    • Tests are considered Diagnostic, if the test is much more likely to generate supportive Evidence for the High state
    • Sender may or may not obtain Evidence from the Test (depends on whether the test is confirmatory)
    • Sender can then choose whether to reveal the Evidence or not
    • Receiver does not observe which Test has been conducted, but is aware of the possible Tests
    • Receiver updates their belief based on all the same factors in game 1
      • And on their expectation of which Test the Sender has chosen
  • Nash equilibrium
    • Sender will choose a Confirmatory Test
    • Sender will also choose the Test that has higher chance to generate supportive Evidence
    • Receiver will always expect Sender to choose a Confirmatory Test and may not update their posterior much
  • Analysis
    • Sender will always choose confirmatory tests when they have a persuasion motive and the details of the test are hard to observe

Motivated Reasoning

  • The previous chapter explores how a Sender might try to persuade a Receiver
  • This chapter explores how people might try to convince themselves
  • Overconfidence
    • People have the tendency to overestimate their abilities and underestimate their weaknesses
  • Asymmetric updating
    • When presented with supportive evidence about their abilities, people are more responsive to it
    • When presented with evidence about their weaknesses, people tend to overlook that evidence
  • Asymmetric search
    • When the weighing scale shows a desired weight, people will be happy and will walk away
    • When the weighing scale shows over/under weight, people may try to test their weight again, hoping for a different outcome
  • Attitude polarisation
    • Once people pick a side or form certain belief, it is hard to convince them otherwise even when presented with strong contrasting evidence
  • Internalised persuasion
    • Using the above actions, people convince themselves to believe in a subject
    • It is easier to convince someone else if you truly believe in the subject as well (you cannot slip up and have your lie fall through if you simply don’t lie)

Repeated Prisoner’s Dilemma (Altruism)

  • The game has two players playing in rounds
  • In each round, the Prisoner’s Dilemma is played
  • There is a probability that a new round will be played after each round (but also a possibility that the game will end)
  • In Prisoner’s Dilemma, the players can choose to Cooperate (C) or Defect (D)
  • The payoffs for the players
    • If a player Cooperates, they pay a cost and the other player receives a benefit; a player who Defects against a Cooperator gets the full benefit without paying the cost (benefit is greater than cost)
    • If both players Defect, they get nothing
    • If both players Cooperate, they both get the benefit, minus the cost of Cooperating
  • Nash equilibrium
    • Players will consider past actions in their decision for each round
    • Some available strategies are:
      • Tit for Tat: start with C, then play whatever the opponent played last round
      • Grim Trigger: start with C, but switch to D permanently once the opponent Defects
      • Always Defect or Always Cooperate
    • The Always Defect strategy can always be sustained in equilibrium
    • Cooperative equilibria like Tit for Tat and Grim Trigger can be sustained if the probability of a next round happening is high
      • A higher probability of repeated rounds makes the risk of cooperating worth it
      • The payoffs (benefit - cost) must also be enticing enough
  • Analysis
    • To sustain cooperation, rounds must be repeated, and actions must be observable
    • Cooperation is conditional on other players also being cooperative in future rounds
    • Since Always Defect is always an available equilibrium, players will be sensitive to expectations, context, framing, etc. to influence other players to make cooperative decisions
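
How high the probability of a next round needs to be can be worked out for Grim Trigger. This is a minimal sketch, assuming both players use Grim Trigger, a deviator earns nothing after defecting, and `continue_prob` (my own name, must be below 1) is the chance that another round follows.

```python
def grim_trigger_sustains_cooperation(benefit, cost, continue_prob):
    """True if a player facing Grim Trigger does not gain by defecting."""
    # Cooperate forever: earn (benefit - cost) every round, weighted by the
    # probability that each further round is actually played.
    cooperate_value = (benefit - cost) / (1 - continue_prob)
    # Defect now: take the benefit once without paying the cost, then get
    # nothing because the opponent defects in every later round.
    defect_value = benefit
    return cooperate_value >= defect_value


def min_continue_prob(benefit, cost):
    """Smallest continuation probability that sustains cooperation.

    Rearranging (benefit - cost) / (1 - p) >= benefit gives p >= cost / benefit.
    """
    return cost / benefit


if __name__ == "__main__":
    print(grim_trigger_sustains_cooperation(benefit=3, cost=1, continue_prob=0.5))
    print(grim_trigger_sustains_cooperation(benefit=3, cost=1, continue_prob=0.2))
    print(min_continue_prob(benefit=3, cost=1))
```

With a benefit of 3 and a cost of 1, cooperation holds at a continuation probability of 0.5 but not at 0.2; the cut-off in this sketch is cost / benefit, so a better benefit-to-cost ratio tolerates a shorter expected relationship.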

Norm Enforcement

  • There are multiple players in the game (more than 2)
  • Game is played over multiple rounds
    • In the first round, a random player is chosen to make a decision: Comply or not
    • Complying has a personal cost to the player
    • In subsequent rounds, players are randomly paired up
    • Each player then chooses whether to punish their paired partner
    • Punishment comes at a cost, and will hurt the other player by a certain amount
  • After each round, there is some probability that the next round will occur; otherwise the game ends
  • Nash equilibrium
    • The first player will comply
    • In subsequent rounds, players will punish anyone who didn’t comply in the first round (third party punishment)
    • Players will also punish other players who didn’t punish those that should have been punished (higher-order punishment)
    • This equilibrium occurs when the probability of repeated rounds is high, and the cost of complying or of dishing out punishment is small relative to the harm of being punished
  • Analysis
    • Observability must be high, shirking must be punished, and punishment itself must be incentivised
    • Comply action in round one is arbitrary, depending on context and culture, which establishes the norm
    • Higher-order beliefs matter, as players need to recognise that a norm has been violated, and they must be motivated to dish out higher-order punishment
  • Practical advice on norm enforcement
    • Increase observability
      • So that it is not easy to violate the norm in private
    • Eliminate plausible excuses
      • So that it is not ambiguous when a norm is violated and punishment can be dished out
    • Communicate expectations
      • So that people will know what the norm is and how to behave
  • Motivations for punishment
    • Those who benefit from the norm might compensate the punisher
    • Institutions might be developed to punish norm violation
    • The punisher is motivated to punish, as a way to signal their own commitment to the norm
    • Punishment isn’t always costly to the punisher
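
The equilibrium condition for norm enforcement can be written as two comparisons. This is a deliberately simplified sketch of my own, not the book's exact model: it assumes anyone who shirks (by not complying, or by not punishing when they should) is punished once in the next round, provided the game continues. All parameter names are hypothetical.

```python
def norm_is_enforceable(comply_cost, punish_cost, punish_harm, continue_prob):
    """Check whether complying and punishing are both worth it.

    comply_cost   : personal cost of complying with the norm
    punish_cost   : cost paid by a player who punishes their partner
    punish_harm   : harm suffered by the player being punished
    continue_prob : probability that the next round is played
    """
    # Punishment only arrives if the game goes on for another round.
    expected_harm = continue_prob * punish_harm
    # Complying must be cheaper than the punishment for violating.
    will_comply = comply_cost <= expected_harm
    # Punishing must be cheaper than the higher-order punishment for not punishing.
    will_punish = punish_cost <= expected_harm
    return will_comply and will_punish


if __name__ == "__main__":
    print(norm_is_enforceable(comply_cost=2, punish_cost=1, punish_harm=5, continue_prob=0.9))
    print(norm_is_enforceable(comply_cost=2, punish_cost=1, punish_harm=5, continue_prob=0.3))
```

The first call returns True and the second False: the same norm unravels once future rounds become unlikely, because the threat of punishment no longer outweighs the cost of complying.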

Categorical Norms

  • The game is about coordination between players based on signals from the world state
  • There are two variants of the game, continuous or discrete state

Continuous State

  • The world state is between 0 and 1
  • Players receive a signal of the world state, with some margin of error from the true world state value
  • Players then simultaneously choose their action in the coordination game to apply sanction or not
    • Payoffs for the players are not affected by the world state or the signal
    • Players want to apply sanction only if other players also apply sanction
  • Nash equilibrium
    • Players may try to set a threshold, and apply sanction if the signal they receive is above the threshold value
    • But there cannot be any equilibrium for threshold strategy, unless there is zero margin of error (every player receives exactly the same signal)
  • Analysis
    • Norms cannot be sustained if they are conditioned on continuous information, unless the information is error-free
    • We can also reach an equilibrium if players can share information
    • Therefore, in the real world, governments will not use continuous values to coordinate their actions unless those values are error-free

Discrete State

  • Same game as the above continuous variant, except the world state is either exactly 0 or 1
  • The players receive a signal that indicates either 0 or 1, but also with some margin of error (a chance to receive the opposite signal)
  • Nash equilibrium
    • The players may try to apply sanction if they receive a signal value 1
    • This will be in equilibrium if the margin of error is sufficiently small
  • Analysis
    • As seen in the real world, governments and international bodies apply sanctions as long as a categorical norm has been violated
    • They do not use continuous values (like number of casualties or degree of damage) as every player may receive a different signal
    • It is hard to come to a consensus, and applying sanction wrongly carries risk for players
    • Therefore, if a discrete categorical value is used (e.g. whether chemical weapons have been used, or human rights violated), it is easy for all players to align and apply sanction
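
Why a small margin of error is enough in the discrete game can be seen by computing what one player should believe about the other's signal. This is a minimal sketch, assuming two players with independent signals, a prior of 0.5 on state 1, and a hypothetical `required_confidence`: the minimum probability that the other player sanctions for sanctioning to be worthwhile.

```python
def prob_other_saw_one(my_signal, error, prior_one=0.5):
    """P(other player's signal is 1 | my signal), for a binary world state.

    Each player independently sees the true state with probability
    1 - error and the opposite value with probability error.
    """
    like_if_one = (1 - error) if my_signal == 1 else error
    like_if_zero = error if my_signal == 1 else (1 - error)
    # Bayes' rule: how likely is the state really 1, given my signal?
    p_state_one = prior_one * like_if_one / (
        prior_one * like_if_one + (1 - prior_one) * like_if_zero
    )
    # The other player sees 1 correctly if the state is 1, or by mistake if it is 0.
    return p_state_one * (1 - error) + (1 - p_state_one) * error


def sanction_on_one_is_stable(error, required_confidence, prior_one=0.5):
    """True if 'sanction exactly when my signal is 1' is a best response to itself."""
    sanctions_after_one = prob_other_saw_one(1, error, prior_one) >= required_confidence
    holds_back_after_zero = prob_other_saw_one(0, error, prior_one) < required_confidence
    return sanctions_after_one and holds_back_after_zero


if __name__ == "__main__":
    print(prob_other_saw_one(1, error=0.05))
    print(sanction_on_one_is_stable(error=0.05, required_confidence=0.6))
    print(sanction_on_one_is_stable(error=0.4, required_confidence=0.6))
```

With a 5% error rate, a player who sees 1 can be about 90% sure the other player saw 1 too, so sanctioning on 1 is stable. With a 40% error rate that confidence drops to about 52% and the strategy falls apart.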

Higher-Order Beliefs

  • The base game is similar to Categorical Norms coordination game
  • This chapter explores some variants

Shared Signals

  • Conditional sanctioning (players sanction when they receive a 1 signal) can be an equilibrium
  • This requires the signal to be highly observable
  • This can also work if the players are sharing their signal or observing a common signal

Plausible Deniability

  • The game modifies Shared Signals
  • There is a chance for false positive (signal is 1, but world state is 0) but no chance of false negative (if signal is 0, world state must be 0)
  • Players are informed of the chance of a false positive
  • Nash equilibrium
    • Players can conditionally apply sanction if they receive a positive signal, and know that the probability of a false positive is low
  • Analysis
    • The higher the chance of false positive, the more plausible deniability other players have
    • Other players might have received a 1 signal but still chose not to apply sanction, since no one can see their signal

Higher-Order Uncertainty

  • The game modifies Shared Signals
  • Player 1 knows the true world state
  • Player 2 gets a noisy signal about the world state
    • There are no false positives (if signal is 1, world state is 1)
    • There is a chance of false negative (if signal is 0, the world state may be 1)
  • Nash equilibrium
    • Player 1 sanctions if world state is 1
    • Player 2 sanctions if they receive a signal 1
    • This equilibrium cannot hold if chance of false negative is high, or if the chance of the true world state being 1 is low

Higher-Order Signal

  • Based on the higher-order uncertainty game with all the same conditions
  • Player 1 additionally gets a noisy signal about the signal that player 2 receives with the following properties
    • No false positives: when player 2 gets a 1, player 1 also gets a 1
    • Chance of false negative: when player 2 gets a 1, player 1 may get a 0 with certain probability
  • Nash equilibrium
    • Player 1 sanctions if they receive a signal saying that player 2 has received 1
    • Player 2 sanctions if they receive a signal 1
  • Analysis
    • The additional signal about other player’s signal only helps if the signal is observable
    • The additional signal is useful if the other player’s signal is somewhat noisy (if not, the game reduces to the base game, where player 1 just needs to condition on their own signal and knowledge)

Real World Coordination

  • It is therefore insufficient to only know the world state
  • If coordination is required, we also need to know about other factors
    • Observability: was the state easy to observe for other players?
    • Correlation: do other players have access to the same signal as you?
    • Plausible Deniability: can other players find excuses for their coordination action that is not related to the world state?
    • Higher-order Uncertainty: even if we know the world state, and the signal others have received, do they know about our knowledge?
  • Symbolic Gestures
    • Apologies, or other gestures, may signal our intent and our possible coordination actions in the future, which others can depend on to condition their own action
  • Indirect Communication
    • Helps to communicate our intent and higher-order beliefs, which others can use to condition their own actions
    • But it is subtle, may not be observable to all players, and may be plausibly deniable
    • Which makes it a good tool
  • Omission-Commission Distinction
    • It is often more acceptable that inaction leads to a disaster, than a direct action leading to the same disaster (trolley problem)
    • Because a direct action is observable, undeniable, and correlates to your intention, to cause the disaster
    • So even if you did intend to cause disaster with your inaction, this cannot be observed
    • Therefore, other players cannot punish you based on your omission
    • But this assumption only holds if coordination is required to dish out punishment (e.g. vigilantes can always take matters into their own hands)
  • Some manifestations of omission-commission
    • Avoiding the ask: detour to avoid the Salvation Army
    • Strategic ignorance: not getting tested for infectious disease, even when you know you are high risk
    • Means vs by-product distinction:
      • Using humans as shields to stop an attack (means)
      • Launching an attack that hurts some people as collateral damage (by-product), which is seen as more acceptable

Subgame Perfection (Justice)

  • The game is the Repeated Prisoner’s Dilemma
  • The subgame perfect strategy is
    • Player 1 will not Defect, as long as Defection was always punished. If not, Player 1 will always try to Defect
    • Player 2 will punish player 1 if player 1 has Defected this round, and Defection is always punished. If not, player 2 does not punish
  • Analysis
    • Transgressions are deterred in equilibrium by the threat of punishments
    • And punishments must be incentivised. Whenever a transgression happens, punishment MUST be applied, to act as deterrence against further transgression
  • Slippery Slope
    • If any transgression goes unpunished, the player will simply take more advantage in the future
  • Apologies
    • If the transgression did not benefit the player, an apology may be sufficient (benefit weighed against punishment)
  • Moral luck
    • A person may decide to cause harm but luckily the hurt did not materialise
    • People tend to look past this, as intention is harder to observe than a materialised transgression

Hidden Role of Primary Rewards

  • To answer the question of why some people become extremely passionate about certain things
  • Requires Time
    • Passion requires investment of time
    • But the tradeoff is a potential for substantial primary rewards (fame, legacy, respect, romantic opportunities)
  • Social Value
    • Passion must have social value
    • People will hardly develop passion for something that has no social value, as they will not deem it as a good use of their time
    • Social recognition provides a helpful feedback signal for us to know if the social rewards will be a sufficient payoff
  • Strengths and Weaknesses
    • People tend to develop passion for something they are good at
    • And because they are bad at other things, investing time in developing those other skills would be wasteful and unproductive
  • Economics of Superstars
    • Being able to obtain fame and fortune is a motivation that drives passion
  • Probability of Success
    • If you have a higher chance to be a superstar (due to connections, opportunities etc.) then you will be more likely to pursue your passion

Tips on Analysing Hidden Games

  • Focus on primary rewards
  • Set assumptions to model the game
  • Break each of the assumptions to see if they correspond with reality