I started asking myself these questions after reading Robert Axelrod's "The Evolution of Cooperation". In short, Axelrod walks us through how "TIT FOR TAT", a computer program submitted by Anatol Rapoport (which, surprisingly, was only 3 lines of code), won the iterated Prisoner's Dilemma tournament that Axelrod set up. The tournament "was structured as a round robin meaning that each entry was paired with each other entry", and "each entry was also paired with its own twin and with RANDOM, a program that randomly cooperates or defects with equal probability.[1]"
Structure and Scoring:
§ Each game was 200 moves
§ Each player received 3 points if they both cooperated with each other (i.e. the first interaction pictured above)
§ If they both defected, then 1 point was awarded to each player
§ If one player defected while the other cooperated, the defector received 5 points and the unfortunate cooperator received 0 points
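To make the scoring concrete, here is a rough Python sketch of my own (not code from the book). It assumes each strategy is a function that receives its own past moves and its opponent's past moves and returns "C" (cooperate) or "D" (defect). The names `PAYOFFS`, `play_game`, `always_cooperate` and `always_defect` are hypothetical, chosen just for illustration.

```python
# Points per move under the tournament rules, as (player A, player B).
# "C" = cooperate, "D" = defect.
PAYOFFS = {
    ("C", "C"): (3, 3),  # both cooperate
    ("D", "D"): (1, 1),  # both defect
    ("D", "C"): (5, 0),  # A defects against a cooperator
    ("C", "D"): (0, 5),  # A cooperates and gets exploited
}

MOVES_PER_GAME = 200


def play_game(strategy_a, strategy_b, moves=MOVES_PER_GAME):
    """Play one game; each strategy is called with (own_history, opponent_history)."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(moves):
        # Both players decide based only on the moves made so far.
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        points_a, points_b = PAYOFFS[(move_a, move_b)]
        score_a += points_a
        score_b += points_b
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b


def always_cooperate(own_history, opponent_history):
    return "C"


def always_defect(own_history, opponent_history):
    return "D"


print(play_game(always_cooperate, always_cooperate))  # (600, 600)
print(play_game(always_defect, always_cooperate))     # (1000, 0)
print(play_game(always_defect, always_defect))        # (200, 200)
```

Two players who always cooperate end with 600 points each, while a defector facing an unconditional cooperator collects the full 1000 and leaves the cooperator with nothing.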
Axelrod noted that "there is a single property which distinguishes the relatively high-scoring entries from the relatively low scoring entries. This is the property of being nice, which is to say never being the first to defect.[2]" So does this mean it always pays to be nice, no matter what? Not quite. Defection warrants defection, just as attempts at reconciliation warrant reconciliation.
TFT was always nice: it never defected until the other player did. After the other player defected, TFT defected on the next move. And, as you can probably guess by now, once the defector realized (if it ever did) that cooperation would only resume if it cooperated, TFT cooperated on the following move. In essence, TFT always does whatever the other player did on the previous move (TFT is pictured above in green). What is fascinating is that after the first round of the tournament, all participants were given the rules that governed everyone else's program, so they all knew what rules the successful programs followed, and TFT still won the second round.
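The whole TFT rule fits in a couple of lines. Below is a minimal Python sketch of that rule as described above (my own version, not Rapoport's original program), played against `defect_once_on_move_3`, a hypothetical opponent I made up that defects exactly once. The `trace` helper is also my own.

```python
def tit_for_tat(own_history, opponent_history):
    """Cooperate on the first move, then copy the opponent's previous move."""
    if not opponent_history:
        return "C"  # nice: never the first to defect
    return opponent_history[-1]


def defect_once_on_move_3(own_history, opponent_history):
    """Hypothetical opponent: defects only on its 3rd move, cooperates otherwise."""
    return "D" if len(own_history) == 2 else "C"


def trace(strategy_a, strategy_b, moves):
    """Play `moves` rounds and return both players' moves as strings."""
    a, b = [], []
    for _ in range(moves):
        move_a = strategy_a(a, b)
        move_b = strategy_b(b, a)
        a.append(move_a)
        b.append(move_b)
    return "".join(a), "".join(b)


tft_moves, other_moves = trace(tit_for_tat, defect_once_on_move_3, 6)
print(tft_moves)    # CCCDCC
print(other_moves)  # CCDCCC
```

TFT answers the single defection on move 3 with one defection on move 4, then returns to cooperating as soon as the other side does.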
In any single game, TFT never scored more than the program it was playing against. In fact, it can't: the best it can do is tie, at 600 points each for mutual cooperation on all 200 moves. What does this mean? This tournament, like many other games in life, is a non-zero-sum game. TFT has no need to exploit other players, because they really can't do much better over the course of a game. Yes, there is room for other players to exploit, but sooner or later that exploitation is self-defeating once you take the whole ecosystem into account: a strategy that relies on exploiting weak programs eventually eliminates the very weak programs that made the predatory strategy successful.
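The claim that TFT can tie but never win a single game can be checked with a small round robin. This sketch is my own: "ALL D" (always defect) is an illustrative opponent I added, and RANDOM is seeded so the run is repeatable; neither detail comes from the tournament itself.

```python
import random

# (player A, player B) points per move under the tournament rules.
PAYOFFS = {("C", "C"): (3, 3), ("D", "D"): (1, 1), ("D", "C"): (5, 0), ("C", "D"): (0, 5)}


def tit_for_tat(own_history, opponent_history):
    return opponent_history[-1] if opponent_history else "C"


def always_defect(own_history, opponent_history):
    return "D"


def make_random(seed):
    """RANDOM: cooperates or defects with equal probability (seeded for repeatability)."""
    rng = random.Random(seed)
    return lambda own_history, opponent_history: rng.choice("CD")


def play_game(strategy_a, strategy_b, moves=200):
    """Play one game and return (score_a, score_b)."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(moves):
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        points_a, points_b = PAYOFFS[(move_a, move_b)]
        score_a += points_a
        score_b += points_b
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b


opponents = {
    "TIT FOR TAT (twin)": tit_for_tat,
    "ALL D": always_defect,
    "RANDOM": make_random(seed=1),
}
for name, opponent in opponents.items():
    tft_score, opponent_score = play_game(tit_for_tat, opponent)
    print(f"TFT vs {name}: {tft_score} - {opponent_score}")
    assert tft_score <= opponent_score  # TFT can tie, but never wins a single game
```

The assertion holds for any opponent: TFT only defects right after being defected against, so each of TFT's 5-point moves is matched by an earlier 5-point move for the opponent. Against ALL D, for example, TFT loses the first move and then both sides grind out 1 point a move, ending 199 to 204.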
How to do well in an iterated Prisoner's Dilemma
§ Don't be envious (you don't need to do better than your opponent to be better off than when you started)
§ Don't be the first to defect (it pays to be nice - cooperation can develop)
§ Reciprocate both cooperation and defection (the willingness to be provoked when warranted, and the forgiveness that should follow reconciliation)
§ Don't be too clever (attempts at exploitation can leave the other player unable to recognize how you operate; in turn, a series of defections may follow and echo throughout the remainder of your interactions) [3]
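The "echo" in the last point is easy to see in a simulation. A minimal sketch, assuming two TIT FOR TAT players where one of them makes a single unexplained defection on move 3; the `misfire_move` parameter is my own hypothetical device, not part of Axelrod's tournament. That one defection bounces back and forth for the rest of the game.

```python
def tit_for_tat(own_history, opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return opponent_history[-1] if opponent_history else "C"


def trace_with_misfire(moves, misfire_move):
    """Two TFT players; player A's move number `misfire_move` (1-based) is forced to D.

    The forced defection is a hypothetical stand-in for a move the other side can't account for.
    """
    a, b = [], []
    for move_number in range(1, moves + 1):
        move_a = tit_for_tat(a, b)
        move_b = tit_for_tat(b, a)
        if move_number == misfire_move:
            move_a = "D"
        a.append(move_a)
        b.append(move_b)
    return "".join(a), "".join(b)


a_moves, b_moves = trace_with_misfire(moves=8, misfire_move=3)
print(a_moves)  # CCDCDCDC
print(b_moves)  # CCCDCDCD
```

Neither player ever returns to steady mutual cooperation: each one keeps punishing the other's last punishment, which is exactly the kind of echo that being "too clever" can set off.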
Mutual cooperation can develop in three ways:
1. Enlarging the shadow of the future - it's easy for us to decide to defect when we know we won't interact with the other party again (leaving our trash in places we know we won't return to, leaving no tip for bad service at a restaurant we won't come back to, not paying the supplier of our goods if we believe they are going out of business, etc.). "No form of cooperation is stable when the future is not important enough relative to the present. There are two basic ways of enlarging the shadow of the future: by making interactions more durable, and by making them more frequent.[4]"
2. Changing the payoffs - we don't need to change the immediate payoffs in the short run: "It is only necessary to make the long-term incentive for mutual cooperation greater than the short-term incentive for defection.[5]"
3. Teaching people to care about each other, teaching reciprocity, and improving recognition abilities within the environment - if no form of altruism governs interactions between people, cooperation will sooner or later break down. Reciprocity is needed on both sides of an interaction, for both cooperation and defection. "Unconditional cooperation can not only hurt you, but it can hurt other innocent bystanders with whom the successful exploiters will interact with later. Unconditional cooperation tends to spoil the other player; it leaves a burden on the rest of the community to reform the spoiled player, suggesting that reciprocity is a better foundation for morality than is unconditional cooperation.[6]" If we cannot recognize when others defect, then we will not be able to respond accordingly.
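Axelrod also treats the shadow of the future formally, as a weight w that the next move carries relative to the current one. As I read the book, TIT FOR TAT can only hold its ground against invaders when w is at least max((T - R)/(T - P), (T - R)/(R - S)), where T is the temptation to defect (5 in the tournament), R the reward for mutual cooperation (3), P the punishment for mutual defection (1) and S the sucker's payoff (0). The sketch below just evaluates that threshold; the name `critical_w` and the second payoff set are my own, for illustration.

```python
from fractions import Fraction


def critical_w(T, R, P, S):
    """Smallest weight of the future (w) at which TIT FOR TAT resists invasion.

    T = temptation to defect, R = reward for mutual cooperation,
    P = punishment for mutual defection, S = sucker's payoff.
    Fractions keep the result exact.
    """
    T, R, P, S = (Fraction(x) for x in (T, R, P, S))
    return max((T - R) / (T - P), (T - R) / (R - S))


# Tournament payoffs: T=5, R=3, P=1, S=0
print(critical_w(5, 3, 1, 0))  # 2/3
# Hypothetical change: shrink the temptation to defect from 5 to 4
print(critical_w(4, 3, 1, 0))  # 1/3
```

Both levers from points 1 and 2 show up here: making interactions more durable and more frequent pushes w above the bar, while shrinking the short-term gain from defecting (T - R) lowers the bar itself.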
These thoughts are unrefined and don't represent the entirety of the book; all ideas are derived from "The Evolution of Cooperation". I would recommend reading the book if you found this interesting. Axelrod does a better job of explaining; he is a writer and I am a reader.
[1] p. 30, 3rd paragraph
[2] p. 33, 2nd paragraph
[3] p. 110, 1st paragraph
[4] p. 129, 2nd paragraph
[5] p. 134, bottom of 1st paragraph
[6] p. 136, bottom paragraph