16: Should You Be Instrumentally Rational?

What is rationality?

preferences

Your preferences impose a ranking on outcomes. (Ignore cardinal preference functions for now.)

To exhibit \emph{instrumental rationality} is to select those actions which you expect to best satisfy your preferences. (See \citet[p.~4]{osborne:1994_game} for a concise and simple formal presentation.)

‘the laws of decision theory (or any other theory of rationality) are not empirical generalisations about all agents. What they do is define what is meant ... by being rational’

\citep[p.~43]{Davidson:1987wc}

Davidson, 1987 p. 43

revealed preferences

Importance: transition from externally given criterion. Explains why this way of thinking about rationality is attractive.

‘the revealed preference revolution of the 1930s (Samuelson, 1938)

... replaced the supposition that people are attempting to optimize any externally given criterion (e.g., some psychologically interpretable notion of utility, perhaps to be quantified in units of pleasure and pain).

Rather, if economic agents are typically assumed to be subject to relatively mild consistency conditions (e.g., such as transitivity ...), it can be shown that there will exist a set of probabilities and utilities such that each agent’s choices will be just “as if” that agent were maximizing expected utility’

Chater, 2014
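The idea that mild consistency conditions license an ‘as if’ representation can be illustrated with a much simpler, purely ordinal case. The following is a minimal sketch, not the result Chater describes (which involves probabilities and expected utility and needs further axioms). It assumes strict preferences over a small finite set of options; the function names `is_transitive` and `ordinal_utility` and the example options are hypothetical.

```python
from itertools import permutations


def is_transitive(prefers):
    """prefers is a set of (x, y) pairs meaning 'x is strictly preferred to y'."""
    items = {x for pair in prefers for x in pair}
    return all(
        (a, c) in prefers
        for a, b, c in permutations(items, 3)
        if (a, b) in prefers and (b, c) in prefers
    )


def ordinal_utility(prefers):
    """Assign each option the number of options it is strictly preferred to."""
    items = {x for pair in prefers for x in pair}
    return {x: sum((x, y) in prefers for y in items) for x in items}


# Illustrative preference patterns (assumed for the example).
consistent = {("chocolate", "rhubarb"), ("rhubarb", "lemon"), ("chocolate", "lemon")}
cyclic = {("chocolate", "rhubarb"), ("rhubarb", "lemon"), ("lemon", "chocolate")}

utility = ordinal_utility(consistent)
```

When the preferences are complete and transitive, the agent's choices are just ‘as if’ she were maximizing `utility`: an option is preferred to another exactly when it has the higher number. For the cyclic pattern no such assignment of numbers can exist.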

‘Suppose that A and B are [outcomes] between which the agent is not indifferent, and that N is an ethically neutral condition [i.e. the agent is indifferent between N and not N].

Then N has probability 1/2 if and only if the agent is indifferent between the following two gambles.

B if N, A if not

A if N, B if not'

Jeffrey, 1983 p. 47
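Why indifference between the two gambles fixes the probability at 1/2 can be spelled out in a short derivation. This is a sketch under assumptions that go beyond the quotation: $u$ is a utility function representing the agent's preferences, $p$ is the agent's subjective probability for $N$, and gambles are ranked by expected utility ($EU$). Ethical neutrality is what allows $u(A)$ and $u(B)$ to be treated as the same whether or not $N$ holds.

```latex
\begin{align*}
EU(\text{B if } N,\ \text{A if not}) &= p\,u(B) + (1-p)\,u(A) \\
EU(\text{A if } N,\ \text{B if not}) &= p\,u(A) + (1-p)\,u(B) \\
EU(\text{B if } N,\ \text{A if not}) - EU(\text{A if } N,\ \text{B if not})
  &= (2p-1)\,\bigl(u(B) - u(A)\bigr)
\end{align*}
```

Because the agent is not indifferent between A and B, $u(B) \neq u(A)$, so the difference is zero (the agent is indifferent between the gambles) exactly when $p = 1/2$.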

This is important for the revealed preference theory (RPT) interpretation, on which the aim is to find structure or patterns in behaviour. It also nicely explains what, if any, normative force the theory could have.

‘As ordinarily understood, the prescription to maximize your expected utility presupposes that there is some measure of expected utility that applies to you and that your preferences are therefore obliged to maximize.

But in the context of decision theory, the utility and probability functions that apply to you are constructed out of your preferences, and so your expected utility is not an independent measure that your preferences can be obliged to maximize;

rather, your expected utility is whatever your preferences do maximize, if they obey the axioms.

Hence, the injunction to maximize your expected utility can at most mean that you should have preferences that can be represented as maximizing some measure (or measures) of expected utility, which will then apply to you by virtue of being maximized by your preferences’

\citep[p.~149]{Velleman:2000fq}

Velleman, 2000 p. 149

motivational states

When we think about motivational states, the most familiar kind is ...

primary motivational states

Another familiar kind of motivational state is what animal learning theorists call ‘primary motivational states’.

These do not change over time and are not modifiable by learning, or at least not readily modifiable.

They include hunger, thirst, lust and disgust.

Can your primary motivational states diverge from your preferences?

You see a rat and a lever. The rat presses the lever occasionally. Now you start rewarding the rat: when it presses the lever it is rewarded with a particular kind of food. As a consequence, the rat presses the lever more often.

Devaluation - standard procedure:

Training: Rat is put in chamber with Lever; pressing Lever dispenses sucrose (novel food).

Devaluation: Rat is taken into another chamber, poisoned, and then exposed to sucrose.

Extinction Test: Rat returns to chamber with Lever; pressing Lever does nothing.

Dickinson, 1985 figure 3; Balleine & Dickinson, 1991 figure 1 (part)

‘Mean lever-press rates during the extinction (left-hand panel) and reacquisition tests (right-hand panel) following the devaluation of either the contingent (group D-N) or non-contingent food (group N-D).’

‘Experiment I: Mean number of lever presses ... during the extinction test session ... The various groups received either immediate (Groups IMM/SUC and IMM/H2O) or delayed (Groups DEL/SUC and DEL/H2O) toxicosis [delayed did not cause aversion] and were re-exposed either to the sucrose solution (Groups IMM/SUC and DEL/SUC) or to water (Groups IMM/H2O and DEL/H2O).’

Through Pavlovian conditioning, primary motivational states can have a direct effect on actions.

‘The dissociation between lever pressing and magazine entries produced by re-exposure is [...] problematic for the incentive learning account.

To recapitulate, this explanation assumes that instrumental performance is mediated by some “representation” of the relationship between the instrumental action and reinforcer that also encodes the current incentive value of the reinforcer. The represented incentive value can only be changed, however, after aversion conditioning by exposure to the reinforcer.

Given this account, the question immediately arises as to why re-exposure is necessary for a change in lever pressing but not magazine entries’

\citep[p.~293]{balleine:1991_instrumental}

Balleine & Dickinson, 1991 p. 293

Why do the two actions, lever pressing and magazine entry, dissociate in this way?

conditioning

Pavlovian (classical)

Operant

Results in stimulus-action links

The animal responds to the stimulus by performing the action

Acquired through being rewarded when acting in the presence of the stimulus

Involves habitual processes

habitual

Action occurs in the presence of Stimulus.

Agent is rewarded [/punished]

Stimulus-Action Link is strengthened [/weakened] due to reward [/punishment]

Given Stimulus, will Action occur? It depends on the strength of the Stimulus-Action Link.

instrumental

Action leads to Outcome.

Action-Outcome Link is strengthened.

Agent has strong [/weak] positive [/negative] Preference for Outcome

Will Action occur? It depends on the strength of Action-Outcome Link and Agent’s Preference.
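The contrast between the two kinds of process can be made concrete with a toy model. This is a minimal sketch under loud assumptions: the class names, the `learning_rate` parameter, the update rules and the choice to multiply link strength by preference are all hypothetical simplifications, not a model taken from the animal learning literature. The change in preference is also simply stipulated here; the sketch is silent on how it comes about.

```python
from dataclasses import dataclass


@dataclass
class HabitualProcess:
    """Stimulus-Action link, strengthened by reward and weakened by punishment."""
    link_strength: float = 0.0
    learning_rate: float = 0.5  # hypothetical parameter

    def learn(self, reward: float) -> None:
        # Positive reward strengthens the link; negative reward (punishment) weakens it.
        self.link_strength += self.learning_rate * reward

    def tendency(self) -> float:
        # Whether Action occurs depends only on the Stimulus-Action link.
        return self.link_strength


@dataclass
class InstrumentalProcess:
    """Action-Outcome link combined with the agent's current Preference for Outcome."""
    link_strength: float = 0.0
    preference: float = 1.0  # positive = outcome wanted, negative = outcome unwanted
    learning_rate: float = 0.5  # hypothetical parameter

    def learn(self) -> None:
        # Observing that Action leads to Outcome strengthens the link (capped at 1).
        self.link_strength += self.learning_rate * (1.0 - self.link_strength)

    def tendency(self) -> float:
        # Assumed combination rule: link strength scaled by preference.
        return self.link_strength * self.preference


habit = HabitualProcess()
instrumental = InstrumentalProcess()
for _ in range(3):  # three rewarded lever presses
    habit.learn(reward=1.0)
    instrumental.learn()

habit_before = habit.tendency()
instrumental_before = instrumental.tendency()

# Devaluation: the agent's preference for the outcome becomes negative.
instrumental.preference = -1.0

habit_after = habit.tendency()
instrumental_after = instrumental.tendency()
```

In this sketch devaluing the outcome leaves the habitual tendency untouched, because nothing in the Stimulus-Action link refers to the outcome, whereas the instrumental tendency changes as soon as the preference does.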

Why do the two actions, lever pressing and magazine entry, dissociate in this way?

Because

magazine entry but not lever pressing ‘is under the control of Pavlovian ... contingencies’

and Pavlovian contingencies enable primary motivational states to directly influence action.

Balleine & Dickinson, 1991 p. 294

‘A possible resolution to this discrepancy lies with the differing contingencies controlling lever pressing and magazine entry. There is evidence that simple anticipatory approach to a food source, such as magazine entry, is primarily under the control of Pavlovian as opposed to instrumental contingencies (e.g. Holland, 1979), thus raising the possibility that incentive learning is necessary for instrumental but not Pavlovian reinforcer revaluation effects. There is, in fact, independent evidence that accords with this analysis’ \citep[p.~294]{balleine:1991_instrumental}

Aversion does not directly influence preferences.

‘not only must consumption of the reinforcer be paired with toxicosis,

the animals must also have an opportunity to contact the reinforcer after aversion conditioning if there is to be a change in instrumental performance’

\citep[p.~293]{balleine:1991_instrumental}

Balleine & Dickinson, 1991 p. 293

[To introduce the term ‘incentive learning’]

A moment ago I asked, What happens if we poison the subjects but do not re-expose them to the food?

Can your primary motivational states dissociate from your preferences?

The two kinds of motivational states can dissociate

motivational states

primary motivational states

not directly modifiable by learning

hunger
thirst
satiety
disgust
...

preferences

changing, influenced by learning (and fashion, ...)

chocolate over rhubarb
lime over lemon
red over blue
...

What kinds of processes in individual animals guide actions?

Two conclusions:

1. two kinds of process -- habitual vs instrumental

2. two kinds of motivational state -- primary vs preferences

Recall Davidson ...

‘the laws of decision theory (or any other theory of rationality) are not empirical generalisations about all agents. What they do is define what is meant ... by being rational’

\citep[p.~43]{Davidson:1987wc}

Davidson, 1987 p. 43

dilemma

Prioritise one kind of motivational state over all others.

Assume that despite multiple kinds of motivational state at the level of representations and algorithms, the system as a whole will satisfy the axioms governing preferences (e.g. transitivity).

Should we try to resolve or escape the dilemma?

Game Theory

Aim: describe rational behaviour in social interactions.

An action is rational
in a noncooperative game
if it is a member of a Nash equilibrium?

Entails:

Resisting (‘cooperating’) is not rational in the Prisoner’s Dilemma.

Choosing ‘Low’ in Hi-Low is rational.
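Both entailments can be checked mechanically for pure strategies. The following is a minimal sketch: the function name `pure_nash_equilibria`, the label ‘confess’ for the alternative to resisting, and all payoff numbers are assumptions chosen only to give the two games their usual structure. Mixed-strategy equilibria are ignored.

```python
from itertools import product


def pure_nash_equilibria(payoffs):
    """Find pure-strategy Nash equilibria of a two-player game.

    payoffs maps (row_action, col_action) to (row_payoff, col_payoff).
    """
    rows = sorted({r for r, _ in payoffs})
    cols = sorted({c for _, c in payoffs})
    equilibria = []
    for r, c in product(rows, cols):
        row_payoff, col_payoff = payoffs[(r, c)]
        # Neither player can gain by unilaterally switching action.
        row_best = all(payoffs[(r2, c)][0] <= row_payoff for r2 in rows)
        col_best = all(payoffs[(r, c2)][1] <= col_payoff for c2 in cols)
        if row_best and col_best:
            equilibria.append((r, c))
    return equilibria


# Illustrative payoffs (assumed, higher is better).
prisoners_dilemma = {
    ("resist", "resist"): (3, 3),
    ("resist", "confess"): (0, 4),
    ("confess", "resist"): (4, 0),
    ("confess", "confess"): (1, 1),
}
hi_low = {
    ("high", "high"): (2, 2),
    ("high", "low"): (0, 0),
    ("low", "high"): (0, 0),
    ("low", "low"): (1, 1),
}

pd_equilibria = pure_nash_equilibria(prisoners_dilemma)
hi_low_equilibria = pure_nash_equilibria(hi_low)
```

With these payoffs, resisting appears in no equilibrium of the Prisoner's Dilemma, while in Hi-Low both players choosing ‘low’ is an equilibrium alongside both choosing ‘high’.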

‘The problem with measuring risk preferences is not that measurement is difficult and inaccurate; it is that there are no risk preferences to measure – there is simply no answer to how, ‘deep down’, we wish to balance risk and reward.

And, while we’re at it, the same goes for the way people trade off present against future; how altruistic we are and to whom; how far we display prejudice on gender, race, and so on...

... there can be no method...that can conceivably answer this question, not because our mental motives, desires and preferences are impenetrable, but because they don’t exist’

Chater, 2018 pp. 123--4

\citep[pp.~123--4]{chater:2018_mind}

If we give up on the claim about rationality, can decision theory be used to explain patterns in other cases? Why, for example, does this seem like an explanation?

I couldn’t resist this one ... game theory (rock-paper-scissors specifically) has been used to explain mating strategies, via an ‘evolutionary stable strategy model’ of a ‘three-morph mating system in the side-blotched lizard’ \citep{sinervo:1996_rock}. (The ones on the right resemble sexually receptive females morphologically; they are ‘sneakers’.)

On explanation: ‘Many events and outcomes prompt us to ask: Why did that happen? [...] For example, cutthroat competition in business is the result of the rivals being trapped in a prisoners’ dilemma’

\citep[p.~36]{dixit:2014_games}.

Dixit et al, 2014 p. 36
