Behavioral Ecology Vol. 15 No. 3: 396-399
Behavioral Ecology vol. 15 no. 3 © International Society for Behavioral Ecology 2004; all rights reserved
Energetic state during learning affects foraging choices in starlings
Department of Zoology, University of Oxford, South Parks Road, Oxford OX1 3PS, UK
Address correspondence to A. Kacelnik. E-mail: alex.kacelnik{at}zoo.ox.ac.uk.
Received 26 July 2002; revised 6 May 2003; accepted 24 June 2003.
ABSTRACT
We investigated the influence of energetic state at the time of acquaintance with a new food source on preference for that source on later encounters, using wild-caught European starlings as subjects. Twelve birds learned to obtain food rewards by pecking at either of two keys identified by color. The keys were encountered in different sessions, while the subjects were food deprived or prefed. Food rewards from both sources were always identical. After an equal number of reinforced trials with each source, the birds were presented with choices between them. The birds significantly preferred the source that had previously delivered food under higher deprivation. We relate these results to findings reported elsewhere of preferences for options previously associated with greater effort. We hypothesize that subjects may attribute value to an option according to the marginal fitness gain associated with this option in the past. Although this process may be adaptive under many circumstances, it violates the assumptions of normative models of choice that imply mechanisms of valuation sensitive to the absolute properties of a payoff or to expected absolute changes in state.
Key words: foraging, hunger, starlings, state-dependent choice.
INTRODUCTION
Normative models of animal behavior assume that animals make decisions according to the fitness value of different options, such as mates, feeding patches, or prey types. This value, in turn, is often a joint function of the properties of the option and of the individual's state (for extensive treatments of state-dependent decision models, see Mangel and Clark, 1988
One candidate metric, not necessarily optimal in all circumstances but probably quite a good compromise, may be some measure of the changes in the subject's state experienced as a consequence of taking an option, for example, the increase in reserves consequent on deciding to forage in a given patch type. This is equivalent to remembering the parameters (size, handling time) of typical rewards from each source. A second possibility is for the subject to be sensitive to its own state at the time payoffs are received, so that value depends on past marginal gains in fitness (or need reduction) consequent on encounters with a given option. This would normally be correlated with the former but not necessarily linearly (i.e., doubling reward size does not necessarily double benefit). Under the second possibility, two payoffs of equal size would be valued differently if one occurred when reserves were lower, because the function of fitness versus reserves is often assumed to be concave (see McNamara and Houston, 1982). Both absolute state changes and past marginal fitness gains are intuitively appealing candidates for behavioral rules that could approximate optimal behavior under many circumstances. Other reasonable mechanisms may be hypothesized, and here we consider two examples. One possible mechanism would be for options to be valued according to the value of the state with which that option is associated in memory. Under this hypothesis, if an alternative is systematically experienced under a state of lower need (higher well-being) it would be preferred to one evoking a less desirable state. Another possibility is for preference to be affected by state both at the time of acquisition and at the time of choice: the subject may prefer options that during acquisition yielded benefits when the subject was in a state similar to that at the time of the choice. Because any of these four mechanisms could be defended as potentially adaptive under many circumstances, they need to be compared experimentally.
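The marginal-gain argument above can be illustrated with a small numerical sketch. The logarithmic fitness function, the reserve levels, and the reward size below are all hypothetical choices made only to show the effect of concavity; they are not taken from the study.

```python
import math


def fitness(reserves: float) -> float:
    """Hypothetical concave fitness function of energy reserves (an assumption)."""
    return math.log(1.0 + reserves)


def marginal_gain(reserves: float, reward: float) -> float:
    """Fitness gained by adding `reward` to the current `reserves`."""
    return fitness(reserves + reward) - fitness(reserves)


reward = 1.0          # identical reward in both conditions (arbitrary units)
low_reserves = 2.0    # stands in for the food-deprived state (arbitrary units)
high_reserves = 8.0   # stands in for the prefed state (arbitrary units)

gain_hungry = marginal_gain(low_reserves, reward)
gain_prefed = marginal_gain(high_reserves, reward)

print(f"marginal gain when hungry: {gain_hungry:.3f}")
print(f"marginal gain when prefed: {gain_prefed:.3f}")
```

Under any strictly concave function of this kind, the same reward yields the larger gain at the lower reserve level, which is the property the second hypothetical mechanism relies on.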
We are particularly interested in the second hypothetical mechanism because of its relevance to other recent studies of mechanisms of preference. In a recent study, Kacelnik and Marsh (2002) presented an explicit model to account for nonhuman analogues of the so-called sunk-cost fallacy, a phenomenon related to the Concorde fallacy (Dawkins and Carlisle, 1976). In their study, starlings obtained equal food amounts after no-choice trials involving either light or heavy work, and were at a later time allowed to choose between the stimuli cueing for these food deliveries. The birds preferred the food source that in the no-choice trials required heavier work (a similar result was reported by Clement et al. [2000] working with pigeons). Kacelnik and Marsh pointed out that this effect may be adaptive if heavier work creates a state of greater need and fitness is a concave function of reserves, so that the same food reward yields a greater need reduction when it takes hard work to achieve. These results speak against a metric based exclusively on the properties of reward and the associated state changes, because in both studies absolute gain in reserves (the reward size) was identical between options, thus favoring the possibility that some form of marginal gain valuation may be involved. However, although the theoretical account is based on state dependency, none of these studies manipulated the energetic state of the subjects.
Here we continue exploring this issue by testing directly how an animal's energetic state at the time of acquiring information about a novel food source influences its preference for that source in the future. Despite the importance of this issue for a broad class of functional models, we have been unable to find empirical or theoretical work that explicitly addresses it.
We first allowed wild-caught European starlings (Sturnus vulgaris) to become acquainted with two artificial food sources that were encountered in two different states of food deprivation, in sessions in which only one source was present. Later, we gave them choices between the sources while in either of the two deprivation states. We consider four possible outcomes, following from the hypothetical mechanisms presented above. If choice is based on the absolute improvement in energetic state associated with each option, the birds should show indifference between the options, because the rewards are identical. If choice is influenced by past marginal gains in fitness, we expect to see a preference for the option experienced in a greater state of need. If options acquire value by association with the desirability of the state they evoke, the birds should prefer the option experienced in the higher state of reserves regardless of state at the time of testing. Finally, if the value attributed to each option reflects the recall of benefits received under present state, then choice should favor the source met under the state at the time of testing.
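The four predictions listed above can be laid out as a small lookup, sketched here in Python. The mechanism labels and option names are hypothetical shorthand introduced for this illustration, and the example pattern passed to `consistent_mechanisms` is an illustrative input, not data.

```python
MECHANISMS = ("absolute_gain", "marginal_gain", "state_association", "state_match")
TEST_STATES = ("hungry", "prefed")


def predicted_choice(mechanism: str, test_state: str) -> str:
    """Predicted preference for one hypothetical mechanism and one state at testing."""
    if mechanism == "absolute_gain":
        return "indifferent"           # rewards are identical
    if mechanism == "marginal_gain":
        return "hungry_option"         # option met in the greater state of need
    if mechanism == "state_association":
        return "prefed_option"         # option met in the higher state of reserves
    if mechanism == "state_match":
        return f"{test_state}_option"  # option met under the state at testing
    raise ValueError(f"unknown mechanism: {mechanism}")


def consistent_mechanisms(pattern: dict) -> list:
    """Mechanisms whose predictions match a pattern given for both test states."""
    return [
        m for m in MECHANISMS
        if all(predicted_choice(m, s) == pattern[s] for s in TEST_STATES)
    ]


for mechanism in MECHANISMS:
    row = {s: predicted_choice(mechanism, s) for s in TEST_STATES}
    print(mechanism, row)

# Illustrative pattern only: the same option favored in both test states.
example_pattern = {"hungry": "hungry_option", "prefed": "hungry_option"}
print(consistent_mechanisms(example_pattern))
```

Only the fourth mechanism makes a prediction that changes with state at testing, which is why the design tests choice in both deprivation states.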
METHODS
Subjects were 12 wild-caught adult European starlings (license 19990420/20010253 by English Nature). Before the beginning of the experiments (October 2000) the birds were kept in two outdoor aviaries, where they were fed ad libitum on a mixture of turkey crumbs, Orlux pellets, and mealworms (Tenebrio sp.). Drinking and bathing water were always available and replaced daily. Five days before the beginning of the training sessions, the birds were transferred from the outdoor aviary to the laboratory and housed in individual cages (120 x 60 x 50 cm) that served both as home cages and experimental chambers. Rooms were maintained at 18°C (±3°C), with lighting on a 12-hour light/12-hour dark cycle with gradual transition periods at 0730 and 1930 h. Each cage had two perches (85 cm apart) and a panel with two circular response keys (3 cm diameter) and a central food hopper. The keys could be illuminated with white, red, or green. Food rewards were fixed for all treatments at four units of semicrushed and sieved Orlux pellets (approximately 0.02 g per unit) and delivered at a rate of 1 unit/s by automatic pellet dispensers (Campden Instruments). Birds were also permitted to feed ad libitum on turkey starter crumbs and supplementary mealworms at the beginning and end of each day, and for 15 min after the end of each experimental session. Orlux (a preferred food item) was only available during the experimental sessions. Drinking water was always available, and bathing trays were provided twice a week. There was no mortality or any sign of adverse state by any subject throughout the experiment. All subjects were released back into the wild (University Parks, Oxford, UK) during the following spring.
The birds were tested in two groups of six members each. All subjects had participated in an earlier experiment in which they pecked at black and white symbols to procure food rewards. For the experiment presented here (which used colors rather than symbols), the birds were transferred to new cages with different pecking panels, as described above.
Pretraining
Subjects were given 1 day to adapt to the new cages, followed by a day when they had to peck at keys lit randomly with respect to side (each using a white lamp) to release food. On this day all birds were observed to be pecking regularly to obtain food.
Training
During this stage the birds were presented with the options in isolation, to allow them to learn about their properties. We used a within-subjects design in which the birds experienced a "cycle" of two sessions per day, one of each type: hungry and prefed. Each session consisted of the following parts, in sequential order: (1) a food deprivation period (2 hours 50 min); (2) a "manipulation" period (10 min); (3) a "key pecking" period, when the birds were presented with 10 trials, each of which was characterized by the presentation of a colored key that the bird had to peck to receive food (this period could last up to 45 min, depending on how long it took the birds to complete 10 trials); and (4) an ad libitum food period (turkey crumbs, for the final 15 min of each session) designed to reduce carry-over influences across sessions and equalize state for the following session.
The manipulation period determined the kind of session: in prefed sessions, the experimenter entered the room and provided ad libitum food (turkey crumbs) for 9 min. In the hungry sessions, the experimenter entered the room at the same times as in prefed sessions, but no food was provided. In the key pecking period that immediately followed, a colored lamp (red or green) corresponding with the session type was illuminated behind the pecking keys on either side (randomly determined across trials) of the panel. The association of colors with session types was balanced across birds, but for each bird one color was always associated with one session type (e.g., red with prefed and green with hungry). The pecking key remained lit until a peck was registered. When pecked, the lamp was switched off and the food reward released, followed by an intertrial interval (ITI) of 80 s. The key pecking period terminated after the subject completed 10 trials, or after 45 min, whichever came first.
The two daily sessions began at 0800 and 1200 h, respectively. For each subject, one session of each type was given each day, with order of presentation balanced across subjects and reversing between days. The first group (n = 6 birds) had 4 days of training sessions. The second group (n = 6 birds) had 5 days, because on the third day the data were lost and hence an additional day was added. This means that the number of observations available for both groups is the same, but the amount of training received by the birds was slightly (approximately 20%) different. Three of the birds in the second group did not complete all of the key pecking opportunities in the four sessions of recorded data (they completed a total of 36, 37, and 37 trials out of the 40 registered opportunities [10 trials x 4 days] in the prefed sessions). Because, however, birds in this group had an extra day of training, the total number of reinforcements during training slightly exceeded that of the first group.
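As a check on the session structure described above, the following sketch lays the four parts of a session end to end. It assumes, for illustration only, that the key pecking period always occupies its full 45-min maximum; how the schedule was adjusted when birds finished earlier is not stated here. The names `SESSION_PARTS` and `session_timeline` are hypothetical.

```python
from datetime import datetime, timedelta

# Durations in minutes, taken from the description of a session.
SESSION_PARTS = [
    ("food deprivation", 170),       # 2 hours 50 min
    ("manipulation", 10),
    ("key pecking (maximum)", 45),   # assumption: full 45 min used
    ("ad libitum food", 15),
]


def session_timeline(start: str) -> list:
    """Return (part, start, end) tuples in 24-hour HHMM format for one session."""
    current = datetime.strptime(start, "%H%M")
    timeline = []
    for name, minutes in SESSION_PARTS:
        end = current + timedelta(minutes=minutes)
        timeline.append((name, current.strftime("%H%M"), end.strftime("%H%M")))
        current = end
    return timeline


for session_start in ("0800", "1200"):
    for part in session_timeline(session_start):
        print(session_start, part)
```

Under that assumption the four parts sum to 240 min, so a session starting at 0800 h ends exactly when the second session begins at 1200 h.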
Testing for preference
After 4 days of training, the birds experienced 1 day with two "choice" sessions. These sessions consisted of 10 trials each. Each trial started with both colored keys simultaneously illuminated (side randomized), so that subjects faced a choice between the options previously associated with either treatment (hungry or prefed). The first peck to either key extinguished both lamps, released the food reward, and initiated the ITI. The state of the subjects at the time of testing was manipulated as during the training period, and the order in which subjects experienced the two states was balanced across birds: within each group, three of the subjects had the morning session in the hungry state and the afternoon session in the prefed state. The remaining three birds within each group had the opposite order. Thus, each subject was evaluated by using 20 choices, 10 in each state at the time of testing.
RESULTS
We were interested in whether state at the time of initial acquaintance with a food source (training) affected preference for an option in the future, and whether this possible effect, if present, interacts with state at the time of expressing the preference. Because of the difference in the amount of training between the first and second group tested (4 and 5 days, respectively, see Methods), when appropriate we consider the results of each group separately. We consider two measures of choice bias: the proportion of subjects that favored the hungry treatment option and the mean proportion of choices for this option across subjects. Out of the 20 choice opportunities each bird had, all 12 birds (six in each group) chose the option associated with the hungry treatment more often than the alternative (if a binomial test is applied to each group on its own, p = .031; for the 12 subjects together, p < .001).
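The reported binomial probabilities can be reproduced with a short calculation. The sketch below assumes a two-tailed exact test against a null probability of 0.5, which is an assumption about how the test was run; under that assumption, 6 of 6 birds gives a value that rounds to the reported .031.

```python
from math import comb


def binomial_two_tailed(successes: int, n: int) -> float:
    """Two-tailed exact binomial p-value under a null probability of 0.5."""
    k = max(successes, n - successes)          # the null is symmetric
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


p_group = binomial_two_tailed(6, 6)    # each group of six birds on its own
p_all = binomial_two_tailed(12, 12)    # all 12 birds together

print(f"one group of 6:  p = {p_group:.5f}")
print(f"all 12 subjects: p = {p_all:.5f}")
```

With only six subjects, a unanimous result is the only outcome that can fall below .05 in a two-tailed test of this kind.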
Figure 1 shows the mean proportion of choices for the option associated with hunger during training for each state during testing. To test the influence of state during testing and the existence of group differences in choice proportions, we conducted a repeated-measures ANOVA, having state during testing (prefed or hungry) as a within-subjects variable and group (first or second) as a between-subjects variable. Because preference is analyzed in the form of proportions, the data were arcsine square root transformed before the analyses (Zar, 1999). Arcsine transformed proportions and residuals were then inspected for normality and homogeneity of variance. None of these assumptions were violated (Shapiro-Wilk test of normality and Levene's test of the hypothesis that error variance of the dependent variable, state, was equal across the two groups: p > .05 in both cases). The state of the subjects at the time of testing did not significantly affect preferences (F(1,10) = 0.25, p > .5). Preference for the option associated with higher deprivation was stronger in the second group (F(1,10) = 9.91, p = .01). The only known difference between the two groups was the additional day of training used to compensate for 1 day of loss of data during training. If one accepts as a reasonable hypothesis that this extra training may have contributed to this group's more extreme preference for the option associated with higher deprivation, the implication would be that the closer the birds were to the asymptotic level of training, the more pronounced the difference in preference between the options. The interaction between state at time of testing and group was not significant (F(1,10) = 0.01, p > .5).
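The arcsine square root transformation mentioned above is simple to state in code. The sketch below applies it to hypothetical choice counts out of 10 trials; these counts are illustrative placeholders, not the birds' data, and the function name is an assumption of this example.

```python
import math


def arcsine_sqrt(proportion: float) -> float:
    """Arcsine square root transform of a proportion, returned in radians."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    return math.asin(math.sqrt(proportion))


TRIALS_PER_SESSION = 10
hypothetical_counts = [5, 7, 9, 10]   # illustrative only, not observed values

for count in hypothetical_counts:
    p = count / TRIALS_PER_SESSION
    print(f"p = {p:.1f} -> {arcsine_sqrt(p):.4f} rad")
```

The transform maps proportions from the interval 0 to 1 onto 0 to pi/2 and stretches values near the extremes, which is why it is commonly applied before ANOVA on proportion data.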
DISCUSSION
Our results show that the energetic state of an animal at the time it learns about the properties of a food source affects the likelihood that the source will be selected in future foraging opportunities. In our experiment, a food source with which the subject becomes acquainted while in a higher state of need was preferred to an alternative yielding the same payoff at the time of the choice. We failed to find any effect on preference of state at the time of the choice. Because in the experiment rewards from both sources did not differ, a strictly normative analysis does not predict any preference. If anything, one might expect indifference. However, once we accept that despite physical equality, sources are treated differently as a consequence of their history, it is clear that purely functional models that assume perfect awareness of the properties of alternatives and state-dependent choices (i.e., preferences that depend on state at the time of the choice) are insufficient to model and predict foraging choices. Clearly, the process of acquiring information can itself affect choice and, hence, learning cannot be ignored.
One problem with relating behavior of primary ecological significance such as foraging choices to learning is that learning research is seldom explicit about choice. It is often assumed that rewards received under higher states of need lead to faster acquisition of responding, but it is not clear whether this affects preferences when the options are encountered simultaneously, after learning has reached an asymptote. Indeed, recent developments in learning theory such as those fostered by Gallistel and Gibbon (2000) formalize the process of acquisition but are mute with respect to choice between stimuli either during acquisition or after asymptotic learning. Despite this lack of direct tests, it would be surprising if factors that promote learning were not influential in choice. For instance, in a series of experiments using rats, Dickinson and Balleine (1994, 1995, and references therein) show that the incentive value of a stimulus as expressed in rate of responding under controlled testing states is modulated by the motivational state (e.g., hunger, thirst) at the time of learning. The investigators do not address the problem of choice, but while more data are gathered, it seems reasonable to expect that rate of responding when an option is faced in isolation will correlate with preference in simultaneous encounters. The results of Dickinson and Balleine, similar to ours, do not provide evidence for an effect of state at time of testing. Dickinson and Balleine propose that subjects learn the value of a resource in a given state through direct contact with it in that state, a possibility also recently supported by the observation that the control of caching behavior in scrub jays (Aphelocoma coerulescens) is mediated by the incentive value of specific items rather than by a general motivational state (Clayton and Dickinson, 1999). Our results extend and generalize these findings to situations involving choices between simultaneously available alternatives, and suggest that the use of such a mechanism can affect a potentially wider range of ecologically relevant decisions, such as those concerning diet choice and selection and use of foraging patches.
Within the foraging literature, state-dependent decision making has been discussed in a number of contexts, such as those involving classic foraging models (Nonacs, 2001; see also review in Houston and McNamara, 1999), the distribution of foragers in the environment (Houston and McNamara, 1997), and risk-sensitive foraging (Stephens, 1981). In all these cases, however, the state under scrutiny is that at the time of the choice and not at the time of information acquisition. As befits normative modelling, in these studies preferences are only affected by the properties of each food source and the state in which the subject would find itself as a consequence of choosing it. In the present experiment, however, we found it easier to induce preference effects as a function of state at acquisition than at the time of choice.
There are both experimental and theoretical leads to follow from these results. From an empirical point of view, it would be sensible to examine the strength of the effect we found. Because we were considering several hypotheses making opposite predictions, we ran the experiment with equal rewards in both sources so as to increase the chances of observing any effect. An obvious follow-up is to run experiments with different reward sizes, for instance, by making subjects pick a source that delivers smaller rewards if knowledge about that option was acquired under conditions when the smaller reward size caused a greater need reduction than did the larger reward. A tantalizingly close result in this respect is that of Belke (1992) and the related later work by Gibbon (1995). This work shows that pigeons may choose a reward that is more delayed than is an alternative if acquisition occurred with the sources paired with different alternatives, so that the more delayed reward was the better of two alternatives during training, whereas the opposite was true for the less delayed reward. Relative judgments during acquisition, as well as state-dependent learning, clearly play a role in choice, an effect so far ignored by behavioral ecological modelling.
In terms of theory, we defended our conceptualization by stating that it would appear that a mechanism that assigned value on the basis of remembered need reduction might be ecologically sound. For this notion to work, the costs of using such a choice mechanism (so far we have shown that the birds express a preference when they should be neutral, but the work on relative valuation implies that similar mechanisms may generate an active preference for the poorer alternative) must be low, probably because of the frequency with which situations leading to paradoxical choices occur in the natural environment. We have done no formal modelling of this idea, and it is far from obvious what kind of environment would favor such a learning mechanism as opposed to any of the equally reasonable hypothetical alternatives that are not supported by the choice data. As is often the case, progress in experimental analysis of behavioral mechanisms may point the direction in which further behavioral ecological theorizing needs to develop.
ACKNOWLEDGEMENTS
We are grateful to A. Dickinson and two anonymous referees for very helpful comments on previous versions. Financial support was received from CAPES (grant to C. S.-P.) and a Biotechnology and Biological Sciences Research Council grant (43/S13483) to A.K. B.M. also gratefully acknowledges the support of New College, Oxford, The Max Planck Society, and the Rhodes Scholarship Trust. This paper was written while A.K. was on research leave at the Institute for Advanced Studies in Berlin.
REFERENCES
Belke TW, 1992. Stimulus preference and the transitivity of preference. Anim Learn Behav 20:401-406.
Clayton NS, Dickinson A, 1999. Motivational control of caching behaviour in the scrub jay, Aphelocoma coerulescens. Anim Behav 57:435-444.
Clement TS, Feltus JR, Kaiser DH, Zentall TR, 2000. "Work ethic" in pigeons: reward value is directly related to the effort or time required to obtain the reward. Psychon Bull Rev 7:100-106.
Dawkins R, Carlisle TR, 1976. Parental investment, mate desertion and a fallacy. Nature 262:131-133.
Dickinson A, Balleine BW, 1994. Motivational control of goal-directed action. Anim Learn Behav 22:1-18.
Dickinson A, Balleine BW, 1995. Motivational control of instrumental action. Curr Dir Psych Sci 4:162-167.
Gallistel CR, Gibbon J, 2000. Time, rate and conditioning. Psych Rev 107:289-344.
Gibbon J, 1995. Dynamics of time matching: arousal makes better seem worse. Psychon Bull Rev 2:208-215.
Gigerenzer G, Todd PM, ABC Research Group, 1999. Simple heuristics that make us smart. New York: Oxford University Press.
Houston AI, Kacelnik A, McNamara JM, 1982. Some learning rules for acquiring information. In: Functional ontogeny (McFarland DJ, ed). Boston: Pitman; 140-191.
Houston AI, McNamara JM, 1997. Patch choice and population size. Evol Ecol 11:703-722.
Houston AI, McNamara JM, 1999. Models of adaptive behaviour: an approach based on state. Cambridge: Cambridge University Press.
Kacelnik A, Marsh B, 2002. Cost can increase preference in starlings. Anim Behav 63:245-250.
Kahneman D, Slovic P, Tversky A (eds), 1982. Judgement under uncertainty: heuristics and biases. Cambridge: Cambridge University Press.
McNamara JM, Houston A, 1982. Short-term behaviour and lifetime fitness. In: Functional ontogeny (McFarland DJ, ed). London: Pitman; 60-87.
Mangel M, Clark CW, 1988. Dynamic modeling in behavioral ecology. Princeton, New Jersey: Princeton University Press.
Nonacs P, 2001. State dependent behavior and the marginal value theorem. Behav Ecol 12:71-83.
Stephens DW, 1981. The logic of risk-sensitive foraging preferences. Anim Behav 29:628-629.
Zar JH, 1999. Biostatistical analysis. Englewood Cliffs, New Jersey: Prentice-Hall.