Introduction to Cognitive Science · Leiden

The Cognitive Science Master Study Guide

An exam-ready synthesis of all ten lectures: every theory, every named scholar, every diagram, every contestable claim, every multiple-choice trap.

10 lectures · 540+ pages of slides · 100+ named scholars · 90+ self-test MCQs
JLB Chapter 1

The Prehistory of Cognitive Science

Behaviourism dominated psychology in the early 20th century, restricting it to observable behaviour. Cracks in its account of conditioning, the rise of computational models of mind, and discoveries about attention pushed psychology toward cognitivism.

1.1 Two assumptions of behaviourism

  1. All learning is the result of conditioning.
  2. Conditioning depends on association and reinforcement.

The behaviourist slogan: "Psychology is the science of behavior." Mental processes are unobservable and therefore unscientific. Chomsky's killer rejoinder:

"Defining psychology as the science of behavior was like defining physics as the science of meter reading." (Noam Chomsky)

1.2 Classical (Pavlovian) conditioning

Pavlov initially studied digestion, including salivation, in dogs (Nobel Prize 1904). When he noticed the dogs salivating as soon as the assistant opened the door (a "psychic secretion"), he pivoted his entire research programme.

Three stages of classical conditioning:
  • Before: bell → no response; food (US) → saliva (UR)
  • During: bell + food (US) → saliva (UR)
  • After: bell (CS) → saliva (CR)
US = unconditioned stimulus, UR = unconditioned response, CS = conditioned stimulus, CR = conditioned response.

The big debate inside classical conditioning

S–R (Watson)

Conditioning forges a direct stimulus–response bond: bell → salivation. No mental states involved. Strictly behaviourist.

vs.
S–S (Pavlov / cognitivists)

Bell activates a mental representation of the food, which then produces the response. Internal states inferred from their predicted effects.

Rescorla (1973) tested this with rats. First, a light was paired with a loud sound until the light alone elicited freezing (the sound already did). Then half the rats were habituated to the sound alone until they stopped freezing to it. Finally, the light was presented:

  • S–R predicts: freezing to the light is intact (the light → freezing bond is independent of the sound).
  • S–S predicts: no freezing, because the light triggers the representation of the now-habituated sound.

Result: habituated rats did NOT freeze to the light. This is strong support for the cognitivist S–S theory.
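The two accounts can be written as toy functions to make their opposite predictions explicit. This is only a sketch. The bond strengths and the assumption that habituation drops the sound's value to zero are hypothetical; they are not Rescorla's data or model.

```python
# Hypothetical illustration values, not Rescorla's data.
LIGHT_TO_FREEZING = 0.8   # S-R: strength of the direct light -> freezing bond
LIGHT_TO_SOUND = 0.8      # S-S: strength of the light -> "sound" association
SOUND_VALUE = 1.0         # how frightening the sound representation is


def sr_prediction(sound_habituated: bool) -> float:
    """S-R: the light drives freezing through its own bond, so what happens
    to the sound afterwards is irrelevant (the argument is deliberately unused)."""
    return LIGHT_TO_FREEZING


def ss_prediction(sound_habituated: bool) -> float:
    """S-S: the light evokes the sound's representation; freezing depends on
    that representation's current value (assumed to drop to 0 after habituation)."""
    current_value = 0.0 if sound_habituated else SOUND_VALUE
    return LIGHT_TO_SOUND * current_value


for habituated in (False, True):
    print(f"sound habituated={habituated}: "
          f"S-R predicts {sr_prediction(habituated):.1f}, "
          f"S-S predicts {ss_prediction(habituated):.1f}")
```

Only the S–S function changes its output when the sound is habituated, which is the pattern Rescorla observed.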

Phenomena to know

  • Extinction: CR weakens when CS is presented repeatedly without the US.
  • Spontaneous recovery: after extinction, the CR partially returns after a rest. Pavlov concluded the CR is inhibited, not lost.
  • Generalization: CR also occurs to similar stimuli (tones near the trained tone).
  • Discrimination: organism learns to respond only to a specific CS, not similar ones.
  • Volkova (1953), semantic generalization: children conditioned to "good"/"bad" generalized to whole sentences like "The children are playing nicely together" / "The Fascists destroyed many cities."

📘 Textbook adds: Tolman's latent learning & cognitive maps (1930, 1946)

Tolman & Honzik (1930): "'Insight' in Rats"

Three groups of rats ran a 14-unit T-alley maze. Group 1 was always rewarded, Group 2 never, and Group 3 was unrewarded for 10 days, then rewarded. Once Group 3 started getting rewards, its errors dropped almost immediately to Group 1's level and then slightly below. The rats must have stored maze information during the unrewarded period. This latent learning directly contradicts behaviourism: it is learning without reinforcement.

Tolman et al. (1946), cross-maze studies: place learning (knowing where the food is) is easier than response learning (knowing which turn to make). Rats build cognitive maps, i.e. internal representations of spatial layout. This is the first major case of postulating internal representations in a behavioural science.

📘 Textbook adds: Lashley (1951), "The Problem of Serial Order in Behavior"

Lashley argued that complex behaviour (speech, tennis, piano playing) cannot be a chained sequence of stimulus–response links because what happens next depends on what will happen later in the sequence and on the overall goal. He proposed behaviour is organised hierarchically, with high-level plans broken down into sub-plans. Two foundational ideas crystallise from his essay:

  • Subconscious information processing: most of the planning that turns goals into movements happens below awareness.
  • Task analysis: a complex cognitive ability can be understood by decomposing it into a hierarchy of simpler sub-tasks (the methodological backbone of cognitive science).

1.3 Operant (instrumental) conditioning

Edward L. Thorndike (1898): the puzzle box

Cats in a puzzle box escape by trial-and-error. Over 20–30 trials, the time to escape drops sharply.

Law of Effect: "Responses that produce a satisfying effect in a particular situation become more likely to occur again in that situation, and responses that produce a discomforting effect become less likely."

B. F. Skinner: the Skinner box

The animal stays in the box and can repeatedly produce operant responses (e.g., pressing a lever). Skinner replaced Thorndike's mentalistic "satisfaction" with the behaviour-neutral term reinforcement.

Used variable-ratio schedules to explain gambling addiction (Skinner 1953).

Reinforcement schedules

Schedule | Rule | Behaviour produced
Fixed-Ratio (FR) | Reinforce after every nth response (FR-5 = every 5th) | Fast, steady responding
Variable-Ratio (VR) | Reinforce after n responses on average; the exact number varies unpredictably | Fastest responding; most resistant to extinction → slot machines!
Fixed-Interval (FI) | Reinforce the first response after a fixed time interval | "Scalloping": responding speeds up near the interval's end
Variable-Interval (VI) | Reinforce the first response after an interval that varies unpredictably | Slow, steady responding

Ratio schedules produce more responding than interval schedules. On ratio schedules, the number of reinforcers scales with response rate; on interval schedules, it is capped by elapsed time.
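Here is a minimal simulation of that difference. It assumes an animal that responds at a perfectly steady rate; real animals adjust their rate to the schedule, and this sketch ignores that. The function name and the uniform spread used for the variable schedules are hypothetical choices.

```python
import random


def count_reinforcers(schedule: str, n: float, rate: float,
                      duration: float = 600.0, seed: int = 0) -> int:
    """Rewards earned by responding steadily at `rate` responses/second.

    "FR"/"VR": n = responses required per reward (fixed, or on average).
    "FI"/"VI": n = seconds that must elapse before the next response pays off.
    """
    rng = random.Random(seed)

    def next_requirement() -> float:
        if schedule in ("FR", "FI"):
            return n
        # Hypothetical spread for the variable schedules: uniform around n.
        return rng.uniform(0.5 * n, 1.5 * n)

    requirement = next_requirement()
    rewards, responses_since_reward, last_reward_time = 0, 0, 0.0
    for k in range(1, round(duration * rate) + 1):
        t = k / rate                              # time of the k-th response
        responses_since_reward += 1
        if schedule in ("FR", "VR"):
            earned = responses_since_reward >= requirement
        else:                                     # interval: enough time must have passed
            earned = t - last_reward_time >= requirement
        if earned:
            rewards += 1
            responses_since_reward, last_reward_time = 0, t
            requirement = next_requirement()
    return rewards


for schedule, n in [("FR", 5), ("VR", 5), ("FI", 10), ("VI", 10)]:
    slow = count_reinforcers(schedule, n, rate=1.0)
    fast = count_reinforcers(schedule, n, rate=2.0)
    print(f"{schedule}-{n}: {slow} rewards at 1 resp/s, {fast} at 2 resp/s")
```

Doubling the response rate doubles the FR-5 payoff (120 → 240 rewards in 600 s). FI-10 stays at 60, one per interval, which is the time cap described above.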

Positive reinforcement = arrival of a stimulus increases the response. Negative reinforcement = removal of a stimulus increases the response. Both make behaviour more likely (≠ punishment, which makes behaviour less likely).

Shaping, discrimination, concept learning

  • Shaping = training complex behaviour by reinforcing successive approximations (dog → kitchen → refrigerator → door → scratching).
  • Discriminative stimulus = signal that a response will be reinforced (a light being on).
  • Concept learning: pigeons rewarded for pecking Monet (not Picasso) generalize to Cézanne, Renoir → a category "impressionist" forms.

1.4 Cognition and computation

If humans can simulate a single-tape Turing machine (slowly, inefficiently), then the brain is Turing-complete. McCulloch & Pitts (1943) built networks of idealized neurons from three principles:

  1. Basic physiology
  2. Propositional logic
  3. Turing's theory of computation

Their results: any computable function can be computed by a network of neurons; all logical operators can be built from simple neural networks.
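The sketch below uses McCulloch–Pitts-style threshold units to build logic gates. As a simplification, inhibition is modelled as a negative weight rather than McCulloch & Pitts' absolute inhibition. The helper names are hypothetical.

```python
def mp_unit(weights, threshold):
    """A McCulloch-Pitts-style unit: outputs 1 if the weighted sum of its
    binary inputs reaches the threshold, otherwise 0."""
    def unit(*inputs):
        total = sum(w * x for w, x in zip(weights, inputs))
        return 1 if total >= threshold else 0
    return unit


AND = mp_unit([1, 1], threshold=2)
OR = mp_unit([1, 1], threshold=1)
NOT = mp_unit([-1], threshold=0)    # negative weight stands in for inhibition


def XOR(a, b):
    """Not computable by one unit; a two-layer network of units does it."""
    return AND(OR(a, b), NOT(AND(a, b)))


print("a b | AND OR XOR")
for a in (0, 1):
    for b in (0, 1):
        print(f"{a} {b} |  {AND(a, b)}   {OR(a, b)}   {XOR(a, b)}")
```

No single threshold unit can compute XOR, but a small network of AND, OR and NOT units does.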

📘 Textbook adds: Chomsky's Syntactic Structures (1957)

Chomsky distinguished the deep structure of a sentence (its constituent phrase structure) from its surface structure (the actual word order, derived via transformational rules).

Phrase-structure grammar

Sentences = combinations of parts of speech (N, V, Adj…) grouped into phrases (NP, VP…), generated by recursive phrase-structure rules (e.g., S → NP + VP).

vs.
Transformational grammar

Maps deep structure to surface structure. Explains why "John has hit the ball" and "The ball has been hit by John" share a meaning despite different surface forms, and why "Susan is easy to please" ≠ "Susan is eager to please" despite a near-identical surface.

This was the first time a linguist offered an explanatory account of language structure rather than just classification β€” the model for algorithmic theories of mental capacities.
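The toy phrase-structure grammar below illustrates recursive rules such as S → NP + VP. The rules and vocabulary are hypothetical. The sketch covers only phrase structure, not Chomsky's transformations. A depth bound keeps the recursion (NP → Det N PP, PP → P NP) from running forever.

```python
import itertools

# Toy phrase-structure grammar (hypothetical rules and vocabulary).
# NP -> Det N PP and PP -> P NP call each other: that is the recursion.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["Det", "N", "PP"]],
    "PP": [["P", "NP"]],
    "VP": [["V", "NP"]],
    "Det": [["the"]],
    "N": [["boy"], ["ball"]],
    "P": [["near"]],
    "V": [["hit"]],
}


def generate(symbol: str, depth: int):
    """Yield every word list derivable from `symbol` using at most `depth`
    levels of rule expansion (the bound keeps the recursion finite)."""
    if symbol not in GRAMMAR:          # a terminal word
        yield [symbol]
        return
    if depth == 0:
        return
    for rhs in GRAMMAR[symbol]:
        # All expansions of each right-hand-side symbol, then every combination.
        options = [list(generate(child, depth - 1)) for child in rhs]
        for parts in itertools.product(*options):
            yield [word for part in parts for word in part]


for words in generate("S", 5):
    print(" ".join(words))
```

Raising the depth bound yields ever longer noun phrases ("the boy near the ball near the boy…"). This is the unbounded productivity that recursive rules provide.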

1.5 The mind as an information processor

George A. Miller (1956), "The magical number 7 (± 2)": human channel capacity ≈ 3 bits ≈ 7 items, roughly independent of modality. Measured by:

  • Digit-span task: repeat back the longest sequence of digits you can hold.
  • Absolute judgment task: identify stimuli along one dimension.
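Converting between bits and items is a base-2 logarithm, assuming equally likely alternatives. Seven items correspond to about 2.8 bits, and 3 bits distinguish exactly 8 alternatives, hence "≈ 3 bits ≈ 7 items". The function names are illustrative.

```python
import math


def bits(n_alternatives: int) -> float:
    """Information (bits) needed to pick one of n equally likely alternatives."""
    return math.log2(n_alternatives)


def alternatives(n_bits: float) -> float:
    """Number of equally likely alternatives that n bits can distinguish."""
    return 2 ** n_bits


print(f"7 alternatives = {bits(7):.2f} bits")
print(f"3 bits = {alternatives(3):.0f} alternatives")
```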

Psychophysics: Weber's law

jnd = k · M (just noticeable difference = constant × stimulus magnitude)
  • k = 0.03 for weight (3% change detectable)
  • k = 0.01 for length
  • k = 0.25 for sound frequency in mice

The same absolute difference (10 units) is easy to detect at low magnitude (10 vs 20) and hard at high magnitude (110 vs 120). The detection probability curve runs sigmoidally from 0% (no difference) through 50% (the jnd) to 100% (clear difference).
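Below is a sketch of Weber's law plus a sigmoid detection curve. The Weber fraction k = 0.25 is the mouse sound-frequency value listed above. The curve's form, diff^s / (diff^s + jnd^s), and its `slope` parameter are hypothetical choices. They only satisfy the stated properties: 0% at no difference, 50% at the jnd, approaching 100% beyond it.

```python
def jnd(reference: float, k: float) -> float:
    """Weber's law: just noticeable difference = k * M."""
    return k * reference


def p_detect(reference: float, comparison: float, k: float,
             slope: float = 4.0) -> float:
    """Hypothetical sigmoid psychometric curve: 0 at no difference,
    0.5 when the difference equals the jnd, approaching 1 beyond it."""
    diff = abs(comparison - reference)
    threshold = jnd(reference, k)
    return diff ** slope / (diff ** slope + threshold ** slope)


K = 0.25  # Weber fraction for sound frequency in mice (from the list above)
for reference, comparison in [(10, 20), (110, 120)]:
    print(f"{reference} vs {comparison}: jnd = {jnd(reference, K):.1f}, "
          f"P(detect) = {p_detect(reference, comparison, K):.2f}")
```

Under these assumptions, the 10-unit difference is detected almost always at 10 vs 20 (jnd = 2.5) and rarely at 110 vs 120 (jnd = 27.5).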

1.6 Attention: reducing information load

Cherry's dichotic listening / shadowing: the participant repeats the story played to one ear aloud and cannot report the content of the other ear. They do notice physical changes (voice pitch shift, sudden tones).

Broadbent's early-selection filter: story (left ear) + story (right ear) → sensory memory (physical properties) → filter → STM / LTM (meaning).
Unattended channels are filtered out before semantic processing.

Three arguments AGAINST early selection

Breakthrough (Moray 1959)

Your own name in the ignored ear penetrates the filter, so its meaning must have been processed.

Switching (Treisman 1960)

When the shadowed story switches ears, participants briefly follow it to the other ear, so the ignored ear must have been parsed for meaning.

GSR (Corteen & Wood 1972)

City names (PARIS, LONDON, CAIRO) were conditioned to a shock. Later, a new city name (ROME) in the ignored ear evoked a galvanic skin response (a fear response) → the semantic category "city" was activated.

Three alternatives to early selection

Model | Author | Claim
Late selection | Deutsch & Deutsch | All stimuli processed for meaning; ignored ones quickly forgotten.
Attenuation | Anne Treisman | Ignored info is attenuated, not blocked. Important info (your name) is spared.
Load theory | Nilli Lavie | Distractor processing depends on how much capacity the main task leaves over.
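Here is a toy comparison of what reaches semantic analysis from the ignored ear under each model. All numbers are hypothetical. The low threshold for one's own name follows Treisman's idea that important words are easier to trigger. Load theory is not modelled, and the "quickly forgotten" step of late selection is left out.

```python
# Hypothetical recognition thresholds: lower = easier to trigger.
# Treisman-style assumption: one's own name has a permanently low threshold.
THRESHOLDS = {"table": 0.8, "river": 0.8, "own name": 0.2}

# Gain applied to the ignored ear under each model (illustrative values).
IGNORED_EAR_GAIN = {
    "early selection": 0.0,   # Broadbent: blocked before meaning
    "attenuation": 0.3,       # Treisman: turned down, not off
    "late selection": 1.0,    # Deutsch & Deutsch: everything reaches meaning
}


def reaches_meaning(model: str, signal: float = 1.0) -> list[str]:
    """Words from the ignored ear whose (possibly attenuated) signal reaches threshold."""
    gain = IGNORED_EAR_GAIN[model]
    return [word for word, threshold in THRESHOLDS.items()
            if gain * signal >= threshold]


for model in IGNORED_EAR_GAIN:
    print(f"{model}: {reaches_meaning(model)}")
```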

Self-test · Lecture 1

1. In Rescorla's (1973) habituation experiment with rats, the cognitivist (S–S) theory predicted that:
  A. Habituated rats would still freeze to the light because the light→freezing bond is independent.
  B. Habituated rats would NOT freeze to the light because it triggers the (now habituated) representation of the sound.
  C. Habituated rats would freeze more strongly to the light due to dishabituation.
  D. Habituation should generalize to all conditioned stimuli regardless of pairing.
Answer: B. The S–S theory says the CS activates a representation of the US; if that representation has been habituated, the CR disappears. Rescorla's data confirmed this.
2. Which reinforcement schedule is most resistant to extinction and famously exploited by slot machines?
  A. Fixed-ratio
  B. Fixed-interval
  C. Variable-ratio
  D. Continuous reinforcement
Answer: C. Variable-ratio. The unpredictable number of responses needed for each reward generates extremely persistent responding.
3. Miller's (1956) "magical number" claims human channel capacity is approximately:
  A. 3 bits (≈ 7 items), roughly modality-independent
  B. 7 bits (≈ 128 items), strongly modality-dependent
  C. 1 bit (binary), modality-dependent
  D. 10 bits (≈ 1000 items), strongly visual
Answer: A. 3 bits ≈ 7 items, roughly independent of modality (measured via digit span and absolute judgment).
4. Corteen & Wood (1972) conditioned a galvanic skin response to city names. When a new city name appeared in the IGNORED ear, participants showed a GSR. This finding:
  A. Supports Broadbent's strict early-selection filter
  B. Demonstrates that unattended stimuli are processed semantically
  C. Shows that GSR is unrelated to attention
  D. Confirms Pavlovian extinction
Answer: B. The fact that the semantic category "city" generalized through the unattended channel shows that meaning was extracted. This is an argument against strict early selection.
5. Skinner deliberately preferred the term reinforcement over Thorndike's term because:
  A. It was newer terminology
  B. "Satisfaction" implied a mentalistic / unobservable inner state, which behaviourism avoided
  C. Thorndike's law was already discredited
  D. It allowed Skinner to include classical conditioning under the same heading
Answer: B. "Satisfaction" suggests a felt inner state; "reinforcement" is behaviourally defined (whatever increases the response rate).
JLB Chapter 2

Three Milestones of Cognitive Science

Three foundational achievements: SHRDLU (language as algorithmic processing), the imagery debate (spatial vs propositional representation), and Marr's three levels (computational / algorithmic / implementational). Cognitive science matures by treating the mind as a system that operates over internal representations.

2.1 Language and micro-worlds

ELIZA (Weizenbaum, 1965)

Keyword matching + transformation rules simulating a psychotherapist. Created "the illusion of understanding."

"I had not realized that extremely short exposures to a relatively simple computer program could induce powerful delusional thinking in quite normal people."

PARRY (Colby, 1972 · Stanford)

Simulated a patient with paranoid schizophrenia; its inconsistencies made it more realistic. In a modified Turing test, 33 psychiatrists tried to tell patient transcripts from PARRY transcripts and were only 48% accurate, i.e. at chance.

SHRDLU (Winograd, 1970 · MIT)

Operated in a colored-blocks micro-world with a robot arm. Could parse and respond to queries like "Does the shortest thing the tallest pyramid's support supports support anything green?"

SHRDLU's three processing stages:

Syntactic analysis → Semantic analysis → Integration with world knowledge

Ambiguity demonstration: "Put the red cube on the block in the box." It has two readings: move [the red cube on the block] into [the box], or move [the red cube] onto [the block in the box]. Syntactic parsing alone is insufficient; world knowledge (which objects are actually where) is required to disambiguate.
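A CKY chart parser over a small grammar can count the two readings mechanically. The grammar below is a hypothetical toy in Chomsky normal form, not SHRDLU's actual grammar. It treats the command as a VP and assumes "put" takes both an object and a location.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form (not SHRDLU's own grammar).
LEXICON = {
    "put": {"V"}, "the": {"Det"}, "red": {"Adj"},
    "cube": {"Nom"}, "block": {"Nom"}, "box": {"Nom"},
    "on": {"P"}, "in": {"P"},
}
BINARY_RULES = [            # (parent, left child, right child)
    ("VP", "VNP", "PP"),    # put [something] [somewhere]
    ("VNP", "V", "NP"),
    ("NP", "Det", "Nom"),
    ("NP", "NP", "PP"),     # a noun phrase can carry a location modifier
    ("Nom", "Adj", "Nom"),
    ("PP", "P", "NP"),
]


def count_parses(words, goal="VP"):
    """CKY chart parser that counts distinct parse trees for `goal`."""
    n = len(words)
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for category in LEXICON[word]:
            chart[i][i + 1][category] = 1
    for length in range(2, n + 1):            # shorter spans are filled first
        for start in range(n - length + 1):
            end = start + length
            for split in range(start + 1, end):
                for parent, left, right in BINARY_RULES:
                    ways = chart[start][split][left] * chart[split][end][right]
                    if ways:
                        chart[start][end][parent] += ways
    return chart[0][n][goal]


print(count_parses("put the red cube on the block in the box".split()))
```

The parser finds exactly two trees. Choosing between them means checking the world model (is there a red cube on a block? is there a block in the box?), which is SHRDLU's third stage.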

2.2 The imagery debate: what is mental imagery?

Spatial / depictive

Mental images preserve metric/spatial properties and use the same mechanisms as perception. An image of your room preserves its actual layout.

vs.
Propositional

Mental images are symbolic, sentence-like ("the pizza was on the dining table"). No spatial format.

Evidence favouring the spatial view:

  • Detail recognition: to identify small details in an imagined object, you must "zoom in", which is incompatible with a propositional format.
  • Physical-property effects: brightness, contrast, and motion speed affect reaction times the same way for perceived and imagined stimuli.
  • "Imagine your dinner" effect: people with bigger houses take longer to mentally answer spatial questions about them. It takes time to travel in our mind.
  • Mental rotation (Cooper & Shepard 1973): reaction time to judge whether the letter R is normal or mirrored grows linearly with the angle of rotation, peaking near 180°.
[Figure: Cooper & Shepard (1973), reaction time (ms) vs rotation angle (0°–360°). RT grows linearly with rotation: people literally "rotate" the image.]
Strong evidence that imagery uses a spatial, perception-like format.
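The linear pattern fits in a one-line model. The intercept and the milliseconds-per-degree slope are hypothetical, not Cooper & Shepard's estimates. The model assumes people rotate the image the shorter way back to upright, so RT peaks at 180° and falls again toward 360°.

```python
def predicted_rt(angle_deg: float, intercept_ms: float = 500.0,
                 ms_per_degree: float = 3.0) -> float:
    """Linear mental-rotation model (hypothetical parameters).

    People are assumed to rotate the image the shorter way back to upright,
    so the effective angle is at most 180 degrees and RT peaks there.
    """
    angle = angle_deg % 360
    effective = min(angle, 360 - angle)
    return intercept_ms + ms_per_degree * effective


for angle in range(0, 361, 60):
    print(f"{angle:>3} deg: {predicted_rt(angle):.0f} ms")
```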

2.3 Marr's three levels of analysis

David Marr (1945–1980), neuroscientist of vision. Two prizes are named after him (IEEE/ICCV; Cognitive Science Society).

Level | Question | Example for vision
Computational | What problem is being solved? Input → output? | Recover 3-D structure from 2-D retinal image
Algorithmic | How is it solved? What representations and operations? | Edge detection → 2½-D sketch → 3-D model
Implementational | How is this physically realized? | Neurons in V1, V2, V4, IT

Exam trap: classic distractors swap the algorithmic and implementational levels. The algorithm describes representations and steps; the implementation describes the physical substrate.
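Addition makes the distinction concrete. Both functions below solve the same computational-level problem, f(a, b) = a + b for non-negative integers. They use different algorithms and representations: unary counting vs decimal digits with carries. The implementational level (the hardware or neurons running the procedure) is exactly what this code abstracts away. Function names are hypothetical.

```python
def add_by_counting(a: int, b: int) -> int:
    """Algorithm 1: tally, i.e. increment a by one, b times."""
    total = a
    for _ in range(b):
        total += 1
    return total


def add_by_columns(a: int, b: int) -> int:
    """Algorithm 2: schoolbook column addition on decimal digits with carries."""
    result, place, carry = 0, 1, 0
    while a or b or carry:
        digit_sum = a % 10 + b % 10 + carry
        result += (digit_sum % 10) * place
        carry = digit_sum // 10
        a, b, place = a // 10, b // 10, place * 10
    return result


print(add_by_counting(27, 58), add_by_columns(27, 58))
```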

2.4 Marr's three stages of vision

Primal sketch (edges, blobs) → 2½-D sketch (surface orientations; viewer-centered) → 3-D sketch (object-centered; generalized cones)

The 2½-D sketch is viewer-centered (depends on where you stand); the 3-D sketch is object-centered (invariant to viewpoint). The 2½-D stage uses stereopsis and Gestalt laws.

Contour-related demonstrations: Hidden Dalmatian (low-contour image, hard to perceive) · Kanizsa's triangle (illusory contours filled in) · Camouflage = reducing contours to defeat the primal sketch.

2.5 Perceptual constancies

Size constancy

Two people whose retinal images are the same size can differ in real size; depth cues correct for this.

Brightness constancy

Patches A and B can reflect the same amount of light yet appear differently bright. The visual system discounts the shadow, so it infers that the shaded patch must be the brighter surface.

Shape constancy

Shepard's tables: the two tabletops are identical shapes on the page (and so on the retina). The legs and perspective cues make one look much longer, because the visual system interprets both as 3-D objects.

2.6 Object categorization: four theories

Theory | Claim | Strength / weakness
Categorization by definition | Membership = necessary + sufficient features (cat = furry + meows + four legs…) | Hairless or non-meowing cats are still cats → definitions fail; members share a family resemblance instead
Categorization by prototype | A category = an idealized average; typicality = closeness to prototype | Robin verified faster than penguin as "bird" (sentence verification task)
Categorization by exemplars | Encountered instances stored individually; typicality falls out of frequency | Best for small categories
Recognition-by-components (Biederman) | Objects = combinations of 36 geons (geometric ions) | Decomposition-based view; matches Marr's "generalized cones" idea
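The next sketch contrasts prototype and exemplar typicality on hypothetical binary bird features. The feature set, the Euclidean distance, and the exp(−distance) similarity are assumptions for illustration. The definitional and geon-based views are not modelled.

```python
import math

# Hypothetical binary features: (flies, sings, small, perches in trees)
BIRDS = {
    "robin": (1, 1, 1, 1),
    "sparrow": (1, 1, 1, 1),
    "finch": (1, 1, 1, 1),
    "eagle": (1, 0, 0, 1),
    "penguin": (0, 0, 0, 0),
}


def distance(x, y):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def prototype(category):
    """Prototype view: the category is summarized by its average member."""
    members = list(category.values())
    return tuple(sum(feature) / len(members) for feature in zip(*members))


def prototype_typicality(item, category):
    """Closer to the prototype = more typical (i.e. less negative)."""
    return -distance(category[item], prototype(category))


def exemplar_typicality(item, category):
    """Exemplar view: summed similarity to every other stored member."""
    return sum(math.exp(-distance(category[item], features))
               for name, features in category.items() if name != item)


for bird in ("robin", "penguin"):
    print(f"{bird}: prototype {prototype_typicality(bird, BIRDS):.2f}, "
          f"exemplar {exemplar_typicality(bird, BIRDS):.2f}")
```

Both measures rank the robin as more typical than the penguin, matching the sentence-verification result in the table.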

2.7 Categorization hierarchies: expertise effect

Global ("furniture"; large within-category difference) → Basic ("chair"; default level) → Specific ("Barcelona Chair"; small within-category difference)

Non-experts default to the basic level; experts categorize at a more specific level within their domain. Example: Barcelona Chair (Mies van der Rohe & Lilly Reich, 1929).

Self-test · Lecture 2

1. In Marr's framework, the question "what problem is being solved and why?" belongs to which level?
  A. Implementational
  B. Algorithmic
  C. Computational
  D. Connectionist
Answer: C. The computational level specifies the goal/function; the algorithmic level specifies representations + steps; the implementational level specifies the physical substrate.
2. Cooper & Shepard's (1973) mental rotation experiment supports:
  A. Propositional theory of imagery
  B. Spatial/depictive theory of imagery
  C. The exemplar theory of categorization
  D. Marr's 3-D sketch
Answer: B. The fact that reaction time grows with rotation angle implies that people actually rotate an image, which is incompatible with a purely propositional/symbolic representation.
3. Which of Marr's stages of vision is viewer-centered?
  A. Primal sketch
  B. 2½-D sketch
  C. 3-D sketch
  D. None: all are object-centered
Answer: B. The 2½-D sketch encodes surface orientations from the viewer's position; the 3-D sketch is the viewpoint-invariant, object-centered description.
4. PARRY (Colby, 1972) showed that:
  A. A simple program could pass a modified Turing test with psychiatrists (≈ 48% accuracy)
  B. Schizophrenia is a purely computational disorder
  C. Real understanding requires biological substrate
  D. SHRDLU could not handle ambiguity
Answer: A. Psychiatrists could only classify transcripts at chance; PARRY's inconsistencies even helped its realism.
5. Biederman's recognition-by-components theory claims objects are composed of how many geons?
  A. 3
  B. 12
  C. 24
  D. 36
Answer: D. 36 geometric ions ("geons") that combine to form all objects.
6. The sentence "A robin is a bird" is verified faster than "A penguin is a bird." This is evidence for:
  A. Definitional categorization
  B. Prototype theory (typicality effects)
  C. Biederman's geons
  D. Family resemblance is irrelevant for birds
Answer: B. Robin is closer to the bird prototype than penguin, so verification is faster. Pure definitional categorization predicts no typicality difference.
JLB Chapter 3

The Turn to the Brain

From the immaterial soul to localized neural circuits. Phrenology is wrong in particulars but not absurd in spirit: some abilities really do map onto specific brain regions. The lecture surveys dualism vs materialism, basic neuroanatomy, and a parade of neuropsychological case studies that establish localization of function.

3.1 Philosophical roots

Dualism (Descartes 1596–1650)

Two substances: material body and immaterial soul. Animals (dogs) lack souls. The soul interacts with the body via the pineal gland. Only humans need a soul (for thinking, believing).

Critique: how does the immaterial interact with the material? Places mind outside science.

vs.
Materialism (Hobbes 1588–1679)

Everything is material; soul is a meaningless concept. All human behaviour = physical processes in the brain. Thought is anchored in neural firing.

The view that lets the brain become an object of science.

3.2 Psychiatry then and now

  • 1247: Bethlem Royal Hospital ("bedlam") founded. Closed institution; no treatment, just isolation.
  • 18th century: rest, cleanliness, regularity. King George III of England (1738–1820) suffered bouts of madness and recovered, evidence that mental illness can pass.
  • 20th century: frontal lobotomy, i.e. removal/disconnection of the frontal lobe "to reduce the complexity of psychic life." Required a neurosurgeon.
  • Walter Freeman (1945): transorbital lobotomy with an icepick through the eye socket. Lobotomies were performed on more than 40,000 people in the US. Side effects: cognitive impairment, flattened affect.
  • Paul Broca (1824–1880): founder of clinical neuropsychology. Showed that damage to Broca's area impairs speech but not other abilities, an early empirical case for localization.

3.3 The brain in numbers (Carl Sagan's "very big place in a very small space")

  • ≈ 1,300 g total mass
  • ~ 10¹¹ neurons total (20 × 10⁹ neocortical)
  • ~ 15 × 10¹³ cortical synapses
  • Two hemispheres joined by the corpus callosum; lateralized despite sensory/motor symmetry.
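Dividing the figures above gives a rough average number of synapses per neuron. This assumes the cortical synapse count belongs to the neocortical neurons and is spread evenly over them, which is a simplification.

```python
NEURONS_TOTAL = 1e11          # ~10^11 neurons in the whole brain
NEOCORTICAL_NEURONS = 20e9    # 20 x 10^9 neocortical neurons
CORTICAL_SYNAPSES = 15e13     # ~15 x 10^13 cortical synapses

# Assumption: all cortical synapses are shared evenly by the neocortical neurons.
synapses_per_neuron = CORTICAL_SYNAPSES / NEOCORTICAL_NEURONS
neocortical_share = NEOCORTICAL_NEURONS / NEURONS_TOTAL

print(f"~{synapses_per_neuron:,.0f} synapses per neocortical neuron")
print(f"neocortex: ~{neocortical_share:.0%} of all neurons")
```

The result is about 7,500 synapses per neocortical neuron, with the neocortex holding roughly a fifth of all neurons.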

The neuron

Neuron: dendrites (input) → cell body (soma) → axon (output, wrapped in a myelin sheath) → synapses
Action potentials are binary (all-or-none); signals are frequency-modulated, not amplitude-modulated. Myelination (white matter) speeds transmission. Multiple sclerosis attacks myelin.
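A leaky integrate-and-fire neuron illustrates frequency coding. It is a standard simplified model, swapped in here for illustration; it is not from the lecture. All parameters are hypothetical and unitless. Every spike is identical, so stronger input changes only how many spikes occur per second.

```python
def lif_spike_count(input_current: float, duration_ms: float = 1000.0,
                    dt_ms: float = 0.1, tau_ms: float = 10.0,
                    threshold: float = 1.0) -> int:
    """Count spikes of a leaky integrate-and-fire neuron (hypothetical units).

    The membrane potential v leaks toward 0 while integrating the input
    (dv/dt = (-v + I) / tau, Euler steps of size dt). When v reaches the
    threshold, an identical all-or-none spike is counted and v resets to 0.
    """
    v, spikes = 0.0, 0
    for _ in range(round(duration_ms / dt_ms)):
        v += dt_ms / tau_ms * (-v + input_current)
        if v >= threshold:
            spikes += 1
            v = 0.0
    return spikes


for current in (0.5, 1.5, 3.0):
    print(f"input {current}: {lif_spike_count(current)} spikes per second")
```

Input below threshold never produces a spike. Above threshold, larger inputs raise the firing rate, not the spike size.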

Neurotransmitters

Neurotransmitter | Primary function
Acetylcholine | Muscle contraction
Serotonin | Sleep, mood, arousal
Glutamate | Learning, memory (excitatory)
GABA | Inhibitory transmitter
Norepinephrine | Arousal, wakefulness
Dopamine | Motivation, emotion

3.4 Aphasias: language disorders

Aphasia affects ~35% of stroke patients.

Broca's aphasia (non-fluent)

Speech is effortful, telegraphic, but meaningful. "Cat… sit… mat." Comprehension largely preserved. Damage to Broca's area (frontal lobe, BA 44).

vs.
Wernicke's aphasia (fluent)

Speech is fluent but lacks content/meaning ("word salad"), and comprehension is impaired. Damage to Wernicke's area (temporal lobe).

3.5 Split-brain: corpus callosotomy

Cutting the corpus callosum treats refractory epilepsy. No major IQ, conversational, or coordination deficit. But experiments reveal striking dissociations:

  • Object presented to the left visual field (→ right hemisphere): patient says "There is no object."
  • But the left hand (also controlled by the right hemisphere) can pick out the same object from a collection perfectly.
  • Patients confabulate to explain their left hand's behaviour.

3.6 Catalogue of deficits

Unilateral spatial neglect

Not blind, but ignores one side of space (usually the left). Misses the left side of objects when drawing; makes too many right turns; forgets the tens/hundreds in mental arithmetic. Can affect peri-personal space and even visual imagery. Implicit knowledge about neglected items may be preserved.

Visual agnosia

Cannot recognize visually presented objects despite intact vision, memory, language, and intelligence. Two flavours:

  • Form agnosia: cannot perceive shape
  • Integrative agnosia: perceives shapes but cannot integrate them. Patient HJA (Humphreys & Riddoch) could only describe a lion by inferring from parts: "A heavy, four-legged animal… these stripes mean something… I suppose it's a lion."

Blindsight

Cortical blindness (V1 damage): no conscious vision, yet forced-choice responses are above chance for movement, orientation, even some shapes. "How can I look at something that I haven't seen?" Evidence for two visual systems: a primitive unconscious one and a conscious cortical one.

Prosopagnosia (face blindness)

Cannot recognize faces. Compensation via voice, gait, glasses, hair. Often bilateral damage to the fusiform face area (FFA).

3.7 Two visual streams

Dorsal "where/how"

Action stream. Damage: neglect, apraxia, erratic grasping. The "what pathway" is intact: the patient can name and describe objects but can't grasp them properly.

vs.
Ventral "what"

Perception stream. Damage: visual agnosia, prosopagnosia. The "where pathway" is intact: the patient can post a card "as if mailing a letter" through a rotated slot, but cannot report the slot's orientation.

📘 Textbook adds: Petersen et al. (1988), PET subtraction logic

An early functional-neuroimaging landmark for single-word processing. Subjects performed a hierarchy of four tasks; each scan was subtracted from the one above to isolate the new component:

  • Passive viewing of words − fixation baseline = visual word processing
  • Speaking the words − passive viewing = articulation
  • Generating a verb for each noun − speaking the words = semantic access

The pattern of activations supported a parallel (not strictly serial) model of single-word processing. Significance: this study established the paired-subtraction paradigm that later fMRI cognitive subtraction designs inherited.

📘 Textbook adds: Logothetis (2001), what does BOLD actually measure?

The crucial follow-up question after the fMRI revolution: BOLD measures blood oxygenation, but does it track neuronal output (spikes) or input (synaptic activity)? Logothetis recorded fMRI together with signals from non-magnetic microelectrodes in an anaesthetised monkey's V1 during a rotating-checkerboard stimulus.

Signal | What it indexes | Correlated with BOLD?
Single-unit / multi-unit firing (SDF, MUA) | Neural output (spikes) | Adapts after 2 s, then decouples from BOLD
Local Field Potential (LFP) | Neural input (summed synaptic activity, low-pass-filtered) | Tracks BOLD throughout the trial

Implication: fMRI activation in a region reflects information arriving there, not necessarily spikes leaving. A region can show BOLD without firing more, which undermines naive "this region does X" inferences.

3.8 Capgras delusion: the inverse of prosopagnosia

Delusion that a loved one has been replaced by an identical-looking impostor. Explicit recognition is intact, but the implicit/emotional response is missing. Confabulation: "This person looks like my partner, but I don't feel the same about them, so it must be someone else."

Disorder | Explicit recognition | Implicit/emotional recognition
Prosopagnosia | ❌ lost | ✅ intact
Capgras | ✅ intact | ❌ lost

Self-test · Lecture 3

1. Descartes thought mind and body interact at the:
  A. Hippocampus
  B. Pineal gland
  C. Corpus callosum
  D. Thalamus
Answer: B. The pineal gland was Descartes' proposed locus of mind–body interaction.
2. A patient with fluent but meaningless speech ("word salad") most likely has:
  A. Broca's aphasia
  B. Wernicke's aphasia
  C. Prosopagnosia
  D. Blindsight
Answer: B. Wernicke's = fluent, content-poor. Broca's = effortful, telegraphic but meaningful.
3. A patient cannot report the orientation of a rotated slot, yet can post a card through it perfectly. This dissociation supports:
  A. Capgras delusion is a single-pathway disorder
  B. Ventral pathway = perception; Dorsal pathway = action (action is preserved)
  C. Dorsal damage causes visual agnosia
  D. Neglect affects only the left hemifield
Answer: B. Visual agnosia patients (ventral damage) can act on stimuli they cannot consciously identify, as Milner & Goodale's two-streams model predicts.
4. Action potentials are best characterized as:
  A. Amplitude-modulated and graded
  B. Binary and frequency-modulated
  C. Continuous chemical signals
  D. Always inhibitory
Answer: B. Spikes are all-or-none; signal strength is encoded in firing rate.
5. Capgras delusion is best described as:
  A. Loss of both explicit and implicit recognition of faces
  B. Loss of explicit recognition with intact emotional recognition
  C. Intact explicit recognition with lost emotional/implicit recognition
  D. A pure form of prosopagnosia
Answer: C. Capgras = inverse of prosopagnosia. Recognition is intact; the emotional resonance is gone, so patients infer an impostor.
6. In a split-brain patient, an object shown only to the left visual field will:
  A. Be verbally named correctly
  B. Not be named verbally, but the left hand can correctly select it
  C. Cause complete blindness in that hemifield
  D. Trigger seizures
Answer: B. Left visual field → right hemisphere (which controls the left hand but lacks dominant language). The patient says there's no object but selects it correctly with the left hand, and often confabulates.
JLB Chapter 9

Strategies for Brain Mapping

No single neuroscientific technique sees the whole picture. Each method trades off temporal resolution against spatial resolution. Mastery means knowing which tool to reach for, what it measures, and the conceptual logic of subtraction and double dissociation.

4.1 Anatomical classification

  • Surface classification: gross anatomy (gyri, sulci, lobes).
  • Cellular classification (Brodmann): Brodmann used staining to identify ~52 areas with distinct neuronal populations. Principle of Segregation: "Cerebral cortex can be classified into different areas with unique neuronal populations."

Brodmann area | Function
BA 1–3 | Primary somatosensory cortex
BA 4 | Primary motor cortex
BA 17 | Primary visual cortex (V1)
BA 44 | Broca's area (language production)

DTI (diffusion tensor imaging, the basis of diffusion tractography) = MRI-based visualization of white-matter fibre tracts. It measures anatomical connectivity, not function.

4.2 The big methodological tradeoff

[Figure: spatial vs temporal resolution of brain-mapping techniques. Axes: temporal resolution (low → high, ms) and spatial resolution (low → high, mm). Plotted: single-unit recording, fMRI (BOLD), PET, MEG, EEG.]
Use high-temporal tools (EEG/MEG) for attention/language/decision; high-spatial tools (fMRI/PET) for memory/face recognition.
Technique | Directly measures | Temporal | Spatial
Single-unit recording | Action potentials of individual neurons | High | High
EEG | Electrical activity of large neural populations (scalp) | High | Low
MEG | Magnetic fields from electrical population activity | High | Low–medium
PET | Cerebral blood flow | Low | High
fMRI | Blood oxygen levels (BOLD) | Low | High

Exam-critical: fMRI does NOT measure neural activity directly. It measures the BOLD signal: "changes in magnetic properties of haemoglobin in the blood due to brain activation."

4.3 ERP components

An ERP (event-related potential) is a time-locked average of EEG to a specific event.

Component | Latency | Indexes
P1 / N1 | ~ 100–150 ms | Early sensory + attention (enhanced for attended stimuli)
P300 | ~ 300 ms | General cognitive processing / oddball detection
N400 | ~ 400 ms | Semantic processing (e.g., "He spread butter on his socks")
P600 | ~ 600 ms | Syntactic reanalysis
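The following sketch shows why averaging works. Simulated epochs contain a hypothetical P300-like bump buried in much larger random noise. Assuming independent Gaussian noise (real EEG background is not), averaging n time-locked epochs shrinks the residual noise by roughly a factor of √n. The event-locked waveform stays put. All names, amplitudes and noise levels are illustrative.

```python
import math
import random


def true_erp(t_ms: int, peak_ms: float = 300.0, amplitude: float = 5.0,
             width_ms: float = 50.0) -> float:
    """Hypothetical noise-free ERP: a positive bump peaking at `peak_ms`."""
    return amplitude * math.exp(-((t_ms - peak_ms) / width_ms) ** 2)


def simulate_epoch(rng: random.Random, n_samples: int = 600,
                   noise_sd: float = 20.0) -> list[float]:
    """One time-locked EEG epoch (1 sample per ms): ERP + background noise."""
    return [true_erp(t) + rng.gauss(0.0, noise_sd) for t in range(n_samples)]


def average_epochs(epochs: list[list[float]]) -> list[float]:
    """The ERP estimate: mean across epochs at each time point."""
    return [sum(samples) / len(samples) for samples in zip(*epochs)]


def residual_noise(average: list[float]) -> float:
    """RMS difference between an average and the true noise-free waveform."""
    return math.sqrt(sum((value - true_erp(t)) ** 2
                         for t, value in enumerate(average)) / len(average))


rng = random.Random(42)
for n in (1, 25, 100):
    estimate = average_epochs([simulate_epoch(rng) for _ in range(n)])
    print(f"{n:>3} epochs: residual noise ~ {residual_noise(estimate):.1f} (hypothetical uV)")
```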

4.4 The Locus of Selection problem

Does attention modulate processing before or after a stimulus representation is built?

  • ERP timing: P1 and N1 are enhanced for attended stimuli, i.e. early modulation.
  • Macaque microelectrode recordings localize attentional modulation to V1–V4, before object recognition or access to meaning.
  • Resolution: combining ERP (timing) with microelectrodes (location) shows attention acts early, in pre-representational visual cortex.

4.5 fMRI logic: subtraction & hierarchical design

To localize a function, contrast a task that requires it against a near-identical task that doesn't. Hierarchical lexical-access design:

  • Viewing words − fixation control = visual processing
  • Speaking words − viewing words = articulation
  • Generating verbs − speaking words = semantics

Each contrast isolates the new component recruited at that step.
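Here is a toy version of hierarchical subtraction. The region labels and activation values are hypothetical. The sketch builds in the "pure insertion" assumption that subtraction logic relies on: each task adds one component without changing the others.

```python
# Hypothetical activation levels (arbitrary units) per region and task.
TASKS = {
    "fixation": {"visual": 1.0, "motor": 1.0, "frontal": 1.0},
    "viewing words": {"visual": 3.0, "motor": 1.0, "frontal": 1.0},
    "speaking words": {"visual": 3.0, "motor": 3.5, "frontal": 1.0},
    "generating verbs": {"visual": 3.0, "motor": 3.5, "frontal": 4.0},
}


def contrast(task: str, baseline: str, min_increase: float = 0.5) -> list[str]:
    """Regions whose activation rises by at least `min_increase` over baseline."""
    return [region for region, level in TASKS[task].items()
            if level - TASKS[baseline][region] >= min_increase]


hierarchy = list(TASKS)   # dicts keep insertion order: simplest task first
for lower, higher in zip(hierarchy, hierarchy[1:]):
    print(f"{higher} - {lower}: {contrast(higher, lower)}")
```

If pure insertion fails, for example because speaking also changes visual activity, a contrast picks up more than the new component.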

4.6 Owen et al. (2006): detecting awareness in the vegetative state

Published in Science 313:1402. A patient diagnosed as vegetative (also called unresponsive wakefulness syndrome, UWS) was asked to imagine either playing tennis (engages motor regions/SMA) or walking through her house (engages parahippocampal regions). She produced the appropriate, task-specific BOLD activation on command: evidence of covert awareness despite the absence of any behavioural response. The paradigm was later adapted into a yes/no communication channel for "vegetative" patients.

Self-test · Lecture 4

1. fMRI directly measures:
  A. Action potentials of individual neurons
  B. Changes in haemoglobin's magnetic properties (BOLD signal)
  C. Cerebral blood flow via radioactive tracers
  D. Magnetic fields generated by neural firing
Answer: B. BOLD = blood-oxygen-level-dependent signal. Note: that's not the same as PET (blood flow) or MEG (magnetic fields).
2. You want to study the time-course of semantic violation. Which technique?
  A. fMRI
  B. PET
  C. EEG (looking at the N400)
  D. DTI
Answer: C. EEG has high temporal resolution, and the N400 is the canonical index of semantic anomaly.
3. Which Brodmann area corresponds to Broca's area?
  A. BA 4
  B. BA 17
  C. BA 44
  D. BA 1–3
Answer: C. BA 44 = Broca's area (together with BA 45). BA 4 = primary motor; BA 17 = primary visual (V1); BA 1–3 = primary somatosensory.
4. Owen et al. (2006) demonstrated covert awareness in a vegetative-state patient by using which paradigm?
  A. EEG recordings of N400 during speech
  B. fMRI during imagined tennis vs imagined spatial navigation
  C. PET imaging of dopamine receptors
  D. Single-unit recording in V1
Answer: B. Tennis imagery activates motor/SMA regions and spatial imagery activates parahippocampal regions; distinct, task-specific BOLD patterns produced on command are evidence of conscious task performance.
5. The combination of ERP timing + microelectrode localization resolved the locus-of-selection problem by showing:
  A. Attention modulates visual processing late, after object recognition
  B. Attention modulates visual processing early, in V1–V4 before recognition
  C. Attention is purely a frontal phenomenon
  D. Attention does not modulate sensory areas at all
Answer: B. P1/N1 enhancement and macaque microelectrode recordings both localize the effect to early visual cortex.
JLB Chapter 5

Connectionism

Two starting points in Marr's hierarchy give rise to two paradigms. Symbolic AI starts from the mind (algorithms, interpretable rules). Connectionism starts from the brain (biology has already produced intelligence). The result: networks whose knowledge is a pattern of weights, not a list of beliefs.

5.1 Two starting points

Symbolic AI (top-down)

Start at the algorithmic level. Explicit symbol manipulation. Interpretable and fittable. Cognition as rule-based operations on discrete symbols (the Physical Symbol System hypothesis).

vs.
Connectionism (bottom-up)

Start at the implementational level. Biology has already produced intelligence, so use it as inspiration. Deep nets are powerful but hard to interpret. Knowledge is in the weight vector.

📘 Textbook adds: The Physical Symbol System Hypothesis (Newell & Simon, 1976)

The lecture's "Symbolic AI" pole is grounded in a specific thesis the textbook treats as foundational. Allen Newell & Herbert Simon (Turing Award lecture, 1976):

"A physical symbol system has the necessary and sufficient means for general intelligent action." (Newell & Simon, 1976)

Two claims packed in: (i) necessity: anything intelligent must be a physical symbol system; (ii) sufficiency: building one is enough to produce intelligence.

Four defining features of a physical symbol system

  1. Symbols are physical patterns (inscriptions on a tape, voltage states, neural firings).
  2. Symbols can be combined into complex structures via recursive rules (like sentences in propositional logic).
  3. The system contains processes that transform symbol structures in rule-governed ways; this is thinking.
  4. Those transformation processes can themselves be represented as symbols within the system (meta-representation).

Cognition, on this view, is heuristic search through a problem space. Newell & Simon's General Problem Solver (GPS) applied means–end analysis: compute the difference between the current and goal state, pick an operator that reduces it, apply, repeat. The PSSH defines what connectionism rejects.
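Means–end analysis can be sketched as below. This is a simplified toy, not GPS itself: the STRIPS-style operators (precondition, add and delete sets), the facts and the operator names are all hypothetical. The "difference" is the set of goal facts missing from the current state; when an operator that removes a difference has unmet preconditions, those preconditions become subgoals.

```python
from typing import FrozenSet, List, NamedTuple, Optional, Tuple

class Operator(NamedTuple):
    name: str
    pre: FrozenSet[str]      # facts that must hold before applying
    adds: FrozenSet[str]     # facts the operator makes true
    deletes: FrozenSet[str]  # facts the operator makes false

Plan = Tuple[FrozenSet[str], List[str]]

def achieve(state: FrozenSet[str], goals: FrozenSet[str], ops: List[Operator],
            depth: int = 0, max_depth: int = 10) -> Optional[Plan]:
    """Reduce each difference (a goal fact missing from the state) in turn."""
    plan: List[str] = []
    for goal in sorted(goals):               # fixed order keeps the sketch deterministic
        if goal in state:
            continue                         # no difference for this goal
        result = reduce_difference(state, goal, ops, depth, max_depth)
        if result is None:
            return None
        state, steps = result
        plan.extend(steps)
    return (state, plan) if goals <= state else None

def reduce_difference(state, goal, ops, depth, max_depth) -> Optional[Plan]:
    """Pick an operator whose effects remove the difference; subgoal on its preconditions."""
    if depth >= max_depth:
        return None
    for op in ops:
        if goal not in op.adds:
            continue
        sub = achieve(state, op.pre, ops, depth + 1, max_depth)
        if sub is None:
            continue
        sub_state, sub_plan = sub
        return (sub_state - op.deletes) | op.adds, sub_plan + [op.name]
    return None

OPS = [
    Operator("pick up keys", frozenset({"at home"}), frozenset({"has keys"}), frozenset()),
    Operator("repair car", frozenset({"at home"}), frozenset({"car works"}), frozenset()),
    Operator("drive to office", frozenset({"has keys", "car works"}),
             frozenset({"at office"}), frozenset({"at home"})),
]

final_state, plan = achieve(frozenset({"at home"}), frozenset({"at office"}), OPS)
print(plan)   # ['repair car', 'pick up keys', 'drive to office']
```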

5.2 From biology to schematic neurons

Figure: schematic neuron (inputs I₁, I₂, I₃; weights W₁, W₂, W₃; summation Σ; activation f(·): step / ReLU / sigmoid; threshold T; output).
A unit computes Σ(weight × input), passes the sum through an activation function (Heaviside / ReLU / sigmoid), and emits an output. Knowledge lives in the weights.
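A single schematic unit takes only a few lines. The input values, weights and the threshold of 1.0 below are hypothetical; the three activation functions are the ones named in the figure.

```python
import math

def step(x: float, threshold: float = 0.0) -> float:
    """Heaviside step: output 1 only if the summed input exceeds the threshold T."""
    return 1.0 if x > threshold else 0.0

def relu(x: float) -> float:
    """Rectified linear unit: pass positive sums through, clip negatives to 0."""
    return max(0.0, x)

def sigmoid(x: float) -> float:
    """Logistic squashing function with outputs in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def unit_output(inputs, weights, activation):
    """Sum of weight x input, passed through the activation function."""
    net = sum(w * i for w, i in zip(weights, inputs))
    return activation(net)

inputs = [1.0, 0.0, 1.0]     # I1, I2, I3 (hypothetical)
weights = [0.5, -0.3, 0.8]   # W1, W2, W3 (hypothetical)

results = {
    "step (T = 1.0)": unit_output(inputs, weights, lambda net: step(net, threshold=1.0)),
    "ReLU": unit_output(inputs, weights, relu),
    "sigmoid": unit_output(inputs, weights, sigmoid),
}
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

The same weighted sum (1.3 here) is shared by all three; only the activation function changes what the unit emits.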

5.3 Learning: the delta rule (single-layer)

error ε = desired output − actual output
ΔT = −ε    ΔWᵢ = ε · Iᵢ (scaled by a learning rate)

The perceptron convergence rule: training will find a solution in every case where a solution is possible. But which functions ARE possible?
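A minimal sketch of the delta rule for a single threshold unit, assuming zero initial weights, a learning rate of 0.5 (chosen so the arithmetic stays exact in floating point) and the same rate applied to the threshold. On linearly separable functions such as AND and OR, training stops once a full epoch passes with no errors.

```python
def predict(weights, threshold, inputs):
    """Threshold unit: output 1 if the weighted input sum exceeds T, else 0."""
    return 1 if sum(w * i for w, i in zip(weights, inputs)) > threshold else 0

def train_perceptron(examples, learning_rate=0.5, max_epochs=100):
    """Delta rule with eps = desired - actual, dW_i = lr * eps * I_i and dT = -lr * eps.

    Returns (weights, threshold, epochs used), or None if an epoch with zero
    errors never occurs within max_epochs."""
    weights = [0.0] * len(examples[0][0])
    threshold = 0.0
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for inputs, desired in examples:
            eps = desired - predict(weights, threshold, inputs)
            if eps != 0:
                errors += 1
                weights = [w + learning_rate * eps * i for w, i in zip(weights, inputs)]
                threshold -= learning_rate * eps
        if errors == 0:
            return weights, threshold, epoch
    return None

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
OR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
trained = {name: train_perceptron(table) for name, table in (("AND", AND), ("OR", OR))}
for name, result in trained.items():
    print(name, result)
```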

5.4 The XOR problem

The single-layer perceptron cannot learn XOR: it oscillates and never converges. Why?

I₁ | I₂ | XOR | Constraint on the unit
0 | 0 | 0 | 0 ≤ T, so T ≥ 0
1 | 0 | 1 | (1·W₁) > T
0 | 1 | 1 | (1·W₂) > T
1 | 1 | 0 | but W₁ > T and W₂ > T with T ≥ 0 give (W₁ + W₂) > 2T ≥ T → output would also be 1. Impossible.

Perceptrons only learn linearly separable functions. XOR is not linearly separable: you cannot draw a single straight line that separates the (1,0) and (0,1) cases from (0,0) and (1,1).
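The geometric point can be checked by brute force. The sketch searches a hypothetical grid of (W₁, W₂, T) values for a single threshold unit that reproduces a truth table: it finds one for AND and none for XOR. A grid search only covers the grid, of course; the algebraic argument in the table is what rules XOR out for all real-valued weights.

```python
import itertools

def find_threshold_unit(truth_table, grid):
    """Return the first (W1, W2, T) on the grid whose unit computes the truth table, else None."""
    for w1, w2, t in itertools.product(grid, repeat=3):
        outputs_match = all(
            (1 if w1 * i1 + w2 * i2 > t else 0) == target
            for (i1, i2), target in truth_table.items()
        )
        if outputs_match:
            return w1, w2, t
    return None

GRID = [k / 2 for k in range(-6, 7)]     # -3.0, -2.5, ..., 3.0 (hypothetical search range)
AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

and_unit = find_threshold_unit(AND, GRID)
xor_unit = find_threshold_unit(XOR, GRID)
print("AND:", and_unit)   # some (W1, W2, T) that works
print("XOR:", xor_unit)   # None
```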

5.5 The escape route: multi-layer networks & backpropagation

  • Universal approximation theorem: a multi-layer network (given enough hidden units) can compute any Turing-computable function.
  • But the perceptron rule no longer works: hidden units have no target activation.
  • Backpropagation calculates each hidden unit's "share of responsibility" for the output error and uses it to update weights.
  • Gradient descent: follow the negative gradient of the error surface; stop when the gradient is zero. Risk: a local minimum ≠ the global minimum.
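A minimal backpropagation sketch for XOR, written from scratch so every step is visible. Assumptions (not from the lecture): 2 inputs, 3 sigmoid hidden units, 1 sigmoid output, squared-error loss, per-example gradient descent with learning rate 0.5, uniform random initial weights, and random restarts as a crude guard against the local-minimum risk.

```python
import math
import random

XOR = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(params, x):
    """Return hidden activations and the output for input x."""
    w_h, b_h, w_o, b_o = params
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + b) for w, b in zip(w_h, b_h)]
    y = sigmoid(sum(w * hj for w, hj in zip(w_o, h)) + b_o)
    return h, y

def train(seed, n_hidden=3, lr=0.5, epochs=10000):
    rng = random.Random(seed)
    w_h = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n_hidden)]
    b_h = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    w_o = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b_o = rng.uniform(-1, 1)
    for _ in range(epochs):
        for x, target in XOR:
            h, y = forward((w_h, b_h, w_o, b_o), x)
            # Output error signal for E = 0.5 * (y - target)^2; sigmoid derivative is y * (1 - y).
            delta_o = (y - target) * y * (1 - y)
            # Each hidden unit's share of responsibility: the output error sent back
            # through its outgoing weight, times its own sigmoid derivative.
            delta_h = [delta_o * w_o[j] * h[j] * (1 - h[j]) for j in range(n_hidden)]
            # Gradient descent: move each parameter against its error gradient.
            for j in range(n_hidden):
                w_o[j] -= lr * delta_o * h[j]
                w_h[j][0] -= lr * delta_h[j] * x[0]
                w_h[j][1] -= lr * delta_h[j] * x[1]
                b_h[j] -= lr * delta_h[j]
            b_o -= lr * delta_o
    return w_h, b_h, w_o, b_o

def solves_xor(params):
    return all(round(forward(params, x)[1]) == target for x, target in XOR)

solved = False
for seed in range(20):          # restart from fresh random weights if training gets stuck
    params = train(seed)
    if solves_xor(params):
        solved = True
        break
print("solved" if solved else "stuck", "with seed", seed)
```

The hidden layer is what makes this possible: each hidden unit can carve out one linear boundary, and the output unit combines them.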

5.6 Biological plausibility: the critiques

  • Schematic neurons ≠ real ones; questions of parallelism and scale.
  • No evidence backpropagation occurs in the brain.
  • How would the brain set the number of hidden units?
  • No evidence individual neurons receive error signals from all downstream neurons.
  • Most biological learning is not supervised.

5.7 Cognitive implications β€” distributed vs local representations

  • Knowledge lies in a pattern of weights, not in any one unit.
  • A trained network does not need a separate unit per feature.
  • Processing = input vector × weight vector. No discrete beliefs, no explicit rules.
  • Algorithmic in a limited sense: the learning rule and activation function are algorithms, but they're not task-specific and they don't operate over explicit representations.

Conclusion: "The nature of representations and computation in neural networks is fundamentally different compared to physical symbol systems."

Self-test · Lecture 5

1. Which Boolean function CANNOT be learned by a single-layer perceptron?
  A. AND
  B. OR
  C. NAND
  D. XOR
Answer: D. XOR is not linearly separable. AND, OR and NAND all are.
2. The universal approximation theorem says that multi-layer networks can:
  A. Be trained by the perceptron convergence rule
  B. Always find the global minimum
  C. Compute any Turing-computable function (given enough hidden units)
  D. Encode any function in a single unit
Answer: C. Multi-layer networks are universal function approximators. But the perceptron rule fails for them; you need backpropagation.
3. In the delta rule, the update for a weight is:
  A. ΔWᵢ = ε · Iᵢ (scaled by learning rate)
  B. ΔWᵢ = −ε
  C. ΔWᵢ = T · Iᵢ
  D. ΔWᵢ = Wᵢ²
Answer: A. Weight changes scale with both the error and the input that drove it. The threshold updates by −ε.
4. The strongest biological-plausibility critique of backpropagation is:
  A. It only works for linearly separable functions
  B. It produces local minima too often
  C. There is no evidence the brain implements it (no mechanism for propagating error signals through every synapse)
  D. It is slower than the perceptron rule
Answer: C. Backprop requires each neuron to receive precise error signals from all downstream neurons, and biology has no known mechanism for this. Moreover, most learning isn't supervised.
5. Distributed (vs. localist) representations in connectionist networks mean:
  A. Each unit represents one feature
  B. Knowledge lies in the pattern of weights across many units
  C. Information is stored explicitly as symbols
  D. Each layer represents a different category
Answer: B. A localist scheme uses one unit per feature; distributed schemes encode features across overlapping populations, which is the source of neural networks' power.
JLB Chapters 6 & 8

Modularity of Mind & Dynamical Systems

Three rival pictures of mental architecture: (1) Fodor's classical modularity (domain-specific input modules plus a non-modular central system); (2) massive modularity (a Cisek-style evolutionary view with no central processor at all); (3) dynamical systems theory (cognition as a process that evolves in time, possibly without representations or computation).

6.1 Agents: three tiers

Reflex agents

IF–THEN production rules. Not a cognitive system: no information processing, just acting on information. Examples: thermostat, zebrafish C-start reflex, somatic reflex.

Goal-based agents

Evaluate consequences of possible actions in light of goals (foraging). No learning.

Learning agents

Detect errors. Experiment with new strategies in light of past failures.

6.2 Classical (Fodorian) modularity

Aristotelian roots: horizontal faculties (perception, attention, memory) are domain-general; vertical faculties are domain-specific (colour, shape, face/voice, grammar, conspecific recognition).

Input modules (Fodor)
  • Domain-specific
  • Mandatory
  • Information-encapsulated
  • Fast
  • Fixed neural architecture
  • Specific breakdown patterns
vs.
Central processing (Fodor)
  • Domain-general
  • Information-un-encapsulated (isotropic)
  • Slow
  • Voluntary control
  • Diffuse neural structures
  • Personal-level propositional attitudes

Evidence cited: lesion studies, Broca's vs Wernicke's aphasia, brain mapping.

6.3 Massive modularity (Cisek 2019)

The radical alternative: there is no domain-general central processor. The mind is hundreds or thousands of genetically specified Darwinian modules selected for specific adaptive problems.

  • "The most important thing about the brain is that it evolved."
  • Domain-general learning mechanisms cannot detect statistically recurrent domain-specific structure.
  • Each module exploits specialized, domain-specific rules.
  • Descriptive vs pragmatic representations: control loops only need action-oriented (pragmatic) representations, not world models.
  • Input–output functionalism ignores the cyclical nature of behaviour.
  • No single decision-making system β€” just domain-specific competition mechanisms.
  • Conceptual maps emerge from learning on top of sensorimotor loops, so there is no symbol-grounding problem.

📘 Textbook adds: The cheater-detection module (Wason & Cosmides/Tooby)

The textbook's flagship case study for a Darwinian module. The Wason selection task: four cards (E, K, 4, 7); rule "If a card has a vowel on one side, then it has an even number on the other". Which cards must you turn over? Correct answer: E and 7 (E by modus ponens, 7 by modus tollens). Most subjects say E and 4, a famous failure of abstract conditional reasoning.

Griggs & Cox (1982) reframed the same logical task as a deontic conditional: "If a person is drinking beer, then that person must be over 19" with cards BEER, COKE, 16, 25. Now subjects answer correctly (BEER, 16) at near-ceiling rates.
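The logic of both versions can be made explicit: a card needs turning only if its visible face could be part of a counterexample, i.e. it shows P (the back might be not-Q) or shows not-Q (the back might be P). In the sketch below, the face-reading helpers are hypothetical and return None when a face says nothing about that half of the rule.

```python
from typing import Callable, List, Optional

Reader = Callable[[str], Optional[bool]]

def cards_to_turn(cards: List[str], p: Reader, q: Reader) -> List[str]:
    """Cards that could falsify 'if P then Q': visible P (modus ponens) or visible not-Q (modus tollens)."""
    return [card for card in cards if p(card) is True or q(card) is False]

# Abstract version: "If a card has a vowel on one side, it has an even number on the other."
def is_vowel(face: str) -> Optional[bool]:
    return face in "AEIOU" if face.isalpha() else None

def is_even(face: str) -> Optional[bool]:
    return int(face) % 2 == 0 if face.isdigit() else None

# Deontic version: "If a person is drinking beer, that person must be over 19."
def drinks_beer(face: str) -> Optional[bool]:
    return face == "BEER" if face.isalpha() else None

def over_19(face: str) -> Optional[bool]:
    return int(face) > 19 if face.isdigit() else None

abstract = cards_to_turn(["E", "K", "4", "7"], is_vowel, is_even)
deontic = cards_to_turn(["BEER", "COKE", "16", "25"], drinks_beer, over_19)
print(abstract)   # ['E', '7']
print(deontic)    # ['BEER', '16']
```

Formally the two tasks are identical, so the same function returns the P and not-Q cards in both; the gap in human performance is what needs explaining.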

Cosmides & Tooby argued the improvement reveals a domain-specific, evolved cheater-detection module for social-exchange reasoning. The argument links to the evolution of cooperation via the TIT FOR TAT strategy in indefinitely-iterated prisoner's dilemmas: applying TIT FOR TAT requires identifying defectors, so natural selection would favour a module specialised for spotting them.

Two general arguments for massive modularity | Cosmides & Tooby's claim
Argument from error | Fitness criteria are domain-specific (treating kin, finding mates and detecting cheaters all differ) → no domain-general cognitive mechanism could have evolved.
Argument from statistics & learning | Domain-general learning mechanisms cannot detect statistically recurrent domain-specific patterns (e.g., Hamilton's kin-selection equation).

6.4 Dynamical systems theory (van Gelder 1995)

"What might cognition be, if not computation?" (Tim van Gelder, 1995)

Cognition as a process that evolves through time, not necessarily involving computation or representations.

Traditional cogsci

Cognition = information processing = manipulating representations. Discrete steps. Symbols, rules.

vs.
Dynamical systems

Cognition = continuous trajectory through state space. Described by difference equations (discrete) or differential equations (continuous).

  • State space = geometric space of all possible system states. Each independently varying quantity = one dimension.
  • Trajectory = path through state space from initial conditions.
  • Two senses of "dynamical system": trivial (anything that evolves in time) vs technical (analyzable with DST tools).

📘 Textbook adds: ACT-R as a hybrid architecture (Anderson, CMU)

Where Soar (Newell, Laird, Rosenbloom) is purely symbolic, ACT-R ("Adaptive Control of Thought–Rational") is the canonical hybrid architecture: symbolic and subsymbolic at once.

Symbolic layer

Chunks in declarative memory (knowledge-that, e.g., "7+6=13"). Production rules in procedural memory (knowledge-how, IF–THEN). All built from physical symbols.

+
Subsymbolic layer

Each production rule and chunk has a numerical activation/utility value. A pattern-matching module performs a Bayesian-style cost–benefit calculation to pick which rule fires next; there is no central executive.

Take-away: modularity and PSSH-style processing can coexist with neural-net-style subsymbolic selection. Cognitive architecture is not all-or-nothing.

6.5 Worked example 1: Ising network model of depression (Cramer et al. 2016)

Traditional latent-variable view: gallstones cause nausea, abdominal pain and heartburn; a single hidden cause produces all symptoms.

The network view of psychopathology: symptoms are nodes (active = 1, inactive = 0) coupled by weights Wᵢⱼ. Activation propagates via a logistic function. Stress = extra input to all nodes.

  • Depression evolves as a self-sustaining network of interacting symptoms.
  • Insight into cognition without representations or computations.
  • Same approach extends to bipolar disorder, generalized anxiety, attitude models.
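The network idea can be sketched with a deterministic mean-field simplification: each node holds an activation level between 0 and 1 instead of the stochastic binary states of the actual model, and the nine nodes, uniform weights and threshold are hypothetical values chosen to make the network bistable. It shows stress pushing the network into a highly active state that persists after the stress is removed.

```python
import math

def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def settle(activation, weights, threshold, stress, steps=200):
    """Repeatedly set a_i = logistic(sum_j W_ij * a_j + stress - threshold) for every node."""
    n = len(activation)
    a = list(activation)
    for _ in range(steps):
        a = [logistic(sum(weights[i][j] * a[j] for j in range(n) if j != i) + stress - threshold)
             for i in range(n)]
    return a

def mean(values):
    return sum(values) / len(values)

N_SYMPTOMS = 9
W = [[0.0 if i == j else 1.0 for j in range(N_SYMPTOMS)] for i in range(N_SYMPTOMS)]  # uniform coupling
THRESHOLD = 4.0

healthy = settle([0.0] * N_SYMPTOMS, W, THRESHOLD, stress=0.0)
stressed = settle(healthy, W, THRESHOLD, stress=3.0)         # stress = extra input to every node
after_stress = settle(stressed, W, THRESHOLD, stress=0.0)    # stress removed again
print(f"before: {mean(healthy):.2f}  under stress: {mean(stressed):.2f}  after: {mean(after_stress):.2f}")
```

In this sketch, weaker coupling (e.g., all weights 0.5) lets the network fall back to the low state once stress is removed; the persistence comes from the strong mutual excitation between symptoms.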

6.6 Worked example 2: Decision Field Theory (Busemeyer et al. 2019)

Choosing among multiple options (e.g., three phones differing in price, OS, battery, speed). Preference state P evolves over time:

$$P_{i,t} = \lambda \cdot P_{i,t-1} + V_{i,t} - \frac{1}{n-1}\sum_{j \neq i} V_{j,t} + \text{noise}$$
(leakage λ × previous preference + own valence − mean competitor valence + noise)

The dynamics of a connectionist accumulator predict preferences, response times, and choice proportions as emergent properties of system evolution, not computations on symbols.
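A bare-bones simulation of this update rule, assuming (hypothetically) two attributes, attention that samples one attribute per time step, Gaussian noise and a fixed stopping threshold. The full model has further machinery, such as lateral inhibition between options, that this sketch omits. Choice proportions and deliberation times come out of repeated runs rather than from an explicit decision rule.

```python
import random

def dft_trial(values, attention, rng, leak=0.95, threshold=5.0, noise_sd=0.5, max_steps=500):
    """One deliberation. values[i][k] = how good option i is on attribute k.

    Returns (chosen option index, number of time steps taken)."""
    n = len(values)
    prefs = [0.0] * n
    for t in range(1, max_steps + 1):
        k = rng.choices(range(len(attention)), weights=attention)[0]   # attended attribute
        valence = [values[i][k] for i in range(n)]
        total = sum(valence)
        prefs = [
            leak * prefs[i]                          # leakage: lambda * P_{i,t-1}
            + valence[i]                             # own valence V_{i,t}
            - (total - valence[i]) / (n - 1)         # minus mean competitor valence
            + rng.gauss(0.0, noise_sd)               # noise
            for i in range(n)
        ]
        best = max(range(n), key=prefs.__getitem__)
        if prefs[best] >= threshold:
            return best, t
    return max(range(n), key=prefs.__getitem__), max_steps

def simulate(values, attention, trials=1000, seed=1):
    rng = random.Random(seed)
    counts = [0] * len(values)
    total_steps = 0
    for _ in range(trials):
        choice, steps = dft_trial(values, attention, rng)
        counts[choice] += 1
        total_steps += steps
    return [c / trials for c in counts], total_steps / trials

# Hypothetical phones scored on (price, battery): A and B trade off; C is worse than A on both.
phones = [[3.0, 1.0], [1.0, 3.0], [2.0, 0.5]]
proportions, mean_steps = simulate(phones, attention=[0.5, 0.5])
print(proportions, mean_steps)
```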

Take-away framing: "The behavior of the system as a whole is of interest β€” less focus on the computations on underlying representations, or even on architecture."

Self-test · Lecture 6

1. Which of these is NOT one of Fodor's listed properties of input modules?
  A. Domain-specific
  B. Mandatory
  C. Information-encapsulated
  D. Isotropic
Answer: D. Isotropic = un-encapsulated and holistic: Fodor's description of central cognition, not of modules. Modules are encapsulated, domain-specific, mandatory, fast, fixed in neural architecture, and have specific breakdown patterns.
2. Massive modularity (Cisek-style) differs from Fodor's view by claiming:
  A. There are no input modules at all
  B. There is no domain-general central processor; the mind is modules all the way down
  C. Modules cannot evolve
  D. Symbol grounding is impossible
Answer: B. Fodor keeps a non-modular central system; massive modularity eliminates it, replacing it with many Darwinian modules, each solving a specific adaptive problem.
3. Van Gelder's banner question, "What might cognition be, if not computation?", points to:
  A. Massive modularity
  B. Dynamical systems theory
  C. Classical AI
  D. The Physical Symbol System hypothesis
Answer: B. DST views cognition as a time-evolving process, possibly without manipulating representations.
4. In the Ising network model of depression, "stress" is modelled as:
  A. A reduction in weight strengths
  B. Extra activation input to all symptom nodes
  C. A change in the logistic activation function
  D. An additional hidden node
Answer: B. Stress = extra activity injected into all nodes. This can push the network into a self-sustaining depressed state.
5. In a dynamical-systems framework, the path of a system through all its possible states over time is called:
  A. A representation
  B. A trajectory through state space
  C. An algorithm
  D. A symbol grounding
Answer: B. State space = geometric space of all possible states; trajectory = the path from initial conditions.
JLB Chapter 7

Bayesianism in Cognitive Science

Three ideas: (1) belief comes in degrees, (2) those degrees obey probability calculus, (3) learning = updating probabilities via Bayes' rule. The lecture's punchline: Bayesianism is the normative ideal, but humans systematically fail to reason like Bayesians.

7.1 The probability calculus rules

  • Probabilities ∈ [0, 1].
  • Impossible sentences = 0; necessary truths (2+2=4) = 1.
  • If P and Q are logically equivalent, p(P) = p(Q).
  • Negation: p(¬S) = 1 − p(S).
  • Disjunction (mutually exclusive): p(R ∨ S) = p(R) + p(S).
  • Conjunction (independent): p(R ∧ S) = p(R) × p(S).
  • Conditional: p(A | B) = p(A ∧ B) / p(B).

7.2 Bayes' rule

p(H | E) = p(E | H) · p(H) / p(E)    (posterior = likelihood × prior / evidence)

The denominator is computed by marginalization: p(E) = p(E | H) · p(H) + p(E | ¬H) · p(¬H)
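The rule and its marginalization step translate directly into code. This is a plain illustration with hypothetical function and parameter names; the usage line plugs in the COVID numbers from 7.5, with the 95% sensitivity assumed in the Lecture 7 self-test.

```python
def posterior(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """p(H | E) by Bayes' rule, with p(E) obtained by marginalizing over H and not-H."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1.0 - prior)   # marginal evidence p(E)
    return p_e_given_h * prior / p_e

# H = "has the disease", E = "tests positive"
p_sick = posterior(prior=0.001, p_e_given_h=0.95, p_e_given_not_h=0.05)
print(f"p(disease | positive) = {p_sick:.3f}")   # about 0.019, i.e. roughly 2%
```

The tiny prior dominates: even a fairly accurate test leaves the posterior near 2%, because false positives from the large healthy group swamp the true positives.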

7.3 Why the laws are objectively correct: Dutch books

A Dutch book is a set of bets that (1) the subject considers fair given their personal probabilities, but that (2) guarantee they lose money no matter what. Anyone whose beliefs violate probability calculus can be Dutch-booked. Therefore: rational degrees of belief must obey probability calculus.

Sam example: Sam believes "2+2=4" with probability 90%. Offer: he receives $0.90 now and pays $1 if the bucket (2 marbles plus 2 more) holds 4 marbles. By his own probabilities the bet is fair (expected value $0.90 − 0.9 × $1 = 0). But 2+2 IS 4, so he always loses $0.10.
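The Dutch-book argument can be checked numerically for the simplest incoherence: credences in A and not-A that do not sum to 1. This sketch assumes an agent who will buy or sell a ticket paying $1 if X at price p(X); the function and argument names are hypothetical. Sam fits the pattern if his credence in the impossible "2+2≠4" is 0, so his two credences sum to 0.9.

```python
import math

def net_payoffs(p_a, p_not_a, stake=1.0):
    """Agent's net gain in each world when it trades $stake tickets on A and on not-A
    at the prices it considers fair, p(A)*stake and p(not-A)*stake."""
    price_of_both = (p_a + p_not_a) * stake
    agent_buys = p_a + p_not_a > 1          # bookie sells if credences sum above 1, buys if below
    gains = {}
    for a_is_true in (True, False):
        # Exactly one of the two tickets pays, whichever world obtains.
        payout = (stake if a_is_true else 0.0) + (0.0 if a_is_true else stake)
        buyer_gain = payout - price_of_both
        gains["A true" if a_is_true else "A false"] = buyer_gain if agent_buys else -buyer_gain
    return gains

sam = net_payoffs(p_a=0.9, p_not_a=0.0)            # A = "2 + 2 = 4"
overconfident = net_payoffs(p_a=0.6, p_not_a=0.6)  # hypothetical second agent
print(sam)
print(overconfident)
```

The loss is the same in every world, which is exactly what makes it a Dutch book rather than a bad bet.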

7.4 Worked example: ESP / clairvoyance

"Extraordinary claims require extraordinary evidence." (Carl Sagan, 1978)

Clairvoyant correctly predicts 100 coin tosses. Should you believe in ESP?

  • P(predict | ESP) = 0.9
  • P(ESP) = 10⁻¹² (very skeptical prior)
  • P(predict | ¬ESP) = 2⁻¹⁰⁰ ≈ 7.9 × 10⁻³¹

Posterior P(ESP | predict) ≈ 1 − 10⁻¹⁸. Almost certain, until you add the "trick" hypothesis with P(trick) = 10⁻⁶ (taking a trickster to produce the perfect record with P(predict | trick) ≈ 1):

P(¬ESP & trick | predict) / P(ESP & ¬trick | predict) ≈ 10⁶

It's a million times more likely you were tricked than that ESP is real. The moral: tiny priors over fraud can outweigh enormous likelihoods.
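The ESP arithmetic can be reproduced in a few lines. One assumption is not stated in the example: a trickster produces the perfect record with probability 1; ESP and trickery are also treated as independent. Working with the small probability P(¬ESP | data) directly avoids floating-point trouble, since 1 − P(ESP | data) rounds to 0.

```python
P_ESP = 1e-12                        # skeptical prior on ESP
P_TRICK = 1e-6                       # prior on being tricked
P_DATA_GIVEN_ESP = 0.9
P_DATA_GIVEN_CHANCE = 2.0 ** -100    # guessing 100 fair tosses correctly
P_DATA_GIVEN_TRICK = 1.0             # assumption: a trickster always produces the perfect record

# Step 1: ESP vs. plain luck (no trick hypothesis yet).
esp = P_DATA_GIVEN_ESP * P_ESP
chance = P_DATA_GIVEN_CHANCE * (1 - P_ESP)
p_not_esp = chance / (esp + chance)  # P(not ESP | data); 1 - P(ESP | data) would round to 0.0
print(f"P(not ESP | data) = {p_not_esp:.1e}")    # 8.8e-19

# Step 2: add the trick hypothesis. Posterior odds = ratio of the joint probabilities,
# because both share the same denominator p(data).
trick_no_esp = P_DATA_GIVEN_TRICK * (1 - P_ESP) * P_TRICK
esp_no_trick = P_DATA_GIVEN_ESP * P_ESP * (1 - P_TRICK)
odds_trick = trick_no_esp / esp_no_trick
print(f"trick : ESP odds = {odds_trick:.1e}")    # 1.1e+06
```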

7.5 Worked example: COVID-19 base-rate problem

Prevalence 1/1000. False positive rate 5%. You test positive. P(disease | positive) = ?

Reason over 1000 people:

  • 1 person actually has it (and, assuming the test catches it, tests positive).
  • Of 999 healthy people, 5% test positive falsely ≈ 50.
  • 51 positives total; only 1 is actually sick → ~2%, not 95%.

Why we still trust tests: in real life the sample is not random β€” there's a reason you were tested, so the relevant base rate is much higher.

7.6 The transposed-conditional fallacy

p(A | B) ≠ p(B | A).

  • A = "is a white American man," B = "is a US senator"
  • p(A | B) ≈ 0.9 (most senators are white American men)
  • p(B | A) ≈ 0.000001 (roughly 90 senators among on the order of 100 million white American men)

📘 Textbook adds: Perception as Bayesian inference (Helmholtz → Hohwy)

The textbook frames Bayesianism in cognition as the modern formalisation of Hermann von Helmholtz's 19th-century proposal that perception is unconscious inference. The proximal sensory input radically underdetermines the distal world, so the brain must infer what is out there using stored knowledge about how the world tends to be.

  • Hypothesis (H) = candidate layout of the distal environment.
  • Evidence (E) = retinal stimulation.
  • Likelihood p(E | H) = how probable this image is given that layout.
  • Prior p(H) = how probable that layout is in general.
  • Gestalt principles (continuity, proximity, good form, common fate) function as Bayesian priors over scene structure.

Case study: Binocular rivalry (Hohwy, Roepstorff & Friston, 2008)

Present a red iron to the left eye and a green violin to the right eye. Perception alternates between the two and never settles on a stable composite. Why?

  • H1 = red iron, H2 = green violin, H3 = composite "red-green iron-violin".
  • Likelihoods of the conflicting retinal input are roughly equal across H1–H3.
  • But p(H1) ≈ p(H2) ≫ p(H3): the prior on composite objects is tiny.
  • Posteriors: H1 and H2 tied, H3 ruled out. The visual system flips between the two equally-supported hypotheses rather than averaging them.

Binocular rivalry is a rational Bayesian response, not a glitch: a key example for predictive coding theories of perception.
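The rivalry argument is a three-hypothesis posterior. The numbers below are hypothetical, chosen only to match the qualitative pattern (equal likelihoods, p(H1) ≈ p(H2) ≫ p(H3)). The sketch computes the posterior; explaining why perception then alternates over time needs further assumptions (such as the predictive-coding account) that it does not model.

```python
def posterior(likelihood: dict, prior: dict) -> dict:
    """Normalize likelihood x prior over a discrete set of hypotheses."""
    unnormalized = {h: likelihood[h] * prior[h] for h in prior}
    evidence = sum(unnormalized.values())            # p(E), by marginalization
    return {h: v / evidence for h, v in unnormalized.items()}

likelihood = {"red iron": 0.5, "green violin": 0.5, "red-green iron-violin": 0.5}
prior = {"red iron": 0.01, "green violin": 0.01, "red-green iron-violin": 1e-9}

post = posterior(likelihood, prior)
for h, p in post.items():
    print(f"{h}: {p:.2g}")
```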

7.7 Bayesian search theory (MH370)

For each grid cell i: pᵢ (probability the object is there), aᵢ (probability of finding it if it is there), cᵢ (cost of searching the cell).

Optimal policy: search argmaxᵢ (pᵢ · aᵢ / cᵢ)

After each miss, redistribute the posterior over remaining cells and recompute.
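The search policy and the update after a miss can be sketched as follows. The three-cell grid and its probabilities, detection rates and costs are hypothetical. After an unsuccessful search of cell s, Bayes' rule scales that cell's probability by (1 − aₛ) and renormalizes everything by p(miss) = 1 − pₛ · aₛ.

```python
def next_cell(p, a, c):
    """Index of the cell with the highest p_i * a_i / c_i."""
    return max(range(len(p)), key=lambda i: p[i] * a[i] / c[i])

def update_after_miss(p, a, searched):
    """Posterior location probabilities after searching one cell without finding the object."""
    p_miss = 1.0 - p[searched] * a[searched]           # chance the search comes up empty
    return [
        (prob * (1.0 - a[i]) if i == searched else prob) / p_miss
        for i, prob in enumerate(p)
    ]

# Hypothetical three-cell grid.
p = [0.5, 0.3, 0.2]   # prior probability the object is in each cell
a = [0.6, 0.9, 0.9]   # probability of detecting it there, if it is there
c = [1.0, 1.0, 2.0]   # cost of searching each cell

order = []
for _ in range(4):                 # suppose the first four searches all fail
    cell = next_cell(p, a, c)
    order.append(cell)
    p = update_after_miss(p, a, cell)
print(order)                        # [0, 1, 0, 2]
```

Cell 2 is searched last despite its high detection rate because it costs twice as much; only after repeated misses elsewhere does its probability grow enough to justify the cost.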

7.8 Where humans fail β€” heuristics & biases

Availability heuristic

Judge probability by ease of recall. Therapist who just saw three depressed patients overestimates depression in the next.

Gambler's fallacy (predictable-world bias)

After 4 tails, you "feel" heads is due. But independent tosses have no memory.

Probability matching

Die with 4 red / 2 green sides. Maximizing (always predict red) → 67% correct. Matching (predict red 2/3 of the time, green 1/3) → 56%. Humans match; mice maximize. In stochastic processes, maximizing beats matching.
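The 67% and 56% figures follow from a one-line expected-accuracy formula. This sketch assumes the guess is made independently of the roll; `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

P_RED = Fraction(4, 6)        # 4 of the 6 faces are red
P_GREEN = 1 - P_RED

def expected_accuracy(q_red):
    """Hit rate when you guess 'red' with probability q_red, independently of the roll."""
    return q_red * P_RED + (1 - q_red) * P_GREEN

maximizing = expected_accuracy(Fraction(1))   # always guess red
matching = expected_accuracy(P_RED)           # guess red 2/3 of the time
print(maximizing, float(maximizing))          # 2/3, about 0.667
print(matching, float(matching))              # 5/9, about 0.556
```

Because expected accuracy is linear in how often you guess red, and red is the more likely outcome, it is highest when you always guess red; any mixing lowers it.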

Base-rate neglect

The COVID example above. Also: on the New York subway you see a woman reading the NYT. Is it a better bet that she has a PhD or that she has no college degree? Far more non-graduates ride the subway, so no degree is the better bet.

7.9 The Linda problem β€” conjunction fallacy (Tversky & Kahneman)

Linda is 31, single, outspoken, very bright. Majored in philosophy; concerned with discrimination/social justice; antinuclear protests. Rank the probability:

  • F. Linda is a bank teller
  • H. Linda is a bank teller and active in the feminist movement

Most people rank H > F. But "feminist bank tellers" are a strict subset of "bank tellers." Specifying more detail can only LOWER probability, never raise it.

7.10 Bayesian view of psychopathology

Schizophrenic delusions can involve affirming the consequent: "Jesus had stigmata; I have stigmata; therefore I am Jesus." Rokeach (1964), "The Three Christs of Ypsilanti": three paranoid schizophrenic men, each believing he was Jesus, were housed together for two years; their beliefs barely shifted, showing the difficulty of revising delusional priors.

Self-test · Lecture 7

1. In Bayes' rule p(H|E) = p(E|H)Β·p(H)/p(E), the term p(H) is the:
  A. Posterior
  B. Likelihood
  C. Prior
  D. Marginal evidence
Answer: C. p(H) = prior; p(E|H) = likelihood; p(H|E) = posterior; p(E) = marginal evidence.
2. A disease has prevalence 1/1000. A test has a 5% false positive rate. If you test positive (with 95% true-positive sensitivity), the chance you actually have the disease is closest to:
  A. 95%
  B. 50%
  C. 2%
  D. 0.1%
Answer: C. Roughly 2%. Reason over 1000 people: 1 true positive vs ~50 false positives.
3. Why do most people judge "Linda is a bank teller AND a feminist" to be more probable than "Linda is a bank teller"?
  A. It actually IS more probable
  B. The conjunction fallacy: people judge by representativeness, not probability
  C. Feminist bank tellers form a superset
  D. It is a Bayesian-correct judgment
Answer: B. The set "feminist bank tellers" is a strict subset of "bank tellers", so by definition it cannot be more probable. The intuition is driven by representativeness.
4. Every Dutch book exploits a particular feature of the victim's degrees of belief. Which feature?
  A. Their willingness to gamble
  B. Violations of probability calculus in their personal probabilities
  C. Their use of the availability heuristic
  D. Their priors being too small
Answer: B. If your degrees of belief don't obey probability calculus, a Dutch book can be constructed against you; that's the philosophical justification for Bayesianism.
5. In a stochastic environment with 4 red / 2 green outcomes, the optimal strategy is:
  A. Probability matching (predict red 2/3, green 1/3) → 56%
  B. Maximizing (always predict red) → 67%
  C. Always predict green
  D. Predict whichever was last seen
Answer: B. Maximizing yields 67% correct; matching only 56%. Humans tend to match (suboptimal), gambling addicts especially.
6. Bayesian search theory says you should search the cell that maximizes:
  A. pᵢ · aᵢ
  B. pᵢ · aᵢ / cᵢ
  C. pᵢ / aᵢ
  D. cᵢ · aᵢ
Answer: B. Maximize the product of "probability it is there" × "probability you'd find it if it is there", divided by the cost of searching.
JLB Chapter 10

Language Learning

Three paradigms applied to language: symbolic (Fodor's Language of Thought, Chomsky's innatism), connectionist (neural nets that reproduce children's overgeneralization), and Bayesian (statistical learning of word boundaries and anaphora). The lecture closes with the LLM revolution and what it implies about innateness.

8.1 What is language understanding?

  • Semantics: meaning of words.
  • Syntax: structure; surface vs deep structure.
  • "Colorless green ideas sleep furiously" → syntactically well-formed, semantically anomalous.
  • "John has hit the ball" / "The ball has been hit by John" → two surface structures, one deep structure.

Strong vs weak mastery: are linguistic rules explicitly represented in the head (strong sense, Fodor/Chomsky) or merely obeyed in behaviour (weak sense, connectionist/Bayesian)?

Key permission slip: "Rule-governed phenomena need not come from rule-governed information-processing structures." This opens the door to connectionism and Bayesianism for language.

8.2 Fodor's Language of Thought (Mentalese)

Learning a language requires being able to evaluate truth conditions: "'The cat is on the mat' iff there's a cat and there's a mat and the cat is on the mat." Circularity problem: you can't learn this in English without already knowing what "cat" and "mat" are.

Solution: an innate symbolic medium, Mentalese. Slogan: "You cannot use the language you're learning to learn."

8.3 Nicaraguan Sign Language

In the 1970s, a school for deaf children in Nicaragua tried to teach Spanish by finger-spelling. Instead, the children spontaneously generated their own sign language. Documented by linguist Judy Kegl. Later generations added structural features such as spatial modulation, evidence for innate language abilities.

8.4 Three paradigms for past-tense learning

The English past tense is a microcosm. Children show two features: (1) follow the "-ed" rule ("walked"); (2) handle exceptions ("gave"). Crucially, they make overgeneralization errors ("goed") that come and go in a gradual learning curve.

Dual-route (symbolic)

Two separate systems: (1) associative memory for irregulars; (2) an explicit "-ed" rule for regulars.

vs.
Plunkett & Marchman (1993)

One connectionist network: 20 input + 30 hidden + 20 output units; phonological input → phonological output. Reproduces overgeneralization and gradual learning without explicit rules.

📘 Textbook adds: Rumelhart & McClelland (1986), the original past-tense network

Before Plunkett & Marchman there was the Rumelhart & McClelland (1986) PDP model, the founding connectionist past-tense network, published in the two-volume Parallel Distributed Processing.

Feature | R&M (1986) | Plunkett & Marchman (1993)
Architecture | Simple pattern associator, no hidden units | 20–30–20 with a hidden layer
Input encoding | Wickelfeatures (after Wickelgren): context-sensitive phoneme codes | Raw phonological input
Learning rule | Perceptron convergence | Backpropagation
Training regime | 10 high-frequency verbs, then suddenly expanded to 410 medium-frequency verbs (80% regular) | 20 verbs (half regular, half irregular), gradually expanded
Reproduced overgeneralization? | Yes, but Pinker & Prince argued it was baked in by the sudden vocabulary jump | Yes, without the question-begging schedule

The lineage matters because Pinker & Prince's critique of R&M is what motivated the dual-route symbolic model. The later Plunkett & Marchman result rebuts that critique on the connectionist side: overregularization emerges from co-presence of regulars and irregulars, not from training-set manipulation.

8.5 Bayesian language learning

(a) Word segmentation via transitional probabilities

For the sound string /k/ /ae/ /t/ /m/ /i/ /aʊ/ /z/ ("cat meows"):

  • p(/ae/ | /k/): high (within-word transition)
  • p(/t/ | /ae/): high
  • p(/m/ | /t/): low; this dip signals a word boundary

Same logic scales up to word-level transitional probabilities for sentence boundaries.
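A toy version of segmentation by transitional probabilities. It works on syllables rather than the phonemes of the "cat meows" example, and the three "words" and their ordering are hypothetical, chosen so that within-word transitions have probability 1 and across-word transitions about 0.5. The 0.75 boundary threshold is also an assumption.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """p(next | current) = count(current, next) / count(current as a predecessor)."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

def segment(syllables, tps, boundary_below=0.75):
    """Insert a word boundary wherever the transitional probability dips."""
    words, current = [], [syllables[0]]
    for a, b in zip(syllables, syllables[1:]):
        if tps[(a, b)] < boundary_below:
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words

# Hypothetical three-syllable "words", concatenated with no pauses.
LEXICON = [["bi", "da", "ku"], ["pa", "do", "ti"], ["go", "la", "bu"]]
ORDER = [0, 1, 2, 0, 2, 1] * 20       # each word is followed by each other word equally often
stream = [syll for w in ORDER for syll in LEXICON[w]]

tps = transitional_probabilities(stream)
words = segment(stream, tps)
print(tps[("bi", "da")], tps[("ku", "pa")])   # 1.0 within a word, 0.5 across a boundary
print(sorted(set(words)))                     # ['bidaku', 'golabu', 'padoti']
```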

(b) Pronominal anaphora (Lidz et al.)

"I'll play with this red ball, and you can play with that one." Does "one" refer to H1 = a ball or H2 = a red ball?

  • P(H | S) ∝ P(S | H) · P(H)
  • Children learn P(S | H) from experience.
  • Since P(S | H2) > P(S | H1), the most likely intended referent is "the red ball."

Reference: "What children know about syntax but could not have learnt."

8.6 LLMs β€” how they work

Token (≈ morpheme) → Embedding (100s of dimensions, learned) → Attention layers (weighted recombination) → Next-token prediction
  • Autoregression: feed each prediction back as input to predict the next.
  • Transformer paper: "Attention is all you need" (Vaswani et al., 2017).
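  • Attention heads recode each token as a learned weighted combination of itself and the tokens before it. Many such layers are stacked, purely feedforward.
  • The final encoding of the last token IS the prediction of the next token: it is mapped to a probability distribution over the vocabulary. The forward pass is deterministic; randomness is added afterwards, when the next token is sampled from that distribution.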
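A single attention head with causal masking, as a minimal sketch: each position may only attend to itself and earlier positions, and the weights come from the scaled dot-product softmax(q·k/√d) of Vaswani et al. The embeddings and the identity projection matrices are hypothetical stand-ins for learned parameters; real models add multiple heads, residual connections, normalization and feedforward layers.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)                                  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project(matrix, vector):
    """Matrix-vector product: each row of the matrix gives one output coordinate."""
    return [dot(row, vector) for row in matrix]

def causal_self_attention(embeddings, w_q, w_k, w_v):
    """One attention head with causal masking.

    Position t is recoded as a weighted mix of the value vectors at positions 0..t,
    with weights softmax(query_t . key_s / sqrt(d))."""
    queries = [project(w_q, x) for x in embeddings]
    keys = [project(w_k, x) for x in embeddings]
    values = [project(w_v, x) for x in embeddings]
    d = len(keys[0])
    outputs, weights = [], []
    for t, q in enumerate(queries):
        w = softmax([dot(q, keys[s]) / math.sqrt(d) for s in range(t + 1)])   # no peeking ahead
        mixed = [sum(w[s] * values[s][i] for s in range(t + 1)) for i in range(len(values[0]))]
        outputs.append(mixed)
        weights.append(w)
    return outputs, weights

# Hypothetical 2-d embeddings for three tokens; identity projections keep the arithmetic readable.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
outputs, attention = causal_self_attention(tokens, identity, identity, identity)
for t, row in enumerate(attention):
    print(t, [round(a, 3) for a in row])
```

The first token can only attend to itself, so its output is its own value vector; later tokens blend everything to their left.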

Embedding arithmetic β€” words as vectors

biggest − big + small ≈ smallest

Paris − France + Germany ≈ Berlin

Doctor − man + woman ≈ nurse   (bias embedded)
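Embedding arithmetic amounts to vector addition followed by a nearest-neighbour search. The 4-dimensional vectors below are hand-built and hypothetical (real embeddings are learned and have hundreds of dimensions), and excluding the three query words from the candidates is a common convention rather than part of the arithmetic itself.

```python
import math

# Hypothetical hand-built vectors; dimensions = [capital city, France, Germany, Italy]
EMBEDDINGS = {
    "paris":   [1.0, 1.0, 0.0, 0.0],
    "france":  [0.0, 1.0, 0.0, 0.0],
    "berlin":  [1.0, 0.0, 1.0, 0.0],
    "germany": [0.0, 0.0, 1.0, 0.0],
    "rome":    [1.0, 0.0, 0.0, 1.0],
    "italy":   [0.0, 0.0, 0.0, 1.0],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def analogy(a, b, c, vocab=EMBEDDINGS):
    """Return the word whose vector is closest (by cosine) to vec(a) - vec(b) + vec(c)."""
    target = [x - y + z for x, y, z in zip(vocab[a], vocab[b], vocab[c])]
    candidates = [w for w in vocab if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vocab[w], target))

print(analogy("paris", "france", "germany"))   # berlin
print(analogy("rome", "italy", "france"))      # paris
```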

Two-stage training

  1. Pre-training: mask the next word in billions of internet texts; backprop until predictions improve. (Tends to complete rather than reply.)
  2. RLHF (Reinforcement Learning from Human Feedback): humans rate outputs; network updated toward higher-rated predictions. Makes models conversational.

LLM knowledge β‰ˆ long-term semantic memory (fuzzy, can hallucinate); LLM prompts β‰ˆ working memory (relevant info inserted reduces hallucinations).

8.7 Big debates β€” does this refute Chomsky?

Piantadosi (2023)

"LLMs refute Chomsky." A pure text-prediction net acquires grammatical structure with no innate machinery. Proof of principle that syntax can be acquired without innate structure.

vs.
Bender et al. (2021): Stochastic Parrots

"An LM is a system for haphazardly stitching together sequences of linguistic forms... according to probabilistic information about how they combine, but without any reference to meaning: a stochastic parrot."

Other critiques:

  • Grounding problem: LLMs don't know what a banana tastes like.
  • Mitchell & Krakauer (2023): humans learn concepts; they abstract, reason compositionally and counterfactually, intervene on the world, and explain.
  • Guest & Martin (2023): multiple realizability; the same outputs do not imply the same mechanism.
  • Training-data asymmetry: GPT-3 saw ~4 × 10¹¹ words, yet a 5-year-old does causal reasoning on 4–5 orders of magnitude less input. Possible explanations: nativism, multi-modal grounding, active/social learning, or "comparing apples and pears" (the comparison itself may be unfair).
  • Binz & Schulz (2023, PNAS): gave GPT-3 the Wason selection task. It got the canonical version right but failed ~50% of trials, with human-like error patterns.
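An order-of-magnitude check of the data-asymmetry bullet, taking the lecture's figure of ~4 × 10¹¹ training words for GPT-3 at face value:

```latex
\frac{4\times10^{11}}{10^{4}} = 4\times10^{7}\ \text{words},
\qquad
\frac{4\times10^{11}}{10^{5}} = 4\times10^{6}\ \text{words}
```

So "4–5 orders of magnitude less" puts a child's input at roughly 4 million to 40 million words.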

Self-test · Lecture 8

1. Fodor's circularity argument for the Language of Thought says:
  A. Language can only be learned from a teacher
  B. You cannot use the language you are learning to learn it, so the medium of learning must already be in place (Mentalese)
  C. Children must first learn to speak English before they can think
  D. Truth conditions are unlearnable
Answer: B. The argument: evaluating the truth conditions of "the cat is on the mat" already presupposes that you know what cat and mat mean, so there must be an innate, prior, symbolic medium (Mentalese).
2. The Plunkett & Marchman (1993) connectionist past-tense model demonstrated:
  A. Children do not actually overgeneralize
  B. A single network without explicit rules can reproduce both overgeneralization errors and gradual learning curves
  C. Past-tense learning requires the dual-route architecture
  D. Symbolic AI is sufficient for language learning
Answer: B. One network, no explicit rules, yet children's developmental signature appears anyway; this challenges the claim that the symbolic dual-route account is necessary.
3. Transitional probabilities help an infant segment speech because:
  A. They are high within words and low across word boundaries
  B. They are low within words and high across word boundaries
  C. They are uniform across the speech stream
  D. They depend on the listener's prior vocabulary
Answer: A. Within "cat", p(/ae/ | /k/) is high; across the boundary to "meows", p(/m/ | /t/) is low. The dip is the cue.
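The segmentation cue in Question 3 can be computed directly. A minimal sketch, assuming an invented syllable stream made of three made-up trisyllabic words (bi-da-ku, pa-do-ti, go-la-bu) concatenated without pauses; the words, their order and the helper `tp` are hypothetical. The transitional probability p(y | x) = count(xy) / count(x) is 1.0 inside every word and drops at word boundaries.

```python
from collections import Counter

# Hypothetical artificial language: three trisyllabic "words".
WORDS = [["bi", "da", "ku"], ["pa", "do", "ti"], ["go", "la", "bu"]]
# A fixed word order, concatenated with no pauses (hypothetical stream).
order = [0, 1, 2, 0, 2, 1, 0, 1, 2, 1, 0, 2]
stream = [syl for i in order for syl in WORDS[i]]

pair_counts = Counter(zip(stream, stream[1:]))  # counts of adjacent pairs xy
first_counts = Counter(stream[:-1])             # counts of x as a pair's first member


def tp(x, y):
    """Transitional probability p(y | x) = count(x, y) / count(x)."""
    return pair_counts[(x, y)] / first_counts[x]


print(tp("bi", "da"))  # within a word: 1.0
print(tp("ku", "pa"))  # across a word boundary: 0.5
```

A learner that posits a boundary wherever the TP dips (here between "ku" and "pa") recovers the three words without knowing them in advance.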
4. The Stochastic Parrots paper (Bender, Gebru, McMillan-Major & Shmitchell 2021) argues that LLMs:
  A. Have genuine semantic understanding
  B. Refute Chomsky's nativism
  C. Stitch linguistic forms together by probability without reference to meaning
  D. Will eventually become AGI
Answer: C. The paper's defining critique. It also raises ethical issues: energy cost, internet biases, documentation debt.
5. Why does the case of Nicaraguan Sign Language matter for the nativist position?
  A. It shows that, without explicit instruction, deaf children spontaneously generated a structured language, with later generations adding features such as spatial modulation
  B. It shows that sign language is impossible without spoken-language input
  C. It refutes Chomsky
  D. It demonstrates LLM emergence
Answer: A. Documented by Judy Kegl; cited as evidence for innate language capacities.
6. In the "Attention is all you need" transformer architecture, the final-layer encoding of the last input token represents:
  A. A hidden state to be discarded
  B. The prediction of the next token
  C. A summary of the entire vocabulary
  D. An attention weight matrix
Answer: B. The final encoding of the last token IS the next-token prediction. That computation is deterministic; randomness is added post hoc by sampling.
JLB Chapter 15

Consciousness

Consciousness has two distinct dimensions (wakefulness and awareness), splits into easy vs. hard problems (Chalmers), and admits multiple competing theories. The empirical methods of cognitive science can address the easy problems; whether they can address the hard problem is a matter of fierce debate.

9.1 Two dimensions of consciousness (Laureys 2005)

Wakefulness

A state. Gradual. Varies over time. Objectively measurable (EEG, behaviour).

vs.
Awareness

An experience. To be conscious of something. First-person; not externally observable.

State | Sleep/wake cycle? | Reactions? | Awareness?
Coma | No | Only reflexes | None
UWS (vegetative) | Yes | Autonomic, eye-opening, reflex behaviour | None (apparent)
MCS (minimally conscious) | Yes | Reflexes plus some non-reflex behaviour (visual fixation, following commands) | Some
LIS (locked-in) | Yes | Cannot move a muscle (at most minimal eye movement) | Fully awake and conscious

In a study of 54 UWS/MCS patients (Monti et al., 2010), 5 could wilfully modulate their brain activity on command during fMRI, revealing covert awareness despite their clinical diagnosis.

9.2 The knowledge argument: Mary's Room (Jackson 1986)

"Mary is confined to a black-and-white room… she knows all the physical facts about us and our environment… It seems, however, that Mary does not know all that there is to know. For when she is let out of the black-and-white room or given a color television, she will learn what it is to see something red…" (Frank Jackson, 1986)

Formal argument:

  1. Inside the room, Mary has complete knowledge of how the brain processes colour.
  2. So she knows everything about the information-processing of red.
  3. When she leaves the room, she acquires new knowledge β€” what red is like.
  4. Therefore, some aspects of conscious experience cannot be understood in terms of information processing.

9.3 Non-conscious processing: priming & dissociations

Strategy: contrast processing that works without awareness with processing that requires it.

  • Face/Tool priming: categorization is faster when the prime is congruent (face primes face).
  • Word priming: "DOG" is recognized faster after "CAT" than after "CAR". CAR resembles CAT visually but not in meaning, so the priming is semantic, not visual.
  • Non-conscious priming is short-lived: its effect fades quickly as the SOA (stimulus onset asynchrony) grows. Consciousness allows information to be retained over time.
  • Double dissociation: in disorder 1, A is intact and B impaired; in disorder 2, B is intact and A impaired. Strong evidence for separable mechanisms.
Neglect

Lesions to right parietal/frontal cortex. Patients lack awareness of contralesional (typically left) space, yet implicit processing of neglected information can occur.

Blindsight

Lesions to V1. Patients are aware they cannot see ("How can I look at something I haven't seen?") yet, when forced to guess, are correct above chance for movement, orientation, even some shapes.

9.4 What is consciousness for?

Patients in blindsight or neglect never voluntarily act on stimuli in their affected field. The lecture's claim:

Consciousness permits identification of targets and planning of deliberate, voluntary action.

Milner & Goodale: two visual streams revisited

Dorsal: vision for action

Online motor control. Non-conscious. Not fooled by the Ebbinghaus illusion: grip aperture is veridical even when perceived size is illusory.

vs.
Ventral: vision for perception

Conscious perception, deliberate-action planning. Damaged in patient D.F. (visual form agnosia); she can post a card into a slot but cannot report the slot's orientation.

Fang & He (2005, Nature Neuroscience): interocular suppression rendered a stimulus invisible. Result: robust dorsal activity even when the stimulus was invisible, whereas ventral activity tracked conscious perception. The lecture's conclusion: conscious awareness is restricted to the ventral pathway.

9.5 Block's distinction

Phenomenal consciousness (P)

Raw experience. Qualia β€” the felt redness of red, the painfulness of pain. The "what it is like" aspect.

vs.
Access consciousness (A)

Reportable. Direct control of thought, reasoning, speech, action. Information available to the global workspace.

9.6 Chalmers: easy vs hard problems

Easy problems
  • Discrimination, categorization, reaction to environment
  • Integration of information
  • Reportability of mental states
  • Internal-state access
  • Focus of attention
  • Deliberate behavioural control
  • Wakefulness vs sleep
The Hard problem
"There is something it is like to be a conscious organism… the felt quality of redness, the sound of a clarinet, the smell of mothballs… What unites all of these states is that there is something it is like to be in them." (Chalmers, 1995)

Why is there subjective experience at all? Why isn't the information processing happening "in the dark"?

9.7 Three theories of consciousness

Global (Neuronal) Workspace

(Baars · Dehaene & Changeux). Theatre metaphor: attention spotlights content on stage, the audience receives the broadcast, and backstage processes shape what gets in. A hierarchical, distributed architecture broadcasts integrated information brain-wide, which makes it reportable. Long-range pyramidal neurons may form the substrate. Claims to dissolve the hard problem.

Integrated Information Theory (IIT)

(Tononi). Consciousness = integrated information (Φ). A system is conscious to the extent that it has irreducible cause–effect structure. Named in the lecture without a detailed Φ calculation.

Recurrent Processing (RP)

(Lamme). Recurrent loops in sensory cortex generate phenomenal experience. Named only.

📘 Textbook adds: Higher-Order Theories (Rosenthal, Armstrong, Lycan, Carruthers)

The textbook treats HOT as one of the major theories alongside GWT, IIT, and recurrent processing; the lecture only names it. Core claim: a mental state is conscious iff it is the object of a suitable higher-order mental state. The very same first-order state can be conscious at one moment and nonconscious at another, depending on whether something higher is "watching" it.

HOP: Higher-Order Perception (Armstrong, Lycan)

A first-order state becomes conscious when an inner sense (introspection / a quasi-perceptual scanner) targets it. Objection: defining inner sense via "awareness" risks circularity; sensory representations of abstract thoughts (the Pythagorean theorem) seem implausible.

vs.
HOT: Higher-Order Thought (Rosenthal)

A first-order state is conscious iff it is accompanied by a thought about it. Empirical support from Lau & Passingham (2006): visual masking varied subjective awareness while behavioural accuracy stayed constant, and awareness tracked activation in dorsolateral PFC (BA 46).

Standard objection to both: when I consciously smell a rose, I am aware of the rose, not of a mental state representing the rose, so HOT/HOP seem to misdescribe the phenomenology. They also struggle to explain why consciousness has any distinctive functional role if first-order states behave identically with or without an accompanying HOT.

9.8 Dennett's deflationary view: the vitalism analogy

"The so-called hard problem of consciousness will disappear once we have a good enough understanding of the various phenomena lumped together under the label 'access consciousness'." (Daniel Dennett, 1942–2024)

Analogy: in the 19th century, biology and chemistry seemed incapable in principle of explaining what separates living from non-living matter, so there had to be an élan vital ("vital force"). As biology matured, vitalism evaporated. Dennett predicts the same fate awaits the hard problem.

9.9 Cognitive scientists vs Mysterians

  • Cognitive scientists: consciousness is a thriving research programme; the easy problems are tractable, and progress on them will eventually dissolve the hard problem.
  • Mysterians: consciousness is, in principle, beyond cognitive-scientific tools.

Self-test · Lecture 9

1. Who coined the term "hard problem of consciousness"?
  A. Frank Jackson
  B. Ned Block
  C. David Chalmers
  D. Daniel Dennett
Answer: C. Chalmers (1995). Jackson devised Mary's Room, Block drew the P/A distinction, and Dennett is the deflationist.
2. The Mary's Room thought experiment is associated with:
  A. Daniel Dennett, defending illusionism
  B. Frank Jackson (1986), the knowledge argument
  C. Ned Block, P-consciousness
  D. Giulio Tononi, IIT
Answer: B. Jackson's thought experiment argues that physical knowledge is insufficient for knowing what red is like.
3. The distinction between P-consciousness (qualia) and A-consciousness (reportable, control of thought/action) is due to:
  A. Tononi
  B. Chalmers
  C. Ned Block
  D. Baars
Answer: C. Ned Block. Many cognitive scientists collapse the two; Block insists they are distinct.
4. Locked-In Syndrome (LIS) is characterized by:
  A. No sleep/wake cycle, no awareness
  B. Sleep/wake cycle and some minimal non-reflex movements
  C. Inability to move any muscle but with full awareness and consciousness
  D. A coma
Answer: C. LIS = fully conscious and aware, but with no motor output (perhaps minimal eye movements).
5. Fang & He (2005) used interocular suppression and showed that:
  A. Both dorsal and ventral activity require conscious vision
  B. Dorsal-stream activity persists for invisible stimuli; ventral activity does not
  C. Ventral-stream activity persists for invisible stimuli; dorsal does not
  D. Conscious vision recruits only V1
Answer: B. Robust dorsal activity for invisible stimuli; ventral activity only for consciously seen stimuli. Conscious perception tracks the ventral stream.
6. Dennett's argument that the hard problem will dissolve is built on an analogy with:
  A. The phlogiston theory of combustion
  B. Vitalism / élan vital
  C. Cartesian dualism
  D. Phrenology
Answer: B. Once chemistry and biology matured, vitalism had nothing left to explain. Dennett predicts the same fate for the hard problem as access consciousness gets explained.
7. Which theory of consciousness uses the "theatre with a spotlight, an audience, and backstage processes" metaphor?
  A. Integrated Information Theory (Tononi)
  B. Higher-Order Theory (Rosenthal)
  C. Global (Neuronal) Workspace Theory (Baars / Dehaene)
  D. Recurrent Processing Theory (Lamme)
Answer: C. GWT/GNW: attention spotlights content, the audience receives the broadcast, and backstage processes set the context.
Synthesis

Cross-cutting Themes

The lectures keep circling the same fault lines from different angles. Spotting them is half of cognitive science.

Theme 1: Three paradigms (symbolic / connectionist / Bayesian-dynamical)

Paradigm | Starts from | Knowledge lives in | Star applications in the course
Symbolic / Classical | Algorithmic level (Marr); the mind | Explicit rules + symbols | SHRDLU, Fodor's Mentalese, dual-route past tense
Connectionist | Implementational level; the brain | Pattern of weights | Plunkett & Marchman past tense, LLMs
Bayesian / Dynamical | Computational level (Bayesian); behaviour over time (dynamical) | Probabilities or state-space trajectories | Word segmentation, Lidz anaphora, Ising depression, DFT

Theme 2: Localization vs holism / encapsulation vs integration

  • Phrenology (wrong in particulars, right in spirit) → Broca → Brodmann → fMRI subtraction → modular vision streams.
  • Fodor: peripheral modules + central isotropic system. Massive modularity: all modules, no central system.
  • Dorsal/ventral streams recur in lectures 3, 4, and 9: lesion studies, illusions, and consciousness studies all converge on the same anatomical dissociation.

Theme 3: Conscious vs non-conscious processing

  • Behaviourism would deny "the unconscious." But Corteen & Wood, blindsight, neglect, priming, and Owen et al.'s vegetative-state patients all show information processing without (or apart from) awareness.
  • The lecture's unifying answer: consciousness is for deliberate, voluntary action and durable explicit information maintenance.

Theme 4: Normative vs descriptive

  • Bayesianism = how rational agents should reason.
  • Tversky & Kahneman = how humans actually reason (badly, with predictable biases).
  • Same tension in language learning: do children obey UG (norms) or are they statistical learners (descriptive)?

Theme 5: The four big "dissolution" moves

  1. Materialism dissolves dualism: Hobbes vs Descartes (lec 3).
  2. S–S dissolves S–R: Rescorla vs Watson (lec 1).
  3. Connectionism dissolves the need for explicit rules: Plunkett & Marchman vs dual-route (lec 5, 8).
  4. Dennett dissolves the hard problem: the vitalism analogy (lec 9).
Timeline

Timeline of Key Figures & Events

Date | Figure / Event | Contribution
1247 | Bethlem Royal Hospital ("Bedlam") | First closed institution for the mentally ill
1588–1679 | Thomas Hobbes | Materialism: everything is matter, "soul" is meaningless
1596–1650 | René Descartes | Cartesian dualism; pineal gland as the site of mind–body interaction
~1810s | Franz Gall | Phrenology: wrong in detail, right in spirit (localization)
1824–1880 | Paul Broca | Clinical neuropsychology; Broca's area = speech production
1898 | Edward L. Thorndike | Puzzle-box experiments; Law of Effect
1904 | Ivan Pavlov | Nobel Prize; classical conditioning of salivation in dogs
1913 | John B. Watson | Founds behaviourism (S–R)
~1930s+ | B. F. Skinner | Operant conditioning; Skinner box; reinforcement schedules
1943 | McCulloch & Pitts | Neural networks compute any computable function
1946 | Walter Freeman | Transorbital icepick lobotomy; >40,000 lobotomies in the US
1953 | Volkova | Semantic generalization in conditioning
1953–60 | Cherry · Broadbent · Treisman | Dichotic listening; early-selection filter; attenuation and ear-switching
1956 | George A. Miller | "Magical number 7 ± 2"; ~3 bits channel capacity
1959 | Moray | "Own name" breakthrough in the unattended ear
1965 | Joseph Weizenbaum | ELIZA
1970 | Terry Winograd | SHRDLU (MIT)
1972 | Kenneth Colby | PARRY (Stanford); psychiatrists told it apart from real patients only 48% of the time
1972 | Corteen & Wood | GSR breakthrough: semantic processing of unattended information
1973 | Rescorla | S–S vs S–R habituation experiment
1973 | Cooper & Shepard | Mental rotation
1974 | Thomas Nagel | "What is it like to be a bat?"
1980 | David Marr | Three levels of analysis (Vision published posthumously, 1982)
1986 | Frank Jackson | Mary's Room: the knowledge argument
1993 | Plunkett & Marchman | Connectionist past-tense network (20-30-20)
1995 | David Chalmers · van Gelder | Hard problem of consciousness · "What might cognition be, if not computation?"
2005 | Fang & He | Interocular suppression: dorsal vs ventral consciousness
2006 | Owen et al. | fMRI detects covert awareness in a vegetative-state patient
2016 | Cramer et al. | Ising network model of depression
2017 | Vaswani et al. | "Attention is all you need": the Transformer
2019 | Cisek · Busemeyer et al. | Phylogenetic refinement · Decision Field Theory
2021 | Bender, Gebru, McMillan-Major & Shmitchell | "On the Dangers of Stochastic Parrots"
2023 | Piantadosi · Binz & Schulz | "LLMs refute Chomsky" · GPT-3 on the Wason task
Glossary

Master Glossary

High-yield terms across all lectures. Skim the night before; nail them all.

Conditioning & learning

Term | Definition
US / UR | Unconditioned stimulus / unconditioned response (food → salivation)
CS / CR | Conditioned stimulus / conditioned response (bell → salivation after pairing)
Extinction | The CR weakens when the CS is repeatedly presented without the US
Spontaneous recovery | Partial return of the CR after rest; the CR was inhibited, not lost
S–R vs S–S theory | Direct stimulus–response bond vs a link via a mental representation of the US
Law of Effect | Satisfying consequences strengthen responses (Thorndike)
Operant / instrumental | Self-initiated behaviour modified by its consequences
Shaping | Reinforcement of successive approximations
FR / VR / FI / VI | Fixed-ratio, variable-ratio, fixed-interval, variable-interval schedules; VR is most resistant to extinction

Brain & methods

Term | Definition
Action potential | All-or-none binary signal; intensity is coded by firing frequency
Myelin | Sheath that speeds conduction along the axon; "white matter"; attacked in MS
Corpus callosum | Connects the hemispheres; cut in callosotomy for refractory epilepsy
Brodmann area | Cytoarchitectonic parcellation (BA 4 motor, BA 17 visual, BA 44 Broca)
BOLD signal | Blood-oxygen-level dependent signal; what fMRI directly measures
DTI | Diffusion tensor imaging; tractography visualizes white-matter tracts
P1/N1, P300, N400, P600 | ERP components: attention, oddball, semantic, syntactic
Dissociation / double dissociation | One function impaired while another is spared / the reverse pattern in a second patient; a double dissociation is strong evidence for separable mechanisms

Vision & categorization

Term | Definition
Primal sketch / 2½-D sketch / 3-D model | Marr's three stages of visual representation
Generalized cones | Marr's primitives; objects are built from stacks of cones
Geons (36) | Biederman's geometric primitives
Constancies (size/brightness/shape) | The brain corrects for retinal variation to perceive stable objects
Kanizsa's triangle | Illusory contours filled in by the brain
Dorsal vs ventral | Where/how (action) vs what (perception) pathways
Form agnosia vs integrative agnosia | Cannot perceive shape vs cannot integrate shape into recognition
Blindsight | Above-chance forced-choice performance without conscious vision (V1 damage)
Prosopagnosia | Face blindness; bilateral FFA damage
Capgras delusion | "My partner is an impostor"; the reverse dissociation to prosopagnosia

Connectionism & AI

Term | Definition
Perceptron | Single-layer network of weighted inputs
Linearly separable | Class of functions a perceptron can learn (XOR is NOT linearly separable)
Delta rule | ΔWᵢ = ε·δ·Iᵢ; ΔT = −ε·δ (ε = learning rate, δ = intended output − actual output)
Backpropagation | Assigns each hidden unit "responsibility" for the output error
Gradient descent | Follow the negative gradient of the error; risk of local minima
Universal approximation | Multi-layer nets (one hidden layer suffices) can approximate any continuous function to arbitrary precision
Token / embedding | Text chunk / high-dimensional learned vector
Autoregression | Feed each prediction back as input
RLHF | Reinforcement learning from human feedback; the second training stage of LLMs

Bayes

Term | Definition
Prior / likelihood / posterior | p(H), p(E|H), p(H|E)
Marginal evidence | p(E) = Σᵢ p(E|Hᵢ)·p(Hᵢ), summed over all hypotheses
Dutch book | A set of bets that each look fair by the agent's own credences but together guarantee a loss; avoiding Dutch books justifies obeying the probability calculus
Base-rate neglect | Ignoring the prior probability when updating
Transposed conditional | Confusing p(A|B) with p(B|A)
Conjunction fallacy | Judging p(A∧B) > p(A) (the Linda problem)
Maximizing vs matching | Always pick the higher-probability option vs match choice frequencies to outcome probabilities; maximizing wins
Search policy | argmaxᵢ pᵢ·aᵢ/cᵢ
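The Bayes entries fit together in a few lines. A minimal sketch with invented numbers (a hypothetical test for a condition with a 1% base rate, a 90% hit rate and a 5% false-alarm rate; `posterior` is an illustrative helper): the marginal evidence sums likelihood × prior over the hypotheses, and the posterior is one term of that sum divided by the total.

```python
def posterior(priors, likelihoods, h):
    """Bayes' rule: p(h|E) = p(E|h) p(h) / sum over H of p(E|H) p(H)."""
    evidence = sum(likelihoods[k] * priors[k] for k in priors)  # marginal p(E)
    return likelihoods[h] * priors[h] / evidence


priors = {"condition": 0.01, "no_condition": 0.99}       # p(H): the base rates
likelihoods = {"condition": 0.90, "no_condition": 0.05}  # p(E = positive test | H)

p = posterior(priors, likelihoods, "condition")
print(round(p, 3))  # 0.154
```

Reading the 0.90 hit rate as the answer is the transposed conditional (p(E|H) taken for p(H|E)); ignoring the 1% prior is base-rate neglect.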

Consciousness & mind

TermDefinition
Wakefulness / awarenessState vs experience (Laureys)
UWS / MCS / LISVegetative / minimally conscious / locked-in
QualiaFelt qualities of experience
P-consciousness / A-consciousnessBlock's phenomenal vs access distinction
Easy / Hard problemsInformation processing tractable / experience itself (Chalmers)
Knowledge argument / Mary's RoomJackson 1986
GWT / GNWGlobal (Neuronal) Workspace β€” Baars / Dehaene
IIT (Ξ¦)Integrated Information Theory β€” Tononi
RPRecurrent Processing theory β€” Lamme
Vitalism (Γ©lan vital)Dennett's analogy for why hard problem will dissolve
Mentalese / Language of ThoughtFodor's innate symbolic medium
Module (Fodor)Domain-specific, encapsulated, mandatory, fast, fixed, specific breakdown
IsotropicUn-encapsulated, holistic β€” Fodor's central processing
Massive modularityNo central processor; many Darwinian modules
State space / trajectoryDST geometric concepts for cognition over time
Big Practice Exam

40-question Cross-lecture MCQ Practice

Mixed, harder, exam-shaped. Commit to an answer before reading the key. The choices include the most common distractors a professor will use.

1. Pavlov originally believed that the salivation of dogs to the assistant's footsteps was:
  A. A successful classical conditioning result
  B. An experimental error he called "psychic secretion"
  C. An operant response
  D. Evidence of insight learning
Answer: B. He initially called it "psychic secretion" and treated it as noise, then pivoted his entire programme to study it.
2. Which of the following is the strongest evidence that ignored auditory input is processed semantically?
  A. Cherry's finding that listeners notice voice-pitch changes in the ignored ear
  B. Moray's (1959) "own name" breakthrough
  C. Treisman's (1960) ear-switching effect
  D. Corteen & Wood's (1972) GSR to ROME after conditioning to PARIS, LONDON, CAIRO
Answer: D. Generalization across the semantic category "city" requires meaning to be extracted from the ignored channel. (The own-name and ear-switching effects are also evidence, but the GSR data are the most striking semantic finding.)
3. McCulloch & Pitts based their model on three principles. Which one is NOT among them?
  A. Basic physiology
  B. Propositional logic
  C. Turing's theory of computation
  D. Information theory (Shannon)
Answer: D. The three are physiology, propositional logic, and Turing computation. Shannon's information theory (1948) postdates their 1943 paper; in this course it enters through Miller's lineage instead.
4. In Marr's framework, "the algorithm for edge detection followed by stereopsis-based 2½-D sketching" describes a system at which level?
  A. Computational
  B. Algorithmic
  C. Implementational
  D. Connectionist
Answer: B. Algorithmic = specific representations + steps. The implementational level would describe the neurons in V1/V2 carrying out those steps.
5. Biederman's recognition-by-components is most similar in spirit to:
  A. Marr's "generalized cones"
  B. Pavlov's S–S theory
  C. Broadbent's filter model
  D. Skinner's shaping
Answer: A. Both decompose objects into geometric primitives (36 geons / generalized cones).
6. A patient can post a card into a slot but cannot report the slot's orientation. The most likely lesion is to:
  A. Dorsal pathway only
  B. Ventral pathway only
  C. Both pathways
  D. Broca's area
Answer: B. Ventral-pathway damage produces visual (form) agnosia. The dorsal "how" pathway is intact, so action proceeds normally.
7. Capgras delusion is opposite to prosopagnosia in that:
  A. Capgras patients have intact explicit recognition but lost emotional/implicit recognition
  B. Capgras patients lose both explicit and implicit recognition
  C. Capgras patients have intact emotional but lost explicit recognition
  D. Capgras involves the dorsal stream
Answer: A. Capgras: explicit ✅, emotional ❌. Prosopagnosia: explicit ❌, emotional ✅.
8. The fMRI BOLD signal reflects:
  A. Direct firing of neurons
  B. Changes in haemoglobin's magnetic properties due to oxygenation
  C. Cerebral blood flow measured by radioactive tracer
  D. Magnetic fields produced by ion currents
Answer: B. Oxy- and deoxy-haemoglobin have different magnetic properties, and that contrast is what fMRI measures. It is NOT direct neural activity.
9. Owen et al. (2006) instructed a vegetative-state patient to imagine "playing tennis" vs "walking through your house" because:
  A. These tasks activate motor (SMA) vs parahippocampal regions: task-specific, voluntary, and detectable with fMRI
  B. They are easier for patients than verbal tasks
  C. They reduce noise in EEG recordings
  D. They are the only paradigms compatible with locked-in syndrome
Answer: A. Task-specific BOLD activation on command demonstrates covert awareness and can serve as a yes/no communication channel.
10. In a perceptron, the activation function turns the weighted sum into a 0 or 1 output. The classic version used in textbook walkthroughs is:
  A. Sigmoid
  B. ReLU
  C. Softmax
  D. Heaviside step function
Answer: D. Heaviside step = binary threshold, analogous to a neuron's spike threshold. ReLU and sigmoid are more common in modern practice.
11. The XOR problem matters historically because:
  A. It is the only function single-layer perceptrons can compute
  B. It is not linearly separable, exposing the fundamental limitation of single-layer perceptrons
  C. It cannot be computed by any neural network
  D. It requires Hebbian learning
Answer: B. Because XOR is not linearly separable, no single-layer perceptron can compute it; this motivated multi-layer networks and backpropagation.
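Questions 10 and 11 can be checked with a single-layer perceptron: a Heaviside step activation plus the perceptron learning rule ΔWᵢ = ε·δ·Iᵢ, ΔT = −ε·δ, with δ = target − output. This is a sketch under stated assumptions: zero initial weights, ε = 1 (with zero initial weights this only rescales the solution and keeps the arithmetic exact) and a 100-epoch cap; the helper names are hypothetical. It converges on AND, which is linearly separable, but can never get all four XOR cases right.

```python
def step(x):
    # Heaviside step: output 1 if the net input reaches 0, else 0.
    return 1 if x >= 0 else 0


def train_perceptron(samples, epsilon=1.0, max_epochs=100):
    """Perceptron learning: dW_i = eps * delta * I_i, dT = -eps * delta."""
    n = len(samples[0][0])
    weights, threshold = [0.0] * n, 0.0
    for _ in range(max_epochs):
        errors = 0
        for inputs, target in samples:
            output = step(sum(w * i for w, i in zip(weights, inputs)) - threshold)
            delta = target - output
            if delta != 0:
                errors += 1
                weights = [w + epsilon * delta * i for w, i in zip(weights, inputs)]
                threshold -= epsilon * delta
        if errors == 0:
            return weights, threshold, True   # a full error-free pass: converged
    return weights, threshold, False          # epoch cap reached


def accuracy(weights, threshold, samples):
    hits = sum(step(sum(w * i for w, i in zip(weights, x)) - threshold) == t
               for x, t in samples)
    return hits / len(samples)


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

for name, data in (("AND", AND), ("XOR", XOR)):
    w, t, ok = train_perceptron(data)
    print(name, "converged" if ok else "did not converge", accuracy(w, t, data))
```

No choice of weights and threshold separates XOR's two classes with a single line, so the XOR run always hits the epoch cap with at least one case wrong.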
12. Plunkett & Marchman's (1993) past-tense model showed that:
  A. You need a dual-route architecture for overgeneralization to occur
  B. A single neural network without explicit rules can reproduce overgeneralization and gradual learning curves
  C. Children do not overregularize
  D. Backpropagation is biologically implausible
Answer: B. A 20-30-20 network with phonological input; it reproduces "goed"-type errors and the developmental trajectory.
13. The Cisek-style massive-modularity view rejects:
  A. Domain-specific modules
  B. The existence of a domain-general central processor
  C. Natural selection
  D. Pragmatic representations
Answer: B. Fodor keeps a non-modular central system; massive modularity eliminates it.
14. "What might cognition be, if not computation?" is the banner question for:
  A. Symbolic AI
  B. Connectionism
  C. Dynamical systems theory (van Gelder, 1995)
  D. Predictive coding
Answer: C. Van Gelder's question launches the dynamical-systems alternative to representational/computational accounts.
15. The Ising network model of depression treats symptoms as:
  A. Symptoms of a single underlying latent variable
  B. Binary nodes coupled by pairwise weights; stress = extra input to all nodes
  C. Symbolic atoms in a Mentalese
  D. Output of a single decision system
Answer: B. A network model. It contrasts with the latent-variable approach, in which a single underlying cause produces all the symptoms (as gallstones do).
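A toy version of the network view in Question 15, loosely in the spirit of Cramer et al. (2016) but not their fitted model: four hypothetical binary symptom nodes (0 = absent, 1 = present), invented symmetric couplings and negative biases, and stress added as the same external input to every node. With only four nodes, all 2⁴ states can be enumerated and their exact Boltzmann probabilities computed, so no random sampling is needed.

```python
import itertools
import math

# Hypothetical symptom network; names, weights and biases are invented.
SYMPTOMS = ["low mood", "insomnia", "fatigue", "poor concentration"]
# Symmetric pairwise couplings W[i][j]; positive = symptoms reinforce each other.
W = [
    [0.0, 0.8, 0.6, 0.4],
    [0.8, 0.0, 0.9, 0.3],
    [0.6, 0.9, 0.0, 0.7],
    [0.4, 0.3, 0.7, 0.0],
]
# Negative biases keep each symptom off unless pushed (they act like thresholds).
BIASES = [-2.0, -2.0, -2.0, -2.0]


def log_weight(state, stress):
    """Negative energy: sum_i (b_i + stress) x_i + sum_{i<j} W_ij x_i x_j."""
    n = len(state)
    field = sum((BIASES[i] + stress) * state[i] for i in range(n))
    coupling = sum(W[i][j] * state[i] * state[j]
                   for i in range(n) for j in range(i + 1, n))
    return field + coupling


def expected_active(stress):
    """Exact expected number of active symptoms under the Boltzmann distribution."""
    states = list(itertools.product([0, 1], repeat=len(SYMPTOMS)))
    weights = [math.exp(log_weight(s, stress)) for s in states]
    z = sum(weights)  # partition function
    return sum(w * sum(s) for w, s in zip(weights, states)) / z


for stress in (0.0, 1.0, 2.0):
    print(f"stress={stress}: {expected_active(stress):.2f} active symptoms expected")
```

The expected number of active symptoms rises with stress. Because the couplings are positive, symptoms tend to switch on together, and it is this mutual reinforcement, rather than one latent cause, that the network view emphasizes.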
16. In Bayes' rule, the posterior is highest when:
  A. The likelihood is high and the prior is low
  B. The likelihood is high and the prior is high
  C. The evidence is high
  D. The evidence equals the likelihood
Answer: B. Posterior ∝ likelihood × prior, so both must be high.
17. The Dutch-book argument shows that:
  A. Personal probabilities are subjective and unscientific
  B. Violations of probability calculus make one exploitable, so rational beliefs must obey it
  C. Conditional probabilities are unreliable
  D. Frequentism is correct
Answer: B. This is the normative justification for Bayesianism.
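A worked Dutch book for Question 17, with invented credences: an agent assigns 0.6 to A and 0.6 to not-A, violating p(A) + p(¬A) = 1. Assuming the agent regards a bet paying 1 unit as fair at a price equal to its credence, a bookie can sell them both bets; `net_payoff` is a hypothetical helper.

```python
def net_payoff(credences, a_is_true):
    """Agent buys a 1-unit bet on A and a 1-unit bet on not-A, each priced at its credence."""
    cost = credences["A"] + credences["not A"]
    # The bet on A pays 1 if A is true; the bet on not-A pays 1 if A is false.
    winnings = (1.0 if a_is_true else 0.0) + (0.0 if a_is_true else 1.0)
    return winnings - cost


incoherent = {"A": 0.6, "not A": 0.6}  # sums to 1.2: violates the probability calculus
coherent = {"A": 0.6, "not A": 0.4}    # sums to 1.0

for label, cred in (("incoherent", incoherent), ("coherent", coherent)):
    print(label, [round(net_payoff(cred, truth), 2) for truth in (True, False)])
```

The incoherent agent loses 0.2 units whichever way A turns out; the coherent agent breaks even. If the credences summed to less than 1, the bookie would instead buy both bets from the agent at those prices.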
18. Why do people rank "Linda is a bank teller and active in the feminist movement" higher than "Linda is a bank teller"?
  A. Conjunction fallacy: representativeness overrides probability theory
  B. The conjunction is actually more likely
  C. Bank tellers form a subset of feminists
  D. Availability heuristic
Answer: A. Tversky & Kahneman's signature finding. A conjunction picks out a subset of the cases, so it can never be more probable than either of its parts.
19. In a stochastic environment (4 red / 2 green), what does the maximizing strategy achieve?
  A. 56% correct
  B. 67% correct
  C. 100% correct
  D. 33% correct
Answer: B. Always predicting red → 4/6 = 2/3 ≈ 67%. Probability matching → (2/3)² + (1/3)² = 5/9 ≈ 56%. Maximizing wins.
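Question 19's two strategies can be checked exactly with Python's `fractions` module, assuming independent trials with p(red) = 4/6 and a matcher whose guesses are independent of the outcome.

```python
from fractions import Fraction

P_RED = Fraction(4, 6)  # 4 red, 2 green; reduces to 2/3
P_GREEN = 1 - P_RED

# Maximizing: always predict the more likely colour (red).
maximizing = P_RED
# Matching: guess red with p = 2/3 and green with p = 1/3, independently of the outcome.
matching = P_RED * P_RED + P_GREEN * P_GREEN

print("maximizing:", maximizing)  # maximizing: 2/3
print("matching:", matching)      # matching: 5/9
```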
20. The "stochastic parrots" critique of LLMs (Bender et al. 2021) argues that:
  A. LLMs have genuine semantic grounding
  B. LLMs stitch linguistic forms together by probability without reference to meaning
  C. LLMs cannot be trained on biased text
  D. LLMs refute Chomsky
Answer: B. The defining quote. The paper also flags ethical issues: energy cost, biases, documentation debt.
21. The "knowledge argument" (Jackson 1986) concludes:
  A. Physical knowledge fully captures conscious experience
  B. There are aspects of conscious experience not captured by complete physical knowledge
  C. Mary cannot learn colour from books
  D. Qualia are an illusion
Answer: B. When Mary leaves the room she learns something new, so physical knowledge is incomplete (an anti-physicalist conclusion).
22. Which is the correct ordering of states by increasing level of awareness?
  A. Coma < LIS < UWS < MCS
  B. Coma < UWS < MCS < LIS
  C. UWS < Coma < MCS < LIS
  D. LIS < MCS < UWS < Coma
Answer: B. Coma → vegetative (UWS) → minimally conscious (MCS) → locked-in (fully conscious).
23. Fang & He (2005) found that under interocular suppression:
  A. Both streams went silent
  B. Dorsal-stream activity persisted; ventral activity tracked conscious perception
  C. Ventral-stream activity persisted; dorsal activity tracked conscious perception
  D. Conscious perception preceded both streams
Answer: B. Conscious perception tracks the ventral stream; the dorsal stream operates non-consciously.
24. In Cooper & Shepard's mental-rotation paradigm, reaction time is highest at approximately:
  A. 0° / 360°
  B. 90°
  C. 180°
  D. 270°
Answer: C. 180° is the maximum rotation from upright, so the most mental rotation is needed and RT is highest. RT returns to baseline at 360°.
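A sketch of the pattern behind Question 24, assuming RT rises linearly with the angle through which the figure must be mentally rotated back to upright, taking the shorter direction. The 500 ms intercept, the 3 ms per degree slope and the helper `predicted_rt` are invented for illustration and are not Cooper & Shepard's estimates.

```python
def predicted_rt(angle_deg, intercept_ms=500.0, ms_per_degree=3.0):
    """Hypothetical linear model: RT = intercept + slope * rotation needed."""
    angle = angle_deg % 360
    rotation_needed = min(angle, 360 - angle)  # rotate the short way round
    return intercept_ms + ms_per_degree * rotation_needed


for angle in (0, 90, 180, 270, 360):
    print(angle, predicted_rt(angle))
```

The printed RTs rise from 500 ms at 0° to a peak of 1040 ms at 180° and fall back symmetrically, with 90° and 270° giving the same value.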
25. The N400 ERP component most reliably indexes:
  A. Early sensory processing
  B. Attention
  C. Semantic processing / semantic anomaly
  D. Syntactic reanalysis
Answer: C. "He spread butter on his socks" elicits a large N400. P600 indexes syntactic reanalysis; P1/N1 index attention and early sensory processing.
26. Volkova (1953) demonstrated:
  A. Operant conditioning of language
  B. Semantic generalization in classical conditioning (responses generalized from "good"/"bad" to whole sentences)
  C. Extinction of conditioned fear
  D. Discrimination training in pigeons
Answer: B. Generalization extended beyond physical properties to meaning.
27. According to Weber's law jnd = kM, if k = 0.03 for weight detection, what's the smallest weight difference detectable when M = 100 kg?
  1. 0.03 kg
  2. 0.3 kg
  3. 3 kg
  4. 30 kg
Show answer
C. 0.03 × 100 = 3 kg. The jnd is always the same proportion (k) of the magnitude.
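The arithmetic generalizes directly. A small sketch, assuming the same Weber fraction k = 0.03 from the question, shows that the jnd grows with M while the ratio jnd/M stays fixed:

```python
def jnd(magnitude: float, k: float) -> float:
    """Weber's law: the just-noticeable difference is a constant fraction k of the magnitude."""
    return k * magnitude


K_WEIGHT = 0.03  # Weber fraction for weight, as given in the question

for m in (1.0, 10.0, 100.0):
    d = jnd(m, K_WEIGHT)
    print(f"M = {m:5.1f} kg  jnd = {d:.2f} kg  jnd/M = {d / m:.2f}")
```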
28. Treisman's attenuation theory of attention claims that:
  1. Ignored input is completely blocked at an early filter
  2. All input is processed for meaning; ignored input is forgotten
  3. Ignored input is attenuated but not blocked; important info (your name) is spared
  4. Distractor processing depends on the load of the main task
Show answer
C. Attenuation = intermediate between Broadbent (block) and Deutsch & Deutsch (no filter). Load theory is Lavie's.
29. Which is the strongest Bayesian-style critique of LLMs as models of human cognition?
  1. They use too many parameters
  2. They are trained on 4–5 orders of magnitude more data than human children, so behavioural similarity does not imply mechanistic similarity (multiple realizability)
  3. They cannot solve the Wason task
  4. They are deterministic
Show answer
B. Guest & Martin's multiple-realizability point combined with the data-asymmetry concern: children learn from vastly less input, so similar behaviour is weak evidence that the underlying mechanisms are the same.
30. Lobotomy was popularized in the US by:
  1. Paul Broca
  2. Walter Freeman (transorbital icepick, 1945; >40,000 procedures)
  3. B. F. Skinner
  4. Antonio Damasio
Show answer
B. Freeman's transorbital approach removed the need for a neurosurgeon, and the procedure became horrifyingly widespread.
31. Which of these scholars argued, using a vitalism analogy, that the hard problem of consciousness will dissolve?
  1. Chalmers
  2. Nagel
  3. Dennett
  4. Tononi
Show answer
C. Dennett: as biology and chemistry matured, "élan vital" had no work left to do. He predicts the same for the hard problem.
32. Block's "P-consciousness" refers to:
  1. The reportable, control-of-thought aspect of consciousness
  2. The phenomenal/qualia aspect: what it's like
  3. Pre-reflective awareness
  4. Primary sensory cortex activity
Show answer
B. P = phenomenal (qualia). A = access (reportability, direct control). Many cogsci accounts collapse them; Block insists they're distinct.
33. The optimal Bayesian search policy is:
  1. argmaxᵢ pᵢ
  2. argmaxᵢ aᵢ
  3. argmaxᵢ pᵢ · aᵢ / cᵢ
  4. argmaxᵢ pᵢ + aᵢ − cᵢ
Show answer
C. Maximize the ratio of (probability the target is there × detection probability) to cost.
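To see why options A and B fail, here is a small sketch of the ratio rule with hypothetical numbers: the location with the highest prior is hard to search and expensive, so the ratio picks a different one.

```python
def best_location(p, a, c):
    """Return the index i maximizing p[i] * a[i] / c[i].

    p[i]: prior probability that the target is at location i
    a[i]: probability of detecting the target there, given it is there
    c[i]: cost of searching location i
    """
    scores = [pi * ai / ci for pi, ai, ci in zip(p, a, c)]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical values: location 0 is the most likely, but detection there is poor and costly.
p = [0.5, 0.3, 0.2]
a = [0.2, 0.9, 0.5]
c = [2.0, 1.0, 1.0]
print(best_location(p, a, c))  # scores 0.05, 0.27, 0.10 -> prints 1
```

Choosing by pᵢ alone (option A) would pick location 0 here. In a full search model, an unsuccessful look would also lower that location's pᵢ by Bayes' rule before the next choice; the sketch covers a single decision only.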
34. The "magical number" Miller proposed corresponds approximately to a channel capacity of:
  1. 1 bit
  2. 3 bits (≈ 7 items)
  3. 7 bits
  4. 10 bits
Show answer
B. 7 ± 2 items, ~3 bits, roughly modality-independent.
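Choosing among n equally likely alternatives carries log₂ n bits of information, which is where the "7 items ≈ 3 bits" conversion comes from. A quick check over the 7 ± 2 range:

```python
import math


def bits(n_alternatives: int) -> float:
    """Information (in bits) needed to identify one of n equally likely alternatives."""
    return math.log2(n_alternatives)


for n in (5, 7, 9):
    print(f"{n} items -> {bits(n):.2f} bits")
```

The range of 5 to 9 items corresponds to roughly 2.3 to 3.2 bits.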
35. "Colorless green ideas sleep furiously" demonstrates:
  1. Syntactic anomaly with semantic well-formedness
  2. Syntactic well-formedness with semantic anomaly
  3. Both syntactic and semantic anomaly
  4. The conjunction fallacy
Show answer
B. Grammatical structure is fine; meaning is anomalous. Used in lecture 8 to argue syntax and semantics are dissociable.
36. The "perceptron convergence rule" guarantees:
  1. Convergence on any Turing-computable function
  2. Convergence whenever a perceptron-realizable (i.e., linearly separable) solution exists
  3. Convergence on the global minimum of any cost surface
  4. Convergence in O(log n) iterations
Show answer
B. Convergence is guaranteed if the problem is linearly separable; otherwise (e.g., XOR) the rule fails.
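A minimal sketch of the perceptron learning rule (a threshold unit with error-driven weight updates), trained on AND, which is linearly separable, and on XOR, which is not. The epoch cap is an arbitrary assumption: the convergence theorem guarantees that AND is learned after finitely many updates, whereas for XOR no weight setting classifies all four cases, so no number of epochs would help.

```python
def train_perceptron(data, epochs=20, lr=1.0):
    """Perceptron learning rule on 2-input binary data.

    data: list of ((x1, x2), target) pairs with targets in {0, 1}.
    Returns (weights, bias, converged); converged means one full epoch with zero errors.
    """
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in data:
            output = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0  # threshold unit
            delta = target - output
            if delta != 0:  # update weights only when the unit is wrong
                errors += 1
                w[0] += lr * delta * x1
                w[1] += lr * delta * x2
                b += lr * delta
        if errors == 0:
            return w, b, True
    return w, b, False


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print("AND converged:", train_perceptron(AND)[2])
print("XOR converged:", train_perceptron(XOR)[2])
```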
37. Which is the best example of operant (NOT classical) conditioning?
  1. A dog salivates when a bell rings
  2. A child blinks when a puff of air hits her eye
  3. A rat presses a lever more often because pressing produces food
  4. A baby orients to a loud sudden sound
Show answer
C. Operant = response modified by its consequences. The others are reflexes / classical conditioning.
38. The Lidz et al. "red ball" anaphora example is a Bayesian argument because:
  1. It uses transitional probabilities to segment words
  2. It computes p(referent | sentence) ∝ p(sentence | referent) · p(referent), favoring "red ball" because more specific hypotheses make the data more probable
  3. It refutes Chomsky
  4. It demonstrates the conjunction fallacy
Show answer
B. A specific hypothesis (red ball) is more likely to have produced the observed sentence than a generic one (ball).
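A minimal sketch of the same Bayes'-rule computation over two hypotheses about what "one" refers to. The numbers are assumptions for illustration only: equal priors, and four equally common ball colours, so a speaker meaning just "ball" would end up with a red one a quarter of the time. This is a toy version of the size principle, not Lidz et al.'s actual model.

```python
def posterior(priors, likelihoods):
    """Bayes' rule over a discrete hypothesis space: normalize prior * likelihood."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    z = sum(unnormalized.values())
    return {h: v / z for h, v in unnormalized.items()}


N_COLOURS = 4  # hypothetical number of equally common ball colours
priors = {"red ball": 0.5, "ball": 0.5}  # hypothetical equal priors
# Probability that the referent turns out red under each hypothesis:
likelihoods = {"red ball": 1.0, "ball": 1 / N_COLOURS}

print(posterior(priors, likelihoods))
```

Even with equal priors, the narrower hypothesis ends up four times as probable, because it predicted the observed red referent with certainty.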
39. The strongest evidence that consciousness is restricted to the ventral pathway is:
  1. Patient D.F.'s ability to post a card despite ventral damage
  2. The grip aperture being unaffected by the Ebbinghaus illusion (dorsal not fooled)
  3. Fang & He (2005)'s interocular suppression: dorsal active for invisible stimuli, ventral only for conscious ones
  4. All of the above
Show answer
D. All three lines converge on the same conclusion: the dorsal stream operates non-consciously; conscious experience tracks the ventral stream.
40. Which of the following statements is FALSE?
  1. fMRI has high spatial but low temporal resolution
  2. Single-unit recording achieves both high temporal and high spatial resolution but is invasive
  3. MEG measures changes in blood oxygenation
  4. EEG has high temporal but low spatial resolution
Show answer
C. MEG measures magnetic fields generated by neural electrical activity, NOT blood oxygenation (that's fMRI). The other three statements are correct.

β€” End of study guide β€”
Good luck on the exam. Trust the priors. Update on evidence.

JLB Chapter 16

The Emotions: From Cognitive Science to Affective Science

Emotions were largely ignored in early cognitive science. Affective science now studies them with a rich, multidisciplinary toolkit (from genetics and lesion studies to neuroimaging), using fear as its central case study.

16.1 Early Theories

Herbert Simon: Emotions as Interrupt Mechanisms

Simon (1967) argued that any sufficiently complex serial information-processing system (like the mind) must contain interrupt mechanisms: processes that can suspend an ongoing goal and substitute a new one when circumstances demand it. His proposal: emotions are those interrupt mechanisms in the CNS.

Three key properties of emotions on Simon's account:

  1. They interrupt ongoing goals and substitute new goals/behaviours.
  2. They arouse the autonomic nervous system in predictable physiological ways.
  3. They generate feelings of emotion.
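To make the first property concrete, here is a toy sketch of a serial system with an interrupt check. It is a hypothetical illustration, not Simon's own model: the goal names, the single "threat" signal, and the 0.8 threshold are all assumptions, and arousal and feeling (properties 2 and 3) are not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class SerialAgent:
    """Toy serial processor: one active goal at a time, plus an interrupt check.

    Hypothetical illustration of the interrupt idea: before each step, an
    'emotion' monitor can suspend the current goal and substitute another.
    """
    goal: str
    suspended: list = field(default_factory=list)
    threshold: float = 0.8  # hypothetical urgency threshold

    def step(self, threat_level: float) -> str:
        # Interrupt: an urgent signal suspends the ongoing goal and substitutes a new one.
        if threat_level >= self.threshold and self.goal != "escape":
            self.suspended.append(self.goal)
            self.goal = "escape"
        # Once the signal subsides, resume the most recently suspended goal.
        elif threat_level < self.threshold and self.goal == "escape" and self.suspended:
            self.goal = self.suspended.pop()
        return self.goal


agent = SerialAgent(goal="forage")
print([agent.step(t) for t in (0.1, 0.2, 0.9, 0.95, 0.3)])
```

Because the system is serial, only one goal is active at a time; the interrupt does not run "escape" alongside "forage" but replaces it until the threat passes.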

Simon's paper is influential but thin on detail: no specific emotion is actually analysed. It raises key open questions: How should emotions be classified? Are some basic? What is the role of arousal, physiology, and feeling? What are the neural bases?

Paul Ekman: Basic Emotion Theory

Ekman asked whether facial expressions of emotion are cross-cultural universals or products of social learning. To avoid the confound of media exposure, he studied the Fore, a preliterate, visually isolated linguistic-cultural group in New Guinea.

Method: participants were shown 2–3 photos of facial expressions while a story indicating an emotion was read aloud. They had to point to the matching face. Six target emotions: happiness, sadness, anger, surprise, disgust, fear.

Result: both adults and children in New Guinea matched emotions to faces at rates significantly above chance, supporting universal cross-cultural recognition. Ekman also showed that literate cultures could recognise New Guinean facial expressions.

Ekman's two main claims:
  1. There are discrete, separate basic emotions, each with a coherent set of facial, physiological, cognitive, and behavioural responses.
  2. Each basic emotion serves a distinctive evolutionary function and is hardwired for specific life tasks.

Example (fear): eyebrows raised and horizontal, upper eyelid lifted, more sclera exposed (gathers information about threat); heart rate and skin conductance elevated; peripheral blood flow redirected to large skeletal muscles (preparing to flee).

Criticisms: Meta-analyses cast doubt on strict links between emotions and specific physiological/neural signatures. Cultural anthropologists question whether emotions are truly independent of social context.

16.2 Affective Space and the Affective Scientist's Toolkit

Affective Space: Beyond Discrete Categories

Affective phenomena vary in duration and function: emotions, moods, instincts, drives, and affective traits are all distinct (though their boundaries are fuzzy). Many affective scientists now model emotions as points in a multidimensional space rather than discrete categories.

The simplest model uses two dimensions:

  • Valence: pleasure ↔ displeasure (attractiveness vs. aversiveness of a situation)
  • Arousal: degree of physiological/psychological engagement

The circumplex model (Russell, 1980) plots emotions on a circle defined by valence and arousal. It includes both emotions and moods (e.g., sadness and depression lie next to each other).
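Because the circumplex is a two-dimensional space, any emotion or mood can be written as a (valence, arousal) point or, equivalently, as an angle around the circle plus a distance from neutral. A small sketch, assuming both dimensions are scaled to [−1, 1]; the coordinates below are hypothetical and chosen only to illustrate adjacency, not taken from Russell (1980).

```python
import math


def circumplex_position(valence: float, arousal: float):
    """Map a (valence, arousal) point, each in [-1, 1], to polar coordinates.

    Returns (angle_deg, intensity): the angle counter-clockwise from the
    positive-valence axis, and the distance from the neutral origin.
    """
    angle = math.degrees(math.atan2(arousal, valence)) % 360
    intensity = math.hypot(valence, arousal)
    return angle, intensity


# Hypothetical coordinates, for illustration only.
examples = {
    "excited": (0.7, 0.7),
    "calm": (0.7, -0.7),
    "sad": (-0.7, -0.4),
    "depressed": (-0.8, -0.6),
}
for name, (v, a) in examples.items():
    angle, r = circumplex_position(v, a)
    print(f"{name:10s} angle = {angle:6.1f} deg  intensity = {r:.2f}")
```

Items that are close in angle (here "sad" and "depressed") sit next to each other on the circle, which is how the model captures adjacency between an emotion and a mood.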

Adolphs & Anderson (2018) propose a richer 7-dimensional framework: scalability, valence, persistence, generalisation, global coordination, automaticity, social coordination. Designed to apply to non-human animals without relying on verbal self-reports.

Appraisal theories (Lazarus and others) add a cognitive dimension: emotions involve evaluating the environment relative to the subject's goals β€” what Lazarus calls core relational themes. On this view, anger and fear differ because they involve different appraisals of the same situation.

The Affective Scientist's Toolkit

Different tools study different components of an emotional episode (trigger → perception → neural/somatic response → behavioural response; optional components: cognitive appraisal, feelings, verbal report):

  • fMRI, PET, electrophysiology: neural responses
  • FACS (Facial Action Coding System): behavioural/expressive responses
  • Physiological measures: heart rate, skin conductance, finger temperature
  • Lesion studies: causal role of specific brain regions
  • Genetic tools: knockout experiments, optogenetics, pharmacogenetics

Genetic Tools

Knockout experiments (mice, 1990s+): replace a functional gene with a nonfunctional copy in stem cells, then study behavioural change. E.g., knocking out the serotonin receptor 5-HT1A increases anxiety-like behaviour in mice.

Optogenetics: engineer specific neurons to express a light-sensitive ion channel (opsin). Neurons can then be switched on/off with light (millisecond resolution). Allows targeted intervention in specific neural populations.

Pharmacogenetics: engineer neurons to express receptors for specific synthetic drugs (not normal neurotransmitters). Can activate or inhibit targeted populations. Slower than optogenetics (minutes to hours).

GECIs / GEVIs: genetically engineered indicators for calcium or voltage β€” allow optical measurement of neural activity as a complement to electrophysiology.

Lesion Studies

Classic case: Phineas Gage (1848). An iron rod through his head destroyed ventromedial prefrontal cortex. Physical and perceptual abilities preserved; emotional regulation and social behaviour drastically changed. First evidence linking prefrontal cortex to emotional function.

Lesions in humans provide information about dissociations; animal lesions allow greater anatomical precision and pre/post comparisons. Types: permanent (aspiration, excision, neurotoxins) or reversible (pharmacological, cryogenic cooling, TMS).

16.3 Fear: A Multilevel and Multidisciplinary Case Study

Fear Conditioning in Rodents

Fear conditioning pairs a neutral conditioned stimulus (CS, e.g., a tone) with an aversive unconditioned stimulus (US, e.g., foot shock). After training, the CS alone elicits fear responses (freezing, autonomic arousal). This provides a controllable, reproducible way to study fear.

Multiple studies converge on the amygdala (a subcortical limbic structure) as central to fear:

  • Electrical stimulation of the amygdala produces fear responses (increased respiration, heart rate, blood pressure, freezing) → amygdala is sufficient.
  • Lesion studies: amygdala lesions abolish conditioned freezing and reduce unconditioned defensive behaviour (e.g., lesioned rats approach sedated cats) → amygdala is necessary.
  • The basolateral nucleus (BLA) contains distinct neuron populations responding to positive vs. aversive stimuli, projecting to different brain regions.

Fear and Amygdala Damage in Humans: Patient S.M.

S.M. has bilateral amygdala destruction from Urbach–Wiethe disease (UWD), a rare genetic condition. Her basic cognition (intelligence, memory, language, perception) is intact. But she shows profound impairment in experiencing, expressing, and recognising fear.

Feinstein et al. (2011): exposed S.M. to live snakes and spiders, a haunted house, and scary films. She showed no fear responses: no avoidance, no subjective fear. She experienced other emotions normally, confirming fear-specific amygdala involvement.

BLA vs. CMA (fine-grained amygdala function):
A separate group of UWD patients (Namaqualand, South Africa) had damage only to the basolateral amygdala (BLA), leaving the central-medial amygdala (CMA) intact. Their response: fear hypervigilance β€” exaggerated attention to mild threat cues (e.g., better recognition of fearful facial expressions). This contrasts with S.M.'s hypovigilance (BLA + CMA damaged).

Interpretation: the BLA exerts inhibitory control over the CMA. Without the BLA, the CMA fires impulsively → hypervigilance. Without both → no fear response at all.

Neuroimaging of Fear in Humans

LaBar et al. (1998): fMRI showed amygdala activation during fear conditioning and extinction in humans; greatest involvement in early stages of conditioning.

The human fear network identified by neuroimaging:

  • Amygdala: central fear processing
  • Hippocampus: memory/contextual information
  • Insula: broader emotion processing (especially disgust)
  • Anterior cingulate cortex (ACC): bridges limbic and prefrontal systems
  • Ventromedial prefrontal cortex (vmPFC): integrative hub; modulates/controls fear responses (especially in extinction)

Clinical relevance: PTSD involves overgeneralised fear, slow extinction, amygdala/ACC hyperactivity, and low vmPFC activity.

Mobbs et al. (2010): scanned subjects while a live tarantula was placed at varying distances from their foot. Closer threat → increased amygdala, insula, ACC, and BNST activation (active coping: flee). More distant threat → increased orbitomedial PFC activity (passive coping: freeze). Approach vs. retreat also differentially activated the amygdala and BNST.

Summary

Affective science grew from cognitive science's neglect of emotion. Simon proposed a computational account (emotions as interrupt mechanisms); Ekman proposed discrete, universal basic emotions. Contemporary affective science has moved toward multidimensional models of affective space and uses a broad toolkit β€” neuroimaging, lesion studies, fear conditioning, and genetic tools. Fear is the best-studied emotion, with the amygdala (and its subregions BLA/CMA) playing a necessary and central role, embedded in a wider network including the hippocampus, insula, ACC, and vmPFC.