The Cognitive Science Master Study Guide
An exam-ready synthesis of all ten lectures: every theory, every named scholar, every diagram, every contestable claim, every multiple-choice trap.
The Prehistory of Cognitive Science
Behaviourism dominated psychology in the early 20th century, restricting it to observable behaviour. Cracks in its account of conditioning, the rise of computational models of mind, and discoveries about attention pushed psychology toward cognitivism.
1.1 Two assumptions of behaviourism
- All learning is the result of conditioning.
- Conditioning depends on association and reinforcement.
The behaviourist slogan: "Psychology is the science of behavior." Mental processes are unobservable and therefore unscientific. Chomsky's killer rejoinder:
"Defining psychology as the science of behavior was like defining physics as the science of meter reading." (Noam Chomsky)
1.2 Classical (Pavlovian) conditioning
Pavlov initially studied salivation in dogs (Nobel Prize 1904). When he noticed dogs salivating at the assistant opening the door (a "psychic secretion"), he pivoted his entire research programme.
The big debate inside classical conditioning
S-R (Watson)
Conditioning forges a direct stimulus-response bond. Bell → salivation. No mental states involved. Strictly behaviourist.
S-S (Pavlov / cognitivists)
Bell activates a mental representation of the food, which then produces the response. Internal states inferred from their predicted effects.
Rescorla (1973) tested this with rats. Pair light + loud sound → both elicit freezing. Then habituate half the rats to the sound alone until they stop freezing to it. Now present the light:
- S-R predicts: freezing to the light is intact (the light → freezing bond is independent).
- S-S predicts: no freezing, because the light triggers the representation of the now-habituated sound.
Result: habituated rats did NOT freeze to the light, which is strong support for the cognitivist S-S theory.
Phenomena to know
- Extinction: the CR weakens when the CS is presented repeatedly without the US.
- Spontaneous recovery: after extinction, the CR partially returns after a rest. Pavlov concluded the CR is inhibited, not lost.
- Generalization: the CR also occurs to similar stimuli (tones near the trained tone).
- Discrimination: the organism learns to respond only to a specific CS, not similar ones.
- Volkova (1953), semantic generalization: children conditioned to "good"/"bad" generalized to whole sentences like "The children are playing nicely together" / "The Fascists destroyed many cities."
Textbook adds: Tolman's latent learning & cognitive maps (1930, 1946)
Tolman & Honzik (1930): "'Insight' in Rats"
Three groups of rats ran a 14-unit T-Alley maze. Group 1 always rewarded; Group 2 never; Group 3 unrewarded for 10 days, then rewarded. When Group 3 started getting rewards, they learned the maze faster than Group 1 ever had, which shows they had stored maze information during the unrewarded period. This latent learning directly contradicts behaviourism: learning without reinforcement.
Tolman et al. (1946), cross-maze studies: place learning (knowing where the food is) is easier than response learning (knowing which turn to make). Rats build cognitive maps, i.e. internal representations of spatial layout. First major case of postulating internal representations in a behavioural science.
Textbook adds: Lashley (1951), "The Problem of Serial Order in Behavior"
Lashley argued that complex behaviour (speech, tennis, piano playing) cannot be a chained sequence of stimulus-response links because what happens next depends on what will happen later in the sequence and on the overall goal. He proposed behaviour is organised hierarchically, with high-level plans broken down into sub-plans. Two foundational ideas crystallise from his essay:
- Subconscious information processing: most of the planning that turns goals into movements happens below awareness.
- Task analysis: a complex cognitive ability can be understood by decomposing it into a hierarchy of simpler sub-tasks (the methodological backbone of cognitive science).
1.3 Operant (instrumental) conditioning
Edward L. Thorndike (1898): puzzle box
Cats in a puzzle box escape by trial and error. Over 20–30 trials, the time to escape drops sharply.
Law of Effect: "Responses that produce a satisfying effect in a particular situation become more likely to occur again in that situation, and responses that produce a discomforting effect become less likely."
B. F. Skinner: Skinner box
Animal stays in the box and can repeatedly produce operant responses. Replaced Thorndike's mentalistic "satisfaction" with the behaviour-neutral term reinforcement.
Used variable-ratio schedules to explain gambling addiction (Skinner 1953).
Reinforcement schedules
| Schedule | Rule | Behaviour produced |
|---|---|---|
| Fixed-Ratio (FR) | Reinforce after every nth response (FR-5 = every 5th) | Fast, steady responding |
| Variable-Ratio (VR) | Average of n responses per reward, varies unpredictably | Fastest responding; most resistant to extinction (slot machines!) |
| Fixed-Interval (FI) | First response after a fixed time interval is reinforced | "Scalloping": responding speeds up near the interval's end |
| Variable-Interval (VI) | Interval varies unpredictably | Slow, steady responding |
Ratio > Interval because in ratio schedules reinforcers scale with response rate; in interval schedules they're time-capped.
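The ratio-versus-interval point can be checked with a little arithmetic. The sketch below is a deliberately simplified model, not a simulation of real animals: the function names, the 600-second session and the FR-5 / FI-60 parameters are all illustrative assumptions, and responses are assumed to be spread evenly over the session.

```python
def reinforcers_ratio(responses: int, n: int) -> int:
    """FR-n: one reinforcer for every n-th response."""
    return responses // n


def reinforcers_interval(responses: int, session_s: int, interval_s: int) -> int:
    """FI: at most one reinforcer per elapsed interval (simplified:
    responses are assumed to be spread evenly across the session)."""
    return min(responses, session_s // interval_s)


# Hypothetical 600-second session comparing FR-5 with FI-60.
for responses in (50, 500, 5000):
    fr = reinforcers_ratio(responses, n=5)
    fi = reinforcers_interval(responses, session_s=600, interval_s=60)
    print(f"{responses:>5} responses: FR-5 earns {fr:>4}, FI-60 earns {fi:>2}")
```

Under these assumptions the ratio payoff keeps growing with the response count, while the interval payoff is capped by the clock, which is why responding faster only pays on ratio schedules.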
Positive reinforcement = arrival of a stimulus increases the response. Negative reinforcement = removal of a stimulus increases the response. Both make behaviour more likely (≠ punishment).
Shaping, discrimination, concept learning
- Shaping = training complex behaviour by reinforcing successive approximations (dog → kitchen → refrigerator → door → scratching).
- Discriminative stimulus = signal that a response will be reinforced (a light being on).
- Concept learning: pigeons rewarded for pecking Monet (not Picasso) generalize to Cézanne and Renoir, so a category "impressionist" forms.
1.4 Cognition and computation
If humans can simulate a single-tape Turing machine (slowly, inefficiently), then the brain is Turing-complete. McCulloch & Pitts built networks of neurons from three principles:
- Basic physiology
- Propositional logic
- Turing's theory of computation
Their results: any computable function can be computed by a network of neurons; all logical operators can be built from simple neural networks.
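The claim that logical operators can be built from simple neural units can be illustrated with threshold units. This is a minimal sketch in the spirit of McCulloch & Pitts, not their original notation: the function names, weights and thresholds are assumptions chosen for illustration, and a unit is assumed to fire when its weighted input sum exceeds the threshold.

```python
def threshold_unit(inputs, weights, threshold):
    """Return 1 if the weighted input sum exceeds the threshold, else 0."""
    total = sum(i * w for i, w in zip(inputs, weights))
    return 1 if total > threshold else 0


def AND(a, b):
    return threshold_unit((a, b), (1, 1), 1.5)   # needs both inputs on


def OR(a, b):
    return threshold_unit((a, b), (1, 1), 0.5)   # one input is enough


def NOT(a):
    return threshold_unit((a,), (-1,), -0.5)     # inhibitory weight


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", AND(a, b), "OR:", OR(a, b))
print("NOT 0:", NOT(0), "NOT 1:", NOT(1))
```

Because AND, OR and NOT suffice to express any formula of propositional logic, wiring such units together yields networks for arbitrary logical functions.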
Textbook adds: Chomsky's Syntactic Structures (1957)
Chomsky distinguished the deep structure of a sentence (its constituent phrase structure) from its surface structure (the actual word order, derived via transformational rules).
Phrase-structure grammar
Sentences = combinations of basic parts of speech (N, V, Adj, NP, VP…) generated by recursive phrase-structure rules (e.g., S → NP + VP).
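As a rough illustration of how rewrite rules such as S → NP + VP generate sentences, the sketch below expands a toy grammar. The rule set and vocabulary are invented for this example (they are not Chomsky's), and the toy grammar is finite, although the same expansion function would handle self-embedding rules given a depth limit.

```python
# Toy phrase-structure rules: each symbol maps to its possible expansions.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
    "Det": [["the"]],
    "N": [["dog"], ["ball"]],
    "V": [["chased"]],
}


def generate(symbols):
    """Yield every word sequence derivable from a list of symbols."""
    if not symbols:
        yield []
        return
    first, rest = symbols[0], symbols[1:]
    if first not in RULES:                 # terminal word: keep it
        for tail in generate(rest):
            yield [first] + tail
    else:                                  # non-terminal: rewrite it
        for expansion in RULES[first]:
            yield from generate(expansion + rest)


sentences = [" ".join(words) for words in generate(["S"])]
for sentence in sentences:
    print(sentence)
```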
Transformational grammar
Maps deep structure to surface structure. Explains why "John has hit the ball" and "The ball has been hit by John" share a meaning despite different surface forms; and why "Susan is easy to please" ≠ "Susan is eager to please" despite a near-identical surface.
This was the first time a linguist offered an explanatory account of language structure rather than just classification. It became the model for algorithmic theories of mental capacities.
1.5 The mind as an information processor
George A. Miller (1956), "The magical number 7 (± 2)": human channel capacity ≈ 3 bits ≈ 7 items, roughly independent of modality. Measured by:
- Digit-span task: repeat back the longest sequence of digits you can hold.
- Absolute judgment task: identify stimuli along one dimension.
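The link between bits and items is a base-2 logarithm: distinguishing n equally likely alternatives takes log2(n) bits. The helper names below are illustrative only.

```python
import math


def bits_for_items(n_items: int) -> float:
    """Bits needed to distinguish n equally likely alternatives."""
    return math.log2(n_items)


def items_for_bits(bits: float) -> float:
    """Number of alternatives that a given number of bits can distinguish."""
    return 2 ** bits


print(round(bits_for_items(7), 2))   # bits for 7 alternatives
print(items_for_bits(3))             # alternatives covered by 3 bits
```

Seven alternatives need a little under 3 bits and 3 bits cover 8 alternatives, which is why "3 bits" and "7 items" are only approximately equal.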
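Psychophysics: Weber's law. The just-noticeable difference (jnd) ΔI is a constant fraction k of the baseline intensity I, i.e. ΔI / I = k: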
- k = 0.03 for weight (3% change detectable)
- k = 0.01 for length
- k = 0.25 for sound frequency in mice
The same absolute difference (10 units) is easy to detect at low magnitude (10 vs 20) and hard at high magnitude (110 vs 120). The detection probability curve runs sigmoidally from 0% (no difference) through 50% (the jnd) to 100% (clear difference).
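A short calculation shows why the same 10-unit difference behaves so differently at the two magnitudes. It assumes the standard statement of Weber's law, ΔI = k · I; the function names are illustrative.

```python
def jnd(baseline: float, k: float) -> float:
    """Just-noticeable difference predicted by Weber's law: delta_I = k * I."""
    return k * baseline


def relative_change(baseline: float, comparison: float) -> float:
    """Size of a change expressed as a fraction of the baseline."""
    return abs(comparison - baseline) / baseline


# The same absolute difference of 10 units at two magnitudes.
print(relative_change(10, 20))                 # 100% change
print(round(relative_change(110, 120), 3))     # roughly 9% change

# With k = 0.03 for weight, the jnd grows with the baseline.
print(jnd(100, 0.03), jnd(1000, 0.03))
```

What matters for detection is the relative change compared with k, not the absolute difference.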
1.6 Attention: reducing information load
Cherry's dichotic listening / shadowing: participant repeats one ear's story aloud and cannot report the content of the other ear. They do notice physical changes (voice pitch shift, sudden tones).
Three arguments AGAINST early selection
Breakthrough (Moray 1959)
Your own name in the ignored ear penetrates the filter, so meaning must have been processed.
Switching (Treisman 1960)
When the shadowed story switches ears, participants follow it, meaning the ignored ear was being parsed.
GSR (Corteen & Wood 1972)
Words PARIS, LONDON, CAIRO conditioned to a shock. Later, ROME in the ignored ear evokes a fear response: the semantic category "city" was activated.
Three alternatives to early selection
| Model | Author | Claim |
|---|---|---|
| Late selection | Deutsch & Deutsch | All stimuli processed for meaning; ignored ones quickly forgotten. |
| Attenuation | Anne Treisman | Ignored info is attenuated, not blocked. Important info (your name) is spared. |
| Load theory | Nilli Lavie | Distractor processing depends on how much capacity the main task leaves over. |
Self-test · Lecture 1
- Habituated rats would still freeze to the light because the light → freezing bond is independent.
- Habituated rats would NOT freeze to the light because it triggers the (now habituated) representation of the sound.
- Habituated rats would freeze more strongly to the light due to dishabituation.
- Habituation should generalize to all conditioned stimuli regardless of pairing.

- Fixed-ratio
- Fixed-interval
- Variable-ratio
- Continuous reinforcement

- 3 bits (≈ 7 items), roughly modality-independent
- 7 bits (≈ 128 items), strongly modality-dependent
- 1 bit (binary), modality-dependent
- 10 bits (≈ 1000 items), strongly visual

- Supports Broadbent's strict early-selection filter
- Demonstrates that unattended stimuli are processed semantically
- Shows that GSR is unrelated to attention
- Confirms Pavlovian extinction

- It was newer terminology
- "Satisfaction" implied a mentalistic / unobservable inner state, which behaviourism avoided
- Thorndike's law was already discredited
- It allowed Skinner to include classical conditioning under the same heading
Three Milestones of Cognitive Science
Three foundational achievements: SHRDLU (language as algorithmic processing), the imagery debate (spatial vs propositional representation), and Marr's three levels (computational / algorithmic / implementational). Cognitive science matures by treating the mind as a system that operates over internal representations.
2.1 Language and micro-worlds
ELIZA Weizenbaum, 1966
Keyword matching + transformation rules simulating a psychotherapist. Created "the illusion of understanding."
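To make "keyword matching + transformation rules" concrete, here is a minimal sketch of the idea. The two rules, the fallback reply and the function name are invented for illustration and are not taken from Weizenbaum's actual script.

```python
import re

# Each rule pairs a keyword pattern with a response template (hypothetical rules).
RULES = [
    (re.compile(r"\bI am (.+)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]
DEFAULT_REPLY = "Please go on."


def respond(utterance: str) -> str:
    """Return the reply of the first rule whose keyword pattern matches."""
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            # Reuse the matched words, minus trailing punctuation.
            captured = [group.rstrip(".!?") for group in match.groups()]
            return template.format(*captured)
    return DEFAULT_REPLY


print(respond("I am unhappy."))
print(respond("My mother hates me"))
print(respond("Hello"))
```

Nothing in this loop represents what the words mean, which is the sense in which such a program only creates an illusion of understanding.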
"I had not realized that extremely short exposures to a relatively simple computer program could induce powerful delusional thinking in quite normal people."
PARRY Colby, 1972 · Stanford
Simulated a paranoid schizophrenic; its inconsistencies made it more realistic. In a modified Turing test, 33 psychiatrists classified transcripts at only 48% accuracy (chance).
SHRDLU Winograd, 1970 · MIT
Operated in a colored-blocks micro-world with a robot arm. Could parse and respond to queries like "Does the shortest thing the tallest pyramid's support supports support anything green?"
SHRDLU's three processing stages:
[Diagram: three stages in sequence; surviving labels: analysis → analysis → world knowledge.]
Ambiguity demonstration: "Put the red cube on the block in the box." Two readings: [red cube on the block] / [in the box] vs [red cube] / [on the block in the box]. Syntactic parsing alone is insufficient; world knowledge is required to disambiguate.
2.2 The imagery debate: what is mental imagery?
Spatial / depictive
Mental images preserve metric/spatial properties. Same mechanisms as perception. Image of your room has actual layout.
Propositional
Mental images are symbolic, sentence-like ("the pizza was on the dining table"). No spatial format.
Evidence favouring the spatial view:
- Detail recognition: to identify small details in an imagined object, you must "zoom in", which is incompatible with a propositional format.
- Physical-property effects: brightness, contrast, motion speed affect reaction times the same way for perceived and imagined stimuli.
- "Imagine your dinner" effect: people with bigger houses take longer to mentally answer spatial questions about them. It takes time to travel in our mind.
- Mental rotation (Cooper & Shepard 1973): reaction time to judge whether the letter R is normal or mirrored grows linearly with the angle of rotation, peaking near 180°.
2.3 Marr's three levels of analysis
David Marr (1945–1980), neuroscientist of vision. Two prizes are named after him (IEEE/ICCV; Cognitive Science Society).
| Level | Question | Example for vision |
|---|---|---|
| Computational | What problem is being solved? Input → output? | Recover 3-D structure from 2-D retinal image |
| Algorithmic | How is it solved? What representations and operations? | Edge detection → 2½-D sketch → 3-D model |
| Implementational | How is this physically realized? | Neurons in V1, V2, V4, IT |
Exam trap: classic distractors swap the algorithmic and implementational levels. The algorithm describes representations and steps; the implementation describes the physical substrate.
2.4 Marr's three stages of vision
[Diagram: primal sketch (edges, blobs) → 2½-D sketch (surface orientations; viewer-centered) → 3-D sketch (object-centered; generalized cones).]
The 2½-D sketch is viewer-centered (depends on where you stand); the 3-D sketch is object-centered (invariant to viewpoint). The 2½-D stage uses stereopsis and Gestalt laws.
Contour-related demonstrations: Hidden Dalmatian (low contour image, hard to perceive) · Kanizsa's triangle (illusory contours filled in) · Camouflage = reducing contours to defeat the primal sketch.
2.5 Perceptual constancies
Size constancy
Two retinally same-sized people can be different real sizes; depth cues correct for this.
Brightness constancy
Patches A and B can reflect the same light but appear differently bright (the shaded one must be brighter).
Shape constancy
Shepard's tables: two tables look like the same shape on the retina but legs/perspective tell us one is actually longer.
2.6 Object categorization β four theories
| Theory | Claim | Strength / weakness |
|---|---|---|
| Categorization by definition | Membership = necessary + sufficient features (cat = furry + meows + four legs…) | Counterexamples break every definition (hairless, non-meowing cats?); members share only a family resemblance |
| Categorization by prototype | A category = an idealized average; typicality = closeness to prototype | Robin verified faster than penguin as "bird" (sentence verification task) |
| Categorization by exemplars | Encountered instances stored individually; typicality falls out of frequency | Best for small categories |
| Recognition-by-components (Biederman) | Objects = combinations of 36 geons (geometric ions) | Decomposition-based view; matches Marr's "generalized cones" idea |
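The prototype row of the table can be sketched numerically: treat the prototype as the average of the category members' feature vectors and typicality as closeness to that average. The four binary features and their values below are made up for illustration; they are not data from the sentence-verification studies.

```python
import math

# Hypothetical features: (flies, sings, small, lives_in_trees)
BIRDS = {
    "robin":   (1, 1, 1, 1),
    "sparrow": (1, 1, 1, 1),
    "eagle":   (1, 0, 0, 1),
    "penguin": (0, 0, 0, 0),
}


def prototype(exemplars):
    """Idealized average: the mean of each feature across all exemplars."""
    count = len(exemplars)
    return tuple(sum(column) / count for column in zip(*exemplars))


def distance_to_prototype(features, proto):
    """Euclidean distance; a smaller distance means a more typical member."""
    return math.dist(features, proto)


bird_prototype = prototype(list(BIRDS.values()))
ranking = sorted(BIRDS, key=lambda name: distance_to_prototype(BIRDS[name], bird_prototype))
print(bird_prototype)
print(ranking)   # most typical first
```

On these invented features the robin sits closer to the prototype than the penguin, mirroring the faster verification of "a robin is a bird".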
2.7 Categorization hierarchies: expertise effect
- "furniture": large within-category difference
- "chair": default level
- "Barcelona Chair": small within-category difference
Non-experts default to the basic level; experts categorize at a more specific level within their domain. Example: Barcelona Chair (Mies van der Rohe & Lilly Reich, 1929).
Self-test · Lecture 2
- Implementational
- Algorithmic
- Computational
- Connectionist

- Propositional theory of imagery
- Spatial/depictive theory of imagery
- The exemplar theory of categorization
- Marr's 3-D sketch

- Primal sketch
- 2½-D sketch
- 3-D sketch
- None: all are object-centered

- A simple program could pass a modified Turing test with psychiatrists (≈ 48% accuracy)
- Schizophrenia is a purely computational disorder
- Real understanding requires biological substrate
- SHRDLU could not handle ambiguity

- 3
- 12
- 24
- 36

- Definitional categorization
- Prototype theory (typicality effects)
- Biederman's geons
- Family resemblance is irrelevant for birds
The Turn to the Brain
From the immaterial soul to localized neural circuits. Phrenology is wrong in particulars but not absurd in spirit: some abilities really do map onto specific brain regions. The lecture surveys dualism vs materialism, basic neuroanatomy, and a parade of neuropsychological case studies that establish localization of function.
3.1 Philosophical roots
Dualism (Descartes 1596–1650)
Two substances: material body and immaterial soul. Animals (dogs) lack souls. The soul interacts with the body via the pineal gland. Only humans need a soul (for thinking, believing).
Critique: how does the immaterial interact with the material? Places mind outside science.
Materialism (Hobbes 1588–1679)
Everything is material; soul is a meaningless concept. All human behaviour = physical processes in the brain. Thought is anchored in neural firing.
The view that lets the brain become an object of science.
3.2 Psychiatry then and now
- 1247: Bethlem Royal Hospital ("bedlam") founded. Closed institution; no treatment, just isolation.
- 18th century: rest, cleanliness, regularity. King George III of England (1738–1820) went mad and recovered, taken as proof that mental illness can pass.
- 20th century: frontal lobotomy, i.e. removal/disconnection of the frontal lobe "to reduce the complexity of psychic life." Required a neurosurgeon.
- Walter Freeman (1945): transorbital lobotomy with an icepick through the eye socket. Performed on > 40,000 people in the US. Side effects: cognitive impairment, flattened affect.
- Paul Broca (1824–1880): founder of clinical neuropsychology. Showed that damage to Broca's area impairs speech but not other abilities, an early empirical case for localization.
3.3 The brain in numbers (Carl Sagan's "very big place in a very small space")
- ≈ 1,300 g total mass
- ~ 10¹¹ neurons total (20 × 10⁹ neocortical)
- ~ 15 × 10¹³ cortical synapses
- Two hemispheres joined by the corpus callosum; lateralized despite sensory/motor symmetry.
The neuron
Neurotransmitters
| Neurotransmitter | Primary function |
|---|---|
| Acetylcholine | Muscle contraction |
| Serotonin | Sleep, mood, arousal |
| Glutamate | Learning, memory (excitatory) |
| GABA | Inhibitory transmitter |
| Norepinephrine | Arousal, wakefulness |
| Dopamine | Motivation, emotion |
3.4 Aphasias: language disorders
Aphasia affects ~35% of stroke patients.
Broca's aphasia (non-fluent)
Speech is effortful, telegraphic, but meaningful. "Cat… sit… mat." Comprehension largely preserved. Damage to Broca's area (frontal lobe, BA 44).
Wernicke's aphasia (fluent)
Speech is fluent but lacks content/meaning ("word salad"). Damage to Wernicke's area (temporal lobe).
3.5 Split-brain: corpus callosotomy
Cutting the corpus callosum treats refractory epilepsy. No major IQ, conversational, or coordination deficit. But experiments reveal striking dissociations:
- Object presented to left visual field (→ right hemisphere) → patient says "There is no object."
- But the left hand (also right-hemisphere) can pick out the same object from a collection perfectly.
- Patients confabulate to explain their left-hand's behaviour.
3.6 Catalogue of deficits
Unilateral spatial neglect
Not blind, but ignores one side (usually left). Misses left side of objects when drawing; makes too many right turns; forgets the tens/hundreds in mental arithmetic. Can affect peri-personal space and even visual imagery. Implicit knowledge about neglected items may be preserved.
Visual agnosia
Cannot recognize visually presented objects despite intact vision, memory, language, and intelligence. Two flavours:
- Form agnosia: cannot perceive shape
- Integrative agnosia: perceives shapes but cannot integrate them. Patient HJA (Humphreys & Riddoch) could only describe a lion by inferring from parts: "A heavy, four-legged animal… these stripes mean something… I suppose it's a lion."
Blindsight
Cortical blindness (V1 damage) → no conscious vision, yet forced-choice responses are above chance for movement, orientation, even some shapes. "How can I look at something that I haven't seen?" Evidence for two visual systems: a primitive unconscious one and a conscious cortical one.
Prosopagnosia (face blindness)
Cannot recognize faces. Compensation via voice, gait, glasses, hair. Often bilateral damage to the fusiform face area (FFA).
3.7 Two visual streams
Dorsal "where/how"
Action stream. Damage: neglect, apraxia, erratic grasping. The "what pathway" is intact: the patient can name and describe objects but can't grasp them properly.
Ventral "what"
Perception stream. Damage: visual agnosia, prosopagnosia. The "where pathway" is intact: the patient can post a card "as if mailing a letter" through a rotated slot, but cannot report the slot's orientation.
Textbook adds: Petersen et al. (1988), PET subtraction logic
An early functional-neuroimaging landmark for single-word processing. Subjects performed a hierarchy of four tasks; each scan was subtracted from the one above to isolate the new component:
[Diagram: four-task hierarchy. Surviving labels, in order: baseline; "… of words" (+ visual); (+ articulation); "… for each noun" (+ semantic access).]
The pattern of activations supported a parallel (not strictly serial) model of single-word processing. Significance: this study established the paired-subtraction paradigm that all later fMRI cognitive subtraction designs inherited.
Textbook adds: Logothetis (2001), what does BOLD actually measure?
The crucial follow-up question after the fMRI revolution: BOLD measures blood oxygenation, but is that correlated with neuronal output (spikes) or input (synaptic activity)? Logothetis put both fMRI and non-magnetic microelectrodes in an anaesthetised monkey's V1 during a rotating-checkerboard stimulus.
| Signal | What it indexes | Correlated with BOLD? |
|---|---|---|
| Single-unit / multi-unit firing (SDF, MUA) | Neural output (spikes) | Adapts after 2 s → decouples from BOLD |
| Local Field Potential (LFP) | Neural input (summed synaptic activity, low-pass-filtered) | Tracks BOLD throughout the trial |
Implication: fMRI activation in a region reflects information arriving there, not necessarily spikes leaving. A region can show BOLD without firing more, which undermines naive "this region does X" inferences.
3.8 Capgras delusion: the inverse of prosopagnosia
Delusion that a loved one has been replaced by an identical-looking impostor. Explicit recognition is intact, but the implicit/emotional response is missing. Confabulation: "This person looks like my partner, but I don't feel the same about them, so it must be someone else."
| | Explicit recognition | Implicit/emotional recognition |
|---|---|---|
| Prosopagnosia | ✗ lost | ✓ intact |
| Capgras | ✓ intact | ✗ lost |
Self-test · Lecture 3
- Hippocampus
- Pineal gland
- Corpus callosum
- Thalamus

- Broca's aphasia
- Wernicke's aphasia
- Prosopagnosia
- Blindsight

- Capgras delusion is a single-pathway disorder
- Ventral pathway = perception; Dorsal pathway = action (action is preserved)
- Dorsal damage causes visual agnosia
- Neglect affects only the left hemifield

- Amplitude-modulated and graded
- Binary and frequency-modulated
- Continuous chemical signals
- Always inhibitory

- Loss of both explicit and implicit recognition of faces
- Loss of explicit recognition with intact emotional recognition
- Intact explicit recognition with lost emotional/implicit recognition
- A pure form of prosopagnosia

- Be verbally named correctly
- Not be named verbally, but the left hand can correctly select it
- Cause complete blindness in that hemifield
- Trigger seizures
Strategies for Brain Mapping
No single neuroscientific technique sees the whole picture. Each method trades off temporal resolution against spatial resolution. Mastery means knowing which tool to reach for, what it measures, and the conceptual logic of subtraction and double dissociation.
4.1 Anatomical classification
- Surface classification: gross anatomy (gyri, sulci, lobes).
- Cellular classification (Brodmann): Brodmann used staining to identify ~52 areas with distinct neuronal populations. Principle of Segregation: "Cerebral cortex can be classified into different areas with unique neuronal populations."
| Brodmann area | Function |
|---|---|
| BA 1–3 | Primary somatosensory cortex |
| BA 4 | Primary motor cortex |
| BA 17 | Primary visual cortex (V1) |
| BA 44 | Broca's area (language production) |
DTI (diffusion tensor imaging, the basis of diffusion tractography) = MRI-based visualization of white-matter fibre tracts. It measures anatomical connectivity, not function.
4.2 The big methodological tradeoff
| Technique | Directly measures | Temporal | Spatial |
|---|---|---|---|
| Single-unit recording | Action potentials of individual neurons | High | High |
| EEG | Electrical activity of large neural populations (scalp) | High | Low |
| MEG | Magnetic fields from electrical population activity | High | Low–medium |
| PET | Cerebral blood flow | Low | High |
| fMRI | Blood oxygen levels (BOLD) | Low | High |
Exam-critical: fMRI does NOT measure neural activity directly. It measures the BOLD signal: "changes in magnetic properties of haemoglobin in the blood due to brain activation."
4.3 ERP components
An ERP (event-related potential) is a time-locked average of EEG to a specific event.
| Component | Latency | Indexes |
|---|---|---|
| P1 / N1 | ~ 100–150 ms | Early sensory + attention (enhanced for attended stimuli) |
| P300 | ~ 300 ms | General cognitive processing / oddball detection |
| N400 | ~ 400 ms | Semantic processing (e.g., "He spread butter on his socks") |
| P600 | ~ 600 ms | Syntactic reanalysis |
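The phrase "time-locked average" can be made concrete: cut the continuous EEG into equal-length epochs starting at each event, then average them sample by sample so that activity unrelated to the event cancels out. The function name and the tiny six-sample "recording" below are illustrative assumptions, not real data.

```python
def erp_average(signal, event_indices, window):
    """Average fixed-length epochs of `signal` that start at each event index."""
    epochs = [
        signal[start:start + window]
        for start in event_indices
        if start + window <= len(signal)      # skip incomplete epochs
    ]
    if not epochs:
        raise ValueError("no complete epochs to average")
    return [sum(samples) / len(epochs) for samples in zip(*epochs)]


# Toy recording: the same event response [0, 2, 0] occurs twice,
# each time mixed with noise of opposite sign.
recording = [1, 1, 1, -1, 3, -1]
erp = erp_average(recording, event_indices=[0, 3], window=3)
print(erp)
```

In the toy recording the noise cancels and the underlying response shape is recovered; real ERPs need many more trials for the same effect.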
4.4 The Locus of Selection problem
Does attention modulate processing before or after a stimulus representation is built?
- ERP timing: P1 and N1 are enhanced for attended stimuli → early modulation.
- Macaque microelectrode recordings localize attentional modulation to V1–V4, before object recognition or access to meaning.
- Resolution: combining ERP (timing) with microelectrodes (location) shows attention acts early, in pre-representational visual cortex.
4.5 fMRI logic: subtraction & hierarchical design
To localize a function, contrast a task that requires it against a near-identical task that doesn't. Hierarchical lexical-access design:
[Diagram: hierarchical design. Surviving labels, in order: control → visual → + articulation → + semantics.]
Each contrast isolates the new component recruited at that step.
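The subtraction logic amounts to a region-by-region difference between two activation maps. In the sketch below the region names and activation values are invented purely to show the arithmetic; they are not results from any study.

```python
def subtract(task, control):
    """Activation in `task` beyond what `control` already explains."""
    return {region: task[region] - control.get(region, 0.0) for region in task}


# Hypothetical activation levels (arbitrary units) for two adjacent steps.
view_words = {"visual": 1.0, "motor_speech": 0.0, "semantic": 0.0}
read_aloud = {"visual": 1.0, "motor_speech": 0.5, "semantic": 0.0}

articulation_only = subtract(read_aloud, view_words)
print(articulation_only)
```

Only the component added at that step survives the subtraction. The inference is valid only if the added component leaves the earlier components unchanged, an assumption the contrast itself cannot verify.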
4.6 Owen et al. (2006): detecting awareness in the vegetative state
Published in Science 313:1402. A patient diagnosed with unresponsive wakefulness syndrome (UWS, the vegetative state) was asked to imagine either playing tennis (engages motor areas, including the supplementary motor area, SMA) or walking through her house (engages parahippocampal regions). She produced the appropriate, task-specific BOLD activation on command, which was taken as evidence of covert awareness despite the absence of any behavioural response. This paradigm became a yes/no communication channel for "vegetative" patients.
Self-test · Lecture 4
- Action potentials of individual neurons
- Changes in haemoglobin's magnetic properties (BOLD signal)
- Cerebral blood flow via radioactive tracers
- Magnetic fields generated by neural firing

- fMRI
- PET
- EEG (looking at the N400)
- DTI

- BA 4
- BA 17
- BA 44
- BA 1–3

- EEG recordings of N400 during speech
- fMRI during imagined tennis vs imagined spatial navigation
- PET imaging of dopamine receptors
- Single-unit recording in V1

- Attention modulates visual processing late, after object recognition
- Attention modulates visual processing early, in V1–V4 before recognition
- Attention is purely a frontal phenomenon
- Attention does not modulate sensory areas at all
Connectionism
Two starting points in Marr's hierarchy give rise to two paradigms. Symbolic AI starts from the mind (algorithms, interpretable rules). Connectionism starts from the brain (biology has already produced intelligence). The result: networks whose knowledge is a pattern of weights, not a list of beliefs.
5.1 Two starting points
Symbolic AI (top-down)
Start at the algorithmic level. Explicit symbol manipulation. Interpretable and fittable. Cognition as rule-based operations on discrete symbols (the Physical Symbol System hypothesis).
Connectionism (bottom-up)
Start at the implementational level. Biology has produced intelligence, so use it as inspiration. Deep nets are powerful but hard to interpret. Knowledge is in the weight vector.
Textbook adds: The Physical Symbol System Hypothesis (Newell & Simon, 1976)
The lecture's "Symbolic AI" pole is grounded in a specific thesis the textbook treats as foundational. Allen Newell & Herbert Simon (Turing Award lecture, 1976):
"A physical symbol system has the necessary and sufficient means for general intelligent action." (Newell & Simon, 1976)
Two claims packed in: (i) necessity: anything intelligent must be a physical symbol system; (ii) sufficiency: building one is enough to produce intelligence.
Four defining features of a physical symbol system
- Symbols are physical patterns (inscriptions on a tape, voltage states, neural firings).
- Symbols can be combined into complex structures via recursive rules (like sentences in propositional logic).
- The system contains processes that transform symbol structures in rule-governed ways. This is thinking.
- Those transformation processes can themselves be represented as symbols within the system (meta-representation).
Cognition, on this view, is heuristic search through a problem space. Newell & Simon's General Problem Solver (GPS) applied means-end analysis: compute the difference between the current and goal state, pick an operator that reduces it, apply, repeat. The PSSH defines what connectionism rejects.
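The compute-difference, pick-operator, apply, repeat loop can be sketched on a toy numeric problem. This is only a greedy difference-reduction loop under invented operators; it is not a reconstruction of GPS, which also sets up sub-goals when an operator cannot be applied directly.

```python
def means_end(start, goal, operators, max_steps=100):
    """Repeatedly apply whichever operator most reduces the distance to the goal."""
    state, plan = start, []
    for _ in range(max_steps):
        if state == goal:
            return plan
        # Choose the operator whose result lies closest to the goal.
        name, apply_op = min(operators.items(), key=lambda item: abs(goal - item[1](state)))
        if abs(goal - apply_op(state)) >= abs(goal - state):
            return None                     # no operator reduces the difference
        state = apply_op(state)
        plan.append(name)
    return None


# Hypothetical operators on a number line.
OPERATORS = {
    "add 10": lambda s: s + 10,
    "add 1": lambda s: s + 1,
    "subtract 1": lambda s: s - 1,
}

print(means_end(0, 23, OPERATORS))
```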
5.2 From biology to schematic neurons
5.3 Learning: the delta rule (single-layer)
ΔT = −ε    ΔWᵢ = ε · Iᵢ    (ε = output error, i.e. target output minus actual output; both changes are scaled by a learning rate)
The perceptron convergence rule: training will find a solution in every case where a solution is possible. But which functions ARE possible?
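A minimal sketch of the rule in action, trained on AND (a linearly separable function). It assumes that ε denotes the output error (target minus actual output), that a unit fires when its weighted input sum exceeds T, and a learning rate of 0.25. The function names and the zero starting values are illustrative choices.

```python
AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]


def predict(inputs, weights, threshold):
    """Fire (1) when the weighted input sum exceeds the threshold."""
    total = sum(i * w for i, w in zip(inputs, weights))
    return 1 if total > threshold else 0


def train(data, learning_rate=0.25, max_epochs=100):
    """Apply the delta rule until a full pass over the data produces no errors."""
    weights, threshold = [0.0, 0.0], 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, target in data:
            error = target - predict(inputs, weights, threshold)
            if error != 0:
                mistakes += 1
                # Delta W_i = error * I_i and Delta T = -error, both scaled.
                weights = [w + learning_rate * error * i for w, i in zip(weights, inputs)]
                threshold -= learning_rate * error
        if mistakes == 0:
            break
    return weights, threshold


weights, threshold = train(AND_DATA)
print(weights, threshold)
print([predict(inputs, weights, threshold) for inputs, _ in AND_DATA])
```

Raising a weight only when its input was active, and lowering the threshold when the unit should have fired, is all the rule does; for separable problems that is enough to converge.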
5.4 The XOR problem
The single-layer perceptron cannot learn XOR β it oscillates, never converges. Why?
| I₁ | I₂ | XOR | Contradiction |
|---|---|---|---|
| 0 | 0 | 0 | 0 must not exceed T, so T ≥ 0 |
| 1 | 0 | 1 | (1·W₁) > T |
| 0 | 1 | 1 | (1·W₂) > T |
| 1 | 1 | 0 | but then (W₁+W₂) > 2T ≥ T, so the output would also be 1. Impossible. |
Perceptrons only learn linearly separable functions. XOR is not linearly separable: you cannot draw a single straight line that separates the (1,0) and (0,1) cases from (0,0) and (1,1).
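The impossibility can also be checked by brute force. The sketch below tries every combination of two weights and a threshold on a coarse grid (an arbitrary choice of values from −2 to 2 in steps of 0.25) and asks whether any single threshold unit reproduces a given truth table. A grid search is an illustration, not a proof; the proof is the contradiction in the table.

```python
from itertools import product

CASES = [(0, 0), (0, 1), (1, 0), (1, 1)]
GRID = [step / 4 for step in range(-8, 9)]      # -2.0, -1.75, ..., 2.0


def unit(i1, i2, w1, w2, threshold):
    """Single threshold unit: fire when the weighted sum exceeds the threshold."""
    return 1 if i1 * w1 + i2 * w2 > threshold else 0


def solvable(truth_table):
    """True if some (w1, w2, threshold) on the grid reproduces the truth table."""
    return any(
        all(unit(i1, i2, w1, w2, t) == truth_table[(i1, i2)] for i1, i2 in CASES)
        for w1, w2, t in product(GRID, repeat=3)
    )


AND_TABLE = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
OR_TABLE = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}
XOR_TABLE = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

print("AND:", solvable(AND_TABLE))
print("OR: ", solvable(OR_TABLE))
print("XOR:", solvable(XOR_TABLE))
```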
5.5 The escape route: multi-layer networks & backpropagation
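- Universal approximation theorem: given enough hidden units, a multi-layer network can approximate any continuous function to arbitrary accuracy (the lecture's stronger phrasing: it can compute any Turing-computable function).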
- But the perceptron rule no longer works, because hidden units have no target activation.
- Backpropagation calculates each hidden unit's "share of responsibility" for the output error and uses it to update weights.
- Gradient descent: follow the negative gradient of the error surface; stop when the gradient is zero. Risk: local minima ≠ global minimum.
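The local-minimum risk in the last bullet is easy to see on a one-dimensional "error surface". The quartic below is an arbitrary illustrative function with two valleys, not the error function of any real network; the learning rate and step count are likewise assumptions.

```python
def error(x):
    """Toy error surface with a deep valley on the left and a shallow one on the right."""
    return x**4 - 3 * x**2 + x


def gradient(x):
    """Derivative of the toy error surface."""
    return 4 * x**3 - 6 * x + 1


def descend(x, learning_rate=0.01, steps=5000, tolerance=1e-10):
    """Follow the negative gradient until it is (numerically) zero."""
    for _ in range(steps):
        g = gradient(x)
        if abs(g) < tolerance:
            break
        x -= learning_rate * g
    return x


left = descend(-2.0)     # ends in the deeper valley
right = descend(2.0)     # gets stuck in the shallower valley
print(round(left, 3), round(error(left), 3))
print(round(right, 3), round(error(right), 3))
```

Both runs stop where the gradient is zero, but only the run that started on the left reaches the lower error. Which minimum is found depends on where descent starts.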
5.6 Biological plausibility β the critiques
- Schematic neurons ≠ real ones; questions of parallelism and scale.
- No evidence backpropagation occurs in the brain.
- How would the brain set the number of hidden units?
- No evidence individual neurons receive error signals from all downstream neurons.
- Most biological learning is not supervised.
5.7 Cognitive implications β distributed vs local representations
- Knowledge lies in a pattern of weights, not in any one unit.
- A trained network does not need a separate unit per feature.
- Processing = input vector × weight vector. No discrete beliefs, no explicit rules.
- Algorithmic in a limited sense: the learning rule and activation function are algorithms, but they're not task-specific and they don't operate over explicit representations.
Conclusion: "The nature of representations and computation in neural networks is fundamentally different compared to physical symbol systems."
Self-test Β· Lecture 5
- AND
- OR
- NAND
- XOR
Show answer
- Be trained by the perceptron convergence rule
- Always find the global minimum
- Compute any Turing-computable function (given enough hidden units)
- Encode any function in a single unit
Show answer
- ΔWᵢ = ε · Iᵢ (scaled by learning rate)
- ΔWᵢ = −ε
- ΔWᵢ = T · Iᵢ
- ΔWᵢ = Wᵢ²
Show answer
- It only works for linearly separable functions
- It produces local minima too often
- There is no evidence the brain implements it (no mechanism for propagating error signals through every synapse)
- It is slower than the perceptron rule
Show answer
- Each unit represents one feature
- Knowledge lies in the pattern of weights across many units
- Information is stored explicitly as symbols
- Each layer represents a different category
Show answer
Modularity of Mind & Dynamical Systems
Three rival pictures of mental architecture: (1) Fodor's classical modularity, with domain-specific input modules plus a non-modular central system; (2) massive modularity (Cisek-style evolutionary view), with no central processor at all; (3) dynamical systems theory, which treats cognition as a process that evolves in time, possibly without representations or computation.
6.1 Agents: three tiers
Reflex agents
IF→THEN production rules. Not a cognitive system. No information processing, just acting on information. Examples: thermostat, zebrafish C-start reflex, somatic reflex.
Goal-based agents
Evaluate consequences of possible actions in light of goals (foraging). No learning.
Learning agents
Detect errors. Experiment with new strategies in light of past failures.
6.2 Classical (Fodorian) modularity
Aristotelian roots: horizontal faculties (perception, attention, memory) are domain-general; vertical faculties are domain-specific (colour, shape, face/voice, grammar, conspecific recognition).
Input modules (Fodor)
- Domain-specific
- Mandatory
- Information-encapsulated
- Fast
- Fixed neural architecture
- Specific breakdown patterns
Central processing (Fodor)
- Domain-general
- Information-un-encapsulated (isotropic)
- Slow
- Voluntary control
- Diffuse neural structures
- Personal-level propositional attitudes
Evidence cited: lesion studies, Broca's vs Wernicke's aphasia, brain mapping.
6.3 Massive modularity (Cisek 2019)
The radical alternative: there is no domain-general central processor. The mind is hundreds or thousands of genetically specified Darwinian modules selected for specific adaptive problems.
- "The most important thing about the brain is that it evolved."
- Domain-general learning mechanisms cannot detect statistically recurrent domain-specific structure.
- Each module exploits specialized, domain-specific rules.
- Descriptive vs pragmatic representations: control loops only need action-oriented (pragmatic) representations, not world models.
- Input-output functionalism ignores the cyclical nature of behaviour.
- No single decision-making system, just domain-specific competition mechanisms.
- Conceptual maps emerge from learning on top of sensorimotor loops, so there is no symbol-grounding problem.
Textbook adds: the cheater-detection module (Wason & Cosmides/Tooby)
The textbook's flagship case study for a Darwinian module. The Wason selection task: four cards (E, K, 4, 7); rule "If a card has a vowel on one side, then it has an even number on the other". Which to turn? Correct answer: E (modus ponens) and 7 (modus tollens). Most subjects say E and 4, a famous failure of abstract conditional reasoning.
Griggs & Cox (1982) reframed the same logical task as a deontic conditional: "If a person is drinking beer, then that person must be over 19" with cards BEER, COKE, 16, 25. Now subjects answer correctly (BEER, 16) at near-ceiling rates.
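Both versions share one logical recipe: for a rule "if P then Q", only a card showing P or a card showing not-Q can expose a violation. A minimal sketch of that recipe follows; the function name `cards_to_turn` and the two predicate arguments are assumptions made for the example.

```python
def cards_to_turn(cards, is_p, is_not_q):
    """For a rule 'if P then Q', only P cards and not-Q cards can falsify it."""
    return [card for card in cards if is_p(card) or is_not_q(card)]

# Abstract version: "if vowel, then even number".
abstract = cards_to_turn(
    ["E", "K", "4", "7"],
    is_p=lambda c: c in "AEIOU",
    is_not_q=lambda c: c.isdigit() and int(c) % 2 == 1,
)

# Deontic version: "if drinking beer, then over 19".
deontic = cards_to_turn(
    ["BEER", "COKE", "16", "25"],
    is_p=lambda c: c == "BEER",
    is_not_q=lambda c: c.isdigit() and int(c) <= 19,
)

print(abstract)  # ['E', '7']
print(deontic)   # ['BEER', '16']
```

The same function solves both tasks, which is the point of the comparison: the logic is identical, and only the framing changes how well people do.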
Cosmides & Tooby argued the improvement reveals a domain-specific, evolved cheater-detection module for social-exchange reasoning. The argument links to the evolution of cooperation via the TIT FOR TAT strategy in indefinitely-iterated prisoner's dilemmas: applying TIT FOR TAT requires identifying defectors, so natural selection would favour a module specialised for spotting them.
| Two general arguments for massive modularity | Cosmides & Tooby's claim |
|---|---|
| Argument from error | Fitness criteria are domain-specific (treating kin, finding mates, detecting cheaters all differ), so no domain-general cognitive mechanism could have evolved. |
| Argument from statistics & learning | Domain-general learning mechanisms cannot detect statistically recurrent domain-specific patterns (e.g., Hamilton's kin-selection equation). |
6.4 Dynamical systems theory (van Gelder 1995)
"What might cognition be, if not computation?" – Tim van Gelder, 1995
Cognition as a process that evolves through time, not necessarily involving computation or representations.
Traditional cogsci
Cognition = information processing = manipulating representations. Discrete steps. Symbols, rules.
Dynamical systems
Cognition = continuous trajectory through state space. Described by difference equations (discrete) or differential equations (continuous).
- State space = geometric space of all possible system states. Each independently varying quantity = one dimension.
- Trajectory = path through state space from initial conditions.
- Two senses of "dynamical system": trivial (anything that evolves in time) vs technical (analyzable with DST tools).
Textbook adds: ACT-R as a hybrid architecture (Anderson, CMU)
Where Soar (Newell, Laird, Rosenbloom) is purely symbolic, ACT-R ("Adaptive Control of Thought-Rational") is the canonical hybrid architecture: symbolic and subsymbolic at once.
Symbolic layer
Chunks in declarative memory (knowledge-that, e.g., "7+6=13"). Production rules in procedural memory (knowledge-how, IF→THEN). All built from physical symbols.
Subsymbolic layer
Each production rule and chunk has a numerical activation/utility value. A pattern-matching module performs a Bayesian-style cost-benefit calculation to pick which rule fires next, with no central executive.
Take-away: modularity and PSSH-style processing can coexist with neural-net-style subsymbolic selection. Cognitive architecture is not all-or-nothing.
6.5 Worked example 1: Ising network model of depression (Cramer et al. 2016)
Traditional latent-variable view: gallstones cause nausea, abdominal pain, heartburn; a single hidden cause produces all symptoms.
The network view of psychopathology: symptoms are nodes (active = 1, inactive = 0) coupled by weights Wij. Activation propagates via a logistic function. Stress = extra input to all nodes.
- Depression evolves as a self-sustaining network of interacting symptoms.
- Insight into cognition without representations or computations.
- Same approach extends to bipolar disorder, generalized anxiety, attitude models.
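A toy simulation shows how stress can push such a network from a mostly symptom-free state into a self-sustaining symptomatic one. This is a sketch under loud assumptions, not Cramer et al.'s fitted model: the uniform weights, the thresholds, the stress values and the one-node-at-a-time update scheme are all hypothetical choices made for illustration.

```python
import math
import random

def simulate(weights, thresholds, stress, steps=2000, seed=0):
    """Return the mean fraction of active symptoms over a stochastic run."""
    rng = random.Random(seed)
    n = len(thresholds)
    state = [0] * n  # every symptom starts inactive
    total_active = 0
    for _ in range(steps):
        i = rng.randrange(n)  # update one randomly chosen symptom
        net = stress + sum(weights[i][j] * state[j] for j in range(n) if j != i)
        p_on = 1.0 / (1.0 + math.exp(thresholds[i] - net))  # logistic function
        state[i] = 1 if rng.random() < p_on else 0
        total_active += sum(state)
    return total_active / (steps * n)

n = 5
weights = [[1.0] * n for _ in range(n)]  # hypothetical uniform coupling
thresholds = [3.0] * n                    # hypothetical activation thresholds

calm = simulate(weights, thresholds, stress=0.0)
stressed = simulate(weights, thresholds, stress=4.0)
print(round(calm, 2), round(stressed, 2))
```

Nothing in the code represents "depression" as a hidden cause. The high-symptom state arises because active symptoms keep switching each other on, which is the network view's central claim.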
6.6 Worked example 2: Decision Field Theory (Busemeyer et al. 2019)
Choosing among multiple options (e.g., three phones differing in price, OS, battery, speed). Preference state P evolves over time:
The dynamics of a connectionist accumulator predict preferences, response times, and choice proportions as emergent properties of system evolution, not computations on symbols.
Take-away framing: "The behavior of the system as a whole is of interest; less focus on the computations on underlying representations, or even on architecture."
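To make "preference evolving over time" concrete, here is a rough accumulator sketch in the spirit of Decision Field Theory. Everything specific is an assumption made for illustration: the linear update (each preference decays slightly, is inhibited by its rivals, and receives a momentary valence), the parameter values, and the names `decision_field`, `self_feedback` and `inhibition`. It is not the lecture's equation.

```python
import random

def decision_field(valence, self_feedback=0.95, inhibition=0.02,
                   noise_sd=0.0, threshold=3.0, max_steps=10_000, seed=0):
    """Accumulate preference until one option crosses the threshold.

    Returns (index of the chosen option, number of steps taken).
    """
    rng = random.Random(seed)
    n = len(valence)
    pref = [0.0] * n
    for step in range(1, max_steps + 1):
        new_pref = []
        for i in range(n):
            rivals = sum(pref[j] for j in range(n) if j != i)
            momentary = valence[i] + rng.gauss(0.0, noise_sd)
            new_pref.append(self_feedback * pref[i] - inhibition * rivals + momentary)
        pref = new_pref
        best = max(range(n), key=lambda i: pref[i])
        if pref[best] >= threshold:
            return best, step
    return None, max_steps

# Three hypothetical phones with mean valences 0.1, 0.3 and 0.2.
choice, steps = decision_field([0.1, 0.3, 0.2])
print(choice, steps)
```

With `noise_sd=0.0` the run is deterministic and the option with the highest valence wins. Raising `noise_sd` and varying `seed` turns the same dynamics into a source of choice proportions and response-time distributions, with the step count playing the role of response time.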
Self-test · Lecture 6
- Domain-specific
- Mandatory
- Information-encapsulated
- Isotropic
Show answer
- There are no input modules at all
- There is no domain-general central processor: the mind is modules all the way down
- Modules cannot evolve
- Symbol grounding is impossible
Show answer
- Massive modularity
- Dynamical systems theory
- Classical AI
- The Physical Symbol System hypothesis
Show answer
- A reduction in weight strengths
- Extra activation input to all symptom nodes
- A change in the logistic activation function
- An additional hidden node
Show answer
- A representation
- A trajectory through state space
- An algorithm
- A symbol grounding
Show answer
Bayesianism in Cognitive Science
Three ideas: (1) belief comes in degrees, (2) those degrees obey probability calculus, (3) learning = updating probabilities via Bayes' rule. The lecture's punchline: Bayesianism is the normative ideal, but humans systematically fail to reason like Bayesians.
7.1 The probability calculus rules
- Probabilities ∈ [0, 1].
- Impossible sentences = 0; necessary truths (2+2=4) = 1.
- If P and Q are logically equivalent, p(P) = p(Q).
- Negation: p(¬S) = 1 − p(S).
- Disjunction (mutually exclusive): p(R ∨ S) = p(R) + p(S).
- Conjunction (independent): p(R ∧ S) = p(R) × p(S).
- Conditional: p(A | B) = p(A ∧ B) / p(B).
7.2 Bayes' rule
p(H | E) = p(E | H) · p(H) / p(E). The denominator is computed by marginalization: p(E) = p(E | H) · p(H) + p(E | ¬H) · p(¬H)
7.3 Why the laws are objectively correct: Dutch books
A Dutch book is a set of bets that (1) the subject considers fair given their personal probabilities, but that (2) guarantee they lose money no matter what. Anyone whose beliefs violate probability calculus can be Dutch-booked. Therefore: rational degrees of belief must obey probability calculus.
Sam example: Sam believes "2+2=4" with probability 90%. Offer: he gets $0.90; he pays $1 if the bucket has 4 marbles. He thinks it's fair (EV = 0.90 − 0.9 × 1 = 0). But 2+2 IS 4, so he always loses $0.10.
7.4 Worked example: ESP / clairvoyance
"Extraordinary claims require extraordinary evidence." – Carl Sagan (1978)
Clairvoyant correctly predicts 100 coin tosses. Should you believe in ESP?
- P(predict | ESP) = 0.9
- P(ESP) = 10⁻¹² (very skeptical prior)
- P(predict | ¬ESP) = 2⁻¹⁰⁰ ≈ 7.9 × 10⁻³¹
Posterior P(ESP | predict) ≈ 1 − 10⁻¹⁸. Almost certain, until you add the "trick" hypothesis with P(trick) = 10⁻⁶. Both ESP and a trick predict the run about equally well, so the comparison comes down to the priors: P(trick) / P(ESP) = 10⁻⁶ / 10⁻¹² = 10⁶.
It's a million times more likely you were tricked than that ESP is real. The moral: tiny priors over fraud can outweigh enormous likelihoods.
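The numbers above can be checked with exact arithmetic; ordinary floats cannot represent a value as close to 1 as 1 − 10⁻¹⁸. The sketch below is illustrative only. The helper `posteriors` is a made-up name, and the likelihood of 1 for the trick hypothesis is an assumption (a competent trick always succeeds), not a figure from the lecture.

```python
from fractions import Fraction

def posteriors(priors, likelihoods):
    """Bayes' rule over mutually exclusive, exhaustive hypotheses."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(joint.values())  # marginal probability of the evidence
    return {h: joint[h] / evidence for h in joint}

p_esp = Fraction(1, 10**12)
p_trick = Fraction(1, 10**6)
chance = Fraction(1, 2**100)  # 100 correct calls by luck alone

# Two hypotheses: ESP vs luck.
two = posteriors(
    {"esp": p_esp, "luck": 1 - p_esp},
    {"esp": Fraction(9, 10), "luck": chance},
)

# Three hypotheses: add a trick, assumed here to succeed with certainty.
three = posteriors(
    {"esp": p_esp, "trick": p_trick, "luck": 1 - p_esp - p_trick},
    {"esp": Fraction(9, 10), "trick": Fraction(1), "luck": chance},
)

print(float(1 - two["esp"]))                 # distance of the posterior from 1
print(float(three["trick"] / three["esp"]))  # odds of trick against ESP
```

With only two hypotheses the posterior for ESP is within about 10⁻¹⁸ of certainty. Once the trick hypothesis is on the table it absorbs nearly all of that probability, and ESP ends up roughly a million times less likely than fraud.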
7.5 Worked example: COVID-19 base-rate problem
Prevalence 1/1000. False positive rate 5%. You test positive. P(disease | positive) = ?
Reason over 1000 people:
- 1 person actually has it (and, assuming no false negatives, tests positive).
- Of 999 healthy people, 5% test positive falsely → about 50.
- 51 positives total; only 1 is actually sick → ~2%, not 95%.
Why we still trust tests: in real life the sample is not random. There's a reason you were tested, so the relevant base rate is much higher.
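The same calculation in code makes the role of the base rate explicit. This is a minimal sketch: the function name `posterior` is made up, the test is assumed to have no false negatives (as the head count above implicitly does), and the 30% prior for a symptomatic patient is a made-up number used only to show the direction of the effect.

```python
def posterior(prior, p_pos_given_h, p_pos_given_not_h):
    """p(H | E) via Bayes' rule, with p(E) obtained by marginalization."""
    evidence = p_pos_given_h * prior + p_pos_given_not_h * (1 - prior)
    return p_pos_given_h * prior / evidence

# Random screening: prevalence 1/1000, 5% false positives, no false negatives.
random_sample = posterior(prior=0.001, p_pos_given_h=1.0, p_pos_given_not_h=0.05)

# Tested because of symptoms: the 30% prior is purely illustrative.
symptomatic = posterior(prior=0.30, p_pos_given_h=1.0, p_pos_given_not_h=0.05)

print(f"{random_sample:.1%}")
print(f"{symptomatic:.1%}")
```

The test is identical in both calls. Only the prior changes, and the posterior moves from about 2% to well above 80%.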
7.6 The transposed-conditional fallacy
p(A | B) ≠ p(B | A).
- A = "is a white American man," B = "is a US senator"
- p(A | B) ≈ 0.9 (most senators are white American men)
- p(B | A) ≈ 0.00000009 (almost no white American men are senators)
Textbook adds: perception as Bayesian inference (Helmholtz → Hohwy)
The textbook frames Bayesianism in cognition as the modern formalisation of Hermann von Helmholtz's 19th-century proposal that perception is unconscious inference. The proximal sensory input radically underdetermines the distal world, so the brain must infer what is out there using stored knowledge about how the world tends to be.
- Hypothesis (H) = candidate layout of the distal environment.
- Evidence (E) = retinal stimulation.
- Likelihood p(E | H) = how probable this image is given that layout.
- Prior p(H) = how probable that layout is in general.
- Gestalt principles (continuity, proximity, good form, common fate) function as Bayesian priors over scene structure.
Case study: Binocular rivalry (Hohwy, Roepstorff & Friston, 2008)
Present a red iron to the left eye and a green violin to the right eye. Perception alternates between the two, never settling on a stable composite. Why?
- H1 = red iron, H2 = green violin, H3 = composite "red-green iron-violin".
- Likelihoods of the conflicting retinal input are roughly equal across H1–H3.
- But p(H1) ≈ p(H2) ≫ p(H3): the prior on composite objects is tiny.
- Posteriors: H1 and H2 tied, H3 ruled out. The visual system flips between the two equally-supported hypotheses rather than averaging them.
Binocular rivalry is a rational Bayesian response, not a glitch, and a key example for predictive coding theories of perception.
7.7 Bayesian search theory (MH370)
For each grid cell i: pi (probability object is there), ai (probability of finding it if there), ci (cost of searching).
After each miss, redistribute the posterior over remaining cells and recompute.
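A minimal sketch of one search-and-update cycle follows. It assumes the cell to search is the one with the highest pi · ai / ci (detection probability per unit cost) and applies Bayes' rule after an unsuccessful search. The three-cell grid and all of its numbers are hypothetical, as are the function names.

```python
def best_cell(p, a, c):
    """Pick the cell with the highest detection probability per unit cost."""
    return max(range(len(p)), key=lambda i: p[i] * a[i] / c[i])

def update_after_miss(p, a, searched):
    """Posterior over cells after searching one cell and finding nothing."""
    miss = 1 - p[searched] * a[searched]  # overall probability of a miss
    return [
        p[i] * (1 - a[i]) / miss if i == searched else p[i] / miss
        for i in range(len(p))
    ]

p = [0.5, 0.3, 0.2]   # hypothetical prior that the object is in each cell
a = [0.4, 0.9, 0.8]   # hypothetical detection probabilities
c = [1.0, 1.0, 2.0]   # hypothetical search costs

first = best_cell(p, a, c)
p_next = update_after_miss(p, a, first)
print(first, [round(x, 3) for x in p_next])
```

Note that the most probable cell is not searched first here: cell 0 has the highest prior, but detection there is poor. After the miss, the searched cell's probability drops and the others rise, so the next call to `best_cell` may pick a different cell.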
7.8 Where humans fail: heuristics & biases
Availability heuristic
Judge probability by ease of recall. Therapist who just saw three depressed patients overestimates depression in the next.
Gambler's fallacy (predictable-world bias)
After 4 tails, you "feel" heads is due. But independent tosses have no memory.
Probability matching
Die with 4 red / 2 green sides. Maximizing (always red) → 67% correct. Matching (red 2/3, green 1/3) → 56%. Humans match; mice maximize. In stochastic processes, maximizing > matching.
Base-rate neglect
The COVID example above. Also: at the NY subway you see someone reading the NYT. Is it a better bet that she has a PhD or no college degree? Far more non-graduates ride the subway, so no degree is the better bet.
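The 67% and 56% figures in the probability-matching example follow from a one-line expectation: a guess is right when it is "red" and the die shows red, or "green" and the die shows green. A small sketch with exact fractions; the function name `accuracy` is made up for the example.

```python
from fractions import Fraction

p_red = Fraction(4, 6)  # four of the six sides are red

def accuracy(guess_red):
    """Expected hit rate when 'red' is guessed with probability guess_red."""
    return guess_red * p_red + (1 - guess_red) * (1 - p_red)

maximizing = accuracy(Fraction(1))  # always guess red
matching = accuracy(p_red)          # guess red as often as red occurs

print(maximizing, float(maximizing))
print(matching, float(matching))
```

Maximizing gives 2/3 and matching gives 5/9. Any mixture of guesses does worse than always picking the more likely colour, because every "green" guess trades a 2/3 chance of being right for a 1/3 chance.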
7.9 The Linda problem: conjunction fallacy (Tversky & Kahneman)
Linda is 31, single, outspoken, very bright. Majored in philosophy; concerned with discrimination/social justice; antinuclear protests. Rank the probability:
- F. Linda is a bank teller
- H. Linda is a bank teller and active in the feminist movement
Most people rank H > F. But "feminist bank tellers" are a strict subset of "bank tellers." Specifying more detail can only LOWER probability, never raise it.
7.10 Bayesian view of psychopathology
Schizophrenic delusions can involve affirming the consequent: "Jesus had stigmata; I have stigmata; therefore I am Jesus." Rokeach (1964), "The Three Christs of Ypsilanti": three paranoid schizophrenic men, each believing he was Jesus, were housed together for two years; their beliefs barely shifted, showing the difficulty of revising delusional priors.
Self-test · Lecture 7
- Posterior
- Likelihood
- Prior
- Marginal evidence
Show answer
- 95%
- 50%
- 2%
- 0.1%
Show answer
- It actually IS more probable
- The conjunction fallacy: people judge by representativeness, not probability
- Feminist bank tellers form a superset
- It is a Bayesian-correct judgment
Show answer
- Their willingness to gamble
- Violations of probability calculus in their personal probabilities
- Their use of the availability heuristic
- Their priors being too small
Show answer
- Probability matching (predict red 2/3, green 1/3) → 56%
- Maximizing (always predict red) → 67%
- Always predict green
- Predict whichever was last seen
Show answer
- pi · ai
- pi · ai / ci
- pi / ai
- ci · ai
Show answer
Language Learning
Three paradigms applied to language: symbolic (Fodor's Language of Thought, Chomsky's innatism), connectionist (neural nets that reproduce children's overgeneralization), and Bayesian (statistical learning of word boundaries and anaphora). The lecture closes with the LLM revolution and what it implies about innateness.
8.1 What is language understanding?
- Semantics: meaning of words.
- Syntax: structure; surface vs deep structure.
- "Colorless green ideas sleep furiously": syntactically well-formed, semantically anomalous.
- "John has hit the ball" / "The ball has been hit by John": two surface structures, one deep structure.
Strong vs weak mastery: are linguistic rules explicitly represented in the head (strong sense, Fodor/Chomsky) or merely obeyed in behaviour (weak sense, connectionist/Bayesian)?
Key permission slip: "Rule-governed phenomena need not come from rule-governed information-processing structures." This opens the door to connectionism and Bayesianism for language.
8.2 Fodor's Language of Thought (Mentalese)
Learning a language requires being able to evaluate truth conditions: "'The cat is on the mat' iff there's a cat and there's a mat and the cat is on the mat." Circularity problem: you can't learn this in English without already knowing what "cat" and "mat" are.
Solution: an innate symbolic medium, Mentalese. Slogan: "You cannot use the language you're learning to learn."
8.3 Nicaraguan Sign Language
In the 1970s, a school for deaf children in Nicaragua tried to teach Spanish by finger-spelling. Instead, the children spontaneously generated their own sign language. Documented by linguist Judy Kegl. Later generations added structural features like spatial modulation, which is taken as evidence for innate language abilities.
8.4 Three paradigms for past-tense learning
The English past tense is a microcosm. Children show two features: (1) follow the "-ed" rule ("walked"); (2) handle exceptions ("gave"). Crucially, they make overgeneralization errors ("goed") that come and go in a gradual learning curve.
Dual-route (symbolic)
Two separate systems: (1) associative memory for irregulars; (2) an explicit "-ed" rule for regulars.
Plunkett & Marchman (1993)
One connectionist network: 20 input + 30 hidden + 20 output units; phonological input → phonological output. Reproduces overgeneralization and gradual learning without explicit rules.
Textbook adds: Rumelhart & McClelland (1986), the original past-tense network
Before Plunkett & Marchman there was the Rumelhart & McClelland (1986) PDP model, the founding connectionist past-tense network, published in the two-volume Parallel Distributed Processing.
| | R&M (1986) | Plunkett & Marchman (1993) |
|---|---|---|
| Architecture | Simple pattern associator, no hidden units | 20-30-20 with hidden layer |
| Input encoding | Wickelfeatures (after Wickelgren): context-sensitive phoneme codes | Raw phonological input |
| Learning rule | Perceptron convergence | Backpropagation |
| Training regime | 10 high-frequency verbs → suddenly expanded to 410 medium-frequency (80% regular) | 20 verbs (half regular, half irregular), gradually expanded |
| Reproduced overgeneralization? | Yes, but Pinker & Prince argued it was baked in by the sudden vocabulary jump | Yes, without the question-begging schedule |
The lineage matters because Pinker & Prince's critique of R&M is what motivated the dual-route symbolic model. The later Plunkett & Marchman result rebuts that critique on the connectionist side: overregularization emerges from co-presence of regulars and irregulars, not from training-set manipulation.
8.5 Bayesian language learning
(a) Word segmentation via transitional probabilities
For the sound string /k/ /ae/ /t/ /m/ /i/ /aʊ/ /z/ ("cat meows"):
- p(/ae/ | /k/) → high (within-word transition)
- p(/t/ | /ae/) → high
- p(/m/ | /t/) → low; this dip signals a word boundary
Same logic scales up to word-level transitional probabilities for sentence boundaries.
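The segmentation idea can be tried on a toy stream. The sketch below is illustrative only: letters stand in for phonemes, the three-word lexicon is made up, and a fixed cutoff of 0.75 replaces the relative "dip" a real learner would have to detect.

```python
from collections import Counter

def transitional_probabilities(stream):
    """p(next | current) for every adjacent pair of sounds in the stream."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

def segment(stream, tp, cutoff):
    """Place a boundary wherever the transitional probability is below cutoff."""
    words, current = [], stream[0]
    for prev, nxt in zip(stream, stream[1:]):
        if tp[(prev, nxt)] < cutoff:
            words.append(current)
            current = nxt
        else:
            current += nxt
    words.append(current)
    return words

# Toy "speech stream": no pauses or spaces between the words.
word_order = ["cat", "dog", "pig", "dog", "cat", "pig", "cat", "dog", "pig"]
stream = "".join(word_order)

tp = transitional_probabilities(stream)
print(tp[("c", "a")], round(tp[("t", "d")], 2))
print(segment(stream, tp, cutoff=0.75))
```

Within a word each sound predicts the next perfectly, while the sound that ends a word is followed by several different word onsets, so its transitional probabilities are lower. The learner recovers the words without ever being told where they begin or end.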
(b) Pronominal anaphora (Lidz et al.)
"I'll play with this red ball, and you can play with that one." Does "one" refer to H1 = a ball or H2 = a red ball?
- P(H | S) ∝ P(S | H) · P(H)
- Children learn P(S | H) from experience.
- Since P(S | H2) > P(S | H1), the most likely intended referent is "the red ball."
Reference: "What children know about syntax but could not have learnt."
8.6 LLMs: how they work
[Pipeline diagram. Labels, in order: ≈ morpheme; 100s of dims, learned; weighted recombination; prediction.]
- Autoregression: feed each prediction back as input to predict the next.
- Transformer paper: "Attention is all you need" (Vaswani et al., 2017).
- Attention heads recode each token as a learned weighted combination of all tokens. Stacked hundreds of times, purely feedforward.
- The final encoding of the last token IS the prediction of the next token. Deterministic; randomness added post-hoc.
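The "weighted combination of all tokens" step can be sketched in a few lines. This is a deliberately stripped-down illustration, not a transformer implementation: real attention heads first pass each token through learned query, key and value projections, which are omitted here, and the 2-D embeddings are made-up numbers.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Recode one token as a weighted combination of all value vectors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    mixed = [
        sum(w * value[d] for w, value in zip(weights, values))
        for d in range(len(values[0]))
    ]
    return weights, mixed

# Hypothetical 2-D embeddings for a three-token context.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, recoded = attend(query=tokens[2], keys=tokens, values=tokens)
print([round(w, 3) for w in weights], [round(x, 3) for x in recoded])
```

The last token ends up as a blend of all three embeddings, weighted by how similar each is to it. Stacking many such recodings, with learned projections in between, is what produces the final encoding from which the next token is predicted.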
Embedding arithmetic β words as vectors
biggest − big + small = smallest
Paris − France + Germany = Berlin
Doctor − man + woman = nurse (bias embedded)
Two-stage training
- Pre-training: mask the next word in billions of internet texts; backprop until predictions improve. (Tends to complete rather than reply.)
- RLHF (Reinforcement Learning from Human Feedback): humans rate outputs; network updated toward higher-rated predictions. Makes models conversational.
LLM knowledge ≈ long-term semantic memory (fuzzy, can hallucinate); LLM prompts ≈ working memory (relevant info inserted reduces hallucinations).
8.7 Big debates: does this refute Chomsky?
Piantadosi (2023)
"LLMs refute Chomsky." A pure text-prediction net acquires grammatical structure with no innate machinery. Proof of principle that syntax can be acquired without innate structure.
Bender et al. (2021): Stochastic Parrots
"An LM is a system for haphazardly stitching together sequences of linguistic forms... according to probabilistic information about how they combine, but without any reference to meaning: a stochastic parrot."
Other critiques:
- Grounding problem: LLMs don't know what a banana tastes like.
- Mitchell & Krakauer (2023): humans learn concepts; they abstract, reason compositionally and counterfactually, intervene on the world, and explain.
- Guest & Martin (2023): multiple realizability, i.e. same outputs do not imply same mechanism.
- Training data asymmetry: GPT-3 saw ~4 × 10¹¹ words; a 5-year-old does causal reasoning on 4–5 orders of magnitude less input. Possible explanations: nativism, multi-modal grounding, active/social learning, or "comparing apples and pears."
- Binz & Schulz (2023, PNAS): gave GPT-3 the Wason selection task. It got the canonical version right but failed ~50% of trials, with human-like error patterns.
Self-test · Lecture 8
- Language can only be learned from a teacher
- You cannot use the language you are learning to learn it, so the medium of learning must already be in place (Mentalese)
- Children must first learn to speak English before they can think
- Truth conditions are unlearnable
Show answer
- Children do not actually overgeneralize
- A single network without explicit rules can reproduce both overgeneralization errors and gradual learning curves
- Past-tense learning requires the dual-route architecture
- Symbolic AI is sufficient for language learning
Show answer
- They are high within words and low across word boundaries
- They are low within words and high across word boundaries
- They are uniform across the speech stream
- They depend on the listener's prior vocabulary
Show answer
- Have genuine semantic understanding
- Refute Chomsky's nativism
- Stitch linguistic forms together by probability without reference to meaning
- Will eventually become AGI
Show answer
- It shows that without explicit instruction, deaf children spontaneously generated a structured language with later generations adding features like spatial modulation
- It shows that sign language is impossible without spoken language input
- It refutes Chomsky
- It demonstrates LLM emergence
Show answer
- A hidden state to be discarded
- The prediction of the next token
- A summary of the entire vocabulary
- An attention weight matrix
Show answer
Consciousness
Consciousness has two distinct dimensions (wakefulness and awareness), splits into easy vs. hard problems (Chalmers), and admits multiple competing theories. The empirical methods of cognitive science can address the easy problems; whether they can address the hard problem is a matter of fierce debate.
9.1 Two dimensions of consciousness (Laureys 2005)
Wakefulness
A state. Gradual. Varies over time. Objectively measurable (EEG, behaviour).
Awareness
An experience. To be conscious of something. First-person; not externally observable.
| State | Sleep/wake cycle? | Reactions? | Awareness? |
|---|---|---|---|
| Coma | No | Only reflexes | None |
| UWS (vegetative) | Yes | Autonomic, eye-opening, reflex behaviour | None (apparent) |
| MCS (minimally conscious) | Yes | + some non-reflex movements (fixations, follow commands) | Some |
| LIS (locked-in) | Yes | Cannot move a muscle (minimal eye movement at most) | Fully awake and conscious |
In a study of 54 UWS/MCS patients, 5 could modulate their brain activity on command, as measured with fMRI: they were conscious despite the clinical diagnosis.
9.2 The knowledge argument: Mary's Room (Jackson 1986)
"Mary is confined to a black-and-white room… she knows all the physical facts about us and our environment… It seems, however, that Mary does not know all that there is to know. For when she is let out of the black-and-white room or given a color television, she will learn what it is to see something red…" – Frank Jackson, 1986
Formal argument:
- Inside the room, Mary has complete knowledge of how the brain processes colour.
- So she knows everything about the information-processing of red.
- When she leaves the room, she acquires new knowledge: what red is like.
- Therefore, some aspects of conscious experience cannot be understood in terms of information processing.
9.3 Non-conscious processing β priming & dissociations
Strategy: contrast processing that works without awareness with processing that requires it.
- Face/Tool priming: categorization is faster when the prime is congruent (face primes face).
- Word priming: "DOG" is recognized faster after "CAT" than after "CAR", so the priming is semantic, not visual.
- Non-conscious priming is short-lived across SOA; consciousness allows information to be retained over time.
- Double dissociation: in disorder 1, A intact and B impaired; in disorder 2, B intact and A impaired. Strong evidence for separable mechanisms.
Neglect
Lesions to right parietal/frontal. Patients lack awareness of contralesional (typically left) space, yet implicit processing of neglected information can occur.
Blindsight
Lesions to V1. Patients are aware they cannot see ("How can I look at something I haven't seen?") yet, when forced to guess, are correct above chance for movement, orientation, even some shapes.
9.4 What is consciousness for?
Patients in blindsight or neglect never voluntarily act on stimuli in their affected field. The lecture's claim:
Consciousness permits identification of targets and planning of deliberate, voluntary action.
Milner & Goodale: two visual streams revisited
Dorsal: vision for action
Online motor control. Non-conscious. Not fooled by the Ebbinghaus illusion: grip aperture is veridical even when perceived size is illusory.
Ventral: vision for perception
Conscious perception, deliberate-action planning. Damaged in patient D.F. (visual form agnosia): she can post a card into a slot but cannot report its orientation.
Fang & He (2005, Nature Neuroscience): interocular suppression renders a stimulus invisible. Result: robust dorsal activity even when the stimulus is invisible, but ventral activity tracks conscious perception. Conscious awareness is restricted to the ventral pathway.
9.5 Block's distinction
Phenomenal consciousness (P)
Raw experience. Qualia β the felt redness of red, the painfulness of pain. The "what it is like" aspect.
Access consciousness (A)
Reportable. Direct control of thought, reasoning, speech, action. Information available to the global workspace.
9.6 Chalmers: easy vs hard problems
Easy problems
- Discrimination, categorization, reaction to environment
- Integration of information
- Reportability of mental states
- Internal-state access
- Focus of attention
- Deliberate behavioural control
- Wakefulness vs sleep
The Hard problem
"There is something it is like to be a conscious organism… the felt quality of redness, the sound of a clarinet, the smell of mothballs… What unites all of these states is that there is something it is like to be in them." – Chalmers, 1995
Why is there subjective experience at all? Why isn't the information processing happening "in the dark"?
9.7 Three theories of consciousness
Global (Neuronal) Workspace
(Baars · Dehaene & Changeux). Theatre metaphor: attention spotlights content on stage; the audience receives the broadcast; backstage processes shape what gets in. Hierarchical, distributed architecture broadcasts integrated info brain-wide → reportability. Pyramidal neurons may form the substrate. Claims to dissolve the hard problem.
Integrated Information Theory (IIT)
(Tononi). Consciousness = integrated information (Φ). A system is conscious to the extent it has irreducible cause-effect structure. Named in lecture without detailed Φ calculation.
Recurrent Processing (RP)
(Lamme). Recurrent loops in sensory cortex generate phenomenal experience. Named only.
Textbook adds: Higher-Order Theories (Rosenthal, Armstrong, Lycan, Carruthers)
The textbook treats HOT as one of the major theories alongside GWT, IIT, and recurrent processing; the lecture only names it. Core claim: a mental state is conscious iff it is the object of a suitable higher-order mental state. The very same first-order state can be conscious at one moment and nonconscious at another, depending on whether something higher is "watching" it.
HOP: Higher-Order Perception (Armstrong, Lycan)
A first-order state becomes conscious when an inner sense (introspection / a quasi-perceptual scanner) targets it. Objection: defining inner sense via "awareness" risks circularity; sensory representations of abstract thoughts (the Pythagorean theorem) seem implausible.
HOT: Higher-Order Thought (Rosenthal)
A first-order state is conscious iff accompanied by a thought about it. Empirical support: Lau & Passingham used visual masking to vary subjective awareness while behavioural accuracy stayed constant; awareness tracked activation in dorsolateral PFC (BA 46).
Standard objection to both: when I consciously smell a rose, I am aware of the rose, not of a mental state representing the rose, so HOT/HOP seem to misdescribe the phenomenology. They also struggle to explain why consciousness has any distinctive functional role if first-order states behave identically with or without an accompanying HOT.
9.8 Dennett's deflationary view: the vitalism analogy
"The so-called hard problem of consciousness will disappear once we have a good enough understanding of the various phenomena lumped together under the label 'access consciousness'." – Daniel Dennett (1942–2024)
Analogy: in the 19th century, biology and chemistry seemed incapable in principle of explaining what separates living from non-living matter, so it was thought there must be an élan vital ("vital force"). As biology matured, vitalism evaporated. Same fate (Dennett predicts) awaits the hard problem.
9.9 Cognitive scientists vs Mysterians
- Cognitive scientists: consciousness is a thriving research programme; the easy problems are tractable, and progress on them will eventually dissolve the hard problem.
- Mysterians: consciousness is, in principle, beyond cognitive-scientific tools.
Self-test · Lecture 9
- Frank Jackson
- Ned Block
- David Chalmers
- Daniel Dennett
Show answer
- Daniel Dennett, defending illusionism
- Frank Jackson (1986), the knowledge argument
- Ned Block, P-consciousness
- Giulio Tononi, IIT
Show answer
- Tononi
- Chalmers
- Ned Block
- Baars
Show answer
- No sleep/wake cycle, no awareness
- Sleep/wake cycle and some minimal non-reflex movements
- Inability to move any muscle but with full awareness and consciousness
- A coma
Show answer
- Both dorsal and ventral activity require conscious vision
- Dorsal-stream activity persists for invisible stimuli; ventral activity does not
- Ventral-stream activity persists for invisible stimuli; dorsal does not
- Conscious vision recruits only V1
Show answer
- The phlogiston theory of combustion
- Vitalism / élan vital
- Cartesian dualism
- Phrenology
Show answer
- Integrated Information Theory (Tononi)
- Higher-Order Theory (Rosenthal)
- Global (Neuronal) Workspace Theory (Baars / Dehaene)
- Recurrent Processing Theory (Lamme)
Show answer
Cross-cutting Themes
The lectures keep circling the same fault lines from different angles. Spotting them is half of cognitive science.
Theme 1: Three paradigms (symbolic / connectionist / Bayesian-dynamical)
| Paradigm | Starts from | Knowledge lives in | Star applications in the course |
|---|---|---|---|
| Symbolic / Classical | Algorithmic level (Marr); the mind | Explicit rules + symbols | SHRDLU, Fodor's Mentalese, dual-route past tense |
| Connectionist | Implementational level; the brain | Pattern of weights | Plunkett & Marchman past tense, LLMs |
| Bayesian / Dynamical | Computational behaviour over time | Probabilities or state-space trajectories | Word segmentation, Lidz anaphora, Ising depression, DFT |
Theme 2: Localization vs holism / encapsulation vs integration
- Phrenology (wrong in particulars, right in spirit) → Broca → Brodmann → fMRI subtraction → modular vision streams.
- Fodor: peripheral modules + central isotropic system. Massive modularity: all modules, no central system.
- Dorsal/ventral streams recur in lectures 3, 4, and 9 β different lesions, illusions, and consciousness studies all converge on the same anatomical dissociation.
Theme 3: Conscious vs non-conscious processing
- Behaviourism would deny "the unconscious." But Corteen & Wood, blindsight, neglect, priming, and Owen et al.'s vegetative-state patients all show information processing without (or apart from) awareness.
- The lecture's unifying answer: consciousness is for deliberate, voluntary action and durable explicit information maintenance.
Theme 4: Normative vs descriptive
- Bayesianism = how rational agents should reason.
- Tversky & Kahneman = how humans actually reason (badly, with predictable biases).
- Same tension in language learning: do children obey UG (norms) or are they statistical learners (descriptive)?
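The gap between normative and descriptive accounts can be made concrete with the maximizing vs matching contrast from the glossary. The sketch below is only illustrative: the function names and the value p = 2/3 are assumptions, not taken from the lectures. It compares the expected accuracy of always choosing the likelier outcome with that of matching one's guesses to the outcome frequencies.

```python
from fractions import Fraction

def maximizing_accuracy(p):
    """Expected accuracy when always predicting the more likely outcome."""
    return max(p, 1 - p)

def matching_accuracy(p):
    """Expected accuracy when guessing each outcome as often as it occurs."""
    return p * p + (1 - p) * (1 - p)

# Hypothetical setup: one outcome occurs on 2/3 of trials.
p = Fraction(2, 3)
print(round(float(maximizing_accuracy(p)), 2))  # 0.67
print(round(float(matching_accuracy(p)), 2))    # 0.56
```

Under these assumptions, maximizing is the normatively better policy, while matching describes what people often do.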
Theme 5: The four big "dissolution" moves
- Materialism dissolves dualism: Hobbes vs Descartes (lec 3).
- S–S dissolves S–R: Rescorla vs Watson (lec 1).
- Connectionism dissolves the need for explicit rules: Plunkett & Marchman vs dual-route (lec 5, 8).
- Dennett dissolves the hard problem: vitalism analogy (lec 9).
Timeline of Key Figures & Events
| Date | Figure / Event | Contribution |
|---|---|---|
| 1247 | Bethlem Royal Hospital ("Bedlam") | First closed institution for the mentally ill |
| 1588–1679 | Thomas Hobbes | Materialism: everything is matter, soul is meaningless |
| 1596–1650 | René Descartes | Cartesian dualism; pineal gland as mind–body interaction site |
| ~1810s | Franz Gall | Phrenology: wrong in detail, right in spirit (localization) |
| 1824–1880 | Paul Broca | Clinical neuropsychology; Broca's area = speech production |
| 1898 | Edward L. Thorndike | Puzzle-box experiments; Law of Effect |
| 1904 | Ivan Pavlov | Nobel Prize; classical conditioning of salivation in dogs |
| 1913 | John B. Watson | Founds behaviourism (S–R) |
| ~1920s+ | B. F. Skinner | Operant conditioning; Skinner box; reinforcement schedules |
| 1943 | McCulloch & Pitts | Neural networks compute any computable function |
| 1945 | Walter Freeman | Transorbital icepick lobotomy; >40,000 in the US |
| 1953 | Volkova | Semantic generalization in conditioning |
| 1956 | George A. Miller | "Magical number 7 ± 2"; ~3 bits channel capacity |
| 1958–60 | Cherry · Broadbent · Treisman | Dichotic listening; early-selection filter; switching |
| 1959 | Moray | "Own name" breakthrough in unattended ear |
| 1965 | Joseph Weizenbaum | ELIZA |
| 1970 | Terry Winograd | SHRDLU (MIT) |
| 1972 | Kenneth Colby | PARRY (Stanford); 48% Turing-test accuracy |
| 1972 | Corteen & Wood | GSR breakthrough: semantic processing of unattended info |
| 1973 | Rescorla | SβS vs SβR habituation experiment |
| 1973 | Cooper & Shepard | Mental rotation |
| 1974 | Thomas Nagel | "What is it like to be a bat?" |
| 1980 | David Marr | Three levels of analysis (published posthumously 1982) |
| 1986 | Frank Jackson | Mary's Room β the knowledge argument |
| 1993 | Plunkett & Marchman | Connectionist past-tense network (20-30-20) |
| 1995 | David Chalmers · van Gelder | Hard problem of consciousness · "What might cognition be, if not computation?" |
| 2005 | Fang & He | Interocular suppression: dorsal vs ventral consciousness |
| 2006 | Owen et al. | fMRI detects covert awareness in vegetative-state patients |
| 2016 | Cramer et al. | Ising network model of depression |
| 2017 | Vaswani et al. | "Attention is all you need": the Transformer |
| 2019 | Cisek · Busemeyer et al. | Phylogenetic refinement · Decision Field Theory |
| 2021 | Bender, Gebru, McMillan-Major & Shmitchell | "On the Dangers of Stochastic Parrots" |
| 2023 | Piantadosi · Binz & Schulz | "LLMs refute Chomsky" · GPT-3 on Wason task |
Master Glossary
High-yield terms across all lectures. Skim the night before; nail them all.
Conditioning & learning
| Term | Definition |
|---|---|
| US / UR | Unconditioned stimulus / unconditioned response (food → salivation) |
| CS / CR | Conditioned stimulus / conditioned response (bell → salivation after pairing) |
| Extinction | CR weakens when the CS is repeatedly presented without the US |
| Spontaneous recovery | Partial return of CR after rest; CR is inhibited, not lost |
| S–R vs S–S theory | Direct stimulus-response bond vs link via mental representation of US |
| Law of Effect | Satisfying consequences strengthen responses (Thorndike) |
| Operant / instrumental | Self-initiated behaviour modified by consequences |
| Shaping | Reinforcement of successive approximations |
| FR / VR / FI / VI | Four schedules (fixed/variable ratio, fixed/variable interval); VR (variable-ratio) most resistant to extinction |
Brain & methods
| Term | Definition |
|---|---|
| Action potential | All-or-none binary signal; frequency-modulated |
| Myelin | Sheath increasing axon speed; "white matter"; MS attacks it |
| Corpus callosum | Connects hemispheres; cut in callosotomy for refractory epilepsy |
| Brodmann area | Cytoarchitectonic parcellation (BA 4 motor, 17 visual, 44 Broca) |
| BOLD signal | Blood-oxygen-level-dependent signal; what fMRI directly measures |
| DTI | Diffusion tensor imaging; its tractography visualizes white-matter tracts |
| P1/N1, P300, N400, P600 | ERP components: attention, oddball, semantic, syntactic |
| Dissociation / double dissociation | Strong evidence for separable mechanisms |
Vision & categorization
| Term | Definition |
|---|---|
| Primal / 2½-D / 3-D sketch | Marr's three stages of vision |
| Generalized cones | Marr's primitives; objects are stacks of cones |
| Geons (36) | Biederman's geometric primitives |
| Constancies (size/brightness/shape) | Brain corrects for retinal variation to perceive stable objects |
| Kanizsa's triangle | Illusory contours filled in by the brain |
| Dorsal vs ventral | Where/how (action) vs what (perception) pathways |
| Form agnosia vs integrative agnosia | Cannot perceive shape vs cannot integrate shape into recognition |
| Blindsight | Above-chance forced-choice without conscious vision (V1 damage) |
| Prosopagnosia | Face blindness; bilateral FFA damage |
| Capgras delusion | "My partner is an impostor"; the reverse dissociation to prosopagnosia |
Connectionism & AI
| Term | Definition |
|---|---|
| Perceptron | Single-layer network of weighted inputs |
| Linearly separable | Class of functions a perceptron can learn (XOR is NOT) |
| Delta rule | ΔWᵢ = ε·Iᵢ; ΔT = −ε, each scaled by the error δ = target − output (ε = learning rate, T = threshold) |
| Backpropagation | Assigns each hidden unit "responsibility" for output error |
| Gradient descent | Follow negative gradient; risk of local minima |
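| Universal approximation | Multi-layer nets with enough hidden units can approximate any continuous function arbitrarily well (often glossed as "compute any Turing-computable function") |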
| Token / embedding | Text chunk / high-dim learned vector |
| Autoregression | Feed each prediction back as input |
| RLHF | Reinforcement learning from human feedback; second training stage of LLMs |
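The perceptron, delta rule, and linear-separability entries above can be tied together in a minimal sketch. It assumes a two-input threshold unit with a Heaviside step activation, zero initial weights, and ε = 1.0 (chosen only to keep the arithmetic exact). The names `step` and `train_perceptron` are hypothetical, not from the lectures.

```python
def step(net):
    """Heaviside step: the unit fires (1) when net input reaches 0."""
    return 1 if net >= 0 else 0

def train_perceptron(samples, epsilon=1.0, epochs=50):
    """Train a two-input threshold unit with the delta rule.

    Returns (weights, threshold, converged).
    """
    weights, threshold = [0.0, 0.0], 0.0
    for _ in range(epochs):
        errors = 0
        for inputs, target in samples:
            net = sum(w * i for w, i in zip(weights, inputs)) - threshold
            delta = target - step(net)  # error: -1, 0, or +1
            if delta != 0:
                errors += 1
                # Delta rule: dW_i = eps * delta * I_i, dT = -eps * delta
                weights = [w + epsilon * delta * i
                           for w, i in zip(weights, inputs)]
                threshold -= epsilon * delta
        if errors == 0:  # one full pass with no mistakes: stop
            return weights, threshold, True
    return weights, threshold, False

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print(train_perceptron(AND)[2])  # True: AND is linearly separable
print(train_perceptron(XOR)[2])  # False: no single unit computes XOR
```

The XOR run never reaches an error-free pass however many epochs are allowed. That is the limitation hidden layers and backpropagation were introduced to overcome.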
Bayes
| Term | Definition |
|---|---|
| Prior / likelihood / posterior | p(H), p(E\|H), p(H\|E) |
| Marginal evidence | p(E) = Σ p(E\|H)·p(H), summed over all hypotheses H |
| Dutch book | Set of bets that each look fair to the subject but together guarantee a loss; justifies the probability calculus |
| Base-rate neglect | Ignoring prior probability when updating |
| Transposed conditional | Confusing p(A\|B) with p(B\|A) |
| Conjunction fallacy | Judging p(A∧B) > p(A); the Linda problem |
| Maximizing vs matching | Always pick the higher-probability option vs match probabilities; maximizing wins |
| Search policy | argmaxᵢ pᵢ·aᵢ/cᵢ |
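A short worked example may help fix the prior / likelihood / posterior entries and show why base-rate neglect and the transposed conditional are errors. This is a sketch for a binary hypothesis only. The function name and all three input numbers are hypothetical, chosen for illustration.

```python
def posterior(prior, likelihood, false_positive_rate):
    """p(H|E) by Bayes' rule for a binary hypothesis H.

    prior               = p(H)
    likelihood          = p(E|H)
    false_positive_rate = p(E|not H)
    """
    # Marginal evidence: p(E) = p(E|H)p(H) + p(E|not H)p(not H)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: 1% base rate, 90% hit rate, 10% false positives.
print(round(posterior(prior=0.01, likelihood=0.90,
                      false_positive_rate=0.10), 3))  # 0.083
```

With these assumed numbers p(E|H) is 0.90 but p(H|E) is only about 0.08. Treating the two as interchangeable is the transposed conditional, and ignoring the 1% prior is base-rate neglect.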
Consciousness & mind
| Term | Definition |
|---|---|
| Wakefulness / awareness | State vs experience (Laureys) |
| UWS / MCS / LIS | Vegetative / minimally conscious / locked-in |
| Qualia | Felt qualities of experience |
| P-consciousness / A-consciousness | Block's phenomenal vs access distinction |
| Easy / Hard problems | Information processing tractable / experience itself (Chalmers) |
| Knowledge argument / Mary's Room | Jackson 1986 |
| GWT / GNW | Global (Neuronal) Workspace; Baars / Dehaene |
| IIT (Φ) | Integrated Information Theory; Tononi |
| RP | Recurrent Processing theory; Lamme |
| Vitalism (élan vital) | Dennett's analogy for why hard problem will dissolve |
| Mentalese / Language of Thought | Fodor's innate symbolic medium |
| Module (Fodor) | Domain-specific, encapsulated, mandatory, fast, fixed, specific breakdown |
| Isotropic | Un-encapsulated, holistic β Fodor's central processing |
| Massive modularity | No central processor; many Darwinian modules |
| State space / trajectory | DST geometric concepts for cognition over time |
40-question Cross-lecture MCQ Practice
Mixed, harder, exam-shaped. Click "Show answer" only after committing. The choices include the most common distractors a professor will use.
- A successful classical conditioning result
- An experimental error he called "psychic secretion"
- An operant response
- Evidence of insight learning
Show answer
- Cherry's finding that listeners notice voice-pitch changes in the ignored ear
- Moray's (1959) "own name" breakthrough
- Treisman's (1960) ear-switching effect
- Corteen & Wood's (1972) GSR to ROME after conditioning to PARIS, LONDON, CAIRO
Show answer
- Basic physiology
- Propositional logic
- Turing's theory of computation
- Information theory (Shannon)
Show answer
- Computational
- Algorithmic
- Implementational
- Connectionist
Show answer
- Marr's "generalized cones"
- Pavlov's S–S theory
- Broadbent's filter model
- Skinner's shaping
Show answer
- Dorsal pathway only
- Ventral pathway only
- Both pathways
- Broca's area
Show answer
- Capgras patients have intact explicit recognition but lost emotional/implicit recognition
- Capgras patients lose both explicit and implicit recognition
- Capgras patients have intact emotional but lost explicit recognition
- Capgras involves the dorsal stream
Show answer
- Direct firing of neurons
- Changes in haemoglobin's magnetic properties due to oxygenation
- Cerebral blood flow measured by radioactive tracer
- Magnetic fields produced by ion currents
Show answer
- These tasks activate motor (SMA) vs parahippocampal regions β task-specific, voluntary, and detectable on fMRI
- They are easier for patients than verbal tasks
- They reduce noise in EEG recordings
- They are the only paradigms compatible with locked-in syndrome
Show answer
- Sigmoid
- ReLU
- Softmax
- Heaviside step function
Show answer
- It is the only function single-layer perceptrons can compute
- It is non-linearly-separable, exposing the fundamental limitation of single-layer perceptrons
- It cannot be computed by any neural network
- It requires Hebbian learning
Show answer
- You need a dual-route architecture for overgeneralization to occur
- A single neural network without explicit rules can reproduce overgeneralization and gradual learning curves
- Children do not overregularize
- Backpropagation is biologically implausible
Show answer
- Domain-specific modules
- The existence of a domain-general central processor
- Natural selection
- Pragmatic representations
Show answer
- Symbolic AI
- Connectionism
- Dynamical systems theory (van Gelder, 1995)
- Predictive coding
Show answer
- Symptoms of a single underlying latent variable
- Binary nodes coupled by pairwise weights; stress = extra input to all nodes
- Symbolic atoms in a Mentalese
- Output of a single decision system
Show answer
- The likelihood is high and the prior is low
- The likelihood is high and the prior is high
- The evidence is high
- The evidence equals the likelihood
Show answer
- Personal probabilities are subjective and unscientific
- Violations of probability calculus make one exploitable, so rational beliefs must obey it
- Conditional probabilities are unreliable
- Frequentism is correct
Show answer
- Conjunction fallacy: representativeness overrides probability theory
- The conjunction is actually more likely
- Bank tellers form a subset of feminists
- Availability heuristic
Show answer
- 56% correct
- 67% correct
- 100% correct
- 33% correct
Show answer
- LLMs have genuine semantic grounding
- LLMs stitch linguistic forms together by probability without reference to meaning
- LLMs cannot be trained on biased text
- LLMs refute Chomsky
Show answer
- Physical knowledge fully captures conscious experience
- There are aspects of conscious experience not captured by complete physical knowledge
- Mary cannot learn colour from books
- Qualia are an illusion
Show answer
- Coma < LIS < UWS < MCS
- Coma < UWS < MCS < LIS
- UWS < Coma < MCS < LIS
- LIS < MCS < UWS < Coma
Show answer
- Both streams went silent
- Dorsal-stream activity persisted; ventral activity tracked conscious perception
- Ventral-stream activity persisted; dorsal activity tracked conscious perception
- Conscious perception preceded both streams
Show answer
- 0° / 360°
- 90°
- 180°
- 270°
Show answer
- Early sensory processing
- Attention
- Semantic processing / semantic anomaly
- Syntactic reanalysis
Show answer
- Operant conditioning of language
- Semantic generalization in classical conditioning (responses generalized from "good"/"bad" to whole sentences)
- Extinction of conditioned fear
- Discrimination training in pigeons
Show answer
- 0.03 kg
- 0.3 kg
- 3 kg
- 30 kg
Show answer
- Ignored input is completely blocked at an early filter
- All input is processed for meaning; ignored input is forgotten
- Ignored input is attenuated but not blocked; important info (your name) is spared
- Distractor processing depends on the load of the main task
Show answer
- They use too many parameters
- They are trained on 4–5 orders of magnitude more data than human children, so behavioural similarity does not imply mechanistic similarity (multiple realizability)
- They cannot solve the Wason task
- They are deterministic
Show answer
- Paul Broca
- Walter Freeman (transorbital icepick, 1945; >40,000 procedures)
- B. F. Skinner
- Antonio Damasio
Show answer
- Chalmers
- Nagel
- Dennett
- Tononi
Show answer
- The reportable, control-of-thought aspect of consciousness
- The phenomenal/qualia aspect: what it's like
- Pre-reflective awareness
- Primary sensory cortex activity
Show answer
- argmaxᵢ pᵢ
- argmaxᵢ aᵢ
- argmaxᵢ pᵢ · aᵢ / cᵢ
- argmaxᵢ pᵢ + aᵢ − cᵢ
Show answer
- 1 bit
- 3 bits (≈ 7 items)
- 7 bits
- 10 bits
Show answer
- Syntactic anomaly with semantic well-formedness
- Syntactic well-formedness with semantic anomaly
- Both syntactic and semantic anomaly
- The conjunction fallacy
Show answer
- Convergence on any Turing-computable function
- Convergence whenever a perceptron-realizable (i.e., linearly separable) solution exists
- Convergence on the global minimum of any cost surface
- Convergence in O(log n) iterations
Show answer
- A dog salivates when a bell rings
- A child blinks when a puff of air hits her eye
- A rat presses a lever more often because pressing produces food
- A baby orients to a loud sudden sound
Show answer
- It uses transitional probabilities to segment words
- It computes p(referent | sentence) ∝ p(sentence | referent) · p(referent), favoring "red ball" because more specific hypotheses make the data more probable
- It refutes Chomsky
- It demonstrates the conjunction fallacy
Show answer
- Patient D.F.'s ability to post a card despite ventral damage
- The grip aperture being unaffected by the Ebbinghaus illusion (dorsal not fooled)
- Fang & He (2005)'s interocular suppression: dorsal active for invisible stimuli, ventral only for conscious ones
- All of the above
Show answer
- fMRI has high spatial but low temporal resolution
- Single-unit recording achieves both high temporal and high spatial resolution but is invasive
- MEG measures changes in blood oxygenation
- EEG has high temporal but low spatial resolution
Show answer
– End of study guide –
Good luck on the exam. Trust the priors. Update on evidence.
The Emotions: From Cognitive Science to Affective Science
Emotions were largely ignored in early cognitive science. Affective science now studies them with a rich, multidisciplinary toolkit, from genetics and lesion studies to neuroimaging, using fear as its central case study.
16.1 Early Theories
Herbert Simon: Emotions as Interrupt Mechanisms
Simon (1967) argued that any sufficiently complex serial information-processing system (like the mind) must contain interrupt mechanisms: processes that can suspend an ongoing goal and substitute a new one when circumstances demand it. His proposal is that emotions play this interrupt role in the central nervous system (CNS).
Three key properties of emotions on Simon's account:
- They interrupt ongoing goals and substitute new goals/behaviours.
- They arouse the autonomic nervous system in predictable physiological ways.
- They generate feelings of emotion.
Simon's paper is influential but thin on detail: no specific emotion is actually analysed. It raises key open questions: How should emotions be classified? Are some basic? What is the role of arousal, physiology, and feeling? What are the neural bases?
Paul Ekman: Basic Emotion Theory
Ekman asked whether facial expressions of emotion are cross-cultural universals or products of social learning. To avoid the confound of media exposure, he studied the Fore linguistic-cultural group in New Guinea, a preliterate, visually isolated culture.
Method: participants were shown 2–3 photos of facial expressions while a story indicating an emotion was read aloud. They had to point to the matching face. Six target emotions: happiness, sadness, anger, surprise, disgust, fear.
Result: both adults and children in New Guinea matched emotions to faces at rates significantly above chance, supporting universal cross-cultural recognition. Ekman also showed that literate cultures could recognise New Guinean facial expressions.
- There are discrete, separate basic emotions, each with a coherent set of facial, physiological, cognitive, and behavioural responses.
- Each basic emotion serves a distinctive evolutionary function and is hardwired for specific life tasks.
Example (fear): eyebrows raised and horizontal, upper eyelid lifted, more sclera exposed (gathers information about threat); heart rate and skin conductance elevated; peripheral blood flow redirected to large skeletal muscles (preparing to flee).
Criticisms: Meta-analyses cast doubt on strict links between emotions and specific physiological/neural signatures. Cultural anthropologists question whether emotions are truly independent of social context.
16.2 Affective Space and the Affective Scientist's Toolkit
Affective Space: Beyond Discrete Categories
Affective phenomena vary in duration and function: emotions, moods, instincts, drives, and affective traits are all distinct (though their boundaries are fuzzy). Many affective scientists now model emotions as points in a multidimensional space rather than discrete categories.
The simplest model uses two dimensions:
- Valence: pleasure–displeasure (attractiveness vs. aversiveness of a situation)
- Arousal: degree of physiological/psychological engagement
The circumplex model (Russell, 1980) plots emotions on a circle defined by valence and arousal. It includes both emotions and moods (e.g., sadness and depression are adjacent).
Adolphs & Anderson (2018) propose a richer 7-dimensional framework: scalability, valence, persistence, generalisation, global coordination, automaticity, social coordination. It is designed to apply to non-human animals without relying on verbal self-reports.
Appraisal theories (Lazarus and others) add a cognitive dimension: emotions involve evaluating the environment relative to the subject's goals, in terms of what Lazarus calls core relational themes. On this view, anger and fear differ because they involve different appraisals of the same situation.
The Affective Scientist's Toolkit
Different tools study different components of an emotional episode (trigger → perception → neural/somatic response → behavioural response, with optional cognitive appraisal, feelings, and verbal report):
- fMRI, PET, electrophysiology: neural responses
- FACS (Facial Action Coding System): behavioural/expressive responses
- Physiological measures: heart rate, skin conductance, finger temperature
- Lesion studies: causal role of specific brain regions
- Genetic tools: knockout experiments, optogenetics, pharmacogenetics
Genetic Tools
Knockout experiments (mice, 1990s+): replace a functional gene with a nonfunctional copy in stem cells, then study behavioural change. E.g., knocking out the serotonin receptor 5-HT(1A) increases anxiety-like behaviour in mice.
Optogenetics: engineer specific neurons to express a light-sensitive ion channel (opsin). Neurons can then be switched on/off with light (millisecond resolution). Allows targeted intervention in specific neural populations.
Pharmacogenetics: engineer neurons to express receptors for specific synthetic drugs (not normal neurotransmitters). Can activate or inhibit targeted populations. Slower than optogenetics (minutes to hours).
GECIs / GEVIs: genetically encoded calcium or voltage indicators, which allow optical measurement of neural activity as a complement to electrophysiology.
Lesion Studies
Classic case: Phineas Gage (1848). An iron rod through his head destroyed ventromedial prefrontal cortex. Physical and perceptual abilities were preserved; emotional regulation and social behaviour changed drastically. This was the first evidence linking prefrontal cortex to emotional function.
Lesions in humans provide information about dissociations; animal lesions allow greater anatomical precision and pre/post comparisons. Types: permanent (aspiration, excision, neurotoxins) or reversible (pharmacological, cryogenic cooling, TMS).
16.3 Fear: A Multilevel and Multidisciplinary Case Study
Fear Conditioning in Rodents
Fear conditioning pairs a neutral conditioned stimulus (CS, e.g., a tone) with an aversive unconditioned stimulus (US, e.g., foot shock). After training, the CS alone elicits fear responses (freezing, autonomic arousal). This provides a controllable, reproducible way to study fear.
Multiple studies converge on the amygdala (a subcortical limbic structure) as central to fear:
- Electrical stimulation of the amygdala produces fear responses (increased respiration, heart rate, blood pressure, freezing) → amygdala is sufficient.
- Lesion studies: amygdala lesions abolish conditioned freezing and reduce unconditioned defensive behaviour (e.g., lesioned rats approach sedated cats) → amygdala is necessary.
- The basolateral nucleus (BLA) contains distinct neuron populations responding to positive vs. aversive stimuli, projecting to different brain regions.
Fear and Amygdala Damage in Humans: Patient S.M.
S.M. has bilateral amygdala destruction from Urbach–Wiethe disease (UWD), a rare genetic condition. Her basic cognition (intelligence, memory, language, perception) is intact, but she shows profound impairment in experiencing, expressing, and recognising fear.
Feinstein et al. (2011): exposed S.M. to live snakes and spiders, a haunted house, and scary films. She showed no fear responses: no avoidance, no subjective fear. She experienced other emotions normally, confirming fear-specific amygdala involvement.
A separate group of UWD patients (Namaqualand, South Africa) had damage only to the basolateral amygdala (BLA), leaving the central-medial amygdala (CMA) intact. Their response: fear hypervigilance, i.e., exaggerated attention to mild threat cues (e.g., better recognition of fearful facial expressions). This contrasts with S.M.'s hypovigilance (BLA + CMA damaged).
Interpretation: the BLA exerts inhibitory control over the CMA. Without BLA, CMA fires impulsively → hypervigilance. Without both → no fear response at all.
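The inhibitory-gating interpretation can be written out as a tiny decision table. This is a toy sketch of the reasoning only, not a neural model. The function name and output labels are hypothetical, and the "BLA intact, CMA damaged" case is an implication of the sketch's assumption that the CMA drives fear output, not a finding reported above.

```python
def fear_profile(bla_intact: bool, cma_intact: bool) -> str:
    """Predicted fear profile under the 'BLA inhibits CMA' interpretation."""
    if not cma_intact:
        # Assumption: the CMA drives fear output, so without it no fear
        # response is generated (S.M.: BLA and CMA both damaged).
        return "no fear response"
    if not bla_intact:
        # CMA intact but uninhibited (Namaqualand UWD patients).
        return "hypervigilance"
    # Both intact: CMA output is held in check by the BLA.
    return "regulated fear response"

print(fear_profile(bla_intact=False, cma_intact=True))   # hypervigilance
print(fear_profile(bla_intact=False, cma_intact=False))  # no fear response
```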
Neuroimaging of Fear in Humans
LaBar et al. (1998): fMRI showed amygdala activation during fear conditioning and extinction in humans; greatest involvement in early stages of conditioning.
The human fear network identified by neuroimaging:
- Amygdala: central fear processing
- Hippocampus: memory/contextual information
- Insula: broader emotion processing (especially disgust)
- Anterior cingulate cortex (ACC): bridges limbic and prefrontal systems
- Ventromedial prefrontal cortex (vmPFC): integrative hub; modulates/controls fear responses (especially in extinction)
Clinical relevance: PTSD involves overgeneralised fear, slow extinction, amygdala/ACC hyperactivity, and low vmPFC activity.
Mobbs et al. (2010): scanned subjects while a live tarantula was placed at varying distances from their foot. Closer threat → increased activation in the amygdala, insula, ACC, and bed nucleus of the stria terminalis (BNST) (active coping: flee). More distant threat → increased orbitomedial PFC activity (passive coping: freeze). Approach vs. retreat also differentially activated the amygdala and BNST.
Summary
Affective science grew from cognitive science's neglect of emotion. Simon proposed a computational account (emotions as interrupt mechanisms); Ekman proposed discrete, universal basic emotions. Contemporary affective science has moved toward multidimensional models of affective space and uses a broad toolkit: neuroimaging, lesion studies, fear conditioning, and genetic tools. Fear is the best-studied emotion, with the amygdala (and its subregions BLA/CMA) playing a necessary and central role, embedded in a wider network including the hippocampus, insula, ACC, and vmPFC.