Introduction to Cognitive Science · Leiden

The Cognitive Science Master Study Guide

An exam-ready synthesis of all ten lectures — every theory, every named scholar, every diagram, every contestable claim, every multiple-choice trap.

10 lectures · 540+ pages of slides · 100+ named scholars · 90+ self-test MCQs

JLB Chapter 1

The Prehistory of Cognitive Science

Behaviourism dominated psychology in the early 20th century, restricting it to observable behaviour. Cracks in its account of conditioning, the rise of computational models of mind, and discoveries about attention pushed psychology toward cognitivism.

1.1 Two assumptions of behaviourism

All learning is the result of conditioning.
Conditioning depends on association and reinforcement.

The behaviourist slogan: "Psychology is the science of behavior." Mental processes are unobservable and therefore unscientific. Chomsky's killer rejoinder:

"Defining psychology as the science of behavior was like defining physics as the science of meter reading."— Noam Chomsky

1.2 Classical (Pavlovian) conditioning

Pavlov initially studied digestion in dogs, measuring salivation (Nobel Prize 1904). When he noticed dogs salivating as the assistant opened the door (a "psychic secretion"), he pivoted his entire research programme.

US = unconditioned stimulus, CS = conditioned stimulus, UR = unconditioned response, CR = conditioned response.

The big debate inside classical conditioning

S–R (Watson)

Conditioning forges a direct stimulus–response bond. Bell → salivation. No mental states involved. Strictly behaviourist.

vs.

S–S (Pavlov / cognitivists)

Bell activates a mental representation of the food, which then produces the response. Internal states inferred from their predicted effects.

Rescorla (1973) tested this with rats. Pair light + loud sound → both elicit freezing. Then habituate half the rats to the sound alone until they stop freezing to it. Now present the light:

S–R predicts: freezing to the light is intact (the light → freezing bond is independent).
S–S predicts: no freezing — the light triggers the representation of the now-habituated sound.

Result: habituated rats did NOT freeze to the light — strong support for the cognitivist S–S theory.

Phenomena to know

Extinction — CR weakens when CS is presented repeatedly without the US.
Spontaneous recovery — after extinction, the CR partially returns after a rest. Pavlov concluded the CR is inhibited, not lost.
Generalization — CR also occurs to similar stimuli (tones near the trained tone).
Discrimination — organism learns to respond only to a specific CS, not similar ones.
Volkova (1953) — semantic generalization: children conditioned to "good"/"bad" generalized to whole sentences like "The children are playing nicely together" / "The Fascists destroyed many cities."

📘 Textbook adds — Tolman's latent learning & cognitive maps (1930, 1946)

Tolman & Honzik (1930) — "'Insight' in Rats"

Three groups of rats ran a 14-unit T-Alley maze. Group 1 always rewarded; Group 2 never; Group 3 unrewarded for 10 days, then rewarded. When Group 3 started getting rewards, they learned the maze faster than Group 1 ever had — they had stored maze information during the unrewarded period. This latent learning directly contradicts behaviourism: learning without reinforcement.

Tolman et al. (1946) — cross-maze studies: place learning (knowing where the food is) is easier than response learning (knowing which turn to make). Rats build cognitive maps — internal representations of spatial layout. First major case of postulating internal representations in a behavioural science.

📘 Textbook adds — Lashley (1951): "The Problem of Serial Order in Behavior"

Lashley argued that complex behaviour (speech, tennis, piano playing) cannot be a chained sequence of stimulus–response links because what happens next depends on what will happen later in the sequence and on the overall goal. He proposed behaviour is organised hierarchically, with high-level plans broken down into sub-plans. Two foundational ideas crystallise from his essay:

Subconscious information processing — most of the planning that turns goals into movements happens below awareness.
Task analysis — a complex cognitive ability can be understood by decomposing it into a hierarchy of simpler sub-tasks (the methodological backbone of cognitive science).

1.3 Operant (instrumental) conditioning

Edward L. Thorndike · 1898 puzzle box

Cats in a puzzle box escape by trial-and-error. Over 20–30 trials, the time to escape drops sharply.

Law of Effect: "Responses that produce a satisfying effect in a particular situation become more likely to occur again in that situation, and responses that produce a discomforting effect become less likely."

B. F. Skinner · Skinner box

Animal stays in the box and can repeatedly produce operant responses. Replaced Thorndike's mentalistic "satisfaction" with the behaviour-neutral term reinforcement.

Used variable-ratio schedules to explain gambling addiction (Skinner 1953).

Reinforcement schedules

Schedule	Rule	Behaviour produced
Fixed-Ratio (FR)	Reinforce after every nth response (FR-5 = every 5th)	Fast, steady responding
Variable-Ratio (VR)	Average of n responses per reward, varies unpredictably	Fastest responding; most resistant to extinction → slot machines!
Fixed-Interval (FI)	First response after a fixed time interval is reinforced	"Scalloping" — responding speeds up near the interval's end
Variable-Interval (VI)	Interval varies unpredictably	Slow, steady responding

Ratio > Interval because in ratio schedules reinforcers scale with response rate; in interval schedules they're time-capped.
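The contrast between ratio and interval rules can be sketched in a few lines of code. This is an illustration only: the function names, the FR-5 and 10-second settings, and the response times are assumptions chosen for the example, not values from the lecture.

```python
def fixed_ratio(n, responses):
    """FR-n: return the 1-based positions of the reinforced responses."""
    return [i for i in range(1, responses + 1) if i % n == 0]


def fixed_interval(interval, response_times):
    """FI: reinforce the first response made once `interval` seconds
    have passed since the last reinforcer (or since the start)."""
    reinforced = []
    last_reward = 0.0
    for t in response_times:
        if t - last_reward >= interval:
            reinforced.append(t)
            last_reward = t
    return reinforced


# FR-5: doubling the number of responses doubles the reinforcers.
fr_slow = fixed_ratio(5, 20)
fr_fast = fixed_ratio(5, 40)

# FI-10 s over one minute: one response every 5 s vs one response per second.
fi_slow = fixed_interval(10, [float(t) for t in range(5, 61, 5)])
fi_fast = fixed_interval(10, [float(t) for t in range(1, 61)])
```

In this toy run the faster responder earns twice as many reinforcers under the ratio rule but no more than the slow responder under the interval rule, which is the sense in which interval schedules are time-capped.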

Positive reinforcement = arrival of a stimulus increases the response. Negative reinforcement = removal of a stimulus increases the response. Both make behaviour more likely (≠ punishment).

Shaping, discrimination, concept learning

Shaping = training complex behaviour by reinforcing successive approximations (dog → kitchen → refrigerator → door → scratching).
Discriminative stimulus = signal that a response will be reinforced (a light being on).
Concept learning — pigeons rewarded for pecking Monet (not Picasso) generalize to Cézanne, Renoir → a category "impressionist" forms.

1.4 Cognition and computation

If humans can simulate a single-tape Turing machine (slowly, inefficiently), then the brain is Turing-complete. McCulloch & Pitts built networks of neurons from three principles:

Basic physiology
Propositional logic
Turing's theory of computation

Their results: any computable function can be computed by a network of neurons; all logical operators can be built from simple neural networks.
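The claim that logical operators can be built from simple neural units can be illustrated with binary threshold units. The sketch below is a simplified version in the spirit of McCulloch and Pitts: the specific weights and thresholds, and the use of a negative weight for inhibition, are assumptions made for this example.

```python
def mp_neuron(inputs, weights, threshold):
    """Binary threshold unit: fires (1) if the weighted input sum reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


def AND(a, b):
    return mp_neuron([a, b], [1, 1], threshold=2)


def OR(a, b):
    return mp_neuron([a, b], [1, 1], threshold=1)


def NOT(a):
    # A single inhibitory (negative) weight with threshold 0.
    return mp_neuron([a], [-1], threshold=0)


def XOR(a, b):
    # Not computable by one such unit; built here from a small network of units.
    return AND(OR(a, b), NOT(AND(a, b)))


# Rows are (a, b, AND, OR, XOR) for every input combination.
truth_table = [(a, b, AND(a, b), OR(a, b), XOR(a, b)) for a in (0, 1) for b in (0, 1)]
```

Because AND, OR and NOT suffice to express any propositional formula, wiring such units together gives a network for any truth-functional operator, as the XOR example shows.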

📘 Textbook adds — Chomsky's Syntactic Structures (1957)

Chomsky distinguished the deep structure of a sentence (its constituent phrase structure) from its surface structure (the actual word order, derived via transformational rules).

Phrase-structure grammar

Sentences = combinations of basic parts of speech (N, V, Adj, NP, VP…) generated by recursive phrase-structure rules (e.g., S → NP + VP).

vs.

Transformational grammar

Maps deep structure to surface structure. Explains why "John has hit the ball" and "The ball has been hit by John" share a meaning despite different surface forms; and why "Susan is easy to please" ≠ "Susan is eager to please" despite a near-identical surface.

This was the first time a linguist offered an explanatory account of language structure rather than just classification — the model for algorithmic theories of mental capacities.
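To make the idea of rewrite rules concrete, here is a minimal sketch that expands a phrase-structure grammar mechanically from S. The rule set and lexicon are invented for this illustration (they are not Chomsky's), and the toy grammar deliberately contains no self-embedding rule, so the enumeration terminates. It covers phrase structure only and does not model transformations.

```python
import itertools

# Hypothetical toy grammar: rules and words are invented for illustration.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
    "Det": [["the"]],
    "N": [["cat"], ["ball"]],
    "V": [["hit"], ["saw"]],
}


def expand(symbol):
    """Return every word sequence derivable from `symbol`."""
    if symbol not in GRAMMAR:  # a terminal word
        return [[symbol]]
    results = []
    for rule in GRAMMAR[symbol]:
        # Expand each right-hand-side symbol, then combine the alternatives.
        parts = [expand(s) for s in rule]
        for combo in itertools.product(*parts):
            results.append([word for part in combo for word in part])
    return results


sentences = [" ".join(words) for words in expand("S")]
```

With two nouns and two verbs the grammar yields eight sentences, including "the cat hit the ball". The rules generate grammatical strings regardless of plausibility, which is the point of treating syntax as an algorithm.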

1.5 The mind as an information processor

George A. Miller (1956) — "The magical number 7 (± 2)": human channel capacity ≈ 3 bits ≈ 7 items, roughly independent of modality. Measured by:

Digit-span task — repeat back the longest sequence of digits you can hold.
Absolute judgment task — identify stimuli along one dimension.
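The link between bits and the number of distinguishable items is a base-2 logarithm. The short sketch below works through the arithmetic. It treats all alternatives as equally likely, which is an assumption of this example, and the function names are hypothetical.

```python
import math


def bits_for_alternatives(n):
    """Information (in bits) needed to pick one of n equally likely alternatives."""
    return math.log2(n)


def alternatives_for_bits(bits):
    """Number of equally likely alternatives that `bits` of information can distinguish."""
    return 2 ** bits


bits_7_items = bits_for_alternatives(7)   # about 2.81 bits
items_3_bits = alternatives_for_bits(3)   # 8 alternatives
```

So "3 bits" and "7 items" are two roundings of the same quantity: seven alternatives need just under 3 bits, and exactly 3 bits would distinguish eight.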

Psychophysics — Weber's law

jnd = k · M (just noticeable difference = constant × stimulus magnitude)

k = 0.03 for weight (3% change detectable)
k = 0.01 for length
k = 0.25 for sound frequency in mice

The same absolute difference (10 units) is easy to detect at low magnitude (10 vs 20) and hard at high magnitude (110 vs 120). The detection probability curve runs sigmoidally from 0% (no difference) through 50% (the jnd) to 100% (clear difference).
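A worked example of Weber's law, using the weight fraction k = 0.03 from the list above. This is a sketch under simplifying assumptions: the gram values are invented, and the jnd is treated as a hard threshold, whereas the text describes a sigmoidal detection curve with the jnd at the 50% point.

```python
def jnd(k, magnitude):
    """Weber's law: just noticeable difference = k * stimulus magnitude."""
    return k * magnitude


def is_detectable(k, standard, comparison):
    """True if the two stimuli differ by at least one jnd of the standard."""
    return abs(comparison - standard) >= jnd(k, standard)


K_WEIGHT = 0.03  # Weber fraction for weight, from the list above

jnd_100g = jnd(K_WEIGHT, 100)    # about 3 g
jnd_1000g = jnd(K_WEIGHT, 1000)  # about 30 g

# The same 10 g difference at two different magnitudes.
light_pair = is_detectable(K_WEIGHT, 100, 110)
heavy_pair = is_detectable(K_WEIGHT, 1000, 1010)
```

The 10 g difference exceeds the jnd at 100 g but falls well short of it at 1000 g, which is why equal absolute differences become harder to detect as magnitude grows.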

1.6 Attention — reducing information load

Cherry's dichotic listening / shadowing: participant repeats one ear's story aloud and cannot report the content of the other ear. They do notice physical changes (voice pitch shift, sudden tones).

Broadbent's early-selection (filter) model: unattended channels are filtered out before semantic processing.

Three arguments AGAINST early selection

Breakthrough (Moray 1959)

Your own name in the ignored ear penetrates the filter — meaning must have been processed.

Switching (Treisman 1960)

When the shadowed story switches ears, participants follow it — meaning the ignored ear was being parsed.

GSR (Corteen & Wood 1972)

Words PARIS, LONDON, CAIRO conditioned to a shock. Later, ROME in the ignored ear evokes a fear response → semantic category "city" was activated.

Three alternatives to early selection

Model	Author	Claim
Late selection	Deutsch & Deutsch	All stimuli processed for meaning; ignored ones quickly forgotten.
Attenuation	Anne Treisman	Ignored info is attenuated, not blocked. Important info (your name) is spared.
Load theory	Nilli Lavie	Distractor processing depends on how much capacity the main task leaves over.

Self-test · Lecture 1

1. In Rescorla's (1973) habituation experiment with rats, the cognitivist (S–S) theory predicted that:

A. Habituated rats would still freeze to the light because the light→freezing bond is independent.
B. Habituated rats would NOT freeze to the light because it triggers the (now habituated) representation of the sound.
C. Habituated rats would freeze more strongly to the light due to dishabituation.
D. Habituation should generalize to all conditioned stimuli regardless of pairing.

Answer: B. The S–S theory says the CS activates a representation of the US; if that representation has been habituated, the CR disappears. Rescorla's data confirmed this.

2. Which reinforcement schedule is most resistant to extinction and famously exploited by slot machines?

A. Fixed-ratio
B. Fixed-interval
C. Variable-ratio
D. Continuous reinforcement

Answer: C. Variable-ratio. Because the number of responses needed for each reward is unpredictable, responding is extremely persistent.

3. Miller's (1956) "magical number" claims human channel capacity is approximately:

A. 3 bits (≈ 7 items), roughly modality-independent
B. 7 bits (≈ 128 items), strongly modality-dependent
C. 1 bit (binary), modality-dependent
D. 10 bits (≈ 1000 items), strongly visual

Answer: A. 3 bits ≈ 7 items, roughly independent of modality (measured via digit span and absolute judgment).

4. Corteen & Wood (1972) conditioned a galvanic skin response to city names. When a new city name appeared in the IGNORED ear, participants showed a GSR. This finding:

A. Supports Broadbent's strict early-selection filter
B. Demonstrates that unattended stimuli are processed semantically
C. Shows that GSR is unrelated to attention
D. Confirms Pavlovian extinction

Answer: B. The fact that the semantic category "city" generalized through the unattended channel shows meaning was extracted, which is an argument against strict early selection.

5. Skinner deliberately preferred the term reinforcement over Thorndike's term because:

A. It was newer terminology
B. "Satisfaction" implied a mentalistic / unobservable inner state, which behaviourism avoided
C. Thorndike's law was already discredited
D. It allowed Skinner to include classical conditioning under the same heading

Answer: B. "Satisfaction" suggests a felt inner state; "reinforcement" is behaviourally defined (whatever increases the response rate).

JLB Chapter 2

Three Milestones of Cognitive Science

Three foundational achievements: SHRDLU (language as algorithmic processing), the imagery debate (spatial vs propositional representation), and Marr's three levels (computational / algorithmic / implementational). Cognitive science matures by treating the mind as a system that operates over internal representations.

2.1 Language and micro-worlds

ELIZA · Weizenbaum, 1965

Keyword matching + transformation rules simulating a psychotherapist. Created "the illusion of understanding."
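Keyword matching plus transformation rules can be sketched in a few lines. The rules below are hypothetical examples written for this guide, not Weizenbaum's original script, and the sketch omits features of the real program such as pronoun swapping and keyword ranking.

```python
import re

# Hypothetical rules for illustration; these are not Weizenbaum's original script.
RULES = [
    (re.compile(r"\bI am (.+)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bI feel (.+)", re.IGNORECASE), "How long have you felt {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]
DEFAULT_REPLY = "Please go on."


def respond(utterance):
    """Return the reply of the first rule whose keyword pattern matches."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # Reuse the user's own words inside a canned template.
            return template.format(*match.groups())
    return DEFAULT_REPLY


reply_1 = respond("I am unhappy.")          # "Why do you say you are unhappy?"
reply_2 = respond("It is about my mother")  # "Tell me more about your mother."
reply_3 = respond("The weather is nice")    # "Please go on."
```

Nothing in the sketch represents what "unhappy" or "mother" means; the apparent understanding comes entirely from echoing matched text back to the user.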

"I had not realized that extremely short exposures to a relatively simple computer program could induce powerful delusional thinking in quite normal people."

PARRY · Colby, 1972 · Stanford

Simulated a paranoid schizophrenic — its inconsistencies made it more realistic. In a modified Turing test, 33 psychiatrists classified transcripts at only 48% accuracy (chance).

SHRDLU · Winograd, 1970 · MIT

Operated in a colored-blocks micro-world with a robot arm. Could parse and respond to queries like "Does the shortest thing the tallest pyramid's support supports support anything green?"

SHRDLU's three processing stages:

Syntactic analysis → Semantic analysis → Integration with world knowledge

Ambiguity demonstration: "Put the red cube on the block in the box." Two readings — [red cube on the block] / [in the box] vs [red cube] / [on the block in the box]. Syntactic parsing alone is insufficient; world knowledge is required to disambiguate.

2.2 The imagery debate — what is mental imagery?

Spatial / depictive

Mental images preserve metric/spatial properties. Same mechanisms as perception. Image of your room has actual layout.

vs.

Propositional

Mental images are symbolic, sentence-like ("the pizza was on the dining table"). No spatial format.

Evidence favouring the spatial view:

Detail recognition — to identify small details in an imagined object, you must "zoom in" — incompatible with propositional format.
Physical-property effects — brightness, contrast, motion speed affect reaction times the same way for perceived and imagined stimuli.
"Imagine your dinner" effect — people with bigger houses take longer to mentally answer spatial questions about them. It takes time to travel in our mind.
Mental rotation (Cooper & Shepard 1973) — reaction time to judge whether the letter R is normal or mirrored grows linearly with the angle of rotation, peaking near 180°.

Strong evidence imagery uses a spatial, perception-like format.

2.3 Marr's three levels of analysis

David Marr (1945–1980), neuroscientist of vision. Two prizes are named after him (IEEE/ICCV; Cognitive Science Society).

Level	Question	Example for vision
Computational	What problem is being solved? Input → output?	Recover 3-D structure from 2-D retinal image
Algorithmic	How is it solved? What representations and operations?	Edge detection → 2½-D sketch → 3-D model
Implementational	How is this physically realized?	Neurons in V1, V2, V4, IT

Exam trap: classic distractors swap the algorithmic and implementational levels. The algorithm describes representations and steps; the implementation describes the physical substrate.

2.4 Marr's three stages of vision

Primal sketch (edges, blobs) → 2½-D sketch (surface orientations, viewer-centered) → 3-D sketch (object-centered, generalized cones)

The 2½-D sketch is viewer-centered (depends on where you stand); the 3-D sketch is object-centered (invariant to viewpoint). The 2½-D stage uses stereopsis and Gestalt laws.

Contour-related demonstrations: Hidden Dalmatian (low contour image, hard to perceive) · Kanizsa's triangle (illusory contours filled in) · Camouflage = reducing contours to defeat the primal sketch.

2.5 Perceptual constancies

Size constancy

Two retinally same-sized people can be different real sizes — depth cues correct.

Brightness constancy

Patches A and B can reflect the same light but appear differently bright (the shaded one must be brighter).

Shape constancy

Shepard's tables: two tables look like the same shape on the retina but legs/perspective tell us one is actually longer.

2.6 Object categorization — four theories

Theory	Claim	Strength / weakness
Categorization by definition	Membership = necessary + sufficient features (cat = furry + meows + four legs…)	Fails for hairless, non-meowing cats: members often share only a family resemblance, not a common definition
Categorization by prototype	A category = an idealized average; typicality = closeness to prototype	Robin verified faster than penguin as "bird" (sentence verification task)
Categorization by exemplars	Encountered instances stored individually; typicality falls out of frequency	Best for small categories
Recognition-by-components (Biederman)	Objects = combinations of 36 geons (geometric ions)	Decomposition-based view; matches Marr's "generalized cones" idea

2.7 Categorization hierarchies — expertise effect

Global: "furniture" (large within-category difference) → Basic: "chair" (default level) → Specific: "Barcelona Chair" (small within-category difference)

Non-experts default to the basic level; experts categorize at a more specific level within their domain. Example: Barcelona Chair (Mies van der Rohe & Lilly Reich, 1929).

Self-test · Lecture 2

1. In Marr's framework, the question "what problem is being solved and why?" belongs to which level?

A. Implementational
B. Algorithmic
C. Computational
D. Connectionist

Answer: C. The computational level specifies the goal/function; the algorithmic level specifies representations + steps; the implementational level specifies the physical substrate.

2. Cooper & Shepard's (1973) mental rotation experiment supports:

A. Propositional theory of imagery
B. Spatial/depictive theory of imagery
C. The exemplar theory of categorization
D. Marr's 3-D sketch

Answer: B. The fact that reaction time grows with rotation angle implies that people actually rotate an image, which is incompatible with a purely propositional/symbolic representation.

3. Which of Marr's stages of vision is viewer-centered?

A. Primal sketch
B. 2½-D sketch
C. 3-D sketch
D. None (all are object-centered)

Answer: B. The 2½-D sketch encodes surface orientations from the viewer's position; the 3-D sketch is the viewpoint-invariant, object-centered description.

4. PARRY (Colby, 1972) showed that:

A. A simple program could pass a modified Turing test with psychiatrists (≈ 48% accuracy)
B. Schizophrenia is a purely computational disorder
C. Real understanding requires biological substrate
D. SHRDLU could not handle ambiguity

Answer: A. Psychiatrists could only classify transcripts at chance; PARRY's inconsistencies even helped its realism.

5. Biederman's recognition-by-components theory claims objects are composed of how many geons?

Answer: D. 36 geometric ions ("geons") that combine to form all objects.

6. The sentence "A robin is a bird" is verified faster than "A penguin is a bird." This is evidence for:

A. Definitional categorization
B. Prototype theory (typicality effects)
C. Biederman's geons
D. Family resemblance is irrelevant for birds

Answer: B. Robin is closer to the bird prototype than penguin, so verification is faster. Pure definitional categorization predicts no typicality difference.

JLB Chapter 3

The Turn to the Brain

From the immaterial soul to localized neural circuits. Phrenology is wrong in particulars but not absurd in spirit — some abilities really do map onto specific brain regions. The lecture surveys dualism vs materialism, basic neuroanatomy, and a parade of neuropsychological case studies that establish localization of function.

3.1 Philosophical roots

Dualism (Descartes 1596–1650)

Two substances: material body and immaterial soul. Animals (dogs) lack souls. The soul interacts with the body via the pineal gland. Only humans need a soul (for thinking, believing).

Critique: how does the immaterial interact with the material? Places mind outside science.

vs.

Materialism (Hobbes 1588–1679)

Everything is material; soul is a meaningless concept. All human behaviour = physical processes in the brain. Thought is anchored in neural firing.

The view that lets the brain become an object of science.

3.2 Psychiatry then and now

1247 — Bethlem Royal Hospital ("bedlam") founded. Closed institution; no treatment, just isolation.
18th century — rest, cleanliness, regularity. King George III of England (1738–1820) went mad and recovered — proof that mental illness can pass.
20th century — frontal lobotomy: removal/disconnection of the frontal lobe "to reduce the complexity of psychic life." Required a neurosurgeon.
Walter Freeman (1945) — transorbital lobotomy with an icepick through the eye socket. > 40,000 people in the US. Side effects: cognitive impairment, flattened affect.
Paul Broca (1824–1880) — founder of clinical neuropsychology. Showed that damage to Broca's area impairs speech but not other abilities — early empirical case for localization.

3.3 The brain in numbers (Carl Sagan's "very big place in a very small space")

≈ 1,300 g total mass
~ 10¹¹ neurons total (20 × 10⁹ neocortical)
~ 15 × 10¹³ cortical synapses
Two hemispheres joined by the corpus callosum; lateralized despite sensory/motor symmetry.

The neuron

Action potentials are binary (all-or-none); signals are frequency-modulated, not amplitude-modulated. Myelination (white matter) speeds transmission. Multiple sclerosis attacks myelin.

Neurotransmitters

Neurotransmitter	Primary function
Acetylcholine	Muscle contraction
Serotonin	Sleep, mood, arousal
Glutamate	Learning, memory (excitatory)
GABA	Inhibitory transmitter
Norepinephrine	Arousal, wakefulness
Dopamine	Motivation, emotion

3.4 Aphasias — language disorders

Aphasia affects ~35% of stroke patients.

Broca's aphasia (non-fluent)

Speech is effortful, telegraphic, but meaningful. "Cat… sit… mat." Comprehension largely preserved. Damage to Broca's area (frontal lobe, BA 44).

vs.

Wernicke's aphasia (fluent)

Speech is fluent but lacks content/meaning ("word salad"). Damage to Wernicke's area (temporal lobe).

3.5 Split-brain — corpus callosotomy

Cutting the corpus callosum treats refractory epilepsy. No major IQ, conversational, or coordination deficit. But experiments reveal striking dissociations:

Object presented to left visual field (→ right hemisphere) — patient says "There is no object."
But the left hand (also right-hemisphere) can pick out the same object from a collection perfectly.
Patients confabulate to explain their left hand's behaviour.

3.6 Catalogue of deficits

Unilateral spatial neglect

Not blind — but ignores one side (usually left). Misses left side of objects when drawing; makes too many right turns; forgets the tens/hundreds in mental arithmetic. Can affect peri-personal space and even visual imagery. Implicit knowledge about neglected items may be preserved.

Visual agnosia

Cannot recognize visually presented objects despite intact vision, memory, language, and intelligence. Two flavours:

Form agnosia — cannot perceive shape
Integrative agnosia — perceives shapes but cannot integrate them. Patient HJA (Humphreys & Riddoch) could only describe a lion by inferring from parts: "A heavy, four-legged animal… these stripes mean something… I suppose it's a lion."

Blindsight

Cortical blindness (V1 damage) — no conscious vision — yet forced-choice responses are above chance for movement, orientation, even some shapes. "How can I look at something that I haven't seen?" Evidence for two visual systems: a primitive unconscious one and a conscious cortical one.

Prosopagnosia (face blindness)

Cannot recognize faces. Compensation via voice, gait, glasses, hair. Often bilateral damage to the fusiform face area (FFA).

3.7 Two visual streams

Dorsal "where/how"

Action stream. Damage: neglect, apraxia, erratic grasping. The "what pathway" is intact — patient can name and describe objects but can't grasp them properly.

vs.

Ventral "what"

Perception stream. Damage: visual agnosia, prosopagnosia. The "where pathway" is intact — patient can post a card "as if mailing a letter" through a rotated slot, but cannot report the slot's orientation.

📘 Textbook adds — Petersen et al. (1988): PET subtraction logic

An early functional-neuroimaging landmark for single-word processing. Subjects performed a hierarchy of four tasks; each scan was subtracted from the one above to isolate the new component:

Fixation (baseline) − Passive viewing of words (+ visual) − Speak the words (+ articulation) − Generate verb for each noun (+ semantic access)

The pattern of activations supported a parallel (not strictly serial) model of single-word processing. Significance: this study established the paired-subtraction paradigm that all later fMRI cognitive subtraction designs inherited.

📘 Textbook adds — Logothetis (2001): what does BOLD actually measure?

The crucial follow-up question after the fMRI revolution: BOLD measures blood oxygenation, but is that correlated with neuronal output (spikes) or input (synaptic activity)? Logothetis recorded fMRI and non-magnetic microelectrode signals simultaneously from an anaesthetised monkey's V1 during a rotating-checkerboard stimulus.

Signal	What it indexes	Correlated with BOLD?
Single-unit / multi-unit firing (SDF, MUA)	Neural output (spikes)	Adapts after 2 s — decouples from BOLD
Local Field Potential (LFP)	Neural input (summed synaptic activity, low-pass-filtered)	Tracks BOLD throughout the trial

Implication: fMRI activation in a region reflects information arriving there, not necessarily spikes leaving. A region can show BOLD without firing more — undermines naive "this region does X" inferences.

3.8 Capgras delusion — the inverse of prosopagnosia

Delusion that a loved one has been replaced by an identical-looking impostor. Explicit recognition is intact, but the implicit/emotional response is missing. Confabulation: "This person looks like my partner, but I don't feel the same about them — so it must be someone else."

	Explicit recognition	Implicit/emotional recognition
Prosopagnosia	❌ lost	✅ intact
Capgras	✅ intact	❌ lost

Self-test · Lecture 3

1. Descartes thought mind and body interact at the:

A. Hippocampus
B. Pineal gland
C. Corpus callosum
D. Thalamus

Answer: B. The pineal gland was Descartes' proposed locus of mind–body interaction.

2. A patient with fluent but meaningless speech ("word salad") most likely has:

A. Broca's aphasia
B. Wernicke's aphasia
C. Prosopagnosia
D. Blindsight

Answer: B. Wernicke's = fluent, content-poor. Broca's = effortful, telegraphic but meaningful.

3. A patient cannot report the orientation of a rotated slot, yet can post a card through it perfectly. This dissociation supports:

A. Capgras delusion is a single-pathway disorder
B. Ventral pathway = perception; Dorsal pathway = action (action is preserved)
C. Dorsal damage causes visual agnosia
D. Neglect affects only the left hemifield

Answer: B. Visual agnosia patients (ventral damage) can act on stimuli they cannot consciously identify, as in Milner & Goodale's two-streams model.

4. Action potentials are best characterized as:

A. Amplitude-modulated and graded
B. Binary and frequency-modulated
C. Continuous chemical signals
D. Always inhibitory

Answer: B. Spikes are all-or-none; signal strength is encoded in firing rate.

5. Capgras delusion is best described as:

A. Loss of both explicit and implicit recognition of faces
B. Loss of explicit recognition with intact emotional recognition
C. Intact explicit recognition with lost emotional/implicit recognition
D. A pure form of prosopagnosia

Answer: C. Capgras = inverse of prosopagnosia. Recognition is intact; the emotional resonance is gone, so patients infer an impostor.

6. In a split-brain patient, an object shown only to the left visual field will:

A. Be verbally named correctly
B. Not be named verbally, but the left hand can correctly select it
C. Cause complete blindness in that hemifield
D. Trigger seizures

Answer: B. Left visual field → right hemisphere (which controls the left hand but lacks dominant language). The patient says there's no object but selects it correctly with the left hand, and often confabulates.

JLB Chapter 9

Strategies for Brain Mapping

No single neuroscientific technique sees the whole picture. Each method trades off temporal resolution against spatial resolution. Mastery means knowing which tool to reach for, what it measures, and the conceptual logic of subtraction and double dissociation.

4.1 Anatomical classification

Surface classification — gross anatomy (gyri, sulci, lobes).
Cellular classification (Brodmann) — Brodmann used staining to identify ~52 areas with distinct neuronal populations. Principle of Segregation: "Cerebral cortex can be classified into different areas with unique neuronal populations."

Brodmann area	Function
BA 1–3	Primary somatosensory cortex
BA 4	Primary motor cortex
BA 17	Primary visual cortex (V1)
BA 44	Broca's area (language production)

DTI (diffusion tensor imaging, the basis of diffusion tractography) = MRI-based visualization of white-matter fibre tracts. It measures anatomical connectivity, not function.

4.2 The big methodological tradeoff

Use high-temporal tools (EEG/MEG) for attention/language/decision; high-spatial tools (fMRI/PET) for memory/face recognition.

Technique	Directly measures	Temporal	Spatial
Single-unit recording	Action potentials of individual neurons	High	High
EEG	Electrical activity of large neural populations (scalp)	High	Low
MEG	Magnetic fields from electrical population activity	High	Low–medium
PET	Cerebral blood flow	Low	High
fMRI	Blood oxygen levels (BOLD)	Low	High

Exam-critical: fMRI does NOT measure neural activity directly. It measures the BOLD signal: "changes in magnetic properties of haemoglobin in the blood due to brain activation."

4.3 ERP components

An ERP (event-related potential) is a time-locked average of EEG to a specific event.

Component	Latency	Indexes
P1 / N1	~ 100–150 ms	Early sensory + attention (enhanced for attended stimuli)
P300	~ 300 ms	General cognitive processing / oddball detection
N400	~ 400 ms	Semantic processing (e.g., "He spread butter on his socks")
P600	~ 600 ms	Syntactic reanalysis
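The "time-locked average" behind every component in the table can be sketched with simulated data. Everything numeric here is an assumption made for illustration: the epoch length, trial count, noise level and the triangular "component" are invented, and no real EEG is involved.

```python
import random

random.seed(0)  # reproducible noise

EPOCH_LEN = 50     # samples per epoch (hypothetical)
N_TRIALS = 400     # number of stimulus presentations (hypothetical)
PEAK_SAMPLE = 30   # sample at which the simulated component peaks


def evoked_response(sample):
    """Simulated event-locked deflection: a triangular bump peaking at PEAK_SAMPLE."""
    return max(0.0, 5.0 - abs(sample - PEAK_SAMPLE))


def simulate_trial():
    """One epoch: the same evoked response plus large random background activity."""
    return [evoked_response(s) + random.gauss(0.0, 10.0) for s in range(EPOCH_LEN)]


def average_epochs(epochs):
    """Time-locked average: mean across trials at each sample after the event."""
    n = len(epochs)
    return [sum(epoch[s] for epoch in epochs) / n for s in range(len(epochs[0]))]


epochs = [simulate_trial() for _ in range(N_TRIALS)]
erp = average_epochs(epochs)
peak_at = max(range(EPOCH_LEN), key=lambda s: erp[s])  # sample with the largest mean
```

On a single trial the background noise is larger than the evoked response. Because the noise is not time-locked to the event, it averages toward zero across trials, while the event-locked deflection survives. That is why ERP components only become visible after averaging many trials.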

4.4 The Locus of Selection problem

Does attention modulate processing before or after a stimulus representation is built?

ERP timing: P1 and N1 are enhanced for attended stimuli — early modulation.
Macaque microelectrode recordings localize attentional modulation to V1–V4, before object recognition or access to meaning.
Resolution: combining ERP (timing) with microelectrodes (location) shows attention acts early, in pre-representational visual cortex.

4.5 fMRI logic — subtraction & hierarchical design

To localize a function, contrast a task that requires it against a near-identical task that doesn't. Hierarchical lexical-access design:

Fixation (control) − Viewing words (visual) − Speaking words (+ articulation) − Generating verbs (+ semantics)

Each contrast isolates the new component recruited at that step.
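The subtraction logic can be sketched with plain sets. This is a toy model: real analyses subtract voxel-wise activation maps, and the sketch assumes that each task adds exactly one processing component without altering the others. The component labels follow the hierarchy above; the variable and function names are hypothetical.

```python
# Processing components assumed to be recruited by each task in the hierarchy.
TASKS = [
    ("fixation", set()),
    ("viewing words", {"visual"}),
    ("speaking words", {"visual", "articulation"}),
    ("generating verbs", {"visual", "articulation", "semantics"}),
]


def subtract(task, control):
    """Components present in the task but not in its control condition."""
    return task - control


# Contrast every task with the task one step below it in the hierarchy.
contrasts = {}
for (control_name, control), (task_name, task) in zip(TASKS, TASKS[1:]):
    contrasts[f"{task_name} - {control_name}"] = subtract(task, control)
```

Each contrast returns a single component, for example "generating verbs - speaking words" leaves only "semantics". If a task changed how earlier components operate, that assumption would fail and the subtraction would no longer isolate one process cleanly.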

4.6 Owen et al. (2006) — detecting awareness in the vegetative state

Published in Science 313:1402. A patient diagnosed with unresponsive wakefulness syndrome (UWS, also called the vegetative state) was asked to imagine either playing tennis (engages motor/SMA) or walking through her house (engages parahippocampal regions). She produced the appropriate, task-specific BOLD activation on command, demonstrating covert awareness despite the absence of any behavioural response. This paradigm became a yes/no communication channel for "vegetative" patients.

Self-test · Lecture 4

1. fMRI directly measures:

A. Action potentials of individual neurons
B. Changes in haemoglobin's magnetic properties (BOLD signal)
C. Cerebral blood flow via radioactive tracers
D. Magnetic fields generated by neural firing

Answer: B. BOLD = blood-oxygen-level-dependent signal. Note: that's not the same as PET (blood flow) or MEG (magnetic fields).

2. You want to study the time-course of semantic violation. Which technique?

A. fMRI
B. PET
C. EEG (looking at the N400)
D. DTI

Answer: C. EEG has high temporal resolution, and the N400 is the canonical index of semantic anomaly.

3. Which Brodmann area corresponds to Broca's area?

A. BA 4
B. BA 17
C. BA 44
D. BA 1–3

Answer: C. BA 44 = Broca's area. BA 4 = primary motor; BA 17 = primary visual (V1); BA 1–3 = somatosensory.

4. Owen et al. (2006) demonstrated covert awareness in a vegetative-state patient by using which paradigm?

A. EEG recordings of N400 during speech
B. fMRI during imagined tennis vs imagined spatial navigation
C. PET imaging of dopamine receptors
D. Single-unit recording in V1

Answer: B. Tennis activates motor/SMA, spatial imagery activates parahippocampal regions; different, task-specific BOLD patterns are evidence of conscious task performance.

5. The combination of ERP timing + microelectrode localization resolved the locus-of-selection problem by showing:

Attention modulates visual processing late, after object recognition
Attention modulates visual processing early, in V1–V4 before recognition
Attention is purely a frontal phenomenon
Attention does not modulate sensory areas at all

Show answer

B. P1/N1 enhancement + macaque microelectrodes both localize the effect to early visual cortex.

JLB Chapter 5

Connectionism

Two starting points in Marr's hierarchy give rise to two paradigms. Symbolic AI starts from the mind (algorithms, interpretable rules). Connectionism starts from the brain (biology has already produced intelligence). The result: networks whose knowledge is a pattern of weights, not a list of beliefs.

5.1 Two starting points

Symbolic AI (top-down)

Start at the algorithmic level. Explicit symbol manipulation. Interpretable and fittable. Cognition as rule-based operations on discrete symbols (the Physical Symbol System hypothesis).

vs.

Connectionism (bottom-up)

Start at the implementational level. Biology has produced intelligence — use it as inspiration. Deep nets are powerful but hard to interpret. Knowledge is in the weight vector.

📘 Textbook adds — The Physical Symbol System Hypothesis (Newell & Simon, 1976)

The lecture's "Symbolic AI" pole is grounded in a specific thesis the textbook treats as foundational. Allen Newell & Herbert Simon (Turing Award lecture, 1976):

"A physical symbol system has the necessary and sufficient means for general intelligent action."— Newell & Simon, 1976

Two claims packed in: (i) necessity — anything intelligent must be a physical symbol system; (ii) sufficiency — building one is enough to produce intelligence.

Four defining features of a physical symbol system

Symbols are physical patterns (inscriptions on a tape, voltage states, neural firings).
Symbols can be combined into complex structures via recursive rules (like sentences in propositional logic).
The system contains processes that transform symbol structures in rule-governed ways — this is thinking.
Those transformation processes can themselves be represented as symbols within the system (meta-representation).

Cognition, on this view, is heuristic search through a problem space. Newell & Simon's General Problem Solver (GPS) applied means–end analysis: compute the difference between the current and goal state, pick an operator that reduces it, apply, repeat. The PSSH defines what connectionism rejects.

5.2 From biology to schematic neurons

A unit computes Σ(weight × input), passes the sum through an activation function (Heaviside / ReLU / sigmoid), and emits an output. Knowledge lives in the weights.

5.3 Learning — the delta rule (single-layer)

error ε = (desired output − actual output)
Δ T = − ε (scaled by a learning rate)
Δ Wᵢ = ε · Iᵢ (scaled by a learning rate)

The perceptron convergence rule: training will find a solution in every case where a solution is possible. But which functions ARE possible?
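A minimal sketch of the delta rule for a single threshold unit, trained here on AND (a linearly separable function). The function names, the learning rate of 1, the zero starting weights and the epoch limit are illustrative assumptions, not values from the lecture.

```python
def step(weights, threshold, inputs):
    """Fire (1) if the weighted input sum exceeds the threshold, else 0."""
    total = sum(w * i for w, i in zip(weights, inputs))
    return 1 if total > threshold else 0


def train_delta_rule(examples, rate=1, max_epochs=50):
    """Train one threshold unit; returns (weights, threshold, converged)."""
    weights = [0] * len(examples[0][0])
    threshold = 0
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, desired in examples:
            error = desired - step(weights, threshold, inputs)  # epsilon
            if error != 0:
                mistakes += 1
                # Delta W_i = rate * error * I_i ; Delta T = -rate * error
                weights = [w + rate * error * i for w, i in zip(weights, inputs)]
                threshold -= rate * error
        if mistakes == 0:  # a full pass with no errors: a solution was found
            return weights, threshold, True
    return weights, threshold, False


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
and_weights, and_threshold, and_converged = train_delta_rule(AND)
and_outputs = [step(and_weights, and_threshold, x) for x, _ in AND]
print(and_converged, and_outputs)
```

Integer weights and a rate of 1 keep the arithmetic exact, so the comparison against the threshold is never affected by floating-point rounding.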

5.4 The XOR problem

The single-layer perceptron cannot learn XOR — it oscillates, never converges. Why?

I₁	I₂	XOR	Contradiction
1	0	1	(1·W₁) > T
0	1	1	(1·W₂) > T
1	1	0	but then (W₁+W₂) > T → output would also be 1. Impossible.

Perceptrons only learn linearly separable functions. XOR is not linearly separable — you cannot draw a single straight line that separates the (1,0) and (0,1) cases from (0,0) and (1,1).
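The contradiction in the table can be checked by brute force. The sketch below searches a coarse grid of weights and thresholds for a single threshold unit; the grid range and step size are arbitrary choices for illustration. A grid search is not a proof (the inequalities in the table are), but it makes the contrast with AND concrete.

```python
from itertools import product

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}


def count_solutions(truth_table, grid):
    """Count (W1, W2, T) triples on the grid that reproduce the truth table."""
    found = 0
    for w1, w2, t in product(grid, repeat=3):
        if all((1 if w1 * i1 + w2 * i2 > t else 0) == out
               for (i1, i2), out in truth_table.items()):
            found += 1
    return found


grid = [x / 2 for x in range(-6, 7)]  # -3.0 to 3.0 in steps of 0.5
and_solutions = count_solutions(AND, grid)
xor_solutions = count_solutions(XOR, grid)
print(and_solutions > 0, xor_solutions)
```

AND has many solutions on the grid (for example W₁ = W₂ = 1, T = 1), while XOR has none, as the inequalities require.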

5.5 The escape route — multi-layer networks & backpropagation

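Universal approximation theorem (as phrased in the lecture): a multi-layer network can compute any Turing-computable function. Strictly, the theorem says that a network with enough hidden units can approximate any continuous function on a bounded domain to arbitrary accuracy.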
But the perceptron rule no longer works — hidden units have no target activation.
Backpropagation calculates each hidden unit's "share of responsibility" for the output error and uses it to update weights.
Gradient descent: follow the negative gradient of the error surface; stop when the gradient is zero. Risk: local minima ≠ global minimum.
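To see why a hidden layer removes the XOR obstacle, here is a sketch of a two-layer threshold network with hand-set weights. The weights are chosen by hand for illustration, not learned by backpropagation: one hidden unit behaves like OR, the other like AND, and the output unit fires when OR is on but AND is off.

```python
def fires(weighted_sum, threshold):
    """Heaviside activation: 1 if the sum exceeds the threshold, else 0."""
    return 1 if weighted_sum > threshold else 0


def xor_network(i1, i2):
    """Two hidden threshold units feeding one output threshold unit."""
    h_or = fires(1 * i1 + 1 * i2, 0.5)    # on if at least one input is on
    h_and = fires(1 * i1 + 1 * i2, 1.5)   # on only if both inputs are on
    return fires(1 * h_or - 1 * h_and, 0.5)


xor_outputs = [xor_network(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(xor_outputs)
```

The hidden units re-describe the inputs so that the output unit faces a linearly separable problem. Backpropagation is the procedure that would find weights of this kind automatically.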

5.6 Biological plausibility — the critiques

Schematic neurons ≠ real ones; questions of parallelism and scale.
No evidence backpropagation occurs in the brain.
How would the brain set the number of hidden units?
No evidence individual neurons receive error signals from all downstream neurons.
Most biological learning is not supervised.

5.7 Cognitive implications — distributed vs local representations

Knowledge lies in a pattern of weights, not in any one unit.
A trained network does not need a separate unit per feature.
Processing = input vector × weight vector. No discrete beliefs, no explicit rules.
Algorithmic in a limited sense: the learning rule and activation function are algorithms — but they're not task-specific and they don't operate over explicit representations.

Conclusion: "The nature of representations and computation in neural networks is fundamentally different compared to physical symbol systems."

Self-test · Lecture 5

1. Which Boolean function CANNOT be learned by a single-layer perceptron?

AND
OR
NAND
XOR

Show answer

D. XOR is not linearly separable. AND, OR, NAND all are.

2. The universal approximation theorem says that multi-layer networks can:

Be trained by the perceptron convergence rule
Always find the global minimum
Compute any Turing-computable function (given enough hidden units)
Encode any function in a single unit

Show answer

C. Multilayer = universal function approximator. But the perceptron rule fails — you need backpropagation.

3. In the delta rule, the update for a weight is:

ΔWᵢ = ε · Iᵢ (scaled by learning rate)
ΔWᵢ = −ε
ΔWᵢ = T · Iᵢ
ΔWᵢ = Wᵢ²

Show answer

A. Weight changes scale with both the error and the input that drove it. Threshold updates by −ε.

4. The strongest biological-plausibility critique of backpropagation is:

It only works for linearly separable functions
It produces local minima too often
There is no evidence the brain implements it (no mechanism for propagating error signals through every synapse)
It is slower than the perceptron rule

Show answer

C. Backprop requires each neuron to receive precise error signals from all downstream neurons — biology has no known mechanism for this. Plus most learning isn't supervised.

5. Distributed (vs. localist) representations in connectionist networks mean:

Each unit represents one feature
Knowledge lies in the pattern of weights across many units
Information is stored explicitly as symbols
Each layer represents a different category

Show answer

B. A localist scheme uses one unit per feature; distributed schemes encode features across overlapping populations — the source of NN power.

JLB Chapters 6 & 8

Modularity of Mind & Dynamical Systems

Three rival pictures of mental architecture: (1) Fodor's classical modularity — domain-specific input modules plus a non-modular central system; (2) massive modularity (Cisek-style evolutionary view) — no central processor at all; (3) dynamical systems theory — cognition as a process that evolves in time, possibly without representations or computation.

6.1 Agents: three tiers

Reflex agents

IF–THEN production rules. Not a cognitive system. No information processing — just acting on information. Examples: thermostat, zebrafish C-start reflex, somatic reflex.

Goal-based agents

Evaluate consequences of possible actions in light of goals (foraging). No learning.

Learning agents

Detect errors. Experiment with new strategies in light of past failures.

6.2 Classical (Fodorian) modularity

Aristotelian roots: horizontal faculties (perception, attention, memory) are domain-general; vertical faculties are domain-specific (colour, shape, face/voice, grammar, conspecific recognition).

Input modules (Fodor)

Domain-specific
Mandatory
Information-encapsulated
Fast
Fixed neural architecture
Specific breakdown patterns

vs.

Central processing (Fodor)

Domain-general
Information-un-encapsulated (isotropic)
Slow
Voluntary control
Diffuse neural structures
Personal-level propositional attitudes

Evidence cited: lesion studies, Broca's vs Wernicke's aphasia, brain mapping.

6.3 Massive modularity (Cisek 2019)

The radical alternative: there is no domain-general central processor. The mind is hundreds or thousands of genetically specified Darwinian modules selected for specific adaptive problems.

"The most important thing about the brain is that it evolved."
Domain-general learning mechanisms cannot detect statistically recurrent domain-specific structure.
Each module exploits specialized, domain-specific rules.
Descriptive vs pragmatic representations: control loops only need action-oriented (pragmatic) representations, not world models.
Input–output functionalism ignores the cyclical nature of behaviour.
No single decision-making system — just domain-specific competition mechanisms.
Conceptual maps emerge from learning on top of sensorimotor loops — no symbol-grounding problem.

📘 Textbook adds — The cheater-detection module: Wason & Cosmides/Tooby

The textbook's flagship case study for a Darwinian module. The Wason selection task: four cards (E, K, 4, 7); rule "If a card has a vowel on one side, then it has an even number on the other". Which cards must be turned to test the rule? Correct answer: E (modus ponens: a vowel must have an even number on the back) and 7 (modus tollens: an odd number must not have a vowel on the back). Most subjects say E and 4, a famous failure of abstract conditional reasoning.

Griggs & Cox (1982) reframed the same logical task as a deontic conditional: "If a person is drinking beer, then that person must be over 19" with cards BEER, COKE, 16, 25. Now subjects answer correctly (BEER, 16) at near-ceiling rates.

Cosmides & Tooby argued the improvement reveals a domain-specific, evolved cheater-detection module for social-exchange reasoning. The argument links to the evolution of cooperation via the TIT FOR TAT strategy in indefinitely-iterated prisoner's dilemmas: applying TIT FOR TAT requires identifying defectors, so natural selection would favour a module specialised for spotting them.

Two general arguments for massive modularity	Cosmides & Tooby's claim
Argument from error	Fitness criteria are domain-specific (treating kin, finding mates, detecting cheaters all differ) → no domain-general cognitive mechanism could have evolved.
Argument from statistics & learning	Domain-general learning mechanisms cannot detect statistically recurrent domain-specific patterns (e.g., Hamilton's kin-selection equation).

6.4 Dynamical systems theory (van Gelder 1995)

"What might cognition be, if not computation?"— Tim van Gelder, 1995

Cognition as a process that evolves through time, not necessarily involving computation or representations.

Traditional cogsci

Cognition = information processing = manipulating representations. Discrete steps. Symbols, rules.

vs.

Dynamical systems

Cognition = continuous trajectory through state space. Described by difference equations (discrete) or differential equations (continuous).

State space = geometric space of all possible system states. Each independently varying quantity = one dimension.
Trajectory = path through state space from initial conditions.
Two senses of "dynamical system": trivial (anything that evolves in time) vs technical (analyzable with DST tools).

📘 Textbook adds — ACT-R as a hybrid architecture (Anderson, CMU)

Where Soar (Newell, Laird, Rosenbloom) is purely symbolic, ACT-R ("Adaptive Control of Thought — Rational") is the canonical hybrid architecture — symbolic and subsymbolic at once.

Symbolic layer

Chunks in declarative memory (knowledge-that, e.g., "7+6=13"). Production rules in procedural memory (knowledge-how, IF–THEN). All built from physical symbols.

Subsymbolic layer

Each production rule and chunk has a numerical activation/utility value. A pattern-matching module performs a Bayesian-style cost–benefit calculation to pick which rule fires next — no central executive.

Take-away: modularity and PSSH-style processing can coexist with neural-net-style subsymbolic selection. Cognitive architecture is not all-or-nothing.

6.5 Worked example 1 — Ising network model of depression (Cramer et al. 2016)

Traditional latent-variable view: gallstones cause nausea, abdominal pain, heartburn — a single hidden cause produces all symptoms.

The network view of psychopathology: symptoms are nodes (active = 1, inactive = 0) coupled by weights W_ij. Activation propagates via a logistic function. Stress = extra input to all nodes.

Depression evolves as a self-sustaining network of interacting symptoms.
Insight into cognition without representations or computations.
Same approach extends to bipolar disorder, generalized anxiety, attitude models.
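A deterministic sketch of the idea, under loud assumptions: the published model is stochastic, whereas this version simply iterates the logistic activation as a mean-field approximation. The number of symptoms, the uniform weight, the threshold and the stress level are all hypothetical values chosen to make the behaviour visible.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def settle(state, weight, threshold, stress, steps=200):
    """Synchronously update every symptom node `steps` times."""
    n = len(state)
    for _ in range(steps):
        state = [
            logistic(weight * sum(state[j] for j in range(n) if j != i)
                     - threshold + stress)
            for i in range(n)
        ]
    return state


def stress_episode(weight, n_symptoms=5, threshold=4.0, stress=3.0):
    """Mean symptom activation before, during and after a stress episode."""
    before = settle([0.0] * n_symptoms, weight, threshold, 0.0)
    during = settle(before, weight, threshold, stress)
    after = settle(during, weight, threshold, 0.0)
    return [sum(s) / n_symptoms for s in (before, during, after)]


strong = stress_episode(weight=2.0)  # strongly coupled symptoms
weak = stress_episode(weight=0.5)    # weakly coupled symptoms
print([round(v, 2) for v in strong], [round(v, 2) for v in weak])
```

With strong coupling the symptoms keep each other active after the stress is removed (the self-sustaining state). With weak coupling the same stress episode fades once the stress ends.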

6.6 Worked example 2 — Decision Field Theory (Busemeyer et al. 2019)

Choosing among multiple options (e.g., three phones differing in price, OS, battery, speed). Preference state P evolves over time:

P_i,t = λ · P_i,t−1 + V_i,t − [ Σ_j≠i V_j,t / (n − 1) ] + noise
(in words: leakage λ × previous preference + own valence − mean competitor valence + noise)

The dynamics of a connectionist accumulator predict preferences, response times, and choice proportions as emergent properties of system evolution, not computations on symbols.
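The preference update can be iterated directly. This is a simplified sketch: valences are held constant over time (in the full theory they fluctuate as attention shifts between attributes), and the valence numbers, leakage value and function name are hypothetical.

```python
import random


def dft_simulate(valences, leak=0.9, noise_sd=0.0, steps=200, seed=0):
    """Iterate P_i,t = leak * P_i,t-1 + V_i - mean(V_j, j != i) + noise."""
    rng = random.Random(seed)
    n = len(valences)
    prefs = [0.0] * n
    for _ in range(steps):
        updated = []
        for i in range(n):
            mean_competitors = sum(valences[j] for j in range(n) if j != i) / (n - 1)
            noise = rng.gauss(0.0, noise_sd) if noise_sd > 0 else 0.0
            updated.append(leak * prefs[i] + valences[i] - mean_competitors + noise)
        prefs = updated
    return prefs


phone_valences = [1.0, 0.5, 0.2]  # hypothetical valences for three phones
final_prefs = dft_simulate(phone_valences)
chosen = max(range(len(final_prefs)), key=final_prefs.__getitem__)
print([round(p, 2) for p in final_prefs], chosen)
```

Without noise each preference settles at (own valence − mean competitor valence) / (1 − λ). Setting `noise_sd` above zero and adding a decision threshold would be one way to obtain choice proportions and response times.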

Take-away framing: "The behavior of the system as a whole is of interest — less focus on the computations on underlying representations, or even on architecture."

Self-test · Lecture 6

1. Which of these is NOT one of Fodor's listed properties of input modules?

Domain-specific
Mandatory
Information-encapsulated
Isotropic

Show answer

D. Isotropic = un-encapsulated, holistic — Fodor's description of central cognition, not modules. Modules are encapsulated, domain-specific, mandatory, fast, fixed in neural architecture, and have specific breakdown patterns.

2. Massive modularity (Cisek-style) differs from Fodor's view by claiming:

There are no input modules at all
There is no domain-general central processor — the mind is modules all the way down
Modules cannot evolve
Symbol grounding is impossible

Show answer

B. Fodor keeps a non-modular central system; massive modularity eliminates it, replacing it with many Darwinian modules each solving a specific adaptive problem.

3. Van Gelder's banner question — "What might cognition be, if not computation?" — points to:

Massive modularity
Dynamical systems theory
Classical AI
The Physical Symbol System hypothesis

Show answer

B. DST views cognition as a time-evolving process, possibly without manipulating representations.

4. In the Ising network model of depression, "stress" is modelled as:

A reduction in weight strengths
Extra activation input to all symptom nodes
A change in the logistic activation function
An additional hidden node

Show answer

B. Stress = extra activity injected to all nodes. This can push the network into a self-sustaining depressed state.

5. In a dynamical-systems framework, the path of a system through all its possible states over time is called:

A representation
A trajectory through state space
An algorithm
A symbol grounding

Show answer

B. State space = geometric space of all possible states; trajectory = the path from initial conditions.

JLB Chapter 7

Bayesianism in Cognitive Science

Three ideas: (1) belief comes in degrees, (2) those degrees obey probability calculus, (3) learning = updating probabilities via Bayes' rule. The lecture's punchline: Bayesianism is the normative ideal — but humans systematically fail to reason like Bayesians.

7.1 The probability calculus rules

Probabilities ∈ [0, 1].
Impossible sentences = 0; necessary truths (2+2=4) = 1.
If P and Q are logically equivalent, p(P) = p(Q).
Negation: p(¬S) = 1 − p(S).
Disjunction (mutually exclusive): p(R ∨ S) = p(R) + p(S).
Conjunction (independent): p(R ∧ S) = p(R) × p(S).
Conditional: p(A | B) = p(A ∧ B) / p(B).

7.2 Bayes' rule

p(H | E) = p(E | H) · p(H) / p(E)
posterior = likelihood × prior / evidence

The denominator is computed by marginalization: p(E) = p(E|H)·p(H) + p(E|¬H)·p(¬H)

7.3 Why the laws are objectively correct — Dutch books

A Dutch book is a set of bets that (1) the subject considers fair given their personal probabilities, but that (2) guarantee they lose money no matter what. Anyone whose beliefs violate probability calculus can be Dutch-booked. Therefore: rational degrees of belief must obey probability calculus.

Sam example: Sam believes "2+2=4" with probability 90%. Offer: he gets $0.90 up front; he pays $1 if the bucket has 4 marbles. He thinks it's fair (by his own probabilities, EV = $0.90 − 0.9 × $1 = 0). But 2+2 IS 4, so he always pays out and loses $0.10.
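The arithmetic of the Sam example, as a small sketch (variable names are illustrative):

```python
belief_in_truth = 0.90   # Sam's personal probability that 2+2=4
received = 0.90          # dollars Sam is paid up front
pays_if_true = 1.00      # dollars Sam pays if the bucket has 4 marbles

# Expected value as Sam sees it, using his own (incoherent) probability.
ev_by_sams_lights = received - belief_in_truth * pays_if_true

# What actually happens: the proposition is certain, so he always pays.
actual_payoff = received - pays_if_true

print(ev_by_sams_lights, round(actual_payoff, 2))
```

The bet looks fair from the inside and is a guaranteed loss from the outside, which is the defining feature of a Dutch book.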

7.4 Worked example — ESP / clairvoyance

"Extraordinary claims require extraordinary evidence."— Carl Sagan (1978)

Clairvoyant correctly predicts 100 coin tosses. Should you believe in ESP?

P(predict | ESP) = 0.9
P(ESP) = 10⁻¹² (very skeptical prior)
P(predict | ¬ESP) = 2⁻¹⁰⁰ ≈ 7.9 × 10⁻³¹

Posterior P(ESP | predict) ≈ 1 − 10⁻¹⁸. Almost certain — until you add the "trick" hypothesis with P(trick) = 10⁻⁶:

P(¬ESP & trick | predict) / P(ESP & ¬trick | predict) ≈ 10⁶

It's a million times more likely you were tricked than that ESP is real. The moral: even a small prior on fraud dwarfs a far smaller prior on ESP, so evidence that rules out chance does not by itself establish the extraordinary claim.
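The same numbers in odds form (posterior odds = likelihood ratio × prior odds). One value is an assumption not given above: the sketch takes P(predict | trick) = 1, i.e. a trick always succeeds.

```python
p_esp = 1e-12
p_predict_given_esp = 0.9
p_predict_given_chance = 2.0 ** -100

# ESP versus pure chance.
odds_esp_vs_chance = (p_predict_given_esp * p_esp) / (
    p_predict_given_chance * (1 - p_esp))
p_not_esp = 1 / (1 + odds_esp_vs_chance)  # complement avoids rounding to 1.0

# Adding the trick hypothesis.
p_trick = 1e-6
p_predict_given_trick = 1.0  # assumption: a trick always produces the result
odds_trick_vs_esp = (p_predict_given_trick * p_trick) / (
    p_predict_given_esp * p_esp)

print(f"{p_not_esp:.1e}", f"{odds_trick_vs_esp:.1e}")
```

Working with the complement 1 − P(ESP | predict) matters in floating point, because the posterior itself is indistinguishable from 1.0.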

7.5 Worked example — COVID-19 base-rate problem

Prevalence 1/1000. False positive rate 5%. You test positive. P(disease | positive) = ?

Reason over 1000 people:

1 person actually has it (positive).
Of 999 healthy people, 5% test positive falsely ≈ 50.
51 positives total; only 1 is actually sick → ~2%, not 95%.

Why we still trust tests: in real life the sample is not random — there's a reason you were tested, so the relevant base rate is much higher.
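The head count above is Bayes' rule with the denominator computed by marginalization. In this sketch the test is assumed to catch every true case (sensitivity 1.0, as in the head count), and the 20% base rate for someone tested for a reason is a hypothetical figure.

```python
def posterior(prior, p_e_given_h, p_e_given_not_h):
    """p(H|E) = p(E|H) p(H) / [p(E|H) p(H) + p(E|not H) p(not H)]."""
    evidence = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / evidence


# Random screening: prevalence 1/1000, false positive rate 5%.
random_screening = posterior(prior=1 / 1000, p_e_given_h=1.0, p_e_given_not_h=0.05)

# Tested for a reason: hypothetical 20% base rate among symptomatic people.
tested_for_a_reason = posterior(prior=0.2, p_e_given_h=1.0, p_e_given_not_h=0.05)

print(round(random_screening, 3), round(tested_for_a_reason, 3))
```

The test is identical in both cases. Only the prior changes, and it moves the posterior from about 2% to over 80%.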

7.6 The transposed-conditional fallacy

p(A | B) ≠ p(B | A).

A = "is a white American man," B = "is a US senator"
p(A | B) ≈ 0.9 (most senators are white American men)
p(B | A) ≈ 0.0000009 (almost no white American men are senators: roughly 90 out of about 100 million)

📘 Textbook adds — Perception as Bayesian inference (Helmholtz → Hohwy)

The textbook frames Bayesianism in cognition as the modern formalisation of Hermann von Helmholtz's 19th-century proposal that perception is unconscious inference. The proximal sensory input radically underdetermines the distal world, so the brain must infer what is out there using stored knowledge about how the world tends to be.

Hypothesis (H) = candidate layout of the distal environment.
Evidence (E) = retinal stimulation.
Likelihood p(E | H) = how probable this image is given that layout.
Prior p(H) = how probable that layout is in general.
Gestalt principles (continuity, proximity, good form, common fate) function as Bayesian priors over scene structure.

Case study: Binocular rivalry (Hohwy, Roepstorff & Friston, 2008)

Present a red iron to the left eye and a green violin to the right eye. Perception alternates between the two — never a stable composite. Why?

H1 = red iron, H2 = green violin, H3 = composite "red-green iron-violin".
Likelihoods of the conflicting retinal input are roughly equal across H1–H3.
But p(H1) ≈ p(H2) ≫ p(H3) — the prior on composite objects is tiny.
Posteriors: H1 and H2 tied, H3 ruled out. The visual system flips between the two equally-supported hypotheses rather than averaging them.

Binocular rivalry is a rational Bayesian response, not a glitch — a key example for predictive coding theories of perception.

7.7 Bayesian search theory (MH370)

For each grid cell i: p_i (probability object is there), a_i (probability of finding it if there), c_i (cost of searching).

Optimal policy: argmax_i ( p_i · a_i / c_i )

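After each miss, update by Bayes' rule: the searched cell's probability drops (to zero only if a_i = 1), the other cells' probabilities rise, and the posterior is renormalized before recomputing the scores.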
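A sketch of one search-and-update cycle. The three-cell grid and its probabilities, detection rates and costs are made up for illustration; the update is the standard Bayesian one for an unsuccessful search.

```python
def next_cell(p, a, c):
    """Index of the cell maximizing p_i * a_i / c_i."""
    scores = [pi * ai / ci for pi, ai, ci in zip(p, a, c)]
    return max(range(len(p)), key=scores.__getitem__)


def update_after_miss(p, a, k):
    """Posterior over cells after searching cell k and finding nothing."""
    p_miss = 1 - p[k] * a[k]  # total probability of the unsuccessful search
    return [(pi * (1 - a[k]) if i == k else pi) / p_miss
            for i, pi in enumerate(p)]


p = [0.5, 0.3, 0.2]   # probability the object is in each cell
a = [0.8, 0.9, 0.5]   # probability of finding it if it is there
c = [1.0, 1.0, 1.0]   # cost of searching each cell

first = next_cell(p, a, c)
p_after = update_after_miss(p, a, first)
second = next_cell(p_after, a, c)
print(first, [round(x, 3) for x in p_after], second)
```

Note that the searched cell keeps some probability after a miss, because the search might have overlooked the object.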

7.8 Where humans fail — heuristics & biases

Availability heuristic

Judge probability by ease of recall. Therapist who just saw three depressed patients overestimates depression in the next.

Gambler's fallacy (predictable-world bias)

After 4 tails, you "feel" heads is due. But independent tosses have no memory.

Probability matching

Die with 4 red / 2 green sides. Maximizing (always red) → 67% correct. Matching (red 2/3, green 1/3) → 56%. Humans match; mice maximize. In stochastic processes, maximizing > matching.

Base-rate neglect

The COVID example above. Also: at the NY subway you see someone reading the NYT — better bet she has a PhD or no college degree? Far more non-graduates ride the subway, so no degree is the better bet.
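The two accuracy figures quoted for the red/green die under probability matching follow from a short calculation, sketched here:

```python
p_red = 4 / 6  # four red sides out of six

# Maximizing: always predict red, so you are right whenever red comes up.
maximizing_accuracy = p_red

# Matching: predict red with probability 2/3 and green with probability 1/3.
# You are right when prediction and outcome agree.
matching_accuracy = p_red * p_red + (1 - p_red) * (1 - p_red)

print(round(maximizing_accuracy, 2), round(matching_accuracy, 2))
```

Matching is correct with probability (2/3)² + (1/3)² = 5/9, which is where the 56% comes from.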

7.9 The Linda problem — conjunction fallacy (Tversky & Kahneman)

Linda is 31, single, outspoken, very bright. Majored in philosophy; concerned with discrimination/social justice; antinuclear protests. Rank the probability:

F. Linda is a bank teller
H. Linda is a bank teller and active in the feminist movement

Most people rank H > F. But "feminist bank tellers" are a strict subset of "bank tellers." Specifying more detail can only LOWER probability, never raise it.

7.10 Bayesian view of psychopathology

Schizophrenic delusions can involve affirming the consequent: "Jesus had stigmata; I have stigmata; therefore I am Jesus." Rokeach (1964) "The Three Christs of Ypsilanti" — three paranoid schizophrenic men each believing he was Jesus, housed together for two years; their beliefs barely shifted, showing the difficulty of revising delusional priors.

Self-test · Lecture 7

1. In Bayes' rule p(H|E) = p(E|H)·p(H)/p(E), the term p(H) is the:

Posterior
Likelihood
Prior
Marginal evidence

Show answer

C. p(H) = prior; p(E|H) = likelihood; p(H|E) = posterior; p(E) = marginal evidence.

2. A disease has prevalence 1/1000. A test has a 5% false positive rate. If you test positive (with 95% true-positive sensitivity), the chance you actually have the disease is closest to:

95%
50%
2%
0.1%

Show answer

C. Roughly 2%. Reason over 1000 people: 1 true positive vs ~50 false positives.

3. Why is "Linda is a bank teller AND a feminist" more probable to most people than "Linda is a bank teller"?

It actually IS more probable
The conjunction fallacy: people judge by representativeness, not probability
Feminist bank tellers form a superset
It is a Bayesian-correct judgment

Show answer

B. The set "feminist bank tellers" is a strict subset of "bank tellers" — by definition, it cannot be more probable. The intuition is driven by representativeness.

4. A subject is told that ALL Dutch books exploit a particular feature of their beliefs. Which feature?

Their willingness to gamble
Violations of probability calculus in their personal probabilities
Their use of the availability heuristic
Their priors being too small

Show answer

B. If your degrees of belief don't obey probability calculus, a Dutch book can be constructed against you — that's the philosophical justification for Bayesianism.

5. In a stochastic environment with 4 red / 2 green outcomes, the optimal strategy is:

Probability matching (predict red 2/3, green 1/3) → 56%
Maximizing (always predict red) → 67%
Always predict green
Predict whichever was last seen

Show answer

B. Maximizing yields 67% correct; matching only 56%. Humans tend to match (suboptimal). Gambling addicts especially.

6. Bayesian search theory says you should search the cell that maximizes:

p_i · a_i
p_i · a_i / c_i
p_i / a_i
c_i · a_i

Show answer

B. Maximize the product of "probability there" × "probability you'd find it if there", divided by the cost of searching.

JLB Chapter 10

Language Learning

Three paradigms applied to language: symbolic (Fodor's Language of Thought, Chomsky's innatism), connectionist (neural nets that reproduce children's overgeneralization), and Bayesian (statistical learning of word boundaries and anaphora). The lecture closes with the LLM revolution and what it implies about innateness.

8.1 What is language understanding?

Semantics — meaning of words.
Syntax — structure; surface vs deep structure.
"Colorless green ideas sleep furiously" → syntactically well-formed, semantically anomalous.
"John has hit the ball" / "The ball has been hit by John" → two surface structures, one deep structure.

Strong vs weak mastery: are linguistic rules explicitly represented in the head (strong sense, Fodor/Chomsky) or merely obeyed in behaviour (weak sense, connectionist/Bayesian)?

Key permission slip: "Rule-governed phenomena need not come from rule-governed information-processing structures." — this opens the door to connectionism and Bayesianism for language.

8.2 Fodor's Language of Thought (Mentalese)

Learning a language requires being able to evaluate truth conditions: "'The cat is on the mat' is true iff there's a cat and there's a mat and the cat is on the mat." Circularity problem: you can't learn this in English without already knowing what "cat" and "mat" are.

Solution: an innate symbolic medium — Mentalese. Slogan: "You cannot use the language you're learning to learn."

8.3 Nicaraguan Sign Language

In the 1970s, a school for deaf children in Nicaragua tried to teach Spanish by finger-spelling. Instead, the children spontaneously generated their own sign language. Documented by linguist Judy Kegl. Later generations added structural features like spatial modulation — evidence for innate language abilities.

8.4 Three paradigms for past-tense learning

The English past tense is a microcosm. Children show two features: (1) follow the "-ed" rule ("walked"); (2) handle exceptions ("gave"). Crucially, they make overgeneralization errors ("goed") that come and go in a gradual learning curve.

Dual-route (symbolic)

Two separate systems: (1) associative memory for irregulars; (2) an explicit "-ed" rule for regulars.

vs.

Plunkett & Marchman (1993)

One connectionist network: 20 input + 30 hidden + 20 output units; phonological input → phonological output. Reproduces overgeneralization and gradual learning — without explicit rules.

📘 Textbook adds — Rumelhart & McClelland (1986): the original past-tense network

Before Plunkett & Marchman there was the Rumelhart & McClelland (1986) PDP model — the founding connectionist past-tense network, published in the two-volume Parallel Distributed Processing.

	R&M (1986)	Plunkett & Marchman (1993)
Architecture	Simple pattern associator, no hidden units	20–30–20 with hidden layer
Input encoding	Wickelfeatures (after Wickelgren) — context-sensitive phoneme codes	Raw phonological input
Learning rule	Perceptron convergence	Backpropagation
Training regime	10 high-frequency verbs → suddenly expanded to 410 medium-frequency (80% regular)	20 verbs (half regular, half irregular), gradually expanded
Reproduced overgeneralization?	Yes — but Pinker & Prince argued it was baked in by the sudden vocabulary jump	Yes, without the question-begging schedule

The lineage matters because Pinker & Prince's critique of R&M is what motivated the dual-route symbolic model. The later Plunkett & Marchman result rebuts that critique on the connectionist side: overregularization emerges from co-presence of regulars and irregulars, not from training-set manipulation.

8.5 Bayesian language learning

(a) Word segmentation via transitional probabilities

For the sound string /k/ /ae/ /t/ /m/ /i/ /aʊ/ /z/ ("cat meows"):

p(/ae/ | /k/) — high (within-word transition)
p(/t/ | /ae/) — high
p(/m/ | /t/) — low — this dip signals a word boundary

Same logic scales up to word-level transitional probabilities for sentence boundaries.
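A sketch of how transitional probabilities can be estimated from a continuous stream. The three nonsense words and their ordering are hypothetical toy data, and the stream is coded in syllables rather than phonemes to keep it short.

```python
from collections import Counter


def transitional_probabilities(stream):
    """Estimate p(next | current) for every adjacent pair in the stream."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {(x, y): n / first_counts[x] for (x, y), n in pair_counts.items()}


# Hypothetical nonsense words, concatenated with no pauses between them.
words = {"A": ["bi", "da", "ku"], "B": ["pa", "do", "ti"], "C": ["go", "la", "bu"]}
order = "ABCACBA"
stream = [syllable for w in order for syllable in words[w]]

tp = transitional_probabilities(stream)
within_word = tp[("bi", "da")]   # inside the word "bidaku"
across_words = tp[("ku", "pa")]  # from the end of one word into the next
print(within_word, across_words)
```

Within a word the next syllable is fully predictable in this toy stream. At a word boundary the prediction is split across the possible following words, which produces the dip.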

(b) Pronominal anaphora (Lidz et al.)

"I'll play with this red ball, and you can play with that one." Does "one" refer to H1 = a ball or H2 = a red ball?

P(H | S) ∝ P(S | H) · P(H)
Children learn P(S | H) from experience.
Since P(S | H2) > P(S | H1), the most likely intended referent is "the red ball."

Reference: Lidz, Waxman & Freedman (2003), "What infants know about syntax but couldn't have learned."

8.6 LLMs — how they work

Token
≈ morpheme

→

Embedding
100s of dims, learned

→

Attention layers
weighted recombination

→

Next-token
prediction

Autoregression: feed each prediction back as input to predict the next.
Transformer paper: "Attention is all you need" (Vaswani et al., 2017).
Attention heads recode each token as a learned weighted combination of all tokens. Stacked many times (96 layers in GPT-3), purely feedforward.
The final encoding of the last token IS the prediction of the next token. Deterministic; randomness added post-hoc.
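A stripped-down sketch of the "weighted recombination" step. It uses scaled dot-product attention with the raw token vectors standing in for queries, keys and values. Real transformers apply learned projection matrices first and usually restrict each token to earlier positions; both are omitted here, and the three 2-dimensional token vectors are made up.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Recode each token as a weighted combination of all token vectors."""
    d = len(tokens[0])
    outputs, weights = [], []
    for query in tokens:
        # Similarity of this token to every token, scaled by sqrt(dimension).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
                  for key in tokens]
        w = softmax(scores)  # attention weights: non-negative, sum to 1
        weights.append(w)
        outputs.append([sum(w[j] * tokens[j][c] for j in range(len(tokens)))
                        for c in range(d)])
    return outputs, weights


toy_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
recoded, attention_weights = self_attention(toy_tokens)
print([[round(x, 2) for x in row] for row in attention_weights])
```

Everything in this step is deterministic. In a full model, randomness enters only afterwards, when a next token is sampled from the predicted distribution.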

Embedding arithmetic — words as vectors

biggest − big + small = smallest

Paris − France + Germany = Berlin

Doctor − man + woman = nurse (bias embedded)
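Embedding arithmetic can be sketched with hand-built vectors. The 2-dimensional embeddings below are hypothetical (one axis for big versus small, one for degree). Real embeddings have hundreds of learned dimensions, and the analogy holds only approximately.

```python
import math

# Hypothetical embeddings: (size direction, degree of comparison).
embeddings = {
    "big": (1.0, 0.0), "bigger": (1.0, 0.5), "biggest": (1.0, 1.0),
    "small": (-1.0, 0.0), "smaller": (-1.0, 0.5), "smallest": (-1.0, 1.0),
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def analogy(a, b, c):
    """Word nearest to a - b + c, excluding the three query words."""
    target = tuple(x - y + z for x, y, z in
                   zip(embeddings[a], embeddings[b], embeddings[c]))
    candidates = [w for w in embeddings if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(embeddings[w], target))


result = analogy("biggest", "big", "small")
print(result)
```

Subtracting "big" from "biggest" isolates the superlative direction, and adding it to "small" lands nearest to "smallest". The same mechanism reproduces whatever regularities the training text contains, including biased ones.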

Two-stage training

Pre-training: mask the next word in billions of internet texts; backprop until predictions improve. (Tends to complete rather than reply.)
RLHF (Reinforcement Learning from Human Feedback): humans rate outputs; network updated toward higher-rated predictions. Makes models conversational.

LLM knowledge ≈ long-term semantic memory (fuzzy, can hallucinate); LLM prompts ≈ working memory (relevant info inserted reduces hallucinations).

8.7 Big debates — does this refute Chomsky?

Piantadosi (2023)

"LLMs refute Chomsky." A pure text-prediction net acquires grammatical structure with no innate machinery. Proof of principle that syntax can be acquired without innate structure.

vs.

Bender et al. (2021) — Stochastic Parrots

"An LM is a system for haphazardly stitching together sequences of linguistic forms... according to probabilistic information about how they combine, but without any reference to meaning: a stochastic parrot."

Other critiques:

Grounding problem — LLMs don't know what a banana tastes like.
Mitchell & Krakauer (2023) — humans learn concepts; they abstract, reason compositionally and counterfactually, intervene on the world, and explain.
Guest & Martin (2023) — multiple realizability: same outputs do not imply same mechanism.
Training data asymmetry — GPT-3 saw ~4 × 10¹¹ words; a 5-year-old does causal reasoning on 4–5 orders of magnitude less input. Possible explanations: nativism, multi-modal grounding, active/social learning, or "comparing apples and pears."
Binz & Schulz (2023, PNAS) — gave GPT-3 the Wason selection task. It got the canonical version right but failed ~50% of trials, with human-like error patterns.

Self-test · Lecture 8

1. Fodor's circularity argument for the Language of Thought says:

Language can only be learned from a teacher
You cannot use the language you are learning to learn it — so the medium of learning must already be in place (Mentalese)
Children must first learn to speak English before they can think
Truth conditions are unlearnable

Show answer

B. The argument: evaluating truth conditions for "the cat is on the mat" already presupposes you know what cat/mat mean — so there must be an innate, prior, symbolic medium (Mentalese).

2. The Plunkett & Marchman (1993) connectionist past-tense model demonstrated:

Children do not actually overgeneralize
A single network without explicit rules can reproduce both overgeneralization errors and gradual learning curves
Past-tense learning requires the dual-route architecture
Symbolic AI is sufficient for language learning

Show answer

B. One net, no explicit rules, but the developmental signature of children appears anyway — challenges the necessity of the symbolic dual-route account.

3. Transitional probabilities help an infant segment speech because:

They are high within words and low across word boundaries
They are low within words and high across word boundaries
They are uniform across the speech stream
They depend on the listener's prior vocabulary

Show answer

A. Within "cat", p(/ae/ | /k/) is high; across the boundary to "meows", p(/m/ | /t/) is low. The dip is the cue.

4. The Stochastic Parrots paper (Bender, Gebru, McMillan-Major & Shmitchell 2021) argues that LLMs:

Have genuine semantic understanding
Refute Chomsky's nativism
Stitch linguistic forms together by probability without reference to meaning
Will eventually become AGI

Show answer

C. The paper's defining critique. Also raises ethical issues: energy cost, internet biases, documentation debt.

5. Why does the case of Nicaraguan Sign Language matter for the nativist position?

It shows that without explicit instruction, deaf children spontaneously generated a structured language with later generations adding features like spatial modulation
It shows that sign language is impossible without spoken language input
It refutes Chomsky
It demonstrates LLM emergence

Show answer

A. Documented by Judy Kegl — cited as evidence for innate language capacities.

6. In the "Attention is all you need" transformer architecture, the final-layer encoding of the last input token represents:

A hidden state to be discarded
The prediction of the next token
A summary of the entire vocabulary
An attention weight matrix

Show answer

B. The final encoding of the last token IS the next-token prediction. Output is deterministic; randomness is added post-hoc by sampling.
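A minimal sketch of where the randomness enters, assuming a hypothetical four-word vocabulary and made-up final-layer scores (real models score tens of thousands of tokens; the names `vocab`, `logits`, and `softmax` are illustrative). The scores and the resulting distribution are fixed for a given input; only the final draw is random.

```python
import math
import random

def softmax(logits):
    """Turn raw scores into a probability distribution over tokens."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical vocabulary and scores derived from the last token's final encoding.
vocab = ["mat", "sofa", "roof", "moon"]
logits = [2.0, 1.0, 0.5, -1.0]

probs = softmax(logits)                  # deterministic: same input, same distribution
greedy = vocab[probs.index(max(probs))]  # always the most probable token
rng = random.Random(0)                   # seeded so the sketch is repeatable
sampled = rng.choices(vocab, weights=probs, k=5)  # randomness is added only here
print(greedy, sampled)
```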

JLB Chapter 15

Consciousness

Consciousness has two distinct dimensions (wakefulness and awareness), splits into easy vs. hard problems (Chalmers), and admits multiple competing theories. The empirical methods of cognitive science can address the easy problems; whether they can address the hard problem is a matter of fierce debate.

9.1 Two dimensions of consciousness (Laureys 2005)

Wakefulness

A state. Gradual. Varies over time. Objectively measurable (EEG, behaviour).

vs.

Awareness

An experience. To be conscious of something. First-person; not externally observable.

State	Sleep/wake cycle?	Reactions?	Awareness?
Coma	No	Only reflexes	None
UWS (vegetative)	Yes	Autonomic, eye-opening, reflex behaviour	None (apparent)
MCS (minimally conscious)	Yes	+ some non-reflex movements (fixations, follow commands)	Some
LIS (locked-in)	Yes	Cannot move a muscle (minimal eye movement at most)	Fully awake and conscious

In a study of 54 UWS/MCS patients, 5 could wilfully modulate their brain activity on command, as detected with fMRI, which indicates covert awareness despite the clinical diagnosis.

9.2 The knowledge argument — Mary's Room (Jackson 1986)

"Mary is confined to a black-and-white room… she knows all the physical facts about us and our environment… It seems, however, that Mary does not know all that there is to know. For when she is let out of the black-and-white room or given a color television, she will learn what it is to see something red…"— Frank Jackson, 1986

Formal argument:

Inside the room, Mary has complete knowledge of how the brain processes colour.
So she knows everything about the information-processing of red.
When she leaves the room, she acquires new knowledge — what red is like.
Therefore, some aspects of conscious experience cannot be understood in terms of information processing.

9.3 Non-conscious processing — priming & dissociations

Strategy: contrast processing that works without awareness with processing that requires it.

Face/Tool priming: categorization is faster when the prime is congruent (face primes face).
Word priming: "DOG" is recognized faster after "CAT" than after "CAR" — the priming is semantic, not visual.
Non-conscious priming fades quickly as the stimulus onset asynchrony (SOA) between prime and target grows; consciousness allows information to be retained over time.
Double dissociation — in disorder 1: A intact, B impaired; in disorder 2: B intact, A impaired. Strong evidence for separable mechanisms.

Neglect

Lesions to right parietal/frontal cortex. Patients lack awareness of contralesional (typically left) space, yet implicit processing of neglected information can occur.

Blindsight

Lesions to V1. Patients are aware they cannot see ("How can I look at something I haven't seen?") yet, when forced to guess, are correct above chance for movement, orientation, even some shapes.

9.4 What is consciousness for?

Patients in blindsight or neglect never voluntarily act on stimuli in their affected field. The lecture's claim:

Consciousness permits identification of targets and planning of deliberate, voluntary action.

Milner & Goodale: two visual streams revisited

Dorsal — vision for action

Online motor control. Non-conscious. Not fooled by the Ebbinghaus illusion — grip aperture is veridical even when perceived size is illusory.

vs.

Ventral — vision for perception

Conscious perception, deliberate-action planning. Damaged in patient D.F. (visual form agnosia) — she can post a card into a slot but cannot report its orientation.

Fang & He (2005, Nature Neuroscience): interocular suppression renders a stimulus invisible. Result — robust dorsal activity even when the stimulus is invisible, but ventral activity tracks conscious perception. Conscious awareness is restricted to the ventral pathway.

9.5 Block's distinction

Phenomenal consciousness (P)

Raw experience. Qualia — the felt redness of red, the painfulness of pain. The "what it is like" aspect.

vs.

Access consciousness (A)

Reportable. Direct control of thought, reasoning, speech, action. Information available to the global workspace.

9.6 Chalmers — easy vs hard problems

Easy problems

Discrimination, categorization, reaction to environment
Integration of information
Reportability of mental states
Internal-state access
Focus of attention
Deliberate behavioural control
Wakefulness vs sleep

The Hard problem

"There is something it is like to be a conscious organism… the felt quality of redness, the sound of a clarinet, the smell of mothballs… What unites all of these states is that there is something it is like to be in them."— Chalmers, 1995

Why is there subjective experience at all? Why isn't the information processing happening "in the dark"?

9.7 Three theories of consciousness

Global (Neuronal) Workspace

(Baars · Dehaene & Changeux). Theatre metaphor: attention spotlights content on stage; the audience receives the broadcast; backstage processes shape what gets in. Hierarchical, distributed architecture broadcasts integrated info brain-wide → reportability. Pyramidal neurons may form the substrate. Claims to dissolve the hard problem.

Integrated Information Theory (IIT)

(Tononi). Consciousness = integrated information (Φ). A system is conscious to the extent it has irreducible cause-effect structure. Named in lecture without detailed Φ calculation.

Recurrent Processing (RP)

(Lamme). Recurrent loops in sensory cortex generate phenomenal experience. Named only.

📘 Textbook adds — Higher-Order Theories (Rosenthal, Armstrong, Lycan, Carruthers)

The textbook treats HOT as one of the major theories alongside GWT, IIT, and recurrent processing — the lecture only names it. Core claim: a mental state is conscious iff it is the object of a suitable higher-order mental state. The very same first-order state can be conscious at one moment and nonconscious at another, depending on whether something higher is "watching" it.

HOP — Higher-Order Perception (Armstrong, Lycan)

A first-order state becomes conscious when an inner sense (introspection / a quasi-perceptual scanner) targets it. Objection: defining inner sense via "awareness" risks circularity; sensory representations of abstract thoughts (the Pythagorean theorem) seem implausible.

vs.

HOT — Higher-Order Thought (Rosenthal)

A first-order state is conscious iff accompanied by a thought about it. Empirical support: Lau & Passingham — visual masking varied subjective awareness while behavioural accuracy was constant; awareness tracked activation in dorsolateral PFC (BA 46).

Standard objection to both: when I consciously smell a rose, I am aware of the rose, not of a mental state representing the rose — HOT/HOP seem to misdescribe the phenomenology. They also struggle to explain why consciousness has any distinctive functional role if first-order states behave identically with or without an accompanying HOT.

9.8 Dennett's deflationary view — the vitalism analogy

"The so-called hard problem of consciousness will disappear once we have a good enough understanding of the various phenomena lumped together under the label 'access consciousness'."— Daniel Dennett (1942–2024)

Analogy: in the 19th century, biology and chemistry seemed incapable in principle of explaining what separates living from non-living matter — there must be an élan vital ("vital force"). As biology matured, vitalism evaporated. Same fate (Dennett predicts) awaits the hard problem.

9.9 Cognitive scientists vs Mysterians

Cognitive scientists: consciousness is a thriving research programme; the easy problems are tractable, and progress on them will eventually dissolve the hard problem.
Mysterians: consciousness is, in principle, beyond cognitive-scientific tools.

Self-test · Lecture 9

1. Who coined the term "hard problem of consciousness"?

Frank Jackson
Ned Block
David Chalmers
Daniel Dennett

Show answer

C. Chalmers (1995). Jackson did Mary's Room; Block did P/A distinction; Dennett is the deflationist.

2. The Mary's Room thought experiment is associated with:

Daniel Dennett, defending illusionism
Frank Jackson (1986), the knowledge argument
Ned Block, P-consciousness
Giulio Tononi, IIT

Show answer

B. Jackson's thought experiment argues that physical knowledge is insufficient for knowing what red is like.

3. The distinction between P-consciousness (qualia) and A-consciousness (reportable, control of thought/action) is due to:

Tononi
Chalmers
Ned Block
Baars

Show answer

C. Ned Block. Many cognitive scientists collapse them; Block insists they are distinct.

4. Locked-In Syndrome (LIS) is characterized by:

No sleep/wake cycle, no awareness
Sleep/wake cycle and some minimal non-reflex movements
Inability to move any muscle but with full awareness and consciousness
A coma

Show answer

C. LIS = fully conscious, fully aware, but no motor output (maybe minimal eye movements).

5. Fang & He (2005) used interocular suppression and showed that:

Both dorsal and ventral activity require conscious vision
Dorsal-stream activity persists for invisible stimuli; ventral activity does not
Ventral-stream activity persists for invisible stimuli; dorsal does not
Conscious vision recruits only V1

Show answer

B. Robust dorsal activity for invisible stimuli; ventral activity only for conscious stimuli. Conscious perception tracks the ventral stream.

6. Dennett's argument that the hard problem will dissolve is built on an analogy with:

The phlogiston theory of combustion
Vitalism / élan vital
Cartesian dualism
Phrenology

Show answer

B. Once chemistry and biology matured, vitalism had nothing left to do. Dennett predicts the same fate for the hard problem as access consciousness gets explained.

7. Which theory of consciousness uses the "theatre with a spotlight, an audience, and backstage processes" metaphor?

Integrated Information Theory (Tononi)
Higher-Order Theory (Rosenthal)
Global (Neuronal) Workspace Theory (Baars / Dehaene)
Recurrent Processing Theory (Lamme)

Show answer

C. GWT/GNW — attention spotlights content, audience receives broadcast, backstage processes set context.

Synthesis

Cross-cutting Themes

The lectures keep circling the same fault lines from different angles. Spotting them is half of cognitive science.

Theme 1 — Three paradigms (symbolic / connectionist / Bayesian-dynamical)

Paradigm	Starts from	Knowledge lives in	Star applications in the course
Symbolic / Classical	Algorithmic level (Marr); the mind	Explicit rules + symbols	SHRDLU, Fodor's Mentalese, dual-route past tense
Connectionist	Implementational level; the brain	Pattern of weights	Plunkett & Marchman past tense, LLMs
Bayesian / Dynamical	Computational behaviour over time	Probabilities or state-space trajectories	Word segmentation, Lidz anaphora, Ising depression, DFT

Theme 2 — Localization vs holism / encapsulation vs integration

Phrenology (wrong in particulars, right in spirit) → Broca → Brodmann → fMRI subtraction → modular vision streams.
Fodor: peripheral modules + central isotropic system. Massive modularity: all modules, no central system.
Dorsal/ventral streams recur in lectures 3, 4, and 9 — different lesions, illusions, and consciousness studies all converge on the same anatomical dissociation.

Theme 3 — Conscious vs non-conscious processing

Behaviourism would deny "the unconscious." But Corteen & Wood, blindsight, neglect, priming, and Owen et al.'s vegetative-state patients all show information processing without (or apart from) awareness.
The lecture's unifying answer: consciousness is for deliberate, voluntary action and durable explicit information maintenance.

Theme 4 — Normative vs descriptive

Bayesianism = how rational agents should reason.
Tversky & Kahneman = how humans actually reason (badly, with predictable biases).
Same tension in language learning: do children obey the norms of Universal Grammar (UG), or are they statistical learners (descriptive)?

Theme 5 — The four big "dissolution" moves

Materialism dissolves dualism — Hobbes vs Descartes (lec 3).
S–S dissolves S–R — Rescorla vs Watson (lec 1).
Connectionism dissolves the need for explicit rules — Plunkett & Marchman vs dual-route (lec 5, 8).
Dennett dissolves the hard problem — vitalism analogy (lec 9).

Timeline

Timeline of Key Figures & Events

Date	Figure / Event	Contribution
1247	Bethlem Royal Hospital ("Bedlam")	First closed institution for the mentally ill
1588–1679	Thomas Hobbes	Materialism — everything is matter, soul is meaningless
1596–1650	René Descartes	Cartesian dualism; pineal gland as mind–body interaction site
~1810s	Franz Gall	Phrenology — wrong in detail, right in spirit (localization)
1824–1880	Paul Broca	Clinical neuropsychology; Broca's area = speech production
1898	Edward L. Thorndike	Puzzle-box experiments; Law of Effect
1904	Ivan Pavlov	Nobel Prize; classical conditioning of salivation in dogs
1913	John B. Watson	Founds behaviourism (S–R)
~1930s+	B. F. Skinner	Operant conditioning; Skinner box; reinforcement schedules
1943	McCulloch & Pitts	Neural networks compute any computable function
1945	Walter Freeman	Transorbital icepick lobotomy; >40,000 in the US
1953	Volkova	Semantic generalization in conditioning
1956	George A. Miller	"Magical number 7 ± 2"; ~3 bits channel capacity
1953–60	Cherry · Broadbent · Treisman	Dichotic listening; early-selection filter; switching
1959	Moray	"Own name" breakthrough in unattended ear
1965	Joseph Weizenbaum	ELIZA
1970	Terry Winograd	SHRDLU (MIT)
1972	Kenneth Colby	PARRY (Stanford); 48% Turing-test accuracy
1972	Corteen & Wood	GSR breakthrough — semantic processing of unattended info
1973	Rescorla	S–S vs S–R habituation experiment
1973	Cooper & Shepard	Mental rotation
1974	Thomas Nagel	"What is it like to be a bat?"
1980	David Marr	Three levels of analysis (published posthumously 1982)
1986	Frank Jackson	Mary's Room — the knowledge argument
1993	Plunkett & Marchman	Connectionist past-tense network (20-30-20)
1995	David Chalmers · van Gelder	Hard problem of consciousness · "What might cognition be, if not computation?"
2005	Fang & He	Interocular suppression — dorsal vs ventral consciousness
2006	Owen et al.	fMRI detects covert awareness in vegetative-state patients
2016	Cramer et al.	Ising network model of depression
2017	Vaswani et al.	"Attention is all you need" — the Transformer
2019	Cisek · Busemeyer et al.	Phylogenetic refinement · Decision Field Theory
2021	Bender, Gebru, McMillan-Major & Shmitchell	"On the Dangers of Stochastic Parrots"
2023	Piantadosi · Binz & Schulz	"LLMs refute Chomsky" · GPT-3 on Wason task

Glossary

Master Glossary

High-yield terms across all lectures. Skim the night before; nail them all.

Conditioning & learning

Term	Definition
US / UR	Unconditioned stimulus / unconditioned response (food → salivation)
CS / CR	Conditioned stimulus / conditioned response (bell → salivation after pairing)
Extinction	CR weakens when the CS is repeatedly presented without the US
Spontaneous recovery	Partial return of CR after rest — CR is inhibited, not lost
S–R vs S–S theory	Direct stimulus-response bond vs link via mental representation of US
Law of Effect	Satisfying consequences strengthen responses (Thorndike)
Operant / instrumental	Self-initiated behaviour modified by consequences
Shaping	Reinforcement of successive approximations
FR / VR / FI / VI	Four schedules — VR (variable-ratio) most resistant to extinction

Brain & methods

Term	Definition
Action potential	All-or-none binary signal; frequency-modulated
Myelin	Sheath increasing axon speed; "white matter"; MS attacks it
Corpus callosum	Connects hemispheres; cut in callosotomy for refractory epilepsy
Brodmann area	Cytoarchitectonic parcellation (BA 4 motor, 17 visual, 44 Broca)
BOLD signal	Blood-oxygen-level dependent — what fMRI directly measures
DTI	Diffusion tensor imaging; tractography built on it visualizes white-matter tracts
P1/N1, P300, N400, P600	ERP components — attention, oddball, semantic, syntactic
Dissociation / double dissociation	Strong evidence for separable mechanisms

Vision & categorization

Term	Definition
Primal / 2½-D / 3-D sketch	Marr's three stages of vision
Generalized cones	Marr's primitives — objects are stacks of cones
Geons (36)	Biederman's geometric primitives
Constancies (size/brightness/shape)	Brain corrects for retinal variation to perceive stable objects
Kanizsa's triangle	Illusory contours filled in by the brain
Dorsal vs ventral	Where/how (action) vs what (perception) pathways
Form agnosia vs integrative agnosia	Cannot perceive shape vs cannot integrate shape into recognition
Blindsight	Above-chance forced-choice without conscious vision (V1 damage)
Prosopagnosia	Face blindness; bilateral FFA damage
Capgras delusion	"My partner is an impostor" — opposite dissociation of prosopagnosia

Connectionism & AI

Term	Definition
Perceptron	Single-layer network of weighted inputs
Linearly separable	Class of functions a perceptron can learn (XOR is NOT)
Delta rule	ΔWᵢ = ε·δ·Iᵢ; ΔT = −ε·δ, where δ = intended output − actual output
Backpropagation	Assigns each hidden unit "responsibility" for output error
Gradient descent	Follow negative gradient; risk of local minima
Universal approximation	Multi-layer nets compute any Turing-computable function
Token / embedding	Text chunk / high-dim learned vector
Autoregression	Feed each prediction back as input
RLHF	Reinforcement learning from human feedback — second training stage of LLMs

Bayes

Term	Definition
Prior / likelihood / posterior	p(H), p(E|H), p(H|E)
Marginal evidence	p(E) = Σ p(E|H)·p(H)
Dutch book	Set of bets, each acceptable to the subject, that together guarantee a loss; justifies the probability calculus
Base-rate neglect	Ignoring prior probability when updating
Transposed conditional	Confusing p(A|B) with p(B|A)
Conjunction fallacy	Judging p(A∧B) > p(A) — Linda problem
Maximizing vs matching	Always pick the higher-probability option vs match probabilities — maximizing wins
Search policy	argmaxᵢ pᵢ·aᵢ/cᵢ

Consciousness & mind

Term	Definition
Wakefulness / awareness	State vs experience (Laureys)
UWS / MCS / LIS	Vegetative / minimally conscious / locked-in
Qualia	Felt qualities of experience
P-consciousness / A-consciousness	Block's phenomenal vs access distinction
Easy / Hard problems	Information processing tractable / experience itself (Chalmers)
Knowledge argument / Mary's Room	Jackson 1986
GWT / GNW	Global (Neuronal) Workspace — Baars / Dehaene
IIT (Φ)	Integrated Information Theory — Tononi
RP	Recurrent Processing theory — Lamme
Vitalism (élan vital)	Dennett's analogy for why hard problem will dissolve
Mentalese / Language of Thought	Fodor's innate symbolic medium
Module (Fodor)	Domain-specific, informationally encapsulated, mandatory, fast, fixed neural architecture, characteristic breakdown patterns
Isotropic	Un-encapsulated, holistic — Fodor's central processing
Massive modularity	No central processor; many Darwinian modules
State space / trajectory	DST geometric concepts for cognition over time

Big Practice Exam

40-question Cross-lecture MCQ Practice

Mixed, harder, exam-shaped. Click "Show answer" only after committing. The choices include the most common distractors a professor will use.

1. Pavlov originally believed that the salivation of dogs to the assistant's footsteps was:

A successful classical conditioning result
An experimental error he called "psychic secretion"
An operant response
Evidence of insight learning

Show answer

B. He initially called it "psychic secretion" and considered it noise — then pivoted his entire programme to study it.

2. Which of the following is the strongest evidence that ignored auditory input is processed semantically?

Cherry's finding that listeners notice voice-pitch changes in the ignored ear
Moray's (1959) "own name" breakthrough
Treisman's (1960) ear-switching effect
Corteen & Wood's (1972) GSR to ROME after conditioning to PARIS, LONDON, CAIRO

Show answer

D. Generalization across the semantic category "city" requires the meaning to be extracted from the ignored channel. (Own name and ear-switching are also evidence but the GSR data are the most striking semantic finding.)

3. McCulloch & Pitts based their model on three principles. Which one is NOT among them?

Basic physiology
Propositional logic
Turing's theory of computation
Information theory (Shannon)

Show answer

D. The three are physiology, propositional logic, and Turing computation. Shannon's information theory shaped Miller's work on channel capacity, not the McCulloch & Pitts model.

4. In Marr's framework, "the algorithm for edge detection followed by stereopsis-based 2½-D sketching" describes a system at which level?

Computational
Algorithmic
Implementational
Connectionist

Show answer

B. Algorithmic = specific representations + steps. The implementational level would describe the neurons in V1/V2 carrying out those steps.

5. Biederman's recognition-by-components is most similar in spirit to:

Marr's "generalized cones"
Pavlov's S–S theory
Broadbent's filter model
Skinner's shaping

Show answer

A. Both decompose objects into geometric primitives — 36 geons / generalized cones.

6. A patient can post a card into a slot but cannot report the slot's orientation. The most likely lesion is to:

Dorsal pathway only
Ventral pathway only
Both pathways
Broca's area

Show answer

B. Ventral pathway damage = visual agnosia. The dorsal "how" pathway is intact, so action proceeds normally.

7. Capgras delusion is opposite to prosopagnosia in that:

Capgras patients have intact explicit recognition but lost emotional/implicit recognition
Capgras patients lose both explicit and implicit recognition
Capgras patients have intact emotional but lost explicit recognition
Capgras involves the dorsal stream

Show answer

A. Capgras: explicit ✅, emotional ❌. Prosopagnosia: explicit ❌, emotional ✅.

8. The fMRI BOLD signal reflects:

Direct firing of neurons
Changes in haemoglobin's magnetic properties due to oxygenation
Cerebral blood flow measured by radioactive tracer
Magnetic fields produced by ion currents

Show answer

B. Oxy- vs deoxy-haemoglobin have different magnetic properties; that contrast is what fMRI measures. NOT direct neural activity.

9. Owen et al. (2006) instructed a vegetative-state patient to imagine "playing tennis" vs "walking through your house" because:

These tasks activate motor (SMA) vs parahippocampal regions — task-specific, voluntary, and detectable on fMRI
They are easier for patients than verbal tasks
They reduce noise in EEG recordings
They are the only paradigms compatible with locked-in syndrome

Show answer

A. Task-specific BOLD activation on command demonstrates covert awareness — a yes/no communication channel.

10. In a perceptron, the activation function turns the weighted sum into a 0 or 1 output. The classic version used in textbook walkthroughs is:

Sigmoid
ReLU
Softmax
Heaviside step function

Show answer

D. Heaviside step = binary threshold, analogous to a neuron's spike threshold. ReLU and sigmoid are more common in modern practice.

11. The XOR problem matters historically because:

It is the only function single-layer perceptrons can compute
It is not linearly separable, exposing the fundamental limitation of single-layer perceptrons
It cannot be computed by any neural network
It requires Hebbian learning

Show answer

B. XOR is not linearly separable, which motivated multi-layer networks and backpropagation.

12. Plunkett & Marchman's (1993) past-tense model showed that:

You need a dual-route architecture for overgeneralization to occur
A single neural network without explicit rules can reproduce overgeneralization and gradual learning curves
Children do not overregularize
Backpropagation is biologically implausible

Show answer

B. 20-30-20 phonological-input network; reproduces "goed" errors and the developmental trajectory.

13. The massive-modularity view rejects:

Domain-specific modules
The existence of a domain-general central processor
Natural selection
Pragmatic representations

Show answer

B. Fodor keeps a non-modular central system; massive modularity eliminates it.

14. "What might cognition be, if not computation?" is the banner question for:

Symbolic AI
Connectionism
Dynamical systems theory (van Gelder, 1995)
Predictive coding

Show answer

C. Van Gelder's quote launches the dynamical-systems alternative to representational/computational accounts.

15. The Ising network model of depression treats symptoms as:

Effects of a single underlying latent variable
Binary nodes coupled by pairwise weights; stress = extra input to all nodes
Symbolic atoms in a Mentalese
Output of a single decision system

Show answer

B. Network model. Contrasts with the latent-variable approach (gallstones cause all symptoms).

16. In Bayes' rule, the posterior is highest when:

The likelihood is high and the prior is low
The likelihood is high and the prior is high
The evidence is high
The evidence equals the likelihood

Show answer

B. Posterior ∝ Likelihood × Prior. Both must be high.
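A small worked example of the proportionality, using made-up numbers (the priors, likelihoods, and function name are hypothetical). It holds the likelihood fixed and varies only the prior, normalizing by the marginal evidence p(E) = p(E|H)·p(H) + p(E|¬H)·p(¬H).

```python
def posterior(prior, likelihood, likelihood_if_not):
    """p(H|E) = p(E|H) * p(H) / p(E), with p(E) summed over H and not-H."""
    evidence = likelihood * prior + likelihood_if_not * (1 - prior)
    return likelihood * prior / evidence

# Same high likelihood (0.9), two different priors.
low_prior = posterior(prior=0.01, likelihood=0.9, likelihood_if_not=0.05)
high_prior = posterior(prior=0.5, likelihood=0.9, likelihood_if_not=0.05)
print(round(low_prior, 3), round(high_prior, 3))
```

With these numbers the posterior is about 0.15 under the low prior and about 0.95 under the high prior: a high likelihood alone is not enough, which is also why neglecting the base rate leads to overestimates.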

17. The Dutch-book argument shows that:

Personal probabilities are subjective and unscientific
Violations of probability calculus make one exploitable, so rational beliefs must obey it
Conditional probabilities are unreliable
Frequentism is correct

Show answer

B. The normative justification for Bayesianism.
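A worked example of the exploit, assuming an agent who regards a bet as fair when its price equals credence times payout and whose credences in A and not-A sum to more than 1 (the numbers and the function name are hypothetical).

```python
def net_result(credence_a, credence_not_a, payout=1.0):
    """Agent buys one bet on A and one on not-A, each priced at credence * payout.

    Exactly one of the two bets pays out, whatever happens.
    """
    cost = (credence_a + credence_not_a) * payout
    return payout - cost

incoherent = net_result(0.6, 0.6)  # credences sum to 1.2
coherent = net_result(0.6, 0.4)    # credences sum to 1.0
print(incoherent, coherent)
```

The incoherent agent pays 1.2 for a certain return of 1.0 and so loses about 0.2 however A turns out, while the coherent agent breaks even.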

18. Why do people rank "Linda is a bank teller and active in the feminist movement" higher than "Linda is a bank teller"?

Conjunction fallacy: representativeness overrides probability theory
The conjunction is actually more likely
Bank tellers form a subset of feminists
Availability heuristic

Show answer

A. Tversky & Kahneman's signature finding. A conjunction picks out a subset, so p(A∧B) ≤ p(A): it can never be more probable than either conjunct.

19. In a stochastic environment (4 red / 2 green), what does the maximizing strategy achieve?

56% correct
67% correct
100% correct
33% correct

Show answer

B. Always-red → 2/3. Probability matching → 5/9 = 56%. Maximizing wins.
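The two figures can be checked with exact fractions. This is a minimal sketch; the function name is illustrative, and it assumes each guess is made independently of the outcome on that trial.

```python
from fractions import Fraction

def expected_accuracy(p_red, guess_red):
    """Chance of a correct guess when red occurs with p_red and red is guessed with guess_red."""
    return p_red * guess_red + (1 - p_red) * (1 - guess_red)

p_red = Fraction(4, 6)                              # 4 red / 2 green
maximizing = expected_accuracy(p_red, Fraction(1))  # always guess red
matching = expected_accuracy(p_red, p_red)          # guess red two thirds of the time
print(maximizing, matching)                         # 2/3 and 5/9
```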

20. The "stochastic parrots" critique of LLMs (Bender et al. 2021) argues that:

LLMs have genuine semantic grounding
LLMs stitch linguistic forms together by probability without reference to meaning
LLMs cannot be trained on biased text
LLMs refute Chomsky

Show answer

B. The defining quote. Also flags ethical issues: energy cost, biases, documentation debt.

21. The "knowledge argument" (Jackson 1986) concludes:

Physical knowledge fully captures conscious experience
There are aspects of conscious experience not captured by complete physical knowledge
Mary cannot learn colour from books
Qualia are an illusion

Show answer

B. When Mary leaves the room she learns something new — so physical knowledge is incomplete (anti-physicalist conclusion).

22. Which is the correct ordering of states by increasing level of awareness?

Coma < LIS < UWS < MCS
Coma < UWS < MCS < LIS
UWS < Coma < MCS < LIS
LIS < MCS < UWS < Coma

Show answer

B. Coma → vegetative (UWS) → minimally conscious (MCS) → locked-in (fully conscious).

23. Fang & He (2005) found that under interocular suppression:

Both streams went silent
Dorsal-stream activity persisted; ventral activity tracked conscious perception
Ventral-stream activity persisted; dorsal activity tracked conscious perception
Conscious perception preceded both streams

Show answer

B. Conscious perception tracks the ventral stream; the dorsal stream operates non-consciously.

24. In Cooper & Shepard's mental-rotation paradigm, reaction time is highest at approximately:

0° / 360°
90°
180°
270°

Show answer

C. Maximally rotated from upright = maximum rotation needed → highest RT. Returns to baseline at 360°.

25. The N400 ERP component most reliably indexes:

Early sensory processing
Attention
Semantic processing / semantic anomaly
Syntactic reanalysis

Show answer

C. "He spread butter on his socks" elicits a large N400. P600 indexes syntactic reanalysis; P1/N1 are attention/sensory.

26. Volkova (1953) demonstrated:

Operant conditioning of language
Semantic generalization in classical conditioning (responses generalized from "good"/"bad" to whole sentences)
Extinction of conditioned fear
Discrimination training in pigeons

Show answer

B. Generalization extended beyond physical properties to meaning.

27. According to Weber's law jnd = kM, if k = 0.03 for weight detection, what's the smallest weight difference detectable when M = 100 kg?

0.03 kg
0.3 kg
3 kg
30 kg

Show answer

C. 0.03 × 100 = 3 kg. The same proportion of any magnitude.

28. Treisman's attenuation theory of attention claims that:

Ignored input is completely blocked at an early filter
All input is processed for meaning; ignored input is forgotten
Ignored input is attenuated but not blocked; important info (your name) is spared
Distractor processing depends on the load of the main task

Show answer

C. Attenuation = intermediate between Broadbent (ignored input blocked early) and Deutsch & Deutsch (late selection; all input processed for meaning). Load theory is Lavie's.

29. Which is the strongest Bayesian-style critique of LLMs as models of human cognition?

They use too many parameters
They are trained on 4–5 orders of magnitude more data than human children, so behavioural similarity does not imply mechanistic similarity (multiple realizability)
They cannot solve the Wason task
They are deterministic

Show answer

B. Guest & Martin's multiple-realizability point combined with the data-asymmetry concern: children learn from vastly less input, so the mechanisms can't be the same.

30. Lobotomy was popularized in the US by:

Paul Broca
Walter Freeman (transorbital icepick, 1945; >40,000 procedures)
B. F. Skinner
Antonio Damasio

Show answer

B. Freeman's transorbital approach removed the need for a neurosurgeon and the procedure became horrifyingly widespread.

31. Which of these scholars argued — using a vitalism analogy — that the hard problem of consciousness will dissolve?

Chalmers
Nagel
Dennett
Tononi

Show answer

C. Dennett: as biology and chemistry matured, "élan vital" had no work left to do. He predicts the same for the hard problem.

32. Block's "P-consciousness" refers to:

The reportable, control-of-thought aspect of consciousness
The phenomenal/qualia aspect — what it's like
Pre-reflective awareness
Primary sensory cortex activity

Show answer

B. P = phenomenal (qualia). A = access (reportability, direct control). Many cogsci accounts collapse them; Block insists they're distinct.

33. The optimal Bayesian search policy is:

argmaxᵢ pᵢ
argmaxᵢ aᵢ
argmaxᵢ pᵢ · aᵢ / cᵢ
argmaxᵢ pᵢ + aᵢ − cᵢ

Show answer

C. Maximize the ratio of (probability there × detection probability) to cost.
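A minimal sketch of the policy, assuming a lost-keys scenario with invented numbers (the locations, values, and function name are hypothetical): p is the probability the target is at a location, a the probability of detecting it there, and c the cost of searching.

```python
def best_location(locations):
    """Return the location with the highest p * a / c."""
    def score(name):
        spot = locations[name]
        return spot["p"] * spot["a"] / spot["c"]
    return max(locations, key=score)

# Hypothetical search problem.
locations = {
    "desk":   {"p": 0.5, "a": 0.9, "c": 3.0},  # score 0.15
    "pocket": {"p": 0.2, "a": 1.0, "c": 0.5},  # score 0.40
    "car":    {"p": 0.3, "a": 0.6, "c": 6.0},  # score 0.03
}
print(best_location(locations))
```

Here the cheap, reliable pocket check comes first even though the desk is the most probable location, which is why argmaxᵢ pᵢ alone is a distractor.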

34. The "magical number" Miller proposed corresponds approximately to a channel capacity of:

1 bit
3 bits (≈ 7 items)
7 bits
10 bits

Show answer

B. 7 ± 2 items, ~3 bits, roughly modality-independent.

35. "Colorless green ideas sleep furiously" demonstrates:

Syntactic anomaly with semantic well-formedness
Syntactic well-formedness with semantic anomaly
Both syntactic and semantic anomaly
The conjunction fallacy

Show answer

B. Grammatical structure is fine; meaning is anomalous. Used in lecture 8 to argue syntax and semantics are dissociable.

36. The "perceptron convergence rule" guarantees:

Convergence on any Turing-computable function
Convergence whenever a perceptron-realizable (i.e., linearly separable) solution exists
Convergence on the global minimum of any cost surface
Convergence in O(log n) iterations

Show answer

B. Convergence is guaranteed if the problem is linearly separable; otherwise (e.g., XOR) the rule fails.
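A minimal sketch of the rule, assuming the textbook form with a Heaviside step output, error δ = target − output, weight change ε·δ·Iᵢ, and threshold change −ε·δ. The learning rate, starting values, epoch cap, and function name are illustrative choices, not values from the lecture.

```python
def train_perceptron(samples, epsilon=0.5, max_epochs=50):
    """Train a two-input perceptron; return (weights, threshold, converged)."""
    weights, threshold = [0.0, 0.0], 0.0
    for _ in range(max_epochs):
        errors = 0
        for inputs, target in samples:
            total = sum(w * i for w, i in zip(weights, inputs))
            output = 1 if total > threshold else 0  # Heaviside step
            delta = target - output
            if delta != 0:
                errors += 1
                weights = [w + epsilon * delta * i for w, i in zip(weights, inputs)]
                threshold -= epsilon * delta
        if errors == 0:  # a full pass with no mistakes: a solution was found
            return weights, threshold, True
    return weights, threshold, False

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print(train_perceptron(AND)[2])  # linearly separable
print(train_perceptron(XOR)[2])  # not linearly separable
```

Under these assumptions training on AND ends with an error-free pass, while training on XOR exhausts the epoch cap, since no single line separates its two classes.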

37. Which is the best example of operant (NOT classical) conditioning?

A dog salivates when a bell rings
A child blinks when a puff of air hits her eye
A rat presses a lever more often because pressing produces food
A baby orients to a loud sudden sound

C. Operant = response modified by its consequences. The others are reflexes / classical conditioning.

38. The Lidz et al. "red ball" anaphora example is a Bayesian argument because:

It uses transitional probabilities to segment words
It computes p(referent | sentence) ∝ p(sentence | referent) · p(referent), favoring "red ball" because more specific hypotheses make the data more probable
It refutes Chomsky
It demonstrates the conjunction fallacy

B. A specific hypothesis (red ball) is more likely to have produced the observed sentence than a generic one (ball).
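
A toy version of the calculation, under loudly stated assumptions: two hypotheses about what the anaphor picks out ("red ball" vs. any "ball"), equal priors, and a made-up value of 0.3 for the chance that an arbitrary ball happens to be red. None of these numbers come from Lidz et al.; they only illustrate how p(referent | sentence) ∝ p(sentence | referent) · p(referent) favours the specific hypothesis as consistent observations accumulate.

```python
def posterior_specific(n_red, prior_specific=0.5, p_red_if_general=0.3):
    """Posterior that the anaphor means 'red ball' after n_red uses that all pick out red balls."""
    like_specific = 1.0 ** n_red             # 'red ball' always predicts a red referent
    like_general = p_red_if_general ** n_red  # 'ball' predicts red only by coincidence
    numerator = prior_specific * like_specific
    evidence = numerator + (1.0 - prior_specific) * like_general
    return numerator / evidence


for n in (0, 1, 3):
    print(n, "observations ->", round(posterior_specific(n), 3))
```

With no data the posterior equals the prior. Each additional red referent would be an increasingly suspicious coincidence under the general hypothesis, so the posterior for "red ball" climbs toward 1.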

39. The strongest evidence that consciousness is restricted to the ventral pathway is:

Patient D.F.'s ability to post a card despite ventral damage
The grip aperture being unaffected by the Ebbinghaus illusion (dorsal not fooled)
Fang & He (2005)'s interocular suppression: dorsal active for invisible stimuli, ventral only for conscious ones
All of the above

D. All three lines converge on the same conclusion: the dorsal stream operates non-consciously; conscious experience tracks the ventral stream.

40. Which of the following statements is FALSE?

fMRI has high spatial but low temporal resolution
Single-unit recording achieves both high temporal and high spatial resolution but is invasive
MEG measures changes in blood oxygenation
EEG has high temporal but low spatial resolution

C. MEG measures magnetic fields generated by neural electrical activity, NOT blood oxygenation (that's fMRI). The other three statements are correct.

— End of study guide —
Good luck on the exam. Trust the priors. Update on evidence.

JLB Chapter 16

The Emotions: From Cognitive Science to Affective Science

Emotions were largely ignored in early cognitive science. Affective science now studies them with a rich, multidisciplinary toolkit — from genetics and lesion studies to neuroimaging — using fear as its central case study.

16.1 Early Theories

Herbert Simon — Emotions as Interrupt Mechanisms

Simon (1967) argued that any sufficiently complex serial information-processing system (like the mind) must contain interrupt mechanisms — processes that can suspend an ongoing goal and substitute a new one when circumstances demand it. His proposal: emotions are those interrupt mechanisms in the CNS.

Three key properties of emotions on Simon's account:

They interrupt ongoing goals and substitute new goals/behaviours.
They arouse the autonomic nervous system in predictable physiological ways.
They generate feelings of emotion.
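
The first property can be pictured as a toy serial processor. This is only a sketch under assumptions, not Simon's own model: the function name, the goal names, the tick counts, and the choice to resume the suspended goal afterwards are all invented for illustration.

```python
def run_with_interrupts(goal, ticks_needed, interrupts, max_ticks=20):
    """Simulate a serial processor that works on one goal per tick.

    `interrupts` maps a tick number to (urgent_goal, ticks_needed).
    An interrupt suspends the current goal and substitutes the urgent one.
    """
    stack = [[goal, ticks_needed]]  # the goal on top is the one being pursued
    trace = []
    for tick in range(max_ticks):
        if tick in interrupts:
            name, need = interrupts[tick]
            stack.append([name, need])  # suspend current goal, substitute new one
        if not stack:
            break
        current = stack[-1]
        trace.append(current[0])
        current[1] -= 1
        if current[1] == 0:
            stack.pop()  # goal finished; resume whatever was suspended (assumption)
    return trace


trace = run_with_interrupts("forage", 4, {2: ("flee", 2)})
print(trace)
```

Without the interrupt check the processor would finish foraging before it could respond to the threat, which is the design problem Simon argues emotions solve.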

Simon's paper is influential but thin on detail — no specific emotion is actually analysed. It raises key open questions: How should emotions be classified? Are some basic? What is the role of arousal, physiology, and feeling? What are the neural bases?

Paul Ekman — Basic Emotion Theory

Ekman asked whether facial expressions of emotion are cross-cultural universals or products of social learning. To avoid the confound of media exposure, he studied the Fore linguistic-cultural group in New Guinea — a preliterate, visually isolated culture.

Method: participants were shown 2–3 photos of facial expressions while a story indicating an emotion was read aloud. They had to point to the matching face. Six target emotions: happiness, sadness, anger, surprise, disgust, fear.

Result: both adults and children in New Guinea matched emotions to faces at rates significantly above chance, supporting universal cross-cultural recognition. Ekman also showed that literate cultures could recognise New Guinean facial expressions.

Ekman's two main claims:

There are discrete, separate basic emotions, each with a coherent set of facial, physiological, cognitive, and behavioural responses.
Each basic emotion serves a distinctive evolutionary function and is hardwired for specific life tasks.

Example — fear: eyebrows raised and horizontal, upper eyelid lifted, more sclera exposed (gathers information about threat); heart rate and skin conductance elevated; peripheral blood flow redirected to large skeletal muscles (preparing to flee).

Criticisms: Meta-analyses cast doubt on strict links between emotions and specific physiological/neural signatures. Cultural anthropologists question whether emotions are truly independent of social context.

16.2 Affective Space and the Affective Scientist's Toolkit

Affective Space: Beyond Discrete Categories

Affective phenomena vary in duration and function — emotions, moods, instincts, drives, and affective traits are all distinct (though their boundaries are fuzzy). Many affective scientists now model emotions as points in a multidimensional space rather than discrete categories.

The simplest model uses two dimensions:

Valence — pleasure ↔ displeasure (attractiveness vs. aversiveness of a situation)
Arousal — degree of physiological/psychological engagement

The circumplex model (Russell, 1980) plots emotions on a circle defined by valence and arousal. Includes both emotions and moods (e.g., sadness and depression are adjacent).
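
One way to make the geometry concrete is to treat each state as a (valence, arousal) point and compare angles around the circle. The coordinates below are invented placeholders, not Russell's data, and the helper names are assumptions; the sketch only shows how "adjacent on the circumplex" can be read as "small angular distance".

```python
import math

# Hypothetical (valence, arousal) coordinates in [-1, 1]; illustrative only.
AFFECT = {
    "excited": (0.7, 0.7),
    "content": (0.8, -0.5),
    "sad": (-0.7, -0.4),
    "depressed": (-0.8, -0.6),
    "afraid": (-0.6, 0.8),
}


def angle_deg(name):
    """Angle of a state on the circle, in degrees from the positive-valence axis."""
    valence, arousal = AFFECT[name]
    return math.degrees(math.atan2(arousal, valence)) % 360


def angular_distance(a, b):
    """Smallest angle between two states, in the range 0 to 180 degrees."""
    d = abs(angle_deg(a) - angle_deg(b))
    return min(d, 360 - d)


print("sad vs depressed:", round(angular_distance("sad", "depressed"), 1))
print("sad vs excited:", round(angular_distance("sad", "excited"), 1))
```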

Adolphs & Anderson (2018) propose a richer 7-dimensional framework: scalability, valence, persistence, generalisation, global coordination, automaticity, and social communication. It is designed to apply to non-human animals without relying on verbal self-reports.

Appraisal theories (Lazarus and others) add a cognitive dimension: emotions involve evaluating the environment relative to the subject's goals — what Lazarus calls core relational themes. On this view, anger and fear differ because they involve different appraisals of the same situation.

The Affective Scientist's Toolkit

Different tools study different components of an emotional episode (trigger → perception → neural/somatic response → behavioural response, optionally accompanied by cognitive appraisal, feelings, and verbal report):

fMRI, PET, electrophysiology — neural responses
FACS (Facial Action Coding System) — behavioural/expressive responses
Physiological measures — heart rate, skin conductance, finger temperature
Lesion studies — causal role of specific brain regions
Genetic tools — knockout experiments, optogenetics, pharmacogenetics

Genetic Tools

Knockout experiments (mice, 1990s+): replace a functional gene with a nonfunctional copy in stem cells, then study behavioural change. E.g., knocking out the serotonin receptor 5-HT1A increases anxiety-like behaviour in mice.

Optogenetics: engineer specific neurons to express a light-sensitive ion channel (opsin). Neurons can then be switched on/off with light (millisecond resolution). Allows targeted intervention in specific neural populations.

Pharmacogenetics: engineer neurons to express receptors for specific synthetic drugs (not normal neurotransmitters). Can activate or inhibit targeted populations. Slower than optogenetics (minutes to hours).

GECIs / GEVIs: genetically encoded calcium or voltage indicators, which allow optical measurement of neural activity as a complement to electrophysiology.

Lesion Studies

Classic case: Phineas Gage (1848) — iron rod through his head destroyed ventromedial prefrontal cortex. Physical and perceptual abilities preserved; emotional regulation and social behaviour drastically changed. First evidence linking prefrontal cortex to emotional function.

Lesions in humans provide information about dissociations; animal lesions allow greater anatomical precision and pre/post comparisons. Types: permanent (aspiration, excision, neurotoxins) or reversible (pharmacological, cryogenic cooling, TMS).

16.3 Fear: A Multilevel and Multidisciplinary Case Study

Fear Conditioning in Rodents

Fear conditioning pairs a neutral conditioned stimulus (CS — e.g., a tone) with an aversive unconditioned stimulus (US — e.g., foot shock). After training, CS alone elicits fear responses (freezing, autonomic arousal). This provides a controllable, reproducible way to study fear.

Multiple studies converge on the amygdala (a subcortical limbic structure) as central to fear:

Electrical stimulation of the amygdala produces fear responses (increased respiration, heart rate, blood pressure, freezing) → amygdala is sufficient.
Lesion studies: amygdala lesions abolish conditioned freezing and reduce unconditioned defensive behaviour (e.g., lesioned rats approach sedated cats) → amygdala is necessary.
The basolateral nucleus (BLA) contains distinct neuron populations responding to positive vs. aversive stimuli, projecting to different brain regions.

Fear and Amygdala Damage in Humans — Patient S.M.

S.M. has bilateral amygdala destruction from Urbach–Wiethe disease (UWD), a rare genetic condition. Her basic cognition (intelligence, memory, language, perception) is intact. But she shows profound impairment in experiencing, expressing, and recognising fear.

Feinstein et al. (2011): exposed S.M. to live snakes and spiders, a haunted house, and scary films. She showed no fear responses: no avoidance and no subjective fear. She experienced other emotions normally, which points to a fear-specific role for the amygdala.

BLA vs. CMA — fine-grained amygdala function:
A separate group of UWD patients (Namaqualand, South Africa) had damage only to the basolateral amygdala (BLA), leaving the central-medial amygdala (CMA) intact. Their response: fear hypervigilance — exaggerated attention to mild threat cues (e.g., better recognition of fearful facial expressions). This contrasts with S.M.'s hypovigilance (BLA + CMA damaged).

Interpretation: the BLA exerts inhibitory control over the CMA. Without the BLA, the CMA responds unchecked → hypervigilance. Without both → no fear response at all.

Neuroimaging of Fear in Humans

LaBar et al. (1998): fMRI showed amygdala activation during fear conditioning and extinction in humans; greatest involvement in early stages of conditioning.

The human fear network identified by neuroimaging:

Amygdala — central fear processing
Hippocampus — memory/contextual information
Insula — broader emotion processing (especially disgust)
Anterior cingulate cortex (ACC) — bridges limbic and prefrontal systems
Ventromedial prefrontal cortex (vmPFC) — integrative hub; modulates/controls fear responses (especially in extinction)

Clinical relevance: PTSD involves overgeneralised fear, slow extinction, amygdala/ACC hyperactivity, and low vmPFC activity.

Mobbs et al. (2010): scanned subjects while they watched what they believed was a live video feed of a tarantula placed at varying distances from their foot. Closer threat → increased amygdala, insula, ACC, BNST activation (active coping: flee). More distant threat → increased orbitomedial PFC activity (passive coping: freeze). Approach vs. retreat also differentially activated amygdala and BNST.

Summary

Affective science grew from cognitive science's neglect of emotion. Simon proposed a computational account (emotions as interrupt mechanisms); Ekman proposed discrete, universal basic emotions. Contemporary affective science has moved toward multidimensional models of affective space and uses a broad toolkit — neuroimaging, lesion studies, fear conditioning, and genetic tools. Fear is the best-studied emotion, with the amygdala (and its subregions BLA/CMA) playing a necessary and central role, embedded in a wider network including the hippocampus, insula, ACC, and vmPFC.