Overview
In many ways, primary visual cortex can be thought of as our mind’s eye. It sits at a pivotal location in our visual system, such that losing primary visual cortex results in total loss of conscious visual perception. It is the site where visual information first enters cortex from the eyes, via the thalamus. As nearly all visual information passing to extrastriate visual areas first passes through primary visual cortex, the processing that it undertakes serves a critical and holistic role in our visual system.
Until the Renaissance, it was generally thought that the optic nerves terminated in the ventricles, which were supposed to house the animal spirits that drove our nervous system. In the 17th century, Thomas Willis was able to trace the optic nerves to the thalamus, establishing it for a time as the highest level of vision. However, evidence of an even higher visual center in cortex would soon begin to emerge. One early such example can be found in an anecdote given by the Dutch physician Herman Boerhaave during one of his popular medical lectures, in which he described the tale of a Parisian pauper with an exposed brain who would supposedly solicit alms while holding his own calvarium:
“He would frequently permit experiments to be made for a small trifle of money. Upon gently pressing the dura mater with one’s finger, he suddenly perceived as it were, a thousand sparks before his eyes, and upon pressing a little more forcibly, and then his eyes lost all their sight…”
More substantial evidence of a cortical seat for vision would soon be found. In the 19th century, Albrecht von Graefe developed an early form of perimetry to study visual fields in humans, and by examining medical cases of homonymous hemianopia and relating these to hemiplegias he was able to posit a role for cortex in visual perception. Around the same time, cortical lesion studies in animal models by researchers including David Ferrier and Hermann Munk would further suggest a cortical seat of vision, with Munk placing visual cortex in the occipital lobe.
In humans, primary visual cortex takes up a significant portion of the occipital lobe, extending from the posterior pole along the medial wall of the hemisphere (Fig. 30.1). It is made up of six principal layers and is roughly 2 mm thick. Primary visual cortex goes by several other names: “V1”; “striate cortex,” so named because the 18th-century Italian anatomist Francesco Gennari noted a white stripe running through the middle of the occipital lobe when he sliced through the brain; and “Area 17,” based on a numerical map of cortical areas generated by Korbinian Brodmann at the turn of the 20th century.
Within a given hemisphere, V1 receives information from both eyes, but only regarding the contralateral hemifield. In V1, the visual world is mapped onto cortical space in a retinotopic manner, meaning that neighboring portions of V1 encode neighboring portions of the visual field. The fovea, which takes up a much larger share of V1 per visual degree than more peripheral parts of the visual field, is represented at the occipital pole, while the far periphery is represented at the anterior margin of the calcarine fissure. The upper and lower visual fields are mapped onto the lower (lingual gyrus) and upper (cuneus) banks of the calcarine fissure, respectively. Evidence for this arrangement in humans comes from imaging methods such as positron emission tomography (PET) or, more commonly, functional magnetic resonance imaging (fMRI). Perturbations of the human brain using either electrical stimulation or transcranial magnetic stimulation (TMS) have validated these maps, as have years of clinical work mapping scotomas related to damage in specific portions of V1. Our modern view of V1 starts with the work of Nobel laureates David Hubel and Torsten Wiesel, whose studies of light-evoked responses of individual neurons in V1 are described in further detail below.
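The overrepresentation of the fovea described above is often summarized by a cortical magnification function. The sketch below assumes an inverse-linear form, M(E) = A / (E + E2), which is a common approximation rather than something stated in this chapter. The constants `A_MM` and `E2_DEG` are illustrative assumptions, and the function names are hypothetical.

```python
import math

# Assumed, illustrative constants for M(E) = A / (E + E2).
# They are not taken from this chapter.
A_MM = 17.3     # scaling constant, in mm
E2_DEG = 0.75   # eccentricity (deg) at which magnification is half its foveal value


def magnification_mm_per_deg(ecc_deg: float) -> float:
    """Millimetres of V1 devoted to one degree of visual field at eccentricity ecc_deg."""
    return A_MM / (ecc_deg + E2_DEG)


def cortical_distance_mm(ecc_deg: float) -> float:
    """Distance from the foveal representation, obtained by integrating M(E) from 0."""
    return A_MM * math.log(1.0 + ecc_deg / E2_DEG)


for ecc in (0.0, 1.0, 10.0, 40.0):
    print(f"{ecc:5.1f} deg: {magnification_mm_per_deg(ecc):6.2f} mm/deg, "
          f"{cortical_distance_mm(ecc):6.1f} mm from the foveal representation")
```

Under these assumptions, magnification falls steeply with eccentricity, so equal steps in the visual field correspond to progressively shorter steps across cortex.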
Visual inputs to V1 and local cortical circuits
The six layers of V1 comprise several different cell types, with approximately 80% of cells being excitatory and approximately 20% being inhibitory. As described later, these six layers are further subdivided based on the specific cell types in each layer and the specific types of connections the cells make. In broad strokes, thalamic input to V1 is focused in layer 4 (except for the koniocellular geniculocortical pathway), inhibitory neurons tend to modify local cortical processing, and excitatory neurons in both superficial (above layer 4) and deep (below layer 4) layers undertake tasks related to local, long-range cortical, and long-range subcortical signaling.
Bottom-up visual inputs to V1 from the lateral geniculate nucleus
V1 receives its predominant, driving visual input from the lateral geniculate nucleus (LGN). Work in nonhuman primates has shown that LGN axons carry signals from both left and right eyes, and from the parvocellular, magnocellular, and koniocellular (P, M, and K, respectively) layers, and that all these inputs remain segregated as they first synapse in V1 (Fig. 30.2). The left and right eye segregation of axons from the M and P LGN layers into layer 4C of V1 forms the anatomical basis for ocular dominance columns, which represent the substrate for binocularity, as discussed in detail later. In V1, K, M, and P LGN axons terminate in separate layers and sublayers (see Fig. 30.2 for details). K LGN axons are likely made up of several classes, as those that end in different V1 layers appear to come from different populations of LGN cells.
Other inputs to V1
Besides the LGN, V1 receives a variety of other modulatory inputs from both subcortical and cortical areas. These include serotonergic and noradrenergic inputs from brainstem nuclei and cholinergic inputs from basal forebrain nuclei. These neuromodulatory inputs differ in density across V1 layers, but show a much less specific pattern of innervation than do LGN inputs. Other input sources include the intralaminar nuclei of the thalamus and the pulvinar, both of which send broad projections most heavily to layers 1 and 2 of V1. Additionally, there are retinotopically more specific sources of input to V1, including the claustrum and visual (V) areas 2, 3, 4, and 5 (V3, V4, and V5 are also referred to as the dorsomedial [DM], dorsolateral [DL], and middle-temporal [MT] visual areas, respectively). As a rule, any area to which V1 projects also sends feedback to V1. However, some higher-order visual areas in the temporal and parietal lobes that do not receive direct projections from V1 nevertheless send axons to V1. With the exception of the claustrum, the axons of which overlap with M and P axons in layer 4C, the extrastriate visual inputs to V1 terminate outside of layer 4C. Some specific roles of such “top-down” inputs to V1 are described in more detail in the later section on feedback.
Interlaminar circuitry and cell types
Feedforward pathways
The main thalamic inputs to V1 terminate in layer 4C (except for K visual channels). Signals next propagate to layers 2/3, where they are sent intracortically to higher visual areas, as well as locally to layer 5. Layer 5 provides input to layer 6. Layers 5 and 6 send signals primarily to subcortical nuclei. These are by no means the only interlaminar feedforward pathways in V1, but they are the best described across species and likely the most robust (Fig. 30.3).
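The feedforward route just described can be written down as a small directed graph. The sketch below covers only the connections named in the paragraph above. The node labels, the dictionary name `FEEDFORWARD`, and the helper `paths` are naming choices made for illustration.

```python
# Feedforward interlaminar pathway from the paragraph above, as an adjacency map.
# Node labels are naming choices for this sketch; K channels are deliberately omitted.
FEEDFORWARD = {
    "LGN (M, P)": ["layer 4C"],
    "layer 4C": ["layers 2/3"],
    "layers 2/3": ["higher visual areas", "layer 5"],
    "layer 5": ["layer 6", "subcortical nuclei"],
    "layer 6": ["subcortical nuclei"],
}


def paths(source, target, graph=FEEDFORWARD, prefix=None):
    """Enumerate every feedforward route from source to target (depth-first search)."""
    prefix = (prefix or []) + [source]
    if source == target:
        return [prefix]
    found = []
    for nxt in graph.get(source, []):
        found.extend(paths(nxt, target, graph, prefix))
    return found


for route in paths("LGN (M, P)", "subcortical nuclei"):
    print(" -> ".join(route))
```

The graph is acyclic by construction. Adding the feedback and intralaminar connections discussed next would introduce loops, so a traversal like this one would then need to track visited nodes.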
Intralaminar and feedback pathways
There is dense connectivity within each cortical layer enabling significant intralaminar processing. Additionally, there is significant feedback connectivity between connected cortical layers ( Fig. 30.3 ), meaning that upon receiving input from a given layer, the recipient layer processes the information and then relays a signal back to the input layer, thus resulting in highly recurrent processing. Such extensive recurrent processing has made it particularly complicated for researchers to gain a holistic view of how exactly information is processed locally in cortex and what kind of signal transformations are taking place.
Parallel streams within V1
In primate V1, beyond the thalamic input layers, where geniculate M, P, and K axons remain segregated, the three streams intermingle significantly. Whereas layer 4B receives inputs from M-dominated layer 4Cα, and layer 4A mainly from P-dominated layer 4Cβ (in addition to direct geniculate K and P inputs to 4A; Fig. 30.3), outputs from all subdivisions of 4C converge onto the same regions in layer 3B, where inputs from K LGN axons also terminate. Layer 3A, the major output layer to the secondary visual cortical area (V2), in turn receives inputs from layer 3B, thus relaying a mixture of M, P, and K information to V2. Layer 4B sends mixed M and P signals to V2. Layer 4B also projects directly to area MT, a dorsal stream area specialized in motion processing, but conveys only M signals to it. Thus, except for the V1 output pathway to MT, there is no strict segregation of geniculate streams beyond layer 4 of V1.
Cell types making up the cortical circuit
Recent RNA sequencing work in the visual cortex of mice has found that V1 comprises approximately 20 different types of excitatory neurons and 20 different types of inhibitory neurons, with many of these cell types being specific to distinct cortical layers. Subsequent single-cell RNA sequencing data from human cortex (the middle temporal gyrus, obtained from surgical resections in epileptic patients) revealed that human cortex comprises roughly 75 transcriptionally unique cell types, with some of the cell types exhibiting strong transcriptional similarity to cell types found in mouse V1. Within cortex, excitatory neurons release the neurotransmitter glutamate from their axon terminals, and can be grossly separated based on morphology into pyramidal cells, which exhibit a long primary dendrite that extends from the cell body up toward the cortical surface, and stellate cells, which comprise layer 4 excitatory neurons. Inhibitory cells (interneurons) release the neurotransmitter GABA from their axon terminals. Whereas many different morphologic, electrophysiological, and genetic types of inhibitory neurons have been described, recent work, largely in mice, has identified three major nonoverlapping classes of inhibitory neurons expressing unique molecular markers. Studies on the connectivity and functional properties of these inhibitory neuron types have further indicated that each type plays a unique role in cortical computations. Parvalbumin-expressing (PV) interneurons synapse onto the cell body/axon hillock region of nearby excitatory neurons, which provides them with a strong ability to veto the spiking of excitatory cells. In contrast, somatostatin-expressing (SST) interneurons synapse onto the dendrites of excitatory neurons, thus modulating the input-output transformation occurring in excitatory cells. Both PV and SST cells can also inhibit other interneurons.
Another well-studied group of interneurons are vasoactive intestinal peptide–expressing (VIP) cells, which receive significant long-range cortical inputs and in turn provide inhibitory inputs to nearby SST interneurons. Thus, VIP cells can telegraph long-range information via local disinhibition: excitation of VIP cells can inhibit local SST neurons, which leads to disinhibition (i.e., removal of inhibition) of the dendrites of excitatory neurons. It remains unknown whether these three major classes of interneurons correspond to nonoverlapping populations in primate V1. However, the seminal studies of Lund and colleagues, which have identified a variety of morphologic interneuron types and their connections, indicate that primate V1 includes most morphologic types identified in rodents.
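The disinhibitory motif described above (VIP cells inhibit SST cells, which in turn inhibit excitatory dendrites) can be illustrated with a minimal threshold-linear rate sketch. All drives and weights below are hypothetical, unitless values chosen only to make the sign of the effect visible. They are not measured quantities, and the function names are illustrative.

```python
def relu(x: float) -> float:
    """Threshold-linear rate function: negative net drive gives zero activity."""
    return max(0.0, x)


def steady_state_rates(long_range_input: float,
                       sst_drive: float = 1.0,
                       dendritic_drive: float = 1.0,
                       w_vip_to_sst: float = 0.8,
                       w_sst_to_dend: float = 0.8) -> dict:
    """Feedforward chain VIP -| SST -| excitatory dendrite.

    All parameters are hypothetical illustration values, not fitted to data.
    """
    vip = relu(long_range_input)                             # VIP driven by long-range input
    sst = relu(sst_drive - w_vip_to_sst * vip)               # SST inhibited by VIP
    dendrite = relu(dendritic_drive - w_sst_to_dend * sst)   # dendrite inhibited by SST
    return {"VIP": vip, "SST": sst, "dendrite": dendrite}


for drive in (0.0, 0.5, 1.0):
    print(drive, steady_state_rates(drive))
```

Raising the long-range input increases VIP activity, lowers SST activity, and so raises the net dendritic signal, even though no excitatory connection onto the dendrite has changed.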
Processing in V1: classical and extraclassical receptive fields, functional architecture, and long-range connections
The early visual pathway performs a local analysis of the visual scene. This analysis begins in the retina and LGN, where circular receptive fields with a center-antagonistic surround organization process contrast, brightness, and color. This analysis continues in V1, where major changes in receptive field structure endow V1 cells with new response properties, including binocularity, sensitivity to stimulus orientation, spatial frequency, motion direction, and binocular disparity. Like LGN, V1 retains an orderly retinotopic organization, but V1 neurons have larger receptive field sizes than LGN cells.
The classical receptive field
The seminal studies of Hubel and Wiesel in cat and monkey V1 led to the discovery of orientation selectivity, the property of V1 cells to respond to a narrow range of edge orientations. Thus, the analysis of object contour, which represents the first step in the processing of object form, begins in V1. Hubel and Wiesel identified three types of orientation-selective receptive fields arranged in serial order of complexity: simple, complex, and hypercomplex. Simple cells have receptive fields with spatially segregated ON and OFF subregions, and dominate in the thalamic input layers of V1 (Fig. 30.4). The response of these cells depends on the spatial arrangement of their inhibitory and excitatory subregions; when presented with an oriented moving bar of light, they respond optimally when the bar leaves the OFF region and enters the ON region. Hubel and Wiesel proposed a model in which orientation selectivity in simple cells results from the convergence of inputs from LGN cells with spatially aligned, circularly symmetric receptive fields (Fig. 30.4). Although the mechanisms for the generation of orientation selectivity have been debated for decades, there is now good evidence that feedforward LGN inputs generate an orientation bias or preference in V1 neurons, according to a feedforward mechanism similar to that proposed by Hubel and Wiesel; however, intra-V1 recurrent excitatory and inhibitory connections amplify this bias and sharpen orientation tuning, respectively.
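The feedforward convergence idea can be illustrated with a toy calculation. The sketch below assumes that each LGN input has an ON-center difference-of-Gaussians receptive field and that five such inputs are aligned along a vertical line. The widths, spacing, number of inputs, and thin-bar stimulus are illustrative assumptions, not values from Hubel and Wiesel or from this chapter.

```python
import math


def dog(x, y, sigma_c=0.5, sigma_s=1.0):
    """ON-center difference-of-Gaussians receptive field (assumed LGN model)."""
    r2 = x * x + y * y
    center = math.exp(-r2 / (2 * sigma_c ** 2)) / (2 * math.pi * sigma_c ** 2)
    surround = math.exp(-r2 / (2 * sigma_s ** 2)) / (2 * math.pi * sigma_s ** 2)
    return center - surround


# Hypothetical LGN receptive field centers, aligned along the vertical axis.
LGN_CENTERS = [(0.0, y) for y in (-2.0, -1.0, 0.0, 1.0, 2.0)]


def simple_cell_rf(x, y):
    """Simple-cell receptive field as the sum of its aligned LGN inputs."""
    return sum(dog(x - cx, y - cy) for cx, cy in LGN_CENTERS)


def bar_response(theta_deg, length=6.0, n=121):
    """Rectified response to a thin bright bar through the receptive field center.

    theta_deg is measured from the horizontal axis, so 90 is a vertical bar.
    """
    theta = math.radians(theta_deg)
    dt = length / (n - 1)
    total = 0.0
    for i in range(n):
        t = -length / 2 + i * dt
        total += simple_cell_rf(t * math.cos(theta), t * math.sin(theta)) * dt
    return max(0.0, total)


for theta in (0, 30, 60, 90):
    print(f"bar at {theta:2d} deg -> response {bar_response(theta):.3f}")
```

In this toy version the response is largest for the bar aligned with the row of LGN centers. It captures only the feedforward bias and does not include the recurrent amplification and sharpening mentioned above.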
Complex cells have overlapping ON and OFF subregions and dominate outside the geniculate input layers. Unlike simple cells, complex cells show positional invariance, that is, they respond to an oriented bar presented at any location inside their receptive field and respond continuously to a moving bar of light in their receptive field. Hypercomplex or end-stopped cells respond to oriented bars of restricted length and are suppressed by long bars extending beyond their receptive field, suggesting they may signal more complex contours such as curves or corners. Hubel and Wiesel originally proposed that the receptive fields of each cell class (simple, complex, and hypercomplex) resulted from the convergence of inputs from the lower-order class in a hierarchical fashion. However, it appears that this is not the case, as many complex cells can receive direct inputs from the LGN, both simple and complex cells are found in all V1 layers, and length-tuning (or more generally size-tuning) is a general property of V1 cells, both simple and complex, now termed surround modulation or surround suppression .
The processing of object form requires more than just information about the orientation of contours. One can reconstruct any visual pattern when, in addition to edge orientation, information about the spatial frequency, contrast, and phase content is also available. To probe how the visual system analyzes spatial patterns, visual psychophysicists and neurophysiologists use grating stimuli. Fig. 30.5A shows an example of sinusoidal gratings in which the intensity of the light and dark bars changes gradually around the mean as a sinusoidal function of space. Gratings possess the four image properties listed above, and each can be varied systematically while keeping the others constant. The spatial frequency of the grating is the number of pairs of bars (or cycles) per degree of visual angle. The contrast is the difference in light intensity between the light and dark bars of the grating: when this difference is large the grating is said to have high contrast (Fig. 30.5A, left and middle); when it is small the grating has low contrast (Fig. 30.5A, right). The spatial phase refers to the position of the bars relative to some landmark. Using these grating stimuli, it has been shown that, in addition to orientation selectivity, V1 neurons show selectivity for spatial frequency (Fig. 30.5B–K). Broad tuning for spatial frequency is first established in the retina by the center-surround organization of its receptive fields. In V1, spatial frequency tuning becomes narrower than in retina and LGN. Furthermore, whereas in LGN cells spatial frequency tuning has low-pass characteristics, in V1 the majority of cells exhibit bandpass spatial frequency tuning (Fig. 30.5B–K). Early models of form vision suggested that the visual system’s spatial frequency selectivity is the basis for coding visual images by Fourier analysis, an idea that was not supported by experimental evidence.
However, a number of subsequent models have used local spatial frequency filters with properties similar to the receptive fields of cells in the primate visual system to successfully describe the visual system’s ability to detect and discriminate visual objects, suggesting that even if the Fourier model is not fully accurate, some form of spatial frequency analysis does occur in the cortex.
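The grating parameters and the notion of a local spatial frequency filter can be made concrete with a one-dimensional sketch. It assumes a grating of the form L(x) = L0 (1 + c sin(2πfx + φ)), measures contrast with the Michelson definition, and uses a Gabor function (a sinusoid under a Gaussian envelope) as a stand-in for a V1-like filter. The Gabor choice, the preferred frequency, the envelope width, and all function names are assumptions made for illustration.

```python
import math


def grating(x_deg, sf_cpd, contrast, phase_rad=0.0, mean_lum=1.0):
    """Luminance of a sinusoidal grating at position x_deg (degrees of visual angle)."""
    return mean_lum * (1.0 + contrast * math.sin(2 * math.pi * sf_cpd * x_deg + phase_rad))


def michelson_contrast(samples):
    """(Lmax - Lmin) / (Lmax + Lmin) for a list of luminance samples."""
    lo, hi = min(samples), max(samples)
    return (hi - lo) / (hi + lo)


def gabor(x_deg, pref_sf_cpd=2.0, sigma_deg=0.4):
    """Sine-phase Gabor: an assumed local, bandpass spatial frequency filter."""
    envelope = math.exp(-x_deg ** 2 / (2 * sigma_deg ** 2))
    return envelope * math.sin(2 * math.pi * pref_sf_cpd * x_deg)


STEP = 0.01
XS = [i * STEP - 2.0 for i in range(401)]   # sample positions from -2 to +2 deg


def filter_response(sf_cpd, contrast=0.5):
    """Rectified linear response of the Gabor filter to a grating of the given frequency."""
    total = 0.0
    for x in XS:
        modulation = grating(x, sf_cpd, contrast) - 1.0   # remove mean luminance
        total += modulation * gabor(x) * STEP
    return max(0.0, total)


print("Michelson contrast:", round(michelson_contrast([grating(x, 2.0, 0.5) for x in XS]), 3))
for sf in (0.5, 1.0, 2.0, 4.0, 8.0):
    print(f"{sf:4.1f} cycles/deg -> response {filter_response(sf):.4f}")
```

Because the filter is spatially localized, its response falls off on both sides of its preferred spatial frequency, which is the bandpass behavior described for most V1 cells. A single fixed-phase filter like this one is also sensitive to the spatial phase of the grating.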
It is also in V1 that the initial analysis of object motion is performed. Direction selectivity, the preferential response of a neuron to an object moving in a particular direction, is another receptive field property that, in primates, emerges predominantly in V1 (in rabbits and mice, direction-selective cells are commonly found in the retina, whereas only a sparse population of cells in the primate retina and LGN shows direction selectivity). Many models of direction selectivity require a time delay in the inputs to distinct parts of a cell’s receptive field, such that an object moving in the optimal direction first encounters a long-latency region and subsequently a short-latency region, so that signals from both regions arrive simultaneously at the cell body, where they summate and cause the cell to fire; in contrast, when the object is moving in the nonoptimal direction, signals from the different regions do not summate temporally and the cell does not reach its firing threshold.
Another receptive field property that emerges outside the geniculate input layers of V1 is binocularity, the integration of inputs from the two eyes. Most binocular cells in V1 are dominated by signals from one or the other eye, a property called ocular dominance (see Fig. 30.9). Many binocular neurons are also selective for retinal disparity, a cue for stereoscopic depth perception. Retinal disparity arises because objects located in front of or behind the observer’s plane of fixation project to slightly different locations on the two retinas. Individual V1 cells are tuned to a narrow range of retinal disparities.
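The delay-based scheme can be checked with simple arithmetic. The sketch below assumes two receptive field subregions half a degree apart, a long and a short response latency, and a narrow coincidence window at the cell body. The positions, latencies, speed, window, and function names are all hypothetical values chosen for illustration.

```python
def arrival_times(direction, speed_deg_per_s=10.0,
                  x_slow=0.0, x_fast=0.5,
                  latency_slow_s=0.08, latency_fast_s=0.03):
    """Times at which signals from the two subregions reach the cell body.

    direction = +1: the object crosses the long-latency region first.
    direction = -1: the object crosses the short-latency region first.
    All default values are hypothetical.
    """
    travel = abs(x_fast - x_slow) / speed_deg_per_s
    if direction > 0:
        hit_slow, hit_fast = 0.0, travel
    else:
        hit_slow, hit_fast = travel, 0.0
    return hit_slow + latency_slow_s, hit_fast + latency_fast_s


def fires(direction, window_s=0.01, **stimulus):
    """The cell reaches threshold only if both inputs arrive within the coincidence window."""
    t_slow, t_fast = arrival_times(direction, **stimulus)
    return abs(t_slow - t_fast) <= window_s


print("preferred direction:", fires(+1))
print("null direction:     ", fires(-1))
print("preferred direction, slower object:", fires(+1, speed_deg_per_s=2.0))
```

With these numbers the latency difference (0.05 s) equals the travel time between the subregions in the preferred direction, so the two signals coincide. The same construction predicts that the cell is also tuned to speed, since a much slower object breaks the coincidence.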
Functional architecture
Columns and maps
As shown in Fig. 30.1, V1 contains a systematic map of the visual field. This retinotopic organization imposes constraints on how object features and attributes are represented in cortex, requiring them to be encoded at each visual field location to avoid gaps in the overall representation. Using single microelectrode recordings, Hubel and Wiesel discovered that neurons recorded in a vertical electrode penetration shared similar orientation preference, ocular dominance, and retinotopic location, but when the electrode was inserted at an oblique angle relative to the cortical surface, orientation preference, ocularity, and retinotopic position shifted in an orderly fashion. Near the foveal representation of V1, visuotopic location shifted approximately every 2 mm, while a full cycle of orientations and ocular dominance columns were represented approximately every 800 and 1000 µm, respectively; thus, both eyes and a full set of orientations are represented at each spatial location. Based on these findings, Hubel and Wiesel proposed the concept of the hypercolumn as the processing unit or module of V1. However, it was not until the advent of intrinsic signal optical imaging that we gained an understanding of how different feature maps are represented in V1 (Fig. 30.6). This technique is based on differences in the reflectance of red light (shone on the cortical surface while visual stimuli are presented to the animal’s eyes) between oxygenated and deoxygenated blood, which arise from changes in neural activity. Using this approach, it was discovered that in cats and primates, orientation-preference maps have a pinwheel-like organization: domains/columns representing all orientations around the clock are arranged in an orderly manner like the spokes of a pinwheel around a pinwheel center or singularity; abrupt changes in orientation preference occur at singularities and at regions of the map called fractures (Fig. 30.6A).
Eye preference is also organized into a map consisting of alternating bands of eye preference, the ocular dominance bands, repeating along the tangential domain of V1 (Fig. 30.6B). Orientation singularities align preferentially with the center of ocular dominance bands (Fig. 30.6D). Superimposed on these maps is also a direction-preference map, whereby each orientation domain is split into two subdomains preferring opposite directions for that given orientation.
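The pinwheel arrangement of orientation preference can be captured by a simple geometric idealization. In the sketch below, preferred orientation is taken to be half the polar angle around the singularity, so that one full turn around the pinwheel center covers the 180 degrees of orientation exactly once. This is an idealized assumption rather than a fit to imaging data, and the function name and `chirality` parameter are hypothetical.

```python
import math


def preferred_orientation_deg(x, y, center=(0.0, 0.0), chirality=1):
    """Preferred orientation (0-180 deg) at cortical position (x, y) in an ideal pinwheel.

    chirality = +1 or -1 sets the sense of rotation around the pinwheel center.
    The orientation at the center itself is not meaningful (it is the singularity).
    """
    angle = math.atan2(y - center[1], x - center[0])   # polar angle, -pi to pi
    return math.degrees(chirality * angle / 2.0) % 180.0


# Sample eight positions on a ring around the pinwheel center.
for k in range(8):
    phi = 2 * math.pi * k / 8
    x, y = math.cos(phi), math.sin(phi)
    print(f"position at {math.degrees(phi):5.1f} deg -> "
          f"preferred orientation {preferred_orientation_deg(x, y):5.1f} deg")
```

Orientation changes smoothly along any ring around the center, while arbitrarily small rings still contain every orientation, which is why preference changes abruptly at the singularity.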
Single microelectrode recording studies demonstrated that, similar to orientation, cells with similar spatial frequency tuning are also clustered together in V1. Studies using [14C]-2-deoxyglucose uptake showed that presentation of gratings of high or low spatial frequency results in patchy activation patterns, with low spatial frequency responses coinciding with the cytochrome oxidase blobs in V1 (discussed later in this chapter). However, because the clustering of spatial frequency tuning is looser than that for orientation tuning, it has long been difficult to demonstrate whether spatial frequency is mapped continuously in V1. Using two-photon calcium imaging, which allows recording from large neuronal populations at single-cell resolution, it has recently been demonstrated that spatial frequency in layer 2/3 of macaque V1 is indeed organized into a highly structured and continuous map, the contours of which run orthogonally to the contours of the orientation map.
Cytochrome oxidase blobs
Embedded into the computational module of V1 (the hypercolumn) are the cytochrome oxidase (CO) blobs and interblobs (Fig. 30.6C). The CO blobs are clusters of cells, about 400 µm in diameter, prominent in the superficial layers of V1, which receive direct inputs from the K layers of the LGN and stain darkly for CO; the latter is a mitochondrial enzyme of the electron transport chain, which is particularly abundant within the more metabolically active regions of V1 (i.e., those receiving direct inputs from the LGN). Blobs lie preferentially at the center of ocular dominance columns and align with the centers of orientation pinwheels (Fig. 30.6D; however, pinwheels are more numerous than blobs, so while most blobs align with pinwheel centers, not all pinwheel centers align with blobs). Physiologically, neurons in the CO blobs have been reported to have receptive fields that are tuned for low spatial frequency, exhibit broader tuning for orientation, and are color selective. In contrast, neurons in the interleaving CO-pale interblobs prefer medium and high spatial frequencies, and are more sharply tuned for orientation. However, it has long been debated whether CO blobs uniquely contain color-selective cells and are, thus, devoted to color processing, and whether color and orientation are processed by distinct channels. A recent two-photon imaging study in macaque V1 has shown that a significant proportion of color-selective cells, which prefer isoluminant color over achromatic gratings, are also orientation selective, suggesting that color and orientation in V1 can be processed by similar circuits. However, unoriented, color-preferring neurons are predominantly located in the blobs, whereas oriented and color-selective cells dominate in the interblobs. Altogether, physiologic and imaging studies point to the blobs as regions specialized in the processing of surface properties, such as brightness and color, and the interblobs as regions specialized in the processing of contours.
The extraclassical receptive field: surround modulation in V1
Hubel and Wiesel, and many visual neurophysiologists after them, characterized the receptive field properties of single V1 cells using isolated grating or bar stimuli. This led to the concept of the classical receptive field as the visual field region where presentation of stimuli of optimal parameters for the neuron evokes a spiking response. However, it was later discovered that a visual stimulus extending beyond a neuron’s receptive field, or the presentation of stimuli outside the receptive field, modulates the neuron’s response to stimuli inside its receptive field, a property that was termed surround modulation. Surround modulation is an integral part of visual information processing because natural visual stimuli do not activate neurons in isolation, but within the context of other stimuli. Indeed, it has been described at all levels of the visual system, from the retina to extrastriate cortex, across many species and sensory modalities. A fundamental property of surround modulation is its orientation dependence; that is, gratings of different orientations presented inside and outside a neuron’s receptive field typically evoke a higher response from the neuron than gratings of the same orientation. Thus, orientation discontinuities in a visual stimulus (as well as motion or texture discontinuities) enhance a neuron’s response. Because of this property, it was proposed that surround modulation serves to compute visual saliency, pop-out, object boundary detection, or figure-ground segregation. However, using iso-oriented and collinearly aligned small bar stimuli presented inside and outside the receptive field of a neuron, other researchers found response enhancement, leading these investigators to suggest that surround modulation serves contour integration, i.e., the grouping of collinear elements into elongated contours.
Theoretical and experimental work has additionally suggested that the function of surround modulation may be to reduce redundancies in the responses of neurons to natural images/movies (which contain strong spatiotemporal correlations) and to increase response sparseness, which could lead to more efficient coding of natural stimuli.
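The orientation dependence of surround suppression can be illustrated with a toy calculation. The sketch below assumes a divisive form in which the surround reduces the center response by an amount that grows with the similarity between center and surround orientations. Divisive suppression is one simple modeling choice swapped in here for illustration; it is not a mechanism given in this chapter. The suppression strength, tuning bandwidth, and function names are hypothetical.

```python
import math


def orientation_similarity(theta_center_deg, theta_surround_deg, bandwidth_deg=30.0):
    """1 for iso-oriented center and surround, falling toward 0 for orthogonal ones."""
    d = abs(theta_center_deg - theta_surround_deg) % 180.0
    d = min(d, 180.0 - d)   # orientation difference wrapped to 0-90 deg
    return math.exp(-d ** 2 / (2 * bandwidth_deg ** 2))


def response(center_drive, theta_center_deg, theta_surround_deg=None, surround_strength=2.0):
    """Center response divided by an orientation-tuned surround term (assumed model)."""
    if theta_surround_deg is None:          # stimulus confined to the receptive field
        return center_drive
    suppression = surround_strength * orientation_similarity(theta_center_deg, theta_surround_deg)
    return center_drive / (1.0 + suppression)


print("center alone:        ", round(response(1.0, 0.0), 3))
print("iso-oriented surround:", round(response(1.0, 0.0, 0.0), 3))
print("orthogonal surround: ", round(response(1.0, 0.0, 90.0), 3))
```

This sketch reproduces only the suppressive, orientation-tuned effect seen with gratings. It does not capture the response enhancement reported for collinear bar stimuli, which would need an additional facilitatory term.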
Circuits for surround modulation
Surround modulation requires integration of visual signals across distant visual field locations. Because feedforward afferents from the LGN are spatially restricted to the receptive field size of their target V1 cells and LGN cells are not orientation tuned (in monkey), feedforward mechanisms are insufficient to account for orientation-tuned surround modulation in V1. Intra-areal long-range horizontal connections within V1 and interareal feedback connections from extrastriate cortex to V1 are two sets of connections thought to provide the spatial integration of signals and the tuning required for surround modulation (Fig. 30.7).