Among humans’ cognitive faculties, the ability to process others’ actions is essential. We can recognize the meaning behind running, eating, and finer movements like tool use. How does the visual system process and transform information about actions?
To explore this question, we collected 120 action videos, spanning a range of everyday activities sampled from the American Time Use Survey. Next, we used behavioral ratings and computational approaches to measure how these videos vary within three distinct feature spaces: visual shape features (“gist”), kinematic features (e.g., body parts involved), and intentional features (e.g., used to communicate). Finally, using fMRI, we obtained neural responses for each of these 2.5s action clips in 9 participants.
To analyze the structure in these neural responses, we used an encoding-model approach (Mitchell et al., 2008) to fit tuning models for each voxel along each feature space, and assess how well each model predicts responses to individual actions.
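The voxelwise encoding-model logic described above can be illustrated with a minimal sketch. This is not the study's analysis pipeline; the data here are simulated stand-ins, and the matrix shapes, ridge penalty, and cross-validation scheme are all assumptions for illustration. The core idea is the same: fit tuning weights per voxel from a stimulus feature space, predict held-out responses, and score each voxel by the correlation between predicted and actual responses.

```python
# Minimal sketch of a voxelwise encoding-model analysis.
# NOTE: simulated data; shapes, alpha, and the 5-fold scheme are
# illustrative assumptions, not the study's actual parameters.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n_videos, n_features, n_voxels = 120, 10, 50

X = rng.standard_normal((n_videos, n_features))          # one feature space (e.g., gist)
true_w = rng.standard_normal((n_features, n_voxels))     # hypothetical tuning weights
Y = X @ true_w + 0.5 * rng.standard_normal((n_videos, n_voxels))  # voxel responses

# Cross-validated prediction: fit tuning weights on training videos,
# then predict responses to held-out videos.
preds = np.zeros_like(Y)
for train, test in KFold(n_splits=5).split(X):
    model = Ridge(alpha=1.0).fit(X[train], Y[train])
    preds[test] = model.predict(X[test])

# Score each voxel by predicted-vs-actual correlation across videos.
r_per_voxel = np.array([np.corrcoef(preds[:, v], Y[:, v])[0, 1]
                        for v in range(n_voxels)])
print(round(float(np.median(r_per_voxel)), 2))
```

In practice this procedure is repeated once per feature space, so each voxel receives one prediction score per model, and the scores can be compared across cortical regions.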
We found that a large proportion of cortex along the intraparietal sulcus and occipitotemporal surface was moderately well fit by all three models (median r=0.23-0.31). In a leave-two-out validation procedure, all three models could accurately classify between two action videos in ventral and dorsal stream sectors (65-80%, SEM=1.1%-2.6%). In addition, we observed a significant shift in classification accuracy between early visual cortex (EVC) and higher-level visual cortex: the gist model performed best in EVC, whereas the high-level models outperformed gist in occipitotemporal and parietal regions.
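The leave-two-out classification procedure can be sketched as follows. Again, the data are simulated and the sizes and ridge penalty are illustrative assumptions. For each pair of held-out videos, an encoding model is trained on the remaining videos; the pair is classified correctly if the matching predicted/actual response patterns correlate more strongly than the swapped pairing.

```python
# Minimal sketch of leave-two-out classification with an encoding model.
# NOTE: simulated data; n_videos, n_features, n_voxels, and alpha are
# illustrative assumptions, not the study's actual values.
import numpy as np
from itertools import combinations
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
n_videos, n_features, n_voxels = 20, 8, 40
X = rng.standard_normal((n_videos, n_features))
W = rng.standard_normal((n_features, n_voxels))
Y = X @ W + 0.5 * rng.standard_normal((n_videos, n_voxels))

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

correct = 0
pairs = list(combinations(range(n_videos), 2))
for i, j in pairs:
    # Train on all videos except the held-out pair.
    train = [k for k in range(n_videos) if k not in (i, j)]
    model = Ridge(alpha=1.0).fit(X[train], Y[train])
    pred_i, pred_j = model.predict(X[[i, j]])
    # Correct if the matched pairing beats the swapped pairing.
    if corr(pred_i, Y[i]) + corr(pred_j, Y[j]) > \
       corr(pred_i, Y[j]) + corr(pred_j, Y[i]):
        correct += 1

accuracy = correct / len(pairs)
print(round(accuracy, 2))
```

Chance performance under this scheme is 50%, which is the baseline against which region-level accuracies (e.g., 65-80%) are assessed.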
These results demonstrate that action representations can be successfully predicted using an encoding-model approach. More broadly, the pattern of fits for different feature models reveals that visual information is transformed from low- to high-level representational spaces in the course of action processing. These findings begin to formalize the progression of the kinds of action information being processed along the visual stream.
Ekman (Ekman, 1992) developed the Directed Facial Action Task (DFAT), which demonstrated that facial expressions can elicit emotional physiology. The present study investigated whether these responses also have mood-congruent memory effects, as found when emotions are elicited in other ways. Thirty-eight participants performed the DFAT for happy and sad expressions before recalling neutral, positive and negative images. The mood-congruent memory hypothesis predicts that, if the DFAT produces sustained affect, participants should recall more mood-congruent than mood-incongruent images. Some participants performed the DFAT while Galvanic Skin Response (GSR), an index of autonomic emotional response, was recorded. GSR correlated with reported mood change in the happy condition, while the difference between mood-congruent and mood-incongruent memory correlated with reported mood change in the sad condition. However, there was no correlation between GSR and memory. These results show that self-reported emotion, but not physiological response, was linked to congruency effects in memory for emotional images.
The inferior frontal gyrus and inferior parietal lobe have been characterized as human homologues of the monkey “mirror neuron” system, critical for both action production and recognition. However, data from brain lesion patients with selective impairment on only one of these tasks provide evidence of neural and cognitive dissociations. We sought to clarify the relationship between action production (AP) and action recognition (AR), and their critical neural substrates, by directly comparing performance of 131 chronic left-hemisphere stroke patients on both tasks—to our knowledge, the largest lesion-based experimental investigation of action cognition to date. Using voxel-based lesion-symptom mapping (VLSM), we found that lesions to primary motor and somatosensory cortices and inferior parietal lobule were associated with disproportionately impaired performance on AP, while lesions to lateral temporal-occipital cortex (LTO) were associated with a relatively rare pattern of disproportionately impaired performance on AR. In contrast, damage to posterior middle temporal gyrus (pMTG) was associated with impairment on both AP and AR. The distinction between LTO, critical for recognition, and pMTG, important for both tasks, suggests a rough gradient from modality-specific to abstract representations in posterior temporal cortex, the first lesion-based evidence for this phenomenon. Overall, the results of this large patient study help to bring closure to a long-standing debate by showing that tool-related action production and recognition critically depend on both common and distinct left hemisphere neural substrates, most of which are external to putative human mirror regions.