In sensory substitution devices (SSDs), visual information captured by an artificial receptor is delivered to the brain through a non-visual sensory modality. Using a visual-to-auditory SSD called "The vOICe", we previously reported that blind individuals perform successfully on object recognition tasks and are able to recruit specific ventral 'visual' structures for shape recognition using the device (i.e. through soundscapes). Comparable recruitment was also observed in sighted individuals learning to use this device. Here we directly compare a group of seven subjects who learned to perform object recognition via soundscapes with a group of seven subjects who learned arbitrary associations between sounds and object identity. We contrast the two groups’ brain activity during object recognition using the SSD, and in response to auditory object and scrambled object soundscapes. We show that the most critical structures specific for shape extraction for the purpose of object recognition are the left Pre-Central Sulcus (PCS) and the bilateral Lateral-Occipital Complex (LOC). We also found significant activation in the occipito-parietal and posterior occipital cortex, which had not been observed previously in a smaller sample of subjects. These results support the notion that interactions between visual structures and a network of additional areas, specifically in prefrontal cortex (PCS), might underlie the machinery that is most critical for achieving multisensory or metamodal shape recognition.