Motion perception

The dorsal stream (green) and ventral stream (purple) are shown. They originate from a common source in visual cortex. The dorsal stream is responsible for detection of location and motion.

Motion perception is the process of inferring the speed and direction of elements in a scene based on visual, vestibular and proprioceptive inputs. Although this process appears straightforward to most observers, it has proven to be a difficult problem from a computational perspective, and extraordinarily difficult to explain in terms of neural processing.

Motion perception is studied by many disciplines, including psychology (i.e. visual perception), neurology, neurophysiology, engineering, and computer science.

Neuropsychology

The inability to perceive motion is called akinetopsia, and it may be caused by a lesion to cortical area V5 in the extrastriate cortex. Neuropsychological studies of a patient who could not see motion, seeing the world in a series of static "frames" instead, suggested that visual area V5 in humans is homologous to motion processing area MT in other primates.[1][2]

First-order motion perception

Example of Beta movement, often confused with phi phenomenon, in which a succession of still images gives the illusion of a moving ball.[3]

Two or more stimuli that are switched on and off in alternation can produce two different motion percepts. The first, demonstrated in the figure, is "Beta movement", often used in billboard displays, in which an object is perceived as moving when, in fact, a series of stationary images is being presented. This is also termed "apparent motion" and is the basis of movies and television. However, at faster alternation rates, and if the distance between the stimuli is just right, an illusory "object" the same colour as the background is seen moving between the two stimuli and alternately occluding them. This is called the phi phenomenon and is sometimes described as an example of "pure" motion detection uncontaminated, as in Beta movement, by form cues.[3] This description is, however, somewhat paradoxical, as it is not possible to create such motion in the absence of figural percepts.

The phi phenomenon has been referred to as "first-order" motion perception. Werner E. Reichardt and Bernhard Hassenstein modelled it in terms of relatively simple "motion sensors" in the visual system that have evolved to detect a change in luminance at one point on the retina and correlate it with a change in luminance at a neighbouring point on the retina after a short delay. Sensors proposed to work this way have been referred to as Hassenstein-Reichardt detectors, after the scientists who first modelled them,[4] as motion-energy sensors,[5] or as Elaborated Reichardt Detectors.[6] These sensors are described as detecting motion by spatio-temporal correlation and are considered by some to be plausible models for how the visual system may detect motion. (Again, however, the notion of a "pure motion" detector suffers from the problem that there is no "pure motion" stimulus, i.e. a stimulus lacking perceived figure/ground properties.) There is still considerable debate regarding the accuracy of the model and the exact nature of this proposed process. It is not clear how the model distinguishes between movements of the eyes and movements of objects in the visual field, both of which produce changes in luminance at points on the retina.
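
The delay-and-correlate idea can be illustrated with a minimal sketch. It assumes two sampled luminance signals and a delay measured in samples; the function names, the sinusoidal test stimulus and the opponent (rightward minus leftward) subtraction are illustrative choices, not a description of any specific published implementation.

```python
import math

def reichardt_response(left, right, delay):
    """Opponent delay-and-correlate detector on two sampled signals.

    left, right: luminance time series at two neighbouring points.
    delay: delay in samples (an assumed free parameter).
    A positive output suggests motion from the left point to the right point.
    """
    total = 0.0
    for t in range(delay, len(left)):
        # Each half-detector multiplies one input, delayed, with the other input.
        rightward = left[t - delay] * right[t]
        leftward = right[t - delay] * left[t]
        total += rightward - leftward
    return total / (len(left) - delay)

def drifting_signal(n, period, lag):
    """Sinusoidal luminance modulation; `lag` shifts it later in time."""
    return [math.sin(2 * math.pi * (t - lag) / period) for t in range(n)]

# A pattern moving left to right reaches the right-hand point 3 samples later.
left = drifting_signal(200, period=20, lag=0)
right = drifting_signal(200, period=20, lag=3)

print(reichardt_response(left, right, delay=3))  # positive: left-to-right
print(reichardt_response(right, left, delay=3))  # same magnitude, negative
```

Swapping the two inputs flips the sign of the output, and two identical inputs give zero. The sketch responds to any correlated luminance change, so it shares the limitation noted above: it cannot tell eye movements from object movements.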

Second-order motion perception

Second-order motion has been defined as motion in which the moving contour is defined by contrast, texture, flicker or some other quality that does not result in an increase in luminance or motion energy in the Fourier spectrum of the stimulus.[7][8] There is much evidence to suggest that early processing of first- and second-order motion is carried out by separate pathways.[9] Second-order mechanisms have poorer temporal resolution and are low-pass in terms of the range of spatial frequencies to which they respond. (The notion that neural responses are attuned to frequency components of stimulation suffers from the lack of a functional rationale and has been generally criticized by G. Westheimer (2001) in an article called "The Fourier Theory of Vision.") Second-order motion produces a weaker motion aftereffect unless tested with dynamically flickering stimuli.[10]

The aperture problem

The aperture problem. The grating appears to be moving down and to the right, perpendicular to the orientation of the bars. But it could be moving in many other directions, such as only down, or only to the right. It is impossible to determine unless the ends of the bars become visible in the aperture.

The motion direction of a contour is ambiguous, because the motion component parallel to the line cannot be inferred based on the visual input. This means that a variety of contours of different orientations moving at different speeds can cause identical responses in a motion sensitive neuron in the visual system.
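
A short sketch makes the ambiguity concrete. It assumes a straight contour, a 2D velocity vector and a y axis pointing up; the function name and the example velocities are hypothetical. Only the component of velocity perpendicular to the contour survives, so different true velocities yield the same measurement.

```python
import math

def normal_component(velocity, bar_angle_deg):
    """Velocity component perpendicular to a straight contour.

    velocity: (vx, vy) true velocity of the contour.
    bar_angle_deg: orientation of the contour, in degrees from the x axis.
    Returns the only component measurable through a small aperture.
    """
    theta = math.radians(bar_angle_deg)
    nx, ny = -math.sin(theta), math.cos(theta)  # unit normal to the contour
    speed_along_normal = velocity[0] * nx + velocity[1] * ny
    return (speed_along_normal * nx, speed_along_normal * ny)

# Three different true velocities of a grating oriented at 45 degrees:
# down-and-right, only right, and only down.
candidates = [(1.0, -1.0), (2.0, 0.0), (0.0, -2.0)]
for v in candidates:
    print(v, "->", normal_component(v, 45.0))
```

All three candidates map to approximately (1, -1), which is why the direction cannot be recovered from the aperture alone.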


Motion integration

Some have speculated that, having extracted the hypothesized motion signals (first- or second-order) from the retinal image, the visual system must integrate those individual local motion signals at various parts of the visual field into a 2-dimensional or global representation of moving objects and surfaces. (It is not clear how this 2D representation is then converted into the perceived 3D percept.) Further processing is required to detect coherent motion or "global motion" present in a scene.[11]

The ability of a subject to detect coherent motion is commonly tested using motion coherence discrimination tasks. These tasks use dynamic random-dot patterns (also called random dot kinematograms) that consist of 'signal' dots moving in one direction and 'noise' dots moving in random directions. The sensitivity to motion coherence is assessed by measuring the ratio of 'signal' to 'noise' dots required to determine the coherent motion direction. The required ratio is called the motion coherence threshold.
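
The stimulus can be sketched as follows. This is a simplified illustration, not a psychophysics protocol: it expresses coherence as the fraction of dots that are signal dots (an assumption; the ratio of signal to noise dots can be derived from it), and the vector-sum "observer" is a toy stand-in for a participant's judgement.

```python
import math
import random

def dot_directions(n_dots, coherence, signal_direction, rng):
    """Directions (radians) for one frame of a random-dot kinematogram.

    A fraction `coherence` of dots are 'signal' dots sharing one direction;
    the rest are 'noise' dots with directions drawn uniformly at random.
    """
    n_signal = round(coherence * n_dots)
    signal = [signal_direction] * n_signal
    noise = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_dots - n_signal)]
    return signal + noise

def mean_direction(directions):
    """Direction of the vector sum of unit motion vectors (a toy observer)."""
    x = sum(math.cos(d) for d in directions)
    y = sum(math.sin(d) for d in directions)
    return math.atan2(y, x)

rng = random.Random(0)  # fixed seed so the run is repeatable
for coherence in (0.05, 0.2, 0.8):
    dirs = dot_directions(200, coherence, signal_direction=0.0, rng=rng)
    print(coherence, mean_direction(dirs))
```

As coherence rises, the estimated direction settles closer to the signal direction. A threshold estimate would repeat such trials at many coherence levels and find the lowest level at which the direction is reported reliably.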

Motion in depth

As in other aspects of vision, the observer's visual input is generally insufficient to determine the true nature of stimulus sources, in this case their velocity in the real world. In monocular vision for example, the visual input will be a 2D projection of a 3D scene. The motion cues present in the 2D projection will by default be insufficient to reconstruct the motion present in the 3D scene. Put differently, many 3D scenes will be compatible with a single 2D projection. The problem of motion estimation generalizes to binocular vision when we consider occlusion or motion perception at relatively large distances, where binocular disparity is a poor cue to depth. This fundamental difficulty is referred to as the inverse problem.[12]
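
The many-to-one mapping can be shown with a small sketch. It assumes an idealised pinhole camera with unit focal length as a stand-in for monocular projection; the function names and the example positions and velocities are hypothetical.

```python
def project(point, focal_length=1.0):
    """Pinhole projection of a 3D point (x, y, z) onto the image plane."""
    x, y, z = point
    return (focal_length * x / z, focal_length * y / z)

def image_motion(start, velocity, dt=1.0):
    """2D displacement in the image produced by a 3D velocity over `dt`."""
    end = tuple(p + v * dt for p, v in zip(start, velocity))
    (u0, v0), (u1, v1) = project(start), project(end)
    return (u1 - u0, v1 - v0)

# A near, slow object; a far, fast object; and an object that also recedes.
near = image_motion(start=(0.0, 0.0, 2.0), velocity=(1.0, 0.0, 0.0))
far = image_motion(start=(0.0, 0.0, 4.0), velocity=(2.0, 0.0, 0.0))
receding = image_motion(start=(0.0, 0.0, 2.0), velocity=(1.5, 0.0, 1.0))
print(near, far, receding)
```

All three 3D motions produce the same image displacement, so the 2D motion alone does not identify the 3D velocity.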

Nonetheless, some humans do perceive motion in depth. There are indications that the brain uses various cues, in particular temporal changes in disparity as well as monocular velocity ratios, for producing a sensation of motion in depth.[13]

Perceptual learning of motion

Detection and discrimination of motion can be improved by training, with long-lasting results. Participants trained to detect the movements of dots on a screen in only one direction become particularly good at detecting small movements in the directions around that in which they have been trained. This improvement was still present 10 weeks later. However, perceptual learning is highly specific. For example, the participants show no improvement when tested around other motion directions, or for other sorts of stimuli.[14]

Cognitive map

A cognitive map is a type of mental representation that allows an individual to acquire, code, store, recall, and decode information about the relative locations and attributes of phenomena in their spatial environment.[15][16] Place cells work with other types of neurons in the hippocampus and surrounding regions of the brain to perform this kind of spatial processing,[17] but the ways in which they function within the hippocampus are still being researched.[18]

Many species of mammals can keep track of spatial location even in the absence of visual, auditory, olfactory, or tactile cues, by integrating their movements. The ability to do this is referred to in the literature as path integration. A number of theoretical models have explored mechanisms by which path integration could be performed by neural networks. In most models, such as those of Samsonovich and McNaughton (1997)[19] or Burak and Fiete (2009),[20] the principal ingredients are (1) an internal representation of position, (2) internal representations of the speed and direction of movement, and (3) a mechanism for shifting the encoded position by the right amount when the animal moves. Because cells in the medial entorhinal cortex (MEC) encode information about position (grid cells[21]) and movement (head direction cells and conjunctive position-by-direction cells[22]), this area is currently viewed as the most promising candidate for the place in the brain where path integration occurs.
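
The three ingredients can be illustrated computationally with a dead-reckoning sketch. This is not one of the neural network models cited above: it only shows a stored position being shifted by speed and heading samples, and the function name and sample format are assumptions made for the example.

```python
import math

def integrate_path(start, steps):
    """Dead-reckoning estimate of position from self-motion signals.

    start: (x, y) initial position estimate.
    steps: iterable of (speed, heading_rad, dt) samples.
    """
    x, y = start
    for speed, heading, dt in steps:
        # Shift the encoded position by the displacement implied by this sample.
        x += speed * math.cos(heading) * dt
        y += speed * math.sin(heading) * dt
    return (x, y)

# Walk a unit square: east, north, west, south.
square = [(1.0, 0.0, 1.0), (1.0, math.pi / 2, 1.0),
          (1.0, math.pi, 1.0), (1.0, 3 * math.pi / 2, 1.0)]
x, y = integrate_path((0.0, 0.0), square)
print(abs(x) < 1e-9 and abs(y) < 1e-9)  # back at the starting point
```

Because each step adds to the running estimate, any error in the speed or heading signal accumulates over time, which is one reason models of path integration also consider correction by external cues.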

References

  1. Hess, Baker, Zihl (1989). "The "motion-blind" patient: low-level spatial and temporal filters". Journal of Neuroscience. 9 (5): 1628–1640. PMID 2723744.
  2. Baker, Hess, Zihl (1991). "Residual motion perception in a "motion-blind" patient, assessed with limited-lifetime random dot stimuli". Journal of Neuroscience. 11 (2): 454–461. PMID 1992012.
  3. Steinman, Pizlo & Pizlo (2000) Phi is not Beta slideshow based on ARVO presentation.
  4. Reichardt, W. (1961). "Autocorrelation, a principle for the evaluation of sensory information by the central nervous system". In W.A. Rosenblith. Sensory Communication. MIT Press. pp. 303–317.
  5. Adelson, E.H.; Bergen, J.R. (1985). "Spatiotemporal energy models for the perception of motion". Journal of the Optical Society of America A. 2 (2): 284–299. PMID 3973762. doi:10.1364/JOSAA.2.000284.
  6. van Santen, J.P.; Sperling, G. (1985). "Elaborated Reichardt detectors". Journal of the Optical Society of America A. 2 (2): 300–321. PMID 3973763. doi:10.1364/JOSAA.2.000300.
  7. Cavanagh, P & Mather, G (1989). "Motion: the long and short of it". Spatial vision. 4 (2–3): 103–129. PMID 2487159. doi:10.1163/156856889X00077.
  8. Chubb, C & Sperling, G (1988). "Drift-balanced random stimuli: A general basis for studying non-Fourier motion perception". J. Opt. Soc. Am. A. 5 (11): 1986–2007. doi:10.1364/JOSAA.5.001986.
  9. Nishida, S., Ledgeway, T. & Edwards, M. (1997). "Dual multiple-scale processing for motion in the human visual system". Vision Research. 37 (19): 2685–2698. PMID 9373668. doi:10.1016/S0042-6989(97)00092-8.
  10. Ledgeway, T. & Smith, A.T. (1994). "The duration of the motion aftereffect following adaptation to first- and second-order motion". Perception. 23 (10): 1211–1219. PMID 7899037. doi:10.1068/p231211.
  11. Burr, David C; Santoro, Loredana (2001). "Temporal integration of optic flow, measured by contrast and coherence thresholds". Vision Research. 41 (15): 1891–1899. ISSN 0042-6989. doi:10.1016/S0042-6989(01)00072-4.
  12. Maloney, Laurence T.; Lages, Martin; Heron, Suzanne (2010). "On the Inverse Problem of Binocular 3D Motion Perception". PLoS Computational Biology. 6 (11): e1000999. ISSN 1553-7358. PMC 2987932. PMID 21124957. doi:10.1371/journal.pcbi.1000999.
  13. Blake, Randolph; Wilson, Hugh (2011). "Binocular vision". Vision Research. 51 (7): 754–770. ISSN 0042-6989. doi:10.1016/j.visres.2010.10.009.
  14. Ball, K.; Sekuler, R. (1982). "A specific and enduring improvement in visual motion discrimination". Science. 218 (4573): 697–698.
  15. Kitchin RM (1994). "Cognitive Maps: What Are They and Why Study Them?". Journal of Environmental Psychology. 14 (1): 1–19. doi:10.1016/S0272-4944(05)80194-X.
  16. O'Keefe, John (1978). The Hippocampus as a Cognitive Map. ISBN 978-0198572060.
  17. Muir, Gary; David K. Bilkey (1 June 2001). "Instability in the Place Field Location of Hippocampal Place Cells after Lesions Centered on the Perirhinal Cortex" (PDF). The Journal of Neuroscience. 21 (11): 4016–4025. PMID 11356888.
  18. Redei, George (2008). Encyclopedia of Genetics, Genomics, Proteomics, and Informatics. p. 1501. ISBN 978-1-4020-6753-2.
  19. Samsonovich, A.; McNaughton, B. L. (1997). "Path integration and cognitive mapping in a continuous attractor neural network model". Journal of Neuroscience. 17 (15): 5900–5920. PMID 9221787.
  20. Burak, Y.; Fiete, I. R. (2009). Sporns, Olaf, ed. "Accurate Path Integration in Continuous Attractor Network Models of Grid Cells". PLoS Computational Biology. 5 (2): e1000291. PMC 2632741. PMID 19229307. doi:10.1371/journal.pcbi.1000291.
  21. Hafting, T.; Fyhn, M.; Molden, S.; Moser, M. -B.; Moser, E. I. (2005). "Microstructure of a spatial map in the entorhinal cortex". Nature. 436 (7052): 801–806. Bibcode:2005Natur.436..801H. PMID 15965463. doi:10.1038/nature03721.
  22. Sargolini, F. (5 May 2006). "Conjunctive Representation of Position, Direction, and Velocity in Entorhinal Cortex". Science. 312 (5774): 758–762. Bibcode:2006Sci...312..758S. PMID 16675704. doi:10.1126/science.1125572.

Hadad, B.; Maurer, D.; Lewis, T. L. (2011). "Long trajectory for the development of sensitivity to global and biological motion". Developmental Science. 14 (6): 1330–1339.

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.