Egomotion is defined as the 3D motion of a camera within an environment[1]. In computer vision, egomotion estimation is the task of estimating a camera's motion relative to a rigid scene[2]. An example is estimating a car's changing position relative to lane markings or street signs as observed from the car itself. Egomotion estimation is important in autonomous robot navigation applications[3].
The goal of egomotion estimation is to determine the camera's 3D motion within the environment from a sequence of images taken by that camera[4]. This is done by applying visual odometry techniques to the image sequence captured by the moving camera[5]. Typically, feature detection is used to construct an optical flow field from two frames in the sequence[1], which may come from either a single camera or a stereo camera[5]. Using stereo image pairs for each frame helps reduce error and provides additional depth and scale information[6].
Features are detected in the first frame and then matched in the second frame. The displacement of each matched feature between the two images gives the optical flow field for the detected features. When the camera translates forward through a rigid scene, the flow vectors diverge from a single point, the focus of expansion. The focus of expansion can be detected from the optical flow field; it indicates the direction of the camera's motion and thus provides an estimate of that motion.
Other methods of extracting egomotion information from images also exist, including one that avoids feature detection and optical flow fields and instead uses the image intensities directly[1].