Match moving

From Wikipedia, the free encyclopedia

Match moving is a special effects technology that allows the insertion of virtual objects into real footage with the correct position, scale, orientation and motion relative to the photographed objects in the scene. The term is used loosely to refer to several different ways of extracting motion information from a motion picture, particularly camera movement. Match moving is related to rotoscoping and photogrammetry. It is sometimes referred to as motion tracking.

Match moving is distinct from motion capture. The latter is a technology for recording the motion of objects, often human actors, in a controlled environment with special cameras and technology. The former is a software technique applied to normal footage that can be recorded in uncontrolled environments with ordinary cameras. This article defines match moving as the art of extracting motion information from actual footage, where additional cameras, motion capture sensors, and motion control photography are not necessarily used.

Match moving is primarily used to track the movement of a camera through a shot so that an identical virtual camera move can be reproduced inside a computer. The intent is that when the virtual and real scenes are composited together, they share the same perspective and appear seamless.

Match moving has two forms. Compositing programs, such as Shake, Adobe After Effects and discreet Combustion, have two-dimensional motion tracking capabilities. This feature translates images in two-dimensional space and can add effects such as motion blur in an attempt to eliminate relative motion between two features of two moving images. This technique is sufficient to create verisimilitude when the two movies do not include major changes in camera perspective. For example, a billboard deep in the background of a shot can often be replaced using two-dimensional tracking.
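As a minimal sketch of the two-dimensional case, assume a single background feature has been tracked as one (x, y) pixel position per frame. Replacing the billboard then amounts to offsetting the replacement's corners by the feature's motion relative to the first frame. The helper names `offsets_from_track` and `follow_track` and the sample data are hypothetical; compositing programs also handle rotation, scale and motion blur, which this translation-only sketch ignores.

```python
# Translation-only 2D tracking sketch (hypothetical helper names, not a real API).

def offsets_from_track(track):
    """Per-frame (dx, dy) of a tracked feature relative to its first-frame position."""
    x0, y0 = track[0]
    return [(x - x0, y - y0) for x, y in track]


def follow_track(corners, track):
    """Per-frame corner positions of an inserted element that moves with the feature."""
    return [[(cx + dx, cy + dy) for cx, cy in corners]
            for dx, dy in offsets_from_track(track)]


# Hypothetical track of a background feature drifting right and down.
track = [(100.0, 50.0), (104.0, 51.0), (109.0, 53.0)]
billboard = [(90.0, 40.0), (130.0, 40.0), (130.0, 60.0), (90.0, 60.0)]

for frame, corners in enumerate(follow_track(billboard, track)):
    print(frame, corners)
```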

Three-dimensional match moving tools make it possible to extrapolate three-dimensional information from two-dimensional photography. Programs capable of 3D match moving include:

  • Voodoo (free software)
  • Icarus (University of Manchester research project, now discontinued but still popular)
  • Maya Live (Module of Maya Unlimited)
  • The Pixel Farm PFTrack
  • PFHoe (cost-effective match mover based on PFTrack algorithms)
  • RealViz MatchMover
  • Science.D.Visions 3DEqualizer (which won an Academy Award for technical achievement)
  • Ssontech SynthEyes
  • 2d3 Boujou

These programs allow users to derive camera movement and other relative motion from arbitrary footage. The tracking information can be transferred to computer graphics software such as Blender, Maya or Lightwave and used to animate virtual cameras and CGI objects.

The first, and still some of the best, examples of match moving were used in the film Jurassic Park. The filmmakers placed colored tennis balls in the scene as reference marks. They then used these marks to track the motion of the camera through the scene. This allowed virtual objects, such as CGI dinosaurs, to be added to complicated camera movements and even handheld shots. The tennis balls were later digitally painted out of the final shot.

Match moving is now an established visual effects tool.


How Match Moving Works

The process of match moving can be broken down into two steps.

Tracking

Feature tracking in action. The green and orange paths represent tracks. The white squares encircle features that were automatically selected from the image by the motion tracking software. Screenshot from Icarus.

The first step is identifying and tracking features. A feature is a specific point in the image that a tracking algorithm can lock onto and follow through multiple frames (SynthEyes calls them blips). Features are often selected because they are bright or dark spots, edges or corners, depending on the particular tracking algorithm. Popular programs use template matching based on the normalized cross-correlation (NCC) score and the root-mean-square (RMS) error. What is important is that each feature represents a specific point on the surface of a real object. As a feature is tracked, it becomes a series of two-dimensional coordinates that represent the position of the feature across a series of frames. This series is referred to as a track. Once tracks have been created, they can be used immediately for 2D motion tracking or used to calculate 3D information.
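Template matching with an NCC score can be sketched in a few lines. This is a minimal example under stated assumptions: frames are tiny grayscale images stored as lists of rows, the template is a square patch taken from the first frame, and each later frame is searched exhaustively in a small window around the previous position, keeping the position with the highest NCC score. Production trackers add sub-pixel refinement, template updating and failure detection; all helper names here are hypothetical.

```python
import math

# Sketch only: exhaustive NCC template matching (hypothetical helpers, not a real API).


def patch(img, cx, cy, r):
    """Flattened (2r+1) x (2r+1) patch centred on (cx, cy); img is a list of rows."""
    return [img[y][x] for y in range(cy - r, cy + r + 1)
            for x in range(cx - r, cx + r + 1)]


def ncc(a, b):
    """Normalized cross-correlation of two equal-length patches, in [-1, 1]."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [v - ma for v in a]
    db = [v - mb for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    return sum(p * q for p, q in zip(da, db)) / denom if denom else 0.0


def track_feature(frames, x, y, r=1, search=2):
    """Follow the patch around (x, y) in frames[0]; returns one (x, y) per frame."""
    template = patch(frames[0], x, y, r)
    track = [(x, y)]
    for img in frames[1:]:
        h, w = len(img), len(img[0])
        px, py = track[-1]
        best = None
        for cy in range(max(r, py - search), min(h - r, py + search + 1)):
            for cx in range(max(r, px - search), min(w - r, px + search + 1)):
                score = ncc(template, patch(img, cx, cy, r))
                if best is None or score > best[0]:
                    best = (score, cx, cy)
        track.append((best[1], best[2]))
    return track


def frame_with_blob(bx, by, size=8):
    """Hypothetical test frame: a bright plus-shaped feature on a black image."""
    img = [[0] * size for _ in range(size)]
    img[by][bx] = 9
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        img[by + dy][bx + dx] = 5
    return img


frames = [frame_with_blob(3, 3), frame_with_blob(4, 3), frame_with_blob(5, 4)]
print(track_feature(frames, 3, 3))  # [(3, 3), (4, 3), (5, 4)]
```

The returned list is exactly what the text calls a track: one 2D position per frame for a single feature.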

Calibration

The second step involves solving for 3D motion. This process attempts to derive the motion of the camera by solving the inverse projection of the 2D paths for the position of the camera. This process is referred to as calibration.

To explain further: when a point on the surface of a three-dimensional object is photographed, its position in the 2D frame can be calculated by a 3D projection function. We can consider a camera to be an abstraction that holds all the parameters necessary to model a camera in a real or virtual world. Therefore, a camera is a vector that includes as its elements the position of the camera, its orientation, its focal length, and other possible parameters that define how the camera focuses light onto the film plane. Exactly how this vector is constructed is not important as long as there is a compatible projection function P.

The projection function P takes as its input a camera vector (denoted camera) and another vector, the position of a 3D point in space (denoted xyz), and returns a 2D point that has been projected onto a plane in front of the camera (denoted XY). We can express this:

$$XY = P(\text{camera}, xyz)$$
An illustration of feature projection. Around the rendering of a 3D structure, red dots represent points that are chosen by the tracking process. Cameras at frame i and j project the view onto a plane depending on the parameters of the camera. In this way features tracked in 2D correspond to real points in a 3D space. Though this illustration is computer generated, match moving is normally done on real objects.
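A minimal sketch of P, assuming a pinhole camera whose vector holds only a position, a pan angle about the vertical axis and a focal length (no tilt, roll, lens distortion or film-back offset). The `Camera` class and `project` function are illustrative names, not any particular program's API.

```python
import math
from dataclasses import dataclass

# Sketch only: a hypothetical, deliberately small camera vector.


@dataclass
class Camera:
    x: float
    y: float
    z: float
    pan: float    # rotation about the vertical (Y) axis, in radians
    focal: float  # focal length, in image-plane units


def project(camera, xyz):
    """P(camera, xyz): pinhole projection of a 3D point onto the image plane."""
    # Translate the point into camera-centred coordinates.
    dx, dy, dz = xyz[0] - camera.x, xyz[1] - camera.y, xyz[2] - camera.z
    # Undo the camera's pan so the camera looks down its own +Z axis.
    c, s = math.cos(camera.pan), math.sin(camera.pan)
    cx = c * dx - s * dz
    cz = s * dx + c * dz
    if cz <= 0:
        raise ValueError("point is behind the camera")
    # Perspective divide: this is where the depth information is discarded.
    return (camera.focal * cx / cz, camera.focal * dy / cz)


cam = Camera(x=0.0, y=0.0, z=0.0, pan=0.0, focal=35.0)
print(project(cam, (1.0, 2.0, 10.0)))  # (3.5, 7.0)
print(project(cam, (2.0, 4.0, 20.0)))  # (3.5, 7.0): a farther point on the same ray
```

The two calls show why P cannot simply be inverted: different points along one line from the camera centre land on the same XY.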

The projection function transforms the 3D point and strips away its depth component. Without knowing the depth, an inverse projection function can only return a set of possible 3D points that form a line emanating from the center of the camera and passing through the projected 2D point. We can express the inverse projection as:

$$xyz \in P'(\text{camera}, XY)$$

or

$$P'(\text{camera}, XY) = \{\, xyz : P(\text{camera}, xyz) = XY \,\}$$
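The inverse projection can be sketched as a ray parameterized by distance. For brevity this assumes an unrotated pinhole camera written as a hypothetical tuple (cx, cy, cz, f) looking down +Z; each parameter t > 0 picks out one member of P'(camera, XY), and every such member projects back to the same XY.

```python
# Sketch only: unrotated pinhole camera (cx, cy, cz, f) looking down +Z (hypothetical).


def project(camera, xyz):
    """P(camera, xyz) for the simplified camera."""
    cx, cy, cz, f = camera
    x, y, z = xyz
    return (f * (x - cx) / (z - cz), f * (y - cy) / (z - cz))


def inverse_project(camera, XY, t):
    """One member of P'(camera, XY): the point at parameter t > 0 along the ray.

    t = 1 is the point on the image plane itself; larger t lies farther away.
    """
    cx, cy, cz, f = camera
    X, Y = XY
    return (cx + t * X, cy + t * Y, cz + t * f)


cam = (0.0, 0.0, 0.0, 35.0)
XY = (3.5, 7.0)
for t in (1.0, 2.0, 5.0):
    xyz = inverse_project(cam, XY, t)
    print(xyz, project(cam, xyz))  # different 3D points, same projected XY
```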

Let's say we are in a situation where the features we are tracking are on the surface of a rigid object such as a building. Since we know that the real point xyz will remain in the same place in real space from one frame of the image to the next, we can treat the point as a constant even though we do not know where it is. So:

$$xyz_i = xyz_j$$

where the subscripts i and j refer to arbitrary frames in the shot we are analyzing. Since this always holds, it follows that:

$$P'(\text{camera}_i, XY_i) \cap P'(\text{camera}_j, XY_j) \neq \emptyset$$

Because the tracking program has determined the value of $XY_i$ for every frame through which the feature is tracked, we can solve the inverse projection function between any two frames as long as $P'(\text{camera}_i, XY_i) \cap P'(\text{camera}_j, XY_j)$ is a small set. We denote the set of possible camera vector pairs that solve the equation at frames i and j as $C_{ij}$:

$$C_{ij} = \{\, (\text{camera}_i, \text{camera}_j) : P'(\text{camera}_i, XY_i) \cap P'(\text{camera}_j, XY_j) \neq \emptyset \,\}$$

So there is a set of camera vector pairs $C_{ij}$ for which the intersection of the inverse projections of the two points $XY_i$ and $XY_j$ is a non-empty, hopefully small, set centered on a theoretical stationary point xyz.

In other words, imagine a black point floating in a white void, and a camera. For any position in space at which we place the camera, there is a set of corresponding parameters (orientation, focal length, etc.) that will photograph that black point in exactly the same way. Since $C_{ij}$ has an infinite number of members, one point is never enough to determine the actual camera position.
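Membership in $C_{ij}$ can be sketched by measuring how close the two inverse-projection lines come to each other: for a candidate camera pair, the lines through $XY_i$ and $XY_j$ either meet (a gap of zero) or miss. This sketch assumes unrotated pinhole cameras written as hypothetical tuples (cx, cy, cz, f) and treats the rays as infinite lines. The last call illustrates the point made above: with a single feature, a wrong camera pair can still fit perfectly.

```python
import math

# Sketch only: unrotated pinhole cameras (cx, cy, cz, f) looking down +Z (hypothetical).


def project(camera, xyz):
    """P(camera, xyz) for the simplified camera."""
    cx, cy, cz, f = camera
    x, y, z = xyz
    return (f * (x - cx) / (z - cz), f * (y - cy) / (z - cz))


def ray(camera, XY):
    """Origin and direction of the line P'(camera, XY)."""
    cx, cy, cz, f = camera
    return (cx, cy, cz), (XY[0], XY[1], f)


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def length(v):
    return math.sqrt(sum(c * c for c in v))


def ray_gap(cam_i, XY_i, cam_j, XY_j):
    """Shortest distance between the two inverse-projection lines.

    A gap of (nearly) zero means the lines meet, i.e. (cam_i, cam_j) is in C_ij.
    A full test would also check that the meeting point is in front of both cameras.
    """
    (o1, d1), (o2, d2) = ray(cam_i, XY_i), ray(cam_j, XY_j)
    w = tuple(b - a for a, b in zip(o1, o2))
    n = cross(d1, d2)
    if length(n) == 0.0:  # parallel lines: distance from o2 to the first line
        return length(cross(w, d1)) / length(d1)
    return abs(sum(p * q for p, q in zip(w, n))) / length(n)


xyz = (1.0, 2.0, 10.0)  # the real point, unknown to the solver
cam_i, cam_j = (0.0, 0.0, 0.0, 35.0), (2.0, 0.0, 0.0, 35.0)
XY_i, XY_j = project(cam_i, xyz), project(cam_j, xyz)

print(ray_gap(cam_i, XY_i, cam_j, XY_j))                  # 0.0: the true pair is in C_ij
print(ray_gap(cam_i, XY_i, (2.0, 1.0, 0.0, 35.0), XY_j))  # > 0: this pair is not
print(ray_gap(cam_i, XY_i, (4.0, 0.0, 0.0, 35.0), XY_j))  # 0.0: a wrong pair that still fits
```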

As we start adding tracking points, we can narrow the possible camera positions. For example, suppose we have a set of points $\{xyz_{i,0}, \ldots, xyz_{i,n}\}$ and $\{xyz_{j,0}, \ldots, xyz_{j,n}\}$, where i and j still refer to frames and the second subscript indexes the many tracking points we are following. From these we can derive a set of camera vector pair sets $\{C_{i,j,0}, \ldots, C_{i,j,n}\}$.

In this way multiple tracks allow us to narrow the possible camera parameters. The set of possible camera parameters that fit, F, is the intersection of all sets:

$$F = C_{i,j,0} \cap \cdots \cap C_{i,j,n}$$

The fewer elements in this set, the closer we can come to extracting the actual parameters of the camera. In reality, errors introduced by the tracking process require a more statistical approach to determining a good camera vector for each frame; optimization algorithms and bundle block adjustment are often used. Unfortunately, a camera vector has so many elements that, when every parameter is free, we still might not be able to narrow F down to a single possibility no matter how many features we track. The more we can restrict the various parameters, especially focal length, the easier it becomes to pinpoint the solution.
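A heavily simplified sketch of the statistical idea, assuming the 3D positions of the features are already known (for instance from on-set measurements) and that every camera parameter except its x position is held fixed. Under those assumptions, minimizing the squared reprojection error has a closed form (a weighted mean), so many noisy tracks average out. This is not bundle adjustment: real solvers refine all camera parameters and 3D points together with iterative nonlinear optimization. All names and data are hypothetical.

```python
import math
import random

# Sketch only: known 3D points, camera fixed except for its x position (hypothetical).


def project(camera, xyz):
    """Unrotated pinhole camera (cx, cy, cz, f) looking down +Z."""
    cx, cy, cz, f = camera
    x, y, z = xyz
    return (f * (x - cx) / (z - cz), f * (y - cy) / (z - cz))


def rms_reprojection_error(camera, points, observations):
    """Root-mean-square distance between observed and re-projected features."""
    total = 0.0
    for xyz, (X, Y) in zip(points, observations):
        px, py = project(camera, xyz)
        total += (px - X) ** 2 + (py - Y) ** 2
    return math.sqrt(total / len(points))


def solve_camera_x(points, observations, f):
    """Camera x position minimizing the squared reprojection error.

    With the camera at y = z = 0, each observation gives an estimate
    e = x - X * z / f and an image-space residual (f / z) * (cx - e), so the
    minimizer is the mean of the estimates weighted by (f / z) ** 2.
    """
    num = den = 0.0
    for (x, _, z), (X, _) in zip(points, observations):
        w = (f / z) ** 2
        num += w * (x - X * z / f)
        den += w
    return num / den


random.seed(1)
true_camera = (0.5, 0.0, 0.0, 35.0)
points = [(random.uniform(-5, 5), random.uniform(-5, 5), random.uniform(5, 20))
          for _ in range(200)]
observations = [tuple(v + random.gauss(0, 0.05) for v in project(true_camera, p))
                for p in points]
cx = solve_camera_x(points, observations, f=35.0)
print(cx, rms_reprojection_error((cx, 0.0, 0.0, 35.0), points, observations))
```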

In all, the 3D solving process is the process of narrowing down the possible solutions to the motion of the camera until we reach one that suits the needs of the composite we are trying to create.

Point Cloud Projection

Once the camera position has been determined for every frame, it is then possible to estimate the position of each feature in real space by inverse projection. The resulting set of points is often referred to as a point cloud because of its raw, nebula-like appearance. Since point clouds often reveal some of the shape of the 3D scene, they can be used as a reference for placing synthetic objects or by a reconstruction program to create a 3D version of the actual scene.
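Estimating a feature's position from two solved cameras can be sketched with the midpoint method: find the closest points on the two inverse-projection lines and take their midpoint. The midpoint method is one simple choice swapped in for illustration; the cameras are hypothetical unrotated pinholes (cx, cy, cz, f), and the "true" features exist here only to generate test tracks.

```python
# Sketch only: midpoint triangulation with unrotated pinhole cameras (hypothetical).


def project(camera, xyz):
    """P(camera, xyz) for an unrotated camera (cx, cy, cz, f) looking down +Z."""
    cx, cy, cz, f = camera
    x, y, z = xyz
    return (f * (x - cx) / (z - cz), f * (y - cy) / (z - cz))


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def triangulate(cam_i, XY_i, cam_j, XY_j):
    """Midpoint of the shortest segment between the two inverse-projection lines."""
    o1, d1 = cam_i[:3], (XY_i[0], XY_i[1], cam_i[3])
    o2, d2 = cam_j[:3], (XY_j[0], XY_j[1], cam_j[3])
    w0 = tuple(p - q for p, q in zip(o1, o2))
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel: no parallax to triangulate from")
    s = (b * e - c * d) / denom  # parameter of the closest point on line 1
    t = (a * e - b * d) / denom  # parameter of the closest point on line 2
    p1 = [o + s * v for o, v in zip(o1, d1)]
    p2 = [o + t * v for o, v in zip(o2, d2)]
    return tuple((u + v) / 2 for u, v in zip(p1, p2))


cam_i, cam_j = (0.0, 0.0, 0.0, 35.0), (2.0, 0.0, 0.0, 35.0)
features = [(1.0, 2.0, 10.0), (-3.0, 0.5, 15.0)]  # unknown in practice
tracks = [(project(cam_i, p), project(cam_j, p)) for p in features]
cloud = [triangulate(cam_i, a, cam_j, b) for a, b in tracks]
print(cloud)  # approximately the original feature positions
```

With noisy tracks the two lines no longer meet exactly, and the midpoint becomes a compromise between them.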

Ground Plane Determination

The camera and point cloud need to be oriented in some kind of space. Therefore, once calibration is complete, it is necessary to define a ground plane. Normally, this is a unit plane that determines the scale, orientation and origin of the projected space. Some programs attempt to do this automatically, though more often the user defines this plane. Since shifting the ground plane applies a simple transformation to all of the points, the actual position of the plane is really a matter of convenience.
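Defining a ground plane from three user-picked points can be sketched as building an orthonormal frame and re-expressing every point in it. This assumes a hypothetical convention: the first point becomes the origin, the direction to the second becomes +X, the plane normal becomes +Y (up), and an optional measured distance between the first two points sets the scale. The helper names are illustrative only.

```python
import math

# Sketch only: ground plane from three picked points (hypothetical convention).


def sub(a, b):
    return tuple(p - q for p, q in zip(a, b))


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    n = math.sqrt(dot(v, v))
    return tuple(c / n for c in v)


def ground_plane_transform(p0, p1, p2, real_distance=None):
    """Return a function mapping solver-space points into ground-plane space."""
    x_axis = normalize(sub(p1, p0))
    # Which side counts as "up" depends on where p2 lies relative to p0 -> p1.
    y_axis = normalize(cross(sub(p2, p0), x_axis))
    z_axis = cross(x_axis, y_axis)
    scale = 1.0
    if real_distance is not None:
        scale = real_distance / math.sqrt(dot(sub(p1, p0), sub(p1, p0)))

    def to_ground_space(p):
        d = sub(p, p0)
        return tuple(scale * dot(d, axis) for axis in (x_axis, y_axis, z_axis))

    return to_ground_space


# Hypothetical solver output: the ground is tilted 45 degrees, and the two
# marked features were measured on set to be 2 metres apart.
p0, p1, p2 = (0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 0.0, 1.0)
to_ground = ground_plane_transform(p0, p1, p2, real_distance=2.0)
print(to_ground(p2))               # on the ground: height (Y) is 0
print(to_ground((0.0, 1.0, 0.0)))  # a point above the ground
```

Because the mapping is a rotation, translation and uniform scale, it changes no relationships within the point cloud, which is why the choice of plane is a matter of convenience.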

Reconstruction

Reconstruction is the interactive process of recreating a photographed object using tracking data. This technique is related to photogrammetry. In this particular case we are referring to using match moving software to reconstruct a scene from incidental footage.

A reconstruction program can create three-dimensional objects that mimic the real objects from the photographed scene. Using data from the point cloud and the user's estimation, the program can create a virtual object and then extract a texture from the footage that can be projected onto the virtual object as a surface texture.
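The texture-extraction step can be sketched by projecting each vertex of a reconstructed object through the solved camera for one frame: the projected position says where in that frame the vertex's texture should be sampled. This assumes an unrotated pinhole camera (cx, cy, cz, f) with f in pixels, image-plane coordinates measured from the image centre with +Y up, and u/v running left to right and bottom to top; all of these conventions and names are hypothetical.

```python
# Sketch only: per-vertex texture coordinates from one solved frame (hypothetical conventions).


def project(camera, xyz):
    """Unrotated pinhole camera (cx, cy, cz, f), with f measured in pixels."""
    cx, cy, cz, f = camera
    x, y, z = xyz
    return (f * (x - cx) / (z - cz), f * (y - cy) / (z - cz))


def texture_coords(camera, vertices, width, height):
    """(u, v) in [0, 1] for each vertex, or None if it falls outside the frame."""
    uvs = []
    for xyz in vertices:
        X, Y = project(camera, xyz)
        u, v = 0.5 + X / width, 0.5 + Y / height
        uvs.append((u, v) if 0.0 <= u <= 1.0 and 0.0 <= v <= 1.0 else None)
    return uvs


camera = (0.0, 0.0, 0.0, 1000.0)
# A reconstructed 2 x 2 wall panel 10 units away, plus one vertex outside the frame.
vertices = [(-1.0, -1.0, 10.0), (1.0, -1.0, 10.0), (1.0, 1.0, 10.0),
            (-1.0, 1.0, 10.0), (20.0, 0.0, 10.0)]
uvs = texture_coords(camera, vertices, width=1920, height=1080)
print(uvs)
```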

Automatic vs. Interactive Tracking

There are two methods by which motion information can be extracted from an image. Interactive tracking relies on the user to follow features through a scene. The points tracked by the user are then used to calculate the camera movement. Automatic tracking relies on computer algorithms to identify and track features through a shot.

The advantage of interactive tracking is that a human user can follow features through an entire scene and will not be confused by features that are not rigid. The disadvantage is that the user will inevitably introduce small errors as they follow objects through the scene, which can lead to drift.

The advantage of automatic tracking is that the computer can create many more points than a human can. A large number of points can be analyzed with statistics to determine the most reliable data. The disadvantage of automatic tracking is that, depending on the algorithm, the computer can be easily confused as it tracks objects through the scene.
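One way to use statistics to find the most reliable automatic tracks is sketched below, assuming each track already has an error score such as its RMS reprojection error in pixels. The rule here (reject tracks above the median plus k times the median absolute deviation) is one common robust statistic swapped in for illustration, not necessarily what any particular program does; the track ids and values are hypothetical.

```python
import statistics

# Sketch only: robust outlier rejection with median + k * MAD (hypothetical data).


def reliable_tracks(errors, k=3.0):
    """Track ids whose error is not far above the typical error.

    errors maps a track id to its RMS reprojection error (pixels).
    """
    values = list(errors.values())
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    limit = med + k * mad
    return sorted(t for t, e in errors.items() if e <= limit)


# t5 might be a feature that drifted onto a moving actor.
errors = {"t1": 0.4, "t2": 0.5, "t3": 0.45, "t4": 0.6, "t5": 4.2, "t6": 0.55}
print(reliable_tracks(errors))  # ['t1', 't2', 't3', 't4', 't6']
```

The median and MAD are used instead of the mean and standard deviation because a few badly wrong tracks would inflate the latter.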

Professional-level motion tracking is usually achieved using a combination of interactive and automatic techniques. An artist can remove points that are clearly anomalous and use tracking mattes to block confusing information out of the automatic tracking process.

Tracking Mattes

A tracking matte is similar in concept to a garbage matte used in traveling matte compositing. However, the purpose of a tracking matte is to prevent tracking algorithms from using unreliable, irrelevant or non-rigid tracking points. For example, in a scene where an actor walks in front of a background, the tracking artist will want to use only the background to track the camera through the scene, knowing that motion of the actor will throw off the calculations. In this case, the artist will construct a tracking matte to follow the actor through the scene, blocking that information from the tracking process.
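The effect of a tracking matte can be sketched as a filter on feature observations. Here the matte is simplified to one axis-aligned rectangle per frame (a hypothetical stand-in for an animated roto shape). Observations inside it are removed, and tracks left with too few frames are dropped.

```python
# Sketch only: per-frame rectangular mattes stand in for real roto shapes (hypothetical).


def inside(rect, point):
    x0, y0, x1, y1 = rect
    x, y = point
    return x0 <= x <= x1 and y0 <= y <= y1


def apply_tracking_matte(tracks, matte, min_frames=2):
    """Remove feature observations covered by the matte.

    tracks: {track_id: {frame: (x, y)}}; matte: {frame: (x0, y0, x1, y1)}.
    Tracks left with fewer than min_frames observations are dropped entirely.
    """
    kept = {}
    for tid, obs in tracks.items():
        visible = {f: p for f, p in obs.items()
                   if f not in matte or not inside(matte[f], p)}
        if len(visible) >= min_frames:
            kept[tid] = visible
    return kept


# A matte following an actor who walks to the right across three frames.
matte = {0: (40, 0, 60, 100), 1: (45, 0, 65, 100), 2: (50, 0, 70, 100)}
tracks = {
    "wall_corner": {0: (10, 20), 1: (11, 20), 2: (12, 20)},  # never covered
    "actor_shirt": {0: (50, 50), 1: (55, 50), 2: (60, 50)},  # always covered
    "sign": {0: (68, 30), 1: (68, 30), 2: (68, 30)},         # covered in frame 2
}
print(apply_tracking_matte(tracks, matte))
```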

Refining

Since there are often multiple possible solutions to the calibration process and a significant amount of error can accumulate, the final step in match moving often involves refining the solution by hand. This could mean altering the camera motion itself or giving hints to the calibration mechanism. This interactive calibration is referred to as refining.

Most match moving applications appear to be based on similar algorithms for tracking and calibration, and the initial results they produce are often similar. However, each program seems to have different refining capabilities. Therefore, when choosing software, look closely at the refining process.

Hardware Approaches

A hardware approach is required in applications where a character needs to interact with the CG composited environment, where a combination of pan and zoom makes the camera path ambiguous, where the camera placement must be more precise than can be obtained by processing a moving video image, or where the components of the image are not fixed with respect to each other. In these cases, visible or infrared LEDs can be fixed to objects such as props, parts of the set and the camera, and an optical tracking system can be used to track the cameras, actors and props.

This method is preferred when motion tracking hardware is already required for tracking actors or props; otherwise, the software approaches work quite well and do not require any hardware. Active marker systems such as PhaseSpace [1] allow markers to be embedded in props and objects and provide real-time input about the relative coordinate systems, allowing complex interactions. Embedded processors modulate the output of each LED to differentiate the markers, so that hundreds of objects can be tracked.

Tips for Match Moving Cinematography

  1. Record everything.
    As in all visual effects photography, keep notes on every aspect of the shot. This will help in estimation.
    • Record the focal length.
    • Know the size of the film gate/CCD.
    • Measure the height of the lens from the ground plane.
    • Measure the distance from the nodal point to obvious features.
    • Measure the distance between obvious features.
    • Measure the distance between start and stop positions of a camera move.
  2. Create features where good ones don't exist.
    Avoid large surfaces that have very little or very repetitive texture. Make marks or add objects to the scene that will be easy to track, and plan on painting them out during compositing. Use colored balls, colored dots, or a grid of dots on a blue/green screen. Spheres work best because their center is easy to determine from any angle.
  3. Constrain as many parameters as possible.
    The fewer parameters that are variable the easier it will be to solve for the motion of the camera.
    • Use a constant focal length. Don't zoom.
    • Stay on a tripod. Of course you won't, but it will be easier.
    • Tilt, pan, and Dutch around the nodal point. This is the theoretical point around which the camera can rotate without shifting the perspective. Even in a dolly shot this will greatly simplify the overall motion of the camera.
    • Stay on a dolly. Some calibration algorithms can constrain the 3D motion solution to a straight line or curve.
  4. Include lateral motion in the shot.
    If you dolly sideways or boom, you will introduce parallax into the scene. This can add accuracy to calibration and point cloud projection.
  5. Avoid motion blur.
    Motion blur can add error to the perceived location of features. Large amounts of motion blur can create gaps in the tracking process where no tracks overlap. Keep handheld shots as steady as possible and use a wide lens.
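Recording both the focal length and the film gate/CCD size matters because together they fix the camera's angle of view, which is what the projection function actually depends on. A minimal sketch using the standard pinhole relation, with hypothetical example values (a 36 mm wide gate, a 50 mm lens, a 1920-pixel-wide image) and the assumption that the full gate width maps to the full image width:

```python
import math

# Sketch only: pinhole relations; the gate is assumed to fill the image (no cropping).


def angle_of_view(focal_mm, gate_mm):
    """Angle of view in degrees across a gate/sensor dimension of gate_mm."""
    return math.degrees(2 * math.atan(gate_mm / (2 * focal_mm)))


def focal_length_pixels(focal_mm, gate_mm, image_px):
    """Focal length in pixels when the gate dimension spans image_px pixels."""
    return focal_mm * image_px / gate_mm


print(round(angle_of_view(50.0, 36.0), 1))          # 39.6
print(round(focal_length_pixels(50.0, 36.0, 1920)))  # 2667
```

Without the gate size, the same 50 mm focal length could correspond to very different angles of view, which is one more free parameter for the solver.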

External links

Software

  • SynthEyes: camera tracker providing widely used tools for camera tracking, motion capture, object building, object tracking, camera+object tracking, multiple-shot tracking, tripod (2.5-D) tracking, zooms, lens distortion, light solving, RAM playback, rotoscoped object separation and more.
  • PFTrack: image analysis and match moving tools for the film effects industry.
  • PFHoe: an extremely cost-effective DV tracking application for Mac and Windows, aimed at DV hobbyists and budding CG artists. For the first time, those wanting to learn about match moving, or to experiment with matching computer graphics to video footage, can use cutting-edge tools without a price tag to match.

Books

Matchmoving: The Invisible Art of Camera Tracking, by Tim Dobbert, Sybex, Feb 2005, ISBN 0782144039
