Free viewpoint television
From Wikipedia, the free encyclopedia
Free viewpoint television (FTV) is a system for viewing natural video that lets the user interactively control the viewpoint and generate new views of a dynamic scene from any 3D position. The equivalent system for synthetic video is known as virtual reality.
Systems for rendering arbitrary views of natural scenes have long been known in the computer vision community, but only in recent years have their speed and quality reached levels that make them worth serious consideration as end-user systems.
Professor Masayuki Tanimoto of Nagoya University (Japan) has done much to promote the term "free viewpoint television" and has published many papers on the ray space representation, although other techniques can be, and are, used for FTV.
FTV represents a revolutionary development in television watching as the focus of attention can be controlled by the viewers rather than a director. It is possible that each viewer may be observing a unique viewpoint. It remains to be seen how FTV will affect television watching as a group activity.
Capture of FTV
To acquire the views needed for high-quality rendering of the scene from any angle, several cameras are placed around the scene, either in a studio environment or at an outdoor venue such as a sporting arena. The resulting multiview video (MVV) must then be packaged so that the data can be compressed and so that the user's viewing device can easily access the views it needs to interpolate new ones.
Multi Camera Alignment
It is not enough simply to place cameras around the scene to be captured. The geometry of the camera setup must be measured by a process known in computer vision as camera calibration. Precise manual alignment would be too cumbersome, so typically a 'best effort' alignment is performed first, and a test pattern is then captured and used to generate the calibration parameters.
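As a rough illustration of what calibration parameters describe, the sketch below projects known test-pattern points through a simple pinhole camera model and measures how well a candidate set of parameters explains the observed pixels. The parameter names (focal, cx, cy, yaw, pos) and the single-axis rotation are assumptions made for brevity; a real calibration estimates a full rotation, lens distortion and more, usually by minimising exactly this kind of reprojection error.

```python
import math

def project(point, cam):
    """Project a 3D world point to pixel coordinates with a pinhole model.

    `cam` is a hypothetical parameter dict: focal length in pixels ("focal"),
    principal point ("cx", "cy"), rotation about the vertical axis in radians
    ("yaw") and camera position in world coordinates ("pos").
    """
    dx = point[0] - cam["pos"][0]
    dy = point[1] - cam["pos"][1]
    dz = point[2] - cam["pos"][2]
    c, s = math.cos(cam["yaw"]), math.sin(cam["yaw"])
    # World-to-camera rotation for a camera yawed about the vertical (y) axis.
    xc = c * dx - s * dz
    yc = dy
    zc = s * dx + c * dz
    if zc <= 0:
        raise ValueError("point is behind the camera")
    return (cam["focal"] * xc / zc + cam["cx"],
            cam["focal"] * yc / zc + cam["cy"])

def reprojection_rms(points, pixels, cam):
    """Root-mean-square pixel distance between observed and projected points."""
    total = 0.0
    for point, (u_obs, v_obs) in zip(points, pixels):
        u, v = project(point, cam)
        total += (u - u_obs) ** 2 + (v - v_obs) ** 2
    return math.sqrt(total / len(points))
```

A calibration routine would adjust the parameters until reprojection_rms over all test-pattern points is as small as possible; a 'best effort' physical alignment simply gives that search a good starting point.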
3D Display
Multiview video capture varies from partial (usually about 30 degrees) to complete (360 degrees) coverage of the scene. Because several views of the same scene are available, it is also possible to output stereoscopic views suitable for a 3D display or other 3D viewing methods.
Display of FTV
Systems with more physical cameras can capture more of the viewable scene, but some regions are likely to remain occluded from every camera. A larger number of cameras should also make higher-quality output possible, because each synthesized view lies closer to a real camera and less interpolation is needed.
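The relationship between camera count and interpolation can be sketched with a deliberately simplified model in which cameras sit at known angles along an arc and a virtual view is formed from its two nearest neighbours. The function names and the linear weighting are illustrative assumptions; practical view synthesis also uses depth or disparity information rather than a plain blend.

```python
from bisect import bisect_right

def blend_weights(camera_angles, view_angle):
    """Pick the two cameras bracketing `view_angle` and their blend weights.

    `camera_angles` must be sorted in increasing order (degrees).
    Returns ((left_index, left_weight), (right_index, right_weight)).
    """
    if not camera_angles[0] <= view_angle <= camera_angles[-1]:
        raise ValueError("requested view lies outside the captured arc")
    right = bisect_right(camera_angles, view_angle)
    if right == len(camera_angles):
        # The view coincides with the last camera: no interpolation needed.
        return (right - 1, 1.0), (right - 1, 0.0)
    left = right - 1
    span = camera_angles[right] - camera_angles[left]
    w_right = (view_angle - camera_angles[left]) / span
    return (left, 1.0 - w_right), (right, w_right)

def worst_case_offset(camera_angles):
    """Largest angular distance from any virtual view to its nearest camera."""
    gaps = (b - a for a, b in zip(camera_angles, camera_angles[1:]))
    return max(gaps) / 2.0
```

Under this model, four cameras spread evenly over a 30 degree arc leave a virtual view at most 5 degrees from a real camera, while seven cameras over the same arc halve that to 2.5 degrees.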
More cameras mean that efficient coding of the multiview video is required. This may not be a serious disadvantage, as there are representations that remove the redundancy between views in MVV, such as inter-view coding using MPEG-4, the ray space representation, and geometry videos.
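The redundancy between neighbouring views can be illustrated with a toy example on a single scanline: one view is predicted from its neighbour by a horizontal shift (a disparity), and only the prediction error is kept. This is a sketch of the general idea of disparity-compensated prediction under simplifying assumptions (equal-length scanlines, one integer shift for the whole line); it is not the MPEG-4 toolset, and the function names are made up for illustration.

```python
def best_disparity(reference, target, max_shift):
    """Find the integer shift of `reference` that best predicts `target`.

    Cost is the mean absolute difference over the overlapping samples,
    comparing target[i] with reference[i + shift].
    """
    best_shift, best_cost = 0, float("inf")
    for shift in range(max_shift + 1):
        overlap = len(target) - shift
        if overlap <= 0:
            break
        cost = sum(abs(target[i] - reference[i + shift])
                   for i in range(overlap)) / overlap
        if cost < best_cost:
            best_shift, best_cost = shift, cost
    return best_shift

def inter_view_residual(reference, target, shift):
    """Residual of `target` after prediction from the shifted `reference`.

    Samples with no counterpart in the reference are predicted as 0,
    so they are effectively stored as-is.
    """
    residual = []
    for i, sample in enumerate(target):
        j = i + shift
        prediction = reference[j] if j < len(reference) else 0
        residual.append(sample - prediction)
    return residual

# Two neighbouring views of the same bright object, offset by two samples.
left_view = [10, 10, 50, 90, 90, 50, 10, 10, 10, 10]
right_view = [50, 90, 90, 50, 10, 10, 10, 10, 10, 10]
shift = best_disparity(left_view, right_view, max_shift=4)
residual = inter_view_residual(left_view, right_view, shift)
```

Here the residual is zero wherever the shifted neighbour predicts the view exactly, so only the few unpredicted samples at the edge carry any energy, which is what makes the second view cheap to code.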
In terms of hardware, the user requires a viewing device that can decode MVV and synthesize new viewpoints, and a 2D or 3D display.
Standardization
MPEG is currently investigating Multiview Video Coding under a group called '3DAV' (3D Audio and Visual), headed by Aljoscha Smolic [1] of the Heinrich-Hertz Institute. This activity falls under ISO/IEC JTC1/SC29/WG11 [2] and is expected to be adopted as part of MPEG-4 when finished. The key technology to be standardized is the specification of the view synthesis engine.
Conclusions
Free viewpoint television is an upcoming video technology that allows users to freely select the viewpoint. FTV is the result of many advances in computer vision, so it is reasonable to predict that it could be available soon if the consumer market is ready for it. The current standardization activity is a strong indication that sufficient interest exists, and the finished standard should act as a springboard for adoption.
See Also
- QuickTime VR might be considered a predecessor to FTV.
- iview is a British DTI project between the BBC, Snell & Wilcox and the University of Surrey to develop an FTV system.
- Eye Vision is a system developed by Professor Takeo Kanade at CMU for CBS's coverage of Super Bowl XXXV. The viewer cannot change the viewpoint, but the camera operator can choose any virtual viewpoint by synthesizing images from an active vision system.