A new format for optimized representation of moving HDTV pictures (as of 1993)

September 1993, by Th. Herfet, B. Wendland - Lehrstuhl Nachrichtentechnik - Universität Dortmund - Contribution to the 1993 ISBT - Beijing, P.R. China

On motion detection directly in the camera for the purpose of optimal picture compression.

Abstract

The paper presents a new representation of moving HDTV pictures. In addition to the picture content, supplementary information is transmitted to generate synthetic pictures at any desired temporal position. Besides the optimization of the camera, a format called MCTV (motion controlled television) is introduced to attain direct compatibility with existing studio equipment as well as with 35mm film. The integration of motion information furthermore supports highly sophisticated signal processing techniques such as format and scan conversion, noise reduction or slow motion.

I. Introduction

HDTV systems have been introduced all over the world: the Japanese MUSE signal [Ono90] can be received several hours a day, the European HDMAC system [Vre89] was used for the 1992 Olympic Games at around 600 viewing sites, and the FCC is about to test several proposals for the digital transmission of HDTV [Moe91] in the USA.

The term HDTV in the above sense refers directly to the underlying production standards. Their parameters range from 787.5 lines and 1280 pels to 1250 lines and 1920 pels. The temporal sampling rate varies between 60, 59.94 and 50 Hz.

From this we can derive the need for high quality format and scan conversion to make as many HDTV productions as possible available. Keeping in mind that, due to the high cost of HDTV displays, most of the high resolution material will be presented via 35mm film at least during the introductory phase, the tape to film transfer should be supported as well.

II. Motion vector based signal processing

To avoid judder, most of the proposed systems for the conversion between different HDTV formats generate the correct temporal field position with the help of motion vector based picture interpolation (fig. 1, fig. 2).
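The principle of motion vector based picture interpolation can be illustrated with a minimal sketch. It assumes a single global, integer-valued displacement and cyclic picture borders, neither of which is part of the systems referred to above; the function name `interpolate_frame` and its parameters are chosen here for illustration only.

```python
import numpy as np

def interpolate_frame(prev, nxt, v, alpha):
    """Motion-compensated picture at the fractional time alpha in [0, 1].

    prev, nxt : 2-D luminance arrays at times 0 and 1
    v         : (dy, dx) integer displacement from prev to nxt (assumed global)
    alpha     : temporal position of the wanted picture
    """
    dy, dx = v
    # Displacement accumulated along the trajectory up to time alpha.
    fy, fx = int(round(alpha * dy)), int(round(alpha * dx))
    # Shift prev forward and nxt backward along the same trajectory.
    from_prev = np.roll(prev, (fy, fx), axis=(0, 1))
    from_next = np.roll(nxt, (fy - dy, fx - dx), axis=(0, 1))
    # Weight both sources by their temporal distance to the target.
    return (1.0 - alpha) * from_prev + alpha * from_next

# A small object moving 4 pels to the right between two pictures.
prev = np.zeros((8, 16))
prev[3:5, 2:4] = 1.0
nxt = np.roll(prev, (0, 4), axis=(0, 1))

mid = interpolate_frame(prev, nxt, (0, 4), 0.5)   # object halfway along its path
blend = 0.5 * (prev + nxt)                        # plain averaging, for comparison
```

In `mid` the object appears once, halfway along its trajectory, whereas the plain average `blend` shows two half-intensity copies at the original positions, which is the kind of artefact the motion vector based interpolation is meant to avoid.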

Several investigations into the characteristics of the human visual system have been carried out. It has been shown that motion vector based picture interpolation requires a sampling rate of about 10 Hz for the motion information. The resulting constant velocities within each 100 ms interval do not lead to artefacts [Ton85-Jue88]. For the MCTV system, a sampling rate of 25 Hz for the supplementary information has been established. For correct interpolation, the following conditions have to be satisfied:

- The vectors have to represent the true object trajectories. This differs from the common usage of motion vectors in digital picture coding, where merely the differential energy has to be minimized.
- Covered and uncovered background has to be detected and dealt with adequately. This implies the analysis of more than two pictures per motion vector field.
- Deviations of the illumination like flicker or shadows have to be taken into account.

In the following chapters, a camera concept that takes these conditions into account is derived, and the algorithms developed for the generation of high quality motion vector information are presented.

III. An optimized camera concept

The influence of aliasing:
To correctly estimate the object trajectories, proper filtering has to be applied to suppress aliasing. Fig. 3 shows the influence of temporal aliasing with the well known example of the wheel that appears to rotate backwards. The better the algorithm works, the more probably it will detect the wrong translation.

Spatial aliasing also leads to estimation errors, because those parts of the frequency spectrum that are folded down into the baseband cause spatial deformation and thus, for example, change the position of a transition. The influence of aliasing on motion estimation has been the subject of intense investigations [Hol92].

In conclusion, for the correct determination of object trajectories the amount of frequencies in the spatio-temporal space free of aliasing has to be maximized. Regarding translatory movements this leads to the condition:

Maximizing frequency components free of aliasing means maximizing the area beyond the hyperbolic function given in (1) and results in fig. 4. The maximum velocity has been chosen to be 0.8 pw/s, the spatial resolution in horizontal direction equals the vertical one with respect to an aspect ratio of 16:9.
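The relation between velocity, spatial frequency and temporal sampling can be sketched under standard sampling assumptions. The symbols $f_x$ (spatial frequency), $f_t$ (temporal frequency), $v$ (velocity) and $T$ (temporal sampling period) are introduced here for illustration and need not match the notation of (1). A pattern of spatial frequency $f_x$ translating at velocity $v$ produces the temporal frequency $f_t = v f_x$, so it stays free of temporal aliasing only if

```latex
\left| f_t \right| = \left| v \, f_x \right| \le \frac{1}{2T}
\quad\Longrightarrow\quad
\left| f_x \right| \le \frac{1}{2\,T\,\left| v_{\max} \right|}
```

For a given sampling rate 1/T the bound is a hyperbola in the plane spanned by spatial frequency and velocity, and for a given maximum velocity the alias free spatial bandwidth grows in proportion to 1/T.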

Obviously a choice of 1/T = 300 Hz is the best that can be done. The spatial resolution in this case is 1/12 (1/3.5 in each direction) of that of an HDTV camera working with 1250/50/2:1 and is thus too low to deliver an HDTV signal directly. To maintain both high resolution HDTV pictures and an optimal generation of motion information, the camera concept shown in fig. 5 has been applied.

The HDTV signal is recorded with 1250/25/1:1 to preserve the vertical correlations necessary for high quality format and scan conversion, while the supplementary path delivers progressive images of low spatial but high temporal resolution, e.g. 360/300/1:1. To demonstrate the quality of MCTV signals, a prototype has been realised at the University of Dortmund. Because available pick-up devices had to be used, this prototype MCTV camera works with 1250/25/1:1 for the HDTV signal and 312/100/1:1 for the supplementary path.

Furthermore, adding a complete sensor means splitting off a part of the incoming light and thus causes a loss of sensitivity (the realised supplementary CCD sensor uses 30% of the incoming light). This can be overcome in the future by using modified HDTV-CCD sensors. In this case the temporal sampling rate could be increased by a factor of 4 with an enlarged CCD storage area (factor 1.75), while sensitivity is preserved by motion vector based integration over four frames.

The next chapters describe the signal processing adapted to this new signal format. All simulation results refer to the MCTV prototype camera.
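The figures quoted above can be checked with a few lines of arithmetic. The sketch assumes a reference raster of 1920 pels by 1250 lines delivered as 25 complete frames per second. The observation that the pel rate of the 300 Hz raster then stays close to that of the HDTV raster is an inference from these numbers, not a statement of the paper.

```python
# Assumed reference raster: 1920 pels x 1250 lines; 1250/50/2:1 delivers
# 50 fields per second, i.e. 25 complete frames per second.
HD_PELS, HD_LINES, HD_FRAME_RATE = 1920, 1250, 25

SCALE = 3.5            # reduction per spatial direction, from the text
AUX_FRAME_RATE = 300   # progressive pictures per second, from the text

aux_pels = HD_PELS / SCALE
aux_lines = HD_LINES / SCALE
area_ratio = 1.0 / SCALE**2          # spatial resolution relative to HDTV

hd_pel_rate = HD_PELS * HD_LINES * HD_FRAME_RATE
aux_pel_rate = aux_pels * aux_lines * AUX_FRAME_RATE
pel_rate_ratio = aux_pel_rate / hd_pel_rate

print(f"supplementary raster: {aux_pels:.0f} pels x {aux_lines:.0f} lines")
print(f"spatial resolution ratio: 1/{1 / area_ratio:.2f}")
print(f"pel rate ratio (supplementary / HDTV): {pel_rate_ratio:.3f}")
```

A reduction by 3.5 per direction gives about 357 lines, which is close to the 360 lines quoted for the supplementary path, and 3.5 squared is 12.25, which the text rounds to 12.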

IV. MCTV motion estimation

The main tasks of the applied motion estimator are:

- Detection of the true object trajectories. For a two dimensional signal this means correctness of both the horizontal and the vertical components.
- Provision of motion information with an accuracy of ±0.5 pels and ±0.5 lines, respectively, referred to the HDTV picture.
- Correct estimation of the translatory components also for more complex movements like zoom or rotation.

The first condition leads to a structure based algorithm to overcome the aperture problem shown in fig. 6. To correctly estimate the horizontal and the vertical component, enough high frequency energy in both directions is necessary. This is given for curved transitions.

The measurement points are selected by evaluating the curvature.
sx, sy: derivatives of the luminance; Ox, Oy: derivatives of the gradient direction.
To suppress noise effects, the spatial derivatives are evaluated for a block of pixels by a surface approximation of second degree [Bea79].
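How a curvature criterion separates corners from straight transitions can be shown with a small sketch. It uses a Kitchen-Rosenfeld type measure computed from finite differences as a stand-in. This is not the curvature expression or the second degree surface approximation of the paper, and the function names and the threshold are hypothetical.

```python
import numpy as np

def curvature_measure(img):
    """Curvature of the luminance contours weighted by the gradient magnitude."""
    sy, sx = np.gradient(img.astype(float))   # first derivatives (rows, columns)
    syy, _ = np.gradient(sy)                  # second derivative in y
    sxy, sxx = np.gradient(sx)                # mixed and second derivative in x
    grad2 = sx**2 + sy**2
    num = sxx * sy**2 - 2.0 * sxy * sx * sy + syy * sx**2
    # The measure is zero in flat areas and on straight transitions.
    return np.where(grad2 > 1e-6, num / np.maximum(grad2, 1e-6), 0.0)

def select_measurement_points(img, threshold):
    """Return (row, column) positions whose curvature magnitude exceeds threshold."""
    kappa = curvature_measure(img)
    return np.argwhere(np.abs(kappa) > threshold)

# Test picture: one bright quadrant, i.e. two straight edges meeting in a corner.
img = np.zeros((16, 16))
img[8:, 8:] = 1.0
kappa = curvature_measure(img)
points = select_measurement_points(img, threshold=0.3)
```

On this test picture the measure vanishes along both straight edges, where only one vector component could be estimated reliably, and only the corner is selected as a measurement point.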

The applied motion estimator is shown in fig. 7. The 10 ms measurement is carried out by a hierarchical block matching algorithm. The velocity range has been chosen to be ±6 pel/10 ms horizontally and ±4 lines/10 ms vertically. This corresponds to 64 pels/40 ms with respect to the HDTV picture with 1920 pel/line. To provide subpixel vectors (±0.5 pel/40 ms), the image is interpolated with a high resolution spline filter prior to the maximum evaluation.
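The measurement step can be sketched with an exhaustive block search over the stated range. This replaces the hierarchical algorithm of fig. 7 with a plain full search and omits the subpixel interpolation; the function name, the default block size and the sum of absolute differences as matching criterion are assumptions of the sketch. The conversion at the end assumes that the supplementary picture has 720 pel/line, the figure used as reference in the local assignment chapter.

```python
import numpy as np

def block_match(prev, cur, top, left, size=(5, 9), rng=(4, 6)):
    """Full search for the displacement (dy, dx) of one block from prev to cur.

    size : block size as (lines, pels)
    rng  : search range as (lines, pels) per 10 ms, i.e. +-4 lines and +-6 pels
    """
    h, w = size
    block = prev[top:top + h, left:left + w]
    best, best_v = np.inf, (0, 0)
    for dy in range(-rng[0], rng[0] + 1):
        for dx in range(-rng[1], rng[1] + 1):
            y, x = top + dy, left + dx
            # Skip candidates that would leave the picture.
            if y < 0 or x < 0 or y + h > cur.shape[0] or x + w > cur.shape[1]:
                continue
            sad = np.abs(cur[y:y + h, x:x + w] - block).sum()
            if sad < best:
                best, best_v = sad, (dy, dx)
    return best_v

# Textured test picture shifted by 2 lines down and 3 pels to the left.
prev = np.random.default_rng(0).random((40, 60))
cur = np.roll(prev, (2, -3), axis=(0, 1))
v = block_match(prev, cur, top=20, left=30)

# Horizontal range referred to the HDTV picture: four 10 ms steps per 40 ms,
# scaled from the assumed 720 pel/line to 1920 pel/line.
AUX_PELS, HD_PELS = 720, 1920
max_hd = 6 * (40 // 10) * HD_PELS / AUX_PELS
```

With these assumptions the horizontal range of ±6 pel/10 ms scales to the 64 pels/40 ms quoted for the HDTV picture.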

Due to the structure based measurement, the block size can be made small (from 9×5 to 3×3). Thus the correct translations can be measured also for more complex movements like zoom or rotation.

The verification takes advantage of the supplementary sensor's high temporal sampling rate. The measurement points are traced over each 40 ms interval. Only those trajectories describing a constant movement or constant acceleration over this interval are chosen. This verification suppresses fast changing "chaotic" movements and therefore leads to much more homogeneous vector fields. The left path shown in fig. 7 delivers low priority vectors in case of insufficient curvature. These vectors also maintain high interpolation quality but should not be evaluated for parameter determination, because there is no guarantee that both spatial directions are correct.

Fig. 8 shows the measured horizontal component for a test picture with a train moving diagonally over the picture in comparison to a standard block matching algorithm over 40 ms. The block size has been chosen to be 9×5 pels.
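The verification criterion can be sketched as a test on the finite differences of a traced trajectory. The function name and the tolerance are hypothetical, and the sketch assumes five positions per 40 ms interval sampled 10 ms apart, as in the prototype.

```python
import numpy as np

def is_regular_trajectory(positions, tol=0.5):
    """Accept trajectories with constant velocity or constant acceleration.

    positions : sequence of (y, x) positions at equally spaced instants
    tol       : allowed deviation in pels (hypothetical threshold)
    """
    p = np.asarray(positions, dtype=float)
    # Third differences vanish for constant acceleration; constant velocity
    # is the special case in which the second differences vanish as well.
    third = np.diff(p, n=3, axis=0)
    return bool(np.all(np.abs(third) <= tol))

constant_velocity = [(0, 0), (1, 2), (2, 4), (3, 6), (4, 8)]
constant_acceleration = [(0, 0), (0, 1), (0, 4), (0, 9), (0, 16)]
chaotic = [(0, 0), (0, 5), (0, -3), (0, 6), (0, -2)]

results = [is_regular_trajectory(t)
           for t in (constant_velocity, constant_acceleration, chaotic)]
```

The first two trajectories are accepted and the third is rejected, which corresponds to the suppression of "chaotic" movements described above.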

V. Parameter estimation

To further reduce the bandwidth of the supplementary signal, global zoom and rotation are determined and transmitted in the form of a four parameter model.

x v: horizontal and vertical component of the translatory part and center point for global rotation and zoom; a3, a4: zoom and rotation parallel to the target plane.

The determination is carried out by a two step linear regression. In the first step, the irregular vector field of the motion estimator is regularized to a field of one vector for the central point of each 32×16 block. Only highly correlated blocks are preserved in order to suppress local translations. The second step correlates the remaining vectors to fit the four parameter model. Simulations show that the accuracy of the zoom parameter, and thus of the remaining translatory parameter, is 0.15 pel/40 ms.
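The second regression step can be sketched as a linear least squares fit. The sketch assumes the common similarity form of a four parameter model, vx = a1 + a3·x - a4·y and vy = a2 + a4·x + a3·y with coordinates taken relative to a center point. The parameter naming, the function name and the test values are assumptions, and the first step (regularization and selection of correlated blocks) is not included.

```python
import numpy as np

def fit_four_parameters(points, vectors, center=(0.0, 0.0)):
    """Least squares fit of translation (a1, a2), zoom a3 and rotation a4.

    points  : array of (x, y) positions of the block centers
    vectors : array of (vx, vy) motion vectors measured at these positions
    """
    x = points[:, 0] - center[0]
    y = points[:, 1] - center[1]
    ones, zeros = np.ones_like(x), np.zeros_like(x)
    # Rows for the horizontal components first, then for the vertical ones.
    A = np.vstack([
        np.column_stack([ones, zeros, x, -y]),
        np.column_stack([zeros, ones, y, x]),
    ])
    b = np.concatenate([vectors[:, 0], vectors[:, 1]])
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    return params   # a1, a2, a3, a4

# Synthetic field on a grid of 32 x 16 block centers (720 x 312 raster assumed).
gx, gy = np.meshgrid(np.arange(16, 720, 32.0), np.arange(8, 312, 16.0))
points = np.column_stack([gx.ravel(), gy.ravel()])
true = np.array([1.5, -0.5, 0.01, 0.002])
x, y = points[:, 0] - 360.0, points[:, 1] - 156.0
vectors = np.column_stack([true[0] + true[2] * x - true[3] * y,
                           true[1] + true[3] * x + true[2] * y])

est = fit_four_parameters(points, vectors, center=(360.0, 156.0))
```

Because the model is linear in its four parameters, a single call to `np.linalg.lstsq` is sufficient; with noise free input the estimate reproduces the parameters used to generate the field.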

VI. Local assignment

At this point, global parameters have been detected and very reliable motion vectors have been found for selected pixels. These candidate vectors now have to be assigned to the pixels of the supplementary CCD sensor's image. The assignment algorithm has to take into account the detection of coverage and uncoverage as well as the recognition of changing luminance due to variations in lighting or to objects entering or leaving shadowed areas. Fig. 9 illustrates the way the assignment is carried out.

The algorithm is based on the line matching technique introduced in [Hou91]. The additional pictures resulting from the high temporal sampling rate are used to improve the detection of coverage and uncoverage. The DFD (displaced frame difference) is calculated over lines of pixels with length v and orientation -v. The number of pixels in each line depends on the velocity and is 5 for |v| < 5 pel/40 ms and 10 for |v| > 5 pel/40 ms, referring to 720 pel/line. Prior to the DFD calculation, varying lighting is compensated by considering the linear interpolation of the luminance between the first and the last match point.

The match line extends into the moving object (orientation -v!) and thus a reliable recognition of coverage and uncoverage is achieved. If the DFD exceeds a threshold, the length of the match line is further reduced to detect very small, fast moving objects. At the time of finishing this manuscript, the inclusion of two pixels orthogonal to the direction of v is under investigation. This leads to a match cross and is expected to lower the threshold and therefore to increase the reliability of the assignment.
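The DFD evaluation along a match line can be sketched as follows. The sketch uses only two pictures and integer vectors, omits the compensation of varying lighting and the threshold controlled shortening of the line, and treats |v| = 5 like the faster case; the function name and these simplifications are assumptions and do not reproduce the algorithm of [Hou91].

```python
import numpy as np

def match_line_dfd(prev, cur, y, x, v):
    """Mean absolute DFD along a match line that starts at (y, x) in cur.

    v is the candidate displacement (dy, dx) from prev to cur in integer pels.
    The line has the length |v| and points in the direction -v, i.e. back
    into the moving object.
    """
    dy, dx = v
    speed = float(np.hypot(dy, dx))
    if speed == 0.0:
        return abs(float(cur[y, x]) - float(prev[y, x]))
    n = 5 if speed < 5 else 10            # pixels per match line
    t = np.linspace(0.0, 1.0, n)          # relative positions along the line
    h, w = cur.shape
    # Clipping keeps all samples inside the picture (border simplification).
    ys = np.clip(np.rint(y - t * dy).astype(int), 0, h - 1)
    xs = np.clip(np.rint(x - t * dx).astype(int), 0, w - 1)
    py = np.clip(ys - dy, 0, h - 1)
    px = np.clip(xs - dx, 0, w - 1)
    return float(np.mean(np.abs(cur[ys, xs] - prev[py, px])))

# Textured test picture moving 6 pels to the right.
prev = np.random.default_rng(1).random((40, 60))
cur = np.roll(prev, (0, 6), axis=(0, 1))

dfd_true = match_line_dfd(prev, cur, 20, 30, (0, 6))
dfd_wrong = match_line_dfd(prev, cur, 20, 30, (0, 3))
```

The candidate that matches the actual displacement yields a vanishing DFD, while the wrong candidate does not, so the assignment can pick the candidate vector with the smallest DFD for each pixel.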

VII. Coding

The block diagram in fig. 5 already introduced the idea of transmitting the supplementary signal within half of the vertical blanking interval to enable compatible distribution of MCTV signals in the studio. For a digital HDTV-VCR this corresponds to a data rate of [BTS92]

This is achieved by a two step coding technique. The first stage is the modified run-length coding algorithm shown in fig. 9. The points marked with N (new) enforce the specification of a motion vector while for those marked with O (old) the vector from directly above is taken. Due to the nonuniform probability distribution the runs are Huffman-coded.
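The idea of N and O marks can be sketched as a row-wise run-length coder. The data layout, the function names and the rule that the mark is decided at the first point of each run are assumptions of this sketch and are not taken from the figure; the subsequent Huffman coding of the runs is not shown.

```python
def encode(field):
    """Code each row of a vector field as runs ('N', length, vector) or ('O', length)."""
    coded = []
    for r, row in enumerate(field):
        runs, start = [], 0
        for c in range(1, len(row) + 1):
            # A run ends at the end of the row or where the vector changes.
            if c == len(row) or row[c] != row[start]:
                length = c - start
                if r > 0 and field[r - 1][start] == row[start]:
                    runs.append(("O", length))              # copy from above
                else:
                    runs.append(("N", length, row[start]))  # specify the vector
                start = c
        coded.append(runs)
    return coded

def decode(coded):
    """Rebuild the vector field from the coded runs."""
    field = []
    for runs in coded:
        row = []
        for run in runs:
            # 'O': take the vector directly above the first point of the run.
            vector = field[-1][len(row)] if run[0] == "O" else run[2]
            row.extend([vector] * run[1])
        field.append(row)
    return field

A, B = (0, 0), (3, -1)
field = [
    [A, A, B, B, A],
    [A, A, A, B, A],
    [A, B, B, B, A],
]
coded = encode(field)
restored = decode(coded)
```

In this example only four of the nine runs need an explicit vector; all other runs are described by their length alone.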

Most motion vectors appear in more than one run. This is caused by the form of the object contour or by two objects moving with the same speed and direction. To further reduce the transmission rate, an optimized segmentation algorithm has been implemented (fig. 10) [Sch92]. An a posteriori decision on the collection of motion vectors is taken. If the vectors are collected, they form a table that has to be indexed within the runs. Thus, in case the resulting data rate lies below that of direct specification of the motion vector within the run, the segmentation is preferred.

The algorithm has been simulated with several sequences shot with the prototype MCTV camera developed at the University of Dortmund. Even for a sequence with over 60 different relevant motion vectors the resulting data rate is below 0.4 bit/pel.
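The a posteriori decision can be sketched as a comparison of two bit counts. The word length of 16 bit per vector, the fixed length table index and the function name are hypothetical; the actual code of [Sch92] is not reproduced here.

```python
import math

def choose_coding(run_vectors, bits_per_vector=16):
    """Return ('table' or 'direct', bits) for the cheaper coding of the run vectors.

    run_vectors     : one motion vector per run that needs a specification
    bits_per_vector : hypothetical cost of writing one vector explicitly
    """
    direct_bits = len(run_vectors) * bits_per_vector
    table = set(run_vectors)
    # Fixed length index into the table, at least one bit per run.
    index_bits = max(1, math.ceil(math.log2(len(table)))) if table else 0
    table_bits = len(table) * bits_per_vector + len(run_vectors) * index_bits
    if table_bits < direct_bits:
        return "table", table_bits
    return "direct", direct_bits

# Two vectors shared by many runs: collecting them in a table pays off.
shared = choose_coding([(0, 0), (3, -1)] * 20)
# Every run has its own vector: direct specification is cheaper.
unique = choose_coding([(i, 0) for i in range(8)])
```

With two vectors spread over 40 runs the table needs 72 bit instead of 640 bit, whereas for eight runs with eight different vectors the table with its indices would cost more than the direct specification.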

VIII. Multiplexing

To distribute the MCTV signal compatibly within the studio and to enable direct monitoring with standard HDTV equipment, the HDTV signal is synthetically interlaced. The interlacer can be motion vector based. In this case, truncation of the vertical vector component enables reversible interlacing without introducing disturbing judder. The exact motion information is then inserted into the vertical blanking interval (occupying about half of the capacity, see chapter VII).

IX. Applications

A lot of video signal processing techniques profit directly from high quality motion vector fields. Besides the tape to film transfer, which at least in the introductory phase will be the dominant means to distribute and present HDTV productions, high quality scan conversion is dependent on reliable motion information.

The expense of highly sophisticated processing techniques like motion vector based noise reduction, judder free slow motion or interframe error concealment for recording and transmission will decrease drastically if the necessary motion information is generated directly in the camera.

The MCTV system is a contribution to the worldwide discussion concerning generation and distribution of motion information in the studio. The proposed system enables a compatible introduction into the existing HDTV studio equipment.

X. Conclusions

In this paper a new representation of moving HDTV pictures has been presented. To enable the generation of reliable motion information as well as the detection of coverage, uncoverage and variations in lighting, an HDTV camera has been equipped with a supplementary CCD sensor with increased temporal sampling rate. From this camera, the MCTV (motion controlled television) signal is generated. This signal format is based on HDTV pictures with 1250/50/2:1 (synthetically interlaced). A supplementary signal transmitted and recorded in the vertical blanking interval delivers motion information for the generation of intermediate HDTV pictures at any desired temporal position.

The paper develops the optimal parameters for the additional sensor and introduces several new algorithms (motion measurement, local assignment, coding of vector fields) adapted to the new format.

The MCTV format is applicable to most of the standard HDTV equipment and enables high quality format and scan conversion as well as sophisticated signal processing techniques like motion vector based noise reduction, slow motion, etc.

XI. Acknowledgement

The work has been supported by the 'Bundesminister für Forschung und Technologie' within the project titled 01BK004. The authors are responsible for the content.

XII. References

[Ono90] Y. Ono; "HDTV and today's broadcasting world", SMPTE Journal, 1/1990, pp. 4-15
[Vre89] F.W.P. Vreeswijk; "HD-MAC-Coding for MAC compatible broadcasting of HDTV signals", Proc. of 3rd International Workshop on HDTV, Turin, 1989
[Moe91] G. Moll; "Symposium 'Volldigitales Hochzeilenfernsehen'", Rundfunktechnische Mitteilungen, no. 3, vol. 35, 1991, pp. 138-141
[Ton85] G. Tonge; "Television Motion Portrayal", Contribution to the 'Les Assises des Jeunes Chercheurs', Rennes, France, 1985
[Mil62] J.W. Miller and E. Ludvigh; "The Effect of Relative Motion on Visual Acuity", Surv. Ophthal. 7, 1962, pp. 83-116
[Bro72] B. Brown; "Dynamic Visual Acuity, Eye Movements and Peripheral Acuity for Moving Targets", Vision Res. 12, 1972, pp. 305-321
[Fuj85] T. Fujio; "Optimization of the High-Definition Television System Viewed from the Visual System", SMPTE HDTV-W.G. Psychophysics, 1985
[Jue88] R. Jürgens, A.W. Kornhuber, W. Becker; "Prediction and strategy in human smooth pursuit eye movements", in: Eye movement research: physiological and psychological aspects, G. Lüer, U. Lass & J. Shallo-Hoffmann (eds.), 1988, pp. 55-75
[Hol92] Th. Hollmann; "Optimized generation of motion information for HDTV", Contribution to the "Club de Rennes", Tokyo, 1992
[Bea79] P.R. Beaudet; "Rotational invariant image operators", International Joint Conference on Pattern Recognition, Kyoto, 1979
[BTS92] BTS; "Specifications for the DCH 1000 digital HDTV-VCR", 1992
[Sch92] U. Schmitz; "Coding of Motion Information for a 25 Hz Production Standard", EU95-PG02-WPMI-document 124 (Study Report), 2/1993
[Hou91] P. Hou; "Motion estimation for high quality picture interpolation", Proceedings of 4th International Workshop on HDTV and beyond, Torino, Italy, 9/1991