Google AI has been working on frame interpolation and has introduced a new neural network, Frame Interpolation for Large Motion (FILM). Frame interpolation is the process of synthesizing in-between images from pre-existing ones. The technique is frequently used for temporal up-sampling to increase video frame rates or produce slow-motion effects.
Google published “FILM: Frame Interpolation for Large Motion” at ECCV 2022, presenting a new technique for generating high-quality slow-motion videos from near-duplicate photos. FILM handles both large and small motion and achieves state-of-the-art results.
At inference time, Google invokes the model iteratively to output in-between images.
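To illustrate the iterative invocation, here is a minimal sketch in which a hypothetical `interpolate_mid` function stands in for the trained FILM network (it simply averages its inputs); each recursive round inserts a mid-frame between every adjacent pair, roughly doubling the frame rate:

```python
def interpolate_mid(frame_a, frame_b):
    # Placeholder for the trained model: returns the midpoint frame.
    return [(a + b) / 2 for a, b in zip(frame_a, frame_b)]

def upsample(frames, rounds):
    """Insert a mid-frame between every adjacent pair, `rounds` times.

    Each round roughly doubles the frame rate, so `rounds` passes
    synthesize 2**rounds - 1 new frames between each original pair.
    """
    for _ in range(rounds):
        out = []
        for a, b in zip(frames, frames[1:]):
            out.extend([a, interpolate_mid(a, b)])
        out.append(frames[-1])
        frames = out
    return frames

clip = [[0.0], [8.0]]    # two single-pixel "frames"
slow = upsample(clip, 3)  # 9 frames: 7 synthesized in-between images
```

Three rounds of recursion turn a two-frame pair into a nine-frame clip, which is how a single mid-frame model yields arbitrary slow-motion factors.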
The FILM model generates a middle image from two input images. There are three parts to the FILM model:
- A feature extractor that summarises each input image with deep multi-scale (pyramid) features.
- A bi-directional motion estimator that computes pixel-wise motion (i.e., flows) at each pyramid level.
- A fusion module that generates the final interpolated image.
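The three stages above can be sketched as a single forward pass. This is a toy NumPy illustration, not the real architecture: the function names are hypothetical, the "pyramid" is plain downsampling, the flow estimate is a zero placeholder, and fusion is a simple average rather than a learned synthesis network.

```python
import numpy as np

def extract_features(image):
    # Feature extractor: a toy two-level "pyramid" via 2x downsampling.
    return [image, image[::2, ::2]]

def estimate_flow(feats_a, feats_b):
    # Bi-directional motion estimator: one flow field per direction.
    # A constant zero flow stands in for the learned estimate.
    flow_ab = np.zeros(feats_a[0].shape + (2,))
    return flow_ab, -flow_ab

def fuse(feats_a, feats_b, flow_ab, flow_ba):
    # Fusion module: synthesizes the mid frame; averaging is a placeholder.
    return (feats_a[0] + feats_b[0]) / 2

def film_step(img_a, img_b):
    # One interpolation step: features -> flows -> fused middle image.
    fa, fb = extract_features(img_a), extract_features(img_b)
    flow_ab, flow_ba = estimate_flow(fa, fb)
    return fuse(fa, fb, flow_ab, flow_ba)

mid = film_step(np.zeros((4, 4)), np.ones((4, 4)))  # 4x4 middle frame
```

The value of the decomposition is that each stage can be trained and reasoned about separately while sharing the same pyramid representation.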
Typically, multi-resolution feature pyramids and hierarchical motion estimates are used to accommodate significant motion. Small, swiftly moving objects challenge this technique because they tend to vanish at the coarse pyramid levels.
The above components help solve this problem by using a shared motion estimator, yielding a network with fewer weights. Sharing weights lets minor motions at deeper levels be interpreted the same way as large motions at shallow levels, which increases the number of pixels available to supervise large motion.
Following feature extraction, FILM performs pyramid-based residual flow estimation to compute the flows from the middle image—which has not yet been predicted—to the two inputs. After estimating the bi-directional flows, the model aligns the two feature pyramids. Stacking the two aligned feature maps, the bi-directional flows, and the input images at each pyramid level creates a concatenated feature pyramid.
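The alignment-and-stacking step can be sketched for one pyramid level as follows. This is an assumption-laden illustration: FILM aligns features by warping with the estimated flows, which is approximated here by a nearest-neighbour backward warp, and the flows are zero placeholders rather than real estimates.

```python
import numpy as np

def backward_warp(features, flow):
    # Nearest-neighbour backward warp: sample features at x + flow(x),
    # clamping coordinates to the image bounds.
    h, w = features.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    return features[src_y, src_x]

h, w, c = 4, 4, 3
feat_a = np.random.rand(h, w, c)   # features of input image A
feat_b = np.random.rand(h, w, c)   # features of input image B
flow_to_a = np.zeros((h, w, 2))    # placeholder flow: middle -> A
flow_to_b = np.zeros((h, w, 2))    # placeholder flow: middle -> B

aligned_a = backward_warp(feat_a, flow_to_a)
aligned_b = backward_warp(feat_b, flow_to_b)

# Stack aligned features, both flows, and the inputs channel-wise,
# giving one level of the concatenated feature pyramid.
level = np.concatenate(
    [aligned_a, aligned_b, flow_to_a, flow_to_b, feat_a, feat_b], axis=-1)
# level has 2*c + 4 + 2*c channels at this level.
```

Repeating this at every scale yields the concatenated pyramid that the fusion module consumes to synthesize the final interpolated image.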