
Skoltech scientists can now manipulate human gaze in images


Scientists from the Skoltech Computer Vision Group, headed by Professor Victor Lempitsky, have developed an algorithm that can change the direction of gaze in images and video in real time.
 
During video conferences (for example, on Skype), both sides usually look at the screen rather than at the camera, so their gaze appears slightly directed downward. This often spoils the feeling of a real conversation. Television presenters face a similar problem: they have to read the text and look at the audience at the same time. The issue is usually solved with special technical devices, which can be quite expensive. In the literature, this general task is called “gaze correction”.
 
The system developed by Skoltech scientists needs only an ordinary smartphone or laptop camera and no additional equipment. It is based on deep neural networks, a machine learning method that has recently led to several breakthroughs in computer vision, speech recognition and natural language processing.
 
Yaroslav Ganin, first author: “First, we localize the eye region in the input frame and compute a set of characteristic points (anchors). This information, together with the redirection angle, is fed to the deep neural network, which produces a so-called ‘flow field’, i.e. a warping deformation that has to be applied to the input image to obtain the corrected one. Hence the name of the method: DeepWarp.”
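To make the idea of a “flow field” concrete, here is a minimal sketch of the final step only: applying a per-pixel displacement field to an image. It assumes a standard backward warp with bilinear interpolation (each output pixel looks up its value at a displaced position in the input); the article does not describe DeepWarp’s exact sampling scheme, and the neural network that predicts the flow is not shown. The function names `bilinear_sample` and `warp` and the toy image are hypothetical, for illustration only.

```python
from typing import List, Tuple

Image = List[List[float]]               # grayscale image: rows of pixel intensities
Flow = List[List[Tuple[float, float]]]  # per-pixel (dx, dy) displacement (hypothetical format)


def bilinear_sample(img: Image, x: float, y: float) -> float:
    """Sample img at a real-valued position (x, y), clamping to the border."""
    h, w = len(img), len(img[0])
    # Clamp so that positions outside the patch repeat the edge pixels.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)  # coordinates are non-negative, so int() == floor()
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    # Interpolate horizontally on the two neighbouring rows, then vertically.
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x1]
    bottom = (1 - ax) * img[y1][x0] + ax * img[y1][x1]
    return (1 - ay) * top + ay * bottom


def warp(img: Image, flow: Flow) -> Image:
    """Backward warp: output pixel (x, y) takes the input value at (x + dx, y + dy)."""
    return [
        [bilinear_sample(img, x + flow[y][x][0], y + flow[y][x][1])
         for x in range(len(img[0]))]
        for y in range(len(img))
    ]


if __name__ == "__main__":
    eye_patch = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
    shift_right = [[(1.0, 0.0)] * 3 for _ in range(3)]
    print(warp(eye_patch, shift_right))
```

Because every output pixel is sampled from the input rather than generated from scratch, a warp of this kind can only move existing pixels around, which is one plausible reason a flow-based approach preserves the look of the original eye.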
 
Victor Lempitsky, head of the Computer Vision Group: “This work continues a long-standing project that we have been working on for three years already. Deep learning allowed us to improve the system significantly. Before that, we could only make fixed-angle adjustments. With DeepWarp, not only can we redirect gaze to an arbitrary angle, but we can also operate in both the horizontal and vertical directions.”
 
Daniil Kononenko, co-author: “Training a deep neural network such as DeepWarp requires a large amount of data. This is critical for the generalization capability of the model, i.e. for the system to perform well in various conditions. Unfortunately, none of the publicly available datasets is of sufficient quality or size. That’s why we decided to build our own dataset and developed special equipment and software for this purpose. Data collection was carried out over several months with the help of Skoltech students and staff. We managed to assemble a fairly large training set and thus significantly improve the quality of our gaze correction system.”
 
Diana Sungatullina, co-author: “Speed optimization of the proposed system is another topic for future work. At the moment the algorithm runs in real time on a GPU, and we would like to achieve comparable speed on any old laptop without losing the quality and universality of the model.”
 
DeepWarp is one of the “deep” image generation projects developed by Victor Lempitsky’s group. The researchers see great practical potential in this field. For instance, the gaze manipulation method can be used not only for video conferences but also as a post-processing tool in the photography and film industries.
 
The results of this research will be presented in October in Amsterdam at the 14th European Conference on Computer Vision.

