Date: 26/08/16
During video conferences (for example, in Skype) both sides are usually looking at the screen rather than at the camera, which produces a slightly downward-directed gaze. Oftentimes this ruins the feeling of a real conversation. A similar problem is faced by television announcers, who have to read text and look at the audience at the same time. The issue is usually solved with special technical devices, which can be quite expensive. The general task is referred to as “gaze correction” in the literature.
The system developed by Skoltech scientists uses just an ordinary smartphone or laptop camera and doesn’t require any additional equipment. It is based on deep neural networks, a machine learning method that has recently led to several breakthroughs in computer vision, speech recognition and natural language processing.
Yaroslav Ganin, first author: “First, we localize the eye region in the input frame and compute a set of characteristic points (anchors). This information, along with the redirection angle, is fed to the deep neural network, which produces a so-called ‘flow field’, i.e. a warping deformation that needs to be applied to the input image to get the corrected one. Hence the name of the method, DeepWarp.”
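The “flow field” Ganin describes can be illustrated with a minimal sketch: given a per-pixel displacement field, the corrected image is produced by sampling the input image at the displaced coordinates with bilinear interpolation. The `warp_image` helper below is hypothetical and the flow field is supplied by hand for illustration; in DeepWarp, the flow is what the neural network predicts from the eye anchors and the redirection angle.

```python
import numpy as np

def warp_image(image, flow):
    """Apply a dense flow field to a grayscale image via bilinear sampling.

    image: (H, W) array.
    flow:  (H, W, 2) per-pixel displacement (dy, dx); the output pixel
           at (y, x) is sampled from the input at (y + dy, x + dx).
    """
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    # Displaced source coordinates, clamped to the image borders.
    src_y = np.clip(ys + flow[..., 0], 0, h - 1)
    src_x = np.clip(xs + flow[..., 1], 0, w - 1)

    # Integer corners and fractional weights for bilinear interpolation.
    y0 = np.floor(src_y).astype(int)
    x0 = np.floor(src_x).astype(int)
    y1 = np.clip(y0 + 1, 0, h - 1)
    x1 = np.clip(x0 + 1, 0, w - 1)
    wy = src_y - y0
    wx = src_x - x0

    top = image[y0, x0] * (1 - wx) + image[y0, x1] * wx
    bot = image[y1, x0] * (1 - wx) + image[y1, x1] * wx
    return top * (1 - wy) + bot * wy
```

Because the output is a rearrangement of the input's own pixels, the warp preserves the person's eye texture and color, which is one reason warping-based redirection looks more realistic than synthesizing eye pixels from scratch.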
Victor Lempitsky, head of the Computer Vision group: “This work continues a long-standing project that we have been working on for three years already. Deep learning allowed us to significantly improve the system. Prior to that, we could only make fixed-angle adjustments. With DeepWarp, not only can we redirect the gaze to an arbitrary angle, but we can also operate in both horizontal and vertical directions.”
Daniil Kononenko, co-author: “Training a deep neural network such as DeepWarp requires a large amount of data. This is critical for the generalization capability of the model, i.e. good performance of the system in varied conditions. Unfortunately, none of the publicly available datasets is of sufficient quality or size. That’s why we decided to build our own dataset, creating special equipment and software for the purpose. Data collection was carried out over several months with the help of Skoltech students and staff. We managed to assemble quite a large training set and thus significantly boost the quality of our gaze correction system.”
Diana Sungatullina, co-author: “Speed optimization of the proposed system is another topic for future work. Right now the algorithm runs in real time on a GPU, and we would like to achieve comparable speed on any old laptop without losing the quality and universality of the model.”
DeepWarp is one of the “deep” image generation projects developed by Victor Lempitsky’s group. The researchers note the great practical potential of this field. For instance, the gaze manipulation project can be used not only for video conferences but also in the photo and motion picture industries as a post-processing tool.
The results of this research will be presented in October in Amsterdam at the 14th European Conference on Computer Vision.
Skoltech scientists now able to manipulate human gaze in images
Scientists from the Skoltech Computer Vision Group, headed by Professor Victor Lempitsky, have developed an algorithm that can change the direction of the gaze in images and video in real time.