Facial Tracking is a branch of Computer Vision which involves running computations on facial image data. This type of software plays a distinct role in the future of AR and VR applications and will influence the e-Commerce, security, video game, and communications industries.
IT Researches has seen considerable success in developing our own facial tracking algorithms. Our software supports facial feature tracking with RGB sensors as well as future depth sensor configurations, known as RGB-D. The inclusion of depth data is an important development in facial tracking, as it enables robust three-dimensional reconstructions of human faces as well as occlusion handling. Occlusion allows content such as hats, glasses, and jewelry to be correctly rendered behind or around the face when a user turns their head. This gives consumers a compelling and realistic experience for virtually sampling a variety of head-worn products.
Ideal augmentations for the human face should change and adapt in accordance with variations in a user’s facial expression. This is necessary when facial tracking is used for gaming, in film or even product evaluation in the cosmetics industry. IT Researches’s technology addresses this requirement with support for facial feature tracking, available on current generation mobile hardware.
The ability to create 3D, high fidelity reconstructions of human faces will open new doors for more visceral long-distance communication, improvement of product pre-visualization, increased immersion for video-games, strengthened security measures, as well as other areas of application.
Next Generation Sensors
The evolution of Computer Vision is contingent on the advancement of sensory hardware. Historically, use of a smart device’s camera module has been the major focus of mobile Computer Vision; however, different types of sensors enable more sophisticated experiences and interactions. With these goals in mind, IT Researches has integrated ‘Thermal Touch’ into its portfolio of technologies – a system which can identify the objects we touch through sensing the warmth left on them by our fingers. Users can manipulate virtual content in a scene through physical interaction with the real world, creating a unique interface between the two.
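The core of the ‘Thermal Touch’ idea can be sketched very simply: a finger leaves residual warmth on a surface, which appears as a small hot blob in a thermal image. The sketch below is our own illustration under assumed temperature values, not IT Researches’s actual pipeline; it subtracts a reference frame from the current thermal frame and thresholds the difference to localize the touch.

```python
# Hypothetical sketch of the 'Thermal Touch' idea: residual warmth from a
# finger shows up as a small hot blob in a thermal image. Subtracting a
# reference frame and thresholding localizes the touch point.
# The function name and temperature values are illustrative assumptions.
import numpy as np

def detect_touch(reference, current, delta=1.5):
    """Return (row, col) of the warm residual blob, or None if no touch."""
    diff = current - reference          # residual warmth left by the finger
    mask = diff > delta                 # pixels noticeably warmer than before
    if not mask.any():
        return None
    # the centroid of the warm region approximates the touch location
    rows, cols = np.nonzero(mask)
    return int(rows.mean()), int(cols.mean())

# a 10x10 'thermal frame' at 22 C with a warm 2x2 patch where a finger touched
ref = np.full((10, 10), 22.0)
cur = ref.copy()
cur[4:6, 6:8] += 3.0                    # ~3 C of residual warmth
print(detect_touch(ref, cur))           # approximate touch location
```

A real system would additionally need to register the thermal and RGB camera views and filter out warm objects that are not fingerprints.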
Another emerging technology that is making an impact in Computer Vision is depth sensors, or “RGB-D” configurations. Depth sensing directly solves the problem of determining the distance of objects within a scene, and can be integrated with existing visual information to generate accurate and scale-correct reconstructions of environments in real-time. This has very tangible uses for product pre-visualization, equipment fitting, along with gesture and facial tracking. Additionally, RGB-D sensors enable occlusion of virtual content behind real objects that are closer to the viewer, an important achievement that makes virtually all AR experiences more realistic and integrated into real environments.
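Depth-based occlusion reduces to a per-pixel comparison: a virtual pixel is drawn only if it is closer to the camera than the real surface the RGB-D sensor measured there. The sketch below is our minimal illustration of that test, not IT Researches’s implementation:

```python
# Minimal sketch (our illustration, not the product implementation) of
# depth-based occlusion: virtual content is hidden wherever the real scene,
# as measured by the RGB-D sensor, is nearer to the camera.
import numpy as np

def composite(rgb, sensor_depth, virtual_rgb, virtual_depth):
    """Overlay virtual content, occluding it behind closer real objects."""
    out = rgb.copy()
    # a virtual pixel wins only where its depth is smaller (nearer) than real
    visible = virtual_depth < sensor_depth
    out[visible] = virtual_rgb[visible]
    return out

# toy 2x2 scene: real surface at 1.0 m except one nearer object at 0.4 m
real_depth = np.array([[1.0, 1.0], [0.4, 1.0]])
rgb        = np.zeros((2, 2, 3))                  # black background
virt_rgb   = np.ones((2, 2, 3))                   # white virtual object
virt_depth = np.full((2, 2), 0.6)                 # virtual object at 0.6 m

out = composite(rgb, real_depth, virt_rgb, virt_depth)
# the virtual object stays hidden at the pixel where the real object (0.4 m)
# is nearer than the virtual one (0.6 m)
print(out[1, 0], out[0, 0])
```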
In order to achieve convincing levels of realism with Augmented Reality experiences, it is crucial to mimic the lighting conditions of the environment in which the content resides. As human beings, we are keenly aware of objects that do not behave correctly with regards to lighting and these objects are immediately perceived as unnatural. The shadows of virtual objects should project in the same direction as the shadows of real objects in their vicinity, and similarly so for reflected light.
IT Researches’s coherent lighting technology can estimate the lighting conditions of a scene in real time and in turn translate these conditions onto virtual objects within the scene. The result is an Augmented Reality experience in which the virtual content appears natural, with lighting conditions consistent with that of the surroundings. IT Researches’s Dynamic Illumination technology operates in real-time to reflect changing lighting conditions of a real-world environment.
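One common way to estimate scene lighting (offered here as an illustrative baseline, not necessarily the method used in the product) is to fit a dominant directional light to observed intensities under a Lambertian model, I = n · l, given known surface normals on a probe. The fitted light vector can then be used to shade virtual objects consistently:

```python
# Hedged sketch of a standard lighting-estimation baseline: given surface
# normals N and observed intensities I on a known probe, solve the Lambertian
# model I = N @ l for the dominant light vector l with least squares.
import numpy as np

def estimate_light(normals, intensities):
    """Solve I = N @ l for the light vector l (direction times strength)."""
    l, *_ = np.linalg.lstsq(normals, intensities, rcond=None)
    return l

# synthetic probe: four unit normals lit by a known directional light
true_l = np.array([0.5, 0.7, 0.5])
N = np.array([[0.0, 0.0, 1.0],
              [0.6, 0.0, 0.8],
              [0.0, 0.6, 0.8],
              [0.3, 0.3, 0.9]])
I = N @ true_l                          # noiseless Lambertian intensities
est = estimate_light(N, I)
print(np.round(est, 3))                 # recovers the light vector
```

In practice the normals would come from a tracked reference object or face model, and the estimate would be smoothed over time to follow changing conditions.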
An important requirement in Computer Vision applications is the ability to observe and understand unknown environments. This becomes especially important when one wants to augment information within an environment that is completely new. Simultaneous Localization and Mapping (SLAM) is a technique which allows a device to localize itself in an unknown environment, while at the same time creating a reference map of those surroundings.
At IT Researches we have developed our own versions of SLAM which deliver high accuracy – even with standard mobile devices. IT Researches SLAM enables augmenting content within unknown environments, or what is known as “3D markerless tracking”. With SLAM, environments or objects can be reconstructed once, then saved to later be used in as many applications as desired. This has unique benefits when challenged with creating augmented reality experiences for indoor environments, where other tracking configurations are not sufficient for feature extraction.
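The “simultaneous” part of SLAM can be illustrated with a deliberately simplified 2D toy (our illustration only, far from a production algorithm): at each step the device localizes itself by integrating its own motion, and maps landmarks by converting observations from the device frame into the world frame.

```python
# Toy illustration of the SLAM idea: integrate odometry to localize, and
# simultaneously record observed landmarks in a world-frame map.
# A real SLAM system would jointly optimize poses and landmarks; this
# simple averaging update is only meant to convey the structure.
import numpy as np

def slam_step(pose, motion, observations, landmarks):
    """Update pose by odometry, then map each landmark seen from the pose."""
    pose = pose + np.asarray(motion, dtype=float)   # localize: dead-reckoning
    for name, rel in observations.items():          # rel = landmark in device frame
        world = pose + np.asarray(rel, dtype=float) # map: convert to world frame
        if name in landmarks:                       # refine existing estimate
            landmarks[name] = (landmarks[name] + world) / 2
        else:
            landmarks[name] = world
    return pose, landmarks

pose, landmarks = np.zeros(2), {}
pose, landmarks = slam_step(pose, (1, 0), {"door": (2, 1)}, landmarks)
pose, landmarks = slam_step(pose, (1, 0), {"door": (1, 1)}, landmarks)
print(pose, landmarks["door"])
```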
IT Researches’s Continuous Visual Search technology massively scales image recognition processes using IT Researches’s powerful cloud servers, enabling large scale Augmented Reality applications.
Databases of millions of images can be rapidly checked for a match with an image sent from the client side. The result is a visual search solution that allows augmenting of items in our surroundings on a grand scale: from product packaging, to technical manuals, to artwork. This technology also serves as a data management solution, offloading image matching data and content out of the device and into the cloud. This contributes to better battery life and a lighter workload for the CPU.
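The server-side matching step can be sketched as nearest-neighbor search over global image descriptors (the function names and the cosine-similarity choice below are our assumptions for illustration):

```python
# Illustrative sketch of the matching step behind a cloud visual search:
# each database image is reduced to a descriptor vector, and a query
# descriptor is matched by cosine similarity against the whole index.
import numpy as np

def build_index(descriptors):
    """L2-normalize database descriptors so a dot product = cosine similarity."""
    d = np.asarray(descriptors, dtype=float)
    return d / np.linalg.norm(d, axis=1, keepdims=True)

def search(index, query, threshold=0.9):
    """Return the best-matching database id, or None below the threshold."""
    q = np.asarray(query, dtype=float)
    sims = index @ (q / np.linalg.norm(q))
    best = int(np.argmax(sims))
    return best if sims[best] >= threshold else None

db = build_index([[1, 0, 0], [0, 1, 0], [1, 1, 0]])
print(search(db, [0.9, 0.1, 0]))        # closest to database item 0
print(search(db, [0, 0, 1]))            # no confident match -> None
```

At the scale of millions of images, the brute-force dot product here would be replaced by an approximate nearest-neighbor index, but the client/server contract is the same: a small descriptor goes up, a match id comes back.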
3D Object Tracking
IT Researches leads the industry in 3D object tracking technology. Through a combination of feature tracking, visual odometry and edge-based tracking, our technology is able to track and augment real-world objects, rather than simplistic 2D marker configurations.
3D object tracking opens new opportunities in Augmented Reality: machinery components can be annotated with digital information, consumer goods can be augmented with additional product information or promotions, and digital entertainment can be blended naturally into the real world.
IT Researches has most recently integrated both feature-based tracking and edge-based tracking into a highly robust system known as hybrid tracking. This configuration delivers even higher accuracy and robustness by combining the strengths of both approaches.
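A hybrid tracker’s fusion step can be sketched as a confidence-weighted blend of the two pose estimates. The weighting scheme below is our assumption, not the actual product logic; it simply leans on whichever tracker reports higher confidence:

```python
# Hedged sketch of a hybrid tracker's fusion step: the feature-based and
# edge-based trackers each yield a pose estimate plus a confidence score,
# and the system blends them by relative confidence (illustrative only).
import numpy as np

def fuse_poses(feature_pose, feature_conf, edge_pose, edge_conf):
    """Confidence-weighted average of two pose estimates."""
    w = feature_conf / (feature_conf + edge_conf)
    return w * np.asarray(feature_pose) + (1 - w) * np.asarray(edge_pose)

# the feature tracker is confident (textured object), the edge tracker less so
fused = fuse_poses([1.0, 2.0, 0.0], 0.8, [1.2, 2.2, 0.0], 0.2)
print(fused)                            # pulled toward the feature estimate
```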
Social Multimedia Analysis
Multimedia content is being produced and shared through the Internet at an unprecedented rate. For example, more than a million images are shared every day and 100 million hours of video are shared each year. With this onslaught of data, the ability to automatically understand the contents of images and videos is critical for enabling applications such as content-based retrieval, similar item search, personalized content search, privacy protection, and modeling the flow of multimedia contents on social networks. Such capabilities can provide cost-efficient solutions for collecting information about viral content (e.g., memes), customer feedback on new products, and geo-political or military events around the world, which has not previously been possible without dedicated research and intelligence groups.
IT Researches is developing a suite of large-scale multimedia analysis tools that focus on visual content understanding, content-based search, online privacy protection, and network modeling. These software tools incorporate the latest state-of-the-art techniques in multimedia analysis to detect objects, scenes, activities, in-scene text, and audio signals embedded in unconstrained images and videos. These techniques are jointly used to analyze and detect patterns of interest in data. The development of a privacy advisor, which alerts users when images with potentially privacy-sensitive material are about to be inadvertently shared on the web, is an example of one of IT Researches’s ongoing projects. Our tools have demonstrated high accuracy on large-scale, real-world data and can be adapted to diverse application domains. In addition, IT Researches tools have integrated advanced visualization and interaction that allow a seamless search experience on web browsers and improve search accuracy by incorporating users’ relevance feedback.
Activity Recognition and Behavior Analysis
IT Researches has expertise in developing solutions for identifying activities and understanding behaviors based on the interaction of people and vehicles with the environment. The underlying concept is to recognize behavior patterns based on static and dynamic evidential descriptors contained within the video such as location, objects, and activities being performed. Our solutions are based on powerful mathematical representations and demonstrated in diverse and challenging real-world scenarios such as street surveillance and football videos.
These capabilities, in addition to advances to the current state-of-the-art in video recognition, are necessary to overcome a key challenge in video understanding: the recognition of any event and object in a limitless number of styles, qualities, and scenes.
3D Reconstruction from Video
Wide area video sensors can generate several gigabytes of raw video data per second and hundreds of terabytes over a mission, creating a need for efficient methods of compressing this data for downlink and archive. There are standard compression techniques available, but none that exploit the fact that most of the world is static in 3D. With this concept, IT Researches is developing techniques to significantly increase compression of wide-area video using 3D models.
To compress the video in such a manner, the initial step is to separate the foreground and background and distinguish dynamic scene elements. In determining which dynamic elements need to be represented, it is critical to consider short-, long-, and very long-term changes that will affect the scene. By determining which elements must be represented, the focus can be on replacing the background with a 3D model to enable compression. This 3D model contains viewpoint and time-dependent appearance data, necessary for fully understanding the scene. Through this sort of compression, there is a significant storage and efficiency gain, necessary for the increasingly large datasets being ingested.
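The foreground/background separation step can be illustrated with a standard temporal-median baseline (used here only to convey the idea, not the actual pipeline): pixels that match the per-pixel median over time belong to the static background, which can be replaced by the 3D model, while only the foreground masks need per-frame storage.

```python
# Sketch of foreground/background separation via a temporal-median baseline:
# the per-pixel median over time estimates the static background, and pixels
# that deviate from it are flagged as dynamic foreground. Illustrative only.
import numpy as np

def separate(frames, tol=10):
    """Return (background estimate, list of per-frame foreground masks)."""
    stack = np.asarray(frames, dtype=float)
    background = np.median(stack, axis=0)       # static scene estimate
    masks = [np.abs(f - background) > tol for f in stack]
    return background, masks

# three 1x4 'frames': a bright object (value 200) moves across a static
# background of value 50
frames = [[[200, 50, 50, 50]],
          [[50, 200, 50, 50]],
          [[50, 50, 200, 50]]]
bg, masks = separate(frames)
print(bg[0], [m[0].tolist() for m in masks])
```

Storing `bg` once plus the sparse per-frame masks is the compression gain: here the background (the bulk of each frame) is represented a single time.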
Content-Based Video and Image Retrieval
IT Researches has developed significant capabilities in content-based image retrieval from multiple DARPA and AFRL programs. We ingest, pre-process, and stabilize an incoming video feed and then identify and characterize moving objects, both dismounts and vehicles. More complex video descriptors, such as motion descriptors (including kinematic, deformable, and articulated motion), appearance descriptors (including color and shape), and behavior descriptors (such as running, carrying, vehicle u-turn, and many others) characterize deeper semantic content.
This semantic analysis enables live alerting to special ops personnel and can cue unexpected or suspicious activity in a video stream, which can be critical for mission success. Further, advanced forensic capabilities allow retroactive retrieval of activities of interest from large datasets.
IT Researches has started to put its image and video analysis capabilities on Forge.mil as part of the IT Researches Image and Video Exploitation and Retrieval Toolkit (KWIVER), with unlimited rights to the government. The capabilities on Forge.mil currently include the full source code for IT Researches’s real-time WAMI tracking system. We plan to add to the capabilities in KWIVER and hope to build a lasting development community from government and commercial collaborators.
Wide-area Video Analysis
IT Researches is developing a software system capable of automatically and interactively discovering actionable intelligence from wide area motion imagery (WAMI) of complex urban, suburban, and rural environments. Within WAMI, the primary information elements are moving entities in the context of roads, buildings, and other scene features. These entities, while exploitable, often yield fragmented tracks in complex urban environments due to occlusions, stops, and other factors. IT Researches's software system uses algorithmic solutions to associate tracks and then identify and integrate local events to detect potential threats and perform forensic analysis.
This software system significantly augments an end-user's ability to discover novel intelligence using models of activities, normalcy, and context. Since the vast majority of events are normal and pose no threat, the models must cross-integrate singular events to discover relationships and anomalies that are indicative of suspicious behavior or match previously learned or defined threat activity.
The advanced system improves an analyst's ability to handle burgeoning WAMI data and reduces the time required to perform many current exploitation tasks, greatly enhancing the capability to analyze and utilize the data for forensic analysis.
Scene understanding in video is an emerging problem in visual surveillance and video analysis. IT Researches is working to create solutions in this area, including functional object recognition. Functional object recognition is the ability to identify objects with specific purposes, such as a postman or a delivery truck, that are defined more by their actions and behaviors than by their appearance. We are developing an approach for content-based learning and recognition of the function of moving objects given video-derived tracks. In particular, we have determined that the semantic behaviors of movers can be captured in a location-independent manner by attributing them with features that encode their relations and actions with respect to scene contexts: local scene regions with distinct functionalities, such as doorways and parking spots, which moving objects often interact with. Based on these representations, functional models are learned from examples, and novel instances are then identified in unseen data.
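The location-independent feature encoding described above can be sketched in a few lines (a toy of our own construction): a mover’s track is summarized by how often it interacts with each *type* of scene context, rather than by raw coordinates, so the representation transfers across scenes.

```python
# Toy sketch of location-independent functional features: summarize a track
# of (x, y) points by counts of interactions with each context type.
# The context regions and the track below are invented for illustration.
def functional_features(track, contexts):
    """Count visits per context *type* along a track of (x, y) points."""
    counts = {}
    for point in track:
        for ctype, region in contexts:
            if region(point):
                counts[ctype] = counts.get(ctype, 0) + 1
    return counts

# contexts are (type, membership-test) pairs; the locations are scene-
# specific, but the resulting feature vector is indexed only by context type
contexts = [("doorway", lambda p: p == (5, 0)),
            ("parking", lambda p: p in {(0, 3), (1, 3)})]

# a delivery-truck-like mover: stops at parking spots, visits a doorway
delivery_track = [(0, 3), (2, 2), (5, 0), (5, 0), (1, 3)]
print(functional_features(delivery_track, contexts))
```

A functional model learned over such features can recognize a “delivery truck” pattern in a new scene whose doorways and parking spots sit at entirely different coordinates.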
Motion Detection and Tracking
IT Researches is developing tools that focus on detecting moving objects and tracking them in archived and streaming video. The primary information elements in such video data are moving entities in the context of roads, buildings, and other scene features. These entities often yield fragmented tracks in complex urban environments due to occlusions, stops, and other factors. IT Researches is developing algorithmic solutions to associate tracks and then identify and integrate local events to detect potential threats and perform forensic analysis.
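One standard strategy for associating fragmented tracks (offered as an illustrative baseline, not the exact product algorithm) is to extrapolate the broken track with a constant-velocity model and link it to the fragment whose start point falls inside a gate around the prediction:

```python
# Minimal sketch of track association by constant-velocity gating: a track
# broken by an occlusion is extrapolated, and the fragment starting nearest
# the prediction (within a gate) is linked as its continuation.
import numpy as np

def predict(track, gap):
    """Extrapolate the last position by the last velocity over `gap` frames."""
    last, prev = np.asarray(track[-1]), np.asarray(track[-2])
    return last + (last - prev) * gap

def associate(track, fragments, gap, gate=1.5):
    """Return the index of the fragment that best continues `track`, or None."""
    p = predict(track, gap)
    dists = [np.linalg.norm(p - np.asarray(f[0])) for f in fragments]
    best = int(np.argmin(dists))
    return best if dists[best] <= gate else None

# a vehicle track broken by a 2-frame occlusion
track = [(0, 0), (1, 0), (2, 0)]                 # moving +1 in x per frame
fragments = [[(4.1, 0.2), (5.1, 0.2)],           # plausible continuation
             [(9.0, 5.0), (10.0, 5.0)]]          # unrelated mover
print(associate(track, fragments, gap=2))        # links fragment 0
```

Production systems refine this with appearance cues and motion models richer than constant velocity, but gating on a predicted position is the common core.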
The developed algorithms contribute to a software system that drastically augments an end-user's ability to discover novel intelligence using models of activities, normalcy, and context. As the vast majority of events are normal and pose no threat, the models cross-integrate singular events to discover relationships and anomalies that are indicative of suspicious behavior or match previously learned or defined threat activity.