Enhanced Vision via Physical Interactions and Models

Interacting with the real world is an intrinsic function of the human brain: we naturally obtain information from the visible changes that arise from our interactions. For instance, we know that a bottle of transparent liquid is sparkling when we observe the abundance of bubbles generated by shaking the bottle. This is only one of many examples of how we sense the objects around us through physical interactions.

In many games powered by physics engines, designers hide information or rewards behind physical interactions between players and the virtual environment. In detective games such as the Sherlock Holmes series, players obtain the clues they need by interacting with game objects (e.g., shining UV light to uncover hidden messages written in invisible ink). This process is an intuitive way for humans to retrieve information and make inferences.

With recent advances in deep learning and computer vision, researchers attempt to instil human intuition and knowledge into machines. Deep learning loosely simulates the neural networks of the human brain and replicates that structure in machines. Computer vision tasks such as object detection and activity recognition model how humans perceive objects and activities, and transfer that knowledge to machines. All of these ideas are compelling, but one question remains unanswered: since humans have such an inherent ability to infer from physical interactions, can machines do the same?

In my line of research, I try to answer this question from the vision perspective: machines can uncover hidden information from the physical interactions they see. In our recent work, LiquidHash, we utilise smartphone cameras to detect counterfeit liquid food products in sealed bottles. LiquidHash relies on flipping the bottle to induce rising bubbles, which are then processed to infer the authenticity of the liquid content. Through LiquidHash, we demonstrate the feasibility of extracting information from what machines see during simple physical interactions performed by human users; machines can even make inferences beyond human capability. Because it runs on commodity smartphones that most people carry with them every day, LiquidHash could benefit a large number of consumers of liquid food products. The promising results of this work boost confidence in my line of research.
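To make the idea concrete, here is a minimal sketch of what a bubble-detection stage might look like. This is not the actual LiquidHash pipeline: the use of OpenCV's `SimpleBlobDetector`, the thresholds, and the function name `count_bubbles` are all my own illustrative assumptions.

```python
# Hypothetical sketch, not the LiquidHash implementation: count roughly
# circular blobs (candidate bubbles) in each frame of a bottle-flip video.
# Every threshold below is an illustrative assumption.
import cv2

def count_bubbles(video_path: str) -> list[int]:
    params = cv2.SimpleBlobDetector_Params()
    params.filterByColor = True
    params.blobColor = 255       # assumed: bubbles appear bright under backlight
    params.filterByArea = True
    params.minArea = 10          # assumed: discard sensor noise
    params.maxArea = 500         # assumed: discard large reflections
    params.filterByCircularity = True
    params.minCircularity = 0.7  # assumed: bubbles are roughly round
    detector = cv2.SimpleBlobDetector_create(params)

    counts = []
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        counts.append(len(detector.detect(gray)))
    cap.release()
    return counts
```

A downstream step could then compare statistics of these per-frame counts (and, say, bubble sizes or rise speeds) against the signature expected of the genuine liquid.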

We can envision a future where people are empowered by machine intelligence that sees and infers from simple physical interactions performed by either humans or the machines themselves. We could then achieve sensing capabilities beyond our inherent ones, a possibility I would name “Enhanced Vision”.