HAND GESTURE RECOGNITION USING RECURRENT NEURAL NETWORKS AND SYNTHETIC DATA GENERATION
Romano, Marco
2023-01-01
Abstract
Interest in Extended Reality (XR) technologies has grown in recent years. Companies and researchers are focusing on fields of application and on improving user interaction, with particular attention to meta-user interface design. Applications in this field range from home automation to professional domains such as surgery and manufacturing. Interaction can take place through various modalities, such as voice, touch, and hand gestures. Hand-gestural interaction has become more relevant in recent years, particularly because of the need for touchless interaction during the COVID-19 pandemic. It is considered a natural form of interaction and allows users to feel present in the digital world through "direct manipulation". Recognizing hand gestures in real time from video streams is difficult because it is hard to determine when a gesture starts and when it ends. Scaling up recognition performance and handling previously unseen gestures pose further challenges. These challenges affect the design of gestural interactions, which is closely tied to the effort an XR system requires to recognize gestures and to the precision of that recognition; when either falls short, the result is a poor user experience. In this paper, we propose a real-time, on-device Hand Gesture Recognition system for XR applications. It handles static and dynamic gestures, performed with one or two hands, including from an egocentric perspective, making it usable on a range of devices, from expensive Smart Glasses to more affordable smartphones and laptops. The system uses well-known datasets, such as EgoGesture and Jester, and splits gesture recognition into two sub-tasks: extracting the hand skeleton of a person from a single RGB camera through MediaPipe Hands, and then recognizing gestures from the detected hand skeleton. To extend the available datasets, we propose a procedure for generating large synthetic video datasets of hand gestures, together with behaviour trees for generating variations of the acquired gestures. This approach saves the time and effort of recording and annotating thousands of real videos, allows greater flexibility in gesture design, and makes it possible to envision XR applications with richer, more intuitive interactions that improve the user experience.
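The abstract gives no implementation details, but the two-stage pipeline it describes (per-frame hand skeleton extraction with MediaPipe Hands, followed by a recurrent classifier over landmark sequences, as the title suggests) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the GRU layout, the 30-frame window, and the gesture labels are hypothetical, and only the first detected hand is classified.

import collections

import cv2
import mediapipe as mp
import torch
import torch.nn as nn

WINDOW = 30          # assumed number of frames fed to the classifier
GESTURES = ["swipe_left", "swipe_right", "pinch", "none"]  # hypothetical labels


class GestureGRU(nn.Module):
    """Recurrent classifier over sequences of flattened hand skeletons."""

    def __init__(self, n_classes, n_landmarks=21, hidden=128):
        super().__init__()
        # Each frame is 21 landmarks x (x, y, z) = 63 input features.
        self.gru = nn.GRU(n_landmarks * 3, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):            # x: (batch, frames, 63)
        _, h = self.gru(x)           # h: (layers, batch, hidden)
        return self.head(h[-1])      # logits from the final hidden state


model = GestureGRU(len(GESTURES))
model.eval()                         # assume weights were trained beforehand

buffer = collections.deque(maxlen=WINDOW)
cap = cv2.VideoCapture(0)            # single RGB camera, as in the paper

with mp.solutions.hands.Hands(max_num_hands=2,
                              min_detection_confidence=0.5) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # Stage 1: per-frame hand skeleton from MediaPipe Hands (expects RGB).
        result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if result.multi_hand_landmarks:
            lm = result.multi_hand_landmarks[0].landmark   # first detected hand
            buffer.append([c for p in lm for c in (p.x, p.y, p.z)])
        # Stage 2: classify once a full window of skeletons is available.
        if len(buffer) == WINDOW:
            seq = torch.tensor([list(buffer)], dtype=torch.float32)
            with torch.no_grad():
                pred = GESTURES[model(seq).argmax(dim=1).item()]
            print(pred)
cap.release()

The synthetic-data side of the paper (behaviour trees generating variations of acquired gestures) is not shown here; the sketch only illustrates the recognition pipeline that the two sub-tasks describe.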