General Description

The Skeleton-based Hand Recognition in the Wild track aims to offer a challenge for methods and algorithms designed to recognize hand gestures. The focus, in particular, is on detecting and localizing gestures that are part of a sequence, correctly both in time and by class. While we have already published a dataset with a dictionary of 13 gestures and 72 sequences (SFINGE 3D), in this year's track we further enriched the challenge by interleaving semi-random hand movements (labeled as non-gestures) between the actual gestures in each sequence.
For further details, see the Dataset section.

Dataset

The dataset is made of 144 sequences of gestures. Each sequence contains from 3 to 5 gestures, padded with semi-random hand movements labeled as non-gestures. The dictionary consists of 18 gestures, divided into static gestures, characterized by a static pose of the hand, and dynamic gestures, characterized by the trajectory of the hand and its joints. The gestures in the dictionary are listed below.

Hand joints recorded with the LeapMotion sensor
Static Gestures
- ONE
- TWO
- THREE
- FOUR
- OK
- MENU
- POINTING
Dynamic Gestures
- LEFT
- RIGHT
- CIRCLE
- V
- CROSS
- GRAB
- PINCH
- TAP
- DENY
- KNOB
- EXPAND

The sensor used to capture the trajectories is a LeapMotion recording at 50 fps. Each file in the dataset represents one sequence, and each row contains the data of the hand's joints captured by the sensor in a single frame. All values in a row are separated by semicolons. Each row includes the position and rotation of the palm and of all the finger joints.
The row index (starting from 1) corresponds to the timestamp of the frame. The structure of a row is summarized in the following scheme:

palmpos(x;y;z); palmquat(x;y;z;w); thumbApos(x;y;z); thumbAquat(x;y;z;w); thumbBpos(x;y;z); thumbBquat(x;y;z;w); thumbEndpos(x;y;z); thumbEndquat(x;y;z;w); indexApos(x;y;z); indexAquat(x;y;z;w); indexBpos(x;y;z); indexBquat(x;y;z;w); indexCpos(x;y;z); indexCquat(x;y;z;w); indexEndpos(x;y;z); indexEndquat(x;y;z;w); middleApos(x;y;z); middleAquat(x;y;z;w); middleBpos(x;y;z); middleBquat(x;y;z;w); middleCpos(x;y;z); middleCquat(x;y;z;w); middleEndpos(x;y;z); middleEndquat(x;y;z;w); ringApos(x;y;z); ringAquat(x;y;z;w); ringBpos(x;y;z); ringBquat(x;y;z;w); ringCpos(x;y;z); ringCquat(x;y;z;w); ringEndpos(x;y;z); ringEndquat(x;y;z;w); pinkyApos(x;y;z); pinkyAquat(x;y;z;w); pinkyBpos(x;y;z); pinkyBquat(x;y;z;w); pinkyCpos(x;y;z); pinkyCquat(x;y;z;w); pinkyEndpos(x;y;z); pinkyEndquat(x;y;z;w)

where the joint positions correspond to those reported in the image above. Each joint is therefore described by 7 floats: three for the position and four for the quaternion. The sequence starts with the palm, followed by the thumb with three joints ending with the tip, and then by the other four fingers, each with four joints ending with the tip.
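As an illustration of this layout, here is a minimal Python parsing sketch (not official code). It assumes each row holds 20 joints x 7 floats (x;y;z position followed by an x;y;z;w quaternion), semicolon-separated, in the palm / thumb / index / middle / ring / pinky order described above; the file name in the usage comment is only an example.

from typing import List

# 20 joint names in the order they appear in a row: palm, three thumb joints,
# then four joints (A, B, C, End) for each of the remaining fingers.
JOINT_NAMES = (
    ["palm"]
    + ["thumb" + p for p in ("A", "B", "End")]
    + [f + p for f in ("index", "middle", "ring", "pinky")
             for p in ("A", "B", "C", "End")]
)

def parse_sequence(path: str) -> List[dict]:
    """Return one dict per frame mapping joint name -> (position, quaternion)."""
    frames = []
    with open(path) as fh:
        for line in fh:
            values = [float(v) for v in line.strip().strip(";").split(";") if v]
            assert len(values) == 7 * len(JOINT_NAMES), "unexpected value count"
            frame = {}
            for i, name in enumerate(JOINT_NAMES):
                chunk = values[7 * i: 7 * i + 7]
                frame[name] = (tuple(chunk[:3]), tuple(chunk[3:]))  # (pos, quat)
            frames.append(frame)
    return frames

# Usage (hypothetical file name):
# frames = parse_sequence("sequence_1.txt")
# print(len(frames), frames[0]["palm"])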

Download Links
Example of unlabeled sequences with multiple gestures
Training set (labeled)
Test set

Task and Evaluation

Participants should develop methods to detect and classify the gestures of the dictionary in the test sequences. The developed methods should process the data simulating an online detection scenario or, in other words, detect and classify gestures progressively while processing each trajectory from beginning to end. The results file should consist of one row per sequence with the following information: sequence number (the id of the sequence in the test set, given by the file name), identified gesture label (see the dictionary), detected start of the gesture (frame/row number in the sequence file), and detected end of the gesture (frame/row number in the sequence file). For example, if the algorithm detects a PINCH and a ONE in the first sequence and a THREE, a LEFT and a KNOB in the second, the first two lines of the results file will look like the following:

1;PINCH;12;54;ONE;82;138;
2;THREE;18;75;LEFT;111;183;KNOB;222;298;
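The short Python sketch below is not provided by the organizers; it only illustrates one way to write a results file in this format, assuming your method has already produced, for each sequence id, a list of (label, start_frame, end_frame) detections.

# Hypothetical detections produced by a method on two test sequences.
detections = {
    1: [("PINCH", 12, 54), ("ONE", 82, 138)],
    2: [("THREE", 18, 75), ("LEFT", 111, 183), ("KNOB", 222, 298)],
}

with open("NAME_SURNAME.txt", "w") as out:
    for seq_id in sorted(detections):
        fields = [str(seq_id)]
        for label, start, end in detections[seq_id]:
            fields += [label, str(start), str(end)]
        out.write(";".join(fields) + ";\n")  # one row per sequence, semicolon-separated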
The evaluation will be based on the number of correctly detected gestures, the false positive rate, and the accuracy of the detected start and end frames.
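As a purely illustrative sketch (the official scoring procedure may differ), one could match detections to ground-truth gestures by label and temporal overlap, count unmatched detections as false positives, and measure boundary accuracy as the mean absolute frame error over matched pairs:

def evaluate(gt, pred):
    """gt and pred are lists of (label, start, end) tuples for one sequence."""
    matched_gt = set()
    correct = 0
    frame_errors = []
    for label, ps, pe in pred:
        for i, (glabel, gs, ge) in enumerate(gt):
            if i in matched_gt or label != glabel:
                continue
            if ps <= ge and gs <= pe:  # intervals overlap in time
                matched_gt.add(i)
                correct += 1
                frame_errors += [abs(ps - gs), abs(pe - ge)]
                break
    false_positives = len(pred) - correct
    mean_frame_error = sum(frame_errors) / len(frame_errors) if frame_errors else None
    return correct, false_positives, mean_frame_error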

How to participate

To join the contest, send an email to either andrea.giachetti(at)univr.it or ariel.caputo(at)univr.it stating your affiliation and your intention to participate. Results should also be submitted via email as a text file (NAME_SURNAME.txt), up to a maximum of 3 files with results from different methods or different variants of the proposed method. Executable code performing the classification should be provided with the submission, together with instructions for its use and a description of the method in LaTeX format for the track report.

Timeline

February 1
Dataset available
February 7
Registration deadline
March 1
Results submission deadline
March 15
Submission of the track report to the reviewers