NASA is seeking novel solutions to label and identify spacesuit motions from conventional and readily available video and photographs to overcome current system limitations in terms of cost and training feasibility.
A spacesuit has unique movement patterns that can be observed during spacewalks or Extravehicular Activities (EVA), and mobility assessments are needed to discern and mitigate suit injury risk. Spacesuit motion is very difficult to measure in uncontrolled environments such as training facilities, so a novel method is needed to quantify spacesuit motions from conventional and readily available video and photographs without requiring motion capture cameras. Once validated for the accuracy and reliability of its posture extractions, the selected system will be deployed to estimate EVA postures in current and future missions and analog training events. The framework will be applicable to Neutral Buoyancy Lab (NBL) and Active Response Gravity Offload System (ARGOS) testing, which will help to optimize training procedures. Additionally, the winning solution will be tested and validated on video recordings collected during next-generation spacesuit testing.
The winning computer vision algorithms are expected to:
- Detect spacesuits in a variety of environments and lighting conditions.
- Correctly discriminate between an “unsuited” person and a spacesuit.
- Robustly extract suit postures from partially occluded images.
- Function with a single spacesuit or with multiple spacesuits.
In this challenge, your task will be:
- to extract polygonal areas that represent spacesuits from photographs,
- to determine the coordinates (in 2D pixel space) of some predefined joints (like Right Shoulder Joint or Left Knee Joint) from photographs,
- to determine the coordinates (in 3D metric space) of joints from videos.
The polygons and joint coordinates your algorithm returns will be compared to ground truth data; the quality of your solution will be judged by how closely it matches the expected results. See Scoring for details.
Input Files
In this task you will work with two types of media: images and videos. Image data is available in standard .jpg and .png formats; each training and test image is given in only one of these two formats. Video data is available in two formats: as .mov files and as a set of .jpg images representing the individual frames of the video. Each training and test video is given in both formats; you may choose to work with either or both. Although training data will be provided, challenge participants are strongly encouraged to augment the data set with their own labeled data or with existing datasets.
IMAGE ANNOTATIONS
Image annotations (spacesuit contours and joint coordinates) are described in a .txt file, one image per line, as follows.
<image-id>,<joint-coordinate-list>,[<spacesuit-shape>]…
where
<image-id> is the case-sensitive file name of the image, including the file extension. The angle brackets (here and elsewhere in this document) are only for readability; they are not present in the annotation text.
<joint-coordinate-list> is a comma separated sequence of x,y,v triplets, where x and y are pixel coordinates (x runs from left to right, y from top to bottom) and v is visibility:
- 0: not labelled
- 1: labelled, not visible
- 2: labelled, visible
The number of x,y,v triplets is always a multiple of 15: one group of 15 per spacesuit shown in the image. Note that there are images that show no spacesuits at all. The joints belonging to one spacesuit are described in this order:
- Right Head
- Left Head
- Base Head
- Right Shoulder
- Right Elbow
- Right Hand
- Left Shoulder
- Left Elbow
- Left Hand
- Right Hip
- Right Knee
- Right Foot
- Left Hip
- Left Knee
- Left Foot
<spacesuit-shape> describes the area occupied by a spacesuit as the union of polygonal areas:
(<polygon-1>)(<polygon-2>)(…etc)
where <polygon-i> is a comma separated sequence of x,y pixel coordinate pairs. Most spacesuit shapes can be described using a single polygonal area, but some shapes need more polygons, either because occlusion splits the shape into disjoint areas or because the shape contains holes. Positive areas are given by listing the polygon's points in clockwise order; negative areas (holes) are given by listing the points in counter-clockwise order.
Examples:
This describes a single rectangular shape:
image1.jpg,100,200,2,120,210,2,…13 joint coordinates omitted for brevity…,[(100,100,200,100,200,200,100,200,100,100)]
A rectangular shape with a hole:
image2.jpg,100,200,2,120,210,2,…13 joint coordinates omitted for brevity…,[(100,100,200,100,200,200,100,200,100,100)(110,110,110,120,120,120,120,110,110,110)]
Two spacesuits, a rectangular and a triangular one:
image3.jpg,100,200,2,120,210,2,…28 joint coordinates omitted for brevity…,[(100,100,200,100,200,200,100,200,100,100)][(300,100,400,100,300,200,300,100)]
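To make the format concrete, here is a minimal Python parsing sketch for one annotation line. This is not official challenge code; the handling of edge cases (such as lines that contain no spacesuits) is an assumption that you should verify against the actual training files. The signed_area helper shows one way to tell positive areas from holes: in image coordinates (y grows downwards), the shoelace formula is positive for the clockwise polygons that the spec designates as positive areas.

import re

JOINT_COUNT = 15  # joints per spacesuit, in the order listed above

def parse_annotation_line(line):
    """Returns (image_id, suits, shapes) for one line of the .txt file."""
    line = line.strip()
    bracket = line.find('[')
    head = line if bracket < 0 else line[:bracket]
    fields = [f for f in head.split(',') if f]  # drop the trailing empty field
    image_id = fields[0]
    values = [float(f) for f in fields[1:]]
    per_suit = JOINT_COUNT * 3            # 15 x,y,v triplets per spacesuit
    assert len(values) % per_suit == 0    # zero spacesuits is also possible
    suits = []
    for s in range(len(values) // per_suit):
        chunk = values[s * per_suit:(s + 1) * per_suit]
        suits.append([(chunk[i], chunk[i + 1], int(chunk[i + 2]))
                      for i in range(0, per_suit, 3)])
    shapes = []
    for shape_text in re.findall(r'\[(.*?)\]', line):
        polygons = []
        for poly_text in re.findall(r'\((.*?)\)', shape_text):
            v = [float(t) for t in poly_text.split(',')]
            polygons.append(list(zip(v[0::2], v[1::2])))
        shapes.append(polygons)
    return image_id, suits, shapes

def signed_area(polygon):
    """Shoelace formula. In image coordinates (y pointing down) the result is
    positive for clockwise polygons (positive areas) and negative for
    counter-clockwise polygons (holes). The examples above repeat the first
    vertex at the end; that wrap-around term is zero, so it is harmless."""
    area = 0.0
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        area += x1 * y2 - x2 * y1
    return area / 2.0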
NOISY IMAGES
Besides the original images, the provisional and final test sets contain images with artificially lowered quality. Noise was added to the originals using the following algorithm (a sketch that replicates it follows the list):
- Applied Gaussian blur using a radius chosen randomly and uniformly between 0 and 8 pixels.
- Added Gaussian noise to each pixel, independently for the 3 colour channels; the standard deviation of the change is chosen randomly and uniformly between 0 and 8.
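The following Python sketch mimics this degradation, for example to augment your own training data with similarly noisy copies. The exact implementation used to build the test sets is not published, so treat the PIL/NumPy calls below as an approximation of the described process rather than a byte-exact reproduction.

import random
import numpy as np
from PIL import Image, ImageFilter

def degrade(in_path, out_path):
    img = Image.open(in_path).convert('RGB')
    # Gaussian blur with a radius drawn uniformly from [0, 8] pixels.
    img = img.filter(ImageFilter.GaussianBlur(radius=random.uniform(0.0, 8.0)))
    # Per-pixel, per-channel Gaussian noise; the standard deviation itself
    # is drawn uniformly from [0, 8].
    sigma = random.uniform(0.0, 8.0)
    pixels = np.asarray(img, dtype=np.float32)
    pixels += np.random.normal(0.0, sigma, size=pixels.shape)
    Image.fromarray(np.clip(pixels, 0, 255).astype(np.uint8)).save(out_path)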
VIDEO ANNOTATIONS
The video annotations are stored in .csv files, one file per video. Excluding two header lines, each line in the file describes one video frame:
<frame-id>,x,y,z,x,y,z,… (11 x,y,z triplets omitted for brevity)
where <frame-id> is the frame ordinal (1-based integer), followed by 13 x,y,z coordinate triplets that represent the following joint positions in this order:
- CLAV Clavicle
- RSJC Right Shoulder Joint Center
- LSJC Left Shoulder Joint Center
- REJC Right Elbow Joint Center
- LEJC Left Elbow Joint Center
- RWJC Right Wrist Joint Center
- LWJC Left Wrist Joint Center
- RHJC Right Hip Joint Center
- LHJC Left Hip Joint Center
- RKJC Right Knee Joint Center
- LKJC Left Knee Joint Center
- RAJC Right Ankle Joint Center
- LAJC Left Ankle Joint Center
All coordinates are measured in meters. The z axis always points vertically upwards, but the orientation of the x and y axes is not fixed: it can be (and is) different across videos, though it remains constant for the duration of a video. In the annotations the origin of the coordinate system is tied to the RAJC point on frame #1; this is an arbitrary choice which has no significance in scoring. See the Scoring section for details on how the undefined nature of the coordinate system is handled.
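Below is a minimal Python sketch for reading one video's annotation file. How missing coordinates are encoded is not specified here (see the notes below), so the sketch maps any field that does not parse as a float to NaN; verify this assumption against the actual .csv files. The recenter helper re-expresses coordinates relative to RAJC on frame #1, matching the annotation convention.

import csv
import math

JOINTS = ["CLAV", "RSJC", "LSJC", "REJC", "LEJC", "RWJC", "LWJC",
          "RHJC", "LHJC", "RKJC", "LKJC", "RAJC", "LAJC"]

def _to_float(field):
    try:
        return float(field)
    except ValueError:
        return math.nan  # assumed encoding of missing data

def read_annotations(path):
    """Returns {frame_id: {joint: (x, y, z)}} for one video's .csv file."""
    frames = {}
    with open(path, newline='') as f:
        for row in list(csv.reader(f))[2:]:  # skip the two header lines
            frame_id = int(row[0])           # frame ordinal, 1-based
            values = [_to_float(v) for v in row[1:1 + 3 * len(JOINTS)]]
            frames[frame_id] = {JOINTS[i]: tuple(values[3 * i:3 * i + 3])
                                for i in range(len(JOINTS))}
    return frames

def recenter(frames):
    """Shifts all coordinates so that RAJC on frame #1 is the origin, matching
    the (arbitrary, scoring-irrelevant) convention used by the annotations."""
    ox, oy, oz = frames[1]["RAJC"]
    return {fid: {j: (x - ox, y - oy, z - oz) for j, (x, y, z) in joints.items()}
            for fid, joints in frames.items()}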
Important notes:
- There is missing data in the annotations: for some frames the coordinates of certain joints are not known due to data collection limitations. Collection of the 3D position data is independent of the video capture process, so missing data has no correlation with whether joints are visible in the video.
- For most of the videos the 3D motion capture device recorded one fewer set of joint positions than the number of frames in the video, so data for the last video frame was missing. This is handled by duplicating the last row of annotations in the .csv files.
- Video frames and annotations are not fully synchronized for some of the videos. The divergence is small in most cases and should have minimal effect on scoring. The most visible difference is with the training video MKIII-06.