Lab (PS) Topic Description

  • A biometric presentation attack detection (PAD) system is typically a binary classifier. It discriminates between sample images the sensor captures from a real human body part (real samples) and sample images the sensor captures from a presentation attack instrument (PAI), i.e. an artefact used to deceive the system by mimicking a real / genuine human body part (fake samples). Both real samples and fake samples are used to train the binary PAD classifier.
  • Fake samples are difficult to produce, as for each specimen a PAI needs to be created and scanned by the sensor. This is why synthetic PAI (fake) samples have been proposed, which are much easier to create. We will work with different types of such synthetic PAI samples and investigate whether they can be used instead of, or in addition to, fake samples. To summarise, we have three different types of data: (1) real samples, (2) fake samples and (3) synthetic samples.
  • The major task in this lab is to investigate whether type (3) samples can replace type (2) samples, since the former are much easier to create.
  • First we establish the ground truth: the PAD system is trained with type (1) and type (2) samples and also evaluated on those (in a five-fold cross-validation protocol).
  • Subsequently, we train the PAD system with type (1) vs. type (3) samples and evaluate it with type (1) vs. type (2) samples (we want to see whether training with synthetic data can lead to reliable results for discriminating real from fake samples).
  • Finally, we train again as for the ground truth results, but successively reduce the number of fake samples in the training set. The missing ones are replaced by synthetic ones (so in training we use type (1) vs. a mixture of type (2) and (3); evaluation is done, as always, with type (1) vs. type (2) samples).
  • We use the k-nearest neighbour classifier as the binary PAD classifier; different groups use different feature extraction schemes as input to the classifier (a minimal sketch of this pipeline follows after this list).
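
A minimal sketch (in Python) of this basic PAD pipeline, assuming the features have already been extracted into NumPy arrays; the concrete feature extraction scheme is group-specific, and scikit-learn's KNeighborsClassifier simply stands in for the k-NN PAD classifier:

# Minimal k-NN PAD sketch; assumes pre-extracted feature vectors
# (the feature extraction scheme itself is group-specific and not shown).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def train_pad_knn(real_features, attack_features, k=5):
    """Train a binary k-NN PAD classifier: label 0 = real, label 1 = fake/attack."""
    X = np.vstack([real_features, attack_features])
    y = np.concatenate([np.zeros(len(real_features)), np.ones(len(attack_features))])
    clf = KNeighborsClassifier(n_neighbors=k)
    clf.fit(X, y)
    return clf

def predict_pad(clf, test_features):
    """Return 0/1 PAD decisions for a batch of test feature vectors."""
    return clf.predict(test_features)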

First Task (defined Dec. 8th)

  • The data is available at https://www.cosy.sbg.ac.at/~uhl/Data_prepared.zip
  • We start by establishing the ground truth for PLUS, IDIAP and SCUT, i.e. the PAD system is trained with type (1) and type (2) samples and also evaluated on those (in a five-fold cross-validation protocol). We only use those type (1) samples for which type (2) samples exist (balanced sets). The folds are constructed by partitioning the users into approximately equal folds. We use 4/5 of the data as training data for the k-NN and check the classification accuracy on the remaining 1/5 of the data. This is repeated 5 times (5 folds) to obtain classification rates for each entire dataset.
  • We do the same, but now successively reduce the amount of training data:
    Step 1: We repeat the same 5 folds to obtain all results, but now the training for each fold is done on 2/5 of the data only.
    Step 2: We repeat the same 5 folds to obtain all results, but now the training for each fold is done on 1/5 of the data only.
    Use a fold selection scheme in which the data used to construct the reduced training sets is selected randomly (it is always a subset of the training data used for the ground truth results), while still keeping the subjects in the training and test sets separate.
  • Perform the evaluation according to the metrics defined in the paper I have sent: ACER, APCER as well as BPCER (see the sketch after this list).
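
One possible way (in Python) to construct the subject-disjoint folds, to pick the reduced training folds for Steps 1 and 2, and to compute the requested metrics (APCER: attack samples classified as real; BPCER: real samples classified as attack; ACER: the average of the two). The function names are only suggestions; extracting the user IDs from the filenames is dataset-specific (see the naming conventions below):

import random

def make_user_folds(user_ids, n_folds=5, seed=0):
    """Partition the user IDs into n_folds approximately equal,
    subject-disjoint folds (each user ends up in exactly one fold)."""
    users = sorted(set(user_ids))
    random.Random(seed).shuffle(users)
    return [users[i::n_folds] for i in range(n_folds)]

def reduced_training_folds(train_folds, n_keep, seed=0):
    """Randomly keep only n_keep of the 4 training folds (Step 1: 2, Step 2: 1);
    training and test subjects stay disjoint by construction."""
    return random.Random(seed).sample(train_folds, n_keep)

def pad_metrics(y_true, y_pred):
    """Labels: 0 = real (bona fide), 1 = fake (attack).
    APCER: fraction of attack samples classified as real.
    BPCER: fraction of real samples classified as attack.
    ACER:  (APCER + BPCER) / 2."""
    attacks  = [p for t, p in zip(y_true, y_pred) if t == 1]
    bonafide = [p for t, p in zip(y_true, y_pred) if t == 0]
    apcer = sum(1 for p in attacks if p == 0) / len(attacks)
    bpcer = sum(1 for p in bonafide if p == 1) / len(bonafide)
    return apcer, bpcer, (apcer + bpcer) / 2.0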

Second Task (defined Dec. 31st)

  • Step 3: We start from Step 2 of the First Task: we repeat the same 5 folds to obtain all results, and again the training for each fold is done on 1/5 of the real data only (Real samples and Fake samples). But, additionally, we double the size of the training set: we add back the Real samples that had been removed to arrive at Step 2 (from Step 1), and we add the Synthetic samples whose corresponding Fake samples had been removed to arrive at Step 2 (from Step 1). So the size of the training set is the same as in Step 1, but it is composed differently.
  • Step 4: We further increase the size of the training set: we repeat the same 5 folds to obtain all results, and again keep 1/5 of the real data (Real samples and Fake samples) in the training set for each fold. In addition, the remaining three training folds contribute their Real samples and the corresponding Synthetic samples as training data. The size of the training data is the same as in Step 5 (below), but it is composed differently.
  • Step 5 (to compare against the results from the paper): We again work on the 5 folds, but the training data contains only Real and Synthetic samples. Thus the training set for each fold consists of all Real samples from the other 4 folds plus the corresponding Synthetic samples from those 4 folds. (A sketch of how the training sets for Steps 3 to 5 can be composed follows after this list.)
  • Perform the evaluation according to the metrics defined in the paper I have sent: ACER, APCER as well as BPCER.
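
A sketch of how the training set for one test fold could be composed in Steps 3 to 5, under the assumption that real, fake and synthetic map each fold index to its sample (or feature) lists and that synthetic[f] holds the synthetic counterparts of the fake samples in fold f; the test fold is always evaluated real vs. fake:

def training_set(step, train_folds, kept_fold, dropped_fold, real, fake, synthetic):
    """Compose (real_part, attack_part) of the training data for one test fold.
    train_folds:  the 4 fold indices not used for testing
    kept_fold:    the single fold whose Real+Fake samples are kept (as in Step 2)
    dropped_fold: the fold that was removed when going from Step 1 to Step 2
    real/fake/synthetic: dicts mapping fold index -> list of samples"""
    if step == 3:
        # kept fold: Real + Fake; the dropped fold is re-added as Real + Synthetic
        return (real[kept_fold] + real[dropped_fold],
                fake[kept_fold] + synthetic[dropped_fold])
    if step == 4:
        # kept fold: Real + Fake; the other three training folds as Real + Synthetic
        others = [f for f in train_folds if f != kept_fold]
        return (real[kept_fold] + [s for f in others for s in real[f]],
                fake[kept_fold] + [s for f in others for s in synthetic[f]])
    if step == 5:
        # all four training folds: Real + Synthetic only (no Fake samples at all)
        return ([s for f in train_folds for s in real[f]],
                [s for f in train_folds for s in synthetic[f]])
    raise ValueError("step must be 3, 4 or 5")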

Dataset description

There is one folder for each dataset:
IDIAP
PLUS
PROTECT [this dataset is ignored for now !!]
SCUT

Each folder contains subfolders for the genuine examples (real samples), the spoofed ones (fake samples) and the synthetic ones:
genuine
spoofed
spoofed_synthetic_[GANmethod]

where GANmethod describes the GAN used to generate the data: starGANv2, drit, distanceGAN, cycleGAN. You should use all of them and compare the results.

For the synthetic subfolders there are further subfolders, the first of which indicates the ID of the variant. You should use: 009 (IDIAP), 003 (PLUS), 007 (SCUT) [and 010 (PROTECT)].

The subfolder path then continues with /all_rs/reference/, where you find the synthetic data, following the naming conventions for each dataset as described below.
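
Putting this folder layout together, the file lists for one dataset and GAN method could be collected roughly as follows (the variant IDs are those listed above; the exact placement of the files inside genuine/ and spoofed/ is not spelled out here, hence the recursive search):

import glob, os

VARIANT = {"IDIAP": "009", "PLUS": "003", "SCUT": "007"}   # PROTECT (010) is ignored for now
GAN_METHODS = ["starGANv2", "drit", "distanceGAN", "cycleGAN"]

def list_samples(root, dataset, gan):
    """Return (genuine, spoofed, synthetic) file lists for one dataset and GAN method;
    root is the directory into which Data_prepared.zip has been extracted."""
    base = os.path.join(root, dataset)
    genuine = glob.glob(os.path.join(base, "genuine", "**", "*.*"), recursive=True)
    spoofed = glob.glob(os.path.join(base, "spoofed", "**", "*.*"), recursive=True)
    syn_dir = os.path.join(base, "spoofed_synthetic_" + gan,
                           VARIANT[dataset], "all_rs", "reference")
    synthetic = glob.glob(os.path.join(syn_dir, "**", "*.*"), recursive=True)
    return genuine, spoofed, synthetic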

Sample naming conventions

PLUS

The filenames are encoded using the following structure: [scanner name]_[DORSAL/PALMAR]_[user ID]_[session ID]_[finger ID]_[image ID].png
An example filename is: PLUS-FV3-Laser_PALMAR_001_01_02_01.png

- scanner name: the name of the scanner used to capture the image, either PLUS-FV3-Laser or PLUS-FV3-LED
- DORSAL/PALMAR: denotes whether the image is captured from the dorsal or the palmar side
- user ID: the user ID, three digits, where 001 denotes the first user and 060 denotes the 60th user
- session ID: the session ID, two digits, where 01 denotes the first session
- finger ID: the finger ID, two digits, numbered from the left thumb (01) to the right pinky finger (10); the following fingers are present:
02: left index finger
03: left middle finger
04: left ring finger
07: right index finger
08: right middle finger
09: right ring finger
- image ID: the image ID, two digits, starting from 01.
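
A small, hypothetical helper for splitting a PLUS filename into its fields (field order as in the example above):

import os

def parse_plus_filename(path):
    """Split e.g. 'PLUS-FV3-Laser_PALMAR_001_01_02_01.png' into its fields."""
    name = os.path.splitext(os.path.basename(path))[0]
    scanner, side, user_id, session_id, finger_id, image_id = name.split("_")
    return {"scanner": scanner, "side": side, "user": user_id,
            "session": session_id, "finger": finger_id, "image": image_id}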

IDIAP

Samples are stored with the following path convention: full/bf/004-F/004_L_2. The fields can be interpreted as [size]/[type]/[subject]-[gender]/[subject]_[side]_[session]. The [size] field is one of two options, full or cropped. The images in the full directory contain the full image produced by the sensor. The images in the cropped directory represent pre-cropped regions of interest (RoI) which can be directly used for feature extraction without region-of-interest detection. Both verification and presentation-attack detection protocols are provided for the full and cropped versions of the images.

The [type] field may be one of bf (bona fide) or pa (presentation attack) and represents the genuineness of the image. Naturally, biometric recognition uses only images from the bf folder for all protocols. The [subject] field is a 3-digit number that stands for the subject's unique identifier. The [gender] value can be either M (male) or F (female). The [side] field corresponds to the index-finger side and can be set to either "R" or "L" ("Right" or "Left"). The [session] field corresponds to either the first (1) or the second (2) time the subject interacted with the device.
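
A corresponding hypothetical helper for decomposing an IDIAP sample path into the fields described above:

def parse_idiap_path(path):
    """Split e.g. 'full/bf/004-F/004_L_2' into its fields."""
    size, sample_type, subject_dir, stem = path.strip("/").split("/")
    subject, gender = subject_dir.split("-")
    _, side, session = stem.split("_")
    return {"size": size, "type": sample_type, "subject": subject,
            "gender": gender, "side": side, "session": session}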

SCUT

Images are labeled as follows: ID_finger_session_shot_light.bmp, where “ID” stands for the client's ID, “finger” ranges from 1 to 6, standing for the index, middle and ring finger of the right and left hand respectively, “session” stands for the session number, which can be "0" or "1", “shot” stands for the considered shot number, ranging from 0 to 5, and “light” stands for the level of light intensity, which can be an integer between 1 and 6.

Images from the same client are grouped into a single folder labeled with the client's ID.

NOTE: When processing these images, you only need to consider the first two labels, i.e. “ID” and “finger”.
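
A hypothetical helper that extracts just these two fields from a SCUT filename:

import os

def parse_scut_filename(path):
    """Split a filename of the form ID_finger_session_shot_light.bmp and return
    only the fields needed here: the client ID and the finger (1-6)."""
    name = os.path.splitext(os.path.basename(path))[0]
    fields = name.split("_")
    client_id, finger = fields[0], int(fields[1])
    return client_id, finger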