Peter Wijeratne: Computer Vision @ University of Sussex - Spring 2024
Coursework Assignment
1 Assignment Overview
This assignment involves designing, building, testing and critiquing a system for performing
face alignment, i.e. locating facial landmarks in images. There is also a secondary
extension task, detailed below.
This assignment is worth 80% of the grade for this module. It is designed to ensure you can
demonstrate achieving the learning outcomes for this module, which are:
• Write and document a computer program to extract useful information from image data.
• Propose designs for simple computer vision systems.
• Determine the applicability of a variety of computer vision techniques to practical problems.
• Describe and recognise the effects of a variety of image processing operations.
1.1 Secondary Task
You will design and implement a system for modifying the colour of the lips and/or the eyes in
the image. This should be achieved through a simple algorithmic procedure, either using the
estimated landmarks to identify the correct region or through another segmentation strategy.
This aspect is worth 25% of the marks for this assignment.
2 What to hand in?
1. A report that comprises a maximum of 8 pages and 1500 words, including captions but
excluding references. I'm expecting several pictures, diagrams/flowcharts and charts to be
included.
• A summary and justification for all the steps in your face alignment system, including
preprocessing, choice of image features and prediction model. Explaining diagram-
matically is very welcome.
• Results of your experiments: This should include some discussion of qualitative (ex-
ample based) and quantitative (number based) comparisons between different ap-
proaches that you have experimented with.
• Qualitative examples of your face alignment approach running on the small set of
provided example images, found in the compressed numpy file (examples.npz) here.
• Examples of failure cases in the face alignment system and a critical analysis of these,
identifying potential biases of your approach.
• A brief summary of your system for modifying the colour of the lips and/or eyes.
2. A .csv file that contains the face landmark positions on the test set of images, found in the
compressed numpy file (test_images.npz) here. You must use the provided "save_as_csv"
function in the colab worksheet to convert an array of shape
(number_test_image, number_points, 2) to a csv file. Please make sure you run this on the
right data and submit in the correct format to avoid losing marks. Please only include a
single .csv file in the submission.
3. Either .ipynb files or .py files containing annotated code for all data preprocessing, model
training and testing.
4. You may optionally include your trained model parameters, but please do not hand in any
other additional files, datasets or supplementary results as this complicates the marking
process.
Please only use the .zip archive format for your submission; do not use
.rar, .7z or .arc.
3 How will this be graded?
The breakdown of marks (as a %) for this assignment is given below:
20 Marks Accuracy and robustness of face alignment
These marks are allocated based on the performance of the face alignment method. This
will be evaluated on the held-out test set, which includes some difficult cases. The test
images, without annotations, are provided in the compressed numpy file (test_images.npz)
here, and the error on the predicted points will be calculated after submission. Marks will
be awarded for average accuracy and robustness (% of images with error below a certain
threshold).
30 Marks Outline of methods employed
Justifying and explaining design decisions for the landmark finding. This does not have to
be in depth, and I do not expect you to regurgitate the contents of the lecture notes/papers.
You should state clearly:
• any image pre-processing steps you have used, and why.
• what image features/representation you have used, briefly describing how they were
calculated and why you chose them.
• what prediction methods you have used, what ML task this corresponds to, the loss
function that your system is trained with, and a description of any regularisation that
you may have used.
• design/parameter decisions, which should be explained and justified.
For top marks, you should clearly demonstrate a creative and methodical approach for
designing your system, drawing ideas from different sources and critically evaluating your
choices. Explaining using diagrams and/or flowcharts is very welcome.
20 Marks Analysing results and failure cases
Critically evaluate the results produced by your system on validation data. You should
include quantitative (number based) and qualitative (example based) comparisons between
different approaches that you have tried (on a held-out validation set).
Quantitative measures include the cumulative error distribution (see lecture
slides), or boxplots or other plots to compare methods. Please note that we are
interested in your final prediction results, rather than how the cost function changes during
training. Please explicitly define any evaluation metrics and ensure they are appropriate
for the task.
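As a rough illustration of the quantitative measures mentioned above, the sketch below computes a per-image error and a cumulative error distribution with NumPy. It assumes the error is the mean Euclidean distance between predicted and ground-truth points, which may differ from the metric in the provided worksheet (for example, that metric may be normalised). The function names are hypothetical.

```python
import numpy as np

def mean_point_error(pred, gt):
    """Mean Euclidean distance per image.

    Assumption: pred and gt both have shape (n_images, n_points, 2).
    """
    return np.linalg.norm(pred - gt, axis=-1).mean(axis=-1)

def cumulative_error_distribution(errors, thresholds):
    """Fraction of images whose error is at or below each threshold."""
    errors = np.asarray(errors)
    thresholds = np.asarray(thresholds)
    return (errors[None, :] <= thresholds[:, None]).mean(axis=1)
```

Plotting the returned fractions against the thresholds gives the cumulative error curve, and reading the curve at a single threshold gives the robustness figure (% of images with error below that threshold).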
A detailed qualitative analysis would investigate and identify systematic failure cases and
biases, providing visual examples, and proposing potential solutions.
25 Marks Lip/eye colour modification
Outline the employed methodology, ideally using a diagram or flowchart to explain the
steps. Provide several example results and illustrate some failure cases. The solutions do
not need to be complicated, but they should be clearly explained and appropriate for the
task. Marks will be allocated for the quality of the description, appropriateness of the
method and analysis and presentation of the results.
5 Marks Code annotation
These marks are for annotating sections of the training/testing code with what they
do. To get maximum marks, explain each algorithmic step (not necessarily each line) in
your notebook/.py files.
General Points on the report
• Read things! Provide references to anything you find useful. You can take figures from
other works as long as you reference them appropriately.
• Diagrams, flowcharts and pictures are very welcome! Make sure you label them properly
and refer to them from the text.
• All plots should have correctly labelled axes.
4 What resources are provided for me?
The training images are provided for you in a compressed Python array. They have already been
preprocessed to be the same size with the faces roughly (but not exactly) in the middle of the
image. The training data can be downloaded as compressed numpy files (training_images.npz)
here.
The data can be read by:
import numpy as np
# Load the data using np.load
data = np.load('training_images.npz', allow_pickle=True)
# Extract the images: shape = (2811, 254, 254, 3)
images = data['images']
# and the data points: shape = (2811, 44, 2)
pts = data['points']
In this very basic colab worksheet I provide code for:
• Loading the data
• Visualising points on an image
• Estimating the error between predictions and ground truth.
• Saving the results to a .csv file, which contains some checks to make sure you're predicting
on the correct dataset.
A set of test images, without landmarks, is provided in the compressed numpy array (test_images.npz)
here. This data is loaded the same way as before, but there are no points stored in the file.
I also include 6 images to use for qualitative comparisons found in the compressed numpy
array (examples.npz) here. These images should be included in your report to demonstrate face
alignment performance across different genders, ethnicities and poses.
4.1 Notes on using Colab
You can complete this project either using Google colab, which gives you a few hours of computing
time completely free of charge, or on your personal/lab machine. The lab machines
are fairly powerful, so if you need more computing resources then try those!
If you are using Google colab, try and familiarise yourself with some of its useful features.
To keep your saved models, preprocessed data etc. you can save it to Google drive following
the instructions here. You can also directly download a file you make in colab using the code
below:
from google.colab import files
files.download(filename)
If you refactor code into extra .py files, these should be stored in your Google Drive as well,
or on Box, so that they are easy to load into your Colab worksheet.
4.2 Most important links
Contents                                    filetype                                      links
Training images and points                  compressed numpy array (training_images.npz)  link
Test images                                 compressed numpy file (test_images.npz)       link
Example images for qualitative comparisons  compressed numpy file (examples.npz)          link
Colab worksheet with some useful functions  colab worksheet                               link
4.3 What library functionality can I use?
You're free to use fundamental components and functions from libraries such as OpenCV, NumPy,
SciPy and scikit-learn to solve this assignment, although you don't have to. Here, fundamental
components refers to things like regression/classification models, pre-processing/feature
extraction steps and other basic functionality. What you are not allowed to use are library
functions that have been written to directly solve the tasks you have been given, i.e. face
alignment. You cannot use the dlib or mediapipe face alignment tools or anything that provides
similar functionality. Also, face detection is not required on this data.
In terms of tools and frameworks, it's absolutely fine to use convolutional neural networks
(CNNs) if you want to, which are introduced in Fundamentals of Machine Learning. The best
packages would be either TensorFlow (probably with Keras) or PyTorch. If you use such an
approach you should be sure to document how you chose the architecture and loss functions.
A well-justified and high-performing CNN approach will receive marks as high as if
you'd built it any other way.
In terms of sourcing additional labelled data, this is not allowed for this assignment. This
is because in real-world commercial projects you will typically have a finite dataset, and even if
there are possibly useful public datasets available, their license normally prohibits commercial
use. On the other hand data augmentation, which effectively synthesises additional training
examples from the labelled data that you have, is highly encouraged. If you use this, please try
and add some text or a flow-chart of this process in your report.
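To make the idea of data augmentation concrete, here is a minimal sketch of one possible augmentation: shifting an image by a few pixels and moving its landmarks by the same amount. It assumes each landmark is stored as (x, y) in pixel coordinates, which you should check against the data. The function name is hypothetical, and this is only one of many augmentations you could consider.

```python
import numpy as np

def translate_example(image, points, dx, dy):
    """Shift an image by (dx, dy) pixels and move its landmarks with it.

    Assumptions: image has shape (H, W, C); points has shape (n_points, 2)
    with each row stored as (x, y). Pixels shifted in from outside are zero.
    """
    h, w = image.shape[:2]
    shifted = np.zeros_like(image)
    # Source and destination slices covering the region that stays in view.
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_x = slice(max(0, dx), min(w, w + dx))
    src_y = slice(max(0, -dy), min(h, h - dy))
    dst_y = slice(max(0, dy), min(h, h + dy))
    shifted[dst_y, dst_x] = image[src_y, src_x]
    new_points = points + np.array([dx, dy], dtype=points.dtype)
    return shifted, new_points
```

Note that some augmentations need more care than this one. A horizontal flip, for instance, also requires swapping the left and right landmark indices, which depends on how the points are ordered.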
5 Where do I start?
5.1 Face Alignment
Face alignment is covered in lecture 14, so that's a good place to look for information.
I have included a very basic colab worksheet illustrating how to load the data and visualise
the points on the face.
The simplest approach would be to treat this as either a regular or a cascaded regression
problem, where given an image you want to predict the set of continuous landmark coordinate
locations. To follow this approach you will need to consider what image features are helpful
to predict the landmarks and what pre-processing is required on the data. Although you could
directly use the flattened image as input, this will not be the optimal data representation for
this task.
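The regular regression formulation can be sketched as follows: a minimal ridge (L2-regularised linear) regression in NumPy that maps one feature vector per image to the flattened landmark coordinates. The function names and the alpha parameter are hypothetical, and the choice of ridge regression is an assumption for illustration rather than a recommended model.

```python
import numpy as np

def fit_ridge(features, points, alpha=1.0):
    """Fit a linear map from features to flattened landmark coordinates.

    Assumptions: features has shape (n_images, n_features) and points has
    shape (n_images, n_points, 2).
    """
    n = features.shape[0]
    X = np.hstack([features, np.ones((n, 1))])  # append a bias column
    Y = points.reshape(n, -1)
    A = X.T @ X + alpha * np.eye(X.shape[1])
    A[-1, -1] -= alpha  # leave the bias term unregularised
    return np.linalg.solve(A, X.T @ Y)

def predict_points(weights, features):
    """Predict landmarks of shape (n_images, n_points, 2) from features."""
    n = features.shape[0]
    X = np.hstack([features, np.ones((n, 1))])
    return (X @ weights).reshape(n, -1, 2)
```

Here features could be the flattened image or any other representation; the regression code does not change when the representation does, which makes it easy to compare alternatives on a validation set.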
A better representation would be to describe a set of locations, either evenly spaced across
the image, or in some more useful pattern (think about where in the image you might want to
calculate more information) using a feature descriptor, such as SIFT. These descriptions can
then be concatenated together and used as input into a linear regression model. Note that you
do not need to use the keypoint detection process for this task - rather the descriptors should be
computed at defined locations (hint: look at sift.compute() or similar) to create a representation
of the image that is comparable across the dataset.
You're not restricted to taking this approach, and for higher marks creativity is very much
encouraged. Face alignment has seen a lot of interesting and varied ideas, and if you find some
good ideas while reading around the topic that would be great.
5.2 Lips/Eye colour modification
We're looking for simple solutions for this task, which could be based on the landmarks you
are predicting and/or colour. One approach would be to segment the required pixels and then
modify the colour within the segmented region, although you could investigate other solutions.
I am intentionally not providing a training set of data for this task. OpenCV has some useful
functionality for this; take a look at cv2.fillPoly.