1 Introduction
1.1 Summary
In this assignment you will explore a modified real dataset and practice the typical machine
learning process. This assignment is designed to help you become more confident in applying
machine learning approaches to solving tasks. In this assignment you will:
1. Select the appropriate ML techniques and apply them to solve a real-world ML problem.
2. Analyse the output of the algorithm(s).
3. Research how to extend the modelling techniques that are taught in class.
4. Provide an ultimate judgement of the final trained model that you would use in a real-world setting.
To complete this assignment, you will require skills and knowledge from the lecture and lab material for Weeks 1 to 4 (inclusive). If you have already started the assignment quick pick, you already have the tools for kick-starting this assignment. You may find that you are unable to complete some of the activities until you have completed the relevant lab work. However, you will be able to commence work on some sections, so do the work you can initially and continue to build in new features as you learn the relevant skills. A machine learning model cannot be developed within a day or two, so start early.
This assignment has four deliverables. Please note that you will not receive any marks unless you submit all four deliverables. While you must submit all four, you will be marked only on your PDF report and presentation. That means we will not consider any part of your code that your report and presentation do not cover. The deliverables are:
1. A PDF report, preferably converted from your notebook, meeting the following criteria:
• Bullet point format:
Raise each point in its own bullet point with a clear topic, then summarise the important details under that title (this description is an example of the format). The report must be no more than 4 pages (plus up to 2 pages for references and graphs).
• Graphs:
Your report should include the graphs produced by your analysis.
• Markdown:
If you are using a notebook, it needs to follow the format of the provided tutorials. That means the report should include markdown text explaining your rationale, the critical analysis of your approach, and your ultimate judgement.
• Specification:
The report needs to be self-explanatory, well structured, and fulfil all the assignment specifications.
2. A video presentation, meeting the following criteria:
• Presentation format:
You need to use your PDF report as the basis of your presentation. In the presentation, go through each bullet point and explain it in detail, including your judgement.
• Presentation length:
Your presentation should be 10 minutes long (minimum 9, maximum 11 minutes). Do not exceed 11 minutes, as you will be marked only on the first 11 minutes of your presentation. You will lose marks if it is shorter than 9 minutes.
• Must cover:
Fulfil all the assignment specifications, based on your PDF report. Share the window containing your PDF report while presenting, and keep your camera on so your face is also captured in the video.
3. A set of predictions, meeting the following criteria:
Your predictions must be based on your final method and your ultimate judgement. A sample solution file is provided; its ID column must contain the IDs of the rows selected from Data_Set that make up your test set (manual selection is not acceptable).
4. Your Jupyter notebook, meeting the following criteria:
The notebook used to perform your modelling and analysis, with instructions on how to run it and embedded explanatory comments. Remember that code is used only for reference: unless you also include your comments in the report and presentation, you will not receive any marks for them.
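As a hedged illustration of deliverable 3, the sketch below writes one prediction per test-set row, keyed by that row's original ID from Data_Set. The function name `write_predictions` and the column names `ID` and `Prediction` are placeholders chosen for this example; check sample_solution.csv for the exact header it expects.

```python
import csv
import io

def write_predictions(ids, predictions, handle, id_col="ID", target_col="Prediction"):
    """Write (ID, prediction) rows to an open text handle in CSV format.

    id_col and target_col are placeholder names (assumptions): match them
    to the header used in sample_solution.csv.
    """
    if len(ids) != len(predictions):
        raise ValueError("ids and predictions must have the same length")
    writer = csv.writer(handle)
    writer.writerow([id_col, target_col])
    for row_id, pred in zip(ids, predictions):
        writer.writerow([row_id, pred])

# Usage with an in-memory buffer; in practice you would pass a real file,
# e.g. open("predictions.csv", "w", newline="").
buffer = io.StringIO()
write_predictions([17, 42], [71.3, 65.0], buffer)
print(buffer.getvalue())
```

The IDs passed in should come from the randomly selected test rows, not be typed in by hand.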
1.2 Learning Outcomes
This assignment contributes to the following course CLOs:
• CLO 1: Understand the fundamental concepts and algorithms of machine learning and applications.
• CLO 3: Set up a machine learning configuration, including processing data and performing feature engineering, for a range of applications.
• CLO 4: Apply machine learning software and toolkits for diverse applications.
• Acknowledged words, data, diagrams, models, frameworks and/or ideas of others you have quoted (i.e. directly copied), summarised, paraphrased, discussed or mentioned in your assessment through the appropriate referencing methods.
• Provided a reference list of the publication details so your reader can locate the source if necessary. This includes material taken from Internet sites. If you do not acknowledge the sources of your material, you may be accused of plagiarism, because you have passed off the work and ideas of another person without appropriate referencing, as if they were your own.
2 Task
In this assignment, you will predict the life span of a person based on several attributes (features) related to the region in which they were born.
• Roughly 2000 instances and 20 features/attributes for the training data
• Metadata describing each feature is included for more insight into the data
• Train regression models and use them to predict life expectancy
• Make an ultimate judgement of the best regression model you would choose
  - Remember: "The best model (hypothesis) that you can justify"
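To make "train regression models" concrete, here is a minimal sketch of multivariate linear regression trained by batch gradient descent on mean squared error, one possible choice of loss function and optimisation procedure. The learning rate, epoch count and toy data are illustrative assumptions, and the features are assumed to be standardised already (gradient descent converges poorly on features with very different scales).

```python
def fit_linear_gd(X, y, lr=0.1, epochs=2000):
    """Fit y ≈ b + X·w by batch gradient descent on mean squared error."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        # Residuals (prediction minus target) of the current model on every row.
        residuals = [b + sum(wj * xj for wj, xj in zip(w, row)) - t
                     for row, t in zip(X, y)]
        # Gradient of (1/n) * sum(residual^2) with respect to each weight and the bias.
        grad_w = [2.0 / n * sum(r * row[j] for r, row in zip(residuals, X))
                  for j in range(d)]
        grad_b = 2.0 / n * sum(residuals)
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict_linear(X, w, b):
    return [b + sum(wj * xj for wj, xj in zip(w, row)) for row in X]

# Toy, noise-free data generated from y = 70 + 2*x1 - x2 (illustration only,
# not the assignment data).
X = [[-1.0, 0.5], [0.0, -1.0], [1.0, 0.0], [0.5, 1.0], [-0.5, -0.5]]
y = [70 + 2 * x1 - x2 for x1, x2 in X]
w, b = fit_linear_gd(X, y)
print([round(v, 3) for v in w], round(b, 3))
```

Because the toy data is exactly linear, the learned weights should approach the generating coefficients; on real data the residual error will not vanish.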
You will also set up an evaluation framework, including selecting appropriate performance measures and determining how to split the data into training and validation data (manual splitting is not acceptable).
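One way to avoid a manual split is to let a seeded random shuffle decide which rows are used for validation. The sketch below is a plain-Python k-fold splitter; the function name `k_fold_indices` and the default `k=5` are assumptions for illustration. If scikit-learn is available, `sklearn.model_selection.KFold(n_splits=..., shuffle=True, random_state=...)` implements the same idea.

```python
import random

def k_fold_indices(n_rows, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation.

    Rows are shuffled once with a fixed seed, so the split is random but
    reproducible; every row appears in exactly one validation fold.
    """
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and n_rows")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Spread the remainder over the first folds so fold sizes differ by at most 1.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size

for train_idx, val_idx in k_fold_indices(10, k=3, seed=1):
    print(len(train_idx), len(val_idx))
```

Averaging a metric over the k validation folds gives a less noisy estimate than a single train/validation split, at the cost of training k models.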
You need to come up with an approach (that follows the Restrictions section below) where each element of the system is justified using data analysis, performance analysis and/or knowledge from relevant literature.
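Performance analysis for a regression task usually rests on a few standard measures. The hedged sketch below implements MAE, RMSE and R² in plain Python and compares a hypothetical model against a baseline that always predicts the mean; the toy values are invented for illustration and are not from the assignment data.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error, in the same units as the target (e.g. years)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error; penalises large errors more heavily than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Toy validation targets and predictions (illustration only).
y_val = [70.0, 65.0, 80.0, 75.0]
# Mean baseline; in practice use the mean of the *training* targets.
baseline = [sum(y_val) / len(y_val)] * len(y_val)
model = [71.0, 64.0, 78.0, 76.0]
for name, pred in [("mean baseline", baseline), ("model", model)]:
    print(name, round(mae(y_val, pred), 3), round(rmse(y_val, pred), 3),
          round(r2(y_val, pred), 3))
```

A model that cannot beat the mean baseline (R² near or below 0) is hard to justify, whichever measure you choose to report.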
• As one of the aims of the assignment is to become familiar with the machine learning paradigm, you should evaluate a couple of different models (using only techniques taught in class up to Week 4, inclusive) to determine which one is most appropriate for this task.
• Set up an evaluation framework, including selecting appropriate performance measures and determining how to split the data.
• Finally, analyse the model and its results using appropriate techniques, establish how adequate the model is for performing the task in the real world, and discuss any limitations (ultimate judgement).
• Predict the results for the test set.
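Predicting for the test set first requires holding out a random portion of the whole dataset while keeping each row's ID. The sketch below assumes the CSV has an `ID` column; the other column names, the 20% test fraction, the seed and the function name `train_test_split_rows` are hypothetical, and an in-memory sample stands in for Data-set.csv.

```python
import csv
import io
import random

def train_test_split_rows(rows, test_fraction=0.2, seed=42):
    """Randomly split a list of row dicts into (train_rows, test_rows).

    A seeded shuffle makes the split random but reproducible.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Stand-in for open("Data-set.csv"); the columns after ID are hypothetical.
sample = io.StringIO(
    "ID,FeatureA,FeatureB,Target\n"
    + "".join(f"{i},{i * 0.5},{i % 3},{60 + i}\n" for i in range(1, 11))
)
rows = list(csv.DictReader(sample))
train_rows, test_rows = train_test_split_rows(rows, test_fraction=0.2, seed=42)
test_ids = [int(r["ID"]) for r in test_rows]
print(len(train_rows), len(test_rows), sorted(test_ids))
```

Fixing the seed lets you regenerate exactly the same test IDs when writing the prediction file; scikit-learn's `train_test_split` with `random_state` is an equivalent alternative.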
2.1 Data Set
The dataset for this assignment is available on Canvas. It has been modified and pre-processed to some extent, such that all the attributes/features are integers or floats, and missing values have been estimated and filled in.
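Because every attribute is numeric and missing values have already been filled in, a first EDA pass can go straight to summary statistics and each feature's correlation with the target. The sketch below uses only the standard library; the function names and toy columns are assumptions, and real feature names would come from metadata.txt. Correlations here are for understanding the data, not for dropping features, which the Restrictions do not allow.

```python
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sum((a - mx) ** 2 for a in xs) ** 0.5
    sy = sum((b - my) ** 2 for b in ys) ** 0.5
    # A constant column has no defined correlation.
    return cov / (sx * sy) if sx and sy else float("nan")

def summarise(columns, target):
    """Per-feature summary statistics plus correlation with the target."""
    return {
        name: {
            "mean": statistics.fmean(values),
            "std": statistics.pstdev(values),
            "min": min(values),
            "max": max(values),
            "corr_with_target": pearson(values, target),
        }
        for name, values in columns.items()
    }

# Toy columns for illustration only.
columns = {"FeatureA": [1.0, 2.0, 3.0, 4.0], "FeatureB": [4.0, 3.0, 2.0, 1.0]}
target = [60.0, 62.0, 64.0, 66.0]
for name, stats in summarise(columns, target).items():
    print(name, {k: round(v, 3) for k, v in stats.items()})
```

Large differences in the std column are also a quick signal that features should be standardised before fitting regularised models.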
The following files are provided:
• Data-set.csv contains the entire dataset. You need to divide this dataset into training and test sets (do not divide the dataset manually), then perform your analysis and tasks on them.
• metadata.txt contains a brief description of each field (attribute names).
• sample_solution.csv shows the expected format for your predictions on the unseen test data (reminder: the test set is the result of randomly dividing your entire dataset into training and test sets).
2.1.1 Restrictions
As the aim of this assignment is to encourage you to explore different approaches, your approach must not explicitly perform feature selection, although you may explore feature importance and regularization. That is, your models should take all features as input (except the "ID" field, which is not an attribute).
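A minimal sketch of honouring this restriction when building model inputs: every column except the ID and the target becomes a feature, with no selection step. It also standardises features using statistics computed on the training rows only, which regularised models generally need because the penalty depends on coefficient scale. The target column name `Target`, the other column names, and all function names are hypothetical.

```python
def build_features(rows, target_col, id_col="ID"):
    """Return (X, y, feature_cols) using every column except the ID and the target.

    No feature selection is performed: all remaining columns are kept.
    """
    feature_cols = [c for c in rows[0] if c not in (id_col, target_col)]
    X = [[float(r[c]) for c in feature_cols] for r in rows]
    y = [float(r[target_col]) for r in rows]
    return X, y, feature_cols

def fit_standardiser(X):
    """Per-column mean and standard deviation, computed on training data only."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    stds = []
    for j, col in enumerate(zip(*X)):
        var = sum((v - means[j]) ** 2 for v in col) / n
        stds.append(var ** 0.5 or 1.0)  # guard against constant columns
    return means, stds

def transform(X, means, stds):
    """Apply training-set statistics to any split (train, validation or test)."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]

# Toy rows in the shape csv.DictReader produces (strings); names are hypothetical.
rows = [
    {"ID": "1", "FeatureA": "2", "FeatureB": "10", "Target": "70"},
    {"ID": "2", "FeatureA": "4", "FeatureB": "10", "Target": "72"},
    {"ID": "3", "FeatureA": "6", "FeatureB": "10", "Target": "74"},
]
X, y, cols = build_features(rows, "Target")
means, stds = fit_standardiser(X)
print(cols, transform(X, means, stds))
```

Fitting the standardiser on the training rows and reusing it for validation and test rows avoids leaking information from the held-out data.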
2.3 Marking guideline
A detailed rubric is attached on Canvas. In summary:
• Approach and ultimate judgement: 70%
• Prediction and related justification: 10%
• Report and presentation structure: 20%
Approach: You are required to use a suitable approach to find a predictive model. You may use any ML technique taught in class during Weeks 1 to 4, including linear, non-linear and regularization techniques. Each element of the approach needs to be justified using data analysis, performance analysis, your analytical argument and/or published work in the literature. This assignment isn't just about your code or model, but about the thought process behind your work. The elements of your approach may include:
• Performing EDA
• Setting up the evaluation framework
• Selecting models, loss function and optimization procedure
• Hyper-parameter setting and tuning
• Identifying problem-specific issues/properties and solutions
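Hyper-parameter tuning can be illustrated with ridge regression, one of the regularization techniques mentioned above. The hedged sketch below solves the ridge normal equations (X_cᵀX_c + λI)w = X_cᵀy_c on centred data, so the intercept is not penalised, and picks λ by k-fold cross-validated MSE. The λ grid, fold count, seed, function names and synthetic data are all assumptions; with scikit-learn you would typically use `Ridge` with `GridSearchCV` instead.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_ridge(X, y, lam):
    """Ridge regression on centred data, so the intercept is not penalised."""
    n, d = len(X), len(X[0])
    x_mean = [sum(col) / n for col in zip(*X)]
    y_mean = sum(y) / n
    Xc = [[v - m for v, m in zip(row, x_mean)] for row in X]
    yc = [t - y_mean for t in y]
    A = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0) for j in range(d)]
         for i in range(d)]
    rhs = [sum(r[i] * t for r, t in zip(Xc, yc)) for i in range(d)]
    w = solve(A, rhs)
    b = y_mean - sum(wj * m for wj, m in zip(w, x_mean))
    return w, b

def cv_mse(X, y, lam, k=5, seed=0):
    """Average validation MSE of ridge(lam) over k shuffled folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    errors = []
    for val in folds:
        val_set = set(val)
        tr = [i for i in idx if i not in val_set]
        w, b = fit_ridge([X[i] for i in tr], [y[i] for i in tr], lam)
        sq = [(b + sum(wj * xj for wj, xj in zip(w, X[i])) - y[i]) ** 2 for i in val]
        errors.append(sum(sq) / len(sq))
    return sum(errors) / k

# Synthetic data for illustration only: y = 70 + 3*x0 + noise; x1, x2 are noise.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(60)]
y = [70 + 3 * row[0] + rng.gauss(0, 1) for row in X]
grid = [0.0, 0.1, 1.0, 10.0, 100.0]
scores = {lam: cv_mse(X, y, lam) for lam in grid}
best = min(scores, key=scores.get)
print(best, round(scores[best], 3))
```

After choosing λ, the usual step is to refit on all training data with that value and only then evaluate once on the held-out test set. Note that every feature stays in the model; ridge only shrinks coefficients, so it is consistent with the no-feature-selection restriction.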