Answer all questions in this section.
Question 1
The objective is to apply clustering techniques using Python to group customers based
on the variables Recency, Frequency, and Monetary (RFM), which can help a company
analyze customer value based on past buying behavior.
Recency refers to how recently a customer has made a purchase, Frequency is defined
as how often a customer makes a purchase, and Monetary refers to how much money a
customer spends on purchases.
You will be provided with a dataset that requires preprocessing to obtain the values for
these variables.
Your task is to perform clustering analysis on the variables Recency, Frequency, and
Monetary and interpret the results. Please choose the data from 1 January 2011 to 31
March 2011 in the provided dataset for your analysis.
This assignment aims to enhance your understanding of clustering algorithms and their
application in real-world scenarios. A report and a Zip file need to be submitted,
covering the results of the following 4 sub-questions.
Data Description: The provided dataset contains all the transactions occurring in 2011
in the UK for online retail, including the following variables:
• InvoiceNo: The number of the invoice, unique to each purchase. Refund invoice
numbers contain "C"
• StockCode: Unique code for each item
• Description: Name of the item
• Quantity: The number of items within the invoice
• InvoiceDate: Date and time of the purchase
• UnitPrice: Price of a single item
• CustomerID: Unique ID number for each customer
Question 1a
Data Preprocessing: Choose the data from 1 January 2011 to 31 March 2011 in the
provided dataset, prepare the data by performing necessary data cleaning, like handling
missing values and removing negative values.
(10 marks)
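The cleaning steps above could be sketched as follows. This is a minimal illustration, not a prescribed solution: it assumes the data has already been loaded into a pandas DataFrame with the column names from the Data Description, and the function name clean_transactions and the small sample table are hypothetical. Treating refund invoices and non-positive quantities or prices as rows to drop is one reasonable reading of "removing negative values".

```python
import pandas as pd

def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Keep Q1 2011 rows with a known customer and positive quantity/price."""
    out = df.copy()
    out["InvoiceDate"] = pd.to_datetime(out["InvoiceDate"])

    # 1 January 2011 to 31 March 2011, inclusive of the whole last day.
    start = pd.Timestamp("2011-01-01")
    end = pd.Timestamp("2011-04-01")  # exclusive upper bound
    out = out[(out["InvoiceDate"] >= start) & (out["InvoiceDate"] < end)]

    # Rows without a CustomerID cannot be assigned to any customer.
    out = out.dropna(subset=["CustomerID"])

    # Assumption: refund invoices (InvoiceNo contains "C") are excluded.
    out = out[~out["InvoiceNo"].astype(str).str.contains("C")]

    # Assumption: only strictly positive quantities and prices are kept.
    out = out[(out["Quantity"] > 0) & (out["UnitPrice"] > 0)]
    return out.reset_index(drop=True)

# Hypothetical sample rows, for illustration only.
sample = pd.DataFrame({
    "InvoiceNo": ["536365", "C536366", "536367", "536368", "536369"],
    "Quantity": [6, -2, 3, 1, 4],
    "InvoiceDate": ["2011-01-05 10:00", "2011-01-06 11:00",
                    "2011-02-10 09:30", "2011-04-02 12:00",
                    "2011-03-31 23:00"],
    "UnitPrice": [2.5, 2.5, 1.0, 3.0, 5.0],
    "CustomerID": [17850.0, 17850.0, None, 13047.0, 13047.0],
})
cleaned = clean_transactions(sample)
print(cleaned)
```

In the sample, the refund row, the row with a missing CustomerID, and the row dated after 31 March are dropped, leaving two transactions.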
Question 1b
Prepare and analyse the data by extracting the values for Recency, Frequency, and
Monetary from the available attributes for each customer. The three variables (Recency,
Frequency, and Monetary) need to be normalized before performing the clustering.
• Calculate Frequency by counting the invoice numbers of each customer.
• Calculate Recency as the number of days since each customer's last purchase:
group by CustomerID, find the last date of purchase for each customer, and choose
a date as a point of reference from which to count the days since that purchase.
• Calculate Monetary by summing up all the amounts for each customer. Kindly
note that the total price for each purchase is Quantity × UnitPrice.
• Reference link for data preprocessing and the calculation of Recency, Frequency,
and Monetary.
• Submit the dataset with the values of Recency, Frequency, and Monetary for each
customer.
(35 marks)
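One possible way to derive and normalize the three variables is sketched below. The function names compute_rfm and min_max_normalize, the toy table, and the reference date of 1 April 2011 are all assumptions made for illustration. Frequency is taken here as the number of distinct invoice numbers per customer, and min-max scaling is only one normalization option (z-score standardization is another).

```python
import pandas as pd

def compute_rfm(df: pd.DataFrame, reference_date: pd.Timestamp) -> pd.DataFrame:
    """Return one row per customer with Recency, Frequency, and Monetary."""
    df = df.copy()
    # Total price of each line is Quantity x UnitPrice.
    df["TotalPrice"] = df["Quantity"] * df["UnitPrice"]
    grouped = df.groupby("CustomerID")
    return pd.DataFrame({
        # Days between the reference date and the customer's last purchase.
        "Recency": (reference_date - grouped["InvoiceDate"].max()).dt.days,
        # Assumption: one invoice number counts once, however many lines it has.
        "Frequency": grouped["InvoiceNo"].nunique(),
        "Monetary": grouped["TotalPrice"].sum(),
    })

def min_max_normalize(rfm: pd.DataFrame) -> pd.DataFrame:
    """Rescale each column to the range [0, 1]."""
    span = (rfm.max() - rfm.min()).replace(0, 1)  # avoid division by zero
    return (rfm - rfm.min()) / span

# Hypothetical cleaned transactions, for illustration only.
toy = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A2", "B1"],
    "Quantity": [2, 1, 5, 10],
    "UnitPrice": [3.0, 4.0, 1.0, 2.0],
    "InvoiceDate": pd.to_datetime(
        ["2011-01-10", "2011-01-10", "2011-03-01", "2011-02-15"]),
    "CustomerID": [1, 1, 1, 2],
})
reference_date = pd.Timestamp("2011-04-01")  # assumed point of reference
rfm = compute_rfm(toy, reference_date)
rfm_scaled = min_max_normalize(rfm)
print(rfm)
print(rfm_scaled)
```

The unscaled table can be written out with rfm.to_csv(...) to produce the dataset that has to be submitted.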
Question 1c
Design and implement K-means clustering using Python to cluster the customers in
terms of the triple (Recency, Frequency, Monetary). Use the Elbow method to appraise
the clustering results and find the optimum number of clusters.
(30 marks)
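The Elbow method can be illustrated with scikit-learn's KMeans by recording the within-cluster sum of squares (the inertia_ attribute) for a range of cluster counts. The sketch below runs on synthetic points standing in for a normalized RFM table; the helper name elbow_inertias, the range of k, and the data are assumptions, not part of the assignment.

```python
import numpy as np
from sklearn.cluster import KMeans

def elbow_inertias(X, k_values, random_state=0):
    """Fit K-means for each k and collect the within-cluster sum of squares."""
    inertias = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        model.fit(X)
        inertias.append(model.inertia_)
    return inertias

# Synthetic stand-in for normalized (Recency, Frequency, Monetary) rows:
# three tight groups of 50 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.1, 0.1, 0.1], [0.5, 0.8, 0.3], [0.9, 0.2, 0.9]])
X = np.vstack([c + 0.03 * rng.standard_normal((50, 3)) for c in centers])

k_values = list(range(1, 7))
inertias = elbow_inertias(X, k_values)
for k, inertia in zip(k_values, inertias):
    print(f"k={k}: inertia={inertia:.3f}")
```

Plotting inertias against k_values (for example with matplotlib) shows the curve; the "elbow" is the k after which further increases reduce the inertia only slightly. On real RFM data the bend is often less sharp than on this synthetic example, so the choice of k may need to be justified in the report.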
Question 1d
Visualize and evaluate the clusters obtained from the clustering algorithm, assess the
key findings and discuss the insights gained from the analysis of customer behavior.
(25 marks)
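As a rough sketch of how the clusters might be visualized and evaluated, the example below fits K-means on a synthetic normalized RFM table, summarizes each cluster by its mean Recency, Frequency, and Monetary, computes a silhouette score, and draws a scatter plot. The data, the choice of three clusters, and the use of the silhouette score as an evaluation measure are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for a normalized RFM table (three groups of 40 customers).
rng = np.random.default_rng(1)
centers = np.array([[0.1, 0.8, 0.9], [0.5, 0.3, 0.3], [0.9, 0.1, 0.1]])
values = np.vstack([c + 0.03 * rng.standard_normal((40, 3)) for c in centers])
rfm_scaled = pd.DataFrame(values, columns=["Recency", "Frequency", "Monetary"])

# Assumed number of clusters; in practice take it from the Elbow analysis.
model = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = model.fit_predict(rfm_scaled)

# Cluster profile: mean R, F, M per cluster, plus cluster sizes.
profile = rfm_scaled.assign(Cluster=labels).groupby("Cluster").mean()
sizes = pd.Series(labels).value_counts().sort_index()

# Silhouette score lies in [-1, 1]; higher means better-separated clusters.
score = silhouette_score(rfm_scaled, labels)

# Scatter plot of two of the three dimensions, colored by cluster label.
fig, ax = plt.subplots()
points = ax.scatter(rfm_scaled["Recency"], rfm_scaled["Monetary"],
                    c=labels, cmap="viridis")
ax.set_xlabel("Recency (normalized)")
ax.set_ylabel("Monetary (normalized)")
ax.set_title("K-means clusters on RFM (synthetic data)")
fig.colorbar(points, ax=ax, label="Cluster")

print(profile)
print(sizes)
print(f"silhouette score: {score:.3f}")
```

Calling plt.show() (or fig.savefig(...)) displays or stores the figure. The profile table is what supports the interpretation: for instance, a cluster with low Recency and high Frequency and Monetary would correspond to recent, frequent, high-spending customers.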