Section A (100 marks)

Answer all questions in this section.

Question 1

The objective is to apply clustering techniques using Python to group customers based

on the variables Recency, Frequency, and Monetary (RFM), which can help a company

analyze customer value based on past buying behavior.

Recency refers to how recently a customer has made a purchase, Frequency is defined

as how often a customer makes a purchase, and Monetary refers to how much money a

customer spends on purchases.

You will be provided with a dataset that requires preprocessing to obtain the values for

these variables.

Your task is to perform clustering analysis on the variables Recency, Frequency, and

Monetary and interpret the results. Please choose the data from 1 January 2011 to 31

March 2011 in the provided dataset for your analysis.

This assignment aims to enhance your understanding of clustering algorithms and their

application in real-world scenarios. A report and a Zip file need to be submitted,

covering the results of the following 4 sub-questions.

Data Description: The provided dataset contains all the transactions occurring in 2011

in the UK for online retail, including the following variables:

• InvoiceNo: The number of the invoice, unique to each purchase. Refund invoice numbers contain "C".

• StockCode: Unique code for each item

• Description: Name of the item

• Quantity: The number of items within the invoice

• InvoiceDate: Date and time of the purchase

• UnitPrice: Price of a single item

• CustomerID: Unique ID number for each customer

Question 1a

Data Preprocessing: Choose the data from 1 January 2011 to 31 March 2011 in the

provided dataset, prepare the data by performing necessary data cleaning, like handling

missing values and removing negative values.

(10 marks)
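
The cleaning steps in Question 1a could be sketched roughly as follows. This is a minimal illustration using pandas, not a model answer. The column names come from the data description above; the function name `clean_transactions`, the small inline sample, and the choice to drop rows with zero (not just negative) Quantity or UnitPrice are assumptions made for the example.

```python
import pandas as pd


def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Keep valid purchases made between 1 Jan 2011 and 31 Mar 2011."""
    df = df.copy()
    df["InvoiceDate"] = pd.to_datetime(df["InvoiceDate"])

    # Restrict to the analysis window. The upper bound is exclusive so that
    # every time of day on 31 March is still included.
    start, end = pd.Timestamp("2011-01-01"), pd.Timestamp("2011-04-01")
    df = df[(df["InvoiceDate"] >= start) & (df["InvoiceDate"] < end)]

    # Rows without a CustomerID cannot be assigned to any customer.
    df = df.dropna(subset=["CustomerID"])

    # Refund invoices contain "C". Assumption: zero values are dropped
    # together with negative ones.
    is_refund = df["InvoiceNo"].astype(str).str.contains("C")
    df = df[~is_refund & (df["Quantity"] > 0) & (df["UnitPrice"] > 0)]

    return df.reset_index(drop=True)


# Made-up rows purely for illustration; the real data comes from the
# provided dataset.
raw = pd.DataFrame({
    "InvoiceNo": ["536365", "C536379", "540001", "540002", "550000"],
    "Quantity": [6, -1, 3, 2, 4],
    "InvoiceDate": ["2011-01-04 10:00", "2011-01-05 09:41",
                    "2011-02-10 12:30", "2011-03-31 23:10",
                    "2011-04-02 08:00"],
    "UnitPrice": [2.55, 27.50, 1.85, 1.85, 4.25],
    "CustomerID": [17850, 14527, None, 13047, 13047],
})

cleaned = clean_transactions(raw)
print(cleaned)
```

In this sample the refund row, the row with a missing CustomerID, and the April row are removed, leaving two purchases.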

Question 1b

Prepare and analyse the data by extracting the values for Recency, Frequency, and

Monetary from the available attributes for each customer. The three variables (Recency,

Frequency, and Monetary) need to be normalized before performing the clustering.

• Calculating the frequency of customers by counting Invoice numbers of each

customer.

• Calculating Recency by calculating the days since the last purchase for each customer.

Here you need to group by CustomerID and check the last date of purchase for each

customer, then choose a reference date from which to count the days since the last

purchase.

• Calculating monetary by summing up all the amounts for each customer. Kindly

note that the total price for each purchase is Quantity × UnitPrice.

• Reference link for data preprocessing and the calculation of Recency, Frequency,

and Monetary.

• Submit the dataset with the values of Recency, Frequency, and Monetary for each

customer.

(35 marks)
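
One possible way to derive the three variables in Question 1b is sketched below with pandas. Several choices here are assumptions rather than requirements of the question: the reference date of 1 April 2011 (the day after the analysis window), counting distinct invoice numbers for Frequency, min-max scaling as the normalization, the function name `compute_rfm`, and the made-up sample rows.

```python
import pandas as pd


def compute_rfm(df: pd.DataFrame, reference_date: pd.Timestamp) -> pd.DataFrame:
    """Return one row per customer with Recency, Frequency and Monetary."""
    df = df.copy()
    df["InvoiceDate"] = pd.to_datetime(df["InvoiceDate"])
    # Total price of each line is Quantity x UnitPrice.
    df["TotalPrice"] = df["Quantity"] * df["UnitPrice"]

    rfm = df.groupby("CustomerID").agg(
        LastPurchase=("InvoiceDate", "max"),
        # Assumption: an invoice spanning several rows counts once.
        Frequency=("InvoiceNo", "nunique"),
        Monetary=("TotalPrice", "sum"),
    )
    # Days between the reference date and the customer's last purchase.
    rfm["Recency"] = (reference_date - rfm["LastPurchase"]).dt.days
    return rfm[["Recency", "Frequency", "Monetary"]]


# Made-up cleaned transactions for illustration only.
transactions = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A2", "B1", "C1"],
    "Quantity": [2, 1, 5, 10, 1],
    "InvoiceDate": ["2011-01-10", "2011-01-10", "2011-03-01",
                    "2011-02-15", "2011-03-30"],
    "UnitPrice": [3.0, 4.0, 2.0, 1.5, 20.0],
    "CustomerID": [1, 1, 1, 2, 3],
})

rfm = compute_rfm(transactions, pd.Timestamp("2011-04-01"))

# Min-max scaling puts every variable on [0, 1] so that no single variable
# dominates the distance computation during clustering.
rfm_scaled = (rfm - rfm.min()) / (rfm.max() - rfm.min())

print(rfm)
print(rfm_scaled)
```

The unscaled table can be written out with `rfm.to_csv(...)` for submission. Standardization (zero mean, unit variance) would be a reasonable alternative to min-max scaling, particularly when Monetary is heavily skewed.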

Question 1c

Design and implement K-means clustering using Python to cluster the customers in

terms of the triple (Recency, Frequency, Monetary). Use the Elbow method to appraise

the clustering results and find the optimum number of clusters.

(30 marks)
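
A minimal sketch of the Elbow method for Question 1c is shown below, assuming scikit-learn's `KMeans` is an acceptable implementation. The synthetic three-group data stands in for the normalized RFM values and is invented for the example; the helper name `elbow_inertias`, the range of k, and the seed are likewise assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans


def elbow_inertias(X, k_values, seed=0):
    """Fit K-means for each k and return the within-cluster sum of squares."""
    inertias = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        inertias.append(model.inertia_)
    return inertias


# Synthetic stand-in for normalized (Recency, Frequency, Monetary) rows:
# three tight groups of 50 points each in the unit cube.
rng = np.random.default_rng(0)
centers = np.array([[0.1, 0.1, 0.1], [0.5, 0.9, 0.5], [0.9, 0.2, 0.8]])
X = np.vstack([c + rng.normal(scale=0.03, size=(50, 3)) for c in centers])

k_values = list(range(1, 7))
inertias = elbow_inertias(X, k_values)
for k, inertia in zip(k_values, inertias):
    print(f"k={k}: inertia={inertia:.3f}")

# The elbow is the k after which inertia stops dropping sharply; for this
# synthetic data that is k=3, so the final model is fitted with 3 clusters.
final_model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
labels = final_model.labels_
```

Plotting `inertias` against `k_values` gives the usual elbow curve. On real RFM data the bend is often less sharp than in this synthetic case, so the choice of k usually needs a short justification.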

Question 1d

Visualize and evaluate the clusters obtained from the clustering algorithm, assess the

key findings and discuss the insights gained from the analysis of customer behavior.

(25 marks)
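
For Question 1d, one way to evaluate and summarize the clusters is sketched below. The silhouette score and the per-cluster mean table are choices made for this illustration; the question does not prescribe a particular metric. The helper names `profile_clusters` and `plot_clusters`, the output file name, and the six made-up customers are all hypothetical.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def profile_clusters(rfm: pd.DataFrame, labels) -> pd.DataFrame:
    """Mean Recency, Frequency, Monetary and customer count per cluster."""
    labelled = rfm.assign(Cluster=labels)
    summary = labelled.groupby("Cluster")[["Recency", "Frequency", "Monetary"]].mean()
    summary["Customers"] = labelled.groupby("Cluster").size()
    return summary


def plot_clusters(rfm: pd.DataFrame, labels, path="rfm_clusters.png"):
    """Save a Recency vs Monetary scatter plot coloured by cluster."""
    # Imported here so the summary above works without a plotting library.
    import matplotlib
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(rfm["Recency"], rfm["Monetary"], c=labels)
    ax.set_xlabel("Recency (days)")
    ax.set_ylabel("Monetary")
    ax.set_title("Customer clusters")
    fig.savefig(path)
    plt.close(fig)


# Made-up RFM values for six customers, for illustration only.
rfm = pd.DataFrame(
    {
        "Recency": [2, 5, 3, 70, 80, 75],
        "Frequency": [10, 12, 11, 1, 2, 1],
        "Monetary": [500.0, 620.0, 560.0, 40.0, 55.0, 30.0],
    },
    index=pd.Index([101, 102, 103, 104, 105, 106], name="CustomerID"),
)
scaled = (rfm - rfm.min()) / (rfm.max() - rfm.min())

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scaled)

# Silhouette ranges from -1 to 1; values near 1 mean well-separated clusters.
score = silhouette_score(scaled, labels)
summary = profile_clusters(rfm, labels)

print(f"silhouette = {score:.2f}")
print(summary)
```

Calling `plot_clusters(rfm, labels)` would write the scatter plot to disk. Reading the summary in the original units (days, invoice counts, currency) is what supports the discussion: in this toy data one cluster holds recent, frequent, high-spending customers and the other holds lapsed, low-value ones.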
