Q3 (i) [0.5 marks] Add two columns to the df_transactions dataframe:
1. A column named TX_DATE, holding the date of the transaction without any time information.
2. A column named TX_WEEK, holding the week number in which the transaction occurred (e.g., the first week of January is week 1, the second week of January is week 2, and so forth). A week is defined as starting on a Monday and ending on a Sunday.
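A minimal sketch of Q3 (i), assuming the timestamp lives in a datetime column named TX_DATETIME (the question does not name it). The helper add_date_and_week and the toy rows are hypothetical. Because the question counts the first (possibly partial) Monday-to-Sunday week of January as week 1, the sketch derives the week from the weekday of 1 January instead of using ISO-8601 weeks (`dt.isocalendar().week`), which can place 1 to 3 January in week 52 or 53 of the previous year. Which convention the grader expects is an assumption.

```python
import pandas as pd

# Hypothetical toy data; the real df_transactions is assumed (not stated in the
# question) to hold the transaction timestamp in a datetime column TX_DATETIME.
df_transactions = pd.DataFrame({
    "TX_DATETIME": pd.to_datetime([
        "2018-04-01 00:15:30",  # Sunday
        "2018-04-02 09:00:00",  # Monday
        "2018-04-08 23:59:59",  # Sunday
        "2018-04-09 00:00:01",  # Monday
    ])
})


def add_date_and_week(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add TX_DATE and a Monday-to-Sunday TX_WEEK."""
    ts = df["TX_DATETIME"]
    # Truncate each timestamp to midnight, i.e. keep only the calendar date.
    df["TX_DATE"] = ts.dt.normalize()
    # 1 January of each row's year.
    jan1 = pd.to_datetime(ts.dt.year.astype(str) + "-01-01", format="%Y-%m-%d")
    # Days since the Monday on or before 1 January (jan1.dt.weekday: Monday=0),
    # so the week containing 1 January is week 1 and each Monday starts a new week.
    offset = (ts.dt.dayofyear - 1) + jan1.dt.weekday
    df["TX_WEEK"] = offset // 7 + 1
    return df


add_date_and_week(df_transactions)
print(df_transactions[["TX_DATE", "TX_WEEK"]])
```

`ts.dt.date` would instead give plain `datetime.date` objects (object dtype); `dt.normalize()` keeps a datetime64 column at midnight, which makes the day arithmetic in the following parts simpler.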
[ ]:"""Add the columns discussed above with appropriate values here"""
# BEGIN - YOUR CODE GOES HERE
pass
# END - YOUR CODE GOES HERE
[ ]:"""Do not remove this cell."""
Q3 (ii) [1.5 marks] This question asks you to create columns that store the Frequency of transactions for customers. In particular, we are interested in the number of transactions a customer made on the previous day and in the previous week (where a week is defined as starting on a Monday and ending on a Sunday). Add the following columns to the df_transactions dataframe:
1. CUSTOMER_TOTAL_1D: the number of transactions by this customer on the previous day.
2. CUSTOMER_TOTAL_1W: the number of transactions by this customer in the previous week.
Note: The df_transactions dataframe should not have any columns that are not required by this assignment.
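A minimal sketch of Q3 (ii), assuming columns CUSTOMER_ID and TX_DATETIME, and reading "previous day" and "previous week" as the calendar day and the Monday-to-Sunday week before the one containing each transaction. Transactions are counted per customer and per day (or week start), then each row looks up the count for the preceding day (or week). The helper add_frequency_features and the toy rows are hypothetical; the helper recomputes the day itself so it runs standalone, whereas the notebook could reuse TX_DATE from Q3 (i).

```python
import pandas as pd

# Hypothetical toy data; CUSTOMER_ID and TX_DATETIME are assumed column names.
df_transactions = pd.DataFrame({
    "CUSTOMER_ID": [0, 0, 0, 0, 1],
    "TX_DATETIME": pd.to_datetime([
        "2018-04-01 10:00",  # Sunday
        "2018-04-01 12:00",  # Sunday
        "2018-04-02 09:00",  # Monday
        "2018-04-03 08:00",  # Tuesday
        "2018-04-02 11:00",  # Monday
    ]),
})


def add_frequency_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: count each customer's transactions on the previous
    calendar day and in the previous Monday-to-Sunday week."""
    cust = df["CUSTOMER_ID"]
    day = df["TX_DATETIME"].dt.normalize()
    # Monday that starts each row's week; keying on a date rather than a week
    # number means the "previous week" lookup also works across New Year.
    week_start = day - pd.to_timedelta(day.dt.weekday, unit="D")

    # Transaction counts per (customer, day) and per (customer, week start).
    daily_counts = df.groupby([cust, day]).size()
    weekly_counts = df.groupby([cust, week_start]).size()

    # Keys for each row's (customer, previous day) and (customer, previous week).
    prev_day = pd.MultiIndex.from_arrays([cust, day - pd.Timedelta(days=1)])
    prev_week = pd.MultiIndex.from_arrays([cust, week_start - pd.Timedelta(days=7)])

    # A missing key means no transactions, so fill with 0. The helper series are
    # local variables, so no extra columns are left behind in df.
    df["CUSTOMER_TOTAL_1D"] = daily_counts.reindex(prev_day).fillna(0).astype(int).to_numpy()
    df["CUSTOMER_TOTAL_1W"] = weekly_counts.reindex(prev_week).fillna(0).astype(int).to_numpy()
    return df


add_frequency_features(df_transactions)
print(df_transactions)
```

If "previous day" is instead meant as the 24 hours before each transaction, a time-based rolling window per customer would be needed in place of these calendar buckets.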
[ ]:"""Add the columns described above with appropriate values here"""
# BEGIN - YOUR CODE GOES HERE
pass
# END - YOUR CODE GOES HERE
[ ]:"""Do not remove this cell."""
Q3 (iii) [1.5 marks] This question asks you to create columns that store the expected Monetary value of transactions for customers. In particular, we are interested in the median value of the transactions a customer made on the previous day and in the previous week (where a week is defined as starting on a Monday and ending on a Sunday). Add the following columns to the df_transactions dataframe:
1. SPENT_1D: the median dollar value of this customer's transactions on the previous day.
2. SPENT_1W: the median dollar value of this customer's transactions in the previous week.
Note: The df_transactions dataframe should not have any columns that are not required by this assignment.
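The same lookup pattern sketches Q3 (iii), swapping the count for a median of TX_AMOUNT (an assumed column name, as are CUSTOMER_ID and TX_DATETIME). Rows whose customer had no transactions in the preceding day or week get NaN here, since the median of an empty set is undefined; whether the assignment prefers NaN or 0 is an assumption. The helper add_monetary_features and the toy rows are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; CUSTOMER_ID, TX_DATETIME and TX_AMOUNT are assumed names.
df_transactions = pd.DataFrame({
    "CUSTOMER_ID": [0, 0, 0, 0, 1],
    "TX_DATETIME": pd.to_datetime([
        "2018-04-01 10:00", "2018-04-01 12:00", "2018-04-02 09:00",
        "2018-04-03 08:00", "2018-04-02 11:00",
    ]),
    "TX_AMOUNT": [10.0, 30.0, 50.0, 20.0, 5.0],
})


def add_monetary_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: median TX_AMOUNT per customer on the previous
    calendar day (SPENT_1D) and in the previous Monday-to-Sunday week (SPENT_1W)."""
    cust = df["CUSTOMER_ID"]
    day = df["TX_DATETIME"].dt.normalize()
    # Monday that starts each row's week.
    week_start = day - pd.to_timedelta(day.dt.weekday, unit="D")

    # Median amount per (customer, day) and per (customer, week start).
    amount = df["TX_AMOUNT"]
    daily_median = amount.groupby([cust, day]).median()
    weekly_median = amount.groupby([cust, week_start]).median()

    prev_day = pd.MultiIndex.from_arrays([cust, day - pd.Timedelta(days=1)])
    prev_week = pd.MultiIndex.from_arrays([cust, week_start - pd.Timedelta(days=7)])

    # No transactions in the window -> median undefined, left as NaN.
    df["SPENT_1D"] = daily_median.reindex(prev_day).to_numpy()
    df["SPENT_1W"] = weekly_median.reindex(prev_week).to_numpy()
    return df


add_monetary_features(df_transactions)
print(df_transactions)
```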
[ ]:"""Add the columns described above with appropriate values here"""
# BEGIN - YOUR CODE GOES HERE
pass
# END - YOUR CODE GOES HERE
[ ]:"""Do not remove this cell."""
Q3 (iv) [0.5 marks] Generate a scatter plot with the amount (in dollars) on the y-axis and the customer ID on the x-axis. The scatter plot should use two markers: one for the median amount the customer spent on the previous day, and a second for the value of the customer's fraudulent transactions. Label the plot appropriately.
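A minimal matplotlib sketch of Q3 (iv), assuming a TX_FRAUD column (1 = fraudulent) and a TX_AMOUNT column alongside the SPENT_1D column from Q3 (iii). It plots only the fraudulent rows, so each fraudulent amount shares an x position with that customer's previous-day median; plotting SPENT_1D for every transaction is another reasonable reading. The helper plot_spent_vs_fraud and the toy rows are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for scripts; omit this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy rows; TX_FRAUD (1 = fraud) and TX_AMOUNT are assumed column
# names, and SPENT_1D is the previous-day median from Q3 (iii).
df_transactions = pd.DataFrame({
    "CUSTOMER_ID": [0, 0, 1, 2, 2],
    "TX_AMOUNT": [12.0, 240.0, 8.5, 310.0, 15.0],
    "SPENT_1D": [11.0, 14.0, 9.0, 20.0, 18.0],
    "TX_FRAUD": [0, 1, 0, 1, 0],
})


def plot_spent_vs_fraud(df: pd.DataFrame):
    """Hypothetical helper: for each fraudulent transaction, plot the customer's
    previous-day median spend and the fraudulent amount against CUSTOMER_ID."""
    fraud = df[df["TX_FRAUD"] == 1]
    fig, ax = plt.subplots(figsize=(10, 5))
    ax.scatter(fraud["CUSTOMER_ID"], fraud["SPENT_1D"], marker="o",
               alpha=0.7, label="Median amount spent on previous day (SPENT_1D)")
    ax.scatter(fraud["CUSTOMER_ID"], fraud["TX_AMOUNT"], marker="x",
               alpha=0.7, label="Fraudulent transaction amount")
    ax.set_xlabel("Customer ID")
    ax.set_ylabel("Amount ($)")
    ax.set_title("Previous-day median spend vs. fraudulent transaction amount")
    ax.legend()
    return fig, ax


# In a notebook the figure renders inline; in a script, call plt.show() or fig.savefig().
fig, ax = plot_spent_vs_fraud(df_transactions)
```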
[ ]:"""Generate the scatter plot described above here"""
# BEGIN - YOUR CODE GOES HERE
pass
# END - YOUR CODE GOES HERE
Q3 (v) [0.5 marks] Will including the amount a customer spent on the previous day help improve the logistic regression model from the previous question? You do not need to run the logistic regression model. Answer the question in the context of the scatter plot from Q3 (iv). [Word limit: fewer than 150 words]
Note: Write your justification in the Markdown cell below.
0.4.1 WRITE YOUR ANSWER(S) HERE IN THIS CELL
You can use Markdown syntax here.
Fig: 1
Fig: 2