Q1: Does the chance of making a claim depend on gender or age?
The calculated claim probabilities are as follows:
• Claim probability for men: 13.27%
• Claim probability for women: 13.05%
Based on these values, there is a very slight difference in claim probability between men
and women. However, the difference is small (only about 0.22 percentage points), so it
might not be statistically significant.
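Whether a gap of this size is statistically significant could be checked with a two-proportion z-test. The sketch below is one possible way to do that, not necessarily the method used for this answer. The source does not report the number of men and women, so the counts in the usage lines are hypothetical placeholders that only reproduce the two percentages.

```python
import math


def two_proportion_z_test(claims_a, n_a, claims_b, n_b):
    """Two-sided z-test for the difference between two claim proportions."""
    p_a = claims_a / n_a
    p_b = claims_b / n_b
    # Pooled proportion under the null hypothesis of equal claim probability.
    pooled = (claims_a + claims_b) / (n_a + n_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts (not from the source): 10,000 customers of each gender,
# chosen only so that the proportions are 13.27% and 13.05%.
z_stat, p_val = two_proportion_z_test(1327, 10_000, 1305, 10_000)
print(f"z = {z_stat:.2f}, p-value = {p_val:.3f}")
```

With the real group sizes substituted in, a p-value above the chosen significance level (commonly 0.05) would support the reading that the difference is not significant.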
The plot below shows the claim probabilities for 5-year age bands (0-4, 5-9, etc.).
[Figure: Claim Probability by Age Band (5-Year Bands). x-axis: Age Band (0 to 19); y-axis: Claim Probability (%)]
The plot below shows the claim probabilities for 10-year age bands (0-9, 10-19, etc.).
[Figure: Claim Probability by Age Band (10-Year Bands). x-axis: Age Band (0 to 9); y-axis: Claim Probability (%), 0.0 to 20.0]
These plots show how claim probability varies with age. For the youngest age bands (0-19),
the claim probability is zero, which makes sense as the youngest customer is 18. From the
20-24 age band to the 70-74 age band, the claim probability fluctuates between approximately
8% and 15%. After that it increases, reaching a peak of around 23% in the 90-94 age band.
The second plot shows a similar pattern, but with less fluctuation because the age bands are wider.
These results suggest that the chance of making a claim does depend on the customer's age.
In general, the chance of making a claim seems to increase with age, especially for customers
aged 70 and above.
The plot of 10-year age bands gives a smoother view of the trend, so it is the better choice
for seeing the overall pattern.
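The banding behind both plots can be sketched as follows. This is a minimal illustration, assuming each customer record has an age and a flag for whether a claim was made. The function name and the sample records are hypothetical and not taken from the data file. Bands are numbered by integer division, so with a width of 10 band 2 covers ages 20-29, matching the band indices on the plot axes.

```python
from collections import defaultdict


def claim_probability_by_band(ages, claimed, band_width=10):
    """Return {band index: claim probability in %} for the given band width."""
    totals = defaultdict(int)
    claims = defaultdict(int)
    for age, made_claim in zip(ages, claimed):
        band = age // band_width  # e.g. age 23 with width 10 -> band 2
        totals[band] += 1
        claims[band] += int(made_claim)
    return {band: 100.0 * claims[band] / totals[band] for band in sorted(totals)}


# Hypothetical records, for illustration only.
sample_ages = [18, 22, 25, 71, 78]
sample_claimed = [0, 1, 0, 1, 1]
ten_year = claim_probability_by_band(sample_ages, sample_claimed, band_width=10)
five_year = claim_probability_by_band(sample_ages, sample_claimed, band_width=5)
print(ten_year)
print(five_year)
```

Narrower bands hold fewer customers each, which is one reason the 5-year plot fluctuates more than the 10-year plot.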
Q2: Does the value of a claim tend to increase with age?
To analyse this, we fitted a decision tree regressor of claim value on age. The max_depth
is set to 3, since increasing this parameter further also seems to increase the average error.
The final decision tree for max_depth = 3 is shown below:
[Figure: decision tree regressor, max_depth = 3. Root node: samples = 987, value = 10169.916, split into child nodes with samples = 473 (value = 9446.07) and samples = 514 (value = 10836.024)]
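The choice of max_depth described above could be checked with cross-validation. The sketch below uses scikit-learn's DecisionTreeRegressor and cross_val_score. It assumes that "average error" means mean absolute error over 5 folds, which the source does not state, and it runs on synthetic data rather than the real file, so the depth it selects says nothing about the actual dataset.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor


def mean_cv_error_by_depth(X, y, depths=range(1, 8), folds=5):
    """Return {max_depth: mean absolute error averaged over the CV folds}."""
    errors = {}
    for depth in depths:
        model = DecisionTreeRegressor(max_depth=depth, random_state=0)
        # scikit-learn reports negated errors so that higher is better.
        scores = cross_val_score(
            model, X, y, cv=folds, scoring="neg_mean_absolute_error"
        )
        errors[depth] = -scores.mean()
    return errors


# Synthetic stand-in data (an assumption): ages 18-99, claim values unrelated to age.
rng = np.random.default_rng(0)
age = rng.integers(18, 100, size=300).reshape(-1, 1)
claim_value = rng.normal(10_000, 3_000, size=300)

errors = mean_cv_error_by_depth(age, claim_value)
best_depth = min(errors, key=errors.get)
print(errors)
print("depth with lowest average error:", best_depth)
```

When the target has little real dependence on the feature, deeper trees tend to fit noise, which would be consistent with the error rising as max_depth increases.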
The final scatter plot of Age vs. Claim value is shown below:
[Figure: scatter plot of Claim_Value (5000 to 35000) against Age (20 to 100)]
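A visual reading of the scatter plot can be complemented by a correlation coefficient. The sketch below computes the Pearson correlation from its definition. It is an illustrative addition rather than part of the original analysis, and the sample values are hypothetical. A coefficient near zero would indicate no linear relationship between Age and Claim_Value.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical ages and claim values, for illustration only.
sample_ages = [20, 40, 60, 80]
sample_values = [9000, 11000, 9000, 11000]
print(round(pearson_correlation(sample_ages, sample_values), 3))
```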
Based on the data, there seems to be no particular relationship between Age and Claim Value,
as the points in the plot are scattered randomly. Regardless of age, most people seem to
have a claim value in the range of 5000 to 15000, as the points cluster densely in this
region.
Q3: Does the value of a claim tend to increase nearer to London?
The decision tree for max_depth = 2 is shown below as this value of max_depth had the
lowest average error.
[Figure: decision tree regressor, max_depth = 2, where x[0] is the distance from London]
Root: x[0] <= 198.057, squared_error = 18998923.597, samples = 978, value = 10153.104
  Left child: x[0] <= 77.386, squared_error = 15804388.387, samples = 568, value = 12123.421
    Leaf: squared_error = 15142703.33, samples = 299, value = 13409.87
    Leaf: squared_error = 12655676.392, samples = 269, value = 10693.501
  Right child: x[0] <= 609.144, squared_error = 10595566.051, samples = 410, value = 7423.496
    Leaf: squared_error = 10928584.713, samples = 388, value = 7489.803
    Leaf: squared_error = 3277245.513, samples = 22, value = 6254.076
The corresponding scatter plot is:
[Figure: scatter plot of Claim_Value (0 to 35000) against Distance (0 to 1000)]
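According to the scatter plot and decision tree, the data does support the suggested banding,
in which prices fall into three bands depending on the distance from London. The bands
predicted by the model, with the mean claim value the tree predicts for each, are:
- Band 1: 0 km to 77.386 km (value 13409.87)
- Band 2: 77.386 km to 198.057 km (value 10693.501)
- Band 3: 198.057 km to 609.144 km (value 7489.803)
The predicted claim value therefore falls as the distance from London increases. The tree's
remaining leaf, beyond 609.144 km, contains only 22 of the 978 samples.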
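The thresholds above can be turned into a small lookup that assigns a distance to its band. This is a sketch using the standard-library bisect module. The function and constant names are hypothetical, and the return value 4 is used here only to flag distances beyond the last threshold, which the three-band scheme does not cover.

```python
from bisect import bisect_left

# Split thresholds reported by the max_depth = 2 tree, in km.
BAND_EDGES_KM = [77.386, 198.057, 609.144]


def distance_band(distance_km):
    """Return 1, 2 or 3 for the bands above, or 4 beyond the last threshold."""
    # bisect_left counts the edges strictly below the distance, so a distance
    # equal to a threshold stays in the lower band, like the tree's "<=" test.
    return bisect_left(BAND_EDGES_KM, distance_km) + 1


for km in (10.0, 77.386, 150.0, 400.0, 700.0):
    print(km, "->", distance_band(km))
```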
Q4: Can you give a reason for the answer to Q2?
The odd feature in the plot for Q2, a group of very large claim values, could be
explained by looking at the other data in the file. Specifically, the Car_Make column might
provide some insights.
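One way to follow up on this suggestion is to compare the mean claim value per car make. The sketch below assumes each row is a dictionary with Car_Make and Claim_Value keys, as csv.DictReader would produce. The make names and values shown are hypothetical, and the function name is illustrative.

```python
from collections import defaultdict


def mean_claim_value_by_make(rows):
    """Return {car make: mean claim value}, sorted from highest to lowest mean."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row["Car_Make"]] += float(row["Claim_Value"])
        counts[row["Car_Make"]] += 1
    means = {make: totals[make] / counts[make] for make in totals}
    return dict(sorted(means.items(), key=lambda item: item[1], reverse=True))


# Hypothetical rows, for illustration only.
sample_rows = [
    {"Car_Make": "MakeA", "Claim_Value": 9000},
    {"Car_Make": "MakeA", "Claim_Value": 11000},
    {"Car_Make": "MakeB", "Claim_Value": 30000},
]
make_means = mean_claim_value_by_make(sample_rows)
print(make_means)
```

If one or two makes account for most of the very large claims, that would support the explanation above. If the large claims are spread evenly across makes, another column would need to be examined.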