
Q1: Does the chance of making a claim depend on gender or age?

The calculated claim probabilities are as follows:
• Claim probability for men: 13.27%
• Claim probability for women: 13.05%

Based on these values, there is only a very slight difference in claim probability between men and women (about 0.22 percentage points). A difference this small might not be statistically significant.

The plot below shows the claim probabilities for 5-year age bands (0-4, 5-9, etc.).

[Figure: Claim Probability by Age Band (5-Year Bands); x-axis: Age Band (band index 0 to 19), y-axis: Claim Probability (%)]

The plot below shows the claim probabilities for 10-year age bands (0-9, 10-19, etc.).

[Figure: Claim Probability by Age Band (10-Year Bands); x-axis: Age Band (band index 0 to 9), y-axis: Claim Probability (%)]

These plots help us analyse how claim probability varies with age. For the youngest age bands (0-19), the claim probability is zero, which makes sense because the youngest customer is 18, so these bands contain few or no customers. From the 20-24 age band up to the 70-74 age band, the claim probability fluctuates between approximately 8% and 15%. After that it increases, reaching a peak of around 23% in the 90-94 age band. The second plot shows a similar pattern, but with less fluctuation because the age bands are wider.

These results suggest that the chance of making a claim does depend on the customer's age. In general, the chance of making a claim seems to increase with age, especially for customers aged 70 and above. Because the 10-year bands give a smoother view of the trend, that plot is the better choice for seeing the overall pattern.

Q2: Does the value of a claim tend to increase with age?

To analyse this, we developed a Decision Tree regressor.
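As a rough illustration of what a decision tree regressor does at each node, the sketch below searches for the single age threshold that minimises the weighted squared error of the two resulting groups. The function name best_split and the toy ages and claim values are made up for illustration; they are not taken from the dataset, and the actual analysis presumably relied on a library implementation rather than this code.

```python
def mean(values):
    return sum(values) / len(values)


def squared_error(values):
    """Mean squared deviation from the group mean (the 'squared error' of a node)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def best_split(xs, ys):
    """Return (threshold, weighted_error) for the best single split x <= threshold."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best = None
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # equal feature values cannot be separated by a threshold
        threshold = (lo + hi) / 2  # midpoint between neighbouring feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        # Weight each side's squared error by its share of the samples
        error = (len(left) * squared_error(left) + len(right) * squared_error(right)) / n
        if best is None or error < best[1]:
            best = (threshold, error)
    return best


# Hypothetical toy data, not from the claims file
ages = [20, 25, 30, 60, 65, 70]
claims = [9000, 9100, 8900, 12000, 12100, 11900]
threshold, error = best_split(ages, claims)
print(threshold, round(error, 1))
```

Because candidate thresholds are midpoints between neighbouring values, integer ages give thresholds ending in .5. A full tree repeats this search inside each resulting group until max_depth is reached.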
We chose max_depth = 3 because larger values of this parameter appeared to increase the average error. The final decision tree for max_depth = 3 is shown below (x[0] denotes Age; each child pair is listed left branch first; entries marked "illegible" could not be read from the figure):

Root: samples = 987, value = 10169.916 (split and squared error illegible)
  Node: samples = 473, value = 9446.07 (split and squared error illegible)
    Node: squared error = 8674287.367, samples = 214, value = 9207.075 (split illegible)
      Leaf: squared error = 8447369.83, samples = 89, value = 9523.185
      Leaf: squared error = 8714048, samples = 125, value = 8982.004
    Node: squared error = 10150614.671, samples = 259, value = 9643.54 (split illegible)
      Leaf: squared error = 10084223.562, samples = 238, value = 9743.935
      Leaf: squared error = 9494214.076, samples = 21, value = 8505.734
  Node: x[0] <= 97.5, squared error = 28016301.75, samples = 514, value = 10836.024
    Node: squared error = 28321357.869, samples = 502, value = 10893.591 (split illegible)
      Leaf: squared error = 22346726.24, samples = 11, value = 12656.407
      Leaf: squared error = 28384030.763, samples = 491, value = 10854.098
    Node: squared error = 9316733.346, samples = 12, value = 8427.825 (split illegible)
      Leaf: squared error = 8572860.08, samples = 8, value = 9002.4
      Leaf: squared error = 8823661.296, samples = 4, value = 7278.675

The final scatter plot of Age vs. Claim value is shown below:

[Figure: scatter plot of Claim_Value (y-axis, 5000 to 35000) against Age (x-axis, 20 to 100)]

Based on the data, there seems to be no particular relationship between Age and Claim Value, as the points are scattered randomly in the plot. Regardless of age, most people seem to have a claim value in the range of 5000 to 15000, where the points cluster most densely.

Q3: Does the value of a claim tend to increase nearer to London?

The decision tree for max_depth = 2 is shown below, as this value of max_depth had the lowest average error.
Root: x[0] <= 198.057, squared_error = 18998923.597, samples = 978, value = 10153.104
  Node: x[0] <= 77.386, squared_error = 15804388.387, samples = 568, value = 12123.421
    Leaf: squared_error = 15142703.33, samples = 299, value = 13409.87
    Leaf: squared_error = 12655676.392, samples = 269, value = 10693.501
  Node: x[0] <= 609.144, squared_error = 10595566.051, samples = 410, value = 7423.496
    Leaf: squared_error = 10928584.713, samples = 388, value = 7489.803
    Leaf: squared_error = 3277245.513, samples = 22, value = 6254.076

The corresponding scatterplot is:

[Figure: scatter plot of Claim_Value (y-axis, 5000 to 35000) against Distance (x-axis, 0 to 1000)]

According to the scatter plot and decision tree, the data does support the suggested banding, in which prices fall into three bands depending on the distance from London. The bands predicted by the model, with the mean claim value of each tree leaf, are:

- Band 1: 0 to 77.386 km (299 samples, mean claim value 13409.87)
- Band 2: 77.386 km to 198.057 km (269 samples, mean claim value 10693.501)
- Band 3: 198.057 km to 609.144 km (388 samples, mean claim value 7489.803)

The tree also separates a small fourth group beyond 609.144 km (22 samples, mean claim value 6254.076). The mean claim value falls from each band to the next as distance grows, so claim values do tend to be higher nearer to London.

Q4: Can you give a reason for the answer to Q2?

The odd feature in the plot for Q2, where the values of claims are very large, could be explained by looking at the other data in the file. Specifically, the Car_Make column might provide some insights.
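One way to follow up on this suggestion is to compare claim values across car makes. The sketch below groups rows by Car_Make and reports the count, median, and maximum claim for each make. The rows and make names are invented placeholders, and the Claim_Value column name is taken from the plot axis label, so both are assumptions about how the file is laid out.

```python
from collections import defaultdict
from statistics import median


def summarise_by_make(rows):
    """Group claim values by car make and summarise each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["Car_Make"]].append(row["Claim_Value"])
    summary = {
        make: {"count": len(vals), "median": median(vals), "max": max(vals)}
        for make, vals in groups.items()
    }
    # Sort so that makes with the highest typical claim come first
    return dict(sorted(summary.items(), key=lambda kv: kv[1]["median"], reverse=True))


# Hypothetical rows standing in for the real file
rows = [
    {"Car_Make": "MakeA", "Claim_Value": 9000},
    {"Car_Make": "MakeA", "Claim_Value": 11000},
    {"Car_Make": "MakeB", "Claim_Value": 30000},
    {"Car_Make": "MakeB", "Claim_Value": 34000},
    {"Car_Make": "MakeA", "Claim_Value": 10000},
]
summary = summarise_by_make(rows)
for make, stats in summary.items():
    print(make, stats)
```

If one or two makes account for nearly all of the very large claims, that would support Car_Make as the explanation; if the large claims are spread evenly across makes, another column would need to be examined.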