GV950 Homework 1 Spring 2024

Guidelines

• This assignment is worth a total of 55 marks. There are a total of 10 marks you can get just from following instructions and having a file that knits correctly. More details on this are given within these guidelines. The remaining 45 marks are based on the questions below, where the marks for each question are stated next to it.
• This homework is due on Faser on Wednesday, February 07 at 9:45am. Faser is different from Moodle and you should be able to access it from https://faser.essex.ac.uk/Account/Login?ReturnUrl=%2f. Please do not submit it directly to the instructor or TA; we cannot accept that.
• If you are unable to submit on time, please do not email us as there is nothing we can do. If you are able to submit within 7 days of the deadline, do so and make sure you submit an extenuating circumstances form within those 7 days as well. See here: https://www.essex.ac.uk/student/exams-and-coursework/late-submission-of-coursework.
• You need to upload two files to Faser: an RMarkdown (.Rmd) file and a .docx (MS Word) file, which is the output of the Rmd file. Submitting just a Word file or just an Rmd file will lead to a deduction of 30 marks from the total because this is considered an incomplete submission. Similarly, submitting a Word file that is not the output from your .Rmd file will be penalized the same.
• We should be able to run the Rmd file by 'knitting' it; of course we will load the datasets from our own computers to do so but, once we do that, the entire file should run without any errors. If it does all run without errors and gives us the exact same output that you have submitted as the Word file, you will receive 6 marks for this.
• In the Rmd file, the author field must have your 7-digit student number, not your actual name (this is the number in the following format: 231XXXX). Without the student number, we cannot link your marks to your profile. The assignment must not have your name mentioned anywhere. The title field should say "GV950 Homework 1" and the date can be whatever you want. Doing all of this correctly will get you 2 marks.
• Your filename (i.e., the .Rmd file you submit) should be GV950 HW1_231XXXX.Rmd (where 231XXXX is your same 7-digit student number as above). Note that the filename of the Word file will also then automatically be the same. Doing this correctly will get you 2 marks.
• Unless the question specifies differently, you should have both the code and its output in the Word document.
• You must comment your code (at least briefly) to indicate what you are doing, how, why, et cetera. The comments do not need to be very detailed but not including brief comments will lead to a mark deduction.
• Any regular text that is part of the answer should not be in R comments but, rather, should be written in the Markdown file such that it displays as normal text in the resultant Word file. Not doing so will incur a mark deduction.
• All questions must be completed using packages and functions we have learned in class (in lectures and labs). Using other functions will receive a mark of 0 for that question, even if the output is correct.
• Please do not include long unnecessary output, especially using head(dataset_name), as that takes up many pages of output, provides no information, and makes it harder to find the actual answer. Unnecessary output will have marks deducted.
• When a question asks you to explain something, make sure to justify and explain your answer. E.g., if a question asks what direction a variable is skewed, say more than 'left' or 'right.' How can you tell? Is it only slightly skewed or very much skewed? And so on.
• Any answer you give must have the relevant code and its output given as well. E.g., if you look at a histogram to determine how a variable is skewed, you must also give the code and output of the histogram before sharing your conclusion. Any interpretation based on code that is not shown or with the output not shown will receive no more than half the marks for that question.
• Any figure you make must have informative axes and title, and other aesthetic choices must be made appropriately to convey the most informative and useful figure in order to attain full marks.
• Please remember that the homework is to be done individually, not in consultation with anyone else.
• Please note that not following the instructions given in these guidelines will adversely affect your overall marks.
• Reminder that I have academic support hours on Tuesdays, 9:30am-11am in Office 5.016 and Qi is available to answer questions in the second hour of the lab sessions (Thursdays, 2-4pm) in Lab J.

Dataset & Codebook

For this homework, you will need two files that are available on Moodle under the Homeworks section. One is a .csv file called "BES_HW1" (where BES stands for the British Election Study) and the other is a PDF file called "BES_codebook." This codebook lists all the variables in the original BES dataset along with a brief description. However, note that BES_HW1 does not have all the variables listed in the codebook, as there would have been too many for you to be able to comprehend, so I have reduced the columns by about one half. In addition, there are three variables in the dataset that are not in the codebook as I have added them for your ease. These are as follows:

• Country Name: Name of the country (note that the Country variable in the dataset refers to the same countries but assigns a number to each of these instead of giving the name in words).
• Region_Name: Name of the region (same explanation as the variable above, for the variable called Region).
• WinnerParty_Name19: The name of the party that won in the given constituency in the 2019 General Election (same explanation as the variable above, for the variable called Winner 19).

Questions

1. Import the BES_HW1.csv dataset into R. What is the unit of analysis in this dataset and how can you tell? (1 mark)

2. Say that a friend of yours who knows nothing about the UK 2019 elections asks you to explain to them how well the three main parties (Conservatives, Labour, and Liberal Democrats) did descriptively in constituencies across England versus Scotland. You choose to do this by using variables that measure the percentage of votes that these three parties each got at the constituency level. Generate code, output and a description (in words) to explain this to your friend. In doing so, think carefully about the various ways we have learned (last term) of describing data and choose what you think will be the most effective and succinct. You must also describe the patterns based on your output; just code and output is not enough. You can choose how to best describe it (be creative!) but, as a rough guideline, a good answer will have one type of figure (rather than several types), perhaps some numbers in addition to it, and approximately 6-8 lines of text to describe the figures and numbers. Remember that you should not do any formal statistical tests for this question. (5 marks)

3. Now, you want to more systematically study whether the Labour party received a significantly different percentage of votes in England versus Wales. To do so, state (in words) a hypothesis and conduct an appropriate test. Make sure to display the output of your test, and identify the test statistic and the conclusion of the test. Don't forget to explain how you reached that conclusion. (3 marks)

4. Make a scatterplot that plots the total number of votes cast in each constituency in 2017 and in 2019. Based just on the figure, describe the correlation between these two variables. Think about the direction, whether the correlation is high or low, and why it likely is high or low. (Remember to make the figure as informative as possible, using good axis labels, axis markings, etc.) (4 marks)

5. Create a new variable that measures the change in the number of voters in a constituency between 2017 and 2019. Do more constituencies have more voters in 2019 compared to 2017, or fewer voters? How can you tell? (2 marks)

6. Create a variable that measures whether a person of the same gender ran for the Conservative party in 2019 and 2017. Then, using table(), summarize the information from this new variable and describe it in a sentence to state how many constituencies had a same-gender person versus a different-gender person. (3 marks)

7. I want to try to understand the Labour party's vote share in 2019. To help me do so, first choose a continuous main independent variable that you think would help to explain this dependent variable. (Note that the X variable cannot be the Labour vote share in 2017.) State a hypothesis and a credible mechanism to link X to Y. (Note: this is causal hurdle 1 from class and the readings and does not require you to write any code.) (3 marks)

8. Do the chosen X and given Y variable fulfill causal hurdle 2? That is, is it possible that Y affects X, and why/how? (Note that this question does not require you to write any code.) (3 marks)

9. Is there covariation between X and Y (causal hurdle 3)? To analyze this, make an appropriate figure and describe what the figure shows; also calculate the correlation. (3 marks)

10. For causal hurdle 4, think of at least three potential confounders that we should be concerned about. For each, state what the confounding variable is, why we should be concerned about it, and what variable you will use from the dataset to control for it. (Note: for this question, it does not have to be the case that each confounding variable will entirely cause both X and Y, but each confounding variable must have a reason for being correlated with X and Y. Also note that this answer does not require any code.) (6 marks)

11. Now, run a regression of Y on just your main independent variable, X. Display the model results and interpret the coefficient on X. Have you found support for your hypothesis? Why or why not? (3 marks)

12. Run a second regression, this time of Y on all four variables (the original independent variable plus the three confounders you identified above). Interpret each variable in one sentence, focusing on both the substantive and statistical significance of each. Now explain whether/how the coefficient on the main independent variable has changed. (6 marks)

13. Run an ANOVA test to compare both models: which one explains variation in Y better? How can you tell? Make sure to show the output of the test, and explain how you reached your conclusion. (3 marks)
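
As an illustration of the header requirements in the guidelines, a minimal sketch of the YAML header at the top of the .Rmd file might look like the following. The word_document output format is an assumption based on the requirement to submit a .docx file, 231XXXX stands in for your own student number, and the date shown is an arbitrary placeholder.

```yaml
---
title: "GV950 Homework 1"   # exact title required by the guidelines
author: "231XXXX"           # your 7-digit student number, never your name
date: "2024-02-07"          # placeholder only; any date is acceptable
output: word_document       # assumed setting so that knitting yields a .docx file
---
```

With the .Rmd file saved under the required filename, knitting it should produce a Word file with the same base name, as the guidelines describe.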