Question

Data Analysis Assignment #1

The first two parts of this assignment involve creating various plots in R, using R to calculate some summary statistics, and giving some interpretations. Any plot that is done in software other than R (e.g. Excel) will receive a grade of 0. It is a straight 0 if you use any other software; this is an R assignment. (Using RStudio is fine, as that simply sits on top of R. Many people prefer to use RStudio.) Create the plots in R and save the plot file (not screenshots).

For questions 1-3, each part of each question is worth 5 marks. Question 4 is worth 5 marks total.

Note that some instructors can be very, very particular about the specifics of titles, captions, and labels. (And they sometimes have good reasons for that.) I am (perhaps) a little less particular about that sort of thing. The end result should be something that is neat and tidy, and clearly informs the reader what the plot or table represents. (You can do this assignment in either base R or RStudio. I usually phrase things in terms of base R.)

1 Surface Counts in an Atlantic Puffin Breeding Colony (20 marks total)

Coastal Newfoundland is summertime home to hundreds of thousands of Atlantic puffins, and puffin viewing is a major tourism draw. The puffins spend winters in the open ocean, and come to coastal Newfoundland in the summer to breed. A pair of breeding puffins lays a single egg in a burrow. If all goes well, a puffling hatches in roughly 6 weeks. The parents spend the next 6 weeks or so feeding the puffling in the burrow. A primary food source is the capelin that are abundant in Newfoundland waters in the summer. Adult puffins in a breeding colony will spend some of their time in the burrow, some on the surface of the island, some flying, and some in or on the water.

Let's do some simple exploratory data analysis of puffin behaviour. The data in 2040 W24 puffins.csv represent surface counts of Atlantic puffins on North Bird Island, near Bonavista, NL. Each observation is a surface count taken in the early afternoon, along with temperature, and an ordinal cloud cover variable (cloudy, mix, sunny).
There are counts from 38 days between July 1 and August 19, 2023. (The data set has a single count for each of the 38 days, taken in the early afternoon.) The counts were taken from approximately one mile away, by a single observer with a spotting scope. The area is prone to fog, and some days were not clear enough to get a measurement.

Figure 1: Puffins on the surface of a breeding colony island, with North Bird Island in the distance on the right.

Let's explore this data by creating plots and summary statistics to investigate possible relationships between surface count and temperature and cloud cover.

a) Import the data into R. Look at the data after you've imported it to make sure it imported properly and you know what you're dealing with!!!! Let's start by creating a frequency histogram of surface counts. (Use R's hist command for this.) Appropriately label the axes, and give your plot an appropriate title or caption. (If you're going to use a title rather than a caption, that is simple to do in R; if you're going to use a caption, that's probably best done in whatever word processor you are using.) Find a way to use R to put your name, or the name of one of your group members, in the top left of the plot. (Hint: use the text command; see help(text) for details.) (More details about creating histograms in R can be found in Section 5 of the Intro to R doc.) Include this plot with your submission. Include the final command or commands that you used to create the plot. (I don't need to see all commands, just the ones you used to create the plot.) It's best to save your plot using R's standard ways of saving it, then import it into your word processor rather than taking a screenshot. Copy & paste works best for the commands, and can also be used for the plot (though I prefer save & import).

b) Now let's create a scatterplot with surface count on the vertical axis and temperature on the horizontal. (Use R's plot command for this.)
Appropriately label the axes, and give your plot an appropriate title or caption. (If you're going to use a title rather than a caption, that is simple to do in R; if you're going to use a caption, that's probably best done in whatever word processor you are using.) Find a way to get R to put your name, or the name of one of your group members, in the top left of the plot. (Hint: use the text command; see help(text) for details.) Include this plot with your submission. Include the final command or commands that you used to create the plot. (I don't need to see all commands, just the ones you used to create the plot.) It's best to save your plot using R's standard ways of saving it, then import it into your word processor rather than taking a screenshot. Copy & paste works best for the commands, and can also be used for the plot (though I prefer save & import).

c) Comment on the plot, and give your thoughts on what it might indicate about a possible relationship between surface count and temperature.

d) If we wish to use this data to draw formal conclusions about a relationship between surface count and temperature, describe a few potential problems associated with how this data was collected and why this might have introduced bias.

2 Atlantic Puffins, continued (15 marks)

Let's now do a preliminary exploration of a possible link between surface count and cloud cover. Recall that the data set contains a cloud cover variable with 3 levels (Cloudy, Mix, Sunny, which roughly correspond to greater than 75% cloud cover, between 25% and 75% cloud cover, and less than 25% cloud cover).

a) Plot side-by-side boxplots of the surface counts against cloud cover (Cloudy, Mix, Sunny).

b) Use R to find the mean, median, standard deviation, and sample size for each of the three groups. Create a table (in whatever word processor you are using) that shows these values. Give your table an appropriate caption. Put this table below the boxplots in your submission.
c) Comment on the plot and summary statistics, and give your thoughts on what they might indicate about a possible relationship between surface count and cloud cover.

4 Two Easy Questions (5 marks total)

a) Here we are going to use R to randomly generate one single value from the integers 1 through 10. Try out the command sample(10,1), and note that it's randomly generating a value from the integers 1-10. Now, we can't cherry pick which one we like, so we're going to use the command one more time: use sample(10,1) and record whatever comes out.

b) Here we are going to use R to randomly generate four values without replacement from the integers 1 through 20. Try out the command sample(20,4,replace=FALSE), and note that it's randomly generating 4 values from the integers 1-20. Now, we can't cherry pick which set we like, so we're going to use the command one more time: use sample(20,4,replace=FALSE) and record whatever comes out. There's a Courselink quiz question that asks one thing: Are all 4 of your numbers even? (e.g. the set 18,6,2,8 contains all even numbers, but the sets 18,3,2,8 and 9,3,17,2 do not.)
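
As a rough sketch of how the R commands named in the questions above (hist, plot, text, boxplot, sample) fit together, the following may help as a starting point. It is not the expected answer. The data frame is filled with made-up placeholder values so that the code runs on its own; the column names count, temp, and cloud, the helper add_name, and the output file name are all assumptions for illustration, so check names() on the imported data and adjust. The temperature units are not stated in the assignment, so the axis label leaves them out.

```r
# Placeholder data so the sketch runs on its own. These values are made up
# and are NOT the puffin data. With the real file you would instead use
# something like:  puffins <- read.csv("<path to the puffins csv file>")
# The column names count, temp, and cloud are assumptions.
set.seed(1)  # only so the placeholder data are reproducible
n <- 38
puffins <- data.frame(
  count = rpois(n, lambda = 200),
  temp  = round(runif(n, 10, 25), 1),
  cloud = factor(sample(c("Cloudy", "Mix", "Sunny"), n, replace = TRUE),
                 levels = c("Cloudy", "Mix", "Sunny"))
)

# Look at the data before plotting.
str(puffins)
head(puffins)

# Hypothetical helper: write a name near the top-left corner of the
# current plot, using the plot region limits reported by par("usr").
add_name <- function(name) {
  usr <- par("usr")  # c(x_min, x_max, y_min, y_max)
  text(usr[1], usr[4], labels = name, adj = c(-0.1, 1.5))
}

# Save the plots to a file with a graphics device (not a screenshot).
# pdf() writes one page per plot; png() is a common alternative.
out_file <- file.path(tempdir(), "puffin_plots.pdf")
pdf(out_file)

hist(puffins$count,
     main = "Surface counts of Atlantic puffins (placeholder data)",
     xlab = "Surface count",
     ylab = "Frequency (number of days)")
add_name("Your Name")

plot(puffins$temp, puffins$count,
     main = "Surface count vs. temperature (placeholder data)",
     xlab = "Temperature",
     ylab = "Surface count")
add_name("Your Name")

boxplot(count ~ cloud, data = puffins,
        main = "Surface count by cloud cover (placeholder data)",
        xlab = "Cloud cover",
        ylab = "Surface count")

invisible(dev.off())  # close the device so the file is written

# Mean, median, standard deviation, and sample size for each group.
groups <- split(puffins$count, puffins$cloud)
group_stats <- data.frame(
  mean   = sapply(groups, mean),
  median = sapply(groups, median),
  sd     = sapply(groups, sd),
  n      = sapply(groups, length)
)
print(group_stats)

# Question 4: the seed above makes these draws repeatable, which is not
# what the question intends; remove set.seed() when recording your values.
one_value   <- sample(10, 1)
four_values <- sample(20, 4, replace = FALSE)
all_even    <- all(four_values %% 2 == 0)  # TRUE only if all four are even
print(one_value)
print(four_values)
print(all_even)
```

Writing to tempdir() keeps the sketch self-contained; in practice you would likely point out_file at your working directory so the plot file is easy to import into your word processor.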