Question

Important Information:

Read all the instructions carefully before you begin! You will need to save the .ipynb file as a searchable PDF (NOT as a picture) and submit it as the primary resource. Pictures or snapshots of your work will NOT be accepted. The generated CSV file and the .ipynb file must be submitted in a zip folder as the secondary resource. You may use Jupyter Notebook or Colab, whichever is more convenient. Non-compliance with the above instructions will result in a 0 grade on the relevant portions of the assignment.

Your instructor will grade your assignment based on what you submitted. Failure to submit the assignment, or submitting an assignment intended for another class, will result in a 0 grade, and resubmission will not be allowed.

Make sure that you submit your original work. Suspected cases of plagiarism will be treated as potential academic misconduct and will be reported to the College Academic Integrity Committee for a formal investigation. As part of this procedure, your instructor may require you to meet with them for an oral exam on the assignment.

Important First Steps:

You can use either Anaconda or Colab to work on the Jupyter notebook that you will submit as your final project on Forum:

1. Start by downloading this Jupyter Notebook to your local machine.
2. Open a tab in your browser and type https://colab.research.google.com/. This will open a small window.
3. Choose the last option on the upper menu, "Upload". Then choose the Jupyter notebook you saved in step 1.
4. You can start working on your assignment by answering the questions in the corresponding cells.

Sample code is provided for Tasks 3 to 6. Remember that these are only samples, and you will need to make minor revisions to the code to complete the tasks. If you have any questions, please reach out to your instructors and the CIS tutors.

Background

Imagine that you have graduated from CIS and now work as a consultant.
You are hired by a health and fitness company. They have collected detailed data from 507 physically active participants. This data includes information about the participants' body measurements as well as personal attributes such as age, weight, height, and gender. The company wants you to analyze this data in ways that can help them design personalized fitness evaluations and training regimens for their users.

Note: The entire dataset (and descriptions of each of the variables) can be found [here](https://vincentarelbundock.github.io/Rdatasets/doc/openintro/bdims.html).

In Assignment 1 you will take a random sample of 100 participants from the 507 individuals who were studied, and analyze the data for these 100 individuals.

Task 1.
As mentioned above, you will select a random sample of 100 individuals from the company's data set. You will then conduct analyses on this random sample. Look at the code below. To select a random sample from the data, replace Name with your own name in the code, then run the code. It will generate a CSV file, labeled with your name, that contains a random sample of 100 participants.
REMEMBER: you need to add this CSV file to a zip file along with your .ipynb file when submitting your assignment.

Task 2.
Now that you have your data set, you are ready to start analyzing it! The first step is to explore your dataset. Look at the variables that make up the data set. Once you've done so, imagine you are writing a report for the fitness company that hired you. Start with a brief introduction to the research question you are exploring, then describe the dataset you are analyzing (e.g., what is the sample you are analyzing? What are the variables?). Assume that your audience is the company's leadership. They will be with what you are reporting.

Task 3.
Run the code to randomly select 4 variables from your dataset. It will then print the names of the four variables that were randomly selected.
REMEMBER: Check the full name of each of your variables; you can find them here.

Your task is to create a histogram and generate descriptive statistics for each of the four variables that were randomly selected above. You can use the code below to help you do so. For each variable you need to describe the following: shape, center, spread, and the presence of any outliers.

Task 4.
Now that you have described and plotted the data, let's explore whether the data differ for male and female participants. Generate grouped box plots for each of the 4 variables in Task 3. Your box plots should compare the distributions for males and females in your dataset. Afterwards, describe what you observe in each case. Make sure you mention the five-number summaries for both genders.

Task 5.
Part A
Select TWO variables from Task 3. Treat each of these as an independent variable. Now create a scatterplot for each variable. In each case, the plot should visualize the relationship between the variable and weight (the dependent variable). Describe each scatterplot in terms of the form, strength, and direction of the relationship between the variables.

Part B
Examine whether the relationship explored in each scatterplot varies by gender. Hint: You will need to create scatterplots separately for each gender to answer this question.

Task 6.
Part A
Finally, for each of the variables you focused on in Task 5: fit a simple linear regression model that predicts a participant's weight based on the variable you selected. Make sure you generate, interpret, and use the residual plot, the standard error, and the R^2 to assess the fit of each linear model. If the model is a good fit, interpret the slope and the y-intercept.

Part B
If you found in Task 5 (Part B) that the relationship between weight and the variable you selected differed for males and females, then run the regression model for each gender separately and interpret your findings accordingly.
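Task 6 asks for a least-squares line together with its residuals, standard error, and R^2. A minimal sketch of those calculations is shown here, assuming plain Python lists and made-up numbers. The names `fit_simple_regression`, `waist_cm`, and `weight_kg` are illustrative and do not come from the assignment's sample code, the values are not taken from the bdims data, and "standard error" is read here as the residual standard error, which is an assumption.

```python
import math


def fit_simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x with basic fit diagnostics."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y need the same length and at least 3 points")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sums of squares and cross-products around the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual = observed value minus the value predicted by the line.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(r ** 2 for r in residuals)

    return {
        "slope": slope,
        "intercept": intercept,
        "residuals": residuals,
        # Share of the variation in y explained by the line.
        "r_squared": 1 - sse / syy,
        # Residual standard error, using n - 2 degrees of freedom.
        "residual_se": math.sqrt(sse / (n - 2)),
    }


# Made-up illustrative values, NOT rows from the bdims sample.
waist_cm = [70.0, 75.0, 80.0, 85.0, 90.0]
weight_kg = [56.0, 62.0, 65.0, 73.0, 76.0]

fit = fit_simple_regression(waist_cm, weight_kg)
print(f"slope={fit['slope']:.3f}, intercept={fit['intercept']:.3f}")
print(f"R^2={fit['r_squared']:.3f}, residual SE={fit['residual_se']:.3f}")
```

In the notebook, the same quantities would normally come from whichever statistics library the provided sample code uses. Plotting `fit["residuals"]` against the predictor gives the residual plot: if a straight line is adequate, the points should scatter around zero with no clear pattern.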
Assignment Information
Length: N/A
Weight: 18%

Learning Outcomes Added
CompProgramDesign: Generate working programs in a computer language that can solve computational problems; find and fix bugs that appear in them.
Variables: Identify and classify the relevant variables of a system, problem, or model.
DescriptiveStats: Calculate and interpret descriptive statistics appropriately.
Correlation: Apply and interpret measures of correlation; distinguish correlation and causation.
Visualizations: Interpret, analyze, and create data visualizations.

NOTE: The CSV file is attached. Just open the Jupyter notebook and click the option at the top (it says something like "take me to Colab"); there you'll see the questions and everything clearly. In Task 1 it said I should replace Name with my name, which I did, and I sent the result. If you have to redo it on your side, replace it with my name, which is "sanah". Please do the work in Colab and send me the Colab link afterwards, not in Pages or any other format. Whoever does it will get a different CSV file, because they will have to rerun the code, so just have them write my name again in the code provided in Colab.

ASIA_Assignment_1_Spring_2024.ipynb

REMEMBER: you need to add this csv file to a zip file along with y

IMPORTANT: ONLY RUN THIS CODE BLOCK ONCE. If you run it a second time, it will generate a new random sample of 10 match your original analyses.

    # The code below will generate a random sample of 100
    # You need to replace "Name" in the code below with you
    # contains a random sample of 100 individuals.
    # REMEMBER: you need to submit this csv file in the zi
    import pandas as pd

    try:
        # Reuse the sample if it was already generated on an earlier run.
        df = pd.read_csv('Name.csv')  # replace Name
    except FileNotFoundError:
        # First run: draw 100 random rows and save them.
        original_data = pd.read_csv("https://raw.githubuser
        df1 = original_data.sample(100)
        df1.to_csv('Name.csv')  # replace
        df = pd.read_csv('Name.csv')  # replace
        df = pd.DataFrame(df)
        df.to_csv('Name.csv')  # replace
    df.head()

       Unnamed: 0.1  Unnamed: 0  bia_di  bii_di  bit_di  che_de
    0           128         504    35.3    28.7    30.4    17.7
    1           390         121    42.1    28.5    33.1    20.2

Then, at the top, it'll say "Open with Google Colab". Click that and it will work. Afterwards, send me the Colab link, along with the link to the CSV generated with my name. Don't use the CSV I've provided; you'll have to generate it again with my name.

https://drive.google.com/file/d/10Gcx9DC5YFUW616keh1E8xHZNRsZPu9F/view?usp=sharing