Question
Important Information:
Read all the instructions carefully before you begin!
You will need to save the (.ipynb) file as a searchable PDF (NOT as a
picture), and submit it as the primary resource. Pictures or snapshots of
your work will NOT be accepted.
The generated CSV file and .ipynb file must be submitted in a zip-folder
as the secondary source.
You may use Jupyter Notebook or Colab as per your convenience.
Non-compliance with the above instructions will result in a 0 grade on the
relevant portions of the assignment. Your instructor will grade your
assignment based on what you submitted. Failure to submit the assignment
or submitting an assignment intended for another class will result in a 0
grade, and resubmission will not be allowed. Make sure that you submit
your original work. Suspected cases of plagiarism will be treated as
potential academic misconduct and will be reported to the College Academic
Integrity Committee for a formal investigation. As part of this procedure,
your instructor may require you to meet with them for an oral exam on the
assignment.
Important First Steps:
You can use either Anaconda or Colab to work on the Jupyter notebook that
you will submit as your final project on Forum:
1. Start by downloading this Jupyter Notebook to your local machine.
2. Open a tab in your browser and go to https://colab.research.google.com/.
This will open a small window. Choose the last option on the upper menu,
"Upload", then choose the Jupyter notebook you saved in step 1.
3. You can start working on your assignment by answering the questions in
the corresponding cells.
Sample code is provided for Tasks 3 to 6. Remember that this is only sample
code, and you will need to make minor revisions to it to be able to
complete the tasks.
If you have any questions, please reach out to your instructors and the
CIS tutors.
Background
Imagine that you have graduated from CIS and now work as a consultant.
You are hired by a health and fitness company.
They have collected detailed data from 507 physically active participants.
This data includes information about the participants' body measurements
as well as personal attributes such as age, weight, height, and gender.
The company wants you to analyze this data in ways that can help them
design personalized fitness evaluations and training regimens for their
users.
Note: The entire dataset (and descriptions of each of the variables) can
be found [here]
(https://vincentarelbundock.github.io/Rdatasets/doc/openintro/bdims.html).
In Assignment 1 you will take a random sample of 100 participants from the
507 individuals who were studied, and analyze the data for these 100
individuals.
Task 1.
As mentioned above, you will select a random sample of 100 individuals
from the company's data set.
You will then conduct analyses on this random sample.
Look at the code below. To select a random sample from the data, you
should replace Name with your own name in the code.
After you have done so run the code. The code will generate a CSV file
with a random sample of 100 participants. It will also be labeled with
your name.
REMEMBER: you need to add this CSV file to a zip file along with your
.ipynb file when submitting your assignment.
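The packaging step above can also be done from a notebook cell. The following is only a sketch using Python's standard `zipfile` module: the file names `Name.csv`, `Assignment_1.ipynb`, and `submission.zip` are hypothetical placeholders, and `make_submission_zip` is an illustrative helper, not part of the assignment's code.

```python
import tempfile
import zipfile
from pathlib import Path


def make_submission_zip(files, zip_path):
    """Write the given files into a zip archive, stored by file name only."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            f = Path(f)
            zf.write(f, arcname=f.name)
    return Path(zip_path)


# Demonstration with placeholder files in a temporary folder.
# "Name.csv" and "Assignment_1.ipynb" are hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    csv_file = tmp / "Name.csv"
    nb_file = tmp / "Assignment_1.ipynb"
    csv_file.write_text("placeholder\n")
    nb_file.write_text("{}\n")

    archive = make_submission_zip([csv_file, nb_file], tmp / "submission.zip")
    with zipfile.ZipFile(archive) as zf:
        zipped_names = sorted(zf.namelist())

print(zipped_names)
```

With real files, the same helper would be called with the paths of the generated CSV and the downloaded .ipynb file.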
Task 2.
Now that you have your data set you are ready to start analyzing it!
The first step is to explore your dataset.
Look at the variables that make up the data set.
Once you've done so, imagine you are writing a report for the fitness
company that hired you.
Start with a brief introduction to the research question you are
exploring, then the dataset you are analyzing (e.g., what is the sample
you are analyzing? What are the variables?)
Assume that your audience is the company's leadership. They will be with
what you are reporting.
Task 3.
Run the code to randomly select 4 variables from your dataset.
It will then print the names of the four variables that were randomly
selected.
REMEMBER: Check the full name of each of your variables. You can find it
here.
Your task is to do the following:
You should create a histogram and generate descriptive statistics for each
of the four variables that were randomly selected above. You can use the
code below to help you do so.
For each variable you need to describe the following: shape, center,
spread, and the presence of any outliers.
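For the numeric side of Task 3, a minimal sketch is given below. It is not the notebook's sample code: the data are synthetic, the column name `hgt` is an assumption, and `describe_variable` is a hypothetical helper. It reports center (mean, median), spread (standard deviation, IQR), skewness as a rough indicator of shape, and values outside the 1.5 × IQR fences as candidate outliers.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data; replace with the sampled DataFrame from Task 1.
rng = np.random.default_rng(0)
demo = pd.DataFrame({"hgt": rng.normal(171, 9, size=100)})
demo.loc[0, "hgt"] = 215.0  # one deliberately extreme value


def describe_variable(series):
    """Return center, spread, skewness, and 1.5*IQR outliers for one variable."""
    q1, median, q3 = series.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = series[(series < lower) | (series > upper)]
    return {
        "mean": series.mean(),
        "median": median,
        "std": series.std(),
        "iqr": iqr,
        "skew": series.skew(),  # > 0 suggests right skew, < 0 left skew
        "outliers": outliers.tolist(),
    }


summary = describe_variable(demo["hgt"])
print(summary)
```

With the real sample, `df[column].hist()` would draw the matching histogram, and comparing the mean with the median gives a quick check on the skew seen in the plot.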
Task 4.
Now that you have described and plotted data, let's explore if the data
differ for male and female participants.
Generate grouped box plots for each of the 4 variables in Task 3.
Your boxplot should compare the distributions for males and females in
your dataset.
Afterwards, you should describe what you observe in each case.
Make sure you mention the five-number summaries for both genders.
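The five-number summaries asked for in Task 4 can be computed per gender with a pandas `groupby`. The sketch below uses synthetic data; the column names `sex` and `hgt` and the coding 1 = male, 0 = female are assumptions to check against the dataset documentation and the actual CSV.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data; `sex` coded 1 = male, 0 = female is an assumption.
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    "sex": np.repeat([0, 1], 50),
    "hgt": np.concatenate([rng.normal(165, 6, 50), rng.normal(178, 7, 50)]),
})

# Five-number summary per group: min, Q1, median, Q3, max.
five_num = (
    demo.groupby("sex")["hgt"]
    .quantile([0, 0.25, 0.5, 0.75, 1])
    .unstack()
)
five_num.columns = ["min", "Q1", "median", "Q3", "max"]
five_num.index = five_num.index.map({0: "female", 1: "male"})
print(five_num.round(1))
```

A grouped box plot of the same column could then be drawn with `demo.boxplot(column="hgt", by="sex")`, so that the numbers in the table can be read against the boxes and whiskers.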
Task 5.
Part A
Select TWO variables from Task 3. Treat each of these as an independent
variable. Now create a scatterplot for each variable.
In each case, the plot should visualize the relationship between the
variable and weight (dependent variable).
Describe each scatterplot in terms of the form, strength, and
direction of the relationship between the variables.
Part B
Examine if the relationship explored in each scatterplot varies by gender.
Hint: You will need to create scatterplots separately for each gender to
answer this question.
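One way to back up the visual description in Task 5 is to compute Pearson's r for the whole sample and then separately for each gender. This is a sketch on synthetic data, with `hgt`, `wgt`, and `sex` as assumed column names; r only measures the strength and direction of a linear relationship, so the form still has to be judged from the scatterplot.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data with a positive linear relationship.
rng = np.random.default_rng(2)
n = 100
sex = np.repeat([0, 1], n // 2)                      # assumed coding: 0 = female, 1 = male
hgt = np.where(sex == 1, 178, 165) + rng.normal(0, 6, n)
wgt = -60 + 0.75 * hgt + rng.normal(0, 5, n)
demo = pd.DataFrame({"sex": sex, "hgt": hgt, "wgt": wgt})

# Correlation for the whole sample.
r_all = demo["hgt"].corr(demo["wgt"])

# Correlation within each gender, to see whether the relationship differs.
r_by_sex = {
    group: part["hgt"].corr(part["wgt"])
    for group, part in demo.groupby("sex")
}

print(round(r_all, 2), {k: round(v, 2) for k, v in r_by_sex.items()})
```

The matching plots could be produced with `demo.plot.scatter(x="hgt", y="wgt")` on the full sample and on each gender's subset.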
Task 6.
Part A
Finally, for each of the variables you focused on in Task 5:
Fit a simple linear regression model that predicts a participant's Weight
based on the variable you selected.
Make sure you generate, interpret, and use the residual plot, the standard
error, and the R^2 to assess the fit of each linear model.
If the model is a good fit, interpret the slope and the y-intercept.
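The quantities named in Part A can be computed directly with NumPy. The sketch below is an illustration on synthetic data, not the notebook's sample code: `fit_simple_linear` is a hypothetical helper, and "standard error" is taken here to mean the residual standard error, sqrt(SSE / (n - 2)), which is an assumption about what the assignment intends.

```python
import numpy as np

# Synthetic stand-in data.
rng = np.random.default_rng(3)
x = rng.normal(171, 9, 100)                    # hypothetical predictor, e.g. height in cm
y = -60 + 0.75 * x + rng.normal(0, 5, 100)     # hypothetical weight in kg


def fit_simple_linear(x, y):
    """Least-squares line y = intercept + slope * x with basic fit diagnostics."""
    slope, intercept = np.polyfit(x, y, deg=1)
    fitted = intercept + slope * x
    residuals = y - fitted
    sse = np.sum(residuals ** 2)               # unexplained variation
    sst = np.sum((y - y.mean()) ** 2)          # total variation
    n = len(x)
    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": 1 - sse / sst,
        "residual_std_error": np.sqrt(sse / (n - 2)),
        "residuals": residuals,
    }


fit = fit_simple_linear(x, y)
print({k: round(float(v), 3) for k, v in fit.items() if k != "residuals"})
```

Plotting `fit["residuals"]` against `x` gives the residual plot: a patternless band around zero supports a linear model, while a curve or a funnel shape suggests the line is not a good fit.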
Part B
If you found that the relationship between weight and the variable you
selected differed for males and females in Task 5 (Part B) then:
Run the regression model for each gender separately and interpret your
findings accordingly.
Assignment Information
Length:
N/A
Weight:
18%
Learning Outcomes Added
CompProgramDesign: Generate working programs in a computer language
that can solve computational problems; find and fix bugs that appear in
them.
Variables: Identify and classify the relevant variables of a system,
problem, or model.
DescriptiveStats: Calculate and interpret descriptive statistics
appropriately.
Correlation: Apply and interpret measures of correlation;
distinguish correlation and causation.
Visualizations: Interpret, analyze, and create data visualizations.
NOTE: The CSV file is attached. Just open the Jupyter notebook and press the
button at the top; it will say something like "take me to Colab", and there
you'll see the questions and everything clearly.
In Task 1 it said that I should replace Name with my name, which I did
before sending it. If you have to redo it on your side, replace it with my
name, which is "sanah".
Please do it in Colab and send me the Colab link afterwards, not Pages or
any other format.
Whoever does it will end up with a different CSV file, because they will
have to rerun the code, so just have them enter my name again in the code
that's provided in Colab.
ASIA_Assignment_1_Spring_2024.ipynb
• REMEMBER: you need to add this csv file to a zip file along with y
IMPORTANT: ONLY RUN THIS CODE BLOCK ONCE.
If you run it a second time, it will generate a new random sample of 10
match your original analyses.
# The code below will generate a random sample of 100
# You need to replace "Name" in the code below with you
# contains a random sample of 100 individuals.
# REMEMBER: you need to submit this csv file in the zi
import pandas as pd

try:
    # Reuse the sample if it was already generated.
    df = pd.read_csv('Name.csv')  # replace Name
except FileNotFoundError:
    original_data = pd.read_csv("https://raw.githubuser…")  # URL is cut off in the original
    df1 = original_data.sample(100)
    df1.to_csv('Name.csv')  # replace Name
    df = pd.read_csv('Name.csv')  # replace Name
    df = pd.DataFrame(df)
    df.to_csv('Name.csv')  # replace Name
df.head()
   Unnamed: 0.1  Unnamed: 0  bia_di  bii_di  bit_di  che_de
0           128         504    35.3    28.7    30.4    17.7
1           390         121    42.1    28.5    33.1    20.2
Then, at the top, it'll say "Open with Google Colab".
Tap it and it'll work.
Yes, send me the Colab link later, along with the link to the CSV file you
get with my name.
Don't use the CSV I've provided; you'll have to generate it again with my name.
https://drive.google.com/file/d/10Gcx9DC5YFUW616keh1E8xHZNRsZPu9F/view?usp=sharing