Question
Working individually and in an entirely reproducible way, please write a paper that involves original work to tell a story with data.
Step 1 is to find a dataset. The data should then be extracted, cleaned, and processed using R. I need you to give me the R code for data processing as scripts, as in the rubric. The analysis dataset should be saved as a parquet file.
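A minimal sketch of what such a processing script might look like, assuming hypothetical raw downloads (the column names and values below are made-up placeholders, not the real Bank of Canada or Statistics Canada series; a real script would read the saved raw files instead of simulating them):

```r
#### Sketch of a data-cleaning script that saves a parquet file ####
# Illustration only: the raw downloads are simulated with made-up numbers.

library(arrow)

# Stand-ins for the raw downloads (hypothetical column names)
raw_rates <- data.frame(
  date = as.Date(c("2019-01-01", "2019-04-01", "2019-07-01")),
  policy_rate = c(1.75, 1.75, 1.75)
)
raw_debt <- data.frame(
  date = as.Date(c("2019-01-01", "2019-04-01", "2019-07-01")),
  household_debt = c(2227, 2251, 2274)  # made-up values
)

# Join the two series on date and drop any rows with missing values
analysis_data <- merge(raw_rates, raw_debt, by = "date")
analysis_data <- analysis_data[complete.cases(analysis_data), ]

# Save the analysis dataset as a parquet file, as the rubric requires
write_parquet(analysis_data, file.path(tempdir(), "analysis_data.parquet"))
```

The raw data would be kept in whatever format it was downloaded in, with only the cleaned analysis dataset written to parquet.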
Step 2 is to use the cleaned dataset to produce the graphs and models in the paper, written in a Quarto file with chunks of R code.
Develop a research question that is of interest to you based on your own interests,
background, and expertise, then obtain or create a relevant dataset.
Research Question: "How do changes in the Bank of Canada's interest rate policy impact
consumer debt levels in Canada?"
This question examines the relationship between central bank policy decisions, specifically adjustments in interest rates, and the amount of debt that consumers carry. It is particularly relevant in an era when household debt levels are a significant concern for economic stability and personal financial health.
So create a model to estimate the relationship between the two (it could be a linear regression model testing whether higher interest rates are associated with more or less consumer debt).
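As a sketch of the kind of model this could be, here is a simple linear regression run on simulated data; the numbers are invented purely to show the mechanics, and the real model would of course be fit on the cleaned dataset:

```r
#### Sketch of a linear regression of consumer debt on the policy rate ####
# Simulated data only: the coefficients below are invented for illustration.

set.seed(853)

n <- 120  # e.g. ten years of monthly observations
policy_rate <- runif(n, min = 0.25, max = 5)

# Simulate debt (in billions) that falls as rates rise, plus noise
household_debt <- 2500 - 40 * policy_rate + rnorm(n, mean = 0, sd = 25)

# Fit the linear regression and inspect the estimated slope
debt_model <- lm(household_debt ~ policy_rate)
summary(debt_model)
```

A negative, statistically significant slope on `policy_rate` would suggest higher rates are associated with lower consumer debt, at least in this simulated world.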
Note: This is a research question I wanted to do, but if you want to do something else with a better-motivated model and fewer confounding factors, you can do that; just keep it economics/finance related.
Do not use a dataset from Kaggle, UCI, or Statista. Mostly this is because everyone
else uses these datasets and so it does nothing to make you stand out to employers,
but there are sometimes also concerns that the data are old, or you do not know the
provenance.
FAQ:
How much should I write? Most students submit something that has 10 to 20 pages of main content, with additional pages devoted to appendices, but it is up to you. Be precise and thorough.
Can I use any model? You are welcome to use any model, but you need to thoroughly explain it, and this can be difficult for more complicated models. Start small. Pick one or two predictors. Once you get that working, then complicate it. Remember that every predictor, and the outcome variable, needs to be graphed and explained in the data section.

Rubric:
R is appropriately cited (0 'No'; 1 'Yes'): Must be properly referred to in the main content and included in the reference list. If not, no need to continue marking; the paper gets 0 overall.

Class paper (0 'No'; 1 'Yes'): Check metadata such as project and folder names, as well as other aspects such as the title. If there is any sign this is a class paper then no need to continue marking; the paper gets 0 overall.

LLM usage is documented (0 'No'; 1 'Yes'): A separate paragraph or dot point must be included in the README about whether LLMs were used, and if so how. If auto-complete tools such as Copilot were used this must be mentioned. If chat tools such as ChatGPT-4 were used then the entire chat must be included in the usage text file. If not, no need to continue marking; the paper gets 0 overall.

Title (0 'Poor or not done'; 1 'Yes'; 2 'Exceptional'): An informative title is included that explains the story, and ideally tells the reader what happens at the end of it. 'Paper X' is not an informative title. There should be no evidence this is a school paper.

Author, date, and repo (0 'Poor or not done'; 2 'Yes'): The author, date of submission in unambiguous format, and a link to a GitHub repo are clearly included. (The latter likely, but not necessarily, through a statement such as: 'Code and data supporting this analysis are available at: LINK'.)

Abstract (0 'Poor or not done'; 1 'Gets job done'; 2 'Fine'; 3 'Great'; 4 'Exceptional'): An abstract is included and appropriately pitched to a non-specialist audience. The abstract answers: 1) what was done, 2) what was found, and 3) why this matters (all at a high level). Likely four sentences. The abstract must make clear what we learn about the world because of this paper.

Introduction (0 'Poor or not done'; 1 'Gets job done'; 2 'Fine'; 3 'Great'; 4 'Exceptional'): The introduction is self-contained and tells a reader everything they need to know, including: 1) broader context to motivate; 2) some detail about what the paper is about; 3) a clear gap that needs to be filled; 4) what was done; 5) what was found; 6) why it is important; 7) the structure of the paper. A reader should be able to read only the introduction and know what was done, why, and what was found. Likely 3 or 4 paragraphs, or 10 per cent of the total.

Estimand (0 'Poor or not done'; 1 'Done'): The estimand is clearly stated in the introduction.

Data (0 'Poor or not done'; 2 'Many issues'; 4 'Some issues'; 6 'Good'; 8 'Great'; 10 'Exceptional'): A sense of the dataset should be communicated to the reader. The broader context of the dataset should be discussed. All variables should be thoroughly examined and explained. Explain whether there were similar datasets that could have been used and why they were not. If variables were constructed then this should be mentioned, and high-level cleaning aspects of note should be mentioned, but this section should focus on the destination, not the journey. It is important to understand what the variables look like by including graphs, and possibly tables, of all observations, along with discussion of those graphs and the other features of these data. Summary statistics should also be included, as well as any relationships between the variables. If this becomes too detailed, then appendices could be used. Basically, for every variable in your dataset that is of interest to your paper there need to be graphs and explanation, and maybe tables.

Measurement (0 'Poor or not done'; 2 'Some issues'; 3 'Good'; 4 'Exceptional'): A thorough discussion of measurement, relating to the dataset, is provided in the data section. Please ensure that you explain how we went from some phenomenon in the world that happened to an entry in the dataset that you are interested in.

Model (0 'Poor or not done'; 2 'Many issues'; 4 'Some issues'; 6 'Good'; 8 'Great'; 10 'Exceptional'): The model should be nicely written out, well explained, justified, and appropriate.

Results (0 'Poor or not done'; 2 'Many issues'; 4 'Some issues'; 6 'Good'; 8 'Great'; 10 'Exceptional'): Results will likely require summary statistics, tables, graphs, images, and possibly statistical analysis or maps. There should also be text associated with all these aspects. Show the reader the results by plotting them where possible. Talk about them. Explain them. That said, this section should strictly relay results. Regression tables must not contain stars.

Discussion (0 'Poor or not done'; 2 'Many issues'; 4 'Some issues'; 6 'Good'; 8 'Great'; 10 'Exceptional'): Some questions that a good discussion would cover include (each of these would be a sub-section of something like half a page to a page): What is done in this paper? What is something that we learn about the world? What is another thing that we learn about the world? What are some weaknesses of what was done? What is left to learn, or how should we proceed in the future?

Cross-references (0 'Poor or not done'; 2 'Yes'): All figures, tables, and equations should be numbered and referred to in the text using cross-references.

Prose (0 'Poor or not done'; 2 'Many issues'; 4 'Good'; 6 'Exceptional'): All aspects of the submission should be free of noticeable typos and spelling mistakes, and be grammatically correct. Prose should be coherent, concise, and clear. Do not use filler phrases such as 'delve into' or 'shed light'. Remove unnecessary words.

Graphs/tables/etc (0 'Poor or not done'; 1 'Gets job done'; 2 'Fine'; 3 'Great'; 4 'Exceptional'): Graphs and tables must be included in the paper and should be well formatted, clear, and digestible. They should: 1) serve a clear purpose; 2) be fully self-contained through appropriate use of captions and sub-captions; 3) be appropriately sized and colored; and 4) have appropriate significant figures, in the case of tables.

Referencing (0 'Poor or not done'; 3 'One minor issue'; 4 'Perfect'): All data, software, literature, and any other relevant material should be cited in-text and included in a properly formatted reference list made using BibTeX. A few lines of code from Stack Overflow or similar would be acknowledged just with a comment in the script immediately preceding the use of the code, rather than here. But larger chunks of code should be fully acknowledged with an in-text citation and appear in the reference list.

Simulation (0 'Poor or not done'; 1 'Gets job done'; 2 'Fine'; 3 'Great'; 4 'Exceptional'): The script is clearly commented and structured. All variables are appropriately simulated.

Tests (0 'Poor or not done'; 1 'Gets job done'; 2 'Fine'; 3 'Great'; 4 'Exceptional'): Data and code tests are appropriately used.

Parquet (0 'Not done'; 1 'Done'): The analysis dataset is saved as a parquet file. (Note that the raw data should be saved in whatever format it came in.)

Reproducibility (0 'Poor or not done'; 1 'Gets job done'; 2 'Fine'; 3 'Great'; 4 'Exceptional'): The paper and analysis should be fully reproducible. The repo should have a detailed README. All code should be thoroughly documented. An R project should be used. Code should be used to do all steps, including appropriately reading the data, preparing it, creating plots, conducting analysis, and generating documents. Seeds should be used where needed. Code should have a preamble and be well documented, including comments and layout. The repo should be appropriately organized and not contain extraneous files. setwd() and hard-coded file paths must not be used.

Code style (0 'Poor or not done'; 1 'Exceptional'): Code is appropriately styled using styler or lintr.

Enhancements (0 'Poor or not done'; 1 'Gets job done'; 2 'Fine'; 3 'Great'; 4 'Exceptional'): You should pick at least one of the following and include it to enhance your submission: 1) a datasheet for the dataset; 2) a model card for the model; 3) a Shiny application; 4) an R package; or 5) an API for the model.

A couple of examples from past assignments:
Code should look like this in the Quarto document:

## Results

### Bacon (per 500 grams)

The cost of bacon fluctuated significantly over the analyzed time period. On average, the cost of bacon increased by 5.39% while the inflation rate increased by 3.04%. The highest price ...

```{r}
#| message: false
#| echo: false
#| warning: false

#### Create a table comparing bacon prices with inflation ####
bacon_table <- grocery_data |>
  select(year, bacon)

bacon_inflation_data <-
  # Specify which tables to merge (this merges by row)
  merge(
    x = bacon_table,
    y = inflation_data,
    by = "year",
    all.x = TRUE
  ) |>
  # Calculate the percent change of bacon prices
  mutate(
    percent_change = round(((bacon - lag(bacon, 1)) / lag(bacon, 1)) * 100, digits = 2)
  ) |>
  # Remove empty rows
  drop_na()
```
After rendering, the Quarto document will look like this:

Analyzing the Relationship between Increased Employment Rates and Higher Permanent Immigrant Inflows*
An Analysis of Canada's Employment Rate and Permanent Immigrant Inflows
19 April 2023
Abstract
This paper utilizes data from the OECD to examine the relationship between Canada's employment
rate and permanent immigrant inflows between 2009 and 2019. The analysis revealed a positive correlation
between the two variables, indicating that as the employment rate increased, so did permanent immigrant
inflows. These findings matter as they highlight the importance of a strong labor market and economic
development in attracting and retaining permanent immigrants. The insights can guide policymakers in
developing policies to attract and retain permanent immigrants.
1 Introduction
Immigration has been a key driver of economic growth and cultural diversity for countries around the
world. Over the past two decades, many countries have opened doors to immigration, recognizing the
significant economic and social benefits that immigration can bring. However, immigration policies and
their implementation have varied across countries, with some facing challenges in attracting and retaining
permanent immigrants. One possible factor that could influence a country's ability to attract and retain
immigrants is its employment rate. High employment rates mean that the country has economic stability and job opportunities, making it a more attractive destination for permanent immigrants. In contrast, low employment rates could signal economic instability, leading to a decrease in permanent immigrant inflows.
In this paper, we will examine the relationship between Canada's employment rate and permanent immigrant
inflows through a linear regression analysis. The estimand is how the employment rate and immigrant inflows are related. Specifically, we will focus on Canada, which has relatively high immigration numbers.
We will draw data from the OECD website (OECD 2023b). Our population of interest is the working-age population, as they represent the potential labor force and have a significant impact on
a country's economic growth and development. Based on the analysis, we found that there is a positive
relationship between employment rate and permanent inflow rate.
While existing literature has examined the impact of economic factors on immigration, this paper specifically
focuses on the relationship between employment rate and permanent immigrant inflows. This exploration can
provide valuable insights for government officials and policymakers in developing policies to attract and retain permanent immigrants and to influence permanent immigrant inflows. In addition, this research can
impact economic development strategies which can benefit the labour market and immigration.
In Section 2, we discuss the source of the data used in this paper, the strengths and weaknesses of the OECD, the methodology that follows from it, and data terminology. In Section 3, we present the results of our analysis, focusing on the trajectory of the employment rate and permanent immigrant inflows over the past 10 years in Canada. In Section 4, we analyze the trend by establishing a linear regression model and present the results of the model in a graph. In the final section, we explore the factors that contribute to permanent immigrant inflows, examine the patterns and trends to highlight similarities and differences in immigration patterns, and discuss bias, ethical concerns, weaknesses, and next steps.
2 Data
2.1 Data Description and Methodology
The data used in this paper are obtained from the OECD (Organization for Economic Co-operation and Development) and are publicly available through the OECD website (OECD 2023b). Founded in 1961, the OECD is an intergovernmental organization with 38 member countries collaborating to develop policy standards to promote sustainable economic growth. The organization's data are widely used by policymakers, researchers, and analysts to understand trends and inform policy decisions. The OECD collects data on the economy, education, employment, the environment, health, tax, trade, GDP, the unemployment rate, and inflation. It keeps monthly, quarterly, and yearly records from the participating countries.
The OECD collects data through member countries, partner organizations, and surveys. One of the primary
sources of data for the OECD is its member countries, which provide data on a regular basis across a wide
range of indicators, including GDP, employment, education, health, and the environment. This data is then
aggregated and analyzed by the OECD to identify trends and inform policy recommendations.
The three datasets that I will be using are: Permanent Immigrant Inflows (OECD 2023c), Employment Rate (OECD 2023a), and Population (OECD 2023d). All of them are specifically for Canada and cover 2009 to 2019. Permanent immigrant inflows cover regulated movements of foreigners considered to be settling in the country from the perspective of the destination country. The data presented are the result of a standardization process in Canada. The number of immigrants recorded in the data was 253 101 in 2009, 262 773 in 2013, and 341 173 in 2019. Employment rates are a measure of the extent to which available labour resources (people available to work) are being used. They are calculated as the ratio of the employed to the working-age population, where the working-age population refers to people aged 15 to 64. The employment rate recorded in the data was 71.5 in 2009, 72.7 in 2013, and 74.6 in 2019. Finally, the population is defined as all nationals present in, or temporarily absent from, a country, and aliens permanently settled in a country. This indicator shows the number of people that usually live in an area. Growth rates are the annual changes in population resulting from births, deaths, and net migration during the year.
Table 1: A summary table of the cleaned data

Country  Year  Employment Rate  Permanent Inflows Rate
CAN      2009  71.50833         0.7526295
CAN      2010  71.55833         0.8271222
CAN      2011  71.97500         0.7259664
CAN      2012  72.31667         0.7442108
CAN      2013  72.71667         0.7490048
CAN      2014  72.50000         0.7375731
CAN      2015  72.74167         0.7726317
CAN      2016  72.67500         0.8217785
CAN      2017  73.56667         0.7838149
CAN      2018  74.02500         0.8661575
CAN      2019  74.60000         0.9073453

Table 1 presents the cleaned dataset, which includes 4 variables and 11 observations in total. The variables in the dataset are Country, Year (in years), Employment Rate (in percentage), and Permanent Inflows Rate (in percentage). The Permanent Inflows Rate was calculated by dividing the Permanent Immigrant Flows by the Total Population of Canada. All percentages are based on the corresponding population for each variable.

In this paper, the analysis is carried out using the statistical programming language R (R Core Team 2020), using the haven and tidyverse (Wickham et al. 2019), devtools (Wickham, Hester, and Chang 2020), and dplyr (Wickham et al. 2021) packages. All figures in the report are generated using ggplot2 (Wickham 2016). We run the model in R (R Core Team 2020) using the rstanarm package (Goodrich et al. 2022).

2.2 Data Visualization

2.2.1 Permanent Immigrant Inflows Rate from 2009 to 2019

[Figure 1: a line graph of the permanent immigrant inflows rate by year, 2009 to 2019.]

Figure 1: Overall increase in Canada's permanent immigration inflow rate from 2009 to 2019, from 0.751 to 0.925.
Figure 1 shows the overall trend of permanent immigrant inflows in Canada between 2009 and 2019. The plot shows an increasing pattern in the permanent immigration inflow rate over the years, with fluctuations. From 2009 to 2019, the permanent immigrant inflow rate increased from 0.751 to 0.925. Although there have been fluctuations in the rate over time, the overall trend has been upward, with a notable rise following a decline from 0.85 to 0.71 in 2011. This trend highlights the increasing importance of immigration as a driver of economic and social growth in OECD member countries. Examining these trends and analyzing the underlying drivers can provide valuable insights into the impacts of immigration on member countries and help inform policy decisions related to immigration and integration.