Olympics Data Analysis and Dashboard:
Power BI Data Preparation and Dashboard Development
Summary
With Power BI "Transform Data," clean and prepare the Olympics data set. Then, with Power BI Desktop,
plan and develop at least 4 professional quality dashboards plus a sheet that allows the user to drill down
to lowest-level details contained in the data, to help someone generally knowledgeable about the
Olympics, explore and learn more about:
a) the Olympics as a whole (dashboard);
b) the Olympics in a specific year and season that the user specifies (winter or summer)
(dashboard);
c) country-level information (user specifies the country) (dashboard);
d) sport/event-level information (specified by the user), including information about individual
athletes (dashboard); and
e) a page in the PBI file that allows the user to interactively drill down to the lowest level details in
the data.
Prepare a brief writeup as part of the PBI file.
Guidance
Read the entire document before starting. Plan to go through two or three iterations of your dashboards.
Start early enough so you can be thinking about how to improve the dashboards and make the document
unified (i.e., common styles, look/feel, user interactions).
Data Files and Making Connections to Data
The zip file olympics.zip contains three CSV files. Two contain information about Olympic Games
performances over many years (olympic_summer.csv, over 220,000 rows; and olympic_winter.csv, over
48,000 rows). These two files have the same schema (format/structure of data columns). The third CSV
file (olympic_countries.csv; about 230 rows) contains country abbreviations and associated country
names (see the Data Dictionary section below). Do not make any changes to the CSV files. Connect to these
files from Power BI. You can certainly look at and explore them in a text editor and/or Excel, but when
you load them to Power BI, connect to them as CSV files. It is best to keep the CSV files and your PBIX file
in the same directory.
It is important to realize that the PBIX file contains connections to data files. The PBIX file format is
designed to both keep this connection information in addition to a copy of the data. However, the “true”
source of the data is the CSV files, and whenever PBI refreshes the data (for example, in the Transform
Data steps), it looks to the connection information. You will likely be working on this from several
different computers. It is important, after copying the PBIX file to a new computer, to also update the
"Data Source" settings in PBI. From the Home menu, pull the arrow down next to Transform Data, and
choose "Data source settings." From there, select each CSV file and update the file location to where the
file resides on your computer. This is the most important during the Transform Data phase (data
preparation). You will not be able to do any Transform Data operations unless PBI can "find" the true
source files. After your data is completely prepared, PBI can actually work with the copy of the data in the
current PBIX file, but if the source data ever changes or you need to go into Transform Data again, you will
need to update the Data Source settings so PBI can find the files.
More information, not required if you follow the steps in the previous paragraph. It is possible to store the
CSV files on a web server or shared resource, e.g., Google Drive, Sharepoint, etc. In my testing with these
files, connections to Google Drive became quite slow due to the file sizes, and Sharepoint adds some
Olympics Data Preparation and Dashboard
Page 1 of 6 complexity that is beyond the current scope of this case. If you do choose to try this approach, make sure
that the source files are readable by anyone...me...without needing a username/password.
Tasks (see also the “Requirements for the Dashboards” section, below)
Note for dashboards: After your clean/prepare the data, you will have 4 tables: summer, winter,
combined, and countries. For the dashboards you will only use fields from the combined table and the
countries table. The purpose of the combined table is so that we can have combined summer & winter
results when desired, and just as easily filter out one season or the other when we don't. Do not create
visuals with the winter and summer tables.
1. Prepare/clean the data (see Data Preparation Guidance, below).
2.
Explore/Experiment. Spend time exploring the data by creating a number of visuals, tables, etc. Create
one or more "Experimental” pages for this. Do enough of this to learn major aspects/insights you want
to explore further, as well as nuances of the data that may not be obvious at first.
3. Overall Dashboad. Create an overall (i.e., top-level) dashboard (name the page "Overall"). The visuals
must show at least the following: medals awarded over time, athletes competing over time, medal
count by country, and athlete count by country. Include card(s) to summarize key values. You may
want to present some of this on the same visual(s). Use slicers and/or filters to allow user to drill into
the data as they choose to. Note that “athletes” competing over time can be interpreted several ways,
e.g., unique people who have competed ever, unique people for each Olympic games, and total number
of entrants into all events (this last one multi-counts athletes as people). Be clear in your visuals (on
this page and all) whether you are talking about unique people, or event entrants.
4. Year/Season Level Dashboard ("Year and Season Persona"). For this dashboard, have the user select
the year(s) and season(s) (use slicers or filters), and your dashboard will then present visuals that
inform the user about that specific year(s)/season(s).
5.
a. Note: For the dashboards in steps 4, 5, and 6, the idea is that the user will choose a specific
year(s)/season(s) (or country(s), or sport(s)/event(s)). The visuals then should show details
about that specific year/season (or country, or sport/event). Therefore, you would not want a
visual, for example, that shows medals over time for the year/season dashboard, as most of the
time the user will just be selecting a single year/season...that would make for a chart with one
data point, i.e., without meaning. Think about what is meaningful, test it out with various settings
as the user would do, and iterate; do not just click the buttons and accept defaults.
Country Level Dashboard ("Country Persona"). Provide a way for the user to select a country (or
countries), and then present information about that country and its participation in the Olympics.
6. Sport and Event Level Dashboard ("Sport and Event Persona"). Provide a way for the user to select a
sport(s) and event(s), and then present information about that sport/event. Within sports, there are
events, so this dashboard is more of a sport- and event-level dashboard. Ideally, the user should be
able to drill down to specific events and see specific information on individual athletes.
7. Drill-down sheet. On this sheet provide a way for the user to intelligently and seamlessly drill down
into the data to any level they wish, even to the individual record level in the data. You will probably
need to experiment with a mix of graphs, decomposition tree, and table/matrix (a combination of a
decomposition tree and a table or matrix is often a good way to provide drill-down capability).
8. Writeup. Create a "Writeup" page. Insert a text box. List/discuss three key insights you gained specific
to the problem context (that is, what did you learn about the Olympics). Then list/discuss three key
things you learned about visualization and/or Power BI in completing the assignment. Target length
for this is the equivalent of a one-page writeup (more than just bullet points...give the main point and
then explain, with examples to illustrate).
9. PBIX and PDF. Save your Power BI file as a PBIX file. Also generate a PDF of your PBI file by using the
File...Export option. You will be submitting both. You do not need to submit the CSV files.
Olympics Data Preparation and Dashboard
Page 2 of 6 Guidelines and suggestions for the dashboards
Spend time on the general layout and formatting of your first (overall) dashboard. That way you can
duplicate the page and make changes for the other dashboards. While obviously you will make
changes for each dashboard, there should be a reasonably common look/feel to your dashboards.
Try to make visuals professional quality, titled/labeled appropriately, with appropriate color
selections, and able to be understood by user without additional explanation. With visualization,
getting something 70% done can be pretty quick, but the remaining 30% of tweaking settings, titles,
colors, alignment, etc. is what often separates a professional-level job from a novice job. Slicers and
filters should add meaningfully to the dashboard's value.
Each dashboard should contain at least 4 visuals (for this count, slicers don't count as a visual, but you
should have one or more slicers also; table/matrix does count as a visual). Each dashboard must have
a brief text header for the title of the dashboard (put this in a text box).
Be precise about language. For example, "athletes" (unique competitors) is different than "event
entries" (all athletic entries in a competition), is different than "medals awarded," and is different than
“medalists” (unique athletes who won at least one medal). For many/most charts, you will need to
change the default titles to ensure the user knows exactly what the chart displays. You can distinguish
between these (depending on the field) by summarizing as a Count versus the Count Distinct option.
Create at least three measures and use them in cards or similar visuals (not required for every
dashboard, but in total).
●
Use at least one of each of the following visuals, across all your pages. This is not for every page, but
taking all your pages together, utilize at least one of each type of the following visuals: timeline (line
or area), ribbon, bar, column, scatter (xy), filled map, treemap, histogram (use data grouping and a
column chart, not a custom visual), matrix, table, card (or multi-value card). Of course, you will use
some visual types more often than others. The above is a requirement to have at least one of those in
the list so you can experiment with the best chart(s) to show particular aspects.
●
On every dashboard, use slicers (you will need several slicers for some pages) and/or page-level
filters. Visual-level filters get added automatically for each visual for you to be able to tweak an
individual visual and you may need to customize settings on some individual visuals. For this
assignment, do not use the "Filters on all pages" capability (this affects all pages in the document, sort
of like filtering out rows in Transform Data would do).
Use the country code and country name intelligently. Do not assume the user will know the country
codes.
Submit
●
All pages must be interactive, that is, clicking on one visual automatically cross-filters/highlights the
others, and similarly for slicers and filters.
PBIX file
PDF file obtained by exporting the PBI file to a PDF file (File...Export).
Data Preparation Guidance
Change the names of the queries to Summer, Winter, and Countries. The default names are probably
inherited from the rather long file identifier in the links provided earlier in this document.
First row as headers. When importing text (e.g., CSV) files, the first row may not be automatically used
for column headers. In Home tab of Query Editor, Use First Row As Headers option is used to promote
the first row as headers.
Winter and Summer Files (do these steps on each file)
O Delete the "Changed Type" step that PBI likely adds automatically. With this data, PBI's default
choices, result in errors for some of the columns (PBI looks at the first 200 rows to decide on
data types, and this causes issues for at least one file here). Deleting this step and reassigning
the appropriate data type (later) works better.
Olympics Data Preparation and Dashboard
Page 3 of 6 ●
●
Create New column named Season. Set to "Summer" for summer table; "Winter" for winter
table. The formula for this column is just = "Summer" or = "Winter", respectively when you are
in the Add Column dialog box. Make sure you do this before appending the queries!
Append the queries.
O Appending queries combines two queries into one by combining the rows; this is why we
needed a Season field in the previous step.
In this stage you will combine winter and summer results. It is critical that you successfully
complete the previous steps before doing this one.
While in Query Editor ("Transform Data" in PBI), select the summer query.
Go to Home Tab → Append Queries → Append Queries as New
In the dialog box, the summer query should already be filled in one field. In the other field,
select the winter query, and hit OK.
Rename the resulting query to be "olympic_combined" and verify that you have all rows from
both summer and winter in the combined query.
O
O
O
O
Clean the data in the olympic_combined query
O
Age, Height, and Weight columns. Replace NA with blank (i.e., nothing; you can usually also
type null without quotation marks). Convert to decimal number. You can do this one column at
a time or select all three columns and do it at once. Query Editor will probably insert the null
value for the empty values which is OK (it uses null to tell you that there is truly nothing
entered, not even a space character). If you try to convert a column containing some text
values to numeric, you will get errors.
Year. Convert to whole number (don't try to convert to date; we're just interested in year and
season so whole number is sufficient).
Season. Convert to text.
O
Medal column. This column contains Gold, Silver, Bronze, or NA. In contrast to some other
columns, NA is not missing data here. It means the athlete did not earn a medal. For clearer
communication, replace this with something like "No Medal."
Add a Conditional Column called Medal_Index. The Medal column is text, but it is really ordinal
categorical data (i.e, categorical data with a natural order from best to worst). We need to be
able to display results in Gold, Silver, Bronze, and "No Medal." To do this, create an index
column based on the Medal column. Specifically, add a conditional column. Assign the value 1
to Gold, 2 to Silver, 3 to Bronze, and 4 to "No Medal" (see screen shot). After closing Query
Editor, we instruct Power BI to sort the Medal column based on the value of the Medal_Index
column (keep reading for instructions).
Close and Apply to return to Power BI
Model View. Create the connection between the "Olympic Code" column in the olympic_countries
table and the "Country" column in the olympic_combined table. Go to the Model Tab and create a 1-to-
many relationship between Olympic_Code in the olympic_countries table and Country in the
olympic_overall table.
Data View. In the Data View for the olympic_combined table, select the Medal column. Choose the
Column Tools tab, and "Sort by Column" dropdown. Sort the Medal column by the value of the
Medal_Index column. This does not immediately sort the Medal column. Rather, it tells Power BI that
whenever Medal is included in a visual, that the medal names will be listed in order according to the
Medal_Index column (you can choose ascending or descending). Check this when you create your first
visual or table listing the medal types and counts.
O
O
Olympics Data Preparation and Dashboard
Page 4 of 6 Conditional Column Creation for Model_Index
●
●
●
●
●
Add Conditional Column
●
Add a conditional column that is computed from the other columns or values.
New column name
Medal_Index
If
Else If
Else If
Else If
Else (0)
123
Column Name
Medal
Medal
Medal
Add Clause
Medal
null
●
Operator
equals
equals
equals
equals
Sport
Event
Year (of Olympics)
City (of Olympics)
Value >
123
123
Gold
Silver
123 Bronze
123 No Medal
Then
Data Dictionary
olympic_summer and olympic_winter files. Each row represents one entry into an event.
Athlete
Gender
PBIX/td-p/800887
Output
12-1
ABC
Then 123
Then 123
Then
2
3
4
ОК
Age (some data is missing; see Data Preparation section)
Height (cm) (some data is missing; see Data Preparation section)
Weight (kg) (some data is missing; see Data Preparation section)
Country abbreviation of athlete
Cancel
X
Medal (Gold, Silver, Bronze, or NA, where NA means athlete competed but did not win a medal. See
guidance in Data Preparation for how to deal with the NA values)
olympic_countries file
Country
Olympic Code (abbreviation of country, compatible with Country in other files)
Making Copy of Report Page, From One File to Another
You will likely be partly working independently on your own files, and then seeking to combine (PBI
Desktop does not have “live” sharing). This will help. You can copy one report page from one file to
another using the following steps:
Files must have the same data source(s). Sources here are the three CSV files.
Add a blank page in your target PBIX file
Go to your source PBIX file, click on the page you want to copy, and with nothing selected on the
report page hit CTRL + A (select all). Then hit CTRL + C (copy).
Go to your target report file and on the blank page, hit CTRL+V
See https://community.powerbi.com/t5/Desktop/Copy-report-page-from-PBIX-to-another-
Notes and Tips
PBI does not have a direct way to allow multiple people to edit the file simultaneously. You may want
to keep your PBIX in a cloud storage folder and make sure everyone has read/write access to that
folder, to better maintain version control. You will still need to update the Data Source settings when
opening the PBIX file on another computer.
Olympics Data Preparation and Dashboard
Page 5 of 6