Search for question
Question

Olympics Data Analysis and Dashboard: Power BI Data Preparation and Dashboard Development Summary With Power BI "Transform Data," clean and prepare the Olympics data set. Then, with Power BI Desktop,

plan and develop at least 4 professional quality dashboards plus a sheet that allows the user to drill down to lowest-level details contained in the data, to help someone generally knowledgeable about the Olympics, explore and learn more about: a) the Olympics as a whole (dashboard); b) the Olympics in a specific year and season that the user specifies (winter or summer) (dashboard); c) country-level information (user specifies the country) (dashboard); d) sport/event-level information (specified by the user), including information about individual athletes (dashboard); and e) a page in the PBI file that allows the user to interactively drill down to the lowest level details in the data. Prepare a brief writeup as part of the PBI file. Guidance Read the entire document before starting. Plan to go through two or three iterations of your dashboards. Start early enough so you can be thinking about how to improve the dashboards and make the document unified (i.e., common styles, look/feel, user interactions). Data Files and Making Connections to Data The zip file olympics.zip contains three CSV files. Two contain information about Olympic Games performances over many years (olympic_summer.csv, over 220,000 rows; and olympic_winter.csv, over 48,000 rows). These two files have the same schema (format/structure of data columns). The third CSV file (olympic_countries.csv; about 230 rows) contains country abbreviations and associated country names (see the Data Dictionary section below). Do not make any changes to the CSV files. Connect to these files from Power BI. You can certainly look at and explore them in a text editor and/or Excel, but when you load them to Power BI, connect to them as CSV files. It is best to keep the CSV files and your PBIX file in the same directory. It is important to realize that the PBIX file contains connections to data files. The PBIX file format is designed to both keep this connection information in addition to a copy of the data. However, the “true” source of the data is the CSV files, and whenever PBI refreshes the data (for example, in the Transform Data steps), it looks to the connection information. You will likely be working on this from several different computers. It is important, after copying the PBIX file to a new computer, to also update the "Data Source" settings in PBI. From the Home menu, pull the arrow down next to Transform Data, and choose "Data source settings." From there, select each CSV file and update the file location to where the file resides on your computer. This is the most important during the Transform Data phase (data preparation). You will not be able to do any Transform Data operations unless PBI can "find" the true source files. After your data is completely prepared, PBI can actually work with the copy of the data in the current PBIX file, but if the source data ever changes or you need to go into Transform Data again, you will need to update the Data Source settings so PBI can find the files. More information, not required if you follow the steps in the previous paragraph. It is possible to store the CSV files on a web server or shared resource, e.g., Google Drive, Sharepoint, etc. In my testing with these files, connections to Google Drive became quite slow due to the file sizes, and Sharepoint adds some Olympics Data Preparation and Dashboard Page 1 of 6 complexity that is beyond the current scope of this case. If you do choose to try this approach, make sure that the source files are readable by anyone...me...without needing a username/password. Tasks (see also the “Requirements for the Dashboards” section, below) Note for dashboards: After your clean/prepare the data, you will have 4 tables: summer, winter, combined, and countries. For the dashboards you will only use fields from the combined table and the countries table. The purpose of the combined table is so that we can have combined summer & winter results when desired, and just as easily filter out one season or the other when we don't. Do not create visuals with the winter and summer tables. 1. Prepare/clean the data (see Data Preparation Guidance, below). 2. Explore/Experiment. Spend time exploring the data by creating a number of visuals, tables, etc. Create one or more "Experimental” pages for this. Do enough of this to learn major aspects/insights you want to explore further, as well as nuances of the data that may not be obvious at first. 3. Overall Dashboad. Create an overall (i.e., top-level) dashboard (name the page "Overall"). The visuals must show at least the following: medals awarded over time, athletes competing over time, medal count by country, and athlete count by country. Include card(s) to summarize key values. You may want to present some of this on the same visual(s). Use slicers and/or filters to allow user to drill into the data as they choose to. Note that “athletes” competing over time can be interpreted several ways, e.g., unique people who have competed ever, unique people for each Olympic games, and total number of entrants into all events (this last one multi-counts athletes as people). Be clear in your visuals (on this page and all) whether you are talking about unique people, or event entrants. 4. Year/Season Level Dashboard ("Year and Season Persona"). For this dashboard, have the user select the year(s) and season(s) (use slicers or filters), and your dashboard will then present visuals that inform the user about that specific year(s)/season(s). 5. a. Note: For the dashboards in steps 4, 5, and 6, the idea is that the user will choose a specific year(s)/season(s) (or country(s), or sport(s)/event(s)). The visuals then should show details about that specific year/season (or country, or sport/event). Therefore, you would not want a visual, for example, that shows medals over time for the year/season dashboard, as most of the time the user will just be selecting a single year/season...that would make for a chart with one data point, i.e., without meaning. Think about what is meaningful, test it out with various settings as the user would do, and iterate; do not just click the buttons and accept defaults. Country Level Dashboard ("Country Persona"). Provide a way for the user to select a country (or countries), and then present information about that country and its participation in the Olympics. 6. Sport and Event Level Dashboard ("Sport and Event Persona"). Provide a way for the user to select a sport(s) and event(s), and then present information about that sport/event. Within sports, there are events, so this dashboard is more of a sport- and event-level dashboard. Ideally, the user should be able to drill down to specific events and see specific information on individual athletes. 7. Drill-down sheet. On this sheet provide a way for the user to intelligently and seamlessly drill down into the data to any level they wish, even to the individual record level in the data. You will probably need to experiment with a mix of graphs, decomposition tree, and table/matrix (a combination of a decomposition tree and a table or matrix is often a good way to provide drill-down capability). 8. Writeup. Create a "Writeup" page. Insert a text box. List/discuss three key insights you gained specific to the problem context (that is, what did you learn about the Olympics). Then list/discuss three key things you learned about visualization and/or Power BI in completing the assignment. Target length for this is the equivalent of a one-page writeup (more than just bullet points...give the main point and then explain, with examples to illustrate). 9. PBIX and PDF. Save your Power BI file as a PBIX file. Also generate a PDF of your PBI file by using the File...Export option. You will be submitting both. You do not need to submit the CSV files. Olympics Data Preparation and Dashboard Page 2 of 6 Guidelines and suggestions for the dashboards Spend time on the general layout and formatting of your first (overall) dashboard. That way you can duplicate the page and make changes for the other dashboards. While obviously you will make changes for each dashboard, there should be a reasonably common look/feel to your dashboards. Try to make visuals professional quality, titled/labeled appropriately, with appropriate color selections, and able to be understood by user without additional explanation. With visualization, getting something 70% done can be pretty quick, but the remaining 30% of tweaking settings, titles, colors, alignment, etc. is what often separates a professional-level job from a novice job. Slicers and filters should add meaningfully to the dashboard's value. Each dashboard should contain at least 4 visuals (for this count, slicers don't count as a visual, but you should have one or more slicers also; table/matrix does count as a visual). Each dashboard must have a brief text header for the title of the dashboard (put this in a text box). Be precise about language. For example, "athletes" (unique competitors) is different than "event entries" (all athletic entries in a competition), is different than "medals awarded," and is different than “medalists” (unique athletes who won at least one medal). For many/most charts, you will need to change the default titles to ensure the user knows exactly what the chart displays. You can distinguish between these (depending on the field) by summarizing as a Count versus the Count Distinct option. Create at least three measures and use them in cards or similar visuals (not required for every dashboard, but in total). ● Use at least one of each of the following visuals, across all your pages. This is not for every page, but taking all your pages together, utilize at least one of each type of the following visuals: timeline (line or area), ribbon, bar, column, scatter (xy), filled map, treemap, histogram (use data grouping and a column chart, not a custom visual), matrix, table, card (or multi-value card). Of course, you will use some visual types more often than others. The above is a requirement to have at least one of those in the list so you can experiment with the best chart(s) to show particular aspects. ● On every dashboard, use slicers (you will need several slicers for some pages) and/or page-level filters. Visual-level filters get added automatically for each visual for you to be able to tweak an individual visual and you may need to customize settings on some individual visuals. For this assignment, do not use the "Filters on all pages" capability (this affects all pages in the document, sort of like filtering out rows in Transform Data would do). Use the country code and country name intelligently. Do not assume the user will know the country codes. Submit ● All pages must be interactive, that is, clicking on one visual automatically cross-filters/highlights the others, and similarly for slicers and filters. PBIX file PDF file obtained by exporting the PBI file to a PDF file (File...Export). Data Preparation Guidance Change the names of the queries to Summer, Winter, and Countries. The default names are probably inherited from the rather long file identifier in the links provided earlier in this document. First row as headers. When importing text (e.g., CSV) files, the first row may not be automatically used for column headers. In Home tab of Query Editor, Use First Row As Headers option is used to promote the first row as headers. Winter and Summer Files (do these steps on each file) O Delete the "Changed Type" step that PBI likely adds automatically. With this data, PBI's default choices, result in errors for some of the columns (PBI looks at the first 200 rows to decide on data types, and this causes issues for at least one file here). Deleting this step and reassigning the appropriate data type (later) works better. Olympics Data Preparation and Dashboard Page 3 of 6 ● ● Create New column named Season. Set to "Summer" for summer table; "Winter" for winter table. The formula for this column is just = "Summer" or = "Winter", respectively when you are in the Add Column dialog box. Make sure you do this before appending the queries! Append the queries. O Appending queries combines two queries into one by combining the rows; this is why we needed a Season field in the previous step. In this stage you will combine winter and summer results. It is critical that you successfully complete the previous steps before doing this one. While in Query Editor ("Transform Data" in PBI), select the summer query. Go to Home Tab → Append Queries → Append Queries as New In the dialog box, the summer query should already be filled in one field. In the other field, select the winter query, and hit OK. Rename the resulting query to be "olympic_combined" and verify that you have all rows from both summer and winter in the combined query. O O O O Clean the data in the olympic_combined query O Age, Height, and Weight columns. Replace NA with blank (i.e., nothing; you can usually also type null without quotation marks). Convert to decimal number. You can do this one column at a time or select all three columns and do it at once. Query Editor will probably insert the null value for the empty values which is OK (it uses null to tell you that there is truly nothing entered, not even a space character). If you try to convert a column containing some text values to numeric, you will get errors. Year. Convert to whole number (don't try to convert to date; we're just interested in year and season so whole number is sufficient). Season. Convert to text. O Medal column. This column contains Gold, Silver, Bronze, or NA. In contrast to some other columns, NA is not missing data here. It means the athlete did not earn a medal. For clearer communication, replace this with something like "No Medal." Add a Conditional Column called Medal_Index. The Medal column is text, but it is really ordinal categorical data (i.e, categorical data with a natural order from best to worst). We need to be able to display results in Gold, Silver, Bronze, and "No Medal." To do this, create an index column based on the Medal column. Specifically, add a conditional column. Assign the value 1 to Gold, 2 to Silver, 3 to Bronze, and 4 to "No Medal" (see screen shot). After closing Query Editor, we instruct Power BI to sort the Medal column based on the value of the Medal_Index column (keep reading for instructions). Close and Apply to return to Power BI Model View. Create the connection between the "Olympic Code" column in the olympic_countries table and the "Country" column in the olympic_combined table. Go to the Model Tab and create a 1-to- many relationship between Olympic_Code in the olympic_countries table and Country in the olympic_overall table. Data View. In the Data View for the olympic_combined table, select the Medal column. Choose the Column Tools tab, and "Sort by Column" dropdown. Sort the Medal column by the value of the Medal_Index column. This does not immediately sort the Medal column. Rather, it tells Power BI that whenever Medal is included in a visual, that the medal names will be listed in order according to the Medal_Index column (you can choose ascending or descending). Check this when you create your first visual or table listing the medal types and counts. O O Olympics Data Preparation and Dashboard Page 4 of 6 Conditional Column Creation for Model_Index ● ● ● ● ● Add Conditional Column ● Add a conditional column that is computed from the other columns or values. New column name Medal_Index If Else If Else If Else If Else (0) 123 Column Name Medal Medal Medal Add Clause Medal null ● Operator equals equals equals equals Sport Event Year (of Olympics) City (of Olympics) Value > 123 123 Gold Silver 123 Bronze 123 No Medal Then Data Dictionary olympic_summer and olympic_winter files. Each row represents one entry into an event. Athlete Gender PBIX/td-p/800887 Output 12-1 ABC Then 123 Then 123 Then 2 3 4 ОК Age (some data is missing; see Data Preparation section) Height (cm) (some data is missing; see Data Preparation section) Weight (kg) (some data is missing; see Data Preparation section) Country abbreviation of athlete Cancel X Medal (Gold, Silver, Bronze, or NA, where NA means athlete competed but did not win a medal. See guidance in Data Preparation for how to deal with the NA values) olympic_countries file Country Olympic Code (abbreviation of country, compatible with Country in other files) Making Copy of Report Page, From One File to Another You will likely be partly working independently on your own files, and then seeking to combine (PBI Desktop does not have “live” sharing). This will help. You can copy one report page from one file to another using the following steps: Files must have the same data source(s). Sources here are the three CSV files. Add a blank page in your target PBIX file Go to your source PBIX file, click on the page you want to copy, and with nothing selected on the report page hit CTRL + A (select all). Then hit CTRL + C (copy). Go to your target report file and on the blank page, hit CTRL+V See https://community.powerbi.com/t5/Desktop/Copy-report-page-from-PBIX-to-another- Notes and Tips PBI does not have a direct way to allow multiple people to edit the file simultaneously. You may want to keep your PBIX in a cloud storage folder and make sure everyone has read/write access to that folder, to better maintain version control. You will still need to update the Data Source settings when opening the PBIX file on another computer. Olympics Data Preparation and Dashboard Page 5 of 6