Question

DSCI 5360_06 Data Visualization for Analytics

Objective: This assignment is designed to assess your ability to prepare, analyze, and visualize data related to airline reviews. You will work with two datasets:
● Top12_airlines_reviews contains 191,123 entries of airline reviews.
● Airline_top12_list includes detailed information on 12 airlines.

Instructions:

1. Data Processing and Transformation
● Assess the datasets for data quality, including completeness, consistency, and accuracy.
● Cleanse the data by addressing missing values, eliminating duplicates, and correcting any anomalies or inconsistencies.
● Transform the data into a format suitable for analysis. This may involve tasks such as normalization, aggregation, or encoding of categorical variables.

2. Data Visualization
● Create insightful visualizations that reveal patterns, trends, and insights within the airline review and airline details datasets.
● Your visualizations should aim to answer specific questions or highlight notable findings related to airline reviews.

3. Visualization and Design Principles
● Identify and elaborate on the visualization and design principles you employed in creating your data visualizations. These principles can be derived from class discussions or reputable online resources. Proper citations for these sources are required.
● Discuss why you selected these principles and how they contribute to the effectiveness of your visualizations. Ensure your visualizations are clear and accurate and communicate their intended insights effectively.

Submission Guidelines: Present your findings and visualizations directly in this Word document together with the .twbx source file. Include screenshots of your data processing steps, visualizations, and other pertinent outputs or analyses. Each student's submission must be distinct. Identical or wrong submissions will result in a score of zero for those involved.

Evaluation Criteria: Your submission will be evaluated based on:
● Accuracy and thoroughness of data processing: Demonstrated skill in cleaning and preparing data for analysis.
● Creativity and relevance of visualizations: Visualizations should be insightful, well-chosen for the dataset, and effectively highlight key insights.
● Application of visualization and design principles: Demonstrated understanding and application of principles that improve the clarity and interpretability of your visualizations.
● Originality: Your submission should reflect your own analysis and insights.

90% and above: Outstanding Performance
You have exhibited an excellent understanding of data analytics and visualization principles. Your work demonstrates thorough data processing and insightful analysis and showcases independent thinking and innovative approaches to visualizing complex data. Your application of visualization and design principles is sophisticated, enhancing the interpretability and impact of your findings. This grade signifies that you have exceeded the basic requirements, contributing unique perspectives and advanced analytical skills to your work.

80%-89%: Above Average
You have shown a strong grasp of data analytics and visualization, with work that meets most of the objectives effectively. There are minor flaws or areas for improvement, such as slight inaccuracies in data processing, analysis, or calculations, or missed opportunities for deeper analysis or more varied visualization techniques. These issues do not significantly detract from your analysis's overall quality and clarity but indicate areas where further refinement could elevate your work. This grade reflects solid competence and understanding, with room for enhancement in precision and creativity.

70%-79%: Satisfactory
Your work indicates a basic understanding of data analytics and visualization, but there are major flaws or missing components that impact the overall effectiveness of your analysis. This may include incomplete data cleaning, significant gaps in analysis, or a lack of clarity in your visualizations. While you have grasped fundamental concepts, there is a need for a more thorough approach and greater attention to detail. This grade suggests that while you have met some objectives, there is considerable room for improvement in both analytical rigor and the application of visualization principles.

Below 70%: Needs Improvement
This grade indicates that you are still developing your data analytics and visualization skills. Your submission may lack a coherent analysis, exhibit poor data processing practices, or fail to apply basic visualization and design principles effectively. It suggests a need for a more foundational understanding of the subject matter, as well as practice in applying these concepts to real-world data. Consider seeking additional resources, guidance, and practice opportunities to enhance your abilities in these areas.

This assignment offers you an opportunity to demonstrate your skills in data analysis and visualization in the context of airline reviews. Approach it with creativity and critical thinking. Good luck!

Tasks

Task 1

Data preparation:
Extract flight route features (departure-destination).
Splitting Columns: Use Tableau calculations to split the flight route into separate columns: Departure and Destination. Ensure these new features accurately represent each flight's start and endpoints.

Clean Data:
Filter Null Values: Identify and remove records with null or missing values in critical columns like Departure, Destination, or cabin class. This ensures the integrity of your analysis.

Visualize Data:
Map Plot: Plot all flights with departure and destination on a map. Use size to show the number of flights. Use color to show the four cabin classes (Business, Economy, First Class, and Premium Economy).

Identify Outliers:
Missing Values on Map: After plotting, review the map for any areas lacking flight routes, which might indicate missing data not previously identified. This could manifest as major airports with significantly fewer connections than expected.
Outlier Detection: Use statistical methods or visual inspection to identify outliers in the number of flights or in the distribution of cabin classes. These could indicate data quality issues or genuinely interesting trends.

Explanations: Take screenshots of your results and explain your visualization process and your findings. Cite references, if any.

Task 2

Data preparation:
Ensure Relevant Columns: Your dataset should include, but not be limited to, columns such as Review Text, Overall Rating, and Aspect Ratings. Add two new binary columns: COVID-19 Mention (Yes/No) and Refund Mention (Yes/No).
COVID-19 Keyword Identification: Use a Tableau calculation to scan Review Text for COVID-related keywords (e.g., pandemic, COVID, coronavirus, virus). This can be achieved through regular expressions or keyword matches.
Classification: Assign a "Yes" value to the COVID-19 Mention column for reviews containing any of the identified keywords; otherwise, mark as "No."
Refund Keyword Identification: Similarly, identify reviews mentioning refunds or cancellations by searching for relevant keywords (e.g., refund, reimburse, cancellation).
Classification: Update the Refund Mention column based on the presence of these keywords, marking it "Yes" for reviews that discuss refunds or cancellations and "No" for those that do not.

Visualizations:
Overall and Aspect Ratings Comparison: Use a combination of bar charts and box plots to compare overall ratings and aspect ratings (such as Cleanliness, Food & Beverage, Value, and Service) between reviews with and without mentions of COVID-19 and refunds.

Bar Charts: Show average ratings for each category, with separate bars for COVID-19 mentions and refund mentions. Use different colors to distinguish between the two.
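
The keyword classification and the per-group averages behind the bar charts can be sanity-checked outside Tableau. The following is a minimal Python sketch of that logic, not the Tableau calculation the task asks for. The column names (`review_text`, `overall_rating`, `covid_mention`, `refund_mention`) and the sample rows are hypothetical; the keyword lists are the examples given in the task.

```python
import re
from statistics import mean

# Keyword patterns taken from the examples in the task text; extend as needed.
COVID_PATTERN = re.compile(r"pandemic|covid|coronavirus|virus", re.IGNORECASE)
REFUND_PATTERN = re.compile(r"refund|reimburse|cancellation", re.IGNORECASE)


def flag(text, pattern):
    """Return "Yes" if the pattern occurs anywhere in the text, else "No"."""
    return "Yes" if text and pattern.search(text) else "No"


def add_mention_flags(reviews):
    """Add the two binary columns to each review record (a dict)."""
    for review in reviews:
        text = review.get("review_text")  # hypothetical column name
        review["covid_mention"] = flag(text, COVID_PATTERN)
        review["refund_mention"] = flag(text, REFUND_PATTERN)
    return reviews


def average_rating_by_flag(reviews, flag_column, rating_column):
    """Average a rating column separately for the "Yes" and "No" groups."""
    groups = {}
    for review in reviews:
        rating = review.get(rating_column)
        if rating is None:  # skip missing ratings rather than treat them as 0
            continue
        groups.setdefault(review[flag_column], []).append(rating)
    return {label: mean(values) for label, values in groups.items()}


# Made-up sample rows, for illustration only (not from the dataset).
sample = [
    {"review_text": "Flight cancelled due to COVID, still no refund.", "overall_rating": 2},
    {"review_text": "Great crew and a comfortable seat.", "overall_rating": 9},
    {"review_text": "Refund took months to arrive.", "overall_rating": 3},
    {"review_text": None, "overall_rating": 7},
]
add_mention_flags(sample)
covid_means = average_rating_by_flag(sample, "covid_mention", "overall_rating")
refund_means = average_rating_by_flag(sample, "refund_mention", "overall_rating")
```

The match is a case-insensitive substring search, so "virus" also matches inside "coronavirus", while "cancellation" does not match "cancelled". If cancellations should be caught more broadly, a shorter stem such as "cancel" is one option, at the cost of more false positives.
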
Box Plots: Provide a distribution view of ratings, which can help identify patterns, outliers, and the spread of ratings in each category.

Color Coding: Apply intuitive color coding to your visualizations to enhance readability. For instance, use green for positive outcomes (e.g., reviews without COVID-19 mentions showing higher satisfaction) and red for negative outcomes (e.g., lower ratings in reviews mentioning refunds).

Explanations: Take screenshots of your results and explain your visualization process and your findings. Cite references, if any.

Task 3

Data preparation:
Date Segmentation: Ensure your dataset includes a 'Review Date' field. Use this to categorize reviews into 'Pre-Pandemic' (before March 11, 2020) and 'During Pandemic' (from March 11, 2020, onwards).
Aspect Ratings: Confirm that your dataset includes ratings for eight aspects of the airline experience, such as Cleanliness, Service, Seat Comfort, and Value.

Visualizations:
Use a chart of your choice to show the trend of average aspect ratings over time, with a separate line for each aspect. This can help identify any significant changes in ratings before and during the pandemic.
A bar chart or grouped bar chart can effectively compare the average aspect ratings between the pre-pandemic and during-pandemic periods.
Consider using a monthly or quarterly granularity to smooth out short-term fluctuations and better visualize long-term trends.

Explanations: Take screenshots of your results and explain your visualization process and your findings. Cite references, if any.

Task 4

Variable Selection: Choose two variables relevant to your research question or business problem. For instance, if you analyze retail data, you might examine the correlation between advertising spending and sales revenue. Ensure both variables are quantitative, as correlation analysis requires numerical data to compute the relationship strength and direction.
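
To see why both variables must be numeric, it helps to look at how a correlation coefficient is computed: every step relies on means and deviations from the mean. The sketch below is a plain-Python illustration of the Pearson coefficient under that reading. The variable names and sample values are made up, and in practice Tableau's CORR aggregate or a statistics library would typically be used instead.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys):
        raise ValueError("both variables need the same number of observations")
    if len(xs) < 2:
        raise ValueError("at least two observations are required")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance numerator and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


# Made-up ratings for illustration only (not taken from the dataset).
seat_comfort = [1, 2, 3, 4, 5]
overall = [2, 4, 5, 4, 5]
r = pearson_r(seat_comfort, overall)
```

For these sample values `r` is about 0.77, a fairly strong positive relationship. A categorical column such as cabin class has no mean or deviation, which is why it cannot be passed to this calculation without first being encoded numerically.
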
Data Preparation: Make sure your dataset is clean, with no missing values or outliers that could skew the results. Use data cleaning techniques to prepare your dataset for analysis. Consider normalizing the data if the variables are on very different scales or if one variable differs significantly in magnitude from the other.

Correlation Analysis: Use Tableau or any statistical tool to calculate the Pearson correlation coefficient if the data is normally distributed. This will give you a value between -1 and 1, indicating the strength and direction of the relationship.

Visualizations: A scatter plot is the most direct way to visualize the relationship between two quantitative variables. Plot one variable on the x-axis and the other on the y-axis. Add a trend line to the scatter plot to visualize the direction and strength of the relationship. Most visualization software, including Tableau, can calculate and display this automatically.

Explanations: Take screenshots of your results and explain your visualization process and your findings. Cite references, if any.

Task 5

Identify areas for exploration:
Review Existing Analysis: Review your current findings and identify gaps or areas that might benefit from further exploration.
Industry Trends: Investigate recent trends or challenges in your project's domain. For instance, if your project is about e-commerce sales, you might explore consumer behavior changes due to external factors like economic shifts or seasonal trends.
Stakeholder Interests: Consider the interests of stakeholders or potential users of your dashboard. What additional information could help them make informed decisions?

Propose new questions:
Based on the identified areas for exploration, propose new questions that your dashboard could answer. Here are examples based on various domains:
Retail Sales: How do sales trends vary by region, and what products are most popular in each region?
Healthcare: What are the trends in patient satisfaction scores across different departments, and how do they correlate with staff levels?
Education: How does student performance vary across subjects, and what is the correlation between attendance and performance?
E-commerce: What are the patterns in customer acquisition costs over time, and how do these costs relate to customer lifetime value?

Dashboard design:
Multiple Visualizations: Design your dashboard to include various types of visualizations that together answer the new questions. For example, use line charts for trend analysis, bar charts for comparisons, and scatter plots for correlations.
Interactivity: Implement filters, selectors, and hover-over details to allow users to interact with the dashboard and explore the data in depth. This could include filtering by time period, geographical area, or other relevant dimensions.
Logical Layout: Organize the dashboard logically, grouping related visualizations near each other and ordering the visualizations to guide the viewer through your analysis.
Highlight Key Insights: Use the dashboard to draw attention to the most important findings in relation to the new questions. This could be through annotated trends, highlighted outliers, or summary statistics.

Explanations: Take screenshots of your results and explain your process for building the visualizations and dashboard, and your findings. Cite references, if any.