Question

Stat Slam: Dunking into NBA Data
A Slam Dunk Analysis of Basketball Player Performance

Description of Problem Context
In the fast-paced world of professional basketball, knowing what makes players successful on the court is crucial. With so many stats available, it is hard to tell which ones really matter. That's where Stat Slam comes in. We're diving into the 2022-2023 NBA regular season data to uncover which factors influence player performance and team success.

Motivation for this Problem
Libby: I am interested in learning more about professional basketball. It is one of the few professional sports I do not follow, and I think this project would be a fantastic opportunity to see if I like sports data analytics.
Mika: As a high schooler I had a summer job keeping stats for Nike's Elite Youth Basketball tournaments around the country. While doing that, I always wondered which statistics help teams win the most. With my previous experience around the game of basketball, I believe this project will be fun for me to dive into.

Functionality 1: Player Stats and Minutes Played
Analyzing NBA players' performance goes beyond basic stats. This functionality uses player data to predict rebounds, assists, points, and steals per game from average minutes played. We will build a Random Forest Regression model using Java libraries. Unlike linear regression, a Random Forest can capture a non-linear relationship between minutes played and these stats, if one exists. Factors such as talent and opponent strength also influence performance, so the model gives a data-driven estimate of a player's expected output based on average minutes played alone.

Functionality 2: Win Share Data for Players
We will use a multi-step approach in Java to estimate a player's Win Shares (WS), a metric of their contribution to winning. First, we'll create new features by combining existing stats (FG%, rebounds, etc.) to capture aspects such as scoring efficiency or rebounding rate that might influence wins. Then, we'll build a Random Forest Regression model. The model will be trained on existing player data, but rather than taking recorded Win Shares as the target, it will learn the relationship between the stats and Win Shares values computed from a formula such as the Basketball-Reference calculation (adapted for basketball by Justin Kubatko from Bill James's baseball metric). By learning these patterns, the model can predict Win Shares for new players based solely on their provided stats. Remember that this is only an estimate: a player's true impact on winning depends on various factors beyond individual stats.

Supporting methodologies, steps, functionalities

Data Cleaning and Preprocessing: To ensure the reliability and accuracy of our analysis, we will clean and preprocess the data before modeling. This includes handling missing values, removing duplicates, and standardizing formats so that the data is clean and tidy.

Interface Development using Vaadin: Vaadin will be used to build an interactive user interface for Stat Slam. The interface will feature a grid displaying NBA player statistics, giving users an overview of the dataset. Additionally, there will be two text boxes, one of which will let users search the grid for a player of interest. Two buttons, one per functionality, will allow users to generate the models.

Utilization of Random Forest: For Stat Slam's analysis of NBA player performance, we will use Java libraries to implement Random Forest Regression models for the two functionalities. The first model predicts player metrics such as rebounds, assists, points, and steals per game from average minutes played, using the preprocessed data for reliability. The second estimates Win Shares using a multi-step approach: feature engineering to capture performance aspects, followed by training a Random Forest Regression model on existing player data.
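The feature-engineering step described under Functionality 2 could be sketched roughly as follows. This is only an illustration, not the project's actual code: the `PlayerLine` record and its field names are hypothetical, and the two derived features (true shooting percentage as a measure of scoring efficiency, and rebounds per 36 minutes as a rebounding rate) are assumed choices, since the proposal does not say which combinations will be used.

```java
// Hypothetical per-game stat line; field names are illustrative and are not
// taken from the dataset's actual column headers.
record PlayerLine(String name, double minutes, double points, double rebounds,
                  double fga, double fta) {}

final class FeatureSketch {

    // Scoring efficiency as true shooting percentage:
    // PTS / (2 * (FGA + 0.44 * FTA)).
    // Returns 0 when the player has no shot attempts, to avoid dividing by zero.
    static double trueShooting(PlayerLine p) {
        double attempts = p.fga() + 0.44 * p.fta();
        return attempts == 0.0 ? 0.0 : p.points() / (2.0 * attempts);
    }

    // Rebounding rate scaled to 36 minutes, so that players with different
    // playing time can be compared on the same footing.
    static double reboundsPer36(PlayerLine p) {
        return p.minutes() == 0.0 ? 0.0 : p.rebounds() / p.minutes() * 36.0;
    }

    // Feature vector that a regression model could take as one input row.
    static double[] features(PlayerLine p) {
        return new double[] { p.minutes(), trueShooting(p), reboundsPer36(p) };
    }

    public static void main(String[] args) {
        // Made-up numbers, used only to show the calls.
        PlayerLine example = new PlayerLine("Example Player", 30.0, 20.0, 10.0, 15.0, 5.0);
        double[] f = features(example);
        System.out.printf("TS%%=%.3f, REB/36=%.1f%n", f[1], f[2]);
    }
}
```

Keeping each derived stat in its own small method makes it easy to add or drop features while experimenting with which ones help the Win Shares model.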
Our Data Set: https://www.kaggle.com/datasets/vivovinco/20222023-nba-player-stats-regular (regular season only, not playoffs)
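As a rough sketch of the cleaning steps named above (handling missing values, removing duplicates, standardizing formats), rows read from a delimited export of this dataset might be processed along these lines. Several things here are assumptions rather than facts about the file: the delimiter and the column positions are passed in as parameters, blank stat cells are replaced with "0", and a duplicate is taken to mean a repeated player-and-team pair. The `CleaningSketch` class is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

final class CleaningSketch {

    // Cleans raw delimited rows:
    //  - trims every cell and upper-cases the team code (standardizing formats),
    //  - replaces blank stat cells with "0" (an assumed way to handle missing values),
    //  - skips rows with no player name or too few columns,
    //  - keeps only the first row for each player/team pair (removing duplicates).
    static List<String[]> clean(List<String> rawLines, String delimiter,
                                int nameCol, int teamCol) {
        Map<String, String[]> unique = new LinkedHashMap<>();
        for (String line : rawLines) {
            if (line.isBlank()) continue;
            // Limit -1 keeps trailing empty cells so missing values stay visible.
            String[] cells = line.split(Pattern.quote(delimiter), -1);
            if (cells.length <= Math.max(nameCol, teamCol)) continue;
            for (int i = 0; i < cells.length; i++) {
                cells[i] = cells[i].trim();
                if (cells[i].isEmpty() && i != nameCol && i != teamCol) {
                    cells[i] = "0";
                }
            }
            if (cells[nameCol].isEmpty()) continue;
            cells[teamCol] = cells[teamCol].toUpperCase(Locale.ROOT);
            String key = cells[nameCol].toLowerCase(Locale.ROOT) + "|" + cells[teamCol];
            unique.putIfAbsent(key, cells);
        }
        return new ArrayList<>(unique.values());
    }

    public static void main(String[] args) {
        // Made-up rows: a row with a blank stat, a duplicate, and a row with no name.
        List<String> raw = List.of(
            "Example Player; abc ;30.5;",
            "Example Player;ABC;30.5;0.45",
            ";XYZ;12.0;0.50");
        for (String[] row : clean(raw, ";", 0, 1)) {
            System.out.println(String.join(";", row));
        }
    }
}
```

Whether a blank cell should become zero, be imputed some other way, or cause the row to be dropped depends on the column, so that rule is worth revisiting once the real file has been inspected.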