Heads Up: Newly Awarded!

I’m truly honored to have won the Grand Prize twice at the 2025 Creative Competition!
I’m deeply grateful to my amazing teammates and the judges for their encouragement.
Here’s a short introduction to the projects we presented.

Analyzing Performance Using Advanced Data Analytics in Sports
GOLD AWARD: recognized at the AI-JAM ASIA Online Competition, held in August 2025 in Silicon Valley, USA.

Abstract: This research examines the batting and pitching statistics of the 2022 World Series to uncover insights into player performance and team success. Using three datasets (general batting statistics, advanced batting statistics, and general pitching statistics), we explore the factors that influenced outcomes in this prestigious series. The analysis begins with general batting statistics, including batting average, on-base percentage (OBP), slugging percentage (SLG), and runs batted in (RBI). These traditional metrics are evaluated for their correlations with team success, indicating their impact on game results and on the championship. We then investigate advanced batting metrics such as weighted on-base average (wOBA), isolated power (ISO), and wins above replacement (WAR), which offer a more nuanced view of offensive capability and help identify standout players beyond traditional statistics. For pitching, we analyze earned run average (ERA), strikeouts (K), walks (BB), and hits allowed (H) to shed light on the factors behind exceptional pitching performances, which are crucial in deciding World Series outcomes. By studying individual and team performances throughout the series, we seek to uncover the trends and strategic decisions that shaped matchups and influenced the eventual champion. The results are intended to contribute to baseball research and to offer insights for teams, coaches, and analysts seeking to maximize performance in future World Series. More broadly, this research aims to deepen understanding of the event, enhance appreciation for players' skills, and inspire further investigation into the evolving field of baseball analytics.
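
The traditional batting rates named above have standard definitions. As a minimal sketch (not the authors' code), the Python below computes batting average, OBP, SLG, and ISO from counting stats; the `BattingLine` record and its field names are hypothetical, and the sample line is invented for illustration. wOBA and WAR are left out because their weights and adjustments vary by season and by data provider.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Hypothetical counting-stat record for one hitter (field names are illustrative)."""
    ab: int       # at-bats
    h: int        # hits (singles + doubles + triples + home runs)
    doubles: int
    triples: int
    hr: int       # home runs
    bb: int       # walks
    hbp: int      # hit by pitch
    sf: int       # sacrifice flies


def batting_average(b: BattingLine) -> float:
    # AVG = H / AB
    return b.h / b.ab if b.ab else 0.0


def on_base_percentage(b: BattingLine) -> float:
    # OBP = (H + BB + HBP) / (AB + BB + HBP + SF)
    denom = b.ab + b.bb + b.hbp + b.sf
    return (b.h + b.bb + b.hbp) / denom if denom else 0.0


def slugging_percentage(b: BattingLine) -> float:
    # SLG = total bases / AB; a single is 1 base, double 2, triple 3, home run 4
    singles = b.h - b.doubles - b.triples - b.hr
    total_bases = singles + 2 * b.doubles + 3 * b.triples + 4 * b.hr
    return total_bases / b.ab if b.ab else 0.0


def isolated_power(b: BattingLine) -> float:
    # ISO = SLG - AVG, i.e. extra bases per at-bat
    return slugging_percentage(b) - batting_average(b)


if __name__ == "__main__":
    line = BattingLine(ab=20, h=6, doubles=1, triples=0, hr=2, bb=3, hbp=0, sf=1)
    print(f"AVG {batting_average(line):.3f}  OBP {on_base_percentage(line):.3f}  "
          f"SLG {slugging_percentage(line):.3f}  ISO {isolated_power(line):.3f}")
```

For the invented line of 20 at-bats, 6 hits (1 double, 2 home runs), 3 walks, and 1 sacrifice fly, this gives an AVG of .300, OBP of .375, SLG of .650, and ISO of .350.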

I. INTRODUCTION

In professional baseball, the World Series stands as a pinnacle of sporting excellence and a platform where the game's most talented athletes showcase their skill and determination. As the culmination of a grueling season, this prestigious event captivates fans worldwide with its blend of athleticism, strategy, and team camaraderie.

With the advent of advanced data analytics in sports, we now have an opportunity to examine the intricacies of baseball in depth and gain insight into player performance. This research works through the numbers and narratives of the 2022 World Series, seeking to explain the artistry and precision behind America's beloved pastime. By analyzing general batting statistics, advanced batting statistics, and general pitching statistics, we aim to uncover the stories behind the data that defined the outcomes of pivotal games. This data-driven approach lets us explore the impact of each player's contributions on the field, from their swings and strategic decisions to their pitching technique. Beyond a comprehensive analysis of player performance, this research also seeks to honor the dedication and hard work of these athletes. By unraveling the stories behind the statistics, we hope to deepen appreciation for the grit and perseverance players display as they strive for victory in one of the most demanding competitions in sport.

Beyond its immediate significance, this investigation may inspire further research in baseball analytics. The insights it yields could inform new strategies, refined training methods, and better player evaluation techniques, and we hope they encourage continued discussion among baseball enthusiasts, coaches, and analysts. Through this research, we seek to contribute to a broader understanding of baseball's grandest event and to celebrate the brilliance of the players, the thrill of the competition, and the remarkable moments that define the World Series as an emblem of sportsmanship and excellence.

II. RELATED WORKS AND BACKGROUND

Baseball has long been an integral part of American culture and sports history, captivating millions of fans with its blend of athleticism, strategy, and team dynamics [1]. The pinnacle of the baseball season is the World Series, the annual championship series of Major League Baseball (MLB). Since its inception in 1903, the World Series has become an iconic event that draws immense attention worldwide [2]. In recent years, the landscape of baseball has changed dramatically with the advent of advanced data analytics and technology. The rise of baseball analytics, often called "sabermetrics," has transformed how the sport is understood and played [3] [4]. By analyzing extensive data, teams, coaches, and analysts can gain detailed insight into player performance, team dynamics, and strategic decision-making, and this data-driven approach has changed how players are evaluated, how strategies are devised, and how training methods are developed. The World Series is an ideal stage for applying baseball analytics [5], where data-driven insights can give teams a competitive edge by supporting informed decisions and optimizing player performance [6]. Fans, too, have developed a deep interest in the statistics and narratives that define the World Series and shape the outcomes of critical games.

Numerous studies have examined baseball analytics, focusing on player performance, strategy, and team dynamics. Researchers have applied a wide range of statistical models and machine learning techniques to player data to uncover previously hidden patterns [7]. In the context of the World Series, previous research has explored the impact of key players on overall team performance. Studies have examined how batting statistics such as on-base percentage (OBP) and slugging percentage (SLG) influence a team's chances of success in crucial games, while other work has investigated the role of pitching statistics such as earned run average (ERA) and strikeout-to-walk ratio (K/BB) in determining the outcomes of pivotal matchups. Researchers have also explored the psychological side of the World Series, investigating how pressure and situational awareness affect player performance; by examining data from past World Series games, they have sought to understand how players respond in high-stakes situations and whether certain mental attributes contribute to success.

While previous studies have provided valuable insight into the dynamics of baseball, there remains ample room for further exploration, particularly for the 2022 World Series [8]. Because each World Series presents unique challenges and narratives, analytical approaches must be adapted and refined to yield fresh perspectives and meaningful conclusions. In this paper, we build on the existing literature with an in-depth analysis of general batting statistics, advanced batting statistics, and general pitching statistics from the 2022 World Series. Using a data-driven approach, we seek to shed new light on player performances, strategic nuances, and the underlying factors that contributed to one team's triumph over the other.
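
The pitching measures mentioned here, ERA and K/BB, also have standard definitions. The following is a minimal sketch with hypothetical function names, assuming innings are tracked as outs recorded so that box-score notation such as "6.1" (meaning 6 1/3 innings) is handled exactly rather than read as a decimal.

```python
def outs_from_ip_notation(ip: str) -> int:
    """Convert box-score innings notation ("6.1" = 6 1/3 innings) to outs recorded."""
    whole, _, thirds = ip.partition(".")
    if thirds not in ("", "0", "1", "2"):
        raise ValueError(f"invalid innings notation: {ip!r}")
    return int(whole) * 3 + int(thirds or 0)


def era(earned_runs: int, outs: int) -> float:
    """Earned run average: 9 * ER / IP, where IP = outs / 3."""
    if outs == 0:
        # No innings pitched: infinite if runs scored, otherwise undefined
        return float("inf") if earned_runs > 0 else float("nan")
    return 9 * earned_runs / (outs / 3)


def k_bb_ratio(strikeouts: int, walks: int) -> float:
    """Strikeout-to-walk ratio; infinite when no walks were issued."""
    if walks == 0:
        return float("inf") if strikeouts > 0 else float("nan")
    return strikeouts / walks
```

For example, an outing of "6.1" innings (19 outs) with 2 earned runs, 7 strikeouts, and 2 walks gives `era(2, 19)` of about 2.84 and `k_bb_ratio(7, 2)` of 3.5.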

III. DATA / METHODS

To ensure the dataset's reliability and accuracy, we collected data from reputable sources such as Sports Reference and Baseball Savant. Before the analysis, we cleaned the data to maintain its integrity and consistency: we removed irrelevant or incomplete data points and resolved inconsistencies, so that the information used for analysis is reliable. To perform the analysis, we used "magic canAI," a Python-based Exploratory Data Analysis (EDA) platform that allows us to explore the data comprehensively and uncover patterns, trends, and relationships within it. With its aid, we aim to uncover the stories behind the batting and pitching statistics of the 2022 World Series. By adopting a multidimensional approach and employing various statistical techniques, we seek a deeper understanding of the players' performances, strategic plays, and other factors that shaped the outcomes of the series. Through this data-driven methodology, we intend to identify key performance indicators, recognize patterns of success, and highlight potential areas of improvement for both teams. The resulting insights are intended to give a comprehensive picture of the dynamics that shaped the 2022 World Series, to encourage further research in baseball analytics, and to elevate appreciation for the players' skill, dedication, and contributions to the sport.
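
The cleaning step is described only in general terms, so the following is a hypothetical sketch of what removing incomplete data points and resolving inconsistencies can look like in plain Python. The column names (`player`, `team`, `launch_angle`, `avg_distance`), the team-name aliases, and the choice to drop rather than impute missing values are all assumptions, not details of the authors' pipeline or of the magic canAI platform.

```python
from typing import Optional

# Hypothetical required columns for a batted-ball summary row
REQUIRED = ("player", "team", "launch_angle", "avg_distance")

# Hypothetical aliases used to reconcile inconsistent team labels
TEAM_ALIASES = {"HOU": "Houston", "Astros": "Houston",
                "PHI": "Philadelphia", "Phillies": "Philadelphia"}


def to_float(value) -> Optional[float]:
    """Parse a numeric field, returning None for blanks or non-numeric text."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def clean_rows(rows: list[dict]) -> list[dict]:
    """Drop incomplete rows, normalize team names, and remove exact duplicates."""
    cleaned, seen = [], set()
    for row in rows:
        if any(row.get(col) in (None, "") for col in REQUIRED):
            continue  # incomplete record: skip rather than impute
        angle = to_float(row["launch_angle"])
        dist = to_float(row["avg_distance"])
        if angle is None or dist is None:
            continue  # non-numeric value in a numeric column
        team = TEAM_ALIASES.get(row["team"].strip(), row["team"].strip())
        record = {"player": row["player"].strip(), "team": team,
                  "launch_angle": angle, "avg_distance": dist}
        key = tuple(record.values())
        if key in seen:
            continue  # exact duplicate once labels and types are normalized
        seen.add(key)
        cleaned.append(record)
    return cleaned
```

Normalizing team labels and numeric types before de-duplicating matters here: two rows that differ only in "HOU" versus "Astros", or "17" versus 17, collapse into one record.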

IV. RESULTS

[Figure 1. Batted ball event comparison: Philadelphia vs. Houston, 2022 World Series]

In Figure 1, the comparison of batted ball events between the Houston Astros and the Philadelphia Phillies reveals differences in the teams' offensive approaches during the 2022 World Series. The graph shows the density of batted ball events for each team across different pitch velocities. Houston displays a notably higher density of batted ball events than Philadelphia, which suggests that the Astros were more aggressive in putting the ball in play and more proactive in capitalizing on pitch opportunities.

Within the velocity range of 360 to 500, the graph shows a particularly pronounced density for both teams, suggesting that both were especially effective at generating batted ball events against pitches in this band. This could imply that hitters on both teams handled pitches in this range well, resulting in a higher frequency of batted ball events. Further analysis of batted ball event densities at specific velocity intervals could reveal each team's strategic preferences and player strengths; for instance, examining how the density varies across pitch velocities could identify the pitch types each team's hitters excel at hitting or struggle against. Likewise, regions of the graph with lower density could highlight potential weaknesses in each team's offense, presenting opportunities for adjustment in future games.
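
Density curves like those in Figure 1 are typically produced by kernel density estimation (KDE). The paper does not state which estimator or bandwidth its EDA tool uses, so the sketch below assumes a Gaussian kernel with the normal-reference rule-of-thumb bandwidth (h = 1.06 · s · n^(-1/5)); the function names are hypothetical, and the input lists stand for each team's sample of whatever quantity is on the horizontal axis.

```python
import math
import statistics
from typing import Callable, Optional


def gaussian_kde(samples: list[float],
                 bandwidth: Optional[float] = None) -> Callable[[float], float]:
    """Return a function x -> estimated density using a Gaussian kernel."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    if bandwidth is None:
        # Normal-reference rule of thumb: h = 1.06 * s * n^(-1/5)
        bandwidth = 1.06 * statistics.stdev(samples) * n ** (-1 / 5)
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive (are all samples identical?)")
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x: float) -> float:
        # Average of Gaussian bumps centred on each sample
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


def compare_densities(team_a: list[float], team_b: list[float],
                      grid: list[float]) -> list[tuple[float, float, float]]:
    """Evaluate both teams' KDEs on a shared grid: (x, density_a, density_b)."""
    fa, fb = gaussian_kde(team_a), gaussian_kde(team_b)
    return [(x, fa(x), fb(x)) for x in grid]
```

Evaluating both curves on one shared grid makes a statement such as "Houston shows a higher density in this range" checkable point by point. The shape of each curve depends noticeably on the bandwidth, so it is worth reporting alongside any density plot.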

[Figure 2. Average distance comparison: Philadelphia vs. Houston, 2022 World Series]

Figure 2 shows the average distance of batted balls for Houston and Philadelphia, and the density of data points differs notably between the two teams. The density of batted ball distances for Houston appears markedly higher than that of Philadelphia, indicating possible differences in hitting ability, power, or strategy between the two teams during the 2022 World Series.

In the range of 176 to 200 feet, the distribution is more concentrated, signifying a consistent trend in batted ball distances within this range for both teams. This concentration may reflect a particular approach to hitting or pitching during the series, and it suggests that a considerable number of batted balls from both teams consistently reached the outfield, possibly influencing defensive positioning and fielding strategies. Average batted ball distance is only one of many measures used in baseball analytics; combined with other metrics such as exit velocity, launch angle, and barrel rate, it contributes to a more comprehensive assessment of a team's or player's offensive performance. Analysts and teams use these metrics to understand players' strengths and weaknesses, develop game strategies, and make data-driven decisions.

In summary, the differences in average batted ball distance between Houston and Philadelphia in the 2022 World Series point to distinct hitting patterns or strategies. Further analysis of player performance and pitching matchups, in context, is needed to interpret these findings and their impact on the games. Combining statistical analysis with baseball expertise lets us examine the intricacies of the sport and appreciate the interplay between players, strategies, and outcomes on the field.
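
Statements such as "a more concentrated distribution between 176 and 200" can be quantified directly as the share of each team's observations that fall inside that window. A minimal sketch, assuming each team's data is a plain list of distances in the dataset's own unit; the function names are hypothetical and the demo values are placeholders, not 2022 World Series data.

```python
from statistics import mean


def share_in_window(values: list[float], low: float, high: float) -> float:
    """Fraction of values with low <= v <= high (0.0 for an empty list)."""
    if not values:
        return 0.0
    return sum(low <= v <= high for v in values) / len(values)


def summarize_distances(teams: dict[str, list[float]], low: float = 176.0,
                        high: float = 200.0) -> dict[str, dict[str, float]]:
    """Per-team mean distance and share of observations inside [low, high]."""
    return {
        team: {"mean": mean(vals), "share_in_window": share_in_window(vals, low, high)}
        for team, vals in teams.items()
        if vals  # skip teams with no observations
    }


if __name__ == "__main__":
    # Placeholder values for illustration only
    demo = {"Houston": [170.0, 182.0, 190.0, 199.0],
            "Philadelphia": [160.0, 178.0, 205.0, 210.0]}
    print(summarize_distances(demo))
```

In the placeholder data, Philadelphia has the higher mean (188.25 vs. 185.25) yet a smaller share inside the window (0.25 vs. 0.75). A concentration claim and an average claim therefore need separate checks.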

[Figure 3. Launch angle comparison: Philadelphia vs. Houston, 2022 World Series]

The launch angle data from the 2022 World Series offer insight into the hitting performance of the Houston team, which contributed to their victory. Examining the distribution of launch angles of batted balls during the series reveals the team's hitting tendencies and how effectively they generated offense. Figure 3 shows the launch angle distribution for Houston and Philadelphia. Houston exhibited a noticeably higher density of launch angles in the range of 15 to 20 degrees than Philadelphia, a range that appears to have been advantageous for Houston's hitting outcomes and offensive efficiency throughout the series.

The higher density of launch angles between 15 and 20 degrees for Houston suggests an approach to hitting aimed at maximizing the effectiveness of batted balls. Launch angles in this range are associated with line drives, which, when hit hard, often lead to extra-base hits and higher on-base percentages. Line drives have a better chance of finding gaps in the defense, producing more doubles and triples, while hard-hit balls put additional pressure on fielders and increase the likelihood of errors. Houston's ability to consistently produce batted balls within this range reflects the team's offensive strength and an effective hitting strategy during the World Series, and their hitters' proficiency in generating line drives and hard-hit balls likely played an important role in winning critical games. In contrast, Philadelphia showed a more diverse distribution of launch angles, with less concentration in the 15 to 20 degree range. While their batters produced hits across various launch angles, the lower density in this range may have limited their offensive output during the series.

Overall, the launch angle analysis provides insight into the hitting performance of both teams during the 2022 World Series. Houston's higher density of launch angles in the 15 to 20 degree range suggests a strategic advantage in their offensive approach that contributed to their championship. These results highlight the importance of launch angle optimization for a team's hitting performance, offering lessons for coaches, players, and analysts seeking to improve offensive strategies in future competitions.
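
The link between launch angle and batted-ball type can be made concrete with the angle bands commonly cited for Statcast data: ground ball below 10 degrees, line drive 10 to 25, fly ball 25 to 50, and pop-up above 50. The sketch below classifies angles into those bands and measures the share of batted balls in the 15 to 20 degree window discussed above. The band edges are an assumed convention rather than the paper's own definition, the boundary handling is a choice made for this sketch, and the function names are hypothetical.

```python
from collections import Counter


def batted_ball_type(launch_angle: float) -> str:
    """Classify a launch angle (degrees) into commonly cited Statcast bands.

    Boundary handling (10 and 25 go to the higher band, 50 stays a fly ball)
    is an assumption made for this sketch.
    """
    if launch_angle < 10:
        return "ground_ball"
    if launch_angle < 25:
        return "line_drive"
    if launch_angle <= 50:
        return "fly_ball"
    return "popup"


def type_mix(angles: list[float]) -> dict[str, float]:
    """Share of batted balls of each type (empty dict for no data)."""
    counts = Counter(batted_ball_type(a) for a in angles)
    return {kind: counts[kind] / len(angles) for kind in counts}


def share_15_to_20(angles: list[float]) -> float:
    """Fraction of launch angles in the 15 to 20 degree window (inclusive)."""
    return sum(15 <= a <= 20 for a in angles) / len(angles) if angles else 0.0
```

Applied separately to each team's launch angles, `share_15_to_20` turns the visual density comparison in Figure 3 into a single number per team, and `type_mix` shows whether that window is really driving a larger share of line drives.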

[Figure 4. Overall correlation matrix for Houston, 2022 World Series]

[Figure 5. Overall correlation matrix for Philadelphia, 2022 World Series]

Figures 4 and 5 compare the correlation matrices of Houston and Philadelphia during the 2022 World Series and offer insight into factors that may have contributed to Houston's victory. The matrices show which metrics relate more strongly to Houston's performance than to Philadelphia's. For Houston, batted ball events have a positive correlation of 0.22 with overall performance. Batted ball events count the number of times a player puts the ball in play during the game, so this positive correlation suggests that Houston's ability to consistently put the ball in play contributed to their offensive success by creating more opportunities to score runs and pressure the opposing defense. Player rank also correlates positively with Houston's performance, with a coefficient of 0.13. Although this relationship is weak, it suggests that the performance of specific players affected Houston's overall success, with higher-ranked players likely playing key roles in driving Houston's offensive production and defensive efforts. Finally, "Player Others" has a positive correlation of 0.27 with Houston's performance. "Player Others" likely refers to the collective performance of players other than those listed individually; this correlation suggests that contributions from supporting players played an important role in Houston's victory, and that the team's depth and balance were essential to their success.
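
The coefficients quoted above (0.22, 0.13, 0.27) are presumably Pearson correlations, the default in most EDA tools, although the paper does not say so. The sketch below shows how such a matrix is computed from per-row metrics; the column names and values in the demo are hypothetical placeholders, not the paper's data.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when a column is constant
    return cov / (sx * sy)


def correlation_matrix(columns: dict[str, list[float]]) -> dict[str, dict[str, float]]:
    """Pairwise Pearson correlations between every pair of named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


if __name__ == "__main__":
    # Hypothetical per-player columns; values are placeholders, not the paper's data
    data = {
        "batted_ball_events": [410, 380, 455, 362, 498],
        "hard_hit_percent": [41.2, 38.5, 45.0, 33.1, 47.8],
        "runs": [2, 1, 3, 0, 4],
    }
    for name, row in correlation_matrix(data).items():
        print(name, {k: round(v, 2) for k, v in row.items()})
```

Returning NaN for a constant column, instead of dividing by zero, mirrors how such cells typically show up as blank in an EDA tool's correlation heatmap.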

Philadelphia's correlation matrix, by contrast, shows different metrics correlating positively with performance. Average home runs have the strongest positive correlation, 0.3, suggesting that Philadelphia's ability to hit home runs shaped their offensive output during the series; home runs are game-changers in baseball, and the team's proficiency at hitting them contributed to their run scoring. Similarly, Player Scott's performance has a positive correlation of 0.27 with Philadelphia's overall performance, indicating that his individual contributions mattered to the team's success and likely influenced Philadelphia's offensive efficiency and defensive efforts during the series.

Hard-hit percentage also correlates positively with Philadelphia's performance, at 0.23. A higher hard-hit percentage indicates that the team consistently hit the ball with significant force, increasing the chances of hits and extra-base hits, and the positive correlation suggests that Philadelphia's ability to produce hard-hit balls contributed to their offensive success. Likewise, the correlation of 0.22 between max distance and Philadelphia's performance indicates a positive relationship. Max distance is the longest distance the ball traveled off the bat during a game; a higher max distance suggests that the team was capable of producing long hits, which contributed to their offensive efficiency and ability to score runs.
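
Hard-hit percentage and max distance can both be derived from individual batted-ball records. Statcast defines a hard-hit ball as one with an exit velocity of 95 mph or higher, and the sketch below assumes that threshold. The field names (`game`, `exit_velocity`, `distance`) and the function name are hypothetical, and the test values are invented.

```python
from collections import defaultdict

HARD_HIT_MPH = 95.0  # Statcast's hard-hit threshold: exit velocity >= 95 mph


def per_game_summary(batted_balls: list[dict]) -> dict[str, dict[str, float]]:
    """Hard-hit percentage and maximum distance for each game.

    Each record is assumed to look like
    {"game": "G1", "exit_velocity": 101.3, "distance": 388.0}.
    """
    by_game = defaultdict(list)
    for ball in batted_balls:
        by_game[ball["game"]].append(ball)
    summary = {}
    for game, balls in by_game.items():
        hard = sum(b["exit_velocity"] >= HARD_HIT_MPH for b in balls)
        summary[game] = {
            "hard_hit_pct": 100.0 * hard / len(balls),
            "max_distance": max(b["distance"] for b in balls),
        }
    return summary
```

Grouping by game keeps each value tied to a single contest, so the outputs can be lined up against per-game results before any correlation is computed.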

V. CONCLUSION

Both teams demonstrated different strengths and strategies throughout the World Series, producing a captivating and competitive matchup. However, Houston's ability to excel in multiple aspects of the game, combined with significant contributions from many players, ultimately proved to be the deciding factor in their triumph. As baseball continues to evolve, analysis using correlation matrices can provide valuable insight into the dynamics of the game and the factors influencing team success. By examining the statistical correlations between key metrics and on-field performance, teams can better understand their strengths and areas for improvement. In conclusion, the correlation matrix comparison between Houston and Philadelphia illustrates the multifaceted nature of baseball and the importance of both teamwork and individual brilliance in shaping the outcome of the World Series. Houston's well-rounded performance and collective effort led to their championship, while Philadelphia's explosive hitting and standout individual performances made for a thrilling contest. Correlation analysis in baseball analytics continues to deepen our understanding of the sport and supports the ongoing pursuit of excellence on the diamond.

REFERENCES

[1] Bouton, J. (2017). Ball Four: The Final Pitch. Open Road Media.

[2] Brosnan, J. P. (2016). The Long Season. University of Nebraska Press.

[3] DePodesta, P. (2003). Moneyball: The Art of Winning an Unfair Game. W. W. Norton & Company.

[4] Jansen, S. E., & Koopmann-Holm, B. (2019). Baseball Data Analysis: Visualizing, Modeling, and Predicting Games. CRC Press.

[5] Lewis, M. (2004). Moneyball: The Art of Winning an Unfair Game. W. W. Norton & Company.

[6] Lichtman, M. (2017). The Book: Playing the Percentages in Baseball. Potomac Books.

[7] Oliver, B. (2006). The Baseball Prospectus: The Essential Guide to the 2006 Baseball Season. Plume.

[8] Perry, J. P., & Palmer, P. (2015). World Series: An Opinionated Chronicle. Ivan R. Dee.
