Open Free PyBaseball Stats Spreadsheet
Introduction
For data scientists, data analysts, and engineers who know Python, the pybaseball package is a great way to access all the baseball statistics and advanced baseball analytics a fan could dream of. Unfortunately, pybaseball is less accessible to the average fan who wants to track pitch-by-pitch stats on their favorite players, look up historical baseball stats and obscure baseball records, or build an analysis to pick and manage a fantasy baseball team.
We've used Row Zero, a next-gen spreadsheet built for big data, to pull these baseball stats into a spreadsheet that any baseball fan with basic spreadsheet skills can use.
In this post, we show you how to use pybaseball to pull a universe of baseball stats into a dynamic spreadsheet. If you just want to skip straight to the spreadsheet, you can view pybaseball stats in a spreadsheet using this template. Or read on to see how to write simple Python code that gets pybaseball data into a spreadsheet, then do your own baseball data analysis using everyday spreadsheet skills.
Index
- What is Pybaseball?
- What is Row Zero?
- Explore the Pybaseball Spreadsheet Template
- Getting Started with Pybaseball
- Pybaseball Individual Player Batting Stats
- Pybaseball Individual Player Pitching Stats
- Pybaseball Season Data
- Pybaseball Team Data
- Conclusion
What is Pybaseball?
Pybaseball is a popular Python package written and maintained by James Ledoux as well as other contributors. The package scrapes Baseball Reference, Baseball Savant, and FanGraphs to aggregate data and make it easy to analyze baseball data in Python.
The data available through pybaseball can be queried at the level of individual pitches and batted balls, alongside historical player and team data aggregated across seasons. The dataset includes advanced stats, so you can analyze the Moneyball-style metrics and sabermetrics driving modern baseball analytics. Information on the pybaseball package can be found on GitHub, and its Python commands are well documented in the pybaseball package documentation.
What is Row Zero?
Row Zero is a next-gen spreadsheet built for big data. It brings the power of big data analytics into a simple-to-use spreadsheet that is accessible to anyone with basic spreadsheet skills. You can open a free spreadsheet on Row Zero to get started. Row Zero works like Microsoft Excel and Google Sheets but supports billion-row data sets (1000x the Excel row limit) and has a built-in Python development environment. The code window where pybaseball commands are run sits on the right-hand side of the screen, as highlighted in the image below.
Explore the Pybaseball Spreadsheet Template
We've built an easy-to-use pybaseball spreadsheet template to help anyone get started. Simply open the spreadsheet and start exploring. You can view aggregate batting and pitching stats for individual players in any season, or drill down into pitch-by-pitch stats for any player or game. The data includes advanced baseball analytics on any player or team.
Pitch-by-pitch and game-by-game stats for any player
View pitch-by-pitch stats for every at-bat for any player. Build your own baseball stat tracker in a spreadsheet, drill into sabermetrics and advanced Moneyball stats, or build your own fantasy baseball analytics. Here are a few examples to get you started:
Analyze how close Aaron Judge was to hitting 60 or 70 home runs
View the spreadsheet of Aaron Judge statistics
Analyze every pitch from Ohtani's historic 50/50 season
View the spreadsheet of Shohei Ohtani statistics
Relive Paul Skenes' rookie season and analyze every pitch over 100 MPH
View the spreadsheet of Paul Skenes statistics
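Each of these examples boils down to a simple filter or running total on a Statcast DataFrame. Below is a minimal sketch, assuming the DataFrame comes from statcast_pitcher or statcast_batter (such as statcast_pitcher_df in the pitching example later in this post) and carries Statcast's 'release_speed' (mph), 'events', and 'game_date' columns. The helper names are hypothetical, and the small hand-built DataFrame stands in for real data so the sketch runs offline.

```python
import pandas as pd


def pitches_at_or_above(df: pd.DataFrame, mph: float) -> pd.DataFrame:
    """Return every pitch thrown at or above `mph` (Statcast 'release_speed')."""
    return df[df["release_speed"] >= mph]


def cumulative_home_runs(df: pd.DataFrame) -> pd.Series:
    """Running home run total by game date, e.g. to chart a chase for 60 or 70."""
    # Statcast records the plate-appearance outcome in 'events'; only the
    # final pitch of a plate appearance has a non-null value there.
    homers = df[df["events"] == "home_run"]
    per_game = homers.groupby("game_date").size().sort_index()
    return per_game.cumsum()


# Hypothetical stand-in for a real Statcast DataFrame (not real data).
sample = pd.DataFrame(
    {
        "game_date": ["2024-06-01", "2024-06-01", "2024-06-02", "2024-06-03"],
        "release_speed": [101.2, 88.4, 100.0, 99.9],
        "events": ["home_run", None, "home_run", "home_run"],
    }
)

print(len(pitches_at_or_above(sample, 100)))   # 2
print(cumulative_home_runs(sample).tolist())   # [1, 2, 3]
```

In a Row Zero code window, a line such as `fast_pitches_df = pitches_at_or_above(statcast_pitcher_df, 100)` would then make the filtered table available as '=fast_pitches_df' (the variable name is illustrative).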
Aggregate Player Stats and Stat Leaders
Explore baseball stats for every player for any time range. Rank leaders for any baseball stat or filter and sort for your favorite players. View the spreadsheet.
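Filtering a season table down to your favorite players takes a single `isin` call in pandas. A minimal sketch, assuming a 'Name' column like the one in the FanGraphs and Baseball Reference season tables pulled later in this post; `favorite_players`, the names, and the numbers below are hypothetical placeholders.

```python
import pandas as pd


def favorite_players(season_df: pd.DataFrame, names: list[str]) -> pd.DataFrame:
    """Keep only rows whose 'Name' appears in `names`, sorted by name."""
    return season_df[season_df["Name"].isin(names)].sort_values("Name")


# Hypothetical season table with made-up numbers (not real stats).
season_df = pd.DataFrame(
    {"Name": ["Player C", "Player A", "Player B"], "HR": [12, 30, 7]}
)
favorites = ["Player A", "Player C"]

print(favorite_players(season_df, favorites)["Name"].tolist())  # ['Player A', 'Player C']
```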
How to Get Pybaseball Data into a Spreadsheet
Getting Started with Pybaseball
To get started with pybaseball, there are three main steps:
- Import the necessary Python packages
- Write code to pull the desired stats
- Enter the name of the data table in a Row Zero spreadsheet cell to see the stats
We will now walk through four examples that show how baseball stats can be pulled into a spreadsheet for analysis.
1. Pybaseball Individual Player Batting Stats in a Spreadsheet
First, import datetime and timedelta from Python's datetime module; they are useful for specifying the time period over which the stats will be pulled. Next, import two functions from pybaseball: playerid_lookup and statcast_batter. With these imported, write a script that specifies the player's name and the time period for the stats. playerid_lookup finds the player's MLBAM ID, and choosing statcast_batter selects Statcast (Baseball Savant) as the data source. The script creates a DataFrame called 'statcast_batter_df' that holds the data for the player and timeframe specified in the code. Press 'shift + enter' or the 'run' button to execute the code window. Then, in any spreadsheet cell, type '=statcast_batter_df' and the data table of stats will be populated.
```python
from datetime import datetime, timedelta

from pybaseball import playerid_lookup, statcast_batter

############## Player Data ######################
# Display Statcast data for a given batter
batter_first_name = "Shohei"
batter_last_name = "Ohtani"
start_date = '2023-03-30'  # Opening Day, 2023 season
# End the date range yesterday, formatted as YYYY-MM-DD
end_date = (datetime.now() - timedelta(days=1)).strftime('%Y-%m-%d')

# Look up the player's MLBAM ID, which Statcast uses to identify players
batter_id_df = playerid_lookup(batter_last_name, batter_first_name)
batter_mlb_id = batter_id_df['key_mlbam'].iloc[0]  # first matching player

# Pull every pitch seen by this batter between start_date and end_date
statcast_batter_df = statcast_batter(start_date, end_date, batter_mlb_id)
```
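playerid_lookup returns a DataFrame that can contain zero rows (for example, a misspelled name) or several rows (players who share a name), so taking the first row can fail or pick the wrong player. Below is a minimal sketch of a safer pick, assuming the lookup table's 'key_mlbam' and 'mlb_played_last' columns. `pick_mlbam_id` is a hypothetical helper, and the sample rows and IDs are made up so the sketch runs without a network call.

```python
import pandas as pd


def pick_mlbam_id(lookup_df: pd.DataFrame) -> int:
    """Return the MLBAM ID of the most recently active matching player.

    Raises ValueError when the lookup found nobody, so a misspelled name
    fails with a clear message instead of an indexing error.
    """
    if lookup_df.empty:
        raise ValueError("No player matched that name; check the spelling.")
    # Sort by last MLB season, newest first (missing values sort last).
    latest = lookup_df.sort_values("mlb_played_last", ascending=False).iloc[0]
    return int(latest["key_mlbam"])


# Made-up rows mimicking two players who share a name (IDs are fake).
lookup_df = pd.DataFrame(
    {
        "name_last": ["doe", "doe"],
        "name_first": ["john", "john"],
        "key_mlbam": [111111, 222222],
        "mlb_played_last": [2016.0, 2024.0],
    }
)

print(pick_mlbam_id(lookup_df))  # 222222
```

In the batting example, `batter_mlb_id = pick_mlbam_id(batter_id_df)` could stand in for the first-row pick. Preferring the most recent career is one reasonable tiebreak, not the only one.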
2. Pybaseball Individual Player Pitching Stats in a Spreadsheet
Next, we pull pitching stats for an individual player into a workbook. The code mirrors the batting example: import playerid_lookup and statcast_pitcher from pybaseball, run the code window, then type '=statcast_pitcher_df' into a cell to pull the stats into a data table in the Row Zero spreadsheet.
```python
from datetime import datetime, timedelta

from pybaseball import playerid_lookup, statcast_pitcher

############## Player Data ######################
# Display Statcast data for a given pitcher
pitcher_first_name = "Sandy"
pitcher_last_name = "Alcantara"
start_date = '2023-03-30'  # Opening Day, 2023 season
# End the date range yesterday, formatted as YYYY-MM-DD
end_date = (datetime.now() - timedelta(days=1)).strftime('%Y-%m-%d')

# Look up the pitcher's MLBAM ID, which Statcast uses to identify players
pitcher_id_df = playerid_lookup(pitcher_last_name, pitcher_first_name)
pitcher_mlb_id = pitcher_id_df['key_mlbam'].iloc[0]  # first matching player

# Pull every pitch thrown by this pitcher between start_date and end_date
statcast_pitcher_df = statcast_pitcher(start_date, end_date, pitcher_mlb_id)
```
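A pitcher's pitch-by-pitch rows can be rolled up into a pitch-mix table before they reach the sheet. A minimal sketch, assuming the 'pitch_type' and 'release_speed' columns that Statcast pitch data carries; `pitch_mix` is a hypothetical helper, and the sample rows are invented.

```python
import pandas as pd


def pitch_mix(df: pd.DataFrame) -> pd.DataFrame:
    """Pitch count, average velocity, and usage share for each pitch type."""
    summary = (
        df.dropna(subset=["pitch_type"])          # skip unclassified pitches
        .groupby("pitch_type")
        .agg(
            pitches=("release_speed", "size"),    # rows per pitch type
            avg_mph=("release_speed", "mean"),    # mean velocity, mph
        )
    )
    summary["usage"] = summary["pitches"] / summary["pitches"].sum()
    return summary.sort_values("pitches", ascending=False)


# Invented sample rows (not real data).
sample = pd.DataFrame(
    {
        "pitch_type": ["FF", "FF", "SL", "CH", None],
        "release_speed": [98.0, 99.0, 88.0, 90.0, 70.0],
    }
)

print(pitch_mix(sample))
```

Applied to the example above, `pitch_mix_df = pitch_mix(statcast_pitcher_df)` would give a compact table to reference from a cell as '=pitch_mix_df' (the name is illustrative).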
3. Pybaseball Season Data in a Spreadsheet
Pulling season data into a Row Zero spreadsheet works similarly. First, import the necessary functions into the code window. This example imports functions from pybaseball that pull data from either FanGraphs (batting_stats and pitching_stats) or Baseball Reference (batting_stats_bref and pitching_stats_bref). The code then creates DataFrames holding data for a given season (2023), each sorted by “Name.” Once the code window has run, type '=baseball_reference_batting_season_df', or the name of any of the other DataFrames, into the spreadsheet to see the data.
```python
from pybaseball import batting_stats, batting_stats_bref
from pybaseball import pitching_stats, pitching_stats_bref

############## Season Data ######################
# Look at a single season's worth of batting or pitching data from FanGraphs and/or Baseball Reference.
# DataFrames will display in the Batting Season Data and Pitching Season Data tabs.

# Edit the season below to look at a different season
season = 2023

# FanGraphs data is displayed in tabs in the spreadsheet (qual=0 sets no playing-time minimum)
fangraph_batting_season_df = batting_stats(season, qual=0).sort_values("Name")
fangraph_pitching_season_df = pitching_stats(season, qual=0).sort_values("Name")

# Simply type "=<dataframe name>" in a spreadsheet tab to pull in Baseball Reference data
baseball_reference_batting_season_df = batting_stats_bref(season).sort_values("Name")
baseball_reference_pitching_season_df = pitching_stats_bref(season).sort_values("Name")
```
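Because the FanGraphs calls use qual=0, the season tables include every player, even those with a handful of plate appearances. To rank leaders the way a stat page does, filter on playing time first. A minimal sketch, assuming 'Name' and 'PA' columns (present in the FanGraphs and Baseball Reference batting tables); `stat_leaders` and the sample rows are hypothetical, and the 502-PA default mirrors MLB's batting-title minimum of 3.1 plate appearances per team game over 162 games.

```python
import pandas as pd


def stat_leaders(season_df: pd.DataFrame, stat: str, n: int = 10,
                 min_pa: int = 502) -> pd.DataFrame:
    """Top `n` players by `stat` among those with at least `min_pa` PA."""
    qualified = season_df[season_df["PA"] >= min_pa]
    # nlargest returns rows in descending order of `stat`.
    return qualified.nlargest(n, stat)[["Name", "PA", stat]]


# Hypothetical season table with made-up numbers (not real stats).
season_df = pd.DataFrame(
    {
        "Name": ["Player A", "Player B", "Player C", "Player D"],
        "PA": [650, 40, 560, 600],
        "HR": [44, 9, 30, 21],
    }
)

print(stat_leaders(season_df, "HR", n=2))
```

For stats where lower is better, such as ERA on the pitching tables, `nsmallest` (with an innings threshold instead of PA) would be the analogous call.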
4. Pybaseball Team Data in a Spreadsheet
To get data for a specific team into a spreadsheet, another set of commands is used. Three functions are imported: batting_stats, pitching_stats, and schedule_and_record. A variable called 'team' is set to the three-letter abbreviation of the team of interest. Then a set of DataFrames (data tables) is created with the team's stats; in this case, tables are created for batting, pitching, and schedule/record stats. Run the code below in the code window and type the names of the data tables into the spreadsheet (e.g. '=team_batting_df').
```python
from pybaseball import batting_stats, pitching_stats, schedule_and_record

############## Team Data ######################
# Change the team below using the team's 3-letter abbreviation (e.g. "STL" for St. Louis)
team = 'SEA'

# 2023 individual batting lines, filtered to players listed under this team
team_batting_df = batting_stats(2023, qual=1, ind=1)
team_batting_df = team_batting_df.loc[team_batting_df['Team'] == team]

# 2023 individual pitching lines, filtered the same way
team_pitching_df = pitching_stats(2023, qual=1, ind=1)
team_pitching_df = team_pitching_df.loc[team_pitching_df['Team'] == team]

# Game-by-game 2023 schedule and results from Baseball Reference
team_schedule_df = schedule_and_record(2023, team)
```
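Since team_schedule_df has one row per game, a win-loss record and run differential are a few lines of pandas away. A minimal sketch, assuming Baseball Reference's 'W/L', 'R' (runs scored), and 'RA' (runs allowed) columns, that results such as "W", "L", or "W-wo" (walk-off win) start with the outcome letter, and that unplayed games have an empty 'W/L'. `team_summary` and the sample rows are hypothetical.

```python
import pandas as pd


def team_summary(schedule_df: pd.DataFrame) -> dict:
    """Wins, losses, and run differential over games already played."""
    # Assumption: games not yet played have no W/L value, so drop them.
    played = schedule_df.dropna(subset=["W/L"])
    result = played["W/L"].str[0]  # first letter: "W" or "L"
    runs_for = pd.to_numeric(played["R"]).sum()
    runs_against = pd.to_numeric(played["RA"]).sum()
    return {
        "wins": int((result == "W").sum()),
        "losses": int((result == "L").sum()),
        "run_diff": int(runs_for - runs_against),
    }


# Hypothetical schedule rows (not real results); the last game is unplayed.
sample = pd.DataFrame(
    {
        "W/L": ["W", "L-wo", "W-wo", None],
        "R": [5, 2, 4, None],
        "RA": [3, 3, 3, None],
    }
)

print(team_summary(sample))  # {'wins': 2, 'losses': 1, 'run_diff': 2}
```

With the team example above, `team_summary(team_schedule_df)` would return the same three numbers for the chosen team.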
Conclusion
Pybaseball is one of the best free tools for analyzing baseball stats. With a free Row Zero account and a few Python scripts, you can easily import baseball stats from pybaseball into a spreadsheet. If you don't know any Python, you can use our baseball stats spreadsheet template to get started, dig into advanced baseball statistics, and build your own baseball analytics. Row Zero brings the power of big data analytics into a simple-to-use spreadsheet.