MLB Draft Research Sample



For the first edition of this database, I use the 1997-2007 MLB draft classes, an 11-year sample, for three reasons: (1) a stable number of teams and draft picks per round from year to year, following the 1997 expansion in Arizona and Tampa Bay to today’s 30-team environment; (2) draft classes that predate significant changes to resource allocation, primarily the draft pools introduced by the 2011 Collective Bargaining Agreement; (3) all drafted players have completed their age 29 season.

Draft History

Draft class history is provided by Baseball Reference, which provides round-by-round signing and background data with very limited exceptions. If signing status is listed as ‘Unknown’, it’s assumed that the player was never signed, provided he made no professional appearances at any level.

Player Data

I record the professional level by age of any drafted player through age 29, regardless of whether the player was traded or released by his original organization.

I award credit to the player for the highest level of affiliated baseball he reached at any given age, regardless of sample size. A player in rookie ball who plays one game in AA on an emergency basis is credited for having spent the season in question at AA. If he does not play at any affiliated level in a given season, or his professional career is already over, I denote his affiliated level as N for null.
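The highest-level rule can be sketched in a few lines of Python. This is only an illustration, not the actual database logic: it assumes the level ladder given in the LvlAgeX definition (R, A-, A, A+, AA, AAA, MLB), and the function name season_level is hypothetical.

```python
# Level ladder from lowest to highest, as listed in the LvlAgeX definition.
LEVELS = ["R", "A-", "A", "A+", "AA", "AAA", "MLB"]
RANK = {level: i for i, level in enumerate(LEVELS)}

def season_level(levels_played):
    """Return the highest affiliated level a player appeared at in one season.

    levels_played holds every level at which he played at least one game.
    An empty list means no affiliated appearance, recorded as "N".
    """
    if not levels_played:
        return "N"
    return max(levels_played, key=RANK.__getitem__)

# A rookie-ball player with a one-game emergency stint in AA is credited with AA.
print(season_level(["R", "AA"]))
```

Sample size never enters the function, which mirrors the choice to ignore it.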

For MLB seasons through age 29, I record the player’s fWAR (FanGraphs Wins Above Replacement). Because I want to analyze the frequency, distribution and probability of a given value threshold occurring across the sample (e.g. any 3rd round pick’s chance at a 3 fWAR season), I take an extra step and also place each season in a “bucket”. I broke WAR into columns of 0.5 increments, from less than -1 WAR (think Chris Davis’ 2018 season) through greater than 10 WAR. Buckets are cumulative: that is, if a player had a 3 WAR season, a 5 WAR season, and a 7 WAR season, in my database this will be reflected as three seasons of 3+ WAR, two seasons of 5+ WAR, and one season of 7+ WAR.
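As a rough sketch of the bucketing, the Python below counts how many of a player’s seasons clear each threshold. The function name and the "below_-1" key are hypothetical, and treating the thresholds as "at or above" in 0.5 steps from -1 to 10 is my reading of the description, not a confirmed detail of the database.

```python
def war_bucket_counts(season_wars):
    """Count a player's seasons at or above each 0.5-WAR threshold."""
    # Thresholds -1.0, -0.5, ..., 10.0 (halves are exact in floating point).
    thresholds = [step / 2 for step in range(-2, 21)]
    counts = {t: sum(1 for war in season_wars if war >= t) for t in thresholds}
    # Seasons worse than -1 WAR get their own column (assumed layout).
    counts["below_-1"] = sum(1 for war in season_wars if war < -1.0)
    return counts

# The example from the text: seasons of 3, 5 and 7 WAR.
counts = war_bucket_counts([3.0, 5.0, 7.0])
print(counts[3.0], counts[5.0], counts[7.0])  # prints: 3 2 1
```

Because every threshold is counted independently, a single 7 WAR season also registers as a 3+ and a 5+ season, which is what makes threshold-probability questions easy to answer later.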

Here are the variables I am isolating, some of which may be renamed later in this project:

  • DraftAge – the player’s age on draft day using June 30th (customary) as a cutoff. Let’s say that it’s 2019, like it is. A player born on June 29th, or June 30th for that matter, in 1994 would be 25 years old for the purposes of his 2019 baseball season. A player born on July 1st, 1994 would be considered 24 years old.
  • LvlAgeX – Each player has 13 LvlAgeX columns, one for each season from his potential age 17 season through his age 29 season. For players who appeared in an affiliated professional game, the value can only be R, A-, A, A+, AA, AAA or MLB. For any season from his DraftAge year onward in which the player did not appear in a game, the column is marked null (N).
  • fWARAgeX – These are the accompanying fWAR columns for each age season and are only filled when the player appeared in an MLB game for the age season in question.
  • PeakLvl – The highest affiliated level of professional baseball that the drafted player ever played in through age 29.
  • AgeMLBDebut – If the player ever appeared in any MLB game, this is the age season when he made his debut.
  • MLB (Y/N) – Did the player ever appear in MLB?
  • MLB30+ – I added this column to account for the rare cases in which a player did not reach MLB during my sample of age 17-29 seasons, but did eventually reach MLB. I decided that as long as I was doing all this work, it would be nice to capture all MLB debuts, regardless of age.
  • YrsNonMLB – This is how many full seasons a drafted player spent in affiliated professional baseball before either (a) making his MLB debut the following season or (b) his affiliated professional career ending.
  • DraftDebut (Y/N) – Did the player make an appearance at any level of affiliated professional baseball in the summer immediately following the June draft he was selected in?
  • AgeDebut – The age season in which the player made his first appearance at any level of affiliated professional baseball.
  • LvlDebut – The highest level of affiliated professional baseball a player reached during the season in which the player made his first professional appearance.
  • AgeLast – The age season in which the player made his last appearance at any level of affiliated professional baseball. Note that some players continue playing beyond age 29. Age 29 is the endpoint of this sample, so in those cases we refer to the Age 29 season as the player’s last year of ball.
  • LvlLast – The highest level of affiliated professional baseball the player reached in his AgeLast season: either the final season of his affiliated professional career or, for players who were still active beyond that point, his age 29 season.
  • YearsToMLB – This variable is only for players who made an MLB debut during or before their age 29 season. It is the same as YrsNonMLB, except that it includes the MLB debut season in its count, whereas YrsNonMLB isolates only the non-MLB seasons prior to debut.
  • MLBYears – The number of seasons during which a player spent any time in MLB from age 17 through age 29.
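To make a few of these definitions concrete, here is a minimal Python sketch that computes a season age with the June 30th cutoff and derives several summary columns from one player’s LvlAgeX values. The function names, the dictionary layout, and the use of "N" for AgeMLBDebut when a player never reached MLB are my assumptions for illustration only.

```python
from datetime import date

LEVELS = ["R", "A-", "A", "A+", "AA", "AAA", "MLB"]
RANK = {level: i for i, level in enumerate(LEVELS)}

def season_age(birth, season_year):
    """Player's age as of June 30 of the given season."""
    had_birthday = (birth.month, birth.day) <= (6, 30)
    return season_year - birth.year - (0 if had_birthday else 1)

def summarize(levels_by_age):
    """Derive summary columns from {age: level}, where "N" means no appearance."""
    played = {age: lvl for age, lvl in levels_by_age.items() if lvl != "N"}
    if not played:
        # Never appeared in affiliated ball: everything is null.
        return {"PeakLvl": "N", "AgeDebut": "N", "LvlDebut": "N",
                "AgeLast": "N", "LvlLast": "N", "AgeMLBDebut": "N",
                "MLBYears": 0}
    mlb_ages = [age for age, lvl in played.items() if lvl == "MLB"]
    first, last = min(played), max(played)
    return {
        "PeakLvl": max(played.values(), key=RANK.__getitem__),
        "AgeDebut": first, "LvlDebut": played[first],
        "AgeLast": last, "LvlLast": played[last],
        "AgeMLBDebut": min(mlb_ages) if mlb_ages else "N",
        "MLBYears": len(mlb_ages),
    }

# The DraftAge example from the text, for the 2019 season.
print(season_age(date(1994, 6, 30), 2019))  # prints: 25
print(season_age(date(1994, 7, 1), 2019))   # prints: 24

# A made-up career: college draftee, one missed season, two MLB years.
career = {21: "A-", 22: "A+", 23: "AA", 24: "N", 25: "AAA",
          26: "MLB", 27: "MLB", 28: "AAA", 29: "N"}
print(summarize(career))
```

The career shown is invented, not a real player. YrsNonMLB and YearsToMLB are left out here because they depend on how the draft season itself is counted.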

Study Design Choices

Ten years ago, I created a different data-intensive draft study that used $/WAR to attempt to assign draft slot values. It was on the internet for several years, but I have since deleted the blog that hosted it, and I can only hope a backup exists somewhere. Darn.

Something I learned from working through that study was that projects are as complicated as you choose to make them. In that study, I went through each player’s entire career history and attempted to identify the point at which he would have exhausted the full six seasons of cost control (three league minimum seasons followed by, usually, three years of arbitration) potentially available to the team that originally drafted him. This was a time-intensive nightmare that likely added little value to the findings of the study.

I chose to stratify seasons by age rather than worry about service time at all. For most players, this has no effect whatsoever, since most draftees never reach MLB, and many who do fall out of the league well before playing six full seasons there. Still, most players reach free agency after their first six full seasons in MLB, so this choice means that a small handful of very successful draftees are credited for more MLB years than the six cost-controlled years available to the team. On one hand, this is a limitation of my methodology, in that the success of these draftees will be somewhat overstated as it relates to the team that originally drafted them.

On the other hand, when I look at the 2006 draft file, two of the first names to appear are star players Clayton Kershaw and Evan Longoria. Because each player ascended rapidly through the minor leagues, Kershaw appeared in MLB in 10 seasons through age 29, and Longoria in 8. Both avoided free agency by signing what are generally regarded as team-friendly extensions with their original teams. In these two cases, both players provided value significantly below market cost to the teams that drafted them, well beyond the conclusion of each player’s sixth full season in MLB. On this basis, there is arguably significant value in drafting the rights to a future star MLB player. Acquiring productive players at below market cost is a primary purpose of the MLB draft, and I don’t feel my methodology choice here negates the viability of the research.

All draftees’ professional timelines begin with the age season in which they were drafted, regardless of whether they played any affiliated professional ball in that season or not. This is a mouthful. If a player was drafted in 1997 but made his pro debut in rookie ball in 1998, I count 1998 as his second season. It’s a fair criticism to say this penalizes players for circumstances, like finite roster sizes or workload management, that are beyond their control. I did not make this choice with an intent to punish a player whose debut was held back. Age relative to league is a big factor in the career trajectory of any given player. The player has lost potential development time in each successive season whether or not he appeared in affiliated professional ball. I also want to be able to compare all draftees to one another on how many years it took them to either reach MLB or fall out of affiliated professional ball. If four years elapsed before a draftee reached MLB, but he only played in the minor leagues for the final three of those four years, it still took four years following the draft selection to develop an MLB contributor.
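Here is one way this counting rule could be written in Python. It is a sketch of my reading of YrsNonMLB and YearsToMLB, in which the clock starts at the draft season and seasons without an appearance still count. The function names and the {age: level} layout are hypothetical, and returning None from years_to_mlb for players who never reached MLB is an assumption.

```python
def years_non_mlb(draft_age, levels_by_age):
    """Seasons elapsed from the draft season up to, but not including, the MLB debut.

    For players who never reach MLB, count through the last affiliated season.
    Seasons with no appearance ("N") still count once the clock has started.
    """
    mlb_ages = [age for age, lvl in levels_by_age.items() if lvl == "MLB"]
    if mlb_ages:
        return min(mlb_ages) - draft_age
    played = [age for age, lvl in levels_by_age.items() if lvl != "N"]
    if not played:
        return 0  # signed but never appeared in affiliated ball
    return max(played) - draft_age + 1

def years_to_mlb(draft_age, levels_by_age):
    """Same count, but including the MLB debut season; None if he never debuted."""
    mlb_ages = [age for age, lvl in levels_by_age.items() if lvl == "MLB"]
    if not mlb_ages:
        return None
    return min(mlb_ages) - draft_age + 1

# Drafted at 18, no appearance that summer, three minor league seasons, MLB at 22.
levels = {18: "N", 19: "R", 20: "A", 21: "AA", 22: "MLB"}
print(years_non_mlb(18, levels), years_to_mlb(18, levels))  # prints: 4 5
```

The example matches the four-year case described above: the idle draft season counts toward the total even though only three seasons were spent in the minors.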

Only affiliated professional baseball is considered a part of a draftee’s career. The Mexican League (MEX) is considered a AAA professional level by MLB and is reflected as such. Nippon Professional Baseball (NPB; Japanese baseball) is not affiliated with MLB and is not counted in the sample. Neither are professional independent baseball leagues, like the Frontier League. Even if a player continues his professional career in a non-affiliated capacity, I only track his progress through the affiliated levels of MLB. Note that a handful of draftees play some minor league baseball, are released, play several seasons in an independent league, and return to play more affiliated minor league baseball. In these cases, I count each age season in between stints in affiliated minor league baseball as null (N), and I continue tracking a player’s progress through either his final affiliated minor league baseball season or his age 29 season, whichever comes first.
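The affiliation rules could be applied when building a player’s row of levels, roughly as in the Python sketch below. The function name, the input layout (one label per age season), and the "IND" label for independent leagues are hypothetical. MEX and NPB are the abbreviations used above.

```python
AFFILIATED = {"R", "A-", "A", "A+", "AA", "AAA", "MLB"}
# Leagues that are recorded as an affiliated level; the Mexican League counts as AAA.
LEAGUE_AS_LEVEL = {"MEX": "AAA"}

def level_row(draft_age, season_labels, last_age=29):
    """Build {age: level} from the draft season through age 29.

    season_labels maps an age to the level or league label for that season.
    Missing seasons and unaffiliated leagues (e.g. NPB, independent ball)
    are recorded as "N".
    """
    row = {}
    for age in range(draft_age, last_age + 1):
        label = season_labels.get(age, "N")
        level = LEAGUE_AS_LEVEL.get(label, label)
        row[age] = level if level in AFFILIATED else "N"
    return row

# Released after AA, two independent seasons, then Mexico and Japan.
row = level_row(22, {22: "A", 23: "AA", 24: "IND", 25: "IND", 26: "MEX", 27: "NPB"})
print(row)
```

In this made-up career, the independent and NPB seasons come out as N while the Mexican League season is kept as AAA, so the player’s last affiliated season is age 26.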

Drafted players who sign but never appear in affiliated professional baseball are not excluded from the sample. These cases are uncommon. Excluding them from the sample would reduce (slightly) the failure rates of draftees, as well as exaggerate (slightly) the number of years any given draftee spends in affiliated professional baseball prior to reaching MLB or his career coming to an end. In these uncommon cases, draftees spent 0 years in affiliated professional baseball and variables like DraftDebut, AgeDebut and LvlDebut are all null (N). They are still data points for the given draft class, as well as for the subset of draftees who sign professional contracts. They can always be excluded with relative ease from specific analyses that are built off of this database.