Lazy and Dumb II

This is the time of year to do offseason prospect lists. Even though I enjoy other sports, we’re missing baseball for several more weeks, and nothing compares to it. To each their own, but that’s mine.

On the smiley face days, I’ve been able to keep up with everything I decided I wanted to do in the first six months of this year. There’s a super-cereal calendar staring back at me and everything.

New Year’s optimism makes you forget that there are going to be a bunch of frowny face days in there, too, because stupid brain chemistry. I’m not optimistic about keeping up the two-a-week pace that would have the series finished by the end of March.

In reality, a lot of other things bring me joy, and the internet feels like a second world that sucks me away from them and makes them less joyful by comparison. I (and many others) have now lived more than half of my life in this paradigm, so I hesitate to mentally commit myself to spending even more time in it. I don’t think this world is good for anyone’s mental health. Meanwhile, the real world sure is going to shit.

As for scouting Baseball Reference minor league numbers and doing my best with video (disclaimer: I am not experienced at this), I’m just not sure it’s adding much to the conversation anyway. I’d rather find it in me to complete the series than not, but it’s not an exciting project right now. I also don’t want to be derivative of the work of others, and there’s some real good work out there in this niche.

I don’t know how some people manage to stay on top of a couple thousand minor league players, in addition to all the kids pouring in every year from American high schools and colleges, not to mention the Dominican Republic and Venezuela and a couple dozen other countries. You’ve got to maintain contacts and weigh a bunch of sometimes disparate information to corroborate your evaluations.

Every year, dozens and dozens and dozens of minor leaguers are traded, and only a few of them belong to the chosen upper echelon of prospects. Most of them are interesting, developing players, however. There are a large number of prospects who could make it, and a much smaller subgroup of them who will. I don’t think anyone would dispute that. So how do you know which ones will and which ones won’t?

I think that’s a very, very difficult question to answer, and projecting outward is a very, very difficult job. In the course of thinking about this, I began a draft database project. I don’t have objective draft-based metrics to estimate the probability of success or failure, and I can’t be the only one who’d be interested in some. I know that every year, a bunch of advanced college arms will be drafted, and a few of them will turn into tomorrow’s 5th starters and middle relievers and most of them won’t, and figuring out who is who is an imperfect science of guesswork.

My project will attempt to offer probabilistic guidance on outcomes. I want to answer questions like:

  • What are the odds of a 5th round pick reaching MLB and producing at least one 2 WAR season?
  • How often do draft-sourced 23-year-olds reach MLB in any capacity when they haven’t yet reached AA?
  • How are 3, 4 and 5+ WAR seasons distributed among the draft pool?
  • What are the odds of a draftee, at any minor league level, reaching MLB after spending four full seasons in the minors?
  • How is MLB debut age distributed among the draft pool?
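Most of these questions reduce to the same calculation: pick a group from the draft pool, then count how many in that group meet an outcome. Because a single round of a single draft is a small group, a point estimate alone can mislead, so the sketch below also attaches a Wilson score interval, a standard interval for proportions from small samples (my addition here, not something the project has committed to). It is a minimal Python sketch; the record shape (a dict with hypothetical `round` and `mlb_war_by_age` keys) and the helper names are assumptions for illustration only.

```python
from __future__ import annotations

import math
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def count(
    pool: Iterable[T],
    group: Callable[[T], bool],
    outcome: Callable[[T], bool],
) -> tuple[int, int]:
    """Return (hits, n): how many in `group` meet `outcome`, and the group size."""
    members = [p for p in pool if group(p)]
    hits = sum(1 for p in members if outcome(p))
    return hits, len(members)


def wilson_interval(hits: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if n == 0:
        raise ValueError("empty group: no rate to estimate")
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp tiny floating-point overshoot at the edges.
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical record shape: {"round": 5, "mlb_war_by_age": {25: 2.3, 26: 1.1}}
def fifth_round(player: dict) -> bool:
    return player["round"] == 5


def has_two_war_season(player: dict) -> bool:
    # "Producing at least one 2 WAR season"
    return any(war >= 2.0 for war in player["mlb_war_by_age"].values())
```

With a pool of such records, `hits, n = count(pool, fifth_round, has_two_war_season)` gives the first question’s point estimate as `hits / n`, and `wilson_interval(hits, n)` gives a range around it. Swapping in other predicates covers the remaining questions.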

The process of creating a database involves a lot of manual data gathering and is very time intensive. I’m pulling data from two different websites, and I’m tracking every draftee’s debut level, affiliate level and, if applicable, MLB season WAR by age through age 29. I have finished 1.5 drafts at this point, and I’ve drafted a plan to get through all of the remaining drafts by May, which is a brisk pace but will allow time for some other projects.
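The fields being tracked map naturally onto one record per draftee. Below is a Python sketch of what such a record could look like, assuming the affiliate level and MLB WAR are both keyed by age. The field names, the level labels in `LEVELS`, and the simplification of one level per age (say, the highest one reached that year) are all hypothetical choices, not the project’s actual layout. The age-29 WAR cutoff is enforced when a season is recorded, while the debut age is kept regardless so that late debuts still count toward the debut-age distribution.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical level labels, ordered lowest to highest.
LEVELS = ("Rookie", "A", "A+", "AA", "AAA", "MLB")
WAR_AGE_CUTOFF = 29  # MLB WAR is tracked by age through age 29


@dataclass
class DraftRecord:
    draft_year: int
    draft_round: int
    debut_level: str  # level of the player's first professional assignment
    # Simplification: one affiliate level per age (e.g. the highest reached).
    affiliate_level_by_age: dict[int, str] = field(default_factory=dict)
    mlb_war_by_age: dict[int, float] = field(default_factory=dict)
    mlb_debut_age: int | None = None

    def __post_init__(self) -> None:
        for level in (self.debut_level, *self.affiliate_level_by_age.values()):
            if level not in LEVELS:
                raise ValueError(f"unknown level: {level!r}")

    def record_mlb_season(self, age: int, war: float) -> None:
        """Log one MLB season; the debut age is kept even past the WAR cutoff."""
        if self.mlb_debut_age is None or age < self.mlb_debut_age:
            self.mlb_debut_age = age
        if age <= WAR_AGE_CUTOFF:
            self.mlb_war_by_age[age] = war

    def reached_level_by(self, level: str, age: int) -> bool:
        """True if the player was at `level` or higher at any age <= `age`."""
        if self.mlb_debut_age is not None and self.mlb_debut_age <= age:
            return True  # an MLB appearance counts as at-or-above any level
        target = LEVELS.index(level)
        return any(
            LEVELS.index(lvl) >= target
            for a, lvl in self.affiliate_level_by_age.items()
            if a <= age
        )
```

A check like `not r.reached_level_by("AA", 23)` is the kind of filter the question about 23-year-olds below AA would need.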

Another motivation for a draft database is to give myself a data pool I have familiarity with, which I think would make learning database query and programming languages easier to tackle if I follow through with trying to master those skills over the next couple years. A language you know will help you with a language you don’t. I’m not 100% sure that I want to dedicate a very significant part of my life to living in databases, but exploring that avenue is my happily vague plan.

Once I finish the database and acquire advanced data set skills, I would be equipped to develop a projection system based on the probabilistic outputs I’m developing. I can see returning to these draft classes in the future to collect additional data, such as BB%, K% and ISO, three factors I lean on heavily when projecting minor league players forward. Minor league performance data would improve the viability of any model, but I think there will nonetheless be quite a lot of helpful information to look at from age, minor league seasons and draft position alone. I like this as a jumping-off point, with an eye toward eventually growing the database into a projection model that incorporates position and minor league performance data.
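For reference, the three rate stats named here have common definitions: BB% is walks per plate appearance, K% is strikeouts per plate appearance, and ISO is slugging percentage minus batting average, which simplifies to extra bases per at-bat. A small Python sketch of how they could be computed if they were added to the database; the `BattingLine` container and the function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BattingLine:
    """Season counting stats (hypothetical container)."""
    pa: int       # plate appearances
    ab: int       # at-bats
    doubles: int
    triples: int
    hr: int       # home runs
    bb: int       # walks
    so: int       # strikeouts


def bb_pct(line: BattingLine) -> float | None:
    # Walk rate: walks per plate appearance (None if no PA).
    return line.bb / line.pa if line.pa else None


def k_pct(line: BattingLine) -> float | None:
    # Strikeout rate: strikeouts per plate appearance (None if no PA).
    return line.so / line.pa if line.pa else None


def iso(line: BattingLine) -> float | None:
    # Isolated power = SLG - AVG, which simplifies to extra bases per at-bat:
    # (2B + 2*3B + 3*HR) / AB. None if no AB.
    if not line.ab:
        return None
    return (line.doubles + 2 * line.triples + 3 * line.hr) / line.ab
```

BB% and K% use plate appearances as the denominator while ISO uses at-bats, an easy thing to mix up when copying numbers by hand.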

There are a lot of questions I’d like to dig into using this data, but I’ve got to collect all of it before I can do anything with it. Hopefully I’ll have some interesting draft articles for you to chew on in the weeks leading up to this year’s draft.