
Predicting Tennis Matches with Logistic Regression

Keywords: #Regression #Predictive #Modeling #Sports

As part of a capstone project for my major in applied mathematics, I collaborated with a team to explore how machine learning could be used to predict the outcomes of professional tennis matches. By applying logistic regression to decades of historical match data, we aimed to model the probability of a player winning based on match conditions and player statistics.


Project Introduction

The project centered on classification through logistic regression, a statistical method well-suited for binary outcomes such as win or loss. Unlike linear regression, which estimates continuous values, logistic regression outputs probabilities between 0 and 1, making it a natural fit for predicting match results. We trained our model on a dataset containing player stats, match outcomes, and contextual features like rank, age, and handedness.

We focused specifically on matches from the US Open, using the data to build a reliable predictive model. While one of my teammates implemented a Markov Chain approach for comparison, I focused on optimizing our logistic regression model and understanding the performance trade-offs between different techniques.

Model Optimization

My contributions involved implementing the logistic regression model and tuning it for performance. This required passing a weighted combination of the input features through the sigmoid function to obtain win probabilities, calculating the likelihood of the observed outcomes, and minimizing the negative log-likelihood (NLL) to find the best-fit parameters.
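To make the sigmoid and NLL steps concrete, here is a minimal sketch in Python with NumPy. It is not our project code: the data is synthetic, the single feature `rank_diff` and the helper names (`sigmoid`, `negative_log_likelihood`, `fit_logistic`) are illustrative assumptions, and plain gradient descent stands in for whichever optimizer a library would use.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def negative_log_likelihood(w, X, y):
    """NLL of binary labels y under a logistic model with weights w."""
    p = sigmoid(X @ w)
    eps = 1e-12  # guards against log(0)
    return -np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Minimize the mean NLL with plain gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # Gradient of the mean NLL with respect to w
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)
        w -= lr * grad
    return w

# Synthetic stand-in data (assumption): one feature, a standardized rank
# difference between the two players, plus an intercept column.
rng = np.random.default_rng(0)
n = 500
rank_diff = rng.normal(size=n)
X = np.column_stack([np.ones(n), rank_diff])
true_w = np.array([0.0, 1.5])
y = (rng.random(n) < sigmoid(X @ true_w)).astype(float)

w_hat = fit_logistic(X, y)
nll_start = negative_log_likelihood(np.zeros(2), X, y)
nll_end = negative_log_likelihood(w_hat, X, y)
print("fitted weights:", w_hat)
print("NLL before and after fitting:", nll_start, nll_end)
```

The fitted weights are the "best-fit parameters" in the sense above: they are the values at which the observed wins and losses are most probable under the model.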

To improve accuracy, we used hyperparameter tuning via RandomizedSearchCV and evaluated the model using key metrics: accuracy, precision, recall, and the AUC score. Logistic regression ultimately offered clear, interpretable predictions while accounting for dynamic factors such as player performance changes over time.
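A tuning and evaluation loop of this kind could look like the sketch below, which uses scikit-learn. The synthetic dataset from `make_classification`, the choice to tune only the regularization strength `C`, the search range, and the scaling step are all assumptions made for illustration; they are not the settings or the data from our project.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for match features (assumption, not real tennis data)
X, y = make_classification(n_samples=1000, n_features=6,
                           n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Scale features, then fit logistic regression
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Sample the inverse regularization strength C on a log scale
param_distributions = {"logisticregression__C": loguniform(1e-3, 1e2)}

search = RandomizedSearchCV(pipeline, param_distributions, n_iter=20,
                            scoring="roc_auc", cv=5, random_state=0)
search.fit(X_train, y_train)

# Evaluate the best model on held-out data
y_pred = search.predict(X_test)
y_prob = search.predict_proba(X_test)[:, 1]
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "auc": roc_auc_score(y_test, y_prob),
}
print("best parameters:", search.best_params_)
print(metrics)
```

Accuracy, precision, and recall are computed from the hard win/loss predictions, while AUC uses the predicted probabilities, so it reflects how well the model ranks likely winners regardless of the decision threshold.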

No model is without limitations: logistic regression can be sensitive to outliers and depends heavily on the quality of the input data. Even so, it proved to be an effective method for this task. Our initial exploration with Markov Chains revealed limitations in capturing conditional dependencies, which supported our pivot to machine learning.

 

You can view our presentation here:
