Building F1 Race Prediction Models with Python and Scikit-learn

12 min read · Sep 20, 2025

As a McLaren F1 fan and data enthusiast, I wanted to combine my interests by building machine learning models to predict race outcomes. Here's how I built models that predict F1 race results with reasonable accuracy.

The Data

F1 has excellent historical data available through various APIs and datasets. I collected:

- Race results from 2014-2024
- Qualifying positions
- Driver and team performance metrics
- Circuit characteristics
- Weather data

The data cleaning process was extensive. F1 has rule changes, team changes, and driver changes that all affect performance. I had to normalize data across different eras.
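As a rough illustration of what normalizing across eras can look like, here is a minimal sketch that rescales a score within each season so that years with different rules land on a common scale. The table, the column names (`season`, `driver`, `points`) and the choice of "share of the season's best score" are assumptions for the example, not the project's actual cleaning code.

```python
import pandas as pd

# Toy results table with made-up numbers; column names are illustrative.
results = pd.DataFrame({
    "season": [2014, 2014, 2014, 2024, 2024, 2024],
    "driver": ["A", "B", "C", "A", "B", "C"],
    "points": [25.0, 18.0, 15.0, 26.0, 18.0, 15.0],
})

# Express each score as a share of the best score in that season, so
# seasons with different scoring rules end up on the same 0-1 scale.
season_max = results.groupby("season")["points"].transform("max")
results["points_share"] = results["points"] / season_max

print(results)
```

A per-season z-score would work the same way; the point is that the scaling is computed within each season rather than across the whole 2014-2024 range.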

Feature Engineering

The key to good predictions is good features. I created:

- Driver form (average position in last 5 races)
- Team form (average points in last 5 races)
- Circuit-specific performance (how well a driver/team performs at specific tracks)
- Qualifying position (strong predictor of race result)
- Weather conditions (rain changes everything in F1)
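The driver form feature above can be sketched with a grouped rolling mean in pandas. This is a minimal example on made-up data, assuming one row per driver per race and illustrative column names (`driver`, `round`, `finish_position`). The detail worth getting right is the `shift(1)`, which keeps a race's own result out of its feature.

```python
import pandas as pd

# Made-up results for a single hypothetical driver, ordered by round.
races = pd.DataFrame({
    "driver": ["A"] * 7,
    "round": [1, 2, 3, 4, 5, 6, 7],
    "finish_position": [6, 8, 3, 5, 2, 1, 4],
}).sort_values(["driver", "round"])

# shift(1) drops the current race from its own window, so the feature
# only uses results that were known before the race started.
races["driver_form"] = (
    races.groupby("driver")["finish_position"]
    .transform(lambda s: s.shift(1).rolling(window=5, min_periods=1).mean())
)

print(races)
```

The first race for each driver has no history, so its form is `NaN` and needs to be filled or dropped before training. Team form follows the same pattern with points instead of positions.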

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# df is the cleaned table with one row per driver per race
# Prepare features and target
X = df[['qualifying_pos', 'driver_form', 'team_form', 'circuit_score']]
y = df['finish_position']

# Split and train; random_state makes the split and the forest reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = RandomForestClassifier(n_estimators=100, max_depth=10, random_state=42)
model.fit(X_train, y_train)
```
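One thing to keep in mind with the random split above: rows from the same race, or from later races, can end up on both sides of the split. A season-based hold-out is a common alternative for time-ordered data. The sketch below assumes a `season` column and uses 2024 as the hold-out year; both are illustrative choices, not something taken from the project.

```python
import pandas as pd

# Made-up rows; a real table would have one row per driver per race.
df = pd.DataFrame({
    "season": [2021, 2022, 2023, 2024, 2024],
    "qualifying_pos": [1, 4, 2, 3, 7],
    "finish_position": [1, 3, 2, 5, 6],
})

holdout_season = 2024  # assumption: evaluate on the most recent season

# Train only on seasons before the hold-out, test on the hold-out itself.
train = df[df["season"] < holdout_season]
test = df[df["season"] >= holdout_season]

X_train, y_train = train[["qualifying_pos"]], train["finish_position"]
X_test, y_test = test[["qualifying_pos"]], test["finish_position"]
```

This mirrors how the model would actually be used: trained on the past, asked about races it has not seen.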

Model Selection

I tested several models:

- Random Forest (best overall performance)
- Gradient Boosting
- Neural Networks
- Logistic Regression

Random Forest performed best, likely because it handles the non-linear relationships in F1 data well. A driver's finishing position isn't a linear function of their qualifying position, especially in wet conditions.
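A comparison like this can be run with cross-validation so every model is scored on the same folds. The sketch below uses a synthetic binary dataset from `make_classification` as a stand-in for the real features (think "podium or not"), and the hyperparameters are placeholders, so the scores it prints say nothing about F1.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 4 features, binary target.
X, y = make_classification(
    n_samples=400, n_features=4, n_informative=3, n_redundant=1, random_state=0
)

candidates = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    # Scale inputs for the models that are sensitive to feature scale.
    "neural_network": make_pipeline(
        StandardScaler(), MLPClassifier(max_iter=500, random_state=0)
    ),
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}

# Mean accuracy over 5 folds for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}

for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")
```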

Results

The model predicts:

- Podium finishers with 65% accuracy
- Top 10 finishers with 75% accuracy
- Race winner with 45% accuracy

These numbers might not seem impressive, but F1 is inherently unpredictable. Crashes, mechanical failures, and strategy calls can change everything.
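There is more than one way to score "podium accuracy". One reasonable definition is the share of drivers predicted to finish in the top N who actually did. The sketch below implements that definition on made-up predictions; the column names and the helper `top_n_hit_rate` are hypothetical, and the numbers behind the results above may have been computed differently.

```python
import pandas as pd

# Made-up predictions for two races of five drivers each.
preds = pd.DataFrame({
    "race": ["r1"] * 5 + ["r2"] * 5,
    "driver": list("ABCDE") * 2,
    "predicted_position": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    "finish_position": [1, 3, 5, 2, 4, 2, 1, 3, 5, 4],
})

def top_n_hit_rate(frame: pd.DataFrame, n: int) -> float:
    """Share of drivers predicted in the top n who actually finished there."""
    predicted = frame["predicted_position"] <= n
    actual = frame["finish_position"] <= n
    return float((predicted & actual).sum() / predicted.sum())

podium_rate = top_n_hit_rate(preds, 3)  # n=3: podium
winner_rate = top_n_hit_rate(preds, 1)  # n=1: race winner

print(f"podium: {podium_rate:.2f}, winner: {winner_rate:.2f}")
```

Whatever definition is used, it helps to compare against a baseline such as "everyone finishes where they qualified", since that baseline is already fairly strong in F1.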

Interesting Findings

The model revealed some interesting patterns:

- Qualifying position is the strongest predictor (no surprise)
- Recent form matters more than historical performance
- Some circuits are more predictable than others
- Weather is a huge wildcard that's hard to model
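A ranking of predictors like this usually comes from the forest's `feature_importances_` attribute. The sketch below shows the mechanics on synthetic data. Note that the target is constructed so that qualifying dominates, so the output demonstrates how to read importances rather than confirming the finding.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 500

# Synthetic features using the same column names as the model above.
X = pd.DataFrame({
    "qualifying_pos": rng.integers(1, 21, size=n),
    "driver_form": rng.uniform(1, 20, size=n),
    "team_form": rng.uniform(0, 40, size=n),
    "circuit_score": rng.uniform(0, 1, size=n),
})

# Assumption for the demo: a "podium" label driven by qualifying plus noise.
y = (X["qualifying_pos"] + rng.normal(0, 2, size=n) <= 3).astype(int)

model = RandomForestClassifier(n_estimators=100, max_depth=10, random_state=0)
model.fit(X, y)

# Impurity-based importances, one per feature, summing to 1.
importances = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)

print(importances)
```

Impurity-based importances can be skewed by how many distinct values a feature takes, so `sklearn.inspection.permutation_importance` on held-out data is a useful cross-check.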

Future Improvements

I'm working on:

- Incorporating strategy predictions (tire choices, pit stops)
- Better weather modeling
- Real-time updates during race weekends
- Ensemble methods combining multiple models
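For the ensemble idea, scikit-learn's `VotingClassifier` is one straightforward starting point. This is a minimal sketch on synthetic data, assuming a binary target and soft voting (averaging predicted probabilities); the choice of base models and settings is illustrative, not a plan taken from the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 4 features, binary target.
X, y = make_classification(
    n_samples=400, n_features=4, n_informative=3, n_redundant=1, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Soft voting averages the class probabilities of the base models.
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)

proba = ensemble.predict_proba(X_test)
print(f"held-out accuracy: {ensemble.score(X_test, y_test):.3f}")
```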

The Code

The entire project is open source on my GitHub. It uses Python, Pandas for data processing, Scikit-learn for modeling, and Streamlit for visualization.

Building this project taught me a lot about feature engineering, model selection, and the importance of domain knowledge. Understanding F1 strategy and rules was just as important as understanding machine learning algorithms.