Decision Tree Regression using sklearn - Python

Last Updated : 02 Jun, 2025

Decision Tree Regression is a method used to predict continuous values like prices or scores by using a tree-like structure. It works by splitting the data into smaller parts based on simple rules taken from the input features. These splits help reduce errors in prediction. At the end of each branch, called a leaf node the model gives a prediction usually the average value of that group. In the tree:

Decision Nodes (shown as diamonds) ask yes/no questions about the data, like “Is age greater than 50?”
Leaf Nodes (shown as rectangles) give the final predicted number based on the data that reached that point.

python — Workflow of Decision Tree Regression

Branches connect nodes and represent the outcome of a decision. For example if the answer to a condition is "Yes," you follow one branch; if "No," you follow another. In below example it shows a decision tree that evaluates the smallest of three numbers:

Implementation of Decision Tree Regression

For example we want to predict house prices based on factors like size, location and age. A Decision Tree Regressor can split the data based on these features such as checking the location first, then the size and finally the age. This way it can accurately predicts the price by considering the most impactful factors first making it useful and easy to interpret.

Step 1: Importing the required libraries

We will import the following libraries.

NumPy: For numerical computations and array handling
Matplotlib: For plotting graphs and visualizations
We import different modules from scikit-learn (sklearn) for various tasks such as modeling, data splitting, tree visualization, and performance evaluation.

Python

import numpy as np import matplotlib.pyplot as plt from sklearn.tree import DecisionTreeRegressor, export_text from sklearn.model_selection import train_test_split from sklearn.metrics import mean_squared_error

Step 2: Creating a Sample Dataset

Here we create a synthetic dataset using numpy library, where the feature values X are randomly sampled and sorted between 0 and 5, and the target y is a noisy sine function of X. The scatter plot visualizes the data points, showing how the target values vary with the feature.

Python

np.random.seed(42) X = np.sort(5 * np.random.rand(100, 1), axis=0) y = np.sin(X).ravel() + np.random.normal(0, 0.1, X.shape[0])  plt.scatter(X, y, color='red', label='Data') plt.title("Synthetic Dataset") plt.xlabel("Feature") plt.ylabel("Target") plt.legend() plt.show()

Output:

Step 3: Splitting the Dataset

We split the dataset into train and test dataset using the train_test_split function into the ratio of 70% training and 30% testing. We also set a random_state=42 to ensure reproducibility.

Python

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

Step 4: Initializing the Decision Tree Regressor

Here we used DecisionTreeRegressor method from Sklearn python library to implement Decision Tree Regression. We also define the max_depth as 4 which controls the maximum levels a tree can reach , controlling model complexity.

Python

regressor = DecisionTreeRegressor(max_depth=4, random_state=42)

Step 5: Fiting Decision Tree Regressor Model

We fit our model using the .fit() method on the X_train and y_train , so that the model can learn the relationships between different variables.

Python

regressor.fit(X_train, y_train)

Output:

DecisionTreeRegressor(max_depth=4, random_state=42)

Step 6: Predicting a New Value

We will now predict a new value using our trained model using the predict() function. After that we also calculated the mean squared error (MSE) to check how accurate is our predicted value from the actual value , telling how well the model fits to our training data.

Python

y_pred = regressor.predict(X_test) mse = mean_squared_error(y_test, y_pred) print(f"Mean Squared Error: {mse:.4f}")

Output:

Mean Squared Error: 0.0151

Step 7: Visualizing the result

We will visualise the regression line our model has calculated to see how well the decision tree fits the data and captures the underlying pattern, especially showing how the predictions change smoothly or in steps depending on the tree's splits.

Python

X_grid = np.arange(min(X), max(X), 0.01)[:, np.newaxis] y_grid_pred = regressor.predict(X_grid)  plt.figure(figsize=(10, 6)) plt.scatter(X, y, color='red', label='Data') plt.plot(X_grid, y_grid_pred, color='blue', label='Model Prediction') plt.title("Decision Tree Regression") plt.xlabel("Feature") plt.ylabel("Target") plt.legend() plt.show()

Output:

Step 8: Export and Show the Tree Structure below

For better understanding we used plot_tree to visualize the structure of the decision tree to understand how the model splits the feature space, showing the decision rules at each node and how the tree partitions the data to make predictions.

Python

from sklearn.tree import plot_tree  plt.figure(figsize=(20, 10)) plot_tree(     regressor,     feature_names=["Feature"],     filled=True,     rounded=True,     fontsize=10 ) plt.title("Decision Tree Structure") plt.show()

Output:

download8- — Visualized Decision Tree Regression

Decision Tree Regression is used for predicting continuous values effectively capturing non-linear patterns in data. Its tree-based structure makes model interpretability easy as we can tell why a decision was made and why we get this specific output. This information can further be used to fine tune model based on it flow of working.

Ensemble Methods in Python

AnkanDas22

Improve

Article Tags :

Practice Tags :

Machine Learning

Decision Tree Regression using sklearn - Python

Implementation of Decision Tree Regression

Step 1: Importing the required libraries

Step 2: Creating a Sample Dataset

Step 3: Splitting the Dataset

Step 4: Initializing the Decision Tree Regressor

Step 5: Fiting Decision Tree Regressor Model

Step 6: Predicting a New Value

Step 7: Visualizing the result

Step 8: Export and Show the Tree Structure below

Similar Reads

Linear Model Regression

Linear Model Classification

Regularization

K-Nearest Neighbors (KNN)

Support Vector Machines

Decision Tree

Ensemble Learning