# How to drop out highly correlated features in Python?

This recipe shows how to drop highly correlated features in Python.

Many datasets contain features that are highly correlated, meaning they are approximately linearly dependent on other features. Such features add little information for predicting the output, but they still increase the computational cost.
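To see why such a feature is redundant, here is a minimal sketch on hypothetical toy data (the column names `a`, `b`, `c` are illustrative only): column `b` is an exact linear function of `a`, so their absolute correlation is 1, while the independent column `c` shows only a weak correlation.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: "b" is an exact linear function of "a",
# while "c" is generated independently of both.
rng = np.random.default_rng(0)
a = rng.normal(size=100)
df_toy = pd.DataFrame({
    "a": a,
    "b": 2.0 * a + 1.0,          # perfectly linearly dependent on "a"
    "c": rng.normal(size=100),   # independent noise
})

corr = df_toy.corr().abs()
print(corr.round(3))
# |corr(a, b)| equals 1 (up to floating-point rounding): "b" carries no
# information that "a" does not already provide, so one of them can go.
```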

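This Python data science example does the following:

1. Calculates the correlation between the features.

2. Drops highly correlated features to reduce dimensionality and so ease the curse of dimensionality.

3. Uses the Pearson coefficient, which measures linear correlation; rank-based coefficients such as Spearman can also capture monotonic non-linear relationships.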
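`DataFrame.corr()` uses the Pearson coefficient by default, which only measures linear association. As a minimal sketch on hypothetical data, the comparison below uses `method="spearman"` on a relationship that is monotonic but not linear (`y = exp(x)`): Pearson stays below 1, while Spearman, which correlates ranks, reaches 1.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "y" increases with "x", but not along a straight line.
x = np.linspace(0.0, 5.0, 50)
df_nl = pd.DataFrame({"x": x, "y": np.exp(x)})

pearson = df_nl.corr(method="pearson").loc["x", "y"]
spearman = df_nl.corr(method="spearman").loc["x", "y"]

print(f"Pearson:  {pearson:.3f}")   # below 1: the relationship is not linear
print(f"Spearman: {spearman:.3f}")  # 1.000: the ranks of x and y agree exactly
```

Passing `method="spearman"` to the `df.corr()` call used later in this recipe would therefore also flag features that are strongly but non-linearly (monotonically) related.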

So we need to compute the correlation between the features and remove those whose absolute correlation coefficient exceeds a chosen threshold.

This recipe is a short example of how to find the correlation between features and remove the highly correlated ones.
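The whole procedure can be wrapped in one function. This is a sketch only: `drop_highly_correlated` is a hypothetical helper name, and it assumes numeric columns and the default Pearson correlation. It follows the same steps as the recipe below: absolute correlation matrix, strict upper triangle, threshold, drop.

```python
import numpy as np
import pandas as pd
from sklearn import datasets


def drop_highly_correlated(df: pd.DataFrame, threshold: float = 0.95):
    """Return (reduced DataFrame, list of dropped column labels).

    Hypothetical helper: for each pair of features whose absolute Pearson
    correlation exceeds `threshold`, the later column of the pair is dropped.
    """
    cor_matrix = df.corr().abs()
    # Strict upper triangle (k=1): each pair is checked once, diagonal ignored.
    mask = np.triu(np.ones(cor_matrix.shape, dtype=bool), k=1)
    upper_tri = cor_matrix.where(mask)
    to_drop = [col for col in upper_tri.columns if (upper_tri[col] > threshold).any()]
    return df.drop(columns=to_drop), to_drop


# Usage on the iris features, as in the recipe
df = pd.DataFrame(datasets.load_iris().data)
reduced, dropped = drop_highly_correlated(df, threshold=0.95)
print(dropped)        # [3]
print(reduced.shape)  # (150, 3)
```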


```
import pandas as pd
import numpy as np
from sklearn import datasets
```

We import numpy, pandas and scikit-learn's `datasets` module, which we use to load the built-in iris dataset.

Here we use `datasets` to load the built-in iris dataset and store the features in X and the target values in y. We then build a DataFrame from X and print its first five rows.

```
iris = datasets.load_iris()
X = iris.data
y = iris.target
df = pd.DataFrame(X)
print(df.head())
```
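The DataFrame above has integer column labels 0 to 3. As an optional variation (an assumption, not part of the original recipe), passing `iris.feature_names` as column labels makes the later output easier to read; for example, column 3 becomes `'petal width (cm)'`.

```python
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
# feature_names holds the four measurement names in column order.
df_named = pd.DataFrame(iris.data, columns=iris.feature_names)
print(df_named.columns.tolist())
# ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
```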

Next we create a square correlation matrix with one row and one column per feature. Each element is the absolute value of the correlation between two features.

```
cor_matrix = df.corr().abs()
print(cor_matrix)
```
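Because `DataFrame.corr()` defaults to the Pearson coefficient, any single entry can be cross-checked with NumPy's `np.corrcoef`. A small sketch for the pair of columns 2 and 3:

```python
import numpy as np
import pandas as pd
from sklearn import datasets

X = datasets.load_iris().data
df = pd.DataFrame(X)

# Entry (2, 3) of the pandas matrix versus the same pair computed by NumPy.
pandas_value = df.corr().abs().loc[2, 3]
numpy_value = abs(np.corrcoef(X[:, 2], X[:, 3])[0, 1])
print(round(pandas_value, 6), round(numpy_value, 6))
```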

Note that the correlation matrix is symmetric about its diagonal, and every diagonal element is 1. It therefore does not matter whether we use the upper or the lower triangle, but we must exclude the diagonal. Here we select the upper triangle.

```
# np.bool is unavailable in NumPy 1.24-1.26; the built-in bool works in every version
upper_tri = cor_matrix.where(np.triu(np.ones(cor_matrix.shape), k=1).astype(bool))
print(upper_tri)
```
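To see what the mask looks like, here is the boolean matrix built for four features. `np.triu` keeps the upper triangle of the matrix of ones, and `k=1` also zeroes the main diagonal, so `where` turns the diagonal and the lower triangle into NaN.

```python
import numpy as np

# Mask for a 4 x 4 correlation matrix: True only strictly above the diagonal.
mask = np.triu(np.ones((4, 4)), k=1).astype(bool)
print(mask.astype(int))
# [[0 1 1 1]
#  [0 0 1 1]
#  [0 0 0 1]
#  [0 0 0 0]]
```

With 4 features this leaves 6 entries, one for each distinct pair of features.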

We then select every column whose absolute correlation with some earlier column is greater than 0.95 and collect these columns in a list named 'to_drop'.

```
to_drop = [column for column in upper_tri.columns if any(upper_tri[column] > 0.95)]
print(); print(to_drop)
```
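The 0.95 cut-off is a tunable choice. This sketch repeats the selection step for a few thresholds on the same iris data; the specific values are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn import datasets

df = pd.DataFrame(datasets.load_iris().data)
cor_matrix = df.corr().abs()
upper_tri = cor_matrix.where(np.triu(np.ones(cor_matrix.shape), k=1).astype(bool))

# Lowering the threshold flags more columns.
results = {}
for threshold in (0.97, 0.95, 0.85):
    results[threshold] = [c for c in upper_tri.columns if (upper_tri[c] > threshold).any()]
    print(threshold, results[threshold])
# 0.97 []
# 0.95 [3]
# 0.85 [2, 3]
```

The list comprehension checks all columns in one pass. At 0.85, column 3 is flagged only because of its correlation with column 2, which is itself dropped; once column 2 is gone, column 3's strongest remaining correlation (0.817941, with column 0) is below the cut-off. Re-checking after each removal would keep more columns.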

Finally, we drop the columns listed in 'to_drop' from the DataFrame.

```
# Drop by label; df.columns[to_drop] only worked because labels equal positions here
df1 = df.drop(columns=to_drop)
print(); print(df1.head())
```
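In a modelling workflow, one common practice (an assumption beyond this recipe) is to decide which columns to drop using the training split only and then remove exactly the same columns from the test split, so both have identical features. A minimal sketch using scikit-learn's `train_test_split`; the 0.3 test size and `random_state=0` are arbitrary choices:

```python
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
X_train, X_test = train_test_split(df, test_size=0.3, random_state=0)

# Decide which columns to drop from the training split only...
cor_matrix = X_train.corr().abs()
upper_tri = cor_matrix.where(np.triu(np.ones(cor_matrix.shape), k=1).astype(bool))
to_drop = [c for c in upper_tri.columns if (upper_tri[c] > 0.95).any()]

# ...then remove exactly the same columns from both splits.
X_train_reduced = X_train.drop(columns=to_drop)
X_test_reduced = X_test.drop(columns=to_drop)
print(to_drop)
```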


The output first shows the DataFrame with 4 columns, then the correlation matrix, whose diagonal elements are all 1 and whose upper and lower triangles mirror each other. Next comes the upper-triangular matrix, followed by the list of columns to drop and the final DataFrame with the highly correlated column removed.

```
     0    1    2    3
0  5.1  3.5  1.4  0.2
1  4.9  3.0  1.4  0.2
2  4.7  3.2  1.3  0.2
3  4.6  3.1  1.5  0.2
4  5.0  3.6  1.4  0.2
          0         1         2         3
0  1.000000  0.117570  0.871754  0.817941
1  0.117570  1.000000  0.428440  0.366126
2  0.871754  0.428440  1.000000  0.962865
3  0.817941  0.366126  0.962865  1.000000
    0        1         2         3
0 NaN  0.11757  0.871754  0.817941
1 NaN      NaN  0.428440  0.366126
2 NaN      NaN       NaN  0.962865
3 NaN      NaN       NaN       NaN

[3]

     0    1    2
0  5.1  3.5  1.4
1  4.9  3.0  1.4
2  4.7  3.2  1.3
3  4.6  3.1  1.5
4  5.0  3.6  1.4
```
