How to drop rows with NaN values in a DataFrame


In this post, we will learn how to drop rows with NaN values from a pandas DataFrame in Python. We will use the built-in pandas function dropna(), with code examples.

Pandas dropna() function


The pandas DataFrame dropna() function drops the rows or columns that contain missing values (NaN, None, or NaT) from a DataFrame. In the examples, we use the numpy library (np.nan) to create the missing values.
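
To make the three kinds of missing value concrete, here is a minimal sketch. The column names and data are hypothetical and chosen only for illustration; isna() is used to count what dropna() would treat as missing.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: row 1 holds a different kind of missing value in each column.
df = pd.DataFrame({
    'name': ['Max', None, 'David'],                                # None in an object column
    'marks': [100.0, np.nan, 95.0],                                # NaN in a float column
    'joined': pd.to_datetime(['2021-01-01', None, '2021-03-01']),  # NaT in a datetime column
})

# isna() flags all three kinds of missing value.
missing_per_column = df.isna().sum()
print(missing_per_column)

# dropna() therefore removes row 1 and keeps rows 0 and 2.
cleaned = df.dropna()
print(cleaned)
```

Counting missing values per column first is a cheap way to check how much data a dropna() call is about to remove.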

Syntax

DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

Parameters

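  • axis : Determines whether rows or columns that contain missing values are removed.
    • 0 or 'index' : drop rows that contain NaN/NaT/None values (the default).
    • 1 or 'columns' : drop columns that contain NaN/NaT/None values.
  • how : Takes one of two string values, 'any' or 'all'. The default is 'any'.
    • 'any' : drop the row or column if it contains at least one missing value.
    • 'all' : drop the row or column only if all of its values are missing.
  • thresh : An optional int giving the minimum number of non-missing values a row or column must have to be kept; rows or columns with fewer are dropped. In recent pandas versions it cannot be combined with how.
  • subset : An optional label or list of labels along the other axis to check for missing values; when dropping rows, this is a list of column names.
  • inplace : A boolean. If True, the source DataFrame is modified in place and None is returned. The default is False, which returns a new DataFrame.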

Note: After rows are dropped, the remaining rows keep their original index labels, so the index is no longer consecutive. We can renumber it with the DataFrame reset_index(drop=True) function.
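
The behaviour of inplace and of the index after a drop can be sketched as follows. The DataFrame here is a hypothetical one made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': ['x', 'y', 'z']})

# Default (inplace=False): a new DataFrame is returned and df is left unchanged.
new_df = df.dropna()
print(new_df.index.tolist())        # original labels survive: [0, 2]

# reset_index(drop=True) renumbers the rows from 0 without keeping the old labels.
renumbered = new_df.reset_index(drop=True)
print(renumbered.index.tolist())    # [0, 1]

# inplace=True: the DataFrame itself is modified and the call returns None.
work_df = df.copy()
returned = work_df.dropna(inplace=True)
print(returned, len(work_df))       # None 2
```

Because inplace=True returns None, assigning its result back to a variable (df = df.dropna(inplace=True)) would replace the DataFrame with None.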

1. Pandas drop rows if all values in the row are null


In this example, we use dropna(how='all') to drop the rows in which every value is null. Finally, we reset the index with reset_index(drop=True).

Program Example

import pandas as pd
import numpy as np
 
Student_dict = {
    'Name': ['Max',np.nan,np.nan,'David'],
    'Marks':[100,np.nan,np.nan,100],
    'Subject': ['Music',np.nan,np.nan, 'Physic']
}

original_df = pd.DataFrame(Student_dict)
print(f'{original_df}\n')

result_df = original_df.dropna(how='all')
print('result df:\n',result_df)

print('\nreset index:\n',result_df.reset_index(drop=True))

Output

    Name  Marks Subject
0    Max  100.0   Music
1    NaN    NaN     NaN
2    NaN    NaN     NaN
3  David  100.0  Physic

result df:
     Name  Marks Subject
0    Max  100.0   Music
3  David  100.0  Physic

reset index:
     Name  Marks Subject
0    Max  100.0   Music
1  David  100.0  Physic

2. Drop all rows with any null value


Sometimes, instead of dropping only the rows in which every value is null, we need to drop any row that contains at least one null value. We can do this by calling the DataFrame dropna() function without any parameters.

Program Example

import pandas as pd
import numpy as np
 
Student_dict = {
    'Name': ['Max', 'David','laxi',np.nan],
    'Marks':[100,100,np.nan,100],
    'Subject': ['Music', 'Physic',np.nan,np.nan]
}

original_df = pd.DataFrame(Student_dict)
print(f'{original_df}\n')

result_df = original_df.dropna()
print('result DataFrame:\n',result_df)

Output

    Name  Marks Subject
0    Max  100.0   Music
1  David  100.0  Physic
2   laxi    NaN     NaN
3    NaN  100.0     NaN

result DataFrame:
     Name  Marks Subject
0    Max  100.0   Music
1  David  100.0  Physic

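3. Drop rows/columns based on a threshold value


We can set a threshold in the DataFrame dropna() function by passing the thresh parameter. thresh is the minimum number of non-null values a row must have to be kept. In this example, thresh=2 keeps only the rows with at least 2 non-null values; because the DataFrame has 3 columns, this drops every row that has 2 or more null values.

Program Example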

import pandas as pd
import numpy as np
 
Student_dict = {
    'Name': ['Max', 'David','brix',np.nan],
    'Marks':[100,100,np.nan,np.nan],
    'Subject': ['Math', 'Chem',np.nan,np.nan]
}

orig_df = pd.DataFrame(Student_dict)

print(f'{orig_df}\n')

res_df = orig_df.dropna(thresh=2)

print(res_df)

Output

    Name  Marks Subject
0    Max  100.0    Math
1  David  100.0    Chem
2   brix    NaN     NaN
3    NaN    NaN     NaN

    Name  Marks Subject
0    Max  100.0    Math
1  David  100.0    Chem
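
As a cross-check on what thresh does, the following sketch counts the non-null values in each row with notna() and filters with a boolean mask. It reuses the data from the example above; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Max', 'David', 'brix', np.nan],
    'Marks': [100, 100, np.nan, np.nan],
    'Subject': ['Math', 'Chem', np.nan, np.nan],
})

# Number of non-null values in each row.
non_null_per_row = df.notna().sum(axis=1)
print(non_null_per_row.tolist())    # [3, 3, 1, 0]

# thresh=2 keeps the rows that have at least 2 non-null values ...
by_thresh = df.dropna(thresh=2)

# ... which selects the same rows as this explicit mask.
by_mask = df[non_null_per_row >= 2]
print(by_thresh.equals(by_mask))    # True
```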

4. Pandas drop column if all values are null


In this example, we use dropna(axis=1, how='all') to drop every column in which all values are null.

Program Example

import pandas as pd
import numpy as np

Student_dict = {
    'Name': ['Max', 'David','brix','Mark'],
    'Marks':[np.nan,np.nan,np.nan,np.nan],
    'Subject': ['Music', 'Physic','Math','phy']
}

orig_df = pd.DataFrame(Student_dict)

print(f'{orig_df}\n')

# axis=1 selects columns; how='all' drops only the columns that are entirely null
res_df = orig_df.dropna(axis=1, how='all')
print(res_df)

Output

    Name  Marks Subject
0    Max    NaN   Music
1  David    NaN  Physic
2   brix    NaN    Math
3   Mark    NaN     phy

    Name Subject
0    Max   Music
1  David  Physic
2   brix    Math
3   Mark     phy

5. Drop column with any null or NaN value


In this example, we use dropna(axis=1) to drop every column that contains at least one null value.

Program Example

import pandas as pd
import numpy as np
 
Student_dict = {
    'Name': ['Max', 'David','brix','Mark'],
    'Marks':[100,100,np.nan,np.nan],
    'Subject': [np.nan, 'Physic','Math','phy']
}

orig_df = pd.DataFrame(Student_dict)

print(f'{orig_df}\n')

res_df = orig_df.dropna(axis=1)
print(res_df)

Output

    Name  Marks Subject
0    Max  100.0     NaN
1  David  100.0  Physic
2   brix    NaN    Math
3   Mark    NaN     phy

    Name
0    Max
1  David
2   brix
3   Mark
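
Dropping every column that has a single null can remove more data than intended. As a hedged alternative, thresh can be combined with axis=1 to keep the columns that have enough non-null values. The sketch below reuses the data from this example; the threshold of 3 is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Max', 'David', 'brix', 'Mark'],
    'Marks': [100, 100, np.nan, np.nan],
    'Subject': [np.nan, 'Physic', 'Math', 'phy'],
})

# Non-null counts per column: Name has 4, Marks has 2, Subject has 3.
print(df.notna().sum().tolist())

# Keep only the columns with at least 3 non-null values.
res_df = df.dropna(axis=1, thresh=3)
print(res_df.columns.tolist())      # ['Name', 'Subject']
```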

6. Specify column labels to check for null values


In this example, we pass subset=['Marks'] so that only the Marks column is checked for null values. Rows with a null in Marks are dropped, while a row with a null in another column is kept.

Program Example

import pandas as pd
import numpy as np
 
Student_dict = {
    'Name': ['Max','Rock','Lexi','David'],
    'Marks':[100,100,np.nan,100],
    'Subject': ['Music','Music','Science', np.nan]
}

original_df = pd.DataFrame(Student_dict)

print(f'{original_df}\n')

result_df = original_df.dropna(subset=['Marks'])

print('result df:\n',result_df)

Output

    Name  Marks  Subject
0    Max  100.0    Music
1   Rock  100.0    Music
2   Lexi    NaN  Science
3  David  100.0      NaN

result df:
     Name  Marks Subject
0    Max  100.0   Music
1   Rock  100.0   Music
3  David  100.0     NaN
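
subset also accepts several labels and can be combined with how. The sketch below uses a hypothetical variation of the data above to show the difference between the default how='any' and how='all' when two columns are checked.

```python
import numpy as np
import pandas as pd

# Hypothetical variation of the example data, for illustration only.
df = pd.DataFrame({
    'Name': ['Max', 'Rock', 'Lexi', 'David'],
    'Marks': [100, 100, np.nan, np.nan],
    'Subject': ['Music', np.nan, 'Science', np.nan],
})

# Default how='any': drop a row if Marks OR Subject is null.
any_missing = df.dropna(subset=['Marks', 'Subject'])
print(any_missing.index.tolist())    # [0]

# how='all': drop a row only if Marks AND Subject are both null.
both_missing = df.dropna(subset=['Marks', 'Subject'], how='all')
print(both_missing.index.tolist())   # [0, 1, 2]
```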

Summary

We have learned several ways to drop rows and columns with NaN values from a pandas DataFrame, with examples.