In this post, we will learn how to drop rows with NaN values from a Python pandas DataFrame. We are going to use the built-in pandas dropna() method, with code examples.
Pandas dropna() function
The pandas DataFrame dropna() method drops rows or columns that contain missing values (NaN, None or NaT). We can use the NumPy library (np.nan) to create the missing values in our examples.
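As a quick check of which values dropna() treats as missing, here is a minimal sketch with made-up data (the column names, including Date, are hypothetical). It mixes Python None, NumPy NaN and pandas NaT, then uses isna() to show that all three are detected.

```python
import numpy as np
import pandas as pd

# Hypothetical data: each column uses a different missing-value marker
df = pd.DataFrame({
    'Name': ['Max', None],                          # Python None
    'Marks': [100, np.nan],                         # NumPy NaN
    'Date': [pd.Timestamp('2023-01-01'), pd.NaT],   # pandas NaT
})

print(df.isna())     # True marks every cell that dropna() treats as missing
print(df.dropna())   # only the first row survives
```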
Syntax
DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
Parameters
- axis : Determines whether rows or columns that contain missing values are removed. The default is 0.
- 0 or 'index' : drop rows which contain NaN/NaT/None values.
- 1 or 'columns' : drop columns which contain NaN/NaT/None values.
- how : Takes one of two string values ('any' or 'all'). The default is 'any'.
- any : drop the row or column if it contains at least one null value.
- all : drop the row or column only if all of its values are null.
- thresh : Optional int. Keep only the rows (or columns) that have at least this many non-null values; the rest are dropped.
- subset : Labels along the other axis to consider, passed as a list. For example, when dropping rows, a list of the column names to check for null values.
- inplace : Boolean, default False. If True, the source DataFrame is modified in place and None is returned; if False, a new DataFrame is returned.
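To illustrate the inplace parameter, here is a minimal sketch with made-up data: with inplace=True the DataFrame itself is changed and the call returns None.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Max', np.nan], 'Marks': [100, np.nan]})

result = df.dropna(inplace=True)   # modifies df directly

print(result)   # None: nothing is returned
print(df)       # df itself no longer contains the all-NaN row
```

Because the return value is None, writing df = df.dropna(inplace=True) would replace df with None; either assign the result of dropna() or use inplace=True, not both.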
Note: Dropping rows does not renumber the index, so the remaining row labels can have gaps. We can renumber them by using the DataFrame reset_index(drop=True) function.
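A short sketch of this index behaviour, using hypothetical data: after dropna() the surviving rows keep their original labels, and reset_index(drop=True) renumbers them from 0. Without drop=True, pandas keeps the old labels as a new column named index.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Max', np.nan, 'David'], 'Marks': [100, np.nan, 90]})

dropped = df.dropna()
print(dropped.index.tolist())       # [0, 2]: gap left by the dropped row

renumbered = dropped.reset_index(drop=True)
print(renumbered.index.tolist())    # [0, 1]

kept_old = dropped.reset_index()    # drop=False keeps the old labels as a column
print(kept_old.columns.tolist())    # ['index', 'Name', 'Marks']
```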
1. Pandas drop rows if all values are null
In this example, we use dropna(how='all') to drop only the rows in which all values are null. Finally, we reset the index by using reset_index(drop=True).
Program Example
import pandas as pd
import numpy as np
Student_dict = {
'Name': ['Max',np.nan,np.nan,'David'],
'Marks':[100,np.nan,np.nan,100],
'Subject': ['Music',np.nan,np.nan, 'Physic']
}
original_df = pd.DataFrame(Student_dict)
print(f'{original_df}\n')
# drop only the rows in which every value is missing
result_df = original_df.dropna(how='all')
print('result df:\n',result_df)
# renumber the remaining rows from 0
print('\nreset index:\n',result_df.reset_index(drop=True))
Output
Name Marks Subject
0 Max 100.0 Music
1 NaN NaN NaN
2 NaN NaN NaN
3 David 100.0 Physic
result df:
Name Marks Subject
0 Max 100.0 Music
3 David 100.0 Physic
reset index:
Name Marks Subject
0 Max 100.0 Music
1 David 100.0 Physic
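To see what how='all' does, the sketch below rebuilds the same result with a boolean mask on the data from this example: a row is kept unless isna() is True in every column. This illustrates the logic only; it is not how pandas implements dropna() internally.

```python
import numpy as np
import pandas as pd

Student_dict = {
    'Name': ['Max', np.nan, np.nan, 'David'],
    'Marks': [100, np.nan, np.nan, 100],
    'Subject': ['Music', np.nan, np.nan, 'Physic'],
}
df = pd.DataFrame(Student_dict)

# True for rows in which every value is missing
all_missing = df.isna().all(axis=1)

manual = df[~all_missing]        # keep rows with at least one value
builtin = df.dropna(how='all')

print(manual.equals(builtin))    # True
```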
2. Drop rows with any null value
Sometimes, instead of dropping only the rows where all values are null, we need to drop every row that contains at least one null value. Calling the DataFrame dropna() function without any parameters does this, because the defaults are axis=0 and how='any'.
Program Example
import pandas as pd
import numpy as np
Student_dict = {
'Name': ['Max', 'David','laxi',np.nan],
'Marks':[100,100,np.nan,100],
'Subject': ['Music', 'Physic',np.nan,np.nan]
}
original_df = pd.DataFrame(Student_dict)
print(f'{original_df}\n')
result_df = original_df.dropna()
print('result DataFrame:\n',result_df)
Output
Name Marks Subject
0 Max 100.0 Music
1 David 100.0 Physic
2 laxi NaN NaN
3 NaN 100.0 NaN
result DataFrame:
Name Marks Subject
0 Max 100.0 Music
1 David 100.0 Physic
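Because how='any' can discard a lot of data, it may be worth counting the affected rows first. This sketch reuses the data above and assumes nothing beyond standard pandas calls; it also checks that the default call matches spelling out axis=0, how='any'.

```python
import numpy as np
import pandas as pd

Student_dict = {
    'Name': ['Max', 'David', 'laxi', np.nan],
    'Marks': [100, 100, np.nan, 100],
    'Subject': ['Music', 'Physic', np.nan, np.nan],
}
df = pd.DataFrame(Student_dict)

# Rows with at least one missing value: what dropna() removes by default
has_missing = df.isna().any(axis=1)
print(has_missing.sum())                 # 2 rows would be dropped

# The default call is the same as passing axis=0, how='any' explicitly
explicit = df.dropna(axis=0, how='any')
print(explicit.equals(df.dropna()))      # True
print(len(df) - len(explicit))           # 2
```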
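3. Drop rows/columns based on a threshold value
We can set a threshold in the DataFrame dropna() function by passing the thresh parameter, which is the minimum number of non-null values a row must have to be kept. In this example, thresh=2 keeps the rows with at least 2 non-null values; since there are three columns, every row with 2 or more null values is dropped.
Program Example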
import pandas as pd
import numpy as np
Student_dict = {
'Name': ['Max', 'David','brix',np.nan],
'Marks':[100,100,np.nan,np.nan],
'Subject': ['Math', 'Chem',np.nan,np.nan]
}
orig_df = pd.DataFrame(Student_dict)
print(f'{orig_df}\n')
res_df = orig_df.dropna(thresh=2)
print(res_df)
Output
Name Marks Subject
0 Max 100.0 Math
1 David 100.0 Chem
2 brix NaN NaN
3 NaN NaN NaN
Name Marks Subject
0 Max 100.0 Math
1 David 100.0 Chem
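Since thresh counts non-null values, the same rows can be selected by counting them with notna(). The sketch below uses the data from this example and also shows thresh applied to columns with axis=1; the value 3 is chosen only for illustration.

```python
import numpy as np
import pandas as pd

Student_dict = {
    'Name': ['Max', 'David', 'brix', np.nan],
    'Marks': [100, 100, np.nan, np.nan],
    'Subject': ['Math', 'Chem', np.nan, np.nan],
}
df = pd.DataFrame(Student_dict)

# Count the NON-missing values in each row
non_null_per_row = df.notna().sum(axis=1)
print(non_null_per_row.tolist())               # [3, 3, 1, 0]

manual = df[non_null_per_row >= 2]
print(manual.equals(df.dropna(thresh=2)))      # True

# With axis=1, keep only columns that have at least 3 non-null values
print(df.dropna(axis=1, thresh=3).columns.tolist())   # ['Name']
```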
4. Pandas drop columns if all values are null
In this example, we use dropna(axis=1, how='all') to drop every column in which all values are null.
Program Example
import pandas as pd
import numpy as np
Student_dict = {
'Name': ['Max', 'David','brix','Mark'],
'Marks':[np.nan,np.nan,np.nan,np.nan],
'Subject': ['Music', 'Physic','Math','phy']
}
orig_df = pd.DataFrame(Student_dict)
print(f'{orig_df}\n')
# axis=1 checks columns; how='all' drops a column only if every value is null
res_df = orig_df.dropna(axis=1, how='all')
print(res_df)
Output
Name Marks Subject
0 Max NaN Music
1 David NaN Physic
2 brix NaN Math
3 Mark NaN phy
Name Subject
0 Max Music
1 David Physic
2 brix Math
3 Mark phy
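The column version of how='all' can also be written as a mask over columns. This sketch uses the data from this example and passes the string alias axis='columns', which pandas accepts in place of axis=1.

```python
import numpy as np
import pandas as pd

Student_dict = {
    'Name': ['Max', 'David', 'brix', 'Mark'],
    'Marks': [np.nan, np.nan, np.nan, np.nan],
    'Subject': ['Music', 'Physic', 'Math', 'phy'],
}
df = pd.DataFrame(Student_dict)

# Keep a column if it has at least one non-missing value
keep = df.notna().any(axis=0)
manual = df.loc[:, keep]

# axis='columns' is an accepted alias for axis=1
builtin = df.dropna(axis='columns', how='all')
print(manual.equals(builtin))      # True
print(builtin.columns.tolist())    # ['Name', 'Subject']
```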
5. Drop columns with any null or NaN value
In this example, we use dropna(axis=1) to drop every column that contains at least one null value.
Program Example
import pandas as pd
import numpy as np
Student_dict = {
'Name': ['Max', 'David','brix','Mark'],
'Marks':[100,100,np.nan,np.nan],
'Subject': [np.nan, 'Physic','Math','phy']
}
orig_df = pd.DataFrame(Student_dict)
print(f'{orig_df}\n')
res_df = orig_df.dropna(axis=1)
print(res_df)
Output
Name Marks Subject
0 Max 100.0 NaN
1 David 100.0 Physic
2 brix NaN Math
3 Mark NaN phy
Name
0 Max
1 David
2 brix
3 Mark
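Dropping every column with a single null can be aggressive. When axis=1, the subset parameter lists row labels rather than column names, so only those rows are inspected. The sketch below uses the data from this example and checks only row 0.

```python
import numpy as np
import pandas as pd

Student_dict = {
    'Name': ['Max', 'David', 'brix', 'Mark'],
    'Marks': [100, 100, np.nan, np.nan],
    'Subject': [np.nan, 'Physic', 'Math', 'phy'],
}
df = pd.DataFrame(Student_dict)

# axis=1 with the default how='any' keeps only columns with no missing values
print(df.loc[:, df.notna().all()].columns.tolist())    # ['Name']

# When dropping columns, subset holds ROW labels: only row 0 is checked,
# so only 'Subject' (NaN in row 0) is dropped
print(df.dropna(axis=1, subset=[0]).columns.tolist())  # ['Name', 'Marks']
```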
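6. Specify labels to check for null values
In this example, we pass subset=['Marks'] so that only the Marks column is checked for null values. Row 2 is dropped because its Marks value is missing, while row 3 is kept even though its Subject value is null.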
Program Example
import pandas as pd
import numpy as np
Student_dict = {
'Name': ['Max','Rock','Lexi','David'],
'Marks':[100,100,np.nan,100],
'Subject': ['Music','Music','Science', np.nan]
}
original_df = pd.DataFrame(Student_dict)
print(f'{original_df}\n')
result_df = original_df.dropna(subset=['Marks'])
print('result df:\n',result_df)
Output
Name Marks Subject
0 Max 100.0 Music
1 Rock 100.0 Music
2 Lexi NaN Science
3 David 100.0 NaN
result df:
Name Marks Subject
0 Max 100.0 Music
1 Rock 100.0 Music
3 David 100.0 NaN
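The subset parameter also accepts several columns and combines with how. In this sketch on the same data, how='any' drops a row if either listed column is null, while how='all' drops it only if both are.

```python
import numpy as np
import pandas as pd

Student_dict = {
    'Name': ['Max', 'Rock', 'Lexi', 'David'],
    'Marks': [100, 100, np.nan, 100],
    'Subject': ['Music', 'Music', 'Science', np.nan],
}
df = pd.DataFrame(Student_dict)

# how='any' (default): drop a row if Marks OR Subject is missing
print(df.dropna(subset=['Marks', 'Subject']).index.tolist())             # [0, 1]

# how='all': drop a row only if Marks AND Subject are both missing
print(df.dropna(subset=['Marks', 'Subject'], how='all').index.tolist())  # [0, 1, 2, 3]
```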
Summary
We have learned multiple ways to drop rows and columns with NaN values from a pandas DataFrame using dropna(), with examples.