In this post, we are going to learn Pandas dataframe filter by multiple conditions that include filter dataframe by column values, filter dataframe by rows and columns position, Filter dataframe based multiple column values:isin() ,using Tilde(~) operator, using str() function.
Filter Pandas dataframe by column values
In this Python program example, we will learn three ways to filter pandas dataframe by column values.
Using query() function
import pandas as pd
Student_dict = {
'Name': ['Jack', 'Rack', 'Max'],
'Marks':[100,80, 100],
'Subj': ['Math', 'Music', 'Math']
}
dfobj = pd.DataFrame(Student_dict)
dfQuery = dfobj.query('Marks > 80 and Subj == "Math"')
print ('\n Query:\n ',dfQuery)
Output
Query:
Name Marks Subj
0 Jack 100 Math
2 Max 100 Math
Using loc[] function
dfobj = pd.DataFrame(Student_dict)
dfloc = dfobj.loc[(dfobj.Marks > 80 )& (dfobj.Subj == "Math")]
print ('fliter df by Using loc :\n ',dfloc)
Output
Name Marks Subj
0 Jack 100 Math
2 Max 100 Math
Using logical operator
dfobj = pd.DataFrame(Student_dict)
dfloc = dfobj[(dfobj.Marks > 80 )& (dfobj.Subj == "Math")]
print ('fliter df by Using logical operator :\n ',dfloc)
Output
fliter df by Using logical operator :
Name Marks Subj
0 Jack 100 Math
2 Max 100 Math
2. Filter dataframe by rows and column position
To filter rows based on integer index/select by the position we can use iloc[] and loc[] method of dataframe.
iloc[] : filters rows and columns by number( integer index/by position).Every row in the dataframe has a row number starting from 0 to the total number of rows and the column has a column number starting from 0 to the total number of columns.The iloc[] always takes integer index e.g : iloc[row_index:column_index]
loc[] : select rows on label/index or select row by condition
Select based on row position iloc[]
dfobj = pd.DataFrame(Student_dict)
#selecting based on position :iloc[]
df_iloc = dfobj.iloc[0:2]
print('\n select 1st, 2nd rows and all colums:\n',df_iloc)
df_row_colum = dfobj.iloc[0:2,[1]]
print('\n select 1st 2nd rows and first column:\n',df_row_colum)
Output
select 1st, 2nd rows and all colums:
Name Marks Subj
Row_1 Jack 100 Math
Row_2 Rack 80 Music
select 1st 2nd rows and first column:
Marks
Row_1 100
Row_2 80
Using loc[] select based on index label
#setting the index label for rows
dfobj = pd.DataFrame(Student_dict,index =['Row_1','Row_2','Row_3'])
#selecting based on index label ''Row_1','Row_2' : loc[]
df_loc1 = dfobj.loc[['Row_1','Row_2']]
print('select row_1 and row_2 all columns:\n',df_loc1)
Output
select row_1 and row_2 all columns:
Name Marks Subj
Row_1 Jack 100 Math
Row_2 Rack 80 Music
3. Filter dataframe based mutiple column values :isin()
The isin() method is used to filter the dataframe based on multiple values of the column.
dfobj = dfobj.loc[(dfobj.Subj == "Math") | (dfobj.Subj == "Music")]
#clean and short code of same as above line
dfobj = dfobj[dfobj.Subj.isin(["Math", "Music"])]
print(dfobj)
Output
Name Marks Subj
0 Jack 100 Math
1 Rack 80 Music
2 Max 100 Math
4. Filtter dataframe using Tilde(~) operator
The titde(~) operator adds the not equal logical operation in python if added in front of expression. It will opposite to the result of the condition.; like a not operator in other programming languages
dfresult = dfobj[~((dfobj.Marks > 80 )& (dfobj.Subj == "Math"))]
print(dfresult)
Output
Name Marks Subj
1 Rack 80 Music
5.Filter dataframe using str() function
The .str accessor allows us to filter dataframe based on string data by enabling the string functions means we can use the python string function while filtering data of the dataframe.
dfresult = dfobj[dfobj.Name.str.startswith('J')]
dfres_1 = dfobj[dfobj.Name.str.contains('J|M')]
print(dfresult)
print('\n',dfres_1)
Output
Name Marks Subj
0 Jack 100 Math
Name Marks Subj
0 Jack 100 Math
2 Max 100 Math
6.Filter dataframe based not null/Missing value
Filter entire dataframe, not null values
import pandas as pd
import numpy as np
Student_dict = {
'Name': [np.nan, 'Rack', 'Max'],
'Marks':[70,80, 100],
'Subj': ['Math', 'Math', np.nan]
}
dfobj = pd.DataFrame(Student_dict)
print('dataframe no missing value:\n',dfobj.notnull())
filter not null values of dataframe column
print(dfobj["Name"][pd.notnull(dfobj["Name"])])
Output
1 Rack
2 Max
Name: Name, dtype: object
filter not null values of dataframe row
print(dfobj[~dfobj['Name'].isna()])
Output
Name Marks Subj
1 Rack 80 Math
2 Max 100 NaN
7.Filter dataframe by index mutiple using filter()
The dataframe filter method filters the dataframe based on index single or multiple or non-numeric indexes.
#mutiple rows
print(dfobj.filter(items = [0,1], axis=0))
Output
Name Marks Subj
0 Jack 100 Math
1 Rack 80 Music
8. Filter dataframe by row postion and column name
print(dfobj.loc[dfobj.index[0:3],["Name","Subj"]])
Output
Name Subj
0 Jack Math
1 Rack Music
2 Max Math
9.Filter dataframe based on mutiple columns using eval()
In this example selecting multiple colmn using pandas eval().
#using eval()
print(dfobj[dfobj.eval('Marks > 80 and Subj == "Math" and Name=="Jack"')]
)
Output
Name Marks Subj
0 Jack 100 Math