Pandas dataframe filter by multiple conditions

In this post, we are going to learn Pandas dataframe filter by mutiple conditions that include filter dataframe by column values, filter dataframe by rows and columns position, Filter dataframe based mutiple column values:isin() ,using Tilde(~) operator, using str() function

Filter Pandas dataframe by column values


In this python program example, we will learn three ways to filter pandas dataframe by column values.

Using query() function

import pandas as pd

Student_dict = {
    'Name': ['Jack', 'Rack', 'Max'],
    'Marks':[100,80, 100],
    'Subj': ['Math', 'Music', 'Math']
}

dfobj = pd.DataFrame(Student_dict) 

dfQuery =   dfobj.query('Marks > 80 and Subj == "Math"')
print ('\n Query:\n ',dfQuery)

Output

 Query:
     Name  Marks  Subj
0  Jack    100  Math
2   Max    100  Math

Using loc[] function

dfobj = pd.DataFrame(Student_dict) 

dfloc =   dfobj.loc[(dfobj.Marks > 80 )& (dfobj.Subj == "Math")]
print ('fliter df by Using loc :\n ',dfloc)

Output

    Name  Marks  Subj
0  Jack    100  Math
2   Max    100  Math

Using logical operator

dfobj = pd.DataFrame(Student_dict) 

dfloc =   dfobj[(dfobj.Marks > 80 )& (dfobj.Subj == "Math")]
print ('fliter df by Using logical operator :\n ',dfloc)

Output

fliter df by Using logical operator :
     Name  Marks  Subj
0  Jack    100  Math
2   Max    100  Math

2. Filter dataframe by rows and column position


To filter rows based on integer index/select by the position we can use iloc[] and loc[] method of dataframe.

iloc[] : filters rows and columns by number( integer index/by position).Every row in the dataframe has a row number starting from 0 to the total number of rows and the column has a column number starting from 0 to the total number of columns.The iloc[] always takes integer index e.g : iloc[row_index:column_index]

loc[] : select rows on label/index or select row by condition

Select based on row position iloc[]

dfobj = pd.DataFrame(Student_dict)


#selecting based on position :iloc[]
     
df_iloc = dfobj.iloc[0:2]

print('\n select 1st, 2nd rows and all colums:\n',df_iloc)

df_row_colum = dfobj.iloc[0:2,[1]]

print('\n select 1st 2nd rows and first column:\n',df_row_colum)


Output

 select 1st, 2nd rows and all colums:
        Name  Marks   Subj
Row_1  Jack    100   Math
Row_2  Rack     80  Music

 select 1st 2nd rows and first column:
        Marks
Row_1    100
Row_2     80

Using loc[] select based on index label

#setting the index label for rows
dfobj = pd.DataFrame(Student_dict,index =['Row_1','Row_2','Row_3'])

 
#selecting based on index label ''Row_1','Row_2' : loc[]
df_loc1 = dfobj.loc[['Row_1','Row_2']]
 
print('select row_1 and row_2 all columns:\n',df_loc1)

Output

select row_1 and row_2 all columns:
        Name  Marks   Subj
Row_1  Jack    100   Math
Row_2  Rack     80  Music

3. Filter dataframe based mutiple column values :isin()


The isin() method is used to filter the dataframe based on multiple values of the column.

 

dfobj = dfobj.loc[(dfobj.Subj == "Math") | (dfobj.Subj == "Music")]


#clean and short code of same as above line
dfobj = dfobj[dfobj.Subj.isin(["Math", "Music"])]

print(dfobj)

Output

   Name  Marks   Subj
0  Jack    100   Math
1  Rack     80  Music
2   Max    100   Math

4. Filtter dataframe using Tilde(~) operator


The titde(~) operator adds the not equal logical operation in python if added in front of expression. It will opposite to the result of the condition.; like a not operator in other programming languages

dfresult =   dfobj[~((dfobj.Marks > 80 )& (dfobj.Subj == "Math"))]

print(dfresult)

Output

Name  Marks   Subj
1  Rack     80  Music

5.Filter dataframe using str() function


The .str accessor allows us to filter dataframe based on string data by enabling the string functions means we can use the python string function while filtering data of the dataframe.

dfresult =  dfobj[dfobj.Name.str.startswith('J')]

dfres_1 =  dfobj[dfobj.Name.str.contains('J|M')]

print(dfresult)


print('\n',dfres_1)

Output

  Name  Marks  Subj
0  Jack    100  Math

    Name  Marks  Subj
0  Jack    100  Math
2   Max    100  Math

6.Filter dataframe based not null/Missing value


Filter entire dataframe, not null values

import pandas as pd
import numpy as np
  
Student_dict = {
    'Name': [np.nan, 'Rack', 'Max'],
    'Marks':[70,80, 100],
    'Subj': ['Math', 'Math', np.nan]
}
  
 
dfobj = pd.DataFrame(Student_dict)


print('dataframe no missing value:\n',dfobj.notnull())

filter not null values of dataframe column

print(dfobj["Name"][pd.notnull(dfobj["Name"])])

Output

1    Rack
2     Max
Name: Name, dtype: object

filter not null values of dataframe row

print(dfobj[~dfobj['Name'].isna()])

Output

   Name  Marks  Subj
1  Rack     80  Math
2   Max    100   NaN

7.Filter dataframe by index mutiple using filter()


The dataframe filter method filters the dataframe based on index single or multiple or non-numeric indexes.

#mutiple rows 
print(dfobj.filter(items = [0,1], axis=0))

Output

 Name  Marks   Subj
0  Jack    100   Math
1  Rack     80  Music

8. Filter dataframe by row postion and column name

print(dfobj.loc[dfobj.index[0:3],["Name","Subj"]])

Output

  Name   Subj
0  Jack   Math
1  Rack  Music
2   Max   Math

9.Filter dataframe based on mutiple columns using eval()


In this example selecting multiple colmn using pandas eval().

#using eval()
 
print(dfobj[dfobj.eval('Marks > 80 and Subj == "Math" and Name=="Jack"')]
)

Output

 Name  Marks  Subj
0  Jack    100  Math