15 Pandas Interview Questions for Data Science


In this post, we explore 15 of the most frequently asked Pandas interview questions for data science.

1. How to fill nan values in pandas with zero


To fill missing values in a dataframe, these methods are available:

  • replace(): replaces matching values; the value to replace can be given as a string, regex, dictionary or list
  • fillna(): fills NA/NaN values using a specified value

Using fillna() method

import pandas as pd
import numpy as np
 
Student_dict = {
    'Name': ['Jack', 'Rack', np.nan],
    'Marks':[100.5,np.nan, np.nan],
    'Subject': [np.nan, 'Math', 'Music']
}
 


dfobj = pd.DataFrame(Student_dict)


dfobj['Marks'] = dfobj['Marks'].fillna(0)

print (dfobj)


#whole dataframe
df = dfobj.fillna(0) 

print (df)

Output (of the first print; only the Marks column is filled)

   Name  Marks Subject
0  Jack  100.5     NaN
1  Rack    0.0    Math
2   NaN    0.0   Music

Using replace() method

#single column
dfobj['Marks'] = dfobj['Marks'].replace(np.nan, 0) 

print (dfobj)


#whole dataframe
df = dfobj.replace(np.nan, 0) 

print (df)

Output (of the first print; only the Marks column is filled)

   Name  Marks Subject
0  Jack  100.5     NaN
1  Rack    0.0    Math
2   NaN    0.0   Music
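
Filling every column with 0 is not always what you want, because text columns end up holding a number. As a minimal sketch, fillna() also accepts a dictionary that maps column names to fill values. The 'Unknown' placeholder and the variable names below are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

students = pd.DataFrame({
    'Name': ['Jack', 'Rack', np.nan],
    'Marks': [100.5, np.nan, np.nan],
    'Subject': [np.nan, 'Math', 'Music'],
})

# One fill value per column: 0 for the numeric column,
# a text placeholder (illustrative) for the text columns
filled = students.fillna({'Name': 'Unknown', 'Marks': 0, 'Subject': 'Unknown'})

print(filled)
```

Columns that are not listed in the dictionary are left unchanged.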

2. How to replace nan value with custom value


This example uses the dataframe from the previous question.

dfobj = pd.DataFrame(Student_dict)



#assign the result back; fillna(inplace=True) on a selected column may not update dfobj
dfobj['Name'] = dfobj['Name'].fillna('Max')



print(dfobj.iloc[0:3,[0]])

Output

   Name
0  Jack
1  Rack
2   Max
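
The custom value does not have to be a constant. A common variant, sketched here under the assumption that the column is numeric, is to fill the gaps with the column mean. The sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

marks = pd.Series([100.5, np.nan, 80.5], name='Marks')

# mean() skips NaN by default, so the average is taken over 100.5 and 80.5
filled_marks = marks.fillna(marks.mean())

print(filled_marks.tolist())
```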

3. How to filter data in pandas


There are multiple ways to filter data in a pandas dataframe:

  • isin()
  • logical operators
  • str accessor
  • tilde (~)
  • query()
  • nlargest() or nsmallest()
  • iloc or loc

import pandas as pd

 
Student_dict = {
    'Name': ['Jack', 'Rack', 'Max'],
    'Marks':[70,80, 100],
    'Subj': ['Math', 'Math', 'Music']
}
 


dfobj = pd.DataFrame(Student_dict)

#logical operator

dflog = dfobj[(dfobj.Marks > 70) & (dfobj.Subj == 'Math')]
print ('\nlogical:\n',dflog)

#isin()
dfisin = dfobj[dfobj.Name.isin(['Max'])]
print ('\n isin():\n ',dfisin)

#str
dfStr =  dfobj[dfobj.Name.str.startswith('J')]
print ('\n str:\n ',dfStr)

#Tilde: not
dfTilde =  dfobj[~dfobj.Name.str.startswith('J')]
print ('\n Tilde:\n ',dfTilde)

#Query
dfQuery =   dfobj.query('Marks > 80 and Subj == "Music"')
print ('\n Query:\n ',dfQuery)


#nlargest
dflarget =   dfobj.nlargest(2,'Marks')
print ('\nnlarget:\n ',dflarget)

#nsmallest
dfnsmallest =   dfobj.nsmallest(1,'Marks')
print ('\nsmallest:\n ',dfnsmallest)

#iloc
dfiloc =  dfobj.iloc[1:3, :]
print ('\n iloc:\n  ',dfiloc)

#loc
dfloc =  dfobj.loc[1:1, :]
print ('\n loc:\n',dfloc)

Output

logical:
    Name  Marks  Subj
1  Rack     80  Math


 isin():
    Name  Marks   Subj
2  Max    100  Music



 str:
     Name  Marks  Subj
0  Jack     70  Math


Tilde:
     Name  Marks   Subj
1  Rack     80   Math
2   Max    100  Music


Query:
     Name  Marks   Subj
2   Max    100  Music

 
nlarget:
     Name  Marks   Subj
2   Max    100  Music
1  Rack     80   Math


smallest:
     Name  Marks  Subj
0  Jack     70  Math

 
iloc:
      Name  Marks   Subj
1  Rack     80   Math
2   Max    100  Music


 loc:
    Name  Marks  Subj
1  Rack     80  Math
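
In practice the threshold in a filter often comes from a variable rather than a literal. As a small sketch, query() can refer to a local Python variable with the @ prefix. The min_marks name and its value are assumptions made for this example.

```python
import pandas as pd

students = pd.DataFrame({
    'Name': ['Jack', 'Rack', 'Max'],
    'Marks': [70, 80, 100],
    'Subj': ['Math', 'Math', 'Music'],
})

min_marks = 80  # hypothetical threshold

# @min_marks refers to the local Python variable defined above
by_query = students.query('Marks >= @min_marks')

# The same filter written as a boolean mask
by_mask = students[students['Marks'] >= min_marks]

print(by_query)
```

Both forms select the same rows; query() tends to read better when several conditions are combined.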

4. How to filter null values in pandas dataframe


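To find rows and columns with missing values, we use the isnull() method, which returns a boolean mask that is True for NaN or missing values. Calling any() on the mask reduces it to one value per column or, with axis=1, one value per row.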

import pandas as pd
import numpy as np
 
Student_dict = {
    'Name': [np.nan, 'Rack', 'Max'],
    'Marks':[70,80, 100],
    'Subj': ['Math', 'Math', np.nan]
}
 

dfobj = pd.DataFrame(Student_dict)

#Columns with missing values
print('Column with missing values:\n',dfobj.isnull().any())


#Rows with missing values
print('\nRows with missing values:\n',dfobj.isnull().any(axis=1))

Output

Column with missing values:
 Name      True
Marks    False
Subj      True
dtype: bool

Rows with missing values:
 0     True
1    False
2     True
dtype: bool
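
The boolean result above only flags the rows. To actually pull out the rows that contain a missing value, the mask can be used as a filter. The sketch below also counts missing values per column with sum(); the variable names are illustrative.

```python
import numpy as np
import pandas as pd

students = pd.DataFrame({
    'Name': [np.nan, 'Rack', 'Max'],
    'Marks': [70, 80, 100],
    'Subj': ['Math', 'Math', np.nan],
})

# Keep only the rows where at least one column is missing
rows_with_missing = students[students.isnull().any(axis=1)]

# True counts as 1, so sum() gives the number of missing values per column
missing_per_column = students.isnull().sum()

print(rows_with_missing)
print(missing_per_column)
```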

5. How to filter not null values in pandas dataframe


notnull(): This method returns a boolean mask with the same shape as the dataframe, holding True for values that are present and False for missing values (NaN).

import pandas as pd
import numpy as np
 
Student_dict = {
    'Name': [np.nan, 'Rack', 'Max'],
    'Marks':[70,80, 100],
    'Subj': ['Math', 'Math', np.nan]
}
 

dfobj = pd.DataFrame(Student_dict)

#Boolean mask: True where a value is present
print('Column with not missing values:\n',dfobj.notnull())

Output

Column with not missing values:
     Name  Marks   Subj
0  False   True   True
1   True   True   True
2   True   True  False
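
To turn this mask into a filter that keeps only complete rows, one option is to require every column to be present with all(axis=1). This is a sketch with illustrative variable names; dropna() is shown only as a cross-check that gives the same rows.

```python
import numpy as np
import pandas as pd

students = pd.DataFrame({
    'Name': [np.nan, 'Rack', 'Max'],
    'Marks': [70, 80, 100],
    'Subj': ['Math', 'Math', np.nan],
})

# A row is kept only if every one of its values is present
complete_rows = students[students.notnull().all(axis=1)]

# dropna() with default arguments drops any row that has a missing value
same_rows = students.dropna()

print(complete_rows)
```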

6. How to filter not null values of dataframe column


The pandas method notnull(), combined with a column name, can be used to filter a column so that only the values that are not missing (NaN) remain.

import pandas as pd
import numpy as np
 
Student_dict = {
    'Name': [np.nan, 'Rack', 'Max'],
    'Marks':[70,80, 100],
    'Subj': ['Math', 'Math', np.nan]
}
 

dfobj = pd.DataFrame(Student_dict)

#Values of the Name column that are not missing
print(dfobj["Name"][pd.notnull(dfobj["Name"])])

Output

1    Rack
2     Max
Name: Name, dtype: object
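
The example above indexes in two steps (first the column, then the mask). As an alternative sketch, loc[] can take the row mask and the column label in a single step; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

students = pd.DataFrame({
    'Name': [np.nan, 'Rack', 'Max'],
    'Marks': [70, 80, 100],
    'Subj': ['Math', 'Math', np.nan],
})

# Row mask and column label passed to loc[] together
names = students.loc[students['Name'].notnull(), 'Name']

print(names.tolist())
```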

7. How to filter dataframe rows without NaN in a specified column using isna()


import pandas as pd
import numpy as np
 
Student_dict = {
    'Name': [np.nan, 'Rack', 'Max'],
    'Marks':[70,80, 100],
    'Subj': ['Math', 'Math', 'Music']
}
 

dfobj = pd.DataFrame(Student_dict)

#Rows without nan/missing  values
print(dfobj[~dfobj['Name'].isna()])

Output

   Name  Marks   Subj
1  Rack     80   Math
2   Max    100  Music
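
Negating isna() with ~ works, but two other spellings give the same rows and may read more directly. The sketch below compares notna() and dropna(subset=...); the variable names are illustrative.

```python
import numpy as np
import pandas as pd

students = pd.DataFrame({
    'Name': [np.nan, 'Rack', 'Max'],
    'Marks': [70, 80, 100],
    'Subj': ['Math', 'Math', 'Music'],
})

# notna() is the direct opposite of isna(), so no ~ is needed
by_notna = students[students['Name'].notna()]

# dropna(subset=...) only looks at the listed columns when dropping rows
by_dropna = students.dropna(subset=['Name'])

print(by_notna)
```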

8. How to filter dataframe by single or multiple indexes


This example uses the filter() method of the dataframe.

import pandas as pd

 
Student_dict = {
    'Name': ['Jack', 'Rack', 'Max'],
    'Marks':[70,80,100],
    'Subj': ['Math', 'Math', 'Music']
}
 

dfobj = pd.DataFrame(Student_dict)


#single row

df_one = dfobj.filter(items = [1], axis=0)
print(df_one)

#multiple rows

df_multi = dfobj.filter(items = [0,1], axis=0)

print('\nfilter based on multiple index:\n',df_multi)

Output

   Name  Marks  Subj
1  Rack     80  Math

filter based on multiple index:
    Name  Marks  Subj
0  Jack     70  Math
1  Rack     80  Math
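
One difference worth knowing when choosing between filter() and loc[] is how they treat labels that do not exist. The sketch below asks for label 5, which is deliberately missing from the index.

```python
import pandas as pd

students = pd.DataFrame({
    'Name': ['Jack', 'Rack', 'Max'],
    'Marks': [70, 80, 100],
    'Subj': ['Math', 'Math', 'Music'],
})

# filter() keeps the requested labels that exist; label 5 is skipped
kept = students.filter(items=[0, 5], axis=0)
print(kept)

# loc[] is stricter and raises KeyError for the missing label
try:
    students.loc[[0, 5]]
    loc_failed = False
except KeyError:
    loc_failed = True

print(loc_failed)
```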

9. How to filter dataframe based on non-numeric index


import pandas as pd

 
Student_dict = {
    'Name': ['Jack', 'Rack', 'Max'],
    'Marks':[70,80,100],
    'Subj': ['Math', 'Math', 'Music']
}
 

dfobj = pd.DataFrame(Student_dict,index =['Row_1','Row_2','Row_3'])


#single row

df_one = dfobj.filter(items = ['Row_2'], axis=0)
print(df_one)

Output

     Name  Marks  Subj
Row_2  Rack     80  Math
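
The same row can also be selected with loc[]. As a brief sketch, note that a single label returns a Series while a list with one label returns a one-row dataframe, like filter() does. The variable names are illustrative.

```python
import pandas as pd

students = pd.DataFrame(
    {
        'Name': ['Jack', 'Rack', 'Max'],
        'Marks': [70, 80, 100],
        'Subj': ['Math', 'Math', 'Music'],
    },
    index=['Row_1', 'Row_2', 'Row_3'],
)

# A single label gives a Series whose index is the column names
row_as_series = students.loc['Row_2']

# A list of labels gives a dataframe, even for one label
row_as_frame = students.loc[['Row_2']]

print(type(row_as_series).__name__, type(row_as_frame).__name__)
```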

10. How to Select Multiple Index Values Contain Specific String


Here we want to select all rows whose index label contains the specified string ‘Row’. In the dataframe below, all three index labels contain ‘Row’, so all three rows are selected.

import pandas as pd

 
Student_dict = {
    'Name': ['Jack', 'Rack', 'Max'],
    'Marks':[70,80,100],
    'Subj': ['Math', 'Math', 'Music']
}
 

dfobj = pd.DataFrame(Student_dict,index =['Row_1','Row_2','Row_3'])




df_multi = dfobj.filter(like = 'Row', axis=0)
print(df_multi)

Output

     Name  Marks   Subj
Row_1  Jack     70   Math
Row_2  Rack     80   Math
Row_3   Max    100  Music
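
The like argument does a plain, case-sensitive substring match. When the pattern is more specific, filter() also accepts a regex argument. The sketch below uses an illustrative pattern that matches index labels ending in _1 or _3.

```python
import pandas as pd

students = pd.DataFrame(
    {
        'Name': ['Jack', 'Rack', 'Max'],
        'Marks': [70, 80, 100],
        'Subj': ['Math', 'Math', 'Music'],
    },
    index=['Row_1', 'Row_2', 'Row_3'],
)

# regex is searched in each index label; this keeps labels ending in _1 or _3
odd_rows = students.filter(regex=r'_[13]$', axis=0)

# like is case-sensitive, so lowercase 'row' matches nothing here
no_rows = students.filter(like='row', axis=0)

print(odd_rows)
print(len(no_rows))
```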

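11. How to filter dataframe with multiple columns in Pandas


A condition over multiple columns can be written in several ways:

  • query(): takes the condition as a string that refers to column names
  • eval(): evaluates the same kind of string on the columns and returns a boolean mask
  • logical operators: combine the per-column comparisons with &
  • loc[]: selects rows by index label or by a boolean condition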

import pandas as pd

 
Student_dict = {
    'Name': ['Jack', 'Rack', 'Max'],
    'Marks':[100,80,100],
    'Subj': ['Math', 'Math', 'Music']
}
 

dfobj = pd.DataFrame(Student_dict,index =['Row_1','Row_2','Row_3'])

#using query 
df_multi = dfobj.query('Marks > 80 and Subj == "Math" and Name=="Jack"')

#using eval()
df_eval = dfobj[dfobj.eval('Marks > 80 and Subj == "Math" and Name=="Jack"')]


#using logical operator
df_log = dfobj[(dfobj["Name"]=="Jack") & (dfobj["Subj"]=="Math")]
#using loc 
df_loc = dfobj.loc[(dfobj["Name"].str.startswith("J") & (dfobj["Subj"]=="Math")) & (dfobj['Marks']==100)]


print('\n',df_multi)
                       
print('\n',df_eval)

print('\n',df_log)

print('\n',df_loc)

Output

        Name  Marks  Subj
Row_1  Jack    100  Math

        Name  Marks  Subj
Row_1  Jack    100  Math

        Name  Marks  Subj
Row_1  Jack    100  Math


     Name  Marks  Subj
Row_1  Jack    100  Math
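
All four variants above return every column of the matching rows. If only some columns are needed, loc[] can take the row condition and a list of column names together. This is a sketch; the choice of columns is illustrative.

```python
import pandas as pd

students = pd.DataFrame(
    {
        'Name': ['Jack', 'Rack', 'Max'],
        'Marks': [100, 80, 100],
        'Subj': ['Math', 'Math', 'Music'],
    },
    index=['Row_1', 'Row_2', 'Row_3'],
)

# Row condition built from two columns
mask = (students['Marks'] == 100) & (students['Subj'] == 'Math')

# First argument filters rows, second argument picks the columns to keep
result = students.loc[mask, ['Name', 'Marks']]

print(result)
```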

12. How to filter Pandas DataFrame by column values


To select rows by column value, we can use loc[] with logical operators.

isin() selects the rows whose column value matches any value in a given list.

import pandas as pd

 
Student_dict = {
    'Name': ['Jack', 'Rack', 'Max'],
    'Marks':[100,80,100],
    'Subj': ['Math', 'Math', 'Music']
}
 

dfobj = pd.DataFrame(Student_dict,index =['Row_1','Row_2','Row_3'])


options = ['Math', 'Music'] 
    


#using loc 
df_loc = dfobj.loc[(dfobj["Name"].str.startswith("J") & (dfobj["Subj"]=="Math")) & (dfobj['Marks']==100)]

# using isin() 
df_isin = dfobj[dfobj['Subj'].isin(options)] 
    
print('\nisin:\n',
      df_isin)



print('\nloc:\n',df_loc)


Output

isin:
        Name  Marks   Subj
Row_1  Jack    100   Math
Row_2  Rack     80   Math
Row_3   Max    100  Music

loc:
        Name  Marks  Subj
Row_1  Jack    100  Math
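
Two related filters come up often alongside isin(): a numeric range and the negated match. The sketch below uses between() for the range and ~ to invert isin(); the bounds 80 and 90 are assumptions chosen for illustration.

```python
import pandas as pd

students = pd.DataFrame(
    {
        'Name': ['Jack', 'Rack', 'Max'],
        'Marks': [100, 80, 100],
        'Subj': ['Math', 'Math', 'Music'],
    },
    index=['Row_1', 'Row_2', 'Row_3'],
)

# between() includes both bounds by default
in_range = students[students['Marks'].between(80, 90)]

# ~ inverts the isin() mask: rows whose subject is NOT in the list
not_math = students[~students['Subj'].isin(['Math'])]

print(in_range)
print(not_math)
```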

13. How to filter row based on index in Pandas dataframe


To filter rows by integer position or by index label, we can use the iloc[] and loc[] indexers of the dataframe.

iloc[] : filters rows and columns by number (integer position). Rows are numbered from 0 to the total number of rows minus 1, and columns are numbered from 0 to the total number of columns minus 1.

iloc[row_index, column_index]

loc[] : selects rows by index label or by a boolean condition

This example uses the Student_dict from the above program.

dfobj = pd.DataFrame(Student_dict,index =['Row_1','Row_2','Row_3'])

    
df_iloc = dfobj.iloc[0:2]
df_row_colum = dfobj.iloc[0:2,[1]]

#using loc based on index labels 'Row_1','Row_2'
df_loc1 = dfobj.loc[['Row_1','Row_2']]

print('\nloc:\n',df_loc1)
print('\n select 1st and 2nd rows:\n',df_iloc)
print('\n select 1st and 2nd rows and second column:\n',df_row_colum)

Output

loc:
        Name  Marks  Subj
Row_1  Jack    100  Math
Row_2  Rack     80  Math

 select 1st and 2nd rows:
        Name  Marks  Subj
Row_1  Jack    100  Math
Row_2  Rack     80  Math

 select 1st and 2nd rows and second column:
        Marks
Row_1    100
Row_2     80
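
As a further sketch of positional selection, iloc[] accepts slices for both rows and columns, and negative positions count from the end. The variable names are illustrative.

```python
import pandas as pd

students = pd.DataFrame(
    {
        'Name': ['Jack', 'Rack', 'Max'],
        'Marks': [100, 80, 100],
        'Subj': ['Math', 'Math', 'Music'],
    },
    index=['Row_1', 'Row_2', 'Row_3'],
)

# Rows at positions 0 and 1, columns at positions 0 and 1
top_left = students.iloc[0:2, 0:2]

# -1 is the last row, returned as a Series
last_row = students.iloc[-1]

print(top_left)
print(last_row['Name'])
```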

14. How to Slice a dataframe by using loc[]


import pandas as pd

 
Student_dict = {
    'Name': ['Jack', 'Rack', 'Max'],
    'Marks':[100,80,100],
    'Subj': ['Math', 'Math', 'Music']
}
 

dfobj = pd.DataFrame(Student_dict)


#using loc: rows labelled 0 and 1 of the 'Name' column
df_loc1 = dfobj.loc[0:1,'Name']


print('\nloc:\n',df_loc1)

Output

loc:
 0    Jack
1    Rack
Name: Name, dtype: object
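
A detail that often comes up in interviews: a loc[] slice includes its end label, while an iloc[] slice excludes its end position. The sketch below applies the same 0:1 slice with both indexers on a dataframe with the default integer index.

```python
import pandas as pd

students = pd.DataFrame({
    'Name': ['Jack', 'Rack', 'Max'],
    'Marks': [100, 80, 100],
    'Subj': ['Math', 'Math', 'Music'],
})

# Label slice: both endpoints are included, so labels 0 and 1
by_label = students.loc[0:1, 'Name']

# Position slice: the stop is excluded, so only position 0
by_position = students.iloc[0:1, 0]

print(by_label.tolist())
print(by_position.tolist())
```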

15. How to filter rows from a list of indexes in dataframe


dfobj = pd.DataFrame(Student_dict,index =['Row_1','Row_2','Row_3'])

lst_of_indexes = [0,2]
rows = dfobj.iloc[lst_of_indexes, :]
print(rows)

Output

   Name  Marks   Subj
Row_1  Jack    100   Math
Row_3   Max    100  Music
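
The example above uses a list of positions. When the list holds index labels instead, and some of them might not exist, one option is to build a mask with index.isin(). In this sketch 'Row_9' is a deliberately missing, hypothetical label.

```python
import pandas as pd

students = pd.DataFrame(
    {
        'Name': ['Jack', 'Rack', 'Max'],
        'Marks': [100, 80, 100],
        'Subj': ['Math', 'Math', 'Music'],
    },
    index=['Row_1', 'Row_2', 'Row_3'],
)

wanted = ['Row_1', 'Row_3', 'Row_9']  # 'Row_9' does not exist (hypothetical)

# index.isin() returns a boolean array, so unknown labels are simply not matched
rows = students[students.index.isin(wanted)]

print(rows)
```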