In this post, we explore 15 of the most frequently asked pandas interview questions for data science in Python.
1. How to fill nan values in pandas with zero
To fill missing values in a dataframe, these methods are available:
- replace(): It can replace values given as a string, regex, dictionary, or list
- fillna(): It is used to fill NA/NaN values with a specified value
Using fillna() method
import pandas as pd
import numpy as np
Student_dict = {
'Name': ['Jack', 'Rack', np.nan],
'Marks':[100.5,np.nan, np.nan],
'Subject': [np.nan, 'Math', 'Music']
}
dfobj = pd.DataFrame(Student_dict)
dfobj['Marks'] = dfobj['Marks'].fillna(0)
print (dfobj)
#whole dataframe
df = dfobj.fillna(0)
print (df)
Output
Name Marks Subject
0 Jack 100.5 NaN
1 Rack 0.0 Math
2 NaN 0.0 Music
Using replace() method
#single column
dfobj['Marks'] = dfobj['Marks'].replace(np.nan, 0)
print (dfobj)
#whole dataframe
df = dfobj.replace(np.nan, 0)
print (df)
Output
Name Marks Subject
0 Jack 100.5 NaN
1 Rack 0.0 Math
2 NaN 0.0 Music
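When each column needs a different fill value, fillna() also accepts a dictionary that maps column names to values. The sketch below reuses the same data; the fill values 'Unknown' and 'None' are illustrative assumptions, not part of the original example.

```python
import numpy as np
import pandas as pd

# Same data as Student_dict above
student_dict = {
    'Name': ['Jack', 'Rack', np.nan],
    'Marks': [100.5, np.nan, np.nan],
    'Subject': [np.nan, 'Math', 'Music'],
}
dfobj = pd.DataFrame(student_dict)

# A dict maps each column name to its own fill value (values are illustrative)
df_filled = dfobj.fillna({'Name': 'Unknown', 'Marks': 0, 'Subject': 'None'})
print(df_filled)
```

fillna() returns a new dataframe here, so dfobj itself still contains its NaN values.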
2. How to replace nan value with custom value
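This example reuses the Student_dict from the previous question. Assigning the result back to the column avoids the chained inplace=True call, which recent pandas versions warn about.
dfobj = pd.DataFrame(Student_dict)
dfobj['Name'] = dfobj['Name'].fillna('Max')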
print(dfobj.iloc[0:3,[0]])
Output
Name
0 Jack
1 Rack
2 Max
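The custom value can also be computed from the data, for example the column mean. This is a minimal sketch with made-up marks (not the data used above), chosen so the mean is easy to check by hand.

```python
import numpy as np
import pandas as pd

# Illustrative data, not taken from the examples above
dfobj = pd.DataFrame({
    'Name': ['Jack', 'Rack', 'Max'],
    'Marks': [90.0, 70.0, np.nan],
})

# mean() skips NaN by default, so the fill value is (90 + 70) / 2
mean_marks = dfobj['Marks'].mean()
dfobj['Marks'] = dfobj['Marks'].fillna(mean_marks)
print(dfobj)
```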
3. How to filter data in pandas
There are multiple ways to filter data in a pandas dataframe:
- isin()
- logical operators
- str accessor
- Tilde (~)
- query()
- nlargest() or nsmallest()
- iloc or loc
import pandas as pd
Student_dict = {
'Name': ['Jack', 'Rack', 'Max'],
'Marks':[70,80, 100],
'Subj': ['Math', 'Math', 'Music']
}
dfobj = pd.DataFrame(Student_dict)
#logical operator
dflog = dfobj[(dfobj.Marks > 70) & (dfobj.Subj == 'Math')]
print ('\nlogical:\n',dflog)
#isin()
dfisin = dfobj[dfobj.Name.isin(['Max'])]
print ('\n isin():\n ',dfisin)
#str
dfStr = dfobj[dfobj.Name.str.startswith('J')]
print ('\n str:\n ',dfStr)
#Tilde: not
dfTilde = dfobj[~dfobj.Name.str.startswith('J')]
print ('\n Tilde:\n ',dfTilde)
#Query
dfQuery = dfobj.query('Marks > 80 and Subj == "Music"')
print ('\n Query:\n ',dfQuery)
#nlargest
dflargest = dfobj.nlargest(2,'Marks')
print ('\nnlargest:\n ',dflargest)
#nsmallest
dfnsmallest = dfobj.nsmallest(1,'Marks')
print ('\nnsmallest:\n ',dfnsmallest)
#iloc
dfiloc = dfobj.iloc[1:3, :]
print ('\n iloc:\n ',dfiloc)
#loc
dfloc = dfobj.loc[1:1, :]
print ('\n loc:\n',dfloc)
Output
logical:
Name Marks Subj
1 Rack 80 Math
isin():
Name Marks Subj
2 Max 100 Music
str:
Name Marks Subj
0 Jack 70 Math
Tilde:
Name Marks Subj
1 Rack 80 Math
2 Max 100 Music
Query:
Name Marks Subj
2 Max 100 Music
nlargest:
Name Marks Subj
2 Max 100 Music
1 Rack 80 Math
nsmallest:
Name Marks Subj
0 Jack 70 Math
iloc:
Name Marks Subj
1 Rack 80 Math
2 Max 100 Music
loc:
Name Marks Subj
1 Rack 80 Math
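Two more filters are often asked about alongside the ones listed for this question: a range check with between() and an OR condition with |. The sketch below uses the same three students; the thresholds are arbitrary.

```python
import pandas as pd

student_dict = {
    'Name': ['Jack', 'Rack', 'Max'],
    'Marks': [70, 80, 100],
    'Subj': ['Math', 'Math', 'Music'],
}
dfobj = pd.DataFrame(student_dict)

# between() is inclusive on both ends by default
df_between = dfobj[dfobj['Marks'].between(70, 80)]

# | means OR; each condition needs its own parentheses because
# | and & bind more tightly than comparison operators
df_or = dfobj[(dfobj['Marks'] > 90) | (dfobj['Name'] == 'Jack')]

print(df_between)
print(df_or)
```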
4. How to filter null values in pandas dataframe
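To find rows and columns with missing values, we use the isnull() method, which returns a boolean mask that is True wherever a value is NaN or missing. Chaining any() reduces that mask to one boolean per column, or one per row with axis=1.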
import pandas as pd
import numpy as np
Student_dict = {
'Name': [np.nan, 'Rack', 'Max'],
'Marks':[70,80, 100],
'Subj': ['Math', 'Math', np.nan]
}
dfobj = pd.DataFrame(Student_dict)
#Columns with missing values
print('Column with missing values:\n',dfobj.isnull().any())
#Rows with missing values
print('\nRows with missing values:\n',dfobj.isnull().any(axis=1))
Output
Column with missing values:
Name True
Marks False
Subj True
dtype: bool
Rows with missing values:
0 True
1 False
2 True
dtype: bool
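The row mask from isnull().any(axis=1) can be used directly to pull out the affected rows, and isnull().sum() counts the missing values per column. A short sketch on the same data:

```python
import numpy as np
import pandas as pd

student_dict = {
    'Name': [np.nan, 'Rack', 'Max'],
    'Marks': [70, 80, 100],
    'Subj': ['Math', 'Math', np.nan],
}
dfobj = pd.DataFrame(student_dict)

# Keep only the rows that have at least one missing value
rows_with_nan = dfobj[dfobj.isnull().any(axis=1)]

# Count missing values per column (True counts as 1)
nan_counts = dfobj.isnull().sum()

print(rows_with_nan)
print(nan_counts)
```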
5. How to filter not null values in pandas dataframe
notnull(): This method returns a boolean mask with the same shape as the dataframe, with False for missing values (NaN) and True for everything else.
import pandas as pd
import numpy as np
Student_dict = {
'Name': [np.nan, 'Rack', 'Max'],
'Marks':[70,80, 100],
'Subj': ['Math', 'Math', np.nan]
}
dfobj = pd.DataFrame(Student_dict)
#Mask of non-missing values
print('Mask of non-missing values:\n',dfobj.notnull())
Output
Mask of non-missing values:
Name Marks Subj
0 False True True
1 True True True
2 True True False
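To turn that mask into an actual filter, one option is to keep the rows where every column is non-null. The sketch below does that with all(axis=1) and compares the result to dropna(), which gives the same rows on this data.

```python
import numpy as np
import pandas as pd

student_dict = {
    'Name': [np.nan, 'Rack', 'Max'],
    'Marks': [70, 80, 100],
    'Subj': ['Math', 'Math', np.nan],
}
dfobj = pd.DataFrame(student_dict)

# Keep rows where every column holds a value
complete_rows = dfobj[dfobj.notnull().all(axis=1)]

# dropna() with default arguments drops rows with any missing value
same_rows = dfobj.dropna()

print(complete_rows)
```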
6. How to filter not null values of dataframe column
The pandas notnull() function, combined with a column name, can be used to filter a column down to the values that are not missing or NaN.
import pandas as pd
import numpy as np
Student_dict = {
'Name': [np.nan, 'Rack', 'Max'],
'Marks':[70,80, 100],
'Subj': ['Math', 'Math', np.nan]
}
dfobj = pd.DataFrame(Student_dict)
#Values of the 'Name' column that are not missing
print(dfobj["Name"][pd.notnull(dfobj["Name"])])
Output
1 Rack
2 Max
Name: Name, dtype: object
7. How to filter dataframe rows without NaN in a specified column using isna()
import pandas as pd
import numpy as np
Student_dict = {
'Name': [np.nan, 'Rack', 'Max'],
'Marks':[70,80, 100],
'Subj': ['Math', 'Math', 'Music']
}
dfobj = pd.DataFrame(Student_dict)
#Rows where 'Name' is not nan/missing
print(dfobj[~dfobj['Name'].isna()])
Output
Name Marks Subj
1 Rack 80 Math
2 Max 100 Music
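The same result can be reached with dropna() and its subset parameter, which limits the NaN check to the named columns. A sketch comparing both approaches on the same data:

```python
import numpy as np
import pandas as pd

student_dict = {
    'Name': [np.nan, 'Rack', 'Max'],
    'Marks': [70, 80, 100],
    'Subj': ['Math', 'Math', 'Music'],
}
dfobj = pd.DataFrame(student_dict)

# Approach from the example above: negate the isna() mask
df_isna = dfobj[~dfobj['Name'].isna()]

# Alternative: drop rows only when 'Name' is missing
df_dropna = dfobj.dropna(subset=['Name'])

print(df_dropna)
```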
8. How to filter dataframe by single or multiple indexes
This example uses the filter() method of the dataframe. With axis=0, the items list is matched against the row index.
import pandas as pd
Student_dict = {
'Name': ['Jack', 'Rack', 'Max'],
'Marks':[70,80,100],
'Subj': ['Math', 'Math', 'Music']
}
dfobj = pd.DataFrame(Student_dict)
#single row
df_one = dfobj.filter(items = [1], axis=0)
print(df_one)
#multiple rows
df_multi = dfobj.filter(items = [0,1], axis=0)
print('\nfilter based on multiple index:\n',df_multi)
Output
Name Marks Subj
1 Rack 80 Math
filter based on multiple index:
Name Marks Subj
0 Jack 70 Math
1 Rack 80 Math
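One difference worth knowing in an interview: filter() skips index labels that do not exist, while loc[] raises a KeyError for them. The sketch below requests label 5, which is not in the dataframe, to show both behaviours.

```python
import pandas as pd

student_dict = {
    'Name': ['Jack', 'Rack', 'Max'],
    'Marks': [70, 80, 100],
    'Subj': ['Math', 'Math', 'Music'],
}
dfobj = pd.DataFrame(student_dict)

# filter() keeps the labels it finds and silently skips the rest
df_filter = dfobj.filter(items=[1, 5], axis=0)
print(df_filter)

# loc[] raises KeyError when any label in the list is missing
try:
    dfobj.loc[[1, 5]]
    loc_raised = False
except KeyError:
    loc_raised = True
print('loc raised KeyError:', loc_raised)
```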
9. How to filter dataframe based on non-numeric index
import pandas as pd
Student_dict = {
'Name': ['Jack', 'Rack', 'Max'],
'Marks':[70,80,100],
'Subj': ['Math', 'Math', 'Music']
}
dfobj = pd.DataFrame(Student_dict,index =['Row_1','Row_2','Row_3'])
#single row
df_one = dfobj.filter(items = ['Row_2'], axis=0)
print(df_one)
Output
Name Marks Subj
Row_2 Rack 80 Math
10. How to select multiple index values that contain a specific string
Here we want to select every row whose index label contains the string 'Row'. The match is case-sensitive. In the dataframe below, all three index labels contain 'Row', so all three rows are selected.
import pandas as pd
Student_dict = {
'Name': ['Jack', 'Rack', 'Max'],
'Marks':[70,80,100],
'Subj': ['Math', 'Math', 'Music']
}
dfobj = pd.DataFrame(Student_dict,index =['Row_1','Row_2','Row_3'])
df_multi = dfobj.filter(like = 'Row', axis=0)
print(df_multi)
Output
Name Marks Subj
Row_1 Jack 70 Math
Row_2 Rack 80 Math
Row_3 Max 100 Music
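For more selective matching, filter() also takes a regex parameter. The sketch below keeps only labels ending in _1 or _2 and also shows that like is case-sensitive; the pattern is an illustrative choice.

```python
import pandas as pd

student_dict = {
    'Name': ['Jack', 'Rack', 'Max'],
    'Marks': [70, 80, 100],
    'Subj': ['Math', 'Math', 'Music'],
}
dfobj = pd.DataFrame(student_dict, index=['Row_1', 'Row_2', 'Row_3'])

# regex is searched within each index label
df_regex = dfobj.filter(regex='_[12]$', axis=0)
print(df_regex)

# like is a case-sensitive substring match, so lowercase 'row' finds nothing
df_lower = dfobj.filter(like='row', axis=0)
print(len(df_lower))
```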
11. How to filter dataframe with multiple columns in Pandas
- loc[]: selects rows with a boolean condition built from multiple columns
- query(): takes the condition as a string that refers to column names
- eval(): evaluates the same kind of string into a boolean mask
- Logical operators: combine column conditions with & inside []
import pandas as pd
Student_dict = {
'Name': ['Jack', 'Rack', 'Max'],
'Marks':[100,80,100],
'Subj': ['Math', 'Math', 'Music']
}
dfobj = pd.DataFrame(Student_dict,index =['Row_1','Row_2','Row_3'])
#using query
df_multi = dfobj.query('Marks > 80 and Subj == "Math" and Name=="Jack"')
#using eval()
df_eval = dfobj[dfobj.eval('Marks > 80 and Subj == "Math" and Name=="Jack"')]
#using logical operator
df_log = dfobj[(dfobj["Name"]=="Jack") & (dfobj["Subj"]=="Math")]
#using loc
df_loc = dfobj.loc[(dfobj["Name"].str.startswith("J") & (dfobj["Subj"]=="Math")) & (dfobj['Marks']==100)]
print('\n',df_multi)
print('\n',df_eval)
print('\n',df_log)
print('\n',df_loc)
Output
Name Marks Subj
Row_1 Jack 100 Math
Name Marks Subj
Row_1 Jack 100 Math
Name Marks Subj
Row_1 Jack 100 Math
Name Marks Subj
Row_1 Jack 100 Math
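In practice the thresholds usually live in Python variables rather than in the query string. query() can reference them with the @ prefix. In this sketch, min_marks and subject are hypothetical variable names.

```python
import pandas as pd

student_dict = {
    'Name': ['Jack', 'Rack', 'Max'],
    'Marks': [100, 80, 100],
    'Subj': ['Math', 'Math', 'Music'],
}
dfobj = pd.DataFrame(student_dict, index=['Row_1', 'Row_2', 'Row_3'])

# Hypothetical filter settings kept outside the query string
min_marks = 80
subject = 'Math'

# @name looks up a Python variable from the surrounding scope
df_query = dfobj.query('Marks > @min_marks and Subj == @subject')
print(df_query)
```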
12. How to filter Pandas DataFrame by column values
To select rows by column value, we can use loc[] with logical operators, or isin() to keep rows whose value matches any entry in a list.
import pandas as pd
Student_dict = {
'Name': ['Jack', 'Rack', 'Max'],
'Marks':[100,80,100],
'Subj': ['Math', 'Math', 'Music']
}
dfobj = pd.DataFrame(Student_dict,index =['Row_1','Row_2','Row_3'])
options = ['Math', 'Music']
#using loc
df_loc = dfobj.loc[(dfobj["Name"].str.startswith("J") & (dfobj["Subj"]=="Math")) & (dfobj['Marks']==100)]
# using isin()
df_isin = dfobj[dfobj['Subj'].isin(options)]
print('\nisin:\n',df_isin)
print('\nloc:\n',df_loc)
Output
isin:
Name Marks Subj
Row_1 Jack 100 Math
Row_2 Rack 80 Math
Row_3 Max 100 Music
loc:
Name Marks Subj
Row_1 Jack 100 Math
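Combining isin() with the tilde operator gives the opposite filter: rows whose value is not in the list. A small sketch on the same data, where the excluded list is an illustrative choice:

```python
import pandas as pd

student_dict = {
    'Name': ['Jack', 'Rack', 'Max'],
    'Marks': [100, 80, 100],
    'Subj': ['Math', 'Math', 'Music'],
}
dfobj = pd.DataFrame(student_dict, index=['Row_1', 'Row_2', 'Row_3'])

# Subjects to exclude (illustrative)
excluded = ['Math']

# ~ inverts the boolean mask returned by isin()
df_not_in = dfobj[~dfobj['Subj'].isin(excluded)]
print(df_not_in)
```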
13. How to filter row based on index in Pandas dataframe
To filter rows by integer position or by label, we can use the iloc[] and loc[] indexers of the dataframe.
iloc[]: filters rows and columns by number (integer position). Rows are numbered from 0 to the total number of rows minus 1, and columns from 0 to the total number of columns minus 1.
iloc[row_index, column_index]
loc[]: selects rows by index label or by condition
This example uses the Student_dict from the program above.
dfobj = pd.DataFrame(Student_dict,index =['Row_1','Row_2','Row_3'])
df_iloc = dfobj.iloc[0:2]
df_row_colum = dfobj.iloc[0:2,[1]]
#using loc based on index labels 'Row_1','Row_2'
df_loc1 = dfobj.loc[['Row_1','Row_2']]
print('\nloc:\n',df_loc1)
print('\n select 1st and 2nd rows:\n',df_iloc)
print('\n select 1st and 2nd rows and second column:\n',df_row_colum)
Output
loc:
Name Marks Subj
Row_1 Jack 100 Math
Row_2 Rack 80 Math
select 1st and 2nd rows:
Name Marks Subj
Row_1 Jack 100 Math
Row_2 Rack 80 Math
select 1st and 2nd rows and second column:
Marks
Row_1 100
Row_2 80
14. How to Slice a dataframe by using loc[]
import pandas as pd
Student_dict = {
'Name': ['Jack', 'Rack', 'Max'],
'Marks':[100,80,100],
'Subj': ['Math', 'Math', 'Music']
}
dfobj = pd.DataFrame(Student_dict)
#using loc: rows 0 to 1 (end label included) for the 'Name' column
df_loc1 = dfobj.loc[0:1,'Name']
print('\nloc:\n',df_loc1)
Output
loc:
0 Jack
1 Rack
Name: Name, dtype: object
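A common follow-up is why 0:1 returns two rows here. A loc[] slice includes its end label, while an iloc[] slice excludes its end position. The sketch below runs the same slice through both indexers.

```python
import pandas as pd

student_dict = {
    'Name': ['Jack', 'Rack', 'Max'],
    'Marks': [100, 80, 100],
    'Subj': ['Math', 'Math', 'Music'],
}
dfobj = pd.DataFrame(student_dict)

# loc slices by label and includes the end label: rows 0 and 1
by_label = dfobj.loc[0:1, 'Name']

# iloc slices by position and excludes the end position: row 0 only
by_position = dfobj.iloc[0:1, 0]

print(by_label)
print(by_position)
```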
15. How to filter rows from a list of index positions in a dataframe
This example uses the Student_dict from the program above.
dfobj = pd.DataFrame(Student_dict,index =['Row_1','Row_2','Row_3'])
lst_of_indexes = [0,2]
rows = dfobj.iloc[lst_of_indexes, :]
print(rows)
Output
Name Marks Subj
Row_1 Jack 100 Math
Row_3 Max 100 Music
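If label-based code is needed later, the positions can be translated to index labels first and then passed to loc[]. A minimal sketch, assuming the same Row_1 to Row_3 index:

```python
import pandas as pd

student_dict = {
    'Name': ['Jack', 'Rack', 'Max'],
    'Marks': [100, 80, 100],
    'Subj': ['Math', 'Math', 'Music'],
}
dfobj = pd.DataFrame(student_dict, index=['Row_1', 'Row_2', 'Row_3'])

positions = [0, 2]

# Indexing the index with a list of positions returns the matching labels
labels = dfobj.index[positions]

rows_by_label = dfobj.loc[labels]
rows_by_position = dfobj.iloc[positions]

print(rows_by_label)
```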