Pandas read_csv: skip rows while reading CSV to Dataframe


In this post, we are going to learn how to skip rows with read_csv() while reading a CSV file into a Dataframe. This includes skipping rows at the start, at the end, at specific positions, by condition, after the header, and more.

Pandas read_csv() method


The Pandas library has a built-in read_csv() method to read a CSV file into a Dataframe. It reads the file at the given path and loads its contents into the dataframe.

Syntax

pandas.read_csv(filepath_or_buffer, sep=',', skiprows=N)

Parameters

  • filepath_or_buffer: Path of the file to read (a file-like object also works).
  • skiprows: Which rows to skip. Line indexes are 0-based and count the header line as index 0.
    • If an int N, skip the first N lines of the file.
    • If a list of indexes, skip the lines at those indexes.
    • If a callable, it is called with each line index and the line is skipped when it returns True.
  • sep: The separator between fields. The default is a comma (,); pass a custom separator when needed.
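
The three forms of skiprows can be compared side by side. The following is a minimal sketch that assumes the same student data used as the sample file in this post, held in an in-memory string (CSV_TEXT is a name chosen here for illustration) so that no file on disk is needed.

```python
import io
import pandas as pd

# In-memory copy of the sample data (illustrative stand-in for student.csv).
CSV_TEXT = """Name,Subjs,Marks
Alex,Phy,100
Ben,Chem,100
Jack,Math,100
Max,Phy,100
Tawn,Chem,100
Bruise,Math,100
"""

# int: skip the first 2 lines (the header line and the "Alex" line),
# so the "Ben" line is used as the header.
df_int = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=2)

# list: skip the lines at indexes 1 and 3 ("Alex" and "Jack"); the header is kept.
df_list = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=[1, 3])

# callable: skip every line whose index is greater than 4 ("Tawn" and "Bruise").
df_func = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=lambda i: i > 4)

print(df_int.columns.tolist())
print(df_list["Name"].tolist())
print(df_func["Name"].tolist())
```

Note that the int form counts from the very first line, so it removes the header unless column names are supplied another way.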

Sample CSV File

Name,Subjs,Marks
Alex,Phy,100
Ben,Chem,100
Jack,Math,100
Max,Phy,100
Tawn,Chem,100
Bruise,Math,100

1. Pandas read_csv skip rows at start


In this example, we are skipping the first 4 lines of the CSV file. The header line counts as one of them, so the fifth line (Max,Phy,100) is used as the column names.

Program Example

import pandas as pd

studf = pd.read_csv('student.csv', skiprows = 4)
print(studf)


Output

      Max   Phy  100
0    Tawn  Chem  100
1  Bruise  Math  100
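
If the remaining lines should be treated as data rather than as a header, the column names can be supplied by hand. The following is a minimal sketch, assuming the sample data above is available as an in-memory string (CSV_TEXT is an illustrative name) and using the header and names parameters of read_csv().

```python
import io
import pandas as pd

# In-memory copy of the sample data (illustrative stand-in for student.csv).
CSV_TEXT = """Name,Subjs,Marks
Alex,Phy,100
Ben,Chem,100
Jack,Math,100
Max,Phy,100
Tawn,Chem,100
Bruise,Math,100
"""

df = pd.read_csv(
    io.StringIO(CSV_TEXT),
    skiprows=4,                        # drops the header line plus the first 3 data lines
    header=None,                       # the first remaining line is data, not a header
    names=["Name", "Subjs", "Marks"],  # supply the column names by hand
)
print(df)
```

With this approach the "Max" line stays in the data instead of being consumed as column names.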

2. Pandas read_csv skip rows by condition


A callback function or lambda function can be passed to the skiprows argument of read_csv(). This function is called with the index of each line (0-based, with the header line at index 0), and the line is skipped when the function returns True.

Program Example

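# Python 3 program: skip rows by condition with pandas read_csv

import pandas as pd

def fun_skiprows(index):
    # index is the 0-based line number in the file; the header line is index 0.
    # Returning True skips the line, so every even-numbered line is dropped.
    return index % 2 == 0

# Pass the function itself; pandas calls it once per line index.
studf = pd.read_csv('student.csv', skiprows=fun_skiprows)
print(studf)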

Output

   Alex   Phy  100
0  Jack  Math  100
1  Tawn  Chem  100
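
Because the callable also receives index 0, a condition such as index % 2 == 0 drops the header line as well. The following is a minimal sketch of one way to keep the header, assuming the sample data is held in an in-memory string (CSV_TEXT and skip_even_data_rows are illustrative names, not part of pandas).

```python
import io
import pandas as pd

# In-memory copy of the sample data (illustrative stand-in for student.csv).
CSV_TEXT = """Name,Subjs,Marks
Alex,Phy,100
Ben,Chem,100
Jack,Math,100
Max,Phy,100
Tawn,Chem,100
Bruise,Math,100
"""

def skip_even_data_rows(index):
    # Never skip index 0 (the header); skip even-numbered lines after it.
    return index > 0 and index % 2 == 0

df = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=skip_even_data_rows)
print(df)
```

Here lines 2, 4 and 6 are skipped, so the original column names survive and only the odd-numbered data lines are loaded.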

3. Pandas read_csv skip rows at specific position


In this example, we pass a list of line indexes to skip. Indexes 1, 2 and 4 correspond to the Alex, Ben and Max rows; the header at index 0 is kept.

Program Example

# Python 3 program: skip rows at specific positions with pandas read_csv

import pandas as pd

studf = pd.read_csv('student.csv', skiprows = [1, 2, 4])
print(studf)

Output

     Name  Subjs  Marks
0    Jack   Math    100
1    Tawn   Chem    100
2  Bruise   Math    100

4. Pandas read_csv skip N rows from end


The skipfooter argument of the pandas read_csv() method specifies the number of rows to skip from the end (footer) of the file. In this example, we are skipping 4 rows from the end of the CSV file.

skipfooter is not supported by the default 'c' engine, so we pass engine='python' to avoid this warning:

ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support skipfooter; you can avoid this warning by specifying engine='python'.

Program Example

import pandas as pd


studf = pd.read_csv('student.csv', skipfooter = 4,
                  engine = 'python' )
print(studf)

Output

   Name  Subjs  Marks
0  Alex    Phy    100
1   Ben   Chem    100
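
When the default parser is preferred, one alternative is to read the whole file and then drop the last rows with ordinary dataframe slicing. The following is a minimal sketch that compares both approaches, assuming the sample data is held in an in-memory string (CSV_TEXT is an illustrative name). The slicing variant loads every row first, so it may not suit very large files.

```python
import io
import pandas as pd

# In-memory copy of the sample data (illustrative stand-in for student.csv).
CSV_TEXT = """Name,Subjs,Marks
Alex,Phy,100
Ben,Chem,100
Jack,Math,100
Max,Phy,100
Tawn,Chem,100
Bruise,Math,100
"""

# Option 1: let the parser drop the last 4 lines (requires the python engine).
df_footer = pd.read_csv(io.StringIO(CSV_TEXT), skipfooter=4, engine="python")

# Option 2: read everything with the default engine, then slice off the last 4 rows.
df_sliced = pd.read_csv(io.StringIO(CSV_TEXT)).iloc[:-4]

print(df_footer)
print(df_sliced)
```

Both dataframes contain only the Alex and Ben rows.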

5. Pandas read_csv skip N rows after header


In this example, we are skipping the first 3 rows after the header (line indexes 1 to 3), so the column names are kept.

Program Example

import pandas as pd

# range(1, 4) yields the line indexes 1, 2, 3; index 0 (the header) is kept
studf = pd.read_csv('student.csv', skiprows=list(range(1, 4)))
print(studf)

Output

     Name  Subjs  Marks
0     Max    Phy    100
1    Tawn   Chem    100
2  Bruise   Math    100
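
The same idea can be wrapped in a small helper for any N. The following is a minimal sketch; read_skipping_first_rows is a hypothetical helper name introduced here, and the sample data is assumed to be available as an in-memory string (CSV_TEXT).

```python
import io
import pandas as pd

# In-memory copy of the sample data (illustrative stand-in for student.csv).
CSV_TEXT = """Name,Subjs,Marks
Alex,Phy,100
Ben,Chem,100
Jack,Math,100
Max,Phy,100
Tawn,Chem,100
Bruise,Math,100
"""

def read_skipping_first_rows(source, n):
    """Read a CSV, keep the header line, and skip the first n data rows."""
    # Line index 0 is the header, so the first n data rows are indexes 1..n.
    return pd.read_csv(source, skiprows=list(range(1, n + 1)))

df = read_skipping_first_rows(io.StringIO(CSV_TEXT), 3)
print(df)
```

Calling the helper with n=3 reproduces the example above: the header is kept and the Alex, Ben and Jack rows are skipped.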

6. Pandas read_csv skip rows and select columns


Sometimes we do not want to load every column. To load only the required columns, we use usecols to specify the indexes of the columns.

In this example, we are skipping the rows [1, 2] from the start and loading the first two columns with usecols=[0, 1]. header=0 tells pandas to treat the first remaining line as the header.

Program Example

import pandas as pd

studf = pd.read_csv(
    'student.csv', sep=',', skiprows=[1, 2], header=0, usecols=[0, 1]
)
print(studf.head(10))


Output

     Name  Subjs
0    Jack   Math
1     Max    Phy
2    Tawn   Chem
3  Bruise   Math
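
usecols also accepts column names instead of positions, which can be easier to read when the column order may change. The following is a minimal sketch, assuming the sample data is held in an in-memory string (CSV_TEXT is an illustrative name).

```python
import io
import pandas as pd

# In-memory copy of the sample data (illustrative stand-in for student.csv).
CSV_TEXT = """Name,Subjs,Marks
Alex,Phy,100
Ben,Chem,100
Jack,Math,100
Max,Phy,100
Tawn,Chem,100
Bruise,Math,100
"""

df = pd.read_csv(
    io.StringIO(CSV_TEXT),
    skiprows=[1, 2],             # skip the "Alex" and "Ben" lines
    usecols=["Name", "Subjs"],   # select columns by header name instead of position
)
print(df)
```

The result has the same rows and columns as the index-based example.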

7. Pandas read_csv skip empty rows


In this example, we are removing the empty rows after reading the CSV file into the dataframe.

Sample File with empty rows

Name,Subjs,Marks
Alex,Phy,100
,,
,,
Ben,Chem,100
Jack,Math,100
Max,Phy,100

Program Example

  • First, read the CSV file into a dataframe using read_csv(). Rows that contain only separators (,,) are loaded as rows of NaN values.
  • Then call the dropna() method on the dataframe to drop those empty rows.
  • Finally, reset the index using the reset_index() method.

import pandas as pd

dfobj = pd.read_csv('student.csv')

dfobj = dfobj.dropna()
# Reset the index so it runs from 0 again after the rows are dropped
dfobj = dfobj.reset_index(drop=True)

print(dfobj)

Output

   Name  Subjs  Marks
0  Alex    Phy  100.0
1   Ben   Chem  100.0
2  Jack   Math  100.0
3   Max    Phy  100.0
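
Note that dropna() with its default settings removes every row that has at least one missing value, not only rows that are completely empty. The following is a minimal sketch of the difference using dropna(how='all'). It assumes a hypothetical variant of the sample data (CSV_WITH_GAPS) in which Ben's mark is missing; that gap is invented here for illustration.

```python
import io
import pandas as pd

# Hypothetical data: two fully empty rows, plus one row (Ben) with a missing mark.
CSV_WITH_GAPS = """Name,Subjs,Marks
Alex,Phy,100
,,
,,
Ben,Chem,
Jack,Math,100
Max,Phy,100
"""

df = pd.read_csv(io.StringIO(CSV_WITH_GAPS))

# how='all' drops a row only when every column in it is missing.
only_empty_removed = df.dropna(how="all").reset_index(drop=True)

# The default (how='any') also drops rows with a single missing value.
any_missing_removed = df.dropna().reset_index(drop=True)

print(only_empty_removed)
print(any_missing_removed)
```

With how='all' the partially filled Ben row is kept, while the default dropna() discards it along with the empty rows.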

Summary

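In this post, we have learned different ways to skip rows with read_csv() while reading a CSV file into a Dataframe: at the start, by condition, at specific positions, from the end, after the header, together with column selection, and by dropping empty rows.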