Convert string column to int in Pandas

In this post, we look at how to convert a string column to int in Pandas using built-in methods, for either a single column or multiple columns.

1. astype(int) to Convert string column to int in Pandas


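The astype() method lets us specify the target datatype explicitly. We can also pass a Python dictionary to change several columns at once, where the keys are the column names and the values are the new datatypes. Note that astype(int) uses the platform's default integer size, so the resulting dtype may be int32 or int64 depending on the system and NumPy version.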

Program Example

import pandas as pd
 
Student_dict = {
    'Name': ['Jack', 'Rack', 'Max'],
    'Marks':['100','100', '100'],
    'Subject': ['Math', 'Math', 'Music']
}
 


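# Build a DataFrame whose 'Marks' column holds numeric strings
dfobj = pd.DataFrame(Student_dict)

# Cast the 'Marks' column from string to int
dfobj['Marks'] = dfobj['Marks'].astype(int)

print('\n string to int:\n', dfobj)
print('\n converted datatype :\n', dfobj.dtypes)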

Output

 string to int:
    Name  Marks Subject
0  Jack    100    Math
1  Rack    100    Math
2   Max    100   Music

 converted datatype :
 Name       object
Marks       int32
Subject    object
dtype: object
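
As a small complement to the example above, the sketch below requests a fixed-width integer ('int64') instead of the platform default, and shows that astype() fails on a string that is not a valid integer. The sample Series and variable names are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data, similar to the 'Marks' column above
marks = pd.Series(['100', '95', '87'], name='Marks')

# Ask for a fixed-width integer explicitly instead of the platform default
marks_int64 = marks.astype('int64')
print(marks_int64.dtype)

# astype() raises ValueError when a value is not a valid integer string
bad_marks = pd.Series(['100', 'z100'])
try:
    bad_marks.astype('int64')
    conversion_failed = False
except ValueError as exc:
    conversion_failed = True
    print('conversion failed:', exc)
```

When the data may contain such values, to_numeric() with errors='coerce' is usually the more forgiving choice.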

2. Convert multiple string columns to int in Pandas


In this example, we convert multiple columns that hold numeric strings to int using the astype() method of the Pandas library.

We pass a Python dictionary to astype(), where the keys specify the columns and the values specify the new datatype for each column.

Program Example

import pandas as pd
 
Student_dict = {
    'Name': ['Jack', 'Rack', 'Max'],
    'Marks':['100','100', '100'],
    'Fee':['100','200','300'],
    'Subject': ['Math', 'Math', 'Music']
}
 


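dfobj = pd.DataFrame(Student_dict)

# Map each column to convert to its target datatype
dict_columns_type = {'Marks': int,
                     'Fee': int}

# Columns not listed in the dictionary keep their original dtype
dfobj = dfobj.astype(dict_columns_type)
print('dataframe str to int:\n', dfobj)

print(f'\n {dfobj.dtypes}')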

Output

dataframe str to int:
    Name  Marks  Fee Subject
0  Jack    100  100    Math
1  Rack    100  200    Math
2   Max    100  300   Music

 Name       object
Marks       int32
Fee         int32
Subject    object
dtype: object
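
The dictionary does not have to use the same datatype for every column. The following sketch, using a hypothetical frame in which 'Fee' contains a decimal string, gives each column its own target dtype. The dtype names 'int64' and 'float64' are chosen here for illustration.

```python
import pandas as pd

# Hypothetical data: 'Fee' includes a decimal value, so int would not fit it
df = pd.DataFrame({
    'Name': ['Jack', 'Rack', 'Max'],
    'Marks': ['100', '100', '100'],
    'Fee': ['100.5', '200', '300'],
})

# Each key gets its own target dtype; 'Name' is not listed, so it is left alone
converted = df.astype({'Marks': 'int64', 'Fee': 'float64'})
print(converted.dtypes)
```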

3. to_numeric() to convert single string column to int


The to_numeric() function converts values, such as numeric strings, to a suitable numeric type. If a value cannot be parsed as a number, it raises "ValueError: Unable to parse string" by default. The errors parameter of to_numeric() controls how this case is handled.

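The errors parameter accepts these values

  • errors='raise' (the default) raises an exception when a value cannot be parsed
  • errors='coerce' converts values that cannot be parsed to NaN
  • errors='ignore' returns the input unchanged when a value cannot be parsed (deprecated since pandas 2.2)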
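
To make the difference concrete, here is a minimal sketch that runs the same hypothetical Series through the default mode and through errors='coerce'. Catching the exception explicitly is one way to handle bad input without relying on errors='ignore'.

```python
import pandas as pd

# Hypothetical data with one value that is not a number
marks = pd.Series(['100', '100', 'z100'])

# Default behaviour (errors='raise'): the bad value aborts the conversion
try:
    pd.to_numeric(marks)
    raised = False
except ValueError as exc:
    raised = True
    print('raised:', exc)

# errors='coerce': the bad value becomes NaN and the rest are converted
coerced = pd.to_numeric(marks, errors='coerce')
print(coerced)
```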

Program Example

import pandas as pd
 
Student_dict = {
    'Name': ['Jack', 'Rack', 'Max'],
    'Marks':['100','100', 'z100'],
    'Subject': ['Math', 'Math', 'Music']
}
 


dfobj = pd.DataFrame(Student_dict)

# 'z100' cannot be parsed, so errors='coerce' turns it into NaN
dfobj['Marks'] = pd.to_numeric(dfobj['Marks'], errors='coerce')

print('\n string to int:\n', dfobj)
print('\n converted datatype :\n', dfobj.dtypes)

Output

 string to int:
    Name  Marks Subject
0  Jack  100.0    Math
1  Rack  100.0    Math
2   Max    NaN   Music

 converted datatype :
 Name        object
Marks      float64
Subject     object
dtype: object
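
Because NaN is a floating-point value, the coerced column above ends up as float64 rather than int. If the missing value should be kept but the rest should be integers, one option is the nullable integer dtype 'Int64' (capital I). This is a sketch under that assumption, not part of the original example.

```python
import pandas as pd

# Hypothetical data with one value that cannot be parsed
marks = pd.Series(['100', '100', 'z100'])

# Coerce bad values to NaN, then cast to the nullable integer dtype,
# which can hold missing values without falling back to float
nullable = pd.to_numeric(marks, errors='coerce').astype('Int64')
print(nullable)
print(nullable.dtype)
```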

We can change the NaN values to 0 by using the replace() method, as in the example below. The column is still float64 afterwards, as the output shows.

Program to replace NaN values with 0


import pandas as pd
import numpy as np
 
Student_dict = {
    'Name': ['Jack', 'Rack', 'Max'],
    'Marks':['100','100', 'z100'],
    'Subject': ['Math', 'Math', 'Music']
}
 


dfobj = pd.DataFrame(Student_dict)

# Coerce unparseable strings to NaN, then replace every NaN with 0
dfobj['Marks'] = pd.to_numeric(dfobj['Marks'], errors='coerce')
dfobj = dfobj.replace(np.nan, 0)

print('\n string to int :\n', dfobj)

Output

 string to int :
    Name  Marks Subject
0  Jack  100.0    Math
1  Rack  100.0    Math
2   Max    0.0   Music
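
The output above still shows floating-point values (100.0, 0.0). To finish with a true integer column, the NaN values can be filled first and the result cast to int. The sketch below uses fillna() in place of replace() and a hypothetical Series for brevity.

```python
import pandas as pd

# Hypothetical data with one value that cannot be parsed
marks = pd.Series(['100', '100', 'z100'], name='Marks')

# Coerce to NaN, fill NaN with 0, then cast the now-complete column to int64
marks_int = pd.to_numeric(marks, errors='coerce').fillna(0).astype('int64')
print(marks_int.tolist())
print(marks_int.dtype)
```

Filling with 0 is only sensible when 0 cannot be confused with a real value in the data.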

4. to_numeric() to convert multiple string columns to int


In this example, we use the apply() method and pass the to_numeric function as its argument, so that each selected column of numeric strings is converted to integers.

Program Example

import pandas as pd
 
Student_dict = {
    'Name': ['Jack', 'Rack', 'Max'],
    'Marks':['100','100', '100'],
    'Fee':['100','200','300'],
    'Subject': ['Math', 'Math', 'Music']
}
 


dfobj = pd.DataFrame(Student_dict)

# apply() calls pd.to_numeric once per selected column
dfobj[['Marks', 'Fee']] = dfobj[['Marks', 'Fee']].apply(pd.to_numeric)

print('dataframe str to int:\n', dfobj)

print(f'\n {dfobj.dtypes}')

Output

dataframe str to int:
    Name  Marks  Fee Subject
0  Jack    100  100    Math
1  Rack    100  200    Math
2   Max    100  300   Music

 Name       object
Marks       int64
Fee         int64
Subject    object
dtype: object
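
Keyword arguments given to apply() are forwarded to the function it calls, so errors='coerce' can be combined with the multi-column form. A minimal sketch, using a hypothetical frame in which one 'Marks' value is invalid:

```python
import pandas as pd

# Hypothetical data: 'z100' in 'Marks' cannot be parsed
df = pd.DataFrame({
    'Name': ['Jack', 'Rack', 'Max'],
    'Marks': ['100', '100', 'z100'],
    'Fee': ['100', '200', '300'],
})

cols = ['Marks', 'Fee']
# errors='coerce' is passed through apply() to pd.to_numeric
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')
print(df.dtypes)
```

Here 'Fee' converts cleanly to an integer dtype, while 'Marks' becomes float because it now contains a NaN.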

5. Convert entire dataframe to int


To convert an entire dataframe to int, we call the astype() method on the dataframe object itself. Every column must contain values that can be parsed as integers; otherwise astype() raises a ValueError.

Program Example

import pandas as pd
 
Student_dict = {
    'StudID': ['12', '13', '14'],
    'Marks': ['100', '100', '100'],
    'Fee': ['100', '200', '300']
}
 


dfobj = pd.DataFrame(Student_dict)

# Cast every column of the dataframe to int in one call
dfobj = dfobj.astype(int)

print('dataframe str to int:\n', dfobj)

print(f'\n {dfobj.dtypes}')



Output

dataframe str to int:
    StudID  Marks  Fee
0      12    100  100
1      13    100  200
2      14    100  300

 StudID    int32
Marks     int32
Fee       int32
dtype: object
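
When a dataframe also contains a text column, casting the whole frame fails. One possible workaround, sketched here with a hypothetical 'Name' column, is to cast every column except the text one. The column selection is illustrative, not a general rule.

```python
import pandas as pd

# Hypothetical data: 'Name' holds text that cannot become an integer
df = pd.DataFrame({
    'Name': ['Jack', 'Rack', 'Max'],
    'StudID': ['12', '13', '14'],
    'Marks': ['100', '100', '100'],
})

# Casting the whole frame fails because of the 'Name' column
try:
    df.astype('int64')
    whole_frame_ok = True
except ValueError:
    whole_frame_ok = False

# Cast every column except the text one
numeric_cols = df.columns.drop('Name')
df[numeric_cols] = df[numeric_cols].astype('int64')
print(df.dtypes)
```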

Summary

In this post, we covered several ways to convert string columns to int in Pandas, with examples using the built-in astype() and to_numeric() methods. The same methods can also be used to convert strings to float.