In this post, we will learn how to Sum rows by a condition in Pandas Dataframe or Pandas rows sum on condition with example. To use the Pandas library first we have to install it on the local system by using the pip command “pip install pandas” and import it into our code by using “import pandas as pd” to use its functions.
1. How to Sum rows by condition Pandas Dataframe using loc[]
The Python Pandas sum() function returns the sum of the given values over the requested axis. If the parameter is axis=0 for summing the rows of Pandas dataframe it is the default value. axis=1 means to Sum the columns of Pandas dataframe.In this example, we are aggregating Pandas rows by condition. Where values of Marks columns > 97.When we sum all rows of Pandas dataframe along with axis =0 column-wise. The non-numeric values are appended on the header, to avoid that we have used numeric_only=True.
import pandas as pd
data = {
'Name': ['Rama', 'Rack', 'Max', 'David'],
'Marks':[97,97,100,100],
'Fee':[100,200,300,400],
'Tution_Fee':[400,500,600,700]
}
dfobj = pd.DataFrame(data)
#to avoid concatenation of the non numeric values in the row header numeric_only=True
row_Sum = dfobj.loc[dfobj['Marks']>97].sum(axis=0,numeric_only=True)
print(row_Sum)
Output
Marks 200
Fee 700
Tution_Fee 1300
dtype: int64
- Pandas sum rows by columns(6 ways)
- Remove rows with duplicate indices Pandas
- Pandas sum columns by multiple conditions
2. How to Sum rows by Mutiple conditions Pandas Dataframe
In this example, we have used the sum() function with axis=0 along with loc[] to Sum rows of the dataframe under multiple conditions.We are Summing the rows that have ‘Marks’]>97)&(dfobj[‘Fee’] >=200.
- When we sum all rows of Pandas dataframe along with axis =0 column-wise. The non-numeric values are appended on the header, to avoid that we have used numeric_only=True.
import pandas as pd
data = {
'Name': ['Rama', 'Rack', 'Max', 'David'],
'Marks':[97,97,100,100],
'Fee':[100,200,500,400],
'Tution_Fee':[400,500,600,700]
}
dfobj = pd.DataFrame(data)
#to avoid concatenation of the non numeric values in the row header numeric_only=True
row_Sum = dfobj.loc[(dfobj['Marks']>97)&(dfobj['Fee'] >=200)].sum(axis=0,numeric_only=True)
print(row_Sum)
Output
Marks 200
Fee 900
Tution_Fee 1300
dtype: int64
3. How to Sum rows by condition Pandas Dataframe using query()
The pandas.DataFrame.query() function is used to evaluate a string query .In the below example we have passed a string query ‘Marks>97’ to sum the rows of Pandas dataframe based on condition.
import pandas as pd
data = {
'Name': ['Rama', 'Rack', 'Max', 'David'],
'Marks':[97,97,100,100],
'Fee':[100,200,300,400],
'Tution_Fee':[400,500,600,700]
}
dfobj = pd.DataFrame(data)
#to avoid concatenation of the non numeric values in the row header numeric_only=True
row_Sum = dfobj.query('Marks>97').sum(axis=0,numeric_only=True)
print(row_Sum)
Output
Marks 200
Fee 700
Tution_Fee 1300
dtype: int64
4. Pandas Sum rows by multiple conditions using query()
In this python program we have used the pandas.DataFrame.query() function and passed a query string ‘Marks>97 and Fee >=200’ with multiple condition to Sum rows based multiple condition.
import pandas as pd
data = {
'Name': ['Rama', 'Rack', 'Max', 'David'],
'Marks':[97,97,100,100],
'Fee':[100,200,500,400],
'Tution_Fee':[400,500,600,700]
}
dfobj = pd.DataFrame(data)
#to avoid concatenation of the non numeric values in the row header numeric_only=True
row_Sum = dfobj.query('Marks>97 and Fee >=200').sum(axis=0,numeric_only=True)
print(row_Sum)
Output
Marks 200
Fee 900
Tution_Fee 1300
dtype: int64
5. Pandas sum rows by multiple conditions using group by
Sometimes we have to sum rows of Pandas dataframe by group.The groupby() function is used to group the identical repeated values in given data and return a Dataframegroupby that contain result after applying aggregate function like sum() and mean.In this example we are using the groupby() function by multiple conditions.
import pandas as pd
data = {
'Name': ['Rama', 'Rack', 'Max', 'David'],
'Marks':[97,97,100,100],
'Fee':[100,200,300,400],
'Tution_Fee':[400,500,600,700]
}
dfobj = pd.DataFrame(data)
#to avoid concatenation of the non numeric values in the row header numeric_only=True
row_Sum = dfobj.groupby('Marks').apply(lambda x: x[x['Marks']> 97].sum(axis=0,numeric_only=True))
print(row_Sum)
Output
Marks Fee Tution_Fee
Marks
97 0 0 0
100 200 700 1300
6. How to Sum rows by condition Pandas Dataframe
In this example We have used the simple way, column label to select rows based on condition and apply sum() to sum rows of Pandas dataframe based on condition.
import pandas as pd
data = {
'Name': ['Jack', 'Jack', 'Max', 'Jack'],
'Marks':[97,97,100,100],
'Admit_fee':[201,201,205,206],
'Fee':[100,200,300,400],
'Tution_Fee':[400,400,600,700]
}
dfobj = pd.DataFrame(data)
#to avoid concatenation of the non numeric values in the row header numeric_only=True
row_Sum = dfobj[dfobj['Marks']>97].sum(axis=0,numeric_only=True)
print(row_Sum)
Output
Marks 200
Admit_fee 411
Fee 700
Tution_Fee 1300
dtype: int64