Replace nan with mean in NumPy

In this post, we are going to learn how to replace nan with the mean in a NumPy array using the built-in functions np.nanmean(), np.where(), np.take() and np.nan_to_num(), a masked array, np.isnan() and Python's zip().

1. np.nanmean() and np.take() to Replace nan with Column Mean

In this Python code example, we use np.nanmean(), np.where() and np.take() to replace each nan with the mean of its column.

  • np.nanmean(nparr, axis=0): computes the mean of each column, ignoring nan values.
  • np.where(np.isnan(nparr)): returns the row and column indices of every nan in the array.
  • np.take(colmean, Indxs[1]): picks, for each nan, the mean of the column it sits in, so the result can be assigned back to those positions.
import numpy as np

nparr = np.array([[ 5, np.nan, 15, 45],
                  [ 9, np.nan, 11, 60],
                  [16, 10, 19, 70],
                  [18, 26, 20, np.nan],
                  [20,  7, 21, np.nan]])

# column mean, ignoring nan
colmean = np.nanmean(nparr, axis=0)

# find indices of nan values
Indxs = np.where(np.isnan(nparr))

# replace each nan with the mean of its column
nparr[Indxs] = np.take(colmean, Indxs[1])

print(nparr)

Output

[[ 5.         14.33333333 15.         45.        ]
 [ 9.         14.33333333 11.         60.        ]
 [16.         10.         19.         70.        ]
 [18.         26.         20.         58.33333333]
 [20.          7.         21.         58.33333333]]
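
To see what the two index steps do, here is a minimal sketch on a smaller, made-up 3x3 array. The array arr and the names col_means, rows, cols and fill_values are illustrative and not part of the example above; the sketch prints the intermediate results of np.where() and np.take() before the assignment.

```python
import numpy as np

# Illustrative 3x3 array (not from the post) with two nan values
arr = np.array([[1.0, np.nan, 3.0],
                [4.0, 5.0, np.nan],
                [7.0, 8.0, 9.0]])

# Mean of each column, ignoring nan: 4.0, 6.5, 6.0
col_means = np.nanmean(arr, axis=0)

# On a 2-D boolean array, np.where returns a tuple (row_indices, col_indices)
rows, cols = np.where(np.isnan(arr))
print(rows, cols)  # [0 1] [1 2]

# np.take picks one column mean per nan, using that nan's column index
fill_values = np.take(col_means, cols)
print(fill_values)

# Write the picked means back into the nan positions
arr[rows, cols] = fill_values
print(arr)
```

Because np.take() only needs the column index of each nan, the row indices are used solely for the final assignment.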

2. nan_to_num() and nanmean(): Replace nan with mean


In this Python program example, we use nan_to_num() together with nanmean() to replace every nan with the mean of the whole array (not the per-column mean).

The numpy.nan_to_num() function replaces non-finite values in an array. By default it replaces nan with 0.0, positive infinity with a very large finite number and negative infinity with a very small (most negative) finite number. Its nan keyword argument sets a different fill value for nan, so passing nan=np.nanmean(nparr) replaces each nan with the mean of the non-nan values. This is how to replace nan with the mean.

  • Import the NumPy library with “import numpy as np”.
  • Call np.nanmean() to get the mean of the non-nan values, then pass it to np.nan_to_num() through the nan argument to replace nan with that mean.
import numpy as np

nparr = np.array([[ 5, np.nan, 15, 45],
                  [ 9, np.nan, 11, 60],
                  [16, 10, 19, 70],
                  [18, 26, 20, np.nan],
                  [20,  7, 21, np.nan]])

print('mean:', np.nanmean(nparr))

# replace nan with the mean of the whole array
print(np.nan_to_num(nparr, nan=np.nanmean(nparr)))

Output

mean: 23.25
[[ 5.   23.25 15.   45.  ]
 [ 9.   23.25 11.   60.  ]
 [16.   10.   19.   70.  ]
 [18.   26.   20.   23.25]
 [20.    7.   21.   23.25]]
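
The default behavior of nan_to_num() and the effect of its nan keyword can be checked on a tiny array. This is a minimal sketch assuming NumPy 1.17 or later (the release that added the nan keyword); the array arr and the names default, finite_mean and filled are illustrative. It also shows a pitfall: nanmean() skips nan but not inf, so if the data may contain infinities the fill value is better computed from the finite entries only.

```python
import numpy as np

# Illustrative array (not from the post) holding a nan and a positive infinity
arr = np.array([1.0, np.nan, 3.0, np.inf])

# Default call: nan becomes 0.0 and inf becomes the largest finite float64
default = np.nan_to_num(arr)
print(default)

# np.nanmean(arr) skips nan but not inf, so here it would return inf;
# averaging only the finite entries (1.0 and 3.0) gives 2.0
finite_mean = arr[np.isfinite(arr)].mean()

# The nan keyword sets the fill value for nan only; inf still gets the default
filled = np.nan_to_num(arr, nan=finite_mean)
print(filled)
```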

3. Replace nan with mean using masked array


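In this example we create a masked array with np.ma.array(), masking the nan entries with mask=np.isnan(nparr). A masked array is a subclass of ndarray whose masked entries are ignored by operations such as mean(), so .mean(axis=0) gives the mean of the valid values in each column. np.where() then puts that column mean wherever the original value is nan.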

import numpy as np

nparr = np.array([[ 5, np.nan, 15, 45],
                  [ 9, np.nan, 11, 60],
                  [16, 10, 19, 70],
                  [18, 26, 20, np.nan],
                  [20,  7, 21, np.nan]])

# mask the nan entries, take the column means of the remaining values,
# and use them wherever the original array has nan
resarr = np.where(np.isnan(nparr),
                  np.ma.array(nparr, mask=np.isnan(nparr)).mean(axis=0),
                  nparr)

print(resarr)

Output

[[ 5.         14.33333333 15.         45.        ]
 [ 9.         14.33333333 11.         60.        ]
 [16.         10.         19.         70.        ]
 [18.         26.         20.         58.33333333]
 [20.          7.         21.         58.33333333]]
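
The masked-array step can also be written with np.ma.masked_invalid(), which masks nan and infinite values in one call. The sketch below uses a small illustrative array (masked, col_means and result are made-up names). It shows that mean() skips the masked entries, and it converts the masked result back to a plain ndarray with filled() before the np.where() step.

```python
import numpy as np

# Illustrative data (not from the post): one nan in each column
arr = np.array([[1.0, np.nan],
                [3.0, 4.0],
                [np.nan, 8.0]])

# masked_invalid masks nan (and +/-inf) entries; here that matches mask=np.isnan(arr)
masked = np.ma.masked_invalid(arr)
print(masked)

# Masked entries are skipped by mean(), so each column mean uses only valid values
col_means = masked.mean(axis=0)

# filled(np.nan) converts the masked result to a plain ndarray;
# a column with no valid values would come back as nan instead of a masked element
col_means = col_means.filled(np.nan)

# getmaskarray always returns a full boolean array, even when nothing is masked
result = np.where(np.ma.getmaskarray(masked), col_means, arr)
print(result)
```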

4. isnan(): Replace nan with mean


The isnan() function checks for nan values in a NumPy array. It returns a boolean array of the same shape that is True where the value is nan and False everywhere else.

In this example we find the nan values in the array with isnan() and replace them with the mean of the whole array, computed by nanmean().

import numpy as np

nparr = np.array([[ 5, np.nan, 15, 45],
                  [ 9, np.nan, 11, 60],
                  [16, 10, 19, 70],
                  [18, 26, 20, np.nan],
                  [20,  7, 21, np.nan]])

# boolean mask of nan positions, filled with the mean of the whole array
nparr[np.isnan(nparr)] = np.nanmean(nparr)
print(nparr)

Output

[[ 5.   23.25 15.   45.  ]
 [ 9.   23.25 11.   60.  ]
 [16.   10.   19.   70.  ]
 [18.   26.   20.   23.25]
 [20.    7.   21.   23.25]]
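
isnan() also works with column means instead of the overall mean. The sketch below uses a small made-up array (arr, mask, col_means and filled are illustrative names). It prints the boolean array isnan() returns, counts the nan values per column, and fills each nan with its own column mean by broadcasting the column means to the array's shape.

```python
import numpy as np

# Illustrative data (not from the post)
arr = np.array([[1.0, np.nan],
                [3.0, 4.0],
                [np.nan, 8.0]])

# True marks each nan position
mask = np.isnan(arr)
print(mask)

# Number of nan values in each column
print(mask.sum(axis=0))

# Mean of the non-nan values in each column: 2.0 and 6.0
col_means = np.nanmean(arr, axis=0)

# Repeat the column means down every row, then copy only the masked entries
filled = arr.copy()
filled[mask] = np.broadcast_to(col_means, arr.shape)[mask]
print(filled)
```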

5. zip() to replace nan with mean


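The built-in zip() function takes one or more iterables and returns an iterator of tuples, pairing their elements by position. Here we zip the row and column index arrays returned by np.where() to visit each nan position, and replace it with the mean of the non-nan values in its column.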

import numpy as np

nparr = np.array([[ 5, np.nan, 15, 45],
                  [ 9, np.nan, 11, 60],
                  [16, 10, 19, 70],
                  [18, 26, 20, np.nan],
                  [20,  7, 21, np.nan]])

# row and column indices of every nan
indxs = np.where(np.isnan(nparr))

# zip pairs the index arrays into (row, col) coordinates
for row, col in zip(*indxs):
    # mean of the non-nan values in this column
    nparr[row, col] = np.mean(nparr[~np.isnan(nparr[:, col]), col])

print(nparr)

Output

[[ 5.         14.33333333 15.         45.        ]
 [ 9.         14.33333333 11.         60.        ]
 [16.         10.         19.         70.        ]
 [18.         26.         20.         58.33333333]
 [20.          7.         21.         58.33333333]]
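
The loop above recomputes a column mean for every nan it visits. It still gives the right answer, because writing a column's mean into that column leaves the mean unchanged (up to floating-point rounding), but the means can also be computed once up front. The sketch below uses an illustrative array and made-up names (arr, coords, col_means); it prints the (row, col) pairs that zip() builds from np.where() and then fills them from precomputed column means.

```python
import numpy as np

# Illustrative data (not from the post)
arr = np.array([[1.0, np.nan, 3.0],
                [np.nan, 5.0, np.nan],
                [7.0, 8.0, 9.0]])

# np.where returns (row_indices, col_indices); zip pairs them into coordinates
rows, cols = np.where(np.isnan(arr))
coords = list(zip(rows.tolist(), cols.tolist()))
print(coords)  # [(0, 1), (1, 0), (1, 2)]

# Compute each column mean once, before any value is overwritten
col_means = np.nanmean(arr, axis=0)

for row, col in coords:
    arr[row, col] = col_means[col]
print(arr)
```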

Summary

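In this post we have learned how to replace nan with the mean in NumPy, with examples using the built-in functions np.nanmean(), np.where(), np.take() and np.nan_to_num(), a masked array, np.isnan() and zip(). The np.take(), masked-array and zip() examples fill each nan with the mean of its column, while the nan_to_num() and isnan() examples use the mean of the whole array.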