In this article, we introduce the data structures used with Pandas, with code examples showing how to create them, how to use them, and how they differ. Arrays, ndarrays, Series, and DataFrames are all widely used when working with Pandas; this article covers the ndarray and the Series.
Types of Data Structures in Pandas
The following data structures are widely used when working with Pandas:
- ndarray – N Dimensional Array
- Pandas Data structure: Pandas Series
1. ndarray – N Dimensional Array
We start with the ndarray, which is part of the NumPy library. Pandas uses the ndarray for many of its functions. Let us briefly look at what an ndarray is.
Program Example 1D ndarray
import numpy as np

np_array = np.random.rand(4)  # 1-D array of 4 random floats in [0, 1)
print(type(np_array))
print(np_array[0])  # value at index 0
The output shows that np_array is of type ndarray, followed by the random value that np.random.rand generated at index 0. This example uses a 1-D array, but an ndarray can also be created as a 2-D or N-D array.
<class 'numpy.ndarray'>
0.179953500159113
Program Example 2D ndarray
The example below shows how a 2-D ndarray can be created in NumPy.
import numpy as np

nd_array = np.random.rand(4, 2)  # 2-D array with 4 rows and 2 columns
print(type(nd_array))
print(nd_array[2, 1])  # value at row 2, column 1
<class 'numpy.ndarray'>
0.09759443425495462
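The same call extends to more dimensions. As a minimal sketch (the variable names and the 3-D shape are chosen here only for illustration), the ndim and shape attributes show how many dimensions each array has and how large each one is:

```python
import numpy as np

# Each extra argument to np.random.rand adds one dimension.
arr_1d = np.random.rand(4)        # 4 values
arr_2d = np.random.rand(4, 2)     # 4 rows, 2 columns
arr_3d = np.random.rand(2, 3, 4)  # 2 blocks of 3 rows and 4 columns

for name, arr in [("arr_1d", arr_1d), ("arr_2d", arr_2d), ("arr_3d", arr_3d)]:
    # ndim is the number of dimensions, shape is the size along each one
    print(name, type(arr).__name__, arr.ndim, arr.shape)
```

All three objects are of the same type, ndarray; only the shape differs.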
2. Pandas Data structure: Pandas Series
A Pandas Series is used when manipulating DataFrames and in other operations on data. Data can be read into this data structure and then operated upon. A Series is a one-column data structure. Let us look at some examples of what a Pandas Series is.
In the example below, we pass an ndarray to pd.Series and get a Series back. The output shows that pd_series is of type pandas.core.series.Series, followed by the random value at index 0 of the Series.
import pandas as pd
import numpy as np

np_array = np.random.rand(4)  # 1-D ndarray to wrap in a Series
pd_series = pd.Series(np_array)
print(type(pd_series))
print(pd_series[0])  # value at index 0
<class 'pandas.core.series.Series'>
0.7916550309100095
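To make the link between the two structures concrete, here is a small sketch (variable names are illustrative) that builds a Series from an ndarray and then gets the values back as an ndarray with to_numpy(). It also prints the default index that Pandas attaches when none is given:

```python
import numpy as np
import pandas as pd

np_array = np.random.rand(4)
pd_series = pd.Series(np_array)

# to_numpy() returns the Series values as a NumPy ndarray.
values = pd_series.to_numpy()

print(type(values))     # <class 'numpy.ndarray'>
print(pd_series.index)  # RangeIndex(start=0, stop=4, step=1)
print(pd_series.dtype)  # float64
```

So a Series can be thought of as the array values plus an index that labels each value.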
What is the difference between an array and a Series
From the example and output above, we do not see much difference between an ndarray and a Series. They look the same, so what is the difference between them?
An array has a fixed numeric index that starts from 0: 0, 1, 2, 3, and so on.
A Series has the same numeric index by default, but we can also set an index of our own choice. Let us understand this with an example:
import pandas as pd
import numpy as np

np_array = np.random.rand(4)
pd_series = pd.Series(np_array, index=["zero", "one", "two", "three"])
print(pd_series, '\n')
print(pd_series["zero"])  # access by our own label
print(pd_series.iloc[0])  # access by numeric position
As you can see, the numeric position still works on the Series, and so does our own index: both return the same value.
zero     0.411261
one      0.644768
two      0.350842
three    0.565235
dtype: float64 

0.4112614144776182
0.4112614144776182
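The two ways of reaching a value can be written out explicitly with .loc (by label) and .iloc (by position). This is a minimal sketch; the values 10.0 to 40.0 are made up for illustration so that the results are easy to follow:

```python
import pandas as pd

# Illustrative values (not from the article's random output).
pd_series = pd.Series([10.0, 20.0, 30.0, 40.0],
                      index=["zero", "one", "two", "three"])

by_label = pd_series.loc["two"]   # look up by our own index label
by_position = pd_series.iloc[2]   # look up by numeric position
print(by_label, by_position)      # 30.0 30.0

# Label slices include both end points; position slices exclude the stop.
label_slice = pd_series.loc["one":"two"]
position_slice = pd_series.iloc[1:2]
print(len(label_slice), len(position_slice))  # 2 1
```

Using .loc and .iloc makes it clear to the reader whether a label or a position is meant, which matters most when the labels themselves are numbers.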
Get the index of a Pandas Series
For a given Pandas Series, we can check the index by using the index property, as below:
import pandas as pd
import numpy as np

np_array = np.random.rand(4)
pd_series = pd.Series(np_array, index=["zero", "one", "two", "three"])
print(pd_series.index)
Index(['zero', 'one', 'two', 'three'], dtype='object')
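Once we have the index, a few common things can be done with it. The sketch below (variable names and the replacement labels "a" to "d" are illustrative) converts the index to a plain list, checks whether a label exists, and assigns a new index to a copy of the Series:

```python
import numpy as np
import pandas as pd

pd_series = pd.Series(np.random.rand(4),
                      index=["zero", "one", "two", "three"])

labels = pd_series.index.tolist()   # index labels as a Python list
has_two = "two" in pd_series.index  # membership test on the labels
print(labels)   # ['zero', 'one', 'two', 'three']
print(has_two)  # True

# Assign a new index of the same length to a copy, leaving the original as is.
renamed = pd_series.copy()
renamed.index = ["a", "b", "c", "d"]
print(renamed.index.tolist())  # ['a', 'b', 'c', 'd']
```

The new index must have the same number of entries as the Series, otherwise Pandas raises an error.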