Python Tutorial: Pandas Series

Monday, 17 August 2020

Pandas Series

Pandas:

Pandas is an open-source Python library that provides high-performance data manipulation and analysis tools built on its powerful data structures. The name Pandas is derived from "panel data", an econometrics term for multidimensional data sets. Pandas is a high-level data manipulation tool developed by Wes McKinney.

Installing Pandas

Pandas can be installed via pip from PyPI. In this text we use the pip method to install Pandas. To install pandas:

Step 1. Open an Administrator command prompt: press the Win key, type cmd, then click Run as administrator.

Step 2. Click Yes on the User Account Control window to open the administrator window, then type pip install pandas

Note: An Internet connection is required.

Testing pandas in the IDLE interpreter
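A quick way to confirm the installation, sketched here for any Python prompt (IDLE included), is to import pandas and print its version. The version string you see depends on what pip installed, so no particular value is assumed.

```python
# Check that pandas can be imported and report which version is installed.
import pandas as pd

version = pd.__version__
print("pandas version:", version)
```

If the import fails with ModuleNotFoundError, the installation step above did not complete for the Python interpreter you are running.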

Series:

Series is an important data structure of pandas. It represents a one-dimensional array of indexed data. A Series object has two main components:

1.   An array of actual data

2.   An associated array of indexes or data labels.
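The two components can be inspected separately. The following is a minimal sketch; the values and labels are made up for illustration.

```python
import pandas as pd

# A small Series with explicit labels (illustrative values).
s = pd.Series([12, 23, 34], index=["a", "b", "c"])

data = s.to_numpy()       # the array of actual data
labels = list(s.index)    # the associated index labels

print(data)    # [12 23 34]
print(labels)  # ['a', 'b', 'c']
```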



 

Remember

  • It is a 1D data structure
  • It is size immutable
  • It is value mutable
  • It stores homogeneous data
  • It supports explicit indexing

Creating Series Objects

A Series object can be created in many ways using the Series() method of the pandas library, following the syntax given below:

<Series object> = <pandas_object>.Series(data = <value>, index = <value>)
For example –
S = pd.Series(data = [12,23,34], index = ['a','b','c'])

While creating a Series object using Series(), keep the following points in mind:

  • Series() most commonly takes two arguments, data and index.
  • When passed as keywords, the arguments can be given in any order.
  • The index argument is optional; if it is omitted, the default index 0, 1, 2, … is used.
  • The data can be a Python sequence (list, tuple), an ndarray, a dictionary, or a scalar value. A string passed as data is treated as a single scalar value.
  • Index labels can be values of any hashable data type, such as integers or strings.
  • We can skip the keyword data when the values are passed as the first argument.
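The points above can be sketched in a few lines. The variable names s1 to s4 are only for illustration.

```python
import pandas as pd

# Keyword arguments may appear in either order.
s1 = pd.Series(data=[12, 23, 34], index=["a", "b", "c"])
s2 = pd.Series(index=["a", "b", "c"], data=[12, 23, 34])

# The data keyword can be skipped when the values come first.
s3 = pd.Series([12, 23, 34], index=["a", "b", "c"])

# With no index given, pandas assigns the default labels 0, 1, 2, ...
s4 = pd.Series([12, 23, 34])

print(s1.equals(s2))    # True
print(s1.equals(s3))    # True
print(list(s4.index))   # [0, 1, 2]
```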

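1.   Creating an empty Series

Pass a dtype explicitly: recent pandas versions no longer default an empty Series to float64.

>>> import pandas as pd

>>> S=pd.Series(dtype="float64")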

>>> S

Series([], dtype: float64)

>>> type(S)

<class 'pandas.core.series.Series'>

Creating Series object using List-

Syntax:

<Series object> = <panda_object>.Series(<any list of values>)

Without Indexing-

Example-1:

 import pandas as pd
 S = pd.Series([3,4,5])
 print("My Series is")
 print(S)

Output –

 My Series is
 0    3
 1    4
 2    5
 dtype: int64

With index –

Example-2:

 import pandas as pd
 S = pd.Series([31,28,30], index = ['jan','feb','mar'])
 print("My Series is")
 print(S)

Output –

 My Series is
 jan    31
 feb    28
 mar    30
 dtype: int64

2.   Creating a Series from different types of data

A list, or any of the other kinds of data below, can be converted into a Series using the Series() method.

Syntax:

          <Series Object>=pd.Series(data, index=idx)

Here data supplies the values of the Series object; it can be one of the following:

  • A Python sequence
  • An ndarray
  • A Python dictionary
  • A scalar value

A Python sequence: The simplest way to create a Series object is to pass a sequence of values as the argument to Series().

Example:

>>> import pandas as pd

>>> S=pd.Series(range(5))

>>> S

Output

0    0

1    1

2    2

3    3

4    4

dtype: int64

Example:

>>> import pandas as pd

>>> S1=pd.Series([2,2.5,3,3.5,4])

>>> S1

Output

0    2.0

1    2.5

2    3.0

3    3.5

4    4.0

dtype: float64

An ndarray: The data argument can also be a NumPy ndarray.

Example:

>>> import pandas as pd

>>> import numpy as np

>>> da=np.arange(2,15,2.5)

>>> print(da)

[ 2.   4.5  7.   9.5 12.  14.5]

>>> ser=pd.Series(da)

>>> print(ser)

0     2.0

1     4.5

2     7.0

3     9.5

4    12.0

5    14.5

dtype: float64

A Python dictionary: A dictionary can also be converted into a Series. Its keys become the index labels and its values become the data.

Example

>>> import pandas as pd

>>> aDict={"Physics":87,"Chemistry":98,"Maths":76}

>>> se=pd.Series(aDict)

>>> print(se)

Output

Physics      87

Chemistry    98

Maths        76

dtype: int64
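When a dictionary is combined with an explicit index argument, the labels in the index decide which keys are used and in what order. The sketch below assumes a label, "Biology", that is not a key in the dictionary, to show what happens to a missing entry.

```python
import pandas as pd

marks = {"Physics": 87, "Chemistry": 98, "Maths": 76}

# Only the listed labels are kept, in the order given. "Biology" is not
# a key in the dictionary, so its value is NaN and the dtype is float64.
se = pd.Series(marks, index=["Maths", "Physics", "Biology"])
print(se)
```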

A scalar value: The data can be a single (scalar) value. To repeat that value over several entries, an index must be provided: the scalar value given as data is repeated to match the length of the index, which can hold one or more entries. Without an index, pandas creates a Series with a single element.

Example1:

 

>>> import pandas as pd

>>> score=pd.Series(10,index=range(0,1))

>>> score

Output:

0    10

dtype: int64

 

Example2:

>>> import pandas as pd

>>> score1=pd.Series(20,index=range(1,10,2))

>>> score1

Output

1    20

3    20

5    20

7    20

9    20

dtype: int64

Example3:

>>> import pandas as pd

>>> marks=pd.Series(88,index=['Accountancy','Business studies','English'])

>>> marks

Output:

Accountancy         88

Business studies    88

English             88

dtype: int64

Additional functionality

(i)         Specifying/adding NaN values in a Series object: Sometimes you need to create a Series object of a certain size but do not have complete data available at that time. In such cases, you can fill the missing data with a NaN (Not a Number) value. NaN is defined in the NumPy module, so you can use np.nan to specify a missing value (the older alias np.NaN was removed in NumPy 2.0).

Example:

>>> import pandas as pd

>>> import numpy as np

>>> S=pd.Series([33,np.nan,44])

>>> S

Output:

0    33.0

1     NaN

2    44.0

dtype: float64
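Once a Series holds NaN values, they can be located and counted. This is a small sketch using the isna() method of Series; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([33, np.nan, 44])

missing = s.isna()               # True where the value is NaN
n_missing = int(missing.sum())   # number of missing values

print(missing.tolist())  # [False, True, False]
print(n_missing)         # 1
print(s.dtype)           # float64, because NaN is a floating-point value
```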

(ii)        Specifying index(es) as well as data with Series( ): While creating a Series object, you can provide indexes along with the values. Both the values and the indexes are sequences, and they must have the same length.

Syntax:

<Series Object> =pandas.Series(data=None, index=None)

Example:

>>> import pandas as pd

>>> import numpy as np

>>> SubList=['English','Accounts','Bst','IP','Economics']

>>> MarkList=[54,23,65,76,46]

>>> s=pd.Series(data=MarkList,index=SubList)

>>> print(s)

Output:

English      54

Accounts     23

Bst          65

IP           76

Economics    46

dtype: int64

Example:

>>> import pandas as pd

>>> import numpy as np

>>> s1=pd.Series(data=[34,54,76],index=['A','B','C'])

>>> print(s1)

Output

A    34

B    54

C    76

dtype: int64

 

Example: A list comprehension used for specifying indexes

>>> import pandas as pd

>>> import numpy as np

>>> s2=pd.Series(range(0,20,5),index=[x for x in 'abcd'])

>>> s2

Output

a     0

b     5

c    10

d    15

dtype: int64

 

(iii)  Specify data type along with data and index

Syntax :

     <Series Object> =pandas.Series(data=None, index=None, dtype=None)

 

Example

Example.py

import pandas as pd

import numpy as np

SubList=['English','Accounts','Bst','IP','Economics']

MarkList=[54,23,65,76,46]

s=pd.Series(data=MarkList,index=SubList,dtype=np.float64)

print(s)

Output

English      54.0

Accounts     23.0

Bst          65.0

IP           76.0

Economics    46.0

dtype: float64

(iv)      Using Mathematical function/expression to create data array in Series( ):

Syntax :

     <Series Object> =pandas.Series(index=None, data=<function|expression>)

Example:

Ex3.py

import pandas as pd

import numpy as np

a=np.arange(2,20,4)

print("print..")

print(a)

s=pd.Series(index=a,data=a*2)

print("print Series")

print(s)

s1=pd.Series(index=a,data=a**2)

print("print Series")

print(s1)

Output

print..

[ 2  6 10 14 18]

print Series

2      4

6     12

10    20

14    28

18    36

dtype: int32

print Series

2       4

6      36

10    100

14    196

18    324

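dtype: int32

Note: The dtype is int32 here because np.arange() produced 32-bit integers on the Windows installation used for this output. On Linux and macOS, and with NumPy 2.0 or later on 64-bit Windows, the dtype is int64.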

Common attributes of Series objects:

Attribute         Description

Series.index      Defines the index of the Series.
Series.shape      It returns a tuple of shape of the data.
Series.dtype      It returns the data type of the data.
Series.size       It returns the size of the data.
Series.empty      It returns True if Series object is empty, otherwise returns False.
Series.hasnans    It returns True if there are any NaN values, otherwise returns False.
Series.nbytes     It returns the number of bytes in the data.
Series.ndim       It returns the number of dimensions in the data.

 

Example:

import pandas as pd

import numpy as np

SubList=['English','Accounts','Bst','IP','Economics']

MarkList=[54,np.nan,65,76,46]

s=pd.Series(data=MarkList,index=SubList,dtype=np.float64)

print("Show Series")

print(s)

print("Show Index")

print(s.index)

print("Show values")

print(s.values)

print("Show data type")

print(s.dtype)

print("Show shape")

print(s.shape)

print("Show number of bytes")

print(s.nbytes)

print("Show number of dimension")

print(s.ndim)

print("Show number of elements ")

print(s.size)

print("Show True if the Series has NaN values")

print(s.hasnans)

print("Show True if the Series is empty")

print(s.empty)

Output:

Show Series

English      54.0

Accounts      NaN

Bst          65.0

IP           76.0

Economics    46.0

dtype: float64

Show Index

Index(['English', 'Accounts', 'Bst', 'IP', 'Economics'], dtype='object')

Show values

[54. nan 65. 76. 46.]

Show data type

float64

Show shape

(5,)

Show number of bytes

40

Show number of dimension

1

Show number of elements

5

Show True if the Series has NaN values

True

Show True if the Series is empty

False

Accessing Data/Value/Elements of Series Object

Slicing is a powerful approach to retrieve subsets of data from a pandas object. A slice is written as start:end:step, where start is the first item, end marks where the slice stops, and step is the increment between items.

 

We can access elements of Series object in two ways –

Using Index:

  • Using indexing we can access an individual element of a Series object.
  • Using indexing we can access multiple elements of a Series object, which need not be contiguous.
  • Indexing can be used in two ways: labelled index and positional index.
  • In a positional index, an integer value represents the position of a specific element.
  • In a labelled index, any user-defined label is used as the index.

Using Slice:

  • Using a slice we can access a subset of a Series object containing multiple elements; with the default step of 1 these are always contiguous.
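The two kinds of index can be made explicit with the .loc (label) and .iloc (position) accessors, which pandas provides alongside the square-bracket form used in this tutorial. This is a minimal sketch; the Series and its labels are made up for illustration.

```python
import pandas as pd

s = pd.Series([12, 23, 34, 45, 55], index=["R1", "R2", "R3", "R4", "R5"])

by_label = s.loc["R3"]          # labelled index
by_position = s.iloc[2]         # positional index (counting from 0)
several = s.loc[["R1", "R4"]]   # multiple, non-contiguous elements
part = s.iloc[1:4]              # slice: positions 1, 2 and 3

print(by_label, by_position)    # 34 34
print(several.tolist())         # [12, 45]
print(part.tolist())            # [23, 34, 45]
```

Using .loc and .iloc avoids any doubt about whether an integer is meant as a label or as a position.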

Accessing Individual Data/Value/Element

To access an individual element, we provide the index of the element within square brackets after the Series object.

Syntax:

<Series object>[<Index>]

Exp-1

 import pandas as pd
 S = pd.Series(range(10,101,10))
 print("We have Accessed")
 print(S[4])

Output –

 We have Accessed
 50 
Accessing Multiple Data/Value/Elements Using Index

To access multiple elements, we provide the index of each element as a list within the square brackets of the Series object.

Syntax:

<Series object>[[<Index, Index,…>]]

Exp-1

 import pandas as pd
 S = pd.Series([12,23,34,45,55],index = ['R1','R2','R3','R4','R5'])
 print("We have Accessed")
 print(S[['R1','R4']])

Output –

 We have Accessed
 R1    12
 R4    45
 dtype: int64

Using Slicing

  • Extracting a specific part of a Series object is called slicing.
  • The subset obtained by slicing contains contiguous elements (when the step is 1).
  • With integer values, slicing is done using positions, not the index labels, of the Series object.
  • In positional slicing, the value at the end index is excluded.
  • If labels are used in slicing, then the value at the end label is also included.
  • Slicing can also be used to extract elements in reverse order.
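The difference between positional and label slicing described in the points above can be sketched as follows. The Series and its labels are illustrative, and .iloc is used to make the positional case explicit.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

by_position = s.iloc[1:3]   # positions 1 and 2; position 3 is excluded
by_label = s["b":"d"]       # labels b, c and d; the end label is included
reverse = s.iloc[::-1]      # all elements in reverse order

print(by_position.tolist())  # [20, 30]
print(by_label.tolist())     # [20, 30, 40]
print(reverse.tolist())      # [50, 40, 30, 20, 10]
```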

We can retrieve subset of series object as per syntax given below –

<Series object>[start_index : end_index : step-value]

Program-1

 import pandas as pd
 S = pd.Series([12,23,34,45,55,76,80,92,41,69,56])
 print("Slicing demo")
 print(S[2:6])

Output –

 Slicing demo
 2    34
 3    45
 4    55
 5    76
 dtype: int64

Program -2

 import pandas as pd
 S = pd.Series([12,23,34,45,55,76,80,92,41,69,56])
 print("Slicing demo")
 print(S[:])

Output –

 Slicing demo
 0     12
 1     23
 2     34
 3     45
 4     55
 5     76
 6     80
 7     92
 8     41
 9     69
 10    56
 dtype: int64

Program -3

 import pandas as pd
 S = pd.Series([12,23,34,45,55,76,80,92,41,69,56])
 print("Slicing demo")
 print(S[:4])

Output –

 Slicing demo
 0    12
 1    23
 2    34
 3    45
 dtype: int64

Program -4

 import pandas as pd
 S = pd.Series([12,23,34,45,55,76,80,92,41,69,56])
 print("Slicing demo")
 print(S[5:])

Output –

 Slicing demo
 5     76
 6     80
 7     92
 8     41
 9     69
 10    56
 dtype: int64

Program -5

 import pandas as pd
 S = pd.Series([12,23,34,45,55,76,80,92,41,69,56])
 print("Slicing demo")
 print(S[1:9:2])

Output –

 Slicing demo
 1    23
 3    45
 5    76
 7    92
 dtype: int64

Program -6

 import pandas as pd
 S = pd.Series([12,23,34,45,55,76,80,92,41,69,56])
 print("Slicing demo")
 print(S[7:15])

Output –

 Slicing demo
 7     92
 8     41
 9     69
 10    56
 dtype: int64

Note – Even if the end index is out of bounds, slicing still returns the elements that exist.

Program -7

 import pandas as pd
 S = pd.Series([12,23,34,45,55,76,80,92,41,69,56])
 print("Slicing demo")
 print(S[3:-5])

Output –

 Slicing demo
 3    45
 4    55
 5    76
 dtype: int64

Program -8

 import pandas as pd
 S = pd.Series([12,23,34,45,55,76,80,92,41,69,56])
 print("Slicing demo")
 print(S[:-6])

Output –

 Slicing demo
 0    12
 1    23
 2    34
 3    45
 4    55
 dtype: int64

Program -9

 import pandas as pd
 S = pd.Series([12,23,34,45,55,76,80,92,41,69,56])
 print("Slicing demo")
 print(S[-4:])

Output –

 Slicing demo
 7     92
 8     41
 9     69
 10    56
 dtype: int64

Program -10

 import pandas as pd
 S = pd.Series([12,23,34,45,55,76,80,92,41,69,56])
 print("Slicing demo")
 print(S[-7:-2])

Output –

 Slicing demo
 4    55
 5    76
 6    80
 7    92
 8    41
 dtype: int64

Program -11

 import pandas as pd
 S = pd.Series([12,23,34,45,55,76,80,92,41,69,56])
 print("Slicing demo")
 print(S[-4:])

Output –

 Slicing demo
 7     92
 8     41
 9     69
 10    56
 dtype: int64

Program -12

 import pandas as pd
 S = pd.Series([12,23,34,45,55,76,80,92,41,69,56])
 print("Slicing demo")
 print(S[::3])

Output –

 Slicing demo
 0    12
 3    45
 6    80
 9    69
 dtype: int64

Program -13

 import pandas as pd
 S = pd.Series([12,23,34,45,55,76,80,92,41,69,56])
 print("Slicing demo")
 print(S[::-3])

Output –

 Slicing demo
 10    56
 7     92
 4     55
 1     23
 dtype: int64

Program-14

import pandas as pd

num = [0, 100, 200, 300, 400, 500, 600, 700, 800, 900]

idx = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']

series = pd.Series(num, index=idx)

print("\n [2:4] \n")

print(series[2:4])

print("\n [1:6:2] \n")

print(series[1:6:2])

print("\n [:6] \n")

print(series[:6])

print("\n [4:] \n")

print(series[4:])

print("\n [:4:2] \n")

print(series[:4:2])

print("\n [4::2] \n")

print(series[4::2])

print("\n [::-1] \n")

print(series[::-1])

Output –

C:\python\pandas examples>python example1f.py



 [2:4]

 

C    200

D    300

dtype: int64

 

 [1:6:2]

 

B    100

D    300

F    500

dtype: int64

 

 [:6]

 

A      0

B    100

C    200

D    300

E    400

F    500

dtype: int64

 

 [4:]

 

E    400

F    500

G    600

H    700

I    800

J    900

dtype: int64

 

 [:4:2]

 

A      0

C    200

dtype: int64

 

 [4::2]

 

E    400

G    600

I    800

dtype: int64

 

 [::-1]

 

J    900

I    800

H    700

G    600

F    500

E    400

D    300

C    200

B    100

A      0

dtype: int64

Operations on Series Object

1.   Modifying Elements of Series Object:

The data values of a Series object can be easily modified through item assignment, i.e.

Using Index

<SeriesObject>[<index>]=<new_data_value>

Above assignment will change the data value of the given index in Series object.

Using Slicing

<SeriesObject>[start:stop]=<new_data_value>

Above assignment will replace all the values falling in given slice.

Example:

import numpy as np  

import pandas as pd  

a=pd.Series(data=[1,2,3,4,5,6,7,8])  

b=pd.Series(data=[4.9,8.2,5.6,3.5,6.7,2,8,9,9])  

a[3]=10

print(a)

b[2:4]=11.11

print(b)

a[1:9:2]=10

print(a)

a.index=['a','b','c','d','e','f','g','h']

print(a)

Output:

0     1

1     2

2     3

3    10

4     5

5     6

6     7

7     8

dtype: int64

0     4.90

1     8.20

2    11.11

3    11.11

4     6.70

5     2.00

6     8.00

7     9.00

8     9.00

dtype: float64

0     1

1    10

2     3

3    10

4     5

5    10

6     7

7    10

dtype: int64

a     1

b    10

c     3

d    10

e     5

f    10

g     7

h    10

dtype: int64
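Once a Series has label indexes, as a does after the last step above, the same item assignment works with labels. The following is a small sketch on a separate, illustrative Series; note that a label slice replaces the value at the end label as well.

```python
import pandas as pd

a = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])

a["b"] = 20        # change a single value by its label
a["c":"d"] = 0     # label slice: both "c" and "d" are replaced

print(a.tolist())  # [1, 20, 0, 0]
print(len(a))      # 4
```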
