Pandas:
Pandas is an open-source Python library that provides high-performance data manipulation and analysis tools built on its powerful data structures. The name Pandas is derived from "panel data", an econometrics term for multidimensional data sets. Pandas is a high-level data manipulation tool developed by Wes McKinney.
Installing Pandas
Pandas can be installed via pip from PyPI. In this text we use the pip method. To install pandas:
Step 1. Open an Administrator command prompt: press the Win key, type cmd, then click Run as administrator, as shown in the figure:
Step 2. Click Yes on the User Account Control window to open the administrator window, then type:
pip install pandas
Note: An Internet connection must be available.
Testing pandas in the IDLE interpreter
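A quick way to check the installation is to import the library and build a tiny Series. The lines below are only a minimal sketch; the variable names version and check are illustrative and do not come from the original text.

```python
import pandas as pd

# If the import succeeds, pandas is installed; the version string confirms which release.
version = pd.__version__
print("pandas version:", version)

# Build a very small Series to confirm that the library actually works.
check = pd.Series([1, 2, 3])
print(check)
```

If the import raises ModuleNotFoundError, the pip step above probably did not complete for the Python installation that IDLE is using.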
Series:
A Series is an important data structure of pandas. It represents a one-dimensional array of indexed data. A Series object has two main components:
1. An array of actual data
2. An associated array of indexes or data labels
Remember
- It is a 1D data structure
- It is size immutable
- It is value mutable
- It stores homogeneous data
- It supports explicit indexing
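The two components and the value-mutable property can be seen in a short sketch. The Series s and its labels are made up for illustration.

```python
import pandas as pd

# A Series pairs an array of data with an array of labels (the index).
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

print(s.values)  # the data array
print(s.index)   # the associated labels

# Value mutable: an existing element can be overwritten in place.
s["b"] = 99
print(s)

# The number of elements is unchanged by the assignment.
print(s.size)
```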
Creating Series Objects
A Series object can be created in many ways using the Series() method of the pandas library, as per the syntax given below:
<Series object> = <pandas_object>.Series(data = <value>, index = <value>)
For example:
S = pd.Series(data = [12,23,34], index = ['a','b','c'])
While creating a Series object using Series(), keep the following points in mind:
- Series() can take two arguments, data and index.
- When given as keyword arguments, they can be passed in any order.
- The index argument is optional.
- A Series can be made from any data, such as a Python sequence (List, Tuple, String), an ndarray, a dictionary, or a scalar value.
- Values of any data type can be assigned as the index.
- The keyword data can be skipped when assigning values to a Series object.
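The points above can be sketched as follows. The names s1 to s4 are illustrative only.

```python
import pandas as pd

# Keyword arguments may appear in either order.
s1 = pd.Series(data=[12, 23, 34], index=["a", "b", "c"])
s2 = pd.Series(index=["a", "b", "c"], data=[12, 23, 34])

# The keyword "data" can be skipped; the first positional argument is the data.
s3 = pd.Series([12, 23, 34], index=["a", "b", "c"])

# The index is optional; without it pandas numbers the elements 0, 1, 2, ...
s4 = pd.Series([12, 23, 34])

print(s1.equals(s2), s1.equals(s3))
print(list(s4.index))
```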
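1. Creating an empty Series
>>> import pandas as pd
>>> S = pd.Series()
>>> S
Series([], dtype: float64)
>>> type(S)
<class 'pandas.core.series.Series'>
Note: The dtype shown for an empty Series depends on the pandas version; pandas 2.0 and later report object instead of float64.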
Creating a Series object using a List
Syntax:
<Series object> = <pandas_object>.Series(<any list of values>)
Without indexing
Example-1:
import pandas as pd
S = pd.Series([3,4,5])
print("My Series is")
print(S)
Output:
My Series is
0    3
1    4
2    5
dtype: int64
With index
Example-2:
import pandas as pd
S = pd.Series([31,28,30], index = ['jan','feb','mar'])
print("My Series is")
print(S)
Output:
My Series is
jan    31
feb    28
mar    30
dtype: int64
2. Creating a Series using List
A list can be converted into a Series using the Series() method.
Syntax:
<Series object> = pd.Series(data, index=idx)
where data is the data part of the Series object. It can be one of the following:
- A Python sequence
- An ndarray
- A Python dictionary
- A scalar value
A Python sequence: The simplest way to create a Series object is to pass a sequence of values as the argument to Series().
Example:
>>> import pandas as pd
>>> S=pd.Series(range(5))
>>> S
Output
0 0
1 1
2 2
3 3
4 4
dtype: int64
Example:
>>> import pandas as pd
>>> S1=pd.Series([2,2.5,3,3.5,4])
>>> S1
Output
0 2.0
1 2.5
2 3.0
3 3.5
4 4.0
dtype: float64
An ndarray: The data argument can also be an ndarray.
Example:
>>> import pandas as pd
>>> import numpy as np
>>> da=np.arange(2,15,2.5)
>>> print(da)
[ 2.   4.5  7.   9.5 12.  14.5]
>>> ser=pd.Series(da)
>>> print(ser)
0 2.0
1 4.5
2 7.0
3 9.5
4 12.0
5 14.5
dtype: float64
A Python dictionary: A dictionary can also be converted into a Series. The keys become the index and the values become the data.
Example:
>>> import pandas as pd
>>> aDict={"Physics":87,"Chemistry":98,"Maths":76}
>>> se=pd.Series(aDict)
>>> print(se)
Output
Physics 87
Chemistry 98
Maths 76
dtype: int64
A scalar value: The data can be a single scalar value, but in that case the index should be provided. The index sequence can have one or more entries, and the scalar value (given as data) is repeated to match the length of the index.
Example 1:
>>> import pandas as pd
>>> score=pd.Series(10,index=range(0,1))
>>> score
Output:
0    10
dtype: int64
Example 2:
>>> import pandas as pd
>>> score1=pd.Series(20,index=range(1,10,2))
>>> score1
Output:
1    20
3    20
5    20
7    20
9    20
dtype: int64
Example 3:
>>> import pandas as pd
>>> marks=pd.Series(88,index=['Accountancy','Business studies','English'])
>>> marks
Output:
Accountancy         88
Business studies    88
English             88
dtype: int64
Additional functionality
(i) Specifying/adding NaN values in a Series object: Sometimes you need to create a Series object of a certain size but do not have complete data available at that time. In such cases, you can fill the missing data with a NaN (Not a Number) value. The legal empty value NaN is defined in the NumPy module, so you can use np.nan to specify a missing value (the older alias np.NaN was removed in NumPy 2.0).
Example:
>>> import pandas as pd
>>> import numpy as np
>>> S=pd.Series([33,np.nan,44])
>>> S
Output:
0 33.0
1 NaN
2 44.0
dtype: float64
(ii) Specifying index(es) as well as data with Series(): While creating a Series object, you can provide indexes along with the values. Both values and indexes are sequences.
Syntax:
<Series object> = pandas.Series(data=None, index=None)
Example:
>>> import pandas as pd
>>> SubList=['English','Accounts','Bst','IP','Economics']
>>> MarkList=[54,23,65,76,46]
>>> s=pd.Series(data=MarkList,index=SubList)
>>> print(s)
Output:
English 54
Accounts 23
Bst 65
IP 76
Economics 46
dtype: int64
Example:
>>> import pandas as pd
>>> s1=pd.Series(data=[34,54,76],index=['A','B','C'])
>>> print(s1)
Output:
A    34
B    54
C    76
dtype: int64
Example: Loop used for specifying indexes
>>> import pandas as pd
>>> s2=pd.Series(range(0,20,5),index=[x for x in 'abcd'])
>>> s2
Output:
a     0
b     5
c    10
d    15
dtype: int64
(iii) Specifying the data type along with data and index
Syntax:
<Series object> = pandas.Series(data=None, index=None, dtype=None)
Example:
Example.py
import pandas as pd
import numpy as np
SubList=['English','Accounts','Bst','IP','Economics']
MarkList=[54,23,65,76,46]
s=pd.Series(data=MarkList,index=SubList,dtype=np.float64)
print(s)
Output:
English      54.0
Accounts     23.0
Bst          65.0
IP           76.0
Economics    46.0
dtype: float64
(iv) Using a mathematical function/expression to create the data array in Series():
Syntax:
<Series object> = pandas.Series(index=None, data=<function|expression>)
Example:
Ex3.py
import pandas as pd
import numpy as np
a=np.arange(2,20,4)
print("print..")
print(a)
s=pd.Series(index=a,data=a*2)
print("print Series")
print(s)
s1=pd.Series(index=a,data=a**2)
print("print Series")
print(s1)
Output:
print..
[ 2  6 10 14 18]
print Series
2      4
6     12
10    20
14    28
18    36
dtype: int32
print Series
2       4
6      36
10    100
14    196
18    324
dtype: int32
Common attributes of Series objects:
Attribute | Description
Series.index | Defines the index of the Series.
Series.shape | Returns a tuple giving the shape of the data.
Series.dtype | Returns the data type of the data.
Series.size | Returns the number of elements in the data.
Series.empty | Returns True if the Series object is empty, otherwise False.
Series.hasnans | Returns True if there are any NaN values, otherwise False.
Series.nbytes | Returns the number of bytes in the data.
Series.ndim | Returns the number of dimensions of the data.
Example:
import pandas as pd
import numpy as np
SubList=['English','Accounts','Bst','IP','Economics']
MarkList=[54,np.nan,65,76,46]
s=pd.Series(data=MarkList,index=SubList,dtype=np.float64)
print("Show Series")
print(s)
print("Show Index")
print(s.index)
print("Show values")
print(s.values)
print("Show data type")
print(s.dtype)
print("Show shape")
print(s.shape)
print("Show number of bytes")
print(s.nbytes)
print("Show number of dimension")
print(s.ndim)
print("Show number of elements")
print(s.size)
print("Show True if the Series has NaN values")
print(s.hasnans)
print("Show True if the Series is empty")
print(s.empty)
Output:
Show Series
English      54.0
Accounts      NaN
Bst          65.0
IP           76.0
Economics    46.0
dtype: float64
Show Index
Index(['English', 'Accounts', 'Bst', 'IP', 'Economics'], dtype='object')
Show values
[54. nan 65. 76. 46.]
Show data type
float64
Show shape
(5,)
Show number of bytes
40
Show number of dimension
1
Show number of elements
5
Show True if the Series has NaN values
True
Show True if the Series is empty
False
Accessing Data/Value/Elements of Series Object
Slicing is a powerful approach to retrieving subsets of data from a pandas object. A slice is written as start:end:step, where start is the position of the first item, end is the position at which the slice stops, and step is the increment between successive items.
We can access elements of a Series object in two ways:
Using Index: Using indexing we can access
- An individual element of a Series object.
- Multiple elements of a Series object, which need not be contiguous.
- Indexing can be used in two ways: Labelled Index and Positional Index.
- A Positional Index is an integer value that represents a specific element.
- A Labelled Index is any user-defined label used as the index.
Using Slice: Using slicing we can access
- A subset of a Series object containing multiple elements, which are always contiguous.
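The difference between a labelled index and a positional index can be sketched with the explicit accessors .loc (labels) and .iloc (positions). These accessors are not used elsewhere in this text; they are shown here as an assumption about how a reader might make the distinction unambiguous, and the Series s is illustrative.

```python
import pandas as pd

s = pd.Series([12, 23, 34, 45, 55], index=["R1", "R2", "R3", "R4", "R5"])

# Labelled index: select by the user-defined label.
by_label = s.loc["R3"]

# Positional index: select by integer position (counting from 0).
by_position = s.iloc[2]

# Multiple, non-contiguous elements: pass a list of labels.
several = s.loc[["R1", "R4"]]

# Slice: a contiguous block of elements, here positions 1, 2 and 3.
block = s.iloc[1:4]

print(by_label, by_position)
print(several)
print(block)
```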
Accessing Individual Data/Value/Element
To access an individual element, provide the index of the element within square brackets after the Series object.
Syntax:
<Series object>[<Index>]
Example-1:
import pandas as pd
S = pd.Series(range(10,101,10))
print("We have Accessed")
print(S[4])
Output:
We have Accessed
50
Accessing Multiple Data/Value/Elements Using Index
To access multiple elements, provide the index of each element as a list within the square brackets of the Series object.
Syntax:
<Series object>[[<Index>, <Index>, ...]]
Example-1:
import pandas as pd
S = pd.Series([12,23,34,45,55], index = ['R1','R2','R3','R4','R5'])
print("We have Accessed")
print(S[['R1','R4']])
Output:
We have Accessed
R1 12
R4 45
dtype: int64
Using Slicing
- Extracting a specific part of a Series object is called slicing.
- The subset obtained after slicing contains contiguous elements.
- Slicing is done using positions, not the index of the Series object.
- In positional slicing, the value at the end index is excluded.
- If labels are used in slicing, then the value at the end label is also included.
- Slicing can also be used to extract elements in reverse order.
We can retrieve a subset of a Series object as per the syntax given below:
<Series object>[start_index : end_index : step-value]
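The rule that positional slices exclude the end while label slices include it can be checked with a short sketch. The Series s is illustrative, and the explicit accessors .iloc and .loc are used here only to make clear which kind of slice is meant.

```python
import pandas as pd

s = pd.Series([0, 100, 200, 300, 400], index=["A", "B", "C", "D", "E"])

# Positional slice: position 3 (label "D") is excluded.
pos = s.iloc[1:3]

# Label slice: the end label "D" is included.
lab = s.loc["B":"D"]

# A negative step walks through the Series in reverse order.
rev = s.iloc[::-1]

print(pos)
print(lab)
print(rev)
```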
Program-1
import pandas as pd
S = pd.Series([12,23,34,45,55,76,80,92,41,69,56])
print("Slicing demo")
print(S[2:6])
Output:
Slicing demo
2 34
3 45
4 55
5 76
dtype: int64
Program-2
import pandas as pd
S = pd.Series([12,23,34,45,55,76,80,92,41,69,56])
print("Slicing Example")
print(S[:])
Output:
Slicing Example
0 12
1 23
2 34
3 45
4 55
5 76
6 80
7 92
8 41
9 69
10 56
dtype: int64
Program-3
import pandas as pd
S = pd.Series([12,23,34,45,55,76,80,92,41,69,56])
print("Slicing demo")
print(S[:4])
Output:
Slicing demo
0 12
1 23
2 34
3 45
dtype: int64
Program-4
import pandas as pd
S = pd.Series([12,23,34,45,55,76,80,92,41,69,56])
print("Slicing demo")
print(S[5:])
Output:
Slicing demo
5 76
6 80
7 92
8 41
9 69
10 56
dtype: int64
Program-5
import pandas as pd
S = pd.Series([12,23,34,45,55,76,80,92,41,69,56])
print("Slicing demo")
print(S[1:9:2])
Output:
Slicing demo
1 23
3 45
5 76
7 92
dtype: int64
Program-6
import pandas as pd
S = pd.Series([12,23,34,45,55,76,80,92,41,69,56])
print("Slicing demo")
print(S[7:15])
Output:
Slicing demo
7 92
8 41
9 69
10 56
dtype: int64
Note: Even if the end index is out of bounds, slicing still produces a subset.
Program-7
import pandas as pd
S = pd.Series([12,23,34,45,55,76,80,92,41,69,56])
print("Slicing demo")
print(S[3:-5])
Output:
Slicing demo
3 45
4 55
5 76
dtype: int64
Program-8
import pandas as pd
S = pd.Series([12,23,34,45,55,76,80,92,41,69,56])
print("Slicing demo")
print(S[:-6])
Output:
Slicing demo
0 12
1 23
2 34
3 45
4 55
dtype: int64
Program-9
import pandas as pd
S = pd.Series([12,23,34,45,55,76,80,92,41,69,56])
print("Slicing demo")
print(S[-4:])
Output:
Slicing demo
7 92
8 41
9 69
10 56
dtype: int64
Program-10
import pandas as pd
S = pd.Series([12,23,34,45,55,76,80,92,41,69,56])
print("Slicing demo")
print(S[-7:-2])
Output:
Slicing demo
4 55
5 76
6 80
7 92
8 41
dtype: int64
Program-11
import pandas as pd
S = pd.Series([12,23,34,45,55,76,80,92,41,69,56])
print("Slicing demo")
print(S[-4:])
Output:
Slicing demo
7 92
8 41
9 69
10 56
dtype: int64
Program-12
import pandas as pd
S = pd.Series([12,23,34,45,55,76,80,92,41,69,56])
print("Slicing demo")
print(S[::3])
Output:
Slicing demo
0 12
3 45
6 80
9 69
dtype: int64
Program-13
import pandas as pd
S = pd.Series([12,23,34,45,55,76,80,92,41,69,56])
print("Slicing demo")
print(S[::-3])
Output:
Slicing demo
10 56
7 92
4 55
1 23
dtype: int64
Program-14
import pandas as pd
num = [0, 100, 200, 300, 400, 500, 600, 700, 800, 900]
idx = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']
series = pd.Series(num, index=idx)
print("\n [2:4] \n")
print(series[2:4])
print("\n [1:6:2] \n")
print(series[1:6:2])
print("\n [:6] \n")
print(series[:6])
print("\n [4:] \n")
print(series[4:])
print("\n [:4:2] \n")
print(series[:4:2])
print("\n [4::2] \n")
print(series[4::2])
print("\n [::-1] \n")
print(series[::-1])
Output:
C:\python\pandas examples>python example1f.py
[2:4]
C 200
D 300
dtype: int64
[1:6:2]
B 100
D 300
F 500
dtype: int64
[:6]
A 0
B 100
C 200
D 300
E 400
F 500
dtype: int64
[4:]
E 400
F 500
G 600
H 700
I 800
J 900
dtype: int64
[:4:2]
A 0
C 200
dtype: int64
[4::2]
E 400
G 600
I 800
dtype: int64
[::-1]
J 900
I 800
H 700
G 600
F 500
E 400
D 300
C 200
B 100
A 0
dtype: int64
Operations on Series Object
1. Modifying elements of a Series object:
The data values of a Series object can be easily modified through item assignment, i.e.
Using index
<SeriesObject>[<index>] = <new_data_value>
The above assignment changes the data value at the given index in the Series object.
Using slicing
<SeriesObject>[start:stop] = <new_data_value>
The above assignment replaces all the values falling in the given slice.
The index labels can also be replaced by assigning a new sequence of the same length to <SeriesObject>.index.
Example:
import numpy as np
import pandas as pd
a=pd.Series(data=[1,2,3,4,5,6,7,8])
b=pd.Series(data=[4.9,8.2,5.6,3.5,6.7,2,8,9,9])
# change a single element
a[3]=10
print(a)
# replace every value in the slice 2:4
b[2:4]=11.11
print(b)
# replace every second value from position 1 onwards
a[1:9:2]=10
print(a)
# relabel the index
a.index=['a','b','c','d','e','f','g','h']
print(a)
Output:
0 1
1 2
2 3
3 10
4 5
5 6
6 7
7 8
dtype: int64
0 4.90
1 8.20
2 11.11
3 11.11
4 6.70
5 2.00
6 8.00
7 9.00
8 9.00
dtype: float64
0 1
1 10
2 3
3 10
4 5
5 10
6 7
7 10
dtype: int64
a 1
b 10
c 3
d 10
e 5
f 10
g 7
h 10
dtype: int64
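The same kind of assignment also works with a labelled index. The sketch below is illustrative: the Series marks and its labels are made up, and .loc is used to make the label-based slice explicit (with labels, the end label is included).

```python
import pandas as pd

marks = pd.Series([54, 23, 65, 76], index=["English", "Accounts", "Bst", "IP"])

# Modify one element through its label.
marks["Accounts"] = 40

# Modify a slice of labels; both "Bst" and "IP" are replaced.
marks.loc["Bst":"IP"] = 80

print(marks)
```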