DATAFRAME DATA STRUCTURE
A DataFrame is another pandas data structure, which stores data in a
two-dimensional way. It is a two-dimensional labeled array, similar to a table
or spreadsheet: an ordered collection of columns, where each column may store a
different type of data, e.g. integer, floating point, string or Boolean.
Representing a DataFrame as a two-dimensional array with heterogeneous data:
Name | English | Maths | Physics | Chemistry | IP
Sachin | 56 | 76 | 56 | 78 | 54
Depesh | 76 | 98 | 43 | 76 | 65
Nitin | 34 | 65 | 87 | 32 | 87
Tarun | 98 | 87 | 98 | 76 | 87
1. It has two indexes, or we can say two axes: a row index (axis=0) and a
column index (axis=1).
2. Conceptually it is like a spreadsheet, where each value is identifiable by
the combination of its row index and column index. The row index is generally
known as the index and the column index is called the column name.
3. The indexes can be numbers, letters or strings.
4. There is no requirement that all data be of the same type across columns;
its columns can hold data of different types.
5. You can easily change its values, i.e. it is value-mutable.
6. You can add or delete rows/columns in a DataFrame. In other words, it is
size-mutable.
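
The properties listed above can be tried out on the marks table. The following is a minimal sketch, not a definitive recipe: it assumes the student names are used as the row index, and the variable names (marks, nitin_physics) and the Total column are introduced only for illustration.

```python
import pandas as pd

# Marks table from the section above; the student names serve as the row index.
marks = pd.DataFrame(
    {
        "English": [56, 76, 34, 98],
        "Maths": [76, 98, 65, 87],
        "Physics": [56, 43, 87, 98],
        "Chemistry": [78, 76, 32, 76],
        "IP": [54, 65, 87, 87],
    },
    index=["Sachin", "Depesh", "Nitin", "Tarun"],
)

# A value is identified by its row label and its column label.
nitin_physics = marks.loc["Nitin", "Physics"]
print(nitin_physics)

# Value-mutable: overwrite a single cell (illustrative new value).
marks.loc["Sachin", "IP"] = 60

# Size-mutable: add a column (sum across each row), then drop a row.
marks["Total"] = marks.sum(axis=1)
marks = marks.drop("Tarun", axis=0)

print(marks)
```

Here axis=1 in sum() adds the values across the columns of each row, while axis=0 in drop() tells pandas that "Tarun" is a row label.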
Creating and displaying a DataFrame
A DataFrame object can be created by passing data in a two-dimensional format.
As before, make sure to import the pandas and NumPy modules before you do
anything with pandas, i.e. give the following two import statements in your
code or IPython console:
import pandas as pd
import numpy as np
To create a DataFrame object, you can use the following syntax:
<dataFrameObject> = pd.DataFrame(<data structure>, [columns=<column sequence>], [index=<index sequence>])
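
As a quick illustration of this syntax, here is a small sketch that passes a list of rows together with both optional arguments. The data and the row labels r1, r2, r3 are made up for this example.

```python
import pandas as pd

# Hypothetical data: each inner list is one row of the table.
rows = [["Sachin", 54], ["Divya", 76], ["Parul", 99]]

# columns= names the columns, index= names the rows.
df = pd.DataFrame(rows, columns=["Student", "Marks"], index=["r1", "r2", "r3"])

print(df)
print(df.shape)  # (number of rows, number of columns)
```

If columns or index is left out, pandas numbers that axis 0, 1, 2, ... by default.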
Creating a DataFrame from a 2D dictionary having values as lists/ndarrays:
Example: save the file as df.py
import pandas as pd
student=["Sachin","divya","parul","Anshika","Ritesh","Ashish","Utkarsh","Saurya"]
marks=[54,76,99,76,54,87,54,87]
d={"Students":student,"Marks":marks}
print("Show dictionary")
print(d)
df=pd.DataFrame(d)
print("Data Frame List")
print(df)
Here, the 2D dictionary has values that are either lists or ndarrays. Passing
such a 2D dictionary to DataFrame() creates a DataFrame object, as you can see
in the example above. The dictionary keys become the column names.
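
The example above uses lists as the dictionary values. The sketch below does the same with NumPy arrays and also shows what happens when the values differ in length. The shortened data and the variable name length_error are chosen only for this illustration.

```python
import numpy as np
import pandas as pd

# Dictionary values given as ndarrays instead of lists.
students = np.array(["Sachin", "Divya", "Parul"])
marks = np.array([54, 76, 99])

df = pd.DataFrame({"Students": students, "Marks": marks})
print(df)

# All values must have the same length; otherwise pandas raises a ValueError.
length_error = None
try:
    pd.DataFrame({"Students": students, "Marks": np.array([54, 76])})
except ValueError as exc:
    length_error = exc
print("Unequal lengths raise:", length_error)
```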
Example: save the file as df1.py
import pandas as pd
SData={"name":['Vinita','Ankita','Amit','Sandeep','Vanshika','Jyotsna'],
       'Accounts':[54,76,98,54,76,87],'English':[89,87,54,89,43,67],
       'Bst':[65,67,87,56,87,54]}
print(SData)
print("Convert dictionary to dataframe")
df=pd.DataFrame(SData)
print(df)
Example: using the index argument of the DataFrame() function.
import pandas as pd
SData={"name":['Vinita','Ankita','Amit','Sandeep','Vanshika','Jyotsna'],
       'Accounts':[54,76,98,54,76,87],'English':[89,76,54,89,43,67],
       'Bst':[65,67,87,56,32,54]}
print(SData) # print the dictionary
print("Convert dictionary to dataframe")
df=pd.DataFrame(SData, index=['I','II','III','IV','V','VI']) # create the DataFrame with custom row labels
print(df) # print the DataFrame
Creating a DataFrame from a 2D dictionary having values as dictionary objects:
import pandas as pd
yr2017={'Qtr1':34500,'Qtr2':56000,'Qtr3':47000,'Qtr4':49000}
yr2018={'Qtr1':44500,'Qtr2':66000,'Qtr3':57000,'Qtr4':59000}
yr2019={'Qtr1':24500,'Qtr2':36000,'Qtr3':37000,'Qtr4':44000}
yr2020={'Qtr1':64500,'Qtr2':76000,'Qtr3':47000,'Qtr4':59000}
Sales={2017:yr2017,2018:yr2018,2019:yr2019,2020:yr2020}
print("Combined Dictionary")
print(Sales)
print("\nConvert dictionary to DataFrame")
df=pd.DataFrame(Sales)
print(df)
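
With a dictionary of dictionaries, the outer keys become the column labels and the inner keys become the row index. The sketch below illustrates this and, as an assumption made only for this example, leaves Qtr4 out of the 2018 data to show how pandas fills a missing inner key.

```python
import pandas as pd

yr2017 = {"Qtr1": 34500, "Qtr2": 56000, "Qtr3": 47000, "Qtr4": 49000}
# Assumption for illustration: the Qtr4 figure for 2018 is not available.
yr2018 = {"Qtr1": 44500, "Qtr2": 66000, "Qtr3": 57000}

df = pd.DataFrame({2017: yr2017, 2018: yr2018})

print(df)
print(df.columns.tolist())  # outer keys
print(df.index.tolist())    # inner keys
print(df.isna().sum())      # number of missing values per column
```

The missing entry is stored as NaN, which is why the 2018 column is converted to a floating-point type.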
Creating a DataFrame object from a 2-D ndarray: You can also pass a
two-dimensional NumPy array to DataFrame() to create a DataFrame object.
Example:
>>> import pandas as pd
>>> import numpy as np
>>> arr=np.array([[1,2,3],[4,5,6]],np.int32)
>>> arr.shape
(2, 3)
>>> df=pd.DataFrame(arr)
>>> df
Output
   0  1  2
0  1  2  3
1  4  5  6
Example: specifying column names and/or index names by giving a columns
sequence and/or an index sequence:
>>> import numpy as np
>>> import pandas as pd
>>> arr=np.array([[1,2,3],[4,5,6]],np.int32)
>>> df=pd.DataFrame(arr,columns=['One','Two','Three'])
>>> df
Example: In this example, the columns and indexes have labels as per the given
columns and index sequences respectively.
>>> arr2=np.array([[11.5,21.2,33.8],[40,50,60],[212.3,301.5,405.2]])
>>> df1=pd.DataFrame(arr2, columns=['First','Second','Third'], index=['A','B','C'])
>>> df1
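
To compare the default labels with explicitly given ones, the following sketch builds two DataFrames from the same array. The names default_df and labeled_df and the row labels A and B are chosen only for this illustration.

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2, 3], [4, 5, 6]], np.int32)

# Without columns/index, both axes are numbered from 0.
default_df = pd.DataFrame(arr)
print(default_df.index.tolist(), default_df.columns.tolist())

# With explicit labels on both axes.
labeled_df = pd.DataFrame(arr, columns=["One", "Two", "Three"], index=["A", "B"])

print(labeled_df.loc["B", "Three"])  # access by label
print(labeled_df.iloc[1, 2])         # access by position, same cell
print(labeled_df.dtypes)             # the int32 type of the array is kept
```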
Data types of DataFrame columns: In pandas, many DataFrames have mixed data
types, that is, some columns hold strings while others hold integers, floats,
dates etc. You can check the type of each column with the .dtypes property of
the DataFrame.
>>> import pandas as pd
>>> data={'Name':['Sachin','Varun','Faiz','Trisha'], 'Marks':[76,98,65.5,54]}
>>> df=pd.DataFrame(data)
>>> df.dtypes
Output:
Name      object
Marks    float64
dtype: object
>>> df['Name'].dtypes
dtype('O')
>>> df['Marks'].dtypes
dtype('float64')
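
The Marks column above is float64 because one of its values (65.5) is a float, so the whole column is stored as floating point. As a hedged sketch, a column's type can be changed with astype(); rounding the marks first is an assumption made only for this example.

```python
import pandas as pd

data = {"Name": ["Sachin", "Varun", "Faiz", "Trisha"], "Marks": [76, 98, 65.5, 54]}
df = pd.DataFrame(data)

# A single float value makes the whole column float64.
original_dtype = df["Marks"].dtype
print(original_dtype)

# Assumption for illustration: round the marks and store them as integers.
df["Marks"] = df["Marks"].round().astype("int64")
print(df.dtypes)
```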
Rearranging the order of columns
>>> df=pd.DataFrame(data, columns=['Marks','Name'])
>>> df
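
The columns argument sets the order when the DataFrame is created. For a DataFrame that already exists, the columns can be reordered as well. The sketch below shows two common ways; the names reordered and same_result are chosen only for this illustration.

```python
import pandas as pd

data = {"Name": ["Sachin", "Varun", "Faiz", "Trisha"], "Marks": [76, 98, 65.5, 54]}
df = pd.DataFrame(data)  # column order follows the dictionary: Name, Marks

# Option 1: select the columns in the desired order.
reordered = df[["Marks", "Name"]]

# Option 2: reindex along the columns axis.
same_result = df.reindex(columns=["Marks", "Name"])

print(reordered)
print(reordered.equals(same_result))
```

Both options return a new DataFrame and leave df itself unchanged.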