Python Tutorial: Pandas DataFrame-I

Wednesday, 19 August 2020

Pandas DataFrame-I

DATAFRAME DATA STRUCTURE

A DataFrame is another pandas data structure, which stores data in a two-dimensional way. It is a two-dimensional labeled array, like a table or spreadsheet: an ordered collection of columns, where each column may store a different type of data, e.g. numeric, string, floating point, or Boolean.

Representing a DataFrame as a two-dimensional array with heterogeneous data:

Name      English   Maths   Physics   Chemistry   IP
Sachin    56        76      56        78          54
Depesh    76        98      43        76          65
Nitin     34        65      87        32          87
Tarun     98        87      98        76          87

 Characteristics

1.   It has two indexes, or we can say two axes: a row index (axis=0) and a column index (axis=1).

2.   Conceptually it is like a spreadsheet where each value is identifiable by the combination of row index and column index. The row index is known as the index in general, and the column index is called the column name.

3.   The index labels can be numbers, letters, or strings.

4.   There is no requirement that all data be of the same type across columns; its columns can hold data of different types.

5.   You can easily change its values i.e. it is value-mutable.

6.   You can add or delete row/columns in a DataFrame. In other words, it is size-mutable.
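The characteristics above can be checked with a short sketch. The two-row marks table here is a hypothetical cut-down version of the table shown earlier, and the added Physics column is only for illustration.

```python
import pandas as pd

# Hypothetical marks table, loosely based on the one above.
df = pd.DataFrame(
    {"English": [56, 76], "Maths": [76, 98]},
    index=["Sachin", "Depesh"],
)

# Two axes: the row index (axis=0) and the column index (axis=1).
print(df.index.tolist())    # ['Sachin', 'Depesh']
print(df.columns.tolist())  # ['English', 'Maths']

# Value-mutable: overwrite a single cell by row label and column label.
df.loc["Sachin", "English"] = 60

# Size-mutable: add a column, then drop a row.
df["Physics"] = [56, 43]
df = df.drop(index="Depesh")

print(df.shape)  # (1, 3)
```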

Creating and displaying a DataFrame

A DataFrame object can be created by passing data in a two-dimensional format. As before, make sure to import the pandas and NumPy modules before you do anything with pandas, i.e. give the following two import statements in your code or IPython console:

import pandas as pd

import numpy as np

To create a DataFrame object, you can use syntax as:

<dataFrameObject> = pd.DataFrame(<data structure>, [columns=<column sequence>], [index=<index sequence>])

 

Creating a DataFrame from a 2D dictionary having values as lists/ndarrays:

Example :

Save the file as df.py

import pandas as pd

student=["Sachin","divya","parul","Anshika","Ritesh","Ashish","Utkarsh","Saurya"]

marks=[54,76,99,76,54,87,54,87]

d={"Students":student,"Marks":marks}

print("Show dictionary")

print(d)

df=pd.DataFrame(d)

print("Data Frame List")

print(df)

Here, you have a 2D dictionary in which each value is a list or an ndarray. Passing such a 2D dictionary to DataFrame() creates a DataFrame object, as you can see in the example above.
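The example above uses only lists as values. As a minimal sketch of the ndarray case, the following mixes a list and a NumPy array in one dictionary; the shortened student list is hypothetical. It also shows what happens when the value sequences differ in length.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one column from a list, one from a NumPy array.
students = ["Sachin", "divya", "parul"]
marks = np.array([54, 76, 99])

df = pd.DataFrame({"Students": students, "Marks": marks})
print(df)

# All value sequences must have the same length; otherwise pandas raises ValueError.
try:
    pd.DataFrame({"Students": students, "Marks": [54, 76]})
except ValueError as exc:
    print("Length mismatch:", exc)
```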

Example:

Save the file as df1.py

import pandas as pd

SData={"name":['Vinita','Ankita','Amit','Sandeep','Vanshika','Jyotsna'],\

       'Accounts':[54,76,98,54,76,87],'English':[89,87,54,89,43,67],\

       'Bst':[65,67,87,56,87,54]}

print(SData)

print("Convert dictionary to dataframe")

df=pd.DataFrame(SData)

print(df)

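Output (DataFrame part):

       name  Accounts  English  Bst
0    Vinita        54       89   65
1    Ankita        76       87   67
2      Amit        98       54   87
3   Sandeep        54       89   56
4  Vanshika        76       43   87
5   Jyotsna        87       67   54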

Example: using the index argument of the DataFrame() function.

import pandas as pd

SData={"name":['Vinita','Ankita','Amit','Sandeep','Vanshika','Jyotsna'],\

       'Accounts':[54,76,98,54,76,87],'English':[89,76,54,89,43,67],\

       'Bst':[65,67,87,56,32,54]}

print(SData)   # print dictionary

print("Convert dictionary to dataframe")

df=pd.DataFrame(SData, index=['I','II','III','IV','V','VI'])   # create DataFrame with custom row labels

print(df)

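Output (DataFrame part):

         name  Accounts  English  Bst
I      Vinita        54       89   65
II     Ankita        76       76   67
III      Amit        98       54   87
IV    Sandeep        54       89   56
V    Vanshika        76       43   32
VI    Jyotsna        87       67   54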
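One practical effect of a custom index is that rows can be looked up by label. The sketch below uses a hypothetical three-row subset of the data above and compares label-based lookup (.loc) with position-based lookup (.iloc).

```python
import pandas as pd

# Hypothetical three-row subset of SData.
SData = {"name": ["Vinita", "Ankita", "Amit"], "Accounts": [54, 76, 98]}
df = pd.DataFrame(SData, index=["I", "II", "III"])

print(df.loc["II", "name"])  # Ankita  (lookup by row label)
print(df.iloc[1]["name"])    # Ankita  (lookup by integer position)
```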

Creating a DataFrame from a 2D dictionary having values as dictionary objects:

import pandas as pd

yr2017={'Qtr1':34500,'Qtr2':56000,'Qtr3':47000,'Qtr4':49000}

yr2018={'Qtr1':44500,'Qtr2':66000,'Qtr3':57000,'Qtr4':59000}

yr2019={'Qtr1':24500,'Qtr2':36000,'Qtr3':37000,'Qtr4':44000}

yr2020={'Qtr1':64500,'Qtr2':76000,'Qtr3':47000,'Qtr4':59000}

Sales={2017:yr2017,2018:yr2018,2019:yr2019,2020:yr2020}

print("Combined Dictionary")

print(Sales)

print("\nConvert dictionary to DataFrame")

df=pd.DataFrame(Sales)

print(df)

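Output (DataFrame part):

       2017   2018   2019   2020
Qtr1  34500  44500  24500  64500
Qtr2  56000  66000  36000  76000
Qtr3  47000  57000  37000  47000
Qtr4  49000  59000  44000  59000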
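With a dictionary of dictionaries, the outer keys become the column labels and the inner keys become the row index. The following sketch illustrates this with a hypothetical two-year version of the sales data in which 2018 deliberately has no Qtr4 entry, to show how a missing inner key is handled.

```python
import pandas as pd

# Hypothetical figures; yr2018 deliberately has no 'Qtr4' entry.
yr2017 = {"Qtr1": 34500, "Qtr2": 56000, "Qtr3": 47000, "Qtr4": 49000}
yr2018 = {"Qtr1": 44500, "Qtr2": 66000, "Qtr3": 57000}

df = pd.DataFrame({2017: yr2017, 2018: yr2018})

print(df.columns.tolist())  # outer keys: [2017, 2018]
print(df.index.tolist())    # inner keys: ['Qtr1', 'Qtr2', 'Qtr3', 'Qtr4']
print(df)                   # the missing 2018/Qtr4 cell shows as NaN
```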

Creating a DataFrame Object from a 2-D ndarray : You can also pass a two-dimensional NumPy array to DataFrame() to create a dataframe object.

Example:

>>> import pandas as pd

>>> import numpy as np

>>> arr=np.array([[1,2,3],[4,5,6]],np.int32)

>>> arr.shape

(2, 3)

>>> df=pd.DataFrame(arr)

>>> df

Output

   0  1  2
0  1  2  3
1  4  5  6

 

Example: using column names and/or index names by giving a columns sequence and/or an index sequence:

>>> import numpy as np

>>> import pandas as pd

>>> arr=np.array([[1,2,3],[4,5,6]],np.int32)

>>> df=pd.DataFrame(arr,columns=['One','Two','Three'])

>>> df

Output

   One  Two  Three
0    1    2      3
1    4    5      6

Example: In this example, the columns and the index have labels as per the given columns and index sequences respectively.

>>> arr2=np.array([[11.5,21.2,33.8],[40,50,60],[212.3,301.5,405.2]])

>>> df1=pd.DataFrame(arr2, columns=['First','Second','Third'], index=['A','B','C'])

>>> df1

Output

   First  Second  Third
A   11.5    21.2   33.8
B   40.0    50.0   60.0
C  212.3   301.5  405.2
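To confirm that the labels were applied, you can inspect the index and columns attributes and look up a cell by its labels. This is a small sketch reusing the array from the example; the final part shows what happens when the number of labels does not match the array's shape.

```python
import numpy as np
import pandas as pd

arr2 = np.array([[11.5, 21.2, 33.8], [40, 50, 60], [212.3, 301.5, 405.2]])
df1 = pd.DataFrame(arr2, columns=["First", "Second", "Third"], index=["A", "B", "C"])

print(df1.index.tolist())      # ['A', 'B', 'C']
print(df1.columns.tolist())    # ['First', 'Second', 'Third']
print(df1.loc["B", "Second"])  # 50.0

# The number of labels must match the array's shape, otherwise pandas raises ValueError.
try:
    pd.DataFrame(arr2, columns=["First", "Second"])
except ValueError as exc:
    print("Shape mismatch:", exc)
```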

Data Types of DataFrame columns: In pandas, many DataFrames have mixed data types, that is, some columns hold strings and others hold integers, floats, dates, etc. You can check the type of each column with the .dtypes property of the DataFrame.

>>> import pandas as pd

>>> data={'Name':['Sachin','Varun','Faiz','Trisha'], 'Marks':[76,98,65.5,54]}

>>> df=pd.DataFrame(data)

>>> df.dtypes

Output:

Name      object

Marks    float64

dtype: object

>>> df['Name'].dtypes

dtype('O')

>>> df['Marks'].dtypes

dtype('float64')
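The Marks column above is reported as float64 even though most of its values are whole numbers. A rough sketch of why: a column has one dtype, so a single float value (65.5) makes the whole column float64. The all-integer list and the astype() conversion below are illustrative additions, not part of the original example.

```python
import pandas as pd

# All-integer marks are stored as int64 ...
ints = pd.DataFrame({"Marks": [76, 98, 65, 54]})
print(ints["Marks"].dtype)   # int64

# ... but a single float value makes the whole column float64.
mixed = pd.DataFrame({"Marks": [76, 98, 65.5, 54]})
print(mixed["Marks"].dtype)  # float64

# astype() converts a column explicitly; here the fraction is truncated.
as_int = mixed["Marks"].astype("int64")
print(as_int.tolist())       # [76, 98, 65, 54]
```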

Rearranging the order of Columns

>>> df=pd.DataFrame(data, columns=['Marks','Name'])

>>> df

Output:

   Marks    Name
0   76.0  Sachin
1   98.0   Varun
2   65.5    Faiz
3   54.0  Trisha
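Two related behaviours are worth knowing, sketched here with a hypothetical two-row version of the data. A name in the columns sequence that is not a key of the dictionary (Grade below is made up for illustration) produces a column of missing values. An existing DataFrame can also be reordered by selecting its columns in the desired order.

```python
import pandas as pd

data = {"Name": ["Sachin", "Varun"], "Marks": [76, 98]}

# Reorder at construction time; 'Grade' is a hypothetical column that is not in data.
df = pd.DataFrame(data, columns=["Marks", "Name", "Grade"])
print(df.columns.tolist())         # ['Marks', 'Name', 'Grade']
print(df["Grade"].isna().all())    # True: no data was supplied for it

# Reorder an existing DataFrame by selecting columns in the desired order.
reordered = df[["Name", "Marks"]]
print(reordered.columns.tolist())  # ['Name', 'Marks']
```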

 

