Python Tutorial: Pandas DataFrame-Pivot and Pivot Table

Wednesday, 30 September 2020

Pandas DataFrame-Pivot and Pivot Table

                           DataFrame-Pivot  and Pivot Table

Pivoting DataFrame

Pivot tables are useful for summarizing data.  A pivot table allows us to extract the significance from a large, detailed data set. Pivot table are particularly useful if you have long rows or columns holding values that you need to track the sums of and easily compare to one another. They can automatically sort, count, total or average data stored in one table. Then, they can show the results of those actions in a new table of that summarized data inside DataFrame.

In a general, pivoting means to use unique values from specified index/columns to form axes of the resulting DataFrame. We can get pandas to from a pivot table for out DataFrame by calling the pivot()  or Pivot_table() methods and providing parameters  about how we would like the resulting table to be organized.

There are two functions available in Python Pandas for pivoting DataFrame.

1.   pivot()

2.   pivot_table()

1.   pivot()

pivot() method creates a new dataframe after reshaping the data based on column values. This method takes 3 arguments-index, columns and  values. Out of these three arguments, a minimum of two arguments have to be given.

        pandas.pivot(index, columns, values)

        Hear,

·         Index is the column name to use in order to make new DataFrame’s index.

·         Columns is the column name to use to make new DataFrame’s columns.

·         Values is the column name to use to make new DataFrame’s values. The values parameter accepts both numeric and string.

Example

import pandas as pd

ClassXII={'Name':['Arshiya','Taran','Deependra','Rishabh','Madhav','Akshra'],\

          'Subject':['Accounts','Economics','Accounts','Economics','Accounts','Economics'],\

          'Score':[99,87,98,67,43,98],'Grade':['A1','B2','A1','B2','C1','A1']}

df=pd.DataFrame(ClassXII,columns=['Name','Subject','Score','Grade'])

print(df)

print("To Create pivot Table:\n")

pv=df.pivot(index='Name',columns='Subject',values='Score')

print(pv)

Output:


As can seen, the value of Score for every row in the original table has been transferred to the new table, where its row  and column match the  Name and Subject  of its original row. Also notice that many of the values are NaN. This is because many of the positions of the table do not have matching information from the original DataFrame and data are set with NaN(None).

Using Pivot with .fillna()

If you want to avoid the NaN value is your new DataFrame, you can use pivot() with the .fillna(‘’) function.

This that :

pv=df.pivot(index='Name',columns='Subject',values='Score').fillna

 

Example: Write source code to ‘score’ using .fillna(‘’) function.

import pandas as pd

ClassXII={'Name':['Arshiya','Taran','Deependra','Rishabh','Madhav','Akshra'],\

          'Subject':['Accounts','Economics','Accounts','Economics','Accounts','Economics'],\

          'Score':[99,87,98,67,43,98],'Grade':['A1','B2','A1','B2','C1','A1']}

df=pd.DataFrame(ClassXII,columns=['Name','Subject','Score','Grade'])

print(df)

print("To Create pivot Table:\n")

pv=df.pivot(index='Name',columns='Subject',values='Score').fillna('')

print(pv)

 

Output:


Example : Let us to create pivot table to display the Grade for every row in the original table using values= ‘Grade’”

import pandas as pd

ClassXII={'Name':['Arshiya','Taran','Deependra','Rishabh','Madhav','Akshra'],\

          'Subject':['Accounts','Economics','Accounts','Economics','Accounts','Economics'],\

          'Score':[99,87,98,67,43,98],'Grade':['A1','B2','A1','B2','C1','A1']}

df=pd.DataFrame(ClassXII,columns=['Name','Subject','Score','Grade'])

print(df)

print("To Create pivot Table:\n")

pv=df.pivot(index='Name',columns='Subject',values='Grade').fillna('')

print(pv)

Output:


Pivoting By Multiple Columns:

To perform the pivoting operation on the basis of multiple columns, ‘values’ attribute is to be skipped. If you skip the values argument, it will consider the rest of the columns for values automatically.

Example:

import pandas as pd

ClassXII={'Name':['Arshiya','Taran','Deependra','Rishabh','Madhav','Akshra'],\

          'Subject':['Accounts','Economics','Accounts','Economics','Accounts','Economics'],\

          'Score':[99,87,98,67,43,98],'Grade':['A1','B2','A1','B2','C1','A1']}

df=pd.DataFrame(ClassXII,columns=['Name','Subject','Score','Grade'])

print(df)

print("To Create pivot Table:\n")

pv=df.pivot(index='Name', columns='Subject')#value attribute skipped

print(pv)

Output:


Duplicate Entry Problem in a pivot table:

Let use create a new data set with duplicate entries:

Example :

import pandas as pd

ClassXII={'Name':['Arshiya','Akshra','Deependra','Rishabh','Madhav','Akshra'],\

          'Subject':['Accounts','Economics','Accounts','Economics','Accounts','Economics'],\

          'Score':[99,87,98,67,43,98],'Grade':['A1','B2','A1','B2','C1','A1']}

df=pd.DataFrame(ClassXII,columns=['Name','Subject','Score','Grade'])

print(df)

Only  the Name is duplicated. Here, if we apply the pivot()  function, it won’t produce any valueError as shown below:



Using stack and unstack Methods

Reshape using Stack() and unstack() function in Pandas python: Reshaping the data using stack() function in pandas converts the data into stacked format .i.e. the column is stacked row wise. When more than one column header is present we can stack the specific column header by specified the level. unstack() function in pandas converts the data into unstacked format. Let’s see with an example.

 

Score

Grade

Subject  

Accounts

Economics

Accounts

Economics

Name                                          

 

 

 

 

Akshra 

NaN     

98.0     

NaN       

A1

Arshiya      

99.0      

NaN      

A1      

NaN

Deependra    

98.0      

NaN      

A1      

NaN

Madhav       

43.0      

NaN     

 C1      

NaN

Rishabh       

NaN     

67.0    

 NaN       

B2

Taran         

NaN     

87.0     

NaN       

B2

Akshra 

NaN     

98.0     

NaN       

A1

Arshiya      

99.0      

NaN      

A1      

NaN

Example:

# pivot table

import pandas as pd

ClassXII={'Name':['Arshiya','Taran','Deependra','Rishabh','Madhav','Akshra'],\

          'Subject':['Accounts','Economics','Accounts','Economics','Accounts','Economics'],\

          'Score':[99,87,98,67,43,98],'Grade':['A1','B2','A1','B2','C1','A1']}

df=pd.DataFrame(ClassXII,columns=['Name','Subject','Score','Grade'])

print(df)

print("To Create pivot Table:\n")

pv=df.pivot(index='Name',columns='Subject')

print(pv)

print(pv.stack()) 

Output:


Pivoting using Multiple Column with Filtering

In the last example, we have seen that there are many indices created with pivoting on multiple column and the values are displayed one after the other for both subject column and the grades for each student on the basis of their names. This repetition can be avoided and we can obtain the dataset can be filtered using .fillna() function based on Marks and Grades respectively.



Pivot Table using pivot_table() Method

 A bit confusingly, pandas DataFrames also come with a pivot_table() method,which is a generalization of the pivot() method. Whenever you have duplicate values for one index/column pair, you need to use the pivot_table().

Parameters:

data

DataFrame

values      

column to aggregate, optional

Index     :

 column, Grouper, array, or list of the previous

columns:

 column, Grouper, array, or list of the previous

aggfunc:

 function, list of functions, dict, default numpy.mean

-> If list of functions passed, the resulting pivot table will have hierarchical columns whose top level are the function names.
-> If dict is passed, the key is column to aggregate and value is function or list of functions

 

fill_value[scalar, default None] : 

Value to replace missing values with

margins[boolean, default False] : 

Add all row / columns (e.g. for subtotal / grand totals)

dropna[boolean, default True] : 

Do not include columns whose entries are all NaN

margins_name[string, default ‘All’] : 

Name of the row / column that will contain the totals when margins is True.

 

Returns:

 DataFrame

 

Example : Create DataFrame

import pandas as pd

import numpy as np

#Create a DataFrame

d = {'Name':['Alisa','Bobby','Cathrine','Alisa','Bobby','Cathrine',

            'Alisa','Bobby','Cathrine','Alisa','Bobby','Cathrine'],

    'Exam':['Semester 1','Semester 1','Semester 1','Semester 1','Semester 1','Semester 1','Semester 2','Semester 2','Semester 2','Semester 2','Semester 2','Semester 2'],   

'Subject':['Mathematics','Mathematics','Mathematics','Science','Science','Science','Mathematics','Mathematics','Mathematics','Science','Science','Science'],'Score':[62,47,55,74,31,77,85,63,42,67,89,81]}

df = pd.DataFrame(d,columns=['Name','Exam','Subject','Score'])

print(df)

 

The resultant dataframe will be



Create pivot table in pandas python with aggregate function mean:

# pivot table using aggregate function mean
pd.pivot_table(df, index=['Exam','Subject'], aggfunc='mean')

So the pivot table with aggregate function mean will be


Which shows the sum of scores of students across subjects

Create pivot table in Pandas python with aggregate function count:

# pivot table using aggregate function count
 
pd.pivot_table(df, index=['Exam','Subject'], aggfunc='count')

So the pivot table with aggregate function count will be


Which shows the count of student who appeared for the exam of different subject

  Differentiate between pivot() and pivot_table()

pivot() method

pivot_table() method

The index column of pivot() method does not accept duplicate rows with values for specified columns

The pivot_table() accepts duplicate values for one index/column pair you need to use.

Pivot() allows both numeric and string type as “values=”,

pivo_table() only allow numerical types as :”values=”

Pivot() is used for pivoting without aggregation

pivot_tabel() work with duplicate values by aggregating them.

No comments:

Post a Comment