DataFrame-Pivot and Pivot Table

Pivoting DataFrame

Pivot tables are useful for summarizing data. A pivot table allows us to extract the significance from a large, detailed data set. Pivot table are particularly useful if you have long rows or columns holding values that you need to track the sums of and easily compare to one another. They can automatically sort, count, total or average data stored in one table. Then, they can show the results of those actions in a new table of that summarized data inside DataFrame.

In a general, pivoting means to use unique values from specified index/columns to form axes of the resulting DataFrame. We can get pandas to from a pivot table for out DataFrame by calling the pivot() or Pivot_table() methods and providing parameters about how we would like the resulting table to be organized.

There are two functions available in Python Pandas for pivoting DataFrame.

1. pivot()

2. pivot_table()

1. pivot()

pivot() method creates a new dataframe after reshaping the data based on column values. This method takes 3 arguments-index, columns and values. Out of these three arguments, a minimum of two arguments have to be given.

pandas.pivot(index, columns, values)

Hear,

· Index is the column name to use in order to make new DataFrame’s index.

· Columns is the column name to use to make new DataFrame’s columns.

· Values is the column name to use to make new DataFrame’s values. The values parameter accepts both numeric and string.

Example

import pandas as pd

ClassXII={'Name':['Arshiya','Taran','Deependra','Rishabh','Madhav','Akshra'],\

'Subject':['Accounts','Economics','Accounts','Economics','Accounts','Economics'],\

'Score':[99,87,98,67,43,98],'Grade':['A1','B2','A1','B2','C1','A1']}

df=pd.DataFrame(ClassXII,columns=['Name','Subject','Score','Grade'])

print(df)

print("To Create pivot Table:\n")

pv=df.pivot(index='Name',columns='Subject',values='Score')

print(pv)

Output:

As can seen, the value of Score for every row in the original table has been transferred to the new table, where its row and column match the Name and Subject of its original row. Also notice that many of the values are NaN. This is because many of the positions of the table do not have matching information from the original DataFrame and data are set with NaN(None).

Using Pivot with .fillna()

If you want to avoid the NaN value is your new DataFrame, you can use pivot() with the .fillna(‘’) function.

This that :

pv=df.pivot(index='Name',columns='Subject',values='Score').fillna

Example: Write source code to ‘score’ using .fillna(‘’) function.

import pandas as pd

ClassXII={'Name':['Arshiya','Taran','Deependra','Rishabh','Madhav','Akshra'],\

'Subject':['Accounts','Economics','Accounts','Economics','Accounts','Economics'],\

'Score':[99,87,98,67,43,98],'Grade':['A1','B2','A1','B2','C1','A1']}

df=pd.DataFrame(ClassXII,columns=['Name','Subject','Score','Grade'])

print(df)

print("To Create pivot Table:\n")

pv=df.pivot(index='Name',columns='Subject',values='Score').fillna('')

print(pv)

Output:

Example : Let us to create pivot table to display the Grade for every row in the original table using values= ‘Grade’”

import pandas as pd

ClassXII={'Name':['Arshiya','Taran','Deependra','Rishabh','Madhav','Akshra'],\

'Subject':['Accounts','Economics','Accounts','Economics','Accounts','Economics'],\

'Score':[99,87,98,67,43,98],'Grade':['A1','B2','A1','B2','C1','A1']}

df=pd.DataFrame(ClassXII,columns=['Name','Subject','Score','Grade'])

print(df)

print("To Create pivot Table:\n")

pv=df.pivot(index='Name',columns='Subject',values='Grade').fillna('')

print(pv)

Output:

Pivoting By Multiple Columns:

To perform the pivoting operation on the basis of multiple columns, ‘values’ attribute is to be skipped. If you skip the values argument, it will consider the rest of the columns for values automatically.

Example:

import pandas as pd

ClassXII={'Name':['Arshiya','Taran','Deependra','Rishabh','Madhav','Akshra'],\

'Subject':['Accounts','Economics','Accounts','Economics','Accounts','Economics'],\

'Score':[99,87,98,67,43,98],'Grade':['A1','B2','A1','B2','C1','A1']}

df=pd.DataFrame(ClassXII,columns=['Name','Subject','Score','Grade'])

print(df)

print("To Create pivot Table:\n")

pv=df.pivot(index='Name', columns='Subject')#value attribute skipped

print(pv)

Output:

Duplicate Entry Problem in a pivot table:

Let use create a new data set with duplicate entries:

Example :

import pandas as pd

ClassXII={'Name':['Arshiya','Akshra','Deependra','Rishabh','Madhav','Akshra'],\

'Subject':['Accounts','Economics','Accounts','Economics','Accounts','Economics'],\

'Score':[99,87,98,67,43,98],'Grade':['A1','B2','A1','B2','C1','A1']}

df=pd.DataFrame(ClassXII,columns=['Name','Subject','Score','Grade'])

print(df)

Only the Name is duplicated. Here, if we apply the pivot() function, it won’t produce any valueError as shown below:

Using stack and unstack Methods

Reshape using Stack() and unstack() function in Pandas python: Reshaping the data using stack() function in pandas converts the data into stacked format .i.e. the column is stacked row wise. When more than one column header is present we can stack the specific column header by specified the level. unstack() function in pandas converts the data into unstacked format. Let’s see with an example.

	Score		Grade
Subject	Accounts	Economics	Accounts	Economics
Name
Akshra	NaN	98.0	NaN	A1
Arshiya	99.0	NaN	A1	NaN
Deependra	98.0	NaN	A1	NaN
Madhav	43.0	NaN	C1	NaN
Rishabh	NaN	67.0	NaN	B2
Taran	NaN	87.0	NaN	B2
Akshra	NaN	98.0	NaN	A1
Arshiya	99.0	NaN	A1	NaN

Example:

# pivot table

import pandas as pd

ClassXII={'Name':['Arshiya','Taran','Deependra','Rishabh','Madhav','Akshra'],\

'Subject':['Accounts','Economics','Accounts','Economics','Accounts','Economics'],\

'Score':[99,87,98,67,43,98],'Grade':['A1','B2','A1','B2','C1','A1']}

df=pd.DataFrame(ClassXII,columns=['Name','Subject','Score','Grade'])

print(df)

print("To Create pivot Table:\n")

pv=df.pivot(index='Name',columns='Subject')

print(pv)

print(pv.stack())

Output:

Pivoting using Multiple Column with Filtering

In the last example, we have seen that there are many indices created with pivoting on multiple column and the values are displayed one after the other for both subject column and the grades for each student on the basis of their names. This repetition can be avoided and we can obtain the dataset can be filtered using .fillna() function based on Marks and Grades respectively.

Pivot Table using pivot_table() Method

A bit confusingly, pandas DataFrames also come with a pivot_table() method,which is a generalization of the pivot() method. Whenever you have duplicate values for one index/column pair, you need to use the pivot_table().

Parameters:

data :	DataFrame
values :	column to aggregate, optional
Index :	column, Grouper, array, or list of the previous
columns:	column, Grouper, array, or list of the previous
aggfunc:	function, list of functions, dict, default numpy.mean -> If list of functions passed, the resulting pivot table will have hierarchical columns whose top level are the function names. -> If dict is passed, the key is column to aggregate and value is function or list of functions
fill_value[scalar, default None] :	Value to replace missing values with
margins[boolean, default False] :	Add all row / columns (e.g. for subtotal / grand totals)
dropna[boolean, default True] :	Do not include columns whose entries are all NaN
margins_name[string, default ‘All’] :	Name of the row / column that will contain the totals when margins is True.
Returns:	DataFrame

Example : Create DataFrame

import pandas as pd

import numpy as np

#Create a DataFrame

d = {'Name':['Alisa','Bobby','Cathrine','Alisa','Bobby','Cathrine',

'Alisa','Bobby','Cathrine','Alisa','Bobby','Cathrine'],

'Exam':['Semester 1','Semester 1','Semester 1','Semester 1','Semester 1','Semester 1','Semester 2','Semester 2','Semester 2','Semester 2','Semester 2','Semester 2'],

'Subject':['Mathematics','Mathematics','Mathematics','Science','Science','Science','Mathematics','Mathematics','Mathematics','Science','Science','Science'],'Score':[62,47,55,74,31,77,85,63,42,67,89,81]}

df = pd.DataFrame(d,columns=['Name','Exam','Subject','Score'])

print(df)

The resultant dataframe will be

Create pivot table in pandas python with aggregate function mean:

# pivot table using aggregate function mean

pd.pivot_table(df, index=['Exam','Subject'], aggfunc='mean')

So the pivot table with aggregate function mean will be

Which shows the sum of scores of students across subjects

Create pivot table in Pandas python with aggregate function count:

# pivot table using aggregate function count

pd.pivot_table(df, index=['Exam','Subject'], aggfunc='count')

So the pivot table with aggregate function count will be

Which shows the count of student who appeared for the exam of different subject

Differentiate between pivot() and pivot_table()

pivot() method	pivot_table() method
The index column of pivot() method does not accept duplicate rows with values for specified columns	The pivot_table() accepts duplicate values for one index/column pair you need to use.
Pivot() allows both numeric and string type as “values=”,	pivo_table() only allow numerical types as :”values=”
Pivot() is used for pivoting without aggregation	pivot_tabel() work with duplicate values by aggregating them.

Python Tutorial

Wednesday, 30 September 2020

Pandas DataFrame-Pivot and Pivot Table

Create pivot table in Pandas python with aggregate function count:

Differentiate between pivot() and pivot_table()

No comments:

Post a Comment