DataFrame-Pivot and Pivot Table
Pivoting DataFrame
Pivot tables are useful for summarizing data. A pivot table allows us to extract the
significance from a large, detailed data set. Pivot table are particularly
useful if you have long rows or columns holding values that you need to track
the sums of and easily compare to one another. They can automatically sort,
count, total or average data stored in one table. Then, they can show the
results of those actions in a new table of that summarized data inside
DataFrame.
In a general, pivoting means to use unique values from
specified index/columns to form axes of the resulting DataFrame. We can get
pandas to from a pivot table for out DataFrame by calling the pivot() or Pivot_table() methods and providing
parameters about how we would like the
resulting table to be organized.
There are two functions available in Python Pandas for
pivoting DataFrame.
1.
pivot()
2.
pivot_table()
1.
pivot()
pivot()
method creates a new dataframe after reshaping the data based on column values.
This method takes 3 arguments-index, columns and values. Out of these three arguments, a
minimum of two arguments have to be given.
pandas.pivot(index,
columns, values)
Hear,
·
Index
is the column name to use in order to make new DataFrame’s index.
·
Columns
is the column name to use to make new DataFrame’s columns.
·
Values is the column name to use to make
new DataFrame’s values. The values parameter accepts both numeric and string.
Example
import
pandas as pd
ClassXII={'Name':['Arshiya','Taran','Deependra','Rishabh','Madhav','Akshra'],\
'Subject':['Accounts','Economics','Accounts','Economics','Accounts','Economics'],\
'Score':[99,87,98,67,43,98],'Grade':['A1','B2','A1','B2','C1','A1']}
df=pd.DataFrame(ClassXII,columns=['Name','Subject','Score','Grade'])
print(df)
print("To
Create pivot Table:\n")
pv=df.pivot(index='Name',columns='Subject',values='Score')
print(pv)
Output:
As can seen, the value of Score for every row in
the original table has been transferred to the new table, where its row and column match the Name and Subject of its original row. Also notice that many of
the values are NaN. This is because many of the positions of the table
do not have matching information from the original DataFrame and data are set
with NaN(None).
Using Pivot with .fillna()
If you want to avoid the NaN value is your new
DataFrame, you can use pivot() with the .fillna(‘’) function.
This that :
pv=df.pivot(index='Name',columns='Subject',values='Score').fillna
Example: Write source code to ‘score’ using
.fillna(‘’) function.
import pandas as pd
ClassXII={'Name':['Arshiya','Taran','Deependra','Rishabh','Madhav','Akshra'],\
'Subject':['Accounts','Economics','Accounts','Economics','Accounts','Economics'],\
'Score':[99,87,98,67,43,98],'Grade':['A1','B2','A1','B2','C1','A1']}
df=pd.DataFrame(ClassXII,columns=['Name','Subject','Score','Grade'])
print(df)
print("To Create pivot
Table:\n")
pv=df.pivot(index='Name',columns='Subject',values='Score').fillna('')
print(pv)
Output:
Example : Let us to create pivot table to display
the Grade for every row in the original table using values= ‘Grade’”
import
pandas as pd
ClassXII={'Name':['Arshiya','Taran','Deependra','Rishabh','Madhav','Akshra'],\
'Subject':['Accounts','Economics','Accounts','Economics','Accounts','Economics'],\
'Score':[99,87,98,67,43,98],'Grade':['A1','B2','A1','B2','C1','A1']}
df=pd.DataFrame(ClassXII,columns=['Name','Subject','Score','Grade'])
print(df)
print("To
Create pivot Table:\n")
pv=df.pivot(index='Name',columns='Subject',values='Grade').fillna('')
print(pv)
Output:
Pivoting By Multiple Columns:
To perform the pivoting operation on the basis of
multiple columns, ‘values’ attribute is to be skipped. If you skip the
values argument, it will consider the rest of the columns for values
automatically.
Example:
import
pandas as pd
ClassXII={'Name':['Arshiya','Taran','Deependra','Rishabh','Madhav','Akshra'],\
'Subject':['Accounts','Economics','Accounts','Economics','Accounts','Economics'],\
'Score':[99,87,98,67,43,98],'Grade':['A1','B2','A1','B2','C1','A1']}
df=pd.DataFrame(ClassXII,columns=['Name','Subject','Score','Grade'])
print(df)
print("To
Create pivot Table:\n")
pv=df.pivot(index='Name',
columns='Subject')#value attribute skipped
print(pv)
Output:
Duplicate Entry Problem in a pivot table:
Let use create a new data set with duplicate entries:
Example :
import
pandas as pd
ClassXII={'Name':['Arshiya','Akshra','Deependra','Rishabh','Madhav','Akshra'],\
'Subject':['Accounts','Economics','Accounts','Economics','Accounts','Economics'],\
'Score':[99,87,98,67,43,98],'Grade':['A1','B2','A1','B2','C1','A1']}
df=pd.DataFrame(ClassXII,columns=['Name','Subject','Score','Grade'])
print(df)
Only the Name
is duplicated. Here, if we apply the pivot() function, it won’t produce any valueError as
shown below:
Using stack and unstack Methods
Reshape using Stack() and unstack() function in Pandas
python: Reshaping the data using stack() function in pandas converts the data
into stacked format .i.e. the column is stacked row wise. When more than one
column header is present we can stack the specific column header by specified
the level. unstack() function in pandas converts the data into unstacked
format. Let’s see with an example.
|
Score |
Grade |
||
Subject |
Accounts |
Economics |
Accounts |
Economics |
Name |
|
|
|
|
Akshra |
NaN |
98.0 |
NaN |
A1 |
Arshiya |
99.0 |
NaN |
A1 |
NaN |
Deependra |
98.0 |
NaN |
A1 |
NaN |
Madhav |
43.0 |
NaN |
C1
|
NaN |
Rishabh |
NaN |
67.0 |
NaN
|
B2 |
Taran |
NaN |
87.0 |
NaN |
B2 |
Akshra |
NaN |
98.0 |
NaN |
A1 |
Arshiya |
99.0 |
NaN |
A1 |
NaN |
Example:
# pivot table
import
pandas as pd
ClassXII={'Name':['Arshiya','Taran','Deependra','Rishabh','Madhav','Akshra'],\
'Subject':['Accounts','Economics','Accounts','Economics','Accounts','Economics'],\
'Score':[99,87,98,67,43,98],'Grade':['A1','B2','A1','B2','C1','A1']}
df=pd.DataFrame(ClassXII,columns=['Name','Subject','Score','Grade'])
print(df)
print("To
Create pivot Table:\n")
pv=df.pivot(index='Name',columns='Subject')
print(pv)
print(pv.stack())
Output:
Pivoting
using Multiple Column with Filtering
In
the last example, we have seen that there are many indices created with
pivoting on multiple column and the values are displayed one after the other
for both subject column and the grades for each student on the basis of their names.
This repetition can be avoided and we can obtain the dataset can be filtered
using .fillna() function based on Marks and Grades respectively.
Pivot
Table using pivot_table() Method
A bit confusingly, pandas DataFrames also come
with a pivot_table() method,which is a generalization of the pivot()
method. Whenever you have duplicate values for one index/column pair,
you need to use the pivot_table().
Parameters:
data : |
DataFrame |
values : |
column to aggregate, optional |
Index : |
column, Grouper, array, or list of the previous |
columns: |
column, Grouper, array, or list of the previous |
aggfunc: |
function, list of functions, dict, default
numpy.mean -> If list of functions passed, the resulting
pivot table will have hierarchical columns whose top level are the function
names. |
fill_value[scalar, default None] : |
Value to replace missing values with |
margins[boolean, default False] : |
Add all row / columns (e.g. for subtotal / grand
totals) |
dropna[boolean, default True] : |
Do not include columns whose entries are all NaN |
margins_name[string, default ‘All’] : |
Name of the row / column that will contain the totals
when margins is True. |
Returns: |
DataFrame |
Example : Create DataFrame
import pandas as pd
import numpy as np
#Create a DataFrame
d = {'Name':['Alisa','Bobby','Cathrine','Alisa','Bobby','Cathrine',
'Alisa','Bobby','Cathrine','Alisa','Bobby','Cathrine'],
'Exam':['Semester
1','Semester 1','Semester 1','Semester 1','Semester 1','Semester 1','Semester
2','Semester 2','Semester 2','Semester 2','Semester 2','Semester 2'],
'Subject':['Mathematics','Mathematics','Mathematics','Science','Science','Science','Mathematics','Mathematics','Mathematics','Science','Science','Science'],'Score':[62,47,55,74,31,77,85,63,42,67,89,81]}
df =
pd.DataFrame(d,columns=['Name','Exam','Subject','Score'])
print(df)
The resultant dataframe will be
Create pivot table in pandas python with aggregate function mean:
# pivot table using aggregate function mean
pd.pivot_table(df, index=['Exam','Subject'], aggfunc='mean')
So the pivot table with aggregate function
mean will be
Which shows the sum of scores of students
across subjects
Create pivot table in Pandas python
with aggregate function count:
# pivot table using aggregate function count
pd.pivot_table(df, index=['Exam','Subject'], aggfunc='count')
So the pivot table with aggregate function
count will be
Which shows the count of student who appeared for the exam of different subject
Differentiate between pivot() and pivot_table()
pivot()
method |
pivot_table()
method |
The index column of
pivot() method does not accept duplicate rows with values for specified
columns |
The pivot_table() accepts
duplicate values for one index/column pair you need to use. |
Pivot() allows both
numeric and string type as “values=”, |
pivo_table() only allow
numerical types as :”values=” |
Pivot() is used for
pivoting without aggregation |
pivot_tabel() work with
duplicate values by aggregating them. |
No comments:
Post a Comment