Python Tutorial: Pandas DataFrame-Indexing

Pandas Set_Index : set_index()

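The set_index() function makes one or more existing columns (or arrays of matching length) the index of a DataFrame. The reset_index() function does the reverse: it resets the index so that it starts from 0, moving the index values into the DataFrame’s columns and setting a simple integer index. It is the inverse operation of set_index(). The syntax is: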

DataFrame.reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill='')

Reset the index, or a level of it.

Reset the index of the DataFrame, and use the default one instead. If the DataFrame has a MultiIndex, this method can remove one or more levels.

Parameters:

Name	Description	Type / Default Value	Required / Optional
level	For a DataFrame with a MultiIndex, remove only the specified levels from the index. Removes all levels by default.	int, str, tuple, or list; Default Value: None	Optional
drop	Reset the index without inserting the old index as a column in the new DataFrame.	bool; Default Value: False	Optional
name	Applies to Series.reset_index() only: the name to use for the column containing the original Series values. Uses self.name by default. This argument is ignored when drop is True.	object	Optional
inplace	Modify the object in place (do not create a new object).	bool; Default Value: False	Optional

Example:

import pandas as pd
import numpy as np

df = pd.DataFrame([('bird', 389.0),
                   ('bird', 24.0),
                   ('mammal', 80.5),
                   ('mammal', np.nan)],
                  index=['falcon', 'parrot', 'lion', 'monkey'],
                  columns=('class', 'max_speed'))

print(df)

print("\nWhen we reset the index, the old index is added as a column, and a new sequential index is used:")
print(df.reset_index())

print("\nWe can use the drop parameter to avoid the old index being added as a column:")
print(df.reset_index(drop=True))

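Output

	class	max_speed
falcon	bird	389.0
parrot	bird	24.0
lion	mammal	80.5
monkey	mammal	NaN

When we reset the index, the old index is added as a column, and a new sequential index is used:
	index	class	max_speed
0	falcon	bird	389.0
1	parrot	bird	24.0
2	lion	mammal	80.5
3	monkey	mammal	NaN

We can use the drop parameter to avoid the old index being added as a column:
	class	max_speed
0	bird	389.0
1	bird	24.0
2	mammal	80.5
3	mammal	NaN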

Example 1: Using column heading as index

This example shows how to make one of the DataFrame's columns the index with the set_index() function.

>>> df = pd.DataFrame({'month': [3, 5, 7, 9, 11],
...                    'year': [2011, 2013, 2015, 2017, 2019],
...                    'sale': [85, 40, 78, 87, 97]})
>>> df

Output

	month	year	sale
0	3	2011	85
1	5	2013	40
2	7	2015	78
3	9	2017	87
4	11	2019	97

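Passing 'month' to set_index() discards the default integer index and uses the month column as the index instead. The call returns a new DataFrame; df itself is left unchanged.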

>>> df.set_index('month')

Output

	year	sale
month
3	2011	85
5	2013	40
7	2015	78
9	2017	87
11	2019	97
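Since reset_index() is described above as the inverse of set_index(), the round trip can be checked directly. The following is a minimal sketch that reuses the example data; the variable names indexed and restored are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'month': [3, 5, 7, 9, 11],
                   'year': [2011, 2013, 2015, 2017, 2019],
                   'sale': [85, 40, 78, 87, 97]})

# set_index() returns a new DataFrame; df keeps its integer index
indexed = df.set_index('month')

# reset_index() moves 'month' back into the columns
restored = indexed.reset_index()

print(indexed.index.name)      # name of the new index
print(df.columns.tolist())     # df still has all three columns
print(restored.equals(df))     # compares the round-trip result with df
```

Here the round trip reproduces the original frame because 'month' was the first column to begin with; reset_index() always inserts the old index at the front of the columns.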

Example 2: Using multiple columns as index

In this example, a list of two columns is passed to set_index(). The result has a two-level index (a MultiIndex) built from the year and month columns.

>>> df.set_index(['year', 'month'])

Output

		sale
year	month
2011	3	85
2013	5	40
2015	7	78
2017	9	87
2019	11	97
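Once two columns form the index, rows can be looked up by a (year, month) pair. The sketch below assumes the same example data and uses .loc for the lookup; the names multi, row and by_year are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'month': [3, 5, 7, 9, 11],
                   'year': [2011, 2013, 2015, 2017, 2019],
                   'sale': [85, 40, 78, 87, 97]})

multi = df.set_index(['year', 'month'])

# A full (year, month) tuple selects a single row as a Series
row = multi.loc[(2015, 7)]
print(row['sale'])

# A value for the first level alone selects every row for that year,
# leaving 'month' as the remaining index
by_year = multi.loc[2015]
print(by_year)
```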

Example 3: Using set_index function on series data

In this example, Series objects are passed to set_index() instead of column names. The values of each Series become a level of the new index; the existing columns of the DataFrame are kept unchanged.

>>> s = pd.Series([3, 6, 9, 12, 15])
>>> s

Output :

0     3
1     6
2     9
3    12
4    15
dtype: int64

>>> df.set_index([s, s**3])

Output:

		month	year	sale
3	27	3	2011	85
6	216	5	2013	40
9	729	7	2015	78
12	1728	9	2017	87
15	3375	11	2019	97
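The index levels in the output above have no names because the Series is unnamed. As a small sketch, giving each Series a name labels the resulting index levels; the names 'triple' and 'cube' are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'month': [3, 5, 7, 9, 11],
                   'year': [2011, 2013, 2015, 2017, 2019],
                   'sale': [85, 40, 78, 87, 97]})

# Hypothetical names chosen for this illustration
s = pd.Series([3, 6, 9, 12, 15], name='triple')
cubes = (s ** 3).rename('cube')

out = df.set_index([s, cubes])

print(list(out.index.names))   # level names come from the Series names
print(out.columns.tolist())    # the original columns are all still present
print(out.index[0])            # first index entry as a tuple
```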

Pandas Reset_Index : reset_index()

The pandas reset_index() function is used for resetting the index of dataframe.

Syntax

pandas.DataFrame.reset_index(level, drop, inplace, col_level, col_fill)

level : int, str, tuple, or list, default None	– Specifies which index levels to remove. All levels are removed by default.
drop : bool, default False	– If True, the old index is discarded instead of being inserted as a column.
inplace : bool, default False	– If True, the DataFrame is modified in place.
col_level : int or str, default 0	– If the columns have multiple levels, determines which level the labels are inserted into.
col_fill : object, default ''	– If the columns have multiple levels, determines how the other levels are named.

The pandas reset_index function returns a new DataFrame with the new index, or None if inplace=True.
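The difference between the two return values can be seen in a short sketch. The small DataFrame here is made up for illustration and is not part of the examples in this tutorial.

```python
import pandas as pd

# Illustrative data only
df = pd.DataFrame({'sale': [85, 40]}, index=['a', 'b'])

# Default: a new DataFrame is returned and df is untouched
new_df = df.reset_index()
print(df.index.tolist())        # still the original labels

# inplace=True: df itself is modified and the call returns None
result = df.reset_index(inplace=True)
print(result)
print(df.columns.tolist())      # the old index is now a column named 'index'
```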

Example 1: Simple example of reset_index() function

Here a DataFrame is created with string labels as its index. Calling reset_index() moves those labels into a new column named index and replaces them with the default integer index.

>>> df = pd.DataFrame([('fruit', 389.0),
...                    ('fruit', np.nan),
...                    ('vegetable', 80.5),
...                    ('vegetable', 450.5)],
...                   index=['kiwi', 'mango', 'potato', 'tomato'],
...                   columns=('type', 'water_content'))
>>> df

Output :

	type	water_content
kiwi	fruit	389.0
mango	fruit	NaN
potato	vegetable	80.5
tomato	vegetable	450.5

>>> df.reset_index()

Output :

	index	type	water_content
0	kiwi	fruit	389.0
1	mango	fruit	NaN
2	potato	vegetable	80.5
3	tomato	vegetable	450.5
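The new column is called index because the original index has no name. One way to get a more descriptive column, sketched here with the same data, is to name the index with rename_axis() before resetting it; the label 'name' is an arbitrary choice. The sketch also shows drop=True for comparison.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([('fruit', 389.0),
                   ('fruit', np.nan),
                   ('vegetable', 80.5),
                   ('vegetable', 450.5)],
                  index=['kiwi', 'mango', 'potato', 'tomato'],
                  columns=('type', 'water_content'))

# Name the index first, so the inserted column uses that name
named = df.rename_axis('name').reset_index()
print(named.columns.tolist())

# drop=True discards the old labels entirely
dropped = df.reset_index(drop=True)
print(dropped.columns.tolist())
```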

Example 2: Using level parameter with multiindex in reset_index function

In this example, the level parameter is passed to reset_index(). A DataFrame is first created with a MultiIndex on both its rows and its columns, and reset_index() is then used to remove a single index level.

>>> index = pd.MultiIndex.from_tuples([('B-class', 'BMW'),
...                                    ('B-class', 'Audi'),
...                                    ('A-class', 'Jaguar'),
...                                    ('A-class', 'Mercedes')],
...                                   names=['class', 'name'])
>>> columns = pd.MultiIndex.from_tuples([('speed', 'max'),
...                                      ('company', 'type')])
>>> df = pd.DataFrame([(389.0, 'Sedan'),
...                    (24.0, 'Sedan'),
...                    (80.5, 'Hatchback'),
...                    (np.nan, 'Sports')],
...                   index=index,
...                   columns=columns)
>>> df

Output :

		speed	company
		max	type
class	name
B-class	BMW	389.0	Sedan
B-class	Audi	24.0	Sedan
A-class	Jaguar	80.5	Hatchback
A-class	Mercedes	NaN	Sports

Passing level='class' to reset_index() removes only the class level from the index. As the output shows, class becomes an ordinary column while name remains as the index.

>>> df.reset_index(level='class')

Output :

	class	speed	company
		max	type
name
BMW	B-class	389.0	Sedan
Audi	B-class	24.0	Sedan
Jaguar	A-class	80.5	Hatchback
Mercedes	A-class	NaN	Sports
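Because the columns here also have two levels, the col_level and col_fill parameters described in the syntax section control where the class label is placed. The sketch below rebuilds the same DataFrame and compares the resulting column tuples; the variable names top, lower and filled are illustrative.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_tuples([('B-class', 'BMW'),
                                   ('B-class', 'Audi'),
                                   ('A-class', 'Jaguar'),
                                   ('A-class', 'Mercedes')],
                                  names=['class', 'name'])
columns = pd.MultiIndex.from_tuples([('speed', 'max'),
                                     ('company', 'type')])
df = pd.DataFrame([(389.0, 'Sedan'),
                   (24.0, 'Sedan'),
                   (80.5, 'Hatchback'),
                   (np.nan, 'Sports')],
                  index=index, columns=columns)

# Default: the label goes into the top column level
top = df.reset_index(level='class')
print(top.columns[0])

# col_level=1: the label goes into the second column level
lower = df.reset_index(level='class', col_level=1)
print(lower.columns[0])

# col_fill names the other level, here placing 'class' under 'company'
filled = df.reset_index(level='class', col_level=1, col_fill='company')
print(filled.columns[0])
```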

Python Tutorial

Friday, 4 September 2020
