Python Tutorial: Pandas DataFrame-Add Column(s) & Row(s)

Monday, 19 June 2023

Pandas DataFrame-Add Column(s) & Row(s)

Table of Contents

1.      How to add a row to a DataFrame

·         Method 1: Add a new row at the end using the loc method

·         Method 2: Add a new row at the end using the append/_append method

·         Method 3: Add new rows at the end using the concat method

·         Method 4: Add multiple rows at the end using the append/_append method

2.     How to add a column to a DataFrame

·         Method 1: Add a new column at the end using indexing

·         Method 2: Add a new column at the end using the loc method

·         Method 3: Add a new column at the end using the assign() function

·         Method 4: Add a new column at a chosen position using the insert() function

·         Method 5: Add a new column using the concat() function

Adding row(s) into DataFrame

Method 1: Add a new row at the end using the loc method

The loc indexer of a pandas DataFrame selects data by row and column labels. The name loc stands for “location”. Besides selecting existing rows and columns, it can also add a new row.

Label-based row addition: assigning to .loc[] with a row label that does not yet exist appends a new row with that label. If the label already exists, the existing row is overwritten instead.

Example:

import pandas as pd

marks  = { "Eng" :[67,89,90,55],

           "Maths":[55,67,45,56],

            "IP":[66,78,89,90],

           "Acct" :[45,56,67,65],

           "B.St.":[54,65,76,87]}

result = pd.DataFrame(marks,index=["Manoj","Mandeep","Ravi","Sumedha"])

print("******************Marksheet****************")

print(result)

print("\n Add Row")

n  = { "Eng" :45,"Maths":55,"IP":33,"Acct" :66,"B.St.":22}

result.loc['Aditi']=n

print(result)

Output:



Add a new row using .loc[len(df)]:
If the DataFrame uses the default integer index (0, 1, 2, ...), you may not know in advance which label the next row should get. The next free label equals the number of rows already in the DataFrame.
You can get that number with the len() function and use it as the new row label.
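One caveat is worth sketching before the full example: len(df) is only a free label while the index is the default 0..n-1 range. The small frame below is hypothetical (two columns only, values passed as lists) and shows both the normal case and what happens after a row has been dropped.

```python
import pandas as pd

# Hypothetical two-column marksheet with the default 0..n-1 index
df = pd.DataFrame({"Eng": [67, 89, 90], "Maths": [55, 67, 45]})

# Default index: len(df) is 3, an unused label, so a new row is added
df.loc[len(df)] = [45, 55]
print(df)

# After dropping a row the labels are 0, 2, 3 while len(gapped) is 3,
# so the same statement overwrites the existing row labelled 3
gapped = df.drop(index=1)
gapped.loc[len(gapped)] = [0, 0]
print(gapped)
```

If the index may have gaps or custom labels, pd.concat() with ignore_index=True is a safer way to add a row at the end.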
Example:
import pandas as pd
marks  = { "Eng" :[67,89,90,55],
           "Maths":[55,67,45,56],
            "IP":[66,78,89,90],
           "Acct" :[45,56,67,65],
           "B.St.":[54,65,76,87]}
result = pd.DataFrame(marks)
print("******************Marksheet****************")
print(result)
print("\n Add Row")
 
n  = { "Eng" :45,"Maths":55,"IP":33,"Acct" :66,"B.St.":22}
result.loc[len(result)]=n
print(result)
Output:


Method 2: Add a new row at the end using the append/_append method

Note: DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0. The append() examples below run only on pandas versions older than 2.0.

Example 1: Add Row to DataFrame

In this example, we create a DataFrame and append a new row to it. The new row is written as a Python dictionary, and the append() function appends it to the DataFrame.

When you pass a Python dictionary to append(), make sure that you also pass ignore_index=True.

The append() method does not modify the original DataFrame; it returns a new DataFrame that contains the added row.

Python Program

import pandas as pd
data = {'name': ['Somu', 'Kiku', 'Amol', 'Lini'],
               'physics': [68, 74, 77, 78],
               'chemistry': [84, 56, 73, 69],
               'algebra': [78, 88, 82, 87]}
#create dataframe
df_marks = pd.DataFrame(data)
print('Original DataFrame\n------------------')
print(df_marks)
new_row = {'name':'Geo', 'physics':87, 'chemistry':92, 'algebra':97}
#append row to the dataframe
df_marks = df_marks.append(new_row, ignore_index=True)
 
print('\n\nNew row added to DataFrame\n--------------------------')
print(df_marks)
Output

Run the above Python program, and you shall see the original dataframe, and the dataframe appended with the new row.

Original DataFrame
------------------
   name  physics  chemistry  algebra
0  Somu       68         84       78
1  Kiku       74         56       88
2  Amol       77         73       82
3  Lini       78         69       87
 
New row added to DataFrame
--------------------------
   name  physics  chemistry  algebra
0  Somu       68         84       78
1  Kiku       74         56       88
2  Amol       77         73       82
3  Lini       78         69       87
4   Geo       87         92       97
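
DataFrame.append() was removed in pandas 2.0, so the program above fails there with an AttributeError. As a hedged sketch of the same single-row append on current pandas, the dictionary can be wrapped in a one-row DataFrame and passed to pd.concat():

```python
import pandas as pd

data = {'name': ['Somu', 'Kiku', 'Amol', 'Lini'],
        'physics': [68, 74, 77, 78],
        'chemistry': [84, 56, 73, 69],
        'algebra': [78, 88, 82, 87]}
df_marks = pd.DataFrame(data)

new_row = {'name': 'Geo', 'physics': 87, 'chemistry': 92, 'algebra': 97}

# Wrap the dict in a list to build a one-row DataFrame, then concatenate;
# ignore_index=True renumbers the rows 0..4
df_marks = pd.concat([df_marks, pd.DataFrame([new_row])], ignore_index=True)
print(df_marks)
```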

Example 2: Add Row to Pandas DataFrame (ignore_index=False)

If you append a dictionary without passing ignore_index=True, you get a TypeError.

In the following example, we try to append a row to the DataFrame with ignore_index=False.

Python Program

import pandas as pd
data = {'name': ['Amol', 'Lini'],
               'physics': [77, 78],
               'chemistry': [73, 85]}
#create dataframe
df_marks = pd.DataFrame(data)
print('Original DataFrame\n------------------')
print(df_marks)
new_row = {'name':'Geo', 'physics':87, 'chemistry':92}
#append row to the dataframe (raises TypeError on pandas older than 2.0)
df_marks = df_marks.append(new_row, ignore_index=False)
print('\n\nNew row added to DataFrame\n--------------------------')
print(df_marks)
Output
Original DataFrame
------------------
   name  physics  chemistry
0  Amol       77         73
1  Lini       78         85
Traceback (most recent call last):
  File "example1.py", line 14, in <module>
    df_marks = df_marks.append(new_row, ignore_index=False)
  File "C:\Users\PythonExamples\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\frame.py", line 6658, in append
    raise TypeError('Can only append a Series if ignore_index=True'
TypeError: Can only append a Series if ignore_index=True or if the Series has a name

As the error message says, we need to either pass ignore_index=True or append the row as a Series that has a name.

We have already seen in Example 1 how to add a row to the DataFrame with ignore_index=True. Now we will see how to add a row with ignore_index=False.

Python Program

import pandas as pd
data = {'name': ['Amol', 'Lini'],
               'physics': [77, 78],
               'chemistry': [73, 85]}
#create dataframe
df_marks = pd.DataFrame(data)
print('Original DataFrame\n------------------')
print(df_marks)
new_row = pd.Series(data={'name':'Geo', 'physics':87, 'chemistry':92}, name='x')
#append row to the dataframe
df_marks = df_marks.append(new_row, ignore_index=False)
print('\n\nNew row added to DataFrame\n--------------------------')
print(df_marks)
We have named the Series 'x'. Therefore ignore_index=False does not raise a TypeError, and the row is appended to the DataFrame with x as its index label.

Output

Original DataFrame
------------------
   name  physics  chemistry
0  Amol       77         73
1  Lini       78         85
 
New row added to DataFrame
--------------------------
   name  physics  chemistry
0  Amol       77         73
1  Lini       78         85
x   Geo       87         92
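
On pandas 2.0 and later, where append() is unavailable, a similar result can be sketched with pd.concat(). This version assumes the row is built as a one-row DataFrame whose index label ('x') takes the place of the Series name:

```python
import pandas as pd

df_marks = pd.DataFrame({'name': ['Amol', 'Lini'],
                         'physics': [77, 78],
                         'chemistry': [73, 85]})

# One-row DataFrame; its index label plays the role of the Series name
new_row = pd.DataFrame([{'name': 'Geo', 'physics': 87, 'chemistry': 92}],
                       index=['x'])

# Without ignore_index=True the original labels 0, 1 and 'x' are kept
df_marks = pd.concat([df_marks, new_row])
print(df_marks)
```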

 

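pandas 2.0 and later: using the _append method

In pandas 2.0 and later, append() is no longer available. The underscore-prefixed _append() method still works, but it is a private method that may change without notice; pd.concat(), shown in Method 3, is the public replacement.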

Example:

import pandas as pd
data1 = pd.DataFrame({ "ID" : [15, 16, 17, 18, 19],  "Name": ["Abid", "Matthew", "Nisha", "Natassha", "Nahla"],
        "CGPA": [2.3, 3.0, 3.9, 2.5, 3.2],
        "Dept": ["EEE", "IT", "CS", "BA", "LAW"],
        "Region": ["Islamabad", "Ontario", "London", "Saba", "Denver"], })
print(data1)
print('\nAdd new row\n')
row1 = pd.Series([25, 'Franc', 3.3, 'CS', 'Paris'], index=data1.columns)
data1 = data1._append(row1,ignore_index=True)
print(data1)
Output:





Method 3: Add a new row at the End of using concat method

Syntax:

result=pd.concat([list of DataFrames],axis=0,ignore_index=False)

Using ignore_index

This option controls whether the original row labels are kept in the result. By default it is False, so the original labels are kept.

Example :

import pandas as pd

SData={"name":['Adil','Deepak','Satyam'],\

       'Accounts':[54,76,87],'English':[89,43,67],\

       'Bst':[56,87,54]}

print("Convert dictionary to dataframe")

df=pd.DataFrame(SData)

print(df)

print()

newDF=pd.DataFrame({'name':['Arif','Dilshad'],'Accounts':[56,79],\

                    'English':[67,87],'Bst':[67,78]},index=[0,1])

print("New Data Frame\n ",newDF)

df2=pd.concat([df,newDF])

print("\nAfter add rows new data frame\n and show index separately\n",df2)

Output :




Using ignore_index=True

If you want the result to get a fresh index (0, 1, 2, ...) instead of the original row labels, set ignore_index to True.
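
The difference between the two settings can be sketched with two small frames. The names a, b, kept and fresh are only illustrative; b also carries an extra column to show how concat() fills columns that one input lacks.

```python
import pandas as pd

a = pd.DataFrame({"name": ["Adil", "Deepak"], "Bst": [56, 87]})
b = pd.DataFrame({"name": ["Arif"], "Bst": [67], "IP": [98]})

# Default (ignore_index=False): original labels are kept, so 0 appears twice
kept = pd.concat([a, b])

# ignore_index=True: the result is renumbered from 0
fresh = pd.concat([a, b], ignore_index=True)

print(kept.index.tolist())
print(fresh.index.tolist())

# Rows that came from a have no IP value, so they hold NaN in that column
print(fresh["IP"].isna().tolist())
```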

Example:

import pandas as pd

SData={"name":['Adil','Deepak','Satyam'],\

       'Accounts':[54,76,87],'English':[89,43,67],\

       'Bst':[56,87,54]}

print("Convert dictionary to dataframe")

df=pd.DataFrame(SData)

print(df)

print()

# newDF has an extra column 'IP'; after concat, the rows from df hold NaN in that column
newDF=pd.DataFrame({'name':['Arif','Dilshad'],'Accounts':[56,79], 'English':[67,87], 'Bst':[67,78], 'IP':[98,88]},index=[0,1])

print("New Data Frame\n ",newDF)

df2=pd.concat([df,newDF],ignore_index=True)

print("\nafter Ignore Index\n",df2)

Output:

Method 4: Add multiple rows at the end using the append/_append method

Depending on your pandas version, use append() (pandas older than 2.0) or _append() (pandas 2.0 and later).

# Import pandas Python module

import pandas as pd

# Create a sample pandas DataFrame object

df = pd.DataFrame({'RegNo': [111, 112, 113, 114, 115],

                   'Name': ['Sankriti', 'Vedant', 'Rashmi', 'Kirti', 'Ravi'],

                   'CGPA': [9.05, 9.03, 8.85, 7.85, 9.75],

                   'Dept': ['ECE', 'ICE', 'IT', 'CSE', 'CHE'],

                   'City': ['Jalandhar','Ranchi','Patna','Patiala','Rajgir']})

# Print the created pandas DataFrame

print('Sample pandas DataFrame:\n')

print(df)

df2 = pd.DataFrame({'RegNo': [119, 120, 121],

                   'Name': ['Ankita', 'Vanshika', 'Jyotsna'],

                   'CGPA': [8.85, 9.03, 7.85],

                   'Dept': ['ECE', 'ICE', 'IT'],

                   'City': ['Jalandhar','Ranchi','Patna']})

# Print the newly created pandas DataFrame object

print('Add multiple rows pandas DataFrame:\n')

print(df2)

# Append the rows of the above pandas DataFrame to the existing pandas DataFrame

# pandas 2.0 and later: use DataFrame._append()

df = df._append(df2,ignore_index=True)

# pandas older than 2.0: use DataFrame.append() instead

# df = df.append(df2,ignore_index=True)

# Print the modified pandas DataFrame object after addition of rows

print('\nModified Sample pandas DataFrame:\n')

print(df)

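Note: If you see the error “AttributeError: 'DataFrame' object has no attribute 'append'. Did you mean: '_append'?”, you are running pandas 2.0 or later. Replace the append method with the _append method, or use pd.concat([df, df2], ignore_index=True).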

Output:




Adding column(s) into DataFrame

There are five ways to add a new column to a DataFrame: indexing, .loc, assign(), insert() and concat(). The concat() function can add both columns and rows.

Table: Student Information

Name      Position    City    Age  Sex
Aakash    Manager     Delhi   35   M
Sunakshi  Programmer  Mumbai  37   F
Amitabh   Manager     Kanpur  33   M
Madhuri   Programmer  Mumbai  40   F
Ashish    Manager     Kanpur  27   M
Akshay    Programmer  Kanpur  34   M
Preeti    Programmer  Delhi   26   F
Govinda   Manager     Delhi   30   M

 

>>> import pandas as pd

>>> Student={'Name':['Aakash','Sunakshi','Amitabh','Madhuri','Ashish','Akshay','Preeti','Govinda']}

>>> df=pd.DataFrame(Student,columns=['Name'])

>>> df

 

Output

      Name

0    Aakash

1  Sunakshi

2   Amitabh

3   Madhuri

4    Ashish

5    Akshay

6    Preeti

7   Govinda

Method 1: Using indexing

The DataFrame df currently has the default integer index. To add a second column, Position, assign a list to df['Position']:

>>> df['Position']=['Manager','Programmer','Manager','Programmer','Manager','Programmer','Programmer','Manager']

>>> df




Method 2 : Using .loc

>>> df.loc[:,'City']=['Delhi','Mumbai','Kanpur','Mumbai','Kanpur','Kanpur','Delhi','Delhi']

>>> df



Method 3: Using assign() function

Assigning through .loc has two limitations: it modifies the DataFrame in place, and it cannot be used in method chaining. If that is a problem for you, use the assign() function. The assign() function returns a new DataFrame that contains all the existing columns plus the new ones. The syntax is

DataFrame=DataFrame.assign(ColumnName=List)

Here,

The DataFrame on the right side of the assignment sign is the existing DataFrame; assign() leaves it unchanged and returns a copy with the new column ColumnName holding the values of List.

The DataFrame on the left side of the assignment sign receives that new DataFrame.

If both names are the same, the existing variable is rebound to the new DataFrame, so it now holds the new column.
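
A minimal sketch of the chaining behaviour described above, on a shortened version of the student data. The column AgeNextYear and the variable out are made up for illustration; the lambda receives the intermediate DataFrame, so it can use columns that already exist.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Aakash", "Sunakshi", "Amitabh"],
                   "Age": [35, 37, 33]})

# assign() returns a new DataFrame each time, so calls can be chained;
# df itself is left unchanged
out = (
    df.assign(City=["Delhi", "Mumbai", "Kanpur"])
      .assign(AgeNextYear=lambda d: d["Age"] + 1)
)

print(out)
print(list(df.columns))
```

Because df keeps its original two columns, the result has to be stored (here in out, or back in df) for the new columns to persist.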

>>> df=df.assign(Age=[35,37,33,40,27,34,26,30])

>>> df


Method 4: Using insert() function

The insert() function adds a column at a given column position. Column positions are counted from 0 (0, 1, 2, 3, ...). Unlike assign(), insert() modifies the DataFrame in place and returns None.

Syntax

DataFrameName.insert(loc, column, value, allow_duplicates = False)

Parameters:

loc:

loc is an integer giving the position at which the new column is inserted. It must satisfy 0 <= loc <= number of columns. The column currently at that position, and all columns after it, shift to the right.

column:

column is the name (label) of the column to be inserted, usually a string.

value:

value holds the data for the new column. It can be a scalar (int, string, float and so on), a Series, or a list of values. A single scalar value is assigned to every row.

allow_duplicates :

allow_duplicates is a boolean that controls whether a column whose name already exists may be inserted. With the default False, insert() raises a ValueError if the name is already in use.
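
The parameters can be tried out on a tiny frame. This is only a sketch: the Country column and the duplicate_rejected flag are hypothetical and exist just to show scalar broadcasting and the duplicate-name check.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Aakash", "Sunakshi"], "Age": [35, 37]})

# Insert at position 1: "Age" shifts right to position 2
df.insert(loc=1, column="City", value=["Delhi", "Mumbai"])

# A single scalar is assigned to every row; loc=3 equals the current
# number of columns, so the column lands at the end
df.insert(loc=3, column="Country", value="India")
print(df)

# Re-using an existing name raises ValueError unless allow_duplicates=True
try:
    df.insert(loc=0, column="City", value=["Pune", "Agra"])
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
print(duplicate_rejected)
```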

Example

>>> idx=4

>>> Sex=['M','F','M','F','M','M','F','M']

>>> df.insert(loc=idx,column="Sex",value=Sex)

>>> df

Output:


Method 5: Add a column to a pandas DataFrame using the concat() function

You can also pass axis=1 to concat() in order to join the objects side by side, that is, along the columns.
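
As a hedged sketch of axis=1 with real values, the frame below gets a Total column computed as the row-wise sum of the three mark columns. Building the Series with sum(axis=1) and rename() is one possible way to prepare it, not the only one.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Adil", "Deepak", "Satyam"],
                   "Accounts": [54, 76, 87],
                   "English": [89, 43, 67],
                   "Bst": [56, 87, 54]})

# Series named "Total" holding the row-wise sum of the three mark columns
total = df[["Accounts", "English", "Bst"]].sum(axis=1).rename("Total")

# axis=1 places the Series beside the existing columns, aligned on the index
df = pd.concat([df, total], axis=1)
print(df)
```

Rows are matched by index label, not by position, so both objects should share the same index.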

Example: Add an empty column Total to a DataFrame

import pandas as pd

SData={"name":['Adil','Deepak','Satyam'],\

       'Accounts':[54,76,87],'English':[89,43,67],\

       'Bst':[56,87,54]}

print("Convert dictionary to dataframe")

df=pd.DataFrame(SData)

print(df)

Temp=pd.DataFrame({"Total":[]})

df=pd.concat([df,Temp],axis=1)

print(df)

Output: 





 

 
