Python Tutorial: Pandas DataFrame-CSV

Thursday, 10 September 2020

Pandas DataFrame-CSV

What Is a CSV File?

A CSV file (Comma Separated Values file) is a type of plain text file that uses specific structuring to arrange tabular data. Because it’s a plain text file, it can contain only actual text data—in other words, printable ASCII or Unicode characters.

The structure of a CSV file is given away by its name. Normally, CSV files use a comma to separate each specific data value. 

If you open the above CSV file using a text editor such as sublime text, you will see:

ItemName,Quantity,Price

LED,5,30000

LCD,6,24000

Woofer,10,3000

Laptop,3,30000

desktop,3,25000

headphone,2,1400

CRT,5,30000

TFt,6,24000

Sound,10,300

CD Writer,3,5000

DVD Writer,3,450

Smart Mobile,2,1400

DataTransfer between dataFrames and .csv file

CSV format is a kind of tabular data separated by comma and is stored in the form of plan text.

 

 

 

 

 In CSV Format/propeties :

  • One line for each record
  • Comma separated fields
  • Space-characters adjacent to commas are ignored
  • Fields with in-built commas are separated by double quote characters.
  • Fields with double quote characters must be surrounded by double quotes. Each inbuilt double quote must be represented by a pair of consecutive quotes
  • Fields that contain inbuilt line-breaks must be surrounded by double quotes

When Use CSV?

  • When data has a strict tabular structure
  • To transfer large database between programs
  • To import and export data to office applications, Qedoc modules
  • To store, manage and modify shopping cart catalogue

 CSV Advantages

  • CSV is human readable and easy to edit manually
  • CSV is simple to implement and parse
  • CSV is processed by almost all existing applications
  • CSV provides a straightforward information schema
  • CSV is faster to handle
  • CSV is smaller in size
  • CSV is considered to be standard format
  • CSV is compact. For XML you start tag and end tag for each column in each row. In CSV you write the column headers only once.
  • CSV is easy to generate

CSV Disadvantages

  • CSV allows to move most basic data only. Complex configurations cannot be imported and exported this way
  • There is no distinction between text and numeric values
  • No standard way to represent binary data
  • Problems with importing CSV into SQL (no distinction between NULL and quotes)
  • Poor support of special characters
  • No standard way to represent control characters
  • Lack of universal standard

Write a CSV File

You can save your Pandas DataFrame as a CSV file with .to_csv():

Example:

import pandas as pd

SData={"name":['Jyotika','Muskan','Amit','Anchit','Aviral','Mahi'],\

       'Accounts':[54,76,98,54,76,87],'English':[89,87,54,89,43,67],\

       'Bst':[65,67,87,56,87,54]}

print("Convert dictionary to dataframe")

df=pd.DataFrame(SData)

print(df)

df.to_csv('data.csv')

That’s it! You’ve created the file data.csv in your current working directory.

Read a CSV File

Once your data is saved in a CSV file, you’ll likely want to load and use it from time to time. You can do that with the Pandas read_csv() function:

Example:

import pandas as pd

df = pd.read_csv('data.csv', index_col=0)

print(df)

Output:


 

Example : To display the shape (number of rows/record and columns/fields) of the CSV file.


>>> row,column=df.shape

>>> row

6

>>> column

4

>>> 

Reading CSV file with specific/selected columns :

While working with large tables in CSV format, there can be several columns contained in it. But you may require selective columns to be read into a dataframe. This can be done by using “usecols” attribute or option along with read_csv() method.

Example :

import pandas as pd

df = pd.read_csv('data.csv', usecols=['name','English'])

print(df)

Output:

Reading CSV file with specific/selected rows :

Like columns , there can be thousands of records in a CSV file. You can display selective records/rows using “nrows” option or attribute use with read_csv() method.

Example :

import pandas as pd

df = pd.read_csv('data.csv', nrows=3, index_col=0)

print(df)

Output:

Reading CSV file without header :

 If you can do not want to display the first row as the header for dataframe using Employee table, then this can be done by specifying None argument for ‘header’  option or ‘skiprows  option using read_csv() method.

import pandas as pd

df = pd.read_csv('data.csv', nrows=3, header=None)

print(df)

Output:

Reading CSV file with new column names

You can read and load data from CSV file into a dataframe with new column names(rename) while reading the .csv file. If the header exists, you have to skip it using skiprows option along with names options for renaming the columns.

Example :

import pandas as pd

df = pd.read_csv('data.csv',index_col=0)

print(df)

print('\nReading CSV file with rename column \n')

df = pd.read_csv('data.csv',skiprows=1,names=['Sname','Acct','Eng','BusStudies',])

print(df)

Output:

 

 

 


No comments:

Post a Comment