Python Tutorial: Matplotlib

Tuesday, 2 March 2021

Matplotlib

Plotting Data using Matplotlib

The Matplotlib library is used for creating static, animated, and interactive 2D plots or figures in Python. It can be installed using the following pip command from the command prompt:

pip install matplotlib

For plotting using Matplotlib, we need to import its pyplot module using the following command:

import matplotlib.pyplot as plt

Here, plt is an alias, or alternative name, for matplotlib.pyplot. Any other alias can be used as well.


The pyplot module of matplotlib contains a collection of functions that can be used to work on a plot. The plot() function of the pyplot module is used to create a figure. A figure is the overall window where the outputs of pyplot functions are plotted. A figure contains a plotting area, legend, axis labels, ticks, title, etc. Each function makes some change to a figure: for example, it creates a figure, creates a plotting area in a figure, plots some lines in a plotting area, or decorates the plot with labels.
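As a rough illustration of how each pyplot call changes the current figure, the sketch below creates a figure, draws a line in its plotting area and adds a title. The data values and the title text are made up for the example.

```python
import matplotlib.pyplot as plt

fig = plt.figure()                  # creates a new, empty figure
plt.plot([1, 2, 3], [2, 4, 6])      # creates a plotting area and draws a line in it
plt.title("Sample plot")            # decorates the plotting area with a title

ax = plt.gca()                      # the current plotting area (an Axes object)
print(len(fig.axes))                # number of plotting areas in the figure
print(ax.get_title())               # the title set above

plt.show()
```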

It is always expected that the data presented through charts be easily understood. Hence, while presenting data we should always give the chart a title, label its axes, and provide a legend when more than one data series is plotted.
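A minimal sketch of a chart with a title, axis labels and a legend for two plotted series is given here. The dates and both temperature lists are hypothetical values chosen only for illustration.

```python
import matplotlib.pyplot as plt

# Hypothetical data: two temperature series for the same three days
date = ["25/12", "26/12", "27/12"]
max_temp = [8.5, 10.5, 6.8]
min_temp = [2.1, 3.4, 1.9]

# The label given to each line is the text shown in the legend
plt.plot(date, max_temp, label="Maximum")
plt.plot(date, min_temp, label="Minimum")

plt.title("Date wise Temperature")
plt.xlabel("Date")
plt.ylabel("Temperature")
legend = plt.legend()   # builds the legend from the labels above

plt.show()
```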

To plot x versus y, we can write plt.plot(x,y). The show() function is used to display the figure created using the plot() function. Let us consider that in a city, the maximum temperature of a day is recorded for three consecutive days. The program below demonstrates how to plot temperature values for the given dates. The output generated is a line chart.

Program - Plotting Temperature against Date

import matplotlib.pyplot as plt

#list storing date in string format

date=["25/12","26/12","27/12"]

#list storing temperature values

temp=[8.5,10.5,6.8]

#create a figure plotting temp versus date

plt.plot(date, temp)

#show the figure

plt.show()

List of Pyplot functions to plot different charts

Function                                          Description
plot(*args[, scalex, scaley, data])               Plot x versus y as lines and/or markers.
bar(x, height[, width, bottom, align, data])      Make a bar plot.
boxplot(x[, notch, sym, vert, whis, ...])         Make a box and whisker plot.
hist(x[, bins, range, density, weights, ...])     Plot a histogram.
pie(x[, explode, labels, colors, autopct, ...])   Plot a pie chart.
scatter(x, y[, s, c, marker, cmap, norm, ...])    A scatter plot of x versus y.
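To give a feel for two of the functions listed above, the following sketch draws a bar chart and a scatter plot in separate figures. The subject names, marks and point coordinates are hypothetical.

```python
import matplotlib.pyplot as plt

# Hypothetical data for the bar chart
subjects = ["Maths", "Science", "English"]
marks = [78, 85, 69]

plt.figure()                      # first figure: bar chart
bars = plt.bar(subjects, marks)   # one bar per subject, height taken from marks
plt.title("Bar chart")

plt.figure()                      # second figure: scatter plot
points = plt.scatter([1, 2, 3, 4], [2, 3, 5, 7])   # one point per (x, y) pair
plt.title("Scatter plot")

plt.show()
```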


Customisation of Plots
The pyplot module provides numerous functions that can be used to customise charts, such as adding titles or legends. Some of the customisation options are listed in the table below.

List of Pyplot functions to customise plots

Function                                Description
grid([b, which, axis])                  Configure the grid lines.
legend(*args, **kwargs)                 Place a legend on the axes.
savefig(*args, **kwargs)                Save the current figure.
show(*args, **kw)                       Display all figures.
title(label[, fontdict, loc, pad])      Set a title for the axes.
xlabel(xlabel[, fontdict, labelpad])    Set the label for the x-axis.
xticks([ticks, labels])                 Get or set the current tick locations and labels of the x-axis.
ylabel(ylabel[, fontdict, labelpad])    Set the label for the y-axis.
yticks([ticks, labels])                 Get or set the current tick locations and labels of the y-axis.
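The table mentions savefig(), which is not used elsewhere in this tutorial. A small sketch of saving a chart to an image file could look like the following; the file name squares.png, the temporary folder and the data are assumptions made for the example.

```python
import os
import tempfile

import matplotlib.pyplot as plt

plt.plot([1, 2, 3], [1, 4, 9])
plt.title("Squares")
# Replace the numeric ticks 1, 2, 3 on the x axis with text labels
plt.xticks([1, 2, 3], ["one", "two", "three"])

# Hypothetical output location: a PNG file in the system's temporary folder
out_path = os.path.join(tempfile.gettempdir(), "squares.png")
plt.savefig(out_path)   # save the current figure before showing it

plt.show()
```

Calling savefig() before show() is the safer order, since with some interactive backends the figure may no longer be available once its window has been closed.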


Program 4-2 Plotting a line chart of date versus temperature, adding labels on the x and y axes, a title and grid lines to the chart.

Answer :

import matplotlib.pyplot as plt
date=["25/12","26/12","27/12"]
temp=[8.5,10.5,6.8]
plt.plot(date, temp)
plt.xlabel("Date") #add the Label on x-axis
plt.ylabel("Temperature") #add the Label on y-axis
plt.title("Date wise Temperature") #add the title to the chart
plt.grid(True) #add gridlines to the background
plt.yticks(temp)
plt.show()



Marker

We can make certain other changes to plots by passing various parameters to the plot() function. In the chart above, we plotted temperatures day-wise. It is also possible to mark each point on the line with a marker.

A marker is any symbol that represents a data value in a line chart or a scatter plot. Table 4.3 shows a list of markers along with their corresponding symbol and description. These markers can be used in program code.
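As a small sketch, the temperature chart can be drawn with a circle marker at each data point by passing the marker parameter to plot(). The marker code "o" and the marker size of 8 are choices made for this example.

```python
import matplotlib.pyplot as plt

date = ["25/12", "26/12", "27/12"]
temp = [8.5, 10.5, 6.8]

# marker="o" draws a circle at every (date, temp) point on the line
line, = plt.plot(date, temp, marker="o", markersize=8)

plt.show()
```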

Colour
It is also possible to format the plot further by changing the colour of the plotted data. Table 4.4 shows the list of colours that are supported. We can use either the character codes or the colour names as values of the color parameter of plot().

Colour abbreviations for plotting

Character    Colour
'b'          blue
'g'          green
'r'          red
'c'          cyan
'm'          magenta
'y'          yellow
'k'          black
'w'          white
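The two ways of giving a colour can be sketched as follows: one line uses a character code and the other a colour name. The data values are made up for the example.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3]

line_code, = plt.plot(x, [1, 2, 3], color="r")      # character code for red
line_name, = plt.plot(x, [2, 3, 4], color="blue")   # colour given by name

plt.show()
```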


Linewidth and Line Style
The linewidth and linestyle properties can be used to change the width and the style of the line in a line chart.
Linewidth is specified in points. In current Matplotlib versions the default line width is 1.5 points, so a larger number gives a thicker line and a smaller number a thinner one. We can also set the line style of a line chart using the linestyle parameter. It can take a string such as "solid", "dotted", "dashed" or "dashdot".
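To compare the four line styles named above, the sketch below draws one line per style with the same line width. The x values and the vertical offsets are arbitrary and only keep the lines apart.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
styles = ["solid", "dotted", "dashed", "dashdot"]

lines = []
for offset, style in enumerate(styles):
    # Shift each line upwards so that the styles do not overlap
    y = [value + offset for value in x]
    line, = plt.plot(x, y, linestyle=style, linewidth=2, label=style)
    lines.append(line)

plt.legend()   # shows which style belongs to which line
plt.show()
```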

Program - Consider the average heights and weights of persons aged 8 to 16 stored in the following two lists:

height = [121.9,124.5,129.5,134.6,139.7,147.3, 152.4, 157.5,162.6]

weight= [19.7,21.3,23.5,25.9,28.5,32.1,35.7,39.6, 43.2]

Let us plot a line chart where:

i. x axis will represent weight
ii. y axis will represent height
iii. x axis label should be “Weight in kg”
iv. y axis label should be “Height in cm”
v. colour of the line should be green
vi. use * as marker
vii. Marker size as 10
viii. The title of the chart should be “Average weight with respect to average height”.
ix. Line style should be dashed
x. Linewidth should be 2.

Answer

import matplotlib.pyplot as plt

import pandas as pd

height=[121.9,124.5,129.5,134.6,139.7,147.3,152.4,157.5,162.6]

weight=[19.7,21.3,23.5,25.9,28.5,32.1,35.7,39.6,43.2]

df=pd.DataFrame({"height":height,"weight":weight})

#Set xlabel for the plot

plt.xlabel('Weight in kg')

#Set ylabel for the plot

plt.ylabel('Height in cm')

#Set chart title:

plt.title('Average weight with respect to average height')

#plot a dashed green line of width 2 with '*' markers of size 10

plt.plot(df.weight,df.height,marker='*',markersize=10,color='green',linewidth=2, linestyle='dashed')

plt.show()


The Pandas Plot function (Pandas Visualisation)

We learnt that the plot() function of the pyplot module of matplotlib can be used to plot a chart. However, starting from version 0.17.0, Pandas objects Series and DataFrame come equipped with their own .plot() methods. This plot() method is just a simple wrapper around the plot() function of pyplot. Thus, if we have a Series or DataFrame object (say, s or df), we can call the plot method by writing s.plot() or df.plot().

The plot() method of Pandas accepts a considerable number of arguments that can be used to plot a variety of graphs. It allows customising different plot types by supplying the kind keyword argument. The general syntax is df.plot(kind='...'), where kind accepts a string indicating the type of plot, as listed in the table below. In addition, we can use the matplotlib.pyplot functions along with the plot() method of Pandas objects.

Arguments accepted by kind for different plots

kind       Plot type
line       Line plot (default)
bar        Vertical bar plot
barh       Horizontal bar plot
hist       Histogram
box        Boxplot
area       Area plot
pie        Pie plot
scatter    Scatter plot
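As a short sketch of the kind argument, the following draws a horizontal bar plot from a small DataFrame and then uses a pyplot function to add a title. The DataFrame contents are hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sales figures for three days and two weeks
df = pd.DataFrame(
    {"Week 1": [5000, 5900, 6500], "Week 2": [4000, 3000, 5000]},
    index=["Monday", "Tuesday", "Wednesday"],
)

# kind="barh" selects a horizontal bar plot; plot() returns the plotting area
ax = df.plot(kind="barh")

# pyplot functions can still be used on a chart drawn by Pandas
plt.title("Sales by day")

plt.show()
```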

Plotting a Line chart

A line chart joins successive data points with line segments. It is used to show a continuous dataset, typically to visualise growth or decline in data over a time interval. We have already plotted line charts in the earlier programs. In this section, we will learn to plot a line chart for data stored in a DataFrame.

Program - Smile NGO has participated in a three-week cultural mela. Using Pandas, they have stored the sales (in Rs) made day-wise for every week in a CSV file named “MelaSales.csv”, as shown in the table below.

Day-wise mela sales data

Week 1    Week 2    Week 3
5000      4000      4000
5900      3000      5800
6500      5000      3500
3500      5500      2500
4000      3000      3000
5300      300       5300
7900      5900      6000

Depict the sales for the three weeks using a line chart. It should have the following:

i. Chart title as “Mela Sales Report”.

ii. x axis label as “Days”.

iii. y axis label as “Sales in Rs”.

iv. Line colours as red for week 1, blue for week 2 and brown for week 3.

import pandas as pd
import matplotlib.pyplot as plt
# reads "MelaSales.csv" to df by giving path to the file
df=pd.read_csv("MelaSales.csv")
#create a line plot of different color for each week
df.plot(kind='line', color=['red','blue','brown'])
# Set title to "Mela Sales Report"
plt.title('Mela Sales Report')
# Label x axis as "Days"
plt.xlabel('Days')
# Label y axis as "Sales in Rs"
plt.ylabel('Sales in Rs')
#Display the figure
plt.show()
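If the CSV file is not at hand, the same chart can be sketched by building the DataFrame directly in code. This is an assumption about how one might try the program out, not part of the original exercise; the figures are copied from the day-wise sales table in this tutorial.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sales figures copied from the day-wise mela sales table
df = pd.DataFrame({
    "Week 1": [5000, 5900, 6500, 3500, 4000, 5300, 7900],
    "Week 2": [4000, 3000, 5000, 5500, 3000, 300, 5900],
    "Week 3": [4000, 5800, 3500, 2500, 3000, 5300, 6000],
})

# One line per column, coloured in the order of the list
ax = df.plot(kind="line", color=["red", "blue", "brown"])
plt.title("Mela Sales Report")
plt.xlabel("Days")
plt.ylabel("Sales in Rs")

plt.show()
```

Calling df.to_csv("MelaSales.csv", index=False) on this DataFrame would write the data to a file that read_csv() can load.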

Customising Line Plot
We can substitute the ticks on the x axis with a list of values of our choice by using plt.xticks(ticks, labels), where ticks is a list of locations on the x axis at which ticks should be placed and labels is a list of items to place at those ticks.

Program - Assuming the same CSV file, i.e., MelaSales.csv, plot the line chart with the following customisations:
Marker = "*"
Marker size = 10
Line style = "--"
Linewidth = 3
Answer:
import pandas as pd
import matplotlib.pyplot as plt
df=pd.read_csv("MelaSales.csv")
#creates plot of different color for each week
df.plot(kind='line', color=['red','blue','brown'],marker="*",markersize=10,linewidth=3,linestyle="--")
plt.title('Mela Sales Report')
plt.xlabel('Days')
plt.ylabel('Sales in Rs')
#store converted index of DataFrame to a list
ticks = df.index.tolist()
#displays corresponding day on x axis
#(this assumes the CSV file has a "Day" column holding the day names)
plt.xticks(ticks,df.Day)
plt.show()

Plotting Bar Chart
The line plot above shows that the sales for all the weeks increased during the weekend. Other than weekends, it also shows that the sales increased on Wednesday for Week 1, on Thursday for Week 2 and on Tuesday for Week 3. However, the lines do not depict the comparison between the weeks efficiently. To show comparisons, we prefer bar charts. Unlike line plots, bar charts can plot strings on the x axis. To plot a bar chart, we specify kind='bar'. We can also specify the DataFrame columns to be used as the x and y axes.
Let us now add a column “Day” consisting of day names to “MelaSales.csv”.

Day-wise sales data along with day names

Week 1    Week 2    Week 3    Day
5000      4000      4000      Monday
5900      3000      5800      Tuesday
6500      5000      3500      Wednesday
3500      5500      2500      Thursday
4000      3000      3000      Friday
5300      300       5300      Saturday
7900      5900      6000      Sunday


Program 4-6 A Python script to display a bar plot for the “MelaSales.csv” file with the column Day on the x axis.

Answer:

 

import pandas as pd

df= pd.read_csv('MelaSales.csv')

import matplotlib.pyplot as plt

# plots a bar chart with the column "Day" as x axis and sets the chart title

df.plot(kind='bar',x='Day',title='Mela Sales Report')

#set ylabel

plt.ylabel('Sales in Rs')

plt.show()

Customising Bar Chart
We can also customise the bar chart by adding certain parameters to the plot function. We can control the edgecolor of the bars, as well as the linestyle and linewidth of their edges. We can also control the colour of the bars. The following example shows various customisations on the bar chart plotted above:


Program- Let us write a Python script to display Bar plot for the “MelaSales.csv” file with column Day on x axis, and having the following customisation:
● Changing the color of each bar to red, yellow and purple.
● Edgecolor to green
● Linewidth as 2
● Line style as "--"
Answer
import pandas as pd
import matplotlib.pyplot as plt
df= pd.read_csv('MelaSales.csv')
# plots a bar chart with the column "Day" as x axis
df.plot(kind='bar',x='Day',title='Mela Sales Report',color=['red','yellow','purple'],edgecolor='Green',linewidth=2,linestyle='--')
#set ylabel
plt.ylabel('Sales in Rs')
plt.show()

Plotting Histogram
Histograms are column charts in which each column represents a range of values, and the height of a column corresponds to how many values fall in that range. To make a histogram, the data is sorted into "bins" and the number of data points in each bin is counted. The height of each column in the histogram is then proportional to the number of data points its bin contains. By default, df.plot(kind='hist') divides the range of values in the data into 10 bins of equal width.
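The counting step can be sketched without drawing anything by using NumPy's histogram() function, which returns the number of values in each bin along with the bin edges. The height values are the ones used in the program of this section, and the choice of 5 bins is an assumption made to keep the output short.

```python
import numpy as np

heights = [60, 61, 63, 65, 61, 60]

# Split the range 60 to 65 into 5 equal-width bins and count the values in each
counts, edges = np.histogram(heights, bins=5)

print(edges.tolist())    # [60.0, 61.0, 62.0, 63.0, 64.0, 65.0]
print(counts.tolist())   # [2, 2, 0, 1, 1]
```

Each count is the height of one column: two values fall in the first bin, two in the second, none in the third, and so on. The last bin also includes its right edge, which is why 65 is counted there.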
Program 

import pandas as pd
import matplotlib.pyplot as plt
data = {'Name':['Arnav', 'Sheela', 'Azhar', 'Bincy', 'Yash','Nazar'],
        'Height' : [60,61,63,65,61,60],
        'Weight' : [47,89,52,58,50,47]}
df=pd.DataFrame(data)
df.plot(kind='hist')
plt.show()


It is also possible to set a value for the bins parameter, for example:
df.plot(kind='hist',bins=20)
df.plot(kind='hist',bins=[18,19,20,21,22])
df.plot(kind='hist',bins=range(18,25))
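For the height and weight data used in this section, bin edges that cover the values 47 to 89 are needed for the columns to show up. The sketch below picks edges 40, 50, ..., 90; these edges are an assumption chosen to fit the data, not values from the original exercise.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({"Height": [60, 61, 63, 65, 61, 60],
                   "Weight": [47, 89, 52, 58, 50, 47]})

# Bin edges 40, 50, ..., 90 give five bins, each 10 units wide
bin_edges = list(range(40, 100, 10))

# Both columns are counted into the same five bins
ax = df.plot(kind="hist", bins=bin_edges)

plt.show()
```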

Customising Histogram

Taking the same data as above, let us now see how the histogram can be customised. Let us change the edgecolor, which is the border of each bar, to green. Also, let us change the line style to ":" and the line width to 2. Let us try another property called fill, which takes boolean values. The default True means each bar will be filled with colour, and False means each bar will be empty. Another property called hatch can be used to fill each bar with a pattern ('-', '+', 'x', '\\', '*', 'o', 'O', '.'). In Program 4-10, we have used the hatch value "o".

Program 

import pandas as pd
import matplotlib.pyplot as plt
data = {'Name':['Arnav', 'Sheela', 'Azhar','Bincy','Yash','Nazar'],
        'Height' : [60,61,63,65,61,60],
        'Weight' : [47,89,52,58,50,47]}
df=pd.DataFrame(data)
df.plot(kind='hist',edgecolor='Green',linewidth=2,linestyle=':',fill=False,hatch='o')
plt.show()
 

 


