Lesson 10
Box plots and violin plots
Objective
By the end of this lesson, students will be able to create box plots for statistical summaries and violin plots for distribution visualization using Matplotlib, and to customize and interpret both kinds of plot.
1. Understanding box plots (boxplot()) for statistical visualization:
A box plot (also known as a box-and-whisker plot) provides a graphical summary of data through its five-number summary:
- Minimum
- First quartile (Q1)
- Median (Q2)
- Third quartile (Q3)
- Maximum
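Additionally, box plots highlight outliers in the data. In Matplotlib, each whisker extends by default to the most extreme data point within 1.5 × IQR (the interquartile range, Q3 - Q1) of the box, and any points beyond the whiskers are drawn individually as outliers.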
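To make the five-number summary concrete, the sketch below computes it with NumPy for a small hypothetical dataset that contains one unusually large value. It also applies the 1.5 × IQR rule (IQR = Q3 - Q1), the default rule Matplotlib's boxplot() uses to decide which points are outliers. The quartiles come from np.percentile with its default linear interpolation; other quartile conventions can give slightly different values.

```python
import numpy as np

# Hypothetical sample data with one unusually large value (15)
data = np.array([2, 3, 3, 4, 4, 5, 5, 6, 6, 15])

# Five-number summary (np.percentile interpolates linearly by default)
minimum = data.min()
q1, median, q3 = np.percentile(data, [25, 50, 75])  # Q1 = 3.25, median = 4.5, Q3 = 5.75
maximum = data.max()
print(minimum, q1, median, q3, maximum)

# Interquartile range and the 1.5 * IQR "fences" used to flag outliers
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = data[(data < lower_fence) | (data > upper_fence)]
print(iqr, lower_fence, upper_fence, outliers)
```

Here the fences are -0.5 and 9.5, so only the value 15 is flagged as an outlier, while the maximum of the five-number summary is still 15.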
Syntax:
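plt.boxplot(data, vert=True, patch_artist=False)
- data : The input data: a list or 1-D array for a single box, or a list of arrays (or a 2-D array, one box per column) for several boxes side by side.
- vert : A boolean that sets the orientation: True (the default) draws vertical boxes, False draws horizontal ones.
- patch_artist : A boolean that controls whether the boxes are drawn as filled patches that can be colored (True) or as outlines only (False, the default).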
Example: Basic box plot
```python
import matplotlib.pyplot as plt
import numpy as np

# Sample data: 100 data points from a standard normal distribution
data = np.random.normal(0, 1, 100)

# Create a box plot
plt.boxplot(data)

# Add title and labels
plt.title('Box Plot Example')
plt.ylabel('Values')

# Show the plot
plt.show()
```
In this example:
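- The boxplot() function draws a single box that summarizes the distribution and spread of the data.
- The box spans Q1 to Q3, the horizontal line inside it marks the median, and the whiskers extend to the most extreme data points within 1.5 × IQR of the box; points beyond the whiskers are drawn individually as outliers.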
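The whiskers do not necessarily reach the minimum and maximum of the data. By default (whis=1.5), Matplotlib stops each whisker at the most extreme data point within 1.5 × IQR of the box and draws anything beyond it as a separate "flier" marker. The sketch below checks this by reading values back from the dictionary that boxplot() returns. The data are hypothetical, and reading the whisker tip from index 1 of each whisker line's y-data assumes how current Matplotlib versions build those lines.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample with one clear outlier (15)
data = np.array([2, 3, 3, 4, 4, 5, 5, 6, 6, 15])

# boxplot() returns a dict of the artists it drew
result = plt.boxplot(data)

# Each whisker is a line from a box edge to the whisker tip;
# assumption: index 1 of its y-data is the tip
lower_whisker = result['whiskers'][0].get_ydata()[1]
upper_whisker = result['whiskers'][1].get_ydata()[1]

# Points beyond the whiskers are drawn as "fliers" (outliers)
fliers = result['fliers'][0].get_ydata()

# Both ends of the median line sit at the median value
median = result['medians'][0].get_ydata()[0]

print(lower_whisker, upper_whisker, fliers, median)
plt.title('Whiskers and Outliers')
plt.show()
```

The upper whisker stops at 6, the largest value inside the 1.5 × IQR range, and 15 is plotted on its own as an outlier.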
2. Customizing box plots:
Box plots can be customized in terms of orientation, color, and other visual aspects.
a. Horizontal box plot:
You can create horizontal box plots by setting the vert parameter to False.
Example:
```python
import matplotlib.pyplot as plt
import numpy as np

# Sample data
data = np.random.normal(0, 1, 100)

# Create a horizontal box plot
plt.boxplot(data, vert=False)

# Add title and labels
plt.title('Horizontal Box Plot')
plt.xlabel('Values')

# Show the plot
plt.show()
```
b. Customizing box colors:
You can fill the boxes with color by using the patch_artist=True parameter.
Example:
```python
import matplotlib.pyplot as plt
import numpy as np

# Sample data
data = np.random.normal(0, 1, 100)

# Create a box plot whose boxes are fillable patches
box = plt.boxplot(data, patch_artist=True)

# Customize the box color
for patch in box['boxes']:
    patch.set_facecolor('lightblue')

# Add title and labels
plt.title('Box Plot with Custom Colors')
plt.ylabel('Values')

# Show the plot
plt.show()
```
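In this example, the box is filled with light blue. The patch_artist=True argument is what makes this possible: without it, the boxes are drawn as plain outlines that have no fill to color.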
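The same loop over box['boxes'] extends to several boxes. Passing a list of arrays to boxplot() draws one box per array, and zipping the boxes with a list of colors fills each one differently. The three groups, their tick labels, and the colors below are hypothetical, chosen only for illustration; the median lines are recolored through the 'medians' entry of the same returned dictionary.

```python
import matplotlib.pyplot as plt
import numpy as np

# Three hypothetical groups with increasing spread; fixed seed for reproducibility
rng = np.random.default_rng(0)
groups = [rng.normal(0, std, 100) for std in (1, 2, 3)]

# A list of 1-D arrays produces one box per array, at x = 1, 2, 3
box = plt.boxplot(groups, patch_artist=True)

# Give each box its own fill color
colors = ['lightblue', 'lightgreen', 'lightpink']
for patch, color in zip(box['boxes'], colors):
    patch.set_facecolor(color)

# Make the median lines stand out against the fills
for median_line in box['medians']:
    median_line.set_color('black')

plt.xticks([1, 2, 3], ['std = 1', 'std = 2', 'std = 3'])
plt.title('Box Plots with Per-Box Colors')
plt.ylabel('Values')
plt.show()
```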
3. Understanding violin plots for distribution visualization:
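A violin plot combines ideas from a box plot and a density plot. The outline of each violin is a kernel density estimate (KDE) of the data, mirrored on both sides of the axis, so its width shows where values are concentrated. Matplotlib's violinplot() also marks the minimum and maximum by default, and can optionally add lines for the mean, the median, or chosen quantiles, giving some of the summary information a box plot provides.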
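To see what the outline of a violin represents, the sketch below computes a one-dimensional Gaussian kernel density estimate by hand: it places a small Gaussian bump on every data point and averages them. The bandwidth follows Scott's rule of thumb (standard deviation × n^(-1/5)), which is also the default bw_method of Matplotlib's violinplot(). The helper gaussian_kde_1d is hypothetical and written for intuition only; Matplotlib's internal estimator may differ in details.

```python
import numpy as np

def gaussian_kde_1d(data, grid):
    """Hypothetical helper: estimate the density of `data` at each point of `grid`."""
    data = np.asarray(data, dtype=float)
    n = data.size
    # Scott's rule of thumb for the bandwidth in one dimension
    bandwidth = data.std(ddof=1) * n ** (-1 / 5)
    # Standardized distance from every grid point to every sample
    z = (grid[:, None] - data[None, :]) / bandwidth
    # One Gaussian bump per sample, averaged into a single density curve
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.sum(axis=1) / (n * bandwidth)

rng = np.random.default_rng(0)
data = rng.normal(0, 1, 100)
grid = np.linspace(-5, 5, 1001)
density = gaussian_kde_1d(data, grid)

# A density curve should enclose an area of about 1 over a wide enough grid
area = density.sum() * (grid[1] - grid[0])
print(round(area, 3))
```

In a violin plot, the half-width at each value is proportional to this density; Matplotlib scales each violin so that its widest point spans the widths parameter (0.5 by default).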
Syntax:
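plt.violinplot(data, vert=True, showmedians=False)
- data : The input data: a list or 1-D array for a single violin, or a list of arrays for several violins side by side.
- vert : A boolean that sets the orientation: True (the default) draws vertical violins, False draws horizontal ones.
- showmedians : A boolean that adds a line at the median of each violin (False by default).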
Example: Basic violin plot
```python
import matplotlib.pyplot as plt
import numpy as np

# Sample data
data = np.random.normal(0, 1, 100)

# Create a violin plot
plt.violinplot(data)

# Add title and labels
plt.title('Violin Plot Example')
plt.ylabel('Values')

# Show the plot
plt.show()
```
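In this example, the violinplot() function displays the estimated distribution of the data: wider sections mark values where data points are concentrated, and narrower sections mark values where they are sparse. The short horizontal bars at the ends mark the minimum and maximum.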
4. Customizing violin plots:
Violin plots can be customized in various ways, including orientation, showing the median, and adding multiple violins for comparison.
a. Horizontal violin plot:
Similar to box plots, you can make violin plots horizontal by setting vert=False.
Example:
```python
import matplotlib.pyplot as plt
import numpy as np

# Sample data
data = np.random.normal(0, 1, 100)

# Create a horizontal violin plot
plt.violinplot(data, vert=False)

# Add title and labels
plt.title('Horizontal Violin Plot')
plt.xlabel('Values')

# Show the plot
plt.show()
```
b. Showing medians in violin plots:
You can add a line representing the median by using the showmedians=True parameter.
Example:
```python
import matplotlib.pyplot as plt
import numpy as np

# Sample data
data = np.random.normal(0, 1, 100)

# Create a violin plot with the median line
plt.violinplot(data, showmedians=True)

# Add title and labels
plt.title('Violin Plot with Median')
plt.ylabel('Values')

# Show the plot
plt.show()
```
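Violin plots can also be placed side by side for comparison, which works the same way as for box plots: passing a list of arrays to violinplot() draws one violin per array. violinplot() returns a dictionary of artists; its 'bodies' entry lists the filled shapes, which can be recolored one by one, and with showmedians=True a 'cmedians' entry holds all the median lines. The groups, labels, and colors below are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np

# Three hypothetical groups with different centers and spreads
rng = np.random.default_rng(0)
groups = [rng.normal(0, 1, 100), rng.normal(2, 0.5, 100), rng.normal(-1, 2, 100)]

# A list of 1-D arrays gives one violin per array, at x = 1, 2, 3
parts = plt.violinplot(groups, showmedians=True)

# 'bodies' holds the filled density shapes, one per group
for body, color in zip(parts['bodies'], ['lightblue', 'lightgreen', 'lightpink']):
    body.set_facecolor(color)
    body.set_edgecolor('black')
    body.set_alpha(0.8)

# 'cmedians' is a single collection containing every median line
parts['cmedians'].set_color('black')

plt.xticks([1, 2, 3], ['Group A', 'Group B', 'Group C'])
plt.title('Comparing Three Distributions')
plt.ylabel('Values')
plt.show()
```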
5. Box plots vs. violin plots:
Both box plots and violin plots are used for statistical visualization, but they have different strengths:
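- Box plots are better for a compact summary: the median, quartiles, spread, and individual outliers can be read off directly, and many groups fit side by side.
- Violin plots are better for showing the full shape of the distribution, such as skew or multiple peaks, which a box plot cannot reveal.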
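The difference matters most when data have more than one peak. In the sketch below, a hypothetical sample is drawn from two well-separated clusters and plotted both ways side by side. The box plot shows a box centered near zero, while the violin plot reveals that hardly any values lie there; the last lines measure this directly as the share of points within 0.5 of the median.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical bimodal sample: two clusters centered at -2 and +2
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-2, 0.5, 200), rng.normal(2, 0.5, 200)])

fig, (ax_box, ax_violin) = plt.subplots(1, 2, figsize=(8, 4), sharey=True)

# The box plot reduces the data to quartiles, whiskers and outliers
ax_box.boxplot(data)
ax_box.set_title('Box plot')
ax_box.set_ylabel('Values')

# The violin plot draws the estimated density, so both peaks are visible
ax_violin.violinplot(data, showmedians=True)
ax_violin.set_title('Violin plot')

plt.tight_layout()
plt.show()

# Share of points lying within 0.5 of the median (small for this data)
near_median = np.mean(np.abs(data - np.median(data)) < 0.5)
print(near_median)
```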
6. Exercises:
Exercise 1
1. Create a basic box plot. Use the following data: [3, 4, 2, 5, 6, 7, 5, 3, 4, 6]
2. Add titles and labels to the plot.
Exercise 2
1. Create a box plot with the following data: [1, 2, 5, 4, 3, 6, 8, 9, 7, 6]
2. Make the plot horizontal and customize the box color to green.
Exercise 3
1. Create a violin plot. Use the following data: [10, 15, 13, 18, 17, 14, 16, 11, 12]
2. Modify the violin plot to show the median line.
Exercise 4
1. Create both a box plot and a violin plot using the same data: [2, 4, 6, 8, 10, 12, 14, 16, 18]
2. Compare the two visualizations and explain which one provides better insight into the data.
Conclusion
In this lesson, we explored how to create box plots and violin plots using Matplotlib. We learned how to customize their appearance and how to use them to visualize statistical summaries and distributions. In the next lesson, we will explore more advanced visualization techniques using Matplotlib.