Creating violin plots using SAS and Python

  • Date August 30, 2019
  • Written by Pavel Rogatch
  • Category Python

This guide shows different ways of creating violin plots using SAS9API and Python.

Introduction

Violin plots are similar to box plots except that they also show the probability density of the data (usually smoothed by a kernel density estimator). Violin plots can be symmetric or asymmetric. They are useful for comparing the shape of distributions across groups, for example to spot several peaks that a box plot would hide.

There are several ways to create a violin plot using SAS and Python. Here we demonstrate how to create violin plots using the cars and bweight datasets from the SASHELP library. The five examples below show how to create violin plots from SAS data with three Python libraries: matplotlib, seaborn and plotly.

Prerequisites

To follow the examples presented here you need to have the following:

  • access to a SAS9API proxy,
  • Python 3 installed,
  • the sas9api, pandas, matplotlib, seaborn and plotly Python libraries.

Step 1 – Getting the necessary libraries

The Python sas9api library is available at https://github.com/analytium/python-sas9api. We need to download the sas9api.py file to our local computer. This can be done in either of two ways:

  • using the following command (if you have Git installed): git clone https://github.com/analytium/python-sas9api to download the repository to a designated folder on a local computer;
  • going to https://github.com/analytium/python-sas9api/blob/master/sas9api.py, right-clicking the Raw button in the top right and choosing “Save target as” to save the sas9api.py file to a local computer.

We need to place the sas9api.py file in the same folder as our Python code, or provide a path to it, for the import to succeed.
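If sas9api.py lives in a different folder, one way to "provide a path to it" is to add that folder to sys.path before importing. The sketch below is only an illustration: the helper add_module_dir and the folder name python-sas9api are assumptions, not part of the library.

```python
import importlib.util
import sys
from pathlib import Path


def add_module_dir(folder):
    """Put `folder` on sys.path so that `import sas9api` can find sas9api.py there."""
    folder = str(Path(folder).resolve())
    if folder not in sys.path:
        sys.path.insert(0, folder)
    return folder


# Hypothetical location of the cloned repository; adjust to your setup.
add_module_dir("python-sas9api")

# True only if sas9api.py is actually found on the import path.
sas9api_available = importlib.util.find_spec("sas9api") is not None
print("sas9api importable:", sas9api_available)
```

If the script prints False, check that the folder really contains sas9api.py.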

If you do not have the pandas, matplotlib, seaborn or plotly libraries installed, please follow the installation instructions on their respective websites.

Now everything is ready to write the code.

Step 2 – Importing the necessary libraries

First we import the libraries used in the examples.

Step 3 – Specifying SAS9API URL and port

We assign our SAS server URL and port to a variable that will be used to access data in SAS libraries.
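As a minimal sketch, the assignment could look like this. The host name and port are placeholders (assumptions), and the exact URL form your SAS9API proxy expects may differ.

```python
# Placeholder host and port (assumptions): replace with your SAS9API proxy address.
host = "http://your-sas9api-host"
port = 8080

# Single variable holding both the URL and the port, reused in later steps.
url = f"{host}:{port}"
print(url)
```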

Step 4 – Retrieving data from a SAS server

Now let us retrieve data from two SAS datasets using the retrieve_data function from the sas9api Python library.

We pass the following parameters to this function:

  • url – your SAS server URL defined earlier,
  • library_name – SAS library name,
  • dataset_name – SAS dataset name,
  • limit – maximum number of records to retrieve,
  • only_payload – a flag set to True to get the data as a list of dictionaries containing only the dataset records, without the response header.

Then we convert each dataset to a pandas DataFrame and display its first rows.
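The retrieval and conversion described above can be sketched as follows. The parameter names come from the list above, but passing them as keyword arguments, the helper name fetch_dataset, the default limit of 10000 and the upper-case dataset names are assumptions. The small sample_records list reuses two rows from the cars output to show the conversion step without a server.

```python
import pandas as pd


def fetch_dataset(url, library_name, dataset_name, limit=10000):
    """Retrieve a SAS dataset through SAS9API and return it as a pandas DataFrame."""
    import sas9api  # imported here so the rest of the sketch runs without it

    records = sas9api.retrieve_data(
        url=url,
        library_name=library_name,
        dataset_name=dataset_name,
        limit=limit,          # assumed default; pick a value that covers your data
        only_payload=True,    # list of record dictionaries, no response header
    )
    return pd.DataFrame(records)


# With a reachable proxy, the calls would look like this:
# cars_df = fetch_dataset(url, "SASHELP", "CARS")
# bweight_df = fetch_dataset(url, "SASHELP", "BWEIGHT")
# cars_df.head()

# The conversion step on its own: a list of dictionaries becomes a DataFrame.
sample_records = [
    {"Make": "Acura", "Model": "MDX", "Type": "SUV", "Weight": 4451.0},
    {"Make": "Acura", "Model": "RSX Type S 2dr", "Type": "Sedan", "Weight": 2778.0},
]
sample_df = pd.DataFrame(sample_records)
```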

 

   Cylinders  DriveTrain  EngineSize  Horsepower  Invoice  Length  MPG_City  MPG_Highway  MSRP   Make   Model           Origin  Type   Weight  Wheelbase
0  6.0        All         3.5         265.0       33337    189.0   17.0      23.0         36945  Acura  MDX             Asia    SUV    4451.0  106.0
1  4.0        Front       2.0         200.0       21761    172.0   24.0      31.0         23820  Acura  RSX Type S 2dr  Asia    Sedan  2778.0  101.0
2  4.0        Front       2.4         200.0       24647    183.0   22.0      29.0         26990  Acura  TSX 4dr         Asia    Sedan  3230.0  105.0
3  6.0        Front       3.2         270.0       30299    186.0   20.0      28.0         33195  Acura  TL 4dr          Asia    Sedan  3575.0  108.0
4  6.0        Front       3.5         225.0       39014    197.0   18.0      24.0         43755  Acura  3.5 RL 4dr      Asia    Sedan  3880.0  115.0

 

   Black  Boy  CigsPerDay  Married  MomAge  MomEdLevel  MomSmoke  MomWtGain  Visit  Weight
0  0.0    1.0  0.0         1.0      -3.0    0.0         0.0       -16.0      1.0    4111.0
1  0.0    0.0  0.0         1.0      1.0     2.0         0.0       2.0        3.0    3997.0
2  0.0    1.0  0.0         1.0      0.0     0.0         0.0       -3.0       3.0    3572.0
3  0.0    1.0  0.0         1.0      -1.0    2.0         0.0       -5.0       3.0    1956.0
4  0.0    1.0  0.0         1.0      -6.0    0.0         0.0       -20.0      3.0    3515.0

 

Now let us create violin plots in several ways.

Step 5 – Creating violin plots

Example 1 (using the matplotlib library): violin plots of Weight for each Type of car.
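A minimal sketch of such a plot is shown below. It is not the author's original code. To stay runnable without a SAS server it builds a small synthetic stand-in for the cars DataFrame (an assumption, not real SASHELP data); with real data you would use the DataFrame from Step 4 instead. matplotlib's violinplot expects one array of values per violin, so the weights are grouped by Type first.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for cars_df (NOT real SASHELP data).
rng = np.random.default_rng(0)
cars_df = pd.DataFrame({
    "Type": np.repeat(["SUV", "Sedan", "Sports"], 30),
    "Weight": np.concatenate([
        rng.normal(4400, 500, 30),
        rng.normal(3300, 400, 30),
        rng.normal(3100, 350, 30),
    ]),
})

# One array of weights per car type, in a fixed order.
types = sorted(cars_df["Type"].unique())
weights_by_type = [
    cars_df.loc[cars_df["Type"] == t, "Weight"].dropna().to_numpy()
    for t in types
]

fig, ax = plt.subplots(figsize=(8, 5))
parts = ax.violinplot(weights_by_type, showmedians=True)

# Violins are drawn at positions 1..n, so the tick labels go there.
ax.set_xticks(range(1, len(types) + 1))
ax.set_xticklabels(types)
ax.set_xlabel("Type")
ax.set_ylabel("Weight")
ax.set_title("Weight by car type")
fig.savefig("violin_matplotlib_weight.png")
```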

Example 2 (using the seaborn library): violin plots of MPG_Highway for each Type of car.
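With seaborn the grouping is done by the library, so a sketch of this example only needs the column names. As before, the DataFrame here is a synthetic stand-in (an assumption, not real SASHELP data), and the code is an illustration rather than the author's original.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for cars_df (NOT real SASHELP data).
rng = np.random.default_rng(1)
cars_df = pd.DataFrame({
    "Type": np.repeat(["SUV", "Sedan", "Sports"], 30),
    "MPG_Highway": np.concatenate([
        rng.normal(21, 3, 30),
        rng.normal(29, 4, 30),
        rng.normal(25, 3, 30),
    ]),
})

fig, ax = plt.subplots(figsize=(8, 5))
# One violin per value of Type; seaborn labels the axes from the column names.
sns.violinplot(data=cars_df, x="Type", y="MPG_Highway", ax=ax)
ax.set_title("Highway MPG by car type")
fig.savefig("violin_seaborn_mpg.png")
```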

Example 3 (using the seaborn library): violin plots of the weights of newborn babies depending on their sex and their mothers’ smoking habits.

Here we use hue nesting with the variable MomSmoke, which has two levels, and set ‘split’ to True to draw half of a violin for each level. This makes it easier to compare the distributions for babies with smoking and non-smoking mothers directly.
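The hue nesting and split option described above can be sketched as follows. The data is a synthetic stand-in for the bweight DataFrame (an assumption, not real SASHELP data). Mapping the 0/1 codes of Boy and MomSmoke to text labels is also an assumption about their meaning, made only to get a readable legend.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for bweight_df (NOT real SASHELP data).
rng = np.random.default_rng(2)
n = 200
bweight_df = pd.DataFrame({
    "Boy": rng.integers(0, 2, n),
    "MomSmoke": rng.integers(0, 2, n),
    "Weight": rng.normal(3400, 500, n),
})

# Assumed meaning of the 0/1 codes, used only for readable labels.
bweight_df["Sex"] = bweight_df["Boy"].map({0: "Girl", 1: "Boy"})
bweight_df["Mother smokes"] = bweight_df["MomSmoke"].map({0: "No", 1: "Yes"})

fig, ax = plt.subplots(figsize=(8, 5))
# hue must have exactly two levels for split=True: each level gets half a violin.
sns.violinplot(
    data=bweight_df,
    x="Sex",
    y="Weight",
    hue="Mother smokes",
    split=True,
    ax=ax,
)
ax.set_title("Birth weight by sex and mother's smoking")
fig.savefig("violin_seaborn_split.png")
```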

Example 4 (using the plotly library): violin plots of invoice prices for cars depending on their origin.

Hovering the mouse over the plots shows details about the individual data points.

Example 5 (using the plotly library): violin plots of the weights of newborn babies depending on their mothers’ smoking habits.

Here we draw violins on top of each other to compare distributions for babies with smoking and non-smoking mothers.

Conclusion

We have shown several ways to create violin plots from data stored on a SAS server. SAS9API enables you to access your SAS data and use it with different programming languages as needed, which gives you more flexibility in your analytical work. Here we have used Python, but if you want to learn about using SAS with R through SAS9API, please refer to https://sas9api.io/examples/r-violin-plot/.