Creating violin plots using SAS and Python

  • Date August 30, 2019
  • Written by Pavel Rogatch
  • Category Python

This guide shows different ways of creating violin plots using SAS9API and Python.


Violin plots are similar to box plots except that they also show the probability density (usually smoothed by a kernel density estimator) of the data. Violin plots can be symmetric or asymmetric. They are useful in data analysis when the shape of a distribution matters, for example when comparing groups whose distributions have more than one peak.
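To make the density part concrete, here is a minimal sketch of a Gaussian kernel density estimate written in plain Python. The function name gaussian_kde_1d, the sample values and the bandwidth are illustrative assumptions; plotting libraries choose the bandwidth automatically and may differ in detail.

```python
import math

def gaussian_kde_1d(samples, grid, bandwidth):
    """Estimate a density on `grid` by averaging Gaussian bumps centred on each sample."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    density = []
    for x in grid:
        total = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        density.append(norm * total)
    return density

# Illustrative values (not taken from the SASHELP datasets)
samples = [2.1, 2.4, 2.5, 3.0, 3.2, 3.3, 4.8]
step = 0.05
grid = [i * step for i in range(0, 161)]  # 0.0 .. 8.0
density = gaussian_kde_1d(samples, grid, bandwidth=0.4)

# A violin mirrors this curve around its axis: the half-width at each
# grid value is proportional to the estimated density there.
area = sum(density) * step  # Riemann-sum check that the density integrates to about 1
peak = grid[density.index(max(density))]
```

A smaller bandwidth would give a bumpier outline and a larger one a smoother outline, which is why two libraries can draw slightly different violins from the same data.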

There are several ways to create a violin plot using SAS and Python. Here we demonstrate how to create violin plots using the cars and bweight datasets from the SASHELP library. The five examples below will help you learn how to create violin plots using SAS and different Python libraries: matplotlib, seaborn and plotly.


To follow the examples presented here you need to have the following:

  • access to SAS9API proxy,
  • Python 3 installed,
  • sas9api, pandas, matplotlib, seaborn and plotly Python libraries.

Step 1 – Getting the necessary libraries

The Python sas9api library can be obtained from its repository. We need to download the library file to our local computer. This can be done in different ways:

  • using the following command (if you have Git installed): git clone  to download this repository to a designated folder on a local computer;
  • going to the file's page in the repository, right-clicking on the Raw button in the top right and then choosing “Save target as” to save the file to a local computer.

We need to place this file in the same folder as our Python code, or provide a path to it, for a successful import.
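If the file is kept in a different folder, one way to provide a path is to extend the module search path before importing. This is a minimal sketch; the folder name sas9api_python is a hypothetical placeholder, not a location prescribed by the library.

```python
import sys
from pathlib import Path

# Hypothetical folder holding the downloaded library file; replace with your own location
library_dir = Path.home() / "sas9api_python"

# Add the folder to the module search path only once
if str(library_dir) not in sys.path:
    sys.path.append(str(library_dir))

# After this, the import used in Step 2 can find the file:
# import sas9api as sas
```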

If you do not have the pandas, matplotlib, seaborn and plotly Python libraries installed, please follow the installation instructions in their official documentation.

Now everything is ready to write the code.

Step 2 – Importing the necessary libraries

We need to import the necessary libraries.

import sas9api as sas              # Enables you to connect to a SAS server
import pandas as pd                # Used for a more convenient way to present data
import matplotlib.pyplot as plt    # Python plotting library

Step 3 – Specifying SAS9API URL and port

We assign our SAS server URL and port to a variable that will be used to access data in SAS libraries:

url = "your_SAS9API_url:port"

Step 4 – Retrieving data from a SAS server

Now let us retrieve data from two SAS datasets using the retrieve_data function from the sas9api Python library.

We pass the following parameters to this function:

  • url – your SAS server URL defined earlier,
  • library_name – SAS library name,
  • dataset_name – SAS dataset name,
  • limit – maximum number of records to retrieve,
  • only_payload – a flag set to True to get the data as a list of dictionaries containing only the dataset records, without the response header.

Then we convert data to a Pandas DataFrame and display its first rows.

# Data from the 'cars' dataset
dat = sas.retrieve_data(url, library_name="sashelp", dataset_name="cars", limit=10000, only_payload=True)
cars_df = pd.DataFrame(dat)
cars_df.head()


   Cylinders  DriveTrain  EngineSize  Horsepower  Invoice  Length  MPG_City  MPG_Highway  MSRP   Make   Model           Origin  Type   Weight  Wheelbase
0  6.0        All         3.5         265.0       33337    189.0   17.0      23.0         36945  Acura  MDX             Asia    SUV    4451.0  106.0
1  4.0        Front       2.0         200.0       21761    172.0   24.0      31.0         23820  Acura  RSX Type S 2dr  Asia    Sedan  2778.0  101.0
2  4.0        Front       2.4         200.0       24647    183.0   22.0      29.0         26990  Acura  TSX 4dr         Asia    Sedan  3230.0  105.0
3  6.0        Front       3.2         270.0       30299    186.0   20.0      28.0         33195  Acura  TL 4dr          Asia    Sedan  3575.0  108.0
4  6.0        Front       3.5         225.0       39014    197.0   18.0      24.0         43755  Acura  3.5 RL 4dr      Asia    Sedan  3880.0  115.0

# Data from the 'bweight' dataset
# (server_name is passed here in addition to the parameters described above)
bweight_dat = sas.retrieve_data(url, library_name="sashelp", dataset_name="bweight",
                                server_name="SASApp", limit=10000, only_payload=True)
bweight_df = pd.DataFrame(bweight_dat)
bweight_df.head()


   Black  Boy  CigsPerDay  Married  MomAge  MomEdLevel  MomSmoke  MomWtGain  Visit  Weight
0  0.0    1.0  0.0         1.0      -3.0    0.0         0.0       -16.0      1.0    4111.0
1  0.0    0.0  0.0         1.0      1.0     2.0         0.0       2.0        3.0    3997.0
2  0.0    1.0  0.0         1.0      0.0     0.0         0.0       -3.0       3.0    3572.0
3  0.0    1.0  0.0         1.0      -1.0    2.0         0.0       -5.0       3.0    1956.0
4  0.0    1.0  0.0         1.0      -6.0    0.0         0.0       -20.0      3.0    3515.0
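
The conversion step can be tried without access to a SAS server. In the sketch below, a hand-written list of dictionaries stands in for the value returned by retrieve_data with only_payload=True. The variable names and the reduced set of columns are assumptions made for illustration.

```python
import pandas as pd

# Hand-written stand-in for the retrieved payload; the two records reuse
# values from the 'cars' output shown above
payload = [
    {"Make": "Acura", "Model": "MDX", "Type": "SUV", "Origin": "Asia", "Weight": 4451.0},
    {"Make": "Acura", "Model": "TSX 4dr", "Type": "Sedan", "Origin": "Asia", "Weight": 3230.0},
]

# Each dictionary becomes one row; dictionary keys become column names
df = pd.DataFrame(payload)

n_rows, n_cols = df.shape
column_names = list(df.columns)
mean_weight = df["Weight"].mean()
```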


Now let us create violin plots in several different ways.

Step 5 – Creating violin plots

Example 1 (using matplotlib library): violin plots of Weight for each Type of car.

# Prepare a list of weight distributions, one per car type
car_weights = []
for car in cars_df["Type"].unique():
    car_weights.append(list(cars_df[cars_df["Type"] == car]["Weight"]))

# Create a figure instance
fig = plt.figure()

# Create an axes instance
ax = fig.add_axes([0,0,1,1])

# Create the violin plot
ax.violinplot(car_weights, showmeans=True)

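# Customize axes: violins sit at positions 1..N, so label those ticks with the car types
labels = cars_df["Type"].unique()
ax.set_xticks(list(range(1, len(labels) + 1)))
ax.set_xticklabels(labels)
ax.set_xlabel('Vehicle Type')
ax.set_title("Weight of Vehicles")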
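
The same plot can be tried on a small made-up sample. The following self-contained sketch builds the per-type lists with groupby instead of a loop and places the type names on the x-axis ticks. The demo_df values are invented for illustration, and the Agg backend is chosen only so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Small made-up sample standing in for cars_df
demo_df = pd.DataFrame({
    "Type": ["SUV", "SUV", "SUV", "Sedan", "Sedan", "Sedan", "Truck", "Truck", "Truck"],
    "Weight": [4451.0, 4700.0, 4200.0, 2778.0, 3230.0, 3575.0, 4300.0, 5100.0, 4800.0],
})

# sort=False keeps the groups in order of first appearance, like unique()
groups = demo_df.groupby("Type", sort=False)["Weight"]
labels = [name for name, _ in groups]
weights = [values.tolist() for _, values in groups]

fig, ax = plt.subplots()
parts = ax.violinplot(weights, showmeans=True)

# Violins are drawn at positions 1..N by default, so the ticks go there
ax.set_xticks(list(range(1, len(labels) + 1)))
ax.set_xticklabels(labels)
ax.set_xlabel("Vehicle Type")
ax.set_ylabel("Weight")
ax.set_title("Weight of Vehicles")

tick_texts = [t.get_text() for t in ax.get_xticklabels()]
```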

Example 2 (using seaborn library): violin plots of MPG_Highway for each Type of car.

# Import 'seaborn' library.
import seaborn as sns

# Create violin plot using 'cars' dataset
ax = sns.violinplot(x="Type", y="MPG_Highway", data=cars_df)
ax.set_title("Miles per Gallon for Different Types of Vehicles");

Example 3 (using seaborn library): violin plots of weights of newborn babies depending on their sex and their mothers’ smoking habits.

Here we use hue nesting with a variable MomSmoke that takes two levels and set ‘split’ to True to draw half of a violin for each level. This will make it easier to directly compare the distributions for babies with smoking and non-smoking mothers.

# Create violin plot using 'bweight' dataset
ax = sns.violinplot(x="Boy", y="Weight", hue="MomSmoke",
                    data=bweight_df, palette="muted", split=True)
ax.set_xticklabels(["Girl", "Boy"]);
ax.set_title("Weights of newborn babies");
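
The call above renames the ticks by position, which relies on the 0 and 1 codes being drawn in that order. An alternative sketch is to map the codes to text labels before plotting. The demo_df rows are made up, the new column names Sex and Smoking are hypothetical, and reading MomSmoke = 1 as "Smoker" is an assumption about how the column is coded.

```python
import pandas as pd

# Made-up rows with the same coding as bweight_df
demo_df = pd.DataFrame({
    "Boy": [0.0, 1.0, 1.0, 0.0],
    "MomSmoke": [0.0, 0.0, 1.0, 1.0],
    "Weight": [3997.0, 4111.0, 3572.0, 3515.0],
})

# Boy: 0 -> "Girl", 1 -> "Boy" follows the tick labels used above;
# MomSmoke: 1 -> "Smoker" is an assumption about the coding
labeled_df = demo_df.assign(
    Sex=demo_df["Boy"].map({0.0: "Girl", 1.0: "Boy"}),
    Smoking=demo_df["MomSmoke"].map({0.0: "Non-smoker", 1.0: "Smoker"}),
)

sex_values = labeled_df["Sex"].tolist()
smoking_values = labeled_df["Smoking"].tolist()
```

The labeled columns could then be passed as x="Sex" and hue="Smoking" in the seaborn call, so that the axis and legend show readable names without a separate relabeling step.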

Example 4 (using plotly library): violin plots of invoice prices for cars depending on their origin.

Hovering the mouse over the plots reveals information about individual data points.

# Import 'plotly' library
import plotly.express as px

# Create violin plot using 'cars' dataset
fig = px.violin(cars_df, y="Invoice", x="Origin", box=True,
                title="Invoice prices for cars depending on their origin")
fig.show()

Example 5 (using plotly library): violin plots of weights of newborn babies depending on their mothers’ smoking habits.

Here we draw violins on top of each other to compare distributions for babies with smoking and non-smoking mothers.

# Import 'plotly' library (already imported in Example 4; repeated so this example runs on its own)
import plotly.express as px

# Create violin plot using 'bweight' dataset
fig = px.violin(bweight_df, y="Weight", color="MomSmoke", box=True,
                violinmode='overlay',  # draw violins on top of each other
                hover_data=bweight_df.columns, title="Weight of newborn babies")
fig.show()


We have shown several ways to create violin plots from data stored on a SAS server. SAS9API enables you to access your SAS data and use it with different programming languages as needed, which gives you more flexibility in your analytical work. Here we have used Python; SAS9API can also bridge SAS with R.