Build a Data Dashboard with Streamlit in Python
Streamlit is an open-source Python framework that lets you turn data scripts into shareable web apps in minutes. Streamlit makes it easy for data scientists and analysts to create and deploy interactive visualizations and dashboards for machine learning models and other Python applications.
You need almost no experience with building front ends to get started with Streamlit. It is designed to do the heavy lifting of generating an intuitive and responsive interface from a simple Python script.
This tutorial will teach you how to build a dashboard for a GitHub dataset of movie records. You’ll then learn how to deploy the web app and use it to explore the dataset interactively, visualize it, and retrieve information from it.
Why Should You Use Streamlit?
There are several reasons to choose Streamlit for data visualization. Some of them include:
Streamlit is written for Python and is compatible with major Python libraries for data analysis and machine learning.
The Streamlit interface is intuitive and user-friendly.
Streamlit dashboards can be hosted on Streamlit Cloud. You can configure Streamlit to monitor the GitHub repository where the code and data are hosted, and update the web app when changes are made to the repository.
Before we get started, you should know that you need a good understanding of data analysis and visualization in Python. This is because Streamlit only lets you embed your visualizations within its framework and display them as web apps; you still need to analyze the data and visualize it meaningfully.
From the command prompt, install Streamlit using pip by running the following command:
pip install streamlit
To check if the installation worked, create a new Python script in your editor, import streamlit under the alias st, and use the write function to print out some text. The st.write() function is used to display information like text, dataframes, or figures.
import streamlit as st
st.write("Hello World!")
st.write("Hello Streamlit!")
Run the app from your command prompt: change the directory to the folder where your file is located, then run this command:
streamlit run file_name.py
This will automatically open a tab in your browser.
If you get this output, your installation works and you are ready to use Streamlit.
Plotting With Streamlit
You can find the code for this project on GitHub and the real-time application on Streamlit Cloud.
Adding a Matplotlib Chart
st.pyplot() is the Streamlit function that displays a Matplotlib figure. To create a Matplotlib visualization, you first perform the data analysis and then build the plot from the result. For this section, we will use this movie industry dataset from GitHub.
This dataset contains over 7000 movie entries—from the period 1986-2016—scraped from IMDb (Internet Movie Database). It lists movies of different genres and countries. I’ll be using this dataset to create different interactive plots for this tutorial.
import streamlit as st
import pandas as pd
import matplotlib.pyplot as plt
We use the pandas read_csv() function to read the data into a dataframe.
# read in the file
movies_data = pd.read_csv("https://raw.githubusercontent.com/danielgrijalva/movie-stats/7c6a562377ab5c91bb80c405be50a0494ae8e582/movies.csv")
To generate a summary of the dataset and check for missing values and duplicates, we’ll use the following functions:
#output
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7668 entries, 0 to 7667
Data columns (total 15 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   name      7668 non-null   object
 1   rating    7591 non-null   object
 2   genre     7668 non-null   object
 3   year      7668 non-null   int64
 4   released  7666 non-null   object
 5   score     7665 non-null   float64
 6   votes     7665 non-null   float64
 7   director  7668 non-null   object
 8   writer    7665 non-null   object
 9   star      7667 non-null   object
 10  country   7665 non-null   object
 11  budget    5497 non-null   float64
 12  gross     7479 non-null   float64
 13  company   7651 non-null   object
 14  runtime   7664 non-null   float64
dtypes: float64(5), int64(1), object(9)
movies_data.info() gives a quick overview of our dataset. We can see that there are 7668 entries (rows) and a total of 15 columns.
#output
0       False
1       False
2       False
3       False
4       False
        ...
7663    False
7664    False
7665    False
7666    False
7667    False
Length: 7668, dtype: bool
movies_data.duplicated() checks if there are any duplicates. All rows returned False, which means there are no duplicates.
#output
name        7668
rating      7591
genre       7668
year        7668
released    7666
score       7665
votes       7665
director    7668
writer      7665
star        7667
country     7665
budget      5497
gross       7479
company     7651
runtime     7664
This is the output of the count() method on the dataframe: movies_data.count() returns the number of non-null entries in each column. Columns with fewer than 7668 entries contain missing values.
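As a rough illustration of the three inspection calls described above, here is a minimal sketch on a hypothetical three-row dataframe. The `sample` dataframe and its values are made up for this example and are not part of the movies dataset.

```python
import pandas as pd

# Hypothetical miniature of the movies table, with one missing budget
sample = pd.DataFrame({
    "name": ["Movie A", "Movie B", "Movie C"],
    "genre": ["Drama", "Comedy", "Drama"],
    "budget": [10.0, None, 30.0],
})

sample.info()                                      # prints dtypes and non-null counts
duplicate_rows = int(sample.duplicated().sum())    # number of fully duplicated rows
non_null_counts = sample.count()                   # non-null entries per column
missing_per_column = len(sample) - non_null_counts # missing entries per column

print(duplicate_rows)
print(missing_per_column)
```

Summing the boolean result of duplicated() is a compact way to confirm the duplicate count without reading the whole output.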
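We dropped all rows with missing data using movies_data.dropna(). By default, dropna() removes every row that has at least one missing value and returns a new dataframe, so assign the result to a variable if you want to keep it. Next, we’ll create a Matplotlib bar chart that shows the average movie budget of movies in different genres.
st.write("""
Average Movie Budget, Grouped by Genre
""")
# average budget per genre, rounded to whole numbers
avg_budget = movies_data.groupby('genre')['budget'].mean().round()
# turn the genre index back into a regular column
avg_budget = avg_budget.reset_index()
genre = avg_budget['genre']
avg_bud = avg_budget['budget']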
The groupby method groups rows by the values in one or more columns and applies a function to each group. Here we group by ‘genre’, select the ‘budget’ column, and apply the mean() and round() functions. mean() returns the average budget of each group, while round() rounds each average to the nearest whole number (the values remain floats). reset_index() then turns the ‘genre’ index produced by groupby back into a regular column and creates a new row index that starts at 0. This is what lets us select avg_budget['genre'] afterwards.
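To make the grouping steps concrete, here is a small sketch on a hypothetical four-row dataframe. The `sample` data is invented for illustration; only the chain of pandas calls mirrors the tutorial.

```python
import pandas as pd

# Hypothetical data: two genres, two movies each
sample = pd.DataFrame({
    "genre": ["Drama", "Comedy", "Drama", "Comedy"],
    "budget": [10.0, 5.0, 20.5, 6.5],
})

# After groupby + mean, 'genre' is the index and the result is a Series
avg_budget = sample.groupby("genre")["budget"].mean().round()

# reset_index() moves 'genre' back into a column and returns a dataframe
avg_budget = avg_budget.reset_index()

print(avg_budget)
```

The Comedy average is 5.75 and the Drama average is 15.25, so after rounding the `budget` column holds 6.0 and 15.0.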
fig = plt.figure(figsize = (19, 10))
plt.bar(genre, avg_bud, color = 'maroon')
plt.xlabel('genre')
plt.ylabel('budget')
plt.title('Matplotlib Bar Chart Showing the Average Budget of Movies in Each Genre')
# display the figure in the Streamlit app (used instead of plt.show())
st.pyplot(fig)
Matplotlib has a function called show() that displays a figure. In Streamlit, this line should be replaced with st.pyplot(variable_name), where variable_name is the variable that holds the figure (fig in the code above).
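One way to keep the figure easy to pass around is to build it in a small function and return it. The sketch below is an assumption-laden variant of the chart code: `make_budget_chart` is a hypothetical helper, the genre names and budgets are made up, and it uses Matplotlib's object-oriented interface rather than the plt.* calls shown in the tutorial.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs without a display
import matplotlib.pyplot as plt


def make_budget_chart(genres, budgets):
    """Return a bar chart figure of budgets per genre (hypothetical helper)."""
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(genres, budgets, color="maroon")
    ax.set_xlabel("genre")
    ax.set_ylabel("budget")
    ax.set_title("Average budget by genre")
    return fig


# Made-up values standing in for the grouped movie data
fig = make_budget_chart(["Action", "Comedy", "Drama"], [60.0, 25.0, 20.0])

# In a Streamlit script, hand the figure to Streamlit instead of calling plt.show():
#     import streamlit as st
#     st.pyplot(fig)
```

Returning the figure object makes it explicit which figure st.pyplot() receives, which helps once a page contains more than one chart.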
Layouts in Streamlit
How the dashboards are structured determines how well they’ll be received by all stakeholders. When a dashboard is messy, it confuses the users.
Streamlit offers a couple of options to lay out elements on a screen. Columns are the most common, but there are other containers like tabs, expanders, and sidebars. For this tutorial, we will focus on columns and sidebars.
Columns in Streamlit operate just as they do in documents and on web pages. They are also highly responsive and automatically resize on different screens.
To create columns, call st.columns() with the number of columns you need and unpack the result into that many variables. Here, col1 and col2 are the variable names because we need two columns.
col1, col2 = st.columns(2)
col1.write('# This is Column 1')
col2.write('# This is Column 2')
We can also create columns of different widths by passing a list of relative widths instead of a single number.
st.write('### Columns of different sizes')
col1, col2, col3, col4 = st.columns([1,3,1,2])
col1.write('# This is Column 1')
col2.write('# This is Column 2')
col3.write('# This is Column 3')
col4.write('# This is Column 4')
Working With Widgets
What Are Widgets?
Widgets are the elements that allow us to interact with data. Streamlit offers different widgets like sidebars, sliders, multiselect, text_input, radio button, and checkbox. Each widget has a different use case.
Why Are Widgets Important?
Widgets are important to interact with the rendered plots and charts. Before starting out, determine which widgets will be best for the project. A couple of steps are needed to make a widget interactive. We’ll look at them as we create widgets.
Here’s an overview of widgets used in this tutorial:
Slider: A slider is a widget that accepts numerical, date, or time data as input. It changes information according to the range of values selected.
Multiselect: Multiselect lets users pick several options from a list and shows each selected option as a label. Nothing is selected by default, so pass a default list if you want some options pre-selected.
Selectbox: Selectbox displays a select widget with options in a drop-down format.
Sidebar: A sidebar is a panel at the left side of the page where other widgets, text, and even plots can reside. It is a very easy way to manage space on the web app.
To link data to a widget, we first convert the needed column to a list of its unique values. This is important so each option appears only once in the widget:
# Creating sidebar widget unique values from our movies dataset
score_rating = movies_data['score'].unique().tolist()
genre_list = movies_data['genre'].unique().tolist()
year_list = movies_data['year'].unique().tolist()
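Note that unique() returns values in the order they first appear in the column. The sketch below, on a hypothetical four-row dataframe, shows the same conversion with an optional sorted() call added. Sorting is an assumption of this example, not something the tutorial does, but it can make a drop-down easier to scan.

```python
import pandas as pd

# Hypothetical rows; the real dataset has many more years and genres
sample = pd.DataFrame({
    "year": [1999, 1986, 1999, 2005],
    "genre": ["Drama", "Action", "Drama", "Comedy"],
})

# Order of first appearance, as in the tutorial code
year_list_unsorted = sample["year"].unique().tolist()

# Optional: sort so the widget options appear in ascending order
year_list = sorted(year_list_unsorted)
genre_list = sorted(sample["genre"].unique().tolist())

print(year_list_unsorted, year_list, genre_list)
```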
Creating a Sidebar and Adding Other Widgets
Create Layouts Using with
The with statement provides a simpler, more organized way of displaying Streamlit layouts, especially if multiple widgets or variables are attached to a layout component. Using with makes the code easier to maintain.
We use the with statement to group all elements of a layout together. We’ve implemented it with a sidebar layout, as shown:
with st.sidebar:
    st.write("Select a range on the slider (it represents movie score) "
             "to view the total number of movies in a genre that falls "
             "within that range ")
    #create a slider to hold user scores
    new_score_rating = st.slider(label = "Choose a value:",
                                 min_value = 1.0,
                                 max_value = 10.0,
                                 value = (3.0,4.0))
    #create a multiselect widget to display genre
    new_genre_list = st.multiselect('Choose Genre:',
                                    genre_list,
                                    default = ['Animation', 'Horror', 'Fantasy', 'Romance'])
    #create a selectbox option that holds all unique years
    year = st.selectbox('Choose a Year',
                        year_list, 0)
st.sidebar is the object that gives access to the sidebar. Once it is used, the sidebar is automatically created. Our sidebar will hold different widgets so we use it in a with statement.
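st.slider is the function that creates a slider widget. It takes in parameters like label, min_value, max_value, and value. The min_value is the specified minimum value. The max_value is the specified maximum value. The value is the position where the slider is first rendered. Because we pass a tuple, (3.0, 4.0), Streamlit draws a range slider and returns the selected lower and upper bounds as a tuple.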
st.multiselect() is the function that creates a multiselect widget. In this tutorial, we created a multiselect widget that displays all unique genres from the ‘genre’ column. We pre-selected the ‘Animation’, ‘Horror’, ‘Fantasy’, and ‘Romance’ genres. While interacting with this widget, users can select or deselect as many options as they wish. A selectbox is created by calling the st.selectbox() function. The selectbox represents a drop-down menu that allows only one option to be picked at a time. We linked the ‘year’ column to this widget, so we can only pick one year at a time.
To add interactivity among the slider, the selectbox, the multiselect widgets, and the plots on the main page, we need to create filters. We do this by comparing the columns of the dataframe against the values returned by the widgets, which gives a True/False mask we can use in the analysis. By doing so, we ensure that each plot responds only to the widgets it is linked to.
#Configure and filter the slider widget for interactivity
score_info = (movies_data['score'].between(*new_score_rating))
We will be linking the slider widget to the line chart that displays the number of movies in a particular genre that have scores that fall within a specified range. We mapped the ‘score’ column to the slider widget. Therefore, whenever a user interacts with the slider, the line chart changes as well.
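The slider filter can be tried outside Streamlit by hard-coding the tuple a range slider would return. This is a minimal sketch on a hypothetical five-row dataframe. Only the between(*...) call and the genre count mirror the tutorial; the data is invented.

```python
import pandas as pd

# Hypothetical scores; new_score_rating stands in for the slider's return value
sample = pd.DataFrame({
    "genre": ["Drama", "Drama", "Comedy", "Comedy", "Drama"],
    "score": [2.5, 3.0, 3.7, 4.0, 8.1],
})
new_score_rating = (3.0, 4.0)

# between() is inclusive on both ends by default, so 3.0 and 4.0 are kept
score_info = sample["score"].between(*new_score_rating)

# Count the movies per genre whose score falls inside the selected range
rating_count = sample[score_info].groupby("genre")["score"].count().reset_index()

print(rating_count)
```

The asterisk unpacks the tuple into the two positional arguments of between(), so the call is equivalent to between(3.0, 4.0).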
#Filter the selectbox and multiselect widget for interactivity
new_genre_year = (movies_data['genre'].isin(new_genre_list)) \
& (movies_data['year'] == year)
We need the multiselect widget and the selectbox that hold genre and year to work together. We will be creating a dataframe that lists movie titles according to the year and genre(s) selected. In our configuration, we compared the ‘genre’ column with the variable of our multiselect widget and the ‘year’ column with the variable of our selectbox widget, then combined both conditions using the & operator, so a row is kept only when both are true.
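The combined genre and year filter can be checked the same way, with hard-coded values in place of the widget results. The dataframe and the selected values below are hypothetical.

```python
import pandas as pd

# Hypothetical movies; the two variables below stand in for the widget values
sample = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "genre": ["Horror", "Romance", "Horror", "Drama"],
    "year": [1999, 1999, 2005, 1999],
})
new_genre_list = ["Horror", "Romance"]
year = 1999

# Each condition is a boolean Series; & keeps rows where both are True.
# The parentheses matter because & binds more tightly than ==.
new_genre_year = sample["genre"].isin(new_genre_list) & (sample["year"] == year)

selected_names = sample[new_genre_year]["name"].tolist()
print(selected_names)
```

Movie C is dropped because of its year and movie D because of its genre, which leaves A and B.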
# visualization section
# Plotly Express provides px.line; this import belongs with the other imports at the top of the script
import plotly.express as px

#group the columns needed for visualizations
col1, col2 = st.columns([2,3])
with col1:
    st.write("""#### Lists of movies filtered by year and Genre """)
    dataframe_genre_year = movies_data[new_genre_year]\
    .groupby(['name', 'genre'])['year'].sum()
    dataframe_genre_year = dataframe_genre_year.reset_index()
    st.dataframe(dataframe_genre_year, width = 400)

with col2:
    st.write("""#### User score of movies and their genre """)
    rating_count_year = movies_data[score_info]\
    .groupby('genre')['score'].count()
    rating_count_year = rating_count_year.reset_index()
    figpx = px.line(rating_count_year, x = 'genre', y = 'score')
    st.plotly_chart(figpx)
In this tutorial, we have learned what Streamlit is, why we should use it, and the basics of data visualization with Streamlit. You’ve also learned how to create interactive visualizations by programmatically adding interactivity to a Plotly chart. As a next step, you can try building a data dashboard for a dataset of your choice.