import pandas as pd
import streamlit as st
athlete_statistics = pd.read_csv("athlete_details_eventwise.csv")
st.dataframe(athlete_statistics)
35 Reducing Load Time with Caching
We often want to load data files into Streamlit.
There are two main issues we might run into:
- Each time something in your app reruns (e.g. filtering the dataframe using a drop-down select box), the code to load the dataframe runs again too. This can make your app feel really sluggish with larger data files.
- When we reach the stage of deploying our app to the web, we have to be a bit more conscious of how much memory our app uses when accessed by multiple people simultaneously. By default, the data will be loaded separately for each user, even though it's identical. This can quickly lead to memory requirements ballooning (and your app crashing!).
By using the @st.cache_data decorator, Streamlit will intelligently handle the loading of datasets to minimize unnecessary reloads and memory use.
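To illustrate the underlying idea, here is a minimal pure-Python sketch of a caching decorator. This is not Streamlit's implementation - @st.cache_data also handles hashing, serialization, and invalidation - but it shows why a cached function avoids repeating expensive work:

```python
import functools

def simple_cache(func):
    """A toy cache: store each result keyed by the call's arguments."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:       # first call with these args: run the function
            cache[args] = func(*args)
        return cache[args]          # later calls: reuse the stored result
    return wrapper

call_count = 0

@simple_cache
def load_data(path):
    global call_count
    call_count += 1                 # track how often the "expensive" work runs
    return f"contents of {path}"

load_data("athlete_details_eventwise.csv")
load_data("athlete_details_eventwise.csv")  # served from the cache
print(call_count)                   # the underlying function only ran once
```

Streamlit's real decorator works on the same principle, but shares the cache across reruns (and, by default, across users) of your app.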
To switch from doing a standard data import to using caching, we:
- Turn our data import into a function that returns the data
- Add the @st.cache_data decorator directly above the function
- Call the function in our code to load the data in, assigning the output (the dataframe) to a variable
We can then just use our dataframe like normal - Streamlit will handle the awkward bits.
35.1 A Caching Code Example
35.1.1 Without Caching
35.1.2 Loading in the Same Dataset with Caching
import pandas as pd
import streamlit as st

@st.cache_data
def load_data():
    return pd.read_csv("athlete_details_eventwise.csv")

athlete_statistics = load_data()

st.dataframe(athlete_statistics)
35.2 How Does Caching Affect the App?
When a dataset is cached, it will be almost entirely excluded from the top-to-bottom rerunning of the app.
If the version of the cache is deemed to be valid, then rather than loading the dataset in again from scratch and using memory on your web host to do so, it will use the version already in the cache.
This will make your app feel far more responsive: on actions that trigger a rerun, like changing the value of a slider or the value entered in a text/numeric input, the data reload isn't triggered, so subsequent interactions with the dataset will feel much quicker.
Here is an example of the same app with and without caching so you can compare performance.
In this example, the performance improvement is very minor due to the small size of the dataset. With larger datasets containing tens of thousands of rows, or geodataframes that often hold 20+ MB of complex information, the performance difference is significantly bigger.
35.2.1 Without Caching
35.2.2 With Caching
35.3 Caching Other Steps
If you have certain actions that take a long time, then you may wish to cache those too.
For example, if you had a long-running data cleaning step that you ran when the data was loaded into the app, then you could cache that as well.
import pandas as pd
import streamlit as st

@st.cache_data
def load_data():
    return pd.read_csv("athlete_details_eventwise.csv")

athlete_statistics = load_data()

@st.cache_data
def clean_data():
    my_clean_dataframe = athlete_statistics ... ## long-running code here...
    return my_clean_dataframe

clean_athlete_statistics = clean_data()

st.dataframe(clean_athlete_statistics)
35.4 Caching Types: Data vs Resources
Caching can also be used for other large objects - for example, if you wanted to load in a trained machine learning model that's the same for all users who will interact with your app. For serializable data like dataframes, use the @st.cache_data decorator; for resources like machine learning models or database connections, Streamlit provides a separate @st.cache_resource decorator.
It’s not always obvious which to use, so head to the Streamlit documentation if you’re a bit unsure.
35.5 Caching Functions with Parameters
As well as using caching for the initial load of a dataframe, it can also make sense to use it for other long-running functions.
Like normal functions, functions for caching can accept parameters.
Streamlit will be able to look at the parameters passed in and tell whether it should:
- use a cached version of the data (because the parameters are the same as a previous call that's already in its cache)
- run the function (because it's a new set of parameters that it doesn't already have a saved output for in its cache)
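The behaviour above can be sketched in pure Python - a hypothetical memoizing decorator that, like @st.cache_data, uses the arguments as the cache key (Streamlit's real implementation hashes the arguments, but the hit/miss logic is analogous):

```python
runs = []  # record each time the "expensive" function actually executes

def memoize(func):
    cache = {}
    def wrapper(*args):
        if args not in cache:       # new parameters: run the function
            cache[args] = func(*args)
        return cache[args]          # known parameters: cached result
    return wrapper

@memoize
def filter_rows(country, year):     # hypothetical filtering function
    runs.append((country, year))
    return f"rows for {country} in {year}"

filter_rows("GBR", 2016)   # new parameters: function runs
filter_rows("GBR", 2016)   # same parameters: served from the cache
filter_rows("GBR", 2020)   # different parameters: function runs again
print(len(runs))           # the function body only ran twice
```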
There are a few complexities around parameters that you may want to look into if using this in your own app: https://docs.streamlit.io/develop/concepts/architecture/caching#excluding-input-parameters
35.6 Caching settings
When you choose to cache something, you can add in some additional settings to make the cache as effective as possible for your particular situation.
35.6.1 TTL (time to live)
TTL determines how long to keep cached data before the function is rerun.
For example, if data is likely to change over time, you could set the ttl parameter to set the number of seconds to keep cached data for.
@st.cache_data(ttl=3600)
def your_function_here():
    ...
This also prevents your cache from becoming very large. Large caches could exceed the memory limits on your host.
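The expiry logic can be illustrated with a small pure-Python sketch (not Streamlit's internals): each cached value remembers when it was stored, and is recomputed once it is older than the ttl in seconds.

```python
import time

def cache_with_ttl(ttl):
    def decorator(func):
        store = {}  # args -> (timestamp, value)
        def wrapper(*args):
            now = time.monotonic()
            if args in store:
                stored_at, value = store[args]
                if now - stored_at < ttl:   # still fresh: reuse it
                    return value
            value = func(*args)             # missing or stale: recompute
            store[args] = (now, value)
            return value
        return wrapper
    return decorator

loads = []  # track how often the data is really loaded

@cache_with_ttl(ttl=0.1)    # very short ttl, just for demonstration
def load_data():
    loads.append(1)
    return "data"

load_data()
load_data()                 # within the ttl: cached
time.sleep(0.15)
load_data()                 # ttl expired: reloaded
```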
35.6.2 max_entries
The max_entries parameter limits how many objects the cache can hold. Once the cache contains the maximum number of objects you have specified, the oldest ones will be deleted to make way for new ones.
@st.cache_data(max_entries=10)
def your_function_here():
    ...
This also prevents your cache from becoming very large.
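Here is a sketch of max_entries-style eviction in pure Python (again, not Streamlit's internals): when the cache is full, the oldest entry is dropped to make room for new ones.

```python
from collections import OrderedDict

def cache_with_max_entries(max_entries):
    def decorator(func):
        store = OrderedDict()  # preserves insertion order, oldest first
        def wrapper(*args):
            if args in store:
                return store[args]
            if len(store) >= max_entries:
                store.popitem(last=False)   # evict the oldest entry
            store[args] = func(*args)
            return store[args]
        return wrapper
    return decorator

calls = []  # record each real execution

@cache_with_max_entries(max_entries=2)
def square(n):
    calls.append(n)
    return n * n

square(1); square(2)   # cache now holds results for 1 and 2
square(3)              # cache full: result for 1 is evicted
square(1)              # no longer cached, so it runs again
```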
35.7 Further Caching Settings
Additional caching settings can be explored in the documentation.