35 Reducing Load Time with Caching

We often want to load data files into Streamlit.

There are two main issues we might run into

Each time you rerun something in your app (e.g. filtering the dataframe by using a drop-down select box), then the code to load the dataframe gets run again too. This can make your app feel really sluggish with larger data files
When we reach the stage of deploying our app to the web, we have to be a bit more conscious of how much memory our app is using when being accessed by multiple people simultaneously By default, the data will be loaded in separately for each user - even though it’s identical This can quickly lead to memory requirements ballooning (and your app crashing!)

By using the @st.cache_data decorator, Streamlit will intelligently handle the loading in of datasets to minimize unnecessary reloads and memory use.

To switch from doing a standard data import to using caching, we

Turn our data import into a function that returns the data
Add the @st.cache_data decorator directly above the function
Call the function in our code to load the data in, assigning the output (the dataframe) to a variable

We can then just use our dataframe like normal - Streamlit will handle the awkward bits.

35.1 A Caching Code Example

35.1.1 Without Caching

import pandas as pd
import streamlit as st

athlete_statistics = pd.read_csv("athlete_details_eventwise.csv")

st.dataframe(athlete_statistics)

35.1.2 Loading in the Same Dataset with Caching

import pandas as pd
import streamlit as st

@st.cache_data
def load_data():
  return pd.read_csv("athlete_details_eventwise.csv")

athlete_statistics = load_data()

st.dataframe(athlete_statistics)

35.2 How Does Caching Affect the App?

When a dataset is cached, this means it will almost be excluded from the top-to-bottom running of the app.

If the version of the cache is deemed to be valid, then rather than loading the dataset in again from scratch and using memory on your web host to do so, it will use the version already in the cache.

This will make your app feel far more responsive - when actions that trigger a rerun, like changing the value of a slider or the value entered in a text/numeric input, as the data reload isn’t triggered, subsequent changes to the dataset will feel much quicker.

Here is an example of the same app with and without caching so you can compare performance.

Note

In this example, the performance improvement is very minor due to the small size of the dataset. With larger datasets with tens of thousands of rows or geodataframes that often contain complex information of 20+ mb, then the performance difference is significantly bigger.

35.2.1 Without Caching

Click here to load the app in a new page

35.2.2 With Caching

Click here to load the app in a new page

35.3 Caching Other Steps

If you have certain actions that take a long time, then you may wish to cache those too.

For example, if you had a long-running data cleaning step that you ran when the data was loaded into the app, then you could cache that as well.

import pandas as pd
import streamlit as st

@st.cache_data
def load_data():
  return pd.read_csv("athlete_details_eventwise.csv")

athlete_statistics = load_data()

@st.cache_data
def clean_data():

  my_clean_dataframe = athlete_statistics ... ## long-running code here...

  return my_clean_dataframe

clean_athlete_statistics = clean_data()

st.dataframe(clean_athlete_statistics)

35.4 Caching Types: Data vs Resources

Caching can also be used for other large files - for example, if you wanted to load in a trained machine learning model that’s the same for all users who will interact with your app.

It’s not always obvious which to use, so head to the Streamlit documentation if you’re a bit unsure.

35.5 Caching Functions with Parameters

While using caching for the initial load of a dataframe, it can also make sense to use it for other long-running functions.

Like normal functions, functions for caching can accept parameters.

Streamlit will be able to look at the parameters passed in and tell whether it should

use a cached version of the data (because the parameters are the same as a previous instance that’s already in its cache)
run the function (because it’s a new set of parameters that it doesn’t already have a saved output for in its cache)
There are a few complexities around parameters that you may want to look into if using this in your own app: https://docs.streamlit.io/develop/concepts/architecture/caching#excluding-input-parameters

35.6 Caching settings

When you choose to cache something, you can add in some additional settings to make the cache as effective as possible for your particular situation.

35.6.1 TTL (time to live)

TTL determines how long to keep cached data before rerunning it.

For example, if data is likely to change over time, you could set the ttl parameter to set the number of seconds to keep cached data for.

@st.cache_data(ttl=3600)
def your_function_here():
    …

This also prevents your cache from becoming very large. Large caches could exceed the memory limits on your host.

35.6.2 max_entries

Once the cache contains the maximum number of objects you have specified, the oldest ones will be deleted to make way for new ones.

@st.cache_data(max_entries=10)
def your_function_here():
    …

This also prevents your cache from becoming very large.

35.7 Further Caching Settings

Additional caching settings can be explored in the documentation.