TL;DR: Storing full collection data in window.history.state is expensive (serialization time, retained memory, and browser size limits). Prefer storing only cursors/params and keeping large data in an in-memory cache keyed by URL; use sessionStorage/IndexedDB if persistence is required.

I’ve found an interesting issue with the Shopify Hydrogen Pagination component: it saves all of the nodes on a page in window.history.state (you can see this in the component’s source). For the particular website I’m working with at the moment, the collection page carries quite a lot of data; in my tests, showing about 100 products on the page meant roughly 8 MB for the serialized version of the state object via JSON.stringify, and the deserialized structures also incur engine overhead (object headers, indices). The Pagination component from Shopify that we use does many things, but what I want to emphasize is that it saves the collection data in window.history.state. The snippet below simulates a payload of roughly that size:

const N = 8 * 128000; // ~1,024,000 small strings; serializes to ~8.71 MB of JSON
const data = Array.from({length: N}, (_, i) => (i + 1).toString());
const url = location.pathname + location.search;
// Store the whole payload in history state, as the Pagination component does
window.history.replaceState({data}, '', url);

If we measure the time for JSON.stringify:

const start = performance.now();
const serialized = JSON.stringify(data);
const duration = performance.now() - start;

That takes about 13 ms when I run it with Node.js on an M4 machine; in Chrome and Safari it seemed to be somewhat faster, around 7–9 ms. The common target is 60 fps (about 16.7 ms per frame), so 7–13 ms consumes a large portion of the frame budget, and there is already significant contention for the main thread, which makes paying this cost on every navigation quite expensive. There is also a second problem: window.history.state is accessible to any script on the page. Third-party scripts (e.g., loaded via Google Tag Manager) can read it and hold strong references to those large objects, and if they structured-clone the state, memory usage multiplies. As long as any code holds such a reference, the data stays retained, increasing memory pressure and behaving like a leak.
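
For example, any script already on the page can do something like this; the snippet below is a hypothetical third-party script (analyticsQueue is a made-up stand-in for wherever a vendor keeps its data), not something taken from Hydrogen or a real tag:

// Hypothetical third-party snippet, e.g., one injected through a tag manager
const analyticsQueue = [];                           // stand-in for wherever the vendor keeps data
const capturedState = window.history.state;          // strong reference: the ~8.7 MB object stays alive
const clonedState = structuredClone(capturedState);  // deep copy: retained memory roughly doubles
analyticsQueue.push(clonedState);                    // as long as this is reachable, so is the data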

The reasoning for saving the collection data in the history stack is that it makes back navigation cheap: the data is already there when the user returns. However, by opting to use window.history instead of storing the data elsewhere, that data can easily increase memory pressure and hinder performance. Of course, we wouldn’t have this problem if the product data were smaller, so simplifying it is an option, but that would not be an easy refactor at the current stage. Another concern with state objects this large comes from the MDN documentation:

Note: Some browsers save state objects to the user’s disk so they can be restored after the user restarts the browser, and impose a size limit on the serialized representation of a state object, and will throw an exception if you pass a state object whose serialized representation is larger than that size limit. So in cases where you want to ensure you have more space than what some browsers might impose, you’re encouraged to use sessionStorage and/or localStorage.
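
The exact limit varies by browser, so it is safer to guard the call. A minimal defensive sketch; falling back to storing only a key is my assumption, not what Hydrogen actually does:

function safeReplaceState(state, url) {
  try {
    window.history.replaceState(state, '', url);
  } catch (err) {
    // Some browsers throw when the serialized state exceeds their size limit;
    // fall back to a tiny state object so navigation keeps working.
    window.history.replaceState({key: url}, '', url);
    console.warn('history state too large, stored only a key', err);
  }
}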

Mitigations

  • Store only what is needed for back/forward navigation: pagination cursor, page number, and query variables. Avoid serializing full nodes.
  • Keep large data in an in-memory cache keyed by URL (e.g., a module-level Map or a small LRU) with a max size/TTL. On navigation, look up by URL; on reload, re-fetch.
  • If persistence across reloads is needed, use sessionStorage or IndexedDB and put only a small key/token in history.state (see the sketch after the example below).
  • In Hydrogen specifically, ensure router/pagination components don’t serialize heavy props into history.

Example: minimal state in history with an in-memory cache

// In-memory page cache (resets on reload)
const pageCache = new Map(); // key: URL, value: rendered data or API response

function navigate(url, data) {
  // Keep big data out of history.state: cache it by URL instead
  pageCache.set(url, data);
  // Push a new entry carrying only a small key, so back/forward still works
  history.pushState({key: url}, '', url);
}

// On back/forward:
window.addEventListener('popstate', (e) => {
  const url = e.state?.key || location.pathname + location.search;
  const data = pageCache.get(url);
  if (data) {
    render(data); // app-specific rendering
  } else {
    fetchAndRender(url); // re-fetch if not cached (e.g., after a reload)
  }
});
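
If the data needs to survive a page reload, a variation is to persist it outside of history and still keep only a key in history.state. A minimal sketch, assuming hypothetical persistPage/restorePage helpers backed by sessionStorage:

function persistPage(url, data) {
  try {
    sessionStorage.setItem(`page:${url}`, JSON.stringify(data));
  } catch (err) {
    // sessionStorage quotas are small (commonly around 5 MB), so a payload this
    // large may not fit; the caller simply re-fetches when nothing was persisted.
    console.warn('could not persist page data', err);
  }
  history.pushState({key: url}, '', url); // history carries only the key
}

function restorePage(url) {
  const raw = sessionStorage.getItem(`page:${url}`);
  return raw ? JSON.parse(raw) : null; // null means: re-fetch
}

Note that this still pays the JSON.stringify cost on every write, so for payloads of this size IndexedDB is usually the better fit; the key-in-history shape stays the same either way.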

I’m leaning toward a module-level cache with a TTL; I’ll test this approach and measure its impact on serialization time and memory retention.
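
A rough sketch of the bounded cache I have in mind, where the cap and TTL values are placeholders and cacheSet/cacheGet would replace the raw Map calls in the example above:

const MAX_ENTRIES = 20;             // placeholder cap
const TTL_MS = 5 * 60 * 1000;       // placeholder TTL: 5 minutes

const boundedPageCache = new Map(); // key: URL, value: {data, storedAt}

function cacheSet(url, data) {
  // Map preserves insertion order, so the first key is the oldest entry
  if (!boundedPageCache.has(url) && boundedPageCache.size >= MAX_ENTRIES) {
    const oldestKey = boundedPageCache.keys().next().value;
    boundedPageCache.delete(oldestKey);
  }
  boundedPageCache.set(url, {data, storedAt: Date.now()});
}

function cacheGet(url) {
  const entry = boundedPageCache.get(url);
  if (!entry) return null;
  if (Date.now() - entry.storedAt > TTL_MS) {
    boundedPageCache.delete(url); // expired: drop it so the memory can be released
    return null;                  // caller re-fetches
  }
  return entry.data;
}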