'memoize to disk - python - persistent memoization
Is there a way to memoize the output of a function to disk?
I have a function
def getHtmlOfUrl(url):
... # expensive computation
and would like to do something like:
def getHtmlMemoized(url) = memoizeToFile(getHtmlOfUrl, "file.dat")
and then call getHtmlMemoized(url), so as to do the expensive computation only once for each url.
Solution 1:[1]
Check out joblib.Memory. It's a library for doing exactly that.
from joblib import Memory
memory = Memory("cachedir")
@memory.cache
def f(x):
print('Running f(%s)' % x)
return x
Solution 2:[2]
There is also diskcache.
from diskcache import Cache
cache = Cache("cachedir")
@cache.memoize()
def f(x, y):
print('Running f({}, {})'.format(x, y))
return x, y
Solution 3:[3]
A cleaner solution powered by Python's Shelve module. The advantage is the cache gets updated in real time via well-known dict syntax, also it's exception proof(no need to handle annoying KeyError).
import shelve
def shelve_it(file_name):
d = shelve.open(file_name)
def decorator(func):
def new_func(param):
if param not in d:
d[param] = func(param)
return d[param]
return new_func
return decorator
@shelve_it('cache.shelve')
def expensive_funcion(param):
pass
This will facilitate the function to be computed just once. Next subsequent calls will return the stored result.
Solution 4:[4]
The Artemis library has a module for this. (you'll need to pip install artemis-ml)
You decorate your function:
from artemis.fileman.disk_memoize import memoize_to_disk
@memoize_to_disk
def fcn(a, b, c = None):
results = ...
return results
Internally, it makes a hash out of input arguments and saves memo-files by this hash.
Solution 5:[5]
Check out Cachier. It supports additional cache configuration parameters like TTL etc.
Simple example:
from cachier import cachier
import datetime
@cachier(stale_after=datetime.timedelta(days=3))
def foo(arg1, arg2):
"""foo now has a persistent cache, trigerring recalculation for values stored more than 3 days."""
return {'arg1': arg1, 'arg2': arg2}
Solution 6:[6]
Something like this should do:
import json
class Memoize(object):
def __init__(self, func):
self.func = func
self.memo = {}
def load_memo(filename):
with open(filename) as f:
self.memo.update(json.load(f))
def save_memo(filename):
with open(filename, 'w') as f:
json.dump(self.memo, f)
def __call__(self, *args):
if not args in self.memo:
self.memo[args] = self.func(*args)
return self.memo[args]
Basic usage:
your_mem_func = Memoize(your_func)
your_mem_func.load_memo('yourdata.json')
# do your stuff with your_mem_func
If you want to write your "cache" to a file after using it -- to be loaded again in the future:
your_mem_func.save_memo('yournewdata.json')
Solution 7:[7]
Assuming that you data is json serializable, this code should work
import os, json
def json_file(fname):
def decorator(function):
def wrapper(*args, **kwargs):
if os.path.isfile(fname):
with open(fname, 'r') as f:
ret = json.load(f)
else:
with open(fname, 'w') as f:
ret = function(*args, **kwargs)
json.dump(ret, f)
return ret
return wrapper
return decorator
decorate getHtmlOfUrl and then simply call it, if it had been run previously, you will get your cached data.
Checked with python 2.x and python 3.x
Solution 8:[8]
You can use the cache_to_disk package:
from cache_to_disk import cache_to_disk
@cache_to_disk(3)
def my_func(a, b, c, d=None):
results = ...
return results
This will cache the results for 3 days, specific to the arguments a, b, c and d. The results are stored in a pickle file on your machine, and unpickled and returned next time the function is called. After 3 days, the pickle file is deleted until the function is re-run. The function will be re-run whenever the function is called with new arguments. More info here: https://github.com/sarenehan/cache_to_disk
Solution 9:[9]
Most answers are in a decorator fashion. But maybe I don't want to cache the result every time when calling the function.
I made one solution using context manager, so the function can be called as
with DiskCacher('cache_id', myfunc) as myfunc2:
res=myfunc2(...)
when you need the caching functionality.
The 'cache_id' string is used to distinguish data files, which are named [calling_script]_[cache_id].dat. So if you are doing this in a loop, will need to incorporate the looping variable into this cache_id, otherwise data will be overwritten.
Alternatively:
myfunc2=DiskCacher('cache_id')(myfunc)
res=myfunc2(...)
Alternatively (this is probably not quite useful as the same id is used all time time):
@DiskCacher('cache_id')
def myfunc(*args):
...
The complete code with examples (I'm using pickle to save/load, but can be changed to whatever save/read methods. NOTE that this is also assuming the function in question returns only 1 return value):
from __future__ import print_function
import sys, os
import functools
def formFilename(folder, varid):
'''Compose abspath for cache file
Args:
folder (str): cache folder path.
varid (str): variable id to form file name and used as variable id.
Returns:
abpath (str): abspath for cache file, which is using the <folder>
as folder. The file name is the format:
[script_file]_[varid].dat
'''
script_file=os.path.splitext(sys.argv[0])[0]
name='[%s]_[%s].nc' %(script_file, varid)
abpath=os.path.join(folder, name)
return abpath
def readCache(folder, varid, verbose=True):
'''Read cached data
Args:
folder (str): cache folder path.
varid (str): variable id.
Keyword Args:
verbose (bool): whether to print some text info.
Returns:
results (tuple): a tuple containing data read in from cached file(s).
'''
import pickle
abpath_in=formFilename(folder, varid)
if os.path.exists(abpath_in):
if verbose:
print('\n# <readCache>: Read in variable', varid,
'from disk cache:\n', abpath_in)
with open(abpath_in, 'rb') as fin:
results=pickle.load(fin)
return results
def writeCache(results, folder, varid, verbose=True):
'''Write data to disk cache
Args:
results (tuple): a tuple containing data read to cache.
folder (str): cache folder path.
varid (str): variable id.
Keyword Args:
verbose (bool): whether to print some text info.
'''
import pickle
abpath_out=formFilename(folder, varid)
if verbose:
print('\n# <writeCache>: Saving output to:\n',abpath_out)
with open(abpath_out, 'wb') as fout:
pickle.dump(results, fout)
return
class DiskCacher(object):
def __init__(self, varid, func=None, folder=None, overwrite=False,
verbose=True):
'''Disk cache context manager
Args:
varid (str): string id used to save cache.
function <func> is assumed to return only 1 return value.
Keyword Args:
func (callable): function object whose return values are to be
cached.
folder (str or None): cache folder path. If None, use a default.
overwrite (bool): whether to force a new computation or not.
verbose (bool): whether to print some text info.
'''
if folder is None:
self.folder='/tmp/cache/'
else:
self.folder=folder
self.func=func
self.varid=varid
self.overwrite=overwrite
self.verbose=verbose
def __enter__(self):
if self.func is None:
raise Exception("Need to provide a callable function to __init__() when used as context manager.")
return _Cache2Disk(self.func, self.varid, self.folder,
self.overwrite, self.verbose)
def __exit__(self, type, value, traceback):
return
def __call__(self, func=None):
_func=func or self.func
return _Cache2Disk(_func, self.varid, self.folder, self.overwrite,
self.verbose)
def _Cache2Disk(func, varid, folder, overwrite, verbose):
'''Inner decorator function
Args:
func (callable): function object whose return values are to be
cached.
varid (str): variable id.
folder (str): cache folder path.
overwrite (bool): whether to force a new computation or not.
verbose (bool): whether to print some text info.
Returns:
decorated function: if cache exists, the function is <readCache>
which will read cached data from disk. If needs to recompute,
the function is wrapped that the return values are saved to disk
before returning.
'''
def decorator_func(func):
abpath_in=formFilename(folder, varid)
@functools.wraps(func)
def wrapper(*args, **kwargs):
if os.path.exists(abpath_in) and not overwrite:
results=readCache(folder, varid, verbose)
else:
results=func(*args, **kwargs)
if not os.path.exists(folder):
os.makedirs(folder)
writeCache(results, folder, varid, verbose)
return results
return wrapper
return decorator_func(func)
if __name__=='__main__':
data=range(10) # dummy data
#--------------Use as context manager--------------
def func1(data, n):
'''dummy function'''
results=[i*n for i in data]
return results
print('\n### Context manager, 1st time call')
with DiskCacher('context_mananger', func1) as func1b:
res=func1b(data, 10)
print('res =', res)
print('\n### Context manager, 2nd time call')
with DiskCacher('context_mananger', func1) as func1b:
res=func1b(data, 10)
print('res =', res)
print('\n### Context manager, 3rd time call with overwrite=True')
with DiskCacher('context_mananger', func1, overwrite=True) as func1b:
res=func1b(data, 10)
print('res =', res)
#--------------Return a new function--------------
def func2(data, n):
results=[i*n for i in data]
return results
print('\n### Wrap a new function, 1st time call')
func2b=DiskCacher('new_func')(func2)
res=func2b(data, 10)
print('res =', res)
print('\n### Wrap a new function, 2nd time call')
res=func2b(data, 10)
print('res =', res)
#----Decorate a function using the syntax sugar----
@DiskCacher('pie_dec')
def func3(data, n):
results=[i*n for i in data]
return results
print('\n### pie decorator, 1st time call')
res=func3(data, 10)
print('res =', res)
print('\n### pie decorator, 2nd time call.')
res=func3(data, 10)
print('res =', res)
The outputs:
### Context manager, 1st time call
# <writeCache>: Saving output to:
/tmp/cache/[diskcache]_[context_mananger].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
### Context manager, 2nd time call
# <readCache>: Read in variable context_mananger from disk cache:
/tmp/cache/[diskcache]_[context_mananger].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
### Context manager, 3rd time call with overwrite=True
# <writeCache>: Saving output to:
/tmp/cache/[diskcache]_[context_mananger].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
### Wrap a new function, 1st time call
# <writeCache>: Saving output to:
/tmp/cache/[diskcache]_[new_func].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
### Wrap a new function, 2nd time call
# <readCache>: Read in variable new_func from disk cache:
/tmp/cache/[diskcache]_[new_func].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
### pie decorator, 1st time call
# <writeCache>: Saving output to:
/tmp/cache/[diskcache]_[pie_dec].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
### pie decorator, 2nd time call.
# <readCache>: Read in variable pie_dec from disk cache:
/tmp/cache/[diskcache]_[pie_dec].nc
res = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Colonel Panic |
| Solution 2 | C. Yduqoli |
| Solution 3 | Zitrax |
| Solution 4 | |
| Solution 5 | thegreendroid |
| Solution 6 | |
| Solution 7 | Uri Goren |
| Solution 8 | Stewart Renehan |
| Solution 9 |
