How can I get more information about Chrome's cache files? Or how does ChromeCacheView work?

I want to get some data from the browser's cache. Chrome's cache files have names like f_00001, which say nothing about their contents. ChromeCacheView can recover the request URL that corresponds to each cache file name.

ChromeCacheView is a small utility that reads the cache folder of Google Chrome Web browser, and displays the list of all files currently stored in the cache. For each cache file, the following information is displayed: URL, Content type, File size, Last accessed time, Expiration time, Server name, Server response, and more. You can easily select one or more items from the cache list, and then extract the files to another folder, or copy the URLs list to the clipboard.

But this is a GUI program that only runs on Windows. I want to know how it works.

In other words, how can I get more information about the cached files, especially their request URLs?



Solution 1:[1]

After a long search, I found the answer.

Instructions for Chrome disk cache format can be found on the following pages:

By reading these documents, we can implement a parser in any programming language.
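As a rough illustration of what such a parser starts with, here is a minimal sketch that decodes the first fields of the `index` file in a blockfile-format cache folder. The magic number and field order are assumptions recalled from Chromium's `disk_format.h` and should be checked against the format documents; `parse_index_header` and `read_index_header` are hypothetical helper names, not part of any library.

```python
import struct

# Assumed value of kIndexMagic from Chromium's blockfile disk_format.h.
INDEX_MAGIC = 0xC103CAC3


def parse_index_header(data: bytes) -> dict:
    """Decode the first three 32-bit fields of a blockfile cache index file."""
    if len(data) < 12:
        raise ValueError("need at least 12 bytes of the index file")
    # Assumed layout, little-endian: magic (uint32), version (uint32),
    # num_entries (int32).
    magic, version, num_entries = struct.unpack_from("<IIi", data, 0)
    if magic != INDEX_MAGIC:
        raise ValueError(f"unexpected magic 0x{magic:08X}; not a blockfile index?")
    return {
        # The version is stored as major in the high 16 bits, minor in the low.
        "version": f"{version >> 16}.{version & 0xFFFF}",
        "num_entries": num_entries,
    }


def read_index_header(path: str) -> dict:
    """Read and decode the header of the index file at `path`."""
    with open(path, "rb") as fh:
        return parse_index_header(fh.read(12))
```

This only covers the header. Recovering URLs requires following the hash table and the entry records in the block files, which is the work the libraries below already do.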

Fortunately, I found two Python libraries that do this.

The first one doesn't seem to work correctly under Python 3. The second one is fantastic and does the job perfectly. The pyhindsight home page has detailed usage instructions, so here I will only show how to integrate it into your own project.

import pyhindsight
from pyhindsight.analysis import AnalysisSession
import logging
import os

analysis_session = AnalysisSession()

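# Profile directory of a Chromium-based browser (Edge in this example).
# The raw string keeps the backslashes literal ('\U' is otherwise an invalid
# escape in Python 3), and expanduser() resolves the leading '~'.
cache_dir = os.path.expanduser(r'~\AppData\Local\Microsoft\Edge\User Data\Default')
analysis_session.input_path = cache_dir
analysis_session.cache_path = os.path.join(cache_dir, 'Cache', 'Cache_Data')
# Edge is Chromium-based, so it is parsed as 'Chrome'.
analysis_session.browser_type = 'Chrome'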
analysis_session.no_copy = True
analysis_session.timezone = None


logging.basicConfig(filename=analysis_session.log_path, level=logging.FATAL,
                    format='%(asctime)s.%(msecs).03d | %(levelname).01s | %(message)s',
                    datefmt='%Y-%m-%d %H:%M:%S')
run_status = analysis_session.run()

# Keep only the cache entries and print each URL with its cache location.
for p in analysis_session.parsed_artifacts:
    if isinstance(p, pyhindsight.browsers.chrome.CacheEntry):
        print('Url: {}, Location: {}'.format(p.url, p.location))
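To use these results elsewhere in a project, the URL and location pairs can be written to a CSV file instead of printed. The sketch below is not part of pyhindsight: `write_cache_urls` is a hypothetical helper that relies only on the `url` and `location` attributes used above, and it expects to be given the entries that passed the `CacheEntry` check.

```python
import csv


def write_cache_urls(entries, out):
    """Write one CSV row per entry (url, location) and return the row count."""
    writer = csv.writer(out)
    writer.writerow(["url", "location"])
    count = 0
    for entry in entries:
        # Skip any object that lacks one of the two attributes.
        url = getattr(entry, "url", None)
        location = getattr(entry, "location", None)
        if url is None or location is None:
            continue
        writer.writerow([url, location])
        count += 1
    return count
```

For example, collect the matching artifacts into a list, open the output with `open('cache_urls.csv', 'w', newline='', encoding='utf-8')`, and pass both to `write_cache_urls`. The file name here is only a placeholder.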

That's all. Enjoy, and thanks to pyhindsight.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 J.W Kang