Is there a way to determine the Docker cache SHA for a COPY?
We run a CI system that maintains several copies of the same Git repo in different directories, each tied to a CI agent. When an agent receives a job, it checks out a specific commit and runs docker build. There are a few COPY steps in the Dockerfile:
COPY Gemfile* /tmp/
COPY engines /tmp/engines
COPY gems /tmp/gems
COPY .npmrc /tmp/
COPY package.json /tmp/
COPY yarn.lock /tmp/
I expect that when Gemfile or Gemfile.lock changes in Git, the first job that runs on a host afterwards will check out the new code and run docker build, and that the build will miss the cache at this step because the file content has changed. I also expect that, as long as these files do not change again, every subsequent docker build will reuse the cached layer from that first build. This is not what I see in practice, and I cannot tell why.
Here is an example from an agent's working directory that has been used today. Note that the last change to these files was on Thursday, March 29th.
git log --pretty=format:"%h%x09%ad%x09" -n 1
228c1c2ac24 Fri Apr 1 11:30:38 2022 +0100
docker build . -f .buildkite/Dockerfile
Sending build context to Docker daemon 57.18MB
…
Step 18/30 : COPY Gemfile* /tmp/
---> Using cache
---> fc4c1b01ea8e
Here's what happened when I went into a less-used agent's working directory (not used since Fri Mar 25), checked out the same commit, and built again:
git log --pretty=format:"%h%x09%ad%x09" -n 1
28725e2cffa Fri Mar 25 14:32:03 2022 +0000
git fetch origin master
git checkout 228c1c2ac24
Previous HEAD position was 28725e2cffa
HEAD is now at 228c1c2ac24
docker build . -f .buildkite/Dockerfile
Sending build context to Docker daemon 57.18MB
...
Step 18/30 : COPY Gemfile* /tmp/
---> 91d2717fa628
I'm using Docker 20.10.7
Server Version: 20.10.7
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
Default Runtime: runc
Init Binary: docker-init
containerd version: d71fcd7d8303cbf684402823e425e9dd2e99285d
runc version: 84113eef6fc27af1b01b3181f31bbaf708715301
init version: de40ad0
I did not expect the cached layer ID to change. I expected the second build to reuse fc4c1b01ea8e, since both working directories now contain exactly the same code. Inspecting the two layers shows that the COPY checksums differ:
docker inspect fc4c1b01ea8e | jq ".[0].ContainerConfig.Cmd"
[
"/bin/sh",
"-c",
"#(nop) COPY multi:70a6a1cd3b4a3148dc355928fd84fdee48b69fc60984dd7769f9365f0c2880b3 in /tmp/ "
]
docker inspect 91d2717fa628 | jq ".[0].ContainerConfig.Cmd"
[
"/bin/sh",
"-c",
"#(nop) COPY multi:b4b55469c26126e9daaa9d6b37ab6d3a729686134529b6e6b5302383a013a334 in /tmp/ "
]
Even when I pull out the UpperDir of each of these two layers and diff the folders, no differences are reported.
I suspected mtime, atime, ctime or some other metadata, but I have been unable to confirm this. I do know that touching the files does not cause a cache miss.
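One way to test the metadata theory would be to fingerprint the copied files in each working directory by content hash, permission bits and owner, and then diff the two listings. The sketch below assumes GNU coreutils (sha256sum and stat -c) and assumes, without confirmation from the output above, that the builder's COPY checksum covers mode and ownership as well as content. The function name fingerprint and the paths in the usage comment are hypothetical.

```shell
#!/bin/sh
# Print one line per regular file: <sha256> <mode> <uid>:<gid> <path>.
# Assumes GNU coreutils; `fingerprint` is a hypothetical helper name.
fingerprint() {
  dir=$1
  shift
  (
    # Work relative to the build context so paths match in both listings.
    cd "$dir" || exit 1
    # Remaining arguments are files or directories named in the COPY steps.
    find "$@" -type f | LC_ALL=C sort | while IFS= read -r f; do
      printf '%s %s %s\n' \
        "$(sha256sum < "$f" | cut -d' ' -f1)" \
        "$(stat -c '%a %u:%g' "$f")" \
        "$f"
    done
  )
}

# Usage (placeholder paths for two agents' working directories):
#   fingerprint /path/to/agent-a Gemfile Gemfile.lock > a.txt
#   fingerprint /path/to/agent-b Gemfile Gemfile.lock > b.txt
#   diff a.txt b.txt
```

If the hashes match but the mode or owner columns differ, that would point at metadata rather than content. Git records only the executable bit, so the remaining permission bits depend on the umask in effect when each working directory was checked out, which is one way two checkouts of the same commit could differ.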
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
