Is there a way to determine the Docker cache SHA for a COPY?

We're running a CI system that maintains several copies of the same Git repo in different directories, each tied to a CI agent. When an agent receives a job, it checks out a specific commit and runs docker build. There are a few COPY steps in the Dockerfile:

COPY Gemfile* /tmp/
COPY engines /tmp/engines
COPY gems /tmp/gems
COPY .npmrc /tmp/
COPY package.json /tmp/
COPY yarn.lock /tmp/

I expect that when a Gemfile/Gemfile.lock change lands in Git, the first job to run on a host afterwards will check out the new code and run docker build, and that build will get a cache miss at the COPY step because the file content has changed. I also expect that, as long as no further change is made to these files, every subsequent docker build will reuse the cached layer from that first build. This is not what I see in practice, and I cannot tell why.

Here is an example from an agent's working directory that has been used today. Note, the last change to these files is Thursday, March 29th.

git log --pretty=format:"%h%x09%ad%x09" -n 1
228c1c2ac24     Fri Apr 1 11:30:38 2022 +0100

docker build . -f .buildkite/Dockerfile
Sending build context to Docker daemon  57.18MB
…
Step 18/30 : COPY Gemfile* /tmp/
  ---> Using cache
  ---> fc4c1b01ea8e

Here's what happened when I went into a less frequently used agent's working directory (not used since Fri Mar 25), checked out the same commit and ran the same build:

git log --pretty=format:"%h%x09%ad%x09" -n 1
28725e2cffa     Fri Mar 25 14:32:03 2022 +0000

git fetch origin master
git checkout 228c1c2ac24
Previous HEAD position was 28725e2cffa
HEAD is now at 228c1c2ac24 

docker build . -f .buildkite/Dockerfile
Sending build context to Docker daemon  57.18MB
...
Step 18/30 : COPY Gemfile* /tmp/
 ---> 91d2717fa628

I'm using Docker 20.10.7:

Server Version: 20.10.7
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: d71fcd7d8303cbf684402823e425e9dd2e99285d
 runc version: 84113eef6fc27af1b01b3181f31bbaf708715301
 init version: de40ad0

I didn't expect the cached layer ID to change. I expected the second build to reuse fc4c1b01ea8e, since both working directories now contain exactly the same code. Inspecting the two layers shows that the checksum recorded for the COPY differs:

docker inspect fc4c1b01ea8e | jq ".[0].ContainerConfig.Cmd"
[
  "/bin/sh",
  "-c",
  "#(nop) COPY multi:70a6a1cd3b4a3148dc355928fd84fdee48b69fc60984dd7769f9365f0c2880b3 in /tmp/ "
]

docker inspect 91d2717fa628 | jq ".[0].ContainerConfig.Cmd"
[
  "/bin/sh",
  "-c",
  "#(nop) COPY multi:b4b55469c26126e9daaa9d6b37ab6d3a729686134529b6e6b5302383a013a334 in /tmp/ "
]

Even when I pull out the UpperDir of each of these two layers and diff the folders, no differences are reported.

I suspected mtime, atime, ctime or some other file metadata, but I have been unable to confirm this. I do know that touching the files does not cause a cache miss.
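One way to narrow down the metadata theory is to fingerprint the COPY sources in each working directory and diff the results. The sketch below is a minimal example, not something taken from Docker itself: `copy_fingerprint` is a hypothetical helper name, and it assumes GNU `find`, `sort`, `stat` and `sha256sum` are available on the agent. It also rests on an assumption that I have not verified against 20.10.7: that the checksum for COPY covers file content plus some metadata such as permission bits and ownership, but not timestamps (which would be consistent with touch not causing a miss).

```shell
#!/bin/sh
# Hypothetical helper: for every regular file under the given paths, print
#   <sha256 of content> <octal mode> <uid> <gid> <path>
# in a stable order, so the output from two checkouts can be diffed.
# Timestamps are deliberately left out.
copy_fingerprint() {
  find "$@" -type f -print | LC_ALL=C sort | while IFS= read -r f; do
    sum=$(sha256sum "$f" | cut -d' ' -f1)   # content hash only
    meta=$(stat -c '%a %u %g' "$f")         # mode, owner uid, group gid (GNU stat)
    printf '%s %s %s\n' "$sum" "$meta" "$f"
  done
}

# Example usage (the agent directory paths are placeholders):
#   (cd /path/to/agent-a && copy_fingerprint Gemfile* engines gems .npmrc package.json yarn.lock) > /tmp/a.txt
#   (cd /path/to/agent-b && copy_fingerprint Gemfile* engines gems .npmrc package.json yarn.lock) > /tmp/b.txt
#   diff /tmp/a.txt /tmp/b.txt
```

If the diff shows identical hashes but different modes or owners for the same path, that would point at permissions or ownership rather than content. A plain diff of two folders compares file content only, so it would not surface this kind of difference.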



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
