'Dockerfile RUN layers vs script
Docker version 19.03.12, build 48a66213fe
So in a dockerfile, if I have the following lines:
RUN yum install aaa \
bbb \
ccc && \
<some cmd> && \
<etc> && \
<some cleanup>
is that a best practice? Should I keep yum part separate than when I call other <commands/scripts>?
If I want a cleaner (vs traceable) Dockerfile, what if I put those lines in a .sh script can just call that script (i.e. COPY followed by a RUN statement). Will the build step run each time, even though nothing is changes inside .sh script**?** Looking for some gotchas here.
I'm thinking, whatever packages are stable, have a separate RUN <those packages> i.e. in one layer and lines which depend upon / change frequently i.e. may use user-defined (docker build time CLI level args) keep those in separate RUN layer (so I can use layer cache effectively).
Wondering if you think keeping a cleaner Dockerfile (calling RUN some.sh) would be less efficient than a traceable Dockerfile (where everything is listed in Dockerfile what makes that image).
Thanks.
Solution 1:[1]
I guess the question is somewhat opinion based.
It depends on what you are after. It's ultimately a tradeoff between development experience and an optimized image.
If you put everything in on RUN instruction, you are reducing the number of layers and therefore the image size to some degree. Also, each layer is stored in the registry, so pushing and pulling would get more time-consuming and expensive. On the other hand, it means that each small change causes everything in the RUN instruction to run again, as it invalidates the cache for that single layer.
If you are creating temporary files with a RUN instruction that are removed by a later RUN instruction, then it would be better to run both commands in a single instruction to not create a layer with temporary files.
For a production image, I would opt for a single RUN instruction as optimization is more important than build speed and caching, IMO. If you can, you could also use multi staging, where the first stage uses an individual RUN instruction to utilize the layer caching. In the second stage, some artefacts from the first stage are taken and the number of layers is aggressively kept at a minimum. Only the final stage will be pushed and pulled from a registry.
For example, in the below image, the builder stage is using more instructions than strictly required to gain better caching. Even The template file is copied into the first stage, even though it's not used at all there, since it's only read and used at runtime. But this way the final stage can get the output binary and the template with a single COPY instruction.
FROM golang as builder
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY *.go /src/
RUN mkdir -p /dist/templates
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -a -o /dist/run .
COPY haproxy.cfg.template /dist/templates/
FROM alpine
WORKDIR /mananger
COPY --from=builder /dist ./
ENTRYPOINT ["./run"]
In terms of script vs RUN instruction, I think it is more idiomatic to use a RUN instruction and concatenate multiple commands with the double ampersand &&. If things get very complex, then it may be better to use a dedicated script to make better use of shell syntax/features. It depends on what you are doing there.
Will the build step run each time, even though nothing is changes inside .sh script**?**
The build step would only run once and get cached. As long as the content of the script would not change, docker would use the cached layer. You need to get the file somehow into the image to run beforehand, so I guess the real cache invalidation would already happen in the COPY instruction, if the file has changed.
As mentioned in the previous paragraph, using a script will cost you at minium 1 COPY or ADD instruction more, introducing an additional layer that could have been avoided, if a RUN instruction had been used.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
