[OE-core] The state of reproducible Builds
Joshua Watt
jpewhacker at gmail.com
Tue Jul 2 14:13:01 UTC 2019
On 7/2/19 8:26 AM, Adrian Bunk wrote:
> On Mon, Jul 01, 2019 at 10:58:04AM -0500, Joshua Watt wrote:
>> ...
>> 1. HOSTTOOLS differences. There are a lot of tools listed in HOSTTOOLS, and
>> unfortunately some of them have version dependent output and are used for
>> target builds (the one I've currently stumbled upon is pod2man, but I'm sure
>> there are others). Unfortunately, one could probably argue that HOSTTOOLS is
>> somewhat antithetical to the above statement, at least in regard to target
>> builds. Any host tool output that "leaks" into the target build output can
>> result in a non-reproducible build across hosts, and possibly should be
>> avoided; the alternative is to use (or mandate) the corresponding -native
>> recipe that provides that tool as a DEPENDS so that the controlled
>> internally built version is used instead. Note that this only really applies
>> to target builds, not -native (or nativesdk right now). -native recipes would
>> obviously need more HOSTTOOLS to help bootstrap the system. I suspect this
>> would require reworking how HOSTTOOLS works so that they can be split into
>> two categories somehow; the tools that have "ubiquitous and stable"
>> interfaces and are fine for all recipes (e.g. cat, sed, true, rm, etc.) and
>> those that are variable and should only be used for -native builds (e.g.
>> pod2man, rpcgen(?), chrpath(?), tar(?)... others?). Anyone have thoughts on
>> this?
>> ...
> What is the goal?
>
> 1. being able to prove that a given binary has actually been
> built from the correct sources, or
> 2. builds on all hosts have the same output
I'm not sure there is just one goal...
> With 1. you can just record all host properties like installed packages
> and running kernel, and it isn't a problem if different hosts result in
> different output.
Right... I know that my employer would really like this sort of binary
reproducibility; that is, we should be able to pull some archived code
out of our salt mine, build it, and know it's the same binary that our
customers have. I think if you combine what we have today and some sort
of reproducible host image (archived Docker container, virtual machine,
et al.) we are pretty close to that.
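
As a rough illustration of "recording host properties" for goal 1, here
is a minimal Python sketch that captures a few host details and reduces
them to a single digest that could be archived next to a build. The
function names and the choice of properties are assumptions made for
this example; nothing like this exists in OE-core as far as this thread
states, and a real manifest would also need installed package versions.

```python
import hashlib
import json
import platform


def collect_host_properties():
    """Gather a few host properties that could influence build output.

    The selection here is illustrative only, not a complete list.
    """
    uname = platform.uname()
    return {
        "system": uname.system,
        "kernel_release": uname.release,
        "machine": uname.machine,
        "python": platform.python_version(),
    }


def manifest_digest(properties):
    """Hash the properties so that key order does not affect the result."""
    canonical = json.dumps(properties, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    props = collect_host_properties()
    print(json.dumps(props, indent=2, sort_keys=True))
    print("host manifest digest:", manifest_digest(props))
```

Two hosts that report the same properties get the same digest, so a
mismatch is a cheap first hint that a rebuild is not running on the
archived host image.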
>
> With 2. any kind of differences due to host differences is a problem.
> You need -native for nearly everything, and then fix all other kinds of
> differences like the version of the running kernel recorded somewhere.
Yes. I would hope that after using mostly -native tools where
applicable, the currently running kernel wouldn't figure into the build
of target packages... if it does I would venture to say that is a
cross-compiling/reproducibility bug in the package.
Also, to be clear, I'm hoping we don't need to go so far as to say that
-native recipes themselves need to be reproducible; as long as the tools
they provide always generate the same output regardless of which host
they were built on, I suspect they don't need to be.
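
To make the proposed two-category HOSTTOOLS split a little more
concrete, here is a small Python sketch of the check it implies. The set
names and the disallowed_tools() helper are hypothetical, the tool lists
are just the examples from the original mail (several of which were
marked with a question mark there), and this is not how HOSTTOOLS works
today.

```python
# Tools with "ubiquitous and stable" interfaces: fine for all recipes.
STABLE_HOSTTOOLS = {"cat", "sed", "true", "rm"}

# Tools whose output may vary by host version: only for -native builds.
# rpcgen, chrpath and tar were listed as uncertain in the original mail.
NATIVE_ONLY_HOSTTOOLS = {"pod2man", "rpcgen", "chrpath", "tar"}


def is_bootstrap_recipe(recipe_name):
    """True for -native and nativesdk recipes, which are exempt for now."""
    return recipe_name.endswith("-native") or recipe_name.startswith("nativesdk-")


def disallowed_tools(recipe_name, tools_used):
    """Return the host tools a recipe uses but should get from a -native
    dependency instead, sorted for stable output."""
    if is_bootstrap_recipe(recipe_name):
        return []
    return sorted(set(tools_used) & NATIVE_ONLY_HOSTTOOLS)


if __name__ == "__main__":
    # "perl" and "perl-native" are only example recipe names.
    print(disallowed_tools("perl", ["cat", "pod2man"]))
    print(disallowed_tools("perl-native", ["cat", "pod2man"]))
```

A target recipe that trips this check would be a candidate for adding
the corresponding -native recipe to DEPENDS.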
>
> For detecting malicious binaries not built from the claimed sources 1. is
> sufficient. For distributions like Debian that build natively this is
> even the only option available since the host compiler is used.
>
> Doing 2. would of course be more desirable, but it can also be done in
> a second step after all issues related to building on exactly the same
> host have been sorted out.
I think there are also other use cases for #2 besides detecting
malicious binaries/source code, such as hash equivalence, or even being
able to use sstate when making a reproducible build. You are correct that
this can be done in a second step, but I think that everyone needs to be
aware of the limitations that arise when #2 is not achieved (the
main one being that you probably can't make a reproducible build if you
use sstate).
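
For goal 2, the basic test is whether two hosts produce byte-identical
output. A minimal Python sketch of that comparison follows; the function
names are made up for this example, and it only compares file contents,
so differences in permissions, ownership or timestamps would go
unnoticed.

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_trees(root_a, root_b):
    """Return relative paths that differ or exist in only one tree."""
    a = tree_digests(root_a)
    b = tree_digests(root_b)
    return sorted(name for name in a.keys() | b.keys()
                  if a.get(name) != b.get(name))
```

Calling diff_trees() on the output directories of the same build from
two different hosts should return an empty list if the build is
reproducible in the sense of #2.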
>
>> Joshua Watt
>> ...
> cu
> Adrian
>