[OE-core] [oe] kernel.bbclass: Fix do_shared_workdir task ordering

Jens Rehsack rehsack at gmail.com
Wed Nov 11 14:59:52 UTC 2015


> Am 11.11.2015 um 13:49 schrieb Bruce Ashfield <bruce.ashfield at gmail.com>:
> 
> On Wed, Nov 11, 2015 at 4:00 AM, Jens Rehsack <rehsack at gmail.com> wrote:
>> 
>>> Am 11.11.2015 um 03:01 schrieb Bruce Ashfield <bruce.ashfield at gmail.com>:
>>> 
>>> On Tue, Nov 10, 2015 at 4:33 AM, Jens Rehsack <rehsack at gmail.com> wrote:
>>>> 
>>>>> Am 14.10.2015 um 21:48 schrieb Bruce Ashfield <bruce.ashfield at gmail.com>:
>>>>> 
>>>>> On Wed, Oct 14, 2015 at 3:30 PM, S. Lockwood-Childs <sjl at vctlabs.com> wrote:
>>>>>> http://patchwork.openembedded.org/patch/99875/
>>>>>> 
>>>>>> Apparently this patch is still not in master, and I just ran across the
>>>>>> problem with an externally built module (omaplfb from omap3-sgx-modules)
>>>>>> in meta-ti layer.
>>>>>> 
>>>>>> What's the plan for getting a correct Module.symver into shared_workdir
>>>>>> for external modules to build against? Above patch, or does someone have
>>>>>> an even better idea?
>>>>> 
>>>>> Richard and I sync'd on this while in Dublin @ ELCe, and the changes
>>>>> aren't missing
>>>>> from master by mistake .. but more because we are still working to come up with
>>>>> a comprehensive solution (tracked in bugzilla).
>>>>> 
>>>>> The solution is pretty much what I described before, we are balancing
>>>>> applications
>>>>> and tasks that do not need kernel modules to be built, versus external modules
>>>>> that depend on symbols from other modules. The devil is in the
>>>>> details, and getting
>>>>> a non-racy, task locked solution that allows the recipe writer to
>>>>> explicitly decide
>>>>> whether they need modules built or not .. attempts at detecting the
>>>>> need, or forcing
>>>>> a one size fits all solution have all lead to dead ends.
>>>>> 
>>>>> Since we are close to a release point, I'm still working on this out
>>>>> of the tree, and
>>>>> will propose some changes when the tree looks stable.
>>>>> 
>>>>> For now, you can carry the patch locally, or you can append to the kernel module
>>>>> compilation task and do a second copy of the symvers file to the share
>>>>> directory.
>>>>> 
>>>>> i.e. a variant of this:
>>>>> http://patchwork.openembedded.org/patch/94891/, done in a
>>>>> bbappend versus the class.
>>>>> 
>>>>> Cheers,
>>>>> 
>>>>> Bruce
>>>> 
>>>> This is kind of insane to try a fix duplicating a job in a probably wrong way
>>>> (even a tiny) because of performance issue.
>>> 
>>> I have a fix already queued for this,
>> 
>> I've seen the fix you referred at http://patchwork.openembedded.org/patch/94891/,
>> this is broken.
> 
> No that isn't it. But also, no, that isn't broken. It works, but there is a
> potential for a race, which is what we've been fixing.
> 
> Can you elaborate on why you'd declare that broken ?

Because you fix one race condition by introducing another one. If there is
another word than broken for it, I'm happy to learn that,

>> 
>>> but I'm currently out of the office ..
>>> I swear, every time I go on vacation, someone brings this up.
>>> 
>>> I've been waiting for the smoke to clear on the 2.0 release before
>>> posting it .. so please, just a bit more patience and I'll send out
>>> that series.
>> 
>> Maybe giving us an impression whether it's correct or just work for you(tm) :P
> 
> For everyone. Richard wouldn't exactly take my series and kernel
> updates if they were only just for me. :)

I didn't talk about your work for the kernel, I'm just talking about your
patch to fix initially mentioned race condition.

>>>> Any 3rd party kernel module depends on do_shared_workdir - so do_shared_workdir
>>>> must wait for do_compile_kernelmodules - that's it by design (build is done
>>>> relying on dependencies ordered in a directed, noncyclical graph.
>>>> 
>>>> Since compile_kernelmodules is between compile and strip, I vote for
>>>> 
>>>> $ git diff
>>>> diff --git a/meta/classes/kernel.bbclass b/meta/classes/kernel.bbclass
>>>> index 5e8b6cf..49d7561 100644
>>>> --- a/meta/classes/kernel.bbclass
>>>> +++ b/meta/classes/kernel.bbclass
>>>> @@ -253,7 +253,7 @@ kernel_do_install() {
>>>> }
>>>> do_install[prefuncs] += "package_get_auto_pr"
>>>> 
>>>> -addtask shared_workdir after do_compile before do_compile_kernelmodules
>>>> +addtask shared_workdir after do_compile_kernelmodules before do_strip
>>>> addtask shared_workdir_setscene
>>>> 
>>>> do_shared_workdir_setscene () {
>>>> 
>>>> But that's surely kind of smell, whether before do_strip and do_install is
>>>> preferred. Mandatory is, that do_shared_workdir must not be processed before
>>>> do_compile_kernelmodules finishes.
>>>> 
>>>> There is no way to avoid it, and if those 5 seconds slow down your build,
>>>> there is probably another thing which should be fixed.
>>> 
>>> It's not 5 seconds. It is much more on some machines and configurations.
>> 
>> Depends on your build machine and much more, your sstate cache,
>> changes to kernel and why you do a full build after each kernel
>> change instead of just deploying the kernel.
> 
> Sure. I've been at this for years now .. and I've seen and used lots
> of configs. We are trying to give a choice on what level of building
> triggers.
> 
>> 
>>> The point is that not everyone who is building modules depends on
>>> other modules and not everyone that is building against the kernel
>>> may even want modules built.
>> 
>> That's only build time. How often do you recompile your kernel that
>> this dependency really matters?
> 
> I rebuild the kernel a lot .. I'm doing kernel and kernel tools maintenance
> after all :)

Understood, so your primary tooling becomes slower.

>>> The fix is to just re-copy the symbols after kernel modules are
>>> built, and put the onus on the recipe writer to depend on the
>>> right task/variant. By default, do_shared_workdir won't have that
>>> dependency, but anyone with a recipe that does depend on other
>>> module symbols will get that extra copy and dependency created.
>> 
>> This is simply wrong, period. Because (I already explained that)
>> module-base.bbclass adds a configure-stage dependency to do_shared_workdir.
>> 
> 
> And I explained that I didn't need that explanation.

I'm sorry - but I didn't got that. I still don't get that you
don't need that explanation. Even if you understand more and deeper
what's going on when building kernel and tools and 3rd party
modules around that - several people running into that race condition
and you disagree that moving the race keeps the thing broken.

> There's really no need to escalate your language. It isn't needed.
> 
>> You can avoid the mistake by adding a sane stage for redo_shared_workdir
>> after do_compile_kernelmodules before do_strip and reassign the 3rd
>> party module bbclass to depend on redo_shared_workdir.
>> 
>> OTOH I think, build performance isn't worth a however sophisticated
>> fragile patch. Let's apply Stefan's patch and be my guest for sending
>> an performance optimized one when you figured it out.
>> 
>> @Richard - isn't it worth use the correct, even if slow, patch and
>> let Bruce anytime in not to distant future provide an optimized one
>> which can be settled and backported for 2.0.1?
>> 
>> Isn't release time soon enough?
>> 
>> @Stefan - would you agree to "addtask shared_workdir after do_compile_kernelmodules before do_strip"
>> instead of "addtask shared_workdir after do_compile_kernelmodules before do_install"?
>> 
> 
> I'm checking out of the conversation, since I'm on vacation and shouldn't
> be near a computer.
> 
> I indicated months ago that RP has the final say, and he can merge any
> variants of the patches he wants, since they don't break anything and
> fix the problem.
> 
> RP: I'll ack either patch that has been posted. I'll will revisit all of this
> when I'm working on my 2.1 kernel packaging changes anyway, so
> there's no need to wait.

Great! Thanks, Bruce.

Cheers
-- 
Jens Rehsack - rehsack at gmail.com




More information about the Openembedded-core mailing list