[OE-core] [oe] kernel.bbclass: Fix do_shared_workdir task ordering
Bruce Ashfield
bruce.ashfield at gmail.com
Wed Nov 11 22:59:09 UTC 2015
On Wed, Nov 11, 2015 at 9:59 AM, Jens Rehsack <rehsack at gmail.com> wrote:
>
>> Am 11.11.2015 um 13:49 schrieb Bruce Ashfield <bruce.ashfield at gmail.com>:
>>
>> On Wed, Nov 11, 2015 at 4:00 AM, Jens Rehsack <rehsack at gmail.com> wrote:
>>>
>>>> Am 11.11.2015 um 03:01 schrieb Bruce Ashfield <bruce.ashfield at gmail.com>:
>>>>
>>>> On Tue, Nov 10, 2015 at 4:33 AM, Jens Rehsack <rehsack at gmail.com> wrote:
>>>>>
>>>>>> Am 14.10.2015 um 21:48 schrieb Bruce Ashfield <bruce.ashfield at gmail.com>:
>>>>>>
>>>>>> On Wed, Oct 14, 2015 at 3:30 PM, S. Lockwood-Childs <sjl at vctlabs.com> wrote:
>>>>>>> http://patchwork.openembedded.org/patch/99875/
>>>>>>>
>>>>>>> Apparently this patch is still not in master, and I just ran across the
>>>>>>> problem with an externally built module (omaplfb from omap3-sgx-modules)
>>>>>>> in meta-ti layer.
>>>>>>>
>>>>>>> What's the plan for getting a correct Module.symver into shared_workdir
>>>>>>> for external modules to build against? Above patch, or does someone have
>>>>>>> an even better idea?
>>>>>>
>>>>>> Richard and I sync'd on this while in Dublin @ ELCe, and the changes
>>>>>> aren't missing
>>>>>> from master by mistake .. but more because we are still working to come up with
>>>>>> a comprehensive solution (tracked in bugzilla).
>>>>>>
>>>>>> The solution is pretty much what I described before, we are balancing
>>>>>> applications
>>>>>> and tasks that do not need kernel modules to be built, versus external modules
>>>>>> that depend on symbols from other modules. The devil is in the
>>>>>> details, and getting
>>>>>> a non-racy, task locked solution that allows the recipe writer to
>>>>>> explicitly decide
>>>>>> whether they need modules built or not .. attempts at detecting the
>>>>>> need, or forcing
>>>>>> a one size fits all solution have all lead to dead ends.
>>>>>>
>>>>>> Since we are close to a release point, I'm still working on this out
>>>>>> of the tree, and
>>>>>> will propose some changes when the tree looks stable.
>>>>>>
>>>>>> For now, you can carry the patch locally, or you can append to the kernel module
>>>>>> compilation task and do a second copy of the symvers file to the share
>>>>>> directory.
>>>>>>
>>>>>> i.e. a variant of this:
>>>>>> http://patchwork.openembedded.org/patch/94891/, done in a
>>>>>> bbappend versus the class.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Bruce
>>>>>
>>>>> This is kind of insane to try a fix duplicating a job in a probably wrong way
>>>>> (even a tiny) because of performance issue.
>>>>
>>>> I have a fix already queued for this,
>>>
>>> I've seen the fix you referred at http://patchwork.openembedded.org/patch/94891/,
>>> this is broken.
>>
>> No that isn't it. But also, no, that isn't broken. It works, but there is a
>> potential for a race, which is what we've been fixing.
>>
>> Can you elaborate on why you'd declare that broken ?
>
> Because you fix one race condition by introducing another one. If there is
> another word than broken for it, I'm happy to learn that,
well no. I was pointing to that fix as to what inspired what I'm working
on, not as 'the fix'. In that thread itself, we comment that it was a race
condition .. so that has been clear for a while. That doesn't make it
broken, it makes it incomplete :)
>
>>>
>>>> but I'm currently out of the office ..
>>>> I swear, every time I go on vacation, someone brings this up.
>>>>
>>>> I've been waiting for the smoke to clear on the 2.0 release before
>>>> posting it .. so please, just a bit more patience and I'll send out
>>>> that series.
>>>
>>> Maybe giving us an impression whether it's correct or just work for you(tm) :P
>>
>> For everyone. Richard wouldn't exactly take my series and kernel
>> updates if they were only just for me. :)
>
> I didn't talk about your work for the kernel, I'm just talking about your
> patch to fix initially mentioned race condition.
>
>>>>> Any 3rd party kernel module depends on do_shared_workdir - so do_shared_workdir
>>>>> must wait for do_compile_kernelmodules - that's it by design (build is done
>>>>> relying on dependencies ordered in a directed, noncyclical graph.
>>>>>
>>>>> Since compile_kernelmodules is between compile and strip, I vote for
>>>>>
>>>>> $ git diff
>>>>> diff --git a/meta/classes/kernel.bbclass b/meta/classes/kernel.bbclass
>>>>> index 5e8b6cf..49d7561 100644
>>>>> --- a/meta/classes/kernel.bbclass
>>>>> +++ b/meta/classes/kernel.bbclass
>>>>> @@ -253,7 +253,7 @@ kernel_do_install() {
>>>>> }
>>>>> do_install[prefuncs] += "package_get_auto_pr"
>>>>>
>>>>> -addtask shared_workdir after do_compile before do_compile_kernelmodules
>>>>> +addtask shared_workdir after do_compile_kernelmodules before do_strip
>>>>> addtask shared_workdir_setscene
>>>>>
>>>>> do_shared_workdir_setscene () {
>>>>>
>>>>> But that's surely kind of smell, whether before do_strip and do_install is
>>>>> preferred. Mandatory is, that do_shared_workdir must not be processed before
>>>>> do_compile_kernelmodules finishes.
>>>>>
>>>>> There is no way to avoid it, and if those 5 seconds slow down your build,
>>>>> there is probably another thing which should be fixed.
>>>>
>>>> It's not 5 seconds. It is much more on some machines and configurations.
>>>
>>> Depends on your build machine and much more, your sstate cache,
>>> changes to kernel and why you do a full build after each kernel
>>> change instead of just deploying the kernel.
>>
>> Sure. I've been at this for years now .. and I've seen and used lots
>> of configs. We are trying to give a choice on what level of building
>> triggers.
>>
>>>
>>>> The point is that not everyone who is building modules depends on
>>>> other modules and not everyone that is building against the kernel
>>>> may even want modules built.
>>>
>>> That's only build time. How often do you recompile your kernel that
>>> this dependency really matters?
>>
>> I rebuild the kernel a lot .. I'm doing kernel and kernel tools maintenance
>> after all :)
>
> Understood, so your primary tooling becomes slower.
>
>>>> The fix is to just re-copy the symbols after kernel modules are
>>>> built, and put the onus on the recipe writer to depend on the
>>>> right task/variant. By default, do_shared_workdir won't have that
>>>> dependency, but anyone with a recipe that does depend on other
>>>> module symbols will get that extra copy and dependency created.
>>>
>>> This is simply wrong, period. Because (I already explained that)
>>> module-base.bbclass adds a configure-stage dependency to do_shared_workdir.
>>>
>>
>> And I explained that I didn't need that explanation.
>
> I'm sorry - but I didn't got that. I still don't get that you
> don't need that explanation. Even if you understand more and deeper
> what's going on when building kernel and tools and 3rd party
> modules around that - several people running into that race condition
> and you disagree that moving the race keeps the thing broken.
Even in the first threads on this, I didn't disagree that this fixes the
problem. I'm (and Richard) are the ones that will get the complaints
if someone has a performance issue, or their build is otherwise
changed .. hence the hesitation.
>
>> There's really no need to escalate your language. It isn't needed.
>>
>>> You can avoid the mistake by adding a sane stage for redo_shared_workdir
>>> after do_compile_kernelmodules before do_strip and reassign the 3rd
>>> party module bbclass to depend on redo_shared_workdir.
>>>
>>> OTOH I think, build performance isn't worth a however sophisticated
>>> fragile patch. Let's apply Stefan's patch and be my guest for sending
>>> an performance optimized one when you figured it out.
>>>
>>> @Richard - isn't it worth use the correct, even if slow, patch and
>>> let Bruce anytime in not to distant future provide an optimized one
>>> which can be settled and backported for 2.0.1?
>>>
>>> Isn't release time soon enough?
>>>
>>> @Stefan - would you agree to "addtask shared_workdir after do_compile_kernelmodules before do_strip"
>>> instead of "addtask shared_workdir after do_compile_kernelmodules before do_install"?
>>>
>>
>> I'm checking out of the conversation, since I'm on vacation and shouldn't
>> be near a computer.
>>
>> I indicated months ago that RP has the final say, and he can merge any
>> variants of the patches he wants, since they don't break anything and
>> fix the problem.
>>
>> RP: I'll ack either patch that has been posted. I'll will revisit all of this
>> when I'm working on my 2.1 kernel packaging changes anyway, so
>> there's no need to wait.
>
> Great! Thanks, Bruce.
Indeed. Don't take my prolonging of this discussion as anything but me trying
to make sure that any changes are thorough and not breaking workflows.
Raising the bar (even if only a little bit) is what we try to do :)
.. now I'm REALLY going to put down my keyboard .... hopefully ;)
Cheers,
Bruce
>
> Cheers
> --
> Jens Rehsack - rehsack at gmail.com
>
--
"Thou shalt not follow the NULL pointer, for chaos and madness await
thee at its end"
More information about the Openembedded-core
mailing list