Message ID: 1541709268-3766-1-git-send-email-longman@redhat.com (mailing list archive)
Series: locking/lockdep: Add a new class of terminal locks
* Waiman Long <longman@redhat.com> wrote:

> The purpose of this patchset is to add a new class of locks called
> terminal locks and convert some of the low-level raw or regular
> spinlocks to terminal locks. A terminal lock does not have forward
> dependency and it won't allow a lock or unlock operation on another
> lock. Two-level nesting of terminal locks is allowed, though.
>
> Only spinlocks that are acquired with the _irq/_irqsave variants or
> acquired in an IRQ-disabled context should be classified as terminal
> locks.
>
> Because of the restrictions on terminal locks, we can do simple checks
> on them without using the lockdep lock validation machinery. The
> advantages of making these changes are as follows:
>
>  1) The lockdep check will be faster for terminal locks without using
>     the lock validation code.
>  2) It saves table entries used by the validation code and hence makes
>     it harder to overflow those tables.
>
> In fact, it is possible to overflow some of the tables by running
> a variety of different workloads on a debug kernel. I have seen bug
> reports about exhausting MAX_LOCKDEP_KEYS, MAX_LOCKDEP_ENTRIES and
> MAX_STACK_TRACE_ENTRIES. This patch will help to reduce the chance
> of overflowing some of the tables.
>
> Performance-wise, there was no statistically significant difference in
> performance when doing a parallel kernel build on a debug kernel.

Could you please measure a locking-intense workload instead, such as:

  $ perf stat --null --sync --repeat 10 perf bench sched messaging

and profile which locks used there could be marked terminal, and measure
the before/after performance impact?

> Below were selected output lines from the lockdep_stats files of the
> patched and unpatched kernels after bootup and running parallel kernel
> builds.
>
> Item                     Unpatched kernel  Patched kernel  % Change
> ----                     ----------------  --------------  --------
> direct dependencies                  9732            8994     -7.6%
> dependency chains                   18776           17033     -9.3%
> dependency chain hlocks             76044           68419    -10.0%
> stack-trace entries                110403          104341     -5.5%

That's pretty impressive!

> There were some reductions in the size of the lockdep tables. They were
> not significant, but it is still a good start to rein in the number of
> entries in those tables to make it harder to overflow them.

Agreed.

BTW., if you are interested in more radical approaches to optimize
lockdep, we could also add a static checker via objtool-driven call
graph analysis, and mark those locks terminal that we can prove are
terminal.

This would require the unified call graph of the kernel image and of all
modules to be examined in a final pass, but that's within the principal
scope of objtool. (This 'final pass' could also be done during bootup,
at least in initial versions.)

Note that beyond marking a lock 'terminal', such a static analysis pass
would also allow the detection of obvious locking bugs at the build (or
boot) stage already - plus it would allow the disabling of lockdep for
self-contained locks that don't interact with anything else.

I.e. the static analysis pass would 'augment' lockdep and leave only
those locks active for runtime lockdep tracking whose dependencies it
cannot prove to be correct yet.

Thanks,

	Ingo
On 11/09/2018 03:04 AM, Ingo Molnar wrote:
> * Waiman Long <longman@redhat.com> wrote:
>
>> The purpose of this patchset is to add a new class of locks called
>> terminal locks [...]
>>
>> Performance-wise, there was no statistically significant difference in
>> performance when doing a parallel kernel build on a debug kernel.
>
> Could you please measure a locking-intense workload instead, such as:
>
>   $ perf stat --null --sync --repeat 10 perf bench sched messaging
>
> and profile which locks used there could be marked terminal, and
> measure the before/after performance impact?

I will run the test. It will probably be done after the LPC next week.

> BTW., if you are interested in more radical approaches to optimize
> lockdep, we could also add a static checker via objtool-driven call
> graph analysis, and mark those locks terminal that we can prove are
> terminal.
>
> [...]
>
> I.e. the static analysis pass would 'augment' lockdep and leave only
> those locks active for runtime lockdep tracking whose dependencies it
> cannot prove to be correct yet.

It is a pretty interesting idea to use objtool to scan for locks. The
list of locks that I marked as terminal in this patch was found by
looking at /proc/lockdep for those that only have backward dependencies,
but no forward dependency. I focused on those with a large number of BDs
and checked the code to see if they could be marked as terminal. This is
a rather labor-intensive process and is subject to error. It would be
nice if it could be done by an automated tool.

So I am going to look into that, but it won't be part of this initial
patchset. I sent this patchset out to see if anyone has any objection to
it. It seems you don't have any objection, so I am going to move ahead
to do more testing and performance analysis.

Thanks,
Longman
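The manual scan Waiman describes - picking lock classes out of /proc/lockdep that have backward dependencies (BD) but no forward dependency (FD) - is easy to automate. Below is a minimal sketch; note that the exact /proc/lockdep line layout varies by kernel version and config, so the regular expression here is an assumption, not a guaranteed format:

```python
import re

# Matches lines of the rough form seen in /proc/lockdep, e.g.:
#   "ffffffff9a2b3c40 OPS: 1234 FD:    0 BD:   57 ..-.: &rq->lock"
# (the exact layout is an assumption and varies across kernel versions).
ENTRY = re.compile(r'FD:\s*(\d+)\s+BD:\s*(\d+)\s+\S+:\s+(.+)')

def terminal_candidates(lines):
    """Return [(backward_deps, class_name)] for lock classes that have
    backward dependencies but no forward dependency, largest BD first -
    the same candidates Waiman picked out by hand."""
    found = []
    for line in lines:
        m = ENTRY.search(line)
        if not m:
            continue
        fd, bd, name = int(m.group(1)), int(m.group(2)), m.group(3).strip()
        if fd == 0 and bd > 0:
            found.append((bd, name))
    return sorted(found, reverse=True)

# Usage on a lockdep-enabled kernel:
#   with open('/proc/lockdep') as f:
#       for bd, name in terminal_candidates(f):
#           print(f'BD {bd:6d}  {name}')
```

The candidates still need the manual code inspection step - a class with FD == 0 in one run may simply not have exercised its forward dependencies yet.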
On Fri, Nov 09, 2018 at 09:04:12AM +0100, Ingo Molnar wrote:
> BTW., if you are interested in more radical approaches to optimize
> lockdep, we could also add a static checker via objtool-driven call
> graph analysis, and mark those locks terminal that we can prove are
> terminal.
>
> This would require the unified call graph of the kernel image and of
> all modules to be examined in a final pass, but that's within the
> principal scope of objtool. (This 'final pass' could also be done
> during bootup, at least in initial versions.)

Something like this is needed for objtool LTO support as well. I just
dread the build time 'regressions' this will introduce :/

The final link pass is already by far the most expensive part (as
measured in wall-time) of building a kernel; adding more work there
would really suck :/
On 11/10/2018 09:10 AM, Peter Zijlstra wrote:
> On Fri, Nov 09, 2018 at 09:04:12AM +0100, Ingo Molnar wrote:
>> BTW., if you are interested in more radical approaches to optimize
>> lockdep, we could also add a static checker via objtool-driven call
>> graph analysis, and mark those locks terminal that we can prove are
>> terminal.
>> [...]
>
> Something like this is needed for objtool LTO support as well. I just
> dread the build time 'regressions' this will introduce :/
>
> The final link pass is already by far the most expensive part (as
> measured in wall-time) of building a kernel, adding more work there
> would really suck :/

I think the idea is to make objtool have the capability to do that. It
doesn't mean we need to turn it on by default in every build.

Cheers,
Longman
* Waiman Long <longman@redhat.com> wrote:

> On 11/10/2018 09:10 AM, Peter Zijlstra wrote:
> > [...]
> > The final link pass is already by far the most expensive part (as
> > measured in wall-time) of building a kernel, adding more work there
> > would really suck :/
>
> I think the idea is to make objtool have the capability to do that. It
> doesn't mean we need to turn it on by default in every build.

Yeah.

Also note that much of the objtool legwork would be on a per-file basis,
which is reasonably parallelized already. On x86 it's also already done
for every ORC build, i.e. every distro build, and the incremental
overhead from also extracting locking dependencies should be reasonably
small.

The final search of the global graph would be serialized but still
reasonably fast, as these are all 'class'-level dependencies which are
much less numerous than runtime dependencies.

I.e. I think we are talking about tens of thousands of dependencies, not
tens of millions.

At least in theory. ;-)

Thanks,

	Ingo
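The class-level "final search" Ingo describes boils down to a small graph pass: given a set of (holder, acquiree) class pairs, a class with no outgoing edge has no forward dependency and is a terminal candidate. A minimal sketch, with made-up class names for illustration:

```python
from collections import defaultdict

def find_terminal_classes(edges):
    """edges: iterable of (holder, acquiree) lock-class pairs, meaning
    an instance of the holder class was held while an instance of the
    acquiree class was taken.  A class that never appears as a holder
    has no forward dependency, i.e. it is a terminal candidate."""
    outgoing = defaultdict(set)
    classes = set()
    for holder, acquiree in edges:
        outgoing[holder].add(acquiree)
        classes.update((holder, acquiree))
    return {c for c in classes if not outgoing[c]}

# Hypothetical class-level edges:
edges = [
    ("inode->i_lock",  "dentry->d_lock"),
    ("dentry->d_lock", "wq->lock"),
    ("rq->lock",       "wq->lock"),
]
print(sorted(find_terminal_classes(edges)))  # only wq->lock is a sink
```

At tens of thousands of class-level edges this pass is linear in the edge count, which is why the serialized final search should stay cheap.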
* Waiman Long <longman@redhat.com> wrote:

> > Could you please measure a locking-intense workload instead, such as:
> >
> >   $ perf stat --null --sync --repeat 10 perf bench sched messaging
> >
> > and profile which locks used there could be marked terminal, and
> > measure the before/after performance impact?
>
> I will run the test. It will probably be done after the LPC next week.

Thanks!

> > BTW., if you are interested in more radical approaches to optimize
> > lockdep, we could also add a static checker via objtool-driven call
> > graph analysis, and mark those locks terminal that we can prove are
> > terminal.
> > [...]
>
> It is a pretty interesting idea to use objtool to scan for locks. The
> list of locks that I marked as terminal in this patch was found by
> looking at /proc/lockdep for those that only have backward
> dependencies, but no forward dependency. I focused on those with a
> large number of BDs and checked the code to see if they could be
> marked as terminal. This is a rather labor-intensive process and is
> subject to error.

Yeah.

> [...] It would be nice if it could be done by an automated tool. So I
> am going to look into that, but it won't be part of this initial
> patchset.

Of course!

> I sent this patchset out to see if anyone has any objection to it. It
> seems you don't have any objection to that. So I am going to move
> ahead to do more testing and performance analysis.

The one worry I have is that this interim solution removes the benefit
of a proper static analysis method. But if you promise to make a serious
effort on the static analysis tooling as well (which should have awesome
performance results and automate the manual markup), then I have no
fundamental objections to the interim approach either.

If static analysis works as well as I expect it to, then in principle we
might even be able to have lockdep enabled in production kernels: it
would only add overhead to locks that are overly complex - which would
create incentives to improve those dependencies.

Thanks,

	Ingo
On Mon, Nov 12, 2018 at 06:10:33AM +0100, Ingo Molnar wrote:
> [...]
> Also note that much of the objtool legwork would be on a per-file
> basis which is reasonably parallelized already. On x86 it's also
> already done for every ORC build i.e. every distro build and the
> incremental overhead from also extracting locking dependencies should
> be reasonably small.
>
> The final search of the global graph would be serialized but still
> reasonably fast as these are all 'class' level dependencies which are
> much less numerous than runtime dependencies.
>
> I.e. I think we are talking about tens of thousands of dependencies,
> not tens of millions.
>
> At least in theory. ;-)

Generating a unified call graph sounds very expensive (and very far
beyond what objtool can do today).

Also, what about function pointers?

BTW, there's another kernel static analysis tool which attempts to
create such a call graph already: smatch.
* Josh Poimboeuf <jpoimboe@redhat.com> wrote:

> [...]
> Generating a unified call graph sounds very expensive (and very far
> beyond what objtool can do today).

Well, objtool already goes through the instruction stream and recognizes
function calls - so it can in effect generate a stream of "function x
called by function y" data, correct?

> Also, what about function pointers?

So maybe it's possible to enumerate all potential values for function
pointers with a reasonably simple compiler plugin and work from there?

One complication would be function pointers encoded as opaque data
types...

> BTW there's another kernel static analysis tool which attempts to
> create such a call graph already: smatch.

It's not included in the kernel tree though, and I'd expect tight
coupling (or at least lock-step improvements) between tooling and
lockdep here.

Thanks,

	Ingo
On Mon, Nov 12, 2018 at 07:30:50AM +0100, Ingo Molnar wrote:
> * Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>
> > [...]
> > Generating a unified call graph sounds very expensive (and very far
> > beyond what objtool can do today).
>
> Well, objtool already goes through the instruction stream and
> recognizes function calls - so it can in effect generate a stream of
> "function x called by function y" data, correct?

Yeah, though it would be quite simple to get the same data with a simple
awk script at link time.

> > Also, what about function pointers?
>
> So maybe it's possible to enumerate all potential values for function
> pointers with a reasonably simple compiler plugin and work from there?

I think this would be somewhere between very difficult and impossible to
do properly. I can't even imagine how this would be implemented in a
compiler plugin. But I'd love to be proven wrong on that.

> One complication would be function pointers encoded as opaque data
> types...
>
> > BTW there's another kernel static analysis tool which attempts to
> > create such a call graph already: smatch.
>
> It's not included in the kernel tree though and I'd expect tight
> coupling (or at least lock-step improvements) between tooling and
> lockdep here.

Fair enough. Smatch's call tree isn't perfect anyway, but I don't think
perfect is attainable.
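The link-time edge extraction Josh mentions is indeed simple for direct calls. A stand-in sketch (Python here rather than awk, same idea) that scans `objdump -d`-style disassembly text for "x calls y" pairs; the line formats it matches are assumptions about objdump's output, and indirect calls are silently skipped, which is exactly the gap the function-pointer discussion is about:

```python
import re

# Function header lines, e.g. "ffffffff81000000 <start_kernel>:"
FUNC = re.compile(r'^[0-9a-f]+ <([^>]+)>:$')
# Direct call lines, e.g. "... callq  ffffffff81001000 <setup_arch>";
# "callq *%rax" style indirect calls deliberately do not match.
CALL = re.compile(r'\bcall[ql]?\s+[0-9a-f]+\s+<([^>+]+)')

def call_edges(disasm_lines):
    """Return the set of (caller, callee) pairs for direct calls found
    in objdump-style disassembly text."""
    edges = set()
    caller = None
    for line in disasm_lines:
        m = FUNC.match(line.strip())
        if m:
            caller = m.group(1)
            continue
        m = CALL.search(line)
        if m and caller:
            edges.add((caller, m.group(1)))
    return edges

# Usage:
#   objdump -d vmlinux | python3 this_script.py
```

Anything built this way inherits objdump's blind spots: no indirect calls, no calls patched in at runtime (alternatives, static calls), which is why the result can only ever be a conservative starting point.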
On 11/12/2018 02:22 PM, Josh Poimboeuf wrote:
> On Mon, Nov 12, 2018 at 07:30:50AM +0100, Ingo Molnar wrote:
>> [...]
>>> Also, what about function pointers?
>>
>> So maybe it's possible to enumerate all potential values for function
>> pointers with a reasonably simple compiler plugin and work from
>> there?
>
> I think this would be somewhere between very difficult and impossible
> to do properly. I can't even imagine how this would be implemented in
> a compiler plugin. But I'd love to be proven wrong on that.

I would say we have to assume the worst when a function pointer is
called while holding a lock, unless we are able to find out all its
possible targets.

Cheers,
Longman
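Waiman's "assume the worst" rule drops straight out of the checker's decision logic. A sketch of how a static pass might apply it; the per-function summary format and all names here are made up for illustration:

```python
def provably_terminal(holders, summaries):
    """True only if every function that holds the lock has a fully
    known callee set (no indirect calls) and no callee acquires
    another lock.  Any unknown function or unresolved function pointer
    forces the worst-case answer: not provable."""
    for func in holders:
        s = summaries.get(func)
        if s is None or s['indirect']:
            return False                 # unknown targets: assume the worst
        for callee in s['calls']:
            # Unknown callees default to "acquires a lock".
            if summaries.get(callee, {}).get('acquires', True):
                return False
    return True

summaries = {
    'leaf_handler': {'calls': set(),            'indirect': False, 'acquires': False},
    'plain_caller': {'calls': {'leaf_handler'}, 'indirect': False, 'acquires': False},
    'fp_dispatch':  {'calls': set(),            'indirect': True,  'acquires': False},
}
print(provably_terminal({'plain_caller'}, summaries))  # True
print(provably_terminal({'fp_dispatch'}, summaries))   # False
```

The conservative defaults mean false negatives (provable-but-unproven terminal locks) are the cost of never producing a false positive, which matches the thread's framing: locks the tool cannot prove simply stay under full runtime lockdep tracking.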