[0/2] revert unconditional slab and page allocator fault injection calls

Message ID

20240711-b4-fault-injection-reverts-v1-0-9e2651945d68@suse.cz (mailing list archive)

Headers

From: Vlastimil Babka <vbabka@suse.cz>
Subject: [PATCH 0/2] revert unconditional slab and page allocator fault
 injection calls
Date: Thu, 11 Jul 2024 18:35:29 +0200
Message-Id: <20240711-b4-fault-injection-reverts-v1-0-9e2651945d68@suse.cz>
Precedence: bulk
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Mateusz Guzik <mjguzik@gmail.com>,
 Akinobu Mita <akinobu.mita@gmail.com>, Alexei Starovoitov <ast@kernel.org>,
 Daniel Borkmann <daniel@iogearbox.net>,
 John Fastabend <john.fastabend@gmail.com>,
 Andrii Nakryiko <andrii@kernel.org>,
 Martin KaFai Lau <martin.lau@linux.dev>,
 Eduard Zingerman <eddyz87@gmail.com>, Song Liu <song@kernel.org>,
 Yonghong Song <yonghong.song@linux.dev>, KP Singh <kpsingh@kernel.org>,
 Stanislav Fomichev <sdf@fomichev.me>, Hao Luo <haoluo@google.com>,
 Jiri Olsa <jolsa@kernel.org>, Christoph Lameter <cl@linux.com>,
 David Rientjes <rientjes@google.com>,
 Roman Gushchin <roman.gushchin@linux.dev>,
 Hyeonggon Yoo <42.hyeyoo@gmail.com>, linux-kernel@vger.kernel.org,
 bpf@vger.kernel.org, linux-mm@kvack.org, Vlastimil Babka <vbabka@suse.cz>

Series

revert unconditional slab and page allocator fault injection calls

Message

Vlastimil Babka July 11, 2024, 4:35 p.m. UTC

These two patches largely revert commits that added function call
overhead into slab and page allocation hotpaths and that cannot be
currently disabled even though related CONFIG_ options do exist.

A much more involved solution that keeps the callsites always present
but hidden behind a static key when unused is possible [1] and can be
pursued by anyone who believes it's necessary. Meanwhile, the fact that
should_failslab() error injection has already been non-functional on
kernels built with current gcc without anyone noticing [2], together
with the lukewarm response to [1], suggests the need is not there. I
believe it is fairer to have the state after this series as a baseline
for possible further optimisation, instead of the unconditional overhead.

For example, a possible compromise for anyone who's fine with the
overhead of an empty function call but not the full CONFIG_FAILSLAB /
CONFIG_FAIL_PAGE_ALLOC overhead is to reuse patch 1 from [1] but insert
a static key check only inside should_failslab() and
should_fail_alloc_page() before performing the more expensive checks.
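
The shape of that compromise can be sketched in userspace as below. This
is not the code from [1] and not kernel code: every name ending in
_model is hypothetical, the plain bool only stands in for a static key
(in the kernel one would presumably use DEFINE_STATIC_KEY_FALSE() and
static_branch_unlikely()), and the "expensive check" is a made-up
placeholder that fails every third call.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for a static key. A real static key is a patched jump label;
 * this is only an ordinary flag (assumption made for illustration). */
static bool failslab_active_model = false;

/* Counts how often the expensive path ran, so the effect is observable. */
static unsigned long expensive_checks_run_model = 0;

/* Hypothetical placeholder for the "more expensive checks". */
static bool expensive_should_fail_model(size_t size)
{
    (void)size;
    expensive_checks_run_model++;
    return (expensive_checks_run_model % 3) == 0;
}

/* The call itself stays out of line (that overhead remains), but the
 * function returns immediately while injection is switched off. */
__attribute__((noinline))
int should_failslab_model(size_t size)
{
    if (__builtin_expect(!failslab_active_model, 1))
        return 0;

    return expensive_should_fail_model(size) ? -ENOMEM : 0;
}

/* Helpers to flip the flag and to read the counter. */
void failslab_model_set_active(bool on)
{
    failslab_active_model = on;
}

unsigned long failslab_model_expensive_calls(void)
{
    return expensive_checks_run_model;
}
```

With the flag off, should_failslab_model() never reaches the expensive
helper; only the call and the early return are paid for, which is the
overhead the compromise accepts.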

[1] https://lore.kernel.org/all/20240620-fault-injection-statickeys-v2-0-e23947d3d84b@suse.cz/#t
[2] https://github.com/bpftrace/bpftrace/issues/3258

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
Vlastimil Babka (2):
      mm, slab: put should_failslab() back behind CONFIG_SHOULD_FAILSLAB
      mm, page_alloc: put should_fail_alloc_page() back behind CONFIG_FAIL_PAGE_ALLOC

 include/linux/fault-inject.h | 11 ++++-------
 kernel/bpf/verifier.c        |  4 ++++
 mm/fail_page_alloc.c         |  4 +++-
 mm/failslab.c                | 14 ++++++++------
 mm/page_alloc.c              |  6 ------
 mm/slub.c                    |  8 --------
 6 files changed, 19 insertions(+), 28 deletions(-)
---
base-commit: 256abd8e550ce977b728be79a74e1729438b4948
change-id: 20240711-b4-fault-injection-reverts-e4d099e620f5

Best regards,

Comments

Andrew Morton July 11, 2024, 7:36 p.m. UTC | #1

On Thu, 11 Jul 2024 18:35:29 +0200 Vlastimil Babka <vbabka@suse.cz> wrote:

> These two patches largely revert commits that added function call
> overhead into slab and page allocation hotpaths and that cannot be
> currently disabled even though related CONFIG_ options do exist.

Five years ago.  I assume the overall overhead is small?

Vlastimil Babka July 12, 2024, 7:19 a.m. UTC | #2

On 7/11/24 9:36 PM, Andrew Morton wrote:
> On Thu, 11 Jul 2024 18:35:29 +0200 Vlastimil Babka <vbabka@suse.cz> wrote:
> 
>> These two patches largely revert commits that added function call
>> overhead into slab and page allocation hotpaths and that cannot be
>> currently disabled even though related CONFIG_ options do exist.
> 
> Five years ago.  I assume the overall overhead is small?

Well, what made me look into this in the first place was seeing
should_failslab() in perf profiles at 1-2% even though it was an empty
function that just immediately returned.
In [1] I posted a measurement that was not even a microbenchmark:

    To demonstrate the reduced overhead of calling an empty
    should_failslab() function, a kernel build with
    CONFIG_FUNCTION_ERROR_INJECTION enabled but CONFIG_FAILSLAB disabled,
    and CPU mitigations enabled, was used in a qemu-kvm (virtme-ng) on AMD
    Ryzen 7 2700 machine, and execution of a program trying to open() a
    non-existent file was measured 3 times:

        for (int i = 0; i < 10000000; i++) {
            open("non_existent", O_RDONLY);
        }

    After this patch, the measured real time was 4.3% smaller. Using perf
    profiling it was verified that should_failslab was gone from the
    profile.
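
For anyone wanting to repeat that measurement, the loop above could be
wrapped roughly as follows. This is only a sketch: the function name,
the path parameter and the clock_gettime() timing are assumptions added
here, since the quoted text shows only the loop and reports the real
time of the whole program.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Repeatedly open() a path that is expected not to exist.
 * Returns the number of failed opens; if elapsed_sec is non-NULL, the
 * wall-clock duration of the loop in seconds is stored there. */
long time_failed_opens_model(const char *path, long iterations,
                             double *elapsed_sec)
{
    struct timespec start, end;
    long failures = 0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; i++) {
        int fd = open(path, O_RDONLY);

        if (fd < 0)
            failures++;
        else
            close(fd); /* only reached if the file unexpectedly exists */
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    if (elapsed_sec)
        *elapsed_sec = (double)(end.tv_sec - start.tv_sec) +
                       (double)(end.tv_nsec - start.tv_nsec) / 1e9;

    return failures;
}
```

Calling time_failed_opens_model("non_existent", 10000000, &secs) from
main() corresponds to the loop quoted above. The result is sensitive to
whether CPU mitigations are enabled, so both kernels being compared
should be booted with the same settings.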

Later I found that the CPU mitigations were really important here, as
function calls are more expensive with them. With them disabled, the
difference in that benchmark was within the noise, so I wasn't sure about
claiming that number in the patch itself. But I assume a microbenchmark
would still demonstrate some overhead. Ultimately, I think the overhead is
just plain unnecessary to pay when error injection is not being performed,
and since CPU mitigations are usually enabled by default, it's best to get
rid of it.

[1] https://lore.kernel.org/all/20240620-fault-injection-statickeys-v2-0-e23947d3d84b@suse.cz/#t