
[v1] kernel/module: allocate module vmap space after making sure the module is unique

Message ID 20221013180518.217405-1-david@redhat.com (mailing list archive)
State New, archived

Commit Message

David Hildenbrand Oct. 13, 2022, 6:05 p.m. UTC
We already make sure to allocate percpu data only after we verified that
the module we're loading hasn't already been loaded and isn't
concurrently getting loaded -- that it's unique.

On big systems (> 400 CPUs and many devices) with KASAN enabled, we're now
facing a similar issue with the module vmap space.

With KASAN_INLINE enabled (resulting in large module sizes), plenty of
devices that udev wants to probe, and plenty (> 400) of CPUs that can
carry out that probing concurrently, we can actually run out of module
vmap space and trigger vmap allocation errors:

[  165.818200] vmap allocation for size 2498560 failed: use vmalloc=<size> to increase size
[  165.836622] vmap allocation for size 315392 failed: use vmalloc=<size> to increase size
[  165.837461] vmap allocation for size 315392 failed: use vmalloc=<size> to increase size
[  165.840573] vmap allocation for size 2498560 failed: use vmalloc=<size> to increase size
[  165.841059] vmap allocation for size 2498560 failed: use vmalloc=<size> to increase size
[  165.841428] vmap allocation for size 2498560 failed: use vmalloc=<size> to increase size
[  165.841819] vmap allocation for size 2498560 failed: use vmalloc=<size> to increase size
[  165.842123] vmap allocation for size 2498560 failed: use vmalloc=<size> to increase size
[  165.843359] vmap allocation for size 2498560 failed: use vmalloc=<size> to increase size
[  165.844894] vmap allocation for size 2498560 failed: use vmalloc=<size> to increase size
[  165.847028] CPU: 253 PID: 4995 Comm: systemd-udevd Not tainted 5.19.0 #2
[  165.935689] Hardware name: Lenovo ThinkSystem SR950 -[7X12ABC1WW]-/-[7X12ABC1WW]-, BIOS -[PSE130O-1.81]- 05/20/2020
[  165.947343] Call Trace:
[  165.950075]  <TASK>
[  165.952425]  dump_stack_lvl+0x57/0x81
[  165.956532]  warn_alloc.cold+0x95/0x18a
[  165.960836]  ? zone_watermark_ok_safe+0x240/0x240
[  165.966100]  ? slab_free_freelist_hook+0x11d/0x1d0
[  165.971461]  ? __get_vm_area_node+0x2af/0x360
[  165.976341]  ? __get_vm_area_node+0x2af/0x360
[  165.981219]  __vmalloc_node_range+0x291/0x560
[  165.986087]  ? __mutex_unlock_slowpath+0x161/0x5e0
[  165.991447]  ? move_module+0x4c/0x630
[  165.995547]  ? vfree_atomic+0xa0/0xa0
[  165.999647]  ? move_module+0x4c/0x630
[  166.003741]  module_alloc+0xe7/0x170
[  166.007747]  ? move_module+0x4c/0x630
[  166.011840]  move_module+0x4c/0x630
[  166.015751]  layout_and_allocate+0x32c/0x560
[  166.020519]  load_module+0x8e0/0x25c0
[  166.024623]  ? layout_and_allocate+0x560/0x560
[  166.029586]  ? kernel_read_file+0x286/0x6b0
[  166.034269]  ? __x64_sys_fspick+0x290/0x290
[  166.038946]  ? userfaultfd_unmap_prep+0x430/0x430
[  166.044203]  ? lock_downgrade+0x130/0x130
[  166.048698]  ? __do_sys_finit_module+0x11a/0x1c0
[  166.053854]  __do_sys_finit_module+0x11a/0x1c0
[  166.058818]  ? __ia32_sys_init_module+0xa0/0xa0
[  166.063882]  ? __seccomp_filter+0x92/0x930
[  166.068494]  do_syscall_64+0x59/0x90
[  166.072492]  ? do_syscall_64+0x69/0x90
[  166.076679]  ? do_syscall_64+0x69/0x90
[  166.080864]  ? do_syscall_64+0x69/0x90
[  166.085047]  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
[  166.090984]  ? lockdep_hardirqs_on+0x79/0x100
[  166.095855]  entry_SYSCALL_64_after_hwframe+0x63/0xcd

Interestingly, when reducing the number of CPUs (nosmt), it works as
expected.

The underlying issue is that we first allocate memory (including module
vmap space) in layout_and_allocate(), and then verify whether the module
is unique in add_unformed_module(). So we end up allocating module vmap
space even though we might not need it -- which is a problem when modules
are big and we can have a lot of concurrent probing of the same set of
modules as on the big system at hand.
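
For illustration, the problematic ordering in load_module() boils down
to the following (heavily simplified sketch, not the literal kernel
code):

    static int load_module(struct load_info *info, ...)
    {
            struct module *mod;
            int err;

            /* Allocates the final module, including module vmap space. */
            mod = layout_and_allocate(info, flags);
            if (IS_ERR(mod))
                    return PTR_ERR(mod);

            /* Only now do we check whether the module is unique ... */
            err = add_unformed_module(mod);
            if (err)
                    goto free_module; /* ... and waste the vmap space. */
            ...
    }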

Unfortunately, we cannot simply add the module earlier, because
move_module() -- which allocates the module vmap space -- essentially
brings the module to life from a temporary one. Adding the temporary
module and replacing it later is also sub-optimal (replacing it would
require synchronizing against RCU) and feels kind of dangerous, given
that we end up copying it.

So instead, add a second list (pending_load_infos) that tracks the modules
(via their load_info) that are unique and are still getting loaded
("pending"), but haven't made it to the actual module list yet. This
shouldn't have a notable runtime overhead when concurrently loading
modules: the new list is expected to usually either be empty or contain
very few entries for a short time.
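
Conceptually, a module name is then considered taken if it shows up on
either list. Expressed as a hypothetical helper (for illustration only;
the patch below open-codes this check under module_mutex):

    static bool module_name_in_use(const char *name)
    {
            lockdep_assert_held(&module_mutex);

            /* A module that already made it to the module list ... */
            if (find_module_all(name, strlen(name), true))
                    return true;
            /* ... or a load that is still pending. */
            return __is_pending_load_info_name(name);
    }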

Thanks to Uladzislau for his help verifying that it's not actually a
vmap code issue.

Reported-by: Lin Liu <linl@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 kernel/module/internal.h |   2 +
 kernel/module/main.c     | 122 +++++++++++++++++++++++++++------------
 2 files changed, 88 insertions(+), 36 deletions(-)

Comments

kernel test robot Oct. 14, 2022, 4:58 a.m. UTC | #1
Hi David,

I love your patch! Perhaps something to improve:

[auto build test WARNING on mcgrof/modules-next]
[also build test WARNING on linus/master v6.0 next-20221014]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/David-Hildenbrand/kernel-module-allocate-module-vmap-space-after-making-sure-the-module-is-unique/20221014-020756
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git modules-next
config: x86_64-randconfig-s022
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.4-39-gce1a6720-dirty
        # https://github.com/intel-lab-lkp/linux/commit/8b7bfecba8ad77c19c0c857314df6f8e675f6f61
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review David-Hildenbrand/kernel-module-allocate-module-vmap-space-after-making-sure-the-module-is-unique/20221014-020756
        git checkout 8b7bfecba8ad77c19c0c857314df6f8e675f6f61
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=x86_64 SHELL=/bin/bash kernel/module/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>

sparse warnings: (new ones prefixed by >>)
>> kernel/module/main.c:80:1: sparse: sparse: symbol 'pending_load_infos' was not declared. Should it be static?
Miroslav Benes Oct. 14, 2022, 6:09 a.m. UTC | #2
Hi,

On Thu, 13 Oct 2022, David Hildenbrand wrote:

> We already make sure to allocate percpu data only after we verified that
> the module we're loading hasn't already been loaded and isn't
> concurrently getting loaded -- that it's unique.
> 
> [...]

this seems to be related to what 
https://lore.kernel.org/all/20220919123233.8538-1-petr.pavlu@suse.com/ 
tries to solve. Just your symptoms are different. Does the patch set fix 
your issue too?

Regards
Miroslav
David Hildenbrand Oct. 14, 2022, 7:15 a.m. UTC | #3
On 14.10.22 08:09, Miroslav Benes wrote:
> Hi,
> 
> On Thu, 13 Oct 2022, David Hildenbrand wrote:
> 
>> We already make sure to allocate percpu data only after we verified that
>> the module we're loading hasn't already been loaded and isn't
>> concurrently getting loaded -- that it's unique.
>>
>> [...]
> 
> this seems to be related to what
> https://lore.kernel.org/all/20220919123233.8538-1-petr.pavlu@suse.com/
> tries to solve. Just your symptoms are different. Does the patch set fix
> your issue too?

Hi Miroslav,

the underlying approach with a load_info list is similar (which is nice 
to see), so I assume it will similarly fix the issue.

I'm not sure if merging the requests (adding the refcount logic and the 
-EBUSY change) is really required/wanted, though. It looks like some of 
these changes might have been factored out into separate patches.

Not my call to make. I'll give the set a churn on the machine where I 
can reproduce the issue.

Patch

diff --git a/kernel/module/internal.h b/kernel/module/internal.h
index 680d980a4fb2..9d5cc9b1d56a 100644
--- a/kernel/module/internal.h
+++ b/kernel/module/internal.h
@@ -76,6 +76,8 @@  struct load_info {
 	struct {
 		unsigned int sym, str, mod, vers, info, pcpu;
 	} index;
+
+	struct list_head next;
 };
 
 enum mod_license {
diff --git a/kernel/module/main.c b/kernel/module/main.c
index a4e4d84b6f4e..b473228136eb 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -65,10 +65,20 @@ 
  * 2) module_use links,
  * 3) mod_tree.addr_min/mod_tree.addr_max.
  * (delete and add uses RCU list operations).
+ *
+ * 4) List of pending load infos
  */
 DEFINE_MUTEX(module_mutex);
 LIST_HEAD(modules);
 
+/*
+ * Modules (via load_info) that are currently being loaded but cannot be added
+ * to the module list yet are kept in a separate list. This list, combined with
+ * the module list makes sure that modules are unique: a module name has to be
+ * unique across both lists, protected by the module_mutex.
+ */
+LIST_HEAD(pending_load_infos);
+
 /* Work queue for freeing init sections in success case */
 static void do_free_init(struct work_struct *w);
 static DECLARE_WORK(init_free_wq, do_free_init);
@@ -762,7 +772,7 @@  SYSCALL_DEFINE2(delete_module, const char __user *, name_user,
 	strscpy(last_unloaded_module.taints, module_flags(mod, buf, false), sizeof(last_unloaded_module.taints));
 
 	free_module(mod);
-	/* someone could wait for the module in add_unformed_module() */
+	/* someone could wait for the module in add_pending_load_info() */
 	wake_up_all(&module_wq);
 	return 0;
 out:
@@ -2374,6 +2384,16 @@  static int post_relocation(struct module *mod, const struct load_info *info)
 	return module_finalize(info->hdr, info->sechdrs, mod);
 }
 
+static bool __is_pending_load_info_name(const char *name)
+{
+	struct load_info *info;
+
+	list_for_each_entry(info, &pending_load_infos, next)
+		if (!strcmp(info->name, name))
+			return true;
+	return false;
+}
+
 /* Is this module of this name done loading?  No locks held. */
 static bool finished_loading(const char *name)
 {
@@ -2388,7 +2408,11 @@  static bool finished_loading(const char *name)
 	sched_annotate_sleep();
 	mutex_lock(&module_mutex);
 	mod = find_module_all(name, strlen(name), true);
-	ret = !mod || mod->state == MODULE_STATE_LIVE;
+	if (!mod)
+		/* It might still be in the early process of loading. */
+		ret = !__is_pending_load_info_name(name);
+	else
+		ret = mod->state == MODULE_STATE_LIVE;
 	mutex_unlock(&module_mutex);
 
 	return ret;
@@ -2552,43 +2576,58 @@  static int may_init_module(void)
 	return 0;
 }
 
-/*
- * We try to place it in the list now to make sure it's unique before
- * we dedicate too many resources.  In particular, temporary percpu
- * memory exhaustion.
- */
-static int add_unformed_module(struct module *mod)
+static int add_pending_load_info(struct load_info *info)
 {
+	struct module *mod;
 	int err;
-	struct module *old;
-
-	mod->state = MODULE_STATE_UNFORMED;
 
-again:
-	mutex_lock(&module_mutex);
-	old = find_module_all(mod->name, strlen(mod->name), true);
-	if (old != NULL) {
-		if (old->state != MODULE_STATE_LIVE) {
-			/* Wait in case it fails to load. */
+	while (true) {
+		mutex_lock(&module_mutex);
+		mod = find_module_all(info->name, strlen(info->name), true);
+		if (!mod && !__is_pending_load_info_name(info->name))
+			break;
+		if (mod && mod->state == MODULE_STATE_LIVE) {
 			mutex_unlock(&module_mutex);
-			err = wait_event_interruptible(module_wq,
-					       finished_loading(mod->name));
-			if (err)
-				goto out_unlocked;
-			goto again;
+			return -EEXIST;
 		}
-		err = -EEXIST;
-		goto out;
+
+		/*
+		 * The module is in some phase of getting loaded/unloaded;
+		 * wait and retry.
+		 */
+		mutex_unlock(&module_mutex);
+		err = wait_event_interruptible(module_wq,
+					       finished_loading(info->name));
+		if (err)
+			return err;
 	}
+
+	INIT_LIST_HEAD(&info->next);
+	list_add(&info->next, &pending_load_infos);
+	mutex_unlock(&module_mutex);
+	return 0;
+}
+
+static void remove_pending_load_info(struct load_info *info)
+{
+	mutex_lock(&module_mutex);
+	list_del(&info->next);
+	/* someone could wait for the module name in finished_loading(). */
+	wake_up_all(&module_wq);
+	mutex_unlock(&module_mutex);
+}
+
+static void add_unformed_module(struct load_info *info, struct module *mod)
+{
+	mod->state = MODULE_STATE_UNFORMED;
+
+	mutex_lock(&module_mutex);
 	mod_update_bounds(mod);
 	list_add_rcu(&mod->list, &modules);
+	/* The module is on the module list now. */
+	list_del(&info->next);
 	mod_tree_insert(mod);
-	err = 0;
-
-out:
 	mutex_unlock(&module_mutex);
-out_unlocked:
-	return err;
 }
 
 static int complete_formation(struct module *mod, struct load_info *info)
@@ -2720,12 +2759,24 @@  static int load_module(struct load_info *info, const char __user *uargs,
 		goto free_copy;
 	}
 
-	err = rewrite_section_headers(info, flags);
+	/*
+	 * We make sure the module name is unique before we dedicate too many
+	 * resources. In particular, avoid temporary percpu memory and module
+	 * vmap space exhaustion.
+	 */
+	err = add_pending_load_info(info);
 	if (err)
 		goto free_copy;
 
+	err = rewrite_section_headers(info, flags);
+	if (err) {
+		remove_pending_load_info(info);
+		goto free_copy;
+	}
+
 	/* Check module struct version now, before we try to use module. */
 	if (!check_modstruct_version(info, info->mod)) {
+		remove_pending_load_info(info);
 		err = -ENOEXEC;
 		goto free_copy;
 	}
@@ -2739,10 +2790,11 @@  static int load_module(struct load_info *info, const char __user *uargs,
 
 	audit_log_kern_module(mod->name);
 
-	/* Reserve our place in the list. */
-	err = add_unformed_module(mod);
-	if (err)
-		goto free_module;
+	/*
+	 * Add the module to the module list as unformed. This will remove the
+	 * load_info from the pending load_info list.
+	 */
+	add_unformed_module(info, mod);
 
 #ifdef CONFIG_MODULE_SIG
 	mod->sig_ok = info->sig_ok;
@@ -2754,7 +2806,6 @@  static int load_module(struct load_info *info, const char __user *uargs,
 	}
 #endif
 
-	/* To avoid stressing percpu allocator, do this once we're unique. */
 	err = percpu_modalloc(mod, info);
 	if (err)
 		goto unlink_mod;
@@ -2890,7 +2941,6 @@  static int load_module(struct load_info *info, const char __user *uargs,
 	/* Wait for RCU-sched synchronizing before releasing mod->list. */
 	synchronize_rcu();
 	mutex_unlock(&module_mutex);
- free_module:
 	/* Free lock-classes; relies on the preceding sync_rcu() */
 	lockdep_free_key_range(mod->data_layout.base, mod->data_layout.size);