From patchwork Wed Apr 13 16:28:54 2022
X-Patchwork-Submitter: Omar Sandoval
X-Patchwork-Id: 12812215
From: Omar Sandoval
To: linux-mm@kvack.org, kexec@lists.infradead.org, Andrew Morton
Cc: Uladzislau Rezki, Christoph Hellwig, Baoquan He,
 x86@kernel.org, kernel-team@fb.com
Subject: [PATCH v3] mm/vmalloc: fix spinning drain_vmap_work after reading
 from /proc/vmcore
Date: Wed, 13 Apr 2022 09:28:54 -0700
Message-Id: <53f25ddc953211d50bed06427d695f51f5ea37c7.1649867251.git.osandov@fb.com>

From: Omar Sandoval

Commit 3ee48b6af49c ("mm, x86: Saving vmcore with non-lazy freeing of
vmas") introduced set_iounmap_nonlazy(), which sets vmap_lazy_nr to
lazy_max_pages() + 1, ensuring that any future vunmaps() immediately
purge the vmap areas instead of doing it lazily.

Commit 690467c81b1a ("mm/vmalloc: Move draining areas out of caller
context") moved the purging from the vunmap() caller to a worker thread.
Unfortunately, set_iounmap_nonlazy() can cause the worker thread to spin
(possibly forever). For example, consider the following scenario:

1. Thread reads from /proc/vmcore. This eventually calls
   __copy_oldmem_page() -> set_iounmap_nonlazy(), which sets
   vmap_lazy_nr to lazy_max_pages() + 1.

2. Then it calls free_vmap_area_noflush() (via iounmap()), which adds 2
   pages (one page plus the guard page) to the purge list and
   vmap_lazy_nr. vmap_lazy_nr is now lazy_max_pages() + 3, so the
   drain_vmap_work is scheduled.

3. Thread returns from the kernel and is scheduled out.

4. Worker thread is scheduled in and calls drain_vmap_area_work(). It
   frees the 2 pages on the purge list. vmap_lazy_nr is now
   lazy_max_pages() + 1.

5. This is still over the threshold, so it tries to purge areas again,
   but doesn't find anything.

6. Repeat 5.

If the system is running with only one CPU (which is typical for kdump)
and preemption is disabled, then this will never make forward progress:
there aren't any more pages to purge, so it hangs. If there is more than
one CPU or preemption is enabled, then the worker thread will spin
forever in the background. (Note that if there were already pages to be
purged at the time that set_iounmap_nonlazy() was called, this bug is
avoided.)

This can be reproduced with anything that reads from /proc/vmcore
multiple times. E.g., vmcore-dmesg /proc/vmcore.

It turns out that improvements to vmap() over the years have obsoleted
the need for this "optimization". I benchmarked
`dd if=/proc/vmcore of=/dev/null` with 4k and 1M read sizes on a system
with a 32GB vmcore. The test was run on 5.17, 5.18-rc1 with a fix that
avoided the hang, and 5.18-rc1 with set_iounmap_nonlazy() removed
entirely:

    |5.17  |5.18+fix|5.18+removal
  4k|40.86s|  40.09s|      26.73s
  1M|24.47s|  23.98s|      21.84s

The removal was the fastest (by a wide margin with 4k reads). This
patch removes set_iounmap_nonlazy().
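
To see why step 6 loops, here is a minimal sketch of the drain worker
added by 690467c81b1a (paraphrased from the 5.18-rc1 mm/vmalloc.c shape
and simplified, not the verbatim source). Once set_iounmap_nonlazy() has
pinned vmap_lazy_nr above lazy_max_pages() with nothing left on the
purge list, the recheck condition stays true on every iteration:

static void drain_vmap_area_work(struct work_struct *work)
{
	unsigned long nr_lazy;

	do {
		/* Purge everything currently on the lazy-free list. */
		mutex_lock(&vmap_purge_lock);
		__purge_vmap_area_lazy(ULONG_MAX, 0);
		mutex_unlock(&vmap_purge_lock);

		/*
		 * Recheck whether further work is required. After
		 * set_iounmap_nonlazy(), vmap_lazy_nr never drops back
		 * below lazy_max_pages() even once the purge list is
		 * empty, so this loop never terminates.
		 */
		nr_lazy = atomic_long_read(&vmap_lazy_nr);
	} while (nr_lazy > lazy_max_pages());
}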
Fixes: 690467c81b1a ("mm/vmalloc: Move draining areas out of caller context")
Reviewed-by: Christoph Hellwig
Reviewed-by: Uladzislau Rezki (Sony)
Acked-by: Chris Down
Signed-off-by: Omar Sandoval
Acked-by: Baoquan He
---
Changes from v2 -> v3:

- Add Fixes and Reviewed-by tags (no code changes)

Changes from v1 -> v2:

- Remove set_iounmap_nonlazy() entirely instead of fixing it.

 arch/x86/include/asm/io.h       |  2 --
 arch/x86/kernel/crash_dump_64.c |  1 -
 mm/vmalloc.c                    | 11 -----------
 3 files changed, 14 deletions(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index f6d91ecb8026..e9736af126b2 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -210,8 +210,6 @@ void __iomem *ioremap(resource_size_t offset, unsigned long size);
 extern void iounmap(volatile void __iomem *addr);
 #define iounmap iounmap
 
-extern void set_iounmap_nonlazy(void);
-
 #ifdef __KERNEL__
 
 void memcpy_fromio(void *, const volatile void __iomem *, size_t);
diff --git a/arch/x86/kernel/crash_dump_64.c b/arch/x86/kernel/crash_dump_64.c
index a7f617a3981d..97529552dd24 100644
--- a/arch/x86/kernel/crash_dump_64.c
+++ b/arch/x86/kernel/crash_dump_64.c
@@ -37,7 +37,6 @@ static ssize_t __copy_oldmem_page(unsigned long pfn, char *buf, size_t csize,
 	} else
 		memcpy(buf, vaddr + offset, csize);
 
-	set_iounmap_nonlazy();
 	iounmap((void __iomem *)vaddr);
 	return csize;
 }
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index e163372d3967..0b17498a34f1 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1671,17 +1671,6 @@ static DEFINE_MUTEX(vmap_purge_lock);
 /* for per-CPU blocks */
 static void purge_fragmented_blocks_allcpus(void);
 
-#ifdef CONFIG_X86_64
-/*
- * called before a call to iounmap() if the caller wants vm_area_struct's
- * immediately freed.
- */
-void set_iounmap_nonlazy(void)
-{
-	atomic_long_set(&vmap_lazy_nr, lazy_max_pages()+1);
-}
-#endif /* CONFIG_X86_64 */
-
 /*
  * Purges all lazily-freed vmap areas.
  */