From patchwork Tue Jun 18 15:41:55 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mathieu Desnoyers
X-Patchwork-Id: 13702530
From: Mathieu Desnoyers
To: Dan Williams, Steven Rostedt
Cc: linux-kernel@vger.kernel.org, Mathieu Desnoyers, Vishal Verma,
 Dave Jiang, Ira Weiny, nvdimm@lists.linux.dev, Thomas Gleixner,
 Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org,
 "H. Peter Anvin", Catalin Marinas, Will Deacon,
 linux-arm-kernel@lists.infradead.org
Subject: [RFC PATCH 2/4] nvdimm/pmem: Flush to memory before machine restart
Date: Tue, 18 Jun 2024 11:41:55 -0400
Message-Id: <20240618154157.334602-3-mathieu.desnoyers@efficios.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20240618154157.334602-1-mathieu.desnoyers@efficios.com>
References: <20240618154157.334602-1-mathieu.desnoyers@efficios.com>

Register pre-restart notifiers to flush pmem areas from CPU data cache to
memory on reboot, immediately before restarting the machine.
This ensures all other CPUs are quiescent before the pmem data is flushed
to memory.

I did an earlier POC that flushed caches on panic/die oops notifiers [1],
but it did not cover the reboot case. I've been made aware that some
distribution vendors have started shipping their own modified version of
my earlier POC patch. This makes a strong argument for upstreaming this
work.

Use the newly introduced "pre-restart" notifiers to flush pmem data to
memory immediately before machine restart.

Delta from my POC patch [1]:

Looking at the panic() code, it invokes emergency_restart() to restart
the machine, which uses the new pre-restart notifiers. There is
therefore no need to hook into panic handlers explicitly.

Looking at the die notifiers, those don't actually end up triggering
a machine restart, so it does not appear to be relevant to flush pmem
to memory there. I must admit I originally looked at how ftrace hooked
into panic/die-oops handlers for its ring buffers, but the use-case is
different here: we only want to cover machine restart use-cases.

Link: https://lore.kernel.org/linux-kernel/f6067e3e-a2bc-483d-b214-6e3fe6691279@efficios.com/ [1]
Signed-off-by: Mathieu Desnoyers
Cc: Dan Williams
Cc: Vishal Verma
Cc: Dave Jiang
Cc: Ira Weiny
Cc: Steven Rostedt
Cc: nvdimm@lists.linux.dev
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: x86@kernel.org
Cc: "H. Peter Anvin"
Cc: Catalin Marinas
Cc: Will Deacon
Cc: linux-arm-kernel@lists.infradead.org
---
 drivers/nvdimm/pmem.c | 29 ++++++++++++++++++++++++++++-
 drivers/nvdimm/pmem.h |  2 ++
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 598fe2e89bda..bf1d187a9dca 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -26,12 +26,16 @@
 #include
 #include
 #include
+#include
 #include
 #include "pmem.h"
 #include "btt.h"
 #include "pfn.h"
 #include "nd.h"
 
+static int pmem_pre_restart_handler(struct notifier_block *self,
+		unsigned long ev, void *unused);
+
 static struct device *to_dev(struct pmem_device *pmem)
 {
 	/*
@@ -423,6 +427,7 @@ static void pmem_release_disk(void *__pmem)
 {
 	struct pmem_device *pmem = __pmem;
 
+	unregister_pre_restart_notifier(&pmem->pre_restart_notifier);
 	dax_remove_host(pmem->disk);
 	kill_dax(pmem->dax_dev);
 	put_dax(pmem->dax_dev);
@@ -575,9 +580,14 @@ static int pmem_attach_disk(struct device *dev,
 			goto out_cleanup_dax;
 		dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
 	}
-	rc = device_add_disk(dev, disk, pmem_attribute_groups);
+	pmem->pre_restart_notifier.notifier_call = pmem_pre_restart_handler;
+	pmem->pre_restart_notifier.priority = 0;
+	rc = register_pre_restart_notifier(&pmem->pre_restart_notifier);
 	if (rc)
 		goto out_remove_host;
+	rc = device_add_disk(dev, disk, pmem_attribute_groups);
+	if (rc)
+		goto out_unregister_pre_restart;
 	if (devm_add_action_or_reset(dev, pmem_release_disk, pmem))
 		return -ENOMEM;
 
@@ -589,6 +599,8 @@ static int pmem_attach_disk(struct device *dev,
 		dev_warn(dev, "'badblocks' notification disabled\n");
 	return 0;
 
+out_unregister_pre_restart:
+	unregister_pre_restart_notifier(&pmem->pre_restart_notifier);
 out_remove_host:
 	dax_remove_host(pmem->disk);
 out_cleanup_dax:
@@ -751,6 +763,21 @@ static void nd_pmem_notify(struct device *dev, enum nvdimm_event event)
 	}
 }
 
+/*
+ * For volatile memory use-cases where explicit flushing of the data cache is
+ * not useful after stores, the pmem reboot notifier is called in preparation
+ * for restart to make sure the content of the pmem memory area is flushed from
+ * data cache to memory, so it can be preserved across warm reboot.
+ */
+static int pmem_pre_restart_handler(struct notifier_block *self,
+		unsigned long ev, void *unused)
+{
+	struct pmem_device *pmem = container_of(self, struct pmem_device, pre_restart_notifier);
+
+	arch_wb_cache_pmem(pmem->virt_addr, pmem->size);
+	return NOTIFY_DONE;
+}
+
 MODULE_ALIAS("pmem");
 MODULE_ALIAS_ND_DEVICE(ND_DEVICE_NAMESPACE_IO);
 MODULE_ALIAS_ND_DEVICE(ND_DEVICE_NAMESPACE_PMEM);
diff --git a/drivers/nvdimm/pmem.h b/drivers/nvdimm/pmem.h
index 392b0b38acb9..b8a2a518cf82 100644
--- a/drivers/nvdimm/pmem.h
+++ b/drivers/nvdimm/pmem.h
@@ -4,6 +4,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -27,6 +28,7 @@ struct pmem_device {
 	struct dax_device	*dax_dev;
 	struct gendisk		*disk;
 	struct dev_pagemap	pgmap;
+	struct notifier_block	pre_restart_notifier;
 };
 
 long __pmem_direct_access(struct pmem_device *pmem, pgoff_t pgoff,
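
For readers unfamiliar with the pattern the handler relies on, here is a
minimal userspace sketch of embedding a notifier block in a device structure
and recovering the device with container_of(). It is not kernel code: struct
fake_pmem, register_notifier(), unregister_notifier() and call_chain() are
hypothetical stand-ins written for this illustration, and the "flush" is
simulated by recording a byte count instead of calling arch_wb_cache_pmem().

```c
#include <assert.h>
#include <stddef.h>

#define NOTIFY_DONE 0

/* Simplified stand-in for the kernel's struct notifier_block. */
struct notifier_block {
	int (*notifier_call)(struct notifier_block *self,
			     unsigned long ev, void *unused);
	struct notifier_block *next;
	int priority;
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical device: only the fields the handler needs. */
struct fake_pmem {
	void *virt_addr;
	size_t size;
	size_t flushed;		/* bytes "written back", for demonstration */
	struct notifier_block pre_restart_notifier;
};

/* Head of a singly linked chain, kept in descending priority order. */
static struct notifier_block *chain_head;

static int register_notifier(struct notifier_block *nb)
{
	struct notifier_block **p = &chain_head;

	while (*p && (*p)->priority >= nb->priority)
		p = &(*p)->next;
	nb->next = *p;
	*p = nb;
	return 0;
}

static int unregister_notifier(struct notifier_block *nb)
{
	struct notifier_block **p;

	for (p = &chain_head; *p; p = &(*p)->next) {
		if (*p == nb) {
			*p = nb->next;
			nb->next = NULL;
			return 0;
		}
	}
	return -1;	/* not found */
}

/* Invoke every registered handler in chain order. */
static void call_chain(unsigned long ev)
{
	struct notifier_block *nb;

	for (nb = chain_head; nb; nb = nb->next)
		nb->notifier_call(nb, ev, NULL);
}

/* Handler: map the notifier block back to its device, then "flush" it. */
static int fake_pmem_pre_restart(struct notifier_block *self,
				 unsigned long ev, void *unused)
{
	struct fake_pmem *pmem =
		container_of(self, struct fake_pmem, pre_restart_notifier);

	(void)ev;
	(void)unused;
	pmem->flushed = pmem->size;	/* stand-in for a cache write-back */
	return NOTIFY_DONE;
}
```

Because the notifier block lives inside the device structure, the handler
needs no global lookup table: the callback argument alone identifies the
device. This also shows why the block has to be unregistered before the
device is freed, since the chain would otherwise hold a dangling pointer.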