From patchwork Mon Sep 20 14:28:54 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 12505529 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 50799C433F5 for ; Mon, 20 Sep 2021 14:30:38 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id E1C04610CE for ; Mon, 20 Sep 2021 14:30:37 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org E1C04610CE Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id 775CA6B006C; Mon, 20 Sep 2021 10:30:37 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 724FB6B0073; Mon, 20 Sep 2021 10:30:37 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 614E96B0074; Mon, 20 Sep 2021 10:30:37 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0107.hostedemail.com [216.40.44.107]) by kanga.kvack.org (Postfix) with ESMTP id 531686B006C for ; Mon, 20 Sep 2021 10:30:37 -0400 (EDT) Received: from smtpin05.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 181832776D for ; Mon, 20 Sep 2021 14:30:36 +0000 (UTC) X-FDA: 78608187672.05.4BB7AA7 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf28.hostedemail.com (Postfix) with ESMTP id C130B9000506 for ; Mon, 20 Sep 2021 14:30:35 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1632148235; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=OPXHqY9VOL97jpGsR9CxoGVxdCxjVyj0nMigqpjV/2w=; b=IpL29n+DgDAgL6hZ281qtFTLYfdZEYwLthed8QTySqSGdDHS2gJOiZzAs/2JaOMEetE+yY 8Az+1A/xCt+SlreNiVrQZ0U8RmlnA6+WyPw87iVmyWW71iB1XtZGgrB0zyKXpnTGLBjK0f YlaQTZfqAPluRtgHvX55REsCt+WraKo= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-408-erBAtSQmMLWTTu5FHcrD1Q-1; Mon, 20 Sep 2021 10:30:34 -0400 X-MC-Unique: erBAtSQmMLWTTu5FHcrD1Q-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 7936A1030C24; Mon, 20 Sep 2021 14:30:32 +0000 (UTC) Received: from t480s.redhat.com (unknown [10.39.194.236]) by smtp.corp.redhat.com (Postfix) with ESMTP id BD69360C17; Mon, 20 Sep 2021 14:29:21 +0000 (UTC) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: David Hildenbrand , Arnd Bergmann , Greg Kroah-Hartman , "Michael S. Tsirkin" , Jason Wang , "Rafael J. Wysocki" , Andrew Morton , Dan Williams , Hanjun Guo , Andy Shevchenko , virtualization@lists.linux-foundation.org, linux-mm@kvack.org Subject: [PATCH v5 1/3] kernel/resource: clean up and optimize iomem_is_exclusive() Date: Mon, 20 Sep 2021 16:28:54 +0200 Message-Id: <20210920142856.17758-2-david@redhat.com> In-Reply-To: <20210920142856.17758-1-david@redhat.com> References: <20210920142856.17758-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 Authentication-Results: imf28.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=IpL29n+D; spf=none (imf28.hostedemail.com: domain of david@redhat.com has no SPF policy when checking 170.10.133.124) smtp.mailfrom=david@redhat.com; dmarc=pass (policy=none) header.from=redhat.com X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: C130B9000506 X-Stat-Signature: 58i43z7b4xy33zis7xn48pjbnottb36p X-HE-Tag: 1632148235-107996 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: We end up traversing subtrees of ranges we are not interested in; let's optimize this case, skipping such subtrees, cleaning up the function a bit. For example, in the following configuration (/proc/iomem): 00000000-00000fff : Reserved 00001000-00057fff : System RAM 00058000-00058fff : Reserved 00059000-0009cfff : System RAM 0009d000-000fffff : Reserved 000a0000-000bffff : PCI Bus 0000:00 000c0000-000c3fff : PCI Bus 0000:00 000c4000-000c7fff : PCI Bus 0000:00 000c8000-000cbfff : PCI Bus 0000:00 000cc000-000cffff : PCI Bus 0000:00 000d0000-000d3fff : PCI Bus 0000:00 000d4000-000d7fff : PCI Bus 0000:00 000d8000-000dbfff : PCI Bus 0000:00 000dc000-000dffff : PCI Bus 0000:00 000e0000-000e3fff : PCI Bus 0000:00 000e4000-000e7fff : PCI Bus 0000:00 000e8000-000ebfff : PCI Bus 0000:00 000ec000-000effff : PCI Bus 0000:00 000f0000-000fffff : PCI Bus 0000:00 000f0000-000fffff : System ROM 00100000-3fffffff : System RAM 40000000-403fffff : Reserved 40000000-403fffff : pnp 00:00 40400000-80a79fff : System RAM ... We don't have to look at any children of "0009d000-000fffff : Reserved" if we can just skip these 15 items directly because the parent range is not of interest. Reviewed-by: Dan Williams Signed-off-by: David Hildenbrand --- kernel/resource.c | 25 ++++++++++++++++++++----- 1 file changed, 20 insertions(+), 5 deletions(-) diff --git a/kernel/resource.c b/kernel/resource.c index ca9f5198a01f..2999f57da38c 100644 --- a/kernel/resource.c +++ b/kernel/resource.c @@ -73,6 +73,18 @@ static struct resource *next_resource(struct resource *p) return p->sibling; } +static struct resource *next_resource_skip_children(struct resource *p) +{ + while (!p->sibling && p->parent) + p = p->parent; + return p->sibling; +} + +#define for_each_resource(_root, _p, _skip_children) \ + for ((_p) = (_root)->child; (_p); \ + (_p) = (_skip_children) ? next_resource_skip_children(_p) : \ + next_resource(_p)) + static void *r_next(struct seq_file *m, void *v, loff_t *pos) { struct resource *p = v; @@ -1712,10 +1724,9 @@ static int strict_iomem_checks; */ bool iomem_is_exclusive(u64 addr) { - struct resource *p = &iomem_resource; - bool err = false; - loff_t l; + bool skip_children = false, err = false; int size = PAGE_SIZE; + struct resource *p; if (!strict_iomem_checks) return false; @@ -1723,15 +1734,19 @@ bool iomem_is_exclusive(u64 addr) addr = addr & PAGE_MASK; read_lock(&resource_lock); - for (p = p->child; p ; p = r_next(NULL, p, &l)) { + for_each_resource(&iomem_resource, p, skip_children) { /* * We can probably skip the resources without * IORESOURCE_IO attribute? */ if (p->start >= addr + size) break; - if (p->end < addr) + if (p->end < addr) { + skip_children = true; continue; + } + skip_children = false; + /* * A resource is exclusive if IORESOURCE_EXCLUSIVE is set * or CONFIG_IO_STRICT_DEVMEM is enabled and the From patchwork Mon Sep 20 14:28:55 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 12505531 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5FF2AC433F5 for ; Mon, 20 Sep 2021 14:30:40 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 0C5A960E76 for ; Mon, 20 Sep 2021 14:30:40 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 0C5A960E76 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id A83A36B0073; Mon, 20 Sep 2021 10:30:39 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id A0D266B0074; Mon, 20 Sep 2021 10:30:39 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8D4316B0075; Mon, 20 Sep 2021 10:30:39 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0160.hostedemail.com [216.40.44.160]) by kanga.kvack.org (Postfix) with ESMTP id 7FE436B0073 for ; Mon, 20 Sep 2021 10:30:39 -0400 (EDT) Received: from smtpin16.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 3E52F180AD820 for ; Mon, 20 Sep 2021 14:30:39 +0000 (UTC) X-FDA: 78608187798.16.85A32F5 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [216.205.24.124]) by imf06.hostedemail.com (Postfix) with ESMTP id E5CEA801AB0D for ; Mon, 20 Sep 2021 14:30:38 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1632148238; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=xw4p12rNhpw8xyA7/N2Zq43KStyN0IgZ3lBn9YKbLA8=; b=LooQ9eZb55q+7biaeQ3BHfWiqFYaS984YzExQMcZyXyjcVTniVlH3TmP0uhoVgzHfboWhg red3vhjLBNabtEK/4Ni018Lfd7iqbtK40u1hTFYShHoFCCrvQoVXrLbp7wyDw1aIZL+K3D IwZvPZiVV8CQY236L61y9XIi9iVMnfE= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-593-fHb1zMzKNaOZyNsMm_CB8w-1; Mon, 20 Sep 2021 10:30:37 -0400 X-MC-Unique: fHb1zMzKNaOZyNsMm_CB8w-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id CC6C4802928; Mon, 20 Sep 2021 14:30:35 +0000 (UTC) Received: from t480s.redhat.com (unknown [10.39.194.236]) by smtp.corp.redhat.com (Postfix) with ESMTP id CD97D60C17; Mon, 20 Sep 2021 14:30:32 +0000 (UTC) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: David Hildenbrand , Arnd Bergmann , Greg Kroah-Hartman , "Michael S. Tsirkin" , Jason Wang , "Rafael J. Wysocki" , Andrew Morton , Dan Williams , Hanjun Guo , Andy Shevchenko , virtualization@lists.linux-foundation.org, linux-mm@kvack.org Subject: [PATCH v5 2/3] kernel/resource: disallow access to exclusive system RAM regions Date: Mon, 20 Sep 2021 16:28:55 +0200 Message-Id: <20210920142856.17758-3-david@redhat.com> In-Reply-To: <20210920142856.17758-1-david@redhat.com> References: <20210920142856.17758-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 Authentication-Results: imf06.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=LooQ9eZb; spf=none (imf06.hostedemail.com: domain of david@redhat.com has no SPF policy when checking 216.205.24.124) smtp.mailfrom=david@redhat.com; dmarc=pass (policy=none) header.from=redhat.com X-Stat-Signature: uzdry69cfz37fnno7kebm6uujz8uh49q X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: E5CEA801AB0D X-HE-Tag: 1632148238-169252 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: virtio-mem dynamically exposes memory inside a device memory region as system RAM to Linux, coordinating with the hypervisor which parts are actually "plugged" and consequently usable/accessible. On the one hand, the virtio-mem driver adds/removes whole memory blocks, creating/removing busy IORESOURCE_SYSTEM_RAM resources, on the other hand, it logically (un)plugs memory inside added memory blocks, dynamically either exposing them to the buddy or hiding them from the buddy and marking them PG_offline. In contrast to physical devices, like a DIMM, the virtio-mem driver is required to actually make use of any of the device-provided memory, because it performs the handshake with the hypervisor. virtio-mem memory cannot simply be access via /dev/mem without a driver. There is no safe way to: a) Access plugged memory blocks via /dev/mem, as they might contain unplugged holes or might get silently unplugged by the virtio-mem driver and consequently turned inaccessible. b) Access unplugged memory blocks via /dev/mem because the virtio-mem driver is required to make them actually accessible first. The virtio-spec states that unplugged memory blocks MUST NOT be written, and only selected unplugged memory blocks MAY be read. We want to make sure, this is the case in sane environments -- where the virtio-mem driver was loaded. We want to make sure that in a sane environment, nobody "accidentially" accesses unplugged memory inside the device managed region. For example, a user might spot a memory region in /proc/iomem and try accessing it via /dev/mem via gdb or dumping it via something else. By the time the mmap() happens, the memory might already have been removed by the virtio-mem driver silently: the mmap() would succeeed and user space might accidentially access unplugged memory. So once the driver was loaded and detected the device along the device-managed region, we just want to disallow any access via /dev/mem to it. In an ideal world, we would mark the whole region as busy ("owned by a driver") and exclude it; however, that would be wrong, as we don't really have actual system RAM at these ranges added to Linux ("busy system RAM"). Instead, we want to mark such ranges as "not actual busy system RAM but still soft-reserved and prepared by a driver for future use." Let's teach iomem_is_exclusive() to reject access to any range with "IORESOURCE_SYSTEM_RAM | IORESOURCE_EXCLUSIVE", even if not busy and even if "iomem=relaxed" is set. Introduce EXCLUSIVE_SYSTEM_RAM to make it easier for applicable drivers to depend on this setting in their Kconfig. For now, there are no applicable ranges and we'll modify virtio-mem next to properly set IORESOURCE_EXCLUSIVE on the parent resource container it creates to contain all actual busy system RAM added via add_memory_driver_managed(). Reviewed-by: Dan Williams Signed-off-by: David Hildenbrand --- kernel/resource.c | 29 +++++++++++++++++++---------- mm/Kconfig | 7 +++++++ 2 files changed, 26 insertions(+), 10 deletions(-) diff --git a/kernel/resource.c b/kernel/resource.c index 2999f57da38c..5ad3eba619ba 100644 --- a/kernel/resource.c +++ b/kernel/resource.c @@ -1719,26 +1719,23 @@ static int strict_iomem_checks; #endif /* - * check if an address is reserved in the iomem resource tree - * returns true if reserved, false if not reserved. + * Check if an address is exclusive to the kernel and must not be mapped to + * user space, for example, via /dev/mem. + * + * Returns true if exclusive to the kernel, otherwise returns false. */ bool iomem_is_exclusive(u64 addr) { + const unsigned int exclusive_system_ram = IORESOURCE_SYSTEM_RAM | + IORESOURCE_EXCLUSIVE; bool skip_children = false, err = false; int size = PAGE_SIZE; struct resource *p; - if (!strict_iomem_checks) - return false; - addr = addr & PAGE_MASK; read_lock(&resource_lock); for_each_resource(&iomem_resource, p, skip_children) { - /* - * We can probably skip the resources without - * IORESOURCE_IO attribute? - */ if (p->start >= addr + size) break; if (p->end < addr) { @@ -1747,12 +1744,24 @@ bool iomem_is_exclusive(u64 addr) } skip_children = false; + /* + * IORESOURCE_SYSTEM_RAM resources are exclusive if + * IORESOURCE_EXCLUSIVE is set, even if they + * are not busy and even if "iomem=relaxed" is set. The + * responsible driver dynamically adds/removes system RAM within + * such an area and uncontrolled access is dangerous. + */ + if ((p->flags & exclusive_system_ram) == exclusive_system_ram) { + err = true; + break; + } + /* * A resource is exclusive if IORESOURCE_EXCLUSIVE is set * or CONFIG_IO_STRICT_DEVMEM is enabled and the * resource is busy. */ - if ((p->flags & IORESOURCE_BUSY) == 0) + if (!strict_iomem_checks || !(p->flags & IORESOURCE_BUSY)) continue; if (IS_ENABLED(CONFIG_IO_STRICT_DEVMEM) || p->flags & IORESOURCE_EXCLUSIVE) { diff --git a/mm/Kconfig b/mm/Kconfig index d16ba9249bc5..87a9b98924cd 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -109,6 +109,13 @@ config NUMA_KEEP_MEMINFO config MEMORY_ISOLATION bool +# IORESOURCE_SYSTEM_RAM regions in the kernel resource tree that are marked +# IORESOURCE_EXCLUSIVE cannot be mapped to user space, for example, via +# /dev/mem. +config EXCLUSIVE_SYSTEM_RAM + def_bool y + depends on !DEVMEM || STRICT_DEVMEM + # # Only be set on architectures that have completely implemented memory hotplug # feature. If you are not sure, don't touch it. From patchwork Mon Sep 20 14:28:56 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 12505533 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5E8F6C433EF for ; Mon, 20 Sep 2021 14:30:47 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id EB1AD60E76 for ; Mon, 20 Sep 2021 14:30:46 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org EB1AD60E76 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=kvack.org Received: by kanga.kvack.org (Postfix) id 8FE3F6B0074; Mon, 20 Sep 2021 10:30:46 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 8ABA26B0075; Mon, 20 Sep 2021 10:30:46 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 79B58900002; Mon, 20 Sep 2021 10:30:46 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0118.hostedemail.com [216.40.44.118]) by kanga.kvack.org (Postfix) with ESMTP id 6CD4A6B0074 for ; Mon, 20 Sep 2021 10:30:46 -0400 (EDT) Received: from smtpin07.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 2DE74181AEF3F for ; Mon, 20 Sep 2021 14:30:46 +0000 (UTC) X-FDA: 78608188092.07.FB2F255 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [216.205.24.124]) by imf20.hostedemail.com (Postfix) with ESMTP id C9AF2D000653 for ; Mon, 20 Sep 2021 14:30:45 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1632148245; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=u9TCtQQQDHvFRzvXMYX+z7+x7whiCvDZBWtE0o8SjRw=; b=VqhcvlXI01zjumZd3g770X4X7tkiItq7nv9C74CMNeoUBwGuUNpHKJ8nHcIjcb/9xgAvOL hx4Ir8cbXtUYNHFH/aLCAyB+JIwKL3smfaZqy/RpF8dIruyEwIlk21mYT5HBCQCC93Yfcq W+I7OIcoSbjJRbJjIy79Uw7g9Z4SkUc= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-379-lQdMcZHkM46HaswXCYi9-g-1; Mon, 20 Sep 2021 10:30:42 -0400 X-MC-Unique: lQdMcZHkM46HaswXCYi9-g-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 53797100C670; Mon, 20 Sep 2021 14:30:40 +0000 (UTC) Received: from t480s.redhat.com (unknown [10.39.194.236]) by smtp.corp.redhat.com (Postfix) with ESMTP id 25EF660C17; Mon, 20 Sep 2021 14:30:35 +0000 (UTC) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: David Hildenbrand , Arnd Bergmann , Greg Kroah-Hartman , "Michael S. Tsirkin" , Jason Wang , "Rafael J. Wysocki" , Andrew Morton , Dan Williams , Hanjun Guo , Andy Shevchenko , virtualization@lists.linux-foundation.org, linux-mm@kvack.org Subject: [PATCH v5 3/3] virtio-mem: disallow mapping virtio-mem memory via /dev/mem Date: Mon, 20 Sep 2021 16:28:56 +0200 Message-Id: <20210920142856.17758-4-david@redhat.com> In-Reply-To: <20210920142856.17758-1-david@redhat.com> References: <20210920142856.17758-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: C9AF2D000653 X-Stat-Signature: twr8tocmerkbp4nh3zswsybyzdyb7r8o Authentication-Results: imf20.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=VqhcvlXI; spf=none (imf20.hostedemail.com: domain of david@redhat.com has no SPF policy when checking 216.205.24.124) smtp.mailfrom=david@redhat.com; dmarc=pass (policy=none) header.from=redhat.com X-HE-Tag: 1632148245-306401 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: We don't want user space to be able to map virtio-mem device memory directly (e.g., via /dev/mem) in order to have guarantees that in a sane setup we'll never accidentially access unplugged memory within the device-managed region of a virtio-mem device, just as required by the virtio-spec. As soon as the virtio-mem driver is loaded, the device region is visible in /proc/iomem via the parent device region. From that point on user space is aware of the device region and we want to disallow mapping anything inside that region (where we will dynamically (un)plug memory) until the driver has been unloaded cleanly and e.g., another driver might take over. By creating our parent IORESOURCE_SYSTEM_RAM resource with IORESOURCE_EXCLUSIVE, we will disallow any /dev/mem access to our device region until the driver was unloaded cleanly and removed the parent region. This will work even though only some memory blocks are actually currently added to Linux and appear as busy in the resource tree. So access to the region from user space is only possible a) if we don't load the virtio-mem driver. b) after unloading the virtio-mem driver cleanly. Don't build virtio-mem if access to /dev/mem cannot be restricticted -- if we have CONFIG_DEVMEM=y but CONFIG_STRICT_DEVMEM is not set. Reviewed-by: Dan Williams Acked-by: Michael S. Tsirkin Signed-off-by: David Hildenbrand --- drivers/virtio/Kconfig | 1 + drivers/virtio/virtio_mem.c | 4 +++- 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig index ce1b3f6ec325..0d974162899d 100644 --- a/drivers/virtio/Kconfig +++ b/drivers/virtio/Kconfig @@ -101,6 +101,7 @@ config VIRTIO_MEM depends on MEMORY_HOTPLUG_SPARSE depends on MEMORY_HOTREMOVE depends on CONTIG_ALLOC + depends on EXCLUSIVE_SYSTEM_RAM help This driver provides access to virtio-mem paravirtualized memory devices, allowing to hotplug and hotunplug memory. diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c index bef8ad6bf466..c619bc0e46a7 100644 --- a/drivers/virtio/virtio_mem.c +++ b/drivers/virtio/virtio_mem.c @@ -2525,8 +2525,10 @@ static int virtio_mem_create_resource(struct virtio_mem *vm) if (!name) return -ENOMEM; + /* Disallow mapping device memory via /dev/mem completely. */ vm->parent_resource = __request_mem_region(vm->addr, vm->region_size, - name, IORESOURCE_SYSTEM_RAM); + name, IORESOURCE_SYSTEM_RAM | + IORESOURCE_EXCLUSIVE); if (!vm->parent_resource) { kfree(name); dev_warn(&vm->vdev->dev, "could not reserve device region\n");