From patchwork Wed Jul 7 19:32:27 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eduardo Habkost X-Patchwork-Id: 12363965 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B377EC07E95 for ; Wed, 7 Jul 2021 19:39:22 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 6FE2161C5B for ; Wed, 7 Jul 2021 19:39:22 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6FE2161C5B Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:50016 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1m1DOD-0006Od-Kl for qemu-devel@archiver.kernel.org; Wed, 07 Jul 2021 15:39:21 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:44824) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1m1DIH-0001Vr-4W for qemu-devel@nongnu.org; Wed, 07 Jul 2021 15:33:13 -0400 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:28730) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1m1DIF-0007eU-Bp for qemu-devel@nongnu.org; Wed, 07 Jul 2021 15:33:12 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1625686390; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=sUUNacenpIZlJEyGmdObkLlThuzT36HxrtqCf/9iJfQ=; b=bEzk+HXsoPLVFoNKYwU1vpZBfK52AVXtnPHssp9pvP1cpsTOFb+N2inkVcoudNKUgpWVRn Z0T3/j7fvZvObGyAopK5ITqh6fER5XLqHxIC2n0vI8Yy8WrR6sUmw1SGHNBwkRRqslvvSh +B0DjP6CNOcjecfuJYFNI1kW59d6o/U= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-311-4wINI3jENCmuhNDnlylBvg-1; Wed, 07 Jul 2021 15:32:45 -0400 X-MC-Unique: 4wINI3jENCmuhNDnlylBvg-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id D3DC283DC1D; Wed, 7 Jul 2021 19:32:43 +0000 (UTC) Received: from localhost (ovpn-113-79.rdu2.redhat.com [10.10.113.79]) by smtp.corp.redhat.com (Postfix) with ESMTP id 8147760BD9; Wed, 7 Jul 2021 19:32:43 +0000 (UTC) From: Eduardo Habkost To: Peter Maydell , qemu-devel@nongnu.org Subject: [PULL 01/15] vmbus: Don't make QOM property registration conditional Date: Wed, 7 Jul 2021 15:32:27 -0400 Message-Id: <20210707193241.2659335-2-ehabkost@redhat.com> In-Reply-To: <20210707193241.2659335-1-ehabkost@redhat.com> References: <20210707193241.2659335-1-ehabkost@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=ehabkost@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Received-SPF: pass client-ip=216.205.24.124; envelope-from=ehabkost@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -41 X-Spam_score: -4.2 X-Spam_bar: ---- X-Spam_report: (-4.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-1.439, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Paolo Bonzini , "Maciej S . Szmigiero" , Eduardo Habkost Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" Having properties registered conditionally makes QOM type introspection difficult. Instead of skipping registration of the "instanceid" property, always register the property but validate its value against the instance id required by the class. Signed-off-by: Eduardo Habkost Acked-by: Maciej S. Szmigiero Message-Id: <20201009200701.1830060-1-ehabkost@redhat.com> Signed-off-by: Eduardo Habkost --- hw/hyperv/vmbus.c | 20 +++++++++++++++----- 1 file changed, 15 insertions(+), 5 deletions(-) diff --git a/hw/hyperv/vmbus.c b/hw/hyperv/vmbus.c index 984caf898dc..c9887d5a7bc 100644 --- a/hw/hyperv/vmbus.c +++ b/hw/hyperv/vmbus.c @@ -2372,6 +2372,14 @@ static void vmbus_dev_realize(DeviceState *dev, Error **errp) assert(!qemu_uuid_is_null(&vdev->instanceid)); + if (!qemu_uuid_is_null(&vdc->instanceid)) { + /* Class wants to only have a single instance with a fixed UUID */ + if (!qemu_uuid_is_equal(&vdev->instanceid, &vdc->instanceid)) { + error_setg(&err, "instance id can't be changed"); + goto error_out; + } + } + /* Check for instance id collision for this class id */ QTAILQ_FOREACH(child, &BUS(vmbus)->children, sibling) { VMBusDevice *child_dev = VMBUS_DEVICE(child->child); @@ -2438,18 +2446,22 @@ static void vmbus_dev_unrealize(DeviceState *dev) free_channels(vdev); } +static Property vmbus_dev_props[] = { + DEFINE_PROP_UUID("instanceid", VMBusDevice, instanceid), + DEFINE_PROP_END_OF_LIST() +}; + + static void vmbus_dev_class_init(ObjectClass *klass, void *data) { DeviceClass *kdev = DEVICE_CLASS(klass); + device_class_set_props(kdev, vmbus_dev_props); kdev->bus_type = TYPE_VMBUS; kdev->realize = vmbus_dev_realize; kdev->unrealize = vmbus_dev_unrealize; kdev->reset = vmbus_dev_reset; } -static Property vmbus_dev_instanceid = - DEFINE_PROP_UUID("instanceid", VMBusDevice, instanceid); - static void vmbus_dev_instance_init(Object *obj) { VMBusDevice *vdev = VMBUS_DEVICE(obj); @@ -2458,8 +2470,6 @@ static void vmbus_dev_instance_init(Object *obj) if (!qemu_uuid_is_null(&vdc->instanceid)) { /* Class wants to only have a single instance with a fixed UUID */ vdev->instanceid = vdc->instanceid; - } else { - qdev_property_add_static(DEVICE(vdev), &vmbus_dev_instanceid); } } From patchwork Wed Jul 7 19:32:28 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Eduardo Habkost X-Patchwork-Id: 12363951 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4138AC07E95 for ; Wed, 7 Jul 2021 19:34:37 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C6D5461C77 for ; Wed, 7 Jul 2021 19:34:36 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C6D5461C77 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:33692 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1m1DJb-0003ML-Ne for qemu-devel@archiver.kernel.org; Wed, 07 Jul 2021 15:34:35 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:44730) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1m1DHv-0001DZ-Vm for qemu-devel@nongnu.org; Wed, 07 Jul 2021 15:32:52 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:21234) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1m1DHt-0007c7-5R for qemu-devel@nongnu.org; Wed, 07 Jul 2021 15:32:51 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1625686367; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=aXrAFMnrKnwrkhFdDtPiujztDYX2W+xq4SoIb47z0Oo=; b=FrsuCIO6Y6P3dLsZOpKEWOmxIlPgC7ugRVgRjezk+d81uxvpq49g2p4sHmxSdcbdhUX1bj WJUWN3WNVtgcMm2meQFV90lzsrly30ObDfEup1eF9pXG/LqfOsAaSFZXo8qpNiow3op29P 4XmHNToVsl6ti5F19inWqCTL6ntG8JE= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-186-XcWTXugRP1S8nJ2y5XuyZw-1; Wed, 07 Jul 2021 15:32:45 -0400 X-MC-Unique: XcWTXugRP1S8nJ2y5XuyZw-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id DD20B1932489; Wed, 7 Jul 2021 19:32:44 +0000 (UTC) Received: from localhost (ovpn-113-79.rdu2.redhat.com [10.10.113.79]) by smtp.corp.redhat.com (Postfix) with ESMTP id 81BC35D705; Wed, 7 Jul 2021 19:32:44 +0000 (UTC) From: Eduardo Habkost To: Peter Maydell , qemu-devel@nongnu.org Subject: [PULL 02/15] Deprecate pmem=on with non-DAX capable backend file Date: Wed, 7 Jul 2021 15:32:28 -0400 Message-Id: <20210707193241.2659335-3-ehabkost@redhat.com> In-Reply-To: <20210707193241.2659335-1-ehabkost@redhat.com> References: <20210707193241.2659335-1-ehabkost@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=ehabkost@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Received-SPF: pass client-ip=170.10.133.124; envelope-from=ehabkost@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -41 X-Spam_score: -4.2 X-Spam_bar: ---- X-Spam_report: (-4.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-1.439, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Paolo Bonzini , =?utf-8?q?Philippe_Mathieu-Daud?= =?utf-8?q?=C3=A9?= , Eduardo Habkost , Igor Mammedov Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" From: Igor Mammedov It is not safe to pretend that emulated NVDIMM supports persistence while backend actually failed to enable it and used non-persistent mapping as fall back. Instead of falling-back, QEMU should be more strict and error out with clear message that it's not supported. So if user asks for persistence (pmem=on), they should store backing file on NVDIMM. Signed-off-by: Igor Mammedov Reviewed-by: Philippe Mathieu-Daudé Message-Id: <20210111203332.740815-1-imammedo@redhat.com> Signed-off-by: Eduardo Habkost --- docs/system/deprecated.rst | 18 ++++++++++++++++++ util/mmap-alloc.c | 2 ++ 2 files changed, 20 insertions(+) diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst index 70e08baff62..94fb7dbf4e6 100644 --- a/docs/system/deprecated.rst +++ b/docs/system/deprecated.rst @@ -221,6 +221,24 @@ This machine is deprecated because we have enough AST2500 based OpenPOWER machines. It can be easily replaced by the ``witherspoon-bmc`` or the ``romulus-bmc`` machines. +Backend options +--------------- + +Using non-persistent backing file with pmem=on (since 6.1) +'''''''''''''''''''''''''''''''''''''''''''''''''''''''''' + +This option is used when ``memory-backend-file`` is consumed by emulated NVDIMM +device. However enabling ``memory-backend-file.pmem`` option, when backing file +is (a) not DAX capable or (b) not on a filesystem that support direct mapping +of persistent memory, is not safe and may lead to data loss or corruption in case +of host crash. +Options are: + + - modify VM configuration to set ``pmem=off`` to continue using fake NVDIMM + (without persistence guaranties) with backing file on non DAX storage + - move backing file to NVDIMM storage and keep ``pmem=on`` + (to have NVDIMM with persistence guaranties). + Device options -------------- diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c index 838e286ce51..893d864354a 100644 --- a/util/mmap-alloc.c +++ b/util/mmap-alloc.c @@ -225,6 +225,8 @@ static void *mmap_activate(void *ptr, size_t size, int fd, "crash.\n", file_name); g_free(proc_link); g_free(file_name); + warn_report("Using non DAX backing file with 'pmem=on' option" + " is deprecated"); } /* * If mmap failed with MAP_SHARED_VALIDATE | MAP_SYNC, we will try From patchwork Wed Jul 7 19:32:29 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eduardo Habkost X-Patchwork-Id: 12363955 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B9B7DC07E9B for ; Wed, 7 Jul 2021 19:34:54 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 2BF7461C5B for ; Wed, 7 Jul 2021 19:34:54 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2BF7461C5B Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:34896 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1m1DJt-00048b-50 for qemu-devel@archiver.kernel.org; Wed, 07 Jul 2021 15:34:53 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:44776) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1m1DIE-0001PN-5D for qemu-devel@nongnu.org; Wed, 07 Jul 2021 15:33:10 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:33554) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1m1DIB-0007cl-Fx for qemu-devel@nongnu.org; Wed, 07 Jul 2021 15:33:09 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1625686386; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Z/SigosLQsjzXuxPVs1hsehCdHaAeqSZBR+4KVU9C80=; b=EZw70JLfvu+cLPQyRkMilVNLxmT3OLxAwgBxDfkzpEEONwLlRj7SgWclby01e6D7mTwXnc WqBJTTtM5p1c/beE2C/KdPljIFnFB+z7W1b5cAZaTCqXMM1acsl41BUlGxRLBfpEHBOLNP fDmNfKeAjV44RAsDldK96ZmTRwAWexs= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-590-K9yj1Wd3OaeNzb1Pf-USdg-1; Wed, 07 Jul 2021 15:33:05 -0400 X-MC-Unique: K9yj1Wd3OaeNzb1Pf-USdg-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 63C611023F48; Wed, 7 Jul 2021 19:33:03 +0000 (UTC) Received: from localhost (ovpn-113-79.rdu2.redhat.com [10.10.113.79]) by smtp.corp.redhat.com (Postfix) with ESMTP id 88016369A; Wed, 7 Jul 2021 19:32:45 +0000 (UTC) From: Eduardo Habkost To: Peter Maydell , qemu-devel@nongnu.org Subject: [PULL 03/15] memory: Introduce RamDiscardManager for RAM memory regions Date: Wed, 7 Jul 2021 15:32:29 -0400 Message-Id: <20210707193241.2659335-4-ehabkost@redhat.com> In-Reply-To: <20210707193241.2659335-1-ehabkost@redhat.com> References: <20210707193241.2659335-1-ehabkost@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=ehabkost@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Received-SPF: pass client-ip=170.10.133.124; envelope-from=ehabkost@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -41 X-Spam_score: -4.2 X-Spam_bar: ---- X-Spam_report: (-4.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-1.439, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Pankaj Gupta , Eduardo Habkost , "Michael S . Tsirkin" , David Hildenbrand , Alex Williamson , Peter Xu , "Dr . David Alan Gilbert" , Auger Eric , Pankaj Gupta , teawater , Igor Mammedov , Paolo Bonzini , Marek Kedzierski , Wei Yang Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" From: David Hildenbrand We have some special RAM memory regions (managed by virtio-mem), whereby the guest agreed to only use selected memory ranges. "unused" parts are discarded so they won't consume memory - to logically unplug these memory ranges. Before the VM is allowed to use such logically unplugged memory again, coordination with the hypervisor is required. This results in "sparse" mmaps/RAMBlocks/memory regions, whereby only coordinated parts are valid to be used/accessed by the VM. In most cases, we don't care about that - e.g., in KVM, we simply have a single KVM memory slot. However, in case of vfio, registering the whole region with the kernel results in all pages getting pinned, and therefore an unexpected high memory consumption - discarding of RAM in that context is broken. Let's introduce a way to coordinate discarding/populating memory within a RAM memory region with such special consumers of RAM memory regions: they can register as listeners and get updates on memory getting discarded and populated. Using this machinery, vfio will be able to map only the currently populated parts, resulting in discarded parts not getting pinned and not consuming memory. A RamDiscardManager has to be set for a memory region before it is getting mapped, and cannot change while the memory region is mapped. Note: At some point, we might want to let RAMBlock users (esp. vfio used for nvme://) consume this interface as well. We'll need RAMBlock notifier calls when a RAMBlock is getting mapped/unmapped (via the corresponding memory region), so we can properly register a listener there as well. Reviewed-by: Pankaj Gupta Acked-by: Michael S. Tsirkin Cc: Paolo Bonzini Cc: "Michael S. Tsirkin" Cc: Alex Williamson Cc: Dr. David Alan Gilbert Cc: Igor Mammedov Cc: Pankaj Gupta Cc: Peter Xu Cc: Auger Eric Cc: Wei Yang Cc: teawater Cc: Marek Kedzierski Signed-off-by: David Hildenbrand Message-Id: <20210413095531.25603-2-david@redhat.com> Signed-off-by: Eduardo Habkost --- include/exec/memory.h | 286 ++++++++++++++++++++++++++++++++++++++---- softmmu/memory.c | 71 +++++++++++ 2 files changed, 335 insertions(+), 22 deletions(-) diff --git a/include/exec/memory.h b/include/exec/memory.h index b116f7c64ed..684fe47e6ee 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -42,6 +42,12 @@ typedef struct IOMMUMemoryRegionClass IOMMUMemoryRegionClass; DECLARE_OBJ_CHECKERS(IOMMUMemoryRegion, IOMMUMemoryRegionClass, IOMMU_MEMORY_REGION, TYPE_IOMMU_MEMORY_REGION) +#define TYPE_RAM_DISCARD_MANAGER "qemu:ram-discard-manager" +typedef struct RamDiscardManagerClass RamDiscardManagerClass; +typedef struct RamDiscardManager RamDiscardManager; +DECLARE_OBJ_CHECKERS(RamDiscardManager, RamDiscardManagerClass, + RAM_DISCARD_MANAGER, TYPE_RAM_DISCARD_MANAGER); + #ifdef CONFIG_FUZZ void fuzz_dma_read_cb(size_t addr, size_t len, @@ -65,6 +71,28 @@ struct ReservedRegion { unsigned type; }; +/** + * struct MemoryRegionSection: describes a fragment of a #MemoryRegion + * + * @mr: the region, or %NULL if empty + * @fv: the flat view of the address space the region is mapped in + * @offset_within_region: the beginning of the section, relative to @mr's start + * @size: the size of the section; will not exceed @mr's boundaries + * @offset_within_address_space: the address of the first byte of the section + * relative to the region's address space + * @readonly: writes to this section are ignored + * @nonvolatile: this section is non-volatile + */ +struct MemoryRegionSection { + Int128 size; + MemoryRegion *mr; + FlatView *fv; + hwaddr offset_within_region; + hwaddr offset_within_address_space; + bool readonly; + bool nonvolatile; +}; + typedef struct IOMMUTLBEntry IOMMUTLBEntry; /* See address_space_translate: bit 0 is read, bit 1 is write. */ @@ -448,6 +476,206 @@ struct IOMMUMemoryRegionClass { Error **errp); }; +typedef struct RamDiscardListener RamDiscardListener; +typedef int (*NotifyRamPopulate)(RamDiscardListener *rdl, + MemoryRegionSection *section); +typedef void (*NotifyRamDiscard)(RamDiscardListener *rdl, + MemoryRegionSection *section); + +struct RamDiscardListener { + /* + * @notify_populate: + * + * Notification that previously discarded memory is about to get populated. + * Listeners are able to object. If any listener objects, already + * successfully notified listeners are notified about a discard again. + * + * @rdl: the #RamDiscardListener getting notified + * @section: the #MemoryRegionSection to get populated. The section + * is aligned within the memory region to the minimum granularity + * unless it would exceed the registered section. + * + * Returns 0 on success. If the notification is rejected by the listener, + * an error is returned. + */ + NotifyRamPopulate notify_populate; + + /* + * @notify_discard: + * + * Notification that previously populated memory was discarded successfully + * and listeners should drop all references to such memory and prevent + * new population (e.g., unmap). + * + * @rdl: the #RamDiscardListener getting notified + * @section: the #MemoryRegionSection to get populated. The section + * is aligned within the memory region to the minimum granularity + * unless it would exceed the registered section. + */ + NotifyRamDiscard notify_discard; + + /* + * @double_discard_supported: + * + * The listener suppors getting @notify_discard notifications that span + * already discarded parts. + */ + bool double_discard_supported; + + MemoryRegionSection *section; + QLIST_ENTRY(RamDiscardListener) next; +}; + +static inline void ram_discard_listener_init(RamDiscardListener *rdl, + NotifyRamPopulate populate_fn, + NotifyRamDiscard discard_fn, + bool double_discard_supported) +{ + rdl->notify_populate = populate_fn; + rdl->notify_discard = discard_fn; + rdl->double_discard_supported = double_discard_supported; +} + +typedef int (*ReplayRamPopulate)(MemoryRegionSection *section, void *opaque); + +/* + * RamDiscardManagerClass: + * + * A #RamDiscardManager coordinates which parts of specific RAM #MemoryRegion + * regions are currently populated to be used/accessed by the VM, notifying + * after parts were discarded (freeing up memory) and before parts will be + * populated (consuming memory), to be used/acessed by the VM. + * + * A #RamDiscardManager can only be set for a RAM #MemoryRegion while the + * #MemoryRegion isn't mapped yet; it cannot change while the #MemoryRegion is + * mapped. + * + * The #RamDiscardManager is intended to be used by technologies that are + * incompatible with discarding of RAM (e.g., VFIO, which may pin all + * memory inside a #MemoryRegion), and require proper coordination to only + * map the currently populated parts, to hinder parts that are expected to + * remain discarded from silently getting populated and consuming memory. + * Technologies that support discarding of RAM don't have to bother and can + * simply map the whole #MemoryRegion. + * + * An example #RamDiscardManager is virtio-mem, which logically (un)plugs + * memory within an assigned RAM #MemoryRegion, coordinated with the VM. + * Logically unplugging memory consists of discarding RAM. The VM agreed to not + * access unplugged (discarded) memory - especially via DMA. virtio-mem will + * properly coordinate with listeners before memory is plugged (populated), + * and after memory is unplugged (discarded). + * + * Listeners are called in multiples of the minimum granularity (unless it + * would exceed the registered range) and changes are aligned to the minimum + * granularity within the #MemoryRegion. Listeners have to prepare for memory + * becomming discarded in a different granularity than it was populated and the + * other way around. + */ +struct RamDiscardManagerClass { + /* private */ + InterfaceClass parent_class; + + /* public */ + + /** + * @get_min_granularity: + * + * Get the minimum granularity in which listeners will get notified + * about changes within the #MemoryRegion via the #RamDiscardManager. + * + * @rdm: the #RamDiscardManager + * @mr: the #MemoryRegion + * + * Returns the minimum granularity. + */ + uint64_t (*get_min_granularity)(const RamDiscardManager *rdm, + const MemoryRegion *mr); + + /** + * @is_populated: + * + * Check whether the given #MemoryRegionSection is completely populated + * (i.e., no parts are currently discarded) via the #RamDiscardManager. + * There are no alignment requirements. + * + * @rdm: the #RamDiscardManager + * @section: the #MemoryRegionSection + * + * Returns whether the given range is completely populated. + */ + bool (*is_populated)(const RamDiscardManager *rdm, + const MemoryRegionSection *section); + + /** + * @replay_populated: + * + * Call the #ReplayRamPopulate callback for all populated parts within the + * #MemoryRegionSection via the #RamDiscardManager. + * + * In case any call fails, no further calls are made. + * + * @rdm: the #RamDiscardManager + * @section: the #MemoryRegionSection + * @replay_fn: the #ReplayRamPopulate callback + * @opaque: pointer to forward to the callback + * + * Returns 0 on success, or a negative error if any notification failed. + */ + int (*replay_populated)(const RamDiscardManager *rdm, + MemoryRegionSection *section, + ReplayRamPopulate replay_fn, void *opaque); + + /** + * @register_listener: + * + * Register a #RamDiscardListener for the given #MemoryRegionSection and + * immediately notify the #RamDiscardListener about all populated parts + * within the #MemoryRegionSection via the #RamDiscardManager. + * + * In case any notification fails, no further notifications are triggered + * and an error is logged. + * + * @rdm: the #RamDiscardManager + * @rdl: the #RamDiscardListener + * @section: the #MemoryRegionSection + */ + void (*register_listener)(RamDiscardManager *rdm, + RamDiscardListener *rdl, + MemoryRegionSection *section); + + /** + * @unregister_listener: + * + * Unregister a previously registered #RamDiscardListener via the + * #RamDiscardManager after notifying the #RamDiscardListener about all + * populated parts becoming unpopulated within the registered + * #MemoryRegionSection. + * + * @rdm: the #RamDiscardManager + * @rdl: the #RamDiscardListener + */ + void (*unregister_listener)(RamDiscardManager *rdm, + RamDiscardListener *rdl); +}; + +uint64_t ram_discard_manager_get_min_granularity(const RamDiscardManager *rdm, + const MemoryRegion *mr); + +bool ram_discard_manager_is_populated(const RamDiscardManager *rdm, + const MemoryRegionSection *section); + +int ram_discard_manager_replay_populated(const RamDiscardManager *rdm, + MemoryRegionSection *section, + ReplayRamPopulate replay_fn, + void *opaque); + +void ram_discard_manager_register_listener(RamDiscardManager *rdm, + RamDiscardListener *rdl, + MemoryRegionSection *section); + +void ram_discard_manager_unregister_listener(RamDiscardManager *rdm, + RamDiscardListener *rdl); + typedef struct CoalescedMemoryRange CoalescedMemoryRange; typedef struct MemoryRegionIoeventfd MemoryRegionIoeventfd; @@ -494,6 +722,7 @@ struct MemoryRegion { const char *name; unsigned ioeventfd_nb; MemoryRegionIoeventfd *ioeventfds; + RamDiscardManager *rdm; /* Only for RAM */ }; struct IOMMUMemoryRegion { @@ -825,28 +1054,6 @@ typedef bool (*flatview_cb)(Int128 start, */ void flatview_for_each_range(FlatView *fv, flatview_cb cb, void *opaque); -/** - * struct MemoryRegionSection: describes a fragment of a #MemoryRegion - * - * @mr: the region, or %NULL if empty - * @fv: the flat view of the address space the region is mapped in - * @offset_within_region: the beginning of the section, relative to @mr's start - * @size: the size of the section; will not exceed @mr's boundaries - * @offset_within_address_space: the address of the first byte of the section - * relative to the region's address space - * @readonly: writes to this section are ignored - * @nonvolatile: this section is non-volatile - */ -struct MemoryRegionSection { - Int128 size; - MemoryRegion *mr; - FlatView *fv; - hwaddr offset_within_region; - hwaddr offset_within_address_space; - bool readonly; - bool nonvolatile; -}; - static inline bool MemoryRegionSection_eq(MemoryRegionSection *a, MemoryRegionSection *b) { @@ -2023,6 +2230,41 @@ bool memory_region_present(MemoryRegion *container, hwaddr addr); */ bool memory_region_is_mapped(MemoryRegion *mr); +/** + * memory_region_get_ram_discard_manager: get the #RamDiscardManager for a + * #MemoryRegion + * + * The #RamDiscardManager cannot change while a memory region is mapped. + * + * @mr: the #MemoryRegion + */ +RamDiscardManager *memory_region_get_ram_discard_manager(MemoryRegion *mr); + +/** + * memory_region_has_ram_discard_manager: check whether a #MemoryRegion has a + * #RamDiscardManager assigned + * + * @mr: the #MemoryRegion + */ +static inline bool memory_region_has_ram_discard_manager(MemoryRegion *mr) +{ + return !!memory_region_get_ram_discard_manager(mr); +} + +/** + * memory_region_set_ram_discard_manager: set the #RamDiscardManager for a + * #MemoryRegion + * + * This function must not be called for a mapped #MemoryRegion, a #MemoryRegion + * that does not cover RAM, or a #MemoryRegion that already has a + * #RamDiscardManager assigned. + * + * @mr: the #MemoryRegion + * @urn: #RamDiscardManager to set + */ +void memory_region_set_ram_discard_manager(MemoryRegion *mr, + RamDiscardManager *rdm); + /** * memory_region_find: translate an address/size relative to a * MemoryRegion into a #MemoryRegionSection. diff --git a/softmmu/memory.c b/softmmu/memory.c index f0161515e96..d20a9dec44e 100644 --- a/softmmu/memory.c +++ b/softmmu/memory.c @@ -2027,6 +2027,70 @@ int memory_region_iommu_num_indexes(IOMMUMemoryRegion *iommu_mr) return imrc->num_indexes(iommu_mr); } +RamDiscardManager *memory_region_get_ram_discard_manager(MemoryRegion *mr) +{ + if (!memory_region_is_mapped(mr) || !memory_region_is_ram(mr)) { + return NULL; + } + return mr->rdm; +} + +void memory_region_set_ram_discard_manager(MemoryRegion *mr, + RamDiscardManager *rdm) +{ + g_assert(memory_region_is_ram(mr) && !memory_region_is_mapped(mr)); + g_assert(!rdm || !mr->rdm); + mr->rdm = rdm; +} + +uint64_t ram_discard_manager_get_min_granularity(const RamDiscardManager *rdm, + const MemoryRegion *mr) +{ + RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm); + + g_assert(rdmc->get_min_granularity); + return rdmc->get_min_granularity(rdm, mr); +} + +bool ram_discard_manager_is_populated(const RamDiscardManager *rdm, + const MemoryRegionSection *section) +{ + RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm); + + g_assert(rdmc->is_populated); + return rdmc->is_populated(rdm, section); +} + +int ram_discard_manager_replay_populated(const RamDiscardManager *rdm, + MemoryRegionSection *section, + ReplayRamPopulate replay_fn, + void *opaque) +{ + RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm); + + g_assert(rdmc->replay_populated); + return rdmc->replay_populated(rdm, section, replay_fn, opaque); +} + +void ram_discard_manager_register_listener(RamDiscardManager *rdm, + RamDiscardListener *rdl, + MemoryRegionSection *section) +{ + RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm); + + g_assert(rdmc->register_listener); + rdmc->register_listener(rdm, rdl, section); +} + +void ram_discard_manager_unregister_listener(RamDiscardManager *rdm, + RamDiscardListener *rdl) +{ + RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_GET_CLASS(rdm); + + g_assert(rdmc->unregister_listener); + rdmc->unregister_listener(rdm, rdl); +} + void memory_region_set_log(MemoryRegion *mr, bool log, unsigned client) { uint8_t mask = 1 << client; @@ -3320,10 +3384,17 @@ static const TypeInfo iommu_memory_region_info = { .abstract = true, }; +static const TypeInfo ram_discard_manager_info = { + .parent = TYPE_INTERFACE, + .name = TYPE_RAM_DISCARD_MANAGER, + .class_size = sizeof(RamDiscardManagerClass), +}; + static void memory_register_types(void) { type_register_static(&memory_region_info); type_register_static(&iommu_memory_region_info); + type_register_static(&ram_discard_manager_info); } type_init(memory_register_types) From patchwork Wed Jul 7 19:32:30 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eduardo Habkost X-Patchwork-Id: 12363953 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id EA549C07E95 for ; Wed, 7 Jul 2021 19:34:52 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 93BC761C5B for ; Wed, 7 Jul 2021 19:34:52 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 93BC761C5B Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:34818 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1m1DJr-00045F-48 for qemu-devel@archiver.kernel.org; Wed, 07 Jul 2021 15:34:51 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:44772) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1m1DID-0001Oj-Rb for qemu-devel@nongnu.org; Wed, 07 Jul 2021 15:33:09 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:24689) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1m1DIB-0007cp-Qf for qemu-devel@nongnu.org; Wed, 07 Jul 2021 15:33:09 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1625686387; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Eoc2HuKrdPIWd+dF/cgHQPsjBagOJO0avDeJhj5LIOU=; b=Pkby6c7/Jnp50A4HC0qfGTzreo7noABcnLvWFVwYwl9W9+vVzzksqEdNCokxoZb1QW20ER Rmia3HrAxoqQJZHdELzq9jQoTt+1Blwlyzz7+iJ5ezv3hYF9T3JToier+x7AeDF6pE30cF u61RiT8KrkhoxSHG4pn/uojeESG9bAU= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-106-Or_08u8rMwmZ0S5gv7a0Mg-1; Wed, 07 Jul 2021 15:33:05 -0400 X-MC-Unique: Or_08u8rMwmZ0S5gv7a0Mg-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id AB1791023F42; Wed, 7 Jul 2021 19:33:04 +0000 (UTC) Received: from localhost (ovpn-113-79.rdu2.redhat.com [10.10.113.79]) by smtp.corp.redhat.com (Postfix) with ESMTP id 3423E19C66; Wed, 7 Jul 2021 19:33:04 +0000 (UTC) From: Eduardo Habkost To: Peter Maydell , qemu-devel@nongnu.org Subject: [PULL 04/15] memory: Helpers to copy/free a MemoryRegionSection Date: Wed, 7 Jul 2021 15:32:30 -0400 Message-Id: <20210707193241.2659335-5-ehabkost@redhat.com> In-Reply-To: <20210707193241.2659335-1-ehabkost@redhat.com> References: <20210707193241.2659335-1-ehabkost@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=ehabkost@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Received-SPF: pass client-ip=170.10.133.124; envelope-from=ehabkost@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -41 X-Spam_score: -4.2 X-Spam_bar: ---- X-Spam_report: (-4.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-1.439, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Pankaj Gupta , Eduardo Habkost , "Michael S. Tsirkin" , David Hildenbrand , "Dr . David Alan Gilbert" , Peter Xu , Auger Eric , Alex Williamson , teawater , Igor Mammedov , Paolo Bonzini , Marek Kedzierski , Wei Yang Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" From: David Hildenbrand In case one wants to create a permanent copy of a MemoryRegionSections, one needs access to flatview_ref()/flatview_unref(). Instead of exposing these, let's just add helpers to copy/free a MemoryRegionSection and properly adjust references. Cc: Paolo Bonzini Cc: "Michael S. Tsirkin" Cc: Alex Williamson Cc: Dr. David Alan Gilbert Cc: Igor Mammedov Cc: Pankaj Gupta Cc: Peter Xu Cc: Auger Eric Cc: Wei Yang Cc: teawater Cc: Marek Kedzierski Signed-off-by: David Hildenbrand Message-Id: <20210413095531.25603-3-david@redhat.com> Signed-off-by: Eduardo Habkost --- include/exec/memory.h | 20 ++++++++++++++++++++ softmmu/memory.c | 27 +++++++++++++++++++++++++++ 2 files changed, 47 insertions(+) diff --git a/include/exec/memory.h b/include/exec/memory.h index 684fe47e6ee..a23487fdb30 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -1066,6 +1066,26 @@ static inline bool MemoryRegionSection_eq(MemoryRegionSection *a, a->nonvolatile == b->nonvolatile; } +/** + * memory_region_section_new_copy: Copy a memory region section + * + * Allocate memory for a new copy, copy the memory region section, and + * properly take a reference on all relevant members. + * + * @s: the #MemoryRegionSection to copy + */ +MemoryRegionSection *memory_region_section_new_copy(MemoryRegionSection *s); + +/** + * memory_region_section_new_copy: Free a copied memory region section + * + * Free a copy of a memory section created via memory_region_section_new_copy(). + * properly dropping references on all relevant members. + * + * @s: the #MemoryRegionSection to copy + */ +void memory_region_section_free_copy(MemoryRegionSection *s); + /** * memory_region_init: Initialize a memory region * diff --git a/softmmu/memory.c b/softmmu/memory.c index d20a9dec44e..cea2f622c96 100644 --- a/softmmu/memory.c +++ b/softmmu/memory.c @@ -2701,6 +2701,33 @@ MemoryRegionSection memory_region_find(MemoryRegion *mr, return ret; } +MemoryRegionSection *memory_region_section_new_copy(MemoryRegionSection *s) +{ + MemoryRegionSection *tmp = g_new(MemoryRegionSection, 1); + + *tmp = *s; + if (tmp->mr) { + memory_region_ref(tmp->mr); + } + if (tmp->fv) { + bool ret = flatview_ref(tmp->fv); + + g_assert(ret); + } + return tmp; +} + +void memory_region_section_free_copy(MemoryRegionSection *s) +{ + if (s->fv) { + flatview_unref(s->fv); + } + if (s->mr) { + memory_region_unref(s->mr); + } + g_free(s); +} + bool memory_region_present(MemoryRegion *container, hwaddr addr) { MemoryRegion *mr; From patchwork Wed Jul 7 19:32:31 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eduardo Habkost X-Patchwork-Id: 12363969 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7820AC07E95 for ; Wed, 7 Jul 2021 19:41:18 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id ECF5A61C84 for ; Wed, 7 Jul 2021 19:41:17 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org ECF5A61C84 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:54676 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1m1DQ5-0001K9-1O for qemu-devel@archiver.kernel.org; Wed, 07 Jul 2021 15:41:17 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:44888) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1m1DIQ-0001dC-JH for qemu-devel@nongnu.org; Wed, 07 Jul 2021 15:33:22 -0400 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:54842) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1m1DIN-0007fP-BY for qemu-devel@nongnu.org; Wed, 07 Jul 2021 15:33:22 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1625686398; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=4mvd82vnYSXuUqrFt/fYSsa29fGv5cLwcEIutn7JoWI=; b=Ha7UYv/nh9tU8GXzJtDKoQd7JAAAKfQatsqVQBuoGMzGtz5ZqM6aNhz4TEGs1xJDsZogay kHX0myxoaPR1JGUIJlnrKQBTI6G1nApWXEvADyC1WYJWh5aPyuyA8kRvx9GnEoIrnHToV4 NS3n9+3ThxOO93dkFs028bVhxZWJoJg= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-557-BUpyVL9kNgiwiTupd0V0Qw-1; Wed, 07 Jul 2021 15:33:13 -0400 X-MC-Unique: BUpyVL9kNgiwiTupd0V0Qw-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 2ABF710B7462; Wed, 7 Jul 2021 19:33:12 +0000 (UTC) Received: from localhost (ovpn-113-79.rdu2.redhat.com [10.10.113.79]) by smtp.corp.redhat.com (Postfix) with ESMTP id 40C88369A; Wed, 7 Jul 2021 19:33:05 +0000 (UTC) From: Eduardo Habkost To: Peter Maydell , qemu-devel@nongnu.org Subject: [PULL 05/15] virtio-mem: Factor out traversing unplugged ranges Date: Wed, 7 Jul 2021 15:32:31 -0400 Message-Id: <20210707193241.2659335-6-ehabkost@redhat.com> In-Reply-To: <20210707193241.2659335-1-ehabkost@redhat.com> References: <20210707193241.2659335-1-ehabkost@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=ehabkost@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Received-SPF: pass client-ip=216.205.24.124; envelope-from=ehabkost@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -41 X-Spam_score: -4.2 X-Spam_bar: ---- X-Spam_report: (-4.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-1.439, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Pankaj Gupta , Eduardo Habkost , "Michael S . Tsirkin" , David Hildenbrand , Alex Williamson , Peter Xu , "Dr . David Alan Gilbert" , Auger Eric , Pankaj Gupta , teawater , Igor Mammedov , Paolo Bonzini , Marek Kedzierski , Wei Yang Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" From: David Hildenbrand Let's factor out the core logic, no need to replicate. Reviewed-by: Pankaj Gupta Acked-by: Michael S. Tsirkin Reviewed-by: Michael S. Tsirkin Cc: Paolo Bonzini Cc: "Michael S. Tsirkin" Cc: Alex Williamson Cc: Dr. David Alan Gilbert Cc: Igor Mammedov Cc: Pankaj Gupta Cc: Peter Xu Cc: Auger Eric Cc: Wei Yang Cc: teawater Cc: Marek Kedzierski Signed-off-by: David Hildenbrand Message-Id: <20210413095531.25603-4-david@redhat.com> Signed-off-by: Eduardo Habkost --- hw/virtio/virtio-mem.c | 86 ++++++++++++++++++++++++------------------ 1 file changed, 49 insertions(+), 37 deletions(-) diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c index 75aa7d6f1b1..3942fd75496 100644 --- a/hw/virtio/virtio-mem.c +++ b/hw/virtio/virtio-mem.c @@ -145,6 +145,33 @@ static bool virtio_mem_is_busy(void) return migration_in_incoming_postcopy() || !migration_is_idle(); } +typedef int (*virtio_mem_range_cb)(const VirtIOMEM *vmem, void *arg, + uint64_t offset, uint64_t size); + +static int virtio_mem_for_each_unplugged_range(const VirtIOMEM *vmem, void *arg, + virtio_mem_range_cb cb) +{ + unsigned long first_zero_bit, last_zero_bit; + uint64_t offset, size; + int ret = 0; + + first_zero_bit = find_first_zero_bit(vmem->bitmap, vmem->bitmap_size); + while (first_zero_bit < vmem->bitmap_size) { + offset = first_zero_bit * vmem->block_size; + last_zero_bit = find_next_bit(vmem->bitmap, vmem->bitmap_size, + first_zero_bit + 1) - 1; + size = (last_zero_bit - first_zero_bit + 1) * vmem->block_size; + + ret = cb(vmem, arg, offset, size); + if (ret) { + break; + } + first_zero_bit = find_next_zero_bit(vmem->bitmap, vmem->bitmap_size, + last_zero_bit + 2); + } + return ret; +} + static bool virtio_mem_test_bitmap(VirtIOMEM *vmem, uint64_t start_gpa, uint64_t size, bool plugged) { @@ -594,33 +621,27 @@ static void virtio_mem_device_unrealize(DeviceState *dev) ram_block_discard_require(false); } -static int virtio_mem_restore_unplugged(VirtIOMEM *vmem) +static int virtio_mem_discard_range_cb(const VirtIOMEM *vmem, void *arg, + uint64_t offset, uint64_t size) { RAMBlock *rb = vmem->memdev->mr.ram_block; - unsigned long first_zero_bit, last_zero_bit; - uint64_t offset, length; int ret; - /* Find consecutive unplugged blocks and discard the consecutive range. */ - first_zero_bit = find_first_zero_bit(vmem->bitmap, vmem->bitmap_size); - while (first_zero_bit < vmem->bitmap_size) { - offset = first_zero_bit * vmem->block_size; - last_zero_bit = find_next_bit(vmem->bitmap, vmem->bitmap_size, - first_zero_bit + 1) - 1; - length = (last_zero_bit - first_zero_bit + 1) * vmem->block_size; - - ret = ram_block_discard_range(rb, offset, length); - if (ret) { - error_report("Unexpected error discarding RAM: %s", - strerror(-ret)); - return -EINVAL; - } - first_zero_bit = find_next_zero_bit(vmem->bitmap, vmem->bitmap_size, - last_zero_bit + 2); + ret = ram_block_discard_range(rb, offset, size); + if (ret) { + error_report("Unexpected error discarding RAM: %s", strerror(-ret)); + return -EINVAL; } return 0; } +static int virtio_mem_restore_unplugged(VirtIOMEM *vmem) +{ + /* Make sure all memory is really discarded after migration. */ + return virtio_mem_for_each_unplugged_range(vmem, NULL, + virtio_mem_discard_range_cb); +} + static int virtio_mem_post_load(void *opaque, int version_id) { if (migration_in_incoming_postcopy()) { @@ -872,28 +893,19 @@ static void virtio_mem_set_block_size(Object *obj, Visitor *v, const char *name, vmem->block_size = value; } -static void virtio_mem_precopy_exclude_unplugged(VirtIOMEM *vmem) +static int virtio_mem_precopy_exclude_range_cb(const VirtIOMEM *vmem, void *arg, + uint64_t offset, uint64_t size) { void * const host = qemu_ram_get_host_addr(vmem->memdev->mr.ram_block); - unsigned long first_zero_bit, last_zero_bit; - uint64_t offset, length; - /* - * Find consecutive unplugged blocks and exclude them from migration. - * - * Note: Blocks cannot get (un)plugged during precopy, no locking needed. - */ - first_zero_bit = find_first_zero_bit(vmem->bitmap, vmem->bitmap_size); - while (first_zero_bit < vmem->bitmap_size) { - offset = first_zero_bit * vmem->block_size; - last_zero_bit = find_next_bit(vmem->bitmap, vmem->bitmap_size, - first_zero_bit + 1) - 1; - length = (last_zero_bit - first_zero_bit + 1) * vmem->block_size; + qemu_guest_free_page_hint(host + offset, size); + return 0; +} - qemu_guest_free_page_hint(host + offset, length); - first_zero_bit = find_next_zero_bit(vmem->bitmap, vmem->bitmap_size, - last_zero_bit + 2); - } +static void virtio_mem_precopy_exclude_unplugged(VirtIOMEM *vmem) +{ + virtio_mem_for_each_unplugged_range(vmem, NULL, + virtio_mem_precopy_exclude_range_cb); } static int virtio_mem_precopy_notify(NotifierWithReturn *n, void *data) From patchwork Wed Jul 7 19:32:32 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eduardo Habkost X-Patchwork-Id: 12363957 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E837EC07E95 for ; Wed, 7 Jul 2021 19:35:58 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id AF23D61C5B for ; Wed, 7 Jul 2021 19:35:58 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org AF23D61C5B Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:38060 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1m1DKv-0006SK-RQ for qemu-devel@archiver.kernel.org; Wed, 07 Jul 2021 15:35:57 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:44984) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1m1DId-0001r5-6Y for qemu-devel@nongnu.org; Wed, 07 Jul 2021 15:33:35 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:43604) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1m1DIa-0007gU-VD for qemu-devel@nongnu.org; Wed, 07 Jul 2021 15:33:34 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1625686412; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=fA5uiscrduuPmsd8g8UnN6cMIJ0WEpjAe1dloV6xAew=; b=P7wF7IK3lZnM32H91SdQFg7uiHPnVRKXazBr412M062zx6Id8Nb5g004CpFpnX5Pr8ulB4 2rsdIlBKYzptqoHCGnnNBqjMA9CHwo3AJhwI1MrtIy/b4NnDSnNjYI3FfppdcQ9du6TwMa 9SnOzcLLbmI3qtod1CKbqv0wR21/f64= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-264-2yeXcn3oP5eGfo8IW6X2MQ-1; Wed, 07 Jul 2021 15:33:31 -0400 X-MC-Unique: 2yeXcn3oP5eGfo8IW6X2MQ-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 00EB8CC621; Wed, 7 Jul 2021 19:33:30 +0000 (UTC) Received: from localhost (ovpn-113-79.rdu2.redhat.com [10.10.113.79]) by smtp.corp.redhat.com (Postfix) with ESMTP id E308A5D9FC; Wed, 7 Jul 2021 19:33:12 +0000 (UTC) From: Eduardo Habkost To: Peter Maydell , qemu-devel@nongnu.org Subject: [PULL 06/15] virtio-mem: Don't report errors when ram_block_discard_range() fails Date: Wed, 7 Jul 2021 15:32:32 -0400 Message-Id: <20210707193241.2659335-7-ehabkost@redhat.com> In-Reply-To: <20210707193241.2659335-1-ehabkost@redhat.com> References: <20210707193241.2659335-1-ehabkost@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=ehabkost@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Received-SPF: pass client-ip=170.10.133.124; envelope-from=ehabkost@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -41 X-Spam_score: -4.2 X-Spam_bar: ---- X-Spam_report: (-4.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-1.439, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Pankaj Gupta , Eduardo Habkost , "Michael S. Tsirkin" , David Hildenbrand , "Dr . David Alan Gilbert" , Peter Xu , Auger Eric , Alex Williamson , teawater , Igor Mammedov , Paolo Bonzini , Marek Kedzierski , Wei Yang Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" From: David Hildenbrand Any errors are unexpected and ram_block_discard_range() already properly prints errors. Let's stop manually reporting errors. Cc: Paolo Bonzini Cc: "Michael S. Tsirkin" Cc: Alex Williamson Cc: Dr. David Alan Gilbert Cc: Igor Mammedov Cc: Pankaj Gupta Cc: Peter Xu Cc: Auger Eric Cc: Wei Yang Cc: teawater Cc: Marek Kedzierski Signed-off-by: David Hildenbrand Message-Id: <20210413095531.25603-5-david@redhat.com> Signed-off-by: Eduardo Habkost --- hw/virtio/virtio-mem.c | 20 ++++---------------- 1 file changed, 4 insertions(+), 16 deletions(-) diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c index 3942fd75496..a92a067b283 100644 --- a/hw/virtio/virtio-mem.c +++ b/hw/virtio/virtio-mem.c @@ -246,17 +246,14 @@ static int virtio_mem_set_block_state(VirtIOMEM *vmem, uint64_t start_gpa, uint64_t size, bool plug) { const uint64_t offset = start_gpa - vmem->addr; - int ret; + RAMBlock *rb = vmem->memdev->mr.ram_block; if (virtio_mem_is_busy()) { return -EBUSY; } if (!plug) { - ret = ram_block_discard_range(vmem->memdev->mr.ram_block, offset, size); - if (ret) { - error_report("Unexpected error discarding RAM: %s", - strerror(-ret)); + if (ram_block_discard_range(rb, offset, size)) { return -EBUSY; } } @@ -345,15 +342,12 @@ static void virtio_mem_resize_usable_region(VirtIOMEM *vmem, static int virtio_mem_unplug_all(VirtIOMEM *vmem) { RAMBlock *rb = vmem->memdev->mr.ram_block; - int ret; if (virtio_mem_is_busy()) { return -EBUSY; } - ret = ram_block_discard_range(rb, 0, qemu_ram_get_used_length(rb)); - if (ret) { - error_report("Unexpected error discarding RAM: %s", strerror(-ret)); + if (ram_block_discard_range(rb, 0, qemu_ram_get_used_length(rb))) { return -EBUSY; } bitmap_clear(vmem->bitmap, 0, vmem->bitmap_size); @@ -625,14 +619,8 @@ static int virtio_mem_discard_range_cb(const VirtIOMEM *vmem, void *arg, uint64_t offset, uint64_t size) { RAMBlock *rb = vmem->memdev->mr.ram_block; - int ret; - ret = ram_block_discard_range(rb, offset, size); - if (ret) { - error_report("Unexpected error discarding RAM: %s", strerror(-ret)); - return -EINVAL; - } - return 0; + return ram_block_discard_range(rb, offset, size) ? -EINVAL : 0; } static int virtio_mem_restore_unplugged(VirtIOMEM *vmem) From patchwork Wed Jul 7 19:32:33 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eduardo Habkost X-Patchwork-Id: 12363967 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2CD3BC07E95 for ; Wed, 7 Jul 2021 19:39:42 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 839FB61C77 for ; Wed, 7 Jul 2021 19:39:41 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 839FB61C77 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:50668 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1m1DOW-0006sh-Mi for qemu-devel@archiver.kernel.org; Wed, 07 Jul 2021 15:39:40 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:45002) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1m1DIf-0001wh-IL for qemu-devel@nongnu.org; Wed, 07 Jul 2021 15:33:37 -0400 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:35064) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1m1DIc-0007ga-DU for qemu-devel@nongnu.org; Wed, 07 Jul 2021 15:33:37 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1625686413; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=H5tF3JgVaXSeIg8YgS90iisjqdE3KKFj9VFXj8Gjbug=; b=FDWW4HAAIkid0tGyLGsH1LMEEhFbh7ls0vRSL3/XPEDA/AddgoVUC+f8hDSS3Y62gv4aF3 QL3E3we1mIYkpy1zYedargblekCsrUFzrlPlXLyzhnrxab2xm8nw9L0a+6vE671ZvVTKSp JAqITIAQh/wHJadhv0tayTyPX9BLoJ4= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-47-9LduaUBwPXm9w_ngGOR-9Q-1; Wed, 07 Jul 2021 15:33:32 -0400 X-MC-Unique: 9LduaUBwPXm9w_ngGOR-9Q-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 0D22E1932483; Wed, 7 Jul 2021 19:33:31 +0000 (UTC) Received: from localhost (ovpn-113-79.rdu2.redhat.com [10.10.113.79]) by smtp.corp.redhat.com (Postfix) with ESMTP id AF9BA5D9FC; Wed, 7 Jul 2021 19:33:30 +0000 (UTC) From: Eduardo Habkost To: Peter Maydell , qemu-devel@nongnu.org Subject: [PULL 07/15] virtio-mem: Implement RamDiscardManager interface Date: Wed, 7 Jul 2021 15:32:33 -0400 Message-Id: <20210707193241.2659335-8-ehabkost@redhat.com> In-Reply-To: <20210707193241.2659335-1-ehabkost@redhat.com> References: <20210707193241.2659335-1-ehabkost@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=ehabkost@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Received-SPF: pass client-ip=216.205.24.124; envelope-from=ehabkost@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -41 X-Spam_score: -4.2 X-Spam_bar: ---- X-Spam_report: (-4.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-1.439, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Pankaj Gupta , Eduardo Habkost , "Michael S . Tsirkin" , David Hildenbrand , "Dr . David Alan Gilbert" , Peter Xu , Auger Eric , Alex Williamson , teawater , Igor Mammedov , Paolo Bonzini , Marek Kedzierski , Wei Yang Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" From: David Hildenbrand Let's properly notify when (un)plugging blocks, after discarding memory and before allowing the guest to consume memory. Handle errors from notifiers gracefully (e.g., no remaining VFIO mappings) when plugging, rolling back the change and telling the guest that the VM is busy. One special case to take care of is replaying all notifications after restoring the vmstate. The device starts out with all memory discarded, so after loading the vmstate, we have to notify about all plugged blocks. Acked-by: Michael S. Tsirkin Reviewed-by: Michael S. Tsirkin Cc: Paolo Bonzini Cc: "Michael S. Tsirkin" Cc: Alex Williamson Cc: Dr. David Alan Gilbert Cc: Igor Mammedov Cc: Pankaj Gupta Cc: Peter Xu Cc: Auger Eric Cc: Wei Yang Cc: teawater Cc: Marek Kedzierski Signed-off-by: David Hildenbrand Message-Id: <20210413095531.25603-6-david@redhat.com> Signed-off-by: Eduardo Habkost --- include/hw/virtio/virtio-mem.h | 3 + hw/virtio/virtio-mem.c | 288 ++++++++++++++++++++++++++++++++- 2 files changed, 288 insertions(+), 3 deletions(-) diff --git a/include/hw/virtio/virtio-mem.h b/include/hw/virtio/virtio-mem.h index 4eeb82d5ddb..9a6e348fa2c 100644 --- a/include/hw/virtio/virtio-mem.h +++ b/include/hw/virtio/virtio-mem.h @@ -67,6 +67,9 @@ struct VirtIOMEM { /* don't migrate unplugged memory */ NotifierWithReturn precopy_notifier; + + /* listeners to notify on plug/unplug activity. */ + QLIST_HEAD(, RamDiscardListener) rdl_list; }; struct VirtIOMEMClass { diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c index a92a067b283..f60cb8a3fc0 100644 --- a/hw/virtio/virtio-mem.c +++ b/hw/virtio/virtio-mem.c @@ -172,7 +172,146 @@ static int virtio_mem_for_each_unplugged_range(const VirtIOMEM *vmem, void *arg, return ret; } -static bool virtio_mem_test_bitmap(VirtIOMEM *vmem, uint64_t start_gpa, +/* + * Adjust the memory section to cover the intersection with the given range. + * + * Returns false if the intersection is empty, otherwise returns true. + */ +static bool virito_mem_intersect_memory_section(MemoryRegionSection *s, + uint64_t offset, uint64_t size) +{ + uint64_t start = MAX(s->offset_within_region, offset); + uint64_t end = MIN(s->offset_within_region + int128_get64(s->size), + offset + size); + + if (end <= start) { + return false; + } + + s->offset_within_address_space += start - s->offset_within_region; + s->offset_within_region = start; + s->size = int128_make64(end - start); + return true; +} + +typedef int (*virtio_mem_section_cb)(MemoryRegionSection *s, void *arg); + +static int virtio_mem_for_each_plugged_section(const VirtIOMEM *vmem, + MemoryRegionSection *s, + void *arg, + virtio_mem_section_cb cb) +{ + unsigned long first_bit, last_bit; + uint64_t offset, size; + int ret = 0; + + first_bit = s->offset_within_region / vmem->bitmap_size; + first_bit = find_next_bit(vmem->bitmap, vmem->bitmap_size, first_bit); + while (first_bit < vmem->bitmap_size) { + MemoryRegionSection tmp = *s; + + offset = first_bit * vmem->block_size; + last_bit = find_next_zero_bit(vmem->bitmap, vmem->bitmap_size, + first_bit + 1) - 1; + size = (last_bit - first_bit + 1) * vmem->block_size; + + if (!virito_mem_intersect_memory_section(&tmp, offset, size)) { + break; + } + ret = cb(&tmp, arg); + if (ret) { + break; + } + first_bit = find_next_bit(vmem->bitmap, vmem->bitmap_size, + last_bit + 2); + } + return ret; +} + +static int virtio_mem_notify_populate_cb(MemoryRegionSection *s, void *arg) +{ + RamDiscardListener *rdl = arg; + + return rdl->notify_populate(rdl, s); +} + +static int virtio_mem_notify_discard_cb(MemoryRegionSection *s, void *arg) +{ + RamDiscardListener *rdl = arg; + + rdl->notify_discard(rdl, s); + return 0; +} + +static void virtio_mem_notify_unplug(VirtIOMEM *vmem, uint64_t offset, + uint64_t size) +{ + RamDiscardListener *rdl; + + QLIST_FOREACH(rdl, &vmem->rdl_list, next) { + MemoryRegionSection tmp = *rdl->section; + + if (!virito_mem_intersect_memory_section(&tmp, offset, size)) { + continue; + } + rdl->notify_discard(rdl, &tmp); + } +} + +static int virtio_mem_notify_plug(VirtIOMEM *vmem, uint64_t offset, + uint64_t size) +{ + RamDiscardListener *rdl, *rdl2; + int ret = 0; + + QLIST_FOREACH(rdl, &vmem->rdl_list, next) { + MemoryRegionSection tmp = *rdl->section; + + if (!virito_mem_intersect_memory_section(&tmp, offset, size)) { + continue; + } + ret = rdl->notify_populate(rdl, &tmp); + if (ret) { + break; + } + } + + if (ret) { + /* Notify all already-notified listeners. */ + QLIST_FOREACH(rdl2, &vmem->rdl_list, next) { + MemoryRegionSection tmp = *rdl->section; + + if (rdl2 == rdl) { + break; + } + if (!virito_mem_intersect_memory_section(&tmp, offset, size)) { + continue; + } + rdl2->notify_discard(rdl2, &tmp); + } + } + return ret; +} + +static void virtio_mem_notify_unplug_all(VirtIOMEM *vmem) +{ + RamDiscardListener *rdl; + + if (!vmem->size) { + return; + } + + QLIST_FOREACH(rdl, &vmem->rdl_list, next) { + if (rdl->double_discard_supported) { + rdl->notify_discard(rdl, rdl->section); + } else { + virtio_mem_for_each_plugged_section(vmem, rdl->section, rdl, + virtio_mem_notify_discard_cb); + } + } +} + +static bool virtio_mem_test_bitmap(const VirtIOMEM *vmem, uint64_t start_gpa, uint64_t size, bool plugged) { const unsigned long first_bit = (start_gpa - vmem->addr) / vmem->block_size; @@ -225,7 +364,8 @@ static void virtio_mem_send_response_simple(VirtIOMEM *vmem, virtio_mem_send_response(vmem, elem, &resp); } -static bool virtio_mem_valid_range(VirtIOMEM *vmem, uint64_t gpa, uint64_t size) +static bool virtio_mem_valid_range(const VirtIOMEM *vmem, uint64_t gpa, + uint64_t size) { if (!QEMU_IS_ALIGNED(gpa, vmem->block_size)) { return false; @@ -256,6 +396,11 @@ static int virtio_mem_set_block_state(VirtIOMEM *vmem, uint64_t start_gpa, if (ram_block_discard_range(rb, offset, size)) { return -EBUSY; } + virtio_mem_notify_unplug(vmem, offset, size); + } else if (virtio_mem_notify_plug(vmem, offset, size)) { + /* Could be a mapping attempt resulted in memory getting populated. */ + ram_block_discard_range(vmem->memdev->mr.ram_block, offset, size); + return -EBUSY; } virtio_mem_set_bitmap(vmem, start_gpa, size, plug); return 0; @@ -350,6 +495,8 @@ static int virtio_mem_unplug_all(VirtIOMEM *vmem) if (ram_block_discard_range(rb, 0, qemu_ram_get_used_length(rb))) { return -EBUSY; } + virtio_mem_notify_unplug_all(vmem); + bitmap_clear(vmem->bitmap, 0, vmem->bitmap_size); if (vmem->size) { vmem->size = 0; @@ -598,6 +745,13 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp) vmstate_register_ram(&vmem->memdev->mr, DEVICE(vmem)); qemu_register_reset(virtio_mem_system_reset, vmem); precopy_add_notifier(&vmem->precopy_notifier); + + /* + * Set ourselves as RamDiscardManager before the plug handler maps the + * memory region and exposes it via an address space. + */ + memory_region_set_ram_discard_manager(&vmem->memdev->mr, + RAM_DISCARD_MANAGER(vmem)); } static void virtio_mem_device_unrealize(DeviceState *dev) @@ -605,6 +759,11 @@ static void virtio_mem_device_unrealize(DeviceState *dev) VirtIODevice *vdev = VIRTIO_DEVICE(dev); VirtIOMEM *vmem = VIRTIO_MEM(dev); + /* + * The unplug handler unmapped the memory region, it cannot be + * found via an address space anymore. Unset ourselves. + */ + memory_region_set_ram_discard_manager(&vmem->memdev->mr, NULL); precopy_remove_notifier(&vmem->precopy_notifier); qemu_unregister_reset(virtio_mem_system_reset, vmem); vmstate_unregister_ram(&vmem->memdev->mr, DEVICE(vmem)); @@ -632,11 +791,27 @@ static int virtio_mem_restore_unplugged(VirtIOMEM *vmem) static int virtio_mem_post_load(void *opaque, int version_id) { + VirtIOMEM *vmem = VIRTIO_MEM(opaque); + RamDiscardListener *rdl; + int ret; + + /* + * We started out with all memory discarded and our memory region is mapped + * into an address space. Replay, now that we updated the bitmap. + */ + QLIST_FOREACH(rdl, &vmem->rdl_list, next) { + ret = virtio_mem_for_each_plugged_section(vmem, rdl->section, rdl, + virtio_mem_notify_populate_cb); + if (ret) { + return ret; + } + } + if (migration_in_incoming_postcopy()) { return 0; } - return virtio_mem_restore_unplugged(VIRTIO_MEM(opaque)); + return virtio_mem_restore_unplugged(vmem); } typedef struct VirtIOMEMMigSanityChecks { @@ -918,6 +1093,7 @@ static void virtio_mem_instance_init(Object *obj) notifier_list_init(&vmem->size_change_notifiers); vmem->precopy_notifier.notify = virtio_mem_precopy_notify; + QLIST_INIT(&vmem->rdl_list); object_property_add(obj, VIRTIO_MEM_SIZE_PROP, "size", virtio_mem_get_size, NULL, NULL, NULL); @@ -937,11 +1113,107 @@ static Property virtio_mem_properties[] = { DEFINE_PROP_END_OF_LIST(), }; +static uint64_t virtio_mem_rdm_get_min_granularity(const RamDiscardManager *rdm, + const MemoryRegion *mr) +{ + const VirtIOMEM *vmem = VIRTIO_MEM(rdm); + + g_assert(mr == &vmem->memdev->mr); + return vmem->block_size; +} + +static bool virtio_mem_rdm_is_populated(const RamDiscardManager *rdm, + const MemoryRegionSection *s) +{ + const VirtIOMEM *vmem = VIRTIO_MEM(rdm); + uint64_t start_gpa = vmem->addr + s->offset_within_region; + uint64_t end_gpa = start_gpa + int128_get64(s->size); + + g_assert(s->mr == &vmem->memdev->mr); + + start_gpa = QEMU_ALIGN_DOWN(start_gpa, vmem->block_size); + end_gpa = QEMU_ALIGN_UP(end_gpa, vmem->block_size); + + if (!virtio_mem_valid_range(vmem, start_gpa, end_gpa - start_gpa)) { + return false; + } + + return virtio_mem_test_bitmap(vmem, start_gpa, end_gpa - start_gpa, true); +} + +struct VirtIOMEMReplayData { + void *fn; + void *opaque; +}; + +static int virtio_mem_rdm_replay_populated_cb(MemoryRegionSection *s, void *arg) +{ + struct VirtIOMEMReplayData *data = arg; + + return ((ReplayRamPopulate)data->fn)(s, data->opaque); +} + +static int virtio_mem_rdm_replay_populated(const RamDiscardManager *rdm, + MemoryRegionSection *s, + ReplayRamPopulate replay_fn, + void *opaque) +{ + const VirtIOMEM *vmem = VIRTIO_MEM(rdm); + struct VirtIOMEMReplayData data = { + .fn = replay_fn, + .opaque = opaque, + }; + + g_assert(s->mr == &vmem->memdev->mr); + return virtio_mem_for_each_plugged_section(vmem, s, &data, + virtio_mem_rdm_replay_populated_cb); +} + +static void virtio_mem_rdm_register_listener(RamDiscardManager *rdm, + RamDiscardListener *rdl, + MemoryRegionSection *s) +{ + VirtIOMEM *vmem = VIRTIO_MEM(rdm); + int ret; + + g_assert(s->mr == &vmem->memdev->mr); + rdl->section = memory_region_section_new_copy(s); + + QLIST_INSERT_HEAD(&vmem->rdl_list, rdl, next); + ret = virtio_mem_for_each_plugged_section(vmem, rdl->section, rdl, + virtio_mem_notify_populate_cb); + if (ret) { + error_report("%s: Replaying plugged ranges failed: %s", __func__, + strerror(-ret)); + } +} + +static void virtio_mem_rdm_unregister_listener(RamDiscardManager *rdm, + RamDiscardListener *rdl) +{ + VirtIOMEM *vmem = VIRTIO_MEM(rdm); + + g_assert(rdl->section->mr == &vmem->memdev->mr); + if (vmem->size) { + if (rdl->double_discard_supported) { + rdl->notify_discard(rdl, rdl->section); + } else { + virtio_mem_for_each_plugged_section(vmem, rdl->section, rdl, + virtio_mem_notify_discard_cb); + } + } + + memory_region_section_free_copy(rdl->section); + rdl->section = NULL; + QLIST_REMOVE(rdl, next); +} + static void virtio_mem_class_init(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); VirtioDeviceClass *vdc = VIRTIO_DEVICE_CLASS(klass); VirtIOMEMClass *vmc = VIRTIO_MEM_CLASS(klass); + RamDiscardManagerClass *rdmc = RAM_DISCARD_MANAGER_CLASS(klass); device_class_set_props(dc, virtio_mem_properties); dc->vmsd = &vmstate_virtio_mem; @@ -957,6 +1229,12 @@ static void virtio_mem_class_init(ObjectClass *klass, void *data) vmc->get_memory_region = virtio_mem_get_memory_region; vmc->add_size_change_notifier = virtio_mem_add_size_change_notifier; vmc->remove_size_change_notifier = virtio_mem_remove_size_change_notifier; + + rdmc->get_min_granularity = virtio_mem_rdm_get_min_granularity; + rdmc->is_populated = virtio_mem_rdm_is_populated; + rdmc->replay_populated = virtio_mem_rdm_replay_populated; + rdmc->register_listener = virtio_mem_rdm_register_listener; + rdmc->unregister_listener = virtio_mem_rdm_unregister_listener; } static const TypeInfo virtio_mem_info = { @@ -966,6 +1244,10 @@ static const TypeInfo virtio_mem_info = { .instance_init = virtio_mem_instance_init, .class_init = virtio_mem_class_init, .class_size = sizeof(VirtIOMEMClass), + .interfaces = (InterfaceInfo[]) { + { TYPE_RAM_DISCARD_MANAGER }, + { } + }, }; static void virtio_register_types(void) From patchwork Wed Jul 7 19:32:34 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eduardo Habkost X-Patchwork-Id: 12363975 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CD2AFC07E95 for ; Wed, 7 Jul 2021 19:44:43 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 6541861CB0 for ; Wed, 7 Jul 2021 19:44:43 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6541861CB0 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:35896 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1m1DTO-0007cv-B5 for qemu-devel@archiver.kernel.org; Wed, 07 Jul 2021 15:44:42 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:45166) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1m1DIy-0002bB-37 for qemu-devel@nongnu.org; Wed, 07 Jul 2021 15:33:56 -0400 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:31125) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1m1DIv-0007jS-M9 for qemu-devel@nongnu.org; Wed, 07 Jul 2021 15:33:55 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1625686431; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=oOqAsyfGN+tfhfyDNS5MYJ0jhylZvByR/xO3HiD1SXs=; b=asgm41jgst04vw0Tyju3nwINzpHVrDUJBgw8USuu893W3wm/Ssf9c1NNwiqoaNWfgH1ZOn TA2kG5J+JCF8f83mzn8k4VRwDyqqUMdeIqX8l1SNtNLxuDzERcspTAHvL6vawmM2Q5d98A 9QdmszEGwp6UcBoAlRgFGkBJn05aE44= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-389-eD1Mm5LbM664J5h2krD5Rw-1; Wed, 07 Jul 2021 15:33:50 -0400 X-MC-Unique: eD1Mm5LbM664J5h2krD5Rw-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 16A2E10B7462; Wed, 7 Jul 2021 19:33:49 +0000 (UTC) Received: from localhost (ovpn-113-79.rdu2.redhat.com [10.10.113.79]) by smtp.corp.redhat.com (Postfix) with ESMTP id A903360843; Wed, 7 Jul 2021 19:33:31 +0000 (UTC) From: Eduardo Habkost To: Peter Maydell , qemu-devel@nongnu.org Subject: [PULL 08/15] vfio: Support for RamDiscardManager in the !vIOMMU case Date: Wed, 7 Jul 2021 15:32:34 -0400 Message-Id: <20210707193241.2659335-9-ehabkost@redhat.com> In-Reply-To: <20210707193241.2659335-1-ehabkost@redhat.com> References: <20210707193241.2659335-1-ehabkost@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=ehabkost@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Received-SPF: pass client-ip=216.205.24.124; envelope-from=ehabkost@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -41 X-Spam_score: -4.2 X-Spam_bar: ---- X-Spam_report: (-4.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-1.439, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Pankaj Gupta , Eduardo Habkost , "Michael S . Tsirkin" , David Hildenbrand , "Dr . David Alan Gilbert" , Peter Xu , Auger Eric , Alex Williamson , teawater , Igor Mammedov , Paolo Bonzini , Marek Kedzierski , Wei Yang Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" From: David Hildenbrand Implement support for RamDiscardManager, to prepare for virtio-mem support. Instead of mapping the whole memory section, we only map "populated" parts and update the mapping when notified about discarding/population of memory via the RamDiscardListener. Similarly, when syncing the dirty bitmaps, sync only the actually mapped (populated) parts by replaying via the notifier. Using virtio-mem with vfio is still blocked via ram_block_discard_disable()/ram_block_discard_require() after this patch. Reviewed-by: Alex Williamson Acked-by: Alex Williamson Acked-by: Michael S. Tsirkin Cc: Paolo Bonzini Cc: "Michael S. Tsirkin" Cc: Alex Williamson Cc: Dr. David Alan Gilbert Cc: Igor Mammedov Cc: Pankaj Gupta Cc: Peter Xu Cc: Auger Eric Cc: Wei Yang Cc: teawater Cc: Marek Kedzierski Signed-off-by: David Hildenbrand Message-Id: <20210413095531.25603-7-david@redhat.com> Signed-off-by: Eduardo Habkost --- include/hw/vfio/vfio-common.h | 11 +++ hw/vfio/common.c | 164 ++++++++++++++++++++++++++++++++++ 2 files changed, 175 insertions(+) diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h index 6141162d7ae..681432213d1 100644 --- a/include/hw/vfio/vfio-common.h +++ b/include/hw/vfio/vfio-common.h @@ -91,6 +91,7 @@ typedef struct VFIOContainer { QLIST_HEAD(, VFIOGuestIOMMU) giommu_list; QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list; QLIST_HEAD(, VFIOGroup) group_list; + QLIST_HEAD(, VFIORamDiscardListener) vrdl_list; QLIST_ENTRY(VFIOContainer) next; } VFIOContainer; @@ -102,6 +103,16 @@ typedef struct VFIOGuestIOMMU { QLIST_ENTRY(VFIOGuestIOMMU) giommu_next; } VFIOGuestIOMMU; +typedef struct VFIORamDiscardListener { + VFIOContainer *container; + MemoryRegion *mr; + hwaddr offset_within_address_space; + hwaddr size; + uint64_t granularity; + RamDiscardListener listener; + QLIST_ENTRY(VFIORamDiscardListener) next; +} VFIORamDiscardListener; + typedef struct VFIOHostDMAWindow { hwaddr min_iova; hwaddr max_iova; diff --git a/hw/vfio/common.c b/hw/vfio/common.c index ae5654fcdb8..5af77552279 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -649,6 +649,110 @@ out: rcu_read_unlock(); } +static void vfio_ram_discard_notify_discard(RamDiscardListener *rdl, + MemoryRegionSection *section) +{ + VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener, + listener); + const hwaddr size = int128_get64(section->size); + const hwaddr iova = section->offset_within_address_space; + int ret; + + /* Unmap with a single call. */ + ret = vfio_dma_unmap(vrdl->container, iova, size , NULL); + if (ret) { + error_report("%s: vfio_dma_unmap() failed: %s", __func__, + strerror(-ret)); + } +} + +static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl, + MemoryRegionSection *section) +{ + VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener, + listener); + const hwaddr end = section->offset_within_region + + int128_get64(section->size); + hwaddr start, next, iova; + void *vaddr; + int ret; + + /* + * Map in (aligned within memory region) minimum granularity, so we can + * unmap in minimum granularity later. + */ + for (start = section->offset_within_region; start < end; start = next) { + next = ROUND_UP(start + 1, vrdl->granularity); + next = MIN(next, end); + + iova = start - section->offset_within_region + + section->offset_within_address_space; + vaddr = memory_region_get_ram_ptr(section->mr) + start; + + ret = vfio_dma_map(vrdl->container, iova, next - start, + vaddr, section->readonly); + if (ret) { + /* Rollback */ + vfio_ram_discard_notify_discard(rdl, section); + return ret; + } + } + return 0; +} + +static void vfio_register_ram_discard_listener(VFIOContainer *container, + MemoryRegionSection *section) +{ + RamDiscardManager *rdm = memory_region_get_ram_discard_manager(section->mr); + VFIORamDiscardListener *vrdl; + + /* Ignore some corner cases not relevant in practice. */ + g_assert(QEMU_IS_ALIGNED(section->offset_within_region, TARGET_PAGE_SIZE)); + g_assert(QEMU_IS_ALIGNED(section->offset_within_address_space, + TARGET_PAGE_SIZE)); + g_assert(QEMU_IS_ALIGNED(int128_get64(section->size), TARGET_PAGE_SIZE)); + + vrdl = g_new0(VFIORamDiscardListener, 1); + vrdl->container = container; + vrdl->mr = section->mr; + vrdl->offset_within_address_space = section->offset_within_address_space; + vrdl->size = int128_get64(section->size); + vrdl->granularity = ram_discard_manager_get_min_granularity(rdm, + section->mr); + + g_assert(vrdl->granularity && is_power_of_2(vrdl->granularity)); + g_assert(vrdl->granularity >= 1 << ctz64(container->pgsizes)); + + ram_discard_listener_init(&vrdl->listener, + vfio_ram_discard_notify_populate, + vfio_ram_discard_notify_discard, true); + ram_discard_manager_register_listener(rdm, &vrdl->listener, section); + QLIST_INSERT_HEAD(&container->vrdl_list, vrdl, next); +} + +static void vfio_unregister_ram_discard_listener(VFIOContainer *container, + MemoryRegionSection *section) +{ + RamDiscardManager *rdm = memory_region_get_ram_discard_manager(section->mr); + VFIORamDiscardListener *vrdl = NULL; + + QLIST_FOREACH(vrdl, &container->vrdl_list, next) { + if (vrdl->mr == section->mr && + vrdl->offset_within_address_space == + section->offset_within_address_space) { + break; + } + } + + if (!vrdl) { + hw_error("vfio: Trying to unregister missing RAM discard listener"); + } + + ram_discard_manager_unregister_listener(rdm, &vrdl->listener); + QLIST_REMOVE(vrdl, next); + g_free(vrdl); +} + static void vfio_listener_region_add(MemoryListener *listener, MemoryRegionSection *section) { @@ -810,6 +914,16 @@ static void vfio_listener_region_add(MemoryListener *listener, /* Here we assume that memory_region_is_ram(section->mr)==true */ + /* + * For RAM memory regions with a RamDiscardManager, we only want to map the + * actually populated parts - and update the mapping whenever we're notified + * about changes. + */ + if (memory_region_has_ram_discard_manager(section->mr)) { + vfio_register_ram_discard_listener(container, section); + return; + } + vaddr = memory_region_get_ram_ptr(section->mr) + section->offset_within_region + (iova - section->offset_within_address_space); @@ -947,6 +1061,10 @@ static void vfio_listener_region_del(MemoryListener *listener, pgmask = (1ULL << ctz64(hostwin->iova_pgsizes)) - 1; try_unmap = !((iova & pgmask) || (int128_get64(llsize) & pgmask)); + } else if (memory_region_has_ram_discard_manager(section->mr)) { + vfio_unregister_ram_discard_listener(container, section); + /* Unregistering will trigger an unmap. */ + try_unmap = false; } if (try_unmap) { @@ -1108,6 +1226,49 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb) rcu_read_unlock(); } +static int vfio_ram_discard_get_dirty_bitmap(MemoryRegionSection *section, + void *opaque) +{ + const hwaddr size = int128_get64(section->size); + const hwaddr iova = section->offset_within_address_space; + const ram_addr_t ram_addr = memory_region_get_ram_addr(section->mr) + + section->offset_within_region; + VFIORamDiscardListener *vrdl = opaque; + + /* + * Sync the whole mapped region (spanning multiple individual mappings) + * in one go. + */ + return vfio_get_dirty_bitmap(vrdl->container, iova, size, ram_addr); +} + +static int vfio_sync_ram_discard_listener_dirty_bitmap(VFIOContainer *container, + MemoryRegionSection *section) +{ + RamDiscardManager *rdm = memory_region_get_ram_discard_manager(section->mr); + VFIORamDiscardListener *vrdl = NULL; + + QLIST_FOREACH(vrdl, &container->vrdl_list, next) { + if (vrdl->mr == section->mr && + vrdl->offset_within_address_space == + section->offset_within_address_space) { + break; + } + } + + if (!vrdl) { + hw_error("vfio: Trying to sync missing RAM discard listener"); + } + + /* + * We only want/can synchronize the bitmap for actually mapped parts - + * which correspond to populated parts. Replay all populated parts. + */ + return ram_discard_manager_replay_populated(rdm, section, + vfio_ram_discard_get_dirty_bitmap, + &vrdl); +} + static int vfio_sync_dirty_bitmap(VFIOContainer *container, MemoryRegionSection *section) { @@ -1139,6 +1300,8 @@ static int vfio_sync_dirty_bitmap(VFIOContainer *container, } } return 0; + } else if (memory_region_has_ram_discard_manager(section->mr)) { + return vfio_sync_ram_discard_listener_dirty_bitmap(container, section); } ram_addr = memory_region_get_ram_addr(section->mr) + @@ -1770,6 +1933,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as, container->dirty_pages_supported = false; QLIST_INIT(&container->giommu_list); QLIST_INIT(&container->hostwin_list); + QLIST_INIT(&container->vrdl_list); ret = vfio_init_container(container, group->fd, errp); if (ret) { From patchwork Wed Jul 7 19:32:35 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eduardo Habkost X-Patchwork-Id: 12363973 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id F00A4C07E95 for ; Wed, 7 Jul 2021 19:43:03 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B6A1261C84 for ; Wed, 7 Jul 2021 19:43:03 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B6A1261C84 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:59870 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1m1DRl-0004p9-UV for qemu-devel@archiver.kernel.org; Wed, 07 Jul 2021 15:43:02 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:45158) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1m1DIx-0002YI-GK for qemu-devel@nongnu.org; Wed, 07 Jul 2021 15:33:55 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:46436) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1m1DIv-0007jY-MR for qemu-devel@nongnu.org; Wed, 07 Jul 2021 15:33:55 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1625686432; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=C24ZGd0Da8KxgT6BRhWaJFq+MFxJcTPCpX5y0hzpiXs=; b=IUruqlpNSLv3FBdfeY+/q/H6a9kVDasWhvGEkggP+70z7VmbjvdGKkJSPgkdCGjTCUaiX4 ja1FRkVp9PefJ3iQX1TcCk9BrvZjy4iQNy/+Pw3O+r4m9imbbpFb7pWyO6aqJfI9/M8NYA obrp1jK+CE8FkE02H/XgHUADhgtfX4I= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-560-lRgVg2BKO5uZAzYNWXbS4w-1; Wed, 07 Jul 2021 15:33:51 -0400 X-MC-Unique: lRgVg2BKO5uZAzYNWXbS4w-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 21A09801107; Wed, 7 Jul 2021 19:33:50 +0000 (UTC) Received: from localhost (ovpn-113-79.rdu2.redhat.com [10.10.113.79]) by smtp.corp.redhat.com (Postfix) with ESMTP id CB8DE5D6A8; Wed, 7 Jul 2021 19:33:49 +0000 (UTC) From: Eduardo Habkost To: Peter Maydell , qemu-devel@nongnu.org Subject: [PULL 09/15] vfio: Query and store the maximum number of possible DMA mappings Date: Wed, 7 Jul 2021 15:32:35 -0400 Message-Id: <20210707193241.2659335-10-ehabkost@redhat.com> In-Reply-To: <20210707193241.2659335-1-ehabkost@redhat.com> References: <20210707193241.2659335-1-ehabkost@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=ehabkost@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Received-SPF: pass client-ip=170.10.133.124; envelope-from=ehabkost@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -41 X-Spam_score: -4.2 X-Spam_bar: ---- X-Spam_report: (-4.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-1.439, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Pankaj Gupta , Eduardo Habkost , "Michael S . Tsirkin" , David Hildenbrand , "Dr . David Alan Gilbert" , Peter Xu , Auger Eric , Alex Williamson , teawater , Igor Mammedov , Paolo Bonzini , Marek Kedzierski , Wei Yang Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" From: David Hildenbrand Let's query the maximum number of possible DMA mappings by querying the available mappings when creating the container (before any mappings are created). We'll use this informaton soon to perform some sanity checks and warn the user. Reviewed-by: Alex Williamson Acked-by: Alex Williamson Acked-by: Michael S. Tsirkin Cc: Paolo Bonzini Cc: "Michael S. Tsirkin" Cc: Alex Williamson Cc: Dr. David Alan Gilbert Cc: Igor Mammedov Cc: Pankaj Gupta Cc: Peter Xu Cc: Auger Eric Cc: Wei Yang Cc: teawater Cc: Marek Kedzierski Signed-off-by: David Hildenbrand Message-Id: <20210413095531.25603-8-david@redhat.com> Signed-off-by: Eduardo Habkost --- include/hw/vfio/vfio-common.h | 1 + hw/vfio/common.c | 4 ++++ 2 files changed, 5 insertions(+) diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h index 681432213d1..8af11b0a769 100644 --- a/include/hw/vfio/vfio-common.h +++ b/include/hw/vfio/vfio-common.h @@ -88,6 +88,7 @@ typedef struct VFIOContainer { uint64_t dirty_pgsizes; uint64_t max_dirty_bitmap_size; unsigned long pgsizes; + unsigned int dma_max_mappings; QLIST_HEAD(, VFIOGuestIOMMU) giommu_list; QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list; QLIST_HEAD(, VFIOGroup) group_list; diff --git a/hw/vfio/common.c b/hw/vfio/common.c index 5af77552279..79628d60aed 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -1931,6 +1931,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as, container->fd = fd; container->error = NULL; container->dirty_pages_supported = false; + container->dma_max_mappings = 0; QLIST_INIT(&container->giommu_list); QLIST_INIT(&container->hostwin_list); QLIST_INIT(&container->vrdl_list); @@ -1962,7 +1963,10 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as, vfio_host_win_add(container, 0, (hwaddr)-1, info->iova_pgsizes); container->pgsizes = info->iova_pgsizes; + /* The default in the kernel ("dma_entry_limit") is 65535. */ + container->dma_max_mappings = 65535; if (!ret) { + vfio_get_info_dma_avail(info, &container->dma_max_mappings); vfio_get_iommu_info_migration(container, info); } g_free(info); From patchwork Wed Jul 7 19:32:36 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eduardo Habkost X-Patchwork-Id: 12363977 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AB643C07E95 for ; Wed, 7 Jul 2021 19:45:54 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 1B27761CB0 for ; Wed, 7 Jul 2021 19:45:54 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1B27761CB0 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:38312 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1m1DUX-0000oW-7l for qemu-devel@archiver.kernel.org; Wed, 07 Jul 2021 15:45:53 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:45170) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1m1DIy-0002e4-VW for qemu-devel@nongnu.org; Wed, 07 Jul 2021 15:33:57 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:32213) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1m1DIw-0007jm-8g for qemu-devel@nongnu.org; Wed, 07 Jul 2021 15:33:56 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1625686433; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=kJ5LwKOrrp3cXPafTOZWAfLKPhKVXnWGeznvU4GZumI=; b=ZlqMP5ZIcauyBiXWgP0RJXLa2M/wmu044cKADr8qz+eLI2P+XVvEk2PZLGoShApbgItWSx /nBjCdUMf9Mxu/oWkFbupJsOcVCeTpMTyBjAKIT+nf5/VuB2fVLW1Ar7xM5P8Nql3mpw/2 y7vnHA2UjaiKZV7j1VHxW5cBTTqrioU= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-148-UEJKfup5OaW8_oDDc0o_tw-1; Wed, 07 Jul 2021 15:33:52 -0400 X-MC-Unique: UEJKfup5OaW8_oDDc0o_tw-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 1AD5B10B7463; Wed, 7 Jul 2021 19:33:51 +0000 (UTC) Received: from localhost (ovpn-113-79.rdu2.redhat.com [10.10.113.79]) by smtp.corp.redhat.com (Postfix) with ESMTP id C7E9E5D6A8; Wed, 7 Jul 2021 19:33:50 +0000 (UTC) From: Eduardo Habkost To: Peter Maydell , qemu-devel@nongnu.org Subject: [PULL 10/15] vfio: Sanity check maximum number of DMA mappings with RamDiscardManager Date: Wed, 7 Jul 2021 15:32:36 -0400 Message-Id: <20210707193241.2659335-11-ehabkost@redhat.com> In-Reply-To: <20210707193241.2659335-1-ehabkost@redhat.com> References: <20210707193241.2659335-1-ehabkost@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=ehabkost@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Received-SPF: pass client-ip=170.10.133.124; envelope-from=ehabkost@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -41 X-Spam_score: -4.2 X-Spam_bar: ---- X-Spam_report: (-4.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-1.439, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Pankaj Gupta , Eduardo Habkost , "Michael S . Tsirkin" , David Hildenbrand , "Dr . David Alan Gilbert" , Peter Xu , Auger Eric , Alex Williamson , teawater , Igor Mammedov , Paolo Bonzini , Marek Kedzierski , Wei Yang Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" From: David Hildenbrand Although RamDiscardManager can handle running into the maximum number of DMA mappings by propagating errors when creating a DMA mapping, we want to sanity check and warn the user early that there is a theoretical setup issue and that virtio-mem might not be able to provide as much memory towards a VM as desired. As suggested by Alex, let's use the number of KVM memory slots to guess how many other mappings we might see over time. Acked-by: Alex Williamson Reviewed-by: Alex Williamson Acked-by: Michael S. Tsirkin Cc: Paolo Bonzini Cc: "Michael S. Tsirkin" Cc: Alex Williamson Cc: Dr. David Alan Gilbert Cc: Igor Mammedov Cc: Pankaj Gupta Cc: Peter Xu Cc: Auger Eric Cc: Wei Yang Cc: teawater Cc: Marek Kedzierski Signed-off-by: David Hildenbrand Message-Id: <20210413095531.25603-9-david@redhat.com> Signed-off-by: Eduardo Habkost --- hw/vfio/common.c | 43 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 43 insertions(+) diff --git a/hw/vfio/common.c b/hw/vfio/common.c index 79628d60aed..f8a2fe8441a 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -728,6 +728,49 @@ static void vfio_register_ram_discard_listener(VFIOContainer *container, vfio_ram_discard_notify_discard, true); ram_discard_manager_register_listener(rdm, &vrdl->listener, section); QLIST_INSERT_HEAD(&container->vrdl_list, vrdl, next); + + /* + * Sanity-check if we have a theoretically problematic setup where we could + * exceed the maximum number of possible DMA mappings over time. We assume + * that each mapped section in the same address space as a RamDiscardManager + * section consumes exactly one DMA mapping, with the exception of + * RamDiscardManager sections; i.e., we don't expect to have gIOMMU sections + * in the same address space as RamDiscardManager sections. + * + * We assume that each section in the address space consumes one memslot. + * We take the number of KVM memory slots as a best guess for the maximum + * number of sections in the address space we could have over time, + * also consuming DMA mappings. + */ + if (container->dma_max_mappings) { + unsigned int vrdl_count = 0, vrdl_mappings = 0, max_memslots = 512; + +#ifdef CONFIG_KVM + if (kvm_enabled()) { + max_memslots = kvm_get_max_memslots(); + } +#endif + + QLIST_FOREACH(vrdl, &container->vrdl_list, next) { + hwaddr start, end; + + start = QEMU_ALIGN_DOWN(vrdl->offset_within_address_space, + vrdl->granularity); + end = ROUND_UP(vrdl->offset_within_address_space + vrdl->size, + vrdl->granularity); + vrdl_mappings += (end - start) / vrdl->granularity; + vrdl_count++; + } + + if (vrdl_mappings + max_memslots - vrdl_count > + container->dma_max_mappings) { + warn_report("%s: possibly running out of DMA mappings. E.g., try" + " increasing the 'block-size' of virtio-mem devies." + " Maximum possible DMA mappings: %d, Maximum possible" + " memslots: %d", __func__, container->dma_max_mappings, + max_memslots); + } + } } static void vfio_unregister_ram_discard_listener(VFIOContainer *container, From patchwork Wed Jul 7 19:32:37 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eduardo Habkost X-Patchwork-Id: 12363963 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D4E95C07E95 for ; Wed, 7 Jul 2021 19:38:11 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 6FEF661C5B for ; Wed, 7 Jul 2021 19:38:11 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6FEF661C5B Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:46564 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1m1DN4-0003xR-Lv for qemu-devel@archiver.kernel.org; Wed, 07 Jul 2021 15:38:10 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:45342) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1m1DJJ-0003TT-UG for qemu-devel@nongnu.org; Wed, 07 Jul 2021 15:34:17 -0400 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:40417) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1m1DJE-0007mq-HU for qemu-devel@nongnu.org; Wed, 07 Jul 2021 15:34:17 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1625686451; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=2+8IfqQN0CCqbn+cyAHI5vCXe7BGvi3hH9f/m9ZiP5M=; b=bj+zVE+7dTCwFPJJGS2NX1hT/NP3FYxARO+stGOvDRKtzYfwx5di/F53PgnEyk/YQMCemu a2EXoaLukT56djSgLLs7A1rmYDmK6Wwdd+l4NO+BK3Ev89sBW9vhW+kz4f297jkppldfBK X+IyeSZ0u9J+mCxfkllTQgleRjSQ5w0= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-169-NESVKRfCMpyol0wkRhpNAw-1; Wed, 07 Jul 2021 15:34:10 -0400 X-MC-Unique: NESVKRfCMpyol0wkRhpNAw-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id E467510C1ADC; Wed, 7 Jul 2021 19:34:08 +0000 (UTC) Received: from localhost (ovpn-113-79.rdu2.redhat.com [10.10.113.79]) by smtp.corp.redhat.com (Postfix) with ESMTP id C6F7D369A; Wed, 7 Jul 2021 19:33:51 +0000 (UTC) From: Eduardo Habkost To: Peter Maydell , qemu-devel@nongnu.org Subject: [PULL 11/15] vfio: Support for RamDiscardManager in the vIOMMU case Date: Wed, 7 Jul 2021 15:32:37 -0400 Message-Id: <20210707193241.2659335-12-ehabkost@redhat.com> In-Reply-To: <20210707193241.2659335-1-ehabkost@redhat.com> References: <20210707193241.2659335-1-ehabkost@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=ehabkost@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Received-SPF: pass client-ip=216.205.24.124; envelope-from=ehabkost@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -41 X-Spam_score: -4.2 X-Spam_bar: ---- X-Spam_report: (-4.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-1.439, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Pankaj Gupta , Eduardo Habkost , "Michael S . Tsirkin" , David Hildenbrand , "Dr . David Alan Gilbert" , Peter Xu , Auger Eric , Alex Williamson , teawater , Igor Mammedov , Paolo Bonzini , Marek Kedzierski , Wei Yang Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" From: David Hildenbrand vIOMMU support works already with RamDiscardManager as long as guests only map populated memory. Both, populated and discarded memory is mapped into &address_space_memory, where vfio_get_xlat_addr() will find that memory, to create the vfio mapping. Sane guests will never map discarded memory (e.g., unplugged memory blocks in virtio-mem) into an IOMMU - or keep it mapped into an IOMMU while memory is getting discarded. However, there are two cases where a malicious guests could trigger pinning of more memory than intended. One case is easy to handle: the guest trying to map discarded memory into an IOMMU. The other case is harder to handle: the guest keeping memory mapped in the IOMMU while it is getting discarded. We would have to walk over all mappings when discarding memory and identify if any mapping would be a violation. Let's keep it simple for now and print a warning, indicating that setting RLIMIT_MEMLOCK can mitigate such attacks. We have to take care of incoming migration: at the point the IOMMUs get restored and start creating mappings in vfio, RamDiscardManager implementations might not be back up and running yet: let's add runstate priorities to enforce the order when restoring. Acked-by: Alex Williamson Reviewed-by: Alex Williamson Acked-by: Michael S. Tsirkin Cc: Paolo Bonzini Cc: "Michael S. Tsirkin" Cc: Alex Williamson Cc: Dr. David Alan Gilbert Cc: Igor Mammedov Cc: Pankaj Gupta Cc: Peter Xu Cc: Auger Eric Cc: Wei Yang Cc: teawater Cc: Marek Kedzierski Signed-off-by: David Hildenbrand Message-Id: <20210413095531.25603-10-david@redhat.com> Signed-off-by: Eduardo Habkost --- include/migration/vmstate.h | 1 + hw/vfio/common.c | 39 +++++++++++++++++++++++++++++++++++++ hw/virtio/virtio-mem.c | 1 + 3 files changed, 41 insertions(+) diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h index 8df7b69f389..017c03675ca 100644 --- a/include/migration/vmstate.h +++ b/include/migration/vmstate.h @@ -153,6 +153,7 @@ typedef enum { MIG_PRI_DEFAULT = 0, MIG_PRI_IOMMU, /* Must happen before PCI devices */ MIG_PRI_PCI_BUS, /* Must happen before IOMMU */ + MIG_PRI_VIRTIO_MEM, /* Must happen before IOMMU */ MIG_PRI_GICV3_ITS, /* Must happen before PCI devices */ MIG_PRI_GICV3, /* Must happen before the ITS */ MIG_PRI_MAX, diff --git a/hw/vfio/common.c b/hw/vfio/common.c index f8a2fe8441a..8a9bbf27918 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -36,6 +36,7 @@ #include "qemu/range.h" #include "sysemu/kvm.h" #include "sysemu/reset.h" +#include "sysemu/runstate.h" #include "trace.h" #include "qapi/error.h" #include "migration/migration.h" @@ -569,6 +570,44 @@ static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr, error_report("iommu map to non memory area %"HWADDR_PRIx"", xlat); return false; + } else if (memory_region_has_ram_discard_manager(mr)) { + RamDiscardManager *rdm = memory_region_get_ram_discard_manager(mr); + MemoryRegionSection tmp = { + .mr = mr, + .offset_within_region = xlat, + .size = int128_make64(len), + }; + + /* + * Malicious VMs can map memory into the IOMMU, which is expected + * to remain discarded. vfio will pin all pages, populating memory. + * Disallow that. vmstate priorities make sure any RamDiscardManager + * were already restored before IOMMUs are restored. + */ + if (!ram_discard_manager_is_populated(rdm, &tmp)) { + error_report("iommu map to discarded memory (e.g., unplugged via" + " virtio-mem): %"HWADDR_PRIx"", + iotlb->translated_addr); + return false; + } + + /* + * Malicious VMs might trigger discarding of IOMMU-mapped memory. The + * pages will remain pinned inside vfio until unmapped, resulting in a + * higher memory consumption than expected. If memory would get + * populated again later, there would be an inconsistency between pages + * pinned by vfio and pages seen by QEMU. This is the case until + * unmapped from the IOMMU (e.g., during device reset). + * + * With malicious guests, we really only care about pinning more memory + * than expected. RLIMIT_MEMLOCK set for the user/process can never be + * exceeded and can be used to mitigate this problem. + */ + warn_report_once("Using vfio with vIOMMUs and coordinated discarding of" + " RAM (e.g., virtio-mem) works, however, malicious" + " guests can trigger pinning of more memory than" + " intended via an IOMMU. It's possible to mitigate " + " by setting/adjusting RLIMIT_MEMLOCK."); } /* diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c index f60cb8a3fc0..368ae1db903 100644 --- a/hw/virtio/virtio-mem.c +++ b/hw/virtio/virtio-mem.c @@ -886,6 +886,7 @@ static const VMStateDescription vmstate_virtio_mem_device = { .name = "virtio-mem-device", .minimum_version_id = 1, .version_id = 1, + .priority = MIG_PRI_VIRTIO_MEM, .post_load = virtio_mem_post_load, .fields = (VMStateField[]) { VMSTATE_WITH_TMP(VirtIOMEM, VirtIOMEMMigSanityChecks, From patchwork Wed Jul 7 19:32:38 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eduardo Habkost X-Patchwork-Id: 12363971 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 68670C07E95 for ; Wed, 7 Jul 2021 19:41:36 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id BB50561CB0 for ; Wed, 7 Jul 2021 19:41:35 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BB50561CB0 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:55266 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1m1DQM-0001j3-MI for qemu-devel@archiver.kernel.org; Wed, 07 Jul 2021 15:41:34 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:45438) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1m1DJZ-0004BO-H3 for qemu-devel@nongnu.org; Wed, 07 Jul 2021 15:34:33 -0400 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:41024) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1m1DJW-0007nv-HU for qemu-devel@nongnu.org; Wed, 07 Jul 2021 15:34:33 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1625686470; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=muJDJ7+wfZs2/M1m0FdspOh9UZ+U4wDZmyCZZIiyx5A=; b=RFXgj1NTysGWtxp+oDG2fb1kpPAemvmHf6mAHAoriOqlWT1/9Cmfb6rmnKyYjlaFIHSdA2 BjGuS+zMHpWgsSxIrZwNbWZrYa6u4yis3X5rGBMoFidnzQomViCzP12uZ9jcZ1l95kq5iG bjU2c7UWM9LGuYLEuN2b5McIdIYhdCI= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-505-kQaNX9OPPUKCenRNG8JFYQ-1; Wed, 07 Jul 2021 15:34:28 -0400 X-MC-Unique: kQaNX9OPPUKCenRNG8JFYQ-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id E0C821023F45; Wed, 7 Jul 2021 19:34:26 +0000 (UTC) Received: from localhost (ovpn-113-79.rdu2.redhat.com [10.10.113.79]) by smtp.corp.redhat.com (Postfix) with ESMTP id B90BD100EBB0; Wed, 7 Jul 2021 19:34:09 +0000 (UTC) From: Eduardo Habkost To: Peter Maydell , qemu-devel@nongnu.org Subject: [PULL 12/15] softmmu/physmem: Don't use atomic operations in ram_block_discard_(disable|require) Date: Wed, 7 Jul 2021 15:32:38 -0400 Message-Id: <20210707193241.2659335-13-ehabkost@redhat.com> In-Reply-To: <20210707193241.2659335-1-ehabkost@redhat.com> References: <20210707193241.2659335-1-ehabkost@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=ehabkost@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Received-SPF: pass client-ip=216.205.24.124; envelope-from=ehabkost@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -41 X-Spam_score: -4.2 X-Spam_bar: ---- X-Spam_report: (-4.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-1.439, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Pankaj Gupta , Eduardo Habkost , David Hildenbrand , "Michael S . Tsirkin" , Alex Williamson , Peter Xu , "Dr . David Alan Gilbert" , Auger Eric , Pankaj Gupta , teawater , Igor Mammedov , Paolo Bonzini , Marek Kedzierski , Wei Yang Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" From: David Hildenbrand We have users in migration context that don't hold the BQL (when finishing migration). To prepare for further changes, use a dedicated mutex instead of atomic operations. Keep using qatomic_read ("READ_ONCE") for the functions that only extract the current state (e.g., used by virtio-balloon), locking isn't necessary. While at it, split up the counter into two variables to make it easier to understand. Suggested-by: Peter Xu Reviewed-by: Peter Xu Reviewed-by: Pankaj Gupta Acked-by: Michael S. Tsirkin Cc: Paolo Bonzini Cc: "Michael S. Tsirkin" Cc: Alex Williamson Cc: Dr. David Alan Gilbert Cc: Igor Mammedov Cc: Pankaj Gupta Cc: Peter Xu Cc: Auger Eric Cc: Wei Yang Cc: teawater Cc: Marek Kedzierski Signed-off-by: David Hildenbrand Message-Id: <20210413095531.25603-11-david@redhat.com> Signed-off-by: Eduardo Habkost --- softmmu/physmem.c | 70 ++++++++++++++++++++++++++--------------------- 1 file changed, 39 insertions(+), 31 deletions(-) diff --git a/softmmu/physmem.c b/softmmu/physmem.c index 9b171c9dbe5..f1275b61f84 100644 --- a/softmmu/physmem.c +++ b/softmmu/physmem.c @@ -3684,56 +3684,64 @@ void mtree_print_dispatch(AddressSpaceDispatch *d, MemoryRegion *root) } } -/* - * If positive, discarding RAM is disabled. If negative, discarding RAM is - * required to work and cannot be disabled. - */ -static int ram_block_discard_disabled; +static unsigned int ram_block_discard_required_cnt; +static unsigned int ram_block_discard_disabled_cnt; +static QemuMutex ram_block_discard_disable_mutex; + +static void ram_block_discard_disable_mutex_lock(void) +{ + static gsize initialized; + + if (g_once_init_enter(&initialized)) { + qemu_mutex_init(&ram_block_discard_disable_mutex); + g_once_init_leave(&initialized, 1); + } + qemu_mutex_lock(&ram_block_discard_disable_mutex); +} + +static void ram_block_discard_disable_mutex_unlock(void) +{ + qemu_mutex_unlock(&ram_block_discard_disable_mutex); +} int ram_block_discard_disable(bool state) { - int old; + int ret = 0; + ram_block_discard_disable_mutex_lock(); if (!state) { - qatomic_dec(&ram_block_discard_disabled); - return 0; + ram_block_discard_disabled_cnt--; + } else if (!ram_block_discard_required_cnt) { + ram_block_discard_disabled_cnt++; + } else { + ret = -EBUSY; } - - do { - old = qatomic_read(&ram_block_discard_disabled); - if (old < 0) { - return -EBUSY; - } - } while (qatomic_cmpxchg(&ram_block_discard_disabled, - old, old + 1) != old); - return 0; + ram_block_discard_disable_mutex_unlock(); + return ret; } int ram_block_discard_require(bool state) { - int old; + int ret = 0; + ram_block_discard_disable_mutex_lock(); if (!state) { - qatomic_inc(&ram_block_discard_disabled); - return 0; + ram_block_discard_required_cnt--; + } else if (!ram_block_discard_disabled_cnt) { + ram_block_discard_required_cnt++; + } else { + ret = -EBUSY; } - - do { - old = qatomic_read(&ram_block_discard_disabled); - if (old > 0) { - return -EBUSY; - } - } while (qatomic_cmpxchg(&ram_block_discard_disabled, - old, old - 1) != old); - return 0; + ram_block_discard_disable_mutex_unlock(); + return ret; } bool ram_block_discard_is_disabled(void) { - return qatomic_read(&ram_block_discard_disabled) > 0; + return qatomic_read(&ram_block_discard_disabled_cnt); } bool ram_block_discard_is_required(void) { - return qatomic_read(&ram_block_discard_disabled) < 0; + return qatomic_read(&ram_block_discard_required_cnt); } From patchwork Wed Jul 7 19:32:39 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eduardo Habkost X-Patchwork-Id: 12363961 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2F45DC11F66 for ; Wed, 7 Jul 2021 19:37:10 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id AF0BA61C77 for ; Wed, 7 Jul 2021 19:37:09 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org AF0BA61C77 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:43370 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1m1DM4-0001iD-RA for qemu-devel@archiver.kernel.org; Wed, 07 Jul 2021 15:37:08 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:45510) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1m1DJp-0004z4-VC for qemu-devel@nongnu.org; Wed, 07 Jul 2021 15:34:49 -0400 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:47169) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1m1DJn-0007pT-V7 for qemu-devel@nongnu.org; Wed, 07 Jul 2021 15:34:49 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1625686487; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=TUlkG2qHARb3AwaZ4WyQyaPRh3qzS1d7m4+PGhzg/Ww=; b=CceHdhrRMcnBjlym1dEuDygnsA+e9ASfSSpf3637JgjUuXHZ2VongeiMllSfwkvG9xnhwT poaEp2yfrn2nxL5hGQ0SFzahoVIyq/joEOiLf8xgsgEYXWR4JQcsxsib0YANuPNYA2Aavh JkJi9XwEUF6+PWgxx890EdWaPEEf9pc= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-165-BQbVVz3WO1-Iti3l1MJMIQ-1; Wed, 07 Jul 2021 15:34:46 -0400 X-MC-Unique: BQbVVz3WO1-Iti3l1MJMIQ-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 62EE610B7464; Wed, 7 Jul 2021 19:34:44 +0000 (UTC) Received: from localhost (ovpn-113-79.rdu2.redhat.com [10.10.113.79]) by smtp.corp.redhat.com (Postfix) with ESMTP id A38735C1C2; Wed, 7 Jul 2021 19:34:27 +0000 (UTC) From: Eduardo Habkost To: Peter Maydell , qemu-devel@nongnu.org Subject: [PULL 13/15] softmmu/physmem: Extend ram_block_discard_(require|disable) by two discard types Date: Wed, 7 Jul 2021 15:32:39 -0400 Message-Id: <20210707193241.2659335-14-ehabkost@redhat.com> In-Reply-To: <20210707193241.2659335-1-ehabkost@redhat.com> References: <20210707193241.2659335-1-ehabkost@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=ehabkost@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Received-SPF: pass client-ip=216.205.24.124; envelope-from=ehabkost@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -41 X-Spam_score: -4.2 X-Spam_bar: ---- X-Spam_report: (-4.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-1.439, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Pankaj Gupta , Eduardo Habkost , "Michael S . Tsirkin" , David Hildenbrand , Alex Williamson , Peter Xu , "Dr . David Alan Gilbert" , Auger Eric , Pankaj Gupta , teawater , Igor Mammedov , Paolo Bonzini , Marek Kedzierski , Wei Yang Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" From: David Hildenbrand We want to separate the two cases whereby we discard ram - uncoordinated: e.g., virito-balloon - coordinated: e.g., virtio-mem coordinated via the RamDiscardManager Reviewed-by: Pankaj Gupta Acked-by: Michael S. Tsirkin Cc: Paolo Bonzini Cc: "Michael S. Tsirkin" Cc: Alex Williamson Cc: Dr. David Alan Gilbert Cc: Igor Mammedov Cc: Pankaj Gupta Cc: Peter Xu Cc: Auger Eric Cc: Wei Yang Cc: teawater Cc: Marek Kedzierski Signed-off-by: David Hildenbrand Message-Id: <20210413095531.25603-12-david@redhat.com> Signed-off-by: Eduardo Habkost --- include/exec/memory.h | 18 +++++++++++++-- softmmu/physmem.c | 54 ++++++++++++++++++++++++++++++++++++++----- 2 files changed, 64 insertions(+), 8 deletions(-) diff --git a/include/exec/memory.h b/include/exec/memory.h index a23487fdb30..87357a724aa 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -2893,6 +2893,12 @@ static inline MemOp devend_memop(enum device_endian end) */ int ram_block_discard_disable(bool state); +/* + * See ram_block_discard_disable(): only disable uncoordinated discards, + * keeping coordinated discards (via the RamDiscardManager) enabled. + */ +int ram_block_uncoordinated_discard_disable(bool state); + /* * Inhibit technologies that disable discarding of pages in RAM blocks. * @@ -2902,12 +2908,20 @@ int ram_block_discard_disable(bool state); int ram_block_discard_require(bool state); /* - * Test if discarding of memory in ram blocks is disabled. + * See ram_block_discard_require(): only inhibit technologies that disable + * uncoordinated discarding of pages in RAM blocks, allowing co-existance with + * technologies that only inhibit uncoordinated discards (via the + * RamDiscardManager). + */ +int ram_block_coordinated_discard_require(bool state); + +/* + * Test if any discarding of memory in ram blocks is disabled. */ bool ram_block_discard_is_disabled(void); /* - * Test if discarding of memory in ram blocks is required to work reliably. + * Test if any discarding of memory in ram blocks is required to work reliably. */ bool ram_block_discard_is_required(void); diff --git a/softmmu/physmem.c b/softmmu/physmem.c index f1275b61f84..3c1912a1a07 100644 --- a/softmmu/physmem.c +++ b/softmmu/physmem.c @@ -3684,8 +3684,14 @@ void mtree_print_dispatch(AddressSpaceDispatch *d, MemoryRegion *root) } } +/* Require any discards to work. */ static unsigned int ram_block_discard_required_cnt; +/* Require only coordinated discards to work. */ +static unsigned int ram_block_coordinated_discard_required_cnt; +/* Disable any discards. */ static unsigned int ram_block_discard_disabled_cnt; +/* Disable only uncoordinated discards. */ +static unsigned int ram_block_uncoordinated_discard_disabled_cnt; static QemuMutex ram_block_discard_disable_mutex; static void ram_block_discard_disable_mutex_lock(void) @@ -3711,10 +3717,27 @@ int ram_block_discard_disable(bool state) ram_block_discard_disable_mutex_lock(); if (!state) { ram_block_discard_disabled_cnt--; - } else if (!ram_block_discard_required_cnt) { - ram_block_discard_disabled_cnt++; + } else if (ram_block_discard_required_cnt || + ram_block_coordinated_discard_required_cnt) { + ret = -EBUSY; } else { + ram_block_discard_disabled_cnt++; + } + ram_block_discard_disable_mutex_unlock(); + return ret; +} + +int ram_block_uncoordinated_discard_disable(bool state) +{ + int ret = 0; + + ram_block_discard_disable_mutex_lock(); + if (!state) { + ram_block_uncoordinated_discard_disabled_cnt--; + } else if (ram_block_discard_required_cnt) { ret = -EBUSY; + } else { + ram_block_uncoordinated_discard_disabled_cnt++; } ram_block_discard_disable_mutex_unlock(); return ret; @@ -3727,10 +3750,27 @@ int ram_block_discard_require(bool state) ram_block_discard_disable_mutex_lock(); if (!state) { ram_block_discard_required_cnt--; - } else if (!ram_block_discard_disabled_cnt) { - ram_block_discard_required_cnt++; + } else if (ram_block_discard_disabled_cnt || + ram_block_uncoordinated_discard_disabled_cnt) { + ret = -EBUSY; } else { + ram_block_discard_required_cnt++; + } + ram_block_discard_disable_mutex_unlock(); + return ret; +} + +int ram_block_coordinated_discard_require(bool state) +{ + int ret = 0; + + ram_block_discard_disable_mutex_lock(); + if (!state) { + ram_block_coordinated_discard_required_cnt--; + } else if (ram_block_discard_disabled_cnt) { ret = -EBUSY; + } else { + ram_block_coordinated_discard_required_cnt++; } ram_block_discard_disable_mutex_unlock(); return ret; @@ -3738,10 +3778,12 @@ int ram_block_discard_require(bool state) bool ram_block_discard_is_disabled(void) { - return qatomic_read(&ram_block_discard_disabled_cnt); + return qatomic_read(&ram_block_discard_disabled_cnt) || + qatomic_read(&ram_block_uncoordinated_discard_disabled_cnt); } bool ram_block_discard_is_required(void) { - return qatomic_read(&ram_block_discard_required_cnt); + return qatomic_read(&ram_block_discard_required_cnt) || + qatomic_read(&ram_block_coordinated_discard_required_cnt); } From patchwork Wed Jul 7 19:32:40 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eduardo Habkost X-Patchwork-Id: 12363979 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 52044C07E95 for ; Wed, 7 Jul 2021 19:47:32 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id BD8D161CB0 for ; Wed, 7 Jul 2021 19:47:31 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BD8D161CB0 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:40538 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1m1DW6-0002Oi-R6 for qemu-devel@archiver.kernel.org; Wed, 07 Jul 2021 15:47:30 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:45568) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1m1DK7-0005hQ-ML for qemu-devel@nongnu.org; Wed, 07 Jul 2021 15:35:08 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:42590) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1m1DK5-0007qL-N4 for qemu-devel@nongnu.org; Wed, 07 Jul 2021 15:35:07 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1625686505; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=jmMy1Y1I/TVjQg7dab2WOf/FV4GNTyiNEjt9xnQkh60=; b=YJuCAkZPb52PBKVPWexRsOuEYc4S1BQSScwVleKAn4hjvhXUcDPJ87L3S00K8caer2vd7q /3ihhjJFS/DCQ+5M4AugvTOnZD6Ex2+qmWDN389z02hU7F/Vrb8/dWht0G8dRrtrJQ4bHU kA+TiWOI60Me36sJS390TDVJBgTc9hU= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-380-o3J-z7UyN2CW59D-1kHOnw-1; Wed, 07 Jul 2021 15:35:03 -0400 X-MC-Unique: o3J-z7UyN2CW59D-1kHOnw-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 782C51932484; Wed, 7 Jul 2021 19:35:02 +0000 (UTC) Received: from localhost (ovpn-113-79.rdu2.redhat.com [10.10.113.79]) by smtp.corp.redhat.com (Postfix) with ESMTP id 22CF65D741; Wed, 7 Jul 2021 19:34:45 +0000 (UTC) From: Eduardo Habkost To: Peter Maydell , qemu-devel@nongnu.org Subject: [PULL 14/15] virtio-mem: Require only coordinated discards Date: Wed, 7 Jul 2021 15:32:40 -0400 Message-Id: <20210707193241.2659335-15-ehabkost@redhat.com> In-Reply-To: <20210707193241.2659335-1-ehabkost@redhat.com> References: <20210707193241.2659335-1-ehabkost@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=ehabkost@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Received-SPF: pass client-ip=170.10.133.124; envelope-from=ehabkost@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -41 X-Spam_score: -4.2 X-Spam_bar: ---- X-Spam_report: (-4.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-1.439, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Pankaj Gupta , Alex Williamson , Eduardo Habkost , David Hildenbrand , "Michael S . Tsirkin" , "Dr . David Alan Gilbert" , Peter Xu , Auger Eric , Pankaj Gupta , teawater , Igor Mammedov , Paolo Bonzini , Marek Kedzierski , Wei Yang Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" From: David Hildenbrand We implement the RamDiscardManager interface and only require coordinated discarding of RAM to work. Reviewed-by: Dr. David Alan Gilbert Reviewed-by: Pankaj Gupta Acked-by: Michael S. Tsirkin Reviewed-by: Michael S. Tsirkin Cc: Paolo Bonzini Cc: "Michael S. Tsirkin" Cc: Alex Williamson Cc: Dr. David Alan Gilbert Cc: Igor Mammedov Cc: Pankaj Gupta Cc: Peter Xu Cc: Auger Eric Cc: Wei Yang Cc: teawater Cc: Marek Kedzierski Signed-off-by: David Hildenbrand Message-Id: <20210413095531.25603-13-david@redhat.com> Signed-off-by: Eduardo Habkost --- hw/virtio/virtio-mem.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c index 368ae1db903..df91e454b2f 100644 --- a/hw/virtio/virtio-mem.c +++ b/hw/virtio/virtio-mem.c @@ -719,7 +719,7 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp) return; } - if (ram_block_discard_require(true)) { + if (ram_block_coordinated_discard_require(true)) { error_setg(errp, "Discarding RAM is disabled"); return; } @@ -727,7 +727,7 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp) ret = ram_block_discard_range(rb, 0, qemu_ram_get_used_length(rb)); if (ret) { error_setg_errno(errp, -ret, "Unexpected error discarding RAM"); - ram_block_discard_require(false); + ram_block_coordinated_discard_require(false); return; } @@ -771,7 +771,7 @@ static void virtio_mem_device_unrealize(DeviceState *dev) virtio_del_queue(vdev, 0); virtio_cleanup(vdev); g_free(vmem->bitmap); - ram_block_discard_require(false); + ram_block_coordinated_discard_require(false); } static int virtio_mem_discard_range_cb(const VirtIOMEM *vmem, void *arg, From patchwork Wed Jul 7 19:32:41 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eduardo Habkost X-Patchwork-Id: 12364001 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5C196C07E95 for ; Wed, 7 Jul 2021 19:48:55 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id F2A1061CBE for ; Wed, 7 Jul 2021 19:48:54 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org F2A1061CBE Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:44128 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1m1DXS-0004lZ-5e for qemu-devel@archiver.kernel.org; Wed, 07 Jul 2021 15:48:54 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:45588) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1m1DKA-0005i3-GB for qemu-devel@nongnu.org; Wed, 07 Jul 2021 15:35:12 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:48077) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1m1DK7-0007qZ-FF for qemu-devel@nongnu.org; Wed, 07 Jul 2021 15:35:10 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1625686506; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=v+oduKwky4JwZRsX/JwVarVY96A/4FN8qrmBVH0MMfQ=; b=K79T7uOQ1KeXi6zZaXiq94heYLxrmKNZ5TBUCf3PnowKs/+gWLMMGKl5e1OFFsgxF7if89 Np7PrEmcIp/NI8Ha4uy1Iu51yNYwjMZ4fjVGo05vbtGoJbShozfA3UbwaWUfGn3GVLDDxj FSedV62sWa0BCQ3UC21P7cMhZO7edB8= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-428-ylbAmG22OfCJ_XJtdxnpjw-1; Wed, 07 Jul 2021 15:35:04 -0400 X-MC-Unique: ylbAmG22OfCJ_XJtdxnpjw-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 85E6C1023F45; Wed, 7 Jul 2021 19:35:03 +0000 (UTC) Received: from localhost (ovpn-113-79.rdu2.redhat.com [10.10.113.79]) by smtp.corp.redhat.com (Postfix) with ESMTP id 3AFFF5D6A8; Wed, 7 Jul 2021 19:35:03 +0000 (UTC) From: Eduardo Habkost To: Peter Maydell , qemu-devel@nongnu.org Subject: [PULL 15/15] vfio: Disable only uncoordinated discards for VFIO_TYPE1 iommus Date: Wed, 7 Jul 2021 15:32:41 -0400 Message-Id: <20210707193241.2659335-16-ehabkost@redhat.com> In-Reply-To: <20210707193241.2659335-1-ehabkost@redhat.com> References: <20210707193241.2659335-1-ehabkost@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=ehabkost@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Received-SPF: pass client-ip=170.10.133.124; envelope-from=ehabkost@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -41 X-Spam_score: -4.2 X-Spam_bar: ---- X-Spam_report: (-4.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-1.439, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Pankaj Gupta , Eduardo Habkost , "Michael S . Tsirkin" , David Hildenbrand , "Dr . David Alan Gilbert" , Peter Xu , Auger Eric , Alex Williamson , teawater , Igor Mammedov , Paolo Bonzini , Marek Kedzierski , Wei Yang Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" From: David Hildenbrand We support coordinated discarding of RAM using the RamDiscardManager for the VFIO_TYPE1 iommus. Let's unlock support for coordinated discards, keeping uncoordinated discards (e.g., via virtio-balloon) disabled if possible. This unlocks virtio-mem + vfio on x86-64. Note that vfio used via "nvme://" by the block layer has to be implemented/unlocked separately. For now, virtio-mem only supports x86-64; we don't restrict RamDiscardManager to x86-64, though: arm64 and s390x are supposed to work as well, and we'll test once unlocking virtio-mem support. The spapr IOMMUs will need special care, to be tackled later, e.g.., once supporting virtio-mem. Note: The block size of a virtio-mem device has to be set to sane sizes, depending on the maximum hotplug size - to not run out of vfio mappings. The default virtio-mem block size is usually in the range of a couple of MBs. The maximum number of mapping is 64k, shared with other users. Assume you want to hotplug 256GB using virtio-mem - the block size would have to be set to at least 8 MiB (resulting in 32768 separate mappings). Acked-by: Alex Williamson Reviewed-by: Alex Williamson Acked-by: Michael S. Tsirkin Cc: Paolo Bonzini Cc: "Michael S. Tsirkin" Cc: Alex Williamson Cc: Dr. David Alan Gilbert Cc: Igor Mammedov Cc: Pankaj Gupta Cc: Peter Xu Cc: Auger Eric Cc: Wei Yang Cc: teawater Cc: Marek Kedzierski Signed-off-by: David Hildenbrand Message-Id: <20210413095531.25603-14-david@redhat.com> Signed-off-by: Eduardo Habkost --- hw/vfio/common.c | 65 +++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 53 insertions(+), 12 deletions(-) diff --git a/hw/vfio/common.c b/hw/vfio/common.c index 8a9bbf27918..3f0d1113608 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -135,6 +135,29 @@ static const char *index_to_str(VFIODevice *vbasedev, int index) } } +static int vfio_ram_block_discard_disable(VFIOContainer *container, bool state) +{ + switch (container->iommu_type) { + case VFIO_TYPE1v2_IOMMU: + case VFIO_TYPE1_IOMMU: + /* + * We support coordinated discarding of RAM via the RamDiscardManager. + */ + return ram_block_uncoordinated_discard_disable(state); + default: + /* + * VFIO_SPAPR_TCE_IOMMU most probably works just fine with + * RamDiscardManager, however, it is completely untested. + * + * VFIO_SPAPR_TCE_v2_IOMMU with "DMA memory preregistering" does + * completely the opposite of managing mapping/pinning dynamically as + * required by RamDiscardManager. We would have to special-case sections + * with a RamDiscardManager. + */ + return ram_block_discard_disable(state); + } +} + int vfio_set_irq_signaling(VFIODevice *vbasedev, int index, int subindex, int action, int fd, Error **errp) { @@ -1977,15 +2000,25 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as, * new memory, it will not yet set ram_block_discard_set_required() and * therefore, neither stops us here or deals with the sudden memory * consumption of inflated memory. + * + * We do support discarding of memory coordinated via the RamDiscardManager + * with some IOMMU types. vfio_ram_block_discard_disable() handles the + * details once we know which type of IOMMU we are using. */ - ret = ram_block_discard_disable(true); - if (ret) { - error_setg_errno(errp, -ret, "Cannot set discarding of RAM broken"); - return ret; - } QLIST_FOREACH(container, &space->containers, next) { if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) { + ret = vfio_ram_block_discard_disable(container, true); + if (ret) { + error_setg_errno(errp, -ret, + "Cannot set discarding of RAM broken"); + if (ioctl(group->fd, VFIO_GROUP_UNSET_CONTAINER, + &container->fd)) { + error_report("vfio: error disconnecting group %d from" + " container", group->groupid); + } + return ret; + } group->container = container; QLIST_INSERT_HEAD(&container->group_list, group, container_next); vfio_kvm_device_add_group(group); @@ -2023,6 +2056,12 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as, goto free_container_exit; } + ret = vfio_ram_block_discard_disable(container, true); + if (ret) { + error_setg_errno(errp, -ret, "Cannot set discarding of RAM broken"); + goto free_container_exit; + } + switch (container->iommu_type) { case VFIO_TYPE1v2_IOMMU: case VFIO_TYPE1_IOMMU: @@ -2070,7 +2109,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as, if (ret) { error_setg_errno(errp, errno, "failed to enable container"); ret = -errno; - goto free_container_exit; + goto enable_discards_exit; } } else { container->prereg_listener = vfio_prereg_listener; @@ -2082,7 +2121,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as, ret = -1; error_propagate_prepend(errp, container->error, "RAM memory listener initialization failed: "); - goto free_container_exit; + goto enable_discards_exit; } } @@ -2095,7 +2134,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as, if (v2) { memory_listener_unregister(&container->prereg_listener); } - goto free_container_exit; + goto enable_discards_exit; } if (v2) { @@ -2110,7 +2149,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as, if (ret) { error_setg_errno(errp, -ret, "failed to remove existing window"); - goto free_container_exit; + goto enable_discards_exit; } } else { /* The default table uses 4K pages */ @@ -2151,6 +2190,9 @@ listener_release_exit: vfio_kvm_device_del_group(group); vfio_listener_release(container); +enable_discards_exit: + vfio_ram_block_discard_disable(container, false); + free_container_exit: g_free(container); @@ -2158,7 +2200,6 @@ close_fd_exit: close(fd); put_space_exit: - ram_block_discard_disable(false); vfio_put_address_space(space); return ret; @@ -2280,7 +2321,7 @@ void vfio_put_group(VFIOGroup *group) } if (!group->ram_block_discard_allowed) { - ram_block_discard_disable(false); + vfio_ram_block_discard_disable(group->container, false); } vfio_kvm_device_del_group(group); vfio_disconnect_container(group); @@ -2334,7 +2375,7 @@ int vfio_get_device(VFIOGroup *group, const char *name, if (!group->ram_block_discard_allowed) { group->ram_block_discard_allowed = true; - ram_block_discard_disable(false); + vfio_ram_block_discard_disable(group->container, false); } }