From patchwork Thu Dec 12 06:36:31 2024
X-Patchwork-Submitter: Michael Roth
X-Patchwork-Id: 13904720
From: Michael Roth
Subject: [PATCH 1/5] KVM: gmem: Don't rely on __kvm_gmem_get_pfn() for preparedness
Date: Thu, 12 Dec 2024 00:36:31 -0600
Message-ID: <20241212063635.712877-2-michael.roth@amd.com>
In-Reply-To: <20241212063635.712877-1-michael.roth@amd.com>
References: <20241212063635.712877-1-michael.roth@amd.com>
X-Mailing-List: kvm@vger.kernel.org
Currently __kvm_gmem_get_pfn() sets 'is_prepared' so callers can skip
calling kvm_gmem_prepare_folio(). However, subsequent patches will
introduce some locking constraints around setting/checking preparedness
that will require filemap_invalidate_lock*() to be held while checking
for preparedness.

This locking could theoretically be done inside __kvm_gmem_get_pfn(), or
by requiring that filemap_invalidate_lock*() is held while calling
__kvm_gmem_get_pfn(), but that places unnecessary constraints around when
__kvm_gmem_get_pfn() can be called, whereas callers could just as easily
call kvm_gmem_is_prepared() directly.

So, in preparation for these locking changes, drop the 'is_prepared'
argument, and leave it up to callers to handle checking preparedness
where needed and with the proper locking constraints.
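In other words, the call-site shape becomes roughly the following (a
sketch only, with error handling and cleanup trimmed; it mirrors the
kvm_gmem_get_pfn() change in the diff below, and any locking around the
preparedness check is left to the later patches in this series):

    folio = __kvm_gmem_get_pfn(file, slot, index, pfn, max_order);
    if (IS_ERR(folio))
            return PTR_ERR(folio);

    /*
     * Preparedness is now checked by the caller, under whatever locking
     * rules later patches establish, rather than inside the helper.
     */
    if (!kvm_gmem_is_prepared(file, index, folio))
            r = kvm_gmem_prepare_folio(kvm, file, slot, gfn, folio);

    folio_unlock(folio);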
Signed-off-by: Michael Roth
---
 virt/kvm/guest_memfd.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index b69af3580bef..aa0038ddf4a4 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -773,7 +773,7 @@ void kvm_gmem_unbind(struct kvm_memory_slot *slot)
 static struct folio *__kvm_gmem_get_pfn(struct file *file,
                                         struct kvm_memory_slot *slot,
                                         pgoff_t index, kvm_pfn_t *pfn,
-                                        bool *is_prepared, int *max_order)
+                                        int *max_order)
 {
     struct kvm_gmem *gmem = file->private_data;
     struct folio *folio;
@@ -803,7 +803,6 @@ static struct folio *__kvm_gmem_get_pfn(struct file *file,
     if (max_order)
         *max_order = 0;

-    *is_prepared = kvm_gmem_is_prepared(file, index, folio);
     return folio;
 }

@@ -814,19 +813,18 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
     pgoff_t index = kvm_gmem_get_index(slot, gfn);
     struct file *file = kvm_gmem_get_file(slot);
     struct folio *folio;
-    bool is_prepared = false;
     int r = 0;

     if (!file)
         return -EFAULT;

-    folio = __kvm_gmem_get_pfn(file, slot, index, pfn, &is_prepared, max_order);
+    folio = __kvm_gmem_get_pfn(file, slot, index, pfn, max_order);
     if (IS_ERR(folio)) {
         r = PTR_ERR(folio);
         goto out;
     }

-    if (!is_prepared)
+    if (!kvm_gmem_is_prepared(file, index, folio))
         r = kvm_gmem_prepare_folio(kvm, file, slot, gfn, folio);

     folio_unlock(folio);
@@ -872,7 +870,6 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long
         struct folio *folio;
         gfn_t gfn = start_gfn + i;
         pgoff_t index = kvm_gmem_get_index(slot, gfn);
-        bool is_prepared = false;
         kvm_pfn_t pfn;

         if (signal_pending(current)) {
@@ -880,13 +877,13 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long
             break;
         }

-        folio = __kvm_gmem_get_pfn(file, slot, index, &pfn, &is_prepared, &max_order);
+        folio = __kvm_gmem_get_pfn(file, slot, index, &pfn, &max_order);
         if (IS_ERR(folio)) {
             ret = PTR_ERR(folio);
             break;
         }

-        if (is_prepared) {
+        if (kvm_gmem_is_prepared(file, index, folio)) {
             folio_unlock(folio);
             folio_put(folio);
             ret = -EEXIST;

From patchwork Thu Dec 12 06:36:32 2024
X-Patchwork-Submitter: Michael Roth
X-Patchwork-Id: 13904721
From: Michael Roth
Subject: [PATCH 2/5] KVM: gmem: Don't clear pages that have already been prepared
Date: Thu, 12 Dec 2024 00:36:32 -0600
Message-ID: <20241212063635.712877-3-michael.roth@amd.com>
In-Reply-To: <20241212063635.712877-1-michael.roth@amd.com>
References: <20241212063635.712877-1-michael.roth@amd.com>
X-Mailing-List: kvm@vger.kernel.org
of PFNs that needs to be cleared before usage and subsequently marked prepared. There may however be cases, at least once hugepage support is added, where some PFNs may have been previously prepared when kvm_gmem_prepare_folio() was called with a smaller max_order than the current one, and this can lead to the current code attempting to clear pages that have already been prepared. It also makes sense to provide more control to the caller over what order to use, since interfaces like kvm_gmem_populate() might specifically want to prepare sub-ranges while leaving other PFNs within the folio in an unprepared state. It could be argued that opportunistically preparing additional pages isn't necessarily a bad thing, but this will complicate things down the road when future uses cases like using gmem for both shared/private guest memory come along. Address these issues by allowing the callers of kvm_gmem_prepare_folio()/kvm_gmem_mark_prepared() to explicitly specify the order of the range being prepared, and in cases where these ranges overlap with previously-prepared pages, do not attempt to re-clear the pages. Signed-off-by: Michael Roth --- virt/kvm/guest_memfd.c | 106 ++++++++++++++++++++++++++--------------- 1 file changed, 68 insertions(+), 38 deletions(-) diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c index aa0038ddf4a4..6907ae9fe149 100644 --- a/virt/kvm/guest_memfd.c +++ b/virt/kvm/guest_memfd.c @@ -96,15 +96,15 @@ static inline kvm_pfn_t folio_file_pfn(struct folio *folio, pgoff_t index) } static int __kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slot, - pgoff_t index, struct folio *folio) + pgoff_t index, struct folio *folio, int max_order) { #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_PREPARE kvm_pfn_t pfn = folio_file_pfn(folio, index); gfn_t gfn = slot->base_gfn + index - slot->gmem.pgoff; - int rc = kvm_arch_gmem_prepare(kvm, gfn, pfn, folio_order(folio)); + int rc = kvm_arch_gmem_prepare(kvm, gfn, pfn, max_order); if (rc) { - pr_warn_ratelimited("gmem: Failed to prepare folio for index %lx GFN %llx PFN %llx error %d.\n", - index, gfn, pfn, rc); + pr_warn_ratelimited("gmem: Failed to prepare folio for index %lx GFN %llx PFN %llx max_order %d error %d.\n", + index, gfn, pfn, max_order, rc); return rc; } #endif @@ -148,15 +148,15 @@ static bool bitmap_test_allset_word(unsigned long *p, unsigned long start, unsig return (*p & mask_to_set) == mask_to_set; } -static void kvm_gmem_mark_prepared(struct file *file, pgoff_t index, struct folio *folio) +static void kvm_gmem_mark_prepared(struct file *file, pgoff_t index, int order) { struct kvm_gmem_inode *i_gmem = (struct kvm_gmem_inode *)file->f_inode->i_private; - unsigned long *p = i_gmem->prepared + BIT_WORD(index); - unsigned long npages = folio_nr_pages(folio); + unsigned long npages = (1ul << order); + unsigned long *p; - /* Folios must be naturally aligned */ - WARN_ON_ONCE(index & (npages - 1)); + /* The index isn't necessarily aligned to the requested order. */ index &= ~(npages - 1); + p = i_gmem->prepared + BIT_WORD(index); /* Clear page before updating bitmap. 
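To illustrate the intended bitmap semantics, here is a simplified sketch.
It uses the generic bitmap_set()/find_next_zero_bit() helpers for brevity
rather than the word-at-a-time helpers in guest_memfd.c, 'prepared'
stands in for i_gmem->prepared, and the function names are illustrative
only:

    /* Mark an order-sized, naturally aligned range as prepared. */
    static void mark_prepared_sketch(unsigned long *prepared, pgoff_t index, int order)
    {
            unsigned long npages = 1UL << order;

            /* The caller's index may point anywhere inside the range. */
            index = ALIGN_DOWN(index, npages);
            bitmap_set(prepared, index, npages);
    }

    /* The range counts as prepared only if every 4K page in it is marked. */
    static bool is_prepared_sketch(unsigned long *prepared, pgoff_t index, int order)
    {
            unsigned long npages = 1UL << order;

            index = ALIGN_DOWN(index, npages);
            return find_next_zero_bit(prepared, index + npages, index) >= index + npages;
    }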
Signed-off-by: Michael Roth
---
 virt/kvm/guest_memfd.c | 106 ++++++++++++++++++++++++++---------------
 1 file changed, 68 insertions(+), 38 deletions(-)

diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index aa0038ddf4a4..6907ae9fe149 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -96,15 +96,15 @@ static inline kvm_pfn_t folio_file_pfn(struct folio *folio, pgoff_t index)
 }

 static int __kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slot,
-                                    pgoff_t index, struct folio *folio)
+                                    pgoff_t index, struct folio *folio, int max_order)
 {
 #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_PREPARE
     kvm_pfn_t pfn = folio_file_pfn(folio, index);
     gfn_t gfn = slot->base_gfn + index - slot->gmem.pgoff;
-    int rc = kvm_arch_gmem_prepare(kvm, gfn, pfn, folio_order(folio));
+    int rc = kvm_arch_gmem_prepare(kvm, gfn, pfn, max_order);

     if (rc) {
-        pr_warn_ratelimited("gmem: Failed to prepare folio for index %lx GFN %llx PFN %llx error %d.\n",
-                            index, gfn, pfn, rc);
+        pr_warn_ratelimited("gmem: Failed to prepare folio for index %lx GFN %llx PFN %llx max_order %d error %d.\n",
+                            index, gfn, pfn, max_order, rc);
         return rc;
     }
 #endif
@@ -148,15 +148,15 @@ static bool bitmap_test_allset_word(unsigned long *p, unsigned long start, unsig
     return (*p & mask_to_set) == mask_to_set;
 }

-static void kvm_gmem_mark_prepared(struct file *file, pgoff_t index, struct folio *folio)
+static void kvm_gmem_mark_prepared(struct file *file, pgoff_t index, int order)
 {
     struct kvm_gmem_inode *i_gmem = (struct kvm_gmem_inode *)file->f_inode->i_private;
-    unsigned long *p = i_gmem->prepared + BIT_WORD(index);
-    unsigned long npages = folio_nr_pages(folio);
+    unsigned long npages = (1ul << order);
+    unsigned long *p;

-    /* Folios must be naturally aligned */
-    WARN_ON_ONCE(index & (npages - 1));
+    /* The index isn't necessarily aligned to the requested order. */
     index &= ~(npages - 1);
+    p = i_gmem->prepared + BIT_WORD(index);

     /* Clear page before updating bitmap. */
     smp_wmb();
@@ -193,16 +193,16 @@ static void kvm_gmem_mark_range_unprepared(struct inode *inode, pgoff_t index, p
     bitmap_clear_atomic_word(p++, 0, npages);
 }

-static bool kvm_gmem_is_prepared(struct file *file, pgoff_t index, struct folio *folio)
+static bool kvm_gmem_is_prepared(struct file *file, pgoff_t index, int order)
 {
     struct kvm_gmem_inode *i_gmem = (struct kvm_gmem_inode *)file->f_inode->i_private;
-    unsigned long *p = i_gmem->prepared + BIT_WORD(index);
-    unsigned long npages = folio_nr_pages(folio);
+    unsigned long npages = (1ul << order);
+    unsigned long *p;
     bool ret;

-    /* Folios must be naturally aligned */
-    WARN_ON_ONCE(index & (npages - 1));
+    /* The index isn't necessarily aligned to the requested order. */
     index &= ~(npages - 1);
+    p = i_gmem->prepared + BIT_WORD(index);

     if (npages < BITS_PER_LONG) {
         ret = bitmap_test_allset_word(p, index, npages);
@@ -226,35 +226,41 @@ static bool kvm_gmem_is_prepared(struct file *file, pgoff_t index, struct folio
  */
 static int kvm_gmem_prepare_folio(struct kvm *kvm, struct file *file,
                                   struct kvm_memory_slot *slot,
-                                  gfn_t gfn, struct folio *folio)
+                                  gfn_t gfn, struct folio *folio, int max_order)
 {
     unsigned long nr_pages, i;
-    pgoff_t index;
+    pgoff_t index, aligned_index;
     int r;

-    nr_pages = folio_nr_pages(folio);
+    index = gfn - slot->base_gfn + slot->gmem.pgoff;
+    nr_pages = (1ull << max_order);
+    WARN_ON(nr_pages > folio_nr_pages(folio));
+    aligned_index = ALIGN_DOWN(index, nr_pages);
+
     for (i = 0; i < nr_pages; i++)
-        clear_highpage(folio_page(folio, i));
+        if (!kvm_gmem_is_prepared(file, aligned_index + i, 0))
+            clear_highpage(folio_page(folio, aligned_index - folio_index(folio) + i));

     /*
-     * Preparing huge folios should always be safe, since it should
-     * be possible to split them later if needed.
-     *
-     * Right now the folio order is always going to be zero, but the
-     * code is ready for huge folios. The only assumption is that
-     * the base pgoff of memslots is naturally aligned with the
-     * requested page order, ensuring that huge folios can also use
-     * huge page table entries for GPA->HPA mapping.
+     * In cases where only a sub-range of a folio is prepared, e.g. via
+     * calling kvm_gmem_populate() for a non-aligned GPA range, or when
+     * there's a mix of private/shared attributes for the GPA range that
+     * the folio backs, it's possible that later on the same folio might
+     * be accessed with a larger order when it becomes possible to map
+     * the full GPA range into the guest using a larger order. In such
+     * cases, some sub-ranges might already have been prepared.
      *
-     * The order will be passed when creating the guest_memfd, and
-     * checked when creating memslots.
+     * Because of this, the arch-specific callbacks should be expected
+     * to handle dealing with cases where some sub-ranges are already
+     * in a prepared state, since the alternative would involve needing
+     * to issue multiple prepare callbacks with finer granularity, and
+     * potentially obfuscating cases where arch-specific callbacks can
+     * be notified of larger-order mappings and potentially optimize
+     * preparation based on that knowledge.
      */
-    WARN_ON(!IS_ALIGNED(slot->gmem.pgoff, 1 << folio_order(folio)));
-    index = gfn - slot->base_gfn + slot->gmem.pgoff;
-    index = ALIGN_DOWN(index, 1 << folio_order(folio));
-    r = __kvm_gmem_prepare_folio(kvm, slot, index, folio);
+    r = __kvm_gmem_prepare_folio(kvm, slot, index, folio, max_order);
     if (!r)
-        kvm_gmem_mark_prepared(file, index, folio);
+        kvm_gmem_mark_prepared(file, index, max_order);

     return r;
 }
@@ -812,20 +818,31 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 {
     pgoff_t index = kvm_gmem_get_index(slot, gfn);
     struct file *file = kvm_gmem_get_file(slot);
+    int max_order_local;
     struct folio *folio;
     int r = 0;

     if (!file)
         return -EFAULT;

-    folio = __kvm_gmem_get_pfn(file, slot, index, pfn, max_order);
+    /*
+     * The caller might pass a NULL 'max_order', but internally this
+     * function needs to be aware of any order limitations set by
+     * __kvm_gmem_get_pfn() so the scope of preparation operations can
+     * be limited to the corresponding range. The initial order can be
+     * arbitrarily large, but gmem doesn't currently support anything
+     * greater than PMD_ORDER so use that for now.
+     */
+    max_order_local = PMD_ORDER;
+
+    folio = __kvm_gmem_get_pfn(file, slot, index, pfn, &max_order_local);
     if (IS_ERR(folio)) {
         r = PTR_ERR(folio);
         goto out;
     }

-    if (!kvm_gmem_is_prepared(file, index, folio))
-        r = kvm_gmem_prepare_folio(kvm, file, slot, gfn, folio);
+    if (!kvm_gmem_is_prepared(file, index, max_order_local))
+        r = kvm_gmem_prepare_folio(kvm, file, slot, gfn, folio, max_order_local);

     folio_unlock(folio);

@@ -835,6 +852,8 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
     folio_put(folio);

 out:
+    if (max_order)
+        *max_order = max_order_local;
     fput(file);
     return r;
 }
@@ -877,13 +896,24 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long
             break;
         }

+        /*
+         * The max order shouldn't extend beyond the GFN range being
+         * populated in this iteration, so set max_order accordingly.
+         * __kvm_gmem_get_pfn() will then further adjust the order to
+         * one that is contained by the backing memslot/folio.
+         */
+        max_order = 0;
+        while (IS_ALIGNED(gfn, 1 << (max_order + 1)) &&
+               (npages - i >= (1 << (max_order + 1))))
+            max_order++;
+
         folio = __kvm_gmem_get_pfn(file, slot, index, &pfn, &max_order);
         if (IS_ERR(folio)) {
             ret = PTR_ERR(folio);
             break;
         }

-        if (kvm_gmem_is_prepared(file, index, folio)) {
+        if (kvm_gmem_is_prepared(file, index, max_order)) {
             folio_unlock(folio);
             folio_put(folio);
             ret = -EEXIST;
@@ -907,7 +937,7 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long
         ret = post_populate(kvm, gfn, pfn, p, max_order, opaque);
         if (!ret) {
             pgoff_t index = gfn - slot->base_gfn + slot->gmem.pgoff;
-            kvm_gmem_mark_prepared(file, index, folio);
+            kvm_gmem_mark_prepared(file, index, max_order);
         }

 put_folio_and_exit:

From patchwork Thu Dec 12 06:36:33 2024
X-Patchwork-Submitter: Michael Roth
X-Patchwork-Id: 13904722
From: Michael Roth
Subject: [PATCH 3/5] KVM: gmem: Hold filemap invalidate lock while allocating/preparing folios
Date: Thu, 12 Dec 2024 00:36:33 -0600
Message-ID: <20241212063635.712877-4-michael.roth@amd.com>
In-Reply-To: <20241212063635.712877-1-michael.roth@amd.com>
References: <20241212063635.712877-1-michael.roth@amd.com>
X-Mailing-List: kvm@vger.kernel.org

Currently the preparedness tracking relies on holding a folio's lock to
keep allocations/preparations and corresponding updates to the prepared
bitmap atomic. However, on the invalidation side, the bitmap entry for
the GFN/index corresponding to a folio might need to be cleared after
truncation. In these cases the folios are no longer part of the filemap,
so nothing guards against a newly-allocated folio getting prepared for
the same GFN/index and then subsequently having its bitmap entry cleared
by the concurrently executing invalidation code.

Avoid this by holding the filemap invalidate lock, so that
allocations/preparations and the corresponding updates to the prepared
bitmap are atomic even with respect to invalidations. Use a shared lock
in the kvm_gmem_get_pfn() case so vCPUs can still fault in pages in
parallel.
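In sketch form, the locking rule being established looks like the
following (illustrative function names, simplified error handling; the
invalidation side is a condensed stand-in for the existing hole-punch
path, which is not part of this diff):

    /* Fault path: allocate/prepare and update the bitmap under the shared lock. */
    static int fault_side_sketch(struct kvm *kvm, struct file *file,
                                 struct kvm_memory_slot *slot, gfn_t gfn,
                                 pgoff_t index, kvm_pfn_t *pfn, int max_order)
    {
            struct folio *folio;
            int r = 0;

            filemap_invalidate_lock_shared(file->f_mapping);
            folio = __kvm_gmem_get_pfn(file, slot, index, pfn, &max_order);
            if (!IS_ERR(folio) && !kvm_gmem_is_prepared(file, index, max_order))
                    r = kvm_gmem_prepare_folio(kvm, file, slot, gfn, folio, max_order);
            filemap_invalidate_unlock_shared(file->f_mapping);
            return r;
    }

    /*
     * Invalidation path (e.g. hole punch): truncate and clear the bitmap
     * under the exclusive lock, so it can't interleave with a concurrent
     * allocation/preparation of a folio for the same index.
     */
    static void invalidate_side_sketch(struct inode *inode, pgoff_t index,
                                       unsigned long npages)
    {
            filemap_invalidate_lock(inode->i_mapping);
            truncate_inode_pages_range(inode->i_mapping,
                                       (loff_t)index << PAGE_SHIFT,
                                       ((loff_t)(index + npages) << PAGE_SHIFT) - 1);
            kvm_gmem_mark_range_unprepared(inode, index, npages);
            filemap_invalidate_unlock(inode->i_mapping);
    }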
Signed-off-by: Michael Roth
---
 virt/kvm/guest_memfd.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 6907ae9fe149..9a5172de6a03 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -154,6 +154,8 @@ static void kvm_gmem_mark_prepared(struct file *file, pgoff_t index, int order)
     unsigned long npages = (1ul << order);
     unsigned long *p;

+    rwsem_assert_held(&file->f_mapping->invalidate_lock);
+
     /* The index isn't necessarily aligned to the requested order. */
     index &= ~(npages - 1);
     p = i_gmem->prepared + BIT_WORD(index);
@@ -174,6 +176,8 @@ static void kvm_gmem_mark_range_unprepared(struct inode *inode, pgoff_t index, p
     struct kvm_gmem_inode *i_gmem = (struct kvm_gmem_inode *)inode->i_private;
     unsigned long *p = i_gmem->prepared + BIT_WORD(index);

+    rwsem_assert_held(&inode->i_mapping->invalidate_lock);
+
     index &= BITS_PER_LONG - 1;
     if (index) {
         int first_word_count = min(npages, BITS_PER_LONG - index);
@@ -200,6 +204,8 @@ static bool kvm_gmem_is_prepared(struct file *file, pgoff_t index, int order)
     unsigned long *p;
     bool ret;

+    rwsem_assert_held(&file->f_mapping->invalidate_lock);
+
     /* The index isn't necessarily aligned to the requested order. */
     index &= ~(npages - 1);
     p = i_gmem->prepared + BIT_WORD(index);
@@ -232,6 +238,8 @@ static int kvm_gmem_prepare_folio(struct kvm *kvm, struct file *file,
     pgoff_t index, aligned_index;
     int r;

+    rwsem_assert_held(&file->f_mapping->invalidate_lock);
+
     index = gfn - slot->base_gfn + slot->gmem.pgoff;
     nr_pages = (1ull << max_order);
     WARN_ON(nr_pages > folio_nr_pages(folio));
@@ -819,12 +827,16 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
     pgoff_t index = kvm_gmem_get_index(slot, gfn);
     struct file *file = kvm_gmem_get_file(slot);
     int max_order_local;
+    struct address_space *mapping;
     struct folio *folio;
     int r = 0;

     if (!file)
         return -EFAULT;

+    mapping = file->f_inode->i_mapping;
+    filemap_invalidate_lock_shared(mapping);
+
     /*
      * The caller might pass a NULL 'max_order', but internally this
      * function needs to be aware of any order limitations set by
@@ -838,6 +850,7 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
     folio = __kvm_gmem_get_pfn(file, slot, index, pfn, &max_order_local);
     if (IS_ERR(folio)) {
         r = PTR_ERR(folio);
+        filemap_invalidate_unlock_shared(mapping);
         goto out;
     }

@@ -845,6 +858,7 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
         r = kvm_gmem_prepare_folio(kvm, file, slot, gfn, folio, max_order_local);

     folio_unlock(folio);
+    filemap_invalidate_unlock_shared(mapping);

     if (!r)
         *page = folio_file_page(folio, index);

From patchwork Thu Dec 12 06:36:34 2024
X-Patchwork-Submitter: Michael Roth
X-Patchwork-Id: 13904723
From: Michael Roth
Subject: [PATCH 4/5] KVM: SEV: Improve handling of large ranges in gmem prepare callback
Date: Thu, 12 Dec 2024 00:36:34 -0600
Message-ID: <20241212063635.712877-5-michael.roth@amd.com>
In-Reply-To: <20241212063635.712877-1-michael.roth@amd.com>
References: <20241212063635.712877-1-michael.roth@amd.com>
X-Mailing-List: kvm@vger.kernel.org

The current code relies on the fact that guest_memfd will always call
sev_gmem_prepare() for each initial access to a particular guest GFN.
Once hugepage support is added to gmem, sev_gmem_prepare() might only be
called once for an entire range of GFNs. The current code will handle
this properly for 2MB folios if the entire range is currently shared and
can be marked as private using a 2MB RMP entry, but if any sub-ranges
were already in a prepared state (e.g. because they were part of the
initial guest state prepared via kvm_gmem_populate(), or userspace
initially had the 2MB region in a mixed attribute state for whatever
reason), then only the specific 4K GFN will get updated. If gmem rightly
decides it shouldn't have to call the prepare hook again for that range,
then the RMP entries for the other GFNs will never get updated.

Additionally, the current code assumes it will never be called for a
range larger than 2MB. This obviously won't work when 1GB+ hugepage
support is eventually added.

Rework the logic to ensure everything in the entire range gets updated,
with care taken to avoid ranges that are already private while still
maximizing the RMP entry sizes used to fill in the shared gaps.
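Condensed, the reworked walk has roughly the following shape (a sketch
only, not the literal code in the diff below: the function name is
illustrative, the asid parameter is passed in for brevity, and lookup
errors are simply returned; next_shared_offset(), rmp_make_private() and
the PMD_ORDER-based 2M/4K selection are the ones used in the diff). As
an example of the intended behavior, for a 512-page (2MB) range whose
first 16 pages are already private, the remaining 496 shared pages can
never reach 2M alignment inside the range, so they are converted with
496 individual 4K RMP entries:

    static int sev_prepare_range_sketch(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn,
                                        unsigned long npages, u32 asid)
    {
            kvm_pfn_t end = pfn + npages;

            while (pfn < end) {
                    long nr_shared;
                    kvm_pfn_t offset;
                    int rc;

                    /* Locate the next run of still-shared PFNs in the range. */
                    rc = next_shared_offset(kvm, pfn, end - pfn, &offset, &nr_shared);
                    if (rc < 0)
                            return rc;

                    /* Skip over PFNs that are already private. */
                    pfn += offset;
                    gfn += offset;

                    while (nr_shared > 0) {
                            /* Use a 2M RMP entry when alignment and length allow. */
                            bool huge = IS_ALIGNED(pfn, PTRS_PER_PMD) &&
                                        nr_shared >= PTRS_PER_PMD;
                            long step = huge ? PTRS_PER_PMD : 1;

                            rc = rmp_make_private(pfn, gfn_to_gpa(gfn),
                                                  huge ? PG_LEVEL_2M : PG_LEVEL_4K,
                                                  asid, false);
                            if (rc)
                                    return rc;

                            pfn += step;
                            gfn += step;
                            nr_shared -= step;
                    }
            }

            return 0;
    }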
Signed-off-by: Michael Roth
---
 arch/x86/kvm/svm/sev.c | 163 ++++++++++++++++++++++++-----------------
 1 file changed, 96 insertions(+), 67 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 418767dd69fa..40407768e4dd 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -4777,100 +4777,129 @@ void sev_handle_rmp_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u64 error_code)
     kvm_release_page_unused(page);
 }

-static bool is_pfn_range_shared(kvm_pfn_t start, kvm_pfn_t end)
+/*
+ * Find the offset of the next contiguous shared PFN range within the bounds of
+ * pfn_start/npages_max. If no shared pages are present, 'offset' will correspond
+ * to the end of the range and 'npages_shared' will be 0.
+ */
+static int next_shared_offset(struct kvm *kvm, kvm_pfn_t pfn_start, long npages_max,
+                              kvm_pfn_t *offset, long *npages_shared)
 {
-    kvm_pfn_t pfn = start;
+    kvm_pfn_t pfn = pfn_start;
+    int ret;

-    while (pfn < end) {
-        int ret, rmp_level;
+    *offset = 0;
+    *npages_shared = 0;
+
+    while (pfn < pfn_start + npages_max) {
         bool assigned;
+        int level;

-        ret = snp_lookup_rmpentry(pfn, &assigned, &rmp_level);
+        ret = snp_lookup_rmpentry(pfn, &assigned, &level);
         if (ret) {
-            pr_warn_ratelimited("SEV: Failed to retrieve RMP entry: PFN 0x%llx GFN start 0x%llx GFN end 0x%llx RMP level %d error %d\n",
-                                pfn, start, end, rmp_level, ret);
-            return false;
+            pr_warn_ratelimited("SEV: Failed to retrieve RMP entry: PFN 0x%llx error %d\n",
+                                pfn, ret);
+            return -EINVAL;
         }

         if (assigned) {
-            pr_debug("%s: overlap detected, PFN 0x%llx start 0x%llx end 0x%llx RMP level %d\n",
-                     __func__, pfn, start, end, rmp_level);
-            return false;
+            /* Continue if a shared range hasn't been found yet. */
+            if (*npages_shared)
+                break;
+        } else {
+            if (!*npages_shared)
+                *offset = pfn - pfn_start;
+            *npages_shared += PHYS_PFN(page_level_size(level));
         }

-        pfn++;
-    }
-
-    return true;
-}
-
-static u8 max_level_for_order(int order)
-{
-    if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M))
-        return PG_LEVEL_2M;
-
-    return PG_LEVEL_4K;
-}
+        pfn += PHYS_PFN(page_level_size(level));

-static bool is_large_rmp_possible(struct kvm *kvm, kvm_pfn_t pfn, int order)
-{
-    kvm_pfn_t pfn_aligned = ALIGN_DOWN(pfn, PTRS_PER_PMD);
+        /*
+         * Only possible if RMP entry size is larger than the folio,
+         * which kvm_gmem_prepare() should never allow for.
+         */
+        WARN_ON_ONCE(pfn > pfn_start + npages_max);
+    }

-    /*
-     * If this is a large folio, and the entire 2M range containing the
-     * PFN is currently shared, then the entire 2M-aligned range can be
-     * set to private via a single 2M RMP entry.
-     */
-    if (max_level_for_order(order) > PG_LEVEL_4K &&
-        is_pfn_range_shared(pfn_aligned, pfn_aligned + PTRS_PER_PMD))
-        return true;
+    if (!*npages_shared)
+        *offset = npages_max;

-    return false;
+    return 0;
 }

+/*
+ * This relies on the fact that the folio backing the PFN range is locked while
+ * this callback is issued. Otherwise, concurrent accesses to the same folio
+ * could result in the RMP table getting out of sync with what gmem is tracking
+ * as prepared/unprepared, likely resulting in the vCPU looping on
+ * KVM_EXIT_MEMORY_FAULTs that are never resolved since gmem thinks it has
+ * already processed the RMP table updates.
+ *
+ * This also assumes gmem is using filemap invalidate locks (or some other
+ * mechanism) to ensure that invalidations/hole-punches don't get interleaved
+ * with prepare callbacks.
+ *
+ * The net effect of this is that RMP table checks/updates should be consistent
+ * for the range of PFNs/GFNs this function is called with.
+ */
 int sev_gmem_prepare(struct kvm *kvm, kvm_pfn_t pfn, gfn_t gfn, int max_order)
 {
     struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
-    kvm_pfn_t pfn_aligned;
-    gfn_t gfn_aligned;
-    int level, rc;
-    bool assigned;
+    unsigned long npages;
+    kvm_pfn_t pfn_start;
+    gfn_t gfn_start;

     if (!sev_snp_guest(kvm))
         return 0;

-    rc = snp_lookup_rmpentry(pfn, &assigned, &level);
-    if (rc) {
-        pr_err_ratelimited("SEV: Failed to look up RMP entry: GFN %llx PFN %llx error %d\n",
-                           gfn, pfn, rc);
-        return -ENOENT;
-    }
+    npages = (1ul << max_order);
+    pfn_start = ALIGN_DOWN(pfn, npages);
+    gfn_start = ALIGN_DOWN(gfn, npages);
+
+    for (pfn = pfn_start, gfn = gfn_start; pfn < pfn_start + npages;) {
+        long npages_shared;
+        kvm_pfn_t offset;
+        int rc;
+
+        rc = next_shared_offset(kvm, pfn, npages - (pfn - pfn_start),
+                                &offset, &npages_shared);
+        if (rc < 0)
+            return offset;
+
+        pfn += offset;
+        gfn += offset;
+
+        while (npages_shared) {
+            int order, level;
+
+            if (IS_ALIGNED(pfn, 1ull << PMD_ORDER) &&
+                npages_shared >= (1ul << PMD_ORDER)) {
+                order = PMD_ORDER;
+                level = PG_LEVEL_2M;
+            } else {
+                order = 0;
+                level = PG_LEVEL_4K;
+            }

-    if (assigned) {
-        pr_debug("%s: already assigned: gfn %llx pfn %llx max_order %d level %d\n",
-                 __func__, gfn, pfn, max_order, level);
-        return 0;
-    }
+            pr_debug("%s: preparing sub-range: gfn 0x%llx pfn 0x%llx order %d npages_shared %ld\n",
+                     __func__, gfn, pfn, order, npages_shared);

-    if (is_large_rmp_possible(kvm, pfn, max_order)) {
-        level = PG_LEVEL_2M;
-        pfn_aligned = ALIGN_DOWN(pfn, PTRS_PER_PMD);
-        gfn_aligned = ALIGN_DOWN(gfn, PTRS_PER_PMD);
-    } else {
-        level = PG_LEVEL_4K;
-        pfn_aligned = pfn;
-        gfn_aligned = gfn;
-    }
+            rc = rmp_make_private(pfn, gfn_to_gpa(gfn), level,
+                                  sev->asid, false);
+            if (rc) {
+                pr_err_ratelimited("SEV: Failed to update RMP entry: GFN 0x%llx PFN 0x%llx order %d error %d\n",
+                                   gfn, pfn, order, rc);
+                return rc;
+            }

-    rc = rmp_make_private(pfn_aligned, gfn_to_gpa(gfn_aligned), level, sev->asid, false);
-    if (rc) {
-        pr_err_ratelimited("SEV: Failed to update RMP entry: GFN %llx PFN %llx level %d error %d\n",
-                           gfn, pfn, level, rc);
-        return -EINVAL;
+            gfn += (1ull << order);
+            pfn += (1ull << order);
+            npages_shared -= (1ul << order);
+        }
     }

-    pr_debug("%s: updated: gfn %llx pfn %llx pfn_aligned %llx max_order %d level %d\n",
-             __func__, gfn, pfn, pfn_aligned, max_order, level);
+    pr_debug("%s: updated: gfn_start 0x%llx pfn_start 0x%llx npages %ld max_order %d\n",
+             __func__, gfn_start, pfn_start, npages, max_order);

     return 0;
 }

From patchwork Thu Dec 12 06:36:35 2024
X-Patchwork-Submitter: Michael Roth
X-Patchwork-Id: 13904725
From patchwork Thu Dec 12 06:36:35 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michael Roth
X-Patchwork-Id: 13904725
From: Michael Roth
Subject: [PATCH 5/5] KVM: Add hugepage support for dedicated guest memory
Date: Thu, 12 Dec 2024 00:36:35 -0600
Message-ID: <20241212063635.712877-6-michael.roth@amd.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20241212063635.712877-1-michael.roth@amd.com>
References: <20241212063635.712877-1-michael.roth@amd.com>
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org
From: Sean Christopherson

Extend guest_memfd to allow backing guest memory with hugepages. This is
done on a best-effort basis by default, until a better-defined mechanism
is put in place that can provide stronger control/assurances to userspace
about hugepage allocations.

When reporting the max order when KVM gets a pfn from guest_memfd, force
order-0 pages if the hugepage is not fully contained by the memslot
binding, e.g. if userspace requested hugepages but punches a hole in the
memslot bindings in order to emulate x86's VGA hole.

Link: https://lore.kernel.org/kvm/20231027182217.3615211-1-seanjc@google.com/T/#mccbd3e8bf9897f0ddbf864e6318d6f2f208b269c
Signed-off-by: Sean Christopherson
Message-Id: <20231027182217.3615211-18-seanjc@google.com>
[Allow even with CONFIG_TRANSPARENT_HUGEPAGE; dropped momentarily due to
 uneasiness about the API. - Paolo]
Signed-off-by: Paolo Bonzini
[mdr: based on discussion in the Link regarding the original patch, make the
 following set of changes:
 - For now, don't introduce an opt-in flag to enable hugepage support. By
   default, just make a best effort for PMD_ORDER allocations so that there
   are no false assurances to userspace that they'll get hugepages.
   Performance-wise, it's at least better than the current guarantee that
   they will get 4K pages every time. A more proper opt-in interface can
   then improve on things later.
 - Pass GFP_NOWARN to alloc_pages() so failures are not disruptive to
   normal operations.
 - Drop size checks at creation time; instead, just avoid huge allocations
   if they would extend beyond the end of the memfd.
 - Drop hugepage-related unit tests since everything is now handled
   transparently to userspace anyway.
 - Update commit message accordingly.]
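The containment rule described above boils down to a range check of the 2M-aligned group of file pages against the memslot's binding into the guest_memfd file. The stand-alone sketch below only mirrors the check the patch adds to __kvm_gmem_get_pfn(); clamp_max_order() and the example values are made up for illustration and are not part of the patch.

#include <stdint.h>
#include <stdio.h>

/*
 * A hugepage-sized mapping is only allowed when the 2M-aligned group of file
 * pages containing 'index' lies entirely within the memslot's binding, i.e.
 * within file pages [pgoff, pgoff + npages).
 */
static int clamp_max_order(uint64_t index, int max_order,
			   uint64_t pgoff, uint64_t npages)
{
	uint64_t nr = 1ull << max_order;
	uint64_t huge_index = index & ~(nr - 1);	/* ALIGN_DOWN(index, nr) */

	if (huge_index < pgoff || huge_index + nr > pgoff + npages)
		return 0;	/* binding doesn't cover the whole hugepage: force 4K */

	return max_order;
}

int main(void)
{
	/*
	 * Made-up example: a memslot bound to file pages [0, 0x300), e.g. because
	 * userspace punched a hole in the bindings above 0x300 to emulate a VGA hole.
	 */
	printf("index 0x010 -> order %d\n", clamp_max_order(0x010, 9, 0, 0x300)); /* 9: group fully covered */
	printf("index 0x2ff -> order %d\n", clamp_max_order(0x2ff, 9, 0, 0x300)); /* 0: group extends past the binding */
	return 0;
}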
Signed-off-by: Michael Roth
---
 include/linux/kvm_host.h |  2 ++
 virt/kvm/guest_memfd.c   | 68 +++++++++++++++++++++++++++++++---------
 virt/kvm/kvm_main.c      |  4 +++
 3 files changed, 59 insertions(+), 15 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c7e4f8be3e17..c946ec98d614 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2278,6 +2278,8 @@ extern unsigned int halt_poll_ns_grow;
 extern unsigned int halt_poll_ns_grow_start;
 extern unsigned int halt_poll_ns_shrink;
 
+extern unsigned int gmem_2m_enabled;
+
 struct kvm_device {
 	const struct kvm_device_ops *ops;
 	struct kvm *kvm;
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 9a5172de6a03..d0caec99fe03 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -273,6 +273,36 @@ static int kvm_gmem_prepare_folio(struct kvm *kvm, struct file *file,
 	return r;
 }
 
+static struct folio *kvm_gmem_get_huge_folio(struct inode *inode, pgoff_t index,
+					     unsigned int order)
+{
+	pgoff_t npages = 1UL << order;
+	pgoff_t huge_index = round_down(index, npages);
+	struct address_space *mapping = inode->i_mapping;
+	gfp_t gfp = mapping_gfp_mask(mapping) | __GFP_NOWARN;
+	loff_t size = i_size_read(inode);
+	struct folio *folio;
+
+	/* Make sure hugepages would be fully-contained by inode */
+	if ((huge_index + npages) * PAGE_SIZE > size)
+		return NULL;
+
+	if (filemap_range_has_page(mapping, (loff_t)huge_index << PAGE_SHIFT,
+				   (loff_t)(huge_index + npages - 1) << PAGE_SHIFT))
+		return NULL;
+
+	folio = filemap_alloc_folio(gfp, order);
+	if (!folio)
+		return NULL;
+
+	if (filemap_add_folio(mapping, folio, huge_index, gfp)) {
+		folio_put(folio);
+		return NULL;
+	}
+
+	return folio;
+}
+
 /*
  * Returns a locked folio on success. The caller is responsible for
  * setting the up-to-date flag before the memory is mapped into the guest.
@@ -284,8 +314,15 @@ static int kvm_gmem_prepare_folio(struct kvm *kvm, struct file *file,
  */
 static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
 {
-	/* TODO: Support huge pages. */
-	return filemap_grab_folio(inode->i_mapping, index);
+	struct folio *folio = NULL;
+
+	if (gmem_2m_enabled)
+		folio = kvm_gmem_get_huge_folio(inode, index, PMD_ORDER);
+
+	if (!folio)
+		folio = filemap_grab_folio(inode->i_mapping, index);
+
+	return folio;
 }
 
 static void kvm_gmem_invalidate_begin(struct kvm_gmem *gmem, pgoff_t start,
@@ -660,6 +697,7 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
 	inode->i_size = size;
 	mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
 	mapping_set_inaccessible(inode->i_mapping);
+	mapping_set_large_folios(inode->i_mapping);
 
 	/* Unmovable mappings are supposed to be marked unevictable as well. */
 	WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping));
@@ -791,6 +829,7 @@ static struct folio *__kvm_gmem_get_pfn(struct file *file,
 {
 	struct kvm_gmem *gmem = file->private_data;
 	struct folio *folio;
+	pgoff_t huge_index;
 
 	if (file != slot->gmem.file) {
 		WARN_ON_ONCE(slot->gmem.file);
@@ -803,6 +842,17 @@ static struct folio *__kvm_gmem_get_pfn(struct file *file,
 		return ERR_PTR(-EIO);
 	}
 
+	/*
+	 * The folio can be mapped with a hugepage if and only if the folio is
+	 * fully contained by the range the memslot is bound to. Note, the
+	 * caller is responsible for handling gfn alignment, this only deals
+	 * with the file binding.
+	 */
+	huge_index = ALIGN_DOWN(index, 1ull << *max_order);
+	if (huge_index < slot->gmem.pgoff ||
+	    huge_index + (1ull << *max_order) > slot->gmem.pgoff + slot->npages)
+		*max_order = 0;
+
 	folio = kvm_gmem_get_folio(file_inode(file), index);
 	if (IS_ERR(folio))
 		return folio;
@@ -814,8 +864,7 @@ static struct folio *__kvm_gmem_get_pfn(struct file *file,
 	}
 
 	*pfn = folio_file_pfn(folio, index);
-	if (max_order)
-		*max_order = 0;
+	*max_order = min_t(int, *max_order, folio_order(folio));
 
 	return folio;
 }
@@ -910,17 +959,6 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long
 			break;
 		}
 
-		/*
-		 * The max order shouldn't extend beyond the GFN range being
-		 * populated in this iteration, so set max_order accordingly.
-		 * __kvm_gmem_get_pfn() will then further adjust the order to
-		 * one that is contained by the backing memslot/folio.
-		 */
-		max_order = 0;
-		while (IS_ALIGNED(gfn, 1 << (max_order + 1)) &&
-		       (npages - i >= (1 << (max_order + 1))))
-			max_order++;
-
 		folio = __kvm_gmem_get_pfn(file, slot, index, &pfn, &max_order);
 		if (IS_ERR(folio)) {
 			ret = PTR_ERR(folio);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 5901d03e372c..525d136ba235 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -94,6 +94,10 @@ unsigned int halt_poll_ns_shrink = 2;
 module_param(halt_poll_ns_shrink, uint, 0644);
 EXPORT_SYMBOL_GPL(halt_poll_ns_shrink);
 
+unsigned int gmem_2m_enabled;
+EXPORT_SYMBOL_GPL(gmem_2m_enabled);
+module_param(gmem_2m_enabled, uint, 0644);
+
 /*
  * Allow direct access (from KVM or the CPU) without MMU notifier protection
  * to unpinned pages.
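Taken together, the mapping order that __kvm_gmem_get_pfn() ends up reporting is the caller's max_order, zeroed when the memslot binding does not cover the whole hugepage, then capped by the order of whatever folio the best-effort allocation produced. A small model of that resolution follows; gmem_mapping_order() is a hypothetical helper and the values are illustrative only, not the kernel code.

#include <assert.h>

/*
 * Model of the order __kvm_gmem_get_pfn() reports: start from the caller's
 * max_order, force 0 if the memslot binding doesn't cover the whole hugepage,
 * then cap it at the order of the folio that was actually allocated.
 */
static int gmem_mapping_order(int max_order, int slot_covers_hugepage, int folio_order)
{
	int order = slot_covers_hugepage ? max_order : 0;

	return order < folio_order ? order : folio_order;	/* min_t(int, ...) */
}

int main(void)
{
	/* 2M folio, fully covered by the memslot binding: map at 2M. */
	assert(gmem_mapping_order(9, 1, 9) == 9);
	/* 2M folio, but the binding has a hole (the VGA-hole case): forced to 4K. */
	assert(gmem_mapping_order(9, 0, 9) == 0);
	/* gmem_2m_enabled == 0 or the huge allocation failed: an order-0 folio caps it. */
	assert(gmem_mapping_order(9, 1, 0) == 0);
	return 0;
}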