From: Pasha Tatashin
To: akpm@linux-foundation.org, alex.williamson@redhat.com,
	alim.akhtar@samsung.com, alyssa@rosenzweig.io, asahi@lists.linux.dev,
	baolu.lu@linux.intel.com, bhelgaas@google.com, cgroups@vger.kernel.org,
	corbet@lwn.net, david@redhat.com, dwmw2@infradead.org,
	hannes@cmpxchg.org, heiko@sntech.de, iommu@lists.linux.dev,
	jasowang@redhat.com, jernej.skrabec@gmail.com, jgg@ziepe.ca,
	jonathanh@nvidia.com, joro@8bytes.org, kevin.tian@intel.com,
	krzysztof.kozlowski@linaro.org, kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, linux-rockchip@lists.infradead.org,
	linux-samsung-soc@vger.kernel.org, linux-sunxi@lists.linux.dev,
	linux-tegra@vger.kernel.org, lizefan.x@bytedance.com, marcan@marcan.st,
	mhiramat@kernel.org, mst@redhat.com, m.szyprowski@samsung.com,
	netdev@vger.kernel.org, pasha.tatashin@soleen.com, paulmck@kernel.org,
	rdunlap@infradead.org, robin.murphy@arm.com, samuel@sholland.org,
	suravee.suthikulpanit@amd.com, sven@svenpeter.dev,
	thierry.reding@gmail.com, tj@kernel.org, tomas.mudrunka@gmail.com,
	vdumpa@nvidia.com, virtualization@lists.linux.dev, wens@csie.org,
	will@kernel.org, yu-cheng.yu@intel.com
Subject: [PATCH 00/16] IOMMU memory observability
Date: Tue, 28 Nov 2023 20:49:22 +0000
Message-ID: <20231128204938.1453583-1-pasha.tatashin@soleen.com>

From: Pasha Tatashin

The IOMMU subsystem may hold gigabytes of state, most of it in IOMMU
page tables. Yet there is currently no way to observe how much memory
the IOMMU subsystem actually uses.

This patch series solves this problem by adding both observability for
all pages allocated by the IOMMU and accountability, so admins can
limit the amount via cgroups.

System-wide observability, via /proc/meminfo:

    SecPageTables:    438176 kB

Contains IOMMU and KVM memory.

Per-node observability, via /sys/devices/system/node/nodeN/meminfo:

    Node N SecPageTables:    422204 kB

Contains IOMMU and KVM memory in the given NUMA node.

Per-node IOMMU-only observability, via
/sys/devices/system/node/nodeN/vmstat:

    nr_iommu_pages 105555

Contains the number of pages the IOMMU has allocated in the given node.

Accountability: via the sec_pagetables entry in cgroup-v2 memory.stat.

With the change, iova_stress[1] stops as the limit is reached:

    # ./iova_stress
    iova space:     0T      free memory:   497G
    iova space:     1T      free memory:   495G
    iova space:     2T      free memory:   493G
    iova space:     3T      free memory:   491G

    stops as limit is reached.

This series incorporates suggestions that came from the discussion
at LPC [2].

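The counters described above can be read from user space with a few
lines of script. What follows is only a minimal sketch, not part of the
series: the helper names (sec_pagetables_kb, stat_counter, read_text)
are made up for illustration, node0 is picked arbitrarily, and the line
formats are assumed to match the samples quoted above.

```python
import re
from pathlib import Path


def sec_pagetables_kb(meminfo_text):
    """Return the SecPageTables value in kB from meminfo-style text, or None."""
    # Accepts both "SecPageTables: 438176 kB" (/proc/meminfo) and
    # "Node 0 SecPageTables: 422204 kB" (per-node meminfo).
    match = re.search(
        r"^(?:Node\s+\d+\s+)?SecPageTables:\s+(\d+)\s+kB",
        meminfo_text,
        re.MULTILINE,
    )
    return int(match.group(1)) if match else None


def stat_counter(stat_text, name):
    """Return the integer from a 'name value' line, or None if absent."""
    for line in stat_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == name:
            return int(fields[1])
    return None


def read_text(path):
    """Read a file, returning '' when it is missing or unreadable."""
    try:
        return Path(path).read_text()
    except OSError:
        return ""


if __name__ == "__main__":
    # node0 is an arbitrary choice for the example.
    node0 = "/sys/devices/system/node/node0"
    print("system SecPageTables (kB):",
          sec_pagetables_kb(read_text("/proc/meminfo")))
    print("node0 SecPageTables (kB):",
          sec_pagetables_kb(read_text(node0 + "/meminfo")))
    print("node0 nr_iommu_pages:",
          stat_counter(read_text(node0 + "/vmstat"), "nr_iommu_pages"))
```

On a kernel without these entries the helpers return None rather than
failing. The same stat_counter() helper could be pointed at a cgroup's
memory.stat file with the name "sec_pagetables".
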
[1] https://github.com/soleen/iova_stress
[2] https://lpc.events/event/17/contributions/1466

Pasha Tatashin (16):
  iommu/vt-d: add wrapper functions for page allocations
  iommu/amd: use page allocation function provided by iommu-pages.h
  iommu/io-pgtable-arm: use page allocation function provided by
    iommu-pages.h
  iommu/io-pgtable-dart: use page allocation function provided by
    iommu-pages.h
  iommu/io-pgtable-arm-v7s: use page allocation function provided by
    iommu-pages.h
  iommu/dma: use page allocation function provided by iommu-pages.h
  iommu/exynos: use page allocation function provided by iommu-pages.h
  iommu/fsl: use page allocation function provided by iommu-pages.h
  iommu/iommufd: use page allocation function provided by iommu-pages.h
  iommu/rockchip: use page allocation function provided by iommu-pages.h
  iommu/sun50i: use page allocation function provided by iommu-pages.h
  iommu/tegra-smmu: use page allocation function provided by
    iommu-pages.h
  iommu: observability of the IOMMU allocations
  iommu: account IOMMU allocated memory
  vhost-vdpa: account iommu allocations
  vfio: account iommu allocations

 Documentation/admin-guide/cgroup-v2.rst |   2 +-
 Documentation/filesystems/proc.rst      |   4 +-
 drivers/iommu/amd/amd_iommu.h           |   8 -
 drivers/iommu/amd/init.c                |  91 +++++-----
 drivers/iommu/amd/io_pgtable.c          |  13 +-
 drivers/iommu/amd/io_pgtable_v2.c       |  20 +-
 drivers/iommu/amd/iommu.c               |  13 +-
 drivers/iommu/dma-iommu.c               |   8 +-
 drivers/iommu/exynos-iommu.c            |  14 +-
 drivers/iommu/fsl_pamu.c                |   5 +-
 drivers/iommu/intel/dmar.c              |  10 +-
 drivers/iommu/intel/iommu.c             |  47 ++---
 drivers/iommu/intel/iommu.h             |   2 -
 drivers/iommu/intel/irq_remapping.c     |  10 +-
 drivers/iommu/intel/pasid.c             |  12 +-
 drivers/iommu/intel/svm.c               |   7 +-
 drivers/iommu/io-pgtable-arm-v7s.c      |   9 +-
 drivers/iommu/io-pgtable-arm.c          |   7 +-
 drivers/iommu/io-pgtable-dart.c         |  37 ++--
 drivers/iommu/iommu-pages.h             | 231 ++++++++++++++++++++++++
 drivers/iommu/iommufd/iova_bitmap.c     |   6 +-
 drivers/iommu/rockchip-iommu.c          |  14 +-
 drivers/iommu/sun50i-iommu.c            |   7 +-
 drivers/iommu/tegra-smmu.c              |  18 +-
 drivers/vfio/vfio_iommu_type1.c         |   8 +-
 drivers/vhost/vdpa.c                    |   3 +-
 include/linux/mmzone.h                  |   5 +-
 mm/vmstat.c                             |   3 +
 28 files changed, 415 insertions(+), 199 deletions(-)
 create mode 100644 drivers/iommu/iommu-pages.h