From: Vivek Kasireddy
To: dri-devel@lists.freedesktop.org, linux-mm@kvack.org
Cc: Vivek Kasireddy, David Hildenbrand, Matthew Wilcox, Christoph Hellwig, Andrew Morton, Oscar Salvador, Daniel Vetter, Hugh Dickins, Peter Xu, Jason Gunthorpe, Gerd Hoffmann, Dongwon Kim, Junxiao Chang
Subject: [PATCH v16 0/9] mm/gup: Introduce memfd_pin_folios() for pinning memfd folios
Date: Sun, 23 Jun 2024 23:36:08 -0700
Message-ID: <20240624063952.1572359-1-vivek.kasireddy@intel.com>

Currently, some drivers (e.g., Udmabuf) that want to longterm-pin the
pages/folios associated with a memfd do so by simply taking a reference
on them. This is not desirable because the pages/folios may reside in
the Movable zone or in a CMA block, and a long-term reference blocks the
migration those regions depend on. Having drivers use the
memfd_pin_folios() API instead ensures that the folios are appropriately
pinned via FOLL_PIN for longterm DMA.

This patchset also introduces a few helpers and converts the Udmabuf
driver to use folios and the memfd_pin_folios() API to longterm-pin the
folios for DMA. Two new Udmabuf selftests are also included to test the
driver and the new API.

---

Patchset overview:

Patch 1-2: GUP helpers to migrate and unpin one or more folios
Patch 3: Introduce memfd_pin_folios() API
Patch 4-6: Udmabuf driver bug fixes for Qemu + hugetlb=on, blob=true case
Patch 7-9: Convert Udmabuf to use memfd_pin_folios() and add selftests

This series is tested using the following methods:
- Run the subtests added in the last patch
- Run Qemu (master) with the following options and a few additional
  patches to Spice:
  qemu-system-x86_64 -m 4096m....
  -device virtio-gpu-pci,max_outputs=1,blob=true,xres=1920,yres=1080
  -spice port=3001,gl=on,disable-ticketing=on,preferred-codec=gstreamer:h264
  -object memory-backend-memfd,hugetlb=on,id=mem1,size=4096M
  -machine memory-backend=mem1
- Run source ./run_vmtests.sh -t gup_test -a to check GUP regressions

Changelog:

v15 -> v16:
- Instead of passing GFP_USER while allocating a hugetlb folio, use
  htlb_alloc_mask(h) & ~(__GFP_HIGHMEM | __GFP_MOVABLE) as the gfp mask
  to discourage new users from passing GFP_xxx flags. Also add comments
  to explain this situation (Oscar)
- Replace NUMA_NO_NODE with numa_node_id() while allocating the hugetlb
  folio to discourage new users from passing NUMA_NO_NODE

v14 -> v15:
- Add an error check for start < 0 in memfd_pin_folios()
- Return an error in the udmabuf driver if memfd_pin_folios() returns 0
  These two checks fix the following issue identified by syzbot:
  https://syzkaller.appspot.com/bug?extid=40c7dad27267f61839d4
- Set memfd = NULL before dmabuf export to ensure that memfd is not
  closed twice.
  This fixes the following syzbot issue:
  https://syzkaller.appspot.com/bug?extid=b2cfdac9ae5278d4b621

v13 -> v14:
- Drop the redundant comments before check_and_migrate_movable_pages()
  and refer to the check_and_migrate_movable_folios() comments (David)
- Use appropriate ksft_* functions for printing and KSFT_* codes for
  exit() in the udmabuf selftest (Shuah)
- Add Mike Kravetz's Suggested-by tag in the udmabuf selftest patch (Shuah)
- Collect Ack and Rb tags from David

v12 -> v13: (suggestions from David)
- Drop the sanity checks in unpin_folio()/unpin_folios() due to the
  unavailability of a per-folio anon-exclusive flag
- Export unpin_folio()/unpin_folios() using EXPORT_SYMBOL_GPL instead of
  EXPORT_SYMBOL
- Have check_and_migrate_movable_pages() just call
  check_and_migrate_movable_folios() instead of calling other helpers
- Slightly improve the comments and commit messages

v11 -> v12:
- Rebased and tested on mm-unstable

v10 -> v11:
- Remove the version string from the patch subject (Andrew)
- Move the changelog from the patches into the cover letter
- Rearrange the patchset to have the GUP patches at the beginning

v9 -> v10:
- Introduce and use unpin_folio(), unpin_folios() and
  check_and_migrate_movable_folios() helpers
- Use a list to track the folios that need to be unpinned in udmabuf

v8 -> v9: (suggestions from Matthew)
- Drop the extern while declaring memfd_alloc_folio()
- Fix the memfd_alloc_folio() declaration to have it return
  struct folio * instead of struct page * when CONFIG_MEMFD_CREATE is
  not defined
- Use folio_pfn() on the folio instead of page_to_pfn() on the head page
  in udmabuf
- Don't split the arguments to shmem_read_folio() across multiple lines
  in udmabuf

v7 -> v8: (suggestions from David)
- Have the caller pass [start, end], max_folios instead of start, nr_pages
- Replace the offsets array with just the offset into the first page
- Add comments explaining the need for next_idx
- Pin (and return) the folio (via FOLL_PIN) only once

v6 -> v7:
- Rename this API to memfd_pin_folios() and make
  it return folios and offsets instead of pages (David)
- Don't continue processing the folios in the batch returned by
  filemap_get_folios_contig() if they do not have the correct next_idx
- Add the R-b tag from Christoph

v5 -> v6: (suggestions from Christoph)
- Rename this API to memfd_pin_user_pages() to make it clear that it is
  intended for memfds
- Move the memfd page allocation helper from gup.c to memfd.c
- Fix indentation errors in memfd_pin_user_pages()
- For contiguous ranges of folios, use a helper such as
  filemap_get_folios_contig() to look up the page cache in batches
- Split the processing of hugetlb or shmem pages into helpers to
  simplify the code in udmabuf_create()

v4 -> v5: (suggestions from David)
- For the hugetlb case, ensure that we only obtain head pages from the
  mapping by using __filemap_get_folio() instead of find_get_page_flags()
- Handle -EEXIST when two or more potential users try to simultaneously
  add a huge page to the mapping by forcing them to retry on failure

v3 -> v4:
- Remove the local variable "page" and instead use 3 return statements
  in alloc_file_page() (David)
- Add the R-b tag from David

v2 -> v3: (suggestions from David)
- Enclose the huge page allocation code with #ifdef CONFIG_HUGETLB_PAGE
  (build error reported by the kernel test robot)
- Don't forget memalloc_pin_restore() on non-migration related errors
- Improve the readability of the cleanup code associated with
  non-migration related errors
- Augment the comments by describing FOLL_LONGTERM-like behavior
- Include the R-b tag from Jason

v1 -> v2:
- Drop gup_flags and improve comments and commit message (David)
- Allocate a page if we cannot find it in the page cache for the
  hugetlbfs case as well (David)
- Don't unpin pages if there is a migration related failure (David)
- Drop the unnecessary nr_pages <= 0 check (Jason)
- Have the caller of the API pass in file * instead of fd (Jason)

Cc: David Hildenbrand
Cc: Matthew Wilcox (Oracle)
Cc: Christoph Hellwig
Cc: Andrew Morton
Cc: Oscar Salvador
Cc: Daniel Vetter
Cc: Hugh Dickins
Cc: Peter Xu
Cc: Jason Gunthorpe
Cc: Gerd Hoffmann
Cc: Dongwon Kim
Cc: Junxiao Chang

Arnd Bergmann (1):
  udmabuf: add CONFIG_MMU dependency

Vivek Kasireddy (8):
  mm/gup: Introduce unpin_folio/unpin_folios helpers
  mm/gup: Introduce check_and_migrate_movable_folios()
  mm/gup: Introduce memfd_pin_folios() for pinning memfd folios
  udmabuf: Use vmf_insert_pfn and VM_PFNMAP for handling mmap
  udmabuf: Add back support for mapping hugetlb pages
  udmabuf: Convert udmabuf driver to use folios
  udmabuf: Pin the pages using memfd_pin_folios() API
  selftests/udmabuf: Add tests to verify data after page migration

 drivers/dma-buf/Kconfig                 |   1 +
 drivers/dma-buf/udmabuf.c               | 232 +++++++++----
 include/linux/memfd.h                   |   5 +
 include/linux/mm.h                      |   5 +
 mm/gup.c                                | 308 +++++++++++++++---
 mm/memfd.c                              |  45 +++
 .../selftests/drivers/dma-buf/udmabuf.c | 214 ++++++++++--
 7 files changed, 673 insertions(+), 137 deletions(-)
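The v7 -> v8 and v14 -> v15 changelog entries describe the calling convention of memfd_pin_folios(): the caller passes an inclusive byte range [start, end] and a max_folios limit, each folio is returned only once, a single offset into the first folio replaces the per-page offsets array, and a negative start is rejected. The userspace C function below is a simplified model of only that range arithmetic, not the kernel implementation. The name model_pin_range() is hypothetical. The model assumes that every folio in the file has the same size (for example, one hugetlb page size), and it records folio indices instead of looking up, migrating, and pinning real folios. Its additional rejection of end < start, max_folios == 0 and folio_size == 0 is also an assumption of this sketch.

```c
#include <errno.h>   /* EINVAL */

/*
 * Hypothetical model of the range handling described for
 * memfd_pin_folios(): [start, end] is an inclusive byte range, at most
 * max_folios entries are filled, and *offset receives the byte offset
 * of start within the first folio. Assumes a uniform folio_size; real
 * page-cache lookup, migration and FOLL_PIN pinning are not modeled.
 */
long model_pin_range(long long start, long long end,
                     unsigned long long folio_size,
                     unsigned long long *folio_idx,
                     unsigned int max_folios,
                     unsigned long long *offset)
{
    /* start < 0 is rejected as in the v14 -> v15 change; the other
     * checks are assumptions of this sketch. */
    if (start < 0 || end < start || max_folios == 0 || folio_size == 0)
        return -EINVAL;

    unsigned long long first = (unsigned long long)start / folio_size;
    unsigned long long last = (unsigned long long)end / folio_size; /* end is inclusive */
    unsigned int nr = 0;

    /* A single offset into the first folio, not one offset per page. */
    *offset = (unsigned long long)start % folio_size;

    /* Record each folio index exactly once, stopping at max_folios. */
    for (unsigned long long idx = first; idx <= last && nr < max_folios; idx++)
        folio_idx[nr++] = idx;

    return (long)nr;
}
```

For example, with 2 MiB folios (0x200000 bytes), the range [0x300000, 0x6FFFFF] covers folio indices 1 through 3, so the model returns 3 with an offset of 0x100000 into the first folio; passing max_folios = 2 stops after the first two folios.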