From patchwork Thu Apr 4 07:26:14 2024
X-Patchwork-Submitter: "Kasireddy, Vivek"
X-Patchwork-Id: 13617397
From: Vivek Kasireddy
To: dri-devel@lists.freedesktop.org, linux-mm@kvack.org
Cc: Vivek Kasireddy, David Hildenbrand, Matthew Wilcox, Daniel Vetter,
 Mike Kravetz, Hugh Dickins, Peter Xu, Jason Gunthorpe, Gerd Hoffmann,
 Dongwon Kim, Junxiao Chang
Subject: [PATCH v13 7/8] udmabuf: Pin the pages using memfd_pin_folios() API
Date: Thu, 4 Apr 2024 00:26:14 -0700
Message-ID: <20240404073053.3073706-8-vivek.kasireddy@intel.com>
In-Reply-To: <20240404073053.3073706-1-vivek.kasireddy@intel.com>
References: <20240404073053.3073706-1-vivek.kasireddy@intel.com>

Using memfd_pin_folios() ensures that the pages are pinned correctly
using FOLL_PIN. It also ensures that we don't accidentally break
features such as memory hotunplug, as it does not allow pinning pages
in the movable zone.

Using this new API also simplifies the code, as we no longer have to
deal with extracting individual pages from their mappings or handle
the shmem and hugetlb cases separately.

Cc: David Hildenbrand
Cc: Matthew Wilcox
Cc: Daniel Vetter
Cc: Mike Kravetz
Cc: Hugh Dickins
Cc: Peter Xu
Cc: Jason Gunthorpe
Cc: Gerd Hoffmann
Cc: Dongwon Kim
Cc: Junxiao Chang
Signed-off-by: Vivek Kasireddy
---
 drivers/dma-buf/udmabuf.c | 153 +++++++++++++++++++------------------
 1 file changed, 78 insertions(+), 75 deletions(-)

diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
index a8f3af61f7f2..afa8bfd2a2a9 100644
--- a/drivers/dma-buf/udmabuf.c
+++ b/drivers/dma-buf/udmabuf.c
@@ -30,6 +30,12 @@ struct udmabuf {
 	struct sg_table *sg;
 	struct miscdevice *device;
 	pgoff_t *offsets;
+	struct list_head unpin_list;
+};
+
+struct udmabuf_folio {
+	struct folio *folio;
+	struct list_head list;
 };
 
 static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf)
@@ -153,17 +159,43 @@ static void unmap_udmabuf(struct dma_buf_attachment *at,
 	return put_sg_table(at->dev, sg, direction);
 }
 
+static void unpin_all_folios(struct list_head *unpin_list)
+{
+	struct udmabuf_folio *ubuf_folio;
+
+	while (!list_empty(unpin_list)) {
+		ubuf_folio = list_first_entry(unpin_list,
+					      struct udmabuf_folio, list);
+		unpin_folio(ubuf_folio->folio);
+
+		list_del(&ubuf_folio->list);
+		kfree(ubuf_folio);
+	}
+}
+
+static int add_to_unpin_list(struct list_head *unpin_list,
+			     struct folio *folio)
+{
+	struct udmabuf_folio *ubuf_folio;
+
+	ubuf_folio = kzalloc(sizeof(*ubuf_folio), GFP_KERNEL);
+	if (!ubuf_folio)
+		return -ENOMEM;
+
+	ubuf_folio->folio = folio;
+	list_add_tail(&ubuf_folio->list, unpin_list);
+	return 0;
+}
+
 static void release_udmabuf(struct dma_buf *buf)
 {
 	struct udmabuf *ubuf = buf->priv;
 	struct device *dev = ubuf->device->this_device;
-	pgoff_t pg;
 
 	if (ubuf->sg)
 		put_sg_table(dev, ubuf->sg, DMA_BIDIRECTIONAL);
 
-	for (pg = 0; pg < ubuf->pagecount; pg++)
-		folio_put(ubuf->folios[pg]);
+	unpin_all_folios(&ubuf->unpin_list);
 	kfree(ubuf->offsets);
 	kfree(ubuf->folios);
 	kfree(ubuf);
@@ -218,64 +250,6 @@ static const struct dma_buf_ops udmabuf_ops = {
 #define SEALS_WANTED (F_SEAL_SHRINK)
 #define SEALS_DENIED (F_SEAL_WRITE)
 
-static int handle_hugetlb_pages(struct udmabuf *ubuf, struct file *memfd,
-				pgoff_t offset, pgoff_t pgcnt,
-				pgoff_t *pgbuf)
-{
-	struct hstate *hpstate = hstate_file(memfd);
-	pgoff_t mapidx = offset >> huge_page_shift(hpstate);
-	pgoff_t subpgoff = (offset & ~huge_page_mask(hpstate)) >> PAGE_SHIFT;
-	pgoff_t maxsubpgs = huge_page_size(hpstate) >> PAGE_SHIFT;
-	struct folio *folio = NULL;
-	pgoff_t pgidx;
-
-	mapidx <<= huge_page_order(hpstate);
-	for (pgidx = 0; pgidx < pgcnt; pgidx++) {
-		if (!folio) {
-			folio = __filemap_get_folio(memfd->f_mapping,
-						    mapidx,
-						    FGP_ACCESSED, 0);
-			if (IS_ERR(folio))
-				return PTR_ERR(folio);
-		}
-
-		folio_get(folio);
-		ubuf->folios[*pgbuf] = folio;
-		ubuf->offsets[*pgbuf] = subpgoff << PAGE_SHIFT;
-		(*pgbuf)++;
-		if (++subpgoff == maxsubpgs) {
-			folio_put(folio);
-			folio = NULL;
-			subpgoff = 0;
-			mapidx += pages_per_huge_page(hpstate);
-		}
-	}
-
-	if (folio)
-		folio_put(folio);
-
-	return 0;
-}
-
-static int handle_shmem_pages(struct udmabuf *ubuf, struct file *memfd,
-			      pgoff_t offset, pgoff_t pgcnt,
-			      pgoff_t *pgbuf)
-{
-	pgoff_t pgidx, pgoff = offset >> PAGE_SHIFT;
-	struct folio *folio = NULL;
-
-	for (pgidx = 0; pgidx < pgcnt; pgidx++) {
-		folio = shmem_read_folio(memfd->f_mapping, pgoff + pgidx);
-		if (IS_ERR(folio))
-			return PTR_ERR(folio);
-
-		ubuf->folios[*pgbuf] = folio;
-		(*pgbuf)++;
-	}
-
-	return 0;
-}
-
 static int check_memfd_seals(struct file *memfd)
 {
 	int seals;
@@ -321,16 +295,19 @@ static long udmabuf_create(struct miscdevice *device,
 			   struct udmabuf_create_list *head,
 			   struct udmabuf_create_item *list)
 {
-	pgoff_t pgcnt, pgbuf = 0, pglimit;
+	pgoff_t pgoff, pgcnt, pglimit, pgbuf = 0;
+	long nr_folios, ret = -EINVAL;
 	struct file *memfd = NULL;
+	struct folio **folios;
 	struct udmabuf *ubuf;
-	int ret = -EINVAL;
-	u32 i, flags;
+	u32 i, j, k, flags;
+	loff_t end;
 
 	ubuf = kzalloc(sizeof(*ubuf), GFP_KERNEL);
 	if (!ubuf)
 		return -ENOMEM;
 
+	INIT_LIST_HEAD(&ubuf->unpin_list);
 	pglimit = (size_limit_mb * 1024 * 1024) >> PAGE_SHIFT;
 	for (i = 0; i < head->count; i++) {
 		if (!IS_ALIGNED(list[i].offset, PAGE_SIZE))
@@ -366,17 +343,44 @@ static long udmabuf_create(struct miscdevice *device,
 			goto err;
 
 		pgcnt = list[i].size >> PAGE_SHIFT;
-		if (is_file_hugepages(memfd))
-			ret = handle_hugetlb_pages(ubuf, memfd,
-						   list[i].offset,
-						   pgcnt, &pgbuf);
-		else
-			ret = handle_shmem_pages(ubuf, memfd,
-						 list[i].offset,
-						 pgcnt, &pgbuf);
-		if (ret < 0)
+		folios = kmalloc_array(pgcnt, sizeof(*folios), GFP_KERNEL);
+		if (!folios) {
+			ret = -ENOMEM;
 			goto err;
+		}
 
+		end = list[i].offset + (pgcnt << PAGE_SHIFT) - 1;
+		ret = memfd_pin_folios(memfd, list[i].offset, end,
+				       folios, pgcnt, &pgoff);
+		if (ret < 0) {
+			kfree(folios);
+			goto err;
+		}
+
+		nr_folios = ret;
+		pgoff >>= PAGE_SHIFT;
+		for (j = 0, k = 0; j < pgcnt; j++) {
+			ubuf->folios[pgbuf] = folios[k];
+			ubuf->offsets[pgbuf] = pgoff << PAGE_SHIFT;
+
+			if (j == 0 || ubuf->folios[pgbuf-1] != folios[k]) {
+				ret = add_to_unpin_list(&ubuf->unpin_list,
+							folios[k]);
+				if (ret < 0) {
+					kfree(folios);
+					goto err;
+				}
+			}
+
+			pgbuf++;
+			if (++pgoff == folio_nr_pages(folios[k])) {
+				pgoff = 0;
+				if (++k == nr_folios)
+					break;
+			}
+		}
+
+		kfree(folios);
 		fput(memfd);
 	}
 
@@ -388,10 +392,9 @@ static long udmabuf_create(struct miscdevice *device,
 	return ret;
 
 err:
-	while (pgbuf > 0)
-		folio_put(ubuf->folios[--pgbuf]);
 	if (memfd)
 		fput(memfd);
+	unpin_all_folios(&ubuf->unpin_list);
 	kfree(ubuf->offsets);
 	kfree(ubuf->folios);
 	kfree(ubuf);
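
Note for readers of the new loop in udmabuf_create(): the part that turns the folio array returned by memfd_pin_folios() into one (folio, offset) slot per page can be modelled in userspace. What follows is only a minimal sketch under stated assumptions, not kernel code and not part of the patch. struct fake_folio and expand_folios() are hypothetical names invented for illustration, PAGE_SHIFT is assumed to be 12, and the nr_unique counter stands in for the calls to add_to_unpin_list().

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for struct folio; only the page count matters here. */
struct fake_folio {
	unsigned long nr_pages;
};

/*
 * Userspace model of the per-page expansion loop in udmabuf_create().
 *
 * folios/nr_folios: what memfd_pin_folios() would have returned.
 * pgcnt:            number of page slots requested for this item.
 * first_pgoff:      page index inside folios[0] where the range starts
 *                   (the patch derives it from the returned byte offset).
 *
 * Fills out_folios[] and out_offsets[] (byte offsets within each folio)
 * and returns the number of slots written. *nr_unique counts how often a
 * folio would have been appended to the unpin list.
 */
static size_t expand_folios(struct fake_folio **folios, size_t nr_folios,
			    size_t pgcnt, unsigned long first_pgoff,
			    struct fake_folio **out_folios,
			    unsigned long *out_offsets, size_t *nr_unique)
{
	unsigned long pgoff = first_pgoff;
	size_t j, k = 0, pgbuf = 0;

	*nr_unique = 0;
	if (nr_folios == 0)
		return 0;
	assert(first_pgoff < folios[0]->nr_pages);

	for (j = 0; j < pgcnt; j++) {
		out_folios[pgbuf] = folios[k];
		out_offsets[pgbuf] = pgoff << PAGE_SHIFT;

		/* A folio is recorded once, when it first appears. */
		if (j == 0 || out_folios[pgbuf - 1] != folios[k])
			(*nr_unique)++;

		pgbuf++;
		/* Move to the next folio once the current one is used up. */
		if (++pgoff == folios[k]->nr_pages) {
			pgoff = 0;
			if (++k == nr_folios)
				break;
		}
	}
	return pgbuf;
}
```

With a 4-page folio followed by a 1-page folio, pgcnt = 4 and first_pgoff = 1, the sketch fills four slots: byte offsets 4096, 8192 and 12288 in the first folio, then offset 0 in the second, and it counts two unique folios. That matches the intent of the patch, where each pinned folio is added to unpin_list exactly once even though it may back several page slots.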