Subject: [PATCH 00/13] Fix the DAX-gup mistake
From: Dan Williams
To: akpm@linux-foundation.org
Cc: Jason Gunthorpe, Jan Kara, Christoph Hellwig, "Darrick J. Wong", John Hubbard, Matthew Wilcox, linux-mm@kvack.org, nvdimm@lists.linux.dev, linux-fsdevel@vger.kernel.org
Date: Sat, 03 Sep 2022 19:16:00 -0700

tl;dr: Move the pin of 'struct dev_pagemap' instances from gup time to map time, move the unpin of 'struct dev_pagemap' to truncate_inode_pages() for fsdax and devdax inodes, and use page_maybe_dma_pinned() to determine when filesystems can safely truncate DAX mappings vs DMA.

The longer story is that DAX has caused friction with folio development and other device-memory use cases due to its hack of using a page reference count of 1 to indicate that the page is DMA idle. That situation arose from the mistake of not managing DAX page reference counts at map time. The lack of page reference counting at map time grew organically from the original DAX experiment of attempting to manage DAX mappings without page structures. The page lock, dirty tracking, and other entry management were supported sans pages. However, page support was then bolted on incrementally to solve problems with gup, memory-failure, and all the other kernel services that are missing when a pfn does not have an associated page structure.
Since then, John has led an effort to account for when a page is pinned for DMA vs other sources that elevate the reference count. The page_maybe_dma_pinned() helper slots in seamlessly to replace the need to track transitions to page->_refcount == 1.

The larger change in this set comes from Jason's observation that inserting DAX mappings without taking any reference is a bug. So dax_insert_entry(), which fsdax uses, is updated to take 'struct dev_pagemap' references, and devdax is updated to reuse the same. This allows 'struct dev_pagemap' manipulation to be confined to DAX-specific paths.

It is also a foundation to build towards removing pte_devmap() and starting to treat DAX pages as another vm_normal_page(), and perhaps towards more conversions of the DAX infrastructure to reuse typical page mapping helpers. One of the immediate hurdles is the usage of pmd_devmap() to distinguish large page mappings that are not transparent huge pages.

---

Dan Williams (13):
      fsdax: Rename "busy page" to "pinned page"
      fsdax: Use page_maybe_dma_pinned() for DAX vs DMA collisions
      fsdax: Delete put_devmap_managed_page_refs()
      fsdax: Update dax_insert_entry() calling convention to return an error
      fsdax: Cleanup dax_associate_entry()
      fsdax: Rework dax_insert_entry() calling convention
      fsdax: Manage pgmap references at entry insertion and deletion
      devdax: Minor warning fixups
      devdax: Move address_space helpers to the DAX core
      dax: Prep dax_{associate,disassociate}_entry() for compound pages
      devdax: add PUD support to the DAX mapping infrastructure
      devdax: Use dax_insert_entry() + dax_delete_mapping_entry()
      mm/gup: Drop DAX pgmap accounting


 .clang-format             |    1
 drivers/Makefile          |    2
 drivers/dax/Kconfig       |    6
 drivers/dax/Makefile      |    1
 drivers/dax/bus.c         |    2
 drivers/dax/dax-private.h |    1
 drivers/dax/device.c      |   73 ++-
 drivers/dax/mapping.c     | 1020 ++++++++++++++++++++++++++++++++++++++++++++
 drivers/dax/super.c       |    2
 fs/dax.c                  | 1049 ++-------------------------------------------
 fs/ext4/inode.c           |    9
 fs/fuse/dax.c             |   10
 fs/xfs/xfs_file.c         |    8
 fs/xfs/xfs_inode.c        |    2
 include/linux/dax.h       |  124 ++++-
 include/linux/huge_mm.h   |   23 -
 include/linux/memremap.h  |   24 +
 include/linux/mm.h        |   58 +-
 mm/gup.c                  |   92 +---
 mm/huge_memory.c          |   54 --
 mm/memremap.c             |   31 -
 mm/swap.c                 |    2
 22 files changed, 1326 insertions(+), 1268 deletions(-)
 create mode 100644 drivers/dax/mapping.c

base-commit: 1c23f9e627a7b412978b4e852793c5e3c3efc555