From patchwork Fri Aug 30 05:16:09 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dev Jain X-Patchwork-Id: 13784283 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 9C224CA0EE0 for ; Fri, 30 Aug 2024 05:17:37 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Transfer-Encoding: MIME-Version:Message-Id:Date:Subject:Cc:To:From:Reply-To:Content-Type: Content-ID:Content-Description:Resent-Date:Resent-From:Resent-Sender: Resent-To:Resent-Cc:Resent-Message-ID:In-Reply-To:References:List-Owner; bh=q5Bbo9z3iiieq1lwlFSsauTrsxkXFTD0RbIPhtFlIro=; b=kqhayNq1ZDYf43w5KZ9DKGfcgp ni5VhbiSU/KqM0bzrbeIzxQfLptTLqtyygbv2ECKnP14wViZTdQ9v5IAQhXCRS/pW7Yfgkorz5b1i 7NrpdcNW39OF4d8rtbqYUV5BtOWK55JM5Y+VnB1IQ0T6mJEIbT0q7sfIm3lk2agjR8pGza5lI6hie O8gqv4kpcRm8iZ9uALYoDlkWVcj/Ow24gqEi7k4/g2mcjr+m+2VSuPayto4fT2Uh34DUAToGjNylf F+VYBdGKqGGWS5KDhD7MLrcXDLEyFfhUUWCD7rbgOV/1FvSGvSSFBwvMZW0AqIf0BLx/xjGTQyhtp KuHgXmXQ==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.97.1 #2 (Red Hat Linux)) id 1sju0j-00000004ldR-40Ci; Fri, 30 Aug 2024 05:17:25 +0000 Received: from foss.arm.com ([217.140.110.172]) by bombadil.infradead.org with esmtp (Exim 4.97.1 #2 (Red Hat Linux)) id 1sjtzr-00000004lUR-2xl2 for linux-arm-kernel@lists.infradead.org; Fri, 30 Aug 2024 05:16:33 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 270A51063; Thu, 29 Aug 2024 22:16:56 -0700 (PDT) Received: from e116581.blr.arm.com (e116581.arm.com [10.162.40.24]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 855F83F66E; Thu, 29 Aug 2024 22:16:19 -0700 (PDT) From: Dev Jain To: akpm@linux-foundation.org, shuah@kernel.org, david@redhat.com, willy@infradead.org Cc: ryan.roberts@arm.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, cl@gentwo.org, vbabka@suse.cz, mhocko@suse.com, apopple@nvidia.com, osalvador@suse.de, baolin.wang@linux.alibaba.com, dave.hansen@linux.intel.com, will@kernel.org, baohua@kernel.org, ioworker0@gmail.com, gshan@redhat.com, mark.rutland@arm.com, kirill.shutemov@linux.intel.com, hughd@google.com, aneesh.kumar@kernel.org, yang@os.amperecomputing.com, peterx@redhat.com, broonie@kernel.org, mgorman@techsingularity.net, ying.huang@intel.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org, Dev Jain Subject: [PATCH] selftests/mm: Relax test to fail after 100 migration failures Date: Fri, 30 Aug 2024 10:46:09 +0530 Message-Id: <20240830051609.4037834-1-dev.jain@arm.com> X-Mailer: git-send-email 2.34.1 MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20240829_221631_812205_6E705470 X-CRM114-Status: GOOD ( 15.13 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org It was recently observed at [1] that during the folio unmapping stage of migration, when the PTEs are cleared, a racing thread faulting on that folio may increase the refcount of the folio, sleep on the folio lock (the migration path has the lock), and migration ultimately fails when asserting the actual refcount against the expected. Thereby, the migration selftest fails on shared-anon mappings. The above enforces the fact that migration is a best-effort service, therefore, it is wrong to fail the test for just a single failure; hence, fail the test after 100 consecutive failures (where 100 is still a subjective choice). Note that, this has no effect on the execution time of the test since that is controlled by a timeout. [1] https://lore.kernel.org/all/20240801081657.1386743-1-dev.jain@arm.com/ Signed-off-by: Dev Jain Suggested-by: David Hildenbrand Reviewed-by: Ryan Roberts Tested-by: Ryan Roberts --- The above patch was part of the following: https://lore.kernel.org/all/20240809103129.365029-1-dev.jain@arm.com/ I decided to send it separately since it should be applied nevertheless. tools/testing/selftests/mm/migration.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/tools/testing/selftests/mm/migration.c b/tools/testing/selftests/mm/migration.c index 6908569ef406..64bcbb7151cf 100644 --- a/tools/testing/selftests/mm/migration.c +++ b/tools/testing/selftests/mm/migration.c @@ -15,10 +15,10 @@ #include #include -#define TWOMEG (2<<20) -#define RUNTIME (20) - -#define ALIGN(x, a) (((x) + (a - 1)) & (~((a) - 1))) +#define TWOMEG (2<<20) +#define RUNTIME (20) +#define MAX_RETRIES 100 +#define ALIGN(x, a) (((x) + (a - 1)) & (~((a) - 1))) FIXTURE(migration) { @@ -65,6 +65,7 @@ int migrate(uint64_t *ptr, int n1, int n2) int ret, tmp; int status = 0; struct timespec ts1, ts2; + int failures = 0; if (clock_gettime(CLOCK_MONOTONIC, &ts1)) return -1; @@ -79,13 +80,17 @@ int migrate(uint64_t *ptr, int n1, int n2) ret = move_pages(0, 1, (void **) &ptr, &n2, &status, MPOL_MF_MOVE_ALL); if (ret) { - if (ret > 0) + if (ret > 0) { + /* Migration is best effort; try again */ + if (++failures < MAX_RETRIES) + continue; printf("Didn't migrate %d pages\n", ret); + } else perror("Couldn't migrate pages"); return -2; } - + failures = 0; tmp = n2; n2 = n1; n1 = tmp;