From patchwork Tue Nov 21 12:30:40 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Charan Teja Kalla
X-Patchwork-Id: 13462972
From: Charan Teja Kalla
Subject: [PATCH] [RFC] mm: migrate: rcu stalls because of invalid swap cache entries
Date: Tue, 21 Nov 2023 18:00:40 +0530
Message-ID: <1700569840-17327-1-git-send-email-quic_charante@quicinc.com>
X-Mailer: git-send-email 2.7.4

The race below, on a folio between reclaim and migration, exposed a bug:
the swap cache is not populated with the proper folio, which results in
RCU stalls:

Reclaim                                 migration
                                        (from mem offline)
--------                                ------------------
1) folio_trylock();

2) add_to_swap():
   a) get a swap entry to store the THP
      folio through folio_alloc_swap().
   b) do add_to_swap_cache() on the folio,
      which fills the xarray with the folio
      at the indexes of the swap entries.
      Also dirty this folio.

   c) try_to_unmap(folio, TTU_SPLIT_HUGE_PMD),
      which splits the pmd, unmaps the folio
      and replaces the mapped entries with
      swap entries.

   d) as the folio is dirty, do
      pageout()::mapping->a_ops->writepage().
      This calls swap_writepage(), which
      unlocks the page from 1) and does
      submit_bio().

3) Since the page can still be under
   writeback, the folio is added back to
   the LRU.

                                        4) As the folio is now on the LRU, it is
                                           visible to migration and thus will end
                                           up in offline_pages()->migrate_pages():
                                           a) isolate the folio.
                                           b) do __unmap_and_move():
                                              1) lock the folio and wait till
                                                 writeback is done.
                                              2) Replace the eligible pte entries
                                                 with migration entries and then
                                                 issue move_to_new_folio(), which
                                                 calls migrate_folio()->
                                                 folio_migrate_mapping(). For pages
                                                 in the swap cache this replaces
                                                 just a single swap cache entry of
                                                 the source folio with the
                                                 destination folio, and can end up
                                                 freeing the source folio.

Now a process running in parallel can end up in do_swap_page(), which will
try to read the stale entry (of the source folio) after step 4 above, and
thus will end up in the loop below with the RCU read lock held:

mapping_get_entry():
	rcu_read_lock();
repeat:
	xas_reset(&xas);
	folio = xas_load(&xas);
	if (!folio || xa_is_value(folio))
		goto out;

	if (!folio_try_get_rcu(folio))
		goto repeat;

folio_try_get_rcu():
	if (unlikely(!folio_ref_add_unless(folio, count, 0))) {
		/* Either the folio has been freed, or will be freed. */
		return false;
	}

Because the source folio is freed in 4.b.2), the above loop can continue
until the destination folio too is reclaimed: it is then removed from the
swap cache and the swap cache entry is set to zero, so xas_load()
returns 0 and the loop exits.
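
The stall described above can be reproduced in miniature in userspace. The
following is only a sketch and not kernel code: struct toy_folio, struct
toy_cache and every toy_* function are hypothetical stand-ins, the swap
cache is modelled as a plain array rather than an xarray, and the retry
loop is bounded so that the demo terminates (the kernel loop has no such
bound).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct folio: only a reference count. */
struct toy_folio {
	int refcount;		/* 0 means freed, or about to be freed */
};

#define TOY_NR_SLOTS 8

/* Hypothetical stand-in for the swap cache: one slot per swap entry. */
struct toy_cache {
	struct toy_folio *slots[TOY_NR_SLOTS];
};

/* Models folio_try_get_rcu(): refuses a reference on a dead folio. */
static bool toy_try_get(struct toy_folio *folio)
{
	if (folio->refcount == 0)
		return false;
	folio->refcount++;
	return true;
}

/*
 * Models the retry loop of mapping_get_entry(). max_retries exists only
 * so this demo terminates; *stalled is set when the bound is reached.
 */
static struct toy_folio *toy_lookup(struct toy_cache *cache, size_t index,
				    int max_retries, bool *stalled)
{
	*stalled = false;
	for (int tries = 0; tries < max_retries; tries++) {
		struct toy_folio *folio = cache->slots[index];

		if (!folio)
			return NULL;	/* empty slot: the loop exits */
		if (toy_try_get(folio))
			return folio;
		/* "goto repeat": the slot still points at a dead folio */
	}
	*stalled = true;
	return NULL;
}

/* A large folio occupies nr consecutive slots, one per swap entry. */
static void toy_add(struct toy_cache *cache, size_t base, size_t nr,
		    struct toy_folio *folio)
{
	for (size_t i = 0; i < nr; i++)
		cache->slots[base + i] = folio;
}

/* Store dst into the first `stored` slots, then free src. */
static void toy_migrate(struct toy_cache *cache, size_t base, size_t stored,
			struct toy_folio *src, struct toy_folio *dst)
{
	for (size_t i = 0; i < stored; i++)
		cache->slots[base + i] = dst;
	dst->refcount = 1;
	src->refcount = 0;
}

/* Returns true if a lookup on a tail slot (index 2) hits the retry bound. */
static bool toy_tail_lookup_stalls(size_t stored)
{
	struct toy_cache cache = { { NULL } };
	struct toy_folio src = { 1 }, dst = { 0 };
	bool stalled;

	toy_add(&cache, 0, 4, &src);	/* 4-page folio in slots 0..3 */
	toy_migrate(&cache, 0, stored, &src, &dst);
	toy_lookup(&cache, 2, 1000, &stalled);
	return stalled;
}
```

In this model, toy_tail_lookup_stalls(1) corresponds to migration storing
a single entry: slots 1..3 keep pointing at the freed source folio, so the
lookup on slot 2 keeps retrying until the bound is hit.
toy_tail_lookup_stalls(4), shown only for contrast, stores all four slots
and the same lookup succeeds on the first attempt.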

This destination folio can either be removed immediately as part of the
reclaim, or can stay longer in the swap cache because of a parallel swapin
that happens between 3) and 4.b.1) (whose valid pte mappings, pointing to
the source folio, are replaced with the destination folio). It is the
latter case that results in the RCU stalls.

A similar issue was reported some time back and was fixed in [1].

This issue seems to have been introduced by commit 6b24ca4a1a8d ("mm:
Use multi-index entries in the page cache"), in the function
folio_migrate_mapping()[2].

Since a large folio to be migrated and present in the swap cache can't
use multi-index entries, and the migrate code uses the same
folio_migrate_mapping() for migrating this folio, could you please
provide any inputs on how to fix this issue? What I have thought of is:
if the adjacent entry in the xarray is not a sibling, then assume that it
is not a multi-index entry and thus store 2^N consecutive entries.

[1] https://lore.kernel.org/all/20180406030706.GA2434@hori1.linux.bs1.fc.nec.co.jp/T/#u
[2] https://lore.kernel.org/linux-mm/20210715033704.692967-128-willy@infradead.org/#Z31mm:migrate.c

Signed-off-by: Charan Teja Kalla
---
 mm/migrate.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/mm/migrate.c b/mm/migrate.c
index 35a8833..05cb4a9b 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -403,6 +403,7 @@ int folio_migrate_mapping(struct address_space *mapping,
 	XA_STATE(xas, &mapping->i_pages, folio_index(folio));
 	struct zone *oldzone, *newzone;
 	int dirty;
+	void *entry;
 	int expected_count = folio_expected_refs(mapping, folio) + extra_count;
 	long nr = folio_nr_pages(folio);
 
@@ -454,6 +455,16 @@ int folio_migrate_mapping(struct address_space *mapping,
 	}
 
 	xas_store(&xas, newfolio);
+	entry = xas_next(&xas);
+
+	if (nr > 1 && !xa_is_sibling(entry)) {
+		int i;
+
+		for (i = 1; i < nr; ++i) {
+			xas_store(&xas, newfolio);
+			xas_next(&xas);
+		}
+	}
 
 	/*
 	 * Drop cache reference from old page by unfreezing