From patchwork Mon Feb 10 18:12:24 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: "Raphael S. Carvalho"
X-Patchwork-Id: 13968334
From: "Raphael S. Carvalho"
Date: Mon, 10 Feb 2025 15:12:24 -0300
Subject: Possible regression with buffered writes + NOWAIT behavior, under memory pressure
To: linux-xfs@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Cc: djwong@kernel.org, Dave Chinner, hch@lst.de, Avi Kivity
Sender: owner-linux-mm@kvack.org
Precedence: bulk

While running the scylladb test suite, which uses io_uring + buffered writes + XFS,
the system was spuriously returning ENOMEM, despite there being plenty of
available memory to be reclaimed from the page cache. FWIW, I am running:
6.12.9-100.fc40.x86_64

Tracing showed io_uring_complete failing the request with ENOMEM:

# cat /sys/kernel/debug/tracing/trace | grep "result -12" -B 100 | grep "0000000065b91cd1"
       reactor-1-707139  [000] .....  46737.358518: io_uring_submit_req: ring 00000000e52339b8, req 0000000065b91cd1, user_data 0x50f0001e4000, opcode WRITE, flags 0x200000, sq_thread 0
       reactor-1-707139  [000] .....  46737.358526: io_uring_file_get: ring 00000000e52339b8, req 0000000065b91cd1, user_data 0x50f0001e4000, fd 45
       reactor-1-707139  [000] ...1.  46737.358560: io_uring_complete: ring 00000000e52339b8, req 0000000065b91cd1, user_data 0x50f0001e4000, result -12, cflags 0x0 extra1 0 extra2 0

That puzzled me. Using retsnoop, it pointed to iomap_get_folio:

00:34:16.180612 -> 00:34:16.180651 TID/PID 253786/253721 (reactor-1/combined_tests):

                    entry_SYSCALL_64_after_hwframe+0x76
                    do_syscall_64+0x82
                    __do_sys_io_uring_enter+0x265
                    io_submit_sqes+0x209
                    io_issue_sqe+0x5b
                    io_write+0xdd
                    xfs_file_buffered_write+0x84
                    iomap_file_buffered_write+0x1a6
    32us [-ENOMEM]  iomap_write_begin+0x408
                    iter=&{.inode=0xffff8c67aa031138,.len=4096,.flags=33,.iomap={.addr=0xffffffffffffffff,.length=4096,.type=1,.flags=3,.bdev=0x…
                    pos=0
                    len=4096
                    foliop=0xffffb32c296b7b80
!    4us [-ENOMEM]  iomap_get_folio
                    iter=&{.inode=0xffff8c67aa031138,.len=4096,.flags=33,.iomap={.addr=0xffffffffffffffff,.length=4096,.type=1,.flags=3,.bdev=0x…
                    pos=0
                    len=4096

Another trace shows iomap_file_buffered_write with ki_flags 2359304, which
translates into (IOCB_WRITE | IOCB_ALLOC_CACHE | IOCB_NOWAIT).

And flags 33 in iomap_get_folio means IOMAP_NOWAIT is set, which makes sense
since XFS translates IOCB_NOWAIT into IOMAP_NOWAIT for performing the
buffered write through the iomap subsystem:

fs/iomap/buffered-io.c-        if (iocb->ki_flags & IOCB_NOWAIT)
fs/iomap/buffered-io.c:                iter.flags |= IOMAP_NOWAIT;

We know io_uring works by first attempting the write with IOCB_NOWAIT, and
if it fails with EAGAIN, it falls back to a worker thread without the NOWAIT
semantics.

iomap_get_folio(), once called with IOMAP_NOWAIT, will request the
allocation to follow GFP_NOWAIT behavior, so the allocation can potentially
fail under pressure.

Coming across 'iomap: Add async buffered write support', I see Darrick wrote:

"FGP_NOWAIT can cause __filemap_get_folio to return a NULL folio, which
makes iomap_write_begin return -ENOMEM. If nothing has been written yet,
won't that cause the ENOMEM to escape to userspace? Why do we want that
instead of EAGAIN?"

In the patch 'mm: return an ERR_PTR from __filemap_get_folio', I see the
following changes:

--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -468,19 +468,12 @@ EXPORT_SYMBOL_GPL(iomap_is_partially_uptodate);
 struct folio *iomap_get_folio(struct iomap_iter *iter, loff_t pos)
 {
 	unsigned fgp = FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE | FGP_NOFS;
-	struct folio *folio;

 	if (iter->flags & IOMAP_NOWAIT)
 		fgp |= FGP_NOWAIT;

-	folio = __filemap_get_folio(iter->inode->i_mapping, pos >> PAGE_SHIFT,
+	return __filemap_get_folio(iter->inode->i_mapping, pos >> PAGE_SHIFT,
 			fgp, mapping_gfp_mask(iter->inode->i_mapping));
-	if (folio)
-		return folio;
-
-	if (iter->flags & IOMAP_NOWAIT)
-		return ERR_PTR(-EAGAIN);
-	return ERR_PTR(-ENOMEM);
 }

This leads me to believe we have a regression in this area, after that
patch, since iomap_get_folio() no longer returns EAGAIN with IOMAP_NOWAIT
if __filemap_get_folio() failed to get a folio. Now it returns ENOMEM
unconditionally.

Since we pushed the error picking decision to __filemap_get_folio, I think
it makes sense for us to patch it such that it returns EAGAIN if the
allocation failed (under pressure) because IOMAP_NOWAIT was requested by
its caller and the allocation is not allowed to block waiting for the
reclaimer to do its thing.

A possible way to fix it is this one-liner, but I am not well versed in
this area, so someone may end up suggesting a better fix:

diff --git a/mm/filemap.c b/mm/filemap.c
index 804d7365680c..9e698a619545 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1964,7 +1964,7 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
 		do {
 			gfp_t alloc_gfp = gfp;

-			err = -ENOMEM;
+			err = (fgp_flags & FGP_NOWAIT) ? -EAGAIN : -ENOMEM;
 			if (order > min_order)
 				alloc_gfp |= __GFP_NORETRY | __GFP_NOWARN;
 			folio = filemap_alloc_folio(alloc_gfp, order);

Am I missing something?

Regards,
Raphael
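
For readers who want to check the reasoning above, here is a minimal userspace sketch of the failure mode. It is a model and not kernel code. The flag values are assumed to match include/linux/fs.h and include/linux/iomap.h in 6.12, and model_get_folio(), model_submit_write() and check_flag_decoding() are hypothetical names made up for illustration. The eagain_on_nowait parameter selects between the old behavior (EAGAIN for a failed NOWAIT allocation) and the behavior reported in this email (ENOMEM).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Assumed values, copied by hand from the kernel headers (not included here). */
#define IOCB_NOWAIT      0x00000008
#define IOCB_WRITE       (1 << 18)
#define IOCB_ALLOC_CACHE (1 << 21)
#define IOMAP_WRITE      (1 << 0)
#define IOMAP_NOWAIT     (1 << 5)

/* Check the flag arithmetic quoted in the traces. */
void check_flag_decoding(void)
{
	assert((IOCB_WRITE | IOCB_ALLOC_CACHE | IOCB_NOWAIT) == 2359304);
	assert((IOMAP_WRITE | IOMAP_NOWAIT) == 33);
}

/*
 * Hypothetical stand-in for a folio lookup under memory pressure:
 * a non-blocking allocation fails, a blocking one succeeds after reclaim.
 */
static int model_get_folio(unsigned int iomap_flags, bool eagain_on_nowait)
{
	if (!(iomap_flags & IOMAP_NOWAIT))
		return 0;
	return eagain_on_nowait ? -EAGAIN : -ENOMEM;
}

/*
 * Hypothetical model of the submit path: try with NOWAIT first and
 * retry in blocking mode only when the first attempt returns -EAGAIN.
 */
int model_submit_write(int ki_flags, bool eagain_on_nowait)
{
	unsigned int iomap_flags = IOMAP_WRITE;
	int ret;

	if (ki_flags & IOCB_NOWAIT)
		iomap_flags |= IOMAP_NOWAIT;

	ret = model_get_folio(iomap_flags, eagain_on_nowait);
	if (ret == -EAGAIN)
		ret = model_get_folio(iomap_flags & ~IOMAP_NOWAIT,
				      eagain_on_nowait);
	return ret;
}
```

In this model, model_submit_write(2359304, false) returns -ENOMEM because the blocking retry is never reached, while model_submit_write(2359304, true) returns 0 after the retry. That matches the "result -12" seen in the io_uring_complete trace.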