From patchwork Tue Oct 6 00:48:40 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mauricio Faria de Oliveira X-Patchwork-Id: 11819249 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 43CE46CB for ; Tue, 6 Oct 2020 21:18:55 +0000 (UTC) Received: from userp2120.oracle.com (userp2120.oracle.com [156.151.31.85]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id EB004206BE for ; Tue, 6 Oct 2020 21:18:54 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org EB004206BE Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=canonical.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=ocfs2-devel-bounces@oss.oracle.com Received: from pps.filterd (userp2120.oracle.com [127.0.0.1]) by userp2120.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 096LFS9x141921; Tue, 6 Oct 2020 21:18:38 GMT Received: from userp3030.oracle.com (userp3030.oracle.com [156.151.31.80]) by userp2120.oracle.com with ESMTP id 33xhxmxfy3-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Tue, 06 Oct 2020 21:18:38 +0000 Received: from pps.filterd (userp3030.oracle.com [127.0.0.1]) by userp3030.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 096LG9ap193711; Tue, 6 Oct 2020 21:16:38 GMT Received: from oss.oracle.com (oss-old-reserved.oracle.com [137.254.22.2]) by userp3030.oracle.com with ESMTP id 33y37xkmnc-1 (version=TLSv1 cipher=AES256-SHA bits=256 verify=NO); Tue, 06 Oct 2020 21:16:38 +0000 Received: from localhost ([127.0.0.1] helo=lb-oss.oracle.com) by oss.oracle.com with esmtp (Exim 4.63) (envelope-from ) id 1kPuH2-00055K-4O; Tue, 06 Oct 2020 14:13:28 -0700 Received: from aserp3030.oracle.com ([141.146.126.71]) by oss.oracle.com with esmtp (Exim 4.63) (envelope-from ) id 1kPbA7-0003gD-Ni for ocfs2-devel@oss.oracle.com; Mon, 05 Oct 2020 17:49:03 -0700 Received: from pps.filterd (aserp3030.oracle.com [127.0.0.1]) by aserp3030.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0960VFxv190910 for ; Tue, 6 Oct 2020 00:49:03 GMT Received: from userp2030.oracle.com (userp2030.oracle.com [156.151.31.89]) by aserp3030.oracle.com with ESMTP id 33y2vm9ua1-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK) for ; Tue, 06 Oct 2020 00:49:03 +0000 Received: from pps.filterd (userp2030.oracle.com [127.0.0.1]) by userp2030.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 0960YD1Q027109 for ; Tue, 6 Oct 2020 00:49:02 GMT Received: from youngberry.canonical.com (youngberry.canonical.com [91.189.89.112]) by userp2030.oracle.com with ESMTP id 33xfbgw3fc-1 for ; Tue, 06 Oct 2020 00:49:02 +0000 Received: from mail-qv1-f71.google.com ([209.85.219.71]) by youngberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1kPbA3-0004WJ-PR for ocfs2-devel@oss.oracle.com; Tue, 06 Oct 2020 00:48:59 +0000 Received: by mail-qv1-f71.google.com with SMTP id t7so7165939qvz.5 for ; Mon, 05 Oct 2020 17:48:59 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=M4L0gxyNWJ44GCTGybDq1XuqV+pDl5wyu3pRKrP7PvM=; b=Qrx8W0rfgN7zfH0mCT0Gvnm1Un78p1qoPf6ds6YBxzZqf2aM9ifLQpBHwD/C/tkjxw txwXuHoLczYH0pJusxdUgpLTMn6GXJjLtp5QynIQlkYUWVBfhEEWgNlwuTKG4X4rGApF t3Cyx6MAii3rdoUvs55uxLM6CR45uDQ1PNizKorxe4TFbv0HlJ8m9OGWNhYCgzG8zddC cR4ovrEqazIOwtdRIEGvtg5SOGLeujT5+dPIHd4cX6GY12oHEYjjD8JtLOSSP6nImbTm OFmlvfMvsw2LalMdB7NSXfJLURXywDG+ueyIlPhmbb1mNjIWTQvcj+lNCFVAjTVtfoS1 ldHw== X-Gm-Message-State: AOAM533BxaYH9ayBz9q9cSHT48a4CLyGrQfFzDRi/fqicJ4AqhjJaQcF zhwBQ5IczOkygnf4v1aAsL3dHzp3tdh83sS7FlRpkrAWlNwuyFwIRd/oWs6bQqsdR9zz7gd4R0R wvLTbZuS6IzHQnh1qcLPZJNWeoFO0tgvfF8xOkVY= X-Received: by 2002:aed:2786:: with SMTP id a6mr2675405qtd.92.1601945338558; Mon, 05 Oct 2020 17:48:58 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyDhAj1WgfHdPqHMpVrxVlAVWpeyTVJh0BtqWvI5sQK6pUvWSC9VjTkcgDY7J9EiHW+mNYLIA== X-Received: by 2002:aed:2786:: with SMTP id a6mr2675388qtd.92.1601945338263; Mon, 05 Oct 2020 17:48:58 -0700 (PDT) Received: from localhost.localdomain ([201.82.49.101]) by smtp.gmail.com with ESMTPSA id l125sm1355322qke.23.2020.10.05.17.48.55 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 05 Oct 2020 17:48:57 -0700 (PDT) From: Mauricio Faria de Oliveira To: linux-ext4@vger.kernel.org, ocfs2-devel@oss.oracle.com Date: Mon, 5 Oct 2020 21:48:40 -0300 Message-Id: <20201006004841.600488-4-mfo@canonical.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20201006004841.600488-1-mfo@canonical.com> References: <20201006004841.600488-1-mfo@canonical.com> MIME-Version: 1.0 X-PDR: PASS X-Source-IP: 91.189.89.112 X-ServerName: youngberry.canonical.com X-Proofpoint-SPF-Result: None X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9765 signatures=668680 X-Proofpoint-Spam-Details: rule=tap_notspam policy=tap score=0 spamscore=0 mlxscore=0 clxscore=255 impostorscore=0 lowpriorityscore=0 priorityscore=100 adultscore=0 bulkscore=0 malwarescore=0 suspectscore=0 phishscore=0 mlxlogscore=809 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2006250000 definitions=main-2010060001 X-Spam: Clean X-Mailman-Approved-At: Tue, 06 Oct 2020 14:13:27 -0700 Cc: Andreas Dilger , dann frazier , Jan Kara Subject: [Ocfs2-devel] [PATCH v5 3/4] ext4: data=journal: fixes for ext4_page_mkwrite() X-BeenThere: ocfs2-devel@oss.oracle.com X-Mailman-Version: 2.1.9 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: ocfs2-devel-bounces@oss.oracle.com Errors-To: ocfs2-devel-bounces@oss.oracle.com X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9766 signatures=668680 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 malwarescore=0 suspectscore=0 adultscore=0 mlxlogscore=999 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2006250000 definitions=main-2010060141 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9766 signatures=668680 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 mlxscore=0 malwarescore=0 bulkscore=0 impostorscore=0 lowpriorityscore=0 suspectscore=0 phishscore=0 mlxlogscore=999 adultscore=0 clxscore=1015 spamscore=0 priorityscore=1501 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2006250000 definitions=main-2010060141 These are two fixes for data journalling required by the next patch, discovered while testing it. First, the optimization to return early if all buffers are mapped is not appropriate for the next patch: The inode _must_ be added to the transaction's list in data=journal mode (so to write-protect pages on commit) thus we cannot return early there. Second, once that optimization to reduce transactions was disabled for data=journal mode, more transactions happened, and occasionally hit this warning message: 'JBD2: Spotted dirty metadata buffer'. Reason is, block_page_mkwrite() will set_buffer_dirty() before do_journal_get_write_access() that is there to prevent it. This issue was masked by the optimization. So, on data=journal use __block_write_begin() instead. This also requires page locking and len recalculation. (see block_page_mkwrite() for implementation details.) Finally, as Jan noted there is little sharing between data=journal and other modes in ext4_page_mkwrite(). However, a prototype of ext4_journalled_page_mkwrite() showed there still would be lots of duplicated lines (tens of) that didn't seem worth it. Thus this patch ends up with an ugly goto to skip all non-data journalling code (to avoid long indentations, but that can be changed..) in the beginning, and just a conditional in the transaction section. Well, we skip a common part to data journalling which is the page truncated check, but we do it again after ext4_journal_start() when we re-acquire the page lock (so not to acquire the page lock twice needlessly for data journalling.) Signed-off-by: Mauricio Faria de Oliveira Suggested-by: Jan Kara Reviewed-by: Jan Kara Reviewed-by: Andreas Dilger --- fs/ext4/inode.c | 51 ++++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 44 insertions(+), 7 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index bf596467c234..ac153e340a6f 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -5977,9 +5977,17 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf) if (err) goto out_ret; + /* + * On data journalling we skip straight to the transaction handle: + * there's no delalloc; page truncated will be checked later; the + * early return w/ all buffers mapped (calculates size/len) can't + * be used; and there's no dioread_nolock, so only ext4_get_block. + */ + if (ext4_should_journal_data(inode)) + goto retry_alloc; + /* Delalloc case is easy... */ if (test_opt(inode->i_sb, DELALLOC) && - !ext4_should_journal_data(inode) && !ext4_nonda_switch(inode->i_sb)) { do { err = block_page_mkwrite(vma, vmf, @@ -6005,6 +6013,9 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf) /* * Return if we have all the buffers mapped. This avoids the need to do * journal_start/journal_stop which can block and take a long time + * + * This cannot be done for data journalling, as we have to add the + * inode to the transaction's list to writeprotect pages on commit. */ if (page_has_buffers(page)) { if (!ext4_walk_page_buffers(NULL, page_buffers(page), @@ -6029,16 +6040,42 @@ vm_fault_t ext4_page_mkwrite(struct vm_fault *vmf) ret = VM_FAULT_SIGBUS; goto out; } - err = block_page_mkwrite(vma, vmf, get_block); - if (!err && ext4_should_journal_data(inode)) { - if (ext4_walk_page_buffers(handle, page_buffers(page), 0, - PAGE_SIZE, NULL, do_journal_get_write_access)) { + /* + * Data journalling can't use block_page_mkwrite() because it + * will set_buffer_dirty() before do_journal_get_write_access() + * thus might hit warning messages for dirty metadata buffers. + */ + if (!ext4_should_journal_data(inode)) { + err = block_page_mkwrite(vma, vmf, get_block); + } else { + lock_page(page); + size = i_size_read(inode); + /* Page got truncated from under us? */ + if (page->mapping != mapping || page_offset(page) > size) { unlock_page(page); - ret = VM_FAULT_SIGBUS; + ret = VM_FAULT_NOPAGE; ext4_journal_stop(handle); goto out; } - ext4_set_inode_state(inode, EXT4_STATE_JDATA); + + if (page->index == size >> PAGE_SHIFT) + len = size & ~PAGE_MASK; + else + len = PAGE_SIZE; + + err = __block_write_begin(page, 0, len, ext4_get_block); + if (!err) { + if (ext4_walk_page_buffers(handle, page_buffers(page), + 0, len, NULL, do_journal_get_write_access)) { + unlock_page(page); + ret = VM_FAULT_SIGBUS; + ext4_journal_stop(handle); + goto out; + } + ext4_set_inode_state(inode, EXT4_STATE_JDATA); + } else { + unlock_page(page); + } } ext4_journal_stop(handle); if (err == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))