From patchwork Tue May 3 22:17:19 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Chinner X-Patchwork-Id: 12836447 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1000AC433FE for ; Tue, 3 May 2022 22:17:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235028AbiECWVJ (ORCPT ); Tue, 3 May 2022 18:21:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:32964 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235359AbiECWVG (ORCPT ); Tue, 3 May 2022 18:21:06 -0400 Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au [211.29.132.249]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 573BC3F33A for ; Tue, 3 May 2022 15:17:33 -0700 (PDT) Received: from dread.disaster.area (pa49-181-2-147.pa.nsw.optusnet.com.au [49.181.2.147]) by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id 1D9F610E6188 for ; Wed, 4 May 2022 08:17:31 +1000 (AEST) Received: from discord.disaster.area ([192.168.253.110]) by dread.disaster.area with esmtp (Exim 4.92.3) (envelope-from ) id 1nm0pm-007gGA-03 for linux-xfs@vger.kernel.org; Wed, 04 May 2022 08:17:30 +1000 Received: from dave by discord.disaster.area with local (Exim 4.95) (envelope-from ) id 1nm0pl-000mGB-V0 for linux-xfs@vger.kernel.org; Wed, 04 May 2022 08:17:29 +1000 From: Dave Chinner To: linux-xfs@vger.kernel.org Subject: [PATCH 01/10] xfs: zero inode fork buffer at allocation Date: Wed, 4 May 2022 08:17:19 +1000 Message-Id: <20220503221728.185449-2-david@fromorbit.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220503221728.185449-1-david@fromorbit.com> References: <20220503221728.185449-1-david@fromorbit.com> MIME-Version: 1.0 X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.4 cv=e9dl9Yl/ c=1 sm=1 tr=0 ts=6271a9fb a=ivVLWpVy4j68lT4lJFbQgw==:117 a=ivVLWpVy4j68lT4lJFbQgw==:17 a=oZkIemNP1mAA:10 a=20KFwNOVAAAA:8 a=Rdbp8xITySdt82h7jNoA:9 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Dave Chinner When we first allocate or resize an inline inode fork, we round up the allocation to 4 byte alingment to make journal alignment constraints. We don't clear the unused bytes, so we can copy up to three uninitialised bytes into the journal. Zero those bytes so we only ever copy zeros into the journal. Signed-off-by: Dave Chinner Reviewed-by: Darrick J. Wong Reviewed-by: Allison Henderson Reviewed-by: Christoph Hellwig --- fs/xfs/libxfs/xfs_inode_fork.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c index 9aee4a1e2fe9..a15ff38c3d41 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.c +++ b/fs/xfs/libxfs/xfs_inode_fork.c @@ -50,8 +50,13 @@ xfs_init_local_fork( mem_size++; if (size) { + /* + * As we round up the allocation here, we need to ensure the + * bytes we don't copy data into are zeroed because the log + * vectors still copy them into the journal. + */ real_size = roundup(mem_size, 4); - ifp->if_u1.if_data = kmem_alloc(real_size, KM_NOFS); + ifp->if_u1.if_data = kmem_zalloc(real_size, KM_NOFS); memcpy(ifp->if_u1.if_data, data, size); if (zero_terminate) ifp->if_u1.if_data[size] = '\0'; @@ -500,10 +505,11 @@ xfs_idata_realloc( /* * For inline data, the underlying buffer must be a multiple of 4 bytes * in size so that it can be logged and stay on word boundaries. - * We enforce that here. + * We enforce that here, and use __GFP_ZERO to ensure that size + * extensions always zero the unused roundup area. */ ifp->if_u1.if_data = krealloc(ifp->if_u1.if_data, roundup(new_size, 4), - GFP_NOFS | __GFP_NOFAIL); + GFP_NOFS | __GFP_NOFAIL | __GFP_ZERO); ifp->if_bytes = new_size; } From patchwork Tue May 3 22:17:20 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Chinner X-Patchwork-Id: 12836450 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EBD73C433F5 for ; Tue, 3 May 2022 22:17:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243562AbiECWVL (ORCPT ); Tue, 3 May 2022 18:21:11 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:32964 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231801AbiECWVI (ORCPT ); Tue, 3 May 2022 18:21:08 -0400 Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au [211.29.132.249]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 573203F311 for ; Tue, 3 May 2022 15:17:33 -0700 (PDT) Received: from dread.disaster.area (pa49-181-2-147.pa.nsw.optusnet.com.au [49.181.2.147]) by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id 2460810E6126 for ; Wed, 4 May 2022 08:17:31 +1000 (AEST) Received: from discord.disaster.area ([192.168.253.110]) by dread.disaster.area with esmtp (Exim 4.92.3) (envelope-from ) id 1nm0pm-007gGC-1L for linux-xfs@vger.kernel.org; Wed, 04 May 2022 08:17:30 +1000 Received: from dave by discord.disaster.area with local (Exim 4.95) (envelope-from ) id 1nm0pm-000mGF-05 for linux-xfs@vger.kernel.org; Wed, 04 May 2022 08:17:30 +1000 From: Dave Chinner To: linux-xfs@vger.kernel.org Subject: [PATCH 02/10] xfs: fix potential log item leak Date: Wed, 4 May 2022 08:17:20 +1000 Message-Id: <20220503221728.185449-3-david@fromorbit.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220503221728.185449-1-david@fromorbit.com> References: <20220503221728.185449-1-david@fromorbit.com> MIME-Version: 1.0 X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.4 cv=deDjYVbe c=1 sm=1 tr=0 ts=6271a9fb a=ivVLWpVy4j68lT4lJFbQgw==:117 a=ivVLWpVy4j68lT4lJFbQgw==:17 a=oZkIemNP1mAA:10 a=20KFwNOVAAAA:8 a=8eoKa81H350I2gJdJ9oA:9 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Dave Chinner Ever since we added shadown format buffers to the log items, log items need to handle the item being released with shadow buffers attached. Due to the fact this requirement was added at the same time we added new rmap/reflink intents, we missed the cleanup of those items. In theory, this means shadow buffers can be leaked in a very small window when a shutdown is initiated. Testing with KASAN shows this leak does not happen in practice - we haven't identified a single leak in several years of shutdown testing since ~v4.8 kernels. However, the intent whiteout cleanup mechanism results in every cancelled intent in exactly the same state as this tiny race window creates and so if intents down clean up shadow buffers on final release we will leak the shadow buffer for just about every intent we create. Hence we start with this patch to close this condition off and ensure that when whiteouts start to be used we don't leak lots of memory. Signed-off-by: Dave Chinner Reviewed-by: Allison Henderson Reviewed-by: Darrick J. Wong Reviewed-by: Christoph Hellwig --- fs/xfs/xfs_bmap_item.c | 2 ++ fs/xfs/xfs_icreate_item.c | 1 + fs/xfs/xfs_refcount_item.c | 2 ++ fs/xfs/xfs_rmap_item.c | 2 ++ 4 files changed, 7 insertions(+) diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c index 593ac29cffc7..2c8b686e2a11 100644 --- a/fs/xfs/xfs_bmap_item.c +++ b/fs/xfs/xfs_bmap_item.c @@ -39,6 +39,7 @@ STATIC void xfs_bui_item_free( struct xfs_bui_log_item *buip) { + kmem_free(buip->bui_item.li_lv_shadow); kmem_cache_free(xfs_bui_cache, buip); } @@ -198,6 +199,7 @@ xfs_bud_item_release( struct xfs_bud_log_item *budp = BUD_ITEM(lip); xfs_bui_release(budp->bud_buip); + kmem_free(budp->bud_item.li_lv_shadow); kmem_cache_free(xfs_bud_cache, budp); } diff --git a/fs/xfs/xfs_icreate_item.c b/fs/xfs/xfs_icreate_item.c index 508e184e3b8f..b05314d48176 100644 --- a/fs/xfs/xfs_icreate_item.c +++ b/fs/xfs/xfs_icreate_item.c @@ -63,6 +63,7 @@ STATIC void xfs_icreate_item_release( struct xfs_log_item *lip) { + kmem_free(ICR_ITEM(lip)->ic_item.li_lv_shadow); kmem_cache_free(xfs_icreate_cache, ICR_ITEM(lip)); } diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c index 0d868c93144d..10474fe389e1 100644 --- a/fs/xfs/xfs_refcount_item.c +++ b/fs/xfs/xfs_refcount_item.c @@ -35,6 +35,7 @@ STATIC void xfs_cui_item_free( struct xfs_cui_log_item *cuip) { + kmem_free(cuip->cui_item.li_lv_shadow); if (cuip->cui_format.cui_nextents > XFS_CUI_MAX_FAST_EXTENTS) kmem_free(cuip); else @@ -204,6 +205,7 @@ xfs_cud_item_release( struct xfs_cud_log_item *cudp = CUD_ITEM(lip); xfs_cui_release(cudp->cud_cuip); + kmem_free(cudp->cud_item.li_lv_shadow); kmem_cache_free(xfs_cud_cache, cudp); } diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c index a22b2d19ef91..6c0b56ebdbe1 100644 --- a/fs/xfs/xfs_rmap_item.c +++ b/fs/xfs/xfs_rmap_item.c @@ -35,6 +35,7 @@ STATIC void xfs_rui_item_free( struct xfs_rui_log_item *ruip) { + kmem_free(ruip->rui_item.li_lv_shadow); if (ruip->rui_format.rui_nextents > XFS_RUI_MAX_FAST_EXTENTS) kmem_free(ruip); else @@ -227,6 +228,7 @@ xfs_rud_item_release( struct xfs_rud_log_item *rudp = RUD_ITEM(lip); xfs_rui_release(rudp->rud_ruip); + kmem_free(rudp->rud_item.li_lv_shadow); kmem_cache_free(xfs_rud_cache, rudp); } From patchwork Tue May 3 22:17:21 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Chinner X-Patchwork-Id: 12836449 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 41790C433EF for ; Tue, 3 May 2022 22:17:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243566AbiECWVK (ORCPT ); Tue, 3 May 2022 18:21:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:32980 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S243562AbiECWVI (ORCPT ); Tue, 3 May 2022 18:21:08 -0400 Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au [211.29.132.246]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 4269B3EF3D for ; Tue, 3 May 2022 15:17:32 -0700 (PDT) Received: from dread.disaster.area (pa49-181-2-147.pa.nsw.optusnet.com.au [49.181.2.147]) by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id 248745345A1 for ; Wed, 4 May 2022 08:17:31 +1000 (AEST) Received: from discord.disaster.area ([192.168.253.110]) by dread.disaster.area with esmtp (Exim 4.92.3) (envelope-from ) id 1nm0pm-007gGF-2S for linux-xfs@vger.kernel.org; Wed, 04 May 2022 08:17:30 +1000 Received: from dave by discord.disaster.area with local (Exim 4.95) (envelope-from ) id 1nm0pm-000mGJ-1V for linux-xfs@vger.kernel.org; Wed, 04 May 2022 08:17:30 +1000 From: Dave Chinner To: linux-xfs@vger.kernel.org Subject: [PATCH 03/10] xfs: hide log iovec alignment constraints Date: Wed, 4 May 2022 08:17:21 +1000 Message-Id: <20220503221728.185449-4-david@fromorbit.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220503221728.185449-1-david@fromorbit.com> References: <20220503221728.185449-1-david@fromorbit.com> MIME-Version: 1.0 X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.4 cv=VuxAv86n c=1 sm=1 tr=0 ts=6271a9fb a=ivVLWpVy4j68lT4lJFbQgw==:117 a=ivVLWpVy4j68lT4lJFbQgw==:17 a=oZkIemNP1mAA:10 a=20KFwNOVAAAA:8 a=yPCof4ZbAAAA:8 a=Ic1sBYCpAwEJxfN7YPEA:9 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Dave Chinner Callers currently have to round out the size of buffers to match the aligment constraints of log iovecs and xlog_write(). They should not need to know this detail, so introduce a new function to calculate the iovec length (for use in ->iop_size implementations). Also modify xlog_finish_iovec() to round up the length to the correct alignment so the callers don't need to do this, either. Convert the only user - inode forks - of this alignment rounding to use the new interface. Signed-off-by: Dave Chinner Reviewed-by: Christoph Hellwig Reviewed-by: Allison Henderson Reviewed-by: Darrick J. Wong --- fs/xfs/libxfs/xfs_inode_fork.c | 20 ++++------------ fs/xfs/xfs_inode_item.c | 25 ++++++-------------- fs/xfs/xfs_inode_item_recover.c | 4 ++-- fs/xfs/xfs_log.h | 42 ++++++++++++++++++++++++++++++--- 4 files changed, 52 insertions(+), 39 deletions(-) diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c index a15ff38c3d41..1a4cdf550f6d 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.c +++ b/fs/xfs/libxfs/xfs_inode_fork.c @@ -36,7 +36,7 @@ xfs_init_local_fork( int64_t size) { struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, whichfork); - int mem_size = size, real_size = 0; + int mem_size = size; bool zero_terminate; /* @@ -50,13 +50,7 @@ xfs_init_local_fork( mem_size++; if (size) { - /* - * As we round up the allocation here, we need to ensure the - * bytes we don't copy data into are zeroed because the log - * vectors still copy them into the journal. - */ - real_size = roundup(mem_size, 4); - ifp->if_u1.if_data = kmem_zalloc(real_size, KM_NOFS); + ifp->if_u1.if_data = kmem_alloc(mem_size, KM_NOFS); memcpy(ifp->if_u1.if_data, data, size); if (zero_terminate) ifp->if_u1.if_data[size] = '\0'; @@ -502,14 +496,8 @@ xfs_idata_realloc( return; } - /* - * For inline data, the underlying buffer must be a multiple of 4 bytes - * in size so that it can be logged and stay on word boundaries. - * We enforce that here, and use __GFP_ZERO to ensure that size - * extensions always zero the unused roundup area. - */ - ifp->if_u1.if_data = krealloc(ifp->if_u1.if_data, roundup(new_size, 4), - GFP_NOFS | __GFP_NOFAIL | __GFP_ZERO); + ifp->if_u1.if_data = krealloc(ifp->if_u1.if_data, new_size, + GFP_NOFS | __GFP_NOFAIL); ifp->if_bytes = new_size; } diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c index 00733a18ccdc..721def0639fd 100644 --- a/fs/xfs/xfs_inode_item.c +++ b/fs/xfs/xfs_inode_item.c @@ -71,7 +71,7 @@ xfs_inode_item_data_fork_size( case XFS_DINODE_FMT_LOCAL: if ((iip->ili_fields & XFS_ILOG_DDATA) && ip->i_df.if_bytes > 0) { - *nbytes += roundup(ip->i_df.if_bytes, 4); + *nbytes += xlog_calc_iovec_len(ip->i_df.if_bytes); *nvecs += 1; } break; @@ -112,7 +112,7 @@ xfs_inode_item_attr_fork_size( case XFS_DINODE_FMT_LOCAL: if ((iip->ili_fields & XFS_ILOG_ADATA) && ip->i_afp->if_bytes > 0) { - *nbytes += roundup(ip->i_afp->if_bytes, 4); + *nbytes += xlog_calc_iovec_len(ip->i_afp->if_bytes); *nvecs += 1; } break; @@ -204,17 +204,12 @@ xfs_inode_item_format_data_fork( ~(XFS_ILOG_DEXT | XFS_ILOG_DBROOT | XFS_ILOG_DEV); if ((iip->ili_fields & XFS_ILOG_DDATA) && ip->i_df.if_bytes > 0) { - /* - * Round i_bytes up to a word boundary. - * The underlying memory is guaranteed - * to be there by xfs_idata_realloc(). - */ - data_bytes = roundup(ip->i_df.if_bytes, 4); ASSERT(ip->i_df.if_u1.if_data != NULL); ASSERT(ip->i_disk_size > 0); xlog_copy_iovec(lv, vecp, XLOG_REG_TYPE_ILOCAL, - ip->i_df.if_u1.if_data, data_bytes); - ilf->ilf_dsize = (unsigned)data_bytes; + ip->i_df.if_u1.if_data, + ip->i_df.if_bytes); + ilf->ilf_dsize = (unsigned)ip->i_df.if_bytes; ilf->ilf_size++; } else { iip->ili_fields &= ~XFS_ILOG_DDATA; @@ -288,17 +283,11 @@ xfs_inode_item_format_attr_fork( if ((iip->ili_fields & XFS_ILOG_ADATA) && ip->i_afp->if_bytes > 0) { - /* - * Round i_bytes up to a word boundary. - * The underlying memory is guaranteed - * to be there by xfs_idata_realloc(). - */ - data_bytes = roundup(ip->i_afp->if_bytes, 4); ASSERT(ip->i_afp->if_u1.if_data != NULL); xlog_copy_iovec(lv, vecp, XLOG_REG_TYPE_IATTR_LOCAL, ip->i_afp->if_u1.if_data, - data_bytes); - ilf->ilf_asize = (unsigned)data_bytes; + ip->i_afp->if_bytes); + ilf->ilf_asize = (unsigned)ip->i_afp->if_bytes; ilf->ilf_size++; } else { iip->ili_fields &= ~XFS_ILOG_ADATA; diff --git a/fs/xfs/xfs_inode_item_recover.c b/fs/xfs/xfs_inode_item_recover.c index 6d44f5fd6d7e..d28ffaebd067 100644 --- a/fs/xfs/xfs_inode_item_recover.c +++ b/fs/xfs/xfs_inode_item_recover.c @@ -462,7 +462,7 @@ xlog_recover_inode_commit_pass2( ASSERT(in_f->ilf_size <= 4); ASSERT((in_f->ilf_size == 3) || (fields & XFS_ILOG_AFORK)); ASSERT(!(fields & XFS_ILOG_DFORK) || - (len == in_f->ilf_dsize)); + (len == xlog_calc_iovec_len(in_f->ilf_dsize))); switch (fields & XFS_ILOG_DFORK) { case XFS_ILOG_DDATA: @@ -497,7 +497,7 @@ xlog_recover_inode_commit_pass2( } len = item->ri_buf[attr_index].i_len; src = item->ri_buf[attr_index].i_addr; - ASSERT(len == in_f->ilf_asize); + ASSERT(len == xlog_calc_iovec_len(in_f->ilf_asize)); switch (in_f->ilf_fields & XFS_ILOG_AFORK) { case XFS_ILOG_ADATA: diff --git a/fs/xfs/xfs_log.h b/fs/xfs/xfs_log.h index 8dafe8f771c7..3a4f6a4e4eb7 100644 --- a/fs/xfs/xfs_log.h +++ b/fs/xfs/xfs_log.h @@ -21,23 +21,59 @@ struct xfs_log_vec { #define XFS_LOG_VEC_ORDERED (-1) +/* + * Calculate the log iovec length for a given user buffer length. Intended to be + * used by ->iop_size implementations when sizing buffers of arbitrary + * alignments. + */ +static inline int +xlog_calc_iovec_len(int len) +{ + return roundup(len, sizeof(uint32_t)); +} + void *xlog_prepare_iovec(struct xfs_log_vec *lv, struct xfs_log_iovec **vecp, uint type); static inline void -xlog_finish_iovec(struct xfs_log_vec *lv, struct xfs_log_iovec *vec, int len) +xlog_finish_iovec(struct xfs_log_vec *lv, struct xfs_log_iovec *vec, + int data_len) { struct xlog_op_header *oph = vec->i_addr; - - /* opheader tracks payload length, logvec tracks region length */ + int len; + + /* + * Always round up the length to the correct alignment so callers don't + * need to know anything about this log vec layout requirement. This + * means we have to zero the area the data to be written does not cover. + * This is complicated by fact the payload region is offset into the + * logvec region by the opheader that tracks the payload. + */ + len = xlog_calc_iovec_len(data_len); + if (len - data_len != 0) { + char *buf = vec->i_addr + sizeof(struct xlog_op_header); + + memset(buf + data_len, 0, len - data_len); + } + + /* + * The opheader tracks aligned payload length, whilst the logvec tracks + * the overall region length. + */ oph->oh_len = cpu_to_be32(len); len += sizeof(struct xlog_op_header); lv->lv_buf_len += len; lv->lv_bytes += len; vec->i_len = len; + + /* Catch buffer overruns */ + ASSERT((void *)lv->lv_buf + lv->lv_bytes <= (void *)lv + lv->lv_size); } +/* + * Copy the amount of data requested by the caller into a new log iovec. + */ static inline void * xlog_copy_iovec(struct xfs_log_vec *lv, struct xfs_log_iovec **vecp, uint type, void *data, int len) From patchwork Tue May 3 22:17:22 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Chinner X-Patchwork-Id: 12836448 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 48350C4332F for ; Tue, 3 May 2022 22:17:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235359AbiECWVK (ORCPT ); Tue, 3 May 2022 18:21:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:32978 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S243555AbiECWVI (ORCPT ); Tue, 3 May 2022 18:21:08 -0400 Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au [211.29.132.246]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 4217F3EBB9 for ; Tue, 3 May 2022 15:17:32 -0700 (PDT) Received: from dread.disaster.area (pa49-181-2-147.pa.nsw.optusnet.com.au [49.181.2.147]) by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id DCE0D53459F for ; Wed, 4 May 2022 08:17:30 +1000 (AEST) Received: from discord.disaster.area ([192.168.253.110]) by dread.disaster.area with esmtp (Exim 4.92.3) (envelope-from ) id 1nm0pm-007gGH-3W for linux-xfs@vger.kernel.org; Wed, 04 May 2022 08:17:30 +1000 Received: from dave by discord.disaster.area with local (Exim 4.95) (envelope-from ) id 1nm0pm-000mGO-2b for linux-xfs@vger.kernel.org; Wed, 04 May 2022 08:17:30 +1000 From: Dave Chinner To: linux-xfs@vger.kernel.org Subject: [PATCH 04/10] xfs: don't commit the first deferred transaction without intents Date: Wed, 4 May 2022 08:17:22 +1000 Message-Id: <20220503221728.185449-5-david@fromorbit.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220503221728.185449-1-david@fromorbit.com> References: <20220503221728.185449-1-david@fromorbit.com> MIME-Version: 1.0 X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.4 cv=e9dl9Yl/ c=1 sm=1 tr=0 ts=6271a9fa a=ivVLWpVy4j68lT4lJFbQgw==:117 a=ivVLWpVy4j68lT4lJFbQgw==:17 a=oZkIemNP1mAA:10 a=20KFwNOVAAAA:8 a=VwQbUJbxAAAA:8 a=yPCof4ZbAAAA:8 a=9y3LhuyKLFJyHClU7WAA:9 a=AjGcO6oz07-iQ99wixmX:22 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Dave Chinner If the first operation in a string of defer ops has no intents, then there is no reason to commit it before running the first call to xfs_defer_finish_one(). This allows the defer ops to be used effectively for non-intent based operations without requiring an unnecessary extra transaction commit when first called. This fixes a regression in per-attribute modification transaction count when delayed attributes are not being used. Signed-off-by: Dave Chinner Reviewed-by: Darrick J. Wong Reviewed-by: Christoph Hellwig Reviewed-by: Allison Henderson --- fs/xfs/libxfs/xfs_defer.c | 30 ++++++++++++++++++------------ 1 file changed, 18 insertions(+), 12 deletions(-) diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c index 0805ade2d300..1aa32bfdf0cc 100644 --- a/fs/xfs/libxfs/xfs_defer.c +++ b/fs/xfs/libxfs/xfs_defer.c @@ -186,7 +186,7 @@ static const struct xfs_defer_op_type *defer_op_types[] = { [XFS_DEFER_OPS_TYPE_AGFL_FREE] = &xfs_agfl_free_defer_type, }; -static void +static bool xfs_defer_create_intent( struct xfs_trans *tp, struct xfs_defer_pending *dfp, @@ -197,6 +197,7 @@ xfs_defer_create_intent( if (!dfp->dfp_intent) dfp->dfp_intent = ops->create_intent(tp, &dfp->dfp_work, dfp->dfp_count, sort); + return dfp->dfp_intent != NULL; } /* @@ -204,16 +205,18 @@ xfs_defer_create_intent( * associated extents, then add the entire intake list to the end of * the pending list. */ -STATIC void +static bool xfs_defer_create_intents( struct xfs_trans *tp) { struct xfs_defer_pending *dfp; + bool ret = false; list_for_each_entry(dfp, &tp->t_dfops, dfp_list) { trace_xfs_defer_create_intent(tp->t_mountp, dfp); - xfs_defer_create_intent(tp, dfp, true); + ret |= xfs_defer_create_intent(tp, dfp, true); } + return ret; } /* Abort all the intents that were committed. */ @@ -487,7 +490,7 @@ int xfs_defer_finish_noroll( struct xfs_trans **tp) { - struct xfs_defer_pending *dfp; + struct xfs_defer_pending *dfp = NULL; int error = 0; LIST_HEAD(dop_pending); @@ -506,17 +509,20 @@ xfs_defer_finish_noroll( * of time that any one intent item can stick around in memory, * pinning the log tail. */ - xfs_defer_create_intents(*tp); + bool has_intents = xfs_defer_create_intents(*tp); + list_splice_init(&(*tp)->t_dfops, &dop_pending); - error = xfs_defer_trans_roll(tp); - if (error) - goto out_shutdown; + if (has_intents || dfp) { + error = xfs_defer_trans_roll(tp); + if (error) + goto out_shutdown; - /* Possibly relog intent items to keep the log moving. */ - error = xfs_defer_relog(tp, &dop_pending); - if (error) - goto out_shutdown; + /* Relog intent items to keep the log moving. */ + error = xfs_defer_relog(tp, &dop_pending); + if (error) + goto out_shutdown; + } dfp = list_first_entry(&dop_pending, struct xfs_defer_pending, dfp_list); From patchwork Tue May 3 22:17:23 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Chinner X-Patchwork-Id: 12836462 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id F0551C433EF for ; Tue, 3 May 2022 22:35:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230240AbiECWjA (ORCPT ); Tue, 3 May 2022 18:39:00 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56418 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237198AbiECWiv (ORCPT ); Tue, 3 May 2022 18:38:51 -0400 Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au [211.29.132.249]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id A78FABC83 for ; Tue, 3 May 2022 15:35:17 -0700 (PDT) Received: from dread.disaster.area (pa49-181-2-147.pa.nsw.optusnet.com.au [49.181.2.147]) by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id A1DCF10E6247 for ; Wed, 4 May 2022 08:35:16 +1000 (AEST) Received: from discord.disaster.area ([192.168.253.110]) by dread.disaster.area with esmtp (Exim 4.92.3) (envelope-from ) id 1nm0pm-007gGI-4R for linux-xfs@vger.kernel.org; Wed, 04 May 2022 08:17:30 +1000 Received: from dave by discord.disaster.area with local (Exim 4.95) (envelope-from ) id 1nm0pm-000mGU-3U for linux-xfs@vger.kernel.org; Wed, 04 May 2022 08:17:30 +1000 From: Dave Chinner To: linux-xfs@vger.kernel.org Subject: [PATCH 05/10] xfs: add log item flags to indicate intents Date: Wed, 4 May 2022 08:17:23 +1000 Message-Id: <20220503221728.185449-6-david@fromorbit.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220503221728.185449-1-david@fromorbit.com> References: <20220503221728.185449-1-david@fromorbit.com> MIME-Version: 1.0 X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.4 cv=VuxAv86n c=1 sm=1 tr=0 ts=6271ae24 a=ivVLWpVy4j68lT4lJFbQgw==:117 a=ivVLWpVy4j68lT4lJFbQgw==:17 a=oZkIemNP1mAA:10 a=20KFwNOVAAAA:8 a=yPCof4ZbAAAA:8 a=VwQbUJbxAAAA:8 a=SNIHMcCwxNbLk-RHDm4A:9 a=AjGcO6oz07-iQ99wixmX:22 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Dave Chinner We currently have a couple of helper functions that try to infer whether the log item is an intent or intent done item from the combinations of operations it supports. This is incredibly fragile and not very efficient as it requires checking specific combinations of ops. We need to be able to identify intent and intent done items quickly and easily in upcoming patches, so simply add intent and intent done type flags to the log item ops flags. These are static flags to begin with, so intent items should have been typed like this from the start. Signed-off-by: Dave Chinner Reviewed-by: Allison Henderson Reviewed-by: Darrick J. Wong --- fs/xfs/xfs_bmap_item.c | 4 +++- fs/xfs/xfs_extfree_item.c | 4 +++- fs/xfs/xfs_refcount_item.c | 4 +++- fs/xfs/xfs_rmap_item.c | 4 +++- fs/xfs/xfs_trans.h | 25 +++++++++++++------------ 5 files changed, 25 insertions(+), 16 deletions(-) diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c index 2c8b686e2a11..0e0aae83308c 100644 --- a/fs/xfs/xfs_bmap_item.c +++ b/fs/xfs/xfs_bmap_item.c @@ -204,7 +204,8 @@ xfs_bud_item_release( } static const struct xfs_item_ops xfs_bud_item_ops = { - .flags = XFS_ITEM_RELEASE_WHEN_COMMITTED, + .flags = XFS_ITEM_RELEASE_WHEN_COMMITTED | + XFS_ITEM_INTENT_DONE, .iop_size = xfs_bud_item_size, .iop_format = xfs_bud_item_format, .iop_release = xfs_bud_item_release, @@ -588,6 +589,7 @@ xfs_bui_item_relog( } static const struct xfs_item_ops xfs_bui_item_ops = { + .flags = XFS_ITEM_INTENT, .iop_size = xfs_bui_item_size, .iop_format = xfs_bui_item_format, .iop_unpin = xfs_bui_item_unpin, diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c index 0e50f2c9348e..21a159f9d8c5 100644 --- a/fs/xfs/xfs_extfree_item.c +++ b/fs/xfs/xfs_extfree_item.c @@ -307,7 +307,8 @@ xfs_efd_item_release( } static const struct xfs_item_ops xfs_efd_item_ops = { - .flags = XFS_ITEM_RELEASE_WHEN_COMMITTED, + .flags = XFS_ITEM_RELEASE_WHEN_COMMITTED | + XFS_ITEM_INTENT_DONE, .iop_size = xfs_efd_item_size, .iop_format = xfs_efd_item_format, .iop_release = xfs_efd_item_release, @@ -688,6 +689,7 @@ xfs_efi_item_relog( } static const struct xfs_item_ops xfs_efi_item_ops = { + .flags = XFS_ITEM_INTENT, .iop_size = xfs_efi_item_size, .iop_format = xfs_efi_item_format, .iop_unpin = xfs_efi_item_unpin, diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c index 10474fe389e1..71225d094e03 100644 --- a/fs/xfs/xfs_refcount_item.c +++ b/fs/xfs/xfs_refcount_item.c @@ -210,7 +210,8 @@ xfs_cud_item_release( } static const struct xfs_item_ops xfs_cud_item_ops = { - .flags = XFS_ITEM_RELEASE_WHEN_COMMITTED, + .flags = XFS_ITEM_RELEASE_WHEN_COMMITTED | + XFS_ITEM_INTENT_DONE, .iop_size = xfs_cud_item_size, .iop_format = xfs_cud_item_format, .iop_release = xfs_cud_item_release, @@ -602,6 +603,7 @@ xfs_cui_item_relog( } static const struct xfs_item_ops xfs_cui_item_ops = { + .flags = XFS_ITEM_INTENT, .iop_size = xfs_cui_item_size, .iop_format = xfs_cui_item_format, .iop_unpin = xfs_cui_item_unpin, diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c index 6c0b56ebdbe1..6ecbc37e4b8d 100644 --- a/fs/xfs/xfs_rmap_item.c +++ b/fs/xfs/xfs_rmap_item.c @@ -233,7 +233,8 @@ xfs_rud_item_release( } static const struct xfs_item_ops xfs_rud_item_ops = { - .flags = XFS_ITEM_RELEASE_WHEN_COMMITTED, + .flags = XFS_ITEM_RELEASE_WHEN_COMMITTED | + XFS_ITEM_INTENT_DONE, .iop_size = xfs_rud_item_size, .iop_format = xfs_rud_item_format, .iop_release = xfs_rud_item_release, @@ -632,6 +633,7 @@ xfs_rui_item_relog( } static const struct xfs_item_ops xfs_rui_item_ops = { + .flags = XFS_ITEM_INTENT, .iop_size = xfs_rui_item_size, .iop_format = xfs_rui_item_format, .iop_unpin = xfs_rui_item_unpin, diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h index 87e940b5366e..f68e74e46026 100644 --- a/fs/xfs/xfs_trans.h +++ b/fs/xfs/xfs_trans.h @@ -80,28 +80,29 @@ struct xfs_item_ops { struct xfs_trans *tp); }; -/* Is this log item a deferred action intent? */ +/* + * Log item ops flags + */ +/* + * Release the log item when the journal commits instead of inserting into the + * AIL for writeback tracking and/or log tail pinning. + */ +#define XFS_ITEM_RELEASE_WHEN_COMMITTED (1 << 0) +#define XFS_ITEM_INTENT (1 << 1) +#define XFS_ITEM_INTENT_DONE (1 << 2) + static inline bool xlog_item_is_intent(struct xfs_log_item *lip) { - return lip->li_ops->iop_recover != NULL && - lip->li_ops->iop_match != NULL; + return lip->li_ops->flags & XFS_ITEM_INTENT; } -/* Is this a log intent-done item? */ static inline bool xlog_item_is_intent_done(struct xfs_log_item *lip) { - return lip->li_ops->iop_unpin == NULL && - lip->li_ops->iop_push == NULL; + return lip->li_ops->flags & XFS_ITEM_INTENT_DONE; } -/* - * Release the log item as soon as committed. This is for items just logging - * intents that never need to be written back in place. - */ -#define XFS_ITEM_RELEASE_WHEN_COMMITTED (1 << 0) - void xfs_log_item_init(struct xfs_mount *mp, struct xfs_log_item *item, int type, const struct xfs_item_ops *ops); From patchwork Tue May 3 22:17:24 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Chinner X-Patchwork-Id: 12836452 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4118FC433F5 for ; Tue, 3 May 2022 22:17:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243560AbiECWVM (ORCPT ); Tue, 3 May 2022 18:21:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:32998 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S243533AbiECWVJ (ORCPT ); Tue, 3 May 2022 18:21:09 -0400 Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au [211.29.132.246]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 2B40A3F33C for ; Tue, 3 May 2022 15:17:36 -0700 (PDT) Received: from dread.disaster.area (pa49-181-2-147.pa.nsw.optusnet.com.au [49.181.2.147]) by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id 339E55345A9 for ; Wed, 4 May 2022 08:17:31 +1000 (AEST) Received: from discord.disaster.area ([192.168.253.110]) by dread.disaster.area with esmtp (Exim 4.92.3) (envelope-from ) id 1nm0pm-007gGR-5q for linux-xfs@vger.kernel.org; Wed, 04 May 2022 08:17:30 +1000 Received: from dave by discord.disaster.area with local (Exim 4.95) (envelope-from ) id 1nm0pm-000mGY-4P for linux-xfs@vger.kernel.org; Wed, 04 May 2022 08:17:30 +1000 From: Dave Chinner To: linux-xfs@vger.kernel.org Subject: [PATCH 06/10] xfs: tag transactions that contain intent done items Date: Wed, 4 May 2022 08:17:24 +1000 Message-Id: <20220503221728.185449-7-david@fromorbit.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220503221728.185449-1-david@fromorbit.com> References: <20220503221728.185449-1-david@fromorbit.com> MIME-Version: 1.0 X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.4 cv=VuxAv86n c=1 sm=1 tr=0 ts=6271a9fb a=ivVLWpVy4j68lT4lJFbQgw==:117 a=ivVLWpVy4j68lT4lJFbQgw==:17 a=oZkIemNP1mAA:10 a=20KFwNOVAAAA:8 a=yPCof4ZbAAAA:8 a=VwQbUJbxAAAA:8 a=f4-_iRGBJiCRqTHycS8A:9 a=AjGcO6oz07-iQ99wixmX:22 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Dave Chinner Intent whiteouts will require extra work to be done during transaction commit if the transaction contains an intent done item. To determine if a transaction contains an intent done item, we want to avoid having to walk all the items in the transaction to check if they are intent done items. Hence when we add an intent done item to a transaction, tag the transaction to indicate that it contains such an item. We don't tag the transaction when the defer ops is relogging an intent to move it forward in the log. Whiteouts will never apply to these cases, so we don't need to bother looking for them. Signed-off-by: Dave Chinner Reviewed-by: Allison Henderson Reviewed-by: Darrick J. Wong Reviewed-by: Christoph Hellwig --- fs/xfs/libxfs/xfs_shared.h | 24 +++++++++++++++++------- fs/xfs/xfs_bmap_item.c | 2 +- fs/xfs/xfs_extfree_item.c | 2 +- fs/xfs/xfs_refcount_item.c | 2 +- fs/xfs/xfs_rmap_item.c | 2 +- 5 files changed, 21 insertions(+), 11 deletions(-) diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h index 25c4cab58851..c4381388c0c1 100644 --- a/fs/xfs/libxfs/xfs_shared.h +++ b/fs/xfs/libxfs/xfs_shared.h @@ -54,13 +54,23 @@ void xfs_log_get_max_trans_res(struct xfs_mount *mp, /* * Values for t_flags. */ -#define XFS_TRANS_DIRTY 0x01 /* something needs to be logged */ -#define XFS_TRANS_SB_DIRTY 0x02 /* superblock is modified */ -#define XFS_TRANS_PERM_LOG_RES 0x04 /* xact took a permanent log res */ -#define XFS_TRANS_SYNC 0x08 /* make commit synchronous */ -#define XFS_TRANS_RESERVE 0x20 /* OK to use reserved data blocks */ -#define XFS_TRANS_NO_WRITECOUNT 0x40 /* do not elevate SB writecount */ -#define XFS_TRANS_RES_FDBLKS 0x80 /* reserve newly freed blocks */ +/* Transaction needs to be logged */ +#define XFS_TRANS_DIRTY (1u << 0) +/* Superblock is dirty and needs to be logged */ +#define XFS_TRANS_SB_DIRTY (1u << 1) +/* Transaction took a permanent log reservation */ +#define XFS_TRANS_PERM_LOG_RES (1u << 2) +/* Synchronous transaction commit needed */ +#define XFS_TRANS_SYNC (1u << 3) +/* Transaction can use reserve block pool */ +#define XFS_TRANS_RESERVE (1u << 4) +/* Transaction should avoid VFS level superblock write accounting */ +#define XFS_TRANS_NO_WRITECOUNT (1u << 5) +/* Transaction has freed blocks returned to it's reservation */ +#define XFS_TRANS_RES_FDBLKS (1u << 6) +/* Transaction contains an intent done log item */ +#define XFS_TRANS_HAS_INTENT_DONE (1u << 7) + /* * LOWMODE is used by the allocator to activate the lowspace algorithm - when * free space is running low the extent allocator may choose to allocate an diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c index 0e0aae83308c..3d1fa8edf28f 100644 --- a/fs/xfs/xfs_bmap_item.c +++ b/fs/xfs/xfs_bmap_item.c @@ -257,7 +257,7 @@ xfs_trans_log_finish_bmap_update( * 1.) releases the BUI and frees the BUD * 2.) shuts down the filesystem */ - tp->t_flags |= XFS_TRANS_DIRTY; + tp->t_flags |= XFS_TRANS_DIRTY | XFS_TRANS_HAS_INTENT_DONE; set_bit(XFS_LI_DIRTY, &budp->bud_item.li_flags); return error; diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c index 21a159f9d8c5..96735f23d12d 100644 --- a/fs/xfs/xfs_extfree_item.c +++ b/fs/xfs/xfs_extfree_item.c @@ -381,7 +381,7 @@ xfs_trans_free_extent( * 1.) releases the EFI and frees the EFD * 2.) shuts down the filesystem */ - tp->t_flags |= XFS_TRANS_DIRTY; + tp->t_flags |= XFS_TRANS_DIRTY | XFS_TRANS_HAS_INTENT_DONE; set_bit(XFS_LI_DIRTY, &efdp->efd_item.li_flags); next_extent = efdp->efd_next_extent; diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c index 71225d094e03..b37a9d2ce652 100644 --- a/fs/xfs/xfs_refcount_item.c +++ b/fs/xfs/xfs_refcount_item.c @@ -262,7 +262,7 @@ xfs_trans_log_finish_refcount_update( * 1.) releases the CUI and frees the CUD * 2.) shuts down the filesystem */ - tp->t_flags |= XFS_TRANS_DIRTY; + tp->t_flags |= XFS_TRANS_DIRTY | XFS_TRANS_HAS_INTENT_DONE; set_bit(XFS_LI_DIRTY, &cudp->cud_item.li_flags); return error; diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c index 6ecbc37e4b8d..5221fd1e6f6f 100644 --- a/fs/xfs/xfs_rmap_item.c +++ b/fs/xfs/xfs_rmap_item.c @@ -330,7 +330,7 @@ xfs_trans_log_finish_rmap_update( * 1.) releases the RUI and frees the RUD * 2.) shuts down the filesystem */ - tp->t_flags |= XFS_TRANS_DIRTY; + tp->t_flags |= XFS_TRANS_DIRTY | XFS_TRANS_HAS_INTENT_DONE; set_bit(XFS_LI_DIRTY, &rudp->rud_item.li_flags); return error; From patchwork Tue May 3 22:17:25 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Chinner X-Patchwork-Id: 12836446 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 52BBFC433F5 for ; Tue, 3 May 2022 22:17:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243563AbiECWVI (ORCPT ); Tue, 3 May 2022 18:21:08 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:32946 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235028AbiECWVG (ORCPT ); Tue, 3 May 2022 18:21:06 -0400 Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au [211.29.132.249]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id BC55D3EA94 for ; Tue, 3 May 2022 15:17:31 -0700 (PDT) Received: from dread.disaster.area (pa49-181-2-147.pa.nsw.optusnet.com.au [49.181.2.147]) by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id DD5AF10E618B for ; Wed, 4 May 2022 08:17:30 +1000 (AEST) Received: from discord.disaster.area ([192.168.253.110]) by dread.disaster.area with esmtp (Exim 4.92.3) (envelope-from ) id 1nm0pm-007gGU-77 for linux-xfs@vger.kernel.org; Wed, 04 May 2022 08:17:30 +1000 Received: from dave by discord.disaster.area with local (Exim 4.95) (envelope-from ) id 1nm0pm-000mGd-5r for linux-xfs@vger.kernel.org; Wed, 04 May 2022 08:17:30 +1000 From: Dave Chinner To: linux-xfs@vger.kernel.org Subject: [PATCH 07/10] xfs: factor and move some code in xfs_log_cil.c Date: Wed, 4 May 2022 08:17:25 +1000 Message-Id: <20220503221728.185449-8-david@fromorbit.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220503221728.185449-1-david@fromorbit.com> References: <20220503221728.185449-1-david@fromorbit.com> MIME-Version: 1.0 X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.4 cv=VuxAv86n c=1 sm=1 tr=0 ts=6271a9fb a=ivVLWpVy4j68lT4lJFbQgw==:117 a=ivVLWpVy4j68lT4lJFbQgw==:17 a=oZkIemNP1mAA:10 a=20KFwNOVAAAA:8 a=VwQbUJbxAAAA:8 a=yPCof4ZbAAAA:8 a=2LVMjx26bSV1j-c8ysAA:9 a=AjGcO6oz07-iQ99wixmX:22 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Dave Chinner In preparation for adding support for intent item whiteouts. Signed-off-by: Dave Chinner Reviewed-by: Darrick J. Wong Reviewed-by: Christoph Hellwig Reviewed-by: Allison Henderson --- fs/xfs/xfs_log_cil.c | 119 ++++++++++++++++++++++++------------------- 1 file changed, 67 insertions(+), 52 deletions(-) diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c index e5ab62f08c19..0d8d092447ad 100644 --- a/fs/xfs/xfs_log_cil.c +++ b/fs/xfs/xfs_log_cil.c @@ -47,6 +47,38 @@ xlog_cil_ticket_alloc( return tic; } +/* + * Check if the current log item was first committed in this sequence. + * We can't rely on just the log item being in the CIL, we have to check + * the recorded commit sequence number. + * + * Note: for this to be used in a non-racy manner, it has to be called with + * CIL flushing locked out. As a result, it should only be used during the + * transaction commit process when deciding what to format into the item. + */ +static bool +xlog_item_in_current_chkpt( + struct xfs_cil *cil, + struct xfs_log_item *lip) +{ + if (list_empty(&lip->li_cil)) + return false; + + /* + * li_seq is written on the first commit of a log item to record the + * first checkpoint it is written to. Hence if it is different to the + * current sequence, we're in a new checkpoint. + */ + return lip->li_seq == READ_ONCE(cil->xc_current_sequence); +} + +bool +xfs_log_item_in_current_chkpt( + struct xfs_log_item *lip) +{ + return xlog_item_in_current_chkpt(lip->li_log->l_cilp, lip); +} + /* * Unavoidable forward declaration - xlog_cil_push_work() calls * xlog_cil_ctx_alloc() itself. @@ -934,6 +966,40 @@ xlog_cil_build_trans_hdr( tic->t_curr_res -= lvhdr->lv_bytes; } +/* + * Pull all the log vectors off the items in the CIL, and remove the items from + * the CIL. We don't need the CIL lock here because it's only needed on the + * transaction commit side which is currently locked out by the flush lock. + */ +static void +xlog_cil_build_lv_chain( + struct xfs_cil *cil, + struct xfs_cil_ctx *ctx, + uint32_t *num_iovecs, + uint32_t *num_bytes) +{ + struct xfs_log_vec *lv = NULL; + + while (!list_empty(&cil->xc_cil)) { + struct xfs_log_item *item; + + item = list_first_entry(&cil->xc_cil, + struct xfs_log_item, li_cil); + list_del_init(&item->li_cil); + if (!ctx->lv_chain) + ctx->lv_chain = item->li_lv; + else + lv->lv_next = item->li_lv; + lv = item->li_lv; + item->li_lv = NULL; + *num_iovecs += lv->lv_niovecs; + + /* we don't write ordered log vectors */ + if (lv->lv_buf_len != XFS_LOG_VEC_ORDERED) + *num_bytes += lv->lv_bytes; + } +} + /* * Push the Committed Item List to the log. * @@ -956,7 +1022,6 @@ xlog_cil_push_work( container_of(work, struct xfs_cil_ctx, push_work); struct xfs_cil *cil = ctx->cil; struct xlog *log = cil->xc_log; - struct xfs_log_vec *lv; struct xfs_cil_ctx *new_ctx; int num_iovecs = 0; int num_bytes = 0; @@ -1033,31 +1098,7 @@ xlog_cil_push_work( list_add(&ctx->committing, &cil->xc_committing); spin_unlock(&cil->xc_push_lock); - /* - * Pull all the log vectors off the items in the CIL, and remove the - * items from the CIL. We don't need the CIL lock here because it's only - * needed on the transaction commit side which is currently locked out - * by the flush lock. - */ - lv = NULL; - while (!list_empty(&cil->xc_cil)) { - struct xfs_log_item *item; - - item = list_first_entry(&cil->xc_cil, - struct xfs_log_item, li_cil); - list_del_init(&item->li_cil); - if (!ctx->lv_chain) - ctx->lv_chain = item->li_lv; - else - lv->lv_next = item->li_lv; - lv = item->li_lv; - item->li_lv = NULL; - num_iovecs += lv->lv_niovecs; - - /* we don't write ordered log vectors */ - if (lv->lv_buf_len != XFS_LOG_VEC_ORDERED) - num_bytes += lv->lv_bytes; - } + xlog_cil_build_lv_chain(cil, ctx, &num_iovecs, &num_bytes); /* * Switch the contexts so we can drop the context lock and move out @@ -1508,32 +1549,6 @@ xlog_cil_force_seq( return 0; } -/* - * Check if the current log item was first committed in this sequence. - * We can't rely on just the log item being in the CIL, we have to check - * the recorded commit sequence number. - * - * Note: for this to be used in a non-racy manner, it has to be called with - * CIL flushing locked out. As a result, it should only be used during the - * transaction commit process when deciding what to format into the item. - */ -bool -xfs_log_item_in_current_chkpt( - struct xfs_log_item *lip) -{ - struct xfs_cil *cil = lip->li_log->l_cilp; - - if (list_empty(&lip->li_cil)) - return false; - - /* - * li_seq is written on the first commit of a log item to record the - * first checkpoint it is written to. Hence if it is different to the - * current sequence, we're in a new checkpoint. - */ - return lip->li_seq == READ_ONCE(cil->xc_current_sequence); -} - /* * Perform initial CIL structure initialisation. */ From patchwork Tue May 3 22:17:26 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Chinner X-Patchwork-Id: 12836451 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C7310C433FE for ; Tue, 3 May 2022 22:17:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231801AbiECWVM (ORCPT ); Tue, 3 May 2022 18:21:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:32982 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S243560AbiECWVI (ORCPT ); Tue, 3 May 2022 18:21:08 -0400 Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au [211.29.132.249]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 5955A3EA86 for ; Tue, 3 May 2022 15:17:34 -0700 (PDT) Received: from dread.disaster.area (pa49-181-2-147.pa.nsw.optusnet.com.au [49.181.2.147]) by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id 50A8A10E61AC for ; Wed, 4 May 2022 08:17:31 +1000 (AEST) Received: from discord.disaster.area ([192.168.253.110]) by dread.disaster.area with esmtp (Exim 4.92.3) (envelope-from ) id 1nm0pm-007gGX-8Q for linux-xfs@vger.kernel.org; Wed, 04 May 2022 08:17:30 +1000 Received: from dave by discord.disaster.area with local (Exim 4.95) (envelope-from ) id 1nm0pm-000mGj-7C for linux-xfs@vger.kernel.org; Wed, 04 May 2022 08:17:30 +1000 From: Dave Chinner To: linux-xfs@vger.kernel.org Subject: [PATCH 08/10] xfs: add log item method to return related intents Date: Wed, 4 May 2022 08:17:26 +1000 Message-Id: <20220503221728.185449-9-david@fromorbit.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220503221728.185449-1-david@fromorbit.com> References: <20220503221728.185449-1-david@fromorbit.com> MIME-Version: 1.0 X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.4 cv=e9dl9Yl/ c=1 sm=1 tr=0 ts=6271a9fb a=ivVLWpVy4j68lT4lJFbQgw==:117 a=ivVLWpVy4j68lT4lJFbQgw==:17 a=oZkIemNP1mAA:10 a=20KFwNOVAAAA:8 a=yPCof4ZbAAAA:8 a=VwQbUJbxAAAA:8 a=S1N-PUcTW3_U964qLncA:9 a=AjGcO6oz07-iQ99wixmX:22 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Dave Chinner To apply a whiteout to an intent item when an intent done item is committed, we need to be able to retrieve the intent item from the the intent done item. Add a log item op method for doing this, and wire all the intent done items up to it. Signed-off-by: Dave Chinner Reviewed-by: Allison Henderson Reviewed-by: Darrick J. Wong Reviewed-by: Christoph Hellwig --- fs/xfs/xfs_bmap_item.c | 8 ++++++++ fs/xfs/xfs_extfree_item.c | 8 ++++++++ fs/xfs/xfs_refcount_item.c | 8 ++++++++ fs/xfs/xfs_rmap_item.c | 8 ++++++++ fs/xfs/xfs_trans.h | 1 + 5 files changed, 33 insertions(+) diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c index 3d1fa8edf28f..f05663fdb6ff 100644 --- a/fs/xfs/xfs_bmap_item.c +++ b/fs/xfs/xfs_bmap_item.c @@ -203,12 +203,20 @@ xfs_bud_item_release( kmem_cache_free(xfs_bud_cache, budp); } +static struct xfs_log_item * +xfs_bud_item_intent( + struct xfs_log_item *lip) +{ + return &BUD_ITEM(lip)->bud_buip->bui_item; +} + static const struct xfs_item_ops xfs_bud_item_ops = { .flags = XFS_ITEM_RELEASE_WHEN_COMMITTED | XFS_ITEM_INTENT_DONE, .iop_size = xfs_bud_item_size, .iop_format = xfs_bud_item_format, .iop_release = xfs_bud_item_release, + .iop_intent = xfs_bud_item_intent, }; static struct xfs_bud_log_item * diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c index 96735f23d12d..032db5269e97 100644 --- a/fs/xfs/xfs_extfree_item.c +++ b/fs/xfs/xfs_extfree_item.c @@ -306,12 +306,20 @@ xfs_efd_item_release( xfs_efd_item_free(efdp); } +static struct xfs_log_item * +xfs_efd_item_intent( + struct xfs_log_item *lip) +{ + return &EFD_ITEM(lip)->efd_efip->efi_item; +} + static const struct xfs_item_ops xfs_efd_item_ops = { .flags = XFS_ITEM_RELEASE_WHEN_COMMITTED | XFS_ITEM_INTENT_DONE, .iop_size = xfs_efd_item_size, .iop_format = xfs_efd_item_format, .iop_release = xfs_efd_item_release, + .iop_intent = xfs_efd_item_intent, }; /* diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c index b37a9d2ce652..57a025f5fd4b 100644 --- a/fs/xfs/xfs_refcount_item.c +++ b/fs/xfs/xfs_refcount_item.c @@ -209,12 +209,20 @@ xfs_cud_item_release( kmem_cache_free(xfs_cud_cache, cudp); } +static struct xfs_log_item * +xfs_cud_item_intent( + struct xfs_log_item *lip) +{ + return &CUD_ITEM(lip)->cud_cuip->cui_item; +} + static const struct xfs_item_ops xfs_cud_item_ops = { .flags = XFS_ITEM_RELEASE_WHEN_COMMITTED | XFS_ITEM_INTENT_DONE, .iop_size = xfs_cud_item_size, .iop_format = xfs_cud_item_format, .iop_release = xfs_cud_item_release, + .iop_intent = xfs_cud_item_intent, }; static struct xfs_cud_log_item * diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c index 5221fd1e6f6f..1c7d8518cb48 100644 --- a/fs/xfs/xfs_rmap_item.c +++ b/fs/xfs/xfs_rmap_item.c @@ -232,12 +232,20 @@ xfs_rud_item_release( kmem_cache_free(xfs_rud_cache, rudp); } +static struct xfs_log_item * +xfs_rud_item_intent( + struct xfs_log_item *lip) +{ + return &RUD_ITEM(lip)->rud_ruip->rui_item; +} + static const struct xfs_item_ops xfs_rud_item_ops = { .flags = XFS_ITEM_RELEASE_WHEN_COMMITTED | XFS_ITEM_INTENT_DONE, .iop_size = xfs_rud_item_size, .iop_format = xfs_rud_item_format, .iop_release = xfs_rud_item_release, + .iop_intent = xfs_rud_item_intent, }; static struct xfs_rud_log_item * diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h index f68e74e46026..d72a5995d33e 100644 --- a/fs/xfs/xfs_trans.h +++ b/fs/xfs/xfs_trans.h @@ -78,6 +78,7 @@ struct xfs_item_ops { bool (*iop_match)(struct xfs_log_item *item, uint64_t id); struct xfs_log_item *(*iop_relog)(struct xfs_log_item *intent, struct xfs_trans *tp); + struct xfs_log_item *(*iop_intent)(struct xfs_log_item *intent_done); }; /* From patchwork Tue May 3 22:17:27 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Chinner X-Patchwork-Id: 12836463 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 456C1C433F5 for ; Tue, 3 May 2022 22:35:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237198AbiECWjB (ORCPT ); Tue, 3 May 2022 18:39:01 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56500 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S243719AbiECWix (ORCPT ); Tue, 3 May 2022 18:38:53 -0400 Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au [211.29.132.249]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id D31C814092 for ; Tue, 3 May 2022 15:35:18 -0700 (PDT) Received: from dread.disaster.area (pa49-181-2-147.pa.nsw.optusnet.com.au [49.181.2.147]) by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id 15EA110E626C for ; Wed, 4 May 2022 08:35:18 +1000 (AEST) Received: from discord.disaster.area ([192.168.253.110]) by dread.disaster.area with esmtp (Exim 4.92.3) (envelope-from ) id 1nm0pm-007gGZ-9V for linux-xfs@vger.kernel.org; Wed, 04 May 2022 08:17:30 +1000 Received: from dave by discord.disaster.area with local (Exim 4.95) (envelope-from ) id 1nm0pm-000mGo-8W for linux-xfs@vger.kernel.org; Wed, 04 May 2022 08:17:30 +1000 From: Dave Chinner To: linux-xfs@vger.kernel.org Subject: [PATCH 09/10] xfs: whiteouts release intents that are not in the AIL Date: Wed, 4 May 2022 08:17:27 +1000 Message-Id: <20220503221728.185449-10-david@fromorbit.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220503221728.185449-1-david@fromorbit.com> References: <20220503221728.185449-1-david@fromorbit.com> MIME-Version: 1.0 X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.4 cv=deDjYVbe c=1 sm=1 tr=0 ts=6271ae26 a=ivVLWpVy4j68lT4lJFbQgw==:117 a=ivVLWpVy4j68lT4lJFbQgw==:17 a=oZkIemNP1mAA:10 a=20KFwNOVAAAA:8 a=yPCof4ZbAAAA:8 a=VwQbUJbxAAAA:8 a=FmXBc2Kc0CuRUgi88GUA:9 a=AjGcO6oz07-iQ99wixmX:22 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Dave Chinner When we release an intent that a whiteout applies to, it will not have been committed to the journal and so won't be in the AIL. Hence when we drop the last reference to the intent, we do not want to try to remove it from the AIL as that will trigger a filesystem shutdown. Hence make the removal of intents from the AIL conditional on them actually being in the AIL so we do the correct thing. Signed-off-by: Dave Chinner Reviewed-by: Allison Henderson Reviewed-by: Darrick J. Wong Reviewed-by: Christoph Hellwig --- fs/xfs/xfs_bmap_item.c | 9 +++++---- fs/xfs/xfs_extfree_item.c | 9 +++++---- fs/xfs/xfs_refcount_item.c | 9 +++++---- fs/xfs/xfs_rmap_item.c | 9 +++++---- 4 files changed, 20 insertions(+), 16 deletions(-) diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c index f05663fdb6ff..51f66e982484 100644 --- a/fs/xfs/xfs_bmap_item.c +++ b/fs/xfs/xfs_bmap_item.c @@ -55,10 +55,11 @@ xfs_bui_release( struct xfs_bui_log_item *buip) { ASSERT(atomic_read(&buip->bui_refcount) > 0); - if (atomic_dec_and_test(&buip->bui_refcount)) { - xfs_trans_ail_delete(&buip->bui_item, SHUTDOWN_LOG_IO_ERROR); - xfs_bui_item_free(buip); - } + if (!atomic_dec_and_test(&buip->bui_refcount)) + return; + + xfs_trans_ail_delete(&buip->bui_item, 0); + xfs_bui_item_free(buip); } diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c index 032db5269e97..765be054dffe 100644 --- a/fs/xfs/xfs_extfree_item.c +++ b/fs/xfs/xfs_extfree_item.c @@ -58,10 +58,11 @@ xfs_efi_release( struct xfs_efi_log_item *efip) { ASSERT(atomic_read(&efip->efi_refcount) > 0); - if (atomic_dec_and_test(&efip->efi_refcount)) { - xfs_trans_ail_delete(&efip->efi_item, SHUTDOWN_LOG_IO_ERROR); - xfs_efi_item_free(efip); - } + if (!atomic_dec_and_test(&efip->efi_refcount)) + return; + + xfs_trans_ail_delete(&efip->efi_item, 0); + xfs_efi_item_free(efip); } /* diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c index 57a025f5fd4b..7e97bf19793d 100644 --- a/fs/xfs/xfs_refcount_item.c +++ b/fs/xfs/xfs_refcount_item.c @@ -54,10 +54,11 @@ xfs_cui_release( struct xfs_cui_log_item *cuip) { ASSERT(atomic_read(&cuip->cui_refcount) > 0); - if (atomic_dec_and_test(&cuip->cui_refcount)) { - xfs_trans_ail_delete(&cuip->cui_item, SHUTDOWN_LOG_IO_ERROR); - xfs_cui_item_free(cuip); - } + if (!atomic_dec_and_test(&cuip->cui_refcount)) + return; + + xfs_trans_ail_delete(&cuip->cui_item, 0); + xfs_cui_item_free(cuip); } diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c index 1c7d8518cb48..fef92e02f3bb 100644 --- a/fs/xfs/xfs_rmap_item.c +++ b/fs/xfs/xfs_rmap_item.c @@ -54,10 +54,11 @@ xfs_rui_release( struct xfs_rui_log_item *ruip) { ASSERT(atomic_read(&ruip->rui_refcount) > 0); - if (atomic_dec_and_test(&ruip->rui_refcount)) { - xfs_trans_ail_delete(&ruip->rui_item, SHUTDOWN_LOG_IO_ERROR); - xfs_rui_item_free(ruip); - } + if (!atomic_dec_and_test(&ruip->rui_refcount)) + return; + + xfs_trans_ail_delete(&ruip->rui_item, 0); + xfs_rui_item_free(ruip); } STATIC void From patchwork Tue May 3 22:17:28 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Chinner X-Patchwork-Id: 12836453 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B7C37C433EF for ; Tue, 3 May 2022 22:17:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243533AbiECWVN (ORCPT ); Tue, 3 May 2022 18:21:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:32976 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S243555AbiECWVK (ORCPT ); Tue, 3 May 2022 18:21:10 -0400 Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au [211.29.132.246]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 29D4F3F33A for ; Tue, 3 May 2022 15:17:36 -0700 (PDT) Received: from dread.disaster.area (pa49-181-2-147.pa.nsw.optusnet.com.au [49.181.2.147]) by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id 5FA1E5345AF for ; Wed, 4 May 2022 08:17:31 +1000 (AEST) Received: from discord.disaster.area ([192.168.253.110]) by dread.disaster.area with esmtp (Exim 4.92.3) (envelope-from ) id 1nm0pm-007gGb-AX for linux-xfs@vger.kernel.org; Wed, 04 May 2022 08:17:30 +1000 Received: from dave by discord.disaster.area with local (Exim 4.95) (envelope-from ) id 1nm0pm-000mGt-9a for linux-xfs@vger.kernel.org; Wed, 04 May 2022 08:17:30 +1000 From: Dave Chinner To: linux-xfs@vger.kernel.org Subject: [PATCH 10/10] xfs: intent item whiteouts Date: Wed, 4 May 2022 08:17:28 +1000 Message-Id: <20220503221728.185449-11-david@fromorbit.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220503221728.185449-1-david@fromorbit.com> References: <20220503221728.185449-1-david@fromorbit.com> MIME-Version: 1.0 X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.4 cv=deDjYVbe c=1 sm=1 tr=0 ts=6271a9fb a=ivVLWpVy4j68lT4lJFbQgw==:117 a=ivVLWpVy4j68lT4lJFbQgw==:17 a=oZkIemNP1mAA:10 a=20KFwNOVAAAA:8 a=MU2QJs_iwlFiPCsiWkIA:9 Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org From: Dave Chinner When we log modifications based on intents, we add both intent and intent done items to the modification being made. These get written to the log to ensure that the operation is re-run if the intent done is not found in the log. However, for operations that complete wholly within a single checkpoint, the change in the checkpoint is atomic and will never need replay. In this case, we don't need to actually write the intent and intent done items to the journal because log recovery will never need to manually restart this modification. Log recovery currently handles intent/intent done matching by inserting the intent into the AIL, then removing it when a matching intent done item is found. Hence for all the intent-based operations that complete within a checkpoint, we spend all that time parsing the intent/intent done items just to cancel them and do nothing with them. Hence it follows that the only time we actually need intents in the log is when the modification crosses checkpoint boundaries in the log and so may only be partially complete in the journal. Hence if we commit and intent done item to the CIL and the intent item is in the same checkpoint, we don't actually have to write them to the journal because log recovery will always cancel the intents. We've never really worried about the overhead of logging intents unnecessarily like this because the intents we log are generally very much smaller than the change being made. e.g. freeing an extent involves modifying at lease two freespace btree blocks and the AGF, so the EFI/EFD overhead is only a small increase in space and processing time compared to the overall cost of freeing an extent. However, delayed attributes change this cost equation dramatically, especially for inline attributes. In the case of adding an inline attribute, we only log the inode core and attribute fork at present. With delayed attributes, we now log the attr intent which includes the name and value, the inode core adn attr fork, and finally the attr intent done item. We increase the number of items we log from 1 to 3, and the number of log vectors (regions) goes up from 3 to 7. Hence we tripple the number of objects that the CIL has to process, and more than double the number of log vectors that need to be written to the journal. At scale, this means delayed attributes cause a non-pipelined CIL to become CPU bound processing all the extra items, resulting in a > 40% performance degradation on 16-way file+xattr create worklaods. Pipelining the CIL (as per 5.15) reduces the performance degradation to 20%, but now the limitation is the rate at which the log items can be written to the iclogs and iclogs be dispatched for IO and completed. Even log IO completion is slowed down by these intents, because it now has to process 3x the number of items in the checkpoint. Processing completed intents is especially inefficient here, because we first insert the intent into the AIL, then remove it from the AIL when the intent done is processed. IOWs, we are also doing expensive operations in log IO completion we could completely avoid if we didn't log completed intent/intent done pairs. Enter log item whiteouts. When an intent done is committed, we can check to see if the associated intent is in the same checkpoint as we are currently committing the intent done to. If so, we can mark the intent log item with a whiteout and immediately free the intent done item rather than committing it to the CIL. We can basically skip the entire formatting and CIL insertion steps for the intent done item. However, we cannot remove the intent item from the CIL at this point because the unlocked per-cpu CIL item lists do not permit removal without holding the CIL context lock exclusively. Transaction commit only holds the context lock shared, hence the best we can do is mark the intent item with a whiteout so that the CIL push can release it rather than writing it to the log. This means we never write the intent to the log if the intent done has also been committed to the same checkpoint, but we'll always write the intent if the intent done has not been committed or has been committed to a different checkpoint. This will result in correct log recovery behaviour in all cases, without the overhead of logging unnecessary intents. This intent whiteout concept is generic - we can apply it to all intent/intent done pairs that have a direct 1:1 relationship. The way deferred ops iterate and relog intents mean that all intents currently have a 1:1 relationship with their done intent, and hence we can apply this cancellation to all existing intent/intent done implementations. For delayed attributes with a 16-way 64kB xattr create workload, whiteouts reduce the amount of journalled metadata from ~2.5GB/s down to ~600MB/s and improve the creation rate from 9000/s to 14000/s. Signed-off-by: Dave Chinner Reviewed-by: Christoph Hellwig Reviewed-by: Allison Henderson Reviewed-by: Darrick J. Wong --- fs/xfs/xfs_log_cil.c | 78 ++++++++++++++++++++++++++++++++++++++++++-- fs/xfs/xfs_trace.h | 3 ++ fs/xfs/xfs_trans.h | 6 ++-- 3 files changed, 82 insertions(+), 5 deletions(-) diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c index 0d8d092447ad..fecd2ea3e935 100644 --- a/fs/xfs/xfs_log_cil.c +++ b/fs/xfs/xfs_log_cil.c @@ -476,7 +476,8 @@ xlog_cil_insert_format_items( static void xlog_cil_insert_items( struct xlog *log, - struct xfs_trans *tp) + struct xfs_trans *tp, + uint32_t released_space) { struct xfs_cil *cil = log->l_cilp; struct xfs_cil_ctx *ctx = cil->xc_ctx; @@ -525,7 +526,9 @@ xlog_cil_insert_items( ASSERT(tp->t_ticket->t_curr_res >= len); } tp->t_ticket->t_curr_res -= len; + tp->t_ticket->t_curr_res += released_space; ctx->space_used += len; + ctx->space_used -= released_space; /* * If we've overrun the reservation, dump the tx details before we move @@ -970,11 +973,16 @@ xlog_cil_build_trans_hdr( * Pull all the log vectors off the items in the CIL, and remove the items from * the CIL. We don't need the CIL lock here because it's only needed on the * transaction commit side which is currently locked out by the flush lock. + * + * If a log item is marked with a whiteout, we do not need to write it to the + * journal and so we just move them to the whiteout list for the caller to + * dispose of appropriately. */ static void xlog_cil_build_lv_chain( struct xfs_cil *cil, struct xfs_cil_ctx *ctx, + struct list_head *whiteouts, uint32_t *num_iovecs, uint32_t *num_bytes) { @@ -985,6 +993,13 @@ xlog_cil_build_lv_chain( item = list_first_entry(&cil->xc_cil, struct xfs_log_item, li_cil); + + if (test_bit(XFS_LI_WHITEOUT, &item->li_flags)) { + list_move(&item->li_cil, whiteouts); + trace_xfs_cil_whiteout_skip(item); + continue; + } + list_del_init(&item->li_cil); if (!ctx->lv_chain) ctx->lv_chain = item->li_lv; @@ -1000,6 +1015,19 @@ xlog_cil_build_lv_chain( } } +static void +xlog_cil_push_cleanup_whiteouts( + struct list_head *whiteouts) +{ + while (!list_empty(whiteouts)) { + struct xfs_log_item *item = list_first_entry(whiteouts, + struct xfs_log_item, li_cil); + list_del_init(&item->li_cil); + trace_xfs_cil_whiteout_unpin(item); + item->li_ops->iop_unpin(item, 1); + } +} + /* * Push the Committed Item List to the log. * @@ -1030,6 +1058,7 @@ xlog_cil_push_work( struct xfs_log_vec lvhdr = { NULL }; xfs_csn_t push_seq; bool push_commit_stable; + LIST_HEAD (whiteouts); new_ctx = xlog_cil_ctx_alloc(); new_ctx->ticket = xlog_cil_ticket_alloc(log); @@ -1098,7 +1127,7 @@ xlog_cil_push_work( list_add(&ctx->committing, &cil->xc_committing); spin_unlock(&cil->xc_push_lock); - xlog_cil_build_lv_chain(cil, ctx, &num_iovecs, &num_bytes); + xlog_cil_build_lv_chain(cil, ctx, &whiteouts, &num_iovecs, &num_bytes); /* * Switch the contexts so we can drop the context lock and move out @@ -1201,6 +1230,7 @@ xlog_cil_push_work( /* Not safe to reference ctx now! */ spin_unlock(&log->l_icloglock); + xlog_cil_push_cleanup_whiteouts(&whiteouts); return; out_skip: @@ -1212,6 +1242,7 @@ xlog_cil_push_work( out_abort_free_ticket: xfs_log_ticket_ungrant(log, ctx->ticket); ASSERT(xlog_is_shutdown(log)); + xlog_cil_push_cleanup_whiteouts(&whiteouts); if (!ctx->commit_iclog) { xlog_cil_committed(ctx); return; @@ -1360,6 +1391,43 @@ xlog_cil_empty( return empty; } +/* + * If there are intent done items in this transaction and the related intent was + * committed in the current (same) CIL checkpoint, we don't need to write either + * the intent or intent done item to the journal as the change will be + * journalled atomically within this checkpoint. As we cannot remove items from + * the CIL here, mark the related intent with a whiteout so that the CIL push + * can remove it rather than writing it to the journal. Then remove the intent + * done item from the current transaction and release it so it doesn't get put + * into the CIL at all. + */ +static uint32_t +xlog_cil_process_intents( + struct xfs_cil *cil, + struct xfs_trans *tp) +{ + struct xfs_log_item *lip, *ilip, *next; + uint32_t len = 0; + + list_for_each_entry_safe(lip, next, &tp->t_items, li_trans) { + if (!(lip->li_ops->flags & XFS_ITEM_INTENT_DONE)) + continue; + + ilip = lip->li_ops->iop_intent(lip); + if (!ilip || !xlog_item_in_current_chkpt(cil, ilip)) + continue; + set_bit(XFS_LI_WHITEOUT, &ilip->li_flags); + trace_xfs_cil_whiteout_mark(ilip); + len += ilip->li_lv->lv_bytes; + kmem_free(ilip->li_lv); + ilip->li_lv = NULL; + + xfs_trans_del_item(lip); + lip->li_ops->iop_release(lip); + } + return len; +} + /* * Commit a transaction with the given vector to the Committed Item List. * @@ -1382,6 +1450,7 @@ xlog_cil_commit( { struct xfs_cil *cil = log->l_cilp; struct xfs_log_item *lip, *next; + uint32_t released_space = 0; /* * Do all necessary memory allocation before we lock the CIL. @@ -1393,7 +1462,10 @@ xlog_cil_commit( /* lock out background commit */ down_read(&cil->xc_ctx_lock); - xlog_cil_insert_items(log, tp); + if (tp->t_flags & XFS_TRANS_HAS_INTENT_DONE) + released_space = xlog_cil_process_intents(cil, tp); + + xlog_cil_insert_items(log, tp, released_space); if (regrant && !xlog_is_shutdown(log)) xfs_log_ticket_regrant(log, tp->t_ticket); diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index e1197f9ad97e..75934e3c3f55 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -1332,6 +1332,9 @@ DEFINE_LOG_ITEM_EVENT(xfs_ail_push); DEFINE_LOG_ITEM_EVENT(xfs_ail_pinned); DEFINE_LOG_ITEM_EVENT(xfs_ail_locked); DEFINE_LOG_ITEM_EVENT(xfs_ail_flushing); +DEFINE_LOG_ITEM_EVENT(xfs_cil_whiteout_mark); +DEFINE_LOG_ITEM_EVENT(xfs_cil_whiteout_skip); +DEFINE_LOG_ITEM_EVENT(xfs_cil_whiteout_unpin); DECLARE_EVENT_CLASS(xfs_ail_class, TP_PROTO(struct xfs_log_item *lip, xfs_lsn_t old_lsn, xfs_lsn_t new_lsn), diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h index d72a5995d33e..9561f193e7e1 100644 --- a/fs/xfs/xfs_trans.h +++ b/fs/xfs/xfs_trans.h @@ -55,13 +55,15 @@ struct xfs_log_item { #define XFS_LI_IN_AIL 0 #define XFS_LI_ABORTED 1 #define XFS_LI_FAILED 2 -#define XFS_LI_DIRTY 3 /* log item dirty in transaction */ +#define XFS_LI_DIRTY 3 +#define XFS_LI_WHITEOUT 4 #define XFS_LI_FLAGS \ { (1u << XFS_LI_IN_AIL), "IN_AIL" }, \ { (1u << XFS_LI_ABORTED), "ABORTED" }, \ { (1u << XFS_LI_FAILED), "FAILED" }, \ - { (1u << XFS_LI_DIRTY), "DIRTY" } + { (1u << XFS_LI_DIRTY), "DIRTY" }, \ + { (1u << XFS_LI_WHITEOUT), "WHITEOUT" } struct xfs_item_ops { unsigned flags;