Message ID | 169704721284.1773611.1915589661676489.stgit@frogsfrogsfrogs (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
Series | xfs: refactor rt extent unit conversions | expand |
On Wed, Oct 11, 2023 at 11:06:14AM -0700, Darrick J. Wong wrote: > From: Darrick J. Wong <djwong@kernel.org> > > Avoid the costs of integer division (32-bit and 64-bit) if the realtime > extent size is a power of two. Looks good: Reviewed-by: Christoph Hellwig <hch@lst.de> Do you have any data on how common non-power of two rtext sizes are? Might it be worth to add unlikely annotations? > @@ -11,6 +11,9 @@ xfs_rtx_to_rtb( > struct xfs_mount *mp, > xfs_rtxnum_t rtx) > { > + if (mp->m_rtxblklog >= 0) > + return rtx << mp->m_rtxblklog; > + > return rtx * mp->m_sb.sb_rextsize; i.e. if (unlikely(mp->m_rtxblklog == ‐1)) return rtx * mp->m_sb.sb_rextsize; return rtx << mp->m_rtxblklog;
On Thu, Oct 12, 2023 at 07:25:11AM +0200, Christoph Hellwig wrote: > On Wed, Oct 11, 2023 at 11:06:14AM -0700, Darrick J. Wong wrote: > > From: Darrick J. Wong <djwong@kernel.org> > > > > Avoid the costs of integer division (32-bit and 64-bit) if the realtime > > extent size is a power of two. > > Looks good: > > Reviewed-by: Christoph Hellwig <hch@lst.de> > > Do you have any data on how common non-power of two rtext sizes are? > Might it be worth to add unlikely annotations? I don't really know about the historical uses. There might be old filesystems out there with a non-power-of-2 raid stripe size that are set up for full stripe allocations for speed. We (oracle) are interested in using rt for PMD allocations on pmem/cxl devices and atomic writes on scsi/nvme devices. Both of those cases will only ever use powers of 2. I'll add some if-test annotations and we'll see if anyone notices. ;) --D > > @@ -11,6 +11,9 @@ xfs_rtx_to_rtb( > > struct xfs_mount *mp, > > xfs_rtxnum_t rtx) > > { > > + if (mp->m_rtxblklog >= 0) > > + return rtx << mp->m_rtxblklog; > > + > > return rtx * mp->m_sb.sb_rextsize; > > i.e. > > if (unlikely(mp->m_rtxblklog == ‐1)) > return rtx * mp->m_sb.sb_rextsize; > return rtx << mp->m_rtxblklog; >
On Thu, Oct 12, 2023 at 11:19:08AM -0700, Darrick J. Wong wrote: > On Thu, Oct 12, 2023 at 07:25:11AM +0200, Christoph Hellwig wrote: > > On Wed, Oct 11, 2023 at 11:06:14AM -0700, Darrick J. Wong wrote: > > > From: Darrick J. Wong <djwong@kernel.org> > > > > > > Avoid the costs of integer division (32-bit and 64-bit) if the realtime > > > extent size is a power of two. > > > > Looks good: > > > > Reviewed-by: Christoph Hellwig <hch@lst.de> > > > > Do you have any data on how common non-power of two rtext sizes are? > > Might it be worth to add unlikely annotations? > > I don't really know about the historical uses. There might be old > filesystems out there with a non-power-of-2 raid stripe size that are > set up for full stripe allocations for speed. > > We (oracle) are interested in using rt for PMD allocations on pmem/cxl > devices and atomic writes on scsi/nvme devices. Both of those cases > will only ever use powers of 2. > > I'll add some if-test annotations and we'll see if anyone notices. ;) > > --D We are using 1044KB realtime extents (blocksize = 4096, rextsize = 261) for our blob storage system. It's a goofy number, but IIRC it was chosen because their most common blob sizes were single-digit multiples of a megabyte, and they wanted a large-ish (~1MB) realtime extent size to reduce external fragmentation, but they also wanted to store a bit of extra metadata without requiring an extra realtime extent and blowing up internal fragmentation.
On Mon, Oct 16, 2023 at 10:19:47AM -0700, Omar Sandoval wrote: > On Thu, Oct 12, 2023 at 11:19:08AM -0700, Darrick J. Wong wrote: > > On Thu, Oct 12, 2023 at 07:25:11AM +0200, Christoph Hellwig wrote: > > > On Wed, Oct 11, 2023 at 11:06:14AM -0700, Darrick J. Wong wrote: > > > > From: Darrick J. Wong <djwong@kernel.org> > > > > > > > > Avoid the costs of integer division (32-bit and 64-bit) if the realtime > > > > extent size is a power of two. > > > > > > Looks good: > > > > > > Reviewed-by: Christoph Hellwig <hch@lst.de> > > > > > > Do you have any data on how common non-power of two rtext sizes are? > > > Might it be worth to add unlikely annotations? > > > > I don't really know about the historical uses. There might be old > > filesystems out there with a non-power-of-2 raid stripe size that are > > set up for full stripe allocations for speed. > > > > We (oracle) are interested in using rt for PMD allocations on pmem/cxl > > devices and atomic writes on scsi/nvme devices. Both of those cases > > will only ever use powers of 2. > > > > I'll add some if-test annotations and we'll see if anyone notices. ;) > > > > --D > > We are using 1044KB realtime extents (blocksize = 4096, rextsize = 261) > for our blob storage system. It's a goofy number, but IIRC it was chosen > because their most common blob sizes were single-digit multiples of a > megabyte, and they wanted a large-ish (~1MB) realtime extent size to > reduce external fragmentation, but they also wanted to store a bit of > extra metadata without requiring an extra realtime extent and blowing up > internal fragmentation. LOL, and here I thought I was only pushing weird sizes like 28k to drive willy crazy. ;) I wrapped some of the if tests in likely(); they can get ripped back out. --D
diff --git a/fs/xfs/libxfs/xfs_rtbitmap.h b/fs/xfs/libxfs/xfs_rtbitmap.h index bc51d3bfc7c45..9dd791181ca2b 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.h +++ b/fs/xfs/libxfs/xfs_rtbitmap.h @@ -11,6 +11,9 @@ xfs_rtx_to_rtb( struct xfs_mount *mp, xfs_rtxnum_t rtx) { + if (mp->m_rtxblklog >= 0) + return rtx << mp->m_rtxblklog; + return rtx * mp->m_sb.sb_rextsize; } @@ -19,6 +22,9 @@ xfs_rtxlen_to_extlen( struct xfs_mount *mp, xfs_rtxlen_t rtxlen) { + if (mp->m_rtxblklog >= 0) + return rtxlen << mp->m_rtxblklog; + return rtxlen * mp->m_sb.sb_rextsize; } @@ -28,6 +34,9 @@ xfs_extlen_to_rtxmod( struct xfs_mount *mp, xfs_extlen_t len) { + if (mp->m_rtxblklog >= 0) + return len & mp->m_rtxblkmask; + return len % mp->m_sb.sb_rextsize; } @@ -36,6 +45,9 @@ xfs_extlen_to_rtxlen( struct xfs_mount *mp, xfs_extlen_t len) { + if (mp->m_rtxblklog >= 0) + return len >> mp->m_rtxblklog; + return len / mp->m_sb.sb_rextsize; } @@ -45,6 +57,11 @@ xfs_rtb_to_rtx( xfs_rtblock_t rtbno, xfs_extlen_t *mod) { + if (mp->m_rtxblklog >= 0) { + *mod = rtbno & mp->m_rtxblkmask; + return rtbno >> mp->m_rtxblklog; + } + return div_u64_rem(rtbno, mp->m_sb.sb_rextsize, mod); } @@ -53,6 +70,9 @@ xfs_rtb_to_rtxt( struct xfs_mount *mp, xfs_rtblock_t rtbno) { + if (mp->m_rtxblklog >= 0) + return rtbno >> mp->m_rtxblklog; + return div_u64(rtbno, mp->m_sb.sb_rextsize); } diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c index 5fb142b8e3d92..db04378f287a6 100644 --- a/fs/xfs/libxfs/xfs_sb.c +++ b/fs/xfs/libxfs/xfs_sb.c @@ -1011,6 +1011,8 @@ xfs_sb_mount_common( mp->m_blockmask = sbp->sb_blocksize - 1; mp->m_blockwsize = sbp->sb_blocksize >> XFS_WORDLOG; mp->m_blockwmask = mp->m_blockwsize - 1; + mp->m_rtxblklog = log2_if_power2(sbp->sb_rextsize); + mp->m_rtxblkmask = mask64_if_power2(sbp->sb_rextsize); mp->m_alloc_mxr[0] = xfs_allocbt_maxrecs(mp, sbp->sb_blocksize, 1); mp->m_alloc_mxr[1] = xfs_allocbt_maxrecs(mp, sbp->sb_blocksize, 0); diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h index dfde4021905b7..953466922ddf7 100644 --- a/fs/xfs/xfs_linux.h +++ b/fs/xfs/xfs_linux.h @@ -203,6 +203,18 @@ static inline bool isaligned_64(uint64_t x, uint32_t y) return do_div(x, y) == 0; } +/* If @b is a power of 2, return log2(b). Else return -1. */ +static inline int8_t log2_if_power2(unsigned long b) +{ + return is_power_of_2(b) ? ilog2(b) : -1; +} + +/* If @b is a power of 2, return a mask of the lower bits, else return zero. */ +static inline unsigned long long mask64_if_power2(unsigned long b) +{ + return is_power_of_2(b) ? b - 1 : 0; +} + int xfs_rw_bdev(struct block_device *bdev, sector_t sector, unsigned int count, char *data, enum req_op op); diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h index f42679a6569ff..3c5b0327164bc 100644 --- a/fs/xfs/xfs_mount.h +++ b/fs/xfs/xfs_mount.h @@ -120,6 +120,7 @@ typedef struct xfs_mount { uint8_t m_blkbb_log; /* blocklog - BBSHIFT */ uint8_t m_agno_log; /* log #ag's */ uint8_t m_sectbb_log; /* sectlog - BBSHIFT */ + int8_t m_rtxblklog; /* log2 of rextsize, if possible */ uint m_blockmask; /* sb_blocksize-1 */ uint m_blockwsize; /* sb_blocksize in words */ uint m_blockwmask; /* blockwsize-1 */ @@ -153,6 +154,7 @@ typedef struct xfs_mount { uint64_t m_features; /* active filesystem features */ uint64_t m_low_space[XFS_LOWSP_MAX]; uint64_t m_low_rtexts[XFS_LOWSP_MAX]; + uint64_t m_rtxblkmask; /* rt extent block mask */ struct xfs_ino_geometry m_ino_geo; /* inode geometry */ struct xfs_trans_resv m_resv; /* precomputed res values */ /* low free space thresholds */ diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index e62ca51b995ba..dc4874776c2cb 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -1052,6 +1052,7 @@ xfs_growfs_rt( * Calculate new sb and mount fields for this round. */ nsbp->sb_rextsize = in->extsize; + nmp->m_rtxblklog = -1; /* don't use shift or masking */ nsbp->sb_rbmblocks = bmbno + 1; nrblocks_step = (bmbno + 1) * NBBY * nsbp->sb_blocksize * nsbp->sb_rextsize; diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c index 22aa2111b6cab..35161074c4e7d 100644 --- a/fs/xfs/xfs_trans.c +++ b/fs/xfs/xfs_trans.c @@ -712,6 +712,10 @@ xfs_trans_unreserve_and_mod_sb( mp->m_sb.sb_agcount += tp->t_agcount_delta; mp->m_sb.sb_imax_pct += tp->t_imaxpct_delta; mp->m_sb.sb_rextsize += tp->t_rextsize_delta; + if (tp->t_rextsize_delta) { + mp->m_rtxblklog = log2_if_power2(mp->m_sb.sb_rextsize); + mp->m_rtxblkmask = mask64_if_power2(mp->m_sb.sb_rextsize); + } mp->m_sb.sb_rbmblocks += tp->t_rbmblocks_delta; mp->m_sb.sb_rblocks += tp->t_rblocks_delta; mp->m_sb.sb_rextents += tp->t_rextents_delta;