diff mbox series

[09/14] bitmap: extend bitmap_{get,set}_value8() to bitmap_{get,set}_bits()

Message ID 20231009151026.66145-10-aleksander.lobakin@intel.com (mailing list archive)
State Superseded
Headers show
Series ip_tunnel: convert __be16 tunnel flags to bitmaps | expand

Checks

Context Check Description
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next-VM_Test-3 fail Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-0 success Logs for ${{ matrix.test }} on ${{ matrix.arch }} with ${{ matrix.toolchain_full }}
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-2 fail Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-4 fail Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-5 fail Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-6 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-7 success Logs for veristat
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Guessed tree name to be net-next, async
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 15633 this patch: 15633
netdev/cc_maintainers success CCed 3 of 3 maintainers
netdev/build_clang success Errors and warnings before: 3934 this patch: 3934
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 17035 this patch: 17035
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 83 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-PR fail PR summary
bpf/vmtest-bpf-VM_Test-6 success Logs for test_maps on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-7 success Logs for test_maps on s390x with gcc
bpf/vmtest-bpf-VM_Test-0 success Logs for ShellCheck
bpf/vmtest-bpf-VM_Test-8 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-1 success Logs for build for aarch64 with gcc
bpf/vmtest-bpf-VM_Test-2 success Logs for build for s390x with gcc
bpf/vmtest-bpf-VM_Test-3 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-VM_Test-5 success Logs for set-matrix
bpf/vmtest-bpf-VM_Test-4 success Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-9 success Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-10 success Logs for test_progs on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-11 success Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-VM_Test-12 success Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-13 fail Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-14 success Logs for test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-15 success Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-VM_Test-16 success Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-17 success Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-18 success Logs for test_progs_no_alu32_parallel on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-19 success Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-20 success Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-21 success Logs for test_progs_parallel on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-22 success Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-23 success Logs for test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-24 success Logs for test_verifier on aarch64 with gcc
bpf/vmtest-bpf-VM_Test-25 success Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-VM_Test-26 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-VM_Test-27 success Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-VM_Test-28 success Logs for veristat

Commit Message

Alexander Lobakin Oct. 9, 2023, 3:10 p.m. UTC
Sometimes there's need to get a 8/16/...-bit piece of a bitmap at a
particular offset. Currently, there are only bitmap_{get,set}_value8()
to do that for 8 bits and that's it.
Instead of introducing a separate pair for u16 and so on, which doesn't
scale well, extend the existing functions to be able to pass the wanted
value width. Make both offset and width arbitrary, but in order to not
over complicate the current logic and keep the helpers as optimized as
the current ones, require the width to be a pow-2 value and the offset
to be a multiple of the width, while the target piece should not cross
a %BITS_PER_LONG boundary and stay within one long.
Avoid adjusting all the already existing callsites by defining oneliner
wrapper macros named after the former functions. bloat-o-meter shows
almost no difference (+1-2 bytes in a couple of places), meaning the
new helpers get optimized just nicely.

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 include/linux/bitmap.h | 51 ++++++++++++++++++++++++++++++------------
 1 file changed, 37 insertions(+), 14 deletions(-)

Comments

Yury Norov Oct. 9, 2023, 4:31 p.m. UTC | #1
+ Alexander Potapenko <glider@google.com>

On Mon, Oct 09, 2023 at 05:10:21PM +0200, Alexander Lobakin wrote:
> Sometimes there's need to get a 8/16/...-bit piece of a bitmap at a
> particular offset. Currently, there are only bitmap_{get,set}_value8()
> to do that for 8 bits and that's it.

And also a series from Alexander Potapenko, which I really hope will
get into the -next really soon. It introduces bitmap_read/write which
can set up to BITS_PER_LONG at once, with no limitations on alignment
of position and length:

https://lore.kernel.org/linux-arm-kernel/ZRXbOoKHHafCWQCW@yury-ThinkPad/T/#mc311037494229647088b3a84b9f0d9b50bf227cb

Can you consider building your series on top of it?

> Instead of introducing a separate pair for u16 and so on, which doesn't
> scale well, extend the existing functions to be able to pass the wanted
> value width. Make both offset and width arbitrary, but in order to not
> over complicate the current logic and keep the helpers as optimized as
> the current ones, require the width to be a pow-2 value and the offset
> to be a multiple of the width, while the target piece should not cross
> a %BITS_PER_LONG boundary and stay within one long.
> Avoid adjusting all the already existing callsites by defining oneliner
> wrapper macros named after the former functions. bloat-o-meter shows
> almost no difference (+1-2 bytes in a couple of places), meaning the
> new helpers get optimized just nicely.
> 
> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
> ---
>  include/linux/bitmap.h | 51 ++++++++++++++++++++++++++++++------------
>  1 file changed, 37 insertions(+), 14 deletions(-)
> 
> diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
> index 63e422f8ba3d..9c010a7fa331 100644
> --- a/include/linux/bitmap.h
> +++ b/include/linux/bitmap.h
> @@ -6,8 +6,10 @@
>  
>  #include <linux/align.h>
>  #include <linux/bitops.h>
> +#include <linux/bug.h>
>  #include <linux/find.h>
>  #include <linux/limits.h>
> +#include <linux/log2.h>
>  #include <linux/string.h>
>  #include <linux/types.h>
>  
> @@ -569,38 +571,59 @@ static inline void bitmap_from_u64(unsigned long *dst, u64 mask)
>  }
>  
>  /**
> - * bitmap_get_value8 - get an 8-bit value within a memory region
> + * bitmap_get_bits - get a 8/16/32/64-bit value within a memory region
>   * @map: address to the bitmap memory region
> - * @start: bit offset of the 8-bit value; must be a multiple of 8
> + * @start: bit offset of the value; must be a multiple of @len
> + * @len: bit width of the value; must be a power of two
>   *
> - * Returns the 8-bit value located at the @start bit offset within the @src
> - * memory region.
> + * Return: the 8/16/32/64-bit value located at the @start bit offset within
> + * the @src memory region. Its position (@start + @len) can't cross
> + * a ``BITS_PER_LONG`` boundary.
>   */
> -static inline unsigned long bitmap_get_value8(const unsigned long *map,
> -					      unsigned long start)
> +static inline unsigned long bitmap_get_bits(const unsigned long *map,
> +					    unsigned long start, size_t len)
>  {
>  	const size_t index = BIT_WORD(start);
>  	const unsigned long offset = start % BITS_PER_LONG;
>  
> -	return (map[index] >> offset) & 0xFF;
> +	if (WARN_ON_ONCE(!is_power_of_2(len) || offset % len ||
> +			 offset + len > BITS_PER_LONG))
> +		return 0;
> +
> +	return (map[index] >> offset) & GENMASK(len - 1, 0);
>  }
>  
>  /**
> - * bitmap_set_value8 - set an 8-bit value within a memory region
> + * bitmap_set_bits - set a 8/16/32/64-bit value within a memory region
>   * @map: address to the bitmap memory region
> - * @value: the 8-bit value; values wider than 8 bits may clobber bitmap
> - * @start: bit offset of the 8-bit value; must be a multiple of 8
> + * @start: bit offset of the value; must be a multiple of @len
> + * @value: new value to set
> + * @len: bit width of the value; must be a power of two
> + *
> + * Replaces the 8/16/32/64-bit value located at the @start bit offset within
> + * the @src memory region with the new @value. Its position (@start + @len)
> + * can't cross a ``BITS_PER_LONG`` boundary.
>   */
> -static inline void bitmap_set_value8(unsigned long *map, unsigned long value,
> -				     unsigned long start)
> +static inline void bitmap_set_bits(unsigned long *map, unsigned long start,
> +				   unsigned long value, size_t len)
>  {
>  	const size_t index = BIT_WORD(start);
>  	const unsigned long offset = start % BITS_PER_LONG;
> +	unsigned long mask = GENMASK(len - 1, 0);
>  
> -	map[index] &= ~(0xFFUL << offset);
> -	map[index] |= value << offset;
> +	if (WARN_ON_ONCE(!is_power_of_2(len) || offset % len ||
> +			 offset + len > BITS_PER_LONG))
> +		return;
> +
> +	map[index] &= ~(mask << offset);
> +	map[index] |= (value & mask) << offset;
>  }
>  
> +#define bitmap_get_value8(map, start)				\
> +	bitmap_get_bits(map, start, BITS_PER_BYTE)
> +#define bitmap_set_value8(map, value, start)			\
> +	bitmap_set_bits(map, start, value, BITS_PER_BYTE)
> +
>  #endif /* __ASSEMBLY__ */
>  
>  #endif /* __LINUX_BITMAP_H */
> -- 
> 2.41.0
Alexander Lobakin Oct. 11, 2023, 9:33 a.m. UTC | #2
From: Yury Norov <yury.norov@gmail.com>
Date: Mon, 9 Oct 2023 09:31:15 -0700

> + Alexander Potapenko <glider@google.com>
> 
> On Mon, Oct 09, 2023 at 05:10:21PM +0200, Alexander Lobakin wrote:
>> Sometimes there's need to get a 8/16/...-bit piece of a bitmap at a
>> particular offset. Currently, there are only bitmap_{get,set}_value8()
>> to do that for 8 bits and that's it.
> 
> And also a series from Alexander Potapenko, which I really hope will
> get into the -next really soon. It introduces bitmap_read/write which
> can set up to BITS_PER_LONG at once, with no limitations on alignment
> of position and length:
> 
> https://lore.kernel.org/linux-arm-kernel/ZRXbOoKHHafCWQCW@yury-ThinkPad/T/#mc311037494229647088b3a84b9f0d9b50bf227cb
> 
> Can you consider building your series on top of it?

Yeah, I mentioned in the cover letter that I'm aware of it and in fact
it doesn't conflict much, as the functions I'm adding here get optimized
as much as the original bitmap_{get,set}_value8(), while Alexander's
generic helpers are heavier.
I realize lots of calls will be optimized as well due to the offset and
the width being compile-time constants, but not all of them. The idea of
keeping two pairs of helpers initially came from Andy if I understood
him correctly.
What do you think? I can provide some bloat-o-meter stats after
rebasing. And either way, I see no issue in basing this series on top of
Alex' one.

> 
>> Instead of introducing a separate pair for u16 and so on, which doesn't
>> scale well, extend the existing functions to be able to pass the wanted
>> value width. Make both offset and width arbitrary, but in order to not
>> over complicate the current logic and keep the helpers as optimized as
>> the current ones, require the width to be a pow-2 value and the offset
>> to be a multiple of the width, while the target piece should not cross
>> a %BITS_PER_LONG boundary and stay within one long.
>> Avoid adjusting all the already existing callsites by defining oneliner
>> wrapper macros named after the former functions. bloat-o-meter shows
>> almost no difference (+1-2 bytes in a couple of places), meaning the

[...]

Thanks,
Olek
Andy Shevchenko Oct. 11, 2023, 10:36 a.m. UTC | #3
On Wed, Oct 11, 2023 at 11:33:25AM +0200, Alexander Lobakin wrote:
> From: Yury Norov <yury.norov@gmail.com>
> Date: Mon, 9 Oct 2023 09:31:15 -0700
> 
> > + Alexander Potapenko <glider@google.com>
> > 
> > On Mon, Oct 09, 2023 at 05:10:21PM +0200, Alexander Lobakin wrote:
> >> Sometimes there's need to get a 8/16/...-bit piece of a bitmap at a
> >> particular offset. Currently, there are only bitmap_{get,set}_value8()
> >> to do that for 8 bits and that's it.
> > 
> > And also a series from Alexander Potapenko, which I really hope will
> > get into the -next really soon. It introduces bitmap_read/write which
> > can set up to BITS_PER_LONG at once, with no limitations on alignment
> > of position and length:
> > 
> > https://lore.kernel.org/linux-arm-kernel/ZRXbOoKHHafCWQCW@yury-ThinkPad/T/#mc311037494229647088b3a84b9f0d9b50bf227cb
> > 
> > Can you consider building your series on top of it?
> 
> Yeah, I mentioned in the cover letter that I'm aware of it and in fact
> it doesn't conflict much, as the functions I'm adding here get optimized
> as much as the original bitmap_{get,set}_value8(), while Alexander's
> generic helpers are heavier.
> I realize lots of calls will be optimized as well due to the offset and
> the width being compile-time constants, but not all of them. The idea of
> keeping two pairs of helpers initially came from Andy if I understood
> him correctly.

Just a disclaimer: The idea came before I saw the series by Alexander Potapenko.

> What do you think? I can provide some bloat-o-meter stats after
> rebasing. And either way, I see no issue in basing this series on top of
> Alex' one.
Yury Norov Oct. 15, 2023, 2:20 a.m. UTC | #4
On Wed, Oct 11, 2023 at 11:33:25AM +0200, Alexander Lobakin wrote:
> From: Yury Norov <yury.norov@gmail.com>
> Date: Mon, 9 Oct 2023 09:31:15 -0700
> 
> > + Alexander Potapenko <glider@google.com>
> > 
> > On Mon, Oct 09, 2023 at 05:10:21PM +0200, Alexander Lobakin wrote:
> >> Sometimes there's need to get a 8/16/...-bit piece of a bitmap at a
> >> particular offset. Currently, there are only bitmap_{get,set}_value8()
> >> to do that for 8 bits and that's it.
> > 
> > And also a series from Alexander Potapenko, which I really hope will
> > get into the -next really soon. It introduces bitmap_read/write which
> > can set up to BITS_PER_LONG at once, with no limitations on alignment
> > of position and length:
> > 
> > https://lore.kernel.org/linux-arm-kernel/ZRXbOoKHHafCWQCW@yury-ThinkPad/T/#mc311037494229647088b3a84b9f0d9b50bf227cb
> > 
> > Can you consider building your series on top of it?
> 
> Yeah, I mentioned in the cover letter that I'm aware of it and in fact
> it doesn't conflict much, as the functions I'm adding here get optimized
> as much as the original bitmap_{get,set}_value8(), while Alexander's
> generic helpers are heavier.
> I realize lots of calls will be optimized as well due to the offset and
> the width being compile-time constants, but not all of them. The idea of
> keeping two pairs of helpers initially came from Andy if I understood
> him correctly.
> What do you think? I can provide some bloat-o-meter stats after
> rebasing. And either way, I see no issue in basing this series on top of
> Alex' one.

You're right, let's try both and see what how worse is one comparing
to another wrt bloat-o-meter and overall code generation. If the
difference is not that terrible, I'd stick to universal and simpler
for users version.

If the difference is significant, we'd have to keep both. Maybe it's
worth to try merge the aligned case into generic one, but it's not the
purpose of your series, of course.

Thanks,
Yury
diff mbox series

Patch

diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
index 63e422f8ba3d..9c010a7fa331 100644
--- a/include/linux/bitmap.h
+++ b/include/linux/bitmap.h
@@ -6,8 +6,10 @@ 
 
 #include <linux/align.h>
 #include <linux/bitops.h>
+#include <linux/bug.h>
 #include <linux/find.h>
 #include <linux/limits.h>
+#include <linux/log2.h>
 #include <linux/string.h>
 #include <linux/types.h>
 
@@ -569,38 +571,59 @@  static inline void bitmap_from_u64(unsigned long *dst, u64 mask)
 }
 
 /**
- * bitmap_get_value8 - get an 8-bit value within a memory region
+ * bitmap_get_bits - get a 8/16/32/64-bit value within a memory region
  * @map: address to the bitmap memory region
- * @start: bit offset of the 8-bit value; must be a multiple of 8
+ * @start: bit offset of the value; must be a multiple of @len
+ * @len: bit width of the value; must be a power of two
  *
- * Returns the 8-bit value located at the @start bit offset within the @src
- * memory region.
+ * Return: the 8/16/32/64-bit value located at the @start bit offset within
+ * the @src memory region. Its position (@start + @len) can't cross
+ * a ``BITS_PER_LONG`` boundary.
  */
-static inline unsigned long bitmap_get_value8(const unsigned long *map,
-					      unsigned long start)
+static inline unsigned long bitmap_get_bits(const unsigned long *map,
+					    unsigned long start, size_t len)
 {
 	const size_t index = BIT_WORD(start);
 	const unsigned long offset = start % BITS_PER_LONG;
 
-	return (map[index] >> offset) & 0xFF;
+	if (WARN_ON_ONCE(!is_power_of_2(len) || offset % len ||
+			 offset + len > BITS_PER_LONG))
+		return 0;
+
+	return (map[index] >> offset) & GENMASK(len - 1, 0);
 }
 
 /**
- * bitmap_set_value8 - set an 8-bit value within a memory region
+ * bitmap_set_bits - set a 8/16/32/64-bit value within a memory region
  * @map: address to the bitmap memory region
- * @value: the 8-bit value; values wider than 8 bits may clobber bitmap
- * @start: bit offset of the 8-bit value; must be a multiple of 8
+ * @start: bit offset of the value; must be a multiple of @len
+ * @value: new value to set
+ * @len: bit width of the value; must be a power of two
+ *
+ * Replaces the 8/16/32/64-bit value located at the @start bit offset within
+ * the @src memory region with the new @value. Its position (@start + @len)
+ * can't cross a ``BITS_PER_LONG`` boundary.
  */
-static inline void bitmap_set_value8(unsigned long *map, unsigned long value,
-				     unsigned long start)
+static inline void bitmap_set_bits(unsigned long *map, unsigned long start,
+				   unsigned long value, size_t len)
 {
 	const size_t index = BIT_WORD(start);
 	const unsigned long offset = start % BITS_PER_LONG;
+	unsigned long mask = GENMASK(len - 1, 0);
 
-	map[index] &= ~(0xFFUL << offset);
-	map[index] |= value << offset;
+	if (WARN_ON_ONCE(!is_power_of_2(len) || offset % len ||
+			 offset + len > BITS_PER_LONG))
+		return;
+
+	map[index] &= ~(mask << offset);
+	map[index] |= (value & mask) << offset;
 }
 
+#define bitmap_get_value8(map, start)				\
+	bitmap_get_bits(map, start, BITS_PER_BYTE)
+#define bitmap_set_value8(map, value, start)			\
+	bitmap_set_bits(map, start, value, BITS_PER_BYTE)
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* __LINUX_BITMAP_H */