* [PATCH 0/4] bitops: optimize fns() for more
@ 2024-05-02 23:32 Yury Norov
  2024-05-02 23:32 ` [PATCH 1/4] lib: make test_bitops compilable into the kernel image Yury Norov
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Yury Norov @ 2024-05-02 23:32 UTC (permalink / raw)
  To: linux-kernel; +Cc: Yury Norov, Rasmus Villemoes, Andrew Morton, Kuan-Wei Chiu

This series follows up [1].

[1] improves performance by switching fns() from __ffs() + __clear_bit()
to the equivalent but much faster expression that searches for and
clears the first N-1 set bits:

	while (word && n--)
		word &= word - 1;

We can squeeze even more out of fns() by replacing the linear walk over
the set bits in a word with a binary search.

Patch #3 implements it by adding fns8(), fns16(), fns32() and fns64(), 
and patches 1 and 2 are cleanups related to fns().

The last patch creates a MAINTAINERS record for bitops, which are
currently unmaintained. I add Rasmus and myself as a reviewer and
maintainer, respectively, just like for bitmaps. Rasmus, please let me
know if you don't want to review it.

[1] https://lore.kernel.org/linux-kernel/20240502092443.6845-2-visitorckw@gmail.com/T/

Yury Norov (4):
  lib: make test_bitops compilable into the kernel image
  bitmap: relax find_nth_bit() limitation on return value
  bitops: squeeze even more out of fns()
  MAINTAINERS: add BITOPS API record

 MAINTAINERS            | 13 +++++++++++++
 include/linux/bitops.h | 42 +++++++++++++++++++++++++++++++++++++-----
 include/linux/find.h   |  2 +-
 lib/Kconfig.debug      |  1 -
 lib/find_bit.c         |  2 +-
 lib/test_bitmap.c      |  4 ++--
 6 files changed, 54 insertions(+), 10 deletions(-)

-- 
2.40.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/4] lib: make test_bitops compilable into the kernel image
  2024-05-02 23:32 [PATCH 0/4] bitops: optimize fns() for more Yury Norov
@ 2024-05-02 23:32 ` Yury Norov
  2024-05-03  2:00   ` Kuan-Wei Chiu
  2024-05-02 23:32 ` [PATCH 2/4] bitmap: relax find_nth_bit() limitation on return value Yury Norov
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Yury Norov @ 2024-05-02 23:32 UTC (permalink / raw)
  To: linux-kernel; +Cc: Yury Norov, Rasmus Villemoes, Andrew Morton, Kuan-Wei Chiu

The test can be compiled only as a module, although there's no technical
reason for that limitation. Now that the test includes a performance
benchmark, it's reasonable to allow running it at kernel boot time,
before userspace starts, to reduce possible jitter.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 lib/Kconfig.debug | 1 -
 1 file changed, 1 deletion(-)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index c63a5fbf1f1c..fc8fe1ea5b49 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2436,7 +2436,6 @@ config TEST_LKM
 
 config TEST_BITOPS
 	tristate "Test module for compilation of bitops operations"
-	depends on m
 	help
 	  This builds the "test_bitops" module that is much like the
 	  TEST_LKM module except that it does a basic exercise of the
-- 
2.40.1



* [PATCH 2/4] bitmap: relax find_nth_bit() limitation on return value
  2024-05-02 23:32 [PATCH 0/4] bitops: optimize fns() for more Yury Norov
  2024-05-02 23:32 ` [PATCH 1/4] lib: make test_bitops compilable into the kernel image Yury Norov
@ 2024-05-02 23:32 ` Yury Norov
  2024-05-02 23:32 ` [PATCH 3/4] bitops: squeeze even more out of fns() Yury Norov
  2024-05-02 23:32 ` [PATCH 4/4] MAINTAINERS: add BITOPS API record Yury Norov
  3 siblings, 0 replies; 13+ messages in thread
From: Yury Norov @ 2024-05-02 23:32 UTC (permalink / raw)
  To: linux-kernel; +Cc: Yury Norov, Rasmus Villemoes, Andrew Morton, Kuan-Wei Chiu

The function claims to return the bitmap size if the N'th bit doesn't
exist. This rule is violated in the inline case because the fns() used
there doesn't know anything about the size of the bitmap.

So, relax this requirement to '>= size', and make the out-of-line
implementation a bit cheaper.

All in-tree kernel users of find_nth_bit() are safe against that.

Reported-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Closes: https://lore.kernel.org/all/Zi50cAgR8nZvgLa3@yury-ThinkPad/T/#m6da806a0525e74dcc91f35e5f20766ed4e853e8a
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 include/linux/find.h | 2 +-
 lib/find_bit.c       | 2 +-
 lib/test_bitmap.c    | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/find.h b/include/linux/find.h
index c69598e383c1..02751e43d448 100644
--- a/include/linux/find.h
+++ b/include/linux/find.h
@@ -220,7 +220,7 @@ unsigned long find_first_bit(const unsigned long *addr, unsigned long size)
  *	 idx = find_first_bit(addr, size);
  *
  * Returns the bit number of the N'th set bit.
- * If no such, returns @size.
+ * If no such, returns >= @size.
  */
 static inline
 unsigned long find_nth_bit(const unsigned long *addr, unsigned long size, unsigned long n)
diff --git a/lib/find_bit.c b/lib/find_bit.c
index 32f99e9a670e..0bddfc3ff248 100644
--- a/lib/find_bit.c
+++ b/lib/find_bit.c
@@ -87,7 +87,7 @@ out:										\
 	if (sz % BITS_PER_LONG)							\
 		tmp = (FETCH) & BITMAP_LAST_WORD_MASK(sz);			\
 found:										\
-	sz = min(idx * BITS_PER_LONG + fns(tmp, nr), sz);			\
+	sz = idx * BITS_PER_LONG + fns(tmp, nr);				\
 out:										\
 	sz;									\
 })
diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index 6b2b33579f56..088838f829c9 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -244,7 +244,7 @@ static void __init test_find_nth_bit(void)
 	expect_eq_uint(60,  find_nth_bit(bmap, 64 * 3, 5));
 	expect_eq_uint(80,  find_nth_bit(bmap, 64 * 3, 6));
 	expect_eq_uint(123, find_nth_bit(bmap, 64 * 3, 7));
-	expect_eq_uint(64 * 3, find_nth_bit(bmap, 64 * 3, 8));
+	expect_eq_uint(0,   !!(find_nth_bit(bmap, 64 * 3, 8) < 64 * 3));
 
 	expect_eq_uint(10,  find_nth_bit(bmap, 64 * 3 - 1, 0));
 	expect_eq_uint(20,  find_nth_bit(bmap, 64 * 3 - 1, 1));
@@ -254,7 +254,7 @@ static void __init test_find_nth_bit(void)
 	expect_eq_uint(60,  find_nth_bit(bmap, 64 * 3 - 1, 5));
 	expect_eq_uint(80,  find_nth_bit(bmap, 64 * 3 - 1, 6));
 	expect_eq_uint(123, find_nth_bit(bmap, 64 * 3 - 1, 7));
-	expect_eq_uint(64 * 3 - 1, find_nth_bit(bmap, 64 * 3 - 1, 8));
+	expect_eq_uint(0,   !!(find_nth_bit(bmap, 64 * 3 - 1, 8) < 64 * 3 - 1));
 
 	for_each_set_bit(bit, exp1, EXP1_IN_BITS) {
 		b = find_nth_bit(exp1, EXP1_IN_BITS, cnt++);
-- 
2.40.1



* [PATCH 3/4] bitops: squeeze even more out of fns()
  2024-05-02 23:32 [PATCH 0/4] bitops: optimize fns() for more Yury Norov
  2024-05-02 23:32 ` [PATCH 1/4] lib: make test_bitops compilable into the kernel image Yury Norov
  2024-05-02 23:32 ` [PATCH 2/4] bitmap: relax find_nth_bit() limitation on return value Yury Norov
@ 2024-05-02 23:32 ` Yury Norov
  2024-05-03  2:19   ` Kuan-Wei Chiu
  2024-05-02 23:32 ` [PATCH 4/4] MAINTAINERS: add BITOPS API record Yury Norov
  3 siblings, 1 reply; 13+ messages in thread
From: Yury Norov @ 2024-05-02 23:32 UTC (permalink / raw)
  To: linux-kernel; +Cc: Yury Norov, Rasmus Villemoes, Andrew Morton, Kuan-Wei Chiu

The function clears the first N-1 set bits to find the N'th one with:

	while (word && n--)
		word &= word - 1;

In the worst case, it would take 63 iterations.

Instead of a linear walk through the set bits, we can do a binary search
using hweight(). This works even better on platforms with
hardware-assisted hweight() - pretty much every modern arch.

On my Ryzen 9 5900X, the test_fns() benchmark shows the binary fns()
running twice as fast as the linear one.

fns8() returns 64 to make sure that, when no bit is found, the return
value is greater than the bit width of the arguments of all fnsXX()
functions up to fns64().

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 include/linux/bitops.h | 42 +++++++++++++++++++++++++++++++++++++-----
 1 file changed, 37 insertions(+), 5 deletions(-)

diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 57ecef354f47..1c4773db56e0 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -247,17 +247,49 @@ static inline unsigned long __ffs64(u64 word)
 	return __ffs((unsigned long)word);
 }
 
+static inline unsigned long fns8(u8 word, unsigned int n)
+{
+	while (word && n--)
+		word &= word - 1;
+
+	return word ? __ffs(word) : 64;
+}
+
+static inline unsigned long fns16(u16 word, unsigned int n)
+{
+	unsigned int w = hweight8((u8)word);
+
+	return n < w ? fns8((u8)word, n) : 8 + fns8((u8)(word >> 8), n - w);
+}
+
+static inline unsigned long fns32(u32 word, unsigned int n)
+{
+	unsigned int w = hweight16((u16)word);
+
+	return n < w ? fns16((u16)word, n) : 16 + fns16((u16)(word >> 16), n - w);
+}
+
+static inline unsigned long fns64(u64 word, unsigned int n)
+{
+	unsigned int w = hweight32((u32)word);
+
+	return n < w ? fns32((u32)word, n) : 32 + fns32((u32)(word >> 32), n - w);
+}
+
 /**
  * fns - find N'th set bit in a word
  * @word: The word to search
- * @n: Bit to find
+ * @n: Bit to find, counting from 0
+ *
+ * Returns N'th set bit. If no such bit found, returns >= BITS_PER_LONG
  */
 static inline unsigned long fns(unsigned long word, unsigned int n)
 {
-	while (word && n--)
-		word &= word - 1;
-
-	return word ? __ffs(word) : BITS_PER_LONG;
+#if BITS_PER_LONG == 64
+	return fns64(word, n);
+#else
+	return fns32(word, n);
+#endif
 }
 
 /**
-- 
2.40.1



* [PATCH 4/4] MAINTAINERS: add BITOPS API record
  2024-05-02 23:32 [PATCH 0/4] bitops: optimize fns() for more Yury Norov
                   ` (2 preceding siblings ...)
  2024-05-02 23:32 ` [PATCH 3/4] bitops: squeeze even more out of fns() Yury Norov
@ 2024-05-02 23:32 ` Yury Norov
  3 siblings, 0 replies; 13+ messages in thread
From: Yury Norov @ 2024-05-02 23:32 UTC (permalink / raw)
  To: linux-kernel; +Cc: Yury Norov, Rasmus Villemoes, Andrew Morton, Kuan-Wei Chiu

The bitops API is fundamental and widely used by the kernel, but the
corresponding files are not maintained.

Bitmaps actively use bit operations, and a big share of bitops material
already moves through the bitmap branch.

I would like to take a closer look at bitops.

This patch creates a BITOPS API record in MAINTAINERS, adding Rasmus as
a reviewer and myself as a maintainer of those files.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 MAINTAINERS | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index c23fda1aa1f0..0cfd2c5d9086 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3638,6 +3638,20 @@ F:	tools/include/vdso/bits.h
 F:	tools/lib/bitmap.c
 F:	tools/lib/find_bit.c
 
+BITOPS API
+M:	Yury Norov <yury.norov@gmail.com>
+R:	Rasmus Villemoes <linux@rasmusvillemoes.dk>
+S:	Maintained
+F:	arch/*/include/asm/bitops.h
+F:	arch/*/include/asm/bitops_32.h
+F:	arch/*/include/asm/bitops_64.h
+F:	arch/*/lib/bitops.c
+F:	include/asm-generic/bitops
+F:	include/asm-generic/bitops.h
+F:	include/linux/bitops.h
+F:	lib/test_bitops.c
+F:	tools/*/bitops*
+
 BLINKM RGB LED DRIVER
 M:	Jan-Simon Moeller <jansimon.moeller@gmx.de>
 S:	Maintained
-- 
2.40.1



* Re: [PATCH 1/4] lib: make test_bitops compilable into the kernel image
  2024-05-02 23:32 ` [PATCH 1/4] lib: make test_bitops compilable into the kernel image Yury Norov
@ 2024-05-03  2:00   ` Kuan-Wei Chiu
  2024-05-03  2:14     ` Yury Norov
  0 siblings, 1 reply; 13+ messages in thread
From: Kuan-Wei Chiu @ 2024-05-03  2:00 UTC (permalink / raw)
  To: Yury Norov; +Cc: linux-kernel, Rasmus Villemoes, Andrew Morton

On Thu, May 02, 2024 at 04:32:01PM -0700, Yury Norov wrote:
> The test is limited to be compiled as a module. There's no technical
> reason for it. Now that the test bears performance benchmark, it would
> be reasonable to allow running it at kernel load time, before userspace
> starts, to reduce possible jitter.
> 
> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> ---
>  lib/Kconfig.debug | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index c63a5fbf1f1c..fc8fe1ea5b49 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -2436,7 +2436,6 @@ config TEST_LKM
>  
>  config TEST_BITOPS
>  	tristate "Test module for compilation of bitops operations"
> -	depends on m


Perhaps it would be better to modify the description in the following
help section at the same time?

Reviewed-by: Kuan-Wei Chiu <visitorckw@gmail.com>

>  	help
>  	  This builds the "test_bitops" module that is much like the
>  	  TEST_LKM module except that it does a basic exercise of the
> -- 
> 2.40.1
> 


* Re: [PATCH 1/4] lib: make test_bitops compilable into the kernel image
  2024-05-03  2:00   ` Kuan-Wei Chiu
@ 2024-05-03  2:14     ` Yury Norov
  2024-05-03  2:24       ` Kuan-Wei Chiu
  0 siblings, 1 reply; 13+ messages in thread
From: Yury Norov @ 2024-05-03  2:14 UTC (permalink / raw)
  To: Kuan-Wei Chiu; +Cc: linux-kernel, Rasmus Villemoes, Andrew Morton

On Fri, May 03, 2024 at 10:00:07AM +0800, Kuan-Wei Chiu wrote:
> On Thu, May 02, 2024 at 04:32:01PM -0700, Yury Norov wrote:
> > The test is limited to be compiled as a module. There's no technical
> > reason for it. Now that the test bears performance benchmark, it would
> > be reasonable to allow running it at kernel load time, before userspace
> > starts, to reduce possible jitter.
> > 
> > Signed-off-by: Yury Norov <yury.norov@gmail.com>
> > ---
> >  lib/Kconfig.debug | 1 -
> >  1 file changed, 1 deletion(-)
> > 
> > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> > index c63a5fbf1f1c..fc8fe1ea5b49 100644
> > --- a/lib/Kconfig.debug
> > +++ b/lib/Kconfig.debug
> > @@ -2436,7 +2436,6 @@ config TEST_LKM
> >  
> >  config TEST_BITOPS
> >  	tristate "Test module for compilation of bitops operations"
> > -	depends on m
> 
> 
> Perhaps it would be better to modify the description in the following
> help section at the same time?

What exactly do you want to change?
 
> Reviewed-by: Kuan-Wei Chiu <visitorckw@gmail.com>
> 
> >  	help
> >  	  This builds the "test_bitops" module that is much like the
> >  	  TEST_LKM module except that it does a basic exercise of the
> > -- 
> > 2.40.1
> > 


* Re: [PATCH 3/4] bitops: squeeze even more out of fns()
  2024-05-02 23:32 ` [PATCH 3/4] bitops: squeeze even more out of fns() Yury Norov
@ 2024-05-03  2:19   ` Kuan-Wei Chiu
  2024-05-03 16:13     ` Yury Norov
  0 siblings, 1 reply; 13+ messages in thread
From: Kuan-Wei Chiu @ 2024-05-03  2:19 UTC (permalink / raw)
  To: Yury Norov
  Cc: linux-kernel, Rasmus Villemoes, Andrew Morton, Chin-Chun Chen,
	Ching-Chun Huang

+Cc Chin-Chun Chen & Ching-Chun (Jim) Huang

On Thu, May 02, 2024 at 04:32:03PM -0700, Yury Norov wrote:
> The function clears N-1 first set bits to find the N'th one with:
> 
> 	while (word && n--)
> 		word &= word - 1;
> 
> In the worst case, it would take 63 iterations.
> 
> Instead of linear walk through the set bits, we can do a binary search
> by using hweight(). This would work even better on platforms supporting
> hardware-assisted hweight() - pretty much every modern arch.
> 
Chin-Chun once proposed a method similar to binary search combined with
hamming weight and discussed it privately with me and Jim. However,
Chin-Chun found that binary search would actually impair performance
when n is small. Since we are unsure about the typical range of n in
our actual workload, we have not yet proposed any relevant patches. If
considering only the overall benchmark results, this patch looks good
to me.

Regards,
Kuan-Wei

> On my Ryzen 9 5900X, the test_fns() benchmark runs binary fns() twice
> faster, comparing to linear one.
> 
> The fns8() returns 64 to make sure that in case of no bit found, the
> return value will be greater than the bit capacity of arguments of all
> fnsXX() functions up to fns64().
> 
> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> ---
>  include/linux/bitops.h | 42 +++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 37 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> index 57ecef354f47..1c4773db56e0 100644
> --- a/include/linux/bitops.h
> +++ b/include/linux/bitops.h
> @@ -247,17 +247,49 @@ static inline unsigned long __ffs64(u64 word)
>  	return __ffs((unsigned long)word);
>  }
>  
> +static inline unsigned long fns8(u8 word, unsigned int n)
> +{
> +	while (word && n--)
> +		word &= word - 1;
> +
> +	return word ? __ffs(word) : 64;
> +}
> +
> +static inline unsigned long fns16(u16 word, unsigned int n)
> +{
> +	unsigned int w = hweight8((u8)word);
> +
> +	return n < w ? fns8((u8)word, n) : 8 + fns8((u8)(word >> 8), n - w);
> +}
> +
> +static inline unsigned long fns32(u32 word, unsigned int n)
> +{
> +	unsigned int w = hweight16((u16)word);
> +
> +	return n < w ? fns16((u16)word, n) : 16 + fns16((u16)(word >> 16), n - w);
> +}
> +
> +static inline unsigned long fns64(u64 word, unsigned int n)
> +{
> +	unsigned int w = hweight32((u32)word);
> +
> +	return n < w ? fns32((u32)word, n) : 32 + fns32((u32)(word >> 32), n - w);
> +}
> +
>  /**
>   * fns - find N'th set bit in a word
>   * @word: The word to search
> - * @n: Bit to find
> + * @n: Bit to find, counting from 0
> + *
> + * Returns N'th set bit. If no such bit found, returns >= BITS_PER_LONG
>   */
>  static inline unsigned long fns(unsigned long word, unsigned int n)
>  {
> -	while (word && n--)
> -		word &= word - 1;
> -
> -	return word ? __ffs(word) : BITS_PER_LONG;
> +#if BITS_PER_LONG == 64
> +	return fns64(word, n);
> +#else
> +	return fns32(word, n);
> +#endif
>  }
>  
>  /**
> -- 
> 2.40.1
> 


* Re: [PATCH 1/4] lib: make test_bitops compilable into the kernel image
  2024-05-03  2:14     ` Yury Norov
@ 2024-05-03  2:24       ` Kuan-Wei Chiu
  2024-05-03 15:13         ` Yury Norov
  0 siblings, 1 reply; 13+ messages in thread
From: Kuan-Wei Chiu @ 2024-05-03  2:24 UTC (permalink / raw)
  To: Yury Norov; +Cc: linux-kernel, Rasmus Villemoes, Andrew Morton

On Thu, May 02, 2024 at 07:14:10PM -0700, Yury Norov wrote:
> On Fri, May 03, 2024 at 10:00:07AM +0800, Kuan-Wei Chiu wrote:
> > On Thu, May 02, 2024 at 04:32:01PM -0700, Yury Norov wrote:
> > > The test is limited to be compiled as a module. There's no technical
> > > reason for it. Now that the test bears performance benchmark, it would
> > > be reasonable to allow running it at kernel load time, before userspace
> > > starts, to reduce possible jitter.
> > > 
> > > Signed-off-by: Yury Norov <yury.norov@gmail.com>
> > > ---
> > >  lib/Kconfig.debug | 1 -
> > >  1 file changed, 1 deletion(-)
> > > 
> > > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> > > index c63a5fbf1f1c..fc8fe1ea5b49 100644
> > > --- a/lib/Kconfig.debug
> > > +++ b/lib/Kconfig.debug
> > > @@ -2436,7 +2436,6 @@ config TEST_LKM
> > >  
> > >  config TEST_BITOPS
> > >  	tristate "Test module for compilation of bitops operations"
> > > -	depends on m
> > 
> > 
> > Perhaps it would be better to modify the description in the following
> > help section at the same time?
> 
> What exactly you want to change?
>
It seems to me that the entire description is written specifically for
the module. For instance, "doesn't run or load unless explicitly
requested by name. for example: modprobe test_bitops." In my view, this
description is no longer accurate.

Regards,
Kuan-Wei

> > Reviewed-by: Kuan-Wei Chiu <visitorckw@gmail.com>
> > 
> > >  	help
> > >  	  This builds the "test_bitops" module that is much like the
> > >  	  TEST_LKM module except that it does a basic exercise of the
> > > -- 
> > > 2.40.1
> > > 


* Re: [PATCH 1/4] lib: make test_bitops compilable into the kernel image
  2024-05-03  2:24       ` Kuan-Wei Chiu
@ 2024-05-03 15:13         ` Yury Norov
  2024-05-03 15:29           ` Kuan-Wei Chiu
  0 siblings, 1 reply; 13+ messages in thread
From: Yury Norov @ 2024-05-03 15:13 UTC (permalink / raw)
  To: Kuan-Wei Chiu; +Cc: linux-kernel, Rasmus Villemoes, Andrew Morton

On Fri, May 03, 2024 at 10:24:23AM +0800, Kuan-Wei Chiu wrote:
> On Thu, May 02, 2024 at 07:14:10PM -0700, Yury Norov wrote:
> > On Fri, May 03, 2024 at 10:00:07AM +0800, Kuan-Wei Chiu wrote:
> > > On Thu, May 02, 2024 at 04:32:01PM -0700, Yury Norov wrote:
> > > > The test is limited to be compiled as a module. There's no technical
> > > > reason for it. Now that the test bears performance benchmark, it would
> > > > be reasonable to allow running it at kernel load time, before userspace
> > > > starts, to reduce possible jitter.
> > > > 
> > > > Signed-off-by: Yury Norov <yury.norov@gmail.com>
> > > > ---
> > > >  lib/Kconfig.debug | 1 -
> > > >  1 file changed, 1 deletion(-)
> > > > 
> > > > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> > > > index c63a5fbf1f1c..fc8fe1ea5b49 100644
> > > > --- a/lib/Kconfig.debug
> > > > +++ b/lib/Kconfig.debug
> > > > @@ -2436,7 +2436,6 @@ config TEST_LKM
> > > >  
> > > >  config TEST_BITOPS
> > > >  	tristate "Test module for compilation of bitops operations"
> > > > -	depends on m
> > > 
> > > 
> > > Perhaps it would be better to modify the description in the following
> > > help section at the same time?
> > 
> > What exactly you want to change?
> >
> It seems to me that the entire description is written specifically for
> the module. For instance, "doesn't run or load unless explicitly
> requested by name. for example: modprobe test_bitops." In my view, this
> description is no longer accurate.

An in-kernel module is still a module. Everything is the same as for a
.ko, except that it's loaded automatically and earlier for you. To me,
this part of the description is correct.

If you feel it should be reworded - feel free to submit a patch. Now
that we're adding more functionality there, it's probably worth doing.
Not in this series, though.

Thanks,
Yury


* Re: [PATCH 1/4] lib: make test_bitops compilable into the kernel image
  2024-05-03 15:13         ` Yury Norov
@ 2024-05-03 15:29           ` Kuan-Wei Chiu
  0 siblings, 0 replies; 13+ messages in thread
From: Kuan-Wei Chiu @ 2024-05-03 15:29 UTC (permalink / raw)
  To: Yury Norov; +Cc: linux-kernel, Rasmus Villemoes, Andrew Morton

On Fri, May 03, 2024 at 08:13:05AM -0700, Yury Norov wrote:
> On Fri, May 03, 2024 at 10:24:23AM +0800, Kuan-Wei Chiu wrote:
> > On Thu, May 02, 2024 at 07:14:10PM -0700, Yury Norov wrote:
> > > On Fri, May 03, 2024 at 10:00:07AM +0800, Kuan-Wei Chiu wrote:
> > > > On Thu, May 02, 2024 at 04:32:01PM -0700, Yury Norov wrote:
> > > > > The test is limited to be compiled as a module. There's no technical
> > > > > reason for it. Now that the test bears performance benchmark, it would
> > > > > be reasonable to allow running it at kernel load time, before userspace
> > > > > starts, to reduce possible jitter.
> > > > > 
> > > > > Signed-off-by: Yury Norov <yury.norov@gmail.com>
> > > > > ---
> > > > >  lib/Kconfig.debug | 1 -
> > > > >  1 file changed, 1 deletion(-)
> > > > > 
> > > > > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> > > > > index c63a5fbf1f1c..fc8fe1ea5b49 100644
> > > > > --- a/lib/Kconfig.debug
> > > > > +++ b/lib/Kconfig.debug
> > > > > @@ -2436,7 +2436,6 @@ config TEST_LKM
> > > > >  
> > > > >  config TEST_BITOPS
> > > > >  	tristate "Test module for compilation of bitops operations"
> > > > > -	depends on m
> > > > 
> > > > 
> > > > Perhaps it would be better to modify the description in the following
> > > > help section at the same time?
> > > 
> > > What exactly you want to change?
> > >
> > It seems to me that the entire description is written specifically for
> > the module. For instance, "doesn't run or load unless explicitly
> > requested by name. for example: modprobe test_bitops." In my view, this
> > description is no longer accurate.
> 
> In-kernel module is still module. Everything is the same as for .ko,
> except that it's loaded automatically and earlier for you. To me this
> part of the description is correct.
> 
> If you feel it should be reworded - feel free to submit a patch. Now
> that we add more functionality in that, it's probably worth to do. Not
> in this series, though.
> 
> Thanks,
> Yury

Got it, thank you for the explanation! :)

Regards,
Kuan-Wei


* Re: [PATCH 3/4] bitops: squeeze even more out of fns()
  2024-05-03  2:19   ` Kuan-Wei Chiu
@ 2024-05-03 16:13     ` Yury Norov
  2024-05-04  8:53       ` Kuan-Wei Chiu
  0 siblings, 1 reply; 13+ messages in thread
From: Yury Norov @ 2024-05-03 16:13 UTC (permalink / raw)
  To: Kuan-Wei Chiu
  Cc: linux-kernel, Rasmus Villemoes, Andrew Morton, Chin-Chun Chen,
	Ching-Chun Huang

On Fri, May 03, 2024 at 10:19:10AM +0800, Kuan-Wei Chiu wrote:
> +Cc Chin-Chun Chen & Ching-Chun (Jim) Huang
> 
> On Thu, May 02, 2024 at 04:32:03PM -0700, Yury Norov wrote:
> > The function clears N-1 first set bits to find the N'th one with:
> > 
> > 	while (word && n--)
> > 		word &= word - 1;
> > 
> > In the worst case, it would take 63 iterations.
> > 
> > Instead of linear walk through the set bits, we can do a binary search
> > by using hweight(). This would work even better on platforms supporting
> > hardware-assisted hweight() - pretty much every modern arch.
> > 
Chin-Chun once proposed a method similar to binary search combined with
Hamming weight and discussed it privately with me and Jim. However,
Chin-Chun found that binary search can actually impair performance
when n is small. Since we are unsure about the typical range of n in
our actual workloads, we have not yet proposed any relevant patches.
Considering only the overall benchmark results, this patch looks good
to me.

fns() is used only as a helper to find_nth_bit(). 

In the kernel, find_nth_bit() is used in
 - bitmap_bitremap(),
 - bitmap_remap(), and
 - cpumask_local_spread() via sched_numa_find_nth_cpu()

with the bit to search calculated as n = n % cpumask_weight(). This
virtually implies uniformly distributed random n and word, just like
in test_fns().

In rebalance_wq_table() in drivers/crypto/intel/iaa/iaa_crypto_main.c
it's used like:
        
         for (cpu = 0; cpu < nr_cpus_per_node; cpu++) {
                   int node_cpu = cpumask_nth(cpu, node_cpus);
                   ...
         }

This is an API abuse and should be rewritten with for_each_cpu().

In cpumask_any_housekeeping() at arch/x86/kernel/cpu/resctrl/internal.h
it's used like:

        hk_cpu = cpumask_nth_andnot(0, mask, tick_nohz_full_mask);
        if (hk_cpu == exclude_cpu)
                hk_cpu = cpumask_nth_andnot(1, mask, tick_nohz_full_mask);

        if (hk_cpu < nr_cpu_ids)
                cpu = hk_cpu;

And this is another example of API abuse. We need to introduce a new
helper, cpumask_andnot_any_but(), and use it like:

        hk_cpu = cpumask_andnot_any_but(exclude_cpu, mask, tick_nohz_full_mask);
        if (hk_cpu < nr_cpu_ids)
                cpu = hk_cpu;

So, where the use of find_nth_bit() is legitimate, the parameters are
distributed like in the test, and I would expect the real-life
performance impact to be similar to the test.

Optimizing the helper for non-legitimate cases isn't worth the
effort.

Thanks,
Yury


* Re: [PATCH 3/4] bitops: squeeze even more out of fns()
  2024-05-03 16:13     ` Yury Norov
@ 2024-05-04  8:53       ` Kuan-Wei Chiu
  0 siblings, 0 replies; 13+ messages in thread
From: Kuan-Wei Chiu @ 2024-05-04  8:53 UTC (permalink / raw)
  To: Yury Norov
  Cc: linux-kernel, Rasmus Villemoes, Andrew Morton, Chin-Chun Chen,
	Ching-Chun Huang

On Fri, May 03, 2024 at 09:13:32AM -0700, Yury Norov wrote:
> On Fri, May 03, 2024 at 10:19:10AM +0800, Kuan-Wei Chiu wrote:
> > +Cc Chin-Chun Chen & Ching-Chun (Jim) Huang
> > 
> > On Thu, May 02, 2024 at 04:32:03PM -0700, Yury Norov wrote:
> > > The function clears N-1 first set bits to find the N'th one with:
> > > 
> > > 	while (word && n--)
> > > 		word &= word - 1;
> > > 
> > > In the worst case, it would take 63 iterations.
> > > 
> > > Instead of linear walk through the set bits, we can do a binary search
> > > by using hweight(). This would work even better on platforms supporting
> > > hardware-assisted hweight() - pretty much every modern arch.
> > > 
> > Chin-Chun once proposed a method similar to binary search combined with
> > hamming weight and discussed it privately with me and Jim. However,
> > Chin-Chun found that binary search would actually impair performance
> > when n is small. Since we are unsure about the typical range of n in
> > our actual workload, we have not yet proposed any relevant patches. If
> > considering only the overall benchmark results, this patch looks good
> > to me.
> 
> fns() is used only as a helper to find_nth_bit(). 
> 
> In the kernel the find_nth_bit() is used in
>  - bitmap_bitremap((),
>  - bitmap_remap(), and
>  - cpumask_local_spread() via sched_numa_find_nth_cpu()
> 
> with the bit to search calculated as n = n % cpumask_weigth(). This
> virtually implies random uniformly distributed n and word, just like
> in the test_fns().
> 
> In rebalance_wq_table() in drivers/crypto/intel/iaa/iaa_crypto_main.c
> it's used like:
>         
>          for (cpu = 0; cpu < nr_cpus_per_node; cpu++) {
>                    int node_cpu = cpumask_nth(cpu, node_cpus);
>                    ...
>          }
> 
> This is an API abuse, and should be rewritten with for_each_cpu()
> 
> In cpumask_any_housekeeping() at arch/x86/kernel/cpu/resctrl/internal.h
> it's used like:
> 
>  90         hk_cpu = cpumask_nth_andnot(0, mask, tick_nohz_full_mask);
>  91         if (hk_cpu == exclude_cpu)
>  92                 hk_cpu = cpumask_nth_andnot(1, mask, tick_nohz_full_mask);
>  93 
>  94         if (hk_cpu < nr_cpu_ids)
>  95                 cpu = hk_cpu;
> 
> And this is another example of the API abuse. We need to introduce a new
> helper cpumask_andnot_any_but() and use it like:
> 
>         hk_cpu = cpumask_andnot_any_but(exclude_cpu, mask, tick_nohz_full_mask).
>         if (hk_cpu < nr_cpu_ids)
>                  cpu = hk_cpu;
> 
> So, where the use of find_nth_bit() is legitimate, the parameters are
> distributed like in the test, and I would expect the real-life
> performance impact to be similar to the test.
> 
> Optimizing the helper for non-legitimate cases doesn't worth the
> effort.
>
Got it, thank you for your detailed explanation :)

Reviewed-by: Kuan-Wei Chiu <visitorckw@gmail.com>

Regards,
Kuan-Wei


