All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] perf/x86/amd/power: Use div64_u64 onstead of do_div()
@ 2024-04-28 16:40 Jules Irenge
  2024-05-02 16:25 ` David Laight
  0 siblings, 1 reply; 12+ messages in thread
From: Jules Irenge @ 2024-04-28 16:40 UTC (permalink / raw)
  To: mark.rutland
  Cc: alexander.shishkin, jolsa, irogers, adrian.hunter, bp,
	dave.hansen, x86, hpa, linux-perf-users, linux-kernel

do_div() truncates a u64 divisor to 32 bit.
This can lead to non-zero being truncated to zero for division.

Fix coccinelle warning
WARNING: do_div() does a 64-by-32 division, please consider using div64_u64 instead

Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
---
 arch/x86/events/amd/power.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/events/amd/power.c b/arch/x86/events/amd/power.c
index 37d5b380516e..ff003c1a645b 100644
--- a/arch/x86/events/amd/power.c
+++ b/arch/x86/events/amd/power.c
@@ -64,7 +64,7 @@ static void event_update(struct perf_event *event)
 	delta *= cpu_pwr_sample_ratio * 1000;
 	tdelta = new_ptsc - prev_ptsc;
 
-	do_div(delta, tdelta);
+	div64_u64(delta, tdelta);
 	local64_add(delta, &event->count);
 }
 
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* RE: [PATCH] perf/x86/amd/power: Use div64_u64 onstead of do_div()
  2024-04-28 16:40 [PATCH] perf/x86/amd/power: Use div64_u64 onstead of do_div() Jules Irenge
@ 2024-05-02 16:25 ` David Laight
  2024-05-02 16:34   ` Jules Bashizi Irenge
  0 siblings, 1 reply; 12+ messages in thread
From: David Laight @ 2024-05-02 16:25 UTC (permalink / raw)
  To: 'Jules Irenge', mark.rutland
  Cc: alexander.shishkin, jolsa, irogers, adrian.hunter, bp,
	dave.hansen, x86, hpa, linux-perf-users, linux-kernel

From: Jules Irenge <jbi.octave@gmail.com>
> Sent: 28 April 2024 17:40
> 
> do_div() truncates a u64 divisor to 32 bit.
> This can lead to non-zero being truncated to zero for division.
> 
> Fix coccinelle warning
> WARNING: do_div() does a 64-by-32 division, please consider using div64_u64 instead
> 
> Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
> ---
>  arch/x86/events/amd/power.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/events/amd/power.c b/arch/x86/events/amd/power.c
> index 37d5b380516e..ff003c1a645b 100644
> --- a/arch/x86/events/amd/power.c
> +++ b/arch/x86/events/amd/power.c
> @@ -64,7 +64,7 @@ static void event_update(struct perf_event *event)
>  	delta *= cpu_pwr_sample_ratio * 1000;
>  	tdelta = new_ptsc - prev_ptsc;
> 
> -	do_div(delta, tdelta);
> +	div64_u64(delta, tdelta);

Nak - you've not tested it.

	David

>  	local64_add(delta, &event->count);
>  }
> 
> --
> 2.43.2
> 

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] perf/x86/amd/power: Use div64_u64 onstead of do_div()
  2024-05-02 16:25 ` David Laight
@ 2024-05-02 16:34   ` Jules Bashizi Irenge
  2024-05-02 22:18     ` David Laight
  0 siblings, 1 reply; 12+ messages in thread
From: Jules Bashizi Irenge @ 2024-05-02 16:34 UTC (permalink / raw)
  To: David Laight; +Cc: linux-perf-users

On Thu, May 02, 2024 at 04:25:58PM +0000, David Laight wrote:
> From: Jules Irenge <jbi.octave@gmail.com>
> > Sent: 28 April 2024 17:40
> > 
> > do_div() truncates a u64 divisor to 32 bit.
> > This can lead to non-zero being truncated to zero for division.
> > 
> > Fix coccinelle warning
> > WARNING: do_div() does a 64-by-32 division, please consider using div64_u64 instead
> > 
> > Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
> > ---
> >  arch/x86/events/amd/power.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/events/amd/power.c b/arch/x86/events/amd/power.c
> > index 37d5b380516e..ff003c1a645b 100644
> > --- a/arch/x86/events/amd/power.c
> > +++ b/arch/x86/events/amd/power.c
> > @@ -64,7 +64,7 @@ static void event_update(struct perf_event *event)
> >  	delta *= cpu_pwr_sample_ratio * 1000;
> >  	tdelta = new_ptsc - prev_ptsc;
> > 
> > -	do_div(delta, tdelta);
> > +	div64_u64(delta, tdelta);
> 
> Nak - you've not tested it.
> 
> 	David
> 
No, I later realised it's for 32bit here. It may be very expensive for
a 32bit.
I think I got it wrong here.

Thanks for reply.
Jules

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] perf/x86/amd/power: Use div64_u64 onstead of do_div()
  2024-05-02 16:34   ` Jules Bashizi Irenge
@ 2024-05-02 22:18     ` David Laight
  2024-05-02 22:59       ` Linus Torvalds
  0 siblings, 1 reply; 12+ messages in thread
From: David Laight @ 2024-05-02 22:18 UTC (permalink / raw)
  To: 'Jules Bashizi Irenge'; +Cc: linux-perf-users, 'Linus Torvalds'

> > > -	do_div(delta, tdelta);
> > > +	div64_u64(delta, tdelta);
> >
> > Nak - you've not tested it.
> >
> > 	David
> >
> No, I later realised it's for 32bit here. It may be very expensive for
> a 32bit.
> I think I got it wrong here.

You've missed the main issue.
do_div(a, b) sets 'a = a / b' and returns 'a % b'.
div64_u64(a, b) returns 'a / b'.
They are not direct substitutes.

For x86 (32 bit - does anyone care are more??) do_div is defined as:

/*
 * do_div() is NOT a C function. It wants to return
 * two values (the quotient and the remainder), but
 * since that doesn't work very well in C, what it
 * does is:
 *
 * - modifies the 64-bit dividend _in_place_
 * - returns the 32-bit remainder
 *
 * This ends up being the most efficient "calling
 * convention" on x86.
 */
#define do_div(n, base)						\
({								\
	unsigned long __upper, __low, __high, __mod, __base;	\
	__base = (base);					\
	if (__builtin_constant_p(__base) && is_power_of_2(__base)) { \
		__mod = n & (__base - 1);			\
		n >>= ilog2(__base);				\
	} else {						\
		asm("" : "=a" (__low), "=d" (__high) : "A" (n));\
		__upper = __high;				\
		if (__high) {					\
			__upper = __high % (__base);		\
			__high = __high / (__base);		\
		}						\
		asm("divl %2" : "=a" (__low), "=d" (__mod)	\
			: "rm" (__base), "0" (__low), "1" (__upper));	\
		asm("" : "=A" (n) : "a" (__low), "d" (__high));	\
	}							\
	__mod;							\
})

Somewhen it must have got changed from a simple asm wrapper for divl.
(And the test in the middle should be '__high > __base' to avoid
a division that is guaranteed to return zero.)

I'm also not at all sure what happens when divide gets speculatively
executed - there is a fair chance it has to complete.
So that extra check for returning 64bit quotients may be move
expensive that you might think.
(Although x86 will fault if the quotient would overflow - not ideal.)

I do wonder how much more it would cost to check the high 32bits
of a 64bit 'base' for being zero in a full 64x64 divide?
Any 'full software' 64bit divide function is going to do that anyway.

For 32bit it should be possible to do an inline 32x32 divide if
both 64bit values are only 32bit and an out of line call otherwise.
Although that might need to be built into gcc (at least for
architectures where the modulus isn't free) since gcc detects
when code does a/b and a%b and will only do one divide.

So maybe just supporting 64bit divide wouldn't be that expensive??

(Although there is a fubar on risc-v where even 'q = a / b'
'r = a - q * b' gets converted to 'r = a % b' and two slow
divides.)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] perf/x86/amd/power: Use div64_u64 onstead of do_div()
  2024-05-02 22:18     ` David Laight
@ 2024-05-02 22:59       ` Linus Torvalds
  2024-05-04 22:53         ` David Laight
  0 siblings, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2024-05-02 22:59 UTC (permalink / raw)
  To: David Laight; +Cc: Jules Bashizi Irenge, linux-perf-users

On Thu, 2 May 2024 at 15:19, David Laight <David.Laight@aculab.com> wrote:
>
> Somewhen it must have got changed from a simple asm wrapper for divl.

It's never been a simple wrapper for divl. For the simple reason that
'divl' doesn't do a full 64/32 divide returning a full 64-bit result.

'divl' takes a 64-bit input, but only works if the whole result *fits*
in 32 bits.

If it doesn't, you need to do two divl instructions (the first to
reduce the range).

I think the only real change to do_div() has been to make it notice
the constant power-of-two divides and turning them into shifts/masks.
I'm not sure when that was done, and too lazy to check.

> So maybe just supporting 64bit divide wouldn't be that expensive??

A 64/64 divide is *much* more expensive than a 74/32 divide that does
(potentially) two 'divl' instructions to get a 64/32 division.

Go look at lib/math/div64.c and the mess that generates. We're talking
bit scanning instructions, several multiples and 64-bit additions and
subtractions, some shifts, etc etc. And that's in *addition* to the
two divides. It's no longer inlinable.

And yes, none of that is relevant on modern hardware, but the
corollary to that is that on the old hardware where this *is*
relevant, even the multiplies were 10+ cycles, and the bit scan
reverse (that the full 64-bit divide thing uses) was 20-70 cycles.

So a full 64/64 divide is cheap these days on 64-bit architectures,
but is really bad on those old CPUs.

In fact, the reason do_div() exists in the first place is that gcc has
historically not understood that (maybe it does now - didn't check),
and would force the full 64/64 divide.

This is in contrast to the 32/32 divide that gcc does fine, which is
why you have that

                        __upper = __high % (__base);            \
                        __high = __high / (__base);             \

which gcc will just turn into a single divl on its own.

That said, I agree that the "if (__high)" could always have been "if
(__high >= __base)".

It probably doesn't matter all that much, but div_u64_rem() randomly
gets that right (same function conceptually, different calling
conventions, different way to split the 64 bit value into two 32-bit
ones).

And if you don't care about the remainder, just use "div_u64()" which
has even simpler calling convention (on 32-bit x86 it's the same code
as div_u64_rem, the remainder just gets thrown away).

             Linus

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] perf/x86/amd/power: Use div64_u64 onstead of do_div()
  2024-05-02 22:59       ` Linus Torvalds
@ 2024-05-04 22:53         ` David Laight
  2024-05-05 19:38           ` Linus Torvalds
  0 siblings, 1 reply; 12+ messages in thread
From: David Laight @ 2024-05-04 22:53 UTC (permalink / raw)
  To: 'Linus Torvalds'; +Cc: Jules Bashizi Irenge, linux-perf-users

From: Linus Torvalds
> Sent: 03 May 2024 00:00
> 
> On Thu, 2 May 2024 at 15:19, David Laight <David.Laight@aculab.com> wrote:
> >
> > Somewhen it must have got changed from a simple asm wrapper for divl.
> 
> It's never been a simple wrapper for divl. For the simple reason that
> 'divl' doesn't do a full 64/32 divide returning a full 64-bit result.

I'd thought (before I'd looked) that it only returned a 32bit result.
After all that is what a lot of code wants.

I was thinking of something like (see https://godbolt.org/z/h8oj7h4sd):

#define divu(a, b) ({ \
    u64 _a = a, _b = b, _q; \
    u32 _alo = _a, _blo = _b; \
    u32 _ahi = _a >> 32, _bhi = _b >> 32; \
    if (_bhi) { \
        _q = div64(_a, _b); \
    } else { \
        u32 _qhi = 0; \
        if (_ahi >= _blo) { \
            _qhi = _ahi / _blo; \
            _ahi = _ahi % _blo; \
        } \
        asm( "divl %2" : "+a" (_alo), "+d" (_ahi) : "r" (_blo)); \
        asm( "" : "=A" (_q) : "a" (_alo), "d" (_qhi)); \
    } \
    _q; \
})

Which will call out for a large divisor, but will do inlined divides
otherwise - so is reasonably sane whatever the types (even 32 by 32).
Modern gcc/clang also tracks values (to some extent) so will detect
_ahi and _bhi being compile-time zero even if the types are 64bit.

I've ignored the remainder for clarity.
But I'd add a 'u64 *rem' parameter and do a compile-time NULL check.

Fortunately the kernel doesn't let gcc use the xmm regsiters.
It has a nasty habit of bloating the code by trying to use them
for 64bit integers.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] perf/x86/amd/power: Use div64_u64 onstead of do_div()
  2024-05-04 22:53         ` David Laight
@ 2024-05-05 19:38           ` Linus Torvalds
  2024-05-06 16:39             ` David Laight
  0 siblings, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2024-05-05 19:38 UTC (permalink / raw)
  To: David Laight; +Cc: Jules Bashizi Irenge, linux-perf-users

On Sat, 4 May 2024 at 15:53, David Laight <David.Laight@aculab.com> wrote:
>
> I was thinking of something like (see https://godbolt.org/z/h8oj7h4sd):
>  [...]
> Which will call out for a large divisor, but will do inlined divides
> otherwise - so is reasonably sane whatever the types

No. This is not sane.

Once you have a callout, you might as well do it all entirely out of
line. You've messed up your register state and instruction scheduling
opportunities, and the advantage of inlining part of the computation
is basically almost entirely gone.

And the thing is, if you do a 64-by-64 divide, then you just do it.
Gcc will create that disgusting out-of-line code for you.

BUT IT SUCKS.

We don 't do stupid things in the kernel. We basically never need the
64-by-64 divide. The reason we have "do_div()" and friends is EXATCLY
THAT - to catch the "oh, you did something stuipid, just fix it".

Yes, we do divide 64-bit entities even on 32-bit hosts, but 99% of the
time the range of the divisor is basically always limited, and gcc
sadly does not understand that that case matters.

So stop arguing for 64-by-64 divides. They are so rare as to not
exist, and WE DO NOT WANT TO POLLUTE THE NORMAL CASE WITH THAT UNUSUAL
SPECIAL CASE.

No, the do_div() API isn't wonderful, and it comes from the fact that
it basically uses the "inline asm" API model, which is not a function
call, but an in-out parameter one.  That API was a mistake, but
whatever.

But importantly, the mistake was *NOT* that it doesn't handle a full
64-by-64 divide. Not 30 years ago, and not today.

                     Linus

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] perf/x86/amd/power: Use div64_u64 onstead of do_div()
  2024-05-05 19:38           ` Linus Torvalds
@ 2024-05-06 16:39             ` David Laight
  2024-05-06 16:45               ` Linus Torvalds
  0 siblings, 1 reply; 12+ messages in thread
From: David Laight @ 2024-05-06 16:39 UTC (permalink / raw)
  To: 'Linus Torvalds'; +Cc: Jules Bashizi Irenge, linux-perf-users

From: Linus Torvalds
> Sent: 05 May 2024 20:39
> 
> On Sat, 4 May 2024 at 15:53, David Laight <David.Laight@aculab.com> wrote:
> >
> > I was thinking of something like (see https://godbolt.org/z/h8oj7h4sd):
> >  [...]
> > Which will call out for a large divisor, but will do inlined divides
> > otherwise - so is reasonably sane whatever the types
> 
> No. This is not sane.
> 
> Once you have a callout, you might as well do it all entirely out of
> line. You've messed up your register state and instruction scheduling
> opportunities, and the advantage of inlining part of the computation
> is basically almost entirely gone.
> 
> And the thing is, if you do a 64-by-64 divide, then you just do it.
> Gcc will create that disgusting out-of-line code for you.
> 
> BUT IT SUCKS.
> 
> We don 't do stupid things in the kernel. We basically never need the
> 64-by-64 divide.

There are 381 references to div64_u64() some of them are just plain stupid.
Constant divisors of 100 and 1000 and div64_u64(x_ns * 100, y_us * 1000).
Have a look :-)

> The reason we have "do_div()" and friends is EXATCLY
> THAT - to catch the "oh, you did something stuipid, just fix it".
> 
> Yes, we do divide 64-bit entities even on 32-bit hosts, but 99% of the
> time the range of the divisor is basically always limited, and gcc
> sadly does not understand that that case matters.

The problem is that code does do_div(u64, unsigned long) (which is fine).
But someone wrote a script that complains that is a full 64bit divide
on 64bit systems - so suggests div64_u64().
They someone thinks they can fix an easy 'bug' by doing the substitution.
Except the code was fine (due to the domain of the divisor) and the
modified code is entirely broken due the different calling conventions.
(Never mind that it is likely slow on 32bit.)

div64_u64() really needs a 'must check' attribute.

So the idea was to inline a 64x32 divide and do a call for a 64x64 one.
But include an optimisation that avoided the call for a 64x64 divide
if (at run time) the divisor was only 32bit.
(How much does a 'ret' cost these days with all the mitigations?)

Adding !__builtin_constant_p(_bhi) to the check for the call removes
the run-time check - but will still inline u64/u64 if the compiler
knows the divisor is small.

Given people won't use the right function, optimising div64_u64()
for divisors that are known at compile time to be 32bits has
to make some sense.

I'm trying to think how to do a compile time NULL test for the
'pointer to remainder' for an inline function.
Maybe if (!__builtin_const_p(r) || (r)) *(r) = _rem; will DTRT.
Then div64_u64(a, b) can be div64_u32_rem(a, b, NULL) without
any inlined version containing an actual NULL check.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] perf/x86/amd/power: Use div64_u64 onstead of do_div()
  2024-05-06 16:39             ` David Laight
@ 2024-05-06 16:45               ` Linus Torvalds
  2024-05-06 17:26                 ` Linus Torvalds
  2024-05-06 17:43                 ` David Laight
  0 siblings, 2 replies; 12+ messages in thread
From: Linus Torvalds @ 2024-05-06 16:45 UTC (permalink / raw)
  To: David Laight; +Cc: Jules Bashizi Irenge, linux-perf-users

On Mon, 6 May 2024 at 09:40, David Laight <David.Laight@aculab.com> wrote:
> >
> > We don 't do stupid things in the kernel. We basically never need the
> > 64-by-64 divide.
>
> There are 381 references to div64_u64() some of them are just plain stupid.
> Constant divisors of 100 and 1000 and div64_u64(x_ns * 100, y_us * 1000).

Sadly, some peopel will get the compile error for "you can't do that"
because they did a regular "a = b / c" and gcc created a mess, and
then they mindlessly just replace it with div64_u64".

> But someone wrote a script that complains that is a full 64bit divide
> on 64bit systems - so suggests div64_u64().

Yeah, that's well-intentioned, but it screws up.

The reason is that "do_div()" _always_ does a 64-by-32 divide, and
then if you have a divisor that is "long" (which is indeed 32 bit on
32-bit architectures), now on 64-bit architectures it truncates what
can be a 64-bit value.

So we probably _should_ have made "do_div()" be a "64-by-long" divide.

On actual 64-bit architectures, doing a 64-bti divide is typically
fine (I can imagine some broken cases, but not enough to care). The
whole problem is because of 32-bit architectures.

              Linus

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] perf/x86/amd/power: Use div64_u64 onstead of do_div()
  2024-05-06 16:45               ` Linus Torvalds
@ 2024-05-06 17:26                 ` Linus Torvalds
  2024-05-06 20:25                   ` David Laight
  2024-05-06 17:43                 ` David Laight
  1 sibling, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2024-05-06 17:26 UTC (permalink / raw)
  To: David Laight; +Cc: Jules Bashizi Irenge, linux-perf-users

On Mon, 6 May 2024 at 09:45, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So we probably _should_ have made "do_div()" be a "64-by-long" divide.

Actually, some of those conversions are just wrong and stupid, and
didn't have 'long' divisors at all.

And we do have "div64_ul()", although it's not used as widely as it could be.

But I think we can do the reverse, and just teach "div64_u64()" to do
what gcc should always have done, and look at the divisor range at
compile-time.

Something like this, perhaps:

  /*
   * Do a 64-by-64 divide using div64_ul / div64_long / div64_u64
   * depending on type of divisor.
   */
  #define div64_u64_type(a,b) _Generic((b),     \
        unsigned long: div64_ul(a,b),           \
        long: div64_long(a,b),                  \
        int: div64_long(a,b),                   \
        default: div64_u64(a,b))

  /* Is it zero without evaluating it? */
  #define is_constant_zero(x) \
        (__builtin_constant_p(x) && !(x))

  /* This catches constants and compile-time ranges */
  #define is_32_bit(x) \
        is_constant_zero((x)>>16>>16)

  /*
   * Pick the right division helper based on
   * value or type of the divisor
   */
  #define div64_u64(a,b) \
        (is_32_bit(b) ? do_div64(a,b) : div64_u64_type(a,b))

should now make div64_u64() do the right thing and avoid the 64-bit
divide for relevant types (the signed type case makes it a bit less
obvious, but whatever).

               Linus

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] perf/x86/amd/power: Use div64_u64 onstead of do_div()
  2024-05-06 16:45               ` Linus Torvalds
  2024-05-06 17:26                 ` Linus Torvalds
@ 2024-05-06 17:43                 ` David Laight
  1 sibling, 0 replies; 12+ messages in thread
From: David Laight @ 2024-05-06 17:43 UTC (permalink / raw)
  To: 'Linus Torvalds'; +Cc: Jules Bashizi Irenge, linux-perf-users

From: Linus Torvalds
> Sent: 06 May 2024 17:46
> 
> On Mon, 6 May 2024 at 09:40, David Laight <David.Laight@aculab.com> wrote:
> > >
> > > We don 't do stupid things in the kernel. We basically never need the
> > > 64-by-64 divide.
> >
> > There are 381 references to div64_u64() some of them are just plain stupid.
> > Constant divisors of 100 and 1000 and div64_u64(x_ns * 100, y_us * 1000).
> 
> Sadly, some peopel will get the compile error for "you can't do that"
> because they did a regular "a = b / c" and gcc created a mess, and
> then they mindlessly just replace it with div64_u64".

A bit like the mindless min_t() - not hard to find:
	len = min_t(unsigned int, len, sizeof (int));
	if (len < 0)
		return -ERANGE;

> > But someone wrote a script that complains that is a full 64bit divide
> > on 64bit systems - so suggests div64_u64().
> 
> Yeah, that's well-intentioned, but it screws up.

The suggestion should be to double check the domain of the divisor,
and then change the type to u32.

> The reason is that "do_div()" _always_ does a 64-by-32 divide, and
> then if you have a divisor that is "long" (which is indeed 32 bit on
> 32-bit architectures), now on 64-bit architectures it truncates what
> can be a 64-bit value.
> 
> So we probably _should_ have made "do_div()" be a "64-by-long" divide.

Or got through the kernel and change lots of the 'long' to 'u32' :-)
I'm sure a lot are there because people were scared int might be 16bit.
Perhaps M$ got it right by leaving 'long' as 32bit :-)
But really what was the last 16bit Unix?
I never used ICL's 286 unix port - that might have been 16bit.
All the 386 ones were 32bit, as was the VAX (never used one).
So we're looking at 1970's systems like the AT&T 3B2 and DEC pdp-11.
Certainly far before you'd even though of Linux.

> On actual 64-bit architectures, doing a 64-bti divide is typically
> fine (I can imagine some broken cases, but not enough to care).

Well Intel x86 always take twice as long for the full 128 by 64 divide
than the 64 by 32 one (amd is the same speed if the values fit).
So selecting between the two divides may make sense.
OTOH you probably don't want to speculatively execute the wrong divide! 

> The whole problem is because of 32-bit architectures.

Indeed.
Both x86 and m68k have a 64 by 32 divide. I'm not whether any other do.
Nios-II wont (it may not even have divide), probably all MIPS derivatives
are the same (including RISC-V).
The suggested divide function in my sparc-32 book is several pages long.
(Quite what does the to I-cache is any-bodies guess.)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH] perf/x86/amd/power: Use div64_u64 onstead of do_div()
  2024-05-06 17:26                 ` Linus Torvalds
@ 2024-05-06 20:25                   ` David Laight
  0 siblings, 0 replies; 12+ messages in thread
From: David Laight @ 2024-05-06 20:25 UTC (permalink / raw)
  To: 'Linus Torvalds'; +Cc: Jules Bashizi Irenge, linux-perf-users

From: Linus Torvalds
> Sent: 06 May 2024 18:27
> 
> On Mon, 6 May 2024 at 09:45, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > So we probably _should_ have made "do_div()" be a "64-by-long" divide.
> 
> Actually, some of those conversions are just wrong and stupid, and
> didn't have 'long' divisors at all.
> 
> And we do have "div64_ul()", although it's not used as widely as it could be.
> 
> But I think we can do the reverse, and just teach "div64_u64()" to do
> what gcc should always have done, and look at the divisor range at
> compile-time.
> 
> Something like this, perhaps:
> 
>   /*
>    * Do a 64-by-64 divide using div64_ul / div64_long / div64_u64
>    * depending on type of divisor.
>    */
>   #define div64_u64_type(a,b) _Generic((b),     \
>         unsigned long: div64_ul(a,b),           \
>         long: div64_long(a,b),                  \
>         int: div64_long(a,b),                   \
>         default: div64_u64(a,b))
> 
>   /* Is it zero without evaluating it? */
>   #define is_constant_zero(x) \
>         (__builtin_constant_p(x) && !(x))
> 
>   /* This catches constants and compile-time ranges */
>   #define is_32_bit(x) \
>         is_constant_zero((x)>>16>>16)
> 
>   /*
>    * Pick the right division helper based on
>    * value or type of the divisor
>    */
>   #define div64_u64(a,b) \
>         (is_32_bit(b) ? do_div64(a,b) : div64_u64_type(a,b))
> 
> should now make div64_u64() do the right thing and avoid the 64-bit
> divide for relevant types (the signed type case makes it a bit less
> obvious, but whatever).

I don't think there are any signed divides to worry about.
Signed divides are also too painful to sort out.
You have to worry about all 4 quadrants.

Which is pretty much exactly what I proposed :-)
(apart from the run-time check to avoid calling div64_u64().)
By the time you've done the is_32_bit(b) you've only got
64 bit values left (well and 128 but...), they might be 'long'
or 'long long' on 64bit but that doesn't matter.

I'm not sure of the best way to return a quotient and remainder
on a 32bit system.
The 'user' api is best as:
	quotient = divu_xxx(dividend, divisor, &quotient);
where divisor and quotient are the same type and quotient
might be NULL - and might be supplied by a wrapper.

Perhaps something like (untested):
#define div64_u64_rem(a, b, rem) ({ \
	typeof (b) _q; \
	if (is_32_bit(b)) { \
		u64 _q_r = _div_u64_u32_rem(a, b);
		if (!is_constant_zero(rem))
			*rem = _q_r >> 32;
		_q = _q_r & 0xffffffff;
	} else { \
		_q = _divu64_u64_rem(a, b, rem); \
	} \
	_q; \
})

Then x86 would have a #define/inline for _div_u32_rem().
I suspect the '*rem' needs to be *(typeof(&(b))rem
and something like (0 ? &b : rem) added somewhere.
Perhaps *(typeof(0 ? &b : rem))(rem) DTRT ?

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2024-05-06 20:26 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-28 16:40 [PATCH] perf/x86/amd/power: Use div64_u64 onstead of do_div() Jules Irenge
2024-05-02 16:25 ` David Laight
2024-05-02 16:34   ` Jules Bashizi Irenge
2024-05-02 22:18     ` David Laight
2024-05-02 22:59       ` Linus Torvalds
2024-05-04 22:53         ` David Laight
2024-05-05 19:38           ` Linus Torvalds
2024-05-06 16:39             ` David Laight
2024-05-06 16:45               ` Linus Torvalds
2024-05-06 17:26                 ` Linus Torvalds
2024-05-06 20:25                   ` David Laight
2024-05-06 17:43                 ` David Laight

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.