[RFC] kernel.h: Add generic roundup_64() macro

Message ID 20190523100013.52a8d2a6@gandalf.local.home (mailing list archive)
State New, archived

Commit Message

Steven Rostedt May 23, 2019, 2 p.m. UTC
From: Steven Rostedt (VMware) <rostedt@goodmis.org>

In discussing a build failure on x86_32 due to the use of roundup() on
a 64 bit number, I realized that there's no generic equivalent
roundup_64(). It is implemented in two separate places in the kernel,
but there really should be just one that all can use.

Although the other implementations are static inline functions, this
implementation is a macro, which allows typeof(x) to denote the type
being used. On a 64 bit machine, the roundup_64() macro simply falls
back to roundup(). On 32 bit machines, it uses the do_div() based
version, which avoids the problems of dividing a 64 bit number on a
32 bit machine.
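
As a rough user-space illustration of the arithmetic described above (not
the kernel macro itself), the sketch below rounds a 64 bit value up to a
multiple of a 32 bit divisor. The name roundup_64_sketch() is hypothetical,
and plain C division stands in for do_div(), which is only available inside
the kernel.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical user-space stand-in for the proposed roundup_64().
 * The kernel version calls do_div(); plain '/' is used here only so
 * that the sketch compiles outside the kernel.
 */
static inline uint64_t roundup_64_sketch(uint64_t x, uint32_t y)
{
	assert(y != 0);	/* a zero divisor is a caller bug */
	x += y - 1;	/* bias so that any remainder rounds upward */
	x /= y;		/* do_div(x, y) would leave the quotient in x */
	return x * y;	/* scale back to a multiple of y */
}
```

For example, roundup_64_sketch(10, 4) yields 12, and a value that is
already a multiple of y is returned unchanged.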

Link: http://lkml.kernel.org/r/20190522145450.25ff483d@gandalf.local.home

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---

Comments

Linus Torvalds May 23, 2019, 3:10 p.m. UTC | #1
On Thu, May 23, 2019 at 7:00 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> +# define roundup_64(x, y) (                            \
> +{                                                      \
> +       typeof(y) __y = y;                              \
> +       typeof(x) __x = (x) + (__y - 1);                \
> +       do_div(__x, __y);                               \
> +       __x * __y;                                      \
> +}                                                      \

The thing about this is that it absolutely sucks for power-of-two arguments.

The regular roundup() that uses division has the compiler at least
optimize them to shifts - at least for constant cases. But do_div() is
meant for "we already know it's not a power of two", and the compiler
doesn't have any understanding of the internals.

And it looks to me like the use case you want this for is very much
probably a power of two. In which case division is all kinds of just
stupid.

And we already have a power-of-two round up function that works on
u64. It's called "round_up()".

I wish we had a better visual warning about the differences between
"round_up()" (limited to powers-of-two, but efficient, and works with
any size) and "roundup()" (generic, potentially horribly slow, and
doesn't work for 64-bit on 32-bit).

Side note: "round_up()" has the problem that it uses "x" twice.
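
To make the round_up() versus roundup() distinction concrete, here is a
minimal user-space sketch of the two strategies. The function names are
hypothetical; the mask form follows what I understand the kernel's
round_up() arithmetic to be, and the division form follows roundup().
Because these are functions rather than macros, each argument is evaluated
exactly once.

```c
#include <assert.h>
#include <stdint.h>

/* Mask-based rounding: only valid when y is a power of two. */
static inline uint64_t round_up_pow2(uint64_t x, uint64_t y)
{
	assert(y != 0 && (y & (y - 1)) == 0);
	return ((x - 1) | (y - 1)) + 1;
}

/* Division-based rounding: works for any non-zero y, but needs a divide. */
static inline uint64_t roundup_div(uint64_t x, uint64_t y)
{
	assert(y != 0);
	return ((x + (y - 1)) / y) * y;
}
```

Both return 16 for (13, 8), but only roundup_div() can handle a divisor
such as 6.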

End result: somebody should look at this, but I really don't like the
"force division" case that is likely horribly slow and nasty.

                  Linus
Steven Rostedt May 23, 2019, 3:27 p.m. UTC | #2
On Thu, 23 May 2019 08:10:44 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Thu, May 23, 2019 at 7:00 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > +# define roundup_64(x, y) (                            \
> > +{                                                      \
> > +       typeof(y) __y = y;                              \
> > +       typeof(x) __x = (x) + (__y - 1);                \
> > +       do_div(__x, __y);                               \
> > +       __x * __y;                                      \
> > +}                                                      \
> 
> The thing about this is that it absolutely sucks for power-of-two arguments.
> 
> The regular roundup() that uses division has the compiler at least
> optimize them to shifts - at least for constant cases. But do_div() is
> meant for "we already know it's not a power of two", and the compiler
> doesn't have any understanding of the internals.
> 
> And it looks to me like the use case you want this for is very much
> probably a power of two. In which case division is all kinds of just
> stupid.
> 
> And we already have a power-of-two round up function that works on
> u64. It's called "round_up()".
> 
> I wish we had a better visual warning about the differences between
> "round_up()" (limited to powers-of-two, but efficient, and works with
> any size) and "roundup()" (generic, potentially horribly slow, and
> doesn't work for 64-bit on 32-bit).
> 
> Side note: "round_up()" has the problem that it uses "x" twice.
> 
> End result: somebody should look at this, but I really don't like the
> "force division" case that is likely horribly slow and nasty.

I haven't yet tested this, but what about something like the following:

# define roundup_64(x, y) (				\
{							\
	typeof(y) __y;					\
	typeof(x) __x;					\
							\
	if (__builtin_constant_p(y) &&			\
	    !(y & (y >> 1))) {				\
		__x = round_up(x, y);			\
	} else {					\
		__y = y;				\
		__x = (x) + (__y - 1);			\
		do_div(__x, __y);			\
		__x = __x * __y;			\
	}						\
	__x;						\
}							\
)

If the compiler knows enough that y is a power of two, it will use the
shift version. Otherwise, it doesn't know enough and would divide
regardless. Or perhaps forget about the constant check, and just force
the power of two check:

# define roundup_64(x, y) (				\
{							\
	typeof(y) __y = y;				\
	typeof(x) __x;					\
							\
	if (!(__y & (__y >> 1))) {			\
		__x = round_up(x, y);			\
	} else {					\
		__x = (x) + (__y - 1);			\
		do_div(__x, __y);			\
		__x = __x * __y;			\
	}						\
	__x;						\
}							\
)

This way even if the compiler doesn't know that this is a power of two,
it will still do the shift if y ends up being one.

-- Steve
Linus Torvalds May 23, 2019, 4:51 p.m. UTC | #3
On Thu, May 23, 2019 at 8:27 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> I haven't yet tested this, but what about something like the following:

So that at least handles the constant case that the normal "roundup()"
case also handles.

At the same time, in the case you are talking about, I really do
suspect that we have a (non-constant) power of two, and that you
should have just used "round_up()" which works fine regardless of
size, and is always efficient.

On a slight tangent.. Maybe we should have something like this:

#define size_fn(x, prefix, ...) ({                      \
        typeof(x) __ret;                                \
        switch (sizeof(x)) {                            \
        case 1: __ret = prefix##8(__VA_ARGS__); break;  \
        case 2: __ret = prefix##16(__VA_ARGS__); break; \
        case 4: __ret = prefix##32(__VA_ARGS__); break; \
        case 8: __ret = prefix##64(__VA_ARGS__); break; \
        default: __ret = prefix##bad(__VA_ARGS__);      \
        } __ret; })

#define type_fn(x, prefix, ...) ({                              \
        typeof(x) __ret;                                        \
        if ((typeof(x))-1 > 1)                                  \
                __ret = size_fn(x, prefix##_u, __VA_ARGS__);    \
        else                                                    \
                __ret = size_fn(x, prefix##_s, __VA_ARGS__);    \
        __ret; })

which would allow typed integer functions like this. So you could do
something like

     #define round_up(x, y) size_fn(x, round_up_size, x, y)

and then you define functions for round_up_size8/16/32/64 (and you
have to declare - but not define - round_up_sizebad()).

Of course, you probably want the usual "at least use 'int'" semantics,
in which case the "type" should be "(x)+0":

     #define round_up(x, y) size_fn((x)+0, round_up_size, x, y)

 and the 8-bit and 16-bit cases will never be used.

We have a lot of cases where we end up using "type overloading" by
size. The most explicit case is perhaps "get_user()" and "put_user()",
but this whole round_up thing is another example.

Maybe we never really care about "char" and "short", and always want
just the "int-vs-long-vs-longlong"? That would make the cases simpler
(32 and 64). And maybe we never care about sign. But we could try to
have some unified helper model like the above..
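
A user-space sketch of how the size_fn() dispatch might be wired up for
round_up is given here, under several assumptions: it relies on GNU C
extensions (statement expressions and __typeof__), the helper names and
round_up_sized() are hypothetical, the helpers take wide arguments and
narrow them internally, and round_up_sizebad() is defined (it aborts)
rather than merely declared so that the example links at any optimization
level.

```c
#include <stdint.h>
#include <stdlib.h>

/* Per-size helpers (hypothetical names); y must be a power of two. */
static inline uint8_t round_up_size8(uint64_t x, uint64_t y)
{
	return (uint8_t)((((uint8_t)x - 1) | (uint8_t)(y - 1)) + 1);
}

static inline uint16_t round_up_size16(uint64_t x, uint64_t y)
{
	return (uint16_t)((((uint16_t)x - 1) | (uint16_t)(y - 1)) + 1);
}

static inline uint32_t round_up_size32(uint64_t x, uint64_t y)
{
	return (((uint32_t)x - 1) | (uint32_t)(y - 1)) + 1;
}

static inline uint64_t round_up_size64(uint64_t x, uint64_t y)
{
	return ((x - 1) | (y - 1)) + 1;
}

/* Reached only for an unsupported operand size. */
static inline uint64_t round_up_sizebad(uint64_t x, uint64_t y)
{
	(void)x;
	(void)y;
	abort();
}

/* Pick the helper by the size of the first argument's type. */
#define size_fn(x, prefix, ...) ({				\
	__typeof__(x) __ret;					\
	switch (sizeof(x)) {					\
	case 1: __ret = prefix##8(__VA_ARGS__); break;		\
	case 2: __ret = prefix##16(__VA_ARGS__); break;		\
	case 4: __ret = prefix##32(__VA_ARGS__); break;		\
	case 8: __ret = prefix##64(__VA_ARGS__); break;		\
	default: __ret = prefix##bad(__VA_ARGS__);		\
	} __ret; })

/* "(x) + 0" promotes narrow types to int, so only 32/64 are selected. */
#define round_up_sized(x, y) size_fn((x) + 0, round_up_size, x, y)
```

With this, a u64 operand goes through round_up_size64(), while an
unsigned char operand is promoted and handled by round_up_size32(). The
sketch only considers unsigned values.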

                  Linus
Steven Rostedt May 23, 2019, 5:36 p.m. UTC | #4
On Thu, 23 May 2019 09:51:29 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Thu, May 23, 2019 at 8:27 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > I haven't yet tested this, but what about something like the following:  
> 
> So that at least handles the constant case that the normal "roundup()"
> case also handles.
> 
> At the same time, in the case you are talking about, I really do
> suspect that we have a (non-constant) power of two, and that you
> should have just used "round_up()" which works fine regardless of
> size, and is always efficient.

I think you are correct in this.

       act_size = roundup_64(attr->length, MLX5_SW_ICM_BLOCK_SIZE(dm_db->dev));

Where we have:

#define MLX5_SW_ICM_BLOCK_SIZE(dev) (1 << MLX5_LOG_SW_ICM_BLOCK_SIZE(dev))

Which pretty much guarantees that it is a power of two. Thus, the real
fix here is simply to s/roundup/round_up/ as you suggest.
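
For a block size of the form (1 << n), the rounding needs no division at
all. A minimal sketch, with a hypothetical helper name and assuming
log_block_size is less than 64:

```c
#include <assert.h>
#include <stdint.h>

/* Round length up to a multiple of (1 << log_block_size) using a mask. */
static inline uint64_t round_up_to_block(uint64_t length,
					 unsigned int log_block_size)
{
	uint64_t mask;

	assert(log_block_size < 64);
	mask = ((uint64_t)1 << log_block_size) - 1;
	return (length + mask) & ~mask;
}
```

This works on 32 bit builds as well, since only additions and bit
operations are applied to the 64 bit value.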

> 
> On a slight tangent.. Maybe we should have something like this:
> 
> #define size_fn(x, prefix, ...) ({                      \
>         typeof(x) __ret;                                \
>         switch (sizeof(x)) {                            \
>         case 1: __ret = prefix##8(__VA_ARGS__); break;  \
>         case 2: __ret = prefix##16(__VA_ARGS__); break; \
>         case 4: __ret = prefix##32(__VA_ARGS__); break; \
>         case 8: __ret = prefix##64(__VA_ARGS__); break; \
>         default: __ret = prefix##bad(__VA_ARGS__);      \
>         } __ret; })
> 
> #define type_fn(x, prefix, ...) ({                              \
>         typeof(x) __ret;                                        \
>         if ((typeof(x))-1 > 1)                                  \
>                 __ret = size_fn(x, prefix##_u, __VA_ARGS__);    \
>         else                                                    \
>                 __ret = size_fn(x, prefix##_s, __VA_ARGS__);    \
>         __ret; })
> 
> which would allow typed integer functions like this. So you could do
> something like
> 
>      #define round_up(x, y) size_fn(x, round_up_size, x, y)
> 
> and then you define functions for round_up_size8/16/32/64 (and you

You mean define functions for round_up_size_{u|s}8/16/32/64

> have to declare - but not define - round_up_sizebad()).
> 
> Of course, you probably want the usual "at least use 'int'" semantics,
> in which case the "type" should be "(x)+0":
> 
>      #define round_up(x, y) size_fn((x)+0, round_up_size, x, y)
> 
>  and the 8-bit and 16-bit cases will never be used.

I'm curious as to what the advantage of that is?

> 
> We have a lot of cases where we end up using "type overloading" by
> size. The most explicit case is perhaps "get_user()" and "put_user()",
> but this whole round_up thing is another example.
> 
> Maybe we never really care about "char" and "short", and always want
> just the "int-vs-long-vs-longlong"? That would make the cases simpler
> (32 and 64). And maybe we never care about sign. But we could try to
> have some unified helper model like the above..

It may be simpler and perhaps more robust if we keep the char and short
cases.

I'm fine with adding something like this for round_up(), but do we want
to have a generic roundup_64() as well? I'm also thinking that we
perhaps should test for power of two on roundup():

#define roundup(x, y) (					\
{							\
	typeof(y) __y = y;				\
	typeof(x) __x;					\
							\
	if (!(__y & (__y - 1)))				\
		__x = round_up(x, __y);			\
	else						\
		__x = (((x) + (__y - 1)) / __y) * __y;	\
	__x;						\
})


-- Steve
Linus Torvalds May 23, 2019, 9:19 p.m. UTC | #5
On Thu, May 23, 2019 at 10:36 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> >
> > Of course, you probably want the usual "at least use 'int'" semantics,
> > in which case the "type" should be "(x)+0":
> >
> >      #define round_up(x, y) size_fn((x)+0, round_up_size, x, y)
> >
> >  and the 8-bit and 16-bit cases will never be used.
>
> I'm curious as to what the advantage of that is?

Let's say that you have a structure with an 'unsigned char' member,
because the value range is 0-255.

What happens if you do

   x = round_up(p->member, 4);

and the value is 255?

Now, if you stay in 'unsigned char' the end result is 0. If you follow
the usual C integer promotion rules ("all arithmetic promotes to at
least 'int'"), you get 256.

Most people probably expect 256, and that implies that even if you
pass an 'unsigned char' to an arithmetic function like this, you
expect any math to be done in 'int'. Doing the "(x)+0" forces that,
because the "+0" changes the type of the expression from "unsigned
char" to "int" due to C integer promotion.
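
The 255 case can be checked with a small user-space sketch. The two helper
names are hypothetical; one keeps the result at 8 bits, the other does the
same arithmetic at 'int' width, which is what "(x)+0" would select.

```c
#include <stdint.h>

/* Result kept at 8 bits: it is truncated when returned. */
static inline uint8_t round_up_u8(uint8_t x, uint8_t y)
{
	return (uint8_t)(((x - 1) | (y - 1)) + 1);
}

/* Same arithmetic carried out and returned as 'int'. */
static inline int round_up_int(int x, int y)
{
	return ((x - 1) | (y - 1)) + 1;
}
```

For a member holding 255 and a multiple of 4, the first helper wraps
around to 0 and the second returns 256.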

Yes. The C integer type rules are subtle and sometimes surprising. One
of the things I've wanted is to have some way to limit silent
promotion (and silent truncation!), and cause warnings. 'sparse' does
some of that with some special-case types (ie __bitwise), but it's
pretty limited.

              Linus
Steven Rostedt May 24, 2019, 3:26 p.m. UTC | #6
On Fri, 24 May 2019 16:11:14 +0100
Roger Willcocks <roger@filmlight.ltd.uk> wrote:

> On 23/05/2019 16:27, Steven Rostedt wrote:
> >
> > I haven't yet tested this, but what about something like the following:
> >
> > ...perhaps forget about the constant check, and just force
> > the power of two check:
> >
> > 							\
> > 	if (!(__y & (__y >> 1))) {			\
> > 		__x = round_up(x, y);			\
> > 	} else {					\  
> 
> You probably want
> 
>             if (!(__y & (__y - 1)))
> 
> --

Yes I do. I corrected it in my next email.

 http://lkml.kernel.org/r/20190523133648.591f9e78@gandalf.local.home
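
The difference between the two checks is easy to see with a value such as
5 (binary 101). A minimal sketch with hypothetical function names; note
that the second helper also rejects zero, which the bare expression in the
macro does not:

```c
#include <stdbool.h>
#include <stdint.h>

/* The check from the first draft: only tests for adjacent set bits. */
static inline bool adjacent_bits_clear(uint64_t y)
{
	return !(y & (y >> 1));
}

/* The corrected check: clearing the lowest set bit must leave zero. */
static inline bool single_bit_set(uint64_t y)
{
	return y != 0 && !(y & (y - 1));
}
```

adjacent_bits_clear(5) is true even though 5 is not a power of two, while
single_bit_set(5) is false. Both agree for values such as 6 and 8.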

> #define roundup(x, y) (					\
> {							\
> 	typeof(y) __y = y;				\
> 	typeof(x) __x;					\
> 							\
> 	if (!(__y & (__y - 1)))				\
> 		__x = round_up(x, __y);			\
> 	else						\
> 		__x = (((x) + (__y - 1)) / __y) * __y;	\
> 	__x;						\
> })


-- Steve
Steven Rostedt May 24, 2019, 4:36 p.m. UTC | #7
On Fri, 24 May 2019 19:30:45 +0300
Nikolay Borisov <nborisov@suse.com> wrote:


> > Yes I do. I corrected it in my next email.
> > 
> >  http://lkml.kernel.org/r/20190523133648.591f9e78@gandalf.local.home  
> 
> Or perhaps just using is_power_of_2 from include/linux/log2.h ?

Even better. Thanks,

-- Steve
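
A user-space sketch of where the discussion ends up: take the mask path
when the divisor is a power of two and fall back to division otherwise.
The names are hypothetical; is_pow2_sketch() is a stand-in for the
kernel's is_power_of_2() from include/linux/log2.h, written from my
understanding of that helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for is_power_of_2(): true for exactly one set bit. */
static inline bool is_pow2_sketch(uint64_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Round x up to a multiple of y, avoiding division when possible. */
static inline uint64_t roundup_fast(uint64_t x, uint64_t y)
{
	assert(y != 0);
	if (is_pow2_sketch(y))
		return ((x - 1) | (y - 1)) + 1;	/* mask path */
	return ((x + (y - 1)) / y) * y;		/* generic path */
}
```

Both paths give the same answer for power-of-two divisors, so the test
only changes how the result is computed, not what it is.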

Patch

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 34a998012bf6..cdacfe1f732c 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -143,14 +143,6 @@  nouveau_bo_del_ttm(struct ttm_buffer_object *bo)
 	kfree(nvbo);
 }
 
-static inline u64
-roundup_64(u64 x, u32 y)
-{
-	x += y - 1;
-	do_div(x, y);
-	return x * y;
-}
-
 static void
 nouveau_bo_fixup_align(struct nouveau_bo *nvbo, u32 flags,
 		       int *align, u64 *size)
diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
index edbd5a210df2..13de9d49bd52 100644
--- a/fs/xfs/xfs_linux.h
+++ b/fs/xfs/xfs_linux.h
@@ -207,13 +207,6 @@  static inline xfs_dev_t linux_to_xfs_dev_t(dev_t dev)
 #define xfs_sort(a,n,s,fn)	sort(a,n,s,fn,NULL)
 #define xfs_stack_trace()	dump_stack()
 
-static inline uint64_t roundup_64(uint64_t x, uint32_t y)
-{
-	x += y - 1;
-	do_div(x, y);
-	return x * y;
-}
-
 static inline uint64_t howmany_64(uint64_t x, uint32_t y)
 {
 	x += y - 1;
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 74b1ee9027f5..cd0063629357 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -115,6 +115,20 @@ 
 	(((x) + (__y - 1)) / __y) * __y;		\
 }							\
 )
+
+#if BITS_PER_LONG == 32
+# define roundup_64(x, y) (				\
+{							\
+	typeof(y) __y = y;				\
+	typeof(x) __x = (x) + (__y - 1);		\
+	do_div(__x, __y);				\
+	__x * __y;					\
+}							\
+)
+#else
+# define roundup_64(x, y)	roundup(x, y)
+#endif
+
 /**
  * rounddown - round down to next specified multiple
  * @x: the value to round