diff mbox series

sysinfo: include availram field in sysinfo struct

Message ID 1641483250-18839-1-git-send-email-quic_pintu@quicinc.com (mailing list archive)
State New
Headers show
Series sysinfo: include availram field in sysinfo struct | expand

Commit Message

Pintu Kumar Jan. 6, 2022, 3:34 p.m. UTC
The sysinfo member does not have any "available ram" field and
the bufferram field is not much helpful either, to get a rough
estimate of available ram needed for allocation.

One needs to parse MemAvailable field separately from /proc/meminfo
to get this info instead of directly getting if from sysinfo itself.

Thus, this patch introduce a new field as availram in sysinfo
so that all the info total/free/available can be retrieved from
one place itself.

There are couple of places in kernel as well where this can be improved.
For example:
In fs/proc/meminfo.c:
meminfo_proc_show:
   si_meminfo(&i);
   available = si_mem_available();
Now with this change the second call be avoided.
Thus, we can directly do:
show_val_kb(m, "MemAvailable:   ", i.availram);

Note, this also requires update in procfs for free and other commands.
Like in free command as well we frist call sysinfo then again parse
/proc/meminfo to get available field.
This can be avoided too with higher kernel version.

A sample output with single sysinfo call is shown below:
Total RAM: 248376 kB
 Free RAM: 231540 kB
Avail RAM: 230448 kB

Signed-off-by: Pintu Kumar <quic_pintu@quicinc.com>
Signed-off-by: Pintu Agarwal <pintu.ping@gmail.com>
---
 include/uapi/linux/sysinfo.h | 1 +
 kernel/sys.c                 | 4 ++++
 mm/page_alloc.c              | 2 ++
 3 files changed, 7 insertions(+)

Comments

Cyrill Gorcunov Jan. 6, 2022, 4:11 p.m. UTC | #1
On Thu, Jan 06, 2022 at 09:04:10PM +0530, Pintu Kumar wrote:
> The sysinfo member does not have any "available ram" field and
> the bufferram field is not much helpful either, to get a rough
> estimate of available ram needed for allocation.
> 
> One needs to parse MemAvailable field separately from /proc/meminfo
> to get this info instead of directly getting if from sysinfo itself.
> 
> Thus, this patch introduce a new field as availram in sysinfo
> so that all the info total/free/available can be retrieved from
> one place itself.
> 
> There are couple of places in kernel as well where this can be improved.
> For example:
> In fs/proc/meminfo.c:
> meminfo_proc_show:
>    si_meminfo(&i);
>    available = si_mem_available();
> Now with this change the second call be avoided.
> Thus, we can directly do:
> show_val_kb(m, "MemAvailable:   ", i.availram);
> 
> Note, this also requires update in procfs for free and other commands.
> Like in free command as well we frist call sysinfo then again parse
> /proc/meminfo to get available field.
> This can be avoided too with higher kernel version.
> 
> A sample output with single sysinfo call is shown below:
> Total RAM: 248376 kB
>  Free RAM: 231540 kB
> Avail RAM: 230448 kB
> 
> Signed-off-by: Pintu Kumar <quic_pintu@quicinc.com>
> Signed-off-by: Pintu Agarwal <pintu.ping@gmail.com>
> ---
>  include/uapi/linux/sysinfo.h | 1 +
>  kernel/sys.c                 | 4 ++++
>  mm/page_alloc.c              | 2 ++
>  3 files changed, 7 insertions(+)
> 
> diff --git a/include/uapi/linux/sysinfo.h b/include/uapi/linux/sysinfo.h
> index 435d5c2..6e77e90 100644
> --- a/include/uapi/linux/sysinfo.h
> +++ b/include/uapi/linux/sysinfo.h
> @@ -12,6 +12,7 @@ struct sysinfo {
>  	__kernel_ulong_t freeram;	/* Available memory size */
>  	__kernel_ulong_t sharedram;	/* Amount of shared memory */
>  	__kernel_ulong_t bufferram;	/* Memory used by buffers */
> +	__kernel_ulong_t availram;	/* Memory available for allocation */
>  	__kernel_ulong_t totalswap;	/* Total swap space size */
>  	__kernel_ulong_t freeswap;	/* swap space still available */
>  	__u16 procs;		   	/* Number of current processes */

Hi! Sorry, but I don't understand -- the sysinfo structure seems to
be part of user API, no? Don't we break it up here?
Pintu Agarwal Jan. 6, 2022, 4:49 p.m. UTC | #2
On Thu, 6 Jan 2022 at 21:41, Cyrill Gorcunov <gorcunov@gmail.com> wrote:
>
> On Thu, Jan 06, 2022 at 09:04:10PM +0530, Pintu Kumar wrote:
> > The sysinfo member does not have any "available ram" field and
> > the bufferram field is not much helpful either, to get a rough
> > estimate of available ram needed for allocation.
> >
> > One needs to parse MemAvailable field separately from /proc/meminfo
> > to get this info instead of directly getting if from sysinfo itself.
> >
> > Thus, this patch introduce a new field as availram in sysinfo
> > so that all the info total/free/available can be retrieved from
> > one place itself.
> >
> > There are couple of places in kernel as well where this can be improved.
> > For example:
> > In fs/proc/meminfo.c:
> > meminfo_proc_show:
> >    si_meminfo(&i);
> >    available = si_mem_available();
> > Now with this change the second call be avoided.
> > Thus, we can directly do:
> > show_val_kb(m, "MemAvailable:   ", i.availram);
> >
> > Note, this also requires update in procfs for free and other commands.
> > Like in free command as well we frist call sysinfo then again parse
> > /proc/meminfo to get available field.
> > This can be avoided too with higher kernel version.
> >
> > A sample output with single sysinfo call is shown below:
> > Total RAM: 248376 kB
> >  Free RAM: 231540 kB
> > Avail RAM: 230448 kB
> >
> > Signed-off-by: Pintu Kumar <quic_pintu@quicinc.com>
> > Signed-off-by: Pintu Agarwal <pintu.ping@gmail.com>
> > ---
> >  include/uapi/linux/sysinfo.h | 1 +
> >  kernel/sys.c                 | 4 ++++
> >  mm/page_alloc.c              | 2 ++
> >  3 files changed, 7 insertions(+)
> >
> > diff --git a/include/uapi/linux/sysinfo.h b/include/uapi/linux/sysinfo.h
> > index 435d5c2..6e77e90 100644
> > --- a/include/uapi/linux/sysinfo.h
> > +++ b/include/uapi/linux/sysinfo.h
> > @@ -12,6 +12,7 @@ struct sysinfo {
> >       __kernel_ulong_t freeram;       /* Available memory size */
> >       __kernel_ulong_t sharedram;     /* Amount of shared memory */
> >       __kernel_ulong_t bufferram;     /* Memory used by buffers */
> > +     __kernel_ulong_t availram;      /* Memory available for allocation */
> >       __kernel_ulong_t totalswap;     /* Total swap space size */
> >       __kernel_ulong_t freeswap;      /* swap space still available */
> >       __u16 procs;                    /* Number of current processes */
>
> Hi! Sorry, but I don't understand -- the sysinfo structure seems to
> be part of user API, no? Don't we break it up here?

Yes, the corresponding user space header /usr/include/linux/sysinfo.h
also needs to be updated.
When we generate the kernel header it will be updated automatically.
Cyrill Gorcunov Jan. 6, 2022, 5:27 p.m. UTC | #3
On Thu, Jan 06, 2022 at 10:19:55PM +0530, Pintu Agarwal wrote:
> > > diff --git a/include/uapi/linux/sysinfo.h b/include/uapi/linux/sysinfo.h
> > > index 435d5c2..6e77e90 100644
> > > --- a/include/uapi/linux/sysinfo.h
> > > +++ b/include/uapi/linux/sysinfo.h
> > > @@ -12,6 +12,7 @@ struct sysinfo {
> > >       __kernel_ulong_t freeram;       /* Available memory size */
> > >       __kernel_ulong_t sharedram;     /* Amount of shared memory */
> > >       __kernel_ulong_t bufferram;     /* Memory used by buffers */
> > > +     __kernel_ulong_t availram;      /* Memory available for allocation */
> > >       __kernel_ulong_t totalswap;     /* Total swap space size */
> > >       __kernel_ulong_t freeswap;      /* swap space still available */
> > >       __u16 procs;                    /* Number of current processes */
> >
> > Hi! Sorry, but I don't understand -- the sysinfo structure seems to
> > be part of user API, no? Don't we break it up here?
> 
> Yes, the corresponding user space header /usr/include/linux/sysinfo.h
> also needs to be updated.
> When we generate the kernel header it will be updated automatically.

Wait. The userspace may pass old structure here, and in result we
return incorrect layout which won't match old one, no? Old binary
code has no clue about this header update.
David Laight Jan. 6, 2022, 5:41 p.m. UTC | #4
From: Pintu Agarwal
> Sent: 06 January 2022 16:50
> 
> On Thu, 6 Jan 2022 at 21:41, Cyrill Gorcunov <gorcunov@gmail.com> wrote:
> >
> > On Thu, Jan 06, 2022 at 09:04:10PM +0530, Pintu Kumar wrote:
> > > The sysinfo member does not have any "available ram" field and
> > > the bufferram field is not much helpful either, to get a rough
> > > estimate of available ram needed for allocation.
> > >
> > > One needs to parse MemAvailable field separately from /proc/meminfo
> > > to get this info instead of directly getting if from sysinfo itself.
> > >
> > > Thus, this patch introduce a new field as availram in sysinfo
> > > so that all the info total/free/available can be retrieved from
> > > one place itself.
> > >
> > > There are couple of places in kernel as well where this can be improved.
> > > For example:
> > > In fs/proc/meminfo.c:
> > > meminfo_proc_show:
> > >    si_meminfo(&i);
> > >    available = si_mem_available();
> > > Now with this change the second call be avoided.
> > > Thus, we can directly do:
> > > show_val_kb(m, "MemAvailable:   ", i.availram);
> > >
> > > Note, this also requires update in procfs for free and other commands.
> > > Like in free command as well we frist call sysinfo then again parse
> > > /proc/meminfo to get available field.
> > > This can be avoided too with higher kernel version.
> > >
> > > A sample output with single sysinfo call is shown below:
> > > Total RAM: 248376 kB
> > >  Free RAM: 231540 kB
> > > Avail RAM: 230448 kB
> > >
> > > Signed-off-by: Pintu Kumar <quic_pintu@quicinc.com>
> > > Signed-off-by: Pintu Agarwal <pintu.ping@gmail.com>
> > > ---
> > >  include/uapi/linux/sysinfo.h | 1 +
> > >  kernel/sys.c                 | 4 ++++
> > >  mm/page_alloc.c              | 2 ++
> > >  3 files changed, 7 insertions(+)
> > >
> > > diff --git a/include/uapi/linux/sysinfo.h b/include/uapi/linux/sysinfo.h
> > > index 435d5c2..6e77e90 100644
> > > --- a/include/uapi/linux/sysinfo.h
> > > +++ b/include/uapi/linux/sysinfo.h
> > > @@ -12,6 +12,7 @@ struct sysinfo {
> > >       __kernel_ulong_t freeram;       /* Available memory size */
> > >       __kernel_ulong_t sharedram;     /* Amount of shared memory */
> > >       __kernel_ulong_t bufferram;     /* Memory used by buffers */
> > > +     __kernel_ulong_t availram;      /* Memory available for allocation */
> > >       __kernel_ulong_t totalswap;     /* Total swap space size */
> > >       __kernel_ulong_t freeswap;      /* swap space still available */
> > >       __u16 procs;                    /* Number of current processes */
> >
> > Hi! Sorry, but I don't understand -- the sysinfo structure seems to
> > be part of user API, no? Don't we break it up here?
> 
> Yes, the corresponding user space header /usr/include/linux/sysinfo.h
> also needs to be updated.
> When we generate the kernel header it will be updated automatically.

You can't add a field in the middle of a UAPI structure.
It breaks compatibility for old binaries.

Depending on the interface definition you may be able to add one at the end.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Pintu Agarwal Jan. 6, 2022, 5:59 p.m. UTC | #5
On Thu, 6 Jan 2022 at 23:12, David Laight <David.Laight@aculab.com> wrote:
>
> From: Pintu Agarwal
> > Sent: 06 January 2022 16:50
> >
> > On Thu, 6 Jan 2022 at 21:41, Cyrill Gorcunov <gorcunov@gmail.com> wrote:
> > >
> > > On Thu, Jan 06, 2022 at 09:04:10PM +0530, Pintu Kumar wrote:
> > > > The sysinfo member does not have any "available ram" field and
> > > > the bufferram field is not much helpful either, to get a rough
> > > > estimate of available ram needed for allocation.
> > > >
> > > > One needs to parse MemAvailable field separately from /proc/meminfo
> > > > to get this info instead of directly getting if from sysinfo itself.
> > > >
> > > > Thus, this patch introduce a new field as availram in sysinfo
> > > > so that all the info total/free/available can be retrieved from
> > > > one place itself.
> > > >
> > > > There are couple of places in kernel as well where this can be improved.
> > > > For example:
> > > > In fs/proc/meminfo.c:
> > > > meminfo_proc_show:
> > > >    si_meminfo(&i);
> > > >    available = si_mem_available();
> > > > Now with this change the second call be avoided.
> > > > Thus, we can directly do:
> > > > show_val_kb(m, "MemAvailable:   ", i.availram);
> > > >
> > > > Note, this also requires update in procfs for free and other commands.
> > > > Like in free command as well we frist call sysinfo then again parse
> > > > /proc/meminfo to get available field.
> > > > This can be avoided too with higher kernel version.
> > > >
> > > > A sample output with single sysinfo call is shown below:
> > > > Total RAM: 248376 kB
> > > >  Free RAM: 231540 kB
> > > > Avail RAM: 230448 kB
> > > >
> > > > Signed-off-by: Pintu Kumar <quic_pintu@quicinc.com>
> > > > Signed-off-by: Pintu Agarwal <pintu.ping@gmail.com>
> > > > ---
> > > >  include/uapi/linux/sysinfo.h | 1 +
> > > >  kernel/sys.c                 | 4 ++++
> > > >  mm/page_alloc.c              | 2 ++
> > > >  3 files changed, 7 insertions(+)
> > > >
> > > > diff --git a/include/uapi/linux/sysinfo.h b/include/uapi/linux/sysinfo.h
> > > > index 435d5c2..6e77e90 100644
> > > > --- a/include/uapi/linux/sysinfo.h
> > > > +++ b/include/uapi/linux/sysinfo.h
> > > > @@ -12,6 +12,7 @@ struct sysinfo {
> > > >       __kernel_ulong_t freeram;       /* Available memory size */
> > > >       __kernel_ulong_t sharedram;     /* Amount of shared memory */
> > > >       __kernel_ulong_t bufferram;     /* Memory used by buffers */
> > > > +     __kernel_ulong_t availram;      /* Memory available for allocation */
> > > >       __kernel_ulong_t totalswap;     /* Total swap space size */
> > > >       __kernel_ulong_t freeswap;      /* swap space still available */
> > > >       __u16 procs;                    /* Number of current processes */
> > >
> > > Hi! Sorry, but I don't understand -- the sysinfo structure seems to
> > > be part of user API, no? Don't we break it up here?
> >
> > Yes, the corresponding user space header /usr/include/linux/sysinfo.h
> > also needs to be updated.
> > When we generate the kernel header it will be updated automatically.
>
> You can't add a field in the middle of a UAPI structure.
> It breaks compatibility for old binaries.
>
> Depending on the interface definition you may be able to add one at the end.
>
oh okay thank you for your feedback. I will move to the end and check again.
But my doubt is, whether I should move before this
char _f[20-2*sizeof(__kernel_ulong_t)-sizeof(__u32)];
or after this ?

Also, I could not understand what this is for ?
Do we need to update this since sture is changed ?
Eric W. Biederman Jan. 6, 2022, 7:20 p.m. UTC | #6
Pintu Agarwal <pintu.ping@gmail.com> writes:

> On Thu, 6 Jan 2022 at 23:12, David Laight <David.Laight@aculab.com> wrote:
>>
>> From: Pintu Agarwal
>> > Sent: 06 January 2022 16:50
>> >
>> > On Thu, 6 Jan 2022 at 21:41, Cyrill Gorcunov <gorcunov@gmail.com> wrote:
>> > >
>> > > On Thu, Jan 06, 2022 at 09:04:10PM +0530, Pintu Kumar wrote:
>> > > > The sysinfo member does not have any "available ram" field and
>> > > > the bufferram field is not much helpful either, to get a rough
>> > > > estimate of available ram needed for allocation.
>> > > >
>> > > > One needs to parse MemAvailable field separately from /proc/meminfo
>> > > > to get this info instead of directly getting if from sysinfo itself.
>> > > >
>> > > > Thus, this patch introduce a new field as availram in sysinfo
>> > > > so that all the info total/free/available can be retrieved from
>> > > > one place itself.
>> > > >
>> > > > There are couple of places in kernel as well where this can be improved.
>> > > > For example:
>> > > > In fs/proc/meminfo.c:
>> > > > meminfo_proc_show:
>> > > >    si_meminfo(&i);
>> > > >    available = si_mem_available();
>> > > > Now with this change the second call be avoided.
>> > > > Thus, we can directly do:
>> > > > show_val_kb(m, "MemAvailable:   ", i.availram);
>> > > >
>> > > > Note, this also requires update in procfs for free and other commands.
>> > > > Like in free command as well we frist call sysinfo then again parse
>> > > > /proc/meminfo to get available field.
>> > > > This can be avoided too with higher kernel version.
>> > > >
>> > > > A sample output with single sysinfo call is shown below:
>> > > > Total RAM: 248376 kB
>> > > >  Free RAM: 231540 kB
>> > > > Avail RAM: 230448 kB
>> > > >
>> > > > Signed-off-by: Pintu Kumar <quic_pintu@quicinc.com>
>> > > > Signed-off-by: Pintu Agarwal <pintu.ping@gmail.com>
>> > > > ---
>> > > >  include/uapi/linux/sysinfo.h | 1 +
>> > > >  kernel/sys.c                 | 4 ++++
>> > > >  mm/page_alloc.c              | 2 ++
>> > > >  3 files changed, 7 insertions(+)
>> > > >
>> > > > diff --git a/include/uapi/linux/sysinfo.h b/include/uapi/linux/sysinfo.h
>> > > > index 435d5c2..6e77e90 100644
>> > > > --- a/include/uapi/linux/sysinfo.h
>> > > > +++ b/include/uapi/linux/sysinfo.h
>> > > > @@ -12,6 +12,7 @@ struct sysinfo {
>> > > >       __kernel_ulong_t freeram;       /* Available memory size */
>> > > >       __kernel_ulong_t sharedram;     /* Amount of shared memory */
>> > > >       __kernel_ulong_t bufferram;     /* Memory used by buffers */
>> > > > +     __kernel_ulong_t availram;      /* Memory available for allocation */
>> > > >       __kernel_ulong_t totalswap;     /* Total swap space size */
>> > > >       __kernel_ulong_t freeswap;      /* swap space still available */
>> > > >       __u16 procs;                    /* Number of current processes */
>> > >
>> > > Hi! Sorry, but I don't understand -- the sysinfo structure seems to
>> > > be part of user API, no? Don't we break it up here?
>> >
>> > Yes, the corresponding user space header /usr/include/linux/sysinfo.h
>> > also needs to be updated.
>> > When we generate the kernel header it will be updated automatically.
>>
>> You can't add a field in the middle of a UAPI structure.
>> It breaks compatibility for old binaries.
>>
>> Depending on the interface definition you may be able to add one at the end.
>>
> oh okay thank you for your feedback. I will move to the end and check again.
> But my doubt is, whether I should move before this
> char _f[20-2*sizeof(__kernel_ulong_t)-sizeof(__u32)];
> or after this ?

Before the padding and you should reduce the size of the padding by the
size of your new field.

> Also, I could not understand what this is for ?
> Do we need to update this since sture is changed ?

In general padding like that is so new fields can be added.  The
comment about libc5 makes me a wonder a bit, but I expect libc5 just
added the padding in it's copy of the structure that it exported to
userspace many many years ago so that new fields could be added.

Eric
Christian Brauner Jan. 7, 2022, 12:04 p.m. UTC | #7
On Thu, Jan 06, 2022 at 08:27:47PM +0300, Cyrill Gorcunov wrote:
> On Thu, Jan 06, 2022 at 10:19:55PM +0530, Pintu Agarwal wrote:
> > > > diff --git a/include/uapi/linux/sysinfo.h b/include/uapi/linux/sysinfo.h
> > > > index 435d5c2..6e77e90 100644
> > > > --- a/include/uapi/linux/sysinfo.h
> > > > +++ b/include/uapi/linux/sysinfo.h
> > > > @@ -12,6 +12,7 @@ struct sysinfo {
> > > >       __kernel_ulong_t freeram;       /* Available memory size */
> > > >       __kernel_ulong_t sharedram;     /* Amount of shared memory */
> > > >       __kernel_ulong_t bufferram;     /* Memory used by buffers */
> > > > +     __kernel_ulong_t availram;      /* Memory available for allocation */
> > > >       __kernel_ulong_t totalswap;     /* Total swap space size */
> > > >       __kernel_ulong_t freeswap;      /* swap space still available */
> > > >       __u16 procs;                    /* Number of current processes */
> > >
> > > Hi! Sorry, but I don't understand -- the sysinfo structure seems to
> > > be part of user API, no? Don't we break it up here?
> > 
> > Yes, the corresponding user space header /usr/include/linux/sysinfo.h
> > also needs to be updated.
> > When we generate the kernel header it will be updated automatically.
> 
> Wait. The userspace may pass old structure here, and in result we
> return incorrect layout which won't match old one, no? Old binary
> code has no clue about this header update.

Yes, that won't work as done.

If we want to do this it needs to be done at the end of the struct right
before the padding field and the newly added field substracted from the
padding. (Not the preferred way to do it these days for new structs.)

A new kernel can then pass in the struct with the newly added field and
an old kernel can just fill the struct in as usual. New kernel will
update the field with the correct value.

But there's a catch depending on the type of value.
The problem with these types of extensions is that you'll often need
indicators to and from the kernel whether the extension is supported.

Consider an extension where 0 is a valid value meaning "this resource is
completely used". Since the kernel and userspace always agree on the
size of the struct the kernel will zero the whole struct. So if in your
newly added field 0 is a valid value you can't differentiate between 0
as a valid value indicating that your resource isn't available and 0 as
the kernel not supporting your extension.

Other APIs solve this and similar problems by having a request mask and
a return mask.  Userspace fills in what values it wants reported in the
request mask and kernel sets the supported flags in the return mask.
This way you can differentiate between the two (see statx).

If the 0 example is not a concern or acceptable for userspace it's
probably fine. But you need to document that having 0 returned can mean
both things.

Or, you select a value different from 0 (-1?) that you can use to
indicate to userspace that the resource is used up so 0 can just mean
"kernel doesn't support this extension".
Pintu Agarwal Jan. 7, 2022, 1:44 p.m. UTC | #8
On Fri, 7 Jan 2022 at 17:35, Christian Brauner
<christian.brauner@ubuntu.com> wrote:
>
> On Thu, Jan 06, 2022 at 08:27:47PM +0300, Cyrill Gorcunov wrote:
> > On Thu, Jan 06, 2022 at 10:19:55PM +0530, Pintu Agarwal wrote:
> > > > > diff --git a/include/uapi/linux/sysinfo.h b/include/uapi/linux/sysinfo.h
> > > > > index 435d5c2..6e77e90 100644
> > > > > --- a/include/uapi/linux/sysinfo.h
> > > > > +++ b/include/uapi/linux/sysinfo.h
> > > > > @@ -12,6 +12,7 @@ struct sysinfo {
> > > > >       __kernel_ulong_t freeram;       /* Available memory size */
> > > > >       __kernel_ulong_t sharedram;     /* Amount of shared memory */
> > > > >       __kernel_ulong_t bufferram;     /* Memory used by buffers */
> > > > > +     __kernel_ulong_t availram;      /* Memory available for allocation */
> > > > >       __kernel_ulong_t totalswap;     /* Total swap space size */
> > > > >       __kernel_ulong_t freeswap;      /* swap space still available */
> > > > >       __u16 procs;                    /* Number of current processes */
> > > >
> > > > Hi! Sorry, but I don't understand -- the sysinfo structure seems to
> > > > be part of user API, no? Don't we break it up here?
> > >
> > > Yes, the corresponding user space header /usr/include/linux/sysinfo.h
> > > also needs to be updated.
> > > When we generate the kernel header it will be updated automatically.
> >
> > Wait. The userspace may pass old structure here, and in result we
> > return incorrect layout which won't match old one, no? Old binary
> > code has no clue about this header update.
>
> Yes, that won't work as done.
>
> If we want to do this it needs to be done at the end of the struct right
> before the padding field and the newly added field substracted from the
> padding. (Not the preferred way to do it these days for new structs.)
>
> A new kernel can then pass in the struct with the newly added field and
> an old kernel can just fill the struct in as usual. New kernel will
> update the field with the correct value.
>
> But there's a catch depending on the type of value.
> The problem with these types of extensions is that you'll often need
> indicators to and from the kernel whether the extension is supported.
>
> Consider an extension where 0 is a valid value meaning "this resource is
> completely used". Since the kernel and userspace always agree on the
> size of the struct the kernel will zero the whole struct. So if in your
> newly added field 0 is a valid value you can't differentiate between 0
> as a valid value indicating that your resource isn't available and 0 as
> the kernel not supporting your extension.
>
> Other APIs solve this and similar problems by having a request mask and
> a return mask.  Userspace fills in what values it wants reported in the
> request mask and kernel sets the supported flags in the return mask.
> This way you can differentiate between the two (see statx).
>
> If the 0 example is not a concern or acceptable for userspace it's
> probably fine. But you need to document that having 0 returned can mean
> both things.
>
> Or, you select a value different from 0 (-1?) that you can use to
> indicate to userspace that the resource is used up so 0 can just mean
> "kernel doesn't support this extension".

Thanks all for your inputs.
As Eric suggested in other thread (pasting here for reference):
{
> Before the padding and you should reduce the size of the padding by the
> size of your new field.

>> Also, I could not understand what this is for ?
>> Do we need to update this since sture is changed ?

> In general padding like that is so new fields can be added.  The
> comment about libc5 makes me a wonder a bit, but I expect libc5 just
> added the padding in it's copy of the structure that it exported to
> userspace many many years ago so that new fields could be added.

> Eric
}

I made the changes like below and this seems to work even with older user space.
I mean earlier, when I ran "free" command it was giving "stack
smashing..." error,
but now the "free" command (which comes as part of busybox) works fine
even without recompiling with the updated header.

These are the header changes for quick look:
{{{
diff --git a/include/uapi/linux/sysinfo.h b/include/uapi/linux/sysinfo.h
index 6e77e90..fe84c6a 100644
--- a/include/uapi/linux/sysinfo.h
+++ b/include/uapi/linux/sysinfo.h
@@ -12,7 +12,6 @@ struct sysinfo {
        __kernel_ulong_t freeram;       /* Available memory size */
        __kernel_ulong_t sharedram;     /* Amount of shared memory */
        __kernel_ulong_t bufferram;     /* Memory used by buffers */
-       __kernel_ulong_t availram;      /* Memory available for allocation */
        __kernel_ulong_t totalswap;     /* Total swap space size */
        __kernel_ulong_t freeswap;      /* swap space still available */
        __u16 procs;                    /* Number of current processes */
@@ -20,7 +19,8 @@ struct sysinfo {
        __kernel_ulong_t totalhigh;     /* Total high memory size */
        __kernel_ulong_t freehigh;      /* Available high memory size */
        __u32 mem_unit;                 /* Memory unit size in bytes */
-       char _f[20-2*sizeof(__kernel_ulong_t)-sizeof(__u32)];   /*
Padding: libc5 uses this.. */
+       __kernel_ulong_t availram;      /* Memory available for allocation */
+       char _f[20-3*sizeof(__kernel_ulong_t)-sizeof(__u32)];   /*
Padding: libc5 uses this.. */
 };
}}}

If this is fine, I will push the new patch set.

Thanks,
Pintu
Vlastimil Babka Jan. 7, 2022, 4:58 p.m. UTC | #9
CC linux-api

On 1/7/22 14:44, Pintu Agarwal wrote:
> On Fri, 7 Jan 2022 at 17:35, Christian Brauner
> <christian.brauner@ubuntu.com> wrote:
>>
>> On Thu, Jan 06, 2022 at 08:27:47PM +0300, Cyrill Gorcunov wrote:
>> > On Thu, Jan 06, 2022 at 10:19:55PM +0530, Pintu Agarwal wrote:
>> > > > > diff --git a/include/uapi/linux/sysinfo.h b/include/uapi/linux/sysinfo.h
>> > > > > index 435d5c2..6e77e90 100644
>> > > > > --- a/include/uapi/linux/sysinfo.h
>> > > > > +++ b/include/uapi/linux/sysinfo.h
>> > > > > @@ -12,6 +12,7 @@ struct sysinfo {
>> > > > >       __kernel_ulong_t freeram;       /* Available memory size */
>> > > > >       __kernel_ulong_t sharedram;     /* Amount of shared memory */
>> > > > >       __kernel_ulong_t bufferram;     /* Memory used by buffers */
>> > > > > +     __kernel_ulong_t availram;      /* Memory available for allocation */
>> > > > >       __kernel_ulong_t totalswap;     /* Total swap space size */
>> > > > >       __kernel_ulong_t freeswap;      /* swap space still available */
>> > > > >       __u16 procs;                    /* Number of current processes */
>> > > >
>> > > > Hi! Sorry, but I don't understand -- the sysinfo structure seems to
>> > > > be part of user API, no? Don't we break it up here?
>> > >
>> > > Yes, the corresponding user space header /usr/include/linux/sysinfo.h
>> > > also needs to be updated.
>> > > When we generate the kernel header it will be updated automatically.
>> >
>> > Wait. The userspace may pass old structure here, and in result we
>> > return incorrect layout which won't match old one, no? Old binary
>> > code has no clue about this header update.
>>
>> Yes, that won't work as done.
>>
>> If we want to do this it needs to be done at the end of the struct right
>> before the padding field and the newly added field substracted from the
>> padding. (Not the preferred way to do it these days for new structs.)
>>
>> A new kernel can then pass in the struct with the newly added field and
>> an old kernel can just fill the struct in as usual. New kernel will
>> update the field with the correct value.
>>
>> But there's a catch depending on the type of value.
>> The problem with these types of extensions is that you'll often need
>> indicators to and from the kernel whether the extension is supported.
>>
>> Consider an extension where 0 is a valid value meaning "this resource is
>> completely used". Since the kernel and userspace always agree on the
>> size of the struct the kernel will zero the whole struct. So if in your
>> newly added field 0 is a valid value you can't differentiate between 0
>> as a valid value indicating that your resource isn't available and 0 as
>> the kernel not supporting your extension.
>>
>> Other APIs solve this and similar problems by having a request mask and
>> a return mask.  Userspace fills in what values it wants reported in the
>> request mask and kernel sets the supported flags in the return mask.
>> This way you can differentiate between the two (see statx).
>>
>> If the 0 example is not a concern or acceptable for userspace it's
>> probably fine. But you need to document that having 0 returned can mean
>> both things.
>>
>> Or, you select a value different from 0 (-1?) that you can use to
>> indicate to userspace that the resource is used up so 0 can just mean
>> "kernel doesn't support this extension".
> 
> Thanks all for your inputs.
> As Eric suggested in other thread (pasting here for reference):
> {
>> Before the padding and you should reduce the size of the padding by the
>> size of your new field.
> 
>>> Also, I could not understand what this is for ?
>>> Do we need to update this since sture is changed ?
> 
>> In general padding like that is so new fields can be added.  The
>> comment about libc5 makes me a wonder a bit, but I expect libc5 just
>> added the padding in it's copy of the structure that it exported to
>> userspace many many years ago so that new fields could be added.
> 
>> Eric
> }
> 
> I made the changes like below and this seems to work even with older user space.
> I mean earlier, when I ran "free" command it was giving "stack
> smashing..." error,
> but now the "free" command (which comes as part of busybox) works fine
> even without recompiling with the updated header.
> 
> These are the header changes for quick look:
> {{{
> diff --git a/include/uapi/linux/sysinfo.h b/include/uapi/linux/sysinfo.h
> index 6e77e90..fe84c6a 100644
> --- a/include/uapi/linux/sysinfo.h
> +++ b/include/uapi/linux/sysinfo.h
> @@ -12,7 +12,6 @@ struct sysinfo {
>         __kernel_ulong_t freeram;       /* Available memory size */
>         __kernel_ulong_t sharedram;     /* Amount of shared memory */
>         __kernel_ulong_t bufferram;     /* Memory used by buffers */
> -       __kernel_ulong_t availram;      /* Memory available for allocation */
>         __kernel_ulong_t totalswap;     /* Total swap space size */
>         __kernel_ulong_t freeswap;      /* swap space still available */
>         __u16 procs;                    /* Number of current processes */
> @@ -20,7 +19,8 @@ struct sysinfo {
>         __kernel_ulong_t totalhigh;     /* Total high memory size */
>         __kernel_ulong_t freehigh;      /* Available high memory size */
>         __u32 mem_unit;                 /* Memory unit size in bytes */
> -       char _f[20-2*sizeof(__kernel_ulong_t)-sizeof(__u32)];   /*
> Padding: libc5 uses this.. */
> +       __kernel_ulong_t availram;      /* Memory available for allocation */
> +       char _f[20-3*sizeof(__kernel_ulong_t)-sizeof(__u32)];   /*
> Padding: libc5 uses this.. */
>  };
> }}}
> 
> If this is fine, I will push the new patch set.

Please CC linux-api@vger.kernel.org on the new posting.

> Thanks,
> Pintu
>
Pintu Agarwal Jan. 7, 2022, 5:47 p.m. UTC | #10
On Fri, 7 Jan 2022 at 22:28, Vlastimil Babka <vbabka@suse.cz> wrote:
>
> CC linux-api
>
> On 1/7/22 14:44, Pintu Agarwal wrote:
> > On Fri, 7 Jan 2022 at 17:35, Christian Brauner
> > <christian.brauner@ubuntu.com> wrote:
> >>
> >> On Thu, Jan 06, 2022 at 08:27:47PM +0300, Cyrill Gorcunov wrote:
> >> > On Thu, Jan 06, 2022 at 10:19:55PM +0530, Pintu Agarwal wrote:
> >> > > > > diff --git a/include/uapi/linux/sysinfo.h b/include/uapi/linux/sysinfo.h
> >> > > > > index 435d5c2..6e77e90 100644
> >> > > > > --- a/include/uapi/linux/sysinfo.h
> >> > > > > +++ b/include/uapi/linux/sysinfo.h
> >> > > > > @@ -12,6 +12,7 @@ struct sysinfo {
> >> > > > >       __kernel_ulong_t freeram;       /* Available memory size */
> >> > > > >       __kernel_ulong_t sharedram;     /* Amount of shared memory */
> >> > > > >       __kernel_ulong_t bufferram;     /* Memory used by buffers */
> >> > > > > +     __kernel_ulong_t availram;      /* Memory available for allocation */
> >> > > > >       __kernel_ulong_t totalswap;     /* Total swap space size */
> >> > > > >       __kernel_ulong_t freeswap;      /* swap space still available */
> >> > > > >       __u16 procs;                    /* Number of current processes */
> >> > > >
> >> > > > Hi! Sorry, but I don't understand -- the sysinfo structure seems to
> >> > > > be part of user API, no? Don't we break it up here?
> >> > >
> >> > > Yes, the corresponding user space header /usr/include/linux/sysinfo.h
> >> > > also needs to be updated.
> >> > > When we generate the kernel header it will be updated automatically.
> >> >
> >> > Wait. The userspace may pass old structure here, and in result we
> >> > return incorrect layout which won't match old one, no? Old binary
> >> > code has no clue about this header update.
> >>
> >> Yes, that won't work as done.
> >>
> >> If we want to do this it needs to be done at the end of the struct right
> >> before the padding field and the newly added field substracted from the
> >> padding. (Not the preferred way to do it these days for new structs.)
> >>
> >> A new kernel can then pass in the struct with the newly added field and
> >> an old kernel can just fill the struct in as usual. New kernel will
> >> update the field with the correct value.
> >>
> >> But there's a catch depending on the type of value.
> >> The problem with these types of extensions is that you'll often need
> >> indicators to and from the kernel whether the extension is supported.
> >>
> >> Consider an extension where 0 is a valid value meaning "this resource is
> >> completely used". Since the kernel and userspace always agree on the
> >> size of the struct the kernel will zero the whole struct. So if in your
> >> newly added field 0 is a valid value you can't differentiate between 0
> >> as a valid value indicating that your resource isn't available and 0 as
> >> the kernel not supporting your extension.
> >>
> >> Other APIs solve this and similar problems by having a request mask and
> >> a return mask.  Userspace fills in what values it wants reported in the
> >> request mask and kernel sets the supported flags in the return mask.
> >> This way you can differentiate between the two (see statx).
> >>
> >> If the 0 example is not a concern or acceptable for userspace it's
> >> probably fine. But you need to document that having 0 returned can mean
> >> both things.
> >>
> >> Or, you select a value different from 0 (-1?) that you can use to
> >> indicate to userspace that the resource is used up so 0 can just mean
> >> "kernel doesn't support this extension".
> >
> > Thanks all for your inputs.
> > As Eric suggested in other thread (pasting here for reference):
> > {
> >> Before the padding and you should reduce the size of the padding by the
> >> size of your new field.
> >
> >>> Also, I could not understand what this is for ?
> >>> Do we need to update this since sture is changed ?
> >
> >> In general padding like that is so new fields can be added.  The
> >> comment about libc5 makes me a wonder a bit, but I expect libc5 just
> >> added the padding in it's copy of the structure that it exported to
> >> userspace many many years ago so that new fields could be added.
> >
> >> Eric
> > }
> >
> > I made the changes like below and this seems to work even with older user space.
> > I mean earlier, when I ran "free" command it was giving "stack
> > smashing..." error,
> > but now the "free" command (which comes as part of busybox) works fine
> > even without recompiling with the updated header.
> >
> > These are the header changes for quick look:
> > {{{
> > diff --git a/include/uapi/linux/sysinfo.h b/include/uapi/linux/sysinfo.h
> > index 6e77e90..fe84c6a 100644
> > --- a/include/uapi/linux/sysinfo.h
> > +++ b/include/uapi/linux/sysinfo.h
> > @@ -12,7 +12,6 @@ struct sysinfo {
> >         __kernel_ulong_t freeram;       /* Available memory size */
> >         __kernel_ulong_t sharedram;     /* Amount of shared memory */
> >         __kernel_ulong_t bufferram;     /* Memory used by buffers */
> > -       __kernel_ulong_t availram;      /* Memory available for allocation */
> >         __kernel_ulong_t totalswap;     /* Total swap space size */
> >         __kernel_ulong_t freeswap;      /* swap space still available */
> >         __u16 procs;                    /* Number of current processes */
> > @@ -20,7 +19,8 @@ struct sysinfo {
> >         __kernel_ulong_t totalhigh;     /* Total high memory size */
> >         __kernel_ulong_t freehigh;      /* Available high memory size */
> >         __u32 mem_unit;                 /* Memory unit size in bytes */
> > -       char _f[20-2*sizeof(__kernel_ulong_t)-sizeof(__u32)];   /*
> > Padding: libc5 uses this.. */
> > +       __kernel_ulong_t availram;      /* Memory available for allocation */
> > +       char _f[20-3*sizeof(__kernel_ulong_t)-sizeof(__u32)];   /*
> > Padding: libc5 uses this.. */
> >  };
> > }}}
> >
> > If this is fine, I will push the new patch set.
>
> Please CC linux-api@vger.kernel.org on the new posting.
>

@Christian Brauner,
Regarding 0 case I guess it is fine.
Just to cross check, I used my test program to run with some other
kernel (where there are no changes to sysinfo).
I see that the field returns 0.
# ./test-sysinfo.out
Total RAM: 249320 kB
Free RAM: 233416 kB
Avail RAM: 0 kB

And this is fine and this is also good.
This also indicates 2 things:
a) Either "availram" field is not available in this kernel version
(less than 5.1x)
==> Thus it should fall back to parsing MemAvailable from /proc/meminfo
b) Or, MemAvailable field itself is not available (less than 3.1x)

I will push the new patch set now..

Thanks all!
Pintu
Cyrill Gorcunov Jan. 7, 2022, 7:51 p.m. UTC | #11
On Fri, Jan 07, 2022 at 01:04:51PM +0100, Christian Brauner wrote:
> > 
> > Wait. The userspace may pass old structure here, and in result we
> > return incorrect layout which won't match old one, no? Old binary
> > code has no clue about this header update.
> 
> Yes, that won't work as done.
> 

Yup. When I've been developing struct prctl_mm_map I reserved
PR_SET_MM_MAP_SIZE opcode so userspace would be able to query
current structure size and provide memory slab needed to fit
running structure.

As far as I see we can cut off some space from padding (at the
end of the structure) though one need to make a precise check
that there is no alignment holes appear on different architectures
other than x86. This area is pretty sensitive.
David Laight Jan. 7, 2022, 10:18 p.m. UTC | #12
> > These are the header changes for quick look:
> > {{{
> > diff --git a/include/uapi/linux/sysinfo.h b/include/uapi/linux/sysinfo.h
> > index 6e77e90..fe84c6a 100644
> > --- a/include/uapi/linux/sysinfo.h
> > +++ b/include/uapi/linux/sysinfo.h
> > @@ -12,7 +12,6 @@ struct sysinfo {
> >         __kernel_ulong_t freeram;       /* Available memory size */
> >         __kernel_ulong_t sharedram;     /* Amount of shared memory */
> >         __kernel_ulong_t bufferram;     /* Memory used by buffers */
> > -       __kernel_ulong_t availram;      /* Memory available for allocation */
> >         __kernel_ulong_t totalswap;     /* Total swap space size */
> >         __kernel_ulong_t freeswap;      /* swap space still available */
> >         __u16 procs;                    /* Number of current processes */
> > @@ -20,7 +19,8 @@ struct sysinfo {
> >         __kernel_ulong_t totalhigh;     /* Total high memory size */
> >         __kernel_ulong_t freehigh;      /* Available high memory size */
> >         __u32 mem_unit;                 /* Memory unit size in bytes */
> > -       char _f[20-2*sizeof(__kernel_ulong_t)-sizeof(__u32)];   /*
> > Padding: libc5 uses this.. */
> > +       __kernel_ulong_t availram;      /* Memory available for allocation */
> > +       char _f[20-3*sizeof(__kernel_ulong_t)-sizeof(__u32)];   /*
> > Padding: libc5 uses this.. */
> >  };
> > }}}
> >
> > If this is fine, I will push the new patch set.
> 
> Please CC linux-api@vger.kernel.org on the new posting.

That is probably still broken.
If __kernel_ulong_t is 64bit there is architecture
dependant padding after mem_unit.

In particular a 32bit x86 app running on a 64bit kernel
will probably see the wrong layout.

You definitely need a compile-time assert on the total structure size.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
diff mbox series

Patch

diff --git a/include/uapi/linux/sysinfo.h b/include/uapi/linux/sysinfo.h
index 435d5c2..6e77e90 100644
--- a/include/uapi/linux/sysinfo.h
+++ b/include/uapi/linux/sysinfo.h
@@ -12,6 +12,7 @@  struct sysinfo {
 	__kernel_ulong_t freeram;	/* Available memory size */
 	__kernel_ulong_t sharedram;	/* Amount of shared memory */
 	__kernel_ulong_t bufferram;	/* Memory used by buffers */
+	__kernel_ulong_t availram;	/* Memory available for allocation */
 	__kernel_ulong_t totalswap;	/* Total swap space size */
 	__kernel_ulong_t freeswap;	/* swap space still available */
 	__u16 procs;		   	/* Number of current processes */
diff --git a/kernel/sys.c b/kernel/sys.c
index ecc4cf0..7059515 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2671,6 +2671,7 @@  static int do_sysinfo(struct sysinfo *info)
 	info->freeram <<= bitcount;
 	info->sharedram <<= bitcount;
 	info->bufferram <<= bitcount;
+	info->availram <<= bitcount;
 	info->totalswap <<= bitcount;
 	info->freeswap <<= bitcount;
 	info->totalhigh <<= bitcount;
@@ -2700,6 +2701,7 @@  struct compat_sysinfo {
 	u32 freeram;
 	u32 sharedram;
 	u32 bufferram;
+	u32 availram;
 	u32 totalswap;
 	u32 freeswap;
 	u16 procs;
@@ -2732,6 +2734,7 @@  COMPAT_SYSCALL_DEFINE1(sysinfo, struct compat_sysinfo __user *, info)
 		s.freeram >>= bitcount;
 		s.sharedram >>= bitcount;
 		s.bufferram >>= bitcount;
+		s.availram >>= bitcount;
 		s.totalswap >>= bitcount;
 		s.freeswap >>= bitcount;
 		s.totalhigh >>= bitcount;
@@ -2747,6 +2750,7 @@  COMPAT_SYSCALL_DEFINE1(sysinfo, struct compat_sysinfo __user *, info)
 	s_32.freeram = s.freeram;
 	s_32.sharedram = s.sharedram;
 	s_32.bufferram = s.bufferram;
+	s_32.availram = s.availram;
 	s_32.totalswap = s.totalswap;
 	s_32.freeswap = s.freeswap;
 	s_32.procs = s.procs;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b5d62e1..5013c75 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5786,6 +5786,7 @@  void si_meminfo(struct sysinfo *val)
 	val->sharedram = global_node_page_state(NR_SHMEM);
 	val->freeram = global_zone_page_state(NR_FREE_PAGES);
 	val->bufferram = nr_blockdev_pages();
+	val->availram = si_mem_available();
 	val->totalhigh = totalhigh_pages();
 	val->freehigh = nr_free_highpages();
 	val->mem_unit = PAGE_SIZE;
@@ -5807,6 +5808,7 @@  void si_meminfo_node(struct sysinfo *val, int nid)
 	val->totalram = managed_pages;
 	val->sharedram = node_page_state(pgdat, NR_SHMEM);
 	val->freeram = sum_zone_node_page_state(nid, NR_FREE_PAGES);
+	val->availram = si_mem_available();
 #ifdef CONFIG_HIGHMEM
 	for (zone_type = 0; zone_type < MAX_NR_ZONES; zone_type++) {
 		struct zone *zone = &pgdat->node_zones[zone_type];