
[v3,3/3] fs/file.c: add fast path in find_next_fd()

Message ID 20240703143311.2184454-4-yu.ma@intel.com (mailing list archive)
State New
Series fs/file.c: optimize the critical section of file_lock in

Commit Message

Ma, Yu July 3, 2024, 2:33 p.m. UTC
In most cases there is a free fd in the lower 64 bits of the open_fds bitmap
when we look for an available fd slot. Skip the two-level search via
find_next_zero_bit() for this common fast path.

Look directly for an open bit in the lower 64 bits of the open_fds bitmap when
a free slot is available there, because:
(1) The fd allocation algorithm always allocates fds from small to large, so
the lower bits of the open_fds bitmap are used much more frequently than the
higher bits.
(2) Once the fdt is expanded (the bitmap size doubles with each expansion), it
is never shrunk. The search range grows even though few fds are actually open
in the upper part.
(3) find_next_zero_bit() already has an internal fast path for sizes <= 64
that speeds up the search.

As suggested by Mateusz Guzik <mjguzik gmail.com> and Jan Kara <jack@suse.cz>,
move the fast path from alloc_fd() to find_next_fd(). With this change, on top
of patches 1 and 2, pts/blogbench-1.1.0 read improves by 13% and write by 7%
on an Intel ICX 160-core configuration with v6.10-rc6.

Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Yu Ma <yu.ma@intel.com>
---
 fs/file.c | 5 +++++
 1 file changed, 5 insertions(+)
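The idea of the patch can be sketched as a self-contained user-space simulation. This is a minimal sketch, not the kernel's code: zero_bit_in_word(), find_next_fd_sim(), and the fixed 4-word table are illustrative stand-ins, and the linear walk stands in for the kernel's two-level search.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))
#define TOY_WORDS 4u /* toy 256-bit table; the real fdtable grows on demand */

/* Simplified stand-in for find_next_zero_bit() restricted to one word:
 * return the first zero bit at or above 'start', or BITS_PER_LONG if none. */
static unsigned int zero_bit_in_word(unsigned long word, unsigned int start)
{
	unsigned int b;

	for (b = start; b < BITS_PER_LONG; b++)
		if (!(word & (1UL << b)))
			return b;
	return BITS_PER_LONG;
}

/* Sketch of the patch's idea: try the lowest word of open_fds first, and
 * only fall back to scanning the rest of the bitmap when that word has no
 * free bit at or above 'start'. */
static unsigned int find_next_fd_sim(const unsigned long *open_fds,
				     unsigned int start)
{
	unsigned int w, bit;

	if (start < BITS_PER_LONG) {
		bit = zero_bit_in_word(open_fds[0], start);
		if (bit < BITS_PER_LONG)
			return bit; /* common case: a low fd is free */
	}
	/* slow path: walk the remaining words */
	for (w = start / BITS_PER_LONG; w < TOY_WORDS; w++) {
		bit = zero_bit_in_word(open_fds[w],
				       w == start / BITS_PER_LONG ?
				       start % BITS_PER_LONG : 0);
		if (bit < BITS_PER_LONG)
			return w * BITS_PER_LONG + bit;
	}
	return TOY_WORDS * BITS_PER_LONG; /* no free slot past 'start' */
}
```

With an empty map the fast path returns immediately; only when the first word is completely full does the walk over the remaining words run.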

Comments

Mateusz Guzik July 3, 2024, 2:17 p.m. UTC | #1
On Wed, Jul 3, 2024 at 4:07 PM Yu Ma <yu.ma@intel.com> wrote:
>
> There is available fd in the lower 64 bits of open_fds bitmap for most cases
> when we look for an available fd slot. Skip 2-levels searching via
> find_next_zero_bit() for this common fast path.
>
> Look directly for an open bit in the lower 64 bits of open_fds bitmap when a
> free slot is available there, as:
> (1) The fd allocation algorithm would always allocate fd from small to large.
> Lower bits in open_fds bitmap would be used much more frequently than higher
> bits.
> (2) After fdt is expanded (the bitmap size doubled for each time of expansion),
> it would never be shrunk. The search size increases but there are few open fds
> available here.
> (3) There is fast path inside of find_next_zero_bit() when size<=64 to speed up
> searching.
>
> As suggested by Mateusz Guzik <mjguzik gmail.com> and Jan Kara <jack@suse.cz>,
> update the fast path from alloc_fd() to find_next_fd(). With which, on top of
> patch 1 and 2, pts/blogbench-1.1.0 read is improved by 13% and write by 7% on
> Intel ICX 160 cores configuration with v6.10-rc6.
>
> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
> Signed-off-by: Yu Ma <yu.ma@intel.com>
> ---
>  fs/file.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/fs/file.c b/fs/file.c
> index a15317db3119..f25eca311f51 100644
> --- a/fs/file.c
> +++ b/fs/file.c
> @@ -488,6 +488,11 @@ struct files_struct init_files = {
>
>  static unsigned int find_next_fd(struct fdtable *fdt, unsigned int start)
>  {
> +       unsigned int bit;
> +       bit = find_next_zero_bit(fdt->open_fds, BITS_PER_LONG, start);
> +       if (bit < BITS_PER_LONG)
> +               return bit;
> +

The rest of the patchset looks good on cursory read.

As for this one, the suggestion was to make it work across the entire range.

Today I won't have time to write and test what we proposed, but will
probably find some time tomorrow. Perhaps Jan will do the needful(tm)
in the meantime.

That said, please stay tuned for a patch. :)

>         unsigned int maxfd = fdt->max_fds; /* always multiple of BITS_PER_LONG */
>         unsigned int maxbit = maxfd / BITS_PER_LONG;
>         unsigned int bitbit = start / BITS_PER_LONG;
> --
> 2.43.0
>
Ma, Yu July 3, 2024, 2:28 p.m. UTC | #2
On 7/3/2024 10:17 PM, Mateusz Guzik wrote:
> On Wed, Jul 3, 2024 at 4:07 PM Yu Ma <yu.ma@intel.com> wrote:
>> There is available fd in the lower 64 bits of open_fds bitmap for most cases
>> when we look for an available fd slot. Skip 2-levels searching via
>> find_next_zero_bit() for this common fast path.
>>
>> Look directly for an open bit in the lower 64 bits of open_fds bitmap when a
>> free slot is available there, as:
>> (1) The fd allocation algorithm would always allocate fd from small to large.
>> Lower bits in open_fds bitmap would be used much more frequently than higher
>> bits.
>> (2) After fdt is expanded (the bitmap size doubled for each time of expansion),
>> it would never be shrunk. The search size increases but there are few open fds
>> available here.
>> (3) There is fast path inside of find_next_zero_bit() when size<=64 to speed up
>> searching.
>>
>> As suggested by Mateusz Guzik <mjguzik gmail.com> and Jan Kara <jack@suse.cz>,
>> update the fast path from alloc_fd() to find_next_fd(). With which, on top of
>> patch 1 and 2, pts/blogbench-1.1.0 read is improved by 13% and write by 7% on
>> Intel ICX 160 cores configuration with v6.10-rc6.
>>
>> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
>> Signed-off-by: Yu Ma <yu.ma@intel.com>
>> ---
>>   fs/file.c | 5 +++++
>>   1 file changed, 5 insertions(+)
>>
>> diff --git a/fs/file.c b/fs/file.c
>> index a15317db3119..f25eca311f51 100644
>> --- a/fs/file.c
>> +++ b/fs/file.c
>> @@ -488,6 +488,11 @@ struct files_struct init_files = {
>>
>>   static unsigned int find_next_fd(struct fdtable *fdt, unsigned int start)
>>   {
>> +       unsigned int bit;
>> +       bit = find_next_zero_bit(fdt->open_fds, BITS_PER_LONG, start);
>> +       if (bit < BITS_PER_LONG)
>> +               return bit;
>> +
> The rest of the patchset looks good on cursory read.
>
> As for this one, the suggestion was to make it work across the entire range.
>
> Today I wont have time to write and test what we proposed, but will
> probably find some time tomorrow. Perhaps Jan will do the needful(tm)
> in the meantime.
>
> That said, please stay tuned for a patch. :)

Sure, understood, Guzik. Thanks for the quick feedback and for considering 
how to make this better and more versatile. I'll also try to double-check 
the previous proposal across the entire fd range.

>>          unsigned int maxfd = fdt->max_fds; /* always multiple of BITS_PER_LONG */
>>          unsigned int maxbit = maxfd / BITS_PER_LONG;
>>          unsigned int bitbit = start / BITS_PER_LONG;
>> --
>> 2.43.0
>>
>
Jan Kara July 4, 2024, 10:03 a.m. UTC | #3
On Wed 03-07-24 10:33:11, Yu Ma wrote:
> There is available fd in the lower 64 bits of open_fds bitmap for most cases
> when we look for an available fd slot. Skip 2-levels searching via
> find_next_zero_bit() for this common fast path.
> 
> Look directly for an open bit in the lower 64 bits of open_fds bitmap when a
> free slot is available there, as:
> (1) The fd allocation algorithm would always allocate fd from small to large.
> Lower bits in open_fds bitmap would be used much more frequently than higher
> bits.
> (2) After fdt is expanded (the bitmap size doubled for each time of expansion),
> it would never be shrunk. The search size increases but there are few open fds
> available here.
> (3) There is fast path inside of find_next_zero_bit() when size<=64 to speed up
> searching.
> 
> As suggested by Mateusz Guzik <mjguzik gmail.com> and Jan Kara <jack@suse.cz>,
> update the fast path from alloc_fd() to find_next_fd(). With which, on top of
> patch 1 and 2, pts/blogbench-1.1.0 read is improved by 13% and write by 7% on
> Intel ICX 160 cores configuration with v6.10-rc6.
> 
> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
> Signed-off-by: Yu Ma <yu.ma@intel.com>

Nice! The patch looks good to me. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

One style nit below:

> diff --git a/fs/file.c b/fs/file.c
> index a15317db3119..f25eca311f51 100644
> --- a/fs/file.c
> +++ b/fs/file.c
> @@ -488,6 +488,11 @@ struct files_struct init_files = {
>  
>  static unsigned int find_next_fd(struct fdtable *fdt, unsigned int start)
>  {
> +	unsigned int bit;

Empty line here please to separate variable declaration and code...

> +	bit = find_next_zero_bit(fdt->open_fds, BITS_PER_LONG, start);
> +	if (bit < BITS_PER_LONG)
> +		return bit;
> +
>  	unsigned int maxfd = fdt->max_fds; /* always multiple of BITS_PER_LONG */
>  	unsigned int maxbit = maxfd / BITS_PER_LONG;
>  	unsigned int bitbit = start / BITS_PER_LONG;

									Honza
Jan Kara July 4, 2024, 10:07 a.m. UTC | #4
On Wed 03-07-24 16:17:01, Mateusz Guzik wrote:
> On Wed, Jul 3, 2024 at 4:07 PM Yu Ma <yu.ma@intel.com> wrote:
> >
> > There is available fd in the lower 64 bits of open_fds bitmap for most cases
> > when we look for an available fd slot. Skip 2-levels searching via
> > find_next_zero_bit() for this common fast path.
> >
> > Look directly for an open bit in the lower 64 bits of open_fds bitmap when a
> > free slot is available there, as:
> > (1) The fd allocation algorithm would always allocate fd from small to large.
> > Lower bits in open_fds bitmap would be used much more frequently than higher
> > bits.
> > (2) After fdt is expanded (the bitmap size doubled for each time of expansion),
> > it would never be shrunk. The search size increases but there are few open fds
> > available here.
> > (3) There is fast path inside of find_next_zero_bit() when size<=64 to speed up
> > searching.
> >
> > As suggested by Mateusz Guzik <mjguzik gmail.com> and Jan Kara <jack@suse.cz>,
> > update the fast path from alloc_fd() to find_next_fd(). With which, on top of
> > patch 1 and 2, pts/blogbench-1.1.0 read is improved by 13% and write by 7% on
> > Intel ICX 160 cores configuration with v6.10-rc6.
> >
> > Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
> > Signed-off-by: Yu Ma <yu.ma@intel.com>
> > ---
> >  fs/file.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/fs/file.c b/fs/file.c
> > index a15317db3119..f25eca311f51 100644
> > --- a/fs/file.c
> > +++ b/fs/file.c
> > @@ -488,6 +488,11 @@ struct files_struct init_files = {
> >
> >  static unsigned int find_next_fd(struct fdtable *fdt, unsigned int start)
> >  {
> > +       unsigned int bit;
> > +       bit = find_next_zero_bit(fdt->open_fds, BITS_PER_LONG, start);
> > +       if (bit < BITS_PER_LONG)
> > +               return bit;
> > +
> 
> The rest of the patchset looks good on cursory read.
> 
> As for this one, the suggestion was to make it work across the entire range.

I'm not sure what exactly you mean by "make it work across the entire
range", because what Ma has implemented is exactly what I originally had in
mind - i.e., search the first word of open_fds starting from next_fd (note
that 'start' in this function is already set to max(start, next_fd)); if
that fails, go through the two-level bitmap.

								Honza
Ma, Yu July 4, 2024, 2:50 p.m. UTC | #5
On 7/4/2024 6:03 PM, Jan Kara wrote:
> On Wed 03-07-24 10:33:11, Yu Ma wrote:
>> There is available fd in the lower 64 bits of open_fds bitmap for most cases
>> when we look for an available fd slot. Skip 2-levels searching via
>> find_next_zero_bit() for this common fast path.
>>
>> Look directly for an open bit in the lower 64 bits of open_fds bitmap when a
>> free slot is available there, as:
>> (1) The fd allocation algorithm would always allocate fd from small to large.
>> Lower bits in open_fds bitmap would be used much more frequently than higher
>> bits.
>> (2) After fdt is expanded (the bitmap size doubled for each time of expansion),
>> it would never be shrunk. The search size increases but there are few open fds
>> available here.
>> (3) There is fast path inside of find_next_zero_bit() when size<=64 to speed up
>> searching.
>>
>> As suggested by Mateusz Guzik <mjguzik gmail.com> and Jan Kara <jack@suse.cz>,
>> update the fast path from alloc_fd() to find_next_fd(). With which, on top of
>> patch 1 and 2, pts/blogbench-1.1.0 read is improved by 13% and write by 7% on
>> Intel ICX 160 cores configuration with v6.10-rc6.
>>
>> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
>> Signed-off-by: Yu Ma <yu.ma@intel.com>
> Nice! The patch looks good to me. Feel free to add:
>
> Reviewed-by: Jan Kara <jack@suse.cz>
>
> One style nit below:
>
>> diff --git a/fs/file.c b/fs/file.c
>> index a15317db3119..f25eca311f51 100644
>> --- a/fs/file.c
>> +++ b/fs/file.c
>> @@ -488,6 +488,11 @@ struct files_struct init_files = {
>>   
>>   static unsigned int find_next_fd(struct fdtable *fdt, unsigned int start)
>>   {
>> +	unsigned int bit;
> Empty line here please to separate variable declaration and code...

Thanks Honza, copy that :)

>
>> +	bit = find_next_zero_bit(fdt->open_fds, BITS_PER_LONG, start);
>> +	if (bit < BITS_PER_LONG)
>> +		return bit;
>> +
>>   	unsigned int maxfd = fdt->max_fds; /* always multiple of BITS_PER_LONG */
>>   	unsigned int maxbit = maxfd / BITS_PER_LONG;
>>   	unsigned int bitbit = start / BITS_PER_LONG;
> 									Honza
Mateusz Guzik July 4, 2024, 5:44 p.m. UTC | #6
On Wed, Jul 3, 2024 at 4:07 PM Yu Ma <yu.ma@intel.com> wrote:
>
> There is available fd in the lower 64 bits of open_fds bitmap for most cases
> when we look for an available fd slot. Skip 2-levels searching via
> find_next_zero_bit() for this common fast path.
>
> Look directly for an open bit in the lower 64 bits of open_fds bitmap when a
> free slot is available there, as:
> (1) The fd allocation algorithm would always allocate fd from small to large.
> Lower bits in open_fds bitmap would be used much more frequently than higher
> bits.
> (2) After fdt is expanded (the bitmap size doubled for each time of expansion),
> it would never be shrunk. The search size increases but there are few open fds
> available here.
> (3) There is fast path inside of find_next_zero_bit() when size<=64 to speed up
> searching.
>
> As suggested by Mateusz Guzik <mjguzik gmail.com> and Jan Kara <jack@suse.cz>,
> update the fast path from alloc_fd() to find_next_fd(). With which, on top of
> patch 1 and 2, pts/blogbench-1.1.0 read is improved by 13% and write by 7% on
> Intel ICX 160 cores configuration with v6.10-rc6.
>
> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
> Signed-off-by: Yu Ma <yu.ma@intel.com>
> ---
>  fs/file.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/fs/file.c b/fs/file.c
> index a15317db3119..f25eca311f51 100644
> --- a/fs/file.c
> +++ b/fs/file.c
> @@ -488,6 +488,11 @@ struct files_struct init_files = {
>
>  static unsigned int find_next_fd(struct fdtable *fdt, unsigned int start)
>  {
> +       unsigned int bit;
> +       bit = find_next_zero_bit(fdt->open_fds, BITS_PER_LONG, start);
> +       if (bit < BITS_PER_LONG)
> +               return bit;
> +
>         unsigned int maxfd = fdt->max_fds; /* always multiple of BITS_PER_LONG */
>         unsigned int maxbit = maxfd / BITS_PER_LONG;
>         unsigned int bitbit = start / BITS_PER_LONG;
> --
> 2.43.0
>

I had something like this in mind:
diff --git a/fs/file.c b/fs/file.c
index a3b72aa64f11..4d3307e39db7 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -489,6 +489,16 @@ static unsigned int find_next_fd(struct fdtable
*fdt, unsigned int start)
        unsigned int maxfd = fdt->max_fds; /* always multiple of
BITS_PER_LONG */
        unsigned int maxbit = maxfd / BITS_PER_LONG;
        unsigned int bitbit = start / BITS_PER_LONG;
+       unsigned int bit;
+
+       /*
+        * Try to avoid looking at the second level map.
+        */
+       bit = find_next_zero_bit(&fdt->open_fds[bitbit], BITS_PER_LONG,
+                               start & (BITS_PER_LONG - 1));
+       if (bit < BITS_PER_LONG) {
+               return bit + bitbit * BITS_PER_LONG;
+       }

        bitbit = find_next_zero_bit(fdt->full_fds_bits, maxbit,
bitbit) * BITS_PER_LONG;
        if (bitbit >= maxfd)

Can you please test it out? I expect it to provide a tiny improvement
over your patch.
Jan Kara July 4, 2024, 9:55 p.m. UTC | #7
On Thu 04-07-24 19:44:10, Mateusz Guzik wrote:
> On Wed, Jul 3, 2024 at 4:07 PM Yu Ma <yu.ma@intel.com> wrote:
> >
> > There is available fd in the lower 64 bits of open_fds bitmap for most cases
> > when we look for an available fd slot. Skip 2-levels searching via
> > find_next_zero_bit() for this common fast path.
> >
> > Look directly for an open bit in the lower 64 bits of open_fds bitmap when a
> > free slot is available there, as:
> > (1) The fd allocation algorithm would always allocate fd from small to large.
> > Lower bits in open_fds bitmap would be used much more frequently than higher
> > bits.
> > (2) After fdt is expanded (the bitmap size doubled for each time of expansion),
> > it would never be shrunk. The search size increases but there are few open fds
> > available here.
> > (3) There is fast path inside of find_next_zero_bit() when size<=64 to speed up
> > searching.
> >
> > As suggested by Mateusz Guzik <mjguzik gmail.com> and Jan Kara <jack@suse.cz>,
> > update the fast path from alloc_fd() to find_next_fd(). With which, on top of
> > patch 1 and 2, pts/blogbench-1.1.0 read is improved by 13% and write by 7% on
> > Intel ICX 160 cores configuration with v6.10-rc6.
> >
> > Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
> > Signed-off-by: Yu Ma <yu.ma@intel.com>
> > ---
> >  fs/file.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/fs/file.c b/fs/file.c
> > index a15317db3119..f25eca311f51 100644
> > --- a/fs/file.c
> > +++ b/fs/file.c
> > @@ -488,6 +488,11 @@ struct files_struct init_files = {
> >
> >  static unsigned int find_next_fd(struct fdtable *fdt, unsigned int start)
> >  {
> > +       unsigned int bit;
> > +       bit = find_next_zero_bit(fdt->open_fds, BITS_PER_LONG, start);
> > +       if (bit < BITS_PER_LONG)
> > +               return bit;
> > +
> >         unsigned int maxfd = fdt->max_fds; /* always multiple of BITS_PER_LONG */
> >         unsigned int maxbit = maxfd / BITS_PER_LONG;
> >         unsigned int bitbit = start / BITS_PER_LONG;
> > --
> > 2.43.0
> >
> 
> I had something like this in mind:
> diff --git a/fs/file.c b/fs/file.c
> index a3b72aa64f11..4d3307e39db7 100644
> --- a/fs/file.c
> +++ b/fs/file.c
> @@ -489,6 +489,16 @@ static unsigned int find_next_fd(struct fdtable
> *fdt, unsigned int start)
>         unsigned int maxfd = fdt->max_fds; /* always multiple of
> BITS_PER_LONG */
>         unsigned int maxbit = maxfd / BITS_PER_LONG;
>         unsigned int bitbit = start / BITS_PER_LONG;
> +       unsigned int bit;
> +
> +       /*
> +        * Try to avoid looking at the second level map.
> +        */
> +       bit = find_next_zero_bit(&fdt->open_fds[bitbit], BITS_PER_LONG,
> +                               start & (BITS_PER_LONG - 1));
> +       if (bit < BITS_PER_LONG) {
> +               return bit + bitbit * BITS_PER_LONG;
> +       }

Drat, you're right. I missed that Ma did not add the proper offset to
open_fds. *This* is what I meant :)

								Honza
Ma, Yu July 5, 2024, 7:56 a.m. UTC | #8
On 7/5/2024 5:55 AM, Jan Kara wrote:
> On Thu 04-07-24 19:44:10, Mateusz Guzik wrote:
>> On Wed, Jul 3, 2024 at 4:07 PM Yu Ma <yu.ma@intel.com> wrote:
>>> There is available fd in the lower 64 bits of open_fds bitmap for most cases
>>> when we look for an available fd slot. Skip 2-levels searching via
>>> find_next_zero_bit() for this common fast path.
>>>
>>> Look directly for an open bit in the lower 64 bits of open_fds bitmap when a
>>> free slot is available there, as:
>>> (1) The fd allocation algorithm would always allocate fd from small to large.
>>> Lower bits in open_fds bitmap would be used much more frequently than higher
>>> bits.
>>> (2) After fdt is expanded (the bitmap size doubled for each time of expansion),
>>> it would never be shrunk. The search size increases but there are few open fds
>>> available here.
>>> (3) There is fast path inside of find_next_zero_bit() when size<=64 to speed up
>>> searching.
>>>
>>> As suggested by Mateusz Guzik <mjguzik gmail.com> and Jan Kara <jack@suse.cz>,
>>> update the fast path from alloc_fd() to find_next_fd(). With which, on top of
>>> patch 1 and 2, pts/blogbench-1.1.0 read is improved by 13% and write by 7% on
>>> Intel ICX 160 cores configuration with v6.10-rc6.
>>>
>>> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
>>> Signed-off-by: Yu Ma <yu.ma@intel.com>
>>> ---
>>>   fs/file.c | 5 +++++
>>>   1 file changed, 5 insertions(+)
>>>
>>> diff --git a/fs/file.c b/fs/file.c
>>> index a15317db3119..f25eca311f51 100644
>>> --- a/fs/file.c
>>> +++ b/fs/file.c
>>> @@ -488,6 +488,11 @@ struct files_struct init_files = {
>>>
>>>   static unsigned int find_next_fd(struct fdtable *fdt, unsigned int start)
>>>   {
>>> +       unsigned int bit;
>>> +       bit = find_next_zero_bit(fdt->open_fds, BITS_PER_LONG, start);
>>> +       if (bit < BITS_PER_LONG)
>>> +               return bit;
>>> +
>>>          unsigned int maxfd = fdt->max_fds; /* always multiple of BITS_PER_LONG */
>>>          unsigned int maxbit = maxfd / BITS_PER_LONG;
>>>          unsigned int bitbit = start / BITS_PER_LONG;
>>> --
>>> 2.43.0
>>>
>> I had something like this in mind:
>> diff --git a/fs/file.c b/fs/file.c
>> index a3b72aa64f11..4d3307e39db7 100644
>> --- a/fs/file.c
>> +++ b/fs/file.c
>> @@ -489,6 +489,16 @@ static unsigned int find_next_fd(struct fdtable
>> *fdt, unsigned int start)
>>          unsigned int maxfd = fdt->max_fds; /* always multiple of
>> BITS_PER_LONG */
>>          unsigned int maxbit = maxfd / BITS_PER_LONG;
>>          unsigned int bitbit = start / BITS_PER_LONG;
>> +       unsigned int bit;
>> +
>> +       /*
>> +        * Try to avoid looking at the second level map.
>> +        */
>> +       bit = find_next_zero_bit(&fdt->open_fds[bitbit], BITS_PER_LONG,
>> +                               start & (BITS_PER_LONG - 1));
>> +       if (bit < BITS_PER_LONG) {
>> +               return bit + bitbit * BITS_PER_LONG;
>> +       }
> Drat, you're right. I missed that Ma did not add the proper offset to
> open_fds. *This* is what I meant :)
>
> 								Honza

Just tried this on v6.10-rc6: the improvement on top of patch 1 and 
patch 2 is 7% for read and 3% for write, less than just checking the first 
word.

Per my understanding, this variant performs better when a free bit can be 
found in the same word as next_fd with high probability, but next_fd only 
represents the lowest possible free bit. If fds are opened and closed 
frequently and randomly, that may not always be the case, and next_fd may 
land anywhere. For example: fds 0-65 are occupied and fd 3 is returned, so 
next_fd is set to 3; the next time 3 is allocated, next_fd is set to 4, 
while the actual first free bit is 66. When 66 is then allocated and fd 5 
is returned, the whole process repeats.

Yu
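The scenario described above (next_fd left pointing into a word that has no free bits) can be reproduced with a toy model. The toy_* names, the fixed table size, and the plain linear scan are illustrative, not the kernel's implementation; only the next_fd update rules mirror alloc_fd() and put_unused_fd().

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))
#define TOY_WORDS 4u

struct toy_files {
	unsigned long open_fds[TOY_WORDS];
	unsigned int next_fd; /* lowest possibly-free fd, as in files_struct */
};

/* Linear scan standing in for the two-level search. */
static unsigned int toy_first_zero(const unsigned long *map, unsigned int start)
{
	unsigned int b;

	for (b = start; b < TOY_WORDS * BITS_PER_LONG; b++)
		if (!(map[b / BITS_PER_LONG] & (1UL << (b % BITS_PER_LONG))))
			return b;
	return TOY_WORDS * BITS_PER_LONG;
}

static unsigned int toy_alloc(struct toy_files *f)
{
	unsigned int fd = toy_first_zero(f->open_fds, f->next_fd);

	f->open_fds[fd / BITS_PER_LONG] |= 1UL << (fd % BITS_PER_LONG);
	f->next_fd = fd + 1; /* as in alloc_fd(): only a hint */
	return fd;
}

static void toy_close(struct toy_files *f, unsigned int fd)
{
	f->open_fds[fd / BITS_PER_LONG] &= ~(1UL << (fd % BITS_PER_LONG));
	if (fd < f->next_fd)
		f->next_fd = fd;
}
```

Opening 66 fds, closing fd 3, and reallocating leaves next_fd = 4 inside word 0, which by then is completely full, so a word-local scan around next_fd finds nothing and the allocator must fall through to 66.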
Ma, Yu July 9, 2024, 8:32 a.m. UTC | #9
On 7/5/2024 3:56 PM, Ma, Yu wrote:
> I had something like this in mind:
>>> diff --git a/fs/file.c b/fs/file.c
>>> index a3b72aa64f11..4d3307e39db7 100644
>>> --- a/fs/file.c
>>> +++ b/fs/file.c
>>> @@ -489,6 +489,16 @@ static unsigned int find_next_fd(struct fdtable
>>> *fdt, unsigned int start)
>>>          unsigned int maxfd = fdt->max_fds; /* always multiple of
>>> BITS_PER_LONG */
>>>          unsigned int maxbit = maxfd / BITS_PER_LONG;
>>>          unsigned int bitbit = start / BITS_PER_LONG;
>>> +       unsigned int bit;
>>> +
>>> +       /*
>>> +        * Try to avoid looking at the second level map.
>>> +        */
>>> +       bit = find_next_zero_bit(&fdt->open_fds[bitbit], BITS_PER_LONG,
>>> +                               start & (BITS_PER_LONG - 1));
>>> +       if (bit < BITS_PER_LONG) {
>>> +               return bit + bitbit * BITS_PER_LONG;
>>> +       }
>> Drat, you're right. I missed that Ma did not add the proper offset to
>> open_fds. *This* is what I meant :)
>>
>>                                 Honza
>
> Just tried this on v6.10-rc6, the improvement on top of patch 1 and 
> patch 2 is 7% for read and 3% for write, less than just check first word.
>
> Per my understanding, its performance would be better if we can find 
> free bit in the same word of next_fd with high possibility, but 
> next_fd just represents the lowest possible free bit. If fds are 
> open/close frequently and randomly, that might not always be the case, 
> next_fd may be distributed randomly, for example, 0-65 are occupied, 
> fd=3 is returned, next_fd will be set to 3, next time when 3 is 
> allocated, next_fd will be set to 4, while the actual first free bit 
> is 66 , when 66 is allocated, and fd=5 is returned, then the above 
> process would be went through again.
>
> Yu
>
Hi Guzik, Honza,

Do we have any more comments or ideas regarding the fast path? Thanks 
for your time and any feedback :)


Regards

Yu
Mateusz Guzik July 9, 2024, 10:17 a.m. UTC | #10
Right, forgot to respond.

I suspect the different result is due either to mere variance
between reboots or to blogbench using significantly fewer than 100 fds at
any given time -- I don't have an easy way to test at your scale at
the moment. You could probably test that by benching both approaches
while switching them at runtime with a static_branch. However, I don't
know if that effort is warranted atm.

So happens I'm busy with other stuff and it is not my call to either
block or let this in, so I'm buggering off.

On Tue, Jul 9, 2024 at 10:32 AM Ma, Yu <yu.ma@intel.com> wrote:
>
>
> On 7/5/2024 3:56 PM, Ma, Yu wrote:
> > I had something like this in mind:
> >>> diff --git a/fs/file.c b/fs/file.c
> >>> index a3b72aa64f11..4d3307e39db7 100644
> >>> --- a/fs/file.c
> >>> +++ b/fs/file.c
> >>> @@ -489,6 +489,16 @@ static unsigned int find_next_fd(struct fdtable
> >>> *fdt, unsigned int start)
> >>>          unsigned int maxfd = fdt->max_fds; /* always multiple of
> >>> BITS_PER_LONG */
> >>>          unsigned int maxbit = maxfd / BITS_PER_LONG;
> >>>          unsigned int bitbit = start / BITS_PER_LONG;
> >>> +       unsigned int bit;
> >>> +
> >>> +       /*
> >>> +        * Try to avoid looking at the second level map.
> >>> +        */
> >>> +       bit = find_next_zero_bit(&fdt->open_fds[bitbit], BITS_PER_LONG,
> >>> +                               start & (BITS_PER_LONG - 1));
> >>> +       if (bit < BITS_PER_LONG) {
> >>> +               return bit + bitbit * BITS_PER_LONG;
> >>> +       }
> >> Drat, you're right. I missed that Ma did not add the proper offset to
> >> open_fds. *This* is what I meant :)
> >>
> >>                                 Honza
> >
> > Just tried this on v6.10-rc6, the improvement on top of patch 1 and
> > patch 2 is 7% for read and 3% for write, less than just check first word.
> >
> > Per my understanding, its performance would be better if we can find
> > free bit in the same word of next_fd with high possibility, but
> > next_fd just represents the lowest possible free bit. If fds are
> > open/close frequently and randomly, that might not always be the case,
> > next_fd may be distributed randomly, for example, 0-65 are occupied,
> > fd=3 is returned, next_fd will be set to 3, next time when 3 is
> > allocated, next_fd will be set to 4, while the actual first free bit
> > is 66 , when 66 is allocated, and fd=5 is returned, then the above
> > process would be went through again.
> >
> > Yu
> >
> Hi Guzik, Honza,
>
> Do we have any more comment or idea regarding to the fast path? Thanks
> for your time and any feedback :)
>
>
> Regards
>
> Yu
>
Tim Chen July 10, 2024, 11:40 p.m. UTC | #11
On Tue, 2024-07-09 at 12:17 +0200, Mateusz Guzik wrote:
> Right, forgot to respond.
> 
> I suspect the different result is either because of mere variance
> between reboots or blogbench using significantly less than 100 fds at
> any given time -- I don't have an easy way to test at your scale at
> the moment. You could probably test that by benching both approaches
> while switching them at runtime with a static_branch. However, I don't
> know if that effort is warranted atm.
> 
> So happens I'm busy with other stuff and it is not my call to either
> block or let this in, so I'm buggering off.
> 
> On Tue, Jul 9, 2024 at 10:32 AM Ma, Yu <yu.ma@intel.com> wrote:
> > 
> > 
> > On 7/5/2024 3:56 PM, Ma, Yu wrote:
> > > I had something like this in mind:
> > > > > diff --git a/fs/file.c b/fs/file.c
> > > > > index a3b72aa64f11..4d3307e39db7 100644
> > > > > --- a/fs/file.c
> > > > > +++ b/fs/file.c
> > > > > @@ -489,6 +489,16 @@ static unsigned int find_next_fd(struct fdtable
> > > > > *fdt, unsigned int start)
> > > > >          unsigned int maxfd = fdt->max_fds; /* always multiple of
> > > > > BITS_PER_LONG */
> > > > >          unsigned int maxbit = maxfd / BITS_PER_LONG;
> > > > >          unsigned int bitbit = start / BITS_PER_LONG;
> > > > > +       unsigned int bit;
> > > > > +
> > > > > +       /*
> > > > > +        * Try to avoid looking at the second level map.
> > > > > +        */
> > > > > +       bit = find_next_zero_bit(&fdt->open_fds[bitbit], BITS_PER_LONG,
> > > > > +                               start & (BITS_PER_LONG - 1));
> > > > > +       if (bit < BITS_PER_LONG) {
> > > > > +               return bit + bitbit * BITS_PER_LONG;
> > > > > +       }

I think this approach based on a quick check around next_fd is more generic
and scalable.

It just happens that for blogbench, checking only the first 64 bits allows a
quicker skip to the two-level search, whereas with this approach next_fd may
be left in a 64-bit word that actually has no open bits, so we do a useless
search in find_next_zero_bit(). Perhaps we should check full_fds_bits to make
sure there are empty slots before we take the find_next_zero_bit() fast path.
Something like

	if (!test_bit(bitbit, fdt->full_fds_bits)) {
		bit = find_next_zero_bit(&fdt->open_fds[bitbit], BITS_PER_LONG,
					start & (BITS_PER_LONG - 1));
		if (bit < BITS_PER_LONG)
			return bit + bitbit * BITS_PER_LONG;
	}
Tim

> > > > Drat, you're right. I missed that Ma did not add the proper offset to
> > > > open_fds. *This* is what I meant :)
> > > > 
> > > >                                 Honza
> > > 
> > > Just tried this on v6.10-rc6, the improvement on top of patch 1 and
> > > patch 2 is 7% for read and 3% for write, less than just check first word.
> > > 
> > > Per my understanding, its performance would be better if we can find
> > > free bit in the same word of next_fd with high possibility, but
> > > next_fd just represents the lowest possible free bit. If fds are
> > > open/close frequently and randomly, that might not always be the case,
> > > next_fd may be distributed randomly, for example, 0-65 are occupied,
> > > fd=3 is returned, next_fd will be set to 3, next time when 3 is
> > > allocated, next_fd will be set to 4, while the actual first free bit
> > > is 66 , when 66 is allocated, and fd=5 is returned, then the above
> > > process would be went through again.
> > > 
> > > Yu
> > > 
> > Hi Guzik, Honza,
> > 
> > Do we have any more comment or idea regarding to the fast path? Thanks
> > for your time and any feedback :)
> > 
> > 
> > Regards
> > 
> > Yu
> > 
> 
>
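Tim's guarded fast path above can be sketched as a runnable toy. The toy_* names, fixed sizes, and the simplified slow path are illustrative assumptions; only the full_fds_bits semantics (one second-level bit per open_fds word, set when that word is completely full) follow the kernel's fdtable layout.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/* One second-level bit per first-level word: set when that word is full. */
static int toy_word_full(const unsigned long *full_fds_bits, unsigned int w)
{
	return (full_fds_bits[w / BITS_PER_LONG] >> (w % BITS_PER_LONG)) & 1;
}

/* Guarded fast path: consult full_fds_bits before scanning the word that
 * holds 'start', so the fast path never wastes time on a full word.
 * The slow path below is a simplified two-level walk over nbits bits. */
static unsigned int toy_find_fd(const unsigned long *open_fds,
				const unsigned long *full_fds_bits,
				unsigned int start, unsigned int nbits)
{
	unsigned int bitbit = start / BITS_PER_LONG;
	unsigned int w, b;

	if (!toy_word_full(full_fds_bits, bitbit))
		for (b = start % BITS_PER_LONG; b < BITS_PER_LONG; b++)
			if (!(open_fds[bitbit] & (1UL << b)))
				return bitbit * BITS_PER_LONG + b;

	for (w = bitbit; w * BITS_PER_LONG < nbits; w++) {
		if (toy_word_full(full_fds_bits, w))
			continue; /* skip whole full words */
		for (b = (w == bitbit) ? start % BITS_PER_LONG : 0;
		     b < BITS_PER_LONG; b++)
			if (!(open_fds[w] & (1UL << b)))
				return w * BITS_PER_LONG + b;
	}
	return nbits;
}
```

When word 0 is marked full in the second-level map, the fast path is skipped entirely and the slow path jumps straight past the full word, which is the useless-search case the guard is meant to avoid.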
Ma, Yu July 11, 2024, 9:27 a.m. UTC | #12
On 7/11/2024 7:40 AM, Tim Chen wrote:
> On Tue, 2024-07-09 at 12:17 +0200, Mateusz Guzik wrote:
>> Right, forgot to respond.
>>
>> I suspect the different result is either because of mere variance
>> between reboots or blogbench using significantly less than 100 fds at
>> any given time -- I don't have an easy way to test at your scale at
>> the moment. You could probably test that by benching both approaches
>> while switching them at runtime with a static_branch. However, I don't
>> know if that effort is warranted atm.
>>
>> So happens I'm busy with other stuff and it is not my call to either
>> block or let this in, so I'm buggering off.
>>
>> On Tue, Jul 9, 2024 at 10:32 AM Ma, Yu <yu.ma@intel.com> wrote:
>>>
>>> On 7/5/2024 3:56 PM, Ma, Yu wrote:
>>>> I had something like this in mind:
>>>>>> diff --git a/fs/file.c b/fs/file.c
>>>>>> index a3b72aa64f11..4d3307e39db7 100644
>>>>>> --- a/fs/file.c
>>>>>> +++ b/fs/file.c
>>>>>> @@ -489,6 +489,16 @@ static unsigned int find_next_fd(struct fdtable
>>>>>> *fdt, unsigned int start)
>>>>>>           unsigned int maxfd = fdt->max_fds; /* always multiple of
>>>>>> BITS_PER_LONG */
>>>>>>           unsigned int maxbit = maxfd / BITS_PER_LONG;
>>>>>>           unsigned int bitbit = start / BITS_PER_LONG;
>>>>>> +       unsigned int bit;
>>>>>> +
>>>>>> +       /*
>>>>>> +        * Try to avoid looking at the second level map.
>>>>>> +        */
>>>>>> +       bit = find_next_zero_bit(&fdt->open_fds[bitbit], BITS_PER_LONG,
>>>>>> +                               start & (BITS_PER_LONG - 1));
>>>>>> +       if (bit < BITS_PER_LONG) {
>>>>>> +               return bit + bitbit * BITS_PER_LONG;
>>>>>> +       }
> I think this approach based on next_fd quick check is more generic and scalable.
>
> It just happen for blogbench, just checking the first 64 bit allow a quicker
> skip to the two level search where this approach, next_fd may be left
> in a 64 word that actually has no open bits and we are doing useless search
> in find_next_zero_bit(). Perhaps we should check full_fds_bits to make sure
> there are empty slots before we do
> find_next_zero_bit() fast path.  Something like
>
> 	if (!test_bit(bitbit, fdt->full_fds_bits)) {
> 		bit = find_next_zero_bit(&fdt->open_fds[bitbit], BITS_PER_LONG,
> 					start & (BITS_PER_LONG - 1));
> 		if (bit < BITS_PER_LONG)
> 			return bit + bitbit * BITS_PER_LONG;
> 	}
> Tim

Yes, I agree that it scales better. I'll update v4 with a fast path for the 
word containing next_fd and send it out for review soon.

>>>>> Drat, you're right. I missed that Ma did not add the proper offset to
>>>>> open_fds. *This* is what I meant :)
>>>>>
>>>>>                                  Honza
>>>> Just tried this on v6.10-rc6, the improvement on top of patch 1 and
>>>> patch 2 is 7% for read and 3% for write, less than just check first word.
>>>>
>>>> Per my understanding, its performance would be better if we can find
>>>> free bit in the same word of next_fd with high possibility, but
>>>> next_fd just represents the lowest possible free bit. If fds are
>>>> open/close frequently and randomly, that might not always be the case,
>>>> next_fd may be distributed randomly, for example, 0-65 are occupied,
>>>> fd=3 is returned, next_fd will be set to 3, next time when 3 is
>>>> allocated, next_fd will be set to 4, while the actual first free bit
>>>> is 66 , when 66 is allocated, and fd=5 is returned, then the above
>>>> process would be went through again.
>>>>
>>>> Yu
>>>>
>>> Hi Guzik, Honza,
>>>
>>> Do we have any more comment or idea regarding to the fast path? Thanks
>>> for your time and any feedback :)
>>>
>>>
>>> Regards
>>>
>>> Yu
>>>
>>

Patch

diff --git a/fs/file.c b/fs/file.c
index a15317db3119..f25eca311f51 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -488,6 +488,11 @@  struct files_struct init_files = {
 
 static unsigned int find_next_fd(struct fdtable *fdt, unsigned int start)
 {
+	unsigned int bit;
+	bit = find_next_zero_bit(fdt->open_fds, BITS_PER_LONG, start);
+	if (bit < BITS_PER_LONG)
+		return bit;
+
 	unsigned int maxfd = fdt->max_fds; /* always multiple of BITS_PER_LONG */
 	unsigned int maxbit = maxfd / BITS_PER_LONG;
 	unsigned int bitbit = start / BITS_PER_LONG;