
[RFC,0/4] Encode NFSv4 attributes via a branch table

Message ID 168808788945.7728.6965361432016501208.stgit@manet.1015granger.net

Message

Chuck Lever June 30, 2023, 1:34 a.m. UTC
Here's something just for fun. I've converted nfsd4_encode_fattr4()
to use a bitmask loop, calling an encode helper for each attribute
to be encoded. Rotten tomatoes and gold stars are both acceptable.

I was hoping for a bit of a performance gain because encode_fattr4
is called so very often, but there was not much difference at all.

The main benefit here is the hard scope boundary for each of the
separate attribute encoders -- that makes for safer code that is
easier to reason about, and might even be more straightforward to
convert to machine-generated code, if we ever want to do that.

And notably it will automatically encode the attributes in bitmask
order.

There are a few readability improvements that could be done, like
defining meaningfully-named macros for the bit positions. The ones
we have now are not directly usable for table indices. It might get
us another step closer to the XDR specification if we could find a
way to encode the whole bitmask in a single loop.
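
To give a rough idea of the shape, here is a simplified, stand-alone
sketch of the pattern only -- the names, bit values, and argument
struct are invented for illustration, not the helpers these patches
actually add:

/*
 * Illustrative only: dispatch through a const table of encoder
 * function pointers, driven by a bitmask loop.
 */
#include <stdint.h>
#include <stdio.h>

struct fattr_args {             /* stand-in for struct nfsd4_fattr_args */
        uint32_t type;
        uint64_t size;
};

typedef int (*attr_encoder)(const struct fattr_args *args);

static int encode_type(const struct fattr_args *args)
{
        printf("type=%u\n", args->type);
        return 0;
}

static int encode_size(const struct fattr_args *args)
{
        printf("size=%llu\n", (unsigned long long)args->size);
        return 0;
}

/* Bit positions named so they can double as table indices. */
enum { BIT_TYPE = 1, BIT_SIZE = 4 };

static const attr_encoder word0_encoders[32] = {
        [BIT_TYPE] = encode_type,
        [BIT_SIZE] = encode_size,
};

static int encode_word(uint32_t bmval, const attr_encoder *tbl,
                       const struct fattr_args *args)
{
        unsigned int bit;
        int status;

        /* Attributes come out in bit order, lowest bit first. */
        for (bit = 0; bmval != 0; bit++, bmval >>= 1) {
                if (!(bmval & 1))
                        continue;
                /* A real table would reject unsupported bits. */
                status = tbl[bit](args);
                if (status)
                        return status;
        }
        return 0;
}

int main(void)
{
        struct fattr_args args = { .type = 1, .size = 4096 };

        return encode_word((1U << BIT_TYPE) | (1U << BIT_SIZE),
                           word0_encoders, &args);
}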

---

Chuck Lever (4):
      NFSD: Add struct nfsd4_fattr_args
      NFSD: Encode attributes in WORD0 using a bitmask loop
      NFSD: Encode attributes in WORD1 using a bitmask loop
      NFSD: Encode attributes in WORD2 using a bitmask loop


 fs/nfsd/nfs4xdr.c | 1089 +++++++++++++++++++++++++++------------------
 1 file changed, 657 insertions(+), 432 deletions(-)

--
Chuck Lever

Comments

NeilBrown July 3, 2023, 4:48 a.m. UTC | #1
On Fri, 30 Jun 2023, Chuck Lever wrote:
> Here's something just for fun. I've converted nfsd4_encode_fattr4()
> to use a bitmask loop, calling an encode helper for each attribute
> to be encoded. Rotten tomatoes and gold stars are both acceptable.

Tomatoes or stars .... it is a hard choice :-)

I wonder what the compiler does with this code.
If it unrolls the loop and inlines the functions - which it probably can
do as the array of pointers is declared const - you end up with much the
same result as the current code.

And I wonder where the compiler puts the code in each conditional now.
If it assumes an if() is unlikely, then it would all be out-of-line
which sounds like part of your goal (or maybe just a nice-to-have).

If the compiler does, or can be convinced to, do the unroll and inline
and unlikely optimisations, then I think I'd give this a gold star.

I guess in practice some of the attributes are "likely" and many are
"unlikely".  With the current code we could easily annotate that if we
wanted to and thought (or measured) there was any value.  With the
looping code we cannot really annotate the likelihood of each.

The code-generation idea is intriguing.  Even if we didn't reach that
goal, having the code highly structured as though it were auto-generated
would be no bad thing.

Thanks,
NeilBrown
Chuck Lever July 3, 2023, 1:56 p.m. UTC | #2
> On Jul 3, 2023, at 12:48 AM, NeilBrown <neilb@suse.de> wrote:
> 
> On Fri, 30 Jun 2023, Chuck Lever wrote:
>> Here's something just for fun. I've converted nfsd4_encode_fattr4()
>> to use a bitmask loop, calling an encode helper for each attribute
>> to be encoded. Rotten tomatoes and gold stars are both acceptable.
> 
> Tomatoes or stars .... it is a hard choice :-)
> 
> I wonder what the compiler does with this code.
> If it unrolls the loop and inlines the functions - which it probably can
> do as the array of pointers is declared const - you end up with much the
> same result as the current code.
> 
> And I wonder where the compiler puts the code in each conditional now.
> If it assumes an if() is unlikely, then it would all be out-of-line
> which sounds like part of your goal (or maybe just a nice-to-have).
> 
> If the compiler does, or can be convinced to, do the unroll and inline
> and unlikely optimisations, then I think I'd give this a gold star.
> 
> I guess in practice some of the attributes are "likely" and many are
> "unlikely".

This is absolutely the case.

My first attempt at optimizing nfsd4_encode_fattr() was to build
a miniature version that handled just the frequently-requested
combinations of attributes. It made very little difference.

The conclusions that I drew from that are:

- The number of conditional branches in here doesn't seem to be
  the costly part of encode_fattr().

- The frequently-requested attributes are expensive to process
  for some reason. Size is easy, but getting the user and
  group are not as quick as I would have hoped.

- It's not the efficiency of encode_fattr() that is the issue,
  it's the frequency of its use. That's something the server
  can't do much about.


> With the current code we could easily annotate that if we
> wanted to and thought (or measured) there was any value.  With the
> looping code we cannot really annotate the likelihood of each.

Nope, likelihood annotation isn't really possible with a bitmask
loop. But my understanding is that unlikely() means really
really really unlikely, as in "this code is an error case that
is almost never used". And that's not actually the case for most
of these attributes.


> The code-generation idea is intriguing.  Even if we didn't reach that
> goal, having the code highly structured as though it were auto-generated
> would be no bad thing.

Maybe it just calms my yearning for an ordered universe to deal
with these attributes in the same way that we deal with COMPOUND
operations.

I appreciate the review!


--
Chuck Lever
NeilBrown July 3, 2023, 9:33 p.m. UTC | #3
On Mon, 03 Jul 2023, Chuck Lever III wrote:
> 
> > On Jul 3, 2023, at 12:48 AM, NeilBrown <neilb@suse.de> wrote:
> > 
> > I guess in practice some of the attributes are "likely" and many are
> > "unlikely".
> 
> This is absolutely the case.
> 
> My first attempt at optimizing nfsd4_encode_fattr() was to build
> a miniature version that handled just the frequently-requested
> combinations of attributes. It made very little difference.

My understanding of the effect of these code-movement optimisations is
that they affect whole-system performance rather than micro-benchmarks. 
So they are quite hard to measure.
Modern CPUs are quite good at predicting code flow, so the benefit of
code movement is not a reduction in pipeline stalls, but a reduction in
cache usage.  The latter affects the whole system more-or-less equally.


> 
> The conclusions that I drew from that are:
> 
> - The number of conditional branches in here doesn't seem to be
>   the costly part of encode_fattr().
> 
> - The frequently-requested attributes are expensive to process
>   for some reason. Size is easy, but getting the user and
>   group are not as quick as I would have hoped.
> 
> - It's not the efficiency of encode_fattr() that is the issue,
>   it's the frequency of its use. That's something the server
>   can't do much about.

There probably needs to be a protocol revision to improve this.  I imagine a
GETATTR request including a CTIME value with the implication that if the
CTIME hasn't changed, then there is no need to return any attributes.

> 
> 
> > With the current code we could easily annotate that if we
> > wanted to and thought (or measured) there was any value.  With the
> > looping code we cannot really annotate the likelihood of each.
> 
> Nope, likelihood annotation isn't really possible with a bitmask
> loop. But my understanding is that unlikely() means really
> really really unlikely, as in "this code is an error case that
> is almost never used". And that's not actually the case for most
> of these attributes.

My understanding of unlikely() (which is largely compatible with yours)
is that it tells the compiler to pessimise code dependent on the
condition being true.  Or more accurately: when there is an optimisation
trade-off between code for the 'true' case and any other code, prefer
the other code.
So it is definitely good to say errors are unlikely, because there is no
need to optimise for them.  You might also say a fast-path condition is
likely() because that path is more worthy of optimisation.
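
Something like this (a generic illustration, not a suggested change to
nfs4xdr.c -- in the kernel these macros already come from
<linux/compiler.h>):

#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

int process(int error, int hot)
{
        if (unlikely(error))    /* error path: fine to move out of line */
                return -1;
        if (likely(hot))        /* hot path: keep in the straight-line code */
                return 1;
        return 0;
}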

> 
> 
> > The code-generation idea is intriguing.  Even if we didn't reach that
> > goal, having the code highly structured as though it were auto-generated
> > would be no bad thing.
> 
> Maybe it just calms my yearning for an ordered universe to deal
> with these attributes in the same way that we deal with COMPOUND
> operations.

There are worse reasons for refactoring code :-)

Thanks,
NeilBrown

> 
> I appreciate the review!
> 
> 
> --
> Chuck Lever
> 
> 
>
Chuck Lever July 3, 2023, 11:05 p.m. UTC | #4
> On Jul 3, 2023, at 5:33 PM, NeilBrown <neilb@suse.de> wrote:
> 
> On Mon, 03 Jul 2023, Chuck Lever III wrote:
>> 
>> - It's not the efficiency of encode_fattr() that is the issue,
>>  it's the frequency of its use. That's something the server
>>  can't do much about.
> 
> There probably needs a protocol revision to improve this.  I imagine a
> GETATTR request including a CTIME value with the implication that if the
> CTIME hasn't changed, then there is no need to return any attributes.

You can precede the GETATTR with an NVERIFY operation.

https://datatracker.ietf.org/doc/html/rfc8881#name-operation-17-nverify-verify
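
For example (sketch only -- the exact attribute to verify is up to the
client, the change attribute being the obvious candidate):

  PUTFH(fh)
  NVERIFY(change == <client's cached change value>)
  GETATTR(<full attribute set>)

If the change attribute still matches, NVERIFY fails with NFS4ERR_SAME,
COMPOUND processing stops there, and the server never has to encode the
attributes.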


--
Chuck Lever