Message ID | 20240823181423.20458-20-snitzer@kernel.org (mailing list archive) |
---|---|
State | New |
Series | nfs/nfsd: add support for localio |
On Sat, 24 Aug 2024, Mike Snitzer wrote:
> +
> +6. Why is having the client perform a server-side file OPEN, without
> +   using RPC, beneficial? Is the benefit pNFS specific?
> +
> +   Avoiding the use of XDR and RPC for file opens is beneficial to
> +   performance regardless of whether pNFS is used. However adding a
> +   requirement to go over the wire to do an open and/or close ends up
> +   negating any benefit of avoiding the wire for doing the I/O itself
> +   when we’re dealing with small files. There is no benefit to replacing
> +   the READ or WRITE with a new open and/or close operation that still
> +   needs to go over the wire.

I don't think the above is correct.

The current code still does a normal NFSv4 OPEN or NFSv3 GETATTR when
the client opens a file. Only the READ/WRITE/COMMIT operations are
avoided.

While I'm not advocating for an over-the-wire request to map a
filehandle to a struct nfsd_file*, I don't think you can convincingly
argue against it without concrete performance measurements.

NeilBrown
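To illustrate the point NeilBrown is making, here is a minimal self-contained
sketch — not the actual fs/nfs code, and every name in it is hypothetical — of
where the LOCALIO bypass sits: the open-time OPEN/GETATTR/ACCESS traffic is
unchanged, and only the READ/WRITE/COMMIT path is short-circuited when the
server is local.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>

struct nfs_ctx {
	bool localio;	/* set once client/server co-location is detected */
};

/* Hypothetical stand-ins for the real RPC and local I/O paths. */
static ssize_t rpc_read(struct nfs_ctx *c, void *buf, size_t len, off_t off)
{
	(void)c; (void)buf; (void)off;
	printf("READ goes over the wire (XDR + RPC)\n");
	return (ssize_t)len;
}

static ssize_t local_read(struct nfs_ctx *c, void *buf, size_t len, off_t off)
{
	(void)c; (void)buf; (void)off;
	printf("READ handled by a direct call into the local nfsd, no RPC\n");
	return (ssize_t)len;
}

/*
 * The NFSv4 OPEN (or NFSv3 GETATTR/ACCESS) has already gone over the
 * wire before we get here; LOCALIO only changes how the I/O is issued.
 */
static ssize_t nfs_do_read(struct nfs_ctx *c, void *buf, size_t len, off_t off)
{
	if (c->localio)
		return local_read(c, buf, len, off);
	return rpc_read(c, buf, len, off);
}

int main(void)
{
	struct nfs_ctx c = { .localio = true };
	char buf[4096];

	nfs_do_read(&c, buf, sizeof(buf), 0);
	return 0;
}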
> On Aug 25, 2024, at 9:56 PM, NeilBrown <neilb@suse.de> wrote:
>
> On Sat, 24 Aug 2024, Mike Snitzer wrote:
>> +
>> +6. Why is having the client perform a server-side file OPEN, without
>> +   using RPC, beneficial? Is the benefit pNFS specific?
>> +
>> +   Avoiding the use of XDR and RPC for file opens is beneficial to
>> +   performance regardless of whether pNFS is used. However adding a
>> +   requirement to go over the wire to do an open and/or close ends up
>> +   negating any benefit of avoiding the wire for doing the I/O itself
>> +   when we’re dealing with small files. There is no benefit to replacing
>> +   the READ or WRITE with a new open and/or close operation that still
>> +   needs to go over the wire.
>
> I don't think the above is correct.

I struggled with this text too.

I thought the reason we want a server-side file OPEN is so that
proper access authorization, same as would be done on a remote
access, can be done.

> The current code still does a normal NFSv4 OPEN or NFSv3 GETATTR when
> the client opens a file. Only the READ/WRITE/COMMIT operations are
> avoided.
>
> While I'm not advocating for an over-the-wire request to map a
> filehandle to a struct nfsd_file*, I don't think you can convincingly
> argue against it without concrete performance measurements.
>
> NeilBrown

--
Chuck Lever
On Mon, 2024-08-26 at 14:16 +0000, Chuck Lever III wrote:
>
> > On Aug 25, 2024, at 9:56 PM, NeilBrown <neilb@suse.de> wrote:
> >
> > On Sat, 24 Aug 2024, Mike Snitzer wrote:
> > > +
> > > +6. Why is having the client perform a server-side file OPEN, without
> > > +   using RPC, beneficial? Is the benefit pNFS specific?
> > > +
> > > +   Avoiding the use of XDR and RPC for file opens is beneficial to
> > > +   performance regardless of whether pNFS is used. However adding a
> > > +   requirement to go over the wire to do an open and/or close ends up
> > > +   negating any benefit of avoiding the wire for doing the I/O itself
> > > +   when we’re dealing with small files. There is no benefit to
> > > +   replacing the READ or WRITE with a new open and/or close operation
> > > +   that still needs to go over the wire.
> >
> > I don't think the above is correct.
>
> I struggled with this text too.
>
> I thought the reason we want a server-side file OPEN is so that
> proper access authorization, same as would be done on a remote
> access, can be done.
>

You're conflating "server-side file open" with "on the wire open". The
code does do a server side file open, and does call up to rpc.mountd
to authenticate the client's IP address and domain.

The text is basically pointing out that if you have to add stateful
on-the-wire operations for small files (e.g. size < 1MB), then you
might as well just send the READ or WRITE instead.

> >
> > The current code still does a normal NFSv4 OPEN or NFSv3 GETATTR when
> > the client opens a file. Only the READ/WRITE/COMMIT operations are
> > avoided.
> >
> > While I'm not advocating for an over-the-wire request to map a
> > filehandle to a struct nfsd_file*, I don't think you can convincingly
> > argue against it without concrete performance measurements.

What is the value of doing an open over the wire? What are you trying
to accomplish that can't be accomplished without going over the wire?
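For readers trying to keep the two notions apart, the following is a rough
self-contained sketch (function and type names such as nfsd_localio_open and
nfs_localio_open_fh are hypothetical, not the actual patch) of what a
"server-side file open" without an on-the-wire OPEN looks like: nfsd code
performs the open and still consults the export/authorization state that the
rpc.mountd upcall populated, but the client reaches it through a direct
in-kernel call rather than an RPC round trip.

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for kernel objects. */
struct nfs_fh { unsigned char data[16]; };      /* opaque filehandle */
struct nfsd_file { struct nfs_fh fh; bool for_write; };

static struct nfsd_file file_cache_slot;        /* toy stand-in for nfsd's filecache */

/* "Server side": check the export/auth state (which the rpc.mountd upcall
 * feeds, keyed on the client's IP address and domain), then open locally. */
static struct nfsd_file *nfsd_localio_open(const struct nfs_fh *fh, bool may_write)
{
	bool export_permits_client = true;      /* stand-in for the mountd-backed check */

	if (!export_permits_client)
		return NULL;
	file_cache_slot.fh = *fh;
	file_cache_slot.for_write = may_write;
	return &file_cache_slot;
}

/* Client side: because client and server share a kernel, this is a plain
 * function call -- no XDR, no RPC, no round trip. The on-the-wire
 * OPEN/ACCESS/CREATE that authorised access has already happened. */
static struct nfsd_file *nfs_localio_open_fh(const struct nfs_fh *fh, bool may_write)
{
	return nfsd_localio_open(fh, may_write);
}

int main(void)
{
	struct nfs_fh fh;

	memset(fh.data, 0xab, sizeof(fh.data));
	printf("open %s\n", nfs_localio_open_fh(&fh, true) ? "succeeded" : "denied");
	return 0;
}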
On Tue, 27 Aug 2024, Trond Myklebust wrote:
> > > On Aug 25, 2024, at 9:56 PM, NeilBrown <neilb@suse.de> wrote:
> > >
> > > While I'm not advocating for an over-the-wire request to map a
> > > filehandle to a struct nfsd_file*, I don't think you can convincingly
> > > argue against it without concrete performance measurements.
> >
> What is the value of doing an open over the wire? What are you trying
> to accomplish that can't be accomplished without going over the wire?

The advantage of going over the wire is avoiding code duplication.
The cost is latency. Obviously the goal of LOCALIO is to find those
points where the latency saving justifies the code duplication.

When opening with AUTH_UNIX the code duplication to determine the
correct credential is small and easy to review. If we ever wanted to
support KRB5 or TLS I would be a lot less comfortable about reviewing
the code duplication.

So I think it is worth considering whether an over-the-wire open is
really all that costly. As I noted we already have an over-the-wire
request at open time. We could conceivably send the LOCALIO-OPEN
request at the same time so as not to add latency. We could receive
the reply through the in-kernel backchannel so there is no RPC reply.

That might all be too complex and might not be justified. My point is
that I think the trade-offs are subtle and I think the FAQ answer cuts
off an avenue that hasn't really been explored.

Thanks,
NeilBrown
On Wed, 2024-08-28 at 07:49 +1000, NeilBrown wrote:
> On Tue, 27 Aug 2024, Trond Myklebust wrote:
> > > > On Aug 25, 2024, at 9:56 PM, NeilBrown <neilb@suse.de> wrote:
> > > >
> > > > While I'm not advocating for an over-the-wire request to map a
> > > > filehandle to a struct nfsd_file*, I don't think you can
> > > > convincingly argue against it without concrete performance
> > > > measurements.
> > >
> > What is the value of doing an open over the wire? What are you
> > trying to accomplish that can't be accomplished without going over
> > the wire?
>
> The advantage of going over the wire is avoiding code duplication.
> The cost is latency. Obviously the goal of LOCALIO is to find those
> points where the latency saving justifies the code duplication.
>
> When opening with AUTH_UNIX the code duplication to determine the
> correct credential is small and easy to review. If we ever wanted to
> support KRB5 or TLS I would be a lot less comfortable about reviewing
> the code duplication.
>
> So I think it is worth considering whether an over-the-wire open is
> really all that costly. As I noted we already have an over-the-wire
> request at open time. We could conceivably send the LOCALIO-OPEN
> request at the same time so as not to add latency. We could receive
> the reply through the in-kernel backchannel so there is no RPC reply.
>
> That might all be too complex and might not be justified. My point is
> that I think the trade-offs are subtle and I think the FAQ answer cuts
> off an avenue that hasn't really been explored.
>

So, your argument is that if there was a hypothetical situation where
we wanted to add krb5 or TLS support, then we'd have more code to
review?

The counter-argument would be that we've already established the right
of the client to do I/O to the file. This will already have been done
by an over-the-wire call to OPEN (NFSv4), ACCESS (NFSv3/NFSv4) or
CREATE (NFSv3). Those calls will have used krb5 and/or TLS to
authenticate the user. All that remains to be done is perform the I/O
that was authorised by those calls.

Furthermore, we'd already have established that the client and the
knfsd instance are running in the same kernel space on the same
hardware (whether real or virtualised). There is no chance for a bad
actor to compromise the one without also compromising the other.
However, let's assume that is somehow possible: how is throwing in an
on-the-wire protocol that is initiated by the one and interpreted by
the other going to help, given that both have access to the exact same
RPCSEC_GSS/TLS session and shared secret information via shared kernel
memory?

So again, what problem are you trying to fix?
On Wed, 28 Aug 2024, Trond Myklebust wrote:
> On Wed, 2024-08-28 at 07:49 +1000, NeilBrown wrote:
> > On Tue, 27 Aug 2024, Trond Myklebust wrote:
> > > > > On Aug 25, 2024, at 9:56 PM, NeilBrown <neilb@suse.de> wrote:
> > > > >
> > > > > While I'm not advocating for an over-the-wire request to map a
> > > > > filehandle to a struct nfsd_file*, I don't think you can
> > > > > convincingly argue against it without concrete performance
> > > > > measurements.
> > > >
> > > What is the value of doing an open over the wire? What are you
> > > trying to accomplish that can't be accomplished without going over
> > > the wire?
> >
> > The advantage of going over the wire is avoiding code duplication.
> > The cost is latency. Obviously the goal of LOCALIO is to find those
> > points where the latency saving justifies the code duplication.
> >
> > When opening with AUTH_UNIX the code duplication to determine the
> > correct credential is small and easy to review. If we ever wanted to
> > support KRB5 or TLS I would be a lot less comfortable about reviewing
> > the code duplication.
> >
> > So I think it is worth considering whether an over-the-wire open is
> > really all that costly. As I noted we already have an over-the-wire
> > request at open time. We could conceivably send the LOCALIO-OPEN
> > request at the same time so as not to add latency. We could receive
> > the reply through the in-kernel backchannel so there is no RPC reply.
> >
> > That might all be too complex and might not be justified. My point is
> > that I think the trade-offs are subtle and I think the FAQ answer cuts
> > off an avenue that hasn't really been explored.
> >
>
> So, your argument is that if there was a hypothetical situation where
> we wanted to add krb5 or TLS support, then we'd have more code to
> review?
>
> The counter-argument would be that we've already established the right
> of the client to do I/O to the file. This will already have been done
> by an over-the-wire call to OPEN (NFSv4), ACCESS (NFSv3/NFSv4) or
> CREATE (NFSv3). Those calls will have used krb5 and/or TLS to
> authenticate the user. All that remains to be done is perform the I/O
> that was authorised by those calls.

The other thing that remains is to get the correct 'struct cred *' to
store in ->f_cred (or to use for lookup in the nfsd filecache).

>
> Furthermore, we'd already have established that the client and the
> knfsd instance are running in the same kernel space on the same
> hardware (whether real or virtualised). There is no chance for a bad
> actor to compromise the one without also compromising the other.
> However, let's assume that is somehow possible: how is throwing in an
> on-the-wire protocol that is initiated by the one and interpreted by
> the other going to help, given that both have access to the exact same
> RPCSEC_GSS/TLS session and shared secret information via shared kernel
> memory?
>
> So again, what problem are you trying to fix?

Conversely: what exactly is this FAQ entry trying to argue against?

My current immediate goal is for the FAQ to be useful. It mostly is,
but this one question/answer isn't clear to me.

Thanks,
NeilBrown
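As a rough illustration of the kind of mapping Neil is referring to (a sketch
only; localio_map_cred is a hypothetical helper name, and this is not the code
from the series): derive from the AUTH_UNIX uid/gid a 'struct cred *' that can
be stored in ->f_cred or used as the nfsd filecache key, so the cached open is
found under the same identity an over-the-wire request would have produced.

#include <linux/cred.h>
#include <linux/uidgid.h>
#include <linux/user_namespace.h>

/*
 * Hypothetical helper, for illustration only: build a credential for the
 * local open from the AUTH_UNIX uid/gid the client already holds.
 */
static struct cred *localio_map_cred(uid_t uid, gid_t gid)
{
	struct cred *cred = prepare_creds();	/* private copy of current's creds */

	if (!cred)
		return NULL;

	cred->fsuid = make_kuid(&init_user_ns, uid);
	cred->fsgid = make_kgid(&init_user_ns, gid);
	if (!uid_valid(cred->fsuid) || !gid_valid(cred->fsgid)) {
		put_cred(cred);
		return NULL;
	}

	/*
	 * The caller would hand this cred to the nfsd filecache lookup (or
	 * store it in ->f_cred) so the cached struct nfsd_file is keyed on
	 * the same identity an over-the-wire request would use.
	 */
	return cred;
}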
On Wed, 2024-08-28 at 09:41 +1000, NeilBrown wrote:
> On Wed, 28 Aug 2024, Trond Myklebust wrote:
> > On Wed, 2024-08-28 at 07:49 +1000, NeilBrown wrote:
> > > On Tue, 27 Aug 2024, Trond Myklebust wrote:
> > > > > > On Aug 25, 2024, at 9:56 PM, NeilBrown <neilb@suse.de> wrote:
> > > > > >
> > > > > > While I'm not advocating for an over-the-wire request to map a
> > > > > > filehandle to a struct nfsd_file*, I don't think you can
> > > > > > convincingly argue against it without concrete performance
> > > > > > measurements.
> > > > >
> > > > What is the value of doing an open over the wire? What are you
> > > > trying to accomplish that can't be accomplished without going
> > > > over the wire?
> > >
> > > The advantage of going over the wire is avoiding code duplication.
> > > The cost is latency. Obviously the goal of LOCALIO is to find those
> > > points where the latency saving justifies the code duplication.
> > >
> > > When opening with AUTH_UNIX the code duplication to determine the
> > > correct credential is small and easy to review. If we ever wanted
> > > to support KRB5 or TLS I would be a lot less comfortable about
> > > reviewing the code duplication.
> > >
> > > So I think it is worth considering whether an over-the-wire open is
> > > really all that costly. As I noted we already have an over-the-wire
> > > request at open time. We could conceivably send the LOCALIO-OPEN
> > > request at the same time so as not to add latency. We could receive
> > > the reply through the in-kernel backchannel so there is no RPC
> > > reply.
> > >
> > > That might all be too complex and might not be justified. My point
> > > is that I think the trade-offs are subtle and I think the FAQ
> > > answer cuts off an avenue that hasn't really been explored.
> > >
> >
> > So, your argument is that if there was a hypothetical situation where
> > we wanted to add krb5 or TLS support, then we'd have more code to
> > review?
> >
> > The counter-argument would be that we've already established the
> > right of the client to do I/O to the file. This will already have
> > been done by an over-the-wire call to OPEN (NFSv4), ACCESS
> > (NFSv3/NFSv4) or CREATE (NFSv3). Those calls will have used krb5
> > and/or TLS to authenticate the user. All that remains to be done is
> > perform the I/O that was authorised by those calls.
>
> The other thing that remains is to get the correct 'struct cred *' to
> store in ->f_cred (or to use for lookup in the nfsd filecache).

This was why the original code called up into the sunrpc server domain
code, and hence did consult with mountd when needed. Is there any
reason to believe that we shouldn't be able to do the same with future
security models? As I said, the client has direct access to all the
RPCSEC_GSS/TLS session info, or other info that might be needed to look
up the corresponding information in knfsd. In the worst case, we could
fall back to sending on-the-wire info until the relevant context
information has been re-established.

> >
> > Furthermore, we'd already have established that the client and the
> > knfsd instance are running in the same kernel space on the same
> > hardware (whether real or virtualised). There is no chance for a bad
> > actor to compromise the one without also compromising the other.
> > However, let's assume that is somehow possible: how is throwing in
> > an on-the-wire protocol that is initiated by the one and interpreted
> > by the other going to help, given that both have access to the exact
> > same RPCSEC_GSS/TLS session and shared secret information via shared
> > kernel memory?
> >
> > So again, what problem are you trying to fix?
>
> Conversely: what exactly is this FAQ entry trying to argue against?
>
> My current immediate goal is for the FAQ to be useful. It mostly is,
> but this one question/answer isn't clear to me.

The question arose from the feedback when Mike submitted the earlier
drafts in the beginning of July. I was on vacation at the time, but my
understanding is that several people (including some listed in the
MAINTAINERS file) were asking questions about how the code would
support RPCSEC_GSS and TLS. The FAQ entry is a direct response to those
questions.

I'm happy to ask Mike to drop that entry if everyone agrees that it is
redundant, but the point is that at the time, there was a set of
questions around this, and they were clearly blocking the ability to
submit the code for merging.
On Wed, Aug 28, 2024 at 09:41:05AM +1000, NeilBrown wrote:
> On Wed, 28 Aug 2024, Trond Myklebust wrote:
> > On Wed, 2024-08-28 at 07:49 +1000, NeilBrown wrote:
> > > On Tue, 27 Aug 2024, Trond Myklebust wrote:
> > > > > > On Aug 25, 2024, at 9:56 PM, NeilBrown <neilb@suse.de> wrote:
> > > > > >
> > > > > > While I'm not advocating for an over-the-wire request to map a
> > > > > > filehandle to a struct nfsd_file*, I don't think you can
> > > > > > convincingly argue against it without concrete performance
> > > > > > measurements.
> > > > >
> > > > What is the value of doing an open over the wire? What are you
> > > > trying to accomplish that can't be accomplished without going
> > > > over the wire?
> > >
> > > The advantage of going over the wire is avoiding code duplication.
> > > The cost is latency. Obviously the goal of LOCALIO is to find those
> > > points where the latency saving justifies the code duplication.
> > >
> > > When opening with AUTH_UNIX the code duplication to determine the
> > > correct credential is small and easy to review. If we ever wanted
> > > to support KRB5 or TLS I would be a lot less comfortable about
> > > reviewing the code duplication.
> > >
> > > So I think it is worth considering whether an over-the-wire open is
> > > really all that costly. As I noted we already have an over-the-wire
> > > request at open time. We could conceivably send the LOCALIO-OPEN
> > > request at the same time so as not to add latency. We could receive
> > > the reply through the in-kernel backchannel so there is no RPC
> > > reply.
> > >
> > > That might all be too complex and might not be justified. My point
> > > is that I think the trade-offs are subtle and I think the FAQ
> > > answer cuts off an avenue that hasn't really been explored.
> > >
> >
> > So, your argument is that if there was a hypothetical situation where
> > we wanted to add krb5 or TLS support, then we'd have more code to
> > review?
> >
> > The counter-argument would be that we've already established the
> > right of the client to do I/O to the file. This will already have
> > been done by an over-the-wire call to OPEN (NFSv4), ACCESS
> > (NFSv3/NFSv4) or CREATE (NFSv3). Those calls will have used krb5
> > and/or TLS to authenticate the user. All that remains to be done is
> > perform the I/O that was authorised by those calls.
>
> The other thing that remains is to get the correct 'struct cred *' to
> store in ->f_cred (or to use for lookup in the nfsd filecache).
>
> >
> > Furthermore, we'd already have established that the client and the
> > knfsd instance are running in the same kernel space on the same
> > hardware (whether real or virtualised). There is no chance for a bad
> > actor to compromise the one without also compromising the other.
> > However, let's assume that is somehow possible: how is throwing in
> > an on-the-wire protocol that is initiated by the one and interpreted
> > by the other going to help, given that both have access to the exact
> > same RPCSEC_GSS/TLS session and shared secret information via shared
> > kernel memory?
> >
> > So again, what problem are you trying to fix?
>
> Conversely: what exactly is this FAQ entry trying to argue against?
>
> My current immediate goal is for the FAQ to be useful. It mostly is,
> but this one question/answer isn't clear to me.

The current answer to question 6 isn't meant to be dealing in
absolutes, nor does it have to (but I agree that "negating any benefit"
should be softened given we don't _know_ how it'd play out without
implementing open-over-the-wire entirely to benchmark). We just need to
give context for what motivated the current implementation: network
protocol avoidance where possible.

Given everything, do you have a suggestion for how to improve the
answer to question 6? Happy to revise it however you like.

Here is the incremental patch I just came up with. Any better?

diff --git a/Documentation/filesystems/nfs/localio.rst b/Documentation/filesystems/nfs/localio.rst
index 4b6d63246479..5d652f637a97 100644
--- a/Documentation/filesystems/nfs/localio.rst
+++ b/Documentation/filesystems/nfs/localio.rst
@@ -120,12 +120,13 @@ FAQ
    using RPC, beneficial? Is the benefit pNFS specific?
 
    Avoiding the use of XDR and RPC for file opens is beneficial to
-   performance regardless of whether pNFS is used. However adding a
-   requirement to go over the wire to do an open and/or close ends up
-   negating any benefit of avoiding the wire for doing the I/O itself
-   when we’re dealing with small files. There is no benefit to replacing
-   the READ or WRITE with a new open and/or close operation that still
-   needs to go over the wire.
+   performance regardless of whether pNFS is used. Especially when
+   dealing with small files it's best to avoid going over the wire
+   whenever possible, otherwise it could reduce or even negate the
+   benefits of avoiding the wire for doing the small file I/O itself.
+   Given LOCALIO's requirements the current approach of having the
+   client perform a server-side file open, without using RPC, is ideal.
+   If in the future requirements change then we can adapt accordingly.
 
 7. Why is LOCALIO only supported with UNIX Authentication (AUTH_UNIX)?
diff --git a/Documentation/filesystems/nfs/localio.rst b/Documentation/filesystems/nfs/localio.rst
index 8cceb3db386a..4b6d63246479 100644
--- a/Documentation/filesystems/nfs/localio.rst
+++ b/Documentation/filesystems/nfs/localio.rst
@@ -61,6 +61,83 @@ fio for 20 secs with directio, qd of 8, 1 libaio thread:
    128K read:  IOPS=24.4k, BW=3050MiB/s (3198MB/s)(59.6GiB/20001msec)
    128K write: IOPS=11.4k, BW=1430MiB/s (1500MB/s)(27.9GiB/20001msec)
 
+FAQ
+===
+
+1. What are the use cases for LOCALIO?
+
+   a. Workloads where the NFS client and server are on the same host
+      realize improved IO performance. In particular, it is common when
+      running containerised workloads for jobs to find themselves
+      running on the same host as the knfsd server being used for
+      storage.
+
+2. What are the requirements for LOCALIO?
+
+   a. Bypass use of the network RPC protocol as much as possible. This
+      includes bypassing XDR and RPC for open, read, write and commit
+      operations.
+   b. Allow client and server to autonomously discover if they are
+      running local to each other without making any assumptions about
+      the local network topology.
+   c. Support the use of containers by being compatible with relevant
+      namespaces (e.g. network, user, mount).
+   d. Support all versions of NFS. NFSv3 is of particular importance
+      because it has wide enterprise usage and pNFS flexfiles makes use
+      of it for the data path.
+
+3. Why doesn’t LOCALIO just compare IP addresses or hostnames when
+   deciding if the NFS client and server are co-located on the same
+   host?
+
+   Since one of the main use cases is containerised workloads, we cannot
+   assume that IP addresses will be shared between the client and
+   server. This sets up a requirement for a handshake protocol that
+   needs to go over the same connection as the NFS traffic in order to
+   identify that the client and the server really are running on the
+   same host. The handshake uses a secret that is sent over the wire,
+   and can be verified by both parties by comparing with a value stored
+   in shared kernel memory if they are truly co-located.
+
+4. Does LOCALIO improve pNFS flexfiles?
+
+   Yes, LOCALIO complements pNFS flexfiles by allowing it to take
+   advantage of NFS client and server locality. Policy that initiates
+   client IO as closely to the server where the data is stored naturally
+   benefits from the data path optimization LOCALIO provides.
+
+5. Why not develop a new pNFS layout to enable LOCALIO?
+
+   A new pNFS layout could be developed, but doing so would put the
+   onus on the server to somehow discover that the client is co-located
+   when deciding to hand out the layout.
+   There is value in a simpler approach (as provided by LOCALIO) that
+   allows the NFS client to negotiate and leverage locality without
+   requiring more elaborate modeling and discovery of such locality in a
+   more centralized manner.
+
+6. Why is having the client perform a server-side file OPEN, without
+   using RPC, beneficial? Is the benefit pNFS specific?
+
+   Avoiding the use of XDR and RPC for file opens is beneficial to
+   performance regardless of whether pNFS is used. However adding a
+   requirement to go over the wire to do an open and/or close ends up
+   negating any benefit of avoiding the wire for doing the I/O itself
+   when we’re dealing with small files. There is no benefit to replacing
+   the READ or WRITE with a new open and/or close operation that still
+   needs to go over the wire.
+
+7. Why is LOCALIO only supported with UNIX Authentication (AUTH_UNIX)?
+
+   Strong authentication is usually tied to the connection itself. It
+   works by establishing a context that is cached by the server, and
+   that acts as the key for discovering the authorisation token, which
+   can then be passed to rpc.mountd to complete the authentication
+   process. On the other hand, in the case of AUTH_UNIX, the credential
+   that was passed over the wire is used directly as the key in the
+   upcall to rpc.mountd. This simplifies the authentication process, and
+   so makes AUTH_UNIX easier to support.
+
 RPC
 ===
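For anyone skimming the FAQ above, here is a self-contained toy sketch of the
co-location handshake described in entry 3 (names and structure are
illustrative only, not the kernel implementation): the client publishes a
random secret in memory the server can only see if the two share a kernel,
sends the same secret over the normal NFS connection, and the server reports
"local" only when the received secret matches one it finds in that shared
memory. Note that no IP address or hostname comparison is involved.

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SECRET_LEN 16
#define MAX_SECRETS 8

/* Stand-in for "shared kernel memory": visible to both client and server
 * code only when they run in the same kernel (modeled here as the same
 * process). */
static unsigned char local_secrets[MAX_SECRETS][SECRET_LEN];
static int nr_secrets;

/* Client side: generate a secret and publish it locally. */
static void localio_client_register(unsigned char secret[SECRET_LEN])
{
	for (int i = 0; i < SECRET_LEN; i++)
		secret[i] = (unsigned char)(rand() & 0xff);
	memcpy(local_secrets[nr_secrets++], secret, SECRET_LEN);
}

/* Server side: the secret arrives over the ordinary NFS connection; it
 * can only match if client and server share the same memory. */
static bool localio_server_is_local(const unsigned char secret[SECRET_LEN])
{
	for (int i = 0; i < nr_secrets; i++)
		if (memcmp(local_secrets[i], secret, SECRET_LEN) == 0)
			return true;
	return false;
}

int main(void)
{
	unsigned char secret[SECRET_LEN];

	localio_client_register(secret);
	/* "secret" would now travel over the wire inside the handshake RPC. */
	printf("co-located: %s\n", localio_server_is_local(secret) ? "yes" : "no");
	return 0;
}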