Message ID | 53FC9545.4000800@plexistor.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Tue, Aug 26, 2014 at 10:10 AM, Boaz Harrosh <boaz@plexistor.com> wrote: > From: Boaz Harrosh <boaz@plexistor.com> > > This fixes a dead-lock in the pnfs recall processing > > pnfs_layoutcommit_inode() is called through update_inode() > called from VFS. By setting set_inode_dirty during > pnfs write IO. > > But the VFS will not schedule another update_inode() > If it is already inside an update_inode() or an sb-writeback > > As part of writeback pnfs code might get stuck in LAYOUT_GET > with the server returning ERR_RECALL_CONFLICT because some > operation has caused the server to RECALL all layouts, including > those from our client. > > So the RECALL is received, but our client is returning ERR_DELAY > because its write-segments need a LAYOUT_COMMIT, but > pnfs_layoutcommit_inode will never come because it is scheduled > behind the LAYOUT_GET which is stuck waiting for the recall to > finish > > Hence the deadlock, client is stuck polling LAYOUT_GET receiving > ERR_RECALL_CONFLICT. Server is stuck polling RECALL receiving > ERR_DELAY. > > With pnfs-objects the above condition can easily happen, when > a file grows beyond a group of devices. The pnfs-objects-server > will RECALL all layouts because the file-objects-map will > change and all old layouts will have stale attributes, therefor > the RECALL is initiated as part of a LAYOUT_GET, and this can > be triggered from within a single client operation. > > A simple solution is to kick out a pnfs_layoutcommit_inode() > from within the recall, to free any need-to-commit segments > and let the client return success on the RECALL, so streaming > can continue. > > This patch Is based on 3.17-rc1. It is completely UNTESTED. > I have tested a version of this patch at around the 3.12 Kernel > at which point the deadlock was resolved but I hit some race > conditions on pnfs state management farther on, so the actual > overall processing was not fixed. But hopefully these were fixed > by Trond and Christoph, and it should work better now. > > Signed-off-by: Boaz Harrosh <boaz@plexistor.com> > --- > fs/nfs/callback_proc.c | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c > index 41db525..8660f96 100644 > --- a/fs/nfs/callback_proc.c > +++ b/fs/nfs/callback_proc.c > @@ -171,6 +171,14 @@ static u32 initiate_file_draining(struct nfs_client *clp, > goto out; > > ino = lo->plh_inode; > + > + spin_lock(&ino->i_lock); > + pnfs_set_layout_stateid(lo, &args->cbl_stateid, true); > + spin_unlock(&ino->i_lock); > + > + /* kick out any segs held by need to commit */ > + pnfs_layoutcommit_inode(ino, true); Making this call synchronous could deadlock the entire back channel. Is there any reason why it can't just be made asynchonous? > + > spin_lock(&ino->i_lock); > if (test_bit(NFS_LAYOUT_BULK_RECALL, &lo->plh_flags) || > pnfs_mark_matching_lsegs_invalid(lo, &free_me_list, > @@ -178,7 +186,6 @@ static u32 initiate_file_draining(struct nfs_client *clp, > rv = NFS4ERR_DELAY; > else > rv = NFS4ERR_NOMATCHING_LAYOUT; > - pnfs_set_layout_stateid(lo, &args->cbl_stateid, true); > spin_unlock(&ino->i_lock); > pnfs_free_lseg_list(&free_me_list); > pnfs_put_layout_hdr(lo); > -- > 1.9.3 > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
On 08/26/2014 05:26 PM, Trond Myklebust wrote: > On Tue, Aug 26, 2014 at 10:10 AM, Boaz Harrosh <boaz@plexistor.com> wrote: >> + >> + /* kick out any segs held by need to commit */ >> + pnfs_layoutcommit_inode(ino, true); > > Making this call synchronous could deadlock the entire back channel. > Is there any reason why it can't just be made asynchonous? > We were just talking about that. So the logic here is that we want to save round trips and make this as efficient as possible with no extra round trips for the server recall. A single RECALL => LAYOUT_COMMIT => LAYOUT_COMMIT_REPLAY REACLL_REPLAY_NO_MATCHING. Please explain the deadlock you foresee. The worst is that the mal-behaving server will time-out and after a long time the RECALL_REPLAY will return with ERR. But why do you say deadlock how can this deadlock? Otherwise Christoph's version of this patch does the asynchonous way which will always cause another poll of the RECALL and more delays for every RECALL operation, which I was trying to avoid. Thanks Boaz -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 08/26/2014 05:37 PM, Boaz Harrosh wrote: > On 08/26/2014 05:26 PM, Trond Myklebust wrote: >> On Tue, Aug 26, 2014 at 10:10 AM, Boaz Harrosh <boaz@plexistor.com> wrote: >>> + >>> + /* kick out any segs held by need to commit */ >>> + pnfs_layoutcommit_inode(ino, true); >> >> Making this call synchronous could deadlock the entire back channel. >> Is there any reason why it can't just be made asynchonous? >> > > We were just talking about that. > > So the logic here is that we want to save round trips and make this > as efficient as possible with no extra round trips for the server > recall. A single RECALL => LAYOUT_COMMIT => LAYOUT_COMMIT_REPLAY > REACLL_REPLAY_NO_MATCHING. > > Please explain the deadlock you foresee. The worst is that the > mal-behaving server will time-out and after a long time the > RECALL_REPLAY will return with ERR. But why do you say deadlock > how can this deadlock? > > Otherwise Christoph's version of this patch does the asynchonous > way which will always cause another poll of the RECALL and more delays > for every RECALL operation, which I was trying to avoid. > Also, for any client there is only a single server on it's backchannel do you mean that ANY recall from any server can only come on a single thread? I did not know that I thought it was from a pool of RPC threads. So are you saying that waiting on this server operation will cause any other server's recall to some other mounts stall until the operation is complete for this server? Thanks Boaz -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Aug 26, 2014 at 10:37 AM, Boaz Harrosh <boaz@plexistor.com> wrote: > On 08/26/2014 05:26 PM, Trond Myklebust wrote: >> On Tue, Aug 26, 2014 at 10:10 AM, Boaz Harrosh <boaz@plexistor.com> wrote: >>> + >>> + /* kick out any segs held by need to commit */ >>> + pnfs_layoutcommit_inode(ino, true); >> >> Making this call synchronous could deadlock the entire back channel. >> Is there any reason why it can't just be made asynchonous? >> > > We were just talking about that. > > So the logic here is that we want to save round trips and make this > as efficient as possible with no extra round trips for the server > recall. A single RECALL => LAYOUT_COMMIT => LAYOUT_COMMIT_REPLAY > REACLL_REPLAY_NO_MATCHING. > > Please explain the deadlock you foresee. The worst is that the > mal-behaving server will time-out and after a long time the > RECALL_REPLAY will return with ERR. But why do you say deadlock > how can this deadlock? > > Otherwise Christoph's version of this patch does the asynchonous > way which will always cause another poll of the RECALL and more delays > for every RECALL operation, which I was trying to avoid. > > Thanks > Boaz > The above can deadlock if there are no session slots available to send the layoutcommit, in which case the recall won't complete, and the layoutget won't get a reply (which would free up the slot).
On 08/26/2014 05:55 PM, Trond Myklebust wrote: > On Tue, Aug 26, 2014 at 10:37 AM, Boaz Harrosh <boaz@plexistor.com> wrote: > > The above can deadlock if there are no session slots available to send > the layoutcommit, in which case the recall won't complete, and the > layoutget won't get a reply (which would free up the slot). > What? the back-channel and the fore-channel do not use the same slots. these are two different slots, No? Matt, Adam you need to chip in here. If it is as you say, then yes it must be as Christoph wrote it. And the Ganesha server must be fixed because it has a slot system per channel. Thanks Boaz -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
IIUC, the problem is the forechannel slot count, since the call you want to make synchronously is on the forechannel? Matt ----- "Boaz Harrosh" <boaz@plexistor.com> wrote: > On 08/26/2014 05:55 PM, Trond Myklebust wrote: > > On Tue, Aug 26, 2014 at 10:37 AM, Boaz Harrosh <boaz@plexistor.com> > wrote: > > > > The above can deadlock if there are no session slots available to > send > > the layoutcommit, in which case the recall won't complete, and the > > layoutget won't get a reply (which would free up the slot). > > > > What? the back-channel and the fore-channel do not use the same > slots. these are two different slots, No? > > Matt, Adam you need to chip in here. > > If it is as you say, then yes it must be as Christoph wrote it. > > And the Ganesha server must be fixed because it has a slot system per > channel. > > Thanks > Boaz
On Tue, Aug 26, 2014 at 11:24 AM, Matt W. Benjamin <matt@linuxbox.com> wrote: > IIUC, the problem is the forechannel slot count, since the call you want to make synchronously is on the forechannel? Yep. layoutcommit will be sent on the fore channel, which is why it can deadlock with the initial layoutget (or whatever operation that triggered the layout recall). > Matt > > ----- "Boaz Harrosh" <boaz@plexistor.com> wrote: > >> On 08/26/2014 05:55 PM, Trond Myklebust wrote: >> > On Tue, Aug 26, 2014 at 10:37 AM, Boaz Harrosh <boaz@plexistor.com> >> wrote: >> > >> > The above can deadlock if there are no session slots available to >> send >> > the layoutcommit, in which case the recall won't complete, and the >> > layoutget won't get a reply (which would free up the slot). >> > >> >> What? the back-channel and the fore-channel do not use the same >> slots. these are two different slots, No? >> >> Matt, Adam you need to chip in here. >> >> If it is as you say, then yes it must be as Christoph wrote it. >> >> And the Ganesha server must be fixed because it has a slot system per >> channel. >> >> Thanks >> Boaz > > -- > Matt Benjamin > The Linux Box > 206 South Fifth Ave. Suite 150 > Ann Arbor, MI 48104 > > http://linuxbox.com > > tel. 734-761-4689 > fax. 734-769-8938 > cel. 734-216-5309
On 08/26/2014 06:36 PM, Trond Myklebust wrote: > On Tue, Aug 26, 2014 at 11:24 AM, Matt W. Benjamin <matt@linuxbox.com> wrote: >> IIUC, the problem is the forechannel slot count, since the call you want to make synchronously is on the forechannel? Matt no top post on a Linux mailing list ;-) > Yep. layoutcommit will be sent on the fore channel, which is why it > can deadlock with the initial layoutget (or whatever operation that > triggered the layout recall). Trond you said below: > The above can deadlock if there are no session slots available to send > the layoutcommit, in which case the recall won't complete, and the > layoutget won't get a reply (which would free up the slot). Why would the layoutget not-get-a-reply ? This is how it goes with Both ganesha server and knfsd last I tested. [1] The LAYOUT_GET cause LAYOUT_RECALL case: (including the lo_commit) client Server comments ~~~~~~ ~~~~~~ ~~~~~~~~ LAYOUT_GET ==> <== LAYOUT_GET_REPLAY(ERR_RECALL_CONFLICT) <--------- fore-channel is free <== RECALL LAYOUT_COMMIT ==> <== LAYOUT_COMMIT_REPLAY <--------- fore-channel is free RECALL_REPLY(NO_MATCHING) => <--------- back-channel is free Note that in this case the server is to send the RECALL only after the error reply to LAYOUT_GET, specifically it is not aloud to get stuck inside LAYOUT_GET and wait for the RECALL. (mandated by STD) [2] The LAYOUT_GET sent all the while a RECALL is on the wire: client Server comments ~~~~~~ ~~~~~~ ~~~~~~~~ <== RECALL LAYOUT_GET ==> <== LAYOUT_GET_REPLAY(ERR_RECALL_CONFLICT) <--------- fore-channel is free LAYOUT_COMMIT ==> LAYOUT_COMMIT_REPLAY <--------- fore-channel is free RECALL_REPLY(NO_MATCHING) => <--------- back-channel is free [3] Or the worst case that lo_commit needs to wait for the channel Similar to [2] above: client Server comments ~~~~~~ ~~~~~~ ~~~~~~~~ <== RECALL LAYOUT_GET ==> initiate_lo_commit ==> slot is taken needs to wait <== LAYOUT_GET_REPLAY(ERR_RECALL_CONFLICT) <--------- fore-channel is free LAYOUT_COMMIT ==> slot is now free lo_commit goes through <== LAYOUT_COMMIT_REPLAY <--------- fore-channel is free RECALL_REPLY(NO_MATCHING) => <--------- back-channel is free So the most important is that the server must not get stuck in lo_get and since there is a slot for each channel the lo_commit can be sent from within the recall. What am I missing? Thanks Boaz -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Aug 26, 2014 at 12:56 PM, Boaz Harrosh <boaz@plexistor.com> wrote: > On 08/26/2014 06:36 PM, Trond Myklebust wrote: >> On Tue, Aug 26, 2014 at 11:24 AM, Matt W. Benjamin <matt@linuxbox.com> wrote: >>> IIUC, the problem is the forechannel slot count, since the call you want to make synchronously is on the forechannel? > > > Matt no top post on a Linux mailing list ;-) > >> Yep. layoutcommit will be sent on the fore channel, which is why it >> can deadlock with the initial layoutget (or whatever operation that >> triggered the layout recall). > > Trond you said below: >> The above can deadlock if there are no session slots available to send >> the layoutcommit, in which case the recall won't complete, and the >> layoutget won't get a reply (which would free up the slot). > > Why would the layoutget not-get-a-reply ? > This is how it goes with Both ganesha server and knfsd last I tested. > > [1] > The LAYOUT_GET cause LAYOUT_RECALL case: (including the lo_commit) > > client Server comments > ~~~~~~ ~~~~~~ ~~~~~~~~ > LAYOUT_GET ==> > <== LAYOUT_GET_REPLAY(ERR_RECALL_CONFLICT) > <--------- fore-channel is free > <== RECALL > LAYOUT_COMMIT ==> > <== LAYOUT_COMMIT_REPLAY > <--------- fore-channel is free Beep! No free slots, so this hangs. > RECALL_REPLY(NO_MATCHING) => > <--------- back-channel is free > > Note that in this case the server is to send the RECALL only after > the error reply to LAYOUT_GET, specifically it is not aloud to get stuck > inside LAYOUT_GET and wait for the RECALL. (mandated by STD) > > [2] > The LAYOUT_GET sent all the while a RECALL is on the wire: > client Server comments > ~~~~~~ ~~~~~~ ~~~~~~~~ > <== RECALL > LAYOUT_GET ==> > <== LAYOUT_GET_REPLAY(ERR_RECALL_CONFLICT) > <--------- fore-channel is free > LAYOUT_COMMIT ==> > LAYOUT_COMMIT_REPLAY > <--------- fore-channel is free > RECALL_REPLY(NO_MATCHING) => > <--------- back-channel is free > > > [3] > Or the worst case that lo_commit needs to wait for the channel Similar > to [2] above: > > client Server comments > ~~~~~~ ~~~~~~ ~~~~~~~~ > <== RECALL > LAYOUT_GET ==> > initiate_lo_commit ==> slot is taken needs to wait > > <== LAYOUT_GET_REPLAY(ERR_RECALL_CONFLICT) > <--------- fore-channel is free > LAYOUT_COMMIT ==> slot is now free lo_commit goes through > <== LAYOUT_COMMIT_REPLAY > <--------- fore-channel is free > RECALL_REPLY(NO_MATCHING) => > <--------- back-channel is free > > So the most important is that the server must not get stuck in lo_get > and since there is a slot for each channel the lo_commit can be sent > from within the recall. > > What am I missing? > > Thanks > Boaz >
On 08/26/2014 07:59 PM, Trond Myklebust wrote: > On Tue, Aug 26, 2014 at 12:56 PM, Boaz Harrosh <boaz@plexistor.com> wrote: >> On 08/26/2014 06:36 PM, Trond Myklebust wrote: >>> On Tue, Aug 26, 2014 at 11:24 AM, Matt W. Benjamin <matt@linuxbox.com> wrote: >>>> IIUC, the problem is the forechannel slot count, since the call you want to make synchronously is on the forechannel? >> >> >> Matt no top post on a Linux mailing list ;-) >> >>> Yep. layoutcommit will be sent on the fore channel, which is why it >>> can deadlock with the initial layoutget (or whatever operation that >>> triggered the layout recall). >> >> Trond you said below: >>> The above can deadlock if there are no session slots available to send >>> the layoutcommit, in which case the recall won't complete, and the >>> layoutget won't get a reply (which would free up the slot). >> >> Why would the layoutget not-get-a-reply ? >> This is how it goes with Both ganesha server and knfsd last I tested. >> >> [1] >> The LAYOUT_GET cause LAYOUT_RECALL case: (including the lo_commit) >> >> client Server comments >> ~~~~~~ ~~~~~~ ~~~~~~~~ >> LAYOUT_GET ==> >> <== LAYOUT_GET_REPLAY(ERR_RECALL_CONFLICT) >> <--------- fore-channel is free >> <== RECALL >> LAYOUT_COMMIT ==> >> <== LAYOUT_COMMIT_REPLAY >> <--------- fore-channel is free > > Beep! No free slots, so this hangs. > Beep! does not do a very good of a job to explain. Sorry What do you mean? which slot? which channel? Just above your text it says "fore-channel is free" so are you saying it is not free? why not. Please use more then one line of text to explain. It might be clear to you but not to me. >> RECALL_REPLY(NO_MATCHING) => >> <--------- back-channel is free Thanks Boaz -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Aug 26, 2014 at 1:06 PM, Boaz Harrosh <boaz@plexistor.com> wrote: > On 08/26/2014 07:59 PM, Trond Myklebust wrote: >> On Tue, Aug 26, 2014 at 12:56 PM, Boaz Harrosh <boaz@plexistor.com> wrote: >>> On 08/26/2014 06:36 PM, Trond Myklebust wrote: >>>> On Tue, Aug 26, 2014 at 11:24 AM, Matt W. Benjamin <matt@linuxbox.com> wrote: >>>>> IIUC, the problem is the forechannel slot count, since the call you want to make synchronously is on the forechannel? >>> >>> >>> Matt no top post on a Linux mailing list ;-) >>> >>>> Yep. layoutcommit will be sent on the fore channel, which is why it >>>> can deadlock with the initial layoutget (or whatever operation that >>>> triggered the layout recall). >>> >>> Trond you said below: >>>> The above can deadlock if there are no session slots available to send >>>> the layoutcommit, in which case the recall won't complete, and the >>>> layoutget won't get a reply (which would free up the slot). >>> >>> Why would the layoutget not-get-a-reply ? >>> This is how it goes with Both ganesha server and knfsd last I tested. >>> >>> [1] >>> The LAYOUT_GET cause LAYOUT_RECALL case: (including the lo_commit) >>> >>> client Server comments >>> ~~~~~~ ~~~~~~ ~~~~~~~~ >>> LAYOUT_GET ==> >>> <== LAYOUT_GET_REPLAY(ERR_RECALL_CONFLICT) >>> <--------- fore-channel is free >>> <== RECALL >>> LAYOUT_COMMIT ==> >>> <== LAYOUT_COMMIT_REPLAY >>> <--------- fore-channel is free >> >> Beep! No free slots, so this hangs. >> > > Beep! does not do a very good of a job to explain. Sorry > > What do you mean? which slot? which channel? Just above your text it says > "fore-channel is free" so are you saying it is not free? why not. > Please use more then one line of text to explain. It might be clear to > you but not to me. The deadlock occurs _if_ the above layout commit is unable to get a slot. You can't guarantee that it will, because the slot table is a finite resource and it can be exhausted if you allow fore channel calls to trigger synchronous recalls on the back channel that again trigger synchronous calls on the fore channel. You're basically saying that the client needs to guarantee that it can allocate 2 slots before it is allowed to send a layoutget just in case the server needs to recall a layout. If, OTOH, the layoutcommit is asynchronous, then there is no serialisation and the back channel thread can happily reply to the layout recall even if there are no free slots in the fore channel. >>> RECALL_REPLY(NO_MATCHING) => >>> <--------- back-channel is free > > Thanks > Boaz >
On 08/26/2014 08:54 PM, Trond Myklebust wrote: > On Tue, Aug 26, 2014 at 1:06 PM, Boaz Harrosh <boaz@plexistor.com> wrote: > > The deadlock occurs _if_ the above layout commit is unable to get a > slot. You can't guarantee that it will, because the slot table is a > finite resource and it can be exhausted Yes all I ever seen is 1 slot in any of the clients/servers I've seen so I assume 1 slot ever > if you allow fore channel > calls to trigger synchronous recalls on the back channel Beep! but this is exactly what I'm trying to say. The STD specifically forbids that. The server is not allowed to wait here, it must return imitatively, with an error that frees the slot and then later issue the RECALL. This is what I said exactly three times in my mail, and what I have depicted in my flow: Server async operation (mandated by the STD) Client back-channel can be sync with for channel (Not mentioned by the STD) > that again trigger synchronous calls on the fore channel. > You're basically saying > that the client needs to guarantee that it can allocate 2 slots before > it is allowed to send a layoutget just in case the server needs to > recall a layout. > No I am not saying that, please count. Since the Server is not allowed sync operation then one slot is enough and the client can do sync lo_commit while in recall. > If, OTOH, the layoutcommit is asynchronous, then there is no > serialisation and the back channel thread can happily reply to the > layout recall even if there are no free slots in the fore channel. > Sure that will work as well, but not optimally, and for no good reason. Please go back to my flow with the 3 cases. See how the server never waits for anything and will always imitatively reply to the layout_get. Since the server is not allowed a sync operation and is mandated by the RFC text to not wait, then the client is allowed and can do sync operations because it is enough that only one do async. BTW: If what you are saying is true than there is a bug in the slot code because this patch does work, and everything flows past this situation. I have a reproducer test that fails 100% of the time without this patch and only fails much later at some other place, but not at this deadlock, with this patch applied. Cheers Boaz -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 08/26/2014 09:19 PM, Boaz Harrosh wrote: <> > Beep! but this is exactly what I'm trying to say. The STD specifically > forbids that. The server is not allowed to wait here, it must return > imitatively, with an error that frees the slot and then later issue the > RECALL. > > This is what I said exactly three times in my mail, and what I have > depicted in my flow: > Server async operation (mandated by the STD) > Client back-channel can be sync with for channel (Not mentioned by the STD) > BTW Both Ganesha and kpnfsd behave the same. A recall issued while in layout_get the layout-get reply will be sent first and then the recall will be sent on the wire. If the recall was sent before the receive of the layout_get then an error/success is returned imitatively without ever waiting for recall_reply to return. Thanks Boaz -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Aug 26, 2014 at 2:19 PM, Boaz Harrosh <boaz@plexistor.com> wrote: > On 08/26/2014 08:54 PM, Trond Myklebust wrote: >> On Tue, Aug 26, 2014 at 1:06 PM, Boaz Harrosh <boaz@plexistor.com> wrote: > >> >> The deadlock occurs _if_ the above layout commit is unable to get a >> slot. You can't guarantee that it will, because the slot table is a >> finite resource and it can be exhausted > > Yes all I ever seen is 1 slot in any of the clients/servers I've > seen so I assume 1 slot ever > >> if you allow fore channel >> calls to trigger synchronous recalls on the back channel > > Beep! but this is exactly what I'm trying to say. The STD specifically > forbids that. The server is not allowed to wait here, it must return > imitatively, with an error that frees the slot and then later issue the > RECALL. > > This is what I said exactly three times in my mail, and what I have > depicted in my flow: > Server async operation (mandated by the STD) > Client back-channel can be sync with for channel (Not mentioned by the STD) > >> that again trigger synchronous calls on the fore channel. > > >> You're basically saying >> that the client needs to guarantee that it can allocate 2 slots before >> it is allowed to send a layoutget just in case the server needs to >> recall a layout. >> > > No I am not saying that, please count. Since the Server is not allowed > sync operation then one slot is enough and the client can do sync lo_commit > while in recall. > >> If, OTOH, the layoutcommit is asynchronous, then there is no >> serialisation and the back channel thread can happily reply to the >> layout recall even if there are no free slots in the fore channel. >> > > Sure that will work as well, but not optimally, and for no good reason. > > Please go back to my flow with the 3 cases. See how the server never waits > for anything and will always imitatively reply to the layout_get. > Since the server is not allowed a sync operation and is mandated by the > RFC text to not wait, then the client is allowed and can do sync operations > because it is enough that only one do async. > > BTW: If what you are saying is true than there is a bug in the slot code > because this patch does work, and everything flows past this situation. > I have a reproducer test that fails 100% of the time without this patch > and only fails much later at some other place, but not at this deadlock, > with this patch applied. > > Cheers > Boaz > Whether or not your particular server allows it or not is irrelevant. We're not coding the client to a particular implementation. None of the other callbacks do synchronous RPC calls, and that's very intentional.
On Tue, Aug 26, 2014 at 2:41 PM, Trond Myklebust <trond.myklebust@primarydata.com> wrote: > On Tue, Aug 26, 2014 at 2:19 PM, Boaz Harrosh <boaz@plexistor.com> wrote: >> On 08/26/2014 08:54 PM, Trond Myklebust wrote: >>> On Tue, Aug 26, 2014 at 1:06 PM, Boaz Harrosh <boaz@plexistor.com> wrote: >> >>> >>> The deadlock occurs _if_ the above layout commit is unable to get a >>> slot. You can't guarantee that it will, because the slot table is a >>> finite resource and it can be exhausted >> >> Yes all I ever seen is 1 slot in any of the clients/servers I've >> seen so I assume 1 slot ever >> >>> if you allow fore channel >>> calls to trigger synchronous recalls on the back channel >> >> Beep! but this is exactly what I'm trying to say. The STD specifically >> forbids that. The server is not allowed to wait here, it must return >> imitatively, with an error that frees the slot and then later issue the >> RECALL. >> >> This is what I said exactly three times in my mail, and what I have >> depicted in my flow: >> Server async operation (mandated by the STD) >> Client back-channel can be sync with for channel (Not mentioned by the STD) >> >>> that again trigger synchronous calls on the fore channel. >> >> >>> You're basically saying >>> that the client needs to guarantee that it can allocate 2 slots before >>> it is allowed to send a layoutget just in case the server needs to >>> recall a layout. >>> >> >> No I am not saying that, please count. Since the Server is not allowed >> sync operation then one slot is enough and the client can do sync lo_commit >> while in recall. >> >>> If, OTOH, the layoutcommit is asynchronous, then there is no >>> serialisation and the back channel thread can happily reply to the >>> layout recall even if there are no free slots in the fore channel. >>> >> >> Sure that will work as well, but not optimally, and for no good reason. >> >> Please go back to my flow with the 3 cases. See how the server never waits >> for anything and will always imitatively reply to the layout_get. >> Since the server is not allowed a sync operation and is mandated by the >> RFC text to not wait, then the client is allowed and can do sync operations >> because it is enough that only one do async. >> >> BTW: If what you are saying is true than there is a bug in the slot code >> because this patch does work, and everything flows past this situation. >> I have a reproducer test that fails 100% of the time without this patch >> and only fails much later at some other place, but not at this deadlock, >> with this patch applied. >> >> Cheers >> Boaz >> > > Whether or not your particular server allows it or not is irrelevant. > We're not coding the client to a particular implementation. None of > the other callbacks do synchronous RPC calls, and that's very > intentional. > So to return to the original question: could we please change the layoutcommit in your patch so that it is asynchronous?
On 08/26/2014 09:41 PM, Trond Myklebust wrote: <> > Whether or not your particular server allows it or not is irrelevant. > We're not coding the client to a particular implementation. God, No this is not up to any particular server implementation it is stated clearly in the draft text. The Server is specifically forbidden to wait in lo_get and device_info/device_list actually any resource that is recallable. This is the STD exactly for those reasons you stated. > None of > the other callbacks do synchronous RPC calls, and that's very > intentional. > Because none of the other callbacks have the protection of the STD that forbids the server of synchronous operations, so the client have to. Thanks Boaz -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 08/26/2014 10:46 PM, Trond Myklebust wrote: <> > > So to return to the original question: could we please change the > layoutcommit in your patch so that it is asynchronous? > There is nothing change. The original message at the head of this thread, is Christoph's patch with the asynchronous layoutcommit. As a reply I posted what I had in my tree, with a sync operation. Because I wanted the recall to take 500 micro-seconds instead of like 300-1000 mili-seconds. When the STD supports my model specifically I do not see why not Don't you guys have recalls on the hot path? Sigh, another nail in the coffin of what was suppose to be a dam good STD. Thinking about it from day one you had a personal agenda against RECALLS. For some reason, which you never told me, you decided to crash and burn any proper use of LO_RECALLS. With that dreaded *synchronous* polling driven, forgetful--model recall. (One day we'll sit on a Bear and you will tell me why? I really want to know) Now that there is the last ever chance to make an half dissent citizen out of what was left of recalls, you are all on your back feet, against it. With religious mantras, when you have never sat and actually stared at the prints and experienced recalls for yourself. Your words Christoph I have a deja-vu its principles against good science again. Lets revert to your patch. Reviewed-by: Boaz Harrosh <bharrosh@panasas.com> -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/nfs/callback_proc.c b/fs/nfs/callback_proc.c index 41db525..8660f96 100644 --- a/fs/nfs/callback_proc.c +++ b/fs/nfs/callback_proc.c @@ -171,6 +171,14 @@ static u32 initiate_file_draining(struct nfs_client *clp, goto out; ino = lo->plh_inode; + + spin_lock(&ino->i_lock); + pnfs_set_layout_stateid(lo, &args->cbl_stateid, true); + spin_unlock(&ino->i_lock); + + /* kick out any segs held by need to commit */ + pnfs_layoutcommit_inode(ino, true); + spin_lock(&ino->i_lock); if (test_bit(NFS_LAYOUT_BULK_RECALL, &lo->plh_flags) || pnfs_mark_matching_lsegs_invalid(lo, &free_me_list, @@ -178,7 +186,6 @@ static u32 initiate_file_draining(struct nfs_client *clp, rv = NFS4ERR_DELAY; else rv = NFS4ERR_NOMATCHING_LAYOUT; - pnfs_set_layout_stateid(lo, &args->cbl_stateid, true); spin_unlock(&ino->i_lock); pnfs_free_lseg_list(&free_me_list); pnfs_put_layout_hdr(lo);