Message ID | CABpMAyL-B9Kqg8O4xxxtj0X8PF7fP4TM+NnLA_Ynhm-_3C6MUg@mail.gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
> -----Original Message----- > From: linux-nfs-owner@vger.kernel.org [mailto:linux-nfs- > owner@vger.kernel.org] On Behalf Of Idan Kedar > Sent: Thursday, August 09, 2012 9:03 AM > To: Boaz Harrosh; NFS list > Cc: Benny Halevy > Subject: return layout on error, BUG/deadlock > > Hi, > > As a result of some experiments, I wanted to see what happens when I > inject an error (hard coded) to the object layout driver. the patch is at the > bottom of this mail. the reason I did this is because when I inject errors in my > modified version of the object layout driver, I get the same BUG Tigran > reported about yesterday: > nfs4proc.c:6252 : BUG_ON(!list_empty(&lo->plh_segs)); > > In my modified version (based on kernel 3.3), the bug seems to be that > pnfs_ld_write_done calls pnfs_return_layout in the error path, even if there > is in-flight I/O. That is not a bug. It is an intentional change in order to allow the MDS to fence off the outstanding writes (if it can do so) before we retransmit them as write-through-MDS. Otherwise, you risk races between the outstanding writes-to-DS and the new writes-through-MDS. See the changelog in the patch that I sent to the list yesterday.
On Thu, Aug 9, 2012 at 5:05 PM, Myklebust, Trond <Trond.Myklebust@netapp.com> wrote: >> -----Original Message----- >> From: linux-nfs-owner@vger.kernel.org [mailto:linux-nfs- >> owner@vger.kernel.org] On Behalf Of Idan Kedar >> Sent: Thursday, August 09, 2012 9:03 AM >> To: Boaz Harrosh; NFS list >> Cc: Benny Halevy >> Subject: return layout on error, BUG/deadlock >> >> Hi, >> >> As a result of some experiments, I wanted to see what happens when I >> inject an error (hard coded) to the object layout driver. the patch is at the >> bottom of this mail. the reason I did this is because when I inject errors in my >> modified version of the object layout driver, I get the same BUG Tigran >> reported about yesterday: >> nfs4proc.c:6252 : BUG_ON(!list_empty(&lo->plh_segs)); >> >> In my modified version (based on kernel 3.3), the bug seems to be that >> pnfs_ld_write_done calls pnfs_return_layout in the error path, even if there >> is in-flight I/O. > > That is not a bug. It is an intentional change in order to allow the MDS to fence off the outstanding writes (if it can do so) before we retransmit them as write-through-MDS. Otherwise, you risk races between the outstanding writes-to-DS and the new writes-through-MDS. to what change are you referring? > > See the changelog in the patch that I sent to the list yesterday. > I saw that, and if I'm not mistaken these races apply to object layout as well, and in any case they apply in my case. However, it is not easy to mess around with LAYOUTRETURN in object layout, and there have been several discussions on the issue. In one of these discussions Benny clarified that the object layout client must wait for all in-flight I/O to end. So for file layout it probably makes sense, but object layout (and if I understand correctly, block layout as well) something else needs to be done. I thought about sync wait when returning the layout on error, but according to Boaz it will cause deadlocks (Boaz - can you please elaborate?). And come to think of it, nfs4_proc_setattr also returns the layout when there may be I/O in-flight (correct me if i'm wrong). So I guess pnfs_return_layout should somehow solve this by itself by saying "if this is fencing (a flag which will be set for file layout only), go ahead, otherwise make the layout as 'needs to be returned' and when the lseg lists gets empty return the layout". Comments? > -- > Trond Myklebust > Linux NFS client maintainer > > NetApp > Trond.Myklebust@netapp.com > www.netapp.com > >
On Thu, 2012-08-09 at 18:49 +0300, Idan Kedar wrote: > On Thu, Aug 9, 2012 at 5:05 PM, Myklebust, Trond > <Trond.Myklebust@netapp.com> wrote: > >> -----Original Message----- > >> From: linux-nfs-owner@vger.kernel.org [mailto:linux-nfs- > >> owner@vger.kernel.org] On Behalf Of Idan Kedar > >> Sent: Thursday, August 09, 2012 9:03 AM > >> To: Boaz Harrosh; NFS list > >> Cc: Benny Halevy > >> Subject: return layout on error, BUG/deadlock > >> > >> Hi, > >> > >> As a result of some experiments, I wanted to see what happens when I > >> inject an error (hard coded) to the object layout driver. the patch is at the > >> bottom of this mail. the reason I did this is because when I inject errors in my > >> modified version of the object layout driver, I get the same BUG Tigran > >> reported about yesterday: > >> nfs4proc.c:6252 : BUG_ON(!list_empty(&lo->plh_segs)); > >> > >> In my modified version (based on kernel 3.3), the bug seems to be that > >> pnfs_ld_write_done calls pnfs_return_layout in the error path, even if there > >> is in-flight I/O. > > > > That is not a bug. It is an intentional change in order to allow the MDS to fence off the outstanding writes (if it can do so) before we retransmit them as write-through-MDS. Otherwise, you risk races between the outstanding writes-to-DS and the new writes-through-MDS. > > to what change are you referring? As I stated in the changelog of the patch that I sent to the list yesterday, the behaviour is due to commit 0a57cdac3f. > > > > See the changelog in the patch that I sent to the list yesterday. > > > > I saw that, and if I'm not mistaken these races apply to object layout > as well, and in any case they apply in my case. However, it is not > easy to mess around with LAYOUTRETURN in object layout, and there have > been several discussions on the issue. In one of these discussions > Benny clarified that the object layout client must wait for all > in-flight I/O to end. If the problem is that the DS is failing to respond, how does the client know that the in-flight I/O has ended? > So for file layout it probably makes sense, but object layout (and if > I understand correctly, block layout as well) something else needs to > be done. I thought about sync wait when returning the layout on error, > but according to Boaz it will cause deadlocks (Boaz - can you please > elaborate?). The object layoutreturn has the ability to pass a timeout error value to the MDS precisely in order to allow the latter to deal with this kind of issue. See the description of struct pnfs_osd_ioerr4 in rfc5664. The block layout is adding the same ability to layoutreturn in NFSv4.2 (see draft-ietf-nfsv4-minorversion2-13.txt) via the struct layoutreturn_device_error4, so presumably they too have a plan for dealing with this kind of issue. > And come to think of it, nfs4_proc_setattr also returns the layout > when there may be I/O in-flight (correct me if i'm wrong). So I guess > pnfs_return_layout should somehow solve this by itself by saying "if > this is fencing (a flag which will be set for file layout only), go > ahead, otherwise make the layout as 'needs to be returned' and when > the lseg lists gets empty return the layout". The only layout type that sets the PNFS_LAYOUTRET_ON_SETATTR flag is objects, so that question needs to be directed to Boaz. -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com
On Fri, Aug 10, 2012 at 12:06 AM, Myklebust, Trond <Trond.Myklebust@netapp.com> wrote: > On Thu, 2012-08-09 at 18:49 +0300, Idan Kedar wrote: >> On Thu, Aug 9, 2012 at 5:05 PM, Myklebust, Trond >> <Trond.Myklebust@netapp.com> wrote: >> >> -----Original Message----- >> >> From: linux-nfs-owner@vger.kernel.org [mailto:linux-nfs- >> >> owner@vger.kernel.org] On Behalf Of Idan Kedar >> >> Sent: Thursday, August 09, 2012 9:03 AM >> >> To: Boaz Harrosh; NFS list >> >> Cc: Benny Halevy >> >> Subject: return layout on error, BUG/deadlock >> >> >> >> Hi, >> >> >> >> As a result of some experiments, I wanted to see what happens when I >> >> inject an error (hard coded) to the object layout driver. the patch is at the >> >> bottom of this mail. the reason I did this is because when I inject errors in my >> >> modified version of the object layout driver, I get the same BUG Tigran >> >> reported about yesterday: >> >> nfs4proc.c:6252 : BUG_ON(!list_empty(&lo->plh_segs)); >> >> >> >> In my modified version (based on kernel 3.3), the bug seems to be that >> >> pnfs_ld_write_done calls pnfs_return_layout in the error path, even if there >> >> is in-flight I/O. >> > >> > That is not a bug. It is an intentional change in order to allow the MDS to fence off the outstanding writes (if it can do so) before we retransmit them as write-through-MDS. Otherwise, you risk races between the outstanding writes-to-DS and the new writes-through-MDS. >> >> to what change are you referring? > > As I stated in the changelog of the patch that I sent to the list > yesterday, the behaviour is due to commit 0a57cdac3f. > >> > >> > See the changelog in the patch that I sent to the list yesterday. >> > >> >> I saw that, and if I'm not mistaken these races apply to object layout >> as well, and in any case they apply in my case. However, it is not >> easy to mess around with LAYOUTRETURN in object layout, and there have >> been several discussions on the issue. In one of these discussions >> Benny clarified that the object layout client must wait for all >> in-flight I/O to end. > > If the problem is that the DS is failing to respond, how does the client > know that the in-flight I/O has ended? > >> So for file layout it probably makes sense, but object layout (and if >> I understand correctly, block layout as well) something else needs to >> be done. I thought about sync wait when returning the layout on error, >> but according to Boaz it will cause deadlocks (Boaz - can you please >> elaborate?). > > The object layoutreturn has the ability to pass a timeout error value to > the MDS precisely in order to allow the latter to deal with this kind of > issue. See the description of struct pnfs_osd_ioerr4 in rfc5664. > > The block layout is adding the same ability to layoutreturn in NFSv4.2 > (see draft-ietf-nfsv4-minorversion2-13.txt) via the struct > layoutreturn_device_error4, so presumably they too have a plan for > dealing with this kind of issue. It is one thing to tell MDS that there is DS access error by sending layoutreturn, and it is another thing to return a layout even if there is overlapping in-flight DS IO... I certainly agree that client is entitled to return layout to inform MDS about DS errors and also avoid possible cb_layoutrecall. But it is just an optimization and should only be done when there is no in-flight IO (at least for block layout) IMHO. Thanks, Tao > >> And come to think of it, nfs4_proc_setattr also returns the layout >> when there may be I/O in-flight (correct me if i'm wrong). So I guess >> pnfs_return_layout should somehow solve this by itself by saying "if >> this is fencing (a flag which will be set for file layout only), go >> ahead, otherwise make the layout as 'needs to be returned' and when >> the lseg lists gets empty return the layout". > > The only layout type that sets the PNFS_LAYOUTRET_ON_SETATTR flag is > objects, so that question needs to be directed to Boaz. > > -- > Trond Myklebust > Linux NFS client maintainer > > NetApp > Trond.Myklebust@netapp.com > www.netapp.com > -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2012-08-10 at 00:34 +0800, Peng Tao wrote: > On Fri, Aug 10, 2012 at 12:06 AM, Myklebust, Trond > <Trond.Myklebust@netapp.com> wrote: > > On Thu, 2012-08-09 at 18:49 +0300, Idan Kedar wrote: > >> On Thu, Aug 9, 2012 at 5:05 PM, Myklebust, Trond > >> <Trond.Myklebust@netapp.com> wrote: > >> >> -----Original Message----- > >> >> From: linux-nfs-owner@vger.kernel.org [mailto:linux-nfs- > >> >> owner@vger.kernel.org] On Behalf Of Idan Kedar > >> >> Sent: Thursday, August 09, 2012 9:03 AM > >> >> To: Boaz Harrosh; NFS list > >> >> Cc: Benny Halevy > >> >> Subject: return layout on error, BUG/deadlock > >> >> > >> >> Hi, > >> >> > >> >> As a result of some experiments, I wanted to see what happens when I > >> >> inject an error (hard coded) to the object layout driver. the patch is at the > >> >> bottom of this mail. the reason I did this is because when I inject errors in my > >> >> modified version of the object layout driver, I get the same BUG Tigran > >> >> reported about yesterday: > >> >> nfs4proc.c:6252 : BUG_ON(!list_empty(&lo->plh_segs)); > >> >> > >> >> In my modified version (based on kernel 3.3), the bug seems to be that > >> >> pnfs_ld_write_done calls pnfs_return_layout in the error path, even if there > >> >> is in-flight I/O. > >> > > >> > That is not a bug. It is an intentional change in order to allow the MDS to fence off the outstanding writes (if it can do so) before we retransmit them as write-through-MDS. Otherwise, you risk races between the outstanding writes-to-DS and the new writes-through-MDS. > >> > >> to what change are you referring? > > > > As I stated in the changelog of the patch that I sent to the list > > yesterday, the behaviour is due to commit 0a57cdac3f. > > > >> > > >> > See the changelog in the patch that I sent to the list yesterday. > >> > > >> > >> I saw that, and if I'm not mistaken these races apply to object layout > >> as well, and in any case they apply in my case. However, it is not > >> easy to mess around with LAYOUTRETURN in object layout, and there have > >> been several discussions on the issue. In one of these discussions > >> Benny clarified that the object layout client must wait for all > >> in-flight I/O to end. > > > > If the problem is that the DS is failing to respond, how does the client > > know that the in-flight I/O has ended? > > > >> So for file layout it probably makes sense, but object layout (and if > >> I understand correctly, block layout as well) something else needs to > >> be done. I thought about sync wait when returning the layout on error, > >> but according to Boaz it will cause deadlocks (Boaz - can you please > >> elaborate?). > > > > The object layoutreturn has the ability to pass a timeout error value to > > the MDS precisely in order to allow the latter to deal with this kind of > > issue. See the description of struct pnfs_osd_ioerr4 in rfc5664. > > > > The block layout is adding the same ability to layoutreturn in NFSv4.2 > > (see draft-ietf-nfsv4-minorversion2-13.txt) via the struct > > layoutreturn_device_error4, so presumably they too have a plan for > > dealing with this kind of issue. > It is one thing to tell MDS that there is DS access error by sending > layoutreturn, and it is another thing to return a layout even if there > is overlapping in-flight DS IO... > > I certainly agree that client is entitled to return layout to inform > MDS about DS errors and also avoid possible cb_layoutrecall. But it is > just an optimization and should only be done when there is no > in-flight IO (at least for block layout) IMHO. HOW DO YOU GUARANTEE NO IN-FLIGHT IO? Repeating the same mantra about 'no in-flight IO' that doesn't apply to timeout situations isn't helpful. A TIMEOUT means that you have NO IDEA if the data is still in flight or not. That's when you need fencing, and the only thing that can supply fencing in that situation is the MDS. -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com
On Fri, Aug 10, 2012 at 12:37 AM, Myklebust, Trond <Trond.Myklebust@netapp.com> wrote: > On Fri, 2012-08-10 at 00:34 +0800, Peng Tao wrote: >> On Fri, Aug 10, 2012 at 12:06 AM, Myklebust, Trond >> <Trond.Myklebust@netapp.com> wrote: >> > On Thu, 2012-08-09 at 18:49 +0300, Idan Kedar wrote: >> >> On Thu, Aug 9, 2012 at 5:05 PM, Myklebust, Trond >> >> <Trond.Myklebust@netapp.com> wrote: >> >> >> -----Original Message----- >> >> >> From: linux-nfs-owner@vger.kernel.org [mailto:linux-nfs- >> >> >> owner@vger.kernel.org] On Behalf Of Idan Kedar >> >> >> Sent: Thursday, August 09, 2012 9:03 AM >> >> >> To: Boaz Harrosh; NFS list >> >> >> Cc: Benny Halevy >> >> >> Subject: return layout on error, BUG/deadlock >> >> >> >> >> >> Hi, >> >> >> >> >> >> As a result of some experiments, I wanted to see what happens when I >> >> >> inject an error (hard coded) to the object layout driver. the patch is at the >> >> >> bottom of this mail. the reason I did this is because when I inject errors in my >> >> >> modified version of the object layout driver, I get the same BUG Tigran >> >> >> reported about yesterday: >> >> >> nfs4proc.c:6252 : BUG_ON(!list_empty(&lo->plh_segs)); >> >> >> >> >> >> In my modified version (based on kernel 3.3), the bug seems to be that >> >> >> pnfs_ld_write_done calls pnfs_return_layout in the error path, even if there >> >> >> is in-flight I/O. >> >> > >> >> > That is not a bug. It is an intentional change in order to allow the MDS to fence off the outstanding writes (if it can do so) before we retransmit them as write-through-MDS. Otherwise, you risk races between the outstanding writes-to-DS and the new writes-through-MDS. >> >> >> >> to what change are you referring? >> > >> > As I stated in the changelog of the patch that I sent to the list >> > yesterday, the behaviour is due to commit 0a57cdac3f. >> > >> >> > >> >> > See the changelog in the patch that I sent to the list yesterday. >> >> > >> >> >> >> I saw that, and if I'm not mistaken these races apply to object layout >> >> as well, and in any case they apply in my case. However, it is not >> >> easy to mess around with LAYOUTRETURN in object layout, and there have >> >> been several discussions on the issue. In one of these discussions >> >> Benny clarified that the object layout client must wait for all >> >> in-flight I/O to end. >> > >> > If the problem is that the DS is failing to respond, how does the client >> > know that the in-flight I/O has ended? >> > >> >> So for file layout it probably makes sense, but object layout (and if >> >> I understand correctly, block layout as well) something else needs to >> >> be done. I thought about sync wait when returning the layout on error, >> >> but according to Boaz it will cause deadlocks (Boaz - can you please >> >> elaborate?). >> > >> > The object layoutreturn has the ability to pass a timeout error value to >> > the MDS precisely in order to allow the latter to deal with this kind of >> > issue. See the description of struct pnfs_osd_ioerr4 in rfc5664. >> > >> > The block layout is adding the same ability to layoutreturn in NFSv4.2 >> > (see draft-ietf-nfsv4-minorversion2-13.txt) via the struct >> > layoutreturn_device_error4, so presumably they too have a plan for >> > dealing with this kind of issue. >> It is one thing to tell MDS that there is DS access error by sending >> layoutreturn, and it is another thing to return a layout even if there >> is overlapping in-flight DS IO... >> >> I certainly agree that client is entitled to return layout to inform >> MDS about DS errors and also avoid possible cb_layoutrecall. But it is >> just an optimization and should only be done when there is no >> in-flight IO (at least for block layout) IMHO. > > HOW DO YOU GUARANTEE NO IN-FLIGHT IO? > I don't. That's why I don't return layout in pnfs_ld_write_done(). And for layoutreturn upon cb_layoutreturn, block layout client needs to do timed-lease IO fencing per rfc5663, but it is not implemented in Linux client. > Repeating the same mantra about 'no in-flight IO' that doesn't apply to > timeout situations isn't helpful. > > A TIMEOUT means that you have NO IDEA if the data is still in flight or > not. That's when you need fencing, and the only thing that can supply > fencing in that situation is the MDS. > > -- > Trond Myklebust > Linux NFS client maintainer > > NetApp > Trond.Myklebust@netapp.com > www.netapp.com >
On Thu, Aug 9, 2012 at 7:06 PM, Myklebust, Trond <Trond.Myklebust@netapp.com> wrote: > On Thu, 2012-08-09 at 18:49 +0300, Idan Kedar wrote: >> On Thu, Aug 9, 2012 at 5:05 PM, Myklebust, Trond >> <Trond.Myklebust@netapp.com> wrote: >> >> -----Original Message----- >> >> From: linux-nfs-owner@vger.kernel.org [mailto:linux-nfs- >> >> owner@vger.kernel.org] On Behalf Of Idan Kedar >> >> Sent: Thursday, August 09, 2012 9:03 AM >> >> To: Boaz Harrosh; NFS list >> >> Cc: Benny Halevy >> >> Subject: return layout on error, BUG/deadlock >> >> >> >> Hi, >> >> >> >> As a result of some experiments, I wanted to see what happens when I >> >> inject an error (hard coded) to the object layout driver. the patch is at the >> >> bottom of this mail. the reason I did this is because when I inject errors in my >> >> modified version of the object layout driver, I get the same BUG Tigran >> >> reported about yesterday: >> >> nfs4proc.c:6252 : BUG_ON(!list_empty(&lo->plh_segs)); >> >> >> >> In my modified version (based on kernel 3.3), the bug seems to be that >> >> pnfs_ld_write_done calls pnfs_return_layout in the error path, even if there >> >> is in-flight I/O. >> > >> > That is not a bug. It is an intentional change in order to allow the MDS to fence off the outstanding writes (if it can do so) before we retransmit them as write-through-MDS. Otherwise, you risk races between the outstanding writes-to-DS and the new writes-through-MDS. >> >> to what change are you referring? > > As I stated in the changelog of the patch that I sent to the list > yesterday, the behaviour is due to commit 0a57cdac3f. > I'm not sure I'm following. I was talking about the state of the code v3.5, not to any specific patch/change. since the client yelled "BUG!!!" and crashed, there is some bug involved, either because the BUG() is correct or because the presence of the BUG() is the bug itself. >> > >> > See the changelog in the patch that I sent to the list yesterday. >> > >> >> I saw that, and if I'm not mistaken these races apply to object layout >> as well, and in any case they apply in my case. However, it is not >> easy to mess around with LAYOUTRETURN in object layout, and there have >> been several discussions on the issue. In one of these discussions >> Benny clarified that the object layout client must wait for all >> in-flight I/O to end. > > If the problem is that the DS is failing to respond, how does the client > know that the in-flight I/O has ended? > after the last rpc_call_done is called there is no I/O from the client's perspective an the layout can be safely returned, after which I/O can be done through the MDS. Is there I/O pending from the DS perspective? maybe, but all you can do is send I/O via MDS and hope it will not race. fencing doesn't really save you, it just improves your odds where applicable, i.e. file layout. >> So for file layout it probably makes sense, but object layout (and if >> I understand correctly, block layout as well) something else needs to >> be done. I thought about sync wait when returning the layout on error, >> but according to Boaz it will cause deadlocks (Boaz - can you please >> elaborate?). > > The object layoutreturn has the ability to pass a timeout error value to > the MDS precisely in order to allow the latter to deal with this kind of > issue. See the description of struct pnfs_osd_ioerr4 in rfc5664. > > The block layout is adding the same ability to layoutreturn in NFSv4.2 > (see draft-ietf-nfsv4-minorversion2-13.txt) via the struct > layoutreturn_device_error4, so presumably they too have a plan for > dealing with this kind of issue. > I'll wait for Boaz (or someone else) to explain what exactly is "this kind of issue". I still don't know why sync wait would deadlock. >> And come to think of it, nfs4_proc_setattr also returns the layout >> when there may be I/O in-flight (correct me if i'm wrong). So I guess >> pnfs_return_layout should somehow solve this by itself by saying "if >> this is fencing (a flag which will be set for file layout only), go >> ahead, otherwise make the layout as 'needs to be returned' and when >> the lseg lists gets empty return the layout". > > The only layout type that sets the PNFS_LAYOUTRET_ON_SETATTR flag is > objects, so that question needs to be directed to Boaz. > Yes, this entire thread is mainly directed to Boaz... And IIRC he did want to implement something of that sort, and this thread was basically for asking him "did you?". > -- > Trond Myklebust > Linux NFS client maintainer > > NetApp > Trond.Myklebust@netapp.com > www.netapp.com >
T24gVGh1LCAyMDEyLTA4LTA5IGF0IDE5OjU0ICswMzAwLCBJZGFuIEtlZGFyIHdyb3RlOg0KPiBP biBUaHUsIEF1ZyA5LCAyMDEyIGF0IDc6MDYgUE0sIE15a2xlYnVzdCwgVHJvbmQNCj4gPFRyb25k Lk15a2xlYnVzdEBuZXRhcHAuY29tPiB3cm90ZToNCj4gPiBPbiBUaHUsIDIwMTItMDgtMDkgYXQg MTg6NDkgKzAzMDAsIElkYW4gS2VkYXIgd3JvdGU6DQo+ID4+IE9uIFRodSwgQXVnIDksIDIwMTIg YXQgNTowNSBQTSwgTXlrbGVidXN0LCBUcm9uZA0KPiA+PiA8VHJvbmQuTXlrbGVidXN0QG5ldGFw cC5jb20+IHdyb3RlOg0KPiA+PiA+PiAtLS0tLU9yaWdpbmFsIE1lc3NhZ2UtLS0tLQ0KPiA+PiA+ PiBGcm9tOiBsaW51eC1uZnMtb3duZXJAdmdlci5rZXJuZWwub3JnIFttYWlsdG86bGludXgtbmZz LQ0KPiA+PiA+PiBvd25lckB2Z2VyLmtlcm5lbC5vcmddIE9uIEJlaGFsZiBPZiBJZGFuIEtlZGFy DQo+ID4+ID4+IFNlbnQ6IFRodXJzZGF5LCBBdWd1c3QgMDksIDIwMTIgOTowMyBBTQ0KPiA+PiA+ PiBUbzogQm9heiBIYXJyb3NoOyBORlMgbGlzdA0KPiA+PiA+PiBDYzogQmVubnkgSGFsZXZ5DQo+ ID4+ID4+IFN1YmplY3Q6IHJldHVybiBsYXlvdXQgb24gZXJyb3IsIEJVRy9kZWFkbG9jaw0KPiA+ PiA+Pg0KPiA+PiA+PiBIaSwNCj4gPj4gPj4NCj4gPj4gPj4gQXMgYSByZXN1bHQgb2Ygc29tZSBl eHBlcmltZW50cywgSSB3YW50ZWQgdG8gc2VlIHdoYXQgaGFwcGVucyB3aGVuIEkNCj4gPj4gPj4g aW5qZWN0IGFuIGVycm9yIChoYXJkIGNvZGVkKSB0byB0aGUgb2JqZWN0IGxheW91dCBkcml2ZXIu IHRoZSBwYXRjaCBpcyBhdCB0aGUNCj4gPj4gPj4gYm90dG9tIG9mIHRoaXMgbWFpbC4gdGhlIHJl YXNvbiBJIGRpZCB0aGlzIGlzIGJlY2F1c2Ugd2hlbiBJIGluamVjdCBlcnJvcnMgaW4gbXkNCj4g Pj4gPj4gbW9kaWZpZWQgdmVyc2lvbiBvZiB0aGUgb2JqZWN0IGxheW91dCBkcml2ZXIsIEkgZ2V0 IHRoZSBzYW1lIEJVRyBUaWdyYW4NCj4gPj4gPj4gcmVwb3J0ZWQgYWJvdXQgeWVzdGVyZGF5Og0K PiA+PiA+PiBuZnM0cHJvYy5jOjYyNTIgOiAgIEJVR19PTighbGlzdF9lbXB0eSgmbG8tPnBsaF9z ZWdzKSk7DQo+ID4+ID4+DQo+ID4+ID4+IEluIG15IG1vZGlmaWVkIHZlcnNpb24gKGJhc2VkIG9u IGtlcm5lbCAzLjMpLCB0aGUgYnVnIHNlZW1zIHRvIGJlIHRoYXQNCj4gPj4gPj4gcG5mc19sZF93 cml0ZV9kb25lIGNhbGxzIHBuZnNfcmV0dXJuX2xheW91dCBpbiB0aGUgZXJyb3IgcGF0aCwgZXZl biBpZiB0aGVyZQ0KPiA+PiA+PiBpcyBpbi1mbGlnaHQgSS9PLg0KPiA+PiA+DQo+ID4+ID4gVGhh dCBpcyBub3QgYSBidWcuIEl0IGlzIGFuIGludGVudGlvbmFsIGNoYW5nZSBpbiBvcmRlciB0byBh bGxvdyB0aGUgTURTIHRvIGZlbmNlIG9mZiB0aGUgb3V0c3RhbmRpbmcgd3JpdGVzIChpZiBpdCBj YW4gZG8gc28pIGJlZm9yZSB3ZSByZXRyYW5zbWl0IHRoZW0gYXMgd3JpdGUtdGhyb3VnaC1NRFMu IE90aGVyd2lzZSwgeW91IHJpc2sgcmFjZXMgYmV0d2VlbiB0aGUgb3V0c3RhbmRpbmcgd3JpdGVz LXRvLURTIGFuZCB0aGUgbmV3IHdyaXRlcy10aHJvdWdoLU1EUy4NCj4gPj4NCj4gPj4gdG8gd2hh dCBjaGFuZ2UgYXJlIHlvdSByZWZlcnJpbmc/DQo+ID4NCj4gPiBBcyBJIHN0YXRlZCBpbiB0aGUg Y2hhbmdlbG9nIG9mIHRoZSBwYXRjaCB0aGF0IEkgc2VudCB0byB0aGUgbGlzdA0KPiA+IHllc3Rl cmRheSwgdGhlIGJlaGF2aW91ciBpcyBkdWUgdG8gY29tbWl0IDBhNTdjZGFjM2YuDQo+ID4NCj4g DQo+IEknbSBub3Qgc3VyZSBJJ20gZm9sbG93aW5nLiBJIHdhcyB0YWxraW5nIGFib3V0IHRoZSBz dGF0ZSBvZiB0aGUgY29kZQ0KPiB2My41LCBub3QgdG8gYW55IHNwZWNpZmljIHBhdGNoL2NoYW5n ZS4gc2luY2UgdGhlIGNsaWVudCB5ZWxsZWQNCj4gIkJVRyEhISIgYW5kIGNyYXNoZWQsIHRoZXJl IGlzIHNvbWUgYnVnIGludm9sdmVkLCBlaXRoZXIgYmVjYXVzZSB0aGUNCj4gQlVHKCkgaXMgY29y cmVjdCBvciBiZWNhdXNlIHRoZSBwcmVzZW5jZSBvZiB0aGUgQlVHKCkgaXMgdGhlIGJ1Zw0KPiBp dHNlbGYuDQo+IA0KPiA+PiA+DQo+ID4+ID4gU2VlIHRoZSBjaGFuZ2Vsb2cgaW4gdGhlIHBhdGNo IHRoYXQgSSBzZW50IHRvIHRoZSBsaXN0IHllc3RlcmRheS4NCj4gPj4gPg0KPiA+Pg0KPiA+PiBJ IHNhdyB0aGF0LCBhbmQgaWYgSSdtIG5vdCBtaXN0YWtlbiB0aGVzZSByYWNlcyBhcHBseSB0byBv YmplY3QgbGF5b3V0DQo+ID4+IGFzIHdlbGwsIGFuZCBpbiBhbnkgY2FzZSB0aGV5IGFwcGx5IGlu IG15IGNhc2UuIEhvd2V2ZXIsIGl0IGlzIG5vdA0KPiA+PiBlYXN5IHRvIG1lc3MgYXJvdW5kIHdp dGggTEFZT1VUUkVUVVJOIGluIG9iamVjdCBsYXlvdXQsIGFuZCB0aGVyZSBoYXZlDQo+ID4+IGJl ZW4gc2V2ZXJhbCBkaXNjdXNzaW9ucyBvbiB0aGUgaXNzdWUuIEluIG9uZSBvZiB0aGVzZSBkaXNj dXNzaW9ucw0KPiA+PiBCZW5ueSBjbGFyaWZpZWQgdGhhdCB0aGUgb2JqZWN0IGxheW91dCBjbGll bnQgbXVzdCB3YWl0IGZvciBhbGwNCj4gPj4gaW4tZmxpZ2h0IEkvTyB0byBlbmQuDQo+ID4NCj4g PiBJZiB0aGUgcHJvYmxlbSBpcyB0aGF0IHRoZSBEUyBpcyBmYWlsaW5nIHRvIHJlc3BvbmQsIGhv dyBkb2VzIHRoZSBjbGllbnQNCj4gPiBrbm93IHRoYXQgdGhlIGluLWZsaWdodCBJL08gaGFzIGVu ZGVkPw0KPiA+DQo+IA0KPiBhZnRlciB0aGUgbGFzdCBycGNfY2FsbF9kb25lIGlzIGNhbGxlZCB0 aGVyZSBpcyBubyBJL08gZnJvbSB0aGUNCj4gY2xpZW50J3MgcGVyc3BlY3RpdmUgYW4gdGhlIGxh eW91dCBjYW4gYmUgc2FmZWx5IHJldHVybmVkLCBhZnRlciB3aGljaA0KPiBJL08gY2FuIGJlIGRv bmUgdGhyb3VnaCB0aGUgTURTLg0KDQpOby4gWW91IGFyZSBjb25mdXNlZC4NCg0KQmVpbmcgaW4g cnBjX2NhbGxfZG9uZSBqdXN0IHRlbGxzIHlvdSB0aGF0IHRoZSBjbGllbnQgdGhpbmtzIGl0IGlz IGRvbmUuDQpJdCB0ZWxscyB5b3Ugbm90aGluZyBhYm91dCB3aGF0IHRoZSBEUyBpcyBkb2luZy4g SXQgbWF5IGp1c3Qgc2xvb29vb3dseQ0KYmUgcHJvY2Vzc2luZyB0aGUgV1JJVEUgY2FsbHMgZm9y IGFsbCB5b3Uga25vdy4NCg0KVW5sZXNzIHlvdSBnZXQgYSByZXNwb25zZSBmcm9tIHRoZSBEUyBz YXlpbmcgIkknbSBkb25lIiwgb3IgeW91IGhhdmUNCnNvbWUgb3RoZXIgbWVjaGFuaXNtIHRvIGd1 YXJhbnRlZSB0aGF0IHRoZSBEUyBpcyBkb25lIHdoZW4geW91IHRpbWUgb3V0LA0KdGhlbiB5b3Ug aGF2ZSBubyBndWFyYW50ZWUgdGhhdCBpdCBpcyBzYWZlIHRvIHNlbmQgdG8gTURTLg0KDQo+IElz IHRoZXJlIEkvTyBwZW5kaW5nIGZyb20gdGhlIERTIHBlcnNwZWN0aXZlPyBtYXliZSwgYnV0IGFs bCB5b3UgY2FuDQo+IGRvIGlzIHNlbmQgSS9PIHZpYSBNRFMgYW5kIGhvcGUgaXQgd2lsbCBub3Qg cmFjZS4gZmVuY2luZyBkb2Vzbid0DQo+IHJlYWxseSBzYXZlIHlvdSwgaXQganVzdCBpbXByb3Zl cyB5b3VyIG9kZHMgd2hlcmUgYXBwbGljYWJsZSwgaS5lLg0KPiBmaWxlIGxheW91dC4NCg0KQWN0 dWFsbHksIHdoZW4gcmVhZGluZyB0aGUgaVNDU0kgc3BlYywgaXQgaW1wbGllcyB0aGF0IHRoZXJl IGlzIGEga25vd24NCnNlc3Npb24gdGltZW91dCB0aGF0IGlzIG5lZ290aWF0ZWQgYmV0d2VlbiB0 aGUgaW5pdGlhdG9yIGFuZCB0YXJnZXQuDQpBZnRlciB0aGUgc2Vzc2lvbiBleHBpcmVzLCB5b3Ug YXJlIGd1YXJhbnRlZWQgdGhhdCBubyBmdXJ0aGVyIGNvbW1hbmRzDQphcmUgcnVubmluZyBvbiB0 aGUgRFMuDQoNClRoZSBpRkNQIHByb3RvY29sIGF0dGFjaGVzIGEgdGltZW91dCB2YWx1ZSB0byBl YWNoIGZyYW1lLCBhbmQgaXMNCnN1cHBvc2VkIHRvIGVuZm9yY2UgdGhhdCB0aW1lb3V0Lg0KDQpT byBhdCBsZWFzdCB0aG9zZSBwcm90b2NvbHMgc2hvdWxkIGhhdmUgYSBkZXRlcm1pbmlzdGljIGJl aGF2aW91ciB0aGF0DQpjYW4gYmUgdXNlZCBieSB0aGUgY2xpZW50Lg0KDQo+ID4+IFNvIGZvciBm aWxlIGxheW91dCBpdCBwcm9iYWJseSBtYWtlcyBzZW5zZSwgYnV0IG9iamVjdCBsYXlvdXQgKGFu ZCBpZg0KPiA+PiBJIHVuZGVyc3RhbmQgY29ycmVjdGx5LCBibG9jayBsYXlvdXQgYXMgd2VsbCkg c29tZXRoaW5nIGVsc2UgbmVlZHMgdG8NCj4gPj4gYmUgZG9uZS4gSSB0aG91Z2h0IGFib3V0IHN5 bmMgd2FpdCB3aGVuIHJldHVybmluZyB0aGUgbGF5b3V0IG9uIGVycm9yLA0KPiA+PiBidXQgYWNj b3JkaW5nIHRvIEJvYXogaXQgd2lsbCBjYXVzZSBkZWFkbG9ja3MgKEJvYXogLSBjYW4geW91IHBs ZWFzZQ0KPiA+PiBlbGFib3JhdGU/KS4NCj4gPg0KPiA+IFRoZSBvYmplY3QgbGF5b3V0cmV0dXJu IGhhcyB0aGUgYWJpbGl0eSB0byBwYXNzIGEgdGltZW91dCBlcnJvciB2YWx1ZSB0bw0KPiA+IHRo ZSBNRFMgcHJlY2lzZWx5IGluIG9yZGVyIHRvIGFsbG93IHRoZSBsYXR0ZXIgdG8gZGVhbCB3aXRo IHRoaXMga2luZCBvZg0KPiA+IGlzc3VlLiBTZWUgdGhlIGRlc2NyaXB0aW9uIG9mIHN0cnVjdCBw bmZzX29zZF9pb2VycjQgaW4gcmZjNTY2NC4NCj4gPg0KPiA+IFRoZSBibG9jayBsYXlvdXQgaXMg YWRkaW5nIHRoZSBzYW1lIGFiaWxpdHkgdG8gbGF5b3V0cmV0dXJuIGluIE5GU3Y0LjINCj4gPiAo c2VlIGRyYWZ0LWlldGYtbmZzdjQtbWlub3J2ZXJzaW9uMi0xMy50eHQpIHZpYSB0aGUgc3RydWN0 DQo+ID4gbGF5b3V0cmV0dXJuX2RldmljZV9lcnJvcjQsIHNvIHByZXN1bWFibHkgdGhleSB0b28g aGF2ZSBhIHBsYW4gZm9yDQo+ID4gZGVhbGluZyB3aXRoIHRoaXMga2luZCBvZiBpc3N1ZS4NCj4g Pg0KPiANCj4gSSdsbCB3YWl0IGZvciBCb2F6IChvciBzb21lb25lIGVsc2UpIHRvIGV4cGxhaW4g d2hhdCBleGFjdGx5IGlzICJ0aGlzDQo+IGtpbmQgb2YgaXNzdWUiLiBJIHN0aWxsIGRvbid0IGtu b3cgd2h5IHN5bmMgd2FpdCB3b3VsZCBkZWFkbG9jay4NCj4gDQo+ID4+IEFuZCBjb21lIHRvIHRo aW5rIG9mIGl0LCBuZnM0X3Byb2Nfc2V0YXR0ciBhbHNvIHJldHVybnMgdGhlIGxheW91dA0KPiA+ PiB3aGVuIHRoZXJlIG1heSBiZSBJL08gaW4tZmxpZ2h0IChjb3JyZWN0IG1lIGlmIGknbSB3cm9u ZykuIFNvIEkgZ3Vlc3MNCj4gPj4gcG5mc19yZXR1cm5fbGF5b3V0IHNob3VsZCBzb21laG93IHNv bHZlIHRoaXMgYnkgaXRzZWxmIGJ5IHNheWluZyAiaWYNCj4gPj4gdGhpcyBpcyBmZW5jaW5nIChh IGZsYWcgd2hpY2ggd2lsbCBiZSBzZXQgZm9yIGZpbGUgbGF5b3V0IG9ubHkpLCBnbw0KPiA+PiBh aGVhZCwgb3RoZXJ3aXNlIG1ha2UgdGhlIGxheW91dCBhcyAnbmVlZHMgdG8gYmUgcmV0dXJuZWQn IGFuZCB3aGVuDQo+ID4+IHRoZSBsc2VnIGxpc3RzIGdldHMgZW1wdHkgcmV0dXJuIHRoZSBsYXlv dXQiLg0KPiA+DQo+ID4gVGhlIG9ubHkgbGF5b3V0IHR5cGUgdGhhdCBzZXRzIHRoZSBQTkZTX0xB WU9VVFJFVF9PTl9TRVRBVFRSIGZsYWcgaXMNCj4gPiBvYmplY3RzLCBzbyB0aGF0IHF1ZXN0aW9u IG5lZWRzIHRvIGJlIGRpcmVjdGVkIHRvIEJvYXouDQo+ID4NCj4gDQo+IFllcywgdGhpcyBlbnRp cmUgdGhyZWFkIGlzIG1haW5seSBkaXJlY3RlZCB0byBCb2F6Li4uIEFuZCBJSVJDIGhlIGRpZA0K PiB3YW50IHRvIGltcGxlbWVudCBzb21ldGhpbmcgb2YgdGhhdCBzb3J0LCBhbmQgdGhpcyB0aHJl YWQgd2FzDQo+IGJhc2ljYWxseSBmb3IgYXNraW5nIGhpbSAiZGlkIHlvdT8iLg0KPiANCj4gPiAt LQ0KPiA+IFRyb25kIE15a2xlYnVzdA0KPiA+IExpbnV4IE5GUyBjbGllbnQgbWFpbnRhaW5lcg0K PiA+DQo+ID4gTmV0QXBwDQo+ID4gVHJvbmQuTXlrbGVidXN0QG5ldGFwcC5jb20NCj4gPiB3d3cu bmV0YXBwLmNvbQ0KPiA+DQo+IA0KPiANCj4gDQoNCi0tIA0KVHJvbmQgTXlrbGVidXN0DQpMaW51 eCBORlMgY2xpZW50IG1haW50YWluZXINCg0KTmV0QXBwDQpUcm9uZC5NeWtsZWJ1c3RAbmV0YXBw LmNvbQ0Kd3d3Lm5ldGFwcC5jb20NCg0K -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2012-08-10 at 00:48 +0800, Peng Tao wrote: > On Fri, Aug 10, 2012 at 12:37 AM, Myklebust, Trond > <Trond.Myklebust@netapp.com> wrote: > > On Fri, 2012-08-10 at 00:34 +0800, Peng Tao wrote: > >> On Fri, Aug 10, 2012 at 12:06 AM, Myklebust, Trond > >> <Trond.Myklebust@netapp.com> wrote: > >> > On Thu, 2012-08-09 at 18:49 +0300, Idan Kedar wrote: > >> >> On Thu, Aug 9, 2012 at 5:05 PM, Myklebust, Trond > >> >> <Trond.Myklebust@netapp.com> wrote: > >> >> >> -----Original Message----- > >> >> >> From: linux-nfs-owner@vger.kernel.org [mailto:linux-nfs- > >> >> >> owner@vger.kernel.org] On Behalf Of Idan Kedar > >> >> >> Sent: Thursday, August 09, 2012 9:03 AM > >> >> >> To: Boaz Harrosh; NFS list > >> >> >> Cc: Benny Halevy > >> >> >> Subject: return layout on error, BUG/deadlock > >> >> >> > >> >> >> Hi, > >> >> >> > >> >> >> As a result of some experiments, I wanted to see what happens when I > >> >> >> inject an error (hard coded) to the object layout driver. the patch is at the > >> >> >> bottom of this mail. the reason I did this is because when I inject errors in my > >> >> >> modified version of the object layout driver, I get the same BUG Tigran > >> >> >> reported about yesterday: > >> >> >> nfs4proc.c:6252 : BUG_ON(!list_empty(&lo->plh_segs)); > >> >> >> > >> >> >> In my modified version (based on kernel 3.3), the bug seems to be that > >> >> >> pnfs_ld_write_done calls pnfs_return_layout in the error path, even if there > >> >> >> is in-flight I/O. > >> >> > > >> >> > That is not a bug. It is an intentional change in order to allow the MDS to fence off the outstanding writes (if it can do so) before we retransmit them as write-through-MDS. Otherwise, you risk races between the outstanding writes-to-DS and the new writes-through-MDS. > >> >> > >> >> to what change are you referring? > >> > > >> > As I stated in the changelog of the patch that I sent to the list > >> > yesterday, the behaviour is due to commit 0a57cdac3f. > >> > > >> >> > > >> >> > See the changelog in the patch that I sent to the list yesterday. > >> >> > > >> >> > >> >> I saw that, and if I'm not mistaken these races apply to object layout > >> >> as well, and in any case they apply in my case. However, it is not > >> >> easy to mess around with LAYOUTRETURN in object layout, and there have > >> >> been several discussions on the issue. In one of these discussions > >> >> Benny clarified that the object layout client must wait for all > >> >> in-flight I/O to end. > >> > > >> > If the problem is that the DS is failing to respond, how does the client > >> > know that the in-flight I/O has ended? > >> > > >> >> So for file layout it probably makes sense, but object layout (and if > >> >> I understand correctly, block layout as well) something else needs to > >> >> be done. I thought about sync wait when returning the layout on error, > >> >> but according to Boaz it will cause deadlocks (Boaz - can you please > >> >> elaborate?). > >> > > >> > The object layoutreturn has the ability to pass a timeout error value to > >> > the MDS precisely in order to allow the latter to deal with this kind of > >> > issue. See the description of struct pnfs_osd_ioerr4 in rfc5664. > >> > > >> > The block layout is adding the same ability to layoutreturn in NFSv4.2 > >> > (see draft-ietf-nfsv4-minorversion2-13.txt) via the struct > >> > layoutreturn_device_error4, so presumably they too have a plan for > >> > dealing with this kind of issue. > >> It is one thing to tell MDS that there is DS access error by sending > >> layoutreturn, and it is another thing to return a layout even if there > >> is overlapping in-flight DS IO... > >> > >> I certainly agree that client is entitled to return layout to inform > >> MDS about DS errors and also avoid possible cb_layoutrecall. But it is > >> just an optimization and should only be done when there is no > >> in-flight IO (at least for block layout) IMHO. > > > > HOW DO YOU GUARANTEE NO IN-FLIGHT IO? > > > I don't. That's why I don't return layout in pnfs_ld_write_done(). And > for layoutreturn upon cb_layoutreturn, block layout client needs to do > timed-lease IO fencing per rfc5663, but it is not implemented in Linux > client. The timed-lease IO fencing described in rfc5663 is about informing the server about how long the client expects a command to succeed or fail. It doesn't offer any advice for how the client is to deal with an unresponsive DS. What you need here is help from the underlying transport protocol. As I said in the email to Idan, when researching iSCSI and iFCP, I found what appears to be mechanisms for reliably timing out. -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@netapp.com www.netapp.com
On Thu, Aug 9, 2012 at 10:12 PM, Myklebust, Trond <Trond.Myklebust@netapp.com> wrote: > On Thu, 2012-08-09 at 19:54 +0300, Idan Kedar wrote: >> On Thu, Aug 9, 2012 at 7:06 PM, Myklebust, Trond >> <Trond.Myklebust@netapp.com> wrote: >> > On Thu, 2012-08-09 at 18:49 +0300, Idan Kedar wrote: >> >> On Thu, Aug 9, 2012 at 5:05 PM, Myklebust, Trond >> >> <Trond.Myklebust@netapp.com> wrote: >> >> >> -----Original Message----- >> >> >> From: linux-nfs-owner@vger.kernel.org [mailto:linux-nfs- >> >> >> owner@vger.kernel.org] On Behalf Of Idan Kedar >> >> >> Sent: Thursday, August 09, 2012 9:03 AM >> >> >> To: Boaz Harrosh; NFS list >> >> >> Cc: Benny Halevy >> >> >> Subject: return layout on error, BUG/deadlock >> >> >> >> >> >> Hi, >> >> >> >> >> >> As a result of some experiments, I wanted to see what happens when I >> >> >> inject an error (hard coded) to the object layout driver. the patch is at the >> >> >> bottom of this mail. the reason I did this is because when I inject errors in my >> >> >> modified version of the object layout driver, I get the same BUG Tigran >> >> >> reported about yesterday: >> >> >> nfs4proc.c:6252 : BUG_ON(!list_empty(&lo->plh_segs)); >> >> >> >> >> >> In my modified version (based on kernel 3.3), the bug seems to be that >> >> >> pnfs_ld_write_done calls pnfs_return_layout in the error path, even if there >> >> >> is in-flight I/O. >> >> > >> >> > That is not a bug. It is an intentional change in order to allow the MDS to fence off the outstanding writes (if it can do so) before we retransmit them as write-through-MDS. Otherwise, you risk races between the outstanding writes-to-DS and the new writes-through-MDS. >> >> >> >> to what change are you referring? >> > >> > As I stated in the changelog of the patch that I sent to the list >> > yesterday, the behaviour is due to commit 0a57cdac3f. >> > >> >> I'm not sure I'm following. I was talking about the state of the code >> v3.5, not to any specific patch/change. since the client yelled >> "BUG!!!" and crashed, there is some bug involved, either because the >> BUG() is correct or because the presence of the BUG() is the bug >> itself. >> >> >> > >> >> > See the changelog in the patch that I sent to the list yesterday. >> >> > >> >> >> >> I saw that, and if I'm not mistaken these races apply to object layout >> >> as well, and in any case they apply in my case. However, it is not >> >> easy to mess around with LAYOUTRETURN in object layout, and there have >> >> been several discussions on the issue. In one of these discussions >> >> Benny clarified that the object layout client must wait for all >> >> in-flight I/O to end. >> > >> > If the problem is that the DS is failing to respond, how does the client >> > know that the in-flight I/O has ended? >> > >> >> after the last rpc_call_done is called there is no I/O from the >> client's perspective an the layout can be safely returned, after which >> I/O can be done through the MDS. > > No. You are confused. > > Being in rpc_call_done just tells you that the client thinks it is done. > It tells you nothing about what the DS is doing. It may just slooooowly > be processing the WRITE calls for all you know. > That's what I meant by "from the client's perspective". At this point, the client can say "I'm not *using* this layout anymore" because even if some I/O on its way to the DS originates in the client, the client can't do anything about it. The fate of this I/O is meaningless to the client at that point. This is the best the client can do to help the MDS preserve integrity - keep the layout as long as it has information about its I/O. > Unless you get a response from the DS saying "I'm done", or you have > some other mechanism to guarantee that the DS is done when you time out, > then you have no guarantee that it is safe to send to MDS. > Exactly. What if I don't have such a mechanism? My options are: * Ask the MDS to somehow fence in-flight I/O and resend through the MDS. * Return an error to the user. What if the server does not support fencing? In that case we shouldn't send through MDS while there is in-flight I/O. Do we want to assume fencing is implemented? And if fencing isn't implemented and we return an error to the user - how do we know when it is safe to perform any I/O again? And even if fencing is implemented, I don't think the client should rely on it, because the whole idea beind pNFS is a smart client that offloads what it can from the servers. >> Is there I/O pending from the DS perspective? maybe, but all you can >> do is send I/O via MDS and hope it will not race. fencing doesn't >> really save you, it just improves your odds where applicable, i.e. >> file layout. One correction to what I said: Fencing does save you, if it's implemented. LAYOUTRETURN doesn't save you (when fencing is not implemented). > > Actually, when reading the iSCSI spec, it implies that there is a known > session timeout that is negotiated between the initiator and target. > After the session expires, you are guaranteed that no further commands > are running on the DS. > > The iFCP protocol attaches a timeout value to each frame, and is > supposed to enforce that timeout. > > So at least those protocols should have a deterministic behaviour that > can be used by the client. > >> >> So for file layout it probably makes sense, but object layout (and if >> >> I understand correctly, block layout as well) something else needs to >> >> be done. I thought about sync wait when returning the layout on error, >> >> but according to Boaz it will cause deadlocks (Boaz - can you please >> >> elaborate?). >> > >> > The object layoutreturn has the ability to pass a timeout error value to >> > the MDS precisely in order to allow the latter to deal with this kind of >> > issue. See the description of struct pnfs_osd_ioerr4 in rfc5664. >> > >> > The block layout is adding the same ability to layoutreturn in NFSv4.2 >> > (see draft-ietf-nfsv4-minorversion2-13.txt) via the struct >> > layoutreturn_device_error4, so presumably they too have a plan for >> > dealing with this kind of issue. >> > Then on layoutreturn we could call a new layout driver method, end_io or something returns only when there is no danger of race (either by waiting for in-flight I/O to complete/timeout/error out or by fencing) and only then return the layout. This works for me. I don't know if it works for Boaz though. >> >> I'll wait for Boaz (or someone else) to explain what exactly is "this >> kind of issue". I still don't know why sync wait would deadlock. >> >> >> And come to think of it, nfs4_proc_setattr also returns the layout >> >> when there may be I/O in-flight (correct me if i'm wrong). So I guess >> >> pnfs_return_layout should somehow solve this by itself by saying "if >> >> this is fencing (a flag which will be set for file layout only), go >> >> ahead, otherwise make the layout as 'needs to be returned' and when >> >> the lseg lists gets empty return the layout". >> > >> > The only layout type that sets the PNFS_LAYOUTRET_ON_SETATTR flag is >> > objects, so that question needs to be directed to Boaz. >> > >> >> Yes, this entire thread is mainly directed to Boaz... And IIRC he did >> want to implement something of that sort, and this thread was >> basically for asking him "did you?". >> >> > -- >> > Trond Myklebust >> > Linux NFS client maintainer >> > >> > NetApp >> > Trond.Myklebust@netapp.com >> > www.netapp.com >> > >> >> >> > > -- > Trond Myklebust > Linux NFS client maintainer > > NetApp > Trond.Myklebust@netapp.com > www.netapp.com >
On Fri, Aug 10, 2012 at 3:31 AM, Myklebust, Trond <Trond.Myklebust@netapp.com> wrote: > On Fri, 2012-08-10 at 00:48 +0800, Peng Tao wrote: >> On Fri, Aug 10, 2012 at 12:37 AM, Myklebust, Trond >> <Trond.Myklebust@netapp.com> wrote: >> > On Fri, 2012-08-10 at 00:34 +0800, Peng Tao wrote: >> >> On Fri, Aug 10, 2012 at 12:06 AM, Myklebust, Trond >> >> <Trond.Myklebust@netapp.com> wrote: >> >> > On Thu, 2012-08-09 at 18:49 +0300, Idan Kedar wrote: >> >> >> On Thu, Aug 9, 2012 at 5:05 PM, Myklebust, Trond >> >> >> <Trond.Myklebust@netapp.com> wrote: >> >> >> >> -----Original Message----- >> >> >> >> From: linux-nfs-owner@vger.kernel.org [mailto:linux-nfs- >> >> >> >> owner@vger.kernel.org] On Behalf Of Idan Kedar >> >> >> >> Sent: Thursday, August 09, 2012 9:03 AM >> >> >> >> To: Boaz Harrosh; NFS list >> >> >> >> Cc: Benny Halevy >> >> >> >> Subject: return layout on error, BUG/deadlock >> >> >> >> >> >> >> >> Hi, >> >> >> >> >> >> >> >> As a result of some experiments, I wanted to see what happens when I >> >> >> >> inject an error (hard coded) to the object layout driver. the patch is at the >> >> >> >> bottom of this mail. the reason I did this is because when I inject errors in my >> >> >> >> modified version of the object layout driver, I get the same BUG Tigran >> >> >> >> reported about yesterday: >> >> >> >> nfs4proc.c:6252 : BUG_ON(!list_empty(&lo->plh_segs)); >> >> >> >> >> >> >> >> In my modified version (based on kernel 3.3), the bug seems to be that >> >> >> >> pnfs_ld_write_done calls pnfs_return_layout in the error path, even if there >> >> >> >> is in-flight I/O. >> >> >> > >> >> >> > That is not a bug. It is an intentional change in order to allow the MDS to fence off the outstanding writes (if it can do so) before we retransmit them as write-through-MDS. Otherwise, you risk races between the outstanding writes-to-DS and the new writes-through-MDS. >> >> >> >> >> >> to what change are you referring? >> >> > >> >> > As I stated in the changelog of the patch that I sent to the list >> >> > yesterday, the behaviour is due to commit 0a57cdac3f. >> >> > >> >> >> > >> >> >> > See the changelog in the patch that I sent to the list yesterday. >> >> >> > >> >> >> >> >> >> I saw that, and if I'm not mistaken these races apply to object layout >> >> >> as well, and in any case they apply in my case. However, it is not >> >> >> easy to mess around with LAYOUTRETURN in object layout, and there have >> >> >> been several discussions on the issue. In one of these discussions >> >> >> Benny clarified that the object layout client must wait for all >> >> >> in-flight I/O to end. >> >> > >> >> > If the problem is that the DS is failing to respond, how does the client >> >> > know that the in-flight I/O has ended? >> >> > >> >> >> So for file layout it probably makes sense, but object layout (and if >> >> >> I understand correctly, block layout as well) something else needs to >> >> >> be done. I thought about sync wait when returning the layout on error, >> >> >> but according to Boaz it will cause deadlocks (Boaz - can you please >> >> >> elaborate?). >> >> > >> >> > The object layoutreturn has the ability to pass a timeout error value to >> >> > the MDS precisely in order to allow the latter to deal with this kind of >> >> > issue. See the description of struct pnfs_osd_ioerr4 in rfc5664. >> >> > >> >> > The block layout is adding the same ability to layoutreturn in NFSv4.2 >> >> > (see draft-ietf-nfsv4-minorversion2-13.txt) via the struct >> >> > layoutreturn_device_error4, so presumably they too have a plan for >> >> > dealing with this kind of issue. >> >> It is one thing to tell MDS that there is DS access error by sending >> >> layoutreturn, and it is another thing to return a layout even if there >> >> is overlapping in-flight DS IO... >> >> >> >> I certainly agree that client is entitled to return layout to inform >> >> MDS about DS errors and also avoid possible cb_layoutrecall. But it is >> >> just an optimization and should only be done when there is no >> >> in-flight IO (at least for block layout) IMHO. >> > >> > HOW DO YOU GUARANTEE NO IN-FLIGHT IO? >> > >> I don't. That's why I don't return layout in pnfs_ld_write_done(). And >> for layoutreturn upon cb_layoutreturn, block layout client needs to do >> timed-lease IO fencing per rfc5663, but it is not implemented in Linux >> client. > > The timed-lease IO fencing described in rfc5663 is about informing the > server about how long the client expects a command to succeed or fail. > It doesn't offer any advice for how the client is to deal with an > unresponsive DS. > > What you need here is help from the underlying transport protocol. As I > said in the email to Idan, when researching iSCSI and iFCP, I found what > appears to be mechanisms for reliably timing out. Just checked and found that the layoutreturn-on-error behavior only affects object and file layout. So block layout stays out and safe. That's all I would ask for. Thanks for your explanation. Best, Tao -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/exofs/ore.c b/fs/exofs/ore.c index 24a49d4..e9d7b45 100644 --- a/fs/exofs/ore.c +++ b/fs/exofs/ore.c @@ -426,9 +426,14 @@ int ore_check_io(struct ore_io_state *ios, ore_on_dev_error on_dev_error) if (unlikely(!or)) continue; - ret = osd_req_decode_sense(or, &osi); - if (likely(!ret)) - continue; + if (jiffies % 6 == 0) { + osi.osd_err_pri = OSD_ERR_PRI_EIO; + ret = -EIO; + } else { + ret = osd_req_decode_sense(or, &osi); + if (likely(!ret)) + continue; + } if (OSD_ERR_PRI_CLEAR_PAGES == osi.osd_err_pri) {