Message ID | 1468817945.5273.2.camel@primarydata.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Mon, Jul 18, 2016 at 04:59:09AM +0000, Trond Myklebust wrote: > Actually... The problem might be that a previous attribute update is > marking the attribute cache as being revalidated. Does the following > patch help? It doesn't. Also with your most recent linux-next branch the test now cause the systems to OOM with or without your patch (with mine it's still fine). I tested with your writeback branch from about two or three days ago before, and with that + your patch it also 'just fails' and doesn't OOM. Looks like whatever causes the bug also creates a temporarily memory leak when combined with recent changes from your tree, most likely something from the pnfs branch. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 18 Jul 2016, at 23:58, hch@infradead.org wrote: > On Mon, Jul 18, 2016 at 04:59:09AM +0000, Trond Myklebust wrote: >> Actually... The problem might be that a previous attribute update is >> marking the attribute cache as being revalidated. Does the following >> patch help? > > It doesn't. Also with your most recent linux-next branch the test > now cause the systems to OOM with or without your patch (with mine > it's > still fine). I tested with your writeback branch from about two or > three days ago before, and with that + your patch it also 'just fails' > and doesn't OOM. Looks like whatever causes the bug also creates > a temporarily memory leak when combined with recent changes from your > tree, most likely something from the pnfs branch. I couldn't find the memory leak using kmemleak, but it OOMs pretty quick. If I insert an mdelay(200) just after the lookup_again: marker in pnfs_update_layout() it doesn't OOM, but it seems stuck forever in a loop on that marker: [ 1230.635586] pnfs_find_alloc_layout Begin ino=ffff88003ef986f8 layout=ffff8800392bca58 [ 1230.636729] pnfs_find_lseg:Begin [ 1230.637538] pnfs_find_lseg:Return lseg (null) ref 0 [ 1230.638582] --> send_layoutget [ 1230.639499] --> nfs4_proc_layoutget [ 1230.640525] --> nfs4_layoutget_prepare [ 1230.641479] --> nfs41_setup_sequence [ 1230.641581] <-- nfs4_proc_layoutget status=-512 [ 1230.643288] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=31 [ 1230.644348] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0 [ 1230.645373] <-- nfs41_setup_sequence slotid=0 seqid=4376 [ 1230.646356] <-- nfs4_layoutget_prepare [ 1230.647357] encode_sequence: sessionid=1468956665:2:3:0 seqid=4376 slotid=0 max_slotid=0 cache_this=0 [ 1230.648522] encode_layoutget: 1st type:0x5 iomode:2 off:122880 len:4096 mc:4096 [ 1230.650182] decode_layoutget roff:122880 rlen:4096 riomode:2, lo_type:0x5, lo.len:48 [ 1230.651331] --> nfs4_layoutget_done [ 1230.652233] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=31 [ 1230.653409] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1 [ 1230.654547] nfs4_free_slot: slotid 1 highest_used_slotid 0 [ 1230.655606] nfs41_sequence_done: Error 0 free the slot [ 1230.656635] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295 [ 1230.657739] <-- nfs4_layoutget_done [ 1230.658650] --> nfs4_layoutget_release [ 1230.659626] <-- nfs4_layoutget_release This debug output is identical for every cycle of the loop. Have to stop for the day.. more tomorrow. Ben -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> On Jul 19, 2016, at 16:00, Benjamin Coddington <bcodding@redhat.com> wrote: > > On 18 Jul 2016, at 23:58, hch@infradead.org wrote: > >> On Mon, Jul 18, 2016 at 04:59:09AM +0000, Trond Myklebust wrote: >>> Actually... The problem might be that a previous attribute update is >>> marking the attribute cache as being revalidated. Does the following >>> patch help? >> >> It doesn't. Also with your most recent linux-next branch the test >> now cause the systems to OOM with or without your patch (with mine it's >> still fine). I tested with your writeback branch from about two or >> three days ago before, and with that + your patch it also 'just fails' >> and doesn't OOM. Looks like whatever causes the bug also creates >> a temporarily memory leak when combined with recent changes from your >> tree, most likely something from the pnfs branch. > > I couldn't find the memory leak using kmemleak, but it OOMs pretty quick. If I > insert an mdelay(200) just after the lookup_again: marker in > pnfs_update_layout() it doesn't OOM, but it seems stuck forever in a loop on > that marker: > > [ 1230.635586] pnfs_find_alloc_layout Begin ino=ffff88003ef986f8 layout=ffff8800392bca58 > [ 1230.636729] pnfs_find_lseg:Begin > [ 1230.637538] pnfs_find_lseg:Return lseg (null) ref 0 > [ 1230.638582] --> send_layoutget > [ 1230.639499] --> nfs4_proc_layoutget > [ 1230.640525] --> nfs4_layoutget_prepare > [ 1230.641479] --> nfs41_setup_sequence > [ 1230.641581] <-- nfs4_proc_layoutget status=-512 > [ 1230.643288] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=31 > [ 1230.644348] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0 > [ 1230.645373] <-- nfs41_setup_sequence slotid=0 seqid=4376 > [ 1230.646356] <-- nfs4_layoutget_prepare > [ 1230.647357] encode_sequence: sessionid=1468956665:2:3:0 seqid=4376 slotid=0 max_slotid=0 cache_this=0 > [ 1230.648522] encode_layoutget: 1st type:0x5 iomode:2 off:122880 len:4096 mc:4096 > [ 1230.650182] decode_layoutget roff:122880 rlen:4096 riomode:2, lo_type:0x5, lo.len:48 > [ 1230.651331] --> nfs4_layoutget_done > [ 1230.652233] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=31 > [ 1230.653409] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1 > [ 1230.654547] nfs4_free_slot: slotid 1 highest_used_slotid 0 > [ 1230.655606] nfs41_sequence_done: Error 0 free the slot > [ 1230.656635] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295 > [ 1230.657739] <-- nfs4_layoutget_done > [ 1230.658650] --> nfs4_layoutget_release > [ 1230.659626] <-- nfs4_layoutget_release > > This debug output is identical for every cycle of the loop. Have to stop for the > day.. more tomorrow. > > Ben > Duh… It’s this patch: pNFS: Fix post-layoutget error handling in pnfs_update_layout() We have to pass through fatal errors… I’ll fix it. Cheers Trond -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 19 Jul 2016, at 16:00, Benjamin Coddington wrote: > On 18 Jul 2016, at 23:58, hch@infradead.org wrote: > >> On Mon, Jul 18, 2016 at 04:59:09AM +0000, Trond Myklebust wrote: >>> Actually... The problem might be that a previous attribute update is >>> marking the attribute cache as being revalidated. Does the following >>> patch help? >> >> It doesn't. Also with your most recent linux-next branch the test >> now cause the systems to OOM with or without your patch (with mine >> it's >> still fine). I tested with your writeback branch from about two or >> three days ago before, and with that + your patch it also 'just >> fails' >> and doesn't OOM. Looks like whatever causes the bug also creates >> a temporarily memory leak when combined with recent changes from your >> tree, most likely something from the pnfs branch. > > I couldn't find the memory leak using kmemleak, but it OOMs pretty > quick. If I > insert an mdelay(200) just after the lookup_again: marker in > pnfs_update_layout() it doesn't OOM, but it seems stuck forever in a > loop on > that marker: > > [ 1230.635586] pnfs_find_alloc_layout Begin ino=ffff88003ef986f8 > layout=ffff8800392bca58 > [ 1230.636729] pnfs_find_lseg:Begin > [ 1230.637538] pnfs_find_lseg:Return lseg (null) ref 0 > [ 1230.638582] --> send_layoutget > [ 1230.639499] --> nfs4_proc_layoutget > [ 1230.640525] --> nfs4_layoutget_prepare > [ 1230.641479] --> nfs41_setup_sequence > [ 1230.641581] <-- nfs4_proc_layoutget status=-512 > [ 1230.643288] --> nfs4_alloc_slot used_slots=0000 > highest_used=4294967295 max_slots=31 > [ 1230.644348] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 > slotid=0 > [ 1230.645373] <-- nfs41_setup_sequence slotid=0 seqid=4376 > [ 1230.646356] <-- nfs4_layoutget_prepare > [ 1230.647357] encode_sequence: sessionid=1468956665:2:3:0 seqid=4376 > slotid=0 max_slotid=0 cache_this=0 > [ 1230.648522] encode_layoutget: 1st type:0x5 iomode:2 off:122880 > len:4096 mc:4096 > [ 1230.650182] decode_layoutget roff:122880 rlen:4096 riomode:2, > lo_type:0x5, lo.len:48 > [ 1230.651331] --> nfs4_layoutget_done > [ 1230.652233] --> nfs4_alloc_slot used_slots=0001 highest_used=0 > max_slots=31 > [ 1230.653409] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 > slotid=1 > [ 1230.654547] nfs4_free_slot: slotid 1 highest_used_slotid 0 > [ 1230.655606] nfs41_sequence_done: Error 0 free the slot > [ 1230.656635] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295 > [ 1230.657739] <-- nfs4_layoutget_done > [ 1230.658650] --> nfs4_layoutget_release > [ 1230.659626] <-- nfs4_layoutget_release > > This debug output is identical for every cycle of the loop. Except for the monotonically incrementing sequence id! sorry.. :/ Ben -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 19 Jul 2016, at 16:06, Trond Myklebust wrote: >> On Jul 19, 2016, at 16:00, Benjamin Coddington <bcodding@redhat.com> >> wrote: >> >> On 18 Jul 2016, at 23:58, hch@infradead.org wrote: >> >>> On Mon, Jul 18, 2016 at 04:59:09AM +0000, Trond Myklebust wrote: >>>> Actually... The problem might be that a previous attribute update >>>> is >>>> marking the attribute cache as being revalidated. Does the >>>> following >>>> patch help? >>> >>> It doesn't. Also with your most recent linux-next branch the test >>> now cause the systems to OOM with or without your patch (with mine >>> it's >>> still fine). I tested with your writeback branch from about two or >>> three days ago before, and with that + your patch it also 'just >>> fails' >>> and doesn't OOM. Looks like whatever causes the bug also creates >>> a temporarily memory leak when combined with recent changes from >>> your >>> tree, most likely something from the pnfs branch. >> >> I couldn't find the memory leak using kmemleak, but it OOMs pretty >> quick. If I >> insert an mdelay(200) just after the lookup_again: marker in >> pnfs_update_layout() it doesn't OOM, but it seems stuck forever in a >> loop on >> that marker: >> >> [ 1230.635586] pnfs_find_alloc_layout Begin ino=ffff88003ef986f8 >> layout=ffff8800392bca58 >> [ 1230.636729] pnfs_find_lseg:Begin >> [ 1230.637538] pnfs_find_lseg:Return lseg (null) ref 0 >> [ 1230.638582] --> send_layoutget >> [ 1230.639499] --> nfs4_proc_layoutget >> [ 1230.640525] --> nfs4_layoutget_prepare >> [ 1230.641479] --> nfs41_setup_sequence >> [ 1230.641581] <-- nfs4_proc_layoutget status=-512 >> [ 1230.643288] --> nfs4_alloc_slot used_slots=0000 >> highest_used=4294967295 max_slots=31 >> [ 1230.644348] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 >> slotid=0 >> [ 1230.645373] <-- nfs41_setup_sequence slotid=0 seqid=4376 >> [ 1230.646356] <-- nfs4_layoutget_prepare >> [ 1230.647357] encode_sequence: sessionid=1468956665:2:3:0 seqid=4376 >> slotid=0 max_slotid=0 cache_this=0 >> [ 1230.648522] encode_layoutget: 1st type:0x5 iomode:2 off:122880 >> len:4096 mc:4096 >> [ 1230.650182] decode_layoutget roff:122880 rlen:4096 riomode:2, >> lo_type:0x5, lo.len:48 >> [ 1230.651331] --> nfs4_layoutget_done >> [ 1230.652233] --> nfs4_alloc_slot used_slots=0001 highest_used=0 >> max_slots=31 >> [ 1230.653409] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 >> slotid=1 >> [ 1230.654547] nfs4_free_slot: slotid 1 highest_used_slotid 0 >> [ 1230.655606] nfs41_sequence_done: Error 0 free the slot >> [ 1230.656635] nfs4_free_slot: slotid 0 highest_used_slotid >> 4294967295 >> [ 1230.657739] <-- nfs4_layoutget_done >> [ 1230.658650] --> nfs4_layoutget_release >> [ 1230.659626] <-- nfs4_layoutget_release >> >> This debug output is identical for every cycle of the loop. Have to >> stop for the >> day.. more tomorrow. >> >> Ben >> > > Duh… It’s this patch: pNFS: Fix post-layoutget error handling in > pnfs_update_layout() > We have to pass through fatal errors… I’ll fix it. That's indeed fixed it up, and generic/207 passes now. Thanks! Ben -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jul 20, 2016 at 11:03:06AM -0400, Benjamin Coddington wrote: > > > This debug output is identical for every cycle of the loop. Have to > > > stop for the > > > day.. more tomorrow. > > > > > > Ben > > > > > > > Duh??? It???s this patch: pNFS: Fix post-layoutget error handling in > > pnfs_update_layout() > > We have to pass through fatal errors??? I???ll fix it. > > That's indeed fixed it up, and generic/207 passes now. Thanks! So I spoke too soon in my last mail, generic/207 still fails for me with Trond's linux-next tree, although much later in the test now. Does it include the changes that are supposed to fix the issue? -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 21 Jul 2016, at 4:22, hch@infradead.org wrote: > On Wed, Jul 20, 2016 at 11:03:06AM -0400, Benjamin Coddington wrote: >>>> This debug output is identical for every cycle of the loop. Have to >>>> stop for the >>>> day.. more tomorrow. >>>> >>>> Ben >>>> >>> >>> Duh??? It???s this patch: pNFS: Fix post-layoutget error handling in >>> pnfs_update_layout() >>> We have to pass through fatal errors??? I???ll fix it. >> >> That's indeed fixed it up, and generic/207 passes now. Thanks! > > So I spoke too soon in my last mail, generic/207 still fails for me > with Trond's linux-next tree, although much later in the test now. > > Does it include the changes that are supposed to fix the issue? It should -- the v2 that fixed 207 for me is 56b38a1f7c781519eef09c1668a3c97ea911f86b, the first version was e35c2a0b3cd062a8941d21511719391b64437427, I think. I'll test again too. Ben -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 21 Jul 2016, at 4:32, Benjamin Coddington wrote: > On 21 Jul 2016, at 4:22, hch@infradead.org wrote: > >> On Wed, Jul 20, 2016 at 11:03:06AM -0400, Benjamin Coddington wrote: >>>>> This debug output is identical for every cycle of the loop. Have >>>>> to >>>>> stop for the >>>>> day.. more tomorrow. >>>>> >>>>> Ben >>>>> >>>> >>>> Duh??? It???s this patch: pNFS: Fix post-layoutget error handling >>>> in >>>> pnfs_update_layout() >>>> We have to pass through fatal errors??? I???ll fix it. >>> >>> That's indeed fixed it up, and generic/207 passes now. Thanks! >> >> So I spoke too soon in my last mail, generic/207 still fails for me >> with Trond's linux-next tree, although much later in the test now. >> >> Does it include the changes that are supposed to fix the issue? > > It should -- the v2 that fixed 207 for me is > 56b38a1f7c781519eef09c1668a3c97ea911f86b, the first version was > e35c2a0b3cd062a8941d21511719391b64437427, I think. I'll test again > too. Looks like we're back to the original problem - it fails with the inode size is 4k less than expected. The reason it worked for me was I had pnfs_ld debugging turned up which slowed things down enough to somehow catch the right size. Looks like the right size is returned in the CLOSE, but the inode's not getting updated. Ben -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 21 Jul 2016, at 5:10, Benjamin Coddington wrote: > On 21 Jul 2016, at 4:32, Benjamin Coddington wrote: > >> On 21 Jul 2016, at 4:22, hch@infradead.org wrote: >> >>> On Wed, Jul 20, 2016 at 11:03:06AM -0400, Benjamin Coddington wrote: >>>>>> This debug output is identical for every cycle of the loop. Have >>>>>> to >>>>>> stop for the >>>>>> day.. more tomorrow. >>>>>> >>>>>> Ben >>>>>> >>>>> >>>>> Duh??? It???s this patch: pNFS: Fix post-layoutget error handling >>>>> in >>>>> pnfs_update_layout() >>>>> We have to pass through fatal errors??? I???ll fix it. >>>> >>>> That's indeed fixed it up, and generic/207 passes now. Thanks! >>> >>> So I spoke too soon in my last mail, generic/207 still fails for me >>> with Trond's linux-next tree, although much later in the test now. >>> >>> Does it include the changes that are supposed to fix the issue? >> >> It should -- the v2 that fixed 207 for me is >> 56b38a1f7c781519eef09c1668a3c97ea911f86b, the first version was >> e35c2a0b3cd062a8941d21511719391b64437427, I think. I'll test again >> too. > > Looks like we're back to the original problem - it fails with the > inode size is 4k less than expected. > > The reason it worked for me was I had pnfs_ld debugging turned up > which slowed things down enough to somehow catch the right size. > > Looks like the right size is returned in the CLOSE, but the inode's > not getting updated. And the size is right in the last LAYOUTCOMMIT response, of course. Is this the problem? 6024 static int decode_layoutcommit(struct xdr_stream *xdr, ... 6040 sizechanged = be32_to_cpup(p); 6041 6042 if (sizechanged) { 6043 /* throw away new size */ 6044 p = xdr_inline_decode(xdr, 8); Ben -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> On Jul 21, 2016, at 05:52, Benjamin Coddington <bcodding@redhat.com> wrote: > > On 21 Jul 2016, at 5:10, Benjamin Coddington wrote: > >> On 21 Jul 2016, at 4:32, Benjamin Coddington wrote: >> >>> On 21 Jul 2016, at 4:22, hch@infradead.org wrote: >>> >>>> On Wed, Jul 20, 2016 at 11:03:06AM -0400, Benjamin Coddington wrote: >>>>>>> This debug output is identical for every cycle of the loop. Have to >>>>>>> stop for the >>>>>>> day.. more tomorrow. >>>>>>> >>>>>>> Ben >>>>>>> >>>>>> >>>>>> Duh??? It???s this patch: pNFS: Fix post-layoutget error handling in >>>>>> pnfs_update_layout() >>>>>> We have to pass through fatal errors??? I???ll fix it. >>>>> >>>>> That's indeed fixed it up, and generic/207 passes now. Thanks! >>>> >>>> So I spoke too soon in my last mail, generic/207 still fails for me >>>> with Trond's linux-next tree, although much later in the test now. >>>> >>>> Does it include the changes that are supposed to fix the issue? >>> >>> It should -- the v2 that fixed 207 for me is 56b38a1f7c781519eef09c1668a3c97ea911f86b, the first version was e35c2a0b3cd062a8941d21511719391b64437427, I think. I'll test again too. >> >> Looks like we're back to the original problem - it fails with the inode size is 4k less than expected. >> >> The reason it worked for me was I had pnfs_ld debugging turned up which slowed things down enough to somehow catch the right size. >> >> Looks like the right size is returned in the CLOSE, but the inode's not getting updated. > > And the size is right in the last LAYOUTCOMMIT response, of course. Is this the problem? > > 6024 static int decode_layoutcommit(struct xdr_stream *xdr, > ... > 6040 sizechanged = be32_to_cpup(p); > 6041 > 6042 if (sizechanged) { > 6043 /* throw away new size */ > 6044 p = xdr_inline_decode(xdr, 8); > > > Ben > That shouldn’t really matter since we have a GETATTR immediately following the LAYOUTGET operation. Assuming that nfs4_proc_layoutcommit() actually gets called, it is supposed to update the inode correctly. Cheers Trond -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 21 Jul 2016, at 8:46, Trond Myklebust wrote: >> On Jul 21, 2016, at 05:52, Benjamin Coddington <bcodding@redhat.com> >> wrote: >> >> On 21 Jul 2016, at 5:10, Benjamin Coddington wrote: >> >>> On 21 Jul 2016, at 4:32, Benjamin Coddington wrote: >>> >>>> On 21 Jul 2016, at 4:22, hch@infradead.org wrote: >>>> >>>>> On Wed, Jul 20, 2016 at 11:03:06AM -0400, Benjamin Coddington >>>>> wrote: >>>>>>>> This debug output is identical for every cycle of the loop. >>>>>>>> Have to >>>>>>>> stop for the >>>>>>>> day.. more tomorrow. >>>>>>>> >>>>>>>> Ben >>>>>>>> >>>>>>> >>>>>>> Duh??? It???s this patch: pNFS: Fix post-layoutget error >>>>>>> handling in >>>>>>> pnfs_update_layout() >>>>>>> We have to pass through fatal errors??? I???ll fix it. >>>>>> >>>>>> That's indeed fixed it up, and generic/207 passes now. Thanks! >>>>> >>>>> So I spoke too soon in my last mail, generic/207 still failjs for >>>>> me >>>>> with Trond's linux-next tree, although much later in the test now. >>>>> >>>>> Does it include the changes that are supposed to fix the issue? >>>> >>>> It should -- the v2 that fixed 207 for me is >>>> 56b38a1f7c781519eef09c1668a3c97ea911f86b, the first version was >>>> e35c2a0b3cd062a8941d21511719391b64437427, I think. I'll test again >>>> too. >>> >>> Looks like we're back to the original problem - it fails with the >>> inode size is 4k less than expected. >>> >>> The reason it worked for me was I had pnfs_ld debugging turned up >>> which slowed things down enough to somehow catch the right size. >>> >>> Looks like the right size is returned in the CLOSE, but the inode's >>> not getting updated. >> >> And the size is right in the last LAYOUTCOMMIT response, of course. >> Is this the problem? >> >> 6024 static int decode_layoutcommit(struct xdr_stream *xdr, >> ... >> 6040 sizechanged = be32_to_cpup(p); >> 6041 >> 6042 if (sizechanged) { >> 6043 /* throw away new size */ >> 6044 p = xdr_inline_decode(xdr, 8); >> >> >> Ben >> > > That shouldn’t really matter since we have a GETATTR immediately > following the LAYOUTGET operation. Assuming that > nfs4_proc_layoutcommit() actually gets called, it is supposed to > update the inode correctly. So back to Christoph's point earlier: On 17 Jul 2016, at 23:48, Christoph Hellwig wrote: > This one breaks xfstests generic/207 on block/scsi layout for me. The > reason for that is that we need a layoutcommit after writing out all > data for the file for the file size to be updated on the server. You responded: On 18 Jul 2016, at 0:32, Trond Myklebust wrote: > I’m not understanding this argument. Why do we care if the file size > is up > to date on the server if we’re not sending an actual GETATTR on the > wire > to retrieve the file size? I guess the answer might be because we can get it back from the last LAYOUTCOMMIT. This test has repeated appending 4k and has this pattern on the wire: NFS 334 V4 Call LAYOUTGET NFS 290 V4 Reply (Call In 854) LAYOUTCOMMIT NFS 294 V4 Call GETATTR FH: 0x4f5528b0 NFS 442 V4 Reply (Call In 858) GETATTR NFS 374 V4 Call LAYOUTCOMMIT NFS 314 V4 Reply (Call In 856) LAYOUTGET NFS 334 V4 Call LAYOUTGET NFS 290 V4 Reply (Call In 860) LAYOUTCOMMIT NFS 294 V4 Call GETATTR FH: 0x4f5528b0 NFS 442 V4 Reply (Call In 864) GETATTR NFS 374 V4 Call LAYOUTCOMMIT NFS 314 V4 Reply (Call In 862) LAYOUTGET NFS 334 V4 Call LAYOUTGET NFS 290 V4 Reply (Call In 866) LAYOUTCOMMIT NFS 294 V4 Call GETATTR FH: 0x4f5528b0 NFS 442 V4 Reply (Call In 870) GETATTR NFS 314 V4 Reply (Call In 868) LAYOUTGET NFS 374 V4 Call LAYOUTCOMMIT NFS 290 V4 Reply (Call In 874) LAYOUTCOMMIT NFS 314 V4 Call CLOSE StateID: 0x54d9 NFS 294 V4 Reply (Call In 876) CLOSE That last LAYOUTCOMMIT and the CLOSE have the size we want. The previous GETATTR is 4k short. Ben -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> On Jul 21, 2016, at 09:05, Benjamin Coddington <bcodding@redhat.com> wrote: > > So back to Christoph's point earlier: > > On 17 Jul 2016, at 23:48, Christoph Hellwig wrote: >> This one breaks xfstests generic/207 on block/scsi layout for me. The >> reason for that is that we need a layoutcommit after writing out all >> data for the file for the file size to be updated on the server. > > You responded: > > On 18 Jul 2016, at 0:32, Trond Myklebust wrote: >> I’m not understanding this argument. Why do we care if the file size is up >> to date on the server if we’re not sending an actual GETATTR on the wire >> to retrieve the file size? > > I guess the answer might be because we can get it back from the last > LAYOUTCOMMIT. > The patch that I followed up with should now ensure that we do not mark the attribute cache as up to date if there is a LAYOUTCOMMIT pending. IOW: when the pNFS write is done, it is expected to do 2 things: 1) mark the inode for LAYOUTCOMMIT 2) mark the attribute cache as invalid (because we know the change attribute, mtime, ctime need to be updates) In the case of blocks pNFS write: The call to pnfs_set_layoutcommit() in pnfs_ld_write_done() should take care of (1) The call to nfs_writeback_update_inode() in nfs4_write_done_cb() should take care of (2). Provided that these 2 calls are performed in the above order, then any call to nfs_getattr() which has not been preceded by a call to nfs4_proc_layoutcommit() should trigger the call to __nfs_revalidate_inode(). > This test has repeated appending 4k and has this pattern on the wire: > > NFS 334 V4 Call LAYOUTGET > NFS 290 V4 Reply (Call In 854) LAYOUTCOMMIT > NFS 294 V4 Call GETATTR FH: 0x4f5528b0 > NFS 442 V4 Reply (Call In 858) GETATTR > NFS 374 V4 Call LAYOUTCOMMIT > NFS 314 V4 Reply (Call In 856) LAYOUTGET > NFS 334 V4 Call LAYOUTGET > NFS 290 V4 Reply (Call In 860) LAYOUTCOMMIT > NFS 294 V4 Call GETATTR FH: 0x4f5528b0 > NFS 442 V4 Reply (Call In 864) GETATTR > NFS 374 V4 Call LAYOUTCOMMIT > NFS 314 V4 Reply (Call In 862) LAYOUTGET > NFS 334 V4 Call LAYOUTGET > NFS 290 V4 Reply (Call In 866) LAYOUTCOMMIT > NFS 294 V4 Call GETATTR FH: 0x4f5528b0 > NFS 442 V4 Reply (Call In 870) GETATTR > NFS 314 V4 Reply (Call In 868) LAYOUTGET > NFS 374 V4 Call LAYOUTCOMMIT > NFS 290 V4 Reply (Call In 874) LAYOUTCOMMIT > NFS 314 V4 Call CLOSE StateID: 0x54d9 > NFS 294 V4 Reply (Call In 876) CLOSE > > That last LAYOUTCOMMIT and the CLOSE have the size we want. The previous > GETATTR is 4k short. When you say “4k short”, do you mean that it differs from the value returned by the LAYOUTCOMMIT that immediately precedes it? It looks as if there is a LAYOUTGET immediately following it, so presumably the writes have not all completed yet. Cheers Trond
> On Jul 21, 2016, at 09:20, Trond Myklebust <trondmy@primarydata.com> wrote: > > >> On Jul 21, 2016, at 09:05, Benjamin Coddington <bcodding@redhat.com> wrote: >> >> So back to Christoph's point earlier: >> >> On 17 Jul 2016, at 23:48, Christoph Hellwig wrote: >>> This one breaks xfstests generic/207 on block/scsi layout for me. The >>> reason for that is that we need a layoutcommit after writing out all >>> data for the file for the file size to be updated on the server. >> >> You responded: >> >> On 18 Jul 2016, at 0:32, Trond Myklebust wrote: >>> I’m not understanding this argument. Why do we care if the file size is up >>> to date on the server if we’re not sending an actual GETATTR on the wire >>> to retrieve the file size? >> >> I guess the answer might be because we can get it back from the last >> LAYOUTCOMMIT. >> > > The patch that I followed up with should now ensure that we do not mark the attribute cache as up to date if there is a LAYOUTCOMMIT pending. > IOW: when the pNFS write is done, it is expected to do 2 things: > > 1) mark the inode for LAYOUTCOMMIT > 2) mark the attribute cache as invalid (because we know the change attribute, mtime, ctime need to be updates) > > In the case of blocks pNFS write: > The call to pnfs_set_layoutcommit() in pnfs_ld_write_done() should take care of (1) > The call to nfs_writeback_update_inode() in nfs4_write_done_cb() should take care of (2). > > Provided that these 2 calls are performed in the above order, then any call to nfs_getattr() which has not been preceded by a call to nfs4_proc_layoutcommit() should trigger the call to __nfs_revalidate_inode(). By the way, it looks as if the ‘files’ layout type fails to do (2). I’ll add a fix for that. Cheers Trond
On 21 Jul 2016, at 9:20, Trond Myklebust wrote: >> On Jul 21, 2016, at 09:05, Benjamin Coddington <bcodding@redhat.com> >> wrote: >> >> So back to Christoph's point earlier: >> >> On 17 Jul 2016, at 23:48, Christoph Hellwig wrote: >>> This one breaks xfstests generic/207 on block/scsi layout for me. >>> The >>> reason for that is that we need a layoutcommit after writing out all >>> data for the file for the file size to be updated on the server. >> >> You responded: >> >> On 18 Jul 2016, at 0:32, Trond Myklebust wrote: >>> I’m not understanding this argument. Why do we care if the file >>> size is up >>> to date on the server if we’re not sending an actual GETATTR on >>> the wire >>> to retrieve the file size? >> >> I guess the answer might be because we can get it back from the last >> LAYOUTCOMMIT. >> > > The patch that I followed up with should now ensure that we do not > mark the attribute cache as up to date if there is a LAYOUTCOMMIT > pending. > IOW: when the pNFS write is done, it is expected to do 2 things: > > 1) mark the inode for LAYOUTCOMMIT > 2) mark the attribute cache as invalid (because we know the change > attribute, mtime, ctime need to be updates) > > In the case of blocks pNFS write: > The call to pnfs_set_layoutcommit() in pnfs_ld_write_done() should > take care of (1) > The call to nfs_writeback_update_inode() in nfs4_write_done_cb() > should take care of (2). > > Provided that these 2 calls are performed in the above order, then any > call to nfs_getattr() which has not been preceded by a call to > nfs4_proc_layoutcommit() should trigger the call to > __nfs_revalidate_inode(). OK, so maybe things are out of order here.. Thanks - this is helpful. >> This test has repeated appending 4k and has this pattern on the wire: >> >> NFS 334 V4 Call LAYOUTGET >> NFS 290 V4 Reply (Call In 854) LAYOUTCOMMIT >> NFS 294 V4 Call GETATTR FH: 0x4f5528b0 >> NFS 442 V4 Reply (Call In 858) GETATTR >> NFS 374 V4 Call LAYOUTCOMMIT >> NFS 314 V4 Reply (Call In 856) LAYOUTGET >> NFS 334 V4 Call LAYOUTGET >> NFS 290 V4 Reply (Call In 860) LAYOUTCOMMIT >> NFS 294 V4 Call GETATTR FH: 0x4f5528b0 >> NFS 442 V4 Reply (Call In 864) GETATTR >> NFS 374 V4 Call LAYOUTCOMMIT >> NFS 314 V4 Reply (Call In 862) LAYOUTGET >> NFS 334 V4 Call LAYOUTGET >> NFS 290 V4 Reply (Call In 866) LAYOUTCOMMIT >> NFS 294 V4 Call GETATTR FH: 0x4f5528b0 >> NFS 442 V4 Reply (Call In 870) GETATTR >> NFS 314 V4 Reply (Call In 868) LAYOUTGET >> NFS 374 V4 Call LAYOUTCOMMIT >> NFS 290 V4 Reply (Call In 874) LAYOUTCOMMIT >> NFS 314 V4 Call CLOSE StateID: 0x54d9 >> NFS 294 V4 Reply (Call In 876) CLOSE >> >> That last LAYOUTCOMMIT and the CLOSE have the size we want. The >> previous >> GETATTR is 4k short. > > When you say “4k short”, do you mean that it differs from the > value returned by the LAYOUTCOMMIT that immediately precedes it? It > looks as if there is a LAYOUTGET immediately following it, so > presumably the writes have not all completed yet. I meant it is 4k short of the final size returned in the LAYOUTCOMMIT immediately following. It is the same as the LAYOUTCOMMIT immediately preceding. FWIW at this point, here's a better look at the network: tshark -r /tmp/pcap -T fields -e frame.number -e nfs.fh.hash -e nfs.opcode -e nfs.fattr4.size 'frame.number > 860' 861 53,22,50 862 0x4f5528b0 53,22,50 863 53,22,49,9 409600 864 0x4f5528b0 53,22,9 865 53,22,9 409600 866 0x4f5528b0 53,22,49,9 867 53,22,50 868 0x4f5528b0 53,22,50 869 53,22,49,9 413696 870 0x4f5528b0 53,22,9 871 53,22,9 413696 872 53,22,50 873 874 0x4f5528b0 53,22,49,9 875 53,22,49,9 417792 876 0x4f5528b0 53,22,4,9 877 53,22,4,9 417792 ... 880 0x4f5528b0 53,22,9 881 53,22,9 417792 Now I see there's a GETATTR after the CLOSE that returns the right size -- but I'll really should track down what's happening to that one to see if it is the same call that the test is making. Unfortunately, I'm getting pulled away again, so I'll dig more later. Thanks for looking. Ben -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 21 Jul 2016, at 9:20, Trond Myklebust wrote: >> On Jul 21, 2016, at 09:05, Benjamin Coddington <bcodding@redhat.com> >> wrote: >> >> So back to Christoph's point earlier: >> >> On 17 Jul 2016, at 23:48, Christoph Hellwig wrote: >>> This one breaks xfstests generic/207 on block/scsi layout for me. >>> The >>> reason for that is that we need a layoutcommit after writing out all >>> data for the file for the file size to be updated on the server. >> >> You responded: >> >> On 18 Jul 2016, at 0:32, Trond Myklebust wrote: >>> I’m not understanding this argument. Why do we care if the file >>> size is up >>> to date on the server if we’re not sending an actual GETATTR on >>> the wire >>> to retrieve the file size? >> >> I guess the answer might be because we can get it back from the last >> LAYOUTCOMMIT. >> > > The patch that I followed up with should now ensure that we do not > mark the attribute cache as up to date if there is a LAYOUTCOMMIT > pending. > IOW: when the pNFS write is done, it is expected to do 2 things: > > 1) mark the inode for LAYOUTCOMMIT > 2) mark the attribute cache as invalid (because we know the change > attribute, mtime, ctime need to be updates) > > In the case of blocks pNFS write: > The call to pnfs_set_layoutcommit() in pnfs_ld_write_done() should > take care of (1) > The call to nfs_writeback_update_inode() in nfs4_write_done_cb() > should take care of (2). > > Provided that these 2 calls are performed in the above order, then any > call to nfs_getattr() which has not been preceded by a call to > nfs4_proc_layoutcommit() should trigger the call to > __nfs_revalidate_inode(). I think the problem is that a following nfs_getattr() will fail to notice the size change in the case of a write_completion and layoutcommit occuring after nfs_getattr() has done pnfs_sync_inode() but before it has done nfs_update_inode(). In the failing case there are two threads one is doing writes, the other doing lstat on aio_complete via io_getevents(2). For each write completion the lstat thread tries to verify the file size. GETATTR Thread LAYOUTCOMMIT Thread -------------- -------------------- write_completion sets LAYOUTCOMMIT (4096@0) --> nfs_getattr __nfs_revalidate_inode pnfs_sync_inode getattr sees 4096 write_completion sets LAYOUTCOMMIT (4096@4096) sets LAYOUTCOMMITING clears LAYOUTCOMMIT clears LAYOUTCOMMITTING nfs_refresh_inode nfs_update_inode size is 4096 <-- nfs_getattr At this point the cached attributes are seen as up to date, but aio-dio-extend-stat program expects that second write_completion to reflect in the file size. Ben -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
DQo+IE9uIEp1bCAyNSwgMjAxNiwgYXQgMTI6MjYsIEJlbmphbWluIENvZGRpbmd0b24gPGJjb2Rk aW5nQHJlZGhhdC5jb20+IHdyb3RlOg0KPiANCj4gT24gMjEgSnVsIDIwMTYsIGF0IDk6MjAsIFRy b25kIE15a2xlYnVzdCB3cm90ZToNCj4gDQo+Pj4gT24gSnVsIDIxLCAyMDE2LCBhdCAwOTowNSwg QmVuamFtaW4gQ29kZGluZ3RvbiA8YmNvZGRpbmdAcmVkaGF0LmNvbT4gd3JvdGU6DQo+Pj4gDQo+ Pj4gU28gYmFjayB0byBDaHJpc3RvcGgncyBwb2ludCBlYXJsaWVyOg0KPj4+IA0KPj4+IE9uIDE3 IEp1bCAyMDE2LCBhdCAyMzo0OCwgQ2hyaXN0b3BoIEhlbGx3aWcgd3JvdGU6DQo+Pj4+IFRoaXMg b25lIGJyZWFrcyB4ZnN0ZXN0cyBnZW5lcmljLzIwNyBvbiBibG9jay9zY3NpIGxheW91dCBmb3Ig bWUuICBUaGUNCj4+Pj4gcmVhc29uIGZvciB0aGF0IGlzIHRoYXQgd2UgbmVlZCBhIGxheW91dGNv bW1pdCBhZnRlciB3cml0aW5nIG91dCBhbGwNCj4+Pj4gZGF0YSBmb3IgdGhlIGZpbGUgZm9yIHRo ZSBmaWxlIHNpemUgdG8gYmUgdXBkYXRlZCBvbiB0aGUgc2VydmVyLg0KPj4+IA0KPj4+IFlvdSBy ZXNwb25kZWQ6DQo+Pj4gDQo+Pj4gT24gMTggSnVsIDIwMTYsIGF0IDA6MzIsIFRyb25kIE15a2xl YnVzdCB3cm90ZToNCj4+Pj4gSeKAmW0gbm90IHVuZGVyc3RhbmRpbmcgdGhpcyBhcmd1bWVudC4g V2h5IGRvIHdlIGNhcmUgaWYgdGhlIGZpbGUgc2l6ZSBpcyB1cA0KPj4+PiB0byBkYXRlIG9uIHRo ZSBzZXJ2ZXIgaWYgd2XigJlyZSBub3Qgc2VuZGluZyBhbiBhY3R1YWwgR0VUQVRUUiBvbiB0aGUg d2lyZQ0KPj4+PiB0byByZXRyaWV2ZSB0aGUgZmlsZSBzaXplPw0KPj4+IA0KPj4+IEkgZ3Vlc3Mg dGhlIGFuc3dlciBtaWdodCBiZSBiZWNhdXNlIHdlIGNhbiBnZXQgaXQgYmFjayBmcm9tIHRoZSBs YXN0DQo+Pj4gTEFZT1VUQ09NTUlULg0KPj4+IA0KPj4gDQo+PiBUaGUgcGF0Y2ggdGhhdCBJIGZv bGxvd2VkIHVwIHdpdGggc2hvdWxkIG5vdyBlbnN1cmUgdGhhdCB3ZSBkbyBub3QgbWFyayB0aGUg YXR0cmlidXRlIGNhY2hlIGFzIHVwIHRvIGRhdGUgaWYgdGhlcmUgaXMgYSBMQVlPVVRDT01NSVQg cGVuZGluZy4NCj4+IElPVzogd2hlbiB0aGUgcE5GUyB3cml0ZSBpcyBkb25lLCBpdCBpcyBleHBl Y3RlZCB0byBkbyAyIHRoaW5nczoNCj4+IA0KPj4gMSkgbWFyayB0aGUgaW5vZGUgZm9yIExBWU9V VENPTU1JVA0KPj4gMikgbWFyayB0aGUgYXR0cmlidXRlIGNhY2hlIGFzIGludmFsaWQgKGJlY2F1 c2Ugd2Uga25vdyB0aGUgY2hhbmdlIGF0dHJpYnV0ZSwgbXRpbWUsIGN0aW1lIG5lZWQgdG8gYmUg dXBkYXRlcykNCj4+IA0KPj4gSW4gdGhlIGNhc2Ugb2YgYmxvY2tzIHBORlMgd3JpdGU6DQo+PiBU aGUgY2FsbCB0byBwbmZzX3NldF9sYXlvdXRjb21taXQoKSBpbiBwbmZzX2xkX3dyaXRlX2RvbmUo KSBzaG91bGQgdGFrZSBjYXJlIG9mICgxKQ0KPj4gVGhlIGNhbGwgdG8gbmZzX3dyaXRlYmFja191 cGRhdGVfaW5vZGUoKSBpbiBuZnM0X3dyaXRlX2RvbmVfY2IoKSBzaG91bGQgdGFrZSBjYXJlIG9m ICgyKS4NCj4+IA0KPj4gUHJvdmlkZWQgdGhhdCB0aGVzZSAyIGNhbGxzIGFyZSBwZXJmb3JtZWQg aW4gdGhlIGFib3ZlIG9yZGVyLCB0aGVuIGFueSBjYWxsIHRvIG5mc19nZXRhdHRyKCkgd2hpY2gg aGFzIG5vdCBiZWVuIHByZWNlZGVkIGJ5IGEgY2FsbCB0byBuZnM0X3Byb2NfbGF5b3V0Y29tbWl0 KCkgc2hvdWxkIHRyaWdnZXIgdGhlIGNhbGwgdG8gX19uZnNfcmV2YWxpZGF0ZV9pbm9kZSgpLg0K PiANCj4gSSB0aGluayB0aGUgcHJvYmxlbSBpcyB0aGF0IGEgZm9sbG93aW5nIG5mc19nZXRhdHRy KCkgd2lsbCBmYWlsIHRvIG5vdGljZQ0KPiB0aGUgc2l6ZSBjaGFuZ2UgaW4gdGhlIGNhc2Ugb2Yg YSB3cml0ZV9jb21wbGV0aW9uIGFuZCBsYXlvdXRjb21taXQgb2NjdXJpbmcNCj4gYWZ0ZXIgbmZz X2dldGF0dHIoKSBoYXMgZG9uZSBwbmZzX3N5bmNfaW5vZGUoKSBidXQgYmVmb3JlIGl0IGhhcyBk b25lDQo+IG5mc191cGRhdGVfaW5vZGUoKS4NCj4gDQo+IEluIHRoZSBmYWlsaW5nIGNhc2UgdGhl cmUgYXJlIHR3byB0aHJlYWRzIG9uZSBpcyBkb2luZyB3cml0ZXMsIHRoZSBvdGhlcg0KPiBkb2lu ZyBsc3RhdCBvbiBhaW9fY29tcGxldGUgdmlhIGlvX2dldGV2ZW50cygyKS4NCj4gDQo+IEZvciBl YWNoIHdyaXRlIGNvbXBsZXRpb24gdGhlIGxzdGF0IHRocmVhZCB0cmllcyB0byB2ZXJpZnkgdGhl IGZpbGUgc2l6ZS4NCj4gDQo+IEdFVEFUVFIgVGhyZWFkICAgICAgICAgICAgICAgICAgTEFZT1VU Q09NTUlUIFRocmVhZA0KPiAtLS0tLS0tLS0tLS0tLSAgICAgICAgICAgICAgICAgIC0tLS0tLS0t LS0tLS0tLS0tLS0tDQo+ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICB3cml0ZV9jb21w bGV0aW9uIHNldHMgTEFZT1VUQ09NTUlUICg0MDk2QDApDQo+IC0tPiBuZnNfZ2V0YXR0cg0KDQpm aWxlbWFwX3dyaXRlX2FuZF93YWl0KCkNCg0KPiBfX25mc19yZXZhbGlkYXRlX2lub2RlDQo+ICBw bmZzX3N5bmNfaW5vZGUNCg0KTkZTX1BST1RPKGlub2RlKS0+Z2V0YXR0cigpDQoNCj4gIGdldGF0 dHIgc2VlcyA0MDk2DQo+ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICB3cml0ZV9jb21w bGV0aW9uIHNldHMgTEFZT1VUQ09NTUlUICg0MDk2QDQwOTYpDQo+ICAgICAgICAgICAgICAgICAg ICAgICAgICAgICAgICBzZXRzIExBWU9VVENPTU1JVElORw0KPiAgICAgICAgICAgICAgICAgICAg ICAgICAgICAgICAgY2xlYXJzIExBWU9VVENPTU1JVA0KPiAgICAgICAgICAgICAgICAgICAgICAg ICAgICAgICAgY2xlYXJzIExBWU9VVENPTU1JVFRJTkcNCj4gIG5mc19yZWZyZXNoX2lub2RlDQo+ ICAgbmZzX3VwZGF0ZV9pbm9kZSBzaXplIGlzIDQwOTYNCj4gPC0tIG5mc19nZXRhdHRyDQo+IA0K PiBBdCB0aGlzIHBvaW50IHRoZSBjYWNoZWQgYXR0cmlidXRlcyBhcmUgc2VlbiBhcyB1cCB0byBk YXRlLCBidXQNCj4gYWlvLWRpby1leHRlbmQtc3RhdCBwcm9ncmFtIGV4cGVjdHMgdGhhdCBzZWNv bmQgd3JpdGVfY29tcGxldGlvbiB0byByZWZsZWN0DQo+IGluIHRoZSBmaWxlIHNpemUuDQo+IA0K DQpXaHkgaXNu4oCZdCB0aGUgZmlsZW1hcF93cml0ZV9hbmRfd2FpdCgpIGFib3ZlIHJlc29sdmlu ZyB0aGUgcmFjZT8gSeKAmWQgZXhwZWN0IHRoYXQgd291bGQgbW92ZSB5b3VyIOKAnHdyaXRlIGNv bXBsZXRpb24gc2V0cyBMQVlPVVRDT01NSVTigJ0gdXAgdG8gYmVmb3JlIHRoZSBwbmZzX3N5bmNf aW5vZGUoKS4NCkluIGZhY3QsIGluIHRoZSBwYXRjaCB0aGF0IENocmlzdG9waCBzZW50LCBhbGwg aGUgd2FzIGRvaW5nIHdhcyBtb3ZpbmcgdGhlIHBuZnNfc3luY19pbm9kZSgpIHRvIGltbWVkaWF0 ZWx5IGFmdGVyIHRoYXQgZmlsZW1hcF93cml0ZV9hbmRfd2FpdCgpIGluc3RlYWQgb2YgcmVseWlu ZyBvbiBpdCBpbiBfbmZzX3JldmFsaWRhdGVfaW5vZGUu -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 25 Jul 2016, at 12:39, Trond Myklebust wrote: >> On Jul 25, 2016, at 12:26, Benjamin Coddington <bcodding@redhat.com> >> wrote: >> >> On 21 Jul 2016, at 9:20, Trond Myklebust wrote: >> >>>> On Jul 21, 2016, at 09:05, Benjamin Coddington >>>> <bcodding@redhat.com> wrote: >>>> >>>> So back to Christoph's point earlier: >>>> >>>> On 17 Jul 2016, at 23:48, Christoph Hellwig wrote: >>>>> This one breaks xfstests generic/207 on block/scsi layout for me. >>>>> The >>>>> reason for that is that we need a layoutcommit after writing out >>>>> all >>>>> data for the file for the file size to be updated on the server. >>>> >>>> You responded: >>>> >>>> On 18 Jul 2016, at 0:32, Trond Myklebust wrote: >>>>> I’m not understanding this argument. Why do we care if the file >>>>> size is up >>>>> to date on the server if we’re not sending an actual GETATTR on >>>>> the wire >>>>> to retrieve the file size? >>>> >>>> I guess the answer might be because we can get it back from the >>>> last >>>> LAYOUTCOMMIT. >>>> >>> >>> The patch that I followed up with should now ensure that we do not >>> mark the attribute cache as up to date if there is a LAYOUTCOMMIT >>> pending. >>> IOW: when the pNFS write is done, it is expected to do 2 things: >>> >>> 1) mark the inode for LAYOUTCOMMIT >>> 2) mark the attribute cache as invalid (because we know the change >>> attribute, mtime, ctime need to be updates) >>> >>> In the case of blocks pNFS write: >>> The call to pnfs_set_layoutcommit() in pnfs_ld_write_done() should >>> take care of (1) >>> The call to nfs_writeback_update_inode() in nfs4_write_done_cb() >>> should take care of (2). >>> >>> Provided that these 2 calls are performed in the above order, then >>> any call to nfs_getattr() which has not been preceded by a call to >>> nfs4_proc_layoutcommit() should trigger the call to >>> __nfs_revalidate_inode(). >> >> I think the problem is that a following nfs_getattr() will fail to >> notice >> the size change in the case of a write_completion and layoutcommit >> occuring >> after nfs_getattr() has done pnfs_sync_inode() but before it has done >> nfs_update_inode(). >> >> In the failing case there are two threads one is doing writes, the >> other >> doing lstat on aio_complete via io_getevents(2). >> >> For each write completion the lstat thread tries to verify the file >> size. >> >> GETATTR Thread LAYOUTCOMMIT Thread >> -------------- -------------------- >> write_completion sets LAYOUTCOMMIT >> (4096@0) >> --> nfs_getattr > > filemap_write_and_wait() > >> __nfs_revalidate_inode >> pnfs_sync_inode > > NFS_PROTO(inode)->getattr() > >> getattr sees 4096 >> write_completion sets LAYOUTCOMMIT >> (4096@4096) >> sets LAYOUTCOMMITING >> clears LAYOUTCOMMIT >> clears LAYOUTCOMMITTING >> nfs_refresh_inode >> nfs_update_inode size is 4096 >> <-- nfs_getattr >> >> At this point the cached attributes are seen as up to date, but >> aio-dio-extend-stat program expects that second write_completion to >> reflect >> in the file size. >> > > Why isn’t the filemap_write_and_wait() above resolving the race? > I’d > expect that would move your “write completion sets LAYOUTCOMMIT” > up to > before the pnfs_sync_inode(). In fact, in the patch that Christoph > sent, > all he was doing was moving the pnfs_sync_inode() to immediately after > that filemap_write_and_wait() instead of relying on it in > _nfs_revalidate_inode. This is O_DIRECT, I've failed to mention yet. The second write hasn't made it out of __nfs_pageio_add_request() at the time filemap_write_and_wait() is called. It is sleeping in pnfs_update_layout() waiting on a LAYOUTGET and it doesn't resumes until after filemap_write_and_wait(). -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
DQo+IE9uIEp1bCAyNSwgMjAxNiwgYXQgMTQ6MjYsIEJlbmphbWluIENvZGRpbmd0b24gPGJjb2Rk aW5nQHJlZGhhdC5jb20+IHdyb3RlOg0KPiANCj4gDQo+IA0KPiBPbiAyNSBKdWwgMjAxNiwgYXQg MTI6MzksIFRyb25kIE15a2xlYnVzdCB3cm90ZToNCj4gDQo+Pj4gT24gSnVsIDI1LCAyMDE2LCBh dCAxMjoyNiwgQmVuamFtaW4gQ29kZGluZ3RvbiA8YmNvZGRpbmdAcmVkaGF0LmNvbT4gd3JvdGU6 DQo+Pj4gDQo+Pj4gT24gMjEgSnVsIDIwMTYsIGF0IDk6MjAsIFRyb25kIE15a2xlYnVzdCB3cm90 ZToNCj4+PiANCj4+Pj4+IE9uIEp1bCAyMSwgMjAxNiwgYXQgMDk6MDUsIEJlbmphbWluIENvZGRp bmd0b24gPGJjb2RkaW5nQHJlZGhhdC5jb20+IHdyb3RlOg0KPj4+Pj4gDQo+Pj4+PiBTbyBiYWNr IHRvIENocmlzdG9waCdzIHBvaW50IGVhcmxpZXI6DQo+Pj4+PiANCj4+Pj4+IE9uIDE3IEp1bCAy MDE2LCBhdCAyMzo0OCwgQ2hyaXN0b3BoIEhlbGx3aWcgd3JvdGU6DQo+Pj4+Pj4gVGhpcyBvbmUg YnJlYWtzIHhmc3Rlc3RzIGdlbmVyaWMvMjA3IG9uIGJsb2NrL3Njc2kgbGF5b3V0IGZvciBtZS4g IFRoZQ0KPj4+Pj4+IHJlYXNvbiBmb3IgdGhhdCBpcyB0aGF0IHdlIG5lZWQgYSBsYXlvdXRjb21t aXQgYWZ0ZXIgd3JpdGluZyBvdXQgYWxsDQo+Pj4+Pj4gZGF0YSBmb3IgdGhlIGZpbGUgZm9yIHRo ZSBmaWxlIHNpemUgdG8gYmUgdXBkYXRlZCBvbiB0aGUgc2VydmVyLg0KPj4+Pj4gDQo+Pj4+PiBZ b3UgcmVzcG9uZGVkOg0KPj4+Pj4gDQo+Pj4+PiBPbiAxOCBKdWwgMjAxNiwgYXQgMDozMiwgVHJv bmQgTXlrbGVidXN0IHdyb3RlOg0KPj4+Pj4+IEnigJltIG5vdCB1bmRlcnN0YW5kaW5nIHRoaXMg YXJndW1lbnQuIFdoeSBkbyB3ZSBjYXJlIGlmIHRoZSBmaWxlIHNpemUgaXMgdXANCj4+Pj4+PiB0 byBkYXRlIG9uIHRoZSBzZXJ2ZXIgaWYgd2XigJlyZSBub3Qgc2VuZGluZyBhbiBhY3R1YWwgR0VU QVRUUiBvbiB0aGUgd2lyZQ0KPj4+Pj4+IHRvIHJldHJpZXZlIHRoZSBmaWxlIHNpemU/DQo+Pj4+ PiANCj4+Pj4+IEkgZ3Vlc3MgdGhlIGFuc3dlciBtaWdodCBiZSBiZWNhdXNlIHdlIGNhbiBnZXQg aXQgYmFjayBmcm9tIHRoZSBsYXN0DQo+Pj4+PiBMQVlPVVRDT01NSVQuDQo+Pj4+PiANCj4+Pj4g DQo+Pj4+IFRoZSBwYXRjaCB0aGF0IEkgZm9sbG93ZWQgdXAgd2l0aCBzaG91bGQgbm93IGVuc3Vy ZSB0aGF0IHdlIGRvIG5vdCBtYXJrIHRoZSBhdHRyaWJ1dGUgY2FjaGUgYXMgdXAgdG8gZGF0ZSBp ZiB0aGVyZSBpcyBhIExBWU9VVENPTU1JVCBwZW5kaW5nLg0KPj4+PiBJT1c6IHdoZW4gdGhlIHBO RlMgd3JpdGUgaXMgZG9uZSwgaXQgaXMgZXhwZWN0ZWQgdG8gZG8gMiB0aGluZ3M6DQo+Pj4+IA0K Pj4+PiAxKSBtYXJrIHRoZSBpbm9kZSBmb3IgTEFZT1VUQ09NTUlUDQo+Pj4+IDIpIG1hcmsgdGhl IGF0dHJpYnV0ZSBjYWNoZSBhcyBpbnZhbGlkIChiZWNhdXNlIHdlIGtub3cgdGhlIGNoYW5nZSBh dHRyaWJ1dGUsIG10aW1lLCBjdGltZSBuZWVkIHRvIGJlIHVwZGF0ZXMpDQo+Pj4+IA0KPj4+PiBJ biB0aGUgY2FzZSBvZiBibG9ja3MgcE5GUyB3cml0ZToNCj4+Pj4gVGhlIGNhbGwgdG8gcG5mc19z ZXRfbGF5b3V0Y29tbWl0KCkgaW4gcG5mc19sZF93cml0ZV9kb25lKCkgc2hvdWxkIHRha2UgY2Fy ZSBvZiAoMSkNCj4+Pj4gVGhlIGNhbGwgdG8gbmZzX3dyaXRlYmFja191cGRhdGVfaW5vZGUoKSBp biBuZnM0X3dyaXRlX2RvbmVfY2IoKSBzaG91bGQgdGFrZSBjYXJlIG9mICgyKS4NCj4+Pj4gDQo+ Pj4+IFByb3ZpZGVkIHRoYXQgdGhlc2UgMiBjYWxscyBhcmUgcGVyZm9ybWVkIGluIHRoZSBhYm92 ZSBvcmRlciwgdGhlbiBhbnkgY2FsbCB0byBuZnNfZ2V0YXR0cigpIHdoaWNoIGhhcyBub3QgYmVl biBwcmVjZWRlZCBieSBhIGNhbGwgdG8gbmZzNF9wcm9jX2xheW91dGNvbW1pdCgpIHNob3VsZCB0 cmlnZ2VyIHRoZSBjYWxsIHRvIF9fbmZzX3JldmFsaWRhdGVfaW5vZGUoKS4NCj4+PiANCj4+PiBJ IHRoaW5rIHRoZSBwcm9ibGVtIGlzIHRoYXQgYSBmb2xsb3dpbmcgbmZzX2dldGF0dHIoKSB3aWxs IGZhaWwgdG8gbm90aWNlDQo+Pj4gdGhlIHNpemUgY2hhbmdlIGluIHRoZSBjYXNlIG9mIGEgd3Jp dGVfY29tcGxldGlvbiBhbmQgbGF5b3V0Y29tbWl0IG9jY3VyaW5nDQo+Pj4gYWZ0ZXIgbmZzX2dl dGF0dHIoKSBoYXMgZG9uZSBwbmZzX3N5bmNfaW5vZGUoKSBidXQgYmVmb3JlIGl0IGhhcyBkb25l DQo+Pj4gbmZzX3VwZGF0ZV9pbm9kZSgpLg0KPj4+IA0KPj4+IEluIHRoZSBmYWlsaW5nIGNhc2Ug dGhlcmUgYXJlIHR3byB0aHJlYWRzIG9uZSBpcyBkb2luZyB3cml0ZXMsIHRoZSBvdGhlcg0KPj4+ IGRvaW5nIGxzdGF0IG9uIGFpb19jb21wbGV0ZSB2aWEgaW9fZ2V0ZXZlbnRzKDIpLg0KPj4+IA0K Pj4+IEZvciBlYWNoIHdyaXRlIGNvbXBsZXRpb24gdGhlIGxzdGF0IHRocmVhZCB0cmllcyB0byB2 ZXJpZnkgdGhlIGZpbGUgc2l6ZS4NCj4+PiANCj4+PiBHRVRBVFRSIFRocmVhZCAgICAgICAgICAg ICAgICAgIExBWU9VVENPTU1JVCBUaHJlYWQNCj4+PiAtLS0tLS0tLS0tLS0tLSAgICAgICAgICAg ICAgICAgIC0tLS0tLS0tLS0tLS0tLS0tLS0tDQo+Pj4gICAgICAgICAgICAgICAgICAgICAgICAg ICAgICAgd3JpdGVfY29tcGxldGlvbiBzZXRzIExBWU9VVENPTU1JVCAoNDA5NkAwKQ0KPj4+IC0t PiBuZnNfZ2V0YXR0cg0KPj4gDQo+PiBmaWxlbWFwX3dyaXRlX2FuZF93YWl0KCkNCj4+IA0KPj4+ IF9fbmZzX3JldmFsaWRhdGVfaW5vZGUNCj4+PiBwbmZzX3N5bmNfaW5vZGUNCj4+IA0KPj4gTkZT X1BST1RPKGlub2RlKS0+Z2V0YXR0cigpDQo+PiANCj4+PiBnZXRhdHRyIHNlZXMgNDA5Ng0KPj4+ ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIHdyaXRlX2NvbXBsZXRpb24gc2V0cyBMQVlP VVRDT01NSVQgKDQwOTZANDA5NikNCj4+PiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICBz ZXRzIExBWU9VVENPTU1JVElORw0KPj4+ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIGNs ZWFycyBMQVlPVVRDT01NSVQNCj4+PiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICBjbGVh cnMgTEFZT1VUQ09NTUlUVElORw0KPj4+IG5mc19yZWZyZXNoX2lub2RlDQo+Pj4gIG5mc191cGRh dGVfaW5vZGUgc2l6ZSBpcyA0MDk2DQo+Pj4gPC0tIG5mc19nZXRhdHRyDQo+Pj4gDQo+Pj4gQXQg dGhpcyBwb2ludCB0aGUgY2FjaGVkIGF0dHJpYnV0ZXMgYXJlIHNlZW4gYXMgdXAgdG8gZGF0ZSwg YnV0DQo+Pj4gYWlvLWRpby1leHRlbmQtc3RhdCBwcm9ncmFtIGV4cGVjdHMgdGhhdCBzZWNvbmQg d3JpdGVfY29tcGxldGlvbiB0byByZWZsZWN0DQo+Pj4gaW4gdGhlIGZpbGUgc2l6ZS4NCj4+PiAN Cj4+IA0KPj4gV2h5IGlzbuKAmXQgdGhlIGZpbGVtYXBfd3JpdGVfYW5kX3dhaXQoKSBhYm92ZSBy ZXNvbHZpbmcgdGhlIHJhY2U/IEnigJlkDQo+PiBleHBlY3QgdGhhdCB3b3VsZCBtb3ZlIHlvdXIg 4oCcd3JpdGUgY29tcGxldGlvbiBzZXRzIExBWU9VVENPTU1JVOKAnSB1cCB0bw0KPj4gYmVmb3Jl IHRoZSBwbmZzX3N5bmNfaW5vZGUoKS4gIEluIGZhY3QsIGluIHRoZSBwYXRjaCB0aGF0IENocmlz dG9waCBzZW50LA0KPj4gYWxsIGhlIHdhcyBkb2luZyB3YXMgbW92aW5nIHRoZSBwbmZzX3N5bmNf aW5vZGUoKSB0byBpbW1lZGlhdGVseSBhZnRlcg0KPj4gdGhhdCBmaWxlbWFwX3dyaXRlX2FuZF93 YWl0KCkgaW5zdGVhZCBvZiByZWx5aW5nIG9uIGl0IGluDQo+PiBfbmZzX3JldmFsaWRhdGVfaW5v ZGUuDQo+IA0KPiBUaGlzIGlzIE9fRElSRUNULCBJJ3ZlIGZhaWxlZCB0byBtZW50aW9uIHlldC4g IFRoZSBzZWNvbmQgd3JpdGUgaGFzbid0IG1hZGUNCj4gaXQgb3V0IG9mIF9fbmZzX3BhZ2Vpb19h ZGRfcmVxdWVzdCgpIGF0IHRoZSB0aW1lIGZpbGVtYXBfd3JpdGVfYW5kX3dhaXQoKSBpcw0KPiBj YWxsZWQuICBJdCBpcyBzbGVlcGluZyBpbiBwbmZzX3VwZGF0ZV9sYXlvdXQoKSB3YWl0aW5nIG9u IGEgTEFZT1VUR0VUIGFuZCBpdA0KPiBkb2Vzbid0IHJlc3VtZXMgdW50aWwgYWZ0ZXIgZmlsZW1h cF93cml0ZV9hbmRfd2FpdCgpLg0KDQpXYWl0LCBzbyB5b3UgaGF2ZSAxIHRocmVhZCBkb2luZyBh biBPX0RJUkVDVCB3cml0ZSgpIGFuZCBhbm90aGVyIGRvaW5nIGEgc3RhdCgpIGluIHBhcmFsbGVs PyBXaHkgd291bGQgdGhlcmUgYmUgYW4gZXhwZWN0YXRpb24gdGhhdCB0aGUgZmlsZXN5c3RlbSBz aG91bGQgc2VyaWFsaXNlIHRob3NlIHN5c3RlbSBjYWxscz8NCg0K -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 25 Jul 2016, at 14:34, Trond Myklebust wrote: >> On Jul 25, 2016, at 14:26, Benjamin Coddington <bcodding@redhat.com> >> wrote: >> >> >> >> On 25 Jul 2016, at 12:39, Trond Myklebust wrote: >> >>>> On Jul 25, 2016, at 12:26, Benjamin Coddington >>>> <bcodding@redhat.com> wrote: >>>> >>>> On 21 Jul 2016, at 9:20, Trond Myklebust wrote: >>>> >>>>>> On Jul 21, 2016, at 09:05, Benjamin Coddington >>>>>> <bcodding@redhat.com> wrote: >>>>>> >>>>>> So back to Christoph's point earlier: >>>>>> >>>>>> On 17 Jul 2016, at 23:48, Christoph Hellwig wrote: >>>>>>> This one breaks xfstests generic/207 on block/scsi layout for >>>>>>> me. The >>>>>>> reason for that is that we need a layoutcommit after writing out >>>>>>> all >>>>>>> data for the file for the file size to be updated on the server. >>>>>> >>>>>> You responded: >>>>>> >>>>>> On 18 Jul 2016, at 0:32, Trond Myklebust wrote: >>>>>>> I’m not understanding this argument. Why do we care if the >>>>>>> file size is up >>>>>>> to date on the server if we’re not sending an actual GETATTR >>>>>>> on the wire >>>>>>> to retrieve the file size? >>>>>> >>>>>> I guess the answer might be because we can get it back from the >>>>>> last >>>>>> LAYOUTCOMMIT. >>>>>> >>>>> >>>>> The patch that I followed up with should now ensure that we do not >>>>> mark the attribute cache as up to date if there is a LAYOUTCOMMIT >>>>> pending. >>>>> IOW: when the pNFS write is done, it is expected to do 2 things: >>>>> >>>>> 1) mark the inode for LAYOUTCOMMIT >>>>> 2) mark the attribute cache as invalid (because we know the change >>>>> attribute, mtime, ctime need to be updates) >>>>> >>>>> In the case of blocks pNFS write: >>>>> The call to pnfs_set_layoutcommit() in pnfs_ld_write_done() should >>>>> take care of (1) >>>>> The call to nfs_writeback_update_inode() in nfs4_write_done_cb() >>>>> should take care of (2). >>>>> >>>>> Provided that these 2 calls are performed in the above order, then >>>>> any call to nfs_getattr() which has not been preceded by a call to >>>>> nfs4_proc_layoutcommit() should trigger the call to >>>>> __nfs_revalidate_inode(). >>>> >>>> I think the problem is that a following nfs_getattr() will fail to >>>> notice >>>> the size change in the case of a write_completion and layoutcommit >>>> occuring >>>> after nfs_getattr() has done pnfs_sync_inode() but before it has >>>> done >>>> nfs_update_inode(). >>>> >>>> In the failing case there are two threads one is doing writes, the >>>> other >>>> doing lstat on aio_complete via io_getevents(2). >>>> >>>> For each write completion the lstat thread tries to verify the file >>>> size. >>>> >>>> GETATTR Thread LAYOUTCOMMIT Thread >>>> -------------- -------------------- >>>> write_completion sets LAYOUTCOMMIT >>>> (4096@0) >>>> --> nfs_getattr >>> >>> filemap_write_and_wait() >>> >>>> __nfs_revalidate_inode >>>> pnfs_sync_inode >>> >>> NFS_PROTO(inode)->getattr() >>> >>>> getattr sees 4096 >>>> write_completion sets LAYOUTCOMMIT >>>> (4096@4096) >>>> sets LAYOUTCOMMITING >>>> clears LAYOUTCOMMIT >>>> clears LAYOUTCOMMITTING >>>> nfs_refresh_inode >>>> nfs_update_inode size is 4096 >>>> <-- nfs_getattr >>>> >>>> At this point the cached attributes are seen as up to date, but >>>> aio-dio-extend-stat program expects that second write_completion to >>>> reflect >>>> in the file size. >>>> >>> >>> Why isn’t the filemap_write_and_wait() above resolving the race? >>> I’d >>> expect that would move your “write completion sets LAYOUTCOMMIT” >>> up to >>> before the pnfs_sync_inode(). In fact, in the patch that Christoph >>> sent, >>> all he was doing was moving the pnfs_sync_inode() to immediately >>> after >>> that filemap_write_and_wait() instead of relying on it in >>> _nfs_revalidate_inode. >> >> This is O_DIRECT, I've failed to mention yet. The second write >> hasn't made >> it out of __nfs_pageio_add_request() at the time >> filemap_write_and_wait() is >> called. It is sleeping in pnfs_update_layout() waiting on a >> LAYOUTGET and it >> doesn't resumes until after filemap_write_and_wait(). > > Wait, so you have 1 thread doing an O_DIRECT write() and another doing > a > stat() in parallel? Why would there be an expectation that the > filesystem > should serialise those system calls? Not exactly parallel, but synchronized on aio_complete. A comment in generic/207's src/aio-dio-regress/aio-dio-extend-stat.c: 36 /* 37 * This was originally submitted to 38 * http://bugzilla.kernel.org/show_bug.cgi?id=6831 by 39 * Rafal Wijata <wijata@nec-labs.com>. It caught a race in dio aio completion 40 * that would call aio_complete() before the dio callers would update i_size. 41 * A stat after io_getevents() would not see the new file size. 42 * 43 * The bug was fixed in the fs/direct-io.c completion reworking that appeared 44 * in 2.6.20. This test should fail on 2.6.19. 45 */ As far as I can see, this check is the whole point of generic/207.. -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c index 35fda08dc4f6..9df45832e28b 100644 --- a/fs/nfs/inode.c +++ b/fs/nfs/inode.c @@ -1664,7 +1664,7 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr) unsigned long now = jiffies; unsigned long save_cache_validity; bool have_writers = nfs_file_has_buffered_writers(nfsi); - bool cache_revalidated = true; + bool cache_revalidated; dfprintk(VFS, "NFS: %s(%s/%lu fh_crc=0x%08x ct=%d info=0x%x)\n", __func__, inode->i_sb->s_id, inode->i_ino, @@ -1713,6 +1713,9 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr) /* Do atomic weak cache consistency updates */ invalid |= nfs_wcc_update_inode(inode, fattr); + + cache_revalidated = !pnfs_layoutcommit_outstanding(inode); + /* More cache consistency checks */ if (fattr->valid & NFS_ATTR_FATTR_CHANGE) { if (inode->i_version != fattr->change_attr) { diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h index d6be5299a55a..181283c4ebc3 100644 --- a/fs/nfs/pnfs.h +++ b/fs/nfs/pnfs.h @@ -629,6 +629,13 @@ pnfs_sync_inode(struct inode *inode, bool datasync) } static inline bool +pnfs_layoutcommit_outstanding(struct inode *inode) +{ + return false; +} + + +static inline bool pnfs_roc(struct inode *ino) { return false;