diff mbox

nfs: nfs_commit_inode should redirty inode if the inode has outstanding requests

Message ID 20180307195313.kzqdboqk5j2hyrf3@tonberry.usersys.redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Scott Mayhew March 7, 2018, 7:53 p.m. UTC
On Mon, 05 Mar 2018, Trond Myklebust wrote:

> On Mon, 2018-03-05 at 16:16 -0500, J. Bruce Fields wrote:
> > On Fri, Mar 02, 2018 at 11:00:38AM -0500, Scott Mayhew wrote:
> > > It seems that nfs_commit_inode can be called where the nfs_inode
> > > has
> > > outstanding requests and the commit lists are empty.  That can lead
> > > to
> > > invalidate_complete_page2 failing due to the associated page having
> > > private data which in turn leads to invalidate_inode_pages2_range
> > > returning -EBUSY.
> > 
> > For what it's worth, I verified that this fixes the EBUSY I was
> > seeing:
> > 
> > 	http://marc.info/?i=20180223160350.GF15876@fieldses.org
> > 
> 
> Fine, but the patch will also cause the inode to be marked as dirty in
> cases where there are no unstable writes to commit, but there are pages
> undergoing writeback.
> IOW: it regresses the fix that was made in dc4fd9ab01
> 
> So please do look into fixing do_launder_page().
> 

Yes, sorry... so I've been testing with this change since Friday
afternoon:
diff --git a/mm/truncate.c b/mm/truncate.c
index c34e2fd4f583..909734a5d3a3 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -647,7 +647,7 @@ invalidate_complete_page2(struct address_space *mapping, struct page *page)
 
 static int do_launder_page(struct address_space *mapping, struct page *page)
 {
-       if (!PageDirty(page))
+       if (!PageDirty(page) && !PagePrivate(page))
                return 0;
        if (page->mapping != mapping || mapping->a_ops->launder_page == NULL)
                return 0;


But I'm frequently seeing soft lockups, on both 4.16-rc4 and on the
latest RHEL 7 kernel.

Mar  7 13:52:08 localhost kernel: watchdog: BUG: soft lockup - CPU#5 stuck for 23s! [xfs_io:17667]
Mar  7 13:52:08 localhost kernel: Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc crct10dif_pclmul crc32_pclmul ghash_clmulni_intel virtio_balloon i2c_piix4 joydev xfs libcrc32c qxl drm_kms_helper ttm virtio_console virtio_net drm virtio_scsi serio_raw crc32c_intel ata_generic virtio_pci pata_acpi qemu_fw_cfg virtio_rng virtio_ring virtio
Mar  7 13:52:08 localhost kernel: CPU: 5 PID: 17667 Comm: xfs_io Tainted: G             L   4.16.0-rc4+ #2
Mar  7 13:52:08 localhost kernel: Hardware name: Red Hat RHEV Hypervisor, BIOS 1.10.2-3.el7_4.1 04/01/2014
Mar  7 13:52:08 localhost kernel: RIP: 0010:nfs_commit_inode+0x87/0x160 [nfs]
Mar  7 13:52:08 localhost kernel: RSP: 0018:ffffab310e627b00 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff12
Mar  7 13:52:08 localhost kernel: RAX: 0000000000000000 RBX: ffff8cd834f0a3e0 RCX: 0000000000000000
Mar  7 13:52:08 localhost kernel: RDX: ffff8cd834f0a300 RSI: 0000000000000001 RDI: ffff8cd834f0a3e0
Mar  7 13:52:08 localhost kernel: RBP: 0000000000000001 R08: ffffab310e627c30 R09: 000000000001d400
Mar  7 13:52:08 localhost kernel: R10: ffff8cd836c02480 R11: ffff8cd83302043c R12: ffffab310e627b70
Mar  7 13:52:08 localhost kernel: R13: ffffffffffffffff R14: 0000000000000000 R15: ffffcd0147055f00
Mar  7 13:52:08 localhost kernel: FS:  00007feae2d97b80(0000) GS:ffff8cd837340000(0000) knlGS:0000000000000000
Mar  7 13:52:08 localhost kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar  7 13:52:08 localhost kernel: CR2: 00007feae2103fb8 CR3: 0000000120fc2002 CR4: 00000000003606e0
Mar  7 13:52:08 localhost kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar  7 13:52:08 localhost kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Mar  7 13:52:08 localhost kernel: Call Trace:
Mar  7 13:52:08 localhost kernel: nfs_wb_page+0xd7/0x1b0 [nfs]
Mar  7 13:52:08 localhost kernel: invalidate_inode_pages2_range+0x2aa/0x510
Mar  7 13:52:08 localhost kernel: nfs_revalidate_mapping+0xc6/0x280 [nfs]
Mar  7 13:52:08 localhost kernel: mmap_region+0x3a7/0x5e0
Mar  7 13:52:08 localhost kernel: do_mmap+0x2de/0x440
Mar  7 13:52:08 localhost kernel: vm_mmap_pgoff+0xd2/0x120
Mar  7 13:52:08 localhost kernel: SyS_mmap_pgoff+0x1c2/0x250
Mar  7 13:52:08 localhost kernel: do_syscall_64+0x74/0x180
Mar  7 13:52:08 localhost kernel: entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Mar  7 13:52:08 localhost kernel: RIP: 0033:0x7feae2442e13
Mar  7 13:52:08 localhost kernel: RSP: 002b:00007fff59c5c158 EFLAGS: 00000246 ORIG_RAX: 0000000000000009
Mar  7 13:52:08 localhost kernel: RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007feae2442e13
Mar  7 13:52:08 localhost kernel: RDX: 0000000000000002 RSI: 0000000000100000 RDI: 0000000000000000
Mar  7 13:52:08 localhost kernel: RBP: 000000001d300000 R08: 0000000000000003 R09: 000000001d300000
Mar  7 13:52:08 localhost kernel: R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
Mar  7 13:52:08 localhost kernel: R13: 0000000000100000 R14: 0000000000000001 R15: 0000000000000002
Mar  7 13:52:08 localhost kernel: Code: 00 00 48 85 c9 74 0a e8 98 9d 7d ec 48 8b 54 24 18 48 89 44 24 20 48 c7 44 24 28 00 00 00 00 48 c7 44 24 30 b0 30 43 c0 f0 ff 02 <48> 8b 7c 24 18 48 8b 47 08 48 85 c0 75 2a 45 31 e4 e8 53 d9 ff


Mar  6 17:42:11 dell-r430-8 kernel: NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [fio:9269]
Mar  6 17:42:11 dell-r430-8 kernel: Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul ipmi_ssif glue_helper ablk_helper cryptd ipmi_si iTCO_wdt iTCO_vendor_support pcspkr dcdbas ipmi_devintf sg ipmi_msghandler wmi acpi_power_meter mei_me mei shpchp lpc_ich ip_tables xfs libcrc32c sr_mod sd_mod cdrom crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crct10dif_pclmul crct10dif_common crc32c_intel ahci libahci libata tg3 i2c_core megaraid_sas ptp pps_core dm_mirror dm_region_hash dm_log dm_mod
Mar  6 17:42:11 dell-r430-8 kernel: CPU: 1 PID: 9269 Comm: fio Kdump: loaded Tainted: G             L ------------   3.10.0.jsm.test+ #10
Mar  6 17:42:11 dell-r430-8 kernel: Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.1.10 03/10/2015
Mar  6 17:42:11 dell-r430-8 kernel: task: ffff9728b830eeb0 ti: ffff9728b668c000 task.ti: ffff9728b668c000
Mar  6 17:42:11 dell-r430-8 kernel: RIP: 0010:[<ffffffff89f9f789>]  [<ffffffff89f9f789>] clear_page_dirty_for_io+0x9/0xd0
Mar  6 17:42:11 dell-r430-8 kernel: RSP: 0018:ffff9728b668fbb8  EFLAGS: 00000246
Mar  6 17:42:11 dell-r430-8 kernel: RAX: 002fffff0000082d RBX: ffff9728b668fb58 RCX: 0000000000000000
Mar  6 17:42:11 dell-r430-8 kernel: RDX: 0000000000000000 RSI: ffff9728b668fb58 RDI: ffffccfc93737180
Mar  6 17:42:11 dell-r430-8 kernel: RBP: ffff9728b668fbb8 R08: 0000000000000101 R09: ffffccfc8ddd4b80
Mar  6 17:42:11 dell-r430-8 kernel: R10: 0000000000000001 R11: 0000000000000005 R12: ffffffffffffff10
Mar  6 17:42:11 dell-r430-8 kernel: R13: ffff972913279db8 R14: ffff9728b668fb58 R15: 0000000000000000
Mar  6 17:42:11 dell-r430-8 kernel: FS:  00007fe3e484e7c0(0000) GS:ffff97295d240000(0000) knlGS:0000000000000000
Mar  6 17:42:11 dell-r430-8 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar  6 17:42:11 dell-r430-8 kernel: CR2: 00007fe3ddff95b0 CR3: 000000084837a000 CR4: 00000000001607e0
Mar  6 17:42:11 dell-r430-8 kernel: Call Trace:
Mar  6 17:42:11 dell-r430-8 kernel: [<ffffffffc0848ea5>] nfs_wb_single_page+0x95/0x190 [nfs]
Mar  6 17:42:11 dell-r430-8 kernel: [<ffffffffc0837dd1>] nfs_launder_page+0x41/0x90 [nfs]
Mar  6 17:42:11 dell-r430-8 kernel: [<ffffffff89fa406a>] invalidate_inode_pages2_range+0x3ca/0x470
Mar  6 17:42:11 dell-r430-8 kernel: [<ffffffff89fa4127>] invalidate_inode_pages2+0x17/0x20
Mar  6 17:42:11 dell-r430-8 kernel: [<ffffffffc083b29a>] nfs_invalidate_mapping+0x9a/0x100 [nfs]
Mar  6 17:42:11 dell-r430-8 kernel: [<ffffffffc083ba0a>] __nfs_revalidate_mapping+0x19a/0x280 [nfs]
Mar  6 17:42:11 dell-r430-8 kernel: [<ffffffffc083bf53>] nfs_revalidate_mapping_protected+0x13/0x20 [nfs]
Mar  6 17:42:11 dell-r430-8 kernel: [<ffffffffc0838344>] nfs_file_read+0x44/0xf0 [nfs]
Mar  6 17:42:11 dell-r430-8 kernel: [<ffffffffc0838300>] ? nfs_write_begin+0x290/0x290 [nfs]
Mar  6 17:42:11 dell-r430-8 kernel: [<ffffffff8a070143>] do_io_submit+0x3c3/0x870
Mar  6 17:42:11 dell-r430-8 kernel: [<ffffffff8a070600>] SyS_io_submit+0x10/0x20
Mar  6 17:42:11 dell-r430-8 kernel: [<ffffffff8a51f7d5>] system_call_fastpath+0x1c/0x21

-Scott

> > 
> > > 
> > > Instead of having nfs_commit_inode exit early when the commit lists
> > > are
> > > empty, only do so if nrequests is also 0.
> > > 
> > > Fixes: dc4fd9ab01 ("nfs: don't wait on commit in nfs_commit_inode()
> > > if there were no commit requests")
> > > Signed-off-by: Scott Mayhew <smayhew@redhat.com>
> > > ---
> > >  fs/nfs/write.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> > > index 7428a66..0268bd1 100644
> > > --- a/fs/nfs/write.c
> > > +++ b/fs/nfs/write.c
> > > @@ -1890,7 +1890,7 @@ int nfs_commit_inode(struct inode *inode, int
> > > how)
> > >  	if (res)
> > >  		error = nfs_generic_commit_list(inode, &head, how,
> > > &cinfo);
> > >  	nfs_commit_end(cinfo.mds);
> > > -	if (res == 0)
> > > +	if (res == 0 && !nfs_have_writebacks(inode))
> > >  		return res;
> > >  	if (error < 0)
> > >  		goto out_error;
> > > -- 
> > > 2.9.5
> > > 
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-
> > > nfs" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> > 
> -- 
> Trond Myklebust
> Linux NFS client maintainer, PrimaryData
> trond.myklebust@primarydata.com

Comments

Trond Myklebust March 7, 2018, 8:38 p.m. UTC | #1
On Wed, 2018-03-07 at 14:53 -0500, Scott Mayhew wrote:
> On Mon, 05 Mar 2018, Trond Myklebust wrote:
> 
> > On Mon, 2018-03-05 at 16:16 -0500, J. Bruce Fields wrote:
> > > On Fri, Mar 02, 2018 at 11:00:38AM -0500, Scott Mayhew wrote:
> > > > It seems that nfs_commit_inode can be called where the
> > > > nfs_inode
> > > > has
> > > > outstanding requests and the commit lists are empty.  That can
> > > > lead
> > > > to
> > > > invalidate_complete_page2 failing due to the associated page
> > > > having
> > > > private data which in turn leads to
> > > > invalidate_inode_pages2_range
> > > > returning -EBUSY.
> > > 
> > > For what it's worth, I verified that this fixes the EBUSY I was
> > > seeing:
> > > 
> > > 	http://marc.info/?i=20180223160350.GF15876@fieldses.org
> > > 
> > 
> > Fine, but the patch will also cause the inode to be marked as dirty
> > in
> > cases where there are no unstable writes to commit, but there are
> > pages
> > undergoing writeback.
> > IOW: it regresses the fix that was made in dc4fd9ab01
> > 
> > So please do look into fixing do_launder_page().
> > 
> 
> Yes, sorry... so I've been testing with this change since Friday
> afternoon:
> 
> diff --git a/mm/truncate.c b/mm/truncate.c
> index c34e2fd4f583..909734a5d3a3 100644
> --- a/mm/truncate.c
> +++ b/mm/truncate.c
> @@ -647,7 +647,7 @@ invalidate_complete_page2(struct address_space
> *mapping, struct page *page)
>  
>  static int do_launder_page(struct address_space *mapping, struct
> page *page)
>  {
> -       if (!PageDirty(page))
> +       if (!PageDirty(page) && !PagePrivate(page))
>                 return 0;
>         if (page->mapping != mapping || mapping->a_ops->launder_page
> == NULL)
>                 return 0;
> 
> But I'm frequently seeing soft lockups though, on both 4.16-rc4 and
> on
> the latest RHEL 7 kernel.
> 
> Mar  7 13:52:08 localhost kernel: watchdog: BUG: soft lockup - CPU#5
> stuck for 23s! [xfs_io:17667]
> Mar  7 13:52:08 localhost kernel: Modules linked in: rpcsec_gss_krb5
> auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc
> crct10dif_pclmul crc32_pclmul ghash_clmulni_intel virtio_balloon
> i2c_piix4 joydev xfs libcrc32c qxl drm_kms_helper ttm virtio_console
> virtio_net drm virtio_scsi serio_raw crc32c_intel ata_generic
> virtio_pci pata_acpi qemu_fw_cfg virtio_rng virtio_ring virtio
> Mar  7 13:52:08 localhost kernel: CPU: 5 PID: 17667 Comm: xfs_io
> Tainted: G             L   4.16.0-rc4+ #2
> Mar  7 13:52:08 localhost kernel: Hardware name: Red Hat RHEV
> Hypervisor, BIOS 1.10.2-3.el7_4.1 04/01/2014
> Mar  7 13:52:08 localhost kernel: RIP:
> 0010:nfs_commit_inode+0x87/0x160 [nfs]
> Mar  7 13:52:08 localhost kernel: RSP: 0018:ffffab310e627b00 EFLAGS:
> 00000202 ORIG_RAX: ffffffffffffff12
> Mar  7 13:52:08 localhost kernel: RAX: 0000000000000000 RBX:
> ffff8cd834f0a3e0 RCX: 0000000000000000
> Mar  7 13:52:08 localhost kernel: RDX: ffff8cd834f0a300 RSI:
> 0000000000000001 RDI: ffff8cd834f0a3e0
> Mar  7 13:52:08 localhost kernel: RBP: 0000000000000001 R08:
> ffffab310e627c30 R09: 000000000001d400
> Mar  7 13:52:08 localhost kernel: R10: ffff8cd836c02480 R11:
> ffff8cd83302043c R12: ffffab310e627b70
> Mar  7 13:52:08 localhost kernel: R13: ffffffffffffffff R14:
> 0000000000000000 R15: ffffcd0147055f00
> Mar  7 13:52:08 localhost kernel: FS:  00007feae2d97b80(0000)
> GS:ffff8cd837340000(0000) knlGS:0000000000000000
> Mar  7 13:52:08 localhost kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
> 0000000080050033
> Mar  7 13:52:08 localhost kernel: CR2: 00007feae2103fb8 CR3:
> 0000000120fc2002 CR4: 00000000003606e0
> Mar  7 13:52:08 localhost kernel: DR0: 0000000000000000 DR1:
> 0000000000000000 DR2: 0000000000000000
> Mar  7 13:52:08 localhost kernel: DR3: 0000000000000000 DR6:
> 00000000fffe0ff0 DR7: 0000000000000400
> Mar  7 13:52:08 localhost kernel: Call Trace:
> Mar  7 13:52:08 localhost kernel: nfs_wb_page+0xd7/0x1b0 [nfs]

Ah... So the real problem is that we're not waiting for the outstanding
commit? OK, so how about something like the following then?

8<------------------------------------------
From f2b7634d8a05100631ab019d4fb5092ed5fe3c03 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust@primarydata.com>
Date: Wed, 7 Mar 2018 15:22:31 -0500
Subject: [PATCH] NFS: Don't circumvent wait for commit completion

We do want to respect the FLUSH_SYNC argument to nfs_commit_inode() to
ensure that all outstanding COMMIT requests to the inode in question are
complete. Currently we will exit early if we did not have to schedule
a new COMMIT request.

Fixes: dc4fd9ab01ab3 ("nfs: don't wait on commit in nfs_commit_inode()...")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org # 4.5+
---
 fs/nfs/write.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 93460f1cf5a4..89ca7b725454 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1886,8 +1886,6 @@ int nfs_commit_inode(struct inode *inode, int how)
 	if (res)
 		error = nfs_generic_commit_list(inode, &head, how, &cinfo);
 	nfs_commit_end(cinfo.mds);
-	if (res == 0)
-		return res;
 	if (error < 0)
 		goto out_error;
 	if (!may_wait)
@@ -1904,7 +1902,8 @@ int nfs_commit_inode(struct inode *inode, int how)
 	 * that the data is on the disk.
 	 */
 out_mark_dirty:
-	__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
+	if (atomic_read(&cinfo.mds->rpcs_out))
+		__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
 	return res;
 }
 EXPORT_SYMBOL_GPL(nfs_commit_inode);
-- 
2.14.3

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com

Scott Mayhew March 8, 2018, 1:09 p.m. UTC | #2
On Wed, 07 Mar 2018, Trond Myklebust wrote:

> On Wed, 2018-03-07 at 14:53 -0500, Scott Mayhew wrote:
> > On Mon, 05 Mar 2018, Trond Myklebust wrote:
> > 
> > > On Mon, 2018-03-05 at 16:16 -0500, J. Bruce Fields wrote:
> > > > On Fri, Mar 02, 2018 at 11:00:38AM -0500, Scott Mayhew wrote:
> > > > > It seems that nfs_commit_inode can be called where the
> > > > > nfs_inode
> > > > > has
> > > > > outstanding requests and the commit lists are empty.  That can
> > > > > lead
> > > > > to
> > > > > invalidate_complete_page2 failing due to the associated page
> > > > > having
> > > > > private data which in turn leads to
> > > > > invalidate_inode_pages2_range
> > > > > returning -EBUSY.
> > > > 
> > > > For what it's worth, I verified that this fixes the EBUSY I was
> > > > seeing:
> > > > 
> > > > 	http://marc.info/?i=20180223160350.GF15876@fieldses.org
> > > > 
> > > 
> > > Fine, but the patch will also cause the inode to be marked as dirty
> > > in
> > > cases where there are no unstable writes to commit, but there are
> > > pages
> > > undergoing writeback.
> > > IOW: it regresses the fix that was made in dc4fd9ab01
> > > 
> > > So please do look into fixing do_launder_page().
> > > 
> > 
> > Yes, sorry... so I've been testing with this change since Friday
> > afternoon:
> > 
> > diff --git a/mm/truncate.c b/mm/truncate.c
> > index c34e2fd4f583..909734a5d3a3 100644
> > --- a/mm/truncate.c
> > +++ b/mm/truncate.c
> > @@ -647,7 +647,7 @@ invalidate_complete_page2(struct address_space
> > *mapping, struct page *page)
> >  
> >  static int do_launder_page(struct address_space *mapping, struct
> > page *page)
> >  {
> > -       if (!PageDirty(page))
> > +       if (!PageDirty(page) && !PagePrivate(page))
> >                 return 0;
> >         if (page->mapping != mapping || mapping->a_ops->launder_page
> > == NULL)
> >                 return 0;
> > 
> > But I'm frequently seeing soft lockups though, on both 4.16-rc4 and
> > on
> > the latest RHEL 7 kernel.
> > 
> > Mar  7 13:52:08 localhost kernel: watchdog: BUG: soft lockup - CPU#5
> > stuck for 23s! [xfs_io:17667]
> > Mar  7 13:52:08 localhost kernel: Modules linked in: rpcsec_gss_krb5
> > auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc
> > crct10dif_pclmul crc32_pclmul ghash_clmulni_intel virtio_balloon
> > i2c_piix4 joydev xfs libcrc32c qxl drm_kms_helper ttm virtio_console
> > virtio_net drm virtio_scsi serio_raw crc32c_intel ata_generic
> > virtio_pci pata_acpi qemu_fw_cfg virtio_rng virtio_ring virtio
> > Mar  7 13:52:08 localhost kernel: CPU: 5 PID: 17667 Comm: xfs_io
> > Tainted: G             L   4.16.0-rc4+ #2
> > Mar  7 13:52:08 localhost kernel: Hardware name: Red Hat RHEV
> > Hypervisor, BIOS 1.10.2-3.el7_4.1 04/01/2014
> > Mar  7 13:52:08 localhost kernel: RIP:
> > 0010:nfs_commit_inode+0x87/0x160 [nfs]
> > Mar  7 13:52:08 localhost kernel: RSP: 0018:ffffab310e627b00 EFLAGS:
> > 00000202 ORIG_RAX: ffffffffffffff12
> > Mar  7 13:52:08 localhost kernel: RAX: 0000000000000000 RBX:
> > ffff8cd834f0a3e0 RCX: 0000000000000000
> > Mar  7 13:52:08 localhost kernel: RDX: ffff8cd834f0a300 RSI:
> > 0000000000000001 RDI: ffff8cd834f0a3e0
> > Mar  7 13:52:08 localhost kernel: RBP: 0000000000000001 R08:
> > ffffab310e627c30 R09: 000000000001d400
> > Mar  7 13:52:08 localhost kernel: R10: ffff8cd836c02480 R11:
> > ffff8cd83302043c R12: ffffab310e627b70
> > Mar  7 13:52:08 localhost kernel: R13: ffffffffffffffff R14:
> > 0000000000000000 R15: ffffcd0147055f00
> > Mar  7 13:52:08 localhost kernel: FS:  00007feae2d97b80(0000)
> > GS:ffff8cd837340000(0000) knlGS:0000000000000000
> > Mar  7 13:52:08 localhost kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
> > 0000000080050033
> > Mar  7 13:52:08 localhost kernel: CR2: 00007feae2103fb8 CR3:
> > 0000000120fc2002 CR4: 00000000003606e0
> > Mar  7 13:52:08 localhost kernel: DR0: 0000000000000000 DR1:
> > 0000000000000000 DR2: 0000000000000000
> > Mar  7 13:52:08 localhost kernel: DR3: 0000000000000000 DR6:
> > 00000000fffe0ff0 DR7: 0000000000000400
> > Mar  7 13:52:08 localhost kernel: Call Trace:
> > Mar  7 13:52:08 localhost kernel: nfs_wb_page+0xd7/0x1b0 [nfs]
> 
> Ah... So the real problem is that we're not waiting for the outstanding
> commit? OK, so how about something like the following then?
> 
Yes, this works.  I ran it through a dozen fio runs on v4.1 and 1000 runs
of generic/247 on v3/v4.0/v4.1/v4.2 and didn't see any EBUSY errors.
Also ran the xfstests "quick" group (~80-90 tests) plus generic/074 on
v3/v4.0/v4.1/v4.2.  Finally, I double checked the panic on umount issue
that dc4fd9ab01ab3 fixed and that still works too.

Thanks,
Scott

> 8<------------------------------------------
> From f2b7634d8a05100631ab019d4fb5092ed5fe3c03 Mon Sep 17 00:00:00 2001
> From: Trond Myklebust <trond.myklebust@primarydata.com>
> Date: Wed, 7 Mar 2018 15:22:31 -0500
> Subject: [PATCH] NFS: Don't circumvent wait for commit completion
> 
> We do want to respect the FLUSH_SYNC argument to nfs_commit_inode() to
> ensure that all outstanding COMMIT requests to the inode in question are
> complete. Currently we will exit early if we did not have to schedule
> a new COMMIT request.
> 
> Fixes: dc4fd9ab01ab3 ("nfs: don't wait on commit in nfs_commit_inode()...")
> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
> Cc: stable@vger.kernel.org # 4.5+
> ---
>  fs/nfs/write.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index 93460f1cf5a4..89ca7b725454 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -1886,8 +1886,6 @@ int nfs_commit_inode(struct inode *inode, int how)
>  	if (res)
>  		error = nfs_generic_commit_list(inode, &head, how, &cinfo);
>  	nfs_commit_end(cinfo.mds);
> -	if (res == 0)
> -		return res;
>  	if (error < 0)
>  		goto out_error;
>  	if (!may_wait)
> @@ -1904,7 +1902,8 @@ int nfs_commit_inode(struct inode *inode, int how)
>  	 * that the data is on the disk.
>  	 */
>  out_mark_dirty:
> -	__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
> +	if (atomic_read(&cinfo.mds->rpcs_out))
> +		__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
>  	return res;
>  }
>  EXPORT_SYMBOL_GPL(nfs_commit_inode);
> -- 
> 2.14.3
> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer, PrimaryData
> trond.myklebust@primarydata.com
Trond Myklebust March 8, 2018, 5:13 p.m. UTC | #3
On Thu, 2018-03-08 at 08:09 -0500, Scott Mayhew wrote:
> Yes, this works.  I ran it through a dozen fio runs on v4.1 and 1000
> runs
> of generic/247 on v3/v4.0/v4.1/v4.2 and didn't see any EBUSY errors.
> Also ran the xfstests "quick" group (~80-90 tests) plus generic/074
> on
> v3/v4.0/v4.1/v4.2.  Finally, I double checked the panic on umount
> issue
> that dc4fd9ab01ab3 fixed and that still works too.

I took a long hard look at what we actually need in that area of the
code. There are a few things that are still broken there:

Firstly, we want to keep the inode marked as I_DIRTY_DATASYNC as long
as we have stable writes that are undergoing commit or are waiting to
be scheduled. The reason is that ensures sync_inode() behaves correctly
by calling into nfs_write_inode() so that we can schedule COMMITs and
wait for them all to complete.
Currently we are broken in that nfs_write_inode() will not reset
I_DIRTY_DATASYNC if there are still COMMITs in flight due to having
called it with wbc->sync_mode == WB_SYNC_NONE.

Secondly, we want to ensure that if the number of requests is >
INT_MAX, we loop around and schedule more COMMITs so that
nfs_commit_inode(inode, FLUSH_SYNC) is reliable on systems with lots of
memory.

Finally, it is worth noting that it's only when called from
__writeback_single_inode(), and the attempt to clean the inode failed
that we need to reset the inode state. So we can optimise by pushing
those calls to __mark_inode_dirty() into nfs_write_inode().

So how about the following v2 patch instead?
8<------------------------------------------
From 386978cc3ef4494b9f95390747c2268f8318b94b Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust@primarydata.com>
Date: Wed, 7 Mar 2018 15:22:31 -0500
Subject: [PATCH v2] NFS: Fix unstable write completion

We do want to respect the FLUSH_SYNC argument to nfs_commit_inode() to
ensure that all outstanding COMMIT requests to the inode in question are
complete. Currently we may exit early from both nfs_commit_inode() and
nfs_write_inode() even if there are COMMIT requests in flight, or unstable
writes on the commit list.

In order to get the right semantics w.r.t. sync_inode(), we don't need
to have nfs_commit_inode() reset the inode dirty flags when called from
nfs_wb_page() and/or nfs_wb_all(). We just need to ensure that
nfs_write_inode() leaves them in the right state if there are outstanding
commits, or stable pages.

Reported-by: Scott Mayhew <smayhew@redhat.com>
Fixes: dc4fd9ab01ab ("nfs: don't wait on commit in nfs_commit_inode()...")
Cc: stable@vger.kernel.org # v4.5+: 5cb953d4b1e7: NFS: Use an atomic_long_t
Cc: stable@vger.kernel.org # v4.5+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/nfs/write.c | 83 ++++++++++++++++++++++++++++++++--------------------------
 1 file changed, 43 insertions(+), 40 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 7428a669d7a7..e7d8ceae8f26 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1876,40 +1876,43 @@ int nfs_generic_commit_list(struct inode *inode, struct list_head *head,
 	return status;
 }
 
-int nfs_commit_inode(struct inode *inode, int how)
+static int __nfs_commit_inode(struct inode *inode, int how,
+		struct writeback_control *wbc)
 {
 	LIST_HEAD(head);
 	struct nfs_commit_info cinfo;
 	int may_wait = how & FLUSH_SYNC;
-	int error = 0;
-	int res;
+	int ret, nscan;
 
 	nfs_init_cinfo_from_inode(&cinfo, inode);
 	nfs_commit_begin(cinfo.mds);
-	res = nfs_scan_commit(inode, &head, &cinfo);
-	if (res)
-		error = nfs_generic_commit_list(inode, &head, how, &cinfo);
+	for (;;) {
+		ret = nscan = nfs_scan_commit(inode, &head, &cinfo);
+		if (ret <= 0)
+			break;
+		ret = nfs_generic_commit_list(inode, &head, how, &cinfo);
+		if (ret < 0)
+			break;
+		ret = 0;
+		if (wbc && wbc->sync_mode == WB_SYNC_NONE) {
+			if (nscan < wbc->nr_to_write)
+				wbc->nr_to_write -= nscan;
+			else
+				wbc->nr_to_write = 0;
+		}
+		if (nscan < INT_MAX)
+			break;
+		cond_resched();
+	}
 	nfs_commit_end(cinfo.mds);
-	if (res == 0)
-		return res;
-	if (error < 0)
-		goto out_error;
-	if (!may_wait)
-		goto out_mark_dirty;
-	error = wait_on_commit(cinfo.mds);
-	if (error < 0)
-		return error;
-	return res;
-out_error:
-	res = error;
-	/* Note: If we exit without ensuring that the commit is complete,
-	 * we must mark the inode as dirty. Otherwise, future calls to
-	 * sync_inode() with the WB_SYNC_ALL flag set will fail to ensure
-	 * that the data is on the disk.
-	 */
-out_mark_dirty:
-	__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
-	return res;
+	if (ret || !may_wait)
+		return ret;
+	return wait_on_commit(cinfo.mds);
+}
+
+int nfs_commit_inode(struct inode *inode, int how)
+{
+	return __nfs_commit_inode(inode, how, NULL);
 }
 EXPORT_SYMBOL_GPL(nfs_commit_inode);
 
@@ -1919,11 +1922,11 @@ int nfs_write_inode(struct inode *inode, struct writeback_control *wbc)
 	int flags = FLUSH_SYNC;
 	int ret = 0;
 
-	/* no commits means nothing needs to be done */
-	if (!atomic_long_read(&nfsi->commit_info.ncommit))
-		return ret;
-
 	if (wbc->sync_mode == WB_SYNC_NONE) {
+		/* no commits means nothing needs to be done */
+		if (!atomic_long_read(&nfsi->commit_info.ncommit))
+			goto check_requests_outstanding;
+
 		/* Don't commit yet if this is a non-blocking flush and there
 		 * are a lot of outstanding writes for this mapping.
 		 */
@@ -1934,16 +1937,16 @@ int nfs_write_inode(struct inode *inode, struct writeback_control *wbc)
 		flags = 0;
 	}
 
-	ret = nfs_commit_inode(inode, flags);
-	if (ret >= 0) {
-		if (wbc->sync_mode == WB_SYNC_NONE) {
-			if (ret < wbc->nr_to_write)
-				wbc->nr_to_write -= ret;
-			else
-				wbc->nr_to_write = 0;
-		}
-		return 0;
-	}
+	ret = __nfs_commit_inode(inode, flags, wbc);
+	if (!ret) {
+		if (flags & FLUSH_SYNC)
+			return 0;
+	} else if (atomic_long_read(&nfsi->commit_info.ncommit))
+		goto out_mark_dirty;
+
+check_requests_outstanding:
+	if (!atomic_read(&nfsi->commit_info.rpcs_out))
+		return ret;
 out_mark_dirty:
 	__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
 	return ret;
-- 
2.14.3

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
J. Bruce Fields March 8, 2018, 9:39 p.m. UTC | #4
On Thu, Mar 08, 2018 at 08:09:01AM -0500, Scott Mayhew wrote:
> Yes, this works.  I ran it through a dozen fio runs on v4.1 and 1000 runs
> of generic/247 on v3/v4.0/v4.1/v4.2 and didn't see any EBUSY errors.
> Also ran the xfstests "quick" group (~80-90 tests) plus generic/074 on
> v3/v4.0/v4.1/v4.2.  Finally, I double checked the panic on umount issue
> that dc4fd9ab01ab3 fixed and that still works too.

Works for me too.

(Yeah, I see there's a new patch.  Testing queued up but not run
yet...).

--b.
Trond Myklebust March 8, 2018, 10:01 p.m. UTC | #5
On Thu, 2018-03-08 at 16:39 -0500, bfields@fieldses.org wrote:
> On Thu, Mar 08, 2018 at 08:09:01AM -0500, Scott Mayhew wrote:
> > Yes, this works.  I ran it through a dozen fio runs on v4.1 and
> > 1000 runs
> > of generic/247 on v3/v4.0/v4.1/v4.2 and didn't see any EBUSY
> > errors.
> > Also ran the xfstests "quick" group (~80-90 tests) plus generic/074
> > on
> > v3/v4.0/v4.1/v4.2.  Finally, I double checked the panic on umount
> > issue
> > that dc4fd9ab01ab3 fixed and that still works too.
> 
> Works for me too.
> 
> (Yeah, I see there's a new patch.  Testing queued up but not run
> yet...).

Sorry for pulling the "new patch switch" on you both, but I figured it
would be better to re-examine where the requirement for the dirty flag
is coming from, and to ensure that we meet that requirement once and
for all.

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com

J. Bruce Fields March 9, 2018, 2:46 a.m. UTC | #6
On Thu, Mar 08, 2018 at 10:01:42PM +0000, Trond Myklebust wrote:
> On Thu, 2018-03-08 at 16:39 -0500, bfields@fieldses.org wrote:
> > On Thu, Mar 08, 2018 at 08:09:01AM -0500, Scott Mayhew wrote:
> > > Yes, this works.  I ran it through a dozen fio runs on v4.1 and
> > > 1000 runs
> > > of generic/247 on v3/v4.0/v4.1/v4.2 and didn't see any EBUSY
> > > errors.
> > > Also ran the xfstests "quick" group (~80-90 tests) plus generic/074
> > > on
> > > v3/v4.0/v4.1/v4.2.  Finally, I double checked the panic on umount
> > > issue
> > > that dc4fd9ab01ab3 fixed and that still works too.
> > 
> > Works for me too.
> > 
> > (Yeah, I see there's a new patch.  Testing queued up but not run
> > yet...).
> 
> Sorry for pulling the "new patch switch" on you both, but I figured it
> would be better to re-examine where the requirement for the dirty flag
> is coming from, and to ensure that we meet that requirement once and
> for all.

No problem.  My tests are passing on the new patch as well.

--b.
Scott Mayhew March 12, 2018, 12:07 p.m. UTC | #7
On Thu, 08 Mar 2018, Trond Myklebust wrote:

> On Thu, 2018-03-08 at 08:09 -0500, Scott Mayhew wrote:
> > Yes, this works.  I ran it through a dozen fio runs on v4.1 and 1000
> > runs
> > of generic/247 on v3/v4.0/v4.1/v4.2 and didn't see any EBUSY errors.
> > Also ran the xfstests "quick" group (~80-90 tests) plus generic/074
> > on
> > v3/v4.0/v4.1/v4.2.  Finally, I double checked the panic on umount
> > issue
> > that dc4fd9ab01ab3 fixed and that still works too.
> 
> I took a long hard look at what we actually need in that area of the
> code. There are a few things that are still broken there:
> 
> Firstly, we want to keep the inode marked as I_DIRTY_DATASYNC as long
> as we have stable writes that are undergoing commit or are waiting to
> be scheduled. The reason is that this ensures sync_inode() behaves correctly
> by calling into nfs_write_inode() so that we can schedule COMMITs and
> wait for them all to complete.
> Currently we are broken in that nfs_write_inode() will not reset
> I_DIRTY_DATASYNC if there are still COMMITs in flight due to having
> called it with wbc->sync_mode == WB_SYNC_NONE.
> 
> Secondly, we want to ensure that if the number of requests is >
> INT_MAX, we loop around and schedule more COMMITs so that
> nfs_commit_inode(inode, FLUSH_SYNC) is reliable on systems with lots of
> memory.
> 
> Finally, it is worth noting that it's only when called from
> __writeback_single_inode(), and the attempt to clean the inode failed
> that we need to reset the inode state. So we can optimise by pushing
> those calls to __mark_inode_dirty() into nfs_write_inode().
> 
> So how about the following v2 patch instead?
> 8<--------------------------------------------
> From 386978cc3ef4494b9f95390747c2268f8318b94b Mon Sep 17 00:00:00 2001
> From: Trond Myklebust <trond.myklebust@primarydata.com>
> Date: Wed, 7 Mar 2018 15:22:31 -0500
> Subject: [PATCH v2] NFS: Fix unstable write completion
> 
> We do want to respect the FLUSH_SYNC argument to nfs_commit_inode() to
> ensure that all outstanding COMMIT requests to the inode in question are
> complete. Currently we may exit early from both nfs_commit_inode() and
> nfs_write_inode() even if there are COMMIT requests in flight, or unstable
> writes on the commit list.
> 
> In order to get the right semantics w.r.t. sync_inode(), we don't need
> to have nfs_commit_inode() reset the inode dirty flags when called from
> nfs_wb_page() and/or nfs_wb_all(). We just need to ensure that
> nfs_write_inode() leaves them in the right state if there are outstanding
> commits, or stable pages.
> 
> Reported-by: Scott Mayhew <smayhew@redhat.com>
> Fixes: dc4fd9ab01ab ("nfs: don't wait on commit in nfs_commit_inode()...")
> Cc: stable@vger.kernel.org # v4.5+: 5cb953d4b1e7: NFS: Use an atomic_long_t
> Cc: stable@vger.kernel.org # v4.5+
> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

I ran all the same tests as before and this is working fine.

-Scott

> ---
>  fs/nfs/write.c | 83 ++++++++++++++++++++++++++++++----------------------------
>  1 file changed, 43 insertions(+), 40 deletions(-)
> 
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index 7428a669d7a7..e7d8ceae8f26 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -1876,40 +1876,43 @@ int nfs_generic_commit_list(struct inode *inode, struct list_head *head,
>  	return status;
>  }
>  
> -int nfs_commit_inode(struct inode *inode, int how)
> +static int __nfs_commit_inode(struct inode *inode, int how,
> +		struct writeback_control *wbc)
>  {
>  	LIST_HEAD(head);
>  	struct nfs_commit_info cinfo;
>  	int may_wait = how & FLUSH_SYNC;
> -	int error = 0;
> -	int res;
> +	int ret, nscan;
>  
>  	nfs_init_cinfo_from_inode(&cinfo, inode);
>  	nfs_commit_begin(cinfo.mds);
> -	res = nfs_scan_commit(inode, &head, &cinfo);
> -	if (res)
> -		error = nfs_generic_commit_list(inode, &head, how, &cinfo);
> +	for (;;) {
> +		ret = nscan = nfs_scan_commit(inode, &head, &cinfo);
> +		if (ret <= 0)
> +			break;
> +		ret = nfs_generic_commit_list(inode, &head, how, &cinfo);
> +		if (ret < 0)
> +			break;
> +		ret = 0;
> +		if (wbc && wbc->sync_mode == WB_SYNC_NONE) {
> +			if (nscan < wbc->nr_to_write)
> +				wbc->nr_to_write -= nscan;
> +			else
> +				wbc->nr_to_write = 0;
> +		}
> +		if (nscan < INT_MAX)
> +			break;
> +		cond_resched();
> +	}
>  	nfs_commit_end(cinfo.mds);
> -	if (res == 0)
> -		return res;
> -	if (error < 0)
> -		goto out_error;
> -	if (!may_wait)
> -		goto out_mark_dirty;
> -	error = wait_on_commit(cinfo.mds);
> -	if (error < 0)
> -		return error;
> -	return res;
> -out_error:
> -	res = error;
> -	/* Note: If we exit without ensuring that the commit is complete,
> -	 * we must mark the inode as dirty. Otherwise, future calls to
> -	 * sync_inode() with the WB_SYNC_ALL flag set will fail to ensure
> -	 * that the data is on the disk.
> -	 */
> -out_mark_dirty:
> -	__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
> -	return res;
> +	if (ret || !may_wait)
> +		return ret;
> +	return wait_on_commit(cinfo.mds);
> +}
> +
> +int nfs_commit_inode(struct inode *inode, int how)
> +{
> +	return __nfs_commit_inode(inode, how, NULL);
>  }
>  EXPORT_SYMBOL_GPL(nfs_commit_inode);
>  
> @@ -1919,11 +1922,11 @@ int nfs_write_inode(struct inode *inode, struct writeback_control *wbc)
>  	int flags = FLUSH_SYNC;
>  	int ret = 0;
>  
> -	/* no commits means nothing needs to be done */
> -	if (!atomic_long_read(&nfsi->commit_info.ncommit))
> -		return ret;
> -
>  	if (wbc->sync_mode == WB_SYNC_NONE) {
> +		/* no commits means nothing needs to be done */
> +		if (!atomic_long_read(&nfsi->commit_info.ncommit))
> +			goto check_requests_outstanding;
> +
>  		/* Don't commit yet if this is a non-blocking flush and there
>  		 * are a lot of outstanding writes for this mapping.
>  		 */
> @@ -1934,16 +1937,16 @@ int nfs_write_inode(struct inode *inode, struct writeback_control *wbc)
>  		flags = 0;
>  	}
>  
> -	ret = nfs_commit_inode(inode, flags);
> -	if (ret >= 0) {
> -		if (wbc->sync_mode == WB_SYNC_NONE) {
> -			if (ret < wbc->nr_to_write)
> -				wbc->nr_to_write -= ret;
> -			else
> -				wbc->nr_to_write = 0;
> -		}
> -		return 0;
> -	}
> +	ret = __nfs_commit_inode(inode, flags, wbc);
> +	if (!ret) {
> +		if (flags & FLUSH_SYNC)
> +			return 0;
> +	} else if (atomic_long_read(&nfsi->commit_info.ncommit))
> +		goto out_mark_dirty;
> +
> +check_requests_outstanding:
> +	if (!atomic_read(&nfsi->commit_info.rpcs_out))
> +		return ret;
>  out_mark_dirty:
>  	__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
>  	return ret;
> -- 
> 2.14.3
> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer, PrimaryData
> trond.myklebust@primarydata.com
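The commit-scheduling loop in the v2 __nfs_commit_inode() above is worth illustrating in isolation. The sketch below is a hypothetical userspace model, not kernel code: fake_scan_commit() stands in for nfs_scan_commit(), which returns an int and so can report at most INT_MAX requests per call, and commit_all() mirrors the shape of the for (;;) loop that keeps scheduling COMMITs until a scan comes back short.

```c
#include <limits.h>

/* Hypothetical stand-in for the per-inode commit list: a count of
 * outstanding unstable writes, drained in batches of at most INT_MAX
 * (nfs_scan_commit() returns an int, so one scan cannot report more). */
static long long remaining;

static int fake_scan_commit(void)
{
	long long batch = remaining > INT_MAX ? INT_MAX : remaining;

	remaining -= batch;
	return (int)batch;
}

/* Mirrors the v2 loop: keep scanning and scheduling COMMITs until a
 * scan returns fewer than INT_MAX requests, i.e. the list is drained.
 * Returns the total number of requests scheduled. */
static long long commit_all(long long nrequests)
{
	long long scheduled = 0;
	int nscan;

	remaining = nrequests;
	for (;;) {
		nscan = fake_scan_commit();
		if (nscan <= 0)
			break;
		scheduled += nscan;	/* nfs_generic_commit_list() would run here */
		if (nscan < INT_MAX)
			break;
	}
	return scheduled;
}
```

Without the nscan < INT_MAX re-scan, a FLUSH_SYNC commit against an inode with more than INT_MAX outstanding requests would leave some requests unscheduled, which is Trond's second point above.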
Trond Myklebust March 12, 2018, 12:32 p.m. UTC | #8
On Mon, 2018-03-12 at 08:07 -0400, Scott Mayhew wrote:
> On Thu, 08 Mar 2018, Trond Myklebust wrote:
> 
> > On Thu, 2018-03-08 at 08:09 -0500, Scott Mayhew wrote:
> > > Yes, this works.  I ran it through a dozen fio runs on v4.1 and 1000
> > > runs of generic/247 on v3/v4.0/v4.1/v4.2 and didn't see any EBUSY
> > > errors.  Also ran the xfstests "quick" group (~80-90 tests) plus
> > > generic/074 on v3/v4.0/v4.1/v4.2.  Finally, I double checked the
> > > panic on umount issue that dc4fd9ab01ab3 fixed and that still works
> > > too.
> > 
> > I took a long hard look at what we actually need in that area of the
> > code. There are a few things that are still broken there:
> > 
> > Firstly, we want to keep the inode marked as I_DIRTY_DATASYNC as long
> > as we have stable writes that are undergoing commit or are waiting to
> > be scheduled. The reason is that ensures sync_inode() behaves correctly
> > by calling into nfs_write_inode() so that we can schedule COMMITs and
> > wait for them all to complete.
> > Currently we are broken in that nfs_write_inode() will not reset
> > I_DIRTY_DATASYNC if there are still COMMITs in flight due to having
> > called it with wbc->sync_mode == WB_SYNC_NONE.
> > 
> > Secondly, we want to ensure that if the number of requests is >
> > INT_MAX, we loop around and schedule more COMMITs so that
> > nfs_commit_inode(inode, FLUSH_SYNC) is reliable on systems with lots
> > of memory.
> > 
> > Finally, it is worth noting that it's only when called from
> > __writeback_single_inode(), and the attempt to clean the inode failed
> > that we need to reset the inode state. So we can optimise by pushing
> > those calls to __mark_inode_dirty() into nfs_write_inode().
> > 
> > So how about the following v2 patch instead?
> > 8<--------------------------------------------
> > From 386978cc3ef4494b9f95390747c2268f8318b94b Mon Sep 17 00:00:00 2001
> > From: Trond Myklebust <trond.myklebust@primarydata.com>
> > Date: Wed, 7 Mar 2018 15:22:31 -0500
> > Subject: [PATCH v2] NFS: Fix unstable write completion
> > 
> > We do want to respect the FLUSH_SYNC argument to nfs_commit_inode() to
> > ensure that all outstanding COMMIT requests to the inode in question
> > are complete. Currently we may exit early from both nfs_commit_inode()
> > and nfs_write_inode() even if there are COMMIT requests in flight, or
> > unstable writes on the commit list.
> > 
> > In order to get the right semantics w.r.t. sync_inode(), we don't need
> > to have nfs_commit_inode() reset the inode dirty flags when called from
> > nfs_wb_page() and/or nfs_wb_all(). We just need to ensure that
> > nfs_write_inode() leaves them in the right state if there are
> > outstanding commits, or stable pages.
> > 
> > Reported-by: Scott Mayhew <smayhew@redhat.com>
> > Fixes: dc4fd9ab01ab ("nfs: don't wait on commit in nfs_commit_inode()...")
> > Cc: stable@vger.kernel.org # v4.5+: 5cb953d4b1e7: NFS: Use an atomic_long_t
> > Cc: stable@vger.kernel.org # v4.5+
> > Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
> 
> I ran all the same tests as before and this is working fine.
> 
Thanks Scott!

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com
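The dirty-flag requirement Trond re-examined above boils down to a contract between sync_inode() and ->write_inode(): as long as COMMITs are pending or in flight, nfs_write_inode() must re-mark the inode I_DIRTY_DATASYNC so writeback comes back and waits again. Here is a hedged userspace sketch of that contract; fake_inode, fake_write_inode() and fake_sync_inode() are invented stand-ins, not kernel interfaces.

```c
#define I_DIRTY_DATASYNC 0x2	/* stand-in for the kernel flag */

struct fake_inode {
	unsigned int state;	/* dirty flags */
	int ncommit;		/* unstable writes waiting for COMMIT */
	int rpcs_out;		/* COMMIT RPCs currently in flight */
};

/* Models one nfs_write_inode() pass: schedule any pending COMMITs,
 * pretend one in-flight RPC completes, and, crucially, re-mark the
 * inode dirty while anything is still outstanding. */
static void fake_write_inode(struct fake_inode *inode)
{
	inode->state &= ~I_DIRTY_DATASYNC;
	if (inode->ncommit) {		/* schedule COMMITs */
		inode->rpcs_out += inode->ncommit;
		inode->ncommit = 0;
	}
	if (inode->rpcs_out)
		inode->rpcs_out--;	/* one RPC "completes" */
	if (inode->ncommit || inode->rpcs_out)
		inode->state |= I_DIRTY_DATASYNC;	/* keep writeback coming back */
}

/* Models the sync_inode() side: keep writing the inode back while it
 * is still marked dirty.  Returns the number of passes taken. */
static int fake_sync_inode(struct fake_inode *inode)
{
	int passes = 0;

	while (inode->state & I_DIRTY_DATASYNC) {
		fake_write_inode(inode);
		passes++;
	}
	return passes;
}
```

If fake_write_inode() skipped the final re-mark, fake_sync_inode() would return after one pass with rpcs_out still nonzero, the analogue of sync_inode() reporting completion while data is not yet on disk.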

Patch

diff --git a/mm/truncate.c b/mm/truncate.c
index c34e2fd4f583..909734a5d3a3 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -647,7 +647,7 @@  invalidate_complete_page2(struct address_space *mapping, struct page *page)
 
 static int do_launder_page(struct address_space *mapping, struct page *page)
 {
-       if (!PageDirty(page))
+       if (!PageDirty(page) && !PagePrivate(page))
                return 0;
        if (page->mapping != mapping || mapping->a_ops->launder_page == NULL)
                return 0;
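The one-line mm/truncate.c change above (the approach Scott had been testing before the NFS-side patch superseded it) alters a single predicate: do_launder_page() previously bailed out for any non-dirty page, even one still holding private data such as an unstable NFS write. A hedged illustration of just that predicate, with fake_page as an invented stand-in for struct page flags:

```c
#include <stdbool.h>

/* Invented stand-in for the two page flags the predicate consults. */
struct fake_page {
	bool dirty;		/* PageDirty() */
	bool has_private;	/* PagePrivate() */
};

/* Old behaviour: only dirty pages reach ->launder_page(); a clean page
 * with private data is skipped, so invalidate_complete_page2() later
 * fails and invalidate_inode_pages2_range() returns -EBUSY. */
static bool should_launder_old(const struct fake_page *page)
{
	return page->dirty;
}

/* Behaviour under the hunk above: pages still carrying private data
 * are laundered too, giving the filesystem a chance to flush and
 * release them before invalidation. */
static bool should_launder_patched(const struct fake_page *page)
{
	return page->dirty || page->has_private;
}
```

The thread settled on fixing this on the NFS side instead, but the predicate shows where the EBUSY in the original report was coming from.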