diff mbox

[v2] NFSv4: replace seqcount_t with a seqlock_t

Message ID 20161028210511.k7hrsfxovti7gqtu@linutronix.de
State New, archived
Headers show

Commit Message

Sebastian Andrzej Siewior Oct. 28, 2016, 9:05 p.m. UTC
The raw_write_seqcount_begin() in nfs4_reclaim_open_state() bugs me
because it maps to preempt_disable() in -RT which I can't have at this
point. So I took a look at the code.
It the lockdep part was removed in commit abbec2da13f0 ("NFS: Use
raw_write_seqcount_begin/end int nfs4_reclaim_open_state") because
lockdep complained. The whole seqcount thing was introduced in commit
c137afabe330 ("NFSv4: Allow the state manager to mark an open_owner as
being recovered").
I don't understand how it is possible that we don't end up with two
writers for the same ressource because the `sp->so_lock' lock is dropped
is soon in the list_for_each_entry() loop. It might be the
test_and_clear_bit() check in nfs4_do_reclaim() but it might clear one
bit on each iteration so I *think* we could have two invocations of the
same struct nfs4_state_owner in nfs4_reclaim_open_state().
So there is that.

But back to the list_for_each_entry() macro.
It seems that this lock protects the ->so_states list among other
atomic_t & flags members. So at the begin of the loop we inc ->count
ensuring that this field is not removed while we use it. So we drop the
->so_lock loc during the loop it seems. And after nfs4_reclaim_locks()
invocation we nfs4_put_open_state() and grab the ->so_lock again. So if
we were the last user of this struct and we remove it, then the
following list_next_entry() invocation is a use-after-free. Even if we
use list_for_each_entry_safe() there is no guarantee that the following
member is still valid because it might have been removed by another
writer, right?
So there is this.

However to address my initial problem I have here a patch :) So it uses
a seqlock_t which ensures that there is only one writer at a time. So it
should be basically what is happening now plus a tiny tiny tiny lock
plus lockdep coverage. I tried to this myself but I don't manage to get
into this code path at all so I might be doing something wrong.

Could you please check if this patch is working for you and whether my
list_for_each_entry() observation is correct or not?

v1…v2: write_seqlock() disables preemption and some function need it
(thread_run(), non-GFP_ATOMIC memory alloction()). We don't want
preemption enabled because a preempted writer would stall the reader
spinning. This is a duct tape mutex. Maybe the seqlock should go.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 fs/nfs/delegation.c |  4 ++--
 fs/nfs/nfs4_fs.h    |  3 ++-
 fs/nfs/nfs4proc.c   |  4 ++--
 fs/nfs/nfs4state.c  | 23 +++++++++++++++++------
 4 files changed, 23 insertions(+), 11 deletions(-)

Comments

Trond Myklebust Oct. 28, 2016, 10:24 p.m. UTC | #1
DQo+IE9uIE9jdCAyOCwgMjAxNiwgYXQgMTc6MDUsIFNlYmFzdGlhbiBBbmRyemVqIFNpZXdpb3Ig
PGJpZ2Vhc3lAbGludXRyb25peC5kZT4gd3JvdGU6DQo+IA0KPiBUaGUgcmF3X3dyaXRlX3NlcWNv
dW50X2JlZ2luKCkgaW4gbmZzNF9yZWNsYWltX29wZW5fc3RhdGUoKSBidWdzIG1lDQo+IGJlY2F1
c2UgaXQgbWFwcyB0byBwcmVlbXB0X2Rpc2FibGUoKSBpbiAtUlQgd2hpY2ggSSBjYW4ndCBoYXZl
IGF0IHRoaXMNCj4gcG9pbnQuIFNvIEkgdG9vayBhIGxvb2sgYXQgdGhlIGNvZGUuDQo+IEl0IHRo
ZSBsb2NrZGVwIHBhcnQgd2FzIHJlbW92ZWQgaW4gY29tbWl0IGFiYmVjMmRhMTNmMCAoIk5GUzog
VXNlDQo+IHJhd193cml0ZV9zZXFjb3VudF9iZWdpbi9lbmQgaW50IG5mczRfcmVjbGFpbV9vcGVu
X3N0YXRlIikgYmVjYXVzZQ0KPiBsb2NrZGVwIGNvbXBsYWluZWQuIFRoZSB3aG9sZSBzZXFjb3Vu
dCB0aGluZyB3YXMgaW50cm9kdWNlZCBpbiBjb21taXQNCj4gYzEzN2FmYWJlMzMwICgiTkZTdjQ6
IEFsbG93IHRoZSBzdGF0ZSBtYW5hZ2VyIHRvIG1hcmsgYW4gb3Blbl9vd25lciBhcw0KPiBiZWlu
ZyByZWNvdmVyZWQiKS4NCj4gSSBkb24ndCB1bmRlcnN0YW5kIGhvdyBpdCBpcyBwb3NzaWJsZSB0
aGF0IHdlIGRvbid0IGVuZCB1cCB3aXRoIHR3bw0KPiB3cml0ZXJzIGZvciB0aGUgc2FtZSByZXNz
b3VyY2UgYmVjYXVzZSB0aGUgYHNwLT5zb19sb2NrJyBsb2NrIGlzIGRyb3BwZWQNCj4gaXMgc29v
biBpbiB0aGUgbGlzdF9mb3JfZWFjaF9lbnRyeSgpIGxvb3AuIEl0IG1pZ2h0IGJlIHRoZQ0KPiB0
ZXN0X2FuZF9jbGVhcl9iaXQoKSBjaGVjayBpbiBuZnM0X2RvX3JlY2xhaW0oKSBidXQgaXQgbWln
aHQgY2xlYXIgb25lDQo+IGJpdCBvbiBlYWNoIGl0ZXJhdGlvbiBzbyBJICp0aGluayogd2UgY291
bGQgaGF2ZSB0d28gaW52b2NhdGlvbnMgb2YgdGhlDQo+IHNhbWUgc3RydWN0IG5mczRfc3RhdGVf
b3duZXIgaW4gbmZzNF9yZWNsYWltX29wZW5fc3RhdGUoKS4NCj4gU28gdGhlcmUgaXMgdGhhdC4N
Cj4gDQo+IEJ1dCBiYWNrIHRvIHRoZSBsaXN0X2Zvcl9lYWNoX2VudHJ5KCkgbWFjcm8uDQo+IEl0
IHNlZW1zIHRoYXQgdGhpcyBsb2NrIHByb3RlY3RzIHRoZSAtPnNvX3N0YXRlcyBsaXN0IGFtb25n
IG90aGVyDQo+IGF0b21pY190ICYgZmxhZ3MgbWVtYmVycy4gU28gYXQgdGhlIGJlZ2luIG9mIHRo
ZSBsb29wIHdlIGluYyAtPmNvdW50DQo+IGVuc3VyaW5nIHRoYXQgdGhpcyBmaWVsZCBpcyBub3Qg
cmVtb3ZlZCB3aGlsZSB3ZSB1c2UgaXQuIFNvIHdlIGRyb3AgdGhlDQo+IC0+c29fbG9jayBsb2Mg
ZHVyaW5nIHRoZSBsb29wIGl0IHNlZW1zLiBBbmQgYWZ0ZXIgbmZzNF9yZWNsYWltX2xvY2tzKCkN
Cj4gaW52b2NhdGlvbiB3ZSBuZnM0X3B1dF9vcGVuX3N0YXRlKCkgYW5kIGdyYWIgdGhlIC0+c29f
bG9jayBhZ2Fpbi4gU28gaWYNCj4gd2Ugd2VyZSB0aGUgbGFzdCB1c2VyIG9mIHRoaXMgc3RydWN0
IGFuZCB3ZSByZW1vdmUgaXQsIHRoZW4gdGhlDQo+IGZvbGxvd2luZyBsaXN0X25leHRfZW50cnko
KSBpbnZvY2F0aW9uIGlzIGEgdXNlLWFmdGVyLWZyZWUuIEV2ZW4gaWYgd2UNCj4gdXNlIGxpc3Rf
Zm9yX2VhY2hfZW50cnlfc2FmZSgpIHRoZXJlIGlzIG5vIGd1YXJhbnRlZSB0aGF0IHRoZSBmb2xs
b3dpbmcNCj4gbWVtYmVyIGlzIHN0aWxsIHZhbGlkIGJlY2F1c2UgaXQgbWlnaHQgaGF2ZSBiZWVu
IHJlbW92ZWQgYnkgYW5vdGhlcg0KPiB3cml0ZXIsIHJpZ2h0Pw0KPiBTbyB0aGVyZSBpcyB0aGlz
Lg0KPiANCj4gSG93ZXZlciB0byBhZGRyZXNzIG15IGluaXRpYWwgcHJvYmxlbSBJIGhhdmUgaGVy
ZSBhIHBhdGNoIDopIFNvIGl0IHVzZXMNCj4gYSBzZXFsb2NrX3Qgd2hpY2ggZW5zdXJlcyB0aGF0
IHRoZXJlIGlzIG9ubHkgb25lIHdyaXRlciBhdCBhIHRpbWUuIFNvIGl0DQo+IHNob3VsZCBiZSBi
YXNpY2FsbHkgd2hhdCBpcyBoYXBwZW5pbmcgbm93IHBsdXMgYSB0aW55IHRpbnkgdGlueSBsb2Nr
DQo+IHBsdXMgbG9ja2RlcCBjb3ZlcmFnZS4gSSB0cmllZCB0byB0aGlzIG15c2VsZiBidXQgSSBk
b24ndCBtYW5hZ2UgdG8gZ2V0DQo+IGludG8gdGhpcyBjb2RlIHBhdGggYXQgYWxsIHNvIEkgbWln
aHQgYmUgZG9pbmcgc29tZXRoaW5nIHdyb25nLg0KPiANCj4gQ291bGQgeW91IHBsZWFzZSBjaGVj
ayBpZiB0aGlzIHBhdGNoIGlzIHdvcmtpbmcgZm9yIHlvdSBhbmQgd2hldGhlciBteQ0KPiBsaXN0
X2Zvcl9lYWNoX2VudHJ5KCkgb2JzZXJ2YXRpb24gaXMgY29ycmVjdCBvciBub3Q/DQo+IA0KPiB2
MeKApnYyOiB3cml0ZV9zZXFsb2NrKCkgZGlzYWJsZXMgcHJlZW1wdGlvbiBhbmQgc29tZSBmdW5j
dGlvbiBuZWVkIGl0DQo+ICh0aHJlYWRfcnVuKCksIG5vbi1HRlBfQVRPTUlDIG1lbW9yeSBhbGxv
Y3Rpb24oKSkuIFdlIGRvbid0IHdhbnQNCj4gcHJlZW1wdGlvbiBlbmFibGVkIGJlY2F1c2UgYSBw
cmVlbXB0ZWQgd3JpdGVyIHdvdWxkIHN0YWxsIHRoZSByZWFkZXINCj4gc3Bpbm5pbmcuIFRoaXMg
aXMgYSBkdWN0IHRhcGUgbXV0ZXguIE1heWJlIHRoZSBzZXFsb2NrIHNob3VsZCBnby4NCj4gDQo+
IFNpZ25lZC1vZmYtYnk6IFNlYmFzdGlhbiBBbmRyemVqIFNpZXdpb3IgPGJpZ2Vhc3lAbGludXRy
b25peC5kZT4NCj4gLS0tDQo+IGZzL25mcy9kZWxlZ2F0aW9uLmMgfCAgNCArKy0tDQo+IGZzL25m
cy9uZnM0X2ZzLmggICAgfCAgMyArKy0NCj4gZnMvbmZzL25mczRwcm9jLmMgICB8ICA0ICsrLS0N
Cj4gZnMvbmZzL25mczRzdGF0ZS5jICB8IDIzICsrKysrKysrKysrKysrKysrLS0tLS0tDQo+IDQg
ZmlsZXMgY2hhbmdlZCwgMjMgaW5zZXJ0aW9ucygrKSwgMTEgZGVsZXRpb25zKC0pDQo+IA0KPiBk
aWZmIC0tZ2l0IGEvZnMvbmZzL2RlbGVnYXRpb24uYyBiL2ZzL25mcy9kZWxlZ2F0aW9uLmMNCj4g
aW5kZXggZGZmNjAwYWUwZDc0Li5kNzI2ZDJlMDkzNTMgMTAwNjQ0DQo+IC0tLSBhL2ZzL25mcy9k
ZWxlZ2F0aW9uLmMNCj4gKysrIGIvZnMvbmZzL2RlbGVnYXRpb24uYw0KPiBAQCAtMTUwLDExICsx
NTAsMTEgQEAgc3RhdGljIGludCBuZnNfZGVsZWdhdGlvbl9jbGFpbV9vcGVucyhzdHJ1Y3QgaW5v
ZGUgKmlub2RlLA0KPiAJCXNwID0gc3RhdGUtPm93bmVyOw0KPiAJCS8qIEJsb2NrIG5mczRfcHJv
Y191bmxjayAqLw0KPiAJCW11dGV4X2xvY2soJnNwLT5zb19kZWxlZ3JldHVybl9tdXRleCk7DQo+
IC0JCXNlcSA9IHJhd19zZXFjb3VudF9iZWdpbigmc3AtPnNvX3JlY2xhaW1fc2VxY291bnQpOw0K
PiArCQlzZXEgPSByZWFkX3NlcWJlZ2luKCZzcC0+c29fcmVjbGFpbV9zZXFsb2NrKTsNCj4gCQll
cnIgPSBuZnM0X29wZW5fZGVsZWdhdGlvbl9yZWNhbGwoY3R4LCBzdGF0ZSwgc3RhdGVpZCwgdHlw
ZSk7DQo+IAkJaWYgKCFlcnIpDQo+IAkJCWVyciA9IG5mc19kZWxlZ2F0aW9uX2NsYWltX2xvY2tz
KGN0eCwgc3RhdGUsIHN0YXRlaWQpOw0KPiAtCQlpZiAoIWVyciAmJiByZWFkX3NlcWNvdW50X3Jl
dHJ5KCZzcC0+c29fcmVjbGFpbV9zZXFjb3VudCwgc2VxKSkNCj4gKwkJaWYgKCFlcnIgJiYgcmVh
ZF9zZXFyZXRyeSgmc3AtPnNvX3JlY2xhaW1fc2VxbG9jaywgc2VxKSkNCj4gCQkJZXJyID0gLUVB
R0FJTjsNCj4gCQltdXRleF91bmxvY2soJnNwLT5zb19kZWxlZ3JldHVybl9tdXRleCk7DQo+IAkJ
cHV0X25mc19vcGVuX2NvbnRleHQoY3R4KTsNCj4gZGlmZiAtLWdpdCBhL2ZzL25mcy9uZnM0X2Zz
LmggYi9mcy9uZnMvbmZzNF9mcy5oDQo+IGluZGV4IDliM2E4MmFiYWIwNy4uMmZlZTFhMmU4YjU3
IDEwMDY0NA0KPiAtLS0gYS9mcy9uZnMvbmZzNF9mcy5oDQo+ICsrKyBiL2ZzL25mcy9uZnM0X2Zz
LmgNCj4gQEAgLTExMSw3ICsxMTEsOCBAQCBzdHJ1Y3QgbmZzNF9zdGF0ZV9vd25lciB7DQo+IAl1
bnNpZ25lZCBsb25nCSAgICAgc29fZmxhZ3M7DQo+IAlzdHJ1Y3QgbGlzdF9oZWFkICAgICBzb19z
dGF0ZXM7DQo+IAlzdHJ1Y3QgbmZzX3NlcWlkX2NvdW50ZXIgc29fc2VxaWQ7DQo+IC0Jc2VxY291
bnRfdAkgICAgIHNvX3JlY2xhaW1fc2VxY291bnQ7DQo+ICsJc2VxbG9ja190CSAgICAgc29fcmVj
bGFpbV9zZXFsb2NrOw0KPiArCXN0cnVjdCBtdXRleAkgICAgIHNvX3JlY2xhaW1fc2VxbG9ja19t
dXRleDsNCj4gCXN0cnVjdCBtdXRleAkgICAgIHNvX2RlbGVncmV0dXJuX211dGV4Ow0KPiB9Ow0K
PiANCj4gZGlmZiAtLWdpdCBhL2ZzL25mcy9uZnM0cHJvYy5jIGIvZnMvbmZzL25mczRwcm9jLmMN
Cj4gaW5kZXggNzg5NzgyNmQ3YzUxLi45YjlkNTNjZDg1ZjkgMTAwNjQ0DQo+IC0tLSBhL2ZzL25m
cy9uZnM0cHJvYy5jDQo+ICsrKyBiL2ZzL25mcy9uZnM0cHJvYy5jDQo+IEBAIC0yNjg1LDcgKzI2
ODUsNyBAQCBzdGF0aWMgaW50IF9uZnM0X29wZW5fYW5kX2dldF9zdGF0ZShzdHJ1Y3QgbmZzNF9v
cGVuZGF0YSAqb3BlbmRhdGEsDQo+IAl1bnNpZ25lZCBpbnQgc2VxOw0KPiAJaW50IHJldDsNCj4g
DQo+IC0Jc2VxID0gcmF3X3NlcWNvdW50X2JlZ2luKCZzcC0+c29fcmVjbGFpbV9zZXFjb3VudCk7
DQo+ICsJc2VxID0gcmF3X3NlcWNvdW50X2JlZ2luKCZzcC0+c29fcmVjbGFpbV9zZXFsb2NrLnNl
cWNvdW50KTsNCj4gDQo+IAlyZXQgPSBfbmZzNF9wcm9jX29wZW4ob3BlbmRhdGEpOw0KPiAJaWYg
KHJldCAhPSAwKQ0KPiBAQCAtMjcyMyw3ICsyNzIzLDcgQEAgc3RhdGljIGludCBfbmZzNF9vcGVu
X2FuZF9nZXRfc3RhdGUoc3RydWN0IG5mczRfb3BlbmRhdGEgKm9wZW5kYXRhLA0KPiAJY3R4LT5z
dGF0ZSA9IHN0YXRlOw0KPiAJaWYgKGRfaW5vZGUoZGVudHJ5KSA9PSBzdGF0ZS0+aW5vZGUpIHsN
Cj4gCQluZnNfaW5vZGVfYXR0YWNoX29wZW5fY29udGV4dChjdHgpOw0KPiAtCQlpZiAocmVhZF9z
ZXFjb3VudF9yZXRyeSgmc3AtPnNvX3JlY2xhaW1fc2VxY291bnQsIHNlcSkpDQo+ICsJCWlmIChy
ZWFkX3NlcXJldHJ5KCZzcC0+c29fcmVjbGFpbV9zZXFsb2NrLCBzZXEpKQ0KPiAJCQluZnM0X3Nj
aGVkdWxlX3N0YXRlaWRfcmVjb3Zlcnkoc2VydmVyLCBzdGF0ZSk7DQo+IAl9DQo+IG91dDoNCj4g
ZGlmZiAtLWdpdCBhL2ZzL25mcy9uZnM0c3RhdGUuYyBiL2ZzL25mcy9uZnM0c3RhdGUuYw0KPiBp
bmRleCA1ZjQyODFlYzVmNzIuLmE0NDJkOTg2Nzk0MiAxMDA2NDQNCj4gLS0tIGEvZnMvbmZzL25m
czRzdGF0ZS5jDQo+ICsrKyBiL2ZzL25mcy9uZnM0c3RhdGUuYw0KPiBAQCAtNDg4LDcgKzQ4OCw4
IEBAIG5mczRfYWxsb2Nfc3RhdGVfb3duZXIoc3RydWN0IG5mc19zZXJ2ZXIgKnNlcnZlciwNCj4g
CW5mczRfaW5pdF9zZXFpZF9jb3VudGVyKCZzcC0+c29fc2VxaWQpOw0KPiAJYXRvbWljX3NldCgm
c3AtPnNvX2NvdW50LCAxKTsNCj4gCUlOSVRfTElTVF9IRUFEKCZzcC0+c29fbHJ1KTsNCj4gLQlz
ZXFjb3VudF9pbml0KCZzcC0+c29fcmVjbGFpbV9zZXFjb3VudCk7DQo+ICsJc2VxbG9ja19pbml0
KCZzcC0+c29fcmVjbGFpbV9zZXFsb2NrKTsNCj4gKwltdXRleF9pbml0KCZzcC0+c29fcmVjbGFp
bV9zZXFsb2NrX211dGV4KTsNCj4gCW11dGV4X2luaXQoJnNwLT5zb19kZWxlZ3JldHVybl9tdXRl
eCk7DQo+IAlyZXR1cm4gc3A7DQo+IH0NCj4gQEAgLTE0OTcsOCArMTQ5OCwxOCBAQCBzdGF0aWMg
aW50IG5mczRfcmVjbGFpbV9vcGVuX3N0YXRlKHN0cnVjdCBuZnM0X3N0YXRlX293bmVyICpzcCwg
Y29uc3Qgc3RydWN0IG5mcw0KPiAJICogcmVjb3ZlcmluZyBhZnRlciBhIG5ldHdvcmsgcGFydGl0
aW9uIG9yIGEgcmVib290IGZyb20gYQ0KPiAJICogc2VydmVyIHRoYXQgZG9lc24ndCBzdXBwb3J0
IGEgZ3JhY2UgcGVyaW9kLg0KPiAJICovDQo+ICsJLyoNCj4gKwkgKiBYWFgNCj4gKwkgKiBUaGlz
IG11dGV4IGlzIHdyb25nLiBJdCBwcm90ZWN0cyBhZ2FpbnN0IG11bHRpcGxlIHdyaXRlci4gSG93
ZXZlcg0KDQpUaGVyZSBpcyBvbmx5IDEgcmVjb3ZlcnkgdGhyZWFkIHBlciBjbGllbnQvc2VydmVy
IHBhaXIsIHdoaWNoIGlzIHdoeSB3ZSBrbm93IHRoZXJlIGlzIG9ubHkgYSBzaW5nbGUgd3JpdGVy
LiBObyBuZWVkIGZvciBhIG11dGV4IGhlcmUuDQoNCj4gKwkgKiB3cml0ZV9zZXFsb2NrKCkgc2hv
dWxkIGhhdmUgYmVlbiB1c2VkIGZvciB0aGlzIHRhc2suIFRoaXMgd291bGQgYXZvaWQNCj4gKwkg
KiBwcmVlbXB0aW9uIHdoaWxlIHRoZSBzZXFsb2NrIGlzIGhlbGQgd2hpY2ggaXMgZ29vZCBiZWNh
dXNlIHRoZSB3cml0ZXINCj4gKwkgKiBzaG91bGRuJ3QgYmUgcHJlZW1wdGVkIGFzIGl0IHdvdWxk
IGxldCB0aGUgcmVhZGVyIHNwaW4gZm9yIG5vIGdvb2QNCg0KUmVjb3ZlcnkgaW52b2x2ZXMgc2Vu
ZGluZyBSUEMgY2FsbHMgdG8gYSBzZXJ2ZXIuIFRoZXJlIGlzIG5vIGF2b2lkaW5nIHByZWVtcHRp
b24gaWYgd2UgaGF2ZSB0byByZWNvdmVyIHN0YXRlLiBUaGUgcG9pbnQgb2YgdGhlIHNlcXVlbmNl
IGNvdW50ZXIgaXMgdG8gc2lnbmFsIHRvIHByb2Nlc3NlcyB0aGF0IG1heSBoYXZlIHJhY2VkIHdp
dGggdGhpcyByZWNvdmVyeSB0aHJlYWQgdGhhdCB0aGV5IG1heSBuZWVkIHRvIHJlcGxheSB0aGVp
ciBsb2Nrcy4NCg0KPiArCSAqIHJlYXNvbi4gVGhlcmUgYXJlIGEgZmV3IG1lbW9yeSBhbGxvY2F0
aW9ucyBhbmQga3RocmVhZF9ydW4oKSBzbyB3ZQ0KPiArCSAqIGhhdmUgdGhpcyBtdXRleCBub3cu
DQo+ICsJICovDQo+ICsJbXV0ZXhfbG9jaygmc3AtPnNvX3JlY2xhaW1fc2VxbG9ja19tdXRleCk7
DQo+ICsJd3JpdGVfc2VxY291bnRfYmVnaW4oJnNwLT5zb19yZWNsYWltX3NlcWxvY2suc2VxY291
bnQpOw0KPiAJc3Bpbl9sb2NrKCZzcC0+c29fbG9jayk7DQo+IC0JcmF3X3dyaXRlX3NlcWNvdW50
X2JlZ2luKCZzcC0+c29fcmVjbGFpbV9zZXFjb3VudCk7DQo+IHJlc3RhcnQ6DQo+IAlsaXN0X2Zv
cl9lYWNoX2VudHJ5KHN0YXRlLCAmc3AtPnNvX3N0YXRlcywgb3Blbl9zdGF0ZXMpIHsNCj4gCQlp
ZiAoIXRlc3RfYW5kX2NsZWFyX2JpdChvcHMtPnN0YXRlX2ZsYWdfYml0LCAmc3RhdGUtPmZsYWdz
KSkNCj4gQEAgLTE1NjYsMTQgKzE1NzcsMTQgQEAgc3RhdGljIGludCBuZnM0X3JlY2xhaW1fb3Bl
bl9zdGF0ZShzdHJ1Y3QgbmZzNF9zdGF0ZV9vd25lciAqc3AsIGNvbnN0IHN0cnVjdCBuZnMNCj4g
CQlzcGluX2xvY2soJnNwLT5zb19sb2NrKTsNCj4gCQlnb3RvIHJlc3RhcnQ7DQo+IAl9DQo+IC0J
cmF3X3dyaXRlX3NlcWNvdW50X2VuZCgmc3AtPnNvX3JlY2xhaW1fc2VxY291bnQpOw0KPiAJc3Bp
bl91bmxvY2soJnNwLT5zb19sb2NrKTsNCj4gKwl3cml0ZV9zZXFjb3VudF9lbmQoJnNwLT5zb19y
ZWNsYWltX3NlcWxvY2suc2VxY291bnQpOw0KDQpUaGlzIHdpbGwgcmVpbnRyb2R1Y2UgbG9ja2Rl
cCBjaGVja2luZy4gV2UgZG9u4oCZdCBuZWVkIG9yIHdhbnQgdGhhdC4uLg0KDQo+ICsJbXV0ZXhf
dW5sb2NrKCZzcC0+c29fcmVjbGFpbV9zZXFsb2NrX211dGV4KTsNCj4gCXJldHVybiAwOw0KPiBv
dXRfZXJyOg0KPiAJbmZzNF9wdXRfb3Blbl9zdGF0ZShzdGF0ZSk7DQo+IC0Jc3Bpbl9sb2NrKCZz
cC0+c29fbG9jayk7DQo+IC0JcmF3X3dyaXRlX3NlcWNvdW50X2VuZCgmc3AtPnNvX3JlY2xhaW1f
c2VxY291bnQpOw0KPiAtCXNwaW5fdW5sb2NrKCZzcC0+c29fbG9jayk7DQo+ICsJd3JpdGVfc2Vx
Y291bnRfZW5kKCZzcC0+c29fcmVjbGFpbV9zZXFsb2NrLnNlcWNvdW50KTsNCj4gKwltdXRleF91
bmxvY2soJnNwLT5zb19yZWNsYWltX3NlcWxvY2tfbXV0ZXgpOw0KPiAJcmV0dXJuIHN0YXR1czsN
Cj4gfQ0KPiANCj4gLS0gDQo+IDIuMTAuMQ0KPiANCg0K

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sebastian Andrzej Siewior Oct. 31, 2016, 10:46 a.m. UTC | #2
On 2016-10-28 22:24:52 [+0000], Trond Myklebust wrote:
> > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> > index 5f4281ec5f72..a442d9867942 100644
> > --- a/fs/nfs/nfs4state.c
> > +++ b/fs/nfs/nfs4state.c
> > @@ -1497,8 +1498,18 @@ static int nfs4_reclaim_open_state(struct nfs4_state_owner *sp, const struct nfs
> > 	 * recovering after a network partition or a reboot from a
> > 	 * server that doesn't support a grace period.
> > 	 */
> > +	/*
> > +	 * XXX
> > +	 * This mutex is wrong. It protects against multiple writer. However
> 
> There is only 1 recovery thread per client/server pair, which is why we know there is only a single writer. No need for a mutex here.

This isn't documented but it should.

> > +	 * write_seqlock() should have been used for this task. This would avoid
> > +	 * preemption while the seqlock is held which is good because the writer
> > +	 * shouldn't be preempted as it would let the reader spin for no good
> 
> Recovery involves sending RPC calls to a server. There is no avoiding preemption if we have to recover state. The point of the sequence counter is to signal to processes that may have raced with this recovery thread that they may need to replay their locks.

Then the sequence counter is the wrong mechanism used here. As I
explained: the call can be preempted and once it is, the reader will
spin and waste cycles.
What is wrong with a rwsem? Are the reader not preemptible?

> > +	 * reason. There are a few memory allocations and kthread_run() so we
> > +	 * have this mutex now.
> > +	 */
> > +	mutex_lock(&sp->so_reclaim_seqlock_mutex);
> > +	write_seqcount_begin(&sp->so_reclaim_seqlock.seqcount);
> > 	spin_lock(&sp->so_lock);
> > -	raw_write_seqcount_begin(&sp->so_reclaim_seqcount);
> > restart:
> > 	list_for_each_entry(state, &sp->so_states, open_states) {
> > 		if (!test_and_clear_bit(ops->state_flag_bit, &state->flags))
> > @@ -1566,14 +1577,14 @@ static int nfs4_reclaim_open_state(struct nfs4_state_owner *sp, const struct nfs
> > 		spin_lock(&sp->so_lock);
> > 		goto restart;
> > 	}
> > -	raw_write_seqcount_end(&sp->so_reclaim_seqcount);
> > 	spin_unlock(&sp->so_lock);
> > +	write_seqcount_end(&sp->so_reclaim_seqlock.seqcount);
> 
> This will reintroduce lockdep checking. We don’t need or want that...

Why isn't lockdep welcome? And what kind warning will it introduce? How
do I trigger this? I tried various things but never got close to this.

Sebastian
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
index dff600ae0d74..d726d2e09353 100644
--- a/fs/nfs/delegation.c
+++ b/fs/nfs/delegation.c
@@ -150,11 +150,11 @@  static int nfs_delegation_claim_opens(struct inode *inode,
 		sp = state->owner;
 		/* Block nfs4_proc_unlck */
 		mutex_lock(&sp->so_delegreturn_mutex);
-		seq = raw_seqcount_begin(&sp->so_reclaim_seqcount);
+		seq = read_seqbegin(&sp->so_reclaim_seqlock);
 		err = nfs4_open_delegation_recall(ctx, state, stateid, type);
 		if (!err)
 			err = nfs_delegation_claim_locks(ctx, state, stateid);
-		if (!err && read_seqcount_retry(&sp->so_reclaim_seqcount, seq))
+		if (!err && read_seqretry(&sp->so_reclaim_seqlock, seq))
 			err = -EAGAIN;
 		mutex_unlock(&sp->so_delegreturn_mutex);
 		put_nfs_open_context(ctx);
diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 9b3a82abab07..2fee1a2e8b57 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -111,7 +111,8 @@  struct nfs4_state_owner {
 	unsigned long	     so_flags;
 	struct list_head     so_states;
 	struct nfs_seqid_counter so_seqid;
-	seqcount_t	     so_reclaim_seqcount;
+	seqlock_t	     so_reclaim_seqlock;
+	struct mutex	     so_reclaim_seqlock_mutex;
 	struct mutex	     so_delegreturn_mutex;
 };
 
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 7897826d7c51..9b9d53cd85f9 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -2685,7 +2685,7 @@  static int _nfs4_open_and_get_state(struct nfs4_opendata *opendata,
 	unsigned int seq;
 	int ret;
 
-	seq = raw_seqcount_begin(&sp->so_reclaim_seqcount);
+	seq = raw_seqcount_begin(&sp->so_reclaim_seqlock.seqcount);
 
 	ret = _nfs4_proc_open(opendata);
 	if (ret != 0)
@@ -2723,7 +2723,7 @@  static int _nfs4_open_and_get_state(struct nfs4_opendata *opendata,
 	ctx->state = state;
 	if (d_inode(dentry) == state->inode) {
 		nfs_inode_attach_open_context(ctx);
-		if (read_seqcount_retry(&sp->so_reclaim_seqcount, seq))
+		if (read_seqretry(&sp->so_reclaim_seqlock, seq))
 			nfs4_schedule_stateid_recovery(server, state);
 	}
 out:
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 5f4281ec5f72..a442d9867942 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -488,7 +488,8 @@  nfs4_alloc_state_owner(struct nfs_server *server,
 	nfs4_init_seqid_counter(&sp->so_seqid);
 	atomic_set(&sp->so_count, 1);
 	INIT_LIST_HEAD(&sp->so_lru);
-	seqcount_init(&sp->so_reclaim_seqcount);
+	seqlock_init(&sp->so_reclaim_seqlock);
+	mutex_init(&sp->so_reclaim_seqlock_mutex);
 	mutex_init(&sp->so_delegreturn_mutex);
 	return sp;
 }
@@ -1497,8 +1498,18 @@  static int nfs4_reclaim_open_state(struct nfs4_state_owner *sp, const struct nfs
 	 * recovering after a network partition or a reboot from a
 	 * server that doesn't support a grace period.
 	 */
+	/*
+	 * XXX
+	 * This mutex is wrong. It protects against multiple writer. However
+	 * write_seqlock() should have been used for this task. This would avoid
+	 * preemption while the seqlock is held which is good because the writer
+	 * shouldn't be preempted as it would let the reader spin for no good
+	 * reason. There are a few memory allocations and kthread_run() so we
+	 * have this mutex now.
+	 */
+	mutex_lock(&sp->so_reclaim_seqlock_mutex);
+	write_seqcount_begin(&sp->so_reclaim_seqlock.seqcount);
 	spin_lock(&sp->so_lock);
-	raw_write_seqcount_begin(&sp->so_reclaim_seqcount);
 restart:
 	list_for_each_entry(state, &sp->so_states, open_states) {
 		if (!test_and_clear_bit(ops->state_flag_bit, &state->flags))
@@ -1566,14 +1577,14 @@  static int nfs4_reclaim_open_state(struct nfs4_state_owner *sp, const struct nfs
 		spin_lock(&sp->so_lock);
 		goto restart;
 	}
-	raw_write_seqcount_end(&sp->so_reclaim_seqcount);
 	spin_unlock(&sp->so_lock);
+	write_seqcount_end(&sp->so_reclaim_seqlock.seqcount);
+	mutex_unlock(&sp->so_reclaim_seqlock_mutex);
 	return 0;
 out_err:
 	nfs4_put_open_state(state);
-	spin_lock(&sp->so_lock);
-	raw_write_seqcount_end(&sp->so_reclaim_seqcount);
-	spin_unlock(&sp->so_lock);
+	write_seqcount_end(&sp->so_reclaim_seqlock.seqcount);
+	mutex_unlock(&sp->so_reclaim_seqlock_mutex);
 	return status;
 }