Message ID | 1479574497-38268-3-git-send-email-trond.myklebust@primarydata.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
.. this one breaks things again: On 19 Nov 2016, at 11:54, Trond Myklebust wrote: > There is little point in setting NFS_INO_ADVISE_RDPLUS in nfs_lookup > and > nfs_lookup_revalidate() unless a process is actually doing readdir on > the > parent directory. > Furthermore, there is little point in using readdirplus if we're > trying > to revalidate a negative dentry. > > Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> > --- > fs/nfs/dir.c | 28 +++++++++++++++++----------- > 1 file changed, 17 insertions(+), 11 deletions(-) > > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c > index 53e02b8bd9bd..5befd382be7d 100644 > --- a/fs/nfs/dir.c > +++ b/fs/nfs/dir.c > @@ -455,14 +455,23 @@ bool nfs_use_readdirplus(struct inode *dir, > struct dir_context *ctx) > } > > /* > - * This function is called by the lookup code to request the use of > - * readdirplus to accelerate any future lookups in the same > + * This function is called by the lookup and getattr code to request > the > + * use of readdirplus to accelerate any future lookups in the same > * directory. > + * Do this by checking if there is an active file descriptor > + * and calling nfs_advise_use_readdirplus, then forcing a > + * cache flush. > */ > static > void nfs_advise_use_readdirplus(struct inode *dir) > { > - set_bit(NFS_INO_ADVISE_RDPLUS, &NFS_I(dir)->flags); > + struct nfs_inode *nfsi = NFS_I(dir); > + > + if (nfs_server_capable(dir, NFS_CAP_READDIRPLUS) && > + !list_empty(&nfsi->open_files)) { > + set_bit(NFS_INO_ADVISE_RDPLUS, &nfsi->flags); > + invalidate_mapping_pages(dir->i_mapping, 0, -1); > + } > } So every time advise_use_readdirplus it drops the mapping.. but what about subsequent calls into nfs_readdir() to get the next batch of entries? For the ls -l case, we want to keep setting NFS_INO_ADVISE_RDPLUS, but we shouldn't start over filling the mapping from the beginning again. > > /* > @@ -475,10 +484,7 @@ void nfs_advise_use_readdirplus(struct inode > *dir) > */ > void nfs_force_use_readdirplus(struct inode *dir) > { > - if (!list_empty(&NFS_I(dir)->open_files)) { > - nfs_advise_use_readdirplus(dir); > - invalidate_mapping_pages(dir->i_mapping, 0, -1); > - } > + nfs_advise_use_readdirplus(dir); > } > > static > @@ -1150,7 +1156,7 @@ static int nfs_lookup_revalidate(struct dentry > *dentry, unsigned int flags) > return -ECHILD; > goto out_bad; > } > - goto out_valid_noent; > + goto out_valid; > } > > if (is_bad_inode(inode)) { > @@ -1192,6 +1198,9 @@ static int nfs_lookup_revalidate(struct dentry > *dentry, unsigned int flags) > if (IS_ERR(label)) > goto out_error; > > + /* We need to force a revalidation: set a readdirplus hint */ > + nfs_advise_use_readdirplus(dir); > + > trace_nfs_lookup_revalidate_enter(dir, dentry, flags); > error = NFS_PROTO(dir)->lookup(dir, &dentry->d_name, fhandle, fattr, > label); > trace_nfs_lookup_revalidate_exit(dir, dentry, flags, error); > @@ -1211,9 +1220,6 @@ static int nfs_lookup_revalidate(struct dentry > *dentry, unsigned int flags) > out_set_verifier: > nfs_set_verifier(dentry, nfs_save_change_attribute(dir)); > out_valid: > - /* Success: notify readdir to use READDIRPLUS */ > - nfs_advise_use_readdirplus(dir); > - out_valid_noent: > if (flags & LOOKUP_RCU) { > if (parent != ACCESS_ONCE(dentry->d_parent)) > return -ECHILD; Now when listing with `ls -l`: the second call into nfs_readdir() to get the next batch of entries will not have NFS_INO_ADVISE_RDPLUS. I think this removes nfs_advise_use_readdirplus(dir) from the common "goto out_valid" path through nfs_lookup_revalidate() (the block with the 'iff' typo). Ben -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2016-11-30 at 14:09 -0500, Benjamin Coddington wrote: > .. this one breaks things again: > > On 19 Nov 2016, at 11:54, Trond Myklebust wrote: > > > There is little point in setting NFS_INO_ADVISE_RDPLUS in > > nfs_lookup > > and > > nfs_lookup_revalidate() unless a process is actually doing readdir > > on > > the > > parent directory. > > Furthermore, there is little point in using readdirplus if we're > > trying > > to revalidate a negative dentry. > > > > Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> > > --- > > fs/nfs/dir.c | 28 +++++++++++++++++----------- > > 1 file changed, 17 insertions(+), 11 deletions(-) > > > > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c > > index 53e02b8bd9bd..5befd382be7d 100644 > > --- a/fs/nfs/dir.c > > +++ b/fs/nfs/dir.c > > @@ -455,14 +455,23 @@ bool nfs_use_readdirplus(struct inode *dir, > > struct dir_context *ctx) > > } > > > > /* > > - * This function is called by the lookup code to request the use > > of > > - * readdirplus to accelerate any future lookups in the same > > + * This function is called by the lookup and getattr code to > > request > > the > > + * use of readdirplus to accelerate any future lookups in the same > > * directory. > > + * Do this by checking if there is an active file descriptor > > + * and calling nfs_advise_use_readdirplus, then forcing a > > + * cache flush. > > */ > > static > > void nfs_advise_use_readdirplus(struct inode *dir) > > { > > - set_bit(NFS_INO_ADVISE_RDPLUS, &NFS_I(dir)->flags); > > + struct nfs_inode *nfsi = NFS_I(dir); > > + > > + if (nfs_server_capable(dir, NFS_CAP_READDIRPLUS) && > > + !list_empty(&nfsi->open_files)) { > > + set_bit(NFS_INO_ADVISE_RDPLUS, &nfsi->flags); > > + invalidate_mapping_pages(dir->i_mapping, 0, -1); > > + } > > } > > So every time advise_use_readdirplus it drops the mapping.. but what > about > subsequent calls into nfs_readdir() to get the next batch of > entries? > For > the ls -l case, we want to keep setting NFS_INO_ADVISE_RDPLUS, but we > shouldn't start over filling the mapping from the beginning again. How do I ensure that the readdir isn't being served from cache, if I don't invalidate the mapping? The intention of the patch is to ensure that we only call this on a dcache or inode attribute cache miss. > > > > > /* > > @@ -475,10 +484,7 @@ void nfs_advise_use_readdirplus(struct inode > > *dir) > > */ > > void nfs_force_use_readdirplus(struct inode *dir) > > { > > - if (!list_empty(&NFS_I(dir)->open_files)) { > > - nfs_advise_use_readdirplus(dir); > > - invalidate_mapping_pages(dir->i_mapping, 0, -1); > > - } > > + nfs_advise_use_readdirplus(dir); > > } > > > > static > > @@ -1150,7 +1156,7 @@ static int nfs_lookup_revalidate(struct > > dentry > > *dentry, unsigned int flags) > > return -ECHILD; > > goto out_bad; > > } > > - goto out_valid_noent; > > + goto out_valid; > > } > > > > if (is_bad_inode(inode)) { > > @@ -1192,6 +1198,9 @@ static int nfs_lookup_revalidate(struct > > dentry > > *dentry, unsigned int flags) > > if (IS_ERR(label)) > > goto out_error; > > > > + /* We need to force a revalidation: set a readdirplus hint > > */ > > + nfs_advise_use_readdirplus(dir); > > + > > trace_nfs_lookup_revalidate_enter(dir, dentry, flags); > > error = NFS_PROTO(dir)->lookup(dir, &dentry->d_name, > > fhandle, fattr, > > label); > > trace_nfs_lookup_revalidate_exit(dir, dentry, flags, > > error); > > @@ -1211,9 +1220,6 @@ static int nfs_lookup_revalidate(struct > > dentry > > *dentry, unsigned int flags) > > out_set_verifier: > > nfs_set_verifier(dentry, nfs_save_change_attribute(dir)); > > out_valid: > > - /* Success: notify readdir to use READDIRPLUS */ > > - nfs_advise_use_readdirplus(dir); > > - out_valid_noent: > > if (flags & LOOKUP_RCU) { > > if (parent != ACCESS_ONCE(dentry->d_parent)) > > return -ECHILD; > > > Now when listing with `ls -l`: the second call into nfs_readdir() > to > get > the next batch of entries will not have NFS_INO_ADVISE_RDPLUS. > > I think this removes nfs_advise_use_readdirplus(dir) from the common > "goto > out_valid" path through nfs_lookup_revalidate() (the block with the > 'iff' > typo). > Actually, 'iff' is intentionally used there as the common shorthand for 'if and only if' (https://www.merriam-webster.com/dictionary/iff). As I said above, the point is to only trigger READDIRPLUS when we know the dcache or the inode cache needs revalidation. Otherwise we want to use the less expensive READDIR. I'm open for suggestions as to how we can improve that heuristic. Cheers Trond
On 1 Dec 2016, at 15:47, Trond Myklebust wrote: > On Wed, 2016-11-30 at 14:09 -0500, Benjamin Coddington wrote: >> .. this one breaks things again: >> >> On 19 Nov 2016, at 11:54, Trond Myklebust wrote: >> >>> static >>> void nfs_advise_use_readdirplus(struct inode *dir) >>> { >>> - set_bit(NFS_INO_ADVISE_RDPLUS, &NFS_I(dir)->flags); >>> + struct nfs_inode *nfsi = NFS_I(dir); >>> + >>> + if (nfs_server_capable(dir, NFS_CAP_READDIRPLUS) && >>> + !list_empty(&nfsi->open_files)) { >>> + set_bit(NFS_INO_ADVISE_RDPLUS, &nfsi->flags); >>> + invalidate_mapping_pages(dir->i_mapping, 0, -1); >>> + } >>> } >> >> So every time advise_use_readdirplus it drops the mapping.. but what >> about >> subsequent calls into nfs_readdir() to get the next batch of >> entries? >> For >> the ls -l case, we want to keep setting NFS_INO_ADVISE_RDPLUS, but we >> shouldn't start over filling the mapping from the beginning again. > > How do I ensure that the readdir isn't being served from cache, if I > don't invalidate the mapping? The way it is now. Isn't advise_use_readdirplus() just a marker to say "continue to use readdirplus" after nfs_force_use_readdirplus() has invalidated the mapping? > The intention of the patch is to ensure that we only call this on a dcache > or inode attribute cache miss. I think it also needs to be called when we detect if a child of the directory is looked up/revalidated in order to set NFS_INO_ADVISE_RDPLUS again. Otherwise we lose the optimization in commit 311324ad1713. Maybe I am just really confused.. >>> @@ -1211,9 +1220,6 @@ static int nfs_lookup_revalidate(struct >>> dentry >>> *dentry, unsigned int flags) >>> out_set_verifier: >>> nfs_set_verifier(dentry, nfs_save_change_attribute(dir)); >>> out_valid: >>> - /* Success: notify readdir to use READDIRPLUS */ >>> - nfs_advise_use_readdirplus(dir); >>> - out_valid_noent: >>> if (flags & LOOKUP_RCU) { >>> if (parent != ACCESS_ONCE(dentry->d_parent)) >>> return -ECHILD; >> >> >> Now when listing with `ls -l`: the second call into nfs_readdir() >> to >> get >> the next batch of entries will not have NFS_INO_ADVISE_RDPLUS. >> >> I think this removes nfs_advise_use_readdirplus(dir) from the common >> "goto >> out_valid" path through nfs_lookup_revalidate() (the block with the >> 'iff' >> typo). >> > > Actually, 'iff' is intentionally used there as the common shorthand for > 'if and only if' (https://www.merriam-webster.com/dictionary/iff). Ah, and now I see it used everywhere. Thank you for filling in that hole in my head. > As I said above, the point is to only trigger READDIRPLUS when we know > the dcache or the inode cache needs revalidation. Otherwise we want to > use the less expensive READDIR. Not if using ls -l. With this patch, an ls -l of a directory of 2000 entries follows this pattern: - nfs_readdir() returns first 1024 entries obtained with READDIRPLUS - nfs_readdir() returns last 976 entries with READDIR - stat()/LOOKUPs are done on those 976 - ls -l does its final getdents() on the last position, but we've now invalidated the pages, so finally: - nfs_readdir() is called with position 2000, and we do READDIRPLUS for _every_ entry all over again. This seems to break commit 311324ad1713's optimization, since now we'll send RPC to read the entire directory twice for ls -l. > I'm open for suggestions as to how we can improve that heuristic. OK. Right now I've got nothing to suggest, but I'll spend some time thinking about it. Ben -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
DQo+IE9uIERlYyAyLCAyMDE2LCBhdCAwODo1NiwgQmVuamFtaW4gQ29kZGluZ3RvbiA8YmNvZGRp bmdAcmVkaGF0LmNvbT4gd3JvdGU6DQo+IA0KPiBPbiAxIERlYyAyMDE2LCBhdCAxNTo0NywgVHJv bmQgTXlrbGVidXN0IHdyb3RlOg0KPiANCj4+IE9uIFdlZCwgMjAxNi0xMS0zMCBhdCAxNDowOSAt MDUwMCwgQmVuamFtaW4gQ29kZGluZ3RvbiB3cm90ZToNCj4+PiAuLiB0aGlzIG9uZSBicmVha3Mg dGhpbmdzIGFnYWluOg0KPj4+IA0KPj4+IE9uIDE5IE5vdiAyMDE2LCBhdCAxMTo1NCwgVHJvbmQg TXlrbGVidXN0IHdyb3RlOg0KPj4+IA0KPj4+PiAgc3RhdGljDQo+Pj4+ICB2b2lkIG5mc19hZHZp c2VfdXNlX3JlYWRkaXJwbHVzKHN0cnVjdCBpbm9kZSAqZGlyKQ0KPj4+PiAgew0KPj4+PiAtCXNl dF9iaXQoTkZTX0lOT19BRFZJU0VfUkRQTFVTLCAmTkZTX0koZGlyKS0+ZmxhZ3MpOw0KPj4+PiAr CXN0cnVjdCBuZnNfaW5vZGUgKm5mc2kgPSBORlNfSShkaXIpOw0KPj4+PiArDQo+Pj4+ICsJaWYg KG5mc19zZXJ2ZXJfY2FwYWJsZShkaXIsIE5GU19DQVBfUkVBRERJUlBMVVMpICYmDQo+Pj4+ICsJ ICAgICFsaXN0X2VtcHR5KCZuZnNpLT5vcGVuX2ZpbGVzKSkgew0KPj4+PiArCQlzZXRfYml0KE5G U19JTk9fQURWSVNFX1JEUExVUywgJm5mc2ktPmZsYWdzKTsNCj4+Pj4gKwkJaW52YWxpZGF0ZV9t YXBwaW5nX3BhZ2VzKGRpci0+aV9tYXBwaW5nLCAwLCAtMSk7DQo+Pj4+ICsJfQ0KPj4+PiAgfQ0K Pj4+IA0KPj4+IFNvIGV2ZXJ5IHRpbWUgYWR2aXNlX3VzZV9yZWFkZGlycGx1cyBpdCBkcm9wcyB0 aGUgbWFwcGluZy4uIGJ1dCB3aGF0IA0KPj4+IGFib3V0DQo+Pj4gc3Vic2VxdWVudCBjYWxscyBp bnRvIG5mc19yZWFkZGlyKCkgdG8gZ2V0IHRoZSBuZXh0IGJhdGNoIG9mDQo+Pj4gZW50cmllcz8g IA0KPj4+IEZvcg0KPj4+IHRoZSBscyAtbCBjYXNlLCB3ZSB3YW50IHRvIGtlZXAgc2V0dGluZyBO RlNfSU5PX0FEVklTRV9SRFBMVVMsIGJ1dCB3ZQ0KPj4+IHNob3VsZG4ndCBzdGFydCBvdmVyIGZp bGxpbmcgdGhlIG1hcHBpbmcgZnJvbSB0aGUgYmVnaW5uaW5nIGFnYWluLg0KPj4gDQo+PiBIb3cg ZG8gSSBlbnN1cmUgdGhhdCB0aGUgcmVhZGRpciBpc24ndCBiZWluZyBzZXJ2ZWQgZnJvbSBjYWNo ZSwgaWYgSQ0KPj4gZG9uJ3QgaW52YWxpZGF0ZSB0aGUgbWFwcGluZz8NCj4gDQo+IFRoZSB3YXkg aXQgaXMgbm93LiAgSXNuJ3QgYWR2aXNlX3VzZV9yZWFkZGlycGx1cygpIGp1c3QgYSBtYXJrZXIg dG8gc2F5DQo+ICJjb250aW51ZSB0byB1c2UgcmVhZGRpcnBsdXMiIGFmdGVyIG5mc19mb3JjZV91 c2VfcmVhZGRpcnBsdXMoKSBoYXMNCj4gaW52YWxpZGF0ZWQgdGhlIG1hcHBpbmc/DQoNClNvLCB5 ZXMuIEkgd2FzIGNvbnNpZGVyaW5nIHJlbmFtaW5nIHRoZSBmdW5jdGlvbnMgaW4gdGhlIHYyIHBh dGNoIHNlcmllcywgYnV0IEkgY291bGRu4oCZdCByZWFsbHkgZmluZCBhIGdvb2QgbmFtZS4NCklu IGVzc2VuY2U6DQotIG5mc19mb3JjZV91c2VfcmVhZGRpcnBsdXMoKSBpcyBoaW50aW5nIHRvIHJl YWRkaXIgdGhhdCB3ZSBoYWQgYSBjYWNoZSBtaXNzLCBhbmQgc28gd2Ugc2hvdWxkIGNsZWFyIHRo ZSByZWFkaXIgY2FjaGUgYW5kIHVzZSByZWFkZGlycGx1cyBpbiBvcmRlciB0byBwb3B1bGF0ZSB0 aGUgY2FjaGUuDQotIG5mc19hZHZpc2VfdXNlX3JlYWRkaXJwbHVzKCkgaXMgaGludGluZyB0aGF0 IHdlIGhhZCBhIGNhY2hlIGhpdCwgYW5kIHNvIHdlIHNob3VsZCBjb250aW51ZSB0byB1c2UgcmVh ZGRpcnBsdXMNCg0KPiANCj4+IFRoZSBpbnRlbnRpb24gb2YgdGhlIHBhdGNoIGlzIHRvIGVuc3Vy ZSB0aGF0IHdlIG9ubHkgY2FsbCB0aGlzIG9uIGEgZGNhY2hlDQo+PiBvciBpbm9kZSBhdHRyaWJ1 dGUgY2FjaGUgbWlzcy4NCj4gDQo+IEkgdGhpbmsgaXQgYWxzbyBuZWVkcyB0byBiZSBjYWxsZWQg d2hlbiB3ZSBkZXRlY3QgaWYgYSBjaGlsZCBvZiB0aGUNCj4gZGlyZWN0b3J5IGlzIGxvb2tlZCB1 cC9yZXZhbGlkYXRlZCBpbiBvcmRlciB0byBzZXQgTkZTX0lOT19BRFZJU0VfUkRQTFVTDQo+IGFn YWluLiAgT3RoZXJ3aXNlIHdlIGxvc2UgdGhlIG9wdGltaXphdGlvbiBpbiBjb21taXQgMzExMzI0 YWQxNzEzLg0KPiANCj4gTWF5YmUgSSBhbSBqdXN0IHJlYWxseSBjb25mdXNlZC4uDQoNCk5vLCBJ IHRoaW5rIHRoYXQgaXMgY29ycmVjdC4gSXQganVzdCBuZWVkcyB0byBiZSBkb25lIG1vcmUgY29u c2lzdGVudGx5LiBBZ2Fpbiwgc2VlIHRoZSB2MiBwYXRjaCBzZXJpZXMNCg0KQ2hlZXJzDQogIFRy b25k -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c index 53e02b8bd9bd..5befd382be7d 100644 --- a/fs/nfs/dir.c +++ b/fs/nfs/dir.c @@ -455,14 +455,23 @@ bool nfs_use_readdirplus(struct inode *dir, struct dir_context *ctx) } /* - * This function is called by the lookup code to request the use of - * readdirplus to accelerate any future lookups in the same + * This function is called by the lookup and getattr code to request the + * use of readdirplus to accelerate any future lookups in the same * directory. + * Do this by checking if there is an active file descriptor + * and calling nfs_advise_use_readdirplus, then forcing a + * cache flush. */ static void nfs_advise_use_readdirplus(struct inode *dir) { - set_bit(NFS_INO_ADVISE_RDPLUS, &NFS_I(dir)->flags); + struct nfs_inode *nfsi = NFS_I(dir); + + if (nfs_server_capable(dir, NFS_CAP_READDIRPLUS) && + !list_empty(&nfsi->open_files)) { + set_bit(NFS_INO_ADVISE_RDPLUS, &nfsi->flags); + invalidate_mapping_pages(dir->i_mapping, 0, -1); + } } /* @@ -475,10 +484,7 @@ void nfs_advise_use_readdirplus(struct inode *dir) */ void nfs_force_use_readdirplus(struct inode *dir) { - if (!list_empty(&NFS_I(dir)->open_files)) { - nfs_advise_use_readdirplus(dir); - invalidate_mapping_pages(dir->i_mapping, 0, -1); - } + nfs_advise_use_readdirplus(dir); } static @@ -1150,7 +1156,7 @@ static int nfs_lookup_revalidate(struct dentry *dentry, unsigned int flags) return -ECHILD; goto out_bad; } - goto out_valid_noent; + goto out_valid; } if (is_bad_inode(inode)) { @@ -1192,6 +1198,9 @@ static int nfs_lookup_revalidate(struct dentry *dentry, unsigned int flags) if (IS_ERR(label)) goto out_error; + /* We need to force a revalidation: set a readdirplus hint */ + nfs_advise_use_readdirplus(dir); + trace_nfs_lookup_revalidate_enter(dir, dentry, flags); error = NFS_PROTO(dir)->lookup(dir, &dentry->d_name, fhandle, fattr, label); trace_nfs_lookup_revalidate_exit(dir, dentry, flags, error); @@ -1211,9 +1220,6 @@ static int nfs_lookup_revalidate(struct dentry *dentry, unsigned int flags) out_set_verifier: nfs_set_verifier(dentry, nfs_save_change_attribute(dir)); out_valid: - /* Success: notify readdir to use READDIRPLUS */ - nfs_advise_use_readdirplus(dir); - out_valid_noent: if (flags & LOOKUP_RCU) { if (parent != ACCESS_ONCE(dentry->d_parent)) return -ECHILD;
There is little point in setting NFS_INO_ADVISE_RDPLUS in nfs_lookup and nfs_lookup_revalidate() unless a process is actually doing readdir on the parent directory. Furthermore, there is little point in using readdirplus if we're trying to revalidate a negative dentry. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> --- fs/nfs/dir.c | 28 +++++++++++++++++----------- 1 file changed, 17 insertions(+), 11 deletions(-)