diff mbox

More parallel atomic_open/d_splice_alias fun with NFS and possibly more FSes.

Message ID 1587788B-C7F6-4D19-BDAD-29846231CD6C@linuxhacker.ru
State New
Headers show

Commit Message

Oleg Drokin July 5, 2016, 8:21 p.m. UTC
On Jul 5, 2016, at 4:08 PM, Al Viro wrote:

> On Tue, Jul 05, 2016 at 03:12:44PM -0400, Oleg Drokin wrote:
>> 
>> When d_in_lookup was introduced, it was described as:
>>    New primitives: d_in_lookup() (a predicate checking if dentry is in
>>    the in-lookup state) and d_lookup_done() (tells the system that
>>    we are done with lookup and if it's still marked as in-lookup, it
>>    should cease to be such).
>> 
>> I don't see where it mentions anything about exclusive vs parallel lookup
>> that probably led to some confusion.
> 
> In the same commit:
> 
> #define DCACHE_PAR_LOOKUP               0x10000000 /* being looked up (with parent locked shared) */

Well, I did not really check the commit, just the log message,
since there's no other documentation about it apparently ;)

>> So with Lustre the dentry can be in three states, really:
>> 
>> 1. hashed dentry that's all A-ok to reuse.
>> 2. hashed dentry that's NOT valid (dlm lock went away) - this is distinguished in d_compare by looking at a bit in the fs_data
>> 3. unhashed dentry ( I guess could be both valid and invalid lustre-wise).
>> 
>> So the logic in ll_lookup_it_finish() (part of regular lookup) is this:
>> 
>> If the dentry we have is not hashed - this is a new lookup, so we need to
>> call into ll_splice_alias() to see if there's a better dentry we need to
>> reuse that was already rejected by VFS before since we did not have necessary locks,
>> but we do have them now.
>> The comment at the top of ll_dcompare() explains why we don't just unhash the
>> dentry on lock-loss - that apparently leads to a loop around real_lookup for
>> real-contended dentries.
>> This is also why we cannot use d_splice_alias here - such cases are possible
>> for regular files and directories.
>> 
>> Anyway, I guess additional point of confusion here is then why does
>> ll_lookup_it_finish() need to check for hashedness of the dentry since it's in
>> lookup, so we should be unhashed here.
>> I checked the commit history and this check was added along with atomic_open
>> support, so I imagine we can just move it up into ll_atomic_open and then your
>> change starts to make sense along with a few other things.
> 
> So basically this
>        } else if (!it_disposition(it, DISP_LOOKUP_NEG)  &&
>                   !it_disposition(it, DISP_OPEN_CREATE)) {
>                /* With DISP_OPEN_CREATE dentry will be
>                 * instantiated in ll_create_it.
>                 */
>                LASSERT(!d_inode(*de));
>                d_instantiate(*de, inode);
>        }
> is something we should do only for negative hashed fed to it by
> ->atomic_open(), and that - only if we have no O_CREAT in flags?

Yes, and in fact we can totally avoid it I think.

> Then, since 3/3 eliminates that case completely, we could just rip that
> else-if, along with the check for d_unhashed itself, making the call of
> ll_splice_alias() unconditional there.  Or am I misreading what you said?
> Do you see any problems with what's in #for-linus now (head at 11f0083)?

Yes, we can make it unconditional
I think we can simplify it even more and since unlike NFS our negative dentries
are a lot less speculative, we can just return ENOENT if this is not a create
request and unhash otherwise. Still needs to go through the whole test suite
to make sure it does not break anything unexpected.

At least this is the patch I am playing with right now:


--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Oleg Drokin July 6, 2016, 12:29 a.m. UTC | #1
On Jul 5, 2016, at 4:21 PM, Oleg Drokin wrote:
> 
>>> So with Lustre the dentry can be in three states, really:
>>> 
>>> 1. hashed dentry that's all A-ok to reuse.
>>> 2. hashed dentry that's NOT valid (dlm lock went away) - this is distinguished in d_compare by looking at a bit in the fs_data
>>> 3. unhashed dentry ( I guess could be both valid and invalid lustre-wise).
>>> 
>>> So the logic in ll_lookup_it_finish() (part of regular lookup) is this:
>>> 
>>> If the dentry we have is not hashed - this is a new lookup, so we need to
>>> call into ll_splice_alias() to see if there's a better dentry we need to
>>> reuse that was already rejected by VFS before since we did not have necessary locks,
>>> but we do have them now.
>>> The comment at the top of ll_dcompare() explains why we don't just unhash the
>>> dentry on lock-loss - that apparently leads to a loop around real_lookup for
>>> real-contended dentries.
>>> This is also why we cannot use d_splice_alias here - such cases are possible
>>> for regular files and directories.
>>> 
>>> Anyway, I guess additional point of confusion here is then why does
>>> ll_lookup_it_finish() need to check for hashedness of the dentry since it's in
>>> lookup, so we should be unhashed here.
>>> I checked the commit history and this check was added along with atomic_open
>>> support, so I imagine we can just move it up into ll_atomic_open and then your
>>> change starts to make sense along with a few other things.
>> 
>> So basically this
>>       } else if (!it_disposition(it, DISP_LOOKUP_NEG)  &&
>>                  !it_disposition(it, DISP_OPEN_CREATE)) {
>>               /* With DISP_OPEN_CREATE dentry will be
>>                * instantiated in ll_create_it.
>>                */
>>               LASSERT(!d_inode(*de));
>>               d_instantiate(*de, inode);
>>       }
>> is something we should do only for negative hashed fed to it by
>> ->atomic_open(), and that - only if we have no O_CREAT in flags?
> 
> Yes, and in fact we can totally avoid it I think.
> 
>> Then, since 3/3 eliminates that case completely, we could just rip that
>> else-if, along with the check for d_unhashed itself, making the call of
>> ll_splice_alias() unconditional there.  Or am I misreading what you said?
>> Do you see any problems with what's in #for-linus now (head at 11f0083)?
> 
> Yes, we can make it unconditional
> I think we can simplify it even more and since unlike NFS our negative dentries
> are a lot less speculative, we can just return ENOENT if this is not a create
> request and unhash otherwise. Still needs to go through the whole test suite
> to make sure it does not break anything unexpected.
> 
> At least this is the patch I am playing with right now:
> --- a/drivers/staging/lustre/lustre/llite/namei.c
> +++ b/drivers/staging/lustre/lustre/llite/namei.c
> @@ -391,6 +391,7 @@ static int ll_lookup_it_finish(struct ptlrpc_request *request,
> 	struct inode *inode = NULL;
> 	__u64 bits = 0;
> 	int rc = 0;
> +	struct dentry *alias;
> 
> 	/* NB 1 request reference will be taken away by ll_intent_lock()
> 	 * when I return
> @@ -415,26 +416,12 @@ static int ll_lookup_it_finish(struct ptlrpc_request *request,
> 		 */
> 	}
> 
> -	/* Only hash *de if it is unhashed (new dentry).
> -	 * Atoimc_open may passing hashed dentries for open.
> -	 */
> -	if (d_unhashed(*de)) {
> -		struct dentry *alias;
> -
> -		alias = ll_splice_alias(inode, *de);
> -		if (IS_ERR(alias)) {
> -			rc = PTR_ERR(alias);
> -			goto out;
> -		}
> -		*de = alias;
> -	} else if (!it_disposition(it, DISP_LOOKUP_NEG)  &&
> -		   !it_disposition(it, DISP_OPEN_CREATE)) {
> -		/* With DISP_OPEN_CREATE dentry will be
> -		 * instantiated in ll_create_it.
> -		 */
> -		LASSERT(!d_inode(*de));
> -		d_instantiate(*de, inode);
> +	alias = ll_splice_alias(inode, *de);
> +	if (IS_ERR(alias)) {
> +		rc = PTR_ERR(alias);
> +		goto out;
> 	}
> +	*de = alias;
> 
> 	if (!it_disposition(it, DISP_LOOKUP_NEG)) {
> 		/* we have lookup look - unhide dentry */
> @@ -590,6 +577,24 @@ static int ll_atomic_open(struct inode *dir, struct dentry *dentry,
> 	       dentry, PFID(ll_inode2fid(dir)), dir, file, open_flags, mode,
> 	       *opened);
> 
> +	/* Only negative dentries enter here */
> +	LASSERT(!d_inode(dentry));
> +
> +	if (!(open_flags & O_CREAT) && !d_in_lookup(dentry)) {

Duh, this obviously was supposed to be
 if (!d_in_lookup(dentry)) {

But even in the form above nothing bad happened in the full testing
because we cannot find any aliases without an inode.

> +		/* A valid negative dentry that just passed revalidation,
> +		 * there's little point to try and open it server-side,
> +		 * even though there's a minuscle chance it might succeed.
> +		 * Either way it's a valid race to just return -ENOENT here.
> +		 */
> +		if (!(open_flags & O_CREAT))
> +			return -ENOENT;
> +
> +		/* Otherwise we just unhash it to be rehashed afresh via
> +		 * lookup if necessary
> +		 */
> +		d_drop(dentry);

So we can even drop this part and retain the top condition as it was.
d_add does not care if the dentry we are feeding it was hashed or not,
so do you see any downsides to doing that I wonder?

> +	}
> +
> 	it = kzalloc(sizeof(*it), GFP_NOFS);
> 	if (!it)
> 		return -ENOMEM;
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Al Viro July 6, 2016, 3:20 a.m. UTC | #2
On Tue, Jul 05, 2016 at 08:29:37PM -0400, Oleg Drokin wrote:
> > +		/* Otherwise we just unhash it to be rehashed afresh via
> > +		 * lookup if necessary
> > +		 */
> > +		d_drop(dentry);
> 
> So we can even drop this part and retain the top condition as it was.
> d_add does not care if the dentry we are feeding it was hashed or not,
> so do you see any downsides to doing that I wonder?

d_add() on hashed dentry will end up reaching this:
static void __d_rehash(struct dentry * entry, struct hlist_bl_head *b)
{
        BUG_ON(!d_unhashed(entry));

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Oleg Drokin July 6, 2016, 3:25 a.m. UTC | #3
On Jul 5, 2016, at 11:20 PM, Al Viro wrote:

> On Tue, Jul 05, 2016 at 08:29:37PM -0400, Oleg Drokin wrote:
>>> +		/* Otherwise we just unhash it to be rehashed afresh via
>>> +		 * lookup if necessary
>>> +		 */
>>> +		d_drop(dentry);
>> 
>> So we can even drop this part and retain the top condition as it was.
>> d_add does not care if the dentry we are feeding it was hashed or not,
>> so do you see any downsides to doing that I wonder?
> 
> d_add() on hashed dentry will end up reaching this:
> static void __d_rehash(struct dentry * entry, struct hlist_bl_head *b)
> {
>        BUG_ON(!d_unhashed(entry));

Ah, ok. Yes, I remember about it now from the older issue with nfs.

It's still puzzling why I did not hit it yet, but oh well.

I wonder if handling of negative dentries broke… Time for more investigations.

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Oleg Drokin July 6, 2016, 4:35 a.m. UTC | #4
On Jul 5, 2016, at 11:25 PM, Oleg Drokin wrote:

> 
> On Jul 5, 2016, at 11:20 PM, Al Viro wrote:
> 
>> On Tue, Jul 05, 2016 at 08:29:37PM -0400, Oleg Drokin wrote:
>>>> +		/* Otherwise we just unhash it to be rehashed afresh via
>>>> +		 * lookup if necessary
>>>> +		 */
>>>> +		d_drop(dentry);
>>> 
>>> So we can even drop this part and retain the top condition as it was.
>>> d_add does not care if the dentry we are feeding it was hashed or not,
>>> so do you see any downsides to doing that I wonder?
>> 
>> d_add() on hashed dentry will end up reaching this:
>> static void __d_rehash(struct dentry * entry, struct hlist_bl_head *b)
>> {
>>       BUG_ON(!d_unhashed(entry));
> 
> Ah, ok. Yes, I remember about it now from the older issue with nfs.
> 
> It's still puzzling why I did not hit it yet, but oh well.
> 
> I wonder if handling of negative dentries broke… Time for more investigations.

Ah, the reason was just looking me in the eyes as the first thing we do in
ll_revalidate_dentry():

        if ((lookup_flags & (LOOKUP_OPEN | LOOKUP_CREATE)) ==
            (LOOKUP_OPEN | LOOKUP_CREATE))
                return 0;


Apparently we are trying to protect from the case where dentry is valid when we
check it, but gets pulled under us as soon as we are done (no lustre locking held
around it).
So if we don't do this, we fall into ll_file_open/ll_intent_file_open and form
a request to open the file based on the inode. If this inode is already gone
server-side, we might get ESTALE, since we don't really send the name to the
server in this case so it does not know what is it we are after anyway.

On the userspace side of things the picture is not pretty then, we are doing
open(O_CREAT) and get ESTALE back.

Looking at do_filp_open(), there's actually a retry:
        if (unlikely(filp == ERR_PTR(-ESTALE)))
                filp = path_openat(&nd, op, flags | LOOKUP_REVAL);

But it's only once so for a contended file we can still have some other thread
do a lookup, provide us with a new cached dentry that's similarly pulled
under us next time around.
But I guess the whole idea of this is to let us know the file is contended and
we can just only fail revalidate then.

Hm.. looking at the server - it's even worse, we'd get ENOENT from the server if
the desired inode no longer exist, so a perfect opportunity for the
client to turn that ENOENT into ESTALE if the create flag was set.--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

--- a/drivers/staging/lustre/lustre/llite/namei.c
+++ b/drivers/staging/lustre/lustre/llite/namei.c
@@ -391,6 +391,7 @@  static int ll_lookup_it_finish(struct ptlrpc_request *request,
 	struct inode *inode = NULL;
 	__u64 bits = 0;
 	int rc = 0;
+	struct dentry *alias;
 
 	/* NB 1 request reference will be taken away by ll_intent_lock()
 	 * when I return
@@ -415,26 +416,12 @@  static int ll_lookup_it_finish(struct ptlrpc_request *request,
 		 */
 	}
 
-	/* Only hash *de if it is unhashed (new dentry).
-	 * Atoimc_open may passing hashed dentries for open.
-	 */
-	if (d_unhashed(*de)) {
-		struct dentry *alias;
-
-		alias = ll_splice_alias(inode, *de);
-		if (IS_ERR(alias)) {
-			rc = PTR_ERR(alias);
-			goto out;
-		}
-		*de = alias;
-	} else if (!it_disposition(it, DISP_LOOKUP_NEG)  &&
-		   !it_disposition(it, DISP_OPEN_CREATE)) {
-		/* With DISP_OPEN_CREATE dentry will be
-		 * instantiated in ll_create_it.
-		 */
-		LASSERT(!d_inode(*de));
-		d_instantiate(*de, inode);
+	alias = ll_splice_alias(inode, *de);
+	if (IS_ERR(alias)) {
+		rc = PTR_ERR(alias);
+		goto out;
 	}
+	*de = alias;
 
 	if (!it_disposition(it, DISP_LOOKUP_NEG)) {
 		/* we have lookup look - unhide dentry */
@@ -590,6 +577,24 @@  static int ll_atomic_open(struct inode *dir, struct dentry *dentry,
 	       dentry, PFID(ll_inode2fid(dir)), dir, file, open_flags, mode,
 	       *opened);
 
+	/* Only negative dentries enter here */
+	LASSERT(!d_inode(dentry));
+
+	if (!(open_flags & O_CREAT) && !d_in_lookup(dentry)) {
+		/* A valid negative dentry that just passed revalidation,
+		 * there's little point to try and open it server-side,
+		 * even though there's a minuscle chance it might succeed.
+		 * Either way it's a valid race to just return -ENOENT here.
+		 */
+		if (!(open_flags & O_CREAT))
+			return -ENOENT;
+
+		/* Otherwise we just unhash it to be rehashed afresh via
+		 * lookup if necessary
+		 */
+		d_drop(dentry);
+	}
+
 	it = kzalloc(sizeof(*it), GFP_NOFS);
 	if (!it)
 		return -ENOMEM;