Orangefs ABI documentation

Ok, I understand these last couple of problems better.

If the client-core crashes (kill -9 in my test cases) in
the middle of a rename or an unlink (and maybe some other
operations, these are the ones I have captured and studied) a
couple of things can happen.

In both cases, you get:

  service_operation
    queue the operation
    wait_for_matching_downcall = -EAGAIN
    queue the operation
    wait_for_matching_downcall = 0
    out:

Sometimes when the operation is first queued, the client-core
will be in the middle of the state machine code and the operation
will be half done when the client-core dies, and the object that
was being operated on will be broken. In other words, it is possible
for a userspace program using the Orangefs native API to corrupt
the filesystem if it crashes in a critical area. Do other userspace
filesystems have this same problem?

Other times, when the operation is first queued, the client-core
gets the operation fully launched, and then dies. Then, when
the operation is queued up again, the operation fails on -ENOENT.
You can't rename a to b if a has already been renamed to b. You can't
unlink a if there is no a.

For the first case I don't see how there's anything that can be done.
The filesystem is corrupted. Its not toast or anything, but there's
a directory somewhere with a broken file in it.

I have made a patch that appears to actually work and cause no bad
side effects for the second case. Al has come colorful phrases,
like "this is too ugly to live" and some others. What you think about
this patch, Al <g>...

The d_drop is how I implemented the idea you had at first, Al, I'm
not sure now if it helps or hurts or is a no-op.

# git --no-pager diff
 }
@@ -433,6 +444,17 @@ static int orangefs_rename(struct inode *old_dir,
      "orangefs_rename: got downcall status %d\n",
      ret);

+ /*
+ * We would never have gotten here if the object didn't
+ * exist when we started down this path. There's a race
+ * condition where if a restart of the client-core
+ * coincides just right with an in-progress rename a
+ * file can get renamed on the server and be gone
+ * when service-operation does the retry...
+ */
+ if (ret == -ENOENT)
+ ret = 0;
+
  if (new_dentry->d_inode)
  new_dentry->d_inode->i_ctime = CURRENT_TIME;


On Mon, Feb 22, 2016 at 4:22 PM, Mike Marshall <hubcap@omnibond.com> wrote:
> I did this and the problem seems fixed:
>
> # git diff
> diff --git a/fs/orangefs/namei.c b/fs/orangefs/namei.c
> index b3ae374..249bda5 100644
> --- a/fs/orangefs/namei.c
> +++ b/fs/orangefs/namei.c
> @@ -61,6 +61,7 @@ static int orangefs_create(struct inode *dir,
>                            __func__,
>                            dentry->d_name.name);
>                 ret = PTR_ERR(inode);
> +               d_drop(dentry);
>                 goto out;
>         }
>
> Of course, this has uncovered yet another reproducible problem:
>
> 710055 orangefs_unlink: called on PPTB1E4.TMP
> 710058 service_operation: orangefs_unlink ffff880014828000
>
>                 right in here I think the rm is
>                 being processed in the server just
>                 as the client-core has died.
>
> 710534 wait_for_matching_downcall: operation purged ffff880014828000
> 710538 service_operation: orangefs_unlink ffff880014828000
> 710539 service_operation:client core is NOT in service
>
>                 right in here I think stuff starts
>                 working again and we're going
>                 to unsuccessfully try to process
>                 the rm again.
>
> 710646 wait_for_matching_downcall returned 0 for ffff880014828000
>
>                 happy, because we got the matching downcall
>
> 710647 service_operation orangefs_unlink returning -2 for ffff880014828000
> 710648 orangefs_unlink: service_operation returned -2
>
>                 sad, because we got ENOENT on second rm
>
> 710649 Releasing OP ffff880014828000
>
>                 so... the userspace process (dbench in this case) thinks
>                 the rm failed, but it didn't.
>
>
>
> On Mon, Feb 22, 2016 at 11:20 AM, Mike Marshall <hubcap@omnibond.com> wrote:
>>  > Looks like I'd screwed up checking last time.
>>
>> Probably not that <g>... my branch did diverge over the course
>> of the few days that we were thrashing around in the kernel trying
>> to fix what I had broken two years ago in userspace.
>>
>> I can relate to why you were motivated to remove the thrashing
>> around from the git history, but your git-foo is much stronger
>> than mine. I wanted to try and get my branch back into line using
>> a methodology that I understand to keep from ending up like
>> this fellow:
>>
>> http://myweb.clemson.edu/~hubcap/harris.jpg
>>
>> I'm glad it worked out... my kernel.org for-next branch is updated now.
>>
>> so, I'll keep working the problem, using your d_drop idea first off...
>> I'll be back with more information, and hopefully even have it fixed, soon...
>>
>> -Mike
>>
>> On Sat, Feb 20, 2016 at 8:36 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>>> On Sat, Feb 20, 2016 at 07:14:26AM -0500, Mike Marshall wrote:
>>>
>>>> Your orangefs-untested branch has 5625087 commits. My "current" branch
>>>> has 5625087 commits. In each all of the commit signatures match, except
>>>> for the most recent 15 commits. The last 15 commits in my "current"
>>>> branch were made from your orangefs-untested branch with "git format-patch"
>>>> and applied to my "current" branch with "git am -s". "git log -p" shows that
>>>> my most recent 15 commits differ from your most recent 15 commits by
>>>> the addition of my "sign off" line.
>>>
>>> *blinks*
>>> *checks*
>>>
>>> OK, ignore what I asked, then.  Looks like I'd screwed up checking last time.
>>>
>>>> I will absolutely update my kernel.org for-next branch with the procedure you
>>>> outlined, because you said so.
>>>>
>>>> I wish I understood it better, though... I can only guess at this point that
>>>> the procedure you outlined will do some desirable thing to git metadata...?
>>>
>>> None whatsoever, ignore it.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Orangefs ABI documentation

Commit Message

Comments

Patch