diff mbox

Orangefs ABI documentation

Message ID CAOg9mSSdiOQEyoy_YdoCWHpJ5ij2=n3F+OqrpErxQKcU_6bRsQ@mail.gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Mike Marshall Feb. 23, 2016, 9:58 p.m. UTC
Ok, I understand these last couple of problems better.

If the client-core crashes (kill -9 in my test cases) in
the middle of a rename or an unlink (and maybe some other
operations, these are the ones I have captured and studied) a
couple of things can happen.

In both cases, you get:

  service_operation
    queue the operation
    wait_for_matching_downcall = -EAGAIN
    queue the operation
    wait_for_matching_downcall = 0
    out:

Sometimes when the operation is first queued, the client-core
will be in the middle of the state machine code and the operation
will be half done when the client-core dies, and the object that
was being operated on will be broken. In other words, it is possible
for a userspace program using the Orangefs native API to corrupt
the filesystem if it crashes in a critical area. Do other userspace
filesystems have this same problem?

Other times, when the operation is first queued, the client-core
gets the operation fully launched, and then dies. Then, when
the operation is queued up again, the operation fails on -ENOENT.
You can't rename a to b if a has already been renamed to b. You can't
unlink a if there is no a.

For the first case I don't see how there's anything that can be done.
The filesystem is corrupted. Its not toast or anything, but there's
a directory somewhere with a broken file in it.

I have made a patch that appears to actually work and cause no bad
side effects for the second case. Al has come colorful phrases,
like "this is too ugly to live" and some others. What you think about
this patch, Al <g>...

The d_drop is how I implemented the idea you had at first, Al, I'm
not sure now if it helps or hurts or is a no-op.

# git --no-pager diff
 }
@@ -433,6 +444,17 @@ static int orangefs_rename(struct inode *old_dir,
      "orangefs_rename: got downcall status %d\n",
      ret);

+ /*
+ * We would never have gotten here if the object didn't
+ * exist when we started down this path. There's a race
+ * condition where if a restart of the client-core
+ * coincides just right with an in-progress rename a
+ * file can get renamed on the server and be gone
+ * when service-operation does the retry...
+ */
+ if (ret == -ENOENT)
+ ret = 0;
+
  if (new_dentry->d_inode)
  new_dentry->d_inode->i_ctime = CURRENT_TIME;


On Mon, Feb 22, 2016 at 4:22 PM, Mike Marshall <hubcap@omnibond.com> wrote:
> I did this and the problem seems fixed:
>
> # git diff
> diff --git a/fs/orangefs/namei.c b/fs/orangefs/namei.c
> index b3ae374..249bda5 100644
> --- a/fs/orangefs/namei.c
> +++ b/fs/orangefs/namei.c
> @@ -61,6 +61,7 @@ static int orangefs_create(struct inode *dir,
>                            __func__,
>                            dentry->d_name.name);
>                 ret = PTR_ERR(inode);
> +               d_drop(dentry);
>                 goto out;
>         }
>
> Of course, this has uncovered yet another reproducible problem:
>
> 710055 orangefs_unlink: called on PPTB1E4.TMP
> 710058 service_operation: orangefs_unlink ffff880014828000
>
>                 right in here I think the rm is
>                 being processed in the server just
>                 as the client-core has died.
>
> 710534 wait_for_matching_downcall: operation purged ffff880014828000
> 710538 service_operation: orangefs_unlink ffff880014828000
> 710539 service_operation:client core is NOT in service
>
>                 right in here I think stuff starts
>                 working again and we're going
>                 to unsuccessfully try to process
>                 the rm again.
>
> 710646 wait_for_matching_downcall returned 0 for ffff880014828000
>
>                 happy, because we got the matching downcall
>
> 710647 service_operation orangefs_unlink returning -2 for ffff880014828000
> 710648 orangefs_unlink: service_operation returned -2
>
>                 sad, because we got ENOENT on second rm
>
> 710649 Releasing OP ffff880014828000
>
>                 so... the userspace process (dbench in this case) thinks
>                 the rm failed, but it didn't.
>
>
>
> On Mon, Feb 22, 2016 at 11:20 AM, Mike Marshall <hubcap@omnibond.com> wrote:
>>  > Looks like I'd screwed up checking last time.
>>
>> Probably not that <g>... my branch did diverge over the course
>> of the few days that we were thrashing around in the kernel trying
>> to fix what I had broken two years ago in userspace.
>>
>> I can relate to why you were motivated to remove the thrashing
>> around from the git history, but your git-foo is much stronger
>> than mine. I wanted to try and get my branch back into line using
>> a methodology that I understand to keep from ending up like
>> this fellow:
>>
>> http://myweb.clemson.edu/~hubcap/harris.jpg
>>
>> I'm glad it worked out... my kernel.org for-next branch is updated now.
>>
>> so, I'll keep working the problem, using your d_drop idea first off...
>> I'll be back with more information, and hopefully even have it fixed, soon...
>>
>> -Mike
>>
>> On Sat, Feb 20, 2016 at 8:36 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>>> On Sat, Feb 20, 2016 at 07:14:26AM -0500, Mike Marshall wrote:
>>>
>>>> Your orangefs-untested branch has 5625087 commits. My "current" branch
>>>> has 5625087 commits. In each all of the commit signatures match, except
>>>> for the most recent 15 commits. The last 15 commits in my "current"
>>>> branch were made from your orangefs-untested branch with "git format-patch"
>>>> and applied to my "current" branch with "git am -s". "git log -p" shows that
>>>> my most recent 15 commits differ from your most recent 15 commits by
>>>> the addition of my "sign off" line.
>>>
>>> *blinks*
>>> *checks*
>>>
>>> OK, ignore what I asked, then.  Looks like I'd screwed up checking last time.
>>>
>>>> I will absolutely update my kernel.org for-next branch with the procedure you
>>>> outlined, because you said so.
>>>>
>>>> I wish I understood it better, though... I can only guess at this point that
>>>> the procedure you outlined will do some desirable thing to git metadata...?
>>>
>>> None whatsoever, ignore it.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Mike Marshall Feb. 26, 2016, 8:21 p.m. UTC | #1
I have updated orangefs.txt to reflect all the work Al has done
recently. There are also several other cosmetic and comment
updates in the code, along with the time patches that Arnd
Bergmann sent today.

I would appreciate any corrections or comments on the new parts
in orangefs.txt,

thanks!

git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux.git
for-next

-Mike

On Tue, Feb 23, 2016 at 4:58 PM, Mike Marshall <hubcap@omnibond.com> wrote:
> Ok, I understand these last couple of problems better.
>
> If the client-core crashes (kill -9 in my test cases) in
> the middle of a rename or an unlink (and maybe some other
> operations, these are the ones I have captured and studied) a
> couple of things can happen.
>
> In both cases, you get:
>
>   service_operation
>     queue the operation
>     wait_for_matching_downcall = -EAGAIN
>     queue the operation
>     wait_for_matching_downcall = 0
>     out:
>
> Sometimes when the operation is first queued, the client-core
> will be in the middle of the state machine code and the operation
> will be half done when the client-core dies, and the object that
> was being operated on will be broken. In other words, it is possible
> for a userspace program using the Orangefs native API to corrupt
> the filesystem if it crashes in a critical area. Do other userspace
> filesystems have this same problem?
>
> Other times, when the operation is first queued, the client-core
> gets the operation fully launched, and then dies. Then, when
> the operation is queued up again, the operation fails on -ENOENT.
> You can't rename a to b if a has already been renamed to b. You can't
> unlink a if there is no a.
>
> For the first case I don't see how there's anything that can be done.
> The filesystem is corrupted. Its not toast or anything, but there's
> a directory somewhere with a broken file in it.
>
> I have made a patch that appears to actually work and cause no bad
> side effects for the second case. Al has come colorful phrases,
> like "this is too ugly to live" and some others. What you think about
> this patch, Al <g>...
>
> The d_drop is how I implemented the idea you had at first, Al, I'm
> not sure now if it helps or hurts or is a no-op.
>
> # git --no-pager diff
> diff --git a/fs/orangefs/namei.c b/fs/orangefs/namei.c
> index b3ae374..6d953b1 100644
> --- a/fs/orangefs/namei.c
> +++ b/fs/orangefs/namei.c
> @@ -61,6 +61,7 @@ static int orangefs_create(struct inode *dir,
>     __func__,
>     dentry->d_name.name);
>   ret = PTR_ERR(inode);
> + d_drop(dentry);
>   goto out;
>   }
>
> @@ -246,12 +247,22 @@ static int orangefs_unlink(struct inode *dir,
> struct dentry *dentry)
>
>   op_release(new_op);
>
> - if (!ret) {
> + /*
> + * We would never have gotten here if the object didn't
> + * exist when we started down this path. There's a race
> + * condition where if a restart of the client-core
> + * coincides just right with an in-progress unlink a
> + * file can get deleted on the server and be gone
> + * when service-operation does the retry...
> + */
> + if ((!ret) || (ret == -ENOENT)) {
>   drop_nlink(inode);
>
>   SetMtimeFlag(parent);
>   dir->i_mtime = dir->i_ctime = current_fs_time(dir->i_sb);
>   mark_inode_dirty_sync(dir);
> +
> + ret = 0;
>   }
>   return ret;
>  }
> @@ -433,6 +444,17 @@ static int orangefs_rename(struct inode *old_dir,
>       "orangefs_rename: got downcall status %d\n",
>       ret);
>
> + /*
> + * We would never have gotten here if the object didn't
> + * exist when we started down this path. There's a race
> + * condition where if a restart of the client-core
> + * coincides just right with an in-progress rename a
> + * file can get renamed on the server and be gone
> + * when service-operation does the retry...
> + */
> + if (ret == -ENOENT)
> + ret = 0;
> +
>   if (new_dentry->d_inode)
>   new_dentry->d_inode->i_ctime = CURRENT_TIME;
>
>
> On Mon, Feb 22, 2016 at 4:22 PM, Mike Marshall <hubcap@omnibond.com> wrote:
>> I did this and the problem seems fixed:
>>
>> # git diff
>> diff --git a/fs/orangefs/namei.c b/fs/orangefs/namei.c
>> index b3ae374..249bda5 100644
>> --- a/fs/orangefs/namei.c
>> +++ b/fs/orangefs/namei.c
>> @@ -61,6 +61,7 @@ static int orangefs_create(struct inode *dir,
>>                            __func__,
>>                            dentry->d_name.name);
>>                 ret = PTR_ERR(inode);
>> +               d_drop(dentry);
>>                 goto out;
>>         }
>>
>> Of course, this has uncovered yet another reproducible problem:
>>
>> 710055 orangefs_unlink: called on PPTB1E4.TMP
>> 710058 service_operation: orangefs_unlink ffff880014828000
>>
>>                 right in here I think the rm is
>>                 being processed in the server just
>>                 as the client-core has died.
>>
>> 710534 wait_for_matching_downcall: operation purged ffff880014828000
>> 710538 service_operation: orangefs_unlink ffff880014828000
>> 710539 service_operation:client core is NOT in service
>>
>>                 right in here I think stuff starts
>>                 working again and we're going
>>                 to unsuccessfully try to process
>>                 the rm again.
>>
>> 710646 wait_for_matching_downcall returned 0 for ffff880014828000
>>
>>                 happy, because we got the matching downcall
>>
>> 710647 service_operation orangefs_unlink returning -2 for ffff880014828000
>> 710648 orangefs_unlink: service_operation returned -2
>>
>>                 sad, because we got ENOENT on second rm
>>
>> 710649 Releasing OP ffff880014828000
>>
>>                 so... the userspace process (dbench in this case) thinks
>>                 the rm failed, but it didn't.
>>
>>
>>
>> On Mon, Feb 22, 2016 at 11:20 AM, Mike Marshall <hubcap@omnibond.com> wrote:
>>>  > Looks like I'd screwed up checking last time.
>>>
>>> Probably not that <g>... my branch did diverge over the course
>>> of the few days that we were thrashing around in the kernel trying
>>> to fix what I had broken two years ago in userspace.
>>>
>>> I can relate to why you were motivated to remove the thrashing
>>> around from the git history, but your git-foo is much stronger
>>> than mine. I wanted to try and get my branch back into line using
>>> a methodology that I understand to keep from ending up like
>>> this fellow:
>>>
>>> http://myweb.clemson.edu/~hubcap/harris.jpg
>>>
>>> I'm glad it worked out... my kernel.org for-next branch is updated now.
>>>
>>> so, I'll keep working the problem, using your d_drop idea first off...
>>> I'll be back with more information, and hopefully even have it fixed, soon...
>>>
>>> -Mike
>>>
>>> On Sat, Feb 20, 2016 at 8:36 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>>>> On Sat, Feb 20, 2016 at 07:14:26AM -0500, Mike Marshall wrote:
>>>>
>>>>> Your orangefs-untested branch has 5625087 commits. My "current" branch
>>>>> has 5625087 commits. In each all of the commit signatures match, except
>>>>> for the most recent 15 commits. The last 15 commits in my "current"
>>>>> branch were made from your orangefs-untested branch with "git format-patch"
>>>>> and applied to my "current" branch with "git am -s". "git log -p" shows that
>>>>> my most recent 15 commits differ from your most recent 15 commits by
>>>>> the addition of my "sign off" line.
>>>>
>>>> *blinks*
>>>> *checks*
>>>>
>>>> OK, ignore what I asked, then.  Looks like I'd screwed up checking last time.
>>>>
>>>>> I will absolutely update my kernel.org for-next branch with the procedure you
>>>>> outlined, because you said so.
>>>>>
>>>>> I wish I understood it better, though... I can only guess at this point that
>>>>> the procedure you outlined will do some desirable thing to git metadata...?
>>>>
>>>> None whatsoever, ignore it.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/orangefs/namei.c b/fs/orangefs/namei.c
index b3ae374..6d953b1 100644
--- a/fs/orangefs/namei.c
+++ b/fs/orangefs/namei.c
@@ -61,6 +61,7 @@  static int orangefs_create(struct inode *dir,
    __func__,
    dentry->d_name.name);
  ret = PTR_ERR(inode);
+ d_drop(dentry);
  goto out;
  }

@@ -246,12 +247,22 @@  static int orangefs_unlink(struct inode *dir,
struct dentry *dentry)

  op_release(new_op);

- if (!ret) {
+ /*
+ * We would never have gotten here if the object didn't
+ * exist when we started down this path. There's a race
+ * condition where if a restart of the client-core
+ * coincides just right with an in-progress unlink a
+ * file can get deleted on the server and be gone
+ * when service-operation does the retry...
+ */
+ if ((!ret) || (ret == -ENOENT)) {
  drop_nlink(inode);

  SetMtimeFlag(parent);
  dir->i_mtime = dir->i_ctime = current_fs_time(dir->i_sb);
  mark_inode_dirty_sync(dir);
+
+ ret = 0;
  }
  return ret;