From patchwork Thu Feb 18 18:58:52 2016
X-Patchwork-Submitter: Mike Marshall
X-Patchwork-Id: 8353601
Subject: Re: Orangefs ABI documentation
From: Mike Marshall
To: Al Viro
Cc: Martin Brandenburg, Linus Torvalds, linux-fsdevel, Stephen Rothwell,
 Mike Marshall
Date: Thu, 18 Feb 2016 13:58:52 -0500
In-Reply-To: <20160218111122.GS17997@ZenIV.linux.org.uk>
References: <20160215230434.GZ17997@ZenIV.linux.org.uk>
 <20160216233609.GE17997@ZenIV.linux.org.uk>
 <20160216235441.GF17997@ZenIV.linux.org.uk>
 <20160217230900.GP17997@ZenIV.linux.org.uk>
 <20160217231524.GQ17997@ZenIV.linux.org.uk>
 <20160218000439.GR17997@ZenIV.linux.org.uk>
 <20160218111122.GS17997@ZenIV.linux.org.uk>
X-Mailing-List: linux-fsdevel@vger.kernel.org
Still busted, exactly the same, I think.  The doomed op gets a good
return code from is_daemon_in_service in service_operation, but gets
EAGAIN from wait_for_matching_downcall... an edge-case kind of problem.

Here's the raw (well, slightly edited for readability) logs showing the
doomed op and the subsequent failed op that uses the bogus handle and
fsid from the doomed op:

Alloced OP (ffff880012898000: 10889 OP_CREATE)
service_operation: orangefs_create op:ffff880012898000:
wait_for_matching_downcall: operation purged (tag 10889, ffff880012898000, att 0
service_operation: wait_for_matching_downcall returned -11 for ffff880012898000
Interrupted: Removed op ffff880012898000 from htable_ops_in_progress
tag 10889 (orangefs_create) -- operation to be retried (1 attempt)
service_operation: orangefs_create op:ffff880012898000:
service_operation:client core is NOT in service, ffff880012898000
service_operation: wait_for_matching_downcall returned 0 for ffff880012898000
service_operation orangefs_create returning: 0 for ffff880012898000
orangefs_create: PPTOOLS1.PPA: handle:00000000-0000-0000-0000-000000000000: fsid:0: new_op:ffff880012898000: ret:0:
Alloced OP (ffff880012888000: 10958 OP_GETATTR)
service_operation: orangefs_inode_getattr op:ffff880012888000:
service_operation: wait_for_matching_downcall returned 0 for ffff880012888000
service_operation orangefs_inode_getattr returning: -22 for ffff880012888000
Releasing OP (ffff880012888000: 10958
orangefs_create: Failed to allocate inode for file :PPTOOLS1.PPA:
Releasing OP (ffff880012898000: 10889

What I'm testing with differs from what is at kernel.org#for-next by:

 - the diffs from Al's most recent email
 - one souped-up gossip message
 - a change from 0 to OP_VFS_STATE_UNKNOWN in one place in service_operation
 - reinit_completion(&op->waitq) in orangefs_clean_up_interrupted_operation:

 			     "Interrupted: Removed op %p from request_list\n",
@@ -225,24 +231,18 @@ static void orangefs_clean_up_interrupted_operation(struct orangefs_kernel_op_s
 		/* op must be removed from the in progress htable */
 		spin_unlock(&op->lock);
 		spin_lock(&htable_ops_in_progress_lock);
-		list_del(&op->list);
+		list_del_init(&op->list);
 		spin_unlock(&htable_ops_in_progress_lock);
 		gossip_debug(GOSSIP_WAIT_DEBUG,
 			     "Interrupted: Removed op %p"
 			     " from htable_ops_in_progress\n",
 			     op);
-	} else if (!op_state_serviced(op)) {
+	} else {
 		spin_unlock(&op->lock);
 		gossip_err("interrupted operation is in a weird state 0x%x\n",
 			   op->op_state);
-	} else {
-		/*
-		 * It is not intended for execution to flow here,
-		 * but having this unlock here makes sparse happy.
-		 */
-		gossip_err("%s: can't get here.\n", __func__);
-		spin_unlock(&op->lock);
 	}
+	reinit_completion(&op->waitq);
 }
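A minimal sketch of why that reinit_completion() matters, using only
the generic <linux/completion.h> API (illustration only, not OrangeFS
code; the helper name is made up):

/*
 * complete() increments the completion's ->done count and
 * wait_for_completion() consumes it, so a wakeup delivered to a
 * purged first attempt would satisfy the retry's wait immediately
 * unless the completion is re-armed before the op is requeued.
 */
static void orangefs_retry_sketch(struct orangefs_kernel_op_s *op)
{
	reinit_completion(&op->waitq);	/* drop any stale ->done count */
	/* ...requeue the op, then wait_for_completion(&op->waitq)... */
}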
On Thu, Feb 18, 2016 at 6:11 AM, Al Viro wrote:
> On Thu, Feb 18, 2016 at 12:04:39AM +0000, Al Viro wrote:
>> Looks like the right approach is to have orangefs_clean_... hit the
>> sucker being copied to/from the daemon and wait until that's finished
>> (and discarded).  That, BTW, would have an extra benefit of making
>> life simpler for refcounting.
>>
>> So...  We need to have them marked as "being copied" for the duration,
>> instead of bumping the refcount.  Setting and dropping that flag should
>> happen under op->lock.  Setting it should happen only if the op is not
>> given up (that would be interpreted as "not found").  Cleaning, OTOH,
>> would recheck the "given up" and do complete(&op->waitq) in case it's
>> been given up...
>>
>> How about this (instead of the previous variant; it includes a fix for
>> the errno bogosity spotted a bit upthread; if it works, it'll need a
>> bit of splitup)?
>
> Better yet, let's use list_del_init() on op->list instead of those
> list_del().  Then, seeing that ..._clean_interrupted_... can't be called
> in case of serviced (we hadn't dropped op->lock since the time we'd
> checked it), we can use list_empty(&op->list) as a test for "given up
> while copying to/from daemon", so there's no need for a separate flag:
> 	* we never pick a given-up op from a list/hash
> 	* daemon read/write_iter never modifies op->list after the op
> has been given up
> 	* if an op is given up while copying to/from userland in daemon
> read/write_iter, the latter will call complete(&op->waitq) once it
> finds that, so the give-up side can wait for completion if it finds
> the op it's about to give up not on any list.
>
> Should be equivalent to the previous variant, but IMO it's cleaner that
> way...
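An illustrative aside before the diff (the helper below is
hypothetical, not part of the patch): list_del_init() re-initializes
the removed entry to point at itself, so list_empty() on the entry
itself is true afterwards, while plain list_del() poisons the entry's
pointers and allows no such test.  That is what lets "on no list"
stand in for a "being copied" flag:

/*
 * Sketch: once every removal of op->list goes through
 * list_del_init(), an op that sits on no list and is not yet
 * serviced can only be in the daemon's hands, mid-copy.
 * Caller holds op->lock.
 */
static bool op_mid_copy_sketch(struct orangefs_kernel_op_s *op)
{
	return list_empty(&op->list) && !op_state_serviced(op);
}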
> diff --git a/fs/orangefs/devorangefs-req.c b/fs/orangefs/devorangefs-req.c
> index b27ed1c..f7914f5 100644
> --- a/fs/orangefs/devorangefs-req.c
> +++ b/fs/orangefs/devorangefs-req.c
> @@ -58,9 +58,9 @@ static struct orangefs_kernel_op_s *orangefs_devreq_remove_op(__u64 tag)
>  				 next,
>  				 &htable_ops_in_progress[index],
>  				 list) {
> -		if (op->tag == tag && !op_state_purged(op)) {
> +		if (op->tag == tag && !op_state_purged(op) &&
> +		    !op_state_given_up(op)) {
>  			list_del_init(&op->list);
> -			get_op(op); /* increase ref count. */
>  			spin_unlock(&htable_ops_in_progress_lock);
>  			return op;
>  		}
> @@ -133,7 +133,7 @@ restart:
>  		__s32 fsid;
>  		/* This lock is held past the end of the loop when we break. */
>  		spin_lock(&op->lock);
> -		if (unlikely(op_state_purged(op))) {
> +		if (unlikely(op_state_purged(op) || op_state_given_up(op))) {
>  			spin_unlock(&op->lock);
>  			continue;
>  		}
> @@ -199,13 +199,12 @@ restart:
>  	 */
>  	if (op_state_in_progress(cur_op) || op_state_serviced(cur_op)) {
>  		gossip_err("orangefs: ERROR: Current op already queued.\n");
> -		list_del(&cur_op->list);
> +		list_del_init(&cur_op->list);
>  		spin_unlock(&cur_op->lock);
>  		spin_unlock(&orangefs_request_list_lock);
>  		return -EAGAIN;
>  	}
>  	list_del_init(&cur_op->list);
> -	get_op(op);
>  	spin_unlock(&orangefs_request_list_lock);
>
>  	spin_unlock(&cur_op->lock);
> @@ -230,7 +229,7 @@ restart:
>  	if (unlikely(op_state_given_up(cur_op))) {
>  		spin_unlock(&cur_op->lock);
>  		spin_unlock(&htable_ops_in_progress_lock);
> -		op_release(cur_op);
> +		complete(&cur_op->waitq);
>  		goto restart;
>  	}
>
> @@ -242,7 +241,6 @@ restart:
>  	orangefs_devreq_add_op(cur_op);
>  	spin_unlock(&cur_op->lock);
>  	spin_unlock(&htable_ops_in_progress_lock);
> -	op_release(cur_op);
>
>  	/* The client only asks to read one size buffer. */
>  	return MAX_DEV_REQ_UPSIZE;
> @@ -258,10 +256,12 @@ error:
>  	if (likely(!op_state_given_up(cur_op))) {
>  		set_op_state_waiting(cur_op);
>  		list_add(&cur_op->list, &orangefs_request_list);
> +		spin_unlock(&cur_op->lock);
> +	} else {
> +		spin_unlock(&cur_op->lock);
> +		complete(&cur_op->waitq);
>  	}
> -	spin_unlock(&cur_op->lock);
>  	spin_unlock(&orangefs_request_list_lock);
> -	op_release(cur_op);
>  	return -EFAULT;
>  }
>
> @@ -333,8 +333,7 @@ static ssize_t orangefs_devreq_write_iter(struct kiocb *iocb,
>  	n = copy_from_iter(&op->downcall, downcall_size, iter);
>  	if (n != downcall_size) {
>  		gossip_err("%s: failed to copy downcall.\n", __func__);
> -		ret = -EFAULT;
> -		goto Broken;
> +		goto Efault;
>  	}
>
>  	if (op->downcall.status)
> @@ -354,8 +353,7 @@ static ssize_t orangefs_devreq_write_iter(struct kiocb *iocb,
>  			   downcall_size,
>  			   op->downcall.trailer_size,
>  			   total);
> -		ret = -EFAULT;
> -		goto Broken;
> +		goto Efault;
>  	}
>
>  	/* Only READDIR operations should have trailers. */
> @@ -364,8 +362,7 @@ static ssize_t orangefs_devreq_write_iter(struct kiocb *iocb,
>  		gossip_err("%s: %x operation with trailer.",
>  			   __func__,
>  			   op->downcall.type);
> -		ret = -EFAULT;
> -		goto Broken;
> +		goto Efault;
>  	}
>
>  	/* READDIR operations should always have trailers. */
> @@ -374,8 +371,7 @@ static ssize_t orangefs_devreq_write_iter(struct kiocb *iocb,
>  		gossip_err("%s: %x operation with no trailer.",
>  			   __func__,
>  			   op->downcall.type);
> -		ret = -EFAULT;
> -		goto Broken;
> +		goto Efault;
>  	}
>
>  	if (op->downcall.type != ORANGEFS_VFS_OP_READDIR)
> @@ -386,8 +382,7 @@ static ssize_t orangefs_devreq_write_iter(struct kiocb *iocb,
>  	if (op->downcall.trailer_buf == NULL) {
>  		gossip_err("%s: failed trailer vmalloc.\n",
>  			   __func__);
> -		ret = -ENOMEM;
> -		goto Broken;
> +		goto Enomem;
>  	}
>  	memset(op->downcall.trailer_buf, 0, op->downcall.trailer_size);
>  	n = copy_from_iter(op->downcall.trailer_buf,
> @@ -396,8 +391,7 @@ static ssize_t orangefs_devreq_write_iter(struct kiocb *iocb,
>  	if (n != op->downcall.trailer_size) {
>  		gossip_err("%s: failed to copy trailer.\n", __func__);
>  		vfree(op->downcall.trailer_buf);
> -		ret = -EFAULT;
> -		goto Broken;
> +		goto Efault;
>  	}
>
>  wakeup:
> @@ -406,38 +400,27 @@ wakeup:
>  	 * that this op is done
>  	 */
>  	spin_lock(&op->lock);
> -	if (unlikely(op_state_given_up(op))) {
> +	if (unlikely(op_is_cancel(op))) {
>  		spin_unlock(&op->lock);
> -		goto out;
> -	}
> -	set_op_state_serviced(op);
> -	spin_unlock(&op->lock);
> -
> -	/*
> -	 * If this operation is an I/O operation we need to wait
> -	 * for all data to be copied before we can return to avoid
> -	 * buffer corruption and races that can pull the buffers
> -	 * out from under us.
> -	 *
> -	 * Essentially we're synchronizing with other parts of the
> -	 * vfs implicitly by not allowing the user space
> -	 * application reading/writing this device to return until
> -	 * the buffers are done being used.
> -	 */
> -out:
> -	if (unlikely(op_is_cancel(op)))
>  		put_cancel(op);
> -	op_release(op);
> -	return ret;
> -
> -Broken:
> -	spin_lock(&op->lock);
> -	if (!op_state_given_up(op)) {
> -		op->downcall.status = ret;
> +	} else if (unlikely(op_state_given_up(op))) {
> +		spin_unlock(&op->lock);
> +		complete(&op->waitq);
> +	} else {
>  		set_op_state_serviced(op);
> +		spin_unlock(&op->lock);
>  	}
> -	spin_unlock(&op->lock);
> -	goto out;
> +	return ret;
> +
> +Efault:
> +	op->downcall.status = -(ORANGEFS_ERROR_BIT | 9);
> +	ret = -EFAULT;
> +	goto wakeup;
> +
> +Enomem:
> +	op->downcall.status = -(ORANGEFS_ERROR_BIT | 8);
> +	ret = -ENOMEM;
> +	goto wakeup;
>  }
>
>  /* Returns whether any FS are still pending remounted */
> diff --git a/fs/orangefs/orangefs-cache.c b/fs/orangefs/orangefs-cache.c
> index 817092a..900a2e3 100644
> --- a/fs/orangefs/orangefs-cache.c
> +++ b/fs/orangefs/orangefs-cache.c
> @@ -120,8 +120,6 @@ struct orangefs_kernel_op_s *op_alloc(__s32 type)
>  		spin_lock_init(&new_op->lock);
>  		init_completion(&new_op->waitq);
>
> -		atomic_set(&new_op->ref_count, 1);
> -
>  		new_op->upcall.type = ORANGEFS_VFS_OP_INVALID;
>  		new_op->downcall.type = ORANGEFS_VFS_OP_INVALID;
>  		new_op->downcall.status = -1;
> @@ -149,7 +147,7 @@ struct orangefs_kernel_op_s *op_alloc(__s32 type)
>  	return new_op;
>  }
>
> -void __op_release(struct orangefs_kernel_op_s *orangefs_op)
> +void op_release(struct orangefs_kernel_op_s *orangefs_op)
>  {
>  	if (orangefs_op) {
>  		gossip_debug(GOSSIP_CACHE_DEBUG,
> diff --git a/fs/orangefs/orangefs-kernel.h b/fs/orangefs/orangefs-kernel.h
> index 1f8310c..e387d3c 100644
> --- a/fs/orangefs/orangefs-kernel.h
> +++ b/fs/orangefs/orangefs-kernel.h
> @@ -205,8 +205,6 @@ struct orangefs_kernel_op_s {
>  	struct completion waitq;
>  	spinlock_t lock;
>
> -	atomic_t ref_count;
> -
>  	/* VFS aio fields */
>
>  	int attempts;
> @@ -230,23 +228,7 @@ static inline void set_op_state_serviced(struct orangefs_kernel_op_s *op)
>  #define op_state_given_up(op) ((op)->op_state & OP_VFS_STATE_GIVEN_UP)
>  #define op_is_cancel(op) ((op)->downcall.type == ORANGEFS_VFS_OP_CANCEL)
>
> -static inline void get_op(struct orangefs_kernel_op_s *op)
> -{
> -	atomic_inc(&op->ref_count);
> -	gossip_debug(GOSSIP_DEV_DEBUG,
> -		     "(get) Alloced OP (%p:%llu)\n", op, llu(op->tag));
> -}
> -
> -void __op_release(struct orangefs_kernel_op_s *op);
> -
> -static inline void op_release(struct orangefs_kernel_op_s *op)
> -{
> -	if (atomic_dec_and_test(&op->ref_count)) {
> -		gossip_debug(GOSSIP_DEV_DEBUG,
> -			     "(put) Releasing OP (%p:%llu)\n", op, llu((op)->tag));
> -		__op_release(op);
> -	}
> -}
> +void op_release(struct orangefs_kernel_op_s *op);
>
>  extern void orangefs_bufmap_put(int);
>  static inline void put_cancel(struct orangefs_kernel_op_s *op)
> @@ -259,7 +241,7 @@ static inline void set_op_state_purged(struct orangefs_kernel_op_s *op)
>  {
>  	spin_lock(&op->lock);
>  	if (unlikely(op_is_cancel(op))) {
> -		list_del(&op->list);
> +		list_del_init(&op->list);
>  		spin_unlock(&op->lock);
>  		put_cancel(op);
>  	} else {
> diff --git a/fs/orangefs/waitqueue.c b/fs/orangefs/waitqueue.c
> index d980240..3f9e430 100644
> --- a/fs/orangefs/waitqueue.c
> +++ b/fs/orangefs/waitqueue.c
> @@ -208,15 +208,20 @@ static void orangefs_clean_up_interrupted_operation(struct orangefs_kernel_op_s
>  	 * Called with op->lock held.
>  	 */
>  	op->op_state |= OP_VFS_STATE_GIVEN_UP;
> -
> -	if (op_state_waiting(op)) {
> +	/* from that point on it can't be moved by anybody else */
> +	if (list_empty(&op->list)) {
> +		/* caught copying to/from daemon */
> +		BUG_ON(op_state_serviced(op));
> +		spin_unlock(&op->lock);
> +		wait_for_completion(&op->waitq);
> +	} else if (op_state_waiting(op)) {
>  		/*
>  		 * upcall hasn't been read; remove op from upcall request
>  		 * list.
>  		 */
>  		spin_unlock(&op->lock);
>  		spin_lock(&orangefs_request_list_lock);
> -		list_del(&op->list);
> +		list_del_init(&op->list);
>  		spin_unlock(&orangefs_request_list_lock);
>  		gossip_debug(GOSSIP_WAIT_DEBUG,
>  			     "Interrupted: Removed op %p from request_list\n",
> @@ -225,23 +230,16 @@ static void orangefs_clean_up_interrupted_operation(struct orangefs_kernel_op_s
>  		/* op must be removed from the in progress htable */
>  		spin_unlock(&op->lock);
>  		spin_lock(&htable_ops_in_progress_lock);
> -		list_del(&op->list);
> +		list_del_init(&op->list);
>  		spin_unlock(&htable_ops_in_progress_lock);
>  		gossip_debug(GOSSIP_WAIT_DEBUG,
>  			     "Interrupted: Removed op %p"
>  			     " from htable_ops_in_progress\n",
>  			     op);
> -	} else if (!op_state_serviced(op)) {
> +	} else {
>  		spin_unlock(&op->lock);
>  		gossip_err("interrupted operation is in a weird state 0x%x\n",
>  			   op->op_state);
> -	} else {
> -		/*
> -		 * It is not intended for execution to flow here,
> -		 * but having this unlock here makes sparse happy.
> -		 */
> -		gossip_err("%s: can't get here.\n", __func__);
> -		spin_unlock(&op->lock);
>  	}
> +	reinit_completion(&op->waitq);
>  }
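Restating the resulting handshake as a hedged sketch (function names
below are illustrative; the ops, states, and lock come from the diffs
above and below, and the list unhooking is elided): whichever side
finds the op already claimed by the other is responsible for the
wakeup.

/* Sketch of the give-up side; called with op->lock held. */
static void give_up_sketch(struct orangefs_kernel_op_s *op)
{
	op->op_state |= OP_VFS_STATE_GIVEN_UP;
	if (list_empty(&op->list)) {
		/* daemon caught mid-copy: wait for it to notice */
		spin_unlock(&op->lock);
		wait_for_completion(&op->waitq);
		return;
	}
	/* otherwise the op is still on a list and can be unhooked */
	spin_unlock(&op->lock);
}

/* Sketch of the daemon side, after its copy to/from userspace. */
static void daemon_side_sketch(struct orangefs_kernel_op_s *op)
{
	spin_lock(&op->lock);
	if (unlikely(op_state_given_up(op))) {
		spin_unlock(&op->lock);
		complete(&op->waitq);	/* release the waiting give-up side */
	} else {
		set_op_state_serviced(op);
		spin_unlock(&op->lock);
	}
}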
---
diff --git a/fs/orangefs/devorangefs-req.c b/fs/orangefs/devorangefs-req.c
index b27ed1c..f7914f5 100644
--- a/fs/orangefs/devorangefs-req.c
+++ b/fs/orangefs/devorangefs-req.c
@@ -58,9 +58,9 @@ static struct orangefs_kernel_op_s *orangefs_devreq_remove_op(__u64 tag)
 				 next,
 				 &htable_ops_in_progress[index],
 				 list) {
-		if (op->tag == tag && !op_state_purged(op)) {
+		if (op->tag == tag && !op_state_purged(op) &&
+		    !op_state_given_up(op)) {
 			list_del_init(&op->list);
-			get_op(op); /* increase ref count. */
 			spin_unlock(&htable_ops_in_progress_lock);
 			return op;
 		}
@@ -133,7 +133,7 @@ restart:
 		__s32 fsid;
 		/* This lock is held past the end of the loop when we break. */
 		spin_lock(&op->lock);
-		if (unlikely(op_state_purged(op))) {
+		if (unlikely(op_state_purged(op) || op_state_given_up(op))) {
 			spin_unlock(&op->lock);
 			continue;
 		}
@@ -199,13 +199,12 @@ restart:
 	 */
 	if (op_state_in_progress(cur_op) || op_state_serviced(cur_op)) {
 		gossip_err("orangefs: ERROR: Current op already queued.\n");
-		list_del(&cur_op->list);
+		list_del_init(&cur_op->list);
 		spin_unlock(&cur_op->lock);
 		spin_unlock(&orangefs_request_list_lock);
 		return -EAGAIN;
 	}
 	list_del_init(&cur_op->list);
-	get_op(op);
 	spin_unlock(&orangefs_request_list_lock);

 	spin_unlock(&cur_op->lock);
@@ -230,7 +229,7 @@ restart:
 	if (unlikely(op_state_given_up(cur_op))) {
 		spin_unlock(&cur_op->lock);
 		spin_unlock(&htable_ops_in_progress_lock);
-		op_release(cur_op);
+		complete(&cur_op->waitq);
 		goto restart;
 	}

@@ -242,7 +241,6 @@ restart:
 	orangefs_devreq_add_op(cur_op);
 	spin_unlock(&cur_op->lock);
 	spin_unlock(&htable_ops_in_progress_lock);
-	op_release(cur_op);

 	/* The client only asks to read one size buffer. */
 	return MAX_DEV_REQ_UPSIZE;
@@ -258,10 +256,12 @@ error:
 	if (likely(!op_state_given_up(cur_op))) {
 		set_op_state_waiting(cur_op);
 		list_add(&cur_op->list, &orangefs_request_list);
+		spin_unlock(&cur_op->lock);
+	} else {
+		spin_unlock(&cur_op->lock);
+		complete(&cur_op->waitq);
 	}
-	spin_unlock(&cur_op->lock);
 	spin_unlock(&orangefs_request_list_lock);
-	op_release(cur_op);
 	return -EFAULT;
 }

@@ -333,8 +333,7 @@ static ssize_t orangefs_devreq_write_iter(struct kiocb *iocb,
 	n = copy_from_iter(&op->downcall, downcall_size, iter);
 	if (n != downcall_size) {
 		gossip_err("%s: failed to copy downcall.\n", __func__);
-		ret = -EFAULT;
-		goto Broken;
+		goto Efault;
 	}

 	if (op->downcall.status)
@@ -354,8 +353,7 @@ static ssize_t orangefs_devreq_write_iter(struct kiocb *iocb,
 			   downcall_size,
 			   op->downcall.trailer_size,
 			   total);
-		ret = -EFAULT;
-		goto Broken;
+		goto Efault;
 	}

 	/* Only READDIR operations should have trailers. */
@@ -364,8 +362,7 @@ static ssize_t orangefs_devreq_write_iter(struct kiocb *iocb,
 		gossip_err("%s: %x operation with trailer.",
 			   __func__,
 			   op->downcall.type);
-		ret = -EFAULT;
-		goto Broken;
+		goto Efault;
 	}

 	/* READDIR operations should always have trailers. */
@@ -374,8 +371,7 @@ static ssize_t orangefs_devreq_write_iter(struct kiocb *iocb,
 		gossip_err("%s: %x operation with no trailer.",
 			   __func__,
 			   op->downcall.type);
-		ret = -EFAULT;
-		goto Broken;
+		goto Efault;
 	}

 	if (op->downcall.type != ORANGEFS_VFS_OP_READDIR)
@@ -386,8 +382,7 @@ static ssize_t orangefs_devreq_write_iter(struct kiocb *iocb,
 	if (op->downcall.trailer_buf == NULL) {
 		gossip_err("%s: failed trailer vmalloc.\n",
 			   __func__);
-		ret = -ENOMEM;
-		goto Broken;
+		goto Enomem;
 	}
 	memset(op->downcall.trailer_buf, 0, op->downcall.trailer_size);
 	n = copy_from_iter(op->downcall.trailer_buf,
@@ -396,8 +391,7 @@ static ssize_t orangefs_devreq_write_iter(struct kiocb *iocb,
 	if (n != op->downcall.trailer_size) {
 		gossip_err("%s: failed to copy trailer.\n", __func__);
 		vfree(op->downcall.trailer_buf);
-		ret = -EFAULT;
-		goto Broken;
+		goto Efault;
 	}

 wakeup:
@@ -406,38 +400,27 @@ wakeup:
 	 * that this op is done
 	 */
 	spin_lock(&op->lock);
-	if (unlikely(op_state_given_up(op))) {
+	if (unlikely(op_is_cancel(op))) {
 		spin_unlock(&op->lock);
-		goto out;
-	}
-	set_op_state_serviced(op);
-	spin_unlock(&op->lock);
-
-	/*
-	 * If this operation is an I/O operation we need to wait
-	 * for all data to be copied before we can return to avoid
-	 * buffer corruption and races that can pull the buffers
-	 * out from under us.
-	 *
-	 * Essentially we're synchronizing with other parts of the
-	 * vfs implicitly by not allowing the user space
-	 * application reading/writing this device to return until
-	 * the buffers are done being used.
-	 */
-out:
-	if (unlikely(op_is_cancel(op)))
 		put_cancel(op);
-	op_release(op);
-	return ret;
-
-Broken:
-	spin_lock(&op->lock);
-	if (!op_state_given_up(op)) {
-		op->downcall.status = ret;
+	} else if (unlikely(op_state_given_up(op))) {
+		spin_unlock(&op->lock);
+		complete(&op->waitq);
+	} else {
 		set_op_state_serviced(op);
+		spin_unlock(&op->lock);
 	}
-	spin_unlock(&op->lock);
-	goto out;
+	return ret;
+
+Efault:
+	op->downcall.status = -(ORANGEFS_ERROR_BIT | 9);
+	ret = -EFAULT;
+	goto wakeup;
+
+Enomem:
+	op->downcall.status = -(ORANGEFS_ERROR_BIT | 8);
+	ret = -ENOMEM;
+	goto wakeup;
 }

 /* Returns whether any FS are still pending remounted */
diff --git a/fs/orangefs/orangefs-cache.c b/fs/orangefs/orangefs-cache.c
index 817092a..900a2e3 100644
--- a/fs/orangefs/orangefs-cache.c
+++ b/fs/orangefs/orangefs-cache.c
@@ -120,8 +120,6 @@ struct orangefs_kernel_op_s *op_alloc(__s32 type)
 		spin_lock_init(&new_op->lock);
 		init_completion(&new_op->waitq);

-		atomic_set(&new_op->ref_count, 1);
-
 		new_op->upcall.type = ORANGEFS_VFS_OP_INVALID;
 		new_op->downcall.type = ORANGEFS_VFS_OP_INVALID;
 		new_op->downcall.status = -1;
@@ -149,7 +147,7 @@ struct orangefs_kernel_op_s *op_alloc(__s32 type)
 	return new_op;
 }

-void __op_release(struct orangefs_kernel_op_s *orangefs_op)
+void op_release(struct orangefs_kernel_op_s *orangefs_op)
 {
 	if (orangefs_op) {
 		gossip_debug(GOSSIP_CACHE_DEBUG,
diff --git a/fs/orangefs/orangefs-kernel.h b/fs/orangefs/orangefs-kernel.h
index 1f8310c..e387d3c 100644
--- a/fs/orangefs/orangefs-kernel.h
+++ b/fs/orangefs/orangefs-kernel.h
@@ -205,8 +205,6 @@ struct orangefs_kernel_op_s {
 	struct completion waitq;
 	spinlock_t lock;

-	atomic_t ref_count;
-
 	/* VFS aio fields */

 	int attempts;
@@ -230,23 +228,7 @@ static inline void set_op_state_serviced(struct orangefs_kernel_op_s *op)
 #define op_state_given_up(op) ((op)->op_state & OP_VFS_STATE_GIVEN_UP)
 #define op_is_cancel(op) ((op)->downcall.type == ORANGEFS_VFS_OP_CANCEL)

-static inline void get_op(struct orangefs_kernel_op_s *op)
-{
-	atomic_inc(&op->ref_count);
-	gossip_debug(GOSSIP_DEV_DEBUG,
-		     "(get) Alloced OP (%p:%llu)\n", op, llu(op->tag));
-}
-
-void __op_release(struct orangefs_kernel_op_s *op);
-
-static inline void op_release(struct orangefs_kernel_op_s *op)
-{
-	if (atomic_dec_and_test(&op->ref_count)) {
-		gossip_debug(GOSSIP_DEV_DEBUG,
-			     "(put) Releasing OP (%p:%llu)\n", op, llu((op)->tag));
-		__op_release(op);
-	}
-}
+void op_release(struct orangefs_kernel_op_s *op);

 extern void orangefs_bufmap_put(int);
 static inline void put_cancel(struct orangefs_kernel_op_s *op)
@@ -259,7 +241,7 @@ static inline void set_op_state_purged(struct orangefs_kernel_op_s *op)
 {
 	spin_lock(&op->lock);
 	if (unlikely(op_is_cancel(op))) {
-		list_del(&op->list);
+		list_del_init(&op->list);
 		spin_unlock(&op->lock);
 		put_cancel(op);
 	} else {
diff --git a/fs/orangefs/waitqueue.c b/fs/orangefs/waitqueue.c
index 2528d58..0ea2741 100644
--- a/fs/orangefs/waitqueue.c
+++ b/fs/orangefs/waitqueue.c
@@ -65,7 +65,7 @@ int service_operation(struct orangefs_kernel_op_s *op,
 	op->upcall.pid = current->pid;

 retry_servicing:
-	op->downcall.status = 0;
+	op->downcall.status = OP_VFS_STATE_UNKNOWN;
 	gossip_debug(GOSSIP_WAIT_DEBUG,
 		     "%s: %s op:%p: process:%s: pid:%d:\n",
 		     __func__,
@@ -103,8 +103,9 @@ retry_servicing:
 		wake_up_interruptible(&orangefs_request_list_waitq);
 		if (!__is_daemon_in_service()) {
 			gossip_debug(GOSSIP_WAIT_DEBUG,
-				     "%s:client core is NOT in service.\n",
-				     __func__);
+				     "%s:client core is NOT in service, %p.\n",
+				     __func__,
+				     op);
 			timeout = op_timeout_secs * HZ;
 		}
 		spin_unlock(&orangefs_request_list_lock);
@@ -208,15 +209,20 @@ static void orangefs_clean_up_interrupted_operation(struct orangefs_kernel_op_s
 	 * Called with op->lock held.
 	 */
 	op->op_state |= OP_VFS_STATE_GIVEN_UP;
-
-	if (op_state_waiting(op)) {
+	/* from that point on it can't be moved by anybody else */
+	if (list_empty(&op->list)) {
+		/* caught copying to/from daemon */
+		BUG_ON(op_state_serviced(op));
+		spin_unlock(&op->lock);
+		wait_for_completion(&op->waitq);
+	} else if (op_state_waiting(op)) {
 		/*
 		 * upcall hasn't been read; remove op from upcall request
 		 * list.
 		 */
 		spin_unlock(&op->lock);
 		spin_lock(&orangefs_request_list_lock);
-		list_del(&op->list);
+		list_del_init(&op->list);
 		spin_unlock(&orangefs_request_list_lock);
 		gossip_debug(GOSSIP_WAIT_DEBUG,