diff mbox

Issue #5876 : assertion failure in rbd_img_obj_callback()

Message ID 1395753516.2823.37.camel@localhost (mailing list archive)
State New, archived
Headers show

Commit Message

Olivier Bonvalet March 25, 2014, 1:18 p.m. UTC
Le mardi 25 mars 2014 à 14:57 +0200, Ilya Dryomov a écrit :
> On Tue, Mar 25, 2014 at 2:51 PM, Alex Elder <elder@ieee.org> wrote:
> > On 03/25/2014 07:34 AM, Ilya Dryomov wrote:
> >>> On 03/25/2014 04:04 AM, Ilya Dryomov wrote:
> >>>> On Tue, Mar 25, 2014 at 10:39 AM, Olivier Bonvalet <ceph.list@daevel.fr> wrote:
> >>>>> Hi,
> >>>>>
> >>>>> what can/should I do to help fix that problem ?
> >>>>>
> >>>>> for now, RBD kernel client hang on :
> >>>>>         Assertion failure in rbd_img_obj_callback() at line 2131:
> >>>>>            rbd_assert(which >= img_request->next_completion);
> >>>
> >>> If you can build your own kernel as Ilya says I'd like to
> >>> see the values of which and img_request->next_completion
> >>> here.
> >>
> >> Looks like which was 1, which means that next_completion had to be 2 or
> >> greater.  I miss solaris crash dumps ...
> >>
> >> On a different note, why are we asserting next_completion outside of
> >> a spinlock which is supposed to protect next_completion?
> >
> > That's a very good point (which could be easily remedied by moving
> > the assertion down a couple lines).  The image object request (#1)
> > in this case will have been marked done at this point; it's possible
> > that request #2 (or later) was concurrently getting handled by the
> > for_each_obj_request_from() loop below in that same function, but
> > may not have updated next_completion yet.
> >
> > So that *could* explain the tripped assertion.  The assertion
> > should be moved in any case, it's a bug.
> >
> > That being said, it doesn't explain the other assertion:
> >         rbd_assert(img_request != NULL);
> > So there's at least one other thing going on.
> 
> Yeah, exactly my thoughts.
> 
> Thanks,
> 
>                 Ilya

So, a (partial) fix can be this patch ?



--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -2123,6 +2123,7 @@  static void rbd_img_obj_callback(struct rbd_obj_request *obj_request)
        rbd_assert(obj_request_img_data_test(obj_request));
        img_request = obj_request->img_request;
 
+       spin_lock_irq(&img_request->completion_lock);
        dout("%s: img %p obj %p\n", __func__, img_request, obj_request);
        rbd_assert(img_request != NULL);
        rbd_assert(img_request->obj_request_count > 0);
@@ -2130,7 +2131,6 @@  static void rbd_img_obj_callback(struct rbd_obj_request *obj_request)
        rbd_assert(which < img_request->obj_request_count);
        rbd_assert(which >= img_request->next_completion);
 
-       spin_lock_irq(&img_request->completion_lock);
        if (which != img_request->next_completion)
                goto out;