Message ID | 1444586591-24141-1-git-send-email-idryomov@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 10/11/2015 01:03 PM, Ilya Dryomov wrote: > Currently we leak parent_spec and trigger a "parent reference > underflow" warning if rbd_dev_create() in rbd_dev_probe_parent() fails. > The problem is we take the !parent out_err branch and that only drops > refcounts; parent_spec that would've been freed had we called > rbd_dev_unparent() remains and triggers rbd_warn() in > rbd_dev_parent_put() - at that point we have parent_spec != NULL and > parent_ref == 0, so counter ends up being -1 after the decrement. > > Redo rbd_dev_probe_parent() to fix this. I'm just starting to look at this. My up-front problem is that I want to understand why it's OK to no longer grab references to the parent spec and the client before creating the new parent device. Is it simply because the rbd_dev that's the subject of this call is already holding a reference to both, so no need to get another? I think that's right, but you should say that in the explanation. It's a change that's independent of the other change (which you describe). OK, onto the change you describe. You say that we get the underflow if rbd_dev_create() fails. At that point in the function, we have: parent == NULL rbd_dev->parent_spec != NULL parent_spec holds a new reference to that rbdc holds a new reference to the client rbd_dev_create() only returns NULL if the kzalloc() fails. And if so, we jump to out_err, take the !parent branch, and we drop the references we took in rbdc and parent_spec before returning. Where is the leak? (Actually, underflow means we're dropping more references than we took.) I'm not saying your patch is wrong, but I'm not understanding the problem you describe. I'm probably just missing something; please educate me. -Alex > Signed-off-by: Ilya Dryomov <idryomov@gmail.com> > --- > drivers/block/rbd.c | 36 ++++++++++++++++-------------------- > 1 file changed, 16 insertions(+), 20 deletions(-) > > diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c > index cd00e4653e49..ccbc3cbbf24e 100644 > --- a/drivers/block/rbd.c > +++ b/drivers/block/rbd.c > @@ -5134,41 +5134,37 @@ out_err: > static int rbd_dev_probe_parent(struct rbd_device *rbd_dev) > { > struct rbd_device *parent = NULL; > - struct rbd_spec *parent_spec; > - struct rbd_client *rbdc; > int ret; > > if (!rbd_dev->parent_spec) > return 0; > - /* > - * We need to pass a reference to the client and the parent > - * spec when creating the parent rbd_dev. Images related by > - * parent/child relationships always share both. > - */ > - parent_spec = rbd_spec_get(rbd_dev->parent_spec); > - rbdc = __rbd_get_client(rbd_dev->rbd_client); > > - ret = -ENOMEM; > - parent = rbd_dev_create(rbdc, parent_spec, NULL); > - if (!parent) > + parent = rbd_dev_create(rbd_dev->rbd_client, rbd_dev->parent_spec, > + NULL); > + if (!parent) { > + ret = -ENOMEM; > goto out_err; > + } > + > + /* > + * Images related by parent/child relationships always share > + * rbd_client and spec/parent_spec, so bump their refcounts. > + */ > + __rbd_get_client(rbd_dev->rbd_client); > + rbd_spec_get(rbd_dev->parent_spec); > > ret = rbd_dev_image_probe(parent, false); > if (ret < 0) > goto out_err; > + > rbd_dev->parent = parent; > atomic_set(&rbd_dev->parent_ref, 1); > - > return 0; > + > out_err: > - if (parent) { > - rbd_dev_unparent(rbd_dev); > + rbd_dev_unparent(rbd_dev); > + if (parent) > rbd_dev_destroy(parent); > - } else { > - rbd_put_client(rbdc); > - rbd_spec_put(parent_spec); > - } > - > return ret; > } > > -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Oct 15, 2015 at 7:10 PM, Alex Elder <elder@ieee.org> wrote: > On 10/11/2015 01:03 PM, Ilya Dryomov wrote: >> Currently we leak parent_spec and trigger a "parent reference >> underflow" warning if rbd_dev_create() in rbd_dev_probe_parent() fails. >> The problem is we take the !parent out_err branch and that only drops >> refcounts; parent_spec that would've been freed had we called >> rbd_dev_unparent() remains and triggers rbd_warn() in >> rbd_dev_parent_put() - at that point we have parent_spec != NULL and >> parent_ref == 0, so counter ends up being -1 after the decrement. >> >> Redo rbd_dev_probe_parent() to fix this. > > I'm just starting to look at this. > > My up-front problem is that I want to understand why > it's OK to no longer grab references to the parent spec > and the client before creating the new parent device. > Is it simply because the rbd_dev that's the subject > of this call is already holding a reference to both, > so no need to get another? I think that's right, but > you should say that in the explanation. It's a > change that's independent of the other change (which > you describe). I still grab references, I just moved that code after rbd_dev_create(). Fundamentally rbd_dev_create() is just a memory allocation, so whether we acquire references before or after doesn't matter, except that acquiring them before complicates the error path. > > OK, onto the change you describe. > > You say that we get the underflow if rbd_dev_create() fails. > > At that point in the function, we have: > parent == NULL > rbd_dev->parent_spec != NULL > parent_spec holds a new reference to that > rbdc holds a new reference to the client > > rbd_dev_create() only returns NULL if the kzalloc() fails. > And if so, we jump to out_err, take the !parent branch, > and we drop the references we took in rbdc and parent_spec > before returning. > > Where is the leak? (Actually, underflow means we're > dropping more references than we took.) We enter rbd_dev_probe_parent(). If ->parent_spec != NULL (and therefore has a refcount of 1), we bump refcounts and proceed to allocating parent rbd_dev. If rbd_dev_create() fails, we drop refcounts and exit rbd_dev_probe_parent() with ->parent_spec != NULL and a refcount of 1 on that spec. We take err_out_probe in rbd_dev_image_probe() and through rbd_dev_unprobe() end up in rbd_dev_parent_put(). Now, ->parent_spec != NULL but ->parent_ref == 0 because we never got to atomic_set(&rbd_dev->parent_ref, 1) in rbd_dev_probe_parent(). Local counter ends up -1 after the decrement and so instead of calling rbd_dev_unparent() which would have freed ->parent_spec we issue an underflow warning and bail. ->parent_spec is leaked. Thanks, Ilya -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/15/2015 01:31 PM, Ilya Dryomov wrote: > On Thu, Oct 15, 2015 at 7:10 PM, Alex Elder <elder@ieee.org> wrote: >> On 10/11/2015 01:03 PM, Ilya Dryomov wrote: >>> Currently we leak parent_spec and trigger a "parent reference >>> underflow" warning if rbd_dev_create() in rbd_dev_probe_parent() fails. >>> The problem is we take the !parent out_err branch and that only drops >>> refcounts; parent_spec that would've been freed had we called >>> rbd_dev_unparent() remains and triggers rbd_warn() in >>> rbd_dev_parent_put() - at that point we have parent_spec != NULL and >>> parent_ref == 0, so counter ends up being -1 after the decrement. >>> >>> Redo rbd_dev_probe_parent() to fix this. >> >> I'm just starting to look at this. >> >> My up-front problem is that I want to understand why >> it's OK to no longer grab references to the parent spec >> and the client before creating the new parent device. >> Is it simply because the rbd_dev that's the subject >> of this call is already holding a reference to both, >> so no need to get another? I think that's right, but >> you should say that in the explanation. It's a >> change that's independent of the other change (which >> you describe). > > I still grab references, I just moved that code after rbd_dev_create(). > Fundamentally rbd_dev_create() is just a memory allocation, so whether > we acquire references before or after doesn't matter, except that > acquiring them before complicates the error path. OK, I accept that. But looking at the patch alone it made me wonder whether grabbing the references was necessary before creating the parent device. I guess the main point is you should have mentioned it in your description. I'm going to take another look at the original patch, now that I have your explanation (below). Thanks. -Alex >> OK, onto the change you describe. >> >> You say that we get the underflow if rbd_dev_create() fails. >> >> At that point in the function, we have: >> parent == NULL >> rbd_dev->parent_spec != NULL >> parent_spec holds a new reference to that >> rbdc holds a new reference to the client >> >> rbd_dev_create() only returns NULL if the kzalloc() fails. >> And if so, we jump to out_err, take the !parent branch, >> and we drop the references we took in rbdc and parent_spec >> before returning. >> >> Where is the leak? (Actually, underflow means we're >> dropping more references than we took.) > > We enter rbd_dev_probe_parent(). If ->parent_spec != NULL (and > therefore has a refcount of 1), we bump refcounts and proceed to > allocating parent rbd_dev. If rbd_dev_create() fails, we drop > refcounts and exit rbd_dev_probe_parent() with ->parent_spec != NULL > and a refcount of 1 on that spec. We take err_out_probe in > rbd_dev_image_probe() and through rbd_dev_unprobe() end up in > rbd_dev_parent_put(). Now, ->parent_spec != NULL but ->parent_ref == 0 > because we never got to atomic_set(&rbd_dev->parent_ref, 1) in > rbd_dev_probe_parent(). Local counter ends up -1 after the decrement > and so instead of calling rbd_dev_unparent() which would have freed > ->parent_spec we issue an underflow warning and bail. ->parent_spec is > leaked. > > Thanks, > > Ilya > -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/15/2015 01:31 PM, Ilya Dryomov wrote: > On Thu, Oct 15, 2015 at 7:10 PM, Alex Elder <elder@ieee.org> wrote: >> On 10/11/2015 01:03 PM, Ilya Dryomov wrote: >>> Currently we leak parent_spec and trigger a "parent reference >>> underflow" warning if rbd_dev_create() in rbd_dev_probe_parent() fails. >>> The problem is we take the !parent out_err branch and that only drops >>> refcounts; parent_spec that would've been freed had we called >>> rbd_dev_unparent() remains and triggers rbd_warn() in >>> rbd_dev_parent_put() - at that point we have parent_spec != NULL and >>> parent_ref == 0, so counter ends up being -1 after the decrement. >>> >>> Redo rbd_dev_probe_parent() to fix this. >> >> I'm just starting to look at this. >> >> My up-front problem is that I want to understand why >> it's OK to no longer grab references to the parent spec >> and the client before creating the new parent device. >> Is it simply because the rbd_dev that's the subject >> of this call is already holding a reference to both, >> so no need to get another? I think that's right, but >> you should say that in the explanation. It's a >> change that's independent of the other change (which >> you describe). > > I still grab references, I just moved that code after rbd_dev_create(). > Fundamentally rbd_dev_create() is just a memory allocation, so whether > we acquire references before or after doesn't matter, except that > acquiring them before complicates the error path. > >> >> OK, onto the change you describe. >> >> You say that we get the underflow if rbd_dev_create() fails. >> >> At that point in the function, we have: >> parent == NULL >> rbd_dev->parent_spec != NULL >> parent_spec holds a new reference to that >> rbdc holds a new reference to the client >> >> rbd_dev_create() only returns NULL if the kzalloc() fails. >> And if so, we jump to out_err, take the !parent branch, >> and we drop the references we took in rbdc and parent_spec >> before returning. >> >> Where is the leak? (Actually, underflow means we're >> dropping more references than we took.) OK, I'm still reviewing your patch, but I have another comment. > We enter rbd_dev_probe_parent(). If ->parent_spec != NULL (and > therefore has a refcount of 1), we bump refcounts and proceed to > allocating parent rbd_dev. If rbd_dev_create() fails, we drop > refcounts and exit rbd_dev_probe_parent() with ->parent_spec != NULL > and a refcount of 1 on that spec. We take err_out_probe in This is exactly as it should be. On error, we exit rbd_dev_create() with things in exactly the state they were in when it was called. > rbd_dev_image_probe() and through rbd_dev_unprobe() end up in > rbd_dev_parent_put(). Now, ->parent_spec != NULL but ->parent_ref == 0 > because we never got to atomic_set(&rbd_dev->parent_ref, 1) in That's where the faulty logic lies. We set the parent_spec before we set up the parent, but we update parent_ref only after the parent is recorded. And we later assume a non-null parent_spec means we can decrement parent_ref, which in this case is not a valid assumption. I'll send my review comments shortly. I think you have fixed a bug but I think it's the wrong fix. -Alex > rbd_dev_probe_parent(). Local counter ends up -1 after the decrement > and so instead of calling rbd_dev_unparent() which would have freed > ->parent_spec we issue an underflow warning and bail. ->parent_spec is > leaked. > > Thanks, > > Ilya > -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/11/2015 01:03 PM, Ilya Dryomov wrote: > Currently we leak parent_spec and trigger a "parent reference > underflow" warning if rbd_dev_create() in rbd_dev_probe_parent() fails. > The problem is we take the !parent out_err branch and that only drops > refcounts; parent_spec that would've been freed had we called > rbd_dev_unparent() remains and triggers rbd_warn() in > rbd_dev_parent_put() - at that point we have parent_spec != NULL and > parent_ref == 0, so counter ends up being -1 after the decrement. OK, now that I understand the context... You now get extra references for the spec and client for the parent only after creating the parent device. I think the reason they logically belonged before the call to rbd_device_create() for the parent is because the client and spec pointers passed to that function carry with them references that are passed to the resulting rbd_device if successful. If creating the parent device fails, you unparent the original device, which will still have a null parent pointer. The effect of unparenting in this case is dropping a reference to the parent's spec, and clearing the device's pointer to it. This is confusing, but let's run with it. If creating the parent device succeeds, references to the client and parent spec are taken (basically, these belong to the just-created parent device). The parent image is now probed. If this fails, you again unparent the device. We still have not set the device's parent pointer, so the effect is as before, dropping the parent spec reference and clearing the device's reference to it. The error handling now destroys the parent, which drops references to its client and the spec. Again, this seems confusing, as if we've dropped one more reference to the parent spec than desired. This logic now seems to work. But it's working around a flaw in the caller I think. Upon entry to rbd_dev_probe_parent(), a layered device will have have a non-null parent_spec pointer (and a reference to it), which will have been filled in by rbd_dev_v2_parent_info(). Really, it should not be rbd_dev_probe_parent() that drops that parent spec reference on error. Instead, rbd_dev_image_probe() (which got the reference to the parent spec) should handle cleaning up the device's parent spec if an error occurs after it has been assigned. I'll wait for your response, I'd like to know if what I'm saying makes sense. -Alex > Redo rbd_dev_probe_parent() to fix this. > > Signed-off-by: Ilya Dryomov <idryomov@gmail.com> > --- > drivers/block/rbd.c | 36 ++++++++++++++++-------------------- > 1 file changed, 16 insertions(+), 20 deletions(-) > > diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c > index cd00e4653e49..ccbc3cbbf24e 100644 > --- a/drivers/block/rbd.c > +++ b/drivers/block/rbd.c > @@ -5134,41 +5134,37 @@ out_err: > static int rbd_dev_probe_parent(struct rbd_device *rbd_dev) > { > struct rbd_device *parent = NULL; > - struct rbd_spec *parent_spec; > - struct rbd_client *rbdc; > int ret; > > if (!rbd_dev->parent_spec) > return 0; > - /* > - * We need to pass a reference to the client and the parent > - * spec when creating the parent rbd_dev. Images related by > - * parent/child relationships always share both. > - */ > - parent_spec = rbd_spec_get(rbd_dev->parent_spec); > - rbdc = __rbd_get_client(rbd_dev->rbd_client); > > - ret = -ENOMEM; > - parent = rbd_dev_create(rbdc, parent_spec, NULL); > - if (!parent) > + parent = rbd_dev_create(rbd_dev->rbd_client, rbd_dev->parent_spec, > + NULL); > + if (!parent) { > + ret = -ENOMEM; > goto out_err; > + } > + > + /* > + * Images related by parent/child relationships always share > + * rbd_client and spec/parent_spec, so bump their refcounts. > + */ > + __rbd_get_client(rbd_dev->rbd_client); > + rbd_spec_get(rbd_dev->parent_spec); > > ret = rbd_dev_image_probe(parent, false); > if (ret < 0) > goto out_err; > + > rbd_dev->parent = parent; > atomic_set(&rbd_dev->parent_ref, 1); > - > return 0; > + > out_err: > - if (parent) { > - rbd_dev_unparent(rbd_dev); > + rbd_dev_unparent(rbd_dev); > + if (parent) > rbd_dev_destroy(parent); > - } else { > - rbd_put_client(rbdc); > - rbd_spec_put(parent_spec); > - } > - > return ret; > } > > -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Oct 15, 2015 at 11:10 PM, Alex Elder <elder@ieee.org> wrote: > On 10/11/2015 01:03 PM, Ilya Dryomov wrote: >> Currently we leak parent_spec and trigger a "parent reference >> underflow" warning if rbd_dev_create() in rbd_dev_probe_parent() fails. >> The problem is we take the !parent out_err branch and that only drops >> refcounts; parent_spec that would've been freed had we called >> rbd_dev_unparent() remains and triggers rbd_warn() in >> rbd_dev_parent_put() - at that point we have parent_spec != NULL and >> parent_ref == 0, so counter ends up being -1 after the decrement. > > OK, now that I understand the context... > > You now get extra references for the spec and client > for the parent only after creating the parent device. > I think the reason they logically belonged before > the call to rbd_device_create() for the parent is > because the client and spec pointers passed to that > function carry with them references that are passed > to the resulting rbd_device if successful. Let me stress that those two get()s that I moved is not the point of the patch at all. Whether we do it before or after rbd_dev_create() is pretty much irrelevant, the only reason I moved those calls is to make the error path simpler. > > If creating the parent device fails, you unparent the > original device, which will still have a null parent > pointer. The effect of unparenting in this case is > dropping a reference to the parent's spec, and clearing > the device's pointer to it. This is confusing, but > let's run with it. > > If creating the parent device succeeds, references to > the client and parent spec are taken (basically, these > belong to the just-created parent device). The parent > image is now probed. If this fails, you again > unparent the device. We still have not set the > device's parent pointer, so the effect is as before, > dropping the parent spec reference and clearing > the device's reference to it. The error handling > now destroys the parent, which drops references to > its client and the spec. Again, this seems > confusing, as if we've dropped one more reference > to the parent spec than desired. > > This logic now seems to work. But it's working > around a flaw in the caller I think. Upon entry > to rbd_dev_probe_parent(), a layered device will > have have a non-null parent_spec pointer (and a > reference to it), which will have been filled in > by rbd_dev_v2_parent_info(). > > Really, it should not be rbd_dev_probe_parent() > that drops that parent spec reference on error. > Instead, rbd_dev_image_probe() (which got the > reference to the parent spec) should handle > cleaning up the device's parent spec if an > error occurs after it has been assigned. > > I'll wait for your response, I'd like to know > if what I'm saying makes sense. All of the above makes sense. I agree that it is asymmetrical and that it would have been better to have rbd_dev_image_probe() drop the last ref on ->parent_spec. But, that's not what all of the existing code is doing. Consider rbd_dev_probe_parent() without my patch. There are two out_err jumps. As it is, if rbd_dev_create() fails, we only revert those two get()s and return with alive ->parent_spec. However, if rbd_dev_image_probe() fails, we call rbd_dev_unparent(), which will put ->parent_spec. Really, the issue is rbd_dev_unparent(), which not only removes the parent rbd_dev, but also the parent spec. All I did with this patch was align both out_err cases to do the same, namely undo the effects of rbd_dev_v2_parent_info() - I didn't want to create yet another error path. Thanks, Ilya -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/16/2015 04:50 AM, Ilya Dryomov wrote: > On Thu, Oct 15, 2015 at 11:10 PM, Alex Elder <elder@ieee.org> wrote: >> On 10/11/2015 01:03 PM, Ilya Dryomov wrote: >>> Currently we leak parent_spec and trigger a "parent reference >>> underflow" warning if rbd_dev_create() in rbd_dev_probe_parent() fails. >>> The problem is we take the !parent out_err branch and that only drops >>> refcounts; parent_spec that would've been freed had we called >>> rbd_dev_unparent() remains and triggers rbd_warn() in >>> rbd_dev_parent_put() - at that point we have parent_spec != NULL and >>> parent_ref == 0, so counter ends up being -1 after the decrement. >> >> OK, now that I understand the context... >> >> You now get extra references for the spec and client >> for the parent only after creating the parent device. >> I think the reason they logically belonged before >> the call to rbd_device_create() for the parent is >> because the client and spec pointers passed to that >> function carry with them references that are passed >> to the resulting rbd_device if successful. > > Let me stress that those two get()s that I moved is not the point of > the patch at all. Whether we do it before or after rbd_dev_create() is > pretty much irrelevant, the only reason I moved those calls is to make > the error path simpler. Yes I understand that. >> If creating the parent device fails, you unparent the >> original device, which will still have a null parent >> pointer. The effect of unparenting in this case is >> dropping a reference to the parent's spec, and clearing >> the device's pointer to it. This is confusing, but >> let's run with it. >> >> If creating the parent device succeeds, references to >> the client and parent spec are taken (basically, these >> belong to the just-created parent device). The parent >> image is now probed. If this fails, you again >> unparent the device. We still have not set the >> device's parent pointer, so the effect is as before, >> dropping the parent spec reference and clearing >> the device's reference to it. The error handling >> now destroys the parent, which drops references to >> its client and the spec. Again, this seems >> confusing, as if we've dropped one more reference >> to the parent spec than desired. >> >> This logic now seems to work. But it's working >> around a flaw in the caller I think. Upon entry >> to rbd_dev_probe_parent(), a layered device will >> have have a non-null parent_spec pointer (and a >> reference to it), which will have been filled in >> by rbd_dev_v2_parent_info(). >> >> Really, it should not be rbd_dev_probe_parent() >> that drops that parent spec reference on error. >> Instead, rbd_dev_image_probe() (which got the >> reference to the parent spec) should handle >> cleaning up the device's parent spec if an >> error occurs after it has been assigned. >> >> I'll wait for your response, I'd like to know >> if what I'm saying makes sense. > > All of the above makes sense. I agree that it is asymmetrical and that > it would have been better to have rbd_dev_image_probe() drop the last > ref on ->parent_spec. But, that's not what all of the existing code is > doing. Agreed. > Consider rbd_dev_probe_parent() without my patch. There are two > out_err jumps. As it is, if rbd_dev_create() fails, we only revert > those two get()s and return with alive ->parent_spec. However, if > rbd_dev_image_probe() fails, we call rbd_dev_unparent(), which will put > ->parent_spec. Really, the issue is rbd_dev_unparent(), which not only > removes the parent rbd_dev, but also the parent spec. All I did with > this patch was align both out_err cases to do the same, namely undo the > effects of rbd_dev_v2_parent_info() - I didn't want to create yet > another error path. OK. Walking through the code I did concluded that what you did made things "line up" properly. But I just felt like it wasn't necessarily the *right* fix, but it was a correct fix. The only point in suggesting what I think would be a better fix is that it would make things easier to reason about and get correct, and in so doing, make it easier to maintain and adapt in the future. But I have no objection to your fix, so please accept this for your original patch: Reviewed-by: Alex Elder <elder@linaro.org> > Thanks, > > Ilya > -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Oct 16, 2015 at 1:50 PM, Alex Elder <elder@ieee.org> wrote: > On 10/16/2015 04:50 AM, Ilya Dryomov wrote: >> On Thu, Oct 15, 2015 at 11:10 PM, Alex Elder <elder@ieee.org> wrote: >>> On 10/11/2015 01:03 PM, Ilya Dryomov wrote: >>>> Currently we leak parent_spec and trigger a "parent reference >>>> underflow" warning if rbd_dev_create() in rbd_dev_probe_parent() fails. >>>> The problem is we take the !parent out_err branch and that only drops >>>> refcounts; parent_spec that would've been freed had we called >>>> rbd_dev_unparent() remains and triggers rbd_warn() in >>>> rbd_dev_parent_put() - at that point we have parent_spec != NULL and >>>> parent_ref == 0, so counter ends up being -1 after the decrement. >>> >>> OK, now that I understand the context... >>> >>> You now get extra references for the spec and client >>> for the parent only after creating the parent device. >>> I think the reason they logically belonged before >>> the call to rbd_device_create() for the parent is >>> because the client and spec pointers passed to that >>> function carry with them references that are passed >>> to the resulting rbd_device if successful. >> >> Let me stress that those two get()s that I moved is not the point of >> the patch at all. Whether we do it before or after rbd_dev_create() is >> pretty much irrelevant, the only reason I moved those calls is to make >> the error path simpler. > > Yes I understand that. > >>> If creating the parent device fails, you unparent the >>> original device, which will still have a null parent >>> pointer. The effect of unparenting in this case is >>> dropping a reference to the parent's spec, and clearing >>> the device's pointer to it. This is confusing, but >>> let's run with it. >>> >>> If creating the parent device succeeds, references to >>> the client and parent spec are taken (basically, these >>> belong to the just-created parent device). The parent >>> image is now probed. If this fails, you again >>> unparent the device. We still have not set the >>> device's parent pointer, so the effect is as before, >>> dropping the parent spec reference and clearing >>> the device's reference to it. The error handling >>> now destroys the parent, which drops references to >>> its client and the spec. Again, this seems >>> confusing, as if we've dropped one more reference >>> to the parent spec than desired. >>> >>> This logic now seems to work. But it's working >>> around a flaw in the caller I think. Upon entry >>> to rbd_dev_probe_parent(), a layered device will >>> have have a non-null parent_spec pointer (and a >>> reference to it), which will have been filled in >>> by rbd_dev_v2_parent_info(). >>> >>> Really, it should not be rbd_dev_probe_parent() >>> that drops that parent spec reference on error. >>> Instead, rbd_dev_image_probe() (which got the >>> reference to the parent spec) should handle >>> cleaning up the device's parent spec if an >>> error occurs after it has been assigned. >>> >>> I'll wait for your response, I'd like to know >>> if what I'm saying makes sense. >> >> All of the above makes sense. I agree that it is asymmetrical and that >> it would have been better to have rbd_dev_image_probe() drop the last >> ref on ->parent_spec. But, that's not what all of the existing code is >> doing. > > Agreed. > >> Consider rbd_dev_probe_parent() without my patch. There are two >> out_err jumps. As it is, if rbd_dev_create() fails, we only revert >> those two get()s and return with alive ->parent_spec. However, if >> rbd_dev_image_probe() fails, we call rbd_dev_unparent(), which will put >> ->parent_spec. Really, the issue is rbd_dev_unparent(), which not only >> removes the parent rbd_dev, but also the parent spec. All I did with >> this patch was align both out_err cases to do the same, namely undo the >> effects of rbd_dev_v2_parent_info() - I didn't want to create yet >> another error path. > > OK. Walking through the code I did concluded that what > you did made things "line up" properly. But I just felt > like it wasn't necessarily the *right* fix, but it was > a correct fix. > > The only point in suggesting what I think would be a better > fix is that it would make things easier to reason about > and get correct, and in so doing, make it easier to > maintain and adapt in the future. Well, I'm fixing another rbd_dev refcount/use-after-free problem right now, so I think the only thing that will really make it easier to maintain this is refactor into just a couple of symmetrical functions. Making a single change (e.g. patching rbd_dev_unparent() to only touch ->parent and not ->parent_spec) is going to be invasive - there are just way too many different error paths. Thanks, Ilya -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c index cd00e4653e49..ccbc3cbbf24e 100644 --- a/drivers/block/rbd.c +++ b/drivers/block/rbd.c @@ -5134,41 +5134,37 @@ out_err: static int rbd_dev_probe_parent(struct rbd_device *rbd_dev) { struct rbd_device *parent = NULL; - struct rbd_spec *parent_spec; - struct rbd_client *rbdc; int ret; if (!rbd_dev->parent_spec) return 0; - /* - * We need to pass a reference to the client and the parent - * spec when creating the parent rbd_dev. Images related by - * parent/child relationships always share both. - */ - parent_spec = rbd_spec_get(rbd_dev->parent_spec); - rbdc = __rbd_get_client(rbd_dev->rbd_client); - ret = -ENOMEM; - parent = rbd_dev_create(rbdc, parent_spec, NULL); - if (!parent) + parent = rbd_dev_create(rbd_dev->rbd_client, rbd_dev->parent_spec, + NULL); + if (!parent) { + ret = -ENOMEM; goto out_err; + } + + /* + * Images related by parent/child relationships always share + * rbd_client and spec/parent_spec, so bump their refcounts. + */ + __rbd_get_client(rbd_dev->rbd_client); + rbd_spec_get(rbd_dev->parent_spec); ret = rbd_dev_image_probe(parent, false); if (ret < 0) goto out_err; + rbd_dev->parent = parent; atomic_set(&rbd_dev->parent_ref, 1); - return 0; + out_err: - if (parent) { - rbd_dev_unparent(rbd_dev); + rbd_dev_unparent(rbd_dev); + if (parent) rbd_dev_destroy(parent); - } else { - rbd_put_client(rbdc); - rbd_spec_put(parent_spec); - } - return ret; }
Currently we leak parent_spec and trigger a "parent reference underflow" warning if rbd_dev_create() in rbd_dev_probe_parent() fails. The problem is we take the !parent out_err branch and that only drops refcounts; parent_spec that would've been freed had we called rbd_dev_unparent() remains and triggers rbd_warn() in rbd_dev_parent_put() - at that point we have parent_spec != NULL and parent_ref == 0, so counter ends up being -1 after the decrement. Redo rbd_dev_probe_parent() to fix this. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> --- drivers/block/rbd.c | 36 ++++++++++++++++-------------------- 1 file changed, 16 insertions(+), 20 deletions(-)