
[8/9] rbd: set req->r_abort_on_full in writing

Message ID 1525682645-30510-9-git-send-email-dongsheng.yang@easystack.cn (mailing list archive)
State New, archived

Commit Message

Dongsheng Yang May 7, 2018, 8:44 a.m. UTC
Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
---
 drivers/block/rbd.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Ilya Dryomov May 7, 2018, 2:30 p.m. UTC | #1
On Mon, May 7, 2018 at 10:44 AM, Dongsheng Yang
<dongsheng.yang@easystack.cn> wrote:
> Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
> ---
>  drivers/block/rbd.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index 39ca0d7..a341929 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -1453,6 +1453,7 @@ static void rbd_osd_req_format_write(struct rbd_obj_request *obj_request)
>         struct ceph_osd_request *osd_req = obj_request->osd_req;
>
>         osd_req->r_flags = CEPH_OSD_FLAG_WRITE;
> +       osd_req->r_abort_on_full = true;
>         ktime_get_real_ts(&osd_req->r_mtime);
>         osd_req->r_data_offset = obj_request->ex.oe_off;
>  }

Blocking instead of returning -ENOSPC is intentional.  See

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d29adb34a94715174c88ca93e8aba955850c9bde
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9a1ea2dbff11547a8e664f143c1ffefc586a577a

and associated tracker tickets.

Thanks,

                Ilya
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dongsheng Yang May 8, 2018, 2:58 a.m. UTC | #2
Thanx Ilya,
      IIUC, the problem is that the block layer will translate the ENOSPC
to EIO, and that can cause filesystem corruption if the cluster goes
full -> not full soon.

But Christoph introduced a new block status code type in 2017, which
propagates the correct error number to users of the block layer.

Please check this commit:

https://github.com/torvalds/linux/commit/2a842acab109f40f0d7d10b38e9ca88390628996

Thanx
Yang

On 05/07/2018 10:30 PM, Ilya Dryomov wrote:
> On Mon, May 7, 2018 at 10:44 AM, Dongsheng Yang
> <dongsheng.yang@easystack.cn> wrote:
>> Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
>> ---
>>   drivers/block/rbd.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
>> index 39ca0d7..a341929 100644
>> --- a/drivers/block/rbd.c
>> +++ b/drivers/block/rbd.c
>> @@ -1453,6 +1453,7 @@ static void rbd_osd_req_format_write(struct rbd_obj_request *obj_request)
>>          struct ceph_osd_request *osd_req = obj_request->osd_req;
>>
>>          osd_req->r_flags = CEPH_OSD_FLAG_WRITE;
>> +       osd_req->r_abort_on_full = true;
>>          ktime_get_real_ts(&osd_req->r_mtime);
>>          osd_req->r_data_offset = obj_request->ex.oe_off;
>>   }
> Blocking instead of returning -ENOSPC is intentional.  See
>
>    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d29adb34a94715174c88ca93e8aba955850c9bde
>    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9a1ea2dbff11547a8e664f143c1ffefc586a577a
>
> and associated tracker tickets.
>
> Thanks,
>
>                  Ilya


Dongsheng Yang May 8, 2018, 6:13 a.m. UTC | #3
To be more specific, the ENOSPC was introduced by
a9d6ceb838755c24dde8a0ca02c3378926fc63db.

Please check:

https://github.com/torvalds/linux/commit/a9d6ceb838755c24dde8a0ca02c3378926fc63db

Or, maybe we can keep the default behavior of blocking IO, but add an
rbd option like abort_on_full to return ENOSPC in cases where we really
don't want any IO blocking in our program.

Thanx

On 05/08/2018 10:58 AM, Dongsheng Yang wrote:
> Thanx Ilya,
>      IIUC, the problem is that block layer will translate the ENOSPC 
> to EIO,
> then it will cause filesystem corruption if the cluster get full -> 
> not full soon.
>
> But Christoph introduce a new block status code type, which will 
> translate
> the correct error number to users of block layer in 2017.
>
> please check this commit:
>
> https://github.com/torvalds/linux/commit/2a842acab109f40f0d7d10b38e9ca88390628996 
>
>
> Thanx
> Yang
>
> On 05/07/2018 10:30 PM, Ilya Dryomov wrote:
>> On Mon, May 7, 2018 at 10:44 AM, Dongsheng Yang
>> <dongsheng.yang@easystack.cn> wrote:
>>> Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
>>> ---
>>>   drivers/block/rbd.c | 1 +
>>>   1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
>>> index 39ca0d7..a341929 100644
>>> --- a/drivers/block/rbd.c
>>> +++ b/drivers/block/rbd.c
>>> @@ -1453,6 +1453,7 @@ static void rbd_osd_req_format_write(struct 
>>> rbd_obj_request *obj_request)
>>>          struct ceph_osd_request *osd_req = obj_request->osd_req;
>>>
>>>          osd_req->r_flags = CEPH_OSD_FLAG_WRITE;
>>> +       osd_req->r_abort_on_full = true;
>>>          ktime_get_real_ts(&osd_req->r_mtime);
>>>          osd_req->r_data_offset = obj_request->ex.oe_off;
>>>   }
>> Blocking instead of returning -ENOSPC is intentional.  See
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d29adb34a94715174c88ca93e8aba955850c9bde
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9a1ea2dbff11547a8e664f143c1ffefc586a577a
>>
>> and associated tracker tickets.
>>
>> Thanks,
>>
>>                  Ilya


Ilya Dryomov May 10, 2018, 2:34 p.m. UTC | #4
On Tue, May 8, 2018 at 4:58 AM, Dongsheng Yang
<dongsheng.yang@easystack.cn> wrote:
> Thanx Ilya,
>      IIUC, the problem is that block layer will translate the ENOSPC to EIO,
> then it will cause filesystem corruption if the cluster get full -> not full
> soon.
>
> But Christoph introduce a new block status code type, which will translate
> the correct error number to users of block layer in 2017.
>
> please check this commit:
>
> https://github.com/torvalds/linux/commit/2a842acab109f40f0d7d10b38e9ca88390628996

IIRC the way it used to work is that an error code would always be
propagated in the sense that bios would be completed with whatever
error code set by the block driver (even something crazy like -EBFONT).
Christoph's patch constrained that to a set of blk_status_t codes.

The particular error code doesn't really matter.  Whether it's -EIO or
-ENOSPC, a filesystem nowadays will either remount itself read-only or
shutdown.  xfs used to retry indefinitely in certain cases, but that
has changed and is now configurable via /sys/fs/xfs.

Thanks,

                Ilya
Ilya Dryomov May 10, 2018, 2:56 p.m. UTC | #5
On Tue, May 8, 2018 at 8:13 AM, Dongsheng Yang
<dongsheng.yang@easystack.cn> wrote:
> To be more specific, the ENOSPC was introduced by
> a9d6ceb838755c24dde8a0ca02c3378926fc63db.
>
> Please check:
>
> https://github.com/torvalds/linux/commit/a9d6ceb838755c24dde8a0ca02c3378926fc63db

This commit added an error string for -ENOSPC, so that "critical space
allocation error" is printed instead of "I/O error".  It didn't change
anything about generic error code propagation.

>
> Or, maybe we can keep the default behavior of blocking IO, but add a rbd
> option like abort_on_full to
> return ENOSPC in the case we really don't want any IO blocking in our
> program.

What is your use case?  Are you using raw block devices with pool
quotas enabled?

Thanks,

                Ilya
Ilya Dryomov May 11, 2018, 1:09 p.m. UTC | #6
On Fri, May 11, 2018 at 9:16 AM, Dongsheng Yang
<dongsheng.yang@easystack.cn> wrote:
>
>
> On 05/10/2018 10:56 PM, Ilya Dryomov wrote:
>
> On Tue, May 8, 2018 at 8:13 AM, Dongsheng Yang
> <dongsheng.yang@easystack.cn> wrote:
>
> To be more specific, the ENOSPC was introduced by
> a9d6ceb838755c24dde8a0ca02c3378926fc63db.
>
> Please check:
>
> https://github.com/torvalds/linux/commit/a9d6ceb838755c24dde8a0ca02c3378926fc63db
>
> This commit added an error string for -ENOSPC, so that "critical space
> allocation error" is printed instead of "I/O error".  It didn't change
> anything about generic error code propagation.
>
> Or, maybe we can keep the default behavior of blocking IO, but add a rbd
> option like abort_on_full to
> return ENOSPC in the case we really don't want any IO blocking in our
> program.
>
> What is your use case?  Are you using raw block devices with pool
> quotas enabled?
>
> both raw block and xfs.
>
> One of the problem cases is that:
> We provide a container service for customer. Each project has
> a pool with quota, and the customer in this project can create
> containers with specified size of disk, but that's thin provisioned.
>
> When they reach the quota, the problem is the process in container
> which is doing writing will be blocked, and go into D state. Then we
> can't kill processes or stop these containers.
>
> Customers complained times
>
> "Just tell me what happened, rather than suspend everything and forbid to
> cancel."
>
> So we want to just return error to user and allow them to do a clean work
> and recovery
> their containers.

Right, so you are using pool quotas.  Support for pool quotas is
a relatively recent addition (since 4.7), so you are hitting all the
sharp edges.  This is exactly how it works in userspace -- no changes
were made.

Hanging processes in D state is definitely undesirable, but I'm curious
what the recovery would look like.  There is nothing that can be done on
the customer side once the pool is marked full.  It will remain full
until an administrative action is taken.

Thanks,

                Ilya
Dongsheng Yang May 14, 2018, 5:43 a.m. UTC | #7
On 05/11/2018 09:09 PM, Ilya Dryomov wrote:
> On Fri, May 11, 2018 at 9:16 AM, Dongsheng Yang
> <dongsheng.yang@easystack.cn> wrote:
>>
>> On 05/10/2018 10:56 PM, Ilya Dryomov wrote:
>>
>> On Tue, May 8, 2018 at 8:13 AM, Dongsheng Yang
>> <dongsheng.yang@easystack.cn> wrote:
...
>> Customers complained times
>>
>> "Just tell me what happened, rather than suspend everything and forbid to
>> cancel."
>>
>> So we want to just return error to user and allow them to do a clean work
>> and recovery
>> their containers.
> Right, so you are using pool quotas.  Support for pool quotas is
> a relatively recent addition (since 4.7), so you are hitting all the
> sharp edges.  This is exactly how it works in userspace -- no changes
> were made.
>
> Hanging processes in D state is definitely undesirable, but I'm curious
> how would the recovery look like?  There is nothing that can done on the
> customer side once the pool is marked full.  It will remain full until
> an administrative action is taken.

We can remove some unused rbd images in this case,

https://github.com/ceph/ceph/pull/12627

In our setup, there is a reset button on the web UI for each container.
When the user finds no space left, they can remove some unused volumes
and then click the reset button of each container, which resets the
container status: actually, it just stops and then starts the container
again. After that, the container is running again.

Thanx

>
> Thanks,
>
>                  Ilya


Ilya Dryomov May 15, 2018, 1:34 p.m. UTC | #8
On Mon, May 14, 2018 at 7:43 AM, Dongsheng Yang
<dongsheng.yang@easystack.cn> wrote:
>
>
> On 05/11/2018 09:09 PM, Ilya Dryomov wrote:
>>
>> On Fri, May 11, 2018 at 9:16 AM, Dongsheng Yang
>> <dongsheng.yang@easystack.cn> wrote:
>>>
>>>
>>> On 05/10/2018 10:56 PM, Ilya Dryomov wrote:
>>>
>>> On Tue, May 8, 2018 at 8:13 AM, Dongsheng Yang
>>> <dongsheng.yang@easystack.cn> wrote:
>
> ...
>>>
>>> Customers complained times
>>>
>>> "Just tell me what happened, rather than suspend everything and forbid to
>>> cancel."
>>>
>>> So we want to just return error to user and allow them to do a clean work
>>> and recovery
>>> their containers.
>>
>> Right, so you are using pool quotas.  Support for pool quotas is
>> a relatively recent addition (since 4.7), so you are hitting all the
>> sharp edges.  This is exactly how it works in userspace -- no changes
>> were made.
>>
>> Hanging processes in D state is definitely undesirable, but I'm curious
>> how would the recovery look like?  There is nothing that can done on the
>> customer side once the pool is marked full.  It will remain full until
>> an administrative action is taken.
>
>
> We can remove some unused rbd images in this case,
>
> https://github.com/ceph/ceph/pull/12627
>
> In our using, there is a reset button at web for each container. When the
> user
> found no space left, they can remove some unused volumes. Then
> we can click the reset button of each container which will reset the
> container status, actually, it just stop and then start the container again.
> And then this container would be running again.

I see, removing an image is one example.  I'd be fine with adding an
option to return -ENOSPC, off by default.

Josh, any objections?  Your patch that made blocking optional was
dropped, and I'm trying to remember why.  Was it just because the OSDs
had to be changed to drop such writes on the floor, the problem being
that someone with an old osdmap wouldn't know whether to resend?

Thanks,

                Ilya
Josh Durgin May 22, 2018, 7:21 p.m. UTC | #9
On 05/15/2018 06:34 AM, Ilya Dryomov wrote:
> On Mon, May 14, 2018 at 7:43 AM, Dongsheng Yang
> <dongsheng.yang@easystack.cn> wrote:
>>
>>
>> On 05/11/2018 09:09 PM, Ilya Dryomov wrote:
>>>
>>> On Fri, May 11, 2018 at 9:16 AM, Dongsheng Yang
>>> <dongsheng.yang@easystack.cn> wrote:
>>>>
>>>>
>>>> On 05/10/2018 10:56 PM, Ilya Dryomov wrote:
>>>>
>>>> On Tue, May 8, 2018 at 8:13 AM, Dongsheng Yang
>>>> <dongsheng.yang@easystack.cn> wrote:
>>
>> ...
>>>>
>>>> Customers complained times
>>>>
>>>> "Just tell me what happened, rather than suspend everything and forbid to
>>>> cancel."
>>>>
>>>> So we want to just return error to user and allow them to do a clean work
>>>> and recovery
>>>> their containers.
>>>
>>> Right, so you are using pool quotas.  Support for pool quotas is
>>> a relatively recent addition (since 4.7), so you are hitting all the
>>> sharp edges.  This is exactly how it works in userspace -- no changes
>>> were made.
>>>
>>> Hanging processes in D state is definitely undesirable, but I'm curious
>>> how would the recovery look like?  There is nothing that can done on the
>>> customer side once the pool is marked full.  It will remain full until
>>> an administrative action is taken.
>>
>>
>> We can remove some unused rbd images in this case,
>>
>> https://github.com/ceph/ceph/pull/12627
>>
>> In our using, there is a reset button at web for each container. When the
>> user
>> found no space left, they can remove some unused volumes. Then
>> we can click the reset button of each container which will reset the
>> container status, actually, it just stop and then start the container again.
>> And then this container would be running again.
> 
> I see, removing an image is one example.  I'd be fine with adding an
> option to return -ENOSPC, off by default.
> 
> Josh, any objections?  Your patch that made blocking optional was
> dropped, and I'm trying to remember why.  Was it just because the OSDs
> had to be changed to drop such writes on the floor, the problem being
> that someone with an old osdmap wouldn't know whether to resend?

Yes, that was the reason at the time. Since the kernel is passing ENOSPC
up the stack now, I see no reason not to add a krbd option to do so.

Josh

Patch

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 39ca0d7..a341929 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1453,6 +1453,7 @@  static void rbd_osd_req_format_write(struct rbd_obj_request *obj_request)
 	struct ceph_osd_request *osd_req = obj_request->osd_req;
 
 	osd_req->r_flags = CEPH_OSD_FLAG_WRITE;
+	osd_req->r_abort_on_full = true;
 	ktime_get_real_ts(&osd_req->r_mtime);
 	osd_req->r_data_offset = obj_request->ex.oe_off;
 }