[v3,0/9] block-backend: Introduce I/O hang

Message ID 20201022130303.1092-1-cenjiahui@huawei.com

Message

Jiahui Cen Oct. 22, 2020, 1:02 p.m. UTC
A VM in a cloud environment may use a virtual disk as its backend storage,
and there are usually filesystems on the virtual block device. When the
backend storage is temporarily down, any I/O issued to the virtual block
device will cause an error. For example, an error in the ext4 filesystem
makes the filesystem read-only. However, cloud backend storage can often be
recovered quickly: an IP-SAN may go down due to a network failure and come
back online soon after the network is recovered. The error in the filesystem,
though, may not be recovered unless the device is reattached or the system is
restarted. So an I/O rehandle mechanism is needed to implement self-healing.

This patch series proposes a feature called I/O hang. It can rehandle AIOs
that fail with EIO without sending the error back to the guest. From the
guest's perspective, it just looks like an I/O request is hanging and has not
returned. With this feature enabled, the guest can resume running smoothly
once the I/O is recovered.
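
As a rough illustration of the idea (the identifiers below are made up for
this example and are not the exact ones used in the patches), the rehandle
logic wraps the AIO completion callback and, on EIO, queues the request for
a timed retry instead of completing it:

    /* Sketch only: BlkRehandleAIOCB, rehandle_list, rehandle_timer and
     * RETRY_INTERVAL_MS are illustrative names, not the patch code. */
    static void blk_rehandle_aio_complete(void *opaque, int ret)
    {
        BlkRehandleAIOCB *acb = opaque;
        BlockBackend *blk = acb->blk;

        if (ret == -EIO && !blk_rehandle_timeout_expired(blk)) {
            /* Hold the request instead of failing it: from the guest's
             * point of view the I/O simply has not completed yet. */
            QTAILQ_INSERT_TAIL(&blk->rehandle_list, acb, list);
            timer_mod(blk->rehandle_timer,
                      qemu_clock_get_ms(QEMU_CLOCK_REALTIME) +
                      RETRY_INTERVAL_MS);
            return;
        }

        /* Success, a non-EIO error, or rehandle timeout: complete the
         * original request as usual. */
        acb->common.cb(acb->common.opaque, ret);
    }

When the timer fires, each queued AIO is resubmitted with its original
parameters; once the backend storage recovers, the retries succeed and the
guest never sees the EIO.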

v2->v3:
* Add a doc to describe I/O hang.

v1->v2:
* Rebase to fix compile problems.
* Fix incorrect remove of rehandle list.
* Provide rehandle pause interface.

Jiahui Cen (9):
  block-backend: introduce I/O rehandle info
  block-backend: rehandle block aios when EIO
  block-backend: add I/O hang timeout
  block-backend: add I/O rehandle pause/unpause
  block-backend: enable I/O hang when timeout is set
  virtio-blk: pause I/O hang when resetting
  qemu-option: add I/O hang timeout option
  qapi: add I/O hang and I/O hang timeout qapi event
  docs: add a doc about I/O hang

 block/block-backend.c          | 300 +++++++++++++++++++++++++++++++++
 blockdev.c                     |  11 ++
 docs/io-hang.rst               |  45 +++++
 hw/block/virtio-blk.c          |   8 +
 include/sysemu/block-backend.h |   5 +
 qapi/block-core.json           |  26 +++
 6 files changed, 395 insertions(+)
 create mode 100644 docs/io-hang.rst

Comments

Stefan Hajnoczi Oct. 26, 2020, 4:53 p.m. UTC | #1
On Thu, Oct 22, 2020 at 09:02:54PM +0800, Jiahui Cen wrote:
> A VM in a cloud environment may use a virtual disk as its backend storage,
> and there are usually filesystems on the virtual block device. When the
> backend storage is temporarily down, any I/O issued to the virtual block
> device will cause an error. For example, an error in the ext4 filesystem
> makes the filesystem read-only. However, cloud backend storage can often be
> recovered quickly: an IP-SAN may go down due to a network failure and come
> back online soon after the network is recovered. The error in the
> filesystem, though, may not be recovered unless the device is reattached or
> the system is restarted. So an I/O rehandle mechanism is needed to implement
> self-healing.
> 
> This patch series proposes a feature called I/O hang. It can rehandle AIOs
> that fail with EIO without sending the error back to the guest. From the
> guest's perspective, it just looks like an I/O request is hanging and has
> not returned. With this feature enabled, the guest can resume running
> smoothly once the I/O is recovered.

Hi,
This feature seems like an extension of the existing -drive
rerror=/werror= parameters:

  werror=action,rerror=action
      Specify which action to take on write and read errors. Valid
      actions are: "ignore" (ignore the error and try to continue),
      "stop" (pause QEMU), "report" (report the error to the guest),
      "enospc" (pause QEMU only if the host disk is full; report the
      error to the guest otherwise).  The default setting is
      werror=enospc and rerror=report.

That mechanism already has a list of requests to retry and live
migration integration. Using the werror=/rerror= mechanism would avoid
code duplication between these features. You could add a
werror/rerror=retry error action for this feature.
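
For example, a guest could then be started with something like the following
(hypothetical syntax, assuming the new action name is accepted):

    -drive file=disk.qcow2,format=qcow2,werror=retry,rerror=retry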

Does that sound good?

Stefan
Jiahui Cen Oct. 29, 2020, 9:42 a.m. UTC | #2
On 2020/10/27 0:53, Stefan Hajnoczi wrote:
> On Thu, Oct 22, 2020 at 09:02:54PM +0800, Jiahui Cen wrote:
>> [...]
> 
> Hi,
> This feature seems like an extension of the existing -drive
> rerror=/werror= parameters:
> 
>   werror=action,rerror=action
>       Specify which action to take on write and read errors. Valid
>       actions are: "ignore" (ignore the error and try to continue),
>       "stop" (pause QEMU), "report" (report the error to the guest),
>       "enospc" (pause QEMU only if the host disk is full; report the
>       error to the guest otherwise).  The default setting is
>       werror=enospc and rerror=report.
> 
> That mechanism already has a list of requests to retry and live
> migration integration. Using the werror=/rerror= mechanism would avoid
> code duplication between these features. You could add a
> werror/rerror=retry error action for this feature.
> 
> Does that sound good?
> 
> Stefan
> 

Hi Stefan,

Thanks for your reply. Extending the rerror=/werror= mechanism is a feasible
way to implement the retry feature.

However, AFAIK, the rerror=/werror= mechanism in the block-backend layer only
provides the action to take, and the actual error handling has to be
implemented separately in the device layer for each type of device. Our I/O
hang mechanism, by contrast, handles AIO errors directly, regardless of the
device type. Wouldn't the block-backend layer be a more generic place to
implement the feature? In particular, the retry timeout can be kept in the
common BlockBackend structure.
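
For instance, the current pattern (paraphrased from hw/block/virtio-blk.c;
other devices repeat a similar handler) is that the block layer only picks
the action, while the device both keeps the failed request and carries out
the action itself:

    static int virtio_blk_handle_rw_error(VirtIOBlockReq *req, int error,
                                          bool is_read)
    {
        VirtIOBlock *s = req->dev;
        BlockErrorAction action = blk_get_error_action(s->blk, is_read,
                                                       error);

        if (action == BLOCK_ERROR_ACTION_STOP) {
            /* The device itself must remember the failed request. */
            req->next = s->rq;
            s->rq = req;
        } else if (action == BLOCK_ERROR_ACTION_REPORT) {
            virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
        }

        blk_error_action(s->blk, action, is_read, error);
        return action != BLOCK_ERROR_ACTION_IGNORE;
    }

Each device type needs its own version of this, whereas rehandling in
BlockBackend would be implemented once for all of them.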

Besides, is there any reason why QEMU implements the rerror=/werror=
mechanism in the device layer rather than in the block-backend layer?

Jiahui
Stefan Hajnoczi Oct. 30, 2020, 1:21 p.m. UTC | #3
On Thu, Oct 29, 2020 at 05:42:42PM +0800, cenjiahui wrote:
> 
> [...]
> 
> Hi Stefan,
> 
> Thanks for your reply. Extending the rerror=/werror= mechanism is a feasible
> way to implement the retry feature.
> 
> However, AFAIK, the rerror=/werror= mechanism in the block-backend layer only
> provides the action to take, and the actual error handling has to be
> implemented separately in the device layer for each type of device. Our I/O
> hang mechanism, by contrast, handles AIO errors directly, regardless of the
> device type. Wouldn't the block-backend layer be a more generic place to
> implement the feature? In particular, the retry timeout can be kept in the
> common BlockBackend structure.
> 
> Besides, is there any reason why QEMU implements the rerror=/werror=
> mechanism in the device layer rather than in the block-backend layer?

Yes, it's because failed requests can be live-migrated and retried on
the destination host. In other words, live migration still works even
when there are failed requests.

There may be things that can be refactored so there is less duplication
in devices, but the basic design goal is that the block layer doesn't
keep track of failed requests because they are live migrated together
with the device state.
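
Roughly (simplified from hw/block/virtio-blk.c, so the details may differ),
the failed requests sit on a per-device list that is serialized as part of
the device's vmstate and retried after migration:

    /* On save: write out each failed request along with the device. */
    static void virtio_blk_save_device(VirtIODevice *vdev, QEMUFile *f)
    {
        VirtIOBlock *s = VIRTIO_BLK(vdev);
        VirtIOBlockReq *req = s->rq;

        while (req) {
            qemu_put_sbyte(f, 1);   /* one more pending request follows */
            /* ... request/virtqueue element state ... */
            req = req->next;
        }
        qemu_put_sbyte(f, 0);       /* end of the request list */
    }

On the destination the list is rebuilt on load, and the requests are
resubmitted when the VM resumes. A retry list kept privately inside
BlockBackend would be invisible to this machinery.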

Maybe Kevin Wolf has more thoughts to share about rerror=/werror=.

Stefan
Jiahui Cen Nov. 3, 2020, 12:19 p.m. UTC | #4
On 2020/10/30 21:21, Stefan Hajnoczi wrote:
> [...]
> 
> Yes, it's because failed requests can be live-migrated and retried on
> the destination host. In other words, live migration still works even
> when there are failed requests.
> 
> There may be things that can be refactored so there is less duplication
> in devices, but the basic design goal is that the block layer doesn't
> keep track of failed requests because they are live migrated together
> with the device state.
> 
> Maybe Kevin Wolf has more thoughts to share about rerror=/werror=.
> 
> Stefan
> 

Hi Kevin,

What do you think about extending rerror=/werror= for the retry feature?

And which is the better place to keep the retry timeout: the common
BlockBackend structure in the block layer, or the per-device structure in
the device layer?

Jiahui