[v3,0/4] support NVMe smart critial warning injection


Message ID

20210114072251.334304-1-pizhenwei@bytedance.com (mailing list archive)

Headers

DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 34219207AE
From: zhenwei pi <pizhenwei@bytedance.com>
To: kbusch@kernel.org, its@irrelevant.dk, kwolf@redhat.com, mreitz@redhat.com
Subject: [PATCH v3 0/4] support NVMe smart critial warning injection
Date: Thu, 14 Jan 2021 15:22:47 +0800
Message-Id: <20210114072251.334304-1-pizhenwei@bytedance.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Received-SPF: pass client-ip=2607:f8b0:4864:20::52a;
 envelope-from=pizhenwei@bytedance.com; helo=mail-pg1-x52a.google.com
X-Spam_score_int: -18
X-Spam_score: -1.9
X-Spam_bar: -
X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,
 DKIM_VALID=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001,
 SPF_PASS=-0.001 autolearn=unavailable autolearn_force=no
X-Spam_action: no action
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
 <mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <https://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
 <mailto:qemu-devel-request@nongnu.org?subject=subscribe>
Cc: zhenwei pi <pizhenwei@bytedance.com>, philmd@redhat.com,
 qemu-devel@nongnu.org, qemu-block@nongnu.org
Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org
Sender: "Qemu-devel"
 <qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org>

Message

zhenwei pi Jan. 14, 2021, 7:22 a.m. UTC

v2 -> v3:
- Introduce the "Persistent Memory Region has become read-only or
  unreliable" critical warning bit.

- Fix overwritten bar.cap.

- Check the smart critical warning value set through QOM.

- Trigger an asynchronous event during smart warning injection.

v1 -> v2:
- As suggested by Philippe and Klaus, set/get smart_critical_warning via QMP.

v1:
- Add a smart_critical_warning property to the nvme device, which can be
  set on the QEMU command line to emulate a hardware error.

Zhenwei Pi (4):
  block/nvme: introduce bit 5 for critical warning
  hw/block/nvme: fix overwritten bar.cap
  hw/block/nvme: add smart_critical_warning property
  hw/blocl/nvme: trigger async event during injecting smart warning

 hw/block/nvme.c      | 86 ++++++++++++++++++++++++++++++++++++++++----
 hw/block/nvme.h      |  1 +
 include/block/nvme.h |  1 +
 3 files changed, 81 insertions(+), 7 deletions(-)

Comments

Klaus Jensen Jan. 14, 2021, 8:23 a.m. UTC | #1

On Jan 14 15:22, zhenwei pi wrote:
> During smart critical warning injection by setting property from QMP
> command, also try to trigger asynchronous event.
> 
> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
> ---
>  hw/block/nvme.c | 47 ++++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 40 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index ce9a9c9023..1feb603471 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -847,6 +847,36 @@ static void nvme_enqueue_event(NvmeCtrl *n, uint8_t event_type,
>      nvme_process_aers(n);
>  }
>  
> +static void nvme_enqueue_smart_event(NvmeCtrl *n, uint8_t event)

Maybe rename it to just nvme_smart_event, since whether it enqueues
anything is conditional.

> +{
> +    uint8_t aer_info;
> +
> +    if (!(NVME_AEC_SMART(n->features.async_config) & event)) {
> +        return;
> +    }
> +
> +    /* Ref SPEC <Asynchronous Event Information ??? SMART / Health Status> */
> +    switch (event) {
> +    case NVME_SMART_SPARE:
> +        aer_info = NVME_AER_INFO_SMART_SPARE_THRESH;
> +        break;
> +    case NVME_SMART_TEMPERATURE:
> +        aer_info = NVME_AER_INFO_SMART_TEMP_THRESH;
> +        break;
> +    case NVME_SMART_RELIABILITY:
> +    case NVME_SMART_MEDIA_READ_ONLY:
> +    case NVME_SMART_FAILED_VOLATILE_MEDIA:
> +        aer_info = NVME_AER_INFO_SMART_RELIABILITY;
> +        break;
> +    case NVME_SMART_PMR_UNRELIABLE:
> +        /* TODO if NVME_SMART_PMR_UNRELIABLE is defined in future */

Doesn't NVME_SMART_PMR_UNRELIABLE fall under the
NVME_AER_INFO_SMART_RELIABILITY SMART/Health information group? The spec
says that the PMR becoming unreliable can cause an AEN, so I think that
is the only group that is usable.
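
To illustrate the grouping suggested here, the following is a minimal,
standalone sketch and not the QEMU code. The helper name
smart_event_to_aer_info is hypothetical, and the enum values are
assumptions taken from the bit positions and event information codes the
thread refers to. It treats the PMR bit like the other reliability-type
warnings instead of leaving it as a TODO.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* SMART critical-warning bits. Names follow the quoted patch; the values
 * are assumed from the bit positions discussed in the thread. */
enum {
    NVME_SMART_SPARE                 = 1 << 0,
    NVME_SMART_TEMPERATURE           = 1 << 1,
    NVME_SMART_RELIABILITY           = 1 << 2,
    NVME_SMART_MEDIA_READ_ONLY       = 1 << 3,
    NVME_SMART_FAILED_VOLATILE_MEDIA = 1 << 4,
    NVME_SMART_PMR_UNRELIABLE        = 1 << 5,
};

/* Asynchronous event information codes for SMART / Health status
 * (assumed values). */
enum {
    NVME_AER_INFO_SMART_RELIABILITY  = 0x00,
    NVME_AER_INFO_SMART_TEMP_THRESH  = 0x01,
    NVME_AER_INFO_SMART_SPARE_THRESH = 0x02,
};

/* Hypothetical helper: map one critical-warning bit to its event
 * information code. Returns false if 'event' is not a single known bit. */
static bool smart_event_to_aer_info(uint8_t event, uint8_t *aer_info)
{
    assert(aer_info != NULL);

    switch (event) {
    case NVME_SMART_SPARE:
        *aer_info = NVME_AER_INFO_SMART_SPARE_THRESH;
        return true;
    case NVME_SMART_TEMPERATURE:
        *aer_info = NVME_AER_INFO_SMART_TEMP_THRESH;
        return true;
    case NVME_SMART_RELIABILITY:
    case NVME_SMART_MEDIA_READ_ONLY:
    case NVME_SMART_FAILED_VOLATILE_MEDIA:
    case NVME_SMART_PMR_UNRELIABLE:
        /* PMR unreliable grouped with the reliability warnings. */
        *aer_info = NVME_AER_INFO_SMART_RELIABILITY;
        return true;
    default:
        return false;
    }
}
```

In the patch itself the same effect would come from moving the
NVME_SMART_PMR_UNRELIABLE case label up into the reliability group.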

> +    default:
> +        return;
> +    }
> +
> +    nvme_enqueue_event(n, NVME_AER_TYPE_SMART, aer_info, NVME_LOG_SMART_INFO);
> +}
> +
>  static void nvme_clear_events(NvmeCtrl *n, uint8_t event_type)
>  {
>      n->aer_mask &= ~(1 << event_type);
> @@ -1824,12 +1854,9 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeRequest *req)
>              return NVME_INVALID_FIELD | NVME_DNR;
>          }
>  
> -        if (((n->temperature >= n->features.temp_thresh_hi) ||
> -             (n->temperature <= n->features.temp_thresh_low)) &&
> -            NVME_AEC_SMART(n->features.async_config) & NVME_SMART_TEMPERATURE) {
> -            nvme_enqueue_event(n, NVME_AER_TYPE_SMART,
> -                               NVME_AER_INFO_SMART_TEMP_THRESH,
> -                               NVME_LOG_SMART_INFO);
> +        if ((n->temperature >= n->features.temp_thresh_hi) ||
> +             (n->temperature <= n->features.temp_thresh_low)) {
> +            nvme_enqueue_smart_event(n, NVME_SMART_TEMPERATURE);
>          }
>  
>          break;
> @@ -2841,7 +2868,7 @@ static void nvme_set_smart_warning(Object *obj, Visitor *v, const char *name,
>                                     void *opaque, Error **errp)
>  {
>      NvmeCtrl *s = NVME(obj);
> -    uint8_t value, cap = 0;
> +    uint8_t value, cap = 0, event;
>      uint64_t pmr_cap = CAP_PMR_MASK;
>  
>      if (!visit_type_uint8(v, name, &value, errp)) {
> @@ -2860,6 +2887,12 @@ static void nvme_set_smart_warning(Object *obj, Visitor *v, const char *name,
>      }
>  
>      s->smart_critical_warning = value;
> +
> +    /* test each bit of uint8_t for smart.critical_warning */
> +    for (event = 0; event < 8; event++) {
> +        if (value & (1 << event))
> +            nvme_enqueue_smart_event(s, 1 << event);
> +    }

I suggest you add an NVME_SMART_WARN_MAX to the NvmeSmartWarn enum with
value '6' and use that instead of the literal '8'.
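
A rough sketch of what the loop could look like with such a bound,
written as standalone C rather than as a change to nvme.c. The
SmartEventLog type and the functions record_smart_event and
raise_smart_events are hypothetical stand-ins for the controller and for
nvme_enqueue_smart_event; only NVME_SMART_WARN_MAX with value 6 comes
from the suggestion above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of defined critical-warning bits (bits 0..5), as suggested. */
enum { NVME_SMART_WARN_MAX = 6 };

/* Hypothetical stand-in for the controller: records raised events. */
typedef struct {
    uint8_t raised[NVME_SMART_WARN_MAX];
    size_t count;
} SmartEventLog;

/* Hypothetical stand-in for nvme_enqueue_smart_event(). */
static void record_smart_event(SmartEventLog *log, uint8_t event)
{
    assert(log != NULL);
    if (log->count < NVME_SMART_WARN_MAX) {
        log->raised[log->count++] = event;
    }
}

/* Walk only the defined bits of 'value' and raise one event per set bit. */
static void raise_smart_events(SmartEventLog *log, uint8_t value)
{
    for (unsigned bit = 0; bit < NVME_SMART_WARN_MAX; bit++) {
        uint8_t event = (uint8_t)(1u << bit);

        if (value & event) {
            record_smart_event(log, event);
        }
    }
}
```

With this bound, the two reserved high bits of the byte are never turned
into events, even if a caller passes 0xff.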

>  }
>  
>  static const VMStateDescription nvme_vmstate = {
> -- 
> 2.25.1
> 
>

Philippe Mathieu-Daudé Jan. 14, 2021, 3:57 p.m. UTC | #2

On 1/14/21 8:22 AM, zhenwei pi wrote:
> During smart critical warning injection by setting property from QMP
> command, also try to trigger asynchronous event.
> 
> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
> ---
>  hw/block/nvme.c | 47 ++++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 40 insertions(+), 7 deletions(-)
...
> +static void nvme_enqueue_smart_event(NvmeCtrl *n, uint8_t event)
> +{
> +    uint8_t aer_info;
> +
> +    if (!(NVME_AEC_SMART(n->features.async_config) & event)) {
> +        return;
> +    }
> +
> +    /* Ref SPEC <Asynchronous Event Information Ã¢â‚¬â€œ SMART / Health Status> */

Mojibake UTF-8 encoding problem?

Keith Busch Jan. 14, 2021, 10:29 p.m. UTC | #3

On Thu, Jan 14, 2021 at 03:22:51PM +0800, zhenwei pi wrote:
> @@ -2860,6 +2887,12 @@ static void nvme_set_smart_warning(Object *obj, Visitor *v, const char *name,
>      }
>  
>      s->smart_critical_warning = value;
> +
> +    /* test each bit of uint8_t for smart.critical_warning */
> +    for (event = 0; event < 8; event++) {
> +        if (value & (1 << event))
> +            nvme_enqueue_smart_event(s, 1 << event);

I think you need to save the events that have already been raised with
the host so that you don't send duplicate responses every time a new
event is added to the 'critical_warning'.
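
One possible reading of this suggestion, sketched as standalone C: keep
the previous warning value and raise events only for bits that have newly
become set. The FakeCtrl type and fake_set_smart_warning are hypothetical
and only model the bookkeeping; how the real controller should track
already-raised events is not settled in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced model of the controller state. */
typedef struct {
    uint8_t smart_critical_warning; /* last value written via the property */
    unsigned events_raised;         /* events handed to the host so far */
} FakeCtrl;

/* Store the new warning value and raise events only for bits that were
 * clear before and are set now. Returns the mask of newly set bits. */
static uint8_t fake_set_smart_warning(FakeCtrl *c, uint8_t value)
{
    assert(c != NULL);

    uint8_t newly_set = (uint8_t)(value & ~c->smart_critical_warning);

    c->smart_critical_warning = value;

    for (unsigned bit = 0; bit < 8; bit++) {
        if (newly_set & (1u << bit)) {
            c->events_raised++; /* real code would enqueue an event here */
        }
    }

    return newly_set;
}
```

Under this scheme, writing 0x02 and then 0x06 raises one event for each
write, and writing 0x06 again raises none. A bit that is cleared and later
set again would raise a new event.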