
[v9,09/10] hw/nvme: add reservation protocol command

Message ID 20240712023650.45626-10-luchangqi.123@bytedance.com (mailing list archive)
State New, archived
Series: Support persistent reservation operations

Commit Message

Changqi Lu July 12, 2024, 2:36 a.m. UTC
Add the reservation acquire, reservation register,
reservation release and reservation report commands
to the nvme device layer.

These commands enable the nvme device to perform
reservation-related tasks, including querying keys,
querying reservation status, registering reservation
keys, acquiring and releasing reservations, as well
as clearing and preempting reservations held by
other keys.

These commands are crucial for the persistent
management and control of shared storage resources.

Signed-off-by: Changqi Lu <luchangqi.123@bytedance.com>
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Acked-by: Klaus Jensen <k.jensen@samsung.com>
---
 hw/nvme/ctrl.c       | 318 +++++++++++++++++++++++++++++++++++++++++++
 hw/nvme/nvme.h       |   4 +
 include/block/nvme.h |  37 +++++
 3 files changed, 359 insertions(+)

Comments

Klaus Jensen July 15, 2024, 10:09 a.m. UTC | #1
On Jul 12 10:36, Changqi Lu wrote:
> Add reservation acquire, reservation register,
> reservation release and reservation report commands
> in the nvme device layer.
> 
> By introducing these commands, this enables the nvme
> device to perform reservation-related tasks, including
> querying keys, querying reservation status, registering
> reservation keys, initiating and releasing reservations,
> as well as clearing and preempting reservations held by
> other keys.
> 
> These commands are crucial for management and control of
> shared storage resources in a persistent manner.
> Signed-off-by: Changqi Lu <luchangqi.123@bytedance.com>
> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
> Acked-by: Klaus Jensen <k.jensen@samsung.com>
> ---
>  hw/nvme/ctrl.c       | 318 +++++++++++++++++++++++++++++++++++++++++++
>  hw/nvme/nvme.h       |   4 +
>  include/block/nvme.h |  37 +++++
>  3 files changed, 359 insertions(+)
> 

This looks good to me, but two comments.

  1. Do you have a small guide on how to get a simple test environment
     up for this?

  2. Can you touch on the justification for not supporting the remaining
     mandatory features required when indicating Reservation support?

     hw/nvme *does* compromise on mandatory features from time to time
     when it makes sense, so I'm not saying that not having the log
     pages, notifications and so on is a deal breaker, I'm just
     interested in the justification and/or motivation.

     For instance, I think the SPDK reserve test will fail on this due
     to lack of Host Identifier Feature support.
Changqi Lu July 17, 2024, 9:04 a.m. UTC | #2
Hi,


Thank you for your support.
1. I will add a guide on how to set up a simple test environment in
the next patch version.
2. I think the hostid is stored locally like cntlid, but I can't
find a way to get the host ID. Is it correct to get it through the
qmp_query_uuid() method?

Using SPDK as the target will not fail, but the hostid currently
shows as 0.

Changqi Lu July 26, 2024, 2:42 a.m. UTC | #3
Hi,

```
2685         nvme_status->regctl_ds[i].cntlid = nvme_ctrl(req)->cntlid;
2686         nvme_status->regctl_ds[i].rkey = keys_info->keys[i];
2687         nvme_status->regctl_ds[i].rcsts = keys_info->keys[i] ==
2688                                           reservation->key ? 1 : 0;
2689         /* hostid is not supported currently */
2670         memset(&nvme_status->regctl_ds[i].hostid, 0, 8);
```

Klaus, I think the hostid (2685) is stored locally like cntlid; I
can get cntlid via nvme_ctrl(req)->cntlid, but I can't find a good
way to get the host ID (2670). So I added the comment
"/* hostid is not supported currently */". Could you give me
some advice?

Using SPDK as the target will not fail, but the hostid currently
shows as 0. The relevant tests in QEMU are as follows:

```
root@node1:~# nvme resv-report /dev/nvme0n1
NVME Reservation Report success

NVME Reservation status:

gen : 1
regctl : 1
rtype : 0
ptpls : 0
regctl[0] :
cntlid : 0
rcsts : 0
hostid : 0
rkey : 6
```

Klaus Jensen July 26, 2024, 6:25 a.m. UTC | #4
On Jul 25 19:42, 卢长奇 wrote:
> Hi,
> 
> ```
> 2685 nvme_status->regctl_ds[i].cntlid = nvme_ctrl(req)->cntlid;
> 2686 nvme_status->regctl_ds[i].rkey = keys_info->keys[i];
> 2687 nvme_status->regctl_ds[i].rcsts = keys_info->keys[i] ==
> 2688 reservation->key ? 1 : 0;
> 2689 /* hostid is not supported currently */
> 2670 memset(&nvme_status->regctl_ds[i].hostid, 0, 8);
> ```
> 
> Klaus, I think hostid(2685) is stored locally like cntlid, i
> can get cntlid by nvme_ctrl(req)->cntlid, but I can't
> find a good way to get the host ID(2670). So I add a comment
> "/* hostid is not supported currently */". Could you give me
> some advices?
> 

The Host Identifier is just a 64 or 128 bit value that the host can set
with Set Features. So, it is fine (and normal) that the value is
initially zero, but the host should be able to set it on controllers
with Set Features to indicate whether a controller belongs to the same
host or not.

> And using spdk as target will not fail, but it will show 0 at hostid
> at present.

Host Identifier 0 is a valid value when used with reservations; 0
indicates that the host associated with the controller is not associated
with any other controllers in the subsystem. So if two controllers have
Host Identifier set to 0, that implicitly means they are associated with
two different hosts.
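
For illustration, a minimal sketch of how hw/nvme could record the
identifier from a Set Features command so that Reservation Report can
later fill in the hostid fields. The NVME_HOST_IDENTIFIER define, the
handler name and the assumed uint8_t hostid[16] field in NvmeCtrl are
not part of this patch:

```
/*
 * Host Identifier is Feature Identifier 81h; the define, the handler and
 * n->hostid are assumptions made for this sketch only.  It would be
 * dispatched from nvme_set_feature() when fid == NVME_HOST_IDENTIFIER.
 */
#define NVME_HOST_IDENTIFIER 0x81

static uint16_t nvme_set_feature_host_identifier(NvmeCtrl *n, NvmeRequest *req)
{
    uint32_t dw11 = le32_to_cpu(req->cmd.cdw11);
    bool exhid = dw11 & 0x1;        /* 1: 128-bit identifier, 0: 64-bit */
    size_t len = exhid ? 16 : 8;
    uint16_t ret;

    memset(n->hostid, 0, sizeof(n->hostid));

    /* copy the identifier from the host data buffer into controller state */
    ret = nvme_h2c(n, n->hostid, len, req);
    if (ret) {
        return ret;
    }

    return NVME_SUCCESS;
}
```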

> The relevant tests in qemu are as follows,
> 
> ```
> root@node1:~# nvme resv-report /dev/nvme0n1
> NVME Reservation Report success
> 
> NVME Reservation status:
> 
> gen : 1
> regctl : 1
> rtype : 0
> ptpls : 0
> regctl[0] :
> cntlid : 0
> rcsts : 0
> hostid : 0
> rkey : 6
> ```

I was hoping for an example on how to setup some simple iscsi stuff so I
could test the feature.
Changqi Lu July 26, 2024, 9:54 a.m. UTC | #5
Hi;

You can test it with SPDK.
First start SPDK and execute the following commands.

```
dd if=/dev/zero of=test.img bs=1G count=10
RPC=/root/source/spdk/spdk/scripts/rpc.py
FILE=/root/test.img

$RPC bdev_aio_create $FILE aio0 512
$RPC iscsi_create_portal_group 1 127.0.0.1:3260
$RPC iscsi_create_initiator_group 2 ANY ANY
$RPC iscsi_create_target_node target0 target0_alias aio0:0 1:2 64 -d
```

Then start QEMU and mount an NVMe disk.
Execute the following test commands.
```
#report
nvme resv-report /dev/nvme0n1
#register
nvme resv-register /dev/nvme0n1 --nrkey 3 --rrega 0
#unregister
nvme resv-register /dev/nvme0n1 --crkey 3 --rrega 1
# register replace
nvme resv-register /dev/nvme0n1 --crkey 3 --nrkey 5 --rrega 2
#release
nvme resv-release /dev/nvme0n1 --crkey 5 --rtype 1 --rrela 0
#clear
nvme resv-release /dev/nvme0n1 --crkey 5 --rtype 1 --rrela 1
#reserve
nvme resv-acquire /dev/nvme0n1 --crkey 3 --rtype 1 --racqa 0
#preempt
nvme resv-acquire /dev/nvme0n1 --crkey 6 --prkey 3 --rtype 1 --racqa 1
```



Changqi Lu Aug. 6, 2024, 2:56 a.m. UTC | #6
Hi;

Klaus, does the test method in the email above work properly?

Klaus Jensen Aug. 18, 2024, 4:57 p.m. UTC | #7
On Aug  6 04:56, 卢长奇 wrote:
> Hi;
> 
> Klaus, Does the test method in the above email work properly?
> 

I assume it will; I was at a conference and was catching up all of last
week.

I'll take action on this this week.

Klaus Jensen Aug. 28, 2024, 6:05 a.m. UTC | #8
On Jul 26 05:54, 卢长奇 wrote:
> Hi;
> 
> You can test it in spdk.
> First start spdk and execute the following command.
> 
> ```
> dd if=/dev/zero of=test.img bs=1G count=10
> RPC=/root/source/spdk/spdk/scripts/rpc.py
> FILE=/root/test.img
> 
> $RPC bdev_aio_create $FILE aio0 512
> $RPC iscsi_create_portal_group 1 127.0.0.1:3260
> $RPC iscsi_create_initiator_group 2 ANY ANY
> $RPC iscsi_create_target_node target0 target0_alias aio0:0 1:2 64 -d
> ```
> 
> Then start qemu and mount an nvme disk.

Forgive my ignorance when it comes to iSCSI, but can you enlighten me as
to what the name of the target is? I tried a bunch of combinations.

I configured QEMU with

  -drive id=target0,file=iscsi://127.0.0.1/target0/0
  -device nvme,serial=deadbeef,drive=target0

But on the spdk side (iscsi_tgt) I'm getting "target0 not found".

Klaus Jensen Aug. 28, 2024, 6:10 a.m. UTC | #9
On Jul 12 10:36, Changqi Lu wrote:
> Add reservation acquire, reservation register,
> reservation release and reservation report commands
> in the nvme device layer.
> 
> By introducing these commands, this enables the nvme
> device to perform reservation-related tasks, including
> querying keys, querying reservation status, registering
> reservation keys, initiating and releasing reservations,
> as well as clearing and preempting reservations held by
> other keys.
> 
> These commands are crucial for management and control of
> shared storage resources in a persistent manner.
> Signed-off-by: Changqi Lu <luchangqi.123@bytedance.com>
> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
> Acked-by: Klaus Jensen <k.jensen@samsung.com>
> ---
>  hw/nvme/ctrl.c       | 318 +++++++++++++++++++++++++++++++++++++++++++
>  hw/nvme/nvme.h       |   4 +
>  include/block/nvme.h |  37 +++++
>  3 files changed, 359 insertions(+)
> 

> +static int nvme_read_reservation_cb(NvmeReadReservation *reservation)
> +{
> +    int rc;
> +    NvmeReservationStatus *nvme_status;
> +    NvmeRequest *req = reservation->req;
> +    NvmeCtrl *n = req->sq->ctrl;
> +    NvmeResvKeys *keys_info = reservation->keys_info;
> +    int len = sizeof(NvmeReservationStatusHeader) +
> +              sizeof(NvmeRegisteredCtrl) * keys_info->num_keys;
> +
> +    nvme_status = g_malloc(len);
> +    nvme_status->header.gen = reservation->generation;
> +    nvme_status->header.rtype = block_pr_type_to_nvme(reservation->type);
> +    nvme_status->header.regctl = keys_info->num_keys;
> +    for (int i = 0; i < keys_info->num_keys; i++) {
> +        nvme_status->regctl_ds[i].cntlid = nvme_ctrl(req)->cntlid;
> +        nvme_status->regctl_ds[i].rkey = keys_info->keys[i];
> +        nvme_status->regctl_ds[i].rcsts = keys_info->keys[i] ==
> +                                          reservation->key ? 1 : 0;
> +        /* hostid is not supported currently */
> +        memset(&nvme_status->regctl_ds[i].hostid, 0, 8);
> +    }
> +
> +    rc = nvme_c2h(n, (uint8_t *)nvme_status, len, req);
> +    g_free(nvme_status);
> +    return rc;
> +}
> +
> +static int nvme_read_reservation_ext_cb(NvmeReadReservation *reservation)
> +{
> +    int rc;
> +    NvmeReservationStatusExt *nvme_status_ext;
> +    NvmeRequest *req = reservation->req;
> +    NvmeCtrl *n = req->sq->ctrl;
> +    NvmeResvKeys *keys_info = reservation->keys_info;
> +    int len = sizeof(NvmeReservationStatusHeader) +
> +              sizeof(uint8_t) * 40 +
> +              sizeof(NvmeRegisteredCtrlExt) * keys_info->num_keys;
> +
> +    nvme_status_ext = g_malloc(len);

This leaks heap memory due to uninitialized reserved fields in
NvmeReservationStatusExt. Prefer g_malloc0.

The one above in nvme_read_reservation_cb looks safe, but prefer
g_malloc0 there anyway.
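
Concretely, a sketch of just the allocation part of
nvme_read_reservation_ext_cb (the rest of the function is unchanged):

```
    int len = sizeof(NvmeReservationStatusHeader) +
              sizeof(uint8_t) * 40 +
              sizeof(NvmeRegisteredCtrlExt) * keys_info->num_keys;

    /*
     * g_malloc0 zeroes the whole buffer, so the reserved fields in the
     * header, the rsvd24[40] area and the per-entry rsvd3/rsvd32 fields
     * (as well as the otherwise unset ptpls byte) are not copied out to
     * the host uninitialized.
     */
    nvme_status_ext = g_malloc0(len);
```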

> +    nvme_status_ext->header.gen = cpu_to_be32(reservation->generation);
> +    nvme_status_ext->header.rtype = block_pr_type_to_nvme(reservation->type);
> +    nvme_status_ext->header.regctl = cpu_to_be16(keys_info->num_keys);
> +
> +    for (int i = 0; i < keys_info->num_keys; i++) {
> +        uint16_t ctnlid = nvme_ctrl(req)->cntlid;
> +        nvme_status_ext->regctl_eds[i].cntlid = cpu_to_be16(ctnlid);
> +        nvme_status_ext->regctl_eds[i].rkey = cpu_to_be64(keys_info->keys[i]);
> +        nvme_status_ext->regctl_eds[i].rcsts = keys_info->keys[i] ==
> +                                               reservation->key ? 1 : 0;
> +        /* hostid is not supported currently */
> +        memset(&nvme_status_ext->regctl_eds[i].hostid, 0, 16);
> +    }
> +
> +    rc = nvme_c2h(n, (uint8_t *)nvme_status_ext, len, req);
> +    g_free(nvme_status_ext);
> +    return rc;
> +}
Changqi Lu Aug. 28, 2024, 6:27 a.m. UTC | #10
Hi,

Thanks for your advice.
g_malloc0 was used previously, but a segmentation
fault occurred during testing. Debugging then revealed
that one field of struct NvmeReservationStatusExt,
struct NvmeRegisteredCtrlExt regctl_eds[], is a
variable-length (flexible) array, so g_malloc was used
to allocate the buffer once the array length had been
determined, and g_malloc0 was not used here.
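
For reference, a minimal sketch of a g_malloc0 variant that accounts for
the flexible array member, assuming the same length computation as in
the patch; whether it avoids the crash described above would need
retesting:

```
    /*
     * sizeof(NvmeReservationStatusExt) covers only the header and the
     * rsvd24[40] area; the flexible array regctl_eds[] contributes no
     * size of its own, so its entries are added explicitly.
     */
    int len = sizeof(NvmeReservationStatusExt) +
              keys_info->num_keys * sizeof(NvmeRegisteredCtrlExt);

    NvmeReservationStatusExt *nvme_status_ext = g_malloc0(len);
```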

Klaus Jensen Aug. 28, 2024, 6:51 a.m. UTC | #11
On Jul 12 10:36, Changqi Lu wrote:
> Add reservation acquire, reservation register,
> reservation release and reservation report commands
> in the nvme device layer.
> 
> By introducing these commands, this enables the nvme
> device to perform reservation-related tasks, including
> querying keys, querying reservation status, registering
> reservation keys, initiating and releasing reservations,
> as well as clearing and preempting reservations held by
> other keys.
> 
> These commands are crucial for management and control of
> shared storage resources in a persistent manner.
> Signed-off-by: Changqi Lu <luchangqi.123@bytedance.com>
> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
> Acked-by: Klaus Jensen <k.jensen@samsung.com>
> ---
>  hw/nvme/ctrl.c       | 318 +++++++++++++++++++++++++++++++++++++++++++
>  hw/nvme/nvme.h       |   4 +
>  include/block/nvme.h |  37 +++++
>  3 files changed, 359 insertions(+)
> 

> +static uint16_t nvme_resv_register(NvmeCtrl *n, NvmeRequest *req)
> +{
> +    int ret;
> +    NvmeKeyInfo key_info;
> +    NvmeNamespace *ns = req->ns;
> +    uint32_t cdw10 = le32_to_cpu(req->cmd.cdw10);
> +    bool ignore_key = cdw10 >> 3 & 0x1;
> +    uint8_t action = cdw10 & 0x7;
> +    uint8_t ptpl = cdw10 >> 30 & 0x3;
> +    bool aptpl;
> +
> +    switch (ptpl) {
> +    case NVME_RESV_PTPL_NO_CHANGE:
> +        aptpl = (ns->id_ns.rescap & NVME_PR_CAP_PTPL) ? true : false;
> +        break;
> +    case NVME_RESV_PTPL_DISABLE:
> +        aptpl = false;
> +        break;
> +    case NVME_RESV_PTPL_ENABLE:
> +        aptpl = true;
> +        break;
> +    default:
> +        return NVME_INVALID_FIELD;
> +    }
> +
> +    ret = nvme_h2c(n, (uint8_t *)&key_info, sizeof(NvmeKeyInfo), req);
> +    if (ret) {
> +        return ret;
> +    }
> +
> +    switch (action) {
> +    case NVME_RESV_REGISTER_ACTION_REGISTER:
> +        req->aiocb = blk_aio_pr_register(ns->blkconf.blk, 0,
> +                                         key_info.nr_key, 0, aptpl,
> +                                         ignore_key, nvme_misc_cb,
> +                                         req);
> +        break;
> +    case NVME_RESV_REGISTER_ACTION_UNREGISTER:
> +        req->aiocb = blk_aio_pr_register(ns->blkconf.blk, key_info.cr_key, 0,
> +                                         0, aptpl, ignore_key,
> +                                         nvme_misc_cb, req);
> +        break;
> +    case NVME_RESV_REGISTER_ACTION_REPLACE:
> +        req->aiocb = blk_aio_pr_register(ns->blkconf.blk, key_info.cr_key,
> +                                         key_info.nr_key, 0, aptpl, ignore_key,
> +                                         nvme_misc_cb, req);
> +        break;

There should be some check on rescap, I think. On a setup without
reservation support from the block layer, these functions end up
returning ENOTSUP, which causes hw/nvme to return a Write Fault
(which is a little wonky).

Should they return Invalid Field or Invalid Command Opcode?
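
A sketch of what such a check could look like (the helper name is made
up, only rescap from this patch is used, and the controller-level ONCS
Reservations bit would need a similar gate; the status code is the
Invalid Command Opcode that the thread settles on below):

```
static uint16_t nvme_resv_check_support(NvmeNamespace *ns)
{
    /*
     * Reject reservation commands up front when the namespace reports no
     * reservation capabilities, instead of letting the block layer fail
     * with ENOTSUP and surfacing that as a Write Fault.
     */
    if (!ns->id_ns.rescap) {
        return NVME_INVALID_OPCODE | NVME_DNR;
    }

    return NVME_SUCCESS;
}
```

The four handlers (nvme_resv_register/acquire/release/report) would call
this before issuing the block-layer AIO.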
Changqi Lu Aug. 28, 2024, 7:20 a.m. UTC | #12
Hi,

I would like to check that I understand this correctly.

```
static void nvme_aio_err(NvmeRequest *req, int ret)
{
    uint16_t status = NVME_SUCCESS;
    Error *local_err = NULL;

    switch (req->cmd.opcode) {
    case NVME_CMD_READ:
    case NVME_CMD_RESV_REPORT:
        status = NVME_UNRECOVERED_READ;
        break;
    case NVME_CMD_FLUSH:
    case NVME_CMD_WRITE:
    case NVME_CMD_WRITE_ZEROES:
    case NVME_CMD_ZONE_APPEND:
    case NVME_CMD_COPY:
    case NVME_CMD_RESV_REGISTER:
    case NVME_CMD_RESV_ACQUIRE:
    case NVME_CMD_RESV_RELEASE:
        status = NVME_WRITE_FAULT;
        break;
    default:
        status = NVME_INTERNAL_DEV_ERROR;
        break;
    }

    trace_pci_nvme_err_aio(nvme_cid(req), strerror(-ret), status);

    error_setg_errno(&local_err, -ret, "aio failed");
    error_report_err(local_err);

    /*
     * Set the command status code to the first encountered error but
     * allow a subsequent Internal Device Error to trump it.
     */
    if (req->status && status != NVME_INTERNAL_DEV_ERROR) {
        return;
    }

    req->status = status;
}
```
In the case above, if a PR-related command fails because the operation
is not supported, an invalid status code should be returned instead of
the Write Fault status code.


Klaus Jensen Aug. 28, 2024, 7:27 a.m. UTC | #13
On Aug 28 00:20, 卢长奇 wrote:
> In the above use case, if it is a pr-related command and the error code
> is not supported, the invalid error code should be returned instead of
> the Fault error code.
> 

Yes, as far as I can tell from the spec, if a Reservations-related
command is issued on a controller/namespace that does not support
Reservations in BOTH ONCS and RESCAP, then return Invalid Command
Opcode.

Patch

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index ad212de723..b6e8deb40d 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -294,6 +294,10 @@  static const uint32_t nvme_cse_iocs_nvm[256] = {
     [NVME_CMD_COMPARE]              = NVME_CMD_EFF_CSUPP,
     [NVME_CMD_IO_MGMT_RECV]         = NVME_CMD_EFF_CSUPP,
     [NVME_CMD_IO_MGMT_SEND]         = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
+    [NVME_CMD_RESV_REGISTER]        = NVME_CMD_EFF_CSUPP,
+    [NVME_CMD_RESV_REPORT]          = NVME_CMD_EFF_CSUPP,
+    [NVME_CMD_RESV_ACQUIRE]         = NVME_CMD_EFF_CSUPP,
+    [NVME_CMD_RESV_RELEASE]         = NVME_CMD_EFF_CSUPP,
 };
 
 static const uint32_t nvme_cse_iocs_zoned[256] = {
@@ -308,6 +312,10 @@  static const uint32_t nvme_cse_iocs_zoned[256] = {
     [NVME_CMD_ZONE_APPEND]          = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
     [NVME_CMD_ZONE_MGMT_SEND]       = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
     [NVME_CMD_ZONE_MGMT_RECV]       = NVME_CMD_EFF_CSUPP,
+    [NVME_CMD_RESV_REGISTER]        = NVME_CMD_EFF_CSUPP,
+    [NVME_CMD_RESV_REPORT]          = NVME_CMD_EFF_CSUPP,
+    [NVME_CMD_RESV_ACQUIRE]         = NVME_CMD_EFF_CSUPP,
+    [NVME_CMD_RESV_RELEASE]         = NVME_CMD_EFF_CSUPP,
 };
 
 static void nvme_process_sq(void *opaque);
@@ -1745,6 +1753,7 @@  static void nvme_aio_err(NvmeRequest *req, int ret)
 
     switch (req->cmd.opcode) {
     case NVME_CMD_READ:
+    case NVME_CMD_RESV_REPORT:
         status = NVME_UNRECOVERED_READ;
         break;
     case NVME_CMD_FLUSH:
@@ -1752,6 +1761,9 @@  static void nvme_aio_err(NvmeRequest *req, int ret)
     case NVME_CMD_WRITE_ZEROES:
     case NVME_CMD_ZONE_APPEND:
     case NVME_CMD_COPY:
+    case NVME_CMD_RESV_REGISTER:
+    case NVME_CMD_RESV_ACQUIRE:
+    case NVME_CMD_RESV_RELEASE:
         status = NVME_WRITE_FAULT;
         break;
     default:
@@ -2692,6 +2704,304 @@  static uint16_t nvme_verify(NvmeCtrl *n, NvmeRequest *req)
     return NVME_NO_COMPLETE;
 }
 
+typedef struct NvmeKeyInfo {
+    uint64_t cr_key;
+    uint64_t nr_key;
+} NvmeKeyInfo;
+
+static uint16_t nvme_resv_register(NvmeCtrl *n, NvmeRequest *req)
+{
+    int ret;
+    NvmeKeyInfo key_info;
+    NvmeNamespace *ns = req->ns;
+    uint32_t cdw10 = le32_to_cpu(req->cmd.cdw10);
+    bool ignore_key = cdw10 >> 3 & 0x1;
+    uint8_t action = cdw10 & 0x7;
+    uint8_t ptpl = cdw10 >> 30 & 0x3;
+    bool aptpl;
+
+    switch (ptpl) {
+    case NVME_RESV_PTPL_NO_CHANGE:
+        aptpl = (ns->id_ns.rescap & NVME_PR_CAP_PTPL) ? true : false;
+        break;
+    case NVME_RESV_PTPL_DISABLE:
+        aptpl = false;
+        break;
+    case NVME_RESV_PTPL_ENABLE:
+        aptpl = true;
+        break;
+    default:
+        return NVME_INVALID_FIELD;
+    }
+
+    ret = nvme_h2c(n, (uint8_t *)&key_info, sizeof(NvmeKeyInfo), req);
+    if (ret) {
+        return ret;
+    }
+
+    switch (action) {
+    case NVME_RESV_REGISTER_ACTION_REGISTER:
+        req->aiocb = blk_aio_pr_register(ns->blkconf.blk, 0,
+                                         key_info.nr_key, 0, aptpl,
+                                         ignore_key, nvme_misc_cb,
+                                         req);
+        break;
+    case NVME_RESV_REGISTER_ACTION_UNREGISTER:
+        req->aiocb = blk_aio_pr_register(ns->blkconf.blk, key_info.cr_key, 0,
+                                         0, aptpl, ignore_key,
+                                         nvme_misc_cb, req);
+        break;
+    case NVME_RESV_REGISTER_ACTION_REPLACE:
+        req->aiocb = blk_aio_pr_register(ns->blkconf.blk, key_info.cr_key,
+                                         key_info.nr_key, 0, aptpl, ignore_key,
+                                         nvme_misc_cb, req);
+        break;
+    default:
+        return NVME_INVALID_FIELD;
+    }
+
+    return NVME_NO_COMPLETE;
+}
+
+static uint16_t nvme_resv_release(NvmeCtrl *n, NvmeRequest *req)
+{
+    int ret;
+    uint64_t cr_key;
+    NvmeNamespace *ns = req->ns;
+    uint32_t cdw10 = le32_to_cpu(req->cmd.cdw10);
+    uint8_t action = cdw10 & 0x7;
+    NvmeResvType type = cdw10 >> 8 & 0xff;
+
+    ret = nvme_h2c(n, (uint8_t *)&cr_key, sizeof(cr_key), req);
+    if (ret) {
+        return ret;
+    }
+
+    switch (action) {
+    case NVME_RESV_RELEASE_ACTION_RELEASE:
+        req->aiocb = blk_aio_pr_release(ns->blkconf.blk, cr_key,
+                                        nvme_pr_type_to_block(type),
+                                        nvme_misc_cb, req);
+        break;
+    case NVME_RESV_RELEASE_ACTION_CLEAR:
+        req->aiocb = blk_aio_pr_clear(ns->blkconf.blk, cr_key,
+                                      nvme_misc_cb, req);
+        break;
+    default:
+        return NVME_INVALID_FIELD;
+    }
+
+    return NVME_NO_COMPLETE;
+}
+
+static uint16_t nvme_resv_acquire(NvmeCtrl *n, NvmeRequest *req)
+{
+    int ret;
+    NvmeKeyInfo key_info;
+    NvmeNamespace *ns = req->ns;
+    uint32_t cdw10 = le32_to_cpu(req->cmd.cdw10);
+    uint8_t action = cdw10 & 0x7;
+    NvmeResvType type = cdw10 >> 8 & 0xff;
+
+    ret = nvme_h2c(n, (uint8_t *)&key_info, sizeof(NvmeKeyInfo), req);
+    if (ret) {
+        return ret;
+    }
+
+    switch (action) {
+    case NVME_RESV_ACQUIRE_ACTION_ACQUIRE:
+        req->aiocb = blk_aio_pr_reserve(ns->blkconf.blk, key_info.cr_key,
+                                        nvme_pr_type_to_block(type),
+                                        nvme_misc_cb, req);
+        break;
+    case NVME_RESV_ACQUIRE_ACTION_PREEMPT:
+        req->aiocb = blk_aio_pr_preempt(ns->blkconf.blk,
+                     key_info.cr_key, key_info.nr_key,
+                     nvme_pr_type_to_block(type),
+                     false, nvme_misc_cb, req);
+        break;
+    case NVME_RESV_ACQUIRE_ACTION_PREEMPT_AND_ABORT:
+        req->aiocb = blk_aio_pr_preempt(ns->blkconf.blk, key_info.cr_key,
+                                        key_info.nr_key, type, true,
+                                        nvme_misc_cb, req);
+        break;
+    default:
+        return NVME_INVALID_FIELD;
+    }
+
+    return NVME_NO_COMPLETE;
+}
+
+typedef struct NvmeResvKeys {
+    uint32_t generation;
+    uint32_t num_keys;
+    uint64_t *keys;
+    NvmeRequest *req;
+} NvmeResvKeys;
+
+typedef struct NvmeReadReservation {
+    uint32_t generation;
+    uint64_t key;
+    BlockPrType type;
+    NvmeRequest *req;
+    NvmeResvKeys *keys_info;
+} NvmeReadReservation;
+
+static int nvme_read_reservation_cb(NvmeReadReservation *reservation)
+{
+    int rc;
+    NvmeReservationStatus *nvme_status;
+    NvmeRequest *req = reservation->req;
+    NvmeCtrl *n = req->sq->ctrl;
+    NvmeResvKeys *keys_info = reservation->keys_info;
+    int len = sizeof(NvmeReservationStatusHeader) +
+              sizeof(NvmeRegisteredCtrl) * keys_info->num_keys;
+
+    nvme_status = g_malloc(len);
+    nvme_status->header.gen = reservation->generation;
+    nvme_status->header.rtype = block_pr_type_to_nvme(reservation->type);
+    nvme_status->header.regctl = keys_info->num_keys;
+    for (int i = 0; i < keys_info->num_keys; i++) {
+        nvme_status->regctl_ds[i].cntlid = nvme_ctrl(req)->cntlid;
+        nvme_status->regctl_ds[i].rkey = keys_info->keys[i];
+        nvme_status->regctl_ds[i].rcsts = keys_info->keys[i] ==
+                                          reservation->key ? 1 : 0;
+        /* hostid is not supported currently */
+        memset(&nvme_status->regctl_ds[i].hostid, 0, 8);
+    }
+
+    rc = nvme_c2h(n, (uint8_t *)nvme_status, len, req);
+    g_free(nvme_status);
+    return rc;
+}
+
+static int nvme_read_reservation_ext_cb(NvmeReadReservation *reservation)
+{
+    int rc;
+    NvmeReservationStatusExt *nvme_status_ext;
+    NvmeRequest *req = reservation->req;
+    NvmeCtrl *n = req->sq->ctrl;
+    NvmeResvKeys *keys_info = reservation->keys_info;
+    int len = sizeof(NvmeReservationStatusHeader) +
+              sizeof(uint8_t) * 40 +
+              sizeof(NvmeRegisteredCtrlExt) * keys_info->num_keys;
+
+    nvme_status_ext = g_malloc(len);
+    nvme_status_ext->header.gen = cpu_to_be32(reservation->generation);
+    nvme_status_ext->header.rtype = block_pr_type_to_nvme(reservation->type);
+    nvme_status_ext->header.regctl = cpu_to_be16(keys_info->num_keys);
+
+    for (int i = 0; i < keys_info->num_keys; i++) {
+        uint16_t ctnlid = nvme_ctrl(req)->cntlid;
+        nvme_status_ext->regctl_eds[i].cntlid = cpu_to_be16(ctnlid);
+        nvme_status_ext->regctl_eds[i].rkey = cpu_to_be64(keys_info->keys[i]);
+        nvme_status_ext->regctl_eds[i].rcsts = keys_info->keys[i] ==
+                                               reservation->key ? 1 : 0;
+        /* hostid is not supported currently */
+        memset(&nvme_status_ext->regctl_eds[i].hostid, 0, 16);
+    }
+
+    rc = nvme_c2h(n, (uint8_t *)nvme_status_ext, len, req);
+    g_free(nvme_status_ext);
+    return rc;
+}
+
+static void nvme_resv_read_reservation_cb(void *opaque, int ret)
+{
+    NvmeReadReservation *reservation = opaque;
+    NvmeRequest *req = reservation->req;
+    bool eds = req->cmd.cdw11 & 0x1;
+    NvmeResvKeys *keys_info = reservation->keys_info;
+
+    if (ret < 0) {
+        goto out;
+    }
+
+    if (eds) {
+        ret = nvme_read_reservation_ext_cb(reservation);
+    } else {
+        ret = nvme_read_reservation_cb(reservation);
+    }
+
+out:
+    g_free(keys_info->keys);
+    g_free(keys_info);
+    g_free(reservation);
+    nvme_misc_cb(req, ret);
+}
+
+static void nvme_resv_read_keys_cb(void *opaque, int ret)
+{
+    NvmeResvKeys *keys_info = opaque;
+    NvmeRequest *req = keys_info->req;
+    NvmeNamespace *ns = req->ns;
+    NvmeReadReservation *reservation;
+
+    if (ret < 0) {
+        goto out;
+    }
+
+    keys_info->num_keys = MIN(ret, keys_info->num_keys);
+    reservation = g_new0(NvmeReadReservation, 1);
+    memset(reservation, 0, sizeof(*reservation));
+    reservation->req = req;
+    reservation->keys_info = keys_info;
+
+    req->aiocb = blk_aio_pr_read_reservation(ns->blkconf.blk,
+                 &reservation->generation, &reservation->key,
+                 &reservation->type, nvme_resv_read_reservation_cb,
+                 reservation);
+    return;
+
+out:
+    g_free(keys_info->keys);
+    g_free(keys_info);
+    nvme_misc_cb(req, ret);
+}
+
+
+static uint16_t nvme_resv_report(NvmeCtrl *n, NvmeRequest *req)
+{
+    int num_keys;
+    uint32_t cdw10 = req->cmd.cdw10;
+    uint32_t cdw11 = req->cmd.cdw11;
+    int buflen = (cdw10 + 1) * sizeof(uint32_t);
+    bool eds = cdw11 & 0x1;
+    NvmeNamespace *ns = req->ns;
+    NvmeResvKeys *keys_info;
+
+    if (eds) {
+        if (buflen < sizeof(NvmeReservationStatusHeader) +
+           sizeof(uint8_t) * 40) {
+            return NVME_INVALID_FIELD;
+        }
+
+        num_keys = (buflen - sizeof(NvmeReservationStatusHeader) -
+                   sizeof(uint8_t) * 40) /
+                   sizeof(struct NvmeRegisteredCtrlExt);
+    } else {
+        if (buflen < sizeof(NvmeReservationStatusHeader)) {
+            return NVME_INVALID_FIELD;
+        }
+
+        num_keys = (buflen - sizeof(NvmeReservationStatusHeader)) /
+                   sizeof(struct NvmeRegisteredCtrl);
+    }
+
+    keys_info = g_new0(NvmeResvKeys, 1);
+    keys_info->generation = 0;
+    /* num_keys is the maximum number of keys that can be transmitted */
+    keys_info->num_keys = num_keys;
+    keys_info->keys = g_malloc(sizeof(uint64_t) * num_keys);
+    keys_info->req = req;
+
+    req->aiocb = blk_aio_pr_read_keys(ns->blkconf.blk, &keys_info->generation,
+                                      keys_info->num_keys, keys_info->keys,
+                                      nvme_resv_read_keys_cb, keys_info);
+
+    return NVME_NO_COMPLETE;
+}
+
 typedef struct NvmeCopyAIOCB {
     BlockAIOCB common;
     BlockAIOCB *aiocb;
@@ -4469,6 +4779,14 @@  static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req)
         return nvme_dsm(n, req);
     case NVME_CMD_VERIFY:
         return nvme_verify(n, req);
+    case NVME_CMD_RESV_REGISTER:
+        return nvme_resv_register(n, req);
+    case NVME_CMD_RESV_REPORT:
+        return nvme_resv_report(n, req);
+    case NVME_CMD_RESV_ACQUIRE:
+        return nvme_resv_acquire(n, req);
+    case NVME_CMD_RESV_RELEASE:
+        return nvme_resv_release(n, req);
     case NVME_CMD_COPY:
         return nvme_copy(n, req);
     case NVME_CMD_ZONE_MGMT_SEND:
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 6d0e456348..fdd919c2b1 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -470,6 +470,10 @@  static inline const char *nvme_io_opc_str(uint8_t opc)
     case NVME_CMD_ZONE_MGMT_SEND:   return "NVME_ZONED_CMD_MGMT_SEND";
     case NVME_CMD_ZONE_MGMT_RECV:   return "NVME_ZONED_CMD_MGMT_RECV";
     case NVME_CMD_ZONE_APPEND:      return "NVME_ZONED_CMD_ZONE_APPEND";
+    case NVME_CMD_RESV_REGISTER:    return "NVME_CMD_RESV_REGISTER";
+    case NVME_CMD_RESV_REPORT:      return "NVME_CMD_RESV_REPORT";
+    case NVME_CMD_RESV_ACQUIRE:     return "NVME_CMD_RESV_ACQUIRE";
+    case NVME_CMD_RESV_RELEASE:     return "NVME_CMD_RESV_RELEASE";
     default:                        return "NVME_NVM_CMD_UNKNOWN";
     }
 }
diff --git a/include/block/nvme.h b/include/block/nvme.h
index 9b9eaeb3a7..2f24570d4a 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -702,6 +702,43 @@  typedef enum NVMEPrCap {
                       NVME_PR_CAP_EX_AC_AR),
 } NvmePrCap;
 
+typedef struct QEMU_PACKED NvmeRegisteredCtrl {
+    uint16_t    cntlid;
+    uint8_t     rcsts;
+    uint8_t     rsvd3[5];
+    uint8_t     hostid[8];
+    uint64_t    rkey;
+} NvmeRegisteredCtrl;
+
+typedef struct QEMU_PACKED NvmeRegisteredCtrlExt {
+    uint16_t  cntlid;
+    uint8_t   rcsts;
+    uint8_t   rsvd3[5];
+    uint64_t  rkey;
+    uint8_t   hostid[16];
+    uint8_t   rsvd32[32];
+} NvmeRegisteredCtrlExt;
+
+typedef struct QEMU_PACKED NvmeReservationStatusHeader {
+    uint32_t  gen;
+    uint8_t   rtype;
+    uint16_t  regctl;
+    uint16_t  resv5;
+    uint8_t   ptpls;
+    uint8_t   resv10[14];
+} NvmeReservationStatusHeader;
+
+typedef struct QEMU_PACKED NvmeReservationStatus {
+    struct NvmeReservationStatusHeader header;
+    struct NvmeRegisteredCtrl regctl_ds[];
+} NvmeReservationStatus;
+
+typedef struct QEMU_PACKED NvmeReservationStatusExt {
+    struct NvmeReservationStatusHeader header;
+    uint8_t   rsvd24[40];
+    struct NvmeRegisteredCtrlExt regctl_eds[];
+} NvmeReservationStatusExt;
+
 typedef struct QEMU_PACKED NvmeDeleteQ {
     uint8_t     opcode;
     uint8_t     flags;