mbox series

[blktests,v2,0/3] Test different queue counts

Message ID 20230322101648.31514-1-dwagner@suse.de (mailing list archive)
Headers show
Series Test different queue counts | expand

Message

Daniel Wagner March 22, 2023, 10:16 a.m. UTC
Setup different queues, e.g. read and poll queues.

There is still the problem that _require_nvme_trtype_is_fabrics also includes
the loop transport which has no support for different queue types.

See also https://lore.kernel.org/linux-nvme/20230322002350.4038048-1-kbusch@meta.com/

changes:
 v2:
   - make most arguements optional for _nvme_connect_subsys
   - move variable decleration at the beging of _nvme_connect_subsy
 v1:
   - initial version

Daniel Wagner (3):
  nvme/rc: Parse optional arguments in _nvme_connect_subsys()
  nvme/rc: Add nr queue parser arguments
  nvme/047: Test different queue counts

 tests/nvme/041     |  7 +++-
 tests/nvme/042     | 10 +++--
 tests/nvme/043     | 10 +++--
 tests/nvme/044     | 23 +++++++----
 tests/nvme/045     |  6 ++-
 tests/nvme/047     | 65 +++++++++++++++++++++++++++++++
 tests/nvme/047.out |  2 +
 tests/nvme/rc      | 96 +++++++++++++++++++++++++++++++++++++++++-----
 8 files changed, 190 insertions(+), 29 deletions(-)
 create mode 100755 tests/nvme/047
 create mode 100644 tests/nvme/047.out

Comments

Shin'ichiro Kawasaki March 23, 2023, 11:06 a.m. UTC | #1
On Mar 22, 2023 / 11:16, Daniel Wagner wrote:
> Setup different queues, e.g. read and poll queues.
> 
> There is still the problem that _require_nvme_trtype_is_fabrics also includes
> the loop transport which has no support for different queue types.
> 
> See also https://lore.kernel.org/linux-nvme/20230322002350.4038048-1-kbusch@meta.com/

Hi Daniel, thanks for the patches. The new test case catches some bugs. Looks
valuable.

I ran the test case using various nvme_trtype on kernel v6.2 and v6.3-rc3, and
observed hangs. I applied the 3rd patch in the link above on top of v6.3-rc3 and
confirmed the hang disappears. I would like to wait for the kernel fix patch
delivered to upstream, before adding this test case to blktests master.

When I ran the test case without setting nvme_trtype, kernel reported messages
below:

[  199.621431][ T1001] nvme_fabrics: invalid parameter 'nr_write_queues=%d'
[  201.271200][ T1030] nvme_fabrics: invalid parameter 'nr_write_queues=%d'
[  201.272155][ T1030] nvme_fabrics: invalid parameter 'nr_poll_queues=%d'

Is it useful to run the test case with default nvme_trtype=loop?
Daniel Wagner March 27, 2023, 3:41 p.m. UTC | #2
On Thu, Mar 23, 2023 at 11:06:53AM +0000, Shinichiro Kawasaki wrote:
> On Mar 22, 2023 / 11:16, Daniel Wagner wrote:
> > Setup different queues, e.g. read and poll queues.
> > 
> > There is still the problem that _require_nvme_trtype_is_fabrics also includes
> > the loop transport which has no support for different queue types.
> > 
> > See also https://lore.kernel.org/linux-nvme/20230322002350.4038048-1-kbusch@meta.com/
> 
> Hi Daniel, thanks for the patches. The new test case catches some bugs. Looks
> valuable.
> 
> I ran the test case using various nvme_trtype on kernel v6.2 and v6.3-rc3, and
> observed hangs. I applied the 3rd patch in the link above on top of v6.3-rc3 and
> confirmed the hang disappears. I would like to wait for the kernel fix patch
> delivered to upstream, before adding this test case to blktests master.

Okay makes sense.

> When I ran the test case without setting nvme_trtype, kernel reported messages
> below:
> 
> [  199.621431][ T1001] nvme_fabrics: invalid parameter 'nr_write_queues=%d'
> [  201.271200][ T1030] nvme_fabrics: invalid parameter 'nr_write_queues=%d'
> [  201.272155][ T1030] nvme_fabrics: invalid parameter 'nr_poll_queues=%d'

BTW, I've added a '|| echo FAIL' to catch those.

> Is it useful to run the test case with default nvme_trtype=loop?

No, we should run this test only for those transport which actually support the
different queue types. Christoph suggest to figure out before running the test
if it is actually supported. So my first idea was to check what options are
supported by reading /dev/nvme-fabrics. But this will return all options we are
parsed by fabrics.c but not the subset which each transport might only support.

So to figure this out we would need to do a full setup just to figure out if it
is supported. I think the currently best approach would just to limit this test
to tcp and rdma. Maybe we could add something like

rc:
_require_nvme_trtype() {
	local trtype
	for trtype in "$@"; do
		if [[ "${nvme_trtype}" == "$trtype" ]]; then
			return 0
		fi
	done
	SKIP_REASONS+=("nvme_trtype=${nvme_trtype} is not supported in this test")
	return 1
}

047:
requires() {
	_nvme_requires
	_have_xfs
	_have_fio
	_require_nvme_trtype tcp rdma
	_have_kver 4 21
}

What do you think?

Thanks,
Daniel
Shin'ichiro Kawasaki March 28, 2023, 8:45 a.m. UTC | #3
On Mar 27, 2023 / 17:41, Daniel Wagner wrote:
> On Thu, Mar 23, 2023 at 11:06:53AM +0000, Shinichiro Kawasaki wrote:
> > On Mar 22, 2023 / 11:16, Daniel Wagner wrote:
> > > Setup different queues, e.g. read and poll queues.
> > > 
> > > There is still the problem that _require_nvme_trtype_is_fabrics also includes
> > > the loop transport which has no support for different queue types.
> > > 
> > > See also https://lore.kernel.org/linux-nvme/20230322002350.4038048-1-kbusch@meta.com/
> > 
> > Hi Daniel, thanks for the patches. The new test case catches some bugs. Looks
> > valuable.
> > 
> > I ran the test case using various nvme_trtype on kernel v6.2 and v6.3-rc3, and
> > observed hangs. I applied the 3rd patch in the link above on top of v6.3-rc3 and
> > confirmed the hang disappears. I would like to wait for the kernel fix patch
> > delivered to upstream, before adding this test case to blktests master.
> 
> Okay makes sense.
> 
> > When I ran the test case without setting nvme_trtype, kernel reported messages
> > below:
> > 
> > [  199.621431][ T1001] nvme_fabrics: invalid parameter 'nr_write_queues=%d'
> > [  201.271200][ T1030] nvme_fabrics: invalid parameter 'nr_write_queues=%d'
> > [  201.272155][ T1030] nvme_fabrics: invalid parameter 'nr_poll_queues=%d'
> 
> BTW, I've added a '|| echo FAIL' to catch those.
> 
> > Is it useful to run the test case with default nvme_trtype=loop?
> 
> No, we should run this test only for those transport which actually support the
> different queue types. Christoph suggest to figure out before running the test
> if it is actually supported. So my first idea was to check what options are
> supported by reading /dev/nvme-fabrics. But this will return all options we are
> parsed by fabrics.c but not the subset which each transport might only support.
> 
> So to figure this out we would need to do a full setup just to figure out if it
> is supported. I think the currently best approach would just to limit this test
> to tcp and rdma. Maybe we could add something like
> 
> rc:
> _require_nvme_trtype() {
> 	local trtype
> 	for trtype in "$@"; do
> 		if [[ "${nvme_trtype}" == "$trtype" ]]; then
> 			return 0
> 		fi
> 	done
> 	SKIP_REASONS+=("nvme_trtype=${nvme_trtype} is not supported in this test")
> 	return 1
> }
> 
> 047:
> requires() {
> 	_nvme_requires
> 	_have_xfs
> 	_have_fio
> 	_require_nvme_trtype tcp rdma
> 	_have_kver 4 21
> }
> 
> What do you think?

Thanks for the clarifications about the requirements. I think your approach will
work. Having said that, we may have another potentially simpler solution:

- Do not check _require_nvme_trtype and _have_kver in requires().
- After setting nr_write_queues in test(), check if dmesg contains the error
  "invalid parameter 'nr_write_queues" using the helper function
  _dmesg_since_test_start().
- If the error is reported, set SKIP_REASONS and return from test().
  Blktests will report the test case as "not run".

This approach assumes that the "invalid parameter" is printed when the test case
should be skipped (loop transport, older kernel version).

As a generic guide, SKIP_REASONS should be set in requires() before test().
However, if the SKIP_REASONS can not be checked before test(), blktests allows
to set it in test(). The test case block/030 is such an exception. I think your
new test case can be another exception. With this, we do not need to repeat the
full setup. And it might be more robust against future changes such as new
transport types.

Thoughts?
Chaitanya Kulkarni March 28, 2023, 6:20 p.m. UTC | #4
On 3/28/23 01:45, Shinichiro Kawasaki wrote:
> On Mar 27, 2023 / 17:41, Daniel Wagner wrote:
>> On Thu, Mar 23, 2023 at 11:06:53AM +0000, Shinichiro Kawasaki wrote:
>>> On Mar 22, 2023 / 11:16, Daniel Wagner wrote:
>>>> Setup different queues, e.g. read and poll queues.
>>>>
>>>> There is still the problem that _require_nvme_trtype_is_fabrics also includes
>>>> the loop transport which has no support for different queue types.
>>>>
>>>> See also https://lore.kernel.org/linux-nvme/20230322002350.4038048-1-kbusch@meta.com/
>>> Hi Daniel, thanks for the patches. The new test case catches some bugs. Looks
>>> valuable.
>>>
>>> I ran the test case using various nvme_trtype on kernel v6.2 and v6.3-rc3, and
>>> observed hangs. I applied the 3rd patch in the link above on top of v6.3-rc3 and
>>> confirmed the hang disappears. I would like to wait for the kernel fix patch
>>> delivered to upstream, before adding this test case to blktests master.
>> Okay makes sense.
>>
>>> When I ran the test case without setting nvme_trtype, kernel reported messages
>>> below:
>>>
>>> [  199.621431][ T1001] nvme_fabrics: invalid parameter 'nr_write_queues=%d'
>>> [  201.271200][ T1030] nvme_fabrics: invalid parameter 'nr_write_queues=%d'
>>> [  201.272155][ T1030] nvme_fabrics: invalid parameter 'nr_poll_queues=%d'
>> BTW, I've added a '|| echo FAIL' to catch those.
>>
>>> Is it useful to run the test case with default nvme_trtype=loop?
>> No, we should run this test only for those transport which actually support the
>> different queue types. Christoph suggest to figure out before running the test
>> if it is actually supported. So my first idea was to check what options are
>> supported by reading /dev/nvme-fabrics. But this will return all options we are
>> parsed by fabrics.c but not the subset which each transport might only support.
>>
>> So to figure this out we would need to do a full setup just to figure out if it
>> is supported. I think the currently best approach would just to limit this test
>> to tcp and rdma. Maybe we could add something like
>>
>> rc:
>> _require_nvme_trtype() {
>> 	local trtype
>> 	for trtype in "$@"; do
>> 		if [[ "${nvme_trtype}" == "$trtype" ]]; then
>> 			return 0
>> 		fi
>> 	done
>> 	SKIP_REASONS+=("nvme_trtype=${nvme_trtype} is not supported in this test")
>> 	return 1
>> }
>>
>> 047:
>> requires() {
>> 	_nvme_requires
>> 	_have_xfs
>> 	_have_fio
>> 	_require_nvme_trtype tcp rdma
>> 	_have_kver 4 21
>> }
>>
>> What do you think?
> Thanks for the clarifications about the requirements. I think your approach will
> work. Having said that, we may have another potentially simpler solution:
>
> - Do not check _require_nvme_trtype and _have_kver in requires().
> - After setting nr_write_queues in test(), check if dmesg contains the error
>    "invalid parameter 'nr_write_queues" using the helper function
>    _dmesg_since_test_start().
> - If the error is reported, set SKIP_REASONS and return from test().
>    Blktests will report the test case as "not run".
>
> This approach assumes that the "invalid parameter" is printed when the test case
> should be skipped (loop transport, older kernel version).


Is it possible to not rely on dmesg unless it is absolutely required ?

> As a generic guide, SKIP_REASONS should be set in requires() before test().
> However, if the SKIP_REASONS can not be checked before test(), blktests allows
> to set it in test(). The test case block/030 is such an exception. I think your
> new test case can be another exception. With this, we do not need to repeat the
> full setup. And it might be more robust against future changes such as new
> transport types.

Ummm should we avoid creating exceptions ? unless it is absolutely 
necessary ?
The problem with exception is it becomes problematic for long term 
maintenance.

I believe currently focusing on tcp/rdma only is sufficient ...

-ck
Keith Busch March 28, 2023, 6:35 p.m. UTC | #5
On Wed, Mar 22, 2023 at 11:16:45AM +0100, Daniel Wagner wrote:
> Setup different queues, e.g. read and poll queues.

If you wanted to add a similar test for pci, you do it by echo'ing the desired
options to:

  /sys/modules/nvme/parameters/{poll_queues,write_queues}

Then do an 'nvme reset' on the target nvme pci device.

I'll just note that such a test will currently fail, and fixing that doesn't
look like fun. :)
Shin'ichiro Kawasaki March 29, 2023, 12:57 a.m. UTC | #6
On Mar 28, 2023 / 18:20, Chaitanya Kulkarni wrote:
> On 3/28/23 01:45, Shinichiro Kawasaki wrote:
> > On Mar 27, 2023 / 17:41, Daniel Wagner wrote:
> >> On Thu, Mar 23, 2023 at 11:06:53AM +0000, Shinichiro Kawasaki wrote:
> >>> On Mar 22, 2023 / 11:16, Daniel Wagner wrote:
> >>>> Setup different queues, e.g. read and poll queues.
> >>>>
> >>>> There is still the problem that _require_nvme_trtype_is_fabrics also includes
> >>>> the loop transport which has no support for different queue types.
> >>>>
> >>>> See also https://lore.kernel.org/linux-nvme/20230322002350.4038048-1-kbusch@meta.com/
> >>> Hi Daniel, thanks for the patches. The new test case catches some bugs. Looks
> >>> valuable.
> >>>
> >>> I ran the test case using various nvme_trtype on kernel v6.2 and v6.3-rc3, and
> >>> observed hangs. I applied the 3rd patch in the link above on top of v6.3-rc3 and
> >>> confirmed the hang disappears. I would like to wait for the kernel fix patch
> >>> delivered to upstream, before adding this test case to blktests master.
> >> Okay makes sense.
> >>
> >>> When I ran the test case without setting nvme_trtype, kernel reported messages
> >>> below:
> >>>
> >>> [  199.621431][ T1001] nvme_fabrics: invalid parameter 'nr_write_queues=%d'
> >>> [  201.271200][ T1030] nvme_fabrics: invalid parameter 'nr_write_queues=%d'
> >>> [  201.272155][ T1030] nvme_fabrics: invalid parameter 'nr_poll_queues=%d'
> >> BTW, I've added a '|| echo FAIL' to catch those.
> >>
> >>> Is it useful to run the test case with default nvme_trtype=loop?
> >> No, we should run this test only for those transport which actually support the
> >> different queue types. Christoph suggest to figure out before running the test
> >> if it is actually supported. So my first idea was to check what options are
> >> supported by reading /dev/nvme-fabrics. But this will return all options we are
> >> parsed by fabrics.c but not the subset which each transport might only support.
> >>
> >> So to figure this out we would need to do a full setup just to figure out if it
> >> is supported. I think the currently best approach would just to limit this test
> >> to tcp and rdma. Maybe we could add something like
> >>
> >> rc:
> >> _require_nvme_trtype() {
> >> 	local trtype
> >> 	for trtype in "$@"; do
> >> 		if [[ "${nvme_trtype}" == "$trtype" ]]; then
> >> 			return 0
> >> 		fi
> >> 	done
> >> 	SKIP_REASONS+=("nvme_trtype=${nvme_trtype} is not supported in this test")
> >> 	return 1
> >> }
> >>
> >> 047:
> >> requires() {
> >> 	_nvme_requires
> >> 	_have_xfs
> >> 	_have_fio
> >> 	_require_nvme_trtype tcp rdma
> >> 	_have_kver 4 21
> >> }
> >>
> >> What do you think?
> > Thanks for the clarifications about the requirements. I think your approach will
> > work. Having said that, we may have another potentially simpler solution:
> >
> > - Do not check _require_nvme_trtype and _have_kver in requires().
> > - After setting nr_write_queues in test(), check if dmesg contains the error
> >    "invalid parameter 'nr_write_queues" using the helper function
> >    _dmesg_since_test_start().
> > - If the error is reported, set SKIP_REASONS and return from test().
> >    Blktests will report the test case as "not run".
> >
> > This approach assumes that the "invalid parameter" is printed when the test case
> > should be skipped (loop transport, older kernel version).
> 
> 
> Is it possible to not rely on dmesg unless it is absolutely required ?
> 
> > As a generic guide, SKIP_REASONS should be set in requires() before test().
> > However, if the SKIP_REASONS can not be checked before test(), blktests allows
> > to set it in test(). The test case block/030 is such an exception. I think your
> > new test case can be another exception. With this, we do not need to repeat the
> > full setup. And it might be more robust against future changes such as new
> > transport types.
> 
> Ummm should we avoid creating exceptions ? unless it is absolutely 
> necessary ?
> The problem with exception is it becomes problematic for long term 
> maintenance.
> 
> I believe currently focusing on tcp/rdma only is sufficient ...

I see, then let's go with Daniel's approach. My idea could be tricky too much.
Chaitanya Kulkarni March 29, 2023, 3:30 a.m. UTC | #7
On 3/28/23 11:35, Keith Busch wrote:
> On Wed, Mar 22, 2023 at 11:16:45AM +0100, Daniel Wagner wrote:
>> Setup different queues, e.g. read and poll queues.
> If you wanted to add a similar test for pci, you do it by echo'ing the desired
> options to:
>
>    /sys/modules/nvme/parameters/{poll_queues,write_queues}
>
> Then do an 'nvme reset' on the target nvme pci device.
>
> I'll just note that such a test will currently fail, and fixing that doesn't
> look like fun. :)

then we should definitely add it ;) ha ha.

I was actually wondering about pci based on the discussion on this thread
was mainly focused on tcp and rdam, thanks for the suggestion Keith.

-ck
Daniel Wagner March 29, 2023, 6:25 a.m. UTC | #8
On Wed, Mar 29, 2023 at 03:30:17AM +0000, Chaitanya Kulkarni wrote:
> On 3/28/23 11:35, Keith Busch wrote:
> > On Wed, Mar 22, 2023 at 11:16:45AM +0100, Daniel Wagner wrote:
> >> Setup different queues, e.g. read and poll queues.
> > If you wanted to add a similar test for pci, you do it by echo'ing the desired
> > options to:
> >
> >    /sys/modules/nvme/parameters/{poll_queues,write_queues}
> >
> > Then do an 'nvme reset' on the target nvme pci device.
> >
> > I'll just note that such a test will currently fail, and fixing that doesn't
> > look like fun. :)
> 
> then we should definitely add it ;) ha ha.
> 
> I was actually wondering about pci based on the discussion on this thread
> was mainly focused on tcp and rdam, thanks for the suggestion Keith.

The test is fabric centric because we configure the queues via the 'nvme
connect' call, something we can't do for PCI. It look likes it needs another
new test for PCI. Let me get this one sorted out first though.