Message ID | 20230322101648.31514-1-dwagner@suse.de (mailing list archive) |
---|---|
Headers | show |
Series | Test different queue counts | expand |
On Mar 22, 2023 / 11:16, Daniel Wagner wrote: > Setup different queues, e.g. read and poll queues. > > There is still the problem that _require_nvme_trtype_is_fabrics also includes > the loop transport which has no support for different queue types. > > See also https://lore.kernel.org/linux-nvme/20230322002350.4038048-1-kbusch@meta.com/ Hi Daniel, thanks for the patches. The new test case catches some bugs. Looks valuable. I ran the test case using various nvme_trtype on kernel v6.2 and v6.3-rc3, and observed hangs. I applied the 3rd patch in the link above on top of v6.3-rc3 and confirmed the hang disappears. I would like to wait for the kernel fix patch delivered to upstream, before adding this test case to blktests master. When I ran the test case without setting nvme_trtype, kernel reported messages below: [ 199.621431][ T1001] nvme_fabrics: invalid parameter 'nr_write_queues=%d' [ 201.271200][ T1030] nvme_fabrics: invalid parameter 'nr_write_queues=%d' [ 201.272155][ T1030] nvme_fabrics: invalid parameter 'nr_poll_queues=%d' Is it useful to run the test case with default nvme_trtype=loop?
On Thu, Mar 23, 2023 at 11:06:53AM +0000, Shinichiro Kawasaki wrote: > On Mar 22, 2023 / 11:16, Daniel Wagner wrote: > > Setup different queues, e.g. read and poll queues. > > > > There is still the problem that _require_nvme_trtype_is_fabrics also includes > > the loop transport which has no support for different queue types. > > > > See also https://lore.kernel.org/linux-nvme/20230322002350.4038048-1-kbusch@meta.com/ > > Hi Daniel, thanks for the patches. The new test case catches some bugs. Looks > valuable. > > I ran the test case using various nvme_trtype on kernel v6.2 and v6.3-rc3, and > observed hangs. I applied the 3rd patch in the link above on top of v6.3-rc3 and > confirmed the hang disappears. I would like to wait for the kernel fix patch > delivered to upstream, before adding this test case to blktests master. Okay makes sense. > When I ran the test case without setting nvme_trtype, kernel reported messages > below: > > [ 199.621431][ T1001] nvme_fabrics: invalid parameter 'nr_write_queues=%d' > [ 201.271200][ T1030] nvme_fabrics: invalid parameter 'nr_write_queues=%d' > [ 201.272155][ T1030] nvme_fabrics: invalid parameter 'nr_poll_queues=%d' BTW, I've added a '|| echo FAIL' to catch those. > Is it useful to run the test case with default nvme_trtype=loop? No, we should run this test only for those transport which actually support the different queue types. Christoph suggest to figure out before running the test if it is actually supported. So my first idea was to check what options are supported by reading /dev/nvme-fabrics. But this will return all options we are parsed by fabrics.c but not the subset which each transport might only support. So to figure this out we would need to do a full setup just to figure out if it is supported. I think the currently best approach would just to limit this test to tcp and rdma. Maybe we could add something like rc: _require_nvme_trtype() { local trtype for trtype in "$@"; do if [[ "${nvme_trtype}" == "$trtype" ]]; then return 0 fi done SKIP_REASONS+=("nvme_trtype=${nvme_trtype} is not supported in this test") return 1 } 047: requires() { _nvme_requires _have_xfs _have_fio _require_nvme_trtype tcp rdma _have_kver 4 21 } What do you think? Thanks, Daniel
On Mar 27, 2023 / 17:41, Daniel Wagner wrote: > On Thu, Mar 23, 2023 at 11:06:53AM +0000, Shinichiro Kawasaki wrote: > > On Mar 22, 2023 / 11:16, Daniel Wagner wrote: > > > Setup different queues, e.g. read and poll queues. > > > > > > There is still the problem that _require_nvme_trtype_is_fabrics also includes > > > the loop transport which has no support for different queue types. > > > > > > See also https://lore.kernel.org/linux-nvme/20230322002350.4038048-1-kbusch@meta.com/ > > > > Hi Daniel, thanks for the patches. The new test case catches some bugs. Looks > > valuable. > > > > I ran the test case using various nvme_trtype on kernel v6.2 and v6.3-rc3, and > > observed hangs. I applied the 3rd patch in the link above on top of v6.3-rc3 and > > confirmed the hang disappears. I would like to wait for the kernel fix patch > > delivered to upstream, before adding this test case to blktests master. > > Okay makes sense. > > > When I ran the test case without setting nvme_trtype, kernel reported messages > > below: > > > > [ 199.621431][ T1001] nvme_fabrics: invalid parameter 'nr_write_queues=%d' > > [ 201.271200][ T1030] nvme_fabrics: invalid parameter 'nr_write_queues=%d' > > [ 201.272155][ T1030] nvme_fabrics: invalid parameter 'nr_poll_queues=%d' > > BTW, I've added a '|| echo FAIL' to catch those. > > > Is it useful to run the test case with default nvme_trtype=loop? > > No, we should run this test only for those transport which actually support the > different queue types. Christoph suggest to figure out before running the test > if it is actually supported. So my first idea was to check what options are > supported by reading /dev/nvme-fabrics. But this will return all options we are > parsed by fabrics.c but not the subset which each transport might only support. > > So to figure this out we would need to do a full setup just to figure out if it > is supported. I think the currently best approach would just to limit this test > to tcp and rdma. Maybe we could add something like > > rc: > _require_nvme_trtype() { > local trtype > for trtype in "$@"; do > if [[ "${nvme_trtype}" == "$trtype" ]]; then > return 0 > fi > done > SKIP_REASONS+=("nvme_trtype=${nvme_trtype} is not supported in this test") > return 1 > } > > 047: > requires() { > _nvme_requires > _have_xfs > _have_fio > _require_nvme_trtype tcp rdma > _have_kver 4 21 > } > > What do you think? Thanks for the clarifications about the requirements. I think your approach will work. Having said that, we may have another potentially simpler solution: - Do not check _require_nvme_trtype and _have_kver in requires(). - After setting nr_write_queues in test(), check if dmesg contains the error "invalid parameter 'nr_write_queues" using the helper function _dmesg_since_test_start(). - If the error is reported, set SKIP_REASONS and return from test(). Blktests will report the test case as "not run". This approach assumes that the "invalid parameter" is printed when the test case should be skipped (loop transport, older kernel version). As a generic guide, SKIP_REASONS should be set in requires() before test(). However, if the SKIP_REASONS can not be checked before test(), blktests allows to set it in test(). The test case block/030 is such an exception. I think your new test case can be another exception. With this, we do not need to repeat the full setup. And it might be more robust against future changes such as new transport types. Thoughts?
On 3/28/23 01:45, Shinichiro Kawasaki wrote: > On Mar 27, 2023 / 17:41, Daniel Wagner wrote: >> On Thu, Mar 23, 2023 at 11:06:53AM +0000, Shinichiro Kawasaki wrote: >>> On Mar 22, 2023 / 11:16, Daniel Wagner wrote: >>>> Setup different queues, e.g. read and poll queues. >>>> >>>> There is still the problem that _require_nvme_trtype_is_fabrics also includes >>>> the loop transport which has no support for different queue types. >>>> >>>> See also https://lore.kernel.org/linux-nvme/20230322002350.4038048-1-kbusch@meta.com/ >>> Hi Daniel, thanks for the patches. The new test case catches some bugs. Looks >>> valuable. >>> >>> I ran the test case using various nvme_trtype on kernel v6.2 and v6.3-rc3, and >>> observed hangs. I applied the 3rd patch in the link above on top of v6.3-rc3 and >>> confirmed the hang disappears. I would like to wait for the kernel fix patch >>> delivered to upstream, before adding this test case to blktests master. >> Okay makes sense. >> >>> When I ran the test case without setting nvme_trtype, kernel reported messages >>> below: >>> >>> [ 199.621431][ T1001] nvme_fabrics: invalid parameter 'nr_write_queues=%d' >>> [ 201.271200][ T1030] nvme_fabrics: invalid parameter 'nr_write_queues=%d' >>> [ 201.272155][ T1030] nvme_fabrics: invalid parameter 'nr_poll_queues=%d' >> BTW, I've added a '|| echo FAIL' to catch those. >> >>> Is it useful to run the test case with default nvme_trtype=loop? >> No, we should run this test only for those transport which actually support the >> different queue types. Christoph suggest to figure out before running the test >> if it is actually supported. So my first idea was to check what options are >> supported by reading /dev/nvme-fabrics. But this will return all options we are >> parsed by fabrics.c but not the subset which each transport might only support. >> >> So to figure this out we would need to do a full setup just to figure out if it >> is supported. I think the currently best approach would just to limit this test >> to tcp and rdma. Maybe we could add something like >> >> rc: >> _require_nvme_trtype() { >> local trtype >> for trtype in "$@"; do >> if [[ "${nvme_trtype}" == "$trtype" ]]; then >> return 0 >> fi >> done >> SKIP_REASONS+=("nvme_trtype=${nvme_trtype} is not supported in this test") >> return 1 >> } >> >> 047: >> requires() { >> _nvme_requires >> _have_xfs >> _have_fio >> _require_nvme_trtype tcp rdma >> _have_kver 4 21 >> } >> >> What do you think? > Thanks for the clarifications about the requirements. I think your approach will > work. Having said that, we may have another potentially simpler solution: > > - Do not check _require_nvme_trtype and _have_kver in requires(). > - After setting nr_write_queues in test(), check if dmesg contains the error > "invalid parameter 'nr_write_queues" using the helper function > _dmesg_since_test_start(). > - If the error is reported, set SKIP_REASONS and return from test(). > Blktests will report the test case as "not run". > > This approach assumes that the "invalid parameter" is printed when the test case > should be skipped (loop transport, older kernel version). Is it possible to not rely on dmesg unless it is absolutely required ? > As a generic guide, SKIP_REASONS should be set in requires() before test(). > However, if the SKIP_REASONS can not be checked before test(), blktests allows > to set it in test(). The test case block/030 is such an exception. I think your > new test case can be another exception. With this, we do not need to repeat the > full setup. And it might be more robust against future changes such as new > transport types. Ummm should we avoid creating exceptions ? unless it is absolutely necessary ? The problem with exception is it becomes problematic for long term maintenance. I believe currently focusing on tcp/rdma only is sufficient ... -ck
On Wed, Mar 22, 2023 at 11:16:45AM +0100, Daniel Wagner wrote:
> Setup different queues, e.g. read and poll queues.
If you wanted to add a similar test for pci, you do it by echo'ing the desired
options to:
/sys/modules/nvme/parameters/{poll_queues,write_queues}
Then do an 'nvme reset' on the target nvme pci device.
I'll just note that such a test will currently fail, and fixing that doesn't
look like fun. :)
On Mar 28, 2023 / 18:20, Chaitanya Kulkarni wrote: > On 3/28/23 01:45, Shinichiro Kawasaki wrote: > > On Mar 27, 2023 / 17:41, Daniel Wagner wrote: > >> On Thu, Mar 23, 2023 at 11:06:53AM +0000, Shinichiro Kawasaki wrote: > >>> On Mar 22, 2023 / 11:16, Daniel Wagner wrote: > >>>> Setup different queues, e.g. read and poll queues. > >>>> > >>>> There is still the problem that _require_nvme_trtype_is_fabrics also includes > >>>> the loop transport which has no support for different queue types. > >>>> > >>>> See also https://lore.kernel.org/linux-nvme/20230322002350.4038048-1-kbusch@meta.com/ > >>> Hi Daniel, thanks for the patches. The new test case catches some bugs. Looks > >>> valuable. > >>> > >>> I ran the test case using various nvme_trtype on kernel v6.2 and v6.3-rc3, and > >>> observed hangs. I applied the 3rd patch in the link above on top of v6.3-rc3 and > >>> confirmed the hang disappears. I would like to wait for the kernel fix patch > >>> delivered to upstream, before adding this test case to blktests master. > >> Okay makes sense. > >> > >>> When I ran the test case without setting nvme_trtype, kernel reported messages > >>> below: > >>> > >>> [ 199.621431][ T1001] nvme_fabrics: invalid parameter 'nr_write_queues=%d' > >>> [ 201.271200][ T1030] nvme_fabrics: invalid parameter 'nr_write_queues=%d' > >>> [ 201.272155][ T1030] nvme_fabrics: invalid parameter 'nr_poll_queues=%d' > >> BTW, I've added a '|| echo FAIL' to catch those. > >> > >>> Is it useful to run the test case with default nvme_trtype=loop? > >> No, we should run this test only for those transport which actually support the > >> different queue types. Christoph suggest to figure out before running the test > >> if it is actually supported. So my first idea was to check what options are > >> supported by reading /dev/nvme-fabrics. But this will return all options we are > >> parsed by fabrics.c but not the subset which each transport might only support. > >> > >> So to figure this out we would need to do a full setup just to figure out if it > >> is supported. I think the currently best approach would just to limit this test > >> to tcp and rdma. Maybe we could add something like > >> > >> rc: > >> _require_nvme_trtype() { > >> local trtype > >> for trtype in "$@"; do > >> if [[ "${nvme_trtype}" == "$trtype" ]]; then > >> return 0 > >> fi > >> done > >> SKIP_REASONS+=("nvme_trtype=${nvme_trtype} is not supported in this test") > >> return 1 > >> } > >> > >> 047: > >> requires() { > >> _nvme_requires > >> _have_xfs > >> _have_fio > >> _require_nvme_trtype tcp rdma > >> _have_kver 4 21 > >> } > >> > >> What do you think? > > Thanks for the clarifications about the requirements. I think your approach will > > work. Having said that, we may have another potentially simpler solution: > > > > - Do not check _require_nvme_trtype and _have_kver in requires(). > > - After setting nr_write_queues in test(), check if dmesg contains the error > > "invalid parameter 'nr_write_queues" using the helper function > > _dmesg_since_test_start(). > > - If the error is reported, set SKIP_REASONS and return from test(). > > Blktests will report the test case as "not run". > > > > This approach assumes that the "invalid parameter" is printed when the test case > > should be skipped (loop transport, older kernel version). > > > Is it possible to not rely on dmesg unless it is absolutely required ? > > > As a generic guide, SKIP_REASONS should be set in requires() before test(). > > However, if the SKIP_REASONS can not be checked before test(), blktests allows > > to set it in test(). The test case block/030 is such an exception. I think your > > new test case can be another exception. With this, we do not need to repeat the > > full setup. And it might be more robust against future changes such as new > > transport types. > > Ummm should we avoid creating exceptions ? unless it is absolutely > necessary ? > The problem with exception is it becomes problematic for long term > maintenance. > > I believe currently focusing on tcp/rdma only is sufficient ... I see, then let's go with Daniel's approach. My idea could be tricky too much.
On 3/28/23 11:35, Keith Busch wrote: > On Wed, Mar 22, 2023 at 11:16:45AM +0100, Daniel Wagner wrote: >> Setup different queues, e.g. read and poll queues. > If you wanted to add a similar test for pci, you do it by echo'ing the desired > options to: > > /sys/modules/nvme/parameters/{poll_queues,write_queues} > > Then do an 'nvme reset' on the target nvme pci device. > > I'll just note that such a test will currently fail, and fixing that doesn't > look like fun. :) then we should definitely add it ;) ha ha. I was actually wondering about pci based on the discussion on this thread was mainly focused on tcp and rdam, thanks for the suggestion Keith. -ck
On Wed, Mar 29, 2023 at 03:30:17AM +0000, Chaitanya Kulkarni wrote: > On 3/28/23 11:35, Keith Busch wrote: > > On Wed, Mar 22, 2023 at 11:16:45AM +0100, Daniel Wagner wrote: > >> Setup different queues, e.g. read and poll queues. > > If you wanted to add a similar test for pci, you do it by echo'ing the desired > > options to: > > > > /sys/modules/nvme/parameters/{poll_queues,write_queues} > > > > Then do an 'nvme reset' on the target nvme pci device. > > > > I'll just note that such a test will currently fail, and fixing that doesn't > > look like fun. :) > > then we should definitely add it ;) ha ha. > > I was actually wondering about pci based on the discussion on this thread > was mainly focused on tcp and rdam, thanks for the suggestion Keith. The test is fabric centric because we configure the queues via the 'nvme connect' call, something we can't do for PCI. It look likes it needs another new test for PCI. Let me get this one sorted out first though.