Message ID | 20230620132703.20648-2-dwagner@suse.de (mailing list archive) |
---|---|
State | New, archived |
Series | More fixes for FC enabling |
> The test monitored the state changes live -> resetting -> connecting ->
> live, to figure out the queue count change was successful.
>
> The fc transport is reconnecting very fast and the state transitions
> are not observed by the current approach.
>
> So instead trying to monitor the state changes, let's just wait for the
> live state and the correct queue number.
>
> As queue count is depending on the number of online CPUs we explicitly
> use 1 and 2 for the max_queue count. This means the queue_count value
> needs to reach either 2 or 3 (admin queue included).
>
> Signed-off-by: Daniel Wagner <dwagner@suse.de>
> ---
>  tests/nvme/048 | 35 ++++++++++++++++++++++++-----------
>  1 file changed, 24 insertions(+), 11 deletions(-)
>
> diff --git a/tests/nvme/048 b/tests/nvme/048
> index 81084f0440c2..3dc5169132de 100755
> --- a/tests/nvme/048
> +++ b/tests/nvme/048
> @@ -42,6 +42,26 @@ nvmf_wait_for_state() {
>  	return 0
>  }
>
> +nvmf_wait_for_queue_count() {
> +	local subsys_name="$1"
> +	local queue_count="$2"
> +	local nvmedev
> +
> +	nvmedev=$(_find_nvme_dev "${subsys_name}")
> +
> +	queue_count_file="/sys/class/nvme-fabrics/ctl/${nvmedev}/queue_count"
> +
> +	nvmf_wait_for_state "${subsys_name}" "live" || return 1
> +
> +	queue_count=$((queue_count + 1))
> +	if grep -q "${queue_count}" "${queue_count_file}"; then
> +		return 0
> +	fi
> +
> +	echo "expected queue count ${queue_count} not set"
> +	return 1
> +}
> +
>  set_nvmet_attr_qid_max() {
>  	local nvmet_subsystem="$1"
>  	local qid_max="$2"
> @@ -56,10 +76,7 @@ set_qid_max() {
>  	local qid_max="$3"
>
>  	set_nvmet_attr_qid_max "${subsys_name}" "${qid_max}"
> -
> -	# Setting qid_max forces a disconnect and the reconntect attempt starts
> -	nvmf_wait_for_state "${subsys_name}" "connecting" || return 1
> -	nvmf_wait_for_state "${subsys_name}" "live" || return 1
> +	nvmf_wait_for_queue_count "${subsys_name}" "${qid_max}" || return 1

Why not simply wait for live? The connecting is obviously racy...

>
>  	return 0
>  }
> @@ -77,12 +94,8 @@ test() {
>
>  	_setup_nvmet
>
> -	hostid="$(uuidgen)"
> -	if [ -z "$hostid" ] ; then
> -		echo "uuidgen failed"
> -		return 1
> -	fi
> -	hostnqn="nqn.2014-08.org.nvmexpress:uuid:${hostid}"
> +	hostid="${def_hostid}"
> +	hostnqn="${def_hostnqn}"
>
>  	truncate -s "${nvme_img_size}" "${file_path}"
>
> @@ -103,7 +116,7 @@ test() {
>  		echo FAIL
>  	else
>  		set_qid_max "${port}" "${subsys_name}" 1 || echo FAIL
> -		set_qid_max "${port}" "${subsys_name}" 128 || echo FAIL
> +		set_qid_max "${port}" "${subsys_name}" 2 || echo FAIL
>  	fi
>
>  	_nvme_disconnect_subsys "${subsys_name}"
On Tue, Jun 20, 2023 at 04:49:45PM +0300, Sagi Grimberg wrote:
> > +nvmf_wait_for_queue_count() {
> > +	local subsys_name="$1"
> > +	local queue_count="$2"
> > +	local nvmedev
> > +
> > +	nvmedev=$(_find_nvme_dev "${subsys_name}")
> > +
> > +	queue_count_file="/sys/class/nvme-fabrics/ctl/${nvmedev}/queue_count"
> > +
> > +	nvmf_wait_for_state "${subsys_name}" "live" || return 1
> > +
> > +	queue_count=$((queue_count + 1))
> > +	if grep -q "${queue_count}" "${queue_count_file}"; then
> > +		return 0
> > +	fi
> > +
> > +	echo "expected queue count ${queue_count} not set"
> > +	return 1
> > +}
> > +
> >  set_nvmet_attr_qid_max() {
> >  	local nvmet_subsystem="$1"
> >  	local qid_max="$2"
> > @@ -56,10 +76,7 @@ set_qid_max() {
> >  	local qid_max="$3"
> >  	set_nvmet_attr_qid_max "${subsys_name}" "${qid_max}"
> > -
> > -	# Setting qid_max forces a disconnect and the reconntect attempt starts
> > -	nvmf_wait_for_state "${subsys_name}" "connecting" || return 1
> > -	nvmf_wait_for_state "${subsys_name}" "live" || return 1
> > +	nvmf_wait_for_queue_count "${subsys_name}" "${qid_max}" || return 1
>
> Why not simply wait for live? The connecting is obviously racy...

That is what the new version is doing. It's waiting for the live state and then
checks the queue count.
>>> +nvmf_wait_for_queue_count() {
>>> +	local subsys_name="$1"
>>> +	local queue_count="$2"
>>> +	local nvmedev
>>> +
>>> +	nvmedev=$(_find_nvme_dev "${subsys_name}")
>>> +
>>> +	queue_count_file="/sys/class/nvme-fabrics/ctl/${nvmedev}/queue_count"
>>> +
>>> +	nvmf_wait_for_state "${subsys_name}" "live" || return 1
>>> +
>>> +	queue_count=$((queue_count + 1))
>>> +	if grep -q "${queue_count}" "${queue_count_file}"; then
>>> +		return 0
>>> +	fi
>>> +
>>> +	echo "expected queue count ${queue_count} not set"
>>> +	return 1
>>> +}
>>> +
>>>  set_nvmet_attr_qid_max() {
>>>  	local nvmet_subsystem="$1"
>>>  	local qid_max="$2"
>>> @@ -56,10 +76,7 @@ set_qid_max() {
>>>  	local qid_max="$3"
>>>  	set_nvmet_attr_qid_max "${subsys_name}" "${qid_max}"
>>> -
>>> -	# Setting qid_max forces a disconnect and the reconntect attempt starts
>>> -	nvmf_wait_for_state "${subsys_name}" "connecting" || return 1
>>> -	nvmf_wait_for_state "${subsys_name}" "live" || return 1
>>> +	nvmf_wait_for_queue_count "${subsys_name}" "${qid_max}" || return 1
>>
>> Why not simply wait for live? The connecting is obviously racy...
>
> That is what the new version is doing. It's waiting for the live state and then
> checks the queue count.

Maybe don't fold waiting for live into waiting for queue_count.
On Wed, Jun 21, 2023 at 12:50:29PM +0300, Sagi Grimberg wrote:
> > > Why not simply wait for live? The connecting is obviously racy...
> >
> > That is what the new version is doing. It's waiting for the live state and then
> > checks the queue count.
>
> Maybe don't fold waiting for live into waiting for queue_count.

Alright, will do!
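For reference, a minimal sketch of what the unfolded variant could look like,
assuming nvmf_wait_for_queue_count is changed to only check the sysfs value and
no longer calls nvmf_wait_for_state itself (this is a sketch, not the actual
follow-up revision):

set_qid_max() {
	local subsys_name="$2"
	local qid_max="$3"

	set_nvmet_attr_qid_max "${subsys_name}" "${qid_max}"

	# Wait for the live state first, then verify the queue count as a
	# separate step instead of folding both into one helper.
	nvmf_wait_for_state "${subsys_name}" "live" || return 1
	nvmf_wait_for_queue_count "${subsys_name}" "${qid_max}" || return 1

	return 0
}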
On Jun 20, 2023 / 15:27, Daniel Wagner wrote:
> The test monitored the state changes live -> resetting -> connecting ->
> live, to figure out the queue count change was successful.
>
> The fc transport is reconnecting very fast and the state transitions
> are not observed by the current approach.
>
> So instead trying to monitor the state changes, let's just wait for the
> live state and the correct queue number.
>
> As queue count is depending on the number of online CPUs we explicitly
> use 1 and 2 for the max_queue count. This means the queue_count value
> needs to reach either 2 or 3 (admin queue included).
>
> Signed-off-by: Daniel Wagner <dwagner@suse.de>
> ---
>  tests/nvme/048 | 35 ++++++++++++++++++++++++-----------
>  1 file changed, 24 insertions(+), 11 deletions(-)
>
> diff --git a/tests/nvme/048 b/tests/nvme/048
> index 81084f0440c2..3dc5169132de 100755
> --- a/tests/nvme/048
> +++ b/tests/nvme/048
> @@ -42,6 +42,26 @@ nvmf_wait_for_state() {
>  	return 0
>  }
>
> +nvmf_wait_for_queue_count() {
> +	local subsys_name="$1"
> +	local queue_count="$2"
> +	local nvmedev
> +
> +	nvmedev=$(_find_nvme_dev "${subsys_name}")
> +
> +	queue_count_file="/sys/class/nvme-fabrics/ctl/${nvmedev}/queue_count"

Nit: queue count file is not declared as a local variable.

> +
> +	nvmf_wait_for_state "${subsys_name}" "live" || return 1
> +
> +	queue_count=$((queue_count + 1))
> +	if grep -q "${queue_count}" "${queue_count_file}"; then

Does this check work when the number in queue_count_file has more digits than
queue_count? e.g.) queue_count_file=20, queue_count=2

> +		return 0
> +	fi
> +
> +	echo "expected queue count ${queue_count} not set"
> +	return 1
> +}
> +
>  set_nvmet_attr_qid_max() {
>  	local nvmet_subsystem="$1"
>  	local qid_max="$2"
> @@ -56,10 +76,7 @@ set_qid_max() {
>  	local qid_max="$3"
>
>  	set_nvmet_attr_qid_max "${subsys_name}" "${qid_max}"
> -
> -	# Setting qid_max forces a disconnect and the reconntect attempt starts
> -	nvmf_wait_for_state "${subsys_name}" "connecting" || return 1
> -	nvmf_wait_for_state "${subsys_name}" "live" || return 1
> +	nvmf_wait_for_queue_count "${subsys_name}" "${qid_max}" || return 1
>
>  	return 0
>  }
> @@ -77,12 +94,8 @@ test() {
>
>  	_setup_nvmet
>
> -	hostid="$(uuidgen)"
> -	if [ -z "$hostid" ] ; then
> -		echo "uuidgen failed"
> -		return 1
> -	fi
> -	hostnqn="nqn.2014-08.org.nvmexpress:uuid:${hostid}"
> +	hostid="${def_hostid}"
> +	hostnqn="${def_hostnqn}"

I guess it's the better to move this hunk to the 3rd patch. Or we can mention it
in the commit message of this patch.

>
>  	truncate -s "${nvme_img_size}" "${file_path}"
>
> @@ -103,7 +116,7 @@ test() {
>  		echo FAIL
>  	else
>  		set_qid_max "${port}" "${subsys_name}" 1 || echo FAIL
> -		set_qid_max "${port}" "${subsys_name}" 128 || echo FAIL
> +		set_qid_max "${port}" "${subsys_name}" 2 || echo FAIL
>  	fi
>
>  	_nvme_disconnect_subsys "${subsys_name}"
> --
> 2.41.0
>
On Tue, Jun 27, 2023 at 10:13:48AM +0000, Shinichiro Kawasaki wrote:
> > +	nvmf_wait_for_state "${subsys_name}" "live" || return 1
> > +
> > +	queue_count=$((queue_count + 1))
> > +	if grep -q "${queue_count}" "${queue_count_file}"; then
>
> Does this check work when the number in queue_count_file has more digits than
> queue_count? e.g.) queue_count_file=20, queue_count=2

The idea is that it should be an exact match. Let me figure out if this does
what it is supposed to do.

> > -	hostnqn="nqn.2014-08.org.nvmexpress:uuid:${hostid}"
> > +	hostid="${def_hostid}"
> > +	hostnqn="${def_hostnqn}"
>
> I guess it's the better to move this hunk to the 3rd patch. Or we can mention it
> in the commit message of this patch.

Good point, I'll move this to the 3rd patch so this change is in one patch.
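For illustration, two possible ways to make the comparison an exact match, so
that an expected queue_count of 2 does not accidentally match a sysfs value of
20. Both reuse the queue_count and queue_count_file variables from the patch
and are only sketches, not necessarily what the next revision uses:

# Anchor the grep pattern to the whole line.
grep -q "^${queue_count}$" "${queue_count_file}"

# Or read the value and compare numerically.
[[ "$(cat "${queue_count_file}")" -eq "${queue_count}" ]]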
diff --git a/tests/nvme/048 b/tests/nvme/048
index 81084f0440c2..3dc5169132de 100755
--- a/tests/nvme/048
+++ b/tests/nvme/048
@@ -42,6 +42,26 @@ nvmf_wait_for_state() {
 	return 0
 }
 
+nvmf_wait_for_queue_count() {
+	local subsys_name="$1"
+	local queue_count="$2"
+	local nvmedev
+
+	nvmedev=$(_find_nvme_dev "${subsys_name}")
+
+	queue_count_file="/sys/class/nvme-fabrics/ctl/${nvmedev}/queue_count"
+
+	nvmf_wait_for_state "${subsys_name}" "live" || return 1
+
+	queue_count=$((queue_count + 1))
+	if grep -q "${queue_count}" "${queue_count_file}"; then
+		return 0
+	fi
+
+	echo "expected queue count ${queue_count} not set"
+	return 1
+}
+
 set_nvmet_attr_qid_max() {
 	local nvmet_subsystem="$1"
 	local qid_max="$2"
@@ -56,10 +76,7 @@ set_qid_max() {
 	local qid_max="$3"
 
 	set_nvmet_attr_qid_max "${subsys_name}" "${qid_max}"
-
-	# Setting qid_max forces a disconnect and the reconntect attempt starts
-	nvmf_wait_for_state "${subsys_name}" "connecting" || return 1
-	nvmf_wait_for_state "${subsys_name}" "live" || return 1
+	nvmf_wait_for_queue_count "${subsys_name}" "${qid_max}" || return 1
 
 	return 0
 }
@@ -77,12 +94,8 @@ test() {
 
 	_setup_nvmet
 
-	hostid="$(uuidgen)"
-	if [ -z "$hostid" ] ; then
-		echo "uuidgen failed"
-		return 1
-	fi
-	hostnqn="nqn.2014-08.org.nvmexpress:uuid:${hostid}"
+	hostid="${def_hostid}"
+	hostnqn="${def_hostnqn}"
 
 	truncate -s "${nvme_img_size}" "${file_path}"
 
@@ -103,7 +116,7 @@ test() {
 		echo FAIL
 	else
 		set_qid_max "${port}" "${subsys_name}" 1 || echo FAIL
-		set_qid_max "${port}" "${subsys_name}" 128 || echo FAIL
+		set_qid_max "${port}" "${subsys_name}" 2 || echo FAIL
 	fi
 
 	_nvme_disconnect_subsys "${subsys_name}"
The test monitored the state changes live -> resetting -> connecting ->
live, to figure out the queue count change was successful.

The fc transport is reconnecting very fast and the state transitions
are not observed by the current approach.

So instead trying to monitor the state changes, let's just wait for the
live state and the correct queue number.

As queue count is depending on the number of online CPUs we explicitly
use 1 and 2 for the max_queue count. This means the queue_count value
needs to reach either 2 or 3 (admin queue included).

Signed-off-by: Daniel Wagner <dwagner@suse.de>
---
 tests/nvme/048 | 35 ++++++++++++++++++++++++-----------
 1 file changed, 24 insertions(+), 11 deletions(-)