
[blktests,v1,1/3] nvme/048: Check queue count directly

Message ID 20230620132703.20648-2-dwagner@suse.de (mailing list archive)
State New, archived
Series More fixes for FC enabling

Commit Message

Daniel Wagner June 20, 2023, 1:27 p.m. UTC
The test monitored the state changes live -> resetting -> connecting ->
live to verify that the queue count change was successful.

The fc transport reconnects very quickly, so the intermediate state
transitions are not observed by the current approach.

So instead of trying to monitor the state changes, just wait for the
live state and the correct queue count.

As the queue count depends on the number of online CPUs, explicitly use
1 and 2 for qid_max. This means the queue_count value needs to reach
either 2 or 3 (admin queue included).

Signed-off-by: Daniel Wagner <dwagner@suse.de>
---
 tests/nvme/048 | 35 ++++++++++++++++++++++++-----------
 1 file changed, 24 insertions(+), 11 deletions(-)

Comments

Sagi Grimberg June 20, 2023, 1:49 p.m. UTC | #1
> The test monitored the state changes live -> resetting -> connecting ->
> live to verify that the queue count change was successful.
> 
> The fc transport reconnects very quickly, so the intermediate state
> transitions are not observed by the current approach.
> 
> So instead of trying to monitor the state changes, just wait for the
> live state and the correct queue count.
> 
> As the queue count depends on the number of online CPUs, explicitly use
> 1 and 2 for qid_max. This means the queue_count value needs to reach
> either 2 or 3 (admin queue included).
> 
> Signed-off-by: Daniel Wagner <dwagner@suse.de>
> ---
>   tests/nvme/048 | 35 ++++++++++++++++++++++++-----------
>   1 file changed, 24 insertions(+), 11 deletions(-)
> 
> diff --git a/tests/nvme/048 b/tests/nvme/048
> index 81084f0440c2..3dc5169132de 100755
> --- a/tests/nvme/048
> +++ b/tests/nvme/048
> @@ -42,6 +42,26 @@ nvmf_wait_for_state() {
>   	return 0
>   }
>   
> +nvmf_wait_for_queue_count() {
> +	local subsys_name="$1"
> +	local queue_count="$2"
> +	local nvmedev
> +
> +	nvmedev=$(_find_nvme_dev "${subsys_name}")
> +
> +	queue_count_file="/sys/class/nvme-fabrics/ctl/${nvmedev}/queue_count"
> +
> +	nvmf_wait_for_state "${subsys_name}" "live" || return 1
> +
> +	queue_count=$((queue_count + 1))
> +	if grep -q "${queue_count}" "${queue_count_file}"; then
> +		return 0
> +	fi
> +
> +	echo "expected queue count ${queue_count} not set"
> +	return 1
> +}
> +
>   set_nvmet_attr_qid_max() {
>   	local nvmet_subsystem="$1"
>   	local qid_max="$2"
> @@ -56,10 +76,7 @@ set_qid_max() {
>   	local qid_max="$3"
>   
>   	set_nvmet_attr_qid_max "${subsys_name}" "${qid_max}"
> -
> -	# Setting qid_max forces a disconnect and the reconntect attempt starts
> -	nvmf_wait_for_state "${subsys_name}" "connecting" || return 1
> -	nvmf_wait_for_state "${subsys_name}" "live" || return 1
> +	nvmf_wait_for_queue_count "${subsys_name}" "${qid_max}" || return 1

Why not simply wait for live? The connecting is obviously racy...

>   
>   	return 0
>   }
> @@ -77,12 +94,8 @@ test() {
>   
>   	_setup_nvmet
>   
> -	hostid="$(uuidgen)"
> -	if [ -z "$hostid" ] ; then
> -		echo "uuidgen failed"
> -		return 1
> -	fi
> -	hostnqn="nqn.2014-08.org.nvmexpress:uuid:${hostid}"
> +	hostid="${def_hostid}"
> +	hostnqn="${def_hostnqn}"
>   
>   	truncate -s "${nvme_img_size}" "${file_path}"
>   
> @@ -103,7 +116,7 @@ test() {
>   			echo FAIL
>   		else
>   			set_qid_max "${port}" "${subsys_name}" 1 || echo FAIL
> -			set_qid_max "${port}" "${subsys_name}" 128 || echo FAIL
> +			set_qid_max "${port}" "${subsys_name}" 2 || echo FAIL
>   		fi
>   
>   		_nvme_disconnect_subsys "${subsys_name}"
Daniel Wagner June 20, 2023, 5:37 p.m. UTC | #2
On Tue, Jun 20, 2023 at 04:49:45PM +0300, Sagi Grimberg wrote:
> > +nvmf_wait_for_queue_count() {
> > +	local subsys_name="$1"
> > +	local queue_count="$2"
> > +	local nvmedev
> > +
> > +	nvmedev=$(_find_nvme_dev "${subsys_name}")
> > +
> > +	queue_count_file="/sys/class/nvme-fabrics/ctl/${nvmedev}/queue_count"
> > +
> > +	nvmf_wait_for_state "${subsys_name}" "live" || return 1
> > +
> > +	queue_count=$((queue_count + 1))
> > +	if grep -q "${queue_count}" "${queue_count_file}"; then
> > +		return 0
> > +	fi
> > +
> > +	echo "expected queue count ${queue_count} not set"
> > +	return 1
> > +}
> > +
> >   set_nvmet_attr_qid_max() {
> >   	local nvmet_subsystem="$1"
> >   	local qid_max="$2"
> > @@ -56,10 +76,7 @@ set_qid_max() {
> >   	local qid_max="$3"
> >   	set_nvmet_attr_qid_max "${subsys_name}" "${qid_max}"
> > -
> > -	# Setting qid_max forces a disconnect and the reconntect attempt starts
> > -	nvmf_wait_for_state "${subsys_name}" "connecting" || return 1
> > -	nvmf_wait_for_state "${subsys_name}" "live" || return 1
> > +	nvmf_wait_for_queue_count "${subsys_name}" "${qid_max}" || return 1
> 
> Why not simply wait for live? The connecting is obviously racy...

That is what the new version is doing. It's waiting for the live state and then
checks the queue count.
Sagi Grimberg June 21, 2023, 9:50 a.m. UTC | #3
>>> +nvmf_wait_for_queue_count() {
>>> +	local subsys_name="$1"
>>> +	local queue_count="$2"
>>> +	local nvmedev
>>> +
>>> +	nvmedev=$(_find_nvme_dev "${subsys_name}")
>>> +
>>> +	queue_count_file="/sys/class/nvme-fabrics/ctl/${nvmedev}/queue_count"
>>> +
>>> +	nvmf_wait_for_state "${subsys_name}" "live" || return 1
>>> +
>>> +	queue_count=$((queue_count + 1))
>>> +	if grep -q "${queue_count}" "${queue_count_file}"; then
>>> +		return 0
>>> +	fi
>>> +
>>> +	echo "expected queue count ${queue_count} not set"
>>> +	return 1
>>> +}
>>> +
>>>    set_nvmet_attr_qid_max() {
>>>    	local nvmet_subsystem="$1"
>>>    	local qid_max="$2"
>>> @@ -56,10 +76,7 @@ set_qid_max() {
>>>    	local qid_max="$3"
>>>    	set_nvmet_attr_qid_max "${subsys_name}" "${qid_max}"
>>> -
>>> -	# Setting qid_max forces a disconnect and the reconntect attempt starts
>>> -	nvmf_wait_for_state "${subsys_name}" "connecting" || return 1
>>> -	nvmf_wait_for_state "${subsys_name}" "live" || return 1
>>> +	nvmf_wait_for_queue_count "${subsys_name}" "${qid_max}" || return 1
>>
>> Why not simply wait for live? The connecting is obviously racy...
> 
> That is what the new version is doing. It's waiting for the live state and then
> checks the queue count.

Maybe don't fold waiting for live into waiting for queue_count.
Daniel Wagner June 21, 2023, 11:19 a.m. UTC | #4
On Wed, Jun 21, 2023 at 12:50:29PM +0300, Sagi Grimberg wrote:
> > > Why not simply wait for live? The connecting is obviously racy...
> > 
> > That is what the new version is doing. It's waiting for the live state and then
> > checks the queue count.
> 
> Maybe don't fold waiting for live into waiting for queue_count.

Alright, will do!
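
Presumably something along these lines, i.e. dropping the live wait from
nvmf_wait_for_queue_count() and doing it in set_qid_max() instead
(untested sketch, names follow the existing test):

	set_qid_max() {
		local port="$1"
		local subsys_name="$2"
		local qid_max="$3"

		set_nvmet_attr_qid_max "${subsys_name}" "${qid_max}"

		# wait for the reconnect to finish, then verify the result
		nvmf_wait_for_state "${subsys_name}" "live" || return 1
		nvmf_wait_for_queue_count "${subsys_name}" "${qid_max}" || return 1

		return 0
	}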
Shinichiro Kawasaki June 27, 2023, 10:13 a.m. UTC | #5
On Jun 20, 2023 / 15:27, Daniel Wagner wrote:
> The test monitored the state changes live -> resetting -> connecting ->
> live to verify that the queue count change was successful.
> 
> The fc transport reconnects very quickly, so the intermediate state
> transitions are not observed by the current approach.
> 
> So instead of trying to monitor the state changes, just wait for the
> live state and the correct queue count.
> 
> As the queue count depends on the number of online CPUs, explicitly use
> 1 and 2 for qid_max. This means the queue_count value needs to reach
> either 2 or 3 (admin queue included).
> 
> Signed-off-by: Daniel Wagner <dwagner@suse.de>
> ---
>  tests/nvme/048 | 35 ++++++++++++++++++++++++-----------
>  1 file changed, 24 insertions(+), 11 deletions(-)
> 
> diff --git a/tests/nvme/048 b/tests/nvme/048
> index 81084f0440c2..3dc5169132de 100755
> --- a/tests/nvme/048
> +++ b/tests/nvme/048
> @@ -42,6 +42,26 @@ nvmf_wait_for_state() {
>  	return 0
>  }
>  
> +nvmf_wait_for_queue_count() {
> +	local subsys_name="$1"
> +	local queue_count="$2"
> +	local nvmedev
> +
> +	nvmedev=$(_find_nvme_dev "${subsys_name}")
> +
> +	queue_count_file="/sys/class/nvme-fabrics/ctl/${nvmedev}/queue_count"

Nit: queue count file is not declared as a local variable.
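I.e. presumably just adding it to the locals at the top of the helper,
something like:

	local nvmedev
	local queue_count_file

	nvmedev=$(_find_nvme_dev "${subsys_name}")
	queue_count_file="/sys/class/nvme-fabrics/ctl/${nvmedev}/queue_count"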

> +
> +	nvmf_wait_for_state "${subsys_name}" "live" || return 1
> +
> +	queue_count=$((queue_count + 1))
> +	if grep -q "${queue_count}" "${queue_count_file}"; then

Does this check work when the number in queue_count_file has more digits than
queue_count? e.g.) queue_count_file=20, queue_count=2
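
For example (hypothetical session, with /tmp/queue_count standing in for
the sysfs attribute), grep does a substring match:

	$ echo 20 > /tmp/queue_count
	$ grep -q "2" /tmp/queue_count && echo "matches"
	matches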

> +		return 0
> +	fi
> +
> +	echo "expected queue count ${queue_count} not set"
> +	return 1
> +}
> +
>  set_nvmet_attr_qid_max() {
>  	local nvmet_subsystem="$1"
>  	local qid_max="$2"
> @@ -56,10 +76,7 @@ set_qid_max() {
>  	local qid_max="$3"
>  
>  	set_nvmet_attr_qid_max "${subsys_name}" "${qid_max}"
> -
> -	# Setting qid_max forces a disconnect and the reconntect attempt starts
> -	nvmf_wait_for_state "${subsys_name}" "connecting" || return 1
> -	nvmf_wait_for_state "${subsys_name}" "live" || return 1
> +	nvmf_wait_for_queue_count "${subsys_name}" "${qid_max}" || return 1
>  
>  	return 0
>  }
> @@ -77,12 +94,8 @@ test() {
>  
>  	_setup_nvmet
>  
> -	hostid="$(uuidgen)"
> -	if [ -z "$hostid" ] ; then
> -		echo "uuidgen failed"
> -		return 1
> -	fi
> -	hostnqn="nqn.2014-08.org.nvmexpress:uuid:${hostid}"
> +	hostid="${def_hostid}"
> +	hostnqn="${def_hostnqn}"

I guess it's better to move this hunk to the 3rd patch. Or we can mention it
in the commit message of this patch.

>  
>  	truncate -s "${nvme_img_size}" "${file_path}"
>  
> @@ -103,7 +116,7 @@ test() {
>  			echo FAIL
>  		else
>  			set_qid_max "${port}" "${subsys_name}" 1 || echo FAIL
> -			set_qid_max "${port}" "${subsys_name}" 128 || echo FAIL
> +			set_qid_max "${port}" "${subsys_name}" 2 || echo FAIL
>  		fi
>  
>  		_nvme_disconnect_subsys "${subsys_name}"
> -- 
> 2.41.0
>
Daniel Wagner June 28, 2023, 5:52 a.m. UTC | #6
On Tue, Jun 27, 2023 at 10:13:48AM +0000, Shinichiro Kawasaki wrote:
> > +	nvmf_wait_for_state "${subsys_name}" "live" || return 1
> > +
> > +	queue_count=$((queue_count + 1))
> > +	if grep -q "${queue_count}" "${queue_count_file}"; then
> 
> Does this check work when the number in queue_count_file has more digits than
> queue_count? e.g.) queue_count_file=20, queue_count=2

The idea is that it should be an exact match. Let me figure out if this does
what it is supposed to do.
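
One possible way to make it exact would be a whole-line match, e.g.
(untested sketch):

	queue_count=$((queue_count + 1))
	if grep -qx "${queue_count}" "${queue_count_file}"; then
		return 0
	fi

grep -x only matches complete lines, so "2" would no longer match a
file containing "20".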

> > -	hostnqn="nqn.2014-08.org.nvmexpress:uuid:${hostid}"
> > +	hostid="${def_hostid}"
> > +	hostnqn="${def_hostnqn}"
> 
> I guess it's better to move this hunk to the 3rd patch. Or we can mention it
> in the commit message of this patch.

Good point, I'll move this to the 3rd patch so this change is in one patch.

Patch

diff --git a/tests/nvme/048 b/tests/nvme/048
index 81084f0440c2..3dc5169132de 100755
--- a/tests/nvme/048
+++ b/tests/nvme/048
@@ -42,6 +42,26 @@  nvmf_wait_for_state() {
 	return 0
 }
 
+nvmf_wait_for_queue_count() {
+	local subsys_name="$1"
+	local queue_count="$2"
+	local nvmedev
+
+	nvmedev=$(_find_nvme_dev "${subsys_name}")
+
+	queue_count_file="/sys/class/nvme-fabrics/ctl/${nvmedev}/queue_count"
+
+	nvmf_wait_for_state "${subsys_name}" "live" || return 1
+
+	queue_count=$((queue_count + 1))
+	if grep -q "${queue_count}" "${queue_count_file}"; then
+		return 0
+	fi
+
+	echo "expected queue count ${queue_count} not set"
+	return 1
+}
+
 set_nvmet_attr_qid_max() {
 	local nvmet_subsystem="$1"
 	local qid_max="$2"
@@ -56,10 +76,7 @@  set_qid_max() {
 	local qid_max="$3"
 
 	set_nvmet_attr_qid_max "${subsys_name}" "${qid_max}"
-
-	# Setting qid_max forces a disconnect and the reconntect attempt starts
-	nvmf_wait_for_state "${subsys_name}" "connecting" || return 1
-	nvmf_wait_for_state "${subsys_name}" "live" || return 1
+	nvmf_wait_for_queue_count "${subsys_name}" "${qid_max}" || return 1
 
 	return 0
 }
@@ -77,12 +94,8 @@  test() {
 
 	_setup_nvmet
 
-	hostid="$(uuidgen)"
-	if [ -z "$hostid" ] ; then
-		echo "uuidgen failed"
-		return 1
-	fi
-	hostnqn="nqn.2014-08.org.nvmexpress:uuid:${hostid}"
+	hostid="${def_hostid}"
+	hostnqn="${def_hostnqn}"
 
 	truncate -s "${nvme_img_size}" "${file_path}"
 
@@ -103,7 +116,7 @@  test() {
 			echo FAIL
 		else
 			set_qid_max "${port}" "${subsys_name}" 1 || echo FAIL
-			set_qid_max "${port}" "${subsys_name}" 128 || echo FAIL
+			set_qid_max "${port}" "${subsys_name}" 2 || echo FAIL
 		fi
 
 		_nvme_disconnect_subsys "${subsys_name}"