diff mbox series

[v2] KVM: selftests: Fix target thread to be migrated in rseq_test

Message ID 20220716144537.3436743-1-gshan@redhat.com (mailing list archive)
State New, archived
Headers show
Series [v2] KVM: selftests: Fix target thread to be migrated in rseq_test | expand

Commit Message

Gavin Shan July 16, 2022, 2:45 p.m. UTC
In rseq_test, there are two threads, which are thread group leader
and migration worker. The migration worker relies on sched_setaffinity()
to force migration on the thread group leader. Unfortunately, we
have wrong parameter (0) passed to sched_getaffinity(). It's actually
forcing migration on the migration worker instead of the thread group
leader. It also means migration can happen on the thread group leader
at any time, which eventually leads to failure as the following logs
show.

  host# uname -r
  5.19.0-rc6-gavin+
  host# # cat /proc/cpuinfo | grep processor | tail -n 1
  processor    : 223
  host# pwd
  /home/gavin/sandbox/linux.main/tools/testing/selftests/kvm
  host# for i in `seq 1 100`;                \
        do echo "--------> $i"; ./rseq_test; done
  --------> 1
  --------> 2
  --------> 3
  --------> 4
  --------> 5
  --------> 6
  ==== Test Assertion Failure ====
    rseq_test.c:265: rseq_cpu == cpu
    pid=3925 tid=3925 errno=4 - Interrupted system call
       1  0x0000000000401963: main at rseq_test.c:265 (discriminator 2)
       2  0x0000ffffb044affb: ?? ??:0
       3  0x0000ffffb044b0c7: ?? ??:0
       4  0x0000000000401a6f: _start at ??:?
    rseq CPU = 4, sched CPU = 27

This fixes the issue by passing correct parameter, tid of the group
thread leader, to sched_setaffinity().

Fixes: 61e52f1630f5 ("KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs")
Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 tools/testing/selftests/kvm/rseq_test.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Oliver Upton July 16, 2022, 9:48 p.m. UTC | #1
Hi Gavin,

Thanks for cleaning this up.

July 16, 2022 7:45 AM, "Gavin Shan" <gshan@redhat.com> wrote:
> In rseq_test, there are two threads, which are thread group leader
> and migration worker. The migration worker relies on sched_setaffinity()
> to force migration on the thread group leader.

It may be clearer to describe it as a vCPU thread and a migration worker
thread. The meat of this test is to catch a regression in KVM.

> Unfortunately, we have

s/we have/the test has the/

> wrong parameter (0) passed to sched_getaffinity().

wrong PID

> It's actually
> forcing migration on the migration worker instead of the thread group
> leader.

What's missing is _why_ the migration worker is getting moved around by
the call. Perhaps instead it is better to state what a PID of 0 implies,
for those of us who haven't read their manpages in a while ;-)

> It also means migration can happen on the thread group leader
> at any time, which eventually leads to failure as the following logs
> show.
> 
> host# uname -r
> 5.19.0-rc6-gavin+
> host# # cat /proc/cpuinfo | grep processor | tail -n 1
> processor : 223
> host# pwd
> /home/gavin/sandbox/linux.main/tools/testing/selftests/kvm
> host# for i in `seq 1 100`; \
> do echo "--------> $i"; ./rseq_test; done
> --------> 1
> --------> 2
> --------> 3
> --------> 4
> --------> 5
> --------> 6
> ==== Test Assertion Failure ====
> rseq_test.c:265: rseq_cpu == cpu
> pid=3925 tid=3925 errno=4 - Interrupted system call
> 1 0x0000000000401963: main at rseq_test.c:265 (discriminator 2)
> 2 0x0000ffffb044affb: ?? ??:0
> 3 0x0000ffffb044b0c7: ?? ??:0
> 4 0x0000000000401a6f: _start at ??:?
> rseq CPU = 4, sched CPU = 27
> 
> This fixes the issue by passing correct parameter, tid of the group
> thread leader, to sched_setaffinity().

Kernel commit messages should have an imperative tone:

Fix the issue by ...

> Fixes: 61e52f1630f5 ("KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs")
> Signed-off-by: Gavin Shan <gshan@redhat.com>

With the comments on the commit message addressed:

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Gavin Shan July 17, 2022, 3:11 a.m. UTC | #2
Hi Oliver,

On 7/17/22 7:48 AM, oliver.upton@linux.dev wrote:
> 
> Thanks for cleaning this up.
> 

Thanks for your review.

> July 16, 2022 7:45 AM, "Gavin Shan" <gshan@redhat.com> wrote:
>> In rseq_test, there are two threads, which are thread group leader
>> and migration worker. The migration worker relies on sched_setaffinity()
>> to force migration on the thread group leader.
> 
> It may be clearer to describe it as a vCPU thread and a migration worker
> thread. The meat of this test is to catch a regression in KVM.
> 
>> Unfortunately, we have
> 
> s/we have/the test has the/
> 
>> wrong parameter (0) passed to sched_getaffinity().
> 
> wrong PID
> 

Yep, it's much clearer to describe it as vCPU thread and migration worker.

>> It's actually
>> forcing migration on the migration worker instead of the thread group
>> leader.
> 
> What's missing is _why_ the migration worker is getting moved around by
> the call. Perhaps instead it is better to state what a PID of 0 implies,
> for those of us who haven't read their manpages in a while ;-)
> 

Yes, it's good idea. I will have something like below in next revision :)

     In rseq_test, there are two threads, which are vCPU thread and migration
     worker separately. Unfortunately, the test has the wrong PID passed to
     sched_setaffinity() in the migration worker. It forces migration on the
     migration worker because zeroed PID represents the calling thread, which
     is the migration worker itself. It means the vCPU thread is never enforced
     to migration and it can migrate at any time, which eventually leads to
     failure as the following logs show.
         :
         :
     Fix the issue by passing correct parameter, TID of the vCPU thread, to
     sched_setaffinity() in the migration worker.


>> It also means migration can happen on the thread group leader
>> at any time, which eventually leads to failure as the following logs
>> show.
>>
>> host# uname -r
>> 5.19.0-rc6-gavin+
>> host# # cat /proc/cpuinfo | grep processor | tail -n 1
>> processor : 223
>> host# pwd
>> /home/gavin/sandbox/linux.main/tools/testing/selftests/kvm
>> host# for i in `seq 1 100`; \
>> do echo "--------> $i"; ./rseq_test; done
>> --------> 1
>> --------> 2
>> --------> 3
>> --------> 4
>> --------> 5
>> --------> 6
>> ==== Test Assertion Failure ====
>> rseq_test.c:265: rseq_cpu == cpu
>> pid=3925 tid=3925 errno=4 - Interrupted system call
>> 1 0x0000000000401963: main at rseq_test.c:265 (discriminator 2)
>> 2 0x0000ffffb044affb: ?? ??:0
>> 3 0x0000ffffb044b0c7: ?? ??:0
>> 4 0x0000000000401a6f: _start at ??:?
>> rseq CPU = 4, sched CPU = 27
>>
>> This fixes the issue by passing correct parameter, tid of the group
>> thread leader, to sched_setaffinity().
> 
> Kernel commit messages should have an imperative tone:
> 
> Fix the issue by ...
> 

Ok. I've been having my style for long time. Actually, the style was
shared by some one when I worked for IBM long time ago. I will bear
it in mind to use imperative expression since now on :)

All your comments will be fixed in next revision, but I would delay
the posting a bit to see Sean or Paolo have more comments. In that
case, I can fix all of them at once.

>> Fixes: 61e52f1630f5 ("KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs")
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
> 
> With the comments on the commit message addressed:
> 
> Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
> 

Thanks again for your time on this.

Thanks,
Gavin
Gavin Shan July 19, 2022, 1:40 a.m. UTC | #3
On 7/17/22 1:11 PM, Gavin Shan wrote:
> On 7/17/22 7:48 AM, oliver.upton@linux.dev wrote:
>> July 16, 2022 7:45 AM, "Gavin Shan" <gshan@redhat.com> wrote:
>>> In rseq_test, there are two threads, which are thread group leader
>>> and migration worker. The migration worker relies on sched_setaffinity()
>>> to force migration on the thread group leader.
>>
>> It may be clearer to describe it as a vCPU thread and a migration worker
>> thread. The meat of this test is to catch a regression in KVM.
>>
>>> Unfortunately, we have
>>
>> s/we have/the test has the/
>>
>>> wrong parameter (0) passed to sched_getaffinity().
>>
>> wrong PID
>>
> 
> Yep, it's much clearer to describe it as vCPU thread and migration worker.
> 
>>> It's actually
>>> forcing migration on the migration worker instead of the thread group
>>> leader.
>>
>> What's missing is _why_ the migration worker is getting moved around by
>> the call. Perhaps instead it is better to state what a PID of 0 implies,
>> for those of us who haven't read their manpages in a while ;-)
>>
> 
> Yes, it's good idea. I will have something like below in next revision :)
> 
>      In rseq_test, there are two threads, which are vCPU thread and migration
>      worker separately. Unfortunately, the test has the wrong PID passed to
>      sched_setaffinity() in the migration worker. It forces migration on the
>      migration worker because zeroed PID represents the calling thread, which
>      is the migration worker itself. It means the vCPU thread is never enforced
>      to migration and it can migrate at any time, which eventually leads to
>      failure as the following logs show.
>          :
>          :
>      Fix the issue by passing correct parameter, TID of the vCPU thread, to
>      sched_setaffinity() in the migration worker.
> 
> 
>>> It also means migration can happen on the thread group leader
>>> at any time, which eventually leads to failure as the following logs
>>> show.
>>>
>>> host# uname -r
>>> 5.19.0-rc6-gavin+
>>> host# # cat /proc/cpuinfo | grep processor | tail -n 1
>>> processor : 223
>>> host# pwd
>>> /home/gavin/sandbox/linux.main/tools/testing/selftests/kvm
>>> host# for i in `seq 1 100`; \
>>> do echo "--------> $i"; ./rseq_test; done
>>> --------> 1
>>> --------> 2
>>> --------> 3
>>> --------> 4
>>> --------> 5
>>> --------> 6
>>> ==== Test Assertion Failure ====
>>> rseq_test.c:265: rseq_cpu == cpu
>>> pid=3925 tid=3925 errno=4 - Interrupted system call
>>> 1 0x0000000000401963: main at rseq_test.c:265 (discriminator 2)
>>> 2 0x0000ffffb044affb: ?? ??:0
>>> 3 0x0000ffffb044b0c7: ?? ??:0
>>> 4 0x0000000000401a6f: _start at ??:?
>>> rseq CPU = 4, sched CPU = 27
>>>
>>> This fixes the issue by passing correct parameter, tid of the group
>>> thread leader, to sched_setaffinity().
>>
>> Kernel commit messages should have an imperative tone:
>>
>> Fix the issue by ...
>>
> 
> Ok. I've been having my style for long time. Actually, the style was
> shared by some one when I worked for IBM long time ago. I will bear
> it in mind to use imperative expression since now on :)
> 
> All your comments will be fixed in next revision, but I would delay
> the posting a bit to see Sean or Paolo have more comments. In that
> case, I can fix all of them at once.
> 

v3 was just posted.

https://lore.kernel.org/kvmarm/20220719013540.3477946-1-gshan@redhat.com/T/#u

>>> Fixes: 61e52f1630f5 ("KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs")
>>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>>
>> With the comments on the commit message addressed:
>>
>> Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
>>

Thanks,
Gavin
diff mbox series

Patch

diff --git a/tools/testing/selftests/kvm/rseq_test.c b/tools/testing/selftests/kvm/rseq_test.c
index 4158da0da2bb..c83ac7b467f8 100644
--- a/tools/testing/selftests/kvm/rseq_test.c
+++ b/tools/testing/selftests/kvm/rseq_test.c
@@ -38,6 +38,7 @@  static __thread volatile struct rseq __rseq = {
  */
 #define NR_TASK_MIGRATIONS 100000
 
+static pid_t rseq_tid;
 static pthread_t migration_thread;
 static cpu_set_t possible_mask;
 static int min_cpu, max_cpu;
@@ -106,7 +107,8 @@  static void *migration_worker(void *ign)
 		 * stable, i.e. while changing affinity is in-progress.
 		 */
 		smp_wmb();
-		r = sched_setaffinity(0, sizeof(allowed_mask), &allowed_mask);
+		r = sched_setaffinity(rseq_tid, sizeof(allowed_mask),
+				      &allowed_mask);
 		TEST_ASSERT(!r, "sched_setaffinity failed, errno = %d (%s)",
 			    errno, strerror(errno));
 		smp_wmb();
@@ -231,6 +233,7 @@  int main(int argc, char *argv[])
 	vm = vm_create_default(VCPU_ID, 0, guest_code);
 	ucall_init(vm, NULL);
 
+	rseq_tid = gettid();
 	pthread_create(&migration_thread, NULL, migration_worker, 0);
 
 	for (i = 0; !done; i++) {