Persistent problem with handshake unit tests

Message ID	b22d1b62-a6b1-4dd6-9ccb-827442846f3c@roeck-us.net (mailing list archive)
State	Not Applicable
Delegated to:	Netdev Maintainers
Headers	show Received: from mail-pg1-f181.google.com (mail-pg1-f181.google.com [209.85.215.181]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8D4AB14901A for <netdev@vger.kernel.org>; Fri, 2 Feb 2024 17:21:25 +0000 (UTC) Sender: Guenter Roeck <groeck7@gmail.com> Date: Fri, 2 Feb 2024 09:21:22 -0800 From: Guenter Roeck <linux@roeck-us.net> To: Chuck Lever <chuck.lever@oracle.com> Cc: kernel-tls-handshake@lists.linux.dev, netdev@vger.kernel.org Subject: Persistent problem with handshake unit tests Message-ID: <b22d1b62-a6b1-4dd6-9ccb-827442846f3c@roeck-us.net> Precedence: bulk MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline
Series	Persistent problem with handshake unit tests \| expand Persistent problem with handshake unit tests

Context	Check	Description
netdev/tree_selection	success	Guessing tree name failed - patch did not apply

Guenter Roeck Feb. 2, 2024, 5:21 p.m. UTC

Hi,

when running handshake kunit tests in qemu, I always get the following
failure.

    KTAP version 1
    # Subtest: Handshake API tests
    1..11
        KTAP version 1
        # Subtest: req_alloc API fuzzing
        ok 1 handshake_req_alloc NULL proto
        ok 2 handshake_req_alloc CLASS_NONE
        ok 3 handshake_req_alloc CLASS_MAX
        ok 4 handshake_req_alloc no callbacks
        ok 5 handshake_req_alloc no done callback
        ok 6 handshake_req_alloc excessive privsize
        ok 7 handshake_req_alloc all good
    # req_alloc API fuzzing: pass:7 fail:0 skip:0 total:7
    ok 1 req_alloc API fuzzing
    ok 2 req_submit NULL req arg
    ok 3 req_submit NULL sock arg
    ok 4 req_submit NULL sock->file
    ok 5 req_lookup works
    ok 6 req_submit max pending
    ok 7 req_submit multiple
    ok 8 req_cancel before accept
    ok 9 req_cancel after accept
    ok 10 req_cancel after done
    # req_destroy works: EXPECTATION FAILED at net/handshake/handshake-test.c:478
    Expected handshake_req_destroy_test == req, but
        handshake_req_destroy_test == 00000000
        req == c5080280
    not ok 11 req_destroy works
# Handshake API tests: pass:10 fail:1 skip:0 total:11
# Totals: pass:16 fail:1 skip:0 total:17
not ok 31 Handshake API tests
############## destroy 0xc5080280
...

The line starting with "#######" is from added debug information.

It appears that the destroy function works, but is delayed. Unfortunately,
I don't know enough about the network subsystem and/or the handshake
protocol to suggest a fix. I'd be happy to submit a fix if you let me know
how that should look like.

Thanks,
Guenter

Jakub Kicinski Feb. 2, 2024, 7:22 p.m. UTC | #1

On Fri, 2 Feb 2024 09:21:22 -0800 Guenter Roeck wrote:
> when running handshake kunit tests in qemu, I always get the following
> failure.

Sorry for sidetracking - how do you run kunit to get all the tests?
We run:

	./tools/testing/kunit/kunit.py run --alltests

but more and more I feel like the --alltests is a cruel joke.

Guenter Roeck Feb. 2, 2024, 8:49 p.m. UTC | #2

On 2/2/24 11:22, Jakub Kicinski wrote:
> On Fri, 2 Feb 2024 09:21:22 -0800 Guenter Roeck wrote:
>> when running handshake kunit tests in qemu, I always get the following
>> failure.
> 
> Sorry for sidetracking - how do you run kunit to get all the tests?
> We run:
> 
> 	./tools/testing/kunit/kunit.py run --alltests
> 
> but more and more I feel like the --alltests is a cruel joke.

I have CONFIG_NET_HANDSHAKE_KUNIT_TEST=y enabled in my configuration.
The tests run during boot, so no additional work is needed. I don't run all
tests because many take too long to execute in qemu.

Guenter

Guenter Roeck Feb. 2, 2024, 8:51 p.m. UTC | #3

On 2/2/24 12:49, Guenter Roeck wrote:
> On 2/2/24 11:22, Jakub Kicinski wrote:
>> On Fri, 2 Feb 2024 09:21:22 -0800 Guenter Roeck wrote:
>>> when running handshake kunit tests in qemu, I always get the following
>>> failure.
>>
>> Sorry for sidetracking - how do you run kunit to get all the tests?
>> We run:
>>
>>     ./tools/testing/kunit/kunit.py run --alltests
>>
>> but more and more I feel like the --alltests is a cruel joke.
> 
> I have CONFIG_NET_HANDSHAKE_KUNIT_TEST=y enabled in my configuration.
> The tests run during boot, so no additional work is needed. I don't run all
> tests because many take too long to execute in qemu.
> 

Follow-up: If this test isn't supposed to run during boot, please
let me know and I'll drop it.

Thanks,
Guenter

Jakub Kicinski Feb. 3, 2024, 12:47 a.m. UTC | #4

On Fri, 2 Feb 2024 12:51:07 -0800 Guenter Roeck wrote:
> > I have CONFIG_NET_HANDSHAKE_KUNIT_TEST=y enabled in my configuration.
> > The tests run during boot, so no additional work is needed. I don't run all
> > tests because many take too long to execute in qemu.

I was wondering how do you discover the options. Periodically grep for
new possible KUNIT options added to the kernel? Have some script that
does it.

> Follow-up: If this test isn't supposed to run during boot, please
> let me know and I'll drop it.

FWIW I don't see why, but Chuck is the best person to comment.

Guenter Roeck Feb. 3, 2024, 1:45 a.m. UTC | #5

On 2/2/24 16:47, Jakub Kicinski wrote:
> On Fri, 2 Feb 2024 12:51:07 -0800 Guenter Roeck wrote:
>>> I have CONFIG_NET_HANDSHAKE_KUNIT_TEST=y enabled in my configuration.
>>> The tests run during boot, so no additional work is needed. I don't run all
>>> tests because many take too long to execute in qemu.
> 
> I was wondering how do you discover the options. Periodically grep for
> new possible KUNIT options added to the kernel? Have some script that
> does it.
> 

"Periodically grep for > new possible KUNIT options added to the kernel"

Exactly, and see what happens if I enable them to determine if they run
fast and if they are stable enough to be enabled in a qemu test. I can not
enable tests automatically because some just take too long to run, as in

# takes too long
# enable_config "${fragment}" CONFIG_TEST_RHASHTABLE

or

# takes too long
# enable_config "${fragment}" CONFIG_TORTURE_TEST CONFIG_LOCK_TORTURE_TEST CONFIG_RCU_TORTURE_TEST

Sometimes I have to disable tests because a long-running sub-option is added,
as in

# CONFIG_MEMCPY_KUNIT_TEST sometimes takes more than 45 seconds.
# CONFIG_MEMCPY_SLOW_KUNIT_TEST avoids this, but last time I checked
# this was not present in all affected kernel branches.
# enable_config "${fragment} CONFIG_MEMCPY_KUNIT_TEST

or because of problems such as

# RTC library unit tests hang in many qemu emulations
# enable_config "${fragment}" CONFIG_RTC_LIB_KUNIT_TEST

or

# clock unit tests seem to introduce noise warning tracebacks
# enable_config "${fragment}" CONFIG_CLK_GATE_KUNIT_TEST CONFIG_CLK_KUNIT_TEST

or

# Results in lots of "ASoC: Parent card not yet available" log messages
# enable_config "${fragment}" CONFIG_SND_SOC_TOPOLOGY_KUNIT_TEST SND_SOC_UTILS_KUNIT_TEST

or

# CONFIG_WW_MUTEX_SELFTEST interferes with CONFIG_PREEMPT=y
# enable_config "${fragment}" CONFIG_WW_MUTEX_SELFTEST

or

# generates warning backtraces on purpose
# enable_config "${fragment}" CONFIG_OF_UNITTEST

Thanks,
Guenter

>> Follow-up: If this test isn't supposed to run during boot, please
>> let me know and I'll drop it.
> 
> FWIW I don't see why, but Chuck is the best person to comment.

Jakub Kicinski Feb. 3, 2024, 2:27 a.m. UTC | #6

On Fri, 2 Feb 2024 17:45:55 -0800 Guenter Roeck wrote:
> > I was wondering how do you discover the options. Periodically grep for
> > new possible KUNIT options added to the kernel? Have some script that
> > does it.
> 
> "Periodically grep for > new possible KUNIT options added to the kernel"
> 
> Exactly, and see what happens if I enable them to determine if they run
> fast and if they are stable enough to be enabled in a qemu test. I can not
> enable tests automatically because some just take too long to run...

Ugh, we have such a long way to go with kernel testing :(
At least I know I'm not missing an obvious trick, thanks for explaining!

Chuck Lever Feb. 3, 2024, 3:46 a.m. UTC | #7

> On Feb 2, 2024, at 3:51 PM, Guenter Roeck <linux@roeck-us.net> wrote:
> 
> On 2/2/24 12:49, Guenter Roeck wrote:
>> On 2/2/24 11:22, Jakub Kicinski wrote:
>>> On Fri, 2 Feb 2024 09:21:22 -0800 Guenter Roeck wrote:
>>>> when running handshake kunit tests in qemu, I always get the following
>>>> failure.
>>> 
>>> Sorry for sidetracking - how do you run kunit to get all the tests?
>>> We run:
>>> 
>>>     ./tools/testing/kunit/kunit.py run --alltests
>>> 
>>> but more and more I feel like the --alltests is a cruel joke.
>> I have CONFIG_NET_HANDSHAKE_KUNIT_TEST=y enabled in my configuration.
>> The tests run during boot, so no additional work is needed. I don't run all
>> tests because many take too long to execute in qemu.
> 
> Follow-up: If this test isn't supposed to run during boot, please
> let me know and I'll drop it.

These are pretty simple tests that should run quickly. I don't see
a reason to exclude them. Generally the memory environment in the
Kunit test harness is significantly different than the one at boot,
so problems like this do crop up for qemu vs. boot time on occasion.

Question is, has this test started to fail only recently, or has it
been broken since it was merged?

--
Chuck Lever

Guenter Roeck Feb. 3, 2024, 3:54 a.m. UTC | #8

On 2/2/24 19:46, Chuck Lever III wrote:
> 
> 
>> On Feb 2, 2024, at 3:51 PM, Guenter Roeck <linux@roeck-us.net> wrote:
>>
>> On 2/2/24 12:49, Guenter Roeck wrote:
>>> On 2/2/24 11:22, Jakub Kicinski wrote:
>>>> On Fri, 2 Feb 2024 09:21:22 -0800 Guenter Roeck wrote:
>>>>> when running handshake kunit tests in qemu, I always get the following
>>>>> failure.
>>>>
>>>> Sorry for sidetracking - how do you run kunit to get all the tests?
>>>> We run:
>>>>
>>>>      ./tools/testing/kunit/kunit.py run --alltests
>>>>
>>>> but more and more I feel like the --alltests is a cruel joke.
>>> I have CONFIG_NET_HANDSHAKE_KUNIT_TEST=y enabled in my configuration.
>>> The tests run during boot, so no additional work is needed. I don't run all
>>> tests because many take too long to execute in qemu.
>>
>> Follow-up: If this test isn't supposed to run during boot, please
>> let me know and I'll drop it.
> 
> These are pretty simple tests that should run quickly. I don't see
> a reason to exclude them. Generally the memory environment in the
> Kunit test harness is significantly different than the one at boot,
> so problems like this do crop up for qemu vs. boot time on occasion.
> 
> Question is, has this test started to fail only recently, or has it
> been broken since it was merged?
> 

I am wuite sure that it has been broken since it was merged. I only recently
started actually checking test results, so I missed the problem for a while.
Also, I am not sure if anyone besides me is actually running those tests ;-)

Guenter

Chuck Lever Feb. 6, 2024, 4:47 p.m. UTC | #9

On Fri, Feb 02, 2024 at 09:21:22AM -0800, Guenter Roeck wrote:
> Hi,
> 
> when running handshake kunit tests in qemu, I always get the following
> failure.
> 
>     KTAP version 1
>     # Subtest: Handshake API tests
>     1..11
>         KTAP version 1
>         # Subtest: req_alloc API fuzzing
>         ok 1 handshake_req_alloc NULL proto
>         ok 2 handshake_req_alloc CLASS_NONE
>         ok 3 handshake_req_alloc CLASS_MAX
>         ok 4 handshake_req_alloc no callbacks
>         ok 5 handshake_req_alloc no done callback
>         ok 6 handshake_req_alloc excessive privsize
>         ok 7 handshake_req_alloc all good
>     # req_alloc API fuzzing: pass:7 fail:0 skip:0 total:7
>     ok 1 req_alloc API fuzzing
>     ok 2 req_submit NULL req arg
>     ok 3 req_submit NULL sock arg
>     ok 4 req_submit NULL sock->file
>     ok 5 req_lookup works
>     ok 6 req_submit max pending
>     ok 7 req_submit multiple
>     ok 8 req_cancel before accept
>     ok 9 req_cancel after accept
>     ok 10 req_cancel after done
>     # req_destroy works: EXPECTATION FAILED at net/handshake/handshake-test.c:478
>     Expected handshake_req_destroy_test == req, but
>         handshake_req_destroy_test == 00000000
>         req == c5080280
>     not ok 11 req_destroy works
> # Handshake API tests: pass:10 fail:1 skip:0 total:11
> # Totals: pass:16 fail:1 skip:0 total:17
> not ok 31 Handshake API tests
> ############## destroy 0xc5080280
> ...
> 
> The line starting with "#######" is from added debug information.
> 
> diff --git a/net/handshake/handshake-test.c b/net/handshake/handshake-test.c
> index 16ed7bfd29e4..a2417c56fe15 100644
> --- a/net/handshake/handshake-test.c
> +++ b/net/handshake/handshake-test.c
> @@ -434,6 +434,7 @@ static struct handshake_req *handshake_req_destroy_test;
> 
>  static void test_destroy_func(struct handshake_req *req)
>  {
> +       pr_info("############## destroy 0x%px\n", req);
>         handshake_req_destroy_test = req;
>  }
> 
> It appears that the destroy function works, but is delayed. Unfortunately,
> I don't know enough about the network subsystem and/or the handshake
> protocol to suggest a fix. I'd be happy to submit a fix if you let me know
> how that should look like.
> 
> Thanks,
> Guenter

I am able to reproduce the test failure at boot:

[  125.404130]     KTAP version 1
[  125.404690]     # Subtest: Handshake API tests
[  125.405540]     1..11
[  125.405966]         KTAP version 1
[  125.406623]         # Subtest: req_alloc API fuzzing
[  125.406971]         ok 1 handshake_req_alloc NULL proto
[  125.408275]         ok 2 handshake_req_alloc CLASS_NONE
[  125.409599]         ok 3 handshake_req_alloc CLASS_MAX
[  125.410879]         ok 4 handshake_req_alloc no callbacks
[  125.412200]         ok 5 handshake_req_alloc no done callback
[  125.413525]         ok 6 handshake_req_alloc excessive privsize
[  125.414896]         ok 7 handshake_req_alloc all good
[  125.416036]     # req_alloc API fuzzing: pass:7 fail:0 skip:0 total:7
[  125.416891]     ok 1 req_alloc API fuzzing
[  125.418439]     ok 2 req_submit NULL req arg
[  125.419399]     ok 3 req_submit NULL sock arg
[  125.420925]     ok 4 req_submit NULL sock->file
[  125.422305]     ok 5 req_lookup works
[  125.423667]     ok 6 req_submit max pending
[  125.425061]     ok 7 req_submit multiple
[  125.426151]     ok 8 req_cancel before accept
[  125.427225]     ok 9 req_cancel after accept
[  125.428318]     ok 10 req_cancel after done
[  125.429424]     # req_destroy works: EXPECTATION FAILED at net/handshake/handshake-test.c:477
[  125.429424]     Expected handshake_req_destroy_test == req, but
[  125.429424]         handshake_req_destroy_test == 0000000000000000
[  125.429424]         req == ffff88802c5e6900
[  125.430479]     not ok 11 req_destroy works
[  125.435215] # Handshake API tests: pass:10 fail:1 skip:0 total:11
[  125.435858] # Totals: pass:16 fail:1 skip:0 total:17
[  125.437224] not ok 69 Handshake API tests

I'll have a look.

Guenter Roeck Feb. 6, 2024, 5:55 p.m. UTC | #10

On 2/6/24 08:47, Chuck Lever wrote:
> On Fri, Feb 02, 2024 at 09:21:22AM -0800, Guenter Roeck wrote:
>> Hi,
>>
>> when running handshake kunit tests in qemu, I always get the following
>> failure.
>>
>>      KTAP version 1
>>      # Subtest: Handshake API tests
>>      1..11
>>          KTAP version 1
>>          # Subtest: req_alloc API fuzzing
>>          ok 1 handshake_req_alloc NULL proto
>>          ok 2 handshake_req_alloc CLASS_NONE
>>          ok 3 handshake_req_alloc CLASS_MAX
>>          ok 4 handshake_req_alloc no callbacks
>>          ok 5 handshake_req_alloc no done callback
>>          ok 6 handshake_req_alloc excessive privsize
>>          ok 7 handshake_req_alloc all good
>>      # req_alloc API fuzzing: pass:7 fail:0 skip:0 total:7
>>      ok 1 req_alloc API fuzzing
>>      ok 2 req_submit NULL req arg
>>      ok 3 req_submit NULL sock arg
>>      ok 4 req_submit NULL sock->file
>>      ok 5 req_lookup works
>>      ok 6 req_submit max pending
>>      ok 7 req_submit multiple
>>      ok 8 req_cancel before accept
>>      ok 9 req_cancel after accept
>>      ok 10 req_cancel after done
>>      # req_destroy works: EXPECTATION FAILED at net/handshake/handshake-test.c:478
>>      Expected handshake_req_destroy_test == req, but
>>          handshake_req_destroy_test == 00000000
>>          req == c5080280
>>      not ok 11 req_destroy works
>> # Handshake API tests: pass:10 fail:1 skip:0 total:11
>> # Totals: pass:16 fail:1 skip:0 total:17
>> not ok 31 Handshake API tests
>> ############## destroy 0xc5080280
>> ...
>>
>> The line starting with "#######" is from added debug information.
>>
>> diff --git a/net/handshake/handshake-test.c b/net/handshake/handshake-test.c
>> index 16ed7bfd29e4..a2417c56fe15 100644
>> --- a/net/handshake/handshake-test.c
>> +++ b/net/handshake/handshake-test.c
>> @@ -434,6 +434,7 @@ static struct handshake_req *handshake_req_destroy_test;
>>
>>   static void test_destroy_func(struct handshake_req *req)
>>   {
>> +       pr_info("############## destroy 0x%px\n", req);
>>          handshake_req_destroy_test = req;
>>   }
>>
>> It appears that the destroy function works, but is delayed. Unfortunately,
>> I don't know enough about the network subsystem and/or the handshake
>> protocol to suggest a fix. I'd be happy to submit a fix if you let me know
>> how that should look like.
>>
>> Thanks,
>> Guenter
> 
> I am able to reproduce the test failure at boot:
> 
> [  125.404130]     KTAP version 1
> [  125.404690]     # Subtest: Handshake API tests
> [  125.405540]     1..11
> [  125.405966]         KTAP version 1
> [  125.406623]         # Subtest: req_alloc API fuzzing
> [  125.406971]         ok 1 handshake_req_alloc NULL proto
> [  125.408275]         ok 2 handshake_req_alloc CLASS_NONE
> [  125.409599]         ok 3 handshake_req_alloc CLASS_MAX
> [  125.410879]         ok 4 handshake_req_alloc no callbacks
> [  125.412200]         ok 5 handshake_req_alloc no done callback
> [  125.413525]         ok 6 handshake_req_alloc excessive privsize
> [  125.414896]         ok 7 handshake_req_alloc all good
> [  125.416036]     # req_alloc API fuzzing: pass:7 fail:0 skip:0 total:7
> [  125.416891]     ok 1 req_alloc API fuzzing
> [  125.418439]     ok 2 req_submit NULL req arg
> [  125.419399]     ok 3 req_submit NULL sock arg
> [  125.420925]     ok 4 req_submit NULL sock->file
> [  125.422305]     ok 5 req_lookup works
> [  125.423667]     ok 6 req_submit max pending
> [  125.425061]     ok 7 req_submit multiple
> [  125.426151]     ok 8 req_cancel before accept
> [  125.427225]     ok 9 req_cancel after accept
> [  125.428318]     ok 10 req_cancel after done
> [  125.429424]     # req_destroy works: EXPECTATION FAILED at net/handshake/handshake-test.c:477
> [  125.429424]     Expected handshake_req_destroy_test == req, but
> [  125.429424]         handshake_req_destroy_test == 0000000000000000
> [  125.429424]         req == ffff88802c5e6900
> [  125.430479]     not ok 11 req_destroy works
> [  125.435215] # Handshake API tests: pass:10 fail:1 skip:0 total:11
> [  125.435858] # Totals: pass:16 fail:1 skip:0 total:17
> [  125.437224] not ok 69 Handshake API tests
> 
> I'll have a look.
> 

Since the destroy function runs asynchronously, the best I could come up with
was to use completion handling. The following patch fixes the problem for me.

Guenter

---
diff --git a/net/handshake/handshake-test.c b/net/handshake/handshake-test.c
index 16ed7bfd29e4..dc119c1e211b 100644
--- a/net/handshake/handshake-test.c
+++ b/net/handshake/handshake-test.c
@@ -432,9 +432,12 @@ static void handshake_req_cancel_test3(struct kunit *test)

  static struct handshake_req *handshake_req_destroy_test;

+static DECLARE_COMPLETION(handshake_request_destroyed);
+
  static void test_destroy_func(struct handshake_req *req)
  {
         handshake_req_destroy_test = req;
+       complete(&handshake_request_destroyed);
  }

  static struct handshake_proto handshake_req_alloc_proto_destroy = {
@@ -473,6 +476,8 @@ static void handshake_req_destroy_test1(struct kunit *test)
         /* Act */
         fput(filp);

+       wait_for_completion_timeout(&handshake_request_destroyed, msecs_to_jiffies(100));
+
         /* Assert */
         KUNIT_EXPECT_PTR_EQ(test, handshake_req_destroy_test, req);
  }

Guenter Roeck Feb. 6, 2024, 7:05 p.m. UTC | #11

On 2/6/24 09:55, Guenter Roeck wrote:
[ ...]

> 
> Since the destroy function runs asynchronously, the best I could come up with
> was to use completion handling. The following patch fixes the problem for me.
> 

I don't know if it is a proper fix, but I no longer see the problem with
the patch below applied on top of v6.8-rc3.

Guenter

> 
> ---
> diff --git a/net/handshake/handshake-test.c b/net/handshake/handshake-test.c
> index 16ed7bfd29e4..dc119c1e211b 100644
> --- a/net/handshake/handshake-test.c
> +++ b/net/handshake/handshake-test.c
> @@ -432,9 +432,12 @@ static void handshake_req_cancel_test3(struct kunit *test)
> 
>   static struct handshake_req *handshake_req_destroy_test;
> 
> +static DECLARE_COMPLETION(handshake_request_destroyed);
> +
>   static void test_destroy_func(struct handshake_req *req)
>   {
>          handshake_req_destroy_test = req;
> +       complete(&handshake_request_destroyed);
>   }
> 
>   static struct handshake_proto handshake_req_alloc_proto_destroy = {
> @@ -473,6 +476,8 @@ static void handshake_req_destroy_test1(struct kunit *test)
>          /* Act */
>          fput(filp);
> 
> +       wait_for_completion_timeout(&handshake_request_destroyed, msecs_to_jiffies(100));
> +
>          /* Assert */
>          KUNIT_EXPECT_PTR_EQ(test, handshake_req_destroy_test, req);
>   }
> 
> 
>

Chuck Lever Feb. 6, 2024, 7:07 p.m. UTC | #12

On Tue, Feb 06, 2024 at 11:05:36AM -0800, Guenter Roeck wrote:
> On 2/6/24 09:55, Guenter Roeck wrote:
> [ ...]
> 
> > 
> > Since the destroy function runs asynchronously, the best I could come up with
> > was to use completion handling. The following patch fixes the problem for me.
> > 
> 
> I don't know if it is a proper fix, but I no longer see the problem with
> the patch below applied on top of v6.8-rc3.

I've got something similar but simpler that I will post with a full
patch description, root cause, and rationale. Stand by.


> Guenter
> 
> > 
> > ---
> > diff --git a/net/handshake/handshake-test.c b/net/handshake/handshake-test.c
> > index 16ed7bfd29e4..dc119c1e211b 100644
> > --- a/net/handshake/handshake-test.c
> > +++ b/net/handshake/handshake-test.c
> > @@ -432,9 +432,12 @@ static void handshake_req_cancel_test3(struct kunit *test)
> > 
> >   static struct handshake_req *handshake_req_destroy_test;
> > 
> > +static DECLARE_COMPLETION(handshake_request_destroyed);
> > +
> >   static void test_destroy_func(struct handshake_req *req)
> >   {
> >          handshake_req_destroy_test = req;
> > +       complete(&handshake_request_destroyed);
> >   }
> > 
> >   static struct handshake_proto handshake_req_alloc_proto_destroy = {
> > @@ -473,6 +476,8 @@ static void handshake_req_destroy_test1(struct kunit *test)
> >          /* Act */
> >          fput(filp);
> > 
> > +       wait_for_completion_timeout(&handshake_request_destroyed, msecs_to_jiffies(100));
> > +
> >          /* Assert */
> >          KUNIT_EXPECT_PTR_EQ(test, handshake_req_destroy_test, req);
> >   }
> > 
> > 
> > 
>

Guenter Roeck Feb. 6, 2024, 7:24 p.m. UTC | #13

On 2/6/24 11:07, Chuck Lever wrote:
> On Tue, Feb 06, 2024 at 11:05:36AM -0800, Guenter Roeck wrote:
>> On 2/6/24 09:55, Guenter Roeck wrote:
>> [ ...]
>>
>>>
>>> Since the destroy function runs asynchronously, the best I could come up with
>>> was to use completion handling. The following patch fixes the problem for me.
>>>
>>
>> I don't know if it is a proper fix, but I no longer see the problem with
>> the patch below applied on top of v6.8-rc3.
> 
> I've got something similar but simpler that I will post with a full
> patch description, root cause, and rationale. Stand by.
> 

Excellent.

Thanks!
Guenter

Persistent problem with handshake unit tests

Checks

Commit Message

Comments

Patch