[v2] libceph: init the cursor when preparing the sparse read

Message ID 20240306010544.182527-1-xiubli@redhat.com (mailing list archive)
State New, archived

Commit Message

Xiubo Li March 6, 2024, 1:05 a.m. UTC
From: Xiubo Li <xiubli@redhat.com>

The OSD code has removed the cursor initialization code, which sends
the sparse-read state machine into an infinite loop. We should
initialize the cursor just before each sparse read in messenger v2.

Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/64607
Fixes: 8e46a2d068c9 ("libceph: just wait for more data to be available on the socket")
Reported-by: Luis Henriques <lhenriques@suse.de>
Signed-off-by: Xiubo Li <xiubli@redhat.com>
---

V2:
- Just removed the unnecessary 'sparse_read_total' check.


 net/ceph/messenger_v2.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Luis Henriques March 6, 2024, 11:24 a.m. UTC | #1
xiubli@redhat.com writes:

> From: Xiubo Li <xiubli@redhat.com>
>
> The OSD code has removed the cursor initialization code, which sends
> the sparse-read state machine into an infinite loop. We should
> initialize the cursor just before each sparse read in messenger v2.
>
> Cc: stable@vger.kernel.org
> URL: https://tracker.ceph.com/issues/64607
> Fixes: 8e46a2d068c9 ("libceph: just wait for more data to be available on the socket")
> Reported-by: Luis Henriques <lhenriques@suse.de>
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>
> V2:
> - Just removed the unnecessary 'sparse_read_total' check.
>

Thanks a lot for the quick fix, Xiubo.  FWIW:

Tested-by: Luis Henriques <lhenriques@suse.de>

Note that I still see this test failing occasionally, but I haven't had
time to help debugging it.  And that's a different issue, of course.  TBH
I don't remember if this test ever used to reliably pass.  Here's the
output diff shown by fstests in case you're not able to reproduce it:

@@ -65,7 +65,7 @@
 # Getting encryption key status
 Present (user_count=1, added_by_self)
 # Removing encryption key
-Removed encryption key with identifier 69b2f6edeee720cce0577937eb8a6751
+Removed encryption key with identifier 69b2f6edeee720cce0577937eb8a6751, but files still busy
 # Getting encryption key status
 Absent
 # Verifying that the encrypted directory was "locked"

Cheers,

Ilya Dryomov March 6, 2024, 2:13 p.m. UTC | #2
On Wed, Mar 6, 2024 at 12:24 PM Luis Henriques <lhenriques@suse.de> wrote:
>
> xiubli@redhat.com writes:
>
> > From: Xiubo Li <xiubli@redhat.com>
> >
> > The OSD code has removed the cursor initialization code, which sends
> > the sparse-read state machine into an infinite loop. We should
> > initialize the cursor just before each sparse read in messenger v2.
> >
> > Cc: stable@vger.kernel.org
> > URL: https://tracker.ceph.com/issues/64607
> > Fixes: 8e46a2d068c9 ("libceph: just wait for more data to be available on the socket")
> > Reported-by: Luis Henriques <lhenriques@suse.de>
> > Signed-off-by: Xiubo Li <xiubli@redhat.com>
> > ---
> >
> > V2:
> > - Just removed the unnecessary 'sparse_read_total' check.
> >
>
> Thanks a lot for the quick fix, Xiubo.  FWIW:
>
> Tested-by: Luis Henriques <lhenriques@suse.de>

Thank you for catching this, Luis!  I'm still lacking clarity on how
this got missed, but hopefully the fs suite will improve with regard to
fscrypt + ms_type coverage.

I have staged the fix with a minor tweak to use msg local variable
instead of con->in_msg and reworded changelog:

https://github.com/ceph/ceph-client/commit/321e3c3de53c7530cd518219d01f04e7e32a9d23

                Ilya

Xiubo Li March 7, 2024, 1:19 a.m. UTC | #3
On 3/6/24 19:24, Luis Henriques wrote:
> xiubli@redhat.com writes:
>
>> From: Xiubo Li <xiubli@redhat.com>
>>
>> The OSD code has removed the cursor initialization code, which sends
>> the sparse-read state machine into an infinite loop. We should
>> initialize the cursor just before each sparse read in messenger v2.
>>
>> Cc: stable@vger.kernel.org
>> URL: https://tracker.ceph.com/issues/64607
>> Fixes: 8e46a2d068c9 ("libceph: just wait for more data to be available on the socket")
>> Reported-by: Luis Henriques <lhenriques@suse.de>
>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>> ---
>>
>> V2:
>> - Just removed the unnecessary 'sparse_read_total' check.
>>
> Thanks a lot for the quick fix, Xiubo.  FWIW:
>
> Tested-by: Luis Henriques <lhenriques@suse.de>
>
> Note that I still see this test failing occasionally, but I haven't had
> time to help debugging it.  And that's a different issue, of course.  TBH
> I don't remember if this test ever used to reliably pass.  Here's the
> output diff shown by fstests in case you're not able to reproduce it:
>
> @@ -65,7 +65,7 @@
>   # Getting encryption key status
>   Present (user_count=1, added_by_self)
>   # Removing encryption key
> -Removed encryption key with identifier 69b2f6edeee720cce0577937eb8a6751
> +Removed encryption key with identifier 69b2f6edeee720cce0577937eb8a6751, but files still busy
>   # Getting encryption key status
>   Absent
>   # Verifying that the encrypted directory was "locked"

Thanks Luis.

This is a different issue. As far as I remember, I have seen it before with msgr1.

Thanks

- Xiubo

> Cheers,
Patch

diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c
index a0ca5414b333..ab3ab130a911 100644
--- a/net/ceph/messenger_v2.c
+++ b/net/ceph/messenger_v2.c
@@ -2034,6 +2034,9 @@  static int prepare_sparse_read_data(struct ceph_connection *con)
 	if (!con_secure(con))
 		con->in_data_crc = -1;
 
+	ceph_msg_data_cursor_init(&con->v2.in_cursor, con->in_msg,
+				  con->in_msg->sparse_read_total);
+
 	reset_in_kvecs(con);
 	con->v2.in_state = IN_S_PREPARE_SPARSE_DATA_CONT;
 	con->v2.data_len_remain = data_len(msg);
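
The reason the cursor has to be re-initialized before every sparse read can be illustrated with a toy model. This is only a sketch under stated assumptions: struct toy_cursor and the toy_* helpers are hypothetical names invented for illustration and are not the libceph cursor API, and the real state machine in net/ceph/messenger_v2.c is considerably more involved. Where the kernel code could keep retrying, the toy reader returns -1 as soon as it makes no progress.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a message data cursor. */
struct toy_cursor {
	size_t total_resid;	/* bytes the cursor can still hand out */
};

/* Point the cursor at a fresh payload of len bytes. */
static void toy_cursor_init(struct toy_cursor *c, size_t len)
{
	assert(c != NULL);
	c->total_resid = len;
}

/* Consume up to chunk bytes; returns the number actually consumed. */
static size_t toy_cursor_advance(struct toy_cursor *c, size_t chunk)
{
	size_t n = chunk < c->total_resid ? chunk : c->total_resid;

	c->total_resid -= n;
	return n;
}

/*
 * Read want bytes in steps of at most chunk bytes (chunk must be > 0).
 * Returns the number of steps taken, or -1 if the cursor is exhausted
 * before want bytes were read. A reader without this check would spin
 * forever on a stale cursor.
 */
static int toy_sparse_read(struct toy_cursor *c, size_t want, size_t chunk)
{
	int steps = 0;

	assert(chunk > 0);
	while (want > 0) {
		size_t n = toy_cursor_advance(c, want < chunk ? want : chunk);

		if (n == 0)
			return -1;	/* no progress: cursor was not re-initialized */
		want -= n;
		steps++;
	}
	return steps;
}
```

In this model, calling toy_cursor_init(&c, 8) followed by toy_sparse_read(&c, 8, 4) completes in two steps. A second toy_sparse_read() on the same cursor without another toy_cursor_init() returns -1, because the cursor still describes the already-consumed payload. Re-initializing it first makes the second read succeed, which is what the added ceph_msg_data_cursor_init() call is meant to guarantee for each sparse read.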