ceph: avoid putting the realm twice when decoding snaps fails

Message ID 20221107071759.32000-1-xiubli@redhat.com (mailing list archive)
State New, archived
Series ceph: avoid putting the realm twice when decoding snaps fails

Commit Message

Xiubo Li Nov. 7, 2022, 7:17 a.m. UTC
From: Xiubo Li <xiubli@redhat.com>

When decoding the snaps fails, it may leave 'first_realm' and
'realm' pointing to the same snaprealm memory. The error path then
puts that realm twice, which can cause random use-after-free,
BUG_ON, and similar issues.

Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/57686
Signed-off-by: Xiubo Li <xiubli@redhat.com>
---
 fs/ceph/snap.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Luis Henriques Nov. 7, 2022, 10:39 a.m. UTC | #1
On Mon, Nov 07, 2022 at 03:17:59PM +0800, xiubli@redhat.com wrote:
> From: Xiubo Li <xiubli@redhat.com>
> 
> When decoding the snaps fails, it may leave 'first_realm' and
> 'realm' pointing to the same snaprealm memory. The error path then
> puts that realm twice, which can cause random use-after-free,
> BUG_ON, and similar issues.
> 
> Cc: stable@vger.kernel.org
> URL: https://tracker.ceph.com/issues/57686
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>  fs/ceph/snap.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
> index 9bceed2ebda3..baf17df05107 100644
> --- a/fs/ceph/snap.c
> +++ b/fs/ceph/snap.c
> @@ -849,10 +849,12 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
>  	if (realm_to_rebuild && p >= e)
>  		rebuild_snap_realms(realm_to_rebuild, &dirty_realms);
>  
> -	if (!first_realm)
> +	if (!first_realm) {
>  		first_realm = realm;
> -	else
> +		realm = NULL;
> +	} else {
>  		ceph_put_snap_realm(mdsc, realm);
> +	}
>  
>  	if (p < e)
>  		goto more;
> -- 
> 2.31.1
> 

This patch looks correct to me.  But I wonder if there's a deeper problem
there (probably not on the kernel client).  Because the other question is:
why are we failing to decode the snaps?  But I guess this fix is worth it
anyway.

Reviewed-by: Luís Henriques <lhenriques@suse.de>


Cheers,
--
Luís
Xiubo Li Nov. 7, 2022, 10:58 a.m. UTC | #2
On 07/11/2022 18:39, Luís Henriques wrote:
> On Mon, Nov 07, 2022 at 03:17:59PM +0800, xiubli@redhat.com wrote:
>> From: Xiubo Li <xiubli@redhat.com>
>>
>> When decoding the snaps fails, it may leave 'first_realm' and
>> 'realm' pointing to the same snaprealm memory. The error path then
>> puts that realm twice, which can cause random use-after-free,
>> BUG_ON, and similar issues.
>>
>> Cc: stable@vger.kernel.org
>> URL: https://tracker.ceph.com/issues/57686
>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>> ---
>>   fs/ceph/snap.c | 6 ++++--
>>   1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
>> index 9bceed2ebda3..baf17df05107 100644
>> --- a/fs/ceph/snap.c
>> +++ b/fs/ceph/snap.c
>> @@ -849,10 +849,12 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
>>   	if (realm_to_rebuild && p >= e)
>>   		rebuild_snap_realms(realm_to_rebuild, &dirty_realms);
>>   
>> -	if (!first_realm)
>> +	if (!first_realm) {
>>   		first_realm = realm;
>> -	else
>> +		realm = NULL;
>> +	} else {
>>   		ceph_put_snap_realm(mdsc, realm);
>> +	}
>>   
>>   	if (p < e)
>>   		goto more;
>> -- 
>> 2.31.1
>>
> This patch looks correct to me.  But I wonder if there's a deeper problem
> there (probably not on the kernel client).  Because the other question is:
> why are we failing to decode the snaps?  But I guess this fix is worth it
> anyway.

Yeah, good question.

The MDS also crashed [1][2] a few seconds before the kernel crash
was triggered, and the metadata in cephfs was corrupted for some
reason.

[1] https://tracker.ceph.com/issues/56140

[2] https://tracker.ceph.com/issues/54546

Thanks!

- Xiubo

> Reviewed-by: Luís Henriques <lhenriques@suse.de>
>
>
> Cheers,
> --
> Luís
>
Ilya Dryomov Nov. 7, 2022, 3:21 p.m. UTC | #3
On Mon, Nov 7, 2022 at 8:18 AM <xiubli@redhat.com> wrote:
>
> From: Xiubo Li <xiubli@redhat.com>
>
> When decoding the snaps fails, it may leave 'first_realm' and
> 'realm' pointing to the same snaprealm memory. The error path then
> puts that realm twice, which can cause random use-after-free,
> BUG_ON, and similar issues.
>
> Cc: stable@vger.kernel.org
> URL: https://tracker.ceph.com/issues/57686
> Signed-off-by: Xiubo Li <xiubli@redhat.com>
> ---
>  fs/ceph/snap.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
> index 9bceed2ebda3..baf17df05107 100644
> --- a/fs/ceph/snap.c
> +++ b/fs/ceph/snap.c
> @@ -849,10 +849,12 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
>         if (realm_to_rebuild && p >= e)
>                 rebuild_snap_realms(realm_to_rebuild, &dirty_realms);
>
> -       if (!first_realm)
> +       if (!first_realm) {
>                 first_realm = realm;
> -       else
> +               realm = NULL;

Hi Xiubo,

I wonder why realm is cleared only in the !first_realm branch?  Can't
the same issue occur with realm in the else branch?

    first_realm is already set, ceph_put_snap_realm(realm)
    p < e, goto more
    decoding fails, goto bad
    realm is still set and not IS_ERR, ceph_put_snap_realm(realm)
    <realm is put twice>

Thanks,

                Ilya
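
The sequence described in comment #3 can be modelled outside the kernel. The following is only a sketch under stated assumptions: `struct demo_realm`, `demo_put()` and `demo_fail_after_second()` are hypothetical names, the reference count is a plain int rather than the kernel's real counter, and the error path is assumed to put both `realm` and `first_realm`, as the scenario above implies. The `clear_after_put` switch stands for one possible follow-up fix; it is not part of the posted patch.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a snap realm: nothing but a reference count. */
struct demo_realm {
	int nref;
};

/* Hypothetical helper: drop one reference, ignoring NULL. */
void demo_put(struct demo_realm *r)
{
	if (r)
		r->nref--;
}

/*
 * Models the case where first_realm is already set, a second realm
 * is put in the else branch, and the next decode step then fails.
 * Returns the final count of the second realm; a negative value
 * means it was put once too often.
 */
int demo_fail_after_second(int clear_after_put)
{
	struct demo_realm first = { .nref = 1 };
	struct demo_realm second = { .nref = 1 };
	struct demo_realm *first_realm = &first;
	struct demo_realm *realm = &second;

	/* first_realm is already set, so the else branch puts realm */
	demo_put(realm);
	if (clear_after_put)
		realm = NULL;	/* assumed fix: forget the stale pointer */

	/* p < e, goto more; decoding fails before realm is reassigned */
	demo_put(realm);	/* assumed error path: put realm if set */
	demo_put(first_realm);	/* assumed error path: put first_realm */

	assert(first.nref == 0);	/* first_realm is put exactly once */
	return second.nref;
}
```

Calling `demo_fail_after_second(0)` models the patch as posted and leaves the second realm with a negative count, while `demo_fail_after_second(1)` leaves it at zero. In the kernel the extra put would act on memory that may already have been freed.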
Xiubo Li Nov. 8, 2022, 2:24 a.m. UTC | #4
On 07/11/2022 23:21, Ilya Dryomov wrote:
> On Mon, Nov 7, 2022 at 8:18 AM <xiubli@redhat.com> wrote:
>> From: Xiubo Li <xiubli@redhat.com>
>>
>> When decoding the snaps fails, it may leave 'first_realm' and
>> 'realm' pointing to the same snaprealm memory. The error path then
>> puts that realm twice, which can cause random use-after-free,
>> BUG_ON, and similar issues.
>>
>> Cc: stable@vger.kernel.org
>> URL: https://tracker.ceph.com/issues/57686
>> Signed-off-by: Xiubo Li <xiubli@redhat.com>
>> ---
>>   fs/ceph/snap.c | 6 ++++--
>>   1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
>> index 9bceed2ebda3..baf17df05107 100644
>> --- a/fs/ceph/snap.c
>> +++ b/fs/ceph/snap.c
>> @@ -849,10 +849,12 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
>>          if (realm_to_rebuild && p >= e)
>>                  rebuild_snap_realms(realm_to_rebuild, &dirty_realms);
>>
>> -       if (!first_realm)
>> +       if (!first_realm) {
>>                  first_realm = realm;
>> -       else
>> +               realm = NULL;
> Hi Xiubo,
>
> I wonder why realm is cleared only in the !first_realm branch?  Can't
> the same issue occur with realm in the else branch?
>
>      first_realm is already set, ceph_put_snap_realm(realm)
>      p < e, goto more
>      decoding fails, goto bad
>      realm is still set and not IS_ERR, ceph_put_snap_realm(realm)
>      <realm is put twice>

Yeah, makes sense.

I will fix this.

Thanks Ilya!


>
> Thanks,
>
>                  Ilya
>
Patch

diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
index 9bceed2ebda3..baf17df05107 100644
--- a/fs/ceph/snap.c
+++ b/fs/ceph/snap.c
@@ -849,10 +849,12 @@  int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
 	if (realm_to_rebuild && p >= e)
 		rebuild_snap_realms(realm_to_rebuild, &dirty_realms);
 
-	if (!first_realm)
+	if (!first_realm) {
 		first_realm = realm;
-	else
+		realm = NULL;
+	} else {
 		ceph_put_snap_realm(mdsc, realm);
+	}
 
 	if (p < e)
 		goto more;
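
The aliasing that this hunk addresses can be reproduced with a toy model. This is a minimal sketch, not the kernel code: `struct toy_realm`, `toy_put()` and `toy_fail_after_first()` are hypothetical names, the count is a plain int, and the error path is assumed (based on the review comments) to put both `realm` and `first_realm` when they are set.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a snap realm: nothing but a reference count. */
struct toy_realm {
	int nref;
};

/* Hypothetical helper: drop one reference, ignoring NULL. */
void toy_put(struct toy_realm *r)
{
	if (r)
		r->nref--;
}

/*
 * Models one loop pass that records the first realm, followed by a
 * decode failure on the next pass. clear_after_handoff selects the
 * patched behaviour. Returns the final reference count; a negative
 * value means the realm was put once too often.
 */
int toy_fail_after_first(int clear_after_handoff)
{
	struct toy_realm obj = { .nref = 1 };	/* reference held by the loop */
	struct toy_realm *realm = &obj;
	struct toy_realm *first_realm = NULL;

	if (!first_realm) {
		first_realm = realm;	/* the reference moves to first_realm */
		if (clear_after_handoff)
			realm = NULL;	/* patched: realm no longer owns it */
	} else {
		toy_put(realm);
	}

	/* next pass: decoding fails before realm is reassigned */
	toy_put(realm);		/* assumed error path: put realm if set */
	toy_put(first_realm);	/* assumed error path: put first_realm */

	return obj.nref;
}
```

With `toy_fail_after_first(0)` both pointers alias the same object and the single reference is dropped twice; with `toy_fail_after_first(1)` the hand-off leaves only `first_realm` responsible for the put.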