mbox series

[RFC,0/2] ceph: safely use 'copy-from' Op on Octopus OSDs

Message ID 20191108141555.31176-1-lhenriques@suse.com (mailing list archive)
Headers show
Series ceph: safely use 'copy-from' Op on Octopus OSDs | expand

Message

Luis Henriques Nov. 8, 2019, 2:15 p.m. UTC
Hi!

(Sorry for the long cover letter!)

Since the fix for [1] has finally been merged and should be available in
the next (Octopus) ceph release, I'm trying to clean-up my kernel client
patch that tries to find out whether or not it's safe to use the
'copy-from' RADOS operation for copy_file_range.

So, the fix for [1] was to modify the 'copy-from' operation to allow
clients to optionally (using the CEPH_OSD_COPY_FROM_FLAG_TRUNCATE_SEQ
flag) send the extra truncate_seq and truncate_size parameters.  Since
only Octopus will have this fix (no backports planned), the client
simply needs to ensure the OSDs being used have SERVER_OCTOPUS in their
features.

My initial solution was to add an extra test in __submit_request,
looping all the request ops and checking if the connection has the
required features for that operation.  Obviously, at the moment only the
copy-from operation has a restriction but I guess others may be added in
the future.  I believe that doing this at this point (__submit_request)
allows to cover cases where a cluster is being upgraded to Octopus and
we have different OSDs running with different feature bits.

Unfortunately, this solution is racy because the connection state
machine may be changing and the peer_features field isn't yet set.  For
example: if the connection to an OSD is being re-open when we're about
to check the features, the con->state will be CON_STATE_PREOPEN and the
con->peer_features will be 0.  I tried to find ways to move the feature
check further down in the stack, but that can't be easily done without
adding more infrastructure.  A solution that came to my mind was to add
a new con->ops, invoked in the context of ceph_con_workfn, under the
con->mutex.  This callback could then verify the available features,
aborting the operation if needed.

Note that the race in this patchset doesn't seem to be a huge problem,
other than occasionally reverting to a VFS generic copy_file_range, as
-EOPNOTSUPP will be returned here.  But it's still a race, and there are
probably other cases that I'm missing.

Anyway, maybe I'm missing an obvious solution for checking these OSD
features, but I'm open to any suggestions on other options (or some
feedback on the new callback in ceph_connection_operations option).

[1] https://tracker.ceph.com/issues/37378

Cheers,
--
Luis

Luis Henriques (2):
  ceph: add support for sending truncate_{seq,size} in 'copy-from' Op
  ceph: make 'copyfrom' a default mount option again

 fs/ceph/file.c                     |  4 +++-
 fs/ceph/super.c                    |  4 ++--
 fs/ceph/super.h                    |  4 +---
 include/linux/ceph/ceph_features.h |  6 ++++-
 include/linux/ceph/osd_client.h    |  1 +
 include/linux/ceph/rados.h         |  1 +
 net/ceph/osd_client.c              | 37 +++++++++++++++++++++++++++++-
 7 files changed, 49 insertions(+), 8 deletions(-)

Comments

Ilya Dryomov Nov. 8, 2019, 3:15 p.m. UTC | #1
On Fri, Nov 8, 2019 at 3:15 PM Luis Henriques <lhenriques@suse.com> wrote:
>
> Hi!
>
> (Sorry for the long cover letter!)

This is exactly what cover letters are for!

>
> Since the fix for [1] has finally been merged and should be available in
> the next (Octopus) ceph release, I'm trying to clean-up my kernel client
> patch that tries to find out whether or not it's safe to use the
> 'copy-from' RADOS operation for copy_file_range.
>
> So, the fix for [1] was to modify the 'copy-from' operation to allow
> clients to optionally (using the CEPH_OSD_COPY_FROM_FLAG_TRUNCATE_SEQ
> flag) send the extra truncate_seq and truncate_size parameters.  Since
> only Octopus will have this fix (no backports planned), the client
> simply needs to ensure the OSDs being used have SERVER_OCTOPUS in their
> features.
>
> My initial solution was to add an extra test in __submit_request,
> looping all the request ops and checking if the connection has the
> required features for that operation.  Obviously, at the moment only the
> copy-from operation has a restriction but I guess others may be added in
> the future.  I believe that doing this at this point (__submit_request)
> allows to cover cases where a cluster is being upgraded to Octopus and
> we have different OSDs running with different feature bits.
>
> Unfortunately, this solution is racy because the connection state
> machine may be changing and the peer_features field isn't yet set.  For
> example: if the connection to an OSD is being re-open when we're about
> to check the features, the con->state will be CON_STATE_PREOPEN and the
> con->peer_features will be 0.  I tried to find ways to move the feature
> check further down in the stack, but that can't be easily done without
> adding more infrastructure.  A solution that came to my mind was to add
> a new con->ops, invoked in the context of ceph_con_workfn, under the
> con->mutex.  This callback could then verify the available features,
> aborting the operation if needed.
>
> Note that the race in this patchset doesn't seem to be a huge problem,
> other than occasionally reverting to a VFS generic copy_file_range, as
> -EOPNOTSUPP will be returned here.  But it's still a race, and there are
> probably other cases that I'm missing.
>
> Anyway, maybe I'm missing an obvious solution for checking these OSD
> features, but I'm open to any suggestions on other options (or some
> feedback on the new callback in ceph_connection_operations option).
>
> [1] https://tracker.ceph.com/issues/37378

If the OSD checked for unknown flags, like newer syscalls do, it would
be super easy, but it looks like it doesn't.

An obvious solution is to look at require_osd_release in osdmap, but we
don't decode that in the kernel because it lives the OSD portion of the
osdmap.  We could add that and consider the fact that the client now
needs to decode more than just the client portion a design mistake.
I'm not sure what can of worms does that open and if copy-from alone is
worth it though.  Perhaps that field could be moved to (or a copy of it
be replicated in) the client portion of the osdmap starting with
octopus?  We seem to be running into it on the client side more and
more...

Given the track record of this feature (the fix for the most recently
discovered data corrupting bug hasn't even merged yet), I would be very
hesitant to turn it back on by default even if we figure out a good
solution for the feature check.  In my opinion, it should stay opt-in.

Thanks,

                Ilya
Luis Henriques Nov. 8, 2019, 4:47 p.m. UTC | #2
On Fri, Nov 08, 2019 at 04:15:35PM +0100, Ilya Dryomov wrote:
> On Fri, Nov 8, 2019 at 3:15 PM Luis Henriques <lhenriques@suse.com> wrote:
> >
> > Hi!
> >
> > (Sorry for the long cover letter!)
> 
> This is exactly what cover letters are for!
> 
> >
> > Since the fix for [1] has finally been merged and should be available in
> > the next (Octopus) ceph release, I'm trying to clean-up my kernel client
> > patch that tries to find out whether or not it's safe to use the
> > 'copy-from' RADOS operation for copy_file_range.
> >
> > So, the fix for [1] was to modify the 'copy-from' operation to allow
> > clients to optionally (using the CEPH_OSD_COPY_FROM_FLAG_TRUNCATE_SEQ
> > flag) send the extra truncate_seq and truncate_size parameters.  Since
> > only Octopus will have this fix (no backports planned), the client
> > simply needs to ensure the OSDs being used have SERVER_OCTOPUS in their
> > features.
> >
> > My initial solution was to add an extra test in __submit_request,
> > looping all the request ops and checking if the connection has the
> > required features for that operation.  Obviously, at the moment only the
> > copy-from operation has a restriction but I guess others may be added in
> > the future.  I believe that doing this at this point (__submit_request)
> > allows to cover cases where a cluster is being upgraded to Octopus and
> > we have different OSDs running with different feature bits.
> >
> > Unfortunately, this solution is racy because the connection state
> > machine may be changing and the peer_features field isn't yet set.  For
> > example: if the connection to an OSD is being re-open when we're about
> > to check the features, the con->state will be CON_STATE_PREOPEN and the
> > con->peer_features will be 0.  I tried to find ways to move the feature
> > check further down in the stack, but that can't be easily done without
> > adding more infrastructure.  A solution that came to my mind was to add
> > a new con->ops, invoked in the context of ceph_con_workfn, under the
> > con->mutex.  This callback could then verify the available features,
> > aborting the operation if needed.
> >
> > Note that the race in this patchset doesn't seem to be a huge problem,
> > other than occasionally reverting to a VFS generic copy_file_range, as
> > -EOPNOTSUPP will be returned here.  But it's still a race, and there are
> > probably other cases that I'm missing.
> >
> > Anyway, maybe I'm missing an obvious solution for checking these OSD
> > features, but I'm open to any suggestions on other options (or some
> > feedback on the new callback in ceph_connection_operations option).
> >
> > [1] https://tracker.ceph.com/issues/37378
> 
> If the OSD checked for unknown flags, like newer syscalls do, it would
> be super easy, but it looks like it doesn't.
> 
> An obvious solution is to look at require_osd_release in osdmap, but we
> don't decode that in the kernel because it lives the OSD portion of the
> osdmap.  We could add that and consider the fact that the client now
> needs to decode more than just the client portion a design mistake.
> I'm not sure what can of worms does that open and if copy-from alone is
> worth it though.  Perhaps that field could be moved to (or a copy of it
> be replicated in) the client portion of the osdmap starting with
> octopus?  We seem to be running into it on the client side more and
> more...

I can't say I'm thrilled with the idea of going back to hack into the
OSDs code again, I was hoping to be able to handle this with the
information we already have on the connection peer_features field.  It
took me *months* to have the OSD fix merged in so I'm not really
convinced a change to the osdmap would make it into Octopus :-)

(But I'll have a look at this and see if I can understand what moving or
replicating the field in the osdmap would really entail.)

> Given the track record of this feature (the fix for the most recently
> discovered data corrupting bug hasn't even merged yet), I would be very
> hesitant to turn it back on by default even if we figure out a good
> solution for the feature check.  In my opinion, it should stay opt-in.

Ok, makes sense.

And thanks a lot for your feedback, Ilya.

Cheers,
--
Luís
Sage Weil Nov. 8, 2019, 4:57 p.m. UTC | #3
On Fri, 8 Nov 2019, Luis Henriques wrote:
> On Fri, Nov 08, 2019 at 04:15:35PM +0100, Ilya Dryomov wrote:
> > If the OSD checked for unknown flags, like newer syscalls do, it would
> > be super easy, but it looks like it doesn't.
> > 
> > An obvious solution is to look at require_osd_release in osdmap, but we
> > don't decode that in the kernel because it lives the OSD portion of the
> > osdmap.  We could add that and consider the fact that the client now
> > needs to decode more than just the client portion a design mistake.
> > I'm not sure what can of worms does that open and if copy-from alone is
> > worth it though.  Perhaps that field could be moved to (or a copy of it
> > be replicated in) the client portion of the osdmap starting with
> > octopus?  We seem to be running into it on the client side more and
> > more...
> 
> I can't say I'm thrilled with the idea of going back to hack into the
> OSDs code again, I was hoping to be able to handle this with the
> information we already have on the connection peer_features field.  It
> took me *months* to have the OSD fix merged in so I'm not really
> convinced a change to the osdmap would make it into Octopus :-)
> 
> (But I'll have a look at this and see if I can understand what moving or
> replicating the field in the osdmap would really entail.)

Adding a copy of require_osd_release in the client portion of the map is 
an easy thing to do (and probably where it should have gone in the first 
place!).  Let's do that!

sage
Ilya Dryomov Nov. 8, 2019, 4:59 p.m. UTC | #4
On Fri, Nov 8, 2019 at 5:48 PM Luis Henriques <lhenriques@suse.com> wrote:
>
> On Fri, Nov 08, 2019 at 04:15:35PM +0100, Ilya Dryomov wrote:
> > On Fri, Nov 8, 2019 at 3:15 PM Luis Henriques <lhenriques@suse.com> wrote:
> > >
> > > Hi!
> > >
> > > (Sorry for the long cover letter!)
> >
> > This is exactly what cover letters are for!
> >
> > >
> > > Since the fix for [1] has finally been merged and should be available in
> > > the next (Octopus) ceph release, I'm trying to clean-up my kernel client
> > > patch that tries to find out whether or not it's safe to use the
> > > 'copy-from' RADOS operation for copy_file_range.
> > >
> > > So, the fix for [1] was to modify the 'copy-from' operation to allow
> > > clients to optionally (using the CEPH_OSD_COPY_FROM_FLAG_TRUNCATE_SEQ
> > > flag) send the extra truncate_seq and truncate_size parameters.  Since
> > > only Octopus will have this fix (no backports planned), the client
> > > simply needs to ensure the OSDs being used have SERVER_OCTOPUS in their
> > > features.
> > >
> > > My initial solution was to add an extra test in __submit_request,
> > > looping all the request ops and checking if the connection has the
> > > required features for that operation.  Obviously, at the moment only the
> > > copy-from operation has a restriction but I guess others may be added in
> > > the future.  I believe that doing this at this point (__submit_request)
> > > allows to cover cases where a cluster is being upgraded to Octopus and
> > > we have different OSDs running with different feature bits.
> > >
> > > Unfortunately, this solution is racy because the connection state
> > > machine may be changing and the peer_features field isn't yet set.  For
> > > example: if the connection to an OSD is being re-open when we're about
> > > to check the features, the con->state will be CON_STATE_PREOPEN and the
> > > con->peer_features will be 0.  I tried to find ways to move the feature
> > > check further down in the stack, but that can't be easily done without
> > > adding more infrastructure.  A solution that came to my mind was to add
> > > a new con->ops, invoked in the context of ceph_con_workfn, under the
> > > con->mutex.  This callback could then verify the available features,
> > > aborting the operation if needed.
> > >
> > > Note that the race in this patchset doesn't seem to be a huge problem,
> > > other than occasionally reverting to a VFS generic copy_file_range, as
> > > -EOPNOTSUPP will be returned here.  But it's still a race, and there are
> > > probably other cases that I'm missing.
> > >
> > > Anyway, maybe I'm missing an obvious solution for checking these OSD
> > > features, but I'm open to any suggestions on other options (or some
> > > feedback on the new callback in ceph_connection_operations option).
> > >
> > > [1] https://tracker.ceph.com/issues/37378
> >
> > If the OSD checked for unknown flags, like newer syscalls do, it would
> > be super easy, but it looks like it doesn't.
> >
> > An obvious solution is to look at require_osd_release in osdmap, but we
> > don't decode that in the kernel because it lives the OSD portion of the
> > osdmap.  We could add that and consider the fact that the client now
> > needs to decode more than just the client portion a design mistake.
> > I'm not sure what can of worms does that open and if copy-from alone is
> > worth it though.  Perhaps that field could be moved to (or a copy of it
> > be replicated in) the client portion of the osdmap starting with
> > octopus?  We seem to be running into it on the client side more and
> > more...
>
> I can't say I'm thrilled with the idea of going back to hack into the
> OSDs code again, I was hoping to be able to handle this with the
> information we already have on the connection peer_features field.  It
> took me *months* to have the OSD fix merged in so I'm not really
> convinced a change to the osdmap would make it into Octopus :-)
>
> (But I'll have a look at this and see if I can understand what moving or
> replicating the field in the osdmap would really entail.)

Just to be clear: I'm not suggesting that you do it ;)  More of an
observation that something that is buried deep in the OSD portion of
the osdmap is being needed increasingly by the clients.

Thanks,

                Ilya
Luis Henriques Nov. 8, 2019, 5:16 p.m. UTC | #5
On Fri, Nov 08, 2019 at 04:57:27PM +0000, Sage Weil wrote:
> On Fri, 8 Nov 2019, Luis Henriques wrote:
> > On Fri, Nov 08, 2019 at 04:15:35PM +0100, Ilya Dryomov wrote:
> > > If the OSD checked for unknown flags, like newer syscalls do, it would
> > > be super easy, but it looks like it doesn't.
> > > 
> > > An obvious solution is to look at require_osd_release in osdmap, but we
> > > don't decode that in the kernel because it lives the OSD portion of the
> > > osdmap.  We could add that and consider the fact that the client now
> > > needs to decode more than just the client portion a design mistake.
> > > I'm not sure what can of worms does that open and if copy-from alone is
> > > worth it though.  Perhaps that field could be moved to (or a copy of it
> > > be replicated in) the client portion of the osdmap starting with
> > > octopus?  We seem to be running into it on the client side more and
> > > more...
> > 
> > I can't say I'm thrilled with the idea of going back to hack into the
> > OSDs code again, I was hoping to be able to handle this with the
> > information we already have on the connection peer_features field.  It
> > took me *months* to have the OSD fix merged in so I'm not really
> > convinced a change to the osdmap would make it into Octopus :-)
> > 
> > (But I'll have a look at this and see if I can understand what moving or
> > replicating the field in the osdmap would really entail.)
> 
> Adding a copy of require_osd_release in the client portion of the map is 
> an easy thing to do (and probably where it should have gone in the first 
> place!).  Let's do that!

Yeah, after sending my reply to Ilya I took a quick look and it _seems_
to be as easy as adding a new encode(require_osd_release...) in the
OSDMap.  And handle the versions, of course.  Let me spend some time
playing with that and I'll try to come up with something during the next
few days.

Thanks for your feedback.

Cheers,
--
Luís
Sage Weil Nov. 8, 2019, 5:22 p.m. UTC | #6
On Fri, 8 Nov 2019, Luis Henriques wrote:
> On Fri, Nov 08, 2019 at 04:57:27PM +0000, Sage Weil wrote:
> > On Fri, 8 Nov 2019, Luis Henriques wrote:
> > > On Fri, Nov 08, 2019 at 04:15:35PM +0100, Ilya Dryomov wrote:
> > > > If the OSD checked for unknown flags, like newer syscalls do, it would
> > > > be super easy, but it looks like it doesn't.
> > > > 
> > > > An obvious solution is to look at require_osd_release in osdmap, but we
> > > > don't decode that in the kernel because it lives the OSD portion of the
> > > > osdmap.  We could add that and consider the fact that the client now
> > > > needs to decode more than just the client portion a design mistake.
> > > > I'm not sure what can of worms does that open and if copy-from alone is
> > > > worth it though.  Perhaps that field could be moved to (or a copy of it
> > > > be replicated in) the client portion of the osdmap starting with
> > > > octopus?  We seem to be running into it on the client side more and
> > > > more...
> > > 
> > > I can't say I'm thrilled with the idea of going back to hack into the
> > > OSDs code again, I was hoping to be able to handle this with the
> > > information we already have on the connection peer_features field.  It
> > > took me *months* to have the OSD fix merged in so I'm not really
> > > convinced a change to the osdmap would make it into Octopus :-)
> > > 
> > > (But I'll have a look at this and see if I can understand what moving or
> > > replicating the field in the osdmap would really entail.)
> > 
> > Adding a copy of require_osd_release in the client portion of the map is 
> > an easy thing to do (and probably where it should have gone in the first 
> > place!).  Let's do that!
> 
> Yeah, after sending my reply to Ilya I took a quick look and it _seems_
> to be as easy as adding a new encode(require_osd_release...) in the
> OSDMap.  And handle the versions, of course.  Let me spend some time
> playing with that and I'll try to come up with something during the next
> few days.

- You'll need to add it for both OSDMap::Incremental and OSDMap
- You'll need to make the encoding condition by updating the block like 
the one below from OSDMap::encode()

    uint8_t v = 9;
    if (!HAVE_FEATURE(features, SERVER_LUMINOUS)) {
      v = 3;
    } else if (!HAVE_FEATURE(features, SERVER_MIMIC)) {
      v = 6;
    } else if (!HAVE_FEATURE(features, SERVER_NAUTILUS)) {
      v = 7;
    }

to include a SERVER_OCTOPUS case too.  Same goes for Incremental::encode()

sage
Luis Henriques Nov. 8, 2019, 5:31 p.m. UTC | #7
On Fri, Nov 08, 2019 at 05:22:28PM +0000, Sage Weil wrote:
> On Fri, 8 Nov 2019, Luis Henriques wrote:
> > On Fri, Nov 08, 2019 at 04:57:27PM +0000, Sage Weil wrote:
> > > On Fri, 8 Nov 2019, Luis Henriques wrote:
> > > > On Fri, Nov 08, 2019 at 04:15:35PM +0100, Ilya Dryomov wrote:
> > > > > If the OSD checked for unknown flags, like newer syscalls do, it would
> > > > > be super easy, but it looks like it doesn't.
> > > > > 
> > > > > An obvious solution is to look at require_osd_release in osdmap, but we
> > > > > don't decode that in the kernel because it lives the OSD portion of the
> > > > > osdmap.  We could add that and consider the fact that the client now
> > > > > needs to decode more than just the client portion a design mistake.
> > > > > I'm not sure what can of worms does that open and if copy-from alone is
> > > > > worth it though.  Perhaps that field could be moved to (or a copy of it
> > > > > be replicated in) the client portion of the osdmap starting with
> > > > > octopus?  We seem to be running into it on the client side more and
> > > > > more...
> > > > 
> > > > I can't say I'm thrilled with the idea of going back to hack into the
> > > > OSDs code again, I was hoping to be able to handle this with the
> > > > information we already have on the connection peer_features field.  It
> > > > took me *months* to have the OSD fix merged in so I'm not really
> > > > convinced a change to the osdmap would make it into Octopus :-)
> > > > 
> > > > (But I'll have a look at this and see if I can understand what moving or
> > > > replicating the field in the osdmap would really entail.)
> > > 
> > > Adding a copy of require_osd_release in the client portion of the map is 
> > > an easy thing to do (and probably where it should have gone in the first 
> > > place!).  Let's do that!
> > 
> > Yeah, after sending my reply to Ilya I took a quick look and it _seems_
> > to be as easy as adding a new encode(require_osd_release...) in the
> > OSDMap.  And handle the versions, of course.  Let me spend some time
> > playing with that and I'll try to come up with something during the next
> > few days.
> 
> - You'll need to add it for both OSDMap::Incremental and OSDMap
> - You'll need to make the encoding condition by updating the block like 
> the one below from OSDMap::encode()
> 
>     uint8_t v = 9;
>     if (!HAVE_FEATURE(features, SERVER_LUMINOUS)) {
>       v = 3;
>     } else if (!HAVE_FEATURE(features, SERVER_MIMIC)) {
>       v = 6;
>     } else if (!HAVE_FEATURE(features, SERVER_NAUTILUS)) {
>       v = 7;
>     }
> 
> to include a SERVER_OCTOPUS case too.  Same goes for Incremental::encode()

Awesome, thanks!  I'll give it a try, and test it with the appropriate
kernel client side changes to use this.

Cheers,
--
Luís
Luis Henriques Nov. 11, 2019, 4:30 p.m. UTC | #8
On Fri, Nov 08, 2019 at 05:31:01PM +0000, Luis Henriques wrote:
<snip>
> > - You'll need to add it for both OSDMap::Incremental and OSDMap
> > - You'll need to make the encoding condition by updating the block like 
> > the one below from OSDMap::encode()
> > 
> >     uint8_t v = 9;
> >     if (!HAVE_FEATURE(features, SERVER_LUMINOUS)) {
> >       v = 3;
> >     } else if (!HAVE_FEATURE(features, SERVER_MIMIC)) {
> >       v = 6;
> >     } else if (!HAVE_FEATURE(features, SERVER_NAUTILUS)) {
> >       v = 7;
> >     }
> > 
> > to include a SERVER_OCTOPUS case too.  Same goes for Incremental::encode()
> 
> Awesome, thanks!  I'll give it a try, and test it with the appropriate
> kernel client side changes to use this.

Ok, I've got the patch bellow for the OSD code, which IIRC should do
exactly what we want: duplicate the require_osd_release in the client
side.

Now, in order to quickly test this I've started adding flags to the
CEPH_FEATURES_SUPPORTED_DEFAULT definition.  SERVER_MIMIC *seemed* to be
Ok, but once I've added SERVER_NAUTILUS I've realized that we'll need to
handle TYPE_MSGR2 address.  Which is a _big_ thing.  Is anyone already
looking into adding support for msgr v2 to the kernel client?

Cheers,
--
Luís

diff --git a/src/osd/OSDMap.cc b/src/osd/OSDMap.cc
index 6b5930743a33..b38d91d98fcf 100644
--- a/src/osd/OSDMap.cc
+++ b/src/osd/OSDMap.cc
@@ -555,13 +555,15 @@ void OSDMap::Incremental::encode(ceph::buffer::list& bl, uint64_t features) cons
   ENCODE_START(8, 7, bl);
 
   {
-    uint8_t v = 8;
+    uint8_t v = 9;
     if (!HAVE_FEATURE(features, SERVER_LUMINOUS)) {
       v = 3;
     } else if (!HAVE_FEATURE(features, SERVER_MIMIC)) {
       v = 5;
     } else if (!HAVE_FEATURE(features, SERVER_NAUTILUS)) {
       v = 6;
+    } else if (!HAVE_FEATURE(features, SERVER_OCTOPUS)) {
+      v = 8;
     }
     ENCODE_START(v, 1, bl); // client-usable data
     encode(fsid, bl);
@@ -611,6 +613,9 @@ void OSDMap::Incremental::encode(ceph::buffer::list& bl, uint64_t features) cons
       encode(new_last_up_change, bl);
       encode(new_last_in_change, bl);
     }
+    if (v >= 9) {
+      encode(new_require_osd_release, bl);
+    }
     ENCODE_FINISH(bl); // client-usable data
   }
 
@@ -816,7 +821,7 @@ void OSDMap::Incremental::decode(ceph::buffer::list::const_iterator& bl)
     return;
   }
   {
-    DECODE_START(8, bl); // client-usable data
+    DECODE_START(9, bl); // client-usable data
     decode(fsid, bl);
     decode(epoch, bl);
     decode(modified, bl);
@@ -2847,13 +2852,15 @@ void OSDMap::encode(ceph::buffer::list& bl, uint64_t features) const
   {
     // NOTE: any new encoding dependencies must be reflected by
     // SIGNIFICANT_FEATURES
-    uint8_t v = 9;
+    uint8_t v = 10;
     if (!HAVE_FEATURE(features, SERVER_LUMINOUS)) {
       v = 3;
     } else if (!HAVE_FEATURE(features, SERVER_MIMIC)) {
       v = 6;
     } else if (!HAVE_FEATURE(features, SERVER_NAUTILUS)) {
       v = 7;
+    } else if (!HAVE_FEATURE(features, SERVER_OCTOPUS)) {
+      v = 9;
     }
     ENCODE_START(v, 1, bl); // client-usable data
     // base
@@ -2929,6 +2936,9 @@ void OSDMap::encode(ceph::buffer::list& bl, uint64_t features) const
       encode(last_up_change, bl);
       encode(last_in_change, bl);
     }
+    if (v >= 10) {
+      encode(require_osd_release, bl);
+    }
     ENCODE_FINISH(bl); // client-usable data
   }
 
@@ -3170,7 +3180,7 @@ void OSDMap::decode(ceph::buffer::list::const_iterator& bl)
    * Since we made it past that hurdle, we can use our normal paths.
    */
   {
-    DECODE_START(9, bl); // client-usable data
+    DECODE_START(10, bl); // client-usable data
     // base
     decode(fsid, bl);
     decode(epoch, bl);
Ilya Dryomov Nov. 11, 2019, 8:51 p.m. UTC | #9
On Mon, Nov 11, 2019 at 5:30 PM Luis Henriques <lhenriques@suse.com> wrote:
>
> On Fri, Nov 08, 2019 at 05:31:01PM +0000, Luis Henriques wrote:
> <snip>
> > > - You'll need to add it for both OSDMap::Incremental and OSDMap
> > > - You'll need to make the encoding condition by updating the block like
> > > the one below from OSDMap::encode()
> > >
> > >     uint8_t v = 9;
> > >     if (!HAVE_FEATURE(features, SERVER_LUMINOUS)) {
> > >       v = 3;
> > >     } else if (!HAVE_FEATURE(features, SERVER_MIMIC)) {
> > >       v = 6;
> > >     } else if (!HAVE_FEATURE(features, SERVER_NAUTILUS)) {
> > >       v = 7;
> > >     }
> > >
> > > to include a SERVER_OCTOPUS case too.  Same goes for Incremental::encode()
> >
> > Awesome, thanks!  I'll give it a try, and test it with the appropriate
> > kernel client side changes to use this.
>
> Ok, I've got the patch bellow for the OSD code, which IIRC should do
> exactly what we want: duplicate the require_osd_release in the client
> side.
>
> Now, in order to quickly test this I've started adding flags to the
> CEPH_FEATURES_SUPPORTED_DEFAULT definition.  SERVER_MIMIC *seemed* to be
> Ok, but once I've added SERVER_NAUTILUS I've realized that we'll need to
> handle TYPE_MSGR2 address.  Which is a _big_ thing.  Is anyone already
> looking into adding support for msgr v2 to the kernel client?

It should be easy enough to hack around it for testing purposes.

I made some initial steps and hope to be able to dedicate the 5.6 cycle
to it.

Thanks,

                Ilya
Luis Henriques Nov. 12, 2019, 10:42 a.m. UTC | #10
On Mon, Nov 11, 2019 at 09:51:47PM +0100, Ilya Dryomov wrote:
> On Mon, Nov 11, 2019 at 5:30 PM Luis Henriques <lhenriques@suse.com> wrote:
> >
> > On Fri, Nov 08, 2019 at 05:31:01PM +0000, Luis Henriques wrote:
> > <snip>
> > > > - You'll need to add it for both OSDMap::Incremental and OSDMap
> > > > - You'll need to make the encoding condition by updating the block like
> > > > the one below from OSDMap::encode()
> > > >
> > > >     uint8_t v = 9;
> > > >     if (!HAVE_FEATURE(features, SERVER_LUMINOUS)) {
> > > >       v = 3;
> > > >     } else if (!HAVE_FEATURE(features, SERVER_MIMIC)) {
> > > >       v = 6;
> > > >     } else if (!HAVE_FEATURE(features, SERVER_NAUTILUS)) {
> > > >       v = 7;
> > > >     }
> > > >
> > > > to include a SERVER_OCTOPUS case too.  Same goes for Incremental::encode()
> > >
> > > Awesome, thanks!  I'll give it a try, and test it with the appropriate
> > > kernel client side changes to use this.
> >
> > Ok, I've got the patch bellow for the OSD code, which IIRC should do
> > exactly what we want: duplicate the require_osd_release in the client
> > side.
> >
> > Now, in order to quickly test this I've started adding flags to the
> > CEPH_FEATURES_SUPPORTED_DEFAULT definition.  SERVER_MIMIC *seemed* to be
> > Ok, but once I've added SERVER_NAUTILUS I've realized that we'll need to
> > handle TYPE_MSGR2 address.  Which is a _big_ thing.  Is anyone already
> > looking into adding support for msgr v2 to the kernel client?
> 
> It should be easy enough to hack around it for testing purposes.
>
> I made some initial steps and hope to be able to dedicate the 5.6 cycle
> to it.

Yeah, I'll give that a try; adding support for that new address type
shouldn't be a big deal.  I was just wondering if that wasn't already
being handling by any new msgrv2 code under development.  Thanks, Ilya.

Cheers,
--
Luís