diff mbox

[4/4] nbd: add a nbd-control interface

Message ID 1484949412-6903-4-git-send-email-jbacik@fb.com (mailing list archive)
State New, archived
Headers show

Commit Message

Josef Bacik Jan. 20, 2017, 9:56 p.m. UTC
This patch mirrors the loop back device behavior with a few changes.  First
there is no DEL operation as NBD doesn't get as much churn as loop devices do.
Secondly the GET_NEXT operation can optionally create a new NBD device or not.
Our infrastructure people want to not allow NBD to create new devices as it
causes problems for them in containers.  However allow this to be optional as
things like the OSS NBD client probably doesn't care and would like to just be
given a device to use.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 drivers/block/nbd.c      | 129 ++++++++++++++++++++++++++++++++++++++++++-----
 include/uapi/linux/nbd.h |  10 ++++
 2 files changed, 125 insertions(+), 14 deletions(-)

Comments

Greg KH Jan. 21, 2017, 9:05 a.m. UTC | #1
On Fri, Jan 20, 2017 at 04:56:52PM -0500, Josef Bacik wrote:
> This patch mirrors the loop back device behavior with a few changes.  First
> there is no DEL operation as NBD doesn't get as much churn as loop devices do.
> Secondly the GET_NEXT operation can optionally create a new NBD device or not.
> Our infrastructure people want to not allow NBD to create new devices as it
> causes problems for them in containers.  However allow this to be optional as
> things like the OSS NBD client probably doesn't care and would like to just be
> given a device to use.
> 
> Signed-off-by: Josef Bacik <jbacik@fb.com>

A random char device with odd ioctls?  Why?  There's no other
configuration choice you could possibly use?  Where is the userspace
tool that uses this new kernel api?

You aren't passing in structures to the ioctl, so why does this HAVE to
be an ioctl?

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Wouter Verhelst Jan. 21, 2017, 12:11 p.m. UTC | #2
On Fri, Jan 20, 2017 at 04:56:52PM -0500, Josef Bacik wrote:
> This patch mirrors the loop back device behavior with a few changes.  First
> there is no DEL operation as NBD doesn't get as much churn as loop devices do.
> Secondly the GET_NEXT operation can optionally create a new NBD device or not.
> Our infrastructure people want to not allow NBD to create new devices as it
> causes problems for them in containers.  However allow this to be optional as
> things like the OSS NBD client probably doesn't care and would like to just be
> given a device to use.

Don't be so sure :-)

I agree that having a control device for NBD is useful and would make
certain things much easier. If that's added, then I'll move to using
that as a way to control the device rather than opening a device and
dealing with it that way. In fact, at some point in the past I did
suggest something along those ways myself; it's just not happened yet.

Obviously though this would require an intermediate situation in which
the control master would be available as well as (optionally perhaps)
the old way where you open a specific device node, so that we don't
break existing implementations before they've had a chance to follow
suit.
Josef Bacik Jan. 21, 2017, 1:44 p.m. UTC | #3
> On Jan 21, 2017, at 7:12 AM, Wouter Verhelst <w@uter.be> wrote:
> 
>> On Fri, Jan 20, 2017 at 04:56:52PM -0500, Josef Bacik wrote:
>> This patch mirrors the loop back device behavior with a few changes.  First
>> there is no DEL operation as NBD doesn't get as much churn as loop devices do.
>> Secondly the GET_NEXT operation can optionally create a new NBD device or not.
>> Our infrastructure people want to not allow NBD to create new devices as it
>> causes problems for them in containers.  However allow this to be optional as
>> things like the OSS NBD client probably doesn't care and would like to just be
>> given a device to use.
> 
> Don't be so sure :-)
> 
> I agree that having a control device for NBD is useful and would make
> certain things much easier. If that's added, then I'll move to using
> that as a way to control the device rather than opening a device and
> dealing with it that way. In fact, at some point in the past I did
> suggest something along those ways myself; it's just not happened yet.
> 
> Obviously though this would require an intermediate situation in which
> the control master would be available as well as (optionally perhaps)
> the old way where you open a specific device node, so that we don't
> break existing implementations before they've had a chance to follow
> suit.

Sorry I wasn't super clear.  This doesn't change anything about how the devices are setup, it just means if you do max_nbds=0 you can dynamically add more as you need them, and you can find unused nbd devices easily instead of making the user specify them.  When I get home tonight I'll push my nbd-client patch so you can see how I use it.  Thanks,

Josef--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Josef Bacik Jan. 23, 2017, 2:42 p.m. UTC | #4
On Sat, Jan 21, 2017 at 4:05 AM, Greg KH <gregkh@linuxfoundation.org> 
wrote:
> On Fri, Jan 20, 2017 at 04:56:52PM -0500, Josef Bacik wrote:
>>  This patch mirrors the loop back device behavior with a few 
>> changes.  First
>>  there is no DEL operation as NBD doesn't get as much churn as loop 
>> devices do.
>>  Secondly the GET_NEXT operation can optionally create a new NBD 
>> device or not.
>>  Our infrastructure people want to not allow NBD to create new 
>> devices as it
>>  causes problems for them in containers.  However allow this to be 
>> optional as
>>  things like the OSS NBD client probably doesn't care and would like 
>> to just be
>>  given a device to use.
>> 
>>  Signed-off-by: Josef Bacik <jbacik@fb.com>
> 
> A random char device with odd ioctls?  Why?  There's no other
> configuration choice you could possibly use?  Where is the userspace
> tool that uses this new kernel api?
> 
> You aren't passing in structures to the ioctl, so why does this HAVE 
> to
> be an ioctl?

Again, this is how loop does it so I assumed a known, regularly used 
API was the best bet.  I can do literally anything, but these 
interfaces have to be used by other people, including internal people.  
The /dev/whatever-control is a well established way for interacting 
with dynamic device drivers (loop, DM, btrfs), so that's what I went 
with.  Thanks,

Josef

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Greg KH Jan. 23, 2017, 2:52 p.m. UTC | #5
On Mon, Jan 23, 2017 at 09:42:08AM -0500, Josef Bacik wrote:
> 
> 
> On Sat, Jan 21, 2017 at 4:05 AM, Greg KH <gregkh@linuxfoundation.org> wrote:
> > On Fri, Jan 20, 2017 at 04:56:52PM -0500, Josef Bacik wrote:
> > >  This patch mirrors the loop back device behavior with a few
> > > changes.  First
> > >  there is no DEL operation as NBD doesn't get as much churn as loop
> > > devices do.
> > >  Secondly the GET_NEXT operation can optionally create a new NBD
> > > device or not.
> > >  Our infrastructure people want to not allow NBD to create new
> > > devices as it
> > >  causes problems for them in containers.  However allow this to be
> > > optional as
> > >  things like the OSS NBD client probably doesn't care and would like
> > > to just be
> > >  given a device to use.
> > > 
> > >  Signed-off-by: Josef Bacik <jbacik@fb.com>
> > 
> > A random char device with odd ioctls?  Why?  There's no other
> > configuration choice you could possibly use?  Where is the userspace
> > tool that uses this new kernel api?
> > 
> > You aren't passing in structures to the ioctl, so why does this HAVE to
> > be an ioctl?
> 
> Again, this is how loop does it so I assumed a known, regularly used API was
> the best bet.  I can do literally anything, but these interfaces have to be
> used by other people, including internal people.  The /dev/whatever-control
> is a well established way for interacting with dynamic device drivers (loop,
> DM, btrfs), so that's what I went with.  Thanks,

Again, please don't duplicate what loop did, we must _learn_ from
history, not repeat it :(
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Josef Bacik Jan. 23, 2017, 2:57 p.m. UTC | #6
On Mon, Jan 23, 2017 at 9:52 AM, Greg KH <gregkh@linuxfoundation.org> 
wrote:
> On Mon, Jan 23, 2017 at 09:42:08AM -0500, Josef Bacik wrote:
>> 
>> 
>>  On Sat, Jan 21, 2017 at 4:05 AM, Greg KH 
>> <gregkh@linuxfoundation.org> wrote:
>>  > On Fri, Jan 20, 2017 at 04:56:52PM -0500, Josef Bacik wrote:
>>  > >  This patch mirrors the loop back device behavior with a few
>>  > > changes.  First
>>  > >  there is no DEL operation as NBD doesn't get as much churn as 
>> loop
>>  > > devices do.
>>  > >  Secondly the GET_NEXT operation can optionally create a new NBD
>>  > > device or not.
>>  > >  Our infrastructure people want to not allow NBD to create new
>>  > > devices as it
>>  > >  causes problems for them in containers.  However allow this to 
>> be
>>  > > optional as
>>  > >  things like the OSS NBD client probably doesn't care and would 
>> like
>>  > > to just be
>>  > >  given a device to use.
>>  > >
>>  > >  Signed-off-by: Josef Bacik <jbacik@fb.com>
>>  >
>>  > A random char device with odd ioctls?  Why?  There's no other
>>  > configuration choice you could possibly use?  Where is the 
>> userspace
>>  > tool that uses this new kernel api?
>>  >
>>  > You aren't passing in structures to the ioctl, so why does this 
>> HAVE to
>>  > be an ioctl?
>> 
>>  Again, this is how loop does it so I assumed a known, regularly 
>> used API was
>>  the best bet.  I can do literally anything, but these interfaces 
>> have to be
>>  used by other people, including internal people.  The 
>> /dev/whatever-control
>>  is a well established way for interacting with dynamic device 
>> drivers (loop,
>>  DM, btrfs), so that's what I went with.  Thanks,
> 
> Again, please don't duplicate what loop did, we must _learn_ from
> history, not repeat it :(

Sure but what am I supposed to do?  Have some random sysfs knobs?  
Thanks,

Josef

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Greg KH Jan. 23, 2017, 3:03 p.m. UTC | #7
On Mon, Jan 23, 2017 at 09:57:52AM -0500, Josef Bacik wrote:
> On Mon, Jan 23, 2017 at 9:52 AM, Greg KH <gregkh@linuxfoundation.org> wrote:
> > On Mon, Jan 23, 2017 at 09:42:08AM -0500, Josef Bacik wrote:
> > > 
> > > 
> > >  On Sat, Jan 21, 2017 at 4:05 AM, Greg KH
> > > <gregkh@linuxfoundation.org> wrote:
> > >  > On Fri, Jan 20, 2017 at 04:56:52PM -0500, Josef Bacik wrote:
> > >  > >  This patch mirrors the loop back device behavior with a few
> > >  > > changes.  First
> > >  > >  there is no DEL operation as NBD doesn't get as much churn as
> > > loop
> > >  > > devices do.
> > >  > >  Secondly the GET_NEXT operation can optionally create a new NBD
> > >  > > device or not.
> > >  > >  Our infrastructure people want to not allow NBD to create new
> > >  > > devices as it
> > >  > >  causes problems for them in containers.  However allow this to
> > > be
> > >  > > optional as
> > >  > >  things like the OSS NBD client probably doesn't care and would
> > > like
> > >  > > to just be
> > >  > >  given a device to use.
> > >  > >
> > >  > >  Signed-off-by: Josef Bacik <jbacik@fb.com>
> > >  >
> > >  > A random char device with odd ioctls?  Why?  There's no other
> > >  > configuration choice you could possibly use?  Where is the
> > > userspace
> > >  > tool that uses this new kernel api?
> > >  >
> > >  > You aren't passing in structures to the ioctl, so why does this
> > > HAVE to
> > >  > be an ioctl?
> > > 
> > >  Again, this is how loop does it so I assumed a known, regularly
> > > used API was
> > >  the best bet.  I can do literally anything, but these interfaces
> > > have to be
> > >  used by other people, including internal people.  The
> > > /dev/whatever-control
> > >  is a well established way for interacting with dynamic device
> > > drivers (loop,
> > >  DM, btrfs), so that's what I went with.  Thanks,
> > 
> > Again, please don't duplicate what loop did, we must _learn_ from
> > history, not repeat it :(
> 
> Sure but what am I supposed to do?  Have some random sysfs knobs?  Thanks,

It all depends on what you are trying to do.  I have yet to figure that
out at all here :(
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Josef Bacik Jan. 23, 2017, 3:52 p.m. UTC | #8
On Mon, Jan 23, 2017 at 10:03 AM, Greg KH <gregkh@linuxfoundation.org> 
wrote:
> On Mon, Jan 23, 2017 at 09:57:52AM -0500, Josef Bacik wrote:
>>  On Mon, Jan 23, 2017 at 9:52 AM, Greg KH 
>> <gregkh@linuxfoundation.org> wrote:
>>  > On Mon, Jan 23, 2017 at 09:42:08AM -0500, Josef Bacik wrote:
>>  > >
>>  > >
>>  > >  On Sat, Jan 21, 2017 at 4:05 AM, Greg KH
>>  > > <gregkh@linuxfoundation.org> wrote:
>>  > >  > On Fri, Jan 20, 2017 at 04:56:52PM -0500, Josef Bacik wrote:
>>  > >  > >  This patch mirrors the loop back device behavior with a 
>> few
>>  > >  > > changes.  First
>>  > >  > >  there is no DEL operation as NBD doesn't get as much 
>> churn as
>>  > > loop
>>  > >  > > devices do.
>>  > >  > >  Secondly the GET_NEXT operation can optionally create a 
>> new NBD
>>  > >  > > device or not.
>>  > >  > >  Our infrastructure people want to not allow NBD to create 
>> new
>>  > >  > > devices as it
>>  > >  > >  causes problems for them in containers.  However allow 
>> this to
>>  > > be
>>  > >  > > optional as
>>  > >  > >  things like the OSS NBD client probably doesn't care and 
>> would
>>  > > like
>>  > >  > > to just be
>>  > >  > >  given a device to use.
>>  > >  > >
>>  > >  > >  Signed-off-by: Josef Bacik <jbacik@fb.com>
>>  > >  >
>>  > >  > A random char device with odd ioctls?  Why?  There's no other
>>  > >  > configuration choice you could possibly use?  Where is the
>>  > > userspace
>>  > >  > tool that uses this new kernel api?
>>  > >  >
>>  > >  > You aren't passing in structures to the ioctl, so why does 
>> this
>>  > > HAVE to
>>  > >  > be an ioctl?
>>  > >
>>  > >  Again, this is how loop does it so I assumed a known, regularly
>>  > > used API was
>>  > >  the best bet.  I can do literally anything, but these 
>> interfaces
>>  > > have to be
>>  > >  used by other people, including internal people.  The
>>  > > /dev/whatever-control
>>  > >  is a well established way for interacting with dynamic device
>>  > > drivers (loop,
>>  > >  DM, btrfs), so that's what I went with.  Thanks,
>>  >
>>  > Again, please don't duplicate what loop did, we must _learn_ from
>>  > history, not repeat it :(
>> 
>>  Sure but what am I supposed to do?  Have some random sysfs knobs?  
>> Thanks,
> 
> It all depends on what you are trying to do.  I have yet to figure 
> that
> out at all here :(

I explained it in the changelog and my response to Wouter.  NBD 
preallocates all of its /dev/nbd# devices at modprobe time, so there's 
no way to add new devices as we need them.  Loop accomplishes this with 
the /dev/loop-control and an ioctl.  Then we also need a way to figure 
out what is the first /dev/nbd# device that isn't currently in use in 
order to pick the next one to configure.  Keep in mind that all of the 
configuration for /dev/nbd# devices is done through ioctls to those 
devices, so having a ioctl interface for the control device is 
consistent with the rest of how NBD works.  Thanks,

Josef

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Greg KH Jan. 24, 2017, 7:11 a.m. UTC | #9
On Mon, Jan 23, 2017 at 10:52:42AM -0500, Josef Bacik wrote:
> On Mon, Jan 23, 2017 at 10:03 AM, Greg KH <gregkh@linuxfoundation.org>
> wrote:
> > On Mon, Jan 23, 2017 at 09:57:52AM -0500, Josef Bacik wrote:
> > >  On Mon, Jan 23, 2017 at 9:52 AM, Greg KH
> > > <gregkh@linuxfoundation.org> wrote:
> > >  > On Mon, Jan 23, 2017 at 09:42:08AM -0500, Josef Bacik wrote:
> > >  > >
> > >  > >
> > >  > >  On Sat, Jan 21, 2017 at 4:05 AM, Greg KH
> > >  > > <gregkh@linuxfoundation.org> wrote:
> > >  > >  > On Fri, Jan 20, 2017 at 04:56:52PM -0500, Josef Bacik wrote:
> > >  > >  > >  This patch mirrors the loop back device behavior with a
> > > few
> > >  > >  > > changes.  First
> > >  > >  > >  there is no DEL operation as NBD doesn't get as much
> > > churn as
> > >  > > loop
> > >  > >  > > devices do.
> > >  > >  > >  Secondly the GET_NEXT operation can optionally create a
> > > new NBD
> > >  > >  > > device or not.
> > >  > >  > >  Our infrastructure people want to not allow NBD to create
> > > new
> > >  > >  > > devices as it
> > >  > >  > >  causes problems for them in containers.  However allow
> > > this to
> > >  > > be
> > >  > >  > > optional as
> > >  > >  > >  things like the OSS NBD client probably doesn't care and
> > > would
> > >  > > like
> > >  > >  > > to just be
> > >  > >  > >  given a device to use.
> > >  > >  > >
> > >  > >  > >  Signed-off-by: Josef Bacik <jbacik@fb.com>
> > >  > >  >
> > >  > >  > A random char device with odd ioctls?  Why?  There's no other
> > >  > >  > configuration choice you could possibly use?  Where is the
> > >  > > userspace
> > >  > >  > tool that uses this new kernel api?
> > >  > >  >
> > >  > >  > You aren't passing in structures to the ioctl, so why does
> > > this
> > >  > > HAVE to
> > >  > >  > be an ioctl?
> > >  > >
> > >  > >  Again, this is how loop does it so I assumed a known, regularly
> > >  > > used API was
> > >  > >  the best bet.  I can do literally anything, but these
> > > interfaces
> > >  > > have to be
> > >  > >  used by other people, including internal people.  The
> > >  > > /dev/whatever-control
> > >  > >  is a well established way for interacting with dynamic device
> > >  > > drivers (loop,
> > >  > >  DM, btrfs), so that's what I went with.  Thanks,
> > >  >
> > >  > Again, please don't duplicate what loop did, we must _learn_ from
> > >  > history, not repeat it :(
> > > 
> > >  Sure but what am I supposed to do?  Have some random sysfs knobs?
> > > Thanks,
> > 
> > It all depends on what you are trying to do.  I have yet to figure that
> > out at all here :(
> 
> I explained it in the changelog and my response to Wouter.  NBD preallocates
> all of its /dev/nbd# devices at modprobe time, so there's no way to add new
> devices as we need them.

Why not fix that odd restriction?

> Loop accomplishes this with the /dev/loop-control
> and an ioctl.  Then we also need a way to figure out what is the first
> /dev/nbd# device that isn't currently in use in order to pick the next one
> to configure.  Keep in mind that all of the configuration for /dev/nbd#
> devices is done through ioctls to those devices, so having a ioctl interface
> for the control device is consistent with the rest of how NBD works.

But adding a random char device node and using ioctls on it is different
than using an ioctl on an existing block device node that is already
there.

So what _exactly_ do you need to do with this interface?  What data do
you need sent/received to/from the kernel?  Specifics please.

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Wouter Verhelst Jan. 25, 2017, 8:55 a.m. UTC | #10
On Tue, Jan 24, 2017 at 08:11:52AM +0100, Greg KH wrote:
> On Mon, Jan 23, 2017 at 10:52:42AM -0500, Josef Bacik wrote:
> > I explained it in the changelog and my response to Wouter.  NBD preallocates
> > all of its /dev/nbd# devices at modprobe time, so there's no way to add new
> > devices as we need them.
> 
> Why not fix that odd restriction?

Isn't that what this patch is trying to do?

> > Loop accomplishes this with the /dev/loop-control
> > and an ioctl.  Then we also need a way to figure out what is the first
> > /dev/nbd# device that isn't currently in use in order to pick the next one
> > to configure.  Keep in mind that all of the configuration for /dev/nbd#
> > devices is done through ioctls to those devices, so having a ioctl interface
> > for the control device is consistent with the rest of how NBD works.
> 
> But adding a random char device node and using ioctls on it is different
> than using an ioctl on an existing block device node that is already
> there.
> 
> So what _exactly_ do you need to do with this interface?  What data do
> you need sent/received to/from the kernel?  Specifics please.

AIUI: Create new devices, and figure out which device is the next one
free so we can find one without having to check the pid attribute for
every device (as is needed today, and which is racy).
Josef Bacik Jan. 25, 2017, 1:47 p.m. UTC | #11
On Tue, Jan 24, 2017 at 2:11 AM, Greg KH <gregkh@linuxfoundation.org> 
wrote:
> On Mon, Jan 23, 2017 at 10:52:42AM -0500, Josef Bacik wrote:
>>  On Mon, Jan 23, 2017 at 10:03 AM, Greg KH 
>> <gregkh@linuxfoundation.org>
>>  wrote:
>>  > On Mon, Jan 23, 2017 at 09:57:52AM -0500, Josef Bacik wrote:
>>  > >  On Mon, Jan 23, 2017 at 9:52 AM, Greg KH
>>  > > <gregkh@linuxfoundation.org> wrote:
>>  > >  > On Mon, Jan 23, 2017 at 09:42:08AM -0500, Josef Bacik wrote:
>>  > >  > >
>>  > >  > >
>>  > >  > >  On Sat, Jan 21, 2017 at 4:05 AM, Greg KH
>>  > >  > > <gregkh@linuxfoundation.org> wrote:
>>  > >  > >  > On Fri, Jan 20, 2017 at 04:56:52PM -0500, Josef Bacik 
>> wrote:
>>  > >  > >  > >  This patch mirrors the loop back device behavior 
>> with a
>>  > > few
>>  > >  > >  > > changes.  First
>>  > >  > >  > >  there is no DEL operation as NBD doesn't get as much
>>  > > churn as
>>  > >  > > loop
>>  > >  > >  > > devices do.
>>  > >  > >  > >  Secondly the GET_NEXT operation can optionally 
>> create a
>>  > > new NBD
>>  > >  > >  > > device or not.
>>  > >  > >  > >  Our infrastructure people want to not allow NBD to 
>> create
>>  > > new
>>  > >  > >  > > devices as it
>>  > >  > >  > >  causes problems for them in containers.  However 
>> allow
>>  > > this to
>>  > >  > > be
>>  > >  > >  > > optional as
>>  > >  > >  > >  things like the OSS NBD client probably doesn't care 
>> and
>>  > > would
>>  > >  > > like
>>  > >  > >  > > to just be
>>  > >  > >  > >  given a device to use.
>>  > >  > >  > >
>>  > >  > >  > >  Signed-off-by: Josef Bacik <jbacik@fb.com>
>>  > >  > >  >
>>  > >  > >  > A random char device with odd ioctls?  Why?  There's no 
>> other
>>  > >  > >  > configuration choice you could possibly use?  Where is 
>> the
>>  > >  > > userspace
>>  > >  > >  > tool that uses this new kernel api?
>>  > >  > >  >
>>  > >  > >  > You aren't passing in structures to the ioctl, so why 
>> does
>>  > > this
>>  > >  > > HAVE to
>>  > >  > >  > be an ioctl?
>>  > >  > >
>>  > >  > >  Again, this is how loop does it so I assumed a known, 
>> regularly
>>  > >  > > used API was
>>  > >  > >  the best bet.  I can do literally anything, but these
>>  > > interfaces
>>  > >  > > have to be
>>  > >  > >  used by other people, including internal people.  The
>>  > >  > > /dev/whatever-control
>>  > >  > >  is a well established way for interacting with dynamic 
>> device
>>  > >  > > drivers (loop,
>>  > >  > >  DM, btrfs), so that's what I went with.  Thanks,
>>  > >  >
>>  > >  > Again, please don't duplicate what loop did, we must _learn_ 
>> from
>>  > >  > history, not repeat it :(
>>  > >
>>  > >  Sure but what am I supposed to do?  Have some random sysfs 
>> knobs?
>>  > > Thanks,
>>  >
>>  > It all depends on what you are trying to do.  I have yet to 
>> figure that
>>  > out at all here :(
>> 
>>  I explained it in the changelog and my response to Wouter.  NBD 
>> preallocates
>>  all of its /dev/nbd# devices at modprobe time, so there's no way to 
>> add new
>>  devices as we need them.
> 
> Why not fix that odd restriction?

That's the entire point of this patchset, adding the ability to create 
new devices on the fly as needed.

> 
> 
>>  Loop accomplishes this with the /dev/loop-control
>>  and an ioctl.  Then we also need a way to figure out what is the 
>> first
>>  /dev/nbd# device that isn't currently in use in order to pick the 
>> next one
>>  to configure.  Keep in mind that all of the configuration for 
>> /dev/nbd#
>>  devices is done through ioctls to those devices, so having a ioctl 
>> interface
>>  for the control device is consistent with the rest of how NBD works.
> 
> But adding a random char device node and using ioctls on it is 
> different
> than using an ioctl on an existing block device node that is already
> there.
> 
> So what _exactly_ do you need to do with this interface?  What data do
> you need sent/received to/from the kernel?  Specifics please.

I need to say "Hey kernel, can you setup some new nbd devices for me?" 
and "Hey kernel, what is the first free nbd device I can use for this 
new connection and (optionally) create a new device for me if one does 
not exist?"  Loop accomplished this (in 2011 btw, not some far off 
date) with /dev/loop-control.  Btrfs has it's scanning stuff controlled 
by /dev/btrfs-control.  Device mapper has /dev/mapper/control.  This is 
a well established and widely used way for control block based device 
drivers.  Thanks,

Josef

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arnd Bergmann Jan. 25, 2017, 2:23 p.m. UTC | #12
On Wed, Jan 25, 2017 at 2:47 PM, Josef Bacik <jbacik@fb.com> wrote:
> On Tue, Jan 24, 2017 at 2:11 AM, Greg KH <gregkh@linuxfoundation.org> wrote:
>> On Mon, Jan 23, 2017 at 10:52:42AM -0500, Josef Bacik wrote:
>>>  On Mon, Jan 23, 2017 at 10:03 AM, Greg KH <gregkh@linuxfoundation.org>
>>>  Loop accomplishes this with the /dev/loop-control
>>>  and an ioctl.  Then we also need a way to figure out what is the first
>>>  /dev/nbd# device that isn't currently in use in order to pick the next
>>> one
>>>  to configure.  Keep in mind that all of the configuration for /dev/nbd#
>>>  devices is done through ioctls to those devices, so having a ioctl
>>> interface
>>>  for the control device is consistent with the rest of how NBD works.
>>
>>
>> But adding a random char device node and using ioctls on it is different
>> than using an ioctl on an existing block device node that is already
>> there.
>>
>> So what _exactly_ do you need to do with this interface?  What data do
>> you need sent/received to/from the kernel?  Specifics please.
>
>
> I need to say "Hey kernel, can you setup some new nbd devices for me?" and
> "Hey kernel, what is the first free nbd device I can use for this new
> connection and (optionally) create a new device for me if one does not
> exist?"  Loop accomplished this (in 2011 btw, not some far off date) with
> /dev/loop-control.  Btrfs has it's scanning stuff controlled by
> /dev/btrfs-control.  Device mapper has /dev/mapper/control.  This is a well
> established and widely used way for control block based device drivers.

We have multiple established ways to deal with this kind of problem, the most
common ones being a separate chardev as you propose, a sysfs interface and
netlink.

They all have their own set of problems, but the needs of nbd as a network
storage interface seem most closely resemble what we have for other network
related interfaces, where we typically use netlink to do the setup, see e.g.
macvtap as an example for a network chardev interface.

Can you have a look if that would solve your problem more nicely?

     Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alex Gartrell Jan. 25, 2017, 4:48 p.m. UTC | #13
On 1/25/17, 6:23 AM, "arndbergmann@gmail.com on behalf of Arnd Bergmann" <arndbergmann@gmail.com on behalf of arnd@arndb.de> wrote:
> On Wed, Jan 25, 2017 at 2:47 PM, Josef Bacik <jbacik@fb.com> wrote:

> > On Tue, Jan 24, 2017 at 2:11 AM, Greg KH <gregkh@linuxfoundation.org> wrote:

> >> On Mon, Jan 23, 2017 at 10:52:42AM -0500, Josef Bacik wrote:

> >>>  On Mon, Jan 23, 2017 at 10:03 AM, Greg KH <gregkh@linuxfoundation.org>

> >>>  Loop accomplishes this with the /dev/loop-control

> >>>  and an ioctl.  Then we also need a way to figure out what is the first

> >>>  /dev/nbd# device that isn't currently in use in order to pick the next

> >>> one

> >>>  to configure.  Keep in mind that all of the configuration for /dev/nbd#

> >>>  devices is done through ioctls to those devices, so having a ioctl

> >>> interface

> >>>  for the control device is consistent with the rest of how NBD works.

> >>

> >>

> >> But adding a random char device node and using ioctls on it is different

> >> than using an ioctl on an existing block device node that is already

> >> there.

> >>

> >> So what _exactly_ do you need to do with this interface?  What data do

> >> you need sent/received to/from the kernel?  Specifics please.

> >

> >

> > I need to say "Hey kernel, can you setup some new nbd devices for me?" and

> > "Hey kernel, what is the first free nbd device I can use for this new

> > connection and (optionally) create a new device for me if one does not

> > exist?"  Loop accomplished this (in 2011 btw, not some far off date) with

> > /dev/loop-control.  Btrfs has it's scanning stuff controlled by

> > /dev/btrfs-control.  Device mapper has /dev/mapper/control.  This is a well

> > established and widely used way for control block based device drivers.

> 

> We have multiple established ways to deal with this kind of problem, the most

> common ones being a separate chardev as you propose, a sysfs interface and

> netlink.

> 

> They all have their own set of problems, but the needs of nbd as a network

> storage interface seem most closely resemble what we have for other network

> related interfaces, where we typically use netlink to do the setup, see e.g.

> macvtap as an example for a network chardev interface.

> 

> Can you have a look if that would solve your problem more nicely?


FWIW, I'm the one consuming this nbd stuff at Facebook and bringing
netlink into this would be a huge pain for me.  It's very weird to
need to pull sockets in and then drop back to ioctls from a design
perspective, and future consumers of this API would be sad as well.
This is compounded, I think, by the fact that the breadth of
functionality here doesn't really merit a shared library for everyone
to use -- could you imagine libnbdcontrol, which exposes a single
"get_next_device" function?

If nbd were *all* netlink I think that that'd be fine, but you'd have
problems implementing the NBD_DOIT function in that fashion.  So I'd
rather stick to the char device ioctl thing because it's more
consistent with the old NBD stuff as well as the loop device stuff.

-- 
Alex Gartrell <agartrell@fb.com>
Wouter Verhelst Jan. 25, 2017, 6:21 p.m. UTC | #14
On Wed, Jan 25, 2017 at 04:48:27PM +0000, Alex Gartrell wrote:
> On 1/25/17, 6:23 AM, "arndbergmann@gmail.com on behalf of Arnd Bergmann" <arndbergmann@gmail.com on behalf of arnd@arndb.de> wrote:
> > We have multiple established ways to deal with this kind of problem, the most
> > common ones being a separate chardev as you propose, a sysfs interface and
> > netlink.
> > 
> > They all have their own set of problems, but the needs of nbd as a network
> > storage interface seem most closely resemble what we have for other network
> > related interfaces, where we typically use netlink to do the setup, see e.g.
> > macvtap as an example for a network chardev interface.
> > 
> > Can you have a look if that would solve your problem more nicely?
> 
> FWIW, I'm the one consuming this nbd stuff at Facebook and bringing
> netlink into this would be a huge pain for me.  It's very weird to
> need to pull sockets in and then drop back to ioctls from a design
> perspective, and future consumers of this API would be sad as well.

As who's been maintaining the official userspace nbd client for 15 years
now, I have to concur. NBD has had an API that deals with /dev/nbdX
since forever, adding other APIs to that just feels wrong.

> This is compounded, I think, by the fact that the breadth of
> functionality here doesn't really merit a shared library for everyone
> to use -- could you imagine libnbdcontrol, which exposes a single
> "get_next_device" function?

Please no :)

> If nbd were *all* netlink I think that that'd be fine, but you'd have
> problems implementing the NBD_DOIT function in that fashion.  So I'd
> rather stick to the char device ioctl thing because it's more
> consistent with the old NBD stuff as well as the loop device stuff.

Indeed.
Paul Clements Jan. 25, 2017, 6:24 p.m. UTC | #15
On Wed, Jan 25, 2017 at 9:23 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>
> They all have their own set of problems, but the needs of nbd as a network
> storage interface seem most closely resemble what we have for other network
> related interfaces, where we typically use netlink to do the setup, see e.g.
> macvtap as an example for a network chardev interface.

But nbd is a virtual block device, first and foremost. That means it
is very similar to both loop (Pavel began nbd as a copy of loop, I
believe) and device mapper.

Also, it uses ioctls for all configuration and administration currently.

--
Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alex Bligh Jan. 25, 2017, 6:25 p.m. UTC | #16
> On 25 Jan 2017, at 16:48, Alex Gartrell <agartrell@fb.com> wrote:
> 
> 
> If nbd were *all* netlink I think that that'd be fine, but you'd have
> problems implementing the NBD_DOIT function in that fashion.  So I'd
> rather stick to the char device ioctl thing because it's more
> consistent with the old NBD stuff as well as the loop device stuff.

I spend most of my time looking at the userspace side of NBD so
apologies if this is off base.

Given (because of NBD_DO_IT) we need an ioctl anyway, and we have
an ioctl that isn't going to go away, it would seem better if possible
to stick with ioctls, and not introduce either a dependency
on netlink (which would presumably bloat static binaries that
are used early in the boot process). Personally I'd have thought
adding a new NBD ioctl (or extending an existing one) would be
less entropy than adding a new char device.
Greg KH Jan. 25, 2017, 9:30 p.m. UTC | #17
On Wed, Jan 25, 2017 at 06:25:11PM +0000, Alex Bligh wrote:
> 
> > On 25 Jan 2017, at 16:48, Alex Gartrell <agartrell@fb.com> wrote:
> > 
> > 
> > If nbd were *all* netlink I think that that'd be fine, but you'd have
> > problems implementing the NBD_DOIT function in that fashion.  So I'd
> > rather stick to the char device ioctl thing because it's more
> > consistent with the old NBD stuff as well as the loop device stuff.
> 
> I spend most of my time looking at the userspace side of NBD so
> apologies if this is off base.
> 
> Given (because of NBD_DO_IT) we need an ioctl anyway, and we have
> an ioctl that isn't going to go away, it would seem better if possible
> to stick with ioctls, and not introduce either a dependency
> on netlink (which would presumably bloat static binaries that
> are used early in the boot process). Personally I'd have thought
> adding a new NBD ioctl (or extending an existing one) would be
> less entropy than adding a new char device.

Why can't you just do this on any existing nbd block device with an
ioctl to it?  No need to have to do it on an out-of-band char device
node, right?

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Blake Jan. 25, 2017, 9:36 p.m. UTC | #18
On 01/25/2017 03:30 PM, Greg KH wrote:
>> Given (because of NBD_DO_IT) we need an ioctl anyway, and we have
>> an ioctl that isn't going to go away, it would seem better if possible
>> to stick with ioctls, and not introduce either a dependency
>> on netlink (which would presumably bloat static binaries that
>> are used early in the boot process). Personally I'd have thought
>> adding a new NBD ioctl (or extending an existing one) would be
>> less entropy than adding a new char device.
> 
> Why can't you just do this on any existing nbd block device with an
> ioctl to it?  No need to have to do it on an out-of-band char device
> node, right?

How do you get an fd to existing nbd block device?  Your intent is to
use an ioctl to request creating/opening a new nbd device that no one
else is using; opening an existing device in order to send that ioctl
may have negative ramifications to the actual user of that existing
device, if not permissions issues that prevent the open from even
happening.  Having a separate control fd makes it obvious that you are
asking for a new nbd device, and don't want to stomp on any existing
devices.
Christoph Hellwig Jan. 26, 2017, 8:40 a.m. UTC | #19
On Wed, Jan 25, 2017 at 03:36:20PM -0600, Eric Blake wrote:
> How do you get an fd to existing nbd block device?  Your intent is to
> use an ioctl to request creating/opening a new nbd device that no one
> else is using; opening an existing device in order to send that ioctl
> may have negative ramifications to the actual user of that existing
> device, if not permissions issues that prevent the open from even
> happening.  Having a separate control fd makes it obvious that you are
> asking for a new nbd device, and don't want to stomp on any existing
> devices.

Yes, - this whole concept of needing a device first to then associate
it with a backing store is something we had a in a lot of drivers,
and it's always been a major problem.  Thus we've been slowly moving
all the drivers off it.
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Greg KH Jan. 26, 2017, 9:17 a.m. UTC | #20
On Thu, Jan 26, 2017 at 12:40:47AM -0800, Christoph Hellwig wrote:
> On Wed, Jan 25, 2017 at 03:36:20PM -0600, Eric Blake wrote:
> > How do you get an fd to existing nbd block device?  Your intent is to
> > use an ioctl to request creating/opening a new nbd device that no one
> > else is using; opening an existing device in order to send that ioctl
> > may have negative ramifications to the actual user of that existing
> > device, if not permissions issues that prevent the open from even
> > happening.  Having a separate control fd makes it obvious that you are
> > asking for a new nbd device, and don't want to stomp on any existing
> > devices.
> 
> Yes, - this whole concept of needing a device first to then associate
> it with a backing store is something we had a in a lot of drivers,
> and it's always been a major problem.  Thus we've been slowly moving
> all the drivers off it.

Ok, but do you feel the "loop method" of using a char device node to
create/control these devices is a good model to follow for new devices
like ndb?

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig Jan. 26, 2017, 1:17 p.m. UTC | #21
On Thu, Jan 26, 2017 at 10:17:58AM +0100, Greg KH wrote:
> Ok, but do you feel the "loop method" of using a char device node to
> create/control these devices is a good model to follow for new devices
> like ndb?

Yes. We've done the same for NVMe over fabrics.
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 96f7681..c06ed36 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -35,6 +35,7 @@ 
 #include <linux/types.h>
 #include <linux/debugfs.h>
 #include <linux/blk-mq.h>
+#include <linux/miscdevice.h>
 
 #include <linux/uaccess.h>
 #include <asm/types.h>
@@ -59,6 +60,7 @@  struct nbd_device {
 	unsigned long runtime_flags;
 	struct nbd_sock **socks;
 	int magic;
+	int index;
 
 	struct blk_mq_tag_set tag_set;
 
@@ -1041,6 +1043,7 @@  static int nbd_dev_add(int index)
 	if (err < 0)
 		goto out_free_disk;
 
+	nbd->index = index;
 	nbd->disk = disk;
 	nbd->tag_set.ops = &nbd_mq_ops;
 	nbd->tag_set.nr_hw_queues = 1;
@@ -1097,20 +1100,109 @@  static int nbd_dev_add(int index)
 	return err;
 }
 
-/*
- * And here should be modules and kernel interface 
- *  (Just smiley confuses emacs :-)
- */
+static int find_free_cb(int id, void *ptr, void *data)
+{
+	struct nbd_device *nbd = ptr;
+	struct nbd_device **ret = data;
+
+	if (nbd->num_connections == 0) {
+		*ret = nbd;
+		return 1;
+	}
+	if (*ret == NULL || (*ret)->index < nbd->index)
+		*ret = nbd;
+	return 0;
+}
+
+static int nbd_dev_lookup(struct nbd_device **ret, int index)
+{
+	struct nbd_device *nbd = NULL;
+
+	if (index < 0) {
+		int err = idr_for_each(&nbd_index_idr, &find_free_cb, &nbd);
+		if (err != 1) {
+			if (nbd != NULL) {
+				*ret = NULL;
+				return nbd->index + 1;
+			} else {
+				return 0;
+			}
+		}
+	} else
+		nbd = idr_find(&nbd_index_idr, index);
+	if (nbd) {
+		*ret = nbd;
+		return nbd->index;
+	}
+	return -ENODEV;
+}
+
+static long nbd_control_ioctl(struct file *file, unsigned int cmd,
+			      unsigned long parm)
+{
+	struct nbd_device *nbd = NULL;
+	int ret = -ENOSYS;
+
+	mutex_lock(&nbd_index_mutex);
+	switch (cmd) {
+	case NBD_CTL_ADD:
+		ret = nbd_dev_lookup(&nbd, parm);
+		if (ret >= 0) {
+			ret = -EEXIST;
+			break;
+		}
+		ret = nbd_dev_add(parm);
+		break;
+	case NBD_CTL_GET_FREE:
+		ret = nbd_dev_lookup(&nbd, -1);
+		if (ret < 0)
+			break;
+		if (parm && !nbd) {
+			int index = ret;
+
+			ret = nbd_dev_add(ret);
+			if (ret < 0)
+				break;
+			ret = index;
+		}
+		break;
+	default:
+		break;
+	}
+	mutex_unlock(&nbd_index_mutex);
+	return ret;
+}
+
+static const struct file_operations nbd_ctl_fops = {
+	.open		= nonseekable_open,
+	.unlocked_ioctl	= nbd_control_ioctl,
+	.compat_ioctl	= nbd_control_ioctl,
+	.owner		= THIS_MODULE,
+	.llseek		= noop_llseek,
+};
+
+static struct miscdevice nbd_misc = {
+	.minor	= NBD_CTRL_MINOR,
+	.name	= "nbd-control",
+	.fops	= &nbd_ctl_fops,
+};
+
+MODULE_ALIAS_MISCDEV(NBD_CTRL_MINOR);
+MODULE_ALIAS("devname:nbd-control");
 
 static int __init nbd_init(void)
 {
-	int i;
+	int i, ret;
 
 	BUILD_BUG_ON(sizeof(struct nbd_request) != 28);
+	ret = misc_register(&nbd_misc);
+	if (ret < 0)
+		return ret;
 
+	ret = -EINVAL;
 	if (max_part < 0) {
 		printk(KERN_ERR "nbd: max_part must be >= 0\n");
-		return -EINVAL;
+		goto out_misc;
 	}
 
 	part_shift = 0;
@@ -1129,17 +1221,20 @@  static int __init nbd_init(void)
 	}
 
 	if ((1UL << part_shift) > DISK_MAX_PARTS)
-		return -EINVAL;
-
+		goto out_misc;
 	if (nbds_max > 1UL << (MINORBITS - part_shift))
-		return -EINVAL;
+		goto out_misc;
+
 	recv_workqueue = alloc_workqueue("knbd-recv",
 					 WQ_MEM_RECLAIM | WQ_HIGHPRI, 0);
-	if (!recv_workqueue)
-		return -ENOMEM;
-
-	if (register_blkdev(NBD_MAJOR, "nbd"))
-		return -EIO;
+	if (!recv_workqueue) {
+		ret = -ENOMEM;
+		goto out_misc;
+	}
+	if (register_blkdev(NBD_MAJOR, "nbd")) {
+		ret = -EIO;
+		goto out_workqueue;
+	}
 
 	nbd_dbg_init();
 
@@ -1148,6 +1243,11 @@  static int __init nbd_init(void)
 		nbd_dev_add(i);
 	mutex_unlock(&nbd_index_mutex);
 	return 0;
+out_workqueue:
+	destroy_workqueue(recv_workqueue);
+out_misc:
+	misc_deregister(&nbd_misc);
+	return ret;
 }
 
 static int nbd_exit_cb(int id, void *ptr, void *data)
@@ -1164,6 +1264,7 @@  static void __exit nbd_cleanup(void)
 	idr_for_each(&nbd_index_idr, &nbd_exit_cb, NULL);
 	idr_destroy(&nbd_index_idr);
 	destroy_workqueue(recv_workqueue);
+	misc_deregister(&nbd_misc);
 	unregister_blkdev(NBD_MAJOR, "nbd");
 }
 
diff --git a/include/uapi/linux/nbd.h b/include/uapi/linux/nbd.h
index c91c642..a76924b 100644
--- a/include/uapi/linux/nbd.h
+++ b/include/uapi/linux/nbd.h
@@ -29,6 +29,16 @@ 
 #define NBD_SET_TIMEOUT _IO( 0xab, 9 )
 #define NBD_SET_FLAGS   _IO( 0xab, 10)
 
+/* Use the last 8 values of our allocation for CTRL ioctls. */
+#define NBD_CTL_ADD		_IO( 0xab, 23)
+#define NBD_CTL_GET_FREE	_IO( 0xab, 24)
+
+enum {
+	NBD_CTL_GET_FREE_SCAN = 0, /* Get the next free available device. */
+	NBD_CTL_GET_FREE_ADD = 1, /* Get the next free available device, but
+				     create one if one does not exist. */
+};
+
 enum {
 	NBD_CMD_READ = 0,
 	NBD_CMD_WRITE = 1,