diff mbox

Robustify ceph-rbdnamer and adapt udev rules

Message ID 1342012996.5511.27.camel@clawhammer (mailing list archive)
State New, archived
Headers show

Commit Message

Pascal de Bruijn | Unilogic Networks B.V. July 11, 2012, 1:23 p.m. UTC
Below is a patch which makes the ceph-rbdnamer script more robust and
fixes a problem with the rbd udev rules.

On our setup we encountered a symlink which was linked to the wrong rbd:

  /dev/rbd/mypool/myrbd -> /dev/rbd1

While that link should have gone to /dev/rbd3 (on which a
partition /dev/rbd3p1 was present).

Now the old udev rule passes %n to the ceph-rbdnamer script, the problem
with %n is that %n results in a value of 3 (for rbd3), but in a value of
1 (for rbd3p1), so it seems it can't be depended upon for rbdnaming.

In the patch below the ceph-rbdnamer script is made more robust and it
now it can be called in various ways:

  /usr/bin/ceph-rbdnamer /dev/rbd3
  /usr/bin/ceph-rbdnamer /dev/rbd3p1
  /usr/bin/ceph-rbdnamer rbd3
  /usr/bin/ceph-rbdnamer rbd3p1
  /usr/bin/ceph-rbdnamer 3

Even with all these different styles of calling the modified script, it
should now return the same rbdname. This change "has" to be combined
with calling it from udev with %k though.

With that fixed, we hit the second problem. We ended up with:

  /dev/rbd/mypool/myrbd -> /dev/rbd3p1

So the rbdname was symlinked to the partition on the rbd instead of the
rbd itself. So what probably went wrong is udev discovering the disk and
running ceph-rbdnamer which resolved it to myrbd so the following
symlink was created:

  /dev/rbd/mypool/myrbd -> /dev/rbd3

However partitions would be discovered next and ceph-rbdnamer would be
run with rbd3p1 (%k) as parameter, resulting in the name myrbd too, with
the previous correct symlink being overwritten with a faulty one:

  /dev/rbd/mypool/myrbd -> /dev/rbd3p1

The solution to the problem is in differentiating between disks and
partitions in udev and handling them slightly differently. So with the
patch below partitions now get their own symlinks in the following style
(which is fairly consistent with other udev rules):

  /dev/rbd/mypool/myrbd-part1 -> /dev/rbd3p1

Please let me know any feedback you have on this patch or the approach
used.

Regards,
Pascal de Bruijn
Unilogic B.V.

Signed-off-by: Pascal de Bruijn <pascal@unilogicnetworks.net>

--------------------------------------------------------------------------------



--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Josh Durgin July 11, 2012, 4:28 p.m. UTC | #1
On 07/11/2012 06:23 AM, Pascal de Bruijn | Unilogic Networks B.V. wrote:
> Below is a patch which makes the ceph-rbdnamer script more robust and
> fixes a problem with the rbd udev rules.
>
> On our setup we encountered a symlink which was linked to the wrong rbd:
>
>    /dev/rbd/mypool/myrbd ->  /dev/rbd1
>
> While that link should have gone to /dev/rbd3 (on which a
> partition /dev/rbd3p1 was present).
>
> Now the old udev rule passes %n to the ceph-rbdnamer script, the problem
> with %n is that %n results in a value of 3 (for rbd3), but in a value of
> 1 (for rbd3p1), so it seems it can't be depended upon for rbdnaming.
>
> In the patch below the ceph-rbdnamer script is made more robust and it
> now it can be called in various ways:
>
>    /usr/bin/ceph-rbdnamer /dev/rbd3
>    /usr/bin/ceph-rbdnamer /dev/rbd3p1
>    /usr/bin/ceph-rbdnamer rbd3
>    /usr/bin/ceph-rbdnamer rbd3p1
>    /usr/bin/ceph-rbdnamer 3
>
> Even with all these different styles of calling the modified script, it
> should now return the same rbdname. This change "has" to be combined
> with calling it from udev with %k though.
>
> With that fixed, we hit the second problem. We ended up with:
>
>    /dev/rbd/mypool/myrbd ->  /dev/rbd3p1
>
> So the rbdname was symlinked to the partition on the rbd instead of the
> rbd itself. So what probably went wrong is udev discovering the disk and
> running ceph-rbdnamer which resolved it to myrbd so the following
> symlink was created:
>
>    /dev/rbd/mypool/myrbd ->  /dev/rbd3
>
> However partitions would be discovered next and ceph-rbdnamer would be
> run with rbd3p1 (%k) as parameter, resulting in the name myrbd too, with
> the previous correct symlink being overwritten with a faulty one:
>
>    /dev/rbd/mypool/myrbd ->  /dev/rbd3p1
>
> The solution to the problem is in differentiating between disks and
> partitions in udev and handling them slightly differently. So with the
> patch below partitions now get their own symlinks in the following style
> (which is fairly consistent with other udev rules):
>
>    /dev/rbd/mypool/myrbd-part1 ->  /dev/rbd3p1
>
> Please let me know any feedback you have on this patch or the approach
> used.

This all makes sense, but maybe we should put the -part suffix in
another namespace to avoid colliding with images that happen to have
-partN in their name, e.g.:

     /dev/rbd/mypool/myrbd/part1 -> /dev/rbd3p1

Josh

> Regards,
> Pascal de Bruijn
> Unilogic B.V.
>
> Signed-off-by: Pascal de Bruijn<pascal@unilogicnetworks.net>
>
> --------------------------------------------------------------------------------
> diff --git a/src/ceph-rbdnamer b/src/ceph-rbdnamer
> index 49350a7..efb6804 100755
> --- a/src/ceph-rbdnamer
> +++ b/src/ceph-rbdnamer
> @@ -1,8 +1,10 @@
>   #!/bin/sh
>
> -POOL=`cat /sys/devices/rbd/$1/pool`
> -IMAGE=`cat /sys/devices/rbd/$1/name`
> -SNAP=`cat /sys/devices/rbd/$1/current_snap`
> +DEV=$1
> +NUM=`echo $DEV | sed 's#p.*##g' | tr -d 'a-z'`
> +POOL=`cat /sys/devices/rbd/$NUM/pool`
> +IMAGE=`cat /sys/devices/rbd/$NUM/name`
> +SNAP=`cat /sys/devices/rbd/$NUM/current_snap`
>   if [ "$SNAP" = "-" ]; then
>   	echo -n "$POOL $IMAGE"
>   else
> diff --git a/udev/50-rbd.rules b/udev/50-rbd.rules
> index 07f220f..406c31c 100644
> --- a/udev/50-rbd.rules
> +++ b/udev/50-rbd.rules
> @@ -1 +1,2 @@
> -KERNEL=="rbd[0-9]*", PROGRAM="/usr/bin/ceph-rbdnamer %n", SYMLINK+="rbd/%c{1}/%c{2}"
> +KERNEL=="rbd[0-9]*", ENV{DEVTYPE}=="disk", PROGRAM="/usr/bin/ceph-rbdnamer %k", SYMLINK+="rbd/%c{1}/%c{2}"
> +KERNEL=="rbd[0-9]*", ENV{DEVTYPE}=="partition", PROGRAM="/usr/bin/ceph-rbdnamer %k", SYMLINK+="rbd/%c{1}/%c{2}-part%n"
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tommi Virtanen July 11, 2012, 4:32 p.m. UTC | #2
On Wed, Jul 11, 2012 at 9:28 AM, Josh Durgin <josh.durgin@inktank.com> wrote:
> This all makes sense, but maybe we should put the -part suffix in
> another namespace to avoid colliding with images that happen to have
> -partN in their name, e.g.:
>
>     /dev/rbd/mypool/myrbd/part1 -> /dev/rbd3p1

Good idea (assuming the udev details make that doable).

Namespaces are one honking great idea -- let's do more of those!   ;)
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Pascal de Bruijn | Unilogic Networks B.V. July 12, 2012, 7:49 a.m. UTC | #3
On Wed, 2012-07-11 at 09:28 -0700, Josh Durgin wrote:
> On 07/11/2012 06:23 AM, Pascal de Bruijn | Unilogic Networks B.V. wrote:
> > Below is a patch which makes the ceph-rbdnamer script more robust and
> > fixes a problem with the rbd udev rules.
> >
> > On our setup we encountered a symlink which was linked to the wrong rbd:
> >
> >    /dev/rbd/mypool/myrbd ->  /dev/rbd1
> >
> > While that link should have gone to /dev/rbd3 (on which a
> > partition /dev/rbd3p1 was present).
> >
> > Now the old udev rule passes %n to the ceph-rbdnamer script, the problem
> > with %n is that %n results in a value of 3 (for rbd3), but in a value of
> > 1 (for rbd3p1), so it seems it can't be depended upon for rbdnaming.
> >
> > In the patch below the ceph-rbdnamer script is made more robust and it
> > now it can be called in various ways:
> >
> >    /usr/bin/ceph-rbdnamer /dev/rbd3
> >    /usr/bin/ceph-rbdnamer /dev/rbd3p1
> >    /usr/bin/ceph-rbdnamer rbd3
> >    /usr/bin/ceph-rbdnamer rbd3p1
> >    /usr/bin/ceph-rbdnamer 3
> >
> > Even with all these different styles of calling the modified script, it
> > should now return the same rbdname. This change "has" to be combined
> > with calling it from udev with %k though.
> >
> > With that fixed, we hit the second problem. We ended up with:
> >
> >    /dev/rbd/mypool/myrbd ->  /dev/rbd3p1
> >
> > So the rbdname was symlinked to the partition on the rbd instead of the
> > rbd itself. So what probably went wrong is udev discovering the disk and
> > running ceph-rbdnamer which resolved it to myrbd so the following
> > symlink was created:
> >
> >    /dev/rbd/mypool/myrbd ->  /dev/rbd3
> >
> > However partitions would be discovered next and ceph-rbdnamer would be
> > run with rbd3p1 (%k) as parameter, resulting in the name myrbd too, with
> > the previous correct symlink being overwritten with a faulty one:
> >
> >    /dev/rbd/mypool/myrbd ->  /dev/rbd3p1
> >
> > The solution to the problem is in differentiating between disks and
> > partitions in udev and handling them slightly differently. So with the
> > patch below partitions now get their own symlinks in the following style
> > (which is fairly consistent with other udev rules):
> >
> >    /dev/rbd/mypool/myrbd-part1 ->  /dev/rbd3p1
> >
> > Please let me know any feedback you have on this patch or the approach
> > used.
> 
> This all makes sense, but maybe we should put the -part suffix in
> another namespace to avoid colliding with images that happen to have
> -partN in their name, e.g.:
> 
>      /dev/rbd/mypool/myrbd/part1 -> /dev/rbd3p1

Well my current patch changes the udev rules in a way that's consistent
with other udev bits. For example:

  /dev/disk/by-id/cciss-3600508b1001038353220202020200006
  /dev/disk/by-id/cciss-3600508b1001038353220202020200006-part1
  /dev/disk/by-id/cciss-3600508b1001038353220202020200006-part2

There is no namespacing there either. That said, those rules tends to
use serials/unique-id's for naming (and not user specified strings), so
there is little risk of conflicting with the -part%n bit.

Also, having a namespace as suggested:

  /dev/rbd/mypool/myrbd/part1 -> /dev/rbd3p1

Also precludes:

  /dev/rbd/mypool/myrbd -> /dev/rbd3

From existing, as myrbd can't be both a device file and directory at the
same time :)

Assuming you'd want to continue with this approach the disk udev link
should probably be something like:

  /dev/rbd/mypool/myrbd/disk -> /dev/rbd3

Please do note that this would change the udev rules in a way that could
potentially break people's existing scripts which might assume the old
udev scheme (whereas my current patch does not break the old scheme).

Maybe it's worth considering applying my patch as-is to the 0.48.x
stable tree, and experimenting with other udev schemes in newer
development releases?

Regards,
Pascal de Bruijn


--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Josh Durgin July 17, 2012, 1:03 a.m. UTC | #4
On 07/12/2012 12:49 AM, Pascal de Bruijn | Unilogic Networks B.V. wrote:
> On Wed, 2012-07-11 at 09:28 -0700, Josh Durgin wrote:
>> On 07/11/2012 06:23 AM, Pascal de Bruijn | Unilogic Networks B.V. wrote:
>>> Below is a patch which makes the ceph-rbdnamer script more robust and
>>> fixes a problem with the rbd udev rules.
>>>
>>> On our setup we encountered a symlink which was linked to the wrong rbd:
>>>
>>>     /dev/rbd/mypool/myrbd ->   /dev/rbd1
>>>
>>> While that link should have gone to /dev/rbd3 (on which a
>>> partition /dev/rbd3p1 was present).
>>>
>>> Now the old udev rule passes %n to the ceph-rbdnamer script, the problem
>>> with %n is that %n results in a value of 3 (for rbd3), but in a value of
>>> 1 (for rbd3p1), so it seems it can't be depended upon for rbdnaming.
>>>
>>> In the patch below the ceph-rbdnamer script is made more robust and it
>>> now it can be called in various ways:
>>>
>>>     /usr/bin/ceph-rbdnamer /dev/rbd3
>>>     /usr/bin/ceph-rbdnamer /dev/rbd3p1
>>>     /usr/bin/ceph-rbdnamer rbd3
>>>     /usr/bin/ceph-rbdnamer rbd3p1
>>>     /usr/bin/ceph-rbdnamer 3
>>>
>>> Even with all these different styles of calling the modified script, it
>>> should now return the same rbdname. This change "has" to be combined
>>> with calling it from udev with %k though.
>>>
>>> With that fixed, we hit the second problem. We ended up with:
>>>
>>>     /dev/rbd/mypool/myrbd ->   /dev/rbd3p1
>>>
>>> So the rbdname was symlinked to the partition on the rbd instead of the
>>> rbd itself. So what probably went wrong is udev discovering the disk and
>>> running ceph-rbdnamer which resolved it to myrbd so the following
>>> symlink was created:
>>>
>>>     /dev/rbd/mypool/myrbd ->   /dev/rbd3
>>>
>>> However partitions would be discovered next and ceph-rbdnamer would be
>>> run with rbd3p1 (%k) as parameter, resulting in the name myrbd too, with
>>> the previous correct symlink being overwritten with a faulty one:
>>>
>>>     /dev/rbd/mypool/myrbd ->   /dev/rbd3p1
>>>
>>> The solution to the problem is in differentiating between disks and
>>> partitions in udev and handling them slightly differently. So with the
>>> patch below partitions now get their own symlinks in the following style
>>> (which is fairly consistent with other udev rules):
>>>
>>>     /dev/rbd/mypool/myrbd-part1 ->   /dev/rbd3p1
>>>
>>> Please let me know any feedback you have on this patch or the approach
>>> used.
>>
>> This all makes sense, but maybe we should put the -part suffix in
>> another namespace to avoid colliding with images that happen to have
>> -partN in their name, e.g.:
>>
>>       /dev/rbd/mypool/myrbd/part1 ->  /dev/rbd3p1
>
> Well my current patch changes the udev rules in a way that's consistent
> with other udev bits. For example:
>
>    /dev/disk/by-id/cciss-3600508b1001038353220202020200006
>    /dev/disk/by-id/cciss-3600508b1001038353220202020200006-part1
>    /dev/disk/by-id/cciss-3600508b1001038353220202020200006-part2
>
> There is no namespacing there either. That said, those rules tends to
> use serials/unique-id's for naming (and not user specified strings), so
> there is little risk of conflicting with the -part%n bit.
>
> Also, having a namespace as suggested:
>
>    /dev/rbd/mypool/myrbd/part1 ->  /dev/rbd3p1
>
> Also precludes:
>
>    /dev/rbd/mypool/myrbd ->  /dev/rbd3
>
>  From existing, as myrbd can't be both a device file and directory at the
> same time :)
>
> Assuming you'd want to continue with this approach the disk udev link
> should probably be something like:
>
>    /dev/rbd/mypool/myrbd/disk ->  /dev/rbd3
>
> Please do note that this would change the udev rules in a way that could
> potentially break people's existing scripts which might assume the old
> udev scheme (whereas my current patch does not break the old scheme).

Good point.

> Maybe it's worth considering applying my patch as-is to the 0.48.x
> stable tree, and experimenting with other udev schemes in newer
> development releases?

That sounds best. I've applied your patch to the stable, next, and
master branches.

Thanks!
Josh

> Regards,
> Pascal de Bruijn

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/src/ceph-rbdnamer b/src/ceph-rbdnamer
index 49350a7..efb6804 100755
--- a/src/ceph-rbdnamer
+++ b/src/ceph-rbdnamer
@@ -1,8 +1,10 @@ 
 #!/bin/sh
 
-POOL=`cat /sys/devices/rbd/$1/pool`
-IMAGE=`cat /sys/devices/rbd/$1/name`
-SNAP=`cat /sys/devices/rbd/$1/current_snap`
+DEV=$1
+NUM=`echo $DEV | sed 's#p.*##g' | tr -d 'a-z'`
+POOL=`cat /sys/devices/rbd/$NUM/pool`
+IMAGE=`cat /sys/devices/rbd/$NUM/name`
+SNAP=`cat /sys/devices/rbd/$NUM/current_snap`
 if [ "$SNAP" = "-" ]; then
 	echo -n "$POOL $IMAGE"
 else
diff --git a/udev/50-rbd.rules b/udev/50-rbd.rules
index 07f220f..406c31c 100644
--- a/udev/50-rbd.rules
+++ b/udev/50-rbd.rules
@@ -1 +1,2 @@ 
-KERNEL=="rbd[0-9]*", PROGRAM="/usr/bin/ceph-rbdnamer %n", SYMLINK+="rbd/%c{1}/%c{2}"
+KERNEL=="rbd[0-9]*", ENV{DEVTYPE}=="disk", PROGRAM="/usr/bin/ceph-rbdnamer %k", SYMLINK+="rbd/%c{1}/%c{2}"
+KERNEL=="rbd[0-9]*", ENV{DEVTYPE}=="partition", PROGRAM="/usr/bin/ceph-rbdnamer %k", SYMLINK+="rbd/%c{1}/%c{2}-part%n"