Message ID | 20250309222904.449803-1-bmarzins@redhat.com (mailing list archive) |
---|---|
Headers | show |
Series | dm: fix issues with swapping dm tables | expand |
On 3/10/25 07:28, Benjamin Marzinski wrote: > There were multiple places in dm's __bind() function where it could fail > and not completely roll back, leaving the device using the the old > table, but with device limits and resources from the new table. > Additionally, unused mempools for request-based devices were not always > freed immediately. > > Finally, there were a number of issues with switching zoned tables that > emulate zone append (in other words, dm-crypt on top of zoned devices). > dm_blk_report_zones() could be called while the device was suspended and > modifying zoned resources or could possibly fail to end a srcu read > section. More importantly, blk_revalidate_disk_zones() would never get > called when updating a zoned table. This could cause the dm device to > see the wrong zone write offsets, not have a large enough zwplugs > reserved in its mempool, or read invalid memory when checking the > conventional zones bitmap. > > This patchset fixes these issues. It does not make it so that > device-mapper is able to load any zoned table from any other zoned > table. Zoned dm-crypt devices can be safely grown and shrunk, but > reloading a zoned dm-crypt device to, for instance, point at a > completely different underlying device won't work correctly. IO might > fail since the zone write offsets of the dm-crypt device will not be > updated for all the existing zones with plugs. If the new device's zone > offsets don't match the old device's offsets, IO to the zone will fail. > If the ability to switch tables from a zoned dm-crypt device to an > abritry other zoned dm-crypt device is important to people, it could be > done as long as there are no plugged zones when dm suspends. Thanks for fixing this. Given that in the general case switching tables will always likely result in unaligned write errors, I think we should just report a ENOTSUPP error if the user attempts to swap tables. > This patchset also doesn't touch the code for switching from a zoned to > a non-zoned device. Switching from a zoned dm-crypt device to a > non-zoned device is problematic if there are plugged zones, since the > underlying device will not be prepared to handle these plugged writes. > Switching to a target that does not pass down IOs, like the dm-error > target, is fine. So is switching when there are no plugged zones, except > that we do not free the zoned resources in this case, even though we > safely could. This is another case that does not make much sense in practice. So instead of still allowing it knowing that it most likely will not work, we should return ENOTSUPP here too I think. > If people are interested in removing some of these limitations, I can > send patches for them, but I'm not sure how much extra code we want, > just to support niche zoned dm-crypt reloads. I have never heard any complaints/bug reports from people attempting this. Which likely means that no-one is needing/trying to do this. So as mentionned above, we should make sure that this feature is not reported as not supported with a ENOTSUPP error, and maybe a warning.
On Mon, Mar 10, 2025 at 08:16:43AM +0900, Damien Le Moal wrote: > On 3/10/25 07:28, Benjamin Marzinski wrote: > > There were multiple places in dm's __bind() function where it could fail > > and not completely roll back, leaving the device using the the old > > table, but with device limits and resources from the new table. > > Additionally, unused mempools for request-based devices were not always > > freed immediately. > > > > Finally, there were a number of issues with switching zoned tables that > > emulate zone append (in other words, dm-crypt on top of zoned devices). > > dm_blk_report_zones() could be called while the device was suspended and > > modifying zoned resources or could possibly fail to end a srcu read > > section. More importantly, blk_revalidate_disk_zones() would never get > > called when updating a zoned table. This could cause the dm device to > > see the wrong zone write offsets, not have a large enough zwplugs > > reserved in its mempool, or read invalid memory when checking the > > conventional zones bitmap. > > > > This patchset fixes these issues. It does not make it so that > > device-mapper is able to load any zoned table from any other zoned > > table. Zoned dm-crypt devices can be safely grown and shrunk, but > > reloading a zoned dm-crypt device to, for instance, point at a > > completely different underlying device won't work correctly. IO might > > fail since the zone write offsets of the dm-crypt device will not be > > updated for all the existing zones with plugs. If the new device's zone > > offsets don't match the old device's offsets, IO to the zone will fail. > > If the ability to switch tables from a zoned dm-crypt device to an > > abritry other zoned dm-crypt device is important to people, it could be > > done as long as there are no plugged zones when dm suspends. > > Thanks for fixing this. > > Given that in the general case switching tables will always likely result in > unaligned write errors, I think we should just report a ENOTSUPP error if the > user attempts to swap tables. If we don't think there's any interest in growing or shrinking zoned dm-crypt devices, that's fine. I do think we should make an exception for switching to the dm-error target. We specifically call that out with DM_TARGET_WILDCARD so that we can always switch to it from any table if we just want to fail out all the IO. -Ben > > This patchset also doesn't touch the code for switching from a zoned to > > a non-zoned device. Switching from a zoned dm-crypt device to a > > non-zoned device is problematic if there are plugged zones, since the > > underlying device will not be prepared to handle these plugged writes. > > Switching to a target that does not pass down IOs, like the dm-error > > target, is fine. So is switching when there are no plugged zones, except > > that we do not free the zoned resources in this case, even though we > > safely could. > > This is another case that does not make much sense in practice. So instead of > still allowing it knowing that it most likely will not work, we should return > ENOTSUPP here too I think. > > > If people are interested in removing some of these limitations, I can > > send patches for them, but I'm not sure how much extra code we want, > > just to support niche zoned dm-crypt reloads. > > I have never heard any complaints/bug reports from people attempting this. Which > likely means that no-one is needing/trying to do this. So as mentionned above, > we should make sure that this feature is not reported as not supported with a > ENOTSUPP error, and maybe a warning. > > > > -- > Damien Le Moal > Western Digital Research
On 3/11/25 01:38, Benjamin Marzinski wrote: > On Mon, Mar 10, 2025 at 08:16:43AM +0900, Damien Le Moal wrote: >> On 3/10/25 07:28, Benjamin Marzinski wrote: >>> There were multiple places in dm's __bind() function where it could fail >>> and not completely roll back, leaving the device using the the old >>> table, but with device limits and resources from the new table. >>> Additionally, unused mempools for request-based devices were not always >>> freed immediately. >>> >>> Finally, there were a number of issues with switching zoned tables that >>> emulate zone append (in other words, dm-crypt on top of zoned devices). >>> dm_blk_report_zones() could be called while the device was suspended and >>> modifying zoned resources or could possibly fail to end a srcu read >>> section. More importantly, blk_revalidate_disk_zones() would never get >>> called when updating a zoned table. This could cause the dm device to >>> see the wrong zone write offsets, not have a large enough zwplugs >>> reserved in its mempool, or read invalid memory when checking the >>> conventional zones bitmap. >>> >>> This patchset fixes these issues. It does not make it so that >>> device-mapper is able to load any zoned table from any other zoned >>> table. Zoned dm-crypt devices can be safely grown and shrunk, but >>> reloading a zoned dm-crypt device to, for instance, point at a >>> completely different underlying device won't work correctly. IO might >>> fail since the zone write offsets of the dm-crypt device will not be >>> updated for all the existing zones with plugs. If the new device's zone >>> offsets don't match the old device's offsets, IO to the zone will fail. >>> If the ability to switch tables from a zoned dm-crypt device to an >>> abritry other zoned dm-crypt device is important to people, it could be >>> done as long as there are no plugged zones when dm suspends. >> >> Thanks for fixing this. >> >> Given that in the general case switching tables will always likely result in >> unaligned write errors, I think we should just report a ENOTSUPP error if the >> user attempts to swap tables. > > If we don't think there's any interest in growing or shrinking zoned > dm-crypt devices, that's fine. I do think we should make an exception > for switching to the dm-error target. We specifically call that out with > DM_TARGET_WILDCARD so that we can always switch to it from any table if > we just want to fail out all the IO. Arg ! dm-error is used in xfstests so we need it (for btrfs at least since btrfs supports zoned devices, and soon xfs as well). So I guess we should disallow switching tables when the new table changes something to the zone configuration (grow, shrink, zone size, zoned/non-zoned). dm-error does not change anything, so we should still be able to allow it.