Message ID: 20231124145338.3112416-1-o.rempel@pengutronix.de (mailing list archive)
Series: introduce priority-based shutdown support
On Fri, Nov 24, 2023 at 03:53:35PM +0100, Oleksij Rempel wrote: > Hi, > > This patch series introduces support for prioritized device shutdown. > The main goal is to enable prioritization for shutting down specific > devices, particularly crucial in scenarios like power loss where > hardware damage can occur if not handled properly. Oh fun, now we will have drivers and subsystems fighting over their priority, with each one insisting that they are the most important! /s Anyway, this is ripe for problems and issues in the long-run, what is so special about this hardware that it cannot just shut down in the existing order that it has to be "first" over everyone else? What exactly does this prevent and what devices are requiring this? And most importantly, what has changed in the past 20+ years to suddenly require this new functionality and how does any other operating system handle it? thanks, greg k-h
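As a rough illustration of what "prioritized shutdown" means mechanically — devices ordered so that higher-priority ones get their shutdown callback first — here is a minimal userspace sketch. The names (dev_entry, shutdown_prio) are made up for illustration and are not taken from the actual patches:

```c
#include <stdlib.h>

/* Illustrative only: the real series orders entries on the kernel's
 * shutdown list; these names are hypothetical. */
struct dev_entry {
	const char *name;
	int shutdown_prio;	/* higher value = shut down earlier */
};

/* qsort comparator: descending priority, so a high-priority device
 * (e.g. an eMMC that must be told about imminent power loss) is
 * handled before the ordinary priority-0 crowd. */
static int shutdown_prio_cmp(const void *a, const void *b)
{
	const struct dev_entry *da = a;
	const struct dev_entry *db = b;

	return db->shutdown_prio - da->shutdown_prio;
}

/* Sort a device list into the order shutdown callbacks would run. */
static void order_for_shutdown(struct dev_entry *devs, size_t n)
{
	qsort(devs, n, sizeof(*devs), shutdown_prio_cmp);
}
```

Greg's objection below is precisely about this ordering step: nothing in a sort by priority bounds how long the early callbacks take, and every subsystem can inflate its own number.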
On Fri, Nov 24, 2023 at 03:05:47PM +0000, Greg Kroah-Hartman wrote: > Anyway, this is ripe for problems and issues in the long-run, what is so > special about this hardware that it can not just shutdown in the > existing order that it has to be "first" over everyone else? What > exactly does this prevent and what devices are requiring this? > And most importantly, what has changed in the past 20+ years to > suddenly require this new functionality and how does any other operating > system handle it? This came out of some discussions about trying to handle emergency power failure notifications.
On Fri, Nov 24, 2023 at 03:21:40PM +0000, Mark Brown wrote: > On Fri, Nov 24, 2023 at 03:05:47PM +0000, Greg Kroah-Hartman wrote: > > > Anyway, this is ripe for problems and issues in the long-run, what is so > > special about this hardware that it can not just shutdown in the > > existing order that it has to be "first" over everyone else? What > > exactly does this prevent and what devices are requiring this? > > > And most importantly, what has changed in the past 20+ years to > > suddenly require this new functionality and how does any other operating > > system handle it? > > This came out of some discussions about trying to handle emergency power > failure notifications. I'm sorry, but I don't know what that means. Are you saying that the kernel is now going to try to provide a hard guarantee that some devices are going to be shut down in X number of seconds when asked? If so, why not do this in userspace? thanks, greg k-h
On Fri, Nov 24, 2023 at 03:27:48PM +0000, Greg Kroah-Hartman wrote: > On Fri, Nov 24, 2023 at 03:21:40PM +0000, Mark Brown wrote: > > This came out of some discussions about trying to handle emergency power > > failure notifications. > I'm sorry, but I don't know what that means. Are you saying that the > kernel is now going to try to provide a hard guarantee that some devices > are going to be shut down in X number of seconds when asked? If so, why > not do this in userspace? No, it was initially (or when I initially saw it anyway) handling of notifications from regulators that they're in trouble and we have some small amount of time to do anything we might want to do about it before we expire.
On Fri, Nov 24, 2023 at 03:49:46PM +0000, Mark Brown wrote: > On Fri, Nov 24, 2023 at 03:27:48PM +0000, Greg Kroah-Hartman wrote: > > On Fri, Nov 24, 2023 at 03:21:40PM +0000, Mark Brown wrote: > > > > This came out of some discussions about trying to handle emergency power > > > failure notifications. > > > I'm sorry, but I don't know what that means. Are you saying that the > > kernel is now going to try to provide a hard guarantee that some devices > > are going to be shut down in X number of seconds when asked? If so, why > > not do this in userspace? > > No, it was initially (or when I initially saw it anyway) handling of > notifications from regulators that they're in trouble and we have some > small amount of time to do anything we might want to do about it before > we expire. So we are going to guarantee a "time" in which we are going to do something? Again, if that's required, why not do it in userspace using a RT kernel? thanks, greg k-h
On Fri, Nov 24, 2023 at 03:56:19PM +0000, Greg Kroah-Hartman wrote: > On Fri, Nov 24, 2023 at 03:49:46PM +0000, Mark Brown wrote: > > On Fri, Nov 24, 2023 at 03:27:48PM +0000, Greg Kroah-Hartman wrote: > > > On Fri, Nov 24, 2023 at 03:21:40PM +0000, Mark Brown wrote: > > > > > > This came out of some discussions about trying to handle emergency power > > > > failure notifications. > > > > > I'm sorry, but I don't know what that means. Are you saying that the > > > kernel is now going to try to provide a hard guarantee that some devices > > > are going to be shut down in X number of seconds when asked? If so, why > > > not do this in userspace? > > > > No, it was initially (or when I initially saw it anyway) handling of > > notifications from regulators that they're in trouble and we have some > > small amount of time to do anything we might want to do about it before > > we expire. > > So we are going to guarantee a "time" in which we are going to do > > something? Again, if that's required, why not do it in userspace using > > a RT kernel? For the HW in question I have only 100ms time before power loss. By doing it from user space we would have even less time to react. In fact, this is not a new requirement. It exists on different flavors of automotive Linux for about 10 years. Linux in cars should be able to handle voltage drops, for example on ignition, and so on. The only new thing is the attempt to mainline it. Regards, Oleksij
On Fri, Nov 24, 2023 at 05:32:34PM +0100, Oleksij Rempel wrote: > On Fri, Nov 24, 2023 at 03:56:19PM +0000, Greg Kroah-Hartman wrote: > > On Fri, Nov 24, 2023 at 03:49:46PM +0000, Mark Brown wrote: > > > On Fri, Nov 24, 2023 at 03:27:48PM +0000, Greg Kroah-Hartman wrote: > > > > On Fri, Nov 24, 2023 at 03:21:40PM +0000, Mark Brown wrote: > > > > > > > > This came out of some discussions about trying to handle emergency power > > > > > failure notifications. > > > > > > > I'm sorry, but I don't know what that means. Are you saying that the > > > > kernel is now going to try to provide a hard guarantee that some devices > > > > are going to be shut down in X number of seconds when asked? If so, why > > > > not do this in userspace? > > > > > > No, it was initially (or when I initially saw it anyway) handling of > > > notifications from regulators that they're in trouble and we have some > > > small amount of time to do anything we might want to do about it before > > > we expire. > > > > So we are going to guarantee a "time" in which we are going to do > > something? Again, if that's required, why not do it in userspace using > > a RT kernel? > > For the HW in question I have only 100ms time before power loss. By > doing it over use space some we will have even less time to react. Why can't userspace react that fast? Why will the kernel be somehow faster? Speed should be the same, just get the "power is cut" signal and have userspace flush and unmount the disk before power is gone. Why can the kernel do this any differently? > In fact, this is not a new requirement. It exist on different flavors of > automotive Linux for about 10 years. Linux in cars should be able to > handle voltage drops for example on ignition and so on. The only new thing is > the attempt to mainline it. But your patch is not guaranteeing anything, it's just doing a "I want this done before the other devices are handled", that's it. 
There is no chance that 100ms is going to be a requirement, or that some other device type is not going to come along and demand to be ahead of your device in the list. So you are going to have a constant fight among device types over the years, and people complaining that the kernel is now somehow going to guarantee that a device is shutdown in a set amount of time, which again, the kernel can not guarantee here. This might work as a one-off for a specific hardware platform, which is odd, but not anything you really should be adding for anyone else to use here as your reasoning for it does not reflect what the code does. thanks, greg k-h
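The userspace approach Greg describes — get the "power is cut" signal, then flush — can be sketched roughly as follows. This is a hypothetical illustration, not code from the thread; in practice the fd would come from a GPIO character device or a PMIC interrupt, here any readable fd stands in:

```c
#include <poll.h>
#include <unistd.h>

/* Hypothetical userspace power-loss handler: block until a "power is
 * failing" event arrives on fd, then flush filesystem caches as fast as
 * possible. Returns 1 if the event was seen and handled, 0 if the
 * timeout (ms) expired first. */
static int wait_and_handle_power_loss(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };

	if (poll(&pfd, 1, timeout_ms) <= 0)
		return 0;	/* no event within the deadline */

	sync();			/* commit dirty pages; umount would follow if time allowed */
	return 1;
}
```

Oleksij's counterpoint is about the extra latency of this path: the signal must propagate through an interrupt, a driver, and a wakeup of this process before sync() even starts, which eats into a 100ms budget.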
On Fri, Nov 24, 2023 at 05:26:30PM +0000, Greg Kroah-Hartman wrote: > On Fri, Nov 24, 2023 at 05:32:34PM +0100, Oleksij Rempel wrote: > > On Fri, Nov 24, 2023 at 03:56:19PM +0000, Greg Kroah-Hartman wrote: > > > On Fri, Nov 24, 2023 at 03:49:46PM +0000, Mark Brown wrote: > > > > On Fri, Nov 24, 2023 at 03:27:48PM +0000, Greg Kroah-Hartman wrote: > > > > > On Fri, Nov 24, 2023 at 03:21:40PM +0000, Mark Brown wrote: > > > > > > > > > > This came out of some discussions about trying to handle emergency power > > > > > > failure notifications. > > > > > > > > > I'm sorry, but I don't know what that means. Are you saying that the > > > > > kernel is now going to try to provide a hard guarantee that some devices > > > > > are going to be shut down in X number of seconds when asked? If so, why > > > > > not do this in userspace? > > > > > > > > No, it was initially (or when I initially saw it anyway) handling of > > > > notifications from regulators that they're in trouble and we have some > > > > small amount of time to do anything we might want to do about it before > > > > we expire. > > > > > > So we are going to guarantee a "time" in which we are going to do > > > something? Again, if that's required, why not do it in userspace using > > > a RT kernel? > > > > For the HW in question I have only 100ms time before power loss. By > > doing it over use space some we will have even less time to react. > > Why can't userspace react that fast? Why will the kernel be somehow > faster? Speed should be the same, just get the "power is cut" signal > and have userspace flush and unmount the disk before power is gone. Why > can the kernel do this any differently? > > > In fact, this is not a new requirement. It exist on different flavors of > > automotive Linux for about 10 years. Linux in cars should be able to > > handle voltage drops for example on ignition and so on. The only new thing is > > the attempt to mainline it. 
> > But your patch is not guaranteeing anything, it's just doing a "I want > this done before the other devices are handled", that's it. There is no > chance that 100ms is going to be a requirement, or that some other > device type is not going to come along and demand to be ahead of your > device in the list. > > So you are going to have a constant fight among device types over the > years, and people complaining that the kernel is now somehow going to > guarantee that a device is shutdown in a set amount of time, which > again, the kernel can not guarantee here. > > This might work as a one-off for a specific hardware platform, which is > odd, but not anything you really should be adding for anyone else to use > here as your reasoning for it does not reflect what the code does. I see. Good point. In my case umount is not needed, there is not enough time to write out the data. We should send a shutdown command to the eMMC ASAP. @Ulf, is there a way to request MMC shutdown from user space? If I see it correctly, sysfs-devices-power-control supports only "auto" and "on". Unbinding the module will not execute the MMC shutdown notification. If user space is the way to go, would a sysfs-devices-power-control "off" command be acceptable? The other option I have is to add a regulator event handler to the MMC framework and issue the shutdown notification on an under-voltage event. Are there other options? Regards, Oleksij
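The regulator framework does expose an event notifier (regulator_register_notifier(), with events such as REGULATOR_EVENT_UNDER_VOLTAGE) that an MMC host could hook for the second option. Below is a minimal userspace simulation of that notifier-chain idea — illustrative only, not actual kernel code; all names here are invented:

```c
#include <stddef.h>

/* Simulated stand-in for REGULATOR_EVENT_UNDER_VOLTAGE. */
#define SIM_EVENT_UNDER_VOLTAGE	0x01

typedef void (*uv_handler_t)(void *ctx);

struct sim_notifier {
	uv_handler_t handler;
	void *ctx;
};

#define MAX_NOTIFIERS 8
static struct sim_notifier chain[MAX_NOTIFIERS];
static int chain_len;

/* Analogous to a consumer registering with the regulator core. */
static int sim_register_notifier(uv_handler_t fn, void *ctx)
{
	if (chain_len >= MAX_NOTIFIERS)
		return -1;
	chain[chain_len].handler = fn;
	chain[chain_len].ctx = ctx;
	chain_len++;
	return 0;
}

/* Walk the chain, as the regulator core would on an under-voltage IRQ. */
static void sim_fire_event(unsigned int event)
{
	if (!(event & SIM_EVENT_UNDER_VOLTAGE))
		return;
	for (int i = 0; i < chain_len; i++)
		chain[i].handler(chain[i].ctx);
}

/* Example consumer: pretend to send the eMMC power-off notification. */
static void emmc_poweroff_notify(void *ctx)
{
	*(int *)ctx = 1;	/* record that the shutdown command was sent */
}
```

The appeal of this route is that it sidesteps the global shutdown-ordering debate entirely: only the consumers that registered for the event react, and they react directly from the notification path.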
On Fri, Nov 24, 2023 at 07:57:25PM +0100, Oleksij Rempel wrote: > On Fri, Nov 24, 2023 at 05:26:30PM +0000, Greg Kroah-Hartman wrote: > > On Fri, Nov 24, 2023 at 05:32:34PM +0100, Oleksij Rempel wrote: > > > On Fri, Nov 24, 2023 at 03:56:19PM +0000, Greg Kroah-Hartman wrote: > > > > On Fri, Nov 24, 2023 at 03:49:46PM +0000, Mark Brown wrote: > > > > > On Fri, Nov 24, 2023 at 03:27:48PM +0000, Greg Kroah-Hartman wrote: > > > > > > On Fri, Nov 24, 2023 at 03:21:40PM +0000, Mark Brown wrote: > > > > > > > > > > > > This came out of some discussions about trying to handle emergency power > > > > > > > failure notifications. > > > > > > > > > > > I'm sorry, but I don't know what that means. Are you saying that the > > > > > > kernel is now going to try to provide a hard guarantee that some devices > > > > > > are going to be shut down in X number of seconds when asked? If so, why > > > > > > not do this in userspace? > > > > > > > > > > No, it was initially (or when I initially saw it anyway) handling of > > > > > notifications from regulators that they're in trouble and we have some > > > > > small amount of time to do anything we might want to do about it before > > > > > we expire. > > > > > > > > So we are going to guarantee a "time" in which we are going to do > > > > something? Again, if that's required, why not do it in userspace using > > > > a RT kernel? > > > > > > For the HW in question I have only 100ms time before power loss. By > > > doing it over use space some we will have even less time to react. > > > > Why can't userspace react that fast? Why will the kernel be somehow > > faster? Speed should be the same, just get the "power is cut" signal > > and have userspace flush and unmount the disk before power is gone. Why > > can the kernel do this any differently? > > > > > In fact, this is not a new requirement. It exist on different flavors of > > > automotive Linux for about 10 years. 
Linux in cars should be able to > > > handle voltage drops for example on ignition and so on. The only new thing is > > > the attempt to mainline it. > > > > But your patch is not guaranteeing anything, it's just doing a "I want > > this done before the other devices are handled", that's it. There is no > > chance that 100ms is going to be a requirement, or that some other > > device type is not going to come along and demand to be ahead of your > > device in the list. > > > > So you are going to have a constant fight among device types over the > > years, and people complaining that the kernel is now somehow going to > > guarantee that a device is shutdown in a set amount of time, which > > again, the kernel can not guarantee here. > > > > This might work as a one-off for a specific hardware platform, which is > > odd, but not anything you really should be adding for anyone else to use > > here as your reasoning for it does not reflect what the code does. > > I see. Good point. > > In my case umount is not needed, there is not enough time to write down > the data. We should send a shutdown command to the eMMC ASAP. If you don't care about the data, why is a shutdown command to the hardware needed? What does that do that makes anything "safe" if your data is lost. thanks, greg k-h
On Sat, Nov 25, 2023 at 06:51:55AM +0000, Greg Kroah-Hartman wrote: > On Fri, Nov 24, 2023 at 07:57:25PM +0100, Oleksij Rempel wrote: > > On Fri, Nov 24, 2023 at 05:26:30PM +0000, Greg Kroah-Hartman wrote: > > > On Fri, Nov 24, 2023 at 05:32:34PM +0100, Oleksij Rempel wrote: > > > > On Fri, Nov 24, 2023 at 03:56:19PM +0000, Greg Kroah-Hartman wrote: > > > > > On Fri, Nov 24, 2023 at 03:49:46PM +0000, Mark Brown wrote: > > > > > > On Fri, Nov 24, 2023 at 03:27:48PM +0000, Greg Kroah-Hartman wrote: > > > > > > > On Fri, Nov 24, 2023 at 03:21:40PM +0000, Mark Brown wrote: > > > > > > > > > > > > > > This came out of some discussions about trying to handle emergency power > > > > > > > > failure notifications. > > > > > > > > > > > > > I'm sorry, but I don't know what that means. Are you saying that the > > > > > > > kernel is now going to try to provide a hard guarantee that some devices > > > > > > > are going to be shut down in X number of seconds when asked? If so, why > > > > > > > not do this in userspace? > > > > > > > > > > > > No, it was initially (or when I initially saw it anyway) handling of > > > > > > notifications from regulators that they're in trouble and we have some > > > > > > small amount of time to do anything we might want to do about it before > > > > > > we expire. > > > > > > > > > > So we are going to guarantee a "time" in which we are going to do > > > > > something? Again, if that's required, why not do it in userspace using > > > > > a RT kernel? > > > > > > > > For the HW in question I have only 100ms time before power loss. By > > > > doing it over use space some we will have even less time to react. > > > > > > Why can't userspace react that fast? Why will the kernel be somehow > > > faster? Speed should be the same, just get the "power is cut" signal > > > and have userspace flush and unmount the disk before power is gone. Why > > > can the kernel do this any differently? > > > > > > > In fact, this is not a new requirement. 
It exist on different flavors of > > > > automotive Linux for about 10 years. Linux in cars should be able to > > > > handle voltage drops for example on ignition and so on. The only new thing is > > > > the attempt to mainline it. > > > > > > But your patch is not guaranteeing anything, it's just doing a "I want > > > this done before the other devices are handled", that's it. There is no > > > chance that 100ms is going to be a requirement, or that some other > > > device type is not going to come along and demand to be ahead of your > > > device in the list. > > > > > > So you are going to have a constant fight among device types over the > > > years, and people complaining that the kernel is now somehow going to > > > guarantee that a device is shutdown in a set amount of time, which > > > again, the kernel can not guarantee here. > > > > > > This might work as a one-off for a specific hardware platform, which is > > > odd, but not anything you really should be adding for anyone else to use > > > here as your reasoning for it does not reflect what the code does. > > > > I see. Good point. > > > > In my case umount is not needed, there is not enough time to write down > > the data. We should send a shutdown command to the eMMC ASAP. > > If you don't care about the data, why is a shutdown command to the > hardware needed? What does that do that makes anything "safe" if your > data is lost. It prevents HW damage. In a typical automotive under-voltage lab it is usually possible to reproduce X amount of bricked eMMCs or NANDs on Y amount of under-voltage cycles (I do not have exact numbers right now). Even if the numbers are not so high in the lab tests (sometimes something like one bricked device in a month of tests), the field returns are significant enough to care about a software solution for this problem. The same problem has been seen not only in automotive devices, but also in industrial or agricultural ones. In other words, it is important enough to bring some kind of solution mainline.
On Sat, Nov 25, 2023 at 09:50:38AM +0100, Oleksij Rempel wrote: > On Sat, Nov 25, 2023 at 06:51:55AM +0000, Greg Kroah-Hartman wrote: > > On Fri, Nov 24, 2023 at 07:57:25PM +0100, Oleksij Rempel wrote: > > > On Fri, Nov 24, 2023 at 05:26:30PM +0000, Greg Kroah-Hartman wrote: > > > > On Fri, Nov 24, 2023 at 05:32:34PM +0100, Oleksij Rempel wrote: > > > > > On Fri, Nov 24, 2023 at 03:56:19PM +0000, Greg Kroah-Hartman wrote: > > > > > > On Fri, Nov 24, 2023 at 03:49:46PM +0000, Mark Brown wrote: > > > > > > > On Fri, Nov 24, 2023 at 03:27:48PM +0000, Greg Kroah-Hartman wrote: > > > > > > > > On Fri, Nov 24, 2023 at 03:21:40PM +0000, Mark Brown wrote: > > > > > > > > > > > > > > > > This came out of some discussions about trying to handle emergency power > > > > > > > > > failure notifications. > > > > > > > > > > > > > > > I'm sorry, but I don't know what that means. Are you saying that the > > > > > > > > kernel is now going to try to provide a hard guarantee that some devices > > > > > > > > are going to be shut down in X number of seconds when asked? If so, why > > > > > > > > not do this in userspace? > > > > > > > > > > > > > > No, it was initially (or when I initially saw it anyway) handling of > > > > > > > notifications from regulators that they're in trouble and we have some > > > > > > > small amount of time to do anything we might want to do about it before > > > > > > > we expire. > > > > > > > > > > > > So we are going to guarantee a "time" in which we are going to do > > > > > > something? Again, if that's required, why not do it in userspace using > > > > > > a RT kernel? > > > > > > > > > > For the HW in question I have only 100ms time before power loss. By > > > > > doing it over use space some we will have even less time to react. > > > > > > > > Why can't userspace react that fast? Why will the kernel be somehow > > > > faster? 
Speed should be the same, just get the "power is cut" signal > > > > and have userspace flush and unmount the disk before power is gone. Why > > > > can the kernel do this any differently? > > > > > > > > > In fact, this is not a new requirement. It exist on different flavors of > > > > > automotive Linux for about 10 years. Linux in cars should be able to > > > > > handle voltage drops for example on ignition and so on. The only new thing is > > > > > the attempt to mainline it. > > > > > > > > But your patch is not guaranteeing anything, it's just doing a "I want > > > > this done before the other devices are handled", that's it. There is no > > > > chance that 100ms is going to be a requirement, or that some other > > > > device type is not going to come along and demand to be ahead of your > > > > device in the list. > > > > > > > > So you are going to have a constant fight among device types over the > > > > years, and people complaining that the kernel is now somehow going to > > > > guarantee that a device is shutdown in a set amount of time, which > > > > again, the kernel can not guarantee here. > > > > > > > > This might work as a one-off for a specific hardware platform, which is > > > > odd, but not anything you really should be adding for anyone else to use > > > > here as your reasoning for it does not reflect what the code does. > > > > > > I see. Good point. > > > > > > In my case umount is not needed, there is not enough time to write down > > > the data. We should send a shutdown command to the eMMC ASAP. > > > > If you don't care about the data, why is a shutdown command to the > > hardware needed? What does that do that makes anything "safe" if your > > data is lost. > > It prevents HW damage. In a typical automotive under-voltage labor it is > usually possible to reproduce X amount of bricked eMMCs or NANDs on Y > amount of under-voltage cycles (I do not have exact numbers right now). 
> Even if the numbers not so high in the labor tests (sometimes something > like one bricked device in a month of tests), the field returns are > significant enough to care about software solution for this problem. So hardware is attempting to rely on software in order to prevent the destruction of that same hardware? Surely hardware designers aren't that crazy, right? (rhetorical question, I know...) > Same problem was seen not only in automotive devices, but also in > industrial or agricultural. With other words, it is important enough to bring > some kind of solution mainline. But you are not providing a real solution here, only a "I am going to attempt to shut down a specific type of device before the others, there are no time or ordering guarantees here, so good luck!" solution. And again, how are you going to prevent the in-fighting of all device types to be "first" in the list? thanks, greg k-h
On Sat, Nov 25, 2023 at 09:09:01AM +0000, Greg Kroah-Hartman wrote: > On Sat, Nov 25, 2023 at 09:50:38AM +0100, Oleksij Rempel wrote: > > It prevents HW damage. In a typical automotive under-voltage labor it is > > usually possible to reproduce X amount of bricked eMMCs or NANDs on Y > > amount of under-voltage cycles (I do not have exact numbers right now). > > Even if the numbers not so high in the labor tests (sometimes something > > like one bricked device in a month of tests), the field returns are > > significant enough to care about software solution for this problem. > So hardware is attempting to rely on software in order to prevent the > destruction of that same hardware? Surely hardware designers aren't > that crazy, right? (rhetorical question, I know...) Surely software people aren't going to make no effort to integrate with the notification features that the hardware engineers have so helpfully provided us with? > > Same problem was seen not only in automotive devices, but also in > > industrial or agricultural. With other words, it is important enough to bring > > some kind of solution mainline. > But you are not providing a real solution here, only a "I am going to > attempt to shut down a specific type of device before the others, there > are no time or ordering guarantees here, so good luck!" solution. I'm not sure there are great solutions here: the system integrators are constrained by what the application-appropriate silicon that's on the market is capable of, the silicon is constrained by the area costs of dealing with corner cases for system robustness and how much of the market cares about fixing these issues, and software is constrained by what hardware ends up being built. Everyone's just got to try their best with the reality they're confronted with, hopefully what's possible will improve with time. > And again, how are you going to prevent the in-fighting of all device > types to be "first" in the list?
It doesn't seem like the most complex integration challenge we've ever had to deal with TBH.
On Sat, Nov 25, 2023 at 10:30:42AM +0000, Mark Brown wrote: > On Sat, Nov 25, 2023 at 09:09:01AM +0000, Greg Kroah-Hartman wrote: > > On Sat, Nov 25, 2023 at 09:50:38AM +0100, Oleksij Rempel wrote: > > > > It prevents HW damage. In a typical automotive under-voltage labor it is > > > usually possible to reproduce X amount of bricked eMMCs or NANDs on Y > > > amount of under-voltage cycles (I do not have exact numbers right now). > > > Even if the numbers not so high in the labor tests (sometimes something > > > like one bricked device in a month of tests), the field returns are > > > significant enough to care about software solution for this problem. > > > So hardware is attempting to rely on software in order to prevent the > > destruction of that same hardware? Surely hardware designers aren't > > that crazy, right? (rhetorical question, I know...) > > Surely software people aren't going to make no effort to integrate with > the notification features that the hardware engineers have so helpfully > provided us with? That would be great, but I don't see that here, do you? All I see is the shutdown sequence changing because someone wants it to go "faster" with the threat of hardware breaking if we don't meet that "faster" number, yet no knowledge or guarantee that this number can ever be known or happen. > > > Same problem was seen not only in automotive devices, but also in > > > industrial or agricultural. With other words, it is important enough to bring > > > some kind of solution mainline. > > > But you are not providing a real solution here, only a "I am going to > > attempt to shut down a specific type of device before the others, there > > are no time or ordering guarantees here, so good luck!" solution. 
> > I'm not sure there are great solutions here, the system integrators are > constrained by the what the application appropriate silicon that's on > the market is capable of, the siicon is constrained by the area costs of > dealing with corner cases for system robustness and how much of the > market cares about fixing these issues and software is constrained by > what hardware ends up being built. Everyone's just got to try their > best with the reality they're confronted with, hopefully what's possible > will improve with time. Agreed, but I don't think this patch is going to actually work properly over time as there is no time values involved :) > > And again, how are you going to prevent the in-fighting of all device > > types to be "first" in the list? > > It doesn't seem like the most complex integration challenge we've ever > had to deal with TBH. True, but we all know how this grows and thinking about how to handle it now is key for this to be acceptable. thanks, greg k-h
On Sat, Nov 25, 2023 at 02:35:41PM +0000, Greg Kroah-Hartman wrote: > On Sat, Nov 25, 2023 at 10:30:42AM +0000, Mark Brown wrote: > > On Sat, Nov 25, 2023 at 09:09:01AM +0000, Greg Kroah-Hartman wrote: > > > So hardware is attempting to rely on software in order to prevent the > > > destruction of that same hardware? Surely hardware designers aren't > > > that crazy, right? (rhetorical question, I know...) > > Surely software people aren't going to make no effort to integrate with > > the notification features that the hardware engineers have so helpfully > > provided us with? > That would be great, but I don't see that here, do you? All I see is > the shutdown sequence changing because someone wants it to go "faster" > with the threat of hardware breaking if we don't meet that "faster" > number, yet no knowledge or guarantee that this number can ever be known > or happen. The idea was to have somewhere to send notifications when the hardware starts reporting things like power supplies starting to fail. We do have those from hardware, we just don't do anything terribly useful with them yet. TBH it does seem reasonable that there will be systems that can usefully detect these issues but haven't got a detailed characterisation of exactly how long you've got before things expire; it's also likely that the actual bound is going to be highly variable depending on what the system is up to at the point of detection. It's quite likely that we'd only get a worst case bound, so it's also likely that we'd have more time in practice than in spec. I'd expect characterisation that does happen to be very system specific at this point, and I don't think we can rely on getting that information. I'd certainly expect that we have vastly more systems that can usefully detect issues than systems where we have firm numbers. > > > > Same problem was seen not only in automotive devices, but also in > > > > industrial or agricultural. 
With other words, it is important enough to bring > > > > some kind of solution mainline. > > > But you are not providing a real solution here, only a "I am going to > > > attempt to shut down a specific type of device before the others, there > > > are no time or ordering guarantees here, so good luck!" solution. > > I'm not sure there are great solutions here, the system integrators are > > constrained by the what the application appropriate silicon that's on > > the market is capable of, the siicon is constrained by the area costs of > > dealing with corner cases for system robustness and how much of the > > market cares about fixing these issues and software is constrained by > > what hardware ends up being built. Everyone's just got to try their > > best with the reality they're confronted with, hopefully what's possible > > will improve with time. > Agreed, but I don't think this patch is going to actually work properly > over time as there is no time values involved :) This seems to be more in the area of mitigation than a firm solution; I suspect users will be pleased if they can make a noticeable dent in the number of failures they're seeing. > > > And again, how are you going to prevent the in-fighting of all device > > > types to be "first" in the list? > > It doesn't seem like the most complex integration challenge we've ever > > had to deal with TBH. > True, but we all know how this grows and thinking about how to handle it > now is key for this to be acceptable. It feels like if we're concerned about mitigating physical damage during the process of power failure that's a very limited set of devices - the storage case where we're in the middle of writing to flash or whatever is the most obvious case.
On Sat, Nov 25, 2023 at 03:43:02PM +0000, Mark Brown wrote: > On Sat, Nov 25, 2023 at 02:35:41PM +0000, Greg Kroah-Hartman wrote: > > On Sat, Nov 25, 2023 at 10:30:42AM +0000, Mark Brown wrote: > > > On Sat, Nov 25, 2023 at 09:09:01AM +0000, Greg Kroah-Hartman wrote: > > > > > So hardware is attempting to rely on software in order to prevent the > > > > destruction of that same hardware? Surely hardware designers aren't > > > > that crazy, right? (rhetorical question, I know...) > > > > Surely software people aren't going to make no effort to integrate with > > > the notification features that the hardware engineers have so helpfully > > > provided us with? > > > That would be great, but I don't see that here, do you? All I see is > > the shutdown sequence changing because someone wants it to go "faster" > > with the threat of hardware breaking if we don't meet that "faster" > > number, yet no knowledge or guarantee that this number can ever be known > > or happen. > > The idea was to have somewhere to send notifications when the hardware > starts reporting things like power supplies starting to fail. We do > have those from hardware, we just don't do anything terribly useful > with them yet. Ok, but that's not what I recall this patchset doing, or did I miss something? All I saw was a "reorder the shutdown sequence" set of changes. Or at least that's all I remember at this point in time, sorry, it's been a few days, but at least that lines up with what the Subject line says above :) > TBH it does seem reasonable that there will be systems that can usefully > detect these issues but haven't got a detailed characterisation of > exactly how long you've got before things expire, it's also likely that > the actual bound is going to be highly variable depending on what the > system is up to at the point of detection. It's quite likely that we'd > only get a worst case bound so it's also likely that we'd have more time > in practice than in spec.
I'd expect characterisation that does happen > to be very system specific at this point, I don't think we can rely on > getting that information. I'd certainly expect that we have vastly more > systems that can usefully detect issues than systems where we have firm > numbers. Sure, that all sounds good, but again, I don't think that's what is happening here. > > > > > Same problem was seen not only in automotive devices, but also in > > > > > industrial or agricultural. In other words, it is important enough to bring > > > > > some kind of solution mainline. > > > > > But you are not providing a real solution here, only a "I am going to > > > > attempt to shut down a specific type of device before the others, there > > > > are no time or ordering guarantees here, so good luck!" solution. > > > > I'm not sure there are great solutions here, the system integrators are > > > constrained by what the application-appropriate silicon that's on > > > the market is capable of, the silicon is constrained by the area costs of > > > dealing with corner cases for system robustness and how much of the > > > market cares about fixing these issues and software is constrained by > > > what hardware ends up being built. Everyone's just got to try their > > > best with the reality they're confronted with, hopefully what's possible > > > will improve with time. Note, if you attempt to mitigate broken hardware with software fixes, hardware will never get unbroken as it never needs to change. Push back on this, it's the only real way forward here. I know it's not always possible, but the number of times I have heard hardware engineers say "but no one ever told us that was broken/impossible/whatever, we just assumed software could handle it" is uncountable.
> > Agreed, but I don't think this patch is going to actually work properly > > over time as there are no time values involved :) > > This seems to be more into the area of mitigation than firm solution, I > suspect users will be pleased if they can make a noticeable dent in the > number of failures they're seeing. Mitigation is good, but this patch series is just a hack by doing "throw this device type at the front of the shutdown list because we have hardware that crashes a lot" :) > > > > And again, how are you going to prevent the in-fighting of all device > > > > types to be "first" in the list? > > > > It doesn't seem like the most complex integration challenge we've ever > > > had to deal with TBH. > > > True, but we all know how this grows and thinking about how to handle it > > now is key for this to be acceptable. > > It feels like if we're concerned about mitigating physical damage during > the process of power failure that's a very limited set of devices - the > storage case where we're in the middle of writing to flash or whatever > is the most obvious case. Then why isn't userspace handling this? This is a policy decision that it needs to take to properly know what hardware needs to be shut down, and what needs to happen in order to do that (i.e. flush, unmount, etc.?) And userspace today should be able to say, "power down this device now!" for any device in the system based on the sysfs device tree, or at the very least, force it to a specific power state. So why not handle this policy there? thanks, greg k-h
On Sat, Nov 25, 2023 at 07:58:12PM +0000, Greg Kroah-Hartman wrote: > On Sat, Nov 25, 2023 at 03:43:02PM +0000, Mark Brown wrote: > > On Sat, Nov 25, 2023 at 02:35:41PM +0000, Greg Kroah-Hartman wrote: > > > That would be great, but I don't see that here, do you? All I see is > > > the shutdown sequence changing because someone wants it to go "faster" > > > with the threat of hardware breaking if we don't meet that "faster" > > > number, yet no knowledge or guarantee that this number can ever be known > > > or happen. > > The idea was to have somewhere to send notifications when the hardware > > starts reporting things like power supplies starting to fail. We do > > have those from hardware, we just don't do anything terribly useful > > with them yet. > Ok, but that's not what I recall this patchset doing, or did I miss > something? All I saw was a "reorder the shutdown sequence" set of > changes. Or at least that's all I remember at this point in time, > sorry, it's been a few days, but at least that lines up with what the > Subject line says above :) That's not in the series, a bunch of it is merged in some form (eg, see hw_protection_shutdown()) and more of it would need to be built on top if this were merged. > > > Agreed, but I don't think this patch is going to actually work properly > > > over time as there are no time values involved :) > > This seems to be more into the area of mitigation than firm solution, I > > suspect users will be pleased if they can make a noticeable dent in the > > number of failures they're seeing. > Mitigation is good, but this patch series is just a hack by doing "throw > this device type at the front of the shutdown list because we have > hardware that crashes a lot" :) Sounds like a mitigation to me.
> > It feels like if we're concerned about mitigating physical damage during > > the process of power failure that's a very limited set of devices - the > > storage case where we're in the middle of writing to flash or whatever > > is the most obvious case. > Then why isn't userspace handling this? This is a policy decision that > it needs to take to properly know what hardware needs to be shut down, > and what needs to happen in order to do that (i.e. flush, unmount, > etc.?) And userspace today should be able to say, "power down this > device now!" for any device in the system based on the sysfs device > tree, or at the very least, force it to a specific power state. So why > not handle this policy there? Given the tight timelines it does seem reasonable to have some of this in the kernel - the specific decisions about how to handle these events can always be controlled from userspace (eg, with a sysfs file like we do for autosuspend delay times which seem to be in a similar ballpark).
On Sun, Nov 26, 2023 at 10:14:45AM +0000, Mark Brown wrote: > On Sat, Nov 25, 2023 at 07:58:12PM +0000, Greg Kroah-Hartman wrote: > > On Sat, Nov 25, 2023 at 03:43:02PM +0000, Mark Brown wrote: > > > On Sat, Nov 25, 2023 at 02:35:41PM +0000, Greg Kroah-Hartman wrote: > > > > > That would be great, but I don't see that here, do you? All I see is > > > > the shutdown sequence changing because someone wants it to go "faster" > > > > with the threat of hardware breaking if we don't meet that "faster" > > > > number, yet no knowledge or guarantee that this number can ever be known > > > > or happen. > > > > The idea was to have somewhere to send notifications when the hardware > > > starts reporting things like power supplies starting to fail. We do > > > have those from hardware, we just don't do anything terribly useful > > > with them yet. > > > Ok, but that's not what I recall this patchset doing, or did I miss > > something? All I saw was a "reorder the shutdown sequence" set of > > changes. Or at least that's all I remember at this point in time, > > sorry, it's been a few days, but at least that lines up with what the > > Subject line says above :) > > That's not in the series, a bunch of it is merged in some form (eg, see > hw_protection_shutdown()) and more of it would need to be built on top > if this were merged. The current kernel has enough infrastructure to manage essential functions related to hardware protection: - The Device Tree specifies the source of interrupts for detecting under-voltage events. It also details critical system regulators and some specification of the backup power supplied by the board. - Various frameworks within the kernel can identify critical hardware conditions like over-temperature and under-voltage. Upon detection, these frameworks invoke the hw_protection_shutdown() function.
> > > > Agreed, but I don't think this patch is going to actually work properly > > > > over time as there are no time values involved :) If we're to implement a deadline for each shutdown call (as the requirement for "time values" suggests?), then prioritization becomes essential. Without establishing a shutdown order, the inclusion of time values might not be effectively utilized. Am I overlooking anything in this regard? > > > This seems to be more into the area of mitigation than firm solution, I > > > suspect users will be pleased if they can make a noticeable dent in the > > > number of failures they're seeing. > > > Mitigation is good, but this patch series is just a hack by doing "throw > > this device type at the front of the shutdown list because we have > > hardware that crashes a lot" :) The root of the issue seems to be the choice of primary storage device. All storage technologies - HDD, SSD, eMMC, NAND - are vulnerable to power loss. The only foolproof safeguard is a backup power source, but this introduces its own set of challenges: 1. Batteries: While they provide a backup, they come with limitations like a finite number of charge cycles, sensitivity to temperature (a significant concern in industrial and automotive environments), higher costs, and increased device size. For most embedded applications, a UPS isn't a viable solution. 2. Capacitors: A potential alternative, but they cannot offer prolonged backup time. Increasing the number of capacitors to extend backup time leads to additional issues: - Increased costs and space requirements on the PCB. - The need to manage partially charged capacitors during power failures. - The requirement for a power supply capable of rapid charging. - The risk of not reaching a safe state before the backup energy depletes. - In specific environments, like explosive atmospheres, storing large amounts of energy can be hazardous.
Given these considerations, it's crucial to understand that such design choices aren't merely "hacks". They represent a balance between different types of trade-offs. > > > It feels like if we're concerned about mitigating physical damage during > > > the process of power failure that's a very limited set of devices - the > > > storage case where we're in the middle of writing to flash or whatever > > > is the most obvious case. > > > Then why isn't userspace handling this? This is a policy decision that > > it needs to take to properly know what hardware needs to be shut down, > > and what needs to happen in order to do that (i.e. flush, unmount, > > etc.?) And userspace today should be able to say, "power down this > > device now!" for any device in the system based on the sysfs device > > tree, or at the very least, force it to a specific power state. So why > > not handle this policy there? > > Given the tight timelines it does seem reasonable to have some of this > in the kernel - the specific decisions about how to handle these events > can always be controlled from userspace (eg, with a sysfs file like we > do for autosuspend delay times which seem to be in a similar ballpark). Upon investigating the feasibility of a user space solution for eMMC power control, I've concluded that it's likely not possible. The primary issue is that most board designs don't include reset signaling for eMMCs. Additionally, the eMMC power rail is usually linked to the system's main power controller. While powering off is doable, cleanly powering it back on isn’t feasible. This is especially problematic when the rootfs is located on the eMMC, as power cycling the storage device could lead to system instability. Therefore, any user space method to power off eMMC wouldn't be reliable or safe, as there's no way to ensure it can be turned back on without risking the integrity of the system. 
The design rationale is clear: avoiding the risks associated with powering off the primary storage device. Considering these constraints, the only practical implementation I see is integrating this functionality into the system's shutdown sequence. This approach ensures a controlled environment for powering off the eMMC, avoiding potential issues.
Ha ha, funny discussion. As a hardware engineer (with no experience in automotive, but actual experience in industrial applications and debugging issues arising from bad shutdowns) let me add my 5ct at the end. On 26-11-2023 at 11:14, Mark Brown wrote: > On Sat, Nov 25, 2023 at 07:58:12PM +0000, Greg Kroah-Hartman wrote: >> On Sat, Nov 25, 2023 at 03:43:02PM +0000, Mark Brown wrote: >>> On Sat, Nov 25, 2023 at 02:35:41PM +0000, Greg Kroah-Hartman wrote: > >>>> That would be great, but I don't see that here, do you? All I see is >>>> the shutdown sequence changing because someone wants it to go "faster" >>>> with the threat of hardware breaking if we don't meet that "faster" >>>> number, yet no knowledge or guarantee that this number can ever be known >>>> or happen. > >>> The idea was to have somewhere to send notifications when the hardware >>> starts reporting things like power supplies starting to fail. We do >>> have those from hardware, we just don't do anything terribly useful >>> with them yet. > >> Ok, but that's not what I recall this patchset doing, or did I miss >> something? All I saw was a "reorder the shutdown sequence" set of >> changes. Or at least that's all I remember at this point in time, >> sorry, it's been a few days, but at least that lines up with what the >> Subject line says above :) > > That's not in the series, a bunch of it is merged in some form (eg, see > hw_protection_shutdown()) and more of it would need to be built on top > if this were merged. > >>>> Agreed, but I don't think this patch is going to actually work properly >>>> over time as there are no time values involved :) > >>> This seems to be more into the area of mitigation than firm solution, I >>> suspect users will be pleased if they can make a noticeable dent in the >>> number of failures they're seeing.
> >> Mitigation is good, but this patch series is just a hack by doing "throw >> this device type at the front of the shutdown list because we have >> hardware that crashes a lot" :) > > Sounds like a mitigation to me. > >>> It feels like if we're concerned about mitigating physical damage during >>> the process of power failure that's a very limited set of devices - the >>> storage case where we're in the middle of writing to flash or whatever >>> is the most obvious case. > >> Then why isn't userspace handling this? This is a policy decision that >> it needs to take to properly know what hardware needs to be shut down, >> and what needs to happen in order to do that (i.e. flush, unmount, >> etc.?) And userspace today should be able to say, "power down this >> device now!" for any device in the system based on the sysfs device >> tree, or at the very least, force it to a specific power state. So why >> not handle this policy there? > > Given the tight timelines it does seem reasonable to have some of this > in the kernel - the specific decisions about how to handle these events > can always be controlled from userspace (eg, with a sysfs file like we > do for autosuspend delay times which seem to be in a similar ballpark). I'd prefer not to call the HW broken in this case. The life of hardware (unlike software) continues during and after power down. That means there may be requirements and specs for it to conform to during those transitions and states. Unlike broken hardware, which does not conform to its specs. Take a HDD that autoparks its heads to a safe position using its last bit of rotational energy: that's not broken, that's carefully designed. That said, I agree with Greg: if there is a hard requirement to shut down safely to prevent damage, the solution is not to shut down fast. The solution is to shut down on time. In fact, if the software needs more energy to shut down safely, any hardware engineer will consider that a requirement.
And ask the appropriate question: "how much energy do you need exactly?". There are various reasons why that cannot be answered in general. The funniest answer I ever got (thanks Albert) was: "My software doesn't consume energy". Now, we do need to keep in mind that storing joules in a supercap, running a CPU at GHz, and storing gigabytes of data do not come for free. So, after making sure things shut down in time, it often pays off to shorten that deadline, and indeed make it faster. Looking at the above discussion from the different angles: 1) The hardware mentioned does not need to shut down (as said, it doesn't need to be unmounted). It needs to be placed into a safe state on time. And the only thing here that can know for the particular hardware what is a safe state, is the driver itself. 2) To get a signal (Low Power Warning) to the driver on time, the PREEMPT_RT kernel seems like a natural choice. 3) To me (but hey who am I) it makes sense to have a generic mechanism for drivers to transition to their safe state if they require that. 4) I wouldn't worry about drivers fighting for priority, these systems are normally "embedded" with fixed hardware. Otherwise there is no way to calculate shutdown energy required and do proper hardware design.
On 25/11/2023 08:50, Oleksij Rempel wrote: > On Sat, Nov 25, 2023 at 06:51:55AM +0000, Greg Kroah-Hartman wrote: >> On Fri, Nov 24, 2023 at 07:57:25PM +0100, Oleksij Rempel wrote: >>> On Fri, Nov 24, 2023 at 05:26:30PM +0000, Greg Kroah-Hartman wrote: >>>> On Fri, Nov 24, 2023 at 05:32:34PM +0100, Oleksij Rempel wrote: >>>>> On Fri, Nov 24, 2023 at 03:56:19PM +0000, Greg Kroah-Hartman wrote: >>>>>> On Fri, Nov 24, 2023 at 03:49:46PM +0000, Mark Brown wrote: >>>>>>> On Fri, Nov 24, 2023 at 03:27:48PM +0000, Greg Kroah-Hartman wrote: >>>>>>>> On Fri, Nov 24, 2023 at 03:21:40PM +0000, Mark Brown wrote: >>>>>>> >>>>>>>>> This came out of some discussions about trying to handle emergency power >>>>>>>>> failure notifications. >>>>>>> >>>>>>>> I'm sorry, but I don't know what that means. Are you saying that the >>>>>>>> kernel is now going to try to provide a hard guarantee that some devices >>>>>>>> are going to be shut down in X number of seconds when asked? If so, why >>>>>>>> not do this in userspace? >>>>>>> >>>>>>> No, it was initially (or when I initially saw it anyway) handling of >>>>>>> notifications from regulators that they're in trouble and we have some >>>>>>> small amount of time to do anything we might want to do about it before >>>>>>> we expire. >>>>>> >>>>>> So we are going to guarantee a "time" in which we are going to do >>>>>> something? Again, if that's required, why not do it in userspace using >>>>>> a RT kernel? >>>>> >>>>> For the HW in question I have only 100ms before power loss. By >>>>> doing it from user space we will have even less time to react. >>>> >>>> Why can't userspace react that fast? Why will the kernel be somehow >>>> faster? Speed should be the same, just get the "power is cut" signal >>>> and have userspace flush and unmount the disk before power is gone. Why >>>> can the kernel do this any differently? >>>> >>>>> In fact, this is not a new requirement.
It exists on different flavors of >>>>> automotive Linux for about 10 years. Linux in cars should be able to >>>>> handle voltage drops for example on ignition and so on. The only new thing is >>>>> the attempt to mainline it. >>>> >>>> But your patch is not guaranteeing anything, it's just doing a "I want >>>> this done before the other devices are handled", that's it. There is no >>>> chance that 100ms is going to be a requirement, or that some other >>>> device type is not going to come along and demand to be ahead of your >>>> device in the list. >>>> >>>> So you are going to have a constant fight among device types over the >>>> years, and people complaining that the kernel is now somehow going to >>>> guarantee that a device is shut down in a set amount of time, which >>>> again, the kernel can not guarantee here. >>>> >>>> This might work as a one-off for a specific hardware platform, which is >>>> odd, but not anything you really should be adding for anyone else to use >>>> here as your reasoning for it does not reflect what the code does. >>> >>> I see. Good point. >>> >>> In my case umount is not needed, there is not enough time to write down >>> the data. We should send a shutdown command to the eMMC ASAP. >> >> If you don't care about the data, why is a shutdown command to the >> hardware needed? What does that do that makes anything "safe" if your >> data is lost. > > It prevents HW damage. In a typical automotive under-voltage lab it is > usually possible to reproduce X bricked eMMCs or NANDs in Y > under-voltage cycles (I do not have exact numbers right now). > Even if the numbers are not so high in the lab tests (sometimes something > like one bricked device in a month of tests), the field returns are > significant enough to care about a software solution for this problem. > > Same problem was seen not only in automotive devices, but also in > industrial or agricultural.
In other words, it is important enough to bring > some kind of solution mainline. > IMO that is a serious problem with the storage / eMMC used in that case; is it really suitable for industrial/automotive use? Any industrial/automotive-suitable storage device should detect under-voltage and just treat it as a power-down/loss, and while that isn't nice for the storage device, it really shouldn't be able to brick a device (within <1M cycles anyway). What does the storage module vendor say about this? BR, Christian
On 26/11/2023 19:31, Oleksij Rempel wrote: > On Sun, Nov 26, 2023 at 10:14:45AM +0000, Mark Brown wrote: >> On Sat, Nov 25, 2023 at 07:58:12PM +0000, Greg Kroah-Hartman wrote: >>> On Sat, Nov 25, 2023 at 03:43:02PM +0000, Mark Brown wrote: >>>> On Sat, Nov 25, 2023 at 02:35:41PM +0000, Greg Kroah-Hartman wrote: >> >>>>> That would be great, but I don't see that here, do you? All I see is >>>>> the shutdown sequence changing because someone wants it to go "faster" >>>>> with the threat of hardware breaking if we don't meet that "faster" >>>>> number, yet no knowledge or guarantee that this number can ever be known >>>>> or happen. >> >>>> The idea was to have somewhere to send notifications when the hardware >>>> starts reporting things like power supplies starting to fail. We do >>>> have those from hardware, we just don't do anything terribly useful >>>> with them yet. >> >>> Ok, but that's not what I recall this patchset doing, or did I miss >>> something? All I saw was a "reorder the shutdown sequence" set of >>> changes. Or at least that's all I remember at this point in time, >>> sorry, it's been a few days, but at least that lines up with what the >>> Subject line says above :) >> >> That's not in the series, a bunch of it is merged in some form (eg, see >> hw_protection_shutdown()) and more of it would need to be built on top >> if this were merged. > > The current kernel has enough infrastructure to manage essential functions > related to hardware protection: > - The Device Tree specifies the source of interrupts for detecting > under-voltage events. It also details critical system regulators and some > specification of the backup power supplied by the board. > - Various frameworks within the kernel can identify critical hardware > conditions like over-temperature and under-voltage. Upon detection, these > frameworks invoke the hw_protection_shutdown() function.
> >>>>> Agreed, but I don't think this patch is going to actually work properly >>>>> over time as there are no time values involved :) > > If we're to implement a deadline for each shutdown call (as the requirement for > "time values" suggests?), then prioritization becomes essential. Without > establishing a shutdown order, the inclusion of time values might not be > effectively utilized. Am I overlooking anything in this regard? > >>>> This seems to be more into the area of mitigation than firm solution, I >>>> suspect users will be pleased if they can make a noticeable dent in the >>>> number of failures they're seeing. >> >>> Mitigation is good, but this patch series is just a hack by doing "throw >>> this device type at the front of the shutdown list because we have >>> hardware that crashes a lot" :) > > The root of the issue seems to be the choice of primary storage device. > > All storage technologies - HDD, SSD, eMMC, NAND - are vulnerable to power > loss. The only foolproof safeguard is a backup power source, but this > introduces its own set of challenges: I disagree and would say that any storage device sold as "industrial" should guarantee power-fail safety. Plus, you mentioned data loss isn't even your concern, but the storage device fails/bricks. > > 1. Batteries: While they provide a backup, they come with limitations like a > finite number of charge cycles, sensitivity to temperature (a significant > concern in industrial and automotive environments), higher costs, and > increased device size. For most embedded applications, a UPS isn't a viable > solution. > > 2. Capacitors: A potential alternative, but they cannot offer prolonged > backup time. Increasing the number of capacitors to extend backup time leads > to additional issues: > - Increased costs and space requirements on the PCB. > - The need to manage partially charged capacitors during power failures. > - The requirement for a power supply capable of rapid charging.
> - The risk of not reaching a safe state before the backup energy > depletes. > - In specific environments, like explosive atmospheres, storing large > amounts of energy can be hazardous. And also just practically, ensuring a safe power down could be in the order of a second, so it would be quite a capacitor. > > Given these considerations, it's crucial to understand that such design choices > aren't merely "hacks". They represent a balance between different types of > trade-offs. > >>>> It feels like if we're concerned about mitigating physical damage during >>>> the process of power failure that's a very limited set of devices - the >>>> storage case where we're in the middle of writing to flash or whatever >>>> is the most obvious case. >> >>> Then why isn't userspace handling this? This is a policy decision that >>> it needs to take to properly know what hardware needs to be shut down, >>> and what needs to happen in order to do that (i.e. flush, unmount, >>> etc.?) And userspace today should be able to say, "power down this >>> device now!" for any device in the system based on the sysfs device >>> tree, or at the very least, force it to a specific power state. So why >>> not handle this policy there? >> >> Given the tight timelines it does seem reasonable to have some of this >> in the kernel - the specific decisions about how to handle these events >> can always be controlled from userspace (eg, with a sysfs file like we >> do for autosuspend delay times which seem to be in a similar ballpark). > > Upon investigating the feasibility of a user space solution for eMMC > power control, I've concluded that it's likely not possible. The primary > issue is that most board designs don't include reset signaling for > eMMCs. Additionally, the eMMC power rail is usually linked to the > system's main power controller. While powering off is doable, cleanly > powering it back on isn’t feasible. 
This is especially problematic when > the rootfs is located on the eMMC, as power cycling the storage device > could lead to system instability. > > Therefore, any user space method to power off eMMC wouldn't be reliable > or safe, as there's no way to ensure it can be turned back on without > risking the integrity of the system. The design rationale is clear: > avoiding the risks associated with powering off the primary storage > device. > > Considering these constraints, the only practical implementation I see > is integrating this functionality into the system's shutdown sequence. > This approach ensures a controlled environment for powering off the > eMMC, avoiding potential issues. You don't need the RST signal; in fact, even if you had it, it would be the wrong thing to do. (Implementation is vendor-specific, but RST assumes that the eMMC's VCC and VCCQ are left untouched.) You can try turning off the eMMC cache completely and/or sending a power-down notification on 'emergency shutdown', but since power-loss/fail behavior is vendor-specific, I'd ask the storage device vendor how to ensure a safe power-down. Anyway, the proper eMMC power-down methods have timeouts of up to a second, so they are infeasible for your requirements from what I can see. BR, Christian
On Mon, Nov 27, 2023 at 10:13:49AM +0000, Christian Loehle wrote: > > Same problem was seen not only in automotive devices, but also in > > industrial or agricultural. In other words, it is important enough to bring > > some kind of solution mainline. > > > > IMO that is a serious problem with the storage / eMMC used in that case; is it > suitable for industrial/automotive use? > Any industrial/automotive-suitable storage device should detect under-voltage and > just treat it as a power-down/loss, and while that isn't nice for the storage device, > it really shouldn't be able to brick a device (within <1M cycles anyway). > What does the storage module vendor say about this? Good question. I do not have insights ATM. I'll forward it.
On Mon, Nov 27, 2023 at 11:27:31AM +0000, Christian Loehle wrote: > On 26/11/2023 19:31, Oleksij Rempel wrote: > > On Sun, Nov 26, 2023 at 10:14:45AM +0000, Mark Brown wrote: > >> On Sat, Nov 25, 2023 at 07:58:12PM +0000, Greg Kroah-Hartman wrote: > >>> On Sat, Nov 25, 2023 at 03:43:02PM +0000, Mark Brown wrote: > >>>> On Sat, Nov 25, 2023 at 02:35:41PM +0000, Greg Kroah-Hartman wrote: > >> > >>>>> That would be great, but I don't see that here, do you? All I see is > >>>>> the shutdown sequence changing because someone wants it to go "faster" > >>>>> with the threat of hardware breaking if we don't meet that "faster" > >>>>> number, yet no knowledge or guarantee that this number can ever be known > >>>>> or happen. > >> > >>>> The idea was to have somewhere to send notifications when the hardware > >>>> starts reporting things like power supplies starting to fail. We do > >>>> have those from hardware, we just don't do anything terribly useful > >>>> with them yet. > >> > >>> Ok, but that's not what I recall this patchset doing, or did I miss > >>> something? All I saw was a "reorder the shutdown sequence" set of > >>> changes. Or at least that's all I remember at this point in time, > >>> sorry, it's been a few days, but at least that lines up with what the > >>> Subject line says above :) > >> > >> That's not in the series, a bunch of it is merged in some form (eg, see > >> hw_protection_shutdown()) and more of it would need to be built on top > >> if this were merged. > > > > The current kernel has enough infrastructure to manage essential functions > > related to hardware protection: > > - The Device Tree specifies the source of interrupts for detecting > > under-voltage events. It also details critical system regulators and some > > specification of the backup power supplied by the board. > > - Various frameworks within the kernel can identify critical hardware > > conditions like over-temperature and under-voltage.
Upon detection, these > > frameworks invoke the hw_protection_shutdown() function. > > > >>>>> Agreed, but I don't think this patch is going to actually work properly > >>>>> over time as there is no time values involved :) > > > > If we're to implement a deadline for each shutdown call (as the requirement for > > "time values" suggests?), then prioritization becomes essential. Without > > establishing a shutdown order, the inclusion of time values might not be > > effectively utilized. Am I overlooking anything in this regard? > > > >>>> This seems to be more into the area of mitigation than firm solution, I > >>>> suspect users will be pleased if they can make a noticable dent in the > >>>> number of failures they're seeing. > >> > >>> Mitigation is good, but this patch series is just a hack by doing "throw > >>> this device type at the front of the shutdown list because we have > >>> hardware that crashes a lot" :) > > > > The root of the issue seems to be the choice of primary storage device. > > > > All storage technologies - HDD, SSD, eMMC, NAND - are vulnerable to power > > loss. The only foolproof safeguard is a backup power source, but this > > introduces its own set of challenges: > > I disagree and would say that any storage device sold as "industrial" should > guarantee power-fail safety. Plus, you mentioned data loss isn't even your concern, > but the storage device fails/bricks. > > > > 1. Batteries: While they provide a backup, they come with limitations like a > > finite number of charge cycles, sensitivity to temperature (a significant > > concern in industrial and automotive environments), higher costs, and > > increased device size. For most embedded applications, a UPS isn't a viable > > solution. > > > > 2. Capacitors: A potential alternative, but they cannot offer prolonged > > backup time. Increasing the number of capacitors to extend backup time leads > > to additional issues: > > - Increased costs and space requirements on the PCB. 
> > - The need to manage partially charged capacitors during power failures. > > - The requirement for a power supply capable of rapid charging. > > - The risk of not reaching a safe state before the backup energy > > depletes. > > - In specific environments, like explosive atmospheres, storing large > > amounts of energy can be hazardous. > > And also just practically, ensuring a safe power down could be in the order > of a second, so it would be quite a capacitor. > > > > > Given these considerations, it's crucial to understand that such design choices > > aren't merely "hacks". They represent a balance between different types of > > trade-offs. > > > >>>> It feels like if we're concerned about mitigating physical damage during > >>>> the process of power failure that's a very limited set of devices - the > >>>> storage case where we're in the middle of writing to flash or whatever > >>>> is the most obvious case. > >> > >>> Then why isn't userspace handling this? This is a policy decision that > >>> it needs to take to properly know what hardware needs to be shut down, > >>> and what needs to happen in order to do that (i.e. flush, unmount, > >>> etc.?) And userspace today should be able to say, "power down this > >>> device now!" for any device in the system based on the sysfs device > >>> tree, or at the very least, force it to a specific power state. So why > >>> not handle this policy there? > >> > >> Given the tight timelines it does seem reasonable to have some of this > >> in the kernel - the specific decisions about how to handle these events > >> can always be controlled from userspace (eg, with a sysfs file like we > >> do for autosuspend delay times which seem to be in a similar ballpark). > > > > Upon investigating the feasibility of a user space solution for eMMC > > power control, I've concluded that it's likely not possible. The primary > > issue is that most board designs don't include reset signaling for > > eMMCs. 
Additionally, the eMMC power rail is usually linked to the > > system's main power controller. While powering off is doable, cleanly > > powering it back on isn’t feasible. This is especially problematic when > > the rootfs is located on the eMMC, as power cycling the storage device > > could lead to system instability. > > > > Therefore, any user space method to power off eMMC wouldn't be reliable > > or safe, as there's no way to ensure it can be turned back on without > > risking the integrity of the system. The design rationale is clear: > > avoiding the risks associated with powering off the primary storage > > device. > > > > Considering these constraints, the only practical implementation I see > > is integrating this functionality into the system's shutdown sequence. > > This approach ensures a controlled environment for powering off the > > eMMC, avoiding potential issues. > > You don't need the RST signal, in fact even if you had it it would be > the wrong thing to do. (Implementation is vendor-specific but RST > assumes that eMMCs' VCC and VCCQ are left untouched.) Does this mean that if VCC and VCCQ are cut on reboot or watchdog reset, there is potentially a bigger problem? > You can try turning off eMMC cache completely and/or sending power down > notification on 'emergency shutdown', but since power-loss/fail behavior > is vendor-specific, I'd recommend asking the storage device vendor how to ensure a safe > power-down. > Anyway the proper eMMC power-down methods are up to a second in timeouts, > so infeasible for your requirements from what I can see. OK. So increasing the backup capacity to at least one second should be the main goal for now? But even if the capacity is increased, an emergency shutdown should still notify the eMMC as early as possible?
On 27/11/2023 11:44, Oleksij Rempel wrote: > On Mon, Nov 27, 2023 at 11:27:31AM +0000, Christian Loehle wrote: >> On 26/11/2023 19:31, Oleksij Rempel wrote: >>> On Sun, Nov 26, 2023 at 10:14:45AM +0000, Mark Brown wrote: >>>> On Sat, Nov 25, 2023 at 07:58:12PM +0000, Greg Kroah-Hartman wrote: >>>>> On Sat, Nov 25, 2023 at 03:43:02PM +0000, Mark Brown wrote: >>>>>> On Sat, Nov 25, 2023 at 02:35:41PM +0000, Greg Kroah-Hartman wrote: >>>> >>>>>>> That would be great, but I don't see that here, do you? All I see is >>>>>>> the shutdown sequence changing because someone wants it to go "faster" >>>>>>> with the threat of hardware breaking if we don't meet that "faster" >>>>>>> number, yet no knowledge or guarantee that this number can ever be known >>>>>>> or happen. >>>> >>>>>> The idea was to have somewhere to send notifications when the hardware >>>>>> starts reporting things like power supplies starting to fail. We do >>>>>> have those from hardware, we just don't do anything terribly useful >>>>>> with them yet. >>>> >>>>> Ok, but that's not what I recall this patchset doing, or did I missing >>>>> something? All I saw was a "reorder the shutdown sequence" set of >>>>> changes. Or at least that's all I remember at this point in time, >>>>> sorry, it's been a few days, but at least that lines up with what the >>>>> Subject line says above :) >>>> >>>> That's not in the series, a bunch of it is merged in some form (eg, see >>>> hw_protection_shutdown()) and more of it would need to be built on top >>>> if this were merged. >>> >>> The current kernel has enough infrastructure to manage essential functions >>> related to hardware protection: >>> - The Device Tree specifies the source of interrupts for detecting >>> under-voltage events. It also details critical system regulators and some >>> of specification of backup power supplied by the board. 
>>> - Various frameworks within the kernel can identify critical hardware >>> conditions like over-temperature and under-voltage. Upon detection, these >>> frameworks invoke the hw_protection_shutdown() function. >>> >>>>>>> Agreed, but I don't think this patch is going to actually work properly >>>>>>> over time as there is no time values involved :) >>> >>> If we're to implement a deadline for each shutdown call (as the requirement for >>> "time values" suggests?), then prioritization becomes essential. Without >>> establishing a shutdown order, the inclusion of time values might not be >>> effectively utilized. Am I overlooking anything in this regard? >>> >>>>>> This seems to be more into the area of mitigation than firm solution, I >>>>>> suspect users will be pleased if they can make a noticable dent in the >>>>>> number of failures they're seeing. >>>> >>>>> Mitigation is good, but this patch series is just a hack by doing "throw >>>>> this device type at the front of the shutdown list because we have >>>>> hardware that crashes a lot" :) >>> >>> The root of the issue seems to be the choice of primary storage device. >>> >>> All storage technologies - HDD, SSD, eMMC, NAND - are vulnerable to power >>> loss. The only foolproof safeguard is a backup power source, but this >>> introduces its own set of challenges: >> >> I disagree and would say that any storage device sold as "industrial" should >> guarantee power-fail safety. Plus, you mentioned data loss isn't even your concern, >> but the storage device fails/bricks. >>> >>> 1. Batteries: While they provide a backup, they come with limitations like a >>> finite number of charge cycles, sensitivity to temperature (a significant >>> concern in industrial and automotive environments), higher costs, and >>> increased device size. For most embedded applications, a UPS isn't a viable >>> solution. >>> >>> 2. Capacitors: A potential alternative, but they cannot offer prolonged >>> backup time. 
Increasing the number of capacitors to extend backup time leads >>> to additional issues: >>> - Increased costs and space requirements on the PCB. >>> - The need to manage partially charged capacitors during power failures. >>> - The requirement for a power supply capable of rapid charging. >>> - The risk of not reaching a safe state before the backup energy >>> depletes. >>> - In specific environments, like explosive atmospheres, storing large >>> amounts of energy can be hazardous. >> >> And also just practically, ensuring a safe power down could be in the order >> of a second, so it would be quite a capacitor. >> >>> >>> Given these considerations, it's crucial to understand that such design choices >>> aren't merely "hacks". They represent a balance between different types of >>> trade-offs. >>> >>>>>> It feels like if we're concerned about mitigating physical damage during >>>>>> the process of power failure that's a very limited set of devices - the >>>>>> storage case where we're in the middle of writing to flash or whatever >>>>>> is the most obvious case. >>>> >>>>> Then why isn't userspace handling this? This is a policy decision that >>>>> it needs to take to properly know what hardware needs to be shut down, >>>>> and what needs to happen in order to do that (i.e. flush, unmount, >>>>> etc.?) And userspace today should be able to say, "power down this >>>>> device now!" for any device in the system based on the sysfs device >>>>> tree, or at the very least, force it to a specific power state. So why >>>>> not handle this policy there? >>>> >>>> Given the tight timelines it does seem reasonable to have some of this >>>> in the kernel - the specific decisions about how to handle these events >>>> can always be controlled from userspace (eg, with a sysfs file like we >>>> do for autosuspend delay times which seem to be in a similar ballpark). 
>>> >>> Upon investigating the feasibility of a user space solution for eMMC >>> power control, I've concluded that it's likely not possible. The primary >>> issue is that most board designs don't include reset signaling for >>> eMMCs. Additionally, the eMMC power rail is usually linked to the >>> system's main power controller. While powering off is doable, cleanly >>> powering it back on isn’t feasible. This is especially problematic when >>> the rootfs is located on the eMMC, as power cycling the storage device >>> could lead to system instability. >>> >>> Therefore, any user space method to power off eMMC wouldn't be reliable >>> or safe, as there's no way to ensure it can be turned back on without >>> risking the integrity of the system. The design rationale is clear: >>> avoiding the risks associated with powering off the primary storage >>> device. >>> >>> Considering these constraints, the only practical implementation I see >>> is integrating this functionality into the system's shutdown sequence. >>> This approach ensures a controlled environment for powering off the >>> eMMC, avoiding potential issues. >> >> You don't need the RST signal, in fact even if you had it it would be >> the wrong thing to do. (Implementation is vendor-specific but RST >> assumes that eMMCs' VCC and VCCQ are left untouched.) > > Does this mean that if VCC and VCCQ are cut on reboot or watchdog reset, there is > potentially a bigger problem? Just to confirm, I was talking about the EMMC_RST signal, which I understood you were talking about, too? Sending an EMMC_RST pulse does not have to trigger a safe shutdown for the eMMC (again, it could), but both VCC and VCCQ should be left untouched and stable. (Otherwise, with the short timeout on EMMC_RST you might as well toggle EMMC_VCC.) Whether you toggle EMMC_VCC and EMMC_VCCQ on system reboot/reset is a design choice, but if your eMMC module has trouble with 'sudden' power loss, leaving them on could be beneficial? 
Anyway, definitely not required and not really related to your issue. > >> You can try turning off eMMC cache completely and/or sending power down >> notification on 'emergency shutdown', but since power-loss/fail behavior >> is vendor-specific, I'd recommend asking the storage device vendor how to ensure a safe >> power-down. >> Anyway the proper eMMC power-down methods are up to a second in timeouts, >> so infeasible for your requirements from what I can see. > > OK. So increasing the backup capacity to at least one second should be the main goal > for now? But even if the capacity is increased, an emergency shutdown should still > notify the eMMC as early as possible? Well, if that is an option, it is a path to explore for sure; that is, if you don't want to switch the eMMC to something more fitting. Yes, you would need to notify the eMMC somehow; otherwise you have just delayed the power failure by a second. Sending a sleep or power-down notification, or flushing the cache, could all be ways to trigger a safe shutdown for the eMMC, but again, you really would have to confirm this with the eMMC vendor, as all of these can essentially be implemented as a NOP (apart from the state-machine transition) and still be spec-compliant. BR, Christian
pe 24. marrask. 2023 klo 19.26 Greg Kroah-Hartman (gregkh@linuxfoundation.org) kirjoitti: > > On Fri, Nov 24, 2023 at 05:32:34PM +0100, Oleksij Rempel wrote: > > On Fri, Nov 24, 2023 at 03:56:19PM +0000, Greg Kroah-Hartman wrote: > > > On Fri, Nov 24, 2023 at 03:49:46PM +0000, Mark Brown wrote: > > > > On Fri, Nov 24, 2023 at 03:27:48PM +0000, Greg Kroah-Hartman wrote: > > > > > On Fri, Nov 24, 2023 at 03:21:40PM +0000, Mark Brown wrote: > > > > > > > > > > This came out of some discussions about trying to handle emergency power > > > > > > failure notifications. > > > > > > > > > I'm sorry, but I don't know what that means. Are you saying that the > > > > > kernel is now going to try to provide a hard guarantee that some devices > > > > > are going to be shut down in X number of seconds when asked? If so, why > > > > > not do this in userspace? > > > > > > > > No, it was initially (or when I initially saw it anyway) handling of > > > > notifications from regulators that they're in trouble and we have some > > > > small amount of time to do anything we might want to do about it before > > > > we expire. > > > > > > So we are going to guarantee a "time" in which we are going to do > > > something? Again, if that's required, why not do it in userspace using > > > a RT kernel? > > > > For the HW in question I have only 100ms time before power loss. By > > doing it over use space some we will have even less time to react. > > Why can't userspace react that fast? Why will the kernel be somehow > faster? Speed should be the same, just get the "power is cut" signal > and have userspace flush and unmount the disk before power is gone. Why > can the kernel do this any differently? > > > In fact, this is not a new requirement. It exist on different flavors of > > automotive Linux for about 10 years. Linux in cars should be able to > > handle voltage drops for example on ignition and so on. The only new thing is > > the attempt to mainline it. 
> > But your patch is not guaranteeing anything, it's just doing a "I want > this done before the other devices are handled", that's it. There is no > chance that 100ms is going to be a requirement, or that some other > device type is not going to come along and demand to be ahead of your > device in the list. > > So you are going to have a constant fight among device types over the > years, and people complaining that the kernel is now somehow going to > guarantee that a device is shutdown in a set amount of time, which > again, the kernel can not guarantee here. > > This might work as a one-off for a specific hardware platform, which is > odd, but not anything you really should be adding for anyone else to use > here as your reasoning for it does not reflect what the code does. I was (am) interested in knowing how/where the regulator error notifications are utilized - hence I asked this at ELCE last summer. Replies indeed mostly pointed to automotive and handling under-voltage events. As to what has changed (I think this was asked in another mail on this topic) - I understood from the discussions that the demand for running systems with as little power as possible is even more important/desirable. Hence, under-voltage events are more common than they were when cars still worked by burning flammable liquids :) Anyways, what I thought I'd comment on is that the severity of the regulator error notifications can be given from the device tree. The rationale behind this is that figuring out whether a certain detected problem is fatal or not (in embedded systems) should be done by the board designers, per board. Maybe the understanding of which hardware should react first is also a property of the hardware and could come from the device tree? E.g., instead of having a "DEVICE_SHUTDOWN_PRIO_STORAGE" set unconditionally for EMMC, systems could set shutdown priority per board and per device explicitly using the device tree? Yours, -- Matti
On Mon, Nov 27, 2023 at 02:54:21PM +0200, Matti Vaittinen wrote: > pe 24. marrask. 2023 klo 19.26 Greg Kroah-Hartman > (gregkh@linuxfoundation.org) kirjoitti: > > > > On Fri, Nov 24, 2023 at 05:32:34PM +0100, Oleksij Rempel wrote: > > > On Fri, Nov 24, 2023 at 03:56:19PM +0000, Greg Kroah-Hartman wrote: > > > > On Fri, Nov 24, 2023 at 03:49:46PM +0000, Mark Brown wrote: > > > > > On Fri, Nov 24, 2023 at 03:27:48PM +0000, Greg Kroah-Hartman wrote: > > > > > > On Fri, Nov 24, 2023 at 03:21:40PM +0000, Mark Brown wrote: > > > > > > > > > > > > This came out of some discussions about trying to handle emergency power > > > > > > > failure notifications. > > > > > > > > > > > I'm sorry, but I don't know what that means. Are you saying that the > > > > > > kernel is now going to try to provide a hard guarantee that some devices > > > > > > are going to be shut down in X number of seconds when asked? If so, why > > > > > > not do this in userspace? > > > > > > > > > > No, it was initially (or when I initially saw it anyway) handling of > > > > > notifications from regulators that they're in trouble and we have some > > > > > small amount of time to do anything we might want to do about it before > > > > > we expire. > > > > > > > > So we are going to guarantee a "time" in which we are going to do > > > > something? Again, if that's required, why not do it in userspace using > > > > a RT kernel? > > > > > > For the HW in question I have only 100ms time before power loss. By > > > doing it over use space some we will have even less time to react. > > > > Why can't userspace react that fast? Why will the kernel be somehow > > faster? Speed should be the same, just get the "power is cut" signal > > and have userspace flush and unmount the disk before power is gone. Why > > can the kernel do this any differently? > > > > > In fact, this is not a new requirement. It exist on different flavors of > > > automotive Linux for about 10 years. 
Linux in cars should be able to > > > handle voltage drops for example on ignition and so on. The only new thing is > > > the attempt to mainline it. > > > > But your patch is not guaranteeing anything, it's just doing a "I want > > this done before the other devices are handled", that's it. There is no > > chance that 100ms is going to be a requirement, or that some other > > device type is not going to come along and demand to be ahead of your > > device in the list. > > > > So you are going to have a constant fight among device types over the > > years, and people complaining that the kernel is now somehow going to > > guarantee that a device is shutdown in a set amount of time, which > > again, the kernel can not guarantee here. > > > > This might work as a one-off for a specific hardware platform, which is > > odd, but not anything you really should be adding for anyone else to use > > here as your reasoning for it does not reflect what the code does. > > I was (am) interested in knowing how/where the regulator error > notifications are utilized - hence I asked this in ELCE last summer. > Replies indeed mostly pointed to automotive and handling the under > voltage events. > > As to what has changed (I think this was asked in another mail on this > topic) - I understood from the discussions that the demand of running > systems with as low power as possible is even more > important/desirable. Hence, the under-voltage events are more usual > than they were when cars used to be working by burning flammable > liquids :) > > Anyways, what I thought I'd comment on is that the severity of the > regulator error notifications can be given from device-tree. Rationale > behind this is that figuring out whether a certain detected problem is > fatal or not (in embedded systems) should be done by the board > designers, per board. Maybe the understanding which hardware should > react first is also a property of hardware and could come from the > device-tree? 
Eg, instead of having a "DEVICE_SHUTDOWN_PRIO_STORAGE" > set unconditionally for EMMC, systems could set shutdown priority per > board and per device explicitly using device-tree? Yes, using device tree would be good, but now you have created something that is device-tree-specific and not all the world is device tree :( Also, many devices are finally moving out to non-device-tree busses, like PCI and USB, so how would you handle them in this type of scheme? thanks, greg k-h
On Sun, Nov 26, 2023 at 08:42:02PM +0100, Ferry Toth wrote: > Funny discussion. As a hardware engineer (with no experience in automotive, > but actual experience in industrial applications and debugging issues > arising from bad shutdowns) let me add my 5ct at the end. I suspect there's also a space here, beyond systems that were designed with these failure modes in mind, where people run into issues once they have the hardware and are trying to improve what they can after the fact. > Now, we do need to keep in mind that storing J in a supercap, executing a > CPU at GHz, storing GB data do not come free. So, after making sure things > shutdown in time, it often pays off to shorten that deadline, and indeed > make it faster. Indeed.
On Mon, Nov 27, 2023 at 01:08:24PM +0000, Greg Kroah-Hartman wrote: > Yes, using device tree would be good, but now you have created something > that is device-tree-specific and not all the world is device tree :( AFAICT the idiomatic thing for ACPI would be platform quirks based on DMI information. Yay ACPI. If the system is more Linux-targeted then you can use _DSD properties to store DT properties; these can then be parsed out in a firmware-interface-neutral way via the fwnode API. I'm not sure there's any avoiding dealing with firmware interface specifics at some point if we need platform description. > Also, many devices are finally moving out to non-device-tree busses, > like PCI and USB, so how would you handle them in this type of scheme? DT does have bindings for devices on discoverable buses like PCI - I think the original thing was for vendors cheaping out on EEPROMs, though it's also useful when things are soldered down in embedded systems.
On 11/27/23 15:08, Greg Kroah-Hartman wrote: > On Mon, Nov 27, 2023 at 02:54:21PM +0200, Matti Vaittinen wrote: >> pe 24. marrask. 2023 klo 19.26 Greg Kroah-Hartman >> (gregkh@linuxfoundation.org) kirjoitti: >>> >>> On Fri, Nov 24, 2023 at 05:32:34PM +0100, Oleksij Rempel wrote: >>>> On Fri, Nov 24, 2023 at 03:56:19PM +0000, Greg Kroah-Hartman wrote: >>>>> On Fri, Nov 24, 2023 at 03:49:46PM +0000, Mark Brown wrote: >>>>>> On Fri, Nov 24, 2023 at 03:27:48PM +0000, Greg Kroah-Hartman wrote: >>>>>>> On Fri, Nov 24, 2023 at 03:21:40PM +0000, Mark Brown wrote: >>>>>> >>>>>>>> This came out of some discussions about trying to handle emergency power >>>>>>>> failure notifications. >>>>>> >>>>>>> I'm sorry, but I don't know what that means. Are you saying that the >>>>>>> kernel is now going to try to provide a hard guarantee that some devices >>>>>>> are going to be shut down in X number of seconds when asked? If so, why >>>>>>> not do this in userspace? >>>>>> >>>>>> No, it was initially (or when I initially saw it anyway) handling of >>>>>> notifications from regulators that they're in trouble and we have some >>>>>> small amount of time to do anything we might want to do about it before >>>>>> we expire. >>>>> >>>>> So we are going to guarantee a "time" in which we are going to do >>>>> something? Again, if that's required, why not do it in userspace using >>>>> a RT kernel? >>>> >>>> For the HW in question I have only 100ms time before power loss. By >>>> doing it over use space some we will have even less time to react. >>> >>> Why can't userspace react that fast? Why will the kernel be somehow >>> faster? Speed should be the same, just get the "power is cut" signal >>> and have userspace flush and unmount the disk before power is gone. Why >>> can the kernel do this any differently? >>> >>>> In fact, this is not a new requirement. It exist on different flavors of >>>> automotive Linux for about 10 years. 
Linux in cars should be able to >>>> handle voltage drops for example on ignition and so on. The only new thing is >>>> the attempt to mainline it. >>> >>> But your patch is not guaranteeing anything, it's just doing a "I want >>> this done before the other devices are handled", that's it. There is no >>> chance that 100ms is going to be a requirement, or that some other >>> device type is not going to come along and demand to be ahead of your >>> device in the list. >>> >>> So you are going to have a constant fight among device types over the >>> years, and people complaining that the kernel is now somehow going to >>> guarantee that a device is shutdown in a set amount of time, which >>> again, the kernel can not guarantee here. >>> >>> This might work as a one-off for a specific hardware platform, which is >>> odd, but not anything you really should be adding for anyone else to use >>> here as your reasoning for it does not reflect what the code does. >> >> I was (am) interested in knowing how/where the regulator error >> notifications are utilized - hence I asked this in ELCE last summer. >> Replies indeed mostly pointed to automotive and handling the under >> voltage events. >> >> As to what has changed (I think this was asked in another mail on this >> topic) - I understood from the discussions that the demand of running >> systems with as low power as possible is even more >> important/desirable. Hence, the under-voltage events are more usual >> than they were when cars used to be working by burning flammable >> liquids :) >> >> Anyways, what I thought I'd comment on is that the severity of the >> regulator error notifications can be given from device-tree. Rationale >> behind this is that figuring out whether a certain detected problem is >> fatal or not (in embedded systems) should be done by the board >> designers, per board. Maybe the understanding which hardware should >> react first is also a property of hardware and could come from the >> device-tree? 
Eg, instead of having a "DEVICE_SHUTDOWN_PRIO_STORAGE" >> set unconditionally for EMMC, systems could set shutdown priority per >> board and per device explicitly using device-tree? > > Yes, using device tree would be good, but now you have created something > that is device-tree-specific and not all the world is device tree :( True. However, my understanding is that the regulator subsystem is largely written to work with DT-based systems. Hence, supporting a DT-based solution would probably fit this specific use case, as the source of the problem notifications is the regulator subsystem. > Also, many devices are finally moving out to non-device-tree busses, > like PCI and USB, so how would you handle them in this type of scheme? I do readily admit I don't have [all ;) ] the answers. I also think that if we add support for prioritized shutdown on device-tree-based systems, people may eventually want to use this on non-device-tree setups too. There may also be other use cases for prioritized shutdown (I don't know what they would be, though). For now I would leave that to be the problem of the folks who need non-device-tree systems when (if) this need materializes. Assuming the handling of priorities was in place, the missing piece would then be to find the place to store this hardware-specific priority information. If this is solved for the non-DT cases, then the DT-based and non-DT-based solutions can co-exist. Just a suggestion though. I am not working on under-voltage "stuff" right now. Yours, -- Matti
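[Editor's note] Matti's per-board idea could look something like the fragment below. To be explicit: "shutdown-priority" is a hypothetical property sketched for illustration, not an accepted binding, and the `&sdhci0` label is a board-specific assumption; only the `mmc-card` node style follows existing bindings.

```dts
/* Hypothetical per-board shutdown priority, for illustration only:
 * "shutdown-priority" is not an accepted DT binding. */
&sdhci0 {
	non-removable;
	mmc@0 {
		compatible = "mmc-card";
		reg = <0>;
		/* board designer's call: power this device down first */
		shutdown-priority = <1>;
	};
};
```

A driver could then read such a value in a firmware-interface-neutral way with `device_property_read_u32(dev, "shutdown-priority", &prio)`, which would also cover ACPI _DSD properties as discussed elsewhere in the thread.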
On Mon, Nov 27, 2023 at 04:49:49PM +0200, Matti Vaittinen wrote: > On 11/27/23 15:08, Greg Kroah-Hartman wrote: > > Yes, using device tree would be good, but now you have created something > > that is device-tree-specific and not all the world is device tree :( > True. However, my understanding is that the regulator subsystem is largely > written to work with DT-based systems. Hence supporting the DT-based > solution would probably fit to this specific use-case as source of problem > notifications is the regulator subsystem. Yes, ACPI has a strong model that things like regulators and clocks are not visible to the OS.
On Fri, 24 Nov 2023 at 15:53, Oleksij Rempel <o.rempel@pengutronix.de> wrote: > > Hi, > > This patch series introduces support for prioritized device shutdown. > The main goal is to enable prioritization for shutting down specific > devices, particularly crucial in scenarios like power loss where > hardware damage can occur if not handled properly. > > Oleksij Rempel (3): > driver core: move core part of device_shutdown() to a separate > function > driver core: introduce prioritized device shutdown sequence > mmc: core: increase shutdown priority for MMC devices > > drivers/base/core.c | 157 +++++++++++++++++++++++++++-------------- > drivers/mmc/core/bus.c | 2 + > include/linux/device.h | 51 ++++++++++++- > kernel/reboot.c | 4 +- > 4 files changed, 157 insertions(+), 57 deletions(-) > Sorry for joining the discussions a bit late! Besides the valuable feedback that you already received from others (which indicates that we have quite some work to do in the commit messages to better explain and justify these changes), I wanted to share my overall thoughts around this. So, I fully understand the reason behind the $subject series, as we unfortunately can't rely on flash-based (NAND/NOR) storage devices being 100% tolerant to sudden power failures. Besides the reasons already discussed in the thread, the robustness simply depends on the "quality" of the FTL (flash translation layer) and the NAND/NOR/etc. device it runs on. For example, back in the days when Android showed up, we were testing YAFFS and UBIFS on raw NAND, which failed miserably after just a few thousand power cycles. It was even worse with ext3/4 on the early variants of eMMC devices, as those survived only a few hundred power cycles. Now, I assume this has improved a lot over the years, but I haven't really verified this myself. That said, for eMMC and other flash-based storage devices, industrial or not, I think it would make sense to try to notify the device about the power failure, if possible. 
This would add another level of mitigation, I think. From an implementation point of view, it looks to me like the approach in the $subject series has some potential. Rather than diving into the details, though, I will defer my review to the next version. Kind regards Uffe
Hello all, On Mon, Nov 27, 2023 at 12:36:11PM +0100, Oleksij Rempel wrote: > On Mon, Nov 27, 2023 at 10:13:49AM +0000, Christian Loehle wrote: > > > Same problem was seen not only in automotive devices, but also in > > > industrial or agricultural. With other words, it is important enough to bring > > > some kind of solution mainline. > > > > > > > IMO that is a serious problem with the used storage / eMMC in that case and it > > is not suitable for industrial/automotive uses? > > Any industrial/automotive-suitable storage device should detect under-voltage and > > just treat it as a power-down/loss, and while that isn't nice for the storage device, > > it really shouldn't be able to brick a device (within <1M cycles anyway). > > What does the storage module vendor say about this? > > Good question. I do not have insights ATM. I'll forward it. From personal experience I can tell that bricked eMMC devices happen because of eMMC controller firmware bugs. You might find some recently committed quirks in this regard. While waiting for any additional details you might be able to find: given my past experience, I would agree with what Christian wrote. Francesco