Message ID | 20130316213519.2974.38954.stgit@amt.stowe (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
On Sat, Mar 16, 2013 at 03:35:19PM -0600, Myron Stowe wrote:
> Sysfs includes entries to memory that backs a PCI device's BARs, both I/O
> Port space and MMIO. These memory regions correspond to the device's
> internal status and control registers used to drive the device.
>
> Accessing these registers from userspace, such as with "udevadm info
> --attribute-walk --path=/sys/devices/...", cannot be allowed, as
> such accesses outside of the driver, even just reading, can yield
> catastrophic consequences.
>
> Udevadm-info skips parsing a specific set of sysfs entries including
> 'resource'. This patch extends the set to include the additional
> 'resource<N>' entries that correspond to a PCI device's BARs.

Nice, are you also going to patch bash to prevent a user from reading
these sysfs files as well? :)

And pciutils?

You get my point here, right? The root user just asked to read all of
the data for this device, so why wouldn't you allow it? Just like
'lspci' does. Or bash does.

If this hardware has a problem, then it needs to be fixed in the kernel,
not have random band-aids added to various userspace programs to paper
over the root problem here. Please fix the kernel driver and all should
be fine. No need to change udevadm.

greg k-h

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
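The patch under discussion extends the set of attribute names that `udevadm info --attribute-walk` skips so that, besides `resource`, it also covers the per-BAR `resource<N>` files. A minimal sketch of such a name check in C follows; the function name `is_pci_resource_attr()` is hypothetical, and this is not the actual udev code.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true for "resource" and for "resource<N>", where
 * <N> is one or more decimal digits (one file per PCI BAR). Names with
 * any other suffix are not matched. */
static bool is_pci_resource_attr(const char *name)
{
    static const char prefix[] = "resource";
    const size_t len = sizeof(prefix) - 1;
    const char *p;

    assert(name != NULL);
    if (strncmp(name, prefix, len) != 0)
        return false;
    if (name[len] == '\0')
        return true; /* the plain "resource" file, already skipped per the patch text */
    for (p = name + len; *p != '\0'; p++)
        if (!isdigit((unsigned char)*p))
            return false;
    return true;
}
```

In an attribute walk, a check like this would run on each directory entry name before the file is opened, so a matching file is never read by that one tool.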
On Sat, Mar 16, 2013 at 4:11 PM, Greg KH <gregkh@linuxfoundation.org> wrote:
> You get my point here, right? The root user just asked to read all of
> the data for this device, so why wouldn't you allow it? Just like
> 'lspci' does. Or bash does.
>
> If this hardware has a problem, then it needs to be fixed in the kernel,
> not have random band-aids added to various userspace programs to paper
> over the root problem here. Please fix the kernel driver and all should
> be fine. No need to change udevadm.

I'm not sure that "udevadm info" (or bash) reading device registers is
a good idea, because we don't know what the device is, and we have no
idea what the side effects of reading its registers will be. Just to be
clear, this is about device-specific I/O port registers, not config
space, so we can't expect any sort of consistency.

We could put a quirk in the kernel for this device (obviously the issue
is independent of whether the driver is loaded), but no doubt other
devices with I/O BARs will have access size restrictions, side effects,
or other issues. Adding quirks for them feels like a never-ending job.

It might have been a mistake to put the resourceN files in sysfs in the
first place, or to make them readable and writable, because users
expect sysfs files to contain ASCII. For memory BARs, resourceN only
allows mmap, not read/write, so at least we side-step similar issues
there.

Bjorn
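Bjorn notes that for memory BARs the resourceN file supports only mmap, not read/write. A rough C sketch of how a privileged program might map such a file and read one 32-bit register follows. The helper names are hypothetical, the path the caller passes (for example a `resource0` file under `/sys/bus/pci/devices/`) is an assumption, and register offsets and widths are entirely device-specific.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map the first `len` bytes of a memory-BAR resource
 * file. Returns NULL on failure. The fd is closed right away; the mapping
 * stays valid until munmap(). */
static volatile uint32_t *map_mem_bar(const char *path, size_t len)
{
    int fd = open(path, O_RDWR | O_SYNC);
    void *p;

    if (fd < 0)
        return NULL;
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    return p == MAP_FAILED ? NULL : (volatile uint32_t *)p;
}

/* Read the 32-bit register at byte offset `off` (must be 4-byte aligned).
 * The volatile pointer keeps the compiler from caching or merging accesses. */
static uint32_t mem_bar_read32(volatile uint32_t *bar, size_t off)
{
    assert(off % 4 == 0);
    return bar[off / 4];
}

static void unmap_mem_bar(volatile uint32_t *bar, size_t len)
{
    munmap((void *)bar, len);
}
```

Since such a file offers no read() path, a generic dump of it cannot trigger a register access, which is the side-step Bjorn refers to.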
On Sat, 2013-03-16 at 15:11 -0700, Greg KH wrote:
> You get my point here, right? The root user just asked to read all of
> the data for this device, so why wouldn't you allow it? Just like
> 'lspci' does. Or bash does.

Yes :P, you raise a very good point: there are a lot of ways a user can
poke around in those BARs. However, there is a difference between
shooting yourself in the foot and getting what you deserve, versus
unknowingly executing a common command such as udevadm and having the
system hang.

> If this hardware has a problem, then it needs to be fixed in the kernel,
> not have random band-aids added to various userspace programs to paper
> over the root problem here. Please fix the kernel driver and all should
> be fine. No need to change udevadm.

Xiangliang initially proposed a patch within the PCI core. Ignoring the
specific issue with that proposal, which I pointed out in the
https://lkml.org/lkml/2013/3/7/242 thread, the PCI core just doesn't
seem like the right place to effect a change either, as it isn't
concerned with the contents or access limitations of those regions;
those are issues the driver concerns itself with.

So things seem to be gravitating towards the driver. I'm fairly
ignorant of this area, but as Robert succinctly pointed out in the
originating thread, the AHCI driver only uses the device's MMIO region.
The I/O regions are for legacy SFF-compatible ATA ports and are not
used to drive the device. This, coupled with the observation that
userspace accesses such as udevadm's, and the others you point out, do
not go through the device's driver, seems to suggest that changes to
the driver will not help here either.

That said, I was attempting to point out an interesting problem and get
the conversation started towards some type of solution. Let's continue
the conversation and see where things go.

Thanks,
 Myron
On Sat, Mar 16, 2013 at 05:50:53PM -0600, Myron Stowe wrote:
> So things seem to be gravitating towards the driver. [...] This, coupled
> with the observation that userspace accesses such as udevadm's, and the
> others you point out, do not go through the device's driver, seems to
> suggest that changes to the driver will not help here either.

A PCI quirk should handle this properly, right? Why not do that? Worst
case, the quirk could just not expose these sysfs files for this
device, which would solve all userspace program issues, right?

thanks,

greg k-h
On Sat, 2013-03-16 at 18:03 -0700, Greg KH wrote:
> A PCI quirk should handle this properly, right? Why not do that? Worst
> case, the quirk could just not expose these sysfs files for this
> device, which would solve all userspace program issues, right?

Not exactly. I/O port access through pci-sysfs was added for userspace
programs, specifically qemu-kvm device assignment. We use the I/O port
resource# files to access device-owned I/O port registers using file
permissions rather than global permissions such as iopl/ioperm. File
permissions also prevent random users from accessing device registers
through these files, but of course can't stop a privileged app that
chooses to ignore the purpose of these files. A quirk would therefore
remove a file that actually has a useful purpose for one app just so
another app that has no particular reason for dumping the contents can
run unabated.

Thanks,
Alex
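Alex describes accessing I/O port registers through the resource# files, guarded by ordinary file permissions, instead of iopl/ioperm. A minimal C sketch of such an access follows. It assumes, as we understand pci-sysfs's I/O BAR handler, that each read must be exactly 1, 2 or 4 bytes and becomes a single port access at the BAR base plus the file offset; the helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open an I/O BAR's resource<N> file. Access is
 * governed by the file's permissions, not by iopl()/ioperm(). */
static int open_io_bar(const char *path)
{
    return open(path, O_RDWR | O_CLOEXEC);
}

/* Read one register of `width` bytes (1, 2 or 4) at byte offset `off`
 * into the BAR. Returns 0 on success, -1 with errno set on failure. */
static int io_bar_read(int fd, off_t off, size_t width, uint32_t *out)
{
    unsigned char buf[4] = {0};
    ssize_t n;

    assert(out != NULL);
    if (width != 1 && width != 2 && width != 4) {
        errno = EINVAL;               /* other sizes assumed to be rejected */
        return -1;
    }
    n = pread(fd, buf, width, off);   /* assumed: one pread() == one port access */
    if (n < 0)
        return -1;
    if ((size_t)n != width) {
        errno = EIO;                  /* short read, e.g. at or past the BAR end */
        return -1;
    }
    if (width == 1) {
        *out = buf[0];
    } else if (width == 2) {
        uint16_t v;
        memcpy(&v, buf, sizeof v);    /* interpreted in host byte order */
        *out = v;
    } else {
        uint32_t v;
        memcpy(&v, buf, sizeof v);
        *out = v;
    }
    return 0;
}
```

A program that knows the device can use sized accesses like these to stay away from registers with read side effects; a generic tool dumping the whole file has no such knowledge.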
On Sat, Mar 16, 2013 at 10:11:22PM -0600, Alex Williamson wrote:
> A quirk would therefore remove a file that actually has a useful
> purpose for one app just so another app that has no particular reason
> for dumping the contents can run unabated.

The quirk would only be for this one specific device, which obviously
can't handle this type of access, so why would you want the sysfs files
even present for it at all?

greg k-h
On Sat, 2013-03-16 at 22:36 -0700, Greg KH wrote:
> The quirk would only be for this one specific device, which obviously
> can't handle this type of access, so why would you want the sysfs files
> even present for it at all?

I'm assuming that the device only breaks because udevadm is dumping the
full I/O port register space of the device, and that if an actual
driver were interacting with it through this interface, it would work.
Who knows how many devices will have read side effects when udevadm
blindly dumps these files?

Thanks,
Alex
On Sun, Mar 17, 2013 at 2:38 PM, Alex Williamson
<alex.williamson@redhat.com> wrote:
> I'm assuming that the device only breaks because udevadm is dumping the
> full I/O port register space of the device, and that if an actual
> driver were interacting with it through this interface, it would work.
> Who knows how many devices will have read side effects when udevadm
> blindly dumps these files?

Sysfs is too public an interface to export things there that make
devices/drivers choke on a simple read() of an attribute.

This is nothing specific to udevadm; any tool can do that. Udevadm
will never read any of these files during normal operation. The admin
explicitly asked udevadm, with a specific command, to dump all the
stuff the device offers.

The kernel driver needs to be fixed to allow that; in the worst case,
the attributes should not be exported at all. People should take more
care with what they export in /sys. It's not a hidden, private ioctl
that's exported there; the stuff is very visible and will be looked at.

Telling userspace not to use specific stuff in /sys is not something I
would expect to work as a strategy; there is too much weird stuff out
there that will always try to do that ...

Thanks,
Kay
On Sat, 2013-03-16 at 18:03 -0700, Greg KH wrote:
> A PCI quirk should handle this properly, right? Why not do that? Worst
> case, the quirk could just not expose these sysfs files for this
> device, which would solve all userspace program issues, right?

The quirk you are suggesting would basically have to be a reversion of
commit 8633328, for the reasons Bjorn pointed out, so that we cover all
devices, not just this one particular device:

  We could put a quirk in the kernel for this device (obviously the
  issue is independent of whether the driver is loaded), but no doubt
  other devices with I/O BARs will have access size restrictions, side
  effects, or other issues. Adding quirks for them feels like a
  never-ending job.

I'm beginning to think that people have not read the analysis, which
was the first mail of this thread (I meant for the Subject: to read
"[PATCH 0/1] ..."), at https://lkml.org/lkml/2013/3/16/168

It appears [*] that we are exposed to this potential conflict with
*every* PCI device's resource# files, not just this one particular
device (again, see the analysis cover email, especially the three
paragraphs starting with "Putting together...").

[*] I carefully use the word "appears" because of the one aspect of
this whole issue that I still do not understand, which I also expressed
in the cover letter, immediately below the section I just pointed out.

So what I'd like to understand is why we are focusing on this one
particular instance/device when we *appear* to be at risk with all
devices and their resource# files.

Myron
On Sun, 2013-03-17 at 15:00 +0100, Kay Sievers wrote:
> The kernel driver needs to be fixed to allow that; in the worst case,
> the attributes should not be exported at all. [...]
>
> Telling userspace not to use specific stuff in /sys is not something I
> would expect to work as a strategy; there is too much weird stuff out
> there that will always try to do that ...

Kay, could you comment on footnote 3 in
https://lkml.org/lkml/2013/3/16/168

With respect to udev, you are working on the assumption that all files
in sysfs must be readable with no consequences, which may be implied by
the Documentation's sysfs.txt mentioning ASCII. If we are to interpret
that as strictly as you seem to want to, then why is there sysfs
support for creating binary files?

Myron
On Sun, Mar 17, 2013 at 3:20 PM, Myron Stowe <mstowe@redhat.com> wrote: > On Sun, 2013-03-17 at 15:00 +0100, Kay Sievers wrote: >> On Sun, Mar 17, 2013 at 2:38 PM, Alex Williamson >> <alex.williamson@redhat.com> wrote: >> > I'm assuming that the device only breaks because udevadm is dumping the >> > full I/O port register space of the device and that if an actual driver >> > was interacting with it through this interface that it would work. Who >> > knows how many devices will have read side-effects by udevadm blindly >> > dumping these files. Thanks, >> >> Sysfs is a too public interface to export things there which make >> devices/driver choke on a simple read() of an attribute. >> >> This is nothing specific to udevadm, any tool can do that. Udevadm >> will never read any of the files during normal operation. The admin >> explicitly asked udevadm with a specific command to dump all the stuff >> the device offers. >> >> The kernel driver needs to be fixed to allow that, in the worst case, >> the attributes not exported at all. People should take more care what >> they export in /sys, it's not a hidden and private ioctl what's >> exported there, stuff is very visible and will be looked at. >> >> Telling userspace not to use specific stuff in /sys I would not expect >> to work as a strategy; there is too much weird stuff out there that >> will always try to do that ... > > Kay - could you comment on Foot Note 3 in > https://lkml.org/lkml/2013/3/16/168 > > With respect to 'udev', you are working on the assumption that all files > in sysfs must be readable with no consequences which may be implied by > the Documentation's sysfs.txt file's mentioning ASCII. If we are to > interpret that as strictly as you seem to want to then why is there > sysfs support for creating binary files? They cannot be distinguished from outside, so there is nothing I know that could make a difference to userspace tools. 
Tools -- no matter how useful they are or not, and they have been doing that for many years already -- need to be able to read() the stuff in there without causing any damage to the system. Kay
On Sun, 2013-03-17 at 07:38 -0600, Alex Williamson wrote: > On Sat, 2013-03-16 at 22:36 -0700, Greg KH wrote: > > On Sat, Mar 16, 2013 at 10:11:22PM -0600, Alex Williamson wrote: > > > On Sat, 2013-03-16 at 18:03 -0700, Greg KH wrote: > > > > On Sat, Mar 16, 2013 at 05:50:53PM -0600, Myron Stowe wrote: > > > > > On Sat, 2013-03-16 at 15:11 -0700, Greg KH wrote: > > > > > > On Sat, Mar 16, 2013 at 03:35:19PM -0600, Myron Stowe wrote: > > > > > > > Sysfs includes entries to memory that backs a PCI device's BARs, both I/O > > > > > > > Port space and MMIO. This memory regions correspond to the device's > > > > > > > internal status and control registers used to drive the device. > > > > > > > > > > > > > > Accessing these registers from userspace such as "udevadm info > > > > > > > --attribute-walk --path=/sys/devices/..." does can not be allowed as > > > > > > > such accesses outside of the driver, even just reading, can yield > > > > > > > catastrophic consequences. > > > > > > > > > > > > > > Udevadm-info skips parsing a specific set of sysfs entries including > > > > > > > 'resource'. This patch extends the set to include the additional > > > > > > > 'resource<N>' entries that correspond to a PCI device's BARs. > > > > > > > > > > > > Nice, are you also going to patch bash to prevent a user from reading > > > > > > these sysfs files as well? :) > > > > > > > > > > > > And pciutils? > > > > > > > > > > > > You get my point here, right? The root user just asked to read all of > > > > > > the data for this device, so why wouldn't you allow it? Just like > > > > > > 'lspci' does. Or bash does. > > > > > > > > > > Yes :P , you raise a very good point, there are a lot of way a user can > > > > > poke around in those BARs. However, there is a difference between > > > > > shooting yourself in the foot and getting what you deserve versus > > > > > unknowingly executing a common command such as udevadm and having the > > > > > system hang. 
> > > > > > > > > > > > If this hardware has a problem, then it needs to be fixed in the kernel, > > > > > > not have random band-aids added to various userspace programs to paper > > > > > > over the root problem here. Please fix the kernel driver and all should > > > > > > be fine. No need to change udevadm. > > > > > > > > > > Xiangliang initially proposed a patch within the PCI core. Ignoring the > > > > > specific issue with the proposal which I pointed out in the > > > > > https://lkml.org/lkml/2013/3/7/242 thread, that just doesn't seem like > > > > > the right place to effect a change either as PCI's core isn't concerned > > > > > with the contents or access limitations of those regions, those are > > > > > issues that the driver concerns itself with. > > > > > > > > > > So things seem to be gravitating towards the driver. I'm fairly > > > > > ignorant of this area but as Robert succinctly pointed out in the > > > > > originating thread - the AHCI driver only uses the device's MMIO region. > > > > > The I/O related regions are for legacy SFF-compatible ATA ports and are > > > > > not used to driver the device. This, coupled with the observance that > > > > > userspace accesses such as udevadm, and others like you additionally > > > > > point out, do not filter through the device's driver for seems to > > > > > suggest that changes to the driver will not help here either. > > > > > > > > A PCI quirk should handle this properly, right? Why not do that? Worse > > > > thing, the quirk could just not expose these sysfs files for this > > > > device, which would solve all userspace program issues, right? > > > > > > Not exactly. I/O port access through pci-sysfs was added for userspace > > > programs, specifically qemu-kvm device assignment. We use the I/O port > > > resource# files to access device owned I/O port registers using file > > > permissions rather than global permissions such as iopl/ioperm. 
File > > > permissions also prevent random users from accessing device registers > > > through these files, but of course can't stop a privileged app that > > > chooses to ignore the purpose of these files. A quirk would therefore > > > remove a file that actually has a useful purpose for one app just so > > > another app that has no particular reason for dumping the contents can > > > run unabated. Thanks, > > > > The quirk would only be for this one specific device, which obviously > > can't handle this type of access, so why would you want the sysfs files > > even present for it at all? > > I'm assuming that the device only breaks because udevadm is dumping the > full I/O port register space of the device and that if an actual driver > was interacting with it through this interface that it would work. Correct: the AHCI driver only uses the device's MMIO region. The I/O-related regions are for legacy SFF-compatible ATA ports and are not used to drive the device. This, coupled with the observation that userspace accesses such as udevadm's, and the others Greg pointed out, do not go through the device's driver, suggests that changes to the driver will not help here either. > Who > knows how many devices will have read side-effects by udevadm blindly > dumping these files. Thanks, > > Alex
On Sun, 2013-03-17 at 15:29 +0100, Kay Sievers wrote: > On Sun, Mar 17, 2013 at 3:20 PM, Myron Stowe <mstowe@redhat.com> wrote: > > On Sun, 2013-03-17 at 15:00 +0100, Kay Sievers wrote: > >> On Sun, Mar 17, 2013 at 2:38 PM, Alex Williamson > >> <alex.williamson@redhat.com> wrote: > >> > I'm assuming that the device only breaks because udevadm is dumping the > >> > full I/O port register space of the device and that if an actual driver > >> > was interacting with it through this interface that it would work. Who > >> > knows how many devices will have read side-effects by udevadm blindly > >> > dumping these files. Thanks, > >> > >> Sysfs is a too public interface to export things there which make > >> devices/driver choke on a simple read() of an attribute. > >> > >> This is nothing specific to udevadm, any tool can do that. Udevadm > >> will never read any of the files during normal operation. The admin > >> explicitly asked udevadm with a specific command to dump all the stuff > >> the device offers. > >> > >> The kernel driver needs to be fixed to allow that, in the worst case, > >> the attributes not exported at all. People should take more care what > >> they export in /sys, it's not a hidden and private ioctl what's > >> exported there, stuff is very visible and will be looked at. > >> > >> Telling userspace not to use specific stuff in /sys I would not expect > >> to work as a strategy; there is too much weird stuff out there that > >> will always try to do that ... > > > > Kay - could you comment on Foot Note 3 in > > https://lkml.org/lkml/2013/3/16/168 > > > > With respect to 'udev', you are working on the assumption that all files > > in sysfs must be readable with no consequences which may be implied by > > the Documentation's sysfs.txt file's mentioning ASCII. If we are to > > interpret that as strictly as you seem to want to then why is there > > sysfs support for creating binary files? 
> > They cannot be distinguished from outside, so there is nothing I know > that could make a difference to userspace tools. Agreed > > Tools -- no matter how useful they are not not, it's that they do that > for many years already -- need to be able to read() the stuff in > there, without causing any damage to the system. So then, why are certain sysfs files skipped in udevadm-info's parsing (./src/udevadm-info.c::skip_attribute())? > > Kay
On Sun, Mar 17, 2013 at 3:36 PM, Myron Stowe <mstowe@redhat.com> wrote: > On Sun, 2013-03-17 at 15:29 +0100, Kay Sievers wrote: >> On Sun, Mar 17, 2013 at 3:20 PM, Myron Stowe <mstowe@redhat.com> wrote: >> > On Sun, 2013-03-17 at 15:00 +0100, Kay Sievers wrote: >> >> On Sun, Mar 17, 2013 at 2:38 PM, Alex Williamson >> >> <alex.williamson@redhat.com> wrote: >> >> > I'm assuming that the device only breaks because udevadm is dumping the >> >> > full I/O port register space of the device and that if an actual driver >> >> > was interacting with it through this interface that it would work. Who >> >> > knows how many devices will have read side-effects by udevadm blindly >> >> > dumping these files. Thanks, >> >> >> >> Sysfs is a too public interface to export things there which make >> >> devices/driver choke on a simple read() of an attribute. >> >> >> >> This is nothing specific to udevadm, any tool can do that. Udevadm >> >> will never read any of the files during normal operation. The admin >> >> explicitly asked udevadm with a specific command to dump all the stuff >> >> the device offers. >> >> >> >> The kernel driver needs to be fixed to allow that, in the worst case, >> >> the attributes not exported at all. People should take more care what >> >> they export in /sys, it's not a hidden and private ioctl what's >> >> exported there, stuff is very visible and will be looked at. >> >> >> >> Telling userspace not to use specific stuff in /sys I would not expect >> >> to work as a strategy; there is too much weird stuff out there that >> >> will always try to do that ... >> > >> > Kay - could you comment on Foot Note 3 in >> > https://lkml.org/lkml/2013/3/16/168 >> > >> > With respect to 'udev', you are working on the assumption that all files >> > in sysfs must be readable with no consequences which may be implied by >> > the Documentation's sysfs.txt file's mentioning ASCII. 
If we are to >> > interpret that as strictly as you seem to want to then why is there >> > sysfs support for creating binary files? >> >> They cannot be distinguished from outside, so there is nothing I know >> that could make a difference to userspace tools. > > Agreed >> >> Tools -- no matter how useful they are not not, it's that they do that >> for many years already -- need to be able to read() the stuff in >> there, without causing any damage to the system. > > So then, why are certain sysfs files skipped in udevadm-info's parsing > (./src/udevadm-info.c::skip_attribute())? Because they are not useful in udev rules, or are simply not recommended for use in rules, because they break other assumptions and would encode into rules specific settings that can legitimately change at runtime. The list is in no way meant to ensure that a system/driver/device does not choke on read(). Kay
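Kay's distinction can be illustrated with a minimal sketch of a name-based skip filter like the one the patch proposed. Everything here is an assumption for illustration: `skip_attribute_sketch()` and its one-entry list are hypothetical and are not udevadm-info.c's real `skip_attribute()`; only the `resource` entry and the proposed `resource<N>` extension come from the thread. Such a filter only encodes which names a tool chooses to ignore; it says nothing about whether reading the remaining attributes is safe.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical skip list; "resource" is the only entry taken from the thread. */
static const char *const skip_exact[] = { "resource" };

/* Return true if a sysfs attribute name should be skipped during a walk.
 * Illustrative only: this is NOT udev's real skip_attribute(). */
bool skip_attribute_sketch(const char *name)
{
    for (size_t i = 0; i < sizeof(skip_exact) / sizeof(skip_exact[0]); i++)
        if (strcmp(name, skip_exact[i]) == 0)
            return true;

    /* The patch's proposed extension: "resource" followed by one or more
     * digits, i.e. the per-BAR files resource0, resource1, ... */
    const char *prefix = "resource";
    size_t len = strlen(prefix);
    if (strncmp(name, prefix, len) != 0 || name[len] == '\0')
        return false;
    for (const char *p = name + len; *p != '\0'; p++)
        if (!isdigit((unsigned char)*p))
            return false;
    return true;
}
```

With this sketch, `resource0` and `resource12` would be skipped, while an attribute such as `vendor` would still be read.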
On Sun, 2013-03-17 at 08:33 -0600, Myron Stowe wrote: > On Sun, 2013-03-17 at 07:38 -0600, Alex Williamson wrote: > > On Sat, 2013-03-16 at 22:36 -0700, Greg KH wrote: > > > On Sat, Mar 16, 2013 at 10:11:22PM -0600, Alex Williamson wrote: > > > > On Sat, 2013-03-16 at 18:03 -0700, Greg KH wrote: > > > > > On Sat, Mar 16, 2013 at 05:50:53PM -0600, Myron Stowe wrote: > > > > > > On Sat, 2013-03-16 at 15:11 -0700, Greg KH wrote: > > > > > > > On Sat, Mar 16, 2013 at 03:35:19PM -0600, Myron Stowe wrote: > > > > > > > > Sysfs includes entries to memory that backs a PCI device's BARs, both I/O > > > > > > > > Port space and MMIO. This memory regions correspond to the device's > > > > > > > > internal status and control registers used to drive the device. > > > > > > > > > > > > > > > > Accessing these registers from userspace such as "udevadm info > > > > > > > > --attribute-walk --path=/sys/devices/..." does can not be allowed as > > > > > > > > such accesses outside of the driver, even just reading, can yield > > > > > > > > catastrophic consequences. > > > > > > > > > > > > > > > > Udevadm-info skips parsing a specific set of sysfs entries including > > > > > > > > 'resource'. This patch extends the set to include the additional > > > > > > > > 'resource<N>' entries that correspond to a PCI device's BARs. > > > > > > > > > > > > > > Nice, are you also going to patch bash to prevent a user from reading > > > > > > > these sysfs files as well? :) > > > > > > > > > > > > > > And pciutils? > > > > > > > > > > > > > > You get my point here, right? The root user just asked to read all of > > > > > > > the data for this device, so why wouldn't you allow it? Just like > > > > > > > 'lspci' does. Or bash does. > > > > > > > > > > > > Yes :P , you raise a very good point, there are a lot of way a user can > > > > > > poke around in those BARs. 
However, there is a difference between > > > > > > shooting yourself in the foot and getting what you deserve versus > > > > > > unknowingly executing a common command such as udevadm and having the > > > > > > system hang. > > > > > > > > > > > > > > If this hardware has a problem, then it needs to be fixed in the kernel, > > > > > > > not have random band-aids added to various userspace programs to paper > > > > > > > over the root problem here. Please fix the kernel driver and all should > > > > > > > be fine. No need to change udevadm. > > > > > > > > > > > > Xiangliang initially proposed a patch within the PCI core. Ignoring the > > > > > > specific issue with the proposal which I pointed out in the > > > > > > https://lkml.org/lkml/2013/3/7/242 thread, that just doesn't seem like > > > > > > the right place to effect a change either as PCI's core isn't concerned > > > > > > with the contents or access limitations of those regions, those are > > > > > > issues that the driver concerns itself with. > > > > > > > > > > > > So things seem to be gravitating towards the driver. I'm fairly > > > > > > ignorant of this area but as Robert succinctly pointed out in the > > > > > > originating thread - the AHCI driver only uses the device's MMIO region. > > > > > > The I/O related regions are for legacy SFF-compatible ATA ports and are > > > > > > not used to driver the device. This, coupled with the observance that > > > > > > userspace accesses such as udevadm, and others like you additionally > > > > > > point out, do not filter through the device's driver for seems to > > > > > > suggest that changes to the driver will not help here either. > > > > > > > > > > A PCI quirk should handle this properly, right? Why not do that? Worse > > > > > thing, the quirk could just not expose these sysfs files for this > > > > > device, which would solve all userspace program issues, right? > > > > > > > > Not exactly. 
I/O port access through pci-sysfs was added for userspace > > > > programs, specifically qemu-kvm device assignment. We use the I/O port > > > > resource# files to access device owned I/O port registers using file > > > > permissions rather than global permissions such as iopl/ioperm. File > > > > permissions also prevent random users from accessing device registers > > > > through these files, but of course can't stop a privileged app that > > > > chooses to ignore the purpose of these files. A quirk would therefore > > > > remove a file that actually has a useful purpose for one app just so > > > > another app that has no particular reason for dumping the contents can > > > > run unabated. Thanks, > > > > > > The quirk would only be for this one specific device, which obviously > > > can't handle this type of access, so why would you want the sysfs files > > > even present for it at all? > > > > I'm assuming that the device only breaks because udevadm is dumping the > > full I/O port register space of the device and that if an actual driver > > was interacting with it through this interface that it would work. > > Correct: > the AHCI driver only uses the device's MMIO region. The I/O > related regions are for legacy SFF-compatible ATA ports and are > not used to driver the device. This, coupled with the > observance that userspace accesses such as udevadm, and others > like Greg additionally pointed out, do not filter through the > device's driver seems to suggest that changes to the driver will > not help here either. That may be true of our AHCI driver, but when it's assigned to a guest we're potentially using a completely different stack and cannot make that assumption. A guest running in compatibility mode or the option ROM for the device may still use I/O port regions. 
Thanks, Alex
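The access pattern Alex describes for qemu-kvm, where a privileged process reads a device-owned I/O port register through its `resource<N>` file under ordinary file permissions, might look roughly like the sketch below. It is a hypothetical helper (`read_io_reg32()` is not taken from qemu-kvm), and it assumes, as we understand the pci-sysfs handler for I/O port BARs, that only 1-, 2- or 4-byte accesses at a given offset are accepted; hence a single 4-byte `pread()` rather than a dump of the whole file.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read one 32-bit register at byte offset `off` from
 * an I/O port BAR file such as /sys/bus/pci/devices/<BDF>/resource<N>.
 * Assumption: the kernel accepts only 1-, 2- or 4-byte accesses on these
 * files, so exactly one 4-byte pread() is issued.
 * Returns 0 on success, -1 on error or short read. */
int read_io_reg32(const char *path, off_t off, uint32_t *val)
{
    int fd = open(path, O_RDONLY);   /* ordinary file permissions gate access */
    if (fd < 0)
        return -1;

    uint32_t tmp;
    ssize_t n = pread(fd, &tmp, sizeof(tmp), off);
    close(fd);
    if (n != (ssize_t)sizeof(tmp))
        return -1;

    *val = tmp;                      /* value in host byte order */
    return 0;
}
```

The contrast with a generic dumper is the point: this caller knows which register it wants and how wide it is, whereas a tool walking every attribute does not.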
On 03/17/2013 06:28 PM, Alex Williamson wrote: > On Sun, 2013-03-17 at 08:33 -0600, Myron Stowe wrote: >> On Sun, 2013-03-17 at 07:38 -0600, Alex Williamson wrote: >>> On Sat, 2013-03-16 at 22:36 -0700, Greg KH wrote: >>>> On Sat, Mar 16, 2013 at 10:11:22PM -0600, Alex Williamson wrote: >>>>> On Sat, 2013-03-16 at 18:03 -0700, Greg KH wrote: >>>>>> On Sat, Mar 16, 2013 at 05:50:53PM -0600, Myron Stowe wrote: >>>>>>> On Sat, 2013-03-16 at 15:11 -0700, Greg KH wrote: >>>>>>>> On Sat, Mar 16, 2013 at 03:35:19PM -0600, Myron Stowe wrote: >>>>>>>>> Sysfs includes entries to memory that backs a PCI device's BARs, both I/O >>>>>>>>> Port space and MMIO. This memory regions correspond to the device's >>>>>>>>> internal status and control registers used to drive the device. >>>>>>>>> >>>>>>>>> Accessing these registers from userspace such as "udevadm info >>>>>>>>> --attribute-walk --path=/sys/devices/..." does can not be allowed as >>>>>>>>> such accesses outside of the driver, even just reading, can yield >>>>>>>>> catastrophic consequences. >>>>>>>>> >>>>>>>>> Udevadm-info skips parsing a specific set of sysfs entries including >>>>>>>>> 'resource'. This patch extends the set to include the additional >>>>>>>>> 'resource<N>' entries that correspond to a PCI device's BARs. >>>>>>>> >>>>>>>> Nice, are you also going to patch bash to prevent a user from reading >>>>>>>> these sysfs files as well? :) >>>>>>>> >>>>>>>> And pciutils? >>>>>>>> >>>>>>>> You get my point here, right? The root user just asked to read all of >>>>>>>> the data for this device, so why wouldn't you allow it? Just like >>>>>>>> 'lspci' does. Or bash does. >>>>>>> >>>>>>> Yes :P , you raise a very good point, there are a lot of way a user can >>>>>>> poke around in those BARs. However, there is a difference between >>>>>>> shooting yourself in the foot and getting what you deserve versus >>>>>>> unknowingly executing a common command such as udevadm and having the >>>>>>> system hang. 
>>>>>>>> >>>>>>>> If this hardware has a problem, then it needs to be fixed in the kernel, >>>>>>>> not have random band-aids added to various userspace programs to paper >>>>>>>> over the root problem here. Please fix the kernel driver and all should >>>>>>>> be fine. No need to change udevadm. >>>>>>> >>>>>>> Xiangliang initially proposed a patch within the PCI core. Ignoring the >>>>>>> specific issue with the proposal which I pointed out in the >>>>>>> https://lkml.org/lkml/2013/3/7/242 thread, that just doesn't seem like >>>>>>> the right place to effect a change either as PCI's core isn't concerned >>>>>>> with the contents or access limitations of those regions, those are >>>>>>> issues that the driver concerns itself with. >>>>>>> >>>>>>> So things seem to be gravitating towards the driver. I'm fairly >>>>>>> ignorant of this area but as Robert succinctly pointed out in the >>>>>>> originating thread - the AHCI driver only uses the device's MMIO region. >>>>>>> The I/O related regions are for legacy SFF-compatible ATA ports and are >>>>>>> not used to driver the device. This, coupled with the observance that >>>>>>> userspace accesses such as udevadm, and others like you additionally >>>>>>> point out, do not filter through the device's driver for seems to >>>>>>> suggest that changes to the driver will not help here either. >>>>>> >>>>>> A PCI quirk should handle this properly, right? Why not do that? Worse >>>>>> thing, the quirk could just not expose these sysfs files for this >>>>>> device, which would solve all userspace program issues, right? >>>>> >>>>> Not exactly. I/O port access through pci-sysfs was added for userspace >>>>> programs, specifically qemu-kvm device assignment. We use the I/O port >>>>> resource# files to access device owned I/O port registers using file >>>>> permissions rather than global permissions such as iopl/ioperm. 
File >>>>> permissions also prevent random users from accessing device registers >>>>> through these files, but of course can't stop a privileged app that >>>>> chooses to ignore the purpose of these files. A quirk would therefore >>>>> remove a file that actually has a useful purpose for one app just so >>>>> another app that has no particular reason for dumping the contents can >>>>> run unabated. Thanks, >>>> >>>> The quirk would only be for this one specific device, which obviously >>>> can't handle this type of access, so why would you want the sysfs files >>>> even present for it at all? >>> >>> I'm assuming that the device only breaks because udevadm is dumping the >>> full I/O port register space of the device and that if an actual driver >>> was interacting with it through this interface that it would work. >> >> Correct: >> the AHCI driver only uses the device's MMIO region. The I/O >> related regions are for legacy SFF-compatible ATA ports and are >> not used to driver the device. This, coupled with the >> observance that userspace accesses such as udevadm, and others >> like Greg additionally pointed out, do not filter through the >> device's driver seems to suggest that changes to the driver will >> not help here either. > > That may be true of our AHCI driver, but when it's assigned to a guest > we're potentially using a completely different stack and cannot make > that assumption. A guest running in compatibility mode or the option > ROM for the device may still use I/O port regions. Thanks, > > Alex > > In quick summary: (1)reading a device's registers may have side effects on the device operation, e.g., a register maps to a device's FIFO register. (2) Having two threads read such device registers can cause unknown results, i.e., driver & user-app. 
(3) It may be valid for a user-app to read device regs, e.g., a qemu-kvm assigned device. So, can't it be solved by: (a) if no driver is configured for the device, then it's valid for a user-app to read the device regs? -- although different user apps doing so still exposes the problem, and they can't be distinguished, e.g., qemu-kvm + udevadm -- or can file permissions (set by libvirt driving qemu-kvm device assignment) block reads by multiple user-apps? i.e., basically, a user-level version of a driver allocating the device, which, in the case of qemu-kvm device assignment, is what is actually happening! :) (b) if a driver is configured, we need a quirk registration, or a generic, optional driver function, to check for user-app read approval. ok, bash away...
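Don's two cases can be expressed as a small decision function. This is purely a hypothetical sketch: `struct dev_access_state`, `approve_user_read` and `user_read_allowed()` do not exist in the kernel, and denying the read when a driver is bound but provides no approval hook is our own assumption about how case (b) might default.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device state; none of these names exist in the kernel. */
struct dev_access_state {
    bool driver_bound;                         /* is a kernel driver attached? */
    bool (*approve_user_read)(void *drvdata);  /* optional driver hook, case (b) */
    void *drvdata;                             /* passed to the hook */
};

/* Decide whether a userspace read of a device register file may proceed. */
bool user_read_allowed(const struct dev_access_state *st)
{
    if (!st->driver_bound)
        return true;    /* case (a): no driver, the app acts as the driver */
    if (st->approve_user_read == NULL)
        return false;   /* case (b), assumed default: no hook means no reads */
    return st->approve_user_read(st->drvdata);
}
```

As Don notes under (a), this does not by itself stop two different user applications (e.g. qemu-kvm and udevadm) from reading the same unbound device concurrently.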
On Sun, 2013-03-17 at 15:00 +0100, Kay Sievers wrote: > On Sun, Mar 17, 2013 at 2:38 PM, Alex Williamson > <alex.williamson@redhat.com> wrote: > > I'm assuming that the device only breaks because udevadm is dumping the > > full I/O port register space of the device and that if an actual driver > > was interacting with it through this interface that it would work. Who > > knows how many devices will have read side-effects by udevadm blindly > > dumping these files. Thanks, > > Sysfs is a too public interface to export things there which make > devices/driver choke on a simple read() of an attribute. That's why the default permissions for the file do not allow users to read it. I wish we could do something as clever as the MMIO resource files, but I/O port spaces don't allow mmap for the predominant architecture. Eventually VFIO is meant to replace this access and does move device register access behind ioctls, but for now legacy KVM device assignment relies on these files and so might some UIO drivers. > This is nothing specific to udevadm, any tool can do that. Udevadm > will never read any of the files during normal operation. The admin > explicitly asked udevadm with a specific command to dump all the stuff > the device offers. Isn't it possible udevadm could drop privileges or filter out non-world readable files? > The kernel driver needs to be fixed to allow that, in the worst case, > the attributes not exported at all. People should take more care what > they export in /sys, it's not a hidden and private ioctl what's > exported there, stuff is very visible and will be looked at. File permissions... > Telling userspace not to use specific stuff in /sys I would not expect > to work as a strategy; there is too much weird stuff out there that > will always try to do that ... I agree, the kernel needs to protect itself from malicious apps, but if you run a malicious app with admin access, how much can/should we do? 
If we're going to ignore file permissions, why limit ourselves to read()? Should we make everything safe against write() as well? Thanks, Alex
On Mon, 2013-03-18 at 10:50 -0400, Don Dutile wrote: > On 03/17/2013 06:28 PM, Alex Williamson wrote: > > On Sun, 2013-03-17 at 08:33 -0600, Myron Stowe wrote: > >> On Sun, 2013-03-17 at 07:38 -0600, Alex Williamson wrote: > >>> On Sat, 2013-03-16 at 22:36 -0700, Greg KH wrote: > >>>> On Sat, Mar 16, 2013 at 10:11:22PM -0600, Alex Williamson wrote: > >>>>> On Sat, 2013-03-16 at 18:03 -0700, Greg KH wrote: > >>>>>> On Sat, Mar 16, 2013 at 05:50:53PM -0600, Myron Stowe wrote: > >>>>>>> On Sat, 2013-03-16 at 15:11 -0700, Greg KH wrote: > >>>>>>>> On Sat, Mar 16, 2013 at 03:35:19PM -0600, Myron Stowe wrote: > >>>>>>>>> Sysfs includes entries to memory that backs a PCI device's BARs, both I/O > >>>>>>>>> Port space and MMIO. This memory regions correspond to the device's > >>>>>>>>> internal status and control registers used to drive the device. > >>>>>>>>> > >>>>>>>>> Accessing these registers from userspace such as "udevadm info > >>>>>>>>> --attribute-walk --path=/sys/devices/..." does can not be allowed as > >>>>>>>>> such accesses outside of the driver, even just reading, can yield > >>>>>>>>> catastrophic consequences. > >>>>>>>>> > >>>>>>>>> Udevadm-info skips parsing a specific set of sysfs entries including > >>>>>>>>> 'resource'. This patch extends the set to include the additional > >>>>>>>>> 'resource<N>' entries that correspond to a PCI device's BARs. > >>>>>>>> > >>>>>>>> Nice, are you also going to patch bash to prevent a user from reading > >>>>>>>> these sysfs files as well? :) > >>>>>>>> > >>>>>>>> And pciutils? > >>>>>>>> > >>>>>>>> You get my point here, right? The root user just asked to read all of > >>>>>>>> the data for this device, so why wouldn't you allow it? Just like > >>>>>>>> 'lspci' does. Or bash does. > >>>>>>> > >>>>>>> Yes :P , you raise a very good point, there are a lot of way a user can > >>>>>>> poke around in those BARs. 
However, there is a difference between > >>>>>>> shooting yourself in the foot and getting what you deserve versus > >>>>>>> unknowingly executing a common command such as udevadm and having the > >>>>>>> system hang. > >>>>>>>> > >>>>>>>> If this hardware has a problem, then it needs to be fixed in the kernel, > >>>>>>>> not have random band-aids added to various userspace programs to paper > >>>>>>>> over the root problem here. Please fix the kernel driver and all should > >>>>>>>> be fine. No need to change udevadm. > >>>>>>> > >>>>>>> Xiangliang initially proposed a patch within the PCI core. Ignoring the > >>>>>>> specific issue with the proposal which I pointed out in the > >>>>>>> https://lkml.org/lkml/2013/3/7/242 thread, that just doesn't seem like > >>>>>>> the right place to effect a change either as PCI's core isn't concerned > >>>>>>> with the contents or access limitations of those regions, those are > >>>>>>> issues that the driver concerns itself with. > >>>>>>> > >>>>>>> So things seem to be gravitating towards the driver. I'm fairly > >>>>>>> ignorant of this area but as Robert succinctly pointed out in the > >>>>>>> originating thread - the AHCI driver only uses the device's MMIO region. > >>>>>>> The I/O related regions are for legacy SFF-compatible ATA ports and are > >>>>>>> not used to driver the device. This, coupled with the observance that > >>>>>>> userspace accesses such as udevadm, and others like you additionally > >>>>>>> point out, do not filter through the device's driver for seems to > >>>>>>> suggest that changes to the driver will not help here either. > >>>>>> > >>>>>> A PCI quirk should handle this properly, right? Why not do that? Worse > >>>>>> thing, the quirk could just not expose these sysfs files for this > >>>>>> device, which would solve all userspace program issues, right? > >>>>> > >>>>> Not exactly. I/O port access through pci-sysfs was added for userspace > >>>>> programs, specifically qemu-kvm device assignment. 
We use the I/O port > >>>>> resource# files to access device owned I/O port registers using file > >>>>> permissions rather than global permissions such as iopl/ioperm. File > >>>>> permissions also prevent random users from accessing device registers > >>>>> through these files, but of course can't stop a privileged app that > >>>>> chooses to ignore the purpose of these files. A quirk would therefore > >>>>> remove a file that actually has a useful purpose for one app just so > >>>>> another app that has no particular reason for dumping the contents can > >>>>> run unabated. Thanks, > >>>> > >>>> The quirk would only be for this one specific device, which obviously > >>>> can't handle this type of access, so why would you want the sysfs files > >>>> even present for it at all? > >>> > >>> I'm assuming that the device only breaks because udevadm is dumping the > >>> full I/O port register space of the device and that if an actual driver > >>> was interacting with it through this interface that it would work. > >> > >> Correct: > >> the AHCI driver only uses the device's MMIO region. The I/O > >> related regions are for legacy SFF-compatible ATA ports and are > >> not used to driver the device. This, coupled with the > >> observance that userspace accesses such as udevadm, and others > >> like Greg additionally pointed out, do not filter through the > >> device's driver seems to suggest that changes to the driver will > >> not help here either. > > > > That may be true of our AHCI driver, but when it's assigned to a guest > > we're potentially using a completely different stack and cannot make > > that assumption. A guest running in compatibility mode or the option > > ROM for the device may still use I/O port regions. Thanks, > > > > Alex > > > > > > In quick summary: > (1)reading a device's registers may have side effects > on the device operation, e.g., a register maps to a device's FIFO register. 
> (2) Having two threads read such device registers can cause unknown results, > i.e., driver & user-app. > (3) It may be valid for a user-app to read device regs, e.g., > qemu-kvm assigned device > > So, can't it be solved by: > (a) if no driver is configured for the device, than it's valid for a user-app > to read the device regs ? > -- although diff. user apps doing so still exposes the problem, and > can't be distinguished, e.g., qemu-kvm + udevadm > -- or can file permissions (set by libvirt driving qemu-kvm > device assignment) block multiple user-app reading ? > i.e., basically, a user-level version of a driver allocating > the device, which in the case of qemu-kvm device-assignment, > is what is actually happening! :) > (b) if driver is configured, need a quirk-registration, or generic, optional, > driver function to check for user-app reading approval. > > ok, bash away... I think concurrency is a secondary issue. The primary issue is whether read() is somehow so special in sysfs that all files need to be regarded as o+r. If that's true, then indeed there are concurrency issues. Thanks, Alex
On Mon, Mar 18, 2013 at 10:24:40AM -0600, Alex Williamson wrote: > On Sun, 2013-03-17 at 15:00 +0100, Kay Sievers wrote: > > On Sun, Mar 17, 2013 at 2:38 PM, Alex Williamson > > <alex.williamson@redhat.com> wrote: > > > I'm assuming that the device only breaks because udevadm is dumping the > > > full I/O port register space of the device and that if an actual driver > > > was interacting with it through this interface that it would work. Who > > > knows how many devices will have read side-effects by udevadm blindly > > > dumping these files. Thanks, > > > > Sysfs is a too public interface to export things there which make > > devices/driver choke on a simple read() of an attribute. > > That's why the default permissions for the file do not allow users to > read it. I wish we could do something as clever as the MMIO resource > files, but I/O port spaces don't allow mmap for the predominant > architecture. Eventually VFIO is meant to replace this access and does > move device register access behind ioctls, but for now legacy KVM device > assignment relies on these files and so might some UIO drivers. > > > This is nothing specific to udevadm, any tool can do that. Udevadm > > will never read any of the files during normal operation. The admin > > explicitly asked udevadm with a specific command to dump all the stuff > > the device offers. > > Isn't it possible udevadm could drop privileges or filter out non-world > readable files? And you are going to do the same thing for bash? All other shells? Come on, the user specifically asked to read this file, as root, and udev did so. Just like bash would. Please fix the kernel if this is a real problem, you aren't going to be able to patch all userspace programs, that's not the proper solution here. thanks, greg k-h
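Alex's suggestion of filtering out non-world-readable files could look roughly like the sketch below. Root bypasses permission checks on `open()`, so the filter must inspect the mode bits itself. The sketch relies on Alex's statement that the resource files are not user-readable by default. It is not udevadm code, and `world_readable_names` is a hypothetical helper.

```python
import os
import stat


def world_readable_names(dev_dir: str) -> list:
    """Names of regular files in dev_dir whose mode grants read to 'other'.

    Root can open a 0600 file anyway, so the mode bits are checked
    explicitly instead of relying on open() to fail.
    """
    names = []
    with os.scandir(dev_dir) as entries:
        for entry in entries:
            if not entry.is_file(follow_symlinks=False):
                continue  # skip subdirectories and symlinks such as driver
            mode = entry.stat(follow_symlinks=False).st_mode
            if mode & stat.S_IROTH:
                names.append(entry.name)
    return sorted(names)
```

Greg's reply above still applies: the check protects only the program that performs it, not a shell or any other tool run as root.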
On Mon, 2013-03-18 at 09:41 -0700, Greg KH wrote: > On Mon, Mar 18, 2013 at 10:24:40AM -0600, Alex Williamson wrote: > > On Sun, 2013-03-17 at 15:00 +0100, Kay Sievers wrote: > > > On Sun, Mar 17, 2013 at 2:38 PM, Alex Williamson > > > <alex.williamson@redhat.com> wrote: > > > > I'm assuming that the device only breaks because udevadm is dumping the > > > > full I/O port register space of the device and that if an actual driver > > > > was interacting with it through this interface that it would work. Who > > > > knows how many devices will have read side-effects by udevadm blindly > > > > dumping these files. Thanks, > > > > > > Sysfs is a too public interface to export things there which make > > > devices/driver choke on a simple read() of an attribute. > > > > That's why the default permissions for the file do not allow users to > > read it. I wish we could do something as clever as the MMIO resource > > files, but I/O port spaces don't allow mmap for the predominant > > architecture. Eventually VFIO is meant to replace this access and does > > move device register access behind ioctls, but for now legacy KVM device > > assignment relies on these files and so might some UIO drivers. > > > > > This is nothing specific to udevadm, any tool can do that. Udevadm > > > will never read any of the files during normal operation. The admin > > > explicitly asked udevadm with a specific command to dump all the stuff > > > the device offers. > > > > Isn't it possible udevadm could drop privileges or filter out non-world > > readable files? > > And you are going to do the same thing for bash? All other shells? > > Come on, the user specifically asked to read this file, as root, and > udev did so. Just like bash would. > > Please fix the kernel if this is a real problem, you aren't going to be > able to patch all userspace programs, that's not the proper solution > here. 
At least for KVM the kernel fix is the addition of the vfio driver which gives us a non-sysfs way to do this. If this problem was found a few years later and we were ready to make the switch I'd support just removing these resource files. In the meantime we have userspace that depends on this interface, so I'm open to suggestions how to fix it. If we want to blacklist this specific device, that's fine, but as others have pointed out it's really a class problem. Perhaps we report 1 byte extra for the file length where EOF-1 is an enable byte? Is there anything else in file ops that we could use to make it slightly more complicated than open(), read() to access the device? Thanks, Alex
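Alex's enable-byte idea can be modelled in a few lines. This is a hypothetical userspace model, not kernel code. `GatedResourceFile`, the choice of `EPERM`, and the rule that a nonzero byte enables access are all assumptions made for illustration.

```python
import errno


class GatedResourceFile:
    """Toy model of a resource<N> file reported one byte longer than its BAR.

    The extra byte at offset bar_len (EOF-1) acts as an enable switch:
    register accesses are refused until a nonzero byte is written there.
    """

    def __init__(self, bar_len: int):
        self.bar_len = bar_len
        self.enabled = False
        self.device_accesses = 0  # accesses that would reach the hardware

    @property
    def size(self) -> int:
        return self.bar_len + 1  # reported length includes the enable byte

    def _check(self, offset: int, length: int) -> None:
        if not self.enabled:
            raise PermissionError(errno.EPERM, "resource access not enabled")
        if offset < 0 or offset + length > self.bar_len:
            raise ValueError("access outside the BAR")

    def pread(self, offset: int, length: int) -> bytes:
        if offset == self.bar_len and length == 1:
            return bytes([int(self.enabled)])  # reading the switch is harmless
        self._check(offset, length)
        self.device_accesses += 1
        return bytes(length)  # stand-in for the register contents

    def pwrite(self, offset: int, data: bytes) -> int:
        if offset == self.bar_len and len(data) == 1:
            self.enabled = data[0] != 0
            return 1
        self._check(offset, len(data))
        self.device_accesses += 1
        return len(data)
```

A blind dump of the whole file fails without touching the device. A program that knows the protocol writes the enable byte first. As Alex himself notes later in the thread, current users of these files do not do that, so an enable sequence would break existing userspace.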
Alex Williamson <alex.williamson@redhat.com> writes: > At least for KVM the kernel fix is the addition of the vfio driver which > gives us a non-sysfs way to do this. If this problem was found a few > years later and we were ready to make the switch I'd support just > removing these resource files. In the meantime we have userspace that > depends on this interface, so I'm open to suggestions how to fix it. I am puzzled by a couple of things in this discussion: 1) do you seriously mean that a userspace application (any, not just udevadm or qemu or whatever) should be able to read and write these registers while the device is owned by a driver? How is that ever going to work? 2) is it really so that a device can be so fundamentally screwed up by reading some registers, that a later driver probe cannot properly reinitialize it? I would have thought that the solution to all this was to return -EINVAL on any attempt to read or write these files while a driver is bound to the device. If userspace is going to use the API, then the application better unbind any driver first. Or? Am I missing something here? > If we want to blacklist this specific device, that's fine, but as others > have pointed out it's really a class problem. Perhaps we report 1 byte > extra for the file length where EOF-1 is an enable byte? Is there > anything else in file ops that we could use to make it slightly more > complicated than open(), read() to access the device? Thanks, If there really are devices which cannot handle reading at all, and cannot be reset to a sane state by later driver initialization, then a blacklist could be added for those devices. This should not be a common problem. Bjørn
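Bjørn's proposal comes down to a single check at the top of the resource file's read and write paths. The sketch below models it in userspace. `PciDevModel` and `resource_rw_check` are hypothetical names, and a real implementation would sit in the kernel's PCI sysfs code rather than in Python.

```python
import errno
from dataclasses import dataclass
from typing import Optional


@dataclass
class PciDevModel:
    """Toy stand-in for a PCI device: only tracks which driver, if any, is bound."""
    name: str
    driver: Optional[str] = None


def resource_rw_check(dev: PciDevModel) -> int:
    """Return 0 if a resource<N> read/write may proceed, else a negative errno.

    Every access is refused while any driver is bound, so a userspace
    user has to unbind the kernel driver first.
    """
    if dev.driver is not None:
        return -errno.EINVAL
    return 0
```

With no driver bound, the check passes for every caller alike, whether that is QEMU or a blind dump. This is the gap Alex raises in his reply.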
On Mon, 2013-03-18 at 18:20 +0100, Bjørn Mork wrote: > Alex Williamson <alex.williamson@redhat.com> writes: > > > At least for KVM the kernel fix is the addition of the vfio driver which > > gives us a non-sysfs way to do this. If this problem was found a few > > years later and we were ready to make the switch I'd support just > > removing these resource files. In the meantime we have userspace that > > depends on this interface, so I'm open to suggestions how to fix it. > > I am puzzled by a couple of things in this discussion: > > 1) do you seriously mean that a userspace application (any, not just > udevadm or qemu or whatever) should be able to read and write these > registers while the device is owned by a driver? How is that ever > going to work? The expectation is that the user doesn't mess with the device through pci-sysfs while it's running. This is really no different than config space or MMIO space in that respect. You can use setpci to break your PCI card while it's used by the driver today. The difference is that MMIO spaces side-step the issue by only allowing mmap and config space is known not to have read side-effects. > 2) is it really so that a device can be so fundamentally screwed up by > reading some registers, that a later driver probe cannot properly > reinitialize it? Never underestimate how broken hardware can be, though in this case reading a device register seems to be causing a system hang/reset. > I would have thought that the solution to all this was to return -EINVAL > on any attemt to read or write these files while a driver is bound to > the device. If userspace is going to use the API, then the application > better unbind any driver first. > > Or? Am I missing something here? That doesn't really solve anything though. Let's pretend the resource files only work while the device is bound to pci-stub. Now what happens when you run this udevadm command as admin while it's in use by the userspace driver? 
All we've done is limit the scope of the problem. > > If we want to blacklist this specific device, that's fine, but as others > > have pointed out it's really a class problem. Perhaps we report 1 byte > > extra for the file length where EOF-1 is an enable byte? Is there > > anything else in file ops that we could use to make it slightly more > > complicated than open(), read() to access the device? Thanks, > > If there really are devices which cannot handle reading at all, and > cannot be reset to a sane state by later driver initialization, then a > blacklist could be added for those devices. This should not be a common > problem. Yes, if these are dead registers, let's blacklist and move along. I suspect though that these registers probably work fine if you access them according to the device programming model, so blacklisting just prevents full use through something like KVM device assignment. Thanks, Alex
On 03/18/13 13:54, Alex Williamson wrote: > On Mon, 2013-03-18 at 18:20 +0100, Bjørn Mork wrote: >> Alex Williamson <alex.williamson@redhat.com> writes: >> >>> At least for KVM the kernel fix is the addition of the vfio driver which >>> gives us a non-sysfs way to do this. If this problem was found a few >>> years later and we were ready to make the switch I'd support just >>> removing these resource files. In the meantime we have userspace that >>> depends on this interface, so I'm open to suggestions how to fix it. >> I am puzzled by a couple of things in this discussion: >> >> 1) do you seriously mean that a userspace application (any, not just >> udevadm or qemu or whatever) should be able to read and write these >> registers while the device is owned by a driver? How is that ever >> going to work? > The expectation is that the user doesn't mess with the device through > pci-sysfs while it's running. This is really no different than config > space or MMIO space in that respect. You can use setpci to break your > PCI card while it's used by the driver today. The difference is that > MMIO spaces side-step the issue by only allowing mmap and config space > is known not to have read side-effects. > >> 2) is it really so that a device can be so fundamentally screwed up by >> reading some registers, that a later driver probe cannot properly >> reinitialize it? > Never underestimate how broken hardware can be, though in this case > reading a device register seems to be causing a system hang/reset. The real problem is that PCI devices can be bus masters, which means they can screw up *ANYTHING* (almost)!
Alex Williamson <alex.williamson@redhat.com> wrote: >On Mon, 2013-03-18 at 18:20 +0100, Bjørn Mork wrote: >> Alex Williamson <alex.williamson@redhat.com> writes: >> >> > At least for KVM the kernel fix is the addition of the vfio driver >which >> > gives us a non-sysfs way to do this. If this problem was found a >few >> > years later and we were ready to make the switch I'd support just >> > removing these resource files. In the meantime we have userspace >that >> > depends on this interface, so I'm open to suggestions how to fix >it. >> >> I am puzzled by a couple of things in this discussion: >> >> 1) do you seriously mean that a userspace application (any, not just >> udevadm or qemu or whatever) should be able to read and write >these >> registers while the device is owned by a driver? How is that ever >> going to work? > >The expectation is that the user doesn't mess with the device through >pci-sysfs while it's running. This is really no different than config >space or MMIO space in that respect. But it is. That's the problem. As a user I expect to be able to run e.g "grep . /sys/devices/whatever/*" with no ill effects. This holds for config space or MMIO space. It does not for any reset-on-read register. > You can use setpci to break your >PCI card while it's used by the driver today. The difference is that >MMIO spaces side-step the issue by only allowing mmap and config space >is known not to have read side-effects. Yes. And that is why there is no problem exporting those. This difference is fundamental. >> 2) is it really so that a device can be so fundamentally screwed up >by >> reading some registers, that a later driver probe cannot properly >> reinitialize it? > >Never underestimate how broken hardware can be, True :) > though in this case >reading a device register seems to be causing a system hang/reset. 
I understand that it does so if the ahci driver is bound to the device while reading the registers, but does it also hang the system with no bound driver? How does it do that? By killing the bus? >> I would have thought that the solution to all this was to return >-EINVAL >> on any attemt to read or write these files while a driver is bound to >> the device. If userspace is going to use the API, then the >application >> better unbind any driver first. >> >> Or? Am I missing something here? > >That doesn't really solve anything though. Let's pretend the resource >files only work while the device is bound to pci-stub. Now what >happens >when you run this udevadm command as admin while it's in use by the >userspace driver? All we've done is limit the scope of the problem. Assuming that the system hangs without driver help and that this brokenness is widespread. I don't think any of those assumptions hold. Do they? >> > If we want to blacklist this specific device, that's fine, but as >others >> > have pointed out it's really a class problem. Perhaps we report 1 >byte >> > extra for the file length where EOF-1 is an enable byte? Is there >> > anything else in file ops that we could use to make it slightly >more >> > complicated than open(), read() to access the device? Thanks, >> >> If there really are devices which cannot handle reading at all, and >> cannot be reset to a sane state by later driver initialization, then >a >> blacklist could be added for those devices. This should not be a >common >> problem. > >Yes, if these are dead registers, let's blacklist and move along. I >suspect though that these registers probably work fine if you access >them according to the device programming model, so blacklisting just >prevents full use through something like KVM device assignment. Well, if the device is that broken then I think it will require the kernel to police the device programming. 
I don't see how you can leave a bomb like that because it might be useful in a rare and very theoretical case. Easier to just blacklist it... Bjørn
On Mon, 2013-03-18 at 19:25 +0100, Bjørn Mork wrote: > Alex Williamson <alex.williamson@redhat.com> wrote: > > >On Mon, 2013-03-18 at 18:20 +0100, Bjørn Mork wrote: > >> Alex Williamson <alex.williamson@redhat.com> writes: > >> > >> > At least for KVM the kernel fix is the addition of the vfio driver > >which > >> > gives us a non-sysfs way to do this. If this problem was found a > >few > >> > years later and we were ready to make the switch I'd support just > >> > removing these resource files. In the meantime we have userspace > >that > >> > depends on this interface, so I'm open to suggestions how to fix > >it. > >> > >> I am puzzled by a couple of things in this discussion: > >> > >> 1) do you seriously mean that a userspace application (any, not just > >> udevadm or qemu or whatever) should be able to read and write > >these > >> registers while the device is owned by a driver? How is that ever > >> going to work? > > > >The expectation is that the user doesn't mess with the device through > >pci-sysfs while it's running. This is really no different than config > >space or MMIO space in that respect. > > But it is. That's the problem. As a user I expect to be able to run > e.g "grep . /sys/devices/whatever/*" with no ill effects. This holds > for config space or MMIO space. It does not for any reset-on-read > register. As a non-admin user you can > > You can use setpci to break your > >PCI card while it's used by the driver today. The difference is that > >MMIO spaces side-step the issue by only allowing mmap and config space > >is known not to have read side-effects. > > Yes. And that is why there is no problem exporting those. This > difference is fundamental. So how do we side-step the problem with I/O port registers? If we remove them then KVM needs to run with iopl which is a pretty serious security hole should QEMU be exploited. 
We could activate the resource files only when the device is bound to pci-assign, but that only limits the scope and might break UIO drivers. We could modify the file to have an enable sequence, but we can't do this without breaking current userspace. As I mentioned, the VFIO driver is intended to replace KVM's use of these files, but we're not ready to rip it out, perhaps not even ready to declare it deprecated. > >> 2) is it really so that a device can be so fundamentally screwed up > >by > >> reading some registers, that a later driver probe cannot properly > >> reinitialize it? > > > >Never underestimate how broken hardware can be, > > True :) > > > though in this case > >reading a device register seems to be causing a system hang/reset. > > I understand that it does so if the ahci driver is bound to the device > while reading the registers, but does it also hang the system with no > bound driver? How does it do that? By killing the bus? I don't know, Myron? > >> I would have thought that the solution to all this was to return > >-EINVAL > >> on any attemt to read or write these files while a driver is bound to > >> the device. If userspace is going to use the API, then the > >application > >> better unbind any driver first. > >> > >> Or? Am I missing something here? > > > >That doesn't really solve anything though. Let's pretend the resource > >files only work while the device is bound to pci-stub. Now what > >happens > >when you run this udevadm command as admin while it's in use by the > >userspace driver? All we've done is limit the scope of the problem. > > Assuming that the system hangs without driver help and that this > brokenness is widespread. I don't think any of those assumptions hold. > Do they? I thought it was true that for this device a system hang happened regardless of the host driver, but haven't seen the original bug report. 
As for widespread, this is the first I've heard of problems in the 2.5+ years that we've supported these I/O port resource files. The rest is probably just FUD about random userspace apps trolling through device registers. > >> > If we want to blacklist this specific device, that's fine, but as > >others > >> > have pointed out it's really a class problem. Perhaps we report 1 > >byte > >> > extra for the file length where EOF-1 is an enable byte? Is there > >> > anything else in file ops that we could use to make it slightly > >more > >> > complicated than open(), read() to access the device? Thanks, > >> > >> If there really are devices which cannot handle reading at all, and > >> cannot be reset to a sane state by later driver initialization, then > >a > >> blacklist could be added for those devices. This should not be a > >common > >> problem. > > > >Yes, if these are dead registers, let's blacklist and move along. I > >suspect though that these registers probably work fine if you access > >them according to the device programming model, so blacklisting just > >prevents full use through something like KVM device assignment. > > Well, if the device is that broken then I think it will require the > kernel to police the device programming. I don't see how you can leave > a bomb like that because it might be useful in a rare and very > theoretical case. > > Easier to just blacklist it... Easier, yes. But it likely just kicks the problem down the road until the next device. Thanks, Alex
On 03/16/2013 07:03 PM, Greg KH wrote: > On Sat, Mar 16, 2013 at 05:50:53PM -0600, Myron Stowe wrote: >> On Sat, 2013-03-16 at 15:11 -0700, Greg KH wrote: >>> On Sat, Mar 16, 2013 at 03:35:19PM -0600, Myron Stowe wrote: >>>> Sysfs includes entries to memory that backs a PCI device's BARs, both I/O >>>> Port space and MMIO. This memory regions correspond to the device's >>>> internal status and control registers used to drive the device. >>>> >>>> Accessing these registers from userspace such as "udevadm info >>>> --attribute-walk --path=/sys/devices/..." does can not be allowed as >>>> such accesses outside of the driver, even just reading, can yield >>>> catastrophic consequences. >>>> >>>> Udevadm-info skips parsing a specific set of sysfs entries including >>>> 'resource'. This patch extends the set to include the additional >>>> 'resource<N>' entries that correspond to a PCI device's BARs. >>> >>> Nice, are you also going to patch bash to prevent a user from reading >>> these sysfs files as well? :) >>> >>> And pciutils? >>> >>> You get my point here, right? The root user just asked to read all of >>> the data for this device, so why wouldn't you allow it? Just like >>> 'lspci' does. Or bash does. lspci doesn't randomly attempt to access device registers, AFAIK.. >> >> Yes :P , you raise a very good point, there are a lot of way a user can >> poke around in those BARs. However, there is a difference between >> shooting yourself in the foot and getting what you deserve versus >> unknowingly executing a common command such as udevadm and having the >> system hang. >>> >>> If this hardware has a problem, then it needs to be fixed in the kernel, >>> not have random band-aids added to various userspace programs to paper >>> over the root problem here. Please fix the kernel driver and all should >>> be fine. No need to change udevadm. >> >> Xiangliang initially proposed a patch within the PCI core. 
Ignoring the >> specific issue with the proposal which I pointed out in the >> https://lkml.org/lkml/2013/3/7/242 thread, that just doesn't seem like >> the right place to effect a change either as PCI's core isn't concerned >> with the contents or access limitations of those regions, those are >> issues that the driver concerns itself with. >> >> So things seem to be gravitating towards the driver. I'm fairly >> ignorant of this area but as Robert succinctly pointed out in the >> originating thread - the AHCI driver only uses the device's MMIO region. >> The I/O related regions are for legacy SFF-compatible ATA ports and are >> not used to driver the device. This, coupled with the observance that >> userspace accesses such as udevadm, and others like you additionally >> point out, do not filter through the device's driver for seems to >> suggest that changes to the driver will not help here either. > > A PCI quirk should handle this properly, right? Why not do that? Worse > thing, the quirk could just not expose these sysfs files for this > device, which would solve all userspace program issues, right? A PCI quirk implies there is something wrong with this device in particular. This isn't the case. The device responds properly when it's accessed as intended. The problem is that udevadm (or other processes, like a random grep through sysfs for example) is effectively reading registers willy-nilly. This is absolutely not safe to do on many devices - and certainly not while a driver is attached to the device and has claimed the port or MMIO regions that are being accessed. Blocking access through these files to a device with an active driver that's claimed the regions would significantly reduce the chances of something like this causing problems.
On Mon, Mar 18, 2013 at 07:54:09PM -0600, Robert Hancock wrote: > On 03/16/2013 07:03 PM, Greg KH wrote: > >On Sat, Mar 16, 2013 at 05:50:53PM -0600, Myron Stowe wrote: > >>On Sat, 2013-03-16 at 15:11 -0700, Greg KH wrote: > >>>On Sat, Mar 16, 2013 at 03:35:19PM -0600, Myron Stowe wrote: > >>>>Sysfs includes entries to memory that backs a PCI device's BARs, both I/O > >>>>Port space and MMIO. This memory regions correspond to the device's > >>>>internal status and control registers used to drive the device. > >>>> > >>>>Accessing these registers from userspace such as "udevadm info > >>>>--attribute-walk --path=/sys/devices/..." does can not be allowed as > >>>>such accesses outside of the driver, even just reading, can yield > >>>>catastrophic consequences. > >>>> > >>>>Udevadm-info skips parsing a specific set of sysfs entries including > >>>>'resource'. This patch extends the set to include the additional > >>>>'resource<N>' entries that correspond to a PCI device's BARs. > >>> > >>>Nice, are you also going to patch bash to prevent a user from reading > >>>these sysfs files as well? :) > >>> > >>>And pciutils? > >>> > >>>You get my point here, right? The root user just asked to read all of > >>>the data for this device, so why wouldn't you allow it? Just like > >>>'lspci' does. Or bash does. > > lspci doesn't randomly attempt to access device registers, AFAIK.. Have you read the man page for the '-xxx' option to lspci? lspci can be quite intrusive, and I used to have a number of systems that it would trash very easily if you ran it on them as root. > >>Yes :P , you raise a very good point, there are a lot of way a user can > >>poke around in those BARs. However, there is a difference between > >>shooting yourself in the foot and getting what you deserve versus > >>unknowingly executing a common command such as udevadm and having the > >>system hang. 
> >>> > >>>If this hardware has a problem, then it needs to be fixed in the kernel, > >>>not have random band-aids added to various userspace programs to paper > >>>over the root problem here. Please fix the kernel driver and all should > >>>be fine. No need to change udevadm. > >> > >>Xiangliang initially proposed a patch within the PCI core. Ignoring the > >>specific issue with the proposal which I pointed out in the > >>https://lkml.org/lkml/2013/3/7/242 thread, that just doesn't seem like > >>the right place to effect a change either as PCI's core isn't concerned > >>with the contents or access limitations of those regions, those are > >>issues that the driver concerns itself with. > >> > >>So things seem to be gravitating towards the driver. I'm fairly > >>ignorant of this area but as Robert succinctly pointed out in the > >>originating thread - the AHCI driver only uses the device's MMIO region. > >>The I/O related regions are for legacy SFF-compatible ATA ports and are > >>not used to driver the device. This, coupled with the observance that > >>userspace accesses such as udevadm, and others like you additionally > >>point out, do not filter through the device's driver for seems to > >>suggest that changes to the driver will not help here either. > > > >A PCI quirk should handle this properly, right? Why not do that? Worse > >thing, the quirk could just not expose these sysfs files for this > >device, which would solve all userspace program issues, right? > > A PCI quirk implies there is something wrong with this device in > particular. This isn't the case. The device responds properly when > it's accessed as intended. The problem is that udevadm (or other > processes, like a random grep through sysfs for example) is > effectively reading registers willy-nilly. This is absolutely not > safe to do on many devices - and certainly not while a driver is > attached to the device and has claimed the port or MMIO regions that > are being accessed. 
Then we need to fix that! In the kernel! Don't try to gloss over the problem by changing one random userspace program, you will never catch them all. Fix the root problem here people, that's all I'm asking for. > Blocking access through these files to a device with an active driver > that's claimed the regions would significantly reduce the chances of > something like this causing problems. Great, that's one possible solution, the other is just not creating the files at all for known problem devices, right? My main point here is, you aren't going to fix this in userspace, fix it in the kernel. greg k-h
On Mon, Mar 18, 2013 at 8:03 PM, Greg KH <gregkh@linuxfoundation.org> wrote:
> On Mon, Mar 18, 2013 at 07:54:09PM -0600, Robert Hancock wrote:
>> On 03/16/2013 07:03 PM, Greg KH wrote:
>> >On Sat, Mar 16, 2013 at 05:50:53PM -0600, Myron Stowe wrote:
>> >>On Sat, 2013-03-16 at 15:11 -0700, Greg KH wrote:
>> >>>On Sat, Mar 16, 2013 at 03:35:19PM -0600, Myron Stowe wrote:
>> >>>>Sysfs includes entries to memory that backs a PCI device's BARs, both I/O
>> >>>>Port space and MMIO. These memory regions correspond to the device's
>> >>>>internal status and control registers used to drive the device.
>> >>>>
>> >>>>Accessing these registers from userspace, such as "udevadm info
>> >>>>--attribute-walk --path=/sys/devices/...", cannot be allowed, as
>> >>>>such accesses outside of the driver, even just reading, can yield
>> >>>>catastrophic consequences.
>> >>>>
>> >>>>Udevadm-info skips parsing a specific set of sysfs entries including
>> >>>>'resource'. This patch extends the set to include the additional
>> >>>>'resource<N>' entries that correspond to a PCI device's BARs.
>> >>>
>> >>>Nice, are you also going to patch bash to prevent a user from reading
>> >>>these sysfs files as well? :)
>> >>>
>> >>>And pciutils?
>> >>>
>> >>>You get my point here, right? The root user just asked to read all of
>> >>>the data for this device, so why wouldn't you allow it? Just like
>> >>>'lspci' does. Or bash does.
>> lspci doesn't randomly attempt to access device registers, AFAIK.
>
> Have you read the man page for the '-xxx' option to lspci? lspci can be
> quite intrusive, and I used to have a number of systems that it would
> trash very easily if you ran it on them as root.
>
>> >>Yes :P, you raise a very good point; there are a lot of ways a user can
>> >>poke around in those BARs. However, there is a difference between
>> >>shooting yourself in the foot and getting what you deserve versus
>> >>unknowingly executing a common command such as udevadm and having the
>> >>system hang.
>> >>>
>> >>>If this hardware has a problem, then it needs to be fixed in the kernel,
>> >>>not have random band-aids added to various userspace programs to paper
>> >>>over the root problem here. Please fix the kernel driver and all should
>> >>>be fine. No need to change udevadm.
>> >>
>> >>Xiangliang initially proposed a patch within the PCI core. Ignoring the
>> >>specific issue with the proposal which I pointed out in the
>> >>https://lkml.org/lkml/2013/3/7/242 thread, that just doesn't seem like
>> >>the right place to effect a change either, as the PCI core isn't concerned
>> >>with the contents or access limitations of those regions; those are
>> >>issues that the driver concerns itself with.
>> >>
>> >>So things seem to be gravitating towards the driver. I'm fairly
>> >>ignorant of this area, but as Robert succinctly pointed out in the
>> >>originating thread - the AHCI driver only uses the device's MMIO region.
>> >>The I/O related regions are for legacy SFF-compatible ATA ports and are
>> >>not used to drive the device. This, coupled with the observation that
>> >>userspace accesses such as udevadm, and others like you additionally
>> >>point out, do not filter through the device's driver, seems to
>> >>suggest that changes to the driver will not help here either.
>> >
>> >A PCI quirk should handle this properly, right? Why not do that? Worst
>> >case, the quirk could just not expose these sysfs files for this
>> >device, which would solve all userspace program issues, right?
>>
>> A PCI quirk implies there is something wrong with this device in
>> particular. This isn't the case. The device responds properly when
>> it's accessed as intended. The problem is that udevadm (or other
>> processes, like a random grep through sysfs for example) is
>> effectively reading registers willy-nilly. This is absolutely not
>> safe to do on many devices - and certainly not while a driver is
>> attached to the device and has claimed the port or MMIO regions that
>> are being accessed.
>
> Then we need to fix that!
>
> In the kernel!
>
> Don't try to gloss over the problem by changing one random userspace
> program, you will never catch them all. Fix the root problem here
> people, that's all I'm asking for.
>
>> Blocking access through these files to a device with an active driver
>> that's claimed the regions would significantly reduce the chances of
>> something like this causing problems.
>
> Great, that's one possible solution, the other is just not creating the
> files at all for known problem devices, right?

I don't think one can reasonably enumerate all problem devices. There
are probably countless devices which can potentially break if their
resources (especially IO ports) are read in unexpected ways. Aside
from devices like this one, which apparently don't like certain IO
ports being read with certain access widths, there's every device in
existence with read-to-reset type registers. The fix to this needs to
apply to all devices.

>
> My main point here is, you aren't going to fix this in userspace, fix it
> in the kernel.

The kernel can help the situation by blocking access to devices with
an active driver, but it can't fix all cases. Suppose the device has
no driver loaded yet, how is the kernel supposed to tell the
difference between software with a legitimate need to access these
files for virtualization device assignment, etc. and something like
udevadm or a random grep command that's reading the files without any
idea what it's doing? udevadm does need to be fixed to avoid accessing
these files because it's unnecessary and dangerous.
On Mon, Mar 18, 2013 at 08:09:22PM -0600, Robert Hancock wrote:
> > Great, that's one possible solution, the other is just not creating the
> > files at all for known problem devices, right?
>
> I don't think one can reasonably enumerate all problem devices. There
> are probably countless devices which can potentially break if their
> resources (especially IO ports) are read in unexpected ways. Aside
> from devices like this one, which apparently don't like certain IO
> ports being read with certain access widths, there's every device in
> existence with read-to-reset type registers. The fix to this needs to
> apply to all devices.
>
> >
> > My main point here is, you aren't going to fix this in userspace, fix it
> > in the kernel.
>
> The kernel can help the situation by blocking access to devices with
> an active driver, but it can't fix all cases. Suppose the device has
> no driver loaded yet, how is the kernel supposed to tell the
> difference between software with a legitimate need to access these
> files for virtualization device assignment, etc. and something like
> udevadm or a random grep command that's reading the files without any
> idea what it's doing? udevadm does need to be fixed to avoid accessing
> these files because it's unnecessary and dangerous.

Are you going to also fix grep? bash? cat?

Come on, be realistic. If these files are so dangerous then they need to
just be removed entirely from the kernel. You aren't going to be able to
patch grep for this.

greg k-h
On Mon, Mar 18, 2013 at 8:35 PM, Greg KH <gregkh@linuxfoundation.org> wrote:
> On Mon, Mar 18, 2013 at 08:09:22PM -0600, Robert Hancock wrote:
>> > Great, that's one possible solution, the other is just not creating the
>> > files at all for known problem devices, right?
>>
>> I don't think one can reasonably enumerate all problem devices. There
>> are probably countless devices which can potentially break if their
>> resources (especially IO ports) are read in unexpected ways. Aside
>> from devices like this one, which apparently don't like certain IO
>> ports being read with certain access widths, there's every device in
>> existence with read-to-reset type registers. The fix to this needs to
>> apply to all devices.
>>
>> >
>> > My main point here is, you aren't going to fix this in userspace, fix it
>> > in the kernel.
>>
>> The kernel can help the situation by blocking access to devices with
>> an active driver, but it can't fix all cases. Suppose the device has
>> no driver loaded yet, how is the kernel supposed to tell the
>> difference between software with a legitimate need to access these
>> files for virtualization device assignment, etc. and something like
>> udevadm or a random grep command that's reading the files without any
>> idea what it's doing? udevadm does need to be fixed to avoid accessing
>> these files because it's unnecessary and dangerous.
>
> Are you going to also fix grep? bash? cat?
>
> Come on, be realistic. If these files are so dangerous then they need
> to just be removed entirely from the kernel. You aren't going to be
> able to patch grep for this.

Well, clearly not. Although accessing this file with grep, etc. is
really just another way root can shoot themselves in the foot, it
would be nice if this functionality could be provided in a way that
didn't leave this kind of exposed land mine.
On Mon, 2013-03-18 at 12:59 -0600, Alex Williamson wrote:
> On Mon, 2013-03-18 at 19:25 +0100, Bjørn Mork wrote:
> > Alex Williamson <alex.williamson@redhat.com> wrote:
> >
> > >On Mon, 2013-03-18 at 18:20 +0100, Bjørn Mork wrote:
> > >> Alex Williamson <alex.williamson@redhat.com> writes:
> > >>
> > >> > At least for KVM the kernel fix is the addition of the vfio driver which
> > >> > gives us a non-sysfs way to do this. If this problem was found a few
> > >> > years later and we were ready to make the switch I'd support just
> > >> > removing these resource files. In the meantime we have userspace that
> > >> > depends on this interface, so I'm open to suggestions how to fix it.
> > >>
> > >> I am puzzled by a couple of things in this discussion:
> > >>
> > >> 1) do you seriously mean that a userspace application (any, not just
> > >>    udevadm or qemu or whatever) should be able to read and write these
> > >>    registers while the device is owned by a driver? How is that ever
> > >>    going to work?
> > >
> > >The expectation is that the user doesn't mess with the device through
> > >pci-sysfs while it's running. This is really no different than config
> > >space or MMIO space in that respect.
> >
> > But it is. That's the problem. As a user I expect to be able to run
> > e.g. "grep . /sys/devices/whatever/*" with no ill effects. This holds
> > for config space or MMIO space. It does not for any reset-on-read
> > register.
>
> As a non-admin user you can
>
> > >You can use setpci to break your
> > >PCI card while it's used by the driver today. The difference is that
> > >MMIO spaces side-step the issue by only allowing mmap and config space
> > >is known not to have read side-effects.
> >
> > Yes. And that is why there is no problem exporting those. This
> > difference is fundamental.
>
> So how do we side-step the problem with I/O port registers? If we
> remove them then KVM needs to run with iopl which is a pretty serious
> security hole should QEMU be exploited. We could activate the resource
> files only when the device is bound to pci-assign, but that only limits
> the scope and might break UIO drivers. We could modify the file to have
> an enable sequence, but we can't do this without breaking current
> userspace. As I mentioned, the VFIO driver is intended to replace KVM's
> use of these files, but we're not ready to rip it out, perhaps not even
> ready to declare it deprecated.
>
> > >> 2) is it really so that a device can be so fundamentally screwed up by
> > >>    reading some registers, that a later driver probe cannot properly
> > >>    reinitialize it?
> > >
> > >Never underestimate how broken hardware can be,
> >
> > True :)
> >
> > > though in this case
> > >reading a device register seems to be causing a system hang/reset.
> >
> > I understand that it does so if the ahci driver is bound to the device
> > while reading the registers, but does it also hang the system with no
> > bound driver? How does it do that? By killing the bus?
>
> I don't know, Myron?

Yes - the system hangs when BAR1's (and likely BAR3's) I/O port space is
read.

Here are the details that I've been able to put together from the two
linux-pci threads and various online sources:

From Robert Hancock - "... BAR5 is the MMIO region used by the AHCI
driver. BARs 0-4 are the legacy SFF-compatible ATA ports. Nothing
should be messing with those IO ports while AHCI is enabled. ..." This
likely explains why the system boots and runs fine as long as the
'udevadm ...' command is *not* run (i.e. the driver itself never
accesses the I/O port BARs).

Using a SATA controller I have access to as an example for the details
(Note: I do not have access to a system with the Marvell 9125 device):

00:1f.2 SATA controller: Intel Corporation 5 Series/3400 Series Chipset 6 port SATA AHCI Controller (rev 06) (prog-if 01 [AHCI 1.0])
        Subsystem: Lenovo Device 2168
        Region 0: I/O ports at 1860 [size=8]
        Region 1: I/O ports at 1814 [size=4]
        Region 2: I/O ports at 1818 [size=8]
        Region 3: I/O ports at 1810 [size=4]
        Region 4: I/O ports at 1840 [size=32]
        Region 5: Memory at f2827000 (32-bit, non-prefetchable) [size=2K]

I/O port registers [1][2]:
  Primary IDE controller [0x1860-0x1867; 0x1814-0x1817]
  BAR0   Base address for the command block registers for ATA Channel X
           0x1860 (Read/Write): Data Register
           0x1861 (Read):       Error Register
           0x1861 (Write):      Features Register
           0x1862 (Read/Write): Sector Count Register
           0x1863 (Read/Write): LBA Low Register
           0x1864 (Read/Write): LBA Mid Register
           0x1865 (Read/Write): LBA High Register
           0x1866 (Read/Write): Drive/Head Register
           0x1867 (Read):       Status Register
           0x1867 (Write):      Command Register
  BAR1*  Base address for the control register for ATA Channel X
           0x1814               Reserved
           0x1815               Reserved
           0x1816 (Read):       Alternate Status Register
           0x1816 (Write):      Device Control Register
           0x1817               Reserved

  * The base must be Dword aligned; a PCI requirement. The Device Control
    and Alternate Status Registers are at offset 0x2 from this base.

[1] www.t13.org/documents/UploadedDocuments/project/d1510r1-Host-Adapter.pdf
[2] lateblt.tripod.com/atapi.htm

From Xiangliang - executing 'udevadm ...' causes a 32-bit I/O port read
to BAR1's region. This is shown by the BE (Byte Enable) value of 1111b
(all four bytes enabled). So apparently any read of this region that
includes one of the reserved bytes means "the chip will go bad."

So, only a Byte access at offset 2 is successful. I have not been able
to get any more details as to the exact cause of the hang. I would have
thought that the PCI transaction would have just timed out, or errored
out, or something, but apparently the platform ends up hanging.

It appears that this device did not implement the reserved registers
such that they would return 0 on reads, or anything similarly sane.

Since BARs 2 and 3 are not 0, indicating that the device also implements
a secondary channel, I expect the same issue will occur when accessing
BAR3. Again, I do not have access to a system with this device to test
with.

>
> > >> I would have thought that the solution to all this was to return -EINVAL
> > >> on any attempt to read or write these files while a driver is bound to
> > >> the device. If userspace is going to use the API, then the application
> > >> better unbind any driver first.
> > >>
> > >> Or? Am I missing something here?
> > >
> > >That doesn't really solve anything though. Let's pretend the resource
> > >files only work while the device is bound to pci-stub. Now what happens
> > >when you run this udevadm command as admin while it's in use by the
> > >userspace driver? All we've done is limit the scope of the problem.
> >
> > Assuming that the system hangs without driver help and that this
> > brokenness is widespread. I don't think any of those assumptions hold.
> > Do they?
>
> I thought it was true that for this device a system hang happened
> regardless of the host driver, but haven't seen the original bug report.
> As for widespread, this is the first I've heard of problems in the 2.5+
> years that we've supported these I/O port resource files. The rest is
> probably just FUD about random userspace apps trolling through device
> registers.
>
> > >> > If we want to blacklist this specific device, that's fine, but as others
> > >> > have pointed out it's really a class problem. Perhaps we report 1 byte
> > >> > extra for the file length where EOF-1 is an enable byte? Is there
> > >> > anything else in file ops that we could use to make it slightly more
> > >> > complicated than open(), read() to access the device? Thanks,
> > >>
> > >> If there really are devices which cannot handle reading at all, and
> > >> cannot be reset to a sane state by later driver initialization, then a
> > >> blacklist could be added for those devices. This should not be a common
> > >> problem.
> > >
> > >Yes, if these are dead registers, let's blacklist and move along. I
> > >suspect though that these registers probably work fine if you access
> > >them according to the device programming model, so blacklisting just
> > >prevents full use through something like KVM device assignment.
> >
> > Well, if the device is that broken then I think it will require the
> > kernel to police the device programming. I don't see how you can leave
> > a bomb like that because it might be useful in a rare and very
> > theoretical case.
> >
> > Easier to just blacklist it...
>
> Easier, yes. But it likely just kicks the problem down the road until
> the next device. Thanks,
>
> Alex
On Tue, 2013-03-19 at 10:57 -0600, Myron Stowe wrote:
> On Mon, 2013-03-18 at 12:59 -0600, Alex Williamson wrote:
> > On Mon, 2013-03-18 at 19:25 +0100, Bjørn Mork wrote:
[...]
> > > I understand that it does so if the ahci driver is bound to the device
> > > while reading the registers, but does it also hang the system with no
> > > bound driver? How does it do that? By killing the bus?
> >
> > I don't know, Myron?
>
> Yes - the system hangs when BAR1's (and likely BAR3's) I/O port space is
> read.

Sorry - that wasn't very explicit. Just accessing BAR1's region as
udevadm does is enough to hang the system - even when no driver is
bound.
diff --git a/src/udevadm-info.c b/src/udevadm-info.c
index ee9b59f..298acb5 100644
--- a/src/udevadm-info.c
+++ b/src/udevadm-info.c
@@ -37,13 +37,18 @@ static bool skip_attribute(const char *name)
                 "uevent",
                 "dev",
                 "modalias",
-                "resource",
                 "driver",
                 "subsystem",
                 "module",
         };
         unsigned int i;
 
+        /*
+         * Skip any sysfs 'resource' entries, including 'resource<N>' entries
+         * that correspond to a device's I/O Port or MMIO space backed BARs.
+         */
+        if (strncmp((const char *)name, "resource", sizeof("resource")-1) == 0)
+                return true;
         for (i = 0; i < ARRAY_SIZE(skip); i++)
                 if (strcmp(name, skip[i]) == 0)
                         return true;
Sysfs includes entries to memory that backs a PCI device's BARs, both I/O
Port space and MMIO. These memory regions correspond to the device's
internal status and control registers used to drive the device.

Accessing these registers from userspace, such as "udevadm info
--attribute-walk --path=/sys/devices/...", cannot be allowed, as
such accesses outside of the driver, even just reading, can yield
catastrophic consequences.

Udevadm-info skips parsing a specific set of sysfs entries including
'resource'. This patch extends the set to include the additional
'resource<N>' entries that correspond to a PCI device's BARs.

Reported-by: Xiangliang Yu <yuxiangl@marvell.com>
Signed-off-by: Myron Stowe <myron.stowe@redhat.com>
---

 src/udevadm-info.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)