Message ID: 20211116153446.317143-1-lukasz.gieryk@linux.intel.com
Series: hw/nvme: SR-IOV with Virtualization Enhancements
Hi Lukasz,

I've been through this. I have a couple of review comments, but overall
looks good for inclusion in nvme-next. Would be nice to get this in
early in the cycle so it can mature there for 7.0.

I'd like that we mark this support experimental, so we can easily do
some changes to how parameters work since I'm not sure we completely
agree on that yet.

By the way, in the future, please add me and Keith as CCs on the entire
series so we get CC'ed on replies to the cover-letter ;)

On Nov 16 16:34, Łukasz Gieryk wrote:
> This is the updated version of SR-IOV support for the NVMe device.
>
> Changes since v1:
> - Dropped the "pcie: Set default and supported MaxReadReq to 512" patch.
>   The original author agrees it may no longer be needed for recent
>   kernels.
> - Dropped the "pcie: Add callback preceding SR-IOV VFs update" patch.
>   A customized pc->config_write callback is used instead.
> - Split the "hw/nvme: Calculate BAR attributes in a function" patch into
>   cleanup and update parts.
> - Reworked the configuration options related to SR-IOV.
> - Fixed nvme_update_msixcap_ts() for platforms without MSI-X support.
> - Updated handling of SUBSYS_SLOT_RSVD values in nvme_subsys_ctrl().
> - Updated the error codes returned from the Virtualization Management
>   command (DNR is set).
> - Fixed a typedef/enum name mismatch.
> - A few other minor tweaks and improvements.
>
> List of known gaps and nice-to-haves:
>
> 1) Interaction of secondary controllers with namespaces is not 100%
> following the spec
>
> The limitation: a VF has to be visible on the PCI bus first, and only
> then can such a VF have a namespace attached.
>

Looking at the spec I'm not even sure what the expected behavior is
supposed to be, can you elaborate? I rebased this on latest, and with
Hannes' changes, shared namespaces will be attached by default, which
seems to be reasonable.

> The problem is that the mapping between a controller and its attached
> namespaces is stored in the NvmeCtrl object, and an unrealized VF
> doesn't have this object allocated. There are multiple ways to address
> the issue, but none of those we've considered so far is elegant. The
> fact that the namespace-related code area is busy (pending patches,
> some rework?) doesn't help either.
>
> 2) VFs report and support the same features as the parent PF
>
> For security reasons, the user of a VF should not be allowed to
> interact with other VFs. Essentially, VFs should not announce and
> support: Namespace Management, Attachment, the corresponding Identify
> commands, and maybe other features as well.
>

This can be relatively easily fixed I think. I have a patch that
already does this for Administrative controller types and it should be
applicable here as well. I can follow up with that (see the sketch
after this mail).

> 3) PMR and CMB must be disabled when SR-IOV is active
>
> The main concern is whether PMR/CMB should even be implemented for a
> device that supports SR-IOV. If the answer is yes, then another
> question arises: how should the feature behave? Simulation of
> different devices may require different implementations.
>
> It's too early to get into such details, so we've decided to disallow
> both features altogether if SR-IOV is enabled.
>

Reasonable enough for now.

> 4) The Private Resources Mode
>
> The NVMe Spec defines Flexible Resources as an optional feature. VFs
> can alternatively support a fixed number of queues/interrupts.
>
> This SR-IOV implementation supports Flexible Resources with the
> Virtualization Management command, and a fallback to Private Resources
> is not available. Support for such a fallback, if there's demand, can
> be implemented later.
>

Acceptable.

> 5) Support for the Namespace Management command
>
> A device with virtualization enhancements must support the Namespace
> Management command. The command is not absolutely necessary to use
> SR-IOV, but for a more complete set of features you may want to
> cherry-pick this patch:
> https://lists.gnu.org/archive/html/qemu-devel/2021-08/msg03107.html
> together with this fix:
> https://lists.gnu.org/archive/html/qemu-devel/2021-10/msg06734.html
>

I think we can live with just Namespace Attachment support for now,
until the above patches are accepted.
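A minimal, self-contained C sketch of the capability masking idea
discussed under point 2 above. This is not the actual QEMU patch: the
struct, the helper name, and the is_vf flag are simplified stand-ins,
and only the Namespace Management/Attachment bit of OACS (bit 3 in the
NVMe specification) is shown.

    /* Sketch only: simplified stand-ins for the QEMU structures. */
    #include <stdbool.h>
    #include <stdint.h>

    /* OACS bit 3 = Namespace Management and Namespace Attachment. */
    #define OACS_NS_MGMT (1u << 3)

    struct id_ctrl_sketch {
        uint16_t oacs;   /* Optional Admin Command Support */
        /* ... remaining Identify Controller fields omitted ... */
    };

    /* Clear capabilities that a secondary (VF) controller must not
     * expose, so the user of a VF cannot manage namespaces belonging
     * to other functions. */
    static void mask_vf_caps(struct id_ctrl_sketch *id, bool is_vf)
    {
        if (is_vf) {
            id->oacs &= (uint16_t)~OACS_NS_MGMT;
        }
    }

In the device model, the equivalent check would presumably be applied
when building the Identify Controller data for a secondary controller.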
On Wed, Nov 24, 2021 at 09:03:06AM +0100, Klaus Jensen wrote:
> Hi Lukasz,
>
> I've been through this. I have a couple of review comments, but overall
> looks good for inclusion in nvme-next. Would be nice to get this in
> early in the cycle so it can mature there for 7.0.

We (I'm speaking on behalf of the other Lukasz) are really happy to
read that. We will do our best to make it happen.

>
> I'd like that we mark this support experimental, so we can easily do
> some changes to how parameters work since I'm not sure we completely
> agree on that yet.
>
> By the way, in the future, please add me and Keith as CCs on the entire
> series so we get CC'ed on replies to the cover-letter ;)
>
> > List of known gaps and nice-to-haves:
> >
> > 1) Interaction of secondary controllers with namespaces is not 100%
> > following the spec
> >
> > The limitation: a VF has to be visible on the PCI bus first, and only
> > then can such a VF have a namespace attached.
> >
>
> Looking at the spec I'm not even sure what the expected behavior is
> supposed to be, can you elaborate? I rebased this on latest, and with
> Hannes' changes, shared namespaces will be attached by default, which
> seems to be reasonable.

An example flow:

# Release flexible resources from the PF (assuming it's /dev/nvme0)
nvme virt-mgmt -c 0 -r 0 -n 0 -a 1 /dev/nvme0
nvme virt-mgmt -c 0 -r 1 -n 0 -a 1 /dev/nvme0
echo 1 > /sys/class/nvme/nvme0/reset_controller

# Bind sane minimums to VF1 (cntlid=1) and set it online
nvme virt-mgmt -c 1 -r 0 -n 5 -a 8 /dev/nvme0
nvme virt-mgmt -c 1 -r 1 -n 5 -a 8 /dev/nvme0
nvme virt-mgmt -c 1 -a 9 /dev/nvme0

# Enable 2 VFs
echo 2 > /sys/bus/pci/devices/<PF's id>/sriov_numvfs

# PF, VF1 and VF2 are visible on PCI
lspci | grep Non-Volatile

# The NVMe driver is bound to the PF and VF1 (the only online VF)
nvme list -v

# VFs shall eventually not support NS Management/Attachment commands,
# and namespaces should be attached to VFs (i.e., their secondary
# controllers) through the PF.

# A namespace can be attached to VF1, VF2
nvme attach-ns /dev/nvme0 -c 1 -n X
nvme attach-ns /dev/nvme0 -c 2 -n X

# According to the spec this should also succeed, but today it won't
nvme attach-ns /dev/nvme0 -c 3 -n X

VF3's NvmeCtrl object is not yet allocated, so today there's nothing
for nvme_subsys_ctrl() to return for cntlid=3, besides NULL (the
current behavior) or SUBSYS_SLOT_RSVD.

The relevant use cases:
- an admin can configure disabled VFs,
- information about attached namespaces persists while VFs are disabled,
are not that critical, but of course it's a discrepancy from what a
real device can handle.

In my opinion, to handle these cases correctly, the information about
attached namespaces could be moved to the subsystem. Could you share
your thoughts on whether such an approach would make sense?
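To make the failing attach-ns case above more concrete, here is a
small, self-contained C sketch of the lookup problem. The types are
hypothetical; SUBSYS_SLOT_RSVD is the name used in the series, but the
sentinel value and everything else below are illustrative only.

    #include <stddef.h>
    #include <stdint.h>

    #define MAX_CTRLS 256

    typedef struct CtrlSketch {
        uint16_t cntlid;
        /* ... per-controller state, which today also holds the
         * attached-namespace mapping ... */
    } CtrlSketch;

    /* Slot reserved for a VF that has not been realized (enabled) yet. */
    #define SUBSYS_SLOT_RSVD ((CtrlSketch *)-1)

    typedef struct SubsysSketch {
        CtrlSketch *ctrls[MAX_CTRLS];
    } SubsysSketch;

    /* Roughly the decision nvme_subsys_ctrl() has to make today: a
     * reserved (unrealized) slot has no controller object, so an Attach
     * targeting that cntlid has nowhere to record the mapping. */
    static CtrlSketch *subsys_ctrl(SubsysSketch *s, uint16_t cntlid)
    {
        if (cntlid >= MAX_CTRLS || s->ctrls[cntlid] == SUBSYS_SLOT_RSVD) {
            return NULL;
        }
        return s->ctrls[cntlid];
    }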
On Nov 25 15:15, Łukasz Gieryk wrote:
> On Wed, Nov 24, 2021 at 09:03:06AM +0100, Klaus Jensen wrote:
> > Hi Lukasz,
> >
> > I've been through this. I have a couple of review comments, but overall
> > looks good for inclusion in nvme-next. Would be nice to get this in
> > early in the cycle so it can mature there for 7.0.
>
> We (I'm speaking on behalf of the other Lukasz) are really happy to
> read that. We will do our best to make it happen.
>

Keith, do you have any comments on this series - otherwise I'd like to
stage this for nvme-next come January.

> >
> > I'd like that we mark this support experimental, so we can easily do
> > some changes to how parameters work since I'm not sure we completely
> > agree on that yet.
> >
> > By the way, in the future, please add me and Keith as CCs on the entire
> > series so we get CC'ed on replies to the cover-letter ;)
> >
> > > List of known gaps and nice-to-haves:
> > >
> > > 1) Interaction of secondary controllers with namespaces is not 100%
> > > following the spec
> > >
> > > The limitation: a VF has to be visible on the PCI bus first, and only
> > > then can such a VF have a namespace attached.
> > >
> >
> > Looking at the spec I'm not even sure what the expected behavior is
> > supposed to be, can you elaborate? I rebased this on latest, and with
> > Hannes' changes, shared namespaces will be attached by default, which
> > seems to be reasonable.
>
> An example flow:
>
> # Release flexible resources from the PF (assuming it's /dev/nvme0)
> nvme virt-mgmt -c 0 -r 0 -n 0 -a 1 /dev/nvme0
> nvme virt-mgmt -c 0 -r 1 -n 0 -a 1 /dev/nvme0
> echo 1 > /sys/class/nvme/nvme0/reset_controller
>
> # Bind sane minimums to VF1 (cntlid=1) and set it online
> nvme virt-mgmt -c 1 -r 0 -n 5 -a 8 /dev/nvme0
> nvme virt-mgmt -c 1 -r 1 -n 5 -a 8 /dev/nvme0
> nvme virt-mgmt -c 1 -a 9 /dev/nvme0
>
> # Enable 2 VFs
> echo 2 > /sys/bus/pci/devices/<PF's id>/sriov_numvfs
>
> # PF, VF1 and VF2 are visible on PCI
> lspci | grep Non-Volatile
>
> # The NVMe driver is bound to the PF and VF1 (the only online VF)
> nvme list -v
>
> # VFs shall eventually not support NS Management/Attachment commands,
> # and namespaces should be attached to VFs (i.e., their secondary
> # controllers) through the PF.
>
> # A namespace can be attached to VF1, VF2
> nvme attach-ns /dev/nvme0 -c 1 -n X
> nvme attach-ns /dev/nvme0 -c 2 -n X
>
> # According to the spec this should also succeed, but today it won't
> nvme attach-ns /dev/nvme0 -c 3 -n X
>
> VF3's NvmeCtrl object is not yet allocated, so today there's nothing
> for nvme_subsys_ctrl() to return for cntlid=3, besides NULL (the
> current behavior) or SUBSYS_SLOT_RSVD.
>
> The relevant use cases:
> - an admin can configure disabled VFs,
> - information about attached namespaces persists while VFs are disabled,
> are not that critical, but of course it's a discrepancy from what a
> real device can handle.
>
> In my opinion, to handle these cases correctly, the information about
> attached namespaces could be moved to the subsystem. Could you share
> your thoughts on whether such an approach would make sense?
>

Thanks for the explanation. I actually already had this sort-of
conversation with Hannes [1]. Different issue, but same solution (that
is, having a per-CNTLID namespace list maintained in the subsystem).

I have a refactor series coming up that will address this, so for now,
I don't worry about this not being implemented exactly as the spec
defines.

[1]: https://lore.kernel.org/all/20210909094308.122038-1-hare@suse.de/t/#u
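A rough, hypothetical sketch of the per-CNTLID namespace list mentioned
above: attachment state kept in the subsystem rather than in NvmeCtrl,
so it survives VFs being disabled. The names, table sizes, and return
conventions below are assumptions, not the actual refactor.

    #include <stdbool.h>
    #include <stdint.h>

    #define MAX_CTRLS 256
    #define MAX_NSID  256

    typedef struct SubsysNsMapSketch {
        /* attached[cntlid][nsid] is true when namespace nsid is attached
         * to the controller with that cntlid, whether or not the
         * corresponding VF is currently enabled on the PCI bus. */
        bool attached[MAX_CTRLS][MAX_NSID + 1];
    } SubsysNsMapSketch;

    static bool subsys_attach_ns(SubsysNsMapSketch *m, uint16_t cntlid,
                                 uint32_t nsid)
    {
        if (cntlid >= MAX_CTRLS || nsid == 0 || nsid > MAX_NSID) {
            return false;  /* the real device would return an NVMe status */
        }
        m->attached[cntlid][nsid] = true;
        return true;
    }

    static bool subsys_ns_attached(const SubsysNsMapSketch *m,
                                   uint16_t cntlid, uint32_t nsid)
    {
        return cntlid < MAX_CTRLS && nsid != 0 && nsid <= MAX_NSID &&
               m->attached[cntlid][nsid];
    }

With a map like this, a namespace could be attached to a disabled VF's
cntlid and the attachment would become effective once the VF is enabled.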
On Mon, Dec 20, 2021 at 08:12:06AM +0100, Klaus Jensen wrote:
> On Nov 25 15:15, Łukasz Gieryk wrote:
> > On Wed, Nov 24, 2021 at 09:03:06AM +0100, Klaus Jensen wrote:
> > > Hi Lukasz,
> > >
> > > I've been through this. I have a couple of review comments, but overall
> > > looks good for inclusion in nvme-next. Would be nice to get this in
> > > early in the cycle so it can mature there for 7.0.
> >
> > We (I'm speaking on behalf of the other Lukasz) are really happy to
> > read that. We will do our best to make it happen.
> >
>
> Keith, do you have any comments on this series - otherwise I'd like to
> stage this for nvme-next come January.
>

Please hold off on the staging - we are about to release v3. We have a
few fixes that will make the device more compliant with the NVMe and
SR-IOV specs.