Message ID | 20241011202929.11611-2-nifan.cxl@gmail.com |
---|---|
State | New |
Series | [QEMU,RFC] hw/mem/cxl_type3: add guard to avoid event log overflow during a DC extent add/release request |
On Fri, 11 Oct 2024 13:24:50 -0700
nifan.cxl@gmail.com wrote:

> From: Fan Ni <fan.ni@samsung.com>
>
> One DC extent add/release request can take multiple DC extents.
> For each extent in the request, one DC event record will be generated and
> inserted into the event log. All the event records for the request will be
> grouped with the More flag (see CXL spec r3.1, Table 8-168 and 8-170).
> If an overflow happens during the process, the yet-to-insert records will
> get lost, leaving the device in a situation where it notifies the host
> of only part of the extents involved, and the host never surfaces the
> extents received and keeps waiting for the remaining extents.

Interesting corner. For other 'events' an overflow is natural because
they can be out of the control of the device. This artificial limit
was to trigger the overflow handling in those cases. For this one I'd expect
the device to push back on the fabric management commands, or handle the
event log filling so overflow doesn't happen.

>
> Add a check in qmp_cxl_process_dynamic_capacity_prescriptive and ensure
> the event log does not overflow during the process.
>
> Currently we check the number of extents involved against the event
> overflow threshold. Do we need to tighten the check and compare with
> the remaining spots available in the event log?

Yes. I think we need to prevent other outstanding events causing us trouble.

Is it useful to support the case where we have more than one
group of extents outstanding? If not, we could simply fail the add whenever
that happens. Maybe that is a reasonable stopgap until we have a reason
to care about that case. We probably care when we have FM-API hooked up
to this and want to test more advanced fabric management stuff, or poke
a corner of the kernel code perhaps?

I guess from a 'would it be right if a device did this' point of view the
answer may be yes, but that doesn't mean Linux is going to support such a
device (at least not until we know they really exist). Ira, what do you think
about this corner case? Maybe detect and scream if we aren't already?

Jonathan

>
> Signed-off-by: Fan Ni <fan.ni@samsung.com>
> ---
>  hw/cxl/cxl-events.c         | 2 --
>  hw/mem/cxl_type3.c          | 7 +++++++
>  include/hw/cxl/cxl_events.h | 3 +++
>  3 files changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/hw/cxl/cxl-events.c b/hw/cxl/cxl-events.c
> index 12dee2e467..05d8aae627 100644
> --- a/hw/cxl/cxl-events.c
> +++ b/hw/cxl/cxl-events.c
> @@ -16,8 +16,6 @@
>  #include "hw/cxl/cxl.h"
>  #include "hw/cxl/cxl_events.h"
>
> -/* Artificial limit on the number of events a log can hold */
> -#define CXL_TEST_EVENT_OVERFLOW 8
>
>  static void reset_overflow(CXLEventLog *log)
>  {
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index 3d7289fa84..32668df365 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -2015,6 +2015,13 @@ static void qmp_cxl_process_dynamic_capacity_prescriptive(const char *path,
>          num_extents++;
>      }
>
> +    if (num_extents > CXL_TEST_EVENT_OVERFLOW) {
> +        error_setg(errp,
> +                   "at most %d extents allowed in one add/release request",
> +                   CXL_TEST_EVENT_OVERFLOW);
> +        return;
> +    }
> +
>      /* Create extent list for event being passed to host */
>      i = 0;
>      list = records;
> diff --git a/include/hw/cxl/cxl_events.h b/include/hw/cxl/cxl_events.h
> index 38cadaa0f3..2a6b57e3e6 100644
> --- a/include/hw/cxl/cxl_events.h
> +++ b/include/hw/cxl/cxl_events.h
> @@ -12,6 +12,9 @@
>
>  #include "qemu/uuid.h"
>
> +/* Artificial limit on the number of events a log can hold */
> +#define CXL_TEST_EVENT_OVERFLOW 8
> +
>  /*
>   * CXL r3.1 section 8.2.9.2.2: Get Event Records (Opcode 0100h); Table 8-52
>   *
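For illustration only, the stopgap Jonathan suggests above (fail the add whenever a group of extents is already outstanding) could look roughly like the sketch below. This is a toy model, not QEMU code: ToyDcLog, toy_dc_request_admit() and toy_dc_group_cleared() are hypothetical names, and how the real device model would learn that the host has cleared a group is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy state; this is NOT QEMU's CXLEventLog. */
typedef struct ToyDcLog {
    /* Extent groups queued to the host and not yet cleared by it. */
    size_t pending_groups;
} ToyDcLog;

/*
 * Admit a DC add/release request only when no earlier group of extents
 * is still outstanding. Returns true if the request was accepted.
 */
static bool toy_dc_request_admit(ToyDcLog *log, size_t num_extents)
{
    assert(log != NULL);

    if (num_extents == 0) {
        return false;               /* nothing to queue */
    }
    if (log->pending_groups > 0) {
        return false;               /* stopgap: one group at a time */
    }
    log->pending_groups++;
    return true;
}

/* Called (in this toy model) once the host has cleared a whole group. */
static void toy_dc_group_cleared(ToyDcLog *log)
{
    assert(log != NULL);

    if (log->pending_groups > 0) {
        log->pending_groups--;
    }
}
```

With this policy a second request is refused until the first group has been cleared, which sidesteps the question of how several outstanding groups should share the log.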
On Mon, Oct 14, 2024 at 12:23:22PM +0100, Jonathan Cameron wrote:
> On Fri, 11 Oct 2024 13:24:50 -0700
> nifan.cxl@gmail.com wrote:
>
> > From: Fan Ni <fan.ni@samsung.com>
> >
> > One DC extent add/release request can take multiple DC extents.
> > For each extent in the request, one DC event record will be generated and
> > inserted into the event log. All the event records for the request will be
> > grouped with the More flag (see CXL spec r3.1, Table 8-168 and 8-170).
> > If an overflow happens during the process, the yet-to-insert records will
> > get lost, leaving the device in a situation where it notifies the host
> > of only part of the extents involved, and the host never surfaces the
> > extents received and keeps waiting for the remaining extents.
>
> Interesting corner. For other 'events' an overflow is natural because
> they can be out of the control of the device. This artificial limit
> was to trigger the overflow handling in those cases. For this one I'd expect
> the device to push back on the fabric management commands, or handle the
> event log filling so overflow doesn't happen.
>
> >
> > Add a check in qmp_cxl_process_dynamic_capacity_prescriptive and ensure
> > the event log does not overflow during the process.
> >
> > Currently we check the number of extents involved against the event
> > overflow threshold. Do we need to tighten the check and compare with
> > the remaining spots available in the event log?
>
> Yes. I think we need to prevent other outstanding events causing us trouble.
>
> Is it useful to support the case where we have more than one
> group of extents outstanding? If not, we could simply fail the add whenever
> that happens. Maybe that is a reasonable stopgap until we have a reason
> to care about that case. We probably care when we have FM-API hooked up
> to this and want to test more advanced fabric management stuff, or poke
> a corner of the kernel code perhaps?

As long as the last record, with the More flag cleared, is put in the log,
the kernel is able to handle it and clear the log after it finishes
processing. The only issue I can see now is that the last event cannot be
inserted into the log due to overflow, so I think that as long as we have
enough space to hold all the records of a request in the log, it would be
enough, regardless of whether the log already has some outstanding extents.

>
> I guess from a 'would it be right if a device did this' point of view the
> answer may be yes, but that doesn't mean Linux is going to support such a
> device (at least not until we know they really exist). Ira, what do you think
> about this corner case? Maybe detect and scream if we aren't already?

Any thoughts, Ira?

Fan

>
> Jonathan
>
> >
> > Signed-off-by: Fan Ni <fan.ni@samsung.com>
> > ---
> >  hw/cxl/cxl-events.c         | 2 --
> >  hw/mem/cxl_type3.c          | 7 +++++++
> >  include/hw/cxl/cxl_events.h | 3 +++
> >  3 files changed, 10 insertions(+), 2 deletions(-)
> >
> > diff --git a/hw/cxl/cxl-events.c b/hw/cxl/cxl-events.c
> > index 12dee2e467..05d8aae627 100644
> > --- a/hw/cxl/cxl-events.c
> > +++ b/hw/cxl/cxl-events.c
> > @@ -16,8 +16,6 @@
> >  #include "hw/cxl/cxl.h"
> >  #include "hw/cxl/cxl_events.h"
> >
> > -/* Artificial limit on the number of events a log can hold */
> > -#define CXL_TEST_EVENT_OVERFLOW 8
> >
> >  static void reset_overflow(CXLEventLog *log)
> >  {
> > diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> > index 3d7289fa84..32668df365 100644
> > --- a/hw/mem/cxl_type3.c
> > +++ b/hw/mem/cxl_type3.c
> > @@ -2015,6 +2015,13 @@ static void qmp_cxl_process_dynamic_capacity_prescriptive(const char *path,
> >          num_extents++;
> >      }
> >
> > +    if (num_extents > CXL_TEST_EVENT_OVERFLOW) {
> > +        error_setg(errp,
> > +                   "at most %d extents allowed in one add/release request",
> > +                   CXL_TEST_EVENT_OVERFLOW);
> > +        return;
> > +    }
> > +
> >      /* Create extent list for event being passed to host */
> >      i = 0;
> >      list = records;
> > diff --git a/include/hw/cxl/cxl_events.h b/include/hw/cxl/cxl_events.h
> > index 38cadaa0f3..2a6b57e3e6 100644
> > --- a/include/hw/cxl/cxl_events.h
> > +++ b/include/hw/cxl/cxl_events.h
> > @@ -12,6 +12,9 @@
> >
> >  #include "qemu/uuid.h"
> >
> > +/* Artificial limit on the number of events a log can hold */
> > +#define CXL_TEST_EVENT_OVERFLOW 8
> > +
> >  /*
> >   * CXL r3.1 section 8.2.9.2.2: Get Event Records (Opcode 0100h); Table 8-52
> >   *
>
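As a rough sketch of the tighter check discussed in this reply (compare the request against the space still free in the log rather than against the total limit), something like the following would do. It is a standalone toy and an assumption about the shape of the check, not the actual QEMU change: ToyEventLog, toy_log_free_slots() and toy_request_fits() are hypothetical, and only the CXL_TEST_EVENT_OVERFLOW value is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same value as in the patch; everything else here is hypothetical. */
#define CXL_TEST_EVENT_OVERFLOW 8

/* Toy stand-in for an event log: only tracks how many records it holds. */
typedef struct ToyEventLog {
    size_t nr_records;
} ToyEventLog;

/* Number of records that can still be inserted before overflow. */
static size_t toy_log_free_slots(const ToyEventLog *log)
{
    assert(log->nr_records <= CXL_TEST_EVENT_OVERFLOW);
    return CXL_TEST_EVENT_OVERFLOW - log->nr_records;
}

/*
 * A request generates one event record per extent, so it fits only if
 * every record, including the last one with More cleared, has a slot.
 */
static bool toy_request_fits(const ToyEventLog *log, size_t num_extents)
{
    return num_extents <= toy_log_free_slots(log);
}
```

With five records already outstanding, a three-extent request fits and a four-extent request is rejected, whereas the check in the posted patch would accept both.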
diff --git a/hw/cxl/cxl-events.c b/hw/cxl/cxl-events.c
index 12dee2e467..05d8aae627 100644
--- a/hw/cxl/cxl-events.c
+++ b/hw/cxl/cxl-events.c
@@ -16,8 +16,6 @@
 #include "hw/cxl/cxl.h"
 #include "hw/cxl/cxl_events.h"
 
-/* Artificial limit on the number of events a log can hold */
-#define CXL_TEST_EVENT_OVERFLOW 8
 
 static void reset_overflow(CXLEventLog *log)
 {
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 3d7289fa84..32668df365 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -2015,6 +2015,13 @@ static void qmp_cxl_process_dynamic_capacity_prescriptive(const char *path,
         num_extents++;
     }
 
+    if (num_extents > CXL_TEST_EVENT_OVERFLOW) {
+        error_setg(errp,
+                   "at most %d extents allowed in one add/release request",
+                   CXL_TEST_EVENT_OVERFLOW);
+        return;
+    }
+
     /* Create extent list for event being passed to host */
     i = 0;
     list = records;
diff --git a/include/hw/cxl/cxl_events.h b/include/hw/cxl/cxl_events.h
index 38cadaa0f3..2a6b57e3e6 100644
--- a/include/hw/cxl/cxl_events.h
+++ b/include/hw/cxl/cxl_events.h
@@ -12,6 +12,9 @@
 
 #include "qemu/uuid.h"
 
+/* Artificial limit on the number of events a log can hold */
+#define CXL_TEST_EVENT_OVERFLOW 8
+
 /*
  * CXL r3.1 section 8.2.9.2.2: Get Event Records (Opcode 0100h); Table 8-52
  *
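To make the failure described in the commit message concrete, here is a small standalone model of it. It assumes, as the thread describes, that every record of a group except the last carries the More flag and that records which do not fit are dropped. ToyLog, ToyRecord, toy_log_insert() and toy_queue_group() are hypothetical names and do not reflect the real event log layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_LOG_CAPACITY 8          /* mirrors CXL_TEST_EVENT_OVERFLOW */

/* Hypothetical record: only the More flag matters for this sketch. */
typedef struct ToyRecord {
    bool more;
} ToyRecord;

typedef struct ToyLog {
    ToyRecord rec[TOY_LOG_CAPACITY];
    size_t used;
    bool overflow;
} ToyLog;

/* Insert one record; on a full log the record is dropped. */
static bool toy_log_insert(ToyLog *log, ToyRecord r)
{
    if (log->used == TOY_LOG_CAPACITY) {
        log->overflow = true;
        return false;
    }
    log->rec[log->used++] = r;
    return true;
}

/*
 * Queue one record per extent. Returns true only if the terminating
 * record (More cleared) made it into the log.
 */
static bool toy_queue_group(ToyLog *log, size_t num_extents)
{
    bool terminated = false;

    for (size_t i = 0; i < num_extents; i++) {
        ToyRecord r = { .more = (i + 1 < num_extents) };

        if (toy_log_insert(log, r) && !r.more) {
            terminated = true;
        }
    }
    return terminated;
}
```

Queuing nine extents into an empty eight-slot log leaves eight records that all have More set and no terminating record, which is the state the guard in the patch is meant to rule out.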