Message ID | 20240205103602epcms2p8543d4f3a4bfb684c81f07a94627c7aef@epcms2p8 |
---|---|
State | New, archived |
Headers | show |
Series | cxl: question about cxl qos_class verification | expand |
On 2/5/24 3:36 AM, Wonjae Lee wrote: > Hello, > > To test the CXL driver with respect to QTG IDs on a real CXL device, I > connected one CXL device to a CXL 2.0 Compliant System (v6.8-rc3). > > However, during cxl endpoint probing, CDAT extraction and parsing works > fine, but cxl_qos_class_verify() for cxlmd does not run properly. > > To be precise, when cxl_qos_class_verify() is executed, the below error > handling code is executed since cxlmd->endpoint is NULL: > > if (!cxl_root) > return -ENODEV; > > > I'm not sure if I analyzed it correctly due to the complexity of the CXL > driver, but I think it's because the cxl_port driver execute > cxl_qos_class_verify() before cxlmd->endpoint = endpoint was executed in > the cxl_mem driver. See the dmesg log below, where I've added debugging > code. > > # cxl_mem driver is adding the endpoint > [] cxl_mem mem0: call devm_cxl_add_enpoint > ... > # endpoint port is probed, and cxl_qos_class_verify() runs > [] cxl_port endpoint5: call cxl_qos_class_verify > [] cxl_mem mem0: cxl_qos_class_verify: cxlmd->endpoint is NULL > [] cxl_mem mem0: cxl_qos_class_verify: cxl_root is NULL > ... > # cxl_mem driver sets cxlmd->endpoint > [] cxl_mem mem0: cxl_endpoint_autoremove: cxlmd->endpoint = endpoint > ... > > > I did an experiment to validate the hypothesis. If I call > cxl_endpoint_parse_cdat() after cxlmd->endpoint is set, > cxl_qos_class_verify() runs well without problems. > > diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c > index c5c9d8e0d88d..33b39c6c46fd 100644 > --- a/drivers/cxl/mem.c > +++ b/drivers/cxl/mem.c > @@ -74,6 +74,8 @@ static int devm_cxl_add_endpoint(struct device *host, struct cxl_memdev *cxlmd, > if (rc) > return rc; > > + cxl_endpoint_parse_cdat(endpoint); > + > if (!endpoint->dev.driver) { > dev_err(&cxlmd->dev, "%s failed probe\n", > dev_name(&endpoint->dev)); > diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c > index 97c21566677a..ee77aba62780 100644 > --- a/drivers/cxl/port.c > +++ b/drivers/cxl/port.c > @@ -111,7 +111,6 @@ static int cxl_endpoint_port_probe(struct cxl_port *port) > > /* Cache the data early to ensure is_visible() works */ > read_cdat_data(port); > - cxl_endpoint_parse_cdat(port); > > get_device(&cxlmd->dev); > rc = devm_add_action_or_reset(&port->dev, schedule_detach, cxlmd); > > > Maybe there's something I'm missing. It would be very helpful if anyone > could comment on the above analysis. I think this should fix the issue you are seeing? https://lore.kernel.org/linux-cxl/b243e80f-1b24-4756-8bb3-8389d66ea13a@intel.com/T/#mcbce77b6584bd1031d6c1928fcb36fe67be66039 > > Thanks, > Wonjae
On Mon, Feb 05, 2024 at 03:31:35PM -0700, Dave Jiang wrote: > > > On 2/5/24 3:36 AM, Wonjae Lee wrote: > > Hello, > > > > To test the CXL driver with respect to QTG IDs on a real CXL device, I > > connected one CXL device to a CXL 2.0 Compliant System (v6.8-rc3). > > > > However, during cxl endpoint probing, CDAT extraction and parsing works > > fine, but cxl_qos_class_verify() for cxlmd does not run properly. > > > > To be precise, when cxl_qos_class_verify() is executed, the below error > > handling code is executed since cxlmd->endpoint is NULL: > > > > if (!cxl_root) > > return -ENODEV; > > > > > > I'm not sure if I analyzed it correctly due to the complexity of the CXL > > driver, but I think it's because the cxl_port driver execute > > cxl_qos_class_verify() before cxlmd->endpoint = endpoint was executed in > > the cxl_mem driver. See the dmesg log below, where I've added debugging > > code. > > > > # cxl_mem driver is adding the endpoint > > [] cxl_mem mem0: call devm_cxl_add_enpoint > > ... > > # endpoint port is probed, and cxl_qos_class_verify() runs > > [] cxl_port endpoint5: call cxl_qos_class_verify > > [] cxl_mem mem0: cxl_qos_class_verify: cxlmd->endpoint is NULL > > [] cxl_mem mem0: cxl_qos_class_verify: cxl_root is NULL > > ... > > # cxl_mem driver sets cxlmd->endpoint > > [] cxl_mem mem0: cxl_endpoint_autoremove: cxlmd->endpoint = endpoint > > ... > > > > > > I did an experiment to validate the hypothesis. If I call > > cxl_endpoint_parse_cdat() after cxlmd->endpoint is set, > > cxl_qos_class_verify() runs well without problems. > > > > diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c > > index c5c9d8e0d88d..33b39c6c46fd 100644 > > --- a/drivers/cxl/mem.c > > +++ b/drivers/cxl/mem.c > > @@ -74,6 +74,8 @@ static int devm_cxl_add_endpoint(struct device *host, struct cxl_memdev *cxlmd, > > if (rc) > > return rc; > > > > + cxl_endpoint_parse_cdat(endpoint); > > + > > if (!endpoint->dev.driver) { > > dev_err(&cxlmd->dev, "%s failed probe\n", > > dev_name(&endpoint->dev)); > > diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c > > index 97c21566677a..ee77aba62780 100644 > > --- a/drivers/cxl/port.c > > +++ b/drivers/cxl/port.c > > @@ -111,7 +111,6 @@ static int cxl_endpoint_port_probe(struct cxl_port *port) > > > > /* Cache the data early to ensure is_visible() works */ > > read_cdat_data(port); > > - cxl_endpoint_parse_cdat(port); > > > > get_device(&cxlmd->dev); > > rc = devm_add_action_or_reset(&port->dev, schedule_detach, cxlmd); > > > > > > Maybe there's something I'm missing. It would be very helpful if anyone > > could comment on the above analysis. > > I think this should fix the issue you are seeing? > > https://lore.kernel.org/linux-cxl/b243e80f-1b24-4756-8bb3-8389d66ea13a@intel.com/T/#mcbce77b6584bd1031d6c1928fcb36fe67be66039 > Oh, you've already fixed it. I found that on the same testbed the commit you mentioned resolves the issue. Thanks for your response! > > > > > Thanks, > > Wonjae >
On 2/5/24 6:23 PM, Wonjae Lee wrote: > On Mon, Feb 05, 2024 at 03:31:35PM -0700, Dave Jiang wrote: >> >> >> On 2/5/24 3:36 AM, Wonjae Lee wrote: >>> Hello, >>> >>> To test the CXL driver with respect to QTG IDs on a real CXL device, I >>> connected one CXL device to a CXL 2.0 Compliant System (v6.8-rc3). >>> >>> However, during cxl endpoint probing, CDAT extraction and parsing works >>> fine, but cxl_qos_class_verify() for cxlmd does not run properly. >>> >>> To be precise, when cxl_qos_class_verify() is executed, the below error >>> handling code is executed since cxlmd->endpoint is NULL: >>> >>> if (!cxl_root) >>> return -ENODEV; >>> >>> >>> I'm not sure if I analyzed it correctly due to the complexity of the CXL >>> driver, but I think it's because the cxl_port driver execute >>> cxl_qos_class_verify() before cxlmd->endpoint = endpoint was executed in >>> the cxl_mem driver. See the dmesg log below, where I've added debugging >>> code. >>> >>> # cxl_mem driver is adding the endpoint >>> [] cxl_mem mem0: call devm_cxl_add_enpoint >>> ... >>> # endpoint port is probed, and cxl_qos_class_verify() runs >>> [] cxl_port endpoint5: call cxl_qos_class_verify >>> [] cxl_mem mem0: cxl_qos_class_verify: cxlmd->endpoint is NULL >>> [] cxl_mem mem0: cxl_qos_class_verify: cxl_root is NULL >>> ... >>> # cxl_mem driver sets cxlmd->endpoint >>> [] cxl_mem mem0: cxl_endpoint_autoremove: cxlmd->endpoint = endpoint >>> ... >>> >>> >>> I did an experiment to validate the hypothesis. If I call >>> cxl_endpoint_parse_cdat() after cxlmd->endpoint is set, >>> cxl_qos_class_verify() runs well without problems. >>> >>> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c >>> index c5c9d8e0d88d..33b39c6c46fd 100644 >>> --- a/drivers/cxl/mem.c >>> +++ b/drivers/cxl/mem.c >>> @@ -74,6 +74,8 @@ static int devm_cxl_add_endpoint(struct device *host, struct cxl_memdev *cxlmd, >>> if (rc) >>> return rc; >>> >>> + cxl_endpoint_parse_cdat(endpoint); >>> + >>> if (!endpoint->dev.driver) { >>> dev_err(&cxlmd->dev, "%s failed probe\n", >>> dev_name(&endpoint->dev)); >>> diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c >>> index 97c21566677a..ee77aba62780 100644 >>> --- a/drivers/cxl/port.c >>> +++ b/drivers/cxl/port.c >>> @@ -111,7 +111,6 @@ static int cxl_endpoint_port_probe(struct cxl_port *port) >>> >>> /* Cache the data early to ensure is_visible() works */ >>> read_cdat_data(port); >>> - cxl_endpoint_parse_cdat(port); >>> >>> get_device(&cxlmd->dev); >>> rc = devm_add_action_or_reset(&port->dev, schedule_detach, cxlmd); >>> >>> >>> Maybe there's something I'm missing. It would be very helpful if anyone >>> could comment on the above analysis. >> >> I think this should fix the issue you are seeing? >> >> https://lore.kernel.org/linux-cxl/b243e80f-1b24-4756-8bb3-8389d66ea13a@intel.com/T/#mcbce77b6584bd1031d6c1928fcb36fe67be66039 >> > > Oh, you've already fixed it. I found that on the same testbed the commit > you mentioned resolves the issue. > > Thanks for your response! Ok if I add your Tested-by tag to that patch? > >> >>> >>> Thanks, >>> Wonjae >>
On Tue, Feb 06, 2024 at 08:24:04AM -0700, Dave Jiang wrote: > > > On 2/5/24 6:23 PM, Wonjae Lee wrote: > > On Mon, Feb 05, 2024 at 03:31:35PM -0700, Dave Jiang wrote: > >> > >> > >> On 2/5/24 3:36 AM, Wonjae Lee wrote: > >>> Hello, > >>> > >>> To test the CXL driver with respect to QTG IDs on a real CXL device, I > >>> connected one CXL device to a CXL 2.0 Compliant System (v6.8-rc3). > >>> > >>> However, during cxl endpoint probing, CDAT extraction and parsing works > >>> fine, but cxl_qos_class_verify() for cxlmd does not run properly. > >>> > >>> To be precise, when cxl_qos_class_verify() is executed, the below error > >>> handling code is executed since cxlmd->endpoint is NULL: > >>> > >>> if (!cxl_root) > >>> return -ENODEV; > >>> > >>> > >>> I'm not sure if I analyzed it correctly due to the complexity of the CXL > >>> driver, but I think it's because the cxl_port driver execute > >>> cxl_qos_class_verify() before cxlmd->endpoint = endpoint was executed in > >>> the cxl_mem driver. See the dmesg log below, where I've added debugging > >>> code. > >>> > >>> # cxl_mem driver is adding the endpoint > >>> [] cxl_mem mem0: call devm_cxl_add_enpoint > >>> ... > >>> # endpoint port is probed, and cxl_qos_class_verify() runs > >>> [] cxl_port endpoint5: call cxl_qos_class_verify > >>> [] cxl_mem mem0: cxl_qos_class_verify: cxlmd->endpoint is NULL > >>> [] cxl_mem mem0: cxl_qos_class_verify: cxl_root is NULL > >>> ... > >>> # cxl_mem driver sets cxlmd->endpoint > >>> [] cxl_mem mem0: cxl_endpoint_autoremove: cxlmd->endpoint = endpoint > >>> ... > >>> > >>> > >>> I did an experiment to validate the hypothesis. If I call > >>> cxl_endpoint_parse_cdat() after cxlmd->endpoint is set, > >>> cxl_qos_class_verify() runs well without problems. > >>> > >>> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c > >>> index c5c9d8e0d88d..33b39c6c46fd 100644 > >>> --- a/drivers/cxl/mem.c > >>> +++ b/drivers/cxl/mem.c > >>> @@ -74,6 +74,8 @@ static int devm_cxl_add_endpoint(struct device *host, struct cxl_memdev *cxlmd, > >>> if (rc) > >>> return rc; > >>> > >>> + cxl_endpoint_parse_cdat(endpoint); > >>> + > >>> if (!endpoint->dev.driver) { > >>> dev_err(&cxlmd->dev, "%s failed probe\n", > >>> dev_name(&endpoint->dev)); > >>> diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c > >>> index 97c21566677a..ee77aba62780 100644 > >>> --- a/drivers/cxl/port.c > >>> +++ b/drivers/cxl/port.c > >>> @@ -111,7 +111,6 @@ static int cxl_endpoint_port_probe(struct cxl_port *port) > >>> > >>> /* Cache the data early to ensure is_visible() works */ > >>> read_cdat_data(port); > >>> - cxl_endpoint_parse_cdat(port); > >>> > >>> get_device(&cxlmd->dev); > >>> rc = devm_add_action_or_reset(&port->dev, schedule_detach, cxlmd); > >>> > >>> > >>> Maybe there's something I'm missing. It would be very helpful if anyone > >>> could comment on the above analysis. > >> > >> I think this should fix the issue you are seeing? > >> > >> https://lore.kernel.org/linux-cxl/b243e80f-1b24-4756-8bb3-8389d66ea13a@intel.com/T/#mcbce77b6584bd1031d6c1928fcb36fe67be66039 > >> > > > > Oh, you've already fixed it. I found that on the same testbed the commit > > you mentioned resolves the issue. > > > > Thanks for your response! > > Ok if I add your Tested-by tag to that patch? Oh, sure. Feel free to add it: Tested-by: Wonjae Lee <wj28.lee@samsung.com> Thanks, Wonjae
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c index c5c9d8e0d88d..33b39c6c46fd 100644 --- a/drivers/cxl/mem.c +++ b/drivers/cxl/mem.c @@ -74,6 +74,8 @@ static int devm_cxl_add_endpoint(struct device *host, struct cxl_memdev *cxlmd, if (rc) return rc; + cxl_endpoint_parse_cdat(endpoint); + if (!endpoint->dev.driver) { dev_err(&cxlmd->dev, "%s failed probe\n", dev_name(&endpoint->dev)); diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c index 97c21566677a..ee77aba62780 100644 --- a/drivers/cxl/port.c +++ b/drivers/cxl/port.c @@ -111,7 +111,6 @@ static int cxl_endpoint_port_probe(struct cxl_port *port) /* Cache the data early to ensure is_visible() works */ read_cdat_data(port); - cxl_endpoint_parse_cdat(port); get_device(&cxlmd->dev); rc = devm_add_action_or_reset(&port->dev, schedule_detach, cxlmd);