diff mbox series

[v2,2/3] nvme-hwmon: Kmalloc the NVME SMART log buffer

Message ID 20220929224648.8997-3-Sergey.Semin@baikalelectronics.ru (mailing list archive)
State New, archived
Headers show
Series block/nvme: Fix DMA-noncoherent platforms support | expand

Commit Message

Serge Semin Sept. 29, 2022, 10:46 p.m. UTC
Recent commit 52fde2c07da6 ("nvme: set dma alignment to dword") has caused
a regression on our platform. It turned out that the nvme_get_log() method
invocation caused the nvme_hwmon_data structure instance corruption. In
particular the nvme_hwmon_data.ctrl pointer was overwritten either with
zeros or with garbage. After some researches we discovered that the
problem happened even before the actual NVME DMA execution, but during the
buffer mapping. Since our platform was DMA-noncoherent the mapping implied
the cache-lines invalidations or write-backs depending on the
DMA-direction parameter. In case of the NVME SMART log getting the DMA was
performed from-device-to-memory, thus the cache-invalidation was activated
during the buffer mapping. Since the log-buffer wasn't cache-line aligned
the cache-invalidation caused the neighbour data discard. The neighbouring
data turned to be the data surrounding the buffer in the framework of the
nvme_hwmon_data structure.

In order to fix that we need to make sure that the whole log-buffer is
defined within the cache-line-aligned memory region so the
cache-invalidation procedure wouldn't involve the adjacent data. One of
the option to guarantee that is to kmalloc the DMA-buffer [1]. Seeing the
rest of the NVME core driver prefer that method it has been chosen to fix
this problem too.

Note after a deeper researches we found out that the denoted commit wasn't
a root cause of the problem. It just revealed the invalidity by activating
the DMA-based NVME SMART log getting performed in the framework of the
NVME hwmon driver. The problem was here since the initial commit of the
driver.

[1] Documentation/core-api/dma-api-howto.rst

Fixes: 400b6a7b13a3 ("nvme: Add hardware monitoring support")
Signed-off-by: Serge Semin <Sergey.Semin@baikalelectronics.ru>

---

Folks, I've thoroughly studied the whole NVME subsystem looking for
similar problems. Turned out there is one more place which may cause the
same issue. It's connected with the opal_dev.{cmd,req} buffers passed to
the nvme_sec_submit() method. The rest of the buffers involved in the NVME
DMA are either allocated by kmalloc (must be cache-line-aligned by design)
or bounced-buffered if allocated on the stack (see the blk_rq_map_kern()
method implementation). I am still not fully sure regarding the buffers
coming from user-space though, but AFAICS based on our
DMA-buffers-alignment sanity check procedure they haven't been detected as
cache-unaligned so far.

Changelog v2:
- Convert to allocating the nvme_smart_log structure instance instead of
  cache-aligning it. (@Christoph)
---
 drivers/nvme/host/hwmon.c | 32 +++++++++++++++++++++++---------
 1 file changed, 23 insertions(+), 9 deletions(-)

Comments

Christoph Hellwig Oct. 17, 2022, 7:18 a.m. UTC | #1
Thanks,

applied to nvme-6.1.
Serge Semin Oct. 17, 2022, 4:16 p.m. UTC | #2
Hello Christoph

On Mon, Oct 17, 2022 at 09:18:32AM +0200, Christoph Hellwig wrote:
> Thanks,
> 
> applied to nvme-6.1.

Please note the applied patch doesn't comply with the Keith' notes
Link: https://lore.kernel.org/linux-nvme/YzxueNRODpry8L0%2F@kbusch-mbp.dhcp.thefacebook.com/
Meanwhile without patch #1 (having only the accepted by you patch
applied) the NVME hwmon init now seems contradicting: it ignores one
kmalloc failure (returns zero) but fails on another one (returns
-ENOMEM). I asked you to have a look at the patches #1 and #2 of the
series
Link: https://lore.kernel.org/linux-nvme/20221007100134.faaekmuqyd5vy34m@mobilestation/
and give your opinion whether the re-spin was required: take the
Keith' notes or keep the patches as is. Could you please clarify the
situation?

-Sergey
Christoph Hellwig Oct. 18, 2022, 2:49 p.m. UTC | #3
On Mon, Oct 17, 2022 at 07:16:56PM +0300, Serge Semin wrote:
> Please note the applied patch doesn't comply with the Keith' notes
> Link: https://lore.kernel.org/linux-nvme/YzxueNRODpry8L0%2F@kbusch-mbp.dhcp.thefacebook.com/
> Meanwhile without patch #1 (having only the accepted by you patch
> applied) the NVME hwmon init now seems contradicting: it ignores one
> kmalloc failure (returns zero) but fails on another one (returns
> -ENOMEM). I asked you to have a look at the patches #1 and #2 of the
> series
> Link: https://lore.kernel.org/linux-nvme/20221007100134.faaekmuqyd5vy34m@mobilestation/
> and give your opinion whether the re-spin was required: take the
> Keith' notes or keep the patches as is. Could you please clarify the
> situation?

I'll fix this patch up to follow the recommendation from Keith, I somehow
thought this was already done.
diff mbox series

Patch

diff --git a/drivers/nvme/host/hwmon.c b/drivers/nvme/host/hwmon.c
index 1afb24a64145..654309767e76 100644
--- a/drivers/nvme/host/hwmon.c
+++ b/drivers/nvme/host/hwmon.c
@@ -12,7 +12,7 @@ 
 
 struct nvme_hwmon_data {
 	struct nvme_ctrl *ctrl;
-	struct nvme_smart_log log;
+	struct nvme_smart_log *log;
 	struct mutex read_lock;
 };
 
@@ -60,14 +60,14 @@  static int nvme_set_temp_thresh(struct nvme_ctrl *ctrl, int sensor, bool under,
 static int nvme_hwmon_get_smart_log(struct nvme_hwmon_data *data)
 {
 	return nvme_get_log(data->ctrl, NVME_NSID_ALL, NVME_LOG_SMART, 0,
-			   NVME_CSI_NVM, &data->log, sizeof(data->log), 0);
+			   NVME_CSI_NVM, data->log, sizeof(*data->log), 0);
 }
 
 static int nvme_hwmon_read(struct device *dev, enum hwmon_sensor_types type,
 			   u32 attr, int channel, long *val)
 {
 	struct nvme_hwmon_data *data = dev_get_drvdata(dev);
-	struct nvme_smart_log *log = &data->log;
+	struct nvme_smart_log *log = data->log;
 	int temp;
 	int err;
 
@@ -163,7 +163,7 @@  static umode_t nvme_hwmon_is_visible(const void *_data,
 	case hwmon_temp_max:
 	case hwmon_temp_min:
 		if ((!channel && data->ctrl->wctemp) ||
-		    (channel && data->log.temp_sensor[channel - 1])) {
+		    (channel && data->log->temp_sensor[channel - 1])) {
 			if (data->ctrl->quirks &
 			    NVME_QUIRK_NO_TEMP_THRESH_CHANGE)
 				return 0444;
@@ -176,7 +176,7 @@  static umode_t nvme_hwmon_is_visible(const void *_data,
 		break;
 	case hwmon_temp_input:
 	case hwmon_temp_label:
-		if (!channel || data->log.temp_sensor[channel - 1])
+		if (!channel || data->log->temp_sensor[channel - 1])
 			return 0444;
 		break;
 	default:
@@ -232,14 +232,19 @@  int nvme_hwmon_init(struct nvme_ctrl *ctrl)
 	if (!data)
 		return -ENOMEM;
 
+	data->log = kzalloc(sizeof(*data->log), GFP_KERNEL);
+	if (!data->log) {
+		err = -ENOMEM;
+		goto err_free_data;
+	}
+
 	data->ctrl = ctrl;
 	mutex_init(&data->read_lock);
 
 	err = nvme_hwmon_get_smart_log(data);
 	if (err) {
 		dev_warn(dev, "Failed to read smart log (error %d)\n", err);
-		kfree(data);
-		return err;
+		goto err_free_log;
 	}
 
 	hwmon = hwmon_device_register_with_info(dev, "nvme",
@@ -247,11 +252,19 @@  int nvme_hwmon_init(struct nvme_ctrl *ctrl)
 						NULL);
 	if (IS_ERR(hwmon)) {
 		dev_warn(dev, "Failed to instantiate hwmon device\n");
-		kfree(data);
-		return PTR_ERR(hwmon);
+		err = PTR_ERR(hwmon);
+		goto err_free_log;
 	}
 	ctrl->hwmon_device = hwmon;
 	return 0;
+
+err_free_log:
+	kfree(data->log);
+
+err_free_data:
+	kfree(data);
+
+	return err;
 }
 
 void nvme_hwmon_exit(struct nvme_ctrl *ctrl)
@@ -262,6 +275,7 @@  void nvme_hwmon_exit(struct nvme_ctrl *ctrl)
 
 		hwmon_device_unregister(ctrl->hwmon_device);
 		ctrl->hwmon_device = NULL;
+		kfree(data->log);
 		kfree(data);
 	}
 }