
[1/3] virtio: Basic implementation of virtio pstore driver

Message ID 20160904143900.14850-2-namhyung@kernel.org (mailing list archive)
State New, archived

Commit Message

Namhyung Kim Sept. 4, 2016, 2:38 p.m. UTC
The virtio pstore driver provides an interface to the pstore subsystem so
that the guest kernel's log/dump messages can be saved on the host
machine.  Users can access the log file directly on the host, or on the
guest at the next boot using the pstore filesystem.  It currently handles
only the kernel log (printk) buffer, but it can be extended to carry other
information (like ftrace dumps) later.

It supports a legacy PCI device and uses a 16K buffer by default, which is
configurable.  It uses two virtqueues - one for (sync) read and another
for (async) write.  Since it cannot wait for writes to finish, it supports
up to 128 concurrent IO requests.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Anthony Liguori <aliguori@amazon.com>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Colin Cross <ccross@android.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: virtio-dev@lists.oasis-open.org
Cc: kvm@vger.kernel.org
Cc: qemu-devel@nongnu.org
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 drivers/virtio/Kconfig             |  10 +
 drivers/virtio/Makefile            |   1 +
 drivers/virtio/virtio_pstore.c     | 417 +++++++++++++++++++++++++++++++++++++
 include/uapi/linux/Kbuild          |   1 +
 include/uapi/linux/virtio_ids.h    |   1 +
 include/uapi/linux/virtio_pstore.h |  74 +++++++
 6 files changed, 504 insertions(+)
 create mode 100644 drivers/virtio/virtio_pstore.c
 create mode 100644 include/uapi/linux/virtio_pstore.h

Comments

Kees Cook Sept. 8, 2016, 8:49 p.m. UTC | #1
On Sun, Sep 4, 2016 at 7:38 AM, Namhyung Kim <namhyung@kernel.org> wrote:
> The virtio pstore driver provides interface to the pstore subsystem so
> that the guest kernel's log/dump message can be saved on the host
> machine.  Users can access the log file directly on the host, or on the
> guest at the next boot using pstore filesystem.  It currently deals with
> kernel log (printk) buffer only, but we can extend it to have other
> information (like ftrace dump) later.
>
> It supports legacy PCI device using a 16K buffer by default and it's
> configurable.  It uses two virtqueues - one for (sync) read and another
> for (async) write.  Since it cannot wait for write finished, it supports
> up to 128 concurrent IO.
>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Anthony Liguori <aliguori@amazon.com>
> Cc: Anton Vorontsov <anton@enomsg.org>
> Cc: Colin Cross <ccross@android.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: virtio-dev@lists.oasis-open.org
> Cc: kvm@vger.kernel.org
> Cc: qemu-devel@nongnu.org
> Cc: virtualization@lists.linux-foundation.org
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>

While I can't speak to the virtio parts, the interface into pstore
looks fine to me. :)

Reviewed-by: Kees Cook <keescook@chromium.org>

-Kees
Stefan Hajnoczi Sept. 22, 2016, 11:57 a.m. UTC | #2
On Sun, Sep 04, 2016 at 11:38:58PM +0900, Namhyung Kim wrote:
> The virtio pstore driver provides interface to the pstore subsystem so
> that the guest kernel's log/dump message can be saved on the host
> machine.  Users can access the log file directly on the host, or on the
> guest at the next boot using pstore filesystem.  It currently deals with
> kernel log (printk) buffer only, but we can extend it to have other
> information (like ftrace dump) later.
> 
> It supports legacy PCI device using a 16K buffer by default and it's
> configurable.  It uses two virtqueues - one for (sync) read and another
> for (async) write.  Since it cannot wait for write finished, it supports
> up to 128 concurrent IO.

Please document the locks that this code relies on.  It is generally not
safe to call virtqueue_*() from multiple threads.  I also wonder about
locking for virtio_pstore->req_id and other fields.  Are locks missing
or is something in pstore ensuring safety?

> +static int virt_pstore_open(struct pstore_info *psi)
> +{
> +	struct virtio_pstore *vps = psi->data;
> +	struct virtio_pstore_req *req;
> +	struct virtio_pstore_res *res;
> +	struct scatterlist sgo[1], sgi[1];
> +	struct scatterlist *sgs[2] = { sgo, sgi };
> +	unsigned int len;
> +
> +	virt_pstore_get_reqs(vps, &req, &res);
> +
> +	req->cmd = cpu_to_le16(VIRTIO_PSTORE_CMD_OPEN);
> +
> +	sg_init_one(sgo, req, sizeof(*req));
> +	sg_init_one(sgi, res, sizeof(*res));
> +	virtqueue_add_sgs(vps->vq[0], sgs, 1, 1, vps, GFP_KERNEL);
> +	virtqueue_kick(vps->vq[0]);
> +
> +	wait_event(vps->acked, virtqueue_get_buf(vps->vq[0], &len));
> +	return le32_to_cpu(res->ret);

This assumes the device puts compatible Linux errno values in res->ret.
The function doesn't need to return -errno if I'm reading fs/pstore/
code correctly.  You could return -1 on error to avoid making this
assumption.  The same applies to other res->ret usage below.
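
For example, something like this (just a sketch) avoids passing the
device's value through as an errno:

    return le32_to_cpu(res->ret) < 0 ? -1 : 0;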

> +}
> +
> +static int virt_pstore_close(struct pstore_info *psi)
> +{
> +	struct virtio_pstore *vps = psi->data;
> +	struct virtio_pstore_req *req = &vps->req[vps->req_id];
> +	struct virtio_pstore_res *res = &vps->res[vps->req_id];

Assigning &vps->req[vps->req_id]/&vps->res[vps->req_id] is unnecessary,
virt_pstore_get_reqs() handles that below.

> +	struct scatterlist sgo[1], sgi[1];
> +	struct scatterlist *sgs[2] = { sgo, sgi };
> +	unsigned int len;
> +
> +	virt_pstore_get_reqs(vps, &req, &res);
> +
> +	req->cmd = cpu_to_le16(VIRTIO_PSTORE_CMD_CLOSE);
> +
> +	sg_init_one(sgo, req, sizeof(*req));
> +	sg_init_one(sgi, res, sizeof(*res));
> +	virtqueue_add_sgs(vps->vq[0], sgs, 1, 1, vps, GFP_KERNEL);
> +	virtqueue_kick(vps->vq[0]);
> +
> +	wait_event(vps->acked, virtqueue_get_buf(vps->vq[0], &len));
> +	return le32_to_cpu(res->ret);
> +}
> +
> +static ssize_t virt_pstore_read(u64 *id, enum pstore_type_id *type,
> +				int *count, struct timespec *time,
> +				char **buf, bool *compressed,
> +				ssize_t *ecc_notice_size,
> +				struct pstore_info *psi)
> +{
> +	struct virtio_pstore *vps = psi->data;
> +	struct virtio_pstore_req *req;
> +	struct virtio_pstore_res *res;
> +	struct virtio_pstore_fileinfo info;
> +	struct scatterlist sgo[1], sgi[3];
> +	struct scatterlist *sgs[2] = { sgo, sgi };
> +	unsigned int len;
> +	unsigned int flags;
> +	int ret;
> +	void *bf;
> +
> +	virt_pstore_get_reqs(vps, &req, &res);
> +
> +	req->cmd = cpu_to_le16(VIRTIO_PSTORE_CMD_READ);
> +
> +	sg_init_one(sgo, req, sizeof(*req));
> +	sg_init_table(sgi, 3);
> +	sg_set_buf(&sgi[0], res, sizeof(*res));
> +	sg_set_buf(&sgi[1], &info, sizeof(info));
> +	sg_set_buf(&sgi[2], psi->buf, psi->bufsize);
> +	virtqueue_add_sgs(vps->vq[0], sgs, 1, 1, vps, GFP_KERNEL);
> +	virtqueue_kick(vps->vq[0]);
> +
> +	wait_event(vps->acked, virtqueue_get_buf(vps->vq[0], &len));
> +	if (len < sizeof(*res) + sizeof(info))
> +		return -1;
> +
> +	ret = le32_to_cpu(res->ret);
> +	if (ret < 0)
> +		return ret;
> +
> +	len = le32_to_cpu(info.len);

We trust the device but a length check would be nice here to be clear
that the memcpy() below is always safe:

if (len > psi->bufsize)
    return -1;

> +
> +	bf = kmalloc(len, GFP_KERNEL);
> +	if (bf == NULL)
> +		return -ENOMEM;

There's no point in returning -ENOMEM if we return -1 and res->ret
above.  The caller has no way of knowing the true meaning of the return
value.  I would return -1 to be clear that there are no error constants.

> +
> +	*id    = le64_to_cpu(info.id);
> +	*type  = from_virtio_type(info.type);
> +	*count = le32_to_cpu(info.count);
> +
> +	flags = le32_to_cpu(info.flags);
> +	*compressed = flags & VIRTIO_PSTORE_FL_COMPRESSED;
> +
> +	time->tv_sec  = le64_to_cpu(info.time_sec);
> +	time->tv_nsec = le32_to_cpu(info.time_nsec);
> +
> +	memcpy(bf, psi->buf, len);
> +	*buf = bf;
> +
> +	return len;
> +}
> +
> +static int notrace virt_pstore_write(enum pstore_type_id type,
> +				     enum kmsg_dump_reason reason,
> +				     u64 *id, unsigned int part, int count,
> +				     bool compressed, size_t size,
> +				     struct pstore_info *psi)
> +{
> +	struct virtio_pstore *vps = psi->data;
> +	struct virtio_pstore_req *req;
> +	struct virtio_pstore_res *res;
> +	struct scatterlist sgo[2], sgi[1];
> +	struct scatterlist *sgs[2] = { sgo, sgi };
> +	unsigned int flags = compressed ? VIRTIO_PSTORE_FL_COMPRESSED : 0;
> +
> +	if (vps->failed)
> +		return -1;
> +
> +	*id = vps->req_id;
> +	virt_pstore_get_reqs(vps, &req, &res);

This function does not wait for a response from the device so you cannot
call virt_pstore_get_reqs() without risk of reusing an in-flight buffer.

> +
> +	req->cmd   = cpu_to_le16(VIRTIO_PSTORE_CMD_WRITE);
> +	req->type  = to_virtio_type(type);
> +	req->flags = cpu_to_le32(flags);
> +
> +	sg_init_table(sgo, 2);
> +	sg_set_buf(&sgo[0], req, sizeof(*req));
> +	sg_set_buf(&sgo[1], psi->buf, size);
> +	sg_init_one(sgi, res, sizeof(*res));
> +	virtqueue_add_sgs(vps->vq[1], sgs, 1, 1, vps, GFP_ATOMIC);

If we don't wait for request completion then virtqueue_add_sgs() could
fail here if the virtqueue is already full.

> +	virtqueue_kick(vps->vq[1]);
> +
> +	return 0;
> +}
> +
> +static int virt_pstore_erase(enum pstore_type_id type, u64 id, int count,
> +			     struct timespec time, struct pstore_info *psi)
> +{
> +	struct virtio_pstore *vps = psi->data;
> +	struct virtio_pstore_req *req;
> +	struct virtio_pstore_res *res;
> +	struct scatterlist sgo[1], sgi[1];
> +	struct scatterlist *sgs[2] = { sgo, sgi };
> +	unsigned int len;
> +
> +	virt_pstore_get_reqs(vps, &req, &res);
> +
> +	req->cmd   = cpu_to_le16(VIRTIO_PSTORE_CMD_ERASE);
> +	req->type  = to_virtio_type(type);
> +	req->id	   = cpu_to_le64(id);
> +	req->count = cpu_to_le32(count);
> +
> +	sg_init_one(sgo, req, sizeof(*req));
> +	sg_init_one(sgi, res, sizeof(*res));
> +	virtqueue_add_sgs(vps->vq[0], sgs, 1, 1, vps, GFP_KERNEL);
> +	virtqueue_kick(vps->vq[0]);

Erase commands are sent on the "read" virtqueue, not the "write"
virtqueue?

> +
> +	wait_event(vps->acked, virtqueue_get_buf(vps->vq[0], &len));
> +	return le32_to_cpu(res->ret);
> +}
> +
> +static int virt_pstore_init(struct virtio_pstore *vps)
> +{
> +	struct pstore_info *psinfo = &vps->pstore;
> +	int err;
> +
> +	if (!psinfo->bufsize)
> +		psinfo->bufsize = VIRT_PSTORE_BUFSIZE;
> +
> +	psinfo->buf = alloc_pages_exact(psinfo->bufsize, GFP_KERNEL);
> +	if (!psinfo->buf) {
> +		pr_err("cannot allocate pstore buffer\n");
> +		return -ENOMEM;
> +	}
> +
> +	psinfo->owner = THIS_MODULE;
> +	psinfo->name  = "virtio";
> +	psinfo->open  = virt_pstore_open;
> +	psinfo->close = virt_pstore_close;
> +	psinfo->read  = virt_pstore_read;
> +	psinfo->erase = virt_pstore_erase;
> +	psinfo->write = virt_pstore_write;
> +	psinfo->flags = PSTORE_FLAGS_FRAGILE;
> +
> +	psinfo->data  = vps;
> +	spin_lock_init(&psinfo->buf_lock);
> +
> +	err = pstore_register(psinfo);
> +	if (err)
> +		kfree(psinfo->buf);

Should this be free_pages_exact() instead of kfree()?

Is it safe to call pstore_register() before the remaining initialization
steps?

My understanding is that if pstore is already mounted then requests will
immediately be sent.  In order to support this we'd need to initialize
everything else first before calling pstore_register().

> +
> +	return err;
> +}
> +
> +static int virt_pstore_exit(struct virtio_pstore *vps)
> +{
> +	struct pstore_info *psinfo = &vps->pstore;
> +
> +	pstore_unregister(psinfo);
> +
> +	free_pages_exact(psinfo->buf, psinfo->bufsize);
> +	psinfo->buf = NULL;
> +	psinfo->bufsize = 0;
> +
> +	return 0;
> +}
> +
> +static int virtpstore_init_vqs(struct virtio_pstore *vps)
> +{
> +	vq_callback_t *callbacks[] = { virtpstore_ack, virtpstore_check };
> +	const char *names[] = { "pstore_read", "pstore_write" };
> +
> +	return vps->vdev->config->find_vqs(vps->vdev, 2, vps->vq,
> +					   callbacks, names);
> +}
> +
> +static void virtpstore_init_config(struct virtio_pstore *vps)
> +{
> +	u32 bufsize;
> +
> +	virtio_cread(vps->vdev, struct virtio_pstore_config, bufsize, &bufsize);
> +
> +	vps->pstore.bufsize = PAGE_ALIGN(bufsize);
> +}
> +
> +static void virtpstore_confirm_config(struct virtio_pstore *vps)
> +{
> +	u32 bufsize = vps->pstore.bufsize;
> +
> +	virtio_cwrite(vps->vdev, struct virtio_pstore_config, bufsize,
> +		     &bufsize);
> +}
> +
> +static int virtpstore_probe(struct virtio_device *vdev)
> +{
> +	struct virtio_pstore *vps;
> +	int err;
> +
> +	if (!vdev->config->get) {
> +		dev_err(&vdev->dev, "driver init: config access disabled\n");
> +		return -EINVAL;
> +	}
> +
> +	vdev->priv = vps = kzalloc(sizeof(*vps), GFP_KERNEL);
> +	if (!vps) {
> +		err = -ENOMEM;
> +		goto out;
> +	}
> +	vps->vdev = vdev;
> +
> +	err = virtpstore_init_vqs(vps);
> +	if (err < 0)
> +		goto out_free;
> +
> +	virtpstore_init_config(vps);
> +
> +	err = virt_pstore_init(vps);
> +	if (err)
> +		goto out_del_vq;
> +
> +	virtpstore_confirm_config(vps);
> +
> +	init_waitqueue_head(&vps->acked);
> +
> +	virtio_device_ready(vdev);

This call is needed if the .probe() function uses virtqueues before
returning.  Right now it either doesn't use the virtqueues or it has
already used them in virt_pstore_init().  Please move this before
pstore_register().
Namhyung Kim Sept. 23, 2016, 5:48 a.m. UTC | #3
Hi Stefan,

On Thu, Sep 22, 2016 at 12:57:44PM +0100, Stefan Hajnoczi wrote:
> On Sun, Sep 04, 2016 at 11:38:58PM +0900, Namhyung Kim wrote:
> > The virtio pstore driver provides interface to the pstore subsystem so
> > that the guest kernel's log/dump message can be saved on the host
> > machine.  Users can access the log file directly on the host, or on the
> > guest at the next boot using pstore filesystem.  It currently deals with
> > kernel log (printk) buffer only, but we can extend it to have other
> > information (like ftrace dump) later.
> > 
> > It supports legacy PCI device using a 16K buffer by default and it's
> > configurable.  It uses two virtqueues - one for (sync) read and another
> > for (async) write.  Since it cannot wait for write finished, it supports
> > up to 128 concurrent IO.
> 
> Please document the locks that this code relies on.  It is generally not
> safe to call virtqueue_*() from multiple threads.  I also wonder about
> locking for virtio_pstore->req_id and other fields.  Are locks missing
> or is something in pstore ensuring safety?

Ok, I should use an atomic inc for pstore->req_id.  The open-read-close
sequence is serialized by the read_mutex of pstore_info.  Writes can
happen anytime, so I gave them a dedicated queue.
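
Something like this in virt_pstore_get_reqs() (an untested sketch, assuming
req_id is changed to an atomic_t):

    unsigned int idx = atomic_inc_return(&vps->req_id) % VIRT_PSTORE_NR_REQ;

    *preq = &vps->req[idx];
    *pres = &vps->res[idx];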

Erase is a problem though; normally it's only doable after the mount
operation is done, so there's no contention with the open-read-close
sequence.  But if pstore_update_ms is set, the timer routine can schedule
a work item to run the open-read-close loop, which might contend with
erase.

I'm not sure how useful pstore_update_ms is, and its description says:

  "milliseconds before pstore updates its content "
  "(default is -1, which means runtime updates are disabled; "
  "enabling this option is not safe, it may lead to further "
  "corruption on Oopses)")


> 
> > +static int virt_pstore_open(struct pstore_info *psi)
> > +{
> > +	struct virtio_pstore *vps = psi->data;
> > +	struct virtio_pstore_req *req;
> > +	struct virtio_pstore_res *res;
> > +	struct scatterlist sgo[1], sgi[1];
> > +	struct scatterlist *sgs[2] = { sgo, sgi };
> > +	unsigned int len;
> > +
> > +	virt_pstore_get_reqs(vps, &req, &res);
> > +
> > +	req->cmd = cpu_to_le16(VIRTIO_PSTORE_CMD_OPEN);
> > +
> > +	sg_init_one(sgo, req, sizeof(*req));
> > +	sg_init_one(sgi, res, sizeof(*res));
> > +	virtqueue_add_sgs(vps->vq[0], sgs, 1, 1, vps, GFP_KERNEL);
> > +	virtqueue_kick(vps->vq[0]);
> > +
> > +	wait_event(vps->acked, virtqueue_get_buf(vps->vq[0], &len));
> > +	return le32_to_cpu(res->ret);
> 
> This assumes the device puts compatible Linux errno values in res->ret.
> The function doesn't need to return -errno if I'm reading fs/pstore/
> code correctly.  You could return -1 on error to avoid making this
> assumption.  The same applies to other res->ret usage below.

Ok.

> 
> > +}
> > +
> > +static int virt_pstore_close(struct pstore_info *psi)
> > +{
> > +	struct virtio_pstore *vps = psi->data;
> > +	struct virtio_pstore_req *req = &vps->req[vps->req_id];
> > +	struct virtio_pstore_res *res = &vps->res[vps->req_id];
> 
> Assigning &vps->req[vps->req_id]/&vps->res[vps->req_id] is unnecessary,
> virt_pstore_get_reqs() handles that below.

Ah, right.

> 
> > +	struct scatterlist sgo[1], sgi[1];
> > +	struct scatterlist *sgs[2] = { sgo, sgi };
> > +	unsigned int len;
> > +
> > +	virt_pstore_get_reqs(vps, &req, &res);
> > +
> > +	req->cmd = cpu_to_le16(VIRTIO_PSTORE_CMD_CLOSE);
> > +
> > +	sg_init_one(sgo, req, sizeof(*req));
> > +	sg_init_one(sgi, res, sizeof(*res));
> > +	virtqueue_add_sgs(vps->vq[0], sgs, 1, 1, vps, GFP_KERNEL);
> > +	virtqueue_kick(vps->vq[0]);
> > +
> > +	wait_event(vps->acked, virtqueue_get_buf(vps->vq[0], &len));
> > +	return le32_to_cpu(res->ret);
> > +}
> > +
> > +static ssize_t virt_pstore_read(u64 *id, enum pstore_type_id *type,
> > +				int *count, struct timespec *time,
> > +				char **buf, bool *compressed,
> > +				ssize_t *ecc_notice_size,
> > +				struct pstore_info *psi)
> > +{
> > +	struct virtio_pstore *vps = psi->data;
> > +	struct virtio_pstore_req *req;
> > +	struct virtio_pstore_res *res;
> > +	struct virtio_pstore_fileinfo info;
> > +	struct scatterlist sgo[1], sgi[3];
> > +	struct scatterlist *sgs[2] = { sgo, sgi };
> > +	unsigned int len;
> > +	unsigned int flags;
> > +	int ret;
> > +	void *bf;
> > +
> > +	virt_pstore_get_reqs(vps, &req, &res);
> > +
> > +	req->cmd = cpu_to_le16(VIRTIO_PSTORE_CMD_READ);
> > +
> > +	sg_init_one(sgo, req, sizeof(*req));
> > +	sg_init_table(sgi, 3);
> > +	sg_set_buf(&sgi[0], res, sizeof(*res));
> > +	sg_set_buf(&sgi[1], &info, sizeof(info));
> > +	sg_set_buf(&sgi[2], psi->buf, psi->bufsize);
> > +	virtqueue_add_sgs(vps->vq[0], sgs, 1, 1, vps, GFP_KERNEL);
> > +	virtqueue_kick(vps->vq[0]);
> > +
> > +	wait_event(vps->acked, virtqueue_get_buf(vps->vq[0], &len));
> > +	if (len < sizeof(*res) + sizeof(info))
> > +		return -1;
> > +
> > +	ret = le32_to_cpu(res->ret);
> > +	if (ret < 0)
> > +		return ret;
> > +
> > +	len = le32_to_cpu(info.len);
> 
> We trust the device but a length check would be nice here to be clear
> that the memcpy() below is always safe:
> 
> if (len > psi->bufsize)
>     return -1;

Good point.

> 
> > +
> > +	bf = kmalloc(len, GFP_KERNEL);
> > +	if (bf == NULL)
> > +		return -ENOMEM;
> 
> There's no point in returning -ENOMEM if we return -1 and res->ret
> above.  The caller has no way of knowing the true meaning of the return
> value.  I would return -1 to be clear that there are no error constants.

Ok.

> 
> > +
> > +	*id    = le64_to_cpu(info.id);
> > +	*type  = from_virtio_type(info.type);
> > +	*count = le32_to_cpu(info.count);
> > +
> > +	flags = le32_to_cpu(info.flags);
> > +	*compressed = flags & VIRTIO_PSTORE_FL_COMPRESSED;
> > +
> > +	time->tv_sec  = le64_to_cpu(info.time_sec);
> > +	time->tv_nsec = le32_to_cpu(info.time_nsec);
> > +
> > +	memcpy(bf, psi->buf, len);
> > +	*buf = bf;
> > +
> > +	return len;
> > +}
> > +
> > +static int notrace virt_pstore_write(enum pstore_type_id type,
> > +				     enum kmsg_dump_reason reason,
> > +				     u64 *id, unsigned int part, int count,
> > +				     bool compressed, size_t size,
> > +				     struct pstore_info *psi)
> > +{
> > +	struct virtio_pstore *vps = psi->data;
> > +	struct virtio_pstore_req *req;
> > +	struct virtio_pstore_res *res;
> > +	struct scatterlist sgo[2], sgi[1];
> > +	struct scatterlist *sgs[2] = { sgo, sgi };
> > +	unsigned int flags = compressed ? VIRTIO_PSTORE_FL_COMPRESSED : 0;
> > +
> > +	if (vps->failed)
> > +		return -1;
> > +
> > +	*id = vps->req_id;
> > +	virt_pstore_get_reqs(vps, &req, &res);
> 
> This function does not wait for a response from the device so you cannot
> call virt_pstore_get_reqs() without risk of reusing an in-flight buffer.

Right.  This part is debatable IMHO.

Generally pstore is used to dump the oops buffer when the kernel is dying.
This is an occasional event, so I think the current code is ok for that.

The problematic case is when other writers (console, pmsg or ftrace) are
enabled.  They might contend with the oops writer, but pstore serializes
them with a spinlock.  Still, as you said, there is a risk of reusing an
in-flight buffer.  I think it's okay for an oops to overwrite other
records, but the reverse is not good.  I tried to mitigate the problem by
managing the buffer position to spread out the writes, but it's not a
complete solution.

I thought about simply stopping the guest when writing an oops, or we
might create a way to tell pstore to stop the other writers (until the
oops write is finished) somehow.

Kees, what do you think?

> 
> > +
> > +	req->cmd   = cpu_to_le16(VIRTIO_PSTORE_CMD_WRITE);
> > +	req->type  = to_virtio_type(type);
> > +	req->flags = cpu_to_le32(flags);
> > +
> > +	sg_init_table(sgo, 2);
> > +	sg_set_buf(&sgo[0], req, sizeof(*req));
> > +	sg_set_buf(&sgo[1], psi->buf, size);
> > +	sg_init_one(sgi, res, sizeof(*res));
> > +	virtqueue_add_sgs(vps->vq[1], sgs, 1, 1, vps, GFP_ATOMIC);
> 
> If we don't wait for request completion then virtqueue_add_sgs() could
> fail here if the virtqueue is already full.

Ah, I didn't know that.  So we need to wait for completion somehow.
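
A minimal sketch that at least detects the full-queue case (it still
doesn't wait for completion):

    if (virtqueue_add_sgs(vps->vq[1], sgs, 1, 1, vps, GFP_ATOMIC) < 0)
        return -1;    /* virtqueue full, drop this record */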


> 
> > +	virtqueue_kick(vps->vq[1]);
> > +
> > +	return 0;
> > +}
> > +
> > +static int virt_pstore_erase(enum pstore_type_id type, u64 id, int count,
> > +			     struct timespec time, struct pstore_info *psi)
> > +{
> > +	struct virtio_pstore *vps = psi->data;
> > +	struct virtio_pstore_req *req;
> > +	struct virtio_pstore_res *res;
> > +	struct scatterlist sgo[1], sgi[1];
> > +	struct scatterlist *sgs[2] = { sgo, sgi };
> > +	unsigned int len;
> > +
> > +	virt_pstore_get_reqs(vps, &req, &res);
> > +
> > +	req->cmd   = cpu_to_le16(VIRTIO_PSTORE_CMD_ERASE);
> > +	req->type  = to_virtio_type(type);
> > +	req->id	   = cpu_to_le64(id);
> > +	req->count = cpu_to_le32(count);
> > +
> > +	sg_init_one(sgo, req, sizeof(*req));
> > +	sg_init_one(sgi, res, sizeof(*res));
> > +	virtqueue_add_sgs(vps->vq[0], sgs, 1, 1, vps, GFP_KERNEL);
> > +	virtqueue_kick(vps->vq[0]);
> 
> Erase commands are sent on the "read" virtqueue, not the "write"
> virtqueue?

Yes, you can think of them as sync and async queues.  Write is the only
async operation.


> 
> > +
> > +	wait_event(vps->acked, virtqueue_get_buf(vps->vq[0], &len));
> > +	return le32_to_cpu(res->ret);
> > +}
> > +
> > +static int virt_pstore_init(struct virtio_pstore *vps)
> > +{
> > +	struct pstore_info *psinfo = &vps->pstore;
> > +	int err;
> > +
> > +	if (!psinfo->bufsize)
> > +		psinfo->bufsize = VIRT_PSTORE_BUFSIZE;
> > +
> > +	psinfo->buf = alloc_pages_exact(psinfo->bufsize, GFP_KERNEL);
> > +	if (!psinfo->buf) {
> > +		pr_err("cannot allocate pstore buffer\n");
> > +		return -ENOMEM;
> > +	}
> > +
> > +	psinfo->owner = THIS_MODULE;
> > +	psinfo->name  = "virtio";
> > +	psinfo->open  = virt_pstore_open;
> > +	psinfo->close = virt_pstore_close;
> > +	psinfo->read  = virt_pstore_read;
> > +	psinfo->erase = virt_pstore_erase;
> > +	psinfo->write = virt_pstore_write;
> > +	psinfo->flags = PSTORE_FLAGS_FRAGILE;
> > +
> > +	psinfo->data  = vps;
> > +	spin_lock_init(&psinfo->buf_lock);
> > +
> > +	err = pstore_register(psinfo);
> > +	if (err)
> > +		kfree(psinfo->buf);
> 
> Should this be free_pages_exact() instead of kfree()?

Oops, right.
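
Something like this (sketch):

    err = pstore_register(psinfo);
    if (err) {
        free_pages_exact(psinfo->buf, psinfo->bufsize);
        psinfo->buf = NULL;
    }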

> 
> Is it safe to call pstore_register() before the remaining initialization
> steps?
> 
> My understanding is that if pstore is already mounted then requests will
> immediately be sent.  In order to support this we'd need to initialize
> everything else first before calling pstore_register().

Fair enough.  Will change.

> 
> > +
> > +	return err;
> > +}
> > +
> > +static int virt_pstore_exit(struct virtio_pstore *vps)
> > +{
> > +	struct pstore_info *psinfo = &vps->pstore;
> > +
> > +	pstore_unregister(psinfo);
> > +
> > +	free_pages_exact(psinfo->buf, psinfo->bufsize);
> > +	psinfo->buf = NULL;
> > +	psinfo->bufsize = 0;
> > +
> > +	return 0;
> > +}
> > +
> > +static int virtpstore_init_vqs(struct virtio_pstore *vps)
> > +{
> > +	vq_callback_t *callbacks[] = { virtpstore_ack, virtpstore_check };
> > +	const char *names[] = { "pstore_read", "pstore_write" };
> > +
> > +	return vps->vdev->config->find_vqs(vps->vdev, 2, vps->vq,
> > +					   callbacks, names);
> > +}
> > +
> > +static void virtpstore_init_config(struct virtio_pstore *vps)
> > +{
> > +	u32 bufsize;
> > +
> > +	virtio_cread(vps->vdev, struct virtio_pstore_config, bufsize, &bufsize);
> > +
> > +	vps->pstore.bufsize = PAGE_ALIGN(bufsize);
> > +}
> > +
> > +static void virtpstore_confirm_config(struct virtio_pstore *vps)
> > +{
> > +	u32 bufsize = vps->pstore.bufsize;
> > +
> > +	virtio_cwrite(vps->vdev, struct virtio_pstore_config, bufsize,
> > +		     &bufsize);
> > +}
> > +
> > +static int virtpstore_probe(struct virtio_device *vdev)
> > +{
> > +	struct virtio_pstore *vps;
> > +	int err;
> > +
> > +	if (!vdev->config->get) {
> > +		dev_err(&vdev->dev, "driver init: config access disabled\n");
> > +		return -EINVAL;
> > +	}
> > +
> > +	vdev->priv = vps = kzalloc(sizeof(*vps), GFP_KERNEL);
> > +	if (!vps) {
> > +		err = -ENOMEM;
> > +		goto out;
> > +	}
> > +	vps->vdev = vdev;
> > +
> > +	err = virtpstore_init_vqs(vps);
> > +	if (err < 0)
> > +		goto out_free;
> > +
> > +	virtpstore_init_config(vps);
> > +
> > +	err = virt_pstore_init(vps);
> > +	if (err)
> > +		goto out_del_vq;
> > +
> > +	virtpstore_confirm_config(vps);
> > +
> > +	init_waitqueue_head(&vps->acked);
> > +
> > +	virtio_device_ready(vdev);
> 
> This call is needed if the .probe() function uses virtqueues before
> returning.  Right now it either doesn't use the virtqueues or it has
> already used them in virt_pstore_init().  Please move this before
> pstore_register().

Will do.
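
Roughly this ordering at the end of virtpstore_probe() (a sketch; the
bufsize handling in virtpstore_init_config()/virtpstore_confirm_config()
would need to be adjusted so that pstore_register() really is the last
step):

    init_waitqueue_head(&vps->acked);
    virtio_device_ready(vdev);

    /* pstore_register() may trigger requests immediately, so do it last */
    err = virt_pstore_init(vps);
    if (err)
        goto out_del_vq;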

Thank you so much for your detailed review!
Namhyung



Patch

diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 77590320d44c..8f0e6c796c12 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -58,6 +58,16 @@  config VIRTIO_INPUT
 
 	 If unsure, say M.
 
+config VIRTIO_PSTORE
+	tristate "Virtio pstore driver"
+	depends on VIRTIO
+	depends on PSTORE
+	---help---
+	 This driver supports virtio pstore devices to save/restore
+	 panic and oops messages on the host.
+
+	 If unsure, say M.
+
  config VIRTIO_MMIO
 	tristate "Platform bus driver for memory mapped virtio devices"
 	depends on HAS_IOMEM && HAS_DMA
diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index 41e30e3dc842..bee68cb26d48 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -5,3 +5,4 @@  virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
 virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
 obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
 obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
+obj-$(CONFIG_VIRTIO_PSTORE) += virtio_pstore.o
diff --git a/drivers/virtio/virtio_pstore.c b/drivers/virtio/virtio_pstore.c
new file mode 100644
index 000000000000..c8fd8e39d1b8
--- /dev/null
+++ b/drivers/virtio/virtio_pstore.c
@@ -0,0 +1,417 @@ 
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/pstore.h>
+#include <linux/virtio.h>
+#include <linux/virtio_config.h>
+#include <uapi/linux/virtio_ids.h>
+#include <uapi/linux/virtio_pstore.h>
+
+#define VIRT_PSTORE_ORDER    2
+#define VIRT_PSTORE_BUFSIZE  (4096 << VIRT_PSTORE_ORDER)
+#define VIRT_PSTORE_NR_REQ   128
+
+struct virtio_pstore {
+	struct virtio_device	*vdev;
+	struct virtqueue	*vq[2];
+	struct pstore_info	 pstore;
+	struct virtio_pstore_req req[VIRT_PSTORE_NR_REQ];
+	struct virtio_pstore_res res[VIRT_PSTORE_NR_REQ];
+	unsigned int		 req_id;
+
+	/* Waiting for host to ack */
+	wait_queue_head_t	acked;
+	int			failed;
+};
+
+#define TYPE_TABLE_ENTRY(_entry)				\
+	{ PSTORE_TYPE_##_entry, VIRTIO_PSTORE_TYPE_##_entry }
+
+struct type_table {
+	int pstore;
+	u16 virtio;
+} type_table[] = {
+	TYPE_TABLE_ENTRY(DMESG),
+};
+
+#undef TYPE_TABLE_ENTRY
+
+
+static __le16 to_virtio_type(enum pstore_type_id type)
+{
+	unsigned int i;
+
+	for (i = 0; i < ARRAY_SIZE(type_table); i++) {
+		if (type == type_table[i].pstore)
+			return cpu_to_le16(type_table[i].virtio);
+	}
+
+	return cpu_to_le16(VIRTIO_PSTORE_TYPE_UNKNOWN);
+}
+
+static enum pstore_type_id from_virtio_type(__le16 type)
+{
+	unsigned int i;
+
+	for (i = 0; i < ARRAY_SIZE(type_table); i++) {
+		if (le16_to_cpu(type) == type_table[i].virtio)
+			return type_table[i].pstore;
+	}
+
+	return PSTORE_TYPE_UNKNOWN;
+}
+
+static void virtpstore_ack(struct virtqueue *vq)
+{
+	struct virtio_pstore *vps = vq->vdev->priv;
+
+	wake_up(&vps->acked);
+}
+
+static void virtpstore_check(struct virtqueue *vq)
+{
+	struct virtio_pstore *vps = vq->vdev->priv;
+	struct virtio_pstore_res *res;
+	unsigned int len;
+
+	res = virtqueue_get_buf(vq, &len);
+	if (res == NULL)
+		return;
+
+	if (le32_to_cpu(res->ret) < 0)
+		vps->failed = 1;
+}
+
+static void virt_pstore_get_reqs(struct virtio_pstore *vps,
+				 struct virtio_pstore_req **preq,
+				 struct virtio_pstore_res **pres)
+{
+	unsigned int idx = vps->req_id++ % VIRT_PSTORE_NR_REQ;
+
+	*preq = &vps->req[idx];
+	*pres = &vps->res[idx];
+
+	memset(*preq, 0, sizeof(**preq));
+	memset(*pres, 0, sizeof(**pres));
+}
+
+static int virt_pstore_open(struct pstore_info *psi)
+{
+	struct virtio_pstore *vps = psi->data;
+	struct virtio_pstore_req *req;
+	struct virtio_pstore_res *res;
+	struct scatterlist sgo[1], sgi[1];
+	struct scatterlist *sgs[2] = { sgo, sgi };
+	unsigned int len;
+
+	virt_pstore_get_reqs(vps, &req, &res);
+
+	req->cmd = cpu_to_le16(VIRTIO_PSTORE_CMD_OPEN);
+
+	sg_init_one(sgo, req, sizeof(*req));
+	sg_init_one(sgi, res, sizeof(*res));
+	virtqueue_add_sgs(vps->vq[0], sgs, 1, 1, vps, GFP_KERNEL);
+	virtqueue_kick(vps->vq[0]);
+
+	wait_event(vps->acked, virtqueue_get_buf(vps->vq[0], &len));
+	return le32_to_cpu(res->ret);
+}
+
+static int virt_pstore_close(struct pstore_info *psi)
+{
+	struct virtio_pstore *vps = psi->data;
+	struct virtio_pstore_req *req = &vps->req[vps->req_id];
+	struct virtio_pstore_res *res = &vps->res[vps->req_id];
+	struct scatterlist sgo[1], sgi[1];
+	struct scatterlist *sgs[2] = { sgo, sgi };
+	unsigned int len;
+
+	virt_pstore_get_reqs(vps, &req, &res);
+
+	req->cmd = cpu_to_le16(VIRTIO_PSTORE_CMD_CLOSE);
+
+	sg_init_one(sgo, req, sizeof(*req));
+	sg_init_one(sgi, res, sizeof(*res));
+	virtqueue_add_sgs(vps->vq[0], sgs, 1, 1, vps, GFP_KERNEL);
+	virtqueue_kick(vps->vq[0]);
+
+	wait_event(vps->acked, virtqueue_get_buf(vps->vq[0], &len));
+	return le32_to_cpu(res->ret);
+}
+
+static ssize_t virt_pstore_read(u64 *id, enum pstore_type_id *type,
+				int *count, struct timespec *time,
+				char **buf, bool *compressed,
+				ssize_t *ecc_notice_size,
+				struct pstore_info *psi)
+{
+	struct virtio_pstore *vps = psi->data;
+	struct virtio_pstore_req *req;
+	struct virtio_pstore_res *res;
+	struct virtio_pstore_fileinfo info;
+	struct scatterlist sgo[1], sgi[3];
+	struct scatterlist *sgs[2] = { sgo, sgi };
+	unsigned int len;
+	unsigned int flags;
+	int ret;
+	void *bf;
+
+	virt_pstore_get_reqs(vps, &req, &res);
+
+	req->cmd = cpu_to_le16(VIRTIO_PSTORE_CMD_READ);
+
+	sg_init_one(sgo, req, sizeof(*req));
+	sg_init_table(sgi, 3);
+	sg_set_buf(&sgi[0], res, sizeof(*res));
+	sg_set_buf(&sgi[1], &info, sizeof(info));
+	sg_set_buf(&sgi[2], psi->buf, psi->bufsize);
+	virtqueue_add_sgs(vps->vq[0], sgs, 1, 1, vps, GFP_KERNEL);
+	virtqueue_kick(vps->vq[0]);
+
+	wait_event(vps->acked, virtqueue_get_buf(vps->vq[0], &len));
+	if (len < sizeof(*res) + sizeof(info))
+		return -1;
+
+	ret = le32_to_cpu(res->ret);
+	if (ret < 0)
+		return ret;
+
+	len = le32_to_cpu(info.len);
+
+	bf = kmalloc(len, GFP_KERNEL);
+	if (bf == NULL)
+		return -ENOMEM;
+
+	*id    = le64_to_cpu(info.id);
+	*type  = from_virtio_type(info.type);
+	*count = le32_to_cpu(info.count);
+
+	flags = le32_to_cpu(info.flags);
+	*compressed = flags & VIRTIO_PSTORE_FL_COMPRESSED;
+
+	time->tv_sec  = le64_to_cpu(info.time_sec);
+	time->tv_nsec = le32_to_cpu(info.time_nsec);
+
+	memcpy(bf, psi->buf, len);
+	*buf = bf;
+
+	return len;
+}
+
+static int notrace virt_pstore_write(enum pstore_type_id type,
+				     enum kmsg_dump_reason reason,
+				     u64 *id, unsigned int part, int count,
+				     bool compressed, size_t size,
+				     struct pstore_info *psi)
+{
+	struct virtio_pstore *vps = psi->data;
+	struct virtio_pstore_req *req;
+	struct virtio_pstore_res *res;
+	struct scatterlist sgo[2], sgi[1];
+	struct scatterlist *sgs[2] = { sgo, sgi };
+	unsigned int flags = compressed ? VIRTIO_PSTORE_FL_COMPRESSED : 0;
+
+	if (vps->failed)
+		return -1;
+
+	*id = vps->req_id;
+	virt_pstore_get_reqs(vps, &req, &res);
+
+	req->cmd   = cpu_to_le16(VIRTIO_PSTORE_CMD_WRITE);
+	req->type  = to_virtio_type(type);
+	req->flags = cpu_to_le32(flags);
+
+	sg_init_table(sgo, 2);
+	sg_set_buf(&sgo[0], req, sizeof(*req));
+	sg_set_buf(&sgo[1], psi->buf, size);
+	sg_init_one(sgi, res, sizeof(*res));
+	virtqueue_add_sgs(vps->vq[1], sgs, 1, 1, vps, GFP_ATOMIC);
+	virtqueue_kick(vps->vq[1]);
+
+	return 0;
+}
+
+static int virt_pstore_erase(enum pstore_type_id type, u64 id, int count,
+			     struct timespec time, struct pstore_info *psi)
+{
+	struct virtio_pstore *vps = psi->data;
+	struct virtio_pstore_req *req;
+	struct virtio_pstore_res *res;
+	struct scatterlist sgo[1], sgi[1];
+	struct scatterlist *sgs[2] = { sgo, sgi };
+	unsigned int len;
+
+	virt_pstore_get_reqs(vps, &req, &res);
+
+	req->cmd   = cpu_to_le16(VIRTIO_PSTORE_CMD_ERASE);
+	req->type  = to_virtio_type(type);
+	req->id	   = cpu_to_le64(id);
+	req->count = cpu_to_le32(count);
+
+	sg_init_one(sgo, req, sizeof(*req));
+	sg_init_one(sgi, res, sizeof(*res));
+	virtqueue_add_sgs(vps->vq[0], sgs, 1, 1, vps, GFP_KERNEL);
+	virtqueue_kick(vps->vq[0]);
+
+	wait_event(vps->acked, virtqueue_get_buf(vps->vq[0], &len));
+	return le32_to_cpu(res->ret);
+}
+
+static int virt_pstore_init(struct virtio_pstore *vps)
+{
+	struct pstore_info *psinfo = &vps->pstore;
+	int err;
+
+	if (!psinfo->bufsize)
+		psinfo->bufsize = VIRT_PSTORE_BUFSIZE;
+
+	psinfo->buf = alloc_pages_exact(psinfo->bufsize, GFP_KERNEL);
+	if (!psinfo->buf) {
+		pr_err("cannot allocate pstore buffer\n");
+		return -ENOMEM;
+	}
+
+	psinfo->owner = THIS_MODULE;
+	psinfo->name  = "virtio";
+	psinfo->open  = virt_pstore_open;
+	psinfo->close = virt_pstore_close;
+	psinfo->read  = virt_pstore_read;
+	psinfo->erase = virt_pstore_erase;
+	psinfo->write = virt_pstore_write;
+	psinfo->flags = PSTORE_FLAGS_FRAGILE;
+
+	psinfo->data  = vps;
+	spin_lock_init(&psinfo->buf_lock);
+
+	err = pstore_register(psinfo);
+	if (err)
+		kfree(psinfo->buf);
+
+	return err;
+}
+
+static int virt_pstore_exit(struct virtio_pstore *vps)
+{
+	struct pstore_info *psinfo = &vps->pstore;
+
+	pstore_unregister(psinfo);
+
+	free_pages_exact(psinfo->buf, psinfo->bufsize);
+	psinfo->buf = NULL;
+	psinfo->bufsize = 0;
+
+	return 0;
+}
+
+static int virtpstore_init_vqs(struct virtio_pstore *vps)
+{
+	vq_callback_t *callbacks[] = { virtpstore_ack, virtpstore_check };
+	const char *names[] = { "pstore_read", "pstore_write" };
+
+	return vps->vdev->config->find_vqs(vps->vdev, 2, vps->vq,
+					   callbacks, names);
+}
+
+static void virtpstore_init_config(struct virtio_pstore *vps)
+{
+	u32 bufsize;
+
+	virtio_cread(vps->vdev, struct virtio_pstore_config, bufsize, &bufsize);
+
+	vps->pstore.bufsize = PAGE_ALIGN(bufsize);
+}
+
+static void virtpstore_confirm_config(struct virtio_pstore *vps)
+{
+	u32 bufsize = vps->pstore.bufsize;
+
+	virtio_cwrite(vps->vdev, struct virtio_pstore_config, bufsize,
+		     &bufsize);
+}
+
+static int virtpstore_probe(struct virtio_device *vdev)
+{
+	struct virtio_pstore *vps;
+	int err;
+
+	if (!vdev->config->get) {
+		dev_err(&vdev->dev, "driver init: config access disabled\n");
+		return -EINVAL;
+	}
+
+	vdev->priv = vps = kzalloc(sizeof(*vps), GFP_KERNEL);
+	if (!vps) {
+		err = -ENOMEM;
+		goto out;
+	}
+	vps->vdev = vdev;
+
+	err = virtpstore_init_vqs(vps);
+	if (err < 0)
+		goto out_free;
+
+	virtpstore_init_config(vps);
+
+	err = virt_pstore_init(vps);
+	if (err)
+		goto out_del_vq;
+
+	virtpstore_confirm_config(vps);
+
+	init_waitqueue_head(&vps->acked);
+
+	virtio_device_ready(vdev);
+
+	dev_info(&vdev->dev, "driver init: ok (bufsize = %luK, flags = %x)\n",
+		 vps->pstore.bufsize >> 10, vps->pstore.flags);
+
+	return 0;
+
+out_del_vq:
+	vdev->config->del_vqs(vdev);
+out_free:
+	kfree(vps);
+out:
+	dev_err(&vdev->dev, "driver init: failed with %d\n", err);
+	return err;
+}
+
+static void virtpstore_remove(struct virtio_device *vdev)
+{
+	struct virtio_pstore *vps = vdev->priv;
+
+	virt_pstore_exit(vps);
+
+	/* Now we reset the device so we can clean up the queues. */
+	vdev->config->reset(vdev);
+
+	vdev->config->del_vqs(vdev);
+
+	kfree(vps);
+}
+
+static unsigned int features[] = {
+};
+
+static struct virtio_device_id id_table[] = {
+	{ VIRTIO_ID_PSTORE, VIRTIO_DEV_ANY_ID },
+	{ 0 },
+};
+
+static struct virtio_driver virtio_pstore_driver = {
+	.driver.name         = KBUILD_MODNAME,
+	.driver.owner        = THIS_MODULE,
+	.feature_table       = features,
+	.feature_table_size  = ARRAY_SIZE(features),
+	.id_table            = id_table,
+	.probe               = virtpstore_probe,
+	.remove              = virtpstore_remove,
+};
+
+module_virtio_driver(virtio_pstore_driver);
+MODULE_DEVICE_TABLE(virtio, id_table);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Namhyung Kim <namhyung@kernel.org>");
+MODULE_DESCRIPTION("Virtio pstore driver");
diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index 6d4e92ccdc91..9bbb1554d8b2 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -449,6 +449,7 @@  header-y += virtio_ids.h
 header-y += virtio_input.h
 header-y += virtio_net.h
 header-y += virtio_pci.h
+header-y += virtio_pstore.h
 header-y += virtio_ring.h
 header-y += virtio_rng.h
 header-y += virtio_scsi.h
diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_ids.h
index 77925f587b15..c72a9ab588c0 100644
--- a/include/uapi/linux/virtio_ids.h
+++ b/include/uapi/linux/virtio_ids.h
@@ -41,5 +41,6 @@ 
 #define VIRTIO_ID_CAIF	       12 /* Virtio caif */
 #define VIRTIO_ID_GPU          16 /* virtio GPU */
 #define VIRTIO_ID_INPUT        18 /* virtio input */
+#define VIRTIO_ID_PSTORE       22 /* virtio pstore */
 
 #endif /* _LINUX_VIRTIO_IDS_H */
diff --git a/include/uapi/linux/virtio_pstore.h b/include/uapi/linux/virtio_pstore.h
new file mode 100644
index 000000000000..57c35327f53b
--- /dev/null
+++ b/include/uapi/linux/virtio_pstore.h
@@ -0,0 +1,74 @@ 
+#ifndef _LINUX_VIRTIO_PSTORE_H
+#define _LINUX_VIRTIO_PSTORE_H
+/* This header is BSD licensed so anyone can use the definitions to implement
+ * compatible drivers/servers.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of IBM nor the names of its contributors
+ *    may be used to endorse or promote products derived from this software
+ *    without specific prior written permission.
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL IBM OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE. */
+#include <linux/types.h>
+#include <linux/virtio_types.h>
+
+#define VIRTIO_PSTORE_CMD_NULL   0
+#define VIRTIO_PSTORE_CMD_OPEN   1
+#define VIRTIO_PSTORE_CMD_READ   2
+#define VIRTIO_PSTORE_CMD_WRITE  3
+#define VIRTIO_PSTORE_CMD_ERASE  4
+#define VIRTIO_PSTORE_CMD_CLOSE  5
+
+#define VIRTIO_PSTORE_TYPE_UNKNOWN  0
+#define VIRTIO_PSTORE_TYPE_DMESG    1
+
+#define VIRTIO_PSTORE_FL_COMPRESSED  1
+
+struct virtio_pstore_req {
+	__le16		cmd;
+	__le16		type;
+	__le32		flags;
+	__le64		id;
+	__le32		count;
+	__le32		reserved;
+};
+
+struct virtio_pstore_res {
+	__le16		cmd;
+	__le16		type;
+	__le32		ret;
+};
+
+struct virtio_pstore_fileinfo {
+	__le64		id;
+	__le32		count;
+	__le16		type;
+	__le16		unused;
+	__le32		flags;
+	__le32		len;
+	__le64		time_sec;
+	__le32		time_nsec;
+	__le32		reserved;
+};
+
+struct virtio_pstore_config {
+	__le32		bufsize;
+};
+
+#endif /* _LINUX_VIRTIO_PSTORE_H */