[4/5] coresight-stm: adding driver for CoreSight STM component

Message ID 1425078294-13059-5-git-send-email-mathieu.poirier@linaro.org (mailing list archive)
State New, archived

Commit Message

Mathieu Poirier Feb. 27, 2015, 11:04 p.m. UTC
From: Pratik Patel <pratikp@codeaurora.org>

This driver adds support for the STM CoreSight IP block,
allowing any system component (HW or SW) to log and
aggregate messages via a single entity.

The STM exposes an application-defined number of channels
called stimulus ports.  Configuration is done using entries
in sysfs, and channels are made available to userspace via devfs.

Signed-off-by: Pratik Patel <pratikp@codeaurora.org>
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
---
 .../ABI/testing/sysfs-bus-coresight-devices-stm    |   62 ++
 Documentation/trace/coresight.txt                  |   88 +-
 drivers/coresight/Kconfig                          |   10 +
 drivers/coresight/Makefile                         |    1 +
 drivers/coresight/coresight-stm.c                  | 1070 ++++++++++++++++++++
 include/linux/coresight-stm.h                      |   40 +
 include/uapi/linux/coresight-stm.h                 |   23 +
 7 files changed, 1292 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-bus-coresight-devices-stm
 create mode 100644 drivers/coresight/coresight-stm.c
 create mode 100644 include/linux/coresight-stm.h
 create mode 100644 include/uapi/linux/coresight-stm.h

Comments

Alexander Shishkin March 7, 2015, 12:27 p.m. UTC | #1
Mathieu Poirier <mathieu.poirier@linaro.org> writes:

> From: Pratik Patel <pratikp@codeaurora.org>
>
> This driver adds support for the STM CoreSight IP block,
> allowing any system compoment (HW or SW) to log and
> aggregate messages via a single entity.
>
> The STM exposes an application defined number of channels
> called stimulus port.  Configuration is done using entries
> in sysfs and channels made available to userspace via devfs.

I somehow missed it when it was posted, but anyway. It looks like we are
solving a very similar problem (the STM device) and I'd like to propose
a generic and architecture independent solution [1]. I also took the
liberty to copy you and Pratik on those patches.

[1] https://lkml.org/lkml/2015/3/7/99

Regards,
--
Alex
Alexander Shishkin March 30, 2015, 2:04 p.m. UTC | #2
Mathieu Poirier <mathieu.poirier@linaro.org> writes:

> +static int stm_send(void *addr, const void *data, u32 size)
> +{
> +	u32 len = size;
> +
> +	if (((unsigned long)data & 0x1) && (size >= 1)) {
> +		writeb_relaxed(*(u8 *)data, addr);
> +		data++;
> +		size--;
> +	}
> +	if (((unsigned long)data & 0x2) && (size >= 2)) {
> +		writew_relaxed(*(u16 *)data, addr);
> +		data += 2;
> +		size -= 2;
> +	}
> +
> +	/* now we are 32bit aligned */
> +	while (size >= 4) {
> +		writel_relaxed(*(u32 *)data, addr);
> +		data += 4;
> +		size -= 4;
> +	}
> +
> +	if (size >= 2) {
> +		writew_relaxed(*(u16 *)data, addr);
> +		data += 2;
> +		size -= 2;
> +	}
> +	if (size >= 1) {
> +		writeb_relaxed(*(u8 *)data, addr);
> +		data++;
> +		size--;
> +	}
> +
> +	return len;
> +}
> +
> +static int stm_trace_data(unsigned long ch_addr, u32 options,
> +			  const void *data, u32 size)
> +{
> +	void *addr;
> +
> +	options &= ~STM_OPTION_TIMESTAMPED;
> +	addr = (void *)(ch_addr | stm_channel_off(STM_PKT_TYPE_DATA, options));
> +
> +	return stm_send(addr, data, size);
> +}
> +
> +static inline int stm_trace_hw(u32 options, u32 channel, u8 entity_id,
> +			       const void *data, u32 size)
> +{
> +	int len = 0;
> +	unsigned long ch_addr;
> +	struct stm_drvdata *drvdata = stmdrvdata;
> +
> +
> +	/* get the channel address */
> +	ch_addr = (unsigned long)stm_channel_addr(drvdata, channel);
> +
> +	if (drvdata->write_64bit)
> +		len = stm_trace_data_64bit(ch_addr, options, data, size);
> +	else
> +		/* send the payload data */
> +		len = stm_trace_data(ch_addr, options, data, size);
> +
> +	return len;
> +}

As it looks from the above snippet, you're using a stream of DATA
packets for user's payload. I also noticed that you use an ioctl to
trigger timestamps.
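
To put numbers on that, the splitting done by stm_send() can be modelled
in user space.  This is only a sketch: struct stm_split and
stm_split_count() are made-up names, no MMIO is performed, and only the
alignment arithmetic of the quoted function is mirrored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: how many writes of each width are issued. */
struct stm_split {
	unsigned int nr8, nr16, nr32;
};

/* Mirrors the alignment steps of stm_send() for a buffer at 'addr'. */
static struct stm_split stm_split_count(uintptr_t addr, uint32_t size)
{
	struct stm_split s = { 0, 0, 0 };

	if ((addr & 0x1) && size >= 1) {	/* leading byte */
		s.nr8++;
		addr++;
		size--;
	}
	if ((addr & 0x2) && size >= 2) {	/* leading halfword */
		s.nr16++;
		size -= 2;
	}
	while (size >= 4) {			/* 32-bit body */
		s.nr32++;
		size -= 4;
	}
	assert(size < 4);			/* at most 3 bytes remain */
	if (size >= 2) {			/* trailing halfword */
		s.nr16++;
		size -= 2;
	}
	if (size >= 1)				/* trailing byte */
		s.nr8++;

	return s;
}
```

For an 11-byte buffer at an odd address this gives one byte write, one
halfword write and two word writes, each of which becomes its own DATA
packet with nothing in the stream to say where the message starts or ends.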

Now, in the STP protocol there are, for example, marked data packets
that can be used to mark beginning of a higher-level message,
timestamped data packets that can be used to mean the same thing and
FLAG packets to mark message boundaries.

In my Intel TH code, I'm using D*TS packet for the beginning of a
message (or "frame") and FLAG packet for the end of a message.

So my question is, is there any specific STP framing pattern that you
use with Coresight STM or should we perhaps figure out a generic framing
pattern and make it part of the stm class as well?

For example, we can replace stm's .write callback with something like

    int (*packet)(struct stm_data *data,
                  unsigned int type,    /* data, flag, trig etc */
                  unsigned int options, /* timestamped, marked */
                  u64 payload);

and let the stm core do the "framing", which, then, will be common and
consistent across different architectures/stm implementations.
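
Purely as an illustration of what "the core does the framing" could look
like, here is a user-space sketch.  It is not code from either patchset:
stm_core_write(), struct mock_stm and the numeric values of the STP_*
constants are assumptions, u64 is spelled uint64_t, and the convention
shown (timestamp on the first data packet, FLAG at the end) is just one
possible choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values: only the general idea comes from this discussion. */
#define STP_PACKET_D8	0u
#define STP_PACKET_FLAG	1u
#define STP_OPTION_TS	0x1u

struct stm_data {
	/* same shape as the callback proposed above */
	int (*packet)(struct stm_data *data, unsigned int type,
		      unsigned int options, uint64_t payload);
};

/*
 * Core-side framing (assumed convention): the first data packet carries
 * the timestamp and a FLAG packet closes the frame.  Returns the number
 * of payload bytes sent, or a negative error from the driver.
 */
static long stm_core_write(struct stm_data *stm, const uint8_t *buf, size_t len)
{
	size_t i;
	int err;

	assert(stm && stm->packet);

	for (i = 0; i < len; i++) {
		err = stm->packet(stm, STP_PACKET_D8,
				  i ? 0u : STP_OPTION_TS, buf[i]);
		if (err < 0)
			return err;
	}

	if (len) {
		err = stm->packet(stm, STP_PACKET_FLAG, 0, 0);
		if (err < 0)
			return err;
	}

	return (long)len;
}

/* Mock driver that only counts what the core asks it to send. */
struct mock_stm {
	struct stm_data stm;	/* first member, so the cast below is valid */
	unsigned int nr_data, nr_flag, nr_ts;
};

static int mock_packet(struct stm_data *data, unsigned int type,
		       unsigned int options, uint64_t payload)
{
	struct mock_stm *m = (struct mock_stm *)data;

	(void)payload;
	if (type == STP_PACKET_FLAG)
		m->nr_flag++;
	else
		m->nr_data++;
	if (options & STP_OPTION_TS)
		m->nr_ts++;
	return 0;
}
```

With this split a driver only has to implement packet(); a device that
lacks one packet type would have to work around that inside its own
callback.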

> +static long stm_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
> +{
> +	u32 options;
> +	struct stm_node *node = file->private_data;
> +
> +	switch (cmd) {
> +	case STM_IOCTL_SET_OPTIONS:
> +		if (copy_from_user(&options, (void __user *)arg, sizeof(u32)))
> +			return -EFAULT;
> +
> +		options &= (STM_OPTION_TIMESTAMPED | STM_OPTION_GUARANTEED);
> +		node->options = options;
> +		break;
> +	case STM_IOCTL_GET_OPTIONS:
> +		options = node->options;
> +		if (copy_to_user((void __user *)arg, &options, sizeof(options)))
> +			return -EFAULT;
> +		break;
> +	default:
> +		return -EINVAL;
> +	};
> +
> +	return 0;
> +}

That way, we also won't need private ioctl()s, or at least, not for this
reason.

How do you feel about this?

Regards,
--
Alex
Mathieu Poirier March 30, 2015, 3:48 p.m. UTC | #3
On 30 March 2015 at 08:04, Alexander Shishkin
<alexander.shishkin@linux.intel.com> wrote:
> Mathieu Poirier <mathieu.poirier@linaro.org> writes:
>
>> +static int stm_send(void *addr, const void *data, u32 size)
>> +{
>> +     u32 len = size;
>> +
>> +     if (((unsigned long)data & 0x1) && (size >= 1)) {
>> +             writeb_relaxed(*(u8 *)data, addr);
>> +             data++;
>> +             size--;
>> +     }
>> +     if (((unsigned long)data & 0x2) && (size >= 2)) {
>> +             writew_relaxed(*(u16 *)data, addr);
>> +             data += 2;
>> +             size -= 2;
>> +     }
>> +
>> +     /* now we are 32bit aligned */
>> +     while (size >= 4) {
>> +             writel_relaxed(*(u32 *)data, addr);
>> +             data += 4;
>> +             size -= 4;
>> +     }
>> +
>> +     if (size >= 2) {
>> +             writew_relaxed(*(u16 *)data, addr);
>> +             data += 2;
>> +             size -= 2;
>> +     }
>> +     if (size >= 1) {
>> +             writeb_relaxed(*(u8 *)data, addr);
>> +             data++;
>> +             size--;
>> +     }
>> +
>> +     return len;
>> +}
>> +
>> +static int stm_trace_data(unsigned long ch_addr, u32 options,
>> +                       const void *data, u32 size)
>> +{
>> +     void *addr;
>> +
>> +     options &= ~STM_OPTION_TIMESTAMPED;
>> +     addr = (void *)(ch_addr | stm_channel_off(STM_PKT_TYPE_DATA, options));
>> +
>> +     return stm_send(addr, data, size);
>> +}
>> +
>> +static inline int stm_trace_hw(u32 options, u32 channel, u8 entity_id,
>> +                            const void *data, u32 size)
>> +{
>> +     int len = 0;
>> +     unsigned long ch_addr;
>> +     struct stm_drvdata *drvdata = stmdrvdata;
>> +
>> +
>> +     /* get the channel address */
>> +     ch_addr = (unsigned long)stm_channel_addr(drvdata, channel);
>> +
>> +     if (drvdata->write_64bit)
>> +             len = stm_trace_data_64bit(ch_addr, options, data, size);
>> +     else
>> +             /* send the payload data */
>> +             len = stm_trace_data(ch_addr, options, data, size);
>> +
>> +     return len;
>> +}
>
> As it looks from the above snippet, you're using a stream of DATA
> packets for user's payload. I also noticed that you use an ioctl to
> trigger timestamps.

Right, the ioctl() conveys user space intentions on that channel.
Options are kept and applied on every packet for as long as the
channel is open.

>
> Now, in the STP protocol there are, for example, marked data packets
> that can be used to mark beginning of a higher-level message,
> timestamped data packets that can be used to mean the same thing and
> FLAG packets to mark message boundaries.

Same on my side, I simply haven't included them yet.  I'll do so in my
next iteration.

>
> In my Intel TH code, I'm using D*TS packet for the beginning of a
> message (or "frame") and FLAG packet for the the end of a message.
>
> So my question is, is there any specific STP framing pattern that you
> use with Coresight STM or should we perhaps figure out a generic framing
> pattern and make it part of the stm class as well?

Now specific pattern... Sending a packet consists of MARK, DATA, FLAG.

>
> For example, we can replace stm's .write callback with something like
>
>     int (*packet)(struct stm_data *data,
>                   unsigned int type,    /* data, flag, trig etc */
>                   unsigned int options, /* timestamped, marked */
>                   u64 payload);
>
> and let the stm core do the "framing", which, then, will be common and
> consistent across different architectures/stm implementations.

I think the framing should be left to individual drivers.  It's only a
matter of time before we get a weird device that doesn't play well
with others, forcing us to carry the ugliness in the STM core rather than
the driver.

And isn't carrying "options" redundant?  Using "container_of" on the
"data" field one can get back to the driver specific structure, which
is definitely a better place to keep that information.  I think the
general structure looks good right now, we simply need to find a way
to get rid of the ioctls.

Regarding the same "options", how did you plan on getting those from user space?

>
>> +static long stm_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
>> +{
>> +     u32 options;
>> +     struct stm_node *node = file->private_data;
>> +
>> +     switch (cmd) {
>> +     case STM_IOCTL_SET_OPTIONS:
>> +             if (copy_from_user(&options, (void __user *)arg, sizeof(u32)))
>> +                     return -EFAULT;
>> +
>> +             options &= (STM_OPTION_TIMESTAMPED | STM_OPTION_GUARANTEED);
>> +             node->options = options;
>> +             break;
>> +     case STM_IOCTL_GET_OPTIONS:
>> +             options = node->options;
>> +             if (copy_to_user((void __user *)arg, &options, sizeof(options)))
>> +                     return -EFAULT;
>> +             break;
>> +     default:
>> +             return -EINVAL;
>> +     };
>> +
>> +     return 0;
>> +}
>
> That way, we also won't need private ioctl()s, or at least, not for this
> reason.
>
> How do you feel about this?
>
> Regards,
> --
> Alex
Alexander Shishkin March 31, 2015, 3:04 p.m. UTC | #4
Mathieu Poirier <mathieu.poirier@linaro.org> writes:

> On 30 March 2015 at 08:04, Alexander Shishkin
> <alexander.shishkin@linux.intel.com> wrote:
>> As it looks from the above snippet, you're using a stream of DATA
>> packets for user's payload. I also noticed that you use an ioctl to
>> trigger timestamps.
>
> Right, the ioctl() conveys user space intentions on that channel.
> Options are kept and applied on every packet for as long as the
> channel is open.

So this means that, for example, if you enable timestamps on a channel,
then every single data packet on that channel will be timestamped, which
is a lot of timestamps. Normally, you would only be interested in the
timestamp on the first data packet in the message (or frame or however
we decide to call it). This is one of the reasons why I'm suggesting a
common framing scheme or a "protocol".
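
A rough back-of-the-envelope comparison, assuming for simplicity that a
message is sent as fixed-width data packets (the helper names below are
made up for this sketch):

```c
#include <assert.h>
#include <stdint.h>

/* Data packets needed for 'len' payload bytes at 'width' bytes per packet. */
static uint32_t stm_nr_packets(uint32_t len, uint32_t width)
{
	assert(width > 0);
	return len / width + (len % width ? 1 : 0);
}

/* Channel-wide option: every data packet is timestamped. */
static uint32_t ts_per_channel_option(uint32_t len, uint32_t width)
{
	return stm_nr_packets(len, width);
}

/* Framed message: only the first data packet is timestamped. */
static uint32_t ts_per_frame(uint32_t len)
{
	return len ? 1 : 0;
}
```

For a 4096-byte message written as 32-bit packets that is 1024 timestamps
with the channel-wide option against a single one per frame.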

>> Now, in the STP protocol there are, for example, marked data packets
>> that can be used to mark beginning of a higher-level message,
>> timestamped data packets that can be used to mean the same thing and
>> FLAG packets to mark message boundaries.
>
> Same on my side, I simply haven't included them yet.  I'll do so in my
> next iteration.
>
>>
>> In my Intel TH code, I'm using D*TS packet for the beginning of a
>> message (or "frame") and FLAG packet for the the end of a message.
>>
>> So my question is, is there any specific STP framing pattern that you
>> use with Coresight STM or should we perhaps figure out a generic framing
>> pattern and make it part of the stm class as well?
>
> Now specific pattern... Sending a packet consists of MARK, DATA, FLAG.

Is this pattern mandated by a decoder that you use or is there any other
reason why it's exactly that?

>>
>> For example, we can replace stm's .write callback with something like
>>
>>     int (*packet)(struct stm_data *data,
>>                   unsigned int type,    /* data, flag, trig etc */
>>                   unsigned int options, /* timestamped, marked */
>>                   u64 payload);
>>
>> and let the stm core do the "framing", which, then, will be common and
>> consistent across different architectures/stm implementations.
>
> I think the framing should be left to individual drivers.  It's only a
> matter of time before we get a weird device that doesn't play well
> with others, forcing to carry the ugliness in the STM core rather than
> the driver.

Not necessarily. If a device doesn't support one type of packet or the
other, it will be up to them to work around that in the above .packet
callback.

As for the devices that don't play well, there's a question of how much
one can violate the spec and still call oneself compliant.

> And isn't carrying "options" redundant?  Using "container_of" on the
> "data" field one can get back to the driver specific structure, which
> is definitely a better place to keep that information.  I think the
> general structure looks good right now, we simply need to find a way
> to get rid of the ioctls.

No, what I mean by options here is a property of each individual packet,
not the whole channel. For example, if I want the underlying driver to
send a marked data packet, I do

     stm_data->packet(stm_data, STP_PACKET_D8, STP_OPTION_MARKED, payload);

or if I want to send a timestamped flag, I do

     stm_data->packet(stm_data, STP_PACKET_FLAG, STP_OPTION_TS, 0);

Like I said above, there seems little to be gained from enabling
timestamps for all packets in one channel.

> Regarding the same "options", how did you plan on getting those from user space?

Ideally, if we have a framing convention, we don't need to get it from
userspace at all; all userspace should care about is writing data to the
character device and we wrap it up and feed it to the underlying driver.

Regards,
--
Alex
Mathieu Poirier April 1, 2015, 2:27 p.m. UTC | #5
Adding Al Grant to the conversation - his knowledge on HW tracing for
the ARM architecture is definitely an asset for this kind of planning.
Please add him to future patchsets as well.

On 31 March 2015 at 09:04, Alexander Shishkin
<alexander.shishkin@linux.intel.com> wrote:
> Mathieu Poirier <mathieu.poirier@linaro.org> writes:
>
>> On 30 March 2015 at 08:04, Alexander Shishkin
>> <alexander.shishkin@linux.intel.com> wrote:
>>> As it looks from the above snippet, you're using a stream of DATA
>>> packets for user's payload. I also noticed that you use an ioctl to
>>> trigger timestamps.
>>
>> Right, the ioctl() conveys user space intentions on that channel.
>> Options are kept and applied on every packet for as long as the
>> channel is open.
>
> So this means that, for example, if you enable timestamps on a channel,
> then every single data packet on that channel will be timestamped, which
> is a lot of timestamps.

That is how the original coresight-stm driver was implemented.  My
initial goal was to upstream something that could be used as a
conversation starter or a foundation to start building on.  I had
planned to look into the protocol specification itself in later steps.
But addressing the issue now is just as worthy.

>Normally, you would only be interested in the
> timestamp on the first data packet in the message (or frame or however
> we decide to call it). This is one of the reasons why I'm suggesting a
> common framing scheme or a "protocol".
>
>>> Now, in the STP protocol there are, for example, marked data packets
>>> that can be used to mark beginning of a higher-level message,
>>> timestamped data packets that can be used to mean the same thing and
>>> FLAG packets to mark message boundaries.
>>
>> Same on my side, I simply haven't included them yet.  I'll do so in my
>> next iteration.
>>
>>>
>>> In my Intel TH code, I'm using D*TS packet for the beginning of a
>>> message (or "frame") and FLAG packet for the the end of a message.

By the way, are you following the OST specification or is this a
scheme you came up with?

>>>
>>> So my question is, is there any specific STP framing pattern that you
>>> use with Coresight STM or should we perhaps figure out a generic framing
>>> pattern and make it part of the stm class as well?
>>
>> Now specific pattern... Sending a packet consists of MARK, DATA, FLAG.
>
> Is this pattern mandated by a decoder that you use or is there any other
> reason why it's exactly that?

The driver was following the OST specification, or something close to
that.  I don't have access to the standard itself and as such am not in a
position to assert how accurate the implementation is.  That is one of
the reasons I left it out of my patchset.

>
>>>
>>> For example, we can replace stm's .write callback with something like
>>>
>>>     int (*packet)(struct stm_data *data,
>>>                   unsigned int type,    /* data, flag, trig etc */
>>>                   unsigned int options, /* timestamped, marked */
>>>                   u64 payload);
>>>
>>> and let the stm core do the "framing", which, then, will be common and
>>> consistent across different architectures/stm implementations.
>>
>> I think the framing should be left to individual drivers.  It's only a
>> matter of time before we get a weird device that doesn't play well
>> with others, forcing to carry the ugliness in the STM core rather than
>> the driver.
>
> Not necessarily. If a device doesn't support one type of packet or the
> other, it will be up to them to work around that in the above .packet
> callback.
>
> As for the devices that don't play well, there's a question of how much
> one can violate the spec and still call oneself compliant.

I understand your point of view.  On my side I'm trying to avoid
painting ourselves into a corner.

>
>> And isn't carrying "options" redundant?  Using "container_of" on the
>> "data" field one can get back to the driver specific structure, which
>> is definitely a better place to keep that information.  I think the
>> general structure looks good right now, we simply need to find a way
>> to get rid of the ioctls.
>
> No, what I mean by options here is a property of each individual packet,
> not the whole channel. For example, if I want the underlying driver to
> send a marked data packet, I do
>
>      stm_data->packet(stm_data, STP_PACKET_D8, STP_OPTION_MARKED, payload);
>
> or if I want to send a timestamped flag, I do
>
>      stm_data->packet(stm_data, STP_PACKET_FLAG, STP_OPTION_TS, 0);

Ah!  It's getting clearer now.

>
> Like I said above, there seems little to be gained from enabling
> timestamps for all packets in one channel.
>
>> Regarding the same "options", how did you plan on getting those from user space?
>
> Ideally, if we have a framing convension, we don't need to get it from
> userspace at all, all userspace should care about is writing data to the
> character device and we wrap it up and feed it to the underlying driver.

What do you have in mind for "framing convention"?  As mentioned above
codeAurora was using the OST specification but Al Grant tells me it
isn't supported anymore.

Thanks for the open dialogue,
Mathieu

>
> Regards,
> --
> Alex
Mathieu Poirier April 1, 2015, 2:28 p.m. UTC | #6
+ Al Grant


Patch

diff --git a/Documentation/ABI/testing/sysfs-bus-coresight-devices-stm b/Documentation/ABI/testing/sysfs-bus-coresight-devices-stm
new file mode 100644
index 000000000000..44fda9a47ea0
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-coresight-devices-stm
@@ -0,0 +1,62 @@ 
+What:		/sys/bus/coresight/devices/<memory_map>.stm/enable_source
+Date:		February 2015
+KernelVersion:	4.01
+Contact:	Mathieu Poirier <mathieu.poirier@linaro.org>
+Description:	(RW) Enable/disable tracing on this specific trace macrocell.
+		Enabling the trace macrocell implies it has been configured
+		properly and a sink has been identified for it.  The path
+		of coresight components linking the source to the sink is
+		configured and managed automatically by the coresight framework.
+
+What:		/sys/bus/coresight/devices/<memory_map>.stm/entities
+Date:		February 2015
+KernelVersion:	4.01
+Contact:	Mathieu Poirier <mathieu.poirier@linaro.org>
+Description:	(RW) Controls which entities have been allowed to use the
+		stimulus ports, regardless of the channel they were assigned.
+		Entity definition can be found in
+		include/uapi/linux/coresight-stm.h
+
+What:		/sys/bus/coresight/devices/<memory_map>.stm/hwevent_enable
+Date:		February 2015
+KernelVersion:	4.01
+Contact:	Mathieu Poirier <mathieu.poirier@linaro.org>
+Description:	(RW) Provides access to the HW event enable register, used in
+		conjunction with HW event bank select register.
+
+What:		/sys/bus/coresight/devices/<memory_map>.stm/hwevent_select
+Date:		February 2015
+KernelVersion:	4.01
+Contact:	Mathieu Poirier <mathieu.poirier@linaro.org>
+Description:	(RW) Gives access to the HW event block select register
+		(STMHEBSR) in order to configure up to 256 channels.  Used in
+		conjunction with "hwevent_enable" register as described above.
+
+What:		/sys/bus/coresight/devices/<memory_map>.stm/port_enable
+Date:		February 2015
+KernelVersion:	4.01
+Contact:	Mathieu Poirier <mathieu.poirier@linaro.org>
+Description:	(RW) Provides access to the stimulus port enable register
+		(STMSPER).  Used in conjunction with "port_select" described
+		below.
+
+What:		/sys/bus/coresight/devices/<memory_map>.stm/port_select
+Date:		February 2015
+KernelVersion:	4.01
+Contact:	Mathieu Poirier <mathieu.poirier@linaro.org>
+Description:	(RW) Used to select the bank of stimulus ports that the bits in
+		register STMSPER (see above) apply to.
+
+What:		/sys/bus/coresight/devices/<memory_map>.stm/status
+Date:		February 2015
+KernelVersion:	4.01
+Contact:	Mathieu Poirier <mathieu.poirier@linaro.org>
+Description:	(R) List various control and status registers.  The specific
+		layout and content is driver specific.
+
+What:		/sys/bus/coresight/devices/<memory_map>.stm/traceid
+Date:		February 2015
+KernelVersion:	4.01
+Contact:	Mathieu Poirier <mathieu.poirier@linaro.org>
+Description:	(RW) Holds the trace ID that will appear in the trace stream
+		coming from this trace entity.
diff --git a/Documentation/trace/coresight.txt b/Documentation/trace/coresight.txt
index 02361552a3ea..a041477698d9 100644
--- a/Documentation/trace/coresight.txt
+++ b/Documentation/trace/coresight.txt
@@ -190,8 +190,8 @@  expected to be accessed and controlled using those entries.
 Last but not least, "struct module *owner" is expected to be set to reflect
 the information carried in "THIS_MODULE".
 
-How to use
-----------
+How to use the tracer modules
+-----------------------------
 
 Before trace collection can start, a coresight sink needs to be identify.
 There is no limit on the amount of sinks (nor sources) that can be enabled at
@@ -297,3 +297,87 @@  Info                                    Tracing enabled
 Instruction     13570831        0x8026B584      E28DD00C        false   ADD      sp,sp,#0xc
 Instruction     0       0x8026B588      E8BD8000        true    LDM      sp!,{pc}
 Timestamp                                       Timestamp: 17107041535
+
+How to use the STM module
+-------------------------
+
+Using the System Trace Macrocell module is the same as the tracers - the only
+difference is that components (entities) are driving the trace capture rather
+than the program flow through the code.
+
+As with any other CoreSight component, specifics about the STM tracer can be
+found in sysfs, with more information on each entry being found in [1]:
+
+root@genericarmv8:~# ls /sys/bus/coresight/devices/20100000.stm
+enable_source   hwevent_select  power           traceid
+entities        port_enable     status          uevent
+hwevent_enable  port_select     subsystem
+root@genericarmv8:~#
+
+Like any other source a sink needs to be identified and the STM enabled before
+being used:
+
+root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/20010000.etf/enable_sink
+root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/20100000.stm/enable_source
+
+From there user space applications can request and use channels using the devfs
+interface provided for that purpose:
+
+root@genericarmv8:~# ls -l /dev/20100000.stm
+crw-------    1 root     root       10,  61 Jan  3 18:11 /dev/20100000.stm
+root@genericarmv8:~#
+
+The following sample program provides an example of the supported operations:
+
+#include <stdio.h>
+#include <fcntl.h>
+#include <string.h>
+#include <linux/coresight-stm.h>
+
+#define BUFSIZE	20
+
+int main(int argc, char *argv[])
+{
+	int fd, n;
+	unsigned long options;
+	char buf[BUFSIZE];
+	char data[BUFSIZE] = "this is a test";
+
+	fd = open (argv[1], O_RDWR, 0);
+
+	if (fd == -1) {
+		printf("can't open %s\n", argv[1]);
+		return 0;
+	}
+
+	n = read(fd, buf, BUFSIZE);
+	printf("channel_id: %d\n", atoi(buf));
+
+	options = STM_OPTION_TIMESTAMPED;
+	ioctl(fd, STM_IOCTL_SET_OPTIONS, &options);
+	options = 0;
+	ioctl(fd, STM_IOCTL_GET_OPTIONS, &options);
+	printf("options: 0x%lx\n", options);
+
+	write(fd, data, strlen(data));
+
+	close(fd);
+
+	return 0;
+}
+
+When opening the devfs entry the first available channel is reserved for the
+requesting application.  That channel will remain the same until close() is
+called, at which point it goes back to the channel pool.  From there calling
+open() again may or may _not_ yield the same channel number.
+
+User space applications can determine what channel they've been given by
+issuing a read() on the file descriptor returned by open().  An ioctl() call is
+provided to set the channel options and the write() method will inject logging
+information in the channel.  There is no limit on the number of channels an
+application can reserve, provided it uses a different file descriptor for each
+one.
+
+If no more channels are available, the returned channel ID is '-1'.
+
+[1]. Documentation/ABI/testing/sysfs-bus-coresight-devices-stm
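
For illustration only, and not part of the patch: the reservation rules described above (lowest free channel on open(), release on close(), '-1' once the pool is exhausted) can be modelled in user space. This is a minimal sketch under stated assumptions. `chan_pool`, `chan_alloc()` and `chan_free()` are hypothetical names, the pool size of 32 is assumed, and nothing here talks to the driver.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CHANNELS 32	/* assumed pool size, one bit per channel */

struct chan_pool {
	uint32_t used;	/* bit n set: channel n is reserved */
};

/* Reserve the lowest free channel; return -1 when the pool is exhausted. */
static int chan_alloc(struct chan_pool *pool)
{
	int ch;

	for (ch = 0; ch < NR_CHANNELS; ch++) {
		if (!(pool->used & (UINT32_C(1) << ch))) {
			pool->used |= UINT32_C(1) << ch;
			return ch;
		}
	}
	return -1;
}

/* Return a channel to the pool, as close() does for its descriptor. */
static void chan_free(struct chan_pool *pool, int ch)
{
	assert(ch >= 0 && ch < NR_CHANNELS);
	pool->used &= ~(UINT32_C(1) << ch);
}
```

In this model, a channel that is freed and then requested again may come back with the same number only if no lower channel was released in the meantime, which is why the text warns that open() may or may not yield the same channel.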
diff --git a/drivers/coresight/Kconfig b/drivers/coresight/Kconfig
index fc1f1ae7a49d..c865dcd306cd 100644
--- a/drivers/coresight/Kconfig
+++ b/drivers/coresight/Kconfig
@@ -58,4 +58,14 @@  config CORESIGHT_SOURCE_ETM3X
 	  which allows tracing the instructions that a processor is executing
 	  This is primarily useful for instruction level tracing.  Depending
 	  the ETM version data tracing may also be available.
+
+config CORESIGHT_STM
+	bool "CoreSight System Trace Macrocell driver"
+	depends on (ARM && !(CPU_32v3 || CPU_32v4 || CPU_32v4T)) || ARM64
+	select CORESIGHT_LINKS_AND_SINKS
+	help
+	  This driver provides support for hardware assisted software
+	  instrumentation based tracing. This is primarily used for
+	  logging useful software events or data coming from various entities
+	  in the system, possibly running different OSs.
 endif
diff --git a/drivers/coresight/Makefile b/drivers/coresight/Makefile
index 4b4bec890ef5..7005b48a33ed 100644
--- a/drivers/coresight/Makefile
+++ b/drivers/coresight/Makefile
@@ -9,3 +9,4 @@  obj-$(CONFIG_CORESIGHT_SINK_ETBV10) += coresight-etb10.o
 obj-$(CONFIG_CORESIGHT_LINKS_AND_SINKS) += coresight-funnel.o \
 					   coresight-replicator.o
 obj-$(CONFIG_CORESIGHT_SOURCE_ETM3X) += coresight-etm3x.o coresight-etm-cp14.o
+obj-$(CONFIG_CORESIGHT_STM) += coresight-stm.o
diff --git a/drivers/coresight/coresight-stm.c b/drivers/coresight/coresight-stm.c
new file mode 100644
index 000000000000..61ab0c933eb5
--- /dev/null
+++ b/drivers/coresight/coresight-stm.c
@@ -0,0 +1,1070 @@ 
+/* Copyright (c) 2014-2015, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/types.h>
+#include <linux/device.h>
+#include <linux/io.h>
+#include <linux/err.h>
+#include <linux/fs.h>
+#include <linux/miscdevice.h>
+#include <linux/uaccess.h>
+#include <linux/slab.h>
+#include <linux/delay.h>
+#include <linux/clk.h>
+#include <linux/bitmap.h>
+#include <linux/of.h>
+#include <linux/of_address.h>
+#include <linux/coresight.h>
+#include <linux/coresight-stm.h>
+#include <linux/amba/bus.h>
+#include <asm/unaligned.h>
+
+#include "coresight-priv.h"
+
+#define STMDMASTARTR			0xc04
+#define STMDMASTOPR			0xc08
+#define STMDMASTATR			0xc0c
+#define STMDMACTLR			0xc10
+#define STMDMAIDR			0xcfc
+#define STMHEER				0xd00
+#define STMHETER			0xd20
+#define STMHEBSR			0xd60
+#define STMHEMCR			0xd64
+#define STMHEMASTR			0xdf4
+#define STMHEFEAT1R			0xdf8
+#define STMHEIDR			0xdfc
+#define STMSPER				0xe00
+#define STMSPTER			0xe20
+#define STMPRIVMASKR			0xe40
+#define STMSPSCR			0xe60
+#define STMSPMSCR			0xe64
+#define STMSPOVERRIDER			0xe68
+#define STMSPMOVERRIDER			0xe6c
+#define STMSPTRIGCSR			0xe70
+#define STMTCSR				0xe80
+#define STMTSSTIMR			0xe84
+#define STMTSFREQR			0xe8c
+#define STMSYNCR			0xe90
+#define STMAUXCR			0xe94
+#define STMSPFEAT1R			0xea0
+#define STMSPFEAT2R			0xea4
+#define STMSPFEAT3R			0xea8
+#define STMITTRIGGER			0xee8
+#define STMITATBDATA0			0xeec
+#define STMITATBCTR2			0xef0
+#define STMITATBID			0xef4
+#define STMITATBCTR0			0xef8
+
+#define STM_32_CHANNEL			32
+#define BYTES_PER_CHANNEL		256
+#define STM_TRACE_BUF_SIZE		4096
+
+/* Register bit definition */
+#define STMTCSR_BUSY_BIT		23
+/* First channel handed out by stm_channel_alloc(); 0 reserves none */
+#define STM_CHANNEL_OFFSET		0
+
+enum stm_pkt_type {
+	STM_PKT_TYPE_DATA	= 0x98,
+	STM_PKT_TYPE_FLAG	= 0xE8,
+	STM_PKT_TYPE_TRIG	= 0xF8,
+};
+
+enum {
+	STM_OPTION_MARKED	= 0x10,
+};
+
+#define stm_channel_addr(drvdata, ch)	(drvdata->chs.base +	\
+					(ch * BYTES_PER_CHANNEL))
+#define stm_channel_off(type, opts)	(type & ~opts)
+
+#ifndef CONFIG_64BIT
+static inline void __raw_writeq(u64 val, volatile void __iomem *addr)
+{
+	asm volatile("strd %1, %0"
+		     : "+Qo" (*(volatile u64 __force *)addr)
+		     : "r" (val));
+}
+#undef writeq_relaxed
+#define writeq_relaxed(v, c)	__raw_writeq((__force u64) cpu_to_le64(v), c)
+#endif
+
+static int boot_nr_channel;
+
+module_param_named(
+	boot_nr_channel, boot_nr_channel, int, S_IRUGO
+);
+
+/**
+ * struct channel_space - central management entity for extended ports
+ * @base:		memory mapped base address where channels start.
+ * @bitmap:		tally of which channel is being used.
+ */
+struct channel_space {
+	void __iomem		*base;
+	unsigned long		*bitmap;
+};
+
+/**
+ * struct stm_node - aggregation of channel information for userspace access
+ * @channel_id:		the channel number associated to this file descriptor.
+ * @options:		options for this channel - none, timestamped,
+ *			guaranteed.
+ * @drvdata:		STM driver specifics.
+ */
+struct stm_node {
+	int			channel_id;
+	u32			options;
+	struct stm_drvdata	*drvdata;
+};
+
+/**
+ * struct stm_drvdata - specifics associated to an STM component
+ * @base:		memory mapped base address for this component.
+ * @dev:		the device entity associated to this component.
+ * @csdev:		component vitals needed by the framework.
+ * @miscdev:		specifics to handle "/dev/xyz.stm" entry.
+ * @clk:		the clock this component is associated to.
+ * @spinlock:		only one at a time pls.
+ * @chs:		the channels associated to this STM.
+ * @enable:		this STM is being used.
+ * @entities:		set of entities allowed to access the STM ports.
+ * @traceid:		value of the current ID for this component.
+ * @write_64bit:	whether this STM supports 64 bit access.
+ * @stmsper:		settings for register STMSPER.
+ * @stmspscr:		settings for register STMSPSCR.
+ * @numsp:		the total number of stimulus ports supported by this STM.
+ * @stmheer:		settings for register STMHEER.
+ * @stmheter:		settings for register STMHETER.
+ * @stmhebsr:		settings for register STMHEBSR.
+ */
+struct stm_drvdata {
+	void __iomem		*base;
+	struct device		*dev;
+	struct coresight_device	*csdev;
+	struct miscdevice	miscdev;
+	struct clk		*clk;
+	spinlock_t		spinlock;
+	struct channel_space	chs;
+	bool			enable;
+	DECLARE_BITMAP(entities, STM_ENTITY_MAX);
+	u8			traceid;
+	u32			write_64bit;
+	u32			stmsper;
+	u32			stmspscr;
+	u32			numsp;
+	u32			stmheer;
+	u32			stmheter;
+	u32			stmhebsr;
+};
+
+static struct stm_drvdata *stmdrvdata;
+
+static void stm_hwevent_enable_hw(struct stm_drvdata *drvdata)
+{
+	CS_UNLOCK(drvdata->base);
+
+	writel_relaxed(drvdata->stmhebsr, drvdata->base + STMHEBSR);
+	writel_relaxed(drvdata->stmheter, drvdata->base + STMHETER);
+	writel_relaxed(drvdata->stmheer, drvdata->base + STMHEER);
+	writel_relaxed(0x01 |	/* Enable HW event tracing */
+		       0x04,	/* Error detection on event tracing */
+		       drvdata->base + STMHEMCR);
+
+	CS_LOCK(drvdata->base);
+}
+
+static void stm_port_enable_hw(struct stm_drvdata *drvdata)
+{
+	CS_UNLOCK(drvdata->base);
+	/* ATB trigger enable on direct writes to TRIG locations */
+	writel_relaxed(0x10,
+		       drvdata->base + STMSPTRIGCSR);
+	writel_relaxed(drvdata->stmspscr, drvdata->base + STMSPSCR);
+	writel_relaxed(drvdata->stmsper, drvdata->base + STMSPER);
+
+	CS_LOCK(drvdata->base);
+}
+
+static void stm_enable_hw(struct stm_drvdata *drvdata)
+{
+	if (drvdata->stmheer)
+		stm_hwevent_enable_hw(drvdata);
+
+	stm_port_enable_hw(drvdata);
+
+	CS_UNLOCK(drvdata->base);
+
+	/* 4096 bytes between synchronisation packets */
+	writel_relaxed(0xFFF, drvdata->base + STMSYNCR);
+	writel_relaxed((drvdata->traceid << 16 | /* trace id */
+			0x02 |			 /* timestamp enable */
+			0x01),			 /* global STM enable */
+			drvdata->base + STMTCSR);
+
+	CS_LOCK(drvdata->base);
+}
+
+static int stm_enable(struct coresight_device *csdev)
+{
+	struct stm_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
+	int ret;
+
+	ret = clk_prepare_enable(drvdata->clk);
+	if (ret)
+		return ret;
+
+	spin_lock(&drvdata->spinlock);
+	stm_enable_hw(drvdata);
+	drvdata->enable = true;
+	spin_unlock(&drvdata->spinlock);
+
+	dev_info(drvdata->dev, "STM tracing enabled\n");
+	return 0;
+}
+
+static void stm_hwevent_disable_hw(struct stm_drvdata *drvdata)
+{
+	CS_UNLOCK(drvdata->base);
+
+	writel_relaxed(0x0, drvdata->base + STMHEMCR);
+	writel_relaxed(0x0, drvdata->base + STMHEER);
+	writel_relaxed(0x0, drvdata->base + STMHETER);
+
+	CS_LOCK(drvdata->base);
+}
+
+static void stm_port_disable_hw(struct stm_drvdata *drvdata)
+{
+	CS_UNLOCK(drvdata->base);
+
+	writel_relaxed(0x0, drvdata->base + STMSPER);
+	writel_relaxed(0x0, drvdata->base + STMSPTRIGCSR);
+
+	CS_LOCK(drvdata->base);
+}
+
+static void stm_disable_hw(struct stm_drvdata *drvdata)
+{
+	u32 val;
+
+	CS_UNLOCK(drvdata->base);
+
+	val = readl_relaxed(drvdata->base + STMTCSR);
+	val &= ~0x1; /* clear global STM enable [0] */
+	writel_relaxed(val, drvdata->base + STMTCSR);
+
+	CS_LOCK(drvdata->base);
+
+	stm_port_disable_hw(drvdata);
+	if (drvdata->stmheer)
+		stm_hwevent_disable_hw(drvdata);
+}
+
+static void stm_disable(struct coresight_device *csdev)
+{
+	struct stm_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
+
+	spin_lock(&drvdata->spinlock);
+	stm_disable_hw(drvdata);
+	drvdata->enable = false;
+	spin_unlock(&drvdata->spinlock);
+
+	/* Wait until the engine has completely stopped */
+	coresight_timeout(drvdata->base, STMTCSR, STMTCSR_BUSY_BIT, 0);
+
+	clk_disable_unprepare(drvdata->clk);
+
+	dev_info(drvdata->dev, "STM tracing disabled\n");
+}
+
+static int stm_trace_id(struct coresight_device *csdev)
+{
+	struct stm_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
+
+	return drvdata->traceid;
+}
+
+static const struct coresight_ops_source stm_source_ops = {
+	.trace_id	= stm_trace_id,
+	.enable		= stm_enable,
+	.disable	= stm_disable,
+};
+
+static const struct coresight_ops stm_cs_ops = {
+	.source_ops	= &stm_source_ops,
+};
+
+static int stm_channel_alloc(u32 off)
+{
+	struct stm_drvdata *drvdata = stmdrvdata;
+	int ch = -1;
+
+	do {
+		ch = find_next_zero_bit(drvdata->chs.bitmap,
+					drvdata->numsp, off);
+	} while ((ch < drvdata->numsp) &&
+		 test_and_set_bit(ch, drvdata->chs.bitmap));
+
+	/* find_next_zero_bit() returns numsp when no channel is free */
+	return (ch < drvdata->numsp) ? ch : -1;
+}
+
+static void stm_channel_free(u32 ch)
+{
+	struct stm_drvdata *drvdata = stmdrvdata;
+
+	clear_bit(ch, drvdata->chs.bitmap);
+}
+
+static int stm_send_64bit(void *addr, const void *data, u32 size)
+{
+	u64 prepad = 0;
+	u64 postpad = 0;
+	char *pad;
+	u8 off, endoff;
+	u32 len = size;
+
+	off = (unsigned long)data & 0x7;
+
+	if (off) {
+		endoff = 8 - off;
+		pad = (char *)&prepad;
+		pad += off;
+
+		while (endoff && size) {
+			*pad++ = *(char *)data++;
+			endoff--;
+			size--;
+		}
+		writeq_relaxed(prepad, addr);
+	}
+
+	/* now we are 64bit aligned */
+	while (size >= 8) {
+		writeq_relaxed(*(u64 *)data, addr);
+		data += 8;
+		size -= 8;
+	}
+
+	endoff = 0;
+
+	if (size) {
+		endoff = 8 - (u8)size;
+		pad = (char *)&postpad;
+
+		while (size) {
+			*pad++ = *(char *)data++;
+			size--;
+		}
+		writeq_relaxed(postpad, addr);
+	}
+
+	return len + off + endoff;
+}
+
+static int stm_trace_data_64bit(unsigned long ch_addr, u32 options,
+				const void *data, u32 size)
+{
+	void *addr;
+
+	options &= ~STM_OPTION_TIMESTAMPED;
+	addr = (void *)(ch_addr | stm_channel_off(STM_PKT_TYPE_DATA, options));
+
+	return stm_send_64bit(addr, data, size);
+}
+
+static int stm_send(void *addr, const void *data, u32 size)
+{
+	u32 len = size;
+
+	if (((unsigned long)data & 0x1) && (size >= 1)) {
+		writeb_relaxed(*(u8 *)data, addr);
+		data++;
+		size--;
+	}
+	if (((unsigned long)data & 0x2) && (size >= 2)) {
+		writew_relaxed(*(u16 *)data, addr);
+		data += 2;
+		size -= 2;
+	}
+
+	/* now we are 32bit aligned */
+	while (size >= 4) {
+		writel_relaxed(*(u32 *)data, addr);
+		data += 4;
+		size -= 4;
+	}
+
+	if (size >= 2) {
+		writew_relaxed(*(u16 *)data, addr);
+		data += 2;
+		size -= 2;
+	}
+	if (size >= 1) {
+		writeb_relaxed(*(u8 *)data, addr);
+		data++;
+		size--;
+	}
+
+	return len;
+}
+
+static int stm_trace_data(unsigned long ch_addr, u32 options,
+			  const void *data, u32 size)
+{
+	void *addr;
+
+	options &= ~STM_OPTION_TIMESTAMPED;
+	addr = (void *)(ch_addr | stm_channel_off(STM_PKT_TYPE_DATA, options));
+
+	return stm_send(addr, data, size);
+}
+
+static inline int stm_trace_hw(u32 options, u32 channel, u8 entity_id,
+			       const void *data, u32 size)
+{
+	int len = 0;
+	unsigned long ch_addr;
+	struct stm_drvdata *drvdata = stmdrvdata;
+
+
+	/* get the channel address */
+	ch_addr = (unsigned long)stm_channel_addr(drvdata, channel);
+
+	if (drvdata->write_64bit)
+		len = stm_trace_data_64bit(ch_addr, options, data, size);
+	else
+		/* send the payload data */
+		len = stm_trace_data(ch_addr, options, data, size);
+
+	return len;
+}
+
+/**
+ * stm_trace - trace the binary or string data through STM
+ * @options: tracing options - guaranteed, timestamped, etc
+ * @channel_id: channel to send the data on; negative values are rejected
+ * @entity_id: entity representing the trace data
+ * @data: pointer to binary or string data buffer
+ * @size: size of data to send
+ *
+ * Returns: number of bytes transferred over STM
+ */
+int stm_trace(u32 options, int channel_id,
+	      u8 entity_id, const void *data, u32 size)
+{
+	struct stm_drvdata *drvdata = stmdrvdata;
+
+	if (channel_id < 0)
+		return 0;
+
+	if (!(drvdata && drvdata->enable &&
+	      test_bit(entity_id, drvdata->entities)))
+		return 0;
+
+	return stm_trace_hw(options, (u32)channel_id,
+			    entity_id, data, size);
+}
+EXPORT_SYMBOL(stm_trace);
+
+static int stm_open(struct inode *inode, struct file *file)
+{
+	struct stm_node *node;
+	struct stm_drvdata *drvdata = container_of(file->private_data,
+						   struct stm_drvdata, miscdev);
+
+	node = kmalloc(sizeof(struct stm_node), GFP_KERNEL);
+	if (!node)
+		return -ENOMEM;
+
+	node->drvdata = drvdata;
+	node->options = STM_OPTION_TIMESTAMPED;
+	node->channel_id = stm_channel_alloc(STM_CHANNEL_OFFSET);
+	if (node->channel_id < 0) {
+		kfree(node);
+		return -ENOMEM;
+	}
+
+	file->private_data = node;
+	return 0;
+}
+
+static int stm_release(struct inode *inode, struct file *file)
+{
+	struct stm_node *node = file->private_data;
+
+	/* we are done, free the channel */
+	if (node->channel_id >= 0)
+		stm_channel_free((u32)node->channel_id);
+	file->private_data = NULL;
+	kfree(node);
+	return 0;
+}
+
+static ssize_t stm_read(struct file *file, char __user *data,
+			size_t size, loff_t *ppos)
+{
+	char buf[20];
+	struct stm_node *node = file->private_data;
+
+	snprintf(buf, sizeof(buf), "%d", node->channel_id);
+	return simple_read_from_buffer(data, size, ppos,
+				       buf, strlen(buf));
+}
+
+static ssize_t stm_write(struct file *file, const char __user *data,
+			 size_t size, loff_t *ppos)
+{
+	char *buf;
+	struct stm_node *node = file->private_data;
+	struct stm_drvdata *drvdata = node->drvdata;
+
+	if (node->channel_id < 0)
+		return -EINVAL;
+
+	if (!drvdata->enable || !size)
+		return -EINVAL;
+
+	if (size > STM_TRACE_BUF_SIZE)
+		size = STM_TRACE_BUF_SIZE;
+
+	buf = kmalloc(size, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	if (copy_from_user(buf, data, size)) {
+		kfree(buf);
+		dev_dbg(drvdata->dev, "%s: copy_from_user failed\n", __func__);
+		return -EFAULT;
+	}
+
+	if (!test_bit(STM_ENTITY_TRACE_USPACE, drvdata->entities)) {
+		kfree(buf);
+		return size;
+	}
+
+	stm_trace_hw(node->options, (u32)node->channel_id,
+		     STM_ENTITY_TRACE_USPACE, buf, size);
+
+	kfree(buf);
+
+	return size;
+}
+
+static long stm_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+	u32 options;
+	struct stm_node *node = file->private_data;
+
+	switch (cmd) {
+	case STM_IOCTL_SET_OPTIONS:
+		if (copy_from_user(&options, (void __user *)arg, sizeof(u32)))
+			return -EFAULT;
+
+		options &= (STM_OPTION_TIMESTAMPED | STM_OPTION_GUARANTEED);
+		node->options = options;
+		break;
+	case STM_IOCTL_GET_OPTIONS:
+		options = node->options;
+		if (copy_to_user((void __user *)arg, &options, sizeof(options)))
+			return -EFAULT;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static const struct file_operations stm_fops = {
+	.owner		= THIS_MODULE,
+	.open		= stm_open,
+	.write		= stm_write,
+	.read		= stm_read,
+	.llseek		= no_llseek,
+	.unlocked_ioctl	= stm_ioctl,
+	.release	= stm_release,
+};
+
+static ssize_t hwevent_enable_show(struct device *dev,
+				   struct device_attribute *attr, char *buf)
+{
+	struct stm_drvdata *drvdata = dev_get_drvdata(dev->parent);
+	unsigned long val = drvdata->stmheer;
+
+	return scnprintf(buf, PAGE_SIZE, "%#lx\n", val);
+}
+
+static ssize_t hwevent_enable_store(struct device *dev,
+				    struct device_attribute *attr,
+				    const char *buf, size_t size)
+{
+	struct stm_drvdata *drvdata = dev_get_drvdata(dev->parent);
+	unsigned long val;
+	int ret = 0;
+
+	ret = kstrtoul(buf, 16, &val);
+	if (ret)
+		return -EINVAL;
+
+	drvdata->stmheer = val;
+	/* HW event enable and trigger go hand in hand */
+	drvdata->stmheter = val;
+
+	return size;
+}
+static DEVICE_ATTR_RW(hwevent_enable);
+
+static ssize_t hwevent_select_show(struct device *dev,
+				   struct device_attribute *attr, char *buf)
+{
+	struct stm_drvdata *drvdata = dev_get_drvdata(dev->parent);
+	unsigned long val = drvdata->stmhebsr;
+
+	return scnprintf(buf, PAGE_SIZE, "%#lx\n", val);
+}
+
+static ssize_t hwevent_select_store(struct device *dev,
+				    struct device_attribute *attr,
+				    const char *buf, size_t size)
+{
+	struct stm_drvdata *drvdata = dev_get_drvdata(dev->parent);
+	unsigned long val;
+	int ret = 0;
+
+	ret = kstrtoul(buf, 16, &val);
+	if (ret)
+		return -EINVAL;
+
+	drvdata->stmhebsr = val;
+
+	return size;
+}
+static DEVICE_ATTR_RW(hwevent_select);
+
+static ssize_t port_select_show(struct device *dev,
+				struct device_attribute *attr, char *buf)
+{
+	struct stm_drvdata *drvdata = dev_get_drvdata(dev->parent);
+	unsigned long val;
+
+	if (!drvdata->enable) {
+		val = drvdata->stmspscr;
+	} else {
+		spin_lock(&drvdata->spinlock);
+		val = readl_relaxed(drvdata->base + STMSPSCR);
+		spin_unlock(&drvdata->spinlock);
+	}
+
+	return scnprintf(buf, PAGE_SIZE, "%#lx\n", val);
+}
+
+static ssize_t port_select_store(struct device *dev,
+				 struct device_attribute *attr,
+				 const char *buf, size_t size)
+{
+	struct stm_drvdata *drvdata = dev_get_drvdata(dev->parent);
+	unsigned long val, stmsper;
+	int ret = 0;
+
+	ret = kstrtoul(buf, 16, &val);
+	if (ret)
+		return ret;
+
+	spin_lock(&drvdata->spinlock);
+	drvdata->stmspscr = val;
+
+	if (drvdata->enable) {
+		CS_UNLOCK(drvdata->base);
+		/* Process as per ARM's TRM recommendation */
+		stmsper = readl_relaxed(drvdata->base + STMSPER);
+		writel_relaxed(0x0, drvdata->base + STMSPER);
+		writel_relaxed(drvdata->stmspscr, drvdata->base + STMSPSCR);
+		writel_relaxed(stmsper, drvdata->base + STMSPER);
+		CS_LOCK(drvdata->base);
+	}
+	spin_unlock(&drvdata->spinlock);
+
+	return size;
+}
+static DEVICE_ATTR_RW(port_select);
+
+static ssize_t port_enable_show(struct device *dev,
+				struct device_attribute *attr, char *buf)
+{
+	struct stm_drvdata *drvdata = dev_get_drvdata(dev->parent);
+	unsigned long val;
+
+	if (!drvdata->enable) {
+		val = drvdata->stmsper;
+	} else {
+		spin_lock(&drvdata->spinlock);
+		val = readl_relaxed(drvdata->base + STMSPER);
+		spin_unlock(&drvdata->spinlock);
+	}
+
+	return scnprintf(buf, PAGE_SIZE, "%#lx\n", val);
+}
+
+static ssize_t port_enable_store(struct device *dev,
+				 struct device_attribute *attr,
+				 const char *buf, size_t size)
+{
+	struct stm_drvdata *drvdata = dev_get_drvdata(dev->parent);
+	unsigned long val;
+	int ret = 0;
+
+	ret = kstrtoul(buf, 16, &val);
+	if (ret)
+		return ret;
+
+	spin_lock(&drvdata->spinlock);
+	drvdata->stmsper = val;
+
+	if (drvdata->enable) {
+		CS_UNLOCK(drvdata->base);
+		writel_relaxed(drvdata->stmsper, drvdata->base + STMSPER);
+		CS_LOCK(drvdata->base);
+	}
+	spin_unlock(&drvdata->spinlock);
+
+	return size;
+}
+static DEVICE_ATTR_RW(port_enable);
+
+static ssize_t entities_show(struct device *dev,
+			     struct device_attribute *attr, char *buf)
+{
+	struct stm_drvdata *drvdata = dev_get_drvdata(dev->parent);
+	ssize_t len;
+
+	len = scnprintf(buf, PAGE_SIZE, "%*pb",
+			STM_ENTITY_MAX, drvdata->entities);
+
+	if (PAGE_SIZE - len < 2)
+		len = -EINVAL;
+	else
+		len += scnprintf(buf + len, 2, "\n");
+
+	return len;
+}
+
+static ssize_t entities_store(struct device *dev,
+			      struct device_attribute *attr,
+			      const char *buf, size_t size)
+{
+	struct stm_drvdata *drvdata = dev_get_drvdata(dev->parent);
+	unsigned long val1, val2;
+
+	if (sscanf(buf, "%lx %lx", &val1, &val2) != 2)
+		return -EINVAL;
+
+	if (val1 >= STM_ENTITY_MAX)
+		return -EINVAL;
+
+	if (val2)
+		__set_bit(val1, drvdata->entities);
+	else
+		__clear_bit(val1, drvdata->entities);
+
+	return size;
+}
+static DEVICE_ATTR_RW(entities);
+
+static ssize_t status_show(struct device *dev,
+			   struct device_attribute *attr, char *buf)
+{
+	int ret;
+	unsigned long flags;
+	struct stm_drvdata *drvdata = dev_get_drvdata(dev->parent);
+
+	ret = clk_prepare_enable(drvdata->clk);
+	if (ret)
+		return ret;
+
+	spin_lock_irqsave(&drvdata->spinlock, flags);
+
+	CS_UNLOCK(drvdata->base);
+	ret = sprintf(buf,
+		      "STMTCSR:\t0x%08x\n"
+		      "STMTSFREQR:\t0x%08x\n"
+		      "STMSYNCR:\t0x%08x\n"
+		      "STMSPER:\t0x%08x\n"
+		      "STMSPTER:\t0x%08x\n"
+		      "STMPRIVMASKR:\t0x%08x\n"
+		      "STMSPSCR:\t0x%08x\n"
+		      "STMSPMSCR:\t0x%08x\n"
+		      "STMSPFEAT1R:\t0x%08x\n"
+		      "STMSPFEAT2R:\t0x%08x\n"
+		      "STMSPFEAT3R:\t0x%08x\n"
+		      "STMDEVID:\t0x%08x\n",
+		      readl_relaxed(drvdata->base + STMTCSR),
+		      readl_relaxed(drvdata->base + STMTSFREQR),
+		      readl_relaxed(drvdata->base + STMSYNCR),
+		      readl_relaxed(drvdata->base + STMSPER),
+		      readl_relaxed(drvdata->base + STMSPTER),
+		      readl_relaxed(drvdata->base + STMPRIVMASKR),
+		      readl_relaxed(drvdata->base + STMSPSCR),
+		      readl_relaxed(drvdata->base + STMSPMSCR),
+		      readl_relaxed(drvdata->base + STMSPFEAT1R),
+		      readl_relaxed(drvdata->base + STMSPFEAT2R),
+		      readl_relaxed(drvdata->base + STMSPFEAT3R),
+		      readl_relaxed(drvdata->base + CORESIGHT_DEVID));
+
+	CS_LOCK(drvdata->base);
+	spin_unlock_irqrestore(&drvdata->spinlock, flags);
+	clk_disable_unprepare(drvdata->clk);
+
+	return ret;
+}
+static DEVICE_ATTR_RO(status);
+
+static ssize_t traceid_show(struct device *dev,
+			    struct device_attribute *attr, char *buf)
+{
+	unsigned long val;
+	struct stm_drvdata *drvdata = dev_get_drvdata(dev->parent);
+
+	val = drvdata->traceid;
+	return sprintf(buf, "%#lx\n", val);
+}
+
+static ssize_t traceid_store(struct device *dev,
+			     struct device_attribute *attr,
+			     const char *buf, size_t size)
+{
+	int ret;
+	unsigned long val;
+	struct stm_drvdata *drvdata = dev_get_drvdata(dev->parent);
+
+	ret = kstrtoul(buf, 16, &val);
+	if (ret)
+		return ret;
+
+	/* traceid field is 7bit wide on STM32 */
+	drvdata->traceid = val & 0x7f;
+	return size;
+}
+static DEVICE_ATTR_RW(traceid);
+
+static struct attribute *coresight_stm_attrs[] = {
+	&dev_attr_hwevent_enable.attr,
+	&dev_attr_hwevent_select.attr,
+	&dev_attr_port_enable.attr,
+	&dev_attr_port_select.attr,
+	&dev_attr_entities.attr,
+	&dev_attr_status.attr,
+	&dev_attr_traceid.attr,
+	NULL,
+};
+ATTRIBUTE_GROUPS(coresight_stm);
+
+static int stm_get_resource_byname(struct device_node *np,
+				   char *ch_base, struct resource *res)
+{
+	const char *name = NULL;
+	int index = 0, found = 0;
+
+	while (!of_property_read_string_index(np, "reg-names", index, &name)) {
+		if (strcmp(ch_base, name)) {
+			index++;
+			continue;
+		}
+
+		/* We have a match and @index is where it's at */
+		found = 1;
+		break;
+	}
+
+	if (!found)
+		return -EINVAL;
+
+	return of_address_to_resource(np, index, res);
+}
+
+static u32 stm_fundamental_data_size(struct stm_drvdata *drvdata)
+{
+	u32 stmspfeat2r;
+
+	stmspfeat2r = readl_relaxed(drvdata->base + STMSPFEAT2R);
+	return BMVAL(stmspfeat2r, 12, 15);
+}
+
+static u32 stm_num_stimulus_port(struct stm_drvdata *drvdata)
+{
+	u32 numsp;
+
+	numsp = readl_relaxed(drvdata->base + CORESIGHT_DEVID);
+	/*
+	 * NUMSP in STMDEVID is 17 bits long and if equal to 0x0,
+	 * 32 stimulus ports are supported.
+	 */
+	numsp &= 0x1ffff;
+	if (!numsp)
+		numsp = STM_32_CHANNEL;
+	return numsp;
+}
+
+static void stm_init_default_data(struct stm_drvdata *drvdata)
+{
+	/* Don't use port selection */
+	drvdata->stmspscr = 0x0;
+	/*
+	 * Enable all channels regardless of their number.  When port
+	 * selection isn't used (see above) STMSPER applies to all
+	 * 32-channel groups available, hence setting all 32 bits to 1.
+	 */
+	drvdata->stmsper = ~0x0;
+
+	/*
+	 * Select arbitrary value to start with.  If there is a conflict
+	 * with other tracers the framework will deal with it.
+	 */
+	drvdata->traceid = 0x20;
+
+	bitmap_zero(drvdata->entities, STM_ENTITY_MAX);
+}
+
+static int stm_probe(struct amba_device *adev, const struct amba_id *id)
+{
+	int ret;
+	void __iomem *base;
+	unsigned long *bitmap;
+	struct device *dev = &adev->dev;
+	struct coresight_platform_data *pdata = NULL;
+	struct stm_drvdata *drvdata;
+	struct resource *res = &adev->res;
+	struct resource ch_res;
+	size_t res_size, bitmap_size;
+	struct coresight_desc *desc;
+	struct device_node *np = adev->dev.of_node;
+
+	if (np) {
+		pdata = of_get_coresight_platform_data(dev, np);
+		if (IS_ERR(pdata))
+			return PTR_ERR(pdata);
+		adev->dev.platform_data = pdata;
+	}
+	drvdata = devm_kzalloc(dev, sizeof(*drvdata), GFP_KERNEL);
+	if (!drvdata)
+		return -ENOMEM;
+
+	/* Store the driver data pointer for use in exported functions */
+	stmdrvdata = drvdata;
+	drvdata->dev = &adev->dev;
+	dev_set_drvdata(dev, drvdata);
+
+	base = devm_ioremap_resource(dev, res);
+	if (IS_ERR(base))
+		return PTR_ERR(base);
+	drvdata->base = base;
+
+	ret = stm_get_resource_byname(np, "stm-stimulus-base", &ch_res);
+	if (ret)
+		return ret;
+
+	base = devm_ioremap_resource(dev, &ch_res);
+	if (IS_ERR(base))
+		return PTR_ERR(base);
+	drvdata->chs.base = base;
+
+	/* The clock must be known before it can be enabled */
+	drvdata->clk = adev->pclk;
+	ret = clk_prepare_enable(drvdata->clk);
+	if (ret)
+		return ret;
+
+	drvdata->write_64bit = stm_fundamental_data_size(drvdata);
+
+	if (boot_nr_channel) {
+		drvdata->numsp = boot_nr_channel;
+		res_size = min((resource_size_t)(boot_nr_channel *
+				  BYTES_PER_CHANNEL), resource_size(res));
+		bitmap_size = BITS_TO_LONGS(boot_nr_channel) * sizeof(long);
+	} else {
+		drvdata->numsp = stm_num_stimulus_port(drvdata);
+		res_size = min((resource_size_t)(drvdata->numsp *
+				 BYTES_PER_CHANNEL), resource_size(res));
+		bitmap_size = BITS_TO_LONGS(drvdata->numsp) * sizeof(long);
+	}
+
+	clk_disable_unprepare(drvdata->clk);
+
+	bitmap = devm_kzalloc(dev, bitmap_size, GFP_KERNEL);
+	if (!bitmap)
+		return -ENOMEM;
+	drvdata->chs.bitmap = bitmap;
+
+	spin_lock_init(&drvdata->spinlock);
+
+	drvdata->clk = adev->pclk;
+
+	stm_init_default_data(drvdata);
+
+	desc = devm_kzalloc(dev, sizeof(*desc), GFP_KERNEL);
+	if (!desc)
+		return -ENOMEM;
+
+	desc->type = CORESIGHT_DEV_TYPE_SOURCE;
+	desc->subtype.source_subtype = CORESIGHT_DEV_SUBTYPE_SOURCE_SOFTWARE;
+	desc->ops = &stm_cs_ops;
+	desc->pdata = pdata;
+	desc->dev = dev;
+	desc->groups = coresight_stm_groups;
+	drvdata->csdev = coresight_register(desc);
+	if (IS_ERR(drvdata->csdev))
+		return PTR_ERR(drvdata->csdev);
+
+	drvdata->miscdev.name = pdata->name;
+	drvdata->miscdev.minor = MISC_DYNAMIC_MINOR;
+	drvdata->miscdev.fops = &stm_fops;
+	ret = misc_register(&drvdata->miscdev);
+	if (ret)
+		goto err;
+
+	dev_info(drvdata->dev, "STM initialized\n");
+
+	return 0;
+err:
+	coresight_unregister(drvdata->csdev);
+	return ret;
+}
+
+static int stm_remove(struct amba_device *adev)
+{
+	struct stm_drvdata *drvdata = amba_get_drvdata(adev);
+
+	misc_deregister(&drvdata->miscdev);
+	coresight_unregister(drvdata->csdev);
+	return 0;
+}
+
+static struct amba_id stm_ids[] = {
+	{
+		.id     = 0x0003b962,
+		.mask   = 0x0003ffff,
+	},
+	{ 0, 0},
+};
+
+static struct amba_driver stm_driver = {
+	.drv = {
+		.name   = "coresight-stm",
+		.owner	= THIS_MODULE,
+	},
+	.probe          = stm_probe,
+	.remove         = stm_remove,
+	.id_table	= stm_ids,
+};
+
+module_amba_driver(stm_driver);
+
+MODULE_LICENSE("GPL v2");
+MODULE_DESCRIPTION("CoreSight System Trace Macrocell driver");
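
For illustration only, and not part of the patch: stm_send() above splits a buffer into byte, half-word and word stores so that each store is naturally aligned. The user-space sketch below reproduces only that splitting logic. `plan_writes()` is a hypothetical helper that records the store widths instead of touching hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Record the width, in bytes, of each store that the splitting logic
 * would issue for a buffer at address @addr holding @size bytes.
 * Returns the number of stores; at most @max widths are written to @out.
 */
static size_t plan_writes(uintptr_t addr, size_t size,
			  uint8_t *out, size_t max)
{
	size_t n = 0;

	assert(out != NULL || max == 0);

	if ((addr & 0x1) && size >= 1) {	/* leading byte */
		if (n < max)
			out[n] = 1;
		n++;
		addr += 1;
		size -= 1;
	}
	if ((addr & 0x2) && size >= 2) {	/* leading half-word */
		if (n < max)
			out[n] = 2;
		n++;
		size -= 2;
	}
	while (size >= 4) {			/* aligned 32-bit words */
		if (n < max)
			out[n] = 4;
		n++;
		size -= 4;
	}
	if (size >= 2) {			/* trailing half-word */
		if (n < max)
			out[n] = 2;
		n++;
		size -= 2;
	}
	if (size >= 1) {			/* trailing byte */
		if (n < max)
			out[n] = 1;
		n++;
	}
	return n;
}
```

For a 10-byte buffer starting at an odd address, the plan is one byte, one half-word, one word, one half-word and one byte, so the payload reaches the stimulus port as five separate stores.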
diff --git a/include/linux/coresight-stm.h b/include/linux/coresight-stm.h
new file mode 100644
index 000000000000..fc791562ad7c
--- /dev/null
+++ b/include/linux/coresight-stm.h
@@ -0,0 +1,40 @@ 
+#ifndef __LINUX_CORESIGHT_STM_H_
+#define __LINUX_CORESIGHT_STM_H_
+
+#include <uapi/linux/coresight-stm.h>
+
+/* kernel uses ch_id 0 until a better (more flexible) way is found */
+#define CH_ID_KERNEL	0
+
+#define stm_log_inv(entity_id, ch_id, data, size)		\
+	stm_trace(STM_OPTION_NONE, CH_ID_KERNEL,		\
+	STM_ENTITY_TRACE_KERNEL, data, size)
+
+#define stm_log_inv_ts(entity_id, ch_id, data, size)		\
+	stm_trace(STM_OPTION_TIMESTAMPED, CH_ID_KERNEL,		\
+	STM_ENTITY_TRACE_KERNEL, data, size)
+
+#define stm_log_gtd(entity_id, ch_id, data, size)		\
+	stm_trace(STM_OPTION_GUARANTEED, CH_ID_KERNEL,		\
+	STM_ENTITY_TRACE_KERNEL, data, size)
+
+#define stm_log_gtd_ts(entity_id, ch_id, data, size)		\
+	stm_trace(STM_OPTION_GUARANTEED |			\
+		  STM_OPTION_TIMESTAMPED,			\
+		  CH_ID_KERNEL, STM_ENTITY_TRACE_KERNEL, data, size)
+
+#define stm_log(entity_id, ch_id, data, size)			\
+	stm_log_inv_ts(entity_id, ch_id, data, size)
+
+#ifdef CONFIG_CORESIGHT_STM
+extern int stm_trace(u32 options, int channel_id,
+		     u8 entity_id, const void *data, u32 size);
+#else
+static inline int stm_trace(u32 options, int channel_id,
+			    u8 entity_id, const void *data, u32 size)
+{
+	return 0;
+}
+#endif
+
+#endif
diff --git a/include/uapi/linux/coresight-stm.h b/include/uapi/linux/coresight-stm.h
new file mode 100644
index 000000000000..208a4d79c4ee
--- /dev/null
+++ b/include/uapi/linux/coresight-stm.h
@@ -0,0 +1,23 @@ 
+#ifndef __UAPI_CORESIGHT_STM_H_
+#define __UAPI_CORESIGHT_STM_H_
+
+enum {
+	STM_ENTITY_NONE			= 0x00,
+	STM_ENTITY_TRACE_KERNEL		= 0x01,
+	STM_ENTITY_TRACE_USPACE		= 0x10,
+	STM_ENTITY_MAX			= 0xFF,
+};
+
+enum {
+	STM_IOCTL_NONE			= 0x00,
+	STM_IOCTL_SET_OPTIONS		= 0x01,
+	STM_IOCTL_GET_OPTIONS		= 0x10,
+};
+
+enum {
+	STM_OPTION_NONE			= 0x0,
+	STM_OPTION_TIMESTAMPED		= 0x08,
+	STM_OPTION_GUARANTEED		= 0x80,
+};
+
+#endif
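
For illustration only, and not part of the patch: the option bits above feed into the driver's address computation. stm_channel_addr() places channels 256 bytes apart, and stm_channel_off() clears the option bits from the packet type before the result is ORed into the channel address. The driver masks out STM_OPTION_TIMESTAMPED first, so for data packets only STM_OPTION_GUARANTEED changes the offset. The sketch assumes a 256-byte aligned channel window; `data_packet_addr()` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the driver and uapi header in this patch. */
#define BYTES_PER_CHANNEL	256
#define STM_PKT_TYPE_DATA	0x98
#define STM_OPTION_TIMESTAMPED	0x08
#define STM_OPTION_GUARANTEED	0x80

/*
 * Mirror of the driver's computation for a data packet:
 * channel base, ORed with the packet type minus the option bits.
 */
static uintptr_t data_packet_addr(uintptr_t chs_base, uint32_t ch,
				  uint32_t options)
{
	uintptr_t ch_addr = chs_base + (uintptr_t)ch * BYTES_PER_CHANNEL;

	/* Assumption of this sketch: the window is 256-byte aligned. */
	assert((ch_addr & (BYTES_PER_CHANNEL - 1)) == 0);

	/* The driver drops the timestamp bit before computing the offset. */
	options &= ~(uint32_t)STM_OPTION_TIMESTAMPED;
	return ch_addr | (STM_PKT_TYPE_DATA & ~options);
}
```

With a made-up base of 0x1000, channel 2 with STM_OPTION_GUARANTEED resolves to 0x1218, while channel 1 with STM_OPTION_TIMESTAMPED resolves to 0x1198, the same offset as with no options at all.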