[RFC] drm/panfrost: Add initial panfrost driver

Message ID 20190308002408.32682-1-robh@kernel.org

Commit Message

Rob Herring March 8, 2019, 12:24 a.m. UTC
From: "Marty E. Plummer" <hanetzer@startmail.com>

This adds the initial driver for panfrost which supports Arm Mali
Midgard and Bifrost family of GPUs. Currently, only the T860 Midgard GPU
has been tested.

Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <maxime.ripard@bootlin.com>
Cc: Sean Paul <sean@poorly.run>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Eric Anholt <eric@anholt.net>
Signed-off-by: Marty E. Plummer <hanetzer@startmail.com>
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Signed-off-by: Rob Herring <robh@kernel.org>
---
Sending this out in the spirit of release early, release often. We're
close to parity with mesa + the vendor driver. There are a few
issues Tomeu is chasing.

There are still some pieces of the h/w setup we've just hardcoded. Locking
in various places is probably missing. Error recovery is non-existent
(other than module unload/load). There's some work to add tracepoints
and perf counters that's not here yet. Bifrost GPUs are definitely not 
supported yet other than identifying them. Primarily the MMU setup is 
missing.

How's performance? Great, because I haven't measured it.

This patch and its dependencies are available here[1]. The mesa support 
is here[2]. Both are likely to change (daily).

Rob

[1] https://gitlab.freedesktop.org/robh/linux-panfrost.git panfrost-rebase
[2] https://gitlab.freedesktop.org/tomeu/mesa.git mainline-driver

 drivers/gpu/drm/Kconfig                      |   2 +
 drivers/gpu/drm/Makefile                     |   1 +
 drivers/gpu/drm/panfrost/Kconfig             |  14 +
 drivers/gpu/drm/panfrost/Makefile            |  11 +
 drivers/gpu/drm/panfrost/panfrost_device.c   | 127 ++++
 drivers/gpu/drm/panfrost/panfrost_device.h   |  83 +++
 drivers/gpu/drm/panfrost/panfrost_drv.c      | 419 ++++++++++++
 drivers/gpu/drm/panfrost/panfrost_features.h | 308 +++++++++
 drivers/gpu/drm/panfrost/panfrost_gem.c      |  92 +++
 drivers/gpu/drm/panfrost/panfrost_gem.h      |  29 +
 drivers/gpu/drm/panfrost/panfrost_gpu.c      | 464 +++++++++++++
 drivers/gpu/drm/panfrost/panfrost_gpu.h      |  15 +
 drivers/gpu/drm/panfrost/panfrost_issues.h   | 175 +++++
 drivers/gpu/drm/panfrost/panfrost_job.c      | 662 +++++++++++++++++++
 drivers/gpu/drm/panfrost/panfrost_job.h      |  47 ++
 drivers/gpu/drm/panfrost/panfrost_mmu.c      | 409 ++++++++++++
 drivers/gpu/drm/panfrost/panfrost_mmu.h      |  15 +
 include/uapi/drm/panfrost_drm.h              | 138 ++++
 18 files changed, 3011 insertions(+)
 create mode 100644 drivers/gpu/drm/panfrost/Kconfig
 create mode 100644 drivers/gpu/drm/panfrost/Makefile
 create mode 100644 drivers/gpu/drm/panfrost/panfrost_device.c
 create mode 100644 drivers/gpu/drm/panfrost/panfrost_device.h
 create mode 100644 drivers/gpu/drm/panfrost/panfrost_drv.c
 create mode 100644 drivers/gpu/drm/panfrost/panfrost_features.h
 create mode 100644 drivers/gpu/drm/panfrost/panfrost_gem.c
 create mode 100644 drivers/gpu/drm/panfrost/panfrost_gem.h
 create mode 100644 drivers/gpu/drm/panfrost/panfrost_gpu.c
 create mode 100644 drivers/gpu/drm/panfrost/panfrost_gpu.h
 create mode 100644 drivers/gpu/drm/panfrost/panfrost_issues.h
 create mode 100644 drivers/gpu/drm/panfrost/panfrost_job.c
 create mode 100644 drivers/gpu/drm/panfrost/panfrost_job.h
 create mode 100644 drivers/gpu/drm/panfrost/panfrost_mmu.c
 create mode 100644 drivers/gpu/drm/panfrost/panfrost_mmu.h
 create mode 100644 include/uapi/drm/panfrost_drm.h

Comments

Dave Airlie March 8, 2019, 12:51 a.m. UTC | #1
+struct drm_panfrost_submit {
> +
> +       /** Address to GPU mapping of job descriptor */
> +       __u64 jc;
> +
> +       /** An optional sync object to wait on before starting this job. */
> +       __u32 in_sync;
> +
> +       /** An optional sync object to place the completion fence in. */
> +       __u32 out_sync;
> +
> +       /** Pointer to a u32 array of the BOs that are referenced by the job. */
> +       __u64 bo_handles;
> +
> +       /** Number of BO handles passed in (size is that times 4). */
> +       __u32 bo_handle_count;
> +
> +       /** A combination of PANFROST_JD_REQ_* */
> +       __u32 requirements;
> +};
> +

I really think that to write a decent Vulkan driver, you need to take
arrays of in-syncs.

Look at the amdgpu chunk API.

Dave.
Alyssa Rosenzweig March 8, 2019, 2:33 a.m. UTC | #2
> I really think that to write a decent Vulkan driver, you need to take
> arrays of in-syncs.

Vulkan? What's that? ;)
Alyssa Rosenzweig March 8, 2019, 5 a.m. UTC | #3
Oh my onions, it's really here! It's really coming! It's really here!

----

> +	  DRM driver for ARM Mali Midgard (t6xx, t7xx, t8xx) and
> +	  Bifrost (G3x, G5x, G7x) GPUs.

Nitpick: the model names should maybe be capitalized? Or at least, the
T/G should be consistent? I'm not sure what the vendor marketing names
are exactly.

> +	unsigned long base_hw_features[64 / BITS_PER_LONG];
> +	unsigned long hw_issues[64 / BITS_PER_LONG];

This is confusing...? Is the idea to always have u64, regardless of
32/64-bitness? If so, why not just u64? I'm not totally sure why these
aren't just bitmasks, if we are capping to 64 for each. On the other
hand, there are a lot more issues exposed by kbase than this, though the
vast majority don't apply to kernel space (and should be sorted out as a
purely userspace affair)...?

Also, nitpick: s/base_hw_features/hw_features/g, for consistency and not
inheriting naming cruft.

> +	struct panfrost_job *jobs[3];

3 is a bit of a magic number, it feels like. I'm guessing this
corresponds to job slots JS0/JS1/JS2? If so, I guess just add a quick
comment about that, since otherwise it feels a little random.

(Maybe I'm biased because `struct panfrost_job` means something totally
different in userspace for me...)

> +/* DRM_AUTH is required on SUBMIT for now, while all clients share a single
> + * address space.  Note that render nodes would be able to submit jobs that
> + * could access BOs from clients authenticated with the master node.
> + */

This concerns me. Per-process address spaces (configured natively in the
MMU) are essential from both security and stability standpoints. (It's
possible I'm misunderstanding what DRM_AUTH means in this context; this
is more responding to "share a single address space").

> +	drm_mm_init(&pfdev->mm, 0, SZ_4G); // 4G enough for now. can be 48-bit

What's the rationale for picking 4G (when the virtual address space is
64-bit, physical is 48-bit)? Easier portability to 32-bit for
simplicity, or something else?

> +static const struct of_device_id dt_match[] = {
> +	{ .compatible = "arm,mali-t760" },
> +	{ .compatible = "arm,mali-t860" },
> +	{}
> +};

Do we want to add compatibles for the rest of the Malis on the initial
merge, or wait until they're actually confirmed working so we don't load
and cause problems on untested hardware?

> +enum base_hw_feature {
	...
> +	HW_FEATURE_PROTECTED_MODE
	...
> +};

1) I know these names are inherited from kbase, but might we prefer
panfrost-prefixed names for consistency?

2) I recall discussing this a bit over IRC, but most of these properties
are of more use to userspace than kernelspace. Does it make sense to
keep the feature list here rather than just in Mesa, bearing in mind
Mesa upgrades are easier than kernel upgrades? (I think you may have
been the one to bring up this fact, but hoping it doesn't get lost over
IRC).

3) On a matter of principle, I don't like wasting a bit on
Digital Rainbow Management (especially when it's not something we're
realistically going to implement for numerous reasons...)

> +++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c
> @@ -0,0 +1,464 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright 2018 Marty E. Plummer <hanetzer@startmail.com> */
> +/* Copyright 2019 Linaro, Ltd., Rob Herring <robh@kernel.org> */
> +/* Copyright 2019 Collabora ltd. */

Given the register definitions are here (including the comments from
kbase -- if we strip those, it might be a different story), it might be
safer to add a vendor copyright here too.

Alternatively, maybe the registers should be in their own file anyway. I
know they were explicitly moved inward earlier, but conceptually I don't
see why that's preferable to a centralized panfrost_regs.h file?
Copyright/origin is more transparent that way too.

> +	for (timeout = 500; timeout > 0; timeout--) {

500 seems a little arbitrary...?

> +	// Need more version detection
> +	if (pfdev->features.id == 0x0860) {

Seconded, you have the GPU_MODELs just below...?

> +		for (i = 0; i < MAX_HW_REVS; i++) {
> +			if ((model->revs[i].revision != rev) &&
> +			    (model->revs[i].revision != (rev & ~0xf)))
> +				continue;
> +			hw_issues |= model->revs[i].issues;
> +			break;
> +		}

Nitpick: The control flow logic seems a little overcomplicated here.

> +	msleep(10);

What kind of race condition are we working around here? ;)

> +void panfrost_gpu_fini(struct panfrost_device *pfdev)
> +{
> +
> +}

Anything that has to happen here? If no, add a comment in the body
saying that. If yes, well, that (or at least /* stub */)...

> +/*
> + * This is not a complete list of issues, but only the ones the driver needs
> + * to care about.
> + */
> +enum base_hw_issue {
> +	HW_ISSUE_6367,

Similar nitpicks as with the feature list. I will say, this is
_incredibly opaque_. I realize the vendor driver is _intentionally_
opaque here, but that doesn't make this any easier to follow ;)

The good news is that older vendor driver releases (for T760 and
earlier?) had comments documenting what all the errata were, so the vast
majority of these we do have documentation on. Plus, if these are just
the issues the _kernel_ driver cares about, we have documentation for
that given the has_issue calls across kbase. Regardless, something more
transparent than an uncommented number might be nice.

> +	GPUCORE_1619,

What's a GPUCORE and what was kbase thinking...? ;)

> +#endif /* _HWCONFIG_ISSUES_H_ */

s/_HWCONFIG_ISSUES_H_/_PANFROST_ISSUES_H_/

> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright 2019 Linaro, Ltd, Rob Herring <robh@kernel.org> */
> +/* Copyright 2019 Collabora ltd. */

See register concerns again.

> +static int panfrost_job_get_slot(struct panfrost_job *job)

It might help to have a quick comment explaining what JS0/JS1/JS2 are to
refresh the reader's (and eventually your) memory. Just a simple, you
know:

/* JS0: fragment jobs.
 * JS1: vertex/tiler jobs
 * JS2: compute jobs
 */

> +#if 0
> +// Ignore compute for now

Maybe don't ignore compute if it's just this little routine? :)

> +		if (kbase_hw_has_issue(kbdev, BASE_HW_ISSUE_8987))
> +			return 2;

It looks like 8987 only applies to the very earliest dev models (based
on the issues list) so we shouldn't need to worry about this.
t600_r0p0_15dev0 can probably be safely ignored entirely (and deleted
from the issues/features/models list, frankly). I doubt that model is
even publicly available...

> +static void panfrost_job_write_affinity(struct panfrost_device *pfdev,

What's affinity? :)

> +		udelay(100);

(Arbitrary?)

> +#if 0
> +	if (kbase_hw_has_issue(kbdev, BASE_HW_ISSUE_10649))
> +		cfg |= JS_CONFIG_START_MMU;

This issue seems to apply to a lot of GPUs we *do* care about; better
handle this.

> +
> +	if (panfrost_has_hw_feature(kbdev,
> +				BASE_HW_FEATURE_JOBCHAIN_DISAMBIGUATION)) {
> +		if (!kbdev->hwaccess.backend.slot_rb[js].job_chain_flag) {
> +			cfg |= JS_CONFIG_JOB_CHAIN_FLAG;
> +			katom->atom_flags |= KBASE_KATOM_FLAGS_JOBCHAIN;
> +			kbdev->hwaccess.backend.slot_rb[js].job_chain_flag =
> +								true;
> +		} else {
> +			katom->atom_flags &= ~KBASE_KATOM_FLAGS_JOBCHAIN;
> +			kbdev->hwaccess.backend.slot_rb[js].job_chain_flag =
> +								false;
> +		}
> +	}

What does this code do / why is it if 0'd?

> +	for (i = 0; i < bo_count; i++)
> +		/* XXX: Use shared fences for read-only objects. */
> +		reservation_object_add_excl_fence(bos[i]->resv, fence);

I might be paranoid, but 2 lines means braces :)

> +	for (i = 0; i < job->bo_count; i++)
> +		drm_gem_object_put_unlocked(job->bos[i]);
> +	kvfree(job->bos);
> +
> +	kfree(job);

Nitpick: move the blank space up a line.

> +	if (job_read(pfdev, JS_STATUS(js)) == 8) {

What does 8 mean?

> +//		dev_err(pfdev->dev, "reseting gpu");
> +//		panfrost_gpu_reset(pfdev);
> +	}
> +
> +	/* For now, just say we're done. No reset and retry. */
> +//	job_write(pfdev, JS_COMMAND(js), JS_COMMAND_HARD_STOP);
> +	dma_fence_signal(job->done_fence);

That's probably reasonable, at least for now. If our job faults we have
bigger issues / retrying is probably futile. That said, if we're not
resetting is there a risk of lock-ups?

> +		/* Non-Fault Status code */
> +		/* Job exceptions */

I think the "FAULT" suffix implies that loudly enough :)

> +	job_write(pfdev, JOB_INT_CLEAR, 0x70007);
> +	job_write(pfdev, JOB_INT_MASK, 0x70007);

Meaning of the magic numbers...?

> +#define NUM_JOB_SLOTS	2	/* Don't need 3rd one until we have compute support */

Sure, but there _are_ 3 slots in the hardware; there's no need to lie
about that even if we don't presently schedule anything there?

> +// SPDX-License-Identifier:	GPL-2.0
> +/* Copyright 2019 Linaro, Ltd, Rob Herring <robh@kernel.org> */

(Likewise, register copyright).

> +//	if (kbdev->system_coherency == COHERENCY_ACE)
> +//		current_setup->transtab |= AS_TRANSTAB_LPAE_SHARE_OUTER;

Bwap?

> +	//struct panfrost_device *pfdev = cookie;
> +	// Wait 1000 GPU cycles!?

?!
> +		if (panfrost_has_hw_feature(pfdev, HW_FEATURE_AARCH64_MMU))
> +			return "ATOMIC";
> +		else
> +			return "UNKNOWN";

Does it really make sense to check for the feature to determine the
name...? I mean, that code path should be unreachable, but still (at
least without the check the code is slightly neater..)

> +		fault_status = mmu_read(pfdev, AS_FAULTSTATUS(i));
> +		addr = mmu_read(pfdev, AS_FAULTADDRESS_LO(i));
> +		addr |= (u64)mmu_read(pfdev, AS_FAULTADDRESS_HI(i)) << 32;

I don't know if it's necessary for the initial merge, but maybe at least
put a TODO comment in here that growable memory (lazy allocations) will
be implemented here in the future...?

> +void panfrost_mmu_fini(struct panfrost_device *pfdev)
> +{
> +
> +}

Empty?

----------------------------

Overall, I'm super happy to see this! Nice work, guys! ^_^

-Alyssa
Neil Armstrong March 8, 2019, 8:18 a.m. UTC | #4
On 08/03/2019 06:00, Alyssa Rosenzweig wrote:
> Oh my onions, it's really here! It's really coming! It's really here!
> 
> ----
> 

<snip>

> 
>> +static const struct of_device_id dt_match[] = {
>> +	{ .compatible = "arm,mali-t760" },
>> +	{ .compatible = "arm,mali-t860" },
>> +	{}
>> +};
> 
> Do we want to add compatibles for the rest of the Malis on the initial
> merge, or wait until they're actually confirmed working so we don't load
> and cause problems on untested hardware?

We should definitely stick to the midgard bindings here and add all the compatibles
even if we haven't tested them yet, maybe adding a warning mechanism that reports
when the driver is used on an untested Mali core?

BTW Rob, I resent the bifrost binding doc with a unique "arm,mali-bifrost"
compatible; adding it would help start debugging bifrost as well!

> 
>> +enum base_hw_feature {
> 	...
>> +	HW_FEATURE_PROTECTED_MODE
> 	...
>> +};

<snip>

> 
> ----------------------------
> 
> Overall, I'm super happy to see this! Nice work, guys! ^_^

Yeah pretty cool work !

I'll run it on a T820 ASAP and push fixes on the gitlab repo.

Neil

> 
> -Alyssa
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
Neil Armstrong March 8, 2019, 8:20 a.m. UTC | #5
On 08/03/2019 01:24, Rob Herring wrote:
> From: "Marty E. Plummer" <hanetzer@startmail.com>
> 
> This adds the initial driver for panfrost which supports Arm Mali
> Midgard and Bifrost family of GPUs. Currently, only the T860 Midgard GPU
> has been tested.
> 
> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> Cc: Maxime Ripard <maxime.ripard@bootlin.com>
> Cc: Sean Paul <sean@poorly.run>
> Cc: David Airlie <airlied@linux.ie>
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Cc: Alyssa Rosenzweig <alyssa@rosenzweig.io>
> Cc: Lyude Paul <lyude@redhat.com>
> Cc: Eric Anholt <eric@anholt.net>
> Signed-off-by: Marty E. Plummer <hanetzer@startmail.com>
> Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
> Signed-off-by: Rob Herring <robh@kernel.org>
> ---

<snip>

> +
> +static const struct panfrost_model gpu_models[] = {
> +	/* T60x has an oddball version */
> +	GPU_MODEL(t600, 0x6956, 0xffff,
> +		GPU_REV_EXT(t600, 0, 0, 1, _15dev0)),
> +	GPU_MODEL_MIDGARD(t620, 0x620,
> +		GPU_REV(t620, 0, 1), GPU_REV(t620, 1, 0)),
> +	GPU_MODEL_MIDGARD(t720, 0x720),
> +	GPU_MODEL_MIDGARD(t760, 0x750,
> +		GPU_REV(t760, 0, 0), GPU_REV(t760, 0, 1),
> +		GPU_REV_EXT(t760, 0, 1, 0, _50rel0),
> +		GPU_REV(t760, 0, 2), GPU_REV(t760, 0, 3)),
> +	GPU_MODEL_MIDGARD(t820, 0x820),
> +	GPU_MODEL_MIDGARD(t830, 0x830),
> +	GPU_MODEL_MIDGARD(t860, 0x860),
> +	GPU_MODEL_MIDGARD(t880, 0x880),
> +
> +	GPU_MODEL_BIFROST(g71, 0x6000,
> +		GPU_REV_EXT(g71, 0, 0, 1, _05dev0)),
> +	GPU_MODEL_BIFROST(g72, 0x6001),
> +	GPU_MODEL_BIFROST(g51, 0x7000),
> +	GPU_MODEL_BIFROST(g76, 0x7001),
> +	GPU_MODEL_BIFROST(g52, 0x7002),
> +	GPU_MODEL_BIFROST(g31, 0x7003,
> +		GPU_REV(g31, 1, 0)),

G31 r0p0 should be supported; the Amlogic G12A has it:

[   98.036507] mali ffe40000.gpu: GPU identified as 0x3 arch 7.0.9 r0p0 status 0

as reported by mali_kbase.

Neil

> +};
> +
> +static void panfrost_gpu_init_features(struct panfrost_device *pfdev)
> +{
> +	u32 gpu_id, num_js, major, minor, status, rev;
> +	const char *name = "unknown";
> +	u64 hw_feat = 0;
> +	u64 hw_issues = hw_issues_all;
> +	const struct panfrost_model *model;
> +	int i;

<snip>

> +#endif
> +
> +#endif /* _PANFROST_DRM_H_ */
>
Rob Herring March 8, 2019, 2:31 p.m. UTC | #6
On Thu, Mar 7, 2019 at 11:00 PM Alyssa Rosenzweig <alyssa@rosenzweig.io> wrote:
>
> Oh my onions, it's really here! It's really coming! It's really here!
>
> ----
>
> > +       DRM driver for ARM Mali Midgard (t6xx, t7xx, t8xx) and
> > +       Bifrost (G3x, G5x, G7x) GPUs.
>
> Nitpick: the model names should maybe be capitalized? Or at least, the
> T/G should be consistent? I'm not sure what the vendor marketing names
> are exactly.
>
> > +     unsigned long base_hw_features[64 / BITS_PER_LONG];
> > +     unsigned long hw_issues[64 / BITS_PER_LONG];
>
> This is confusing...? Is the idea to always have u64, regardless of
> 32/64-bitness? If so, why not just u64? I'm not totally sure why these
> aren't just bitmasks, if we are capping to 64 for each.

Bitmasks in the kernel use unsigned long arrays, a strange choice
which I guess was either because it predated 64-bit or because it
enables atomic ops, which tend to be on the native word size. So this
just fixes the size to 64 bits on both 32- and 64-bit systems.

> On the other
> hand, there are a lot more issues exposed by kbase than this, though the
> vast majority don't apply to kernel space (and should be sorted out as a
> purely userspace affair)...?

Issues I trimmed down to only the kernel ones. Features were small
enough, I just left them all.

> Also, nitpick: s/base_hw_features/hw_features/g, for consistency and not
> inheriting naming cruft.

+1

> > +     struct panfrost_job *jobs[3];
>
> 3 is a bit of a magic number, it feels like. I'm guessing this
> corresponds to job slots JS0/JS1/JS2? If so, I guess just add a quick
> comment about that, since otherwise it feels a little random.

Job slots, yes. I think I have a define somewhere already I'll use.

> (Maybe I'm biased because `struct panfrost_job` means something totally
> different in userspace for me...)
>
> > +/* DRM_AUTH is required on SUBMIT for now, while all clients share a single
> > + * address space.  Note that render nodes would be able to submit jobs that
> > + * could access BOs from clients authenticated with the master node.
> > + */
>
> This concerns me. Per-process address spaces (configured natively in the
> MMU) are essential from both security and stability standpoints. (It's
> possible I'm misunderstanding what DRM_AUTH means in this context; this
> is more responding to "share a single address space").

We'll get there. I'll just point out that freedreno is also a single
address space (though there are patches for that now).

> > +     drm_mm_init(&pfdev->mm, 0, SZ_4G); // 4G enough for now. can be 48-bit
>
> What's the rationale for picking 4G (when the virtual address space is
> 64-bit, physical is 48-bit)? Easier portability to 32-bit for
> simplicity, or something else?

Systems simply don't have enough RAM that you'd want to use 4G for
graphics memory.

That reminds me, I want to not start at 0 to catch NULL addresses.

> > +static const struct of_device_id dt_match[] = {
> > +     { .compatible = "arm,mali-t760" },
> > +     { .compatible = "arm,mali-t860" },
> > +     {}
> > +};
>
> > Do we want to add compatibles for the rest of the Malis on the initial
> merge, or wait until they're actually confirmed working so we don't load
> and cause problems on untested hardware?

I'd say wait. I also suggested on the bifrost bindings that we do a
more generic fallback as the h/w is pretty much discoverable for
model, version and features.

> > +enum base_hw_feature {
>         ...
> > +     HW_FEATURE_PROTECTED_MODE
>         ...
> > +};
>
> 1) I know these names are inherited from kbase, but might we prefer
> panfrost-prefixed names for consistency?

They aren't exposed outside of the driver, so namespacing them is a
bit pointless and just makes the names too long. I will at least
rename base_hw_feature.

> 2) I recall discussing this a bit over IRC, but most of these properties
> are of more use to userspace than kernelspace. Does it make sense to
> keep the feature list here rather than just in Mesa, bearing in mind
> Mesa upgrades are easier than kernel upgrades? (I think you may have
> been the one to bring up this fact, but hoping it doesn't get lost over
> IRC).

Unlike the issues list, it was small enough I didn't really think about it here.

> 3) On a matter of principle, I don't like wasting a bit on
> Digital Rainbow Management (especially when it's not something we're
> realistically going to implement for numerous reasons...)

If this is ever going to be used in commercial products, it will need
to be supported whether in tree or out. We can leave that for another
day.

>
> > +++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c
> > @@ -0,0 +1,464 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/* Copyright 2018 Marty E. Plummer <hanetzer@startmail.com> */
> > +/* Copyright 2019 Linaro, Ltd., Rob Herring <robh@kernel.org> */
> > +/* Copyright 2019 Collabora ltd. */
>
> Given the register definitions are here (including the comments from
> kbase -- if we strip those, it might be a different story), it might be
> safer to add a vendor copyright here too.
>
> Alternatively, maybe the registers should be in their own file anyway. I
> know they were explicitly moved inward earlier, but conceptually I don't
> see why that's preferable to a centralized panfrost_regs.h file?

Primarily to enforce good hygiene: a sub-block's registers should only
be accessed within its own .c file.

> Copyright/origin is more transparent that way too.

Good point. Otherwise, I can say "Register definitions from ...,
Copyright ARM..."

>
> > +     for (timeout = 500; timeout > 0; timeout--) {
>
> 500 seems a little arbitrary...?

We should at least have a udelay in here to get a known time rather
than however long the compiler-optimized loop takes. Though the
timeout would still be arbitrary.

>
> > +     // Need more version detection
> > +     if (pfdev->features.id == 0x0860) {
>
> Seconded, you have the GPU_MODELs just below...?

This means I just stuffed in fixed values to these registers. The
kbase driver uses hw_issues and such to determine the setup.

>
> > +             for (i = 0; i < MAX_HW_REVS; i++) {
> > +                     if ((model->revs[i].revision != rev) &&
> > +                         (model->revs[i].revision != (rev & ~0xf)))
> > +                             continue;
> > +                     hw_issues |= model->revs[i].issues;
> > +                     break;
> > +             }
>
> Nitpick: The control flow logic seems a little overcomplicated here.
>
> > +     msleep(10);
>
> What kind of race condition are we working around here? ;)

Hard to say with the context deleted...

As powering on h/w takes time, a delay seemed appropriate. I think I
can check some status bits though.

>
> > +void panfrost_gpu_fini(struct panfrost_device *pfdev)
> > +{
> > +
> > +}
>
> Anything that has to happen here? If no, add a comment in the body
> saying that. If yes, well, that (or at least /* stub */)...
>
> > +/*
> > + * This is not a complete list of issues, but only the ones the driver needs
> > + * to care about.
> > + */
> > +enum base_hw_issue {
> > +     HW_ISSUE_6367,
>
> Similar nitpicks as with the feature list. I will say, this is
> _incredibly opaque_. I realize the vendor driver is _intentionally_
> opaque here, but that doesn't make this any easier to follow ;)
>
> The good news is that older vendor driver releases (for T760 and
> earlier?) had comments documenting what all the errata were, so the vast
> majority of these we do have documentation on. Plus, if these are just
> the issues the _kernel_ driver cares about, we have documentation for
> that given the has_issue calls across kbase. Regardless, something more
> transparent than an uncommented number might be nice.

It's not always clear from looking at the kbase driver. Often it's
just: if some issue is present, set or don't set some other bit.

IMO, if someone wants to improve the documentation here, that can come later.

>
> > +     GPUCORE_1619,
>
> What's a GPUCORE and what was kbase thinking...? ;)

No idea other than it is only for T604 and s/w models.

>
> > +#endif /* _HWCONFIG_ISSUES_H_ */
>
> s/_HWCONFIG_ISSUES_H_/_PANFROST_ISSUES_H_/
>
> > +// SPDX-License-Identifier: GPL-2.0
> > +/* Copyright 2019 Linaro, Ltd, Rob Herring <robh@kernel.org> */
> > +/* Copyright 2019 Collabora ltd. */
>
> See register concerns again.
>
> > +static int panfrost_job_get_slot(struct panfrost_job *job)
>
> It might help to have a quick comment explaining what JS0/JS1/JS2 are to
> refresh the reader's (and eventually your) memory. Just a simple, you
> know:
>
> /* JS0: fragment jobs.
>  * JS1: vertex/tiler jobs
>  * JS2: compute jobs
>  */
>
> > +#if 0
> > +// Ignore compute for now
>
> Maybe don't ignore compute if it's just this little routine? :)
>
> > +             if (kbase_hw_has_issue(kbdev, BASE_HW_ISSUE_8987))
> > +                     return 2;
>
> It looks like 8987 only applies to the very earliest dev models (based
> on the issues list) so we shouldn't need to worry about this.
> t600_r0p0_15dev0 can probably be safely ignored entirely (and deleted
> from the issues/features/models list, frankly). I doubt that model is
> even publicly available...

Sadly, I confirmed it is sitting on my desk. Samsung chromebook
(snow). As to whether anyone still cares about snow and wants to run a
current mainline kernel on it, I don't know. It doesn't look to me
like there's any effort there. I certainly don't plan to try.

> > +static void panfrost_job_write_affinity(struct panfrost_device *pfdev,
>
> What's affinity? :)

Hard coded ATM.

> > +             udelay(100);
>
> (Arbitrary?)

Tell me what you'd like it to be...

>
> > +#if 0
> > +     if (kbase_hw_has_issue(kbdev, BASE_HW_ISSUE_10649))
> > +             cfg |= JS_CONFIG_START_MMU;
>
> This issue seems to apply to a lot of GPUs we *do* care about; better
> handle this.
>
> > +
> > +     if (panfrost_has_hw_feature(kbdev,
> > +                             BASE_HW_FEATURE_JOBCHAIN_DISAMBIGUATION)) {
> > +             if (!kbdev->hwaccess.backend.slot_rb[js].job_chain_flag) {
> > +                     cfg |= JS_CONFIG_JOB_CHAIN_FLAG;
> > +                     katom->atom_flags |= KBASE_KATOM_FLAGS_JOBCHAIN;
> > +                     kbdev->hwaccess.backend.slot_rb[js].job_chain_flag =
> > +                                                             true;
> > +             } else {
> > +                     katom->atom_flags &= ~KBASE_KATOM_FLAGS_JOBCHAIN;
> > +                     kbdev->hwaccess.backend.slot_rb[js].job_chain_flag =
> > +                                                             false;
> > +             }
> > +     }
>
> What does this code do / why is it if 0'd?

Isn't it obvious?

It's a TODO we need to figure out.

>
> > +     for (i = 0; i < bo_count; i++)
> > +             /* XXX: Use shared fences for read-only objects. */
> > +             reservation_object_add_excl_fence(bos[i]->resv, fence);
>
> I might be paranoid, but 2 lines means braces :)
>
> > +     for (i = 0; i < job->bo_count; i++)
> > +             drm_gem_object_put_unlocked(job->bos[i]);
> > +     kvfree(job->bos);
> > +
> > +     kfree(job);
>
> Nitpick: move the blank space up a line.
>
> > +     if (job_read(pfdev, JS_STATUS(js)) == 8) {
>
> What does 8 mean?
>
> > +//           dev_err(pfdev->dev, "reseting gpu");
> > +//           panfrost_gpu_reset(pfdev);
> > +     }
> > +
> > +     /* For now, just say we're done. No reset and retry. */
> > +//   job_write(pfdev, JS_COMMAND(js), JS_COMMAND_HARD_STOP);
> > +     dma_fence_signal(job->done_fence);
>
> That's probably reasonable, at least for now. If our job faults we have
> bigger issues / retrying is probably futile. That said, if we're not
> resetting is there a risk of lock-ups?

Maybe? Reloading the module will reset things.

>
> > +             /* Non-Fault Status code */
> > +             /* Job exceptions */
>
> I think the "FAULT" suffix implies that loudly enough :)
>
> > +     job_write(pfdev, JOB_INT_CLEAR, 0x70007);
> > +     job_write(pfdev, JOB_INT_MASK, 0x70007);
>
> Meaning of the magic numbers...?
>
> > +#define NUM_JOB_SLOTS        2       /* Don't need 3rd one until we have compute support */
>
> Sure, but there _are_ 3 slots in the hardware; there's no need to lie
> about that even if we don't presently schedule anything there?
>
> > +// SPDX-License-Identifier:  GPL-2.0
> > +/* Copyright 2019 Linaro, Ltd, Rob Herring <robh@kernel.org> */
>
> (Likewise, register copyright).
>
> > +//   if (kbdev->system_coherency == COHERENCY_ACE)
> > +//           current_setup->transtab |= AS_TRANSTAB_LPAE_SHARE_OUTER;
>
> Bwap?

Only matters for bifrost. There for a TODO I guess.

>
> > +     //struct panfrost_device *pfdev = cookie;
> > +     // Wait 1000 GPU cycles!?
>
> ?!

That's what the kbase driver does...

> > +             if (panfrost_has_hw_feature(pfdev, HW_FEATURE_AARCH64_MMU))
> > +                     return "ATOMIC";
> > +             else
> > +                     return "UNKNOWN";
>
> Does it really make sense to check for the feature to determine the
> name...? I mean, that code path should be unreachable, but still (at
> least without the check the code is slightly neater..)
>
> > +             fault_status = mmu_read(pfdev, AS_FAULTSTATUS(i));
> > +             addr = mmu_read(pfdev, AS_FAULTADDRESS_LO(i));
> > +             addr |= (u64)mmu_read(pfdev, AS_FAULTADDRESS_HI(i)) << 32;
>
> I don't know if it's necessary for the initial merge, but maybe at least
> put a TODO comment in here that growable memory (lazy allocations) will
> be implemented here in the future...?

That affects more than just the code here. We probably need a higher
level TODO list.

> > +void panfrost_mmu_fini(struct panfrost_device *pfdev)
> > +{
> > +
> > +}
>
> Empty?

Yeah, that needs some cleanup.

Rob
Rob Herring March 8, 2019, 2:39 p.m. UTC | #7
On Fri, Mar 8, 2019 at 2:18 AM Neil Armstrong <narmstrong@baylibre.com> wrote:
>
> On 08/03/2019 06:00, Alyssa Rosenzweig wrote:
> > Oh my onions, it's really here! It's really coming! It's really here!
> >
> > ----
> >
>
> <snip>
>
> >
> >> +static const struct of_device_id dt_match[] = {
> >> +    { .compatible = "arm,mali-t760" },
> >> +    { .compatible = "arm,mali-t860" },
> >> +    {}
> >> +};
> >
> > Do we want to add compatibles for the rest of the Mali's on the initial
> > merge, or wait until they're actually confirmed working so we don't load
> > and cause problems on untested hardware?
>
> We should definitely stick to the midgard bindings here and add all the compatibles
> even if we haven't tested yet, maybe by adding a warning mechanism reporting
> when used on an untested mali core ?

Compatibles are the only thing we have documenting what has been tested.
So either we add compatibles when tested, or we remove warnings when
tested? Either way a kernel update is needed.
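A minimal sketch of the warn-on-untested idea, keyed off the HW-discovered product ID rather than the compatible (the function name and the tested list are illustrative, not from the driver):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch: bind on any midgard compatible, but warn at
 * probe time when the HW-discovered GPU product ID is not in a list
 * of models known to work. */
static const uint16_t panfrost_tested_models[] = { 0x860 }; /* T860 only */

static bool panfrost_model_is_tested(uint16_t prod_id)
{
	for (size_t i = 0;
	     i < sizeof(panfrost_tested_models) /
		 sizeof(panfrost_tested_models[0]); i++)
		if (panfrost_tested_models[i] == prod_id)
			return true;
	return false;
}
```

The probe path would then emit something like dev_warn() when this returns false, and the list (rather than the compatible table) would grow as hardware gets tested.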

> BTW Rob, I resent the bifrost binding doc with a unique "arm,mali-bifrost"
> compatible; adding it would help start debugging bifrost as well!

There's a lot missing for bifrost, and a compatible string is the
least of it. I can probably add most of it, but can't test any of it
(and no one can as the mesa side is not done).

>
> >
> >> +enum base_hw_feature {
> >       ...
> >> +    HW_FEATURE_PROTECTED_MODE
> >       ...
> >> +};
>
> <snip>
>
> >
> > ----------------------------
> >
> > Overall, I'm super happy to see this! Nice work, guys! ^_^
>
> Yeah pretty cool work !
>
> I'll run it on a T820 ASAP and push fixes on the gitlab repo.

What SoC is that?

Rob
Neil Armstrong March 8, 2019, 2:50 p.m. UTC | #8
On 08/03/2019 15:39, Rob Herring wrote:
> On Fri, Mar 8, 2019 at 2:18 AM Neil Armstrong <narmstrong@baylibre.com> wrote:
>>
>> On 08/03/2019 06:00, Alyssa Rosenzweig wrote:
>>> Oh my onions, it's really here! It's really coming! It's really here!
>>>
>>> ----
>>>
>>
>> <snip>
>>
>>>
>>>> +static const struct of_device_id dt_match[] = {
>>>> +    { .compatible = "arm,mali-t760" },
>>>> +    { .compatible = "arm,mali-t860" },
>>>> +    {}
>>>> +};
>>>
>>> Do we want to add compatibles for the rest of the Mali's on the initial
>>> merge, or wait until they're actually confirmed working so we don't load
>>> and cause problems on untested hardware?
>>
>> We should definitely stick to the midgard bindings here and add all the compatibles
>> even if we haven't tested yet, maybe by adding a warning mechanism reporting
>> when used on an untested mali core ?
> 
> Compatibles are the only thing we have documenting what has been tested.
> So either we add compatibles when tested, or we remove warnings when
> tested? Either way a kernel update is needed.

The kernel will be updated only to remove the warning. Since midgard is
also HW-discoverable, we can add all the binding compatibles without risk.

> 
>> BTW Rob, I resent the bifrost binding doc with a unique "arm,mali-bifrost"
>> compatible; adding it would help start debugging bifrost as well!
> 
> There's a lot missing for bifrost, and a compatible string is the
> least of it. I can probably add most of it, but can't test any of it
> (and no one can as the mesa side is not done).

What's the problem with adding all the compatibles and adding a warning,
filtered on tested models via the HW-discovered ID, instead?

> 
>>
>>>
>>>> +enum base_hw_feature {
>>>       ...
>>>> +    HW_FEATURE_PROTECTED_MODE
>>>       ...
>>>> +};
>>
>> <snip>
>>
>>>
>>> ----------------------------
>>>
>>> Overall, I'm super happy to see this! Nice work, guys! ^_^
>>
>> Yeah pretty cool work !
>>
>> I'll run it on a T820 ASAP and push fixes on the gitlab repo.
> 
> What SoC is that?

Amlogic S912

The T820 node is not upstream since it needs a bindings change to handle
the SoC-specific reset lines. The external reset lines are needed because
the internal midgard soft-reset does not work.
It also needs some other tweaks, but I'll need to work on it to understand
what's needed in the drm driver; mali_kbase needs a hacky tweak to
enable all the cores, but you seem to enable the cores manually, which is fine.

Neil

> 
> Rob
>
Rob Herring March 8, 2019, 2:51 p.m. UTC | #9
On Fri, Mar 8, 2019 at 2:20 AM Neil Armstrong <narmstrong@baylibre.com> wrote:
>
> On 08/03/2019 01:24, Rob Herring wrote:
> > From: "Marty E. Plummer" <hanetzer@startmail.com>
> >
> > This adds the initial driver for panfrost which supports Arm Mali
> > Midgard and Bifrost family of GPUs. Currently, only the T860 Midgard GPU
> > has been tested.
> >
> > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > Cc: Maxime Ripard <maxime.ripard@bootlin.com>
> > Cc: Sean Paul <sean@poorly.run>
> > Cc: David Airlie <airlied@linux.ie>
> > Cc: Daniel Vetter <daniel@ffwll.ch>
> > Cc: Alyssa Rosenzweig <alyssa@rosenzweig.io>
> > Cc: Lyude Paul <lyude@redhat.com>
> > Cc: Eric Anholt <eric@anholt.net>
> > Signed-off-by: Marty E. Plummer <hanetzer@startmail.com>
> > Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
> > Signed-off-by: Rob Herring <robh@kernel.org>
> > ---
>
> <snip>
>
> > +
> > +static const struct panfrost_model gpu_models[] = {
> > +     /* T60x has an oddball version */
> > +     GPU_MODEL(t600, 0x6956, 0xffff,
> > +             GPU_REV_EXT(t600, 0, 0, 1, _15dev0)),
> > +     GPU_MODEL_MIDGARD(t620, 0x620,
> > +             GPU_REV(t620, 0, 1), GPU_REV(t620, 1, 0)),
> > +     GPU_MODEL_MIDGARD(t720, 0x720),
> > +     GPU_MODEL_MIDGARD(t760, 0x750,
> > +             GPU_REV(t760, 0, 0), GPU_REV(t760, 0, 1),
> > +             GPU_REV_EXT(t760, 0, 1, 0, _50rel0),
> > +             GPU_REV(t760, 0, 2), GPU_REV(t760, 0, 3)),
> > +     GPU_MODEL_MIDGARD(t820, 0x820),
> > +     GPU_MODEL_MIDGARD(t830, 0x830),
> > +     GPU_MODEL_MIDGARD(t860, 0x860),
> > +     GPU_MODEL_MIDGARD(t880, 0x880),
> > +
> > +     GPU_MODEL_BIFROST(g71, 0x6000,
> > +             GPU_REV_EXT(g71, 0, 0, 1, _05dev0)),
> > +     GPU_MODEL_BIFROST(g72, 0x6001),
> > +     GPU_MODEL_BIFROST(g51, 0x7000),
> > +     GPU_MODEL_BIFROST(g76, 0x7001),
> > +     GPU_MODEL_BIFROST(g52, 0x7002),
> > +     GPU_MODEL_BIFROST(g31, 0x7003,
> > +             GPU_REV(g31, 1, 0)),
>
> G31 r0p0 should be supported, the Amlogic G12A has it :
>
> [   98.036507] mali ffe40000.gpu: GPU identified as 0x3 arch 7.0.9 r0p0 status 0
>
> as reported by mali_kbase.

It is. There are no h/w issues specific to r0p0 that the kernel has to
care about. Somehow, r1p0 added a new issue.

We aren't tracking exactly all known and supported versions. Userspace
can do this if we want as the kernel doesn't even track all the
issues. If we were to have exact matching, we should only add the ones
found in h/w supported upstream. I imagine there's lots of revisions
in the kbase driver we'll never see.

Rob
Alyssa Rosenzweig March 8, 2019, 3:34 p.m. UTC | #10
> bitmasks in the kernel use unsigned long arrays. A strange choice
> which I guess was either because it predated 64-bit or enables atomic
> ops which tend to be on the native size. So this just fixes the size
> to 64-bits for 32 and 64 bit systems.

Bizarre, but if that's the standard, then OK.
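A sketch of the fixed-width alternative being described: a plain u64 instead of the kernel's unsigned-long-array bitmaps, so the layout is identical on 32- and 64-bit systems (the issue names here are made up for the example):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative fixed-size issue bitmap: one u64 regardless of word
 * size, unlike DECLARE_BITMAP()'s unsigned long array. The issue
 * names are hypothetical, not the driver's. */
enum hw_issue { HW_ISSUE_A = 0, HW_ISSUE_B = 40 };

static inline void hw_issue_set(uint64_t *issues, enum hw_issue i)
{
	*issues |= 1ULL << i;
}

static inline int hw_issue_test(uint64_t issues, enum hw_issue i)
{
	return !!(issues & (1ULL << i));
}
```

The trade-off is losing the atomic set_bit()/test_bit() helpers, which doesn't matter for a table filled in once at probe time.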

> Issues I trimmed down to only the kernel ones. Features were small
> enough, I just left them all.

+1

> Job slots, yes. I think I have a define somewhere already I'll use.

NUM_JOB_SLOTS 

> I'll just point out that freedreno is also a single address space
> (though there are patches for that now).

Huh. That seems.... huh.

> Systems simply don't have enough RAM that you'd want to use 4G for
> graphics memory.

Fair enough.

> That reminds me, I want to not start at 0 to catch NULL addresses.

+1

> They aren't exposed outside of the driver, so namespacing them is a
> bit pointless and just makes the names too long. I will at least
> rename base_hw_feature.

+1

> Unlike the issues list, it was small enough I didn't really think about it here.

+1 though this may change in the future.

> If this is ever going to be used in commercial products, it will need
> to be supported whether in tree or out. We can leave that for another
> day.

*angry acceptance noises*

> Primarily to enforce good hygiene to only access a sub-block's
> registers within its .c file.
> 
> > Copyright/origin is more transparent that way too.
> 
> Good point. Otherwise, I can say "Register definitions from ...,
> Copyright ARM..."

Perhaps it would make sense to have a regs/ folder with a
.h corresponding to each of the components. So it's separated out but
still prevents unintended access (since you'd need to #include the wrong
component explicitly, and that'd be immediately vetoed on a patch).

> This means I just stuffed in fixed values to these registers. The
> kbase driver uses hw_issues and such to determine the setup.

Oh, I see. 

> Hard to say with the context deleted...

Sorry! I'm new to this, bear with me :)

> As powering on h/w takes time, a delay seemed appropriate. I think I
> can check some status bits though.

That conceptually seems better, I think.

> IMO, if someone wants to improve the documentation here, that can come later.

I'd be happy to submit a patch helping with this (either before we
upstream or after).

> Sadly, I confirmed it is sitting on my desk. Samsung chromebook
> (snow).

Any chance that was an earlier dev version of Snow that you got from
your employer and this was maybe fixed in the real thing? Or would that
be too convenient?

> Hard coded ATM.

No, like, what does "affinity" mean in this context? I didn't study
kbase too hard so I'm having troubles following any of this.

> Tell me what you'd like it to be...

Maybe some nice round number like 64 or 128..? Kidding ;)

> Isn't it obvious?
> 
> It's a TODO we need to figure out.

...Ah. Yes, that should have been obvious. My apologies.

> Maybe? Reloading the module will reset things.

The issue is that unprivileged userspace apps are able to submit
arbitrarily crazy workloads to the kernel (without necessarily needing
the driver's permission). It's trivial for unprivileged code to cause a
fault (and thanks to driver bugs, WebGL is an attack surface here as
well); doing so should not lock up the GPU requiring superuser access to
correct. Faults are not catastrophic but they do happen.

> That only matters for bifrost. It's there as a TODO, I guess.

Gotcha, alright.

> That's what the kbase driver does...

Huh.

> That affects more than just the code here. We probably need a higher
> level TODO list.

+1

-Alyssa
Rob Herring March 8, 2019, 4:16 p.m. UTC | #11
On Fri, Mar 8, 2019 at 9:34 AM Alyssa Rosenzweig <alyssa@rosenzweig.io> wrote:
>
> > bitmasks in the kernel use unsigned long arrays. A strange choice
> > which I guess was either because it predated 64-bit or enables atomic
> > ops which tend to be on the native size. So this just fixes the size
> > to 64-bits for 32 and 64 bit systems.
>
> Bizarre, but if that's the standard, then OK.
>
> > Issues I trimmed down to only the kernel ones. Features were small
> > enough, I just left them all.
>
> +1
>
> > Job slots, yes. I think I have a define somewhere already I'll use.
>
> NUM_JOB_SLOTS
>
> > I'll just point out that freedreno is also a single address space
> > (though there are patches for that now).
>
> Huh. That seems.... huh.
>
> > Systems simply don't have enough RAM that you'd want to use 4G for
> > graphics memory.
>
> Fair enough.
>
> > That reminds me, I want to not start at 0 to catch NULL addresses.
>
> +1
>
> > They aren't exposed outside of the driver, so namespacing them is a
> > bit pointless and just makes the names too long. I will at least
> > rename base_hw_feature.
>
> +1
>
> > Unlike the issues list, it was small enough I didn't really think about it here.
>
> +1 though this may change in the future.
>
> > If this is ever going to be used in commercial products, it will need
> > to be supported whether in tree or out. We can leave that for another
> > day.
>
> *angry acceptance noises*
>
> > Primarily to enforce good hygiene to only access a sub-block's
> > registers within its .c file.
> >
> > > Copyright/origin is more transparent that way too.
> >
> > Good point. Otherwise, I can say "Register definitions from ...,
> > Copyright ARM..."
>
> Perhaps it would make sense to have a regs/ folder with a
> .h corresponding to each of the components. So it's separated out but
> still prevents unintended access (since you'd need to #include the wrong
> component explicitly, and that'd be immediately vetoed on a patch).
>
> > This means I just stuffed in fixed values to these registers. The
> > kbase driver uses hw_issues and such to determine the setup.
>
> Oh, I see.
>
> > Hard to say with the context deleted...
>
> Sorry! I'm new to this, bear with me :)
>
> > As powering on h/w takes time, a delay seemed appropriate. I think I
> > can check some status bits though.
>
> That conceptually seems better, I think.
>
> > IMO, if someone wants to improve the documentation here, that can come later.
>
> I'd be happy to submit a patch helping with this (either before we
> upstream or after).
>
> > Sadly, I confirmed it is sitting on my desk. Samsung chromebook
> > (snow).
>
> Any chance that was an earlier dev version of Snow that you got from
> your employer and this was maybe fixed in the real thing? Or would that
> be too convenient?

It was given to me and a bunch of other ARM kernel devs, but I think
it was in production by then. It's an A01 rev which matches this:

https://www.notebookcheck.net/Samsung-Chromebook-XE303C12-A01US.84022.0.html

The only other rev is a UK version.

>
> > Hard coded ATM.
>
> No, like, what does "affinity" mean in this context? I didn't study
> kbase too hard so I'm having troubles following any of this.

Core affinity. It selects which shader cores to use, based on flags
you pass in for the job. Some of the h/w has multiple L2 caches (IIRC
just bifrost), and the code is fairly hard to follow, which is why it's
just hardcoded. Sorry, I don't have a better explanation, but I've
already forgotten some of the details and stopped looking at it once I
found hardcoding it would work for now...

This and other parts of this code are why I asked if there are other
features of the kbase job submit that we may need in the future
(besides just compute).

> > Tell me what you'd like it to be...
>
> Maybe some nice round number like 64 or 128..? Kidding ;)
>
> > Isn't it obvious?
> >
> > It's a TODO we need to figure out.
>
> ...Ah. Yes, that should have been obvious. My apologies.
>
> > Maybe? Reloading the module will reset things.
>
> The issue is that unprivileged userspace apps are able to submit
> arbitrarily crazy workloads to the kernel (without necessarily needing
> the driver's permission). It's trivial for unprivileged code to cause a
> fault (and thanks to driver bugs, WebGL is an attack surface here as
> well); doing so should not lock up the GPU requiring superuser access to
> correct. Faults are not catastrophic but they do happen.

I understand, but wouldn't just running conformance tests likely kill
things too? I'm guessing we can't even run a web browser yet, so WebGL
problem is solved. ;)

In any case, Tomeu said resetting is next up for him.

Rob
Eric Anholt March 8, 2019, 4:28 p.m. UTC | #12
Rob Herring <robh@kernel.org> writes:

> From: "Marty E. Plummer" <hanetzer@startmail.com>
>
> This adds the initial driver for panfrost which supports Arm Mali
> Midgard and Bifrost family of GPUs. Currently, only the T860 Midgard GPU
> has been tested.
>
> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> Cc: Maxime Ripard <maxime.ripard@bootlin.com>
> Cc: Sean Paul <sean@poorly.run>
> Cc: David Airlie <airlied@linux.ie>
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Cc: Alyssa Rosenzweig <alyssa@rosenzweig.io>
> Cc: Lyude Paul <lyude@redhat.com>
> Cc: Eric Anholt <eric@anholt.net>
> Signed-off-by: Marty E. Plummer <hanetzer@startmail.com>
> Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
> Signed-off-by: Rob Herring <robh@kernel.org>
> ---
> Sending this out in the spirit of release early, release often. We're 
> close to parity compared to mesa + the vendor driver. There's a few 
> issues Tomeu is chasing. 
>
> There's still some pieces of the h/w setup we've just hardcoded. Locking 
> in various places is probably missing. Error recovery is non-existent 
> (other than module unload/load). There's some work to add tracepoints 
> and perf counters that's not here yet. Bifrost GPUs are definitely not 
> supported yet other than identifying them. Primarily the MMU setup is 
> missing.
>
> How's performance? Great, because I haven't measured it.
>
> This patch and its dependencies are available here[1]. The mesa support 
> is here[2]. Both are likely to change (daily).

My inclination would be to merge this soon, when basic checkpatch is in
shape.  The UABI looks good, and that's the important part.

> +static void panfrost_unlock_bo_reservations(struct drm_gem_object **bos,
> +					    int bo_count,
> +					    struct ww_acquire_ctx *acquire_ctx)
> +{
> +	int i;
> +
> +	for (i = 0; i < bo_count; i++) {
> +		ww_mutex_unlock(&bos[i]->resv->lock);
> +	}
> +	ww_acquire_fini(acquire_ctx);
> +}
> +
> +/* Takes the reservation lock on all the BOs being referenced, so that
> + * at queue submit time we can update the reservations.
> + *
> + * We don't lock the RCL the tile alloc/state BOs, or overflow memory
> + * (all of which are on exec->unref_list).  They're entirely private
> + * to panfrost, so we don't attach dma-buf fences to them.
> + */
> +static int panfrost_lock_bo_reservations(struct drm_gem_object **bos,
> +					 int bo_count,
> +					 struct ww_acquire_ctx *acquire_ctx)
> +{
> +	int contended_lock = -1;
> +	int i, ret;
> +
> +	ww_acquire_init(acquire_ctx, &reservation_ww_class);
> +
> +retry:
> +	if (contended_lock != -1) {
> +		struct drm_gem_object *bo = bos[contended_lock];
> +
> +		ret = ww_mutex_lock_slow_interruptible(&bo->resv->lock,
> +						       acquire_ctx);
> +		if (ret) {
> +			ww_acquire_done(acquire_ctx);
> +			return ret;
> +		}
> +	}
> +
> +	for (i = 0; i < bo_count; i++) {
> +		if (i == contended_lock)
> +			continue;
> +
> +		ret = ww_mutex_lock_interruptible(&bos[i]->resv->lock,
> +						  acquire_ctx);
> +		if (ret) {
> +			int j;
> +
> +			for (j = 0; j < i; j++)
> +				ww_mutex_unlock(&bos[j]->resv->lock);
> +
> +			if (contended_lock != -1 && contended_lock >= i) {
> +				struct drm_gem_object *bo = bos[contended_lock];
> +
> +				ww_mutex_unlock(&bo->resv->lock);
> +			}
> +
> +			if (ret == -EDEADLK) {
> +				contended_lock = i;
> +				goto retry;
> +			}
> +
> +			ww_acquire_done(acquire_ctx);
> +			return ret;
> +		}
> +	}
> +
> +	ww_acquire_done(acquire_ctx);
> +
> +	/* Reserve space for our shared (read-only) fence references,
> +	 * before we commit the job to the hardware.
> +	 */
> +	for (i = 0; i < bo_count; i++) {
> +		ret = reservation_object_reserve_shared(bos[i]->resv, 1);
> +		if (ret) {
> +			panfrost_unlock_bo_reservations(bos, bo_count,
> +						   acquire_ctx);
> +			return ret;
> +		}
> +	}
> +
> +	return 0;
> +}

I just sent out my shared helpers for most of this function, hopefully
you can use them.

It looks like you've got v3d's silliness with the fences -- we reserve a
shared slot, then use excl only anyway.  For v3d I'm planning on moving
to just excl -- only one of my entrypoints has info on write vs
read-only, and I don't know of a usecase where having multiple read-only
consumers of a shared buffer simultaneously matters.

More importantly, I think you also have my bug of not doing implicit
synchronization on buffers, which will break X11 rendering
sometimes. X11's GL requirements are that previously-submitted rendering
by the client fd will execute before X11's rendering on its fd to the
same buffers.  If you're running a single client, X11's copies are cheap
enough that it'll probably work out most of the time.

> --- /dev/null
> +++ b/include/uapi/drm/panfrost_drm.h

> +#define DRM_IOCTL_PANFROST_SUBMIT		DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_SUBMIT, struct drm_panfrost_submit)
> +#define DRM_IOCTL_PANFROST_WAIT_BO		DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_WAIT_BO, struct drm_panfrost_wait_bo)
> +#define DRM_IOCTL_PANFROST_CREATE_BO		DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_CREATE_BO, struct drm_panfrost_create_bo)
> +#define DRM_IOCTL_PANFROST_MMAP_BO		DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_MMAP_BO, struct drm_panfrost_mmap_bo)
> +#define DRM_IOCTL_PANFROST_GET_PARAM		DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_GET_PARAM, struct drm_panfrost_get_param)
> +#define DRM_IOCTL_PANFROST_GET_BO_OFFSET	DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_GET_BO_OFFSET, struct drm_panfrost_get_bo_offset)

SUBMIT and WAIT_BO might be IOR instead of IOWR
Alyssa Rosenzweig March 8, 2019, 6:46 p.m. UTC | #13
> It was given to me and a bunch of other ARM kernel devs, but I think
> it was in production by then. It's an A01 rev which matches this:
> 
> https://www.notebookcheck.net/Samsung-Chromebook-XE303C12-A01US.84022.0.html
> 
> The only other rev is a UK version.

Wacky. Something seems decidedly odd about a "-dev" GPU used in
production, though I concede snow was an odd machine in the first place.

> Core affinity. It selects which shader cores to use, based on flags
> you pass in for the job. Some of the h/w has multiple L2 caches (IIRC
> just bifrost), and the code is fairly hard to follow, which is why it's
> just hardcoded. Sorry, I don't have a better explanation, but I've
> already forgotten some of the details and stopped looking at it once I
> found hardcoding it would work for now...
> 
> This and other parts of this code is why I asked if there are other
> features of the kbase job submit that we may need in the future
> (besides just compute).

Ahh, okay. As far as I can tell, what we have now (plus the basic
compute adjustments we've talked about) should be good through GLES3.
It's possible the hairier details are maybe exposed in OpenCL (?), but I
haven't looked at that yet, so I couldn't say. It's probably fine to
keep hardcoding until we can't.

> I understand, but wouldn't just running conformance tests likely kill
> things too? I'm guessing we can't even run a web browser yet, so WebGL
> problem is solved. ;)

Hehe, yes, the conformance tests definitely cause everything to go
kerplutz, which is why having resetting is essential just from a stability
standpoint (even just for dev, let alone the security issues -- unlike
most of the above, this *is* a blocker IMO).

> In any case, Tomeu said resetting is next up for him.

Ah, good! :)
Rob Herring March 13, 2019, 1:06 p.m. UTC | #14
On Fri, Mar 8, 2019 at 10:29 AM Eric Anholt <eric@anholt.net> wrote:
>
> Rob Herring <robh@kernel.org> writes:
>
> > From: "Marty E. Plummer" <hanetzer@startmail.com>
> >
> > This adds the initial driver for panfrost which supports Arm Mali
> > Midgard and Bifrost family of GPUs. Currently, only the T860 Midgard GPU
> > has been tested.

[...]

> It looks like you've got v3d's silliness with the fences -- we reserve a
> shared slot, then use excl only anyway.  For v3d I'm planning on moving
> to just excl -- only one of my entrypoints has info on write vs
> read-only, and I don't know of a usecase where having multiple read-only
> consumers of a shared buffer simultaneously matters.
>
> More importantly, I think you also have my bug of not doing implicit
> synchronization on buffers, which will break X11 rendering
> sometimes. X11's GL requirements are that previously-submitted rendering
> by the client fd will execute before X11's rendering on its fd to the
> same buffers.  If you're running a single client, X11's copies are cheap
> enough that it'll probably work out most of the time.

Is there a fix for this? I didn't find anything that looked like one.

>
> > --- /dev/null
> > +++ b/include/uapi/drm/panfrost_drm.h
>
> > +#define DRM_IOCTL_PANFROST_SUBMIT            DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_SUBMIT, struct drm_panfrost_submit)
> > +#define DRM_IOCTL_PANFROST_WAIT_BO           DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_WAIT_BO, struct drm_panfrost_wait_bo)
> > +#define DRM_IOCTL_PANFROST_CREATE_BO         DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_CREATE_BO, struct drm_panfrost_create_bo)
> > +#define DRM_IOCTL_PANFROST_MMAP_BO           DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_MMAP_BO, struct drm_panfrost_mmap_bo)
> > +#define DRM_IOCTL_PANFROST_GET_PARAM         DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_GET_PARAM, struct drm_panfrost_get_param)
> > +#define DRM_IOCTL_PANFROST_GET_BO_OFFSET     DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_GET_BO_OFFSET, struct drm_panfrost_get_bo_offset)
>
> SUBMIT and WAIT_BO might be IOR instead of IOWR

Huh? Perhaps WAIT_BO should be IOW, as we don't update the timeout
(it's absolute), but both have input parameters and SUBMIT has output
params.

Rob
Eric Anholt March 13, 2019, 4:09 p.m. UTC | #15
Rob Herring <robh@kernel.org> writes:

> On Fri, Mar 8, 2019 at 10:29 AM Eric Anholt <eric@anholt.net> wrote:
>>
>> Rob Herring <robh@kernel.org> writes:
>>
>> > From: "Marty E. Plummer" <hanetzer@startmail.com>
>> >
>> > This adds the initial driver for panfrost which supports Arm Mali
>> > Midgard and Bifrost family of GPUs. Currently, only the T860 Midgard GPU
>> > has been tested.
>
> [...]
>
>> It looks like you've got v3d's silliness with the fences -- we reserve a
>> shared slot, then use excl only anyway.  For v3d I'm planning on moving
>> to just excl -- only one of my entrypoints has info on write vs
>> read-only, and I don't know of a usecase where having multiple read-only
>> consumers of a shared buffer simultaneously matters.
>>
>> More importantly, I think you also have my bug of not doing implicit
>> synchronization on buffers, which will break X11 rendering
>> sometimes. X11's GL requirements are that previously-submitted rendering
>> by the client fd will execute before X11's rendering on its fd to the
>> same buffers.  If you're running a single client, X11's copies are cheap
>> enough that it'll probably work out most of the time.
>
> Is there a fix for this? I didn't find anything that looked like one.
>
>>
>> > --- /dev/null
>> > +++ b/include/uapi/drm/panfrost_drm.h
>>
>> > +#define DRM_IOCTL_PANFROST_SUBMIT            DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_SUBMIT, struct drm_panfrost_submit)
>> > +#define DRM_IOCTL_PANFROST_WAIT_BO           DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_WAIT_BO, struct drm_panfrost_wait_bo)
>> > +#define DRM_IOCTL_PANFROST_CREATE_BO         DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_CREATE_BO, struct drm_panfrost_create_bo)
>> > +#define DRM_IOCTL_PANFROST_MMAP_BO           DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_MMAP_BO, struct drm_panfrost_mmap_bo)
>> > +#define DRM_IOCTL_PANFROST_GET_PARAM         DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_GET_PARAM, struct drm_panfrost_get_param)
>> > +#define DRM_IOCTL_PANFROST_GET_BO_OFFSET     DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_GET_BO_OFFSET, struct drm_panfrost_get_bo_offset)
>>
>> SUBMIT and WAIT_BO might be IOR instead of IOWR
>
> Huh? Perhaps WAIT_BO should be IOW, as we don't update the timeout
> (it's absolute), but both have input parameters and SUBMIT has output
> params.

Sorry, IOW was what I meant.  I'm not seeing the output param of SUBMIT
-- are you thinking of how the syncobj gets updated?
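For reference, the chosen direction can be checked against the ioctl encoding macros; a sketch with made-up request numbers and a simplified struct (not the real panfrost UAPI):

```c
#include <assert.h>
#include <sys/ioctl.h>

/* Illustrative struct and request numbers only, not panfrost's real
 * ones. _IOW encodes "userspace writes, kernel only reads the
 * argument", which fits an absolute-timeout WAIT_BO that returns
 * nothing; _IOWR means data flows both ways. */
struct example_wait_bo {
	unsigned int handle;
	unsigned int pad;
	long long timeout_ns;
};

#define EXAMPLE_WAIT_BO _IOW('E', 0x01, struct example_wait_bo)
#define EXAMPLE_SUBMIT  _IOWR('E', 0x02, struct example_wait_bo)
```

Getting the direction right matters because generic plumbing (and tools like strace) use these bits to decide whether to copy the argument in, out, or both.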
Eric Anholt March 13, 2019, 5:56 p.m. UTC | #16
Rob Herring <robh@kernel.org> writes:

> On Fri, Mar 8, 2019 at 10:29 AM Eric Anholt <eric@anholt.net> wrote:
>>
>> Rob Herring <robh@kernel.org> writes:
>>
>> > From: "Marty E. Plummer" <hanetzer@startmail.com>
>> >
>> > This adds the initial driver for panfrost which supports Arm Mali
>> > Midgard and Bifrost family of GPUs. Currently, only the T860 Midgard GPU
>> > has been tested.
>
> [...]
>
>> It looks like you've got v3d's silliness with the fences -- we reserve a
>> shared slot, then use excl only anyway.  For v3d I'm planning on moving
>> to just excl -- only one of my entrypoints has info on write vs
>> read-only, and I don't know of a usecase where having multiple read-only
>> consumers of a shared buffer simultaneously matters.
>>
>> More importantly, I think you also have my bug of not doing implicit
>> synchronization on buffers, which will break X11 rendering
>> sometimes. X11's GL requirements are that previously-submitted rendering
>> by the client fd will execute before X11's rendering on its fd to the
>> same buffers.  If you're running a single client, X11's copies are cheap
>> enough that it'll probably work out most of the time.
>
> Is there a fix for this? I didn't find anything that looked like one.

Missed this part.

I'm thinking something like what the lima driver has for implicit sync.
Rob Herring March 13, 2019, 5:58 p.m. UTC | #17
On Wed, Mar 13, 2019 at 11:09 AM Eric Anholt <eric@anholt.net> wrote:
>
> Rob Herring <robh@kernel.org> writes:
>
> > On Fri, Mar 8, 2019 at 10:29 AM Eric Anholt <eric@anholt.net> wrote:
> >>
> >> Rob Herring <robh@kernel.org> writes:
> >>
> >> > From: "Marty E. Plummer" <hanetzer@startmail.com>
> >> >
> >> > This adds the initial driver for panfrost which supports Arm Mali
> >> > Midgard and Bifrost family of GPUs. Currently, only the T860 Midgard GPU
> >> > has been tested.
> >
> > [...]
> >
> >> It looks like you've got v3d's silliness with the fences -- we reserve a
> >> shared slot, then use excl only anyway.  For v3d I'm planning on moving
> >> to just excl -- only one of my entrypoints has info on write vs
> >> read-only, and I don't know of a usecase where having multiple read-only
> >> consumers of a shared buffer simultaneously matters.
> >>
> >> More importantly, I think you also have my bug of not doing implicit
> >> synchronization on buffers, which will break X11 rendering
> >> sometimes. X11's GL requirements are that previously-submitted rendering
> >> by the client fd will execute before X11's rendering on its fd to the
> >> same buffers.  If you're running a single client, X11's copies are cheap
> >> enough that it'll probably work out most of the time.
> >
> > Is there a fix for this? I didn't find anything that looked like one.
> >
> >>
> >> > --- /dev/null
> >> > +++ b/include/uapi/drm/panfrost_drm.h
> >>
> >> > +#define DRM_IOCTL_PANFROST_SUBMIT            DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_SUBMIT, struct drm_panfrost_submit)
> >> > +#define DRM_IOCTL_PANFROST_WAIT_BO           DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_WAIT_BO, struct drm_panfrost_wait_bo)
> >> > +#define DRM_IOCTL_PANFROST_CREATE_BO         DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_CREATE_BO, struct drm_panfrost_create_bo)
> >> > +#define DRM_IOCTL_PANFROST_MMAP_BO           DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_MMAP_BO, struct drm_panfrost_mmap_bo)
> >> > +#define DRM_IOCTL_PANFROST_GET_PARAM         DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_GET_PARAM, struct drm_panfrost_get_param)
> >> > +#define DRM_IOCTL_PANFROST_GET_BO_OFFSET     DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_GET_BO_OFFSET, struct drm_panfrost_get_bo_offset)
> >>
> >> SUBMIT and WAIT_BO might be IOR instead of IOWR
> >
> > Huh? Perhaps WAIT_BO should be IOW as we don't update the timeout
> > being absolute, but both have input parameters and SUBMIT has output
> > params.
>
> Sorry, IOW was what I meant.  I'm not seeing the output param of SUBMIT
> -- are you thinking of how the syncobj gets updated?

Yeah, I was assuming out_sync was set on output, but now that I
actually look at it I see that's not the case as only the object is
updated.

Rob
Neil Armstrong March 14, 2019, 9:01 a.m. UTC | #18
On 08/03/2019 01:24, Rob Herring wrote:
> From: "Marty E. Plummer" <hanetzer@startmail.com>
> 
> This adds the initial driver for panfrost which supports Arm Mali
> Midgard and Bifrost family of GPUs. Currently, only the T860 Midgard GPU
> has been tested.
> 
> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> Cc: Maxime Ripard <maxime.ripard@bootlin.com>
> Cc: Sean Paul <sean@poorly.run>
> Cc: David Airlie <airlied@linux.ie>
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Cc: Alyssa Rosenzweig <alyssa@rosenzweig.io>
> Cc: Lyude Paul <lyude@redhat.com>
> Cc: Eric Anholt <eric@anholt.net>
> Signed-off-by: Marty E. Plummer <hanetzer@startmail.com>
> Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
> Signed-off-by: Rob Herring <robh@kernel.org>
> ---
> Sending this out in the spirit of release early, release often. We're 
> close to parity compared to mesa + the vendor driver. There's a few 
> issues Tomeu is chasing. 
> 
> There's still some pieces of the h/w setup we've just hardcoded. Locking 
> in various places is probably missing. Error recovery is non-existent 
> (other than module unload/load). There's some work to add tracepoints 
> and perf counters that's not here yet. Bifrost GPUs are definitely not 
> supported yet other than identifying them. Primarily the MMU setup is 
> missing.
> 
> How's performance? Great, because I haven't measured it.
> 
> This patch and its dependencies are available here[1]. The mesa support 
> is here[2]. Both are likely to change (daily).
> 
> Rob
> 
> [1] https://gitlab.freedesktop.org/robh/linux-panfrost.git panfrost-rebase
> [2] https://gitlab.freedesktop.org/tomeu/mesa.git mainline-driver

[...]

> diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
> index febdc102b75c..cdafe35f0576 100644
> --- a/drivers/gpu/drm/Kconfig
> +++ b/drivers/gpu/drm/Kconfig
> @@ -335,6 +335,8 @@ source "drivers/gpu/drm/tve200/Kconfig"
>  
>  source "drivers/gpu/drm/xen/Kconfig"
>  
> +source "drivers/gpu/drm/panfrost/Kconfig"
> +
>  # Keep legacy drivers last
>  
>  menuconfig DRM_LEGACY
> diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
> index 7476ed945e30..66fd696ae60c 100644
> --- a/drivers/gpu/drm/Makefile
> +++ b/drivers/gpu/drm/Makefile
> @@ -110,3 +110,4 @@ obj-$(CONFIG_DRM_TINYDRM) += tinydrm/
>  obj-$(CONFIG_DRM_PL111) += pl111/
>  obj-$(CONFIG_DRM_TVE200) += tve200/
>  obj-$(CONFIG_DRM_XEN) += xen/
> +obj-$(CONFIG_DRM_PANFROST) += panfrost/
> diff --git a/drivers/gpu/drm/panfrost/Kconfig b/drivers/gpu/drm/panfrost/Kconfig
> new file mode 100644
> index 000000000000..eb7283149354
> --- /dev/null
> +++ b/drivers/gpu/drm/panfrost/Kconfig
> @@ -0,0 +1,14 @@
> +# SPDX-License-Identifier: GPL-2.0
> +
> +config DRM_PANFROST
> +	tristate "Panfrost (DRM support for ARM Mali Midgard/Bifrost GPUs)"
> +	depends on DRM
> +	depends on ARCH_ROCKCHIP

Could you switch to
+       depends on ARM || ARM64 || COMPILE_TEST
instead of ARCH_ROCKCHIP?

It will simplify bringup on non-Rockchip boards.

Neil

> +	depends on MMU
> +	select DRM_SCHED
> +	select IOMMU_SUPPORT
> +	select IOMMU_IO_PGTABLE_LPAE
> +	select DRM_GEM_SHMEM_HELPER
> +	help
> +	  DRM driver for ARM Mali Midgard (t6xx, t7xx, t8xx) and
> +	  Bifrost (G3x, G5x, G7x) GPUs.

[...]

> +/**
> + * Returns the offset for the BO in the GPU address space for this DRM fd.
> + * This is the same value returned by drm_panfrost_create_bo, if that was called
> + * from this DRM fd.
> + */
> +struct drm_panfrost_get_bo_offset {
> +	__u32 handle;
> +	__u32 pad;
> +	__u64 offset;
> +};
> +
> +#if defined(__cplusplus)
> +}
> +#endif
> +
> +#endif /* _PANFROST_DRM_H_ */
>
Rob Herring March 14, 2019, 12:46 p.m. UTC | #19
On Thu, Mar 14, 2019 at 4:01 AM Neil Armstrong <narmstrong@baylibre.com> wrote:
>
> On 08/03/2019 01:24, Rob Herring wrote:
> > From: "Marty E. Plummer" <hanetzer@startmail.com>
> >
> > This adds the initial driver for panfrost which supports Arm Mali
> > Midgard and Bifrost family of GPUs. Currently, only the T860 Midgard GPU
> > has been tested.
> >
> > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > Cc: Maxime Ripard <maxime.ripard@bootlin.com>
> > Cc: Sean Paul <sean@poorly.run>
> > Cc: David Airlie <airlied@linux.ie>
> > Cc: Daniel Vetter <daniel@ffwll.ch>
> > Cc: Alyssa Rosenzweig <alyssa@rosenzweig.io>
> > Cc: Lyude Paul <lyude@redhat.com>
> > Cc: Eric Anholt <eric@anholt.net>
> > Signed-off-by: Marty E. Plummer <hanetzer@startmail.com>
> > Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
> > Signed-off-by: Rob Herring <robh@kernel.org>
> > ---
> > Sending this out in the spirit of release early, release often. We're
> > close to parity compared to mesa + the vendor driver. There's a few
> > issues Tomeu is chasing.
> >
> > There's still some pieces of the h/w setup we've just hardcoded. Locking
> > in various places is probably missing. Error recovery is non-existent
> > (other than module unload/load). There's some work to add tracepoints
> > and perf counters that's not here yet. Bifrost GPUs are definitely not
> > supported yet other than identifying them. Primarily the MMU setup is
> > missing.

[...]

> > diff --git a/drivers/gpu/drm/panfrost/Kconfig b/drivers/gpu/drm/panfrost/Kconfig
> > new file mode 100644
> > index 000000000000..eb7283149354
> > --- /dev/null
> > +++ b/drivers/gpu/drm/panfrost/Kconfig
> > @@ -0,0 +1,14 @@
> > +# SPDX-License-Identifier: GPL-2.0
> > +
> > +config DRM_PANFROST
> > +     tristate "Panfrost (DRM support for ARM Mali Midgard/Bifrost GPUs)"
> > +     depends on DRM
> > +     depends on ARCH_ROCKCHIP
>
> Could you switch to
> +       depends on ARM || ARM64 || COMPILE_TEST
> instead of ARCH_ROCKCHIP?
>
> It will simplify bringup on non-Rockchip boards.

Yes, certainly.
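
Folding Neil's suggestion into the stanza quoted above, the Kconfig entry
would read roughly as follows (a sketch of the change agreed in this
exchange; the final merged text may differ):

```kconfig
config DRM_PANFROST
	tristate "Panfrost (DRM support for ARM Mali Midgard/Bifrost GPUs)"
	depends on DRM
	depends on ARM || ARM64 || COMPILE_TEST
	depends on MMU
	select DRM_SCHED
	select IOMMU_SUPPORT
	select IOMMU_IO_PGTABLE_LPAE
	select DRM_GEM_SHMEM_HELPER
	help
	  DRM driver for ARM Mali Midgard (t6xx, t7xx, t8xx) and
	  Bifrost (G3x, G5x, G7x) GPUs.
```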

Patch

diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index febdc102b75c..cdafe35f0576 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -335,6 +335,8 @@  source "drivers/gpu/drm/tve200/Kconfig"
 
 source "drivers/gpu/drm/xen/Kconfig"
 
+source "drivers/gpu/drm/panfrost/Kconfig"
+
 # Keep legacy drivers last
 
 menuconfig DRM_LEGACY
diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index 7476ed945e30..66fd696ae60c 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -110,3 +110,4 @@  obj-$(CONFIG_DRM_TINYDRM) += tinydrm/
 obj-$(CONFIG_DRM_PL111) += pl111/
 obj-$(CONFIG_DRM_TVE200) += tve200/
 obj-$(CONFIG_DRM_XEN) += xen/
+obj-$(CONFIG_DRM_PANFROST) += panfrost/
diff --git a/drivers/gpu/drm/panfrost/Kconfig b/drivers/gpu/drm/panfrost/Kconfig
new file mode 100644
index 000000000000..eb7283149354
--- /dev/null
+++ b/drivers/gpu/drm/panfrost/Kconfig
@@ -0,0 +1,14 @@ 
+# SPDX-License-Identifier: GPL-2.0
+
+config DRM_PANFROST
+	tristate "Panfrost (DRM support for ARM Mali Midgard/Bifrost GPUs)"
+	depends on DRM
+	depends on ARCH_ROCKCHIP
+	depends on MMU
+	select DRM_SCHED
+	select IOMMU_SUPPORT
+	select IOMMU_IO_PGTABLE_LPAE
+	select DRM_GEM_SHMEM_HELPER
+	help
+	  DRM driver for ARM Mali Midgard (t6xx, t7xx, t8xx) and
+	  Bifrost (G3x, G5x, G7x) GPUs.
diff --git a/drivers/gpu/drm/panfrost/Makefile b/drivers/gpu/drm/panfrost/Makefile
new file mode 100644
index 000000000000..d07e0971b687
--- /dev/null
+++ b/drivers/gpu/drm/panfrost/Makefile
@@ -0,0 +1,11 @@ 
+# SPDX-License-Identifier: GPL-2.0
+
+panfrost-y := \
+	panfrost_drv.o \
+	panfrost_device.o \
+	panfrost_gem.o \
+	panfrost_gpu.o \
+	panfrost_job.o \
+	panfrost_mmu.o
+
+obj-$(CONFIG_DRM_PANFROST) += panfrost.o
diff --git a/drivers/gpu/drm/panfrost/panfrost_device.c b/drivers/gpu/drm/panfrost/panfrost_device.c
new file mode 100644
index 000000000000..cea3108d16cb
--- /dev/null
+++ b/drivers/gpu/drm/panfrost/panfrost_device.c
@@ -0,0 +1,127 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright 2018 Marty E. Plummer <hanetzer@startmail.com> */
+/* Copyright 2019 Linaro, Ltd, Rob Herring <robh@kernel.org> */
+
+#include <linux/clk.h>
+#include <linux/platform_device.h>
+#include <linux/regulator/consumer.h>
+
+#include "panfrost_device.h"
+#include "panfrost_gpu.h"
+#include "panfrost_job.h"
+#include "panfrost_mmu.h"
+
+static int panfrost_clk_init(struct panfrost_device *pfdev)
+{
+	int err;
+	unsigned long rate;
+
+	pfdev->clock = devm_clk_get(pfdev->dev, NULL);
+	if (IS_ERR(pfdev->clock)) {
+		dev_err(pfdev->dev, "get clock failed %ld\n", PTR_ERR(pfdev->clock));
+		return PTR_ERR(pfdev->clock);
+	}
+
+	rate = clk_get_rate(pfdev->clock);
+	dev_info(pfdev->dev, "clock rate = %lu\n", rate);
+
+	err = clk_prepare_enable(pfdev->clock);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+static void panfrost_clk_fini(struct panfrost_device *pfdev)
+{
+	clk_disable_unprepare(pfdev->clock);
+}
+
+static int panfrost_regulator_init(struct panfrost_device *pfdev)
+{
+	int ret;
+
+	pfdev->regulator = devm_regulator_get_optional(pfdev->dev, "mali");
+	if (IS_ERR(pfdev->regulator)) {
+		ret = PTR_ERR(pfdev->regulator);
+		pfdev->regulator = NULL;
+		if (ret == -ENODEV)
+			return 0;
+		dev_err(pfdev->dev, "failed to get regulator: %d\n", ret);
+		return ret;
+	}
+
+	ret = regulator_enable(pfdev->regulator);
+	if (ret < 0) {
+		dev_err(pfdev->dev, "failed to enable regulator: %d\n", ret);
+		return ret;
+	}
+
+	return 0;
+}
+
+static void panfrost_regulator_fini(struct panfrost_device *pfdev)
+{
+	if (pfdev->regulator)
+		regulator_disable(pfdev->regulator);
+}
+
+
+int panfrost_device_init(struct panfrost_device *pfdev)
+{
+	int err;
+	struct resource *res;
+
+	mutex_init(&pfdev->sched_lock);
+	INIT_LIST_HEAD(&pfdev->scheduled_jobs);
+
+	err = panfrost_clk_init(pfdev);
+	if (err) {
+		dev_err(pfdev->dev, "clk init failed %d\n", err);
+		return err;
+	}
+
+	err = panfrost_regulator_init(pfdev);
+	if (err) {
+		dev_err(pfdev->dev, "regulator init failed %d\n", err);
+		goto err_out0;
+	}
+
+	res = platform_get_resource(pfdev->pdev, IORESOURCE_MEM, 0);
+	pfdev->iomem = devm_ioremap_resource(pfdev->dev, res);
+	if (IS_ERR(pfdev->iomem)) {
+		dev_err(pfdev->dev, "failed to ioremap iomem\n");
+		err = PTR_ERR(pfdev->iomem);
+		goto err_out1;
+	}
+
+	err = panfrost_gpu_init(pfdev);
+	if (err)
+		goto err_out1;
+
+	err = panfrost_mmu_init(pfdev);
+	if (err)
+		goto err_out2;
+
+	err = panfrost_job_init(pfdev);
+	if (err)
+		goto err_out3;
+
+	return 0;
+err_out3:
+	panfrost_mmu_fini(pfdev);
+err_out2:
+	panfrost_gpu_fini(pfdev);
+err_out1:
+	panfrost_regulator_fini(pfdev);
+err_out0:
+	panfrost_clk_fini(pfdev);
+	return err;
+}
+
+void panfrost_device_fini(struct panfrost_device *pfdev)
+{
+	panfrost_job_fini(pfdev);
+	panfrost_mmu_fini(pfdev);
+	panfrost_gpu_fini(pfdev);
+	panfrost_regulator_fini(pfdev);
+
+	panfrost_clk_fini(pfdev);
+}
diff --git a/drivers/gpu/drm/panfrost/panfrost_device.h b/drivers/gpu/drm/panfrost/panfrost_device.h
new file mode 100644
index 000000000000..3c20c6cf279c
--- /dev/null
+++ b/drivers/gpu/drm/panfrost/panfrost_device.h
@@ -0,0 +1,83 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright 2018 Marty E. Plummer <hanetzer@startmail.com> */
+/* Copyright 2019 Linaro, Ltd, Rob Herring <robh@kernel.org> */
+
+#ifndef __PANFROST_DEVICE_H__
+#define __PANFROST_DEVICE_H__
+
+#include <linux/spinlock.h>
+#include <drm/drm_device.h>
+#include <drm/drm_mm.h>
+
+#include "panfrost_job.h"
+
+struct panfrost_device;
+struct panfrost_mmu;
+struct panfrost_job_slot;
+
+struct panfrost_features {
+	u16 id;
+	u16 revision;
+
+	u64 shader_present;
+	u64 tiler_present;
+	u64 l2_present;
+	u64 stack_present;
+	u32 as_present;
+	u32 js_present;
+
+	u32 l2_features;
+	u32 core_features;
+	u32 tiler_features;
+	u32 mem_features;
+	u32 mmu_features;
+	u32 thread_features;
+	u32 max_threads;
+	u32 thread_max_workgroup_sz;
+	u32 thread_max_barrier_sz;
+	u32 coherency_features;
+	u32 texture_features[4];
+	u32 js_features[16];
+
+	unsigned long base_hw_features[64 / BITS_PER_LONG];
+	unsigned long hw_issues[64 / BITS_PER_LONG];
+};
+
+struct panfrost_device {
+	struct device *dev;
+	struct drm_device *ddev;
+	struct platform_device *pdev;
+
+	struct drm_mm mm;
+	spinlock_t mm_lock;
+
+	void __iomem *iomem;
+	struct clk *clock;
+	struct regulator *regulator;
+
+	struct panfrost_features features;
+
+	struct panfrost_mmu *mmu;
+	struct panfrost_job_slot *js;
+
+	struct panfrost_job *jobs[3];
+	struct list_head scheduled_jobs;
+
+	struct mutex sched_lock;
+};
+
+struct panfrost_file_priv {
+	struct panfrost_device *pfdev;
+
+	struct drm_sched_entity sched_entity[NUM_JOB_SLOTS];
+};
+
+static inline struct panfrost_device *to_panfrost_device(struct drm_device *ddev)
+{
+	return ddev->dev_private;
+}
+
+int panfrost_device_init(struct panfrost_device *pfdev);
+void panfrost_device_fini(struct panfrost_device *pfdev);
+
+#endif
diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
new file mode 100644
index 000000000000..95eb85fa8f04
--- /dev/null
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -0,0 +1,419 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright 2018 Marty E. Plummer <hanetzer@startmail.com> */
+/* Copyright 2019 Linaro, Ltd., Rob Herring <robh@kernel.org> */
+/* Copyright 2019 Collabora ltd. */
+
+#include <linux/dma-mapping.h>
+#include <linux/module.h>
+#include <linux/of_platform.h>
+#include <linux/pagemap.h>
+#include <drm/panfrost_drm.h>
+#include <drm/drm_drv.h>
+#include <drm/drm_ioctl.h>
+#include <drm/drm_syncobj.h>
+#include <drm/drm_utils.h>
+
+#include "panfrost_device.h"
+#include "panfrost_gem.h"
+#include "panfrost_mmu.h"
+#include "panfrost_job.h"
+#include "panfrost_gpu.h"
+
+static int panfrost_ioctl_get_param(struct drm_device *ddev, void *data, struct drm_file *file)
+{
+	struct drm_panfrost_get_param *param = data;
+	struct panfrost_device *pfdev = ddev->dev_private;
+
+	if (param->pad != 0)
+		return -EINVAL;
+
+	switch (param->param) {
+	case DRM_PANFROST_PARAM_GPU_ID:
+		param->value = pfdev->features.id;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int panfrost_ioctl_create_bo(struct drm_device *dev, void *data,
+		struct drm_file *file)
+{
+	int ret;
+	struct drm_gem_shmem_object *shmem;
+	struct drm_panfrost_create_bo *args = data;
+
+	if (!args->size || args->flags)
+		return -EINVAL;
+
+	shmem = drm_gem_shmem_create_with_handle(file, dev, args->size, &args->handle);
+	if (IS_ERR(shmem))
+		return PTR_ERR(shmem);
+
+	ret = panfrost_mmu_map(to_panfrost_bo(&shmem->base));
+	if (ret)
+		goto err_free;
+
+	args->offset = to_panfrost_bo(&shmem->base)->node.start << PAGE_SHIFT;
+
+	return 0;
+
+err_free:
+	drm_gem_object_put_unlocked(&shmem->base);
+	return ret;
+}
+
+/**
+ * panfrost_lookup_bos() - Sets up job->bo[] with the GEM objects
+ * referenced by the job.
+ * @dev: DRM device
+ * @file_priv: DRM file for this fd
+ * @args: IOCTL args
+ * @job: job being set up
+ *
+ * Resolve handles from userspace to BOs and attach them to job.
+ *
+ * Note that this function doesn't need to unreference the BOs on
+ * failure, because that will happen at panfrost_job_cleanup() time.
+ */
+static int
+panfrost_lookup_bos(struct drm_device *dev,
+		  struct drm_file *file_priv,
+		  struct drm_panfrost_submit *args,
+		  struct panfrost_job *job)
+{
+	u32 *handles;
+	int ret = 0;
+	int i;
+
+	job->bo_count = args->bo_handle_count;
+
+	if (!job->bo_count)
+		return 0;
+
+	job->bos = kvmalloc_array(job->bo_count,
+				  sizeof(struct drm_panfrost_gem_object *),
+				  GFP_KERNEL | __GFP_ZERO);
+	if (!job->bos) {
+		DRM_DEBUG("Failed to allocate validated BO pointers\n");
+		return -ENOMEM;
+	}
+
+	handles = kvmalloc_array(job->bo_count, sizeof(u32), GFP_KERNEL);
+	if (!handles) {
+		ret = -ENOMEM;
+		DRM_DEBUG("Failed to allocate incoming GEM handles\n");
+		goto fail;
+	}
+
+	if (copy_from_user(handles,
+			   (void __user *)(uintptr_t)args->bo_handles,
+			   job->bo_count * sizeof(u32))) {
+		ret = -EFAULT;
+		DRM_DEBUG("Failed to copy in GEM handles\n");
+		goto fail;
+	}
+
+	spin_lock(&file_priv->table_lock);
+	for (i = 0; i < job->bo_count; i++) {
+		struct drm_gem_object *bo = idr_find(&file_priv->object_idr,
+						     handles[i]);
+		if (!bo) {
+			DRM_DEBUG("Failed to look up GEM BO %d: %d\n",
+				  i, handles[i]);
+			ret = -ENOENT;
+			spin_unlock(&file_priv->table_lock);
+			goto fail;
+		}
+		drm_gem_object_get(bo);
+		job->bos[i] = bo;
+	}
+	spin_unlock(&file_priv->table_lock);
+
+fail:
+	kvfree(handles);
+	return ret;
+}
+
+static int panfrost_ioctl_submit(struct drm_device *dev, void *data,
+		struct drm_file *file)
+{
+	struct panfrost_device *pfdev = dev->dev_private;
+	struct drm_panfrost_submit *args = data;
+	struct drm_syncobj *sync_out;
+	struct panfrost_job *job;
+	int ret = 0;
+
+	job = kzalloc(sizeof(*job), GFP_KERNEL);
+	if (!job)
+		return -ENOMEM;
+
+	kref_init(&job->refcount);
+
+	ret = drm_syncobj_find_fence(file, args->in_sync, 0, 0,
+				     &job->in_fence);
+	if (ret == -EINVAL)
+		goto fail;
+
+	job->pfdev = pfdev;
+	job->jc = args->jc;
+	job->requirements = args->requirements;
+	job->flush_id = panfrost_gpu_get_latest_flush_id(pfdev);
+	job->file_priv = file->driver_priv;
+
+	ret = panfrost_lookup_bos(dev, file, args, job);
+	if (ret)
+		goto fail;
+
+	ret = panfrost_job_push(job);
+	if (ret)
+		goto fail;
+
+	/* Update the return sync object for the job */
+	sync_out = drm_syncobj_find(file, args->out_sync);
+	if (sync_out) {
+		drm_syncobj_replace_fence(sync_out, job->render_done_fence);
+		drm_syncobj_put(sync_out);
+	}
+
+fail:
+	panfrost_job_put(job);
+
+	return ret;
+}
+
+static int
+panfrost_ioctl_wait_bo(struct drm_device *dev, void *data,
+		       struct drm_file *file_priv)
+{
+	long ret;
+	struct drm_panfrost_wait_bo *args = data;
+	struct drm_gem_object *gem_obj;
+	unsigned long timeout = drm_timeout_abs_to_jiffies(args->timeout_ns);
+
+	if (args->pad)
+		return -EINVAL;
+
+	gem_obj = drm_gem_object_lookup(file_priv, args->handle);
+	if (!gem_obj)
+		return -ENOENT;
+
+	ret = reservation_object_wait_timeout_rcu(gem_obj->resv, true,
+						  true, timeout);
+	if (!ret)
+		ret = timeout ? -ETIMEDOUT : -EBUSY;
+
+	drm_gem_object_put_unlocked(gem_obj);
+
+	return ret;
+}
+
+static int panfrost_ioctl_mmap_bo(struct drm_device *dev, void *data,
+		      struct drm_file *file_priv)
+{
+	struct drm_panfrost_mmap_bo *args = data;
+	struct drm_gem_object *gem_obj;
+	int ret;
+
+	if (args->flags != 0) {
+		DRM_INFO("unknown mmap_bo flags: %d\n", args->flags);
+		return -EINVAL;
+	}
+
+	gem_obj = drm_gem_object_lookup(file_priv, args->handle);
+	if (!gem_obj) {
+		DRM_DEBUG("Failed to look up GEM BO %d\n", args->handle);
+		return -ENOENT;
+	}
+
+	ret = drm_gem_create_mmap_offset(gem_obj);
+	if (ret == 0)
+		args->offset = drm_vma_node_offset_addr(&gem_obj->vma_node);
+	drm_gem_object_put_unlocked(gem_obj);
+
+	return ret;
+}
+
+static int panfrost_ioctl_get_bo_offset(struct drm_device *dev, void *data,
+			    struct drm_file *file_priv)
+{
+	struct drm_panfrost_get_bo_offset *args = data;
+	struct drm_gem_object *gem_obj;
+	struct panfrost_gem_object *bo;
+
+	gem_obj = drm_gem_object_lookup(file_priv, args->handle);
+	if (!gem_obj) {
+		DRM_DEBUG("Failed to look up GEM BO %d\n", args->handle);
+		return -ENOENT;
+	}
+	bo = to_panfrost_bo(gem_obj);
+
+	args->offset = bo->node.start << PAGE_SHIFT;
+
+	drm_gem_object_put_unlocked(gem_obj);
+	return 0;
+}
+
+static int
+panfrost_open(struct drm_device *dev, struct drm_file *file)
+{
+	struct panfrost_device *pfdev = dev->dev_private;
+	struct panfrost_file_priv *panfrost_priv;
+
+	panfrost_priv = kzalloc(sizeof(*panfrost_priv), GFP_KERNEL);
+	if (!panfrost_priv)
+		return -ENOMEM;
+
+	panfrost_priv->pfdev = pfdev;
+	file->driver_priv = panfrost_priv;
+
+	panfrost_job_open(panfrost_priv);
+
+	return 0;
+}
+
+static void
+panfrost_postclose(struct drm_device *dev, struct drm_file *file)
+{
+	struct panfrost_file_priv *panfrost_priv = file->driver_priv;
+
+	panfrost_job_close(panfrost_priv);
+
+	kfree(panfrost_priv);
+}
+
+/* DRM_AUTH is required on SUBMIT for now, while all clients share a single
+ * address space.  Note that render nodes would be able to submit jobs that
+ * could access BOs from clients authenticated with the master node.
+ */
+static const struct drm_ioctl_desc panfrost_drm_driver_ioctls[] = {
+#define PANFROST_IOCTL(n, func, flags) \
+	DRM_IOCTL_DEF_DRV(PANFROST_##n, panfrost_ioctl_##func, flags)
+
+	PANFROST_IOCTL(SUBMIT,		submit,		DRM_RENDER_ALLOW | DRM_AUTH),
+	PANFROST_IOCTL(WAIT_BO,		wait_bo,	DRM_RENDER_ALLOW),
+	PANFROST_IOCTL(CREATE_BO,	create_bo,	DRM_RENDER_ALLOW),
+	PANFROST_IOCTL(MMAP_BO,		mmap_bo,	DRM_RENDER_ALLOW),
+	PANFROST_IOCTL(GET_PARAM,	get_param,	DRM_RENDER_ALLOW),
+	PANFROST_IOCTL(GET_BO_OFFSET,	get_bo_offset,	DRM_RENDER_ALLOW),
+};
+
+DEFINE_DRM_GEM_SHMEM_FOPS(panfrost_drm_driver_fops);
+
+static struct drm_driver panfrost_drm_driver = {
+	.driver_features	= DRIVER_RENDER | DRIVER_GEM | DRIVER_PRIME | DRIVER_SYNCOBJ,
+	.open			= panfrost_open,
+	.postclose		= panfrost_postclose,
+	.ioctls			= panfrost_drm_driver_ioctls,
+	.num_ioctls		= ARRAY_SIZE(panfrost_drm_driver_ioctls),
+	.fops			= &panfrost_drm_driver_fops,
+	.name			= "panfrost",
+	.desc			= "panfrost DRM",
+	.date			= "20180908",
+	.major			= 1,
+	.minor			= 0,
+
+	.gem_create_object	= panfrost_gem_create_object,
+	.prime_handle_to_fd	= drm_gem_prime_handle_to_fd,
+	.prime_fd_to_handle	= drm_gem_prime_fd_to_handle,
+	.gem_prime_import_sg_table = panfrost_gem_prime_import_sg_table,
+	.gem_prime_mmap		= drm_gem_prime_mmap,
+};
+
+static int panfrost_pdev_probe(struct platform_device *pdev)
+{
+	struct panfrost_device *pfdev;
+	struct drm_device *ddev;
+	int err;
+
+	pfdev = devm_kzalloc(&pdev->dev, sizeof(*pfdev), GFP_KERNEL);
+	if (!pfdev)
+		return -ENOMEM;
+
+	pfdev->pdev = pdev;
+	pfdev->dev = &pdev->dev;
+
+	platform_set_drvdata(pdev, pfdev);
+
+	/* Allocate and initialize the DRM device. */
+	ddev = drm_dev_alloc(&panfrost_drm_driver, &pdev->dev);
+	if (IS_ERR(ddev))
+		return PTR_ERR(ddev);
+
+	ddev->dev_private = pfdev;
+	pfdev->ddev = ddev;
+
+	spin_lock_init(&pfdev->mm_lock);
+	drm_mm_init(&pfdev->mm, 0, SZ_4G); /* 4G enough for now; can be 48-bit */
+
+	err = panfrost_device_init(pfdev);
+	if (err) {
+		dev_err(&pdev->dev, "Fatal error during GPU init\n");
+		goto err_out0;
+	}
+
+	/*
+	 * Register the DRM device with the core and the connectors with
+	 * sysfs
+	 */
+	err = drm_dev_register(ddev, 0);
+	if (err < 0)
+		goto err_out1;
+
+	return 0;
+
+err_out1:
+	panfrost_device_fini(pfdev);
+err_out0:
+	drm_dev_put(ddev);
+	return err;
+}
+
+static int panfrost_pdev_remove(struct platform_device *pdev)
+{
+	struct panfrost_device *pfdev = platform_get_drvdata(pdev);
+	struct drm_device *ddev = pfdev->ddev;
+
+	drm_dev_unregister(ddev);
+	panfrost_device_fini(pfdev);
+	drm_dev_put(ddev);
+	return 0;
+}
+
+static const struct of_device_id dt_match[] = {
+	{ .compatible = "arm,mali-t760" },
+	{ .compatible = "arm,mali-t860" },
+	{}
+};
+MODULE_DEVICE_TABLE(of, dt_match);
+
+static struct platform_driver panfrost_platform_driver = {
+	.probe		= panfrost_pdev_probe,
+	.remove		= panfrost_pdev_remove,
+	.driver		= {
+		.name	= "panfrost",
+		.of_match_table = dt_match,
+	},
+};
+
+static int __init panfrost_init(void)
+{
+	return platform_driver_register(&panfrost_platform_driver);
+}
+module_init(panfrost_init);
+
+static void __exit panfrost_exit(void)
+{
+	platform_driver_unregister(&panfrost_platform_driver);
+}
+module_exit(panfrost_exit);
+
+MODULE_AUTHOR("Panfrost Project Developers");
+MODULE_DESCRIPTION("Panfrost DRM Driver");
+MODULE_LICENSE("GPL v2");
diff --git a/drivers/gpu/drm/panfrost/panfrost_features.h b/drivers/gpu/drm/panfrost/panfrost_features.h
new file mode 100644
index 000000000000..0f1358ee1a4e
--- /dev/null
+++ b/drivers/gpu/drm/panfrost/panfrost_features.h
@@ -0,0 +1,308 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+/* (C) COPYRIGHT 2014-2018 ARM Limited. All rights reserved. */
+/* Copyright 2019 Linaro, Ltd., Rob Herring <robh@kernel.org> */
+#ifndef __PANFROST_FEATURES_H__
+#define __PANFROST_FEATURES_H__
+
+#include <linux/bitops.h>
+
+#include "panfrost_device.h"
+
+enum base_hw_feature {
+	HW_FEATURE_JOBCHAIN_DISAMBIGUATION,
+	HW_FEATURE_PWRON_DURING_PWROFF_TRANS,
+	HW_FEATURE_XAFFINITY,
+	HW_FEATURE_OUT_OF_ORDER_EXEC,
+	HW_FEATURE_MRT,
+	HW_FEATURE_BRNDOUT_CC,
+	HW_FEATURE_INTERPIPE_REG_ALIASING,
+	HW_FEATURE_LD_ST_TILEBUFFER,
+	HW_FEATURE_MSAA_16X,
+	HW_FEATURE_32_BIT_UNIFORM_ADDRESS,
+	HW_FEATURE_ATTR_AUTO_TYPE_INFERRAL,
+	HW_FEATURE_OPTIMIZED_COVERAGE_MASK,
+	HW_FEATURE_T7XX_PAIRING_RULES,
+	HW_FEATURE_LD_ST_LEA_TEX,
+	HW_FEATURE_LINEAR_FILTER_FLOAT,
+	HW_FEATURE_WORKGROUP_ROUND_MULTIPLE_OF_4,
+	HW_FEATURE_IMAGES_IN_FRAGMENT_SHADERS,
+	HW_FEATURE_TEST4_DATUM_MODE,
+	HW_FEATURE_NEXT_INSTRUCTION_TYPE,
+	HW_FEATURE_BRNDOUT_KILL,
+	HW_FEATURE_WARPING,
+	HW_FEATURE_V4,
+	HW_FEATURE_FLUSH_REDUCTION,
+	HW_FEATURE_PROTECTED_MODE,
+	HW_FEATURE_COHERENCY_REG,
+	HW_FEATURE_PROTECTED_DEBUG_MODE,
+	HW_FEATURE_AARCH64_MMU,
+	HW_FEATURE_TLS_HASHING,
+	HW_FEATURE_THREAD_GROUP_SPLIT,
+	HW_FEATURE_3BIT_EXT_RW_L2_MMU_CONFIG,
+};
+
+#define hw_features_t600 (\
+	BIT_ULL(HW_FEATURE_LD_ST_LEA_TEX) | \
+	BIT_ULL(HW_FEATURE_LINEAR_FILTER_FLOAT) | \
+	BIT_ULL(HW_FEATURE_THREAD_GROUP_SPLIT) | \
+	BIT_ULL(HW_FEATURE_V4))
+
+#define hw_features_t620 (\
+	BIT_ULL(HW_FEATURE_LD_ST_LEA_TEX) | \
+	BIT_ULL(HW_FEATURE_LINEAR_FILTER_FLOAT) | \
+	BIT_ULL(HW_FEATURE_ATTR_AUTO_TYPE_INFERRAL) | \
+	BIT_ULL(HW_FEATURE_THREAD_GROUP_SPLIT) | \
+	BIT_ULL(HW_FEATURE_V4))
+
+#define hw_features_t720 (\
+	BIT_ULL(HW_FEATURE_32_BIT_UNIFORM_ADDRESS) | \
+	BIT_ULL(HW_FEATURE_ATTR_AUTO_TYPE_INFERRAL) | \
+	BIT_ULL(HW_FEATURE_INTERPIPE_REG_ALIASING) | \
+	BIT_ULL(HW_FEATURE_OPTIMIZED_COVERAGE_MASK) | \
+	BIT_ULL(HW_FEATURE_T7XX_PAIRING_RULES) | \
+	BIT_ULL(HW_FEATURE_THREAD_GROUP_SPLIT) | \
+	BIT_ULL(HW_FEATURE_WORKGROUP_ROUND_MULTIPLE_OF_4) | \
+	BIT_ULL(HW_FEATURE_WARPING) | \
+	BIT_ULL(HW_FEATURE_V4))
+
+
+#define hw_features_t760 (\
+	BIT_ULL(HW_FEATURE_JOBCHAIN_DISAMBIGUATION) | \
+	BIT_ULL(HW_FEATURE_PWRON_DURING_PWROFF_TRANS) | \
+	BIT_ULL(HW_FEATURE_XAFFINITY) | \
+	BIT_ULL(HW_FEATURE_32_BIT_UNIFORM_ADDRESS) | \
+	BIT_ULL(HW_FEATURE_ATTR_AUTO_TYPE_INFERRAL) | \
+	BIT_ULL(HW_FEATURE_BRNDOUT_CC) | \
+	BIT_ULL(HW_FEATURE_LD_ST_LEA_TEX) | \
+	BIT_ULL(HW_FEATURE_LD_ST_TILEBUFFER) | \
+	BIT_ULL(HW_FEATURE_LINEAR_FILTER_FLOAT) | \
+	BIT_ULL(HW_FEATURE_MRT) | \
+	BIT_ULL(HW_FEATURE_MSAA_16X) | \
+	BIT_ULL(HW_FEATURE_OUT_OF_ORDER_EXEC) | \
+	BIT_ULL(HW_FEATURE_T7XX_PAIRING_RULES) | \
+	BIT_ULL(HW_FEATURE_TEST4_DATUM_MODE) | \
+	BIT_ULL(HW_FEATURE_THREAD_GROUP_SPLIT))
+
+// T860
+#define hw_features_t860 (\
+	BIT_ULL(HW_FEATURE_JOBCHAIN_DISAMBIGUATION) | \
+	BIT_ULL(HW_FEATURE_PWRON_DURING_PWROFF_TRANS) | \
+	BIT_ULL(HW_FEATURE_XAFFINITY) | \
+	BIT_ULL(HW_FEATURE_32_BIT_UNIFORM_ADDRESS) | \
+	BIT_ULL(HW_FEATURE_ATTR_AUTO_TYPE_INFERRAL) | \
+	BIT_ULL(HW_FEATURE_BRNDOUT_CC) | \
+	BIT_ULL(HW_FEATURE_BRNDOUT_KILL) | \
+	BIT_ULL(HW_FEATURE_LD_ST_LEA_TEX) | \
+	BIT_ULL(HW_FEATURE_LD_ST_TILEBUFFER) | \
+	BIT_ULL(HW_FEATURE_LINEAR_FILTER_FLOAT) | \
+	BIT_ULL(HW_FEATURE_MRT) | \
+	BIT_ULL(HW_FEATURE_MSAA_16X) | \
+	BIT_ULL(HW_FEATURE_NEXT_INSTRUCTION_TYPE) | \
+	BIT_ULL(HW_FEATURE_OUT_OF_ORDER_EXEC) | \
+	BIT_ULL(HW_FEATURE_T7XX_PAIRING_RULES) | \
+	BIT_ULL(HW_FEATURE_TEST4_DATUM_MODE) | \
+	BIT_ULL(HW_FEATURE_THREAD_GROUP_SPLIT))
+
+#define hw_features_t880 hw_features_t860
+
+#define hw_features_t830 (\
+	BIT_ULL(HW_FEATURE_JOBCHAIN_DISAMBIGUATION) | \
+	BIT_ULL(HW_FEATURE_PWRON_DURING_PWROFF_TRANS) | \
+	BIT_ULL(HW_FEATURE_XAFFINITY) | \
+	BIT_ULL(HW_FEATURE_WARPING) | \
+	BIT_ULL(HW_FEATURE_INTERPIPE_REG_ALIASING) | \
+	BIT_ULL(HW_FEATURE_32_BIT_UNIFORM_ADDRESS) | \
+	BIT_ULL(HW_FEATURE_ATTR_AUTO_TYPE_INFERRAL) | \
+	BIT_ULL(HW_FEATURE_BRNDOUT_CC) | \
+	BIT_ULL(HW_FEATURE_BRNDOUT_KILL) | \
+	BIT_ULL(HW_FEATURE_LD_ST_LEA_TEX) | \
+	BIT_ULL(HW_FEATURE_LD_ST_TILEBUFFER) | \
+	BIT_ULL(HW_FEATURE_LINEAR_FILTER_FLOAT) | \
+	BIT_ULL(HW_FEATURE_MRT) | \
+	BIT_ULL(HW_FEATURE_NEXT_INSTRUCTION_TYPE) | \
+	BIT_ULL(HW_FEATURE_OUT_OF_ORDER_EXEC) | \
+	BIT_ULL(HW_FEATURE_T7XX_PAIRING_RULES) | \
+	BIT_ULL(HW_FEATURE_TEST4_DATUM_MODE) | \
+	BIT_ULL(HW_FEATURE_THREAD_GROUP_SPLIT))
+
+#define hw_features_t820 (\
+	BIT_ULL(HW_FEATURE_JOBCHAIN_DISAMBIGUATION) | \
+	BIT_ULL(HW_FEATURE_PWRON_DURING_PWROFF_TRANS) | \
+	BIT_ULL(HW_FEATURE_XAFFINITY) | \
+	BIT_ULL(HW_FEATURE_WARPING) | \
+	BIT_ULL(HW_FEATURE_INTERPIPE_REG_ALIASING) | \
+	BIT_ULL(HW_FEATURE_32_BIT_UNIFORM_ADDRESS) | \
+	BIT_ULL(HW_FEATURE_ATTR_AUTO_TYPE_INFERRAL) | \
+	BIT_ULL(HW_FEATURE_BRNDOUT_CC) | \
+	BIT_ULL(HW_FEATURE_BRNDOUT_KILL) | \
+	BIT_ULL(HW_FEATURE_LD_ST_LEA_TEX) | \
+	BIT_ULL(HW_FEATURE_LD_ST_TILEBUFFER) | \
+	BIT_ULL(HW_FEATURE_LINEAR_FILTER_FLOAT) | \
+	BIT_ULL(HW_FEATURE_MRT) | \
+	BIT_ULL(HW_FEATURE_NEXT_INSTRUCTION_TYPE) | \
+	BIT_ULL(HW_FEATURE_OUT_OF_ORDER_EXEC) | \
+	BIT_ULL(HW_FEATURE_T7XX_PAIRING_RULES) | \
+	BIT_ULL(HW_FEATURE_TEST4_DATUM_MODE) | \
+	BIT_ULL(HW_FEATURE_THREAD_GROUP_SPLIT))
+
+#define hw_features_g71 (\
+	BIT_ULL(HW_FEATURE_JOBCHAIN_DISAMBIGUATION) | \
+	BIT_ULL(HW_FEATURE_PWRON_DURING_PWROFF_TRANS) | \
+	BIT_ULL(HW_FEATURE_XAFFINITY) | \
+	BIT_ULL(HW_FEATURE_WARPING) | \
+	BIT_ULL(HW_FEATURE_INTERPIPE_REG_ALIASING) | \
+	BIT_ULL(HW_FEATURE_32_BIT_UNIFORM_ADDRESS) | \
+	BIT_ULL(HW_FEATURE_ATTR_AUTO_TYPE_INFERRAL) | \
+	BIT_ULL(HW_FEATURE_BRNDOUT_CC) | \
+	BIT_ULL(HW_FEATURE_BRNDOUT_KILL) | \
+	BIT_ULL(HW_FEATURE_LD_ST_LEA_TEX) | \
+	BIT_ULL(HW_FEATURE_LD_ST_TILEBUFFER) | \
+	BIT_ULL(HW_FEATURE_LINEAR_FILTER_FLOAT) | \
+	BIT_ULL(HW_FEATURE_MRT) | \
+	BIT_ULL(HW_FEATURE_MSAA_16X) | \
+	BIT_ULL(HW_FEATURE_NEXT_INSTRUCTION_TYPE) | \
+	BIT_ULL(HW_FEATURE_OUT_OF_ORDER_EXEC) | \
+	BIT_ULL(HW_FEATURE_T7XX_PAIRING_RULES) | \
+	BIT_ULL(HW_FEATURE_TEST4_DATUM_MODE) | \
+	BIT_ULL(HW_FEATURE_THREAD_GROUP_SPLIT) | \
+	BIT_ULL(HW_FEATURE_FLUSH_REDUCTION) | \
+	BIT_ULL(HW_FEATURE_PROTECTED_MODE) | \
+	BIT_ULL(HW_FEATURE_COHERENCY_REG))
+
+#define hw_features_g72 (\
+	BIT_ULL(HW_FEATURE_JOBCHAIN_DISAMBIGUATION) | \
+	BIT_ULL(HW_FEATURE_PWRON_DURING_PWROFF_TRANS) | \
+	BIT_ULL(HW_FEATURE_XAFFINITY) | \
+	BIT_ULL(HW_FEATURE_WARPING) | \
+	BIT_ULL(HW_FEATURE_INTERPIPE_REG_ALIASING) | \
+	BIT_ULL(HW_FEATURE_32_BIT_UNIFORM_ADDRESS) | \
+	BIT_ULL(HW_FEATURE_ATTR_AUTO_TYPE_INFERRAL) | \
+	BIT_ULL(HW_FEATURE_BRNDOUT_CC) | \
+	BIT_ULL(HW_FEATURE_BRNDOUT_KILL) | \
+	BIT_ULL(HW_FEATURE_LD_ST_LEA_TEX) | \
+	BIT_ULL(HW_FEATURE_LD_ST_TILEBUFFER) | \
+	BIT_ULL(HW_FEATURE_LINEAR_FILTER_FLOAT) | \
+	BIT_ULL(HW_FEATURE_MRT) | \
+	BIT_ULL(HW_FEATURE_MSAA_16X) | \
+	BIT_ULL(HW_FEATURE_NEXT_INSTRUCTION_TYPE) | \
+	BIT_ULL(HW_FEATURE_OUT_OF_ORDER_EXEC) | \
+	BIT_ULL(HW_FEATURE_T7XX_PAIRING_RULES) | \
+	BIT_ULL(HW_FEATURE_TEST4_DATUM_MODE) | \
+	BIT_ULL(HW_FEATURE_THREAD_GROUP_SPLIT) | \
+	BIT_ULL(HW_FEATURE_FLUSH_REDUCTION) | \
+	BIT_ULL(HW_FEATURE_PROTECTED_MODE) | \
+	BIT_ULL(HW_FEATURE_PROTECTED_DEBUG_MODE) | \
+	BIT_ULL(HW_FEATURE_COHERENCY_REG))
+
+#define hw_features_g51 (\
+	BIT_ULL(HW_FEATURE_JOBCHAIN_DISAMBIGUATION) | \
+	BIT_ULL(HW_FEATURE_PWRON_DURING_PWROFF_TRANS) | \
+	BIT_ULL(HW_FEATURE_XAFFINITY) | \
+	BIT_ULL(HW_FEATURE_WARPING) | \
+	BIT_ULL(HW_FEATURE_INTERPIPE_REG_ALIASING) | \
+	BIT_ULL(HW_FEATURE_32_BIT_UNIFORM_ADDRESS) | \
+	BIT_ULL(HW_FEATURE_ATTR_AUTO_TYPE_INFERRAL) | \
+	BIT_ULL(HW_FEATURE_BRNDOUT_CC) | \
+	BIT_ULL(HW_FEATURE_BRNDOUT_KILL) | \
+	BIT_ULL(HW_FEATURE_LD_ST_LEA_TEX) | \
+	BIT_ULL(HW_FEATURE_LD_ST_TILEBUFFER) | \
+	BIT_ULL(HW_FEATURE_LINEAR_FILTER_FLOAT) | \
+	BIT_ULL(HW_FEATURE_MRT) | \
+	BIT_ULL(HW_FEATURE_MSAA_16X) | \
+	BIT_ULL(HW_FEATURE_NEXT_INSTRUCTION_TYPE) | \
+	BIT_ULL(HW_FEATURE_OUT_OF_ORDER_EXEC) | \
+	BIT_ULL(HW_FEATURE_T7XX_PAIRING_RULES) | \
+	BIT_ULL(HW_FEATURE_TEST4_DATUM_MODE) | \
+	BIT_ULL(HW_FEATURE_THREAD_GROUP_SPLIT) | \
+	BIT_ULL(HW_FEATURE_FLUSH_REDUCTION) | \
+	BIT_ULL(HW_FEATURE_PROTECTED_MODE) | \
+	BIT_ULL(HW_FEATURE_PROTECTED_DEBUG_MODE) | \
+	BIT_ULL(HW_FEATURE_COHERENCY_REG))
+
+#define hw_features_g52 (\
+	BIT_ULL(HW_FEATURE_JOBCHAIN_DISAMBIGUATION) | \
+	BIT_ULL(HW_FEATURE_PWRON_DURING_PWROFF_TRANS) | \
+	BIT_ULL(HW_FEATURE_XAFFINITY) | \
+	BIT_ULL(HW_FEATURE_WARPING) | \
+	BIT_ULL(HW_FEATURE_INTERPIPE_REG_ALIASING) | \
+	BIT_ULL(HW_FEATURE_32_BIT_UNIFORM_ADDRESS) | \
+	BIT_ULL(HW_FEATURE_ATTR_AUTO_TYPE_INFERRAL) | \
+	BIT_ULL(HW_FEATURE_BRNDOUT_CC) | \
+	BIT_ULL(HW_FEATURE_BRNDOUT_KILL) | \
+	BIT_ULL(HW_FEATURE_LD_ST_LEA_TEX) | \
+	BIT_ULL(HW_FEATURE_LD_ST_TILEBUFFER) | \
+	BIT_ULL(HW_FEATURE_LINEAR_FILTER_FLOAT) | \
+	BIT_ULL(HW_FEATURE_MRT) | \
+	BIT_ULL(HW_FEATURE_MSAA_16X) | \
+	BIT_ULL(HW_FEATURE_NEXT_INSTRUCTION_TYPE) | \
+	BIT_ULL(HW_FEATURE_OUT_OF_ORDER_EXEC) | \
+	BIT_ULL(HW_FEATURE_T7XX_PAIRING_RULES) | \
+	BIT_ULL(HW_FEATURE_TEST4_DATUM_MODE) | \
+	BIT_ULL(HW_FEATURE_THREAD_GROUP_SPLIT) | \
+	BIT_ULL(HW_FEATURE_FLUSH_REDUCTION) | \
+	BIT_ULL(HW_FEATURE_PROTECTED_MODE) | \
+	BIT_ULL(HW_FEATURE_PROTECTED_DEBUG_MODE) | \
+	BIT_ULL(HW_FEATURE_COHERENCY_REG))
+
+#define hw_features_g76 (\
+	BIT_ULL(HW_FEATURE_JOBCHAIN_DISAMBIGUATION) | \
+	BIT_ULL(HW_FEATURE_PWRON_DURING_PWROFF_TRANS) | \
+	BIT_ULL(HW_FEATURE_XAFFINITY) | \
+	BIT_ULL(HW_FEATURE_WARPING) | \
+	BIT_ULL(HW_FEATURE_INTERPIPE_REG_ALIASING) | \
+	BIT_ULL(HW_FEATURE_32_BIT_UNIFORM_ADDRESS) | \
+	BIT_ULL(HW_FEATURE_ATTR_AUTO_TYPE_INFERRAL) | \
+	BIT_ULL(HW_FEATURE_BRNDOUT_CC) | \
+	BIT_ULL(HW_FEATURE_BRNDOUT_KILL) | \
+	BIT_ULL(HW_FEATURE_LD_ST_LEA_TEX) | \
+	BIT_ULL(HW_FEATURE_LD_ST_TILEBUFFER) | \
+	BIT_ULL(HW_FEATURE_LINEAR_FILTER_FLOAT) | \
+	BIT_ULL(HW_FEATURE_MRT) | \
+	BIT_ULL(HW_FEATURE_MSAA_16X) | \
+	BIT_ULL(HW_FEATURE_NEXT_INSTRUCTION_TYPE) | \
+	BIT_ULL(HW_FEATURE_OUT_OF_ORDER_EXEC) | \
+	BIT_ULL(HW_FEATURE_T7XX_PAIRING_RULES) | \
+	BIT_ULL(HW_FEATURE_TEST4_DATUM_MODE) | \
+	BIT_ULL(HW_FEATURE_THREAD_GROUP_SPLIT) | \
+	BIT_ULL(HW_FEATURE_FLUSH_REDUCTION) | \
+	BIT_ULL(HW_FEATURE_PROTECTED_MODE) | \
+	BIT_ULL(HW_FEATURE_PROTECTED_DEBUG_MODE) | \
+	BIT_ULL(HW_FEATURE_COHERENCY_REG) | \
+	BIT_ULL(HW_FEATURE_AARCH64_MMU) | \
+	BIT_ULL(HW_FEATURE_TLS_HASHING) | \
+	BIT_ULL(HW_FEATURE_3BIT_EXT_RW_L2_MMU_CONFIG))
+
+#define hw_features_g31 (\
+	BIT_ULL(HW_FEATURE_JOBCHAIN_DISAMBIGUATION) | \
+	BIT_ULL(HW_FEATURE_PWRON_DURING_PWROFF_TRANS) | \
+	BIT_ULL(HW_FEATURE_XAFFINITY) | \
+	BIT_ULL(HW_FEATURE_WARPING) | \
+	BIT_ULL(HW_FEATURE_INTERPIPE_REG_ALIASING) | \
+	BIT_ULL(HW_FEATURE_32_BIT_UNIFORM_ADDRESS) | \
+	BIT_ULL(HW_FEATURE_ATTR_AUTO_TYPE_INFERRAL) | \
+	BIT_ULL(HW_FEATURE_BRNDOUT_CC) | \
+	BIT_ULL(HW_FEATURE_BRNDOUT_KILL) | \
+	BIT_ULL(HW_FEATURE_LD_ST_LEA_TEX) | \
+	BIT_ULL(HW_FEATURE_LD_ST_TILEBUFFER) | \
+	BIT_ULL(HW_FEATURE_LINEAR_FILTER_FLOAT) | \
+	BIT_ULL(HW_FEATURE_MRT) | \
+	BIT_ULL(HW_FEATURE_MSAA_16X) | \
+	BIT_ULL(HW_FEATURE_NEXT_INSTRUCTION_TYPE) | \
+	BIT_ULL(HW_FEATURE_OUT_OF_ORDER_EXEC) | \
+	BIT_ULL(HW_FEATURE_T7XX_PAIRING_RULES) | \
+	BIT_ULL(HW_FEATURE_TEST4_DATUM_MODE) | \
+	BIT_ULL(HW_FEATURE_THREAD_GROUP_SPLIT) | \
+	BIT_ULL(HW_FEATURE_FLUSH_REDUCTION) | \
+	BIT_ULL(HW_FEATURE_PROTECTED_MODE) | \
+	BIT_ULL(HW_FEATURE_PROTECTED_DEBUG_MODE) | \
+	BIT_ULL(HW_FEATURE_COHERENCY_REG) | \
+	BIT_ULL(HW_FEATURE_AARCH64_MMU) | \
+	BIT_ULL(HW_FEATURE_TLS_HASHING) | \
+	BIT_ULL(HW_FEATURE_3BIT_EXT_RW_L2_MMU_CONFIG))
+
+static inline bool panfrost_has_hw_feature(struct panfrost_device *pfdev, enum base_hw_feature feat)
+{
+	return test_bit(feat, pfdev->features.base_hw_features);
+}
+
+#endif
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c
new file mode 100644
index 000000000000..31f13f49277a
--- /dev/null
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
@@ -0,0 +1,92 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright 2019 Linaro, Ltd, Rob Herring <robh@kernel.org> */
+
+#include <linux/err.h>
+#include <linux/slab.h>
+#include <linux/dma-buf.h>
+#include <linux/dma-mapping.h>
+
+#include <drm/panfrost_drm.h>
+#include "panfrost_device.h"
+#include "panfrost_gem.h"
+#include "panfrost_mmu.h"
+
+/* Called by the DRM core on the last userspace/kernel unreference
+ * of the BO.
+ */
+void panfrost_gem_free_object(struct drm_gem_object *obj)
+{
+	struct panfrost_gem_object *bo = to_panfrost_bo(obj);
+	struct panfrost_device *pfdev = obj->dev->dev_private;
+
+	panfrost_mmu_unmap(bo);
+
+	spin_lock(&pfdev->mm_lock);
+	drm_mm_remove_node(&bo->node);
+	spin_unlock(&pfdev->mm_lock);
+
+	drm_gem_shmem_free_object(obj);
+}
+
+static const struct drm_gem_object_funcs panfrost_gem_funcs = {
+	.free = panfrost_gem_free_object,
+	.print_info = drm_gem_shmem_print_info,
+	.pin = drm_gem_shmem_pin,
+	.unpin = drm_gem_shmem_unpin,
+	.get_sg_table = drm_gem_shmem_get_sg_table,
+	.vmap = drm_gem_shmem_vmap,
+	.vunmap = drm_gem_shmem_vunmap,
+	.vm_ops = &drm_gem_shmem_vm_ops,
+};
+
+/**
+ * panfrost_gem_create_object - Implementation of driver->gem_create_object.
+ * @dev: DRM device
+ * @size: Size in bytes of the memory the object will reference
+ *
+ * This lets the GEM helpers allocate object structs for us, and keep
+ * our BO stats correct.
+ */
+struct drm_gem_object *panfrost_gem_create_object(struct drm_device *dev, size_t size)
+{
+	int ret;
+	struct panfrost_device *pfdev = dev->dev_private;
+	struct panfrost_gem_object *obj;
+
+	obj = kzalloc(sizeof(*obj), GFP_KERNEL);
+	if (!obj)
+		return NULL;
+
+	obj->base.base.funcs = &panfrost_gem_funcs;
+
+	spin_lock(&pfdev->mm_lock);
+	ret = drm_mm_insert_node(&pfdev->mm, &obj->node,
+				 roundup(size, PAGE_SIZE) >> PAGE_SHIFT);
+	spin_unlock(&pfdev->mm_lock);
+	if (ret)
+		goto free_obj;
+
+	return &obj->base.base;
+
+free_obj:
+	kfree(obj);
+	return ERR_PTR(ret);
+}
+
+struct drm_gem_object *
+panfrost_gem_prime_import_sg_table(struct drm_device *dev,
+				   struct dma_buf_attachment *attach,
+				   struct sg_table *sgt)
+{
+	struct drm_gem_object *obj;
+	struct panfrost_gem_object *pobj;
+
+	obj = drm_gem_shmem_prime_import_sg_table(dev, attach, sgt);
+	if (IS_ERR(obj))
+		return obj;
+
+	pobj = to_panfrost_bo(obj);
+
+	obj->resv = attach->dmabuf->resv;
+
+	panfrost_mmu_map(pobj);
+
+	return obj;
+}
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.h b/drivers/gpu/drm/panfrost/panfrost_gem.h
new file mode 100644
index 000000000000..045000eb5fcf
--- /dev/null
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.h
@@ -0,0 +1,29 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright 2019 Linaro, Ltd, Rob Herring <robh@kernel.org> */
+
+#ifndef __PANFROST_GEM_H__
+#define __PANFROST_GEM_H__
+
+#include <drm/drm_gem_shmem_helper.h>
+#include <drm/drm_mm.h>
+
+struct panfrost_gem_object {
+	struct drm_gem_shmem_object base;
+
+	struct drm_mm_node node;
+};
+
+static inline
struct panfrost_gem_object *to_panfrost_bo(struct drm_gem_object *obj)
+{
+	return container_of(to_drm_gem_shmem_obj(obj), struct panfrost_gem_object, base);
+}
+
+struct drm_gem_object *panfrost_gem_create_object(struct drm_device *dev, size_t size);
+
+struct drm_gem_object *
+panfrost_gem_prime_import_sg_table(struct drm_device *dev,
+				   struct dma_buf_attachment *attach,
+				   struct sg_table *sgt);
+
+#endif /* __PANFROST_GEM_H__ */
diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c b/drivers/gpu/drm/panfrost/panfrost_gpu.c
new file mode 100644
index 000000000000..b71faa583145
--- /dev/null
+++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c
@@ -0,0 +1,464 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright 2018 Marty E. Plummer <hanetzer@startmail.com> */
+/* Copyright 2019 Linaro, Ltd., Rob Herring <robh@kernel.org> */
+/* Copyright 2019 Collabora ltd. */
+
+#include <linux/bitmap.h>
+#include <linux/delay.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/platform_device.h>
+
+#include "panfrost_device.h"
+#include "panfrost_features.h"
+#include "panfrost_issues.h"
+#include "panfrost_gpu.h"
+
+#define GPU_ID				0x00
+#define GPU_L2_FEATURES			0x004	/* (RO) Level 2 cache features */
+#define GPU_CORE_FEATURES		0x008	/* (RO) Shader Core Features */
+#define GPU_TILER_FEATURES		0x00C	/* (RO) Tiler Features */
+#define GPU_MEM_FEATURES		0x010	/* (RO) Memory system features */
+#define   GROUPS_L2_COHERENT		BIT(0)	/* Cores groups are l2 coherent */
+
+#define GPU_MMU_FEATURES		0x014	/* (RO) MMU features */
+#define GPU_AS_PRESENT			0x018	/* (RO) Address space slots present */
+#define GPU_JS_PRESENT			0x01C	/* (RO) Job slots present */
+
+
+#define GPU_INT_RAWSTAT			0x20
+#define GPU_INT_CLEAR			0x24
+#define GPU_INT_MASK			0x28
+#define GPU_INT_STAT			0x2c
+#define   GPU_IRQ_FAULT			BIT(0)
+#define   GPU_IRQ_MULTIPLE_FAULT	BIT(7)
+#define   GPU_IRQ_RESET_COMPLETED	BIT(8)
+#define   GPU_IRQ_POWER_CHANGED		BIT(9)
+#define   GPU_IRQ_POWER_CHANGED_ALL	BIT(10)
+#define   GPU_IRQ_PERFCNT_SAMPLE_COMPLETED BIT(16)
+#define   GPU_IRQ_CLEAN_CACHES_COMPLETED BIT(17)
+#define   GPU_IRQ_MASK_ALL			 \
+	  (					 \
+	   GPU_IRQ_FAULT			|\
+	   GPU_IRQ_MULTIPLE_FAULT		|\
+	   GPU_IRQ_RESET_COMPLETED		|\
+	   GPU_IRQ_POWER_CHANGED		|\
+	   GPU_IRQ_POWER_CHANGED_ALL		|\
+	   GPU_IRQ_PERFCNT_SAMPLE_COMPLETED	|\
	   GPU_IRQ_CLEAN_CACHES_COMPLETED)
+#define GPU_IRQ_MASK_ERROR	   		\
+	(					\
+	 GPU_IRQ_FAULT				|\
+	 GPU_IRQ_MULTIPLE_FAULT)
+#define GPU_CMD				0x30
+#define   GPU_CMD_SOFT_RESET		0x01
+#define GPU_STATUS			0x34
+#define GPU_LATEST_FLUSH_ID		0x38
+
+
+#define GPU_THREAD_MAX_THREADS		0x0A0	/* (RO) Maximum number of threads per core */
+#define GPU_THREAD_MAX_WORKGROUP_SIZE	0x0A4	/* (RO) Maximum workgroup size */
+#define GPU_THREAD_MAX_BARRIER_SIZE	0x0A8	/* (RO) Maximum threads waiting at a barrier */
+#define GPU_THREAD_FEATURES		0x0AC	/* (RO) Thread features */
+#define GPU_THREAD_TLS_ALLOC		0x310   /* (RO) Number of threads per core that
+						 * TLS must be allocated for */
+
+#define GPU_TEXTURE_FEATURES(n)		(0x0B0 + ((n) * 4))
+#define GPU_JS_FEATURES(n)		(0x0C0 + ((n) * 4))
+
+#define GPU_SHADER_PRESENT_LO		0x100	/* (RO) Shader core present bitmap, low word */
+#define GPU_SHADER_PRESENT_HI		0x104	/* (RO) Shader core present bitmap, high word */
+#define GPU_TILER_PRESENT_LO		0x110	/* (RO) Tiler core present bitmap, low word */
+#define GPU_TILER_PRESENT_HI		0x114	/* (RO) Tiler core present bitmap, high word */
+
+#define GPU_L2_PRESENT_LO		0x120	/* (RO) Level 2 cache present bitmap, low word */
+#define GPU_L2_PRESENT_HI		0x124	/* (RO) Level 2 cache present bitmap, high word */
+
+#define GPU_COHERENCY_FEATURES		0x300	/* (RO) Coherency features present */
+
+#define GPU_STACK_PRESENT_LO		0xE00   /* (RO) Core stack present bitmap, low word */
+#define GPU_STACK_PRESENT_HI		0xE04   /* (RO) Core stack present bitmap, high word */
+
+#define SHADER_READY_LO         0x140	/* (RO) Shader core ready bitmap, low word */
+#define SHADER_READY_HI         0x144	/* (RO) Shader core ready bitmap, high word */
+
+#define TILER_READY_LO          0x150	/* (RO) Tiler core ready bitmap, low word */
+#define TILER_READY_HI          0x154	/* (RO) Tiler core ready bitmap, high word */
+
+#define L2_READY_LO             0x160	/* (RO) Level 2 cache ready bitmap, low word */
+#define L2_READY_HI             0x164	/* (RO) Level 2 cache ready bitmap, high word */
+
+#define STACK_READY_LO          0xE10   /* (RO) Core stack ready bitmap, low word */
+#define STACK_READY_HI          0xE14   /* (RO) Core stack ready bitmap, high word */
+
+
+#define SHADER_PWRON_LO         0x180	/* (WO) Shader core power on bitmap, low word */
+#define SHADER_PWRON_HI         0x184	/* (WO) Shader core power on bitmap, high word */
+
+#define TILER_PWRON_LO          0x190	/* (WO) Tiler core power on bitmap, low word */
+#define TILER_PWRON_HI          0x194	/* (WO) Tiler core power on bitmap, high word */
+
+#define L2_PWRON_LO             0x1A0	/* (WO) Level 2 cache power on bitmap, low word */
+#define L2_PWRON_HI             0x1A4	/* (WO) Level 2 cache power on bitmap, high word */
+
+#define STACK_PWRON_LO          0xE20   /* (RO) Core stack power on bitmap, low word */
+#define STACK_PWRON_HI          0xE24   /* (RO) Core stack power on bitmap, high word */
+
+
+#define SHADER_PWROFF_LO        0x1C0	/* (WO) Shader core power off bitmap, low word */
+#define SHADER_PWROFF_HI        0x1C4	/* (WO) Shader core power off bitmap, high word */
+
+#define TILER_PWROFF_LO         0x1D0	/* (WO) Tiler core power off bitmap, low word */
+#define TILER_PWROFF_HI         0x1D4	/* (WO) Tiler core power off bitmap, high word */
+
+#define L2_PWROFF_LO            0x1E0	/* (WO) Level 2 cache power off bitmap, low word */
+#define L2_PWROFF_HI            0x1E4	/* (WO) Level 2 cache power off bitmap, high word */
+
+#define STACK_PWROFF_LO         0xE30   /* (RO) Core stack power off bitmap, low word */
+#define STACK_PWROFF_HI         0xE34   /* (RO) Core stack power off bitmap, high word */
+
+
+#define SHADER_PWRTRANS_LO      0x200	/* (RO) Shader core power transition bitmap, low word */
+#define SHADER_PWRTRANS_HI      0x204	/* (RO) Shader core power transition bitmap, high word */
+
+#define TILER_PWRTRANS_LO       0x210	/* (RO) Tiler core power transition bitmap, low word */
+#define TILER_PWRTRANS_HI       0x214	/* (RO) Tiler core power transition bitmap, high word */
+
+#define L2_PWRTRANS_LO          0x220	/* (RO) Level 2 cache power transition bitmap, low word */
+#define L2_PWRTRANS_HI          0x224	/* (RO) Level 2 cache power transition bitmap, high word */
+
+#define STACK_PWRTRANS_LO       0xE40   /* (RO) Core stack power transition bitmap, low word */
+#define STACK_PWRTRANS_HI       0xE44   /* (RO) Core stack power transition bitmap, high word */
+
+
+#define SHADER_PWRACTIVE_LO     0x240	/* (RO) Shader core active bitmap, low word */
+#define SHADER_PWRACTIVE_HI     0x244	/* (RO) Shader core active bitmap, high word */
+
+#define TILER_PWRACTIVE_LO      0x250	/* (RO) Tiler core active bitmap, low word */
+#define TILER_PWRACTIVE_HI      0x254	/* (RO) Tiler core active bitmap, high word */
+
+#define L2_PWRACTIVE_LO         0x260	/* (RO) Level 2 cache active bitmap, low word */
+#define L2_PWRACTIVE_HI         0x264	/* (RO) Level 2 cache active bitmap, high word */
+
+#define GPU_JM_CONFIG		0xF00   /* (RW) Job Manager configuration register (Implementation specific register) */
+#define GPU_SHADER_CONFIG	0xF04	/* (RW) Shader core configuration settings (Implementation specific register) */
+#define GPU_TILER_CONFIG	0xF08   /* (RW) Tiler core configuration settings (Implementation specific register) */
+#define GPU_L2_MMU_CONFIG	0xF0C	/* (RW) Configuration of the L2 cache and MMU (Implementation specific register) */
+
+/* L2_MMU_CONFIG register */
+#define L2_MMU_CONFIG_ALLOW_SNOOP_DISPARITY_SHIFT       (23)
+#define L2_MMU_CONFIG_ALLOW_SNOOP_DISPARITY             (0x1 << L2_MMU_CONFIG_ALLOW_SNOOP_DISPARITY_SHIFT)
+#define L2_MMU_CONFIG_LIMIT_EXTERNAL_READS_SHIFT        (24)
+#define L2_MMU_CONFIG_LIMIT_EXTERNAL_READS              (0x3 << L2_MMU_CONFIG_LIMIT_EXTERNAL_READS_SHIFT)
+#define L2_MMU_CONFIG_LIMIT_EXTERNAL_READS_OCTANT       (0x1 << L2_MMU_CONFIG_LIMIT_EXTERNAL_READS_SHIFT)
+#define L2_MMU_CONFIG_LIMIT_EXTERNAL_READS_QUARTER      (0x2 << L2_MMU_CONFIG_LIMIT_EXTERNAL_READS_SHIFT)
+#define L2_MMU_CONFIG_LIMIT_EXTERNAL_READS_HALF         (0x3 << L2_MMU_CONFIG_LIMIT_EXTERNAL_READS_SHIFT)
+
+#define L2_MMU_CONFIG_LIMIT_EXTERNAL_WRITES_SHIFT       (26)
+#define L2_MMU_CONFIG_LIMIT_EXTERNAL_WRITES             (0x3 << L2_MMU_CONFIG_LIMIT_EXTERNAL_WRITES_SHIFT)
+#define L2_MMU_CONFIG_LIMIT_EXTERNAL_WRITES_OCTANT      (0x1 << L2_MMU_CONFIG_LIMIT_EXTERNAL_WRITES_SHIFT)
+#define L2_MMU_CONFIG_LIMIT_EXTERNAL_WRITES_QUARTER     (0x2 << L2_MMU_CONFIG_LIMIT_EXTERNAL_WRITES_SHIFT)
+#define L2_MMU_CONFIG_LIMIT_EXTERNAL_WRITES_HALF        (0x3 << L2_MMU_CONFIG_LIMIT_EXTERNAL_WRITES_SHIFT)
+
+#define L2_MMU_CONFIG_3BIT_LIMIT_EXTERNAL_READS_SHIFT      (12)
+#define L2_MMU_CONFIG_3BIT_LIMIT_EXTERNAL_READS            (0x7 << L2_MMU_CONFIG_3BIT_LIMIT_EXTERNAL_READS_SHIFT)
+
+#define L2_MMU_CONFIG_3BIT_LIMIT_EXTERNAL_WRITES_SHIFT     (15)
+#define L2_MMU_CONFIG_3BIT_LIMIT_EXTERNAL_WRITES           (0x7 << L2_MMU_CONFIG_3BIT_LIMIT_EXTERNAL_WRITES_SHIFT)
+
+/* SHADER_CONFIG register */
+
+#define SC_ALT_COUNTERS             (1ul << 3)
+#define SC_OVERRIDE_FWD_PIXEL_KILL  (1ul << 4)
+#define SC_SDC_DISABLE_OQ_DISCARD   (1ul << 6)
+#define SC_LS_ALLOW_ATTR_TYPES      (1ul << 16)
+#define SC_LS_PAUSEBUFFER_DISABLE   (1ul << 16)
+#define SC_TLS_HASH_ENABLE          (1ul << 17)
+#define SC_LS_ATTR_CHECK_DISABLE    (1ul << 18)
+#define SC_ENABLE_TEXGRD_FLAGS      (1ul << 25)
+/* End SHADER_CONFIG register */
+
+/* TILER_CONFIG register */
+
+#define TC_CLOCK_GATE_OVERRIDE      (1ul << 0)
+
+/* JM_CONFIG register */
+
+#define JM_TIMESTAMP_OVERRIDE  (1ul << 0)
+#define JM_CLOCK_GATE_OVERRIDE (1ul << 1)
+#define JM_JOB_THROTTLE_ENABLE (1ul << 2)
+#define JM_JOB_THROTTLE_LIMIT_SHIFT (3)
+#define JM_MAX_JOB_THROTTLE_LIMIT (0x3F)
+#define JM_FORCE_COHERENCY_FEATURES_SHIFT (2)
+#define JM_IDVS_GROUP_SIZE_SHIFT (16)
+#define JM_MAX_IDVS_GROUP_SIZE (0x3F)
+
+
+#define gpu_write(dev, reg, data) writel(data, dev->iomem + reg)
+#define gpu_read(dev, reg) readl(dev->iomem + reg)
+
+static irqreturn_t panfrost_gpu_irq_handler(int irq, void *data)
+{
+	struct panfrost_device *pfdev = data;
+	u32 state = gpu_read(pfdev, GPU_INT_STAT);
+	u32 status = gpu_read(pfdev, GPU_STATUS);
+
+	if (!state)
+		return IRQ_NONE;
+
+	if (state & GPU_IRQ_MASK_ERROR) {
+		dev_err(pfdev->dev, "gpu error irq state=%x status=%x\n",
+			state, status);
+
+		gpu_write(pfdev, GPU_INT_MASK, 0);
+	}
+
+	gpu_write(pfdev, GPU_INT_CLEAR, state);
+
+	return IRQ_HANDLED;
+}
+
+static void panfrost_gpu_soft_reset_async(struct panfrost_device *pfdev)
+{
+	gpu_write(pfdev, GPU_INT_MASK, 0);
+	gpu_write(pfdev, GPU_INT_CLEAR, GPU_IRQ_RESET_COMPLETED);
+	gpu_write(pfdev, GPU_CMD, GPU_CMD_SOFT_RESET);
+}
+
+static int panfrost_gpu_soft_reset_async_wait(struct panfrost_device *pfdev)
+{
+	int timeout;
+
+	for (timeout = 500; timeout > 0; timeout--) {
+		if (gpu_read(pfdev, GPU_INT_RAWSTAT) & GPU_IRQ_RESET_COMPLETED)
+			break;
+	}
+
+	if (!timeout) {
+		dev_err(pfdev->dev, "gpu soft reset timed out\n");
+		return -ETIMEDOUT;
+	}
+
+	gpu_write(pfdev, GPU_INT_CLEAR, GPU_IRQ_MASK_ALL);
+	gpu_write(pfdev, GPU_INT_MASK, GPU_IRQ_MASK_ALL);
+
+	return 0;
+}
+
+static void panfrost_gpu_init_quirks(struct panfrost_device *pfdev)
+{
+	u32 sc = 0, tiler = 0, jm = 0, mmu = 0;
+
+	mmu = gpu_read(pfdev, GPU_L2_MMU_CONFIG);
+
+	/* Needs more version detection */
+	if (pfdev->features.id == 0x0860) {
+		sc = SC_LS_ALLOW_ATTR_TYPES;
+		tiler = TC_CLOCK_GATE_OVERRIDE;
+		jm = JM_MAX_JOB_THROTTLE_LIMIT << JM_JOB_THROTTLE_LIMIT_SHIFT;
+		mmu &= ~(L2_MMU_CONFIG_LIMIT_EXTERNAL_READS | L2_MMU_CONFIG_LIMIT_EXTERNAL_WRITES);
+	}
+
+	gpu_write(pfdev, GPU_SHADER_CONFIG, sc);
+	gpu_write(pfdev, GPU_TILER_CONFIG, tiler);
+	gpu_write(pfdev, GPU_L2_MMU_CONFIG, mmu);
+	gpu_write(pfdev, GPU_JM_CONFIG, jm);
+}
+
+#define MAX_HW_REVS 6
+
+struct panfrost_model {
+	const char *name;
+	u32 id;
+	u32 id_mask;
+	u64 features;
+	u64 issues;
+	struct {
+		u32 revision;
+		u64 issues;
+	} revs[MAX_HW_REVS];
+};
+
+#define GPU_MODEL(_name, _id, _mask, ...) \
+{\
+	.name = __stringify(_name),				\
+	.id = _id,						\
+	.id_mask = _mask,					\
+	.features = hw_features_##_name,			\
+	.issues = hw_issues_##_name,			\
+	.revs = { __VA_ARGS__ },				\
+}
+#define GPU_MODEL_MIDGARD(name, id, ...) GPU_MODEL(name, id, 0xfff, __VA_ARGS__)
+#define GPU_MODEL_BIFROST(name, id, ...) GPU_MODEL(name, id, 0xf00f, __VA_ARGS__)
+
+#define GPU_REV_EXT(name, _rev, _p, _s, stat) \
+{\
+	.revision = (_rev) << 12 | (_p) << 4 | (_s),		\
+	.issues = hw_issues_##name##_r##_rev##p##_p##stat,	\
+}
+#define GPU_REV(name, r, p) GPU_REV_EXT(name, r, p, 0, )
+
+static const struct panfrost_model gpu_models[] = {
+	/* T60x has an oddball version */
+	GPU_MODEL(t600, 0x6956, 0xffff,
+		GPU_REV_EXT(t600, 0, 0, 1, _15dev0)),
+	GPU_MODEL_MIDGARD(t620, 0x620,
+		GPU_REV(t620, 0, 1), GPU_REV(t620, 1, 0)),
+	GPU_MODEL_MIDGARD(t720, 0x720),
+	GPU_MODEL_MIDGARD(t760, 0x750,
+		GPU_REV(t760, 0, 0), GPU_REV(t760, 0, 1),
+		GPU_REV_EXT(t760, 0, 1, 0, _50rel0),
+		GPU_REV(t760, 0, 2), GPU_REV(t760, 0, 3)),
+	GPU_MODEL_MIDGARD(t820, 0x820),
+	GPU_MODEL_MIDGARD(t830, 0x830),
+	GPU_MODEL_MIDGARD(t860, 0x860),
+	GPU_MODEL_MIDGARD(t880, 0x880),
+
+	GPU_MODEL_BIFROST(g71, 0x6000,
+		GPU_REV_EXT(g71, 0, 0, 1, _05dev0)),
+	GPU_MODEL_BIFROST(g72, 0x6001),
+	GPU_MODEL_BIFROST(g51, 0x7000),
+	GPU_MODEL_BIFROST(g76, 0x7001),
+	GPU_MODEL_BIFROST(g52, 0x7002),
+	GPU_MODEL_BIFROST(g31, 0x7003,
+		GPU_REV(g31, 1, 0)),
+	{ /* sentinel */ },
+};
+
+static void panfrost_gpu_init_features(struct panfrost_device *pfdev)
+{
+	u32 gpu_id, num_js, major, minor, status, rev;
+	const char *name = "unknown";
+	u64 hw_feat = 0;
+	u64 hw_issues = hw_issues_all;
+	const struct panfrost_model *model;
+	int i;
+
+	pfdev->features.l2_features = gpu_read(pfdev, GPU_L2_FEATURES);
+	pfdev->features.core_features = gpu_read(pfdev, GPU_CORE_FEATURES);
+	pfdev->features.tiler_features = gpu_read(pfdev, GPU_TILER_FEATURES);
+	pfdev->features.mem_features = gpu_read(pfdev, GPU_MEM_FEATURES);
+	pfdev->features.mmu_features = gpu_read(pfdev, GPU_MMU_FEATURES);
+	pfdev->features.thread_features = gpu_read(pfdev, GPU_THREAD_FEATURES);
+	pfdev->features.coherency_features = gpu_read(pfdev, GPU_COHERENCY_FEATURES);
+	for (i = 0; i < 4; i++)
+		pfdev->features.texture_features[i] = gpu_read(pfdev, GPU_TEXTURE_FEATURES(i));
+
+	pfdev->features.as_present = gpu_read(pfdev, GPU_AS_PRESENT);
+
+	pfdev->features.js_present = gpu_read(pfdev, GPU_JS_PRESENT);
+	num_js = hweight32(pfdev->features.js_present);
+	for (i = 0; i < num_js; i++)
+		pfdev->features.js_features[i] = gpu_read(pfdev, GPU_JS_FEATURES(i));
+
+	pfdev->features.shader_present = gpu_read(pfdev, GPU_SHADER_PRESENT_LO);
+	pfdev->features.shader_present |= (u64)gpu_read(pfdev, GPU_SHADER_PRESENT_HI) << 32;
+
+	pfdev->features.tiler_present = gpu_read(pfdev, GPU_TILER_PRESENT_LO);
+	pfdev->features.tiler_present |= (u64)gpu_read(pfdev, GPU_TILER_PRESENT_HI) << 32;
+
+	pfdev->features.l2_present = gpu_read(pfdev, GPU_L2_PRESENT_LO);
+	pfdev->features.l2_present |= (u64)gpu_read(pfdev, GPU_L2_PRESENT_HI) << 32;
+
+	pfdev->features.stack_present = gpu_read(pfdev, GPU_STACK_PRESENT_LO);
+	pfdev->features.stack_present |= (u64)gpu_read(pfdev, GPU_STACK_PRESENT_HI) << 32;
+
+	gpu_id = gpu_read(pfdev, GPU_ID);
+	pfdev->features.id = gpu_id >> 16;
+	pfdev->features.revision = gpu_id & 0xffff;
+
+	major = (pfdev->features.revision >> 12) & 0xf;
+	minor = (pfdev->features.revision >> 4) & 0xff;
+	status = pfdev->features.revision & 0xf;
+	rev = pfdev->features.revision;
+
+	gpu_id = pfdev->features.id;
+
+	for (model = gpu_models; model->name; model++) {
+		if ((gpu_id & model->id_mask) != model->id)
+			continue;
+
+		name = model->name;
+		hw_feat = model->features;
+		hw_issues |= model->issues;
+		for (i = 0; i < MAX_HW_REVS; i++) {
+			if ((model->revs[i].revision != rev) &&
+			    (model->revs[i].revision != (rev & ~0xf)))
+				continue;
+			hw_issues |= model->revs[i].issues;
+			break;
+		}
+
+		break;
+	}
+
+	bitmap_from_u64(pfdev->features.base_hw_features, hw_feat);
+	bitmap_from_u64(pfdev->features.hw_issues, hw_issues);
+
+	dev_info(pfdev->dev, "mali-%s id 0x%x major 0x%x minor 0x%x status 0x%x",
+		 name, gpu_id, major, minor, status);
+	dev_info(pfdev->dev, "features: %64pb, issues: %64pb",
+		 pfdev->features.base_hw_features,
+		 pfdev->features.hw_issues);
+
+	dev_info(pfdev->dev, "Features: L2:0x%08x Shader:0x%08x Tiler:0x%08x Mem:0x%0x MMU:0x%08x AS:0x%x JS:0x%x",
+		 gpu_read(pfdev, GPU_L2_FEATURES),
+		 gpu_read(pfdev, GPU_CORE_FEATURES),
+		 gpu_read(pfdev, GPU_TILER_FEATURES),
+		 gpu_read(pfdev, GPU_MEM_FEATURES),
+		 gpu_read(pfdev, GPU_MMU_FEATURES),
+		 gpu_read(pfdev, GPU_AS_PRESENT),
+		 gpu_read(pfdev, GPU_JS_PRESENT));
+
+	dev_info(pfdev->dev, "shader_present=0x%0llx", pfdev->features.shader_present);
+}
+
+int panfrost_gpu_init(struct panfrost_device *pfdev)
+{
+	int err, irq;
+
+	panfrost_gpu_init_features(pfdev);
+
+	panfrost_gpu_soft_reset_async(pfdev);
+	err = panfrost_gpu_soft_reset_async_wait(pfdev);
+	if (err)
+		return err;
+
+	irq = platform_get_irq_byname(to_platform_device(pfdev->dev), "gpu");
+	if (irq <= 0)
+		return -ENODEV;
+
+	err = devm_request_irq(pfdev->dev, irq, panfrost_gpu_irq_handler,
+			       IRQF_SHARED, "gpu", pfdev);
+	if (err) {
+		dev_err(pfdev->dev, "failed to request gpu irq");
+		return err;
+	}
+
+	panfrost_gpu_init_quirks(pfdev);
+
+	gpu_write(pfdev, SHADER_PWRON_LO, pfdev->features.shader_present);
+	gpu_write(pfdev, TILER_PWRON_LO, pfdev->features.tiler_present);
+	gpu_write(pfdev, L2_PWRON_LO, pfdev->features.l2_present);
+	gpu_write(pfdev, STACK_PWRON_LO, pfdev->features.stack_present);
+	msleep(10);
+
+	dev_info(pfdev->dev, "shader pwr=%x", gpu_read(pfdev, SHADER_READY_LO));
+	dev_info(pfdev->dev, "tiler pwr=%x", gpu_read(pfdev, TILER_READY_LO));
+
+	return 0;
+}
+
+void panfrost_gpu_fini(struct panfrost_device *pfdev)
+{
+
+}
+
+u32 panfrost_gpu_get_latest_flush_id(struct panfrost_device *pfdev)
+{
+	if (panfrost_has_hw_feature(pfdev, HW_FEATURE_FLUSH_REDUCTION))
+		return gpu_read(pfdev, GPU_LATEST_FLUSH_ID);
+	return 0;
+}
diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.h b/drivers/gpu/drm/panfrost/panfrost_gpu.h
new file mode 100644
index 000000000000..98958dafb43b
--- /dev/null
+++ b/drivers/gpu/drm/panfrost/panfrost_gpu.h
@@ -0,0 +1,15 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright 2018 Marty E. Plummer <hanetzer@startmail.com> */
+/* Copyright 2019 Collabora ltd. */
+
+#ifndef __PANFROST_GPU_H__
+#define __PANFROST_GPU_H__
+
+struct panfrost_device;
+
+int panfrost_gpu_init(struct panfrost_device *pfdev);
+void panfrost_gpu_fini(struct panfrost_device *pfdev);
+
+u32 panfrost_gpu_get_latest_flush_id(struct panfrost_device *pfdev);
+
+#endif
diff --git a/drivers/gpu/drm/panfrost/panfrost_issues.h b/drivers/gpu/drm/panfrost/panfrost_issues.h
new file mode 100644
index 000000000000..7a5db123e8f9
--- /dev/null
+++ b/drivers/gpu/drm/panfrost/panfrost_issues.h
@@ -0,0 +1,175 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+/* (C) COPYRIGHT 2014-2018 ARM Limited. All rights reserved. */
+/* Copyright 2019 Linaro, Ltd., Rob Herring <robh@kernel.org> */
+#ifndef __PANFROST_ISSUES_H__
+#define __PANFROST_ISSUES_H__
+
+#include <linux/bitops.h>
+
+#include "panfrost_device.h"
+
+/*
+ * This is not a complete list of issues, but only the ones the driver needs
+ * to care about.
+ */
+enum base_hw_issue {
+	HW_ISSUE_6367,
+	HW_ISSUE_6787,
+	HW_ISSUE_8186,
+	HW_ISSUE_8245,
+	HW_ISSUE_8316,
+	HW_ISSUE_8394,
+	HW_ISSUE_8401,
+	HW_ISSUE_8408,
+	HW_ISSUE_8443,
+	HW_ISSUE_8987,
+	HW_ISSUE_9435,
+	HW_ISSUE_9510,
+	HW_ISSUE_9630,
+	HW_ISSUE_10327,
+	HW_ISSUE_10649,
+	HW_ISSUE_10676,
+	HW_ISSUE_10797,
+	HW_ISSUE_10817,
+	HW_ISSUE_10883,
+	HW_ISSUE_10959,
+	HW_ISSUE_10969,
+	HW_ISSUE_11020,
+	HW_ISSUE_11024,
+	HW_ISSUE_11035,
+	HW_ISSUE_11056,
+	HW_ISSUE_T76X_3542,
+	HW_ISSUE_T76X_3953,
+	HW_ISSUE_TMIX_8463,
+	GPUCORE_1619,
+	HW_ISSUE_TMIX_8438,
+	HW_ISSUE_TGOX_R1_1234,
+	HW_ISSUE_END
+};
+
+#define hw_issues_all (\
+	BIT_ULL(HW_ISSUE_9435))
+
+#define hw_issues_t600 (\
+	BIT_ULL(HW_ISSUE_6367) | \
+	BIT_ULL(HW_ISSUE_6787) | \
+	BIT_ULL(HW_ISSUE_8408) | \
+	BIT_ULL(HW_ISSUE_9510) | \
+	BIT_ULL(HW_ISSUE_10649) | \
+	BIT_ULL(HW_ISSUE_10676) | \
+	BIT_ULL(HW_ISSUE_10883) | \
+	BIT_ULL(HW_ISSUE_11020) | \
+	BIT_ULL(HW_ISSUE_11035) | \
+	BIT_ULL(HW_ISSUE_11056) | \
+	BIT_ULL(HW_ISSUE_TMIX_8438))
+
+#define hw_issues_t600_r0p0_15dev0 (\
+	BIT_ULL(HW_ISSUE_8186) | \
+	BIT_ULL(HW_ISSUE_8245) | \
+	BIT_ULL(HW_ISSUE_8316) | \
+	BIT_ULL(HW_ISSUE_8394) | \
+	BIT_ULL(HW_ISSUE_8401) | \
+	BIT_ULL(HW_ISSUE_8443) | \
+	BIT_ULL(HW_ISSUE_8987) | \
+	BIT_ULL(HW_ISSUE_9630) | \
+	BIT_ULL(HW_ISSUE_10969) | \
+	BIT_ULL(GPUCORE_1619))
+
+#define hw_issues_t620 (\
+	BIT_ULL(HW_ISSUE_10649) | \
+	BIT_ULL(HW_ISSUE_10883) | \
+	BIT_ULL(HW_ISSUE_10959) | \
+	BIT_ULL(HW_ISSUE_11056) | \
+	BIT_ULL(HW_ISSUE_TMIX_8438))
+
+#define hw_issues_t620_r0p1 (\
+	BIT_ULL(HW_ISSUE_10327) | \
+	BIT_ULL(HW_ISSUE_10676) | \
+	BIT_ULL(HW_ISSUE_10817) | \
+	BIT_ULL(HW_ISSUE_11020) | \
+	BIT_ULL(HW_ISSUE_11024) | \
+	BIT_ULL(HW_ISSUE_11035))
+
+#define hw_issues_t620_r1p0 (\
+	BIT_ULL(HW_ISSUE_11020) | \
+	BIT_ULL(HW_ISSUE_11024))
+
+#define hw_issues_t720 (\
+	BIT_ULL(HW_ISSUE_10649) | \
+	BIT_ULL(HW_ISSUE_10797) | \
+	BIT_ULL(HW_ISSUE_10883) | \
+	BIT_ULL(HW_ISSUE_11056) | \
+	BIT_ULL(HW_ISSUE_TMIX_8438))
+
+#define hw_issues_t760 (\
+	BIT_ULL(HW_ISSUE_10883) | \
+	BIT_ULL(HW_ISSUE_T76X_3953) | \
+	BIT_ULL(HW_ISSUE_TMIX_8438))
+
+#define hw_issues_t760_r0p0 (\
+	BIT_ULL(HW_ISSUE_11020) | \
+	BIT_ULL(HW_ISSUE_11024) | \
+	BIT_ULL(HW_ISSUE_T76X_3542))
+
+#define hw_issues_t760_r0p1 (\
+	BIT_ULL(HW_ISSUE_11020) | \
+	BIT_ULL(HW_ISSUE_11024) | \
+	BIT_ULL(HW_ISSUE_T76X_3542))
+
+#define hw_issues_t760_r0p1_50rel0 (\
+	BIT_ULL(HW_ISSUE_T76X_3542))
+
+#define hw_issues_t760_r0p2 (\
+	BIT_ULL(HW_ISSUE_11020) | \
+	BIT_ULL(HW_ISSUE_11024) | \
+	BIT_ULL(HW_ISSUE_T76X_3542))
+
+#define hw_issues_t760_r0p3 (\
+	BIT_ULL(HW_ISSUE_T76X_3542))
+
+#define hw_issues_t820 (\
+	BIT_ULL(HW_ISSUE_10883) | \
+	BIT_ULL(HW_ISSUE_T76X_3953) | \
+	BIT_ULL(HW_ISSUE_TMIX_8438))
+
+#define hw_issues_t830 (\
+	BIT_ULL(HW_ISSUE_10883) | \
+	BIT_ULL(HW_ISSUE_T76X_3953) | \
+	BIT_ULL(HW_ISSUE_TMIX_8438))
+
+#define hw_issues_t860 (\
+	BIT_ULL(HW_ISSUE_10883) | \
+	BIT_ULL(HW_ISSUE_T76X_3953) | \
+	BIT_ULL(HW_ISSUE_TMIX_8438))
+
+#define hw_issues_t880 (\
+	BIT_ULL(HW_ISSUE_10883) | \
+	BIT_ULL(HW_ISSUE_T76X_3953) | \
+	BIT_ULL(HW_ISSUE_TMIX_8438))
+
+#define hw_issues_g31 0
+
+#define hw_issues_g31_r1p0 (\
+	BIT_ULL(HW_ISSUE_TGOX_R1_1234))
+
+#define hw_issues_g51 0
+
+#define hw_issues_g52 0
+
+#define hw_issues_g71 (\
+	BIT_ULL(HW_ISSUE_TMIX_8463) | \
+	BIT_ULL(HW_ISSUE_TMIX_8438))
+
+#define hw_issues_g71_r0p0_05dev0 (\
+	BIT_ULL(HW_ISSUE_T76X_3953))
+
+#define hw_issues_g72 0
+
+#define hw_issues_g76 0
+
+static inline bool panfrost_has_hw_issue(struct panfrost_device *pfdev, enum base_hw_issue issue)
+{
+	return test_bit(issue, pfdev->features.hw_issues);
+}
+
+#endif /* _HWCONFIG_ISSUES_H_ */
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
new file mode 100644
index 000000000000..ca9544976317
--- /dev/null
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -0,0 +1,662 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright 2019 Linaro, Ltd, Rob Herring <robh@kernel.org> */
+/* Copyright 2019 Collabora ltd. */
+
+#include <linux/delay.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/platform_device.h>
+#include <linux/reservation.h>
+#include <drm/gpu_scheduler.h>
+#include <drm/panfrost_drm.h>
+
+#include "panfrost_device.h"
+#include "panfrost_job.h"
+#include "panfrost_features.h"
+#include "panfrost_gem.h"
+
+#define JOB_BASE 0x1000
+
+/* Job Control regs */
+#define JOB_INT_RAWSTAT		0x000
+#define JOB_INT_CLEAR		0x004
+#define JOB_INT_MASK		0x008
+#define JOB_INT_STAT		0x00c
+#define JOB_INT_JS_STATE	0x010
+#define JOB_INT_THROTTLE	0x014
+
+#define JS_BASE			0x800
+#define JS_HEAD_LO(n)		(JS_BASE + ((n) * 0x80) + 0x00)
+#define JS_HEAD_HI(n)		(JS_BASE + ((n) * 0x80) + 0x04)
+#define JS_TAIL_LO(n)		(JS_BASE + ((n) * 0x80) + 0x08)
+#define JS_TAIL_HI(n)		(JS_BASE + ((n) * 0x80) + 0x0c)
+#define JS_AFFINITY_LO(n)	(JS_BASE + ((n) * 0x80) + 0x10)
+#define JS_AFFINITY_HI(n)	(JS_BASE + ((n) * 0x80) + 0x14)
+#define JS_CONFIG(n)		(JS_BASE + ((n) * 0x80) + 0x18)
+#define JS_XAFFINITY(n)		(JS_BASE + ((n) * 0x80) + 0x1c)
+#define JS_COMMAND(n)		(JS_BASE + ((n) * 0x80) + 0x20)
+#define JS_STATUS(n)		(JS_BASE + ((n) * 0x80) + 0x24)
+#define JS_HEAD_NEXT_LO(n)	(JS_BASE + ((n) * 0x80) + 0x40)
+#define JS_HEAD_NEXT_HI(n)	(JS_BASE + ((n) * 0x80) + 0x44)
+#define JS_AFFINITY_NEXT_LO(n)	(JS_BASE + ((n) * 0x80) + 0x50)
+#define JS_AFFINITY_NEXT_HI(n)	(JS_BASE + ((n) * 0x80) + 0x54)
+#define JS_CONFIG_NEXT(n)	(JS_BASE + ((n) * 0x80) + 0x58)
+#define JS_COMMAND_NEXT(n)	(JS_BASE + ((n) * 0x80) + 0x60)
+#define JS_FLUSH_ID_NEXT(n)	(JS_BASE + ((n) * 0x80) + 0x70)
+
+/* Possible values of JS_CONFIG and JS_CONFIG_NEXT registers */
+#define JS_CONFIG_START_FLUSH_NO_ACTION        (0u << 0)
+#define JS_CONFIG_START_FLUSH_CLEAN            (1u << 8)
+#define JS_CONFIG_START_FLUSH_CLEAN_INVALIDATE (3u << 8)
+#define JS_CONFIG_START_MMU                    (1u << 10)
+#define JS_CONFIG_JOB_CHAIN_FLAG               (1u << 11)
+#define JS_CONFIG_END_FLUSH_NO_ACTION          JS_CONFIG_START_FLUSH_NO_ACTION
+#define JS_CONFIG_END_FLUSH_CLEAN              (1u << 12)
+#define JS_CONFIG_END_FLUSH_CLEAN_INVALIDATE   (3u << 12)
+#define JS_CONFIG_ENABLE_FLUSH_REDUCTION       (1u << 14)
+#define JS_CONFIG_DISABLE_DESCRIPTOR_WR_BK     (1u << 15)
+#define JS_CONFIG_THREAD_PRI(n)                ((n) << 16)
+
+#define JS_COMMAND_NOP			0x00
+#define JS_COMMAND_START		0x01
+#define JS_COMMAND_SOFT_STOP   0x02	/* Gently stop processing a job chain */
+#define JS_COMMAND_HARD_STOP   0x03	/* Rudely stop processing a job chain */
+#define JS_COMMAND_SOFT_STOP_0 0x04	/* Execute SOFT_STOP if JOB_CHAIN_FLAG is 0 */
+#define JS_COMMAND_HARD_STOP_0 0x05	/* Execute HARD_STOP if JOB_CHAIN_FLAG is 0 */
+#define JS_COMMAND_SOFT_STOP_1 0x06	/* Execute SOFT_STOP if JOB_CHAIN_FLAG is 1 */
+#define JS_COMMAND_HARD_STOP_1 0x07	/* Execute HARD_STOP if JOB_CHAIN_FLAG is 1 */
+
+#define job_write(dev, reg, data) writel(data, dev->iomem + JOB_BASE + (reg))
+#define job_read(dev, reg) readl(dev->iomem + JOB_BASE + (reg))
+
+struct panfrost_queue_state {
+	struct drm_gpu_scheduler sched;
+
+	u64 fence_context;
+	u64 emit_seqno;
+};
+
+struct panfrost_job_slot {
+	struct panfrost_queue_state queue[NUM_JOB_SLOTS];
+	spinlock_t job_lock;
+};
+
+static struct panfrost_job *
+to_panfrost_job(struct drm_sched_job *sched_job)
+{
+	return container_of(sched_job, struct panfrost_job, base);
+}
+
+struct panfrost_fence {
+	struct dma_fence base;
+	struct drm_device *dev;
+	/* panfrost seqno for signaled() test */
+	u64 seqno;
+	int queue;
+};
+
+static inline struct panfrost_fence *
+to_panfrost_fence(struct dma_fence *fence)
+{
+	return (struct panfrost_fence *)fence;
+}
+
+static const char *panfrost_fence_get_driver_name(struct dma_fence *fence)
+{
+	return "panfrost";
+}
+
+static const char *panfrost_fence_get_timeline_name(struct dma_fence *fence)
+{
+	struct panfrost_fence *f = to_panfrost_fence(fence);
+
+	switch (f->queue) {
+	case 0:
+		return "panfrost-js-0";
+	case 1:
+		return "panfrost-js-1";
+	case 2:
+		return "panfrost-js-2";
+	default:
+		return NULL;
+	}
+}
+
+static const struct dma_fence_ops panfrost_fence_ops = {
+	.get_driver_name = panfrost_fence_get_driver_name,
+	.get_timeline_name = panfrost_fence_get_timeline_name,
+};
+
+static struct dma_fence *panfrost_fence_create(struct panfrost_device *pfdev, int js_num)
+{
+	struct panfrost_fence *fence;
+	struct panfrost_job_slot *js = pfdev->js;
+
+	fence = kzalloc(sizeof(*fence), GFP_KERNEL);
+	if (!fence)
+		return ERR_PTR(-ENOMEM);
+
+	fence->dev = pfdev->ddev;
+	fence->queue = js_num;
+	fence->seqno = ++js->queue[js_num].emit_seqno;
+	dma_fence_init(&fence->base, &panfrost_fence_ops, &js->job_lock,
+		       js->queue[js_num].fence_context, fence->seqno);
+
+	return &fence->base;
+}
+
+static int panfrost_job_get_slot(struct panfrost_job *job)
+{
+	if (job->requirements & PANFROST_JD_REQ_FS)
+		return 0;
+
+#if 0
+// Ignore compute for now
+	if (job->requirements & BASE_JD_REQ_ONLY_COMPUTE) {
+		if (job->device_nr == 1 &&
+				kbdev->gpu_props.num_core_groups == 2)
+			return 2;
+		if (kbase_hw_has_issue(kbdev, BASE_HW_ISSUE_8987))
+			return 2;
+	}
+#endif
+	return 1;
+}
+
+static void panfrost_job_write_affinity(struct panfrost_device *pfdev,
+					u32 requirements,
+					int js)
+{
+	u64 affinity = 0xf;
+#if 0
+	if ((requirements & (BASE_JD_REQ_FS | BASE_JD_REQ_CS | BASE_JD_REQ_T)) ==
+			BASE_JD_REQ_T) {
+		/* Tiler-only atom */
+		/* If the hardware supports XAFFINITY then we'll only enable
+		 * the tiler (which is the default so this is a no-op),
+		 * otherwise enable shader core 0.
+		 */
+		if (!panfrost_has_hw_feature(kbdev, BASE_HW_FEATURE_XAFFINITY))
+			affinity = 1;
+		else
+			affinity = 0;
+	} else if ((requirements & (BASE_JD_REQ_COHERENT_GROUP |
+			BASE_JD_REQ_SPECIFIC_COHERENT_GROUP))) {
+		unsigned int num_core_groups = kbdev->gpu_props.num_core_groups;
+		struct mali_base_gpu_coherent_group_info *coherency_info =
+			&kbdev->gpu_props.props.coherency_info;
+
+		affinity = kbase_pm_ca_get_core_mask(kbdev) &
+				kbdev->pm.debug_core_mask[js];
+
+		/* JS2 on a dual core group system targets core group 1. All
+		 * other cases target core group 0.
+		 */
+		if (js == 2 && num_core_groups > 1)
+			affinity &= coherency_info->group[1].core_mask;
+		else
+			affinity &= coherency_info->group[0].core_mask;
+	} else {
+		/* Use all cores */
+		affinity = kbase_pm_ca_get_core_mask(kbdev) &
+				kbdev->pm.debug_core_mask[js];
+	}
+#endif
+	job_write(pfdev, JS_AFFINITY_NEXT_LO(js), affinity & 0xFFFFFFFF);
+	job_write(pfdev, JS_AFFINITY_NEXT_HI(js), affinity >> 32);
+}
+
+static void panfrost_job_hw_submit(struct panfrost_job *job, int js)
+{
+	struct panfrost_device *pfdev = job->pfdev;
+	u32 cfg;
+	u64 jc_head = job->jc;
+	int timeout = 100;
+
+	do {
+		if (!job_read(pfdev, JS_COMMAND_NEXT(js)))
+			break;
+		udelay(100);
+	} while (--timeout);
+	if (!timeout) {
+		dev_err(pfdev->dev, "JS%d busy", js);
+		return;
+	}
+
+	job_write(pfdev, JS_HEAD_NEXT_LO(js), jc_head & 0xFFFFFFFF);
+	job_write(pfdev, JS_HEAD_NEXT_HI(js), jc_head >> 32);
+
+	panfrost_job_write_affinity(pfdev, job->requirements, js);
+
+	/*
+	 * Start MMU, medium priority, cache clean/flush on end, clean/flush on
+	 * start.
+	 * TODO: different address spaces
+	 */
+	cfg = JS_CONFIG_THREAD_PRI(8) |
+		JS_CONFIG_START_FLUSH_CLEAN_INVALIDATE |
+		JS_CONFIG_END_FLUSH_CLEAN_INVALIDATE;
+
+	if (panfrost_has_hw_feature(pfdev, HW_FEATURE_FLUSH_REDUCTION))
+		cfg |= JS_CONFIG_ENABLE_FLUSH_REDUCTION;
+
+#if 0
+	if (kbase_hw_has_issue(kbdev, BASE_HW_ISSUE_10649))
+		cfg |= JS_CONFIG_START_MMU;
+
+	if (panfrost_has_hw_feature(kbdev,
+				BASE_HW_FEATURE_JOBCHAIN_DISAMBIGUATION)) {
+		if (!kbdev->hwaccess.backend.slot_rb[js].job_chain_flag) {
+			cfg |= JS_CONFIG_JOB_CHAIN_FLAG;
+			katom->atom_flags |= KBASE_KATOM_FLAGS_JOBCHAIN;
+			kbdev->hwaccess.backend.slot_rb[js].job_chain_flag =
+								true;
+		} else {
+			katom->atom_flags &= ~KBASE_KATOM_FLAGS_JOBCHAIN;
+			kbdev->hwaccess.backend.slot_rb[js].job_chain_flag =
+								false;
+		}
+	}
+#endif
+	job_write(pfdev, JS_CONFIG_NEXT(js), cfg);
+
+	if (panfrost_has_hw_feature(pfdev, HW_FEATURE_FLUSH_REDUCTION))
+		job_write(pfdev, JS_FLUSH_ID_NEXT(js), job->flush_id);
+
+	/* GO ! */
+	dev_dbg(pfdev->dev, "JS: Submitting atom %p to js[%d] with head=0x%llx",
+				job, js, jc_head);
+
+	job_write(pfdev, JS_COMMAND_NEXT(js), JS_COMMAND_START);
+}
+
+static void panfrost_unlock_bo_reservations(struct drm_gem_object **bos,
+					    int bo_count,
+					    struct ww_acquire_ctx *acquire_ctx)
+{
+	int i;
+
+	for (i = 0; i < bo_count; i++)
+		ww_mutex_unlock(&bos[i]->resv->lock);
+	ww_acquire_fini(acquire_ctx);
+}
+
+/* Takes the reservation lock on all the BOs being referenced, so that
+ * at queue submit time we can update the reservations.
+ *
+ * We don't lock the RCL, the tile alloc/state BOs, or overflow memory
+ * (all of which are on exec->unref_list).  They're entirely private
+ * to panfrost, so we don't attach dma-buf fences to them.
+ */
+static int panfrost_lock_bo_reservations(struct drm_gem_object **bos,
+					 int bo_count,
+					 struct ww_acquire_ctx *acquire_ctx)
+{
+	int contended_lock = -1;
+	int i, ret;
+
+	ww_acquire_init(acquire_ctx, &reservation_ww_class);
+
+retry:
+	if (contended_lock != -1) {
+		struct drm_gem_object *bo = bos[contended_lock];
+
+		ret = ww_mutex_lock_slow_interruptible(&bo->resv->lock,
+						       acquire_ctx);
+		if (ret) {
+			ww_acquire_done(acquire_ctx);
+			return ret;
+		}
+	}
+
+	for (i = 0; i < bo_count; i++) {
+		if (i == contended_lock)
+			continue;
+
+		ret = ww_mutex_lock_interruptible(&bos[i]->resv->lock,
+						  acquire_ctx);
+		if (ret) {
+			int j;
+
+			for (j = 0; j < i; j++)
+				ww_mutex_unlock(&bos[j]->resv->lock);
+
+			if (contended_lock != -1 && contended_lock >= i) {
+				struct drm_gem_object *bo = bos[contended_lock];
+
+				ww_mutex_unlock(&bo->resv->lock);
+			}
+
+			if (ret == -EDEADLK) {
+				contended_lock = i;
+				goto retry;
+			}
+
+			ww_acquire_done(acquire_ctx);
+			return ret;
+		}
+	}
+
+	ww_acquire_done(acquire_ctx);
+
+	/* Reserve space for our shared (read-only) fence references,
+	 * before we commit the job to the hardware.
+	 */
+	for (i = 0; i < bo_count; i++) {
+		ret = reservation_object_reserve_shared(bos[i]->resv, 1);
+		if (ret) {
+			panfrost_unlock_bo_reservations(bos, bo_count,
+							acquire_ctx);
+			return ret;
+		}
+	}
+
+	return 0;
+}
+
+static void panfrost_attach_object_fences(struct drm_gem_object **bos,
+					  int bo_count,
+					  struct dma_fence *fence)
+{
+	int i;
+
+	for (i = 0; i < bo_count; i++)
+		/* XXX: Use shared fences for read-only objects. */
+		reservation_object_add_excl_fence(bos[i]->resv, fence);
+}
+
+int panfrost_job_push(struct panfrost_job *job)
+{
+	struct panfrost_device *pfdev = job->pfdev;
+	int slot = panfrost_job_get_slot(job);
+	struct drm_sched_entity *entity = &job->file_priv->sched_entity[slot];
+	struct ww_acquire_ctx acquire_ctx;
+	int ret = 0;
+
+	mutex_lock(&pfdev->sched_lock);
+
+	ret = panfrost_lock_bo_reservations(job->bos, job->bo_count,
+					    &acquire_ctx);
+	if (ret) {
+		mutex_unlock(&pfdev->sched_lock);
+		return ret;
+	}
+
+	ret = drm_sched_job_init(&job->base, entity, NULL);
+	if (ret) {
+		mutex_unlock(&pfdev->sched_lock);
+		goto unlock;
+	}
+
+	job->render_done_fence = dma_fence_get(&job->base.s_fence->finished);
+
+	kref_get(&job->refcount); /* put by scheduler job completion */
+
+	drm_sched_entity_push_job(&job->base, entity);
+
+	mutex_unlock(&pfdev->sched_lock);
+
+	panfrost_attach_object_fences(job->bos, job->bo_count,
+				      job->render_done_fence);
+
+unlock:
+	panfrost_unlock_bo_reservations(job->bos, job->bo_count, &acquire_ctx);
+
+	return ret;
+}
+
+static void panfrost_job_cleanup(struct kref *ref)
+{
+	struct panfrost_job *job = container_of(ref, struct panfrost_job,
+						refcount);
+	unsigned int i;
+
+	dma_fence_put(job->in_fence);
+	dma_fence_put(job->done_fence);
+	dma_fence_put(job->render_done_fence);
+
+	for (i = 0; i < job->bo_count; i++)
+		drm_gem_object_put_unlocked(job->bos[i]);
+	kvfree(job->bos);
+
+	kfree(job);
+}
+
+void panfrost_job_put(struct panfrost_job *job)
+{
+	kref_put(&job->refcount, panfrost_job_cleanup);
+}
+
+static void panfrost_job_free(struct drm_sched_job *sched_job)
+{
+	struct panfrost_job *job = to_panfrost_job(sched_job);
+
+	drm_sched_job_cleanup(sched_job);
+
+	panfrost_job_put(job);
+}
+
+static struct dma_fence *panfrost_job_dependency(struct drm_sched_job *sched_job,
+						 struct drm_sched_entity *s_entity)
+{
+	struct panfrost_job *job = to_panfrost_job(sched_job);
+	struct dma_fence *fence;
+
+	fence = job->in_fence;
+	if (fence) {
+		job->in_fence = NULL;
+		return fence;
+	}
+
+	return NULL;
+}
+
+static struct dma_fence *panfrost_job_run(struct drm_sched_job *sched_job)
+{
+	struct panfrost_job *job = to_panfrost_job(sched_job);
+	struct panfrost_device *pfdev = job->pfdev;
+	int slot = panfrost_job_get_slot(job);
+	struct dma_fence *fence = NULL;
+
+	if (unlikely(job->base.s_fence->finished.error))
+		return NULL;
+
+	pfdev->jobs[slot] = job;
+
+	fence = panfrost_fence_create(pfdev, slot);
+	if (IS_ERR(fence))
+		return NULL;
+
+	if (job->done_fence)
+		dma_fence_put(job->done_fence);
+	job->done_fence = dma_fence_get(fence);
+
+	panfrost_job_hw_submit(job, slot);
+
+	return fence;
+}
+
+static void panfrost_job_timedout(struct drm_sched_job *sched_job)
+{
+	struct panfrost_job *job = to_panfrost_job(sched_job);
+	struct panfrost_device *pfdev = job->pfdev;
+	int js = panfrost_job_get_slot(job);
+
+	job_write(pfdev, JS_COMMAND_NEXT(js), JS_COMMAND_NOP);
+
+	job_write(pfdev, JS_COMMAND(js), JS_COMMAND_HARD_STOP_0);
+	dev_err(pfdev->dev, "gpu sched timeout, js=%d, status=0x%x, head=0x%x, tail=0x%x",
+		js,
+		job_read(pfdev, JS_STATUS(js)),
+		job_read(pfdev, JS_HEAD_LO(js)),
+		job_read(pfdev, JS_TAIL_LO(js)));
+
+	/* JS_STATUS == 0x08 (ACTIVE): the job is still running */
+	if (job_read(pfdev, JS_STATUS(js)) == 8) {
+//		dev_err(pfdev->dev, "resetting gpu");
+//		panfrost_gpu_reset(pfdev);
+	}
+
+	/* For now, just say we're done. No reset and retry. */
+//	job_write(pfdev, JS_COMMAND(js), JS_COMMAND_HARD_STOP);
+	dma_fence_signal(job->done_fence);
+}
+
+static const struct drm_sched_backend_ops panfrost_sched_ops = {
+	.dependency = panfrost_job_dependency,
+	.run_job = panfrost_job_run,
+	.timedout_job = panfrost_job_timedout,
+	.free_job = panfrost_job_free
+};
+
+static const char *job_exception_name(u32 exception_code)
+{
+	switch (exception_code) {
+		/* Non-Fault Status code */
+	case 0x00: return "NOT_STARTED/IDLE/OK";
+	case 0x01: return "DONE";
+	case 0x02: return "INTERRUPTED";
+	case 0x03: return "STOPPED";
+	case 0x04: return "TERMINATED";
+	case 0x08: return "ACTIVE";
+		/* Job exceptions */
+	case 0x40: return "JOB_CONFIG_FAULT";
+	case 0x41: return "JOB_POWER_FAULT";
+	case 0x42: return "JOB_READ_FAULT";
+	case 0x43: return "JOB_WRITE_FAULT";
+	case 0x44: return "JOB_AFFINITY_FAULT";
+	case 0x48: return "JOB_BUS_FAULT";
+	case 0x50: return "INSTR_INVALID_PC";
+	case 0x51: return "INSTR_INVALID_ENC";
+	case 0x52: return "INSTR_TYPE_MISMATCH";
+	case 0x53: return "INSTR_OPERAND_FAULT";
+	case 0x54: return "INSTR_TLS_FAULT";
+	case 0x55: return "INSTR_BARRIER_FAULT";
+	case 0x56: return "INSTR_ALIGN_FAULT";
+	case 0x58: return "DATA_INVALID_FAULT";
+	case 0x59: return "TILE_RANGE_FAULT";
+	case 0x5A: return "ADDR_RANGE_FAULT";
+	case 0x60: return "OUT_OF_MEMORY";
+	}
+
+	return "UNKNOWN";
+}
+
+static irqreturn_t panfrost_job_irq_handler(int irq, void *data)
+{
+	struct panfrost_device *pfdev = data;
+	u32 status = job_read(pfdev, JOB_INT_STAT);
+	int j;
+
+	dev_dbg(pfdev->dev, "jobslot irq status=%x\n", status);
+
+	if (!status)
+		return IRQ_NONE;
+
+	for (j = 0; status; j++) {
+		u32 mask = BIT(j) | BIT(j + 16);
+
+		if (!(status & mask))
+			continue;
+
+		job_write(pfdev, JOB_INT_CLEAR, mask);
+
+		if (status & BIT(j + 16)) {
+			job_write(pfdev, JS_COMMAND_NEXT(j), JS_COMMAND_NOP);
+			job_write(pfdev, JS_COMMAND(j), JS_COMMAND_HARD_STOP_0);
+
+			dev_err(pfdev->dev, "js fault, js=%d, status=%s, head=0x%x, tail=0x%x",
+				j,
+				job_exception_name(job_read(pfdev, JS_STATUS(j))),
+				job_read(pfdev, JS_HEAD_LO(j)),
+				job_read(pfdev, JS_TAIL_LO(j)));
+		}
+
+		if (status & BIT(j))
+			dma_fence_signal(pfdev->jobs[j]->done_fence);
+
+		status &= ~mask;
+	}
+
+	return IRQ_HANDLED;
+}
+
+int panfrost_job_init(struct panfrost_device *pfdev)
+{
+	struct panfrost_job_slot *js;
+	int ret, j, irq;
+
+	pfdev->js = js = devm_kzalloc(pfdev->dev, sizeof(*js), GFP_KERNEL);
+	if (!js)
+		return -ENOMEM;
+
+	spin_lock_init(&js->job_lock);
+
+	irq = platform_get_irq_byname(to_platform_device(pfdev->dev), "job");
+	if (irq <= 0)
+		return -ENODEV;
+
+	ret = devm_request_irq(pfdev->dev, irq, panfrost_job_irq_handler,
+			       IRQF_SHARED, "job", pfdev);
+	if (ret) {
+		dev_err(pfdev->dev, "failed to request job irq");
+		return ret;
+	}
+	job_write(pfdev, JOB_INT_CLEAR, 0x70007);
+	job_write(pfdev, JOB_INT_MASK, 0x70007);
+
+	for (j = 0; j < NUM_JOB_SLOTS; j++) {
+		js->queue[j].fence_context = dma_fence_context_alloc(1);
+
+		ret = drm_sched_init(&js->queue[j].sched,
+				     &panfrost_sched_ops,
+				     1, 0, msecs_to_jiffies(500),
+				     "pan_js");
+		if (ret) {
+			dev_err(pfdev->dev, "Failed to create scheduler: %d.", ret);
+			goto err_sched;
+		}
+	}
+
+	return 0;
+
+err_sched:
+	for (j--; j >= 0; j--)
+		drm_sched_fini(&js->queue[j].sched);
+
+	return ret;
+}
+
+void panfrost_job_fini(struct panfrost_device *pfdev)
+{
+	struct panfrost_job_slot *js = pfdev->js;
+	int j;
+
+	for (j = 0; j < NUM_JOB_SLOTS; j++)
+		drm_sched_fini(&js->queue[j].sched);
+}
+
+void panfrost_job_open(struct panfrost_file_priv *panfrost_priv)
+{
+	struct panfrost_device *pfdev = panfrost_priv->pfdev;
+	struct panfrost_job_slot *js = pfdev->js;
+	struct drm_sched_rq *rq;
+	int ret, i;
+
+	for (i = 0; i < NUM_JOB_SLOTS; i++) {
+		rq = &js->queue[i].sched.sched_rq[DRM_SCHED_PRIORITY_NORMAL];
+		ret = drm_sched_entity_init(&panfrost_priv->sched_entity[i], &rq, 1, NULL);
+		WARN_ON(ret);
+	}
+}
+
+void panfrost_job_close(struct panfrost_file_priv *panfrost_priv)
+{
+	int i;
+
+	for (i = 0; i < NUM_JOB_SLOTS; i++)
+		drm_sched_entity_destroy(&panfrost_priv->sched_entity[i]);
+}
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.h b/drivers/gpu/drm/panfrost/panfrost_job.h
new file mode 100644
index 000000000000..cceb6e655e4c
--- /dev/null
+++ b/drivers/gpu/drm/panfrost/panfrost_job.h
@@ -0,0 +1,47 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright 2019 Collabora ltd. */
+
+#ifndef __PANFROST_JOB_H__
+#define __PANFROST_JOB_H__
+
+#include <uapi/drm/panfrost_drm.h>
+#include <drm/gpu_scheduler.h>
+
+#define NUM_JOB_SLOTS	2	/* Don't need 3rd one until we have compute support */
+
+struct panfrost_device;
+struct panfrost_gem_object;
+struct panfrost_file_priv;
+
+struct panfrost_job {
+	struct drm_sched_job base;
+
+	struct kref refcount;
+
+	struct panfrost_device *pfdev;
+	struct panfrost_file_priv *file_priv;
+
+	/* An optional fence userspace can pass in for the job to depend on. */
+	struct dma_fence *in_fence;
+
+	/* fence to be signaled by IRQ handler when the job is complete. */
+	struct dma_fence *done_fence;
+
+	__u64 jc;
+	__u32 requirements;
+	__u32 flush_id;
+
+	struct drm_gem_object **bos;
+	u32 bo_count;
+
+	struct dma_fence *render_done_fence;
+};
+
+int panfrost_job_init(struct panfrost_device *pfdev);
+void panfrost_job_fini(struct panfrost_device *pfdev);
+void panfrost_job_open(struct panfrost_file_priv *panfrost_priv);
+void panfrost_job_close(struct panfrost_file_priv *panfrost_priv);
+int panfrost_job_push(struct panfrost_job *job);
+void panfrost_job_put(struct panfrost_job *job);
+
+#endif
diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
new file mode 100644
index 000000000000..107d734050f3
--- /dev/null
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -0,0 +1,409 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright 2019 Linaro, Ltd, Rob Herring <robh@kernel.org> */
+
+#include <linux/delay.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/io-pgtable.h>
+#include <linux/iommu.h>
+#include <linux/platform_device.h>
+
+#include "panfrost_device.h"
+#include "panfrost_mmu.h"
+#include "panfrost_gem.h"
+#include "panfrost_features.h"
+
+#define MMU_BASE 0x2000
+
+/* MMU regs */
+#define MMU_INT_RAWSTAT			0x00
+#define MMU_INT_CLEAR			0x04
+#define MMU_INT_MASK			0x08
+#define MMU_INT_STAT			0x0c
+
+/* AS_COMMAND register commands */
+#define AS_COMMAND_NOP         0x00	/* NOP Operation */
+#define AS_COMMAND_UPDATE      0x01	/* Broadcasts the values in AS_TRANSTAB and ASn_MEMATTR to all MMUs */
+#define AS_COMMAND_LOCK        0x02	/* Issue a lock region command to all MMUs */
+#define AS_COMMAND_UNLOCK      0x03	/* Issue a flush region command to all MMUs */
+#define AS_COMMAND_FLUSH       0x04	/* Flush all L2 caches then issue a flush region command to all MMUs
+					   (deprecated - only for use with T60x) */
+#define AS_COMMAND_FLUSH_PT    0x04	/* Flush all L2 caches then issue a flush region command to all MMUs */
+#define AS_COMMAND_FLUSH_MEM   0x05	/* Wait for memory accesses to complete, flush all the L1s cache then
+					   flush all L2 caches then issue a flush region command to all MMUs */
+
+#define MMU_AS(as)			(0x400 + ((as) << 6))
+
+#define AS_TRANSTAB_LO(as)		(MMU_AS(as) + 0x00) /* (RW) Translation Table Base Address for address space n, low word */
+#define AS_TRANSTAB_HI(as)		(MMU_AS(as) + 0x04) /* (RW) Translation Table Base Address for address space n, high word */
+#define AS_MEMATTR_LO(as)		(MMU_AS(as) + 0x08) /* (RW) Memory attributes for address space n, low word. */
+#define AS_MEMATTR_HI(as)		(MMU_AS(as) + 0x0C) /* (RW) Memory attributes for address space n, high word. */
+#define AS_LOCKADDR_LO(as)		(MMU_AS(as) + 0x10) /* (RW) Lock region address for address space n, low word */
+#define AS_LOCKADDR_HI(as)		(MMU_AS(as) + 0x14) /* (RW) Lock region address for address space n, high word */
+#define AS_COMMAND(as)			(MMU_AS(as) + 0x18) /* (WO) MMU command register for address space n */
+#define AS_FAULTSTATUS(as)		(MMU_AS(as) + 0x1C) /* (RO) MMU fault status register for address space n */
+#define AS_FAULTADDRESS_LO(as)		(MMU_AS(as) + 0x20) /* (RO) Fault Address for address space n, low word */
+#define AS_FAULTADDRESS_HI(as)		(MMU_AS(as) + 0x24) /* (RO) Fault Address for address space n, high word */
+#define AS_STATUS(as)			(MMU_AS(as) + 0x28) /* (RO) Status flags for address space n */
+
+/* (RW) Translation table configuration for address space n, low word */
+#define AS_TRANSCFG_LO         0x30
+/* (RW) Translation table configuration for address space n, high word */
+#define AS_TRANSCFG_HI         0x34
+/* (RO) Secondary fault address for address space n, low word */
+#define AS_FAULTEXTRA_LO       0x38
+/* (RO) Secondary fault address for address space n, high word */
+#define AS_FAULTEXTRA_HI       0x3C
+
+/*
+ * Begin LPAE MMU TRANSTAB register values
+ */
+#define AS_TRANSTAB_LPAE_ADDR_SPACE_MASK   0xfffffffffffff000
+#define AS_TRANSTAB_LPAE_ADRMODE_UNMAPPED  (0u << 0)
+#define AS_TRANSTAB_LPAE_ADRMODE_IDENTITY  (1u << 1)
+#define AS_TRANSTAB_LPAE_ADRMODE_TABLE     (3u << 0)
+#define AS_TRANSTAB_LPAE_READ_INNER        (1u << 2)
+#define AS_TRANSTAB_LPAE_SHARE_OUTER       (1u << 4)
+
+#define AS_TRANSTAB_LPAE_ADRMODE_MASK      0x00000003
+
+#define AS_STATUS_AS_ACTIVE 0x01
+
+#define AS_FAULTSTATUS_ACCESS_TYPE_MASK                  (0x3<<8)
+#define AS_FAULTSTATUS_ACCESS_TYPE_ATOMIC                (0x0<<8)
+#define AS_FAULTSTATUS_ACCESS_TYPE_EX                    (0x1<<8)
+#define AS_FAULTSTATUS_ACCESS_TYPE_READ                  (0x2<<8)
+#define AS_FAULTSTATUS_ACCESS_TYPE_WRITE                 (0x3<<8)
+
+#define mmu_write(dev, reg, data) writel(data, dev->iomem + MMU_BASE + reg)
+#define mmu_read(dev, reg) readl(dev->iomem + MMU_BASE + reg)
+
+struct panfrost_mmu {
+	struct io_pgtable_ops *pgtbl_ops;
+	struct mutex lock;
+};
+
+static int wait_ready(struct panfrost_device *pfdev, u32 as_nr)
+{
+	unsigned int max_loops = 10000;
+	u32 val;
+
+	/*
+	 * Wait for the MMU status to indicate there is no active command, in
+	 * case one is pending. Do not log remaining register accesses.
+	 */
+	do {
+		val = mmu_read(pfdev, AS_STATUS(as_nr));
+		if (!(val & AS_STATUS_AS_ACTIVE))
+			break;
+
+		udelay(1);
+	} while (--max_loops);
+
+	if (max_loops == 0) {
+		dev_err(pfdev->dev, "AS_ACTIVE bit stuck\n");
+		return -ETIMEDOUT;
+	}
+
+	return 0;
+}
+
+static int write_cmd(struct panfrost_device *pfdev, u32 as_nr, u32 cmd)
+{
+	int status;
+
+	/* write AS_COMMAND when MMU is ready to accept another command */
+	status = wait_ready(pfdev, as_nr);
+	if (!status)
+		mmu_write(pfdev, AS_COMMAND(as_nr), cmd);
+
+	return status;
+}
+
+static void lock_region(struct panfrost_device *pfdev, u32 as_nr,
+			u64 iova, size_t size)
+{
+	u8 region_width;
+	u64 region = iova & PAGE_MASK;
+	/*
+	 * fls() returns 1 .. 32 for a non-zero argument, so
+	 * 10 + fls(num_pages) results in the range 11 .. 42.
+	 */
+
+	size = round_up(size, PAGE_SIZE);
+
+	region_width = 10 + fls(size >> PAGE_SHIFT);
+	if ((size >> PAGE_SHIFT) != (1ul << (region_width - 11))) {
+		/* not pow2, so must go up to the next pow2 */
+		region_width += 1;
+	}
+	region |= region_width;
+
+	/* Lock the region that needs to be updated */
+	mmu_write(pfdev, AS_LOCKADDR_LO(as_nr), region & 0xFFFFFFFFUL);
+	mmu_write(pfdev, AS_LOCKADDR_HI(as_nr), (region >> 32) & 0xFFFFFFFFUL);
+	write_cmd(pfdev, as_nr, AS_COMMAND_LOCK);
+}
+
+static int mmu_hw_do_operation(struct panfrost_device *pfdev, u32 as_nr,
+		u64 iova, size_t size, u32 op)
+{
+	if (op != AS_COMMAND_UNLOCK)
+		lock_region(pfdev, as_nr, iova, size);
+
+	/* Run the MMU operation */
+	write_cmd(pfdev, as_nr, op);
+
+	/* Wait for the flush to complete */
+	return wait_ready(pfdev, as_nr);
+}
+
+static void mmu_enable(struct panfrost_device *pfdev, u32 as_nr, u64 pgt_base, u64 mem_attr)
+{
+//	if (kbdev->system_coherency == COHERENCY_ACE)
+//		current_setup->transtab |= AS_TRANSTAB_LPAE_SHARE_OUTER;
+
+	pgt_base &= AS_TRANSTAB_LPAE_ADDR_SPACE_MASK;
+	pgt_base |= AS_TRANSTAB_LPAE_ADRMODE_TABLE | AS_TRANSTAB_LPAE_READ_INNER;
+	mmu_write(pfdev, AS_TRANSTAB_LO(as_nr), pgt_base & 0xffffffffUL);
+	mmu_write(pfdev, AS_TRANSTAB_HI(as_nr), pgt_base >> 32);
+
+	mmu_write(pfdev, AS_MEMATTR_LO(as_nr), mem_attr & 0xffffffffUL);
+	mmu_write(pfdev, AS_MEMATTR_HI(as_nr), mem_attr >> 32);
+
+	write_cmd(pfdev, as_nr, AS_COMMAND_UPDATE);
+}
+
+int panfrost_mmu_map(struct panfrost_gem_object *bo)
+{
+	struct drm_gem_object *obj = &bo->base.base;
+	struct panfrost_device *pfdev = to_panfrost_device(obj->dev);
+	struct io_pgtable_ops *ops = pfdev->mmu->pgtbl_ops;
+	u64 iova = bo->node.start << PAGE_SHIFT;
+	unsigned int count;
+	struct scatterlist *sgl;
+	struct sg_table *sgt;
+
+	sgt = drm_gem_shmem_get_pages_sgt(obj);
+	if (WARN_ON(IS_ERR(sgt)))
+		return PTR_ERR(sgt);
+
+	mutex_lock(&pfdev->mmu->lock);
+
+	for_each_sg(sgt->sgl, sgl, sgt->nents, count) {
+		unsigned long paddr = sg_dma_address(sgl);
+		size_t len = sg_dma_len(sgl);
+
+		while (len) {
+			dev_dbg(pfdev->dev, "map: iova=%llx, paddr=%lx, len=%zx", iova, paddr, len);
+			ops->map(ops, iova, paddr, SZ_4K, IOMMU_WRITE | IOMMU_READ);
+			iova += SZ_4K;
+			paddr += SZ_4K;
+			len -= SZ_4K;
+		}
+	}
+
+	mutex_unlock(&pfdev->mmu->lock);
+	return 0;
+}
+
+void panfrost_mmu_unmap(struct panfrost_gem_object *bo)
+{
+	struct drm_gem_object *obj = &bo->base.base;
+	struct panfrost_device *pfdev = to_panfrost_device(obj->dev);
+	struct io_pgtable_ops *ops = pfdev->mmu->pgtbl_ops;
+	u64 iova = bo->node.start << PAGE_SHIFT;
+	size_t len = bo->node.size << PAGE_SHIFT;
+	size_t unmapped_len = 0;
+
+	mutex_lock(&pfdev->mmu->lock);
+
+	while (unmapped_len < len) {
+		ops->unmap(ops, iova, SZ_4K);
+		iova += SZ_4K;
+		unmapped_len += SZ_4K;
+	}
+
+	mutex_unlock(&pfdev->mmu->lock);
+}
+
+static void mmu_tlb_inv_context_s1(void *cookie)
+{
+	struct panfrost_device *pfdev = cookie;
+
+	mmu_hw_do_operation(pfdev, 0, 0, ~0UL, AS_COMMAND_FLUSH_MEM);
+}
+
+static void mmu_tlb_inv_range_nosync(unsigned long iova, size_t size, size_t granule,
+		      bool leaf, void *cookie)
+{
+	struct panfrost_device *pfdev = cookie;
+
+	dev_dbg(pfdev->dev, "inv_range: iova=%lx, size=%zx, granule=%zx, leaf=%d",
+		iova, size, granule, leaf);
+	mmu_hw_do_operation(pfdev, 0, iova, size, AS_COMMAND_FLUSH_PT);
+}
+
+static void mmu_tlb_sync_context(void *cookie)
+{
+	//struct panfrost_device *pfdev = cookie;
+	// Wait 1000 GPU cycles!?
+}
+
+static const struct iommu_gather_ops mmu_tlb_ops = {
+	.tlb_flush_all	= mmu_tlb_inv_context_s1,
+	.tlb_add_flush	= mmu_tlb_inv_range_nosync,
+	.tlb_sync	= mmu_tlb_sync_context,
+};
+
+const char *mmu_exception_name(struct panfrost_device *pfdev, u32 exception_code)
+{
+	switch (exception_code) {
+	case 0xC0 ... 0xC7: return "TRANSLATION_FAULT";
+	case 0xC8: return "PERMISSION_FAULT";
+	case 0xD0 ... 0xD7: return "TRANSTAB_BUS_FAULT";
+	case 0xD8: return "ACCESS_FLAG";
+	}
+
+	if (panfrost_has_hw_feature(pfdev, HW_FEATURE_AARCH64_MMU)) {
+		switch (exception_code) {
+		case 0xC9 ... 0xCF: return "PERMISSION_FAULT";
+		case 0xD9 ... 0xDF: return "ACCESS_FLAG";
+		case 0xE0 ... 0xE7: return "ADDRESS_SIZE_FAULT";
+		case 0xE8 ... 0xEF: return "MEMORY_ATTRIBUTES_FAULT";
+		}
+	}
+	return "UNKNOWN";
+}
+
+static const char *access_type_name(struct panfrost_device *pfdev,
+		u32 fault_status)
+{
+	switch (fault_status & AS_FAULTSTATUS_ACCESS_TYPE_MASK) {
+	case AS_FAULTSTATUS_ACCESS_TYPE_ATOMIC:
+		if (panfrost_has_hw_feature(pfdev, HW_FEATURE_AARCH64_MMU))
+			return "ATOMIC";
+		else
+			return "UNKNOWN";
+	case AS_FAULTSTATUS_ACCESS_TYPE_READ:
+		return "READ";
+	case AS_FAULTSTATUS_ACCESS_TYPE_WRITE:
+		return "WRITE";
+	case AS_FAULTSTATUS_ACCESS_TYPE_EX:
+		return "EXECUTE";
+	default:
+		WARN_ON(1);
+		return NULL;
+	}
+}
+
+static irqreturn_t panfrost_mmu_irq_handler(int irq, void *data)
+{
+	struct panfrost_device *pfdev = data;
+	u32 status = mmu_read(pfdev, MMU_INT_STAT);
+	int i;
+
+	if (!status)
+		return IRQ_NONE;
+
+	dev_err(pfdev->dev, "mmu irq status=%x\n", status);
+
+	for (i = 0; status; i++) {
+		u32 mask = BIT(i) | BIT(i + 16);
+		u64 addr;
+		u32 fault_status;
+		u32 exception_type;
+		u32 access_type;
+		u32 source_id;
+
+		if (!(status & mask))
+			continue;
+
+		fault_status = mmu_read(pfdev, AS_FAULTSTATUS(i));
+		addr = mmu_read(pfdev, AS_FAULTADDRESS_LO(i));
+		addr |= (u64)mmu_read(pfdev, AS_FAULTADDRESS_HI(i)) << 32;
+
+		/* decode the fault status */
+		exception_type = fault_status & 0xFF;
+		access_type = (fault_status >> 8) & 0x3;
+		source_id = (fault_status >> 16);
+
+		/* terminal fault, print info about the fault */
+		dev_err(pfdev->dev,
+			"Unhandled Page fault in AS%d at VA 0x%016llX\n"
+			"Reason: %s\n"
+			"raw fault status: 0x%X\n"
+			"decoded fault status: %s\n"
+			"exception type 0x%X: %s\n"
+			"access type 0x%X: %s\n"
+			"source id 0x%X\n",
+			i, addr,
+			"TODO",
+			fault_status,
+			(fault_status & (1 << 10) ? "DECODER FAULT" : "SLAVE FAULT"),
+			exception_type, mmu_exception_name(pfdev, exception_type),
+			access_type, access_type_name(pfdev, fault_status),
+			source_id);
+
+		mmu_write(pfdev, MMU_INT_CLEAR, mask);
+
+		status &= ~mask;
+	}
+
+	return IRQ_HANDLED;
+}
+
+int panfrost_mmu_init(struct panfrost_device *pfdev)
+{
+	struct io_pgtable_ops *pgtbl_ops;
+	struct io_pgtable_cfg pgtbl_cfg;
+	int err, irq;
+
+	pfdev->mmu = devm_kzalloc(pfdev->dev, sizeof(struct panfrost_mmu), GFP_KERNEL);
+	if (!pfdev->mmu)
+		return -ENOMEM;
+
+	mutex_init(&pfdev->mmu->lock);
+
+	irq = platform_get_irq_byname(to_platform_device(pfdev->dev), "mmu");
+	if (irq < 0)
+		return irq;
+
+	err = devm_request_irq(pfdev->dev, irq, panfrost_mmu_irq_handler,
+			       IRQF_SHARED, "mmu", pfdev);
+
+	if (err) {
+		dev_err(pfdev->dev, "failed to request mmu irq\n");
+		return err;
+	}
+	mmu_write(pfdev, MMU_INT_CLEAR, ~0);
+	mmu_write(pfdev, MMU_INT_MASK, ~0);
+
+	pgtbl_cfg = (struct io_pgtable_cfg) {
+		.pgsize_bitmap	= SZ_4K, /* TODO: SZ_2M | SZ_1G */
+		.ias		= 48,
+		.oas		= 40,	/* Should this come from the DMA mask? */
+		.tlb		= &mmu_tlb_ops,
+		.iommu_dev	= pfdev->dev,
+	};
+
+	pgtbl_ops = alloc_io_pgtable_ops(ARM_MALI_LPAE, &pgtbl_cfg, pfdev);
+	if (!pgtbl_ops)
+		return -ENOMEM;
+
+	pfdev->mmu->pgtbl_ops = pgtbl_ops;
+
+	/* Need to revisit mem attrs. NC is the default, Mali driver is inner WT. */
+	mmu_enable(pfdev, 0, pgtbl_cfg.arm_lpae_s1_cfg.ttbr[0],
+		   pgtbl_cfg.arm_lpae_s1_cfg.mair[0]);
+
+	return 0;
+}
+
+void panfrost_mmu_fini(struct panfrost_device *pfdev)
+{
+}
diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.h b/drivers/gpu/drm/panfrost/panfrost_mmu.h
new file mode 100644
index 000000000000..32d01aaa3097
--- /dev/null
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.h
@@ -0,0 +1,15 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright 2019 Linaro, Ltd, Rob Herring <robh@kernel.org> */
+
+#ifndef __PANFROST_MMU_H__
+#define __PANFROST_MMU_H__
+
+struct panfrost_gem_object;
+
+int panfrost_mmu_map(struct panfrost_gem_object *bo);
+void panfrost_mmu_unmap(struct panfrost_gem_object *bo);
+
+int panfrost_mmu_init(struct panfrost_device *pfdev);
+void panfrost_mmu_fini(struct panfrost_device *pfdev);
+
+#endif
diff --git a/include/uapi/drm/panfrost_drm.h b/include/uapi/drm/panfrost_drm.h
new file mode 100644
index 000000000000..49eb6be8360d
--- /dev/null
+++ b/include/uapi/drm/panfrost_drm.h
@@ -0,0 +1,138 @@ 
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2014-2018 Broadcom
+ * Copyright © 2019 Collabora ltd.
+ */
+#ifndef _PANFROST_DRM_H_
+#define _PANFROST_DRM_H_
+
+#include "drm.h"
+
+#if defined(__cplusplus)
+extern "C" {
+#endif
+
+#define DRM_PANFROST_SUBMIT			0x00
+#define DRM_PANFROST_WAIT_BO			0x01
+#define DRM_PANFROST_CREATE_BO			0x02
+#define DRM_PANFROST_MMAP_BO			0x03
+#define DRM_PANFROST_GET_PARAM			0x04
+#define DRM_PANFROST_GET_BO_OFFSET		0x05
+
+#define DRM_IOCTL_PANFROST_SUBMIT		DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_SUBMIT, struct drm_panfrost_submit)
+#define DRM_IOCTL_PANFROST_WAIT_BO		DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_WAIT_BO, struct drm_panfrost_wait_bo)
+#define DRM_IOCTL_PANFROST_CREATE_BO		DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_CREATE_BO, struct drm_panfrost_create_bo)
+#define DRM_IOCTL_PANFROST_MMAP_BO		DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_MMAP_BO, struct drm_panfrost_mmap_bo)
+#define DRM_IOCTL_PANFROST_GET_PARAM		DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_GET_PARAM, struct drm_panfrost_get_param)
+#define DRM_IOCTL_PANFROST_GET_BO_OFFSET	DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_GET_BO_OFFSET, struct drm_panfrost_get_bo_offset)
+
+#define PANFROST_JD_REQ_FS (1 << 0)
+
+/**
+ * struct drm_panfrost_submit - ioctl argument for submitting commands to the 3D
+ * engine.
+ *
+ * This asks the kernel to have the GPU execute a render command list.
+ */
+struct drm_panfrost_submit {
+	/** GPU address of the job descriptor to execute. */
+	__u64 jc;
+
+	/** An optional sync object to wait on before starting this job. */
+	__u32 in_sync;
+
+	/** An optional sync object to place the completion fence in. */
+	__u32 out_sync;
+
+	/** Pointer to a u32 array of the BOs that are referenced by the job. */
+	__u64 bo_handles;
+
+	/** Number of BO handles passed in (size is that times 4). */
+	__u32 bo_handle_count;
+
+	/** A combination of PANFROST_JD_REQ_* */
+	__u32 requirements;
+};
+
+/**
+ * struct drm_panfrost_wait_bo - ioctl argument for waiting for
+ * completion of the last DRM_PANFROST_SUBMIT on a BO.
+ *
+ * This is useful for cases where multiple processes might be
+ * rendering to a BO and you want to wait for all rendering to be
+ * completed.
+ */
+struct drm_panfrost_wait_bo {
+	__u32 handle;
+	__u32 pad;
+	__s64 timeout_ns;	/* absolute */
+};
+
+/**
+ * struct drm_panfrost_create_bo - ioctl argument for creating Panfrost BOs.
+ *
+ * There are currently no values for the flags argument, but it may be
+ * used in a future extension.
+ */
+struct drm_panfrost_create_bo {
+	__u32 size;
+	__u32 flags;
+	/** Returned GEM handle for the BO. */
+	__u32 handle;
+	/**
+	 * Returned offset for the BO in the GPU address space.  This offset
+	 * is private to the DRM fd and is valid for the lifetime of the GEM
+	 * handle.
+	 *
+	 * This offset value will always be nonzero, since various HW
+	 * units treat 0 specially.
+	 */
+	__u32 offset;
+};
+
+/**
+ * struct drm_panfrost_mmap_bo - ioctl argument for mapping Panfrost BOs.
+ *
+ * This doesn't actually perform an mmap.  Instead, it returns the
+ * offset you need to use in an mmap on the DRM device node.  This
+ * means that tools like valgrind end up knowing about the mapped
+ * memory.
+ *
+ * There are currently no values for the flags argument, but it may be
+ * used in a future extension.
+ */
+struct drm_panfrost_mmap_bo {
+	/** Handle for the object being mapped. */
+	__u32 handle;
+	__u32 flags;
+	/** offset into the drm node to use for subsequent mmap call. */
+	__u64 offset;
+};
+
+enum drm_panfrost_param {
+	DRM_PANFROST_PARAM_GPU_ID,
+};
+
+struct drm_panfrost_get_param {
+	__u32 param;
+	__u32 pad;
+	__u64 value;
+};
+
+/**
+ * Returns the offset for the BO in the GPU address space for this DRM fd.
+ * This is the same value returned by drm_panfrost_create_bo, if that was called
+ * from this DRM fd.
+ */
+struct drm_panfrost_get_bo_offset {
+	__u32 handle;
+	__u32 pad;
+	__u64 offset;
+};
+
+#if defined(__cplusplus)
+}
+#endif
+
+#endif /* _PANFROST_DRM_H_ */