perf: add qcom l2 cache perf events driver

The L2 cache perf driver is named 'l2cache_counters' and can be used
with the perf tool to profile the following L2 cache events:

=> DDR read (Read-Shared, Read-Unique, Read-Clean and
   Read-Not-Shared-Dirty transactions on GNOC Interface)

=> DDR write (Write-Back, Write-Clean and Write-Evict transactions
   on GNOC Interface)

=> SNOOP Read (Read-Once, Read-Shared, Read-Unique, Read-Clean and
   Read-Not-Shared-Dirty transactions from GNOC to Cluster interface)

=> ACP Write (Write-Back, Write-Clean and Write-Evict transactions
   to the ACP port of the Collapsed Cluster)

=> Tenure counter (Low-Power mode tenure): counts the tenure of L2
   Low-Power mode as a number of 19.2 MHz XO clock cycles.

=> Low/Mid/High occurrence counters: the current tenure count is compared
   against the thresholds set for the low and mid tenure ranges, and the
   occurrence counter of the range it falls into is incremented:

   1. 0 < current tenure <= low-tenure threshold : Low-Tenure
   2. low-tenure threshold < current tenure <= mid-tenure threshold : Mid-Tenure
   3. mid-tenure threshold < current tenure : High-Tenure
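The sketch below illustrates the arithmetic and the range rules above. It converts a tenure count to nanoseconds (one 19.2 MHz XO tick is 625/12 ns, about 52.08 ns) and sorts a tenure into the three ranges. The function and type names are hypothetical. This is only a software model of the documented rules, not the hardware's or the driver's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: one 19.2 MHz XO tick lasts 1e9 / 19.2e6 = 625/12 ns
 * (about 52.08 ns); the integer result is truncated. */
uint64_t xo_ticks_to_ns(uint64_t ticks)
{
    return ticks * 625 / 12;
}

enum tenure_range { TENURE_NONE, TENURE_LOW, TENURE_MID, TENURE_HIGH };

/* Map a tenure count onto the three ranges listed above. */
enum tenure_range classify_tenure(uint64_t tenure, uint64_t low_thr,
                                  uint64_t mid_thr)
{
    assert(low_thr < mid_thr); /* low threshold must be below the mid one */
    if (tenure == 0)
        return TENURE_NONE; /* outside all ranges: 0 < tenure is required */
    if (tenure <= low_thr)
        return TENURE_LOW;
    if (tenure <= mid_thr)
        return TENURE_MID;
    return TENURE_HIGH;
}

/* Software model of one cluster's occurrence counters. */
struct occurrence_counts {
    uint64_t low, mid, high;
};

void record_tenure(struct occurrence_counts *c, uint64_t tenure,
                   uint64_t low_thr, uint64_t mid_thr)
{
    switch (classify_tenure(tenure, low_thr, mid_thr)) {
    case TENURE_LOW:
        c->low++;
        break;
    case TENURE_MID:
        c->mid++;
        break;
    case TENURE_HIGH:
        c->high++;
        break;
    case TENURE_NONE:
        break;
    }
}
```

For example, with a low-tenure threshold of 10 and a mid-tenure threshold of 50, a tenure of 11 ticks (about 573 ns) lands in the Mid-Tenure range.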

Change-Id: I9f8aedd21a92cbd6908deb5a8e4c7e32220bea74
Signed-off-by: Mukesh Ojha <mojha@codeaurora.org>
Mukesh Ojha 2019-11-13 13:57:55 +05:30
parent 3081488d36
commit 5745954a5a
4 changed files with 1283 additions and 0 deletions


@@ -0,0 +1,63 @@
Qualcomm Technologies, Inc. L2 Cache Counters
=============================================

This driver supports the L2 cache cluster counters found in Qualcomm
Technologies, Inc. processors.

There are multiple physical L2 cache clusters, each with its own
counters, and each cluster has one or more CPUs associated with it.
There is one logical L2 PMU exposed, which aggregates the results from
the physical PMUs (counters).

The driver provides a description of its available events and
configuration options in sysfs; see /sys/devices/l2cache_counters.

The "format" directory describes the format of the events. Each event is
a value of the form 0xXXX, where:

  1 bit (LSB) selects the group (either transaction or tenure counter),
  4 bits give the counter's serial number, from 0 to 8, and
  5 bits give the bit position of the counter's enable bit in a register.
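The description gives the width of each field but not its exact bit position. The sketch below assumes the fields are packed contiguously from the LSB in the order listed: the group in bit 0, the serial number in bits 1-4 and the enable-bit position in bits 5-9. That makes 10 bits in total, which fits in 0xXXX. The macro and function names are hypothetical. The example value 0x167 is also made up and does not correspond to a documented event.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED layout: fields packed from the LSB in the documented order. */
#define L2C_GROUP_SHIFT  0
#define L2C_GROUP_MASK   0x1u  /* 1 bit: transaction or tenure group */
#define L2C_ID_SHIFT     1
#define L2C_ID_MASK      0xfu  /* 4 bits: counter serial number 0..8 */
#define L2C_REGBIT_SHIFT 5
#define L2C_REGBIT_MASK  0x1fu /* 5 bits: enable-bit position 0..31 */

struct l2c_event {
    unsigned int group, id, regbit;
};

/* Pack the three fields into an event value (hypothetical helper). */
uint32_t l2c_encode(struct l2c_event ev)
{
    assert(ev.group <= L2C_GROUP_MASK);
    assert(ev.id <= 8); /* serial numbers run from 0 to 8 */
    assert(ev.regbit <= L2C_REGBIT_MASK);
    return (ev.group << L2C_GROUP_SHIFT) |
           (ev.id << L2C_ID_SHIFT) |
           (ev.regbit << L2C_REGBIT_SHIFT);
}

/* Split an event value back into its fields (hypothetical helper). */
struct l2c_event l2c_decode(uint32_t config)
{
    struct l2c_event ev = {
        .group = (config >> L2C_GROUP_SHIFT) & L2C_GROUP_MASK,
        .id = (config >> L2C_ID_SHIFT) & L2C_ID_MASK,
        .regbit = (config >> L2C_REGBIT_SHIFT) & L2C_REGBIT_MASK,
    };
    return ev;
}
```

Under this assumed layout, group 1, serial number 3 and enable bit 11 encode to 1 | (3 << 1) | (11 << 5) = 0x167.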

The driver provides a "cpumask" sysfs attribute, which contains a mask
consisting of one CPU per cluster; that CPU handles all the PMU events
on its cluster.
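The sketch below shows one way to picture the cpumask attribute. It picks the lowest-numbered online CPU of each cluster. That selection policy, the topology arrays and all names are assumptions made for illustration. This document does not describe how the driver actually chooses the CPU.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the "cpumask" attribute: for each cluster, select the
 * lowest-numbered online CPU. cpu_cluster[i] is CPU i's cluster index and
 * cpu_online[i] is nonzero if CPU i is online. Bit n of the result = CPU n. */
uint64_t l2c_pick_cpumask(const int *cpu_cluster, const int *cpu_online,
                          int nr_cpus, int nr_clusters)
{
    uint64_t mask = 0;
    uint64_t covered = 0; /* bit c is set once cluster c has its CPU */

    assert(nr_cpus <= 64 && nr_clusters <= 64);
    for (int cpu = 0; cpu < nr_cpus; cpu++) {
        int c = cpu_cluster[cpu];

        assert(c >= 0 && c < nr_clusters);
        if (!cpu_online[cpu] || (covered & (1ULL << c)))
            continue;
        covered |= 1ULL << c;
        mask |= 1ULL << cpu;
    }
    return mask;
}
```

Take a hypothetical layout with CPUs 0-3 in cluster 0 and CPUs 4-7 in cluster 1, all online. The mask is 0x11 (CPUs 0 and 4). With CPU 4 offline it becomes 0x21.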

Examples for use with perf:

  perf stat -e l2cache_counters/ddr_read/,l2cache_counters/ddr_write/ -a sleep 1
  perf stat -e l2cache_counters/cycles/ -C 2 sleep 1

Limitation: the driver does not support sampling, so "perf record" will
not work. Per-task perf sessions are not supported either.
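perf stat opens each event through the perf_event_open(2) system call in counting mode. The sketch below shows the equivalent raw calls. It assumes only what this document states:

* the PMU lives under /sys/bus/event_source/devices/l2cache_counters, whose standard "type" file holds the PMU type id;
* sampling is unsupported, so sample_period stays 0;
* events must be CPU-wide (pid = -1, cpu >= 0), because per-task sessions are not supported.

The config value has to come from the driver's sysfs event description. The helper names are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for syscall() */
#endif
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Every event source device exposes its dynamic PMU type id here. */
#define L2C_TYPE_PATH "/sys/bus/event_source/devices/l2cache_counters/type"

/* Read the PMU type id; returns 0 on success, -1 if unavailable. */
int l2c_read_pmu_type(uint32_t *type)
{
    unsigned int val;
    int ok;
    FILE *f = fopen(L2C_TYPE_PATH, "r");

    if (!f)
        return -1;
    ok = fscanf(f, "%u", &val) == 1;
    fclose(f);
    if (!ok)
        return -1;
    *type = val;
    return 0;
}

/* Prepare a counting-mode (non-sampling) event for this PMU. */
void l2c_fill_attr(struct perf_event_attr *attr, uint32_t type,
                   uint64_t config)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = type;
    attr->config = config;
    attr->disabled = 1; /* enable later with PERF_EVENT_IOC_ENABLE */
    /* sample_period and freq remain 0: sampling is not supported. */
}

/* Open the event CPU-wide on one CPU; returns an fd, or -1 with errno set. */
int l2c_open_on_cpu(struct perf_event_attr *attr, int cpu)
{
    assert(cpu >= 0); /* per-task events (pid >= 0, cpu == -1) are unsupported */
    return (int)syscall(SYS_perf_event_open, attr, -1, cpu, -1, 0UL);
}
```

After a successful open, enable the counter with ioctl(fd, PERF_EVENT_IOC_ENABLE, 0) and read the 64-bit count with read(fd, &count, sizeof(count)). This roughly mirrors "perf stat -C <cpu>".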

Transaction counters need no configuration before monitoring.

For the tenure counters, the low and mid tenure thresholds must first be
set in sysfs, per cluster, because the occurrence counters exist for each
cluster. Select the cluster with which_cluster_tenure, then write the
thresholds:

  echo 1 > /sys/bus/event_source/devices/l2cache_counters/which_cluster_tenure
  echo X > /sys/bus/event_source/devices/l2cache_counters/low_tenure_threshold
  echo Y > /sys/bus/event_source/devices/l2cache_counters/mid_tenure_threshold

where X must be less than Y.

For example:

  perf stat -e l2cache_counters/low_range_occur/ \
            -e l2cache_counters/mid_range_occur/ \
            -e l2cache_counters/high_range_occur/ -C 4 sleep 10

   Performance counter stats for 'CPU(s) 4':

                 7      l2cache_counters/low_range_occur/
                 5      l2cache_counters/mid_range_occur/
                 7      l2cache_counters/high_range_occur/

      10.204140400 seconds time elapsed
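The threshold setup above can also be done from C. The sketch below uses hypothetical helper names. It writes the same three sysfs attributes, in the same order as the echo commands, and refuses thresholds that break the X < Y rule. It assumes that which_cluster_tenure selects the cluster the thresholds apply to. The directory is a parameter so the logic can be tried outside sysfs. Writing the real attributes presumably requires root.

```c
#include <assert.h>
#include <stdio.h>

/* Default location used by the echo commands above. */
#define L2C_SYSFS_DIR "/sys/bus/event_source/devices/l2cache_counters"

/* Write "<val>\n" to <dir>/<attr>, as echo does; returns 0 on success. */
static int l2c_write_attr(const char *dir, const char *attr, unsigned int val)
{
    char path[512];
    FILE *f;
    int ok;
    int n = snprintf(path, sizeof(path), "%s/%s", dir, attr);

    if (n < 0 || n >= (int)sizeof(path))
        return -1;
    f = fopen(path, "w");
    if (!f)
        return -1;
    ok = fprintf(f, "%u\n", val) > 0;
    /* Buffered data reaches the attribute on fclose(), where a rejected
     * store is reported, so check its result too. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Select the cluster (assumed meaning of which_cluster_tenure), then set
 * its low and mid thresholds in the order shown above. */
int l2c_set_tenure_thresholds(const char *dir, unsigned int cluster,
                              unsigned int low, unsigned int mid)
{
    assert(dir != NULL);
    if (low >= mid) /* the thresholds must satisfy X < Y */
        return -1;
    if (l2c_write_attr(dir, "which_cluster_tenure", cluster) ||
        l2c_write_attr(dir, "low_tenure_threshold", low) ||
        l2c_write_attr(dir, "mid_tenure_threshold", mid))
        return -1;
    return 0;
}
```

On a target, l2c_set_tenure_thresholds(L2C_SYSFS_DIR, 1, X, Y) corresponds to the three echo commands.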


@@ -77,6 +77,15 @@ config QCOM_L2_PMU
	  Adds the L2 cache PMU into the perf events subsystem for
	  monitoring L2 cache events.

config QCOM_L2_COUNTERS
	bool "Qualcomm Technologies L2-cache counters (PMU)"
	depends on ARCH_QCOM && ARM64
	help
	  Provides support for the L2 cache counters
	  in Qualcomm Technologies processors.
	  Adds the L2 cache counters support into the perf events subsystem for
	  monitoring L2 cache events.

config QCOM_L3_PMU
	bool "Qualcomm Technologies L3-cache PMU"
	depends on ARCH_QCOM && ARM64 && ACPI


@@ -6,6 +6,7 @@ obj-$(CONFIG_ARM_PMU) += arm_pmu.o arm_pmu_platform.o
obj-$(CONFIG_ARM_PMU_ACPI) += arm_pmu_acpi.o
obj-$(CONFIG_HISI_PMU) += hisilicon/
obj-$(CONFIG_QCOM_L2_PMU) += qcom_l2_pmu.o
obj-$(CONFIG_QCOM_L2_COUNTERS) += qcom_l2_counters.o
obj-$(CONFIG_QCOM_L3_PMU) += qcom_l3_pmu.o
obj-$(CONFIG_QCOM_LLCC_PMU) += qcom_llcc_pmu.o
obj-$(CONFIG_XGENE_PMU) += xgene_pmu.o

File diff suppressed because it is too large