mm: introduce per-process mm event tracking feature
Linux supports /proc/meminfo and /proc/vmstat stats as memory health metric.
Android uses them too. If user see something goes wrong(e.g., sluggish, jank)
on their system, they can capture and report system state to developers
for debugging.
It shows memory stat at the moment the bug is captured. However, it’s
not enough to investigate application's jank problem caused by memory
shortage. Because
1. It just shows event count which doesn’t quantify the latency of the
application well. Jank could happen by various reasons and one of simple
scenario is frame drop for a second. App should draw the frame every 16ms
interval. Just number of stats(e.g., allocstall or pgmajfault) couldn't
represnt how many of time the app spends for handling the event.
2. At bugreport, dump with vmstat and meminfo is never helpful because it's
too late to capture the moment when the problem happens.
When the user catch up the problem and try to capture the system state,
the problem has already gone.
3. Although we could capture MM stat at the moment bug happens, it couldn't
be helpful because MM stats are usually very flucuate so we need historical
data rather than one-time snapshot to see MM trend.
To solve above problems, this patch introduces per-process, light-weight,
mm event stat. Basically, it tracks minor/major faults, reclaim and compaction
latency of each process as well as event count and record the data into global
buffer.
To compromise memory overhead, it doesn't record every MM event of the process
to the buffer but just drain accumuated stats every 0.5sec interval to buffer.
If there isn't any event, it just skips the recording.
For latency data, it keeps average/max latency of each event in that period
With that, we could keep useful information with small buffer so that
we couldn't miss precious information any longer although the capture time
is rather late. This patch introduces basic facility of MM event stat.
After all patches in this patchset are applied, outout format is as follows,
dumpstate can use it for VM debugging in future.
<...>-1665 [001] d... 217.575173: mm_event_record: min_flt count=203 avg_lat=3 max_lat=58
<...>-1665 [001] d... 217.575183: mm_event_record: maj_flt count=1 avg_lat=1994 max_lat=1994
<...>-1665 [001] d... 217.575184: mm_event_record: kern_alloc count=227 avg_lat=0 max_lat=0
<...>-626 [000] d... 217.578096: mm_event_record: kern_alloc count=4 avg_lat=0 max_lat=0
<...>-6547 [000] .... 217.581913: mm_event_record: min_flt count=7 avg_lat=7 max_lat=20
<...>-6547 [000] .... 217.581955: mm_event_record: kern_alloc count=4 avg_lat=0 max_lat=0
This feature uses event trace for output buffer so that we could use all of
general benefit of event trace(e.g., buffer size management, filtering and
so on). To prevent overflow of the ring buffer by other random event race,
highly suggest that create separate instance of tracing
on /sys/kernel/debug/tracing/instances/
I had a concern of adding overhead. Actually, major|compaction/reclaim
are already heavy cost so it should be not a concern. Rather than,
minor fault and kern alloc would be severe so I tested a micro benchmark
to measure minor page fault overhead.
Test scenario is create 40 threads and each of them does minor
page fault for 25M range(ranges are not overwrapped).
I didn't see any noticible regression.
Base:
fault/wsec avg: 758489.8288
minor faults=13123118, major faults=0 ctx switch=139234
User System Wall fault/wsec
39.55s 41.73s 17.49s 749995.768
minor faults=13123135, major faults=0 ctx switch=139627
User System Wall fault/wsec
34.59s 41.61s 16.95s 773906.976
minor faults=13123061, major faults=0 ctx switch=139254
User System Wall fault/wsec
39.03s 41.55s 16.97s 772966.334
minor faults=13123131, major faults=0 ctx switch=139970
User System Wall fault/wsec
36.71s 42.12s 17.04s 769941.019
minor faults=13123027, major faults=0 ctx switch=138524
User System Wall fault/wsec
42.08s 42.24s 18.08s 725639.047
Base + MM event + event trace enable:
fault/wsec avg: 759626.1488
minor faults=13123488, major faults=0 ctx switch=140303
User System Wall fault/wsec
37.66s 42.21s 17.48s 750414.257
minor faults=13123066, major faults=0 ctx switch=138119
User System Wall fault/wsec
36.77s 42.14s 17.49s 750010.107
minor faults=13123505, major faults=0 ctx switch=140021
User System Wall fault/wsec
38.51s 42.50s 17.54s 748022.219
minor faults=13123431, major faults=0 ctx switch=138517
User System Wall fault/wsec
36.74s 41.49s 17.03s 770255.610
minor faults=13122955, major faults=0 ctx switch=137174
User System Wall fault/wsec
40.68s 40.97s 16.83s 779428.551
Bug: 80168800
Bug: 116825053
Bug: 153442668
Test: boot
Change-Id: I4e69c994f47402766481c58ab5ec2071180964b8
Signed-off-by: Minchan Kim <minchan@google.com>
(cherry picked from commit 04ff5ec537a5f9f546dcb32257d8fbc1f4d9ca2d)
Signed-off-by: Martin Liu <liumartin@google.com>
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
2019-04-22 18:04:59 +02:00
|
|
|
#include <linux/mm.h>
|
|
|
|
#include <linux/mm_event.h>
|
|
|
|
#include <linux/sched.h>
|
|
|
|
#include <linux/vmalloc.h>
|
|
|
|
#include <linux/seq_file.h>
|
|
|
|
#include <linux/debugfs.h>
|
|
|
|
|
|
|
|
#define CREATE_TRACE_POINTS
|
|
|
|
#include <trace/events/mm_event.h>
|
2018-08-06 08:02:06 +02:00
|
|
|
/* msec */
|
2019-01-17 03:14:14 +01:00
|
|
|
static unsigned long period_ms __read_mostly = 500;
|
|
|
|
static unsigned long vmstat_period_ms __read_mostly = 1000;
|
2019-01-15 05:54:07 +01:00
|
|
|
static unsigned long vmstat_next_period;
|
mm: introduce per-process mm event tracking feature
Linux supports /proc/meminfo and /proc/vmstat stats as memory health metric.
Android uses them too. If user see something goes wrong(e.g., sluggish, jank)
on their system, they can capture and report system state to developers
for debugging.
It shows memory stat at the moment the bug is captured. However, it’s
not enough to investigate application's jank problem caused by memory
shortage. Because
1. It just shows event count which doesn’t quantify the latency of the
application well. Jank could happen by various reasons and one of simple
scenario is frame drop for a second. App should draw the frame every 16ms
interval. Just number of stats(e.g., allocstall or pgmajfault) couldn't
represnt how many of time the app spends for handling the event.
2. At bugreport, dump with vmstat and meminfo is never helpful because it's
too late to capture the moment when the problem happens.
When the user catch up the problem and try to capture the system state,
the problem has already gone.
3. Although we could capture MM stat at the moment bug happens, it couldn't
be helpful because MM stats are usually very flucuate so we need historical
data rather than one-time snapshot to see MM trend.
To solve above problems, this patch introduces per-process, light-weight,
mm event stat. Basically, it tracks minor/major faults, reclaim and compaction
latency of each process as well as event count and record the data into global
buffer.
To compromise memory overhead, it doesn't record every MM event of the process
to the buffer but just drain accumuated stats every 0.5sec interval to buffer.
If there isn't any event, it just skips the recording.
For latency data, it keeps average/max latency of each event in that period
With that, we could keep useful information with small buffer so that
we couldn't miss precious information any longer although the capture time
is rather late. This patch introduces basic facility of MM event stat.
After all patches in this patchset are applied, outout format is as follows,
dumpstate can use it for VM debugging in future.
<...>-1665 [001] d... 217.575173: mm_event_record: min_flt count=203 avg_lat=3 max_lat=58
<...>-1665 [001] d... 217.575183: mm_event_record: maj_flt count=1 avg_lat=1994 max_lat=1994
<...>-1665 [001] d... 217.575184: mm_event_record: kern_alloc count=227 avg_lat=0 max_lat=0
<...>-626 [000] d... 217.578096: mm_event_record: kern_alloc count=4 avg_lat=0 max_lat=0
<...>-6547 [000] .... 217.581913: mm_event_record: min_flt count=7 avg_lat=7 max_lat=20
<...>-6547 [000] .... 217.581955: mm_event_record: kern_alloc count=4 avg_lat=0 max_lat=0
This feature uses event trace for output buffer so that we could use all of
general benefit of event trace(e.g., buffer size management, filtering and
so on). To prevent overflow of the ring buffer by other random event race,
highly suggest that create separate instance of tracing
on /sys/kernel/debug/tracing/instances/
I had a concern of adding overhead. Actually, major|compaction/reclaim
are already heavy cost so it should be not a concern. Rather than,
minor fault and kern alloc would be severe so I tested a micro benchmark
to measure minor page fault overhead.
Test scenario is create 40 threads and each of them does minor
page fault for 25M range(ranges are not overwrapped).
I didn't see any noticible regression.
Base:
fault/wsec avg: 758489.8288
minor faults=13123118, major faults=0 ctx switch=139234
User System Wall fault/wsec
39.55s 41.73s 17.49s 749995.768
minor faults=13123135, major faults=0 ctx switch=139627
User System Wall fault/wsec
34.59s 41.61s 16.95s 773906.976
minor faults=13123061, major faults=0 ctx switch=139254
User System Wall fault/wsec
39.03s 41.55s 16.97s 772966.334
minor faults=13123131, major faults=0 ctx switch=139970
User System Wall fault/wsec
36.71s 42.12s 17.04s 769941.019
minor faults=13123027, major faults=0 ctx switch=138524
User System Wall fault/wsec
42.08s 42.24s 18.08s 725639.047
Base + MM event + event trace enable:
fault/wsec avg: 759626.1488
minor faults=13123488, major faults=0 ctx switch=140303
User System Wall fault/wsec
37.66s 42.21s 17.48s 750414.257
minor faults=13123066, major faults=0 ctx switch=138119
User System Wall fault/wsec
36.77s 42.14s 17.49s 750010.107
minor faults=13123505, major faults=0 ctx switch=140021
User System Wall fault/wsec
38.51s 42.50s 17.54s 748022.219
minor faults=13123431, major faults=0 ctx switch=138517
User System Wall fault/wsec
36.74s 41.49s 17.03s 770255.610
minor faults=13122955, major faults=0 ctx switch=137174
User System Wall fault/wsec
40.68s 40.97s 16.83s 779428.551
Bug: 80168800
Bug: 116825053
Bug: 153442668
Test: boot
Change-Id: I4e69c994f47402766481c58ab5ec2071180964b8
Signed-off-by: Minchan Kim <minchan@google.com>
(cherry picked from commit 04ff5ec537a5f9f546dcb32257d8fbc1f4d9ca2d)
Signed-off-by: Martin Liu <liumartin@google.com>
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
2019-04-22 18:04:59 +02:00
|
|
|
|
2019-01-17 03:14:14 +01:00
|
|
|
static DEFINE_SPINLOCK(vmstat_lock);
|
|
|
|
static DEFINE_RWLOCK(period_lock);
|
|
|
|
|
mm: introduce per-process mm event tracking feature
Linux supports /proc/meminfo and /proc/vmstat stats as memory health metric.
Android uses them too. If user see something goes wrong(e.g., sluggish, jank)
on their system, they can capture and report system state to developers
for debugging.
It shows memory stat at the moment the bug is captured. However, it’s
not enough to investigate application's jank problem caused by memory
shortage. Because
1. It just shows event count which doesn’t quantify the latency of the
application well. Jank could happen by various reasons and one of simple
scenario is frame drop for a second. App should draw the frame every 16ms
interval. Just number of stats(e.g., allocstall or pgmajfault) couldn't
represnt how many of time the app spends for handling the event.
2. At bugreport, dump with vmstat and meminfo is never helpful because it's
too late to capture the moment when the problem happens.
When the user catch up the problem and try to capture the system state,
the problem has already gone.
3. Although we could capture MM stat at the moment bug happens, it couldn't
be helpful because MM stats are usually very flucuate so we need historical
data rather than one-time snapshot to see MM trend.
To solve above problems, this patch introduces per-process, light-weight,
mm event stat. Basically, it tracks minor/major faults, reclaim and compaction
latency of each process as well as event count and record the data into global
buffer.
To compromise memory overhead, it doesn't record every MM event of the process
to the buffer but just drain accumuated stats every 0.5sec interval to buffer.
If there isn't any event, it just skips the recording.
For latency data, it keeps average/max latency of each event in that period
With that, we could keep useful information with small buffer so that
we couldn't miss precious information any longer although the capture time
is rather late. This patch introduces basic facility of MM event stat.
After all patches in this patchset are applied, outout format is as follows,
dumpstate can use it for VM debugging in future.
<...>-1665 [001] d... 217.575173: mm_event_record: min_flt count=203 avg_lat=3 max_lat=58
<...>-1665 [001] d... 217.575183: mm_event_record: maj_flt count=1 avg_lat=1994 max_lat=1994
<...>-1665 [001] d... 217.575184: mm_event_record: kern_alloc count=227 avg_lat=0 max_lat=0
<...>-626 [000] d... 217.578096: mm_event_record: kern_alloc count=4 avg_lat=0 max_lat=0
<...>-6547 [000] .... 217.581913: mm_event_record: min_flt count=7 avg_lat=7 max_lat=20
<...>-6547 [000] .... 217.581955: mm_event_record: kern_alloc count=4 avg_lat=0 max_lat=0
This feature uses event trace for output buffer so that we could use all of
general benefit of event trace(e.g., buffer size management, filtering and
so on). To prevent overflow of the ring buffer by other random event race,
highly suggest that create separate instance of tracing
on /sys/kernel/debug/tracing/instances/
I had a concern of adding overhead. Actually, major|compaction/reclaim
are already heavy cost so it should be not a concern. Rather than,
minor fault and kern alloc would be severe so I tested a micro benchmark
to measure minor page fault overhead.
Test scenario is create 40 threads and each of them does minor
page fault for 25M range(ranges are not overwrapped).
I didn't see any noticible regression.
Base:
fault/wsec avg: 758489.8288
minor faults=13123118, major faults=0 ctx switch=139234
User System Wall fault/wsec
39.55s 41.73s 17.49s 749995.768
minor faults=13123135, major faults=0 ctx switch=139627
User System Wall fault/wsec
34.59s 41.61s 16.95s 773906.976
minor faults=13123061, major faults=0 ctx switch=139254
User System Wall fault/wsec
39.03s 41.55s 16.97s 772966.334
minor faults=13123131, major faults=0 ctx switch=139970
User System Wall fault/wsec
36.71s 42.12s 17.04s 769941.019
minor faults=13123027, major faults=0 ctx switch=138524
User System Wall fault/wsec
42.08s 42.24s 18.08s 725639.047
Base + MM event + event trace enable:
fault/wsec avg: 759626.1488
minor faults=13123488, major faults=0 ctx switch=140303
User System Wall fault/wsec
37.66s 42.21s 17.48s 750414.257
minor faults=13123066, major faults=0 ctx switch=138119
User System Wall fault/wsec
36.77s 42.14s 17.49s 750010.107
minor faults=13123505, major faults=0 ctx switch=140021
User System Wall fault/wsec
38.51s 42.50s 17.54s 748022.219
minor faults=13123431, major faults=0 ctx switch=138517
User System Wall fault/wsec
36.74s 41.49s 17.03s 770255.610
minor faults=13122955, major faults=0 ctx switch=137174
User System Wall fault/wsec
40.68s 40.97s 16.83s 779428.551
Bug: 80168800
Bug: 116825053
Bug: 153442668
Test: boot
Change-Id: I4e69c994f47402766481c58ab5ec2071180964b8
Signed-off-by: Minchan Kim <minchan@google.com>
(cherry picked from commit 04ff5ec537a5f9f546dcb32257d8fbc1f4d9ca2d)
Signed-off-by: Martin Liu <liumartin@google.com>
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
2019-04-22 18:04:59 +02:00
|
|
|
void mm_event_task_init(struct task_struct *tsk)
|
|
|
|
{
|
|
|
|
memset(tsk->mm_event, 0, sizeof(tsk->mm_event));
|
|
|
|
tsk->next_period = 0;
|
|
|
|
}
|
|
|
|
|
2019-01-15 05:54:07 +01:00
|
|
|
static void record_vmstat(void)
|
|
|
|
{
|
|
|
|
int cpu;
|
|
|
|
struct mm_event_vmstat vmstat;
|
|
|
|
|
2019-01-17 03:14:14 +01:00
|
|
|
if (time_is_after_jiffies(vmstat_next_period))
|
2019-01-15 05:54:07 +01:00
|
|
|
return;
|
|
|
|
|
|
|
|
/* Need double check under the lock */
|
|
|
|
spin_lock(&vmstat_lock);
|
2019-01-17 03:14:14 +01:00
|
|
|
if (time_is_after_jiffies(vmstat_next_period)) {
|
2019-01-15 05:54:07 +01:00
|
|
|
spin_unlock(&vmstat_lock);
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
vmstat_next_period = jiffies + msecs_to_jiffies(vmstat_period_ms);
|
|
|
|
spin_unlock(&vmstat_lock);
|
|
|
|
|
|
|
|
memset(&vmstat, 0, sizeof(vmstat));
|
|
|
|
vmstat.free = global_zone_page_state(NR_FREE_PAGES);
|
|
|
|
vmstat.slab = global_node_page_state(NR_SLAB_RECLAIMABLE) +
|
|
|
|
global_node_page_state(NR_SLAB_UNRECLAIMABLE);
|
|
|
|
|
|
|
|
vmstat.file = global_node_page_state(NR_ACTIVE_FILE) +
|
|
|
|
global_node_page_state(NR_INACTIVE_FILE);
|
|
|
|
vmstat.anon = global_node_page_state(NR_ACTIVE_ANON) +
|
|
|
|
global_node_page_state(NR_INACTIVE_ANON);
|
2019-04-24 08:12:20 +02:00
|
|
|
vmstat.ion = global_node_page_state(NR_ION_HEAP);
|
2019-01-15 05:54:07 +01:00
|
|
|
|
|
|
|
vmstat.ws_refault = global_node_page_state(WORKINGSET_REFAULT);
|
|
|
|
vmstat.ws_activate = global_node_page_state(WORKINGSET_ACTIVATE);
|
|
|
|
vmstat.mapped = global_node_page_state(NR_FILE_MAPPED);
|
|
|
|
|
|
|
|
for_each_online_cpu(cpu) {
|
|
|
|
struct vm_event_state *this = &per_cpu(vm_event_states, cpu);
|
|
|
|
|
|
|
|
/* sectors to kbytes for PGPGIN/PGPGOUT */
|
|
|
|
vmstat.pgin += this->event[PGPGIN] / 2;
|
|
|
|
vmstat.pgout += this->event[PGPGOUT] / 2;
|
|
|
|
vmstat.swpin += this->event[PSWPIN];
|
|
|
|
vmstat.swpout += this->event[PSWPOUT];
|
|
|
|
vmstat.reclaim_steal += this->event[PGSTEAL_DIRECT] +
|
|
|
|
this->event[PGSTEAL_KSWAPD];
|
|
|
|
vmstat.reclaim_scan += this->event[PGSCAN_DIRECT] +
|
|
|
|
this->event[PGSCAN_KSWAPD];
|
|
|
|
vmstat.compact_scan += this->event[COMPACTFREE_SCANNED] +
|
2019-01-28 12:00:33 +01:00
|
|
|
this->event[COMPACTMIGRATE_SCANNED];
|
2019-01-15 05:54:07 +01:00
|
|
|
}
|
|
|
|
trace_mm_event_vmstat_record(&vmstat);
|
|
|
|
}
|
|
|
|
|
mm: introduce per-process mm event tracking feature
Linux supports /proc/meminfo and /proc/vmstat stats as memory health metric.
Android uses them too. If user see something goes wrong(e.g., sluggish, jank)
on their system, they can capture and report system state to developers
for debugging.
It shows memory stat at the moment the bug is captured. However, it’s
not enough to investigate application's jank problem caused by memory
shortage. Because
1. It just shows event count which doesn’t quantify the latency of the
application well. Jank could happen by various reasons and one of simple
scenario is frame drop for a second. App should draw the frame every 16ms
interval. Just number of stats(e.g., allocstall or pgmajfault) couldn't
represnt how many of time the app spends for handling the event.
2. At bugreport, dump with vmstat and meminfo is never helpful because it's
too late to capture the moment when the problem happens.
When the user catch up the problem and try to capture the system state,
the problem has already gone.
3. Although we could capture MM stat at the moment bug happens, it couldn't
be helpful because MM stats are usually very flucuate so we need historical
data rather than one-time snapshot to see MM trend.
To solve above problems, this patch introduces per-process, light-weight,
mm event stat. Basically, it tracks minor/major faults, reclaim and compaction
latency of each process as well as event count and record the data into global
buffer.
To compromise memory overhead, it doesn't record every MM event of the process
to the buffer but just drain accumuated stats every 0.5sec interval to buffer.
If there isn't any event, it just skips the recording.
For latency data, it keeps average/max latency of each event in that period
With that, we could keep useful information with small buffer so that
we couldn't miss precious information any longer although the capture time
is rather late. This patch introduces basic facility of MM event stat.
After all patches in this patchset are applied, outout format is as follows,
dumpstate can use it for VM debugging in future.
<...>-1665 [001] d... 217.575173: mm_event_record: min_flt count=203 avg_lat=3 max_lat=58
<...>-1665 [001] d... 217.575183: mm_event_record: maj_flt count=1 avg_lat=1994 max_lat=1994
<...>-1665 [001] d... 217.575184: mm_event_record: kern_alloc count=227 avg_lat=0 max_lat=0
<...>-626 [000] d... 217.578096: mm_event_record: kern_alloc count=4 avg_lat=0 max_lat=0
<...>-6547 [000] .... 217.581913: mm_event_record: min_flt count=7 avg_lat=7 max_lat=20
<...>-6547 [000] .... 217.581955: mm_event_record: kern_alloc count=4 avg_lat=0 max_lat=0
This feature uses event trace for output buffer so that we could use all of
general benefit of event trace(e.g., buffer size management, filtering and
so on). To prevent overflow of the ring buffer by other random event race,
highly suggest that create separate instance of tracing
on /sys/kernel/debug/tracing/instances/
I had a concern of adding overhead. Actually, major|compaction/reclaim
are already heavy cost so it should be not a concern. Rather than,
minor fault and kern alloc would be severe so I tested a micro benchmark
to measure minor page fault overhead.
Test scenario is create 40 threads and each of them does minor
page fault for 25M range(ranges are not overwrapped).
I didn't see any noticible regression.
Base:
fault/wsec avg: 758489.8288
minor faults=13123118, major faults=0 ctx switch=139234
User System Wall fault/wsec
39.55s 41.73s 17.49s 749995.768
minor faults=13123135, major faults=0 ctx switch=139627
User System Wall fault/wsec
34.59s 41.61s 16.95s 773906.976
minor faults=13123061, major faults=0 ctx switch=139254
User System Wall fault/wsec
39.03s 41.55s 16.97s 772966.334
minor faults=13123131, major faults=0 ctx switch=139970
User System Wall fault/wsec
36.71s 42.12s 17.04s 769941.019
minor faults=13123027, major faults=0 ctx switch=138524
User System Wall fault/wsec
42.08s 42.24s 18.08s 725639.047
Base + MM event + event trace enable:
fault/wsec avg: 759626.1488
minor faults=13123488, major faults=0 ctx switch=140303
User System Wall fault/wsec
37.66s 42.21s 17.48s 750414.257
minor faults=13123066, major faults=0 ctx switch=138119
User System Wall fault/wsec
36.77s 42.14s 17.49s 750010.107
minor faults=13123505, major faults=0 ctx switch=140021
User System Wall fault/wsec
38.51s 42.50s 17.54s 748022.219
minor faults=13123431, major faults=0 ctx switch=138517
User System Wall fault/wsec
36.74s 41.49s 17.03s 770255.610
minor faults=13122955, major faults=0 ctx switch=137174
User System Wall fault/wsec
40.68s 40.97s 16.83s 779428.551
Bug: 80168800
Bug: 116825053
Bug: 153442668
Test: boot
Change-Id: I4e69c994f47402766481c58ab5ec2071180964b8
Signed-off-by: Minchan Kim <minchan@google.com>
(cherry picked from commit 04ff5ec537a5f9f546dcb32257d8fbc1f4d9ca2d)
Signed-off-by: Martin Liu <liumartin@google.com>
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
2019-04-22 18:04:59 +02:00
|
|
|
static void record_stat(void)
|
|
|
|
{
|
2019-01-17 03:14:14 +01:00
|
|
|
int i;
|
|
|
|
|
|
|
|
if (time_is_after_jiffies(current->next_period))
|
|
|
|
return;
|
|
|
|
|
|
|
|
read_lock(&period_lock);
|
|
|
|
current->next_period = jiffies + msecs_to_jiffies(period_ms);
|
|
|
|
read_unlock(&period_lock);
|
|
|
|
|
|
|
|
for (i = 0; i < MM_TYPE_NUM; i++) {
|
|
|
|
if (current->mm_event[i].count == 0)
|
|
|
|
continue;
|
|
|
|
trace_mm_event_record(i, ¤t->mm_event[i]);
|
|
|
|
memset(¤t->mm_event[i], 0,
|
|
|
|
sizeof(struct mm_event_task));
|
mm: introduce per-process mm event tracking feature
Linux supports /proc/meminfo and /proc/vmstat stats as memory health metric.
Android uses them too. If user see something goes wrong(e.g., sluggish, jank)
on their system, they can capture and report system state to developers
for debugging.
It shows memory stat at the moment the bug is captured. However, it’s
not enough to investigate application's jank problem caused by memory
shortage. Because
1. It just shows event count which doesn’t quantify the latency of the
application well. Jank could happen by various reasons and one of simple
scenario is frame drop for a second. App should draw the frame every 16ms
interval. Just number of stats(e.g., allocstall or pgmajfault) couldn't
represnt how many of time the app spends for handling the event.
2. At bugreport, dump with vmstat and meminfo is never helpful because it's
too late to capture the moment when the problem happens.
When the user catch up the problem and try to capture the system state,
the problem has already gone.
3. Although we could capture MM stat at the moment bug happens, it couldn't
be helpful because MM stats are usually very flucuate so we need historical
data rather than one-time snapshot to see MM trend.
To solve above problems, this patch introduces per-process, light-weight,
mm event stat. Basically, it tracks minor/major faults, reclaim and compaction
latency of each process as well as event count and record the data into global
buffer.
To compromise memory overhead, it doesn't record every MM event of the process
to the buffer but just drain accumuated stats every 0.5sec interval to buffer.
If there isn't any event, it just skips the recording.
For latency data, it keeps average/max latency of each event in that period
With that, we could keep useful information with small buffer so that
we couldn't miss precious information any longer although the capture time
is rather late. This patch introduces basic facility of MM event stat.
After all patches in this patchset are applied, outout format is as follows,
dumpstate can use it for VM debugging in future.
<...>-1665 [001] d... 217.575173: mm_event_record: min_flt count=203 avg_lat=3 max_lat=58
<...>-1665 [001] d... 217.575183: mm_event_record: maj_flt count=1 avg_lat=1994 max_lat=1994
<...>-1665 [001] d... 217.575184: mm_event_record: kern_alloc count=227 avg_lat=0 max_lat=0
<...>-626 [000] d... 217.578096: mm_event_record: kern_alloc count=4 avg_lat=0 max_lat=0
<...>-6547 [000] .... 217.581913: mm_event_record: min_flt count=7 avg_lat=7 max_lat=20
<...>-6547 [000] .... 217.581955: mm_event_record: kern_alloc count=4 avg_lat=0 max_lat=0
This feature uses event trace for output buffer so that we could use all of
general benefit of event trace(e.g., buffer size management, filtering and
so on). To prevent overflow of the ring buffer by other random event race,
highly suggest that create separate instance of tracing
on /sys/kernel/debug/tracing/instances/
I had a concern of adding overhead. Actually, major|compaction/reclaim
are already heavy cost so it should be not a concern. Rather than,
minor fault and kern alloc would be severe so I tested a micro benchmark
to measure minor page fault overhead.
Test scenario is create 40 threads and each of them does minor
page fault for 25M range(ranges are not overwrapped).
I didn't see any noticible regression.
Base:
fault/wsec avg: 758489.8288
minor faults=13123118, major faults=0 ctx switch=139234
User System Wall fault/wsec
39.55s 41.73s 17.49s 749995.768
minor faults=13123135, major faults=0 ctx switch=139627
User System Wall fault/wsec
34.59s 41.61s 16.95s 773906.976
minor faults=13123061, major faults=0 ctx switch=139254
User System Wall fault/wsec
39.03s 41.55s 16.97s 772966.334
minor faults=13123131, major faults=0 ctx switch=139970
User System Wall fault/wsec
36.71s 42.12s 17.04s 769941.019
minor faults=13123027, major faults=0 ctx switch=138524
User System Wall fault/wsec
42.08s 42.24s 18.08s 725639.047
Base + MM event + event trace enable:
fault/wsec avg: 759626.1488
minor faults=13123488, major faults=0 ctx switch=140303
User System Wall fault/wsec
37.66s 42.21s 17.48s 750414.257
minor faults=13123066, major faults=0 ctx switch=138119
User System Wall fault/wsec
36.77s 42.14s 17.49s 750010.107
minor faults=13123505, major faults=0 ctx switch=140021
User System Wall fault/wsec
38.51s 42.50s 17.54s 748022.219
minor faults=13123431, major faults=0 ctx switch=138517
User System Wall fault/wsec
36.74s 41.49s 17.03s 770255.610
minor faults=13122955, major faults=0 ctx switch=137174
User System Wall fault/wsec
40.68s 40.97s 16.83s 779428.551
Bug: 80168800
Bug: 116825053
Bug: 153442668
Test: boot
Change-Id: I4e69c994f47402766481c58ab5ec2071180964b8
Signed-off-by: Minchan Kim <minchan@google.com>
(cherry picked from commit 04ff5ec537a5f9f546dcb32257d8fbc1f4d9ca2d)
Signed-off-by: Martin Liu <liumartin@google.com>
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
2019-04-22 18:04:59 +02:00
|
|
|
}
|
2019-01-17 03:14:14 +01:00
|
|
|
|
2019-10-15 18:09:36 +02:00
|
|
|
record_vmstat();
|
mm: introduce per-process mm event tracking feature
Linux supports /proc/meminfo and /proc/vmstat stats as memory health metric.
Android uses them too. If user see something goes wrong(e.g., sluggish, jank)
on their system, they can capture and report system state to developers
for debugging.
It shows memory stat at the moment the bug is captured. However, it’s
not enough to investigate application's jank problem caused by memory
shortage. Because
1. It just shows event count which doesn’t quantify the latency of the
application well. Jank could happen by various reasons and one of simple
scenario is frame drop for a second. App should draw the frame every 16ms
interval. Just number of stats(e.g., allocstall or pgmajfault) couldn't
represnt how many of time the app spends for handling the event.
2. At bugreport, dump with vmstat and meminfo is never helpful because it's
too late to capture the moment when the problem happens.
When the user catch up the problem and try to capture the system state,
the problem has already gone.
3. Although we could capture MM stat at the moment bug happens, it couldn't
be helpful because MM stats are usually very flucuate so we need historical
data rather than one-time snapshot to see MM trend.
To solve above problems, this patch introduces per-process, light-weight,
mm event stat. Basically, it tracks minor/major faults, reclaim and compaction
latency of each process as well as event count and record the data into global
buffer.
To compromise memory overhead, it doesn't record every MM event of the process
to the buffer but just drain accumuated stats every 0.5sec interval to buffer.
If there isn't any event, it just skips the recording.
For latency data, it keeps average/max latency of each event in that period
With that, we could keep useful information with small buffer so that
we couldn't miss precious information any longer although the capture time
is rather late. This patch introduces basic facility of MM event stat.
After all patches in this patchset are applied, outout format is as follows,
dumpstate can use it for VM debugging in future.
<...>-1665 [001] d... 217.575173: mm_event_record: min_flt count=203 avg_lat=3 max_lat=58
<...>-1665 [001] d... 217.575183: mm_event_record: maj_flt count=1 avg_lat=1994 max_lat=1994
<...>-1665 [001] d... 217.575184: mm_event_record: kern_alloc count=227 avg_lat=0 max_lat=0
<...>-626 [000] d... 217.578096: mm_event_record: kern_alloc count=4 avg_lat=0 max_lat=0
<...>-6547 [000] .... 217.581913: mm_event_record: min_flt count=7 avg_lat=7 max_lat=20
<...>-6547 [000] .... 217.581955: mm_event_record: kern_alloc count=4 avg_lat=0 max_lat=0
This feature uses event trace for output buffer so that we could use all of
general benefit of event trace(e.g., buffer size management, filtering and
so on). To prevent overflow of the ring buffer by other random event race,
highly suggest that create separate instance of tracing
on /sys/kernel/debug/tracing/instances/
I had a concern of adding overhead. Actually, major|compaction/reclaim
are already heavy cost so it should be not a concern. Rather than,
minor fault and kern alloc would be severe so I tested a micro benchmark
to measure minor page fault overhead.
Test scenario is create 40 threads and each of them does minor
page fault for 25M range(ranges are not overwrapped).
I didn't see any noticible regression.
Base:
fault/wsec avg: 758489.8288
minor faults=13123118, major faults=0 ctx switch=139234
User System Wall fault/wsec
39.55s 41.73s 17.49s 749995.768
minor faults=13123135, major faults=0 ctx switch=139627
User System Wall fault/wsec
34.59s 41.61s 16.95s 773906.976
minor faults=13123061, major faults=0 ctx switch=139254
User System Wall fault/wsec
39.03s 41.55s 16.97s 772966.334
minor faults=13123131, major faults=0 ctx switch=139970
User System Wall fault/wsec
36.71s 42.12s 17.04s 769941.019
minor faults=13123027, major faults=0 ctx switch=138524
User System Wall fault/wsec
42.08s 42.24s 18.08s 725639.047
Base + MM event + event trace enable:
fault/wsec avg: 759626.1488
minor faults=13123488, major faults=0 ctx switch=140303
User System Wall fault/wsec
37.66s 42.21s 17.48s 750414.257
minor faults=13123066, major faults=0 ctx switch=138119
User System Wall fault/wsec
36.77s 42.14s 17.49s 750010.107
minor faults=13123505, major faults=0 ctx switch=140021
User System Wall fault/wsec
38.51s 42.50s 17.54s 748022.219
minor faults=13123431, major faults=0 ctx switch=138517
User System Wall fault/wsec
36.74s 41.49s 17.03s 770255.610
minor faults=13122955, major faults=0 ctx switch=137174
User System Wall fault/wsec
40.68s 40.97s 16.83s 779428.551
Bug: 80168800
Bug: 116825053
Bug: 153442668
Test: boot
Change-Id: I4e69c994f47402766481c58ab5ec2071180964b8
Signed-off-by: Minchan Kim <minchan@google.com>
(cherry picked from commit 04ff5ec537a5f9f546dcb32257d8fbc1f4d9ca2d)
Signed-off-by: Martin Liu <liumartin@google.com>
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
2019-04-22 18:04:59 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
void mm_event_start(ktime_t *time)
|
|
|
|
{
|
|
|
|
*time = ktime_get();
|
|
|
|
}
|
|
|
|
|
|
|
|
void mm_event_end(enum mm_event_type event, ktime_t start)
|
|
|
|
{
|
|
|
|
s64 elapsed = ktime_us_delta(ktime_get(), start);
|
|
|
|
|
|
|
|
current->mm_event[event].count++;
|
|
|
|
current->mm_event[event].accm_lat += elapsed;
|
|
|
|
if (elapsed > current->mm_event[event].max_lat)
|
|
|
|
current->mm_event[event].max_lat = elapsed;
|
|
|
|
record_stat();
|
|
|
|
}
|
2020-04-08 01:16:36 +02:00
|
|
|
EXPORT_SYMBOL_GPL(mm_event_end);
|
mm: introduce per-process mm event tracking feature
Linux supports /proc/meminfo and /proc/vmstat stats as memory health metric.
Android uses them too. If user see something goes wrong(e.g., sluggish, jank)
on their system, they can capture and report system state to developers
for debugging.
It shows memory stat at the moment the bug is captured. However, it’s
not enough to investigate application's jank problem caused by memory
shortage. Because
1. It just shows event count which doesn’t quantify the latency of the
application well. Jank could happen by various reasons and one of simple
scenario is frame drop for a second. App should draw the frame every 16ms
interval. Just number of stats(e.g., allocstall or pgmajfault) couldn't
represnt how many of time the app spends for handling the event.
2. At bugreport, dump with vmstat and meminfo is never helpful because it's
too late to capture the moment when the problem happens.
When the user catch up the problem and try to capture the system state,
the problem has already gone.
3. Although we could capture MM stat at the moment bug happens, it couldn't
be helpful because MM stats are usually very flucuate so we need historical
data rather than one-time snapshot to see MM trend.
To solve above problems, this patch introduces per-process, light-weight,
mm event stat. Basically, it tracks minor/major faults, reclaim and compaction
latency of each process as well as event count and record the data into global
buffer.
To compromise memory overhead, it doesn't record every MM event of the process
to the buffer but just drain accumuated stats every 0.5sec interval to buffer.
If there isn't any event, it just skips the recording.
For latency data, it keeps average/max latency of each event in that period
With that, we could keep useful information with small buffer so that
we couldn't miss precious information any longer although the capture time
is rather late. This patch introduces basic facility of MM event stat.
After all patches in this patchset are applied, outout format is as follows,
dumpstate can use it for VM debugging in future.
<...>-1665 [001] d... 217.575173: mm_event_record: min_flt count=203 avg_lat=3 max_lat=58
<...>-1665 [001] d... 217.575183: mm_event_record: maj_flt count=1 avg_lat=1994 max_lat=1994
<...>-1665 [001] d... 217.575184: mm_event_record: kern_alloc count=227 avg_lat=0 max_lat=0
<...>-626 [000] d... 217.578096: mm_event_record: kern_alloc count=4 avg_lat=0 max_lat=0
<...>-6547 [000] .... 217.581913: mm_event_record: min_flt count=7 avg_lat=7 max_lat=20
<...>-6547 [000] .... 217.581955: mm_event_record: kern_alloc count=4 avg_lat=0 max_lat=0
This feature uses event trace for output buffer so that we could use all of
general benefit of event trace(e.g., buffer size management, filtering and
so on). To prevent overflow of the ring buffer by other random event race,
highly suggest that create separate instance of tracing
on /sys/kernel/debug/tracing/instances/
I had a concern of adding overhead. Actually, major|compaction/reclaim
are already heavy cost so it should be not a concern. Rather than,
minor fault and kern alloc would be severe so I tested a micro benchmark
to measure minor page fault overhead.
Test scenario is create 40 threads and each of them does minor
page fault for 25M range(ranges are not overwrapped).
I didn't see any noticible regression.
Base:
fault/wsec avg: 758489.8288
minor faults=13123118, major faults=0 ctx switch=139234
User System Wall fault/wsec
39.55s 41.73s 17.49s 749995.768
minor faults=13123135, major faults=0 ctx switch=139627
User System Wall fault/wsec
34.59s 41.61s 16.95s 773906.976
minor faults=13123061, major faults=0 ctx switch=139254
User System Wall fault/wsec
39.03s 41.55s 16.97s 772966.334
minor faults=13123131, major faults=0 ctx switch=139970
User System Wall fault/wsec
36.71s 42.12s 17.04s 769941.019
minor faults=13123027, major faults=0 ctx switch=138524
User System Wall fault/wsec
42.08s 42.24s 18.08s 725639.047
Base + MM event + event trace enable:
fault/wsec avg: 759626.1488
minor faults=13123488, major faults=0 ctx switch=140303
User System Wall fault/wsec
37.66s 42.21s 17.48s 750414.257
minor faults=13123066, major faults=0 ctx switch=138119
User System Wall fault/wsec
36.77s 42.14s 17.49s 750010.107
minor faults=13123505, major faults=0 ctx switch=140021
User System Wall fault/wsec
38.51s 42.50s 17.54s 748022.219
minor faults=13123431, major faults=0 ctx switch=138517
User System Wall fault/wsec
36.74s 41.49s 17.03s 770255.610
minor faults=13122955, major faults=0 ctx switch=137174
User System Wall fault/wsec
40.68s 40.97s 16.83s 779428.551
Bug: 80168800
Bug: 116825053
Bug: 153442668
Test: boot
Change-Id: I4e69c994f47402766481c58ab5ec2071180964b8
Signed-off-by: Minchan Kim <minchan@google.com>
(cherry picked from commit 04ff5ec537a5f9f546dcb32257d8fbc1f4d9ca2d)
Signed-off-by: Martin Liu <liumartin@google.com>
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
2019-04-22 18:04:59 +02:00
|
|
|
|
2019-04-23 12:19:27 +02:00
|
|
|
void mm_event_count(enum mm_event_type event, int count)
|
|
|
|
{
|
|
|
|
current->mm_event[event].count += count;
|
|
|
|
record_stat();
|
|
|
|
}
|
2020-04-17 00:04:53 +02:00
|
|
|
EXPORT_SYMBOL_GPL(mm_event_count);
|
2019-04-23 12:19:27 +02:00
|
|
|
|
mm: introduce per-process mm event tracking feature
Linux supports /proc/meminfo and /proc/vmstat stats as memory health metric.
Android uses them too. If user see something goes wrong(e.g., sluggish, jank)
on their system, they can capture and report system state to developers
for debugging.
It shows memory stat at the moment the bug is captured. However, it’s
not enough to investigate application's jank problem caused by memory
shortage. Because
1. It just shows event count which doesn’t quantify the latency of the
application well. Jank could happen by various reasons and one of simple
scenario is frame drop for a second. App should draw the frame every 16ms
interval. Just number of stats(e.g., allocstall or pgmajfault) couldn't
represnt how many of time the app spends for handling the event.
2. At bugreport, dump with vmstat and meminfo is never helpful because it's
too late to capture the moment when the problem happens.
When the user catch up the problem and try to capture the system state,
the problem has already gone.
3. Although we could capture MM stat at the moment bug happens, it couldn't
be helpful because MM stats are usually very flucuate so we need historical
data rather than one-time snapshot to see MM trend.
To solve above problems, this patch introduces per-process, light-weight,
mm event stat. Basically, it tracks minor/major faults, reclaim and compaction
latency of each process as well as event count and record the data into global
buffer.
To compromise memory overhead, it doesn't record every MM event of the process
to the buffer but just drain accumuated stats every 0.5sec interval to buffer.
If there isn't any event, it just skips the recording.
For latency data, it keeps average/max latency of each event in that period
With that, we could keep useful information with small buffer so that
we couldn't miss precious information any longer although the capture time
is rather late. This patch introduces basic facility of MM event stat.
After all patches in this patchset are applied, outout format is as follows,
dumpstate can use it for VM debugging in future.
<...>-1665 [001] d... 217.575173: mm_event_record: min_flt count=203 avg_lat=3 max_lat=58
<...>-1665 [001] d... 217.575183: mm_event_record: maj_flt count=1 avg_lat=1994 max_lat=1994
<...>-1665 [001] d... 217.575184: mm_event_record: kern_alloc count=227 avg_lat=0 max_lat=0
<...>-626 [000] d... 217.578096: mm_event_record: kern_alloc count=4 avg_lat=0 max_lat=0
<...>-6547 [000] .... 217.581913: mm_event_record: min_flt count=7 avg_lat=7 max_lat=20
<...>-6547 [000] .... 217.581955: mm_event_record: kern_alloc count=4 avg_lat=0 max_lat=0
This feature uses event trace for output buffer so that we could use all of
general benefit of event trace(e.g., buffer size management, filtering and
so on). To prevent overflow of the ring buffer by other random event race,
highly suggest that create separate instance of tracing
on /sys/kernel/debug/tracing/instances/
I had a concern of adding overhead. Actually, major|compaction/reclaim
are already heavy cost so it should be not a concern. Rather than,
minor fault and kern alloc would be severe so I tested a micro benchmark
to measure minor page fault overhead.
Test scenario is create 40 threads and each of them does minor
page fault for 25M range(ranges are not overwrapped).
I didn't see any noticible regression.
Base:
fault/wsec avg: 758489.8288
minor faults=13123118, major faults=0 ctx switch=139234
User System Wall fault/wsec
39.55s 41.73s 17.49s 749995.768
minor faults=13123135, major faults=0 ctx switch=139627
User System Wall fault/wsec
34.59s 41.61s 16.95s 773906.976
minor faults=13123061, major faults=0 ctx switch=139254
User System Wall fault/wsec
39.03s 41.55s 16.97s 772966.334
minor faults=13123131, major faults=0 ctx switch=139970
User System Wall fault/wsec
36.71s 42.12s 17.04s 769941.019
minor faults=13123027, major faults=0 ctx switch=138524
User System Wall fault/wsec
42.08s 42.24s 18.08s 725639.047
Base + MM event + event trace enable:
fault/wsec avg: 759626.1488
minor faults=13123488, major faults=0 ctx switch=140303
User System Wall fault/wsec
37.66s 42.21s 17.48s 750414.257
minor faults=13123066, major faults=0 ctx switch=138119
User System Wall fault/wsec
36.77s 42.14s 17.49s 750010.107
minor faults=13123505, major faults=0 ctx switch=140021
User System Wall fault/wsec
38.51s 42.50s 17.54s 748022.219
minor faults=13123431, major faults=0 ctx switch=138517
User System Wall fault/wsec
36.74s 41.49s 17.03s 770255.610
minor faults=13122955, major faults=0 ctx switch=137174
User System Wall fault/wsec
40.68s 40.97s 16.83s 779428.551
Bug: 80168800
Bug: 116825053
Bug: 153442668
Test: boot
Change-Id: I4e69c994f47402766481c58ab5ec2071180964b8
Signed-off-by: Minchan Kim <minchan@google.com>
(cherry picked from commit 04ff5ec537a5f9f546dcb32257d8fbc1f4d9ca2d)
Signed-off-by: Martin Liu <liumartin@google.com>
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
2019-04-22 18:04:59 +02:00
|
|
|
static struct dentry *mm_event_root;
|
|
|
|
|
2018-08-06 08:02:06 +02:00
|
|
|
static int period_ms_set(void *data, u64 val)
|
|
|
|
{
|
|
|
|
if (val < 1 || val > ULONG_MAX)
|
|
|
|
return -EINVAL;
|
|
|
|
|
2019-01-17 03:14:14 +01:00
|
|
|
write_lock(&period_lock);
|
2018-08-06 08:02:06 +02:00
|
|
|
period_ms = (unsigned long)val;
|
2019-01-17 03:14:14 +01:00
|
|
|
write_unlock(&period_lock);
|
2018-08-06 08:02:06 +02:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int period_ms_get(void *data, u64 *val)
|
|
|
|
{
|
2019-01-17 03:14:14 +01:00
|
|
|
read_lock(&period_lock);
|
2018-08-06 08:02:06 +02:00
|
|
|
*val = period_ms;
|
2019-01-17 03:14:14 +01:00
|
|
|
read_unlock(&period_lock);
|
|
|
|
|
2018-08-06 08:02:06 +02:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2019-01-15 05:54:07 +01:00
|
|
|
static int vmstat_period_ms_set(void *data, u64 val)
|
|
|
|
{
|
|
|
|
if (val < 1 || val > ULONG_MAX)
|
|
|
|
return -EINVAL;
|
|
|
|
|
2019-01-17 03:14:14 +01:00
|
|
|
spin_lock(&vmstat_lock);
|
2019-01-15 05:54:07 +01:00
|
|
|
vmstat_period_ms = (unsigned long)val;
|
2019-01-17 03:14:14 +01:00
|
|
|
spin_unlock(&vmstat_lock);
|
2019-01-15 05:54:07 +01:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int vmstat_period_ms_get(void *data, u64 *val)
|
|
|
|
{
|
2019-01-17 03:14:14 +01:00
|
|
|
spin_lock(&vmstat_lock);
|
2019-01-15 05:54:07 +01:00
|
|
|
*val = vmstat_period_ms;
|
2019-01-17 03:14:14 +01:00
|
|
|
spin_unlock(&vmstat_lock);
|
2019-01-15 05:54:07 +01:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2018-08-06 08:02:06 +02:00
|
|
|
DEFINE_SIMPLE_ATTRIBUTE(period_ms_operations, period_ms_get,
|
|
|
|
period_ms_set, "%llu\n");
|
2019-01-15 05:54:07 +01:00
|
|
|
DEFINE_SIMPLE_ATTRIBUTE(vmstat_period_ms_operations, vmstat_period_ms_get,
|
|
|
|
vmstat_period_ms_set, "%llu\n");
|
2018-08-06 08:02:06 +02:00
|
|
|
|
mm: introduce per-process mm event tracking feature
Linux supports /proc/meminfo and /proc/vmstat stats as memory health metric.
Android uses them too. If user see something goes wrong(e.g., sluggish, jank)
on their system, they can capture and report system state to developers
for debugging.
It shows memory stat at the moment the bug is captured. However, it’s
not enough to investigate application's jank problem caused by memory
shortage. Because
1. It just shows event count which doesn’t quantify the latency of the
application well. Jank could happen by various reasons and one of simple
scenario is frame drop for a second. App should draw the frame every 16ms
interval. Just number of stats(e.g., allocstall or pgmajfault) couldn't
represnt how many of time the app spends for handling the event.
2. At bugreport, dump with vmstat and meminfo is never helpful because it's
too late to capture the moment when the problem happens.
When the user catch up the problem and try to capture the system state,
the problem has already gone.
3. Although we could capture MM stat at the moment bug happens, it couldn't
be helpful because MM stats are usually very flucuate so we need historical
data rather than one-time snapshot to see MM trend.
To solve above problems, this patch introduces per-process, light-weight,
mm event stat. Basically, it tracks minor/major faults, reclaim and compaction
latency of each process as well as event count and record the data into global
buffer.
To compromise memory overhead, it doesn't record every MM event of the process
to the buffer but just drain accumuated stats every 0.5sec interval to buffer.
If there isn't any event, it just skips the recording.
For latency data, it keeps average/max latency of each event in that period
With that, we could keep useful information with small buffer so that
we couldn't miss precious information any longer although the capture time
is rather late. This patch introduces basic facility of MM event stat.
After all patches in this patchset are applied, outout format is as follows,
dumpstate can use it for VM debugging in future.
<...>-1665 [001] d... 217.575173: mm_event_record: min_flt count=203 avg_lat=3 max_lat=58
<...>-1665 [001] d... 217.575183: mm_event_record: maj_flt count=1 avg_lat=1994 max_lat=1994
<...>-1665 [001] d... 217.575184: mm_event_record: kern_alloc count=227 avg_lat=0 max_lat=0
<...>-626 [000] d... 217.578096: mm_event_record: kern_alloc count=4 avg_lat=0 max_lat=0
<...>-6547 [000] .... 217.581913: mm_event_record: min_flt count=7 avg_lat=7 max_lat=20
<...>-6547 [000] .... 217.581955: mm_event_record: kern_alloc count=4 avg_lat=0 max_lat=0
This feature uses event trace for output buffer so that we could use all of
general benefit of event trace(e.g., buffer size management, filtering and
so on). To prevent overflow of the ring buffer by other random event race,
highly suggest that create separate instance of tracing
on /sys/kernel/debug/tracing/instances/
I had a concern of adding overhead. Actually, major|compaction/reclaim
are already heavy cost so it should be not a concern. Rather than,
minor fault and kern alloc would be severe so I tested a micro benchmark
to measure minor page fault overhead.
Test scenario is create 40 threads and each of them does minor
page fault for 25M range(ranges are not overwrapped).
I didn't see any noticible regression.
Base:
fault/wsec avg: 758489.8288
minor faults=13123118, major faults=0 ctx switch=139234
User System Wall fault/wsec
39.55s 41.73s 17.49s 749995.768
minor faults=13123135, major faults=0 ctx switch=139627
User System Wall fault/wsec
34.59s 41.61s 16.95s 773906.976
minor faults=13123061, major faults=0 ctx switch=139254
User System Wall fault/wsec
39.03s 41.55s 16.97s 772966.334
minor faults=13123131, major faults=0 ctx switch=139970
User System Wall fault/wsec
36.71s 42.12s 17.04s 769941.019
minor faults=13123027, major faults=0 ctx switch=138524
User System Wall fault/wsec
42.08s 42.24s 18.08s 725639.047
Base + MM event + event trace enable:
fault/wsec avg: 759626.1488
minor faults=13123488, major faults=0 ctx switch=140303
User System Wall fault/wsec
37.66s 42.21s 17.48s 750414.257
minor faults=13123066, major faults=0 ctx switch=138119
User System Wall fault/wsec
36.77s 42.14s 17.49s 750010.107
minor faults=13123505, major faults=0 ctx switch=140021
User System Wall fault/wsec
38.51s 42.50s 17.54s 748022.219
minor faults=13123431, major faults=0 ctx switch=138517
User System Wall fault/wsec
36.74s 41.49s 17.03s 770255.610
minor faults=13122955, major faults=0 ctx switch=137174
User System Wall fault/wsec
40.68s 40.97s 16.83s 779428.551
Bug: 80168800
Bug: 116825053
Bug: 153442668
Test: boot
Change-Id: I4e69c994f47402766481c58ab5ec2071180964b8
Signed-off-by: Minchan Kim <minchan@google.com>
(cherry picked from commit 04ff5ec537a5f9f546dcb32257d8fbc1f4d9ca2d)
Signed-off-by: Martin Liu <liumartin@google.com>
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
2019-04-22 18:04:59 +02:00
|
|
|
static int __init mm_event_init(void)
|
|
|
|
{
|
2018-08-06 08:02:06 +02:00
|
|
|
struct dentry *entry;
|
|
|
|
|
mm: introduce per-process mm event tracking feature
Linux supports /proc/meminfo and /proc/vmstat stats as memory health metric.
Android uses them too. If user see something goes wrong(e.g., sluggish, jank)
on their system, they can capture and report system state to developers
for debugging.
It shows memory stat at the moment the bug is captured. However, it’s
not enough to investigate application's jank problem caused by memory
shortage. Because
1. It just shows event count which doesn’t quantify the latency of the
application well. Jank could happen by various reasons and one of simple
scenario is frame drop for a second. App should draw the frame every 16ms
interval. Just number of stats(e.g., allocstall or pgmajfault) couldn't
represnt how many of time the app spends for handling the event.
2. At bugreport, dump with vmstat and meminfo is never helpful because it's
too late to capture the moment when the problem happens.
When the user catch up the problem and try to capture the system state,
the problem has already gone.
3. Although we could capture MM stat at the moment bug happens, it couldn't
be helpful because MM stats are usually very flucuate so we need historical
data rather than one-time snapshot to see MM trend.
To solve above problems, this patch introduces per-process, light-weight,
mm event stat. Basically, it tracks minor/major faults, reclaim and compaction
latency of each process as well as event count and record the data into global
buffer.
To compromise memory overhead, it doesn't record every MM event of the process
to the buffer but just drain accumuated stats every 0.5sec interval to buffer.
If there isn't any event, it just skips the recording.
For latency data, it keeps average/max latency of each event in that period
With that, we could keep useful information with small buffer so that
we couldn't miss precious information any longer although the capture time
is rather late. This patch introduces basic facility of MM event stat.
After all patches in this patchset are applied, outout format is as follows,
dumpstate can use it for VM debugging in future.
<...>-1665 [001] d... 217.575173: mm_event_record: min_flt count=203 avg_lat=3 max_lat=58
<...>-1665 [001] d... 217.575183: mm_event_record: maj_flt count=1 avg_lat=1994 max_lat=1994
<...>-1665 [001] d... 217.575184: mm_event_record: kern_alloc count=227 avg_lat=0 max_lat=0
<...>-626 [000] d... 217.578096: mm_event_record: kern_alloc count=4 avg_lat=0 max_lat=0
<...>-6547 [000] .... 217.581913: mm_event_record: min_flt count=7 avg_lat=7 max_lat=20
<...>-6547 [000] .... 217.581955: mm_event_record: kern_alloc count=4 avg_lat=0 max_lat=0
This feature uses event trace for output buffer so that we could use all of
general benefit of event trace(e.g., buffer size management, filtering and
so on). To prevent overflow of the ring buffer by other random event race,
highly suggest that create separate instance of tracing
on /sys/kernel/debug/tracing/instances/
I had a concern of adding overhead. Actually, major|compaction/reclaim
are already heavy cost so it should be not a concern. Rather than,
minor fault and kern alloc would be severe so I tested a micro benchmark
to measure minor page fault overhead.
Test scenario is create 40 threads and each of them does minor
page fault for 25M range(ranges are not overwrapped).
I didn't see any noticible regression.
Base:
fault/wsec avg: 758489.8288
minor faults=13123118, major faults=0 ctx switch=139234
User System Wall fault/wsec
39.55s 41.73s 17.49s 749995.768
minor faults=13123135, major faults=0 ctx switch=139627
User System Wall fault/wsec
34.59s 41.61s 16.95s 773906.976
minor faults=13123061, major faults=0 ctx switch=139254
User System Wall fault/wsec
39.03s 41.55s 16.97s 772966.334
minor faults=13123131, major faults=0 ctx switch=139970
User System Wall fault/wsec
36.71s 42.12s 17.04s 769941.019
minor faults=13123027, major faults=0 ctx switch=138524
User System Wall fault/wsec
42.08s 42.24s 18.08s 725639.047
Base + MM event + event trace enable:
fault/wsec avg: 759626.1488
minor faults=13123488, major faults=0 ctx switch=140303
User System Wall fault/wsec
37.66s 42.21s 17.48s 750414.257
minor faults=13123066, major faults=0 ctx switch=138119
User System Wall fault/wsec
36.77s 42.14s 17.49s 750010.107
minor faults=13123505, major faults=0 ctx switch=140021
User System Wall fault/wsec
38.51s 42.50s 17.54s 748022.219
minor faults=13123431, major faults=0 ctx switch=138517
User System Wall fault/wsec
36.74s 41.49s 17.03s 770255.610
minor faults=13122955, major faults=0 ctx switch=137174
User System Wall fault/wsec
40.68s 40.97s 16.83s 779428.551
Bug: 80168800
Bug: 116825053
Bug: 153442668
Test: boot
Change-Id: I4e69c994f47402766481c58ab5ec2071180964b8
Signed-off-by: Minchan Kim <minchan@google.com>
(cherry picked from commit 04ff5ec537a5f9f546dcb32257d8fbc1f4d9ca2d)
Signed-off-by: Martin Liu <liumartin@google.com>
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
2019-04-22 18:04:59 +02:00
|
|
|
mm_event_root = debugfs_create_dir("mm_event", NULL);
|
|
|
|
if (!mm_event_root) {
|
|
|
|
pr_warn("debugfs dir <mm_event> creation failed\n");
|
|
|
|
return PTR_ERR(mm_event_root);
|
|
|
|
}
|
|
|
|
|
2018-08-06 08:02:06 +02:00
|
|
|
entry = debugfs_create_file("period_ms", 0644,
|
|
|
|
mm_event_root, NULL, &period_ms_operations);
|
|
|
|
|
|
|
|
if (IS_ERR(entry)) {
|
|
|
|
pr_warn("debugfs file mm_event_task creation failed\n");
|
|
|
|
debugfs_remove_recursive(mm_event_root);
|
|
|
|
return PTR_ERR(entry);
|
|
|
|
}
|
|
|
|
|
2019-01-15 05:54:07 +01:00
|
|
|
entry = debugfs_create_file("vmstat_period_ms", 0644,
|
|
|
|
mm_event_root, NULL, &vmstat_period_ms_operations);
|
|
|
|
if (IS_ERR(entry)) {
|
|
|
|
pr_warn("debugfs file vmstat_mm_event_task creation failed\n");
|
|
|
|
debugfs_remove_recursive(mm_event_root);
|
|
|
|
return PTR_ERR(entry);
|
|
|
|
}
|
|
|
|
|
mm: introduce per-process mm event tracking feature
Linux supports /proc/meminfo and /proc/vmstat stats as memory health metric.
Android uses them too. If user see something goes wrong(e.g., sluggish, jank)
on their system, they can capture and report system state to developers
for debugging.
It shows memory stat at the moment the bug is captured. However, it’s
not enough to investigate application's jank problem caused by memory
shortage. Because
1. It just shows event count which doesn’t quantify the latency of the
application well. Jank could happen by various reasons and one of simple
scenario is frame drop for a second. App should draw the frame every 16ms
interval. Just number of stats(e.g., allocstall or pgmajfault) couldn't
represnt how many of time the app spends for handling the event.
2. At bugreport, dump with vmstat and meminfo is never helpful because it's
too late to capture the moment when the problem happens.
When the user catch up the problem and try to capture the system state,
the problem has already gone.
3. Although we could capture MM stat at the moment bug happens, it couldn't
be helpful because MM stats are usually very flucuate so we need historical
data rather than one-time snapshot to see MM trend.
To solve above problems, this patch introduces per-process, light-weight,
mm event stat. Basically, it tracks minor/major faults, reclaim and compaction
latency of each process as well as event count and record the data into global
buffer.
To compromise memory overhead, it doesn't record every MM event of the process
to the buffer but just drain accumuated stats every 0.5sec interval to buffer.
If there isn't any event, it just skips the recording.
For latency data, it keeps average/max latency of each event in that period
With that, we could keep useful information with small buffer so that
we couldn't miss precious information any longer although the capture time
is rather late. This patch introduces basic facility of MM event stat.
After all patches in this patchset are applied, outout format is as follows,
dumpstate can use it for VM debugging in future.
<...>-1665 [001] d... 217.575173: mm_event_record: min_flt count=203 avg_lat=3 max_lat=58
<...>-1665 [001] d... 217.575183: mm_event_record: maj_flt count=1 avg_lat=1994 max_lat=1994
<...>-1665 [001] d... 217.575184: mm_event_record: kern_alloc count=227 avg_lat=0 max_lat=0
<...>-626 [000] d... 217.578096: mm_event_record: kern_alloc count=4 avg_lat=0 max_lat=0
<...>-6547 [000] .... 217.581913: mm_event_record: min_flt count=7 avg_lat=7 max_lat=20
<...>-6547 [000] .... 217.581955: mm_event_record: kern_alloc count=4 avg_lat=0 max_lat=0
This feature uses event trace for output buffer so that we could use all of
general benefit of event trace(e.g., buffer size management, filtering and
so on). To prevent overflow of the ring buffer by other random event race,
highly suggest that create separate instance of tracing
on /sys/kernel/debug/tracing/instances/
I had a concern of adding overhead. Actually, major|compaction/reclaim
are already heavy cost so it should be not a concern. Rather than,
minor fault and kern alloc would be severe so I tested a micro benchmark
to measure minor page fault overhead.
Test scenario is create 40 threads and each of them does minor
page fault for 25M range(ranges are not overwrapped).
I didn't see any noticible regression.
Base:
fault/wsec avg: 758489.8288
minor faults=13123118, major faults=0 ctx switch=139234
User System Wall fault/wsec
39.55s 41.73s 17.49s 749995.768
minor faults=13123135, major faults=0 ctx switch=139627
User System Wall fault/wsec
34.59s 41.61s 16.95s 773906.976
minor faults=13123061, major faults=0 ctx switch=139254
User System Wall fault/wsec
39.03s 41.55s 16.97s 772966.334
minor faults=13123131, major faults=0 ctx switch=139970
User System Wall fault/wsec
36.71s 42.12s 17.04s 769941.019
minor faults=13123027, major faults=0 ctx switch=138524
User System Wall fault/wsec
42.08s 42.24s 18.08s 725639.047
Base + MM event + event trace enable:
fault/wsec avg: 759626.1488
minor faults=13123488, major faults=0 ctx switch=140303
User System Wall fault/wsec
37.66s 42.21s 17.48s 750414.257
minor faults=13123066, major faults=0 ctx switch=138119
User System Wall fault/wsec
36.77s 42.14s 17.49s 750010.107
minor faults=13123505, major faults=0 ctx switch=140021
User System Wall fault/wsec
38.51s 42.50s 17.54s 748022.219
minor faults=13123431, major faults=0 ctx switch=138517
User System Wall fault/wsec
36.74s 41.49s 17.03s 770255.610
minor faults=13122955, major faults=0 ctx switch=137174
User System Wall fault/wsec
40.68s 40.97s 16.83s 779428.551
Bug: 80168800
Bug: 116825053
Bug: 153442668
Test: boot
Change-Id: I4e69c994f47402766481c58ab5ec2071180964b8
Signed-off-by: Minchan Kim <minchan@google.com>
(cherry picked from commit 04ff5ec537a5f9f546dcb32257d8fbc1f4d9ca2d)
Signed-off-by: Martin Liu <liumartin@google.com>
Signed-off-by: UtsavBalar1231 <utsavbalar1231@gmail.com>
2019-04-22 18:04:59 +02:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
subsys_initcall(mm_event_init);
|