ANDROID: sched: Clean-up SchedTune documentation

SchedTune's documentation file hasn't been updated in a while and
consequently contains stale information in a few places.

Clean-up the doc file by:
 - removing the references to the now defunct sched-freq;
 - replacing references to SCHED_LOAD_SCALE by SCHED_CAPACITY_SCALE;
 - removing the statement that the wake-up path is not boost-aware;
 - removing reference to negative boosting;
 - removing 'motivation' paragraphs that aren't really relevant anymore;
 - and making sure to fit all the text in 80 chars.

No fundamental changes to the core of the explanations.

Bug: 120440300
Fixes: 04629103c9ff ("ANDROID: sched: fair/tune: Add schedtune with
cgroups interface")
Change-Id: I8ad92a93082e2efe92bc3a7526960e50032be909
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Quentin Perret 2018-12-05 17:58:19 +00:00 committed by Todd Kjos
parent a359befaf6
commit 61dd81300c


@@ -30,11 +30,9 @@ Table of Contents
1. Motivation
=============
-Sched-DVFS [3] was a new event-driven cpufreq governor which allows the
+Schedutil [3] is a utilization-driven cpufreq governor which allows the
scheduler to select the optimal DVFS operating point (OPP) for running a task
-allocated to a CPU. Later, the cpufreq maintainers introduced a similar
-governor, schedutil. The introduction of schedutil also enables running
-workloads at the most energy efficient OPPs.
+allocated to a CPU.
However, sometimes it may be desired to intentionally boost the performance of
a workload even if that could imply a reasonable increase in energy
@@ -44,16 +42,16 @@ by it's CPU bandwidth demand.
This last requirement is especially important if we consider that one of the
main goals of the utilization-driven governor component is to replace all
-currently available CPUFreq policies. Since sched-DVFS and schedutil are event
-based, as opposed to the sampling driven governors we currently have, they are
-already more responsive at selecting the optimal OPP to run tasks allocated to
-a CPU. However, just tracking the actual task load demand may not be enough
-from a performance standpoint. For example, it is not possible to get
-behaviors similar to those provided by the "performance" and "interactive"
-CPUFreq governors.
+currently available CPUFreq policies. Since schedutil is event-based, as
+opposed to the sampling driven governors we currently have, they are already
+more responsive at selecting the optimal OPP to run tasks allocated to a CPU.
+However, just tracking the actual task utilization may not be enough from a
+performance standpoint. For example, it is not possible to get behaviors
+similar to those provided by the "performance" and "interactive" CPUFreq
+governors.
This document describes an implementation of a tunable, stacked on top of the
-utilization-driven governors which extends their functionality to support task
+utilization-driven governor which extends its functionality to support task
performance boosting.
By "performance boosting" we mean the reduction of the time required to
@@ -63,17 +61,6 @@ example, if we consider a simple periodic task which executes the same workload
for 5[s] every 20[s] while running at a certain OPP, a boosted execution of
that task must complete each of its activations in less than 5[s].
-A previous attempt [5] to introduce such a boosting feature has not been
-successful mainly because of the complexity of the proposed solution. Previous
-versions of the approach described in this document exposed a single simple
-interface to user-space. This single tunable knob allowed the tuning of
-system wide scheduler behaviours ranging from energy efficiency at one end
-through to incremental performance boosting at the other end. This first
-tunable affects all tasks. However, that is not useful for Android products
-so in this version only a more advanced extension of the concept is provided
-which uses CGroups to boost the performance of only selected tasks while using
-the energy efficient default for all others.
The rest of this document introduces in more details the proposed solution
which has been named SchedTune.
@@ -97,25 +84,22 @@ More details are given in section 5.
2.1 Boosting
============
-The boost value is expressed as an integer in the range [-100..0..100].
+The boost value is expressed as an integer in the range [0..100].
A value of 0 (default) configures the CFS scheduler for maximum energy
-efficiency. This means that sched-DVFS runs the tasks at the minimum OPP
+efficiency. This means that schedutil runs the tasks at the minimum OPP
required to satisfy their workload demand.
A value of 100 configures scheduler for maximum performance, which translates
to the selection of the maximum OPP on that CPU.
-A value of -100 configures scheduler for minimum performance, which translates
-to the selection of the minimum OPP on that CPU.
-The range between -100, 0 and 100 can be set to satisfy other scenarios suitably.
-For example to satisfy interactive response or depending on other system events
+The range between 0 and 100 can be set to satisfy other scenarios suitably. For
+example to satisfy interactive response or depending on other system events
(battery level etc).
The overall design of the SchedTune module is built on top of "Per-Entity Load
Tracking" (PELT) signals and sched-DVFS by introducing a bias on the Operating
Performance Point (OPP) selection.
Tracking" (PELT) signals and schedutil by introducing a bias on the OPP
selection.
Each time a task is allocated on a CPU, cpufreq is given the opportunity to tune
the operating frequency of that CPU to better match the workload demand. The
@@ -141,9 +125,6 @@ can be placed according to the energy-aware wakeup strategy.
A value of 1 signals to the CFS scheduler that tasks in this group should be
placed to minimise wakeup latency.
-The value is combined with the boost value - task placement will not be
-boost aware however CPU OPP selection is still boost aware.
Android platforms typically use this flag for application tasks which the
user is currently interacting with.
@@ -169,21 +150,16 @@ to a signal to get its inflated value:
margin := boosting_strategy(sched_cfs_boost, signal)
boosted_signal := signal + margin
-Different boosting strategies were identified and analyzed before selecting the
-one found to be most effective.
-Signal Proportional Compensation (SPC)
---------------------------------------
-In this boosting strategy the sched_cfs_boost value is used to compute a
-margin which is proportional to the complement of the original signal.
+The boosting strategy currently implemented in SchedTune is called 'Signal
+Proportional Compensation' (SPC). With SPC, the sched_cfs_boost value is used to
+compute a margin which is proportional to the complement of the original signal.
When a signal has a maximum possible value, its complement is defined as
the delta from the actual value and its possible maximum.
-Since the tunable implementation uses signals which have SCHED_LOAD_SCALE as
+Since the tunable implementation uses signals which have SCHED_CAPACITY_SCALE as
the maximum possible value, the margin becomes:
-margin := sched_cfs_boost * (SCHED_LOAD_SCALE - signal)
+margin := sched_cfs_boost * (SCHED_CAPACITY_SCALE - signal)
Using this boosting strategy:
- a 100% sched_cfs_boost means that the signal is scaled to the maximum value
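For illustration only (an editor's sketch, not code from this patch), the SPC
rule above can be expressed as plain C: the margin is the boost percentage
applied to the headroom between the current signal and SCHED_CAPACITY_SCALE.
The helper name and the integer arithmetic are simplified assumptions; the
in-kernel implementation has its own helpers and rounding.

  #include <stdio.h>

  #define SCHED_CAPACITY_SCALE 1024

  /* Editor's sketch of the SPC rule: boost a signal by a percentage of
   * the headroom left up to SCHED_CAPACITY_SCALE. */
  static unsigned long spc_boosted(unsigned long signal, int boost_pct)
  {
          unsigned long margin = boost_pct * (SCHED_CAPACITY_SCALE - signal) / 100;

          return signal + margin;
  }

  int main(void)
  {
          printf("%lu\n", spc_boosted(200, 0));    /* 200: signal untouched        */
          printf("%lu\n", spc_boosted(200, 50));   /* 612: half the headroom added */
          printf("%lu\n", spc_boosted(200, 100));  /* 1024: scaled to the maximum  */
          return 0;
  }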
@@ -209,7 +185,7 @@ following figure where:
^
-| SCHED_LOAD_SCALE
+| SCHED_CAPACITY_SCALE
+-----------------------------------------------------------------+
|pppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppp
|
@@ -250,7 +226,7 @@ one, depending on the value of sched_cfs_boost. This is a clean an non invasive
modification of the existing existing code paths.
The signal representing a CPU's utilization is boosted according to the
-previously described SPC boosting strategy. To sched-DVFS, this allows a CPU
+previously described SPC boosting strategy. To schedutil, this allows a CPU
(ie CFS run-queue) to appear more used then it actually is.
Thus, with the sched_cfs_boost enabled we have the following main functions to
@@ -262,10 +238,9 @@ get the current utilization of a CPU:
The new boosted_cpu_util() is similar to the first but returns a boosted
utilization signal which is a function of the sched_cfs_boost value.
-This function is used in the CFS scheduler code paths where sched-DVFS needs to
-decide the OPP to run a CPU at.
-For example, this allows selecting the highest OPP for a CPU which has
-the boost value set to 100%.
+This function is used in the CFS scheduler code paths where schedutil needs to
+decide the OPP to run a CPU at. For example, this allows selecting the highest
+OPP for a CPU which has the boost value set to 100%.
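As a rough illustration of the relationship between the two functions mentioned
above, the sketch below models a plain utilization value and its boosted
counterpart as seen by the OPP-selection logic. The stub helpers and their
return values are invented for the example and are not the kernel's actual API.

  #include <stdio.h>

  #define SCHED_CAPACITY_SCALE 1024

  /* Made-up stand-ins for the real helpers: a raw PELT-style CPU
   * utilization and the boost value currently applied to that CPU. */
  static unsigned long cpu_util_stub(int cpu)  { (void)cpu; return 300; }
  static int cpu_boost_stub(int cpu)           { (void)cpu; return 60;  }

  /* Boosted utilization: raw utilization plus the SPC margin, which is
   * the value the frequency-selection logic would act on. */
  static unsigned long boosted_cpu_util_sketch(int cpu)
  {
          unsigned long util  = cpu_util_stub(cpu);
          unsigned long boost = cpu_boost_stub(cpu);

          return util + boost * (SCHED_CAPACITY_SCALE - util) / 100;
  }

  int main(void)
  {
          /* 300 + 60% of (1024 - 300) = 300 + 434 = 734 */
          printf("%lu\n", boosted_cpu_util_sketch(0));
          return 0;
  }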
5. Per task group boosting
@@ -305,16 +280,16 @@ main characteristics:
This number is defined at compile time and by default configured to 16.
This is a design decision motivated by two main reasons:
-a) In a real system we do not expect utilization scenarios with more then few
-boost groups. For example, a reasonable collection of groups could be
-just "background", "interactive" and "performance".
+a) In a real system we do not expect utilization scenarios with more than
+a few boost groups. For example, a reasonable collection of groups could
+be just "background", "interactive" and "performance".
b) It simplifies the implementation considerably, especially for the code
which has to compute the per CPU boosting once there are multiple
RUNNABLE tasks with different boost values.
-Such a simple design should allow servicing the main utilization scenarios identified
-so far. It provides a simple interface which can be used to manage the
-power-performance of all tasks or only selected tasks.
+Such a simple design should allow servicing the main utilization scenarios
+identified so far. It provides a simple interface which can be used to manage
+the power-performance of all tasks or only selected tasks.
Moreover, this interface can be easily integrated by user-space run-times (e.g.
Android, ChromeOS) to implement a QoS solution for task boosting based on tasks
classification, which has been a long standing requirement.
@@ -397,9 +372,9 @@ How are multiple groups of tasks with different boost values managed?
---------------------------------------------------------------------
The current SchedTune implementation keeps track of the boosted RUNNABLE tasks
-on a CPU. The CPU utilization seen by the scheduler-driven cpufreq governors
-(and used to select an appropriate OPP) is boosted with a value which is the
-maximum of the boost values of the currently RUNNABLE tasks in its RQ.
+on a CPU. The CPU utilization seen by schedutil (and used to select an
+appropriate OPP) is boosted with a value which is the maximum of the boost
+values of the currently RUNNABLE tasks in its RQ.
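The aggregation rule described above (the CPU-level boost follows the maximum
boost among the tasks currently RUNNABLE on that CPU) can be modelled as in the
sketch below. The structure and names are invented for the example and are not
the actual schedtune data structures.

  #include <stdio.h>

  /* Invented model of a task as seen by the aggregation rule above. */
  struct task_sketch {
          const char *name;
          int boost;      /* boost value of the task's group, 0..100 */
          int runnable;   /* 1 if the task is RUNNABLE on this CPU   */
  };

  /* The CPU boost is the maximum boost among RUNNABLE tasks only. */
  static int cpu_boost_sketch(const struct task_sketch *tasks, int n)
  {
          int i, boost = 0;

          for (i = 0; i < n; i++)
                  if (tasks[i].runnable && tasks[i].boost > boost)
                          boost = tasks[i].boost;

          return boost;
  }

  int main(void)
  {
          struct task_sketch rq[] = {
                  { "background",  0,  1 },
                  { "interactive", 50, 1 },
                  { "top-app",     80, 0 },  /* dequeued: no longer counted */
          };

          /* Prints 50: the dequeued 80%-boosted task does not keep
           * the CPU boosted once it has gone to sleep. */
          printf("%d\n", cpu_boost_sketch(rq, 3));
          return 0;
  }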
This allows cpufreq to boost a CPU only while there are boosted tasks ready
to run and switch back to the energy efficient mode as soon as the last boosted
@@ -410,4 +385,4 @@ task is dequeued.
=============
[1] http://lwn.net/Articles/552889
[2] http://lkml.org/lkml/2012/5/18/91
-[3] http://lkml.org/lkml/2015/6/26/620
+[3] https://lkml.org/lkml/2016/3/29/1041