ON STOCHASTIC OPTIMIZATION FOR SMARTPHONE CPU ENERGY CONSUMPTION DECREASE

On


Introduction.
Mobile devices, such as smartphones, tablets, and smartwatches, became an integral part of modern life.Digital services provided by these devices both improve quality of life and form new ways of social interactions.More than 6.3 billion smartphone subscriptions were active, and in 2024 this figure is expected to exceed 7 billion people.The most common operating system (OS) for mobile devices today is Android [1].
As the battery capacity of mobile devices is limited, power consumption optimization becomes an increasingly important task -no one wants to have an important call interrupted because of a discharged battery.This issue could be alleviated by applied software, which is capable of providing energy consumption models and eco-friendly profiles.Then, a boost of device longevity is provided by new hardware technologies like Li-Pol batteries or heterogeneous big.LITTLE CPU architecture.Finally, Android OS contains a power management subsystem which includes three components: energy-aware scheduling (EAS), dynamic voltage frequency scaling (DVFS) governors and idle state (IS) governors, so improving underlying algorithms improves energy consumption as well.
In essence, the DVFS algorithm analyzes the current state of the smartphone and advises the CPU to switch to a particular frequency, so that the algorithm could meet some optimization criteria.Performance and energy consumption cannot be optimized at the same time, because to save energy one has to set a low frequency, and higher computational capabilities call for higher battery resources.Therefore, every energy-aware DVFS algorithm finds some form of balance by introducing combined optimization criteria.Ideally, applications run as slow as perceivable comfortable to save energy.
A significant part of research is mainly concentrated on DVFS algorithms or better energy models for task scheduling, and there is a reason behind such attention -energy-aware task allocation combined with online CPU performance control could result in considerable energy savings.Simultaneous perturbation stochastic approximation (SPSA) is one of the possible ways to track optimal CPU frequency in terms of both energy consumption and performance.On average, the algorithm will nearly follow the steepest descent direction [2].SPSA could be considered a random search technique, and it helped in solving various computer science-related tasks [3].In our previous work, we investigated an SPSA-based DVFS governor using a single noisy observation [4], and while working on par with commonly used algorithms, further experiments confirmed that an approach proposed there, although already usable, was not final.SPSA with two noisy observations is more stable compared to one observation version [5].Thus, the optimal frequency tracking process converges faster on average.
The main contribution of this paper is a proof of the concept that the energy-aware DVFS algorithm could be built based on SPSA with two noisy observations.Additionally, we formulate a different average risk function which takes into account the cost of execution of a program at a particular frequency to save energy.Technical considerations are also investigated, for example, whether is it worth running our new DVFS governor with EAS or Completely Fair Scheduler (CFS), and how to connect it with EAS when needed.We also take into account the notable fact that the CPU frequency change cycle could be considerably longer than the DVFS cycle.
This paper is organized as follows.In Section 2, an overview of the DVFS and EAS alongside the existing algorithms for modern smartphones for those subsystems is given.It also contains a brief overview of stochastic optimization in general, a description of the SPSA approach and a review of the previous work in the field.Section  are presented in Section 4, while experimental results are analyzed in Section 5.
Final remarks are given in Section 6 to conclude this paper.

Overview 2.1. DVFS and EAS.
CPUs in modern smartphone systems-on-a-chip are always multicore.Generally, this architecture is homogeneous -each core has the same computational capabilities and power profile.However, there is a novel approach for mobile device CPU design reflected in the ARM bid.LITTLE architecture -a heterogeneous CPU.Cores are divided into several clusters, and each core is homogeneous only within its cluster.Therefore it is possible to run computationally non-demanding programs on weaker, but energy efficient (so-called "LITTLE") cores, while more powerful and power-hungry ("big") cores are utilized for prioritized or computationally intensive tasks.A real-world example of such a design is given in Table 1.It is important to note for our research that core configuration is available within OS, and operating frequency is set uniformly for the entire cluster, that is, every core within a cluster always works at the same frequency.Different clusters might work at different frequencies at the same time.
1006 Информатика и автоматизация. 2023.Том 22 № 5. ISSN 2713-3192 (печ.)ISSN 2713-3206 (онлайн) www.ia.spcras.ru_____________________________________________________________________ ЦИФРОВЫЕ ИНФОРМАЦИОННО-ТЕЛЕКОММУНИКАЦИОННЫЕ ТЕХНОЛОГИИ DVFS as a technique is a control over CPU operating frequency and voltage at runtime to increase or decrease CPU performance at a cost of power consumption.Generally, CPU power consumption is related to the cube of its frequency, and as a side effect CPU heats.However, cooling mechanisms available to mobile devices are limited both by power usage and form factor, therefore most of the time heat could only be dissipated by surrounding air.Thus therefore, operating frequencies are limited to those producing an amount of heat that could be consistently dissipated -at the time of writing, about 2 GHz for commercially available CPUs.
Although dependency between power consumption and frequency is still vaguely close to linear at these levels, power efficiency -the relation of power to performance -is clearly exponential [6].Power efficiency could also be interpreted as the cost of instruction execution at a selected frequency.A complimentary optimization criterion is the execution time, so in practice, the goal is to achieve a certain balance between power and performance.Android OS DVFS governors are OS modules that observe the current state of a device and send signals to the CPU to increase or decrease operating frequency.For example, powersave and performance governors set minimum and maximum frequency respectively.
Although we say that DVFS governors "set" some frequency to a cluster, it is technically a recommendation to the CPU, not a direct assignment.Frequency change does not happen immediately, and DVFS cycle length -the time between governor invocations -could be considerably lower than the frequency change time.For example, the default cycle length for the OnDemand governor is 10 ms, while both clusters of Xiaomi Redmi Note 8 Pro CPU take about 30 ms to change frequency.So while the first governor invocation could initiate hardware frequency update, from the CPU perspective the next two invocations are as good as non-existing.
Energy-aware scheduling (EAS) [8] is an Android OS task scheduler, which could only operate in heterogeneous CPU topologies such as ARM big.LITTLE.Power management for symmetric topologies is uniform, that is, CPU frequency is set globally for the entire CPU, therefore, the maximum occupancy and performance are the same for every core.EAS uses a normalized energy model for every core cluster, taking into account available frequencies and power usage.When a task needs to be scheduled, EAS chooses the core to run this task in such a way that CPU power consumption will increase in the least possible way.However, EAS works while the CPU load is lower than 80%, otherwise, a Completely Fair Scheduler is used until the load is decreased.In order to obtain consistent power savings, EAS should be able to Such network architecture reduces CPU power consumption by a maximum of 19% in comparison to common DVFS.
DVFS could be built in mind with other criteria than performance or energy saving.Chip temperature is another important factor, and by using decisions from the relatively simple neural network in the DVFS process average chip temperature could be reduced by up to 18 deg C with a minimum execution overhead and comparable performance [12].However, chip temperature is not mutually exclusive with energy efficiency, as using deep reinforcement learning neural network to determine the temperature of the chip and estimating environmental temperature could result in 23.9% less energy consumption while retaining the required level of performance [13].
Application-specific DVFS models could also be built.For instance, a DVFS model for augmented reality applications that takes into account the requested frame rates and response time could theoretically be decreased by up to 80%, but it was not proved in practice yet [14].
Graphical Processing Unit (GPU) is a powerful device to offload computations from the CPU, and it also could operate on different frequencies, so it is reasonable to make a joint DVFS strategy for both devices.A simple rule-based model could reduce energy consumption by 18.11%, while the smartphone frame rate drops by 3.12% [15].Another approach is to aggregate load, energy and temperature data for CPU, GPU and RAM, and assign frequency for each component by joint priorities list.This technique saves at least 26.8% power compared to default governors and state-of-the-art approaches [16].
2.3.Idle states and idle state governors.Modern CPUs are capable to enter idle states where program execution is suspended.Multi-core processors can set one or more of their cores to idle state, while other cores remain active.While in an idle state, part of the processor hardware is switched off, so it consumes less power.The deeper the state, the more hardware is switched off, but at the cost of greater entry and exit times.In Linux and Android OS terms, the total enter latency and minimum time hardware would stay in a particular idle state is called target residency [17].
For example, without going into much detail, here is a list of idle states of dual-core ARM CortexA9 processor [18]  -C4 (CPUs OFF, MPU CSWR + CORE Open Switched Retention)similar to C3, but the core is in OSWR mode.Exit latency is 1500 us.
It is important to note that ability of a CPU to enter a particular state could be or could be not utilized by mobile device system-on-chip and OS, therefore, it is possible to observe a smartphone capable of entering only C1, while its CPU could go deeper by design.
The idle state governor is a module within the Android OS kernel that can track the current system state and send a signal to the CPU to enter or exit some idle state for one or more of its cores.There are several default algorithms generally available in Android devices, and at the time of writing the default algorithm is the menu.It tries to predict the current idle duration, then adjusts the obtained value based on a number of factors, and then tries to find the deepest idle state given its target residency and exit latency.The prediction of the current idle duration is based on the history of the previous idle times.
2.4.Evaluating Energy Consumption.Measuring smartphone CPU energy consumption is a non-trivial task, and there are several approaches available.We classify them into direct and non-direct approaches [7].
Direct approaches involve the physical measurement of momentary CPU electrical parameters either with external or internal sensors.Energy consumed over a period of time could be estimated as: where U (t) is momentary voltage, I(t) is momentary current.While it is technically possible to have separate circuits to power the CPU at different voltages, it is challenging, so in commercially available smartphones DVFS is in fact dynamic frequency-only scaling, while voltage changes only due to natural processes within a battery1 .Therefore, we may rewrite the above formula as: where U is a nominal voltage on a CPU.Equation (1) allows to simplify measurement scheme and use only an ammeter.In practice, as ammeter readings are discrete, the Riemann sum is calculated instead of the actual integral.
While this approach could provide the most reliable results, there are considerable limitations to it.First of all, while the internal sensors are rather common for smartphone peripherals such as Bluetooth and Wi-Fi modules, they are seldom used to estimate the CPU power consumption 2 .The frequency and momentary current multiple times over a single second.An external ammeter should be able to catch such high-frequency changes, so most advanced commercially advanced ammeters operate in KHz range [19], but their cost could be a prohibitive factor.Additionally, ammeters should be connected, consequentially, with the measured device.Smartphone electronics do not allow us to easily connect CPU power input with ammeter.Alternatively, one could connect an ammeter to a battery connection slot while also using an external power device at a constant voltage, but readings acquired with this approach will be inevitably skewed by other smartphone peripherals such as a screen or Wi-Fi module.
Non-direct approaches estimate energy consumption by associating statistics unrelated to energy to it by some model.For example, one could estimate how much energy is consumed by a specific CPU instruction [20].Due to the size of instruction sets in modern CPUs, we consider this approach impractical and instead follow the model proposed by Google [21].Under this model, a CPU operating at a specified frequency consumes a specific constant current.Therefore, equation ( 1) is simplified even further as: where n -number of frequencies available to CPU, f i -particular operating frequency, I(f )i) -constant current at a specified frequency, t f i -time spent by CPU at a frequency f i .Equation ( 2) is further supported by Android OS, as t f i is stored in special time-in-state files in the /sys directory.Temporal data is stored in separate lines for each frequency as "<frequency><time>".The number of lines is equal to n.It is worth noting, that the CPU clusters under big.LITTLE architecture are treated as separate CPUs, so lines in time-in-state also 2 Anecdotally, we didn't encounter them in any devices available to us.differ from core to core depending on the cluster they belong to.Time is measured in 10 milliseconds units, and the count is started when the corresponding OS driver is installed or reset to measure processor data.Values for I(f i ) are stored in a weight coefficients file called power_profile.xml.This file should be provided by the smartphone manufacturer, but unfortunately, it is not always the case.However, absent power constants could be extracted from other smartphones using the same CPU model with the same or similar core topology.An example of such file contents is given in Table I.
To estimate power consumption one needs to multiply timing data to corresponding weight coefficients in a device power_profile.xml-a file provided by a smartphone manufacturer which contains power metrics for each smartphone device or peripheral.A sample of its contents related to CPU is shown in Table I.Power constants there are reported in mA 3 .
Different DVFS governors produce different time distributions over available sets of frequencies.Methodologically, when a voltage is constant, electrical charge is a main indicator of energy consumption and Equation ( 2) could be rewritten as: where q f i is an electrical charge spent at i-th frequency, q -total electrical charge.Because of this, we report energy consumption in our experiments in mAh.

Simultaneous Perturbation Stochastic Approximation.
In a significant number of control problems target system behavior could be described in the form of empirical quality functional (also known as medium risk functional).An optimal control action is taken based on the extrema of this functional.In our case, CPU operating frequency might be determined based on the list of available frequencies, current CPU workload and its history and other quality criteria such as energy consumption.
More formally, let F t (x, w) be a function of discrete time t, some parameter x and randomised vector w.The medium risk functional is defined as: and the minimum point of f t (x) as Then in order to find the optimal point the task is to build the sequence of estimations { θn } such that || θn − θ t || → min based on observations of the random variables F t (x n , w n ), n = 1, 2,..We define a momentary trial perturbation as a sequence of the observed uniformly symmetrically distributed independent random vectors ∆ n with covariance matrices: where δ nj ∈ {0, 1} is the Kronecker symbol, 0 < σ ∆ < ∞.The Bernoulli random vectors are suitable and frequently used as a simultaneous trial perturbation because the vector coordinates ∆ n are independent of one another and have equiprobable values of ±1.
It is possible to use three following algorithms when observations are noisy: to build a minimum point estimation sequence for a functional F (x) without significant loss of convergence rate [22].We denote noisy observations in the following way: {α n } and {β n } here are sequences of non-negative numbers conforming to a set of conditions, w + n is a stochastic perturbation vector for y + n observation, v + n is an arbitrary external noise during the observation.This recurrent procedure is called simultaneous perturbation stochastic approximation (SPSA) because it inseparably contains a randomized trial perturbation which is simultaneous in all coordinates.Overall, SPSA could be classified as a stochastic gradient descent algorithm.The first algorithm uses only a single noisy observation, and the second and third involve two noisy observations.For the purposes of distinction between those variations of the SPSA algorithm, we will call the version with a single noisy observation as SPSA1, and the latter two variations -SPSA2.
Among the conditions for consistency of estimates we specifically set out a condition for a weak correlation between the trial perturbation {∆ n } and sequences of indeterminacies {w n } and {v n } as the most important.SPSA1 has lower mean squared convergence rate compared to SPSA2 but has the advantage of using only a single noisy observation, which could be more time-efficient at the end, hence both variations could be used in practice.
Both variations of the SPSA algorithm follow the same general flow: 1. Define an empirical functional F (x).
2. Make an initial optimal estimate of θ0 .
3. Perturb a current optimal estimate.4. Obtain the required amount of the noisy observations of F and update a current optimal estimate θn .
5. Go to Step 3. Previously, we developed a DVFS governor based on SPSA with a single noisy observation [4].It was shown, that overall its energy consumption is between OnDemand and Interactive governors under the CFS scheduler with a performance drop of no more than 3.06%, but in some specific scenarios it could save up to 16% of the total CPU energy.

Algorithm 3.1. Defining
The empirical Functional.Empirical functional is a crucial component for SPSA algorithm.Although from a theoretical perspective, its definition could vary significantly even within the same problem [2], we used the lessons learned of the SPSA1 governor and added the following considerations to the functional definition.
We start our discussion by defining a load L on the CPU core as the percentage of active CPU time from the total CPU time (active and idle) over some period of time.This definition is the same as the one defined for the OnDemand governor.Then, a computational volume or simply volume denotes a product of frequency by load.Essentially, the volume is a number of the CPU ticks dedicated to active computational processes, and it is closely related to the number of the executed instructions.The crucial observation for DVFS is that the same volume produces different loads under different frequencies: the higher the frequency, the lower the load, and vice versa.
The task of DVFS is peculiar in the sense that we can base our DVFS strategy on the history of the CPU load observations among other criteria, but still, the future load cannot be reliably determined -for example, we can't predict that smartphone would receive a phone call in the next second.Therefore, we introduce target load or threshold load L T -a specific constant value of load over some observation time.In our experiments we used empirically determined values of L T from 60% to 80% -such values provide additional computational capacity in a scenario where the actual load proved to be much higher than anticipated from the load history.L T is used as a boundary between two different DVFS strategies: -If the current load exceeds L T , we assume that the smartphone performs a rather long computational task.Therefore, all CPU resources should be made available for it, and the optimal frequency that DVFS should set is the one that provides the closest load to L T under the currently observed volume.In this case, performance considerations outweigh energy savings.Long-term, if a load doesn't drop below L T , the maximum frequency will eventually be set until the load drops.
-If the load does not exceed L T , it means there is room to optimize energy consumption by finding such a frequency that provides the best energy efficiency and the projected load still does not exceed L T .
We now define execution efficiency or cost-of-execution (CoE) of an operating frequency f in the following way: CoE could also be viewed as a priority for the DVFS governor for the frequency selection -the lower the value, the higher the priority.Under this point of view, some generalized cost-of-execution GCoE metric could be used instead of CoE to prioritize the frequency selection, as it only needs to establish order relationship between frequencies.
In our research, we defined it as: As discussed, the CPU voltage is safe to be treated as constant, so by removing U term from the CoE definition we do not change the established order relationship 4 .
It is important to note that, given the power constants available, with the increase of f I f also increases, but GCoE(f ) is generally a non-monotonically increasing function.An example is given in Table 1, where the A55 core consumes the least amount of power at a frequency of 500MHz, but is most energy efficient at 875MHz.
In the end, when the current load is below L T , the optimal frequency would be the one with the lowest GCoE among those who could process the same volume without the projected load exceeding L T .
As we defined the algorithm to determine the optimal frequency in both scenarios (performance and optimization), the result of our empirical functional and the quality of the current frequency estimation is determined as the distance between the current and optimal frequency indexes in a sorted frequency list.

Theoretical Foundation.
In order for empirical functional to be usable in the SPSA2 algorithm and to prove the consistency of the algorithm, some conditions should be satisfied.First, the changes of optimal frequency are bounded by a task definition: 4 Besides, given that in our case U =1V, CoE and GCoE are numerically equal.

Assumption 2:
The gradient ∇F (f ) satisfies the Lipschitz condition: It is proved that SPSA2 converges when both assumptions for empirical functional are met [5], therefore, algorithm from 3.1 converges.

Strategies to Implement Observations.
A straightforward approach to implementing SPSA2 takes 3 invocations of the DVFS routine.We note that due to the absence of floating point operations in the Android OS kernel, it is easier to manipulate with frequency indexes in a sorted array rather than with the frequencies themselves.In all cases if the index is beyond array boundaries, it is set to low or high boundary correspondingly.
1. Given the current frequency index i i , a random number ∆ = ±1 is generated, and a frequency with the index i i + ∆β is set to obtain the first noisy observation y + of functional.
2. Then the frequency with index i i − ∆β is set to obtain the second noisy observation y − .

New frequency is set by index
. However, by definition of SPSA2 a faster approach is also possible which takes only two invocations: 1.The current functional value under the current frequency with index i i is treated as the first noisy observation y 0 .A random number ∆ = ±1 is generated, and a frequency with the index i i + ∆β is set to obtain the second noisy observation y + of functional.
2. New frequency is set by index i i+1 = i i − α(y + −y 0 ) ∆β .However, waiting for 2 or 3 invocations to be finished could be detrimental to governor performance and estimation accuracy.Therefore, we try out another scheme, which is not strict in terms of the SPSA2 definition, but takes only a single iteration: 1. Given the current frequency index i i , a random number ∆ = ±1 is generated.The load for frequencies with indexes i i ± ∆β is calculated in the assumption that the volume will remain the same, and the functional value for .In the last scheme, we also added an optimization for the performance mode.When the observed load is 100%, we modify the value of β and thus introduce a sequence of β n instead of a single value to increase frequency more substantially.If, however, this decision would prove excessive, and the high load would not last long, the next iteration of the algorithm would reduce operating frequency.
In all schemes, additional precautions are taken before setting i i+1 to properly handle corner cases.Based on the number of observations, we will further call those schemes SPSA2 3 , SPSA2 2 and SPSA2 1 respectively.
3.4.Accounting for Frequency Change Lag.The time CPU takes to change frequency could be considerable.For example, the OnDemand governor is scheduled to re-evaluate optimal frequency every 10 ms.Historically, OnDemand was developed for desktop computers and servers, and frequency switch could happen within 10 ms.However, in our preliminary research with Xiaomi Redmi Note 8 Pro, we found out that after the initial frequency change request by the OnDemand-based DVFS governor, it could take up to 3 consequent DVFS routine calls to register the updated frequency from the CPU, and the DVFS governor sends a frequency change request on every invocation.
This situation is indicative that even the most commonly available DVFS governors are based on assumptions that may not be true for the underlying hardware.
As our SPSA2 governors are also using the OnDemand governor timer model, and it is important for the algorithm to make noisy observations at the requested frequencies, we skip the DVFS routine invocations where the CPU cluster frequency is still different from the one requested previously.The algorithm continues when a proper frequency is set, and it is assumed that the load between two routine invocations was obtained at the requested frequency; thus, the noisy load observation is properly obtained.

EAS Integration.
Energy Aware Scheduling works closely with the schedutil DVFS governor, and it is stated that EAS is not guaranteed to reduce energy consumption if the schedutil is not an active governor.In fact, schedutil implements the same strategy as the OnDemand governor, that is, it selects cluster frequency proportional to the maximum load over cluster cores with the minimum frequency corresponding to 0% load and the maximum frequency corresponding to 100% load.However, the EAS CPU load definition is significantly different from the one used in OnDemand.The CPU load is estimated in terms of the EAS energy model and occupancy of all 1018 Информатика и автоматизация. 2023.Том 22 №5.ISSN 2713-3192 (печ.)ISSN 2713-3206 (онлайн) www.ia.spcras.ru_____________________________________________________________________ ЦИФРОВЫЕ ИНФОРМАЦИОННО-ТЕЛЕКОММУНИКАЦИОННЫЕ ТЕХНОЛОГИИ tasks assigned to a specific CPU core.Instead of a posterior evaluation, it is estimated a priori.
For our proposed schemes we used the OnDemand load definition even when EAS is turned on, and, as shown in the experiments, tuning α, β and L T is enough to align our SPSA2 governors with EAS.The smartphone was patched to run the herolte 5 Android 11 kernel, which could be run on the test device with the EAS scheduler.To test the default CFS scheduler, we used Samsung android_kernel_samsung_universal8890 for Android 106 .This kernel selection allows us to compare our governors' performance against commonly available Interactive, OnDemand and Schedutil governors.Our DVFS governors were added as additional modules to both of those kernels, and it was required to obtain root access for the smartphone to install custom builds.Switching to our governors was done by the default cpufreq system calls [9].

Test Cases.
As noted in our previous work [4], it is important for the general-purpose DVFS governor to be able to handle different kinds of workloads while providing a trade-off between performance and energy consumption.However, each governor is built with a particular workload model in mind, and this model is not bound to reflect real-world scenarios.
We've considered the following list of diverse test cases to be implemented.Each test case was assigned a name for further reference.
1. videoVLC -playing MP4 file using VLC video player, preliminarily selected as a default video player.
2. trialXTreme3 -imitation of playing Trial Xtreme 3. 6. twitch -watching a Twitch stream in a browser and closing the browser after a while 7 .
As the Monkeyrunner tool we used previously was not available in the toolchain for the test device, a set of utilities that mirrors its functionality to run tests was written in Python 8 .
All tests are launched 10 times for 5 minutes to average the influence of system background processes.They are implemented as Python scripts as well.To launch a test, a smartphone needs to be connected to a controlling PC, and the test execution is controlled through adb commands.
To test smartphone performance we selected Geekbench 5.5.1, as it has an open methodology we found valid for our purposes [23].The performance tests were launched 3 times for each evaluated governor due to their longevity, and an average of the obtained score points was calculated.

Modified Energy Consumption Models.
Overall, direct measurement approaches were not used in the research for this paper due to the difficulties outlined above.While the basic energy consumption model could already be used, it does not take into account idle state management.Moreover, the source code behind time-in-state files is not synchronized with the idle governors.To address this issue, we propose the following modifications to the basic model.Equation ( 2) implicitly assumes that the CPU consumes a fixed amount of power by just being at a specific frequency, regardless of the fact if there is an active computational process or not.This assumption is supported by the implementation of idle task while the CPU is in C0 (Active) state.When the OS scheduler cannot schedule an active tasks for the core, it assigns a special idle task, which consists of NOP (No Operation) instructions.C1, however, involves stopping CPU core timers, and the computational process cannot be run.
Given than, only C0 and C1 states are available on the test device, and given the fact that there could be several clusters of cores with a single frequency for all cores within a cluster, we modify Equation (2) as: 1020 Информатика и автоматизация. 2023.Том 22 № 5. ISSN 2713-3192 (печ.) ISSN 2713-3206 (онлайн) www.ia.spcras.ru where N cl is a number of core clusters, Eidle i (t) is energy consumed by a CPU for just being turned on, N f i is a number of frequencies for ith cluster, U i is a nominal voltage of cores in ith cluster, I i (f ) is a nominal current for slected frequency f in ith cluster, N cores i is a number of cores in ith cluster, t f jk is a time spent by kth core within ith cluster on jth frequency.While this model is more elaborate than the one described in (2), it could be simplified for practical reasons.In the first term, we assume that each cluster has linear energy consumption over time for just being turned on.Intuitively, it seems that to properly calculate idle state energy footprint we should split Eidle i (t) to basic CPU energy consumption and C1 footprint for each core, but base CPU and C1 idle state power profiles are not available in power_profile.xml,we omit this term from further analysis.

N coresi k=1
t f jk seems excessive, because cores of the same cluster operate at the same frequency.However, this term becomes useful when we are trying to estimate the impact of deep sleep idle states impact on energy consumption.
While idle state stops CPU core timers, it does not change the nominal core operating frequency, that is, a core operating at 1 GHz before sleep will continue to work at 1 GHz after sleep.We initially assumed that the core is put to sleep when there are not enough tasks to schedule in the system overall, so the DVFS governor sets the minimum operating frequency for the cluster.Our initial idea was to deduct idle time in C1 from the time spent by the core on the lowest frequency such as in: where t idle k is the time spent by kth core in idle states.However, this assumption proved to be wrong, as it was shown experimentally that there were cases where t f 0k < t idle k ), which means that the system put a core to sleep at a higher frequency than minimal.
While it is technically possible to modify the Android OS kernel and to track at which frequency the core was suspended, instead we assume that idle state management is independent of DVFS, but not of the OS scheduler, and a core may be put to sleep at any frequency at any time.Therefore, we propose to track idle state energy footprint by deducting idle state time from each time spent on frequency proportionally: As this approach is not precise but gives empirically good insights on actual energy expenditure, we use both original basic and modified models into report our results.

Device Preparation.
Validity of experiments and output stability was enhanced by the following protocol we have described in the previous research [7]: -All unnecessary applications were uninstalled, and all application activities were turned off for those that could not be uninstalled.
-Peripherals that were not used in a particular test case (i.e.Wi-Fi, 4G, GPS) were turned off.
-A cool-down period of 2 minutes was taken between the tests for background processes to finalize and decrease the device temperature.
-Before running a particular test case, it was run for a single time for warming-up purposes.
Battery charge levels were not homogenized before each test runs, as it was unnecessary by our energy consumption models.In fact, for the device to be controlled by a test run utility, it was connected to a PC via USB, so the battery was charging during the test runs.
The full experimental data is also available. 11 The first set of the experiments was taken under Android 11 with EAS turned on.By enumerating possible parameters we found out the best-performing settings for all SPSA algorithms: Tables 2 to 5 contain the median values for energy consumption under the basic and modified energy models.Here and in the subsequent tables we 9 https://github.com/makar-pelogeiko/herolte_Eas _Idle _modification/tree/Q-stable-spsa_gov 10 https://github.com/makar-pelogeiko/android_kernel _samsung _universal8890/tree/lineage-17.1-spsa_gov 11 https://studentspburu-my.sharepoint.com/personal/st076963_student _spbu _ru/ _layouts/15/ onedrive.aspx?id=%2Fpersonal%2Fst076963%5Fstudent%5Fspbu%5Fru%2FDocuments% 2Fdiploma%2Fresultsga=1 1022 Информатика и автоматизация. 2023.Том 22 № 5. ISSN 2713-3192 (печ.)ISSN 2713-3206 (онлайн) www.ia.spcras.ru_____________________________________________________________________ ЦИФРОВЫЕ ИНФОРМАЦИОННО-ТЕЛЕКОММУНИКАЦИОННЫЕ ТЕХНОЛОГИИ highlight scenarios where the SPSA governor energy consumption is higher than energy consumption of every baseline governor in italic and the cases where SPSA is better than some of the baseline governors in bold.Table 6 contains performance data.The first line for SPSA governors corresponds to the first set of parameters, and the second line -to the second set.Performance-wise, both existing and novel algorithms perform comparably.There are no clear outliers, and for single-core test, the difference between the best and worst performance is within 15%, while it is within 18% for the multicore scenario.We note that the first set of SPSA parameters provides better performance than the second set.SPSA2 3 performance is worsened by the fact it needs 3 observations, but not too much.
1024 Информатика и автоматизация. 2023.Том 22 № 5. ISSN 2713-3192 (печ.)ISSN 2713-3206 (онлайн) www.ia.spcras.ruWhen we look at energy consumption data, it should be noted that the modified energy model significantly changes the observed values compared to the basic model, and in some cases their ratio.The modified model, in our opinion, still better reflects data reality if direct measurement is not available, and our next analysis is based on it, even though the basic model is more complimentary to the SPSA governors.
The Interactive and OnDemand governors are consistently more power-hungry than schedutil under EAS, and, as a general rule, we definitely suggest using schedutil over other governors if EAS is available in stock firmware.
As for the SPSA governors, all of them could be used in place of schedutil.First of all, they all handle trialXTreme3 and twitch tests better than schedutil.We assume it to be related to β = 1 in our experiments, as higher β results in a more rapid gradient descent, and in a quickly changing reality of CPU load complicated by frequency switch lags relatively smooth frequency change is good for consistent CPU usage.SPSA2 1 with the first set of parameters is shown to consume more energy in most of the tests (up to 27% for type test), while being only 10% better in performance compared to schedutil, so the second set of parameters is recommended there, as performance is at the same level, while energy consumption may be up to 31% better.SPSA2 2 is the most balanced governor, as it is up to 20% better in some tests and up to 10% worse in others energy-wise, but it demonstrates a 5.6% performance increase in multicore case, which is more realistic scenario.SPSA2 3 is the most conservative of other governors energy and performancewise, but overall multicore performance is comparable to schedutil.
A special case is the videoVLC test, where the SPSA governors tend to spend more energy than schedutil.The difference is never significant (no 1025 Informatics and Automation.2023.Vol.22 No. 5. ISSN 2713-3192 (print) ISSN 2713-3206 (online) www.ia.spcras.rumore than 7.2% for SPSA2 2 ), but our explanation is that the timings of intense computations (frame decoding) are such that the SPSA governors cannot drop frequency to lower values.The only exception is SPSA2 3 , but the difference is within a margin of a statistical error.Tables 7 to 10 contain the median values for energy consumption under the basic and modified energy models for the tests run under Android 10 and CFS scheduler.The situation there is drastically different.First, schedutil cannot be used there as a benchmark as it would be running under the conditions it was not designed for.Then, the Interactive governor demonstrates a remarkable improvement in energy consumption.The reason behind such energy behavior is a change in scheduling strategy.Unlike EAS, CFS does not prioritize LITTLE cores over big ones, and in similar situations, big cores are loaded more resulting in a higher power drain.
Additionally, while overall SPSA2 3 demonstrates an acceptable energy consumption, its performance is worse, as at load levels of 99-100% it does not immediately recommend increase frequency.As other algorithms demonstrate higher overall energy consumption than Interactive, and the latter provides an acceptable performance to the end user, we did not run performance tests for the SPSA governors.

Table 1 .
Xiaomi Redmi Note 8 Pro CPU clusters frequencies CoE is cost-of-execution, E(t) is energy spent over a period of time, n ticks is a number of CPU ticks observed during that time, t is the observed period of time.The discussion on translation from integral to constant product is given in Subsection 2.4.In essence, CoE is the energy cost of one CPU tick under a specific operating frequency.

Table 2
Overall, the energy consumption dispersion is omitted in this paper, but it is available in the full experimental report.It should be noted that the values for the SPSA and schedutil governors have a visibly low dispersion in all experiments, and energy consumption could be compared by the median value only.It is not true for OnDemand and especially Interactive governors.For example, Interactive showed in the videoVLC test the minimum and 1023 Informatics and Automation.2023.Vol.22 No. 5. ISSN 2713-3192 (print) ISSN 2713-3206 (online) www.ia.spcras.rumaximumvalues under the basic energy model of 42.42 and 155.91 mAh respectively.However, it does not affect the validity of the analysis, as even the minimum energy consumption of both governors was higher than the maximum value for schedutil or SPSA governors.

Table 10 .
Energy consumption (mAh) under Android 10 with CFS scheduler (modified energy model), type, camera and twitch tests