※教你寫出一流的銷售文案?
銷售文案是什麼?A文案是廣告用的文字。舉凡任何宣傳、行銷、販賣商品時所用到的文字都是文案。在網路時代,文案成為行銷中最重要的宣傳方式,好的文案可節省大量宣傳資源,達成行銷目的。
Energy Awareness Scheduler是由ARM和Linaro開發的新的linux kernel調度器。
原先CFS調度器是基於policy進行調度,並有不同的吞吐量。例如,有一個新的task創建,同時也有一個idle cpu時,CFS始終會把新的task放到這個idle cpu上運行。但是,這樣對節省功耗來說,並不是一個最好的決定。而EAS就是為了解決這樣的問題。在不影響性能的前提下,EAS會在調度時實現節省功耗。
從SDM845開始,QTI在EAS基礎上進行了一些修改,以滿足移動市場的需要。所以QTI在EAS基礎上添加了一些feature,來獲得更好的性能和功耗。
Energy model
在dts中,針對不同的cpu平台,已定義好不同的energy model。模型主要是由【頻率,能量】的數組構成,對應了CPU和cluster不同的OOP(Operating Performance Point);同時也提供了不同idle state的能量消耗:idle cost。
CPU0: cpu@0 { device_type = "cpu"; compatible = "arm,armv8"; reg = <0x0 0x0>; enable-method = "psci"; efficiency = <1024>; cache-size = <0x8000>; cpu-release-addr = <0x0 0x90000000>; qcom,lmh-dcvs = <&lmh_dcvs0>; #cooling-cells = <2>; next-level-cache = <&L2_0>; sched-energy-costs = <&CPU_COST_0 &CLUSTER_COST_0>; //小核都用CPU_COST_0 CLUSTER_COST_0 。。。。。。 CPU4: cpu@400 { device_type = "cpu"; compatible = "arm,armv8"; reg = <0x0 0x400>; enable-method = "psci"; efficiency = <1740>; cache-size = <0x20000>; cpu-release-addr = <0x0 0x90000000>; qcom,lmh-dcvs = <&lmh_dcvs1>; #cooling-cells = <2>; next-level-cache = <&L2_400>; sched-energy-costs = <&CPU_COST_1 &CLUSTER_COST_1>; //大核都用CPU_COST_1 CLUSTER_COST_1
。。。。。。。
對應的數組如下,
energy_costs: energy-costs { compatible = "sched-energy"; CPU_COST_0: core-cost0 { busy-cost-data = < 300000 31 422400 38 499200 42 576000 46 652800 51 748800 58 825600 64 902400 70 979200 76 1056000 83 1132800 90 1209600 97 1286400 105 1363200 114 1440000 124 1516800 136 1593600 152 1651200 167 /* speedbin 0,1 */ 1670400 173 /* speedbin 2 */ 1708800 186 /* speedbin 0,1 */ 1747200 201 /* speedbin 2 */ >; idle-cost-data = < 22 18 14 12 >; }; CPU_COST_1: core-cost1 { busy-cost-data = < 300000 258 422400 260 499200 261 576000 263 652800 267 729600 272 806400 280 883200 291 960000 305 1036800 324 1113600 348 1190400 378 1267200 415 1344000 460 1420800 513 1497600 576 1574400 649 1651200 732 1728000 824 1804800 923 1881600 1027 1958400 1131 2035000 1228 /* speedbin 1,2 */ 2092000 1290 /* speedbin 1 */ 2112000 1308 /* speedbin 2 */ 2208000 1363 /* speedbin 2 */ >; idle-cost-data = < 100 80 60 40 >; }; CLUSTER_COST_0: cluster-cost0 { busy-cost-data = < 300000 3 422400 4 499200 4 576000 4 652800 5 748800 5 825600 6 902400 7 979200 7 1056000 8 1132800 9 1209600 9 1286400 10 1363200 11 1440000 12 1516800 13 1593600 15 1651200 17 /* speedbin 0,1 */ 1670400 19 /* speedbin 2 */ 1708800 21 /* speedbin 0,1 */ 1747200 23 /* speedbin 2 */ >; idle-cost-data = < 4 3 2 1 >; }; CLUSTER_COST_1: cluster-cost1 { busy-cost-data = < 300000 24 422400 24 499200 25 576000 25 652800 26 729600 27 806400 28 883200 29 960000 30 1036800 32 1113600 34 1190400 37 1267200 40 1344000 45 1420800 50 1497600 57 1574400 64 1651200 74 1728000 84 1804800 96 1881600 106 1958400 113 2035000 120 /* speedbin 1,2 */ 2092000 125 /* speedbin 1 */ 2112000 127 /* speedbin 2 */ 2208000 130 /* speedbin 2 */ >; idle-cost-data = < 4 3 2 1 >; }; }; /* energy-costs */
在代碼kernel/sched/energy.c中遍歷所有cpu,並讀取dts中的數據
for_each_possible_cpu(cpu) { cn = of_get_cpu_node(cpu, NULL); if (!cn) { pr_warn("CPU device node missing for CPU %d\n", cpu); return; } if (!of_find_property(cn, "sched-energy-costs", NULL)) { pr_warn("CPU device node has no sched-energy-costs\n"); return; } for_each_possible_sd_level(sd_level) { cp = of_parse_phandle(cn, "sched-energy-costs", sd_level); if (!cp) break; prop = of_find_property(cp, "busy-cost-data", NULL); if (!prop || !prop->value) { pr_warn("No busy-cost data, skipping sched_energy init\n"); goto out; } sge = kcalloc(1, sizeof(struct sched_group_energy), GFP_NOWAIT); if (!sge) goto out; nstates = (prop->length / sizeof(u32)) / 2; cap_states = kcalloc(nstates, sizeof(struct capacity_state), GFP_NOWAIT); if (!cap_states) { kfree(sge); goto out; } for (i = 0, val = prop->value; i < nstates; i++) { //將讀取的[freq,energy]數組存放起來 cap_states[i].cap = SCHED_CAPACITY_SCALE; cap_states[i].frequency = be32_to_cpup(val++); cap_states[i].power = be32_to_cpup(val++); } sge->nr_cap_states = nstates; //state為[freq,energy]組合個數,就是支持多少個狀態:將所有數據flatten之後,再處以2 sge->cap_states = cap_states; prop = of_find_property(cp, "idle-cost-data", NULL); if (!prop || !prop->value) { pr_warn("No idle-cost data, skipping sched_energy init\n"); kfree(sge); kfree(cap_states); goto out; } nstates = (prop->length / sizeof(u32)); idle_states = kcalloc(nstates, sizeof(struct idle_state), GFP_NOWAIT); if (!idle_states) { kfree(sge); kfree(cap_states); goto out; } for (i = 0, val = prop->value; i < nstates; i++) idle_states[i].power = be32_to_cpup(val++); //將讀取的idle cost data存放起來 sge->nr_idle_states = nstates; //idle state的個數,就是idle cost data的長度 sge->idle_states = idle_states; sge_array[cpu][sd_level] = sge; //將當前cpu獲取的energy模型存放再sge_array[cpu][sd_level]中。其中cpu就是對應哪個cpu,sd_level則對應是哪個sched_domain,也就是是cpu level還是cluster level } }
Load Tracking
QTI EAS使用的負載計算是WALT,是基於時間窗口的load統計方法,具體參考之前文章:https://www.cnblogs.com/lingjiajun/p/12317090.html
其中會跟蹤計算出2個比較關鍵的數據,就是task_util和cpu_util
當執行wakeup task placement,scheduler就會使用task utilization和CPU utilization
可以理解為將load的情況轉化為Utilization,並且將其標準化為1024的值。
task_util = demand *1024 / window_size
= (delta / window_size) * (cur_freq / max_freq) * cpu_max_capacity
—–delta是task在一個window中運行的真實時間;window_size默認是20ms;
cur_freq為cpu當前頻率;max_freq為cpu最大頻率;
Task utilization boosted = Task utilization + (1024-task_util) x boost_percent —–boost percent是使用schedtune boost時,所需要乘上的百分比
CPU utilization = 1024 x (累計的runnable均值 / window size)——–累計的runnable均值,個人理解就是rq上所有task util的總和
Task placement的主要概念:
EAS是Task placement 是EAS影響調度的主要模塊。 其主要keypoint如下:
1、EAS依靠energy model來進行精確地進行選擇CPU運行
2、使用energy model估算:把一個任務安排在一個CPU上,或者將任務從一個CPU遷移到另一個CPU上,所發生的能量變化
3、EAS會在不影響performance情況下(比如滿足滿足最低的latency),趨向於選擇消耗能量最小的CPU,去運行當前的task
4、EAS僅發生在system沒有overutilized的情況下
5、EAS的概念與QTI EAS的一樣
6、一旦系統處於overutilized,QTI EAS仍然在wake up的path下進行energy aware。不會考慮系統overutilized的情形。
補充:
※別再煩惱如何寫文案,掌握八大原則!
什麼是銷售文案服務?A就是幫你撰寫適合的廣告文案。當您需要販售商品、宣傳活動、建立個人品牌,撰寫廣告文案都是必須的工作。
overutilization,一個cpu_util大於cpu capacity的95%(sched_capacity_margin_up[cpu]),那麼就認為這個cpu處於overutilization。並且整個系統也被認為overutilizaion。
EAS核心調度算法
不同版本的EAS在不同版本下的主要task placement實現函數(針對CFS task):
Zone scheduler: select_best_cpu() QTI EAS r1.2: energy_aware_wake_cpu() QTI EAS r1.5: find_energy_efficienct_cpu()
task placement調用路徑:
QTI EAS r1.5 (Kernel 4.14) Task wake-up: try_to_wake_up() →select_task_rq_fair() →invokes find_energy_efficient_cpu() Scheduler tick occurs: scheduler_tick() →check_for_migration() →invokes find_energy_efficient_cpu() New task arrives: do_fork() →wake_up_new_task() →select_task_rq_fair() →invokes find_energy_efficient_cpu()
EAS的task placement代碼流程,主要目標是找到一個合適的cpu來運行當前這個task p。
主要代碼就是find_energy_efficient_cpu()這個函數裏面,如下:
1 /* 2 * find_energy_efficient_cpu(): Find most energy-efficient target CPU for the 3 * waking task. find_energy_efficient_cpu() looks for the CPU with maximum 4 * spare capacity in each performance domain and uses it as a potential 5 * candidate to execute the task. Then, it uses the Energy Model to figure 6 * out which of the CPU candidates is the most energy-efficient. 7 * 8 * The rationale for this heuristic is as follows. In a performance domain, 9 * all the most energy efficient CPU candidates (according to the Energy 10 * Model) are those for which we'll request a low frequency. When there are 11 * several CPUs for which the frequency request will be the same, we don't 12 * have enough data to break the tie between them, because the Energy Model 13 * only includes active power costs. With this model, if we assume that 14 * frequency requests follow utilization (e.g. using schedutil), the CPU with 15 * the maximum spare capacity in a performance domain is guaranteed to be among 16 * the best candidates of the performance domain. 17 * 18 * In practice, it could be preferable from an energy standpoint to pack 19 * small tasks on a CPU in order to let other CPUs go in deeper idle states, 20 * but that could also hurt our chances to go cluster idle, and we have no 21 * ways to tell with the current Energy Model if this is actually a good 22 * idea or not. So, find_energy_efficient_cpu() basically favors 23 * cluster-packing, and spreading inside a cluster. That should at least be 24 * a good thing for latency, and this is consistent with the idea that most 25 * of the energy savings of EAS come from the asymmetry of the system, and 26 * not so much from breaking the tie between identical CPUs. That's also the 27 * reason why EAS is enabled in the topology code only for systems where 28 * SD_ASYM_CPUCAPACITY is set. 29 * 30 * NOTE: Forkees are not accepted in the energy-aware wake-up path because 31 * they don't have any useful utilization data yet and it's not possible to 32 * forecast their impact on energy consumption. Consequently, they will be 33 * placed by find_idlest_cpu() on the least loaded CPU, which might turn out 34 * to be energy-inefficient in some use-cases. The alternative would be to 35 * bias new tasks towards specific types of CPUs first, or to try to infer 36 * their util_avg from the parent task, but those heuristics could hurt 37 * other use-cases too. So, until someone finds a better way to solve this, 38 * let's keep things simple by re-using the existing slow path. 39 */ 40 41 static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu, 42 int sync, int sibling_count_hint) 43 { 44 unsigned long prev_energy = ULONG_MAX, best_energy = ULONG_MAX; 45 struct root_domain *rd = cpu_rq(smp_processor_id())->rd; 46 int weight, cpu = smp_processor_id(), best_energy_cpu = prev_cpu; //cpu:當前執行的cpu 47 unsigned long cur_energy; 48 struct perf_domain *pd; 49 struct sched_domain *sd; 50 cpumask_t *candidates; 51 bool is_rtg; 52 struct find_best_target_env fbt_env; 53 bool need_idle = wake_to_idle(p); //是否set flag PF_WAKE_UP_IDLE 54 int placement_boost = task_boost_policy(p); //獲取task sched boost policy:none/on_big/on_all 與sched_boost、schedtune設置也有關 55 u64 start_t = 0; 56 int delta = 0; 57 int task_boost = per_task_boost(p); //僅網絡有打開該boost,這裏可以認為沒有boost 58 int boosted = (schedtune_task_boost(p) > 0) || (task_boost > 0); //查看task的schedtune有沒有打開boost 59 int start_cpu = get_start_cpu(p); //獲取從哪個cpu core開始,嘗試作為target cpu 60 61 if (start_cpu < 0) 62 goto eas_not_ready; 63 64 is_rtg = task_in_related_thread_group(p); //判斷task是否在一個group內 65 66 fbt_env.fastpath = 0; 67 68 if (trace_sched_task_util_enabled()) 69 start_t = sched_clock(); //trace log 70 71 /* Pre-select a set of candidate CPUs. */ 72 candidates = this_cpu_ptr(&energy_cpus); 73 cpumask_clear(candidates); 74 75 if (need_idle) 76 sync = 0; 77 78 if (sysctl_sched_sync_hint_enable && sync && 79 bias_to_this_cpu(p, cpu, start_cpu)) { //滿足3個調節:sync hint enable/flag:sync=1/bias to當前cpu 80 best_energy_cpu = cpu; //當前執行的cpu 81 fbt_env.fastpath = SYNC_WAKEUP; 82 goto done; 83 } 84 85 if (is_many_wakeup(sibling_count_hint) && prev_cpu != cpu && //sibling_count_hint代表有多少個thread在當前event中喚醒 86 bias_to_this_cpu(p, prev_cpu, start_cpu)) { 87 best_energy_cpu = prev_cpu; //選擇prev cpu 88 fbt_env.fastpath = MANY_WAKEUP; 89 goto done; 90 } 91 92 rcu_read_lock(); 93 pd = rcu_dereference(rd->pd); 94 if (!pd) 95 goto fail; 96 97 /* 98 * Energy-aware wake-up happens on the lowest sched_domain starting 99 * from sd_asym_cpucapacity spanning over this_cpu and prev_cpu. 100 */ 101 sd = rcu_dereference(*this_cpu_ptr(&sd_asym_cpucapacity)); 102 while (sd && !cpumask_test_cpu(prev_cpu, sched_domain_span(sd))) 103 sd = sd->parent; 104 if (!sd) 105 goto fail; 106 107 sync_entity_load_avg(&p->se); //更新task所在sched_entity的PELT load 108 if (!task_util_est(p)) 109 goto unlock; 110 111 if (sched_feat(FIND_BEST_TARGET)) { //檢查FIND_BEST_TARGET這個調度特性是否打開:目前是打開的 112 fbt_env.is_rtg = is_rtg; 113 fbt_env.placement_boost = placement_boost; 114 fbt_env.need_idle = need_idle; 115 fbt_env.start_cpu = start_cpu; 116 fbt_env.boosted = boosted; 117 fbt_env.strict_max = is_rtg && 118 (task_boost == TASK_BOOST_STRICT_MAX); 119 fbt_env.skip_cpu = is_many_wakeup(sibling_count_hint) ? 120 cpu : -1; 121 122 find_best_target(NULL, candidates, p, &fbt_env); //(1)核心函數,最終是將找到的target_cpu和backup_cpu都存放進了candidates中 123 } else { 124 select_cpu_candidates(sd, candidates, pd, p, prev_cpu); 125 } 126 127 /* Bail out if no candidate was found. */ 128 weight = cpumask_weight(candidates); //判斷如果沒有找到target cpu和backup cpu時,直接goto unlock 129 if (!weight) 130 goto unlock; 131 132 /* If there is only one sensible candidate, select it now. */ 133 cpu = cpumask_first(candidates); 134 if (weight == 1 && ((schedtune_prefer_idle(p) && idle_cpu(cpu)) || //如果只找到了1個cpu,task是prefer_idle並且這個cpu也是idle的;或者cpu就是prev_cpu 135 (cpu == prev_cpu))) { 136 best_energy_cpu = cpu; //那麼就選這個cpu為【best_energy_cpu】 137 goto unlock; 138 } 139 140 #ifdef CONFIG_SCHED_WALT 141 if (p->state == TASK_WAKING) //如果是新喚醒的task,獲取task_util 142 delta = task_util(p); 143 #endif 144 if (task_placement_boost_enabled(p) || need_idle || boosted || //滿足一下條件之一,那麼第一個candidate cpu就作為【best_energy_cpu】不再考慮計算energy 145 is_rtg || __cpu_overutilized(prev_cpu, delta) || //打開了sched_boost、need_idle(PF_WAKE_UP_IDLE)、開了schedtune boost、related_thread_group限制使用小核、prev_cpu+delta沒有overutil、 146 !task_fits_max(p, prev_cpu) || cpu_isolated(prev_cpu)) { //p放在prev_cpu上會misfit、prev_cpu處於isolated 147 best_energy_cpu = cpu; 148 goto unlock; 149 } 150 151 if (cpumask_test_cpu(prev_cpu, &p->cpus_allowed)) //根據prev_cpu是否在task p的cpuset範圍內 152 prev_energy = best_energy = compute_energy(p, prev_cpu, pd); //(2)在範圍內,則計算p在prev_cpu上的energy 153 else 154 prev_energy = best_energy = ULONG_MAX; //不在範圍內,energy就設為最大,說明prev_cpu不考慮作為best_energy_cpu了 155 156 /* Select the best candidate energy-wise. */ //通過比較energy,挑選出best_energy_cpu、best_energy 157 for_each_cpu(cpu, candidates) { 158 if (cpu == prev_cpu) //過濾prev_cpu 159 continue; 160 cur_energy = compute_energy(p, cpu, pd); //計算p遷移到candidate cpu上的energy 161 trace_sched_compute_energy(p, cpu, cur_energy, prev_energy, 162 best_energy, best_energy_cpu); 163 if (cur_energy < best_energy) { 164 best_energy = cur_energy; 165 best_energy_cpu = cpu; 166 } else if (cur_energy == best_energy) { 167 if (select_cpu_same_energy(cpu, best_energy_cpu, //當candidate cpu的energy與best_cpu一樣的話,怎麼選 168 prev_cpu)) { 169 best_energy = cur_energy; 170 best_energy_cpu = cpu; 171 } 172 } 173 } 174 unlock: 175 rcu_read_unlock(); 176 177 /* 178 * Pick the prev CPU, if best energy CPU can't saves at least 6% of 179 * the energy used by prev_cpu. 180 */ 181 if ((prev_energy != ULONG_MAX) && (best_energy_cpu != prev_cpu) && //找到了非prev_cpu的best_energy_cpu、且省電下來的energy要大於在prev_energy上的6%,那麼best_energy_cpu則滿足條件;否則仍然使用prev_cpu 182 ((prev_energy - best_energy) <= prev_energy >> 4)) //這裏巧妙地使用了位移:右移1位代表÷2,所以prev_energy/2/2/2/2 = prev_energy*6% 183 best_energy_cpu = prev_cpu; 184 185 done: 186 187 trace_sched_task_util(p, cpumask_bits(candidates)[0], best_energy_cpu, 188 sync, need_idle, fbt_env.fastpath, placement_boost, 189 start_t, boosted, is_rtg, get_rtg_status(p), start_cpu); 190 191 return best_energy_cpu; 192 193 fail: 194 rcu_read_unlock(); 195 eas_not_ready: 196 return -1; 197 }
(1)find_best_target()
1 static void find_best_target(struct sched_domain *sd, cpumask_t *cpus, 2 struct task_struct *p, 3 struct find_best_target_env *fbt_env) 4 { 5 unsigned long min_util = boosted_task_util(p); //獲取p的boosted_task_util 6 unsigned long target_capacity = ULONG_MAX; 7 unsigned long min_wake_util = ULONG_MAX; 8 unsigned long target_max_spare_cap = 0; 9 unsigned long best_active_util = ULONG_MAX; 10 unsigned long best_active_cuml_util = ULONG_MAX; 11 unsigned long best_idle_cuml_util = ULONG_MAX; 12 bool prefer_idle = schedtune_prefer_idle(p); //獲取task prefer_idle配置 13 bool boosted = fbt_env->boosted; 14 /* Initialise with deepest possible cstate (INT_MAX) */ 15 int shallowest_idle_cstate = INT_MAX; 16 struct sched_domain *start_sd; 17 struct sched_group *sg; 18 int best_active_cpu = -1; 19 int best_idle_cpu = -1; 20 int target_cpu = -1; 21 int backup_cpu = -1; 22 int i, start_cpu; 23 long spare_wake_cap, most_spare_wake_cap = 0; 24 int most_spare_cap_cpu = -1; 25 int prev_cpu = task_cpu(p); 26 bool next_group_higher_cap = false; 27 int isolated_candidate = -1; 28 29 /* 30 * In most cases, target_capacity tracks capacity_orig of the most 31 * energy efficient CPU candidate, thus requiring to minimise 32 * target_capacity. For these cases target_capacity is already 33 * initialized to ULONG_MAX. 34 * However, for prefer_idle and boosted tasks we look for a high 35 * performance CPU, thus requiring to maximise target_capacity. In this 36 * case we initialise target_capacity to 0. 37 */ 38 if (prefer_idle && boosted) 39 target_capacity = 0; 40 41 if (fbt_env->strict_max) 42 most_spare_wake_cap = LONG_MIN; 43 44 /* Find start CPU based on boost value */ 45 start_cpu = fbt_env->start_cpu; 46 /* Find SD for the start CPU */ 47 start_sd = rcu_dereference(per_cpu(sd_asym_cpucapacity, start_cpu)); //找到start cpu所在的sched domain,sd_asym_cpucapacity表示是非對稱cpu capacity級別,應該就是DIE level,所以domain是cpu0-7 48 if (!start_sd) 49 goto out; 50 51 /* fast path for prev_cpu */ 52 if (((capacity_orig_of(prev_cpu) == capacity_orig_of(start_cpu)) || //prev cpu和start cpu的當前max_policy_freq下的capacity相等 53 asym_cap_siblings(prev_cpu, start_cpu)) && 54 !cpu_isolated(prev_cpu) && cpu_online(prev_cpu) && 55 idle_cpu(prev_cpu)) { 56 57 if (idle_get_state_idx(cpu_rq(prev_cpu)) <= 1) { //prev cpu idle state的index <1,說明休眠不深 58 target_cpu = prev_cpu; 59 60 fbt_env->fastpath = PREV_CPU_FASTPATH; 61 goto target; 62 } 63 } 64 65 /* Scan CPUs in all SDs */ 66 sg = start_sd->groups; 67 do { //do-while循環,針對start cpu的調度域中的所有調度組進行遍歷,由於domain是cpu0-7,那麼調度組就是2個大小cluster:cpu0-3,cpu4-7 68 for_each_cpu_and(i, &p->cpus_allowed, sched_group_span(sg)) { //尋找task允許的cpuset和調度組可用cpu範圍內 69 unsigned long capacity_curr = capacity_curr_of(i); //當前freq的cpu_capacity 70 unsigned long capacity_orig = capacity_orig_of(i); //當前max_policy_freq的cpu_capacity, >=capacity_curr 71 unsigned long wake_util, new_util, new_util_cuml; 72 long spare_cap; 73 int idle_idx = INT_MAX; 74 75 trace_sched_cpu_util(i); 76 77 if (!cpu_online(i) || cpu_isolated(i)) //cpu處於非online,或者isolate狀態,則直接不考慮 78 continue; 79 80 if (isolated_candidate == -1) 81 isolated_candidate = i; 82 83 /* 84 * This CPU is the target of an active migration that's 85 * yet to complete. Avoid placing another task on it. 86 * See check_for_migration() 87 */ 88 if (is_reserved(i)) //已經有task要遷移到上面,但是還沒有遷移完成。所以這樣的cpu不考慮 89 continue; 90 91 if (sched_cpu_high_irqload(i)) //高irq load的cpu不考慮。irq load可以參考之前WALT文章:https://www.cnblogs.com/lingjiajun/p/12317090.html 92 continue; 93 94 if (fbt_env->skip_cpu == i) //當前活動的cpu是否有很多event一起wakeup,如果有,那麼也不考慮該cpu 95 continue; 96 97 /* 98 * p's blocked utilization is still accounted for on prev_cpu 99 * so prev_cpu will receive a negative bias due to the double 100 * accounting. However, the blocked utilization may be zero. 101 */ 102 wake_util = cpu_util_without(i, p); //計算沒有除了p以外的cpu_util(p不在該cpu rq的情況下,實際就是當前cpu_util) 103 new_util = wake_util + task_util_est(p); //計算cpu_util + p的task_util(p的task_util就是walt統計的demand_scaled) 104 spare_wake_cap = capacity_orig - wake_util; //剩餘的capacity = capacity_orig - p以外的cpu_util 105 106 if (spare_wake_cap > most_spare_wake_cap) { 107 most_spare_wake_cap = spare_wake_cap; //在循環中,找到有剩餘capacity最多(最空閑)的cpu = i,並保存剩餘的capacity 108 most_spare_cap_cpu = i; 109 } 110 111 if (per_task_boost(cpu_rq(i)->curr) == //cpu【i】當前running_task的task_boost == TASK_BOOST_STRICT_MAX,那麼不適合作為tager_cpu 112 TASK_BOOST_STRICT_MAX) 113 continue; 114 /* 115 * Cumulative demand may already be accounting for the 116 * task. If so, add just the boost-utilization to 117 * the cumulative demand of the cpu. 118 */ 119 if (task_in_cum_window_demand(cpu_rq(i), p)) //計算新的cpu【i】的cpu_util_cum = cpu_util_cum + p的boosted_task_util 120 new_util_cuml = cpu_util_cum(i, 0) + //特別地,如果p已經在cpu【i】的rq中,或者p的部分demand被統計在了walt中。那麼防止統計2次,所以要減去p的task_util(denamd_scaled) 121 min_util - task_util(p); 122 else 123 new_util_cuml = cpu_util_cum(i, 0) + min_util; 124 125 /* 126 * Ensure minimum capacity to grant the required boost. 127 * The target CPU can be already at a capacity level higher 128 * than the one required to boost the task. 129 */ 130 new_util = max(min_util, new_util); //取 p的booted_task_util、加入p之後的cpu_util,之間的較大值 131 if (new_util > capacity_orig) //與capacity_orig比較,大於capacity_orig的情況下,不適合作為target_cpu 132 continue; 133 134 /* 135 * Pre-compute the maximum possible capacity we expect 136 * to have available on this CPU once the task is 137 * enqueued here. 138 */ 139 spare_cap = capacity_orig - new_util; //預計算當p遷移到cpu【i】上后,剩餘的可能最大capacity 140 141 if (idle_cpu(i)) //判斷當前cpu【i】是否處於idle,並獲取idle index(idle的深度) 142 idle_idx = idle_get_state_idx(cpu_rq(i)); 143 144 145 /* 146 * Case A) Latency sensitive tasks 147 * 148 * Unconditionally favoring tasks that prefer idle CPU to 149 * improve latency. 150 * 151 * Looking for: 152 * - an idle CPU, whatever its idle_state is, since 153 * the first CPUs we explore are more likely to be 154 * reserved for latency sensitive tasks. 155 * - a non idle CPU where the task fits in its current 156 * capacity and has the maximum spare capacity. 157 * - a non idle CPU with lower contention from other 158 * tasks and running at the lowest possible OPP. 159 * 160 * The last two goals tries to favor a non idle CPU 161 * where the task can run as if it is "almost alone". 162 * A maximum spare capacity CPU is favoured since 163 * the task already fits into that CPU's capacity 164 * without waiting for an OPP chance. 165 * 166 * The following code path is the only one in the CPUs 167 * exploration loop which is always used by 168 * prefer_idle tasks. It exits the loop with wither a 169 * best_active_cpu or a target_cpu which should 170 * represent an optimal choice for latency sensitive 171 * tasks. 172 */ 173 if (prefer_idle) { //對lantency有要求的task 174 /* 175 * Case A.1: IDLE CPU 176 * Return the best IDLE CPU we find: 177 * - for boosted tasks: the CPU with the highest 178 * performance (i.e. biggest capacity_orig) 179 * - for !boosted tasks: the most energy 180 * efficient CPU (i.e. smallest capacity_orig) 181 */ 182 if (idle_cpu(i)) { //如果cpu【i】是idle的 183 if (boosted && 184 capacity_orig < target_capacity) //對於boosted task,cpu需要選擇最大capacity_orig,不滿足要continue 185 continue; 186 if (!boosted && 187 capacity_orig > target_capacity) //對於非boosted task,cpu選擇最小capacity_orig,不滿足要continue 188 continue; 189 /* 190 * Minimise value of idle state: skip 191 * deeper idle states and pick the 192 * shallowest. 193 */ 194 if (capacity_orig == target_capacity && 195 sysctl_sched_cstate_aware && 196 idle_idx >= shallowest_idle_cstate) //包括下面的continue,都是為了挑選出處於idle最淺的cpu 197 continue; 198 199 target_capacity = capacity_orig; 200 shallowest_idle_cstate = idle_idx; 201 best_idle_cpu = i; //選出【prefer_idle】best_idle_cpu 202 continue; 203 } 204 if (best_idle_cpu != -1) //過濾上面已經找到best_idle_cpu的情況,不需要走下面流程了 205 continue; 206 207 /* 208 * Case A.2: Target ACTIVE CPU 209 * Favor CPUs with max spare capacity. 210 */ 211 if (capacity_curr > new_util && 212 spare_cap > target_max_spare_cap) { //找到capacity_curr滿足包含進程p的cpu_util,並且找到空閑capacity最多的那個cpu 213 target_max_spare_cap = spare_cap; 214 target_cpu = i; //選出【prefer_idle】target_cpu 215 continue; 216 } 217 if (target_cpu != -1) //如果cpu條件不滿足,則continue,繼續找target_cpu 218 continue; 219 220 221 /* 222 * Case A.3: Backup ACTIVE CPU 223 * Favor CPUs with: 224 * - lower utilization due to other tasks 225 * - lower utilization with the task in 226 */ 227 if (wake_util > min_wake_util) //找出除了p以外的cpu_util最小的cpu 228 continue; 229 230 /* 231 * If utilization is the same between CPUs, 232 * break the ties with WALT's cumulative 233 * demand 234 */ 235 if (new_util == best_active_util && 236 new_util_cuml > best_active_cuml_util) //如果包含p的cpu_util相等,那麼就挑選cpu_util_cum + p的boosted_task_util最小的那個cpu 237 continue; 238 min_wake_util = wake_util; 239 best_active_util = new_util; 240 best_active_cuml_util = new_util_cuml; 241 best_active_cpu = i; //選出【prefer_idle】best_active_cpu 242 continue; 243 } 244 245 /* 246 * Skip processing placement further if we are visiting 247 * cpus with lower capacity than start cpu 248 */ 249 if (capacity_orig < capacity_orig_of(start_cpu)) //cpu【i】capacity_orig < 【start_cpu】capacity_orig的不考慮 250 continue; 251 252 /* 253 * Case B) Non latency sensitive tasks on IDLE CPUs. 254 * 255 * Find an optimal backup IDLE CPU for non latency 256 * sensitive tasks. 257 * 258 * Looking for: 259 * - minimizing the capacity_orig, 260 * i.e. preferring LITTLE CPUs 261 * - favoring shallowest idle states 262 * i.e. avoid to wakeup deep-idle CPUs 263 * 264 * The following code path is used by non latency 265 * sensitive tasks if IDLE CPUs are available. If at 266 * least one of such CPUs are available it sets the 267 * best_idle_cpu to the most suitable idle CPU to be 268 * selected. 269 * 270 * If idle CPUs are available, favour these CPUs to 271 * improve performances by spreading tasks. 272 * Indeed, the energy_diff() computed by the caller67jkkk 273 * will take care to ensure the minimization of energy 274 * consumptions without affecting performance. 275 */ //對latency要求不高的task,並要求idle cpu作為target的情況 276 if (idle_cpu(i)) { //判斷cpu【i】是否idle 277 /* 278 * Prefer shallowest over deeper idle state cpu, 279 * of same capacity cpus. 280 */ 281 if (capacity_orig == target_capacity && //選出capacity相同情況下,idle最淺的cpu 282 sysctl_sched_cstate_aware && 283 idle_idx > shallowest_idle_cstate) 284 continue; 285 286 if (shallowest_idle_cstate == idle_idx && 287 target_capacity == capacity_orig && 288 (best_idle_cpu == prev_cpu || 289 (i != prev_cpu && 290 new_util_cuml > best_idle_cuml_util))) //best_idle_cpu非prev_cpu,並且挑選cpu_util_cum + p的boosted_task_util最小的 291 continue; 292 293 target_capacity = capacity_orig; 294 shallowest_idle_cstate = idle_idx; 295 best_idle_cuml_util = new_util_cuml; 296 best_idle_cpu = i; //選出【normal-idle】best_idle_cpu 297 continue; 298 } 299 300 /* 301 * Consider only idle CPUs for active migration. 302 */ 303 if (p->state == TASK_RUNNING) //task p正在運行說明是misfit task,只考慮idle cpu作為target,不進行下面流程 304 continue; 305 306 /* 307 * Case C) Non latency sensitive tasks on ACTIVE CPUs. 308 * 309 * Pack tasks in the most energy efficient capacities. 310 * 311 * This task packing strategy prefers more energy 312 * efficient CPUs (i.e. pack on smaller maximum 313 * capacity CPUs) while also trying to spread tasks to 314 * run them all at the lower OPP. 315 * 316 * This assumes for example that it's more energy 317 * efficient to run two tasks on two CPUs at a lower 318 * OPP than packing both on a single CPU but running 319 * that CPU at an higher OPP. 320 * 321 * Thus, this case keep track of the CPU with the 322 * smallest maximum capacity and highest spare maximum 323 * capacity. 324 */ //對latency要求不高,並需要ACTIVE cpu作為target的情況 325 326 /* Favor CPUs with maximum spare capacity */ 327 if (spare_cap < target_max_spare_cap) //找到遷移p之後,剩餘capacity最多的cpu 328 continue; 329 330 target_max_spare_cap = spare_cap; 331 target_capacity = capacity_orig; 332 target_cpu = i; //找出【normal-ACTIVe】的target_cpu 333 } //到此就是一個調度組(cluster)內cpu的循環查找 334 335 next_group_higher_cap = (capacity_orig_of(group_first_cpu(sg)) < 336 capacity_orig_of(group_first_cpu(sg->next))); //嘗試查找下一個capacity更大的big cluster 337 338 /* 339 * If we've found a cpu, but the boost is ON_ALL we continue 340 * visiting other clusters. If the boost is ON_BIG we visit 341 * next cluster if they are higher in capacity. If we are 342 * not in any kind of boost, we break. 343 * 344 * And always visit higher capacity group, if solo cpu group 345 * is not in idle. 346 */ 347 if (!prefer_idle && !boosted && //上面找到cpu但是boost=ON_ALL,那麼還要查找其他cluster 348 ((target_cpu != -1 && (sg->group_weight > 1 || //上面找到cpu但是boost=ON_BIG,那麼還要在capacity更大的cluster中查找 349 !next_group_higher_cap)) || //上面找到了cpu,並且不在任何boost。那麼break 350 best_idle_cpu != -1) && //如果上面group中,沒有cpu是idle,那麼always在capacity更大的cluster中查找 351 (fbt_env->placement_boost == SCHED_BOOST_NONE || 352 !is_full_throttle_boost() || 353 (fbt_env->placement_boost == SCHED_BOOST_ON_BIG && 354 !next_group_higher_cap))) 355 break; 356 357 /* 358 * if we are in prefer_idle and have found an idle cpu, 359 * break from searching more groups based on the stune.boost and 360 * group cpu capacity. For !prefer_idle && boosted case, don't 361 * iterate lower capacity CPUs unless the task can't be 362 * accommodated in the higher capacity CPUs. 363 */ 364 if ((prefer_idle && best_idle_cpu != -1) || //如果設置了prefer_idle,並且找到了一個idle cpu;根據schedtune是否打開boost和是否有更大capacity的cluster進行判斷是否break 365 (boosted && (best_idle_cpu != -1 || target_cpu != -1 || //沒有prefer_idle,但是打開boost的情況,除非high capacity的cpu不能接受task,否則不用再遍歷low capacity的cpu 366 (fbt_env->strict_max && most_spare_cap_cpu != -1)))) { 367 if (boosted) { 368 if (!next_group_higher_cap) 369 break; 370 } else { 371 if (next_group_higher_cap) 372 break; 373 } 374 } 375 376 } while (sg = sg->next, sg != start_sd->groups); 377 378 adjust_cpus_for_packing(p, &target_cpu, &best_idle_cpu, //計算將task放在target_cpu時,在考慮20%的余量,和sched_load_boost之後,看capacity是否滿足target_cpu當前freq的capacity 379 shallowest_idle_cstate, //另外檢查rtg,看是否不考慮idle cpu 380 fbt_env, boosted); 381 382 /* 383 * For non latency sensitive tasks, cases B and C in the previous loop, 384 * we pick the best IDLE CPU only if we was not able to find a target 385 * ACTIVE CPU. //latency要求不高的task選擇cpu優先級:ACTIVE cpu > idle cpu;沒有ACITVE,則選idle cpu 386 * 387 * Policies priorities: 388 * 389 * - prefer_idle tasks: //prefer_idle的task選擇cpu優先級:idle cpu > ACTIVE cpu(包含task之後又更多spare capacity) > ACTIVE cpu(更小cpu_util+boosted_task_util) 390 * 391 * a) IDLE CPU available: best_idle_cpu 392 * b) ACTIVE CPU where task fits and has the bigger maximum spare 393 * capacity (i.e. target_cpu) 394 * c) ACTIVE CPU with less contention due to other tasks 395 * (i.e. best_active_cpu) 396 * 397 * - NON prefer_idle tasks: //非prefer_idle的task選擇cpu優先級:ACTIVE cpu > idle cpu 398 * 399 * a) ACTIVE CPU: target_cpu 400 * b) IDLE CPU: best_idle_cpu 401 */ 402 403 if (prefer_idle && (best_idle_cpu != -1)) { //prefer_idle的task,直接選擇best_idle_cpu作為target 404 target_cpu = best_idle_cpu; 405 goto target; 406 } 407 408 if (target_cpu == -1) //假如target沒有找到,那麼重新找target: 409 target_cpu = prefer_idle 410 ? best_active_cpu //1、prefer_idle的task選擇best_active_cpu; 411 : best_idle_cpu; //2、而非prefer_idle的task選擇best_idle_cpu 412 else 413 backup_cpu = prefer_idle //假如找到了target,那麼再選backup_cpu: 414 ? best_active_cpu //1、prefer_idle的task選擇 best_active_cpu 415 : best_idle_cpu; //2、非prefer_idle的task選擇 best_idle_cpu 416 417 if (target_cpu == -1 && most_spare_cap_cpu != -1 && 418 /* ensure we use active cpu for active migration */ //active migration(misfit task遷移)情況只選擇active cpu 419 !(p->state == TASK_RUNNING && !idle_cpu(most_spare_cap_cpu))) 420 target_cpu = most_spare_cap_cpu; 421 422 if (target_cpu == -1 && isolated_candidate != -1 && //假如沒有找到target_cpu,prev_cpu又處於isolated,而task允許的所有cpu中有online並且unisolated的 423 cpu_isolated(prev_cpu)) 424 target_cpu = isolated_candidate; //那麼就選擇最後一個online並unisolated的cpu作為target 425 426 if (backup_cpu >= 0) 427 cpumask_set_cpu(backup_cpu, cpus); //將backup_cpu存放進cpus中 428 if (target_cpu >= 0) { 429 target: 430 cpumask_set_cpu(target_cpu, cpus); //將找出的target cpu存放進cpus中 431 } 432 433 out: 434 trace_sched_find_best_target(p, prefer_idle, min_util, start_cpu, 435 best_idle_cpu, best_active_cpu, 436 most_spare_cap_cpu, 437 target_cpu, backup_cpu); 438 }
(2)計算energy
/* * compute_energy(): Estimates the energy that would be consumed if @p was * migrated to @dst_cpu. compute_energy() predicts what will be the utilization * landscape of the * CPUs after the task migration, and uses the Energy Model * to compute what would be the energy if we decided to actually migrate that * task. */ static long compute_energy(struct task_struct *p, int dst_cpu, struct perf_domain *pd) { long util, max_util, sum_util, energy = 0; int cpu; for (; pd; pd = pd->next) { max_util = sum_util = 0; /* * The capacity state of CPUs of the current rd can be driven by * CPUs of another rd if they belong to the same performance * domain. So, account for the utilization of these CPUs too * by masking pd with cpu_online_mask instead of the rd span. * * If an entire performance domain is outside of the current rd, * it will not appear in its pd list and will not be accounted * by compute_energy(). */ for_each_cpu_and(cpu, perf_domain_span(pd), cpu_online_mask) { //在perf domain的cpu中找出online的 #ifdef CONFIG_SCHED_WALT util = cpu_util_next_walt(cpu, p, dst_cpu); //計算遷移task p之後,每個cpu的util情況 #else util = cpu_util_next(cpu, p, dst_cpu); util += cpu_util_rt(cpu_rq(cpu)); util = schedutil_energy_util(cpu, util); #endif max_util = max(util, max_util); //找到perf domain中cpu util最大的值(同perf domain,即cluster,最大的util決定了freq的設定) sum_util += util; //統計遷移之後,perf domain內的總util } energy += em_pd_energy(pd->em_pd, max_util, sum_util); //計算perf domain的energy,並累計大小cluster的energy,就是整個系統energy } return energy; }
獲取perf domain內的energy,在其中有2個重要的結構體:
/** * em_cap_state - Capacity state of a performance domain * @frequency: The CPU frequency in KHz, for consistency with CPUFreq * @power: The power consumed by 1 CPU at this level, in milli-watts * @cost: The cost coefficient associated with this level, used during * energy calculation. Equal to: power * max_frequency / frequency */ struct em_cap_state { unsigned long frequency; unsigned long power; unsigned long cost; }; /** * em_perf_domain - Performance domain * @table: List of capacity states, in ascending order * @nr_cap_states: Number of capacity states * @cpus: Cpumask covering the CPUs of the domain * * A "performance domain" represents a group of CPUs whose performance is * scaled together. All CPUs of a performance domain must have the same * micro-architecture. Performance domains often have a 1-to-1 mapping with * CPUFreq policies. */ struct em_perf_domain { struct em_cap_state *table; int nr_cap_states; unsigned long cpus[0]; };
em_pd_energy函數可以得到perf domain的energy。
/** * em_pd_energy() - Estimates the energy consumed by the CPUs of a perf. domain * @pd : performance domain for which energy has to be estimated * @max_util : highest utilization among CPUs of the domain * @sum_util : sum of the utilization of all CPUs in the domain * * Return: the sum of the energy consumed by the CPUs of the domain assuming * a capacity state satisfying the max utilization of the domain. */ static inline unsigned long em_pd_energy(struct em_perf_domain *pd, unsigned long max_util, unsigned long sum_util) { unsigned long freq, scale_cpu; struct em_cap_state *cs; int i, cpu; if (!sum_util) return 0; /* * In order to predict the capacity state, map the utilization of the * most utilized CPU of the performance domain to a requested frequency, * like schedutil. */ cpu = cpumask_first(to_cpumask(pd->cpus)); scale_cpu = arch_scale_cpu_capacity(NULL, cpu); //獲取cpu的max_capacity cs = &pd->table[pd->nr_cap_states - 1]; //獲取capacity state,是為了獲取最大頻點(因為cs的table是升序排列的,所以最後一個配置就是最大的頻點) freq = map_util_freq(max_util, cs->frequency, scale_cpu); //利用上面獲取的最大頻點、max_capacity,根據當前的cpu util映射到當前的cpu freq /* * Find the lowest capacity state of the Energy Model above the * requested frequency. */ for (i = 0; i < pd->nr_cap_states; i++) { //通過循環找到能滿足當前cpu freq的最小的頻點,及其對應的capacity state cs = &pd->table[i]; //同樣因為cs的table是升序排列的,所以遞增找到第一個滿足的,就是滿足條件的最小頻點 if (cs->frequency >= freq) break; } /* * The capacity of a CPU in the domain at that capacity state (cs) * can be computed as: * * cs->freq * scale_cpu * cs->cap = -------------------- (1) * cpu_max_freq * * So, ignoring the costs of idle states (which are not available in * the EM), the energy consumed by this CPU at that capacity state is * estimated as: * * cs->power * cpu_util * cpu_nrg = -------------------- (2) * cs->cap * * since 'cpu_util / cs->cap' represents its percentage of busy time. * * NOTE: Although the result of this computation actually is in * units of power, it can be manipulated as an energy value * over a scheduling period, since it is assumed to be * constant during that interval. * * By injecting (1) in (2), 'cpu_nrg' can be re-expressed as a product * of two terms: * * cs->power * cpu_max_freq cpu_util * cpu_nrg = ------------------------ * --------- (3) * cs->freq scale_cpu * * The first term is static, and is stored in the em_cap_state struct * as 'cs->cost'. * * Since all CPUs of the domain have the same micro-architecture, they * share the same 'cs->cost', and the same CPU capacity. Hence, the * total energy of the domain (which is the simple sum of the energy of * all of its CPUs) can be factorized as: * * cs->cost * \Sum cpu_util * pd_nrg = ------------------------ (4) * scale_cpu */ return cs->cost * sum_util / scale_cpu; //通過上面的註釋以及公式,推導出energy計算公式,並計算出perf doamin的總energy }
總結
1、find_best_target()函數主要是根據當前情況,找到task遷移的candidate cpu(target_cpu、backup cpu、prev_cpu)
具體邏輯:
prefer_idle:
best_idle_cpu:必須選擇idle狀態的cpu
—【task打開boost,選大核cpu】 && 【idle state最淺】
—【task沒有打開boost,選小核cpu】 && 【idle state最淺】
target_cpu:必選選擇ACTIVE狀態的cpu
—【當前freq的cpu_capacity > 遷移task后的cpu_util】 && 【遷移task之後,剩餘capacity最多的cpu】
best_active_cpu:必選選擇ACTIVE狀態的cpu
—【當前cpu_util更小的cpu】 && 【遷移task之後的cpu_util相等的話,選擇cpu_util_cum + boosted_task_util】
normal:
該cpu的capacity_orig > start_cpu的capacity_orig(只會往更大的cluster中尋找)
best_idle_cpu:必須選擇idle狀態的cpu
—【capacity相同情況下,idle最淺的cpu】 && 【選擇cpu_util_cum + boosted_task_util中最小的】 && 【不能是prev_cpu】
非misfit task遷移的情況下,還要選出target_cpu
target_cpu:必選選擇ACTIVE狀態的cpu
—【遷移task之後,剩餘capacity最多的cpu】】
2、在find_energy_efficient_cpu()後半段,計算task遷移到每個candidate cpu后的系統總energy。計算出的最小energy假如比prev_cpu的總energy少6%以上,那麼這個cpu就是best_energy_cpu。
後續:在energy model與energy計算,目前還未弄清楚如何聯繫起來,後續需要找到如何聯繫。
本站聲明:網站內容來源於博客園,如有侵權,請聯繫我們,我們將及時處理
※廣告預算用在刀口上,台北網頁設計公司幫您達到更多曝光效益
擁有後台管理系統的網站,將擁有強大的資料管理與更新功能,幫助您隨時新增網站的內容並節省網站開發的成本。