Balancing Efficiency and Flexibility: Cost of Abstractions in Embedded Systems
<_ZN4CPin5resetEv>: ... 080001b0 <_ZN4CPin3setEv>: ... Dynamic Polymorphism without de-virtualization: Load the vtable address in r3. ... : push {r0, r1, r2, lr} ldr r3, [pc, #28] ; (80001e4 : 0x080001b1 0x0800025c : 0x08000199 CPin vtable ... Dynamic Polymorphism without de-virtualization: Load the address of the set method from the vtable into r3. ... : push {r0, r1, r2, lr} ldr ...
0 码力 | 75 pages | 2.12 MB | 6 months ago | 3
Rethinking Task Based Concurrency and Parallelism for Low Latency C++
... used to mitigate contention but this introduces a myriad of new problems: ○ Task starvation ○ Load balancing ○ Forfeits strict FIFO behaviour ○ Increases memory footprint (or requires allocations) ○ Terrible aging ● Multiple queues for different priority also works but: ○ Scheduling ○ Task starvation, load balancing, work stealing [diagram: threads taking tasks from a queue, labelled Back and Front]
0 码力 | 142 pages | 2.80 MB | 6 months ago | 3
Combining Co-Routines and Functions into a Job System
Helmut Hlavacs – ... take jobs out of queue [diagram: Main Thread running parallel_for(); I/O or Render Thread (load / store data from disk); blocking wait] ... [diagram: Main thread with stages P, CD, CR, I, UI, N; I/O or Render Thread (load / store data from disk); Logic, AI, parallel_for()] Do we really need these extra threads? [diagram: Main thread stages Logic, AI, Collision Detection, Collision Response, Integrate via parallel_for(); I/O or Render Thread (load / store data from disk); blocking waits]
0 码力 | 39 pages | 1.23 MB | 6 months ago | 3
The Beauty and Power of Primitive C++
... generation • Generated code is more likely to be correct than hand-written code • Good design rests on balancing concerns and making tradeoffs (Stroustrup - "Primitive" - CppCon 2020) ... Why isn’t “Flats” a standards ...
0 码力 | 53 pages | 1.03 MB | 6 months ago | 3
The Shapes of Multidimensional Arrays
[outline: Design, EDSL, Extents, Going beyond, Conclusion] Software design / Software architecture: The art of balancing and compromising between genericity, performance, and expressivity. Design goals: Genericity: cover ...
0 码力 | 62 pages | 1.38 MB | 6 months ago | 3
Adventures in SIMD Thinking (Part 1 of 2)
KEWB COMPUTING, CppCon 2020 - Adventures in SIMD Thinking. Function load_value(): KEWB_FORCE_INLINE rf_512 load_value(float fill) { return _mm512_set1_ps(fill); } [figure: every lane holds 2.3] ... Function load_value(): KEWB_FORCE_INLINE ri_512 load_value(int32_t fill) { return _mm512_set1_epi32(fill); } [figure: every lane holds 7] ... Function load_from(): KEWB_FORCE_INLINE rf_512 load_from(float const* psrc) { return _mm512_loadu_ps(psrc); } [figure: lanes m0 m1 m2 m3 m4 m5 m6 ...]
0 码力 | 88 pages | 824.07 KB | 6 months ago | 3
Adventures in SIMD Thinking (Part 2 of 2)
... kcoeff[i] = load_value(pkrnl[j]); } //- Preload the initial input data window; note the zeroes in the register representing data preceding the input array. prev = load_value(0.0f); curr = load_from(psrc); next = load_from(psrc + 16); ... } (Copyright © 2020 Bob Steagall, KEWB COMPUTING, CppCon 2020 - Adventures in SIMD Thinking) Function Template avx_convolve(): template ...
0 码力 | 135 pages | 551.08 KB | 6 months ago | 3
Making Games Start Fast: A Story About Concurrency
... Read localization ◉ Load textures, models and audio ◉ Load game rules & databases ... [figure, (Old) Startup Profile: Start, Enumeration, Read Localization, Load Audio, Load 2D Assets, Load 3D Assets, Load Game Databases; CPU: 6.8s 2.7] ... [figure: Start, Enumeration, Read Localization, Load Audio, Load 2D Assets, Load 3D Assets, Load Game Databases; CPU: 0.5s Wait: 0.6s, CPU: 0.9s Wait: 1.4s, CPU: 4.5s Wait: ...] ... Improvement Results, Round 1 ◉ Wait time on PhysFS mutex entirely disappears ◉ Wait time goes down, game starts a bit faster ◉ CPU load increases ... A lock ...
0 码力 | 76 pages | 2.22 MB | 6 months ago | 3
When Lock-Free Still Isn't Enough: An Introduction to Wait-Free Programming and Concurrency Techniques
A lock-free solution (Daniel Anderson, danielanderson.net): struct Counter { bool increment_if_not_zero() { auto current = counter.load(); while (current > 0 && !counter.compare_exchange_weak(current, current + 1)) { } ... decrement() { return counter.fetch_sub(1) == 1; } uint64_t read() { return counter.load(); } std::atomic<...> counter{1}; };
0 码力 | 33 pages | 817.96 KB | 6 months ago | 3
Branchless Programming in C++
... (branches) disrupt that order ● CPU must wait until it knows which instruction to fetch next [pipeline diagram: load:v1[i], load:v2[i], cmp[i]:v1[i]>v2[i], jump if true, a[i]:a+=v2[i], jump, a[i]:a+=v1[i]] ... Branchless: a+=(v1[i]>v2[i])?v1[i]:v2[i] ... Performance and Efficiency [pipeline diagram: load:v1[i], load:v2[i], cmp[i]:v1[i]>v2[i], v2[i]=v1[i] if true, load:v2[i+1], load:v1[i+1], a[i]:a+=v2[i]] ... conditional for i+2 before checking that i+2 ... [pipeline diagram: load:v1[i], load:v2[i], s[i]:v1[i]+v2[i], a[i]:a+=s[i], s[i+1]:v1[i+1]+v2[i+1], load:v2[i+1], load:v1[i+1], a[i+1]:a+=s[i+1], load:v1[i+w], s[i+2], v2[i+2], v1[i+2]]
0 码力 | 61 pages | 9.08 MB | 6 months ago | 3
162 results in total














 
 