Adventures in SIMD Thinking (Part 1 of 2)
KEWB Computing, CppCon 2020 - Adventures in SIMD Thinking ... Function load_value(): KEWB_FORCE_INLINE rf_512 load_value(float fill) { return _mm512_set1_ps(fill); } [register diagram: 2.3 in every lane] ... Function load_value(): KEWB_FORCE_INLINE ri_512 load_value(int32_t fill) { return _mm512_set1_epi32(fill); } [register diagram: 7 in every lane] ... Function load_from(): KEWB_FORCE_INLINE rf_512 load_from(float const* psrc) { return _mm512_loadu_ps(psrc); } [register diagram: m0 m1 m2 m3 m4 m5 m6 ...]
0 credits | 88 pages | 824.07 KB | 6 months ago
Adventures in SIMD Thinking (Part 2 of 2)
... kcoeff[i] = load_value(pkrnl[j]); } //- Preload the initial input data window; note the zeroes in the register representing data preceding the input array. prev = load_value(0.0f); curr = load_from(psrc); next = load_from(psrc + 16); ... } Copyright © 2020 Bob Steagall, KEWB Computing, CppCon 2020 - Adventures in SIMD Thinking ... Function Template avx_convolve(): template ... load_value(pkrnl[j]); } ...
0 credits | 135 pages | 551.08 KB | 6 months ago
Making Games Start Fast: A Story About Concurrency
... ◉ Read localization ◉ Load textures, models and audio ◉ Load game rules & databases ... Start, Enumeration, Read Localization, Load Audio, Load 2D Assets, Load 3D Assets, Load Game Databases; CPU: 6.8s ... 2.7 (Old) Startup Profile: Start, Enumeration, Read Localization, Load Audio, Load 2D Assets, Load 3D Assets, Load Game Databases; CPU: 0.5s, Wait: 0.6s; CPU: 0.9s, Wait: 1.4s; CPU: 4.5s, Wait: ... Improvement Results, Round 1: ◉ Wait time on PhysFS mutex entirely disappears ◉ Wait time goes down, game starts a bit faster ◉ CPU load increases ... A lock ...
0 credits | 76 pages | 2.22 MB | 6 months ago
When Lock-Free Still Isn't Enough: An Introduction to Wait-Free Programming and Concurrency Techniques
A lock-free solution (Daniel Anderson -- danielanderson.net): struct Counter { bool increment_if_not_zero() { auto current = counter.load(); while (current > 0 && !counter.compare_exchange_weak(current, current + 1)) { } ... decrement() { return counter.fetch_sub(1) == 1; } uint64_t read() { return counter.load(); } std::atomic<uint64_t> counter{1}; }; ...
0 credits | 33 pages | 817.96 KB | 6 months ago
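The Counter snippet in the result above is cut off mid-function. As a rough illustration only, here is one way the lock-free version might be completed. The `return current > 0;` line, the `bool` return type of `decrement()`, and the debug assertion are assumptions filled in for this sketch; they are not taken from the slides.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Sketch of a lock-free "sticky" counter: once it reaches zero it can no
// longer be incremented. Parts marked ASSUMPTION are not in the snippet.
struct Counter {
    // Increments the counter unless it is already zero.
    bool increment_if_not_zero() {
        auto current = counter.load();
        // On failure, compare_exchange_weak reloads `current`, so the loop
        // retries with the latest value and stops once it observes zero.
        while (current > 0 &&
               !counter.compare_exchange_weak(current, current + 1)) {
        }
        return current > 0;  // ASSUMPTION: reports whether the increment happened
    }

    // Decrements the counter; returns true if this call brought it to zero.
    bool decrement() {  // ASSUMPTION: return type elided in the snippet
        const auto previous = counter.fetch_sub(1);
        assert(previous > 0 && "decrement() called on a zero counter");  // ASSUMPTION
        return previous == 1;
    }

    std::uint64_t read() { return counter.load(); }

    std::atomic<std::uint64_t> counter{1};
};
```

The CAS loop is what makes this lock-free rather than wait-free: a thread can in principle keep retrying while other threads succeed, which is presumably the limitation the talk goes on to address.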
Branchless Programming in C++
... (branches) disrupt that order ● CPU must wait until it knows which instruction to fetch next [pipeline diagram: load v1[i]; load v2[i]; cmp v1[i] > v2[i]; jump if true; a += v2[i]; jump; a += v1[i]] ... Branchless: a += (v1[i] > v2[i]) ? v1[i] : v2[i] ... Performance and Efficiency [pipeline diagram: load v1[i]; load v2[i]; cmp v1[i] > v2[i]; v2[i] = v1[i] if true; a += v2[i], overlapped with the loads for i+1] ... conditional for i+2 before checking that i+2 ... [pipeline diagram: load v1[i]; load v2[i]; s[i] = v1[i] + v2[i]; a += s[i], overlapped with the loads and sums for i+1 and i+2] ...
0 credits | 61 pages | 9.08 MB | 6 months ago
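The snippet above contrasts a loop that branches on `v1[i] > v2[i]` with a branchless form. A minimal sketch of both follows; the function names and the arithmetic trick in the second version are illustrative choices for this example, not code from the slides, and whether a compiler actually emits a branch for either version depends on the target and optimization level.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Branching version: the CPU has to predict which way the comparison goes.
std::int64_t sum_max_branchy(const std::vector<std::int32_t>& v1,
                             const std::vector<std::int32_t>& v2) {
    assert(v1.size() == v2.size());
    std::int64_t a = 0;
    for (std::size_t i = 0; i < v1.size(); ++i) {
        if (v1[i] > v2[i]) {
            a += v1[i];
        } else {
            a += v2[i];
        }
    }
    return a;
}

// Branchless version (hypothetical formulation): the comparison result is
// used as a 0/1 multiplier, so no jump is needed to pick the larger value.
std::int64_t sum_max_branchless(const std::vector<std::int32_t>& v1,
                                const std::vector<std::int32_t>& v2) {
    assert(v1.size() == v2.size());
    std::int64_t a = 0;
    for (std::size_t i = 0; i < v1.size(); ++i) {
        const std::int64_t x = v1[i];
        const std::int64_t y = v2[i];
        const std::int64_t take_x = (x > y);  // 1 if x is larger, else 0
        a += y + take_x * (x - y);            // y when take_x == 0, x when 1
    }
    return a;
}
```

Widening to 64 bits before subtracting keeps `x - y` from overflowing for any pair of 32-bit inputs.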
Lock-Free Atomic Shared Pointers Without a Split Reference Count? It Can Be Done!
... shared_ptr that can be manipulated atomically! • void store(shared_ptr<T> desired) • shared_ptr<T> load() • bool compare_exchange_weak(shared_ptr<T>& expected, shared_ptr<T> desired) • … compare_exchange(expected ... make_shared<...>(std::move(t), head.load()); while (!head.compare_exchange_weak(p->next, p)) {} } optional<...> pop_front() { auto p = head.load(); while (p != nullptr && ... atomic<...> a; Thread 1: auto s = a.load(); Thread 2: a.store(make_shared<...>(…)); ++ -- If this happens first… The fundamental problem: the race to zero. Thread 1: load(); Thread 2: store(shared_ptr ...
0 credits | 45 pages | 5.12 MB | 6 months ago
Conan 2.0 Documentation
... provides a mechanism to define those variables and make it possible, for executables, to find and load these shared libraries. This mechanism is the VirtualRunEnv generator. If you check the output folder ... from conan import ConanFile from conan.tools.files import load class pkgRecipe(ConanFile): name = "pkg" def set_version(self): self.version = load(self, "version.txt") # No need to specify the version ... # if self.version is already defined from CLI --version arg, it will # not load version.txt self.version = self.version or load(self, "version.txt") # This will create the "1.4" version even if the version ...
0 credits | 652 pages | 4.00 MB | 1 year ago
A Relaxed Guide to memory_order_relaxed
... Just a load, just a store: Full control, excellent efficiency and scalability! ○ Assuming aligned machine-sized atomic objects, that is… What is Not to Like About memory_order_relaxed? ● Just a load, just ... unnecessary overhead on architectures such as ARM ● This refinement proposes the addition of memory_order_load_store ... We contrast three examples: ● Simple reordering must be allowed. ... presumed to be initially zero, null, or false, unless stated otherwise. ● r1 =rlx x abbreviates r1 = x.load(std::memory_order_relaxed) ● x =rlx r1 abbreviates x.store(r1, std::memory_order_relaxed) Notation ...
0 credits | 32 pages | 278.53 KB | 6 months ago
Conan 2.5 Documentation
... provides a mechanism to define those variables and make it possible, for executables, to find and load these shared libraries. This mechanism is the VirtualRunEnv generator. If you check the output folder ... can be done: Listing 73: conanfile.py from conan import ConanFile from conan.tools.files import load class pkgRecipe(ConanFile): name = "pkg" def set_version(self): self.version = load(self, "version.txt") # No need to specify the version in CLI arg or in recipe attribute $ conan create ...
0 credits | 769 pages | 4.70 MB | 1 year ago
Conan 2.4 Documentation
... provides a mechanism to define those variables and make it possible, for executables, to find and load these shared libraries. This mechanism is the VirtualRunEnv generator. If you check the output folder ... can be done: Listing 73: conanfile.py from conan import ConanFile from conan.tools.files import load class pkgRecipe(ConanFile): name = "pkg" def set_version(self): self.version = load(self, "version.txt") # No need to specify the version in CLI arg or in recipe attribute $ conan create ...
0 credits | 769 pages | 4.69 MB | 1 year ago
160 results in total