Can Data-Oriented-Design be Improved?
… programming • Modules • 2000s • Template metaprogramming • Concurrency • 2020s • ??? … What is DoD about? • DoD ("Data-Oriented Design") • Not about cache lines, nor struct layout (at its core) • From Wikipedia: … Minimalist definition of DoD: Data_output = F(Data_input), a specific transformation from input data to output data, sitting between the previous transformation and the next one. … How DoD is used in actual … ChatGPT … That's cool, but it won't get us very far. … How can we improve it? (second try) • At its core DoD is just: Data_output = F(Data_input) • … with a heavy focus on the data. • What if we looked …
0 points | 39 pages | 1.18 MB | 6 months ago
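Not from the slides, but a minimal C++ sketch of the Data_output = F(Data_input) view described in that excerpt, assuming a plain transformation over a contiguous array that a previous step produced and a next step will consume (the Particle type and integrate function are illustrative names only):

    #include <vector>

    struct Particle { float x, y, vx, vy; };

    // One DoD-style transformation: the output data is a pure function of the
    // input data. A previous transformation produced `in`; the next one
    // consumes the returned vector.
    std::vector<Particle> integrate(const std::vector<Particle>& in, float dt) {
        std::vector<Particle> out;
        out.reserve(in.size());
        for (const Particle& p : in)
            out.push_back({p.x + p.vx * dt, p.y + p.vy * dt, p.vx, p.vy});
        return out;
    }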
Embracing an Adversarial Mindset for Cpp Security
… Malware Research @ Endgame • Malware Research @ FireEye • Computer Forensics @ DoD • Previous: Blackhat, RSA, DEFCON, 44Con, CanSecWest, BsidesSF, WiCyS …
0 points | 92 pages | 3.67 MB | 6 months ago
Just-in-Time Compilation - J F Bastien - CppCon 2020
… carry much more semantic information implicitly. Here, think of examples where a single bytecode instruction might do a full matrix multiplication, or change the prototype of a class. There are much higher … architecture on another architecture. Dynamic binary translation: execute the program from one Instruction Set Architecture in another (or the same) ISA, performing the translation dynamically. In other … dynamic optimization system that is capable of transparently improving the performance of a native instruction stream as it executes on the processor. Focus its efforts on optimization opportunities that …
0 points | 111 pages | 3.98 MB | 6 months ago
C++ Memory Model: from C++11 to C++23
www.linkedin.com/in/alexdathskovsky … INO Execution • instruction fetch • if operands are available, execute it; if not, fetch them • the instruction is executed by the functional unit • the functional unit … Scheduling • instruction fetched • instruction dispatched to an instruction queue • the instruction waits in the queue until its input operands are available • if operands are available, the instruction is allowed to leave the queue before other instructions • the instruction is issued to a functional unit • only if all older instructions have completed the operation is the result written to the register file …
0 points | 112 pages | 5.17 MB | 6 months ago
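Not taken from those slides, but a minimal sketch of the acquire/release ordering that a C++ memory-model discussion like this covers, assuming a simple one-producer/one-consumer flag handoff:

    #include <atomic>
    #include <cassert>
    #include <thread>

    int data = 0;
    std::atomic<bool> ready{false};

    void producer() {
        data = 42;                                     // plain write
        ready.store(true, std::memory_order_release);  // publish: earlier writes may not sink below
    }

    void consumer() {
        while (!ready.load(std::memory_order_acquire)) {}  // spin until the flag is published
        assert(data == 42);  // acquire synchronizes with release, so the write to data is visible
    }

    int main() {
        std::thread t1(producer), t2(consumer);
        t1.join();
        t2.join();
    }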
Branchless Programming in C++
… • Data dependency: a = (v1 + v2)*(v1 - v2) • Pipeline increases CPU utilization • Multiple instruction streams run in parallel (dependencies within each stream, no data dependencies between streams) … executed • Conditional jumps (branches) disrupt that order • CPU must wait until it knows which instruction to fetch next: load v1[i]; load v2[i]; cmp v1[i] > v2[i]; jump if true; a += v2[i]; jump; a += v1[i] … load v1[i]…v3[i]; cmp v3[i] == 0; jump if true; a += v1[i] + v2[i]; jump; a += v1[i] * v2[i]
0 points | 61 pages | 9.08 MB | 6 months ago
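Not from the talk itself, but a small sketch of the branchy-versus-branchless pattern that the pseudo-instructions above describe (conditionally accumulating one of two values), assuming the comparison result is hard to predict:

    #include <cstddef>
    #include <vector>

    // Branchy: the conditional jump depends on the data, so an unpredictable
    // v1[i] > v2[i] causes pipeline flushes on mispredictions.
    long sum_branchy(const std::vector<int>& v1, const std::vector<int>& v2) {
        long a = 0;
        for (std::size_t i = 0; i < v1.size(); ++i) {
            if (v1[i] > v2[i]) a += v1[i]; else a += v2[i];
        }
        return a;
    }

    // Branchless: compute the condition as 0/1 and select arithmetically,
    // trading the jump for a data dependency the pipeline can absorb.
    long sum_branchless(const std::vector<int>& v1, const std::vector<int>& v2) {
        long a = 0;
        for (std::size_t i = 0; i < v1.size(); ++i) {
            const long c = v1[i] > v2[i];  // 0 or 1
            a += c * v1[i] + (1 - c) * v2[i];
        }
        return a;
    }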
How Meta Made Debugging Async Code Easier with Coroutines and Senders
… So what was that magic*? struct AsyncStackRoot { atomic<…> topFrame; AsyncStackRoot* nextRoot; frame_ptr stackFramePtr; instruction_ptr returnAddress; }; struct AsyncStackFrame { AsyncStackFrame* parentFrame; instruction_ptr instructionPointer; AsyncStackRoot* stackRoot; …
0 points | 131 pages | 907.41 KB | 6 months ago
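These are not the talk's exact definitions (the excerpt elides the atomic's template argument and the frame_ptr/instruction_ptr types), but a compilable sketch of the two structures and of how a debugger could walk parentFrame links to print an async backtrace, assuming void* stand-ins and AsyncStackFrame* as the atomic's element type:

    #include <atomic>
    #include <cstdio>

    struct AsyncStackRoot;

    struct AsyncStackFrame {
        AsyncStackFrame* parentFrame;   // the caller's async frame
        void* instructionPointer;       // resume/return address for this frame (assumed void*)
        AsyncStackRoot* stackRoot;      // root of the normal stack this frame runs on
    };

    struct AsyncStackRoot {
        std::atomic<AsyncStackFrame*> topFrame;  // most recent async frame (assumed element type)
        AsyncStackRoot* nextRoot;                // previous root further down the normal stack
        void* stackFramePtr;                     // frame pointer of the activating stack frame
        void* returnAddress;                     // return address of the activating stack frame
    };

    // Walk the async chain instead of the normal call stack.
    void printAsyncTrace(const AsyncStackRoot& root) {
        for (auto* f = root.topFrame.load(std::memory_order_acquire); f; f = f->parentFrame)
            std::printf("  at %p\n", f->instructionPointer);
    }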
Blazing Trails: Building the World's Fastest CameBoy Emulator in Modern C++
… completely memory mapped • Only the 8080's set of registers • Z80's extended bit-manipulation instruction set • Some additional new instructions … CPU Block Diagram … Memory & memory-mapped access • 8 KiB of … the CPU. Emulators with T-cycle accuracy simulate the exact number of clock ticks for every instruction, providing the highest level of timing precision. • M-Cycle: Memory Cycle. An M-cycle represents … a higher-level unit of time used by the Game Boy's CPU for executing instructions. Each instruction takes a specific number of M-cycles, with each M-cycle typically equating to 4 T-cycles. … T-Cycle …
0 points | 91 pages | 8.37 MB | 6 months ago
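A trivial sketch (not from the talk) of the M-cycle/T-cycle relationship stated in that excerpt; the constant and helper names are illustrative only:

    // The excerpt states that one M-cycle (memory cycle) equates to 4 T-cycles (clock ticks).
    constexpr int kTCyclesPerMCycle = 4;

    constexpr int toTCycles(int mCycles) { return mCycles * kTCyclesPerMCycle; }

    static_assert(toTCycles(1) == 4);   // a 1 M-cycle instruction lasts 4 clock ticks
    static_assert(toTCycles(3) == 12);  // a 3 M-cycle instruction lasts 12 clock ticks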
To Int or to Uint, This is the Question
… as all numbers are represented in binary form. Therefore, using n/2 is equal to n>>1. • Each instruction that is fetched from memory is pushed into a pipeline; one of the steps in the pipeline is …
0 points | 102 pages | 3.64 MB | 6 months ago
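Not from the slides, but a small sketch of why the n/2 == n>>1 equivalence really holds only for unsigned values, which is the int-versus-uint trade-off the title alludes to:

    #include <cstdint>
    #include <cstdio>

    // Unsigned: values are non-negative, so the compiler can lower n / 2 to n >> 1.
    std::uint32_t half_unsigned(std::uint32_t n) { return n / 2; }

    // Signed: n / 2 rounds toward zero while n >> 1 rounds toward negative
    // infinity, so the compiler must emit an extra adjustment for negative n.
    std::int32_t half_signed(std::int32_t n) { return n / 2; }

    int main() {
        // Prints "3 -3 -4": -7 >> 1 is the arithmetic shift (well defined since C++20),
        // which differs from -7 / 2.
        std::printf("%u %d %d\n", half_unsigned(7u), half_signed(-7), -7 >> 1);
    }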
Back to Basics: Concurrency
… of execution • One CPU core executes code sequentially, i.e. one instruction after the other: Instruction, Execute, … Instruction, Execute. This somewhat reflects how we write software … we have one main, sequential thread of execution • One CPU core executes code serially, i.e. one instruction after the other • We can abstract our visualization and just show the call stack (one function …
0 points | 141 pages | 6.02 MB | 6 months ago
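Not part of the excerpt, but a minimal sketch of moving from that single sequential thread of execution to two concurrent ones with std::thread:

    #include <cstdio>
    #include <thread>

    void work(const char* who) {
        // Each thread is its own sequential stream: one instruction after the other.
        for (int i = 0; i < 3; ++i) std::printf("%s: step %d\n", who, i);
    }

    int main() {
        std::thread worker(work, "worker");  // a second thread of execution starts here
        work("main");                        // the main thread keeps executing concurrently
        worker.join();                       // wait for the worker before exiting
    }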
Performance Engineering: Being Friendly to Your Hardware
… • Linear fetch • Incoming branch • Instruction alignment • Instruction fusing … Branch prediction • Governs fetching of next instruction blocks • A set of tables • Branch instructions … is: complex, serial, slow • Fetch block size • Linear fetch vs incoming branch … Instruction decoding • Decoded operations may get cached … Block-based cryptographic hash function • Simple bitwise operations, many of them • Instruction set equivalence may not be assumed • Vertical vs horizontal data layout … Latency, Throughput
0 points | 111 pages | 2.23 MB | 6 months ago
105 results in total