Performance Engineering: Being Friendly to Your Hardware

Example – memcpy: handcrafted (slides 88–90). The actual ClickHouse implementation, used as an example. Excerpts:

    … *>(dst + size - 16), _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + size - 16)));

    void * memcpy_ch(void * __restrict …
    …
        … __m128i *>(src) + 6);
        c7 = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src) + 7);
        src += 128;
    …
        return ret;
    }

Re-autovectorized, loops unrolled. Still predominantly unaligned.

111 pages | 2.23 MB | 6 months ago
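The excerpts show two tricks: finishing a copy with one unaligned 16-byte load/store that ends exactly at `dst + size - 16` (overlapping bytes already written), and an unrolled main loop that moves 128 bytes per iteration through eight registers `c0`..`c7`. The following is a minimal sketch of those two ideas, assuming an x86-64 target with SSE2 and non-overlapping buffers. `memcpy_sketch` is a hypothetical name; this is not the ClickHouse source, and its path for copies under 16 bytes is a plain byte loop chosen only to keep the sketch short.

```cpp
#include <emmintrin.h>  // SSE2: __m128i, _mm_loadu_si128, _mm_storeu_si128
#include <cstddef>

// Hypothetical illustration of the slide's technique; NOT ClickHouse's memcpy.
// Assumes x86-64 with SSE2 and non-overlapping buffers (memcpy semantics).
inline void * memcpy_sketch(void * __restrict dst_, const void * __restrict src_, std::size_t size)
{
    void * ret = dst_;
    char * __restrict dst = static_cast<char *>(dst_);
    const char * __restrict src = static_cast<const char *>(src_);

    if (size < 16)
    {
        // Tiny copies: a plain byte loop (simplification for this sketch).
        while (size--)
            *dst++ = *src++;
        return ret;
    }

    if (size <= 32)
    {
        // 16..32 bytes: copy the first 16 and the last 16 bytes with unaligned
        // loads/stores. The chunks overlap when size < 32, which is harmless
        // because both write the same source bytes.
        __m128i head = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src));
        __m128i tail = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + size - 16));
        _mm_storeu_si128(reinterpret_cast<__m128i *>(dst), head);
        _mm_storeu_si128(reinterpret_cast<__m128i *>(dst + size - 16), tail);
        return ret;
    }

    // Large copies: 128 bytes per iteration via eight 16-byte registers (manual unrolling).
    while (size >= 128)
    {
        const __m128i * s = reinterpret_cast<const __m128i *>(src);
        __m128i * d = reinterpret_cast<__m128i *>(dst);
        __m128i c0 = _mm_loadu_si128(s + 0);
        __m128i c1 = _mm_loadu_si128(s + 1);
        __m128i c2 = _mm_loadu_si128(s + 2);
        __m128i c3 = _mm_loadu_si128(s + 3);
        __m128i c4 = _mm_loadu_si128(s + 4);
        __m128i c5 = _mm_loadu_si128(s + 5);
        __m128i c6 = _mm_loadu_si128(s + 6);
        __m128i c7 = _mm_loadu_si128(s + 7);
        _mm_storeu_si128(d + 0, c0);
        _mm_storeu_si128(d + 1, c1);
        _mm_storeu_si128(d + 2, c2);
        _mm_storeu_si128(d + 3, c3);
        _mm_storeu_si128(d + 4, c4);
        _mm_storeu_si128(d + 5, c5);
        _mm_storeu_si128(d + 6, c6);
        _mm_storeu_si128(d + 7, c7);
        src += 128;
        dst += 128;
        size -= 128;
    }

    // Remaining whole 16-byte chunks.
    while (size >= 16)
    {
        _mm_storeu_si128(reinterpret_cast<__m128i *>(dst),
                         _mm_loadu_si128(reinterpret_cast<const __m128i *>(src)));
        src += 16;
        dst += 16;
        size -= 16;
    }

    // Final 1..15 bytes: re-copy the last 16 bytes, overlapping bytes already written.
    // In bounds because at least 32 bytes were copied before reaching this point.
    if (size > 0)
        _mm_storeu_si128(reinterpret_cast<__m128i *>(dst + size - 16),
                         _mm_loadu_si128(reinterpret_cast<const __m128i *>(src + size - 16)));
    return ret;
}
```

Every load and store here is unaligned (`loadu`/`storeu`), which matches the "still predominantly unaligned" remark. A common refinement is to copy one unaligned head chunk first so that `dst` becomes 16-byte aligned and the main loop can use aligned stores.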