C++ zero-cost abstractions, using hash tables in ClickHouse as an example
… additional fetches from memory. Separate chaining. Example: std::unordered_map. 1. Pointer stability for keys and values. 2. Ability to store large, non-movable objects. Benchmarks (time): ClickHouse HashMap 7.366 sec.; Google DenseMap 10.089 sec.; Abseil HashMap 9.011 sec.; std::unordered_map 44.758 sec. Benchmarks (cache misses, perf stat -e cache-misses:u ./integer_hash_tables_and_hashes): ClickHouse HashMap 329,664,616; Google DenseMap 383,350,820; Abseil HashMap 415,869,669; std::unordered_map 1,939,811,017. Benchmarks: http://norvig.com/21-days.html#answers
0 码力 | 49 pages | 2.73 MB | 1 year ago
7. UDF in ClickHouse
… among ML algorithm engineers. Simdjson: an extremely fast JSON parser based on the AVX2 instruction set; structured data extraction (JSONExtract); we can pass the type as a parameter, just like in CAST. Going Further. Inline C++ in SQL: SELECT udsf(' std::string udsf(std::string s) { return "hello, " + s; } ', 'world'). Compiled and linked …
0 码力 | 29 pages | 1.54 MB | 1 year ago
ClickHouse on Kubernetes
Diagram labels: Service; Replica Service; User Config Map; Common Config Map; Stateful Set; Pod; Persistent Volume Claim; Persistent Volume; Per-replica Config Map. Altinity ClickHouse Operator Quick Start. … minutes to propagate. Confirm changes using clickhouse-client. To make storage persistent and set properties, add an explicit volume claim template with class and size: apiVersion: "clickhouse.altinity … allocate or mount volumes. Local volume mounts are also supported. storageClassName can be used to set the proper class of storage as well as to disable dynamic provisioning. Use kubectl to find available …
0 码力 | 34 pages | 5.06 MB | 1 year ago
ClickHouse on Kubernetes
Diagram labels: Service; Replica Service; User Config Map; Common Config Map; Stateful Set; Pod; Persistent Volume Claim; Persistent Volume; Per-replica Config Map. Challenges running ClickHouse on Kubernetes. … minutes to propagate. Confirm changes using clickhouse-client. To make storage persistent and set properties, add an explicit volume claim template with class and size: apiVersion: "clickhouse.altinity … storage by 'kubectl exec' into pod; run 'df -h' to confirm mount. storageClassName can be used to set the proper class of storage as well as to disable dynamic provisioning. Use kubectl to find available …
0 码力 | 29 pages | 3.87 MB | 1 year ago
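ClickHouse in Practice for Massive-Data Scenarios at Bilibili
Offline write service; platform services; Berserker; data source management; interactive analytical queries; Yuuni service; users; kernel. Map implicit columns: the native Map is implemented as an Array of Tuple; querying a native Map requires reading a large amount of irrelevant data. Map implicit columns: each key is stored as a separate column; a query reads only the implicit columns it needs. Bulkload: the native write path consumes ClickHouse Server resources and degrades query performance …
0 码力 | 26 pages | 2.15 MB | 1 year ago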
3. Sync Clickhouse with MySQL_MongoDB
… slow. GROUP BY id HAVING sum(sign)>0: need to use GROUP BY in every query; not suitable for a multi-column primary key. Our Solution: PTS. Key Features: only one config file needed for a new Clickhouse …
0 码力 | 38 pages | 7.13 MB | 1 year ago
2. Tencent ClickHouse in Practice _2019, Ding Xiaokun & Xiong Feng
… User value is our guiding principle. How to use ClickHouse to meet special requirements. Business application practice: iData. 1. Handling Map-type data: SELECT Goals.play_times_key AS key, sum(Goals.play_times_value) AS value FROM wegame ARRAY …
0 码力 | 26 pages | 3.58 MB | 1 year ago
ClickHouse in Production
… 'Show'=1, 'Click'=2) ) ENGINE = HDFS('hdfs://hdfs1:9000/event_log.parq', 'Parquet') Ok. 0 rows in set. Elapsed: 0.004 sec. In ClickHouse: Most Clicked Banner. SELECT countIf(CounterType='Show') … 958 │ 6253958168 │ │ 15826 │ 873 │ 6999678684 │ 3 rows in set. Elapsed: 109.586 sec. Processed 28.75 mln rows. In ClickHouse: Local Log Copy. CREATE TABLE … MergeTree() ORDER BY BannerID; Ok. INSERT INTO EventLogLocal SELECT * FROM EventLogHDFS; Ok. 0 rows in set. Elapsed: 106.350 sec. Processed 28.75 mln rows. In ClickHouse: Query Local Copy. SELECT …
0 码力 | 100 pages | 6.86 MB | 1 year ago
8. Continue to use ClickHouse as TSDB
Column-Orient Model, How we do. CPU: Intel Skylake, 8 cores; Memory: 64 GB; Disk: 500 GB SSD; Data Set: TSBS, 12 hours, 40000 drivers, 10 metrics ≈ 16.9 billion rows. Column-Orient Model, How we do: … DESC LIMIT 5. Result column "value": 4, 4, 4, 4, 4. 5 rows in set. Elapsed: 0.854 sec. Processed 144.06 million rows, 5.19 GB (168.64 million rows/s., 6.07 GB/s.) Time-Series-Orient Model, How we do. CPU: Intel Skylake, 8 cores; Memory: 64 GB; Disk: 500 GB SSD; Data Set: TSBS, 12 hours, 40000 drivers, 10 metrics ≈ 19.6 billion rows. Time-Series-Orient Model, How we …
0 码力 | 42 pages | 911.10 KB | 1 year ago
1. Machine Learning with ClickHouse
… rows in set. Elapsed: 0.413 sec. Processed 432.99 million rows. Query with sampling reads fewer rows! SELECT count() FROM trips_sample_time SAMPLE 1 / 3 OFFSET 1 / 3. Result: 144330770. 1 rows in set. Elapsed: …
0 码力 | 64 pages | 1.38 MB | 1 year ago
13 results in total