7. UDF in ClickHouse: …solution can be 10x-1000x slower than a native C++ program. Example: multiple self-joins on time series. Ease of Use and Maintainability: SELECT skewPop(x) FROM data; SELECT centralMoment(3)(x) / pow(stddevPop(x), 3) FROM data; SELECT (sum(pow(x, 3)) / count() - 3 * sum(pow(x, 2)) * sum(x) / pow(count(), 2) + 2 * pow(sum(x), 3) / pow(count(), 3)) / pow(stddevPop(x), 3) FROM data. Array Functions: SELECT arraySplit(x -> x >= 10, [11, 4, 5, 14]) returns [[11, 4, 5], [14]]; SELECT arrayFill(x -> x > 0, [1, 2, 0, 0, 3, 0]) returns [1, 2, 2, 2, 3, 3]. • Handling time… | 29 pages | 1.54 MB | 1 year ago
1. Machine Learning with ClickHouse: …Table (part). How to sample data: you already know it! LIMIT N; WHERE condition; SAMPLE x OFFSET y. How to sample data, LIMIT N: SELECT min(pickup_date), max(pickup_date) FROM (SELECT … How to sample data, SAMPLE x OFFSET y: you must specify an expression for sampling; it is optimized by the PK; a fixed sample query returns a fixed dataset; it works only for MergeTree. … CREATE … for sampling: the SAMPLE BY expression must be evenly distributed! How to sample data, SAMPLE x OFFSET y: SELECT count() FROM trips_sample_time returns 432992321; 1 rows in set. Elapsed: 0.413 sec. Processed… | 64 pages | 1.38 MB | 1 year ago
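6. ClickHouse in Practice at ZhongAn: …02 Jizhi Platform. X-Brain AI open platform; compute frameworks: Hadoop, JStorm, Spark Streaming, Flink; offline/real-time job monitoring; data and model storage: Hive, HBase, ClickHouse, Kylin; data ingestion; message middleware; model and algorithm templates; Antron machine learning platform; robot platform; X-Insight data insight platform; X-Zatlas data visualization platform; templates; X-BI data exploration platform; image classification platform; OCR toolchain; X-Farm heterogeneous data governance and collaboration platform; metadata management / data marts; data permission management | big data and streaming data modeling | data/model lifecycle management; resource scheduling; business systems; development tools; infrastructure; model feedback; intelligent applications. Openness and agility: • unified modeling and management of big data and streaming data • vertical industry templates that simplify development • performance testing and optimization on ten-billion-row datasets • disk storage upgrade: • high-efficiency cloud disk --> SSD + RAID0 • 140 MBps --> ~600 MBps, ~4x • after the upgrade: ~250 s --> ~69 s, ~3.62x • after data warm-up: ~69 s --> 18 s, ~3.8x • ToDos: • optimize the data import process • support multiple partitions and a specified primary key • warm up frequently used fields. Sharing commands for common performance analysis: • Linux commands… | 28 pages | 4.00 MB | 1 year ago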
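4. ClickHouse in Practice for Suning's User Profiling Scenario: … [chart: query duration] Conclusions: • for exact distinct counting of integer values, groupBitmap is at least 2x faster than uniqExact • groupBitmap supports only integer values, while uniqExact supports values of any type • for approximate distinct counting, uniq has the advantage in accuracy. … taking the Array Container as an example: short[] keys 0x0000, 0xEE6B, 0xFF01; Array Container 0x0001, 0x0002, 0x0003, 0x2800; the value 0xEE6B2800 is split into its high 16 bits as the key (0xEE6B) and its low 16 bits as the value (0x2800); Bitmap Container… | 32 pages | 1.47 MB | 1 year ago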
What You Need to Know About ClickHouse Architecture to Use It Effectively: …BY passenger_count; NYC taxi benchmark. Shards: 1, 3, 140; time, s: 1.224, 0.438, 0.043; speedup over 1 shard: x2.8 (3 shards), x28.5 (140 shards). Writing to a Distributed table: › you want protection against hardware… Availability is (almost) there! ❋ You can switch off one DC if ZK runs in 3 data centers and replicas are in at least 2. ❋ You cannot write to a server cut off from the ZK quorum. Replication from the CAP theorem perspective. All together… | 28 pages | 506.94 KB | 1 year ago
ClickHouse: Present and Future: … • on the customer's infrastructure • on a personal laptop. ClickHouse is available for multiple platforms: • x86_64, aarch64 (ARM), PowerPC 64, RISC-V • Linux, FreeBSD, macOS. ClickHouse is true open source… | 32 pages | 2.62 MB | 1 year ago
ClickHouse: Present and Future: …Kubernetes • on the customer's infrastructure • on a personal laptop. ClickHouse is available for multiple platforms: • x86_64, aarch64 (ARM), PowerPC 64, RISC-V • Linux, FreeBSD, macOS. ClickHouse is true open source… | 32 pages | 776.70 KB | 1 year ago
ClickHouse on Kubernetes: …distributed team in US/Canada/Europe ● US/Europe sponsor of ClickHouse community ● Offerings: ○ 24x7 support for ClickHouse deployments ○ Software (Kubernetes, cluster manager, tools & utilities) ○… | 34 pages | 5.06 MB | 1 year ago
ClickHouse on Kubernetes: …distributed team in US/Canada/Europe ● US/Europe sponsor of ClickHouse community ● Offerings: ○ 24x7 support for ClickHouse deployments ○ Software (Kubernetes, cluster manager, tools & utilities) ○… | 29 pages | 3.87 MB | 1 year ago
11 results in total
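The "UDF in ClickHouse" result contrasts skewPop(x) with a hand-expanded query built from power sums. The sketch below is a minimal check, assuming skewness means the third central moment divided by the cube of the population standard deviation, that the expanded form agrees with the direct definition. The function names and sample data are hypothetical and not taken from the slides.

```python
import math

def skew_direct(xs):
    # Third central moment divided by the cube of the population standard deviation.
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / math.sqrt(m2) ** 3

def skew_from_power_sums(xs):
    # Same quantity built only from count(), sum(x), sum(x^2), sum(x^3),
    # mirroring the expanded SQL expression term by term.
    n = len(xs)
    s1 = sum(xs)
    s2 = sum(x ** 2 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    m3 = s3 / n - 3 * s2 * s1 / n ** 2 + 2 * s1 ** 3 / n ** 3
    stddev_pop = math.sqrt(s2 / n - (s1 / n) ** 2)
    return m3 / stddev_pop ** 3

data = [1.0, 2.0, 2.0, 3.0, 10.0]  # hypothetical sample with a long right tail
print(skew_direct(data), skew_from_power_sums(data))
```

Both functions agree on this data, but the power-sum form subtracts large, nearly equal terms when the mean is big relative to the spread. That loss of floating-point precision is one practical argument for a built-in aggregate over a hand-written formula.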

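The same result shows arraySplit and arrayFill with lambda predicates. Below is a small Python model of their documented behavior: arraySplit starts a new chunk before each element (other than the first) for which the predicate is true, and arrayFill replaces each element that fails the predicate with the previous, already filled value. These helper names are hypothetical stand-ins, not ClickHouse code.

```python
def array_split(pred, arr):
    # Start a new chunk before every element where pred is true;
    # the first element never triggers a split.
    chunks = []
    for i, x in enumerate(arr):
        if i == 0 or pred(x):
            chunks.append([x])
        else:
            chunks[-1].append(x)
    return chunks

def array_fill(pred, arr):
    # Keep elements where pred is true; otherwise carry the previous
    # (already filled) value forward. The first element is always kept.
    out = []
    for i, x in enumerate(arr):
        if i == 0 or pred(x):
            out.append(x)
        else:
            out.append(out[-1])
    return out

print(array_split(lambda x: x >= 10, [11, 4, 5, 14]))  # [[11, 4, 5], [14]]
print(array_fill(lambda x: x > 0, [1, 2, 0, 0, 3, 0]))  # [1, 2, 2, 2, 3, 3]
```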

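The "Machine Learning with ClickHouse" result notes that SAMPLE x OFFSET y yields a fixed dataset for a fixed query and that the SAMPLE BY expression must be evenly distributed. The sketch below models that idea under stated assumptions. It treats the sampling key as a 32-bit hash and keeps a row when its key falls in the slice [y, y + x) of the key range. It uses hashlib.blake2b only as a stand-in for a ClickHouse hash such as intHash32, and sample_key and in_sample are hypothetical helpers.

```python
import hashlib

KEY_SPACE = 2 ** 32  # assumed: the sampling key is a UInt32 hash

def sample_key(row_id):
    # Stand-in for a hash like intHash32: deterministic and close to uniform.
    digest = hashlib.blake2b(str(row_id).encode(), digest_size=4).digest()
    return int.from_bytes(digest, "big")

def in_sample(row_id, fraction, offset=0.0):
    # Keep the row if its key lies in [offset, offset + fraction) of the key space;
    # this is how SAMPLE x OFFSET y is modeled here.
    lo = int(offset * KEY_SPACE)
    hi = int((offset + fraction) * KEY_SPACE)
    return lo <= sample_key(row_id) < hi

rows = range(50_000)
first = {r for r in rows if in_sample(r, 0.1)}
second = {r for r in rows if in_sample(r, 0.1, offset=0.5)}
print(len(first), len(second), len(first & second))  # sizes near 5000; overlap is 0
```

Because the key is a hash, each 10% slice is a pseudo-random tenth of the rows, and rerunning the query returns the same tenth. If the key were a raw sequential ID instead, a slice of the key range would select a contiguous ID range rather than a random subset. That is the failure the "evenly distributed" warning guards against.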

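The Suning result shows how a Roaring-style bitmap (array and bitmap containers) stores 0xEE6B2800: the high 16 bits (0xEE6B) select a container key and the low 16 bits (0x2800) go into that container. The sketch below is a simplified, hypothetical illustration of that layout, not ClickHouse's groupBitmap implementation. It assumes the commonly used 4096-element threshold for converting a sorted array container into a 65536-bit bitmap container.

```python
from bisect import bisect_left

ARRAY_LIMIT = 4096  # assumed threshold for switching array -> bitmap container

def split32(value):
    # High 16 bits pick the container key; low 16 bits are stored in the container.
    return value >> 16, value & 0xFFFF

class TinyRoaring:
    """Hypothetical, simplified Roaring-style set of 32-bit unsigned integers."""

    def __init__(self):
        # key (high 16 bits) -> sorted list of low halves, or an 8192-byte bitmap
        self.containers = {}

    def add(self, value):
        key, low = split32(value)
        c = self.containers.setdefault(key, [])
        if isinstance(c, list):
            i = bisect_left(c, low)
            if i < len(c) and c[i] == low:
                return  # already present
            c.insert(i, low)
            if len(c) > ARRAY_LIMIT:
                # Convert a dense array container into a 65536-bit bitmap container.
                bits = bytearray(8192)
                for v in c:
                    bits[v >> 3] |= 1 << (v & 7)
                self.containers[key] = bits
        else:
            c[low >> 3] |= 1 << (low & 7)

    def cardinality(self):
        # Exact distinct count: list lengths plus popcounts of bitmap containers.
        total = 0
        for c in self.containers.values():
            if isinstance(c, list):
                total += len(c)
            else:
                total += sum(bin(b).count("1") for b in c)
        return total

r = TinyRoaring()
for v in [0xEE6B2800, 0xEE6B2800, 1, 2, 3]:
    r.add(v)
key, low = split32(0xEE6B2800)
print(hex(key), hex(low), r.cardinality())  # 0xee6b 0x2800 4
```

The split also explains why the slide restricts groupBitmap to integer values: the layout depends on carving an unsigned integer into key and value halves.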

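The speedup figures in the ZhongAn and NYC taxi benchmark results are plain ratios of the before and after values shown in each snippet. The quick check below recomputes them. Query times use old duration divided by new duration, and disk throughput uses new rate divided by old rate. The labels are descriptive only.

```python
# Each entry is (computed ratio, speedup reported in the snippet).
checks = {
    "NYC taxi, 1 -> 3 shards (query time)": (1.224 / 0.438, 2.8),
    "NYC taxi, 1 -> 140 shards (query time)": (1.224 / 0.043, 28.5),
    "ZhongAn, disk throughput 140 -> 600 MBps": (600 / 140, 4.0),
    "ZhongAn, query time after disk upgrade": (250 / 69, 3.62),
    "ZhongAn, query time after data warm-up": (69 / 18, 3.8),
}

for name, (computed, reported) in checks.items():
    print(f"{name}: computed {computed:.2f}x, reported ~{reported}x")
```

All reported values match to within rounding, except the throughput gain: it works out closer to 4.3x, which the slide rounds to ~4x.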

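The Russian-language architecture result states that one DC can be switched off if ZooKeeper runs in 3 data centers, and that a server cut off from the ZK quorum cannot accept writes. Both follow from ZooKeeper's strict-majority rule. The sketch below shows the arithmetic, assuming a hypothetical layout expressed as ZooKeeper node counts per data center. has_quorum and survives_dc_loss are illustrative helpers, not ClickHouse or ZooKeeper APIs.

```python
def has_quorum(alive, ensemble):
    # A ZooKeeper ensemble stays writable only while a strict majority is reachable.
    return alive > ensemble // 2

def survives_dc_loss(zk_nodes_per_dc):
    # True if losing any one data center still leaves a ZooKeeper majority.
    total = sum(zk_nodes_per_dc)
    return all(has_quorum(total - lost, total) for lost in zk_nodes_per_dc)

print(survives_dc_loss([1, 1, 1]))  # True: any 2 of 3 nodes remain
print(survives_dc_loss([2, 1]))     # False: losing the 2-node DC leaves 1 of 3
```

This covers only the coordination side. The snippet's second condition, replicas in at least 2 DCs, is what keeps the data itself available once a DC goes away.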








