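2. Tencent ClickHouse Practice (2019, 丁晓坤 & 熊峰)
High memory, low-cost storage. Single-node configuration: 128 GB memory, 24 CPU cores, 20 TB SATA in RAID5, 10 Gigabit NIC. "User value is our guiding principle." Deployment and monitoring management: 1. Production deployment scheme: Distributed Table; Replica1 Replica1, Replica1 Replica1, Replica1 Replica1; Shard01, Shard02, Shard03; Load Balancing. DataMore: real-time big-data decision-making capability. Business application practice (iData): 2. New big-data analysis engine 2.0. Traditional industry big-data analysis engine; big-data analysis engine & storage (Analytical Engine & Database); big-data warehouse; Hadoop Data Lake; compute engine (MR & Spark); Data Warehouse; OLTP; Big Data Analysis; data reports; multi-dimensional
26 pages | 3.58 MB | 1 year ago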
ClickHouse in Production
Highload Architecture › Webserver (Apache, Nginx) › Cache (Memcached) › Message Broker (Kafka, Amazon SQS) › Coordination system (Zookeeper, etcd) › MapReduce (Hadoop, Spark) › Network File System (S3, HDFS) https://github.com/donnemartin/system-design-primer
100 pages | 6.86 MB | 1 year ago
ClickHouse on Kubernetes
Background ● Premier provider of software and services for ClickHouse ● Incorporated in UK with distributed team in US/Canada/Europe ● US/Europe sponsor of ClickHouse community ● Offerings: ○ 24x7 support ... Linux” Actually it’s an open-source platform to: ● manage container-based systems ● build distributed applications declaratively ● allocate machine resources efficiently ● automate application ... ClickHouse on Kubernetes? 1. Provisioning 2. Persistence 3. Networking 4. Transparency kube-system namespace The ClickHouse operator turns complex data warehouse configuration into a single easy-to-manage
29 pages | 3.87 MB | 1 year ago
7. UDF in ClickHouse
systems and algorithms Active GitHub User • https://github.com/hczhcz • Interested in computer system and language stuff • 8 organizations, 90+ repos, 600+ followers ClickHouse Contributor ... in a ML System • Pre-analyzing the data • Extracting features • Constructing relationship graphs • Generating reports • ... Intensive Tasks in a ML System • Pre-analyzing is very similar to OLAP A Database is not Just a “Database” What an English Dictionary Tells You • database /ˈdeɪtəˌbeɪs/ A collection of data stored in a computer that
29 pages | 1.54 MB | 1 year ago
What You Need to Know About ClickHouse Architecture to Use It Effectively
interfere with each other… ClickHouse: sharding + Distributed tables! When one server is not enough. Reading from a Distributed table: CSV 227 Gb, ~1.3 billion rows, SELECT passenger_count. Shards: 1, 3, 140; time (s): 1.224, 0.438, 0.043; speedup: x2.8 (3 shards), x28.5 (140 shards). Writing to a Distributed table › We want protection against hardware failure… › Data must be available
28 pages | 506.94 KB | 1 year ago
ClickHouse on Kubernetes
Background ● Premier provider of software and services for ClickHouse ● Incorporated in UK with distributed team in US/Canada/Europe ● US/Europe sponsor of ClickHouse community ● Offerings: ○ 24x7 support ... Linux” Actually it’s an open-source platform to: ● manage container-based systems ● build distributed applications declaratively ● allocate machine resources efficiently ● automate application ... easy-to-manage resource ClickHouse Operator ClickHouseInstallation YAML file (Apache 2.0 source, distributed as Docker image) ClickHouse cluster resources kubectl apply create resources What
34 pages | 5.06 MB | 1 year ago
8. Continue to use ClickHouse as TSDB
(4) Data keeps changing over time. Why we choose it ► Solutions ► (1) Row-Oriented Database ► (2) Column-Oriented Database ► (3) Time-Series-Oriented Database Time Name Age Humidity HeartRate Location . 2019/10/11/ 11:00:01 Tom 26 45% 96 121.54794 31.32318 ... 21 INSERT INTO ... ► Row-Oriented Database 2019/10/11/ 11:00:01 Tom 26 45% 96 ... 21 Time Name Age Humidity HeartRate BETWEEN ... AND ... AND Name = “Tom” Red: Data needed Green: Data scanned ► Column-Oriented Database Temperature 11 20 ... 11 21 Time 2019/10/10/ 10:00:00 2019/10/10/
42 pages | 911.10 KB | 1 year ago
1. Machine Learning with ClickHouse
BY sipHash64(pickup_datetime) -- expression for sampling SAMPLE BY expression must be evenly distributed! How to sample data SAMPLE x OFFSET y SELECT count() FROM trips_sample_time 432992321
64 pages | 1.38 MB | 1 year ago
0. Machine Learning with ClickHouse
BY sipHash64(pickup_datetime) -- expression for sampling SAMPLE BY expression must be evenly distributed! How to sample data SAMPLE x OFFSET y SELECT count() FROM trips_sample_time 432992321
64 pages | 1.38 MB | 1 year ago
The ClickHouse Testing We Deserve
contrib shared clang-8 release thread contrib static gcc-8 release - contrib static gcc-8 release - system static And that's not all... ClickHouse does not ... joinGet(toDateTimeOrNull((CAST(([885455.14523]) AS String)))); SELECT (SELECT 1) FROM remote('127.0.0.{1,2}', system.one); About integration: what it integrates with ... Performance tests: query analysis. Query: SELECT count() FROM system.numbers WHERE NOT ignore( materialize('xxxxxxxxxxxxxxxxxx') AS s, concat(s, s, s, s, s, s, s, s
84 pages | 9.60 MB | 1 year ago