ClickHouse in Production
  Excerpt (Yandex.Metrika architecture diagram, repeated on slides 10-12 of 97; labels only): Site visitor, Message Broker, Daemons Pipeline (mt-nano, mt-log, mt-giga), Key-Value Store, Dictionary Data, DB, Java API, BI Tool, Site Owner, Data analyst.
  100 pages | 6.86 MB | 1 year ago
1. Machine Learning with ClickHouse
  Excerpt: pulling query results into pandas over the HTTP interface (url and query are defined elsewhere in the deck):

    import io
    import pandas as pd
    import requests

    # the query text travels in the request body; results come back as TSV
    resp = requests.get(url, data=query)
    string_io = io.StringIO(resp.text)
    table = pd.read_csv(string_io, sep="\t")

  How to sample data -- you already know it: LIMIT N, or a WHERE predicate; SAMPLE x OFFSET y gives a fixed sample per query, but only for MergeTree tables:

    CREATE TABLE trips_sample_time
    (
        pickup_datetime DateTime
    )
    ENGINE = MergeTree
    ORDER BY sipHash64(pickup_datetime) ...

  How to store a trained model: you can store the model as an aggregate function state in a separate table:

    CREATE TABLE models ENGINE = MergeTree ORDER BY tuple()
    AS SELECT stochasticLinearRegressionState(total_amount ...

  64 pages | 1.38 MB | 1 year ago
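  Both DDL fragments above are cut off mid-statement. A minimal sketch of how they typically complete, following standard ClickHouse idioms rather than the deck itself (the SAMPLE BY clause, the feature columns, and the evalMLMethod step are assumptions):

    CREATE TABLE trips_sample_time
    (
        pickup_datetime DateTime,
        total_amount Float32,       -- assumed feature column
        trip_distance Float32       -- assumed feature column
    )
    ENGINE = MergeTree
    ORDER BY sipHash64(pickup_datetime)    -- the sampling key must be part of the primary key
    SAMPLE BY sipHash64(pickup_datetime);

    -- deterministic ~10% sample, skipping the first half of the key space
    SELECT count() FROM trips_sample_time SAMPLE 1/10 OFFSET 1/2;

    -- train once, persist the state, apply it later with evalMLMethod
    CREATE TABLE models ENGINE = MergeTree ORDER BY tuple()
    AS SELECT stochasticLinearRegressionState(total_amount, trip_distance) AS state
    FROM trips_sample_time;

    WITH (SELECT state FROM models) AS model
    SELECT evalMLMethod(model, trip_distance) AS predicted_amount
    FROM trips_sample_time;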
0. Machine Learning with ClickHouse
  (Same deck as the previous entry; the excerpt is identical.)
  64 pages | 1.38 MB | 1 year ago
7. UDF in ClickHouse
  Excerpt: Pipeline = Directed Acyclic Graph (DAG) of modules; Module = Input + Task + Output; Task = query or external program; Query = "CREATE TABLE ... AS SELECT ..." (each computing task writes a result table). Why ClickHouse: a database with few dependencies; customization thanks to the straightforward code structure and the well-designed API; we maintain a custom build. The UDF magic: ... provided by the user. UDFs in ClickHouse: scalar functions; aggregate functions & combinators; table functions & storage engines. Usage examples in our ML systems: data preprocessing, filling invalid ...
  29 pages | 1.54 MB | 1 year ago
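  A minimal sketch of the module pattern the excerpt names -- one query materializing one result table (table and column names are hypothetical, not from the deck):

    -- Module: input = raw_events, task = the query below, output = daily_stats
    CREATE TABLE daily_stats
    ENGINE = MergeTree ORDER BY day
    AS SELECT
        toDate(event_time) AS day,
        count() AS events
    FROM raw_events
    GROUP BY day;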
2. 腾讯 clickhouse实践 _2019丁晓坤&熊峰 (Tencent ClickHouse in practice, 2019, Ding Xiaokun & Xiong Feng)
  Excerpt: high memory, cheap storage; per-node configuration: 128 GB RAM, 24 CPU cores, 20 TB SATA in RAID 5, 10 GbE NIC. Deployment and monitoring: the production layout is a Distributed table over Shard01/Shard02/Shard03, each shard carrying replicas, behind load balancing. A bitmap-based distributed computing engine: API Server, Scheduler, SQL-Parser, Query Optimizer; DataNodes hold columns (Column1 ... ColumnN) and partitions (Partition0 ... PartitionM) for apps app-2 ... app-n, reached over RPC.
  26 pages | 3.58 MB | 1 year ago
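  A hedged sketch of the sharded-plus-replicated layout the excerpt describes (cluster, database, table, and column names are hypothetical):

    CREATE TABLE events_local ON CLUSTER my_cluster
    (
        event_time DateTime,
        uid UInt64
    )
    ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}')
    ORDER BY (uid, event_time);

    -- one logical table fanning out over Shard01..Shard03
    CREATE TABLE events_all ON CLUSTER my_cluster AS events_local
    ENGINE = Distributed(my_cluster, default, events_local, rand());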
3. Sync Clickhouse with MySQL_MongoDB
  Excerpt: possible solutions and why they fall short: (1) you can't update/delete a table frequently in ClickHouse; (2) the MySQL engine is not suitable for big tables, nor for MongoDB; (3) re-initializing the whole table every day ... PTS key features: only one config file needed for a new ClickHouse table; init and keep syncing data in one app per table; sync multiple data sources to ClickHouse in minutes. PTS components include Provider and Transform (providers: mongodb, redis, ...); config fragment:

    Listen: binlog,                                   // binlog, kafka
    DataSource: user:pass@tcp(example.com:3306)/user,
    Table: user,
    QueryKeys: [ id ],                                // usually primary key
    Pairs: {                                          // field mapping
      id: id,
      name: name ...

  38 pages | 7.13 MB | 1 year ago
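  For context on point (1): ClickHouse row changes are asynchronous mutations that rewrite whole data parts, which is why frequent per-row updates are ruled out. A minimal illustration (table and values hypothetical):

    -- each statement schedules a mutation that rewrites the affected parts
    ALTER TABLE user UPDATE name = 'alice' WHERE id = 42;
    ALTER TABLE user DELETE WHERE id = 42;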
8. Continue to use ClickHouse as TSDB
  Excerpt: two candidate models: (1) a column-oriented model, (2) a time-series-oriented model. The column-oriented model puts each metric in its own column:

    CREATE TABLE demonstration.insert_view
    (
        `Time` DateTime,
        `Name` String,
        `Age` UInt8,
        ...,
        `HeartRate` ...
    )
    PARTITION BY toYYYYMM(Time)
    ORDER BY (Name, Time, Age, ...);

  A second variant declares `Name` as LowCardinality(String). Throughput figures from the deck: 5.19 GB processed (168.64 million rows/s., 6.07 GB/s.). The time-series-oriented model keys rows by metric name instead:

    CREATE TABLE demonstration.test
    (
        `time_series_interval` DateTime,
        `metric_name` String,
        `Name` ...

  42 pages | 911.10 KB | 1 year ago
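  A sketch of how the time-series-oriented table typically continues -- one row per (metric, interval) with an explicit value column (everything past the columns shown in the excerpt is an assumption):

    CREATE TABLE demonstration.test
    (
        `time_series_interval` DateTime,
        `metric_name` String,
        `Name` LowCardinality(String),
        `value` Float64               -- assumed measurement column
    )
    ENGINE = MergeTree
    PARTITION BY toYYYYMM(time_series_interval)
    ORDER BY (metric_name, Name, time_series_interval);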
ClickHouse: настоящее и будущее (ClickHouse: Present and Future)
  Excerpt: graph processing; batch jobs; Data Hub. Support for semistructured data -- the JSON data type:

    CREATE TABLE games (data JSON) ENGINE = MergeTree;

  You can insert arbitrary nested JSONs; types are automatically inferred. On the games dataset:

    CREATE TABLE games (data String) ENGINE = MergeTree ORDER BY tuple();
    SELECT JSONExtractString(data, 'teams', 1, 'name') FROM games;   -- 0.520 sec.

    CREATE TABLE games (data JSON) ENGINE = ...
    SELECT data.teams.name[1] FROM games;                            -- 0.015 sec.

  The inferred type:

    DESCRIBE TABLE games SETTINGS describe_extend_object_types = 1
    name: data
    type: Tuple(`_id.$oid` String, `date ...

  32 pages | 2.62 MB | 1 year ago
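  A minimal sketch of feeding such a table, assuming the JSONAsObject input format (the sample document is invented; on versions where the type was still experimental, the first setting is required):

    SET allow_experimental_object_type = 1;

    CREATE TABLE games (data JSON) ENGINE = MergeTree ORDER BY tuple();

    -- whole documents go in as-is
    INSERT INTO games FORMAT JSONAsObject {"teams": [{"name": "Team A"}, {"name": "Team B"}]}

    -- dotted paths address the inferred subcolumns directly
    SELECT data.teams.name[1] FROM games;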
ClickHouse: настоящее и будущее (ClickHouse: Present and Future)
  (Same deck as the previous entry; the excerpt is identical.)
  32 pages | 776.70 KB | 1 year ago
2. Clickhouse玩转每天千亿数据-趣头条 (ClickHouse handling hundreds of billions of rows a day at Qutoutiao)
  Excerpt: (1) event data reported by Qutoutiao and Midu is distinguished by event type (eventType); (2) the metrics system has both time-sliced and cumulative metrics; (3) metrics are generally broken down by eventType:

    select count(1) from table
    where dt='' and timestamp>='' and timestamp<='' and eventType=''

  Table design lacked deep thought: given the nature of the time-sliced metrics, our table is ORDER ... On memory: (1) max_memory_usage caps the memory a single SQL query may use on that machine; (2) apart from simple SQL, space complexity is O(1), e.g.:

    select count(1) from table where column=value
    select column1, column2 from table where column=value

  Any SQL involving GROUP BY, ORDER BY, DISTINCT, or JOIN is no longer O(1) in memory.
  14 pages | 1.10 MB | 1 year ago
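  A small sketch of the memory cap and the O(1) distinction the excerpt describes (the limit value is illustrative):

    -- per-query memory ceiling on this server (~10 GB here)
    SET max_memory_usage = 10000000000;

    -- O(1) memory: a streaming scan and count
    SELECT count(1) FROM table WHERE column = value;

    -- no longer O(1): the hash table grows with key cardinality
    SELECT column, count(1) FROM table GROUP BY column;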
共 15 条
- 1
- 2
