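BI - IT文库 (IT Library): free downloads of programming e-books and documents for IT and Internet developers, to keep your coding power at full strength!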


QCon Beijing 2018 - "Deep Learning Techniques for Intelligent Text Processing" - Chen Yunwen (陈运文)

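Sequence labeling. Traditional machine learning (CRF): • requires extensive feature engineering • must be re-tuned for each new domain. Deep learning (Bi-LSTM+CRF): • works across domains • the input layer uses word embeddings, which improves generalization • recurrent neural networks (LSTM, GRU, etc.) can learn longer-range context features as well as some nonlinear features.
[Figure: sequence labeling with character/word embeddings fed into a Bi-LSTM]
The decoder extracts the key information from the encoded vectors and combines it into an abstractive summary. Introducing intra-attention in deep learning: • the intra-attention mechanism operates inside the decoder • it attends to words already generated, which addresses individual words being repeated when summarizing long sequences.
[Figure: Bi_LSTM encoder over the input sequence and RNN decoder with intra-decoder attention producing the summary sequence]
ROUGE metric optimization (reward over a candidate set of summaries): • ROUGE is not differentiable, so it cannot be optimized directly with gradient descent; reinforcement learning is used instead, favoring models that earn a high reward and updating the model from this feedback, so that training ultimately yields the best-performing model.
[Figure: deep learning abstractive summarization model: Bi_LSTM encoder, RNN decoder with intra-decoder attention, ROUGE-based reward over candidate summaries]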

0 coding power | 46 pages | 25.61 MB | 1 year ago
3
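The QCon snippet above notes that ROUGE is not differentiable, so the summarizer is trained with reinforcement learning using ROUGE as the reward. The Python sketch below illustrates that idea under stated assumptions: a balanced ROUGE-L F1 on whitespace-tokenized text, and a self-critical baseline (the greedy decode's reward subtracted from a sampled decode's reward). The baseline choice and all function names are assumptions for illustration, not the speaker's confirmed method, and no model or gradient step is included.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)  # LCS lengths for the previous prefix of `a`
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: List[str], reference: List[str]) -> float:
    """ROUGE-L F1 (precision and recall weighted equally) against a single reference."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0  # also covers empty inputs, avoiding division by zero
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)


def self_critical_advantage(sampled: List[str], greedy: List[str], reference: List[str]) -> float:
    """Reward of a sampled summary minus the reward of the greedy-decoded baseline.

    In a REINFORCE-style update the sampled sequence's log-probability is
    scaled by this value, so samples that beat the baseline become more likely.
    No gradient has to flow through ROUGE itself.
    """
    return rouge_l_f1(sampled, reference) - rouge_l_f1(greedy, reference)


reference = "the cat sat on the mat".split()
sampled = "the cat sat on a mat".split()
greedy = "a cat on mat".split()
advantage = self_critical_advantage(sampled, greedy, reference)  # 5/6 - 0.6 ≈ 0.233
```

Because the reward is only used as a scalar weight on the sampled sequence's log-probability, ROUGE never needs to be differentiated, which is exactly the gap reinforcement learning fills in the snippet.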
Technical Solution for Migrating Hadoop to Alibaba Cloud MaxCompute

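…and Open-Source Ecosystem vs. Alibaba Cloud Big Data Ecosystem. 2.1.1 Mainstream big data architecture. Hadoop and its open-source ecosystem consist of a series of open-source components, and many users build enterprise data warehouses/data lakes, machine learning, real-time analytics, BI reporting, and other big data applications on them. The logical components of a common big data architecture and their relationships are shown in the figure below. These logical components include:
• Data sources: relational databases, log files, real-time messages, etc.
• Data storage: distributed file storage services for massive data volumes, supporting
• Analytical data storage: after processing, data is stored in a structured, application-oriented form so that analysis tools or applications can access it, e.g. MPP data warehouses or Spark SQL to serve BI tools, and HBase for low-latency online services.
• Analysis and reporting: analyzing and presenting data to gain insight, e.g. BI tools, Jupyter.
• Data job orchestration: composing multiple data processing actions (data movement, transformation, etc.) into workflows that run periodically to automate data processing, e.g.
Component comparison, open source vs. Alibaba Cloud (partial):
Data warehouse: GreenPlum/Impala/Presto/Hive vs. MaxCompute/Hologres/AnalyticDB
NoSQL: HBase vs. ApsaraDB for HBase/Tablestore
Analysis and reporting: BI tools, Notebook vs. QuickBI, PAI Notebook component, EMR Notebook component
Data job orchestration: Oozie/Azkaban/Airflow
Sqoop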

0 coding power | 59 pages | 4.33 MB | 1 year ago
3
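The Hadoop migration snippet describes data job orchestration as composing data-movement and transformation actions into workflows so that each step runs only after the steps it depends on. A minimal Python sketch of that ordering core, using the standard library's `graphlib` (Python 3.9+); the step names and the `run_workflow` helper are hypothetical. Schedulers such as Oozie, Azkaban, or Airflow add periodic triggering, retries, and monitoring, which this sketch deliberately omits.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List, Set


def run_workflow(
    deps: Dict[str, Set[str]],
    actions: Dict[str, Callable[[], None]],
) -> List[str]:
    """Run every step after all of its upstream steps and return the order used.

    `deps` maps a step to the set of steps it depends on. A dependency cycle
    raises graphlib.CycleError before any step runs.
    """
    order: List[str] = []
    for step in TopologicalSorter(deps).static_order():
        actions[step]()  # e.g. a data-movement or transformation job
        order.append(step)
    return order


# Hypothetical daily pipeline: two ingestion steps feed one transformation,
# which in turn feeds a table used by BI reports.
log: List[str] = []
pipeline_deps = {
    "transform": {"ingest_logs", "ingest_orders"},
    "build_report": {"transform"},
}
pipeline_actions = {
    name: (lambda n=name: log.append(n))  # stand-in for a real job
    for name in ["ingest_logs", "ingest_orders", "transform", "build_report"]
}
executed = run_workflow(pipeline_deps, pipeline_actions)
```

The two ingestion steps have no ordering constraint between them, so a real orchestrator could run them in parallel; this sketch runs them sequentially for simplicity.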
Apache Kyuubi 1.5.1 Documentation

… to build and manage a Data Lake with pure SQL for both data processing, e.g. ETL, and analytics, e.g. BI. All workloads can be done on one platform, using one copy of data, with one SQL interface. Kyuubi Documentation: 1. Access Kyuubi with Hive JDBC and ODBC Drivers; 2. Access Kerberized Kyuubi with Beeline & BI Tools. Integrations: 1. Kyuubi On Apache Kudu; 2. Kyuubi On Delta Lake; 3. Kyuubi On Delta Lake With … 3. JDBC URL; 1.4. Example; 1.5. Unsupported Hive Features. 2. Access Kerberized Kyuubi with Beeline & BI Tools: 2.1. Instructions; 2.2. Install Kerberos Client; 2.3. Configure Kerberos Client; 2.4. Get Kerberos …

0 coding power | 267 pages | 5.80 MB | 1 year ago
3
Apache Kyuubi 1.5.2 Documentation

… to build and manage a Data Lake with pure SQL for both data processing, e.g. ETL, and analytics, e.g. BI. All workloads can be done on one platform, using one copy of data, with one SQL interface. Kyuubi Documentation: 1. Access Kyuubi with Hive JDBC and ODBC Drivers; 2. Access Kerberized Kyuubi with Beeline & BI Tools. Integrations: 1. Kyuubi On Apache Kudu; 2. Kyuubi On Delta Lake; 3. Kyuubi On Delta Lake With … 3. JDBC URL; 1.4. Example; 1.5. Unsupported Hive Features. 2. Access Kerberized Kyuubi with Beeline & BI Tools: 2.1. Instructions; 2.2. Install Kerberos Client; 2.3. Configure Kerberos Client; 2.4. Get Kerberos …

0 coding power | 267 pages | 5.80 MB | 1 year ago
3
Apache Kyuubi 1.5.0 Documentation

… to build and manage a Data Lake with pure SQL for both data processing, e.g. ETL, and analytics, e.g. BI. All workloads can be done on one platform, using one copy of data, with one SQL interface. Kyuubi Documentation: 1. Access Kyuubi with Hive JDBC and ODBC Drivers; 2. Access Kerberized Kyuubi with Beeline & BI Tools. Integrations: 1. Kyuubi On Apache Kudu; 2. Kyuubi On Delta Lake; 3. Kyuubi On Delta Lake With … 3. JDBC URL; 1.4. Example; 1.5. Unsupported Hive Features. 2. Access Kerberized Kyuubi with Beeline & BI Tools: 2.1. Instructions; 2.2. Install Kerberos Client; 2.3. Configure Kerberos Client; 2.4. Get Kerberos …

0 coding power | 267 pages | 5.80 MB | 1 year ago
3
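The Kyuubi 1.5.x listings above point to sections on the JDBC URL and on accessing Kerberized Kyuubi from Beeline and BI tools. A minimal Python sketch of assembling such a URL, assuming the Hive JDBC layout `jdbc:hive2://<host>:<port>/<db>;<session vars>?<confs>#<variables>` and 10009 as the usual default Kyuubi frontend port; the helper name `kyuubi_jdbc_url`, the host, and the realm are hypothetical, and the layout should be checked against the Kyuubi guide before use. Values are not URL-encoded, so the sketch assumes they contain no `;`, `?`, `#`, or `=`.

```python
from typing import Mapping, Optional


def _join_pairs(props: Mapping[str, str]) -> str:
    """Render {"k": "v", "k2": "v2"} as "k=v;k2=v2", keeping insertion order."""
    return ";".join(f"{key}={value}" for key, value in props.items())


def kyuubi_jdbc_url(
    host: str,
    port: int = 10009,  # assumed default Kyuubi frontend port
    db: str = "default",
    session_vars: Optional[Mapping[str, str]] = None,  # after ';', e.g. principal
    confs: Optional[Mapping[str, str]] = None,  # after '?'
    variables: Optional[Mapping[str, str]] = None,  # after '#'
) -> str:
    """Assemble jdbc:hive2://host:port/db;session_vars?confs#variables, omitting empty parts."""
    url = f"jdbc:hive2://{host}:{port}/{db}"
    if session_vars:
        url += ";" + _join_pairs(session_vars)
    if confs:
        url += "?" + _join_pairs(confs)
    if variables:
        url += "#" + _join_pairs(variables)
    return url


# Hypothetical Kerberized server; host and realm are placeholders.
kerberized_url = kyuubi_jdbc_url(
    "kyuubi.example.com",
    session_vars={"principal": "kyuubi/kyuubi.example.com@EXAMPLE.COM"},
)
```

The `principal` session variable is how Hive JDBC clients are told which server principal to request a ticket for; a string like `kerberized_url` can then be passed to Beeline's `-u` option or pasted into a BI tool's JDBC connection dialog.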
Apache Kyuubi 1.5.0 Documentation

… Iceberg to build and manage a Data Lake with pure SQL for both data processing, e.g. ETL, and analytics, e.g. BI. All workloads can be done on one platform, using one copy of data, with one SQL interface. Kyuubi is fully compatible with the Hive JDBC and ODBC drivers, which let you connect to popular Business Intelligence (BI) tools to query, analyze and visualize data through Spark SQL engines. Install Hive JDBC: for programming, 'hive-jdbc', version: '2.3.8'. For BI tools, please refer to Quick Start to check the guide for the BI tool you use. If you find there is no specific document for the BI tool that you are using, don’t worry

0 coding power | 172 pages | 6.94 MB | 1 year ago
3
Apache Kyuubi 1.5.1 Documentation

… Iceberg to build and manage a Data Lake with pure SQL for both data processing, e.g. ETL, and analytics, e.g. BI. All workloads can be done on one platform, using one copy of data, with one SQL interface. Kyuubi is fully compatible with the Hive JDBC and ODBC drivers, which let you connect to popular Business Intelligence (BI) tools to query, analyze and visualize data through Spark SQL engines. Install Hive JDBC: for programming, 'hive-jdbc', version: '2.3.8'. For BI tools, please refer to Quick Start to check the guide for the BI tool you use. If you find there is no specific document for the BI tool that you are using, don’t worry

0 coding power | 172 pages | 6.94 MB | 1 year ago
3
Apache Kyuubi 1.5.2 Documentation

… Iceberg to build and manage a Data Lake with pure SQL for both data processing, e.g. ETL, and analytics, e.g. BI. All workloads can be done on one platform, using one copy of data, with one SQL interface. Kyuubi is fully compatible with the Hive JDBC and ODBC drivers, which let you connect to popular Business Intelligence (BI) tools to query, analyze and visualize data through Spark SQL engines. Install Hive JDBC: for programming, 'hive-jdbc', version: '2.3.8'. For BI tools, please refer to Quick Start to check the guide for the BI tool you use. If you find there is no specific document for the BI tool that you are using, don’t worry

0 coding power | 172 pages | 6.94 MB | 1 year ago
3
Apache Kyuubi 1.7.0-rc1 Documentation

… Lakehouse with pure SQL for both data processing, e.g. ETL, and online analytical processing (OLAP), e.g. BI. All workloads can be done on one platform, using one copy of data, with one SQL interface. ADMIN ticket (TGT). 3. The Kerberos client stores the TGT in a ticket cache. 4. The JDBC client, such as beeline or a BI tool, reads the TGT from the ticket cache. 5. The JDBC client sends the TGT and the server principal to the KDC. 6. JDBC driver use Kerberos authentication. core-site.xml should be placed on beeline’s classpath or the BI tools’ classpath. Beeline: here are the usual locations where core-site.xml should exist for different

0 coding power | 206 pages | 3.78 MB | 1 year ago
3
Apache Kyuubi 1.7.3 Documentation

… Lakehouse with pure SQL for both data processing, e.g. ETL, and online analytical processing (OLAP), e.g. BI. All workloads can be done on one platform, using one copy of data, with one SQL interface. ADMIN ticket (TGT). 3. The Kerberos client stores the TGT in a ticket cache. 4. The JDBC client, such as beeline or a BI tool, reads the TGT from the ticket cache. 5. The JDBC client sends the TGT and the server principal to the KDC. 6. JDBC driver use Kerberos authentication. core-site.xml should be placed on beeline’s classpath or the BI tools’ classpath. Beeline: here are the usual locations where core-site.xml should exist for different

0 coding power | 211 pages | 3.79 MB | 1 year ago
3
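The Kyuubi 1.7.x listings say core-site.xml must sit on the classpath of beeline or the BI tool so that the JDBC driver uses Kerberos authentication. In Hadoop clients, the property that normally switches authentication to Kerberos is `hadoop.security.authentication` (default `simple`). The Python sketch below checks whether a given core-site.xml sets it, assuming the standard Hadoop configuration XML layout; the helper names are hypothetical, and whether the Kyuubi guide requires exactly this property should be confirmed there. The sketch neither searches the classpath nor contacts a KDC.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def read_hadoop_conf(xml_text: str) -> Dict[str, str]:
    """Parse Hadoop-style <configuration><property><name/><value/></property> XML into a dict."""
    root = ET.fromstring(xml_text)
    conf: Dict[str, str] = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            conf[name.strip()] = (prop.findtext("value") or "").strip()
    return conf


def uses_kerberos(xml_text: str) -> bool:
    """True if hadoop.security.authentication is 'kerberos' (Hadoop's default is 'simple')."""
    auth = read_hadoop_conf(xml_text).get("hadoop.security.authentication", "simple")
    return auth.lower() == "kerberos"


# A hypothetical minimal core-site.xml for a Kerberos-enabled client.
sample_core_site = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
</configuration>"""
```

Running `uses_kerberos` against the file a client actually loads is a quick way to rule out the common case where beeline picks up a core-site.xml that still says `simple`.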