QCon Beijing 2018 - "Deep Learning Techniques for Intelligent Text Processing" - Chen Yunwen
Sequence labeling. Traditional machine learning (CRF): • requires heavy feature engineering • must be retuned for each domain. Deep learning (Bi-LSTM+CRF): • general across domains • the input layer uses word vectors, which improves generalization • recurrent neural networks (LSTM, GRU, etc.) can learn some longer-range contextual features and some nonlinear features. [Figure: sequence labeling, character/word vectors fed into a Bi-LSTM] ... The decoder extracts key information from the vectors and composes an abstractive summary. Introducing an intra-attention mechanism into deep learning: • the intra-attention mechanism operates inside the decoder • it attends to words already generated, which addresses individual words being repeated when summaries of long sequences are generated. [Figure: encoder (Bi_LSTM) and decoder (RNN) with decoder intra-attention; input sequence, summary sequence, ROUGE metric optimization, reward, summary candidate set] • ROUGE evaluation: the metric is not differentiable, so the model cannot be trained with gradient descent; consider reinforcement learning, which favors models with high reward and updates the model through feedback, finally yielding the best-performing model. Abstractive summarization. [Figure: deep learning abstractive summarization model] ...
0 码力 | 46 pages | 25.61 MB | 1 year ago
Technical Solution for Migrating Hadoop to Alibaba Cloud MaxCompute
... and a comparison of the open-source ecosystem with the Alibaba Cloud big data ecosystem. 2.1.1 Mainstream big data architecture. Hadoop and its open-source ecosystem consist of a series of open-source components. Many users build big data applications such as enterprise data warehouses/data lakes, machine learning, real-time analytics, and BI reporting on Hadoop and these components. The logical relationships among the components of a common big data architecture are shown in the figure below. These logical components include: • Data sources: relational databases, log files, real-time messages, etc. • Data storage: a distributed file storage service for massive data volumes, supporting ... Analytical data storage: after processing, data is stored in structured form for the target application scenario so that analysis tools or applications can access it, for example using an MPP data warehouse or Spark SQL to support access from BI tools, or using HBase for low-latency online services. • Analysis and reporting: analyzing and presenting data to gain insight, for example with BI tools or jupyter. • Data job orchestration: arranging multiple data processing actions (data movement, transformation, etc.) into workflows that run periodically, so that data processing is automated, for example ... GreenPlum/Impala/Presto/Hive; NoSQL: HBase | Data warehouse: MaxCompute/Hologres/analytical database; NoSQL: ApsaraDB for HBase/Tablestore. Analysis and reporting: BI tools, Notebook | QuickBI, PAI Notebook component, EMR Notebook component. Data job orchestration: Oozie/Azkaban/Airflow Sqooq
0 码力 | 59 pages | 4.33 MB | 1 year ago
Apache Kyuubi 1.5.1 Documentation
org/] to build and manage Data Lake with pure SQL for both data processing e.g. ETL, and analytics e.g. BI. All workloads can be done on one platform, using one copy of data, with one SQL interface. Kyuubi Documentation 1. Access Kyuubi with Hive JDBC and ODBC Drivers 2. Access Kerberized Kyuubi with Beeline & BI Tools Integrations 1. Kyuubi On Apache Kudu 2. Kyuubi On Delta Lake 3. Kyuubi On Delta Lake With 3. JDBC URL 1.4. Example 1.5. Unsupported Hive Features 2. Access Kerberized Kyuubi with Beeline & BI Tools 2.1. Instructions 2.2. Install Kerberos Client 2.3. Configure Kerberos Client 2.4. Get Kerberos
0 码力 | 267 pages | 5.80 MB | 1 year ago
Apache Kyuubi 1.5.2 Documentation
org/] to build and manage Data Lake with pure SQL for both data processing e.g. ETL, and analytics e.g. BI. All workloads can be done on one platform, using one copy of data, with one SQL interface. Kyuubi Documentation 1. Access Kyuubi with Hive JDBC and ODBC Drivers 2. Access Kerberized Kyuubi with Beeline & BI Tools Integrations 1. Kyuubi On Apache Kudu 2. Kyuubi On Delta Lake 3. Kyuubi On Delta Lake With 3. JDBC URL 1.4. Example 1.5. Unsupported Hive Features 2. Access Kerberized Kyuubi with Beeline & BI Tools 2.1. Instructions 2.2. Install Kerberos Client 2.3. Configure Kerberos Client 2.4. Get Kerberos
0 码力 | 267 pages | 5.80 MB | 1 year ago
Apache Kyuubi 1.5.0 Documentation
org/] to build and manage Data Lake with pure SQL for both data processing e.g. ETL, and analytics e.g. BI. All workloads can be done on one platform, using one copy of data, with one SQL interface. Kyuubi Documentation 1. Access Kyuubi with Hive JDBC and ODBC Drivers 2. Access Kerberized Kyuubi with Beeline & BI Tools Integrations 1. Kyuubi On Apache Kudu 2. Kyuubi On Delta Lake 3. Kyuubi On Delta Lake With 3. JDBC URL 1.4. Example 1.5. Unsupported Hive Features 2. Access Kerberized Kyuubi with Beeline & BI Tools 2.1. Instructions 2.2. Install Kerberos Client 2.3. Configure Kerberos Client 2.4. Get Kerberos
0 码力 | 267 pages | 5.80 MB | 1 year ago
Apache Kyuubi 1.5.0 Documentation
Iceberg to build and manage Data Lake with pure SQL for both data processing e.g. ETL, and analytics e.g. BI. All workloads can be done on one platform, using one copy of data, with one SQL interface. Kyuubi fully compatible with Hive JDBC and ODBC drivers that let you connect to popular Business Intelligence (BI) tools to query, analyze and visualize data through Spark SQL engines. Install Hive JDBC For programing 'hive-jdbc', version: '2.3.8' For BI tools, please refer to Quick Start to check the guide for the BI tool used. If you find there is no specific document for the BI tool that you are using, don’t worry
0 码力 | 172 pages | 6.94 MB | 1 year ago
Apache Kyuubi 1.5.1 Documentation
Iceberg to build and manage Data Lake with pure SQL for both data processing e.g. ETL, and analytics e.g. BI. All workloads can be done on one platform, using one copy of data, with one SQL interface. Kyuubi fully compatible with Hive JDBC and ODBC drivers that let you connect to popular Business Intelligence (BI) tools to query, analyze and visualize data through Spark SQL engines. Install Hive JDBC For programing 'hive-jdbc', version: '2.3.8' For BI tools, please refer to Quick Start to check the guide for the BI tool used. If you find there is no specific document for the BI tool that you are using, don’t worry
0 码力 | 172 pages | 6.94 MB | 1 year ago
Apache Kyuubi 1.5.2 Documentation
Iceberg to build and manage Data Lake with pure SQL for both data processing e.g. ETL, and analytics e.g. BI. All workloads can be done on one platform, using one copy of data, with one SQL interface. Kyuubi fully compatible with Hive JDBC and ODBC drivers that let you connect to popular Business Intelligence (BI) tools to query, analyze and visualize data through Spark SQL engines. Install Hive JDBC For programing 'hive-jdbc', version: '2.3.8' For BI tools, please refer to Quick Start to check the guide for the BI tool used. If you find there is no specific document for the BI tool that you are using, don’t worry
0 码力 | 172 pages | 6.94 MB | 1 year ago
Apache Kyuubi 1.7.0-rc1 Documentation
Lakehouse with pure SQL for both data processing, e.g. ETL, and online analytics processing(OLAP), e.g. BI. All workloads can be done on one platform, using one copy of data, with one SQL interface. ADMIN ticket(TGT). 3. Kerberos client stores TGT into a ticket cache. 4. JDBC client, such as beeline and BI tools, reads TGT from the ticket cache. 5. JDBC client sends TGT and server principal to KDC. 6. JDBC driver use Kerberos authentication. core-site.xml should be placed under beeline’s classpath or BI tools’ classpath. Beeline Here are the usual locations where core-site.xml should exist for different
0 码力 | 206 pages | 3.78 MB | 1 year ago
Apache Kyuubi 1.7.3 Documentation
Lakehouse with pure SQL for both data processing, e.g. ETL, and online analytics processing(OLAP), e.g. BI. All workloads can be done on one platform, using one copy of data, with one SQL interface. ADMIN ticket(TGT). 3. Kerberos client stores TGT into a ticket cache. 4. JDBC client, such as beeline and BI tools, reads TGT from the ticket cache. 5. JDBC client sends TGT and server principal to KDC. 6. JDBC driver use Kerberos authentication. core-site.xml should be placed under beeline’s classpath or BI tools’ classpath. Beeline Here are the usual locations where core-site.xml should exist for different
0 码力 | 211 pages | 3.79 MB | 1 year ago














