Spark SQL, DataFrames and Datasets Guide
Introduction
Overview
SQL
DataFrames
Datasets
Getting Started
Starting Point: SQLContext
Creating DataFrames
DataFrame Operations
Running SQL Queries Programmatically
Creating Datasets
Interoperating with RDDs
Inferring the Schema Using Reflection
Programmatically Specifying the Schema
Data Sources
Generic Load/Save Functions
Manually Specifying Options
Run SQL on Files Directly
Save Modes
Saving to Persistent Tables
Parquet Files
Loading Data Programmatically
Partition Discovery
Schema Merging
Hive metastore Parquet table conversion
Hive/Parquet Schema Reconciliation
Metadata Refreshing
Configuration
JSON Datasets
Hive Tables
Interacting with Different Versions of Hive Metastore
JDBC To Other Databases
Troubleshooting
Performance Tuning
Caching Data In Memory
Other Configuration Options
Distributed SQL Engine
Running the Thrift JDBC/ODBC server
Running the Spark SQL CLI
Migration Guide
Upgrading From Spark SQL 1.6 to 2.0
Upgrading From Spark SQL 1.5 to 1.6
Upgrading From Spark SQL 1.4 to 1.5
Upgrading from Spark SQL 1.3 to 1.4
DataFrame data reader/writer interface
DataFrame.groupBy retains grouping columns
Behavior change on DataFrame.withColumn
Upgrading from Spark SQL 1.0-1.2 to 1.3
Rename of SchemaRDD to DataFrame
Unification of the Java and Scala APIs
Isolation of Implicit Conversions and Removal of dsl Package (Scala-only)
Removal of the type aliases in org.apache.spark.sql for DataType (Scala-only)
UDF Registration Moved to sqlContext.udf (Java & Scala)
Python DataTypes No Longer Singletons
Compatibility with Apache Hive
Deploying in Existing Hive Warehouses
Supported Hive Features
Unsupported Hive Functionality
Reference
Data Types
NaN Semantics