Spark SQL, DataFrames and Datasets Guide
Introduction
Overview
SQL
DataFrames
Datasets
Getting Started
Starting Point: SQLContext
Creating DataFrames
DataFrame Operations
Running SQL Queries Programmatically
Creating Datasets
Interoperating with RDDs
Inferring the Schema Using Reflection
Programmatically Specifying the Schema
Data Sources
Generic Load/Save Functions
Manually Specifying Options
Run SQL on Files Directly
Save Modes
Saving to Persistent Tables
Parquet Files
Loading Data Programmatically
Partition Discovery
Schema Merging
Hive metastore Parquet table conversion
Hive/Parquet Schema Reconciliation
Metadata Refreshing
Configuration
JSON Datasets
Hive Tables
Interacting with Different Versions of Hive Metastore
JDBC To Other Databases
Troubleshooting
Performance Tuning
Caching Data In Memory
Other Configuration Options
Distributed SQL Engine
Running the Thrift JDBC/ODBC server
Running the Spark SQL CLI
Migration Guide
Upgrading From Spark SQL 1.6 to 2.0
Upgrading From Spark SQL 1.5 to 1.6
Upgrading From Spark SQL 1.4 to 1.5
Upgrading from Spark SQL 1.3 to 1.4
DataFrame data reader/writer interface
DataFrame.groupBy retains grouping columns
Behavior change on DataFrame.withColumn
Upgrading from Spark SQL 1.0-1.2 to 1.3
Rename of SchemaRDD to DataFrame
Unification of the Java and Scala APIs
Isolation of Implicit Conversions and Removal of dsl Package (Scala-only)
Removal of the type aliases in org.apache.spark.sql for DataType (Scala-only)
UDF Registration Moved to sqlContext.udf (Java & Scala)
Python DataTypes No Longer Singletons
Compatibility with Apache Hive
Deploying in Existing Hive Warehouses
Supported Hive Features
Unsupported Hive Functionality
Reference
Data Types
NaN Semantics