pyspark.sql.SparkSession

class pyspark.sql.SparkSession(sparkContext, jsparkSession=None)
The entry point to programming Spark with the Dataset and DataFrame API.

A SparkSession can be used to create DataFrames, register DataFrames as tables, execute SQL over tables, cache tables, and read Parquet files. To create a SparkSession, use the following builder pattern:
builder
    A class attribute having a Builder to construct SparkSession instances.
Examples

    >>> from pyspark.sql import SparkSession
    >>> spark = SparkSession.builder \
    ...     .master("local") \
    ...     .appName("Word Count") \
    ...     .config("spark.some.config.option", "some-value") \
    ...     .getOrCreate()

    >>> from datetime import datetime
    >>> from pyspark.sql import Row
    >>> # sc is an existing SparkContext, for example spark.sparkContext
    >>> spark = SparkSession(sc)
    >>> allTypes = sc.parallelize([Row(i=1, s="string", d=1.0, l=1,
    ...     b=True, list=[1, 2, 3], dict={"s": 0}, row=Row(a=1),
    ...     time=datetime(2014, 8, 1, 14, 1, 5))])
    >>> df = allTypes.toDF()
    >>> df.createOrReplaceTempView("allTypes")
    >>> spark.sql('select i+1, d+1, not b, list[1], dict["s"], time, row.a '
    ...           'from allTypes where b and i > 0').collect()
    [Row((i + 1)=2, (d + 1)=2.0, (NOT b)=False, list[1]=2, dict[s]=0, time=datetime.datetime(2014, 8, 1, 14, 1, 5), a=1)]
    >>> df.rdd.map(lambda x: (x.i, x.s, x.d, x.l, x.b, x.time, x.row.a, x.list)).collect()
    [(1, 'string', 1.0, 1, True, datetime.datetime(2014, 8, 1, 14, 1, 5), 1, [1, 2, 3])]

Methods

- createDataFrame(data[, schema, …]): Creates a DataFrame from an RDD, a list or a pandas.DataFrame.
- getActiveSession(): Returns the active SparkSession for the current thread, as returned by the builder.
- newSession(): Returns a new SparkSession that has a separate SQLConf, registered temporary views and UDFs, but shares the SparkContext and table cache.
- range(start[, end, step, numPartitions]): Creates a DataFrame with a single pyspark.sql.types.LongType column named id, containing elements in a range from start to end (exclusive) with step value step.
- sql(sqlQuery): Returns a DataFrame representing the result of the given query.
- stop(): Stops the underlying SparkContext.
- table(tableName): Returns the specified table as a DataFrame.

Attributes

- builder: A class attribute having a Builder to construct SparkSession instances.
- catalog: Interface through which the user may create, drop, alter or query underlying databases, tables, functions, etc.
- conf: Runtime configuration interface for Spark.
- read: Returns a DataFrameReader that can be used to read data in as a DataFrame.
- readStream: Returns a DataStreamReader that can be used to read data streams as a streaming DataFrame.
- sparkContext: Returns the underlying SparkContext.
- streams: Returns a StreamingQueryManager that allows managing all the StreamingQuery instances active on this context.
- udf: Returns a UDFRegistration for UDF registration.
- version: The version of Spark on which this application is running.