Apache Spark 3.0 Release Note (Preview)

A preview of Apache Spark 3.0 has been released and is available for testing. The preview was released on November 8, 2019 and was announced via Twitter. It was launched to enable wide-scale community testing of this major release.

This Spark 3.0 preview is not a stable release in terms of either API or functionality, but it is meant to give the community early access to try the code that will become Spark 3.0. If you would like to test the release, please download it, and send feedback using either the mailing lists or JIRA.

The Spark issue tracker already contains a list of features in 3.0.

What's new in Spark 3.0

Spark 3.0 is faster, easier, and smarter. Apache Spark 3.0 extends its scope with more than 3000 resolved JIRAs. Data developers and machine learning engineers alike will find the new features exciting to explore. Along with the feature list below, other major initiatives are coming in the future. You will find plenty of good, intuitive examples and demos in future articles.

The following features are covered:

Accelerator-aware scheduling
Adaptive query execution
Dynamic partition pruning
Join hints
New query explain
Better ANSI compliance
Observable metrics
New UI for structured streaming
New UDAF and built-in functions
New unified interface for Pandas UDF
Various enhancements in the built-in data sources (e.g., Parquet, ORC, and JDBC)
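Several of the features listed above are controlled by Spark SQL configuration settings. As a minimal sketch, the snippet below collects three such settings and renders them as `--conf` flags for `spark-submit`. The keys are the names used in the Spark 3.0 configuration documentation; the preview build may name them differently or use other defaults, so treat them as assumptions to verify against your build. The helper `to_submit_flags` and the dictionary `FEATURE_CONFS` are hypothetical names introduced only for illustration.

```python
# Hypothetical helper: renders Spark SQL settings as spark-submit flags.
# The keys below follow the Spark 3.0 configuration docs; verify them
# against the preview build you are testing before relying on them.
FEATURE_CONFS = {
    # Adaptive query execution
    "spark.sql.adaptive.enabled": "true",
    # Dynamic partition pruning
    "spark.sql.optimizer.dynamicPartitionPruning.enabled": "true",
    # Stricter ANSI SQL behavior
    "spark.sql.ansi.enabled": "true",
}


def to_submit_flags(confs):
    """Turn a {key: value} mapping into a flat list of --conf arguments."""
    flags = []
    for key, value in confs.items():
        flags.extend(["--conf", f"{key}={value}"])
    return flags


if __name__ == "__main__":
    # Print the flags so they can be pasted after `spark-submit`.
    print(" ".join(to_submit_flags(FEATURE_CONFS)))
```

The same key/value pairs could instead be set on a session builder or in `spark-defaults.conf`; command-line flags are used here only because they make it easy to toggle a feature for a single test run.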

You can find a summary of each newly added feature in this article.

Additional PySpark Resource & Reading Material

PySpark Frequently Asked Questions

Refer to our PySpark FAQ space, where important queries and information are clarified. It also links to important PySpark tutorial pages within the site.

PySpark Example Code

Find our GitHub repository, which lists PySpark examples with code snippets.

PySpark/Spark Related Interesting Blogs

Here is a list of informative blogs and related articles that you might find interesting.

01 May 2020

Apache Spark 3


Topper Tips