Apache Spark 3.0 Release Note (Preview)
A preview of Apache Spark 3.0 has been released and is available for testing. The release was made on 2019-Nov-08 and was announced via Twitter. The preview was launched to enable wide-scale community testing of this major release.
This Spark 3.0 preview is not a stable release in terms of either API or functionality, but it is meant to give the community early access to try the code that will become Spark 3.0. If you would like to test the release, please download it, and send feedback using either the mailing lists or JIRA.
The Spark issue tracker already contains a list of features in 3.0.
What's new in Spark 3.0
Spark 3.0 is faster, easier, and smarter. Apache Spark 3.0 extends its scope with more than 3000 resolved JIRAs. Data developers and machine learning engineers alike should find the new features exciting to explore. Beyond this feature list, other major initiatives are planned for the future. You will find many good, intuitive examples and demos in future articles.
The following features are covered:
- Accelerator-aware scheduling
- Adaptive query execution
- Dynamic partition pruning
- Join hints
- New query explain
- Better ANSI compliance
- Observable metrics
- New UI for structured streaming
- New UDAF and built-in functions
- New unified interface for Pandas UDF
- Various enhancements in the built-in data sources (e.g., Parquet, ORC, and JDBC)
This article gives a summary of each newly added feature.
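Several of the features listed above (adaptive query execution, dynamic partition pruning, and ANSI compliance) are controlled through Spark SQL configuration properties. The following is a minimal sketch, not an official recipe, of how you might collect those settings and turn them into `spark-submit` arguments when trying out the preview. The property names are the ones documented for Spark 3.0 and could differ in the preview build; the dictionary `SPARK3_PREVIEW_CONF` and the helper `to_submit_args` are hypothetical names introduced only for this illustration.

```python
# Illustrative sketch: gather a few Spark 3.0 SQL settings and render them
# as "--conf key=value" arguments for spark-submit.
# Assumption: property names follow the Spark 3.0 configuration docs;
# verify them against the build you are actually testing.

SPARK3_PREVIEW_CONF = {
    # Adaptive query execution (re-optimizes plans at runtime)
    "spark.sql.adaptive.enabled": "true",
    # Dynamic partition pruning
    "spark.sql.optimizer.dynamicPartitionPruning.enabled": "true",
    # Stricter ANSI SQL behaviour
    "spark.sql.ansi.enabled": "true",
}


def to_submit_args(conf):
    """Return a flat list of spark-submit arguments, sorted by property name."""
    args = []
    for key in sorted(conf):
        args.extend(["--conf", f"{key}={conf[key]}"])
    return args


if __name__ == "__main__":
    # Prints one line that can be pasted after "spark-submit"
    print(" ".join(to_submit_args(SPARK3_PREVIEW_CONF)))
```

The same key/value pairs could instead be passed to `SparkSession.builder.config(key, value)` in a PySpark script. Keeping them in one dictionary makes it easy to toggle a feature while testing.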
Additional PySpark Resources & Reading Material
PySpark Frequently Asked Questions
Refer to our PySpark FAQ space, where important questions and information are clarified. It also links to important PySpark tutorial pages within the site.
PySpark Example Code
Find our GitHub repository, which lists PySpark examples with code snippets.
PySpark/Spark Related Interesting Blogs
Here is a list of informative blogs and related articles that you might find interesting.