Pandas API on Spark
- Options and settings
- From/to pandas and PySpark DataFrames
- Transform and apply a function
- Type Support in Pandas API on Spark
- Type Hints in Pandas API on Spark
- From/to other DBMSes
- Best Practices
  - Leverage PySpark APIs
  - Check execution plans
  - Use checkpoint
  - Avoid shuffling
  - Avoid computation on single partition
  - Avoid reserved column names
  - Do not use duplicated column names
  - Specify the index column in conversion from Spark DataFrame to pandas-on-Spark DataFrame
  - Use `distributed` or `distributed-sequence` default index
  - Handling index misalignment with `distributed-sequence`
  - Reduce the operations on different DataFrame/Series
  - Use pandas API on Spark directly whenever possible
- Supported pandas API
- FAQ