Koalas provides a drop-in replacement for pandas. Commonly used by data scientists, pandas is a Python package that provides easy-to-use data structures and data analysis tools. However, pandas does not scale out to big data. Koalas fills this gap by providing pandas-equivalent APIs that work on Apache Spark. Koalas is useful not only for pandas users but also for PySpark users, because it supports many tasks that are difficult to do with PySpark, such as plotting data directly from a PySpark DataFrame.
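As a rough illustration of what "drop-in" means here, the sketch below writes one small aggregation against a module passed in as `lib`, so the same function body can be handed either `pandas` or `databricks.koalas`. The function name `total_sales_by_city`, the column names, and the sample data are hypothetical and chosen only for this example.

```python
def total_sales_by_city(lib):
    """Build a small DataFrame with `lib` and sum sales per city.

    `lib` is assumed to be a pandas-compatible module, for example
    `pandas` itself or `databricks.koalas` (hypothetical usage sketch).
    """
    # DataFrame construction uses the same constructor call in both libraries.
    df = lib.DataFrame(
        {
            "city": ["Oslo", "Lima", "Oslo", "Lima"],
            "sales": [10, 20, 30, 40],
        }
    )
    # groupby / sum / sort_index follow the pandas API.
    return df.groupby("city")["sales"].sum().sort_index()
```

On a cluster where Koalas is available, calling `total_sales_by_city(ks)` after `import databricks.koalas as ks` would run the aggregation on Spark, while `total_sales_by_city(pd)` after `import pandas as pd` would run it locally.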
Koalas is included on clusters running Databricks Runtime 7.3 through 9.1. For clusters running Databricks Runtime 10.0 and above, use pandas API on Spark instead.
To use Koalas on a cluster running Databricks Runtime 7.0 or below, install Koalas as a Databricks PyPI library.