The rise in Spark’s popularity coincided with that of the Python data (PyData) ecosystem, so it makes sense that Spark’s Python API, PySpark, has grown significantly in popularity over the last few years. Although the PyData ecosystem has recently produced some distributed programming options of its own, Apache Spark remains one of the most popular choices for working with large datasets across industries and domains. Given recent efforts to integrate PySpark with other PyData tools, learning the framework can significantly boost your productivity as a data science practitioner.