How to do a diff between 2 dataframes using Pandas and DataComPy
Pandas is a popular open-source Python library used for data manipulation, analysis, and visualization. It provides data structures and functions…
Tag: data-science
How to set up Zeppelin Notebooks on Ubuntu
Apache Zeppelin is a web-based notebook that enables interactive data exploration, visualization, and collaboration. It supports a wide range of data sources, including Apache Spark, Hadoop, and relational databases. With Zeppelin Notebooks,…
How to set up Spark on Ubuntu
Apache Spark is an open-source distributed computing framework designed to deliver results quickly. It is an in-memory computation engine, meaning data is processed in memory. Spark supports…