Showing posts from October, 2022

Can we replace pandas library with datatable?

Pandas: Pandas is an open-source library. It provides data structures and operations for manipulating data and time series.

1. Time taken to read a CSV using Pandas and Datatable:

    import pandas as pd
    import time

    file = 'annual-enterprise-survey.csv'

    start_Time = time.time()
    df = pd.read_csv(file)
    end_Time = time.time()  # at the end of the script

    # time.time() differences are in seconds; the printed value is 60 times the elapsed seconds
    execution_time = (end_Time - start_Time)
    print("Execution time for pandas", execution_time*60)

Output:

    Execution time for pandas 4.621939659118652

    import datatable as dt
    import time

    start_Time1 = time.time()
    dt_df = dt.fread(file)
    dt_df = dt_df.to_pandas()  # the conversion to a pandas DataFrame is included in the timing
    end_Time1 = time.time()  # at the end of the script

    execution_time1 = (end_Time1 - start_Time1)
    print("Execution time for datatable", execution_time1*60)

Output:

    Execution time for datatable 3.1781816482543945

2. Time taken to save a CSV using Pandas and Datatable:

    import pandas as pd
    import time

    file = 'annual-enterprise-survey.csv'

    start_Time = time.time()
    df = pd.read_csv(fi
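The start/stop timing pattern used in the comparisons above can be wrapped in a small reusable helper. The sketch below is only an illustration, not the post's own method: the names `time_call`, `read_rows`, and `make_sample_csv` are hypothetical, and the standard-library `csv` module stands in for the reader so that the example runs without pandas or datatable installed. It uses `time.perf_counter()`, which is intended for measuring short durations, and reports the result in seconds.

```python
import csv
import os
import tempfile
import time


def time_call(func, *args, repeats=3, **kwargs):
    """Run func several times; return (best elapsed seconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best, result


def read_rows(path):
    """Stand-in reader (assumption): load every CSV row into a list."""
    with open(path, newline="") as handle:
        return list(csv.reader(handle))


def make_sample_csv(n_rows=1000):
    """Write a small throwaway CSV file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["id", "value"])
        writer.writerows((i, i * 2) for i in range(n_rows))
    return path


if __name__ == "__main__":
    sample = make_sample_csv()
    elapsed, rows = time_call(read_rows, sample)
    os.remove(sample)
    print(f"Read {len(rows)} rows, best of 3 runs: {elapsed:.6f} s")
```

With both libraries installed, the same helper could presumably be called as `time_call(pd.read_csv, file)` and `time_call(dt.fread, file)`, so that both readers are timed in exactly the same way.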