Can we replace pandas library with datatable?

October 21, 2022

Pandas:

Pandas is an open-source Python library for data analysis. It provides data structures, most notably the DataFrame, and operations for manipulating numerical tables and time series.

1. Time taken to read a CSV file using pandas and datatable:

import pandas as pd
import time

file = 'annual-enterprise-survey.csv'

start_time = time.perf_counter()
df = pd.read_csv(file)
end_time = time.perf_counter()

execution_time = end_time - start_time  # elapsed time in seconds
print(f"Execution time for pandas: {execution_time:.3f} s")

Output:

Execution time for pandas: 0.077 s

import datatable as dt
import time

file = 'annual-enterprise-survey.csv'

start_time1 = time.perf_counter()
dt_df = dt.fread(file)
dt_df = dt_df.to_pandas()  # convert so both runs end with a pandas DataFrame
end_time1 = time.perf_counter()

execution_time1 = end_time1 - start_time1  # elapsed time in seconds
print(f"Execution time for datatable: {execution_time1:.3f} s")

Output:

Execution time for datatable: 0.053 s
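Each figure above comes from a single run, and single timings are noisy: for example, the second read of the same file may be served from the operating system's file cache. A minimal sketch of a more robust measurement, assuming you only need wall-clock time, repeats a call several times with `time.perf_counter()` and reports the best and median run. The helper name `time_call` and its `repeat` parameter are hypothetical, not part of pandas or datatable.

```python
import statistics
import time


def time_call(func, repeat=5):
    """Run func() `repeat` times; return (best, median) wall-clock seconds.

    Hypothetical helper for illustration only.
    """
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()  # high-resolution clock meant for timing
        func()
        timings.append(time.perf_counter() - start)
    return min(timings), statistics.median(timings)


# Usage with a stand-in workload; swap in e.g. lambda: pd.read_csv(file)
best, median = time_call(lambda: sum(range(100_000)))
print(f"best: {best:.4f} s, median: {median:.4f} s")
```

Reporting the minimum approximates the cost without background interference, while the median shows what a typical run looks like.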

2. Time taken to read and save a CSV file using pandas and datatable:

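import pandas as pd
import time

file = 'annual-enterprise-survey.csv'

start_time = time.perf_counter()
df = pd.read_csv(file)
# Write to a separate (hypothetical) output file so the input is not overwritten;
# index=False stops pandas from writing its row index as an extra column.
df.to_csv('output_pandas.csv', index=False)
end_time = time.perf_counter()

execution_time = end_time - start_time  # read + write, in seconds
print(f"Execution time for pandas: {execution_time:.3f} s")

Output:

Execution time for pandas: 0.324 s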

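import datatable as dt
import time

file = 'annual-enterprise-survey.csv'

start_time1 = time.perf_counter()
dt_df = dt.fread(file)
# Write to a separate (hypothetical) output file so the input is not overwritten
dt_df.to_csv('output_datatable.csv')
end_time1 = time.perf_counter()

execution_time1 = end_time1 - start_time1  # read + write, in seconds
print(f"Execution time for datatable: {execution_time1:.3f} s")

Output:

Execution time for datatable: 0.151 s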

Datatable:

To install datatable:

pip install datatable

Load data: The fundamental unit of analysis in datatable is the Frame. It is the same notion as a pandas DataFrame.

import datatable as dt
import math

# Use a name other than dt so the datatable module is not shadowed
DT = dt.Frame(A=range(5), B=[1.7, 3.4, 0, None, -math.inf],
              stypes={"A": dt.int64})

print(DT)

fread(): You can also load data from a CSV, text, or Excel file with fread(). It can automatically detect parsing parameters for most text files, load data from .zip archives or URLs, read Excel files, and much more.

import datatable as dt

file = 'annual-enterprise-survey.csv'

dt_df = dt.fread(file)  # separator, header and column types are detected automatically

Create a Frame from a NumPy array or a pandas DataFrame:

import datatable as dt
import numpy as np
import pandas as pd

np.random.seed(1)
NP = np.random.randn(100)    # 1-D array -> Frame with a single column
print(dt.Frame(NP))

PD = pd.DataFrame({"A": range(1000)})
print(dt.Frame(PD))          # pandas DataFrame -> Frame

Basic Frame Properties:

import datatable as dt

file = 'annual-enterprise-survey.csv'
dt_df = dt.fread(file)

print(dt_df.shape)   # (number of rows, number of columns)
print(dt_df.names)   # column names
print(dt_df.types)   # column types

Frame Statistics:

Compute per-column summary statistics using the following Frame methods:

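print(dt_df.sum())
print(dt_df.max())
print(dt_df.min())
print(dt_df.mean())
print(dt_df.sd())       # standard deviation
print(dt_df.mode())     # most frequent value
print(dt_df.nmodal())   # number of times the most frequent value occurs
print(dt_df.nunique())  # number of distinct values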

Delete rows and columns:

del dt_df[:, "Units"]  # delete the "Units" column

For more information about datatable, see the official datatable documentation.


If you are interested in Python programming, please visit the following blog:

Pythoholic

Let me know if you have any queries or questions.

pratikshagarkar871999@gmail.com
