Can we replace the pandas library with datatable?

Pandas: 

Pandas is an open-source library. It provides various data structures and operations for manipulating data and time series. 

1. Time taken to read a CSV using pandas and datatable:

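import pandas as pd
import time

file = 'annual-enterprise-survey.csv'
start_time = time.time()
df = pd.read_csv(file)
end_time = time.time()  # stop the clock once the file is loaded
execution_time = end_time - start_time  # elapsed wall-clock time in seconds
# Note: the elapsed seconds are multiplied by 60 here, so the printed figure is not in seconds.
print("Execution time for pandas", execution_time * 60)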

Output:
Execution time for pandas 4.621939659118652

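import datatable as dt
import time

file = 'annual-enterprise-survey.csv'
start_time1 = time.time()
dt_df = dt.fread(file)
dt_df = dt_df.to_pandas()  # the conversion to a pandas DataFrame is included in the timing
end_time1 = time.time()  # stop the clock once the DataFrame is ready
execution_time1 = end_time1 - start_time1  # elapsed wall-clock time in seconds
# Note: the elapsed seconds are multiplied by 60 here, so the printed figure is not in seconds.
print("Execution time for datatable", execution_time1 * 60)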

Output:
Execution time for datatable 3.1781816482543945

2. Time taken to save a CSV using pandas and datatable:

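import pandas as pd
import time

file = 'annual-enterprise-survey.csv'
start_time = time.time()
df = pd.read_csv(file)  # the read is included in the timing
df.to_csv(file)  # caution: overwrites the input file and adds an index column
end_time = time.time()  # stop the clock once the file is written
execution_time = end_time - start_time  # elapsed wall-clock time in seconds
# Note: the elapsed seconds are multiplied by 60 here, so the printed figure is not in seconds.
print("Execution time for pandas", execution_time * 60)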

Output:
Execution time for pandas 19.438705444335938

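import datatable as dt
import time

file = 'annual-enterprise-survey.csv'
start_time1 = time.time()
dt_df = dt.fread(file)  # the read is included in the timing
dt_df.to_csv(file)  # caution: overwrites the input file
end_time1 = time.time()  # stop the clock once the file is written
execution_time1 = end_time1 - start_time1  # elapsed wall-clock time in seconds
# Note: the elapsed seconds are multiplied by 60 here, so the printed figure is not in seconds.
print("Execution time for datatable", execution_time1 * 60)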

Output:
Execution time for datatable 9.057526588439941
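A single run timed with time.time() can be noisy, so the comparisons above may vary from run to run. Below is a minimal sketch of a reusable timing helper, assuming time.perf_counter() and the best of a few repeats. The name time_call and its repeats parameter are hypothetical and not part of pandas or datatable, and a stand-in workload is used so that the sketch runs without the CSV file.

```python
import time


def time_call(func, *args, repeats=3, **kwargs):
    """Call func(*args, **kwargs) `repeats` times.

    Returns (best_seconds, result_of_last_call).
    """
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start  # elapsed time in seconds
        best = min(best, elapsed)
    return best, result


# Stand-in workload so the example runs without any data file.
seconds, total = time_call(sum, range(1_000_000))
print(f"Best of 3 runs: {seconds:.6f} s")
```

With the libraries installed, the same helper could be used as time_call(pd.read_csv, file) or time_call(dt.fread, file) to compare the two readers under the same conditions.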

Datatable:

To install datatable:

pip install datatable  

Load Data: The fundamental unit of analysis in datatable is a Frame. It is the same notion as a pandas DataFrame.

import datatable as dt
import math

# Use a separate name for the Frame so the datatable module (dt) is not overwritten
frame = dt.Frame(A=range(5), B=[1.7, 3.4, 0, None, -math.inf],
                 stypes={"A": dt.int64})
print(frame)

fread(): You can also load a CSV, text, or Excel file with fread(). It automatically detects the parse parameters for the majority of text files, loads data from .zip archives or URLs, reads Excel files, and much more.

import datatable as dt

file = 'annual-enterprise-survey.csv'
dt_df = dt.fread(file)  # returns a datatable Frame

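Create a Frame: A Frame can also be built from a NumPy array or a pandas DataFrame.

import datatable as dt
import numpy as np
import pandas as pd

np.random.seed(1)
NP = np.random.randn(100)  # 100 random floats
dt_from_np = dt.Frame(NP)  # Frame built from a NumPy array
PD = pd.DataFrame({"A": range(1000)})
dt_from_pd = dt.Frame(PD)  # Frame built from a pandas DataFrame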

Basic Frame Properties:

file = 'annual-enterprise-survey.csv'
dt_df = dt.fread(file)
print(dt_df.shape)
print(dt_df.names)
print(dt_df.types)
 
Frame Statistics:
Compute per-column summary statistics using the following Frame methods:
 
print(dt_df.sum())
print(dt_df.max())
print(dt_df.min())
print(dt_df.mean())
print(dt_df.sd())
print(dt_df.mode())
print(dt_df.nmodal())
print(dt_df.nunique())

Delete Rows and Columns:

del dt_df[:, "Units"]  


For more information about datatable, see the official datatable documentation.




Let me know if you have any queries or questions.
pratikshagarkar871999@gmail.com