Can we replace the pandas library with datatable?
Pandas:
pandas is an open-source Python library that provides data structures and operations for manipulating tabular data and time series.
1. Time taken to read a CSV using pandas and datatable:
import time
import pandas as pd

file = 'annual-enterprise-survey.csv'

start_time = time.perf_counter()
df = pd.read_csv(file)
end_time = time.perf_counter()

# The difference between two perf_counter() readings is already in seconds
execution_time = end_time - start_time
print(f"Execution time for pandas: {execution_time:.4f} seconds")
Output:
Execution time for pandas: 0.0770 seconds
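import time
import datatable as dt

file = 'annual-enterprise-survey.csv'

start_time = time.perf_counter()
dt_df = dt.fread(file)
# Convert to a pandas DataFrame so both runs end with the same object type
dt_df = dt_df.to_pandas()
end_time = time.perf_counter()

# The difference between two perf_counter() readings is already in seconds
execution_time = end_time - start_time
print(f"Execution time for datatable: {execution_time:.4f} seconds")
Output:
Execution time for datatable: 0.0530 seconds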
2. Time taken to save a CSV using pandas and datatable:
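import time
import pandas as pd

file = 'annual-enterprise-survey.csv'
# Write to a separate file (name chosen here for illustration) so the input is not overwritten
out_file = 'annual-enterprise-survey-pandas.csv'

# Note: the timed region covers both reading and writing the file
start_time = time.perf_counter()
df = pd.read_csv(file)
df.to_csv(out_file)
end_time = time.perf_counter()

# The difference between two perf_counter() readings is already in seconds
execution_time = end_time - start_time
print(f"Execution time for pandas: {execution_time:.4f} seconds")
Output:
Execution time for pandas: 0.3240 seconds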
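import time
import datatable as dt

file = 'annual-enterprise-survey.csv'
# Write to a separate file (name chosen here for illustration) so the input is not overwritten
out_file = 'annual-enterprise-survey-datatable.csv'

# Note: the timed region covers both reading and writing the file
start_time = time.perf_counter()
dt_df = dt.fread(file)
dt_df.to_csv(out_file)
end_time = time.perf_counter()

# The difference between two perf_counter() readings is already in seconds
execution_time = end_time - start_time
print(f"Execution time for datatable: {execution_time:.4f} seconds")
Output:
Execution time for datatable: 0.1510 seconds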
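A single timed run can be noisy, so it may help to repeat each measurement and keep the fastest run. The sketch below is only an illustration: the helper name best_time, the throwaway sample file, and the standard-library reader are assumptions made so the example runs anywhere, and they are not part of the benchmark above. In practice, read_sample would call pd.read_csv or dt.fread on the real file.

```python
import csv
import os
import tempfile
import time


def best_time(func, repeats=5):
    """Return the fastest wall-clock time, in seconds, over `repeats` calls."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


# Build a small throwaway CSV so the sketch runs without the survey file.
with tempfile.NamedTemporaryFile("w", suffix=".csv", newline="", delete=False) as tmp:
    writer = csv.writer(tmp)
    writer.writerow(["Year", "Value"])
    writer.writerows([2000 + i % 20, i] for i in range(10_000))
    sample_path = tmp.name


def read_sample():
    # Stand-in reader (hypothetical); replace the body with
    # pd.read_csv(sample_path) or dt.fread(sample_path) to time those libraries.
    with open(sample_path, newline="") as handle:
        return list(csv.reader(handle))


elapsed = best_time(read_sample)
row_count = len(read_sample())  # header row plus 10,000 data rows
os.remove(sample_path)          # clean up the temporary file

print(f"Best of 5 runs: {elapsed:.4f} seconds for {row_count} rows")
```

Reporting the minimum rather than the average reduces the influence of one-off delays such as disk caching or other processes running at the same time.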
Datatable:
To install datatable:
pip install datatable
Load Data: The fundamental unit of analysis in datatable is the Frame. It is the same notion as a pandas DataFrame: a two-dimensional table of rows and columns.
import math
import datatable as dt

# Use a separate variable name so the dt module is not overwritten
frame = dt.Frame(A=range(5), B=[1.7, 3.4, 0, None, -math.inf],
                 stypes={"A": dt.int64})
print(frame)
fread(): You can also load a CSV, text, or Excel file with fread(). It can automatically detect the parse parameters for the majority of text files, load data from .zip archives or URLs, read Excel files, and much more.
import datatable as dt

file = 'annual-enterprise-survey.csv'
dt_df = dt.fread(file)
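Create a Frame from a NumPy array or a pandas DataFrame:
import datatable as dt
import numpy as np
import pandas as pd

# From a NumPy array of 100 random values
np.random.seed(1)
NP = np.random.randn(100)
frame_from_numpy = dt.Frame(NP)

# From a pandas DataFrame with a single column "A"
PD = pd.DataFrame({"A": range(1000)})
frame_from_pandas = dt.Frame(PD)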
Basic Frame Properties:
import datatable as dt

file = 'annual-enterprise-survey.csv'
dt_df = dt.fread(file)
print(dt_df.shape)  # (number of rows, number of columns)
print(dt_df.names)  # column names
print(dt_df.types)  # column types
Frame Statistics:
Compute per-column summary statistics using the following Frame methods:
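print(dt_df.sum())      # sum of each column
print(dt_df.max())      # largest value in each column
print(dt_df.min())      # smallest value in each column
print(dt_df.mean())     # mean of each column
print(dt_df.sd())       # standard deviation of each column
print(dt_df.mode())     # most frequent value in each column
print(dt_df.nmodal())   # number of times the mode occurs
print(dt_df.nunique())  # number of distinct values in each column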
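For readers coming from pandas, the rough counterparts of these calls can be sketched on a small made-up DataFrame. The data below is invented for illustration and is not the survey file, and the mapping is only approximate: pandas names the standard deviation std() rather than sd(), and mode() returns every tied value, so the first row is taken here.

```python
import pandas as pd

# Small made-up frame for illustration; not the survey data.
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [2.0, 2.0, 4.0, 8.0]})

print(df.sum())           # per-column sums
print(df.max())           # per-column maxima
print(df.min())           # per-column minima
print(df.mean())          # per-column means
print(df.std())           # per-column standard deviation (sd() in datatable)
print(df.mode().iloc[0])  # first modal value of each column
print(df.nunique())       # number of distinct values per column
```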
Delete Rows and Columns:
# Delete the "Units" column in place
del dt_df[:, "Units"]
For more information about datatable, see its official documentation.