Posts

Some interesting libraries or modules…

Python: Python is a popular programming language. It is a dynamically typed, garbage-collected language created by Guido van Rossum, who began work on it in the late 1980s and first released it in 1991. It is one of the world’s most popular, in-demand programming languages. Reasons: It’s easy to learn. It’s versatile. It has a huge range of modules and libraries.
Modules: A module is a file of Python code (functions, classes, variables) that uses the .py extension.
Libraries: A Python library is a set of related modules or packages bundled together. Python provides some interesting modules and libraries.
Pillow: It is a lightweight image processing library that aids in editing, creating, and saving images. Pillow supports many image file formats including BMP, PNG, JPEG, and TIFF. It is a fork of the Python Imaging Library (PIL), a free and open-source additional library for the Python programming language that adds support for opening, manipulating, and saving many different image file formats.
To install Pillow:
    pip install Pillow
    from PIL imp
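
As a rough illustration of the creating, saving, and opening steps that Pillow is described as supporting, here is a minimal sketch. It assumes Pillow is installed; the image size, colour, and variable names are made up for the example, and the image is kept in memory rather than written to disk.

```python
import io

from PIL import Image

# Create a 64x48 solid red RGB image entirely in memory.
original = Image.new("RGB", (64, 48), color=(255, 0, 0))

# Save it as PNG into a bytes buffer instead of a file on disk.
buffer = io.BytesIO()
original.save(buffer, format="PNG")

# Reopen the PNG bytes and produce a half-size copy.
buffer.seek(0)
reopened = Image.open(buffer)
thumbnail = reopened.resize((32, 24))

print(reopened.format, reopened.size, thumbnail.size)
```

Passing a file path to save() and Image.open() instead of the buffer would work the same way for files on disk.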

Can we replace pandas library with datatable?

Pandas: Pandas is an open-source library. It provides various data structures and operations for manipulating data and time series.

1. Time taken to read a CSV using Pandas and Datatable:

    import time
    import pandas as pd

    file = 'annual-enterprise-survey.csv'
    start_Time = time.time()
    df = pd.read_csv(file)
    end_Time = time.time()
    # time.time() differences are in seconds; the value printed below is that difference multiplied by 60
    execution_time = end_Time - start_Time
    print("Execution time for pandas", execution_time*60)

Output: Execution time for pandas 4.621939659118652

    import time
    import datatable as dt

    start_Time1 = time.time()
    dt_df = dt.fread(file)
    dt_df = dt_df.to_pandas()  # the conversion to a pandas DataFrame is included in the timing
    end_Time1 = time.time()
    execution_time1 = end_Time1 - start_Time1
    print("Execution time for datatable", execution_time1*60)

Output: Execution time for datatable 3.1781816482543945

2. Time taken to save a CSV using Pandas and Datatable:

    import time
    import pandas as pd

    file = 'annual-enterprise-survey.csv'
    start_Time = time.time()
    df = pd.read_csv(fi
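
The timing pattern used in this comparison (record a start time, run one call, subtract) can be sketched on its own. The version below is only an illustration: `time_call` and `read_rows` are hypothetical helpers, the CSV is generated in memory so the sketch runs without the survey file, and the standard-library `csv` module stands in for the pandas and datatable readers. It uses `time.perf_counter()`, which returns seconds and is intended for measuring short durations.

```python
import csv
import io
import time


def time_call(func, *args, **kwargs):
    """Return (result, elapsed_seconds) for a single call to func.

    time_call is a hypothetical helper, not part of pandas or datatable.
    """
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start  # already in seconds
    return result, elapsed


def read_rows(text):
    """Parse CSV text into a list of rows (stand-in for a real reader)."""
    return list(csv.reader(io.StringIO(text)))


# Build a small illustrative CSV in memory: a header plus 1,000 rows.
sample = "year,value\n" + "".join(f"{2000 + i % 20},{i}\n" for i in range(1000))

rows, seconds = time_call(read_rows, sample)
print(f"Parsed {len(rows)} rows in {seconds:.6f} s")
```

A single run is noisy, so repeating each measurement several times and comparing the best or median time usually gives a fairer comparison between two readers.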

Spelling correction using TextBlob in python.

What is TextBlob? TextBlob is a Python library for processing textual data. It is built on top of NLTK.
How to install TextBlob?
    1. Using pip:
        pip install textblob
    2. Using conda:
        conda install -c conda-forge textblob
Some terms that will be frequently used are:
· Corpus – Body of text, singular.
· Lexicon – Words and their meanings.
· Token – Each “entity” that is a part of whatever was split up based on rules. For example, each word is a token when a sentence is “tokenized” into words. Each sentence can also be a token, if you tokenize the sentences out of a paragraph.
TextBlob(text, tokenizer=None, np_extractor=None, pos_tagger=None, analyzer=None, classifier=None): A general text block, meant for larger bodies of text.
    Parameters:
    text: string
    tokenizer: (optional) A tokenizer instance. If None, defaults to WordTokenizer().
    np_extractor: (optional) An NPExtractor instance. If None, defaults to FastNPExtractor().
    pos_tagger: 
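
To make the terms corpus and token concrete, here is a minimal sketch that splits a small corpus into sentence tokens and then word tokens. It uses plain regular expressions from the standard library as a simplified stand-in, not TextBlob's own tokenizers, and the function names are hypothetical.

```python
import re


def sentence_tokens(corpus):
    """Split a corpus into sentences at whitespace that follows ., ! or ?"""
    return [s for s in re.split(r"(?<=[.!?])\s+", corpus.strip()) if s]


def word_tokens(sentence):
    """Return runs of letters, digits, and apostrophes as word tokens."""
    return re.findall(r"[A-Za-z0-9']+", sentence)


corpus = "TextBlob processes text. Each word is a token!"

sentences = sentence_tokens(corpus)
words = [word_tokens(s) for s in sentences]

print(sentences)  # ['TextBlob processes text.', 'Each word is a token!']
print(words)      # [['TextBlob', 'processes', 'text'], ['Each', 'word', 'is', 'a', 'token']]
```

Real tokenizers handle abbreviations, quotes, and punctuation far more carefully than these two patterns do.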

Second Library: Matplotlib

Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.
To install Matplotlib:
    Install using pip:      pip install matplotlib
    Install using conda:    conda install matplotlib
Matplotlib includes the following submodules, among many others:
    matplotlib.pyplot
    matplotlib.animation
    matplotlib.artist
    matplotlib.image
    matplotlib.text
    matplotlib.textpath
Most of the Matplotlib utilities lie under the pyplot submodule. Each pyplot function makes some change to a figure: e.g., creates a figure, creates a plotting area in a figure, plots some lines in a plotting area, decorates the plot with labels, etc.
Introductory: The basics of creating visualizations with Matplotlib. Matplotlib graphs your data on Figures, each of which can contain one or more Axes, an area where points can be specified in terms of x-y coordinates. The simplest way of creating a Figure with an Axes is using pyplot.subplots. We can then use plot to draw some data on the Axes:
Parts of figur
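
A minimal sketch of the pyplot.subplots and plot steps described in this excerpt, assuming Matplotlib is installed. The data points, labels, and the choice of the non-interactive Agg backend are illustrative assumptions so that the example runs without a display; the figure is rendered into memory rather than shown in a window.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Create a Figure containing a single Axes.
fig, ax = plt.subplots()

# Draw one line on the Axes from x-y coordinates.
(line,) = ax.plot([1, 2, 3, 4], [1, 4, 2, 3])

# Decorate the plot with labels and a title.
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Simple plot")

# Render the figure as PNG bytes in memory.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```

In an interactive session, calling plt.show() instead of saving would open the figure in a window.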

First Library: Pandas in Python

Pandas is an open-source library. It provides various data structures and operations for manipulating numerical data and time series. This library is built on top of the NumPy library. Pandas is fast and offers high performance and productivity for users.
Pandas data table representation:
How to install Pandas in Python? Install pandas via pip: pip install pandas
How to import Pandas?
    import pandas
How to create a data frame using Pandas?
    import pandas as pan
    df = pan.DataFrame(
        {
            "Name": [
                "Braund, Mr. Owen Harris",
                "Allen, Mr. William Henry",
                "Bonnell, Miss. Elizabeth",
            ],
            "Age": [22, 35, 58],
            "Sex": ["male", "male", "female"],
        }
    )
    print(df)
When using a Python dictionary of lists, the dictionary keys will be used as column headers and the values in each list as columns of the data frame. Each colu
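
To check how a dictionary of lists maps onto a data frame, here is a short sketch that assumes pandas is installed. It rebuilds a two-column version of the frame and inspects its column headers, shape, and one column; the variable names are illustrative.

```python
import pandas as pd

# Dictionary keys become column headers; each list becomes a column.
df = pd.DataFrame(
    {
        "Name": [
            "Braund, Mr. Owen Harris",
            "Allen, Mr. William Henry",
            "Bonnell, Miss. Elizabeth",
        ],
        "Age": [22, 35, 58],
    }
)

columns = list(df.columns)  # column headers, in insertion order
shape = df.shape            # (number of rows, number of columns)
ages = df["Age"]            # selecting a single column by its header
oldest = int(ages.max())    # largest value in the Age column

print(columns, shape, oldest)  # ['Name', 'Age'] (3, 2) 58
```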