Spelling correction using TextBlob in Python.

September 12, 2022

What is TextBlob?

TextBlob is a Python library for processing textual data. It is built on top of the NLTK module.

How to install TextBlob?

1. Using pip:

pip install textblob

2. Using conda:

conda install -c conda-forge textblob

Some terms that will be used frequently are:

· Corpus – A body of text (singular; the plural is corpora).

· Lexicon – Words and their meanings.

· Token – Each “entity” that results from splitting text according to some rules. For example, each word is a token when a sentence is “tokenized” into words. Each sentence can also be a token, if you tokenize the sentences out of a paragraph.
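To make the idea of a token concrete, here is a minimal sketch that splits a paragraph into sentence tokens and a sentence into word tokens using only the standard library. The helper names split_sentences and split_words are hypothetical, and the regular expressions are a deliberate simplification; this is not how TextBlob or NLTK tokenize internally.

```python
import re


def split_sentences(text):
    """Split text after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def split_words(text):
    """Return word tokens, keeping inner hyphens and apostrophes."""
    return re.findall(r"[A-Za-z0-9]+(?:[-'][A-Za-z0-9]+)*", text)


paragraph = "Python is a high-level language. Explicit is better than implicit."

sentence_tokens = split_sentences(paragraph)   # each sentence is a token
word_tokens = split_words(sentence_tokens[0])  # each word is a token

print(sentence_tokens)
print(word_tokens)
```

The same text yields different tokens depending on the rule used to split it, which is why the definition above depends on "rules".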

TextBlob(text, tokenizer=None, np_extractor=None, pos_tagger=None, analyzer=None, classifier=None): A general text block, meant for larger bodies of text.

Parameters:

text: string

tokenizer: (optional) A tokenizer instance. If None, defaults to WordTokenizer().

np_extractor: (optional) An NPExtractor instance. If None, defaults to FastNPExtractor().

pos_tagger: (optional) A Tagger instance. If None, defaults to NLTKTagger.

analyzer: (optional) A sentiment analyzer. If None, defaults to PatternAnalyzer.

classifier: (optional) A classifier.

classify(): Classify the blob using the blob’s classifier.

correct(): Attempt to correct the spelling of a blob.

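detect_language(): Detect the blob’s language using the Google Translate API. This method was deprecated in TextBlob 0.16.0 and has been removed from later releases, so it may not be available in your installed version.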

find(): Behaves like the built-in str.find() method. Returns an integer, the index of the first occurrence of the substring argument sub within the slice [start:end], or -1 if sub is not found.

join(): Behaves like the built-in str.join(iterable) method, except that it returns a blob object.

json: The json representation of this blob.

lower(): Like str.lower(), returns a new object with all characters lower-cased.

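translate(from_lang=u'auto', to=u'en'): Translate the blob to another language. Uses the Google Translate API. Returns a new TextBlob. Like detect_language(), this method was deprecated in TextBlob 0.16.0 and has been removed from later releases.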

upper(): Like str.upper(), returns a new object with all characters upper-cased.
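Because find(), join(), lower() and upper() are described above as mirroring the built-in str methods, plain Python strings are enough to show the semantics. The sketch below uses only built-in strings; the assumption is that the blob versions behave the same way but return blob objects where a new string would be returned.

```python
text = "Python is fun. Python is readable."

# find() returns the index of the first match, or -1 when there is none.
first = text.find("Python")         # search the whole string
second = text.find("Python", 1)     # start searching at index 1
windowed = text.find("is", 10, 30)  # search only within text[10:30]
missing = text.find("Java")         # substring not present

print(first, second, windowed, missing)  # 0 15 22 -1

# join() uses the string as a separator between the items of an iterable.
joined = ", ".join(["corpus", "lexicon", "token"])
print(joined)  # corpus, lexicon, token

# lower() and upper() return new strings; the original is left unchanged.
print(text.lower())
print(text.upper())
```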

Tagging: Part-of-speech tags can be accessed through the tags property.

from textblob import TextBlob

wiki = TextBlob("Python is a high-level, general-purpose programming languages.")

print(wiki.tags)

Output: [('Python', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('high-level', 'JJ'), ('general-purpose', 'JJ'), ('programming', 'NN'), ('languages', 'NNS')]

Noun Phrase Extraction: Noun phrases are accessed through the noun_phrases property.

print(wiki.noun_phrases)

Tokenization: You can break TextBlobs into words or sentences.

wiki = TextBlob("Python is a high-level, general-purpose programming languages. "

"Explicit is better than implicit.")

print(wiki.words)

print(wiki.sentences)

How to correct the spelling mistakes in a sentence in Python?

from textblob import TextBlob

gfg = TextBlob("every pian gives a lessson and every lessson chages a pirson.")

# using TextBlob.correct() method

gfg = gfg.correct()

print(gfg)
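According to the TextBlob documentation, correct() is based on Peter Norvig's "How to Write a Spelling Corrector". The sketch below illustrates that general idea with the standard library only: generate every string one edit away from a word, keep the ones found in a vocabulary, and pick the most frequent. The WORD_COUNTS dictionary is a hypothetical toy vocabulary with made-up counts, and the helpers edits1, known and correct_word are illustrative names, not TextBlob's API. Norvig's original also considers words two edits away, which is omitted here for brevity.

```python
import string

# Hypothetical toy vocabulary: word -> made-up frequency count.
WORD_COUNTS = {
    "every": 60, "pain": 40, "pin": 25, "gives": 35, "a": 500,
    "lesson": 30, "and": 400, "changes": 20, "person": 50,
}


def edits1(word):
    """Return all strings that are one edit away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in letters]
    inserts = [l + c + r for l, r in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def known(words):
    """Keep only the words that appear in the vocabulary."""
    return {w for w in words if w in WORD_COUNTS}


def correct_word(word):
    """Prefer the word itself, then known one-edit neighbours, else give up."""
    candidates = known([word]) or known(edits1(word)) or [word]
    return max(candidates, key=lambda w: WORD_COUNTS.get(w, 0))


sentence = "every pian gives a lessson and every lessson chages a pirson"
corrected = " ".join(correct_word(w) for w in sentence.split())
print(corrected)  # every pain gives a lesson and every lesson changes a person
```

Note that "pian" is one edit away from both "pain" and "pin"; the sketch picks "pain" only because the toy counts rank it higher. With a real word-frequency list the choice can differ, so the output of correct() on the same sentence may not match this one.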

How to convert each word to lower case?

gfg = TextBlob("Every Pian Gives a lessson and every Lessson chages a pirson.")

print(gfg.lower())
