Spelling correction using TextBlob in Python

What is TextBlob?
TextBlob is a Python library for processing textual data. It is built on top of the NLTK module.

How to install TextBlob?
    1. Using pip:
        pip install textblob
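    2. Using conda:
        conda install -c conda-forge textblob
    After installing, download the NLTK corpora that features such as tagging and noun phrase extraction rely on:
        python -m textblob.download_corpora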
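
To confirm that the installation worked, a quick check from Python can help. The sketch below uses only the standard library; the helper name is_installed is an assumption made for illustration and is not part of TextBlob.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found by the import system."""
    return importlib.util.find_spec(package_name) is not None


# Reports whether the textblob package is importable in this environment.
print("textblob installed:", is_installed("textblob"))
```

If this prints False, the package was probably installed into a different Python environment than the one running the script.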

Some terms that will be used frequently:
· Corpus – A body of text (plural: corpora).

· Lexicon – Words and their meanings.

· Token – Each “entity” produced when text is split up according to a set of rules. For example, each word is a token when a sentence is tokenized into words, and each sentence is a token when a paragraph is tokenized into sentences.
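
To make the idea of a token concrete, here is a minimal sketch that uses only Python's standard library. The regular expressions and the helper names split_sentences and split_words are assumptions made for illustration; TextBlob's own tokenizers apply more careful rules.

```python
import re


def split_sentences(text):
    """Split text into sentence tokens after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def split_words(text):
    """Split text into word tokens, keeping hyphenated words such as 'high-level' whole."""
    return re.findall(r"[A-Za-z0-9]+(?:[-'][A-Za-z0-9]+)*", text)


paragraph = "Explicit is better than implicit. Simple is better than complex."

# Each sentence is a token when the paragraph is tokenized into sentences.
print(split_sentences(paragraph))

# Each word is a token when a sentence is tokenized into words.
print(split_words("Explicit is better than implicit."))
```

The same paragraph yields two sentence tokens or ten word tokens, depending on which rule is applied.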

TextBlob(text, tokenizer=None, np_extractor=None, pos_tagger=None, analyzer=None, classifier=None): A general text block, meant for larger bodies of text. The optional arguments are normally passed by keyword.
    Parameters:
    text: string
    tokenizer: (optional) A tokenizer instance. If None, defaults to WordTokenizer().
    np_extractor: (optional) An NPExtractor instance. If None, defaults to FastNPExtractor().
    pos_tagger: (optional) A Tagger instance. If None, defaults to NLTKTagger.
    analyzer: (optional) A sentiment analyzer. If None, defaults to PatternAnalyzer.
    classifier: (optional) A classifier.

classify(): Classify the blob using the blob’s classifier.
correct(): Attempt to correct the spelling of a blob. Returns a new TextBlob.
detect_language(): Detect the blob’s language using the Google Translate API. Deprecated in TextBlob 0.16.0 and removed in 0.18.0.
find(): Behaves like the built-in str.find() method. Returns an integer, the index of the first occurrence of the substring argument sub within the slice [start:end], or -1 if sub is not found.
join(): Behaves like the built-in str.join(iterable) method, except that it returns a blob object.
json: The JSON representation of this blob.
lower(): Like str.lower(), returns a new object with all characters lower-cased.
translate(from_lang='auto', to='en'): Translate the blob to another language using the Google Translate API. Returns a new TextBlob. Deprecated in TextBlob 0.16.0 and removed in 0.18.0.
upper(): Like str.upper(), returns a new object with all characters upper-cased.
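
Since find(), join(), lower() and upper() are described as mirroring the built-in str methods, their behaviour can be previewed with a plain string. This is only a stand-in sketch: it uses str rather than TextBlob, and assumes the blob versions differ only in returning blob objects instead of strings.

```python
text = "every pain gives a lesson"

# find() returns the index of the first match, or -1 when there is none.
first_e = text.find("e")             # 0
next_e = text.find("e", 1)           # 2, the search starts at index 1
windowed = text.find("pain", 0, 5)   # -1, "pain" lies outside the slice [0:5]

# join() uses the string as a separator between the items of an iterable.
joined = " ".join(["every", "pain"])

# lower() and upper() return new objects and leave the original unchanged.
shouted = text.upper()
quiet = shouted.lower()

print(first_e, next_e, windowed)
print(joined, shouted, quiet, sep="\n")
```
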
Tagging: Part-of-speech tags can be accessed through the tags property.
    from textblob import TextBlob
    wiki = TextBlob("Python is a high-level, general-purpose programming language.")
    print(wiki.tags)

    Output: [('Python', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('high-level', 'JJ'), ('general-purpose', 'JJ'), ('programming', 'NN'), ('language', 'NN')]

Noun Phrase Extraction: Noun phrases are accessed through the noun_phrases property.
    print(wiki.noun_phrases)

Tokenization: You can break TextBlobs into words or sentences.
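    # Note the trailing space in the first literal: adjacent string literals
    # are concatenated, so without it the two sentences would run together.
    wiki = TextBlob("Python is a high-level, general-purpose programming language. "
            "Explicit is better than implicit.")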
    print(wiki.words)
    print(wiki.sentences)

How to correct the spelling mistakes in a sentence in Python?
    from textblob import TextBlob
    gfg = TextBlob("every pian gives a lessson and every lessson chages a pirson.")
    # using TextBlob.correct() method
    gfg = gfg.correct()
    print(gfg)
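
TextBlob's documentation describes correct() as based on Peter Norvig's "How to Write a Spelling Corrector". The sketch below illustrates that general idea with the standard library only; it is not TextBlob's implementation. It assumes a tiny hypothetical corpus, considers only candidates one edit away, and handles lowercase words only. The names CORPUS, edits1, correct_word and correct_sentence are made up for this example.

```python
import re
from collections import Counter

# Hypothetical miniature corpus; a real corrector learns from far more text.
CORPUS = "every pain gives a lesson and every lesson changes a person"
WORD_COUNTS = Counter(re.findall(r"[a-z]+", CORPUS))


def edits1(word):
    """Return every string that is one delete, swap, replace or insert away."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    swaps = [left + right[1] + right[0] + right[2:]
             for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + swaps + replaces + inserts)


def correct_word(word):
    """Return the most frequent known word within one edit, else the word itself."""
    if word in WORD_COUNTS:
        return word
    candidates = [w for w in edits1(word) if w in WORD_COUNTS]
    if not candidates:
        return word
    return max(candidates, key=WORD_COUNTS.get)


def correct_sentence(text):
    """Correct each lowercase word in text, leaving punctuation untouched."""
    return re.sub(r"[a-z]+", lambda match: correct_word(match.group()), text)


print(correct_sentence("every pian gives a lessson and every lessson chages a pirson."))
# every pain gives a lesson and every lesson changes a person.
```

Words that are not within one edit of anything in the corpus are returned unchanged, which is why the quality of such a corrector depends heavily on the text it learns from.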

How to convert the text to lower case?
    gfg = TextBlob("Every Pian Gives a lessson and every Lessson chages a pirson.")
    print(gfg.lower())




