How to convert a PDF file into an audio file?
To convert a PDF file into an audio file, we use the PyPDF2 and pyttsx3 libraries. First, we use PyPDF2 to read the text from the PDF, and then we use pyttsx3 to convert the extracted text into an audio file.
Let’s start with the PyPDF2 library.
PyPDF2: PyPDF2 is a pure-Python library, built as a PDF toolkit, that is used to read text from PDFs. It can also extract document information, split documents page by page, merge documents page by page, and more.
pyttsx3: pyttsx3 is a text-to-speech library. It provides functions that let the computer speak text aloud to us or save the speech to an audio file.
How to install the above libraries?
pip install PyPDF2
pip install pyttsx3
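After installing, it can help to check which versions pip picked, because several PyPDF2 names used in this post (PdfFileReader, numPages, getPage, extractText) no longer work in PyPDF2 3.x, which uses PdfReader, pages and extract_text() instead. A small sketch using the standard library's importlib.metadata (Python 3.8+); installed_version is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


for name in ('PyPDF2', 'pyttsx3'):
    print(name, installed_version(name) or 'not installed')
```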
Example:
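import PyPDF2
import pyttsx3

## Open the PDF in binary mode; "with" closes the file when the block ends
with open('sample.pdf', 'rb') as pdfObj:
    ## PdfReader is the PyPDF2 3.x name (PyPDF2 1.x used PdfFileReader)
    pdfreader = PyPDF2.PdfReader(pdfObj)
    all_text = []
    for page_num in range(len(pdfreader.pages)):
        text = pdfreader.pages[page_num].extract_text()  ## extracting text from the PDF
        cleaned_text = text.strip().replace('\n', ' ')  ## trims the ends and turns line breaks into spaces
        print(cleaned_text)
        all_text.append(cleaned_text)

speaker = pyttsx3.init()
# speaker.say(' '.join(all_text))  ## let the speaker speak the text
## Save once, after the loop: saving on every page would overwrite story.mp3 each time
speaker.save_to_file(' '.join(all_text), 'story.mp3')
speaker.runAndWait()  ## the queued commands (and the file write) run here
speaker.stop()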
In the above script, we used the following:
- open('sample.pdf', 'rb')
Opens sample.pdf in binary read mode ('rb') and stores the file object as pdfObj.
- pdfreader = PyPDF2.PdfReader(pdfObj)
Creates an object of the PdfReader class of the PyPDF2 module from the PDF file object; this reader object gives access to the pages. PyPDF2 1.x called this class PdfFileReader.
- speaker = pyttsx3.init()
Gets a reference to a text-to-speech engine instance that will use the given driver. If the requested driver is already in use by another engine instance, that engine is returned; otherwise, a new engine is created.
- for page_num in range(len(pdfreader.pages)):
Iterates over the PDF from the first page to the last. (PyPDF2 1.x used range(pdfreader.numPages).)
- text = pdfreader.pages[page_num].extract_text()
Extracts the text of the current page. (PyPDF2 1.x used pdfreader.getPage(page_num).extractText().)
- cleaned_text = text.strip().replace('\n',' ')
Trims leading and trailing whitespace and replaces line breaks with spaces.
- speaker.say()
Queues the text to be spoken aloud. In the script this line is commented out, so the text is only saved to the file.
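- speaker.save_to_file()
Queues the text to be saved in the audio file story.mp3. Calling it once per page with the same file name would overwrite the file each time, leaving only the last page, so collect the text of all pages and save it in one call. The format of the saved audio depends on the platform's speech driver, so the file may not actually be MP3-encoded despite its name.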
- speaker.runAndWait()
Processes all queued commands (say() and save_to_file()) and blocks until they finish. Without this call, nothing is spoken and the audio file is not written.
- speaker.stop()
Stops the current utterance and clears the command queue.
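The replace('\n', ' ') step turns each line break into a space but leaves doubled spaces and tabs in place. A minimal sketch of an alternative, assuming the page texts have already been extracted: the hypothetical helpers clean_text and combine_pages collapse every run of whitespace with str.split() and build one string that can be passed to a single save_to_file() call.

```python
def clean_text(text):
    """Collapse every run of whitespace (spaces, tabs, line breaks) into one space."""
    return ' '.join(text.split())


def combine_pages(page_texts):
    """Clean each page's text, skip empty pages, and join everything into one string."""
    cleaned = (clean_text(t or '') for t in page_texts)  # "or ''" guards against None
    return ' '.join(t for t in cleaned if t)


raw_pages = ['Once upon\na  time,\n', '', 'there was a\nPDF.']
print(combine_pages(raw_pages))  # Once upon a time, there was a PDF.
```

In the PDF script, page_texts would be the strings returned by extract_text() (extractText() in PyPDF2 1.x) for each page.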
PdfFileReader() (PdfReader() in PyPDF2 3.x): Reads the PDF. We only need to pass the path of the PDF, or a file object opened in binary mode, as the argument. The class provides many methods for interacting with the PDF, including:
getNumPages(): Calculates the number of pages in the PDF file (len(reader.pages) in PyPDF2 3.x).
decrypt(password): When using an encrypted/secured PDF file with the PDF Standard encryption handler, this function allows the file to be decrypted. It checks the given password against the document's user password and owner password, and then stores the resulting decryption key if either password is correct.
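getDocumentInfo(): Retrieves the PDF file's document information dictionary (title, author, and so on), if it exists. The read-only property documentInfo accesses the same data (reader.metadata in PyPDF2 3.x).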
getPageNumber(page): Retrieves the page number of a given PageObject (get_page_number() in PyPDF2 3.x).
extractText(): Extracts the text from a page (extract_text() in PyPDF2 3.x). This is a method of the page object, not of the reader.
By Adding a Little Web Scraping, the Same Script Can Read Text from Sites Like Wikipedia
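import requests
from bs4 import BeautifulSoup
import pyttsx3

url = 'https://en.wikipedia.org/wiki/Kabaddi'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                  'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'
}
r = requests.get(url, headers=headers)  ## send the User-Agent defined above
r.raise_for_status()  ## stop if the page could not be downloaded
soup = BeautifulSoup(r.content, 'html.parser')  ## built-in parser; 'lxml' also works if installed

speaker = pyttsx3.init()
speaker.setProperty('rate', 100)  ## speaking rate in words per minute (the default is 200)

paragraphs = soup.find_all('p')  ## the article text lives in <p> elements
all_text = []
for p in paragraphs:
    cleaned_text = p.text.strip().replace('\n', ' ')
    if cleaned_text:
        all_text.append(cleaned_text)

speaker.say(' '.join(all_text))  ## read the article aloud
## Save once, after the loop, so the file holds the whole article
speaker.save_to_file(' '.join(all_text), 'Kabaddi_Wiki.mp3')
speaker.runAndWait()
speaker.stop()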
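Wikipedia paragraphs usually contain bracketed citation markers such as [1], which the speech engine would read out. A minimal sketch, assuming the paragraph strings have already been taken from soup.find_all('p'): the hypothetical helper clean_paragraphs removes numeric markers with a regular expression, collapses whitespace, and drops empty paragraphs. Other markers, such as [citation needed], are not matched by this pattern.

```python
import re

# Matches numeric citation markers such as "[1]" or "[23]"
CITATION = re.compile(r'\[\d+\]')


def clean_paragraphs(paragraphs):
    """Strip citation markers and extra whitespace; drop empty paragraphs."""
    cleaned = []
    for text in paragraphs:
        text = CITATION.sub('', text)
        text = ' '.join(text.split())
        if text:
            cleaned.append(text)
    return cleaned


sample = ['This is a sample paragraph.[1]\n', '\n', 'Another one.[2][3]']
print(' '.join(clean_paragraphs(sample)))  # This is a sample paragraph. Another one.
```

In the script, it could be called as clean_paragraphs(p.text for p in soup.find_all('p')), with the joined result passed once to save_to_file().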
If you have any doubts, please let me know. To read my other blog posts, check the following link:
https://pythoholic.blogspot.com/
If you are interested in reading Marathi stories and other stuff, kindly check the following link.
https://pratilipi.page.link/q8dZ4ffZwKPHUx6R9
For exploring the world, please have a look and follow.
https://maps.app.goo.gl/jnKyzdDpKMFutUqR7