You are required to write a standalone Python program that translates misspelled/rearranged English words into their correct English spelling!
Requirements:
Your program is required to do the following:
1. Read data from a file into a Python list (including the use of exception handling).
2. Make use of a binary search when navigating through a list in alphabetic order, in order to make your program work more efficiently.
Specifications:
This assignment will focus on the use of loops and decision control, logical operations, functions (built-in and user-defined), strings and string functions, slices, lists and list functions, list comprehensions, dictionaries, file I/O, exception handling, as well as the efficient use of various data structures and algorithms within the Python programming language.
Read the following paragraph as quickly as you can and see if you encounter any difficulties:
Aoccdrnig to rscheearch at an Elingsh uinervtisy, it deosnt
mttaer in waht oredr the ltteers in a wrod are, the olny iprmoatnt
tihng is taht the frist and lsat ltteer is at the rghit pclae. The
rset can be a toatl mses and you can sitll raed it wouthit a
porbelm. Tihs is bcuseae hmuan biegns do not raed ervey lteter by itslef
but the wrod as a wlohe.
This was published as an example of a principle of human reading comprehension. If you keep the first letter and the last letter of a word in their correct positions, then scramble the letters in between, the word is still quite readable in the context of an accompanying paragraph.
Your program will be required to "translate" (i.e. correct) strings of text containing misspelled words into their correct forms.
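To make the principle concrete, here is a minimal sketch of the scrambling side only (the reverse of what the assignment asks for). The helper name `scramble_word` and the sample sentence are hypothetical, and the sketch uses the `random` module, which is not on the list of modules permitted for the assignment solution, so treat it purely as an illustration.

```python
import random

def scramble_word(word, rng):
    """Shuffle the interior letters of word, keeping the first and last in place."""
    if len(word) <= 3:
        # Too short to scramble: there is at most one interior letter.
        return word
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]

# A seeded generator keeps the demonstration repeatable.
rng = random.Random(550)
sentence = "the only important thing is the first and last letter"
print(" ".join(scramble_word(w, rng) for w in sentence.split()))
```

Whatever the shuffle produces, each scrambled word keeps its first letter, its last letter, and its full set of letters, which is exactly the property a corrector can rely on.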
Processing:
You MUST code the following Python functions as specified.
def load_dictionary(file_name) :
+ This function accepts the name of a file on disk that contains a list of 370,032 lowercase English dictionary words (one word per line), loads the words from the file into a list, and returns that list.
This function must make use of exception handling.
The dictionary file can be downloaded here: dictionary.txt
NOTE/WARNING: While care was taken to ensure that words of a derogatory/vulgar/slang nature have been removed from the dictionary file (linked above), it may still contain 1 or more English words that some individuals may take offense to. If such a word is encountered, please note that this was completely unintentional and you are encouraged to forward a request to have the word removed if so desired.
The original list was obtained from:
https://github.com/dwyl/english-words/tree/master
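One possible shape for this function is sketched below. Returning an empty list and printing a message when the file cannot be read is an assumption; the assignment only states that exception handling must be used, not what should happen on failure.

```python
def load_dictionary(file_name):
    """Return the words in file_name as a list (one word per line)."""
    words = []
    try:
        with open(file_name, "r", encoding="utf-8") as fh:
            # Strip the trailing newline from each line and skip blank lines.
            words = [line.strip() for line in fh if line.strip()]
    except OSError as err:
        # OSError covers FileNotFoundError, PermissionError and similar failures.
        # Assumption: report the problem and fall back to an empty list.
        print("could not read", file_name, "-", err)
    return words
```

Calling `load_dictionary("dictionary.txt")` on the supplied file should then give a list whose length can be compared against the 370,032 words mentioned above.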
def clean_up_words(word_list, text) :
+ This function accepts a list of English words (from the load_dictionary function) and a text string containing 1 or more incorrectly spelled words and corrects each word by searching for the correct version in the dictionary list before creating a new string containing the corrected text.
Rules for misspelled/scrambled words are as follows:
1. The first and last characters are always correct and will be in their correct position.
2. Words of 3 characters or less will always be correctly spelled.
3. Words will NOT contain any embedded punctuation characters (hyphens, apostrophes, etc), but a punctuation mark may terminate a word (such as "!"), in which case the terminating punctuation character is to be ignored when performing the comparison analysis, but is to be inserted into the corrected word once the analysis is complete.
For example a word of "Eerkua!" would be converted to "Eureka!".
There may only be at most 1 punctuation character terminating a word.
4. Words in the text string may be in either upper or lowercase.
5. The corrected word must contain every character in the scrambled word and case sensitivity of the scrambled word must be maintained in the corrected word.
6. If there are multiple possibilities for a correct word, then only the first word that matches the criteria above will be accepted.
7. Not every word in the text string may be incorrectly spelled/scrambled.
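The rules above can be sketched as three small helpers. The names `split_trailing_punctuation`, `is_match` and `restore_case` are hypothetical, and copying the upper/lowercase pattern position by position is only one reading of rule 5, labelled here as an assumption.

```python
import string

def split_trailing_punctuation(token):
    """Rule 3: split a token into (core, mark); mark is '' or one trailing punctuation character."""
    if token and token[-1] in string.punctuation:
        return token[:-1], token[-1]
    return token, ""

def is_match(scrambled, candidate):
    """Rules 1 and 5: same first letter, same last letter, same letters overall (case ignored)."""
    s, c = scrambled.lower(), candidate.lower()
    if not s or len(s) != len(c):
        return False
    return s[0] == c[0] and s[-1] == c[-1] and sorted(s) == sorted(c)

def restore_case(scrambled, candidate):
    """Assumption for rule 5: reuse the case of each position in the scrambled word."""
    return "".join(
        c.upper() if s.isupper() else c.lower()
        for s, c in zip(scrambled, candidate)
    )

# Walking through the example from rule 3:
core, mark = split_trailing_punctuation("Eerkua!")
if is_match(core, "eureka"):
    print(restore_case(core, "eureka") + mark)
```

Comparing sorted letters is a simple way to check that two words use exactly the same characters; a `collections.Counter` comparison would work equally well and `collections` is among the permitted modules.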
In order to make your solution to this function more efficient, this function must make use of a binary search algorithm when searching for words by the first character. A binary search will be demonstrated in class.
Once every word in the text string has been corrected, this function must return BOTH the original string (scrambled) AND the corrected string.
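A minimal sketch of searching by first character follows. It assumes the word list is sorted, lowercase and free of empty strings; the function names `first_index_for_letter` and `find_correction` are hypothetical and this is not necessarily the approach demonstrated in class. The binary search locates the first word beginning with a given letter, and a forward scan from there returns the first match, as rule 6 requires.

```python
def first_index_for_letter(word_list, letter):
    """Return the index of the first word starting with letter, or -1 if there is none."""
    low, high = 0, len(word_list) - 1
    result = -1
    while low <= high:
        mid = (low + high) // 2
        first = word_list[mid][0]
        if first < letter:
            low = mid + 1
        elif first > letter:
            high = mid - 1
        else:
            # Found a word with this letter; keep searching to the left
            # in case an earlier word starts with the same letter.
            result = mid
            high = mid - 1
    return result

def find_correction(word_list, scrambled):
    """Return the first dictionary word matching scrambled, or None if nothing matches."""
    s = scrambled.lower()
    if not s:
        return None
    pos = first_index_for_letter(word_list, s[0])
    if pos == -1:
        return None
    # Scan only the block of words that share the first letter.
    while pos < len(word_list) and word_list[pos][0] == s[0]:
        cand = word_list[pos]
        if len(cand) == len(s) and cand[-1] == s[-1] and sorted(cand) == sorted(s):
            return cand
        pos += 1
    return None

words = ["also", "apple", "begins", "beings", "human", "this", "word"]
print(find_correction(words, "biegns"))
```

With this small list, "biegns" resolves to "begins" rather than "beings" because "begins" comes first alphabetically, which is consistent with the expected output shown later in the assignment.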
MAIN PROGRAM:
# Your solution may ONLY use the python modules listed below
# program: almain.py
# author:  danny abesdris
# date:    february 5, 2024
# purpose: python main() program for PRG550 Winter 2024 Assignment #1
# version: 1.00
import math
import string
import collections
import re
import copy

# YOUR CODE BELOW...
def load_dictionary(file_name) :
    pass   # your code here
# end def

def clean_up_words(word_list, text) :
    pass   # your code here
# end def
def main():
    passage1 = '''\
Aoccdrnig to rsceareh at an Elingsh uinervtisy, it deosnt \
mttaer in waht oredr the ltteers in a wrod are, the olny iprmoatnt \
tihng is taht the frist and lsat ltteer is at the rghit pclae. The \
rset can be a toatl mses and you can sitll raed it wouthit a \
porbelm. Tihs is bcuseae hmuan biegns do not raed ervey lteter by itslef \
but the wrod as a wlohe! Nxet \
To! Etu? Brute? A! to! etu? brute? a! Aslo'''

    passage2 = """\
I cnat blveiee taht I can aulaclty uesdnatnrd tihs! The \
phaonmneel pweor of the hmuan mnid is qiute remarkable. I awlyas thought taht \
slpeling was ipmorantt too, but apparnelty tihs is not so. Hevower \
wihle it is rieealtvly esay to raed sroht wrods, it is not so esay wehn ridaneg \
legonr wdros. Aslo, msot wdros in Esinglh are sveen leertts lnog or leognr, and \
the mroe leretts terhe are in a wrod, the mroe dulcifift it bmeecos to cletrorcy \
infietdy them wehn the ltrtees are ragnearerd. Mroe cmoomn wrdos lkie blal \
and baer raimen mltsoy ungnchead and esay to rizocenge, whereas, lgneor and less \
cmoomn wdros, like pltuonuim and soulamitunes caghne saillbattunsy scuh taht \
rnciooitegn is srclceay pbsslioe. This atiibly smtes form a garet deal \
of enpicerexe rindaeg cretolcry slelepd wdros and only plopee who can \
adrealy raed pelictroinfy can do this tsak. Tihs tirck does not reeavl \
mcuh aoubt the pscroes of Innreaig to raed, it only ietaindcs that hhligy \
slielkd rrdeaes can omoercve moinr informieepcts wehn dnriiveg mnnaeig!"""

    book = load_dictionary("dictionary.txt")

    mixed, good = clean_up_words(book, passage1)
    print("\n=SCRAMBLED==========")
    print("original:")
    wc = 1
    for w in mixed.split(' ') :
        if wc % 9 == 0:
            print()
        print(w, end=" ")
        wc += 1
    print("\n=CLEANED==========")
    print("cleaned: ")
    wc = 1
    for w in good.split(' ') :
        if wc % 9 == 0:
            print()
        print(w, end=" ")
        wc += 1

    mixed, good = clean_up_words(book, passage2)
    print("\n=SCRAMBLED==========")
    print("original:")
    wc = 1
    for w in mixed.split(' ') :
        if wc % 9 == 0:
            print()
        print(w, end=" ")
        wc += 1
    print("\n=CLEANED==========")
    print("cleaned: ")
    wc = 1
    for w in good.split(' ') :
        if wc % 9 == 0:
            print()
        print(w, end=" ")
        wc += 1
    print()

if __name__ == "__main__":
    main()
# The expected output is listed below.
=SCRAMBLED==========
original:
Aoccdrnig to rsceareh at an Elingsh uinervtisy, it
deosnt mttaer in waht oredr the ltteers in a
wrod are, the olny iprmoatnt tihng is taht the
frist and lsat ltteer is at the rghit pclae.
The rset can be a toatl mses and you
can sitll raed it wouthit a porbelm. Tihs is
bcuseae hmuan biegns do not raed ervey lteter by
itslef but the wrod as a wlohe! Nxet To!
Etu? Brute? A! to! etu? brute? a! Aslo
=CLEANED==========
cleaned:
According to research at an English university, it
doesnt matter in what order the letters in a
word are, the only important thing is that the
first and last letter is at the right place.
The rest can be a total mess and you
can still read it without a problem. This is
because human begins do not read every letter by
itself but the word as a whole! Next To!
Etu? Brute? A! to! etu? brute? a! Also
=SCRAMBLED==========
original:
I cnat blveiee taht I can aulaclty uesdnatnrd
tihs! The phaonmneel pweor of the hmuan mnid is
qiute remarkable. I awlyas thought taht slpeling was ipmorantt
too, but apparnelty tihs is not so. Hevower wihle
it is rieealtvly esay to raed sroht wrods, it
is not so esay wehn ridaneg legonr wdros. Aslo,
msot wdros in Esinglh are sveen leertts lnog or
leognr, and the mroe leretts terhe are in a
wrod, the mroe dulcifift it bmeecos to cletrorcy infietdy
them wehn the ltrtees are ragnearerd. Mroe cmoomn wrdos
lkie blal and baer raimen mltsoy ungnchead and esay
to rizocenge, whereas, lgneor and less cmoomn wdros, like
pltuonuim and soulamitunes caghne saillbattunsy scuh taht rnciooitegn is
srclceay pbsslioe. This atiibly smtes form a garet deal
of enpicerexe rindaeg cretolcry slelepd wdros and only plopee
who can adrealy raed pelictroinfy can do this tsak.
Tihs tirck does not reeavl mcuh aoubt the pscroes
of Innreaig to raed, it only ietaindcs that hhligy
slielkd rrdeaes can omoercve moinr informieepcts wehn dnriiveg mnnaeig!
Binary search demonstration (bsearch.txt):
fh = open("dictionary.txt")
data = fh.read()
fh.close()
words = data.split('\n')
word = "aardvark"
found = False
# print(len(words))
# high starts at the last valid index so that words[mid] can never go out of range
high, low, mid = len(words) - 1, 0, len(words) // 2
searches = 1
while not found and low <= high :
    print("mid:", mid, " word:", words[mid], " low: ", low, " high: ", high)
    input("press enter to continue...")
    if word == words[mid] :
        print("found correct starting point at index:", mid, " after:", searches, " searches")
        print("word: ", words[mid])
        found = True
        # search by first letter of word
        #pos = mid
        #while word[0] == words[pos][0] :
        #    pos -= 1
        #print("first word starting letter:", word[0], " has index:", pos + 1, " word is:", words[pos + 1])
    elif word > words[mid] :
        low = mid + 1
    elif word < words[mid] :
        high = mid - 1
    mid = (low + high) // 2
    searches += 1
# end while