The assignment is broken up into a number of tasks, to help you progressively complete the assignment.
3.1 Task A: Implement the Dictionary and Its Operations Using Array, Linked
List, and Trie (12 marks)
In this task, you will implement a dictionary of English words that allows Add, Search, Delete,
and Auto-completion, using three different data structures: Array (Python's list), Linked List, and
Trie. Each implementation should support the following operations:
• Build a dictionary from a list of words and frequencies: create a dictionary that stores words
and frequencies taken from a given list. This operation is not tested.
(A)dd a word and its frequency to the dictionary. Return True if successful or False if it already
exists in the dictionary.
(S)earch for a word in a dictionary and return its frequency. Return 0 if not found.
4
(D)elete a word from the dictionary. Returns True if successful and False if it doesn't exist in
the dictionary.
• (AC) Auto-complete a given string and return a list of three most frequent words (if any) in the
dictionary that have the string as a prefix. The list can be empty./n3.1.1 Implementation Details
Array-Based Dictionary. In this subtask, you will implement the dictionary using Python's lists,
which are equivalent to arrays in other programming languages. In this implementation, all standard
operations on lists are allowed. Other data structures should NOT be used directly in the main
operations of the array-based dictionary. The usage of the module 'bisect' is allowed. Other data
structures should NOT be used directly in the main operations of the array-based dictionary. See the
Background Section for more details and an example.
Linked-List-Based Dictionary. In this subtask, you will implement the dictionary by using a
singly linked list. Other data structures should NOT be used directly in the main operations of the
array-based dictionary (but Python's list can be used to store intermediate data or the input/output).
See the Background Section for more details and an example.
Trie-Based Dictionary. In this subtask, you will implement the dictionary using the trie data
structure. Both iterative or recursive implementations are acceptable. See the Background Section
for more details and an example.
3.1.2 Operations Details
Operations to perform on the implementations are specified on the command file. They are in the
following format:
where operation is one of {S, A, D, AC} and arguments is for optional arguments of some of the
operations. The operations take the following form:
• S word - searches for word in the dictionary and returns its frequency (returns 0 if not found).
• A word frequency - adds a new word and its frequency to the dictionary, returns True if
succeeded and False if the word already exists in the dictionary.
• D word - deletes word from the dictionary. If fails to delete (word is not in the dictionary),
returns False. Unneeded nodes (after deletion) must be removed.
• AC partial_word - returns a list of three words of highest frequencies in the dictionary that has
partial word as a prefix. These words should be listed in a decreasing order of frequencies.
Maximum three words and minimum zero word will be returned.
As an example of the operations, consider the input and output from the provided testing files,/n3.1.3 Testing Framework
We provide Python skeleton codes (see Table 2) to help you get started and automate the correctness
testing. You may add your own Python modules to your final submission, but please ensure that they
work with the supplied modules and the Python test script.
Debugging. To run the code, from the directory where dictionary_file_based.py is, execute
(use 'python3' on Linux, 'python' on Pycharm):
> python3 dictionary_file_based.py
where
• approach is one of the following: array, linkedlist, trie,
• data filename is the name of the file containing the initial set of words and frequencies,
• command filename is the name of the file with the commands/operations,
• output filename is where to store the output of program.
For example, to run the test with sampleDataToy.txt, type (in Linux, use 'python3', on Pycharm's
terminal, use 'python'):
file
dictionary_file_based.py
description
Code that reads in operation commands from file then executes
those on the specified nearest neighbour data structure. For
debugging your code. DO NOT MODIFY.
dictionary_test_script.py Code that executes dictionary_file_based.py and tests
your code on Linux servers. Make sure your implementation
passes the tests before submission. DO NOT MODIFY.
base_dictionary.py
array_dictionary.py
The base class for the dictionary. DO NOT MODIFY.
Skeleton code that implements an array-based dictionary./n3.1.4 Notes
• If you correctly implement the "To be implemented" parts, you in fact do not need to do anything
else to get the correct output formatting because dictionary_file_based.py will handle this.
• We will run the supplied test script on your implementation on the university's core teaching
servers, e.g., titan.csit.rmit.edu.au, jupiter.csit.rmit.edu.au, saturn.csit.rmit.edu.au.
If you write codes on your own computer, make sure they run without errors/warnings on
these servers before submission. If your codes do not run on the core teaching servers, we
unfortunately won't have the resources to debug each one and cannot award marks for testing.
Please avoid including non-standard Python modules not available on the servers, as
that will cause the test script to fail on your submission.
• All submissions should run with no warnings on Python 3.10 on the core teaching servers. We
will discuss how to install Python 3.10 on our server on a later post on the Canvas Discussion.
3.1.5 Test Data
We provide three sample datasets. The smallest one, sampleDataToy.txt, is for debugging, the
medium one, sampleData.txt, which contains 5000 words from English-Corpora.org, is for testing and
automarking, while the largest one, sampleData200k.txt, which contains 200k words from Kaggle,
is for running experiments and report (you could use other data sources or generate yourself for the
report as well). Each line in these files represents a word and its frequency. For each line, the format
is as follows:
word frequency
This is also the expected input file format when passing data information to dictionary_file_based.py.
When creating your own files for the the analysis part, use this format.
3.2 Task B: Evaluate your Data Structures for Different Operations (18 marks)
In this second task, you will evaluate your implementions in terms of their time complexities for
different operations. You will perform the empirical analysis and report the process and the outcome,
and provide comparisons, comments, interpretations and explanations of the outcome, and your overall
recommendations. The report should be no more than 5 pages, in font size 12 and A4 pages (210x297
mm or 8.3 x 11.7 inches). See the assessment rubric (Appendix A) for the criteria we are seeking in
the report.
Data Generation and Experiment Setup
Apart from sampleData.txt (from iWeb), which contains 5,000 words, we also provide another large
dataset sampleData200k.txt (from Kaggle) with 200,000 unique words and frequencies. You may/n7 Team Structure
This assignment could be done in pairs (group of two) or individually. Either case, you (and your
partner, if any) must add yourself into an Official Group for Assignment 1 (see Section 5). If you have
difficulty in finding a partner, post on the discussion forum. If you opt for a team of two, it is still
your sole responsibility to deliver the implementation and the report by the deadline, even
when the team breaks down for some reason, e.g., your partner doesn't show up for meetings, leaves
the team, or, in some rare case, withdraws from the course. Effective/frequent communication and
early planning are key.
11
Fig: 1
Fig: 2
Fig: 3
Fig: 4
Fig: 5