3 Tasks

Question

3 Tasks

The assignment is broken up into a number of tasks, to help you progressively complete the assignment.

3.1 Task A: Implement the Dictionary and Its Operations Using Array, Linked

List, and Trie (12 marks)

In this task, you will implement a dictionary of English words that allows Add, Search, Delete,

and Auto-completion, using three different data structures: Array (Python's list), Linked List, and

Trie. Each implementation should support the following operations:

• Build a dictionary from a list of words and frequencies: create a dictionary that stores words

and frequencies taken from a given list. This operation is not tested.

(A)dd a word and its frequency to the dictionary. Return True if successful or False if it already

exists in the dictionary.

(S)earch for a word in a dictionary and return its frequency. Return 0 if not found.

4

(D)elete a word from the dictionary. Returns True if successful and False if it doesn't exist in

the dictionary.

• (AC) Auto-complete a given string and return a list of three most frequent words (if any) in the

dictionary that have the string as a prefix. The list can be empty./n3.1.1 Implementation Details

Array-Based Dictionary. In this subtask, you will implement the dictionary using Python's lists,

which are equivalent to arrays in other programming languages. In this implementation, all standard

operations on lists are allowed. Other data structures should NOT be used directly in the main

operations of the array-based dictionary. The usage of the module 'bisect' is allowed. Other data

structures should NOT be used directly in the main operations of the array-based dictionary. See the

Background Section for more details and an example.

Linked-List-Based Dictionary. In this subtask, you will implement the dictionary by using a

singly linked list. Other data structures should NOT be used directly in the main operations of the

array-based dictionary (but Python's list can be used to store intermediate data or the input/output).

See the Background Section for more details and an example.

Trie-Based Dictionary. In this subtask, you will implement the dictionary using the trie data

structure. Both iterative or recursive implementations are acceptable. See the Background Section

for more details and an example.

3.1.2 Operations Details

Operations to perform on the implementations are specified on the command file. They are in the

following format:

[arguments]

where operation is one of {S, A, D, AC} and arguments is for optional arguments of some of the

operations. The operations take the following form:

• S word - searches for word in the dictionary and returns its frequency (returns 0 if not found).

• A word frequency - adds a new word and its frequency to the dictionary, returns True if

succeeded and False if the word already exists in the dictionary.

• D word - deletes word from the dictionary. If fails to delete (word is not in the dictionary),

returns False. Unneeded nodes (after deletion) must be removed.

• AC partial_word - returns a list of three words of highest frequencies in the dictionary that has

partial word as a prefix. These words should be listed in a decreasing order of frequencies.

Maximum three words and minimum zero word will be returned.

As an example of the operations, consider the input and output from the provided testing files,/n3.1.3 Testing Framework

We provide Python skeleton codes (see Table 2) to help you get started and automate the correctness

testing. You may add your own Python modules to your final submission, but please ensure that they

work with the supplied modules and the Python test script.

Debugging. To run the code, from the directory where dictionary_file_based.py is, execute

(use 'python3' on Linux, 'python' on Pycharm):

> python3 dictionary_file_based.py

where

• approach is one of the following: array, linkedlist, trie,

• data filename is the name of the file containing the initial set of words and frequencies,

• command filename is the name of the file with the commands/operations,

• output filename is where to store the output of program.

For example, to run the test with sampleDataToy.txt, type (in Linux, use 'python3', on Pycharm's

terminal, use 'python'):

file

dictionary_file_based.py

description

Code that reads in operation commands from file then executes

those on the specified nearest neighbour data structure. For

debugging your code. DO NOT MODIFY.

dictionary_test_script.py Code that executes dictionary_file_based.py and tests

your code on Linux servers. Make sure your implementation

passes the tests before submission. DO NOT MODIFY.

base_dictionary.py

array_dictionary.py

The base class for the dictionary. DO NOT MODIFY.

Skeleton code that implements an array-based dictionary./n3.1.4 Notes

• If you correctly implement the "To be implemented" parts, you in fact do not need to do anything

else to get the correct output formatting because dictionary_file_based.py will handle this.

• We will run the supplied test script on your implementation on the university's core teaching

servers, e.g., titan.csit.rmit.edu.au, jupiter.csit.rmit.edu.au, saturn.csit.rmit.edu.au.

If you write codes on your own computer, make sure they run without errors/warnings on

these servers before submission. If your codes do not run on the core teaching servers, we

unfortunately won't have the resources to debug each one and cannot award marks for testing.

Please avoid including non-standard Python modules not available on the servers, as

that will cause the test script to fail on your submission.

• All submissions should run with no warnings on Python 3.10 on the core teaching servers. We

will discuss how to install Python 3.10 on our server on a later post on the Canvas Discussion.

3.1.5 Test Data

We provide three sample datasets. The smallest one, sampleDataToy.txt, is for debugging, the

medium one, sampleData.txt, which contains 5000 words from English-Corpora.org, is for testing and

automarking, while the largest one, sampleData200k.txt, which contains 200k words from Kaggle,

is for running experiments and report (you could use other data sources or generate yourself for the

report as well). Each line in these files represents a word and its frequency. For each line, the format

is as follows:

word frequency

This is also the expected input file format when passing data information to dictionary_file_based.py.

When creating your own files for the the analysis part, use this format.

3.2 Task B: Evaluate your Data Structures for Different Operations (18 marks)

In this second task, you will evaluate your implementions in terms of their time complexities for

different operations. You will perform the empirical analysis and report the process and the outcome,

and provide comparisons, comments, interpretations and explanations of the outcome, and your overall

recommendations. The report should be no more than 5 pages, in font size 12 and A4 pages (210x297

mm or 8.3 x 11.7 inches). See the assessment rubric (Appendix A) for the criteria we are seeking in

the report.

Data Generation and Experiment Setup

Apart from sampleData.txt (from iWeb), which contains 5,000 words, we also provide another large

dataset sampleData200k.txt (from Kaggle) with 200,000 unique words and frequencies. You may/n7 Team Structure

This assignment could be done in pairs (group of two) or individually. Either case, you (and your

partner, if any) must add yourself into an Official Group for Assignment 1 (see Section 5). If you have

difficulty in finding a partner, post on the discussion forum. If you opt for a team of two, it is still

your sole responsibility to deliver the implementation and the report by the deadline, even

when the team breaks down for some reason, e.g., your partner doesn't show up for meetings, leaves

the team, or, in some rare case, withdraws from the course. Effective/frequent communication and

early planning are key.

11

3 Tasks

Get Instant Homework HelpOn Your Mobile

Get Instant Homework HelpOn Your Mobile

Get Instant Homework HelpOn Your Mobile