
For the final project, you will develop an automated web crawler to collect data from a website of your choice, store the data in a database, and perform data analytics using Python.

1) First, identify a website as your data source, then identify the target data fields your team plans to collect. Aim to collect as much data as possible, even if you do not initially expect to use some fields for analysis; retroactive collection can be time-consuming if you later discover that needed data is missing.
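For example, here is a minimal sketch of exploring candidate data fields, assuming the public scraping-practice site books.toscrape.com as a stand-in for your chosen source and the requests/BeautifulSoup libraries as one common tooling choice (the CSS selectors below are specific to that site):

    import requests
    from bs4 import BeautifulSoup

    response = requests.get("http://books.toscrape.com/", timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    # Inspect a few listings to decide which fields to collect
    # (here: title, price, and star rating).
    for book in soup.select("article.product_pod")[:5]:
        title = book.h3.a["title"]                      # full title sits in the anchor's title attribute
        price = book.select_one("p.price_color").text   # e.g. "£51.77"
        rating = book.select_one("p.star-rating")["class"][1]  # e.g. "Three"
        print(title, price, rating)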

2) Set up a database to store your data.

a) You can use any database, including sqlite3, MS SQL Server, MySQL, MongoDB, etc. But note that Excel or a CSV file is not a database.

b) Based on the data fields you identified from the website, design and create one or more tables to host the dataset you will collect.

c) After finalizing your database tables, develop the web crawler so that it inserts data directly into your database (instead of, for example, first downloading the data as a CSV file and then importing that file into the database).
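A minimal sketch of this step using sqlite3, assuming a hypothetical database file "project.db" and a "books" table matching the fields explored above (table and column names are illustrative, not required):

    import re
    import sqlite3
    import requests
    from bs4 import BeautifulSoup

    conn = sqlite3.connect("project.db")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS books (
            id     INTEGER PRIMARY KEY AUTOINCREMENT,
            title  TEXT NOT NULL,
            price  REAL,
            rating TEXT
        )
    """)

    response = requests.get("http://books.toscrape.com/", timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    # The crawler writes each record straight into the database: no CSV step.
    for book in soup.select("article.product_pod"):
        title = book.h3.a["title"]
        price_text = book.select_one("p.price_color").text
        price = float(re.sub(r"[^\d.]", "", price_text))  # strip the currency symbol
        rating = book.select_one("p.star-rating")["class"][1]
        conn.execute(
            "INSERT INTO books (title, price, rating) VALUES (?, ?, ?)",
            (title, price, rating),
        )

    conn.commit()
    conn.close()

Parameterized INSERT statements (the ? placeholders) keep the crawler safe against malformed scraped text; the same pattern carries over to MySQL or SQL Server with their respective Python drivers.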

3) Based on the data collected in your database, perform analyses to obtain insights. Your analyses should include at least two of the following (but are not limited to this list):

a) Descriptive analysis
b) Visualization
c) Regression
d) Sentiment analysis
e) Other text mining analysis
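As one possible starting point, here is a sketch covering two of these options, descriptive analysis and visualization, assuming the hypothetical "project.db" database and "books" table from the step 2 example, with pandas and matplotlib:

    import sqlite3
    import pandas as pd
    import matplotlib.pyplot as plt

    conn = sqlite3.connect("project.db")
    df = pd.read_sql_query("SELECT title, price, rating FROM books", conn)
    conn.close()

    # a) Descriptive analysis: overall price statistics and mean price by rating.
    print(df["price"].describe())
    print(df.groupby("rating")["price"].mean())

    # b) Visualization: distribution of prices, saved as an image for the report.
    df["price"].plot(kind="hist", bins=20, title="Price distribution")
    plt.xlabel("Price")
    plt.tight_layout()
    plt.savefig("price_histogram.png")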

4) Present your work in a video presentation.