Question

For the final project, you will develop an automated web crawler to collect data from a website of your choice, store the data in a database, and perform data analytics using Python.

1) First, identify a website as your data source, then identify the target data fields your team plans to collect. Aim to collect as much data as possible, even if you do not initially expect to use some fields for analysis; collecting missing data retroactively can be time-consuming if you discover later that you need it.

2) Set up a database to store your data.
   a) You can use any database, including sqlite3, MS SQL Server, MySQL, MongoDB, etc. Note that Excel or a CSV file is not a database.
   b) Based on the data fields you identified from the website, design and create one or more tables to hold the dataset you will collect.
   c) After finalizing your database tables, develop the web crawler so that it inserts data directly into your database (rather than, for example, downloading the data to a CSV file first and then importing that file into the database). A sketch of such a crawler follows this list.

3) Using the collected data in the database, perform analyses to obtain insights. Include at least two of the following (the list is not exhaustive); see the analysis sketches after this list:
   a) Descriptive analysis
   b) Visualization
   c) Regression
   d) Sentiment analysis
   e) Other text mining analysis

4) Present your work in a video presentation.
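
For illustration, here is a minimal sketch of a crawler that creates a table and inserts scraped records straight into sqlite3, with no intermediate CSV step (steps 2b and 2c). The URL, the CSS selectors, the database file name, and the "listings" schema with "title" and "price" fields are all hypothetical placeholders; adapt them to whatever site and fields your team chooses.

```python
import sqlite3

import requests
from bs4 import BeautifulSoup

DB_PATH = "project.db"                       # assumed database file name
START_URL = "https://example.com/listings"   # placeholder URL

def ensure_table(conn: sqlite3.Connection) -> None:
    # One table per logical entity scraped from the site (step 2b).
    conn.execute(
        """CREATE TABLE IF NOT EXISTS listings (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               title TEXT NOT NULL,
               price REAL,
               scraped_at TEXT DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    conn.commit()

def crawl(url: str, conn: sqlite3.Connection) -> None:
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    # ".item", ".title", and ".price" are assumed selectors for
    # illustration; inspect your target site's HTML for the real ones.
    for item in soup.select(".item"):
        title_el = item.select_one(".title")
        price_el = item.select_one(".price")
        if title_el is None:
            continue  # skip malformed entries
        price = None
        if price_el is not None:
            try:
                price = float(
                    price_el.get_text(strip=True).lstrip("$").replace(",", "")
                )
            except ValueError:
                pass  # leave price NULL if it does not parse
        # Insert each record directly into the database (step 2c).
        conn.execute(
            "INSERT INTO listings (title, price) VALUES (?, ?)",
            (title_el.get_text(strip=True), price),
        )
    conn.commit()

if __name__ == "__main__":
    with sqlite3.connect(DB_PATH) as conn:
        ensure_table(conn)
        crawl(START_URL, conn)
```

Parameterized queries (the `?` placeholders) keep the inserts safe even if scraped text contains quotes, and `CREATE TABLE IF NOT EXISTS` lets the crawler be rerun without errors.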
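For steps 3a and 3b, a minimal sketch of descriptive analysis and visualization, assuming the same hypothetical "listings" table from the crawler sketch above:

```python
import sqlite3

import matplotlib.pyplot as plt
import pandas as pd

# Load the collected data from the database into a DataFrame.
with sqlite3.connect("project.db") as conn:
    df = pd.read_sql_query("SELECT title, price FROM listings", conn)

# Descriptive analysis: count, mean, std, min/max, and quartiles.
print(df["price"].describe())

# Visualization: distribution of the collected prices.
df["price"].plot(kind="hist", bins=30, title="Price distribution")
plt.xlabel("Price")
plt.tight_layout()
plt.savefig("price_hist.png")
```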
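And for step 3d, a sentiment-analysis sketch using NLTK's VADER analyzer. It assumes a hypothetical "reviews" table with a "body" text column; substitute whatever text field your crawler actually collects.

```python
import sqlite3

import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download
sia = SentimentIntensityAnalyzer()

with sqlite3.connect("project.db") as conn:
    rows = conn.execute("SELECT body FROM reviews").fetchall()

for (body,) in rows:
    # The compound score ranges from -1 (most negative) to +1 (most positive).
    score = sia.polarity_scores(body)["compound"]
    if score > 0.05:
        label = "positive"
    elif score < -0.05:
        label = "negative"
    else:
        label = "neutral"
    print(f"{label:8s} {score:+.3f}  {body[:60]}")
```

VADER suits short, informal text such as reviews or comments; for other text sources, a different text-mining approach (step 3e) may fit better.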