Instructions for submitting the solution
1. Submit the source code file for every programming assignment (e.g., a .c or .java file).
2. If a dataset is used, share all of its files together with the solution in a single ZIP file.
3. Share a README file that includes the system specification, the required software, and the
execution instructions.
4. Share the output of the programs. For a single program, a screenshot is sufficient; if multiple
files are included, please share a screen recording. The screen recording must cover the
following steps:
❖ Show the complete solution according to the instructions.
❖ Cover all the compilation steps.
❖ Show the complete output of the program/project.
❖ Show all passing test cases, if any are given in the assignment.
5. Always add proper comments to the code. If a specific package is used, mention it in a
comment.
UCF-CECS
Using Python & a screen scraper to extract data
from Wikipedia
Homework Assignment 3 (hw03)
April 1, 2024
1 Objectives
The goal of this homework assignment is to develop a solution that extracts (screen scrapes) the
SpaceX Falcon 9 Block 5 launch records from Wikipedia. Note that this Wikipedia page is being
actively and substantially modified, so a stable version of the page has been supplied for this
assignment. Use the supplied file to develop a solution that extracts (screen scrapes) the data
for the SpaceX Falcon 9/Heavy launches that use Block 5 engines. This is described in further
detail below.
1.1 Specific data
Several data elements are to be extracted from the webpage in this assignment. They are as
follows:
• All Block 5 engines' launch history (may vary from one to many launches):
- Launch identified by launch number (Fx-DDD, where x is either 9 for Falcon 9 or H for
Falcon Heavy, and DDD is a 3-digit decimal number)
- Launch date in DD-Month-Year format
- Turnaround time in days
CIS4340-McAlpin
HW 03
• In this assignment, it is important to note that each Falcon 9 launch uses a single booster with
a Block 5 engine, whereas each Falcon Heavy launch uses three Block 5 engines.
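The launch number format above (Fx-DDD) can be checked mechanically, and the same match also
yields the flight type (F9 or FH) that section 1.3.1 asks for. The following is a minimal sketch,
assuming the Flight No. cell text has already been extracted from the table; FLIGHT_RE and
parse_flight_number are hypothetical helper names, not part of the assignment.

```python
import re

# Hypothetical helper: Fx-DDD, where x is 9 (Falcon 9) or H (Falcon Heavy)
# and DDD is exactly three decimal digits.
FLIGHT_RE = re.compile(r"^F([9H])-(\d{3})$")


def parse_flight_number(text):
    """Return (flight_type, sequence), e.g. ("F9", 54), or None if text is not Fx-DDD."""
    match = FLIGHT_RE.match(text.strip())
    if match is None:
        return None
    kind, digits = match.groups()
    # Flight type as required by the extraction step: "F9" or "FH".
    return ("F" + kind, int(digits))
```

For example, parse_flight_number("FH-004") gives ("FH", 4), while a malformed value such as
"F9-26" gives None, which makes unexpected cells easy to spot while developing the scraper.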
These objectives will be met and demonstrated in the exercises specified later in the assignment.
1.2 Collected data
As noted earlier, the Falcon 9 Wikipedia page is currently being substantially revised. The page
carries the following notice:
"This page is currently being split. After a discussion, consensus to split this page into List
of Falcon 9 and Falcon Heavy launches (2020-2021) was found. You can help implement the split by
following the instructions at Help:Splitting and the resolution on the discussion. Process
started in March 2024."
For this reason, the file Falcon9first-stageBoosters.html has been supplied for this assignment.
It is available on the Webcourses assignment page.
[Figure 1.1: The first 7 rows of Block 5 engines' data. Table "Falcon 9 block 5 first-stage
boosters" (Expended, Destroyed, or Officially Retired); columns: S/N, Type, Launches, Launch
date (UTC), Flight No., Turnaround time, Payload, Launch (pad), Landing (location), Status.]
B1024's history in HTML (each | separates one table cell):
| B1024 | FT | 15 June 2016 | F9-026 | - | ABS-2A / Eutelsat 117 West B | Success (40) | Failure | Destroyed[40] |
The rows (<tr>) and the cells of each row, column by column (<td>), are tags whose attributes,
such as rowspan, configure how many rows a cell spans within its column. These tags are
navigable in the scraping code and are often indexable as well.
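Walking the <tr> and <td> tags while honoring rowspan can be sketched with Python's
standard-library html.parser module. This is a minimal sketch under stated assumptions:
TableGrid and expand_rowspans are hypothetical names; the parser collects every <tr> it is fed,
so in practice you would pass it only the booster table (or filter the resulting rows, e.g., by
their header); and colspan is ignored. A third-party parser such as BeautifulSoup could be used
instead.

```python
from html.parser import HTMLParser


class TableGrid(HTMLParser):
    """Collect every <tr> fed to the parser as a list of (text, rowspan) cells."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one list of (text, rowspan) tuples per <tr>
        self.cell = None    # text fragments of the cell being read, or None
        self.rowspan = 1

    def _finish_cell(self):
        # Store the current cell (if any) with its whitespace collapsed.
        if self.cell is not None:
            text = " ".join("".join(self.cell).split())
            self.rows[-1].append((text, self.rowspan))
            self.cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._finish_cell()
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self._finish_cell()
            self.cell = []
            span = (dict(attrs).get("rowspan") or "1").strip()
            self.rowspan = int(span) if span.isdigit() else 1
        elif tag == "br" and self.cell is not None:
            self.cell.append(" ")  # keep words on either side of <br> apart

    def handle_endtag(self, tag):
        if tag in ("td", "th", "tr", "table"):
            self._finish_cell()

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)


def expand_rowspans(raw_rows):
    """Copy each rowspan cell into the rows it covers so every row is complete."""
    grid = []
    carry = {}  # column index -> (text, number of further rows to fill)
    for raw in raw_rows:
        cells = iter(raw)
        row = []
        while True:
            col = len(row)
            if col in carry:
                # This column is still covered by a cell from an earlier row.
                text, left = carry.pop(col)
                row.append(text)
                if left > 1:
                    carry[col] = (text, left - 1)
                continue
            cell = next(cells, None)
            if cell is None:
                break
            text, span = cell
            row.append(text)
            if span > 1:
                carry[col] = (text, span - 1)
        grid.append(row)
    return grid
```

Typical use: parser = TableGrid(); parser.feed(html_text); grid = expand_rowspans(parser.rows).
After expansion, every launch row carries its booster's S/N, so the rows can be indexed by
column position.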
[Figure 1.2: The last 6 rows of Block 5 engines' data (same columns as Figure 1.1).]
1.3 Programs
1.3.1 Extraction
The following data must be extracted by a screen scraper applied to the HTML file
Falcon9first-stageBoosters.html, which is supplied via Webcourses:
1. The Block 5 engine number.
2. The Flight number.
3. The Flight type:
a) F9 for Falcon 9
b) FH for Falcon Heavy
4. The launch date, in the YYYY-MM-DD format.
5. The launch pad.
6. The landing location, typically an acronym (e.g., OCISLY or JRTI), or No attempt when no
landing was attempted.
7. The Turnaround time, in days.
8. The engine's status:
a) Expended
b) Destroyed
c) Lost at sea
d) Returned to service
9. The total number of launches for this engine.
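Most of the fields listed above need light cleanup of the raw cell text before output. The
following is a minimal sketch, assuming cell strings shaped like those in Figure 1.1 (e.g.,
"Success (40)", "Success (OCISLY)[157]", "88 days", and dates such as "15 June 2016"); all
function names are hypothetical, and month-name parsing with %B assumes Python's default
(English/C) locale.

```python
import re
from datetime import datetime

CITATION_RE = re.compile(r"\[[^\]]*\]")  # footnote markers such as [131] or [b]
PAREN_RE = re.compile(r"\(([^)]*)\)")     # first parenthesized group, e.g. (39A)


def clean(text):
    """Remove footnote markers and collapse whitespace (including non-breaking spaces)."""
    return " ".join(CITATION_RE.sub("", text).split())


def to_iso_date(text):
    """Convert '15 June 2016' (DD Month YYYY) to '2016-06-15' (YYYY-MM-DD)."""
    return datetime.strptime(clean(text), "%d %B %Y").strftime("%Y-%m-%d")


def pad_of(launch_cell):
    """'Success (40)' -> '40'; returns '' if no pad is given in parentheses."""
    match = PAREN_RE.search(clean(launch_cell))
    return match.group(1) if match else ""


def landing_of(landing_cell):
    """'Success (OCISLY)' -> 'OCISLY'; otherwise the cell text, e.g. 'No attempt'."""
    cell = clean(landing_cell)
    match = PAREN_RE.search(cell)
    return match.group(1) if match else cell


def turnaround_days(text):
    """'88 days' -> 88; a dash or empty cell (no previous flight) -> None."""
    match = re.search(r"\d+", clean(text).replace(",", ""))
    return int(match.group()) if match else None
```

Keeping the first flight's turnaround as None (rather than 0) avoids skewing the fastest
turnaround report.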
These data elements must be written to STDOUT in the order shown above, with the elements
separated by commas, so that the output forms a CSV file.
This Python program should be named Block5Extract.py.
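Writing the records with Python's csv module, rather than joining strings by hand, keeps the
comma-separated output valid even if a value ever contains a comma (such a value is quoted). The
following is a minimal sketch: FIELDS, html_path, and write_csv are hypothetical names, each
record is assumed to be a dict keyed by those field names, and no header row is written because
the assignment does not ask for one.

```python
import csv
import sys
from pathlib import Path

# Output columns, in the order required by items 1-9 of section 1.3.1.
# The field names themselves are illustrative, not prescribed by the assignment.
FIELDS = ["engine", "flight", "type", "date", "pad", "landing",
          "turnaround", "status", "launches"]


def html_path():
    """Assumed layout: the supplied HTML file sits next to this script."""
    return Path(__file__).resolve().parent / "Falcon9first-stageBoosters.html"


def write_csv(records, out=None):
    """Write one comma-separated line per launch record (no header row)."""
    writer = csv.writer(out if out is not None else sys.stdout, lineterminator="\n")
    for record in records:
        # Missing fields become empty strings; csv also writes None as ''.
        writer.writerow([record.get(name, "") for name in FIELDS])
```

Block5Extract.py would read html_path() (e.g., with read_text(encoding="utf-8")), build one
record per launch row of the parsed table, and call write_csv(records); the shell redirection in
the run command then captures STDOUT in Block5.csv.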
Wondering how to run the program and capture the output?
xyz$ python3 Block5Extract.py > Block5.csv
* This command executes the Python program, which reads Falcon9first-stageBoosters.html as
input, and redirects the output from STDOUT to the file named Block5.csv.
* Make sure Falcon9first-stageBoosters.html is in the same directory as the code.
1.3.2 Reports
Table 1.1: Report names
# | Title             | Description
1 | f9only            | Only Falcon 9 launches
2 | fHonly            | Only Falcon Heavy launches
3 | fHpairs           | The three engines used for each Falcon Heavy launch
4 | longestTurnaround | The longest turnaround for a Block 5 engine
5 | fastestTurnaround | The fastest turnaround for a Block 5 engine
6 | mostLaunches      | The most launches for a Block 5 engine
Notes:
Use the Title shown above for both the program name and the output file name; e.g., report #1
would be f9only.py, with its output redirected to the file f9only.txt.
Both the program and the output file for each of the 6 programs/reports must be submitted to
Webcourses.
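Each report can reuse Block5.csv instead of scraping the HTML again. The following is a minimal
sketch, assuming Block5.csv has no header row and its columns follow the order in section 1.3.1;
the helper names are hypothetical, and each of the six report programs would call one of them.
For fHpairs, grouping by flight number lists whichever engines appear in the table for that
flight, which is three when all of a Falcon Heavy launch's engines are present.

```python
import csv
from collections import defaultdict

# Assumed Block5.csv layout: no header row, columns in the order of section 1.3.1.
ENGINE, FLIGHT, TYPE, DATE, PAD, LANDING, TURNAROUND, STATUS, LAUNCHES = range(9)


def load_rows(path="Block5.csv"):
    """Read the CSV produced by Block5Extract.py, skipping blank lines."""
    with open(path, newline="") as handle:
        return [row for row in csv.reader(handle) if row]


def only_type(rows, flight_type):
    """f9only / fHonly: keep rows whose flight type is 'F9' or 'FH'."""
    return [row for row in rows if row[TYPE] == flight_type]


def heavy_groups(rows):
    """fHpairs: map each Falcon Heavy flight number to the engines listed for it."""
    groups = defaultdict(list)
    for row in only_type(rows, "FH"):
        groups[row[FLIGHT]].append(row[ENGINE])
    return dict(groups)


def turnaround_extremes(rows):
    """longestTurnaround / fastestTurnaround: return (longest row, fastest row).

    Rows without a turnaround (an engine's first flight) are skipped;
    raises ValueError if no row has one.
    """
    timed = [row for row in rows if row[TURNAROUND].isdigit()]

    def days(row):
        return int(row[TURNAROUND])

    return max(timed, key=days), min(timed, key=days)


def most_launches(rows):
    """mostLaunches: the engine with the highest total launch count."""
    return max(rows, key=lambda row: int(row[LAUNCHES]))[ENGINE]
```

For example, f9only.py could pass only_type(load_rows(), "F9") to csv.writer(sys.stdout) and be
run as python3 f9only.py > f9only.txt. Ties in the turnaround and launch-count reports resolve to
the first matching row.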
1.3.3 Submission instructions
You must submit this assignment in Webcourses as file uploads. Submitting everything in a single
ZIP file is preferred.
The submitted programs are as follows:
1. f9only
2. fHonly
3. fHpairs
4. longestTurnaround
5. fastestTurnaround
6. mostLaunches
7. Block5Extract