WJ Inc., is a company with offices around the world. The HR team wants to build reports on
the number of employees around the world, in different departments. But they do not want to
use the raw data for the reporting purpose. They want some consolidation and aggregation to
be done prior to being loaded into their reporting system. The data is provided to the software
engineer in CSV files. There are 3 CSV files, each holding the department, location and
employee headcount data.
The department and location information is provided as a flattened hierarchy. Given below is a
snapshot of the department file content:
level1
WJ Group
WJ Group
WJ Group
level2
WJ Inc.
WJ Inc.
WJ Ino
level?
level3 level4 level5 level5
DEPT 1 DEPT 11 DEPT 111 DEPT 1111 DEPT 11111
DEPT 1 DEPT 12
DEPT 1 DEPT 13 DEPT 133
level!
The department hierarchy has 7 levels. The file has columns level1 - level7 and an additional
column to indicate the the bottom level. The department hierarchy is a jagged hierarchy
meaning not all hierarchies have 7 levels.
Similarly the location data is also provided. The screen shot is given below:
level1 level2
laval3
World North America USA Califomia
World North America USA Califomia
World North America USA Califomia
laval7
level5
loval6
leve! bottom_level
Northern California San Jose Main Campus SJG1 8JC 1
Northern Califomia San Jose Main Campus SJC2 SJC2
Southern California Los Angeles LA Techspace LA 1 LA 1
The head count data includes the department, location. Each row in the file gives the number
of employees for a particular department at a particular location. The headcount data is only
available at bottom levels. Given below is a screenshot of the data:
location department
DEPT_11111
SJC 1
SJC 1
SJC 1
SJC 1
SJC 1
DEPT 12
DEPT 133
bottom_level
DEPT 11111
DEPT 12
DEPT 133
DEPT_2222
DEPT_2244
head_count
12
42
122
3
12
Fig: 1
Fig: 2