Search for question
Question

Programming Languages - Project

WJ Inc., is a company with offices around the world. The HR team wants to build reports on

the number of employees around the world, in different departments. But they do not want to

use the raw data for the reporting purpose. They want some consolidation and aggregation to

be done prior to being loaded into their reporting system. The data is provided to the software

engineer in CSV files. There are 3 CSV files, each holding the department, location and

employee headcount data.

The department and location information is provided as a flattened hierarchy. Given below is a

snapshot of the department file content:

level1

WJ Group

WJ Group

WJ Group

level2

WJ Inc.

WJ Inc.

WJ Ino

level?

level3 level4 level5 level5

DEPT 1 DEPT 11 DEPT 111 DEPT 1111 DEPT 11111

DEPT 1 DEPT 12

DEPT 1 DEPT 13 DEPT 133

level!

The department hierarchy has 7 levels. The file has columns level1 - level7 and an additional

column to indicate the the bottom level. The department hierarchy is a jagged hierarchy

meaning not all hierarchies have 7 levels.

Similarly the location data is also provided. The screen shot is given below:

level1 level2

laval3

World North America USA Califomia

World North America USA Califomia

World North America USA Califomia

laval7

level5

loval6

leve! bottom_level

Northern California San Jose Main Campus SJG1 8JC 1

Northern Califomia San Jose Main Campus SJC2 SJC2

Southern California Los Angeles LA Techspace LA 1 LA 1

The head count data includes the department, location. Each row in the file gives the number

of employees for a particular department at a particular location. The headcount data is only

available at bottom levels. Given below is a screenshot of the data:

location department

DEPT_11111

SJC 1

SJC 1

SJC 1

SJC 1

SJC 1

DEPT 12

DEPT 133

bottom_level

DEPT 11111

DEPT 12

DEPT 133

DEPT_2222

DEPT_2244

head_count

12

42

122

3

12

Fig: 1

Fig: 2