# 程序代写代做代考 chain data mining scheme ROYAL HOLLOWAY

ROYAL HOLLOWAY

Page 1 of 12

UNIVERSITY OF LONDON

MSc EXAMINATION 2014

For Internal Students of

Royal Holloway

DO NOT TURN OVER UNTIL TOLD TO BEGIN

Sample Exam paper

Time Allowed: TWO hours

Attempt THREE questions

No credit will be given for attempting any further questions

Calculators are NOT permitted

BI5502

© Royal Holloway University of London 2014

BI5502

Page 2 of 12 NEXT PAGE

BLANK PAGE

BI5502

Page 3 of 12 NEXT PAGE

Question 1

1. (a) Indicate whether each of the following statements is true or false.

Marking Scheme: 2 marks per correct answer

(i) Dimension hierarchies relate to drill-down analysis. [TRUE]
(ii) Type 1 changes for slowly changing dimensions relate to correction of errors.

[TRUE]
(iii) The main reason BI solutions are necessary is to support the day to day operational

needs of an organization. [FALSE]

(iv) Conforming dimensions is not absolutely necessary in a data warehouse. [FALSE]
(v) Scorecards display information on organizational Key Performance Indicators

(KPIs). [TRUE]

(vi) In dashboard design a flow based structure emphasizes a sequence of events or
actions across time. [TRUE]

[12 marks]

(b) The following cube shows the sales of different products of a bank, per branch, per

quarter of the year.

If you know that branches with codes 100 and 101 belong to Region A and branches with

codes 102 and 103 belong to Region B, perform the following OLAP operations in the

previous cube and design the results of those operations.

(i) Roll up on location (from branches to regions) [6 marks]

(ii) Slice, for Time = “Q1” [6 marks]

In the results of both (i) and (ii) above you may leave blank the parts whose values can

not be illustrated due to the lack of the necessary data in the above cube

i. The roll-up operation on location (from branches to regions results in the following
cube:

Marking Scheme: 4 marks for cube design, 2 marks for correct data.

BI5502

Page 4 of 12 NEXT PAGE

ii. The slice for Time =´Q1´ operation results in the following:

Marking Scheme: 4 marks for result, 2 marks for correct data.

Location

(Branches)

100 200 180 190 210

101 250

102 350

103 230

Mortgages Consumer Loan

Product

Credit Card Deposit

(c) Consider Customer Relationship Management (CRM) technology and applications.

(i) Name the functions/business units of an organization that typically use CRM applications.

[4 marks]

Answer: CRM mainly applies to the Management, Sales, Marketing and Customer

Service functions of an organization [1 mark per each function]

(ii) A company is interested in using a CRM application in order to send, via e-mail, a

promotional message to a group of customers. Which module of the CRM application

will be used for that purpose? [5 marks]

Answer: The Campaign Management module of a CRM application will be used for the

functionality described.

BI5502

Page 5 of 12 NEXT PAGE

Question 2

2 (a) Discuss the main differences between BI Solutions/Data Warehouses and On-Line

Transaction Processing Systems (OLTP). Structure the discussion by comparing the two

approaches according to the following criteria: Purpose, Usage, Data Modifications, DB

Schema Design, Typical DB Operations, Historical Data and DB Size. [13 marks]

Marking Scheme: 2 marks for the first six criteria, 1 mark for DB Size.

BI5502

Page 6 of 12 NEXT PAGE

(b) In the table below, the first column contains a list of business domains in which Data

Mining is applied. Complete the second column of the table with at least two data mining

examples as applied in these domains. The first entry has been already completed for you

as a sample answer. [6 marks]

Domain Example Data Mining Application

Financial Services
-Credit Scoring/Risk Analysis

-Fraud Detection

Customer Relationship

Management

 Marketing campaigns

 Churn analysis

 Cross- or up-selling

Retail/Logistics

 Optimize inventory levels

 Optimize logistics by predicting seasonal effects

Manufacturing

 Predict/prevent machinery failures

 Optimize manufacturing capacity

 Discover novel patterns to improve product quality

Criteria Transactional Systems BI Systems (Data Warehouse based)

Purpose Perform day-to-day business operations Support decision making

Usage OLTP systems support predefined

operations (i.e. have a fixed, “a-priori”

known set of operations in the data) and

their usage is repetitive in nature.

Designed for ad hoc querying and analysis.

Should be optimized to perform well for a

wide variety of such (not predefined)

operations.

Data

Modifications

The operations that end-users perform

result in direct data modifications.

Data warehouse/BI data is updated on a

regular basis using massive data

modification techniques (ETL). The end-

users of a data warehouse/BI infrastructure

do not directly update data, in other words

data is “read only” (non-volatile) from a

user perspective.

DB

Schema(s)

Design

Normalized schemas in order to optimize

the efficiency/consistency of

“update/insert/delete” data operations.

Typically “function” (transaction)

oriented schemas.

Combinations of relational/normalized and

multi-dimensional DB schemas (e.g. a star

schema) that optimize (ad-hoc) querying

and data analysis. Designed using

“subject” oriented schemas which directly

reflect business domain entities. Aggregate

data from different sources and maintain

them in consistent formats, i.e. are

integrated.

Typical DB

operations

DB queries access only some records DB queries usually scan thousands or

millions of rows

Historical

Data

Usually maintain data for only a few

weeks or months.

Maintain years of historical data (e.g. in

order to support historical analysis), i.e. are

time-variant

Size GBs TBs

BI5502

Page 7 of 12 NEXT PAGE

(c) A bank’s Data Warehouse contains data concerning the number of customers’ loans

applications performed in the bank’s branches in 3 different geographical regions (A, B,

C) in the 4 quarters of the year (Q1, Q2, Q3, Q4). A sample of the relevant business data

is summarized in the following 3 tables:

Region A

Q1 Q2 Q3 Q4

Mortgages Applications 1100 1200 975 1230

Credit Cards Applications 4125 3835 2746 3010

Consumer Loan Applications 3870 4010 3310 3425

Region B

Q1 Q2 Q3 Q4

Mortgages Applications 456 524 410 428

Credit Cards Applications 748 800 735 801

Consumer Loan Applications 945 875 1011 978

Region C

Q1 Q2 Q3 Q4

Mortgages Applications 223 297 289 300

Credit Cards Applications 330 300 333 377

Consumer Loan Applications 452 508 504 589

(i) Design a suitable Star-schema to satisfy the above data modelling requirements,
showing the schema entities, attributes and respective keys.

[7 marks]

(ii) Design a suitable Cube that represents the previous data model. The cube you design
should also illustrate in its visible parts the corresponding data from the above tables.

[8 marks]

Marking Scheme: 1 mark per correct entity, 1 mark per relationship

BI5502

Page 8 of 12 NEXT PAGE

Marking Scheme: 6 marks for correct cube design, 2 marks for correct data.

BI5502

Page 9 of 12 NEXT PAGE

Question 3

3. (a) Design a STAR schema for a chain of DVD rental stores whose managers would like to

be informed on the number and revenue of individual DVD items, per store, per day.

Devise at least two attributes for each of the non-key dimensions you include in your

schema; such attributes should make sense. Provide also a sample row of all tables of the

proposed schema illustrating the respective data interrelationships.

[13 marks]

Marking scheme: 9 marks for schema design correctness and completeness, 4 marks

for sample row.

Sample Row of Tables

Date

Date_Key Date Description Day of Week Month Year

1 21/02/2014 February 21st,

2014

Friday February 2014

Store

Store_ Key Zip Code City Country Region

1 322 34 London UK Europe

DVD Item

DVDItem_

Key

ItemCode Name Description Category

1 ABC123 “The Matrix” Film description Science Fiction

DVD Sales Facts

Store_Key Date_Key DVDItem_Key No of Sales Revenue

1 1 1 5 £15

BI5502

Page 10 of 12 NEXT PAGE

(b) In the context of Data Warehousing alternative architectures, provide a diagram of the

Centralized Data Warehouse architecture. What is the main issue of such architecture?

[9 marks]

Marking scheme: 6 marks for diagram, 2 marks for main issue

Source

Systems

Staging

Area

Normalized relational

warehouse (atomic/some

summarized data)

End user

access and

applications

ETL

(d) Centralized Data Warehouse Architecture

The main issue of the above architecture is the difficulty to achieve/build it.

(c) For each of the transformation types shown in the table below, give an example to

illustrate the case. One entry has already been completed for you as a sample answer.

Marking Scheme: 3 marks per row

Transformation Type Example

Character set conversion . from older character set formats to ASCII

Conversion of Units of

Measurements

Conversion of Miles to Km (or vice versa)

Splitting of Single Fields For example, when the first name, middle initials, and last

name of a customer is stored as a large text in a single field

and ones needs to store individual components of names and

addresses in separate fields in the data warehouse.

Date/Time Conversion e.g. Conversion between the US and the British formats

Decoding of Fields e.g. one source system using 1 and 2 for male and female and

another system using M and F.

[12 marks]

BI5502

Page 11 of 12 NEXT PAGE

Question 4

4. (a) What is the notion of granularity in dimensional modelling? How does it impact the

design of a BI solution/Data Warehouse? [10 marks]

Granularity or grain is the level of detail that a fact/measurement (a row in the facts table)

represents, i.e. its “scope”. For example, the “total sales per product, per store, per day” is

of different granularity compared to the “total sales per product type, per store, per

week”. [5 marks]

Its choice impacts the scalability of a BI solution due to the associated number of facts –

the lower the grain the larger the number of facts. Moreover, it also impacts the level of

detail that the users of the BI solution can “navigate” to. The lower the granularity, the

more level of detail is available to the users. [5 marks]

(b) Provide a diagram of the typical high-level components of a DB reporting solution and

outline the role of each of such components. [10 marks]

Marking Scheme: diagram 8 marks (2 per component), 2 marks for descriptions.

(c) Suppose that a dimensional model consists of the three dimensions date, doctor, and

patient, and a measure “charge”, which is the fee that a doctor charges a patient for a

visit.

(i) Draw a STAR schema for the above. [7 marks]

(ii) In a cube that shows data for a given patient, in a given doctor in a given date, what
specific OLAP operations should be performed in order to list the total fee collected by

each doctor in 2013? [7 marks]