# CS代考 COMP5426 Distributed – cscodehelp代写

COMP5426 Distributed

Load Balancing

15 17 19 21 23 25 27 29

Copyright By cscodehelp代写 加微信 cscodehelp

In practice, the actual speedup for solving a problem often smaller than the ideal speedup and worse as th

cessors increases

15 17 19 21 23 25 27 29

Typically

of Ineffici

Poor single processor performance

Too much parallelism overhead

Synchronization, communicatio

oad imbalance

Different amounts of

Different speeds of computing resou

work across proces

communication

Recall Amdahl’s law: a small amount of load imbal can significantly bound maximum speedup

strategies for

balance workload across the processes/threads

The strategies can be classified into two categories:

assignment

Static task

Dynamic task assignment (

ssignment to

job scheduling

Static Task Assignm

In static assignment the assignment of tasks processes/threads is pre-determined

Simple and zero runtime overhead

Static assig

predictable

t is applicable when the amount of tasks are

of task partitioning

e.g., dense m

meshes, can be easily decomposed

equal size with regular data depen

assignment relatively easy

Some other problems, e.g., sparse meshes, and graphs, are more com how to balance workload, minimize and redundant computation

ent is directly related to the quality

ation, regular

into a number of tasks

dency pattern – task

matrices, unstructured plicated, need to consider

communication overhead,

Graph partitioning of smaller graphs

It has a wid

rtitioning

e range of

application

a graph into a number

s in practice

problem can be reformulated as a graph problem e.g., scientific computing, VLSI circuit design,

computer graph

We are interested in large computation/analysis using

thus graph

processing step

ics, and web grap

partitioning

parallel machines

rtitioning

methods for solving

are grid-oriented, e.g.,

The 2D space is divided into a number of triangles

Each triangle, represented by a vertex in the correspo dependency graph (or mesh), has a state

The neighbouring triangles, represen

the vertices in the

ate update of a

rential equations

ted by edges connectin

involves the state

Definition of Graph Partiti

Partitioning

Information

There are several

solve the problem

3D space (e.g.,

e.g., the graph is derived from some structure in 2D

geometric coordinates can be attached to the vertices and therefore the geometric layout of graph is known

partitioning

PDEs), the

ate Bisection

oordinate bisection

x-coordinate which divides

For recursive bisection we

z- coordinates

This leads to a better lower edge cut size

partitioning method

We may simply draw a hyperplane orthogonal, say, to

simplest coord

alternately b

aspect ratio

inate-based

hopefully a

ate Bisection

ate Bisection

age – coordinate de

The same structure may partitioned quite differently in different coordinate systems

e.g., the partition in (b) is not good as the edge cut set is not minimized in this simple example

(𝑥𝑥, 𝑦𝑦) system let point P h polar coordinates (𝑟𝑟, 𝛼𝛼) a

(𝑟𝑟, 𝛼𝛼 − 𝜃𝜃) in the (𝑥𝑥′, 𝑦𝑦′) system We have

𝑥𝑥=𝑟𝑟cos and

𝑥𝑥′ =𝑟𝑟cos(𝛼𝛼−𝜃𝜃)

and =𝑟𝑟cos𝛼𝛼cos𝜃𝜃+𝑟𝑟sin𝛼𝛼sin𝜃𝜃

= 𝑥𝑥 cos 𝜃𝜃 + 𝑦𝑦 sin 𝜃𝜃 𝑦𝑦′ =𝑟𝑟sin(𝛼𝛼−𝜃𝜃)

= 𝑟𝑟 sin 𝛼𝛼 cos 𝜃𝜃 − 𝑟𝑟 cos 𝛼𝛼 sin 𝜃𝜃 = −𝑥𝑥 sin 𝜃𝜃 + 𝑦𝑦 cos 𝜃𝜃

of 𝐿𝐿, i.e., We then

�𝑑𝑑= 𝑖𝑖2 =

𝑢𝑢 = (𝑣𝑣, 𝑤𝑤) know 𝑑𝑑𝑖𝑖 of

t rotate x- and

𝑣𝑣2 �(𝑥𝑥 − 𝑣𝑣2𝑠𝑠𝑥𝑥𝑥𝑥 + 2𝑣𝑣

𝑣𝑣2 + 𝑤𝑤2 = 1

ordinates, we need to find the

𝑥𝑥̅) +2𝑣𝑣𝑤𝑤�(𝑥𝑥−𝑥𝑥̅) 𝑦𝑦−𝑦𝑦�

each mesh point and with

𝑤𝑤𝑠𝑠𝑥𝑥𝑥𝑥 + 𝑤𝑤2𝑠𝑠𝑥𝑥𝑥𝑥

2 𝑦𝑦 − 𝑦𝑦� )

th we have

Partitioning

require ge

In many practical problems graph don’t have geometric

e.g., the

Information

The coordinate-based methods, such as inertial

model of WWW, vertices

ometric information about the

, however, vert information

improve an i

local charac

spectral bisec

t takes a kind of

nitial partition

The Kernigan-Lin (or K-L) algorithm does not create partitions but rather improves them iteratively

local view of

neighbouring vertices

Thus it nicely complements algorithms which have a

view of the problem but t

the problem,

teristics, such as inertial bisection and

Multilevel Parti

n many app

processing step

The multilevel

rithms have been

However, these for large graphs

speed up the computation

It reduces the size of the gr

collapsing

partitioning

small size grap

to construct a partition for t

partitioning is just

partition, e.g.,

methods are very expensive, especially

aph gradually b

he original graph

that find a

first pre-

ncoarsens it

Multilevel Parti

Multilevel phases

Coarsening

Initial partition Refinement

algorithms

Multilevel Parti

arsening phase,

a sequence of smaller graph

s, each with

vertices, is constructed

It can be achieved by finding a maximal the matched vertices into a multinode

A matching of a graph, is a set of edges, no two incident on the same vertex

If a matching contains all possible such edges (desirable f graph coarsening, it is call maximal matching

matching and collapsing

Multilevel Parti

respect to the original graph

bisection of

a coarser graph to be goo

The weight of a multinode is set equal to the sum of weights of the vertices collapsed in the multinode

The edges of a multinode are the union of the edges of the vertices collapsed in the multinode

Multilevel Parti

graph is c

In the partitioning phase

Any partitioning algorithm It is fast and efficient as

oarse enough, it can be

can be used the size of the

partitione

graph is small

relatively easy t

e vertex weights and edge weights are

es, the bisection of the coarser graph

o be refined

preserved in

original grap

Multilevel Parti

Given the part

partition of the finer graph

rser graph, we need to

In the refinement phase we now have more degree of freedom find a better partition (improved cut size and balance)

The refinement can be done by using one of the variants of K-L algorithm

often is a

trade off between the

of the sol

only a solution of medium

ly interested i

Some algorithms run quite fast quality

Some others take a long time but deliver excellent solu Even others can be tuned between both extremes

The choice of time vs. quality depends on the intended application

For VLSI-design we prefer the quality as even solution can save real money

For sparse matrix computation we are time

the execution time for the graph partitioning has to be less than the time saved by parallel matrix computation

hus there is no single best solution for all situations

程序代写 CS代考 加微信: cscodehelp QQ: 2235208643 Email: kyit630461@163.com