# 程序代写代做代考 algorithm Microsoft PowerPoint – Chapter 9 – Sorting

Microsoft PowerPoint – Chapter 9 – Sorting

Introduction to

Parallel Computing

George Karypis

Sorting

Outline

Background

Sorting Networks

Quicksort

Bucket-Sort & Sample-Sort

Background

Input Specification

Each processor has n/p elements

A ordering of the processors

Output Specification

Each processor will get n/p consecutive elements of

the final sorted array.

The “chunk” is determined by the processor ordering.

Variations

Unequal number of elements on output.

In general, this is not a good idea and it may require a shift to

obtain the equal size distribution.

Basic Operation:

Compare-Split Operation

Single element per processor

Multiple elements per processor

Sorting Networks

Sorting is one of the fundamental problems in

Computer Science

For a long time researchers have focused on the

problem of “how fast can we sort n elements”?

Serial

nlog(n) lower-bound for comparison-based sorting

Parallel

O(1), O(log(n)), O(???)

Sorting networks

Custom-made hardware for sorting!

Hardware & algorithm

Mostly of theoretical interest but fun to study!

Elements of Sorting Networks

Key Idea:

Perform many comparisons in

parallel.

Key Elements:

Comparators:

Consist of two-input, two-output

wires

Take two elements on the input

wires and outputs them in sorted

order in the output wires.

Network architecture:

The arrangement of the

comparators into interconnected

comparator columns

similar to multi-stage networks

Many sorting networks have been

developed.

Bitonic sorting network

Θ(log2(n)) columns of

comparators.

Bitonic Sequence

Bitonic sequences are

graphically represented

by lines as follows:

1

2

7

0

4

6

Why Bitonic Sequences?

A bitonic sequence can be “easily” sorted in

increasing/decreasing order.

s s1 s2

• Every element of s1 will be less than or equal to every element of s2

• Both s1 and s2 are bitonic sequences.

• So how can a bitonic sequence be sorted?

Bitonic

Split

An example

Bitonic Merging Network

A comparator network that

takes as input a bitonic

sequence and performs a

sequence of bitonic splits

to sort it.

+BM[n]

A bitonic merging

network for sorting in

increasing order an n-

element bitonic

sequence.

-BM[n]

Similar sort in decreasing

order.

Are we done?

Given a set of elements, how do we re-arrange them into

a bitonic sequence?

Key Idea:

Use successively larger bitonic networks to transform the set into

a bitonic sequence.

An example

Complexity

How many columns of

comparators are required

to sort n=2l elements?

i.e., depth d(n) of the

network?

Bitonic Sort on a Hypercube

One-element-per-processor case

How do we map the algorithm onto a hypercube?

What is the comparator?

How do the wires get mapped?

What can you say about the

pairs of wires that are inputs

to the various comparators?

Illustration

Communication Pattern

Algorithm

Complexity?

Bitonic Sort on a Mesh

One-element-per-processor case

How do the wires get mapped?

Which one is better?

Why?

Row-Major Shuffled Mapping

Complexity?

Can we do better?

What is the lowest bound of sorting on a mesh?

communication performed by each process

More than one element per

processor

Hypercube

Mesh

Bitonic Sort Summary

Quicksort

Parallel Formulation

How about recursive decomposition?

Is it a good idea?

We need to do the partitioning of the array around

a pivot element in parallel.

What is the lower bound of parallel

quicksort?

What will it take to achieve this lower bound?

Optimal for CRCW PRAM

One element per processor

Arbitrary resolution of the concurrent writes.

Views the sorting as a two-step process:

(i) Constructing a binary tree of pivot elements

(ii) Obtaining the sorted sequence by performing an inorder

traversal of this binary tree.

Building the Binary Tree

Complexity?

Practical Quicksort

Shared-memory

Data resides on a shared array.

During a partitioning each

processor is responsible for a

certain portion.

Array Partitioning:

Select & Broadcast pivot.

Local re-arrangement.

Is this required?

Global re-arrangement.

Efficient Global Rearrangement

Practical Quicksort

Complexity

Complexity for message-passing is similar assuming that the all-to-all

personalized communication is not cross-bisection bandwidth limited.

A word on Pivot Selection

Selecting pivots that lead to balanced

partitions is importance

height of the tree

effective utilization of processors

Sample Sort

Generalization of bucket sort with data-driven sampling

n/p elements per-processor.

Each processor sorts is local elements.

Each processor selects p-1 equally spaced elements from its

own list.

The combined p(p-1) set of elements are sorted and p-1 equally

spaced elements are selected from that list.

Each processor splits its own list according to these splitters into

p buckets.

Each processor sends its ith bucket to the ith processor.

Each processor merges the elements that it receives.

Done.

Sample Sort Illustration

Sample Sort Complexity

Assumes

a serial sort