Algorithm for Natural Join
For doing Natural join, the relation R(A, B) with S(B, C), it is required to find tuples that agree on their B components, i.e, the second component from tuples of R and the first component of tuples of S. Using the B-value of tuples from either relation as the key, the value will be the other component along with the name of the relation, so that Reduce function can know where each tuple come from.
Map-function
For each tuple (a, b) of R, produce the key-value pair (b, (R, a)). For each tuple (b, c) of S, produce the key-value pair (b, (S, c)).
Reduce function
Each key-value b will be associated with a list of pairs that are either of the form (R, a) or (S, c). Construct all pairs consisting of one with the first component of R and the other with the first component of S, say (R, a) and (S, c). The output for key b is (b, [(a1, b1, c1), (a2, b2, c2), ...])
Pseudo Code:
map(key, value):
if key == R:
for (a, b) in value:
emit (b, (R, a))
else:
for (b, c) in value:
emit (b, (S, c))
reduce(key, value):
list_R = [a for (x, a) in values if x == R]
list_S = [c for (x, c) in values if x == S]
for a in list_R:
for c in list S:
emit (key, (a, key, c))
Algorithm to perform Intersection of two sets:
Mappers are fed by all tuples if both R and S relations to be intersected. Reducer emits only tuples that occurred twice. It is possible only if both the sets contain this tuple because tuples include the primary key and can occur in one set only once in each relation.
Map function: Turn each tuple t into a key-value pair (t, t)
Reduce function: If key t has value list [t, t], then produce (t, t) otherwise produce (t, NULL).
Pseudo Code:
map(key, value)
for tuple in value:
emit (tuple, tuple)
reduce(key, value):
if value == [key, key]:
emit (key, key)
For doing Natural join, the relation R(A, B) with S(B, C), it is required to find tuples that agree on their B components, i.e, the second component from tuples of R and the first component of tuples of S. Using the B-value of tuples from either relation as the key, the value will be the other component along with the name of the relation, so that Reduce function can know where each tuple come from.
Map-function
For each tuple (a, b) of R, produce the key-value pair (b, (R, a)). For each tuple (b, c) of S, produce the key-value pair (b, (S, c)).
Reduce function
Each key-value b will be associated with a list of pairs that are either of the form (R, a) or (S, c). Construct all pairs consisting of one with the first component of R and the other with the first component of S, say (R, a) and (S, c). The output for key b is (b, [(a1, b1, c1), (a2, b2, c2), ...])
Pseudo Code:
map(key, value):
if key == R:
for (a, b) in value:
emit (b, (R, a))
else:
for (b, c) in value:
emit (b, (S, c))
reduce(key, value):
list_R = [a for (x, a) in values if x == R]
list_S = [c for (x, c) in values if x == S]
for a in list_R:
for c in list S:
emit (key, (a, key, c))
Algorithm to perform Intersection of two sets:
Mappers are fed by all tuples if both R and S relations to be intersected. Reducer emits only tuples that occurred twice. It is possible only if both the sets contain this tuple because tuples include the primary key and can occur in one set only once in each relation.
Map function: Turn each tuple t into a key-value pair (t, t)
Reduce function: If key t has value list [t, t], then produce (t, t) otherwise produce (t, NULL).
Pseudo Code:
map(key, value)
for tuple in value:
emit (tuple, tuple)
reduce(key, value):
if value == [key, key]:
emit (key, key)