Showing posts with label Big Data Analytics. Show all posts

Spark Word Count program in Python

Here is a word count program in Python using Spark (PySpark) and Hadoop (HDFS). In this tutorial, you will learn how to process data in Spark using...

Explain Hadoop Ecosystem and briefly explain its components.

Hadoop is a framework for dealing with Big Data, but unlike other frameworks it is not a single, simple framework: it has its own family of components for processing different things, which is tied...

Explain how to compute Page Rank for any web graph.

Page Rank is a function that assigns a real number to each page in the Web. The intent is that the higher the Page Rank of a page, the more...
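
To make the idea concrete, here is a minimal sketch of the basic PageRank iteration in plain Python. The four-page graph `web`, the function name `pagerank`, and the iteration count are illustrative assumptions, not taken from the post. The sketch also assumes every page has at least one out-link and applies no taxation (damping).

```python
def pagerank(links, iterations=100):
    """Basic power iteration: each page splits its rank evenly over its out-links.

    Assumes every page has at least one out-link (no dead ends).
    """
    pages = sorted(links)
    # Start from a uniform distribution over all pages.
    rank = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        new_rank = {page: 0.0 for page in pages}
        for page, outs in links.items():
            share = rank[page] / len(outs)
            for target in outs:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical web graph: page -> list of pages it links to.
web = {
    "A": ["B", "C", "D"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B", "C"],
}
ranks = pagerank(web)
for page in sorted(ranks):
    print(page, round(ranks[page], 4))
```

On this small graph the ranks always sum to 1, and page A, which receives links from both B and C, ends up with the highest value.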

Give Map Reduce algorithm for Natural Join of two relations and Intersection of two sets.

Algorithm for Natural Join: To compute the natural join of the relation R(A, B) with S(B, C), we need to find tuples that agree on their B components, i.e., the second...
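
The join on the shared B component can be sketched as a single-process simulation of the map and reduce phases. The function names, the `"R"`/`"S"` tags, and the sample relations are assumptions made for illustration; a real job would run on a MapReduce framework rather than in-memory lists.

```python
from collections import defaultdict


def map_phase(R, S):
    """Emit (b, tagged value) pairs so tuples with the same b meet at one reducer."""
    for a, b in R:
        yield b, ("R", a)
    for b, c in S:
        yield b, ("S", c)


def reduce_phase(pairs):
    """Group by key b, then pair every R value with every S value for that b."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    for b, values in groups.items():
        r_side = [v for tag, v in values if tag == "R"]
        s_side = [v for tag, v in values if tag == "S"]
        for a in r_side:
            for c in s_side:
                yield (a, b, c)


def natural_join(R, S):
    return sorted(reduce_phase(map_phase(R, S)))


# Hypothetical sample relations R(A, B) and S(B, C).
R = [(1, 2), (3, 2), (4, 5)]
S = [(2, 7), (5, 8), (6, 9)]
joined = natural_join(R, S)
print(joined)
```

The S tuple with B = 6 has no partner in R, so it produces no output tuple.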

Explain how dead ends are handled in Page Rank.

A dead end is a Web Page with no links out. The presence of dead ends will cause the Page Rank of some or all the pages to go to...

Give 2-step Map Reduce algorithm to multiply two large matrices.

M is a matrix with element m_ij in row i and column j. N is a matrix with element n_jk in row j and column k. P is a matrix = MN...
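
One common way to organise the two steps is sketched below in plain Python: the first step groups elements by the shared index j and emits the products, and the second step sums the products for each (i, k). The function names `step1` and `step2`, the dictionary representation of the matrices, and the 2x2 sample values are assumptions for illustration only.

```python
from collections import defaultdict


def step1(M, N):
    """Map: key every element by the shared index j. Reduce: emit ((i, k), m * n)."""
    by_j = defaultdict(lambda: {"M": [], "N": []})
    for (i, j), m in M.items():
        by_j[j]["M"].append((i, m))
    for (j, k), n in N.items():
        by_j[j]["N"].append((k, n))
    for group in by_j.values():
        for i, m in group["M"]:
            for k, n in group["N"]:
                yield (i, k), m * n


def step2(products):
    """Map is the identity. Reduce: sum all products that share the key (i, k)."""
    P = defaultdict(int)
    for key, value in products:
        P[key] += value
    return dict(P)


# Hypothetical 2x2 matrices stored as {(row, column): value}.
M = {(0, 0): 1, (0, 1): 2, (1, 0): 3, (1, 1): 4}
N = {(0, 0): 5, (0, 1): 6, (1, 0): 7, (1, 1): 8}
P = step2(step1(M, N))
print(P)
```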

Give problem in Flajolet-Martin (FM) Algorithm to count distinct elements in a stream.

To estimate the number of different elements appearing in a stream, we can hash elements to integers interpreted as binary numbers. 2 raised to the power that is the longest...
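
A minimal sketch of this estimate follows. It assumes the truncated sentence refers to the longest run of trailing zeros (the "tail length") seen among the hashed values, as in the usual description of the algorithm. The use of MD5 truncated to 32 bits as the hash function, the function names, and the sample stream are assumptions for illustration. A single hash function gives a coarse estimate that is always a power of two.

```python
import hashlib


def trailing_zeros(n):
    """Number of trailing zero bits in n (defined as 0 for n == 0 in this sketch)."""
    if n == 0:
        return 0
    count = 0
    while n % 2 == 0:
        n //= 2
        count += 1
    return count


def hash_to_int(element):
    """Deterministic 32-bit hash of an element (MD5 is an arbitrary choice here)."""
    digest = hashlib.md5(str(element).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def fm_estimate(stream):
    """Estimate the number of distinct elements as 2 ** (longest tail length)."""
    longest = 0
    for element in stream:
        longest = max(longest, trailing_zeros(hash_to_int(element)))
    return 2 ** longest


# Hypothetical stream with repeated elements.
stream = ["a", "b", "c", "a", "b", "d", "e", "a"]
estimate = fm_estimate(stream)
print(estimate)
```

Because the estimate depends only on the maximum tail length, repeating elements in the stream does not change the result.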

Find the Manhattan distance (L1-norm) and Euclidean distance (L2-norm) for X = (1, 2, 2) and Y = (2, 5, 3)

1) Manhattan distance (L1) is given as d(X, Y) = Σ |x_i - y_i|. Here, d(X, Y) = |1 - 2| + |2 - 5| + |2 - 3| ...
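
The arithmetic can be checked with a short Python sketch. The function names `manhattan` and `euclidean` are illustrative; the formulas are the standard L1 and L2 distance definitions applied to the X and Y from the question.

```python
import math


def manhattan(x, y):
    """L1 distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def euclidean(x, y):
    """L2 distance: square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


X = (1, 2, 2)
Y = (2, 5, 3)
l1 = manhattan(X, Y)  # 1 + 3 + 1 = 5
l2 = euclidean(X, Y)  # sqrt(1 + 9 + 1) = sqrt(11), about 3.317
print(l1, round(l2, 3))
```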

Define the 3 V's of Big Data

Big Data is characterized by 3 V's, which are as follows: 1) Volume, 2) Velocity, 3) Variety. 1) Volume (Data at Rest) -> The name 'Big Data' itself is related...