Wednesday 14 August 2013


Some of the most sought-after open source big data tools



Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.

  

The data is too big, moves too fast, or doesn't fit the structures of your database architectures. To gain value from this data, you must choose an alternative way to process it. The challenges include capture, storage, search, sharing, transfer, analysis and visualization. So a need arises to understand how big data works. Here are some open source tools you can try your hands at.

Apache Hadoop

The most well-known tool for big data deployment, Hadoop has been the driving force behind the growth of the big data industry. You'll hear it mentioned often, along with allied technologies such as Hive and Pig. But what does it do, and why do you need all its strangely named friends, such as Oozie, ZooKeeper and Flume?
Hadoop brings the ability to cheaply process large amounts of data, regardless of its structure. By large, we mean from 10-100 gigabytes and above. How is this different from what went before?
Existing enterprise data warehouses and relational databases excel at processing structured data and can store enormous amounts of it, though at a cost: the requirement for structure restricts the kinds of data that can be processed, and it imposes an inertia that makes data warehouses unsuitable for agile exploration of substantial heterogeneous data. The amount of effort required to warehouse data often means that valuable data sources in organizations are never mined. This is where Hadoop can make a big difference.
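To get a feel for the MapReduce model at Hadoop's core, here is a minimal sketch in plain Python. A real job would run these phases in parallel across a cluster (for example via Hadoop Streaming); this toy version just simulates the map, shuffle, and reduce steps locally for a word count:

```python
# Toy simulation of Hadoop's MapReduce phases (not the Hadoop API):
# map emits (key, value) pairs, shuffle groups them by key, reduce
# aggregates each group. A real cluster distributes each phase.
from collections import defaultdict

def map_phase(line):
    """Map: emit (word, 1) for every word in a line of input."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all values by key, as Hadoop does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: sum the counts for one word."""
    return key, sum(values)

def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

print(word_count(["big data moves fast", "big data is big"]))
```

Because each mapper works on its own slice of input and each reducer works on its own group of keys, the same logic scales from one laptop to thousands of machines.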

Apache Kafka 

Apache Kafka is a distributed publish-subscribe messaging system developed by LinkedIn. It was created with performance, availability and scalability in mind, and serves as the messaging backbone at LinkedIn.
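The publish-subscribe idea behind Kafka can be sketched in a few lines. This is not the Kafka client API, just a toy in-memory model of its core design: each topic is an append-only log, and consumers keep track of their own read offsets:

```python
# Toy model of Kafka-style publish-subscribe (not the Kafka API):
# each topic is an append-only log; a consumer remembers how far it
# has read (its offset) and can resume from there at any time.
class MiniBroker:
    def __init__(self):
        self.topics = {}  # topic name -> list of messages (the log)

    def publish(self, topic, message):
        """Append a message to the topic's log."""
        self.topics.setdefault(topic, []).append(message)

    def consume(self, topic, offset):
        """Return all messages after `offset`, plus the new offset."""
        log = self.topics.get(topic, [])
        return log[offset:], len(log)

broker = MiniBroker()
broker.publish("page-views", "user1 viewed /home")
broker.publish("page-views", "user2 viewed /about")
messages, offset = broker.consume("page-views", 0)
print(messages)  # the consumer resumes later from `offset`
```

Because messages stay in the log rather than being deleted on delivery, many independent consumers can read the same topic at their own pace, which is what makes the design suit both real-time processing and replay.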

Talend 

Talend is an open source software vendor that provides data integration, data management, enterprise application integration and big data software and solutions. Headquartered in Suresnes, France and Los Altos, California, Talend has offices in North America, Europe and Asia, and a global network of technical and services partners. Customers include eBay, Virgin Mobile, Sony Online Entertainment, Deutsche Post and Allianz. Talend is an Apache Software Foundation sponsor, and many of its engineers are major contributors to Apache projects including CXF, Camel, ServiceMix, Karaf, Santuario and ActiveMQ. The company has also joined the Java Community Process (JCP) and is a Strategic Developer Member of the Eclipse Foundation, a Corporate Member of OW2, and has active involvement in the OASIS project. Source: http://en.wikipedia.org/wiki/Talend

Apache HBase


This is the Hadoop database: a distributed, scalable, big data store. You can use it when you need random, realtime read/write access to your big data.
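HBase's data model is worth a quick sketch. Rather than the HBase client API (which needs a live cluster), here is a dict-based toy showing the shape of the model: a table maps a row key to columns named `family:qualifier`, and lookup by row key is the cheap, random-access path HBase is built around:

```python
# Dict-based toy of HBase's data model (not the HBase client API):
# table -> row key -> {"family:qualifier": value}. Reads and writes
# by row key are the fast, random-access operations HBase optimizes.
class MiniHBaseTable:
    def __init__(self):
        self.rows = {}  # row key -> {column: value}

    def put(self, row_key, column, value):
        """Write one cell: (row key, 'family:qualifier') -> value."""
        self.rows.setdefault(row_key, {})[column] = value

    def get(self, row_key):
        """Random read: fetch all columns for one row key."""
        return self.rows.get(row_key, {})

table = MiniHBaseTable()
table.put("user#42", "info:name", "Ada")
table.put("user#42", "info:city", "London")
print(table.get("user#42"))
```

The row-key-centric design is why HBase complements Hadoop: MapReduce scans whole datasets in bulk, while HBase answers "give me this one row, now" queries.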

Bigdata

This is a scale-out storage and computing fabric that supports optional transactions, very high concurrency and very high I/O rates.

Pentaho

This software tightly couples data integration with business analytics in one platform, bringing together IT and business users to easily access, visualize and explore all data that impacts business results.
