Readings on Data Mining for Big Data

Big Data has been an interesting topic in the data mining community lately. As of today (17/3/10), a broad Google search for big data returns about 240,000,000 pages. If you are new to big data, see the Wonder Wheel visualization below to find out which terms are related to it.

Further readings on Big Data can be found on these posts:
1. What is Big Data?

Big Data is the “modern scale” at which we are defining our data usage challenges. Big Data begins at the point where we need to start thinking seriously about the technologies used to drive our information needs. While Big Data as a term seems to refer to volume, this isn’t the case. Many existing technologies have little problem physically handling large volumes (TB or PB) of data. Instead, the Big Data challenges result from the combination of volume and our usage demands on that data. And those usage demands are nearly always tied to timeliness.

Big Data is therefore the push to utilize “modern” volumes of data within “modern” timeframes. The exact definitions are of course relative and constantly changing; right now they sit somewhere along the path towards the end goal, which is the ability to handle an unlimited volume of data while processing all requests in real time.

2. Big Data Technologies

Some key points on the big data technologies are summarized in two extended clips:

Big Data Technologies (1:35 minutes)
Key Technology Dimensions (4:52 minutes)

3. Data Mining of Big Data

The Data Mining Renaissance – Hadoop, an open-source implementation of MapReduce.
Algorithms for Massive Data Set Analysis – algorithmic and statistical methods for large-scale data analysis (course)
Method for fast large-scale data mining using logistic regression

4. Current and Future Trends of Big Data

The Pathologies of Big Data – discusses the problems of big data and how to deal with them.
The Future Is Big Data in the Cloud – talks about distributed, non-relational database systems (DNRDBMS) for tackling the “Big Data stack”.
Big Data Is Less About Size, And More About Freedom – the big data trend is about the democratization of large data.
Data Singularity – another way of handling big data!

