Mining Unstructured Data

The information that many people enter as “tweets” in applications such as Twitter and LinkedIn is unstructured. Tweets posted to such applications resemble our own thought processes. Data mining techniques, by contrast, typically operate on data that is precisely defined. For example, a product survey contains questions such as “Which color do you like the most?” and “Which feature do you like the most?”

By writing some standard OLAP processing logic, one can derive the reports needed for critical business intelligence. In this case, too, a considerable amount of effort goes into data definition, data entry and data analysis.
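To make the structured case concrete, here is a minimal sketch of that kind of aggregation. The survey rows, the question names and the `tally` helper are all made up for illustration; a real OLAP setup would run against a data warehouse rather than an in-memory list.

```python
from collections import Counter

# Hypothetical survey rows: every answer comes from a fixed set of choices,
# which is what makes this data "precisely defined".
responses = [
    {"color": "red", "feature": "battery"},
    {"color": "blue", "feature": "camera"},
    {"color": "red", "feature": "camera"},
    {"color": "red", "feature": "battery"},
]

def tally(rows, question):
    """Count how many respondents picked each answer to one question."""
    return Counter(row[question] for row in rows)

color_counts = tally(responses, "color")
feature_counts = tally(responses, "feature")

# Report the most popular answer for each question.
print(color_counts.most_common(1))
print(feature_counts.most_common(1))
```

Because every answer is one of a known set of values, a simple count is enough to produce the report. Nothing comparable exists for free-form text.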

Tweets contain a lot of unstructured information. Rather than setting up a review committee to review movies, products, packaged food, services and so on, one can construct a review system based on information posted to Twitter, Mouth, LinkedIn, Facebook and similar sites. The obvious challenge is to construct a mining system that is based on both likelihood and statistics.

The user responses, or tweets, will be mapped to a set of possible values. For example, a tweet such as “Oh great I had a good time at the coffee shop” could indicate any value between 7 and 10 on a rating system. When consolidating unstructured information, statistical inferences will be combined with likelihood. So the same tweet can be used to infer two similar situations or viewpoints.
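One way to picture this mapping is the rough sketch below. It assumes a small hand-made phrase lexicon in which each phrase carries a rating range and a likelihood; the phrases, the numbers and the function names `rate_tweet` and `consolidate` are invented for illustration and are not taken from any existing application.

```python
# Hypothetical lexicon: phrase -> (lowest rating, highest rating, likelihood).
# All values are illustrative assumptions, not measured data.
LEXICON = {
    "good time": (7, 10, 0.8),
    "great": (7, 10, 0.6),
    "okay": (4, 6, 0.7),
    "terrible": (1, 3, 0.9),
}

def rate_tweet(text):
    """Return (low, high, likelihood) of the most likely matching phrase, or None."""
    lowered = text.lower()
    matches = [value for phrase, value in LEXICON.items() if phrase in lowered]
    if not matches:
        return None
    return max(matches, key=lambda match: match[2])

def consolidate(tweets):
    """Likelihood-weighted mean of the rating-range midpoints of all rated tweets."""
    rated = [r for r in (rate_tweet(t) for t in tweets) if r is not None]
    total_weight = sum(weight for _, _, weight in rated)
    if total_weight == 0:
        return None
    weighted_sum = sum(((low + high) / 2) * weight for low, high, weight in rated)
    return weighted_sum / total_weight

tweet = "Oh great I had a good time at the coffee shop"
print(rate_tweet(tweet))
print(consolidate([tweet, "terrible service today"]))
```

Plain substring matching like this cannot tell a sincere “Oh great” from a sarcastic one, which is exactly why the rating is kept as a range with a likelihood attached rather than a single fixed score.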

There are already some applications that provide product reviews based on tweets from Twitter. It is now up to developers to build more applications that can effectively consolidate responses in the form of tweets and derive business intelligence from them.
