Data Mining Search and Twitter Trend in 3D

Following up on my previous post, “Data Mining In Different Views”, I would like to add one more view of data mining information from web search engines. Using Tianamo, a browser-based 3D information visualization interface, we can view data mining search results from the Yahoo search engine in 3D. You can view the latest 3D results for “Data Mining” by clicking the image below, but note that your browser needs the Java Runtime Environment (JRE) installed on your machine. The advantage of viewing search results in 3D is that you can identify the main keywords for your topic just by looking at which keyword has the “highest mountain” in the overall 3D view. For instance, in the current view, the main keyword in the data mining search results is “report”.


Screen Scraping vs Data Mining vs Web Mining

I know the title has a lot of “vs” in it, but I want to highlight the differences between all of these terms together. There is currently a video on YouTube titled “screen scraping, web data mining, web data scraping”, and I want to clarify its misleading title. You can watch the video below:

Read my posts “Data Mining vs Screen-Scraping” and “Data Mining vs Web Mining” for the full picture. Here I just want to highlight some of the main differences:

Screen scraping originally meant extracting characters from screens so that they could be analyzed. Today it most commonly refers to extracting information from web sites. That is, computer programs can “crawl” or “spider” through web sites, pulling out data. People often do this to build things like comparison shopping engines, to archive web pages, or simply to download text to a spreadsheet so that it can be filtered and analyzed.
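As a rough illustration of screen scraping in the web sense, here is a minimal sketch using only Python's standard `html.parser` module. The HTML snippet, the `product` and `price` class names, and the `PriceScraper` and `scrape_prices` names are all made up for this example; a real scraper would first download the page and would have to cope with much messier markup.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this over HTTP.
SAMPLE_HTML = """
<html><body>
  <table>
    <tr><td class="product">Widget</td><td class="price">9.99</td></tr>
    <tr><td class="product">Gadget</td><td class="price">24.50</td></tr>
  </table>
</body></html>
"""


class PriceScraper(HTMLParser):
    """Collects (product, price) pairs from <td> cells tagged by class."""

    def __init__(self):
        super().__init__()
        self.current_field = None  # class of the <td> we are inside, if any
        self.current_row = {}      # fields gathered for the current <tr>
        self.rows = []             # finished (product, price) tuples

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.current_field = dict(attrs).get("class")

    def handle_data(self, data):
        # Only keep text that sits inside a cell we care about.
        if self.current_field in ("product", "price"):
            self.current_row[self.current_field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self.current_field = None
        elif tag == "tr" and self.current_row:
            row = self.current_row
            self.rows.append((row["product"], float(row["price"])))
            self.current_row = {}


def scrape_prices(html):
    """Return a list of (product, price) tuples pulled out of the HTML."""
    parser = PriceScraper()
    parser.feed(html)
    return parser.rows


if __name__ == "__main__":
    for product, price in scrape_prices(SAMPLE_HTML):
        print(product, price)
```

Note that the scraper only pulls the data out. Analyzing it afterwards is a separate step.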

Data mining is defined by Wikipedia as the “practice of automatically searching large stores of data for patterns.” In other words, you already have the data, and you’re now analyzing it to learn useful things about it. Data mining often involves complex algorithms based on statistical methods. It has nothing to do with how you got the data in the first place. In data mining you only care about analyzing what’s already there.
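To make “searching data for patterns” a little more concrete, here is a small sketch that counts which pairs of items occur together often in a list of transactions. The shopping-basket data and the `frequent_pairs` function are invented for illustration, and this simple counting approach is only one of many possible pattern searches, not a full data mining algorithm.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs that appear together in at least min_support transactions."""
    counts = Counter()
    for items in transactions:
        # Sort and de-duplicate so each pair is counted once per transaction
        # and always appears in the same order.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


if __name__ == "__main__":
    # Hypothetical data that we "already have" and now want to analyze.
    baskets = [
        ["bread", "milk"],
        ["bread", "butter", "milk"],
        ["butter", "milk"],
        ["bread", "milk", "eggs"],
    ]
    print(frequent_pairs(baskets, min_support=3))
```

The function never asks where the baskets came from, which matches the point above: the analysis starts from data that is already there.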

Web mining, on the other hand, is the application of data mining techniques to discover patterns from the Web. Depending on the analysis target, web mining can be divided into three types: Web usage mining, Web content mining, and Web structure mining.

For more information about data mining, click here


Top 10 Search Competition of Data Mining Algorithms in Google, Yahoo & Bing

Following up on my earlier post, “Data Mining Trends”, I wanted to find out how popular data mining algorithms are in the major search engines (Google, Yahoo and Bing). Using a keyword research tool, I pulled out the top 10 data mining algorithms in these search engines. For the search competition, I used the “keyword exists anywhere” setting, which matches the keyword anywhere in the page. I also added the keyword “algorithm” to each algorithm name to make it clear that we are searching for data mining algorithms and not something else. My approach may be wrong; if so, please correct me.

Here are the results for the top 10 data mining algorithms:

Data Mining Algorithm                 Google       Yahoo        Bing
C4.5 ALGORITHM                    18,100,000   1,060,000     562,000
REGRESSION ALGORITHM              11,900,000   5,810,000   1,260,000
APRIORI ALGORITHM                  9,140,000      59,900   1,090,000
NEURAL NETWORK ALGORITHM           8,870,000   7,870,000   1,560,000
K-MEANS ALGORITHM                  4,680,000     219,000   9,470,000
SUPPORT VECTOR MACHINE ALGORITHM   4,440,000   4,310,000   1,120,000
ID3 ALGORITHM                      3,380,000     491,000     389,000
NEAREST NEIGHBORS ALGORITHM        2,370,000     763,000     530,000
GENETIC ALGORITHM                  1,790,000  10,600,000   1,840,000
RIPPER ALGORITHM                     487,000   1,650,000     350,000
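The table is sorted by the Google count, but the order looks quite different for the other engines. As a small sketch, the figures above can be loaded into a dictionary and re-ranked per engine. The `RESULTS` and `rank_by` names are my own; the numbers are copied from the table.

```python
# Result counts copied from the table: (Google, Yahoo, Bing).
RESULTS = {
    "C4.5": (18_100_000, 1_060_000, 562_000),
    "REGRESSION": (11_900_000, 5_810_000, 1_260_000),
    "APRIORI": (9_140_000, 59_900, 1_090_000),
    "NEURAL NETWORK": (8_870_000, 7_870_000, 1_560_000),
    "K-MEANS": (4_680_000, 219_000, 9_470_000),
    "SUPPORT VECTOR MACHINE": (4_440_000, 4_310_000, 1_120_000),
    "ID3": (3_380_000, 491_000, 389_000),
    "NEAREST NEIGHBORS": (2_370_000, 763_000, 530_000),
    "GENETIC": (1_790_000, 10_600_000, 1_840_000),
    "RIPPER": (487_000, 1_650_000, 350_000),
}

ENGINES = ("Google", "Yahoo", "Bing")


def rank_by(engine):
    """Return algorithm names sorted by result count for one engine, highest first."""
    column = ENGINES.index(engine)
    return sorted(RESULTS, key=lambda name: RESULTS[name][column], reverse=True)


if __name__ == "__main__":
    for engine in ENGINES:
        print(engine, "top 3:", rank_by(engine)[:3])
```

Each engine puts a different algorithm in first place, which is a reminder that raw result counts depend heavily on the engine and should be read with caution.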

 


When To Use a Genetic Algorithm For a Data Mining Task?

You already have one or more models for your data, but you are not sure whether they are accurate enough for predictive data mining. One way to optimize a predictive model is to use a genetic algorithm (one application of evolutionary computation). According to Wikipedia:

A genetic algorithm (GA) is a search technique used in computing to find exact or approximate solutions to optimization and search problems. Genetic algorithms are categorized as global search heuristics. Genetic algorithms are a particular class of evolutionary algorithms (EA) that use techniques inspired by evolutionary biology such as inheritance, mutation, selection, and crossover.

Currently, genetic algorithms find application in bioinformatics, phylogenetics, computational science, engineering, economics, chemistry, manufacturing, mathematics, physics and other fields.

Read the white paper “Using Genetic Algorithms for Parameter Optimization in Building Predictive Data Mining Models”, which frames the search for optimal model-building parameters as an optimization problem and examines the usefulness of genetic algorithms for it. The authors perform experiments on several datasets and report empirical results showing the applicability of genetic algorithms to this problem.
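To give a feel for the idea, here is a minimal sketch of a genetic algorithm searching over two model-building parameters. It is not the method from the white paper. The `validation_score` function is a hypothetical stand-in for training a model and measuring its accuracy, and the parameter names (`depth`, `learning_rate`), their ranges, and the mutation settings are all assumptions chosen for illustration.

```python
import random


def validation_score(params):
    """Hypothetical stand-in for a model's validation accuracy.

    A real version would train a model with these parameters and evaluate it.
    This toy function peaks at depth=6, learning_rate=0.1.
    """
    depth, learning_rate = params
    return 1.0 - 0.01 * (depth - 6) ** 2 - 4.0 * (learning_rate - 0.1) ** 2


def random_params(rng):
    """Draw one random candidate: (depth in 1..15, learning_rate in 0.01..0.5)."""
    return (rng.randint(1, 15), rng.uniform(0.01, 0.5))


def crossover(a, b, rng):
    """Inheritance: the child takes each parameter from one of its two parents."""
    return tuple(rng.choice(pair) for pair in zip(a, b))


def mutate(params, rng, rate=0.3):
    """Mutation: occasionally nudge a parameter, staying inside its range."""
    depth, learning_rate = params
    if rng.random() < rate:
        depth = min(15, max(1, depth + rng.choice([-1, 1])))
    if rng.random() < rate:
        learning_rate = min(0.5, max(0.01, learning_rate + rng.gauss(0, 0.05)))
    return (depth, learning_rate)


def genetic_search(generations=30, population_size=20, seed=0):
    """Return the best parameter tuple found after the given number of generations."""
    rng = random.Random(seed)
    population = [random_params(rng) for _ in range(population_size)]
    for _ in range(generations):
        # Selection: keep the better half unchanged as parents.
        population.sort(key=validation_score, reverse=True)
        parents = population[: population_size // 2]
        children = []
        while len(parents) + len(children) < population_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return max(population, key=validation_score)


if __name__ == "__main__":
    best = genetic_search()
    print("best parameters:", best, "score:", validation_score(best))
```

Because the better half of each generation is carried over unchanged, the best score never gets worse from one generation to the next. In practice, each call to the scoring function means training a model, so the population size and number of generations have to be balanced against training cost.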

