March, 2010 | My Data Mine

Data Miner Survey Report for 2009

March 31, 2010

According to the 2009 Rexer Analytics Data Miner Survey, in which 710 data miners from 58 countries participated, the algorithms data miners have used most often over the past three years are regression, decision trees, and cluster analysis. The report also notes that almost half of industry data miners rate their company's analytic capabilities as above average or excellent, while 19% feel their company has minimal or no analytic capabilities.

Other key points of the survey:

IBM SPSS Modeler (SPSS Clementine), Statistica, and IBM SPSS Statistics (SPSS Statistics) are identified as the “primary tools” used by the most data miners.
Open-source tools Weka and R moved up substantially in data miners’ tool rankings this year, and are now used by large numbers of both academic and for-profit data miners.
SAS Enterprise Miner dropped in data miners’ tool rankings this year.
Users of IBM SPSS Modeler, Statistica, and RapidMiner are the most satisfied with their software.
Fields & Industries: the fields highlighted are CRM / Marketing, Academic, Financial Services, and IT / Telecom. In the for-profit sector, the departments data miners most frequently work in are Marketing & Sales and Research & Development.

Blog, Data Mining

Data Mining Search and Twitter Trend in 3D

March 28, 2010

Following up on my previous post, “Data Mining In Different Views”, I would like to add one more view of data mining information from web search engines. Using Tianamo, a browser-based 3D information visualization interface, we can view data mining search results from the Yahoo search engine in 3D. You can view the latest 3D results for “Data Mining” by clicking the image below, but note that your browser needs the Java Runtime Environment (JRE) installed on your machine. The advantage of viewing search results in 3D is that you can identify the main keywords for your search topic just by looking at which keyword has the “highest mountain” in the overall 3D view. For instance, in the current view, the main keyword in the data mining search results is “report”.

Data Mining, Data Mining Trends

Screen Scraping vs Data Mining vs Web Mining

March 26, 2010

I know the title has a lot of ‘vs’ in it, but I want to highlight the differences between all of them in one place. There is currently a video on YouTube titled “screen scraping, web data mining, web data scraping”, and I want to clarify its misleading title. You can watch the video below:

Read my posts “Data Mining vs Screen-Scraping” and “Data Mining vs Web Mining” for the full picture. Here I just want to highlight some of the main differences:

Screen scraping originally meant extracting characters from screens so that they could be analyzed. It now most commonly refers to extracting information from web sites. That is, computer programs can “crawl” or “spider” through web sites, pulling out data. People often do this to build things like comparison shopping engines, to archive web pages, or simply to download text into a spreadsheet so that it can be filtered and analyzed.
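To make the “pulling out data” step concrete, here is a minimal sketch in Python that uses only the standard library. The HTML fragment, the class name `TableCellScraper`, and the function `scrape_rows` are all made up for illustration; a real scraper would first download the page from a web site.

```python
from html.parser import HTMLParser

# Hypothetical page fragment, standing in for a downloaded web page.
SAMPLE_HTML = """
<html><body>
  <h1>Price list</h1>
  <table>
    <tr><td>Widget</td><td>4.50</td></tr>
    <tr><td>Gadget</td><td>12.00</td></tr>
  </table>
</body></html>
"""


class TableCellScraper(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])       # start a new row
        elif tag == "td":
            if not self.rows:          # tolerate a cell outside any row
                self.rows.append([])
            self.rows[-1].append("")   # start a new, empty cell
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        # Only text inside a cell is kept; everything else is ignored.
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


def scrape_rows(html):
    """Return the table cells of `html` as a list of rows of strings."""
    parser = TableCellScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    for row in scrape_rows(SAMPLE_HTML):
        print("\t".join(row))   # tab-separated, ready to paste into a spreadsheet
```

Notice that the sketch only extracts the data. It does not analyze it, which is exactly the difference from data mining.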

Data mining is defined by Wikipedia as the “practice of automatically searching large stores of data for patterns.” In other words, you already have the data, and you are now analyzing it to learn useful things from it. Data mining often involves complex algorithms based on statistical methods. It has nothing to do with how you got the data in the first place; in data mining you only care about analyzing what is already there.
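As a rough illustration of “searching data for patterns”, the sketch below counts which pairs of items appear together most often in a small set of shopping baskets. The baskets and the function name `frequent_pairs` are invented for this example, and the approach is a simple pair count, not a full association-rule algorithm such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data that we already hold; how it was collected does not matter here.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "eggs"},
]


def frequent_pairs(transactions, min_support):
    """Return item pairs that occur together in at least `min_support` baskets."""
    counts = Counter()
    for basket in transactions:
        # Sorting gives every pair one canonical order, e.g. ("bread", "milk").
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


if __name__ == "__main__":
    for pair, n in sorted(frequent_pairs(TRANSACTIONS, 2).items()):
        print(pair, n)
```

With these baskets and a minimum support of 2, the pairs that survive are bread and milk (3 baskets), bread and butter (2), and butter and milk (2).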

Web mining, on the other hand, is the application of data mining techniques to discover patterns from the Web. Depending on the analysis target, web mining can be divided into three types: Web usage mining, Web content mining, and Web structure mining.
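Of the three types, Web usage mining is probably the easiest to sketch: it looks for patterns in how visitors use a site. The example below counts page hits in a few web server log lines. The log lines are made up and only loosely follow the common access-log layout, and the function name `page_hits` is an assumption for illustration.

```python
from collections import Counter

# Hypothetical access-log lines; real logs would be read from a file.
LOG_LINES = [
    '10.0.0.1 - - [26/Mar/2010:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.2 - - [26/Mar/2010:10:00:05 +0000] "GET /about.html HTTP/1.1" 200 300',
    '10.0.0.1 - - [26/Mar/2010:10:00:09 +0000] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.3 - - [26/Mar/2010:10:01:00 +0000] "POST /contact HTTP/1.1" 200 64',
    '10.0.0.3 - - [26/Mar/2010:10:01:30 +0000] "GET /index.html HTTP/1.1" 200 512',
    'this line is malformed and is skipped',
]


def page_hits(lines):
    """Count GET requests per path, skipping lines that cannot be parsed."""
    hits = Counter()
    for line in lines:
        try:
            request = line.split('"')[1]            # e.g. 'GET /index.html HTTP/1.1'
            method, path, _protocol = request.split()
        except (IndexError, ValueError):
            continue                                # not a recognizable request line
        if method == "GET":
            hits[path] += 1
    return hits


if __name__ == "__main__":
    for path, n in page_hits(LOG_LINES).most_common():
        print(path, n)
```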

For more information about data mining, click here.

Blog, Data Mining

Top 10 Search Competition of Data Mining Algorithms in Google, Yahoo & Bing

March 10, 2010

Following up on my earlier post, “Data Mining Trends”, I wanted to know how popular data mining algorithms are in the major search engines (Google, Yahoo & Bing). Using a keyword research tool, I pulled out the top 10 data mining algorithms in these search engines. For the search competition, I used the setting “keyword exists anywhere” in the page document. I also added the keyword “algorithm” to each algorithm name to make it clear that we are searching for data mining algorithms and not something else. My approach may be wrong; if so, please correct me.

Here are the results for the top 10 data mining algorithms:

Data Mining Algorithms	Google	Yahoo	Bing
C4.5 ALGORITHM	18,100,000	1,060,000	562,000
REGRESSION ALGORITHM	11,900,000	5,810,000	1,260,000
APRIORI ALGORITHM	9,140,000	59,900	1,090,000
NEURAL NETWORK ALGORITHM	8,870,000	7,870,000	1,560,000
K-MEANS ALGORITHM	4,680,000	219,000	9,470,000
SUPPORT VECTOR MACHINE ALGORITHM	4,440,000	4,310,000	1,120,000
ID3 ALGORITHM	3,380,000	491,000	389,000
NEAREST NEIGHBORS ALGORITHM	2,370,000	763,000	530,000
GENETIC ALGORITHM	1,790,000	10,600,000	1,840,000
RIPPER ALGORITHM	487,000	1,650,000	350,000
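
The table is ordered by the Google column, so the picture for Yahoo and Bing is harder to read at a glance. As a small sketch, the Python snippet below re-ranks the same numbers per search engine. The counts are copied from the table; the names `RESULT_COUNTS`, `build_query` and `rank_by_engine` are made up for this example.

```python
# Counts copied from the table above, in the order (Google, Yahoo, Bing).
RESULT_COUNTS = {
    "C4.5": (18_100_000, 1_060_000, 562_000),
    "REGRESSION": (11_900_000, 5_810_000, 1_260_000),
    "APRIORI": (9_140_000, 59_900, 1_090_000),
    "NEURAL NETWORK": (8_870_000, 7_870_000, 1_560_000),
    "K-MEANS": (4_680_000, 219_000, 9_470_000),
    "SUPPORT VECTOR MACHINE": (4_440_000, 4_310_000, 1_120_000),
    "ID3": (3_380_000, 491_000, 389_000),
    "NEAREST NEIGHBORS": (2_370_000, 763_000, 530_000),
    "GENETIC": (1_790_000, 10_600_000, 1_840_000),
    "RIPPER": (487_000, 1_650_000, 350_000),
}
ENGINES = ("Google", "Yahoo", "Bing")


def build_query(name):
    """Append the keyword "algorithm" to a name, as described in the post."""
    return f"{name} algorithm".upper()


def rank_by_engine(engine):
    """Return the algorithm names sorted by count for one engine, highest first."""
    column = ENGINES.index(engine)
    return sorted(
        RESULT_COUNTS,
        key=lambda name: RESULT_COUNTS[name][column],
        reverse=True,
    )


if __name__ == "__main__":
    for engine in ENGINES:
        top3 = [build_query(name) for name in rank_by_engine(engine)[:3]]
        print(engine, top3)
```

Ranked this way, the top entry differs per engine: C4.5 on Google, genetic algorithm on Yahoo, and k-means on Bing.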