Top 10 Search Competition of Data Mining Software in Google, Yahoo & Bing

In my previous post, I wrote about “Top 10 Search Competition of Data Mining Algorithms in Google, Yahoo & Bing”. I would also like to know how popular existing data mining software and tools are in the major search engines: Google, Yahoo and Bing. As before, I used a keyword research tool to pull out the top 10 data mining software packages in these search engines. FYI, for the search competition I used the “keyword exists anywhere” setting, which matches the keyword anywhere in the page document. I also added the keyword “data mining” to each software name to make sure the search targets data mining software and not something else. Please correct me if my approach is wrong.
From the results, the top 3 data mining software packages in Google are Excel, R and Statistica. These three also dominate the top 10 search results for Yahoo and Bing. However, the “R data mining” results for these search engines are debatable, because the name is a single letter that can match almost anything (for example, an abbreviated name).
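To make the approach easier to check, here is a minimal Python sketch of how the queries could be built and ranked. It is only an illustration: the function names `build_query` and `rank_by_competition` are my own, and the result counts have to come from whatever keyword research tool you use, since nothing here talks to a search engine.

```python
def build_query(software_name, qualifier="data mining"):
    """Append the qualifier so the search stays on data mining software."""
    return f"{software_name} {qualifier}"


def rank_by_competition(result_counts, top_n=10):
    """Sort tools by reported result count, largest first, and keep the top N.

    result_counts maps a tool name to the number of pages a search engine
    reported for its query. The numbers must be supplied by the caller.
    """
    ranked = sorted(result_counts.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:top_n]]


# Tool names taken from the post; the queries are what would be typed
# into the keyword research tool.
tools = ["Excel", "R", "Statistica"]
queries = {name: build_query(name) for name in tools}
```

A query such as "R data mining" shows the weakness mentioned above: the single letter can match many pages that have nothing to do with the R software.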


Video Compilation of Practical Data Mining Techniques

Nowadays, it is very difficult for data mining beginners to find free video tutorials on data mining techniques in a single place. For that reason, I have created a new page on my site to showcase a video compilation of practical data mining techniques using well-known software and tools available on the market today. The videos are grouped by data mining technique: classification, clustering, regression, neural networks and so on. I will try to update the list every week if new videos become available on the web. If you know other sources of video tutorials that use software or tools not listed here, feel free to add them in the comment section.
For an Overview of Data Mining Techniques, visit here.

1. Classification
Software: Rapidminer 5.0
This video shows how to import training and prediction data, add a classification learner, and apply the model.
Application: building a Gold Classification trend model.

Software: Rapidminer 5.0
This video shows how to use Rapidminer to create a decision tree to help us find “sweet spots” in a particular market segment. This video tutorial uses the Rapidminer direct mail marketing data generator and a split validation operator to build the decision tree.
Application: creating Decision Trees for Market Segmentation.
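For readers who prefer code to video, the split validation idea can be sketched in plain Python. This is not what Rapidminer does internally: as a simplifying assumption I use a one-level decision tree (a "decision stump") instead of a full tree, and the direct mail rows below are invented purely for illustration.

```python
import random


def split_validation(rows, train_ratio=0.7, seed=0):
    """Shuffle the rows and split them into a training part and a test part."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def majority(labels):
    """Most frequent label; ties are broken alphabetically."""
    return max(sorted(set(labels)), key=labels.count)


def train_stump(rows):
    """Find the (feature, threshold) split with the fewest training errors.

    Each row is (features, label). Returns
    (feature_index, threshold, left_label, right_label).
    """
    best = None
    for f in range(len(rows[0][0])):
        for threshold in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= threshold]
            right = [y for x, y in rows if x[f] > threshold]
            if not left or not right:
                continue
            l_label, r_label = majority(left), majority(right)
            errors = (sum(y != l_label for y in left)
                      + sum(y != r_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, l_label, r_label)
    if best is None:
        raise ValueError("no valid split found")
    return best[1:]


def predict(stump, features):
    f, threshold, l_label, r_label = stump
    return l_label if features[f] <= threshold else r_label


def accuracy(stump, rows):
    return sum(predict(stump, x) == y for x, y in rows) / len(rows)


# Made-up direct mail data: (age, income in thousands) -> response.
ages = [25, 40, 31, 58, 22, 47, 36, 29, 53, 44]
incomes = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
rows = [((a, i), "respond" if i > 50 else "ignore") for a, i in zip(ages, incomes)]

train_rows, test_rows = split_validation(rows)
stump = train_stump(train_rows)
```

Calling `accuracy(stump, test_rows)` gives the held-out performance, which is the number a split validation step is meant to report. A real tree would keep splitting each branch in the same way.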

This video shows how to use CHAID decision trees to classify good and bad credit risk. CHAID decision trees are particularly well suited to large data sets and are often applied in marketing segmentation. This session discusses the analysis options in STATISTICA and reviews the CHAID output, including the decision trees and performance indices.

2. Clustering
This video shows how to use clustering tools available in STATISTICA Data Miner and demonstrates the K-means clustering tool as well as the Kohonen network clustering tool.
Application: Clustering tools are beneficial when you want to find structure or clusters in data, but don't necessarily have a target variable. Clustering is often used in marketing research applications as well as many others.

Software: WEKA
This video shows how to use the clustering algorithms available in WEKA and demonstrates the K-means clustering tool.
Application: building a cluster model of bank customers based on mortgage applications.
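If you want to see what K-means does behind the buttons, here is a small plain-Python sketch of the standard Lloyd's iteration. It is not the STATISTICA or WEKA implementation, and the six sample points are made up for illustration.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm. Returns (centroids, assignments)."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignments == assignments:
            break  # nothing moved, so the clustering has converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignments


# Two obvious groups of invented points, no target variable needed.
points = [(1, 1), (1, 2), (2, 1), (10, 10), (10, 11), (11, 10)]
centroids, assignments = kmeans(points, k=2)
```

The `assignments` list gives a cluster number for each point, which plays the role of the cluster column the tools add to your data set.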

3. Neural Networks (NN)
– A neural network is a sophisticated modeling tool, capable of modeling very complex relationships in data.
Software: WEKA
This video shows how to use neural network functions available in WEKA to classify real weather data.
Application: building a weather prediction model.

4. Regression

This video explores the application of neural networks in a regression problem using STATISTICA Automated Neural Networks. The options used for regression are similar to those for other neural network applications such as classification and time series. The episode explores various analysis options and demonstrates working with neural network output.

This video uses a regression data set from beverage manufacturing to explore C&RT as well as the other tree algorithms. It reviews the options and parameters as well as the important output.

5. Evolutionary/Genetic Algorithms
Software: Rapidminer 5.0
This video highlights the data generation capabilities of Rapidminer 5.0, in case you want to tinker around, and shows how to use a Genetic Optimization data pre-processor within a nested experiment.

Software: Rapidminer 5.0
This video discusses some of the parameters that are available in the Genetic Algorithm data transformers to select the best attributes in the data set. We also replace the first operator with another Genetic Algorithm data transformer that allows us to manipulate population size, mutation rate, and change the selection schemes (tournament, roulette, etc).
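To give a feel for what population size, mutation rate and tournament selection mean, here is a rough plain-Python sketch of genetic attribute selection. It is my own illustration, not Rapidminer's operator: only tournament selection is shown (no roulette), and `toy_fitness` is a made-up stand-in for the model performance that a real experiment would measure.

```python
import random


def tournament(population, scores, rng, size=3):
    """Pick the fittest of `size` randomly drawn individuals."""
    contenders = rng.sample(range(len(population)), size)
    return population[max(contenders, key=lambda i: scores[i])]


def evolve_feature_mask(n_features, fitness, population_size=20,
                        mutation_rate=0.05, generations=30, seed=0):
    """Search for a 0/1 attribute mask that maximizes fitness(mask).

    Needs n_features >= 2 and population_size >= 3.
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(population_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        scores = [fitness(ind) for ind in population]
        next_gen = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(next_gen) < population_size:
            a = tournament(population, scores, rng)
            b = tournament(population, scores, rng)
            cut = rng.randint(1, n_features - 1)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: flip each bit with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        population = next_gen
        best = max(population, key=fitness)
    return best


# Hypothetical setup: pretend only attributes 0, 2 and 5 help the model.
USEFUL = {0, 2, 5}


def toy_fitness(mask):
    useful = sum(mask[i] for i in USEFUL)
    noise = sum(mask) - useful
    return useful - 0.1 * noise  # reward useful attributes, penalize the rest


best_mask = evolve_feature_mask(8, toy_fitness)
```

In a real experiment the fitness function would train and validate a model on the selected attributes, which is why these runs can take a while.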

Software: Rapidminer 5.0
This tutorial highlights Rapidminer’s weighting operator using an evolutionary approach. We use financial data to preweight inputs before we feed them into a neural network model to try to better classify a gold trend.

For more information about data mining, click here.


Data Miner Survey Report for 2009

According to the 2009 Rexer Analytics Data Miner Survey, in which 710 data miners from 58 countries participated, the most commonly used algorithms over the past 3 years are regression, decision trees, and cluster analysis. The report also mentions that almost half of industry data miners rate the analytic capabilities of their company as above average or excellent, but 19% feel their company has minimal or no analytic capabilities.

Among other key points of the survey are:

IBM SPSS Modeler (SPSS Clementine), Statistica, and IBM SPSS Statistics (SPSS Statistics) are identified as the “primary tools” used by the most data miners.
Open-source tools Weka and R made substantial movement up data miners’ tool rankings this year, and are now used by large numbers of both academic and for-profit data miners.
SAS Enterprise Miner dropped in data miners’ tool rankings this year.
Users of IBM SPSS Modeler, Statistica, and Rapid Miner are the most satisfied with their software.
Fields & Industries: CRM / Marketing, Academic, Financial Services, and IT / Telecom. In the for-profit sector, the departments data miners most frequently work in are Marketing & Sales and Research & Development.


Data Mining Search and Twitter Trend in 3D

Following my previous post titled “Data Mining In Different Views”, I would like to add one more view of data mining information from web search engines. Using Tianamo, a browser-based 3D information visualization interface, we can view data mining search results from the Yahoo search engine in 3D. You can view the latest 3D results on “Data Mining” by clicking the image below, but note that your browser needs the Java Runtime Environment (JRE) installed on your machine. The advantage of viewing search results in 3D is that you can identify the main keywords for the topic you are searching just by looking at which keyword has the “highest mountain” in the overall 3D view. For instance, in the current view, the main data mining search results keyword is “report”.
