R Data Mining Resources

Data mining using R has been gaining popularity among data miners and data analysts around the globe. A report from Rexer’s 2010 Annual Data Miner Survey stated that R has become the data mining tool used by more data miners (43%) than any other. According to Wikipedia, R is a programming language and software environment for statistical computing and graphics. R provides a wide variety of statistical and graphical techniques, including linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, and others. I have compiled the top resources about R data mining for your reference:

  1. R Project for Statistical Computing – the official R open source project website. Here you can get the latest R release (including the source code), the manuals and information on recent bugs.
  2. R Books Website – a list of recent books related to R that may be useful to the R user community. You may also like to read the Data Mining with R book (a data mining bestseller at Amazon.com).
  3. R in Wikipedia – here you can read basic information and examples of R programming, including a list of GUIs for R and some references.
  4. Rattle: A GUI for Data Mining using R – a simple and logical graphical user interface based on GNOME that can be used by itself to deliver data mining projects. Rattle runs under GNU/Linux, Mac OS X, and MS Windows.
  5. R reference card for data mining – a collection of R packages and functions for data mining.
  6. R Bloggers – a central hub of news and tutorials contributed by 185 R bloggers.
  7. R Video Tutorials – a series of R for Statistical Programming screencasts that show you how to use R for text mining (some of the video links are broken).
  8. Reasons to learn R? – YouTube video describing why students should learn the R programming language.
  9. Programming R – online R programming resources, from beginner to advanced.
  10. R Programming Wikibook – a place where anyone can share their tips and knowledge about R.

For More Information about Data Mining click here


Data Mining News

Data preprocessing for clustering: survey – Smart Data Collective

That might cause problems for users of data mining software because they are not sure which algorithm they should apply in which situation. …
Harper Tories accused of leaking data on mining stock – Toronto Star

OTTAWA—Opposition parties accused well-connected federal Conservatives of leaking confidential information that led to unusual trading of a western mining …
Appeals Court Overturns Prescriber Data-Mining Law – MedPage Today

Pharmacies sell this “prescriber information” to data mining companies, including the three appellants in the case, IMS Health, Verispan (now SDI), …
Vt. law on drug data mining ruled unconstitutional – BusinessWeek
Vt. law on drug data mining ruled unconstitutional – Washington Post
Appeals Court Strikes Down Vermont Data Mining Restriction – Thompson.com
FICO, UCSD Announce Winners of International Predictive Analytics Competition – MarketWatch (press release)

The winners were: “Students around the world look forward to the UCSD-FICO Data Mining Contest each year as an opportunity to put the skills they’ve learned …
Data Mining Could Help Insurers Predict Longevity – eCRM Guide

Insurers have long used blood and urine tests to assess people’s health but data-gathering companies have such extensive files on consumers that some …
Microsoft data mining Chinese cabbies brains for profit – China Car Times

Microsoft Research Asia believes Beijing’s cabbies hold the key to efficient driving, and is mining them for information in hopes of launching a mapping …
Should data-mining be used by insurance companies to predict longevity? – Wall Street Journal

Data mining is not good for producing general rules. Data mining is good for reducing a proverbial needle in a haystack situation to a needle in an armful …
Insurers Use Online Data Mining to Screen Potential Clients – Slate Magazine (blog)

That’s one question insurance companies are pondering as they turn to online data to screen and target potential customers. According to a Wall Street …
Friday Rant: Life Insurance — What’s Wrong With Mining Social Data for Risk … – Spend Matters
Insurers Test Data Profiles to Identify Risky Clients – Wall Street Journal
HP shows profit from data mining – NetworkWorld.com

Computer giant HP added $20 million (£12.5 million) to its bottom line through data mining activities, demonstrating that …
11Ants Analytics Introduces Revolutionary Method of Building Ensembles at … – PRLog.Org (press release)
Be cautious about open source data mining software – NetworkWorld.com

Businesses should not deploy open source software for data mining just because it is generally cheaper, an open source …


Top 10 Search Competition of Data Mining Software in Google, Yahoo & Bing

In my previous post, I wrote about “Top 10 Search Competition of Data Mining Algorithms in Google, Yahoo & Bing”. This time I would also like to know how popular the existing data mining software/tools are in the major search engines: Google, Yahoo & Bing. As before, I used a keyword research tool to pull out the top 10 data mining software in these search engines. For the search competition, I used the setting “keyword exists anywhere” in the page document. I also appended the keyword “data mining” to each software name so that the search targets data mining software, not something else. Please correct me if my approach is wrong.
From the results, the top 3 data mining software in Google are Excel, R and Statistica. These three also appear to dominate the top 10 search results in Yahoo and Bing. However, the results for “R data mining” are debatable, because the single-letter name R can stand for almost anything (e.g., a name abbreviation).
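The keyword approach above can be sketched in Python. This is only an illustration under my own assumptions: `build_query` and `rank_by_competition` are hypothetical helpers, and the result counts have to come from whatever keyword research tool you use (no counts are reproduced here).

```python
from __future__ import annotations

SOFTWARE = ["Excel", "R", "Statistica"]  # the top 3 named above; extend with the rest of your list


def build_query(software: str, qualifier: str = "data mining") -> str:
    """Append the qualifier so the search targets data mining software.

    No quotes are added, matching the "keyword exists anywhere" setting
    rather than an exact-phrase search.
    """
    return f"{software} {qualifier}"


def rank_by_competition(counts: dict[str, int], top_n: int = 10) -> list[tuple[str, int]]:
    """Sort (software, result count) pairs from most to least competitive."""
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


if __name__ == "__main__":
    for name in SOFTWARE:
        print(build_query(name))  # paste each query into the keyword research tool
```

Keeping the counts in a dict keyed by software name makes the ranking step independent of which search engine produced the numbers, so the same function works for Google, Yahoo and Bing.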


Video Compilation of Practical Data Mining Techniques

Nowadays, it is very difficult for data mining beginners to find free video tutorials on data mining techniques in a single place. For that reason, I have created a new page on my site to showcase a compilation of videos on practical data mining techniques using well-known software/tools available in the market today. The videos are categorized by data mining technique: classification, clustering, regression, neural networks, etc. I will try to update the list every week as new videos become available on the web. If you know other video tutorials using software/tools not listed here, feel free to add them in the comment section.
For an Overview of Data Mining Techniques, visit here.

1. Classification
Software: Rapidminer 5.0
This video shows how to import training and prediction data, add a classification learner, and apply the model.
Application: building a classification model for gold trends.
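The import/learn/apply workflow in this video can be mirrored in a few lines of Python. This is a minimal sketch, not Rapidminer's learner: I use a nearest-centroid classifier because it is short, and the two features and the "up"/"down" labels are hypothetical stand-ins for the gold trend data.

```python
from statistics import mean


def train_centroids(rows, labels):
    """Learn one centroid (the mean feature vector) per class from the training data."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    return {
        label: [mean(column) for column in zip(*members)]
        for label, members in by_class.items()
    }


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def apply_model(centroids, rows):
    """Label each unseen row with the class of its nearest centroid."""
    return [min(centroids, key=lambda c: squared_distance(centroids[c], row)) for row in rows]


if __name__ == "__main__":
    # Hypothetical training data: two numeric indicators per row, labeled by trend.
    train_X = [[1.0, 1.2], [0.9, 1.1], [-1.0, -0.8], [-1.2, -1.1]]
    train_y = ["up", "up", "down", "down"]
    model = train_centroids(train_X, train_y)
    # Hypothetical unlabeled "prediction" data, applied to the trained model.
    print(apply_model(model, [[0.8, 1.0], [-0.9, -1.0]]))  # ['up', 'down']
```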

Software: Rapidminer 5.0
This video shows how to use Rapidminer to create a decision tree to help us find “sweet spots” in a particular market segment. This video tutorial uses the Rapidminer direct mail marketing data generator and a split validation operator to build the decision tree.
Application: creating Decision Trees for Market Segmentation.
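To show what a split validation operator does, here is a small Python sketch. It is a simplification under my own assumptions: the tree is only one level deep (a decision stump chosen by Gini impurity), and the age/income rows with "yes"/"no" responses are made-up direct mail data, not the output of Rapidminer's generator.

```python
import random
from collections import Counter


def split_validation(rows, labels, train_ratio=0.7, seed=42):
    """Shuffle, then split into a training part and a held-out test part."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    cut = int(len(order) * train_ratio)

    def take(ids):
        return [rows[i] for i in ids], [labels[i] for i in ids]

    return take(order[:cut]), take(order[cut:])


def gini(labels):
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels):
    """Pick the (feature, threshold) split with the lowest weighted Gini impurity."""
    best = None
    for f in range(len(rows[0])):
        values = sorted({row[f] for row in rows})
        for low, high in zip(values, values[1:]):
            t = (low + high) / 2  # midpoint between neighboring values
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:  # every feature is constant: always predict the majority class
        label = majority(labels)
        return 0, float("inf"), label, label
    return best[1:]


def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label


if __name__ == "__main__":
    # Hypothetical direct mail data: [age, income] -> responded "yes"/"no".
    rows = [[20 + i, 10 + 10 * (i % 3)] for i in range(6)] + [[20 + i, 70 + 10 * (i % 3)] for i in range(6)]
    labels = ["no"] * 6 + ["yes"] * 6
    (train_X, train_y), (test_X, test_y) = split_validation(rows, labels)
    stump = fit_stump(train_X, train_y)
    accuracy = sum(predict_stump(stump, r) == y for r, y in zip(test_X, test_y)) / len(test_y)
    print("split:", stump, "test accuracy:", accuracy)
```

The held-out part is never seen by `fit_stump`, so the printed accuracy estimates how the tree would do on new prospects rather than on the data it was built from.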

Software: STATISTICA
This video shows how to use CHAID decision trees to classify good and bad credit risks. CHAID decision trees are particularly well suited for large data sets and often find application in market segmentation. This session discusses the analysis options in STATISTICA and reviews the CHAID output, including the decision trees and performance indices.
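The heart of CHAID is choosing the predictor whose cross-tabulation with the target gives the most significant chi-square statistic. The Python sketch below shows only that step, under simplifying assumptions: it compares raw Pearson chi-square values and skips CHAID's merging of similar categories and its Bonferroni-adjusted p-values. The credit records are hypothetical.

```python
from collections import Counter


def chi_square(pairs):
    """Pearson chi-square statistic of the contingency table built from (category, class) pairs."""
    pairs = list(pairs)
    n = len(pairs)
    observed = Counter(pairs)
    row_totals = Counter(category for category, _ in pairs)
    col_totals = Counter(cls for _, cls in pairs)
    statistic = 0.0
    for category, r_total in row_totals.items():
        for cls, c_total in col_totals.items():
            expected = r_total * c_total / n
            statistic += (observed[(category, cls)] - expected) ** 2 / expected
    return statistic


def best_split_predictor(records, target):
    """Return the predictor most associated with the target, plus all scores (simplified CHAID step)."""
    predictors = [name for name in records[0] if name != target]
    scores = {p: chi_square((rec[p], rec[target]) for rec in records) for p in predictors}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical credit records.
    records = [
        {"employment": "stable", "housing": "own", "risk": "good"},
        {"employment": "stable", "housing": "rent", "risk": "good"},
        {"employment": "stable", "housing": "own", "risk": "good"},
        {"employment": "unstable", "housing": "rent", "risk": "bad"},
        {"employment": "unstable", "housing": "own", "risk": "bad"},
        {"employment": "unstable", "housing": "rent", "risk": "bad"},
    ]
    print(best_split_predictor(records, "risk"))
```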

2. Clustering
Software: STATISTICA
This video shows how to use clustering tools available in STATISTICA Data Miner and demonstrates the K-means clustering tool as well as the Kohonen network clustering tool.
Application: Clustering tools are beneficial when you want to find structure or clusters in data but don't necessarily have a target variable. Clustering is often used in marketing research applications, as well as many others.
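A Kohonen network (self-organizing map) can also be sketched in plain Python. This is a minimal one-dimensional map under my own assumptions: nodes are initialized evenly between the data's minimum and maximum, the learning rate and the Gaussian neighborhood radius shrink linearly, and the toy values are hypothetical. STATISTICA's implementation will likely differ in its defaults.

```python
import math
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(nodes, x):
    """Cluster label of an input: the index of its nearest node (best-matching unit)."""
    return min(range(len(nodes)), key=lambda i: sq_dist(nodes[i], x))


def train_som(data, n_nodes=4, epochs=50, lr0=0.5, seed=0):
    """Train a 1-D Kohonen map whose nodes sit on a line (node index = map position)."""
    rng = random.Random(seed)
    # Assumed linear initialization: nodes spread evenly between the data's min and max.
    lows = [min(col) for col in zip(*data)]
    highs = [max(col) for col in zip(*data)]
    steps = max(n_nodes - 1, 1)
    nodes = [[lo + (hi - lo) * i / steps for lo, hi in zip(lows, highs)] for i in range(n_nodes)]
    radius0 = n_nodes / 2
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)  # learning rate decays linearly
        radius = max(radius0 * (1 - frac), 0.5)  # neighborhood shrinks, with a floor
        for x in rng.sample(data, len(data)):  # visit inputs in random order
            bmu = assign(nodes, x)
            for i in range(n_nodes):
                # Gaussian neighborhood on the map: nodes near the BMU move the most.
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                nodes[i] = [w + lr * h * (xi - w) for w, xi in zip(nodes[i], x)]
    return nodes


if __name__ == "__main__":
    # Hypothetical one-dimensional customer values forming two groups.
    data = [[0.0], [5.0], [0.2], [5.2], [0.4], [5.4]]
    nodes = train_som(data, n_nodes=2)
    print([assign(nodes, x) for x in data])
```

Unlike K-means, neighboring nodes are updated together, which is what makes nearby map positions end up representing similar data.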

Software: WEKA
This video shows how to use the clustering algorithms available in WEKA and demonstrates the K-means clustering tool.
Application: building a cluster model of bank customers based on mortgage applications.
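For readers who want to see what the K-means tool does internally, here is a minimal Python sketch of Lloyd's algorithm (assign each point to its nearest centroid, then move each centroid to the mean of its members). To keep it deterministic, I seed the centroids with a farthest-first heuristic; this is my choice and not necessarily what WEKA does by default. The customer coordinates are hypothetical.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Start at the first point, then repeatedly add the point farthest from all chosen centroids."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, max_iter=100):
    centroids = farthest_first(points, k)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment


if __name__ == "__main__":
    # Hypothetical bank customers: [scaled loan amount, scaled income].
    customers = [[1, 1], [1.2, 0.8], [0.9, 1.1], [8, 8], [8.2, 7.9], [7.8, 8.1]]
    centroids, labels = kmeans(customers, k=2)
    print(labels)  # [0, 0, 0, 1, 1, 1]
```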

3. Neural Networks (NN)
– Neural networks are sophisticated modeling tools, capable of modeling very complex relationships in data.
Software: WEKA
This video shows how to use neural network functions available in WEKA to classify real weather data.
Application: building a weather prediction model.
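The simplest possible neural network is a single-layer perceptron. WEKA's own network classifier is a multilayer perceptron trained with backpropagation, so treat this Python sketch as an illustration of the basic idea only; the humidity/pressure features and the rain labels are hypothetical.

```python
def train_perceptron(rows, labels, epochs=20, lr=0.1):
    """Single-layer perceptron; labels must be +1 (e.g. rain) or -1 (no rain)."""
    w = [0.0] * len(rows[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(rows, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified or on the boundary: nudge weights toward y
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                errors += 1
        if errors == 0:  # every training example is classified correctly
            break
    return w, b


def perceptron_predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


if __name__ == "__main__":
    # Hypothetical weather readings: [humidity, pressure drop], both scaled to 0..1.
    X = [[0.9, 0.8], [0.8, 0.9], [0.85, 0.7], [0.2, 0.1], [0.3, 0.2], [0.1, 0.3]]
    y = [1, 1, 1, -1, -1, -1]
    w, b = train_perceptron(X, y)
    print([perceptron_predict(w, b, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```

A single layer can only draw a straight decision boundary; adding hidden layers, as WEKA's multilayer perceptron does, is what lets a network model the "very complex relationships" mentioned above.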

4. Regression

This video explores the application of neural networks to a regression problem using STATISTICA Automated Neural Networks. The options used for regression are similar to those for other neural network applications, such as classification and time series. The episode explores various analysis options and demonstrates working with neural network output.
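For regression, the network's output unit is linear instead of a class decision. Below is a minimal Python sketch of a one-hidden-layer network (tanh units, linear output) trained by full-batch gradient descent on mean squared error. It is a fixed architecture that I chose for illustration, whereas STATISTICA Automated Neural Networks searches over architectures for you; the toy target y = x² is hypothetical.

```python
import math
import random


def train_mlp_regressor(xs, ys, hidden=5, epochs=1000, lr=0.05, seed=0):
    """Fit y ~ f(x) for a single numeric input with one tanh hidden layer and a linear output."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-1, 1) for _ in range(hidden)]  # input -> hidden weights
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]  # hidden -> output weights
    b2 = 0.0
    n = len(xs)

    def forward(x):
        h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
        return h, sum(w2[j] * h[j] for j in range(hidden)) + b2

    def mse():
        return sum((forward(x)[1] - y) ** 2 for x, y in zip(xs, ys)) / n

    start = mse()
    for _ in range(epochs):
        g_w1, g_b1, g_w2, g_b2 = [0.0] * hidden, [0.0] * hidden, [0.0] * hidden, 0.0
        for x, y in zip(xs, ys):
            h, out = forward(x)
            d_out = 2 * (out - y) / n  # derivative of the mean squared error
            g_b2 += d_out
            for j in range(hidden):
                g_w2[j] += d_out * h[j]
                d_h = d_out * w2[j] * (1 - h[j] ** 2)  # backpropagate through tanh
                g_w1[j] += d_h * x
                g_b1[j] += d_h
        for j in range(hidden):  # full-batch gradient descent step
            w1[j] -= lr * g_w1[j]
            b1[j] -= lr * g_b1[j]
            w2[j] -= lr * g_w2[j]
        b2 -= lr * g_b2
    return (lambda x: forward(x)[1]), (start, mse())


if __name__ == "__main__":
    xs = [i / 10 for i in range(-10, 11)]
    ys = [x * x for x in xs]  # hypothetical nonlinear target
    predict, (start, end) = train_mlp_regressor(xs, ys)
    print(f"MSE before: {start:.4f}, after: {end:.4f}, f(0.5) = {predict(0.5):.3f}")
```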

This video uses a regression data set (beverage manufacturing) to explore C&RT as well as the other tree algorithms. The options and parameters are reviewed, along with the important output.
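For a continuous target, C&RT grows the tree by choosing the split that most reduces the within-node sum of squared errors. The Python sketch below finds that best binary split on one numeric predictor; a full tree would apply it recursively across all predictors. The values are hypothetical, not the beverage manufacturing data.

```python
def sse(values):
    """Sum of squared deviations from the mean (the impurity C&RT minimizes for regression)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_regression_split(xs, ys):
    """Return (threshold, total SSE) of the best binary split on one numeric predictor."""
    pairs = sorted(zip(xs, ys))
    best = (None, sse(ys))  # threshold None means "no split beats the unsplit node"
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal predictor values cannot be separated
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


if __name__ == "__main__":
    # Hypothetical process setting (x) and quality score (y).
    print(best_regression_split([1, 2, 3, 10, 11, 12], [5, 5, 5, 20, 20, 20]))  # (6.5, 0.0)
```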

5. Evolutionary/Genetic Algorithms
Software: Rapidminer 5.0
This video highlights the data generation capabilities of Rapidminer 5.0, handy if you want to tinker around, and shows how to use a Genetic Optimization data pre-processor within a nested experiment.

Software: Rapidminer 5.0
This video discusses some of the parameters that are available in the Genetic Algorithm data transformers to select the best attributes in the data set. We also replace the first operator with another Genetic Algorithm data transformer that allows us to manipulate population size, mutation rate, and change the selection schemes (tournament, roulette, etc.).
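The genetic algorithm parameters mentioned above (population size, mutation rate, selection scheme) map directly onto a small attribute-selection loop. This Python sketch assumes each individual is a 0/1 mask over the attributes and uses tournament selection (a roulette scheme would replace `tournament`), one-point crossover and elitism. The `toy_fitness` function with its `RELEVANCE` scores is hypothetical; in a real experiment the fitness would be a validated model's performance on the selected attributes.

```python
import random

# Hypothetical per-attribute usefulness scores, used only by the toy fitness below.
RELEVANCE = [0.9, 0.1, 0.8, 0.05, 0.7, 0.2]


def toy_fitness(mask, penalty=0.3):
    """Hypothetical fitness: reward relevant attributes, penalize every attribute kept."""
    return sum(r for r, keep in zip(RELEVANCE, mask) if keep) - penalty * sum(mask)


def genetic_attribute_selection(fitness, n_attrs, pop_size=20, generations=40,
                                mutation_rate=0.1, tournament_size=3, seed=0):
    """Evolve 0/1 attribute masks; needs n_attrs >= 2 for one-point crossover."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_attrs)] for _ in range(pop_size)]

    def tournament():
        # Tournament selection: the fittest of a few randomly drawn individuals wins.
        return max(rng.sample(population, tournament_size), key=fitness)

    history = []  # best fitness per generation
    for _ in range(generations):
        elite = max(population, key=fitness)
        history.append(fitness(elite))
        next_population = [elite[:]]  # elitism: the best mask survives unchanged
        while len(next_population) < pop_size:
            mother, father = tournament(), tournament()
            cut = rng.randint(1, n_attrs - 1)  # one-point crossover
            child = mother[:cut] + father[cut:]
            # Bit-flip mutation: each gene flips with probability mutation_rate.
            child = [1 - gene if rng.random() < mutation_rate else gene for gene in child]
            next_population.append(child)
        population = next_population
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


if __name__ == "__main__":
    best, history = genetic_attribute_selection(toy_fitness, n_attrs=len(RELEVANCE))
    print("selected attributes:", best, "fitness:", round(history[-1], 3))
```

Elitism guarantees the best fitness never gets worse from one generation to the next; raising `mutation_rate` explores more but makes good masks harder to keep intact.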

Software: Rapidminer 5.0
This tutorial highlights Rapidminer’s weighting operator using an evolutionary approach. We use financial data to preweight inputs before we feed them into a neural network model to try to better classify a gold trend.
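Evolutionary weighting searches over real-valued input weights instead of 0/1 masks. The sketch below is a (1+1) evolution strategy in Python, chosen for brevity rather than taken from Rapidminer: it mutates the current weights with Gaussian noise, clips them to [0, 1], and keeps the child if it is at least as fit. The `toy_fitness` target weights are hypothetical; in practice the fitness would be the neural network's validated accuracy on the weighted inputs.

```python
import random


def evolve_weights(fitness, n_inputs, generations=200, sigma=0.1, seed=0):
    """(1+1) evolution strategy over input weights in [0, 1]."""
    rng = random.Random(seed)
    parent = [1.0] * n_inputs  # start with every input weighted equally
    parent_fit = fitness(parent)
    for _ in range(generations):
        # Gaussian mutation, clipped so every weight stays in [0, 1].
        child = [min(1.0, max(0.0, w + rng.gauss(0.0, sigma))) for w in parent]
        child_fit = fitness(child)
        if child_fit >= parent_fit:  # keep the child only if it is no worse
            parent, parent_fit = child, child_fit
    return parent, parent_fit


if __name__ == "__main__":
    # Hypothetical fitness: closeness to made-up "ideal" weights for three inputs.
    target = [0.8, 0.1, 0.5]

    def toy_fitness(weights):
        return -sum((w - t) ** 2 for w, t in zip(weights, target))

    print(evolve_weights(toy_fitness, n_inputs=3))
```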
