Data Mining News

Data preprocessing for clustering: survey – Smart Data Collective

That might cause problems for users of data mining software because they are not sure which algorithm they should apply in which situation. …
Harper Tories accused of leaking data on mining stock – Toronto Star

OTTAWA—Opposition parties accused well-connected federal Conservatives of leaking confidential information that led to unusual trading of a western mining …
Appeals Court Overturns Prescriber Data-Mining Law – MedPage Today

Pharmacies sell this “prescriber information” to data mining companies, including the three appellants in the case, IMS Health, Verispan (now SDI), …
Vt. law on drug data mining ruled unconstitutional – BusinessWeek
Vt. law on drug data mining ruled unconstitutional – Washington Post
Appeals Court Strikes Down Vermont Data Mining Restriction – Thompson.com
TopNews United States - Courthouse News Service - Mass Device
FICO, UCSD Announce Winners of International Predictive Analytics Competition – MarketWatch (press release)

The winners were: “Students around the world look forward to the UCSD-FICO Data Mining Contest each year as an opportunity to put the skills they’ve learned …
Data Mining Could Help Insurers Predict Longevity – eCRM Guide

Insurers have long used blood and urine tests to assess people’s health but data-gathering companies have such extensive files on consumers that some …
Microsoft data mining Chinese cabbies brains for profit – China Car Times

Microsoft Research Asia believes Beijing’s cabbies hold the key to efficient driving, and is mining them for information in hopes of launching a mapping …
Should data-mining be used by insurance companies to predict longevity? – Wall Street Journal

Data mining is not good for producing general rules. Data mining is good for reducing a proverbial needle in a haystack situation to a needle in an armful …
Insurers Use Online Data Mining to Screen Potential Clients – Slate Magazine (blog)

That’s one question insurance companies are pondering as they turn to online data to screen and target potential customers. According to a Wall Street …
Friday Rant: Life Insurance — What’s Wrong With Mining Social Data for Risk … – Spend Matters
Insurers Test Data Profiles to Identify Risky Clients – Wall Street Journal
HP shows profit from data mining – NetworkWorld.com

Computer giant HP added $20 million (£12.5 million) to its bottom line through data mining activities, demonstrating that …
11Ants Analytics Introduces Revolutionary Method of Building Ensembles at … – PRLog.Org (press release)
Be cautious about open source data mining software – NetworkWorld.com

Businesses should not deploy open source software for data mining just because it is generally cheaper, an open source …

Top 10 Search Competition of Data Mining Software in Google, Yahoo & Bing

In my previous post, I wrote about the “Top 10 Search Competition of Data Mining Algorithms in Google, Yahoo & Bing”. I also wanted to know how popular existing data mining software and tools are in the major search engines: Google, Yahoo, and Bing. Using the same keyword research tool, I pulled out the top 10 data mining software packages in these search engines. For the search competition, I used the “keyword exists anywhere” setting, which matches the keyword anywhere in the page document. I also added the keyword “data mining” to each software name to make it clear that the search is for data mining software and not something else. Please correct me if my approach is wrong.
From the results, the top 3 data mining software packages in Google are Excel, R, and Statistica. These three also dominate the top 10 search results for Yahoo and Bing. However, the results for “R data mining” are debatable, because a single-letter name can match almost anything (for example, an abbreviation of a name).


Video Compilation of Practical Data Mining Techniques

It is very difficult for a data mining beginner to find free video tutorials on data mining techniques in a single place. For that reason, I have created a new page on my site to showcase a video compilation of practical data mining techniques using well-known software and tools available on the market today. The videos are grouped by technique: classification, clustering, regression, neural networks, and so on. I will try to update the list every week as new videos become available on the web. If you know of other video tutorials that use software or tools not listed here, feel free to add them in the comment section.

1. Classification
Software: Rapidminer 5.0
This video shows how to import training and prediction data, add a classification learner, and apply the model.
Application: building a Gold Classification trend model.

Software: Rapidminer 5.0
This video shows how to use Rapidminer to create a decision tree to help us find “sweet spots” in a particular market segment. This video tutorial uses the Rapidminer direct mail marketing data generator and a split validation operator to build the decision tree.
Application: creating Decision Trees for Market Segmentation.
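
To make the split validation idea concrete, here is a minimal sketch in plain Python. It is not RapidMiner's operator: the data set is made up, and a one-level decision tree (a "decision stump") stands in for the full tree built in the video. The function names and the 70/30 split ratio are assumptions chosen for illustration.

```python
import random

def split_validation(rows, train_ratio=0.7, seed=0):
    """Shuffle the rows and split them into a training part and a hold-out part."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

def fit_stump(train):
    """One-level decision tree: find the threshold that best separates the labels.

    The stump predicts label 1 when the feature value is >= threshold.
    """
    best_threshold, best_correct = None, -1
    for threshold in sorted({x for x, _ in train}):
        correct = sum((x >= threshold) == bool(y) for x, y in train)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold

def accuracy(threshold, rows):
    """Share of rows where the stump's prediction matches the label."""
    return sum((x >= threshold) == bool(y) for x, y in rows) / len(rows)

# Made-up direct mail data: (age, responded), where everyone aged 40+ responded.
rows = [(age, int(age >= 40)) for age in range(20, 60)]
train, holdout = split_validation(rows)
threshold = fit_stump(train)
print("threshold:", threshold)
print("training accuracy:", accuracy(threshold, train))
print("hold-out accuracy:", accuracy(threshold, holdout))
```

The hold-out accuracy is the number to watch, because it is measured on rows the model never saw during training.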

Software: STATISTICA
This video shows how to use CHAID decision trees to classify good and bad credit risk. CHAID decision trees are particularly well suited for large data sets and often find application in marketing segmentation. This session discusses the analysis options in STATISTICA and reviews the CHAID output, including the decision trees and performance indices.

2. Clustering
Software: STATISTICA
This video shows how to use clustering tools available in STATISTICA Data Miner and demonstrates the K-means clustering tool as well as the Kohonen network clustering tool.
Application: Clustering tools are beneficial when you want to find structure or clusters in data but don't necessarily have a target variable. Clustering is often used in marketing research, among many other applications.

Software: WEKA
This video shows how to use a clustering algorithm available in WEKA and demonstrates the K-means clustering tool.
Application: building a cluster model of bank customers based on mortgage applications.
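
For readers who want to see what K-means does under the hood, here is a minimal sketch in plain Python. It is not the STATISTICA or WEKA implementation. The customer data, the function name `kmeans`, and the fixed iteration count are assumptions made for illustration only.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Cluster 2-D points into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                                        + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids, labels

# Made-up bank customers: (income, mortgage amount), both in thousands.
customers = [(20, 50), (22, 55), (25, 60), (80, 300), (85, 310), (90, 320)]
centroids, labels = kmeans(customers, k=2)
print(labels)
```

Note that no target variable is involved: the algorithm only looks at distances between records. In practice the attributes are usually scaled first, so that one attribute with large values does not dominate the distance.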

3. Neural Networks (NN)
– Neural networks are sophisticated modeling tools, capable of modeling very complex relationships in data.
Software: WEKA
This video shows how to use neural network functions available in WEKA to classify real weather data.
Application: building a weather prediction model.

4. Regression

This video explores the application of neural networks to a regression problem using STATISTICA Automated Neural Networks. The options used for regression are similar to those for other neural network applications, such as classification and time series. The episode explores various analysis options and demonstrates working with neural network output.

This video uses a regression data set on beverage manufacturing to explore C&RT as well as the other tree algorithms. It reviews the options and parameters, as well as the important output.

5. Evolutionary/Genetic Algorithms
Software: Rapidminer 5.0
This video highlights the data generation capabilities of Rapidminer 5.0, which are handy if you want to tinker around, and shows how to use a Genetic Optimization data pre-processor within a nested experiment.

Software: Rapidminer 5.0
This video discusses some of the parameters that are available in the Genetic Algorithm data transformers to select the best attributes in the data set. We also replace the first operator with another Genetic Algorithm data transformer that allows us to manipulate the population size and mutation rate, and to change the selection scheme (tournament, roulette, etc.).

Software: Rapidminer 5.0
This tutorial highlights Rapidminer’s weighting operator using an evolutionary approach. We use financial data to preweight inputs before we feed them into a neural network model to try to better classify a gold trend.
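
The parameters mentioned in these videos (population size, mutation rate, and the selection scheme) can be sketched in plain Python as follows. This is not Rapidminer's operator: the fitness function is a made-up toy score, all names are assumptions, and crossover is left out to keep the example short.

```python
import random

def toy_fitness(mask):
    """Made-up score: attributes 0 and 2 are useful, any other selected one costs a little."""
    useful = mask[0] + mask[2]
    noise = sum(mask) - useful
    return 2.0 + 2.0 * useful - 0.5 * noise  # always positive for 5 attributes

def tournament_select(population, fitness, rng, size=3):
    """Pick `size` random candidates and return the fittest of them."""
    contenders = rng.sample(population, size)
    return max(contenders, key=fitness)

def roulette_select(population, fitness, rng):
    """Pick one candidate with probability proportional to its fitness."""
    weights = [fitness(ind) for ind in population]
    return rng.choices(population, weights=weights, k=1)[0]

def mutate(mask, rng, rate=0.1):
    """Flip each attribute on or off independently with probability `rate`."""
    return tuple(bit ^ 1 if rng.random() < rate else bit for bit in mask)

def evolve(n_attrs=5, pop_size=20, generations=30, mutation_rate=0.1,
           select=tournament_select, seed=0):
    """Evolve attribute masks (1 = keep the attribute) and return the best one found."""
    rng = random.Random(seed)
    population = [tuple(rng.randint(0, 1) for _ in range(n_attrs))
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Each new individual is a mutated copy of a selected parent.
        population = [mutate(select(population, toy_fitness, rng), rng, mutation_rate)
                      for _ in range(pop_size)]
    return max(population, key=toy_fitness)

best = evolve()                                  # tournament selection
best_roulette = evolve(select=roulette_select)   # roulette wheel selection
print(best, best_roulette)
```

Tournament selection only compares candidates within a small random group, while roulette selection gives every candidate a chance in proportion to its fitness, which is why roulette needs non-negative fitness values.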


Data Miner Survey Report for 2009

According to the 2009 Rexer Analytics Data Miner Survey, in which 710 data miners from 58 countries participated, regression, decision trees, and cluster analysis have been the most commonly used algorithms for the past 3 years. The report also notes that almost half of industry data miners rate the analytic capabilities of their company as above average or excellent, while 19% feel their company has minimal or no analytic capabilities.

Among other key points of the survey are:

IBM SPSS Modeler (SPSS Clementine), Statistica, and IBM SPSS Statistics (SPSS Statistics) are identified as the “primary tools” used by the most data miners.
Open-source tools Weka and R moved up substantially in data miners’ tool rankings this year, and are now used by large numbers of both academic and for-profit data miners.
SAS Enterprise Miner dropped in data miners’ tool rankings this year.
Users of IBM SPSS Modeler, Statistica, and Rapid Miner are the most satisfied with their software.
Fields & Industries: CRM / Marketing, Academic, Financial Services, and IT / Telecom. In the for-profit sector, the departments data miners most frequently work in are Marketing & Sales and Research & Development.
