Top 10 Data Mining Mistakes

Some of you may have read this white paper before, but I want to add it here as part of a resource collection for future data mining beginners. The paper is an excerpt from the book “Handbook of Statistical Analysis and Data Mining Applications”, Elsevier (ISBN: 978-0-123747655). According to the authors, mining data to extract useful and enduring patterns remains a skill that is arguably more art than science. In the paper, they briefly describe, and illustrate with examples, what they believe are the “Top 10” mistakes of data mining in terms of frequency and seriousness.

Top 10 DM Mistakes (white paper)

0. Lack of Data (important too!)
1. Focus on Training
2. Rely on One Technique
3. Ask the Wrong Question
4. Listen (Only) to the Data
5. Accept Leaks from the Future
6. Discount Pesky Cases
7. Extrapolate
8. Answer Every Inquiry
9. Sample Casually
10. Believe the Best Model

I would like to emphasize mistake no. 2 (Rely on One Technique), which I think is important for us to consider. In a data mining task, it is important to try a variety of modeling algorithms to make sure we get the best result. Look for new algorithms and tools available on the market to mine your data (it is sometimes worth reading recent conference and journal publications to keep up with the latest improvements to the algorithms). As the popular “No Free Lunch” (NFL) theorem puts it, no single algorithm is best for solving every problem!
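To make the "try more than one technique" advice concrete, here is a minimal sketch in plain Python. Everything in it is my own illustration and not taken from the white paper: the toy dataset (points labelled by whether they fall inside a circle), the three simple classifiers, and the helper names such as make_data, cross_val_accuracy and compare_models are all assumptions. It only shows the habit of scoring several techniques on the same held-out folds before picking one.

```python
import random
from collections import Counter


def dist2(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def make_data(n=200, seed=0):
    """Toy two-class data (an assumption): label is 1 inside a circle."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        data.append(((x, y), int(x * x + y * y < 0.5)))
    return data


def majority_fit(train):
    """Baseline: always predict the most common training label."""
    label = Counter(lbl for _, lbl in train).most_common(1)[0][0]
    return lambda point: label


def centroid_fit(train):
    """Nearest-centroid: predict the class whose mean point is closest."""
    sums = {}
    for (x, y), lbl in train:
        sx, sy, count = sums.get(lbl, (0.0, 0.0, 0))
        sums[lbl] = (sx + x, sy + y, count + 1)
    centroids = {lbl: (sx / c, sy / c) for lbl, (sx, sy, c) in sums.items()}
    return lambda point: min(centroids, key=lambda lbl: dist2(point, centroids[lbl]))


def knn_fit(train, k=5):
    """k-nearest neighbours: majority vote among the k closest points."""
    def predict(point):
        nearest = sorted(train, key=lambda item: dist2(point, item[0]))[:k]
        return Counter(lbl for _, lbl in nearest).most_common(1)[0][0]
    return predict


def cross_val_accuracy(fit, data, folds=5):
    """Average held-out accuracy over `folds` interleaved folds."""
    correct = 0
    for i in range(folds):
        test = data[i::folds]
        train = [item for j, item in enumerate(data) if j % folds != i]
        predict = fit(train)
        correct += sum(predict(point) == lbl for point, lbl in test)
    return correct / len(data)


def compare_models():
    """Score every candidate technique on the same data and folds."""
    data = make_data()
    models = {"majority": majority_fit, "centroid": centroid_fit, "knn": knn_fit}
    return {name: cross_val_accuracy(fit, data) for name, fit in models.items()}


if __name__ == "__main__":
    for name, acc in sorted(compare_models().items(), key=lambda kv: -kv[1]):
        print(f"{name:10s} held-out accuracy = {acc:.3f}")
```

On this particular toy data a distance-based method has an easy time, because the class boundary is a circle. Change the rule inside make_data and the ranking may well change, which is the practical point behind the NFL remark above.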

Data Mining Trends (2004-2010): By Country, City and Language

I had a look at the current trend (using Google Trends) for data mining by country (search volume), city (traffic source) and language (search language). Although I am not surprised that most of the interest comes from India, what interests me is that interest from Iran is also growing year by year!

So have a look: Data Mining Trend (2004-2010)

Country Ranking (Search Volume)

1. India
2. Pakistan
3. Taiwan
4. Hong Kong
5. Iran
6. Indonesia
7. Singapore
8. South Korea
9. Malaysia
10. Thailand

City Ranking (Traffic Volume)

1. Chennai (India)
2. Mahape (India)
3. Bangalore (India)
4. Delhi (India)
5. Mumbai (India)
6. Taipei (Taiwan)
7. Hong Kong (Hong Kong)
8. Singapore (Singapore)
9. Jakarta (Indonesia)
10. Bangkok (Thailand)

Language Ranking (Search Language)

1. Indonesian
2. Korean
3. Thai
4. English
5. Chinese
6. Portuguese
7. Russian
8. Arabic
9. Italian
10. German

 

Watch Online Data Mining Tutorial with Weka

Hi, if you would rather not learn data mining tasks by reading, I recommend this online video tutorial (also downloadable as MP4) on practical data mining with Weka, an open source data mining tool written in Java. The presenter shows step by step how to install Weka, select a real-world data source, clean the data, load the data into Weka, select a learning algorithm and interpret the resulting model. The currently available data mining videos are:

  • text mining
  • neural network
  • clustering
  • cluster and neural network
  • filtering tool
  • experimenter (weka module)

I bet you will pick up these tasks faster than you expect (at least it worked for me).

Watch Online Data Mining Tutorial with Weka.

For more information about data mining, click here

Data Mining: Top 10 Search Keywords in Google

Out of interest, I wanted to know what people around the world search for about data mining (as of 14 Feb 2010). So I have compiled statistics on data mining keywords and their results (using the Google search engine) for your information.

The top 10 search keywords are:

Ranking  Search Volume  Keyword
1        31,600,000     data mining algorithms
2        22,700,000     data mining
3        18,800,000     data mining techniques
4        15,700,000     data mining tools
5        14,800,000     data mining jobs
6        6,740,000      data mining examples
7        6,170,000      data mining software
8        2,700,000      data mining definition
9        1,020,000      data mining wiki
10       789,000        data mining and knowledge discovery

 
