Data Science News 17 April 2015

Data Science News Highlights:

The data science ecosystem, part 3: Data applications

Computerworld
Remember that quote I started part two with? About data scientists wanting better tools for wrangling so they could work on the “sexy stuff”? Well, after …

Cybersecurity, data science and machine learning: Is all data equal?

Computerworld
Positive data, i.e. malicious network traffic data from malware and cyberattacks, have much more value than some other data science problems.

7 guerrilla tactics for retaining data scientists

TechRepublic
The ABCs of retention won’t cut it for data scientists, which is why you must use creative tactics. Try these out-of-the-box retention strategies.

JHU Researchers to Launch New Coursera Specialization Focused on Genomic Data Science

GenomeWeb
The newly-minted Genomic Data Science specialization features six non-credit courses that provide a coherent introduction to some common tools of …

Top Online Engineering School Responds to Industry Appetite for Data Science, Energy, and …

PR Newswire (press release)
The new Data Science certificate includes three courses in big data analysis, machine learning, and principles of database systems, with substitutions …

State Street, Berkeley and Stanford form data science consortium

Finextra (press release)
“We are excited to be working with leading data scientists to tackle the immensely complex data challenges that face our clients and the institutional …

Genomic Data Science Course Invades MOOC Platform

International Business Times AU
A genomic data science course designed by Johns Hopkins University would soon hit massive online open course platform Coursera.org, a Forbes …

How Disruptive Are MOOCs? Hopkins Genomics MOOC Launches In June

Forbes
Of course the content today is different, particularly in the sciences–no one even knew what DNA was 200 years ago–but the way we teach has barely …

TOM STILL: Data science is learning as it grows

Kenosha News
Data science is a fancy term for statistics. It’s the extraction of knowledge from data, which can be derived from multiple digital sources and turned into …

NCDS Takes Action on Big Data

insideHPC
Big data is such a hot topic it has finally outgrown the descriptor ‘big’. From scientific journals to the popular press, so much has been said about big …

BLOGS
NASA’s Space Apps Challenge Tries to Coax More Women to Data Science

Xconomy
NASA officials visited New York in the fall to pick up some ideas on how women in the data science and tech community were doing and how to make …

WEB

The Dutch Data Science Summit

Technische Universiteit Eindhoven
On December 1st 2015, another Dutch Data Science Summit will take place at TU/e. First confirmed speakers are …

Tackling big data: How Europe is trying to bridge the data science skills gap

Guest contributor Jonathan Keane takes a look at this space and how the European Data Science Academy wants to bridge the big data skills gap.

Three Reasons Data Scientists Might Prevent The Next Market Collapse

Attunity
Data science might be our best hope for better predicting and averting market panics like we saw in late 2008.


Data Miner Survey Summary Report 2013

Here are some highlights from the 2013 Data Miner Survey:
SURVEY & PARTICIPANTS: 68-item survey conducted online in 2013. Participants: 1,259 analytic professionals from 75 countries. This is the 6th Data Miner Survey.
FOCUS ON CRM: In the past few years, there has been an increase among data miners in the already substantial area of customer-focused analytics. Respondents are looking for a better understanding of customers and seeking to improve the customer experience. This can be seen in their goals, analyses, big data endeavors, and in the focus of their text mining.
BIG DATA: Many in the field are talking about the phenomenon of Big Data. There are clearly some areas in which the volume and sources of data have grown. However, it is unclear how much Big Data has impacted the typical data miner. While data miners believe that the size of their datasets has increased over the past year, data from previous surveys indicate that the size of datasets has been fairly consistent over time.
THE ASCENDANCE OF R: The proportion of data miners using R is rapidly growing, and since 2010, R has been the most-used data mining tool. While R is frequently used along with other tools, an increasing number of data miners also select R as their primary tool.
CHALLENGES IN THE USE OF ANALYTICS: Data miners continue to report challenges at each level of the analytic process. Companies often are not using analytics to their fullest and have continuing issues in the areas of deployment and performance measurement.
ENGAGEMENT & JOB SATISFACTION: The data miners in our survey are highly engaged with the analytic community: consuming and producing content, entering competitions, and searching for education and growth within their jobs. All of these activities lead to high job satisfaction, which has been increasing over time.
ANALYTIC SOFTWARE: Data miners are a diverse group who are looking for different things from their data mining tools. Ease-of-use and cost are two distinguishing dimensions. Software packages vary in their strengths and features. STATISTICA, KNIME, SAS JMP and IBM SPSS Modeler all receive high satisfaction ratings.
OTHER FINDINGS include the labels analytic professionals use to describe themselves (Data Scientist is #1), the algorithms being used (regression, decision trees, and cluster analysis continue to be the triad of core algorithms), and computing environments (cloud computing is increasing).


Data Miner Survey 2011

Some highlights from Rexer Analytics’ 5th Annual Data Miner Survey (2011):
SURVEY & PARTICIPANTS: 52-item survey of data miners, conducted online in 2011. Participants: 1,319 data miners from over 60 countries.

FIELDS & GOALS: Data miners work in a diverse set of fields. CRM/Marketing has been the #1 field for the past five years. Fittingly, “improving the understanding of customers”, “retaining customers” and other CRM goals continue to be the goals identified by the most data miners.

ALGORITHMS: Decision trees, regression, and cluster analysis continue to form a triad of core algorithms for most data miners. However, a wide variety of algorithms are being used. A third of data miners currently use text mining and another third plan to do so in the future.

TOOLS: R continued its rise this year and is now being used by close to half of all data miners (47%). R users report preferring it for being free, open source, and having a wide variety of algorithms. Many people also cited R’s flexibility and the strength of the user community. STATISTICA is selected as the primary data mining tool by the most respondents (17%). Data miners report using an average of 4 software tools. STATISTICA, KNIME, RapidMiner and Salford Systems received the strongest satisfaction ratings in 2011.

ANALYTIC CAPABILITY AND SUCCESS MEASUREMENT: Only 12% of corporate respondents rate their company as having very high analytic sophistication. However, companies with better analytic capabilities are outperforming their peers. Respondents report analyzing analytic success via Return on Investment (ROI) and analyzing the predictive validity or accuracy of their models. Challenges to measuring success include client or user cooperation and data availability/quality.


Data Mining Books at Amazon using ScraperWiki

Below is a comprehensive list of data mining books on Amazon.com, which I scraped using ScraperWiki (a free online data scraper shared with the world).

1. Data Mining: Practical Machine Learning Tools and Techniques, Third Edition (The Morgan Kaufmann Series in Data Management Systems)
2. Data Mining: Concepts and Techniques, Third Edition (The Morgan Kaufmann Series in Data Management Systems)
3. Introduction to Data Mining
4. Handbook of Statistical Analysis and Data Mining Applications
5. Data Analysis with Open Source Tools
6. Mining the Social Web: Analyzing Data from Facebook, Twitter, LinkedIn, and Other Social Media Sites
7. Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management
8. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition (Springer Series in Statistics)
9. Data Mining with R: Learning with Case Studies (Chapman & Hall/CRC Data Mining and Knowledge Discovery Series)
10. Data Analysis Using SQL and Excel
11. Data Mining: Concepts and Techniques, Second Edition (The Morgan Kaufmann Series in Data Management Systems)
12. Data Mining for Business Intelligence: Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner
13. Head First Data Analysis: A Learner’s Guide to Big Numbers, Statistics, and Good Decisions
14. Data Mining Techniques in CRM: Inside Customer Segmentation
15. Data Mining with Microsoft SQL Server 2008
16. Principles of Data Mining (Undergraduate Topics in Computer Science)
17. Mastering Data Mining: The Art and Science of Customer Relationship Management
18. Data-Driven Marketing: The 15 Metrics Everyone in Marketing Should Know
19. Data Mining and Statistics for Decision Making (Wiley Series in Computational Statistics)
20. Handbook of Statistical Analysis and Data Mining Applications
21. Data Mining: A Tutorial Based Primer
22. Statistical and Machine-Learning Data Mining: Techniques for Better Predictive Modeling and Analysis of Big Data, Second Edition
23. Data Mining and Predictive Analysis: Intelligence Gathering and Crime Analysis
24. Data Mining: Concepts, Models, Methods, and Algorithms
25. Machine Learning: An Algorithmic Perspective (Chapman & Hall/Crc Machine Learning & Pattern Recognition)
26. Principles of Data Mining (Adaptive Computation and Machine Learning)
27. Data Mining Cookbook: Modeling Data for Marketing, Risk and Customer Relationship Management
28. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data (Data-Centric Systems and Applications)
29. Programming Collective Intelligence: Building Smart Web 2.0 Applications
30. Beautiful Data: The Stories Behind Elegant Data Solutions
31. Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management
32. Principles and Theory for Data Mining and Machine Learning (Springer Series in Statistics)
33. Data Preparation for Data Mining (The Morgan Kaufmann Series in Data Management Systems)
34. Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining
35. Intelligent Data Analysis
36. Beautiful Visualization: Looking at Data through the Eyes of Experts (Theory in Practice)
37. The Top Ten Algorithms in Data Mining (Chapman & Hall/CRC Data Mining and Knowledge Discovery Series)
38. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations (The Morgan Kaufmann Series in Data Management Systems)
39. 21 Recipes for Mining Twitter
40. Data Mining with Rattle and R: The Art of Excavating Data for Knowledge Discovery (Use R)
41. Data Mining Using SAS Enterprise Miner: A Case Study Approach
42. Data Mining for Intelligence, Fraud & Criminal Detection: Advanced Analytics & Information Sharing Technologies
43. Discovering Knowledge in Data: An Introduction to Data Mining
44. Practical Applications of Data Mining
45. Data Mining: Introductory and Advanced Topics
46. Mining the Web: Discovering Knowledge from Hypertext Data
47. How to Measure Anything: Finding the Value of Intangibles in Business
48. Ensemble Methods in Data Mining: Improving Accuracy Through Combining Predictions (Synthesis Lectures on Data Min)
49. Data Analysis with Open Source Tools
50. Modern Data Warehousing, Mining, and Visualization: Core Concepts
51. A Practitioner’s Guide to Resampling for Data Analysis, Data Mining, and Modeling
52. Data Mining Methods and Models
53. Text Mining Application Programming (Charles River Media Programming)
54. Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications
55. Temporal Data Mining (Chapman & Hall/CRC Data Mining and Knowledge Discovery Series)
56. Microsoft® SQL Server 2008 R2 Analytics & Data Visualization
57. Data Analysis Using SQL and Excel
58. Data Mining and Machine Learning in Cybersecurity
59. Data Preparation for Data Mining Using SAS (The Morgan Kaufmann Series in Data Management Systems)
60. Data Preparation for Analytics Using SAS (SAS Press)
61. Practical Data Mining
62. Clinical Data-Mining: Integrating Practice and Research (Pocket Guides to Social Work Research Methods)
63. Business Intelligence: Data Mining and Optimization for Decision Making
64. Text Mining: Applications and Theory
65. Data Mining Methods for the Content Analyst: An Introduction to the Computational Analysis of Content (Routledge Communication Series)
66. Introduction to Business Data Mining
67. Mining the Social Web
68. Data Mining for Genomics and Proteomics: Analysis of Gene and Protein Expression Data (Wiley Series on Methods and Applications in Data Mining)
69. Hadoop: The Definitive Guide
70. Understanding Complex Datasets: Data Mining with Matrix Decompositions (Chapman & Hall/CRC Data Mining and Knowledge Discovery Series)
71. Functional Data Analysis with R and MATLAB (Use R)
72. Machine Learning and Data Mining for Computer Security: Methods and Applications (Advanced Information and Knowledge Processing)
73. Data Mining and Knowledge Discovery Handbook (Springer series in solid-state sciences)
74. Matrix Methods in Data Mining and Pattern Recognition (Fundamentals of Algorithms)
75. Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage
76. Practical Text Mining with Perl (Wiley Series on Methods and Applications in Data Mining)
77. Predictive Data Mining: A Practical Guide (The Morgan Kaufmann Series in Data Management Systems)
78. Data Warehousing For Dummies
79. The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data
80. Data Mining and Predictive Analysis: Intelligence Gathering and Crime Analysis
81. Data Warehousing Essentials (ABC’s of Data Warehousing & Data Mining)
82. Investigative Data Mining for Security and Criminal Detection
83. Handbook of Educational Data Mining (Chapman & Hall/CRC Data Mining and Knowledge Discovery Series)
84. Data Mining: Multimedia, Soft Computing, and Bioinformatics
85. Making Sense of Data II: A Practical Guide to Data Visualization, Advanced Data Mining Methods, and Applications
86. Music Data Mining (Chapman & Hall/CRC Data Mining and Knowledge Discovery Series)
87. Making Sense of Data III: A Practical Guide to Designing Interactive Data Visualizations
88. Data Mining and Business Intelligence: A Guide to Productivity
89. Social Network Data Analytics
90. Data Mining and Knowledge Discovery with Evolutionary Algorithms
91. Beautiful Data
92. Data Mining for Association Rules and Sequential Patterns: Sequential and Parallel Algorithms
93. Exploratory Data Mining and Data Cleaning
94. Biological Data Mining (Chapman & Hall/CRC Data Mining and Knowledge Discovery Series)
95. Data Mining and Uncertain Reasoning: An Integrated Approach
96. Microsoft PowerPivot for Excel 2010: Give Your Data Meaning
97. Applied Data Mining for Business and Industry (Statistics in Practice)
98. Discovering Data Mining: From Concept to Implementation
99. Data Mining for Business Intelligence: Concepts, Techniques, and Applications in Microsoft Office Excel(R) with XLMiner(TM) + Making Sense of Data Set
100. Introduction to Clustering Large and High-Dimensional Data
101. Making Sense of Data: A Practical Guide to Exploratory Data Analysis and Data Mining
102. Data Mining and Knowledge Discovery via Logic-Based Methods: Theory, Algorithms, and Applications (Springer Optimization and Its Applications)
103. International Journal of Data Mining and Bioinformatics
104. Building Data Mining Applications for CRM
105. Visual Data Mining: Techniques and Tools for Data Visualization and Mining
106. Text Mining: Predictive Methods for Analyzing Unstructured Information
107. Microsoft Data Mining: Integrated Business Intelligence for e-Commerce and Knowledge Management
108. Java Data Mining: Strategy, Standard, and Practice: A Practical Guide for architecture, design, and implementation (The Morgan Kaufmann Series in Data Management Systems)
109. Mining the Talk: Unlocking the Business Value in Unstructured Information
110. Data Mining Powerpoint Templates – Data Mining Powerpoint Background – Data Mining PPT Templates
111. Business Modeling and Data Mining (The Morgan Kaufmann Series in Data Management Systems)
112. Mining the Web: Transforming Customer Data into Customer Value
113. Scientific Data Mining: A Practical Perspective
114. Data Mining Cookbook: Modeling Data for Marketing, Risk, and Customer Relationship Management (Datawarehousing)
115. Applying and evaluating models to predict customer attrition using data mining techniques.: An article from: Journal of Comparative International Management
116. Research and Development in Knowledge Discovery and Data Mining: Second Pacific-Asia Conference, PAKDD’98, Melbourne, Australia, April 15-17, 1998, Proceedings
117. Research and Trends in Data Mining Technologies and Applications
118. Data Clustering: Theory, Algorithms, and Applications (ASA-SIAM Series on Statistics and Applied Probability)
119. Optimization Based Data Mining: Theory and Applications (Advanced Information and Knowledge Processing)
120. Information-Statistical Data Mining: Warehouse Integration with Examples of Oracle Basics (The Springer International Series in Engineering and Computer Science)
121. Spatial and Spatiotemporal Data Mining (Chapman & Hall/CRC Data Mining and Knowledge Discovery Series)
122. Statistical Analysis of Network Data: Methods and Models (Springer Series in Statistics)
123. Data Mining for Design and Manufacturing: Methods and Applications
124. Introduction to Data Technologies (Chapman & Hall/CRC Computer Science & Data Analysis)
125. Data Clustering in C++: An Object-Oriented Approach (Chapman & Hall/CRC Data Mining and Knowledge Discovery Series)
126. Data Mining Using SAS Enterprise Miner:
127. Advances in Machine Learning and Data Mining for Astronomy (Chapman & Hall/CRC Data Mining and Knowledge Discovery Series)
128. Grouping Multidimensional Data: Recent Advances in Clustering
129. Algorithms of the Intelligent Web
130. Social Computing: A Data Mining Perspective (Chapman & Hall/CRC Data Mining and Knowledge Discovery Series)
131. Agents and Data Mining: Interaction and Integration (Chapman & Hall/CRC Data Mining and Knowledge Discovery Series)
132. Spectral Feature Selection for Data Mining (Chapman & Hall/CRC Data Mining and Knowledge Discovery Series)
133. Cluster Effects in Mining Complex Data
134. Exploring Advances in Interdisciplinary Data Mining and Analytics: New Trends
135. Data Mining: Foundations and Intelligent Paradigms: Volume 3: Medical, Health, Social, Biological and other Applications (Intelligent Systems Reference Library)
