Data Mining vs Screen-Scraping

Data mining isn’t screen-scraping. I know that some people in the room may disagree with that statement, but they’re actually two almost completely different concepts. In a nutshell, you might state it this way: screen-scraping allows you to get information, whereas data mining allows you to analyze information. That’s a pretty big simplification, so I’ll elaborate a bit.

The term “screen-scraping” comes from the old mainframe terminal days where people worked on computers with green and black screens containing only text. Screen-scraping was used to extract characters from the screens so that they could be analyzed. Fast-forwarding to the web world of today, screen-scraping now most commonly refers to extracting information from web sites. That is, computer programs can “crawl” or “spider” through web sites, pulling out data. People often do this to build things like comparison shopping engines, archive web pages, or simply download text to a spreadsheet so that it can be filtered and analyzed.
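To make the "pulling out data" step concrete, here is a minimal sketch using only Python's standard library. The HTML snippet, the `name`/`price` class names, and the `scrape_products` helper are all hypothetical, invented for illustration; a real scraper would first download the page and would need to match that site's actual markup.

```python
from html.parser import HTMLParser

# Hypothetical page snippet; a real scraper would download this from a web site.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Widget</span> <span class="price">9.99</span></li>
  <li class="product"><span class="name">Gadget</span> <span class="price">24.50</span></li>
</ul>
"""


class ProductScraper(HTMLParser):
    """Collects (name, price) pairs from <span class="name"> and <span class="price">."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished (name, price) tuples
        self._field = None   # which span we are currently inside, if any
        self._current = {}   # fields gathered for the product in progress

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "span" and cls in ("name", "price"):
            self._field = cls

    def handle_data(self, data):
        # Only keep text that sits inside one of the spans we care about.
        if self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span" and self._field:
            self._field = None
            # Once both fields are present, the product is complete.
            if "name" in self._current and "price" in self._current:
                self.rows.append(
                    (self._current["name"], float(self._current["price"]))
                )
                self._current = {}


def scrape_products(html):
    """Return a list of (name, price) tuples found in the given HTML text."""
    parser = ProductScraper()
    parser.feed(html)
    return parser.rows


print(scrape_products(SAMPLE_HTML))
```

The result is a plain list of rows, which is the kind of thing you might then write to a spreadsheet for filtering. Note that nothing here analyzes the data; it only gets it.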

Data mining, on the other hand, is defined by Wikipedia as the “practice of automatically searching large stores of data for patterns.” In other words, you already have the data, and you’re now analyzing it to learn useful things about it. Data mining often involves lots of complex algorithms based on statistical methods. It has nothing to do with how you got the data in the first place. In data mining you only care about analyzing what’s already there.
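As a rough illustration of "searching data for patterns," the sketch below counts which items tend to show up together in a set of shopping baskets. The baskets and the `frequent_pairs` function are made up for this example, and simple pair counting is only one very basic technique; real data mining systems typically use far more sophisticated statistical methods.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return item pairs that appear together in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) ignores duplicates and gives each pair one canonical order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical data that has already been collected; how it was obtained is irrelevant here.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "jam"],
    ["bread", "milk", "jam"],
]

print(frequent_pairs(baskets, min_support=2))
```

The function never touches a web site or a screen. It starts from data you already have, which is exactly the distinction being drawn here.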

The difficulty is that people who don’t know the term “screen-scraping” will try Googling for anything that resembles it. We include a number of these terms on our web site to help such folks; for example, we created pages entitled Text Data Mining, Automated Data Collection, Web Site Data Extraction, and even Web Site Ripper (I suppose “scraping” is sort of like “ripping”). So it presents a bit of a problem–we don’t necessarily want to perpetuate a misconception (i.e., screen-scraping = data mining), but we also have to use terminology that people will actually use.


How to Do “Personal Data Mining”

“Personal data mining” is a kind of data mining task that tries to identify patterns or behaviors about someone (or yourself) from their personal data. Sources of personal data include mobile communication usage, financial expenses, meal consumption, daily activity (exercise, sleep, entertainment, etc.), mood, and social media interaction (Facebook, Twitter, LinkedIn, etc.). Some tools that you can use to mine this data are:

DAYTUM – helps you collect and communicate statistics about your daily personal life. You can define the items you want to track (coffee, miles, sleep time, etc.) and assign them to categories (beverage, amount, shopping, etc.). It offers a variety of display types for presenting your data. For convenience, you can also submit data through a mobile app (iPhone) or a social media site (Twitter). (Free and premium service)
your.flowingdata (YFD) – lets you record personal data from just about anywhere using Twitter. It has an interactive data visualization tool for exploring your data in a meaningful way. It also helps you understand yourself by monitoring your growth and progress through data. (Free service)
mycrocosm – an online service that lets you share snippets of information from daily life in the form of simple statistical graphs. In particular, it provides an alternative to purely text-based micro-blogging software. It encourages users to make creative use of the visual forms provided to express whatever they want about themselves or the world they live in. Mycrocosm also allows data entry from a mobile device. (Free service)


Data Extraction – A Guideline to Use Scraping Tools Effectively

Many people around the world know little about scraping tools. To them, mining means extracting resources from the earth. In the internet age, the newly mined resource is data. Many data mining software tools are available on the internet to extract specific data from the web. Every company deals with large volumes of data, and managing and converting that data into a useful form is hard work. If the right information is not available at the right time, a company loses valuable time in making strategic decisions based on it.

This kind of situation costs opportunities in today’s competitive market. Data extraction and data mining tools can help you make strategic decisions at the right time to reach your goals. These tools have several advantages: you can store customer information in a sequential manner, learn about your competitors’ operations, and gauge your own company’s performance. Having this information at your fingertips when you need it is critical for every company.

To survive in this competitive business world, data extraction and data mining are critical to a company’s operations. One powerful tool used in online data mining is the website scraper. With this tool, you can filter data on the internet and retrieve the information you need for specific purposes. Website scrapers come in many types and are used in many fields. Research, surveillance, and the harvesting of direct marketing leads are just a few of the ways a website scraper assists professionals in the workplace.

A screen scraping tool is another useful way to extract data from the web. It is especially helpful when you mine data from the internet to your local hard disk. It provides a graphical interface that lets you designate the URLs (Uniform Resource Locators) to visit, the data elements to extract, and the scripting logic for traversing pages and working with the mined data. You can run the tool at periodic intervals. With it, you can download data from online databases into your spreadsheets. The most important of the scraping tools is data mining software, which extracts large amounts of information from the web and compiles that data into a useful format. It is used in various business sectors, especially by those who generate leads, set budgets by looking at competitors’ prices, and analyze online trends. With this tool, information is gathered and can be used immediately for your business needs.

Another useful scraping tool is the email scraping tool, which crawls various web sites for public email addresses. You can easily form a large mailing list with this tool. You can use such mailing lists to promote your product online, send proposals and offers to related businesses, and much more. With this tool, you can find targeted customers for your product or potential business partners. This allows you to expand your business in the online market.

Many well-established and respected organizations provide these tools free of charge as a trial offer. If you want ongoing service, you need to pay a nominal fee. You can also download the tools from their web sites.


Mining Unstructured Data

The information that many people enter as “tweets” in applications such as Twitter and LinkedIn is unstructured. The tweets posted to such applications resemble our own thought processes. Conventional data mining techniques, by contrast, work on data that is precisely defined. For example, a product survey contains questions such as “Which color do you like the most?” and “Which feature do you like the most?”

By writing some standard OLAP processing logic, one can derive the reports needed for critical business intelligence. In this case, a considerable amount of effort also goes into data definition, data entry, and data analysis.
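For precisely defined survey data, the reporting logic can be quite simple. The following is a minimal sketch of an OLAP-style roll-up that counts answers per question; the sample responses and the `rollup` function are hypothetical, and a real OLAP system would work against a database or data cube rather than an in-memory list.

```python
from collections import Counter, defaultdict

# Hypothetical survey responses; each question has a fixed, well-defined answer.
responses = [
    {"color": "red", "feature": "battery life"},
    {"color": "blue", "feature": "camera"},
    {"color": "red", "feature": "camera"},
    {"color": "green", "feature": "camera"},
]


def rollup(responses):
    """Count how often each answer was given, grouped by question."""
    report = defaultdict(Counter)
    for response in responses:
        for question, answer in response.items():
            report[question][answer] += 1
    return report


report = rollup(responses)
for question, counts in report.items():
    # most_common(1) yields the single most frequent answer and its count.
    answer, n = counts.most_common(1)[0]
    print(f"Most popular {question}: {answer} ({n} of {len(responses)})")
```

This works only because the questions and the possible answers were defined up front, which is the effort in data definition and data entry mentioned above.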

Tweets contain a lot of unstructured information. Rather than setting up a review committee to review movies, products, packaged food, services, and so on, one can construct a review system based on information posted to Twitter, MouthShut.com, LinkedIn, Facebook, etc. The challenge, obviously, is to construct a mining system based on both likelihood and statistics.

User responses, or tweets, are mapped to certain possible values. For example, a tweet such as “Oh great I had a good time at the coffee shop” could indicate any value from 7 to 10 on a rating system. When consolidating unstructured information, statistical inferences are combined with likelihood, so the same tweet can be used to infer two similar situations or viewpoints.
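One very simple way to sketch this mapping is with a word list. In the example below, the `POSITIVE` and `NEGATIVE` word sets, the score thresholds, and the rating bands are all assumptions chosen for illustration; they are not taken from any real system, and a production approach would likely rely on a trained statistical model instead. The function returns a range rather than a single number, reflecting the idea that a short message only suggests a likely band of ratings.

```python
import re

# Hypothetical word lists; a real system would use a much larger lexicon or a model.
POSITIVE = {"great", "good", "love", "excellent", "nice"}
NEGATIVE = {"bad", "awful", "terrible", "hate", "slow"}


def rating_range(tweet):
    """Map a short message to a (low, high) band on a 1-10 rating scale."""
    words = re.findall(r"[a-z']+", tweet.lower())
    # Net score: positive word count minus negative word count.
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score >= 2:
        return (7, 10)   # clearly favorable
    if score <= -2:
        return (1, 4)    # clearly unfavorable
    return (4, 7)        # weak or mixed signal


print(rating_range("Oh great I had a good time at the coffee shop"))
```

A word list like this cannot tell sincere praise from sarcasm ("Oh great..."), which is one reason the same message can support more than one interpretation.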

There are already some applications that provide product reviews based on tweets on Twitter. It is now up to developers to build more applications that can effectively consolidate responses in the form of tweets and derive business intelligence from them.
