Wednesday, July 11, 2012

Nokogiri error


How to use Nokogiri to collect data?

The first step is to install Nokogiri.

When I tried to install it on Windows with


gem install nokogiri


I ran into an error.


 It looked like it had something to do with building the binary code. I imagine a real computer engineer could solve this with casual enthusiasm, but I had to count on myself.
 I turned to Google for help. Nothing.
Then I turned to nokogiri-talk: no helpful information there either.
 I posted my question there, and someone suggested upgrading my RDoc package. The suggestion really helped!

 I also learned from a blog that libxml2 and libxslt must be installed before installing Nokogiri. Follow the steps it describes.
  
But then I ran into another problem. When I typed

    irb(main):001:0> require 'rubygems'

an error occurred...
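One way to narrow down a failing require is a small script that reports what Ruby can and cannot load. This is a minimal diagnostic sketch of my own, not a known fix for the error above. It assumes Ruby 1.9 or later, where RubyGems is loaded automatically and `require 'rubygems'` is normally unnecessary. `check_library` is a hypothetical helper name.

```ruby
# check_nokogiri.rb: report which libraries Ruby can load.
# check_library is a hypothetical helper, not part of Ruby or Nokogiri.
def check_library(name)
  require name
  "#{name} loaded OK"
rescue LoadError => e
  # LoadError means Ruby could not find or load the library.
  "could not load #{name}: #{e.message}"
end

puts "Ruby version: #{RUBY_VERSION}"
puts check_library('rubygems')
puts check_library('nokogiri')
# defined? returns nil instead of raising if Nokogiri was never loaded.
puts "Nokogiri version: #{Nokogiri::VERSION}" if defined?(Nokogiri::VERSION)
```

Run it with `ruby check_nokogiri.rb` from cmd.exe. A "could not load nokogiri" line means Ruby could not find or load the gem, for example because it was installed for a different Ruby.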



You can find Nokogiri examples by googling "Nokogiri css example".







Tuesday, July 10, 2012

Ruby Programming Language: Basic Ruby command


We have learned how to run Ruby from cmd.exe on Windows.

So let's learn some Ruby.

Concise Ruby Language Tutorial 
A short but sufficient tutorial for beginners. We won't go through every chapter here, because our goal is to learn data mining, not Ruby. We can come back to it when we need it.


Quick Note
 
A quick overview of the most basic commands, in case we forget.
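As a quick reminder, here is a small selection of basic Ruby constructs. The selection is my own and is not taken from the tutorial. It assumes Ruby 2.4 or later (for `Array#sum`). Save it as basics.rb and run `ruby basics.rb` from cmd.exe.

```ruby
# basics.rb: a few everyday Ruby constructs (an illustrative selection).
name = "data miner"                 # a String variable
puts "Hello, #{name}!"              # string interpolation; prints Hello, data miner!

numbers = [3, 1, 4, 1, 5]           # an Array
puts numbers.sum                    # prints 14
p numbers.sort                      # prints [1, 1, 3, 4, 5]

ages = { "Alice" => 30, "Bob" => 25 }   # a Hash maps keys to values
ages.each { |person, age| puts "#{person} is #{age}" }

def square(x)                       # define a method
  x * x
end
puts square(7)                      # prints 49

numbers.each do |n|                 # loop with a block and a conditional
  puts n.even? ? "#{n} is even" : "#{n} is odd"
end
```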

   







Tuesday, July 3, 2012

Ruby Programming Language Note: Basic Commands for cmd.exe


Ruby Programming Language Note


The first lesson in learning Ruby is not the Ruby language itself, but a few commands for cmd.exe. That's because we data miners don't use Interactive Ruby (irb); we run Ruby scripts from the Command Prompt.
We run Ruby on cmd.exe.



command          description
---------------------------------------------------------------------------------------------------
dir              list the files and directories in the current directory
                 (equivalent to ls on Linux and Mac OS X)

cd               change to (enter) a directory

cd ..            go up to the parent directory; remember the space before the two
                 periods (very similar to the Linux command ^^)

mkdir            make a directory

move             move (or rename) files

copy             copy files

ruby             run a Ruby file, e.g. ruby hello.rb
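The same chores can also be done from inside a Ruby script, which is handy once you start processing many files. This is a minimal sketch using Ruby's standard `Dir`, `File`, and `FileUtils`. It assumes Ruby 2.5 or later (for `Dir.children`). The method name `file_commands_demo` and the file names are made up for illustration. Save it as, say, files.rb and run `ruby files.rb` from cmd.exe.

```ruby
require 'fileutils'
require 'tmpdir'

# Mirror mkdir, copy, move, and dir from inside a Ruby script.
# Everything happens in a temporary directory that is deleted afterwards.
def file_commands_demo
  Dir.mktmpdir do |root|
    data = File.join(root, 'data')
    Dir.mkdir(data)                                                          # like: mkdir data
    File.write(File.join(data, 'raw.txt'), "1,2,3\n")                        # create a sample file
    FileUtils.cp(File.join(data, 'raw.txt'), File.join(data, 'copy.txt'))    # like: copy
    FileUtils.mv(File.join(data, 'copy.txt'), File.join(data, 'moved.txt'))  # like: move
    Dir.children(data).sort   # like: dir; mktmpdir returns the block's value
  end
end

p file_commands_demo   # prints ["moved.txt", "raw.txt"]
```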

You can find all the commands and their explanations HERE

Tutorial for cmd.exe commands

For people who haven't learned Linux before, I recommend the "Bible of Linux": 鳥哥的Linux私房菜 (VBird's Linux Private Kitchen)

------------------------------------------------------------------------------------------------------

OK, that's it for the first class.


Data mining book notes



Data mining takes advantage of advances in the fields of artificial intelligence (AI) and statistics. Both disciplines have been working on problems of pattern recognition and classification, and both communities have made great contributions to the understanding and application of neural nets and decision trees.

1. Describe the Data
        Explore, select, cleanse, and classify the data.

2. Build a Predictive Model
          I. Use sample data to build a predictive model (based on patterns with known results).
         II. Use data outside the sample to test the validity of the model.
        III. Empirically verify the model by applying it to the customer's database.

3. Data Mining's DON'Ts
        I. It doesn't uncover solutions automatically; the patterns it uncovers must be verified
            against reality.
       II. The predictive patterns it uncovers are NOT necessarily the CAUSES of the behaviors.
      III. You must understand your data.

4. Data mining does not replace skilled business analysts. It confirms their empirical observations
   and finds new, subtle patterns that yield steady incremental improvement (plus the occasional
   breakthrough insight).
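The model-building steps above can be illustrated with a toy Ruby sketch. It is not the book's procedure. The records are made up, and the "predictive model" is a straight line fitted by ordinary least squares, which I chose for simplicity. Mean absolute error is just one way to measure how well the model holds up outside the sample. The sketch assumes Ruby 2.4 or later (for `Array#sum`).

```ruby
# Toy records [x, y]; nil marks a missing value. The numbers are made up.
RECORDS = [[1, 3.1], [2, 4.9], [3, nil], [4, 9.2], [5, 10.8],
           [6, 13.1], [7, 15.0], [8, nil], [9, 19.1], [10, 20.9]].freeze

# Ordinary least squares for a straight line y = intercept + slope * x.
def fit_line(points)
  n = points.size.to_f
  mean_x = points.sum { |x, _| x } / n
  mean_y = points.sum { |_, y| y } / n
  sxy = points.sum { |x, y| (x - mean_x) * (y - mean_y) }
  sxx = points.sum { |x, _| (x - mean_x)**2 }
  slope = sxy / sxx
  [mean_y - slope * mean_x, slope]   # [intercept, slope]
end

# Step 1 (describe the data): cleanse by dropping records with missing values.
clean = RECORDS.reject { |x, y| x.nil? || y.nil? }

# Step 2.I: build the model on a random sample (fixed seed for repeatability).
shuffled = clean.shuffle(random: Random.new(42))
sample, holdout = shuffled.take(6), shuffled.drop(6)
intercept, slope = fit_line(sample)

# Step 2.II: test the model on data outside the sample (mean absolute error).
errors = holdout.map { |x, y| (intercept + slope * x - y).abs }
mae = errors.sum / errors.size

puts format("model: y = %.2f + %.2f * x", intercept, slope)
puts format("mean absolute error on held-out data: %.2f", mae)

# Step 2.III would apply the same formula to new, unseen x values.
puts format("prediction for x = 12: %.2f", intercept + slope * 12)
```

Holding data out of the sample matters because a model can fit its own sample well and still fail on new data. The held-out error is the more honest measure.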


Applications:
    Fraud detection: telecommunications companies, credit card companies, insurance companies,
                     and stock exchanges.

    Identifying effective medical treatments

    Retail





     









source: http://www.twocrows.com/intro-dm.pdf

Monday, July 2, 2012

Software for Data Mining


To Start Data Mining, You Need:


Microsoft Excel

  The most widely available software, so we don't waste time learning a new interface.

MySQL

R
  Statistical analysis software. Free and professional, and widely used by people who work with statistics.
  Download: http://cran.r-project.org/


RUBY

  A language that makes it easy to grab data tables and transform the data into whatever format you need.
  Easy to learn and use.
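As a small illustration of that kind of transformation, the sketch below turns a tab-separated table into comma-separated lines with an extra computed column. This is a format both Excel and R can read. The data is made up, as if pasted from a web page. The sketch uses only core Ruby (no csv library), and the method name `to_csv_lines` is hypothetical. It does not handle commas or quotes inside fields.

```ruby
# A small tab-separated table, e.g. pasted from a web page (made-up data).
raw = <<~TABLE
  product\tquantity\tunit_price
  pen\t3\t1.50
  notebook\t2\t4.25
  stapler\t1\t7.00
TABLE

# Convert to comma-separated lines and append a computed "total" column.
def to_csv_lines(raw)
  header, *rows = raw.lines.map { |line| line.chomp.split("\t") }
  lines = [(header + ["total"]).join(",")]
  rows.each do |name, qty, price|
    total = qty.to_i * price.to_f
    lines << [name, qty.to_i, format("%.2f", price.to_f), format("%.2f", total)].join(",")
  end
  lines
end

puts to_csv_lines(raw)
# prints:
# product,quantity,unit_price,total
# pen,3,1.50,4.50
# notebook,2,4.25,8.50
# stapler,1,7.00,7.00
```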

Nokogiri
 
   Nokogiri (鋸, Japanese for "saw") is an HTML, XML, SAX, and Reader parser. Among Nokogiri’s many features is the ability to search documents via XPath or CSS3 selectors.
 
Nokogiri is a Ruby library (a gem), and that's why we use Ruby.