From: "Saved by Windows Internet Explorer 9" Subject: Machine learning - Wikipedia, the free encyclopedia Date: Tue, 27 Dec 2011 10:19:09 +0800 MIME-Version: 1.0 Content-Type: multipart/related; type="text/html"; boundary="----=_NextPart_000_0000_01CCC480.F8B45E90" X-MimeOLE: Produced By Microsoft MimeOLE V6.0.6002.18463 This is a multi-part message in MIME format. ------=_NextPart_000_0000_01CCC480.F8B45E90 Content-Type: text/html; charset="utf-8" Content-Transfer-Encoding: quoted-printable Content-Location: http://en.wikipedia.org/wiki/Machine_learning =EF=BB=BF
Machine learning, a branch of artificial intelligence, is a scientific discipline concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data, such as from sensor data or databases. A learner can take advantage of examples (data) to capture characteristics of interest of their unknown underlying probability distribution. Data can be seen as examples that illustrate relations between observed variables. A major focus of machine learning research is to automatically learn to recognize complex patterns and make intelligent decisions based on data; the difficulty lies in the fact that the set of all possible behaviors given all possible inputs is too large to be covered by the set of observed examples (training data). Hence the learner must generalize from the given examples, so as to be able to produce a useful output in new cases.
Tom M. Mitchell provided a widely quoted definition: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."[1]
The core objective of a learner is to generalize from its experience.[2] The training examples from its experience come from some generally unknown probability distribution, and the learner has to extract from them something more general, something about that distribution, that allows it to produce useful answers in new cases.
Machine learning, data mining, and knowledge discovery in databases (KDD) are commonly confused, as they often employ the same methods and overlap strongly. They can be roughly separated as follows: machine learning focuses on prediction, based on known properties learned from the training data, while data mining (the analysis step of KDD) focuses on the discovery of previously unknown properties in the data.
However, these two areas overlap in many ways: data mining uses many machine learning methods, but often with a slightly different goal in mind. On the other hand, machine learning also employs data mining methods as "unsupervised learning" or as a preprocessing step to improve learner accuracy. Much of the confusion between these two research communities (which do often have separate conferences and separate journals, ECML PKDD being a major exception) comes from the basic assumptions they work with: in machine learning, performance is usually evaluated with respect to the ability to reproduce known knowledge, while in KDD the key task is the discovery of previously unknown knowledge. Evaluated with respect to known knowledge, an uninformed (unsupervised) method will easily be outperformed by supervised methods, while in a typical KDD task, supervised methods cannot be used due to the unavailability of training data.
Some machine learning systems attempt to eliminate the need for human intuition in data analysis, while others adopt a collaborative approach between human and machine. Human intuition cannot, however, be entirely eliminated, since the system's designer must specify how the data is to be represented and what mechanisms will be used to search for a characterization of the data.
Machine learning algorithms can be organized into a taxonomy based on the desired outcome of the algorithm.
The computational analysis of machine learning algorithms and their performance is a branch of theoretical computer science known as computational learning theory. Because training sets are finite and the future is uncertain, learning theory usually does not yield guarantees of the performance of algorithms. Instead, probabilistic bounds on the performance are quite common.
In addition to performance bounds, computational learning theorists study the time complexity and feasibility of learning. In computational learning theory, a computation is considered feasible if it can be done in polynomial time. There are two kinds of time complexity results. Positive results show that a certain class of functions can be learned in polynomial time. Negative results show that certain classes cannot be learned in polynomial time.
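A classical example of such a probabilistic bound, standard in PAC learning theory (not stated in this article): if a learner outputs a hypothesis from a finite class H that is consistent with m training examples drawn from the target distribution, then with probability at least 1 - δ its true error is below ε whenever

    m ≥ (1/ε) (ln |H| + ln (1/δ)).

The required sample size grows only logarithmically in |H| and 1/δ, which is why useful guarantees are possible despite finite training sets.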
There are many similarities between machine learning theory and statistics, although they use different terms.
Decision tree learning uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value.
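A minimal sketch with scikit-learn (one of the software suites listed at the end of this article); the observations and target values below are invented for illustration:

    # Map observations about an item ([height_cm, weight_kg]) to a conclusion
    # about its target value (0 = child, 1 = adult). Data invented.
    from sklearn.tree import DecisionTreeClassifier

    X = [[120, 25], [130, 30], [170, 70], [180, 80]]  # observations
    y = [0, 0, 1, 1]                                  # target values
    tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
    print(tree.predict([[165, 60]]))                  # conclusion for a new item: [1]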
Association rule learning is a method for discovering interesting relations between variables in large databases.
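A toy illustration (items, rule, and data invented here): the two standard measures of how "interesting" a rule such as {bread} → {butter} is are its support and confidence:

    # Support and confidence of the rule {bread} -> {butter} over a toy
    # transaction database.
    transactions = [
        {"bread", "butter", "milk"},
        {"bread", "butter"},
        {"bread", "milk"},
        {"milk"},
    ]
    antecedent, consequent = {"bread"}, {"butter"}
    n = len(transactions)
    support = sum(antecedent | consequent <= t for t in transactions) / n
    confidence = support / (sum(antecedent <= t for t in transactions) / n)
    print(support, confidence)  # 0.5 and 0.666...: butter appears in 2 of 3 bread baskets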
An artificial neural network (ANN) learning algorithm, usually called "neural network" (NN), is a learning algorithm that is inspired by the structure and/or functional aspects of biological neural networks. Computations are structured in terms of an interconnected group of artificial neurons, processing information using a connectionist approach to computation. Modern neural networks are non-linear statistical data modeling tools. They are usually used to model complex relationships between inputs and outputs, to find patterns in data, or to capture the statistical structure in an unknown joint probability distribution between observed variables.
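A minimal sketch of this connectionist structure in NumPy; the weights here are random (untrained) and the layer sizes are invented for illustration:

    # A two-layer feedforward network: an interconnected group of artificial
    # neurons with non-linear activations.
    import numpy as np

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)   # 2 inputs -> 3 hidden neurons
    W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)   # 3 hidden -> 1 output neuron

    def forward(x):
        h = np.tanh(W1 @ x + b1)     # non-linear hidden activations
        return np.tanh(W2 @ h + b2)  # network output

    print(forward(np.array([0.5, -1.0])))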
Genetic programming (GP) is an evolutionary algorithm-based methodology inspired by biological evolution to find computer programs that perform a user-defined task. It is a specialization of genetic algorithms (GA) where each individual is a computer program. It is a machine learning technique used to optimize a population of computer programs according to a fitness landscape determined by a program's ability to perform a given computational task.
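The evolutionary loop underlying both GA and GP can be sketched as follows. For brevity the individuals here are bitstrings (a plain GA), whereas in GP each individual would be a program tree; the fitness function, population size, and rates are invented:

    # Minimal evolutionary loop: selection, crossover, mutation.
    import random

    random.seed(0)

    def fitness(ind):
        return sum(ind)  # fitness landscape: count of ones

    pop = [[random.randint(0, 1) for _ in range(20)] for _ in range(30)]
    for generation in range(50):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:10]                    # selection: keep the fittest
        children = []
        while len(children) < 20:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, 20)     # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < 0.5:         # point mutation
                i = random.randrange(20)
                child[i] ^= 1
            children.append(child)
        pop = parents + children
    print(max(fitness(ind) for ind in pop))   # best fitness found, near 20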
Inductive logic programming (ILP) is an approach to rule learning using logic programming as a uniform representation for examples, background knowledge, and hypotheses. Given an encoding of the known background knowledge and a set of examples represented as a logical database of facts, an ILP system will derive a hypothesized logic program which entails all the positive and none of the negative examples.
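A toy, purely propositional stand-in for this search (real ILP systems work over first-order logic programs; every name below is invented): enumerate conjunctions of background tests until one entails all positive examples and none of the negative ones:

    from itertools import combinations

    # (attributes, is_positive): which animals fly?
    examples = [
        ({"has_wings": True,  "has_feathers": True},  True),
        ({"has_wings": True,  "has_feathers": False}, True),
        ({"has_wings": False, "has_feathers": False}, False),
    ]
    tests = sorted(examples[0][0])

    def find_rule():
        for r in range(1, len(tests) + 1):
            for body in combinations(tests, r):
                covers = lambda ex: all(ex[t] for t in body)
                if all(covers(ex) for ex, pos in examples if pos) and \
                   not any(covers(ex) for ex, pos in examples if not pos):
                    return body

    print("flies :-", ", ".join(find_rule()))  # flies :- has_wings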
Support vector machines (SVMs) are a set of related supervised learning methods used for classification and regression. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that predicts whether a new example falls into one category or the other.
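A minimal sketch with scikit-learn's SVC; the two-category training set is invented for illustration:

    from sklearn.svm import SVC

    X = [[0, 0], [1, 1], [8, 8], [9, 9]]   # training examples
    y = [0, 0, 1, 1]                       # category of each example
    model = SVC(kernel="linear").fit(X, y)
    print(model.predict([[7, 8]]))         # predicts the category of a new example: [1]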
Cluster analysis or clustering is the assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense. Clustering is a method of unsupervised learning, and a common technique for statistical data analysis.
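A minimal sketch with scikit-learn's KMeans; the observations are invented for illustration:

    from sklearn.cluster import KMeans

    observations = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(observations)
    print(km.labels_)  # cluster (subset) assigned to each observation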
A Bayesian network, belief network or directed acyclic graphical model is a probabilistic graphical model that represents a set of random variables and their conditional independencies via a directed acyclic graph (DAG). For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases. Efficient algorithms exist that perform inference and learning.
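A toy two-node network of exactly this disease/symptom kind, with invented probabilities; inference here is simple enumeration of the joint distribution:

    # Disease -> Symptom; infer P(disease | symptom) by enumeration.
    p_disease = 0.01                            # prior P(disease)
    p_symptom_given = {True: 0.9, False: 0.2}   # P(symptom | disease?)

    # joint probability factored along the DAG: P(d) * P(symptom | d)
    joint = {d: (p_disease if d else 1 - p_disease) * p_symptom_given[d]
             for d in (True, False)}
    posterior = joint[True] / (joint[True] + joint[False])
    print(round(posterior, 3))  # 0.043: the symptom raises, but does not confirm, the disease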
Reinforcement learning is concerned with how an agent ought to take actions in an environment so as to maximize some notion of long-term reward. Reinforcement learning algorithms attempt to find a policy that maps states of the world to the actions the agent ought to take in those states. Reinforcement learning differs from the supervised learning problem in that correct input/output pairs are never presented, nor sub-optimal actions explicitly corrected.
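A minimal Q-learning sketch on an invented five-state corridor, where the agent moves left or right and is rewarded only on reaching the rightmost state; no correct input/output pairs are given, and the policy emerges from reward alone:

    import random

    random.seed(0)
    n_states, actions = 5, (-1, +1)          # move left / move right
    Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
    alpha, gamma, epsilon = 0.5, 0.9, 0.3    # learning rate, discount, exploration

    for episode in range(200):
        s = 0
        while s != n_states - 1:
            if random.random() < epsilon:    # explore
                a = random.choice(actions)
            else:                            # exploit current estimates
                a = max(actions, key=lambda act: Q[(s, act)])
            s2 = min(max(s + a, 0), n_states - 1)
            r = 1.0 if s2 == n_states - 1 else 0.0
            Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in actions) - Q[(s, a)])
            s = s2

    policy = {s: max(actions, key=lambda act: Q[(s, act)]) for s in range(n_states - 1)}
    print(policy)  # learned state -> action map: move right (+1) everywhere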
Several learning algorithms, mostly unsupervised learning algorithms, aim at discovering better representations of the inputs provided during training. Classical examples include principal components analysis and clustering. Representation learning algorithms often attempt to preserve the information in their input but transform it in a way that makes it useful, often as a pre-processing step before performing classification or predictions: they allow reconstruction of inputs coming from the unknown data-generating distribution, while not being necessarily faithful for configurations that are implausible under that distribution. Manifold learning algorithms attempt to do so under the constraint that the learned representation is low-dimensional. Sparse coding algorithms attempt to do so under the constraint that the learned representation is sparse (has many zeros). Deep learning algorithms discover multiple levels of representation, or a hierarchy of features, with higher-level, more abstract features defined in terms of (or generating) lower-level features. It has been argued that an intelligent machine is one that learns a representation that disentangles the underlying factors of variation that explain the observed data.[3]
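A sketch of the classical example above, principal components analysis, using scikit-learn; the data are invented (points that mostly vary along two latent directions plus small noise):

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    latent = rng.normal(size=(100, 2))
    X = latent @ rng.normal(size=(2, 3)) + 0.05 * rng.normal(size=(100, 3))

    pca = PCA(n_components=2).fit(X)
    codes = pca.transform(X)              # low-dimensional representation
    X_rec = pca.inverse_transform(codes)  # reconstruction of the inputs
    print(np.mean((X - X_rec) ** 2))      # small: most information preserved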
In the learning area, sparse dictionary learning is one of the most popular methods, and it has been successful in many applications. In sparse dictionary learning, a data point is represented as a linear combination of basis functions, and the coefficients are assumed to be sparse. Let x be a d-dimensional data point and D a d-by-n matrix, where each column of D represents a basis function, and let r be the coefficient vector that represents x using D. Mathematically, sparse dictionary learning means solving

    x ≈ Dr

where r is sparse. Generally speaking, n is assumed to be larger than d to allow the freedom for a sparse representation.
Sparse dictionary learning has been applied in several contexts. In classification, the problem is to determine which class a new data point belongs to. Suppose a dictionary has already been built for each class; then a new data point is associated with the class whose dictionary gives it the best sparse representation. Sparse dictionary learning has also been applied to image denoising: the key idea is that a clean image patch can be sparsely represented by an image dictionary, but the noise cannot. See [4] for details.
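A sketch of this pipeline with scikit-learn's DictionaryLearning; the sizes and data are invented, and note that scikit-learn uses the row convention X ≈ RD rather than the column form x = Dr above:

    import numpy as np
    from sklearn.decomposition import DictionaryLearning

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 8))                  # 200 data points, d = 8
    learner = DictionaryLearning(n_components=16,  # n = 16 > d
                                 transform_algorithm="lasso_lars",
                                 random_state=0)
    R = learner.fit(X).transform(X)                # coefficients r, one row per point
    print(np.mean(R == 0))                         # most coefficients are zero (sparse)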
Applications for machine learning range from computer vision, speech and handwriting recognition, and natural language processing to search engines, medical diagnosis, bioinformatics, and recommender systems.
In 2006, the online movie company Netflix held the first "Netflix Prize" competition to find a program to better predict user preferences and beat its existing Netflix movie recommendation system by at least 10%. The AT&T Research team BellKor beat out several other teams with their machine learning program "Pragmatic Chaos". After winning several minor prizes, it won the grand prize competition in 2009 for $1 million.[5]
RapidMiner, KNIME, Weka, ODM, Shogun toolbox, Orange, Apache Mahout, scikit-learn, and mlpy are software suites containing a variety of machine learning algorithms.