From: "Saved by Windows Internet Explorer 9" Subject: Machine learning - Wikipedia, the free encyclopedia Date: Tue, 27 Dec 2011 10:19:09 +0800 MIME-Version: 1.0 Content-Type: multipart/related; type="text/html"; boundary="----=_NextPart_000_0000_01CCC480.F8B45E90" X-MimeOLE: Produced By Microsoft MimeOLE V6.0.6002.18463 This is a multi-part message in MIME format. ------=_NextPart_000_0000_01CCC480.F8B45E90 Content-Type: text/html; charset="utf-8" Content-Transfer-Encoding: quoted-printable Content-Location: http://en.wikipedia.org/wiki/Machine_learning =EF=BB=BF
Machine learning, a branch of artificial intelligence, is a scientific discipline concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data, such as from sensor data or databases. A learner can take advantage of examples (data) to capture characteristics of interest of their unknown underlying probability distribution. Data can be seen as examples that illustrate relations between observed variables. A major focus of machine learning research is to automatically learn to recognize complex patterns and make intelligent decisions based on data; the difficulty lies in the fact that the set of all possible behaviors given all possible inputs is too large to be covered by the set of observed examples (training data). Hence the learner must generalize from the given examples, so as to be able to produce a useful output in new cases.
Tom M. Mitchell provided a widely quoted definition: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."[1]
The core objective of a learner is to generalize from its experience.[2] The training examples from its experience come from some generally unknown probability distribution, and the learner has to extract from them something more general, something about that distribution, that allows it to produce useful answers in new cases.
Machine learning, data mining, and knowledge discovery in databases (KDD) are commonly confused, as they often employ the same methods and overlap strongly. They can be roughly separated as follows: machine learning focuses on prediction, based on known properties learned from the training data, while data mining (the analysis step of KDD) focuses on the discovery of previously unknown properties in the data.
However, these two areas overlap in many ways: data mining uses many machine learning methods, but often with a slightly different goal in mind. On the other hand, machine learning also employs data mining methods as "unsupervised learning" or as a preprocessing step to improve learner accuracy. Much of the confusion between these two research communities (which do often have separate conferences and separate journals, ECML PKDD being a major exception) comes from the basic assumptions they work with: in machine learning, performance is usually evaluated with respect to the ability to reproduce known knowledge, while in KDD the key task is the discovery of previously unknown knowledge. Evaluated with respect to known knowledge, an uninformed (unsupervised) method will easily be outperformed by supervised methods, while in a typical KDD task, supervised methods cannot be used due to the unavailability of training data.
Some machine learning systems attempt to eliminate the need for human intuition in data analysis, while others adopt a collaborative approach between human and machine. Human intuition cannot, however, be entirely eliminated, since the system's designer must specify how the data is to be represented and what mechanisms will be used to search for a characterization of the data.
Machine learning algorithms can be organized into a taxonomy based on the desired outcome of the algorithm.
The computational analysis of machine learning algorithms and their performance is a branch of theoretical computer science known as computational learning theory. Because training sets are finite and the future is uncertain, learning theory usually does not yield guarantees of the performance of algorithms. Instead, probabilistic bounds on the performance are quite common.
In addition to performance bounds, computational learning theorists study the time complexity and feasibility of learning. In computational learning theory, a computation is considered feasible if it can be done in polynomial time. There are two kinds of time complexity results. Positive results show that a certain class of functions can be learned in polynomial time. Negative results show that certain classes cannot be learned in polynomial time.
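A classical example of such a probabilistic bound, standard in PAC learning theory (not stated in this article): if a learner outputs a hypothesis from a finite class H that is consistent with m training examples drawn from the target distribution, then with probability at least 1 - δ its true error is below ε whenever

    m ≥ (1/ε) (ln |H| + ln (1/δ)).

The required sample size grows only logarithmically in |H| and 1/δ, which is why useful guarantees are possible despite finite training sets.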
There are many similarities between machine learning theory and statistics, although they use different terms.
Decision tree learning uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value.
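A minimal sketch with scikit-learn (one of the software suites listed at the end of this article); the observations and target values below are invented for illustration:

    # Map observations about an item ([height_cm, weight_kg]) to a conclusion
    # about its target value (0 = child, 1 = adult). Data invented.
    from sklearn.tree import DecisionTreeClassifier

    X = [[120, 25], [130, 30], [170, 70], [180, 80]]  # observations
    y = [0, 0, 1, 1]                                  # target values
    tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
    print(tree.predict([[165, 60]]))                  # conclusion for a new item: [1]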
Association rule learning is a method for discovering interesting relations between variables in large databases.
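A toy illustration (items, rule, and data invented here): the two standard measures of how "interesting" a rule such as {bread} → {butter} is are its support and confidence:

    # Support and confidence of the rule {bread} -> {butter} over a toy
    # transaction database.
    transactions = [
        {"bread", "butter", "milk"},
        {"bread", "butter"},
        {"bread", "milk"},
        {"milk"},
    ]
    antecedent, consequent = {"bread"}, {"butter"}
    n = len(transactions)
    support = sum(antecedent | consequent <= t for t in transactions) / n
    confidence = support / (sum(antecedent <= t for t in transactions) / n)
    print(support, confidence)  # 0.5 and 0.666...: butter appears in 2 of 3 bread baskets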
An artificial neural network (ANN) learning algorithm, usually called "neural network" (NN), is a learning algorithm that is inspired by the structure and/or functional aspects of biological neural networks. Computations are structured in terms of an interconnected group of artificial neurons, processing information using a connectionist approach to computation. Modern neural networks are non-linear statistical data modeling tools. They are usually used to model complex relationships between inputs and outputs, to find patterns in data, or to capture the statistical structure in an unknown joint probability distribution between observed variables.
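A minimal sketch of this connectionist structure in NumPy; the weights here are random (untrained) and the layer sizes are invented for illustration:

    # A two-layer feedforward network: an interconnected group of artificial
    # neurons with non-linear activations.
    import numpy as np

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)   # 2 inputs -> 3 hidden neurons
    W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)   # 3 hidden -> 1 output neuron

    def forward(x):
        h = np.tanh(W1 @ x + b1)     # non-linear hidden activations
        return np.tanh(W2 @ h + b2)  # network output

    print(forward(np.array([0.5, -1.0])))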
Genetic programming (GP) is an evolutionary algorithm-based methodology inspired by biological evolution to find computer programs that perform a user-defined task. It is a specialization of genetic algorithms (GA) where each individual is a computer program. It is a machine learning technique used to optimize a population of computer programs according to a fitness landscape determined by a program's ability to perform a given computational task.
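The evolutionary loop underlying both GA and GP can be sketched as follows. For brevity the individuals here are bitstrings (a plain GA), whereas in GP each individual would be a program tree; the fitness function, population size, and rates are invented:

    # Minimal evolutionary loop: selection, crossover, mutation.
    import random

    random.seed(0)

    def fitness(ind):
        return sum(ind)  # fitness landscape: count of ones

    pop = [[random.randint(0, 1) for _ in range(20)] for _ in range(30)]
    for generation in range(50):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:10]                    # selection: keep the fittest
        children = []
        while len(children) < 20:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, 20)     # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < 0.5:         # point mutation
                i = random.randrange(20)
                child[i] ^= 1
            children.append(child)
        pop = parents + children
    print(max(fitness(ind) for ind in pop))   # best fitness found, near 20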
Inductive logic programming (ILP) is an approach to rule learning using logic programming as a uniform representation for examples, background knowledge, and hypotheses. Given an encoding of the known background knowledge and a set of examples represented as a logical database of facts, an ILP system will derive a hypothesized logic program which entails all the positive and none of the negative examples.
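A toy, purely propositional stand-in for this search (real ILP systems work over first-order logic programs; every name below is invented): enumerate conjunctions of background tests until one entails all positive examples and none of the negative ones:

    from itertools import combinations

    # (attributes, is_positive): which animals fly?
    examples = [
        ({"has_wings": True,  "has_feathers": True},  True),
        ({"has_wings": True,  "has_feathers": False}, True),
        ({"has_wings": False, "has_feathers": False}, False),
    ]
    tests = sorted(examples[0][0])

    def find_rule():
        for r in range(1, len(tests) + 1):
            for body in combinations(tests, r):
                covers = lambda ex: all(ex[t] for t in body)
                if all(covers(ex) for ex, pos in examples if pos) and \
                   not any(covers(ex) for ex, pos in examples if not pos):
                    return body

    print("flies :-", ", ".join(find_rule()))  # flies :- has_wings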
Support vector machines (SVMs) are a set of related supervised learning methods used for classification and regression. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that predicts whether a new example falls into one category or the other.
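A minimal sketch with scikit-learn's SVC; the two-category training set is invented for illustration:

    from sklearn.svm import SVC

    X = [[0, 0], [1, 1], [8, 8], [9, 9]]   # training examples
    y = [0, 0, 1, 1]                       # category of each example
    model = SVC(kernel="linear").fit(X, y)
    print(model.predict([[7, 8]]))         # predicts the category of a new example: [1]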
Cluster analysis or clustering is the assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense. Clustering is a method of unsupervised learning, and a common technique for statistical data analysis.
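A minimal sketch with scikit-learn's KMeans; the observations are invented for illustration:

    from sklearn.cluster import KMeans

    observations = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(observations)
    print(km.labels_)  # cluster (subset) assigned to each observation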
A Bayesian network, belief network or directed acyclic graphical model is a probabilistic graphical model that represents a set of random variables and their conditional independencies via a directed acyclic graph (DAG). For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases. Efficient algorithms exist that perform inference and learning.
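A toy two-node network of exactly this disease/symptom kind, with invented probabilities; inference here is simple enumeration of the joint distribution:

    # Disease -> Symptom; infer P(disease | symptom) by enumeration.
    p_disease = 0.01                            # prior P(disease)
    p_symptom_given = {True: 0.9, False: 0.2}   # P(symptom | disease?)

    # joint probability factored along the DAG: P(d) * P(symptom | d)
    joint = {d: (p_disease if d else 1 - p_disease) * p_symptom_given[d]
             for d in (True, False)}
    posterior = joint[True] / (joint[True] + joint[False])
    print(round(posterior, 3))  # 0.043: the symptom raises, but does not confirm, the disease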
Reinforcement learning is concerned with how an agent ought to take actions in an environment so as to maximize some notion of long-term reward. Reinforcement learning algorithms attempt to find a policy that maps states of the world to the actions the agent ought to take in those states. Reinforcement learning differs from the supervised learning problem in that correct input/output pairs are never presented, nor sub-optimal actions explicitly corrected.
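A minimal Q-learning sketch on an invented five-state corridor, where the agent moves left or right and is rewarded only on reaching the rightmost state; no correct input/output pairs are given, and the policy emerges from reward alone:

    import random

    random.seed(0)
    n_states, actions = 5, (-1, +1)          # move left / move right
    Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
    alpha, gamma, epsilon = 0.5, 0.9, 0.3    # learning rate, discount, exploration

    for episode in range(200):
        s = 0
        while s != n_states - 1:
            if random.random() < epsilon:    # explore
                a = random.choice(actions)
            else:                            # exploit current estimates
                a = max(actions, key=lambda act: Q[(s, act)])
            s2 = min(max(s + a, 0), n_states - 1)
            r = 1.0 if s2 == n_states - 1 else 0.0
            Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in actions) - Q[(s, a)])
            s = s2

    policy = {s: max(actions, key=lambda act: Q[(s, act)]) for s in range(n_states - 1)}
    print(policy)  # learned state -> action map: move right (+1) everywhere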
Several learning algorithms, mostly unsupervised learning algorithms, aim at discovering better representations of the inputs provided during training. Classical examples include principal components analysis and clustering. Representation learning algorithms often attempt to preserve the information in their input but transform it in a way that makes it useful, often as a pre-processing step before performing classification or predictions: they allow reconstruction of inputs coming from the unknown data-generating distribution, while not being necessarily faithful for configurations that are implausible under that distribution. Manifold learning algorithms attempt to do so under the constraint that the learned representation is low-dimensional. Sparse coding algorithms attempt to do so under the constraint that the learned representation is sparse (has many zeros). Deep learning algorithms discover multiple levels of representation, or a hierarchy of features, with higher-level, more abstract features defined in terms of (or generating) lower-level features. It has been argued that an intelligent machine is one that learns a representation that disentangles the underlying factors of variation that explain the observed data.[3]
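A sketch of the classical example above, principal components analysis, using scikit-learn; the data are invented (points that mostly vary along two latent directions plus small noise):

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    latent = rng.normal(size=(100, 2))
    X = latent @ rng.normal(size=(2, 3)) + 0.05 * rng.normal(size=(100, 3))

    pca = PCA(n_components=2).fit(X)
    codes = pca.transform(X)              # low-dimensional representation
    X_rec = pca.inverse_transform(codes)  # reconstruction of the inputs
    print(np.mean((X - X_rec) ** 2))      # small: most information preserved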
In the learning area, sparse dictionary learning is one of the most popular methods, and it has been successful in many applications. In sparse dictionary learning, a data point is represented as a linear combination of basis functions, and the coefficients are assumed to be sparse. Let x be a d-dimensional data point and D a d-by-n matrix, where each column of D represents a basis function, and let r be the coefficient vector that represents x using D. Mathematically, sparse dictionary learning means solving

    x ≈ Dr

where r is sparse. Generally speaking, n is assumed to be larger than d to allow the freedom for a sparse representation.
Sparse dictionary learning has been applied in several contexts. In classification, the problem is to determine which class a new data point belongs to. Suppose a dictionary has already been built for each class; then a new data point is associated with the class whose dictionary gives it the best sparse representation. Sparse dictionary learning has also been applied to image denoising: the key idea is that a clean image patch can be sparsely represented by an image dictionary, but the noise cannot. See [4] for details.
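A sketch of this pipeline with scikit-learn's DictionaryLearning; the sizes and data are invented, and note that scikit-learn uses the row convention X ≈ RD rather than the column form x = Dr above:

    import numpy as np
    from sklearn.decomposition import DictionaryLearning

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 8))                  # 200 data points, d = 8
    learner = DictionaryLearning(n_components=16,  # n = 16 > d
                                 transform_algorithm="lasso_lars",
                                 random_state=0)
    R = learner.fit(X).transform(X)                # coefficients r, one row per point
    print(np.mean(R == 0))                         # most coefficients are zero (sparse)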
Applications for machine learning range from computer vision, speech and handwriting recognition, and natural language processing to search engines, medical diagnosis, bioinformatics, and recommender systems.
In 2006, the online movie company Netflix held the first "Netflix Prize" competition to find a program to better predict user preferences and beat its existing Netflix movie recommendation system by at least 10%. The AT&T Research team BellKor beat out several other teams with their machine learning program "Pragmatic Chaos". After winning several minor prizes, it won the grand prize competition in 2009 for $1 million.[5]
RapidMiner, KNIME, Weka, ODM, Shogun toolbox, Orange, Apache Mahout, scikit-learn, and mlpy are software suites containing a variety of machine learning algorithms.