The main topic of this book is conformal prediction, a method of prediction recently developed in machine learning. Conformal predictors are among the most accurate methods of machine learning, and unlike other state-of-the-art methods, they provide information about their own accuracy and reliability.

The book integrates mathematical theory and revealing experimental work. It demonstrates mathematically the validity of the reliability claimed by conformal predictors when they are applied to independent and identically distributed data, and it confirms experimentally that the accuracy is sufficient for many practical problems. Later chapters generalize these results to models called repetitive structures, which originate in the algorithmic theory of randomness and statistical physics. The approach is flexible enough to incorporate most existing methods of machine learning, including newer methods such as boosting and support vector machines and older methods such as nearest neighbors and the bootstrap.
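
As a rough illustration of what it means for a predictor to report its own reliability, the following minimal sketch computes conformal p-values for each candidate label of a new object and keeps the labels whose p-value exceeds a chosen significance level. It is not taken from the book: the function names, the toy data, and the nearest-neighbour nonconformity score are assumptions chosen for brevity. Under the assumption of independent and identically distributed data, the resulting prediction set should miss the true label with probability at most `epsilon`.

```python
import math


def nonconformity(examples, i):
    """Nearest-neighbour nonconformity score of example i (hypothetical choice).

    Ratio of the distance to the nearest example with the same label to the
    distance to the nearest example with a different label; larger = stranger.
    """
    x_i, y_i = examples[i]
    d_same = d_other = math.inf
    for j, (x, y) in enumerate(examples):
        if j == i:
            continue
        d = math.dist(x_i, x)
        if y == y_i:
            d_same = min(d_same, d)
        else:
            d_other = min(d_other, d)
    if math.isinf(d_same) or d_other == 0.0:
        return math.inf
    return d_same / d_other


def conformal_p_values(train, x_new, labels):
    """For each candidate label, the fraction of examples in the augmented
    set that are at least as nonconforming as the new example."""
    p_values = {}
    for y in labels:
        augmented = list(train) + [(x_new, y)]
        scores = [nonconformity(augmented, i) for i in range(len(augmented))]
        at_least_as_strange = sum(s >= scores[-1] for s in scores)
        p_values[y] = at_least_as_strange / len(augmented)
    return p_values


def prediction_set(train, x_new, labels, epsilon):
    """Labels that cannot be rejected at significance level epsilon."""
    p = conformal_p_values(train, x_new, labels)
    return [y for y in labels if p[y] > epsilon]


# Toy data (made up for illustration): two well-separated classes.
train = [((0.0, 0.0), 0), ((0.0, 1.0), 0), ((1.0, 0.0), 0),
         ((5.0, 5.0), 1), ((5.0, 6.0), 1), ((6.0, 5.0), 1)]
print(conformal_p_values(train, (0.5, 0.5), [0, 1]))
print(prediction_set(train, (0.5, 0.5), [0, 1], epsilon=0.2))
```

Any nonconformity score keeps the error guarantee; the choice of score only affects how small (informative) the prediction sets are.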

Topics and Features:

- Describes how conformal predictors yield accurate and reliable predictions, complemented with quantitative measures of their accuracy and reliability
- Handles both classification and regression problems
- Explains how to apply the new algorithms to real-world data sets
- Demonstrates the infeasibility of some standard prediction tasks
- Explains connections with Kolmogorov's algorithmic randomness, recent work in machine learning, and older work in statistics
- Develops new methods of probability forecasting and shows how to use them for prediction in causal networks

Researchers in computer science, statistics, and artificial intelligence will find the book an authoritative and rigorous treatment of some of the most promising new developments in machine learning. Practitioners and students in all areas of research that use quantitative prediction or machine learning will learn about important new methods.

The book may be purchased directly from Springer and from many on-line booksellers, including amazon.com.

**Vladimir Vovk** and **Alex Gammerman**
are Professors of Computer Science at Royal Holloway, University of London.
**Glenn Shafer** is Professor
in the Rutgers School of Business - Newark and New Brunswick.
All three authors are affiliated with the
Computer Learning Research Centre
at Royal Holloway, University of London.

The following articles further develop and review the theory presented in the book:

- Working Paper 1.
"On-line predictive linear regression"
by Vladimir Vovk, Ilia Nouretdinov and Alex Gammerman.
This article is written for statisticians. We consider the on-line predictive version of the standard statistical problem of linear regression; the goal is to predict each consecutive response given the corresponding explanatory variables and all the previous observations. We are mainly interested in prediction intervals rather than point predictions. The standard treatment of prediction intervals in linear regression analysis has two drawbacks: (1) the usual prediction intervals guarantee that the probability of error is equal to the nominal significance level epsilon, but this property per se does not imply that the long-run frequency of error is close to epsilon; (2) it is not suitable for prediction of complex systems as it assumes that the number of observations exceeds the number of parameters. We state the book's result showing that in the on-line protocol the frequency of error does equal the nominal significance level, up to statistical fluctuations, and we describe alternative regression models in which informative prediction intervals can be found before the number of observations exceeds the number of parameters. One of these models, which only assumes that the observations are independent and identically distributed, is greatly underused in the statistical theory of regression. Published in the *Annals of Statistics* **37**:1566–1590 (2009).

- Working Paper 2.
"Hedging predictions in machine learning"
by Alex Gammerman and Vladimir Vovk.
This is a written version of the second Computer Journal lecture, presented at the British Computer Society London Office on 12 June 2006. The lecture was followed by a discussion by some of the leading experts in machine learning, which is also included in the article. Its definitive version is published in the *Computer Journal* **50**:151–177 (2007). This is the description of conformal prediction as given in the article's abstract: Recent advances in machine learning make it possible to design efficient prediction algorithms for data sets with huge numbers of parameters. This article describes a new technique for "hedging" the predictions output by many such algorithms, including support vector machines, kernel ridge regression, kernel nearest neighbours, and by many other state-of-the-art methods. The hedged predictions for the labels of new objects include quantitative measures of their own accuracy and reliability. These measures are provably valid under the assumption of randomness, traditional in machine learning: the objects and their labels are assumed to be generated independently from the same probability distribution. In particular, it becomes possible to control (up to statistical fluctuations) the number of erroneous predictions by selecting a suitable confidence level. Validity being achieved automatically, the remaining goal of hedged prediction is efficiency: taking full account of the new objects' features and other available information to produce as accurate predictions as possible. This can be done successfully using the powerful machinery of modern machine learning.

- Working Paper 3.
"A tutorial on conformal prediction"
by Glenn Shafer and Vladimir Vovk.
This tutorial presents a self-contained account of the theory of conformal prediction and works through several numerical examples. Published in the *Journal of Machine Learning Research* **9**:371–421 (2008).

- Working Paper 4.
"Plug-in martingales for testing exchangeability on-line"
by Valentina Fedorova, Alex Gammerman, Ilia Nouretdinov, and Vladimir Vovk.
A standard assumption in machine learning is the exchangeability of data, which is equivalent to assuming that the examples are generated from the same probability distribution independently. This paper is devoted to testing the assumption of exchangeability on-line: the examples arrive one by one, and after receiving each example we would like to have a valid measure of the degree to which the assumption of exchangeability has been falsified. Such measures are provided by exchangeability martingales. We extend known techniques for constructing exchangeability martingales and show that our new method is competitive with the martingales introduced before. Finally we investigate the performance of our testing method on two benchmark datasets, USPS and Statlog Satellite data; for the former, the known techniques give satisfactory results, but for the latter our new more flexible method becomes necessary. This article appears in the Proceedings of ICML 2012.

- Working Paper 5.
"Conditional validity of inductive conformal predictors"
by Vladimir Vovk.
Conformal predictors are automatically valid in the sense of having coverage probability equal to or exceeding a given confidence level. Inductive conformal predictors are a computationally efficient version of conformal predictors satisfying the same property of validity. However, inductive conformal predictors have been only known to control unconditional coverage probability. This paper explores various versions of conditional validity and various ways to achieve them using inductive conformal predictors and their modifications. These are the data set (the Spambase data set contributed by George Forman to the UCI Machine Learning Repository with column names) and R programs used in the experimental section of the conference version of the paper. The conference version of this article is published in the Proceedings of ACML 2012 (*JMLR: Workshop and Conference Proceedings* **25**:475–490, 2012). The journal version is published in *Machine Learning* (ACML 2012 Special Issue) **92**:349–376 (2013).

- Working Paper 6.
"Cross-conformal predictors"
by Vladimir Vovk.
This note introduces the method of cross-conformal prediction, which is a hybrid of the methods of inductive conformal prediction and cross-validation, and studies its validity and predictive efficiency empirically. To appear in the Special Issue of *Annals of Mathematics and Artificial Intelligence* on Conformal Prediction.

- Working Paper 7.
"Venn-Abers predictors"
by Vladimir Vovk and Ivan Petej.
This paper continues the study, both theoretical and empirical, of the method of Venn prediction, concentrating on binary prediction problems. Venn predictors produce probability-type predictions for the labels of test objects which are guaranteed to be well calibrated under the standard assumption that the observations are generated independently from the same distribution. We give a simple formalization and proof of this property. We also introduce Venn-Abers predictors, a new class of Venn predictors based on the idea of isotonic regression, and report promising empirical results both for Venn-Abers predictors and for their more computationally efficient simplified version.

- Working Paper 8.
"Transductive conformal predictors"
by Vladimir Vovk.
This paper, to appear in the COPA 2013 Proceedings, discusses a transductive version of conformal predictors. This version is computationally inefficient for big test sets, but it turns out that apparently crude "Bonferroni predictors" are about as good in their information efficiency and vastly superior in computational efficiency.

- Working Paper 9.
"Conformal prediction under hypergraphical models"
by Valentina Fedorova, Alex Gammerman, Ilia Nouretdinov, and Vladimir Vovk.
This paper, which will also appear in the COPA 2013 Proceedings, discusses conformal prediction for hypergraphical models (in the book, hypergraphical models are used only for Venn predictors).

- Working Paper 10.
"Efficiency of conformalized ridge regression"
by Evgeny Burnaev and Vladimir Vovk.
Conformal prediction can be applied on top of a wide range of prediction algorithms, which often make strong assumptions about the data-generating mechanism. For the method to be really useful it is desirable that in the case where the assumptions of the underlying algorithm are satisfied, the conformal predictor loses little in efficiency as compared with the underlying algorithm (whereas being a conformal predictor, it produces valid results even when those assumptions are not satisfied, provided the data are IID). This paper explores the degree to which this additional requirement of efficiency is satisfied in the case of Bayesian ridge regression.

- Working Paper 11.
"Criteria of efficiency for conformal prediction"
by Vladimir Vovk, Valentina Fedorova, Ilia Nouretdinov, and Alex Gammerman.
We study optimal conformity measures for various criteria of efficiency in an idealized setting. This leads to an important class of criteria of efficiency that we call probabilistic; it turns out that the most standard criteria of efficiency used in the literature are not probabilistic.
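
Several of the papers listed above refer to inductive conformal predictors, the computationally efficient variant described under Working Paper 5. The following is a minimal sketch of the idea for regression, not code from any of the papers: the function names, the least-squares line used as the underlying algorithm, and the toy numbers are all assumptions made for illustration. A rule is fitted on a proper training set, absolute residuals on a separate calibration set serve as nonconformity scores, and a suitable order statistic of those scores gives the half-width of the prediction interval.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares line through (xs, ys); the underlying algorithm
    here is a hypothetical stand-in for any point predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def icp_interval(train, calib, x_new, epsilon):
    """Inductive conformal prediction interval at significance level epsilon.

    train, calib: lists of (x, y) pairs; the two sets must not overlap.
    """
    predict = fit_line([x for x, _ in train], [y for _, y in train])
    # Nonconformity scores: absolute residuals on the calibration set.
    scores = sorted(abs(y - predict(x)) for x, y in calib)
    m = len(scores)
    # Index of the order statistic giving coverage at least 1 - epsilon.
    k = math.ceil((1 - epsilon) * (m + 1))
    if k > m:
        # Too few calibration examples for this level: no finite interval.
        return (-math.inf, math.inf)
    centre = predict(x_new)
    return (centre - scores[k - 1], centre + scores[k - 1])


# Toy data (made up for illustration).
train = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
calib = [(4.0, 4.5), (5.0, 4.0), (6.0, 6.25)]
print(icp_interval(train, calib, 10.0, epsilon=0.25))
```

The guarantee obtained this way is unconditional: it holds on average over the calibration and test examples, which is the limitation that the work on conditional validity addresses.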

For other research on conformal prediction, see the on-line prediction wiki. See also the events page.