Isotonic Regression via Partitioning

Quentin F. Stout
Computer Science and Engineering, University of Michigan

 

Extended Abstract: This paper gives algorithms for determining weighted isotonic regressions satisfying order constraints given by various ordered sets (directed acyclic graphs, aka DAGs). For the L1 metric a partitioning approach is used which exploits the fact that L1 regression values can always be chosen to be data values. Extending this approach, algorithms for binary-valued L1 isotonic regression are used to find Lp isotonic regressions for 1 < p < ∞. Algorithms are given for trees, 2-dimensional and multidimensional orderings, and arbitrary DAGs. Algorithms are also given for Lp isotonic regression with constrained data and weight values, L1 regression with unweighted data, and L1 regression for DAGs where there are multiple data values at the vertices.

Let n denote the number of vertices and m the number of edges. The algorithms show that one can determine the L1 regression for a

For all of the times which include log factors, one of the factors can be reduced to log k if there are only k different data values. The ordering for d-dimensional data is the natural ordering, where < x1,...,xd> preceeds < y1,...,yd > if and only if xi ≤ yi for all 1 ≤ i ≤ d.

The paper's results for Lp apply for general p, but vary somewhat depending on whether p is 2, 3, 4, or 5, is an even integer, is an odd integer, or is nonintegral. See the paper for an explanation of the complete results and reasons for the differences. To illustrate, the times of the new algorithms for L2 are:

For general p there are algorithms which are semi-exact, meaning as exact as one can compute the root of a specific equation (this is exact for p = 2, 3, 4, and 5); accurate to within δ, meaning that the isotonic regression generated is pointwise within δ of the optimal isotonic regression; and exact, when the data has binary values and positive integer weights in a constrained range.

The algorithm for a DAG which is d-dimensional data in arbitrary locations, for d ≥ 3, is based on first creating a new DAG with more vertices but significantly fewer edges in the worst case, and then doing isotonic regression on the new DAG. The construction of the DAG is described in Isotonic Regression for Multiple Independent Variables.

Notes not in the paper:

Keywords: weighted isotonic median regression, nonparametric, shape constrained, L1, Lp, partial order, DAG, linear order, chain, tree, star, bivariate, multidimensional, classification, unweighted

Complete paper. This appears in Algorithmica 66 (2013), pp. 93-112.

Implementations of the algorithms for 2-d grids appears in the UniIsoRegression package on CRAN.

 

Related work: Here is an overview of my work on shape constrained regression (isotonic, unimodal, step), and here is information about the fastest (in O-notation) isotonic regression algorithms known to date, for a variety of orderings and Lp metrics.


Quentin's Home Copyright © 2009-2024 Quentin F. Stout.