Isotonic Regression Algorithms

Quentin F. Stout
Computer Science and Engineering, University of Michigan

 

Below is a table of the fastest known isotonic regression algorithms for several metrics and orderings. I've cited the first paper to give a correct algorithm with the given time bound, to the best of my knowledge. In some cases I've listed two if they appeared nearly contemporaneously. I've only include algorithms for exact calculations (to within numerical error), not approximations. I think I've included all orderings of practical interest. Feel free to contact me about corrections or updates.

This is not an historical survey of the results leading up to the ones listed here, but rather just a list of the best known now. Many of the papers listed have extensive historical references within them. I've also omitted related topics such as unimodal regression, integer-valued regression, regressions with constraints on the number of level sets or the differences between adjacent ones, etc. I might include some of these in the future if there is sufficient interest. No parallel algorithms are considered since regrettably there has been almost no work on such algorithms.

A directed acyclic graph (DAG) G with n vertices V = {v1,...,vn} and m edges E defines a partial order over the vertices, where vi precedes vj if and only if there is a path from vi to vj. It is assumed that G is connected, and hence m ≥ n-1. If it isn't connected then the algorithms would be applied to each component independently of the others. Given G with a weighted value at each vertex (also called data), (vi,yi,wi): vi ∈ V, yi ∈ ℜ, wi ≥ 0}, an Lp isotonic regression of the data is a set {ýi: ýi ∈ ℜ, 1 ≤ i ≤ n } that minimizes
(∑i=1n wi |yi - ýi|p)1/p  if 1 ≤ p < ∞
maxi=1n wi |yi - ýi| if p = ∞
among all sets satisfying the isotonic constraint that if vi precedes vj then ýi ≤ ýj   The regression error is the value of this expression.

The orderings listed below are linear, tree, points in multidimensional space, and general (i.e., an algorithm that applies to all orderings). A DAG of points in multidimensional space is the isotonic version of multivariate regression. For points in multidimensional space (the "dim" orderings), a point p=(p1,...,pd) in d-dimensional space is dominated by point q=(q1,...,qd) if pi ≤ qi for all 1 ≤ i ≤ d. If p is dominated by q then p precedes q, thus the regression value at p must be no larger than the value at q. The multidimensional points are further subdivided into grids and points in arbitrary positions, and into dimension 2 and dimension ≥ 3. The constants in the O-notational analysis depend on d but are not explicitly determined.

The metrics considered are L1, L2, and L. These metrics also go under a variety of other names: L1 is also known as median regression, least absolute deviation, Manhattan or taxi-cab distance; L2 is squared error regression or Euclidean distance; and L is also known as minimax optimization, uniform metric, Chebyshev distance, supremum, or maximum absolute deviation. Results for general Lp appear in [13].

The table lists the best time known for the given ordering and metric, and citations to the references where the algorithm is described, or to a relevant note. All times are worst-case. I've included the times solely as a function of n, and, when it is an improvement for some cases, as a function of n and m. There are no results listed for L2 on unweighted data because there do not seem to be any improvements known compared to the weighted case.

L1   note 8 L2   note 9 L   notes 3, 8 L1 unweighted L unweighted
linear   note 1 Θ(n log n)   [1,9] Θ(n) Θ(n log n)   [5,10] Θ(n log n)   + Θ(n)   *
tree   note 6 Θ(n log n)   [13] Θ(n log n)   [7] Θ(n log n)   [10] Θ(n log n)   + Θ(n)   *
2-dim grid Θ(n log n)   [13] Θ(n2)   [8] Θ(n log n)   * Θ(n log n)   + Θ(n)   *
2-dim arbitrary   notes 4, 5 Θ(n log2 n)   [13] Θ(n2 log n)   [13] Θ(n log2 n)   [11] Θ(n log2 n)   + Θ(n log n)   *
d ≥ 3 dim grid Θ(n2 log n)   * Θ(n3 log n)   * Θ(n log n)   * Θ(n1.5 logd+1 n)   [11] Θ(n)   *
d ≥ 3 dim arbitrary   note 5 Θ(n2 logd n)   * Θ(n3 logd n)   * Θ(n logd n)   [11] Θ(n1.5 logd+1 n)   [11] Θ(n logd-1 n)   *
arbitrary   note 7 Θ(n3)
Θ(nm + n2 log n)   [2]
Θ(n4)   note 2
Θ(n2m + n3 log n)   [13]
Θ(n2 log n)   [4]
Θ(m log n)    [10]
Θ(n2.5 log n)   [13]
 
Θ(n2)
Θ(m) note 10
* the time is implied by the result for arbitrary sets
+ the time is implied by the result for weighted data

 

Notes:
  1. For linear orders the "pair adjacent violators", PAV, approach has been repeatedly rediscovered. Apparently the first paper that used this is by Ayer et al. [3]. For the L2 metric it is trivial to implement in linear time, while for the others appropriate data structures are needed, increasing the time by a logarithmic factor.
  2. This is Maxwell and Muckstadt's [6] algorithm, with a correction supplied by Spouge, Wan, and Wilbur [8].

  3. The algorithm in [10] is a modest improvement of the algorithm of Kaufman and Tamir [4]. It replaced a linear programming approach with one that could be computed via topological sorting, changing the time from Θ(m log n + n log2 n) to Θ(m log n). This is faster for sparse graphs where m = o(n log n), which is relevant for all of the other orderings considered.

    The approach used in [4,10] is based on using parametric search and solving the inverse problem, an approach which would result in very large constant factors in any implementation. Fortunately this asymptotic time can be achieved using direct approaches for linear and tree orders [10], and multi-dimensional orders with arbitrary placement [11]. There is no known direct construction for grid orderings, d ≥ 2, which achieves the time bound obtained by the algorithm for general orders. As a side note, I'd be interested in hearing of any implementation of parametric search that uses AKS sort or some other logarithmic time parallel sorting algorithm, as opposed to, say, using bitonic sort. I don't know of a reason why any such implementation would exist.

  4. For 2-dimensional points with arbitrary placement, [13] shows how to use a balanced tree to simulate the 2-dimensional grid algorithms for the L1 and L2 metrics to obtain the indicated results.

  5. For points in arbitrary position and d ≥ 3 (and d=2 for the L metric), the results are based on embedding into a DAG which has more vertices but which can succinctly represent the domination ordering. The best result for general sparse DAGs is then applied to this new DAG. This was first used in the original version of [10], but was subsequently moved to [11]. The new DAG has Θ(n logd-1 n) edges and vertices, and can be constructed in time linear in its size. The time analysis for L2 uses the fact that the number of iterations needed is linear in n, rather than linear in the size of the new DAG. For L1 and L2 the analyses also use the fact that a minimum cost flow step in [2] only uses a number of steps linear in the number of vertices with nonzero weight, which too is n rather than the size of the new DAG. Without using this embedding, the natural representation of domination ordering for points in arbitrary position has m=Θ(n2), and so the bounds for general dense orderings would apply.

  6. It is straightforward to show that if the graph is a star, then the time is linear for all of the metrics.

  7. For arbitrary DAGs two bounds are given. The bound solely in terms of n makes it easy to see its relationship to the entries above it, and the bound which is in terms of n and m is the relevant one for sparse graphs.

  8. In general there are many optimal isotonic regressions when the L1 and L metrics are used. A natural, "best best" regression is the limit as p tends to 1 or ∞, respectively, of Lp isotonic regression. This is called strict isotonic regression. For L the fastest known algorithms for computing it take Θ(n log n) time for linear and tree orderings, and Θ(TC(n,m)) time for a general ordering, where TC(n,m) is the time required to determine the transitive closure of a DAG of n vertices and m edges. The best bound known for TC(n,m) is Θ(min{nm,n2.376}). These algorithms appear in [12]. Apparently no algorithms have been published for strict L1 isotonic regression, though PAV can be used for linear and tree orderings.

  9. All of the algorithms are measured in worst case time, and most of them have running time independent of the data. However, for DAGs other than linear and trees the L2 algorithms have a worst case where at each step either almost all vertices have a regression value larger than the average data value, or almost all are less. This introduces a factor of n which in practice is likely to be closer to log n. I do not know of a reasonable model of "expected case", so I don't have a proof of this.

  10. For unweighted data the time for L is Θ(m) since the regression value at a vertex can be chosen to be the average of the maximum value of any predecessor (including itself) and the minimum value of any successor (including itself). This can be computed via topological sorting and has been repeatedly rediscovered.

 

References:

  1. Ahuja, RK and Olin, JB (2001), "A fast scaling algorithm for minimizing separable convex functions subject to chain constraints", Operations Research 49, pp. 784-789
  2. Angelov, A, Harb, B, Kannan, S, and Wang, L-S (2006), "Weighted isotonic regression under the L1 norm", Symp. Discrete Algorithms (SODA), pp. 783-791.
  3. Ayer, M, Brunk, HD, Ewing, GM, Reid, WT, and Silverman, E (1955), "An empirical distribution function for sampling with incomplete information", Annals of Math. Stat. 5, pp. 641-647.
  4. Kaufman, Y and Tamir, A (1993), "Locating service centers with precedence constraints", Discrete Applied Math. 47, pp. 251-261.
  5. Lin, T-X, Kuo, C-C, Hsieh, Y-H, and Wang, B-F (2009), "Efficient algorithms for the inverse sorting problem with bound constraints under the L-norm and the Hamming distance", J. Comp. and Sys. Sci. 75, pp. 451-464.
  6. Maxwell, WL and Muckstadt, JA (1985), "Establishing consistent and realistic reorder intervals in production-distribution systems", Operations Research 33, pp. 1316-1341.
  7. Pardalos, PM and Xue, G (1999), "Algorithms for a class of isotonic regression problems", Algorithmica 23, pp. 211-222.
  8. Spouge J, Wan H, and Wilbur WJ (2003), "Least squares isotonic regression in two dimensions", J. Optimization Theory and Apps. 117, pp. 585-605.
  9. Stout, QF (2008), "Unimodal regression via prefix isotonic regression", Comp. Stat. and Data Anal. 53, pp. 289-297. A preliminary version appeared in "Optimal algorithms for unimodal regression", Computing and Stat. 32, 2000.
  10. Stout, QF (2011), "Weighted L isotonic regression", submitted. This is a revision of the original version (2008) that was posted on the web.
  11. Stout, QF (2011), "Isotonic regression for multiple independent variables", submitted.
  12. Stout, QF (2012), "Strict L isotonic regression", J. Optimization Theory and Appl. 152, pp. 121-135.
  13. Stout, QF (2012), "Isotonic regression via partitioning", Algorithmica, to appear.

Quentin's Home Copyright © 2009-2012 Quentin F. Stout.