Quentin F. Stout
Computer Science and Engineering, University of Michigan
Here is a table of the fastest known isotonic regression algorithms for several metrics and orderings. I've cited the first paper to give a correct algorithm with the given time bound, to the best of my knowledge. In some cases I've listed 2 if they appeared nearly contemporaneously. I've only include algorithms for exact calculations (to within numerical error), not approximations. Feel free to contact me about corrections or updates.
This is not an historical survey of the results leading up to the ones listed here, but rather just a list of the best known now. Many of the papers listed have extensive historical references within them. I've also omitted related topics such as unimodal regression, integer-valued regression, regressions with constraints on the number of level sets or the differences between adjacent ones, etc. I might include some of these in the future if there is sufficient interest.
A directed acyclic graph (DAG) G with n vertices V = {v1,...,vn} and m edges E defines a partial order over the vertices, where vi precedes vj if and only if there is a path from vi to vj. It is assumed that G is connected, and hence m ≥ n-1. If it isn't connected then the algorithms would be applied to each component independently of the others. Given G with a weighted value at each vertex (also called data), (vi,yi,wi): vi ∈ V, yi ∈ ℜ, wi ≥ 0}, an Lp isotonic regression of the data is a set {ýi: ýi ∈ ℜ, 1 ≤ i ≤ n } that minimizes
| (∑i=1n wi |yi - ýi|p)1/p | if 1 ≤ p < ∞ |
| maxi=1∞ wi |yi - ýi| | if p = ∞ |
The orderings listed are linear, tree, points in multidimensional space, and general (i.e., an algorithm that applies to all orderings). For points in multidimensional space (the ``dim'' orderings), a point p=(p1,...,pd) in d-dimensional space is dominated by point q=(q1,...,qd) if pi ≤ qi for all 1 ≤ i ≤ d. If p is dominated by q then p precedes q, thus the regression value at p must be no larger than the value at q. The multidimensional points are further subdivided into grids and points in arbitrary positions, and into dimension 2 and dimension ≥ 3. The constants in the O-notational analysis depend on d but are not explicitly determined.
The metrics considered are L1, L2, and L∞. These metrics also go under a variety of other names: L1 is also known as median regression or Manhatten distance; L2 is squared error regression; and L∞ is also known as minimax optimization, uniform metric, Chebyshev distance, or least absolute deviation.
The table lists the best time known for the given ordering and metric, and citations to the references where the algorithm is described, or to a relevant note.
| L1 | L2 | L∞ | |
| linear | Θ(n log n) [1,9] | Θ(n) Note 1 | Θ(n log n) *, [5,10] |
| tree | Θ(n log n) [11] | Θ(n log n) [7] | Θ(n log n) *, [10] |
| 2-dim grid | Θ(n log n) [11] | Θ(n2) [8] | Θ(n log n) * |
| 2-dim arbitrary | Θ(n log2 n) Note 4 | Θ(n2 log n) Note 4 | Θ(n log2 n) [10] |
| d ≥ 3 dim grid | Θ(n2 log n) * | Θ(n3 log n)   * | Θ(n log n) * |
| d ≥ 3 dim arbitrary | Θ(n2 logd n) [12] + | Θ(n3 log2d-1) [12] + | Θ(n logd n) [10] |
| arbitrary | Θ(n3) [2] Θ(nm+n2log n) |
Θ(n4) Note 2 Θ(n2m log n) |
Θ(n2 log n) Note 3 Θ(m log n) |
|
* the time is implied by the result for arbitrary sets + uses the construction of a related DAG, to which the general algorithm is applied |
|||
Notes:
The best general algorithm for L2 is Spouge, Wan, and Wilbur's [8] correction to Maxwell and Muckstadt's [6] algorithm. Maxwell and Muckstadt claimed that their algorithm took Θ(n4) time, but Spouge, Wan, and Wilbur showed that it used an NP-complete subproblem. They showed how to modify the algorithm to obtain quartic time. The time is Θ(n F(n,m)), where F(n,m) is the time to compute a maximal flow on a graph with n vertices and m edges. The Sleator and Tarjan max flow algorithm has F(n,m) = Θ(nm log n), which gives the best time known when m = o(n2 log n).
The best general algorithm for L∞ is the improvement in [10] to the algorithm of Kaufman and Tamir [4]. This replaced a linear programming approach with one that could be computed via topological sorting, changing the time from Θ(m log n + n log2 n) to Θ(m log n), which is faster for sparse graphs where m = o(n log n).
For 2-dimensional points with arbitrary placement, [11] shows how to simulate the 2-dimensional grid algorithms for the L1 and L2 metrics to obtain the indicated results.
For points in arbitrary position and d ≥ 3 (and d=2 for the L∞ metric), the results are based on constructing a larger set of points for which domination ordering can be specified with a succinct set of edges, and then applying the best result for general sparse DAGS to this new set of points. This was first used in the original version of [10], and subsequently in [11]. However, the revised version of [10] replaces this approach with another which avoids parametric search by using more direct construction. The sparse DAG construction and application to all 3 metrics was moved to a short paper [12].
The DAG constructed has Θ(n logd-1 n) edges and vertices, hence the number of edges is linear in the number of vertices. This is true for the other orderings as well, which is why I have mentioned results for sparse graphs in the entries of algorithms for general orders. Without using this embedding, the natural representation of domination ordering for points in arbitrary position has m=Θ(n2), and so the bounds for general dense orderings would apply.
For L∞ the approach used in [4,10] is based on using parametric search and solving the inverse problem. A direct approach for linear orders appears in [5,10], and for tree orders appears in [10], as do the results for dimensional orders with arbitrary placement. There is no known direct construction for grid orderings, d ≥ 2, which achieves the time bound obtained by the algorithm for general orders.
For L∞ and unweighted data, the time reduces to Θ(m). Unweighted data does not improve the best time known for the other two metrics.
It is straightforward to show that if the graph is a star, then the time is linear for all of the metrics.
For the arbitrary DAG entries I included the crude bound, solely in terms of n, to make it easier to see its relationship to the entries above it, as well as the bound which relies on n and m, since that is the relevant version for the sparse graphs.
References:
![]() |
Copyright © 2009 Quentin F. Stout. |