D-ADMM: Distributed ADMM

D-ADMM (Distributed Alternating Direction Method of Multipliers) is a distributed optimization algorithm. It was first proposed in the paper

J. Mota, J. Xavier, P. Aguiar, and M. Püschel
Distributed Basis Pursuit
IEEE Transactions on Signal Processing, Vol. 60, No. 4, pp. 1942-1956, 2012

and explored in

J. Mota, J. Xavier, P. Aguiar, and M. Püschel
D-ADMM: A Communication-Efficient Distributed Algorithm For Separable Optimization
IEEE Transactions on Signal Processing, Vol. 61, No. 10, pp. 2718-2723, 2013

to solve other signal processing problems. This page gives an overview of D-ADMM and provides Matlab code for it.

Overview

The setting is a connected network with \(P\) nodes, as shown in the figure below for \(P=10\) nodes.

_images/single1.png

Each node in the network has a private function \(f_p:\mathbb{R}^n \rightarrow \mathbb{R}\) and a private set \(X_p \subseteq \mathbb{R}^n\). The goal is to minimize the sum of all functions, while constraining the solution to be in the intersection of all the sets. Mathematically, we want to

(1)\[\begin{split}\begin{array}{ll} \underset{x \in \mathbb{R}^n}{\text{minimize}} & f_1(x) + f_2(x) + \cdots + f_P(x) \\ \text{subject to} & x \in X_1 \cap X_2 \cap \ldots \cap X_P\,, \end{array}\end{split}\]

Each function \(f_p\) is assumed real-valued and convex, and each set \(X_p\) is assumed closed and convex. D-ADMM solves (1) in a distributed way, in the following sense:

Definition: Distributed Algorithm

  • Both \(f_p\) and \(X_p\) are private to node \(p\), i.e., no other node has access to \(f_p\) or \(X_p\)
  • There is no central or aggregation node
  • Each node communicates only with its neighbors (all-to-all communications are not allowed)

D-ADMM solves (1) by reformulating it into another problem and then applying an extended version of ADMM. One of the key concepts used in our reformulation is network coloring: an assignment of colors (in practice, numbers) to the nodes of the network such that any two neighboring nodes always have different colors. The network shown above can be colored as follows:

_images/single2.png
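For small networks, a proper coloring can be found with a simple greedy heuristic. The sketch below is in Python rather than Matlab, uses hypothetical 0-based node indices, and is only meant to illustrate the idea:

```python
# Greedy graph coloring: give each node the smallest color not already
# used by one of its colored neighbors. `neighbors` is an adjacency
# list (0-indexed here; the Matlab package numbers nodes from 1).

def greedy_coloring(neighbors):
    colors = {}
    for p in range(len(neighbors)):
        used = {colors[j] for j in neighbors[p] if j in colors}
        c = 0
        while c in used:
            c += 1
        colors[p] = c
    return colors

# Example: a 4-node ring 0-1-2-3-0 needs only 2 colors.
neighbors = [[1, 3], [0, 2], [1, 3], [0, 2]]
print(greedy_coloring(neighbors))  # {0: 0, 1: 1, 2: 0, 3: 1}
```

Greedy coloring does not always use the minimum number of colors, but any proper coloring works for the algorithm; fewer colors simply means more nodes working in parallel.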

The coloring scheme defines groups of nodes that work in parallel: for example, all red nodes work at the same time, then all green nodes, and so on. At each iteration, each node performs the following operations:

Operations performed by node \(\small \boldsymbol{p}\) at each iteration

  1. Solve the optimization problem

    (2)\[\begin{split}x_p^{k+1} = \begin{array}[t]{cl} \underset{x_p}{\arg\min} & f_p(x_p) + \Bigl(\gamma_p^k - \rho \sum_{j \in \mathcal{N}_p} x_j^k\Bigr)^\top x_p + \frac{D_p\rho}{2}\|x_p\|^2 \\ \text{s.t.} & x_p \in X_p \end{array}\end{split}\]
  2. Send \(x_p^{k+1}\) to its neighbors

  3. Update \(\gamma_p^{k+1} = \gamma_p^k + \rho \sum_{j \in \mathcal{N}_p}(x_p^{k+1} - x_j^{k+1})\)
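The three steps above can be sketched as follows. This is a Python sketch rather than the package's Matlab code; `solve_subproblem` stands in for the user-supplied solver of (2), and nodes are 0-indexed here:

```python
def dadmm_iteration(x, gamma, neighbors, colors, rho, solve_subproblem):
    """One full D-ADMM iteration over all nodes (0-indexed sketch).

    x, gamma : lists with each node's current primal/dual value
    colors   : color classes; nodes in the same class share no edge,
               so their subproblems could run in parallel
    solve_subproblem(p, v, Dp) : solver for problem (2) at node p,
               given the linear coefficient v and the degree Dp
    """
    # Step 1: primal updates, one color class at a time. Updating x in
    # place means later classes use the newest neighbor estimates.
    for color_class in colors:
        for p in color_class:
            v = gamma[p] - rho * sum(x[j] for j in neighbors[p])
            x[p] = solve_subproblem(p, v, len(neighbors[p]))
    # Step 2, sending x_p to the neighbors, is implicit here because
    # the sketch keeps all estimates in shared lists.
    # Step 3: dual updates with the new estimates.
    for p in range(len(x)):
        gamma[p] += rho * sum(x[p] - x[j] for j in neighbors[p])
    return x, gamma
```

For scalar average consensus, \(f_p(x)=\frac{1}{2}(x-\theta_p)^2\), `solve_subproblem(p, v, Dp)` would simply return `(theta[p] - v) / (1 + Dp * rho)`.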

In these updates, \(\mathcal{N}_p\) denotes the set of neighbors of node \(p\), \(D_p\) its degree (number of neighbors), and \(\rho > 0\) is the augmented Lagrangian parameter. The following video illustrates how the algorithm works.

Matlab Code

The Matlab code for D-ADMM is available for download.

This code is free software, licensed under the GNU General Public License. The package provides examples and the data needed to generate the figures in the D-ADMM paper. Running all the examples requires the spgl1 solver and the Matlab Optimization Toolbox. Beyond that, no special installation is needed: just change to the relevant directory and execute the code, as explained below for some of the examples. Note, however, that to apply this code to your own problem, you have to provide your own solver for (2).

The syntax of D-ADMM is:

>> [X, vars_prob, varargout] = DADMM(n, vars_prob, vars_network, varargin)

where

n:

size of the variable \(x\)

vars_prob:

Matlab struct containing the field handler, a function handle for a solver of (2)

vars_network:

Matlab struct with network information. It should contain:

P: number of network nodes
neighbors: cell array of size P, where the pth entry contains a vector with the indices of the neighbors of node p; the nodes are numbered from 1 to P
partition_colors: cell array of size C, where C is the number of colors in the coloring scheme; each entry contains a vector with the nodes that have that color. For example, in a network with 6 nodes, {[1,2], [3,4,5], 6} would mean there are 3 colors: nodes 1 and 2 have color 1, nodes 3, 4, and 5 have color 2, and node 6 has color 3.
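As a quick sanity check on a hand-built network description, the consistency between neighbors and partition_colors can be verified programmatically. Below is a Python sketch (the package itself uses Matlab structs); the 6-node adjacency shown is hypothetical, chosen so that the coloring from the example above is valid:

```python
def check_network(P, neighbors, partition_colors):
    """Validate the network description: every node appears in exactly
    one color class, and no edge connects two nodes of the same class.
    Nodes are numbered 1..P, as in the Matlab package."""
    seen = sorted(p for cls in partition_colors for p in cls)
    assert seen == list(range(1, P + 1)), "each node needs exactly one color"
    for cls in partition_colors:
        cls = set(cls)
        for p in cls:
            assert cls.isdisjoint(neighbors[p - 1]), \
                f"node {p} shares a color with a neighbor"

# The 6-node coloring example from the text, paired with a hypothetical
# ring-like neighbor structure under which it is a proper coloring:
neighbors = [[3, 4], [4, 5], [1, 6], [1, 2], [2, 6], [3, 5]]
check_network(6, neighbors, [[1, 2], [3, 4, 5], [6]])
```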

See the documentation in the code for optional arguments. The algorithm outputs

X:

cell array of size P, where the pth entry contains the solution found by node p

vars_prob:

struct that the user provided as input; it can be used to store information during the algorithm

Again, see the documentation for the optional output arguments.

For examples of how to use the code, see the four folders inside the directory DADMM/. There you can also find solvers for the corresponding instances of (2). The simplest example is average consensus, where \(f_p(x) = \frac{1}{2}(x - \theta_p)^2\) and the variable \(x\) is scalar. Problem (2) in this case becomes

\[x_p^{k+1} = \underset{x}{\text{argmin}} \,\,\, \frac{1}{2} (x - \theta_p)^2 + v_p x + \frac{D_p \rho}{2} x^2\,,\]

where \(v_p = \gamma_p^k - \rho \sum_{j \in \mathcal{N}_p}x_j^k\); it has the closed-form solution \(x_p^{k+1}= (\theta_p - v_p)/(1 + D_p \rho)\). The directory DADMM/AverageConsensus/ contains a function AverageConsensusSubProb.m implementing this solution. After changing Matlab's current directory to that folder and running the script RunConsensus.m, you should obtain a plot and the following output:

>> RunConsensus
||x_estimate - solution||/||solution|| = 0.000009
Number of iterations = 53
stop_crit = EPS REACHED
iter_for_errors =
1.000000E-01    17
1.000000E-02    26
1.000000E-03    35
1.000000E-04    44
1.000000E-05    53
1.000000E-06    Inf
1.000000E-07    Inf
1.000000E-08    Inf
1.000000E-09    Inf
1.000000E-10    Inf

The first line shows the relative error achieved by the algorithm, the second the number of iterations (communication steps), and the third whether the algorithm stopped because it reached the prescribed error (EPS REACHED) or the maximum number of iterations (MAX ITERATIONS REACHED). The remaining lines show the number of iterations needed to reach each of the canonical errors \(10^{-1}, 10^{-2}, \ldots\) (Inf means that error was never reached).
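The whole average-consensus example can also be reproduced in a few lines. Below is a Python sketch (the package itself is Matlab) on a hypothetical 4-node ring, using the closed-form solution of (2) for this problem:

```python
import numpy as np

# D-ADMM for scalar average consensus: f_p(x) = 0.5*(x - theta_p)^2,
# so subproblem (2) has the closed form x_p = (theta_p - v_p)/(1 + D_p*rho).
neighbors = [[1, 3], [0, 2], [1, 3], [0, 2]]   # 4-node ring, 0-indexed
colors = [[0, 2], [1, 3]]                      # a proper 2-coloring
theta = np.array([1.0, 2.0, 3.0, 4.0])
rho = 1.0                                      # augmented Lagrangian parameter

x = np.zeros(4)
gamma = np.zeros(4)
for k in range(200):
    for color_class in colors:                 # primal updates, color by color
        for p in color_class:
            v = gamma[p] - rho * sum(x[j] for j in neighbors[p])
            x[p] = (theta[p] - v) / (1 + len(neighbors[p]) * rho)
    for p in range(4):                         # dual updates with new estimates
        gamma[p] += rho * sum(x[p] - x[j] for j in neighbors[p])

print(x)   # each node's estimate approaches the network-wide average of theta
```

Since all dual variables start at zero and their sum is conserved by the updates, the common fixed point of the iteration is exactly the average \(\frac{1}{P}\sum_p \theta_p\), here 2.5.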

_images/ExperimentalResults.png

To generate each plot in Figure 3 of the paper (replicated above), go to the respective folder in CompareAlgs/ and run

>> RunExperimentsNodes(50)

Beware that these experiments take a long time to run, because each ADMM-based algorithm is executed several times for each network, each time with a different augmented Lagrangian parameter \(\rho\), taken from the set \(\{10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}, 1, 10, 10^2\}\). You can also access the results in the Results/ directories. More networks, including networks with 2000 nodes, can be found in the directory GenerateData/Networks/.

Generalization of D-ADMM

For a generalization of D-ADMM to the case where the function at each node depends only on a subset of the components of the variable, rather than on all of them, see:

J. Mota, J. Xavier, P. Aguiar, and M. Püschel
Distributed Optimization With Local Domains: Applications in MPC and Network Flows
to appear in IEEE Transactions on Automatic Control

The code for this generalization is also available.

References

D-ADMM is described and analyzed in

  • J. Mota, J. Xavier, P. Aguiar, and M. Püschel
    D-ADMM: A Communication-Efficient Distributed Algorithm For Separable Optimization
    IEEE Transactions on Signal Processing, Vol. 61, No. 10, 2013
  • J. Mota, J. Xavier, P. Aguiar, and M. Püschel
    Distributed Basis Pursuit
    IEEE Transactions on Signal Processing, Vol. 60, No. 4, 2012

The package provided above also implements the algorithms in the following papers:

  • I. Schizas, A. Ribeiro, and G. Giannakis
    Consensus in Ad Hoc WSNs With Noisy Links - Part I: Distributed Estimation of Deterministic Signals
    IEEE Trans. Sig. Proc., Vol. 56, No. 1, 2008
  • H. Zhu, G. Giannakis, and A. Cano
    Distributed In-Network Channel Decoding
    IEEE Trans. Sig. Proc., Vol. 57, No. 10, 2009
  • B. Oreshkin, M. Coates, and M. Rabbat
    Optimization and Analysis of Distributed Averaging With Short Node Memory
    IEEE Trans. Sig. Proc., Vol. 58, No. 5, 2010
  • G. Mateos, J. Bazerque, and G. Giannakis
    Distributed Sparse Linear Regression
    IEEE Trans. Sig. Proc., Vol. 58, No. 10, 2010

Contact

Please feel free to email me with questions, suggestions, and bug reports.

Email: j.mota (at) ucl.ac.uk

João Mota
Department of Electronic and Electrical Engineering
Roberts Building
University College London
Torrington Place
London
WC1E 7JE

This work was done by João Mota, João Xavier, Pedro Aguiar, and Markus Püschel at Carnegie Mellon University, Instituto Superior Técnico, the Institute for Systems and Robotics, and ETH Zurich, under the Carnegie Mellon|Portugal program. It was partially funded by grants from Fundação para a Ciência e Tecnologia (FCT): CMU-PT/SIA/0026/2009, PEst-OE/EEI/LA0009/2011, and SFRH/BD/33520/2008 (through the Carnegie Mellon/Portugal Program managed by ICTI).