I've been following the paper "Differentiating Through a Cone Program" and the code side-by-side, and I'm having trouble figuring out if there is a sign error in the adjoint derivative code or if I've misunderstood something.

    dw = -(x @ dx + y @ dy + s @ ds)
    dz = np.concatenate(
        [dx, D_proj_dual_cone.rmatvec(dy + ds) - ds, np.array([dw])])

    if np.allclose(dz, 0):
        r = np.zeros(dz.shape)
    elif mode == "dense":
        r = _diffcp._solve_adjoint_derivative_dense(M, MT, dz)
    else:
        r = _diffcp.lsqr(MT, dz).solution

    values = pi_z[cols] * r[rows + n] - pi_z[n + rows] * r[cols]
    dA = sparse.csc_matrix((values, (rows, cols)), shape=A.shape)
    db = pi_z[n:n + m] * r[-1] - pi_z[-1] * r[n:n + m]
    dc = pi_z[:n] * r[-1] - pi_z[-1] * r[:n]

    return dA, db, dc

Compared to the paper, the code seems to solve `M.T @ r = dz` for `r`, whereas the paper solves `M.T @ g = -dz` for `g`, so `r = -g`. But the equations the code then uses to compute `(dA, db, dc)` appear to match those in the paper, when they should all differ by a negative sign.
Similarly, for the forward-mode derivative, you solve `M @ dz = dQ @ pi_z` for `dz` and use the same equations as in the paper despite the sign difference, but you multiply `(dx, dy, dz)` by `-1` before returning, so this is fine.
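To make the sign argument concrete, here is a minimal NumPy sketch. It does not call diffcp: `M`, `dz`, and `pi_z` are random stand-ins, the sizes `n` and `m` are arbitrary, and the `outputs` helper is hypothetical, copying only the `db` and `dc` formulas from the excerpt. Under those assumptions it illustrates that the two linear solves give `r = -g`, and that formulas linear in `r` flip sign when `r` is swapped for `g`.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 4                      # assumed toy sizes, not taken from diffcp
N = n + m + 1

# Random, well-conditioned stand-ins for M, dz and pi_z (illustration only).
M = rng.standard_normal((N, N)) + N * np.eye(N)
dz = rng.standard_normal(N)
pi_z = rng.standard_normal(N)

r = np.linalg.solve(M.T, dz)     # convention I read off the code
g = np.linalg.solve(M.T, -dz)    # convention I read off the paper


def outputs(v):
    """Hypothetical helper: the db and dc formulas from the excerpt."""
    db = pi_z[n:n + m] * v[-1] - pi_z[-1] * v[n:n + m]
    dc = pi_z[:n] * v[-1] - pi_z[-1] * v[:n]
    return db, dc


db_r, dc_r = outputs(r)
db_g, dc_g = outputs(g)

# Each check compares a quantity computed from r with minus the one from g.
print(np.allclose(r, -g), np.allclose(db_r, -db_g), np.allclose(dc_r, -dc_g))
```

The same linearity argument applies to the `values` used for `dA`, so if both readings of the sign conventions are right, all three outputs would differ from the paper by a factor of `-1`.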
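For contrast, a small sketch of why the forward-mode case works out. Again this is not diffcp code: `M` and `rhs` are random stand-ins (with `rhs` playing the role of `dQ @ pi_z`), and `L` is a hypothetical linear map standing in for whatever post-processing turns `dz` into the returned values, which I am assuming is linear in `dz`.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 6                                   # assumed toy size

M = rng.standard_normal((N, N)) + N * np.eye(N)  # well-conditioned stand-in
rhs = rng.standard_normal(N)                     # stand-in for dQ @ pi_z
L = rng.standard_normal((N, N))                  # hypothetical linear post-processing

dz_code = np.linalg.solve(M, rhs)       # solve with +rhs, as I read the code
dz_paper = np.linalg.solve(M, -rhs)     # solve with -rhs, as I read the paper

out_code = -(L @ dz_code)               # negate before returning
out_paper = L @ dz_paper                # no negation needed

print(np.allclose(out_code, out_paper))
```

The final multiplication by `-1` cancels the sign difference in the solve, which is the step I cannot find a counterpart for in the adjoint code.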
Is this a sign error in the adjoint derivative, or did I get something wrong?
The code excerpt above is from diffcp/diffcp/cone_program.py, lines 341 to 357 in 83080bc.