Sign in adjoint derivative calculation #35

I've been following the paper "Differentiating Through a Cone Program" and the code side-by-side, and I'm having trouble figuring out if there is a sign error in the adjoint derivative code or if I've misunderstood something.

https://github.com/cvxgrp/diffcp/blob/83080bcd30775e2a48fbac33ca4165c474a7aa00/diffcp/cone_program.py#L341-L357

Compared to the paper, the code appears to solve `M.T @ r = dz` for `r`, whereas the paper solves `M.T @ g = -dz` for `g`, so `r = -g`. But the expressions the code then uses to compute `(dA, db, dc)` match those in the paper, when they should all differ by a sign.
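To make the argument concrete, here is a minimal numerical sketch. It assumes a random, well-conditioned stand-in for `M` and random stand-ins for `dz` and `pi_z` (none of these come from an actual cone program), and it reuses only the `db` and `dc` expressions from the linked code. Because those expressions are linear in the solved vector, feeding them `r` instead of `g` flips the sign of the result.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2                      # toy problem sizes (hypothetical)
N = n + m + 1

# Random stand-ins, NOT taken from a real cone program.
M = rng.standard_normal((N, N)) + N * np.eye(N)  # shifted to keep it well conditioned
dz = rng.standard_normal(N)
pi_z = rng.standard_normal(N)

r = np.linalg.solve(M.T, dz)     # what the code solves, as described above
g = np.linalg.solve(M.T, -dz)    # what the paper solves, as described above


def db_dc(v):
    """Apply the db/dc expressions from the linked code to a solved vector v."""
    db = pi_z[n:n + m] * v[-1] - pi_z[-1] * v[n:n + m]
    dc = pi_z[:n] * v[-1] - pi_z[-1] * v[:n]
    return db, dc


db_r, dc_r = db_dc(r)
db_g, dc_g = db_dc(g)

print(np.allclose(r, -g))        # the two solves differ only by sign
print(np.allclose(db_r, -db_g))  # and so do the resulting db ...
print(np.allclose(dc_r, -dc_g))  # ... and dc
```

The same linearity argument applies to the `values` used to build `dA`.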

Similarly, for the forward-mode derivative, you solve `M @ dz = dQ @ pi_z` for `dz` and use the same equations as in the paper despite the sign difference, but you multiply `(dx, dy, ds)` by `-1` before returning, so that case is fine.
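One way to settle this kind of question empirically is a dot-product test: if the forward map is `J` and the adjoint is supposed to be `J.T`, then `v @ J(u)` must equal `u @ J_T(v)` for any `u` and `v`. The sketch below uses toy stand-ins only; `forward`, `adjoint`, `adjoint_flipped`, `M`, and `B` are hypothetical names for illustration and are not diffcp functions. It shows that the test passes for a consistent pair and fails when the adjoint is missing a minus sign.

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 6, 4                       # toy sizes (hypothetical)

# Random stand-ins, NOT taken from a real cone program.
M = rng.standard_normal((N, N)) + N * np.eye(N)
B = rng.standard_normal((N, p))   # hypothetical linear map from a perturbation to the right-hand side


def forward(u):
    # Solve M @ dz = B @ u, then negate before returning.
    return -np.linalg.solve(M, B @ u)


def adjoint(v):
    # Exact transpose of `forward`: -B.T @ inv(M.T) @ v.
    return -B.T @ np.linalg.solve(M.T, v)


def adjoint_flipped(v):
    # Same transpose solve, but without the minus sign.
    return B.T @ np.linalg.solve(M.T, v)


u = rng.standard_normal(p)
v = rng.standard_normal(N)

lhs = v @ forward(u)
passes = np.isclose(lhs, u @ adjoint(v))
fails = np.isclose(lhs, u @ adjoint_flipped(v))

print(passes)  # consistent forward/adjoint pair
print(fails)   # sign-flipped adjoint does not match
```

Running the same comparison against the library's actual forward and adjoint derivatives on a small problem would show whether the two agree in sign.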

Is this a sign error in the adjoint derivative, or did I get something wrong?



	dw = -(x @ dx + y @ dy + s @ ds)
	dz = np.concatenate(
	    [dx, D_proj_dual_cone.rmatvec(dy + ds) - ds, np.array([dw])])

	if np.allclose(dz, 0):
	    r = np.zeros(dz.shape)
	elif mode == "dense":
	    r = _diffcp._solve_adjoint_derivative_dense(M, MT, dz)
	else:
	    r = _diffcp.lsqr(MT, dz).solution

	values = pi_z[cols] * r[rows + n] - pi_z[n + rows] * r[cols]
	dA = sparse.csc_matrix((values, (rows, cols)), shape=A.shape)
	db = pi_z[n:n + m] * r[-1] - pi_z[-1] * r[n:n + m]
	dc = pi_z[:n] * r[-1] - pi_z[-1] * r[:n]

	return dA, db, dc
