Enable multiband runs (single-band unchanged) #23
Jayashree-Behera wants to merge 13 commits into schlafly:master from
Conversation
Force-pushed from e9c901b to 701c4a0
schlafly left a comment
This looks great, thanks Shree.
Let's delete mosaic.py which will be broken. I think we will need some decam_proc.py updates but they should be modest.
I'll have some more comments tomorrow, sorry to be slow.
if weight is None:
    weight = np.ones_like(im)
wt_pad = np.pad(weight, [pad, pad], constant_values=0.)
wt_pad[wt_pad == 0] = 1e-20
You're duplicating my existing code, but I can't figure out for the life of me why I did this. Let's keep it for this commit, but can you file an issue to revisit removing it after we merge?
schlafly left a comment
This looks good. Several comments inline. One bigger question: when running on a multiband image, I think fit_im now returns a "multiband" output; i.e., both _joint quantities and the original quantities, now with _b0 appended. There's also one newer quantity, snr_joint.
I'm wondering whether in the single band case we should compress the output before returning, doing roughly:
cat = {k[:-3]: cat[k] for k in cat if k.endswith('_b0')}
Then I think e.g. DECam processing and single-band WISE processing would be less affected (the returned catalogs would be basically the same as before). You may have already addressed the single-band WISE case in wise_proc?
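A minimal sketch of the proposed compression, assuming `cat` is a plain dict whose per-band keys carry a `_b0` suffix and whose joint quantities carry a `_joint` suffix (the key names here are illustrative):

```python
import numpy as np

# Hypothetical multiband output from fit_im: per-band quantities get _b0,
# joint quantities get _joint (plus the new snr_joint).
cat = {
    "flux_b0": np.array([1.0, 2.0]),
    "dflux_b0": np.array([0.1, 0.2]),
    "snr_joint": np.array([10.0, 20.0]),
}

# Single-band case: strip the _b0 suffix and drop joint-only keys, so
# downstream consumers (e.g. DECam, single-band WISE) see the old key names.
cat_single = {k[:-3]: cat[k] for k in cat if k.endswith("_b0")}
print(sorted(cat_single))  # ['dflux', 'flux']
```

This keeps the multiband internals untouched while restoring the pre-PR catalog schema for single-band callers.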
# Sources in this subregion (shared positions across all bands)
mbda_src = in_bounds(xa, ya, [bdxaf-0.5, bdxal-0.5], [bdyaf-0.5, bdyal-0.5])
if not np.any(mbda_src): continue
mbd = in_bounds(xa, ya, [bdxf-0.5, bdxl-0.5], [bdyf-0.5, bdyl-0.5])
crowdsource/crowdsource_base.py (Outdated)
ind = np.flatnonzero(mbda_src)  # global indices into 0..N-1
ind2 = np.arange(len(ind))      # local indices 0..M-1
M = len(ind)                    # same as len(ind2)
I think we lost a bit of the logic here. It clearly doesn't make a big impact, but here's the intent:
1. we cut the image into subregions
2. each subregion has a 'primary' region; the union of the primary regions covers the image.
3. it also has a boundary region; we include the boundary regions in the fits so that stars on the edge of the primary region aren't cut off and can be fit simultaneously with their neighbors. The primary + boundary region gets the 'a' suffix ('all')
4. we do the fit on each subregion independently ('all')
5. we only fill out the fluxes of the stars that were in the 'primary' region
I think the new code does all of this right except for 5.
crowdsource/crowdsource_base.py (Outdated)
flux_snr = flux_snr + (flux_snr == 0)*1e-20
xcen /= flux_snr
ycen /= flux_snr
I think I personally would have done flux_avg = np.sum(flux * fluxunc**-2) / np.sum(fluxunc ** -2), and xcen /= flux_avg, ycen /= flux_avg. This is also fine though since it's the first iteration and we just need to do something sane.
Perhaps add some comments that this only applies to the first iteration since that's the iteration where we don't know the flux scaling.
What happens to the centroids for newly detected sources? It feels like they probably need similar handling?
The derivative columns in build_sparse_matrix are flux-scaled using guessflux from the previous iteration, so the LSQR dx/dy coefficients already behave like centroid shifts. The exception is the first iteration, which is why we apply the xcen /= flux_snr normalization there.
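For comparison, the inverse-variance-weighted alternative suggested above can be sketched as follows (toy values; `flux`, `fluxunc`, `xcen_sum`, `ycen_sum` are illustrative stand-ins for the accumulators in the code):

```python
import numpy as np

# Toy per-source fluxes and uncertainties (hypothetical values).
flux = np.array([10.0, 20.0, 5.0])
fluxunc = np.array([1.0, 2.0, 0.5])

# Inverse-variance weighted mean flux, as suggested in the review; this would
# only be used on the first iteration, where the flux scaling is unknown.
flux_avg = np.sum(flux * fluxunc**-2) / np.sum(fluxunc**-2)

# Centroid accumulators would then be normalized by flux_avg rather than by
# the per-source flux_snr, i.e. xcen_sum /= flux_avg; ycen_sum /= flux_avg.
xcen_sum = np.array([3.0, -1.0, 0.5])
xcen = xcen_sum / flux_avg
print(flux_avg)
```

Either normalization just needs to produce something sane for iteration one, since later iterations use the flux-scaled derivative columns directly.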
Co-authored-by: Eddie Schlafly <eschlafly@gmail.com>
if guess is not None:
    ind = np.flatnonzero(mbda_src)  # fitted sources (reuse same 'ind')
    guess_local_flux = guess[:B*N].reshape(B, N)[:, ind].ravel(order='C')
    guessmbda = np.concatenate([guess_local_flux, skypar.get((bdxf, bdyf), np.zeros(B*nskypar, 'f4'))]) if nskypar > 0 else guess_local_flux
Suggested change:
- guessmbda = np.concatenate([guess_local_flux, skypar.get((bdxf, bdyf), np.zeros(B*nskypar, 'f4'))]) if nskypar>0 else guess_local_flux
+ guessmbda = np.concatenate([guess_local_flux, skypar[(bdxf, bdyf)]])
I feel like on line 1164 you make sure that this exists, so you don't need to guard against it not existing here?
Right, no need for both lines to do this. I will correct that.
# print("fit_once:", time.time()-t_fo)

ind_all = np.flatnonzero(mbda_src)  # sources included in the fit (ALL region = PRIMARY + BOUNDARY)
ind_pri = np.flatnonzero(mbd)       # sources owned by this tile (PRIMARY region)

# Map primary sources into local ALL coordinates
ind_pri_loc = np.searchsorted(ind_all, ind_pri)

M_all = len(ind_all)
ind2 = np.arange(M_all)  # local indices for PSF stamp building (still ALL sources)

# Store new fluxes
for b in range(B):
    models[b][spri] = tmodels[b][sfit]
    mskys[b][spri] = tmskys[b][sfit]

    loc_start = b*M_all
    loc_stop = (b+1)*M_all
    glob_start = b*N

    # only commit PRIMARY sources
    flux[glob_start + ind_pri] = tflux[0][loc_start:loc_stop][ind_pri_loc]

# Store (dx, dy) (if present): global deriv block starts at B*N
if tpsfderiv and M_all > 0:
    deriv_off_glob = B*N
    deriv_off_loc = B*M_all

    dx_all = tflux[0][deriv_off_loc : deriv_off_loc + 2*M_all : 2]
    dy_all = tflux[0][deriv_off_loc+1 : deriv_off_loc + 2*M_all : 2]

    # only commit PRIMARY sources
    flux[deriv_off_glob + 2*ind_pri] = dx_all[ind_pri_loc]
    flux[deriv_off_glob + 2*ind_pri + 1] = dy_all[ind_pri_loc]

# Update per-tile sky
if nskypar > 0:
    skypar[(bdxf, bdyf)] = tflux[0][-B*nskypar:].copy()

# Build PSF stamps for this tile (optional but gives big speed-up in centroiding)
for b in range(B):
    if M_all > 0:
        for k in range(n_psfcomps):
            psf_block = np.stack(
                [psfmod.central_stamp(psfsbda[b][k][tind], minsz_b[b]) for tind in ind2],
                axis=0
            ).astype('f4')  # (M, S, S)
            # Insert into the global psf_stamps
            psf_stamps[b][k][mbda_src] = psf_block
This got a little more involved than I immediately thought necessary. Would you mind looking at the following and seeing what I'm missing?
if numpy.all(numpy.isnan(tmodel)):
    raise ValueError("Model is all NaNs")
ind_pri_in_full = numpy.flatnonzero(mbd)
ind_pri_in_all = numpy.flatnonzero(mbd[mbda])
Nall = np.sum(mbda)
# copy the centroid parameters for primary sources from the subregion fit
# into the full image parameter set
if tpsfderiv:
    for i in range(2):
        flux[N*B + ind_pri_in_full + i] = tflux[0][Nall*B + ind_pri_in_all + i]
skypar[(bdxf, bdyf)] = tflux[0][-B*nskypar:].copy()
# copy the flux parameters for primary sources from the subregion fit
# into the full image parameter set
for b in range(B):
    models[b][spri] = tmodel[b][sfit]
    msky[b][spri] = tmsky[b][sfit]
    flux[ind_pri_in_full*B + b] = tflux[0][ind_pri_in_all*B + b]
    for k in range(n_psfcomps):
        if Nall == 0:
            continue
        psf_stamps[b][k][ind_pri_in_full] = [
            psfmod.central_stamp(psfsbda[i][tind], minsz)
            for tind in ind_pri_in_all]
(roughly I'm trying to follow the old logic more closely, but I'm probably missing something you're doing there!)
Thanks for writing it out. Yes, I originally tried to follow the old logic more closely. The reason I introduced the explicit ind_all to ind_pri_loc mapping is that the subregion fit is indexed in the ALL-sources ordering (flatnonzero(mbda_src)), and I wanted to be explicit about mapping PRIMARY sources back into that local ordering.
If we are confident that mbd[mbda] preserves exactly the same ordering as flatnonzero(mbda_src), then your version should indeed work and would be cleaner.
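The equivalence in question is easy to verify on toy masks (here `mbda` is the boolean ALL mask and `mbd` the PRIMARY mask, with PRIMARY a subset of ALL; the array values are made up):

```python
import numpy as np

# Toy masks over N = 8 sources: mbda marks ALL (primary + boundary)
# sources in the tile, mbd marks PRIMARY sources (a subset of ALL).
mbda = np.array([0, 1, 1, 0, 1, 1, 0, 1], dtype=bool)
mbd  = np.array([0, 0, 1, 0, 1, 0, 0, 0], dtype=bool)

# Explicit mapping used in the PR: global index lists, then searchsorted.
ind_all = np.flatnonzero(mbda)                  # global indices of ALL sources
ind_pri = np.flatnonzero(mbd)                   # global indices of PRIMARY sources
ind_pri_loc = np.searchsorted(ind_all, ind_pri) # PRIMARY positions in ALL ordering

# Compact form from the review: subset the PRIMARY mask by the ALL mask.
# flatnonzero preserves ascending order, so both give the same positions.
ind_pri_in_all = np.flatnonzero(mbd[mbda])

print(ind_pri_loc, ind_pri_in_all)  # both [1 2]
```

Both work because `np.flatnonzero` always returns indices in ascending order, so the local ordering of `mbd[mbda]` matches the ordering of `flatnonzero(mbda)` by construction.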
That checks out, so I made that change.
Regarding other parts of the block, the logic is much clearer in your form. It would definitely be nicer if the parameter layout were source-major like that, since the indexing becomes much shorter and cleaner.
But in my implementation, the flux vector is band-major (b*N + i) where b = band and i = source, so I had to index it slightly differently to stay consistent with how the matrix is built. If it’s not a strong preference, I’d lean toward keeping the current layout for now to avoid changing any of my analysis notebooks immediately.
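The two layouts under discussion can be illustrated with a toy flux array (sizes and values are made up; only the index arithmetic matters):

```python
import numpy as np

B, N = 2, 3  # bands, sources (toy sizes)
flux2d = np.arange(B * N, dtype=float).reshape(B, N)  # flux2d[b, i]

# Band-major layout used in the PR: parameter index = b*N + i
# (all of band 0's fluxes, then all of band 1's, ...).
flux_band_major = flux2d.ravel(order='C')

# Source-major layout implied by the review sketch: parameter index = i*B + b
# (all bands for source 0, then all bands for source 1, ...).
flux_source_major = flux2d.T.ravel(order='C')

b, i = 1, 2
assert flux_band_major[b * N + i] == flux2d[b, i]
assert flux_source_major[i * B + b] == flux2d[b, i]
print("both layouts address the same element")
```

The choice is purely a bookkeeping convention, but it has to match how build_sparse_matrix lays out its columns, which is why the indexing in the two code versions differs.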
This PR adds multiband source fitting utilities to crowdsource_base.py and updates wise_proc.py to run in multiband mode. Multiband functions are named after their corresponding single-band functions with "_multiband" appended.
Single-band run behavior is unchanged.