Demosaic fixes/regressions #20428
jenshannoschwalm wants to merge 10 commits into darktable-org:master
|
When doing the diff-dump with complete the first image showing the diff is EDIT: just checked,
Before going into this i will check all the other demosaicers ... |
44761f1 to
a4d248a
|
@TurboGit a lot of tedious work, i think i have now fixed a large bunch of demosaicing related issues, many/most have been there for quite long. Hopefully a lot of the border issues are gone now. I would give these fixes a go for now. |
|
@jenshannoschwalm : Thanks, I have to analyze the following tests, it will take some time and will report back if I find something suspicious: |
|
I first check the simple rcd demosaicer, that already shows new artefacts due to clip&zoom happening here in demosaicer. |
|
EDIT: converted again to draft, there are still border issues at very strong black/white transitions for PPG and RCD (they share that part), so you have to wait a bit longer :-) |
|
Ok, anyway here is the log of the integration tests run: |
516178d to
3838996
|
@TurboGit finally this all is ok to me. Heavy stuff for reviewing i guess :-) A lot of fixes done, notably for all images doing VNG, PPG, LMMSE, RCD there will be differences also for CPU code, as the border calculations have all been fixed so that a) differences between CPU and GPU are marginal and b) we choose the better algo (only at borders). All xtrans CPU code for example got the better algo also for markje, that was simply missing since 2016 i think. Differences between CPU & OpenCL have been largely reduced for green equalizing and color smoothing. I tested all this with pipe differences in demosaic with hq processing on, to see what happens in demosaic and what happens while crop&scale in demosaic. There are differences which i couldn't track down yet. BTW a) for all interpolators and b) also after ensuring demosaic output stays above zero. How to proceed? The crop&scale is the obvious next important topic to check, i don't know if the OpenCL code is bad or CPU or even both. Do you have any idea about this? For sure we can't update the integration expected images now as we have to understand the issue before doing so. At least discrepancies while scaling seem to be pretty heavy and are the cause of most differences. Tested on AMD RustiCL and ROCm btw. |
3838996 to
9ba5df1
|
I think i got the interpolator bugs :-) |
9ba5df1 to
166e8e9
|
Before you go into review&testing i'll do a cleanup, hold on |
|
Understood, thanks. |
166e8e9 to
089aefc
|
@TurboGit @kofa73 i checked this back to 69077b9 (some earlier commits i couldn't compile any more) and all versions showed the large discrepancies ... i tested and logged as included. You will find the bash script i used to test the relevant sets, all logs are named 2056526. Can you confirm this finding (the bug being that old)? |
|
I may only have time during the weekend, but sure, I'll try to help if I can by checking. |
|
Just confirming that we have the regression since so long would be good. |
|
@jenshannoschwalm : Yes, I'm pretty sure the diff between CPU & GPU is there since a long time. |
|
For CE there is a lot of matrix math happening, i'll check again. Also i spotted some trivial box filter. Lot of pending work ... |
|
I'm fighting with python, somehow Ubuntu's colour-science package takes precedence even after I activate a venv. If those are the expected ones, I'll spend more time tomorrow with the commits @jenshannoschwalm mentioned. Please confirm. If the results are not expected, I'll troubleshoot the test env. |
|
I was looking for cpu vs opencl differences, checking if those are due to recent regressions. I now know they are in fact "quite old". Pascal provided some integration patches, it seems your system also fails to check. There is no need for further work on your side :-) |
1. The internal tiling for RCD CPU code requires a border of 10 to avoid subtle instabilities at the overlapping parts. Fixed comments accordingly.
2. The PPG demosaicer can handle negative input for best results, thus quality has been improved both for CPU and OpenCL code, especially in dark areas.
3. As RCD used a slightly modified variant of PPG (we don't need to calculate data outside the border), code could be refactored for CPU and OpenCL.
4. With current pipe code we have demosaicers returning non-negative output. This might not be necessary/desired after further work on the pipeline/interpolators. All demosaicers ensure data to be at least DEMOSAIC_OUTMIN, which currently is 0.0f, to support easy change later.
5. Introduce and first use of read_imagef helper inline functions.
The algorithm does not require it and we have subtle quality improvements in dark areas due to less influence of eps.
1. Fix OpenCL VNG full interpolation: we must not touch the outermost 2 pixels, they have been interpolated already in the linear phase.
2. Ensure identical results for VNG initial borders. Checked results for two possible algorithms, average vs ppg-like original. There are subtle differences, but overall the ppg-like seems to be better and we now use it for both code paths.
3. Fix VNG final green mixing for bayer4 sensors. Do this at the end of processing for both CPU and GPU code.
4. We no longer avoid negative input, for slightly better results, but still limit the output using DEMOSAIC_OUTMIN.
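The "don't touch the outermost pixels" rule from item 1 can be pictured as a loop that only visits the interior of the image. This is a minimal sketch, not the actual VNG code: `process_interior` and its parameters are hypothetical, and the per-pixel work is reduced to writing a constant.

```c
#include <stddef.h>

/* Hypothetical sketch: update only the interior of a single-channel image,
   leaving `border` pixels on every side exactly as an earlier pass
   (e.g. a linear border interpolation) wrote them. */
static void process_interior(float *img,
                             const int width,
                             const int height,
                             const int border,
                             const float value)
{
  for(int row = border; row < height - border; row++)
    for(int col = border; col < width - border; col++)
      img[(size_t)row * width + col] = value;
}
```

With `border = 2` this matches the two-pixel frame mentioned for the full VNG interpolation.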
Don't touch outermost one-pixels, they are used for next/previous row/column, smoothing over-the-edge leads to clear artifacts as data are mirrored from the opposite border.
1. Use doubles to accumulate for full image green averaging in OpenCL sampling; the second reduce kernel uses a Kahan sum for increased precision and less CPU vs OpenCL difference. Neither fully fixes the differences, but they are reduced.
2. In OpenCL local greens averaging we correct the same lines as we do in CPU code.
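For readers unfamiliar with the Kahan sum mentioned in item 1, here is a plain C sketch of compensated summation next to a naive float sum. It only illustrates the technique; the function names are made up for this example and the real reduce kernel is OpenCL code that is not shown here. Note that compensated summation relies on strict floating-point evaluation, so it should not be compiled with options like `-ffast-math`.

```c
#include <stddef.h>

/* Naive single-precision accumulation, for comparison. */
static float naive_sum(const float *data, const size_t n)
{
  float sum = 0.0f;
  for(size_t i = 0; i < n; i++)
    sum += data[i];
  return sum;
}

/* Kahan (compensated) summation: `comp` carries the rounding error
   lost in each addition and feeds it back into the next one. */
static float kahan_sum(const float *data, const size_t n)
{
  float sum = 0.0f;
  float comp = 0.0f;
  for(size_t i = 0; i < n; i++)
  {
    const float y = data[i] - comp; /* corrected next term */
    const float t = sum + y;        /* low-order bits of y may be lost here */
    comp = (t - sum) - y;           /* recover what was lost */
    sum = t;
  }
  return sum;
}
```

Summing one large value followed by many tiny ones is the typical case where the naive sum drops the small terms while the compensated sum keeps them.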
Allow negative input for this preliminary demosaicing code, the demosaicers will have to handle this.
1. Pass icoeffs as a float4 instead of cl_mem, use inlines for reading.
2. By default use the quality maths for equality between CPU & OpenCL.
1. The CPU code deserves the better border calculation that we have used for OpenCL for a very long time. There is a small performance penalty but with clearly better results. It was likely just overlooked for many years.
2. Some minor fixes for initializing data, for far fewer differences between CPU and OpenCL.
3. The VNG linear interpolation code needed some modifications to handle borders.
1. Calculation of the Y0 mask is done minimally differently, leading to slightly better stability in regions with any channel below zero.
2. The CPU code now works exactly as OpenCL, with subtle perf gains as we multiply instead of divide, depending on compiler.
1. OpenCL clip mode and falsecolor got a simpler kernel calling interface.
2. As we have used pow(x, 3) for opposed ref calculation for some time now and no further refinement has been possible, let's use fcube and cbrt instead of the costly pow(), also gaining higher precision.
68fc6a7 to
95676f3
|
@TurboGit i fear this is a big round for you :-) So we have
Please note that overall visible differences are negligible (or desired) but the integration algo reports all border differences. I did a huge amount of testing with diff-dumping the demosaicer and the amount of CPU/GPU differences has improved a lot. Also i did a lot of testing about currently reported CPU vs OpenCL differences
In short, if i disable aggressive optimizing and disable use of native functions in color maths these differences are far smaller than we have now. I will rework the OpenCL preferences interface adding a "performance mode" resulting in faster but less precise compiled code (as we have it now) but with a default to leave color maths at precision mode. So it's up to the user ... |
|
I think this bunch is finally ready to go now, i ran the complete testsuite and couldn't spot any unexpected results. |
Meaning that we won't detect regressions until then. If possible I'd prefer having this merged together, but I don't know when this will happen. This is your work and I don't want to put pressure on anyone. |
|
Ok, i'll continue this work here, from time to time i'll prepare safe&tested PRs (as #20573) with stuff that won't affect current maths / code. BTW could you make the test system run pocl OpenCL on CPU yet or is there anything missing/not working? |
The only missing thing is me understanding what should be done. Do I have only to install pocl on the machine or is there something else? |
|
I think that should do it... |
|
Closing as being handled in various PR's for easier testing. |

Fix RCD
overlapping parts.
We had these issues shown in diff-dumps

Fix VNG
@TurboGit these fix the most annoying problems with CPU vs GPU diffs.
There is also a PPG issue i am still investigating, that will follow in another PR.
We will need integration updates as for almost all xtrans, vng and rcd related tests there will be desired differences :-)