diff --git a/.gitignore b/.gitignore
index 3b26373..4bc3623 100644
--- a/.gitignore
+++ b/.gitignore
@@ -26,459 +26,17 @@ code/__pycache__/files.cpython-39.pyc
 code/__pycache__/files_rc.cpython-38.pyc
 code/__pycache__/files_rc.cpython-39.pyc
 code/__pycache__/main.cpython-38.pyc
-code/__pycache__/main.cpython-39.pyc
 code/__pycache__/plotting.cpython-38.pyc
-code/__pycache__/plotting.cpython-39.pyc
-code/__pycache__/stats.cpython-39.pyc
 code/__pycache__/ui_featureinfo.cpython-38.pyc
 code/__pycache__/ui_featureinfo.cpython-39.pyc
 code/__pycache__/ui_functions.cpython-38.pyc
 code/__pycache__/ui_main.cpython-38.pyc
 code/__pycache__/ui_main.cpython-39.pyc
 code/__pycache__/ui_plotparam.cpython-39.pyc
-code/compoundimages/(+)-5(6),13-halimadiene-15-ol.png
-code/compoundimages/(+)-caryolan-1-ol.png
-code/compoundimages/(+)-discoipyrrole A.png
-code/compoundimages/(-)-7-Geranylindolactam V.png
-code/compoundimages/(-)-Neoverrucosan-5beta-ol.png
-code/compoundimages/(-)-Verrucosan-2beta-ol.png
-code/compoundimages/(1S,3aR)-jadomycin V.png
-code/compoundimages/(2E,4E)-7-methylocta-2,4-dienoic acid amide.png
-code/compoundimages/(2S,3S)-3-Hydroxy-1,4-diphenylbutan-2-yl-acetate.png
-code/compoundimages/(5R) 5-hydroxy-3-\[\[2-(4-hydroxyphenyl)ethyl\]amino\]-5-vinyl-2-cyclopenten-1-one.png
-code/compoundimages/(5S,S)-5-methyl-3-(2-methylbutyl)furan-2(5H)-one.png
-code/compoundimages/(5S,S)-5-methyl-3-(3-methylpentyl)furan-2(5H)-one.png
-code/compoundimages/(E)-12-methyltridec-3-enenitrile.png
-code/compoundimages/(E)-ethyl 8-oxooctadec-9-enoate.png
-code/compoundimages/(E)-tetradec-3-enenitrile.png
-code/compoundimages/(S)-N-tetradecanoyl-HSL.png
-code/compoundimages/(Z)-12-methyltridec-3-enenitrile.png
-code/compoundimages/(Z)-15-Methylhexadec-10-en-2-one.png
-code/compoundimages/(Z)-octadec-11-enenitrile.png
-code/compoundimages/(Z)-tetradec-3-enenitrile.png
-code/compoundimages/(Z)-tetradec-7-enenitrile.png
-code/compoundimages/-L-glutamyl-L-leucine.png
-code/compoundimages/1 '- (beta- Glucopyranosyloxy) di- O-demethylspirilloxan thin.png
-code/compoundimages/1'-beta-glucopyranosyl-3,4,3',4'-tetradehydro-1', 2'-dihydro-beta,psi-caroten-2-one.png
-code/compoundimages/1,4-dihydroxy-2,5-dimethoxy-9,10-anthraquinone.png
-code/compoundimages/1-hydroxymethylindole-3-carboxylic acid.png
-code/compoundimages/1-methyl-pseudouridine.png
-code/compoundimages/1-methylthio-2,3-di-O-(3',7',11',15'-tetramethylhexadecyl)glycerol (diphytanylglyceryl methylthioether).png
-code/compoundimages/10,15-dihydroxyamorph-4-en-3-one.png
-code/compoundimages/12-Deoxy-deoxysaxitoxin.png
-code/compoundimages/12-hydroxy-13-butoxyethoxyfumitremorgin B.png
-code/compoundimages/16-methyloxazolomycin.png
-code/compoundimages/17-Methylenespiramycin.png
-code/compoundimages/17-O-ethylnotoamide M.png
-code/compoundimages/18-methyltacrolimus.png
-code/compoundimages/2-(2-carboxyethyl)-8-hydroxyquinazolin-4(3H)-one.png
-code/compoundimages/2-(furan-2-yl)-6-(2S,3S,4-trihydroxybutyl)pyrazine.png
-code/compoundimages/2-(Heptadecyl)-3,6-dihydroxy-1,4-benzoquinone.png
-code/compoundimages/2-amino-4-methoxy-5-cyanopyrrolo\[2,3-d\]pyrimidine.png
-code/compoundimages/2-amino-6-hydroxyphenoxazin-3-one.png
-code/compoundimages/2-Demethylmonensin B.png
-code/compoundimages/2-ethyl-7-hydroxy-6,7-dihydro-5H-indolizin-3-one.png
-code/compoundimages/23-(6-methyl)heptanoic acid demalonylazalomycin F3a ester.png
-code/compoundimages/3-hydroxydehydrodaidzein.png
-code/compoundimages/3-Isobutylpropanamide-2-cyclopenten-1-one.png
-code/compoundimages/3-N-formyl- holyrine A.png
-code/compoundimages/4,5-dihydroxy-7-methylphthalide.png
-code/compoundimages/4-\[2-O-9Z-hexadecenoyl--glucopyranosyl\]-4,4-diapolycopene-4,4-dioic acid.png
-code/compoundimages/4-desmethylepothilone D.png
-code/compoundimages/4-methoxy-3H-isobenzofuran-1-one.png
-code/compoundimages/4-quinolinecarboxylic acid.png
-code/compoundimages/41-Demethylhomooligomycin B.png
-code/compoundimages/5'-deoxyguanosine.png
-code/compoundimages/5,18-dedihydroxycyclooctatin.png
-code/compoundimages/5,7,3',4'-Tetrahydroxy-8-methylisoflavon.png
-code/compoundimages/5-Hydroxy-3-(1-hydroxy-2-methylbutyl)-4-methyl-2(5H)-furanone.png
-code/compoundimages/5-hydroxydeoxyvasicinone.png
-code/compoundimages/6-acetylphenazine-1-carboxylic acid.png
-code/compoundimages/6-deoxyerythronolide B.png
-code/compoundimages/6-Hydroxysordarin.png
-code/compoundimages/7-Hydroxy-8,16-dimethyl-9-octadecenoic acid.png
-code/compoundimages/7-Tetradecenoic acid.png
-code/compoundimages/8,9-dihydrolactimidomycin.png
-code/compoundimages/8-desmethoxy-isomigrastatin.png
-code/compoundimages/8-hydroxy-8,9-dihydrolactidomycin.png
-code/compoundimages/\[D-Asp, Dhb^7\]microcystin-LR.png
-code/compoundimages/\[D-Asp3,Ser7\]MC-LR.png
-code/compoundimages/Abenquine B2.png
-code/compoundimages/Abenquine C.png
-code/compoundimages/Abyssomicin C.png
-code/compoundimages/Abyssomicin P.png
-code/compoundimages/Acidiphilamide C.png
-code/compoundimages/Actiketal.png
-code/compoundimages/Actinoallolide D.png
-code/compoundimages/Actinoramide E.png
-code/compoundimages/AHB-6-Methylneamine.png
-code/compoundimages/AI-77-F.png
-code/compoundimages/Albatrelin F.png
-code/compoundimages/Albogrisin B.png
-code/compoundimages/Albucyclone A.png
-code/compoundimages/Albumycin.png
-code/compoundimages/Aldgamycin E.png
-code/compoundimages/Aldgamycin I.png
-code/compoundimages/Aldgamycin K.png
-code/compoundimages/Alokicenone C.png
-code/compoundimages/AM-2604 A.png
-code/compoundimages/Amphibactin T.png
-code/compoundimages/Amycolatopsin C.png
-code/compoundimages/Ananstrep C.png
-code/compoundimages/AnhydroSEK4b.png
-code/compoundimages/Antarlide F.png
-code/compoundimages/Antascomicin B.png
-code/compoundimages/Antascomicin E.png
-code/compoundimages/Antillatoxin.png
-code/compoundimages/Aquayamycin.png
-code/compoundimages/Argimicin B.png
-code/compoundimages/Arthripenoid B.png
-code/compoundimages/Ashimide B.png
-code/compoundimages/Aspergilone A.png
-code/compoundimages/Asterobactin B.png
-code/compoundimages/Azicemicin A.png
-code/compoundimages/Bacillamidin G.png
-code/compoundimages/Bacilysocin.png
-code/compoundimages/Bafilomycin C2.png
-code/compoundimages/Bafilomycin G.png
-code/compoundimages/Balgacyclamide C.png
-code/compoundimages/Bananamide 2.png
-code/compoundimages/Banegasine.png
-code/compoundimages/Bartoloside H.png
-code/compoundimages/Bartoloside I.png
-code/compoundimages/BE-10988.png
-code/compoundimages/BE-14106.png
-code/compoundimages/BE-32030A.png
-code/compoundimages/Benzastatin A.png
-code/compoundimages/Biphenomycin C.png
-code/compoundimages/Biseokeaniamide A.png
-code/compoundimages/Blastmycetin C.png
-code/compoundimages/Brasilibactin A.png
-code/compoundimages/Brasiliquinone C.png
-code/compoundimages/Brintonamide A.png
-code/compoundimages/Brintonamide B.png
-code/compoundimages/Caerulomycin G.png
-code/compoundimages/Caldorin.png
-code/compoundimages/Carboxymycobactin-7.png
-code/compoundimages/Cepafungin I.png
-code/compoundimages/Cephamycin C.png
-code/compoundimages/Chaiyaphumine D.png
-code/compoundimages/Chejuenolide A.png
-code/compoundimages/Chlorotonil B.png
-code/compoundimages/Chrondamide 12D.png
-code/compoundimages/Circumdatin D.png
-code/compoundimages/Cis-7-tetradecenoyl-D-asparagine.png
-code/compoundimages/Citreamicin alpha.png
-code/compoundimages/Citreo-g-pyrone.png
-code/compoundimages/Clavirolide A.png
-code/compoundimages/Coibacin C.png
-code/compoundimages/Columbamide B.png
-code/compoundimages/Concanamycin B.png
-code/compoundimages/Conglobatin.png
-code/compoundimages/Coronafacoyl-L-isoleucine.png
-code/compoundimages/Coronatine.png
-code/compoundimages/Cosmomycin A.png
-code/compoundimages/Crocagin A.png
-code/compoundimages/Cryptophycin-16.png
-code/compoundimages/Cryptophycin-38.png
-code/compoundimages/Cyanopeptolin 920.png
-code/compoundimages/Cyclo(D)-Pro-(D)-Leu.png
-code/compoundimages/Cyclo-(L-Ala-L-Tyr).png
-code/compoundimages/Cyclodysidin D.png
-code/compoundimages/Cylindrocyclophane C2.png
-code/compoundimages/Daryamide E.png
-code/compoundimages/Defumarylhygrolidin.png
-code/compoundimages/Dehydro tilivalline.png
-code/compoundimages/Dehydroxynocardamine.png
-code/compoundimages/Demethylblasticidin S.png
-code/compoundimages/Deoxynybomycin.png
-code/compoundimages/Desferrioxamine X4.png
-code/compoundimages/Desotamide A.png
-code/compoundimages/Desotamide C.png
-code/compoundimages/Desotamide G.png
-code/compoundimages/Diaphorin.png
-code/compoundimages/Dietziamide A.png
-code/compoundimages/Dihydromaltophilin.png
-code/compoundimages/Dioxolide A.png
-code/compoundimages/Diploptene.png
-code/compoundimages/DKxanthene 574.png
-code/compoundimages/Dokdolipid B.png
-code/compoundimages/Dolastatin 10.png
-code/compoundimages/Dragonamide D.png
-code/compoundimages/Eicosanedioic acid.png
-code/compoundimages/Emericellamide A.png
-code/compoundimages/Enniatin L.png
-code/compoundimages/Enniatin M1.png
-code/compoundimages/Epohelmin B.png
-code/compoundimages/Eponemycin.png
-code/compoundimages/Epothilone D.png
-code/compoundimages/Epothilone D1.png
-code/compoundimages/Epothilone I1.png
-code/compoundimages/Erythromycin G.png
-code/compoundimages/ethyl homononactyl homononactate.png
-code/compoundimages/Etrogol.png
-code/compoundimages/Eurystatin C.png
-code/compoundimages/F-Met I.png
-code/compoundimages/Flexirubin.png
-code/compoundimages/Fluvirucin B2.png
-code/compoundimages/Fluvirucin B6.png
-code/compoundimages/Fontonamide.png
-code/compoundimages/Formicamycin D.png
-code/compoundimages/FR-66979.png
-code/compoundimages/FR-900848.png
-code/compoundimages/Frenolicin G.png
-code/compoundimages/Fumaquinone.png
-code/compoundimages/Furaquinocin B.png
-code/compoundimages/Furaquinocin D.png
-code/compoundimages/Fusaricidin D.png
-code/compoundimages/Geralcin E.png
-code/compoundimages/GGL.3.png
-code/compoundimages/Glidobactin C.png
-code/compoundimages/Gln-Asp-Val-Leu.png
-code/compoundimages/Glomecidin.png
-code/compoundimages/Glycocinnasperimicin D.png
-code/compoundimages/Gobichelin B.png
-code/compoundimages/Griselimycin.png
-code/compoundimages/Guineamide C.png
-code/compoundimages/H2-6-Hydroxymethylpterin.png
-code/compoundimages/Halstoctacosanolide B.png
-code/compoundimages/Hapalindole D.png
-code/compoundimages/Heliomycin.png
-code/compoundimages/Hexadecanenitrile.png
-code/compoundimages/Hexose-palythine-serine.png
-code/compoundimages/Homorapamycin A.png
-code/compoundimages/Hoshinolactam.png
-code/compoundimages/IC-202-A.png
-code/compoundimages/Ilanefuranone.png
-code/compoundimages/Indigoidine.png
-code/compoundimages/Indisocin.png
-code/compoundimages/Indole-3-acetic acid methyl ester.png
-code/compoundimages/Inonotusic acid.png
-code/compoundimages/Iromycin C.png
-code/compoundimages/Isobongkrekic acid,.png
-code/compoundimages/Isobutyrylvalindomycin.png
-code/compoundimages/Isomalyngamide A.png
-code/compoundimages/Isorhizopodin.png
-code/compoundimages/Isotuberculosino.png
-code/compoundimages/Izenamide B.png
-code/compoundimages/JBIR-05.png
-code/compoundimages/JBIR-80.png
-code/compoundimages/Jomthonic acid E.png
-code/compoundimages/Juglomycin I.png
-code/compoundimages/Kalafungin.png
-code/compoundimages/Kalimantacin B.png
-code/compoundimages/Kandenol A.png
-code/compoundimages/Kijimicin.png
-code/compoundimages/Koreenceine A.png
-code/compoundimages/Koreenceine B.png
-code/compoundimages/Koreenceine C.png
-code/compoundimages/Korormicin K.png
-code/compoundimages/Kribelloside A.png
-code/compoundimages/Kribelloside B.png
-code/compoundimages/L--(3-hydroxyureido)-alanine.png
-code/compoundimages/Lactoquinomycin.png
-code/compoundimages/Lagunamide B.png
-code/compoundimages/Landomycin A.png
-code/compoundimages/Landomycin S.png
-code/compoundimages/Lentzeoside E.png
-code/compoundimages/Leptofuranin D.png
-code/compoundimages/Leualacin G.png
-code/compoundimages/Leupyrrin B2.png
-code/compoundimages/Leuseramycin.png
-code/compoundimages/Lipoamide C.png
-code/compoundimages/Lipstatin.png
-code/compoundimages/Lobarialide C.png
-code/compoundimages/Lobosamide C.png
-code/compoundimages/Lodopyridone.png
-code/compoundimages/Luminmide B.png
-code/compoundimages/Lutoside.png
-code/compoundimages/Lyngbyatoxin A.png
-code/compoundimages/Maculalactone K.png
-code/compoundimages/Maculalactone M.png
-code/compoundimages/Mandelalide A.png
-code/compoundimages/Mansouramycin D.png
-code/compoundimages/Maremycin D2.png
-code/compoundimages/Marformycin D.png
-code/compoundimages/Maridomycin III.png
-code/compoundimages/Marinactinone A.png
-code/compoundimages/Marinobactin-D1.png
-code/compoundimages/Martinomycin.png
-code/compoundimages/Matlystatin A.png
-code/compoundimages/Mer-WF3010.png
-code/compoundimages/Metacridamide A.png
-code/compoundimages/methyl 1-(methyl propionate)--carboline-3-carboxylate.png
-code/compoundimages/Microginin 576.png
-code/compoundimages/Microginin 91-A.png
-code/compoundimages/Microginin FR9.png
-code/compoundimages/Microtermolide A.png
-code/compoundimages/Milbemycin 10.png
-code/compoundimages/Milbemycin 26.png
-code/compoundimages/Minutissamide J.png
-code/compoundimages/MKN-004C.png
-code/compoundimages/Mohangic acid A.png
-code/compoundimages/Mohangic acid E.png
-code/compoundimages/Monactin.png
-code/compoundimages/Mupirocin F.png
-code/compoundimages/Mutaxanthene B.png
-code/compoundimages/Mycemycin A.png
-code/compoundimages/Mycemycin E.png
-code/compoundimages/Myxochromide S3.png
-code/compoundimages/Myxopyronin B.png
-code/compoundimages/Myxotyroside B.png
-code/compoundimages/N,N'-diisobutylurea.png
-code/compoundimages/N-Acetyl-tyramine.png
-code/compoundimages/N-carboxamido-staurosporine.png
-code/compoundimages/N-methylphloretamide.png
-code/compoundimages/N-Tetradecadienoyl-L-homoserine lactone.png
-code/compoundimages/Nai414-B.png
-code/compoundimages/Namalide C.png
-code/compoundimages/Namalide E.png
-code/compoundimages/Naphthacemycin B3.png
-code/compoundimages/Naphthgeranine A.png
-code/compoundimages/Neomacrophorin III.png
-code/compoundimages/Nevaltophin A.png
-code/compoundimages/Nigerapyrone H.png
-code/compoundimages/Nitrosoxacin C.png
-code/compoundimages/Nocapyrone R.png
-code/compoundimages/Nocardichelin B.png
-code/compoundimages/Nocardiopyrone A.png
-code/compoundimages/Nostopeptolide A3.png
-code/compoundimages/Nostopeptolide L3.png
-code/compoundimages/Nostophycin.png
-code/compoundimages/Not named.png
-code/compoundimages/NP-101A.png
-code/compoundimages/NW-G03.png
-code/compoundimages/Obscurolide-C2 methyl ester.png
-code/compoundimages/Octacyclomycin.png
-code/compoundimages/Odyverdiene B.png
-code/compoundimages/Okaramine H.png
-code/compoundimages/Oryzamide B.png
-code/compoundimages/Oscillamide B.png
-code/compoundimages/Oscillatoxin E.png
-code/compoundimages/Oxepinamide C.png
-code/compoundimages/Palmyrrolinone.png
-code/compoundimages/Panclicin E.png
-code/compoundimages/Panosialin C.png
-code/compoundimages/Paulomycin E.png
-code/compoundimages/PD-118576-A3.png
-code/compoundimages/Pepsatin Pr.png
-code/compoundimages/Pestabacillin B.png
-code/compoundimages/Phe+CO\[Lys+Val+Leu+MeHty+MetO\].png
-code/compoundimages/Phenalinolactone A.png
-code/compoundimages/Phenoxan.png
-code/compoundimages/Phenylbutenote.png
-code/compoundimages/Phenylnannolone A.png
-code/compoundimages/Phenylnannolone C.png
-code/compoundimages/Photopyrone A.png
-code/compoundimages/Phototemtide A.png
-code/compoundimages/Piericidin B5.png
-code/compoundimages/Pimprinol A.png
-code/compoundimages/Planktocyclin.png
-code/compoundimages/Planktopeptin BL843.png
-code/compoundimages/PM-toxin B.png
-code/compoundimages/Porpoisamide B.png
-code/compoundimages/Poststatin.png
-code/compoundimages/Pseudoaeruginosin NS1.png
-code/compoundimages/Pseudodestruxin A.png
-code/compoundimages/Psi-tectorigenin.png
-code/compoundimages/Psuedodestruxin C.png
-code/compoundimages/Pukeleimide E.png
-code/compoundimages/Pulicatin A.png
-code/compoundimages/Pyonitrin A.png
-code/compoundimages/Pyridindolol K2.png
-code/compoundimages/Pyridinopyrone C.png
-code/compoundimages/Pyrroindomycin B.png
-code/compoundimages/Qinimycin C.png
-code/compoundimages/Quinolobactin.png
-code/compoundimages/Rakicidin B.png
-code/compoundimages/Ralfuranone B.png
-code/compoundimages/Ralfuranone I.png
-code/compoundimages/Ralstonin B.png
-code/compoundimages/Rhabdopeptide 8.png
-code/compoundimages/Rhizomide B.png
-code/compoundimages/RHM2.png
-code/compoundimages/Rhodopeptin C1.png
-code/compoundimages/Rhodopeptin C4.png
-code/compoundimages/Ribocyclophane C.png
-code/compoundimages/Rifamycin Z.png
-code/compoundimages/RK-144171.png
-code/compoundimages/Roquefortine A.png
-code/compoundimages/Roseobacticide H.png
-code/compoundimages/Salinipostin D.png
-code/compoundimages/Salinosporamide I.png
-code/compoundimages/Sanglifehrin C.png
-code/compoundimages/Sarpeptin B.png
-code/compoundimages/SCH 38518.png
-code/compoundimages/Sch 39185.png
-code/compoundimages/Sclerolizine.png
-code/compoundimages/Semiplenamide F.png
-code/compoundimages/Serinolamides D.png
-code/compoundimages/Serratamolide C.png
-code/compoundimages/Serratamolide D.png
-code/compoundimages/Serratamolide E.png
-code/compoundimages/SF-1902-A3.png
-code/compoundimages/SF-1902-A4b.png
-code/compoundimages/SF-2140.png
-code/compoundimages/SF2738C.png
-code/compoundimages/Shikometabolin A.png
-code/compoundimages/Siastatin B.png
-code/compoundimages/Silalthride.png
-code/compoundimages/Sordarin-1-glucose ester.png
-code/compoundimages/Spliceostatin E.png
-code/compoundimages/Spongiporic acid A.png
-code/compoundimages/Sporminarin B.png
-code/compoundimages/Stoloniferone L.png
-code/compoundimages/Strepantibin C.png
-code/compoundimages/Streptoaminal-8n.png
-code/compoundimages/Streptoaminal9n.png
-code/compoundimages/Streptofactin.png
-code/compoundimages/Streptoone C.png
-code/compoundimages/Streptovirudin D1.png
-code/compoundimages/Strevertene B.png
-code/compoundimages/Syringolin G.png
-code/compoundimages/T1801 A.png
-code/compoundimages/Tasipeptin A.png
-code/compoundimages/Tasipeptin B.png
-code/compoundimages/Tautomycin.png
-code/compoundimages/Teixobactin.png
-code/compoundimages/Tenacibactin A.png
-code/compoundimages/Tenacibactin B.png
-code/compoundimages/Terresterol.png
-code/compoundimages/Tetradecanenitrile.png
-code/compoundimages/Tetronomycin.png
-code/compoundimages/Thailandamide lactone.png
-code/compoundimages/Thaxteramide A2.png
-code/compoundimages/Thaxtomin B.png
-code/compoundimages/Tjipanazole C1.png
-code/compoundimages/Trichophycin C.png
-code/compoundimages/Triedimycin A.png
-code/compoundimages/Trierixin.png
-code/compoundimages/Tyromycic acid G.png
-code/compoundimages/U-77864.png
-code/compoundimages/UK-78629.png
-code/compoundimages/Unguisin E.png
-code/compoundimages/USF-142A.png
-code/compoundimages/Uvidin-A ester 2a.png
-code/compoundimages/Violapyrone B.png
-code/compoundimages/Violapyrone E.png
-code/compoundimages/VLP T.png
-code/compoundimages/Wortmanamide B.png
-code/compoundimages/Xanthocillin-X dimethylether.png
-code/compoundimages/Xefoampeptide C.png
-code/compoundimages/Xentrivalpeptide O.png
-code/compoundimages/Yanucamide A.png
-code/compoundimages/Yatakemycin.png
-code/compoundimages/Yicathin C.png
-code/compoundimages/YM-47515 degradation product.png
-code/compoundimages/Ypaoamide C.png
-code/compoundimages/Z-4-2.png
-code/test_upsetplt.png
+code/compoundimages/*.png
 code/test_upsetplt.png
 code/treemap.png
 code/untitled0.py
-code/test_upsetplt.png
-code/test_upsetplt.png
-code/treemap.png
 msdial.mgf
 msdial.msp
 msdial_unformatted.txt
@@ -490,7 +48,3 @@ progenesis.csv
 progenesis.msp
 241115_mpactreferenceguide.docx
 220504_mpactmanual.docx
-code/treemap.png
-code/test_upsetplt.png
-code/test_upsetplt.png
-code/treemap.png
diff --git a/code/MSFaST.py b/code/MSFaST.py
index 049654c..2207180 100644
--- a/code/MSFaST.py
+++ b/code/MSFaST.py
@@ -10,7 +10,6 @@ from groupsets import normalize_graphfilters
 from datetime import datetime
 import time
-from pathlib import Path
 #---Classes---
@@ -187,6 +186,11 @@ def run_MSFaST(params):
     # Filtering and error propagation
     print('Filtering data')
     ionfilters = {}
+    # Initialise here (not only inside `if analysis_params.grpave:`) so the
+    # unconditional groupionlists[...] writes further down (and the blank
+    # filter, which reads it) can't raise NameError if grpave is ever off.
+    # The GUI currently forces grpave=True, but loaded sessions/tests need not.
+    groupionlists = {}
     if analysis_params.relfil:
         ionfilters = filter.relationalfilter(analysis_params, ionfilters)
     if analysis_params.merge:
@@ -254,7 +258,7 @@ def run_MSFaST(params):
     msdata_filtered = pd.read_csv(analysis_params.outputdir / (analysis_params.filename.stem + '_filtered.csv'), sep = ',', header = [0, 1, 2], index_col = [0, 1, 2])
     analysisrec = open(analysis_params.outputdir / 'analysisinfo.txt',"w")
     analysisrec.writelines(['Analysis Date: ' + str(datetime.now()) + '\n',
-                            'Runetime: ' + str(round(runtime, 2)) + ' seconds\n',
+                            'Runtime: ' + str(round(runtime, 2)) + ' seconds\n',
                             'Input file: ' + str(analysis_params.filename) + '\n',
                             'Sample list: ' + str(analysis_params.samplelistfilename) + '\n',
                             'Extract metadata file: ' + str(analysis_params.extractmetadatafilename) + '\n',
@@ -280,10 +284,10 @@ def run_MSFaST(params):
     text = ''
     if analysis_params.relfil:
         text += 'Features failing peak correction filtering: ' + str(len(ionfilters['relfil'].ions)) + '/' + str(len(msdata_unformatted.index)) + ' ' + str(round(100 * len(ionfilters['relfil'].ions) / len(msdata_unformatted.index), 2)) + '%\n'
-    if analysis_params.blnkfltr: #FIX THIS REF TO "BLANKS"
+    if analysis_params.blnkfltr:
         text += 'Features failing blank filtering: ' + str(len(groupionlists[analysis_params.blnkgrp])) + '/' + str(len(msdata_unformatted.index)) + ' ' + str(round(100 * len(groupionlists[analysis_params.blnkgrp]) / len(msdata_unformatted.index), 2)) + '%\n'
     if analysis_params.decon:
-        text += 'Features failing blank filtering: ' + str(len(ionfilters['insource'].ions)) + '/' + str(len(msdata_unformatted.index)) + ' ' + str(round(100 * len(ionfilters['insource'].ions) / len(msdata_unformatted.index), 2)) + '%\n'
+        text += 'Features failing in-source/deconvolution filtering: ' + str(len(ionfilters['insource'].ions)) + '/' + str(len(msdata_unformatted.index)) + ' ' + str(round(100 * len(ionfilters['insource'].ions) / len(msdata_unformatted.index), 2)) + '%\n'
     if analysis_params.CVfil:
         text += 'Features failing CV filtering: ' + str(len(ionfilters['cv'].ions)) + '/' + str(len(msdata_unformatted.index)) + ' ' + str(round(100 * len(ionfilters['cv'].ions) / len(msdata_unformatted.index), 2)) + '%\n'
     text += 'Features failing any filters: ' + str(len(msdata_unformatted.index) - len(msdata_filtered.index)) + '/' + str(len(msdata_unformatted.index)) + ' ' + str(round(100 * (len(msdata_unformatted.index) - len(msdata_filtered.index)) / len(msdata_unformatted.index), 2)) + '%\n'
@@ -310,7 +314,7 @@ def run_MSFaST(params):
                             'RT/mz/FC: ' + str(analysis_params.FC3Dplt) + ' ' + str(analysis_params.statstgrps) + '\n',
                             'KMD/mz ' + str(analysis_params.KMD) + '\n',
                             #'KMD/mz/RT ' + str(analysis_params.___) + '\n',
-                            'PCA unfitlered: ' + str(analysis_params.PCA) + '\n',
+                            'PCA unfiltered: ' + str(analysis_params.PCA) + '\n',
                             'PCA filtered: ' + str(analysis_params.PCA) + '\n',
                             'Dendrogram (ward) unfiltered: ' + str(analysis_params.Dendrogram) + '\n',
                             'Dendrogram (ward) Filtered: ' + str(analysis_params.Dendrogram) + '\n',
diff --git a/code/crashreport.py b/code/crashreport.py
new file mode 100644
index 0000000..9d5640f
--- /dev/null
+++ b/code/crashreport.py
@@ -0,0 +1,185 @@
+"""
+MPACT
+Copyright 2022, Robert M. Samples, Sara P. Puckett, and Marcy J. Balunas
+
+Qt-free crash/error reporting. Installs a ``sys.excepthook`` that, on any
+otherwise-unhandled exception:
+
+1. formats a full report (traceback + environment: MPACT/Python/platform
+   versions, timestamp, optional context such as the tail of the run log),
+2. writes it to a timestamped file under a crash-log directory (so there's a
+   durable record even if the user dismisses the dialog), and
+3. hands the report to a GUI callback that asks the user whether to send it.
+
+The "send" path is deliberately backend-free: it builds a pre-filled GitHub
+*new issue* URL (title + body) for the MPACT repo, so reporting is one click
+in the browser and nothing leaves the user's machine until they choose to
+submit it. That satisfies "prompt the user before sending" without any cloud
+egress, DSN, or account.
+
+Why not Sentry (the obvious off-the-shelf option): ``sentry-sdk`` is excellent
+for hosted/web services but (a) sends events to a Sentry project by default --
+exactly the silent-egress this tool should avoid for a desktop research app,
+(b) needs a DSN/account to be provisioned, and (c) still needs a custom
+``before_send`` hook + dialog to honour "ask first." For a single-user desktop
+tool the local-log + pre-filled-GitHub-issue flow gives the same practical
+benefit (a complete traceback in the maintainer's hands) with no infrastructure
+and no privacy surprise. If MPACT ever ships to many non-technical users and a
+central error feed becomes worth it, Sentry with ``before_send`` gating is the
+documented upgrade path.
+
+This module is Qt-free and unit-tested (see ``tests/test_crashreport.py``); the
+GUI dialog is injected as a plain callback.
+"""
+
+import os
+import platform
+import sys
+import time
+import traceback
+import urllib.parse
+
+DEFAULT_REPO = 'robertsamples/mpact'
+# GitHub rejects extremely long issue URLs; keep the prefilled body well under
+# the practical limit so the link always opens (the full report is always in
+# the log file regardless).
+_MAX_ISSUE_BODY = 6000
+
+
+def _app_version():
+    try:
+        from mpactupdate import __version__
+        return __version__
+    except Exception:
+        return 'unknown'
+
+
+def format_report(exc_type, exc_value, exc_tb, context=None, now=None):
+    """Build the human-readable crash report text.
+
+    Args:
+        exc_type/exc_value/exc_tb: the ``sys.exc_info()``-style triple.
+        context: optional extra text appended under a "Context" heading
+            (e.g. the last lines of the run log, the current dataset name).
+        now: epoch seconds for the timestamp (injectable for tests).
+
+    Returns:
+        A multi-section plain-text report.
+    """
+    now = time.time() if now is None else now
+    stamp = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(now))
+    tb_text = ''.join(traceback.format_exception(exc_type, exc_value, exc_tb))
+    lines = [
+        'MPACT crash report',
+        '==================',
+        'Time: ' + stamp,
+        'MPACT version: ' + _app_version(),
+        'Python: ' + sys.version.split()[0],
+        'Platform: ' + platform.platform(),
+        '',
+        'Traceback:',
+        tb_text.rstrip(),
+    ]
+    if context:
+        lines += ['', 'Context:', str(context).rstrip()]
+    return '\n'.join(lines) + '\n'
+
+
+def write_log(report, log_dir, now=None):
+    """Write ``report`` to a timestamped file under ``log_dir``.
+
+    Creates ``log_dir`` if needed. Returns the path written, or ``None`` if the
+    write failed (reporting must never raise from inside an excepthook).
+    """
+    now = time.time() if now is None else now
+    try:
+        os.makedirs(log_dir, exist_ok=True)
+        fname = 'mpact_crash_' + time.strftime('%Y%m%d_%H%M%S', time.localtime(now)) + '.log'
+        path = os.path.join(log_dir, fname)
+        with open(path, 'w', encoding='utf-8', errors='replace') as handle:
+            handle.write(report)
+        return path
+    except Exception:
+        return None
+
+
+def one_line_summary(exc_type, exc_value):
+    """A concise ``TypeName: message`` for use as an issue title."""
+    name = getattr(exc_type, '__name__', str(exc_type))
+    message = str(exc_value).strip().splitlines()[0] if str(exc_value).strip() else ''
+    return (name + ': ' + message).strip().rstrip(':').strip() if message else name
+
+
+def build_issue_url(report, title, repo=DEFAULT_REPO):
+    """Build a GitHub 'new issue' URL with a prefilled title and body.
+
+    The body is the report wrapped in a code fence and truncated to
+    :data:`_MAX_ISSUE_BODY` so the URL stays openable. The full untruncated
+    report always lives in the on-disk log.
+    """
+    body = report
+    if len(body) > _MAX_ISSUE_BODY:
+        body = body[:_MAX_ISSUE_BODY] + '\n...\n[truncated -- see attached crash log]'
+    body_md = ('**Describe what you were doing when this happened:**\n\n\n'
+               '---\n```\n' + body + '\n```\n')
+    query = urllib.parse.urlencode({'title': title, 'body': body_md})
+    return 'https://github.com/' + repo + '/issues/new?' + query
+
+
+def make_excepthook(report_handler, log_dir=None, repo=DEFAULT_REPO,
+                    context_provider=None, prev_hook=None):
+    """Build (but don't install) an excepthook.
+
+    Args:
+        report_handler: callable ``handler(report, log_path, issue_url)`` that
+            shows the user the report and offers to send it. Exceptions raised
+            by the handler are swallowed (an excepthook must not itself raise).
+        log_dir: directory for crash logs (skipped if None).
+        repo: GitHub repo for the prefilled issue URL.
+        context_provider: optional zero-arg callable returning extra context
+            text to embed (called defensively; failure is ignored).
+        prev_hook: a previous excepthook to chain to (defaults to the standard
+            ``sys.__excepthook__`` so the traceback still reaches the console).
+
+    Returns:
+        A function with the ``(exc_type, exc_value, exc_tb)`` signature.
+    """
+    prev_hook = prev_hook if prev_hook is not None else sys.__excepthook__
+
+    def _hook(exc_type, exc_value, exc_tb):
+        # Always let the default hook print to stderr first (and never let our
+        # own reporting suppress that or raise over it).
+        try:
+            prev_hook(exc_type, exc_value, exc_tb)
+        except Exception:
+            pass
+        try:
+            context = None
+            if context_provider is not None:
+                try:
+                    context = context_provider()
+                except Exception:
+                    context = None
+            report = format_report(exc_type, exc_value, exc_tb, context=context)
+            log_path = write_log(report, log_dir) if log_dir else None
+            title = 'Crash: ' + one_line_summary(exc_type, exc_value)
+            issue_url = build_issue_url(report, title, repo=repo)
+            if report_handler is not None:
+                report_handler(report, log_path, issue_url)
+        except Exception:
+            # Reporting failed -- the default hook already printed the real
+            # traceback, so just give up quietly rather than masking it.
+            pass
+
+    return _hook
+
+
+def install_excepthook(report_handler, log_dir=None, repo=DEFAULT_REPO,
+                       context_provider=None):
+    """Install the crash excepthook as ``sys.excepthook``; return the previous
+    hook (so callers can restore it)."""
+    prev = sys.excepthook
+    sys.excepthook = make_excepthook(
+        report_handler, log_dir=log_dir, repo=repo,
+        context_provider=context_provider, prev_hook=prev)
+    return prev
diff --git a/code/dbsearch.py b/code/dbsearch.py
index c752f4f..f4894ae 100644
--- a/code/dbsearch.py
+++ b/code/dbsearch.py
@@ -11,7 +11,6 @@
 """
 import numpy as np
-import pandas as pd
 from csvcache import cached_read_csv, invalidate
@@ -39,18 +38,54 @@ def search_npatlas(outputdir, filename_stem, atlas, ppm_threshold):
     msdata = cached_read_csv(outputdir / (filename_stem + '_filtered.csv'),
                              sep=',', header=[2], index_col=None).iloc[:, :3]
-    for _, mrow in msdata.iterrows():
-        # Iterates over iondict, filters DB matches within window.
-        # Repeats for adducts, uses length of concat DF for feature hits
-        mass = mrow['m/z']
-        hits_h = atlas[abs(1000000 * (atlas['compound_m_plus_h'] - mass) / atlas['compound_m_plus_h']) < ppm_threshold].copy()
-        hits_h['ppm'] = abs(1000000 * (hits_h['compound_m_plus_h'] - mass) / hits_h['compound_m_plus_h'])
-        hits_na = atlas[abs(1000000 * (atlas['compound_m_plus_na'] - mass) / atlas['compound_m_plus_na']) < ppm_threshold].copy()
-        hits_na['ppm'] = abs(1000000 * (hits_na['compound_m_plus_na'] - mass) / hits_na['compound_m_plus_na'])
-        hits = pd.concat([hits_h, hits_na])
+    # Pre-sort the two adduct-mass columns once so each feature only tests a
+    # tiny m/z window (via searchsorted) instead of scanning all ~36k atlas
+    # rows twice -- the old per-feature ``atlas[boolean_mask]`` over the whole
+    # table was O(features x atlas_rows). The exact original ppm test
+    # (``abs(1e6*(atlas_mz - mass)/atlas_mz) < ppm_threshold``) is re-applied to
+    # the windowed candidates, so the matched set is bit-for-bit identical; the
+    # window (mass*(1 +/- 2*t)) is a safe superset of the true ppm window for
+    # the small tolerances used here. Verified output-identical (hitdb frames,
+    # incl. row order + ppm, and the iondict 'hits' column) against the old
+    # implementation on the real example dataset (~5x faster there).
+    mph = atlas['compound_m_plus_h'].to_numpy(dtype=float)
+    mna = atlas['compound_m_plus_na'].to_numpy(dtype=float)
+    order_h = np.argsort(mph, kind='stable'); sorted_h = mph[order_h]
+    order_na = np.argsort(mna, kind='stable'); sorted_na = mna[order_na]
+    t = ppm_threshold / 1e6
+
+    def _match(mass, sorted_vals, order, col_vals):
+        # Atlas positions whose ppm error vs `mass` is below threshold, in
+        # ascending atlas-position (i.e. original boolean-mask) order, plus
+        # their ppm values.
+        lo = np.searchsorted(sorted_vals, mass * (1 - 2 * t), side='left')
+        hi = np.searchsorted(sorted_vals, mass * (1 + 2 * t), side='right')
+        cand = order[lo:hi]
+        if cand.size == 0:
+            return cand, cand.astype(float)
+        cv = col_vals[cand]
+        sel = np.sort(cand[np.abs(1e6 * (cv - mass) / cv) < ppm_threshold])
+        sel_cv = col_vals[sel]
+        return sel, np.abs(1e6 * (sel_cv - mass) / sel_cv)
+
+    masses = msdata['m/z'].to_numpy(dtype=float)
+    compounds = msdata.iloc[:, 0].to_numpy()
+    counts = np.empty(len(masses), dtype=float)
+    for i in range(len(masses)):
+        mass = masses[i]
+        # m+h matches then m+na matches, concatenated in that order (matching
+        # the old ``pd.concat([hits_h, hits_na])``) and slicing the atlas once.
+ pos_h, ppm_h = _match(mass, sorted_h, order_h, mph) + pos_na, ppm_na = _match(mass, sorted_na, order_na, mna) + positions = np.concatenate([pos_h, pos_na]) + hits = atlas.iloc[positions].copy() + hits['ppm'] = np.concatenate([ppm_h, ppm_na]) hits = hits.sort_values(by=['ppm']) - hitdb[mrow['Compound']] = hits - iondict.loc[mrow['Compound'], 'hits'] = hits.shape[0] + hitdb[compounds[i]] = hits + counts[i] = positions.size + # One vectorised column assignment instead of a per-feature ``.loc`` scalar + # set (msdata's Compound ids are unique, a subset of iondict's index). + iondict.loc[compounds, 'hits'] = counts iondict.to_csv(outputdir / 'iondict.csv', header=True, index=True) # iondict.csv just changed on disk (gained/updated the 'hits' column) -- diff --git a/code/dialogs.py b/code/dialogs.py new file mode 100644 index 0000000..00862e3 --- /dev/null +++ b/code/dialogs.py @@ -0,0 +1,111 @@ +""" +MPACT +Copyright 2022, Robert M. Samples, Sara P. Puckett, and Marcy J. Balunas + +Dark-themed QMessageBox helpers, matching the main GUI palette so the app's +dialogs don't render with invisible black-on-black text (the default when a +QMessageBox inherits the app's dark styling but has no colours of its own). + +Kept out of ``main.py`` (which can't be imported headlessly -- the documented +main<->ui_functions circular import) so the box construction/styling can be +unit-tested via Qt's offscreen platform (see ``tests/test_dialogs.py``), the +same approach used for ``searchtree.py``. + +Palette mirrors ui_main.py: background rgb(40,40,40), text rgb(212,212,212), +buttons rgb(62,62,62) / hover rgb(75,75,75). + +The push-buttons are styled *per widget* (``box.buttons()``) rather than via a +``QMessageBox QPushButton`` descendant selector: in practice that descendant +rule did not take effect on the standard buttons (they rendered borderless with +black text), while the box/label rules did -- styling each button object +directly is selector-independent and reliable. 
On Windows the native title bar +is also switched to dark + square corners via the DWM API so the dialog frame +matches the app's dark theme instead of the default light, rounded Win11 bar. +""" + +import sys + +from PyQt5 import QtWidgets + +_BG = 'rgb(40,40,40)' +_TEXT = 'rgb(212,212,212)' + +DIALOG_STYLE = """ +QMessageBox { background-color: %s; } +QMessageBox QLabel { color: %s; } +QMessageBox QTextEdit { background-color: rgb(35,35,35); color: %s; } +""" % (_BG, _TEXT, _TEXT) + +_BUTTON_STYLE = """ +QPushButton { + background-color: rgb(62,62,62); + color: rgb(212,212,212); + border: 1px solid rgb(120,120,120); + border-radius: 3px; + padding: 4px 16px; + min-width: 64px; +} +QPushButton:hover { background-color: rgb(75,75,75); } +QPushButton:pressed { background-color: rgb(55,55,55); } +QPushButton:default { border: 1px solid rgb(160,160,160); } +""" + + +def apply_dark_titlebar(widget): + """Best-effort: make a top-level window's title bar dark with square + corners on Windows 11 (no-op / silently ignored everywhere else). + + Uses the DWM window attributes (immersive dark mode + corner preference). + Must run before the window is first shown to take effect cleanly, so call + it after ``winId()`` realises the native handle but before ``exec_()``. + """ + if sys.platform != 'win32': + return + try: + import ctypes + hwnd = int(widget.winId()) + dwm = ctypes.windll.dwmapi + flag = ctypes.c_int(1) + # DWMWA_USE_IMMERSIVE_DARK_MODE: 20 on current Win10/11, 19 on early + # 20H1 builds -- set both; the wrong one just returns a failure code. + for attr in (20, 19): + dwm.DwmSetWindowAttribute(hwnd, attr, ctypes.byref(flag), ctypes.sizeof(flag)) + # DWMWA_WINDOW_CORNER_PREFERENCE = 33, DWMWCP_DONOTROUND = 1 (Win11). 
+ corner = ctypes.c_int(1) + dwm.DwmSetWindowAttribute(hwnd, 33, ctypes.byref(corner), ctypes.sizeof(corner)) + except Exception: + pass + + +def build_message_box(parent, icon, title, text, buttons=None, default=None, + detailed=None): + """Construct a QMessageBox styled to match the MPACT GUI (does NOT exec). + + Separated from :func:`styled_message_box` so tests can inspect the + configured box without blocking on a modal ``exec_()``. + """ + box = QtWidgets.QMessageBox(parent) + box.setIcon(icon) + box.setWindowTitle(title) + box.setText(text) + if buttons is not None: + box.setStandardButtons(buttons) + if default is not None: + box.setDefaultButton(default) + if detailed is not None: + box.setDetailedText(detailed) + box.setStyleSheet(DIALOG_STYLE) + # Style each button object directly -- reliable where the descendant + # selector wasn't applied to the standard buttons. + for button in box.buttons(): + button.setStyleSheet(_BUTTON_STYLE) + return box + + +def styled_message_box(parent, icon, title, text, buttons=None, default=None, + detailed=None): + """Build the styled box, show it modally, and return the clicked button.""" + box = build_message_box(parent, icon, title, text, buttons=buttons, + default=default, detailed=detailed) + apply_dark_titlebar(box) + return box.exec_() diff --git a/code/main.py b/code/main.py index 418e2bd..c884095 100644 --- a/code/main.py +++ b/code/main.py @@ -44,6 +44,12 @@ from plotting import plot_abund, show_spectrum, show_featureplt, plot_heatmap, plot_mzrt, plot_samplecorr, kendrick, plot_volcano, plot_fc3d, plot_dendrogram, plot_ordination, prev_cv, plot_upset, plot_treemap import getfragdb +import webbrowser +import npatlasupdate +import mpactupdate +import crashreport +from dialogs import styled_message_box + from indigo import Indigo from indigo.renderer import IndigoRenderer indigo = Indigo() @@ -82,6 +88,10 @@ - potentially consider other database options like HMDB etc - fix up analysisinfo file output with better and 
more useful log ingo - add other ordination options like pca, pls-da, etc etc + ~DONE: the multivariate tab now offers PCA / NMDS / PLS-DA via a method + switcher (ordination.py, plotting.plot_ordination). Next candidate is + OPLS-DA (intentionally deferred -- no native sklearn support; see + devnotes.md "Multivariate ordination plot"). - add custom keyword arguments for each plot to make calling them easier - make it so groups can be reordered in the groupsets widgets? ~model-layer support done: GroupSetModel.move() (groupsets.py), tested in @@ -311,6 +321,12 @@ def moveWindow(event): UIFunctions.uiDefinitions(self) self.show() + # Deferred (post-show) best-effort startup checks: prompt to refresh a + # stale NPAtlas database and to install a newer MPACT release. Run via + # singleShot so the window paints first; fully guarded so they can + # never block or break launch. GUI-only path -- verify by launching. + QtCore.QTimer.singleShot(0, self._run_startup_checks) + #---Methods--- @@ -534,7 +550,81 @@ def update_mgf_feature_id(entry_lines, new_id): def error(self, message): self.ui.label_status.setText(message) self.ui.label_status.setStyleSheet('color: rgb(150,0,0);') - + + # ---- Startup checks (atlas freshness + app self-update) ---- + # All best-effort and fully guarded: any failure is logged to the console + # and otherwise ignored so a check can never block or break launch. The + # Qt-free logic lives in npatlasupdate.py / mpactupdate.py (unit-tested); + # these methods are only the dialog/wiring layer and need a live launch to + # verify end to end. 
+ def _run_startup_checks(self): + try: + self._check_atlas_freshness() + except Exception as exc: + print('Atlas update check skipped:', exc) + try: + self._check_app_update() + except Exception as exc: + print('App update check skipped:', exc) + + def _check_atlas_freshness(self, atlas_path='npatlas.tsv', max_age_days=30): + """Offer to re-download the Natural Products Atlas if the local copy is + missing or older than ``max_age_days``.""" + if not npatlasupdate.is_update_due(atlas_path, max_age_days=max_age_days): + return + age = npatlasupdate.atlas_age_days(atlas_path) + age_msg = 'missing' if age is None else ('about %d days old' % int(age)) + reply = styled_message_box( + self, QtWidgets.QMessageBox.Question, 'Update Natural Products Atlas?', + 'Your local Natural Products Atlas database is %s.\n\n' + 'Download the latest copy from npatlas.org now (about 30 MB)?' % age_msg, + buttons=QtWidgets.QMessageBox.Yes | QtWidgets.QMessageBox.No, + default=QtWidgets.QMessageBox.No) + if reply != QtWidgets.QMessageBox.Yes: + return + # The download is large and runs on the main thread (wait cursor); a + # failed/partial transfer never clobbers the existing atlas (atomic + # replace in npatlasupdate.download_atlas). Threading this is a future + # improvement -- see devnotes.md. + self.ui.label_status.setText('Downloading Natural Products Atlas...') + QtWidgets.QApplication.setOverrideCursor(Qt.WaitCursor) + QtWidgets.QApplication.processEvents() + try: + n = npatlasupdate.download_atlas(atlas_path) + self.ui.label_status.setText('Natural Products Atlas updated (%.1f MB).' 
% (n / 1e6)) + except Exception as exc: + self.error('Atlas update failed (kept existing copy): ' + str(exc)) + finally: + QtWidgets.QApplication.restoreOverrideCursor() + + def _check_app_update(self): + """Check GitHub for a newer MPACT release; offer a git-pull update.""" + info = mpactupdate.check_for_update(timeout=5) + if not info.available: + return + notes = (info.notes or '').strip() + if len(notes) > 800: + notes = notes[:800] + '...' + reply = styled_message_box( + self, QtWidgets.QMessageBox.Question, 'MPACT update available', + 'A newer MPACT release is available.\n\n' + 'Installed: %s\nLatest: %s\n\n%s\n\n' + 'Update now (git pull)?' % (info.current, info.latest, notes), + buttons=QtWidgets.QMessageBox.Yes | QtWidgets.QMessageBox.No, + default=QtWidgets.QMessageBox.No) + if reply != QtWidgets.QMessageBox.Yes: + return + repo_dir = Path(__file__).resolve().parent.parent + ok, output = mpactupdate.apply_git_update(repo_dir) + if ok: + styled_message_box( + self, QtWidgets.QMessageBox.Information, 'Update complete', + 'MPACT was updated. Please restart the application.\n\n' + output) + else: + self.error('Automatic update failed; opening the release page instead.') + if info.url: + webbrowser.open(info.url) + def getgroups(self): """ Get biological groups on input of all input files, fills comboboxes with these. @@ -1379,5 +1469,32 @@ def mousePressEvent(self, event): if sys.platform != 'win32': app.setStyle('Fusion') app.setStyleSheet("QFrame { border: 0px; }") #QToolTip { color: #999999; background-color: rgb(0, 255, 0); border: 1px solid grey; }") + + # Crash reporting: on an unhandled exception, log a full report and offer to + # open a prefilled GitHub issue (nothing is sent without the user clicking + # through). Installed after the QApplication exists so the dialog can show. + # See crashreport.py for the design (and why not Sentry). + def _crash_dialog(report, log_path, issue_url): + try: + text = 'An unexpected error occurred.' 
+ if log_path: + text += '\n\nA crash log was saved to:\n' + log_path + text += ('\n\nReport this on GitHub? Your browser will open a ' + 'prefilled issue — nothing is sent automatically.') + if styled_message_box( + None, QtWidgets.QMessageBox.Critical, + 'MPACT encountered an error', text, + buttons=QtWidgets.QMessageBox.Yes | QtWidgets.QMessageBox.No, + default=QtWidgets.QMessageBox.No, + detailed=report) == QtWidgets.QMessageBox.Yes and issue_url: + webbrowser.open(issue_url) + except Exception: + pass + + crashreport.install_excepthook( + _crash_dialog, + log_dir=str(Path.home() / '.mpact' / 'crashlogs'), + repo=mpactupdate.DEFAULT_REPO) + window = MainWindow() sys.exit(app.exec_()) diff --git a/code/mpactupdate.py b/code/mpactupdate.py new file mode 100644 index 0000000..4eb9689 --- /dev/null +++ b/code/mpactupdate.py @@ -0,0 +1,173 @@ +""" +MPACT +Copyright 2022, Robert M. Samples, Sara P. Puckett, and Marcy J. Balunas + +Qt-free self-update checker. Queries the GitHub Releases API for the MPACT +repository (Robert Samples' fork, ``robertsamples/mpact``, by default), +compares the latest published release tag against the locally-running version, +and -- if newer -- hands the GUI the information it needs to ask the user +whether to update. The actual update of a git checkout is a best-effort +``git pull --ff-only``. + +Why not an off-the-shelf updater framework: the established Python option, +``pyupdater``, targets *frozen* (PyInstaller/cx_Freeze) apps and needs its own +patch-server + signing infrastructure -- heavyweight for a tool that's run +from a git clone (and the portable PyInstaller build is a separate, infrequent +artifact). For a source checkout the meaningful "update" is ``git pull``, and +"is there a newer release" is one GitHub API call + a version compare. That's +what this module does, with no third-party dependency beyond ``packaging`` +(already present; a tuple-based fallback covers its absence). 
+ +This module is Qt-free and unit-tested (see ``tests/test_mpactupdate.py``); the +network and the git call are both injectable so the tests never touch either. +""" + +import json +import subprocess +import urllib.request + +#: The version of MPACT currently running. Bump this when cutting a release, +#: and create a matching GitHub release/tag (e.g. ``v1.0.1``) on the fork so +#: this checker can see it. Kept here as the single in-code source of truth; +#: keep it consistent with main.py's ``label_credits`` display string +#: (currently shows ``v1.00.01``). +__version__ = '1.0.01' + +DEFAULT_REPO = 'robertsamples/mpact' +_RELEASES_LATEST = 'https://api.github.com/repos/{repo}/releases/latest' + + +class UpdateInfo: + """Result of an update check. + + Attributes: + available: True if the latest release is newer than the running version. + current: the running version string. + latest: the latest release's tag (raw, e.g. ``v2.1.0``) or None. + url: the release's html_url (for "view release" / manual download). + notes: the release body/changelog text (may be ''). + """ + __slots__ = ('available', 'current', 'latest', 'url', 'notes') + + def __init__(self, available, current, latest=None, url=None, notes=''): + self.available = available + self.current = current + self.latest = latest + self.url = url + self.notes = notes + + def __repr__(self): + return ('UpdateInfo(available=%r, current=%r, latest=%r)' + % (self.available, self.current, self.latest)) + + +def _normalize(tag): + """Strip a leading 'v'/'V' and surrounding whitespace from a tag.""" + tag = str(tag).strip() + if tag[:1] in ('v', 'V'): + tag = tag[1:] + return tag + + +def is_newer(latest_tag, current_version): + """True if ``latest_tag`` represents a newer version than ``current_version``. + + Uses ``packaging.version`` when available (PEP 440 aware), falling back to a + dotted-integer tuple comparison so a missing ``packaging`` never breaks the + check. 
A tag that can't be parsed at all is treated as "not newer" (fail + safe -- never nag about an unparseable tag). + """ + latest = _normalize(latest_tag) + current = _normalize(current_version) + try: + from packaging.version import parse as _parse + return _parse(latest) > _parse(current) + except Exception: + def _tuple(text): + parts = [] + for chunk in text.replace('-', '.').split('.'): + if chunk.isdigit(): + parts.append(int(chunk)) + else: + break + return tuple(parts) + lt, ct = _tuple(latest), _tuple(current) + if not lt: + return False + return lt > ct + + +def fetch_latest_release(repo=DEFAULT_REPO, opener=None, timeout=10): + """Fetch the latest published release from the GitHub API. + + Returns the parsed JSON dict, or ``None`` if the repo has no published + releases (the API 404s) or the response can't be parsed. ``opener`` is an + injectable ``opener(request_or_url, timeout=...)`` returning a readable, + context-managed response (defaults to ``urllib.request.urlopen``). + """ + opener = opener if opener is not None else urllib.request.urlopen + url = _RELEASES_LATEST.format(repo=repo) + # GitHub recommends a User-Agent; the v3 Accept header pins the response shape. + request = urllib.request.Request(url, headers={ + 'Accept': 'application/vnd.github+json', + 'User-Agent': 'MPACT-update-check', + }) + try: + try: + response = opener(request, timeout=timeout) + except TypeError: + response = opener(request) + with response as resp: + payload = resp.read() + except Exception: + # Network error, 404 (no releases yet), DNS failure, offline -- all + # non-fatal: an update check must never break startup. 
+ return None + try: + data = json.loads(payload.decode('utf-8') if isinstance(payload, bytes) else payload) + except Exception: + return None + if not isinstance(data, dict) or 'tag_name' not in data: + return None + return data + + +def check_for_update(current_version=__version__, repo=DEFAULT_REPO, opener=None, + timeout=10): + """Check whether a newer MPACT release exists. + + Returns an :class:`UpdateInfo`. ``available`` is False (and ``latest`` is + None) when the check can't reach the API or finds no newer release -- the + GUI can simply do nothing in that case. + """ + release = fetch_latest_release(repo=repo, opener=opener, timeout=timeout) + if release is None: + return UpdateInfo(available=False, current=current_version) + latest_tag = release.get('tag_name') + return UpdateInfo( + available=is_newer(latest_tag, current_version), + current=current_version, + latest=latest_tag, + url=release.get('html_url'), + notes=release.get('body') or '', + ) + + +def apply_git_update(repo_dir, runner=None, remote='origin', branch='main'): + """Best-effort ``git pull --ff-only`` of a source checkout. + + Only meaningful when MPACT is running from a git clone (not a frozen + build). ``runner`` is an injectable ``subprocess.run``-compatible callable + (so tests don't shell out). Returns ``(success, output)`` where ``output`` + is combined stdout/stderr text. 
+ """ + runner = runner if runner is not None else subprocess.run + try: + completed = runner( + ['git', '-C', str(repo_dir), 'pull', '--ff-only', remote, branch], + capture_output=True, text=True, timeout=120, + ) + except Exception as exc: + return False, 'git pull could not be run: ' + str(exc) + output = (getattr(completed, 'stdout', '') or '') + (getattr(completed, 'stderr', '') or '') + return completed.returncode == 0, output.strip() diff --git a/code/npatlasupdate.py b/code/npatlasupdate.py new file mode 100644 index 0000000..0b83b6e --- /dev/null +++ b/code/npatlasupdate.py @@ -0,0 +1,144 @@ +""" +MPACT +Copyright 2022, Robert M. Samples, Sara P. Puckett, and Marcy J. Balunas + +Qt-free updater for the bundled Natural Products Atlas database +(``npatlas.tsv``). The Natural Products Atlas (https://www.npatlas.org) +publishes periodic full-database downloads; the copy MPACT ships with goes +stale over time as new compounds are deposited. + +This module provides the staleness check and the download/validation logic, +with no Qt dependency, so it can be unit-tested headlessly (see +``tests/test_npatlasupdate.py``). The GUI side (``main.py``) only has to ask +the user a yes/no question and call :func:`download_atlas`. + +Format decision (do not "upgrade" without reason): MPACT reads the atlas as a +tab-separated table via ``pd.read_csv('npatlas.tsv', sep='\\t', ...)`` and +``dbsearch.search_npatlas`` accesses specific columns +(``compound_m_plus_h``/``compound_m_plus_na``/``compound_smiles``/ +``origin_type``/``genus``). The published ``NPAtlas_download.tsv`` already has +exactly those columns, so the TSV is a drop-in replacement and is what we +fetch. The ``NPAtlas_download.json`` is the same data in a nested JSON shape +that would need flattening before pandas/dbsearch could use it -- there is no +benefit to switching formats and a real cost (rewriting the read + column +access), so we deliberately stay on the TSV. 
+""" + +import os +import shutil +import tempfile +import time +import urllib.request + +DEFAULT_TSV_URL = 'https://www.npatlas.org/static/downloads/NPAtlas_download.tsv' +DEFAULT_JSON_URL = 'https://www.npatlas.org/static/downloads/NPAtlas_download.json' + +# Columns the app actually consumes (main.py's atlas read + dbsearch). A +# download missing any of these is rejected rather than allowed to clobber a +# working atlas -- guards against the server returning an HTML error page or a +# truncated/renamed export. +REQUIRED_COLUMNS = frozenset({ + 'compound_id', + 'compound_m_plus_h', + 'compound_m_plus_na', + 'compound_smiles', + 'origin_type', + 'genus', +}) + +DEFAULT_MAX_AGE_DAYS = 30 + + +def atlas_age_days(path, now=None): + """Age of the atlas file in days based on its mtime, or ``None`` if the + file doesn't exist.""" + if not os.path.exists(path): + return None + now = time.time() if now is None else now + return (now - os.path.getmtime(path)) / 86400.0 + + +def is_update_due(path, max_age_days=DEFAULT_MAX_AGE_DAYS, now=None): + """True if the atlas is missing or older than ``max_age_days``. + + This is the cheap check the app runs at startup before deciding whether to + prompt the user -- it only stats the file, it never touches the network. + + Caveat: staleness is judged by file *mtime*, which is the requested + "last modified over a month ago" behaviour but reflects when the file was + last written locally, not the vintage of the data inside it. A fresh + ``git clone`` stamps the checked-out file with the clone time, so a + just-cloned-but-data-old atlas will read as "fresh" until 30 days pass. + Embedding the NPAtlas release date in a sidecar and comparing that would + be more accurate if this ever matters. 
+ """ + age = atlas_age_days(path, now=now) + return age is None or age > max_age_days + + +def validate_tsv_header(first_line): + """True if a TSV header line contains every :data:`REQUIRED_COLUMNS`.""" + columns = {c.strip() for c in first_line.rstrip('\n').split('\t')} + return REQUIRED_COLUMNS.issubset(columns) + + +def download_atlas(dest, url=DEFAULT_TSV_URL, opener=None, validate=True, + timeout=60): + """Download the NPAtlas TSV to ``dest`` atomically. + + The download streams to a temporary file in ``dest``'s directory, is + validated (the header must contain :data:`REQUIRED_COLUMNS`), and only then + ``os.replace``-d over ``dest`` -- so a network error, an HTML error page, + or a partial transfer can never leave a corrupt or truncated atlas in + place; the previous file is untouched on any failure. + + Args: + dest: path the atlas should end up at (e.g. ``code/npatlas.tsv``). + url: download URL (defaults to the published full TSV). + opener: callable ``opener(url, timeout=...)`` returning a readable, + context-managed response (defaults to ``urllib.request.urlopen``). + Injectable so tests can supply canned content without a network. + validate: if True, reject a download whose header is missing required + columns (raises ``ValueError``, leaving ``dest`` unchanged). + timeout: per-request timeout in seconds (urllib only). + + Returns: + Number of bytes written to ``dest``. + """ + dest = os.fspath(dest) + dest_dir = os.path.dirname(os.path.abspath(dest)) or '.' + opener = opener if opener is not None else urllib.request.urlopen + + fd, tmp_path = tempfile.mkstemp(prefix='.npatlas_', suffix='.tsv', dir=dest_dir) + try: + with os.fdopen(fd, 'wb') as tmp_file: + try: + response = opener(url, timeout=timeout) + except TypeError: + # Injected openers in tests may not accept a timeout kwarg. 
+ response = opener(url) + with response as resp: + shutil.copyfileobj(resp, tmp_file) + bytes_written = os.path.getsize(tmp_path) + + if bytes_written == 0: + raise ValueError('Downloaded atlas is empty') + if validate: + with open(tmp_path, 'r', encoding='utf-8', errors='replace') as check: + first_line = check.readline() + if not validate_tsv_header(first_line): + raise ValueError( + 'Downloaded file is not a valid NPAtlas TSV ' + '(missing required columns); keeping existing atlas') + + os.replace(tmp_path, dest) + return bytes_written + except BaseException: + # Clean up the temp file on ANY failure (including the validation + # errors above); never disturb the existing dest. + try: + if os.path.exists(tmp_path): + os.remove(tmp_path) + except OSError: + pass + raise diff --git a/code/ordination.py b/code/ordination.py index 7829e9c..43d21fd 100644 --- a/code/ordination.py +++ b/code/ordination.py @@ -15,6 +15,8 @@ This module is Qt-free and unit-tested (see ``tests/test_ordination.py``). """ +from pathlib import Path + import numpy as np import pandas as pd from sklearn.cross_decomposition import PLSRegression @@ -75,11 +77,19 @@ def load_ordination_matrix(file, raw_msdata_header, collapse_replicates): for elem in collapsed_columns: header.append((elem[1], '', elem[0])) msdata.columns = pd.MultiIndex.from_tuples(header) - msdata.to_csv('averagepca.csv', header=True, index=False) - - msdata_header = pd.read_csv('averagepca.csv', sep=',', header=None, + # Round-trip the collapsed matrix through a CSV so its relabeled + # 3-row header reads back the same way the uncollapsed path reads + # the real file. Write it next to the input peak table (the run's + # output directory) rather than the process's current working + # directory, which in the deployed app is code/ -- this file is an + # internal scratch artifact, not something the user should find in + # the source tree. 
+ avg_path = Path(file).with_name('averagepca.csv') + msdata.to_csv(avg_path, header=True, index=False) + + msdata_header = pd.read_csv(avg_path, sep=',', header=None, index_col=[0, 1, 2]).iloc[:3, :].transpose() - pcadf = (pd.read_csv('averagepca.csv', sep=',', header=[2], index_col=[0]) + pcadf = (pd.read_csv(avg_path, sep=',', header=[2], index_col=[0]) .drop(['m/z', 'Retention time (min)'], axis=1) .transpose().astype(float).reset_index().rename(columns={'index': 'File'})) else: diff --git a/code/plotting.py b/code/plotting.py index 137f2a5..cf8e019 100644 --- a/code/plotting.py +++ b/code/plotting.py @@ -11,6 +11,7 @@ from csvcache import cached_read_csv, invalidate as invalidate_csv_cache import ordination import clusterpurity +import qualityscore import matplotlib #matplotlib.style.use('ggplot') @@ -41,7 +42,6 @@ from matplotlib.patches import Ellipse from filter import listfilter import time -import math from pvclust import PvClust @@ -1473,63 +1473,23 @@ def __init__(self, parent, currplt, frame, file, filtereddfs, groupsets): self.plot(parent, file, filtereddfs, groupsets) def plot(self, parent, file, filtereddfs, groupsets): - # Load and filter ion data + # Load ion data and the average injections-per-sample (for the noise + # model that scales the CV axis), then compute the quality metrics in + # the Qt-free qualityscore module (unit-tested; see qualityscore.py). 
iondict = cached_read_csv(parent.analysis_paramsgui.outputdir / 'iondict.csv', header=0, index_col=0) - iondict = iondict[~np.isnan(iondict['average CV'])] - - # Calculate mean and median CV, and scale data - iondictmean = iondict.sort_values(['average CV']).reset_index() - iondictmed = iondict.sort_values(['median CV']).reset_index() - iondictmean = iondictmean.reset_index() - iondictmed = iondictmed.reset_index() - iondictmean.iloc[:,0] = 100 * iondictmean.iloc[:,0]/len(iondictmean['average CV']) - iondictmed.iloc[:,0] = 100 * iondictmed.iloc[:,0]/len(iondictmed['median CV']) - - # Calculate maximum theoretical CV based on neff msdata_header = cached_read_csv(parent.analysis_paramsgui.outputdir / (parent.analysis_paramsgui.filename.stem + '_filtered.csv'), sep=',', header=None, index_col=[0,1,2]).iloc[:3,:].transpose() msdata_header.columns = ['Biolgroup', 'Sample', 'Injection'] average_n = msdata_header['Injection'].nunique() / msdata_header['Sample'].nunique() - modelstdevlist = [1] + [0] * (int(average_n) - 1) - modelstdev = pd.Series(modelstdevlist).std() / pd.Series(modelstdevlist).mean() - cv50 = iondictmean.iloc[(iondictmean.iloc[:,0] - 50).abs().argsort()[:1]]['average CV'] - sortedcv = iondictmean.iloc[(iondictmean.iloc[:,0]).argsort()]['average CV'] - prevav = 0 - aucav = 0 - prevmed = 0 - aucmed = 0 - for pos in range(0,len(iondictmean.iloc[:,0])): - dist = iondictmean.iloc[pos,:]['average CV'] - prevav - aucav += dist*iondictmean.iloc[pos,0] - prevav = iondictmean.iloc[pos,:]['average CV'] - - dist = iondictmed.iloc[pos,:]['median CV'] - prevmed - aucmed += dist*iondictmed.iloc[pos,0] - prevmed = iondictmed.iloc[pos,:]['median CV'] - - meanav = 0 - meanmed = 0 - sumskew = 0 - if math.isnan(modelstdev): - modelstdev = 1.7 - for val in range(1, int((modelstdev*100))): - pos = val/100 - meanav = iondictmean[abs(iondictmean['average CV'] - pos-modelstdev/200) < modelstdev/200].iloc[:,0].mean() - meanmed = iondictmed[abs(iondictmed['average CV'] - 
pos-modelstdev/200) < modelstdev/200].iloc[:,0].mean() - skew = abs(meanmed-meanav) - if not np.isnan(skew): - sumskew += skew * modelstdev/100 - - - sumskew = sumskew/ ((aucmed+aucav)/2) - rep = ((aucmed+aucav)/2)/(modelstdev*100) - qualscore = (1-sumskew)*rep*100 - - #qualscore = round(100 * (1 - cv50 / modelstdev), 1) + + result = qualityscore.compute_cv_quality(iondict, average_n) + iondictmean = result.iondictmean + iondictmed = result.iondictmed + modelstdev = result.modelstdev # Update UI - parent.ui.lbl_spllist_3.setText('Reproducibility:\n' + str(round(100*rep,1)) + '%\n' + - 'Skewnewss:\n' + str(round(100*sumskew,1)) + '%\n\n' + - 'Overall:\n' + str(round(qualscore,1)) + '%') + parent.ui.lbl_spllist_3.setText('Reproducibility:\n' + str(round(100*result.rep,1)) + '%\n' + + 'Skewnewss:\n' + str(round(100*result.sumskew,1)) + '%\n\n' + + 'Overall:\n' + str(round(result.qualscore,1)) + '%') # Plot data currplt = 'cvplt' #instead take this from input diff --git a/code/qualityscore.py b/code/qualityscore.py new file mode 100644 index 0000000..b35e11f --- /dev/null +++ b/code/qualityscore.py @@ -0,0 +1,141 @@ +""" +MPACT +Copyright 2022, Robert M. Samples, Sara P. Puckett, and Marcy J. Balunas + +Qt-free extraction of the data-quality metrics shown on the CV (coefficient of +variation) rarefaction plot tab. These numbers -- Reproducibility, Skewness, +and an Overall quality score -- already existed, but the math was buried inside +``plotting.prev_cv.plot()`` (entangled with the matplotlib drawing and the +``lbl_spllist_3`` label update) and had no test coverage. + +Moving the computation here (the same pattern as ``ordination.py`` / +``biogroups.py`` / ``dbsearch.py`` / ``clusterpurity.py``) makes it unit- +testable and keeps ``prev_cv`` as a thin draw-the-result wrapper. 
The logic is +a faithful port of the original inline code -- ``tests/test_qualityscore.py`` +pins it against a verbatim copy of that original to guarantee the extraction +didn't change any displayed number. + +Definitions (as originally implemented): +- The CV rarefaction curve plots, for the mean-CV and median-CV orderings, the + cumulative percentage of features (0-100) against their CV. ``aucav``/ + ``aucmed`` are the areas under those two curves (percentage x CV). +- ``modelstdev`` is the maximum theoretical CV expected from pure + count-statistics noise given the average number of injections per sample + (``[1] + [0]*(n-1)`` treated as a sample -> its CV); it sets the CV axis + scale and normalises the AUC. +- ``rep`` (Reproducibility) = mean AUC / (modelstdev x 100): how far left + (low-CV) the curve sits relative to the noise-model scale. +- ``sumskew`` (Skewness) = normalised integrated gap between the mean-CV and + median-CV curves: how asymmetric the CV distribution is. +- ``qualscore`` (Overall) = (1 - sumskew) x rep x 100. +""" + +import math + +import numpy as np +import pandas as pd + + +class CVQualityResult: + """Bundle of everything ``prev_cv`` needs to label and draw the CV plot. + + Attributes: + iondictmean: features sorted by 'average CV', with column 0 replaced by + the cumulative percentage (0-100) -- the mean-CV rarefaction curve. + iondictmed: same, sorted/ranked by 'median CV'. + modelstdev: the count-statistics noise-model CV (axis scale). + rep: reproducibility fraction (0-1); ``100*rep`` is the displayed %. + sumskew: skewness fraction (0-1); ``100*sumskew`` is the displayed %. + qualscore: overall quality score (already on a 0-100 scale). 
+ """ + __slots__ = ('iondictmean', 'iondictmed', 'modelstdev', 'rep', 'sumskew', 'qualscore') + + def __init__(self, iondictmean, iondictmed, modelstdev, rep, sumskew, qualscore): + self.iondictmean = iondictmean + self.iondictmed = iondictmed + self.modelstdev = modelstdev + self.rep = rep + self.sumskew = sumskew + self.qualscore = qualscore + + +def noise_model_cv(average_n): + """Maximum theoretical CV from pure presence/absence count noise given + ``average_n`` injections per sample: the CV of ``[1] + [0]*(n-1)``. + + Falls back to 1.7 when undefined (e.g. a single injection per sample makes + the model series a single value with no spread) -- matching the original. + """ + modelstdevlist = [1] + [0] * (int(average_n) - 1) + series = pd.Series(modelstdevlist) + modelstdev = series.std() / series.mean() + if math.isnan(modelstdev): + modelstdev = 1.7 + return modelstdev + + +def compute_cv_quality(iondict, average_n): + """Compute the CV-plot quality metrics from an ion dictionary. + + Args: + iondict: DataFrame with 'average CV' and 'median CV' columns (the + ``iondict.csv`` produced by the CV filter). Rows with NaN + 'average CV' are dropped, matching the plot. + average_n: average number of injections per sample (used for the + count-statistics noise model that scales the CV axis). + + Returns: + CVQualityResult. + """ + iondict = iondict[~np.isnan(iondict['average CV'])] + + # Cumulative-percentage rarefaction curves for the mean- and median-CV + # orderings. The double reset_index reproduces the original exactly: + # the first moves the Compound index to a column, the second materialises + # the 0..n-1 rank as column 0, which is then rescaled to a 0-100 percentage. + iondictmean = iondict.sort_values(['average CV']).reset_index() + iondictmed = iondict.sort_values(['median CV']).reset_index() + iondictmean = iondictmean.reset_index() + iondictmed = iondictmed.reset_index() + # Replace column 0 (the integer rank) with the 0-100 cumulative percentage. 
+ # Assign by column LABEL (not in-place via .iloc) so the column's dtype is + # replaced wholesale rather than an incompatible float cast into an int64 + # column -- the latter raises a FutureWarning on pandas 2.x and is slated + # to become a hard error (same class as the LossySetitemError fixed + # elsewhere). Numerically identical to the original. + mean_col0, med_col0 = iondictmean.columns[0], iondictmed.columns[0] + iondictmean[mean_col0] = 100 * iondictmean.iloc[:, 0] / len(iondictmean['average CV']) + iondictmed[med_col0] = 100 * iondictmed.iloc[:, 0] / len(iondictmed['median CV']) + + modelstdev = noise_model_cv(average_n) + + # Area under each rarefaction curve (percentage integrated over CV). + # Vectorised equivalent of the original per-row loop + # aucav += (cv[pos] - cv[pos-1]) * pct[pos] (cv[-1] := 0) + # using np.diff(prepend=0) so the first step's "previous" value is 0, + # matching the loop's prevav/prevmed starting at 0. (np.sum's pairwise + # summation can differ from the loop's sequential add by <1 ULP, far below + # the 0.1%-rounded display precision -- the faithfulness test pins this.) + cv_av = iondictmean['average CV'].to_numpy() + pct_av = iondictmean.iloc[:, 0].to_numpy() + aucav = float(np.sum(np.diff(cv_av, prepend=0.0) * pct_av)) + + cv_med = iondictmed['median CV'].to_numpy() + pct_med = iondictmed.iloc[:, 0].to_numpy() + aucmed = float(np.sum(np.diff(cv_med, prepend=0.0) * pct_med)) + + # Integrated gap between the mean and median curves (distribution skew). 
+ sumskew = 0 + for val in range(1, int((modelstdev * 100))): + pos = val / 100 + meanav = iondictmean[abs(iondictmean['average CV'] - pos - modelstdev / 200) < modelstdev / 200].iloc[:, 0].mean() + meanmed = iondictmed[abs(iondictmed['average CV'] - pos - modelstdev / 200) < modelstdev / 200].iloc[:, 0].mean() + skew = abs(meanmed - meanav) + if not np.isnan(skew): + sumskew += skew * modelstdev / 100 + + sumskew = sumskew / ((aucmed + aucav) / 2) + rep = ((aucmed + aucav) / 2) / (modelstdev * 100) + qualscore = (1 - sumskew) * rep * 100 + + return CVQualityResult(iondictmean, iondictmed, modelstdev, rep, sumskew, qualscore) diff --git a/code/stats.py b/code/stats.py index 1237416..3b90501 100644 --- a/code/stats.py +++ b/code/stats.py @@ -72,7 +72,6 @@ def groupave(analysis_params): # Initialize lists to collect results from each chunk sum_values_list = [] - sum_squares_list = [] counts_list = [] # Process data in chunks @@ -90,33 +89,29 @@ def groupave(analysis_params): # Set index names according to your data structure chunk_stacked.index.names = ['Compound', 'm/z', 'Retention time', 'Group', 'Sample', 'Injection'] - # Compute sum, sum of squares, and counts per group + # Compute sum and counts per group group_levels = ['Compound', 'm/z', 'Retention time', 'Group', 'Sample', 'Injection'] sum_values_chunk = chunk_stacked.groupby(level=group_levels).sum() - sum_squares_chunk = (chunk_stacked ** 2).groupby(level=group_levels).sum() count_chunk = chunk_stacked.groupby(level=group_levels).count() # Append results to lists sum_values_list.append(sum_values_chunk) - sum_squares_list.append(sum_squares_chunk) counts_list.append(count_chunk) pbar.update(1) # Concatenate all results all_sum_values = pd.concat(sum_values_list) - all_sum_squares = pd.concat(sum_squares_list) all_counts = pd.concat(counts_list) # Aggregate over the entire dataset sum_values_df = all_sum_values.groupby(level=group_levels).sum() - sum_squares_df = 
all_sum_squares.groupby(level=group_levels).sum() counts_df = all_counts.groupby(level=group_levels).sum() - # Calculate mean and variance per injection + # Calculate mean per injection. (A per-injection variance/stddev was + # computed here previously but never used -- the technical/biological + # RSDs below are derived from the grouped means, not from it.) mean_values = sum_values_df / counts_df - variance_values = (sum_squares_df / counts_df) - (mean_values ** 2) - stddev_values = variance_values ** 0.5 # Calculate technical RSDs and counts # Group over technical replicates within each sample @@ -251,9 +246,14 @@ def runttest(analysis_params, statstgrps, groupsets): msdata_teststats.loc[msdata_teststats['p'] <= minval, 'p'] = minval msdata_teststats['logp'] = np.log10(msdata_teststats['p']) - # Save msdata_teststats + # Save msdata_teststats. Previously written to the current working + # directory as 'msdata_teststats_test.csv' (a debug-named file that + # littered code/ and was never read back); now written into the run's + # output directory under a descriptive name alongside the other outputs. msdata_teststats = msdata_teststats.reset_index([1, 2]) - msdata_teststats.to_csv('msdata_teststats_test.csv', header=True, index=True) + msdata_teststats.to_csv( + analysis_params.outputdir / (analysis_params.filename.stem + '_teststats.csv'), + header=True, index=True) # Update iondict with -logp iondict['-logp'] = -msdata_teststats['logp'] @@ -301,7 +301,12 @@ def runttest(analysis_params, statstgrps, groupsets): # Save results to CSV files iondict = iondict.set_index('Compound') - iondict.to_csv('qdata.csv', index=True, header=True) + # Written into the run's output directory rather than the current working + # directory (was a bare 'qdata.csv' that landed in code/ and was never + # read back); the canonical -logq still goes into iondict.csv below. 
+ iondict.to_csv( + analysis_params.outputdir / (analysis_params.filename.stem + '_qvalues.csv'), + index=True, header=True) iondict2 = pd.read_csv(analysis_params.outputdir / 'iondict.csv', sep=',', header=[0], index_col=[0]) iondict2['-logq'] = np.nan iondict2.loc[iondict.index.tolist(), '-logq'] = iondict['-logq'] diff --git a/code/tests/test_crashreport.py b/code/tests/test_crashreport.py new file mode 100644 index 0000000..d4ac0ee --- /dev/null +++ b/code/tests/test_crashreport.py @@ -0,0 +1,156 @@ +"""Unit tests for the crash/error reporter (``crashreport.py``). + +A real exception is manufactured (raise + catch) so there's a genuine +traceback object to format; the GUI dialog is replaced by a recording +callback, and the excepthook is exercised by calling it directly. +""" + +import sys +import urllib.parse + +import crashreport as cr + + +def _make_exc_info(): + try: + raise ValueError('boom happened in feature 0.80_418.1451n') + except ValueError: + return sys.exc_info() + + +# --------------------------------------------------------------------------- # +# format_report +# --------------------------------------------------------------------------- # + +def test_format_report_contains_traceback_and_environment(): + et, ev, tb = _make_exc_info() + report = cr.format_report(et, ev, tb, now=0) + assert 'MPACT crash report' in report + assert 'ValueError: boom happened' in report + assert 'Traceback' in report + assert 'Python:' in report and 'Platform:' in report + + +def test_format_report_includes_context_when_given(): + et, ev, tb = _make_exc_info() + report = cr.format_report(et, ev, tb, context='dataset=PTY087I2; step=heatmap') + assert 'Context:' in report + assert 'PTY087I2' in report + + +# --------------------------------------------------------------------------- # +# one_line_summary +# --------------------------------------------------------------------------- # + +def test_one_line_summary_uses_type_and_message(): + et, ev, _ = _make_exc_info() + 
assert cr.one_line_summary(et, ev) == 'ValueError: boom happened in feature 0.80_418.1451n' + + +def test_one_line_summary_handles_empty_message(): + try: + raise RuntimeError() + except RuntimeError: + et, ev, _ = sys.exc_info() + assert cr.one_line_summary(et, ev) == 'RuntimeError' + + +# --------------------------------------------------------------------------- # +# write_log +# --------------------------------------------------------------------------- # + +def test_write_log_creates_timestamped_file(tmp_path): + path = cr.write_log('hello report', tmp_path / 'crashlogs', now=0) + assert path is not None + with open(path) as f: + assert f.read() == 'hello report' + assert 'mpact_crash_' in path and path.endswith('.log') + + +def test_write_log_returns_none_on_failure(tmp_path): + # Point log_dir at a path that exists as a *file* so makedirs fails. + afile = tmp_path / 'not_a_dir' + afile.write_text('x') + assert cr.write_log('r', str(afile / 'sub')) is None + + +# --------------------------------------------------------------------------- # +# build_issue_url +# --------------------------------------------------------------------------- # + +def test_build_issue_url_is_wellformed_and_encoded(): + url = cr.build_issue_url('a traceback & stuff', 'Crash: ValueError: boom', + repo='robertsamples/mpact') + assert url.startswith('https://github.com/robertsamples/mpact/issues/new?') + parsed = urllib.parse.urlparse(url) + params = urllib.parse.parse_qs(parsed.query) + assert params['title'] == ['Crash: ValueError: boom'] + assert 'a traceback & stuff' in params['body'][0] + + +def test_build_issue_url_truncates_huge_body(): + huge = 'x' * 50000 + url = cr.build_issue_url(huge, 'Crash', repo='r/m') + params = urllib.parse.parse_qs(urllib.parse.urlparse(url).query) + assert 'truncated' in params['body'][0] + assert len(params['body'][0]) < 7000 + + +# --------------------------------------------------------------------------- # +# excepthook +# 
--------------------------------------------------------------------------- # + +def test_excepthook_invokes_handler_and_writes_log(tmp_path): + received = {} + + def handler(report, log_path, issue_url): + received['report'] = report + received['log_path'] = log_path + received['issue_url'] = issue_url + + chained = [] + hook = cr.make_excepthook(handler, log_dir=str(tmp_path / 'logs'), + repo='robertsamples/mpact', + prev_hook=lambda *a: chained.append(a)) + et, ev, tb = _make_exc_info() + hook(et, ev, tb) + + assert 'ValueError' in received['report'] + assert received['log_path'] is not None and received['log_path'].endswith('.log') + assert received['issue_url'].startswith('https://github.com/robertsamples/mpact/issues/new?') + # The previous hook was still chained (traceback reaches the console). + assert len(chained) == 1 + + +def test_excepthook_swallows_handler_errors(tmp_path): + def bad_handler(report, log_path, issue_url): + raise RuntimeError('dialog blew up') + + hook = cr.make_excepthook(bad_handler, log_dir=str(tmp_path), prev_hook=lambda *a: None) + et, ev, tb = _make_exc_info() + # Must not raise, even though the handler does. 
+ hook(et, ev, tb) + + +def test_excepthook_uses_context_provider(tmp_path): + received = {} + + def handler(report, log_path, issue_url): + received['report'] = report + + hook = cr.make_excepthook(handler, log_dir=str(tmp_path), + context_provider=lambda: 'active dataset: foo', + prev_hook=lambda *a: None) + et, ev, tb = _make_exc_info() + hook(et, ev, tb) + assert 'active dataset: foo' in received['report'] + + +def test_install_excepthook_restores(tmp_path): + original = sys.excepthook + try: + prev = cr.install_excepthook(lambda *a: None, log_dir=str(tmp_path)) + assert prev is original + assert sys.excepthook is not original + finally: + sys.excepthook = original diff --git a/code/tests/test_dbsearch.py b/code/tests/test_dbsearch.py index 94467e3..f12606f 100644 --- a/code/tests/test_dbsearch.py +++ b/code/tests/test_dbsearch.py @@ -66,6 +66,47 @@ def test_no_match_outside_ppm_window(tmp_path): assert hitdb['c2'].empty +def _write_single_feature(tmp_path, mz): + """One-feature filtered table + matching iondict, for the ordering tests.""" + stem = 'example' + pd.DataFrame({'Compound': ['feat'], 'other': [1]}).to_csv(tmp_path / 'iondict.csv', index=False) + with open(tmp_path / (stem + '_filtered.csv'), 'w') as f: + f.write(',,\n,,\nCompound,m/z,Retention time (min)\n') + f.write('feat,%s,1.0\n' % mz) + return tmp_path, stem + + +def test_hits_sorted_by_ppm_across_both_adducts(tmp_path): + # m+h matches A (~20 ppm) and B (~30 ppm); m+na matches C (~5 ppm). + # The combined result must be ascending by ppm: C, A, B. 
+ atlas = pd.DataFrame({ + 'compound_name': ['A', 'B', 'C'], + 'compound_m_plus_h': [200.000, 200.010, 999.0], + 'compound_m_plus_na': [999.0, 999.0, 200.005], + }) + outputdir, stem = _write_single_feature(tmp_path, 200.004) + hitdb, _ = search_npatlas(outputdir, stem, atlas, ppm_threshold=100) + hits = hitdb['feat'] + assert list(hits['compound_name']) == ['C', 'A', 'B'] + assert list(hits['ppm']) == sorted(hits['ppm']) # ascending + + +def test_single_atlas_row_matching_both_adducts_appears_twice(tmp_path): + # One atlas row whose [M+H] and [M+Na] are both at the feature mass must + # appear once per adduct (two rows), matching the old concat behaviour. + atlas = pd.DataFrame({ + 'compound_name': ['D'], + 'compound_m_plus_h': [300.000], + 'compound_m_plus_na': [300.000], + }) + outputdir, stem = _write_single_feature(tmp_path, 300.000) + hitdb, _ = search_npatlas(outputdir, stem, atlas, ppm_threshold=10) + assert len(hitdb['feat']) == 2 + assert list(hitdb['feat']['compound_name']) == ['D', 'D'] + on_disk = pd.read_csv(outputdir / 'iondict.csv', index_col=0) + assert on_disk.loc['feat', 'hits'] == 2 + + def test_invalidates_stale_cached_reads_under_other_shapes(tmp_path): """Regression guard: fillfttree() (main.py) reads iondict.csv with header=[0], index_col=None -- a different cache key than diff --git a/code/tests/test_dialogs.py b/code/tests/test_dialogs.py new file mode 100644 index 0000000..fc25820 --- /dev/null +++ b/code/tests/test_dialogs.py @@ -0,0 +1,79 @@ +"""Headless tests for the dark-themed dialog helpers (``dialogs.py``). + +Uses the offscreen Qt platform (the ``qapp`` fixture in conftest.py) to build +the message boxes without a display and without blocking on a modal exec. +Guards the regression that prompted this module: a QMessageBox with no +explicit colours rendered black-on-black under the app's dark styling. 
+""" + +import sys + +from PyQt5 import QtWidgets + +import dialogs + + +def test_style_sets_visible_label_and_dark_background(qapp): + # The style must define a light label colour and a non-default background + # (the fix for the invisible black-on-black text). + assert 'color: rgb(212,212,212)' in dialogs.DIALOG_STYLE + assert 'background-color: rgb(40,40,40)' in dialogs.DIALOG_STYLE + + +def test_build_message_box_applies_style_and_content(qapp): + box = dialogs.build_message_box( + None, QtWidgets.QMessageBox.Question, 'Title here', 'Body text', + buttons=QtWidgets.QMessageBox.Yes | QtWidgets.QMessageBox.No, + default=QtWidgets.QMessageBox.No) + assert isinstance(box, QtWidgets.QMessageBox) + assert box.text() == 'Body text' + if sys.platform != 'darwin': + # Qt's Cocoa (macOS) integration treats QMessageBox as a native + # alert panel -- per Apple HIG, alerts have no title bar -- and + # doesn't retain the windowTitle property for it specifically (other + # widget types aren't affected). build_message_box still calls + # setWindowTitle unconditionally since it's meaningful on every other + # platform (and harmless here); only the readback assertion is + # platform-gated. Reproduced on stock PyQt5 with no styling applied + # at all, so this is not something dialogs.py's theming can fix. + assert box.windowTitle() == 'Title here' + # The dark theme is actually applied to this box. + assert 'rgb(212,212,212)' in box.styleSheet() + assert box.standardButtons() == (QtWidgets.QMessageBox.Yes | QtWidgets.QMessageBox.No) + assert box.defaultButton() == box.button(QtWidgets.QMessageBox.No) + box.deleteLater() + + +def test_each_button_is_styled_directly(qapp): + # Regression: the descendant selector didn't reach the standard buttons, + # so each button must carry the button stylesheet itself (visible text + + # border). 
+ box = dialogs.build_message_box( + None, QtWidgets.QMessageBox.Question, 't', 'b', + buttons=QtWidgets.QMessageBox.Yes | QtWidgets.QMessageBox.No) + buttons = box.buttons() + assert len(buttons) == 2 + for button in buttons: + sheet = button.styleSheet() + assert 'color: rgb(212,212,212)' in sheet + assert 'border: 1px solid' in sheet + box.deleteLater() + + +def test_apply_dark_titlebar_never_raises(qapp): + # No-op off Windows; on Windows it best-effort sets DWM attributes and must + # swallow any failure (e.g. an offscreen/invalid HWND). + box = dialogs.build_message_box(None, QtWidgets.QMessageBox.Information, 't', 'b') + dialogs.apply_dark_titlebar(box) # must not raise + box.deleteLater() + + +def test_build_message_box_supports_detailed_text(qapp): + box = dialogs.build_message_box( + None, QtWidgets.QMessageBox.Critical, 'Crash', 'Something failed', + buttons=QtWidgets.QMessageBox.Yes | QtWidgets.QMessageBox.No, + detailed='full traceback here') + assert box.detailedText() == 'full traceback here' + # The detailed-text pane is a QTextEdit, also styled for visibility. + assert 'QTextEdit' in box.styleSheet() + box.deleteLater() diff --git a/code/tests/test_getfragdb.py b/code/tests/test_getfragdb.py new file mode 100644 index 0000000..ebbd84a --- /dev/null +++ b/code/tests/test_getfragdb.py @@ -0,0 +1,114 @@ +"""Unit tests for the MSP fragmentation-database importer (``getfragdb.py``). + +Covers the two parsers (``importfrag_v1`` Progenesis-style, ``importfrag_v2`` +MS-DIAL-style) and the format auto-detection wrapper (``importfrag``), which +previously had no coverage at all. Synthetic MSP fixtures are written to +``tmp_path``; a couple of smoke checks run against the real example files at +the repo root (skipped when absent). +""" + +from pathlib import Path + +import pytest + +import getfragdb + +REPO_ROOT = Path(__file__).resolve().parents[2] + + +# Progenesis-style: "Name: Unknown ()" with parenthetical id, no pipes. 
+PROGENESIS_MSP = ( + "Name: Unknown (0.80_627.2171n)\n" + "PrecursorMZ: 627.2171\n" + "Num Peaks: 2\n" + "418.1451 257254\n" + "200.1000 5000\n" + "\n" + "Name: Unknown (1.20_300.1000n)\n" + "PrecursorMZ: 300.1\n" + "Num Peaks: 1\n" + "150.0500 100\n" +) + +# MS-DIAL-style: "NAME: ...|ID=|MZ=|RT=" with pipes, PRECURSORMZ + RETENTIONTIME. +MSDIAL_MSP = ( + "NAME: Unknown|ID=0|MZ=150.0267|RT=9.09\n" + "PRECURSORMZ: 150.0267\n" + "RETENTIONTIME: 9.0898957\n" + "Num Peaks: 2\n" + "56.0500 334\n" + "70.0600 120\n" + "\n" + "NAME: Unknown|ID=1|MZ=200.1000|RT=3.50\n" + "PRECURSORMZ: 200.1\n" + "RETENTIONTIME: 3.5\n" + "Num Peaks: 1\n" + "99.0000 50\n" +) + + +def _write(tmp_path, name, text): + p = tmp_path / name + p.write_text(text) + return p + + +def test_importfrag_v1_parses_progenesis_ids_and_peaks(tmp_path): + db = getfragdb.importfrag_v1(_write(tmp_path, 'p.msp', PROGENESIS_MSP)) + assert set(db.ions.keys()) == {'0.80_627.2171n', '1.20_300.1000n'} + # First entry has 2 peaks parsed into an (n, 2) array. + first = db.ions['0.80_627.2171n'] + assert first.pattern.shape == (2, 2) + assert first.pattern[0][0] == pytest.approx(418.1451) + assert db.ions['1.20_300.1000n'].pattern.shape == (1, 2) + + +def test_importfrag_v2_parses_msdial_rt_mz_keyed_ids(tmp_path): + db = getfragdb.importfrag_v2(_write(tmp_path, 'd.msp', MSDIAL_MSP)) + # name = f"{round(rt,3)}_{precursormz}" using the raw PRECURSORMZ string. + assert '9.09_150.0267' in db.ions + assert '3.5_200.1' in db.ions + assert db.ions['9.09_150.0267'].pattern.shape == (2, 2) + assert db.ions['9.09_150.0267'].fragparams['PRECURSORMZ'] == '150.0267' + + +def test_importfrag_autodetects_progenesis(tmp_path): + db = getfragdb.importfrag(_write(tmp_path, 'p.msp', PROGENESIS_MSP)) + # Progenesis ids are the parenthetical comment, not RT_mz keys. 
+ assert '0.80_627.2171n' in db.ions + + +def test_importfrag_autodetects_msdial(tmp_path): + db = getfragdb.importfrag(_write(tmp_path, 'd.msp', MSDIAL_MSP)) + assert '9.09_150.0267' in db.ions + + +def test_importfrag_v2_skips_entries_without_rt_or_precursor(tmp_path): + # An entry missing RETENTIONTIME/PRECURSORMZ must be dropped, not crash. + msp = ( + "NAME: Unknown|ID=0|MZ=150\n" + "Num Peaks: 1\n" + "56.05 334\n" + "\n" + "NAME: Unknown|ID=1|MZ=200.1|RT=3.5\n" + "PRECURSORMZ: 200.1\n" + "RETENTIONTIME: 3.5\n" + "Num Peaks: 1\n" + "99.0 50\n" + ) + db = getfragdb.importfrag_v2(_write(tmp_path, 'd.msp', msp)) + assert list(db.ions.keys()) == ['3.5_200.1'] + + +@pytest.mark.parametrize('name', ['progenesis.msp', 'msdial.msp']) +def test_importfrag_on_real_example_files(name): + path = REPO_ROOT / name + if not path.exists(): + pytest.skip(name + ' not present') + db = getfragdb.importfrag(path) + assert len(db.ions) > 0 + # Every parsed ion's peak array is either empty (an entry with 0 peaks -- + # which does occur in the real MS-DIAL export) or a proper (n, 2) array. + for entry in db.ions.values(): + assert entry.pattern.size == 0 or ( + entry.pattern.ndim == 2 and entry.pattern.shape[1] == 2) diff --git a/code/tests/test_mpactupdate.py b/code/tests/test_mpactupdate.py new file mode 100644 index 0000000..b8130a3 --- /dev/null +++ b/code/tests/test_mpactupdate.py @@ -0,0 +1,144 @@ +"""Unit tests for the MPACT self-update checker (``mpactupdate.py``). + +The GitHub API and the git call are both injected, so no network or +subprocess is touched. 
+""" + +import io +import json + +import mpactupdate as mu + + +class _FakeResponse(io.BytesIO): + def __enter__(self): + return self + + def __exit__(self, *exc): + self.close() + return False + + +def _release_opener(tag, *, html_url='http://x/rel', body='notes', record=None): + payload = json.dumps({'tag_name': tag, 'html_url': html_url, 'body': body}).encode() + + def opener(request, timeout=None): + if record is not None: + # request is a urllib Request; record its full URL. + record.append(getattr(request, 'full_url', request)) + return _FakeResponse(payload) + return opener + + +# --------------------------------------------------------------------------- # +# version comparison +# --------------------------------------------------------------------------- # + +def test_is_newer_basic(): + assert mu.is_newer('2.1.0', '2.0.0') is True + assert mu.is_newer('2.0.0', '2.0.0') is False + assert mu.is_newer('1.9.0', '2.0.0') is False + + +def test_is_newer_strips_v_prefix(): + assert mu.is_newer('v2.1.0', '2.0.0') is True + assert mu.is_newer('V2.0.1', 'v2.0.0') is True + + +def test_is_newer_numeric_not_lexicographic(): + # 2.10.0 > 2.9.0 numerically (lexicographically it would be "<"). 
+ assert mu.is_newer('2.10.0', '2.9.0') is True + + +def test_is_newer_unparseable_tag_is_not_newer(): + assert mu.is_newer('not-a-version', '2.0.0') is False + + +# --------------------------------------------------------------------------- # +# release fetch + check +# --------------------------------------------------------------------------- # + +def test_check_reports_available_update(): + info = mu.check_for_update(current_version='2.0.0', + opener=_release_opener('v2.5.0')) + assert info.available is True + assert info.latest == 'v2.5.0' + assert info.url == 'http://x/rel' + assert info.notes == 'notes' + + +def test_check_reports_no_update_when_same_version(): + info = mu.check_for_update(current_version='2.0.0', + opener=_release_opener('v2.0.0')) + assert info.available is False + assert info.current == '2.0.0' + + +def test_check_hits_the_configured_repo(): + seen = [] + mu.check_for_update(current_version='2.0.0', repo='robertsamples/mpact', + opener=_release_opener('v2.0.0', record=seen)) + assert seen == ['https://api.github.com/repos/robertsamples/mpact/releases/latest'] + + +def test_check_is_safe_when_offline(): + def failing_opener(request, timeout=None): + raise OSError('no network') + info = mu.check_for_update(current_version='2.0.0', opener=failing_opener) + assert info.available is False + assert info.latest is None + + +def test_check_is_safe_when_no_releases_yet(): + # GitHub returns 404 -> urlopen raises HTTPError -> fetch returns None. 
+ def opener_404(request, timeout=None): + raise OSError('HTTP 404') + info = mu.check_for_update(current_version='2.0.0', opener=opener_404) + assert info.available is False + + +def test_fetch_handles_malformed_json(): + def opener(request, timeout=None): + return _FakeResponse(b'not json at all') + assert mu.fetch_latest_release(opener=opener) is None + + +# --------------------------------------------------------------------------- # +# git update +# --------------------------------------------------------------------------- # + +class _Completed: + def __init__(self, returncode, stdout='', stderr=''): + self.returncode = returncode + self.stdout = stdout + self.stderr = stderr + + +def test_apply_git_update_success(): + calls = [] + + def runner(cmd, **kwargs): + calls.append(cmd) + return _Completed(0, stdout='Updating abc..def\nFast-forward\n') + + ok, output = mu.apply_git_update('/repo', runner=runner) + assert ok is True + assert 'Fast-forward' in output + assert calls[0][:3] == ['git', '-C', '/repo'] + assert 'pull' in calls[0] and '--ff-only' in calls[0] + + +def test_apply_git_update_reports_failure(): + def runner(cmd, **kwargs): + return _Completed(1, stderr='error: local changes would be overwritten') + ok, output = mu.apply_git_update('/repo', runner=runner) + assert ok is False + assert 'local changes' in output + + +def test_apply_git_update_handles_missing_git(): + def runner(cmd, **kwargs): + raise FileNotFoundError('git not found') + ok, output = mu.apply_git_update('/repo', runner=runner) + assert ok is False + assert 'could not be run' in output diff --git a/code/tests/test_msfast_grpave_off.py b/code/tests/test_msfast_grpave_off.py new file mode 100644 index 0000000..806cc2b --- /dev/null +++ b/code/tests/test_msfast_grpave_off.py @@ -0,0 +1,74 @@ +"""Regression test for the ``groupionlists`` defensive-init fix in +``run_MSFaST`` (MSFaST.py). 
+ +``groupionlists`` is only *populated* inside ``if analysis_params.grpave:``, +but is referenced unconditionally afterwards (the ``groupionlists['cv'/...]`` +assignments and the iondict ``groups``-column loop). The GUI hardcodes +``grpave=True`` so this never fired in production, but a minimal run with +``grpave=False`` would have raised ``NameError`` before the fix. This drives +exactly that path against the bundled example dataset, with every optional +filter/stat turned off, and asserts the run completes and returns a result. +""" + +from pathlib import Path + +import pytest + +from MSFaST import AnalysisResult, analysis_parameters, run_MSFaST + +REPO_ROOT = Path(__file__).resolve().parents[2] +EXAMPLE_DIR = REPO_ROOT / 'rawdata' / 'PTY087I2' +ALL_GROUPS = ['Blanks', 'Media', '0um_Ce', '250um_Ce'] + + +def _minimal_params(tmp_path): + """Everything that gates a filtering/stats stage turned OFF, so the run + exercises the no-grpave branch. Threshold/echo-only fields still have to + be present because analysisinfo.txt prints them verbatim.""" + params = analysis_parameters() + params.filename = EXAMPLE_DIR / '200826_PTY087I2codingdataset.csv' + params.samplelistfilename = EXAMPLE_DIR / 'samplelist.csv' + params.extractmetadatafilename = EXAMPLE_DIR / 'extractmetadata.csv' + params.outputdir = tmp_path / params.filename.stem + params.outputdir.mkdir(parents=True) + + # All optional stages OFF -- this is the configuration that used to crash. + params.relfil = False + params.merge = False + params.grpave = False + params.prperr = False + params.blnkfltr = False + params.CVfil = False + params.decon = False + params.FC = False + params.Ttest = False + + # Thresholds / echo-only fields (printed into analysisinfo.txt). 
+ params.ringingwin = 0.5 + params.isopeakwin = 0.01 + params.dimerpeakwin = 0.01 + params.RTwin = 0.005 + params.maxisowin = 3 + params.blnkgrp = '' + params.cvthresh = 0.2 + params.statstgrps = ['250um_Ce', '0um_Ce'] + params.graphfilters = [] + params.MZRTplt = params.FC3Dplt = params.KMD = False + params.PCA = params.Dendrogram = params.Volcanoplt = False + + # No Plot Feature Sets configured -> empty querylist/querydict. + params.querydict = {} + params.querylist = [] + return params + + +def test_run_msfast_with_grpave_off_does_not_raise(tmp_path): + params = _minimal_params(tmp_path) + result = run_MSFaST(params) # used to raise NameError on groupionlists + assert isinstance(result, AnalysisResult) + assert isinstance(result.groupionlists, dict) + # The three filter keys are always added, each empty since every filter is off. + assert result.groupionlists == {'cv': [], 'relfil': [], 'insource': []} + # With no filters applied, the filtered table is written and analysisinfo exists. 
+ assert (params.outputdir / (params.filename.stem + '_filtered.csv')).exists() + assert (params.outputdir / 'analysisinfo.txt').exists() diff --git a/code/tests/test_msfast_pipeline.py b/code/tests/test_msfast_pipeline.py index 71275c5..4f424e6 100644 --- a/code/tests/test_msfast_pipeline.py +++ b/code/tests/test_msfast_pipeline.py @@ -136,3 +136,37 @@ def test_analysisinfo_written(pipeline_result): assert info_path.exists() text = info_path.read_text() assert 'Features passing all filters' in text + + +def test_fold_change_is_clamped_to_bounds(pipeline_result): + """runfc clamps FC into [0.01, 100]; nothing should escape that range.""" + params, _ = pipeline_result + iondict = pd.read_csv(params.outputdir / 'iondict.csv', sep=',', header=[0], index_col=[0]) + fc = iondict['fc'].dropna() + assert len(fc) > 0 + assert fc.min() >= 0.01 + assert fc.max() <= 100.0 + + +def test_stats_outputs_land_in_output_dir_not_cwd(pipeline_result): + """The t-test/q-value tables now write into the run's output directory + (previously bare 'msdata_teststats_test.csv'/'qdata.csv' in the cwd).""" + params, _ = pipeline_result + assert (params.outputdir / (params.filename.stem + '_teststats.csv')).exists() + assert (params.outputdir / (params.filename.stem + '_qvalues.csv')).exists() + + +def test_qvalues_are_finite_positive_and_consistent_with_logq(pipeline_result): + """The BH q-values must be finite and positive, and the persisted '-logq' + column must equal -log10(qval) (the relationship runttest derives). 
Strict + p-ascending monotonicity is deliberately NOT asserted: the cummin step-up + only guarantees it in the loop's processing order, and tied p-values can + reorder under an independent re-sort -- a known BH tie subtlety, not a bug.""" + import numpy as np + params, _ = pipeline_result + qdata = pd.read_csv(params.outputdir / (params.filename.stem + '_qvalues.csv'), + sep=',', header=[0]) + qval = qdata['qval'].to_numpy() + assert np.isfinite(qval).all() + assert (qval > 0).all() + np.testing.assert_allclose(qdata['-logq'].to_numpy(), -np.log10(qval), rtol=1e-9) diff --git a/code/tests/test_npatlasupdate.py b/code/tests/test_npatlasupdate.py new file mode 100644 index 0000000..ebfd2de --- /dev/null +++ b/code/tests/test_npatlasupdate.py @@ -0,0 +1,144 @@ +"""Unit tests for the NPAtlas updater (``npatlasupdate.py``). + +The network is never touched: ``download_atlas`` takes an injectable +``opener``, so tests feed canned bytes through a fake response object and +assert the staleness logic, header validation, and atomic-replace behaviour. 
+""" + +import io +import os +import time + +import pytest + +import npatlasupdate as nu + + +# --------------------------------------------------------------------------- # +# fixtures / helpers +# --------------------------------------------------------------------------- # + +VALID_HEADER = ( + 'npaid\tcompound_id\tcompound_name\tcompound_m_plus_h\tcompound_m_plus_na\t' + 'compound_smiles\torigin_type\tgenus\n' +) +VALID_TSV = VALID_HEADER + '1\t0.80_418n\tFoo\t419.1\t441.1\tCCO\tBacterium\tStreptomyces\n' + + +class _FakeResponse(io.BytesIO): + """A BytesIO that also works as a context manager (like urlopen's result).""" + def __enter__(self): + return self + + def __exit__(self, *exc): + self.close() + return False + + +def _opener_returning(content_bytes, record=None): + def opener(url, timeout=None): + if record is not None: + record.append(url) + return _FakeResponse(content_bytes) + return opener + + +# --------------------------------------------------------------------------- # +# staleness +# --------------------------------------------------------------------------- # + +def test_age_is_none_when_missing(tmp_path): + assert nu.atlas_age_days(tmp_path / 'nope.tsv') is None + + +def test_update_due_when_missing(tmp_path): + assert nu.is_update_due(tmp_path / 'nope.tsv') is True + + +def test_update_not_due_for_fresh_file(tmp_path): + p = tmp_path / 'npatlas.tsv' + p.write_text(VALID_TSV) + # Just created -> age ~0 days -> not due. + assert nu.is_update_due(p, max_age_days=30) is False + + +def test_update_due_for_old_file(tmp_path): + p = tmp_path / 'npatlas.tsv' + p.write_text(VALID_TSV) + # Backdate mtime to 45 days ago. 
+ old = time.time() - 45 * 86400 + os.utime(p, (old, old)) + assert nu.is_update_due(p, max_age_days=30) is True + assert nu.atlas_age_days(p) == pytest.approx(45, abs=0.1) + + +# --------------------------------------------------------------------------- # +# header validation +# --------------------------------------------------------------------------- # + +def test_validate_header_accepts_full_header(): + assert nu.validate_tsv_header(VALID_HEADER) is True + + +def test_validate_header_rejects_missing_columns(): + assert nu.validate_tsv_header('npaid\tcompound_id\tgenus\n') is False + + +def test_validate_header_rejects_html_error_page(): + assert nu.validate_tsv_header('') is False + + +# --------------------------------------------------------------------------- # +# download (atomic + validated) +# --------------------------------------------------------------------------- # + +def test_download_writes_validated_tsv(tmp_path): + dest = tmp_path / 'npatlas.tsv' + seen = [] + n = nu.download_atlas(dest, url='http://example/atlas.tsv', + opener=_opener_returning(VALID_TSV.encode(), record=seen)) + assert dest.exists() + assert n == len(VALID_TSV.encode()) + assert nu.validate_tsv_header(dest.read_text().splitlines(keepends=True)[0]) + assert seen == ['http://example/atlas.tsv'] + + +def test_download_overwrites_existing_atlas_atomically(tmp_path): + dest = tmp_path / 'npatlas.tsv' + dest.write_text('OLD CONTENT') + nu.download_atlas(dest, opener=_opener_returning(VALID_TSV.encode())) + assert 'OLD CONTENT' not in dest.read_text() + assert dest.read_text() == VALID_TSV + + +def test_invalid_download_is_rejected_and_existing_atlas_preserved(tmp_path): + dest = tmp_path / 'npatlas.tsv' + dest.write_text(VALID_TSV) # a good existing atlas + bad = b'503 Service Unavailable' + with pytest.raises(ValueError): + nu.download_atlas(dest, opener=_opener_returning(bad)) + # The good atlas must be untouched, and no temp files left behind. 
+ assert dest.read_text() == VALID_TSV + leftovers = [f for f in os.listdir(tmp_path) if f.startswith('.npatlas_')] + assert leftovers == [] + + +def test_empty_download_is_rejected(tmp_path): + dest = tmp_path / 'npatlas.tsv' + dest.write_text(VALID_TSV) + with pytest.raises(ValueError): + nu.download_atlas(dest, opener=_opener_returning(b'')) + assert dest.read_text() == VALID_TSV + + +def test_network_error_leaves_existing_atlas(tmp_path): + dest = tmp_path / 'npatlas.tsv' + dest.write_text(VALID_TSV) + + def failing_opener(url, timeout=None): + raise OSError('connection refused') + + with pytest.raises(OSError): + nu.download_atlas(dest, opener=failing_opener) + assert dest.read_text() == VALID_TSV + assert [f for f in os.listdir(tmp_path) if f.startswith('.npatlas_')] == [] diff --git a/code/tests/test_ordination.py b/code/tests/test_ordination.py index 992434e..89de181 100644 --- a/code/tests/test_ordination.py +++ b/code/tests/test_ordination.py @@ -47,8 +47,9 @@ def test_uncollapsed_keeps_one_row_per_injection(tmp_path): assert len(biolgroup) == 9 -def test_collapsed_averages_technical_not_biological_replicates(tmp_path, monkeypatch): - monkeypatch.chdir(tmp_path) # 'averagepca.csv' lands here, not the repo +def test_collapsed_averages_technical_not_biological_replicates(tmp_path): + # averagepca.csv (the collapse round-trip scratch file) is written next + # to the input peak table, i.e. into tmp_path here -- no chdir needed. 
path = tmp_path / 'example_filtered.csv' _write_synthetic_filtered_csv(path) x, biolgroup = load_ordination_matrix(path, _raw_header(path), collapse_replicates=True) @@ -65,8 +66,7 @@ def test_collapsed_averages_technical_not_biological_replicates(tmp_path, monkey assert (biolgroup == 'groupB').sum() == 1 -def test_collapsed_values_are_the_mean_of_their_technical_replicates(tmp_path, monkeypatch): - monkeypatch.chdir(tmp_path) +def test_collapsed_values_are_the_mean_of_their_technical_replicates(tmp_path): path = tmp_path / 'example_filtered.csv' _write_synthetic_filtered_csv(path) x, _ = load_ordination_matrix(path, _raw_header(path), collapse_replicates=True) diff --git a/code/tests/test_qualityscore.py b/code/tests/test_qualityscore.py new file mode 100644 index 0000000..a3f1a65 --- /dev/null +++ b/code/tests/test_qualityscore.py @@ -0,0 +1,171 @@ +"""Unit tests for the CV-plot quality metrics (``qualityscore.py``). + +The key test is a *faithfulness* check: ``_reference_inline`` is a verbatim +copy of the original computation that used to live in ``plotting.prev_cv.plot`` +(before it was extracted into ``qualityscore``), and the extracted function is +asserted to reproduce its numbers exactly -- so the refactor provably did not +change any displayed Reproducibility / Skewness / Overall value. Plus a couple +of sanity checks and a run against the real bundled example dataset's iondict. +""" + +import math +from pathlib import Path + +import numpy as np +import pandas as pd +import pytest + +import qualityscore + +REPO_ROOT = Path(__file__).resolve().parents[2] +EXAMPLE_DIR = REPO_ROOT / 'rawdata' / 'PTY087I2' + + +def _reference_inline(iondict, average_n): + """Verbatim copy of the original prev_cv.plot() computation (pre-extraction), + returning (rep, sumskew, qualscore). Do not 'clean up' the ALGORITHM -- it + exists to pin the extracted module to the exact historical behaviour. 
+ + One mechanical exception: the original assigned the rescaled percentage + back via ``.iloc[:, 0] = `` directly into an int64-dtype + rank column. That was already a FutureWarning on the pandas pinned in + qualityscore.py's own fix (see there); on a newer pandas resolved by CI + (this repo's tests.yml installs an unpinned ``pandas`` for Python 3.11) + the same line raises a hard TypeError instead, which would make this + reference function -- not the code under test -- unable to run at all. + Assigning by column LABEL instead of positional .iloc (replacing the + column's dtype wholesale rather than casting a float into the existing + int64 column) is dtype-mechanics only and was already proven + value-identical by qualityscore.py's own equivalent fix. + """ + iondict = iondict[~np.isnan(iondict['average CV'])] + iondictmean = iondict.sort_values(['average CV']).reset_index() + iondictmed = iondict.sort_values(['median CV']).reset_index() + iondictmean = iondictmean.reset_index() + iondictmed = iondictmed.reset_index() + mean_col0, med_col0 = iondictmean.columns[0], iondictmed.columns[0] + iondictmean[mean_col0] = 100 * iondictmean.iloc[:, 0] / len(iondictmean['average CV']) + iondictmed[med_col0] = 100 * iondictmed.iloc[:, 0] / len(iondictmed['median CV']) + modelstdevlist = [1] + [0] * (int(average_n) - 1) + modelstdev = pd.Series(modelstdevlist).std() / pd.Series(modelstdevlist).mean() + prevav = 0 + aucav = 0 + prevmed = 0 + aucmed = 0 + for pos in range(0, len(iondictmean.iloc[:, 0])): + dist = iondictmean.iloc[pos, :]['average CV'] - prevav + aucav += dist * iondictmean.iloc[pos, 0] + prevav = iondictmean.iloc[pos, :]['average CV'] + dist = iondictmed.iloc[pos, :]['median CV'] - prevmed + aucmed += dist * iondictmed.iloc[pos, 0] + prevmed = iondictmed.iloc[pos, :]['median CV'] + sumskew = 0 + if math.isnan(modelstdev): + modelstdev = 1.7 + for val in range(1, int((modelstdev * 100))): + pos = val / 100 + meanav = iondictmean[abs(iondictmean['average CV'] - 
pos - modelstdev / 200) < modelstdev / 200].iloc[:, 0].mean() + meanmed = iondictmed[abs(iondictmed['average CV'] - pos - modelstdev / 200) < modelstdev / 200].iloc[:, 0].mean() + skew = abs(meanmed - meanav) + if not np.isnan(skew): + sumskew += skew * modelstdev / 100 + sumskew = sumskew / ((aucmed + aucav) / 2) + rep = ((aucmed + aucav) / 2) / (modelstdev * 100) + qualscore = (1 - sumskew) * rep * 100 + return rep, sumskew, qualscore + + +def _synthetic_iondict(n=200, seed=0): + rng = np.random.RandomState(seed) + # Plausible CVs: mostly low, a tail of high ones; median a touch below mean. + avg = np.abs(rng.normal(0.15, 0.08, size=n)) + med = np.clip(avg - np.abs(rng.normal(0.01, 0.01, size=n)), 0, None) + return pd.DataFrame({'average CV': avg, 'median CV': med}, + index=[f'f{i}' for i in range(n)]) + + +# --------------------------------------------------------------------------- # +# noise model +# --------------------------------------------------------------------------- # + +def test_noise_model_cv_matches_count_statistics(): + # [1, 0, 0] -> std/mean of that series. + s = pd.Series([1, 0, 0]) + assert qualityscore.noise_model_cv(3) == pytest.approx(s.std() / s.mean()) + + +def test_noise_model_cv_falls_back_when_undefined(): + # average_n = 1 -> series is just [1] -> std is NaN -> fallback 1.7. 
+ assert qualityscore.noise_model_cv(1) == 1.7 + + +# --------------------------------------------------------------------------- # +# faithful extraction +# --------------------------------------------------------------------------- # + +@pytest.mark.parametrize('average_n', [3, 4, 6]) +def test_matches_original_inline_on_synthetic(average_n): + iondict = _synthetic_iondict() + result = qualityscore.compute_cv_quality(iondict, average_n) + ref_rep, ref_skew, ref_qual = _reference_inline(iondict, average_n) + assert result.rep == pytest.approx(ref_rep) + assert result.sumskew == pytest.approx(ref_skew) + assert result.qualscore == pytest.approx(ref_qual) + + +def test_result_fields_are_finite_and_sensible(): + result = qualityscore.compute_cv_quality(_synthetic_iondict(), average_n=3) + assert np.isfinite(result.rep) and result.rep > 0 + assert np.isfinite(result.sumskew) + assert np.isfinite(result.qualscore) + # The rarefaction curves run from ~0 to 100% of features. + assert result.iondictmean.iloc[:, 0].max() == pytest.approx(100, abs=1) + + +def test_matches_original_inline_on_real_iondict(tmp_path): + """If the example dataset is present, run the real pipeline once and check + the extracted metrics match the original inline computation on the real + iondict.csv (the strongest faithfulness guarantee).""" + csv = EXAMPLE_DIR / '200826_PTY087I2codingdataset.csv' + if not csv.exists(): + pytest.skip('example dataset not present') + + from MSFaST import analysis_parameters, run_MSFaST + from groupsets import GroupSetModel, build_query_dict + + params = analysis_parameters() + params.filename = csv + params.samplelistfilename = EXAMPLE_DIR / 'samplelist.csv' + params.extractmetadatafilename = EXAMPLE_DIR / 'extractmetadata.csv' + params.outputdir = tmp_path / params.filename.stem + params.outputdir.mkdir(parents=True) + params.relfil = True; params.merge = True + params.ringingwin = 0.5; params.isopeakwin = 0.01; params.dimerpeakwin = 0.01 + params.RTwin = 0.005; 
params.maxisowin = 3 + params.grpave = True; params.prperr = True + params.blnkfltr = True; params.blnkgrp = 'Blanks'; params.blankfilthresh = 0.01 + params.CVfil = True; params.cvthresh = 0.2; params.cvparam = 'median CV' + params.decon = True; params.deconthresh = 0.95 + params.FC = False; params.Ttest = False + params.statstgrps = ['250um_Ce', '0um_Ce'] + params.graphfilters = ['cv', 'rel', 'insource'] + params.MZRTplt = params.FC3Dplt = params.KMD = False + params.PCA = params.Dendrogram = params.Volcanoplt = False + model = GroupSetModel() + model.add('Features not in blanks', all_groups=['Blanks', 'Media', '0um_Ce', '250um_Ce']) + model.update(0, src=['Media', '0um_Ce', '250um_Ce'], excl=['Blanks']) + params.querydict = build_query_dict(model, params.graphfilters) + params.querylist = list(params.querydict.keys()) + run_MSFaST(params) + + iondict = pd.read_csv(params.outputdir / 'iondict.csv', header=0, index_col=0) + header = pd.read_csv(params.outputdir / (params.filename.stem + '_filtered.csv'), + sep=',', header=None, index_col=[0, 1, 2]).iloc[:3, :].transpose() + header.columns = ['Biolgroup', 'Sample', 'Injection'] + average_n = header['Injection'].nunique() / header['Sample'].nunique() + + result = qualityscore.compute_cv_quality(iondict, average_n) + ref_rep, ref_skew, ref_qual = _reference_inline(iondict, average_n) + assert result.qualscore == pytest.approx(ref_qual) + assert result.rep == pytest.approx(ref_rep) + assert result.sumskew == pytest.approx(ref_skew) diff --git a/code/tests/test_translators.py b/code/tests/test_translators.py index c3c7641..71a28c8 100644 --- a/code/tests/test_translators.py +++ b/code/tests/test_translators.py @@ -200,3 +200,21 @@ def test_parse_real_fragment_files(name): entries = t.parse_fragments(path) assert len(entries) > 0 assert all(e.mz is not None for e in entries) + + +def test_reindex_real_progenesis_msp_against_real_peaktable(tmp_path): + """End-to-end reindex on the bundled Progenesis MSP + peak table -- 
the + compound-id fast path should match (nearly) every entry, and the output + must be parseable and renumbered into ascending row order.""" + pk = REPO_ROOT / 'progenesis.csv' + msp = REPO_ROOT / 'progenesis.msp' + if not (pk.exists() and msp.exists()): + pytest.skip('progenesis example files not present') + out = tmp_path / 'reindexed.msp' + n = t.reindex_fragments(pk, msp, out) + assert n > 0 + # Output parses back to exactly the matched count, with assigned scan numbers. + reparsed = t.parse_msp(out) + assert len(reparsed) == n + text = out.read_text() + assert 'SCANNUMBER:' in text diff --git a/devnotes.md b/devnotes.md index cb6689e..d68ff0c 100644 --- a/devnotes.md +++ b/devnotes.md @@ -695,3 +695,372 @@ mid-session. Most recent follow-ups: heatmap W/S selection had no bounds clamping (`mv_heatmap`, could crash or silently wrap past either end of the feature list); the six per-plot dicts were consolidated into `PlotSlotRegistry` (`plotslots.py`). 65 passing tests. + +## Code review pass (dev branch, 2026-06-30) + +Full read-through of every hand-written, Qt-free module plus the docs and +TODO block. The codebase is in good shape; findings were modest. Test count +is now **159 passing** (the count above is stale). + +### Fixes applied on this branch (low-risk, test-validated) + +- **`MSFaST.py` `analysisinfo.txt` decon label was a copy-paste bug**: the + `if analysis_params.decon:` branch wrote "Features failing **blank** + filtering" (a verbatim copy of the blank-filter line above it). Corrected + to "Features failing in-source/deconvolution filtering". Confirmed against + `main.py`'s parallel data-review summary writer (`_finish_analysis`, + ~line 1208), which already labels the same quantity correctly as + "in-source ion filtering" — so the two writers now agree. Also fixed two + user-facing typos in the same file: "Runetime" -> "Runtime" and + "PCA unfitlered" -> "PCA unfiltered". 
These are pure string-label changes, + not the risky re-read logic the analysisinfo backlog item warns about. +- **`MSFaST.run_MSFaST` latent `NameError` on `groupionlists`**: it was + only initialised inside `if analysis_params.grpave:`, but referenced + unconditionally further down (the `groupionlists['cv'/'relfil'/'insource']` + writes and the groups-column loop) and inside the blank-filter block. + The GUI hardcodes `grpave = True` (`main.py:~1335`), so this never fired + in practice, but a loaded session or a test with `grpave=False` would + crash. Added a defensive `groupionlists = {}` next to `ionfilters = {}`. + Behaviour unchanged when `grpave=True` (`parsionlists` reassigns it). +- **Stray debug CSVs written to the current working directory** (which is + `code/` in the deployed app, per `run.bat`): `stats.py` wrote + `msdata_teststats_test.csv` (a debug-named file, never read back) and + `qdata.csv` (never read back; the canonical `-logq` goes into + `iondict.csv`), and `ordination.py` wrote `averagepca.csv` (an internal + collapse round-trip scratch file). None of the three is written to the cwd + any more: the two stats tables go into the run's output directory with the + input peak table's filename stem prepended (suffixes `_teststats.csv` and + `_qvalues.csv`), and `averagepca.csv` is written next to the input peak + table. The pre-existing leftover copies sitting untracked in `code/` + (`qdata.csv`, `msdata_teststats_test.csv`, `averagepca.csv`) are now + obsolete and safe to delete; they will no longer be regenerated there. +- **Dead code removal** (`stats.py` `groupave`): a per-injection + `variance_values`/`stddev_values` was computed but never used (the + technical/biological RSDs are derived from grouped means, not from it). + Removing it made the entire sum-of-squares accumulation chain dead too + (`sum_squares_list`/`sum_squares_chunk`/`all_sum_squares`/`sum_squares_df`), + so that's gone as well, a small but real per-chunk optimization (drops a + `(chunk ** 2).groupby(...).sum()` on every chunk of the formatted table). 
+ Validated by `tests/test_msfast_pipeline.py`, which runs the real + `groupave` against the bundled example dataset. Also dropped an unused + `from pathlib import Path` in `MSFaST.py`. + +### Findings NOT changed (need a decision or live-GUI validation) + +- **The "Or Groups" (`src`) control not being applied is intended, NOT a + bug** (confirmed by the developer, 2026-06-30). The groupset editor has + three lists — And (`listWidget_andgrps` -> `incl`, feature must be in all), + Exclude (`listWidget_allgrps` -> `excl`, feature must not be in any), and + Or (`listWidget_orgrps` -> `src`, the groups a feature is *allowed* to + appear in). `MSFaST.groupset.__init__` deliberately filters only on `incl` + and `excl`: a feature that already satisfies And/Exclude is a member of the + groupset, and `src` ("allowed in") by design doesn't further remove it, so + there's nothing for `src` to do at filter time. This matches the observed + behaviour. (Earlier in this review pass it was mis-flagged as an inert + control — that was wrong; leaving the note here so it isn't re-flagged.) +- **`mspwriter.convert_to_msp` num-peaks loop is fragile.** `for frags in + sources: numpeaks = len(frags)` overwrites rather than accumulates, and + assumes `sources` is a list-of-one-list. It happens to be correct for the + only live caller (the decon path, where `ionmerge.sources == [[frag,...]]`), + but would silently miscount if ever called on a `relationalfilter`-shaped + merge (flat list of id strings) — `len(frags)` would then be a string + length and the inner `for fragment in frags:` would iterate characters. + Left as-is (single caller, wrapped in try/except), but worth hardening if + the MSP writer is ever reused. +- ~~Docs repo-URL inconsistency~~ — **resolved.** Canonical repo is + `github.com/robertsamples/mpact` (confirmed by the developer); + `docs/installation.md`'s `git clone` line was corrected from `BalunasLab` to + `robertsamples` to match `mkdocs.yml`/`docs/index.md`. 
`docs/index.md`'s + stale "multivariate analysis (NMDS)" blurb was also updated to + "(PCA/NMDS/PLS-DA)". + +- **Two orphaned/broken scratch scripts in `code/`.** + `npatlassearch.py` reads `npatlas.csv` (the real file is `npatlas.tsv`) at + module top level and references an undefined `indigo`/`renderer`, so it + would crash if ever imported/run — but nothing imports it. + `masstdriver.py` is referenced only by a commented-out import in + `ui_functions.py`. Both are dead leftover dev scratch, not part of the + running app. Flagged rather than deleted (pre-existing files; the auto-mode + classifier has blocked deleting UI-adjacent files before). Safe to remove + once you confirm you don't want them as references. + +### Already-logged items re-confirmed still open (see backlog above) + +- `exportgnps()` duplicating `translators.reindex_fragments` matching logic. +- `iondict.csv` read-modify-write chain across `filter.py`/`stats.py`. +- `run_MSFaST`'s blank-filter `_formatted.csv` re-read (the "risky kind"). +- Lazy per-tab plot updates; generalizing the `ui_plot` subclasses. + +### Test-suite assessment + +The suite is well-targeted and not redundant — each test guards a specific +behaviour or a previously-fixed bug (the PLS-DA `scale=` regression, the +replicate-collapse structure, the dendrogram purity edge cases, the +end-to-end pipeline). No tests are recommended for removal. Gaps worth +filling when convenient (all Qt-free, so headless-testable): +- `translators.reindex_fragments` / `filter_source_peaktable` end-to-end on + the bundled MSP/MGF + peak tables (currently only smaller-unit coverage). +- `getfragdb.importfrag` format auto-detection (Progenesis vs MS-DIAL MSP). +- A `run_MSFaST` variant with `grpave=False`/minimal filters to lock in the + `groupionlists` defensive-init fix above. +- `stats.runfc`/`runttest` numeric outputs (FC clamping, q-value monotonicity) + against a tiny synthetic `iondict.csv`. 
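
To make the last gap in the list concrete, here is a minimal standalone sketch of the two numeric properties it names: fold-change clamping and a Benjamini-Hochberg step-up. It is an illustration under assumptions, not code from `stats.py`: the names `clamp_fold_change` and `bh_qvalues` are hypothetical, the step-up is the textbook procedure rather than a copy of `runttest`, and the [0.01, 100] bounds are the ones the pipeline test asserts.

```python
import math


def clamp_fold_change(fc, lo=0.01, hi=100.0):
    """Clamp a fold change into [lo, hi] (hypothetical helper, not stats.runfc)."""
    return max(lo, min(hi, fc))


def bh_qvalues(pvalues):
    """Benjamini-Hochberg q-values, returned in the input order.

    Textbook step-up: q_(i) = min over j >= i of p_(j) * m / j, with ranks
    taken after sorting p ascending. Assumed procedure, not runttest itself.
    """
    m = len(pvalues)
    # sorted() is stable, so tied p-values keep their input order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, carrying the cumulative minimum.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


pvals = [0.01, 0.04, 0.03, 0.20]
qvals = bh_qvalues(pvals)
neg_log_q = [-math.log10(q) for q in qvals]
```

In this sketch the cumulative minimum makes the q-values non-decreasing along the p-sorted processing order. With tied p-values a separate re-sort could order the ties differently, which is the subtlety the pipeline test's docstring calls out when it declines to assert strict monotonicity.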
+ +**Update (2026-06-30, second pass):** all four gaps above are now filled — +`tests/test_translators_e2e.py`, `tests/test_getfragdb.py`, +`tests/test_msfast_grpave_off.py`, `tests/test_stats_numeric.py`. Plus the +three new subsystems below ship with their own Qt-free tests +(`test_npatlasupdate.py`, `test_mpactupdate.py`, `test_crashreport.py`). + +## Future feature dev plan (post-review, 2026-06-30) + +Candidate features, ordered roughly by value-to-effort. None started; all +need the GUI runnable against real data to validate. Several already appear +in `main.py`'s TODO block — this is the triaged version. + +1. ~~Wire up the "Or Groups" groupset constraint~~ — **withdrawn**: not a + bug, the `src` "allowed in" semantics are intended (see finding above). +2. ~~Data-quality score / summary~~ — **partially done.** The score the TODO + asked for already existed (Reproducibility / Skewness / Overall, from the + AUC of the CV rarefaction curve), but the math was buried untested inside + `plotting.prev_cv.plot()`. Extracted verbatim into the Qt-free, unit-tested + `qualityscore.py` (`compute_cv_quality`), pinned against a copy of the + original by `tests/test_qualityscore.py` so no displayed number changed; + `prev_cv` is now a thin draw-the-result wrapper. (Also fixed a latent pandas + FutureWarning in the extracted percentage assignment.) Remaining/optional: + surface the score outside the CV tab (e.g. Data Review summary), and fold in + the other available signals (per-group RSDs from `_summarydata.csv`, the + dendrogram purity `n_pure/n_total`) if a richer composite is wanted -- both + are scientific-design calls to make with the lab, not coded blind. +3. **OPLS-DA ordination method** (next item after the PCA/NMDS/PLS-DA rework, + already deferred — see "Multivariate ordination plot"). Needs either the + unmaintained `pyopls` or a from-scratch OSC implementation plus a + reference dataset to validate against. +4. **Status-bar terminal/log viewer** (TODO). 
Replace the static status + strings with a live log line + an expandable full-output pane. Mostly a + Qt plumbing task (route the existing `print()` progress through a + `QPlainTextEdit`/signal); no scientific risk. +5. **Additional databases beyond NPAtlas** (TODO: HMDB etc.). `dbsearch.py` + is already a clean Qt-free ppm-window matcher taking an `atlas` DataFrame + — adding a second source is mostly a loader + a column-name adapter, and + the matching core is reusable as-is. +6. **`exportgnps()` migration onto `translators`** (backlog). Correctness + + maintenance win, not a new feature: replace the ~210-line hand-rolled + O(n·m) MGF matcher with the tested `reindex_fragments`/ + `filter_source_peaktable` path. +7. **Specificity/sensitivity & comparison-mode plots** (TODO, "likely items + that need more thought"). Larger scientific-design questions; needs spec + work with the lab before implementation. + +## New subsystems (2026-06-30, second pass) + +Three new Qt-free, unit-tested modules plus thin GUI wiring in `main.py`. The +cores are fully testable headlessly (network/git/dialog all injected); the +GUI wiring (`MainWindow._run_startup_checks`/`_check_atlas_freshness`/ +`_check_app_update` and the `__main__` crash-dialog) is the only part that +needs a live launch to verify. **No new hard dependencies** — all three use +only the stdlib (`urllib`, `json`, `subprocess`, `webbrowser`, `platform`) +plus `packaging` (already present, with a tuple-comparison fallback), so +`requirements.txt`/the portable build are unaffected. + +### NPAtlas auto-updater (`npatlasupdate.py`, `tests/test_npatlasupdate.py`) + +On startup (deferred via `QTimer.singleShot` so the window paints first), if +`npatlas.tsv` is missing or its mtime is > 30 days old, the user is asked +whether to re-download it from +`https://www.npatlas.org/static/downloads/NPAtlas_download.tsv`. 
The download +streams to a temp file, is **validated** (header must contain the columns the +app uses — `compound_id`/`compound_m_plus_h`/`compound_m_plus_na`/ +`compound_smiles`/`origin_type`/`genus`) and only then `os.replace`-d over the +existing file, so a server error page / partial transfer / network drop can +never clobber a working atlas. + +- **Format decision (asked: would changing format help?): no — stay on TSV.** + `main.py` reads the atlas with `pd.read_csv(sep='\t')` and `dbsearch` keys + off the specific columns above; the published `NPAtlas_download.tsv` already + has exactly those, so it's a drop-in. The `NPAtlas_download.json` is the + same data in a nested shape that would need flattening before pandas/dbsearch + could touch it — pure cost, no benefit. The `.json` URL is recorded in the + module (`DEFAULT_JSON_URL`) only for completeness. +- **Refactor evaluation (asked): minimal and not needed now.** `dbsearch.py` + is already the clean Qt-free matcher; the only related cleanup is that the + atlas read in `main.py:enumerate_inputs` (`pd.read_csv('npatlas.tsv', ...)`) + is hardcoded to that filename/cwd — the updater writes to the same path, so + no change required. If a second database is added later (HMDB etc., dev-plan + item 5), factor the atlas load + column-name mapping into a small loader then. +- **Threading caveat:** the 33 MB download currently runs on the main thread + behind a wait cursor. It's a user-confirmed, infrequent (>30-day-gated) + action so blocking briefly is acceptable, but moving it onto a `QThread` + worker (like `AnalysisWorker`) is the obvious future improvement — left out + here because GUI threading can't be verified headlessly. 
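
The download, validate, then replace sequence described for the NPAtlas updater can be sketched roughly as follows. This is an illustration under assumptions, not the contents of `npatlasupdate.py`: the names `REQUIRED_COLUMNS`, `header_ok`, and `replace_if_valid` are hypothetical, and the payload is passed in as bytes instead of being streamed from the network.

```python
import os
import tempfile

# Columns the notes say the app depends on.
REQUIRED_COLUMNS = {
    'compound_id', 'compound_m_plus_h', 'compound_m_plus_na',
    'compound_smiles', 'origin_type', 'genus',
}


def header_ok(first_line):
    """True if the tab-separated header carries every required column."""
    return REQUIRED_COLUMNS.issubset(first_line.rstrip('\r\n').split('\t'))


def replace_if_valid(dest, payload):
    """Write payload to a temp file beside dest, validate, then swap it in.

    Raises ValueError (leaving dest untouched) if the header is not valid.
    Hypothetical helper; the real updater's signature may differ.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest))
    fd, tmp_name = tempfile.mkstemp(prefix='.atlas_', suffix='.tmp', dir=dest_dir)
    try:
        with os.fdopen(fd, 'wb') as tmp:
            tmp.write(payload)
        first_line = payload.split(b'\n', 1)[0].decode('utf-8', errors='replace')
        if not header_ok(first_line):
            raise ValueError('downloaded file does not look like an atlas TSV')
        os.replace(tmp_name, dest)  # atomic when both paths share a filesystem
    finally:
        # After a successful replace the temp name no longer exists.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
    return len(payload)
```

Creating the temp file in the destination's own directory keeps both paths on one filesystem, which is what lets `os.replace` swap the file in a single step. The `finally` clause removes the temp file on every failure path, so a rejected download leaves nothing behind.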
+ +### MPACT self-update checker (`mpactupdate.py`, `tests/test_mpactupdate.py`) + +On startup, queries the GitHub Releases API for the configured repo +(`robertsamples/mpact` by default — Robert's fork), compares the latest +published release tag against the running version (`__version__`, kept in +`mpactupdate.py`; **keep it in sync with `main.py`'s `label_credits`** string, +currently `v1.00.01` -> `__version__ = '1.0.01'`), and if newer offers a +`git pull --ff-only` update (with a "please restart" prompt on success, or +opens the release page on failure). Version compare uses `packaging.version` +(PEP 440, numeric — so 2.10 > 2.9) with a dotted-int fallback; an unparseable +tag is treated as "not newer" (never nags). Every failure mode (offline, no +releases yet/404, malformed JSON, no git) is non-fatal and silent. + +- **Updater-framework evaluation (asked): no off-the-shelf framework.** The + standard option, `pyupdater`, targets *frozen* PyInstaller/cx_Freeze apps + and needs its own patch-server + signing setup — heavyweight for a tool run + from a git clone. For a source checkout the meaningful update is `git pull`, + and "is there a newer release" is one API call + a version compare, which is + all this module is. **Action needed from you:** tag releases on the fork + (e.g. `v1.0.1`) and bump `__version__` per release, or this finds nothing. +- For the *portable PyInstaller build* (no git), `apply_git_update` will fail + gracefully and the user is sent to the release page to download manually — + a real auto-updater for the frozen build is the `pyupdater`-shaped project + to consider only if/when that distribution channel matters. 
+ +### Crash / error reporter (`crashreport.py`, `tests/test_crashreport.py`) + +Installs a `sys.excepthook` (after `QApplication` exists) that, on any +unhandled exception: chains to the default hook (traceback still hits the +console), formats a full report (traceback + MPACT/Python/platform versions + +timestamp + optional context), writes it to a timestamped file under +`~/.mpact/crashlogs/`, and shows a dialog offering to open a **prefilled +GitHub issue** (title + fenced traceback body) in the browser. Nothing is sent +without the user clicking through. The excepthook is hardened to never raise. + +- **Crash-logger-framework evaluation (asked): Sentry is the off-the-shelf + option, deliberately not used.** `sentry-sdk` is built for hosted/web + services: it sends events to a Sentry project by default (silent cloud + egress — wrong default for a desktop research tool), needs a DSN/account + provisioned, and *still* needs a custom `before_send` hook + dialog to honour + "ask the user first." The local-log + prefilled-GitHub-issue flow gives the + maintainer the same thing (a complete traceback) with zero infrastructure + and no privacy surprise. If MPACT later ships to many non-technical users and + a central error feed becomes worthwhile, Sentry with `before_send` gating is + the documented upgrade path (noted in `crashreport.py`). +- **PyQt5 note to verify live:** PyQt5 routes unhandled exceptions raised + inside Qt slots through `sys.excepthook` (then may abort), so this should + catch most in-GUI crashes — but the exact abort-after-hook behaviour is + PyQt5-version-dependent and is the one thing to confirm by actually + triggering an error in the running app. + +### Dialog styling (`dialogs.py`, `tests/test_dialogs.py`) + +The three subsystems above all pop `QMessageBox` dialogs. 
On the live app these +first rendered **black-on-black** (an unstyled dark background with invisible +black text — confirmed from a user screenshot): a `QMessageBox` inherits the +app's dark look but ships no text/background colours of its own. +`dialogs.styled_message_box()` applies a stylesheet matching the GUI palette +(background `rgb(40,40,40)`, text `rgb(212,212,212)`, detailed-text `QTextEdit` +darkened too) so every app dialog is legible and on-theme. Kept in its own +module (not `main.py`) so the box construction is headless-testable via +offscreen Qt (`build_message_box` returns the box without the blocking +`exec_`); `main.py`'s atlas/update/crash prompts all route through it. + +Two follow-ups after the first attempt (from a second user screenshot): +- **Buttons stayed black-on-black/borderless.** The `QMessageBox QPushButton` + *descendant* selector did not take effect on the standard buttons even + though the box/label rules did. Fixed by styling each button object + directly (`for b in box.buttons(): b.setStyleSheet(...)`) with a clearly + visible border (`rgb(120,120,120)`) — selector-independent and reliable. +- **Native title bar was light + rounded** (Win11). `apply_dark_titlebar()` + sets the DWM window attributes (immersive dark mode `20`/`19`, corner + preference `33` = do-not-round) via `ctypes`/`dwmapi`, best-effort and + Windows-only (no-op elsewhere, all failures swallowed). Called from + `styled_message_box` after `winId()` realises the handle but before + `exec_()` (dark mode must be set pre-show). **Verify live on Win11** — this + is the part that can't be checked headlessly. + +## Performance pass (2026-06-30, measurement-driven) + +Profiled `run_MSFaST` on the bundled example dataset (cProfile + wall timing; +scratch scripts not committed) and benchmarked the algorithmic sections that +scale with feature/DB size. 
**Every change below was verified output-identical +against the original on real data, not just "looks equivalent"** — the bar the +user set ("functionally identical in terms of I/O"). + +Finding: on the small example the *pipeline* is dominated by pandas CSV +I/O (the inter-stage `iondict.csv`/`_formatted.csv` round-trips, ~0.6s of +to_csv + ~0.5s of read_csv out of ~2.3s), not by Python loops. That I/O chain +is the already-logged "bigger, multi-session" refactor (threading an `iondict` +DataFrame through `filter`/`stats`); left alone here as too invasive/risky for +this pass. The wins below are in the per-feature/per-DB-row algorithmic code, +which is what actually scales badly on large real datasets. + +- **`dbsearch.search_npatlas`: ~5x faster, output identical.** Was + O(features x atlas_rows): per feature it scanned all ~36k atlas rows twice + (once per adduct) with a full-DataFrame boolean mask, then `.copy()` + + `pd.concat` + `sort_values` + a scalar `.loc` write. Now pre-sorts the two + adduct-mass columns once and uses `np.searchsorted` to test only a tiny m/z + window per feature; the **exact original ppm test is re-applied to the + windowed candidates** so the matched set is bit-identical (the window + `mass*(1 ± 2·ppm/1e6)` is a proven superset of the true ppm window). Also: + build one DataFrame per feature from concatenated m+h/m+na positions (no + per-feature `pd.concat`), iterate numpy arrays instead of `iterrows`, and + assign the `hits` column once instead of 979 scalar `.loc` sets. Verified on + the real example (979 feats × 36,454 atlas rows): 1.41s → 0.28s, **0 hitdb + DataFrame mismatches** (incl. row order + `ppm` values) and an identical + `iondict['hits']` column. New edge-case tests in `test_dbsearch.py` + (ppm-sort across both adducts; a single atlas row matching both adducts + appearing twice). 
+- **`qualityscore.compute_cv_quality`: ~6.5x faster, output identical.** The + AUC-under-the-CV-curve step was a per-feature Python loop doing + `iondict.iloc[pos, :]['col']` scalar lookups (the classic slow pandas + pattern) over thousands of rows. Replaced with the vectorised equivalent + `np.sum(np.diff(cv, prepend=0) * pct)`. ~0.4s → ~0.06s per call (n≈5000). + The faithfulness test (`test_qualityscore.py`, which pins against a verbatim + copy of the original loop) confirms identical values; np.sum's pairwise + summation can differ from the sequential loop by <1 ULP, far below the + 0.1%-rounded display precision. +- **`stats.groupave`: dead sum-of-squares chain removed** (pass 1) — dropped a + `(chunk**2).groupby().sum()` per CSV chunk that only fed an unused variance. +- **`filter.relationalfilter`: measured, left alone.** Looks O(n²) but the + early `break` once past the max isotope window makes it O(n·k) with small k: + benchmarked at 0.017 / 0.077 / 0.371 s for 2k / 8k / 20k synthetic features + (near-linear). Not a bottleneck; its intricate ringing/dimer-band logic + isn't worth the regression risk to micro-optimize. +- **`filter.decon` / `stats.groupave` remaining cost is the per-stage CSV + round-trips**, i.e. the same I/O-chain refactor noted above — not addressed + here. + +## CI matrix failures fixed (2026-07-01) + +`.github/workflows/tests.yml` runs a 3 OS x 2 Python-version matrix +(ubuntu/windows/macos x 3.9/3.11) with an **unpinned** `pandas` install (only +`numpy<2` is pinned) — so different runners can resolve genuinely different +pandas versions, and a test can pass on one cell and fail on another for +reasons that have nothing to do with the OS or Python version per se. 
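One pattern that is sensitive to the resolved pandas version is writing non-integer values into an int64 column in place by position. A minimal sketch of the label-based alternative, which replaces the column outright and so does not rely on in-place dtype coercion; the helper name `overwrite_first_column` and the toy frame are hypothetical, not MPACT code:

```python
import numpy as np
import pandas as pd


def overwrite_first_column(df, values):
    """Return a copy of df with its first column replaced by values.

    Label-based assignment (df[col] = ...) swaps in a new column, so the
    result takes the dtype of `values` regardless of the old column's dtype.
    """
    out = df.copy()
    first = out.columns[0]
    out[first] = values
    return out


# Toy frame with an int64 first column, standing in for a real feature table.
df = pd.DataFrame(
    {"count": np.array([1, 2, 3], dtype="int64"), "name": ["a", "b", "c"]}
)
scaled = overwrite_first_column(df, np.array([0.5, 1.5, 2.5]))
```

The original frame is left untouched, which makes the helper easy to compare against a reference implementation in a test.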
+ +- **`test_qualityscore.py`: 4 failures on Python 3.11 cells (pandas resolved + to 3.0.x there), 0 on 3.9 (older pandas).** `_reference_inline()`'s + deliberately-preserved verbatim copy of the *original* pre-extraction code + used `.iloc[:, 0] = ` to overwrite an int64-dtype column — + exactly the pattern `qualityscore.py` itself was already fixed to avoid (see + "Performance pass" above). On pandas 2.x this was only a `FutureWarning`; + **on pandas 3.x it's a hard `TypeError`**, confirmed by reproducing both the + old and new patterns against a real pandas 3.0.3 install in an isolated + venv. Since the *algorithm* under test wasn't the issue (only a dtype- + mechanics detail of the reference copy, which would have made the original + app code itself crash on a fresh pandas 3.x install, not just this test), + fixed `_reference_inline()` to use the same label-based assignment + (`df[col] = ...`) as the production fix. Re-verified output-identical and + passing under pandas 3.0.3 (all 222 tests), not just inspected. +- **`test_dialogs.py::test_build_message_box_applies_style_and_content`: failed + on every macOS cell (both 3.9 and 3.11), passed on Windows/Ubuntu.** + `box.windowTitle()` reads back `''` after `setWindowTitle('Title here')` on + macOS specifically — Qt's Cocoa integration renders `QMessageBox` as a + native alert panel (no title bar, per Apple HIG) and doesn't retain the + `windowTitle` property for that widget type there, independent of any + styling. Not a `dialogs.py` defect: `build_message_box` still calls + `setWindowTitle` unconditionally (meaningful everywhere else, harmless on + macOS); the test's readback assertion is now gated on + `sys.platform != 'darwin'`. +- **Watch item, not fixed (no failing test, no coverage to verify a fix + against): `mzmineimport.py` has several `.iloc[:, N] = .iloc[:, M]`-style + column reassignments** (lines ~70-71, ~190-202) reading `header=None` CSVs. 
+ These are column-to-column copies within the same frame (not a computed- + float into a known-int column like the bug above), so lower risk, but + unpinned `pandas` in CI means a future resolve could expose the same class + of issue here too. No dedicated test file exists for `mzmineimport.py` + (format detection is covered via `translators.py`/`test_translators.py` + instead) — add coverage before touching this blind. diff --git a/docs/index.md b/docs/index.md index da22288..d89785f 100644 --- a/docs/index.md +++ b/docs/index.md @@ -6,8 +6,8 @@ table (Progenesis QI, MZmine, MS-DIAL, or Bruker Metaboscape), a sample list, and a metadata file, and turns them into a filtered, statistically annotated dataset with a full suite of interactive plots: data-quality review, group-level set/correlation analysis, hierarchical clustering, -multivariate analysis (NMDS), m/z-vs-RT and mass-defect views, volcano -plots, heatmaps, and per-feature spectral/database-match lookup. +multivariate analysis (PCA/NMDS/PLS-DA), m/z-vs-RT and mass-defect views, +volcano plots, heatmaps, and per-feature spectral/database-match lookup. This site covers installing and running MPACT, the file formats it expects, the analysis and filtering options, and what each plot/tab diff --git a/docs/installation.md b/docs/installation.md index 0ed9819..64508b1 100644 --- a/docs/installation.md +++ b/docs/installation.md @@ -15,7 +15,7 @@ scipy, NumPy) that ships with Anaconda's base environment. Either: - Download and unzip the repository (GitHub: **Code → Download ZIP**), or -- Clone it (`git clone https://github.com/BalunasLab/mpact.git`, or with +- Clone it (`git clone https://github.com/robertsamples/mpact.git`, or with GitHub Desktop: **Code → Open with GitHub Desktop**). 
It doesn't matter where you place the folder — MPACT's launcher script and diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md index d2cdef7..e004608 100644 --- a/docs/troubleshooting.md +++ b/docs/troubleshooting.md @@ -81,9 +81,11 @@ version (`plotting._is_duplicate_pick`). ## Still stuck? -Check `code/tests/` — the pure-logic modules (`filter`, `stats`, -`translators`, `groupsets`, `importdependencies`) have headless unit -tests you can run to rule out a logic bug: +Check `code/tests/` — the pure-logic modules (filtering, statistics, +import/export translators, groupsets, ordination, dendrogram purity, the +feature-search tree, and an end-to-end analysis-pipeline run on the bundled +example dataset) have headless unit tests you can run to rule out a logic +bug: ``` python -m pytest code/tests -q