Dear Tractor-Mix team,
Firstly, thank you for developing such a great tool! I'm looking forward to implementing it in my current dataset. I have a query about extract_tracts.py. When testing it on chr22 of my dataset, I get warning messages that certain positions are not in msp windows. However, I've checked my VCF and these are in msp windows - they just all seem to be the end position of the window. Unless gnomix windows are not end-inclusive, I think this is a bug.
For example, extract_tracts.py gives me the following:
INFO (main 96): # Running script version : 1.2.0
INFO (main 97): # VCF File : admix_chr22_qc.vcf.gz
INFO (main 98): # Prefix of output file names : admix_chr22_qc
INFO (main 99): # VCF File is compressed? : True
INFO (main 100): # Number of Ancestries in VCF : 2
INFO (main 101): # Output Directory : None
INFO (main 109): Creating output files for 2 ancestries
INFO (main 124): Iterating through VCF file
INFO (main 183): VCF position, 11933821 is not in an msp window, skipping site
INFO (main 183): VCF position, 12814864 is not in an msp window, skipping site
INFO (main 183): VCF position, 15860257 is not in an msp window, skipping site
INFO (main 183): VCF position, 16064423 is not in an msp window, skipping site
INFO (main 183): VCF position, 16125206 is not in an msp window, skipping site
INFO (main 183): VCF position, 16167211 is not in an msp window, skipping site
However, !cat admix_chr22_full_results_qc.msp | cut -f 1-3 | head gives:
#chm spos epos
22 10580920 11933821
22 11934161 12814864
22 15166065 15860257
22 15862268 16064423
22 16064594 16125206
22 16125370 16167211
22 16167225 16195316
22 16195357 16230311
22 16230346 16262902
All six flagged positions are the ends of the first six windows (I stopped the script early, but this was true for more windows when I let it run). Grepping for the positions in the dosage output files also shows that they are definitely excluded from the output files.
I was able to solve this in line 94 of def extract_tracts using the following:
while not (row[0] == window[0] and (window[1] <= pos <= window[2])): instead of
while not (row[0] == window[0] and (window[1] <= pos < window[2])):.
I've confirmed that these positions now appear in the dosage output file.
Dear Tractor-Mix team,
Firstly, thank you for developing such a great tool! I'm looking forward to implementing it in my current dataset. I have a query about extract_tracts.py. When testing it on chr22 of my dataset, I get warning messages that certain positions are not in msp windows. However, I've checked my VCF and these are in msp windows - they just all seem to be the end position of the window. Unless gnomix windows are not end-inclusive, I think this is a bug.
For example, extract_tracts.py gives me the following:
INFO (main 96): # Running script version : 1.2.0
INFO (main 97): # VCF File : admix_chr22_qc.vcf.gz
INFO (main 98): # Prefix of output file names : admix_chr22_qc
INFO (main 99): # VCF File is compressed? : True
INFO (main 100): # Number of Ancestries in VCF : 2
INFO (main 101): # Output Directory : None
INFO (main 109): Creating output files for 2 ancestries
INFO (main 124): Iterating through VCF file
INFO (main 183): VCF position, 11933821 is not in an msp window, skipping site
INFO (main 183): VCF position, 12814864 is not in an msp window, skipping site
INFO (main 183): VCF position, 15860257 is not in an msp window, skipping site
INFO (main 183): VCF position, 16064423 is not in an msp window, skipping site
INFO (main 183): VCF position, 16125206 is not in an msp window, skipping site
INFO (main 183): VCF position, 16167211 is not in an msp window, skipping site
However,
!cat admix_chr22_full_results_qc.msp | cut -f 1-3 | headgives:#chm spos epos
22 10580920 11933821
22 11934161 12814864
22 15166065 15860257
22 15862268 16064423
22 16064594 16125206
22 16125370 16167211
22 16167225 16195316
22 16195357 16230311
22 16230346 16262902
All six flagged positions are the ends of the first six windows (I stopped the script early, but this was true for more windows when I let it run). Grepping for the positions in the dosage output files also shows that they are definitely excluded from the output files.
I was able to solve this in line 94 of def extract_tracts using the following:
while not (row[0] == window[0] and (window[1] <= pos <= window[2])):instead ofwhile not (row[0] == window[0] and (window[1] <= pos < window[2])):.I've confirmed that these positions now appear in the dosage output file.