[arrow] Fix incorrect index usage in ArrowFieldWriters.TimeWriter#6768
guluo2016 wants to merge 2 commits into apache:master
Conversation
@yuzelin @Zouxxyy @JingsongLi Can you review this PR when you have time? Thanks!
IntColumnVector timeVec =
        new IntColumnVector() {
            final int[] values = new int[] {0, 1000, 2000, 3000, 4000};
Current situation: Index mixing of this kind (using i to read the source data, or using row to check the Arrow vector) can easily recur across multiple field writers and lead to hidden bugs.
Recommendation: Define and strictly distinguish between sourceIndex (e.g., startIndex + i) and targetIndex (i) in the implementation.
Within the loop body of each writer, use sourceIndex both to check for null values and to read data from the ColumnVector, and use targetIndex to write data to the ArrowVector.
Example:
// Assume startIndex, batchRows, columnVector, and timeMilliVector are known.
for (int i = 0; i < batchRows; i++) {
    int sourceIndex = startIndex + i; // index to read from columnVector
    int targetIndex = i;              // position to write in the Arrow vector
    if (columnVector.isNullAt(sourceIndex)) {
        timeMilliVector.setNull(targetIndex);
    } else {
        int value = ((IntColumnVector) columnVector).getInt(sourceIndex);
        timeMilliVector.setSafe(targetIndex, value);
    }
}
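To make the failure mode concrete, here is a minimal, dependency-free sketch that replaces the ColumnVector and the Arrow vector with plain int arrays. The class and method names (IndexMixingSketch, writeBatch, writeBatchMixed) are hypothetical and do not exist in Paimon; the sketch only illustrates the "using i to read source data" kind of mix-up and is not the actual TimeWriter code. With startIndex = 0 both variants agree, which is presumably why this class of bug can stay hidden until a batch starts at a non-zero offset.

```java
import java.util.Arrays;

public class IndexMixingSketch {

    // Hypothetical stand-in for a field writer: copies batchRows values
    // starting at source[startIndex] into target positions 0..batchRows-1.
    static int[] writeBatch(int[] source, int startIndex, int batchRows) {
        int[] target = new int[batchRows];
        for (int i = 0; i < batchRows; i++) {
            int sourceIndex = startIndex + i; // index into the source column
            int targetIndex = i;              // position in the output vector
            target[targetIndex] = source[sourceIndex];
        }
        return target;
    }

    // Buggy variant for comparison: reads the source with the target index,
    // so startIndex is silently ignored.
    static int[] writeBatchMixed(int[] source, int startIndex, int batchRows) {
        int[] target = new int[batchRows];
        for (int i = 0; i < batchRows; i++) {
            target[i] = source[i];
        }
        return target;
    }

    public static void main(String[] args) {
        int[] values = {0, 1000, 2000, 3000, 4000};
        // Write 3 rows starting at source row 2.
        System.out.println(Arrays.toString(writeBatch(values, 2, 3)));      // [2000, 3000, 4000]
        System.out.println(Arrays.toString(writeBatchMixed(values, 2, 3))); // [0, 1000, 2000]
    }
}
```

A test along these lines, with a non-zero start offset, should distinguish the two variants regardless of which writer is under test.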
I left ArrowFieldWriters.java unmodified to keep it consistent with the rest of the code.
I updated the test case to improve readability. Thanks for the review!
Purpose
Linked issue: close #6767
Tests
API and Format
Documentation