fix: update stream compatibility tests and improve result validation #33961

Tony2h · 2025-12-17T09:29:35Z

Description

Issue(s)

Close/close/Fix/fix/Resolve/resolve: Issue Link

Checklist

Please check the items in the checklist if applicable.

Is the user manual updated?
Are the test cases passed and automated?
Is there no significant decrease in test coverage?

gemini-code-assist · 2025-12-17T09:29:50Z

Summary of Changes

Hello @Tony2h, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the reliability and accuracy of stream compatibility tests by introducing a more sophisticated validation mechanism. Instead of merely checking row counts, the tests now directly compare processed stream data with expected outcomes. Additionally, the changes refactor test utility functions for better flexibility and simplify the test environment setup by removing unnecessary taosadapter configurations.

Highlights

Enhanced Stream Result Validation: Replaced basic row count checks with a new, more robust checkStreamResults method that compares actual stream output data against expected data, ensuring more accurate validation of stream processing.
Refactored Row Count Check: The original checkStreamResults method was renamed to checkStreamResultRows to specifically handle row count assertions, improving clarity and separation of concerns.
Improved Test Utility Flexibility: The checkStreamStatus method now accepts a library_path argument, allowing for more flexible environment configurations and reusability.
Simplified Test Environment Setup: Removed taosadapter-related setup and configuration from the installTaosd function, streamlining test initialization.
Query Alignment for Comparison: Modified the expected query (exp_query) in verifyDataOnCurrentVersion to alias window start/end times as ts and te for consistent result comparison with the actual query output.
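To make the difference between the two checks concrete, here is a minimal sketch. The row values and both helper names are hypothetical and are not taken from the PR; they only illustrate why equal row counts do not imply equal data.

```python
# Hypothetical result sets: same number of rows, but one aggregated value differs.
actual_rows = [
    ("2025-01-01 00:00:00.000", "2025-01-01 00:00:10.000", 5),
    ("2025-01-01 00:00:10.000", "2025-01-01 00:00:20.000", 7),
]
expected_rows = [
    ("2025-01-01 00:00:00.000", "2025-01-01 00:00:10.000", 5),
    ("2025-01-01 00:00:10.000", "2025-01-01 00:00:20.000", 8),
]


def row_count_matches(actual, expected):
    """Weak check: only the number of rows is compared."""
    return len(actual) == len(expected)


def data_matches(actual, expected):
    """Strict check: every row must be equal, in the same order."""
    return actual == expected


print(row_count_matches(actual_rows, expected_rows))  # True
print(data_matches(actual_rows, expected_rows))       # False
```

A row-count check would accept this stream output, while a data comparison would flag the second window.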

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request updates the stream compatibility tests by refactoring how stream results are checked, introducing a more powerful checkStreamResults method that compares actual query results against expected results. The changes also improve logging and remove obsolete code. My feedback focuses on improving the robustness and maintainability of the new test logic. I've identified a potential bug in the retry logic and suggested a refactoring to make it safer and cleaner. I also noted a minor duplication of a shell command.

I am having trouble creating individual review comments, so my feedback is collected below.

test/cases/18-StreamProcessing/23-Compatibility/test_new_stream_compatibility.py (294-339)

This new method is a great addition for more robust result validation. However, there are a couple of areas for improvement:

Potential UnboundLocalError: If retry_times is 0, the for loop will not execute, and result1 and result2 will not be defined, leading to an UnboundLocalError on line 327. This is a potential bug.
Code Duplication: The data extraction and logging calls are repeated after the loop. This can be avoided to improve maintainability.
Fragile Parsing: The extract_data function uses line.startswith('=' * 10) to find the data section. This is brittle and will break if the number of = characters in the taos client output changes. Using a regular expression would be more robust.
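The first point can be reproduced in isolation. The following is a minimal sketch with hypothetical function names, not the test's actual code: a name assigned only inside a `for` loop is unbound when the loop runs zero times.

```python
def last_attempt(retry_times):
    """Mimics a retry loop whose result is only assigned inside the loop."""
    for attempt in range(retry_times):
        result = f"attempt {attempt}"
    # With retry_times == 0 the loop body never runs, so `result` is unbound here.
    return result


def last_attempt_safe(retry_times):
    """Same loop, but the name is bound before the loop starts."""
    result = None
    for attempt in range(retry_times):
        result = f"attempt {attempt}"
    return result


try:
    last_attempt(0)
except UnboundLocalError as exc:
    print(f"UnboundLocalError: {exc}")

print(last_attempt_safe(0))  # None
```

Binding the name before the loop lets the caller test for `None` explicitly instead of relying on the loop having run at least once.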

Here is a suggested refactoring that addresses these points by fixing the potential bug, removing duplication, and making the parsing more robust.

    # Assumes `re`, `subprocess`, and `time` are imported at module level.
    def checkStreamResults(self, stream: StreamItem, library_path="/usr/lib", retry_times=300):
        command1 = f"LD_LIBRARY_PATH={library_path} taos -s '{stream.res_query}'"
        command2 = f"LD_LIBRARY_PATH={library_path} taos -s '{stream.exp_query}'"
        
        def extract_data(output):
            lines = output.strip().split('\n')
            data_lines = []
            start_collecting = False
            for line in lines:
                if re.match(r'^=+$', line.strip()):  # More robust separator check
                    start_collecting = True
                    continue
                if start_collecting:
                    if line.startswith('Query OK'):
                        break
                    data_lines.append(line.strip())
            return data_lines

        # Bind both names up front so the post-loop checks are safe
        # even when retry_times <= 0 and the loop body never runs.
        result1 = result2 = None
        for _ in range(retry_times):
            result1 = subprocess.run(command1, shell=True, text=True, capture_output=True)
            result2 = subprocess.run(command2, shell=True, text=True, capture_output=True)
            if result1.returncode == 0 and result2.returncode == 0:
                res_data = extract_data(result1.stdout)
                exp_data = extract_data(result2.stdout)

                tdLog.info(f"Stream {stream.id} result data: {res_data}")
                tdLog.info(f"Stream {stream.id} expect data: {exp_data}")

                if res_data == exp_data:
                    tdLog.info(f"Stream {stream.id} result matches expected data.")
                    return
            time.sleep(1)

        if result1 is None or result2 is None:
            raise Exception(f"Stream result check for stream {stream.id} did not run; retry_times must be greater than 0.")

        if result1.returncode != 0 or result2.returncode != 0:
            tdLog.error("Stream result retrieval failed.")
            tdLog.error(f"Error1:\n{result1.stderr}")
            tdLog.error(f"Error2:\n{result2.stderr}")
            raise Exception("Stream result retrieval failed.")

        # Re-extract data from last attempt for the assert message
        res_data = extract_data(result1.stdout)
        exp_data = extract_data(result2.stdout)

        assert res_data == exp_data, f"Stream {stream.id} result mismatch! \nResult: {res_data}\nExpect: {exp_data}"
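As a rough check of the separator handling, the parsing helper can be exercised on its own. This sketch copies `extract_data` from the suggestion above; the sample client output, the table name `res_tb`, and the `sample_output` helper are assumptions made for illustration, and the real taos client layout may differ.

```python
import re


def extract_data(output):
    """Return the data rows between the '=' separator and the 'Query OK' line."""
    data_lines = []
    start_collecting = False
    for line in output.strip().split("\n"):
        if re.match(r"^=+$", line.strip()):  # any run of '=' counts as the separator
            start_collecting = True
            continue
        if start_collecting:
            if line.startswith("Query OK"):
                break
            data_lines.append(line.strip())
    return data_lines


def sample_output(separator_width):
    """Build assumed client output with a separator of the given width."""
    return "\n".join([
        "taos> select ts, te, cnt from res_tb;",
        "           ts            |           te            |  cnt  |",
        "=" * separator_width,
        " 2025-01-01 00:00:00.000 | 2025-01-01 00:00:10.000 |     5 |",
        "Query OK, 1 row(s) in set (0.002s)",
    ])


wide = extract_data(sample_output(60))
narrow = extract_data(sample_output(5))  # shorter than the 10 '=' that startswith('=' * 10) needs
print(wide)
print(wide == narrow)
```

With the regular expression, the extracted rows do not depend on the separator width, whereas a `startswith('=' * 10)` check would skip the five-character separator and return no rows.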

test/cases/18-StreamProcessing/23-Compatibility/test_new_stream_compatibility.py (249-250)

The command to start taosd is constructed twice, once for logging and once for execution with os.system. This is slightly inefficient and could lead to inconsistencies if one is updated and the other is forgotten. It's better to define the command in a variable and reuse it.

        command = f"rm -rf {dataPath}/* && nohup /usr/bin/taosd -c {cPath} &"
        tdLog.info(f"start taosd: {command}")
        os.system(command)

Copilot

Pull request overview

This pull request updates stream compatibility tests to improve result validation and refactors the test infrastructure. The main focus is replacing manual stream result checking with a more robust validation approach that compares actual stream results against expected query results.

Refactored result validation to use a new checkStreamResults method that compares full data output rather than just row counts
Updated library path handling to use build-specific paths instead of hardcoded /usr/lib
Simplified process management by replacing framework calls with direct system commands
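One possible way to pass a build-specific library path is through the process environment rather than a shell string. This is a sketch under assumptions: `build_taos_invocation` is a hypothetical helper, `/path/to/build/lib` is a placeholder, and only the `taos -s` usage comes from the review above.

```python
import os
import shlex


def build_taos_invocation(query, library_path="/usr/lib"):
    """Return (args, env) for running one query with the taos client.

    Hypothetical helper: the query is passed as its own argument, so quotes
    inside it need no shell escaping, and LD_LIBRARY_PATH is set via env.
    """
    env = dict(os.environ)
    env["LD_LIBRARY_PATH"] = library_path
    args = ["taos", "-s", query]
    return args, env


args, env = build_taos_invocation(
    "select * from db.tb where name = 'a'",  # illustrative query with single quotes
    library_path="/path/to/build/lib",       # placeholder for a build-specific path
)
print(shlex.join(args))
print(env["LD_LIBRARY_PATH"])
# To execute: subprocess.run(args, env=env, text=True, capture_output=True)
```

Keeping the path in a parameter means the same check can run against an installed client or a freshly built one without editing the command string.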

Copilot · 2025-12-17T09:36:22Z