Skip to content

refactor: Refactor CompileParseRules.cc for type safety, modern C++, and documentation#13196

Open
grahamsedman wants to merge 6 commits into
apache:masterfrom
grahamsedman:fix/src-tscore-compile-parse-rules-errors
Open

refactor: Refactor CompileParseRules.cc for type safety, modern C++, and documentation#13196
grahamsedman wants to merge 6 commits into
apache:masterfrom
grahamsedman:fix/src-tscore-compile-parse-rules-errors

Conversation

@grahamsedman
Copy link
Copy Markdown

@grahamsedman grahamsedman commented May 23, 2026

📌 Summary

This PR refactors src/tscore/CompileParseRules.cc to address compilation errors, type safety issues, modern C++ best practices, and documentation gaps. The changes ensure compatibility across 32-bit and 64-bit targets, fix narrowing conversion warnings, and improve code clarity and maintainability.


✅ Problems Addressed

Problem Root Cause Solution
Narrowing conversion errors char is signed on x86, causing negative values (e.g., -128) for ASCII >127, which cannot be narrowed to unsigned char on 32-bit targets. Use uint8_t for character arrays and explicitly cast to unsigned when writing to files.
Unused header #include "tscore/ink_string.h" is included but never used. Removed the unused header.
Poor header organization Headers were split into two sections, reducing readability. Consolidated all headers at the top of the file.
Missing documentation No Doxygen comments for the file, functions, or variables. Added comprehensive Doxygen documentation.
Non-thread-safe uint_to_binary The function used a static char buf[33], making it unsafe for concurrent use. Refactored to return std::string (RAII-managed, thread-safe).
C-style I/O Used FILE* and fprintf, which are less type-safe and require manual resource management. Replaced with std::ofstream (RAII-based, type-safe).
Poor Doxygen practice @section license causes warnings due to duplicate labels across files. Removed @section and kept the license text directly in the file header.
Unused macro #define COMPILE_PARSE_RULES is defined but never used. Removed the macro.
Outdated data types Used unsigned int and char instead of fixed-width types (uint8_t, uint32_t). Replaced with <cstdint> types for portability and clarity.

🔧 Key Changes

1. File Header & Includes

  • Removed: #define COMPILE_PARSE_RULES, #include "tscore/ink_string.h".
  • Added: <cstdint>, <fstream>, <iomanip>, <string>.
  • Refactored: Consolidated all includes at the top of the file.
  • Documentation: Added a detailed Doxygen file header explaining the purpose, generated files, and modern C++ features used.

2. Data Types

  • Bitmask arrays: Changed from unsigned int to uint32_t for explicit size and future-proofing.
  • Character arrays: Changed from char to uint8_t to guarantee an unsigned range (0-255) and avoid sign-extension issues.
  • Loop variables: Changed from int to uint16_t or uint8_t for semantic clarity.

3. uint_to_binary Function

  • Before: Used a static char buf[33] (non-thread-safe).
  • After: Returns std::string (thread-safe, RAII-managed).
  • Parameter: Changed from unsigned int to uint32_t for explicit size.

4. File I/O

  • Before: Used FILE* and fprintf (C-style, manual resource management).
  • After: Uses std::ofstream (RAII-based, type-safe, modern C++).
  • Format specifiers: Replaced %d with static_cast<unsigned> to avoid sign-extension issues when printing uint8_t values.

5. Doxygen Documentation

  • Added detailed comments for:
    • File purpose and generated outputs.
    • Each global array (parseRulesCType, tparseRulesCType, etc.).
    • The uint_to_binary function.
    • The main function (including its steps and classification functions).

🧪 Testing

  • Compilation: Verified on x86 (signed char) and ARM (unsigned char) targets.
  • Output: Confirmed that generated files (ParseRulesCType, ParseRulesCTypeToUpper, ParseRulesCTypeToLower) contain correct values (no negative numbers for ASCII >127).
  • Thread Safety: The refactored uint_to_binary is now thread-safe.
  • Standards: Meets C++ 20 / C++ 23 and Mozilla Mozilla style guide.

- Replace C-style file I/O (fopen, fprintf, fclose) with std::ofstream
for RAII-based file handling
- Use fixed-width integer types (uint32_t, uint8_t) instead of unsigned
int and char for portability and clarity
- Refactor uint_to_binary function to use std::string instead of static
buffer for thread safety
- Add comprehensive Doxygen documentation for the file, functions, and
arrays
- Remove @section license License from file header to prevent Doxygen
warnings about multiple use of section label, as @section is intended
for major structured documentation sections, not repetitive boilerplate
- Remove obsolete COMPILE_PARSE_RULES macro and ink_string.h dependency
- Improve output formatting using std::setw, std::setfill, and std::hex
for consistent alignment
- Replace int loop variables with uint16_t for better type safety
- Add static_cast for explicit type conversions
@grahamsedman grahamsedman changed the title feat: Refactor CompileParseRules.cc for type safety, modern C++, and documentation refactor: Refactor CompileParseRules.cc for type safety, modern C++, and documentation May 23, 2026
@grahamsedman grahamsedman force-pushed the fix/src-tscore-compile-parse-rules-errors branch 7 times, most recently from 282d691 to fbd054e Compare May 25, 2026 00:33
- Replace `1` with `1U` in bit shift operation to prevent undefined
behaviour when shifting into sign bit of signed integer
- Shifting a signed integer (e.g., `1 << 31`) into its sign bit invokes
undefined behaviour per the C++ standard
- Using an unsigned literal (`1U`) ensures well-defined behaviour for
all shift amounts
…d behaviour

- Change `char cc` to `unsigned char cc` to ensure consistent handling
  of byte values (0-255)
- `char` can be signed or unsigned depending on the platform, causing
  implementation-defined behaviour for values > 127
- Using `unsigned char` guarantees correct interpretation of all byte
  values
- Rename functions to PascalCase (uint_to_binary → UintToBinary)
- Add `g` prefix to global variables (example: parseRulesCType →
gParseRulesCType)
- Rename ParseRules classification functions and constants to PascalCase
(is_* → Is*, *_BIT → IS_*)
- Add file existence checks before writing with error messages to stderr
- Add `#include <iostream>` for std::cerr
- Rename `fp` to `outputFile` for improved clarity
- Change `char` to `unsigned char` for byte value handling
- Update all Doxygen comments to reflect new naming conventions
- Replace `index` with `i` to match loop variable
- Replace `currentChar` with `cc` to match variable declaration
- Use PascalCase for ParseRules functions (ink_tolower → InkTolower,
ink_toupper → InkToupper)
- Use PascalCase for UintToBinary function
- Replace `fp` with `outputFile` in last file writing block for
consistency
@grahamsedman grahamsedman force-pushed the fix/src-tscore-compile-parse-rules-errors branch from fbd054e to 7a47a4f Compare May 25, 2026 09:56
- Split file header into separate C-style comment block for license and
Doxygen block for documentation
- Required to pass Apache Release Audit Tool (Rat)
- Fix formatting: indent Apache license URL for consistency
- Fix formatting: remove extra space before "Fixed-width integer types"
@grahamsedman grahamsedman force-pushed the fix/src-tscore-compile-parse-rules-errors branch from 7a47a4f to 2ce2079 Compare May 25, 2026 10:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant