
VB6Parse

A complete, high-performance parser library for Visual Basic 6 code and project files.

Crates.io Documentation License: MIT

Project Documentation & Resources
VB6 Library Reference
Code Coverage Report
Performance Benchmarks

Overview

VB6Parse is designed as a foundational library for tools that analyze, convert, or process Visual Basic 6 code. While capable of supporting real-time syntax highlighting and language servers, its primary focus is on offline analysis, legacy code utilities, and migration tools.

Key Features:

  • Fast, efficient parsing with minimal allocations
  • Full support for VB6 project files, modules, classes, forms, and resources
  • Concrete Syntax Tree (CST) with complete source fidelity
  • 160+ built-in VB6 library functions and 42 statements
  • Comprehensive error handling with detailed failure information
  • Zero-copy tokenization and streaming parsing

Quick Start

Add VB6Parse to your Cargo.toml:

[dependencies]
vb6parse = "0.5.1"

Parse a VB6 Project File

use vb6parse::*;

let input = r#"Type=Exe
Reference=*\G{00020430-0000-0000-C000-000000000046}#2.0#0#...\stdole2.tlb#OLE Automation
Module=Module1; Module1.bas
Form=Form1.frm
"#;

// Wrap the in-memory string as a source file (for raw Windows-1252 bytes, see Character Encoding)
let source = SourceFile::from_string("Project1.vbp", input);

// Parse the project
let result = ProjectFile::parse(&source);

// Handle results
let (project, failures) = result.unpack();

if let Some(project) = project {
    println!("Project type: {:?}", project.project_type);
    println!("Modules: {}", project.modules().count());
    println!("Forms: {}", project.forms().count());
}

// Print any parsing errors
for failure in failures {
    failure.print();
}
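
To make the shape of the input above more concrete, here is a minimal, std-only sketch of splitting project-file text into `Key=Value` entries. It is not VB6Parse's implementation: `Entry`, `split_entries`, and `split_module` are hypothetical names introduced only for this illustration, and the sketch assumes nothing beyond the line format visible in the sample (`Type=Exe`, `Module=Module1; Module1.bas`).

```rust
/// One `Key=Value` line from a project file, borrowed from the input text.
/// (Hypothetical type for illustration; not part of the vb6parse API.)
#[derive(Debug, PartialEq)]
pub struct Entry<'a> {
    pub key: &'a str,
    pub value: &'a str,
}

/// Splits project-file text into `Key=Value` entries.
/// Blank lines are skipped; lines without `=` are reported by 1-based
/// line number instead of aborting the whole pass.
pub fn split_entries(input: &str) -> (Vec<Entry<'_>>, Vec<usize>) {
    let mut entries = Vec::new();
    let mut bad_lines = Vec::new();
    for (index, line) in input.lines().enumerate() {
        let line = line.trim();
        if line.is_empty() {
            continue;
        }
        // Split on the first `=` only, so values may contain `=` themselves.
        match line.split_once('=') {
            Some((key, value)) => entries.push(Entry {
                key: key.trim(),
                value: value.trim(),
            }),
            None => bad_lines.push(index + 1),
        }
    }
    (entries, bad_lines)
}

/// For a `Module=Name; Path` value, returns `(name, path)`.
pub fn split_module(value: &str) -> Option<(&str, &str)> {
    let (name, path) = value.split_once(';')?;
    Some((name.trim(), path.trim()))
}
```

Calling `split_entries` on the sample text and then `split_module` on the `Module` entry's value would yield the module name and its file path as borrowed slices. The real parser does considerably more (typed properties, references, detailed failures), so treat this only as a picture of the file layout.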

Parse a VB6 Module

use vb6parse::*;

let code = r#"Attribute VB_Name = "MyModule"
Public Sub HelloWorld()
    MsgBox "Hello, World!"
End Sub
"#;

let source = SourceFile::from_string("MyModule.bas", code);
let result = ModuleFile::parse(&source);

let (module, failures) = result.unpack();
if let Some(module) = module {
    println!("Module name: {}", module.name);
}

Tokenize VB6 Code

use vb6parse::*;
let mut source_stream = SourceStream::new("test.bas", "Dim x As Integer");
let (token_stream, _failures) = tokenize(&mut source_stream).unpack();

if let Some(tokens) = token_stream {
    for (text, token) in tokens {
        println!("{:?}: {:?}", text, token);
    }
}

Parse to Concrete Syntax Tree

use vb6parse::*;

let contents = "Sub Test()\n    x = 5\nEnd Sub";

let (tree_opt, _failures) = ConcreteSyntaxTree::from_text("test.bas", contents).unpack();

// Print the tree
if let Some(tree) = tree_opt {
    println!("{}", tree.debug_tree());
}

Navigating the CST

The CST provides rich navigation capabilities for traversing and querying the tree structure:

use vb6parse::*;
use vb6parse::parsers::SyntaxKind;

let source = "Sub Test()\n    Dim x As Integer\n    x = 42\nEnd Sub";
let cst = ConcreteSyntaxTree::from_text("test.bas", source).unwrap();
let root = cst.to_root_node();

// Basic navigation
let child_count = root.child_count();
let first = root.first_child();

// Find by kind
let sub_stmt = root.find(SyntaxKind::SubStatement); // First match
let all_dims = root.find_all(SyntaxKind::DimStatement); // All matches

// Filter children
let non_tokens: Vec<_> = root.non_token_children().collect();
let significant: Vec<_> = root.significant_children().collect();

// Custom search with predicates
let keywords = root.find_all_if(|n| n.kind().to_string().ends_with("Keyword"));
let complex = root.find_all_if(|n| !n.is_token() && n.children().len() > 5);

// Iterate all nodes depth-first
for node in root.descendants() {
    if node.is_significant() {
        println!("{:?}: {}", node.kind(), node.text());
    }

    // Convenience checkers
    if node.is_comment() || node.is_whitespace() {
        // Skip trivia
    }
}

Available Navigation Methods:

Both ConcreteSyntaxTree and CstNode provide:

  • Basic: child_count(), first_child(), last_child(), child_at()
  • By Kind: children_by_kind(), first_child_by_kind(), contains_kind()
  • Recursive: find(), find_all()
  • Filtering: non_token_children(), token_children(), significant_children()
  • Predicates: find_if(), find_all_if()
  • Traversal: descendants(), depth_first_iter()

CstNode also provides: is_whitespace(), is_newline(), is_comment(), is_trivia(), is_significant()

See also: examples/cst_navigation.rs for comprehensive examples.
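
For readers who want a mental model of what `descendants()`, `find()`, and `find_all()` do, the following std-only sketch implements a depth-first (pre-order) walk over a tiny tree. The `Kind` and `Node` types here are hypothetical stand-ins written for this illustration; they are not the library's `SyntaxKind` or `CstNode`, and the real methods may return iterators rather than `Vec`s.

```rust
/// Hypothetical node kinds for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Kind {
    Root,
    SubStatement,
    DimStatement,
    Identifier,
    Whitespace,
    Newline,
}

/// A minimal tree node: a kind plus owned children.
#[derive(Debug)]
pub struct Node {
    pub kind: Kind,
    pub children: Vec<Node>,
}

impl Node {
    pub fn leaf(kind: Kind) -> Self {
        Node { kind, children: Vec::new() }
    }

    pub fn branch(kind: Kind, children: Vec<Node>) -> Self {
        Node { kind, children }
    }

    /// True for whitespace and newline nodes.
    pub fn is_trivia(&self) -> bool {
        matches!(self.kind, Kind::Whitespace | Kind::Newline)
    }

    /// All nodes below `self` in pre-order (parent before children,
    /// children left to right). `self` is not included.
    pub fn descendants(&self) -> Vec<&Node> {
        let mut out = Vec::new();
        // Push children in reverse so the first child is popped first.
        let mut stack: Vec<&Node> = self.children.iter().rev().collect();
        while let Some(node) = stack.pop() {
            out.push(node);
            stack.extend(node.children.iter().rev());
        }
        out
    }

    /// Every descendant of the given kind, in pre-order.
    pub fn find_all(&self, kind: Kind) -> Vec<&Node> {
        self.descendants().into_iter().filter(|n| n.kind == kind).collect()
    }

    /// The first descendant of the given kind, if any.
    pub fn find(&self, kind: Kind) -> Option<&Node> {
        self.descendants().into_iter().find(|n| n.kind == kind)
    }
}
```

Filtering the result of `descendants()` with `!n.is_trivia()` gives the same kind of "significant nodes only" view that the navigation list above describes.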

API Surface

Top-Level Imports

For common use cases, import everything with:

use vb6parse::*;

This brings in:

  • I/O Layer: SourceFile, SourceStream
  • Lexer: tokenize(), Token, TokenStream
  • File Parsers: ProjectFile, ClassFile, ModuleFile, FormFile, FormResourceFile
  • Syntax Parsers: parse(), ConcreteSyntaxTree, SyntaxKind, SerializableTree
  • Error Handling: ErrorDetails, ParseResult, all error kind enums

Layer Modules (Advanced Usage)

For advanced use cases, access specific layers:

use vb6parse::io::{SourceFile, SourceStream, Comparator};
use vb6parse::lexer::{tokenize, Token, TokenStream};
use vb6parse::parsers::{parse, ConcreteSyntaxTree};
use vb6parse::language::controls::{Control, ControlKind};
use vb6parse::errors::{ProjectErrorKind, FormErrorKind};

Parsing Architecture

Bytes/String/File → SourceFile → SourceStream → TokenStream → CST → Object Layer
                    (Windows-1252) (Characters)   (Tokens)    (Tree) (Structured)

Layers:

  1. I/O Layer (io): Character decoding and stream access
  2. Lexer Layer (lexer): Tokenization with keyword lookup
  3. Syntax Layer (syntax): VB6 language constructs and library functions
  4. Parsers Layer (parsers): CST construction from tokens
  5. Files Layer (files): High-level file format parsers
  6. Language Layer (language): VB6 types, colors, controls
  7. Errors Layer (errors): Comprehensive error types

Source Code Organization

src/
├── io/                          # I/O Layer - Character streams and decoding
│   ├── mod.rs                   # SourceFile, SourceStream
│   ├── comparator.rs            # Case-sensitive/insensitive comparison
│   └── decode.rs                # Windows-1252 decoding
│
├── lexer/                       # Lexer Layer - Tokenization
│   ├── mod.rs                   # tokenize() function, keyword lookup
│   └── token_stream.rs          # TokenStream implementation
│
├── syntax/                      # Syntax Layer - VB6 Language constructs
│   ├── library/                 # VB6 built-in library unit tests and documentation
│   │   ├── functions/           # 160+ VB6 functions (14 categories)
│   │   │   ├── array/           # Array, Filter, Join, Split, etc.
│   │   │   ├── conversion/      # CBool, CInt, CLng, Str, Val, etc.
│   │   │   ├── datetime/        # Date, Now, Time, Year, Month, etc.
│   │   │   ├── file_system/     # Dir, EOF, FileLen, LOF, etc.
│   │   │   ├── financial/       # FV, IPmt, IRR, NPV, PV, Rate, etc.
│   │   │   ├── interaction/     # MsgBox, InputBox, Shell, etc.
│   │   │   ├── math/            # Abs, Cos, Sin, Tan, Log, Sqr, etc.
│   │   │   ├── miscellaneous/   # Environ, RGB, QBColor, etc.
│   │   │   ├── string/          # Left, Right, Mid, Len, Trim, etc.
│   │   │   └── ...
│   │   └── statements/          # VB6 statement unit tests and documentation (7 categories)
│   │       ├── file_operations/ # Open, Close, Get, Put, etc.
│   │       ├── filesystem/      # FileCopy, Kill, MkDir, RmDir, etc.
│   │       ├── runtime_control/ # DoEvents, Stop, End, etc.
│   │       ├── runtime_state/   # Date, Time assignment, etc.
│   │       ├── string_manipulation/ # Mid statement, etc.
│   │       ├── system_interaction/  # Beep, etc.
│   │       └── ...
│   ├── statements/              # Statement parsing logic
│   │   ├── control_flow/        # If, Select Case, For, While parsers
│   │   ├── declarations/        # Dim, ReDim, Const, Enum parsers
│   │   └── objects/             # Set, With, RaiseEvent parsers
│   └── expressions/             # Expression parsing utilities
│
├── parsers/                     # Parsers Layer - CST construction
│   ├── cst/                     # Concrete Syntax Tree implementation
│   │   ├── mod.rs               # parse(), ConcreteSyntaxTree, CstNode
│   │   └── rowan_wrapper.rs     # Red-green tree wrapper
│   ├── parseresults.rs          # ParseResult<T, E> type
│   └── syntaxkind.rs            # SyntaxKind enum (all token types)
│
├── files/                       # Files Layer - VB6 file format parsers
│   ├── common/                  # Shared parsing utilities
│   │   ├── properties.rs        # Property bag, PropertyGroup
│   │   ├── attributes.rs        # Attribute statement parsing
│   │   └── references.rs        # Object reference parsing
│   ├── project/                 # VBP - Project files
│   │   ├── mod.rs               # ProjectFile struct and parser
│   │   ├── properties.rs        # Project properties
│   │   ├── references.rs        # Reference types
│   │   └── compilesettings.rs   # Compilation settings
│   ├── class/                   # CLS - Class modules
│   ├── module/                  # BAS - Code modules
│   ├── form/                    # FRM - Forms
│   └── resource/                # FRX - Form resources
│
├── language/                    # Language Layer - VB6 types and definitions
│   ├── color.rs                 # VB6 color constants and Color type
│   ├── controls/                # VB6 control definitions (50+ controls)
│   │   ├── mod.rs               # Control, ControlKind enums
│   │   ├── form.rs              # FormProperties
│   │   ├── textbox.rs           # TextBoxProperties
│   │   ├── label.rs             # LabelProperties
│   │   └── ...                  # 50+ control types
│   └── tokens.rs                # Token enum definition
│
├── errors/                      # Errors Layer - Error types
│   ├── mod.rs                   # ErrorDetails, error printing
│   ├── decode.rs                # SourceFileErrorKind
│   ├── tokenize.rs              # CodeErrorKind
│   ├── project.rs               # ProjectErrorKind
│   ├── class.rs                 # ClassErrorKind
│   ├── module.rs                # ModuleErrorKind
│   ├── form.rs                  # FormErrorKind
│   ├── property.rs              # PropertyError
│   └── resource.rs              # ResourceErrorKind
│
└── lib.rs                       # Public API surface

Common Tasks

1. Extract All Form Controls

use vb6parse::language::Control;
use vb6parse::*;

fn extract_controls(form_path: &str) -> Vec<String> {
    let source = SourceFile::from_file(form_path).unwrap();
    let result = FormFile::parse(&source);
    let (form, _) = result.unpack();

    let mut control_names = Vec::new();

    if let Some(formfile) = form {
        fn visit_control(control: &Control, names: &mut Vec<String>) {
            names.push(control.name().to_string());

            // Recursively visit children
            if let Some(children) = control.kind().children() {
                for child in children {
                    visit_control(child, names);
                }
            }
        }

        for control in formfile.form.children().unwrap() {
            visit_control(control, &mut control_names);
        }
    }

    control_names
}

2. Analyze Code Without Full Parsing

use vb6parse::*;

fn count_identifiers(code: &str, function_name: &str) -> usize {
    let mut source_stream = SourceStream::new("temp.bas", code);
    let result = tokenize(&mut source_stream);
    let (tokens, _) = result.unpack();

    tokens
        .map(|ts| {
            ts.filter(|(text, token)| {
                *token == language::Token::Identifier && text.eq_ignore_ascii_case(function_name)
            })
            .count()
        })
        .unwrap_or(0)
}

Advanced Topics

Error Handling

VB6Parse uses a custom ParseResult<T, E> type that separates successful results from recoverable errors:

use vb6parse::*;

// `source` is a SourceFile, built as in the Quick Start examples.
let result = ProjectFile::parse(&source);

// The three options below are alternatives; pick one per result.

// Option 1: Unpack into result and failures
let (project_opt, failures) = result.unpack();

// Option 2: Check for failures first
if result.has_failures() {
    for failure in result.failures() {
        eprintln!("Error at offset {}: {:?}", failure.error_offset, failure.kind);
    }
}

// Option 3: Convert to Result<T, Vec<ErrorDetails>>
let std_result = result.ok_or_errors();
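
The idea of returning a value together with recoverable failures can be sketched in plain Rust. The `Outcome` type below is a hypothetical stand-in written for this illustration, not the library's `ParseResult` definition; in particular, the rule that `ok_or_errors()` yields `Ok` only when no failures were recorded is an assumption of the sketch, and `parse_numbers` is a toy parser used only to exercise it.

```rust
/// Hypothetical stand-in for a "value plus recoverable failures" result.
#[derive(Debug)]
pub struct Outcome<T, E> {
    value: Option<T>,
    failures: Vec<E>,
}

impl<T, E> Outcome<T, E> {
    pub fn new(value: Option<T>, failures: Vec<E>) -> Self {
        Outcome { value, failures }
    }

    /// True if at least one failure was recorded.
    pub fn has_failures(&self) -> bool {
        !self.failures.is_empty()
    }

    /// Borrowed view of the recorded failures.
    pub fn failures(&self) -> &[E] {
        &self.failures
    }

    /// Consumes the outcome and returns both halves.
    pub fn unpack(self) -> (Option<T>, Vec<E>) {
        (self.value, self.failures)
    }

    /// Assumption of this sketch: `Ok` only when a value exists and
    /// no failures were recorded; otherwise the failures are returned.
    pub fn ok_or_errors(self) -> Result<T, Vec<E>> {
        match (self.value, self.failures.is_empty()) {
            (Some(value), true) => Ok(value),
            _ => Err(self.failures),
        }
    }
}

/// Toy parser: reads comma-separated integers and keeps going after a
/// bad field, recording a failure message instead of stopping.
pub fn parse_numbers(input: &str) -> Outcome<Vec<i32>, String> {
    let mut numbers = Vec::new();
    let mut failures = Vec::new();
    for field in input.split(',') {
        match field.trim().parse::<i32>() {
            Ok(n) => numbers.push(n),
            Err(_) => failures.push(format!("not a number: {:?}", field.trim())),
        }
    }
    Outcome::new(Some(numbers), failures)
}
```

With this shape, a caller can still use the partially parsed value from `unpack()` even when some fields failed, which is the property that makes error recovery useful for analysis tools.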

Working with the CST

The Concrete Syntax Tree preserves all source information including whitespace and comments:

use vb6parse::*;

let tree = parse(token_stream);

// Navigate the tree
let root = tree.to_root_node();
for child in root.children() {
    println!("Node: {:?}", child.kind());
    println!("Text: {}", child.text());
}

// Serialize for debugging
let serializable = tree.to_serializable();
println!("{:#?}", serializable);
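
Source fidelity means that concatenating the text of every leaf reproduces the input exactly. The sketch below shows that property on a deliberately simple splitter; `Piece` and `split_lossless` are hypothetical names for this illustration and have nothing to do with the library's actual token kinds or lexer rules.

```rust
/// Coarse piece kinds for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Piece {
    Word,
    Whitespace,
    Newline,
    Symbol,
}

fn classify(c: char) -> Piece {
    if c == '\n' || c == '\r' {
        Piece::Newline
    } else if c.is_whitespace() {
        Piece::Whitespace
    } else if c.is_alphanumeric() || c == '_' {
        Piece::Word
    } else {
        Piece::Symbol
    }
}

/// Splits `source` into borrowed slices tagged with a kind.
/// Every byte of the input lands in exactly one slice, so nothing
/// (whitespace, comments, line breaks) is dropped.
pub fn split_lossless(source: &str) -> Vec<(&str, Piece)> {
    let mut pieces = Vec::new();
    let mut start = 0;
    let mut current: Option<Piece> = None;
    for (index, c) in source.char_indices() {
        let kind = classify(c);
        // Symbols are emitted one at a time; other kinds are grouped into runs.
        let boundary = match current {
            Some(prev) => prev != kind || kind == Piece::Symbol,
            None => false,
        };
        if boundary {
            if let Some(prev) = current {
                pieces.push((&source[start..index], prev));
            }
            start = index;
        }
        current = Some(kind);
    }
    // Flush the final run, if the input was not empty.
    if let Some(kind) = current {
        pieces.push((&source[start..], kind));
    }
    pieces
}
```

Joining the slices returned by `split_lossless` gives back the original string, which is the same round-trip guarantee a CST offers at the tree level.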

Character Encoding

VB6 uses Windows-1252 encoding. Always use decode_with_replacement() for file content:

use vb6parse::*;

// From bytes (e.g., file read)
let bytes = std::fs::read("file.bas")?;
let source = SourceFile::decode_with_replacement("file.bas", &bytes).unwrap();

// From UTF-8 string (testing/programmatic)
let source = SourceFile::from_string("test.bas", "Dim x As Integer");
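
As background on why raw bytes need decoding, here is a std-only sketch of a Windows-1252 decoder. It is not the code behind `decode_with_replacement()`; the function name `decode_windows_1252` is hypothetical, and mapping the five unassigned bytes to U+FFFD is an assumption of this sketch (other decoders map them to C1 control characters instead).

```rust
/// Unicode characters for bytes 0x80..=0x9F in Windows-1252.
/// `None` marks the five bytes the code page leaves unassigned
/// (0x81, 0x8D, 0x8F, 0x90, 0x9D).
const HIGH_BLOCK: [Option<char>; 32] = [
    Some('\u{20AC}'), None,             Some('\u{201A}'), Some('\u{0192}'),
    Some('\u{201E}'), Some('\u{2026}'), Some('\u{2020}'), Some('\u{2021}'),
    Some('\u{02C6}'), Some('\u{2030}'), Some('\u{0160}'), Some('\u{2039}'),
    Some('\u{0152}'), None,             Some('\u{017D}'), None,
    None,             Some('\u{2018}'), Some('\u{2019}'), Some('\u{201C}'),
    Some('\u{201D}'), Some('\u{2022}'), Some('\u{2013}'), Some('\u{2014}'),
    Some('\u{02DC}'), Some('\u{2122}'), Some('\u{0161}'), Some('\u{203A}'),
    Some('\u{0153}'), None,             Some('\u{017E}'), Some('\u{0178}'),
];

/// Decodes Windows-1252 bytes into a `String`.
/// Bytes outside 0x80..=0x9F map to the code point with the same value;
/// unassigned bytes become U+FFFD (an assumption of this sketch).
pub fn decode_windows_1252(bytes: &[u8]) -> String {
    bytes
        .iter()
        .map(|&b| match b {
            0x80..=0x9F => HIGH_BLOCK[(b - 0x80) as usize].unwrap_or('\u{FFFD}'),
            _ => char::from(b),
        })
        .collect()
}
```

Reading such a file with `String::from_utf8` would fail on bytes like 0x93 (a curly quote in Windows-1252), which is why file content should go through a decoding step rather than being treated as UTF-8.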

VB6 Library Functions

VB6Parse includes full definitions for 160+ VB6 library functions organized into 14 categories:

// Access function metadata
use vb6parse::syntax::library::functions::string::left;
use vb6parse::syntax::library::functions::math::sin;
use vb6parse::syntax::library::functions::conversion::cint;

// Each module includes:
// - Full VB6 documentation
// - Function signatures
// - Parameter descriptions
// - Usage examples
// - Related functions

Categories:

  • Array manipulation (Array, Filter, Join, Split, UBound, LBound)
  • Conversion (CBool, CDate, CInt, CLng, CStr, Val, Str)
  • Date/Time (Date, Time, Now, Year, Month, Day, Hour, DateAdd, DateDiff)
  • File System (Dir, EOF, FileLen, FreeFile, LOF, Seek)
  • Financial (FV, IPmt, IRR, NPV, PV, Rate)
  • Formatting (Format, FormatCurrency, FormatDateTime, FormatNumber, FormatPercent)
  • Interaction (MsgBox, InputBox, Shell, CreateObject, GetObject)
  • Inspection (IsArray, IsDate, IsEmpty, IsNull, IsNumeric, TypeName, VarType)
  • Math (Abs, Atn, Cos, Exp, Log, Rnd, Sgn, Sin, Sqr, Tan)
  • String (Left, Right, Mid, Len, InStr, Replace, Trim, UCase, LCase)
  • And more...

See also: src/syntax/library/functions/

Form Resources (FRX Files)

Form resource files contain binary data for controls (images, icons, property blobs):

use vb6parse::*;

// Option 1: load bytes and hand to FormResourceFile to handle.
let bytes = std::fs::read("Form1.frx")?;
let result = FormResourceFile::parse("Form1.frx", bytes);

// Option 2: Load directly from file.
let result = FormResourceFile::from_file("Form1.frx")?;

let (resource, _failures) = result.unpack();
if let Some(resource) = resource {
    for (offset, data) in resource.iter_entries() {
        println!(
            "Resource at offset {}: {} bytes",
            offset,
            data.as_bytes().unwrap().len()
        );
    }
}

Testing

VB6Parse has comprehensive test coverage.

📊 View Test Coverage Report

Running Tests

# Clone test data (required for integration tests)
git submodule update --init --recursive

# Run all tests
cargo test

# Run only library tests
cargo test --lib

# Run only integration tests
cargo test --test '*'

# Run documentation tests
cargo test --doc

Snapshot Testing

Integration tests use insta for snapshot testing:

# Review snapshot changes
cargo insta review

# Accept all snapshots
cargo insta accept

Test data location: tests/data/ (git submodules of real VB6 projects)

Benchmarking

VB6Parse includes criterion benchmarks for performance testing:

# Run all benchmarks
cargo bench

# Run specific benchmark
cargo bench bulk_parser_load

# Reports from a benchmark run are saved to target/criterion/

Benchmarks:

  • bulk_parser_load - Parsing multiple large VB6 projects
  • Token stream generation
  • CST construction

Code Coverage

VB6Parse uses cargo-llvm-cov to track test coverage and ensure comprehensive testing across all modules.

Installation

# Install cargo-llvm-cov
cargo install cargo-llvm-cov

Running Coverage

# Generate coverage report (terminal output)
cargo llvm-cov

# Generate HTML report
cargo llvm-cov --html
# Open target/llvm-cov/html/index.html in your browser

# Generate the HTML report and open it in your browser
cargo llvm-cov --open

# Generate detailed coverage for specific packages
cargo llvm-cov --package vb6parse

# Include tests in coverage
cargo llvm-cov --all-targets

# Generate LCOV format (for CI/CD integration)
cargo llvm-cov --lcov --output-path lcov.info

Coverage Reports

Coverage reports are saved to:

  • HTML reports: target/llvm-cov/html/
  • Terminal summary: Displays percentage coverage after running cargo llvm-cov
  • LCOV files: lcov.info (when using --lcov flag)

Current Coverage:

  • Library tests: 5,467 tests covering VB6 library functions
  • Integration tests: 31 tests with real-world VB6 projects
  • Documentation tests: 83 tests ensuring examples work
  • Coverage focus: Parsers, tokenization, error handling, and file format support

Contributing to VB6Parse

Contributions are welcome! Please see the CONTRIBUTING.md file for more information.

Development Setup

# Clone repository
git clone https://github.com/scriptandcompile/vb6parse
cd vb6parse

# Get test data
git submodule update --init --recursive

# Run tests
cargo test

# Run benchmarks
cargo bench

# Check for issues
cargo clippy

# Format code
cargo fmt

Code Organization Guidelines

  1. Layer Separation: Keep clear boundaries between layers
  2. Windows-1252 Handling: Always use SourceFile::decode_with_replacement()
  3. Error Recovery: Parsers should recover from errors when possible
  4. CST Fidelity: Preserve all source text including whitespace and comments
  5. Documentation: Include doc tests for public APIs

Adding New Features

VB6 Library Functions:

  • Add to appropriate category in src/syntax/library/functions/
  • Include full VB6 documentation
  • Add comprehensive tests
  • Update category mod.rs

Control Types:

  • Add to src/language/controls/
  • Define properties struct
  • Add to ControlKind enum
  • Include property validation

Error Types:

  • Add to appropriate error module in src/errors/
  • Ensure Display implementation
  • Add context information

Performance Considerations

  • Use zero-copy where possible (string slices, not String)
  • Avoid unnecessary allocations (use iterators)
  • Leverage rowan's red-green tree for CST memory efficiency
  • Use phf crate for compile-time lookup tables
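
The allocation-free lookup idea can be shown with the standard library alone. In the sketch below a plain static slice stands in for a `phf` map (a linear scan rather than a perfect hash), and `Keyword`, `KEYWORDS`, and `lookup_keyword` are hypothetical names for this illustration, not the library's lexer internals. VB6 keywords are case-insensitive, so the comparison ignores ASCII case without building a lowercased `String`.

```rust
/// A small, illustrative subset of keywords.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Keyword {
    Dim,
    As,
    Sub,
    End,
    Function,
    Integer,
}

/// Static table built at compile time; a stand-in for a `phf` map.
static KEYWORDS: &[(&str, Keyword)] = &[
    ("dim", Keyword::Dim),
    ("as", Keyword::As),
    ("sub", Keyword::Sub),
    ("end", Keyword::End),
    ("function", Keyword::Function),
    ("integer", Keyword::Integer),
];

/// Case-insensitive keyword lookup that borrows `word` and allocates nothing.
pub fn lookup_keyword(word: &str) -> Option<Keyword> {
    KEYWORDS
        .iter()
        .find(|(name, _)| name.eq_ignore_ascii_case(word))
        .map(|(_, keyword)| *keyword)
}
```

A perfect-hash table makes the same lookup constant-time for a large keyword set; the point of the sketch is only that neither the table nor the comparison needs a heap allocation per token.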

Supported File Types

Extension  Description          Status
.vbp       Project files        ✅ Complete
.cls       Class modules        ✅ Complete
.bas       Code modules         ✅ Complete
.frm       Forms                ⚠️ Partial (font, some icons, etc.)
.frx       Form resources       ⚠️ Partial (binary blobs loaded, not all mapped to properties)
.ctl       User controls        ✅ Parsed as forms
.dob       User documents       ✅ Parsed as forms
.vbw       IDE window state     ❌ Not yet implemented
.dsx       Data environments    ❌ Not yet implemented
.dsr       Data env. resources  ❌ Not yet implemented
.ttx       Crystal reports      ❌ Not yet implemented

Project Status

  • ✅ Core Parsing: Fully implemented for VBP, CLS, BAS files
  • ✅ Tokenization: Complete with keyword lookup
  • ✅ CST Construction: Full syntax tree with source fidelity
  • ✅ Error Handling: Comprehensive error types and recovery
  • ✅ VB6 Library: 160+ functions, 42 statements documented
  • ⚠️ FRX Resources: Binary loading complete, property mapping partial
  • ⚠️ FRM Properties: Most FRM properties load properly; icon, background, and font mapping is partial
  • ❌ AST: Not yet implemented (CST available)
  • ✅ Testing: 5,500+ tests across unit, integration, and doc tests
  • ✅ Benchmarking: Criterion-based performance testing
  • ✅ Fuzz Testing: Coverage-guided fuzzing with cargo-fuzz
  • ✅ Documentation: Comprehensive API docs and examples

Fuzz Testing

VB6Parse includes comprehensive fuzz testing using cargo-fuzz and libFuzzer to discover edge cases, crashes, and undefined behavior.

Available Fuzz Targets:

  • sourcefile_decode - Tests Windows-1252 decoding with arbitrary bytes
  • sourcestream - Tests low-level character stream operations
  • tokenize - Tests tokenization with malformed VB6 code
  • cst_parse - Tests Concrete Syntax Tree parsing with invalid syntax

Quick Start:

# Install cargo-fuzz (requires nightly)
cargo install cargo-fuzz

# Run a fuzzer for 60 seconds
cargo +nightly fuzz run sourcefile_decode -- -max_total_time=60

# List all fuzz targets
cargo +nightly fuzz list

Learn More: See fuzz/README.md for detailed usage.

Examples

All examples are located in the examples/ directory:

Example                 Description
audiostation_parse.rs   Parse a complete real-world VB6 project
cst_navigation.rs       Navigate and query the Concrete Syntax Tree
cst_parse.rs            Parse tokens directly to CST
debug_cst.rs            Display CST debug representation
debug_resource.rs       Inspect FRX resource files
parse_class.rs          Parse class files from bytes
parse_control_only.rs   Parse individual form controls
parse_form.rs           Parse VB6 forms
parse_module.rs         Parse code modules
parse_project.rs        Parse project files
sourcestream.rs         Work with character streams
tokenstream.rs          Tokenize VB6 code

Run any example with:

cargo run --example parse_project

Limitations

  1. Encoding: Designed primarily for "predominantly English" source code; Windows-1252 encoding detection has limitations
  2. AST: Abstract Syntax Tree is not yet implemented (Concrete Syntax Tree is available)
  3. FRX Mapping: Binary resources are loaded but not all are mapped to control properties
  4. Real-time Use: While capable, not optimized for real-time highlighting or LSP (focus is on offline analysis)

License

MIT License - See LICENSE file for details.


Built with ❤️ by ScriptAndCompile

About

Parser library using rust & winnow-rs for VB6 (projects, forms, designers, etc)
