
feat: Add dy.infer_schema #294

Open
gab23r wants to merge 4 commits into Quantco:main from gab23r:infer-schema


gab23r (Contributor) commented Mar 5, 2026

Fixes: #232

  • Add dy.infer_schema() function to generate dataframely schema code from a Polars DataFrame
  • Supports three output modes via return_type parameter:
    • None (default): prints schema to stdout for quick exploration
    • "string": returns schema code as a string
    • "schema": returns an actual Schema class for direct use
  • Handles all Polars types including nested types (List, Array, Struct) with proper inner nullability detection
  • Automatically handles invalid Python identifiers and keywords using aliases
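The identifier handling in the last bullet can be illustrated with a minimal, standard-library-only sketch. The helper names `to_attribute_name` and `render_column` and the exact renaming rules (non-word characters become `_`, a leading digit gets a `_` prefix, keywords get a trailing `_`) are assumptions for illustration, not the PR's implementation; the only part taken from the description is that the original column name is preserved as an alias.

```python
import keyword
import re
from typing import Optional, Tuple


def to_attribute_name(column: str) -> Tuple[str, Optional[str]]:
    """Map a column name to (attribute_name, alias); alias is None if unchanged."""
    if column.isidentifier() and not keyword.iskeyword(column):
        return column, None
    # Replace every non-word character (spaces, dashes, dots, ...) with "_".
    name = re.sub(r"\W", "_", column)
    # Identifiers may not be empty or start with a digit.
    if not name or name[0].isdigit():
        name = "_" + name
    # Keywords such as "class" or "None" are reserved: append "_".
    if keyword.iskeyword(name):
        name += "_"
    return name, column


def render_column(column: str, dtype_name: str) -> str:
    """Render one schema attribute line, e.g. "age = dy.Int64()"."""
    name, alias = to_attribute_name(column)
    # Assumption: the generated column constructor accepts an `alias=` keyword.
    args = "" if alias is None else f"alias={alias!r}"
    return f"{name} = dy.{dtype_name}({args})"
```

For example, `render_column("first name", "String")` yields `first_name = dy.String(alias='first name')`. This sketch does not resolve collisions: `a b` and `a_b` would both map to `a_b`.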

Example usage:

>>> import polars as pl
>>> import dataframely as dy
>>> df = pl.DataFrame({
...     "name": ["Alice", "Bob"],
...     "age": [25, 30],
...     "score": [95.5, None],
... })
>>> dy.infer_schema(df, "PersonSchema")
class PersonSchema(dy.Schema):
    name = dy.String()
    age = dy.Int64()
    score = dy.Float64(nullable=True)
>>> schema = dy.infer_schema(df, "PersonSchema", return_type="schema")
>>> schema.is_valid(df)
True

Not supported (potential future enhancements)

  • Assess min/max length of string values to suggest min_length/max_length constraints
  • Suggest Enum if there are fewer than 10-20 distinct string values in a column
  • Suggest Categorical if there are 50-100 distinct string values in a dataframe with >100k rows
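The first two ideas could be prototyped along these lines. This is a hypothetical helper (`suggest_string_column`, with an assumed default threshold of 20 distinct values) that is not part of this PR; it works on a plain Python list rather than a Polars column to stay self-contained, and emits `dy.Enum(...)` or `dy.String(min_length=..., max_length=...)` source text.

```python
from typing import List, Optional


def suggest_string_column(
    values: List[Optional[str]], max_enum_categories: int = 20
) -> str:
    """Suggest dataframely column source for a string column (hypothetical heuristic)."""
    non_null = [v for v in values if v is not None]
    options = []
    if len(non_null) < len(values):
        options.append("nullable=True")
    # Sorted so the generated code is deterministic across runs.
    distinct = sorted(set(non_null))
    if distinct and len(distinct) < max_enum_categories:
        # Fewer distinct values than the threshold: suggest an Enum of them.
        extra = "".join(f", {opt}" for opt in options)
        return f"dy.Enum({distinct!r}{extra})"
    if non_null:
        # Otherwise fall back to String with the observed length bounds.
        lengths = [len(v) for v in non_null]
        options = [f"min_length={min(lengths)}", f"max_length={max(lengths)}"] + options
    return f"dy.String({', '.join(options)})"
```

Sorting the categories is a design choice for reproducible output; since Polars treats `Enum` category order as meaningful, a real implementation might prefer first-seen order instead.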

Copilot AI review requested due to automatic review settings March 5, 2026 09:52
codecov bot commented Mar 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (b3edd6a) to head (7ee32cf).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##              main      #294    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files           54        55     +1     
  Lines         3121      3250   +129     
==========================================
+ Hits          3121      3250   +129     



Copilot AI left a comment


Pull request overview

This PR adds a new dy.infer_schema() function (addressing Issue #232) that generates dataframely schema code from a Polars DataFrame. The function inspects a DataFrame's column types and null counts to produce schema class definitions with appropriate column types and nullable annotations.
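The core of that inspection can be sketched without Polars by taking each column's dtype name and null count as input (with Polars, these would come from `df.schema` and `Series.null_count()`). The function `render_schema` is hypothetical and only covers simple dtypes whose dataframely column class shares the Polars dtype name, as `String`, `Int64`, and `Float64` do in the PR description's example.

```python
from typing import List, Tuple


def render_schema(class_name: str, columns: List[Tuple[str, str, int]]) -> str:
    """Render schema source from (column_name, dtype_name, null_count) triples."""
    lines = [f"class {class_name}(dy.Schema):"]
    for name, dtype_name, null_count in columns:
        # Only columns with at least one observed null are marked nullable.
        args = "nullable=True" if null_count > 0 else ""
        lines.append(f"    {name} = dy.{dtype_name}({args})")
    return "\n".join(lines)
```

Called with `[("name", "String", 0), ("age", "Int64", 0), ("score", "Float64", 1)]`, this reproduces the `PersonSchema` output shown in the description. Marking a column nullable only when nulls are observed is a heuristic: a null-free sample does not prove a column can never be null.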

Changes:

  • New dataframely/_generate_schema.py module implementing infer_schema() with three return modes (print to stdout, return as string, or return as an executable Schema class), plus supporting helper functions for code generation.
  • Public API export of infer_schema in dataframely/__init__.py.
  • New test file tests/test_infer_schema.py covering basic types, nullable detection, datetime types, nested types, invalid identifiers, and round-trip validation.
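For the third return mode, one plausible way to turn generated source into a class is to `exec` it in a namespace that provides `dy`. This is an assumption about the approach, not a description of the PR's code; `materialize_class` is a hypothetical name, and `exec` should only ever be run on source the tool generated itself.

```python
def materialize_class(source: str, class_name: str, namespace: dict) -> type:
    """Execute generated class source and return the resulting class object."""
    # Copy so the caller's namespace is not mutated by exec.
    scope = dict(namespace)
    # Give the generated class a readable __module__.
    scope.setdefault("__name__", "generated_schema")
    exec(source, scope)
    return scope[class_name]
```

With dataframely installed, the call would look like `materialize_class(code, "PersonSchema", {"dy": dy})`, after which the returned class can be used like a hand-written schema.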

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

  • dataframely/_generate_schema.py: New module with the infer_schema() function and helpers for inferring a schema from DataFrame columns, handling type mapping, identifier sanitization, and code generation.
  • dataframely/__init__.py: Exports infer_schema in the public API (import and __all__).
  • tests/test_infer_schema.py: Tests for the string output mode across all supported types, and round-trip validation via the schema return mode.

gab23r changed the title "Feat: Add dy.infer_schema" to "feat: Add dy.infer_schema" on Mar 5, 2026
github-actions bot added the enhancement (New feature or request) label on Mar 5, 2026

Labels

enhancement New feature or request


Development

Successfully merging this pull request may close these issues.

Generate Schema code from a dataframe

2 participants