From 2dcc97df98a6b4024be953d8144d73fa16d6eba2 Mon Sep 17 00:00:00 2001
From: Ted Vilutis
Date: Mon, 16 Feb 2026 18:18:48 -0800
Subject: [PATCH 01/20] Fabric Lakehouse Skill

This is a new skill for the Copilot agent to work with Fabric Lakehouse
---
docs/README.skills.md | 1 +
skills/fabric-lakehouse/SKILL.md | 106 ++++++++++
skills/fabric-lakehouse/references/getdata.md | 36 ++++
skills/fabric-lakehouse/references/pyspark.md | 187 ++++++++++++++++++
4 files changed, 330 insertions(+)
create mode 100644 skills/fabric-lakehouse/SKILL.md
create mode 100644 skills/fabric-lakehouse/references/getdata.md
create mode 100644 skills/fabric-lakehouse/references/pyspark.md

diff --git a/docs/README.skills.md b/docs/README.skills.md
index 47139ebc1..e08df43f7 100644
--- a/docs/README.skills.md
+++ b/docs/README.skills.md
@@ -35,6 +35,7 @@ Skills differ from other primitives by supporting bundled assets (scripts, code
| [copilot-sdk](../skills/copilot-sdk/SKILL.md) | Build agentic applications with GitHub Copilot SDK. Use when embedding AI agents in apps, creating custom tools, implementing streaming responses, managing sessions, connecting to MCP servers, or creating custom agents. Triggers on Copilot SDK, GitHub SDK, agentic app, embed Copilot, programmable agent, MCP server, custom agent. | None |
| [create-web-form](../skills/create-web-form/SKILL.md) | Create robust, accessible web forms with best practices for HTML structure, CSS styling, JavaScript interactivity, form validation, and server-side processing. Use when asked to "create a form", "build a web form", "add a contact form", "make a signup form", or when building any HTML form with data handling. Covers PHP and Python backends, MySQL database integration, REST APIs, XML data exchange, accessibility (ARIA), and progressive web apps. | `references/accessibility.md`<br>
`references/aria-form-role.md`
`references/css-styling.md`
`references/form-basics.md`
`references/form-controls.md`
`references/form-data-handling.md`
`references/html-form-elements.md`
`references/html-form-example.md`
`references/hypertext-transfer-protocol.md`
`references/javascript.md`
`references/php-cookies.md`
`references/php-forms.md`
`references/php-json.md`
`references/php-mysql-database.md`
`references/progressive-web-app.md`
`references/python-as-web-framework.md`
`references/python-contact-form.md`
`references/python-flask-app.md`
`references/python-flask.md`
`references/security.md`
`references/styling-web-forms.md`
`references/web-api.md`
`references/web-performance.md`
`references/xml.md` | | [excalidraw-diagram-generator](../skills/excalidraw-diagram-generator/SKILL.md) | Generate Excalidraw diagrams from natural language descriptions. Use when asked to "create a diagram", "make a flowchart", "visualize a process", "draw a system architecture", "create a mind map", or "generate an Excalidraw file". Supports flowcharts, relationship diagrams, mind maps, and system architecture diagrams. Outputs .excalidraw JSON files that can be opened directly in Excalidraw. | `references/element-types.md`
`references/excalidraw-schema.md`
`scripts/.gitignore`
`scripts/README.md`
`scripts/add-arrow.py`
`scripts/add-icon-to-diagram.py`
`scripts/split-excalidraw-library.py`
`templates/business-flow-swimlane-template.excalidraw`
`templates/class-diagram-template.excalidraw`
`templates/data-flow-diagram-template.excalidraw`
`templates/er-diagram-template.excalidraw`
`templates/flowchart-template.excalidraw`
`templates/mindmap-template.excalidraw`
`templates/relationship-template.excalidraw`
`templates/sequence-diagram-template.excalidraw` | +| [fabric-lakehouse](../skills/fabric-lakehouse/SKILL.md) | Provide definition and context about Fabric Lakehouse and its capabilities for software systems and AI-powered features. Help users design, build, and optimize Lakehouse solutions using best practices. | `references/getdata.md`
`references/pyspark.md` | | [finnish-humanizer](../skills/finnish-humanizer/SKILL.md) | Detect and remove AI-generated markers from Finnish text, making it sound like a native Finnish speaker wrote it. Use when asked to "humanize", "naturalize", or "remove AI feel" from Finnish text, or when editing .md/.txt files containing Finnish content. Identifies 26 patterns (12 Finnish-specific + 14 universal) and 4 style markers. | `references/patterns.md` | | [gh-cli](../skills/gh-cli/SKILL.md) | GitHub CLI (gh) comprehensive reference for repositories, issues, pull requests, Actions, projects, releases, gists, codespaces, organizations, extensions, and all GitHub operations from the command line. | None | | [git-commit](../skills/git-commit/SKILL.md) | Execute git commit with conventional commit message analysis, intelligent staging, and message generation. Use when user asks to commit changes, create a git commit, or mentions "/commit". Supports: (1) Auto-detecting type and scope from changes, (2) Generating conventional commit messages from diff, (3) Interactive commit with optional type/scope/description overrides, (4) Intelligent file staging for logical grouping | None | diff --git a/skills/fabric-lakehouse/SKILL.md b/skills/fabric-lakehouse/SKILL.md new file mode 100644 index 000000000..ccc32a96d --- /dev/null +++ b/skills/fabric-lakehouse/SKILL.md @@ -0,0 +1,106 @@ +--- +name: fabric-lakehouse +description: 'Provide definition and context about Fabric Lakehouse and its capabilities for software systems and AI-powered features. Help users design, build, and optimize Lakehouse solutions using best practices.' +metadata: + author: tedvilutis + version: "1.0" +--- + +# When to Use This Skill + +Use this skill when you need to: +- Generate document or explanation that includes definition and context about Fabric Lakehouse and its capabilities. +- Design, build, and optimize Lakehouse solutions using best practices. +- Understand the core concepts and components of a Lakehouse in Microsoft Fabric. +- Learn how to manage tabular and non-tabular data within a Lakehouse. + +# Fabric Lakehouse + +## Core Concepts + +### What is a Lakehouse? + +Lakehouse in Microsoft Fabric is an item that gives users a place to store their tabular, like tables, and non-tabular, like files, data. It combines the flexibility of a data lake with the management capabilities of a data warehouse. It provides: + +- **Unified storage** in OneLake for structured and unstructured data +- **Delta Lake format** for ACID transactions, versioning, and time travel +- **SQL analytics endpoint** for T-SQL queries +- **Semantic model** for Power BI integration +- Support for other table formats like CSV, Parquet +- Support for any file formats +- Tools for table optimization and data management + +### Key Components + +- **Delta Tables**: Managed tables with ACID compliance and schema enforcement +- **Files**: Unstructured/semi-structured data in the Files section +- **SQL Endpoint**: Auto-generated read-only SQL interface for querying +- **Shortcuts**: Virtual links to external/internal data without copying +- **Fabric Materialized Views**: Pre-computed tables for fast query performance + +### Tabular data in a Lakehouse + +Tabular data in a form of tables are stored under "Tables" folder. Main format for tables in Lakehouse is Delta. Lakehouse can store tabular data in other formats like CSV or Parquet, these formats only available for Spark querying. 
+Tables can be internal, when data is stored under "Tables" folder or external, when only reference to a table is stored under "Tables" folder but the data itself is stored in a referenced location. Referencing tables are done through Shortcuts, which can be internal, pointing to other location in Fabric, or external pointing to data stored outside of Fabric. + +### Schemas for tables in a Lakehouse + +When creating a lakehouse user can choose to enable schemas. Schemas are used to organize Lakehouse tables. Schemas are implemented as folders under "Tables" folder and store tables inside of those folders. Default schema is "dbo" and it can't be deleted or renamed. All other schemas are optional and can be created, renamed, or deleted. User can reference schema located in other lakehouse using Schema Shortcut that way referencing all tables with one shortcut that are at the destination schema. + +### Files in a Lakehouse + +Files are stored under "Files" folder. Users can create folders and subfolders to organize their files. Any file format can be stored in Lakehouse. + +### Fabric Materialized Views + +Set of pre-computed tables that are automatically updated based on schedule. They provide fast query performance for complex aggregations and joins. Materialized views are defined using PySpark or Spark SQL stored in associated Notebook. + +### Spark Views + +Logical tables defined by a SQL query. They do not store data but provide a virtual layer for querying. Views are defined using Spark SQL and stored in Lakehouse next to Tables. + +## Security + +### Item access or control plane security + +User can have workspace roles (Admin, Member, Contributor, Viewer) that provide different levels of access to Lakehouse and its contents. User can also get access permission using sharing capabilities of Lakehouse. + +### Data access or OneLake Security + +For data access use OneLake security model, which is based on Microsoft Entra ID (formerly Azure Active Directory) and role-based access control (RBAC). Lakehouse data is stored in OneLake, so access to data is controlled through OneLake permissions. In addition to object-level permissions, Lakehouse also supports column-level and row-level security for tables, allowing fine-grained control over who can see specific columns or rows in a table. + + +## Lakehouse Shortcuts + +Shortcuts create virtual links to data without copying: + +### Types of Shortcuts + +- **Internal**: Link to other Fabric Lakehouses/tables, cross-workspace data sharing +- **ADLS Gen2**: Azure Data Lake Storage Gen2 external Azure storage +- **Amazon S3**: AWS S3 buckets, cross-cloud data access +- **Dataverse**: Microsoft Dataverse, business application data +- **Google Cloud Storage**: GCS buckets, cross-cloud data access + +## Performance Optimization + +### V-Order Optimization + +For faster data read with semantic model enable V-Order optimization on Delta tables.This presorts data in a way that improves query performance for common access patterns. + +### Table Optimization + +Tables can also be optimized using OPTIMIZE command, which compacts small files into larger ones and can also apply Z-ordering to improve query performance on specific columns. Regular optimization helps maintain performance as data is ingested and updated over time. Vacuum command can be used to clean up old files and free up storage space, especially after updates and deletes. + +## Lineage + +Lakehosue item supports lineage, which allows users to track the origin and transformations of data. 
Lineage information is automatically captured for tables and files in Lakehouse, showing how data flows from source to destination. This helps with debugging, auditing, and understanding data dependencies. + +## PySpark Code Examples + +See [PySpark code](references/pyspark.md) for details. + +## Getting data into Lakehouse + +See [Get data](references/getdata.md) for details. + diff --git a/skills/fabric-lakehouse/references/getdata.md b/skills/fabric-lakehouse/references/getdata.md new file mode 100644 index 000000000..db952d80c --- /dev/null +++ b/skills/fabric-lakehouse/references/getdata.md @@ -0,0 +1,36 @@ +### Data Factory Integration + +Microsoft Fabric includes Data Factory for ETL/ELT orchestration: + +- **180+ connectors** for data sources +- **Copy activity** for data movement +- **Dataflow Gen2** for transformations +- **Notebook activity** for Spark processing +- **Scheduling** and triggers + +### Pipeline Activities + +| Activity | Description | +|----------|-------------| +| Copy Data | Move data between sources and Lakehouse | +| Notebook | Execute Spark notebooks | +| Dataflow | Run Dataflow Gen2 transformations | +| Stored Procedure | Execute SQL procedures | +| ForEach | Loop over items | +| If Condition | Conditional branching | +| Get Metadata | Retrieve file/folder metadata | +| Lakehouse Maintenance | Optimize and vacuum Delta tables | + +### Orchestration Patterns + +``` +Pipeline: Daily_ETL_Pipeline +├── Get Metadata (check for new files) +├── ForEach (process each file) +│ ├── Copy Data (bronze layer) +│ └── Notebook (silver transformation) +├── Notebook (gold aggregation) +└── Lakehouse Maintenance (optimize tables) +``` + +--- \ No newline at end of file diff --git a/skills/fabric-lakehouse/references/pyspark.md b/skills/fabric-lakehouse/references/pyspark.md new file mode 100644 index 000000000..b3ba419bd --- /dev/null +++ b/skills/fabric-lakehouse/references/pyspark.md @@ -0,0 +1,187 @@ +### Spark Configuration (Best Practices) + +```python +# Enable Fabric optimizations +spark.conf.set("spark.sql.parquet.vorder.enabled", "true") +spark.conf.set("spark.microsoft.delta.optimizeWrite.enabled", "true") +``` + +### Reading Data + +```python +# Read CSV file +df = spark.read.format("csv") \ + .option("header", "true") \ + .option("inferSchema", "true") \ + .load("Files/bronze/data.csv") + +# Read JSON file +df = spark.read.format("json").load("Files/bronze/data.json") + +# Read Parquet file +df = spark.read.format("parquet").load("Files/bronze/data.parquet") + +# Read Delta table +df = spark.read.format("delta").table("my_delta_table") + +# Read from SQL endpoint +df = spark.sql("SELECT * FROM lakehouse.my_table") +``` + +### Writing Delta Tables + +```python +# Write DataFrame as managed Delta table +df.write.format("delta") \ + .mode("overwrite") \ + .saveAsTable("silver_customers") + +# Write with partitioning +df.write.format("delta") \ + .mode("overwrite") \ + .partitionBy("year", "month") \ + .saveAsTable("silver_transactions") + +# Append to existing table +df.write.format("delta") \ + .mode("append") \ + .saveAsTable("silver_events") +``` + +### Delta Table Operations (CRUD) + +```python +# UPDATE +spark.sql(""" + UPDATE silver_customers + SET status = 'active' + WHERE last_login > '2024-01-01' +""") + +# DELETE +spark.sql(""" + DELETE FROM silver_customers + WHERE is_deleted = true +""") + +# MERGE (Upsert) +spark.sql(""" + MERGE INTO silver_customers AS target + USING staging_customers AS source + ON target.customer_id = source.customer_id + WHEN MATCHED 
THEN UPDATE SET * + WHEN NOT MATCHED THEN INSERT * +""") +``` + +### Schema Definition + +```python +from pyspark.sql.types import StructType, StructField, StringType, IntegerType, TimestampType, DecimalType + +schema = StructType([ + StructField("id", IntegerType(), False), + StructField("name", StringType(), True), + StructField("email", StringType(), True), + StructField("amount", DecimalType(18, 2), True), + StructField("created_at", TimestampType(), True) +]) + +df = spark.read.format("csv") \ + .schema(schema) \ + .option("header", "true") \ + .load("Files/bronze/customers.csv") +``` + +### SQL Magic in Notebooks + +```sql +%%sql +-- Query Delta table directly +SELECT + customer_id, + COUNT(*) as order_count, + SUM(amount) as total_amount +FROM gold_orders +GROUP BY customer_id +ORDER BY total_amount DESC +LIMIT 10 +``` + +### V-Order Optimization + +```python +# Enable V-Order for read optimization +spark.conf.set("spark.sql.parquet.vorder.enabled", "true") +``` + +### Table Optimization + +```sql +%%sql +-- Optimize table (compact small files) +OPTIMIZE silver_transactions + +-- Optimize with Z-ordering on query columns +OPTIMIZE silver_transactions ZORDER BY (customer_id, transaction_date) + +-- Vacuum old files (default 7 days retention) +VACUUM silver_transactions + +-- Vacuum with custom retention +VACUUM silver_transactions RETAIN 168 HOURS + +### Incremental Load Pattern + +```python +from pyspark.sql.functions import col, max as spark_max + +# Get last processed watermark +last_watermark = spark.sql(""" + SELECT MAX(processed_timestamp) as watermark + FROM silver_orders +""").collect()[0]["watermark"] + +# Load only new records +new_records = spark.read.format("delta") \ + .table("bronze_orders") \ + .filter(col("created_at") > last_watermark) + +# Merge new records +new_records.createOrReplaceTempView("staging_orders") +spark.sql(""" + MERGE INTO silver_orders AS target + USING staging_orders AS source + ON target.order_id = source.order_id + WHEN MATCHED THEN UPDATE SET * + WHEN NOT MATCHED THEN INSERT * +""") +``` + +### SCD Type 2 Pattern + +```python +from pyspark.sql.functions import current_timestamp, lit + +# Close existing records +spark.sql(""" + UPDATE dim_customer + SET is_current = false, end_date = current_timestamp() + WHERE customer_id IN (SELECT customer_id FROM staging_customer) + AND is_current = true +""") + +# Insert new versions +spark.sql(""" + INSERT INTO dim_customer + SELECT + customer_id, + name, + email, + address, + current_timestamp() as start_date, + null as end_date, + true as is_current + FROM staging_customer +""") +``` \ No newline at end of file From 6181395513be45577d0d7b03eeba3bbe5f60479c Mon Sep 17 00:00:00 2001 From: Ted Vilutis <69260340+tedvilutis@users.noreply.github.com> Date: Mon, 16 Feb 2026 18:24:51 -0800 Subject: [PATCH 02/20] Update skills/fabric-lakehouse/references/pyspark.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- skills/fabric-lakehouse/references/pyspark.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/skills/fabric-lakehouse/references/pyspark.md b/skills/fabric-lakehouse/references/pyspark.md index b3ba419bd..ca1d45539 100644 --- a/skills/fabric-lakehouse/references/pyspark.md +++ b/skills/fabric-lakehouse/references/pyspark.md @@ -22,7 +22,7 @@ df = spark.read.format("json").load("Files/bronze/data.json") df = spark.read.format("parquet").load("Files/bronze/data.parquet") # Read Delta table -df = spark.read.format("delta").table("my_delta_table") +df = 
spark.read.table("my_delta_table")

# Read from SQL endpoint
df = spark.sql("SELECT * FROM lakehouse.my_table")

From 86d4e770e3273519c97736c4aa1d446020bd05f0 Mon Sep 17 00:00:00 2001
From: Ted Vilutis <69260340+tedvilutis@users.noreply.github.com>
Date: Mon, 16 Feb 2026 18:27:10 -0800
Subject: [PATCH 03/20] Update skills/fabric-lakehouse/SKILL.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---
skills/fabric-lakehouse/SKILL.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/skills/fabric-lakehouse/SKILL.md b/skills/fabric-lakehouse/SKILL.md
index ccc32a96d..86c6945e1 100644
--- a/skills/fabric-lakehouse/SKILL.md
+++ b/skills/fabric-lakehouse/SKILL.md
@@ -94,7 +94,7 @@ Tables can also be optimized using OPTIMIZE command, which compacts small files
## Lineage
-Lakehosue item supports lineage, which allows users to track the origin and transformations of data. Lineage information is automatically captured for tables and files in Lakehouse, showing how data flows from source to destination. This helps with debugging, auditing, and understanding data dependencies.
+Lakehouse item supports lineage, which allows users to track the origin and transformations of data. Lineage information is automatically captured for tables and files in Lakehouse, showing how data flows from source to destination. This helps with debugging, auditing, and understanding data dependencies.

From cf91a6290023ca1b9002f60013603caf4ef66e0c Mon Sep 17 00:00:00 2001
From: Ted Vilutis <69260340+tedvilutis@users.noreply.github.com>
Date: Mon, 16 Feb 2026 18:27:31 -0800
Subject: [PATCH 04/20] Update skills/fabric-lakehouse/SKILL.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---
skills/fabric-lakehouse/SKILL.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/skills/fabric-lakehouse/SKILL.md b/skills/fabric-lakehouse/SKILL.md
index 86c6945e1..7433d195e 100644
--- a/skills/fabric-lakehouse/SKILL.md
+++ b/skills/fabric-lakehouse/SKILL.md
@@ -86,7 +86,7 @@ Shortcuts create virtual links to data without copying:
### V-Order Optimization
-For faster data read with semantic model enable V-Order optimization on Delta tables.This presorts data in a way that improves query performance for common access patterns.
+For faster data reads with the semantic model, enable V-Order optimization on Delta tables. This presorts data in a way that improves query performance for common access patterns.
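A minimal sketch of enabling this for a Spark session, reusing the two configuration flags already shown in references/pyspark.md:

```python
# Enable V-Order so files are written presorted for faster downstream reads
spark.conf.set("spark.sql.parquet.vorder.enabled", "true")

# Optimize-on-write compacts output into fewer, larger files
spark.conf.set("spark.microsoft.delta.optimizeWrite.enabled", "true")
```

Tables written after these settings take effect benefit on the read path; existing data is unaffected until it is rewritten or optimized.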
### Table Optimization

From 3f9e9b085e0375237a905a4552baf66b52dbf2aa Mon Sep 17 00:00:00 2001
From: Ted Vilutis
Date: Tue, 17 Feb 2026 09:21:37 -0800
Subject: [PATCH 05/20] Update pyspark.md

---
skills/fabric-lakehouse/references/pyspark.md | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/skills/fabric-lakehouse/references/pyspark.md b/skills/fabric-lakehouse/references/pyspark.md
index ca1d45539..08b920040 100644
--- a/skills/fabric-lakehouse/references/pyspark.md
+++ b/skills/fabric-lakehouse/references/pyspark.md
@@ -131,6 +131,8 @@ VACUUM silver_transactions
-- Vacuum with custom retention
VACUUM silver_transactions RETAIN 168 HOURS
+```
+
### Incremental Load Pattern
```python
from pyspark.sql.functions import col, max as spark_max
# Get last processed watermark
last_watermark = spark.sql("""

From 46f49185c10e1ead0f43b9c507d44368b7924dcc Mon Sep 17 00:00:00 2001
From: Ted Vilutis <69260340+tedvilutis@users.noreply.github.com>
Date: Tue, 17 Feb 2026 10:29:24 -0800
Subject: [PATCH 06/20] Update skills/fabric-lakehouse/SKILL.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---
skills/fabric-lakehouse/SKILL.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/skills/fabric-lakehouse/SKILL.md b/skills/fabric-lakehouse/SKILL.md
index 7433d195e..d6808f45b 100644
--- a/skills/fabric-lakehouse/SKILL.md
+++ b/skills/fabric-lakehouse/SKILL.md
@@ -63,7 +63,7 @@ Logical tables defined by a SQL query. They do not store data but provide a virt
### Item access or control plane security
-User can have workspace roles (Admin, Member, Contributor, Viewer) that provide different levels of access to Lakehouse and its contents. User can also get access permission using sharing capabilities of Lakehouse.
+Users can have workspace roles (Admin, Member, Contributor, Viewer) that provide different levels of access to Lakehouse and its contents. Users can also be granted access through the sharing capabilities of Lakehouse.
### Data access or OneLake Security From 15e245cf7957d542fce45dacd9acba95225d27d7 Mon Sep 17 00:00:00 2001 From: Ted Vilutis <69260340+tedvilutis@users.noreply.github.com> Date: Tue, 17 Feb 2026 10:29:54 -0800 Subject: [PATCH 07/20] Update skills/fabric-lakehouse/references/pyspark.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- skills/fabric-lakehouse/references/pyspark.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/skills/fabric-lakehouse/references/pyspark.md b/skills/fabric-lakehouse/references/pyspark.md index 08b920040..9c68c9a21 100644 --- a/skills/fabric-lakehouse/references/pyspark.md +++ b/skills/fabric-lakehouse/references/pyspark.md @@ -55,7 +55,7 @@ df.write.format("delta") \ spark.sql(""" UPDATE silver_customers SET status = 'active' - WHERE last_login > '2024-01-01' + WHERE last_login > '2024-01-01' -- Example date, adjust as needed """) # DELETE From b1a9d7ca0ac49960779650a9cbd993855e1660b1 Mon Sep 17 00:00:00 2001 From: Ted Vilutis <69260340+tedvilutis@users.noreply.github.com> Date: Tue, 17 Feb 2026 10:30:08 -0800 Subject: [PATCH 08/20] Update skills/fabric-lakehouse/SKILL.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- skills/fabric-lakehouse/SKILL.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/skills/fabric-lakehouse/SKILL.md b/skills/fabric-lakehouse/SKILL.md index d6808f45b..7ecb6b2e3 100644 --- a/skills/fabric-lakehouse/SKILL.md +++ b/skills/fabric-lakehouse/SKILL.md @@ -9,7 +9,7 @@ metadata: # When to Use This Skill Use this skill when you need to: -- Generate document or explanation that includes definition and context about Fabric Lakehouse and its capabilities. +- Generate a document or explanation that includes definition and context about Fabric Lakehouse and its capabilities. - Design, build, and optimize Lakehouse solutions using best practices. - Understand the core concepts and components of a Lakehouse in Microsoft Fabric. - Learn how to manage tabular and non-tabular data within a Lakehouse. From d5d303b23e6198d64d0d2b1a5679619e32508c9f Mon Sep 17 00:00:00 2001 From: Ted Vilutis <69260340+tedvilutis@users.noreply.github.com> Date: Tue, 17 Feb 2026 10:30:21 -0800 Subject: [PATCH 09/20] Update skills/fabric-lakehouse/SKILL.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- skills/fabric-lakehouse/SKILL.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/skills/fabric-lakehouse/SKILL.md b/skills/fabric-lakehouse/SKILL.md index 7ecb6b2e3..e9b1dffdf 100644 --- a/skills/fabric-lakehouse/SKILL.md +++ b/skills/fabric-lakehouse/SKILL.md @@ -90,7 +90,7 @@ For faster data read with semantic model enable V-Order optimization on Delta ta ### Table Optimization -Tables can also be optimized using OPTIMIZE command, which compacts small files into larger ones and can also apply Z-ordering to improve query performance on specific columns. Regular optimization helps maintain performance as data is ingested and updated over time. Vacuum command can be used to clean up old files and free up storage space, especially after updates and deletes. +Tables can also be optimized using the OPTIMIZE command, which compacts small files into larger ones and can also apply Z-ordering to improve query performance on specific columns. Regular optimization helps maintain performance as data is ingested and updated over time. The Vacuum command can be used to clean up old files and free up storage space, especially after updates and deletes. 
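A short maintenance sketch that wraps the same OPTIMIZE and VACUUM statements shown in references/pyspark.md in PySpark; the table name silver_transactions is just the example used there:

```python
# Compact small files and co-locate rows on commonly filtered columns
spark.sql("OPTIMIZE silver_transactions ZORDER BY (customer_id, transaction_date)")

# Remove unreferenced files older than the retention window (168 hours = 7 days)
spark.sql("VACUUM silver_transactions RETAIN 168 HOURS")
```

Running this on a schedule, for example from a pipeline's Lakehouse Maintenance activity, keeps file counts and storage in check as data is ingested and updated.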
## Lineage From c61ffdfd8fb265c9d208ff5bdf9e333a00178c35 Mon Sep 17 00:00:00 2001 From: Ted Vilutis <69260340+tedvilutis@users.noreply.github.com> Date: Tue, 17 Feb 2026 10:31:05 -0800 Subject: [PATCH 10/20] Update skills/fabric-lakehouse/SKILL.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- skills/fabric-lakehouse/SKILL.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/skills/fabric-lakehouse/SKILL.md b/skills/fabric-lakehouse/SKILL.md index e9b1dffdf..3076a34f9 100644 --- a/skills/fabric-lakehouse/SKILL.md +++ b/skills/fabric-lakehouse/SKILL.md @@ -45,7 +45,7 @@ Tables can be internal, when data is stored under "Tables" folder or external, w ### Schemas for tables in a Lakehouse -When creating a lakehouse user can choose to enable schemas. Schemas are used to organize Lakehouse tables. Schemas are implemented as folders under "Tables" folder and store tables inside of those folders. Default schema is "dbo" and it can't be deleted or renamed. All other schemas are optional and can be created, renamed, or deleted. User can reference schema located in other lakehouse using Schema Shortcut that way referencing all tables with one shortcut that are at the destination schema. +When creating a lakehouse, users can choose to enable schemas. Schemas are used to organize Lakehouse tables. Schemas are implemented as folders under the "Tables" folder and store tables inside of those folders. The default schema is "dbo" and it can't be deleted or renamed. All other schemas are optional and can be created, renamed, or deleted. Users can reference a schema located in another lakehouse using a Schema Shortcut, thereby referencing all tables in the destination schema with a single shortcut. ### Files in a Lakehouse From e0c7e411fd22c8096d9ee89e95fa697a2791d516 Mon Sep 17 00:00:00 2001 From: Ted Vilutis <69260340+tedvilutis@users.noreply.github.com> Date: Tue, 17 Feb 2026 10:31:26 -0800 Subject: [PATCH 11/20] Update skills/fabric-lakehouse/SKILL.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- skills/fabric-lakehouse/SKILL.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/skills/fabric-lakehouse/SKILL.md b/skills/fabric-lakehouse/SKILL.md index 3076a34f9..d41c7bb3e 100644 --- a/skills/fabric-lakehouse/SKILL.md +++ b/skills/fabric-lakehouse/SKILL.md @@ -94,7 +94,7 @@ Tables can also be optimized using the OPTIMIZE command, which compacts small fi ## Lineage -Lakehouse item supports lineage, which allows users to track the origin and transformations of data. Lineage information is automatically captured for tables and files in Lakehouse, showing how data flows from source to destination. This helps with debugging, auditing, and understanding data dependencies. +The Lakehouse item supports lineage, which allows users to track the origin and transformations of data. Lineage information is automatically captured for tables and files in Lakehouse, showing how data flows from source to destination. This helps with debugging, auditing, and understanding data dependencies. 
## PySpark Code Examples

From 5217b166261759e42a764f853547 Mon Sep 17 00:00:00 2001
From: Ted Vilutis <69260340+tedvilutis@users.noreply.github.com>
Date: Tue, 17 Feb 2026 10:32:02 -0800
Subject: [PATCH 12/20] Update skills/fabric-lakehouse/SKILL.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---
skills/fabric-lakehouse/SKILL.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/skills/fabric-lakehouse/SKILL.md b/skills/fabric-lakehouse/SKILL.md
index d41c7bb3e..94ae645f2 100644
--- a/skills/fabric-lakehouse/SKILL.md
+++ b/skills/fabric-lakehouse/SKILL.md
@@ -41,7 +41,7 @@ Lakehouse in Microsoft Fabric is an item that gives users a place to store their
### Tabular data in a Lakehouse
Tabular data in a form of tables are stored under "Tables" folder. Main format for tables in Lakehouse is Delta. Lakehouse can store tabular data in other formats like CSV or Parquet, these formats only available for Spark querying.
-Tables can be internal, when data is stored under "Tables" folder or external, when only reference to a table is stored under "Tables" folder but the data itself is stored in a referenced location. Referencing tables are done through Shortcuts, which can be internal, pointing to other location in Fabric, or external pointing to data stored outside of Fabric.
+Tables can be internal, when data is stored under "Tables" folder, or external, when only reference to a table is stored under "Tables" folder but the data itself is stored in a referenced location. Tables are referenced through Shortcuts, which can be internal (pointing to another location in Fabric) or external (pointing to data stored outside of Fabric).
### Schemas for tables in a Lakehouse

From c789c498f819487af2f88cc04f946fe7603dbee2 Mon Sep 17 00:00:00 2001
From: Ted Vilutis <69260340+tedvilutis@users.noreply.github.com>
Date: Tue, 17 Feb 2026 10:32:28 -0800
Subject: [PATCH 13/20] Update skills/fabric-lakehouse/SKILL.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---
skills/fabric-lakehouse/SKILL.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/skills/fabric-lakehouse/SKILL.md b/skills/fabric-lakehouse/SKILL.md
index 94ae645f2..8ecb57566 100644
--- a/skills/fabric-lakehouse/SKILL.md
+++ b/skills/fabric-lakehouse/SKILL.md
@@ -40,7 +40,7 @@ Lakehouse in Microsoft Fabric is an item that gives users a place to store their
### Tabular data in a Lakehouse
-Tabular data in a form of tables are stored under "Tables" folder. Main format for tables in Lakehouse is Delta. Lakehouse can store tabular data in other formats like CSV or Parquet, these formats only available for Spark querying.
+Tabular data in the form of tables is stored under the "Tables" folder. The main format for tables in Lakehouse is Delta. Lakehouse can also store tabular data in other formats like CSV or Parquet, but these formats are only available for Spark querying.
Tables can be internal, when data is stored under "Tables" folder, or external, when only reference to a table is stored under "Tables" folder but the data itself is stored in a referenced location. Tables are referenced through Shortcuts, which can be internal (pointing to another location in Fabric) or external (pointing to data stored outside of Fabric).
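A small sketch of what this distinction looks like from a notebook; my_delta_table comes from the examples in references/pyspark.md, while Tables/raw_events is a hypothetical non-Delta table folder resolved against the default lakehouse:

```python
# A Delta table is visible to Spark, the SQL analytics endpoint, and the semantic model
df = spark.read.table("my_delta_table")

# A Parquet- or CSV-format table can only be queried from Spark, by path
# ("Tables/raw_events" is a hypothetical non-Delta table folder)
raw_df = spark.read.format("parquet").load("Tables/raw_events")
```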
### Schemas for tables in a Lakehouse From 6707f34db2446a8c22983c9f5c71e3f360087f41 Mon Sep 17 00:00:00 2001 From: Ted Vilutis <69260340+tedvilutis@users.noreply.github.com> Date: Tue, 17 Feb 2026 10:32:43 -0800 Subject: [PATCH 14/20] Update skills/fabric-lakehouse/SKILL.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- skills/fabric-lakehouse/SKILL.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/skills/fabric-lakehouse/SKILL.md b/skills/fabric-lakehouse/SKILL.md index 8ecb57566..36fbeccd9 100644 --- a/skills/fabric-lakehouse/SKILL.md +++ b/skills/fabric-lakehouse/SKILL.md @@ -20,7 +20,7 @@ Use this skill when you need to: ### What is a Lakehouse? -Lakehouse in Microsoft Fabric is an item that gives users a place to store their tabular, like tables, and non-tabular, like files, data. It combines the flexibility of a data lake with the management capabilities of a data warehouse. It provides: +Lakehouse in Microsoft Fabric is an item that gives users a place to store their tabular data (like tables) and non-tabular data (like files). It combines the flexibility of a data lake with the management capabilities of a data warehouse. It provides: - **Unified storage** in OneLake for structured and unstructured data - **Delta Lake format** for ACID transactions, versioning, and time travel From c8d171875ef6f1ab4d12b6915309c928f4643bb7 Mon Sep 17 00:00:00 2001 From: Ted Vilutis <69260340+tedvilutis@users.noreply.github.com> Date: Tue, 17 Feb 2026 10:47:58 -0800 Subject: [PATCH 15/20] Refine description of Fabric Lakehouse skill Updated the description to provide clearer context and details about the Fabric Lakehouse skill, including its features and support for users. --- skills/fabric-lakehouse/SKILL.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/skills/fabric-lakehouse/SKILL.md b/skills/fabric-lakehouse/SKILL.md index 36fbeccd9..33f3ed3c3 100644 --- a/skills/fabric-lakehouse/SKILL.md +++ b/skills/fabric-lakehouse/SKILL.md @@ -1,6 +1,6 @@ --- name: fabric-lakehouse -description: 'Provide definition and context about Fabric Lakehouse and its capabilities for software systems and AI-powered features. Help users design, build, and optimize Lakehouse solutions using best practices.' +description: 'Use this skill to get context about Fabric Lakehouse and its features for software systems and AI-powered functions. It offers descriptions of Lakehouse data components, organization with schemas and shortcuts, access control, and code examples. This skill supports users in designing, building, and optimizing Lakehouse solutions using best practices.' metadata: author: tedvilutis version: "1.0" From 41b34b1bb24e952c43208d657ada91c659c759c7 Mon Sep 17 00:00:00 2001 From: Ted Vilutis <69260340+tedvilutis@users.noreply.github.com> Date: Tue, 17 Feb 2026 10:53:52 -0800 Subject: [PATCH 16/20] Update skills/fabric-lakehouse/SKILL.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- skills/fabric-lakehouse/SKILL.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/skills/fabric-lakehouse/SKILL.md b/skills/fabric-lakehouse/SKILL.md index 33f3ed3c3..a5a13fc1e 100644 --- a/skills/fabric-lakehouse/SKILL.md +++ b/skills/fabric-lakehouse/SKILL.md @@ -53,7 +53,7 @@ Files are stored under "Files" folder. Users can create folders and subfolders t ### Fabric Materialized Views -Set of pre-computed tables that are automatically updated based on schedule. 
They provide fast query performance for complex aggregations and joins. Materialized views are defined using PySpark or Spark SQL stored in associated Notebook. +Set of pre-computed tables that are automatically updated based on schedule. They provide fast query performance for complex aggregations and joins. Materialized views are defined using PySpark or Spark SQL and stored in an associated Notebook. ### Spark Views From 4b7ad7108647cd32d28bdc8c2f8bd41a4f6fae0b Mon Sep 17 00:00:00 2001 From: Ted Vilutis Date: Tue, 17 Feb 2026 10:55:46 -0800 Subject: [PATCH 17/20] Update README.skills.md --- docs/README.skills.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/README.skills.md b/docs/README.skills.md index e08df43f7..f1d01853c 100644 --- a/docs/README.skills.md +++ b/docs/README.skills.md @@ -35,7 +35,7 @@ Skills differ from other primitives by supporting bundled assets (scripts, code | [copilot-sdk](../skills/copilot-sdk/SKILL.md) | Build agentic applications with GitHub Copilot SDK. Use when embedding AI agents in apps, creating custom tools, implementing streaming responses, managing sessions, connecting to MCP servers, or creating custom agents. Triggers on Copilot SDK, GitHub SDK, agentic app, embed Copilot, programmable agent, MCP server, custom agent. | None | | [create-web-form](../skills/create-web-form/SKILL.md) | Create robust, accessible web forms with best practices for HTML structure, CSS styling, JavaScript interactivity, form validation, and server-side processing. Use when asked to "create a form", "build a web form", "add a contact form", "make a signup form", or when building any HTML form with data handling. Covers PHP and Python backends, MySQL database integration, REST APIs, XML data exchange, accessibility (ARIA), and progressive web apps. | `references/accessibility.md`
`references/aria-form-role.md`
`references/css-styling.md`
`references/form-basics.md`
`references/form-controls.md`
`references/form-data-handling.md`
`references/html-form-elements.md`
`references/html-form-example.md`
`references/hypertext-transfer-protocol.md`
`references/javascript.md`
`references/php-cookies.md`
`references/php-forms.md`
`references/php-json.md`
`references/php-mysql-database.md`
`references/progressive-web-app.md`
`references/python-as-web-framework.md`
`references/python-contact-form.md`
`references/python-flask-app.md`
`references/python-flask.md`
`references/security.md`
`references/styling-web-forms.md`
`references/web-api.md`
`references/web-performance.md`
`references/xml.md` | | [excalidraw-diagram-generator](../skills/excalidraw-diagram-generator/SKILL.md) | Generate Excalidraw diagrams from natural language descriptions. Use when asked to "create a diagram", "make a flowchart", "visualize a process", "draw a system architecture", "create a mind map", or "generate an Excalidraw file". Supports flowcharts, relationship diagrams, mind maps, and system architecture diagrams. Outputs .excalidraw JSON files that can be opened directly in Excalidraw. | `references/element-types.md`
`references/excalidraw-schema.md`
`scripts/.gitignore`
`scripts/README.md`
`scripts/add-arrow.py`
`scripts/add-icon-to-diagram.py`
`scripts/split-excalidraw-library.py`
`templates/business-flow-swimlane-template.excalidraw`
`templates/class-diagram-template.excalidraw`
`templates/data-flow-diagram-template.excalidraw`
`templates/er-diagram-template.excalidraw`
`templates/flowchart-template.excalidraw`
`templates/mindmap-template.excalidraw`
`templates/relationship-template.excalidraw`
`templates/sequence-diagram-template.excalidraw` | -| [fabric-lakehouse](../skills/fabric-lakehouse/SKILL.md) | Provide definition and context about Fabric Lakehouse and its capabilities for software systems and AI-powered features. Help users design, build, and optimize Lakehouse solutions using best practices. | `references/getdata.md`
`references/pyspark.md` | +| [fabric-lakehouse](../skills/fabric-lakehouse/SKILL.md) | Use this skill to get context about Fabric Lakehouse and its features for software systems and AI-powered functions. It offers descriptions of Lakehouse data components, organization with schemas and shortcuts, access control, and code examples. This skill supports users in designing, building, and optimizing Lakehouse solutions using best practices. | `references/getdata.md`
`references/pyspark.md` | | [finnish-humanizer](../skills/finnish-humanizer/SKILL.md) | Detect and remove AI-generated markers from Finnish text, making it sound like a native Finnish speaker wrote it. Use when asked to "humanize", "naturalize", or "remove AI feel" from Finnish text, or when editing .md/.txt files containing Finnish content. Identifies 26 patterns (12 Finnish-specific + 14 universal) and 4 style markers. | `references/patterns.md` | | [gh-cli](../skills/gh-cli/SKILL.md) | GitHub CLI (gh) comprehensive reference for repositories, issues, pull requests, Actions, projects, releases, gists, codespaces, organizations, extensions, and all GitHub operations from the command line. | None | | [git-commit](../skills/git-commit/SKILL.md) | Execute git commit with conventional commit message analysis, intelligent staging, and message generation. Use when user asks to commit changes, create a git commit, or mentions "/commit". Supports: (1) Auto-detecting type and scope from changes, (2) Generating conventional commit messages from diff, (3) Interactive commit with optional type/scope/description overrides, (4) Intelligent file staging for logical grouping | None | From 178fed8bb17b06708c8ed8486105967d39616649 Mon Sep 17 00:00:00 2001 From: Ted Vilutis <69260340+tedvilutis@users.noreply.github.com> Date: Tue, 17 Feb 2026 10:59:51 -0800 Subject: [PATCH 18/20] Update skills/fabric-lakehouse/references/pyspark.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- skills/fabric-lakehouse/references/pyspark.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/skills/fabric-lakehouse/references/pyspark.md b/skills/fabric-lakehouse/references/pyspark.md index 9c68c9a21..8eae36e4f 100644 --- a/skills/fabric-lakehouse/references/pyspark.md +++ b/skills/fabric-lakehouse/references/pyspark.md @@ -136,7 +136,7 @@ VACUUM silver_transactions RETAIN 168 HOURS ### Incremental Load Pattern ```python -from pyspark.sql.functions import col, max as spark_max +from pyspark.sql.functions import col # Get last processed watermark last_watermark = spark.sql(""" From 0de738c30c426f85a54cfa51c3d1ebfaddd1c478 Mon Sep 17 00:00:00 2001 From: Ted Vilutis <69260340+tedvilutis@users.noreply.github.com> Date: Tue, 17 Feb 2026 11:18:28 -0800 Subject: [PATCH 19/20] Update skills/fabric-lakehouse/SKILL.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- skills/fabric-lakehouse/SKILL.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/skills/fabric-lakehouse/SKILL.md b/skills/fabric-lakehouse/SKILL.md index a5a13fc1e..729caef53 100644 --- a/skills/fabric-lakehouse/SKILL.md +++ b/skills/fabric-lakehouse/SKILL.md @@ -53,7 +53,7 @@ Files are stored under "Files" folder. Users can create folders and subfolders t ### Fabric Materialized Views -Set of pre-computed tables that are automatically updated based on schedule. They provide fast query performance for complex aggregations and joins. Materialized views are defined using PySpark or Spark SQL and stored in an associated Notebook. +Set of pre-computed tables that are automatically updated based on a schedule. They provide fast query performance for complex aggregations and joins. Materialized views are defined using PySpark or Spark SQL and stored in an associated Notebook. 
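As a rough sketch of what such a Notebook definition might look like; the materialized-view DDL below and the gold_daily_sales and silver_orders names are illustrative assumptions, so check the current Fabric documentation for the exact syntax:

```python
# Hypothetical sketch: define a materialized view over a silver table
# (the CREATE MATERIALIZED LAKE VIEW statement is an assumption, not confirmed syntax)
spark.sql("""
    CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS gold_daily_sales
    AS
    SELECT order_date, SUM(amount) AS total_amount
    FROM silver_orders
    GROUP BY order_date
""")
```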
### Spark Views From 3b907f7748b078ab5d130f45e94cc1df1dbdf2e9 Mon Sep 17 00:00:00 2001 From: Ted Vilutis <69260340+tedvilutis@users.noreply.github.com> Date: Tue, 17 Feb 2026 11:18:43 -0800 Subject: [PATCH 20/20] Update skills/fabric-lakehouse/SKILL.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- skills/fabric-lakehouse/SKILL.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/skills/fabric-lakehouse/SKILL.md b/skills/fabric-lakehouse/SKILL.md index 729caef53..4227990a3 100644 --- a/skills/fabric-lakehouse/SKILL.md +++ b/skills/fabric-lakehouse/SKILL.md @@ -77,7 +77,7 @@ Shortcuts create virtual links to data without copying: ### Types of Shortcuts - **Internal**: Link to other Fabric Lakehouses/tables, cross-workspace data sharing -- **ADLS Gen2**: Azure Data Lake Storage Gen2 external Azure storage +- **ADLS Gen2**: Link to ADLS Gen2 containers in Azure - **Amazon S3**: AWS S3 buckets, cross-cloud data access - **Dataverse**: Microsoft Dataverse, business application data - **Google Cloud Storage**: GCS buckets, cross-cloud data access
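Once created, a shortcut surfaces like any other table or folder in the lakehouse, so Spark reads it transparently. A minimal sketch with hypothetical names, customers_s3 standing in for a table shortcut to an S3 bucket and Files/external_landing for a file shortcut:

```python
# A Delta table shortcut under Tables/ reads exactly like a local table
# ("customers_s3" is a hypothetical shortcut name)
df = spark.read.table("customers_s3")

# A file shortcut under Files/ is read by path, the same as local files
events_df = spark.read.format("parquet").load("Files/external_landing/events")
```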