From efda01284a979f7d30b229968f965462c882a546 Mon Sep 17 00:00:00 2001
From: Rituparna Khaund <ritukhau@amazon.co.uk>
Date: Fri, 29 May 2026 22:34:57 +0000
Subject: [PATCH 1/2] s3: document format=parquet option and page-level
 compression

Update S3 output plugin documentation to reflect the new format=parquet
option that separates output format selection from byte-level compression.

Documents:
- New parquet value for the format option
- Page-level codec selection through the compression option when format
  is parquet
- Migration path from deprecated compression=parquet syntax
- Configuration examples with and without page-level compression
- Updated existing parquet examples to use new syntax

Related code PR: https://github.com/fluent/fluent-bit/pull/11885

Signed-off-by: Rituparna Khaund <ritukhau@amazon.co.uk>
---
 pipeline/outputs/s3.md | 91 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 86 insertions(+), 5 deletions(-)

diff --git a/pipeline/outputs/s3.md b/pipeline/outputs/s3.md
index 74046e0e0..208c0272f 100644
--- a/pipeline/outputs/s3.md
+++ b/pipeline/outputs/s3.md
@@ -46,12 +46,12 @@ The [Prometheus success/retry/error metrics values](../../administration/monitor
 | `blob_database_file` | Absolute path to a database file to be used to store blob files contexts. | _none_ |
 | `bucket` | S3 bucket name. | _none_ |
 | `canned_acl` | [Predefined Canned ACL policy](https://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl) for S3 objects. | _none_ |
-| `compression` | Compression type for S3 objects. Supported values: `gzip`, `zstd`, `snappy`. `arrow` and `parquet` are also available if Apache Arrow was enabled at compile time. See [Compression](#compression). | _none_                                    |
+| `compression` | Compression type for S3 objects. Supported values: `gzip`, `zstd`, `snappy`, `arrow`. When `format` is set to `parquet`, this controls the page-level codec inside the Parquet file (supported: `snappy`, `zstd`, `gzip`). `compression=parquet` is deprecated; use `format parquet` instead. See [Compression](#compression). | _none_                                    |
 | `content_type` | A standard MIME type for the S3 object, set as the Content-Type HTTP header. | _none_ |
 | `endpoint` | Custom endpoint for the S3 API. Endpoints can contain scheme and port. | _none_ |
 | `external_id` | Specify an external ID for the STS API. Can be used with the `role_arn` parameter if your role requires an external ID. | _none_ |
 | `file_delivery_attempt_limit` | File delivery attempt limit. | `1` |
-| `format` | Set the record output format. Supported values: `json_lines`, `otlp_json`. When set to `otlp_json`, the `log_key` option isn't supported and only `logs` event chunks are converted. | `json_lines` |
+| `format` | Set the output format. Supported values: `json_lines`, `otlp_json`, `parquet`. When set to `parquet`, records are converted to Apache Parquet columnar format (requires Apache Arrow Parquet support at compile time). The `compression` option controls the page-level codec inside the Parquet file. When set to `otlp_json`, the `log_key` option isn't supported and only `logs` event chunks are converted. | `json_lines` |
 | `host` | IP address or hostname of the target HTTP server. | `127.0.0.1` |
 | `json_date_format` | Specify the format of the date. Accepted values: `double`, `epoch`, `epoch_ms`, `iso8601` (2018-05-30T09:39:52.000681Z), `_java_sql_timestamp_` (2018-05-30 09:39:52.000681). | _none_ |
 | `json_date_key` | Specify the name of the date key in the output record. To disable the time key, set the value to `false`. | `date` |
@@ -128,6 +128,85 @@ Fluent Bit compresses data before uploading to S3. Consumers must decompress the
 
 {% endhint %}
 
+## Parquet format
+
+Setting `format` to `parquet` converts log records to Apache Parquet columnar format before uploading to S3. Parquet files are directly queryable by Athena, Spark, and Presto without additional transformation.
+
+The `compression` option controls the page-level codec applied inside the Parquet file:
+
+| `compression` value | Parquet page codec | Notes |
+|---------------------|-------------------|-------|
+| `snappy` | Snappy | Fast, moderate compression ratio. Industry standard default. |
+| `zstd` | Zstandard | Better ratio, slightly slower. |
+| `gzip` | Gzip | Best ratio, slowest. |
+| _(unset)_ | Uncompressed | No page-level compression. |
+
+{% hint style="info" %}
+
+`format parquet` requires `use_put_object On`. Multipart uploads are not supported with Parquet format.
+
+{% endhint %}
+
+### Example: Parquet with Snappy compression
+
+```yaml
+pipeline:
+  outputs:
+    - name: s3
+      match: '*'
+      bucket: my-bucket
+      region: us-east-1
+      format: parquet
+      compression: snappy
+      use_put_object: on
+      upload_timeout: 60s
+      total_file_size: 50M
+      s3_key_format: '/logs/dt=%Y-%m-%d/h=%H/$UUID.parquet'
+```
+
+### Example: Parquet without page-level compression
+
+```yaml
+pipeline:
+  outputs:
+    - name: s3
+      match: '*'
+      bucket: my-bucket
+      region: us-east-1
+      format: parquet
+      use_put_object: on
+      upload_timeout: 60s
+      s3_key_format: '/logs/dt=%Y-%m-%d/h=%H/$UUID.parquet'
+```
+
+### Migrating from `compression=parquet`
+
+The `compression=parquet` syntax is deprecated. To migrate:
+
+**Before (deprecated):**
+
+```yaml
+compression: parquet
+```
+
+**After (recommended):**
+
+```yaml
+format: parquet
+compression: snappy
+```
+
+The deprecated syntax continues to work but produces Parquet files with uncompressed pages and emits a warning at startup.
+
+### Build requirements
+
+Parquet format requires Apache Arrow Parquet support at compile time:
+
+- CMake flag: `-DFLB_ARROW=On`
+- System packages: `arrow-glib-devel` and `parquet-glib-devel`
+
+The `AWS for Fluent Bit` version 3 container image includes these dependencies by default.
+
 ## Permissions
 
 The plugin requires the following AWS IAM permissions:
@@ -694,7 +773,7 @@ pipeline:
 {% endtab %}
 {% endtabs %}
 
-Setting `Compression` to `arrow` makes Fluent Bit convert payload into Apache Arrow format.
+Setting `compression` to `arrow` converts the payload to Apache Arrow (Feather) format. For Parquet output, use `format parquet` instead.
 
 Load, analyze, and process stored data using popular data processing tools such as Python pandas, Apache Spark and Tensorflow.
 
@@ -766,7 +845,8 @@ pipeline:
       region:          us-east-2
       bucket:          <your_testing_bucket>
       use_put_object:  On
-      compression:     parquet
+      format:          parquet
+      compression:     snappy
       # other parameters
 ```
 
@@ -791,7 +871,8 @@ pipeline:
     Region          us-east-2
     Bucket          <your_testing_bucket>
     Use_Put_Object    On
-    Compression     parquet
+    Format          parquet
+    Compression     snappy
     # other parameters
 ```
 

From 546958f8d9386f5ca29410ffee3509d3f6f1a77e Mon Sep 17 00:00:00 2001
From: "Eric D. Schabell" <eric@schabell.org>
Date: Mon, 1 Jun 2026 08:26:24 +0200
Subject: [PATCH 2/2] docs: pipeline: outputs: s3: fix Vale spelling and
 contraction suggestions

  - Wrap `codec` in backticks to fix a Vale spelling issue
  - Replace "are not" with "aren't" to fix a Vale contraction issue

  Applies to #2591

Signed-off-by: Eric D. Schabell <eric@schabell.org>
---
 pipeline/outputs/s3.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/pipeline/outputs/s3.md b/pipeline/outputs/s3.md
index 208c0272f..7dd2023b5 100644
--- a/pipeline/outputs/s3.md
+++ b/pipeline/outputs/s3.md
@@ -46,12 +46,12 @@ The [Prometheus success/retry/error metrics values](../../administration/monitor
 | `blob_database_file` | Absolute path to a database file to be used to store blob files contexts. | _none_ |
 | `bucket` | S3 bucket name. | _none_ |
 | `canned_acl` | [Predefined Canned ACL policy](https://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl) for S3 objects. | _none_ |
-| `compression` | Compression type for S3 objects. Supported values: `gzip`, `zstd`, `snappy`, `arrow`. When `format` is set to `parquet`, this controls the page-level codec inside the Parquet file (supported: `snappy`, `zstd`, `gzip`). `compression=parquet` is deprecated; use `format parquet` instead. See [Compression](#compression). | _none_                                    |
+| `compression` | Compression type for S3 objects. Supported values: `gzip`, `zstd`, `snappy`, `arrow`. When `format` is set to `parquet`, this controls the page-level `codec` inside the Parquet file (supported: `snappy`, `zstd`, `gzip`). `compression=parquet` is deprecated; use `format parquet` instead. See [Compression](#compression). | _none_                                    |
 | `content_type` | A standard MIME type for the S3 object, set as the Content-Type HTTP header. | _none_ |
 | `endpoint` | Custom endpoint for the S3 API. Endpoints can contain scheme and port. | _none_ |
 | `external_id` | Specify an external ID for the STS API. Can be used with the `role_arn` parameter if your role requires an external ID. | _none_ |
 | `file_delivery_attempt_limit` | File delivery attempt limit. | `1` |
-| `format` | Set the output format. Supported values: `json_lines`, `otlp_json`, `parquet`. When set to `parquet`, records are converted to Apache Parquet columnar format (requires Apache Arrow Parquet support at compile time). The `compression` option controls the page-level codec inside the Parquet file. When set to `otlp_json`, the `log_key` option isn't supported and only `logs` event chunks are converted. | `json_lines` |
+| `format` | Set the output format. Supported values: `json_lines`, `otlp_json`, `parquet`. When set to `parquet`, records are converted to Apache Parquet columnar format (requires Apache Arrow Parquet support at compile time). The `compression` option controls the page-level `codec` inside the Parquet file. When set to `otlp_json`, the `log_key` option isn't supported and only `logs` event chunks are converted. | `json_lines` |
 | `host` | IP address or hostname of the target HTTP server. | `127.0.0.1` |
 | `json_date_format` | Specify the format of the date. Accepted values: `double`, `epoch`, `epoch_ms`, `iso8601` (2018-05-30T09:39:52.000681Z), `_java_sql_timestamp_` (2018-05-30 09:39:52.000681). | _none_ |
 | `json_date_key` | Specify the name of the date key in the output record. To disable the time key, set the value to `false`. | `date` |
@@ -132,9 +132,9 @@ Fluent Bit compresses data before uploading to S3. Consumers must decompress the
 
 Setting `format` to `parquet` converts log records to Apache Parquet columnar format before uploading to S3. Parquet files are directly queryable by Athena, Spark, and Presto without additional transformation.
 
-The `compression` option controls the page-level codec applied inside the Parquet file:
+The `compression` option controls the page-level `codec` applied inside the Parquet file:
 
-| `compression` value | Parquet page codec | Notes |
+| `compression` value | Parquet page `codec` | Notes |
 |---------------------|-------------------|-------|
 | `snappy` | Snappy | Fast, moderate compression ratio. Industry standard default. |
 | `zstd` | Zstandard | Better ratio, slightly slower. |
@@ -143,7 +143,7 @@ The `compression` option controls the page-level codec applied inside the Parque
 
 {% hint style="info" %}
 
-`format parquet` requires `use_put_object On`. Multipart uploads are not supported with Parquet format.
+`format parquet` requires `use_put_object On`. Multipart uploads aren't supported with Parquet format.
 
 {% endhint %}