1 change: 1 addition & 0 deletions .github/pull_request_template.md
@@ -36,6 +36,7 @@ Fixes #ISSUE_Number
- [ ] Followed [contribution guide](https://cloudberry.apache.org/contribute/code)
- [ ] Added/updated documentation
- [ ] Reviewed code for security implications
- [ ] This PR contains AI-assisted code generation
- [ ] Requested review from [cloudberry committers](https://github.com/orgs/apache/teams/cloudberry-committers)

### Additional Context
4 changes: 4 additions & 0 deletions .gitmessage
@@ -35,6 +35,10 @@ Add your commit body here
# Discussions, please list them as a reference:
#See: Issue#id <https://github.com/apache/cloudberry/issues/?>?
#See: Discussion#id <http://github.com/apache/cloudberry/discussions/>?
# If AI tools substantially assisted in writing this commit, optionally
# note which tool(s) were used (one line per tool):
#Assisted-by: ChatGPT
#Assisted-by: GitHub Copilot
########################################################################
#
#
297 changes: 297 additions & 0 deletions AGENTS.md.template
@@ -0,0 +1,297 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# AGENTS.md

Guidance for agent-style coding tools working in the Apache
Cloudberry repository.

## Project overview

Apache Cloudberry is an Apache Incubator project and an
open-source massively parallel processing (MPP) database. It
evolved from Greenplum Database and is built on a modern
PostgreSQL kernel. It is used for data warehousing, large-scale
analytics, and AI/ML workloads.

Treat this repository as a database system, not as a typical
application project. Small changes can affect SQL semantics,
query planning, storage, distributed execution, management
tooling, upgrade behavior, and user data safety.

## Core principles for agents

- Keep changes as small and direct as possible.
- Do not perform broad code refactoring. Cloudberry's core is
PostgreSQL-based, and unnecessary refactoring makes familiar
code harder for maintainers to recognize and review.
- Preserve PostgreSQL and Cloudberry coding style in the area
being edited.
- Prefer localized fixes over architecture rewrites unless
explicitly requested.
- Read surrounding code before editing. Match existing naming,
memory management, error handling, locking, and test
patterns.
- Do not generate or import code under an incompatible license.
The project is licensed under the Apache License 2.0.
- Never treat AI output as automatically correct. The
contributor owns the final code.

## Repository map

- [README.md](README.md) — project introduction, community
links, contribution overview, and license information.
- [CONTRIBUTING.md](CONTRIBUTING.md) — contribution
expectations and community guidance.
- [AI_GUIDELINE.md](AI_GUIDELINE.md) — rules for AI-assisted
development.
- [SECURITY.md](SECURITY.md) — security reporting policy.
- [.gitmessage](.gitmessage) — commit message template with
title, body, and trailer conventions.
- [.github/pull_request_template.md](.github/pull_request_template.md)
— PR checklist, test plan, impact, and AI disclosure
checkbox.
- [src/](src/) — database source tree, including
PostgreSQL-derived backend, frontend utilities, interfaces,
tests, and build integration.
- [src/backend/](src/backend/) — main database backend.
Important areas include parser, optimizer, executor,
storage, catalog, commands, postmaster, replication, and
Cloudberry distributed components.
- [src/backend/cdb/](src/backend/cdb/) — distributed database
logic, including dispatch, gangs, motion, and MPP behavior.
- [src/backend/gporca/](src/backend/gporca/) and
[src/backend/gpopt/](src/backend/gpopt/) — ORCA top-down optimizer
integration and optimizer-related code.
- [src/common/](src/common/) — code shared by backend and
frontend utilities.
- [src/interfaces/](src/interfaces/) — client interfaces such
as libpq, ECPG, and GPPC.
- [src/test/](src/test/) — regression, isolation, unit, and
integration test infrastructure.
- [gpMgmt/](gpMgmt/) — Python management utilities and
cluster administration tooling.
- [gpAux/](gpAux/) — auxiliary scripts, demo cluster support,
packaging, and build helpers.
- [gpcontrib/](gpcontrib/) — Cloudberry-related extensions and
contributed modules.
- [contrib/](contrib/) — PostgreSQL-style contributed modules
and Cloudberry-specific extensions.
- [doc/](doc/) — SGML documentation sources.
- [devops/](devops/) — Docker, automation, sandbox, and
build/deployment helper scripts.
- [mcp-server/](mcp-server/) — MCP server for AI-ready
Cloudberry database interaction.

## Architecture notes

Cloudberry follows a PostgreSQL-style source layout with
additional MPP database components inherited from Greenplum.
The coordinator receives SQL, plans it with either the
PostgreSQL-based planner or the ORCA optimizer, dispatches work
to segments, and collects results. Segment processes execute
their portions of the plan and exchange data through the
interconnect.
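
The following toy Python model makes the coordinator/segment split concrete with one simple distributed aggregate. It is not Cloudberry code. The modulo "hash", the three-segment cluster, and all function names are hypothetical. Real plans may also redistribute partial results between segments before gathering them. This sketch shows only the simplest scatter/gather shape.

```python
"""Toy model of coordinator/segment execution (illustration only).

Not Cloudberry code. Rows are spread across hypothetical segments by a
distribution key, each segment computes a partial SUM ... GROUP BY, and
the coordinator merges the partial results.
"""
from collections import Counter

NUM_SEGMENTS = 3  # hypothetical cluster size


def distribute(rows, key_index, num_segments=NUM_SEGMENTS):
    """Assign each row to a segment; modulo stands in for the real hash."""
    segments = [[] for _ in range(num_segments)]
    for row in rows:
        segments[row[key_index] % num_segments].append(row)
    return segments


def segment_partial_sum(rows):
    """Segment-side work: partial SUM(amount) GROUP BY region."""
    partial = Counter()
    for _customer_id, region, amount in rows:
        partial[region] += amount
    return partial


def coordinator_final_sum(partials):
    """Coordinator-side work: add up the per-segment partial sums."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return dict(total)


if __name__ == "__main__":
    # (customer_id, region, amount); customer_id is the distribution key.
    rows = [(1, "east", 10), (2, "west", 5), (3, "east", 7), (4, "west", 1)]
    partials = [segment_partial_sum(s) for s in distribute(rows, key_index=0)]
    print(coordinator_final_sum(partials))  # {'east': 17, 'west': 6}
```

Each segment sees only its own rows. A change that looks correct on a single node can therefore still be wrong once the work is split this way. For the same reason, the working rules ask agents to check both coordinator and segment behavior.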

Key concepts agents should recognize:

- Coordinator and segments are separate roles in a distributed
database cluster.
- Query execution may involve dispatch, gangs, motion nodes,
distributed transactions, snapshots, and interconnect
behavior.
- Storage and catalog changes can affect upgrade, recovery,
visibility, and distributed consistency.
- PostgreSQL compatibility matters. Avoid changing behavior
that is inherited from PostgreSQL unless the task explicitly
targets Cloudberry divergence.
- Extensions under [gpcontrib/](gpcontrib/) and
[contrib/](contrib/) may have independent build or test
workflows.

## Working rules

1. Start by identifying the subsystem and reading nearby
files, tests, and documentation.
2. Prefer existing helpers, macros, memory contexts, error
reporting conventions, and test infrastructure.
3. Avoid unrelated formatting changes.
4. Avoid renaming symbols or moving files unless explicitly
required.
5. Do not silently change SQL-visible behavior, catalog
definitions, on-disk format, wire protocol, GUC behavior,
or user-facing messages.
6. If a change touches security-sensitive areas, call that out
clearly in the PR description and request appropriate human
review.
7. If a change touches distributed execution, verify whether
it affects both coordinator and segment behavior.
8. If a change touches management scripts, check Python
compatibility and the existing unit or `behave` tests.
9. If a change touches documentation, keep examples accurate
and consistent with project terminology.
10. If behavior is uncertain, add a small regression or unit
test rather than relying on assumptions.

## Build and test guidance

Use the smallest relevant validation first, then broader
validation when the change is ready.

Common validation entry points mentioned by project docs and
PR templates:

- Configure and build through the repository's standard build
flow or the automation in
[devops/README.md](devops/README.md).
- Use Docker-based development and sandbox workflows under
[devops/](devops/) when local system dependencies are not
available.
- Run `make installcheck` against a running cluster for
regression coverage when appropriate.
- Run `make -C src/test installcheck-cbdb-parallel` for
Cloudberry parallel regression coverage when appropriate.
- For extension-specific changes, run the extension's local
installcheck or documented test target.
- For management tooling under [gpMgmt/](gpMgmt/), inspect
the relevant README and test targets before selecting a test
command.

Do not invent successful test results. If tests are not run,
state that clearly in the final response or PR notes.
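
The hypothetical helper below illustrates "smallest relevant validation first" by mapping changed paths to the entry points listed above. Its rules are an assumption drawn only from this section; it is not a project tool. When an extension documents its own test target, use that target instead of the generic `make -C <dir> installcheck` this helper suggests.

```python
"""Hypothetical helper: suggest validation steps for changed paths.

Not part of the Cloudberry tree. The rules restate this section only;
check each subsystem's README or Makefile before running anything.
"""
from pathlib import PurePosixPath


def suggest_validation(changed_paths):
    """Return an ordered, de-duplicated list of suggested steps."""
    suggestions = []

    def add(step):
        if step not in suggestions:
            suggestions.append(step)

    for path in changed_paths:
        parts = PurePosixPath(path).parts
        if not parts:
            continue
        top = parts[0]
        if top in ("contrib", "gpcontrib") and len(parts) > 2:
            # Extension change: prefer the extension's own installcheck.
            add(f"make -C {top}/{parts[1]} installcheck")
        elif top == "gpMgmt":
            add("read the gpMgmt/ README and test targets first")
        elif top == "src":
            # Core change: regression, then Cloudberry parallel regression.
            add("make installcheck")
            add("make -C src/test installcheck-cbdb-parallel")
    return suggestions


if __name__ == "__main__":
    # "myext" is a hypothetical extension name.
    changed = ["src/backend/executor/execMain.c", "gpcontrib/myext/myext.c"]
    for step in suggest_validation(changed):
        print(step)
```

The helper returns suggestions in order and skips duplicates. Several files under `src/` therefore produce the two core regression commands only once.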

## AI-assisted contribution policy

Follow [AI_GUIDELINE.md](AI_GUIDELINE.md):

- AI-generated code has the same responsibility and quality
bar as human-written code.
- AI-assisted changes must pass normal review, testing, and CI
standards.
- The contributor must ensure license compatibility.
- Significant AI-generated code should be disclosed using the
PR template checkbox and optionally recorded with an
`Assisted-by:` trailer in the commit message.
- AI tools may assist with drafting responses, but
contributors should engage thoughtfully and personally with
reviewers.
- Include or verify tests for AI-generated code.
- Keep changes simple and avoid unnecessary code refactoring.
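
The `Assisted-by:` trailer is plain text, so it can be collected mechanically, for example when preparing the PR disclosure. The sketch below is a simplified, hypothetical parser. Git's own `git interpret-trailers --parse` implements the full trailer rules and should be preferred in practice.

```python
"""Sketch: collect AI tools named in Assisted-by: trailers.

Simplified and hypothetical: the last paragraph counts as the trailer
block only if every line in it looks like "Key: value". Prefer
`git interpret-trailers --parse` for Git's full rules.
"""
import re

TRAILER_RE = re.compile(r"^([A-Za-z0-9-]+):\s*(.+)$")


def assisted_by_tools(message):
    """Return the values of Assisted-by: trailers, in order."""
    # Drop '#' template comment lines, as Git's default editor cleanup does.
    kept = [line for line in message.splitlines() if not line.startswith("#")]
    paragraphs = re.split(r"\n\s*\n", "\n".join(kept).strip())
    if len(paragraphs) < 2:
        return []  # title only: there is no trailer block
    matches = [TRAILER_RE.match(line) for line in paragraphs[-1].splitlines()]
    if not all(matches):
        return []  # last paragraph is ordinary body text
    return [m.group(2).strip() for m in matches
            if m.group(1).lower() == "assisted-by"]
```

The parser ignores commented-out template lines such as `#Assisted-by: ChatGPT`. This mirrors Git's default cleanup when a message is composed in an editor.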

## Security policy

Follow [SECURITY.md](SECURITY.md):

- Do not report security vulnerabilities in public issues,
public mailing lists, or public forums.
- Send vulnerability reports to security@apache.org.
- For normal non-security bugs, use GitHub Issues,
Discussions, the dev mailing list, or Slack.

When working as an agent, do not expose secrets, credentials,
private keys, database dumps with sensitive data, or
vulnerability details in public-facing output.

## Pull request expectations

Use [.github/pull_request_template.md](.github/pull_request_template.md)
as the checklist for final change summaries:

- Explain what the PR does.
- Identify the type of change.
- Document any breaking changes.
- Provide a test plan.
- Describe performance, user-facing, and dependency impact
when applicable.
- Confirm documentation updates when needed.
- Confirm that the change was reviewed for security implications.
- Disclose significant AI-assisted code generation.

## Commit conventions

- Add the standard Apache License header for newly created
files (not needed for third-party files).
- When drafting the commit message, use the
[.gitmessage](.gitmessage) template as a reference.
- Start the title with a prefix indicating the change type:
`Fix ...` for bug or typo fixes, `Feature: ...` for new
features, `Enhancement: ...` for code optimization,
`Doc: ...` for documentation changes. For other changes,
start with a capitalized verb in the imperative mood.
- Keep the title line to 50 characters or fewer. Do not end
it with a period.
- Leave a blank line between the title and the body.
- In the body, explain *what*, *why*, and *how*. Note any
compatibility issues. Wrap lines at 72 characters.
- Use optional trailers as needed: `Co-authored-by:`,
`Reported-by:`, `See:` (for GitHub Issues or Discussions
links), and `Assisted-by:` (for AI tool attribution).
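
The mechanical parts of these conventions can be checked automatically. The sketch below is a hypothetical helper, not a project tool. It checks:

- title length and a trailing period
- an initial capital letter
- a blank second line
- body wrapping at 72 characters

It also reports the change type implied by the title prefix. It cannot tell whether a verb is in the imperative mood.

```python
"""Sketch: check a commit message against the conventions above.

Hypothetical helper, not a project tool. It checks only mechanical
rules; it cannot tell whether a verb is in the imperative mood.
"""

# Title prefixes described above, mapped to the change type they signal.
PREFIXES = {
    "Fix ": "bug or typo fix",
    "Feature: ": "new feature",
    "Enhancement: ": "code optimization",
    "Doc: ": "documentation",
}


def change_type(title):
    """Return the change type implied by the title prefix."""
    for prefix, kind in PREFIXES.items():
        if title.startswith(prefix):
            return kind
    return "other (should start with an imperative verb)"


def check_commit_message(message):
    """Return a list of problems; an empty list means none were found."""
    # Ignore '#' template comment lines such as those in .gitmessage;
    # line numbers below count only the remaining lines.
    lines = [line for line in message.splitlines() if not line.startswith("#")]
    if not lines or not lines[0].strip():
        return ["missing title line"]

    problems = []
    title = lines[0]
    if len(title) > 50:
        problems.append(f"title is {len(title)} characters (max 50)")
    if title.endswith("."):
        problems.append("title ends with a period")
    if not title[0].isupper():
        problems.append("title should start with a capital letter")
    if len(lines) > 1 and lines[1].strip():
        problems.append("line 2 must be blank")
    for number, line in enumerate(lines[2:], start=3):
        if len(line) > 72:
            problems.append(f"line {number} is {len(line)} characters (max 72)")
    return problems


if __name__ == "__main__":
    message = "Fix crash in motion receiver\n\nExplain what, why, and how.\n"
    print(change_type(message.splitlines()[0]), check_commit_message(message))
    # bug or typo fix []
```

The helper returns a list of problems instead of raising on the first one. That lets a reviewer or agent fix every issue in a single pass.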

## Style expectations

- C code should follow the surrounding PostgreSQL or
Cloudberry style.
- Python code in [gpMgmt/](gpMgmt/) should follow nearby
management script patterns and existing test style.
- SQL tests should include expected output files when required
by the test framework.
- Documentation uses Markdown in many repository files and
SGML under [doc/src/sgml/](doc/src/sgml/).
- Prefer project terminology: Apache Cloudberry, coordinator,
segment, MPP, PostgreSQL kernel, Greenplum heritage.

## High-risk areas

Be especially conservative around:

- Catalog definitions and upgrade-sensitive files.
- Storage formats, WAL, recovery, transactions, snapshots,
and visibility.
- Planner, optimizer, executor, and motion/distributed
execution logic.
- Authentication, cryptography, TLS, network protocol, and
libpq behavior.
- Interconnect and dispatch paths.
- Cluster management commands that start, stop, expand,
recover, or reconfigure clusters.
- Public SQL behavior, GUCs, system views, and extension APIs.
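
One quick way to notice that a change has entered one of these areas is to compare changed paths against known subsystem directories. The prefix table below is an illustrative assumption based on the usual PostgreSQL and Greenplum source layout and the repository map above. It is not a project tool; verify it against the current tree before relying on it.

```python
"""Sketch: flag changed files that fall into the high-risk areas above.

The prefix table is an illustrative assumption based on the usual
PostgreSQL/Greenplum layout; verify it against the current tree.
"""
from pathlib import PurePosixPath

# Hypothetical mapping from path prefix to the high-risk area it touches.
HIGH_RISK_PREFIXES = {
    "src/include/catalog/": "catalog definitions",
    "src/backend/catalog/": "catalog",
    "src/backend/access/transam/": "WAL, recovery, transactions",
    "src/backend/storage/": "storage",
    "src/backend/optimizer/": "planner",
    "src/backend/gporca/": "ORCA optimizer",
    "src/backend/gpopt/": "ORCA optimizer integration",
    "src/backend/executor/": "executor",
    "src/backend/cdb/": "dispatch, gangs, motion (MPP)",
    "src/backend/libpq/": "authentication, TLS, protocol",
    "src/interfaces/libpq/": "libpq client behavior",
    "gpMgmt/": "cluster management commands",
}


def high_risk_areas(changed_paths):
    """Map each high-risk changed path to the areas it touches."""
    flagged = {}
    for path in changed_paths:
        # PurePosixPath collapses "./" and duplicate slashes.
        normalized = PurePosixPath(path).as_posix()
        areas = [area for prefix, area in HIGH_RISK_PREFIXES.items()
                 if normalized.startswith(prefix)]
        if areas:
            flagged[path] = areas
    return flagged
```

A non-empty result does not mean the change is wrong. It means the change deserves extra care, broader tests, and an explicit note in the PR description.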

## Recommended agent workflow

1. Restate the requested change in concrete terms.
2. Locate the smallest relevant subsystem.
3. Read nearby implementation and tests.
4. Plan a minimal change.
5. Edit only files required for the task.
6. Add or update tests when behavior changes.
7. Run the narrowest relevant tests available.
8. Summarize changed files, test results, and any risks or
follow-ups.

## What not to do

- Do not perform drive-by cleanup.
- Do not reformat unrelated code.
- Do not replace established PostgreSQL-style patterns with
modern alternatives just for preference.
- Do not change public behavior without tests and
documentation.
- Do not assume single-node behavior is enough for distributed
database changes.
- Do not fabricate command output, test results, issue links,
or reviewer decisions.