Commit 176fc92

modify Readme for ivydocumentdb
1 parent d679c2a commit 176fc92

1 file changed: README.md (188 additions & 87 deletions)
# Introduction

`ivydocumentdb` is an open-source project based on Microsoft DocumentDB and compatible with IvorySQL. It offers a native implementation of a document-oriented NoSQL database, enabling seamless CRUD (Create, Read, Update, Delete) operations on BSON (Binary JSON) data types within an IvorySQL framework. Beyond basic operations, ivydocumentdb empowers you to execute complex workloads, including full-text searches, geospatial queries, and vector search over your dataset, delivering robust functionality and flexibility for diverse data management needs.

[IvorySQL](https://docs.ivorysql.org/en/ivorysql-doc) is an advanced, fully featured, open-source, Oracle-compatible PostgreSQL distribution, with a firm commitment to always remain 100% compatible with, and a drop-in replacement for, the latest PostgreSQL.

[PostgreSQL](https://www.postgresql.org/about/) is a powerful, open-source object-relational database system that uses and extends the SQL language, combined with many features that safely store and scale the most complicated data workloads.

[DocumentDB](https://github.com/documentdb/documentdb) is the engine powering vCore-based Azure Cosmos DB for MongoDB.

## Components

The project comprises two primary components, which work together to support document operations.

- **pg_documentdb_core:** PostgreSQL extension introducing BSON datatype support and operations for native Postgres.
- **pg_documentdb:** The public API surface for DocumentDB, providing CRUD functionality on documents in the store.
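
As a rough sketch of how the two components fit together, both are enabled as extensions inside a database once the binaries are installed. The extension names below are an assumption based on the component names; verify them against the installed control files on your system.

```sql
-- Assumed extension names; check the installed .control files for the exact names.
CREATE EXTENSION IF NOT EXISTS documentdb_core;
CREATE EXTENSION IF NOT EXISTS documentdb;
```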
## Why DocumentDB ?
## Get Started

### Prerequisites

- Ensure [Docker](https://docs.docker.com/engine/install/) is installed on your system.
- Ensure Git is installed (needed to clone the repository).

### Building ivydocumentdb with Docker

Step 1: Clone the ivydocumentdb repo.

```bash
git clone https://github.com/ivorysql/ivydocumentdb.git
```

Step 2: Create the Docker image. Navigate into the cloned repo first.

```bash
docker build . -f .devcontainer/Dockerfile -t ivydocumentdb
```

Note: Validate the image exists using `docker image ls`.

Step 3: Run the image as a container.

```bash
docker run -v $(pwd):/home/documentdb/code -it ivydocumentdb /bin/bash

cd code
```

(Mounting your working directory into the container keeps it in sync with your local clone, so there is no need to clone the repository again inside the image.)<br>
Note: Validate the container is running using `docker container ls`.

Step 4: Build and deploy the binaries.

```bash
make
```

Note: If the build fails, run `git config --global --add safe.directory /home/documentdb/code` inside the container and try again.

```bash
sudo make install
```

Note: To run the backend PostgreSQL tests after installing, you can run `make check`.

You are all set to work with DocumentDB.

### Connecting to the Server

#### Internal Access

Step 1: Run `start_oss_server.sh` to initialize the DocumentDB server and manage dependencies.

```bash
./scripts/start_oss_server.sh
```

Alternatively, if you are using a prebuilt image, log into the running container directly:

```bash
docker exec -it <container-id> bash
```

Step 2: Connect to the `psql` shell.

```bash
psql -p 9712 -d postgres
```

#### External Access

Connect to the `psql` shell:

```bash
psql -h localhost --port 9712 -d postgres -U documentdb
```

## Usage

Once your `DocumentDB` setup is up and running, you can start creating collections and indexes and performing queries on them.

### Create a collection

DocumentDB provides the [documentdb_api.create_collection](https://github.com/microsoft/documentdb/wiki/Functions#create_collection) function to create a new collection within a specified database, enabling you to manage and organize your BSON documents effectively.

```sql
SELECT documentdb_api.create_collection('documentdb','patient');
```

### Perform CRUD operations

#### Insert documents

The [documentdb_api.insert_one](https://github.com/microsoft/documentdb/wiki/Functions#insert_one) command is used to add a single document into a collection.

```sql
SELECT documentdb_api.insert_one('documentdb','patient', '{ "patient_id": "P001", "name": "Alice Smith", "age": 30, "phone_number": "555-0123", "registration_year": "2023","conditions": ["Diabetes", "Hypertension"]}');
SELECT documentdb_api.insert_one('documentdb','patient', '{ "patient_id": "P002", "name": "Bob Johnson", "age": 45, "phone_number": "555-0456", "registration_year": "2023", "conditions": ["Asthma"]}');
SELECT documentdb_api.insert_one('documentdb','patient', '{ "patient_id": "P003", "name": "Charlie Brown", "age": 29, "phone_number": "555-0789", "registration_year": "2024", "conditions": ["Allergy", "Anemia"]}');
SELECT documentdb_api.insert_one('documentdb','patient', '{ "patient_id": "P004", "name": "Diana Prince", "age": 40, "phone_number": "555-0987", "registration_year": "2024", "conditions": ["Migraine"]}');
SELECT documentdb_api.insert_one('documentdb','patient', '{ "patient_id": "P005", "name": "Edward Norton", "age": 55, "phone_number": "555-1111", "registration_year": "2025", "conditions": ["Hypertension", "Heart Disease"]}');
```

#### Read documents from a collection

The `documentdb_api.collection` function is used for retrieving the documents in a collection.

```sql
SELECT document FROM documentdb_api.collection('documentdb','patient');
```

Alternatively, we can apply filters to our queries.

```sql
SET search_path TO documentdb_api, documentdb_core;
SET documentdb_core.bsonUseEJson TO true;

SELECT cursorPage FROM documentdb_api.find_cursor_first_page('documentdb', '{ "find" : "patient", "filter" : {"patient_id":"P005"}}');
```

We can perform range queries as well.

```sql
SELECT cursorPage FROM documentdb_api.find_cursor_first_page('documentdb', '{ "find" : "patient", "filter" : { "$and": [{ "age": { "$gte": 10 } },{ "age": { "$lte": 35 } }] }}');
```

#### Update documents in a collection

DocumentDB uses the [documentdb_api.update](https://github.com/microsoft/documentdb/wiki/Functions#update) function to modify existing documents within a collection.

The following SQL command updates the `age` for patient `P004`.

```sql
SELECT documentdb_api.update('documentdb', '{"update":"patient", "updates":[{"q":{"patient_id":"P004"},"u":{"$set":{"age":14}}}]}');
```

Similarly, we can update multiple documents using the `multi` property.

```sql
SELECT documentdb_api.update('documentdb', '{"update":"patient", "updates":[{"q":{},"u":{"$set":{"age":24}},"multi":true}]}');
```

#### Delete documents from a collection

DocumentDB uses the [documentdb_api.delete](https://github.com/microsoft/documentdb/wiki/Functions#delete) function for precise document removal based on specified criteria.

The following SQL command deletes the document for patient `P002`.

```sql
SELECT documentdb_api.delete('documentdb', '{"delete": "patient", "deletes": [{"q": {"patient_id": "P002"}, "limit": 1}]}');
```
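
To verify the removal, you can reuse the filtered read shown earlier; after the delete above, the returned cursor page should contain no matching documents.

```sql
SELECT cursorPage FROM documentdb_api.find_cursor_first_page('documentdb', '{ "find" : "patient", "filter" : {"patient_id":"P002"}}');
```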

### Collection management

We can review the available collections and databases by querying [documentdb_api.list_collections_cursor_first_page](https://github.com/microsoft/documentdb/wiki/Functions#list_collections_cursor_first_page).

```sql
SELECT * FROM documentdb_api.list_collections_cursor_first_page('documentdb', '{ "listCollections": 1 }');
```

[documentdb_api.list_indexes_cursor_first_page](https://github.com/microsoft/documentdb/wiki/Functions#list_indexes_cursor_first_page) allows you to review the existing indexes on a collection. The collection's `collection_id` can be found via `documentdb_api.list_collections_cursor_first_page`.

```sql
SELECT documentdb_api.list_indexes_cursor_first_page('documentdb','{"listIndexes": "patient"}');
```

TTL indexes are scheduled by default through the `pg_cron` scheduler; the schedule can be reviewed by querying the `cron.job` table.

```sql
SELECT * FROM cron.job;
```
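
As a sketch, a TTL index could be declared through `create_indexes_background`, assuming the MongoDB-style `expireAfterSeconds` option is honored (an assumption, not confirmed here); the `last_visit` field below is hypothetical.

```sql
-- Hypothetical sketch: assumes "expireAfterSeconds" is supported and that
-- documents carry a "last_visit" field; expired entries would then be purged by pg_cron.
SELECT * FROM documentdb_api.create_indexes_background('documentdb', '{ "createIndexes": "patient", "indexes": [{ "key": {"last_visit": 1}, "name": "idx_ttl_last_visit", "expireAfterSeconds": 2592000 }]}');
```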

### Indexing

#### Create an Index

DocumentDB uses the `documentdb_api.create_indexes_background` function, which allows background index creation without disrupting database operations.

The following SQL command creates a single-field index on `age` in the `patient` collection of the `documentdb` database.

```sql
SELECT * FROM documentdb_api.create_indexes_background('documentdb', '{ "createIndexes": "patient", "indexes": [{ "key": {"age": 1},"name": "idx_age"}]}');
```

The following SQL command creates a compound index on the `registration_year` and `age` fields in the `patient` collection.

```sql
SELECT * FROM documentdb_api.create_indexes_background('documentdb', '{ "createIndexes": "patient", "indexes": [{ "key": {"registration_year": 1, "age": 1},"name": "idx_regyr_age"}]}');
```

#### Drop an Index

DocumentDB uses the `documentdb_api.drop_indexes` procedure to remove an existing index from a collection. The following command drops the index named `idx_age` from the `patient` collection of the `documentdb` database.

```sql
CALL documentdb_api.drop_indexes('documentdb', '{"dropIndexes": "patient", "index":"idx_age"}');
```

### Perform aggregations with `Group by`

DocumentDB provides the [documentdb_api.aggregate_cursor_first_page](https://github.com/microsoft/documentdb/wiki/Functions#aggregate_cursor_first_page) function for performing aggregations over the document store.

The following example counts the number of patients registered in each year.

```sql
SELECT cursorpage FROM documentdb_api.aggregate_cursor_first_page('documentdb', '{ "aggregate": "patient", "pipeline": [ { "$group": { "_id": "$registration_year", "count_patients": { "$count": {} } } } ] , "cursor": { "batchSize": 3 } }');
```

We can perform more complex operations as well; a few more usage examples are listed below.
This example categorizes patients into buckets defined by `registration_year` boundaries.

```sql
SELECT cursorpage FROM documentdb_api.aggregate_cursor_first_page('documentdb', '{ "aggregate": "patient", "pipeline": [ { "$bucket": { "groupBy": "$registration_year", "boundaries": ["2022","2023","2024"], "default": "unknown" } } ], "cursor": { "batchSize": 3 } }');
```

This query groups documents in the `patient` collection by `registration_year` and collects the unique patient conditions for each registration year using the `$addToSet` operator.

```sql
SELECT cursorpage FROM documentdb_api.aggregate_cursor_first_page('documentdb', '{ "aggregate": "patient", "pipeline": [ { "$group": { "_id": "$registration_year", "conditions": { "$addToSet": { "conditions" : "$conditions" } } } } ], "cursor": { "batchSize": 3 } }');
```

### Join data from multiple collections

Let's create an additional collection named `appointment` to demonstrate how a join operation can be performed.

```sql
SELECT documentdb_api.insert_one('documentdb','appointment', '{"appointment_id": "A001", "patient_id": "P001", "doctor_name": "Dr. Milind", "appointment_date": "2023-01-20", "reason": "Routine checkup" }');
SELECT documentdb_api.insert_one('documentdb','appointment', '{"appointment_id": "A002", "patient_id": "P001", "doctor_name": "Dr. Moore", "appointment_date": "2023-02-10", "reason": "Follow-up"}');
SELECT documentdb_api.insert_one('documentdb','appointment', '{"appointment_id": "A004", "patient_id": "P003", "doctor_name": "Dr. Smith", "appointment_date": "2024-03-12", "reason": "Allergy consultation"}');
SELECT documentdb_api.insert_one('documentdb','appointment', '{"appointment_id": "A005", "patient_id": "P004", "doctor_name": "Dr. Moore", "appointment_date": "2024-04-15", "reason": "Migraine treatment"}');
SELECT documentdb_api.insert_one('documentdb','appointment', '{"appointment_id": "A007","patient_id": "P001", "doctor_name": "Dr. Milind", "appointment_date": "2024-06-05", "reason": "Blood test"}');
SELECT documentdb_api.insert_one('documentdb','appointment', '{ "appointment_id": "A009", "patient_id": "P003", "doctor_name": "Dr. Smith","appointment_date": "2025-01-20", "reason": "Follow-up visit"}');
```

The following example presents each patient along with the doctors they have visited.

```sql
SELECT cursorpage FROM documentdb_api.aggregate_cursor_first_page('documentdb', '{ "aggregate": "patient", "pipeline": [ { "$lookup": { "from": "appointment","localField": "patient_id", "foreignField": "patient_id", "as": "appointment" } },{"$unwind":"$appointment"},{"$project":{"_id":0,"name":1,"appointment.doctor_name":1,"appointment.appointment_date":1}} ], "cursor": { "batchSize": 3 } }');
```
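
Building on the `$lookup` pipeline above, a `$match` stage can narrow the join to a single doctor. This sketch composes stages already used in this guide; the doctor name comes from the sample data.

```sql
-- Patients seen by Dr. Moore, with the date and reason of each visit.
SELECT cursorpage FROM documentdb_api.aggregate_cursor_first_page('documentdb', '{ "aggregate": "patient", "pipeline": [ { "$lookup": { "from": "appointment", "localField": "patient_id", "foreignField": "patient_id", "as": "appointment" } }, {"$unwind":"$appointment"}, {"$match":{"appointment.doctor_name":"Dr. Moore"}}, {"$project":{"_id":0,"name":1,"appointment.appointment_date":1,"appointment.reason":1}} ], "cursor": { "batchSize": 3 } }');
```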
