Revert "fetch -> web/fetch for everything"

This reverts commit ca790b1716.
Jon Corbin
2026-01-12 14:03:48 -05:00
parent e8577617b0
commit 5afc90e633
69 changed files with 360 additions and 360 deletions


@@ -197,7 +197,7 @@ A JSON representation showing 5-10 representative documents for the container
"email": "john@example.com"
},
{
"id": "order_456",
"id": "order_456",
"partitionKey": "user_123",
"type": "order",
"userId": "user_123",
@@ -254,7 +254,7 @@ A JSON representation showing 5-10 representative documents for the container
[Explain the overall trade-offs made and optimizations used, and why, as in the examples below]
- **Aggregate Design**: Kept Orders and OrderItems together due to 95% access correlation - trades document size for query performance
- **Denormalization**: Duplicated user name in Order document to avoid cross-partition lookup - trades storage for performance
- **Normalization**: Kept User as separate document type from Orders due to low access correlation (15%) - optimizes update costs
- **Indexing Strategy**: Used selective indexing instead of automatic to balance cost vs additional query needs
- **Multi-Document Containers**: Used multi-document containers for [access_pattern] to enable transactional consistency
@@ -290,7 +290,7 @@ A JSON representation showing 5-10 representative documents for the container
- ALWAYS update cosmosdb_requirements.md after each user response with new information
- ALWAYS treat design considerations in the modeling file as evolving thoughts, not final decisions
- ALWAYS consider Multi-Document Containers when entities have 30-70% access correlation
- ALWAYS consider Hierarchical Partition Keys as alternative to synthetic keys if initial design recommends synthetic keys
- ALWAYS consider data binning for massive-scale workloads of uniform events and batch-style writes to optimize document size and RU costs
- **ALWAYS calculate costs accurately** - use realistic document sizes and include all overhead
- **ALWAYS present final clean comparison** rather than multiple confusing iterations
@@ -343,7 +343,7 @@ In aggregate-oriented design, Azure Cosmos DB NoSQL offers multiple levels of ag
Multiple entities combined into a single Cosmos DB document. This provides:
• Atomic updates across all data in the aggregate
• Single point read retrieval for all data. Reference the document by id and partition key through the API, for example `ReadItemAsync<Order>(id: "order0103", partitionKey: new PartitionKey("TimS1234"));`, rather than issuing a query such as `SELECT * FROM c WHERE c.id = "order0103" AND c.partitionKey = "TimS1234"` (see the point-read sketch after this list)
• Subject to 2MB document size limit
When designing aggregates, consider both levels based on your requirements.
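To make the point-read guidance concrete, here is a minimal TypeScript sketch using the `@azure/cosmos` SDK. The endpoint, key, and container names are placeholders; the comparison simply echoes the RU charge the SDK reports for each call:

```typescript
import { CosmosClient } from "@azure/cosmos";

// Placeholder account details - substitute your own.
const client = new CosmosClient({ endpoint: "https://<account>.documents.azure.com:443/", key: "<key>" });
const container = client.database("appdb").container("orders");

async function compareReadCosts(): Promise<void> {
  // Point read: addressed directly by id + partition key (~1 RU for a 1 KB document).
  const pointRead = await container.item("order0103", "TimS1234").read();
  console.log(`point read: ${pointRead.requestCharge} RU`);

  // Equivalent query: routed through the query engine, so it charges more RUs.
  const query = await container.items
    .query({
      query: "SELECT * FROM c WHERE c.id = @id AND c.partitionKey = @pk",
      parameters: [
        { name: "@id", value: "order0103" },
        { name: "@pk", value: "TimS1234" },
      ],
    })
    .fetchAll();
  console.log(`query: ${query.requestCharge} RU`);
}
```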
@@ -375,7 +375,7 @@ When designing aggregates, consider both levels based on your requirements.
• **Cross-partition overhead**: Each physical partition adds ~2.5 RU base cost to cross-partition queries
• **Massive scale implications**: 100+ physical partitions make cross-partition queries extremely expensive and not scalable.
• Index overhead: Every indexed property consumes storage and write RUs
• Update patterns: Frequent updates to indexed properties or full document replaces increase RU costs (the larger the document, the greater the RU increase on update) - see the single-partition query sketch after this list
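As a small illustration of containing the cross-partition fan-out cost noted above, a query can be pinned to one logical partition through the request options. A sketch with illustrative names:

```typescript
import { CosmosClient } from "@azure/cosmos";

const container = new CosmosClient({ endpoint: "<endpoint>", key: "<key>" })
  .database("appdb")
  .container("orders");

// Scoping the query to one logical partition avoids the ~2.5 RU per
// physical partition fan-out described in the list above.
async function readUserOrders(userPk: string) {
  const { resources, requestCharge } = await container.items
    .query(
      {
        query: "SELECT * FROM c WHERE c.type = @type",
        parameters: [{ name: "@type", value: "order" }],
      },
      { partitionKey: userPk } // single-partition execution
    )
    .fetchAll();
  console.log(`single-partition query: ${requestCharge} RU for ${resources.length} docs`);
  return resources;
}
```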
## Core Design Philosophy
@@ -439,7 +439,7 @@ One-to-One: Store the related ID in both documents
```json
// Users container
{ "id": "user_123", "partitionKey": "user_123", "profileId": "profile_456" }
// Profiles container
{ "id": "profile_456", "partitionKey": "profile_456", "userId": "user_123" }
```
@@ -463,10 +463,10 @@ Frequently accessed attributes: Denormalize sparingly
```json
// Orders document
{
"id": "order_789",
"partitionKey": "user_123",
"customerId": "user_123",
"customerName": "John Doe" // Include customer name to avoid lookup
}
```
@@ -493,7 +493,7 @@ When deciding aggregate boundaries, use this decision framework:
Step 1: Analyze Access Correlation
• >90% accessed together → Strong single document aggregate candidate
• 50-90% accessed together → Multi-document container aggregate candidate
• <50% accessed together → Separate aggregates/containers
Step 2: Check Constraints
@@ -514,8 +514,8 @@ Based on Steps 1 & 2, select:
Order + OrderItems:
Access Analysis:
-web/fetch order without items: 5% (just checking status)
-web/fetch order with all items: 95% (normal flow)
+Fetch order without items: 5% (just checking status)
+Fetch order with all items: 95% (normal flow)
• Update patterns: Items rarely change independently
• Combined size: ~50KB average, max 200KB
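The thresholds from Steps 1 and 2 can be captured in a small helper. This is an illustrative TypeScript encoding of the framework above, not an official rule set; the function and type names are made up for the sketch:

```typescript
type AggregateChoice =
  | "single-document"
  | "multi-document-container"
  | "separate-containers";

// Illustrative encoding of the access-correlation thresholds from Step 1,
// plus the 2 MB document size constraint from Step 2.
function chooseAggregate(accessCorrelation: number, combinedSizeKB: number): AggregateChoice {
  if (combinedSizeKB > 2048) return "multi-document-container"; // would exceed 2 MB limit
  if (accessCorrelation > 0.9) return "single-document";
  if (accessCorrelation >= 0.5) return "multi-document-container";
  return "separate-containers";
}

// Order + OrderItems above: 95% access correlation, ~50 KB average size.
console.log(chooseAggregate(0.95, 50)); // "single-document"
```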
@@ -587,7 +587,7 @@ Index overhead increases RU costs and storage. It occurs when documents have man
When making aggregate design decisions:
• Calculate read cost = frequency × RUs per operation
• Calculate write cost = frequency × RUs per operation
• Total cost = Σ(read costs) + Σ(write costs)
• Choose the design with lower total cost
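A worked sketch of that arithmetic. The RU and RPS figures below are assumptions for illustration, not measurements:

```typescript
// Total cost = Σ(read costs) + Σ(write costs),
// where each term is frequency (ops/s) × RU per operation.
const designs = {
  combinedAggregate: { readRps: 1000, readRu: 1, writeRps: 100, writeRu: 10 },
  separateItems: { readRps: 1000, readRu: 5, writeRps: 100, writeRu: 10 },
};

for (const [name, d] of Object.entries(designs)) {
  const totalRuPerSecond = d.readRps * d.readRu + d.writeRps * d.writeRu;
  console.log(`${name}: ${totalRuPerSecond} RU/s`);
}
// combinedAggregate: 2000 RU/s, separateItems: 6000 RU/s -> pick the cheaper design.
```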
@@ -623,7 +623,7 @@ When facing massive write volumes, **data binning/chunking** can reduce write op
```json
{
"id": "chunk_001",
"partitionKey": "account_test_chunk_001",
"partitionKey": "account_test_chunk_001",
"chunkId": 1,
"records": [
{ "recordId": 1, "data": "..." },
@@ -660,7 +660,7 @@ When multiple entity types are frequently accessed together, group them in the s
[
{
"id": "user_123",
"partitionKey": "user_123",
"partitionKey": "user_123",
"type": "user",
"name": "John Doe",
"email": "john@example.com"
@@ -668,7 +668,7 @@ When multiple entity types are frequently accessed together, group them in the s
{
"id": "order_456",
"partitionKey": "user_123",
"type": "order",
"type": "order",
"userId": "user_123",
"amount": 99.99
}
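A sketch of reading such a mixed-type partition in one round trip, assuming the documents above live in a single container and `type` is the discriminator property:

```typescript
import { CosmosClient } from "@azure/cosmos";

const container = new CosmosClient({ endpoint: "<endpoint>", key: "<key>" })
  .database("appdb")
  .container("users");

// One single-partition query returns the user and all of their orders;
// the application then dispatches on the `type` property.
async function loadUserAggregate(userPk: string) {
  const { resources } = await container.items
    .query("SELECT * FROM c", { partitionKey: userPk })
    .fetchAll();

  return {
    user: resources.find((doc: any) => doc.type === "user"),
    orders: resources.filter((doc: any) => doc.type === "order"),
  };
}

// const { user, orders } = await loadUserAggregate("user_123");
```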
@@ -705,7 +705,7 @@ Promoting to Single Document Aggregate
When multi-document analysis reveals:
• Access correlation higher than initially thought (>90%)
-• All documents always web/fetched together
+• All documents always fetched together
• Combined size remains bounded
• Would benefit from atomic updates
@@ -728,7 +728,7 @@ Example analysis:
Product + Reviews Aggregate Analysis:
- Access pattern: View product details (no reviews) - 70%
- Access pattern: View product with reviews - 30%
- Update frequency: Products daily, Reviews hourly
- Average sizes: Product 5KB, Reviews 200KB total
- Decision: Multi-document container - low access correlation + size concerns + update mismatch
@@ -741,7 +741,7 @@ Short-circuit denormalization involves duplicating a property from a related ent
2. The duplicated property is mostly immutable or the application can accept stale values
3. The property is small enough and won't significantly impact RU consumption
-Example: In an e-commerce application, you can duplicate the ProductName from the Product document into each OrderItem document, so that web/fetching order items doesn't require additional queries to retrieve product names.
+Example: In an e-commerce application, you can duplicate the ProductName from the Product document into each OrderItem document, so that fetching order items doesn't require additional queries to retrieve product names.
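A sketch of how that duplication might be applied at write time. Container names, id scheme, and the `productName` property are illustrative:

```typescript
import { CosmosClient } from "@azure/cosmos";

const client = new CosmosClient({ endpoint: "<endpoint>", key: "<key>" });

// Copy the mostly immutable ProductName into the order item on creation,
// so reading order items never needs a second lookup into Products.
async function addOrderItem(orderId: string, userPk: string, productId: string) {
  const products = client.database("appdb").container("products");
  const orders = client.database("appdb").container("orders");

  const { resource: product } = await products.item(productId, productId).read();

  await orders.items.create({
    id: `${orderId}_item_${productId}`,
    partitionKey: userPk,
    type: "orderItem",
    productId,
    productName: product?.productName, // short-circuit denormalized copy
  });
}
```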
### Identifying relationship
@@ -788,14 +788,14 @@ StudentCourseLessons container:
"type": "student"
},
{
"id": "course_456",
"id": "course_456",
"partitionKey": "student_123",
"type": "course",
"courseId": "course_456"
},
{
"id": "lesson_789",
"partitionKey": "student_123",
"partitionKey": "student_123",
"type": "lesson",
"courseId": "course_456",
"lessonId": "lesson_789"
@@ -818,7 +818,7 @@ TenantData container:
```json
{
"id": "record_123",
"partitionKey": "tenant_456_customer_789",
"partitionKey": "tenant_456_customer_789",
"tenantId": "tenant_456",
"customerId": "customer_789"
}
@@ -877,20 +877,20 @@ Azure Cosmos DB doesn't enforce unique constraints beyond the id+partitionKey co
function createUserWithUniqueEmail(userData) {
var context = getContext();
var container = context.getCollection();
// Check if email already exists
// Parameterized to avoid injection through userData.email
var query = { query: 'SELECT * FROM c WHERE c.email = @email', parameters: [{ name: '@email', value: userData.email }] };
var isAccepted = container.queryDocuments(
container.getSelfLink(),
query,
function(err, documents) {
if (err) throw new Error('Error querying documents: ' + err.message);
if (documents.length > 0) {
throw new Error('Email already exists');
}
// Email is unique, create the user
var isAccepted = container.createDocument(
container.getSelfLink(),
@@ -900,11 +900,11 @@ function createUserWithUniqueEmail(userData) {
context.getResponse().setBody(document);
}
);
if (!isAccepted) throw new Error('The query was not accepted by the server.');
}
);
if (!isAccepted) throw new Error('The query was not accepted by the server.');
}
```
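For completeness, a sketch of registering and invoking that stored procedure from the `@azure/cosmos` SDK; the file name and ids are illustrative. One caveat worth noting: stored procedures execute inside a single logical partition, so this uniqueness check only holds per partition key value:

```typescript
import { readFileSync } from "fs";
import { CosmosClient } from "@azure/cosmos";

const container = new CosmosClient({ endpoint: "<endpoint>", key: "<key>" })
  .database("appdb")
  .container("users");

async function registerAndRun() {
  // Register once, e.g. at deployment time.
  await container.scripts.storedProcedures.create({
    id: "createUserWithUniqueEmail",
    body: readFileSync("createUserWithUniqueEmail.js", "utf8"),
  });

  // Execute against one logical partition; the sproc's query only sees
  // documents that share this partition key, so uniqueness is per-partition.
  const userData = { id: "user_123", partitionKey: "user_123", email: "john@example.com" };
  const { resource } = await container.scripts
    .storedProcedure("createUserWithUniqueEmail")
    .execute("user_123", [userData]);
  console.log("created:", resource);
}
```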
@@ -929,7 +929,7 @@ Hierarchical Partition Keys provide natural query boundaries using multiple fiel
{
"partitionKey": {
"version": 2,
"kind": "MultiHash",
"kind": "MultiHash",
"paths": ["/accountId", "/testId", "/chunkId"]
}
}
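The same definition can be applied through the SDK. A sketch, assuming an `@azure/cosmos` version recent enough to expose hierarchical (`MultiHash`) partition key support:

```typescript
import {
  CosmosClient,
  PartitionKeyDefinitionVersion,
  PartitionKeyKind,
} from "@azure/cosmos";

async function createChunkContainer() {
  const database = new CosmosClient({ endpoint: "<endpoint>", key: "<key>" }).database("appdb");

  // Three-level hierarchical partition key: account -> test -> chunk.
  await database.containers.createIfNotExists({
    id: "TestChunks",
    partitionKey: {
      paths: ["/accountId", "/testId", "/chunkId"],
      version: PartitionKeyDefinitionVersion.V2,
      kind: PartitionKeyKind.MultiHash,
    },
  });
}
```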
@@ -944,7 +944,7 @@ Hierarchical Partition Keys provide natural query boundaries using multiple fiel
- Data has natural hierarchy (tenant → user → document)
- Frequent prefix-based queries
- Want to eliminate synthetic partition key complexity
- Applies only to the Cosmos DB NoSQL API
**Trade-offs**:
- Requires dedicated tier (not available on serverless)
@@ -963,7 +963,7 @@ Implementation: Add a shard suffix using hash-based or time-based calculation:
// Hash-based sharding
partitionKey = originalKey + "_" + (hash(identifier) % shardCount)
// Time-based sharding
partitionKey = originalKey + "_" + (currentHour % shardCount)
```
@@ -993,7 +993,7 @@ EventLog container (problematic):
• Result: Limited to 10,000 RU/s regardless of total container throughput
Sharded solution:
• Partition Key: date + "_" + shard_id (e.g., "2024-07-09_4")
• Shard calculation: shard_id = hash(event_id) % 15
• Result: Distributes daily events across 15 partitions
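A runnable version of the shard-suffix calculation. The hash function here is illustrative; any stable, deterministic hash works:

```typescript
// Stable djb2-style string hash; any deterministic hash is fine here.
function hash(value: string): number {
  let h = 5381;
  for (const ch of value) h = ((h * 33) ^ ch.charCodeAt(0)) >>> 0;
  return h;
}

const SHARD_COUNT = 15;

// Hash-based sharding: spreads one hot key (a date) across 15 partitions.
function shardedKey(originalKey: string, identifier: string): string {
  return `${originalKey}_${hash(identifier) % SHARD_COUNT}`;
}

console.log(shardedKey("2024-07-09", "event_42")); // e.g. "2024-07-09_4"
// Trade-off: reading a full day now requires fanning out across all 15 shards.
```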
@@ -1002,7 +1002,7 @@ Sharded solution:
When aggregate boundaries conflict with update patterns, prioritize based on RU cost impact:
Example: Order Processing System
-• Read pattern: Always web/fetch order with all items (1000 RPS)
+• Read pattern: Always fetch order with all items (1000 RPS)
• Update pattern: Individual item status updates (100 RPS)
Option 1 - Combined aggregate (single document):
@@ -1010,7 +1010,7 @@ Option 1 - Combined aggregate (single document):
- Write cost: 100 RPS × 10 RU (rewrite entire order) = 1000 RU/s
Option 2 - Separate items (multi-document):
- Read cost: 1000 RPS × 5 RU (query multiple items) = 5000 RU/s
- Write cost: 100 RPS × 10 RU (update single item) = 1000 RU/s
Decision: Option 1 better due to significantly lower read costs despite same write costs
@@ -1029,7 +1029,7 @@ Example: Session tokens with 24-hour expiration
{
"id": "sess_abc123",
"partitionKey": "user_456",
"userId": "user_456",
"userId": "user_456",
"createdAt": "2024-01-01T12:00:00Z",
"ttl": 86400
}
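For this pattern to work, TTL must be enabled on the container. A sketch with illustrative names; `defaultTtl: -1` turns TTL on with no container-wide default, so only documents carrying their own `ttl` property expire:

```typescript
import { CosmosClient } from "@azure/cosmos";

const database = new CosmosClient({ endpoint: "<endpoint>", key: "<key>" }).database("appdb");

async function setupSessions() {
  // defaultTtl: -1 enables TTL without a container-wide default expiry.
  const { container: sessions } = await database.containers.createIfNotExists({
    id: "sessions",
    partitionKey: { paths: ["/partitionKey"] },
    defaultTtl: -1,
  });

  // This session document expires 86400 seconds (24h) after its last write.
  await sessions.items.create({
    id: "sess_abc123",
    partitionKey: "user_456",
    userId: "user_456",
    createdAt: "2024-01-01T12:00:00Z",
    ttl: 86400,
  });
}
```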