Designing a High-Performance, Reliable System: Integrating Notion, Astro, and Cloudflare's Edge (Workers KV & CDN)
Published: April 12, 2025. Updated: April 13, 2025.
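Introduction
For a blog built with the Astro framework and Notion as a headless CMS, faster article loading, a better user experience, and reliable content updates are critical challenges. Problems such as slow display of long articles and Notion edits that do not show up promptly seriously affect the quality of the site.
This article describes the system we built and deployed to address these challenges by combining Astro, Cloudflare Workers, Workers KV, and CDN caching. We cover the initial approaches and their limitations, the current architecture, the details of the caching strategy, the automatic update mechanism, and the results of the implementation.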
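Challenges Before the Improvements
The initial implementation faced the following main challenges:
- Slow loading and timeouts for large articles: The entire article (all blocks) was fetched from the Notion API and processed in one pass. As a result the initial display took a long time, especially for long articles, and sometimes caused server-side timeouts.
- Delayed and inconsistent content updates: Because of Cloudflare's CDN cache, users kept seeing old content after an article was updated in Notion unless the cache was purged manually.
- Operating costs: We were using Make.com, and its free-plan quota was used up faster than expected, which stopped update notifications. Solving this was also necessary to keep the site's operating costs down.
(Failed) Initial Approaches
To solve these problems, we tried the following approaches:
- Block-based splitting and lazy loading: We tried using the Notion API's pagination to split article content into chunks of blocks and load them lazily. However, limitations in the Notion API's hierarchical structure and ordering guarantees made it very difficult to keep the blocks in a consistent order. In the end we still had to load all blocks into memory at once, so the approach neither significantly reduced the load nor avoided timeouts. It also added implementation complexity, so we abandoned it.
- Static site generation (SSG): We considered generating static HTML for each article. However, every article update would then require a build and deploy, which conflicted with our original requirement of low cost and low maintenance. We therefore stayed with server-side rendering (SSR) plus CDN caching.
These experiences led to the more refined architecture we use today.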
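System Architecture
After repeated trial and error, we settled on the following configuration. Astro runs in SSR mode, and the core strategy is to serve the main content through Cloudflare Workers and cache it aggressively with KV and the CDN. I also decided to stop using Make and instead run the update job every hour with a GitHub Actions schedule.
Overall system diagram and component structure:
Data flow (overview):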
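Core: Cloudflare Workers KV and CDN Caching Strategy
The heart of this system is a caching strategy that combines Cloudflare Workers KV and the CDN.
Detailed data retrieval flow:
- Client (browser): Requests the page (/blog/en/slug).
- Cloudflare CDN: Receives the request first. If a cached copy of the SSR-rendered HTML exists, it is returned immediately.
- Astro SSR: If there is no CDN cache entry (or it has expired), the server re-renders the page. While doing so, it calls the content API /api/blog/en/slug/content to get the article data.
- Cloudflare CDN (for the API): The content API is also configured for CDN caching. If the API response is cached, it is returned directly.
- Cloudflare Worker (content API): If the API's CDN cache misses, the request reaches the Worker.
- Worker KV cache: The Worker checks KV first. The key format is blog:[ja|en]:[slug] (e.g., blog:en:my-first-post).
  - On a KV hit: Returns JSON containing the converted HTML and metadata.
  - On a KV miss: Fetches the blocks from the Notion API → converts them to HTML → caches the result in KV for 7 days → returns it as a JSON response.
- Final response: The JSON passes back through the CDN cache to Astro SSR, which assembles the page and returns it to the client as HTML.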
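The flow above assumes the content API's JSON is held in Cloudflare's edge cache. How that happens depends on deployment details not shown here. For a response that a Worker generates itself, a Cache-Control header alone is generally not enough, and the Workers Cache API (caches.default) is one documented way to store it explicitly. The sketch below wraps a handler with a lookup-then-store step. EdgeCache, WaitUntil, and withEdgeCache are hypothetical names, and the structural types stand in for caches.default and the Worker's execution context so the code also runs outside Workers.

```typescript
// Structural stand-ins for caches.default and the Worker ExecutionContext (assumptions).
interface EdgeCache {
  match(request: Request): Promise<Response | undefined>;
  put(request: Request, response: Response): Promise<void>;
}
interface WaitUntil {
  waitUntil(promise: Promise<unknown>): void;
}

// Serves GET requests from the edge cache when possible; otherwise runs the handler
// (KV lookup / Notion fetch) and stores a copy of a successful response.
// Retention is assumed to follow the stored response's Cache-Control header.
async function withEdgeCache(
  request: Request,
  cache: EdgeCache,
  ctx: WaitUntil,
  handler: (req: Request) => Promise<Response>,
): Promise<Response> {
  if (request.method !== "GET") return handler(request);

  const hit = await cache.match(request);
  if (hit) return hit;

  const response = await handler(request);
  if (response.ok) {
    // Store a clone in the background so the client is not kept waiting.
    ctx.waitUntil(cache.put(request, response.clone()));
  }
  return response;
}
```

In a Worker this would be called as withEdgeCache(request, caches.default, ctx, handleContentApi), where handleContentApi is a hypothetical name for the KV and Notion logic.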
KV Caching Strategy:
KV stores the article body (converted from Notion to HTML) and related information.
- Key: blog:[ja|en]:[slug] (e.g., blog:en:cloudflare-workers-kv-cache)
- Value: a JSON string such as { html: "Article HTML...", article: { title: "...", ... } }
- TTL: 7 days (604,800 seconds)
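To keep the key format and value shape in one place, a small typed helper can build keys and validate what comes back from KV (KV cache structure validation is listed later under error handling). This is a sketch: CachedArticle, buildCacheKey, and parseCachedArticle are hypothetical names, and only html and article.title are checked because those are the only fields named above.

```typescript
// Hypothetical helpers; only `html` and `article.title` come from the description above.
type Lang = "ja" | "en";

interface CachedArticle {
  html: string; // Article body already converted from Notion blocks to HTML
  article: { title: string; [key: string]: unknown }; // Metadata; other fields left open
}

const KV_TTL_SECONDS = 7 * 24 * 60 * 60; // 604800 seconds = 7 days

// Builds keys in the documented format blog:[ja|en]:[slug].
function buildCacheKey(lang: Lang, slug: string): string {
  return `blog:${lang}:${slug}`;
}

// Parses a raw KV value and checks its shape. Returns null for missing,
// malformed, or unexpected entries so the caller can treat them as a cache miss.
function parseCachedArticle(raw: string | null): CachedArticle | null {
  if (raw === null) return null;
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const candidate = value as { html?: unknown; article?: unknown };
  if (typeof candidate.html !== "string") return null;
  const article = candidate.article as { title?: unknown } | null | undefined;
  if (typeof article !== "object" || article === null) return null;
  if (typeof article.title !== "string") return null;
  return value as CachedArticle;
}
```

In the Worker, a null result from parseCachedArticle(await env.CONTENT_CACHE.get(buildCacheKey(lang, slug))) would simply take the Notion path.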
We also introduced a mechanism that lets us deliberately fetch the latest data while keeping the cache fresh:
// Runs inside the content API Worker's fetch handler, where `request` and `env` are in scope;
// resolvePageIdFromSlug and fetchAndConvertContent are project helpers defined elsewhere
// Determine cache force-refresh based on query parameter
const url = new URL(request.url);
const forceRefresh = url.searchParams.get("_refresh") === "true";
// Build the cache key from the path /api/blog/[lang]/[slug]/content
const [, , , lang, slug] = url.pathname.split("/");
const cacheKey = `blog:${lang}:${slug}`; // Same format as the KV keys described above
if (!forceRefresh) {
  // For normal requests, check the KV cache first
  const cached = await env.CONTENT_CACHE.get(cacheKey);
  if (cached) {
    return createApiResponse(JSON.parse(cached), true); // true = served from the KV cache
  }
}
// On a cache miss or an explicit refresh request, fetch from the origin
const pageId = await resolvePageIdFromSlug(slug); // slug -> ID resolution (CMS-dependent)
const responseData = await fetchAndConvertContent(pageId); // Data retrieval + HTML conversion, etc.
// Save to KV for 7 days (604800 seconds)
await env.CONTENT_CACHE.put(cacheKey, JSON.stringify(responseData), { expirationTtl: 604800 });
return createApiResponse(responseData, false); // false = fetched from the origin
As a safeguard, calling the API with the query parameter ?_refresh=true bypasses the KV cache, forces a fresh fetch from Notion, and overwrites the existing KV entry. The automated update flow described later relies on this.
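CDN Caching Strategy (Overview):
To control how API responses are cached, we define an explicit caching policy with HTTP headers. This lets each request take the best path among the browser, the CDN, and the origin server, balancing display speed against update reliability.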
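// Example of returning a response (async because the ETag is a digest of the body)
async function createApiResponse(data, fromCache) {
  const body = JSON.stringify(data);
  // Weak ETag from a SHA-1 digest of the whole body. A prefix of the body would give
  // different articles the same ETag, and Buffer is unavailable in Workers without Node.js compatibility.
  const digest = await crypto.subtle.digest("SHA-1", new TextEncoder().encode(body));
  const hex = Array.from(new Uint8Array(digest), (b) => b.toString(16).padStart(2, "0")).join("");
  const etag = `W/"${hex}"`; // The weak prefix goes outside the quotes
  const now = new Date();
  return new Response(body, {
    headers: {
      "Content-Type": "application/json",
      "Cache-Control": "public, max-age=0, s-maxage=300, stale-while-revalidate=600",
      "ETag": etag,
      "Last-Modified": now.toUTCString(),
      "Vary": "Accept-Encoding",
      // Custom headers for debugging
      "X-Cache": fromCache ? "HIT" : "MISS",
      "X-Source": fromCache ? "edge-kv" : "origin",
      "X-Version": now.toISOString(), // Timestamp of data generation
    },
  });
}
- Key header explanations: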
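  - Cache-Control: public, max-age=0, s-maxage=300, stale-while-revalidate=600
    - public: allows shared caches (such as the CDN) to store the response.
    - max-age=0: tells the browser to treat its copy as stale immediately, so it revalidates with the server on every use.
    - s-maxage=300: allows the CDN to keep the response for up to 5 minutes.
    - stale-while-revalidate=600: after the cached copy expires, the CDN may keep serving it for up to 10 more minutes while it refreshes the entry asynchronously in the background.
  - ETag: an identifier for the content. The browser can send it back in If-None-Match on later requests and, if the content has not changed, receive a 304 Not Modified response, saving bandwidth.
  - Last-Modified: records the last modification time, used for conditional re-fetching with If-Modified-Since.
  - Vary: Accept-Encoding: indicates that the response may differ depending on the compression format the client requests (e.g., gzip, Brotli).
  - Custom headers (for debugging):
    - X-Cache: indicates a cache hit or miss (HIT or MISS).
    - X-Source: shows where the response came from (edge-kv or origin).
    - X-Version: the timestamp when the data was generated (for checking cache freshness).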
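The ETag explanation above relies on the server comparing If-None-Match with the current tag and answering 304 Not Modified when they match. A minimal sketch of that comparison follows. ifNoneMatchMatches and respondConditionally are hypothetical names. It uses weak comparison (W/"x" equals "x"), which RFC 9110 requires for If-None-Match, and it assumes tags contain no commas, which holds for hex-digest ETags.

```typescript
// Weak comparison: W/"x" and "x" are treated as the same entity tag.
function stripWeakPrefix(tag: string): string {
  return tag.startsWith("W/") ? tag.slice(2) : tag;
}

// True if any tag in an If-None-Match header matches the current ETag.
// Assumes tags contain no commas (true for hex-digest ETags).
function ifNoneMatchMatches(ifNoneMatch: string | null, etag: string): boolean {
  if (!ifNoneMatch) return false;
  if (ifNoneMatch.trim() === "*") return true;
  const current = stripWeakPrefix(etag.trim());
  return ifNoneMatch
    .split(",")
    .map((t) => stripWeakPrefix(t.trim()))
    .some((t) => t === current);
}

// Returns 304 Not Modified (no body) when the client's copy is still valid;
// otherwise builds the full response. `fullResponse` is lazy so a 304 skips that work.
function respondConditionally(
  request: Request,
  etag: string,
  headers: Record<string, string>,
  fullResponse: () => Response,
): Response {
  if (ifNoneMatchMatches(request.headers.get("If-None-Match"), etag)) {
    return new Response(null, { status: 304, headers: { ...headers, ETag: etag } });
  }
  return fullResponse();
}
```

Passing the Cache-Control and Vary headers into the 304 keeps the CDN's and browser's caching metadata consistent with the full response.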
⚠️ Caching Pitfalls We Hit in Production and How We Solved Them
Combining Cloudflare Workers, KV, and the CDN can lead to pitfalls that are easy to overlook. Here are three cache-related problems we actually ran into and how we solved them:
- KV unavailable!? A blind spot of Cloudflare Pages
  - Problem: Cloudflare Pages on its own cannot directly bind or use Workers KV.
  - Solution: Create an empty (or any) Worker from the Cloudflare dashboard and associate it with the Pages project's custom domain. This enables the KV binding for Pages Functions.
- A put() not reflected in the next get(): KV consistency
  - Problem: Workers KV is eventually consistent. A read made immediately after a write may return the old value.
  - Solution: Implement an API hook (like using ?_refresh=true) that explicitly requests a KV update. When an article is updated, hitting the API with this parameter ensures a reliable update.
- CDN returns "stale data that looks fresh"
  - Problem: Even after updating KV, if the CDN cache holds the previous response, the browser might still see the old content.
  - Solution: Design the system with appropriate Cache-Control, ETag, Last-Modified, and Vary headers so the CDN can revalidate correctly. For cases requiring immediate reflection, use automated CDN purging via the Cloudflare API.
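For the automated purge mentioned in the last solution, Cloudflare's API offers purge by URL (POST /zones/{zone_id}/purge_cache with a files array). The sketch below purges both the SSR page and its content API URL for an article. SITE_ORIGIN, buildPurgeUrls, chunk, and purgeUrls are hypothetical names, and the batch size of 30 URLs per call is an assumption to check against your plan's limits.

```typescript
// Hypothetical site origin; replace with the real domain.
const SITE_ORIGIN = "https://example.com";

// Both the SSR page and the content API response are cached at the edge,
// so both URLs are purged for each updated article.
function buildPurgeUrls(lang: string, slug: string): string[] {
  return [
    `${SITE_ORIGIN}/blog/${lang}/${slug}`,
    `${SITE_ORIGIN}/api/blog/${lang}/${slug}/content`,
  ];
}

// Splits a list into batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new RangeError("size must be positive");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// Calls Cloudflare's purge-by-URL endpoint once per batch.
// The default batch size of 30 is an assumption, not a verified plan limit.
async function purgeUrls(zoneId: string, apiToken: string, urls: string[], batchSize = 30): Promise<void> {
  for (const files of chunk(urls, batchSize)) {
    const res = await fetch(`https://api.cloudflare.com/client/v4/zones/${zoneId}/purge_cache`, {
      method: "POST",
      headers: { Authorization: `Bearer ${apiToken}`, "Content-Type": "application/json" },
      body: JSON.stringify({ files }),
    });
    if (!res.ok) throw new Error(`Purge failed: ${res.status} ${await res.text()}`);
  }
}
```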
Automated Cache Update Infrastructure
Manually executing ?_refresh=true requests or CDN purges is cumbersome and risks missed updates. Therefore, we built an automated cache update infrastructure using GitHub Actions.
CLI Command (Cache Generation Script):
First, we prepared a CLI script to generate/update the cache for a specific article.
# Update the cache for a specific article (Production KV)
npx tsx scripts/build-blog-cache.ts --slug "target-blog-post-slug"
This script internally fetches data directly from the Notion API and puts it into KV.
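Because the CLI runs outside Workers and has no KV binding, its KV write can go through Cloudflare's REST API. Below is a hypothetical sketch of that half of scripts/build-blog-cache.ts. parseSlugArg, kvValueUrl, and putKvValue are illustrative names. The account ID is an assumption, since the workflow secrets in this article name only the namespace ID and the token. We also assume the endpoint accepts the raw request body as the value, as in a simple write without metadata.

```typescript
interface KvTarget {
  accountId: string; // Assumed extra secret (e.g. CLOUDFLARE_ACCOUNT_ID)
  namespaceId: string;
  apiToken: string;
}

// Reads the value that follows `--slug` in an argv-style array.
function parseSlugArg(argv: string[]): string {
  const i = argv.indexOf("--slug");
  const slug = i >= 0 ? argv[i + 1] : undefined;
  if (!slug || slug.startsWith("--")) throw new Error('Usage: build-blog-cache.ts --slug "<slug>"');
  return slug;
}

// REST URL for writing one KV value. The key contains ":", so it is percent-encoded.
function kvValueUrl(t: KvTarget, key: string, ttlSeconds: number): string {
  return (
    `https://api.cloudflare.com/client/v4/accounts/${t.accountId}` +
    `/storage/kv/namespaces/${t.namespaceId}/values/${encodeURIComponent(key)}` +
    `?expiration_ttl=${ttlSeconds}`
  );
}

// Writes the JSON string for one article, using the same 7-day TTL as the Worker.
async function putKvValue(t: KvTarget, key: string, value: string, ttlSeconds = 604800): Promise<void> {
  const res = await fetch(kvValueUrl(t, key, ttlSeconds), {
    method: "PUT",
    headers: { Authorization: `Bearer ${t.apiToken}`, "Content-Type": "text/plain" },
    body: value,
  });
  if (!res.ok) throw new Error(`KV write failed: ${res.status} ${await res.text()}`);
}
```

A main function would call parseSlugArg(process.argv), build the JSON with the same conversion code the Worker uses, and pass it to putKvValue under the blog:[lang]:[slug] key.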
GitHub Actions Automation Flow:
This CLI command is executed from a GitHub Actions workflow.
- Trigger: The workflow runs on pushes to the main branch or periodically (e.g., every hour).
- Fetch Article List & Detect Differences: Uses the Notion API to fetch the list of articles (including last updated times) and compares them with the last updated times stored in KV.
- Regenerate Cache: Executes the CLI command above only for articles that have been updated or newly added, updating the KV cache.
- CDN Purge (Optional): As needed, purges the CDN cache corresponding to the URLs of the updated articles via the Cloudflare API to promote immediate reflection.
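The difference-detection step comes down to comparing two timestamps per article. Below is a hypothetical sketch of the core of scripts/find-updated-slugs.ts. Notion pages do expose last_edited_time as an ISO 8601 string. How the cached timestamp is stored next to the KV entry (here, a Map from slug to timestamp) is an assumption.

```typescript
interface NotionArticleSummary {
  slug: string;
  lastEditedTime: string; // ISO 8601, taken from the page's last_edited_time
}

// Returns slugs that are new (no cached timestamp) or were edited after
// the version currently cached.
function findUpdatedSlugs(
  articles: NotionArticleSummary[],
  cachedTimes: Map<string, string>,
): string[] {
  return articles
    .filter((a) => {
      const cached = cachedTimes.get(a.slug);
      if (cached === undefined) return true; // Newly added article
      return Date.parse(a.lastEditedTime) > Date.parse(cached);
    })
    .map((a) => a.slug);
}
```

The script would print findUpdatedSlugs(...).join(" ") so that the workflow's shell loop can split the list on whitespace. This assumes slugs never contain spaces.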
GitHub Actions workflow for automated cache updates (Overview):
name: Update Blog Cache

on:
  push:
    branches: [ main ] # Trigger on updates to the production branch
  schedule:
    - cron: '0 * * * *' # Periodic execution every hour

jobs:
  update-cache:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3

      - name: Setup Node.js environment
        uses: actions/setup-node@v3
        with:
          node-version: '18'
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      - name: Detect updated articles
        run: |
          # Compare CMS (Notion, etc.) and cache (KV) to extract slugs needing update;
          # the script prints them space-separated on a single line.
          # Plain `node` cannot run .ts files on Node 18, so tsx is used as in the CLI example.
          SLUGS=$(npx tsx scripts/find-updated-slugs.ts)
          echo "updated_slugs=$SLUGS" >> "$GITHUB_ENV"
        env:
          # Necessary credentials read securely from Secrets
          CMS_API_KEY: ${{ secrets.CMS_API_KEY }}
          CACHE_NAMESPACE_ID: ${{ secrets.CACHE_NAMESPACE_ID }}
          CLOUDFLARE_TOKEN: ${{ secrets.CLOUDFLARE_TOKEN }}

      - name: Regenerate cache for target articles
        if: env.updated_slugs != ''
        run: |
          # Read the slugs from the shell variable instead of expanding an expression
          # into the script, so slug text cannot inject shell commands.
          # $updated_slugs is intentionally unquoted so the loop splits on spaces.
          for slug in $updated_slugs; do
            echo "Updating cache for $slug"
            npx tsx scripts/build-blog-cache.ts --slug "$slug"
          done
        env:
          # Environment variables for the build script
          CMS_API_KEY: ${{ secrets.CMS_API_KEY }}
          CACHE_NAMESPACE_ID: ${{ secrets.CACHE_NAMESPACE_ID }}
          CLOUDFLARE_TOKEN: ${{ secrets.CLOUDFLARE_TOKEN }}

      - name: Purge CDN cache (Optional)
        if: env.updated_slugs != ''
        run: |
          # Pass the space-separated slug list as a single argument
          npx tsx scripts/purge-cdn-cache.ts --slugs "$updated_slugs"
        env:
          CLOUDFLARE_ZONE_ID: ${{ secrets.CLOUDFLARE_ZONE_ID }}
          CLOUDFLARE_TOKEN: ${{ secrets.CLOUDFLARE_TOKEN }}
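With these mechanisms, after an article is updated in Notion, the cache is updated automatically. The change reaches users within roughly the GitHub Actions execution interval (at most 1 hour here) plus the CDN cache lifetime. That lifetime is s-maxage (5 minutes here), which stale-while-revalidate can extend by up to 10 minutes when the CDN purge step is skipped.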
Frontend Implementation (Astro)
SSR Page
- Since src/pages/[lang]/[slug].astro is running in SSR (Server-Side Rendering) mode, getStaticPaths is not required.
- Metadata & Body Fetching: Within the page component (or layout component), fetch the HTML body from the /api/blog/[lang]/[slug]/content endpoint mentioned earlier. During normal access, the KV/CDN cache responds, enabling fast page display and significantly reducing direct hits to the Notion API.
- HTML Embedding: Insert the fetched HTML string into the page template using set:html={apiResponse.html}.
- Skeleton Loading: While fetching the body data from the API, display a skeleton UI matching the content structure (headings, paragraphs, images, etc.). This allows users to grasp the page layout even during loading, improving perceived speed. Hide the skeleton and display the actual content once data fetching is complete.
// Concept for skeleton display
const skeletonContainer = document.getElementById("skeleton-loader");
const actualContent = document.getElementById("article-content");

// Inside the API fetch's .then() or .finally()
skeletonContainer.classList.add("hidden");
actualContent.classList.remove("hidden");
actualContent.innerHTML = apiResponse.html;
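The fetch can be used by both the SSR path and the client-side skeleton flow. Wrapping it so that failures produce null instead of an exception makes the error-and-retry display described later straightforward. fetchArticleContent and FetchLike are hypothetical names. The fetch implementation is injectable only so the sketch can be exercised without a network.

```typescript
interface ContentResponse {
  html: string;
  article: { title: string; [key: string]: unknown };
}

type FetchLike = (input: string, init?: RequestInit) => Promise<Response>;

// Fetches the pre-rendered article body from the content API.
// Returns null on network errors, non-2xx statuses, or unexpected JSON,
// so the caller can show an error message with a retry option.
async function fetchArticleContent(
  origin: string,
  lang: string,
  slug: string,
  fetchImpl: FetchLike = fetch,
): Promise<ContentResponse | null> {
  const url = `${origin}/api/blog/${encodeURIComponent(lang)}/${encodeURIComponent(slug)}/content`;
  try {
    const res = await fetchImpl(url, { headers: { Accept: "application/json" } });
    if (!res.ok) return null;
    const data = (await res.json()) as Partial<ContentResponse>;
    if (typeof data.html !== "string" || typeof data.article?.title !== "string") return null;
    return data as ContentResponse;
  } catch {
    return null;
  }
}
```

In the page's frontmatter this might be const content = await fetchArticleContent(Astro.url.origin, lang, slug), followed by <div set:html={content.html} /> when content is not null.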
Image Lazy Loading & Proxy:
- Add loading="lazy" and decoding="async" to <img> tags in the article body to defer image loading until they approach the viewport.
- Convert Notion image URLs to persistent URLs served via Cloudflare (proxied and cached from R2) for optimization and caching. The image proxy is described under "Improved Update Reliability" below.
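Both steps can be applied to the converted HTML before it is stored. The sketch below is regex-based and assumes the HTML comes from our own converter (double-quoted attributes, no ">" inside attribute values), not from arbitrary markup. optimizeImages and the rewriteSrc callback are hypothetical. The callback receives the attribute text as written (entities still escaped) and must return an attribute-safe URL.

```typescript
// Adds loading="lazy" and decoding="async" to <img> tags that lack them and,
// optionally, rewrites each src (e.g. Notion S3 URL -> Cloudflare/R2-backed URL).
function optimizeImages(html: string, rewriteSrc?: (src: string) => string): string {
  return html.replace(
    /<img\b([^>]*?)(\s*\/)?>/gi,
    (_match: string, attrs: string, selfClose: string | undefined) => {
      let a = attrs;
      if (rewriteSrc) {
        // Leading whitespace requirement avoids touching data-src and similar attributes
        a = a.replace(/(\s)src="([^"]*)"/i, (_m: string, sp: string, src: string) => `${sp}src="${rewriteSrc(src)}"`);
      }
      if (!/\sloading=/i.test(a)) a += ' loading="lazy"';
      if (!/\sdecoding=/i.test(a)) a += ' decoding="async"';
      // Keep a trailing " /" so self-closing tags stay well-formed
      return `<img${a}${selfClose ?? ""}>`;
    },
  );
}
```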
JavaScript Dynamic Import:
- Use import() to load libraries like highlight.js for syntax highlighting or JavaScript needed for specific interactions only when required.
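// Load the highlight library only if code blocks are found
if (document.querySelector("pre code")) {
  // Example list; replace with the languages the site actually uses
  const supportedLanguages = ["typescript", "javascript", "bash", "json", "yaml"];
  import("highlight.js").then((module) => {
    window.hljs = module.default;
    window.hljs.configure({
      ignoreUnescapedHTML: true,
      languages: supportedLanguages,
      noHighlightRe: /^mermaid$/, // Exclude mermaid blocks from highlighting
    });
    // Highlight each code block once the library has loaded
    document.querySelectorAll("pre code").forEach((el) => window.hljs.highlightElement(el));
  });
}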
Notion Content to HTML Conversion
Converting block data fetched from the Notion API into HTML displayable on a web page is also crucial.
Conversion Flow:
- Previously, we used a Notion -> Markdown -> HTML conversion process. However, it struggled to express Notion-specific blocks (callouts, synced blocks, etc.) and caused problems such as list tags changing unintentionally during conversion (e.g., numbered lists being rendered with the wrong list element).
- Currently, we implement logic from scratch to convert Notion API block data directly into HTML.
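When converting directly, lists need particular care, because the Notion API returns each list item as a separate sibling block with no wrapper element. The minimal fix is to group runs of same-type items into <ul> or <ol>. The sketch below shows this on a simplified block shape. SimpleBlock and renderBlocks are hypothetical names, and nested child lists are left out.

```typescript
// Simplified block: `html` is the already-rendered inner HTML of the block.
interface SimpleBlock {
  type: string;
  html: string;
}

const LIST_TAGS: Record<string, "ul" | "ol"> = {
  bulleted_list_item: "ul",
  numbered_list_item: "ol",
};

// Wraps consecutive list items of the same kind in a single <ul>/<ol>,
// closing the list when the block type changes.
function renderBlocks(blocks: SimpleBlock[]): string {
  const out: string[] = [];
  let openTag: "ul" | "ol" | null = null;
  for (const block of blocks) {
    const tag = LIST_TAGS[block.type] ?? null;
    if (tag !== openTag) {
      if (openTag) out.push(`</${openTag}>`);
      if (tag) out.push(`<${tag}>`);
      openTag = tag;
    }
    out.push(tag ? `<li>${block.html}</li>` : block.html);
  }
  if (openTag) out.push(`</${openTag}>`);
  return out.join("");
}
```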
Special Block Example (Callout):
Represent Notion's callout blocks using HTML divs or similar elements.
// Part of the function to convert Notion blocks to HTML (Conceptual)
function blockToHtml(block: NotionBlock): string {
  let html = "";
  // Convert rich text annotations to HTML
  const richTextHtml = richTextToHtml(block[block.type].rich_text);
  switch (block.type) {
    case "paragraph":
      html = `<p>${richTextHtml || "&nbsp;"}</p>`; // Keep empty paragraphs visible
      break;
    case "heading_1":
      html = `<h1>${richTextHtml}</h1>`;
      break;
    // ... other heading types ...
    case "callout": {
      const iconHtml =
        block.callout.icon?.type === "emoji" // Check icon type
          ? `<span class="callout-icon">${block.callout.icon.emoji}</span>`
          : ""; // Consider other icon types (external, file) if needed
      html = `<div class="callout">${iconHtml}<div class="callout-content">${richTextHtml}</div></div>`;
      break;
    }
    // Handle other block types (bulleted_list_item, numbered_list_item, image, code, etc.)
    // ...
    default:
      console.warn(`Unsupported block type: ${block.type}`);
  }
  // If block.has_children is true, child blocks must be processed recursively
  if (block.has_children) {
    // Fetch child blocks for this block ID and call blockToHtml recursively...
    // Append the resulting child HTML within the parent element (e.g., inside <li> or <details>)
  }
  return html;
}

// Escapes characters that are special in HTML text and attribute values
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Helper function to convert Notion's rich_text array to HTML
function richTextToHtml(richTextArray: RichTextItemResponse[] | undefined): string {
  // Handles annotations (bold, italic, code, color, links, etc.) by wrapping content
  // in the appropriate tags (<strong>, <em>, <a>, <code>, <span style="...">)
  return (
    richTextArray
      ?.map((rt) => {
        // Escape first so text such as "<script>" is displayed rather than executed
        let content = escapeHtml(rt.plain_text);
        if (rt.annotations.bold) content = `<strong>${content}</strong>`;
        if (rt.annotations.italic) content = `<em>${content}</em>`;
        // ... handle other annotations ...
        if (rt.href) content = `<a href="${escapeHtml(rt.href)}" target="_blank" rel="noopener noreferrer">${content}</a>`;
        return content;
      })
      .join("") || ""
  );
}
This direct conversion improves processing speed and allows for rich expression closer to the appearance within Notion itself.
Effects and Benefits of Implementation
The introduction of this architecture and caching strategy yielded the following specific improvements and benefits:
Performance Improvement:
- Improved Initial Display Speed (TTFB): Astro SSR focuses on fetching lightweight metadata, while the article body is served instantly from the KV/CDN cache, speeding up the initial response. Combined with skeleton UI, the perceived waiting time is reduced to almost zero.
- Faster API Response & Reduced Load: Since KV and CDN handle most requests, direct hits to the Notion API are unnecessary during normal access. This improves API response stability and significantly reduces infrastructure load.
- Uninterrupted UX Even on Cache Expiry: Utilizing stale-while-revalidate allows the CDN to serve slightly outdated data immediately while updating asynchronously, ensuring users can continue browsing without waiting for content updates.
Improved Update Reliability:
- The automated cache update flow ensures that after an article update in Notion, the latest content is reliably reflected in KV and the CDN within the defined timeframe, delivering the most current articles to users.
- Signed S3 image URLs provided by Notion have an expiration problem. As a countermeasure, we implemented caching using an image proxy built with Cloudflare Workers, while simultaneously having our article cache generation script convert images to WebP format and save them to R2. This enables stable image delivery from our own CDN (Cloudflare), ensuring both display reliability and speed.
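The signed query string on a Notion S3 URL changes each time the URL is issued, so cache keys for images should come from the stable part of the URL. Below is a hypothetical sketch. r2KeyForNotionImage and the notion-images/ prefix are assumptions. Appending .webp, rather than replacing the original extension, keeps keys unique even if two files differ only by extension.

```typescript
// Derives a stable R2 object key for the WebP copy of a Notion image.
// URL.pathname excludes the query string, so the X-Amz-* signature parameters
// that change on every fetch do not affect the key.
function r2KeyForNotionImage(signedUrl: string): string {
  const { pathname } = new URL(signedUrl);
  return `notion-images${pathname}.webp`;
}
```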
Improved Developer Experience:
- Automated Workflow: CLI tools and GitHub Actions automate the cache update process, eliminating manual effort and mistakes.
- Environment Separation: Using separate KV namespaces for preview (-preview) and production enables safe testing and deployment.
- Ease of Debugging: Custom HTTP headers (like X-Cache, X-Source) clarify the cache status, making it easier to identify the cause of issues when they occur.
Error Handling:
- KV Cache Structure Validation: Added checks to ensure data retrieved from KV matches the expected format.
- Notion API Error Fallback: Implemented measures to return old KV cache data or display an error page if fetching data from the Notion API fails.
- Frontend Error Display: If fetching data from the API fails on the frontend, display an error message and a retry option to the user.
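The Notion fallback above can be combined with the force-refresh path by reading KV first even when refreshing, so a failed Notion call can still serve the previous entry. In this sketch, KvLike stands in for the KV binding and loadArticle is a hypothetical name. With expirationTtl set, the fallback only works while the old entry is still within its 7-day TTL.

```typescript
// Structural stand-in for a Workers KV namespace binding (assumption).
interface KvLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}

// Read-through cache with forced refresh and stale fallback:
// - normal request: return the valid KV entry if present
// - miss or forceRefresh: load from the origin and overwrite KV
// - origin failure: return the previous valid entry if one exists, else rethrow
async function loadArticle<T>(
  kv: KvLike,
  key: string,
  forceRefresh: boolean,
  loadFromOrigin: () => Promise<T>,
  isValid: (value: unknown) => value is T,
): Promise<{ data: T; source: "kv" | "origin" | "kv-fallback" }> {
  const raw = await kv.get(key);
  let cached: T | null = null;
  if (raw !== null) {
    try {
      const parsed: unknown = JSON.parse(raw);
      if (isValid(parsed)) cached = parsed;
    } catch {
      // Malformed entry: treat as a cache miss
    }
  }
  if (cached !== null && !forceRefresh) return { data: cached, source: "kv" };

  try {
    const fresh = await loadFromOrigin();
    await kv.put(key, JSON.stringify(fresh), { expirationTtl: 604800 });
    return { data: fresh, source: "origin" };
  } catch (err) {
    if (cached !== null) return { data: cached, source: "kv-fallback" };
    throw err;
  }
}
```

The source field maps naturally onto the X-Cache and X-Source debugging headers described earlier.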
Improved User Experience (UX):
- The combination of faster initial display and skeleton loading significantly improved the overall browsing experience.
Conclusion and Future Prospects
This article explained how to build a system for an Astro blog using Notion as a CMS that achieves a high level of balance between the often-conflicting requirements of "display speed" and "update reliability." This was accomplished by leveraging Cloudflare Workers KV and CDN caching as the core, combined with automated update flows and frontend optimizations.
Key Points of the System:
- Separation of Structure: Metadata (fetched during SSR) is separated from body content (via API + KV/CDN cache).
- Multi-Layer Caching: High-speed delivery via Workers KV (cache close to origin) + Cloudflare CDN (edge cache).
- Cache Control: Efficient cache management using Cache-Control, ETag, Last-Modified headers, and improved availability with stale-while-revalidate.
- Automation: Reduced operational load and ensured update reliability through automation of KV cache updates and CDN purging with GitHub Actions.
- Direct Notion Conversion: Improved expressiveness and processing speed by generating HTML directly from Notion blocks.
This configuration is applicable not only to Notion but also as a general-purpose approach to improve the performance and reliability of websites/applications using external APIs as data sources.
Future Challenges and Opportunities for Cloudflare Service Integration
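The current architecture is designed to be simple, fast, and cost-effective. However, as the project grows, with more traffic and increasingly diverse data structures, gradually integrating more Cloudflare services could significantly improve scalability, maintainability, and performance.
1. Expanding the Storage Layer: R2 / D1
- R2 (object storage)
R2 is already the persistent storage backend for article images fetched from Notion. Going forward, it could also hold larger non-text content such as PDFs, videos, or technical documents, which would reduce the load on KV and make storage more efficient.
- D1 (serverless SQL)
For future features that need structured, persistent data (such as comments, edit history, or user bookmarks), D1 is a strong candidate. The current system is built mainly on KV, but adopting D1 could support more interactive use cases.
2. Advanced Performance and Delivery Optimization
- Image Resizing / Polish
Using Cloudflare's automatic image optimization tools (such as WebP conversion and resizing) could significantly improve image delivery performance and user experience, all at the CDN layer.
- Cloudflare Pages Functions + Durable Objects
For features that need stateful interactions, such as comment submission or like counters, Durable Objects combined with Cloudflare Pages Functions can provide lightweight state management.
- Rethinking partial updates and rendering
For long-form content, we could explore updating and rendering only the modified sections instead of the entire article, improving efficiency and reducing I/O.
Effective caching is not just a matter of using the right tools. It requires a deep understanding of how each layer behaves (especially the consistency models and propagation mechanisms of managed services like Cloudflare), along with continuous monitoring and empirical tuning.
I hope this article serves as a practical guide and architectural reference for developers facing similar challenges.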