Tag: markdown

8 articles

xiaohongshu, rednote, 小红书, feishu, lark, ai workflow, markdown, chinese social media, knowledge management

Xiaohongshu to Feishu / Lark Workflow: Save Chinese Social Posts as AI-Ready Markdown

Xiaohongshu (RED / 小红书) is a content goldmine for Chinese-speaking knowledge workers, but its content format is hostile to note-taking tools. External scrapers fail because of anti-bot request signing, and copy-paste loses images and metadata. Here's a workflow that actually works in 2026 for getting Xiaohongshu posts into Feishu / Lark with full fidelity.

2026-05-10 · 6 min read

RAG pipeline preprocessing, web data for RAG, RAG input quality, LangChain, LlamaIndex, vector database, embedding quality, web scraping, markdown, AI engineering

RAG Pipeline Preprocessing: Why Web Data Quality Determines Everything

Most RAG pipelines fail not because of bad retrievers or weak LLMs but because of dirty input data. This deep-dive covers the complete preprocessing architecture for web data: crawling, cleaning, chunking, embedding, and storage, with real Python code and benchmark results.

2026-04-04 · 17 min read