Conduit Roadmap

Objective: A production-ready .NET data pipeline that fetches, transforms, and serves content from multiple sources with pluggable adapters, transformation stages, and resilient delivery.

Success Criteria

  • Multi-source ingestion beyond RSS (Atom, EDI 834, Zotero)
  • Data transformation and enrichment layer
  • Validation and rejected data tier
  • Deep domain coverage across all three source types (content feeds, EDI 834, Zotero)
  • Deployable as a container to Azure or any Linux host
  • Comprehensive test coverage and documentation

Context

Conduit is a content aggregation and data pipeline platform built on .NET 10. It ingests content from diverse sources, transforms and enriches it, and serves it through multiple interfaces. The architecture follows a staged pipeline: fetch -> parse -> transform -> store, with multiple entry points (console runner, background worker, REST API, CLI).
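The staged flow can be pictured as a chain of small functions, each consuming the previous stage's output. The sketch below is illustrative only: it is written in Python rather than .NET, and the names fetch, parse, transform, store, and run_pipeline are hypothetical, not taken from the Conduit codebase. The fetch stage returns canned text instead of touching a network.

```python
# Hypothetical stage functions; names and shapes are assumptions for illustration.

def fetch(source: str) -> str:
    # Stand-in for a network or file read: returns a canned raw payload.
    return "alpha\nbeta\n\ngamma"


def parse(raw: str) -> list[dict]:
    # One record per non-empty line.
    return [{"title": line} for line in raw.splitlines() if line.strip()]


def transform(items: list[dict]) -> list[dict]:
    # Example enrichment: add an upper-cased copy of the title.
    return [{**item, "normalized": item["title"].upper()} for item in items]


def store(items: list[dict], sink: list[dict]) -> int:
    # Append to an in-memory sink and report how many items were written.
    sink.extend(items)
    return len(items)


def run_pipeline(source: str, sink: list[dict]) -> int:
    # fetch -> parse -> transform -> store
    return store(transform(parse(fetch(source))), sink)


sink: list[dict] = []
written = run_pipeline("demo://feed", sink)
```

Because every stage is a plain function, any entry point (console runner, worker, API, CLI) could call run_pipeline in the same way.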

The pipeline pattern is domain-agnostic -- applicable to news aggregation, research monitoring, competitive intelligence, healthcare data ingestion, or any scenario where structured content needs to be collected from heterogeneous sources.

Milestones

Foundation

Establishes the core project structure, dependency injection (DI), logging, testing, CI/CD, and documentation. Provides four entry points (console, worker, API, CLI) and a clean architecture that all subsequent milestones build on.

Multi-Source Ingestion

Introduces a pluggable source adapter pattern so the pipeline can ingest data from any structured source -- RSS, Atom, EDI 834, and Zotero research libraries. Proves the architecture works across content feeds, transactional healthcare data, and hybrid local-file-plus-API sources.
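One way to picture a pluggable adapter pattern is a shared contract plus a registry keyed by source type. This is a minimal sketch in Python (not Conduit's .NET code); SourceAdapter, CsvLineAdapter, X12SegmentAdapter, register, and ingest are hypothetical names, and the parsing is deliberately naive.

```python
from typing import Protocol


class SourceAdapter(Protocol):
    """Hypothetical contract: turn one raw payload into a list of records."""
    name: str

    def ingest(self, payload: str) -> list[dict]: ...


class CsvLineAdapter:
    name = "csv"

    def ingest(self, payload: str) -> list[dict]:
        # One record per non-empty line, split on commas (no quoting support).
        return [{"fields": line.split(",")} for line in payload.splitlines() if line]


class X12SegmentAdapter:
    name = "x12"

    def ingest(self, payload: str) -> list[dict]:
        # Assumes "~" ends a segment and "*" separates elements, a common
        # convention; real interchanges declare their delimiters in the ISA header.
        records = []
        for segment in payload.split("~"):
            segment = segment.strip()
            if segment:
                parts = segment.split("*")
                records.append({"id": parts[0], "elements": parts[1:]})
        return records


ADAPTERS: dict[str, SourceAdapter] = {}


def register(adapter: SourceAdapter) -> None:
    ADAPTERS[adapter.name] = adapter


def ingest(source_type: str, payload: str) -> list[dict]:
    # The pipeline only knows the contract, never the concrete adapter.
    return ADAPTERS[source_type].ingest(payload)


register(CsvLineAdapter())
register(X12SegmentAdapter())

rows = ingest("csv", "a,b\nc,d")
segments = ingest("x12", "INS*Y*18~NM1*IL*1*DOE~")
```

Adding a new source in this sketch means writing one class and registering it; the calling code does not change.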

Data Transformation

Adds a composable transformation layer between ingestion and storage. Handles deduplication, content enrichment, validation, and a three-tier output pattern (raw / curated / rejected). Turns Conduit from a data fetcher into a data pipeline.
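The three-tier output can be sketched as a loop that applies a list of stages to each record and routes the result. The following Python sketch is an assumption about how such a layer might be composed, not Conduit's implementation; Rejected, require_title, trim_title, make_deduper, and run_stages are hypothetical names. Treating duplicates as rejected (rather than silently dropping them) is one possible policy chosen here for illustration.

```python
class Rejected(Exception):
    """Raised by a stage to send a record to the rejected tier."""


def require_title(rec: dict) -> dict:
    # Validation stage: a record needs a non-blank title.
    if not (rec.get("title") or "").strip():
        raise Rejected("missing title")
    return rec


def trim_title(rec: dict) -> dict:
    # Enrichment/cleanup stage: strip surrounding whitespace.
    return {**rec, "title": rec["title"].strip()}


def make_deduper():
    seen: set[str] = set()

    def dedupe(rec: dict) -> dict:
        # Deduplication stage: case-insensitive match on title.
        key = rec["title"].lower()
        if key in seen:
            raise Rejected("duplicate")
        seen.add(key)
        return rec

    return dedupe


def run_stages(records, stages):
    raw, curated, rejected = [], [], []
    for original in records:
        raw.append(original)  # raw tier keeps everything as received
        rec = original
        try:
            for stage in stages:
                rec = stage(rec)
        except Rejected as err:
            rejected.append({"record": original, "reason": str(err)})
        else:
            curated.append(rec)
    return raw, curated, rejected


records = [{"title": " Hello "}, {"title": "hello"}, {"title": ""}]
raw, curated, rejected = run_stages(
    records, [require_title, trim_title, make_deduper()]
)
```

Stages are ordinary callables, so reordering or adding one is a change to the list passed to run_stages.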

Source Depth: EDI 834

Deepens the 834 adapter from a working prototype toward a fuller X12 implementation. Adds transaction and batch envelope tracking, acknowledgments (999 implementation, TA1 interchange), effective dating for overlapping coverage periods, and a more complete X12 loop parser for real-world 834 files.
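Effective dating starts with knowing which coverage periods overlap. The Python sketch below shows one simple way to detect overlaps between inclusive date ranges; it is an illustration under stated assumptions, not Conduit's logic. Coverage, overlaps, and find_overlaps are hypothetical names, an end date of None is assumed to mean open-ended coverage, and the extraction of dates from 834 segments is not modelled.

```python
from datetime import date
from typing import NamedTuple, Optional


class Coverage(NamedTuple):
    plan: str
    start: date
    end: Optional[date]  # None is assumed to mean open-ended coverage


def overlaps(a: Coverage, b: Coverage) -> bool:
    # Two inclusive ranges overlap when each starts on or before the other ends.
    a_end = a.end or date.max
    b_end = b.end or date.max
    return a.start <= b_end and b.start <= a_end


def find_overlaps(periods: list[Coverage]) -> list[tuple[str, str]]:
    # Compare every pair once, in start-date order.
    ordered = sorted(periods, key=lambda p: p.start)
    return [
        (a.plan, b.plan)
        for i, a in enumerate(ordered)
        for b in ordered[i + 1:]
        if overlaps(a, b)
    ]


periods = [
    Coverage("A", date(2024, 1, 1), date(2024, 6, 30)),
    Coverage("B", date(2024, 6, 1), None),
    Coverage("C", date(2023, 1, 1), date(2023, 12, 31)),
]
conflicts = find_overlaps(periods)
```

What to do with a detected overlap (terminate the earlier period, reject the later one, flag for review) is a business rule this sketch leaves open.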

Source Depth: Zotero

Deepens the Zotero adapter beyond CSV parsing and domain tagging. Adds richer metadata resolution (CrossRef API for citation counts and venue), preprint-to-published version linking, collection and tag hierarchy, and reading status tracking.

Source Depth: RSS

Deepens the RSS adapter beyond keyword extraction. Adds content-similarity deduplication across feeds, topic clustering, feed health tracking, and full-text extraction from linked articles.
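Content-similarity deduplication can be approximated by comparing sets of word shingles with Jaccard similarity. The roadmap does not name a technique, so this Python sketch is only one plausible approach; shingles, jaccard, dedupe, the shingle size of 3, and the 0.6 threshold are all assumptions for illustration.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    # Lower-case, drop punctuation, and collect overlapping k-word windows.
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    # Size of the intersection divided by size of the union.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dedupe(items: list[str], threshold: float = 0.6) -> list[str]:
    # Keep an item only if it is not too similar to anything already kept.
    kept: list[tuple[str, set]] = []
    for item in items:
        sig = shingles(item)
        if all(jaccard(sig, other) < threshold for _, other in kept):
            kept.append((item, sig))
    return [item for item, _ in kept]


a = "Conduit adds Atom feed support in the latest release"
b = "CONDUIT adds Atom feed support, in the latest release!"
c = "Quarterly earnings beat analyst expectations"
unique = dedupe([a, b, c])
```

This pairwise comparison is quadratic in the number of kept items; a larger feed set would likely need an index such as MinHash, which is outside the scope of this sketch.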

Storage Backends

Introduces persistent storage options beyond the local filesystem -- database backends (SQLite, PostgreSQL), cloud storage (S3, Azure Blob), or hybrid strategies. Builds on the storage abstraction established in the Data Transformation milestone so that switching or adding backends is a configuration change, not a code change.

  • PRD: To be created
  • Dependency: Data Transformation
  • Status: Not Started
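The idea of switching backends through configuration can be sketched as a factory that maps a config value to a storage class behind a shared interface. This Python sketch is an assumption about the shape of such an abstraction, not Conduit's design; Store, MemoryStore, SqliteStore, BACKENDS, create_store, and the "backend" and "path" config keys are hypothetical. It uses Python's standard sqlite3 module with an in-memory database.

```python
import sqlite3
from typing import Protocol


class Store(Protocol):
    """Hypothetical storage contract shared by all backends."""

    def save(self, key: str, value: str) -> None: ...
    def load(self, key: str) -> str: ...


class MemoryStore:
    def __init__(self) -> None:
        self._items: dict[str, str] = {}

    def save(self, key: str, value: str) -> None:
        self._items[key] = value

    def load(self, key: str) -> str:
        return self._items[key]


class SqliteStore:
    def __init__(self, path: str) -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS items (key TEXT PRIMARY KEY, value TEXT)"
        )

    def save(self, key: str, value: str) -> None:
        with self._db:  # commits on success, rolls back on error
            self._db.execute(
                "INSERT OR REPLACE INTO items (key, value) VALUES (?, ?)",
                (key, value),
            )

    def load(self, key: str) -> str:
        row = self._db.execute(
            "SELECT value FROM items WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]


# Config value -> factory. Adding a backend means adding one entry here.
BACKENDS = {
    "memory": lambda cfg: MemoryStore(),
    "sqlite": lambda cfg: SqliteStore(cfg.get("path", ":memory:")),
}


def create_store(config: dict) -> Store:
    return BACKENDS[config["backend"]](config)


results = []
for config in ({"backend": "memory"}, {"backend": "sqlite", "path": ":memory:"}):
    store = create_store(config)
    store.save("a", "1")
    results.append(store.load("a"))
```

The calling code in the loop is identical for both backends; only the config dictionary differs.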

References