System Architecture
An open-source pipeline for tracking software releases, security advisories, and community signals. The goal is to continuously poll 20+ vendor feeds, package registries, and developer forums; normalise every event into one of two MongoDB collections (versions or reddit); stamp each document with a structured identifier; and expose the full corpus through a typed REST API and a static browser client. No proprietary transforms, no vendor lock-in.
Platform Overview
Six subsystems, two MongoDB collections, one REST API. Arrows show the primary data and control paths.
[Diagram: platform overview. The start of the diagram source is truncated, so the label of the first source group is not recovered.]
- Sources: first group (NVD · CVE · Patch Tuesday · Ubuntu · Debian); Browser Releases (Chrome · Firefox · Safari); Runtimes and Platforms (Node.js · Python · Linux Kernel · Eclipse); Community Signals (Reddit 20+ subreddits · Stack Overflow).
- releasetrain-bot · Python collector fleet: Version pollers (15+ release scrapers); Reddit scraper (PRAW); ML labeller (isUpdateRelated · positiveScore).
- MongoDB Atlas: versions and reddit collections.
- releasetrain-server · Node and Express: /api/v/*, /api/reddit/*, /api/aggregate/*.
- releasetrain-client · Static JS and HTML: Feed and Dashboard; CVE View; Component Graph; Docs · Label · Archive views.
- releasetrain-chat · Streamlit RAG: Embeddings (text encoder); LLM Inference (DigitalOcean GenAI); Chat Interface.
- Paths: the first three source groups → Version pollers → versions. Community Signals → Reddit scraper → ML labeller → reddit. versions → /api/v/* and /api/aggregate/*. reddit → /api/reddit/*. /api/v/* → Feed and Dashboard, CVE View, Component Graph. /api/reddit/* → Feed and Dashboard. versions and reddit → Embeddings → LLM Inference → Chat Interface.
Source Coverage: 20+ release and community feeds
Security and vendor feeds write to the versions collection. Community signals write to the reddit collection.
Data Ingestion and Enrichment Pipeline
Raw signals pass through four stages before landing in MongoDB.
[Diagram: ingestion pipeline. The start of the diagram source is truncated; the first stage label is inferred from its node IDs (C1, C2) as a collection stage.]
- Collect: C1 (label truncated, ends "and release pages"); Scrape Reddit via PRAW.
- Normalise: Deduplicate by redditId or versionId; Parse semver and release channel.
- Enrich: Predict isUpdateRelated; Predict positiveScore; Stamp versionId (YYYYMMDD · name · version); Add search tags and timestamps.
- Store: versions and reddit collections.
- Version path: C1 → Parse semver → Stamp versionId → Add search tags → versions.
- Reddit path: Scrape Reddit → Deduplicate → Predict isUpdateRelated → Predict positiveScore → reddit.
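The Normalise stage (deduplicate by redditId or versionId, then parse the semver string and release channel) can be sketched as follows. This is a minimal illustration, not the bot's actual code: the function names `infer_channel` and `dedupe` are hypothetical, and the channel rule (the lowest non-zero semver component decides) is an assumption that merely agrees with the sample document, where 124.0.1 is a patch.

```python
import re

# Up to three dot-separated numeric components, e.g. "124.0.1" or "20.1".
SEMVER_RE = re.compile(r"^(\d+)(?:\.(\d+))?(?:\.(\d+))?")


def infer_channel(version_number: str, is_cve: bool = False) -> str:
    """Guess a release channel from a version string (assumed rule)."""
    if is_cve:
        return "cve"
    match = SEMVER_RE.match(version_number.strip().lstrip("vV"))
    if not match:
        return "other"  # non-standard version strings
    major, minor, patch = (int(part or 0) for part in match.groups())
    if patch:
        return "patch"
    if minor:
        return "minor"
    return "major"


def dedupe(docs: list, key: str) -> list:
    """Keep the last document seen per key, in first-seen key order."""
    latest = {}
    for doc in docs:
        latest[doc[key]] = doc
    return list(latest.values())
```

Keeping the last document per key mirrors upsert behaviour, where a later scrape of the same post replaces the earlier one.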
API Topology: REST route groups and MongoDB dependencies
All REST route groups and the MongoDB collection each reads or writes.
[Diagram: API topology. The start of the diagram source is truncated, including the client node and the beginning of the first version-route node.]
- Version routes: GET (list truncated, ends "fc · fcc · stats/by-month"); POST and PUT versions.
- /api/reddit/* · Reddit: GET reddit · by-subreddit · questions · positive · cve · stats; POST and PUT reddit.
- /api/aggregate/* · Aggregations: GET v/typeBreakdown · v/updateTypeCount · v/classificationSummary · v/versionCountByDay; GET reddit/summary · reddit/count · reddit/bySource · reddit/bySubreddit · reddit/countByDay.
- System: GET health · meta · test/all · test/endpoints.
- Edges: client → all four route groups; version routes → versions; aggregations → versions; Reddit routes → reddit.
Frontend Views: releasetrain-client
All views served as static HTML and JS from a single Nginx or http-server process.
Feed and Dashboard
/api/v/search and /api/reddit. Built with jQuery and Chart.js.
CVE View
Component Graph
/src/data/graph.json.
Architecture View
plantuml-decoder.min.js and the bundled WASM core. Displays system architecture diagrams defined in PlantUML DSL without any server round-trip.
Docs and API Explorer
Reddit View
/api/reddit and all query sub-routes.
Label View
isUpdateRelated classifier. Displays one Reddit post at a time and writes the label back via PUT /api/reddit/:id.
EDI40-2023 View
JUSPN View
MLTL View
Changelog
AI Chat: RAG pipeline with vector search and LLM inference
A Streamlit application (releasetrain-chat) that wraps a retrieval-augmented generation pipeline over both MongoDB collections. At startup it encodes release notes and community posts into a FAISS vector index. At query time it retrieves the top-k most similar documents by cosine similarity, optionally reranked by a cross-encoder, then injects them into an LLM prompt served via DigitalOcean GenAI (Llama 3 or Mistral). Responses include source citations derived from the retrieved documents.
[Diagram: RAG pipeline. The start of the diagram source is truncated, including the user node and the beginning of the query node label.]
- Pipeline: user → query node (label truncated, ends "sentence-transformers") → Vector retriever (top-k similarity search) → Reranker (cross-encoder scoring) → Prompt augmenter (release context injection) → LLM Inference (DigitalOcean GenAI Platform, Llama 3 · Mistral) → Grounded answer with source citations.
- Knowledge Corpus: versions and reddit feed the FAISS vector index, which serves the Vector retriever.
Embedding model
sentence-transformers all-MiniLM-L6-v2 or OpenAI text-embedding-3-small. Encodes release notes and Reddit posts into dense 384- or 1536-dimensional vectors, respectively, indexed in FAISS.
LLM
Served via DigitalOcean GenAI Platform. Llama 3 70B or Mistral 7B receives an augmented prompt containing retrieved release context and returns a grounded, cited answer.
Vector store
In-memory FAISS index, rebuilt at startup from both collections. Top-k cosine-similarity retrieval with configurable k (default: 10 documents per query).
Retrieval strategy
Hybrid: dense vector search over embeddings plus sparse BM25 keyword match over version IDs and CVE strings. Results merged and reranked before prompt injection.
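To make the dense half of the retrieval step concrete, here is a brute-force sketch of top-k cosine-similarity search in pure Python. It is a stand-in for what the FAISS index does at scale, not the chat application's code: `cosine`, `top_k`, and the document IDs in the usage example are hypothetical, and the BM25 half and the reranker are left out.

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list, corpus: dict, k: int = 10) -> list:
    """Return the k (doc_id, score) pairs most similar to the query vector."""
    scored = [(doc_id, cosine(query, vector)) for doc_id, vector in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

For example, `top_k([1.0, 0.1], {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}, k=2)` ranks "a" first and "c" second. The default k of 10 follows the value stated above.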
Request Lifecycle: Version search call path
Canonical read path through the stack for a version search call.
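[Sequence diagram not recovered. One note survives: "unless start or end param overrides", which matches the rolling two-year default window that start and end override.]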
Health & meta
Endpoints collapsed by default, ordered method-first then route.
GET /api/health
Lightweight health check. No database query.
curl "https://releasetrain.io/api/health"
{ "ok": true, "service": "releasetrain", "serverTime": "2026-03-25T17:00:00.000Z" }
GET /api/meta
Diagnostics route that pings the database and reports collection names.
curl "https://releasetrain.io/api/meta"
{ "ok": true, "dbName": "releasetrain", "ping": { "ok": 1 }, "collections": { "versions": "versions", "users": "users" } }
Reddit endpoints
Reddit at a glance
Ingestion, listing, subreddit filtering, update-related search, CVE text search, positive-score retrieval, monthly summaries, and single-item fetch/update. Some routes use page/limit paging; others also support cursor-based navigation.
GET /api/iot
Specialized Reddit view for IoT-style subreddits, with optional update-related filtering.
curl "https://releasetrain.io/api/iot?limit=100&page=1"
GET /api/reddit
List Reddit docs, newest first. Supports page/limit, cursor paging, monthly filtering, and update-related filtering.
curl "https://releasetrain.io/api/reddit?limit=25&page=1&showCount=true"
{ "data": [ { "title": "...", "created_utc": "..." } ], "totalCount": 25 }
GET /api/reddit/:redditId
Fetch a single Reddit item by Mongo ObjectId or Reddit short id.
curl "https://releasetrain.io/api/reddit/1nq0h33"
GET /api/reddit/by-subreddit
Fetch by one or more subreddits, case-insensitive. Optional score, comment, pagination, and projection filters.
curl "https://releasetrain.io/api/reddit/by-subreddit?q=programming,technology&minScore=50&limit=25"
GET /api/reddit/count
Total post count over the rolling two-year window.
curl "https://releasetrain.io/api/reddit/count"
{ "totalRedditPosts": 248 }
GET /api/reddit/meta/subreddits
Return all unique subreddits from the Reddit collection.
curl "https://releasetrain.io/api/reddit/meta/subreddits"
{ "success": true, "count": 14, "data": ["programming", "technology"] }
GET /api/reddit/query/cve
Text-search Reddit content for CVE-like strings.
curl "https://releasetrain.io/api/reddit/query/cve?q=CVE-&limit=25"
GET /api/reddit/query/filter
Filter Reddit docs by comment count and predicted positive score window.
curl "https://releasetrain.io/api/reddit/query/filter?minComments=5&minScore=0.6&maxScore=0.95&limit=50"
GET /api/reddit/query/positive
Returns all Reddit docs where metadata.predicted.positiveScore > 0.5, sorted by score descending.
curl "https://releasetrain.io/api/reddit/query/positive"
{ "total": 83, "minScore": 0.5, "data": [ ... ] }
GET /api/reddit/query/questions
Return Reddit docs where metadata.predicted.isUpdateRelated is true and a question marker appears in title or author description.
curl "https://releasetrain.io/api/reddit/query/questions?where=either&limit=25&page=1&showCount=true"
GET /api/reddit/query/update-related
Filter Reddit documents by the nested update-related fields under metadata.labeled and metadata.predicted.
curl "https://releasetrain.io/api/reddit/query/update-related?isLabeled=true&isUpdateRelated=true&limit=25&page=1"
GET /api/reddit/stats/by-month
Monthly counts for labeled training distribution.
curl "https://releasetrain.io/api/reddit/stats/by-month?startMonth=202401&endMonth=202412"
GET /api/reddit/stats/summary
Range summary over Reddit data: total docs, risky docs, latest-update mentions, and CVE mentions.
curl "https://releasetrain.io/api/reddit/stats/summary?start=20240101&end=20241231"
GET /api/subreddits/smoke
Simple footprint check to list distinct subreddits found in the Reddit collection.
curl "https://releasetrain.io/api/subreddits/smoke"
{ "count": 14, "items": ["programming", "technology"] }
POST /api/reddit
Insert or upsert a Reddit document by redditId.
curl -X POST "https://releasetrain.io/api/reddit" -H "Content-Type: application/json" -d '{"redditId":"1nq0h33","title":"Example","subreddit":"programming"}'
PUT /api/reddit
Replace a Reddit document while preserving the existing Mongo _id.
curl -X PUT "https://releasetrain.io/api/reddit" -H "Content-Type: application/json" -d '{"redditId":"1nq0h33","title":"Updated"}'
PUT /api/reddit/:redditId
Update a single Reddit item by ObjectId or redditId.
curl -X PUT "https://releasetrain.io/api/reddit/1nq0h33" -H "Content-Type: application/json" -d '{"title":"Retitled","score":88}'
Version endpoints
Versions at a glance
Two flavors: older convenience routes like /api/v/ and the newer /api/v/search. For steady client integration, /api/v/search is the safer path: its filters and paging model are explicit.
GET /api/dashboard/mltl-risk
Dashboard aggregate for documents marked as MLTL risk. Optional component filter.
curl "https://releasetrain.io/api/dashboard/mltl-risk?q=chrome,firefox"
GET /api/v/
Older convenience route: returns recent versions. Without q, a broad recent slice; with q, divides the limit across components.
curl "https://releasetrain.io/api/v/?q=chrome,firefox"
{ "versions": [ ... ] }
GET /api/v/:id
Fetch one version by Mongo ObjectId.
curl "https://releasetrain.io/api/v/660c38fce3cba9423e4f8f23"
GET /api/v/aggregate/byDate
Count how many versions were released on a specific day.
curl "https://releasetrain.io/api/v/aggregate/byDate?date=20250723"
{ "success": true, "date": "20250723", "count": 42 }
GET /api/v/count
Total version count inside the rolling two-year window.
curl "https://releasetrain.io/api/v/count"
{ "totalVersions": 54210 }
GET /api/v/d/versionsByComponent
Return latest/current/CVE snapshots for one or more components.
curl "https://releasetrain.io/api/v/d/versionsByComponent?component=name:chrome,version:118"
[ { "name": "chrome", "latestVersion": { ... }, "currentVersion": { ... }, "latestCveVersion": { ... } } ]
GET /api/v/fc
Forecast the next release date for a single component.
curl "https://releasetrain.io/api/v/fc?q=chrome"
[ { "component": "chrome", "releaseDate": "2026-04-07", "version": "135.0.0" } ]
GET /api/v/fcc
Forecast coinciding release dates for multiple components.
curl "https://releasetrain.io/api/v/fcc?q=chrome,firefox"
GET /api/v/latest10
Latest N versions per requested component.
curl "https://releasetrain.io/api/v/latest10?q=chrome,firefox&limit=10"
GET /api/v/search
Unified read-only search with filters, projections, count, and cursor paging.
curl "https://releasetrain.io/api/v/search?q=chrome,firefox&limit=50&page=1&showCount=true"
curl "https://releasetrain.io/api/v/search?q=chrome&channel=patch&isCve=true&fields=versionId,versionNumber&limit=25"
{ "data": [ { "_id": "...", "versionId": "20250217chrome1.2.3" } ], "totalCount": 314 }
GET /api/v/stats/by-month
Monthly counts grouped by month and release channel, optionally filtered to selected components.
curl "https://releasetrain.io/api/v/stats/by-month?startMonth=202401&endMonth=202412&q=chrome,firefox"
GET /api/v/versionId/:versionId
Fetch one version by business identifier rather than Mongo ObjectId.
curl "https://releasetrain.io/api/v/versionId/20250217chrome1.2.3"
POST /api/v
Create a version document. The server normalizes versionNumber, infers the release channel, and fills timestamps/search tags.
curl -X POST "https://releasetrain.io/api/v" -H "Content-Type: application/json" -d '{"versionProductName":"chrome","versionNumber":"124.0.1","versionReleaseDate":"20260325","versionReleaseChannel":"patch"}'
PUT /api/v/:id
Update an existing version by Mongo ObjectId.
curl -X PUT "https://releasetrain.io/api/v/660c38fce3cba9423e4f8f23" -H "Content-Type: application/json" -d '{"classification":{"componentType":["browser"]}}'
PUT /api/v/versionId/:versionId
Update an existing version by business identifier. Refreshes versionTimestampLastUpdate.
curl -X PUT "https://releasetrain.io/api/v/versionId/20250217chrome1.2.3" -H "Content-Type: application/json" -d '{"newFieldName":"newValue"}'
Component endpoints
GET /api/c/count
Total number of distinct components in the rolling two-year window.
curl "https://releasetrain.io/api/c/count"
{ "totalComponents": 1440 }
GET /api/c/frequency
Builds a component list from top components and those updated today.
curl "https://releasetrain.io/api/c/frequency"
{ "totalComponents": 20, "components": ["chrome", "firefox"] }
GET /api/c/name/:componentName/:versionNumber?
Fetch version history for a specific component, optionally narrowed to one exact version number.
curl "https://releasetrain.io/api/c/name/firefox"
curl "https://releasetrain.io/api/c/name/firefox/118.0.1"
GET /api/c/names
Return distinct component names in the rolling two-year window.
curl "https://releasetrain.io/api/c/names"
GET /api/c/os
Returns distinct component names classified as OS in the rolling two-year window.
curl "https://releasetrain.io/api/c/os"
GET /api/component/
Search component records. Matches product name or predicted component type.
curl "https://releasetrain.io/api/component/?q=linux"
Aggregation endpoints
Measure volume, gaps, and classification counts across both collections. All date params use YYYYMMDD format. Range-based endpoints default to the rolling 2-year window when start/end are omitted.
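The default-window behaviour described here can be sketched client-side as below. This is an illustration under stated assumptions, not the server's code: `resolve_range` is a hypothetical helper, and two details are guesses, namely that a missing start is taken two calendar years before the end date and that 29 February falls back to 28 February.

```python
from datetime import date, datetime
from typing import Optional


def parse_yyyymmdd(value: str) -> date:
    """Parse a YYYYMMDD string; raises ValueError on malformed input."""
    return datetime.strptime(value, "%Y%m%d").date()


def resolve_range(start: Optional[str], end: Optional[str], today: date) -> tuple:
    """Return (start, end) as YYYYMMDD strings, defaulting to a two-year window."""
    end_date = parse_yyyymmdd(end) if end else today
    if start:
        start_date = parse_yyyymmdd(start)
    else:
        try:
            start_date = end_date.replace(year=end_date.year - 2)
        except ValueError:
            # end_date is 29 February and the target year is not a leap year
            start_date = end_date.replace(year=end_date.year - 2, day=28)
    if start_date > end_date:
        raise ValueError("start must not be after end")
    return start_date.strftime("%Y%m%d"), end_date.strftime("%Y%m%d")
```

With today set to 18 May 2026 and no parameters, this yields the range 20240518 to 20260518, the same range shown in the sample responses in this section.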
Versions · /api/aggregate/v/*
GET /api/aggregate/v/typeBreakdown
CVE vs release-note split plus per-channel breakdown for a date range. Omit start/end to use the rolling 2-year window.
curl "https://releasetrain.io/api/aggregate/v/typeBreakdown?start=20250101&end=20251231"
{
  "range": { "start": "20250101", "end": "20251231" },
  "total": 4821,
  "cveCount": 1203,
  "nonCveCount": 3618,
  "byChannel": { "major": 312, "minor": 1540, "patch": 1766, "cve": 1203, "other": 0 }
}
GET /api/aggregate/v/classificationSummary
Summarize security and breaking classification tags for one day.
curl "https://releasetrain.io/api/aggregate/v/classificationSummary?timestamp=20250723"
{ "timestamp": "20250723", "total": 14, "classification": { "security-fix": 8, "breaking-change": 6 } }
GET /api/aggregate/v/componentTypeCount
Count classified component types for a single day.
curl "https://releasetrain.io/api/aggregate/v/componentTypeCount?timestamp=20250730"
{ "timestamp": "20250730", "total": 40, "components": { "browser": 8, "os": 6 } }
GET /api/aggregate/v/missingFields
Sample documents where a requested field is missing, null, or empty. Useful for data quality audits.
curl "https://releasetrain.io/api/aggregate/v/missingFields?field=versionNumber&limit=50"
GET /api/aggregate/v/oldestTimestamp
Find the release date of the Nth newest document and compute its age in days.
curl "https://releasetrain.io/api/aggregate/v/oldestTimestamp?count=1000"
{ "oldest": "20250723", "deltaInDays": 245 }
GET /api/aggregate/v/sourceCountByType
Count one source type for a given day. sourceType: cve, major, minor, patch.
curl "https://releasetrain.io/api/aggregate/v/sourceCountByType?sourceType=cve&timestamp=20250723"
{ "timestamp": "20250723", "sourceType": "cve", "count": 18 }
GET /api/aggregate/v/updateTypeCount
Count major, minor, patch, and other versions for a single day.
curl "https://releasetrain.io/api/aggregate/v/updateTypeCount?timestamp=20250723"
{ "timestamp": "20250723", "major": 2, "minor": 5, "patch": 17, "other": 1 }
GET /api/aggregate/v/versionCountByDay
Day-by-day version counts over an explicit date range.
curl "https://releasetrain.io/api/aggregate/v/versionCountByDay?start=20250701&end=20250730"
{ "range": { "start": "20250701", "end": "20250730" }, "days": [{ "_id": "20250701", "count": 12 }] }
Community · /api/aggregate/reddit/*
GET /api/aggregate/reddit/summary
Full overview in one call: totals, Reddit vs Stack Overflow split, top subreddits/components, and score stats. Omit start/end for the rolling 2-year window.
curl "https://releasetrain.io/api/aggregate/reddit/summary?topN=5"
{
  "range": { "start": "20240518", "end": "20260518" },
  "total": 28340,
  "bySource": { "reddit": 21050, "stackoverflow": 7290 },
  "topSubreddits": [{ "_id": "android", "count": 4120 }, { "_id": "chrome", "count": 2980 }],
  "score": { "avg": 14.3, "max": 4821, "totalWithPositiveScore": 19204 }
}
GET /api/aggregate/reddit/count
Total post count plus Reddit vs Stack Overflow split for a date range.
curl "https://releasetrain.io/api/aggregate/reddit/count?start=20250101&end=20251231"
{ "range": { "start": "20250101", "end": "20251231" }, "total": 14820, "redditCount": 11340, "stackoverflowCount": 3480 }
GET /api/aggregate/reddit/bySource
Counts grouped by source field. Documents without a source field are counted as reddit.
curl "https://releasetrain.io/api/aggregate/reddit/bySource"
{ "range": { "start": "20240518", "end": "20260518" }, "total": 28340, "sources": { "reddit": 21050, "stackoverflow": 7290 } }
GET /api/aggregate/reddit/bySubreddit
Top subreddits or Stack Overflow components by post count, with average score. Filter by source to compare communities.
curl "https://releasetrain.io/api/aggregate/reddit/bySubreddit?limit=5&source=reddit"
{
  "range": { "start": "20240518", "end": "20260518" },
  "source": "reddit", "limit": 5,
  "subreddits": [{ "_id": "android", "count": 4120, "avgScore": 18.4 }]
}
GET /api/aggregate/reddit/countByDay
Post count per day. Same shape as /api/aggregate/v/versionCountByDay. Filter by source to compare Reddit vs Stack Overflow ingestion cadence.
curl "https://releasetrain.io/api/aggregate/reddit/countByDay?start=20250701&end=20250730&source=stackoverflow"
{ "range": { "start": "20250701", "end": "20250730" }, "source": "stackoverflow", "days": [{ "_id": "20250701", "count": 8 }] }
Test & discovery endpoints
GET /api/test/all
Runs a built-in suite of GET requests against selected routes and returns status plus payload samples.
curl "https://releasetrain.io/api/test/all"
{ "totalGET": 11, "results": [ { "method": "GET", "path": "/api/v?q=chrome,firefox", "status": 200, "success": true } ] }
GET /api/test/endpoints
Enumerates all registered routes and flags duplicates.
curl "https://releasetrain.io/api/test/endpoints"
{ "totalEndpoints": 30, "endpoints": [ { "methods": ["GET"], "path": "/api/health" } ] }
GET /api/test/endpoints/html
HTML view of discovered GET endpoints with generated example URLs and sample outputs.
curl "https://releasetrain.io/api/test/endpoints/html"
Quick Start
Five requests that cover the most common use cases. Paste any into a terminal to verify connectivity and explore the response shape. Base URL: https://releasetrain.io · All GET endpoints are public and unauthenticated.
What released in the last 7 days?
curl "https://releasetrain.io/api/v/search?start=20260511&end=20260518&limit=25&showCount=true"
Any CVEs published this month?
curl "https://releasetrain.io/api/v/search?channel=cve&start=20260501&end=20260531&limit=50&showCount=true"
What is Reddit saying about Chrome right now?
curl "https://releasetrain.io/api/reddit/by-subreddit?q=chrome&minScore=5&limit=25"
Latest 10 Firefox releases with key fields only
curl "https://releasetrain.io/api/v/latest10?q=firefox&limit=10"
Forecast Chrome's next release date
curl "https://releasetrain.io/api/v/fc?q=chrome"
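The date parameters in these requests are hard-coded, so a script would normally compute them. The sketch below builds the "last 7 days" URL from a given date using only the standard library; `recent_releases_url` is a hypothetical helper name, and the parameter names are the ones used in the requests above.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

BASE_URL = "https://releasetrain.io"


def recent_releases_url(today: date, days: int = 7, limit: int = 25) -> str:
    """Build a /api/v/search URL covering the `days` days up to `today`."""
    params = {
        "start": (today - timedelta(days=days)).strftime("%Y%m%d"),
        "end": today.strftime("%Y%m%d"),
        "limit": limit,
        "showCount": "true",
    }
    return f"{BASE_URL}/api/v/search?{urlencode(params)}"
```

Calling `recent_releases_url(date(2026, 5, 18))` reproduces the first request in this section; pass `date.today()` for a live query.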
Data Models
Two MongoDB collections store all release intelligence. Not every document has every field; the schema evolves as new sources are added.
versions collection
One document per software version release. Written by the bot fleet; read by /api/v/* and /api/aggregate/*.
| Field | Type | Description |
|---|---|---|
| _id | ObjectId | MongoDB document ID |
| versionId | string | Composite key: YYYYMMDD + productName + versionNumber. Unique per release. |
| versionProductName | string | Normalised product name, lowercase. e.g. chrome, firefox |
| versionNumber | string | Semver string. e.g. 124.0.1 |
| versionReleaseDate | string | Release date as YYYYMMDD |
| versionReleaseChannel | string | Inferred: major · minor · patch · cve · other |
| versionTimestamp | number | Unix milliseconds derived from versionReleaseDate |
| versionTimestampLastUpdate | string | ISO-8601 datetime of last write |
| isCve | boolean | true when channel is cve |
| classification.componentType | string[] | e.g. ["browser"], ["os"], ["runtime"] |
| classification.securityType | string[] | e.g. ["security-fix"] |
| classification.breakingType | string[] | e.g. ["breaking-change"] |
| metadata.predicted.isUpdateRelated | boolean | ML prediction: is this an update-related version? |
| metadata.predicted.positiveScore | number | Sentiment score 0–1. Above 0.5 is positive. |
| metadata.labeled.isUpdateRelated | boolean or null | Human override. null = unlabeled. |
{
"_id": "660c38fce3cba9423e4f8f23",
"versionId": "20250217chrome124.0.1",
"versionProductName": "chrome",
"versionNumber": "124.0.1",
"versionReleaseDate": "20250217",
"versionReleaseChannel": "patch",
"versionTimestamp": 1739750400000,
"versionTimestampLastUpdate": "2025-02-17T10:00:00.000Z",
"isCve": false,
"classification": { "componentType": ["browser"] },
"metadata": {
"predicted": { "isUpdateRelated": true, "positiveScore": 0.72 },
"labeled": { "isUpdateRelated": null }
}
}
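The two derived fields in this sample, versionId and versionTimestamp, can be reproduced with a few lines. This is a sketch rather than the server's implementation: the helper names are hypothetical, and deriving the timestamp from UTC midnight is an assumption, though it does match the sample value 1739750400000 for 20250217.

```python
from datetime import datetime, timezone


def make_version_id(release_date: str, product_name: str, version_number: str) -> str:
    """Concatenate YYYYMMDD + lowercase product name + version number."""
    return f"{release_date}{product_name.lower()}{version_number}"


def release_timestamp_ms(release_date: str) -> int:
    """Unix milliseconds for UTC midnight of a YYYYMMDD release date."""
    midnight = datetime.strptime(release_date, "%Y%m%d").replace(tzinfo=timezone.utc)
    return int(midnight.timestamp() * 1000)
```

Because the key is a plain concatenation with no separators, it is only as unique as the combination of date, product name, and version number.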
reddit collection
One document per Reddit post. Written by the scraper and ML labeller; read by /api/reddit/*.
| Field | Type | Description |
|---|---|---|
| _id | ObjectId | MongoDB document ID |
| redditId | string | Reddit short ID, e.g. 1nq0h33. Deduplication key. |
| title | string | Post title |
| subreddit | string | Subreddit name without prefix, e.g. programming |
| author_description | string | Post body / selftext |
| score | number | Reddit upvote score at ingestion time |
| num_comments | number | Comment count at ingestion time |
| created_utc | string | ISO-8601 datetime of original Reddit post |
| updatedAt | string | ISO-8601 datetime of last upsert |
| comments | object[] | Top-level comment objects from the thread |
| metadata.predicted.isUpdateRelated | boolean | ML prediction: does this post discuss a software update? |
| metadata.predicted.positiveScore | number | Sentiment score 0–1 |
| metadata.labeled.isUpdateRelated | boolean or null | Human label. null = unlabeled. |
{
"_id": "69d176bbda0850f83829b2d6",
"redditId": "1nq0h33",
"title": "Chrome 124 breaks extension manifest v2",
"subreddit": "chrome",
"author_description": "After updating to 124.0.1 my uBlock Origin stopped...",
"score": 342,
"num_comments": 87,
"created_utc": "2025-02-18T09:14:00.000Z",
"updatedAt": "2025-02-18T12:00:00.000Z",
"metadata": {
"predicted": { "isUpdateRelated": true, "positiveScore": 0.21 },
"labeled": { "isUpdateRelated": true }
}
}
Pagination Guide
The API supports two paging schemes. Use page/limit for random access; use cursor for efficient forward scrolling through large result sets without offset drift.
Page / Limit
| Param | Description |
|---|---|
| page | 1-based page number. Default: 1 |
| limit | Documents per page. Default: 25. Pass limit=all to disable paging (large collections only). |
| showCount=true | Adds totalCount to the response body at the cost of an extra countDocuments call. |
curl "https://releasetrain.io/api/v/search?q=chrome&limit=50&page=2&showCount=true"
# response: { "data": [...], "totalCount": 314 }
Cursor (forward-only)
| Param | Description |
|---|---|
| cursor | Opaque token from the X-Next-Cursor response header. Omit for the first page. |
# First page: capture the X-Next-Cursor response header
curl -i "https://releasetrain.io/api/v/search?q=chrome&limit=50"
# Subsequent page
curl "https://releasetrain.io/api/v/search?q=chrome&limit=50&cursor=TOKEN"
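The forward-only cursor loop can be written independently of any HTTP library. In the sketch below, `iter_documents` and the `fetch_page` callable are hypothetical: `fetch_page(cursor)` is assumed to perform the GET (passing `cursor` when it is not None) and return the page's documents together with the X-Next-Cursor header value, or None when the header is absent. That an absent header marks the last page is also an assumption.

```python
from typing import Callable, Iterator, Optional


def iter_documents(fetch_page: Callable, max_pages: int = 100) -> Iterator[dict]:
    """Yield documents page by page until no next cursor is returned."""
    cursor: Optional[str] = None
    for _ in range(max_pages):  # hard stop guards against a cursor that never ends
        documents, cursor = fetch_page(cursor)
        yield from documents
        if not cursor:
            break
```

Injecting `fetch_page` keeps the loop testable with canned pages, and the `max_pages` cap is a defensive choice for a public API with no enforced rate limit.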
Support by endpoint
| Endpoint | Page/Limit | Cursor |
|---|---|---|
/api/v/search | β | β |
/api/v/latest10 | β | β |
/api/reddit | β | β |
/api/reddit/by-subreddit | β | β |
/api/reddit/query/cve | β | β |
/api/reddit/query/questions | β | β |
/api/reddit/query/update-related | β | β |
/api/iot | β | β |
ML Fields
Two predicted fields are added to every Reddit document by the bot ML labeller. Human overrides live alongside predictions so consumers can choose which branch to trust.
metadata.predicted.isUpdateRelated
Text classifier trained on labeled Reddit posts. Predicts whether a post discusses a software update: a new release, patch announcement, upgrade discussion, or CVE advisory.
| Value | Meaning |
|---|---|
| true | Post likely discusses a software update or release event |
| false | Post predicted as not update-related |
| field absent | Document has not been scored yet |
Query predicted positives: /api/reddit/query/update-related?isLabeled=false&isUpdateRelated=true. Switch to isLabeled=true to use human labels.
metadata.predicted.positiveScore
Sentiment classifier output. Continuous score 0–1 reflecting community sentiment toward the update being discussed. Scores above 0.5 are considered positive.
| Range | Interpretation |
|---|---|
| 0.0 – 0.5 | Neutral to negative sentiment |
| 0.5 – 0.75 | Mildly positive |
| 0.75 – 1.0 | Strongly positive |
Retrieve all positives (score above 0.5): /api/reddit/query/positive. Use /api/reddit/query/filter?minScore=0.75&maxScore=1.0 for a custom window, here strongly positive posts only.
Human labels vs predictions
Human labels live under metadata.labeled.isUpdateRelated and take precedence over predictions for training and evaluation. The Label view in releasetrain-client is the labelling interface. Pass isLabeled=true on any query route to read only human-verified data.
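A consumer that wants "human label if present, otherwise the prediction" can resolve both branches in one helper. The sketch below is illustrative only: the function names are hypothetical, and the handling of the exact boundary values 0.5 and 0.75 is an assumption, since the ranges listed above overlap at their edges.

```python
from typing import Optional


def effective_is_update_related(doc: dict) -> Optional[bool]:
    """Prefer metadata.labeled over metadata.predicted; None if neither is set."""
    metadata = doc.get("metadata") or {}
    labeled = (metadata.get("labeled") or {}).get("isUpdateRelated")
    if labeled is not None:
        return labeled
    return (metadata.get("predicted") or {}).get("isUpdateRelated")


def sentiment_bucket(score: Optional[float]) -> str:
    """Map positiveScore to the interpretation bands; None means not scored."""
    if score is None:
        return "unscored"
    if score > 0.75:
        return "strongly positive"
    if score > 0.5:
        return "mildly positive"
    return "neutral to negative"
```

Returning None when neither branch is set keeps "unscored" distinct from a false prediction.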
Glossary
Domain-specific terms used throughout the API, codebase and this documentation.
| Term | Definition |
|---|---|
| versionId | Composite business key: YYYYMMDD + productName + versionNumber. Unique per release. Used by /api/v/versionId/:versionId. |
| versionReleaseChannel | Inferred semver category: major (breaking), minor (feature), patch (bugfix), cve (security advisory), other (non-standard). |
| isCve | Set to true when a version document originates from a CVE or NVD advisory rather than a regular release channel. |
| isUpdateRelated | Boolean: does a Reddit post discuss a software update? Stored under metadata.predicted (ML) and metadata.labeled (human). |
| positiveScore | ML-predicted sentiment score 0–1. Reflects community tone toward the update discussed in a Reddit post. |
| rolling 2-year window | Most read routes restrict results to the past two years by default. Override with ?start=YYYYMMDD&end=YYYYMMDD. |
| redditId | Reddit's own short alphanumeric post ID, e.g. 1nq0h33. Used as the deduplication key on ingestion. |
| cursor | Opaque pagination token from the X-Next-Cursor response header. Pass on the next request to continue without offset drift. |
| componentType | Classification tag assigned during enrichment: browser, os, runtime, package, tool. |
| MLTL | Multi-Language Temporal Logic. A spec-tracking module for cross-language version timeline analysis. |
| JUSPN | Japanese specification release tracking module. Mirrors the EDI40-2023 stack matrix for Japanese standard editions. |
| EDI40-2023 | EDI 4.0 2023 specification adoption tracker covering LAMP, MEAN, MERN, MEVN and WAMP stack version matrices. |
Error Format
All error responses return JSON. Use the HTTP status code for programmatic branching; the message field is human-readable context.
| Status | When | Body shape |
|---|---|---|
| 200 OK | Success | { "data": [...] } or collection-specific shape |
| 400 Bad Request | Missing or invalid parameter | { "error": "Bad Request", "message": "..." } |
| 404 Not Found | Document ID does not exist | { "error": "Not Found", "message": "document not found" } |
| 500 Server Error | Unhandled exception or DB error | { "error": "Internal Server Error", "message": "..." } |
// 400 example
{ "error": "Bad Request", "message": "field param is required" }
// 404 example
{ "error": "Not Found", "message": "document not found" }
The /api/health and /api/meta routes use a slightly different shape: { "ok": true } on success and { "ok": false, "error": "..." } on failure.
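Client code can fold both error shapes into one exception type. The sketch below branches on the status code first, as recommended above, and uses the body only for the human-readable text; `ApiError` and `unwrap` are hypothetical names, and treating a 2xx response with "ok": false as a failure is an assumption about the health and meta routes.

```python
class ApiError(Exception):
    """Raised for any non-success response; carries the HTTP status."""

    def __init__(self, status: int, message: str):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message


def unwrap(status: int, body: dict) -> dict:
    """Return the body on success, otherwise raise ApiError."""
    if status in range(200, 300) and body.get("ok", True) is not False:
        return body
    # Standard errors carry "message"; health/meta failures carry only "error".
    message = body.get("message") or body.get("error") or "unknown error"
    raise ApiError(status, str(message))
```

Callers can then branch on `exc.status` (for example, treating 404 as "not found" and retrying on 500) without parsing message text.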
Field Projection
Several endpoints accept a fields query parameter that limits the fields returned per document, which is useful for reducing payload size when building lightweight dashboards or mobile clients.
# Return only title, subreddit and score
curl "https://releasetrain.io/api/reddit?limit=50&fields=title,subreddit,score"
# Return only key version fields
curl "https://releasetrain.io/api/v/search?q=chrome&fields=versionId,versionNumber,versionReleaseDate"
# Nested fields via dot notation
curl "https://releasetrain.io/api/reddit?fields=title,metadata.predicted.isUpdateRelated"
| Endpoint | Notes |
|---|---|
| /api/v/search | All version fields including nested paths |
| /api/reddit | All reddit fields including nested paths |
| /api/reddit/query/cve | All reddit fields |
| /api/reddit/query/questions | All reddit fields |
| /api/reddit/query/update-related | All reddit fields |
| /api/reddit/by-subreddit | All reddit fields |
_id is always returned regardless of the fields value. Nested paths use dot notation, e.g. metadata.predicted.positiveScore.
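The projection rules stated here (comma-separated names, dot notation for nested paths, _id always kept) can be mimicked locally, which is handy for trimming cached documents the same way. This is a sketch of the described behaviour, not the server's code; `project` and `_lookup` are hypothetical names, and silently skipping unknown fields is an assumption.

```python
_MISSING = object()


def _lookup(doc: dict, keys: list):
    """Follow a dotted path; return _MISSING if any segment is absent."""
    current = doc
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


def project(doc: dict, fields: str) -> dict:
    """Keep _id plus the comma-separated fields, rebuilding nested paths."""
    result = {"_id": doc["_id"]} if "_id" in doc else {}
    for path in filter(None, (name.strip() for name in fields.split(","))):
        keys = path.split(".")
        value = _lookup(doc, keys)
        if value is _MISSING:
            continue  # unknown field: skipped
        target = result
        for key in keys[:-1]:
            target = target.setdefault(key, {})
        target[keys[-1]] = value
    return result
```

Applied to the sample reddit document with `fields="title,metadata.predicted.isUpdateRelated"`, this keeps _id, title, and only the one nested prediction field.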
Security and Access
Current access model for the REST API.
| Concern | Status |
|---|---|
| Authentication | None required for GET endpoints. The full read surface is public. |
| Write access | POST and PUT routes are intended for the bot fleet and are not token-guarded in the current version; treat them as internal. |
| CORS | Permissive. All origins are allowed for GET requests. Safe to call from browser JavaScript. |
| Rate limiting | No hard limit is enforced. For long-running integrations, cache responses locally and use cursor or start/end to request only new data. |
| HTTPS | All production traffic is served over HTTPS. HTTP redirects to HTTPS. |
| Data sensitivity | All data is derived from public sources (Reddit, public vendor advisories, NVD). No PII is stored. |
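Because no rate limit is enforced, the caching advice in the table is left to the client. One way to follow it is a small time-based cache in front of whatever function performs the HTTP request. This is only a sketch: the class name `TTLCache`, the 300-second default, and the injected `fetch` callable are all assumptions, not part of the project.

```python
import time
from typing import Any, Callable


class TTLCache:
    """Cache fetch results per URL for `ttl` seconds."""

    def __init__(self, fetch: Callable[[str], Any], ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}

    def get(self, url: str) -> Any:
        now = self._clock()
        hit = self._entries.get(url)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # Still fresh: no network call.
        value = self._fetch(url)
        self._entries[url] = (now, value)
        return value


# Usage with a stand-in fetcher; swap in a real HTTP call in practice.
calls: list[str] = []


def fake_fetch(url: str) -> dict[str, str]:
    calls.append(url)
    return {"url": url}


cache = TTLCache(fake_fetch, ttl=60.0)
cache.get("https://releasetrain.io/api/reddit?limit=50")
cache.get("https://releasetrain.io/api/reddit?limit=50")
print(len(calls))  # 1: the second call was served from the cache
```

Injecting the clock keeps the cache testable without sleeping; a long-running integration would combine this with cursor or start/end parameters so that each refresh requests only new data.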
Live Collection Stats
Fetched live from the API on first load. Rolling 2-year window. Est. size based on average document size.
Versions
| Total | … |
| CVE advisories | … |
| Release notes | … |
| Est. size | … |
Community
| Total | … |
| Reddit posts | … |
| Stack Overflow | … |
| Est. size | … |
CVE Lifecycle Pipeline
CVE Post Timeline
Changelog
Notable changes to releasetrain-client, newest first.
Reddit integration in feed cards, timeline sort fixes & docs rewrite
- feat Reddit and Stack Overflow posts now appear inside each component feed card, interleaved chronologically with version entries using a two-pointer merge.
- feat SO toggle added to the left sidebar: counts Stack Overflow posts matched to the active component set.
- fix created_utc stored as Unix epoch seconds was being parsed as milliseconds (which produced dates in 1970). Now multiplied by 1000 before passing to Date().
- fix versionTimestamp is set to Date.now() on every bot upsert, making all versions appear as "today". Timeline now uses versionReleaseDate (YYYYMMDD) as the sort key instead.
- fix Reddit posts were appended as a block after all versions. Replaced .sort() with a proper two-pointer merge of two pre-sorted arrays to guarantee chronological interleaving.
- fix Left sidebar toggle counts (Reddit, SO Risk, etc.) stayed at 0 after Reddit data loaded. paintFixedCounts now fires inside ensureRedditLoaded() once the index is built.
- fix positiveScore > 0.5 filter was too strict: many posts have no ML score. Feed now fetches from /api/reddit?limit=400 (all recent posts by date) instead of the positive-score endpoint.
- ux Partial subreddit matching: "android" matches "androiddev", "chrome" matches "googlechrome", etc.
- ux Reddit items in the feed card show an orange left border and an r/subreddit chip; Stack Overflow items show amber.
- docs Docs view rewritten: marketing language removed, section headings use ":" instead of "·", AI chat section now describes the RAG pipeline technically (FAISS index, cosine similarity, cross-encoder reranking, LLM prompt injection).
- fix Default feed no longer shows future-dated documents: server request now sends end=today and the client-side filter rejects any version with versionReleaseDate > now.
- fix Mobile scroll broken: html, body { overflow: hidden } was trapping all content on narrow viewports where the feed panel loses its own scroll container. Override to overflow: auto; height: auto at max-width: 640px.
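The timestamp fix and the two-pointer merge from this release can be illustrated together. The client itself is JavaScript; the sketch below uses Python purely to show the technique. The function names are hypothetical, only the field names created_utc and versionReleaseDate come from the entries above, and treating a YYYYMMDD date as UTC midnight is an assumption.

```python
from datetime import datetime, timezone
from typing import Any


def version_ms(version: dict[str, Any]) -> int:
    """Convert a YYYYMMDD versionReleaseDate to epoch milliseconds (UTC midnight assumed)."""
    day = datetime.strptime(str(version["versionReleaseDate"]), "%Y%m%d")
    return int(day.replace(tzinfo=timezone.utc).timestamp() * 1000)


def reddit_ms(post: dict[str, Any]) -> int:
    """created_utc is epoch seconds; scale to milliseconds before comparing."""
    return int(post["created_utc"] * 1000)


def merge_newest_first(versions: list[dict], posts: list[dict]) -> list[dict]:
    """Two-pointer merge of two lists that are each already sorted newest first."""
    merged: list[dict] = []
    i = j = 0
    while i < len(versions) and j < len(posts):
        if version_ms(versions[i]) >= reddit_ms(posts[j]):
            merged.append(versions[i])
            i += 1
        else:
            merged.append(posts[j])
            j += 1
    # At most one of these slices is non-empty.
    merged.extend(versions[i:])
    merged.extend(posts[j:])
    return merged


# Hand-written sample data; the "id" field exists only to make the output readable.
versions = [
    {"id": "v-mar10", "versionReleaseDate": "20240310"},
    {"id": "v-mar01", "versionReleaseDate": "20240301"},
]
posts = [
    {"id": "p-mar08", "created_utc": 1709856000},
    {"id": "p-feb27", "created_utc": 1709000000},
]
print([item["id"] for item in merge_newest_first(versions, posts)])
```

The merge runs in linear time and never reorders items within either input, which is why it guarantees interleaving only when both inputs are already sorted.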
Dashboard graphs, phone layout & changelog
- feat Hub-and-spoke dependency map on dashboard right pane: visualises component-to-versions, CVE versions, Reddit posts, and Reddit+CVE edges on a DPR-aware canvas.
- feat Ecosystem snapshot graph on aggregate left pane: auto-selects the most diversely-connected component from the live dataset.
- feat Release cadence bar chart (releases/week) and Reddit mention trend line chart added to the component pane.
- fix Reddit mention trend was always empty: day-key format mismatch (YYYYMMDD vs YYYY-MM-DD) corrected.
- ux Dashboard right pane now scrolls independently; left pane and page remain locked.
- ux Changelog view added (this page) with semver-labelled entries.
- ux Graph view on phones: Sigma canvas reduced to 60 vh; sidebar controls collapse to a fixed bottom-sheet overlay.
- perf touch-action: none on graph canvas prevents iOS scroll hijack during pinch-zoom.
- perf prefers-reduced-motion media query disables all animations for users who opt out.
- ux Added meta theme-color and color-scheme for better browser chrome theming.
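The day-key mismatch in this release is the kind of bug that yields an empty chart without raising an error: keys in one format never match lookups in the other. Below is a minimal sketch of normalising both formats before counting. The function names are hypothetical, and Python is used only for illustration since the client is JavaScript.

```python
from collections import Counter
from typing import Iterable


def day_key(value: str) -> str:
    """Normalise 'YYYY-MM-DD' or 'YYYYMMDD' to the compact 'YYYYMMDD' form."""
    key = value.replace("-", "")
    if len(key) != 8 or not key.isdigit():
        raise ValueError(f"unrecognised day key: {value!r}")
    return key


def mentions_per_day(days: Iterable[str]) -> Counter:
    """Count mentions per day after normalising every key."""
    return Counter(day_key(d) for d in days)


counts = mentions_per_day(["2024-03-08", "20240308", "2024-03-09"])
print(counts["20240308"])  # 2
```

Rejecting unrecognised keys loudly, rather than passing them through, makes a future format drift visible instead of producing another silently empty trend line.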
Dashboard two-pane layout
- feat Two-pane dashboard: Total Aggregate (left, always global) and Component (right, filtered).
- feat Sample chips in empty right pane for quick component selection.
- feat Page scroll locked while dashboard is active; restored on exit.
- fix Removed duplicate "Add component" input from dashboard sidebar.
- ux Main search always syncs to dashboard on activate.
Dashboard KPIs & timeline
- feat KPI strip: Today count, Last 2 yrs, CVE today, and Signals today, all with delta badges.
- feat Release & signal timeline line chart with configurable window (7 / 30 / 90 days).
- feat Update-type bar chart (major / minor / patch / CVE) with yesterday comparison.
Dashboard view introduced
- feat New Dashboard nav entry with sidebar controls (timeline window, date picker).
- feat Aggregate version counts fetched from /api/v/aggregate endpoints.
- feat Community signals section: Reddit and Stack Overflow post counts.
CVE timeline view
- feat Dedicated CVE view with filterable timeline and CVSS severity chips.
- feat Reddit / SO risk-signal overlay on CVE cards.
- ux Responsive CVE cards collapse gracefully on narrow viewports.
Graph view: Sigma.js
- feat Component dependency graph powered by Sigma.js with cost-based node colouring.
- feat Layer toggles (versions, Reddit/SO posts, CVE posts, CVE versions).
- feat Ring-overlay canvas for component groupings.
- ux Node info panel on click with links to API endpoints.
Credits & Acknowledgements
We gratefully thank the following open-data providers, open-source teams, and services whose APIs, feeds, databases, and software helped build releasetrain.io.
- NIST NVD (National Vulnerability Database): CVE enrichment, CVSS scores, and affected-software data
- MITRE CVE (Common Vulnerabilities and Exposures): Public CVE identifiers and descriptions
- GitHub API: Release notes, repository metadata, and version data
- Reddit API: Community discussion signals for security and update topics
- Stack Exchange API: Stack Overflow post data for community risk signals
- Chart.js: Dashboard chart rendering
- PlantUML: Architecture diagram generation
- Sigma.js: Component dependency graph visualisation
- pako: PlantUML deflate encoding
- Linux Distrowatch: Linux distribution release feed
- Android Wikipedia: Android version history content
- Firefox Calendar: Firefox release schedule content
- Firefox Security Advisories: Firefox CVE content
- iOS Wikipedia: iOS version history content
- Java Release Feed: Java version release dates
- MySQL Release Notes: MySQL release history content
- Python Release Feed: Python version release information
- Eclipse Release Feed: Eclipse IDE release history
- DigitalOcean Status: Infrastructure release feed
- VirtualBox Changelog: VirtualBox release feed
- Microsoft Windows Release Info: Windows version release feed
- Font Awesome: Icon library
- loading.io: Loading animation assets
- regexr.com: Regex development tool
- UptimeRobot: Uptime monitoring