Local-first X/Twitter research archive, search workbench, media viewer, and portable bundle system.
Built for researchers, designers, engineers, and high-signal collectors who need to preserve, search, inspect, and share what flows through the X web app without sending private archives to a cloud service.
Scrollmark is maintained by Kyle McCleary. It began as an MIT-licensed fork of
prinsss/twitter-web-exporter, but has since been rebuilt and overhauled across capture, search, storage, UI, bundle import/export, diagnostics, performance, branding, and release workflows. Original copyright and license notices are preserved.
- What Scrollmark does
- Screenshots
- Feature matrix
- Installation
- Core workflow
- Search language
- Portable bundles
- Exports
- Privacy and security model
- Performance model
- Project map
- Development
- Testing and release gates
- Limitations
- FAQ
- Roadmap
- Attribution and license
Scrollmark runs as a userscript on x.com, twitter.com, and mobile.x.com. It observes the same GraphQL/API responses that the X web app loads while you browse, parses useful structures out of those responses, stores them locally in IndexedDB, and gives you a fast explorer for search, review, export, and sharing.
It is intentionally not a cloud product, not a bot, and not a Twitter/X developer API client. The core idea is simple: if the web app loads useful research material into your browser, Scrollmark can help you preserve and query it locally.
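As a rough illustration of the parsing step described above, the sketch below walks an already-captured JSON payload and collects tweet-like objects. The `__typename` and `rest_id` field names are an assumption about the response shape, and `TweetLike`, `isTweetLike`, and `collectTweets` are hypothetical names; this is not Scrollmark's actual parser.

```typescript
// Hypothetical minimal shape; real responses carry many more fields.
interface TweetLike {
  __typename: "Tweet";
  rest_id: string;
}

function isTweetLike(value: unknown): value is TweetLike {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return candidate["__typename"] === "Tweet" && typeof candidate["rest_id"] === "string";
}

// Depth-first walk over parsed JSON that collects every tweet-like object,
// deduplicated by rest_id. Result order is not guaranteed.
function collectTweets(payload: unknown): TweetLike[] {
  const found = new Map<string, TweetLike>();
  const stack: unknown[] = [payload];
  while (stack.length > 0) {
    const node = stack.pop();
    if (typeof node !== "object" || node === null) continue;
    if (isTweetLike(node) && !found.has(node.rest_id)) {
      found.set(node.rest_id, node);
    }
    // Arrays and plain objects are both traversed through Object.values.
    for (const child of Object.values(node)) stack.push(child);
  }
  return Array.from(found.values());
}
```

Walking the whole payload rather than hard-coding one path is one way to tolerate response shapes that move between routes, at the cost of visiting every node.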
| Area | What it gives you | Why it matters |
|---|---|---|
| Local capture | Bookmarks, tweets, likes, users, media, followers/following surfaces, retweeters, quotes, search timelines, lists, communities, and runtime diagnostics. | Build an archive naturally while browsing instead of manually copy-pasting threads, media, and metadata. |
| Search | Natural text search, exact phrases, phrase slop, boosts, boolean logic, exclusions, author shorthand, folders, dates, numeric filters, URLs/domains, and raw dotted-field constraints. | Treat a large bookmark corpus more like a research index than a flat export file. |
| Exploration | Fullscreen virtualized table view plus a tailored masonry media view with deterministic paging. | Scan thousands of records without loading the entire archive into the DOM. |
| Sharing | Canonical ZIP bundle export/import for portable, isolated research collections. | Send a curated archive to a collaborator without giving them your account, cookies, or live X state. |
| Diagnostics | Raw capture, debug counters, search history export, performance probes, and diagnostic bundles. | Reproduce parser, capture, and browser-runtime issues with concrete evidence. |
| Capability | Status | Notes |
|---|---|---|
| Bookmark capture and folder metadata | 🟢 Stable | Captures bookmark timelines and folder views as the X web app loads them. |
| Tweet/user indexing | 🟢 Stable | Normalized tweet and user records are materialized for table, masonry, and search. |
| Article-post support | 🟢 Stable | Article-style X posts are parsed into searchable/exportable records where the web app exposes them. |
| Local search | 🟢 Stable | Worker-backed search path for large corpora; exact phrases and phrase windows are boosted. |
| Table explorer | 🟢 Stable | Virtualized rendering, paged IndexedDB hydration, selected-row export, fullscreen mode. |
| Masonry media explorer | 🟢 Stable | Designer-oriented media scan with deterministic ordering and folder-aware refresh. |
| Canonical bundle export/import | 🟢 Stable | ZIP bundles import into isolated bundle-library tables and do not mutate live captures. |
| JSON/CSV/HTML data export | 🟢 Stable | Classic exports remain available for selected rows or result sets. |
| Bulk media export | 🟡 Browser-dependent | Large binary exports are constrained by browser memory and download behavior. |
| Diagnostics and raw capture | 🟢 Stable | Designed for parser repair, regression investigation, and performance QC. |
| Full automation | 🔴 Not a goal | Scrollmark observes normal browsing; it is not an autonomous scraping bot. |
- Install a userscript manager.
  - Firefox: Violentmonkey or Tampermonkey.
  - Chrome: Tampermonkey with browser user scripts enabled.
- Install the latest Scrollmark userscript:
  https://github.com/kmccleary3301/scrollmark/releases/latest/download/scrollmark.user.js
- Open or hard-reload https://x.com/home.
- Confirm the floating Scrollmark launcher appears on the page.
- Open the widget and verify the header reads:
  Scrollmark
  By Kyle McCleary
Recent Chrome builds require explicit user-script permission for userscript managers.
| Step | Where | What to verify |
|---|---|---|
| 1 | chrome://extensions | Developer mode can stay off; this is not an unpacked extension install. |
| 2 | Tampermonkey details | Enable Allow user scripts if Chrome exposes that toggle. |
| 3 | Tampermonkey dashboard | Confirm Scrollmark is enabled and matches x.com/*, twitter.com/*, and mobile.x.com/*. |
| 4 | X tab | Hard reload after install or reinstall. |
The production build emits dist/scrollmark.user.js.
npm install
npm run build
For the local e2e install endpoint used during development, serve the parent workspace so the historical compatibility path resolves:
cd /home/skra/projects/twitter_scraping
python3 -m http.server 8123
Then install the generated e2e build from the compatibility URL used by vite.config.ts:
http://localhost:8123/greasemonkey_project/twitter-web-exporter/dist/twitter-web-exporter-e2e.user.js
- Browse X normally.
  - Open home timeline, bookmarks, bookmark folders, user profiles, tweet threads, likes, followers/following pages, retweet panels, quote surfaces, lists, communities, or search timelines.
  - Scroll enough for the X web app to load the data you want preserved.
- Watch the Scrollmark widget counters.
  - Counters represent parsed local captures, not all possible remote records.
  - If a page has not loaded a response in your browser, Scrollmark cannot parse it yet.
- Open a module explorer.
  - Bookmarks is the primary research surface.
  - Tweet Details, User Tweets, Likes, User Details, and relationship modules expose other parsed shapes.
  - Bundle Viewer opens imported portable archives rather than live captures.
- Search and filter.
  - Use plain text for broad recall.
  - Quote phrases for exact constraints.
  - Add operators for folders, authors, dates, media, domains, and numeric thresholds.
- Choose a view.
  - Table view is best for metadata inspection, selection, and exports.
  - Masonry view is best for visual scanning of images/videos from tweets and article previews.
- Export what you need.
  - Export Data for JSON/CSV/HTML or canonical bundle ZIP.
  - Export Media for large binary media downloads.
  - Export Search History when debugging search quality or ranking behavior.
- Share safely.
  - Export a canonical bundle ZIP and send it to a collaborator.
  - The recipient imports it into the Bundle Library, where it remains isolated from their live X account data.
Scrollmark search is intentionally closer to a compact research query language than a plain browser find box. Unstructured text is expanded into content-term matches plus boosted adjacent phrase windows; quoted phrases are enforced as phrases; operators add hard constraints.
| Query | Intent |
|---|---|
| distributed systems design | Broad natural-language search with phrase-window boosting. |
| "full writeup on how" | Require the exact phrase. |
| @sama agent systems | Require author sama, then rank matching text. |
| from:alice ("design system"~2 OR reliability) | Author filter plus boolean phrase/text logic. |
| folder:"Design 02" has:media | Restrict to a bookmark folder and media-bearing records. |
| domain:github.com min_likes:50 | Find GitHub-linked tweets with at least 50 likes. |
| since:2026-03-01 until:2026-03-31 -filter:replies | Date-bounded search excluding replies. |
| legacy.entities.hashtags.text:ai | Raw dotted-path search over nested metadata. |
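To make "content-term matches plus boosted adjacent phrase windows" concrete, here is a toy scorer. It is a sketch under stated assumptions: windows are taken to be adjacent two-term pairs, the boost value of 2 is arbitrary, and `tokenize`, `phraseWindows`, and `scoreDocument` are hypothetical names. Scrollmark's real ranking may differ.

```typescript
// Split text into lowercase word tokens.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9_]+/g) ?? [];
}

// Adjacent two-term windows: ["a", "b", "c"] -> ["a b", "b c"].
function phraseWindows(terms: string[]): string[] {
  const windows: string[] = [];
  for (let i = 0; i + 1 < terms.length; i++) {
    windows.push(`${terms[i]} ${terms[i + 1]}`);
  }
  return windows;
}

// Score = 1 per distinct query term present in the document,
// plus windowBoost per distinct adjacent query window present in the document.
function scoreDocument(query: string, doc: string, windowBoost = 2): number {
  const queryTerms = tokenize(query);
  const docTokens = tokenize(doc);
  const docTerms = new Set(docTokens);
  const docWindows = new Set(phraseWindows(docTokens));
  let score = 0;
  Array.from(new Set(queryTerms)).forEach((term) => {
    if (docTerms.has(term)) score += 1;
  });
  Array.from(new Set(phraseWindows(queryTerms))).forEach((window) => {
    if (docWindows.has(window)) score += windowBoost;
  });
  return score;
}
```

With this scheme, a document that contains the query terms next to each other outranks one that only contains them scattered, which is the effect the first table row describes.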
| Family | Syntax | Examples |
|---|---|---|
| Lexical | free text, quotes, slop, boosts | agent memory, "design system"~2, machine^2 |
| Boolean | AND, OR, NOT, parentheses | (memory OR retrieval) AND evaluation |
| Identity | from:, from_id:, author_id:, @user | @openai, from_id:12345 |
| Reply/entity IDs | to:, to_id:, id:, conversation_id: | to:alice, id:1999 |
| Folder metadata | folder:, bookmark_folder: | folder:"Research Revisit 02" |
| Route/source metadata | lang:, route:, source:, card_name: | lang:en, route:bookmarks |
| URL metadata | domain:, url: | domain:arxiv.org, url:openai.com |
| Presence | is:, has: | is:reply, has:media, has:links |
| Compatibility | filter:, include: | filter:media, include:nativeretweets |
| Numeric thresholds | min_likes:, min_retweets:, min_replies:, min_bookmarks: | min_bookmarks:10 |
| Time boundaries | since:, until:, since_time:, until_time:, since_id:, max_id: | since:2026-03-01 |
| Shorthands | mention:, #tag, $symbol | mention:alice, #ai, $tsla |
| Raw fields | field:value, field:"quoted phrase" | core.user_results.result.legacy.name:"Jane Doe" |
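The operator families above can be approximated with a small tokenizer. The following is a simplified sketch, not Scrollmark's parser: `QueryToken` and `tokenizeQuery` are hypothetical names, parentheses are skipped, boolean keywords and boosts come through as plain terms, and `@user` is mapped to a `from` operator as an assumption based on the author shorthand row.

```typescript
type QueryToken =
  | { kind: "operator"; key: string; value: string; negated: boolean }
  | { kind: "phrase"; text: string; slop: number; negated: boolean }
  | { kind: "term"; text: string; negated: boolean };

function tokenizeQuery(query: string): QueryToken[] {
  // Alternatives, in order: key:"quoted value", key:value, "phrase"~N, bare word.
  // An optional leading "-" marks the token as negated.
  const pattern =
    /(-?)(?:([A-Za-z_][\w.]*):"([^"]*)"|([A-Za-z_][\w.]*):([^\s()]+)|"([^"]*)"(?:~(\d+))?|([^\s()]+))/g;
  const tokens: QueryToken[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(query)) !== null) {
    const negated = m[1] === "-";
    if (m[2] !== undefined) {
      tokens.push({ kind: "operator", key: m[2], value: m[3] ?? "", negated });
    } else if (m[4] !== undefined) {
      tokens.push({ kind: "operator", key: m[4], value: m[5] ?? "", negated });
    } else if (m[6] !== undefined) {
      tokens.push({ kind: "phrase", text: m[6], slop: Number(m[7] ?? 0), negated });
    } else if (m[8] !== undefined) {
      const word = m[8];
      if (word.startsWith("@") && word.length > 1) {
        // Author shorthand: @user is treated like from:user.
        tokens.push({ kind: "operator", key: "from", value: word.slice(1), negated });
      } else {
        tokens.push({ kind: "term", text: word, negated });
      }
    }
  }
  return tokens;
}
```

For example, `tokenizeQuery('folder:"Design 02" has:media')` yields two operator tokens, one with key `folder` and value `Design 02`, and one with key `has` and value `media`.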
- Empty search defaults to newest-first ordering, preferably bookmark/save time where available and post time as fallback.
- Plain multi-term searches rank term hits and boosted adjacent phrase windows.
- Quoted phrases are strict phrase constraints.
- Folder filters and other metadata operators constrain the candidate set before display.
- Stale worker responses are ignored so slower older searches do not overwrite newer results.
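The stale-response rule in the last bullet can be sketched with a monotonically increasing request id. This is a minimal illustration with a hypothetical `LatestOnly` class; how Scrollmark's worker client actually tags requests is not described here.

```typescript
// Tracks the most recent request id so responses to older requests can be dropped.
class LatestOnly<T> {
  private latestId = 0;
  private staleCount = 0;

  // Call when a search is issued; attach the returned id to the request.
  begin(): number {
    this.latestId += 1;
    return this.latestId;
  }

  // Call when a response arrives; returns the result only if it is still current.
  accept(requestId: number, result: T): T | undefined {
    if (requestId !== this.latestId) {
      this.staleCount += 1;
      return undefined;
    }
    return result;
  }

  // Number of responses dropped so far, useful as a diagnostics counter.
  staleResponses(): number {
    return this.staleCount;
  }
}
```

If the user types `ag` and then `agent`, the response for `ag` may arrive last; because its id no longer matches the latest id, `accept` returns `undefined` and the newer results stay on screen.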
Canonical bundles are ZIP files for sharing Scrollmark records without mutating anyone's account.
bundle.zip
├─ manifest.json
├─ records/
│ └─ records.jsonl
└─ media/
└─ media-urls.txt
| File | Purpose |
|---|---|
| manifest.json | Bundle identity, producer metadata, privacy summary, counts, and file manifest. |
| records/records.jsonl | One validated bundle-record envelope per line. |
| media/media-urls.txt | Optional newline-delimited original media URLs for external download tools. |
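Because records/records.jsonl holds one JSON object per line, a consumer can read it line by line and report bad lines without abandoning the whole file. The sketch below only checks that each line is a JSON object; the envelope fields are not specified in this section, so none are assumed, and `parseJsonl` is a hypothetical helper name.

```typescript
interface ParsedLine {
  line: number; // 1-based line number in the file
  record: Record<string, unknown>;
}

interface LineError {
  line: number;
  message: string;
}

function parseJsonl(text: string): { records: ParsedLine[]; errors: LineError[] } {
  const records: ParsedLine[] = [];
  const errors: LineError[] = [];
  text.split(/\r?\n/).forEach((raw, index) => {
    const trimmed = raw.trim();
    if (trimmed === "") return; // tolerate blank lines, including a trailing newline
    try {
      const value: unknown = JSON.parse(trimmed);
      if (typeof value !== "object" || value === null || Array.isArray(value)) {
        errors.push({ line: index + 1, message: "expected a JSON object" });
        return;
      }
      records.push({ line: index + 1, record: value as Record<string, unknown> });
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      errors.push({ line: index + 1, message });
    }
  });
  return { records, errors };
}
```

Collecting per-line errors instead of throwing on the first one makes it possible to surface a complete import report.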
Imported bundles are stored in isolated IndexedDB tables:
imported_bundles
imported_bundle_collections
imported_bundle_items
imported_entity_snapshots
imported_bundle_import_reports
That isolation is deliberate. Importing a bundle does not create X bookmarks, follow users, like posts, write to live tweet tables, or modify the recipient's account state. See docs/bundles/canonical-bundle-v1.md for the v1 format boundary.
| Export path | Best for | Output |
|---|---|---|
| JSON | Full-fidelity local analysis and scripting. | .json |
| CSV | Spreadsheet inspection and lightweight tabular processing. | .csv |
| HTML | Human-readable offline review. | .html |
| Canonical bundle ZIP | Sharing a searchable archive with another Scrollmark user. | .zip |
| Media ZIP | Bulk binary media download. | .zip |
| Media URL list | External download managers or reproducible media pipelines. | .txt |
| Diagnostics bundle | Bug reports, parser repair, performance investigation. | .zip |
| Search history | Search-quality debugging and ranking regression analysis. | .json |
For very large binary media exports, browser memory limits still matter. Canonical data bundles are designed to be much lighter than media ZIPs because they primarily contain structured records and optional media URLs rather than the media bytes themselves.
Scrollmark is designed around local control.
| Principle | Implementation |
|---|---|
| Local-first storage | Captured records are stored in browser IndexedDB. |
| No hosted backend | There is no Scrollmark cloud service. |
| No X developer app | The script observes the web app instead of using official API credentials. |
| Explicit exports | Data leaves your machine only when you export and share files yourself. |
| Isolated imports | Bundle Library imports are separate from live capture tables. |
| Safe text rendering | Imported/captured text is escaped; sanitized http/https entity links are regenerated. |
| ZIP hardening | Bundle import rejects absolute/parent traversal paths and enforces decompression limits. |
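The ZIP hardening row names two checks: rejecting absolute or parent-traversal paths and capping decompressed size. A minimal sketch of both follows; `isSafeEntryPath` and `DecompressionBudget` are hypothetical names, and the exact rules and limits Scrollmark enforces are not stated in this section.

```typescript
// Returns true only for relative entry paths with no ".." segment.
function isSafeEntryPath(entryPath: string): boolean {
  if (entryPath.length === 0 || entryPath.includes("\0")) return false;
  const normalized = entryPath.replace(/\\/g, "/");
  // Reject absolute paths and Windows drive prefixes such as "C:/".
  if (normalized.startsWith("/") || /^[A-Za-z]:/.test(normalized)) return false;
  // Reject any ".." segment, wherever it appears.
  return normalized.split("/").every((segment) => segment !== "..");
}

// Running guard on the total number of decompressed bytes across all entries.
class DecompressionBudget {
  private used = 0;
  private readonly maxBytes: number;

  constructor(maxBytes: number) {
    this.maxBytes = maxBytes;
  }

  // Call as each chunk is inflated; throws once the budget is exceeded.
  consume(bytes: number): void {
    this.used += bytes;
    if (this.used > this.maxBytes) {
      throw new Error(`decompressed size exceeds limit of ${this.maxBytes} bytes`);
    }
  }
}
```

Counting bytes while inflating, rather than trusting the sizes declared in the archive headers, is what guards against entries that expand far beyond their stated size.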
Userscript permissions are intentionally narrow for this architecture:
@match *://twitter.com/*
@match *://x.com/*
@match *://mobile.x.com/*
@grant unsafeWindow
@grant GM_xmlhttpRequest
@connect cdn.syndication.twimg.com
unsafeWindow is used to observe web-app runtime/network behavior from the userscript environment. GM_xmlhttpRequest is used for controlled media/export support where normal page fetch semantics are insufficient.
The final release work focused on making large archives usable without forcing the whole corpus through the DOM or main thread.
| Area | Strategy |
|---|---|
| Initial viewer load | Paged IndexedDB hydration instead of full-corpus load on open. |
| Table rendering | Virtual windowing with measured variable-height rows. |
| Masonry rendering | Deterministic chunked media layout with folder/search-aware refresh. |
| Search | Worker-backed execution for non-trivial corpora; large main-thread fallback is blocked. |
| Bundle export | Worker-backed canonical ZIP generation with progress and cancellation. |
| Search typing | Debounce, stale-response suppression, and corpus reuse. |
| Diagnostics | Runtime counters expose worker availability, timings, stale/cancel counts, and hydration behavior. |
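Virtual windowing with variable-height rows usually reduces to two steps: keep cumulative row offsets, then binary-search them for the rows that intersect the viewport. The sketch below shows that general technique with hypothetical function names and an arbitrary overscan default; it is not taken from Scrollmark's table code.

```typescript
// offsets[i] is the top edge of row i; the final entry is the total height.
function buildOffsets(heights: number[]): number[] {
  const offsets: number[] = [0];
  let total = 0;
  for (const height of heights) {
    total += height;
    offsets.push(total);
  }
  return offsets;
}

// Index of the last row whose top edge is at or above y (binary search).
function rowAt(offsets: number[], y: number): number {
  let lo = 0;
  let hi = offsets.length - 2; // index of the last row
  while (lo < hi) {
    const mid = Math.ceil((lo + hi) / 2);
    if ((offsets[mid] ?? Infinity) <= y) lo = mid;
    else hi = mid - 1;
  }
  return Math.max(lo, 0);
}

// Rows to render as a half-open range [start, end), padded by overscan rows.
function visibleRange(
  offsets: number[],
  scrollTop: number,
  viewportHeight: number,
  overscan = 2,
): { start: number; end: number } {
  const rowCount = offsets.length - 1;
  if (rowCount <= 0) return { start: 0, end: 0 };
  const first = rowAt(offsets, scrollTop);
  const last = rowAt(offsets, scrollTop + viewportHeight - 1);
  return {
    start: Math.max(0, first - overscan),
    end: Math.min(rowCount, last + 1 + overscan),
  };
}
```

Only the rows in the returned range need DOM nodes, so render cost tracks the viewport size rather than the archive size. When a measured row height changes, the offsets after that row have to be rebuilt.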
Current local performance gates are documented in docs/release/final-hill-performance-gates.md. They include search latency, phrase-quality checks, browser-driven table/masonry scroll behavior, bundle ZIP latency, bundle roundtrip validation, and Chrome CDP metric collection smoke tests.
scrollmark/
├─ README.md
├─ package.json
├─ vite.config.ts
├─ docs/
│ ├─ bundles/
│ │ └─ canonical-bundle-v1.md
│ ├─ db-backed-table-baseline.md
│ ├─ db-backed-table-controller-audit.md
│ ├─ db-backed-table-developer-guide.md
│ ├─ release/
│ │ ├─ final-hill-performance-gates.md
│ │ ├─ final-release-checklist.md
│ │ ├─ store-listing-draft.md
│ │ └─ unified-qc-session-runbook.md
│ └─ screenshots/
│ ├─ README.md
│ ├─ hero-bookmarks-masonry-research.png
│ ├─ bookmarks-masonry-search-fullscreen.png
│ ├─ search-table-from-operator-fixed.png
│ ├─ bundle-viewer-fullscreen-clean.png
│ ├─ export-data-bundle-modal.png
│ └─ export-media-modal.png
├─ e2e/
│ ├─ bundles/
│ │ └─ canonical_bundle_roundtrip_harness.ts
│ ├─ fixtures/
│ │ └─ bundles/
│ └─ perf/
│ ├─ browser_viewer_scroll_harness.mjs
│ ├─ bundle_export_latency_benchmark.ts
│ ├─ run_final_hill_perf_suite.sh
│ ├─ search_engine_latency_benchmark.ts
│ └─ search_phrase_quality_harness.ts
└─ src/
├─ components/
│ ├─ bundles/ # Bundle Library UI
│ ├─ modals/ # Export/settings/dialog surfaces
│ └─ table/ # Shared explorer, table, masonry, virtualization
├─ contracts/ # Versioned search contracts and knobs
├─ core/
│ ├─ bundles/ # Canonical ZIP import/export machinery
│ ├─ database/ # IndexedDB/Dexie schema and storage helpers
│ ├─ options/ # Runtime options and persistence
│ ├─ perf/ # Diagnostics/performance counters
│ └─ search/ # Worker client/contracts/search worker
├─ i18n/
│ └─ locales/ # UI translations
├─ modules/
│ ├─ bookmarks/
│ ├─ followers/
│ ├─ following/
│ ├─ interaction-events/
│ ├─ likes/
│ ├─ local-search/
│ ├─ quotes/
│ ├─ raw-capture/
│ ├─ retweeters/
│ ├─ search-timeline/
│ ├─ tweet-detail/
│ ├─ user-detail/
│ ├─ user-media/
│ └─ user-tweets/
├─ types/
└─ utils/ # API parsing, search parsing/ranking, media/export helpers
| Tool | Use |
|---|---|
| Node.js | Runtime for Vite, TypeScript, ESLint, and harnesses. |
| npm | Package install and scripts. |
| Playwright browsers | Browser-driven perf/QC harnesses. |
| A userscript manager | Manual install/QC in Firefox or Chrome. |
Install dependencies:
npm install
Install Playwright browsers when running browser harnesses:
npx playwright install chromium firefox
| Command | Purpose |
|---|---|
| npm run lint | ESLint gate. |
| npm run build | TypeScript check plus production userscript build. |
| npm run build:e2e | Firefox/local e2e userscript build. |
| TWE_BUILD_VARIANT=chrome-e2e npx vite build | Chrome/local e2e userscript build. |
| npm run dev | Vite development server. |
| npm run preview | Preview built output. |
| npm run changelog | Generate changelog with git-cliff. |
| Variant | Output | Install/update behavior |
|---|---|---|
| Production | dist/scrollmark.user.js | Release-facing filename and GitHub release download URL. |
| Firefox e2e | dist/twitter-web-exporter-e2e.user.js | Historical local endpoint retained for compatibility. |
| Chrome e2e | dist/twitter-web-exporter-chrome-e2e.user.js | Historical local endpoint retained for compatibility. |
Some internal filenames and IndexedDB discovery strings intentionally retain twitter-web-exporter compatibility names. The user-facing product identity is Scrollmark.
Run the baseline gates before opening a release PR or publishing a userscript artifact:
npm run lint
npm run build
npm run build:e2e
TWE_BUILD_VARIANT=chrome-e2e npx vite build
./e2e/perf/run_final_hill_perf_suite.sh
| Gate | What it protects |
|---|---|
| Lint/build | TypeScript, import, formatting, and static correctness. |
| Search engine latency | Prevents long-query typing and search execution from regressing into main-thread freezes. |
| Phrase-quality harness | Checks exact phrase ranking, quoted enforcement, slop, and author shorthand behavior. |
| Viewer paging model | Ensures large viewers hydrate pages instead of all records at once. |
| Browser scroll harness | Detects blank windows, duplicate visible IDs, long tasks, and masonry folder-refresh failures. |
| Bundle export latency | Ensures canonical ZIP export remains worker-backed and validates output. |
| Bundle roundtrip | Exports/imports representative bundles and checks imported records. |
| Chrome CDP smoke | Confirms browser metric collection and userscript injection path are viable. |
Manual release QC is intentionally backloaded and documented in docs/release/unified-qc-session-runbook.md. The short version: verify install, capture, search, table scrolling, masonry scrolling, bundle export/import, media export, diagnostics export, Firefox behavior, and Chrome behavior against real X sessions.
Scrollmark is powerful, but the boundary is explicit.
- It can only parse data the X web app actually loads in your browser.
- X can change route names, GraphQL response shapes, or timeline structures; parser updates may be required.
- It does not bypass visibility limits in the web app.
- It does not automate scrolling or account actions by itself.
- Imported bundles are local archives, not instructions to recreate bookmarks/folders in an X account.
- Browser storage quotas and memory limits still apply, especially for very large media exports.
- The project is currently optimized for desktop Firefox and Chrome userscript-manager installs.
Do I need an X developer account or API key?
No. Scrollmark observes responses loaded by the X web app while you browse. It does not require official X API credentials.
Does Scrollmark send my archive to a server?
No. Captures are stored locally in browser IndexedDB. Data leaves your machine only when you explicitly export a file and share it yourself.
Why are some counters lower than what I know exists remotely?
Counters reflect what has been loaded and parsed locally. Visit and scroll the relevant X surface so the web app loads more records.
Can I import a collaborator's bundle into my real X bookmarks?
No, not in v1. Bundle import is intentionally a local viewing/searching feature. It does not mutate your real X account.
What should I include in a bug report?
Include browser, userscript manager, route, safe-mode state, exact action, console errors if available, and a Scrollmark diagnostics bundle. If the issue is search quality, also export bookmark search history.
The near-term release posture is stabilization rather than feature sprawl.
| Priority | Direction |
|---|---|
| P0 | Keep search, viewer paging, bundle export/import, and Chrome/Firefox install paths stable. |
| P1 | Improve docs, screenshots, store listing copy, and onboarding for non-developer users. |
| P2 | Expand parser coverage where X exposes useful new surfaces such as quote/article/media variants. |
| P3 | Add deeper CPU/memory telemetry and long-session soak tooling. |
| P4 | Consider optional collaboration/import workflows only if they preserve the strict local/isolation model. |
Scrollmark is maintained by Kyle McCleary and published at kmccleary3301/scrollmark. The project remains licensed under MIT.







