Scrollmark

Local-first X/Twitter research archive, search workbench, media viewer, and portable bundle system.

Built for researchers, designers, engineers, and high-signal collectors who need to preserve, search, inspect, and share what flows through the X web app without sending private archives to a cloud service.

Issues · 简体中文 · Bundle Format · QC Runbook · Performance Gates · Store Listing Draft · Publishing Runbook · Store Sync Artifact

Scrollmark is maintained by Kyle McCleary. It began as an MIT-licensed fork of prinsss/twitter-web-exporter, but has since been rebuilt and overhauled across capture, search, storage, UI, bundle import/export, diagnostics, performance, branding, and release workflows. Original copyright and license notices are preserved.

What Scrollmark does

Scrollmark runs as a userscript on x.com, twitter.com, and mobile.x.com. It observes the same GraphQL/API responses that the X web app loads while you browse, parses useful structures out of those responses, stores them locally in IndexedDB, and gives you a fast explorer for search, review, export, and sharing.

It is intentionally not a cloud product, not a bot, and not a Twitter/X developer API client. The core idea is simple: if the web app loads useful research material into your browser, Scrollmark can help you preserve and query it locally.
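
As a rough illustration of the parsing step, the sketch below walks an already-decoded response body and collects tweet-like objects. The field names `__typename`, `rest_id`, and `legacy.full_text`, the `CapturedTweet` type, and the `collectTweets` function are assumptions made for illustration; they are not taken from Scrollmark's source, and X can change response shapes at any time.

```typescript
// Hypothetical minimal record shape; real payloads carry many more fields.
interface CapturedTweet {
  id: string;
  text: string;
}

// Recursively walk an arbitrary JSON value and collect tweet-like objects.
// The property names checked here are assumptions, not a documented contract.
function collectTweets(node: unknown, out: CapturedTweet[] = []): CapturedTweet[] {
  if (Array.isArray(node)) {
    for (const item of node) collectTweets(item, out);
    return out;
  }
  if (node === null || typeof node !== "object") return out;

  const obj = node as Record<string, unknown>;
  const id = obj["rest_id"];
  const legacy = obj["legacy"];
  if (
    obj["__typename"] === "Tweet" &&
    typeof id === "string" &&
    legacy !== null &&
    typeof legacy === "object"
  ) {
    const text = (legacy as Record<string, unknown>)["full_text"];
    out.push({ id, text: typeof text === "string" ? text : "" });
  }
  // Keep descending so nested structures are also visited.
  for (const value of Object.values(obj)) collectTweets(value, out);
  return out;
}
```

Because the walk does not depend on a fixed envelope, a sketch like this tolerates wrapper changes, but it still breaks if the tweet object itself is renamed or restructured.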

Area	What it gives you	Why it matters
Local capture	Bookmarks, tweets, likes, users, media, followers/following surfaces, retweeters, quotes, search timelines, lists, communities, and runtime diagnostics.	Build an archive naturally while browsing instead of manually copy-pasting threads, media, and metadata.
Search	Natural text search, exact phrases, phrase slop, boosts, boolean logic, exclusions, author shorthand, folders, dates, numeric filters, URLs/domains, and raw dotted-field constraints.	Treat a large bookmark corpus more like a research index than a flat export file.
Exploration	Fullscreen virtualized table view plus a tailored masonry media view with deterministic paging.	Scan thousands of records without loading the entire archive into the DOM.
Sharing	Canonical ZIP bundle export/import for portable, isolated research collections.	Send a curated archive to a collaborator without giving them your account, cookies, or live X state.
Diagnostics	Raw capture, debug counters, search history export, performance probes, and diagnostic bundles.	Reproduce parser, capture, and browser-runtime issues with concrete evidence.

Screenshots

Visual research archive: Switch the bookmark explorer into fullscreen masonry mode to scan papers, diagrams, design references, screenshots, article cards, and videos as a spatial archive. The masonry view keeps deterministic ordering and folder-aware filtering, so it behaves like an infinite visual feed rather than a static export preview.
Table search and inspection: Use the table view for dense metadata work: exact snippets, author operators, folders, metrics, dates, media cells, selected-row export, and raw IDs stay visible in one place. The search bar supports natural language, exact phrases, boosted phrase windows, boolean logic, exclusions, author shorthand, folders, dates, domains, and dotted metadata fields.
Portable Bundle Viewer: Imported bundles reuse the same explorer spine as live bookmarks instead of falling back to a raw JSON scroller. A collaborator can open a shared archive, search it, filter it, inspect it in masonry mode, and export derived subsets without touching their own live X account data.
Data and bundle export: Export selected rows, all current results, or a canonical bundle ZIP for sharing. Bundle export is separate from raw JSON/CSV/HTML export so portable archives can carry manifest metadata and normalized record files. Search history export is available when ranking, phrase matching, or repro-quality diagnostics matter.
Fast media export: Download image and video attachments in bulk, tune concurrency and pacing, include metadata sidecars, or copy URL manifests for external tooling. Media export is intentionally separate from canonical bundle export: bundle ZIPs are for portable data archives; media export is for large binary downloads.
Live capture widget: The floating widget tracks module counters while you browse, exposes monitors and Bundle Viewer, and keeps diagnostic controls close without forcing a separate backend or dashboard. Scrollmark observes the same browser-loaded GraphQL/API responses the X web app is already using.
Settings and localization: Configure hook mode, safe mode, repair mode, module monitors, database actions, bundle library, language, theme, and diagnostics from the in-page settings panel. The UI has a localization layer rather than hard-coded English-only strings.

Feature matrix

Capability	Status	Notes
Bookmark capture and folder metadata	🟢 Stable	Captures bookmark timelines and folder views as the X web app loads them.
Tweet/user indexing	🟢 Stable	Normalized tweet and user records are materialized for table, masonry, and search.
Article-post support	🟢 Stable	Article-style X posts are parsed into searchable/exportable records where the web app exposes them.
Local search	🟢 Stable	Worker-backed search path for large corpora; exact phrases and phrase windows are boosted.
Table explorer	🟢 Stable	Virtualized rendering, paged IndexedDB hydration, selected-row export, fullscreen mode.
Masonry media explorer	🟢 Stable	Designer-oriented media scan with deterministic ordering and folder-aware refresh.
Canonical bundle export/import	🟢 Stable	ZIP bundles import into isolated bundle-library tables and do not mutate live captures.
JSON/CSV/HTML data export	🟢 Stable	Classic exports remain available for selected rows or result sets.
Bulk media export	🟡 Browser-dependent	Large binary exports are constrained by browser memory and download behavior.
Diagnostics and raw capture	🟢 Stable	Designed for parser repair, regression investigation, and performance QC.
Full automation	🔴 Not a goal	Scrollmark observes normal browsing; it is not an autonomous scraping bot.

Installation

Release install

Install a userscript manager.
- Firefox: Violentmonkey or Tampermonkey.
- Chrome: Tampermonkey with browser user scripts enabled.
Install the latest Scrollmark userscript:

https://github.com/kmccleary3301/scrollmark/releases/latest/download/scrollmark.user.js

Open or hard-reload https://x.com/home.
Confirm the floating Scrollmark launcher appears on the page.
Open the widget and verify the header reads:

Scrollmark
By Kyle McCleary

Chrome note

Recent Chrome builds require explicit user-script permission for userscript managers.

Step	Where	What to verify
1	`chrome://extensions`	Developer mode can stay off; this is not an unpacked extension install.
2	Tampermonkey details	Enable `Allow user scripts` if Chrome exposes that toggle.
3	Tampermonkey dashboard	Confirm `Scrollmark` is enabled and matches `x.com/*`, `twitter.com/*`, and `mobile.x.com/*`.
4	X tab	Hard reload after install or reinstall.

Local development install

The production build emits dist/scrollmark.user.js.

npm install
npm run build

For the local e2e install endpoint used during development, serve the parent workspace so the historical compatibility path resolves:

cd /home/skra/projects/twitter_scraping
python3 -m http.server 8123

Then install the generated e2e build from the compatibility URL used by vite.config.ts:

http://localhost:8123/greasemonkey_project/twitter-web-exporter/dist/twitter-web-exporter-e2e.user.js

Core workflow

Browse X normally.
- Open home timeline, bookmarks, bookmark folders, user profiles, tweet threads, likes, followers/following pages, retweet panels, quote surfaces, lists, communities, or search timelines.
- Scroll enough for the X web app to load the data you want preserved.
Watch the Scrollmark widget counters.
- Counters represent parsed local captures, not all possible remote records.
- If a page has not loaded a response in your browser, Scrollmark cannot parse it yet.
Open a module explorer.
- Bookmarks is the primary research surface.
- Tweet Details, User Tweets, Likes, User Details, and relationship modules expose other parsed shapes.
- Bundle Viewer opens imported portable archives rather than live captures.
Search and filter.
- Use plain text for broad recall.
- Quote phrases for exact constraints.
- Add operators for folders, authors, dates, media, domains, and numeric thresholds.
Choose a view.
- Table view is best for metadata inspection, selection, and exports.
- Masonry view is best for visual scanning of images/videos from tweets and article previews.
Export what you need.
- Export Data for JSON/CSV/HTML or canonical bundle ZIP.
- Export Media for large binary media downloads.
- Export Search History when debugging search quality or ranking behavior.
Share safely.
- Export a canonical bundle ZIP and send it to a collaborator.
- The recipient imports it into the Bundle Library, where it remains isolated from their live X account data.

Search language

Scrollmark search is intentionally closer to a compact research query language than a plain browser find box. Unstructured text is expanded into content-term matches plus boosted adjacent phrase windows; quoted phrases are enforced as phrases; operators add hard constraints.
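
To make the idea of "term matches plus boosted adjacent phrase windows" concrete, here is a minimal sketch. The window size of two terms, the boost weight, substring matching, and the names `phraseWindows` and `scoreText` are all illustrative assumptions rather than Scrollmark's actual ranking code.

```typescript
// Split free text into lowercase terms and build adjacent term windows.
function phraseWindows(query: string, size = 2): { terms: string[]; windows: string[] } {
  const terms = query.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
  const windows: string[] = [];
  for (let i = 0; i + size <= terms.length; i++) {
    windows.push(terms.slice(i, i + size).join(" "));
  }
  return { terms, windows };
}

// One point per term found, plus `windowBoost` per adjacent window found.
// Matching is plain substring containment, which is cruder than token matching.
function scoreText(text: string, query: string, windowBoost = 2): number {
  const haystack = text.toLowerCase();
  const { terms, windows } = phraseWindows(query);
  let total = 0;
  for (const term of terms) if (haystack.includes(term)) total += 1;
  for (const w of windows) if (haystack.includes(w)) total += windowBoost;
  return total;
}
```

Under these assumptions, a record that contains the query words next to each other outranks one that contains the same words scattered across the text.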

Quick examples

Query	Intent
`distributed systems design`	Broad natural-language search with phrase-window boosting.
`"full writeup on how"`	Require the exact phrase.
`@sama agent systems`	Require author `sama`, then rank matching text.
`from:alice ("design system"~2 OR reliability)`	Author filter plus boolean phrase/text logic.
`folder:"Design 02" has:media`	Restrict to a bookmark folder and media-bearing records.
`domain:github.com min_likes:50`	Find GitHub-linked tweets with at least 50 likes.
`since:2026-03-01 until:2026-03-31 -filter:replies`	Date-bounded search excluding replies.
`legacy.entities.hashtags.text:ai`	Raw dotted-path search over nested metadata.
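
The queries above mix free terms, quoted phrases, and `name:value` operators. The following sketch shows one way such a query could be split into tokens. The `Token` type and `tokenize` function are hypothetical, and the sketch deliberately ignores parentheses, `AND`/`OR`/`NOT`, slop (`~2`), and boosts (`^2`), which a full parser would need to handle.

```typescript
type Token =
  | { kind: "operator"; name: string; value: string; negated: boolean }
  | { kind: "phrase"; value: string; negated: boolean }
  | { kind: "term"; value: string; negated: boolean };

function tokenize(query: string): Token[] {
  const tokens: Token[] = [];
  // Alternatives, tried in order: operator with quoted value, operator with
  // bare value, quoted phrase, bare term. A leading "-" marks an exclusion.
  const pattern =
    /(-?)(?:([A-Za-z_][\w.]*):"([^"]*)"|([A-Za-z_][\w.]*):(\S+)|"([^"]*)"|(\S+))/g;
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(query)) !== null) {
    const negated = m[1] === "-";
    if (m[2] !== undefined) {
      tokens.push({ kind: "operator", name: m[2], value: m[3] ?? "", negated });
    } else if (m[4] !== undefined) {
      tokens.push({ kind: "operator", name: m[4], value: m[5] ?? "", negated });
    } else if (m[6] !== undefined) {
      tokens.push({ kind: "phrase", value: m[6], negated });
    } else if (m[7] !== undefined) {
      tokens.push({ kind: "term", value: m[7], negated });
    }
  }
  return tokens;
}
```

For example, `tokenize('folder:"Design 02" has:media')` yields two operator tokens, one for `folder` with the value `Design 02` and one for `has` with the value `media`.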

Operator families

Family	Syntax	Examples
Lexical	free text, quotes, slop, boosts	`agent memory`, `"design system"~2`, `machine^2`
Boolean	`AND`, `OR`, `NOT`, parentheses	`(memory OR retrieval) AND evaluation`
Identity	`from:`, `from_id:`, `author_id:`, `@user`	`@openai`, `from_id:12345`
Reply/entity IDs	`to:`, `to_id:`, `id:`, `conversation_id:`	`to:alice`, `id:1999`
Folder metadata	`folder:`, `bookmark_folder:`	`folder:"Research Revisit 02"`
Route/source metadata	`lang:`, `route:`, `source:`, `card_name:`	`lang:en`, `route:bookmarks`
URL metadata	`domain:`, `url:`	`domain:arxiv.org`, `url:openai.com`
Presence	`is:`, `has:`	`is:reply`, `has:media`, `has:links`
Compatibility	`filter:`, `include:`	`filter:media`, `include:nativeretweets`
Numeric thresholds	`min_likes:`, `min_retweets:`, `min_replies:`, `min_bookmarks:`	`min_bookmarks:10`
Time boundaries	`since:`, `until:`, `since_time:`, `until_time:`, `since_id:`, `max_id:`	`since:2026-03-01`
Shorthands	`mention:`, `#tag`, `$symbol`	`mention:alice`, `#ai`, `$tsla`
Raw fields	`field:value`, `field:"quoted phrase"`	`core.user_results.result.legacy.name:"Jane Doe"`
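
Raw field constraints address nested metadata by dotted path. A minimal sketch of how such a lookup might work is shown below; `resolvePath` and `fieldMatches` are hypothetical helpers, and the choices to fan out across arrays and to compare by case-insensitive substring are assumptions, not confirmed Scrollmark behavior.

```typescript
// Resolve a dotted path against nested JSON, fanning out across arrays so
// that "legacy.entities.hashtags.text" reaches every hashtag's text.
function resolvePath(root: unknown, path: string): unknown[] {
  let current: unknown[] = [root];
  for (const key of path.split(".")) {
    const next: unknown[] = [];
    for (const node of current) {
      const items: unknown[] = Array.isArray(node) ? node : [node];
      for (const item of items) {
        if (item !== null && typeof item === "object" && key in item) {
          next.push((item as Record<string, unknown>)[key]);
        }
      }
    }
    current = next;
  }
  // Flatten a trailing array so leaf lists are compared element by element.
  const leaves: unknown[] = [];
  for (const value of current) {
    if (Array.isArray(value)) leaves.push(...value);
    else leaves.push(value);
  }
  return leaves;
}

// True when any string or number leaf at `path` contains `expected`.
function fieldMatches(record: unknown, path: string, expected: string): boolean {
  const needle = expected.toLowerCase();
  return resolvePath(record, path).some(
    (leaf) =>
      (typeof leaf === "string" || typeof leaf === "number") &&
      String(leaf).toLowerCase().includes(needle),
  );
}
```

A missing path simply produces no leaves, so a constraint on an absent field evaluates to false instead of throwing.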

Ranking notes

An empty search defaults to newest-first ordering, sorted by bookmark/save time where available and falling back to post time otherwise.
Plain multi-term searches rank term hits and boosted adjacent phrase windows.
Quoted phrases are strict phrase constraints.
Folder filters and other metadata operators constrain the candidate set before display.
Stale worker responses are ignored so slower older searches do not overwrite newer results.
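
Stale-response suppression can be sketched with a monotonically increasing request id. The `LatestOnly` class below is a hypothetical illustration of the pattern, not Scrollmark's worker client.

```typescript
// Tracks the most recently issued request so older replies can be dropped.
class LatestOnly<T> {
  private latestId = 0;

  // Call when dispatching a search; returns the id to attach to the request.
  begin(): number {
    this.latestId += 1;
    return this.latestId;
  }

  // Call when a reply arrives; returns the result only if it is still current.
  accept(requestId: number, result: T): T | undefined {
    return requestId === this.latestId ? result : undefined;
  }
}
```

A caller would tag each outgoing search with `begin()` and pass every reply through `accept()`, rendering only when the return value is defined.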

Portable bundles

Canonical bundles are ZIP files for sharing Scrollmark records without mutating anyone's account.

bundle.zip
├─ manifest.json
├─ records/
│  └─ records.jsonl
└─ media/
   └─ media-urls.txt

File	Purpose
`manifest.json`	Bundle identity, producer metadata, privacy summary, counts, and file manifest.
`records/records.jsonl`	One validated bundle-record envelope per line.
`media/media-urls.txt`	Optional newline-delimited original media URLs for external download tools.
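
Since `records/records.jsonl` holds one JSON value per line, a reader can process it line by line and report bad lines without aborting the whole import. The sketch below shows that approach; `parseJsonl` and `ParsedJsonl` are hypothetical names, and validation of the record envelope itself is omitted because its schema is defined in the bundle format document, not here.

```typescript
interface ParsedJsonl {
  records: unknown[];
  errors: { line: number; message: string }[];
}

// Parse newline-delimited JSON, collecting per-line errors instead of throwing.
function parseJsonl(text: string): ParsedJsonl {
  const parsed: ParsedJsonl = { records: [], errors: [] };
  const lines = text.split(/\r?\n/);
  lines.forEach((raw, index) => {
    const line = raw.trim();
    if (line === "") return; // tolerate blank lines and a trailing newline
    try {
      parsed.records.push(JSON.parse(line));
    } catch (err) {
      parsed.errors.push({
        line: index + 1, // 1-based line number for reporting
        message: err instanceof Error ? err.message : String(err),
      });
    }
  });
  return parsed;
}
```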

Imported bundles are stored in isolated IndexedDB tables:

imported_bundles
imported_bundle_collections
imported_bundle_items
imported_entity_snapshots
imported_bundle_import_reports

That isolation is deliberate. Importing a bundle does not create X bookmarks, follow users, like posts, write to live tweet tables, or modify the recipient's account state. See docs/bundles/canonical-bundle-v1.md for the v1 format boundary.

Exports

Export path	Best for	Output
JSON	Full-fidelity local analysis and scripting.	`.json`
CSV	Spreadsheet inspection and lightweight tabular processing.	`.csv`
HTML	Human-readable offline review.	`.html`
Canonical bundle ZIP	Sharing a searchable archive with another Scrollmark user.	`.zip`
Media ZIP	Bulk binary media download.	`.zip`
Media URL list	External download managers or reproducible media pipelines.	`.txt`
Diagnostics bundle	Bug reports, parser repair, performance investigation.	`.zip`
Search history	Search-quality debugging and ranking regression analysis.	`.json`

For very large binary media exports, browser memory limits still matter. Canonical data bundles are designed to be much lighter than media ZIPs because they primarily contain structured records and optional media URLs rather than the media bytes themselves.

Privacy and security model

Scrollmark is designed around local control.

Principle	Implementation
Local-first storage	Captured records are stored in browser IndexedDB.
No hosted backend	There is no Scrollmark cloud service.
No X developer app	The script observes the web app instead of using official API credentials.
Explicit exports	Data leaves your machine only when you export and share files yourself.
Isolated imports	Bundle Library imports are separate from live capture tables.
Safe text rendering	Imported/captured text is escaped; sanitized `http`/`https` entity links are regenerated.
ZIP hardening	Bundle import rejects absolute/parent traversal paths and enforces decompression limits.
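
The ZIP hardening row describes rejecting absolute and parent-traversal entry paths. A minimal sketch of such a check follows; `isSafeEntryPath` is a hypothetical helper, and it covers only path validation, not the decompression limits.

```typescript
// Returns true only for relative entry paths that stay inside the bundle root.
function isSafeEntryPath(entryPath: string): boolean {
  if (entryPath.length === 0 || entryPath.includes("\0")) return false;
  // Treat backslashes as separators so Windows-style paths are checked too.
  const normalized = entryPath.replace(/\\/g, "/");
  // Reject POSIX-absolute paths and Windows drive prefixes such as "C:/".
  if (normalized.startsWith("/") || /^[A-Za-z]:/.test(normalized)) return false;
  // Reject any ".." segment, wherever it appears in the path.
  return !normalized.split("/").some((segment) => segment === "..");
}
```

An importer using a check like this would skip or fail on any entry for which it returns false before reading the entry's contents.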

Userscript permissions are intentionally narrow for this architecture:

@match    *://twitter.com/*
@match    *://x.com/*
@match    *://mobile.x.com/*
@grant    unsafeWindow
@grant    GM_xmlhttpRequest
@connect  cdn.syndication.twimg.com

unsafeWindow is used to observe web-app runtime/network behavior from the userscript environment. GM_xmlhttpRequest is used for controlled media/export support where normal page fetch semantics are insufficient.

Performance model

The final release work focused on making large archives usable without forcing the whole corpus through the DOM or main thread.

Area	Strategy
Initial viewer load	Paged IndexedDB hydration instead of full-corpus load on open.
Table rendering	Virtual windowing with measured variable-height rows.
Masonry rendering	Deterministic chunked media layout with folder/search-aware refresh.
Search	Worker-backed execution for non-trivial corpora; the main-thread fallback is disabled for large corpora.
Bundle export	Worker-backed canonical ZIP generation with progress and cancellation.
Search typing	Debounce, stale-response suppression, and corpus reuse.
Diagnostics	Runtime counters expose worker availability, timings, stale/cancel counts, and hydration behavior.
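
As an illustration of virtual windowing over variable-height rows, the sketch below computes which rows intersect the viewport from a list of measured heights. `visibleRange` and its parameters are hypothetical, and a real implementation would cache the offsets instead of rebuilding them on every call.

```typescript
// Compute the inclusive index range of rows intersecting the viewport,
// padded by `overscan` rows on each side. Returns end = -1 for an empty list.
function visibleRange(
  heights: number[],
  scrollTop: number,
  viewportHeight: number,
  overscan = 2,
): { start: number; end: number } {
  if (heights.length === 0) return { start: 0, end: -1 };

  // tops[i] is the y offset of the top edge of row i.
  const tops: number[] = [];
  let total = 0;
  for (const h of heights) {
    tops.push(total);
    total += h;
  }

  // Binary search for the last row whose top edge is at or before y.
  const lastAtOrBefore = (y: number): number => {
    let lo = 0;
    let hi = tops.length - 1;
    while (lo < hi) {
      const mid = Math.ceil((lo + hi) / 2);
      if ((tops[mid] ?? Infinity) <= y) lo = mid;
      else hi = mid - 1;
    }
    return lo;
  };

  const first = lastAtOrBefore(Math.max(0, scrollTop));
  const last = lastAtOrBefore(Math.max(0, scrollTop + viewportHeight - 1));
  return {
    start: Math.max(0, first - overscan),
    end: Math.min(heights.length - 1, last + overscan),
  };
}
```

Only rows inside the returned range need DOM nodes; the rest can be represented by spacer elements sized from the same offsets.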

Current local performance gates are documented in docs/release/final-hill-performance-gates.md. They include search latency, phrase-quality checks, browser-driven table/masonry scroll behavior, bundle ZIP latency, bundle roundtrip validation, and Chrome CDP metric collection smoke tests.

Project map

scrollmark/
├─ README.md
├─ package.json
├─ vite.config.ts
├─ docs/
│  ├─ bundles/
│  │  └─ canonical-bundle-v1.md
│  ├─ db-backed-table-baseline.md
│  ├─ db-backed-table-controller-audit.md
│  ├─ db-backed-table-developer-guide.md
│  ├─ release/
│  │  ├─ final-hill-performance-gates.md
│  │  ├─ final-release-checklist.md
│  │  ├─ store-listing-draft.md
│  │  └─ unified-qc-session-runbook.md
│  └─ screenshots/
│     ├─ README.md
│     ├─ hero-bookmarks-masonry-research.png
│     ├─ bookmarks-masonry-search-fullscreen.png
│     ├─ search-table-from-operator-fixed.png
│     ├─ bundle-viewer-fullscreen-clean.png
│     ├─ export-data-bundle-modal.png
│     └─ export-media-modal.png
├─ e2e/
│  ├─ bundles/
│  │  └─ canonical_bundle_roundtrip_harness.ts
│  ├─ fixtures/
│  │  └─ bundles/
│  └─ perf/
│     ├─ browser_viewer_scroll_harness.mjs
│     ├─ bundle_export_latency_benchmark.ts
│     ├─ run_final_hill_perf_suite.sh
│     ├─ search_engine_latency_benchmark.ts
│     └─ search_phrase_quality_harness.ts
└─ src/
   ├─ components/
   │  ├─ bundles/          # Bundle Library UI
   │  ├─ modals/           # Export/settings/dialog surfaces
   │  └─ table/            # Shared explorer, table, masonry, virtualization
   ├─ contracts/           # Versioned search contracts and knobs
   ├─ core/
   │  ├─ bundles/          # Canonical ZIP import/export machinery
   │  ├─ database/         # IndexedDB/Dexie schema and storage helpers
   │  ├─ options/          # Runtime options and persistence
   │  ├─ perf/             # Diagnostics/performance counters
   │  └─ search/           # Worker client/contracts/search worker
   ├─ i18n/
   │  └─ locales/          # UI translations
   ├─ modules/
   │  ├─ bookmarks/
   │  ├─ followers/
   │  ├─ following/
   │  ├─ interaction-events/
   │  ├─ likes/
   │  ├─ local-search/
   │  ├─ quotes/
   │  ├─ raw-capture/
   │  ├─ retweeters/
   │  ├─ search-timeline/
   │  ├─ tweet-detail/
   │  ├─ user-detail/
   │  ├─ user-media/
   │  └─ user-tweets/
   ├─ types/
   └─ utils/               # API parsing, search parsing/ranking, media/export helpers

Development

Prerequisites

Tool	Use
Node.js	Runtime for Vite, TypeScript, ESLint, and harnesses.
npm	Package install and scripts.
Playwright browsers	Browser-driven perf/QC harnesses.
A userscript manager	Manual install/QC in Firefox or Chrome.

Install dependencies:

npm install

Install Playwright browsers when running browser harnesses:

npx playwright install chromium firefox

Common commands

Command	Purpose
`npm run lint`	ESLint gate.
`npm run build`	TypeScript check plus production userscript build.
`npm run build:e2e`	Firefox/local e2e userscript build.
`TWE_BUILD_VARIANT=chrome-e2e npx vite build`	Chrome/local e2e userscript build.
`npm run dev`	Vite development server.
`npm run preview`	Preview built output.
`npm run changelog`	Generate changelog with `git-cliff`.

Build outputs

Variant	Output	Install/update behavior
Production	`dist/scrollmark.user.js`	Release-facing filename and GitHub release download URL.
Firefox e2e	`dist/twitter-web-exporter-e2e.user.js`	Historical local endpoint retained for compatibility.
Chrome e2e	`dist/twitter-web-exporter-chrome-e2e.user.js`	Historical local endpoint retained for compatibility.

Some internal filenames and IndexedDB discovery strings intentionally retain twitter-web-exporter compatibility names. The user-facing product identity is Scrollmark.

Testing and release gates

Run the baseline gates before opening a release PR or publishing a userscript artifact:

npm run lint
npm run build
npm run build:e2e
TWE_BUILD_VARIANT=chrome-e2e npx vite build
./e2e/perf/run_final_hill_perf_suite.sh

Gate	What it protects
Lint/build	TypeScript, import, formatting, and static correctness.
Search engine latency	Prevents long-query typing and search execution from regressing into main-thread freezes.
Phrase-quality harness	Checks exact phrase ranking, quoted enforcement, slop, and author shorthand behavior.
Viewer paging model	Ensures large viewers hydrate pages instead of all records at once.
Browser scroll harness	Detects blank windows, duplicate visible IDs, long tasks, and masonry folder-refresh failures.
Bundle export latency	Ensures canonical ZIP export remains worker-backed and validates output.
Bundle roundtrip	Exports/imports representative bundles and checks imported records.
Chrome CDP smoke	Confirms browser metric collection and userscript injection path are viable.

Manual release QC is intentionally backloaded and documented in docs/release/unified-qc-session-runbook.md. The short version: verify install, capture, search, table scrolling, masonry scrolling, bundle export/import, media export, diagnostics export, Firefox behavior, and Chrome behavior against real X sessions.

Limitations

Scrollmark's boundaries are explicit:

It can only parse data the X web app actually loads in your browser.
X can change route names, GraphQL response shapes, or timeline structures; parser updates may be required.
It does not bypass visibility limits in the web app.
It does not automate scrolling or account actions by itself.
Imported bundles are local archives, not instructions to recreate bookmarks/folders in an X account.
Browser storage quotas and memory limits still apply, especially for very large media exports.
The project is currently optimized for desktop Firefox and Chrome userscript-manager installs.

FAQ

Do I need an X developer account or API key?

No. Scrollmark observes responses loaded by the X web app while you browse. It does not require official X API credentials.

Does Scrollmark send my archive to a server?

No. Captures are stored locally in browser IndexedDB. Data leaves your machine only when you explicitly export a file and share it yourself.

Why are some counters lower than what I know exists remotely?

Counters reflect what has been loaded and parsed locally. Visit and scroll the relevant X surface so the web app loads more records.

Can I import a collaborator's bundle into my real X bookmarks?

No, not in v1. Bundle import is intentionally a local viewing/searching feature. It does not mutate your real X account.

What should I include in a bug report?

Include browser, userscript manager, route, safe-mode state, exact action, console errors if available, and a Scrollmark diagnostics bundle. If the issue is search quality, also export bookmark search history.

Roadmap

The near-term release posture is stabilization rather than feature sprawl.

Priority	Direction
P0	Keep search, viewer paging, bundle export/import, and Chrome/Firefox install paths stable.
P1	Improve docs, screenshots, store listing copy, and onboarding for non-developer users.
P2	Expand parser coverage where X exposes useful new surfaces such as quote/article/media variants.
P3	Add deeper CPU/memory telemetry and long-session soak tooling.
P4	Consider optional collaboration/import workflows only if they preserve the strict local/isolation model.

Attribution and license

Scrollmark is maintained by Kyle McCleary and published at kmccleary3301/scrollmark. The project remains licensed under MIT.
