
feat: add support for parsely published date, title, and author #865

Merged: cmkm merged 3 commits into mozilla:main from inhumantsar:add-parsely-metadata on May 17, 2024.

@inhumantsar (Contributor)

Adds Parsely meta tags as a fallback source for article metadata (published date, title, and author). Parsely is a content analytics service aimed at larger publishers running WordPress, e.g. The Verge.

It's worth noting that Parsely tags are unlikely to appear in isolation; in nearly all cases they seem to be populated alongside og tags and JSON-LD data. I'd like to add other, more valuable tag sets later, and this was a nice, simple one to familiarize myself with the code. I'll completely understand if you'd prefer to keep the regex patterns from growing too large by leaving out less common metadata sources like this one.
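To make the fallback idea concrete, here is a minimal TypeScript sketch, assuming the meta tags have already been read from the DOM into name/content pairs. The single combined regex, the MetaTag and ArticleMetadata types, and the helpers collectValues, firstOf, and resolveMetadata are all hypothetical and are not Readability's actual code; parsely-title, parsely-author, and parsely-pub-date are the Parsely tag names the sketch assumes.

```typescript
// Hypothetical shape of a <meta> tag after it has been read from the DOM.
interface MetaTag {
  name?: string;
  property?: string;
  content: string;
}

// Hypothetical metadata result; JSON-LD values are passed in the same shape.
interface ArticleMetadata {
  title?: string;
  byline?: string;
  publishedTime?: string;
}

// Assumed, simplified pattern (not Readability's real regexes): an optional
// source prefix followed by a field name. "parsely" and "pub-date" are the
// additions this sketch illustrates.
const metaKeyPattern =
  /^\s*(?:(article|dc|dcterm|og|twitter|parsely)\s*[-.:]\s*)?(author|creator|pub-date|published_time|description|title|site_name)\s*$/i;

// Collect recognised meta tags into a map keyed by the normalised name
// (lower-cased, whitespace removed, "." separators turned into ":").
// The first occurrence of a key wins.
function collectValues(tags: MetaTag[]): Map<string, string> {
  const values = new Map<string, string>();
  for (const tag of tags) {
    const rawKey = tag.name ?? tag.property;
    if (!rawKey || !metaKeyPattern.test(rawKey)) continue;
    const key = rawKey.toLowerCase().replace(/\s/g, "").replace(/\./g, ":");
    if (!values.has(key)) values.set(key, tag.content);
  }
  return values;
}

// Return the first non-empty (trimmed) value among the candidate keys, in order.
function firstOf(values: Map<string, string>, keys: string[]): string | undefined {
  for (const key of keys) {
    const value = values.get(key)?.trim();
    if (value) return value;
  }
  return undefined;
}

// A defined JSON-LD value wins; otherwise fall back through the meta tags,
// trying the parsely-* keys last so they only fill gaps left by other sources.
function resolveMetadata(jsonLd: ArticleMetadata, values: Map<string, string>): ArticleMetadata {
  return {
    title:
      jsonLd.title ??
      firstOf(values, ["dc:title", "dcterm:title", "og:title", "twitter:title", "title", "parsely-title"]),
    byline:
      jsonLd.byline ??
      firstOf(values, ["dc:creator", "dcterm:creator", "author", "parsely-author"]),
    publishedTime:
      jsonLd.publishedTime ??
      firstOf(values, ["article:published_time", "parsely-pub-date"]),
  };
}
```

Putting the parsely-* keys at the end of each candidate list keeps the change additive: pages that already expose og tags or JSON-LD resolve exactly as before, and the Parsely values only matter when nothing else is present.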

@fchasen commented May 10, 2024

Thanks for these, but as you mentioned, I'm a bit torn on whether this makes sense to add.

On the one hand, these tags seem widely used enough to include, but they do seem to repeat information. Is this capturing different metadata that we wouldn't already get from the JSON-LD, or is it just in case a site includes these tags but not the JSON-LD?

Looking through the JSON-LD description at https://docs.parse.ly/metadata-jsonld/, they list a few types as "post" types that we don't look for, so it might be worth adding those to jsonLdArticleTypes at the very least.
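Extending a type allow-list like jsonLdArticleTypes could look roughly like the TypeScript sketch below. The base list, the helper names, and the extra type ReportageNewsArticle are assumptions for illustration: ReportageNewsArticle is a real schema.org type, but it is not necessarily one of the "post" types Parse.ly documents, and Readability's actual jsonLdArticleTypes may be structured differently.

```typescript
// Hypothetical base allow-list; not Readability's actual jsonLdArticleTypes.
const baseArticleTypes: string[] = ["Article", "NewsArticle", "BlogPosting"];

// Escape regex metacharacters so each type name is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Wrap the alternation in a non-capturing group so ^ and $ anchor every
// alternative, not just the first and last ones.
function buildTypePattern(types: string[]): RegExp {
  return new RegExp(`^(?:${types.map(escapeRegExp).join("|")})$`);
}

// JSON-LD allows "@type" to be a single string or an array of strings;
// a node counts as an article if any of its string types matches.
function isArticleType(type: unknown, pattern: RegExp): boolean {
  const types: unknown[] = Array.isArray(type) ? type : [type];
  return types.some((t) => typeof t === "string" && pattern.test(t));
}

// Illustrative extension; the real additions would come from Parse.ly's docs.
const articleTypePattern = buildTypePattern([...baseArticleTypes, "ReportageNewsArticle"]);
```

A Set lookup would work equally well; a regex is used here only to mirror the regex-based approach discussed earlier in the thread.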

@inhumantsar commented May 10, 2024 (Contributor, Author)

I'd say it's mainly for sites that don't include JSON-LD. I've run into a few other tag sets like these, too; for example, I have an open issue right now for dc:* and prism:*. They seem to be used on academic-adjacent sites such as Nature and Our World in Data, neither of which uses JSON-LD.

Adding these will be a bit repetitive in the codebase, but it would make metadata capture much more consistent.

I'll include the JSON-LD equivalents for anything new as well.

@cmkm left a comment

Took a look at this with Gijs, and while Fred's note about this potentially duplicating data is certainly valid, I think the benefit of capturing more non-JSON-LD metadata outweighs that issue. Thank you for your contribution! :)

@cmkm merged commit f5e8701 into mozilla:main on May 17, 2024
mislav added a commit to mislav/go-readability that referenced this pull request Jun 19, 2025
Ports "feat: add support for parsely published date, title, and author" (mozilla/readability#865)
mislav added a commit to mislav/go-readability that referenced this pull request Jun 23, 2025
Ports "feat: add support for parsely published date, title, and author" (mozilla/readability#865)
