Support readTsFile table function for external TsFiles#17951

Open
shuwenwei wants to merge 31 commits into master from read_tsfile_table_function

@shuwenwei

Member

Description

This PR adds relational readTsFile table function support for querying external TsFiles.

Main changes:

  • Add readTsFile TVF planning/analyze support and schema collection through TsFileSchemaCollector.
  • Validate explicit file paths by opening them as TsFiles; for directory inputs, recursively scan for .tsfile files and skip any file that fails magic-number validation.
  • Add external TsFile scan and aggregation scan plan nodes/operators.
  • Add ExternalTsFileQueryResource to manage external TsFile readers, task partitioning, run-file merge reads, temporary files, and execution-lifetime memory reservation cleanup.
  • Route related user-facing messages through DataNode query i18n messages.
  • Add UT/IT coverage for external TsFile query resources and the readTsFile table function path.
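
The directory-scan behavior described in the list above can be illustrated with a minimal sketch. This is not the code from this PR: `TsFileScanSketch`, `hasTsFileMagic`, and `collect` are hypothetical names, and the sketch assumes that a TsFile starts with the ASCII bytes `TsFile`. It only checks the leading bytes, whereas the PR's explicit-path validation opens the file as a TsFile.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper for illustration; not a class from this PR.
final class TsFileScanSketch {
  // Assumption: a TsFile begins with the ASCII magic string "TsFile".
  private static final byte[] MAGIC = "TsFile".getBytes(StandardCharsets.US_ASCII);

  // Returns true only if the file's leading bytes equal the magic string.
  static boolean hasTsFileMagic(Path file) throws IOException {
    try (InputStream in = Files.newInputStream(file)) {
      byte[] head = in.readNBytes(MAGIC.length); // shorter files yield fewer bytes
      return Arrays.equals(head, MAGIC);
    }
  }

  // Recursively collects *.tsfile files under dir, skipping files without the magic.
  static List<Path> collect(Path dir) throws IOException {
    List<Path> result = new ArrayList<>();
    try (Stream<Path> paths = Files.walk(dir)) {
      for (Path p : (Iterable<Path>) paths::iterator) {
        boolean candidate =
            Files.isRegularFile(p) && p.getFileName().toString().endsWith(".tsfile");
        if (candidate && hasTsFileMagic(p)) {
          result.add(p);
        }
      }
    }
    result.sort(Comparator.naturalOrder()); // deterministic order for callers
    return result;
  }
}
```

Skipping invalid files rather than failing the whole query matches the description for directory inputs; an explicitly named path would presumably surface an error instead.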

Tests

Not run in this turn.

@Caideyipi Caideyipi left a comment

Collaborator

I found two issues that should be fixed before merging:

  1. External TsFile resources can leak if planning fails after the resource is created.

RelationPlanner.java:1647 creates the ExternalTsFileQueryResource, and ExternalTsFileQueryResource.java:106-108 increments external file-reader references. Those resources are only released through QueryExecution.stopAndCleanup() at QueryExecution.java:395. However, if execution.start() throws during logical planning, distribution planning, or scheduling, Coordinator.java:340-343 only releases frontend memory and schema locks. A timeout or an optimizer/distribution exception after planExternalTsFileScan() can therefore leave reader references and temporary directories alive. Please either make QueryExecution.start() transition to the failed state and clean up when planning throws, or have Coordinator.execution() call queryContext.releaseExternalTsFileQueryResources() when execution.start() fails before normal cleanup can run.
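
To make the second option concrete, here is a minimal sketch of releasing resources when start fails. All names (`ExternalResources`, `Execution`, `CoordinatorSketch`, `CountingResources`) are hypothetical stand-ins, not the IoTDB classes, and the sketch assumes release must be idempotent because the normal cleanup path may also call it.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-ins for illustration; these are not the IoTDB types.
interface ExternalResources {
  void release();
}

interface Execution {
  void start();
}

final class CoordinatorSketch {
  // Runs start(); if it throws, releases the external resources and rethrows.
  static void startOrRelease(Execution execution, ExternalResources resources) {
    try {
      execution.start();
    } catch (RuntimeException | Error e) {
      try {
        resources.release();
      } catch (RuntimeException cleanupFailure) {
        e.addSuppressed(cleanupFailure); // keep the original failure as the primary one
      }
      throw e;
    }
  }
}

// Idempotent resource holder: only the first release() has an effect.
final class CountingResources implements ExternalResources {
  private final AtomicBoolean released = new AtomicBoolean();
  int effectiveReleases;

  @Override
  public void release() {
    if (released.compareAndSet(false, true)) {
      effectiveReleases++; // stand-in for closing readers and deleting temp dirs
    }
  }
}
```

With an idempotent release, it does not matter whether the failure path, the normal cleanup path, or both end up calling it.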

  2. read_tsfile silently drops ATTRIBUTE columns.

TsFileSchemaCollector.java:218-229 only collects TAG and FIELD columns, and build() at TsFileSchemaCollector.java:292-307 only emits time, tags, and fields. If an external TsFile table schema contains ColumnCategory.ATTRIBUTE, those columns disappear from the TVF output schema. The execution path also creates AlignedDeviceEntry(deviceID, new Binary[0]) in ExternalTsFileQueryResource.java:125, so attribute values are not available later either. Since the table model supports ATTRIBUTE columns, this should either be implemented end to end or rejected explicitly instead of returning an incomplete schema.
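
As an illustration of the "reject explicitly" option, a sketch along these lines would fail fast instead of dropping the column. `ColumnCategorySketch` and `SchemaCollectorSketch` are hypothetical and only mirror the categories named above; they are not the real TsFileSchemaCollector or ColumnCategory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical local enum mirroring the categories mentioned in this review.
enum ColumnCategorySketch {
  TIME,
  TAG,
  FIELD,
  ATTRIBUTE
}

// Hypothetical collector; not the real TsFileSchemaCollector.
final class SchemaCollectorSketch {
  private final List<String> tags = new ArrayList<>();
  private final List<String> fields = new ArrayList<>();

  // Records a column, rejecting ATTRIBUTE instead of silently ignoring it.
  void add(String name, ColumnCategorySketch category) {
    switch (category) {
      case TIME:
        break; // the time column is always emitted by build()
      case TAG:
        tags.add(name);
        break;
      case FIELD:
        fields.add(name);
        break;
      case ATTRIBUTE:
        throw new UnsupportedOperationException(
            "ATTRIBUTE column '" + name + "' is not supported for external TsFiles");
      default:
        throw new IllegalArgumentException("Unknown column category: " + category);
    }
  }

  // Emits the output column names in the order: time, tags, fields.
  List<String> build() {
    List<String> columns = new ArrayList<>();
    columns.add("time");
    columns.addAll(tags);
    columns.addAll(fields);
    return columns;
  }
}
```

An explicit error tells the user that the output schema would be incomplete, which a silently narrowed schema does not.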

@sonarqubecloud

Quality Gate failed

Failed conditions
E Reliability Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud
