Class DocumentReader

Namespace OfficeIMO.Reader
Assembly OfficeIMO.Reader
Modifiers static

Unified, read-only document extraction facade intended for AI ingestion.

Inheritance

  • Object
  • DocumentReader

Remarks

This facade is intentionally dependency-free and deterministic. It normalizes extraction into ReaderChunk instances with stable IDs and location metadata. The API is thread-safe as it does not use shared mutable state.

Methods

BootstrapHostFromAssemblies 2 overloads
public static ReaderHostBootstrapResult BootstrapHostFromAssemblies(IEnumerable<Assembly> assemblies, ReaderHostBootstrapOptions options = null)
Returns: ReaderHostBootstrapResult

Host bootstrap helper that registers modular handlers from the provided assemblies and returns both the typed and the JSON capability manifest in one payload.

Parameters

assemblies System.Collections.Generic.IEnumerable{System.Reflection.Assembly} required, position: 0
Assemblies to scan for registrar methods.
options OfficeIMO.Reader.ReaderHostBootstrapOptions = null optional, position: 1
Bootstrap options. When null, defaults are used.
public static ReaderHostBootstrapResult BootstrapHostFromAssemblies(IEnumerable<Assembly> assemblies, ReaderHostBootstrapProfile profile, Boolean indentedManifestJson = false)
Returns: ReaderHostBootstrapResult

Host bootstrap helper that applies a preset profile, registers modular handlers from the provided assemblies, and returns both the typed and the JSON capability manifest in one payload.

Parameters

assemblies System.Collections.Generic.IEnumerable{System.Reflection.Assembly} required, position: 0
Assemblies to scan for registrar methods.
profile OfficeIMO.Reader.ReaderHostBootstrapProfile required, position: 1
Bootstrap profile preset.
indentedManifestJson System.Boolean = false optional, position: 2
When true, indents the returned manifest JSON payload.
BootstrapHostFromLoadedAssemblies 2 overloads
public static ReaderHostBootstrapResult BootstrapHostFromLoadedAssemblies(String assemblyNamePrefix = "OfficeIMO.Reader.", ReaderHostBootstrapOptions options = null)
Returns: ReaderHostBootstrapResult

Host bootstrap helper that discovers and registers modular handlers from currently loaded assemblies whose simple name starts with assemblyNamePrefix, then returns both the typed and the JSON capability manifest in one payload.

Parameters

assemblyNamePrefix System.String = "OfficeIMO.Reader." optional, position: 0
Simple assembly-name prefix filter. Default: "OfficeIMO.Reader.".
options OfficeIMO.Reader.ReaderHostBootstrapOptions = null optional, position: 1
Bootstrap options. When null, defaults are used.
public static ReaderHostBootstrapResult BootstrapHostFromLoadedAssemblies(ReaderHostBootstrapProfile profile, String assemblyNamePrefix = "OfficeIMO.Reader.", Boolean indentedManifestJson = false)
Returns: ReaderHostBootstrapResult

Host bootstrap helper that applies a preset profile, discovers and registers modular handlers from loaded assemblies, and returns both the typed and the JSON capability manifest in one payload.

Parameters

profile OfficeIMO.Reader.ReaderHostBootstrapProfile required, position: 0
Bootstrap profile preset.
assemblyNamePrefix System.String = "OfficeIMO.Reader." optional, position: 1
Simple assembly-name prefix filter. Default: "OfficeIMO.Reader.".
indentedManifestJson System.Boolean = false optional, position: 2
When true, indents the returned manifest JSON payload.
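A minimal host start-up sketch. .NET loads assemblies on first use, so a referenced handler package whose types have not been touched yet may not be among the "currently loaded" assemblies that BootstrapHostFromLoadedAssemblies scans; passing the assemblies explicitly to BootstrapHostFromAssemblies avoids that. In the sketch, `typeof(DocumentReader).Assembly` is only a stand-in for your real handler assemblies, and because the members of ReaderHostBootstrapResult are not listed on this page, the result is not inspected.

```csharp
using System;
using System.Reflection;
using OfficeIMO.Reader;

// Stand-in list: in a real host, name the assemblies that contain your registrar methods.
Assembly[] handlerAssemblies = { typeof(DocumentReader).Assembly };

// Registers handlers found in those assemblies and returns the typed and JSON manifests together.
// options is omitted, so defaults are used.
ReaderHostBootstrapResult bootstrap = DocumentReader.BootstrapHostFromAssemblies(handlerAssemblies);

Console.WriteLine(bootstrap is null ? "bootstrap returned no result" : "host bootstrapped");
```

Calling a bootstrap helper once at start-up, before any Read calls, keeps handler registration in a single place.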
public static ReaderInputKind DetectKind(String path)
Returns: ReaderInputKind

Detects the input kind based on file extension.

Parameters

path System.String required, position: 0
Source file path.
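A short sketch of extension-based detection. Since detection is described as depending only on the file extension, the sample names below are assumed not to need to exist on disk; the concrete ReaderInputKind values printed are whatever the library defines.

```csharp
using System;
using OfficeIMO.Reader;

// Detection is documented as extension-based; these sample files need not exist (assumption).
ReaderInputKind wordKind = DocumentReader.DetectKind("Policy.docx");
ReaderInputKind excelKind = DocumentReader.DetectKind("Workbook.xlsx");

Console.WriteLine($"Policy.docx   -> {wordKind}");
Console.WriteLine($"Workbook.xlsx -> {excelKind}");
```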
DiscoverHandlerRegistrars 2 overloads
public static IReadOnlyList<ReaderHandlerRegistrarDescriptor> DiscoverHandlerRegistrars(IEnumerable<Assembly> assemblies)
Returns: IReadOnlyList<ReaderHandlerRegistrarDescriptor>

Discovers modular registrar methods in the provided assemblies.

Parameters

assemblies System.Collections.Generic.IEnumerable{System.Reflection.Assembly} required, position: 0
Assemblies to scan for registrar methods.
public static IReadOnlyList<ReaderHandlerRegistrarDescriptor> DiscoverHandlerRegistrars(params Assembly[] assemblies)
Returns: IReadOnlyList<ReaderHandlerRegistrarDescriptor>

Discovers modular registrar methods in the provided assemblies.

Parameters

assemblies System.Reflection.Assembly[] required, position: 0
Assemblies to scan for registrar methods.
public static IReadOnlyList<ReaderHandlerRegistrarDescriptor> DiscoverHandlerRegistrarsFromLoadedAssemblies(String assemblyNamePrefix = "OfficeIMO.Reader.")
Returns: IReadOnlyList<ReaderHandlerRegistrarDescriptor>

Discovers modular registrar methods from currently loaded assemblies whose simple name starts with assemblyNamePrefix.

Parameters

assemblyNamePrefix System.String = "OfficeIMO.Reader." optional, position: 0
Simple assembly-name prefix filter. Default: "OfficeIMO.Reader.".
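A sketch contrasting the two discovery styles. Discovery only lists registrar methods; registering them is a separate call. Note that the default prefix ends with a dot, so the core "OfficeIMO.Reader" assembly itself does not match it; `typeof(DocumentReader).Assembly` is used below purely as a stand-in for an explicit scan target.

```csharp
using System;
using System.Collections.Generic;
using OfficeIMO.Reader;

// Registrars in already-loaded assemblies whose simple name starts with "OfficeIMO.Reader.".
IReadOnlyList<ReaderHandlerRegistrarDescriptor> loaded =
    DocumentReader.DiscoverHandlerRegistrarsFromLoadedAssemblies();

// Explicit scan via the params overload; replace the stand-in with your handler assemblies.
IReadOnlyList<ReaderHandlerRegistrarDescriptor> explicitScan =
    DocumentReader.DiscoverHandlerRegistrars(typeof(DocumentReader).Assembly);

Console.WriteLine($"Loaded-assembly registrars: {loaded.Count}");
Console.WriteLine($"Explicit-scan registrars:   {explicitScan.Count}");
```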
public static IReadOnlyList<ReaderHandlerCapability> GetCapabilities(Boolean includeBuiltIn = true, Boolean includeCustom = true)
Returns: IReadOnlyList<ReaderHandlerCapability>

Lists built-in and custom reader capabilities for host discovery.

Parameters

includeBuiltIn System.Boolean = true optional, position: 0
When true, includes built-in handlers.
includeCustom System.Boolean = true optional, position: 1
When true, includes custom (registered) handlers.
public static ReaderCapabilityManifest GetCapabilityManifest(Boolean includeBuiltIn = true, Boolean includeCustom = true)
Returns: ReaderCapabilityManifest

Builds a machine-readable capability manifest for host auto-discovery.

Parameters

includeBuiltIn System.Boolean = true optional, position: 0
When true, includes built-in handlers.
includeCustom System.Boolean = true optional, position: 1
When true, includes custom (registered) handlers.
public static String GetCapabilityManifestJson(Boolean includeBuiltIn = true, Boolean includeCustom = true, Boolean indented = false)
Returns: String

Builds a JSON capability manifest payload for host auto-discovery.

Parameters

includeBuiltIn System.Boolean = true optional, position: 0
When true, includes built-in handlers.
includeCustom System.Boolean = true optional, position: 1
When true, includes custom (registered) handlers.
indented System.Boolean = false optional, position: 2
When true, indents the returned JSON payload.
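A sketch of the typed and JSON discovery views. The typed list suits in-process decisions; the JSON string suits logging or returning to a remote caller. The "capabilities endpoint" mentioned in a comment is only an illustrative use, not part of this API.

```csharp
using System;
using System.Collections.Generic;
using OfficeIMO.Reader;

// Typed view restricted to built-in handlers.
IReadOnlyList<ReaderHandlerCapability> builtIn = DocumentReader.GetCapabilities(includeCustom: false);

// JSON view of built-in and custom handlers, indented for readability
// (e.g. for logs or a hypothetical capabilities endpoint).
string manifestJson = DocumentReader.GetCapabilityManifestJson(indented: true);

Console.WriteLine($"Built-in capabilities: {builtIn.Count}");
Console.WriteLine(manifestJson);
```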
Read 3 overloads
public static IEnumerable<ReaderChunk> Read(String path, ReaderOptions options = null, CancellationToken cancellationToken = default)
Returns: IEnumerable<ReaderChunk>

Reads a supported document file and emits normalized extraction chunks.

Parameters

path System.String required, position: 0
Source file path.
options OfficeIMO.Reader.ReaderOptions = null optional, position: 1
Extraction options.
cancellationToken System.Threading.CancellationToken = default optional, position: 2
Cancellation token.
public static IEnumerable<ReaderChunk> Read(Stream stream, String sourceName = null, ReaderOptions options = null, CancellationToken cancellationToken = default)
Returns: IEnumerable<ReaderChunk>

Reads a supported document from a stream and emits normalized extraction chunks.

Parameters

stream System.IO.Stream required, position: 0
Source stream. This method does not close the stream.
sourceName System.String = null optional, position: 1
Optional source name used for kind detection (via its extension) and for citations/IDs, for example "Policy.docx" or "Workbook.xlsx".
options OfficeIMO.Reader.ReaderOptions = null optional, position: 2
Extraction options.
cancellationToken System.Threading.CancellationToken = default optional, position: 3
Cancellation token.
public static IEnumerable<ReaderChunk> Read(Byte[] bytes, String sourceName = null, ReaderOptions options = null, CancellationToken cancellationToken = default)
Returns: IEnumerable<ReaderChunk>

Reads a supported document from bytes and emits normalized extraction chunks.

Parameters

bytes System.Byte[] required, position: 0
Source bytes.
sourceName System.String = null optional, position: 1
Optional source name used for kind detection (via its extension) and for citations/IDs, for example "Policy.docx" or "Workbook.xlsx".
options OfficeIMO.Reader.ReaderOptions = null optional, position: 2
Extraction options.
cancellationToken System.Threading.CancellationToken = default optional, position: 3
Cancellation token.
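A sketch that reads one file through each Read overload, assuming a hypothetical "Policy.docx" in the working directory (or a path passed on the command line). Because ReaderChunk's members are not listed on this page, the sketch only counts chunks. For stream and byte input, sourceName carries the extension used for kind detection.

```csharp
using System;
using System.IO;
using System.Threading;
using OfficeIMO.Reader;

// Hypothetical input file.
string path = args.Length > 0 ? args[0] : "Policy.docx";
int fromPath = 0, fromStream = 0, fromBytes = 0;

if (File.Exists(path))
{
    using var cts = new CancellationTokenSource(TimeSpan.FromMinutes(2));
    string name = Path.GetFileName(path); // used for kind detection and citations/IDs

    // 1) File-path overload.
    foreach (ReaderChunk chunk in DocumentReader.Read(path, cancellationToken: cts.Token)) fromPath++;

    // 2) Stream overload: Read does not close the stream, so the caller disposes it.
    //    Enumerating inside the using block keeps the stream open for the whole read.
    using (FileStream stream = File.OpenRead(path))
    {
        foreach (ReaderChunk chunk in DocumentReader.Read(stream, name, cancellationToken: cts.Token)) fromStream++;
    }

    // 3) Byte-array overload, e.g. for content fetched from blob storage.
    byte[] bytes = File.ReadAllBytes(path);
    foreach (ReaderChunk chunk in DocumentReader.Read(bytes, name, cancellationToken: cts.Token)) fromBytes++;
}

Console.WriteLine($"chunks: path={fromPath}, stream={fromStream}, bytes={fromBytes}");
```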
ReadFolder 2 overloads
public static IEnumerable<ReaderChunk> ReadFolder(String folderPath, ReaderFolderOptions folderOptions = null, ReaderOptions options = null, CancellationToken cancellationToken = default)
Returns: IEnumerable<ReaderChunk>

Enumerates a folder and ingests every supported file on a best-effort basis, emitting warning chunks for skipped files.

Parameters

folderPath System.String required, position: 0
Folder path.
folderOptions OfficeIMO.Reader.ReaderFolderOptions = null optional, position: 1
Folder enumeration options.
options OfficeIMO.Reader.ReaderOptions = null optional, position: 2
Extraction options.
cancellationToken System.Threading.CancellationToken = default optional, position: 3
Cancellation token.
public static IEnumerable<ReaderChunk> ReadFolder(String folderPath, ReaderFolderOptions folderOptions, ReaderOptions options, Action<ReaderProgress> onProgress, CancellationToken cancellationToken = default)
Returns: IEnumerable<ReaderChunk>

Enumerates a folder and ingests every supported file on a best-effort basis, emitting warning chunks for skipped files.

Parameters

folderPath System.String required, position: 0
Folder path.
folderOptions OfficeIMO.Reader.ReaderFolderOptions required, position: 1
Folder enumeration options.
options OfficeIMO.Reader.ReaderOptions required, position: 2
Extraction options.
onProgress System.Action{OfficeIMO.Reader.ReaderProgress} required, position: 3
Optional progress callback reporting file-level lifecycle events and aggregate counts.
cancellationToken System.Threading.CancellationToken = default optional, position: 4
Cancellation token.
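A sketch of the progress-reporting overload with a timeout-based cancellation token. It assumes that passing null for folderOptions and options means "use defaults", as the first overload's null defaults suggest. ReaderProgress's members are not listed on this page, so the callback only counts events, using Interlocked in case callbacks arrive on another thread.

```csharp
using System;
using System.IO;
using System.Threading;
using OfficeIMO.Reader;

// Hypothetical input: folder from the command line, else a fresh empty temp folder.
string folder = args.Length > 0
    ? args[0]
    : Directory.CreateDirectory(Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"))).FullName;

int progressEvents = 0;
int chunkCount = 0;
using var cts = new CancellationTokenSource(TimeSpan.FromMinutes(10));

foreach (ReaderChunk chunk in DocumentReader.ReadFolder(
             folder,
             folderOptions: null,
             options: null,
             onProgress: _ => Interlocked.Increment(ref progressEvents),
             cancellationToken: cts.Token))
{
    chunkCount++; // includes any warning chunks emitted for skipped files
}

Console.WriteLine($"{chunkCount} chunks, {progressEvents} progress callbacks");
```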
public static ReaderIngestResult ReadFolderDetailed(String folderPath, ReaderFolderOptions folderOptions = null, ReaderOptions options = null, Boolean includeChunks = true, Action<ReaderProgress> onProgress = null, CancellationToken cancellationToken = default)
Returns: ReaderIngestResult

Reads a folder and returns an ingestion-ready summary with counts, optionally materializing the chunks.

Parameters

folderPath System.String required, position: 0
Folder path.
folderOptions OfficeIMO.Reader.ReaderFolderOptions = null optional, position: 1
Folder enumeration options.
options OfficeIMO.Reader.ReaderOptions = null optional, position: 2
Extraction options.
includeChunks System.Boolean = true optional, position: 3
When true, materializes chunks in the result object.
onProgress System.Action{OfficeIMO.Reader.ReaderProgress} = null optional, position: 4
Optional progress callback.
cancellationToken System.Threading.CancellationToken = default optional, position: 5
Cancellation token.
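A sketch of a summary-only ingestion pass, assuming a hypothetical folder argument (or an empty temp folder so it runs anywhere). Setting includeChunks to false avoids holding every chunk in the result object when only counts are needed. ReaderIngestResult's members are not documented on this page, so they are not accessed here.

```csharp
using System;
using System.IO;
using OfficeIMO.Reader;

// Hypothetical input: folder from the command line, else a fresh empty temp folder.
string folder = args.Length > 0
    ? args[0]
    : Directory.CreateDirectory(Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"))).FullName;

// Summary/counts only; chunks are not materialized in the result.
ReaderIngestResult summary = DocumentReader.ReadFolderDetailed(folder, includeChunks: false);

Console.WriteLine(summary is null ? "no result" : "ingest summary ready");
```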
public static IEnumerable<ReaderSourceDocument> ReadFolderDocuments(String folderPath, ReaderFolderOptions folderOptions = null, ReaderOptions options = null, Action<ReaderProgress> onProgress = null, CancellationToken cancellationToken = default)
Returns: IEnumerable<ReaderSourceDocument>

Enumerates a folder and emits one source-level payload per file, ready for direct database upserts.

Parameters

folderPath System.String required, position: 0
Folder path.
folderOptions OfficeIMO.Reader.ReaderFolderOptions = null optional, position: 1
Folder enumeration options.
options OfficeIMO.Reader.ReaderOptions = null optional, position: 2
Extraction options.
onProgress System.Action{OfficeIMO.Reader.ReaderProgress} = null optional, position: 3
Optional progress callback reporting file-level lifecycle events and aggregate counts.
cancellationToken System.Threading.CancellationToken = default optional, position: 4
Cancellation token.
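A sketch of the one-payload-per-file pattern. `UpsertDocument` is a hypothetical placeholder for your database write; ReaderSourceDocument's members are not listed on this page, so the placeholder does not read them.

```csharp
using System;
using System.IO;
using OfficeIMO.Reader;

// Hypothetical input: folder from the command line, else a fresh empty temp folder.
string folder = args.Length > 0
    ? args[0]
    : Directory.CreateDirectory(Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"))).FullName;

int documents = 0;

// One ReaderSourceDocument per file: a natural unit for a per-source upsert.
foreach (ReaderSourceDocument document in DocumentReader.ReadFolderDocuments(folder))
{
    documents++;
    UpsertDocument(document);
}

Console.WriteLine($"{documents} source documents");

// Hypothetical persistence hook; replace with your own database upsert.
static void UpsertDocument(ReaderSourceDocument document) { }
```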
public static ReaderPathDocumentResult ReadPathDocumentsDetailed(String path, ReaderFolderOptions folderOptions = null, ReaderOptions options = null, Boolean includeDocumentChunks = true, Nullable<Int32> maxReturnedChunks = null, Action<ReaderProgress> onProgress = null, CancellationToken cancellationToken = default)
Returns: ReaderPathDocumentResult

Reads a supported file or folder path and returns source-level document payloads with optional chunk shaping.

Parameters

path System.String required, position: 0
Source file or folder path.
folderOptions OfficeIMO.Reader.ReaderFolderOptions = null optional, position: 1
Folder enumeration options when path is a directory.
options OfficeIMO.Reader.ReaderOptions = null optional, position: 2
Extraction options.
includeDocumentChunks System.Boolean = true optional, position: 3
When true, includes chunk arrays in returned source documents.
maxReturnedChunks System.Nullable{System.Int32} = null optional, position: 4
Optional cap on the total number of chunks returned across all documents.
onProgress System.Action{OfficeIMO.Reader.ReaderProgress} = null optional, position: 5
Optional progress callback for folder reads.
cancellationToken System.Threading.CancellationToken = default optional, position: 6
Cancellation token.
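A sketch of a bounded response, for example when the result is returned to an AI tool call with a size budget. The cap of 200 chunks is an arbitrary illustrative value, and the input path (a file or a folder) is hypothetical; ReaderPathDocumentResult's members are not listed on this page, so the result is not inspected.

```csharp
using System;
using System.IO;
using OfficeIMO.Reader;

// Hypothetical input: file or folder from the command line, else a fresh empty temp folder.
string path = args.Length > 0
    ? args[0]
    : Directory.CreateDirectory(Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"))).FullName;

// Keep per-document chunks, but cap the total across all documents.
ReaderPathDocumentResult result = DocumentReader.ReadPathDocumentsDetailed(
    path,
    includeDocumentChunks: true,
    maxReturnedChunks: 200,
    onProgress: _ => Console.Write('.')); // only invoked for folder reads

Console.WriteLine();
Console.WriteLine(result is null ? "no result" : "documents ready");
```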
public static void RegisterHandler(ReaderHandlerRegistration registration, Boolean replaceExisting = false)
Returns: Void

Registers a custom handler for one or more file extensions.

Parameters

registration OfficeIMO.Reader.ReaderHandlerRegistration required, position: 0
Custom handler registration.
replaceExisting System.Boolean = false optional, position: 1
When true, removes conflicting custom handlers and allows built-in extension overrides.
RegisterHandlersFromAssemblies 2 overloads
public static IReadOnlyList<ReaderHandlerRegistrarDescriptor> RegisterHandlersFromAssemblies(IEnumerable<Assembly> assemblies, Boolean replaceExisting = true)
Returns: IReadOnlyList<ReaderHandlerRegistrarDescriptor>

Registers modular handlers discovered in the provided assemblies.

Parameters

assemblies System.Collections.Generic.IEnumerable{System.Reflection.Assembly} required, position: 0
Assemblies to scan for registrar methods.
replaceExisting System.Boolean = true optional, position: 1
Passed to discovered registrar methods via their replaceExisting parameter when present.
public static IReadOnlyList<ReaderHandlerRegistrarDescriptor> RegisterHandlersFromAssemblies(Boolean replaceExisting = true, params Assembly[] assemblies)
Returns: IReadOnlyList<ReaderHandlerRegistrarDescriptor>

Registers modular handlers discovered in the provided assemblies.

Parameters

replaceExisting System.Boolean = true optional, position: 0
Passed to discovered registrar methods via their replaceExisting parameter when present.
assemblies System.Reflection.Assembly[] required, position: 1
Assemblies to scan for registrar methods.
public static IReadOnlyList<ReaderHandlerRegistrarDescriptor> RegisterHandlersFromLoadedAssemblies(Boolean replaceExisting = true, String assemblyNamePrefix = "OfficeIMO.Reader.")
Returns: IReadOnlyList<ReaderHandlerRegistrarDescriptor>

Registers modular handlers discovered from currently loaded assemblies whose simple name starts with assemblyNamePrefix.

Parameters

replaceExisting System.Boolean = true optional, position: 0
Passed to discovered registrar methods via their replaceExisting parameter when present.
assemblyNamePrefix System.String = "OfficeIMO.Reader." optional, position: 1
Simple assembly-name prefix filter. Default: "OfficeIMO.Reader.".
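A sketch that registers modular handlers from loaded assemblies with the defaults (replaceExisting: true, prefix "OfficeIMO.Reader.") and then lists only custom capabilities to see what was added. Whether any registrars are found depends on which handler assemblies the host has already loaded.

```csharp
using System;
using System.Collections.Generic;
using OfficeIMO.Reader;

// Registers every registrar found in loaded "OfficeIMO.Reader.*" assemblies;
// replaceExisting (default true) is forwarded to registrars that accept it.
IReadOnlyList<ReaderHandlerRegistrarDescriptor> registered =
    DocumentReader.RegisterHandlersFromLoadedAssemblies();

// Custom handlers only, to confirm what the registrars contributed.
IReadOnlyList<ReaderHandlerCapability> custom = DocumentReader.GetCapabilities(includeBuiltIn: false);

Console.WriteLine($"{registered.Count} registrar descriptors, {custom.Count} custom capabilities");
```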
public static Boolean UnregisterHandler(String handlerId)
Returns: Boolean

Unregisters a custom handler by identifier.

Parameters

handlerId System.String required, position: 0
Identifier of the custom handler to unregister.