API Reference
DocumentReader
Unified, read-only document extraction facade intended for AI ingestion.
Inheritance
- Object
- DocumentReader
Remarks
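This facade is intentionally dependency-free and deterministic. It normalizes extraction into ReaderChunk instances with stable IDs and location metadata. The read methods are thread-safe because they do not rely on shared mutable state. The handler registration methods (RegisterHandler, UnregisterHandler, RegisterHandlersFromAssemblies, RegisterHandlersFromLoadedAssemblies, and the BootstrapHost helpers) change the set of registered custom handlers, so this guarantee should not be assumed to cover registration that runs concurrently with reads.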
Methods
public static ReaderHostBootstrapResult BootstrapHostFromAssemblies(IEnumerable<Assembly> assemblies, ReaderHostBootstrapOptions options = null)
Host bootstrap helper that registers modular handlers from the provided assemblies and returns both typed and JSON capability manifests in one payload.
Parameters
- assemblies System.Collections.Generic.IEnumerable{System.Reflection.Assembly}
- Assemblies to scan for registrar methods.
- options OfficeIMO.Reader.ReaderHostBootstrapOptions = null
- Bootstrap options. When null, defaults are used.
public static ReaderHostBootstrapResult BootstrapHostFromAssemblies(IEnumerable<Assembly> assemblies, ReaderHostBootstrapProfile profile, Boolean indentedManifestJson = false)
Host bootstrap helper that applies a preset profile, registers modular handlers from the provided assemblies, and returns both typed and JSON capability manifests in one payload.
Parameters
- assemblies System.Collections.Generic.IEnumerable{System.Reflection.Assembly}
- Assemblies to scan for registrar methods.
- profile OfficeIMO.Reader.ReaderHostBootstrapProfile
- Bootstrap profile preset.
- indentedManifestJson System.Boolean = false
- When true, indents the returned manifest JSON payload.
public static ReaderHostBootstrapResult BootstrapHostFromLoadedAssemblies(String assemblyNamePrefix = "OfficeIMO.Reader.", ReaderHostBootstrapOptions options = null)
Host bootstrap helper that discovers and registers modular handlers from currently loaded assemblies whose simple name starts with assemblyNamePrefix, then returns both typed and JSON capability manifests in one payload.
Parameters
- assemblyNamePrefix System.String = "OfficeIMO.Reader."
- Simple assembly-name prefix filter. Defaults to "OfficeIMO.Reader." (including the trailing dot).
- options OfficeIMO.Reader.ReaderHostBootstrapOptions = null
- Bootstrap options. When null, defaults are used.
public static ReaderHostBootstrapResult BootstrapHostFromLoadedAssemblies(ReaderHostBootstrapProfile profile, String assemblyNamePrefix = "OfficeIMO.Reader.", Boolean indentedManifestJson = false)
Host bootstrap helper that applies a preset profile, discovers and registers modular handlers from loaded assemblies, and returns both typed and JSON capability manifests in one payload.
Parameters
- profile OfficeIMO.Reader.ReaderHostBootstrapProfile
- Bootstrap profile preset.
- assemblyNamePrefix System.String = "OfficeIMO.Reader."
- Simple assembly-name prefix filter. Defaults to "OfficeIMO.Reader." (including the trailing dot).
- indentedManifestJson System.Boolean = false
- When true, indents the returned manifest JSON payload.
public static ReaderInputKind DetectKind(String path)
Detects the input kind based on file extension.
Parameters
- path System.String
- Source file path.
public static IReadOnlyList<ReaderHandlerRegistrarDescriptor> DiscoverHandlerRegistrars(IEnumerable<Assembly> assemblies)
Discovers modular registrar methods in the provided assemblies.
Parameters
- assemblies System.Collections.Generic.IEnumerable{System.Reflection.Assembly}
- Assemblies to scan for registrar methods.
public static IReadOnlyList<ReaderHandlerRegistrarDescriptor> DiscoverHandlerRegistrars(params Assembly[] assemblies)
Discovers modular registrar methods in the provided assemblies.
Parameters
- assemblies System.Reflection.Assembly[]
- Assemblies to scan for registrar methods.
public static IReadOnlyList<ReaderHandlerRegistrarDescriptor> DiscoverHandlerRegistrarsFromLoadedAssemblies(String assemblyNamePrefix = "OfficeIMO.Reader.")
Discovers modular registrar methods from currently loaded assemblies whose simple name starts with assemblyNamePrefix.
Parameters
- assemblyNamePrefix System.String = "OfficeIMO.Reader."
- Simple assembly-name prefix filter. Defaults to "OfficeIMO.Reader." (including the trailing dot).
public static IReadOnlyList<ReaderHandlerCapability> GetCapabilities(Boolean includeBuiltIn = true, Boolean includeCustom = true)
Lists built-in and custom reader capabilities for host discovery.
Parameters
- includeBuiltIn System.Boolean = true
- includeCustom System.Boolean = true
public static ReaderCapabilityManifest GetCapabilityManifest(Boolean includeBuiltIn = true, Boolean includeCustom = true)
Builds a machine-readable capability manifest for host auto-discovery.
Parameters
- includeBuiltIn System.Boolean = true
- includeCustom System.Boolean = true
public static String GetCapabilityManifestJson(Boolean includeBuiltIn = true, Boolean includeCustom = true, Boolean indented = false)
Builds a JSON capability manifest payload for host auto-discovery.
Parameters
- includeBuiltIn System.Boolean = true
- includeCustom System.Boolean = true
- indented System.Boolean = false
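The capability methods above can be combined for a quick host-side check. The following is a minimal sketch, not a definitive usage pattern: it assumes that DocumentReader lives in the OfficeIMO.Reader namespace (inferred from the parameter type names on this page) and it uses only the signatures listed here. The class name CapabilityExample is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using OfficeIMO.Reader; // assumption: namespace inferred from the documented type names

internal static class CapabilityExample
{
    private static void Main()
    {
        // List only the built-in capabilities by switching off custom handlers.
        IReadOnlyList<ReaderHandlerCapability> builtIn =
            DocumentReader.GetCapabilities(includeBuiltIn: true, includeCustom: false);
        Console.WriteLine($"Built-in capabilities: {builtIn.Count}");

        // Request the JSON manifest with indentation for easier inspection.
        string manifestJson = DocumentReader.GetCapabilityManifestJson(indented: true);
        Console.WriteLine(manifestJson);
    }
}
```

Named arguments are used so that the call sites stay readable when only one of the Boolean defaults is changed.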
public static IEnumerable<ReaderChunk> Read(String path, ReaderOptions options = null, CancellationToken cancellationToken = default)
Reads a supported document file and emits normalized extraction chunks.
Parameters
- path System.String
- Source file path.
- options OfficeIMO.Reader.ReaderOptions = null
- Extraction options.
- cancellationToken System.Threading.CancellationToken = default
- Cancellation token.
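As an illustration of the path-based overload, the sketch below detects the input kind and counts the emitted chunks under a timeout. It assumes the OfficeIMO.Reader namespace and a hypothetical input file "Policy.docx"; it deliberately avoids ReaderChunk members because none are listed in this section.

```csharp
using System;
using System.Threading;
using OfficeIMO.Reader; // assumption: namespace inferred from the documented type names

internal static class ReadFileExample
{
    private static void Main(string[] args)
    {
        // Hypothetical input path; pass a real document path as the first argument.
        string path = args.Length > 0 ? args[0] : "Policy.docx";

        // Request cancellation if the read runs longer than 30 seconds.
        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));

        Console.WriteLine($"Detected kind: {DocumentReader.DetectKind(path)}");

        // Enumerate the chunks and count them.
        int chunkCount = 0;
        foreach (ReaderChunk chunk in DocumentReader.Read(path, cancellationToken: cts.Token))
        {
            chunkCount++;
        }

        Console.WriteLine($"Chunks: {chunkCount}");
    }
}
```

Omitting cancellationToken is equivalent to passing CancellationToken.None, which is the default value of the struct.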
public static IEnumerable<ReaderChunk> Read(Stream stream, String sourceName = null, ReaderOptions options = null, CancellationToken cancellationToken = default)
Reads a supported document from a stream and emits normalized extraction chunks.
Parameters
- stream System.IO.Stream
- Source stream. This method does not close the stream.
- sourceName System.String = null
- Optional source name used for kind detection (via extension) and citations/IDs. For example: "Policy.docx" or "Workbook.xlsx".
- options OfficeIMO.Reader.ReaderOptions = null
- Extraction options.
- cancellationToken System.Threading.CancellationToken = default
- Cancellation token.
public static IEnumerable<ReaderChunk> Read(Byte[] bytes, String sourceName = null, ReaderOptions options = null, CancellationToken cancellationToken = default)
Reads a supported document from bytes and emits normalized extraction chunks.
Parameters
- bytes System.Byte[]
- Source bytes.
- sourceName System.String
- Optional source name used for kind detection (via extension) and citations/IDs. For example: "Policy.docx" or "Workbook.xlsx".
- options OfficeIMO.Reader.ReaderOptions
- Extraction options.
- cancellationToken System.Threading.CancellationToken
- Cancellation token.
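The in-memory overloads can be exercised as sketched below. This is an assumption-laden example rather than a reference implementation: it assumes the OfficeIMO.Reader namespace, a hypothetical file "Workbook.xlsx", and that options and cancellationToken are optional on both overloads as the signatures on this page suggest. The source name is passed so that kind detection can use the extension.

```csharp
using System;
using System.IO;
using System.Linq;
using OfficeIMO.Reader; // assumption: namespace inferred from the documented type names

internal static class ReadInMemoryExample
{
    private static void Main(string[] args)
    {
        // Hypothetical input path; pass a real document path as the first argument.
        string path = args.Length > 0 ? args[0] : "Workbook.xlsx";
        string sourceName = Path.GetFileName(path);

        // Byte-array overload: the whole document is already in memory.
        byte[] bytes = File.ReadAllBytes(path);
        int fromBytes = DocumentReader.Read(bytes, sourceName).Count();
        Console.WriteLine($"Chunks from bytes: {fromBytes}");

        // Stream overload: the reader does not close the stream, so the caller
        // owns it. The chunks are enumerated while the stream is still open.
        using (FileStream stream = File.OpenRead(path))
        {
            int fromStream = DocumentReader.Read(stream, sourceName).Count();
            Console.WriteLine($"Chunks from stream: {fromStream}");
        }
    }
}
```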
public static IEnumerable<ReaderChunk> ReadFolder(String folderPath, ReaderFolderOptions folderOptions = null, ReaderOptions options = null, CancellationToken cancellationToken = default)
Enumerates a folder and ingests all supported files (best-effort), emitting warning chunks for skipped files.
Parameters
- folderPath System.String
- Folder path.
- folderOptions OfficeIMO.Reader.ReaderFolderOptions = null
- Folder enumeration options.
- options OfficeIMO.Reader.ReaderOptions = null
- Extraction options.
- cancellationToken System.Threading.CancellationToken = default
- Cancellation token.
public static IEnumerable<ReaderChunk> ReadFolder(String folderPath, ReaderFolderOptions folderOptions, ReaderOptions options, Action<ReaderProgress> onProgress, CancellationToken cancellationToken = default)
Enumerates a folder and ingests all supported files (best-effort), emitting warning chunks for skipped files and reporting progress through onProgress.
Parameters
- folderPath System.String
- Folder path.
- folderOptions OfficeIMO.Reader.ReaderFolderOptions
- Folder enumeration options.
- options OfficeIMO.Reader.ReaderOptions
- Extraction options.
- onProgress System.Action{OfficeIMO.Reader.ReaderProgress}
- Optional progress callback for file-level lifecycle and aggregate counts.
- cancellationToken System.Threading.CancellationToken = default
- Cancellation token.
public static ReaderIngestResult ReadFolderDetailed(String folderPath, ReaderFolderOptions folderOptions = null, ReaderOptions options = null, Boolean includeChunks = true, Action<ReaderProgress> onProgress = null, CancellationToken cancellationToken = default)
Reads a folder and returns an ingestion-ready summary with counts and optional chunk materialization.
Parameters
- folderPath System.String
- Folder path.
- folderOptions OfficeIMO.Reader.ReaderFolderOptions = null
- Folder enumeration options.
- options OfficeIMO.Reader.ReaderOptions = null
- Extraction options.
- includeChunks System.Boolean = true
- When true, materializes chunks in the result object.
- onProgress System.Action{OfficeIMO.Reader.ReaderProgress} = null
- Optional progress callback.
- cancellationToken System.Threading.CancellationToken = default
- Cancellation token.
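A folder ingest with a progress callback might look like the sketch below. It assumes the OfficeIMO.Reader namespace and a hypothetical folder "./documents". Because the members of ReaderProgress and ReaderIngestResult are not listed in this section, the callback only counts invocations and the result is kept without being inspected.

```csharp
using System;
using System.Threading;
using OfficeIMO.Reader; // assumption: namespace inferred from the documented type names

internal static class ReadFolderExample
{
    private static void Main(string[] args)
    {
        // Hypothetical folder; pass a real folder path as the first argument.
        string folderPath = args.Length > 0 ? args[0] : "./documents";

        using var cts = new CancellationTokenSource();

        // Count progress notifications. Interlocked is used because this page
        // does not state which thread invokes the callback.
        int progressEvents = 0;

        ReaderIngestResult result = DocumentReader.ReadFolderDetailed(
            folderPath,
            includeChunks: false, // summary only, no chunk materialization
            onProgress: _ => Interlocked.Increment(ref progressEvents),
            cancellationToken: cts.Token);

        Console.WriteLine($"Progress callbacks received: {progressEvents}");
        Console.WriteLine($"Result type: {result.GetType().Name}");
    }
}
```

Setting includeChunks to false keeps memory use low when only the summary and counts are needed.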
public static IEnumerable<ReaderSourceDocument> ReadFolderDocuments(String folderPath, ReaderFolderOptions folderOptions = null, ReaderOptions options = null, Action<ReaderProgress> onProgress = null, CancellationToken cancellationToken = default)
Enumerates a folder and emits one source-level payload per file, ready for direct DB upserts.
Parameters
- folderPath System.String
- Folder path.
- folderOptions OfficeIMO.Reader.ReaderFolderOptions = null
- Folder enumeration options.
- options OfficeIMO.Reader.ReaderOptions = null
- Extraction options.
- onProgress System.Action{OfficeIMO.Reader.ReaderProgress} = null
- Optional progress callback for file-level lifecycle and aggregate counts.
- cancellationToken System.Threading.CancellationToken = default
- Cancellation token.
public static ReaderPathDocumentResult ReadPathDocumentsDetailed(String path, ReaderFolderOptions folderOptions = null, ReaderOptions options = null, Boolean includeDocumentChunks = true, Nullable<Int32> maxReturnedChunks = null, Action<ReaderProgress> onProgress = null, CancellationToken cancellationToken = default)
Reads a supported file or folder path and returns source-level document payloads with optional chunk shaping.
Parameters
- path System.String
- Source file or folder path.
- folderOptions OfficeIMO.Reader.ReaderFolderOptions = null
- Folder enumeration options when path is a directory.
- options OfficeIMO.Reader.ReaderOptions = null
- Extraction options.
- includeDocumentChunks System.Boolean = true
- When true, includes chunk arrays in returned source documents.
- maxReturnedChunks System.Nullable{System.Int32} = null
- Optional cap on the total number of chunks returned across all documents.
- onProgress System.Action{OfficeIMO.Reader.ReaderProgress} = null
- Optional progress callback for folder reads.
- cancellationToken System.Threading.CancellationToken = default
- Cancellation token.
public static Void RegisterHandler(ReaderHandlerRegistration registration, Boolean replaceExisting = false)
Registers a custom handler for one or more file extensions.
Parameters
- registration OfficeIMO.Reader.ReaderHandlerRegistration
- Custom handler registration.
- replaceExisting System.Boolean = false
- When true, removes conflicting custom handlers and allows built-in extension overrides.
public static IReadOnlyList<ReaderHandlerRegistrarDescriptor> RegisterHandlersFromAssemblies(IEnumerable<Assembly> assemblies, Boolean replaceExisting = true)
Registers modular handlers discovered in the provided assemblies.
Parameters
- assemblies System.Collections.Generic.IEnumerable{System.Reflection.Assembly}
- Assemblies to scan for registrar methods.
- replaceExisting System.Boolean = true
- Passed to discovered registrar methods via their replaceExisting parameter when present.
public static IReadOnlyList<ReaderHandlerRegistrarDescriptor> RegisterHandlersFromAssemblies(Boolean replaceExisting = true, params Assembly[] assemblies)
Registers modular handlers discovered in the provided assemblies.
Parameters
- replaceExisting System.Boolean = true
- Passed to discovered registrar methods via their replaceExisting parameter when present.
- assemblies System.Reflection.Assembly[]
- Assemblies to scan for registrar methods.
public static IReadOnlyList<ReaderHandlerRegistrarDescriptor> RegisterHandlersFromLoadedAssemblies(Boolean replaceExisting = true, String assemblyNamePrefix = "OfficeIMO.Reader.")
Registers modular handlers discovered from currently loaded assemblies whose simple name starts with assemblyNamePrefix.
Parameters
- replaceExisting System.Boolean = true
- Passed to discovered registrar methods via their replaceExisting parameter when present.
- assemblyNamePrefix System.String = "OfficeIMO.Reader."
- Simple assembly-name prefix filter. Defaults to "OfficeIMO.Reader." (including the trailing dot).
public static Boolean UnregisterHandler(String handlerId)
Unregisters a custom handler by identifier.
Parameters
- handlerId System.String
Inherited Methods
public override Boolean Equals(Object obj)
Parameters
- obj Object