
Class PdfPigParser

Namespace
Mythosia.Documents.Pdf
Assembly
Mythosia.Documents.Pdf.dll

Parses PDF files using PdfPig into a structured DoclingDocument. Extracts headings (via font-size analysis), lists (via prefix detection), and paragraphs (via spatial line grouping).
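The summary names three heuristics but does not describe how they work. The following minimal, self-contained C# sketch shows what such heuristics can look like. It is not taken from PdfPigParser or from PdfPig. Everything in it is a hypothetical choice made for illustration: the types TextLine, BlockKind, and LayoutHeuristics, the 1.2 heading ratio, the 1.5 line-gap factor, the list-marker pattern, and the top-down Y coordinate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical input: one visual line of text, its font size in points, and its vertical
// position Y measured downward from the top of the page (PDF user space actually grows upward).
public sealed record TextLine(string Text, double FontSize, double Y);

public enum BlockKind { Heading, ListItem, Paragraph }

public static class LayoutHeuristics
{
    // Assumed list markers: "•", "-", "*", or an ordinal such as "1." or "a)", followed by whitespace.
    private static readonly Regex ListPrefix =
        new Regex(@"^\s*(?:[•\-\*]|\d+[.)]|[a-zA-Z][.)])\s+", RegexOptions.Compiled);

    // Font-size analysis: treat the most frequent size (rounded to 0.1 pt) as the body size.
    public static double BodyFontSize(IReadOnlyList<TextLine> lines)
    {
        if (lines.Count == 0)
            throw new ArgumentException("At least one line is required.", nameof(lines));

        return lines.GroupBy(l => Math.Round(l.FontSize, 1))
                    .OrderByDescending(g => g.Count())
                    .ThenBy(g => g.Key) // on a tie, prefer the smaller size
                    .First().Key;
    }

    // Lines clearly larger than body text become headings; lines with a list marker become list items.
    public static BlockKind Classify(TextLine line, double bodySize, double headingRatio = 1.2)
    {
        if (line.FontSize >= bodySize * headingRatio) return BlockKind.Heading;
        if (ListPrefix.IsMatch(line.Text)) return BlockKind.ListItem;
        return BlockKind.Paragraph;
    }

    // Spatial grouping: lines (sorted top to bottom) stay in the same paragraph while the gap
    // to the previous line is at most gapFactor times that line's font size.
    public static List<string> GroupParagraphs(IReadOnlyList<TextLine> lines, double gapFactor = 1.5)
    {
        var paragraphs = new List<string>();
        var current = new List<string>();
        for (int i = 0; i < lines.Count; i++)
        {
            bool bigGap = i > 0 && lines[i].Y - lines[i - 1].Y > gapFactor * lines[i - 1].FontSize;
            if (bigGap && current.Count > 0)
            {
                paragraphs.Add(string.Join(" ", current));
                current.Clear();
            }
            current.Add(lines[i].Text.Trim());
        }
        if (current.Count > 0) paragraphs.Add(string.Join(" ", current));
        return paragraphs;
    }
}
```

In this sketch, the font-size check runs before the list-marker check. As a result, a large numbered title such as "1. Introduction" is classified as a heading rather than as a list item.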

public class PdfPigParser : IDocumentParser
Inheritance
object
PdfPigParser
Implements
IDocumentParser

Constructors

PdfPigParser(PdfParserOptions?)

Initializes a new instance of the PdfPigParser class.

public PdfPigParser(PdfParserOptions? options = null)

Parameters

options PdfParserOptions

Optional parser settings. The parameter is nullable and defaults to null, so it can be omitted.

Methods

CanParse(string)

Returns true if the parser can handle the given source.

public bool CanParse(string source)

Parameters

source string

Returns

bool

ParseAsync(string, CancellationToken)

Parses the document and returns a structured DoclingDocument.

public Task<DoclingDocument> ParseAsync(string source, CancellationToken ct = default)

Parameters

source string
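ct CancellationToken

A token for requesting cancellation of the operation. Defaults to default (equivalent to CancellationToken.None).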

Returns

Task<DoclingDocument>
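The following minimal usage sketch relies only on the constructor and methods documented on this page. The PdfParsingExample class, the TryParseAsync helper, and the timeout are hypothetical. The sketch also assumes two things: that omitting options selects the parser's defaults, and that ParseAsync observes the cancellation token. Because the members of DoclingDocument are not listed here, the helper only reports whether a document was returned.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Mythosia.Documents.Pdf;

public static class PdfParsingExample
{
    // Hypothetical helper: parses `path` with default options and reports whether a document came back.
    public static async Task<bool> TryParseAsync(string path, TimeSpan timeout)
    {
        var parser = new PdfPigParser(); // options omitted (null); assumed to mean defaults

        if (!parser.CanParse(path))
            return false; // the parser declined this source

        // Requests cancellation after `timeout`; effective only if ParseAsync observes the token,
        // in which case an OperationCanceledException propagates to the caller.
        using var cts = new CancellationTokenSource(timeout);
        var document = await parser.ParseAsync(path, cts.Token); // a DoclingDocument
        return document is not null;
    }
}
```

Calling CanParse first lets a caller that holds several IDocumentParser implementations route each source to the parser that accepts it.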