from __future__ import annotations

import enum
from dataclasses import dataclass


@dataclass
class ExtractionStats:
    """Extensible bag of metrics. Adding a stat = one new field with a default
    (plus one matching line in ``__iadd__``).

    All additive fields default to zero so that aggregate accumulators
    (``total = ExtractionStats(); total += per_file_stats``) start correctly.
    Per-file extractors set ``chunks = 1`` (or higher) at extraction time.
    """

    # LLM input token count (batch-level aggregate).
    input_tokens: int = 0
    # LLM output token count (batch-level aggregate).
    output_tokens: int = 0
    # LLM reasoning/thinking tokens (a subset of output_tokens).
    reasoning_tokens: int = 0
    # Number of text chunks sent to the LLM for this file.
    chunks: int = 0
    # Character count of the manpage plain text after filtering.
    plain_text_len: int = 0
    # Wall-clock extraction time, in seconds.
    elapsed_seconds: float = 0.0
    # Options skipped due to invalid LLM output that could not be recovered
    # by normalization (e.g. missing lines, structurally broken dicts).
    malformed_options: int = 0
    # Options recovered by normalize_option_fields (e.g. has_argument: null → False,
    # list[int] → list[str]).
    normalized_options: int = 0
    # Options removed by drop_empty in postprocessing because they had no
    # flags (short/long) and no positional name; this is typically caused by
    # the LLM omitting the flag from its response.
    dropped_empty: int = 0
    # Options removed as duplicates (exact-match or strict-subset) by
    # dedup_options in postprocessing.
    deduped_options: int = 0

    def __iadd__(self, other: ExtractionStats) -> ExtractionStats:
        """Accumulate every numeric field of ``other`` into ``self`` in place."""
        self.input_tokens += other.input_tokens
        self.output_tokens += other.output_tokens
        self.reasoning_tokens += other.reasoning_tokens
        self.chunks += other.chunks
        self.plain_text_len += other.plain_text_len
        self.elapsed_seconds += other.elapsed_seconds
        self.malformed_options += other.malformed_options
        self.normalized_options += other.normalized_options
        self.dropped_empty += other.dropped_empty
        self.deduped_options += other.deduped_options
        return self


class ExtractionOutcome(enum.Enum):
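The accumulator pattern from the `ExtractionStats` docstring can be shown in isolation. In that pattern, a zero-initialised total receives each file's stats via `+=`. What follows is a minimal sketch under stated assumptions, not the project's code:

- `StatsSketch` is a hypothetical, trimmed stand-in that keeps only three of the real fields.
- `extract_one` is a hypothetical per-file extractor that skips the LLM call entirely.
- The `__iadd__` here swaps in a loop over `dataclasses.fields()` in place of the hand-written per-field sums.

```python
import dataclasses
import time
from dataclasses import dataclass


@dataclass
class StatsSketch:
    """Hypothetical trimmed stand-in for ExtractionStats (three fields only)."""

    chunks: int = 0
    plain_text_len: int = 0
    elapsed_seconds: float = 0.0

    def __iadd__(self, other: "StatsSketch") -> "StatsSketch":
        # Assumption: every field is additive, so summing all of them is safe.
        # Returning NotImplemented lets Python raise TypeError for other types.
        if not isinstance(other, StatsSketch):
            return NotImplemented
        for f in dataclasses.fields(self):
            setattr(self, f.name, getattr(self, f.name) + getattr(other, f.name))
        return self


def extract_one(text: str) -> StatsSketch:
    """Hypothetical per-file extractor: records one chunk and the elapsed time."""
    start = time.perf_counter()
    # Per the docstring, a per-file extractor sets chunks = 1 (or higher).
    stats = StatsSketch(chunks=1, plain_text_len=len(text))
    # (A real extractor would call the LLM here; this sketch does not.)
    stats.elapsed_seconds = time.perf_counter() - start
    return stats


# Zero defaults mean a fresh instance is a valid starting total.
total = StatsSketch()
for page in ["ls(1) text", "grep(1) text", "tar(1) text"]:
    total += extract_one(page)

print(total.chunks, total.plain_text_len)  # → 3 33
```

The field loop is one way to make "adding a stat = one new field" hold without editing `__iadd__`. The trade-off is that it sums every field, including any future non-additive one. The explicit per-field version in `ExtractionStats` avoids that risk at the cost of one extra line per stat.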