@dataclass
class PreparedFile:
    """Result of LLMExtractor.prepare(): everything needed for extraction.

    Fields
    ------
    synopsis:        Synopsis line from lexgrog, or None.
    aliases:         (name, score) tuples for alternative command names.
    original_lines:  1-indexed line number → original line content (before
                     numbering). Used by finalize to map LLM line references
                     back to source text.
    basename:        Manpage file stem without .gz/.section suffixes (e.g. "tar").
    numbered_text:   Full manpage text with " 42| …" line-number prefixes,
                     used for debug dumps.
    plain_text_len:  Length of the original plain text before filtering/chunking.
    plain_text:      Original unfiltered manpage text (used for RawManpage storage).
    requests:        Pre-formatted user-content strings, one per chunk, ready to
                     submit to the LLM provider.
    n_chunks:        Derived property: ``len(requests)``.
    """

    synopsis: str | None
    aliases: list[tuple[str, int]]
    original_lines: dict[int, str]
    basename: str
    numbered_text: str
    plain_text_len: int
    plain_text: str
    requests: list[str]

    @property
    def n_chunks(self) -> int:
        return len(self.requests)


@runtime_checkable