

class IOSPhoneAgent:
    """
    AI-powered agent for automating iOS phone interactions.

    The agent uses a vision-language model to understand screen content
    and decide on actions to complete user tasks via WebDriverAgent.

    Args:
        model_config: Configuration for the AI model.
        agent_config: Configuration for the iOS agent behavior.
        confirmation_callback: Optional callback for sensitive action confirmation.
        takeover_callback: Optional callback for takeover requests.

    Example:
        >>> from phone_agent.agent_ios import IOSPhoneAgent, IOSAgentConfig
        >>> from phone_agent.model import ModelConfig
        >>>
        >>> model_config = ModelConfig(base_url="http://localhost:8000/v1")
        >>> agent_config = IOSAgentConfig(wda_url="http://localhost:8100")
        >>> agent = IOSPhoneAgent(model_config, agent_config)
        >>> agent.run("Open Safari and search for Apple")
    """

    def __init__(
        self,
        model_config: ModelConfig | None = None,
        agent_config: IOSAgentConfig | None = None,
        confirmation_callback: Callable[[str], bool] | None = None,
        takeover_callback: Callable[[str], None] | None = None,
    ):
        self.model_config = model_config or ModelConfig()
        self.agent_config = agent_config or IOSAgentConfig()

        self.model_client = ModelClient(self.model_config)

        # Initialize the WDA (WebDriverAgent) connection
        self.wda_connection = XCTestConnection(wda_url=self.agent_config.wda_url)

        # Auto-create a session if none was provided in the config
        if self.agent_config.session_id is None:
            success, session_id = self.wda_connection.start_wda_session()
            if success and session_id != "session_started":
                self.agent_config.session_id = session_id
                if self.agent_config.verbose:
                    print(f"✅ Created WDA session: {session_id}")
            elif self.agent_config.verbose:
                print("⚠️ Using default WDA session (no explicit session ID)")

        self.action_handler = IOSActionHandler(
            wda_url=self.agent_config.wda_url,
            session_id=self.agent_config.session_id,
            confirmation_callback=confirmation_callback,
            takeover_callback=takeover_callback,
        )

        self._context: list[dict[str, Any]] = []
        self._step_count = 0

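
The constructor accepts two optional callbacks typed as `Callable[[str], bool]` and `Callable[[str], None]`. The following is a minimal sketch of what such callbacks might look like. The factory names `make_confirmation_callback` and `make_takeover_callback` are hypothetical and not part of the library, and the sketch assumes that the string argument is a human-readable description of the request and that returning `True` means the action may proceed; neither assumption is confirmed by the source.

```python
from typing import Callable


def make_confirmation_callback(
    ask: Callable[[str], str] = input,
) -> Callable[[str], bool]:
    """Build a callback matching Callable[[str], bool] (hypothetical helper)."""

    def confirm(message: str) -> bool:
        # Assumption: True approves the action; anything but y/yes declines.
        answer = ask(f"Sensitive action requested: {message}\nProceed? [y/N] ")
        return answer.strip().lower() in {"y", "yes"}

    return confirm


def make_takeover_callback(
    wait: Callable[[str], str] = input,
) -> Callable[[str], None]:
    """Build a callback matching Callable[[str], None] (hypothetical helper)."""

    def takeover(message: str) -> None:
        # Block until the user signals that the manual step is finished.
        wait(f"Manual takeover requested: {message}\nPress Enter when done... ")

    return takeover


# Possible wiring, reusing the names from the docstring example:
# agent = IOSPhoneAgent(
#     model_config,
#     agent_config,
#     confirmation_callback=make_confirmation_callback(),
#     takeover_callback=make_takeover_callback(),
# )
```

Passing the prompt function in as a parameter keeps the callbacks usable without a terminal, for example in tests or when the prompt is routed through a GUI.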