OpenAI vs Google: One methodology, two flagship AI models, and their best stock picks
We used GPT-5.1 and Gemini-3 Pro to run an autonomous Deep Value research workflow on U.S. equities.
From GPT-5.1 Maximus 1.0:
📶BUY-$TWO (Two Harbors Investment Corp) and hold till December 1st, 2026.
From Gemini-3-Pro Maximus 1.0:
📶BUY-$Q (Qnity Electronics) and hold till December 1st, 2026.
We deployed the newly released GPT-5.1 (rolled out November 12, 2025) and Gemini-3 Pro (released November 18, 2025) to test the upper limits of autonomous financial analysis. Both introduce advances aimed at complex problem-solving: GPT-5.1 features adaptive reasoning designed to support extended agentic workflows, while Gemini-3 Pro uses a Deep Think-style architecture with expanded multimodal context windows.
Both models also moved straight into the top tier of public AI benchmarks. On Humanity’s Last Exam, Gemini-3 Pro and GPT-5.1 currently score roughly 38.3% and 27.2% accuracy, respectively; about a year earlier, the best reported system on the same test scored only 8.8%. Against this backdrop of rapidly rising benchmark performance, we tasked both models with executing a structured deep value research methodology to identify unpriced market opportunities. This report details their independent findings and the resulting investment memos.

Methodology:
We used the following prompt in ChatGPT (5.1 Thinking) and the Gemini app (Thinking with 3 Pro) in "Deep Research" mode. The prompt turns the AI from a passive search engine into an autonomous analyst by imposing a strict, multi-stage agentic workflow. Rather than settling for surface-level headlines, it forces the Deep Research tool through an iterative funnel: first casting a wide net for "information asymmetry" (specific news events the market has not yet priced in), then filtering candidates against hard financial constraints (such as cash burn and price divergence), and finally conducting a "pre-mortem" on the sole survivor. It is designed to exploit the Deep Research tool's ability to browse sequentially and synthesize complex data, which guards against the common AI failure mode of reciting generic, "priced-in" market consensus.
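To make the shape of that funnel concrete, here is a minimal Python sketch of the three stages. It is an illustration only, not the prompt or the models' internal process: the `Candidate` fields, the function names, the numeric cutoffs, and the tie-break rule are all hypothetical, and "price divergence" is assumed here to mean that the share price has not yet moved much since the news event. The tickers and figures are placeholders, not real market data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    ticker: str
    catalyst: str                  # news event believed to be unpriced
    cash_runway_months: float      # months of cash left at the current burn rate
    price_move_since_news: float   # fractional price change since the catalyst


# Hypothetical cutoffs for illustration; the article gives no numeric thresholds.
MIN_RUNWAY_MONTHS = 12.0
MAX_PRICE_MOVE = 0.05


def passes_hard_constraints(c: Candidate) -> bool:
    """Stage 2: reject names that burn cash too fast or have already repriced."""
    has_runway = c.cash_runway_months >= MIN_RUNWAY_MONTHS
    still_unpriced = abs(c.price_move_since_news) <= MAX_PRICE_MOVE
    return has_runway and still_unpriced


def run_funnel(wide_net: List[Candidate]) -> Optional[Candidate]:
    """Stages 1-2: start from a wide net and narrow it to one survivor, if any."""
    survivors = [c for c in wide_net if passes_hard_constraints(c)]
    if not survivors:
        return None
    # Assumed tie-break: keep the name whose price has reacted least to the news.
    return min(survivors, key=lambda c: abs(c.price_move_since_news))


def pre_mortem(survivor: Candidate, failure_modes: List[str]) -> str:
    """Stage 3: list the ways the thesis could fail before committing to it."""
    lines = [f"Pre-mortem for {survivor.ticker} ({survivor.catalyst}):"]
    lines += [f"- {reason}" for reason in failure_modes]
    return "\n".join(lines)


# Placeholder tickers and numbers, not real market data.
wide_net = [
    Candidate("AAA", "contract win", 30.0, 0.02),
    Candidate("BBB", "asset sale", 6.0, 0.01),      # fails: short cash runway
    Candidate("CCC", "spin-off", 24.0, 0.20),       # fails: already repriced
]
pick = run_funnel(wide_net)
if pick is not None:
    print(pre_mortem(pick, ["catalyst is delayed", "financing dilutes holders"]))
```

In the actual workflow each stage is carried out by the Deep Research tool through browsing and synthesis; the sketch only shows how the wide net narrows to a single name before the pre-mortem is written.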