Test the same reasoning prompts across different AI models, compare their chain-of-thought transparency, and generate comparative analysis reports for model selection.
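The workflow above can be sketched as a small harness. This is a minimal illustration under stated assumptions, not a definitive implementation: `terse_model` and `verbose_model` are hypothetical stubs standing in for real model clients, and "transparency" is approximated by a crude proxy (the number of visible reasoning lines in a response) that a real evaluation would replace with a more careful rubric.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in type for a model client: takes a prompt, returns response text.
ModelFn = Callable[[str], str]


def terse_model(prompt: str) -> str:
    # Stub (assumption): a model that returns only a final answer.
    return "Answer: 4"


def verbose_model(prompt: str) -> str:
    # Stub (assumption): a model that shows its reasoning before answering.
    return (
        "Step 1: The question asks for a sum.\n"
        "Step 2: The sum is 4.\n"
        "Answer: 4"
    )


def count_reasoning_steps(response: str) -> int:
    """Crude transparency proxy (an assumption): count the non-empty
    lines that are not the 'Answer:' line."""
    lines = [ln.strip() for ln in response.splitlines() if ln.strip()]
    return sum(1 for ln in lines if not ln.lower().startswith("answer:"))


def compare_models(models: Dict[str, ModelFn], prompts: List[str]) -> Dict[str, float]:
    """Send every prompt to every model; return mean reasoning steps per model."""
    scores: Dict[str, float] = {}
    for name, fn in models.items():
        steps = [count_reasoning_steps(fn(p)) for p in prompts]
        scores[name] = sum(steps) / len(steps) if steps else 0.0
    return scores


def render_report(scores: Dict[str, float]) -> str:
    """Render a plain-text comparison, most transparent model first."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    lines = ["model | mean visible reasoning steps"]
    lines += [f"{name} | {score:.2f}" for name, score in ranked]
    return "\n".join(lines)


if __name__ == "__main__":
    models = {"terse": terse_model, "verbose": verbose_model}
    prompts = ["What is 2 + 2?", "What is 3 + 1?"]
    print(render_report(compare_models(models, prompts)))
```

Keeping each model behind a plain callable means the same prompt list and scoring function apply to every model unchanged, so differences in the report reflect the models rather than the harness.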