A PHP Error was encountered

Severity: 8192

Message: Return type of CI_Session_files_driver::open($save_path, $name) should either be compatible with SessionHandlerInterface::open(string $path, string $name): bool, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice

Filename: drivers/Session_files_driver.php

Line Number: 132

A PHP Error was encountered

Severity: 8192

Message: Return type of CI_Session_files_driver::close() should either be compatible with SessionHandlerInterface::close(): bool, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice

Filename: drivers/Session_files_driver.php

Line Number: 290

A PHP Error was encountered

Severity: 8192

Message: Return type of CI_Session_files_driver::read($session_id) should either be compatible with SessionHandlerInterface::read(string $id): string|false, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice

Filename: drivers/Session_files_driver.php

Line Number: 164

A PHP Error was encountered

Severity: 8192

Message: Return type of CI_Session_files_driver::write($session_id, $session_data) should either be compatible with SessionHandlerInterface::write(string $id, string $data): bool, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice

Filename: drivers/Session_files_driver.php

Line Number: 233

A PHP Error was encountered

Severity: 8192

Message: Return type of CI_Session_files_driver::destroy($session_id) should either be compatible with SessionHandlerInterface::destroy(string $id): bool, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice

Filename: drivers/Session_files_driver.php

Line Number: 313

A PHP Error was encountered

Severity: 8192

Message: Return type of CI_Session_files_driver::gc($maxlifetime) should either be compatible with SessionHandlerInterface::gc(int $max_lifetime): int|false, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice

Filename: drivers/Session_files_driver.php

Line Number: 354

News Swiftly

Apple study exposes deep cracks in LLMs? ?reasoning? capabilities

For a while now, companies like OpenAI and Google have been touting advanced "reasoning" capabilities as the next big step in their latest artificial intelligence models. Now, though, a new study from six Apple engineers shows that the mathematical "reasoning" displayed by advanced large language models can be extremely brittle and unreliable in the face of seemingly trivial changes to common benchmark problems.

The fragility highlighted in these new results helps support previous research suggesting that LLMs use of probabilistic pattern matching is missing the formal understanding of underlying concepts needed for truly reliable mathematical reasoning capabilities. "Current LLMs are not capable of genuine logical reasoning," the researchers hypothesize based on these results. "Instead, they attempt to replicate the reasoning steps observed in their training data."

Mix it up

In "GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models"?currently available as a pre-print paper?the six Apple researchers start with GSM8K's standardized set of over 8,000 grade-school level mathematical word problems, which is often used as a benchmark for modern LLMs' complex reasoning capabilities. They then take the novel approach of modifying a portion of that testing set to dynamically replace certain names and numbers with new values?so a question about Sophie getting 31 building blocks for her nephew in GSM8K could become a question about Bill getting 19 building blocks for his brother in the new GSM-Symbolic evaluation.

This approach helps avoid any potential "data contamination" that can result from the static GSM8K questions being fed directly into an AI model's training data. At the same time, these incidental changes don't alter the actual difficulty of the inherent mathematical reasoning at all, meaning models should theoretically perform just as well when tested on GSM-Symbolic as GSM8K.

Credit: Apple Research Simply changing specific names and numbers found in GSM8K tests led to significant decreases in performance in many models. Credit: Apple Research Simply changing specific names and numbers found in GSM8K tests led to significant decreases in performance in many models.

Instead, when the researchers tested more than 20 state-of-the-art LLMs on GSM-Symbolic, they found average accuracy reduced across the board compared to GSM8K, with performance drops between 0.3 percent and 9.2 percent, depending on the model. The results also showed high variance across 50 separate runs of GSM-Symbolic with different names and values. Gaps of up to 15 percent accuracy between the best and worst runs were common within a single model and, for some reason, changing the numbers tended to result in worse accuracy than changing the names.