JC-AI Newsletter #5

Author: Miro Wengner

Originally published on Foojay.


Fourteen days have passed, and it is time to present a fresh collection of readings covering developments that could influence the field of artificial intelligence.

Beyond opinion pieces and Java-focused tutorials that can enhance your understanding of AI applications, this newsletter concentrates on LLM benchmarking methodologies designed to ensure model accuracy and competency in handling complex contextual information.

The world influenced by LLMs is changing very quickly, so let’s start…

article: Generating and editing images with Nano Banana in Java
authors: Guillaume Laforge
date: 2025-09-09
desc.: Learn how to create and edit images with Google’s latest “Nano Banana” model (also known as Gemini 2.5 Flash Image) from the Java language (a minimal sketch follows this entry).
category: tutorial
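
To give a flavour of what the tutorial walks through, below is a minimal sketch of calling a Gemini image model from Java. It assumes the Google Gen AI Java SDK (com.google.genai) and uses a placeholder model id and prompt; the exact dependency coordinates, model name, and the code for extracting the returned image bytes are covered in Guillaume’s article.

```java
import com.google.genai.Client;
import com.google.genai.types.GenerateContentResponse;

public class NanoBananaSketch {

    public static void main(String[] args) {
        // Assumes the Google Gen AI Java SDK; by default the client reads the
        // GOOGLE_API_KEY environment variable (or Vertex AI settings).
        Client client = new Client();

        // Model id and prompt are placeholders -- follow the tutorial for the exact values.
        GenerateContentResponse response = client.models.generateContent(
                "gemini-2.5-flash-image-preview",
                "Generate a picture of a banana wearing sunglasses on a beach",
                null);

        // The generated image comes back as inline data inside the response parts;
        // extracting and saving those bytes is shown in the tutorial.
        System.out.println(response);
    }
}
```

Editing works along the same lines, with the source image passed alongside the text prompt in the request.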

article: Generating videos in Java with Veo 3
authors: Guillaume Laforge
date: 2025-09-10
desc.: Veo 3 allows users to create 8-second videos, either from a prompt or from an image that serves as a starting point. And of course, it’s possible to use that model from Java.
category: tutorial

article: Stochastic AI Agility: Breaking Cycles of Debt
authors: Miro Wengner
date: 2025-09-10
desc.: This article tackles challenges related to commonly used project management methodologies (agile, scrum, kanban, waterfall, etc.).
category: opinion

article: Conversation: LLMs and Building Abstractions
authors: Unmesh Joshi, Martin Fowler
date: 2025-08-26
desc.: This article discusses the importance of creating a good project vocabulary as a means of identifying fitting abstractions. This approach leverages LLMs as valuable brainstorming instruments to identify overlooked details, rather than depending on LLM-generated implementation code and assuming its accuracy.
category: opinion

article: Some thoughts on LLMs and Software Development
authors: Martin Fowler
date: 2025-08-28
desc.: This article examines the role of LLMs in development processes, highlighting both their potential contributions and the importance of maintaining realistic expectations about their capabilities.
category: opinion

article: The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs
authors: Akshit Sinha, Arvindh Arun, Shashwat Goel, Steffen Staab, Jonas Geiping
date: 2025-09-11
desc.: Considerable attention has been devoted to LLM planning capabilities, but execution remains an understudied challenge, despite its importance as LLMs are increasingly deployed for extended reasoning and agentic tasks. Although failures in extended tasks are often attributed to the accumulation of minor errors, this research provides an in-depth examination of the phenomenon.
category: research

article: CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models
authors: Runpeng Dai, Linfeng Song, Haolin Liu, Zhenwen Liang, Dian Yu, Haitao Mi, Zhaopeng Tu, Rui Liu, Tong Zheng, Hongtu Zhu, Dong Yu
date: 2025-09-11
desc.: Large Language Models (LLMs) have demonstrated remarkable reasoning capabilities across diverse domains. This paper introduces a Curiosity-Driven Exploration method that incorporates a novel curiosity model to enhance reinforcement learning in LLMs. It tackles challenges in the reward design for curiosity signals derived from the actor and the critic, and discusses their stability during training.
category: research

article: LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering
authors: Jielin Qiu, Zuxin Liu, Zhiwei Liu, Rithesh Murthy, Jianguo Zhang, Haolin Chen and others
date: 2025-09-11
desc.: The emergence of long-context language models with context windows extending to millions of tokens has created new opportunities for sophisticated code understanding and software development evaluation. This paper proposes LoCoBench, a comprehensive benchmark designed to evaluate long-context LLMs in complex software development scenarios.
category: research

article: Towards Adaptive ML Benchmarks: Web-Agent-Driven Construction, Domain Expansion, and Metric Optimization
authors: Hangyi Jia, Yuxi Qian, Hanwen Tong, Xinhui Wu, Lin Chen, Feng Wei
date: 2025-09-11
desc.: Recent advances in large language models (LLMs) have enabled the emergence of general-purpose agents capable of automating end-to-end machine learning (ML) workflows, including data analysis, training, and competition solving. However, existing benchmarks remain limited in many ways. This paper presents TAM Bench, a structured, diverse, and realistic benchmark for evaluating LLM-based agents. TAM Bench proposes using LLMs, browsers, and the Model Context Protocol (MCP) for fully automated benchmark construction. The paper addresses both the achievements and difficulties of such an approach.
category: research

article: PATENTWRITER: A Benchmarking Study for Patent Drafting with LLMs
authors: Homaira Huda Shomee, Suman Kalyan Maity, Sourav Medya
date: 2025-07-30
desc.: The paper presents PATENTWRITER, the first unified benchmarking framework for evaluating LLMs in patent abstract generation. PATENTWRITER uses standard natural language processing metrics (BLEU, ROUGE, BERTScore). The experiments highlight LLM capabilities, often surpassing domain-specific baselines. The article raises ethical considerations alongside these achievements.
category: research

article: BEnchmarking LLMs for Ophthalmology (BELO) for Ophthalmological Knowledge and Reasoning
authors: Sahana Srinivasan, Xuguang Ai, Thaddaeus Wai Soon Lo, Aidan Gilson and others
date: 2025-07-21
desc.: The paper introduces the BELO benchmark, which employs keyword matching, a fine-tuned PubMedBERT model, and expert review for evaluation. The study assessed six large language models using multiple text-generation metrics (ROUGE-L, BERTScore, BARTScore, METEOR, and AlignScore) alongside human evaluation to determine accuracy. The results revealed suboptimal performance and highlighted the need for improvements in clinical reasoning capabilities.
category: research

article: You Can Build Better AI Agents in Java Than Python
authors: Rod Johnson
date: 2025-08-18
desc.: This tutorial presents the novel Embabel agent framework. The implementation example of an agentic book writer demonstrates multiple advantages of the JVM and the Java language over Python alternatives, including type safety and streamlined application development (a short illustrative sketch follows this entry).
category: tutorial
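
As a taste of the programming model, here is a minimal, hypothetical sketch in the spirit of Embabel’s annotation-driven agents. The record types, prompts, and the exact fluent calls below are illustrative assumptions based on the framework’s public examples (and may differ between Embabel versions); they are not code taken from the article.

```java
import com.embabel.agent.api.annotation.AchievesGoal;
import com.embabel.agent.api.annotation.Action;
import com.embabel.agent.api.annotation.Agent;
import com.embabel.agent.api.common.OperationContext;

// Hypothetical domain types used only for this sketch.
record Outline(String title, String chapterSummary) {}
record Chapter(String text) {}

// An agent is a plain class whose annotated actions the framework plans over.
@Agent(description = "Writes a short book chapter from a user-supplied topic")
public class BookWriterAgent {

    @Action
    Outline planOutline(String topic, OperationContext context) {
        // Ask the configured LLM for a structured outline, mapped to the record above.
        return context.ai()
                .withDefaultLlm()
                .createObject("Outline a single book chapter about: " + topic, Outline.class);
    }

    @AchievesGoal(description = "A drafted chapter based on the outline")
    @Action
    Chapter draftChapter(Outline outline, OperationContext context) {
        return context.ai()
                .withDefaultLlm()
                .createObject("Write the chapter described by: " + outline.chapterSummary(),
                        Chapter.class);
    }
}
```

The type-safe records double as the data contract between actions, which is the kind of JVM advantage the article argues for.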

article: Research: Measuring Energy Consumption in Programming Languages for AI Applications
authors: Miro Wengner
date: 2025-09-15
desc.: This article presents the research paper ‘Measuring Energy Consumption in Programming Languages for AI Applications,’ which analyzes energy consumption across programming languages used for agentic AI system interactions and computationally intensive applications. The study evaluates Java platform performance and energy efficiency in these contexts, providing development recommendations and hardware selection guidance supported by statistical analysis.
category: research

Previous:
Newsletter vol. 1
Newsletter vol. 2
Newsletter vol. 3
Newsletter vol. 4
