AI Agent 2026: From Tools to Colleagues
With the release of Claude 4 and GPT-5, AI Agents are evolving from simple task executors to true intelligent colleagues. This article explores the impact on software engineering.
Google Gemini 2 Pro is officially released, excelling in multimodal understanding and long context. This article compares its performance with GPT-5 and Claude 4.
OpenAI's o3 model reaches new heights in mathematical reasoning, but common-sense reasoning still falls short. This article analyzes the boundaries of current LLM reasoning capabilities.
AI coding tools like Cursor and Windsurf are transforming software development workflows. This article shares my experience using AI coding assistants in real projects.
DeepSeek R2 has gained attention for its exceptional cost-performance ratio. This article analyzes its MoE architecture, training strategies, and performance in real-world scenarios.
Breakthroughs in image understanding by GPT-5o and Gemini Ultra are enabling new multimodal applications. This article explores multimodal LLMs in e-commerce and content creation.
As model capabilities grow, safety alignment becomes increasingly important. This article discusses Constitutional AI, RLAIF, and other new approaches.
With Apple Intelligence and Android Gemini advancing, on-device LLMs are becoming mainstream. This article shares experience deploying LLMs on resource-constrained devices.
The EU AI Act is fully implemented, and US AI executive orders continue to evolve. This article analyzes AI regulatory trends across major economies.
Traditional RAG faces challenges in context length and retrieval accuracy. This article introduces GraphRAG, Self-RAG, and their applications in enterprise knowledge bases.
The 2025 token price war significantly reduced LLM API costs. This article analyzes the evolution of LLM business models and future monetization paths.
Google TPU v6, AMD MI350, and Amazon Trainium3 have been released. This article compares the advantages and trade-offs of different AI chips.
OpenAI's o1 and o3 series proved the effectiveness of test-time compute scaling. This article reviews the development of this technical approach.
OpenAI's 12 consecutive days of releases brought o3, official Sora, and more. This article analyzes the impact of each announcement on the industry.
Google released the Willow quantum chip with breakthroughs in quantum error correction. This article explores the potential impact of quantum computing on AI.
Claude 3.5 Sonnet excels in coding and reasoning tasks. This article analyzes its technical features and compares it with GPT-4o.
OpenAI officially released the Sora video generation model. This article tests its generation quality, controllability, and impact on creative industries.
Gemini 1.5 Pro supports 2M token context, Claude 3.5 supports 200K. This article explores long context technology and applications.
AI continues to break through in materials discovery, drug design, and mathematical proofs. This article reviews 2025's major advances in AI for Science.
NVIDIA released the Blackwell architecture with FP4 precision and 2nd gen Transformer Engine. This article analyzes its impact on large model training.
Llama 3.2 adds vision capabilities and on-device deployment. This article reviews its performance and explores open source model business models.
iOS 18.1 brings Apple Intelligence features. This article tests the experience on iPhone and Mac, analyzing the pros and cons of on-device AI.
The 2024 Nobel Prizes in Physics and Chemistry were awarded to AI-related research. This article discusses the significance of this milestone.
GPT-4o's native audio capabilities make voice interaction more natural. This article tests its performance in real-time conversation and emotional expression.
After the AutoGPT hype, a new generation of more practical Agent frameworks has emerged. This article analyzes the evolution of this field.
DoRA, PiSSA, and other new fine-tuning methods surpass LoRA in efficiency and effectiveness. This article compares various fine-tuning techniques.
OpenAI released o1-preview, enhancing reasoning capabilities through reinforcement learning. This article tests its performance on math, coding, and science problems.
Mistral released Large 2, excelling in multilingual and coding capabilities. This article reviews it and compares it with Llama 3.1.
As model scales grow, AI infrastructure faces new challenges. This article discusses training cluster optimization and inference service architecture.
Hallucination remains a major issue for LLMs. This article reviews retrieval augmentation, fact-checking, and confidence estimation methods.
This is my first blog post. Here, I will share technical articles about AI, machine learning, and software engineering, as well as personal learning insights.