Exploring Tech & Thoughts

Sharing insights on AI, Machine Learning, and Software Engineering

All · AI · LLM · Inference · Paper Review · Essay

Latest Posts

Multimodal AI, Vision-Language Models, Document Processing, Inference Optimization

Multimodal Document Intelligence: Preprocessing and Inference Optimization Strategies

A deep dive into how Vision-Language Models process documents, covering preprocessing pipelines, vision token compression, quantization strategies, and production deployment optimization for multimodal AI systems.

2026-05-21 · 15 min

vLLM, Quantization, Inference Optimization, LLM Deployment

Deep Dive into vLLM Quantization: GGUF, AWQ, GPTQ and Beyond

A comprehensive technical analysis of quantization methods in vLLM, including GGUF, AWQ, GPTQ, and emerging FP8 quantization. Learn how to optimize LLM inference for production deployment.

2025-05-05 · 12 min

Anthropic, Claude, Performance Issues

Claude Opus 4.7: A Serious Performance Regression?

Claude Opus 4.7, released by Anthropic on April 16, 2025, has been met with widespread user complaints about performance degradation, hallucinations, and increased costs. This article analyzes the reported issues and their implications for AI reliability.

2025-04-19 · 10 min

LLM, OpenAI, Meta, Google, Industry News

LLM Landscape in April 2025: GPT-4o Takes Over, Meta Releases Llama 4

April 2025 marks a pivotal month in the large language model industry. OpenAI begins phasing out GPT-4 in favor of GPT-4o, Meta launches the highly anticipated Llama 4 family, and Google pushes forward with Gemini 2.5.

2025-04-10 · 8 min

Breaking News, Security, Anthropic

Claude Code Source Code Leak: Anatomy of a $2.5 Billion Mistake

On March 31, 2026, Anthropic's Claude Code suffered a catastrophic source code leak. Approximately 512,000 lines of TypeScript code were exposed. This article examines the incident's timeline, technical details, and far-reaching implications.

2026-03-31 · 15 min

AI Agent, Trends, Claude

AI Agent 2026: From Tools to Colleagues

With the release of Claude 4 and GPT-5, AI Agents are evolving from simple task executors to true intelligent colleagues. This article explores the impact on software engineering.

2026-03-25 · 7 min

Gemini, Review, Multimodal

Gemini 2 Pro Deep Dive: Google's Comeback

Google Gemini 2 Pro is officially released, excelling in multimodal understanding and long context. This article compares its performance with GPT-5 and Claude 4.

2026-03-18 · 8 min

Reasoning, o3, OpenAI

LLM Reasoning: Breakthroughs and Limitations

OpenAI's o3 model reaches new heights in mathematical reasoning, but common-sense reasoning still falls short. This article analyzes the boundaries of current LLM reasoning capabilities.

2026-03-10 · 6 min

AI Coding, Cursor, DevTools

AI Coding Assistants 2026: From Copilot to Architect

AI coding tools like Cursor and Windsurf are transforming software development workflows. This article shares my experience using AI coding assistants in real projects.

2026-03-05 · 9 min

DeepSeek, MoE, Chinese LLM

DeepSeek R2: The Rise of Chinese LLMs

DeepSeek R2 has gained attention for its exceptional cost-performance ratio. This article analyzes its MoE architecture, training strategies, and performance in real-world scenarios.

2026-02-28 · 10 min

Multimodal, GPT-5o, Applications

Multimodal LLMs: Vision and Language Fusion

Breakthroughs in image understanding by GPT-5o and Gemini Ultra are enabling new multimodal applications. This article explores multimodal LLMs in e-commerce and content creation.

2026-02-20 · 7 min

Safety, Alignment, RLHF

LLM Safety Alignment: Beyond RLHF

As model capabilities grow, safety alignment becomes increasingly important. This article discusses Constitutional AI, RLAIF, and other new approaches.

2026-02-14 · 8 min

Edge Computing, Mobile, Deployment

Edge LLM Deployment Practices

With Apple Intelligence and Android Gemini advancing, on-device LLMs are becoming mainstream. This article shares experience deploying LLMs on resource-constrained devices.

2026-02-08 · 9 min

Policy, Regulation, AI Act

AI Regulation 2026: Global Policy Trends

The EU AI Act is fully implemented, and US AI executive orders continue to evolve. This article analyzes AI regulatory trends across major economies.

2026-02-01 · 6 min

RAG, Knowledge Graph, Retrieval

RAG Evolution: From Vector Search to Knowledge Graphs

Traditional RAG faces challenges in context length and retrieval accuracy. This article introduces GraphRAG, Self-RAG, and their applications in enterprise knowledge bases.

2026-01-25 · 8 min

Business Model, Cost, API

LLM Economics: After the Token Price War

The 2025 token price war significantly reduced LLM API costs. This article analyzes the evolution of LLM business models and future monetization paths.

2026-01-18 · 7 min

AI Chips, TPU, Hardware

AI Chips 2026: Beyond NVIDIA

Google TPU v6, AMD MI350, and Amazon Trainium3 have been released. This article compares the advantages and trade-offs of different AI chips.

2026-01-12 · 9 min

Test-Time Scaling, o1, Trends

Test-Time Scaling: The Most Important Trend of 2025

OpenAI's o1 and o3 series proved the effectiveness of test-time compute scaling. This article reviews the development of this technical approach.

2025-12-28 · 8 min

OpenAI, o3, Sora

OpenAI 12 Days of Shipmas: A Review

OpenAI's 12 consecutive days of releases brought o3, official Sora, and more. This article analyzes the impact of each announcement on the industry.

2025-12-20 · 10 min

Quantum, Willow, Google

Google Willow: AI Meets Quantum Computing

Google released the Willow quantum chip with breakthroughs in quantum error correction. This article explores the potential impact of quantum computing on AI.

2025-12-15 · 7 min

Claude, Anthropic, Review

Claude 3.5 Sonnet: Anthropic's Steady Progress

Claude 3.5 Sonnet excels in coding and reasoning tasks. This article analyzes its technical features and compares it with GPT-4o.

2025-12-08 · 8 min

Sora, Video Generation, OpenAI

Sora Official Release: Video Generation Enters a New Era

OpenAI officially released the Sora video generation model. This article tests its generation quality, controllability, and impact on creative industries.

2025-12-02 · 9 min

Long Context, Gemini, Claude

Long Context LLMs: From 128K to Infinity

Gemini 1.5 Pro supports a 2M-token context window, while Claude 3.5 supports 200K tokens. This article explores long-context techniques and their applications.

2025-11-25 · 7 min

AI4Science, AlphaFold, Research

AI for Science: Beyond AlphaFold

AI continues to break through in materials discovery, drug design, and mathematical proofs. This article reviews 2025's major advances in AI for Science.

2025-11-18 · 8 min

NVIDIA, Blackwell, GPU

NVIDIA Blackwell: The New Standard for AI Training

NVIDIA released the Blackwell architecture with FP4 precision and a second-generation Transformer Engine. This article analyzes its impact on large model training.

2025-11-12 · 9 min

Llama, Meta, Open Source

Meta Llama 3.2: Open Source Multimodal Choice

Llama 3.2 adds vision capabilities and on-device deployment. This article reviews its performance and explores business models for open-source models.

2025-11-05 · 8 min

Apple, On-Device AI, Privacy

Apple Intelligence: Privacy-First AI

iOS 18.1 brings Apple Intelligence features. This article tests the experience on iPhone and Mac, analyzing the pros and cons of on-device AI.

2025-10-28 · 7 min

Nobel Prize, Hinton, AlphaFold

Nobel Prize and AI: Lessons from Hinton and AlphaFold

The 2024 Nobel Prizes in Physics and Chemistry were awarded to AI-related research. This article discusses the significance of this milestone.

2025-10-20 · 6 min

GPT-4o, Voice, Multimodal

GPT-4o Native Audio: New Heights in Voice Interaction

GPT-4o's native audio capabilities make voice interaction more natural. This article tests its performance in real-time conversation and emotional expression.

2025-10-15 · 7 min

AI Agent, AutoGPT, Automation

Autonomous AI Agents: From AutoGPT to Practical Use

After the AutoGPT hype, a new generation of more practical Agent frameworks has emerged. This article analyzes the evolution of this field.

2025-10-08 · 8 min

Fine-tuning, LoRA, DoRA

LLM Fine-tuning 2025: New Methods Beyond LoRA

DoRA, PiSSA and other new fine-tuning methods surpass LoRA in efficiency and effectiveness. This article compares various fine-tuning techniques.

2025-09-28 · 9 min

o1, Reasoning, OpenAI

OpenAI o1-preview: The Debut of Reasoning Models

OpenAI released o1-preview, enhancing reasoning capabilities through reinforcement learning. This article tests its performance on math, coding, and science problems.

2025-09-20 · 8 min

Mistral, Europe, Review

Mistral Large 2: Europe's AI Pride

Mistral released Large 2, excelling in multilingual and coding capabilities. This article reviews and compares it with Llama 3.1.

2025-09-12 · 7 min

Infrastructure, Systems, Optimization

AI Infrastructure 2025: From Training to Inference Optimization

As model scales grow, AI infrastructure faces new challenges. This article discusses training cluster optimization and inference service architecture.

2025-09-05 · 10 min

Hallucination, RAG, Reliability

LLM Hallucination Mitigation: Progress and Limitations

Hallucination remains a major issue for LLMs. This article reviews retrieval augmentation, fact-checking, and confidence estimation methods.

2025-08-28 · 8 min

Essay, Introduction

Hello World - Welcome to My Blog

This is my first blog post. Here, I will share technical articles about AI, machine learning, and software engineering, as well as personal learning insights.

2025-03-28 · 2 min
