Monday, September 29, 2025

Building a Claude Code Plugin for NetBeans: An Early Look

I've started working on a personal project to integrate Claude Code directly into the NetBeans IDE. It's still in the early stages, but there's enough progress to share a look at how it's taking shape.

As someone who contributed to NetBeans in the past (starting back in the Sun Microsystems days, through the early Apache Software Foundation years, and with my old OpenBeans distribution), it's interesting to be tinkering with the platform again in this new context of AI-assisted development.

Current Progress

The basic integration is working. The plugin can now:

  • Respond to most tool calls
  • Track and send your current code selection, updating Claude when you change it

This provides the foundation for contextual conversations about your code. Most of the core MCP (Model Context Protocol) tools are implemented, allowing Claude to interact meaningfully with the project workspace.

Interestingly, I'm building much of this with Claude Code itself. While I handle the architecture and inevitably fix things when they go off track, it's been a practical test of using the tool to build its own integration.

Current Focus: Refining the Integration

Right now, I'm focused on the finer details of the MCP protocol implementation. The public documentation covers the concepts well, but getting all the JSON schemas precisely right for a robust integration requires some careful attention.

What makes this particularly interesting is that unlike many modern protocols, Claude Code's MCP flavour isn't fully open—and neither are the official plugins for editors like VSCode and IntelliJ.

To help with development, I created a WebSocket proxy library—ironically, with Claude's help—which has been useful for observing the data flow and debugging the communication layer.

Looking Ahead

This remains a side project, driven by personal interest in both NetBeans and AI tooling. The goal for now is to create a solid, functional plugin that I'd be comfortable using.

If you're a NetBeans user curious about AI-assisted coding, I'd be interested in your thoughts. What would make a tool like this most useful in your workflow? I'm continuing development and will share updates as there's more to show.

If you're curious about the code or want to follow along, the project is on GitHub.

Friday, September 26, 2025

Scaling AI Workloads: Using Linux FS-Cache to Serve Giant Models from Network Storage

Working with multi-gigabyte LLM and diffusion model files presents a practical challenge: local storage is fast but limited, while network storage is capacious but slow. This is especially true for compact workstations like a Mac Mini or a laptop with a small SSD, where fitting several large models is impossible.

What if you could get the best of both worlds—the speed of local storage for active models and the limitless capacity of a network share? Instead of copying files back and forth, we can use a transparent caching layer built right into the Linux kernel.

The Bottleneck: Network Latency vs. Model Size

The standard approach of mounting a network drive (NFS or Samba/CIFS) containing your model repository solves the storage problem but introduces a performance penalty. Loading a 10GB model takes roughly 80 seconds over a gigabit link, and even faster links rarely keep up with a local NVMe SSD. This slows down experimentation, hinders rapid model switching, and creates a frustrating development cycle.

The solution isn't to fight the network but to smartly use local storage as a massive read-cache for the remote filesystem. Enter FS-Cache.

FS-Cache: The Hidden Gem for Accelerating Network Filesystems

Linux has long included a powerful, filesystem-agnostic caching layer called FS-Cache. Originally designed for environments like NFS, its utility for AI workloads is profound. The concept is simple:

  1. The first time a model file is read from the network, it is silently stored in a designated cache on a local disk (ideally an SSD).
  2. Every subsequent read for that file is served directly from the local cache at drive speeds, bypassing the network entirely.

This means the second time you load "llama2-7b.Q4.gguf," it feels instantaneous.
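
To see this for yourself once the mount from Step 2 below is in place, time two consecutive reads of the same file (the path and filename are just the examples used throughout this post):

time cat /mnt/models/llama2-7b.Q4.gguf > /dev/null   # first read: pulled over the network and cached
time cat /mnt/models/llama2-7b.Q4.gguf > /dev/null   # second read: served from the local SSD cache

The second invocation should finish in a fraction of the time, limited only by your local drive.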

Implementation: A Two-Step Setup

The beauty of this system is that it requires no changes to your applications. PyTorch, TensorFlow, or any other tool that reads files will transparently benefit.

Step 1: Configure the cachefilesd Daemon

The kernel provides the caching engine, but you need a userspace manager for the cache directory. This is handled by cachefilesd.

  1. Install it: sudo apt install cachefilesd (on Debian/Ubuntu).
  2. Edit /etc/cachefilesd.conf to point dir to a directory on your fast local drive (e.g., dir /var/cache/fscache). Ensure it has enough space for your active set of models; a full example configuration follows this list.
  3. Start and enable the daemon: sudo systemctl enable --now cachefilesd.
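
For reference, here is a minimal sketch of /etc/cachefilesd.conf; the directory path is just an example and the culling thresholds shown are the stock defaults:

# Where cached file contents are stored; should live on a fast local drive
dir /var/cache/fscache
tag mycache
# Culling thresholds as percentages of free blocks on the cache partition:
# culling stops above brun, starts below bcull, and new caching halts below bstop
brun 10%
bcull 7%
bstop 3%
# The same three thresholds, but measured in free files (inodes) instead of blocks
frun 10%
fcull 7%
fstop 3%

After editing the file, restart the daemon with sudo systemctl restart cachefilesd and check sudo systemctl status cachefilesd to confirm it started cleanly.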

Step 2: Mount the Network Share with the fsc Flag

Now, mount your network share containing the models with the special fsc (filesystem cache) option.

For a CIFS/Samba share:

sudo mount -t cifs //ai-server/models /mnt/models -o username=user,password=pass,fsc

For an NFS share:

sudo mount -t nfs ai-server:/models /mnt/models -o fsc

That's it. Any file read from /mnt/models is now eligible for caching.
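
To make either mount survive a reboot, the same fsc option works in /etc/fstab. Here is a sketch using the example server and paths from above; the credentials file path is hypothetical and would contain username= and password= lines:

# CIFS/Samba variant (credentials file path is an example)
//ai-server/models  /mnt/models  cifs  credentials=/etc/cifs-credentials,fsc,_netdev  0  0
# NFS variant
ai-server:/models   /mnt/models  nfs   fsc,_netdev  0  0

The _netdev option simply tells the system to wait for the network before attempting the mount.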

Pro Tip: Pre-Warming the Cache for Instant Results

While the cache populates naturally during use, you can pre-load specific models to eliminate the first-load penalty entirely. This is perfect for preparing a model before a demo or a critical training run.

Simply read the file through the mount point:

cat /mnt/models/llama2-7b.Q4.gguf > /dev/null

This command will pull the entire model file through the kernel's FS-Cache layer, populating the local disk cache. The next time your Python script opens that file, it will be read from the local SSD at full speed.
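
To warm several models in one go, loop over the mount point, then confirm the cache is actually filling up. The *.gguf glob is just an example, and /proc/fs/fscache/stats is only present if the kernel was built with FS-Cache statistics enabled:

# Pre-warm every GGUF model in the repository
for f in /mnt/models/*.gguf; do
    cat "$f" > /dev/null
done

# Sanity checks: the cache directory should grow and the kernel counters should tick up
sudo du -sh /var/cache/fscache
cat /proc/fs/fscache/stats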

Conclusion

Using FS-Cache transforms your workflow. It allows a small, fast local disk to act as a high-speed front-end to a vast, centralized model repository on a network server. This approach isn't limited to traditional uses like serving media libraries; it's a pragmatic and powerful solution for managing the growing size of AI artifacts, making it easier to scale your development environment without upgrading every machine's storage.
