Monday, November 01, 2010

MySQL as the first NoSQL database

Yesterday I was watching a presentation at GeekMeet Timisoara about how to scale your websites. All the MySQL-related advice was as expected but, upon thinking about it, entirely against normal database mantras:

  • You should de-normalize the database (copies are easier to access and cheap)
  • You should disable transactions (i.e. use a MySQL storage engine that isn't transactional) -- see the sketch after this list
  • You should use MySQL master-slave replication (which is asynchronous!)
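
As a minimal sketch of the first two points (plain JDBC; the table and column names are hypothetical), a denormalized table created on a non-transactional storage engine looks like this:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class DenormalizedTable {
        public static void main(String[] args) throws Exception {
            Connection con = DriverManager.getConnection(
                    "jdbc:mysql://localhost/blog", "user", "password");
            Statement st = con.createStatement();
            // MyISAM is a non-transactional storage engine, and author_name is
            // a denormalized copy from the authors table, so listing posts
            // needs neither a transaction log nor a join.
            st.executeUpdate("CREATE TABLE posts ("
                    + " id INT PRIMARY KEY,"
                    + " author_id INT,"
                    + " author_name VARCHAR(100),"
                    + " body TEXT"
                    + ") ENGINE=MyISAM");
            st.close();
            con.close();
        }
    }

Every read of posts gets the author's name without a join, and writes skip transactional bookkeeping entirely -- exactly the trade-offs the presentation was advocating.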

This made me realize that MySQL is successful precisely because of all the things I discredited it for.

You see, in my book MySQL was never a real database because, initially, it wasn't transactional. For projects where I could choose the database, I picked PostgreSQL, and I always used MySQL knowing in the back of my head that, in fact, it was a bit of a sham.

But this might just be MySQL's good fortune: by providing a simple storage engine with some SQL front-end, they proved that most people don't need ACID compliance.

Furthermore, as more and more people need to scale their applications horizontally (since it's cheaper and because... Google does it), they need even less of an actual database.

NoSQL was a movement that started after people got tired of the constraints of SQL databases and started thinking about what they really need when storing data. It was liberating to see that one must not assume from the start that "external data == database" and can actually put some thought into the specific needs of their application.

And by being such a lightweight and unconstrained implementation, MySQL is right here, still serving the needs of people that want to scale out.

MySQL was basically the first NoSQL database. By relaxing what a database must provide, they proved in the long run that this is what people actually need. So besides the pure NoSQL tools like the various key-value stores they are building nowadays, MySQL could very well remain the most used place to store your data precisely because it allows you to pick which of the database-specific features you actually need.

Thursday, October 28, 2010

NetBeans Ideas

Auto update must become OS-aware

This means that on Linux auto update is entirely apt-get based (or whatever mechanism the distro has).

On OSX we might use something like Sparkle.

The NetBeans-specific auto update implementation should be just a fallback plan. Having it use BitTorrent too would be nice (see my experiment regarding this).

OS-aware notifications

The custom notification mechanism and popup should be replaced by the OS notification, if available. This means using Growl on OSX and whatever Ubuntu has nowadays.

Versioning

Git support should be part of the official release: help these guys make it happen!

Mercurial Queues and 3-way diff support would also be nice.

BTrace

BTrace should be bundled with NetBeans and integrated with the existing debugger and profiler. I want to either use the manual debugger/profiler, run normal BTrace scripts, or control the debugger or profiler via BTrace scripts! This means a debugger/profiler-dedicated BTrace API.
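
For readers who haven't seen BTrace: a script is just an annotated Java class. Here's a minimal sketch (my own example, not an official sample) that prints a stack trace every time the target VM starts a thread:

    import com.sun.btrace.annotations.BTrace;
    import com.sun.btrace.annotations.OnMethod;
    import static com.sun.btrace.BTraceUtils.*;

    // Attach with: btrace <pid> ThreadStartTracer.java
    @BTrace
    public class ThreadStartTracer {
        // Fires whenever the target VM calls Thread.start().
        @OnMethod(clazz = "java.lang.Thread", method = "start")
        public static void onThreadStart() {
            println("Thread.start() called from:");
            jstack(); // prints the current call stack
        }
    }

A debugger/profiler-dedicated API could expose breakpoints and sampling controls as the same kind of annotated probe points.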

Out of process indexing

Indexing takes way too much CPU/memory and should be moved outside the main process (think Google Chrome Multi-process Architecture) since it triggers ugly memory spikes. The design is also kinda broken: preindexing needs almost a full IDE launch during build.

I'll try to expand some of these ideas into dedicated posts.

Monday, October 11, 2010

Why don't you have personal projects?

I've been reading a lot of CVs and doing some interviews with young folks who are either about to finish university or just did (some are already preparing for their Master's degree), as I'm trying to fill a position at Joseki Bold SRL.

What strikes me as unusual is how few personal projects most of them have. And I'm not talking here about A students who barely have enough time to study for school and do the teacher's projects. I'm talking about normal students who don't seem to have very high grades, nor work to earn a living, and yet also don't have any personal projects to talk about.

Computer programmers are lucky. Unlike other professions, we can easily afford to buy the top-level tools and have free access to a lot of information to learn about our trade. A physics student can't really buy his own particle accelerator, but any student already has everything the best computer programmer in the world has: a PC and access to the internet. That's all there is!

And if you care about computers there are also the University resources. For example, if you want to play with a cluster -- the University has one. Or, Amazon's EC2 machines are cheap enough that you can experiment a bit if you are really passionate about it.

This might be the gist of it: you need to be passionate about it.

I remember an old Eddie Murphy movie from the 1980s called Coming to America where Eddie is a prince that has everything, including a gorgeous group of half-naked women as his personal "bathers". His father has a nice line somewhere in the movie:

Son, I know we never talked about this. I always assumed you had sex with your bathers. I know I do.

Tuesday, August 17, 2010

Magical moments

Noticing, after I'd read Douglas Adams' The Hitchhiker's Guide to the Galaxy, that on a piece of furniture in the kitchen lies the answer to the ultimate question:


Sitting one day in the almost empty 700 Coffee & Lounge and hearing the eerie Twin Peaks Theme by Angelo Badalamenti:

Cleaning up the same old closet with the 42 yesterday and finding a bottle of Suntory Gold whisky. Lost in Translation is my favorite movie, and Suntory is the brand of whisky Bill Murray endorses in it:

Wednesday, June 30, 2010

Source code hosting sets pricing all wrong

Where's the love?


The way GitHub structured their pricing plan looks like a way to punish long-term customers.

They don't charge based on how much value they are providing: they just charge based on how much you must be willing to pay after your data starts gathering there. And they are not alone -- most others do the same wrong customer-segmentation tricks.

Many small projects


It's good form to have separate repositories for separate projects, so with each new project hosted on GitHub you would create a new private repository.

Well -- pretty soon you will run out of private repositories so you'll need to upgrade to a new plan.

If you look at their pricing plans they are for 5, 10 and 20 private repositories, and then you get into the over $100/month business plans.

Am I silver business or micro?


If I look at my own server, I have 34 Mercurial repositories dating back 2 years. Of course, some are big, some are small, some represent my own ideas while others are repositories for client projects.

But do you know how many I am actually using nowadays? The answer is 4.

So according to this I would need to be on a silver business plan at $100/month. But what is the value they are providing? It's the equivalent of the $7/month micro plan, plus the value of having your old projects archived and available. Now, I wouldn't say the value of archiving old projects is $93/month.

Metered please


My solution is metered code hosting.

Amazon's EC2 might have spoiled me but I like to know that I'm paying for what I am actually using.

So, how do I see a sane pricing plan? Well, there are 3 axes to look at: the disk I'm using, the bandwidth I'm using, and the actual hosting extras I'm accessing, like the online source code viewer, wiki, merge tool, code review or whatever. If you think about it, the "extras" I am talking about might be seen as the CPU time of running their software.

Now, the whole trick is setting the right storage/bandwidth/CPU price.

Storage can't be more than 130% of the S3 price, meaning about 20 cents/GB ($0.20)

Bandwidth can't be more than 130% of the EC2 data transfer, meaning again about 20 cents/GB ($0.20)

Setting a price on the CPU time is interesting as this basically tells you how they value their product. It's impossible to guess but they would have to set the number pretty high to make normal usage exceed their current pricing plan.
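
To make the model concrete, here's a toy computation of a monthly bill under the caps above (the CPU/"extras" rate is a pure placeholder, since as noted it's impossible to guess):

    public class MeteredHosting {
        static final double STORAGE_PER_GB = 0.20;   // $/GB-month, ~130% of S3
        static final double BANDWIDTH_PER_GB = 0.20; // $/GB, ~130% of EC2 transfer
        static final double CPU_PER_HOUR = 1.00;     // hypothetical "extras" rate

        static double monthlyBill(double storageGb, double bandwidthGb, double cpuHours) {
            return storageGb * STORAGE_PER_GB
                 + bandwidthGb * BANDWIDTH_PER_GB
                 + cpuHours * CPU_PER_HOUR;
        }

        public static void main(String[] args) {
            // 34 mostly idle repositories: some storage, almost no activity.
            System.out.printf("Archived user: $%.2f%n", monthlyBill(2.0, 0.5, 1.0));
            // 4 active repositories: more bandwidth and more "extras" usage.
            System.out.printf("Active user:   $%.2f%n", monthlyBill(0.5, 5.0, 6.0));
        }
    }

Under any plausible rates, the archived user pays almost nothing -- which is exactly the point.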

... or the simpler change


The other notion they could introduce is that of an active repository.

If I am pushing changesets to a repository or editing the wiki, it's pretty safe to say I should pay for that repository.

But if I haven't touched the repository in any way for the whole month it would sure be nice to charge me only for the storage (or nothing at all if we see storage as "unlimited").

Thursday, June 24, 2010

Compiling is such a chore

I'm using Hudson as my build server and I would love to patch some things about it, especially the JUnit reports and charts.

Well, one of the reasons I dislike getting to this small change is that I would first:
  • need to check out Hudson,
  • then figure out how to build it,
  • then do the patch,
  • then compile it and finally
  • start using the changed Hudson.
Thus, there are quite a few things that stop you from doing the smallest changes, and I would say the biggest culprit is that you have to compile the code. In a scripting language:
  • I would not need to check out anything as the installed sources are everything I need.
  • There would be no "build" rules.
  • The patch would be done in-place.
  • There would be no "compilation" step and
  • There would be no "deploy" step so I can start using the new Hudson right away.
So while I dislike PHP, for example, as it seems too easy to break anything, having a strongly typed, compiled language does hinder the desire to make small changes.

Imagine how easy it would be to keep your changes in a separate branch (or Mercurial queues) and just rebase once in a while against the upstream codebase, tweak your patch a bit and have the latest running version as well as your changes.

Having everything pluggable is nice, but sometimes it would just be faster to edit the source.

Tuesday, June 01, 2010

Forget the removable battery, what about the easily removable hard drive? (Get well soon, trusty Mac!)

Get well soon, trusty Mac

Last Wednesday my MacBook Pro's display stopped working. Actually, it might be the logic board, since the fans do seem to start but nothing else happens: it needs to be sent to an Apple service center. (It could also be that widespread NVidia problem MacBook Pros had, who knows.)

Anyhow, I had to migrate some data to a new machine I received this morning.

I had bought an Intel SSD about a month ago, so I already knew how to dismantle the laptop. This time I just had to swap the hard drive of the replacement machine with my own SSD and I was back to work. Well, one hour later anyhow.

User serviceable

This whole experience made me think about how convenient it really is to have user-serviceable components. As laptops basically replace desktops, it's important to be able to access the hardware in your laptop.

Actually, not everything is equally important; there are 2 big things that matter: RAM and the hard drive. RAM access is just a nice-to-have, since adding more RAM is the best upgrade one could make. CPU and GPU access would be nice, but not high on my list.

Hard drive access is crucial though, because your work isn't actually on the machine itself.  Your work is just the hard drive. Having swapped the drive into this new laptop I'm back to work just like it's the same machine (if you ignore the annoying German keyboard).

So, it's a bit weird to think about these new MacBooks that are unibody and seem to be harder to dismantle. Taking your machine to a service for a battery or hard drive replacement is odd: in Timisoara this means I have to send it via post 600km to Bucharest, and it's not cheap either.

Plus, all this doesn't take into account the importance of data: I wouldn't want to send my laptop away with my work data on it. Even if I were to use FileVault, there is something unsettling about knowing your data is exposed like that.



The right to private data

The right to private data should be as important as the right to privacy. Just because people buy a laptop they shouldn't give away the right to private data. I don't really need a replaceable battery, I don't mind Apple keeping the old battery if they need to replace it. But I need a replaceable hard drive.

So, instead of a removable battery, laptop manufacturers should actually make a removable hard drive. In a few easy steps any user should be able to pop out his hard drive and then send the empty laptop shell to the service center worry-free.

Sunday, May 23, 2010

Re: How Could the NetBeans Team Make Money from the NetBeans Platform?

This is a reply to Geertjan's blog post wondering how Oracle could monetize the Platform: How Could the NetBeans Team Make Money from the NetBeans Platform?

The first thing I would like to see is the NetBeans Foundation which would be a central authority that cares about the future of the NetBeans Platform (and IDE, actually -- it makes no sense to have the Foundation just for the Platform).

Because when NetBeans was under Sun, the Platform wasn't seen as something worth monetizing. Under Oracle, we worried whether they would pull the plug or not (given Oracle has their own IDE and supports Eclipse too).

So, the first thing would be to have an actual entity in charge of this -- something legal, not some website or imaginary construct. This entity would want to get our money and will support itself in various forms: donations, support, stakeholder fees or various subscriptions.

Of course, we need some actual backing so we would still need actual companies on board: stakeholders.

Some of these stakeholders might need to pay, but I imagine it will be indirect: they will pay for developer time. Just as lots of companies employ developers to work on the Linux kernel, tools companies will employ developers to work on parts of the NetBeans Platform or NetBeans IDE.

Having a simpler and more modest entity in charge would also allow an ecosystem to form around the Platform and the IDE. I'm not sure Oracle will list my small company as a source of official support for the Platform, but I'm pretty sure the NetBeans Foundation would (just as there are many companies offering various services around PostgreSQL, for example).

Now, the big question isn't how NetBeans should make some money. There are surely many ways: I'm working full-time just doing NetBeans Platform-related projects. Many other people are making a living doing training or programming. My questions are:

How much does NetBeans actually cost, and would we get enough stakeholders? Besides Oracle, who would get on board to pay either cash or developer time to keep NetBeans going? Because if only Oracle pays, they will be reluctant to let the Foundation happen (actually they still might, for tax purposes). If NetBeans is a loss leader, can Oracle really afford to lose total control?


Any other solution that doesn't include the Foundation doesn't really interest me, as I don't think NetBeans is making Oracle poor. They can always try to get as much money as possible via training, support and other means and just cover the difference out of pocket to have their own IDE, which may be seen as a loss leader for other Oracle products (for example JavaFX).

Plus, now that Oracle owns Java and leads the JCP, they will always need some IDE to provide the reference implementations on: it might as well be the OSGi-fied NetBeans.

Monday, April 19, 2010

EC2 as a build server

For the past year or so I've been using a Slicehost virtual private server running Ubuntu Linux as a build server.

Due to the inherently IO-bound nature of some of my builds and the RAM-starved nature of the servers sold, I've been forced to upgrade from the 256MB to the 512MB and then to the 768MB slice. Not sure if it's a marketing ploy, but you cannot use the server otherwise.

Starting last week, I've been running experiments on migrating the builds to EC2 (with S3 for storage). Using EC2 as a build server, especially for a small company, is a perfect fit:

EC2 machines are way more powerful

The smallest EC2 machine has 1.7GB of RAM and the next one 7GB. These are serious machines.

Builds are finite

This might not apply for your projects or your company, but I generally do a few operations per day that would trigger a build.

This means that I actually only need the server for, let's say, 5 builds per day or less. Over 20 work days, I would actually use the build server for 100 builds per month.

So I am actually paying for a server to be live all the time when I only need it for 100 builds. Assuming a build takes about 1 hour (which it does for the longest project I have), I only need the server for 100 hours per month.

It's cheaper

Considering the previous paragraph where I noticed I only need the server for 100 hours, it's cheaper to pay for the EC2 hourly usage. Of course, running the EC2 server full-time would be a lot more expensive compared to Slicehost, but I don't need it full-time.

Thus, it's cheaper either to give up Slicehost altogether or to have some mixed scenario perhaps, with a much cheaper Slicehost server combined to an EC2 slave running on demand when needed. I'm slowly migrating to the mixed scenario first.
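
Here's the back-of-the-envelope arithmetic behind that conclusion. All the rates are placeholders (check the providers' current price lists), but the shape of the comparison holds:

    public class BuildServerCost {
        public static void main(String[] args) {
            double vpsMonthly = 20.0;     // hypothetical flat VPS price per month
            double ec2HourlyRate = 0.10;  // hypothetical m1.small price per hour
            int buildHoursPerMonth = 100; // ~5 one-hour builds/day, 20 work days

            System.out.printf("Flat VPS:      $%.2f/month%n", vpsMonthly);
            System.out.printf("On-demand EC2: $%.2f/month%n",
                    ec2HourlyRate * buildHoursPerMonth);
            // Full-time EC2 would be ~720 hours/month -- far more expensive.
            System.out.printf("Full-time EC2: $%.2f/month%n", ec2HourlyRate * 720);
        }
    }

The crossover is driven entirely by how many build-hours you actually consume, which is why the mixed scenario is attractive.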


Besides cost, there are also some clear technical advantages to using EC2:

You really do a clean build

While this problem doesn't show up on a properly configured build server, it is possible: tainted builds. A tainted build is one that uses some form of unexpected binaries for various reasons.

When you build on a fresh machine there is nothing there to influence your build. Just the operating system, your tools and your code.

It forces you to take the magic out of the build

When you start with a bare-bones machine you cannot make any of the assumptions you would unknowingly make on the build server.

On an always-live build server you can easily ssh in and do some manual tweak that will remain there forever but never actually be documented.

This style of whole-world building will force you to document and produce all the build dependencies.

Some first results for my most IO-bound build


It finishes:

  • after 30 minutes with the 512MB slice (but that started slipping for some reason, hence the upgrade),
  • after 20 minutes with the 768MB slice,
  • after 25 minutes of uptime with the EC2 m1.small,
  • after 11 minutes of uptime with the EC2 m1.large,
  • after 15 minutes of uptime with the EC2 m1.large, building everything over a RAM disk.

The surprising thing here is that on EC2 m1.large, where I have over 7GB of RAM, a RAM disk is slower. I assume the reason is that Linux uses the RAM for disk cache anyhow and is smarter about it (i.e. by only caching the JARs, not the whole source and build folder like I did).

The build on EC2 m1.small seems a bit slower, but that total time is uptime: in those 25 minutes I install all the tools, download and unzip more than 1GB of dependencies, and do the build.

Friday, April 16, 2010

Fremen gear

Ever since I've been working for my own company, I've discovered that working from a single place gets pretty boring after a while. Actually, there are a few phases you go through, but suffice it to say that at some point you'll want to also work in coffee shops, at least for the change of scenery.

Let's talk now about the gear I happen to use:


Then, a whole lot of other items I was a bit surprised to find when I unloaded the backpack yesterday to wash it:


Let's see (right to left):

  • My Orange 3G modem. I rarely need it while in Timisoara. While visiting my parents though it's almost useless since it goes over EDGE, meaning it's slower than dialup. Handy as a last resort, but I won't renew the subscription with Orange when it expires.
  • My 8G iPod Touch, USB cable and iPhone headset (I use this headset since it also has the microphone, unlike the original iPod headset). I almost never listen to music though, it's just for testing if we have an iPhone project to develop.
  • Nokia E71 USB charger (you can barely see it since it's black -- right next to the iPod cables)
  • My 2010 diary
  • A fountain pen -- I like writing with a fountain pen; a ballpoint pen ruins your handwriting.
  • A ballpoint pen
  • Lots of wet tissues (including one from KFC apparently).
  • Matches
  • Some pills
  • A key I forgot about
  • A Wenger Swiss Army knife (thanks dad)
  • Company stamp. I don't carry this always, but you need it for almost everything company-related here.
  • Bureaucratic papers.
  • A small yellow note book for quick note-taking
  • Some leftover sugar, probably from some coffee I bought. I'd forgotten about these -- it's very easy to lose stuff in the backpack.
Besides these items, sometimes I have in the external pockets a small umbrella, tissues and perhaps a 500ml bottle of water.

There you are: the gear of a modern-day Fremen. Stillsuit not included.

Monday, April 12, 2010

iPhone OS notes

Apple news flooded the Internet during Easter. After having watched the iPhone OS 4 keynote I have a few remarks and questions. Feel free to comment if you have anything to clear up for me.

iAd

Jobs pitches iAd on the penalty of clicking ads today: it closes the app and launches Safari. But since he just introduced multitasking, this penalty is greatly reduced or non-existent. After all, if the user doesn't know how to return to the app after clicking an ad, there is something really wrong with the multitasking user interface.

Push services

Things that make you go hmm: apparently Apple has a direct link to each iPhone via push services. I've never used this API, but I can't help wondering how this thing really works (especially over 3G only).

Also, I assume that notifications aren't encrypted. That should be another data-mining opportunity for Apple's iAd.

Wireless sync ?

iBooks bookmarks sync wirelessly. What does that mean? Is there some Apple server that gets this data no matter what?

That's about it. I won't comment on the new programming language restriction they apparently introduced since it's been said enough already.

Thursday, March 18, 2010

Almost saw Freddie Mercury in concert

I regret not seeing Freddie Mercury of Queen in concert, but I was too young when they were touring and not yet born during most of their prime years.

Last night I witnessed a ballet show set to Queen music, and it was as close to a Freddie Mercury show as I could ever get. A truly exceptional experience!

Thursday, March 11, 2010

Bread and circuses

Yesterday, while zapping through TV channels, I observed what the true purpose of all the political talk shows and talk show hosts is: they are the modern-day circus presided over by a modern-day jester.

For you see, the jester has a very important role: he defuses public negativity.

As a ruler or ruling party that makes questionable social or economic decisions, you don't really want people on the streets. You don't want resentment to grow within people. So the jester is actually something you need. Of course, he may sting a little, but remember: sticks and stones...

Wednesday, March 03, 2010

The default NetBeans IDE java source template is polluting the web

People will never bother to do anything manual unless absolutely necessary. This is why I believe the current NetBeans "empty" java file template is fundamentally broken.

It tries to "teach" people how to change the template by inserting in the file header something like:

/*
 * To change this template, choose Tools | Templates
 * and open the template in the editor.
 */

This might sound like a great idea, but in practice it's broken since most people won't change the template. So the header just becomes line noise that gets published, committed to VCS, etc.

A good UI would display that message differently, like a floating non-modal dialog, or some notification in the New File Wizard, but it shouldn't produce actual text that is part of the source code file.
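
As a sketch of what this could look like, the NetBeans Platform's NotificationDisplayer API (available since 6.5) can show a balloon-style notification instead of polluting the file; treat the exact calls as an approximation and check the current Platform Javadoc (the icon path here is made up):

    import java.awt.event.ActionEvent;
    import java.awt.event.ActionListener;
    import org.openide.awt.NotificationDisplayer;
    import org.openide.util.ImageUtilities;

    public class TemplateHint {
        public static void show() {
            // A non-modal hint instead of a comment in every new file.
            NotificationDisplayer.getDefault().notify(
                    "Customize your file templates",
                    ImageUtilities.loadImageIcon("org/example/hint.png", false),
                    "Choose Tools | Templates and open the template in the editor.",
                    new ActionListener() {
                        public void actionPerformed(ActionEvent e) {
                            // open the Template Manager here
                        }
                    });
        }
    }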

Google seems to say much the same: there are about 321,000 files indexed by the search engine containing that header. And this is only the public web; I bet there are many more closed-source code repositories filled with these lines...

Tuesday, March 02, 2010

Reading poetry is hard

I have this book with all the poems of a great Romanian poet, Nichita Stanescu, and I've started reading about one poem every day.

I like his poems because they are very imaginative and visual in an unexpected way. I see them as surrealist descriptions, as if someone put a Dali painting into words and added some more emotion to the mix. Of course, this is just how I see them; I never bothered to read the actual critics' reviews of the poet.

The other day I was reading a nice poem called "Rain in the month of March" ("Ploaie in luna lui Marte" in the original). I can't find a proper translation, but this is the first one a search returns.

Well, this poem also became the lyrics of a famous Romanian song by Paula Seling: listen to it, it's quite nice.

Now, reading this poem made me realize two things.

First, I couldn't separate the song in my head from the poem. I couldn't just read the poem; I was always hearing the tune of the song. The song taught me the only way I could ever read that poem.

Second, I'm probably not good at reading poems. The song made me see how beautiful the poem is -- I would have just read it, imagined an interesting visual image and been done with it. Reading it would have never shown me how good the poem is; it would have just been another typical Nichita Stanescu poem. But the song showed me it's an excellent poem.

I wonder how many other great poems I've already missed because I couldn't read them the way they are supposed to be read, with the right state of mind and the right visuals.

Reading poetry is hard.

Tuesday, February 23, 2010

Nobody reads the fine print (for mobile widgets anyhow)

I've gotten used to skimming the legalese you agree to upon any account creation, etc. You know, those things that have a checkbox next to them, with the submit button disabled until you check it.

Anyhow -- here is a nice piece from jil.org, a developer portal for mobile widgets. I used to do Konfabulator (now Yahoo) widgets long ago and I thought I should see what's up with these new "mobile" widgets. (The terms PDF is here.)
   7.  JIL’s right to use User Content
         1. With the exception of personal information, you hereby grant JIL a perpetual, unlimited, royalty-free, worldwide, non-exclusive, irrevocable, transferable license to run, display, copy, reproduce, publish, bundle, distribute, market, create derivative works of, adapt, translate, transmit, arrange, modify, sub-license, export, merge, transfer, loan, rent, lease, assign, share, outsource, host, make available to any person or otherwise use, any widgets or other content you provide on or through the Developer Site or which you send to JIL by e-mail or other correspondence including, without limitation, any ideas, concepts, inventions, know-how, techniques or intellectual property contained therein, for any purpose whatsoever and in any manner (“User Content”). JIL shall not be subject to any obligations of confidentiality regarding any such information unless specifically agreed by JIL in writing or required by law. You represent and warrant that you have the right to (a) grant the license set out above and to (b) upload the User Content on the Developer Site. You acknowledge that the license set out above includes the right for JIL to make the User Content available to a Sponsor and other entities with which JIL has contractual arrangements for further distribution of the User Content.
Now, this paragraph starts like a standard paragraph on many sites where they basically say they want to be legally allowed to display your product. So they need to be able to host, distribute and market the widget. Sounds fine. Obviously their partners need to be able to do that too. OK, pretty standard so far...

But here is the interesting part: they want to be able to modify your widget and create derivative works! So this is not the standard "hosting" agreement -- you are actually granting them a license on your source code to do as they please. If you combine this with the fact that every agreement of this kind (and obviously JIL's too) has some indemnification clause, you get an interesting situation.

An interesting piece of software would be one that tracks all these agreements. I think that by the end of your lifetime, it would be pretty scary to look at the dependency graph all these agreements have created between you and sites/companies, partner companies, merged companies, and owners of bought companies.

In other words, if we define this "agreement distance" in the spirit of the Erdos number, I'd say that over a 30-year span the agreement distance becomes 1 between any reasonably active individual and any other major company or website.

Friday, February 19, 2010

OSGi has won

Although NetBeans' module system was on par with OSGi, greater industry support meant OSGi always looked like a better pick to outsiders.

Starting a while back, I saw OSGi as the clear winner. Especially when GlassFish 3, a major Sun project, picked OSGi instead of the NetBeans module system, it was obvious OSGi was winning even inside Sun (although they were reluctant to give OSGi too big a stake in the upcoming Java 7 module system).

In the meantime NetBeans is getting native support to run OSGi bundles as well as getting ready to run inside an OSGi container.

Oracle has wanted a common IDE extension API since at least 2002, when they submitted JSR 198.

Now, owning Java and NetBeans itself, they will find it really easy to define the roadmap for both.

I estimate that NetBeans will be able to run inside an OSGi container by the end of 2010. We'll also see official NetBeans plugins distributed as OSGi bundles instead of NetBeans modules. In the end the NetBeans module system might become a deprecated subsystem.

--
Note: This is just my take on technology analysis. I have no inside information obtained via my NetBeans Dream Team membership or from Oracle.

Monday, February 08, 2010

Slicehost as a build server

I've been using a Slicehost server for over a year now to host my build server, and my slice seems to be getting slower.

Here is the graph for one of the projects:


The build initially took about 30 minutes, then I had a period where it jumped to about 70 minutes. This lag was entirely my unit tests, and after some refactoring I took it down again to about 30 minutes, which is decent.

Now take a look at another project:


The difference is that this project took about 40 minutes all along, and now I have spikes of 3-4 hours!

What's the catch? Well, the first project takes so long during unit tests because I have a lot of GUI tests, where the code has to sleep and give the interface time to repaint, etc. So, although the time is 30 minutes, it's mostly waiting for the GUI (inside an Xvnc instance) to paint.

The second project though does a massive build where I just produce JARs and don't run any unit tests. It's massively IO-bound.

So, the way I see it, in the past 2 months or so the machine my VPS runs on has been getting slower at IO requests.

IO has always been a problem in my limited VPS experience. At first, I got rid of it by moving from the 256MB slice to the 512MB slice, since apparently I was just thrashing the swap file.

But now, I'm not so certain it's a RAM issue anymore. The second project's compilation just needs to touch the disk, so it doesn't matter how much RAM I add beyond the minimal amount needed for ant and javac.

I'm starting to think I should move the build server onto an EC2 instance. This way I could use a smaller slice just to run Hudson, but do the actual building onto a bigger EC2 instance. I'm not certain it will be cheaper though.

Later edit: The discussion here is continued with my post about using EC2 instances as build servers.

Thursday, January 28, 2010

Oracle and Java

Since using Google is faster than loading the saved Javadoc, I always read the JDK Javadoc online. There was something odd today -- the favicon (what's a favicon?) looked a bit off: it's a red square with a white O in the middle.

Heading off to http://java.sun.com/ I see that the header is different. It says "ORACLE: Sun Developer Network (SDN)". So that's where the red favicon comes from!

Well, I guess it will take some getting used to. I don't really know anything about Oracle, but I really liked Sun's logo much more.

Monday, January 18, 2010

Linux: The last 10% will take another 10 years

Note: The blog post below was written on the 6th of December 2007, but I never published it. It seems to still be valid today and, given that the laptop I'm talking about went dead and was sent for repairs last week (but most likely they won't be able to fix such an old model), I'm finally publishing it now as a reminder of what that little machine had to endure :-)



Everybody in the Linux world will tell you that Linux has GoodEnough™ hardware support. Meaning, of course, that all the good stuff is missing but that your system is fairly functional.

Which is fine! I mean, as a programmer, why would I want to squeeze 100% out of my machine? We can always buy another one which will be even faster but, due to the "LackOfLinuxDriver compensating factor", will feel just like the old machine would have felt with proper drivers.

That is, bad hardware support makes you feel on today's hardware as if you are using a refurbished machine.

Ok, enough ranting. The reason I'm evil in this post is because Linux killed my laptop's battery.

I have an old Dell C840 which had a battery that only lasted about 30 minutes. So I spent about a quarter of the laptop's current value on a new battery (the Dell has two battery bays which is kinda cool).

So now, with the new battery I had about 4 hours off the grid, which was acceptable.

That is, until one night when I left my laptop with the lid closed but, somehow, it didn't suspend/hibernate as it normally did. Instead it kept on going and going and going, up to total, 100% battery drain.

Strangely, the old battery (Li-Ion), after sitting at 0% and about 16 hours of charging, recovered and is now back to its regular 30 minutes.

But the new battery, after days of charging, is dead. Hello Li-Ion deep discharge! Now, why in the world didn't Linux shut down my machine when it was clear the power was running out? Of course, some weird hardware-support-related fluke. Do I care? No: my new battery was killed.

This is sadly a losing position for Linux, as they can't write those drivers in some situations (due to so-called evil corporations), but it sure makes me mad to use less than what my hardware has to offer...

Another example: Linux (Ubuntu) on a MacBook Pro. It's like the ugly duckling! The scenario is like this: I use OSX with the nice fonts and Expose, I reboot, I select Linux and then I see the horror. It's as if the machine was reduced to a cheap no-name laptop: the image doesn't "look" good, you see pixelated things, then you see ugly fonts, then you see windows that barely drag/refresh due to the lack of a proper driver, then you install the binary blob driver from NVidia and notice that the "effects" don't hold a candle to OSX's experience (and some more pixelated stuff).

It's this last 10% threshold that I'm talking about here. Sure, hardware works 90% of the cases, but I sure would like to know I'm using the full power of my machine. Gnome does give you some GUI but I sure would like to see fonts I can stare at for 10 hours a day or some decent effects that aren't there just to show what OpenGL does.

In conclusion: I really like Linux, and the reason I bought a Mac was that I needed a "unix" with proper hardware support. Because Linux still doesn't have a place as the main machine.

Linux on a headless server? Sure! Linux on my main machine, the one I have to stare at the whole day? Not if I have a choice.

Thursday, January 14, 2010

I wonder how much AllegroGraph costs

Although I'm not a big RDF user, I did notice that some SPARQL queries take quite some time on my machine, so I can't help asking myself: how much faster would they run using AllegroGraph?

Franz Inc does provide a free edition that's limited in how many triples you may store, so at some point it should be easy to run some benchmarks.

But -- how much would the AllegroGraph enterprise license really cost, to get rid of the triples limitation?

Like any company that is (or thinks it is) selling an expensive item, they don't list the price; all you are given is a phone number.

I wonder how many customers they are losing this way because people assume the product is way more expensive than it actually is. Because I won't pay half a million dollars for the enterprise license. Then again, what do I know, it might be 5 million plus :-)

Monday, January 11, 2010

Personal growth as a purpose in itself?

2 or 3 years ago I used to read some of an internet-famous person's blog posts. They were mostly economics- and entrepreneurship-related, and I liked the way they were written.

Since then, I've stopped reading his blog, as his personal growth "road" has taken him into some strange areas I don't really care to follow.

For example, there were some traces of some kind of mysticism, then he decided he should separate from his wife, then try polyamory, and this year he's going into BDSM!

I know about the last part since I re-open his link once in a while to see if something interesting might pop up. Imagine my surprise when I read his latest blog post...

Of course, it could also be some cultural block that keeps me from seeing the "value" in what he is trying to achieve, but I think at some point personal growth might turn malign.

Humans aren't really built for infinite growth given the simple limitation that people die. So, it might be that trying too much to "grow as a person" leads to desensitizing towards normal life. Which means that it's possible to start using more extreme "personal growth" experiences to make up for it.

Go see Avatar

Yesterday I watched Avatar in a proper cinema with 3D glasses. The experience was almost surreal and while I had already read the book long ago, the adaptation was decent.

But really -- the 3D part of the movie is where all the magic is. Well worth the ticket price !

Monday, January 04, 2010

No such thing as a bad technology

The human race will adapt to the tools and technologies it has developed. That's why the cell phone companies, for example, just have to play a long-term game and wait: in time, fewer people will be sensitive to their cell phone radiation.

It's just another level in the "adapting to the environment" game. Even if this time the environment is man-made.

Wednesday, December 09, 2009

Logging needs some lazy evaluation

If there's one situation where lazy evaluation is needed in Java, it's logging. Until something better comes up and we have logging injected via AOP or something similar, a log message will be just the result of an extra line in our Java files, and this is a problem.

A normal log message is something like this:

log.fine("Some parameter is:" + someVariable);

This looks quite harmless especially since we know that depending on the log level, our message might be saved or not.

But say we have an expensive function:

log.fine("The extra informations starts at" + reallyLongLastingFunction());

The problem above is obvious: the log string will be built no matter what and our reallyLongLastingFunction() will be called each time, including when the log won't actually be saved.

The solution to this is to pollute your code with something like:

if (log.isLoggable(Level.FINE)) {
    log.fine("The extra information starts at " + reallyLongLastingFunction());
}

This way the string creation and the expensive function call happen only if the log message really is needed. But this adds extra boilerplate to the code and makes you maintain the log level in two places (for example, if I change the line to log.finer I also have to update the if).

If all the log methods had lazy evaluation this problem would go away -- the message wouldn't be computed until actually needed and there would be only one line in the code.
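
Until then, one can approximate lazy evaluation in plain Java with a callback interface. A minimal sketch (LazyLogger and MessageSupplier are made-up names, not an existing API):

    import java.util.logging.Level;
    import java.util.logging.Logger;

    interface MessageSupplier {
        String get();
    }

    class LazyLogger {
        private final Logger delegate;

        LazyLogger(Logger delegate) { this.delegate = delegate; }

        // The message is built only when the level is actually loggable.
        void log(Level level, MessageSupplier supplier) {
            if (delegate.isLoggable(level)) {
                delegate.log(level, supplier.get());
            }
        }
    }

The call site becomes log(Level.FINE, new MessageSupplier() { public String get() { return "starts at " + reallyLongLastingFunction(); } }) -- still verbose with anonymous classes, but the expensive call is deferred and the level is checked in exactly one place.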

The AOP style is to inject the logging using bytecode engineering. Perhaps it would be nice to post-process the resulting JAR artifacts and replace all log calls with something that injects that if check, etc.

And speaking of memory and CPU wasted on logging, there's nothing like seeing that your biggest CPU user is caused by the increased log level while debugging. Debuggers should know how to filter log calls, including the time spent building the strings themselves; otherwise they don't really help.

Friday, October 23, 2009

Official NetBeans build with Romanian localization

Head over to the NetBeans download website and notice the language popup also has Romanian now.

More info about the localization on the Joseki Bold dedicated page.

Sunday, October 11, 2009

Aversion towards localization a sign of technological barbarism

English is obviously the lingua franca of everything computer and computer-science related. Having a single language does help everybody, since it easily allows people to communicate and exchange ideas.

The side effect of using English for everything computer related is that it decreases the focus on using the local language for computer-related discussions. Or, if the local language is used, it is filled with English words! The more complex the discussion becomes, the more English is used, until it becomes almost easier to use English full-time and just revert to the local language when some explaining is required using examples. I think this is the reason some multinationals adopt English as the official language -- for computer-related workers it doesn't affect productivity, especially since people of different nationalities might end up working together.

Well, this means that while English has evolved to be a technology-centric language, most other languages either try to play catch-up or, most likely, don't enter the race at all and just import all the English words.

In my country, developers, for example, dislike applications localized into the native language. The more technical the application is (like a developer tool), the more foreign it seems to them to see the text in the local language instead of English. Native words disturb them and metaphors seem weird when they make the cognitive connection: a mouse is actually a rodent! I'm pretty sure English speakers also thought of rodents the first time they heard "mouse" in a phrase -- but this has changed nowadays. Computers have become so ubiquitous that "mouse" usually means that computer peripheral. "Firewall" is not some fireman's expression or a burning wall, but something computer related, etc.

By focusing so much on the English language and not allowing themselves to jump-start the native language's computer-science metaphors, developers are the main culprits in keeping the native language in a phase of technological barbarism.

And then they act surprised when the local market almost doesn't exist and their parents can't understand a thing about computers and need help with the simplest tasks (usually because they can't understand the language on screen, which is English).

Thursday, September 24, 2009

Green software

Long ago I wrote a blog post mentioning that for an always-on (wall-plugged) workstation, the latest (then) fad of lowering power consumption is not that essential since, as a developer, all you care about is overall machine speed to get the job done, and the cost of power is negligible compared to the cost of one developer hour (and then rent, administrative overhead, etc.).

Well, that is one aspect. The other aspect is when power consumption is important. This is clearly a major factor for large datacenters, where a big chunk of the cost is power (for the machines and cooling), so they keep a keen eye on performance per watt. The specifics of the business are different there: you don't have developers on top of each machine, but hundreds or thousands of machines providing some service to remote users. The cost of maintaining that datacenter determines the price you can sell your services at, and your overall competitiveness.

Another scenario I've personally noticed as of late (see my other article somewhat related to this) is the importance of performance per watt when working on your laptop's battery!

Now, the overall system performance per watt is a given of the machine you happen to own. You can't actually tweak that very much, except some hardware upgrade here and there and operating system optimization.

So what you are left with is the actual software you use every day and its performance per watt. Let's call that productivity per watt. Lower-performing software might exhibit different issues:


  • Consuming too much CPU
  • Hitting the disk too often. IDEs are notorious culprits here, when a normal clean build deletes and recreates hundreds of files on disk. Even a feature like compile-on-save doesn't save much, since developers save often and all this disk writing might actually stop the disk from going to sleep (and thus consuming less power).
  • Hitting the network too often or too much. No point in checking for software updates when the user is on battery. No point downloading that 200MB "update" file -- updates should consist of binary diffs and be as small as possible (Google is looking into this, as it becomes very important the more users you have). Also, laptops generally use WiFi if they are not plugged in and (I think) that consumes even more power than ethernet.


These are all optimization issues, but the main culprit is not scaling down when on battery: this includes being smart about redundant tasks like re-indexing the Maven repository, or deferring checks for non-essential upgrades to a time when you're not on laptop battery.

What we seem to be missing is a new metric to evaluate applications -- productivity per watt -- and teaching users to pick applications the same way they pick an A+ energy-rated fridge.

Looking forward to seeing which IDE uses less power to refactor a class, or just to stay idle.

Wednesday, July 15, 2009

The most complex simple GUI: VirtualBox snapshot handling

It's amazing how the guys doing VirtualBox (purchased by Sun and now by Oracle) have managed to screw up their snapshot mechanism so badly, specifically the GUI that lets the user handle it:

1. Let's start with the easy pickings: they support linear snapshots only. What did they select to display this? Obviously not a list, but a tree!

2. Their snapshot documentation is less than one page in the PDF help file. Out of this, half is spent explaining how to "take" a snapshot, which is probably the easiest thing they support. A quarter of the page is a scary note about possible data loss, which references some VBoxManage interface -- a script that has nothing to do with the GUI.

Also, their documentation doesn't have a single screenshot or at least pictures of the buttons users are supposed to press.

In the remaining quarter of the documentation they briefly mention revert and discard snapshot.

At no point do they bother explaining how their mechanism works, you kinda have to read between the lines what the philosophy is.

3. When you press "Discard snapshot", it actually merges changes. So you're not actually losing data; you're just losing the little timestamp that the snapshot gave you. They couldn't have picked a more confusing wording, and it's not what you would expect.

4. Next to the notion of "snapshot" they have a separate notion of "current state" (or "state") which is somehow related but kind of different. I'm not entirely certain whether "state" is something like a placeholder that I can fill with different snapshots, or just the tip of the snapshot list. Their wording makes it sound like one or the other.

Also, not only did they pick such wrong wording and show such a lack of care in explaining what their application is supposed to do, but they don't even seem to find this important. A bug report on their website complaining about this very thing is marked as minor!

I guess this is a sign of a certain geek mentality. They assume that people must train on the uber-software and actually read and memorize the 250-page PDF (probably at some point it all makes sense). The sad thing is that the GUI is really simple, and this minimalism should allow them to focus on the details, but a third of it is broken.

Obviously I'm not talking about their "Settings" GUI where you configure the machine; that's very complex.

I'm talking about the normal GUI the user gets to see every time after the creation of the virtual machine, which has 3 tabs: "Details", "Snapshots" and "Description". The only one the user will interact with for the rest of the virtual machine's lifetime is "Snapshots", and this is the one they consider minor. Go figure.

Tuesday, March 03, 2009

Matrix thoughts

AI lemma (anti-Matrix):
We do not live in a simulation of a universe similar to ours, since such a simulation would consume more energy than a real existence.

A corollary would be:
The universe simulating our existence might be so different that the above lemma doesn't apply.

Monday, February 02, 2009

iPhone Location Manager taking forever

On the iPhone, the Location manager that provides the GPS location is a nice API to use.

It does have some issues though: CLLocationManager doesn't work if it's called from another thread!

I first noticed something was really funny when my delegate wasn't being called at all.

Neither -locationManager:didUpdateToLocation:fromLocation: nor -locationManager:didFailWithError: was called, and my application was just waiting there forever for some GPS information.

My first thought was that it was some issue with my memory management as I wasn't holding a reference to the location manager in any class, just in the method where it was created. But still, it didn't work.

Then, I thought it was a problem with the threading model being used (I waited for the GPS location in another thread in order not to block the GUI). Sure enough, that seemed to be the problem, and at least one other person has complained about it. I'm not sure if it is a matter of threading or a matter of the memory pool being used.

But more to the point, always create your CLLocationManager instance in the main thread, and not in another thread. Having a singleton method there which is called from the main thread somewhere assures you that the location manager is created in the proper thread/pool.

Friday, January 30, 2009

Developer surprise on OSX

I had a strange bug in the OSX Address Book application: I had a rule that included all the address cards not present in any other rule.

This worked initially, but after an update, Address Book got confused and entered into an infinite cycle (it was probably trying to ignore the cards in the rule itself and then went on to resolve that recursively).

Anyhow, the good thing was that the application crashed only if I scrolled on top of that particular rule. And since I had quite a lot of them, I could at least open the application safely.

But, still, having a semi-buggy application isn't fun to use. So I went and looked at the Address Book file format which seemed to be some sqlite3 database, but I couldn't fix the problem from there.

To my surprise, Apple has a public API for the Address Book!

So I wrote these short lines of code:
    // Find the problematic smart group by name and remove it.
    ABAddressBook *AB = [ABAddressBook sharedAddressBook];
    NSArray *groups = [AB groups];

    for (int i = 0; i < [groups count]; i++) {
        ABGroup *group = [groups objectAtIndex:i];
        NSString *name = [group valueForProperty:kABGroupNameProperty];

        // "BadBadRule" is the name of the rule that crashed Address Book.
        if ([@"BadBadRule" compare:name] == NSOrderedSame) {
            [AB removeRecord:group];
            [AB save];
        }
    }

and that was it! No more Address Book crashes! It turns out OSX is really nice to tweak if you are willing to code a bit.

Wednesday, January 14, 2009

My Slicehost / VPS analysis

First-time VPS user



Starting a few months back, I have a VPS from Slicehost. It's the cheapest one they've got, with only 256MB of RAM.

I had never worked on a VPS before; I had only used either dedicated physical servers in the company datacenter (at my previous job) or CPanel-based hosted accounts (for some other clients).

All in all, a VPS is just as one might expect: almost like a normal server only slower.

And the slowness is starting to bug me a bit, specifically the problem that I don't know how slow it is supposed to be.

The fixed technical details from Slicehost are that you'll have 256MB of RAM, 10GB of disk storage and 100GB of bandwidth.

Now there are 2 issues here. One which seems quite obvious and another one I'll introduce later.

CPU



OK, the first problem is that you don't know how many CPU cycles you are going to get. Being a VPS means it runs on some beefy server (Slicehost says it's a quad-core server with 16GB of RAM).

According to Slicehost's FAQ:

Each Slice is assigned a fixed weight based on the memory size (256, 512 and 1024 megabytes). So a 1024 has 4x the cycles as a 256 under load. However, if there are free cycles on a machine, all Slices can consume CPU time.


This basically means that under load, each slice gets CPU cycles depending on the RAM it has (i.e. the price you pay). A 256MB slice gets 1 share, the 512MB slice gets 2 shares, a 1GB slice gets 4 shares and so on.

The problem here, of course, is that you can't be certain the server only holds a fixed maximum number of slices; Slicehost is clearly overselling, as top usually displays a "steal time" of around 20%.

So, assuming a machine is filled 100% with slices and there is no multiplexing, a 256MB slice gets 6.25% of a single CPU under load (16GB of RAM means 64 such slices; 64 slices sharing 4 cores is 1/16 of one core, i.e. 6.25%).

6.25% isn't much at all, but considering that the machine isn't always under load, the slice seems to get a decent amount of CPU nonetheless.

If we consider the overselling issue and that 20% is stolen by Xen and given to other VPSes, we get to an even 5%.

Now, this might not be as bad as it sounds CPU-wise as I've noticed Xen stealing time when my CPU-share is basically idle anyhow so maybe it doesn't affect my overall performance.

For example: ./pi_css5 1048576 takes about 10 seconds which is more than decent.

IO



The bigger problem with a VPS seems to be the fact that hard drives aren't nearly as fast as RAM. And when you have a lot of processes competing for the same disk, it's bound to be slow.

What Slicehost doesn't mention is whether the "fixed weight" sharing rule they use for CPU cycles applies to disk access too. My impression is that it does.

After trying to use my VPS as a build server, I noticed it grinding to a halt.

top shows something like this:


Cpu(s): 0.0%us, 0.0%sy, 0.0%ni, 62.2%id, 20.9%wa, 0.0%hi, 0.0%si, 16.9%st


but the load average for a small build is something like


load average: 1.73, 2.06, 1.93
and it easily goes to 3, 4 and even 9 when I also try to do something else there.

So, looking at the information above, we can note that 62.2% of the time the CPU is just idle, while the actually "working" tasks, i.e. 20.9%, are waiting for IO. The remaining 16.9% of CPU time is stolen by Xen and given to other virtual machines, and I don't think that really matters given that the load is clearly IO-bound.

And here lies the problem: just how fast might Slicehost's hard drives be? And how many per slice? Actually, more like: how many slices per drive?

From a simple test I made, a build that takes 30 seconds on my MacBook Pro (2.4GHz / 2GB RAM / 5400rpm laptop hard drive) takes about 20 minutes on the slice. This means the VPS is 40 times slower at IO-bound tasks.

Another large build that takes around 40 minutes on my laptop took 28 hours on the server, which respects the roughly 40-times-slower rule.

Now, considering the above numbers and a 20% steal time, I'd expect a 20% overselling of slices on a physical machine. Meaning, at 16GB per machine, roughly 76 slices of 256MB on one machine. Taking into account the 1:40 rule above for IO speed, this suggests they have about 2 hard drives in a server.

Conclusions



It's certainly liberating to have complete control over a server. CPanel solutions just don't cut it when you need to run various applications on strange ports. Of course, the downside is that you also have to do all the administration tasks, secure it, etc.

The Slicehost services are very decent price-wise; the "administrator" panel they have provides you with everything you need, even a virtual terminal that goes to tty1 of the machine (very handy if for some reason SSH doesn't work, for example).

Even the smallest slice I'm using right now has enough space, RAM and bandwidth for small tasks. If you just use it sparingly during business hours, the "fixed weight" sharing rule gives you enough CPU / IO for most tasks.

But for heavy usage, I think the solution is either to get a more expensive slice or start building your own machine.

IO-bound tasks are almost impossible to run due to the 1:40 slowdown noticed above. This means that you need to get at least the 4GB slice to have them run decently. Of course, that's $250 compared to the $20 slice I have right now.

CPU doesn't seem to be a problem, at least for my kind of usage. It seems responsive enough during normal load and mostly idle under heavy load (so idle that Xen gives my CPU cycles to other virtual machines). Initially I was expecting the CPU to be the major problem when moving my build server there, but boy, was I wrong: the CPU limitations don't even compare with the IO limitations.

If you are compiling, getting 5% or more of a fast CPU is nothing like getting a mere 2.5% of an already slow resource like the hard drive (roughly two drives shared among ~76 slices).

Further experiments



Back when I was considering the CPU to be my future bottleneck, I was wondering which option would be better: 2 x 256MB slices or a single, bigger, 512MB slice.

According to their rules and offering, the two configurations are directly comparable. Moreover, by their sharing rule, 2 x 256MB slices should get at least the same CPU cycles under load as the 512MB one. (Further emails from Slicehost's support led me to believe the rule might be oversimplified, but they didn't tell me in what way -- I think the weight of the smallest slices might be even smaller, with the bigger slices getting even more of their share.)

So, if under load they get the same CPU cycles, it means that when the machine has cycles to spare, I have 2 candidate slices to grab those spares.

So the question was: for the 5% price increase I'd pay for 2 x 256MB slices compared to 1 x 512MB slice, will I get at least 5% more CPU cycles ?

Even with the data I've gathered, I'm still not certain that it would. And the new question now would be: will I get at least 5% more IO operations ?


Non-aggression



The above post isn't a rant against Slicehost. I think they are providing a decent service for the price. It is interesting, though, to see what kind of usage one can put on a VPS and what is better run on the server in the basement.


512MB update



Well, isn't this interesting. A vertical upgrade to 512MB of RAM is another world entirely. Maybe the new VPS is on a less-loaded machine, but at first sight it's looking way better: the build that previously took 28 hours (fresh) now takes only 40 minutes for a small update. I'll try a clean build later this week and see how fast it is.

So it seems it wasn't only a problem of slow IO, it was also a big problem of not enough RAM leading to swap file thrashing.

Friday, November 07, 2008

I guess it has begun: the environment is at fault for everything

I'm always amazed at the amount of bullshit people are able to come up with, especially when explaining some corporate move.

Take for example my main bank, BRD - Groupe Société Générale. Yes, the same Groupe Société Générale that reported a €4.9 billion fraud at the end of 2007. But it's OK, since the Romanian branch is really profitable for them due to limited consumer education here and powerless consumer protection institutions.

I just noticed a new message from them on the Internet Banking site: due to the Bank's increased environmental awareness, they encourage people to get alternative bank account statements via online banking or by post. Otherwise, you are entitled to one printed account statement per month from their offices.

The reason is, of course, to save the trees by printing less. Of course, they are willing to print tons of the stuff if you are willing to pay -- money which goes directly into their profit, but that's another problem, no ?

Add to this that they also increased the fee for having an account by 20% for individuals and 50% for companies. That probably also had some environmental reasoning that's escaping me.

Anyhow, I'm looking forward to more price increases and consumer ripoffs done in the name of the trees.

Too bad we people can't buy our own carbon credits, so that companies couldn't offset that extra cost onto us in the name of the environment. But you know what ? I'm pretty sure someone will introduce carbon credits for the masses. After all, why not ? It's a nice way to bring some more money to the state budget.

And only then will that old saying come true: they'll tax you for the air you breathe !

Well, technically for the air you exhale, but we're close enough.

Thursday, August 07, 2008

No new mail! Want to read updates from your favorite sites?

For a while now I've been using GMail's "Archive" button aggressively on my inbox. The end result has been that from thousands of emails, I now have 0 (zero) ! Everything is archived.

When I get a new email, it sits in the Inbox until it is resolved (ie. I reply or read it). Then it's instantly archived. Out of sight, out of mind.

I've found that this technique greatly reduces the information overload coming from emails. With a full inbox that was also showing snippets of the messages (ie. small previews), every time I looked at my inbox I had some information to process. Like: oh, look, that one is starred, I wonder when they'll reply, or: hm, it's been quite some time since I've got an email from X, as the name is at the bottom of the inbox, etc. etc.

Basically a full inbox sends you some information even when no unread emails exist. It's also quite a bad way to "search" for email. I used to manually look for some subject and/or sender in order to hit reply. Now I just use GMail's search.

I remember some TED video where the speaker said something like: our brains like new information, we have an addiction to new stuff. Which is exactly what email feeds. It feeds our addiction for new things, even by just having a full list of previously received emails. I also assume that's why sites like Slashdot, Digg and Reddit are quite popular: they feed us new, easy-to-process information. Imagine brain junk-food if you will, or the Internet equivalent of "too much TV will rot your brain".

Related to this need to always get new stuff, I find it interesting the way Google handles this. When your inbox is empty, you get this message: No new mail! Want to read updates from your favorite sites? Try Google Reader (with a link to google reader).

So what Google is doing here is providing us with what we have become used to. Not enough interruptions, not enough new stuff from email ? Why gee, why don't you try this other source of new things: Google Reader. Come on, get a quick fix !

Thursday, July 17, 2008

Oh, my, how the NetBeans community has grown !

For quite some time now I've noticed an interesting trend: I don't have the time to read the email in the NetBeans mailing lists. A lot of emails where I could have given some help just fly by me, as there are simply too many.

Just now openide@ has 2000 unread messages, the oldest unread being from 26 November 2006, about the Manifest File Syntax tutorial (boy, a lot has changed in the Editor APIs). nbdev@ also has about 1700 unread, but that's OK as I rarely post or answer there.

Now, this trend seems to have two causes: me being busy (lately I'm working full-time on getting the Editor APIs usable in a standalone way) and the community growing.

I do remember the time when I had zero! unread messages. Now I hardly notice when another hundred adds up.

So, how do you guys handle the workload ?

Of course, the solution might be to be a little more methodical about it and dedicate some fixed time slot (like 30 minutes / day), but that just doesn't seem to work for me. Must be the 100 Editor modules I have open right now in the IDE -- sigh...

Sunday, July 06, 2008

I'm not sure I like Web 2.0

Remember when a URL linked to something static on the net ?

Sure, a URL could actually have a script behind it that allows for a more dynamic page.

But when the script is used to discriminate against users on a supposedly free site like YouTube, I get kinda annoyed.

This video is not available in your country ?

Un-believable.

So Web 2.0, besides all the AJAX thingies, also brings widespread encouragement to use a proxy to hide your identity ? Is this a social construct to teach us about security and privacy ? Or just a degeneration of what the Internet was supposed to be ?

Thursday, May 22, 2008

I forgot what Alt + F4 does

I saw today an avatar on a forum which said "To view my display picture hold down Alt + F4".

Now, it was clearly some funny-man's trick, but then it occurred to me that I'm not certain what Alt+F4 does.

I've been using OSX for so long I had forgotten about Alt+F4 on Windows. I guess this is enough Microsoft-independence.

I never wanted to program on Windows or use anything Microsoft-specific. This led me to Java initially, then Java on Linux and soon to Java on OSX...

Monday, April 21, 2008

Should the language shape the mind ?

I've read an interesting blurb from Lera Boroditsky (Cognitive Psychology & Cognitive Neuroscience, Stanford University) in Do our languages shape the nuts and bolts of perception, the very way we see the world?, where the basic answer is: yes, languages not only influence the way we interpret information, but also the way we perceive it.

There is an often-quoted phrase in programmer circles:
When the only tool you have is a hammer, everything looks like a nail
which basically states the same thing: the languages we programmers know and use influence the way we perceive reality.

That's a dangerous thing, because (programming) languages were only supposed to help us interpret information. Skewing our perception means we don't even notice the wrong path we've taken.

Being multi-cultural -- that is, knowing multiple languages -- helps, as these may overlap and give you various perspectives on the information, and thus a better representation of it. The end solution is also most likely to be better.

But I often wondered: shouldn't we at some point just stop trying to force our thoughts into some language and start expressing them in another language altogether ? In our own language ?

Sure, learning a language might bring some "discipline" into our minds; using it might help us programmers get along with each other. But in the end, a language should just give a programmer some new perspectives. The output should still be in our own language.

I assume this is the reason most people think everybody else's code is shit: their internal interpretation doesn't match the mapping inside the other programmer's brain. Even a younger you produced a lot of bad code by your current mapping.

Which basically means we are utterly unable to find a way to fully express our thoughts in a way other people would understand, agree with and like. And by like I mean having a close mapping with the other's (or just bringing something totally fresh).

And this limitation doesn't just apply in relation to others, but to ourselves too.

Then, how should we function ? Each new problem evokes in us a current solution with our present interpretation. Should we express this in something like a Domain Specific Language (DSL) ? Should each new problem be represented in a new Problem Specific Language ? (And sure, the PSLs at a given time might have something in common, as they also represent us.)

So what does it mean that some code looks like shit, then ? It means the chosen PSL is incomplete, somehow flawed or just not elegant enough compared with our current PSL. Code rarely looks like shit if we like the PSL but the solution is somewhat broken -- then it just has bugs or is incomplete, and we fix it while following the given PSL.

Won't this make cooperation really hard ? Well, not really, as cooperation might change the PSL for the better. It will also force programmers to slow down a bit and try to first understand the PSL before understanding the solution. We do this anyway: even while using a common language there is always a programmer-specific meta-layer; it's just that this layer is sometimes obfuscated by the common language instead of being very prominent, like in a PSL.

Maybe general-purpose programming languages should stop existing and be replaced only by programming paradigms and concepts.

Tuesday, January 22, 2008

NetBeans Platform autoupdate via BitTorrent

Something I'm pursuing nowadays is having my platform applications as decentralized as possible.

And the first pet peeve I had is the fact that the Update Centers are such big, centralized, monolithic blocks.

I always assumed that I would need to hack the AutoUpdate module from the NetBeans Platform quite hard in order to get what I wanted all along: BitTorrent downloads for new or updated modules.

So, the first thing to notice is that the AutoUpdate Catalog (see the DTD) provides, for each module in the Update Center, a location called distribution. The distribution may be a path relative to the catalog location -- usually something like ./com-example-mymodule.nbm -- or it can be a totally different URL.

Now we have a first step towards splitting traffic: we can put the actual NBM file on another URL altogether. Or, if we have the AutoUpdate Catalog sit behind a servlet, we could even try a bit of balancing and return a different distribution link depending on how loaded the servers are. That's a plus...
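To illustrate, a catalog servlet doing that kind of balancing might look roughly like this (a sketch only: the mirror URLs, pickLeastLoaded and catalogXml are hypothetical placeholders):

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class CatalogServlet extends HttpServlet {
    // Hypothetical mirrors holding the actual NBM files.
    private static final String[] MIRRORS = {
        "http://mirror1.example.com", "http://mirror2.example.com"
    };

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        // Rewrite the distribution attributes in a catalog template so they
        // point at whichever mirror currently reports the lightest load.
        resp.setContentType("text/xml");
        resp.getWriter().print(catalogXml().replace("${MIRROR}", pickLeastLoaded()));
    }

    private String pickLeastLoaded() {
        return MIRRORS[0]; // placeholder: query your servers' load here
    }

    private String catalogXml() {
        return "..."; // placeholder: catalog with ${MIRROR}-prefixed distributions
    }
}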

OK, that's something, but it isn't BitTorrent: you still download the whole file from a single place.

It's alive !

But what the Platform does offer is the possibility to register in the Lookup your own URLStreamHandlerFactory. So, I can register a new handler for the torrent:// protocol and the AutoUpdate module will just use my stream handler.
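The registration might look roughly like this (a sketch, not my actual module code: downloadViaBitTorrent stands in for the Snark-based download, and on Platform versions without the @ServiceProvider annotation you'd register the factory via a META-INF/services entry instead):

import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;
import org.openide.util.lookup.ServiceProvider;

@ServiceProvider(service = URLStreamHandlerFactory.class)
public class TorrentURLStreamHandlerFactory implements URLStreamHandlerFactory {

    public URLStreamHandler createURLStreamHandler(String protocol) {
        // Claim only the torrent:// scheme; returning null lets the
        // Platform fall back to its default handlers for everything else.
        if (!"torrent".equals(protocol)) {
            return null;
        }
        return new URLStreamHandler() {
            protected URLConnection openConnection(URL url) throws IOException {
                return downloadViaBitTorrent(url);
            }
        };
    }

    URLConnection downloadViaBitTorrent(URL url) throws IOException {
        // Hypothetical: fetch the .torrent over HTTP, download the payload
        // with a BitTorrent library and expose it as an InputStream.
        throw new IOException("not implemented in this sketch");
    }
}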

And thus, a few hours later, I have a working AutoUpdate infrastructure via BitTorrent. My StreamHandler downloads behind the scenes with BitTorrent using Snark and provides a nice InputStream to the AutoUpdate module. It's still not polished but already usable. Install the module from this update center: http://emilianbold.ro/modules-updates.xml or just grab the NBM.

Something else: the distribution of the module no longer points to the NBM, but to a torrent file which contains the NBM.

The steps are: place the torrent file at http://example.com/module1.torrent.nbm, edit the catalog to have distribution="torrent://example.com/module1.torrent.nbm" and you're good to go. Behind the scenes I'll actually fetch the torrent file over HTTP and then download the actual NBM via BitTorrent.
<!DOCTYPE module_updates PUBLIC "-//NetBeans//DTD Autoupdate Catalog 2.3//EN"
    "http://www.netbeans.org/dtds/autoupdate-catalog-2_3.dtd">
<module_updates timestamp="35/26/18/21/01/2008">

    <module codenamebase="org.yourorghere.emptymodule"
            distribution="torrent://example.com:6881/cc032d0c003b12568c91a0339f88301fa6ca67f5.torrent.nbm"
            ... >
    ...
A small remark: note the .nbm extension. It's something the AutoUpdate module needs, otherwise it won't be able to install the file as an NBM (could be a bug, I'll report it at some point).

The module still needs extensive testing and different BitTorrent libraries (I'm using Snark, but I would maybe like to have the Azureus core as a different provider in the Lookup), but it does show it is possible.

Using the same technique one could write multiple backends/"protocols" for the AutoUpdate. Drop me a message if you want to know more or want to help me (source code will be online soon).

Thursday, December 20, 2007

The opensource bureaucracy

I always thought that only post-communist countries like mine could be bureaucratic, not capitalist, civilized countries or the meritocratic Internet.

But one may be shocked to notice the kind of bureaucracy open source brings. In a normal "distributed" project where you don't have a sugar daddy to pay for the project hosting and other expenses, you need to get some free hosting.

This is the first place where you need to get approval for your hosting, depending on what you do (you can't expect to have just any project approved) or what license you use (you get the free hosting only if you offer your work under their preferred terms).

And the more "free" stuff you need (like build servers, wikis, email lists) the more you have to wait, accept rules and abide by them. But generally, wait and read a lot of strange disclaimers and terms and conditions.

Don't get me started on the licensing part. Do you want your code in some high-profile codebase ? You need to sign the agreement, which needs to be scanned and emailed or, even better, faxed. Then you need to wait for the acknowledgment that the fax did arrive and that someone is going to give you commit access, in a few days.

Basically, the more people you involve, the longer it takes to do anything, especially since you depend on their goodwill. The more "steps" you have to follow, the more agreements you have to approve of, the more time you have to wait.

I've been waiting for a month now for some approval on a high-rated open-source nexus. I'm not being denied, I'm just waiting for someone to finally get to my item in the todo list.

It almost makes renting my own server seem like a good expense.

Thursday, November 15, 2007

I really like Java's tooling

There, I've said it.

Really, Java has great developer tools.

Long ago I liked to experiment with a lot of different programming languages (I own lisp.ro). Many languages got things right from the beginning, while Java, with its C++ inheritance, is really, really verbose. Nowadays I mostly play with Python and Javascript (and I'm studying Erlang).

But what Java lacks in succinctness it compensates for in tools. Big, juicy, gooey tools.

First, a bow to the JVM. It's such a nice feeling to develop on OSX, only rarely test on Windows, and have everything work !

Second, I really like my NetBeans IDE with my debugger and trusty profiler. Problem with the EJB: bam! add --debug to Glassfish and connect from the IDE. Possible performance problems? kpow! attach the profiler to the application and see what's the problem.

Wanna see the health of your code ? Put a whole bunch of reports in Maven and build your site (FindBugs, PMD, Taglist, Checkstyle, all good stuff).

And if you feel in a coding mood, why don't you add an MBean to get quick info from jconsole, even remotely ? Or, even better, make a custom JMX client using JFreeChart to get a nice display of the application's health.
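To give an idea, a minimal MBean is just an interface plus a registration call (a toy sketch of mine, not from any particular project; the Health name and the fake metric are made up):

import java.lang.management.ManagementFactory;
import javax.management.ObjectName;
import javax.management.StandardMBean;

public class HealthDemo {

    // The management interface; jconsole shows its getters as attributes.
    public interface HealthMBean {
        int getActiveSessions();
    }

    public static void main(String[] args) throws Exception {
        HealthMBean bean = new HealthMBean() {
            public int getActiveSessions() {
                return 42; // stand-in metric; wire a real counter here
            }
        };
        // StandardMBean sidesteps the ClassName+"MBean" naming convention.
        ManagementFactory.getPlatformMBeanServer().registerMBean(
                new StandardMBean(bean, HealthMBean.class),
                new ObjectName("demo:type=Health"));
        Thread.sleep(Long.MAX_VALUE); // keep the JVM alive so jconsole can attach
    }
}

Run it, start jconsole, attach to the process, and the ActiveSessions attribute shows up under the MBeans tab.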

It just feels like software engineering. And it's nice.

Tuesday, October 30, 2007

Java 6 meet Linux on MacBook Pro

It seems there's no Java 6 in the new OSX Leopard from Apple.

I mean, it's not like it's been almost 11 months since SUN released Java 6 on Linux, Solaris and Windows. And I bet it's more than a year since Apple started getting code for Java 6 from SUN in order to customize it in due time.

But I guess the iPhone and the transparent menu are far too important to actually put some developers to work on Java.

It's bad enough that they force a new OS release down my throat for a new JDK version -- I just want Java 6 on Tiger, thank you very much. But now they are delaying even that !

So, I'm thinking that in the future I'll probably go back to Linux and stay there. I thought proprietary Microsoft software was bad; well, I'm starting to believe proprietary Apple software is just as bad (only prettier).

Hence, the first step is to get Linux working and check the hardware support. Because if it's not good enough, I might just go for a Thinkpad.

What about that, Apple ?

Monday, October 22, 2007

Coupon or negative numbers' marketing

I've noticed a little trend in TV commercials in Romania nowadays, and it also applies to computing: new commercials and advertisements always underline the negative number, the "user gain". Stay with me...

So, you want a new car ? Well, our car is the best: it's 1000 euro "buyer (bar)gain".

Want to be subscribed to our useless service ? How could you refuse: the first month is free!

Why don't you migrate to the new version: it's 10% faster !

Want to take a 20 year long mortgage for 100K euro ? We are the best: we give you 4K euro for free.

As you noticed, all these advertisements avoid the real issue: the actual price (or actual speed, actual time to completion, etc).

All they brag about is that you get this discount or that super-offer. But they don't even bother to tell you the actual cost anymore.

I mean, in their mind, getting anything for free should be reason enough for people to buy their product. Makes sense to me... NOT.

My opinion is that this coupon advertising is trying very hard to confuse the buyer. Because if everyone uses the same unit for their product, like price, it's easy to compare products.

But how hard is it to compare an offer where I get 2 free months with one that gives me a free (cheap) cell phone, or another where I may have already won an all-expenses-paid trip to the Bahamas ? See ? It's almost impossible.

So, please, marketing gurus, stop telling me things I don't care about. Tell me things I can quantify: if I get a discount, how much will it still cost ? If your product is faster than the old one, how fast is it actually (maybe the old product ran on Cyrix processors) ?

The exception is when I already am a customer, because then I do care about what I have to gain. 10% speed: sure ! Half the memory usage: excellent ! Less CPU usage: even better.

Thursday, October 18, 2007

New NetBeans Platform-based tool

I just noticed on my RSS feeds some talk about VisualVM so I went and downloaded it.

First, from the screenshots alone it was clear to me it's a NetBeans Platform application. Also, the charts look awfully like the NetBeans Profiler ones. Well, lo and behold, it is a simple Platform application holding the profiler cluster.

What's annoying me is that the profiling part only works with Java 6 (not available on OSX). But the NetBeans Profiler does work with Java 5 if we just configure the proper agent. It would have been nice for VisualVM to also use the agent, as not everyone is using Java 6.

Second, the OSX integration is less than stellar (it's the 1st release so I'll excuse them). The menu doesn't show up in the Apple menubar the 1st time you run the tool (but on subsequent runs it does, strangely). Also, no launcher like the IDE has.

Oh, forgot to mention it uses the NetBeans Platform from NetBeans 6. Looking good guys.

Thursday, October 04, 2007

(Maven) Building to a ramdrive

I own a MacBook Pro and I have a few projects with Maven (and MevenIDE on NetBeans).

What's annoying me about the build system is that it usually writes a lot to the disk. Not only is that quite unnecessary (as the files will be overwritten in no time) but the laptop hard disk is also quite slow, and all this writing is thrashing it.

The solution: write to a ramdrive ! As you all know, a ramdrive is a "virtual" hard disk that sits in your RAM and goes away when you shut down the machine (not that I care, the build files are temporary).

First step: create the ramdisk

There are some utilities that do this, but it's quite doable from the terminal (eventually with a shell script).

  1. First, get your disk size in MB and multiply it by 2048 (the ram:// size is expressed in 512-byte sectors). So 256MB means 524288.
  2. Next, create the device: hdik -nomount ram://524288 . The command will also display the name of the new device file.
  3. Create a HFS filesystem on it: newfs_hfs -v "disk name" /dev/diskXXX , where diskXXX is whatever the previous command printed.
  4. Mount the filesystem: mkdir /Volumes/diskXXX && diskutil mount /dev/diskXXX

You'll probably need to run some of these commands as root (su admin-user, sudo sh). I also set my non-admin user as owner with chown -R myUser:myUser /Volumes/diskXXX

At this point you should have a new 256MB drive mounted.

Second step: link maven folders


Now, I have to put the "target" folders on the ramdisk. Normally the orthodox way is to change the pom, but I just couldn't get it working. So my old-school solution is to use symbolic links.

This could be smarter, as a "mvn clean" will remove the links we just created (but just keep a script around that recreates them).

My script is:

for i in *; do
    echo "$i";
    # create the real target folder on the ramdisk...
    mkdir -p "$1/$i/target";
    # ...and symlink it into the project
    ln -s "$1/$i/target" "$i/target";
done

and I run it in the folder that keeps all my Maven projects (a flat hierarchy). Note the $1, which is the argument to the script. I use it like this:

$ ./linkramdisk.sh /Volumes/diskXXX

Building

All the links are in place, so you can try a mvn install and see how fast it is. In my case, I reduced the build time (with no unit tests) from 27 seconds to 17 seconds.

That doesn't seem much, but it does add up for larger projects and, most importantly, it keeps the hard drive out of the loop.

Oh, did I mention I also use FileVault on my account ? That's another reason to avoid writing to disk: no need to encrypt something useless like a build artifact...
