
May 26, 2013

Divide and conquer with Servoy headless client plugin

Who do you call when you write a method that, it turns out, will take 33+ days to complete on a 12-core server? As business programmers, efficient and speedy code usually takes a back seat to getting the job done. But 33 days is a few days above the limit where what you are (or aren't) doing is sure to get you noticed by someone up the food chain.

Wouldn't it be nice to speed this up dramatically without having to do a major code overhaul?

12 cores maxed out

Scenario
Recently a job came through where we needed to convert a bunch of tables from integer primary keys to UUID primary keys. The other major component was splitting one table up into four related tables while keeping links to other tables in place. Exception logic included things like setting delete flags on orphaned data.

Issue
This is the kind of brute force coding task that can get you into trouble in a hurry. "Trouble" in this case meaning over a million rows in the table that we needed to split up across four tables.

We hit the start button on the first-pass version of our routines, watched things progress (…NOT…), and knew we were in for it.

Servoy headless client plugin to the rescue
The Servoy headless client plugin spawns a server-side only client to run a method on. Effectively, this allows you to execute some code without blocking the continued execution of the function triggering the headless client plugin call.

Once a headless client plugin method is triggered, there is no connection maintained between the caller and the callee methods. However, you can pass in a name to a method that will run on completion of the headless client plugin call.

Main benefits:

• run code that doesn't block the UI
• run code simultaneously

Bring on the army
If you can call one headless client, why not call a bunch! The following code snippets show how to do this. A few key points:

1. How to divide up the work
The key to this approach is to figure out a way to divide the long-running iterative process (often a loop) into chunks that can be assigned to separate processes and handled independently.

In our case, we divide the total number of records into chunks and assign a chunk plus a starting point to a job.
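In plain JavaScript terms (leaving the Servoy APIs out of it), the split can be sketched like this — the total record count and interval here are illustrative values, and the `buildJobs` helper name is ours, not Servoy's:

```javascript
// Sketch: divide a range of sequential primary keys into fixed-size chunks,
// each described by a starting offset plus a size that a worker can process
// independently. buildJobs is a hypothetical helper for illustration.
function buildJobs(startPK, totalRecords, interval) {
	var jobs = [];
	var workers = Math.ceil(totalRecords / interval);
	for (var i = 0; i < workers; i++) {
		jobs.push({
			offset: startPK + interval * i,	// first PK this job handles
			size: interval,			// how many records it covers
			jobNumber: i + 1
		});
	}
	return jobs;
}

var jobs = buildJobs(1, 1020000, 5000);
// 1020000 / 5000 → 204 jobs, each covering 5000 ids
```

Each job object then maps directly onto the arguments handed to a headless client.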

2. How many processes to spawn
On a 12-core server with the latest release of Ubuntu, lots of memory, and Java 7, Servoy can easily manage quite a few processes. We figured 50 concurrent jobs could easily be handled. The other limiting factor we accounted for in the code is how many client licenses are available.

3. Start up another headless client as soon as one finishes
Nifty coding trick to detect when a client slot frees up and spawn another process right away.
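The same idea, stripped of the Servoy plugin calls, is a classic worker-pool pattern: keep at most N jobs in flight and launch the next queued job the moment a slot frees up. This is a plain-JavaScript sketch of that pattern, not the actual Servoy code — `runPool` and its parameters are illustrative names:

```javascript
// Sketch of spawn-on-finish: run at most maxActive jobs concurrently,
// starting the next queued job as soon as a running one completes.
// runJob(job, done) must call done() when the job is finished.
function runPool(jobs, maxActive, runJob, onAllDone) {
	var next = 0;	// index of the next job to hand out
	var active = 0;	// jobs currently running

	function startNext() {
		// fill the pool until it is full or the queue is empty
		while (active < maxActive && next < jobs.length) {
			active++;
			runJob(jobs[next++], function () {
				active--;
				if (next < jobs.length) {
					startNext();	// a slot freed up: spawn another right away
				} else if (active === 0) {
					onAllDone();	// queue drained and last worker finished
				}
			});
		}
	}
	startNext();
}
```

In the Servoy version, `runJob` corresponds to `createClient` + `queueMethod`, and the `done` callback corresponds to the completion method passed to `queueMethod`.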

4. Efficient chunks of work
You want each job to run as efficiently as possible. It turns out that the larger a Servoy foundset gets, the longer it takes to create new records in it. So we kept the foundset size per job to 5,000 records.

// SNIPPET: main method
//uuidMap, logFile, and the UTIL_convert_activity_destroyer callback are defined elsewhere in the solution

//lowest primary key in the source table: the starting point for chunking
var dataset = databaseManager.getDataSetByQuery(
		'data_linds_old', 
		'select id_activity from activity order by id_activity asc limit 1', 
		null, 
		-1
	)
var startPK = dataset.getValue(1,1)
var interval = 5000

//how many workers we need in order to get all the records (~1,020,000 rows)
var workers = Math.ceil(1020000 / interval)

//create army of headless workers
if (uuidMap && startPK && interval) {
	for (var i = 0; i < workers; i++) {
		//make sure that we don't run out of servoy licenses
		while (application.getActiveClientCount(false) > 60) {
			globals.CODE_debug_log('Waiting to spawn additional workers so do not run out of servoy licenses',LOGGINGLEVEL.INFO,logFile)
			application.sleep(60000)
		}
		var headlessClient = plugins.headlessclient.createClient("wf_CRM_lind", null, null, []);
		if (headlessClient != null && headlessClient.isValid()) {
			var offset = startPK + interval * i
			headlessClient.queueMethod(null, "UTIL_convert_activity_worker", [uuidMap,offset,interval,i+1], UTIL_convert_activity_destroyer)
		}
	}
}


// SNIPPET: processor method
//start and size arrive as arguments passed in from queueMethod in the main method

/** @type {JSFoundSet} */
var fsActivity = databaseManager.getFoundSet('db:/data_linds_old/activity')
//load only this worker's chunk of records
fsActivity.loadRecords('select id_activity from activity where id_activity between ? and ?',[start,start + size - 1])

End results
Utilizing the divide and conquer approach we were able to process around 5,000 records per minute, bringing the total time down from 33 days to under four hours. Go army!

What we didn't try this go-around was writing out insert statements to a SQL load file instead of creating records through foundsets. Writing to a text file is fast, and nothing compares to loading data directly into the backend. We would probably get another order-of-magnitude speed increase by skipping the Servoy middle tier for record creation.
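As a rough sketch of that untried alternative, each worker would emit insert statements as text rather than touching foundsets. The table name, column handling, and `toInsertStatement` helper below are all illustrative, and the naive quoting is only good enough for a one-off migration script:

```javascript
// Sketch: build an INSERT statement for a SQL load file instead of
// creating the record through a foundset. Illustrative only.
function toInsertStatement(table, row) {
	var columns = [];
	var values = [];
	for (var column in row) {
		columns.push(column);
		var value = row[column];
		// naive SQL quoting: numbers pass through, strings get single quotes doubled
		values.push(typeof value === 'number'
			? String(value)
			: "'" + String(value).replace(/'/g, "''") + "'");
	}
	return 'insert into ' + table + ' (' + columns.join(', ') + ') values (' + values.join(', ') + ');';
}

var line = toInsertStatement('activity_new', { id_activity: "8f14e45f", note: "it's done" });
// → insert into activity_new (id_activity, note) values ('8f14e45f', 'it''s done');
```

The worker would append lines like these to a text file and the whole file would then be bulk-loaded by the database's own import tooling.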

Divide and conquer by spawning up many Servoy headless clients is definitely a technique we're happy to have in our toolbox though. Huge speed improvements can be realized without major code changes.

| Posted by David Workman on May 26, 2013 at 11:48 AM in Articles | Permalink

Comments

Just what Cobol on a mainframe was designed to handle :)

Posted by: David | May 27, 2013 3:19:52 AM

I'm pretty sure I killed a few trees debugging Cobol/JCL back in the day.

Posted by: David Workman | May 28, 2013 5:28:21 PM
