Skip to main content

Discoveries

I never thought about research as a really engrossing activity right until you actually discover or uncover some facts that you never might have thought were there. I could only talk about my thesis because I have just begun analyzing results I had been gathering for a semester or so now already.

It seems that what has been written before about load balancing and processing distribution have a couple of stringent assumptions that may or may not be met in real life: first would be that the processors are the same (setup is homogeneous), and that global balance should be quantifiable. The first assumption is necessary to simplify the task of load balancing -- and stick as much as possible to solving the bin packing problem. However, in reality the bins don't all get emptied at the same rates, and the rate is not constant. The capacity is however, infinite (i.e. a processor can work forever or at least until it can perform as expected) but that hardly changes the setting of the problem.

So in the bin packing problem, if you want to be able to fit in items of different sizes and keep the sum of the sizes in each bin minimal, you'd most probably think that there should be a simple algorithm to solve the problem. I have read about the bin packing algorithm, and a modification which actually involves resorting the bins as they are being filled. However, the assumption here would be that you know the sizes of the items beforehand, and that there is a finite number of items to be distributed among the bins. The assumption of "finiteness" is necessary, otherwise putting the items in any manner into the bins wouldn't matter much. The assumption of knowing the sizes of the items to be put in the bins in real life would sound normal -- however, in the computing world, you never know what you have at hand.

I chose the prime number finder as a test problem for my proposed algorithm because primarily of the difference in the amount of work being done as you move from one number to another. Maybe that didn't make sense, but I'll try to outline it as simply as I can.

To find the prime numbers from 1 to N (where N > 5), you can either use the sieve of Erasthotenes (I hope I spelled that correctly) OR you check each and every number from 1 to N whether that number is prime or not. If N was a really large number, then the sieve would be faster but would use more memory than if you checked the primality of each number. However, checking the primality of a number in the worst case would be in exponential time (there are other ways, but that is not my concern). Aside from that, checking whether x is prime (where 3 < x < N - 2 and is ODD) would take less time than say whether x+2 is prime.

To check whether x is prime, there would be three basic steps: 1) if 0 < x <= 3 then x is PRIME; 2) if x is even then it is COMPOSITE; 3) check whether x is divisible by every ODD number from 3 to the cieling of the sqare root of x (CIEL(SQRT(x))) -- if it is, then it is COMPOSITE otherwise it is PRIME. In C it would be:

int isprime(int x) {
int i;
if ( x <= 3 ) return 1;
if ( x % 2 == 0) return 0;
for (i = 3; i*i <= x; i+=2)
if ( x % i == 0 ) return 0;
return 1;
}

NOW, to know whether 99983 is prime would take more time than say knowing whether 16 is prime. Then think of all the numbers from 1 to 10 million to be the items to be distributed to P bins... do you sort the numbers per value, or would you be able to know beforehand the length of time it will take a number to be determined prime or not? How then would you distribute the numbers being the items among P bins such that you do the most work at the shortest amount of time (i.e. the distribution is optimal)?

Dividing the 10 million numbers evenly among the number of slaves in a master-slave architecture doesn't necessarily achieve global load balance because the slaves don't do the same amount of work knowing that they all have to deal with different numbers. A mesh wouldn't help either because as in the master slave model, each node in the mesh would have different numbers to deal with. And then we all know that half the numbers would be even, but the number of primes in an interval wouldn't be easy to calculate beforehand (i might be wrong, but I haven't thought it out yet).

So what am I getting at here? Based on results, automatic load balancing (giving a node some work, and let it do the work until it needs more, on which time it will be given more work to do) would be a feasible solution to similar problems BUT the number of nodes and the amount of numbers to give each node would have a significant effect on the overall performance of the system. Even distribution of items in problems like these do not necessarily yield the best results in terms of time and resource maximization (throughput).

And it's so nice to know that you have data to back up your claim.

Keep chilled... Until next time.

(Technical Paper pending, but should be available to interested parties.)

Comments

Popular posts from this blog

Writing Again

It's 2019 and I just realised that I've not written on this blog for a long while. I feel a little bad about this so I'm picking it back up again. More importantly, I've limited my social media to just Twitter (I've deleted all my Facebook-related accounts) and will be writing more on the blog instead of engaging in other social media sites. If you want to reach me directly, you can also reach me through my keybase.io account for encrypted communication. If you have my phone number, you can also contact me through Signal. Quite a number of things have happened in the past few years and here's a quick update on things that I can share:

I've been working on XRay, a function call tracing system now part of the LLVM project. This took a good two and some years of my time at Google.Most recently I've moved to the Chrome Operations Team still here in Google Sydney. I can't give specifics yet of what I'll be working on, so stay tuned.There've been c…

A Passion Project

I was so moved today by the prospect of a passion project that I took some time on a Friday night to get it done. Let me present the #RedJeans project over at redjeans.org. I've found myself wanting to work on a project that came purely from the heart and one that was very dear to me, something that is personal, and connects with a larger community of people in the world.
The idea for redjeans.org came to me as a hint when I was writing up my reflection for 2018. I realised that I didn't spend quite as much time identifying with and working with a community. I did a bit of soul-searching and found that one of the activities I really enjoyed and cherished in years past is donating blood -- and I keep wondering why not more people do it. It was an idle thought but then a conversation with someone where I described why I wrote down "donate blood more often" in 2019 became an idea where instead of just me doing it, how about if I get my friends to do it too?

I left it a…

Futures and Options III: Economics, Journalism, or Computer Science

I realise it's been a year since my previous post on this blog, and I've found myself having very little time to do another "brain dump" on the subject of my early choices in life. With that in mind (and as I'll be traveling again soon) I get to think a little more and reflect on a few of the things that have happened.

Much like the previous post, this one's set in high school -- where I was part of the swimming team, in a band, had been programming with Turbo Pascal, Java, and then C++ later on, and was about to make a choice that would literally change the course of my life. This one is about the choices I made, and the ones that were made for me.

Note: This is part 3 of a series about my early choices in life which have gotten me to where I am today. I would greatly appreciate your feedback and thoughts, as well as for your reading through this series!