//-----------------------------------------------------------------
//	nicholas.s
//
//	Nicholas wrote: I was asked this question today at a job-
//	interview:
//
//	  What is the fastest way to count the number of set bits in
//	  an array of ten-thousand 16-bit integers with an unlimited
//	  amount of RAM?
//
//	OK, here is one way we could attack this problem using x86
//	assembly language, taking advantage of the GNU assembler's
//	repeat-macro capability (to reduce 'branching') and of the
//	processor's special 'xlat' instruction (which lets a table
//	of precomputed bit-counts for the 256 possible byte-values
//	be reused efficiently for every byte in the much larger
//	array).
//
//	programmer: ALLAN CRUSE
//	written on: 06 MAY 2008
//	revised on: 29 APR 2009 -- for our x86_64 Linux platforms
//-----------------------------------------------------------------

	.global	bit_total		# makes function-name visible

	# prototype:  int bit_total( short *array, int count );
	# the 'array' argument will be passed in register RDI
	# the 'count' argument will be passed in register RSI
	# the function-value must be returned in register RAX

	.section	.data
lookup:	.zero	256			# this will be a lookup-table

	.section	.text
#
# This function will be called from a C++ program, with function
# arguments 'array' and 'count' passed in registers RDI and RSI;
# it should return the array's total number of set bits in RAX.
#
bit_total:
	push	%rbx			# save caller's registers
	push	%rcx
	push	%rdx
	push	%rsi
	push	%rdi

	# construct a 'lookup-table' containing one entry
	# for each of the possible byte-values 0..255, so
	# that the i-th entry in this lookup-table equals
	# the number of set bits in that number i itself.
	#
	# Algorithm for constructing the lookup-table:
	#
	#	for (int i = 0; i < 256; i++)
	#		for (int j = 0; j < 8; j++)
	#			if ( i & (1<