Inline caching is an optimizing technology for “dynamic language” compiling, more details about it can be found in https://en.wikipedia.org/wiki/Inline_caching. I am working on cl-gobject-introspection (https://github.com/andy128k/cl-gobject-introspection), which is a common-lisp binding of gobject introspection. Inline caching should be helpful for its performance. So I do some experiment in implementing inline caching in common lisp. Given a sample class/object implementation as follow,
(defclass some-obj () ((some-class :initarg :some-class :reader some-class-of) (fields :initarg :fields :type simple-vector))) (defclass some-class () ((fields-desc :initarg :fields-desc :type sequence) (methods-hash :initarg :methods-hash :type hash-table))) (declaim (ftype (function (some-class symbol) function) some-class-get-method)) (defun some-class-get-method (some-class method) (declare (type symbol method) (type some-class some-class)) (with-slots (methods-hash) some-class (gethash method methods-hash))) (declaim (ftype (function (some-class) some-obj) some-class-make-instance)) (defun some-class-make-instance (some-class) (with-slots (fields-desc) some-class (let ((inst (make-instance 'some-obj :some-class some-class :fields (make-array (list (length fields-desc)))))) (funcall (some-class-get-method some-class 'init) inst) inst))) (defun send (obj method &rest args) (declare (type some-obj obj) (type symbol method)) (let* ((cls (some-class-of obj)) (func (some-class-get-method cls method))) (apply func obj args)))
And a simple test program as follow,
(defun make-trivial-some-class () (let ((methods-hash (make-hash-table :test #'eq))) (setf (gethash 'init methods-hash) (lambda (some-obj) (declare (type some-obj some-obj)) (with-slots (fields) some-obj (setf (aref fields 0) 0)))) (setf (gethash 'inc methods-hash) (lambda (some-obj val) (declare (type some-obj some-obj)) (with-slots (fields) some-obj (incf (aref fields 0) val)))) (setf (gethash 'dec methods-hash) (lambda (some-obj val) (declare (type some-obj some-obj)) (with-slots (fields) some-obj (decf (aref fields 0) val)))) (make-instance 'some-class :fields-desc '(value) :methods-hash methods-hash))) (declaim (type fixnum +stress-loops+)) (defconstant +stress-loops+ 1000000) (defun stress1 () (let* ((c (make-trivial-some-class)) (o (some-class-make-instance c))) (iter (for i :from 0 :to +stress-loops+) (declare (type fixnum i)) (send o 'inc 1) (send o 'dec 1))))
For each method invoking (message sending), a hash table is searched to find the real function to call. Because the class of the object is only known at execution time (this is why it is called dynamic), the real function to call can be determined only at execution time too. Inline caching can be used to optimize for this situation. Because at most call sites, most objects has the same class. An inline optimized version is as follow,
(defmacro send-with-cache (class-var func-var obj method &rest args) (with-gensyms (robj rsome-class) `(let* ((,robj ,obj) (,rsome-class (some-class-of ,robj))) (declare (type function ,func-var)) (unless (eq ,class-var ,rsome-class) (setf ,func-var (some-class-get-method ,rsome-class ,method) ,class-var ,rsome-class)) (funcall ,func-var ,robj ,@args)))) (defmacro cache-defun (name params &body body) (let ((cache-vars nil)) (flet ((transform (form) (let ((sb-walker:*walk-form-expand-macros-p* t)) (sb-walker:walk-form form nil (lambda (subform context wenv) (declare (ignore wenv)) (cond ((and (eq context :eval) (listp subform) (eq (car subform) 'cache-send)) (let ((class-var (gensym)) (func-var (gensym))) (push class-var cache-vars) (push func-var cache-vars) `(send-with-cache ,class-var ,func-var ,@(cdr subform)))) (t subform))))))) (let ((ndefun (transform `(defun ,name ,params ,@body)))) `(let ,(iter (for var :in cache-vars) (collect `(,var nil))) ,ndefun))))) (cache-defun stress2 () (let* ((c (make-trivial-some-class)) (o (some-class-make-instance c))) (iter (for i :from 0 :to +stress-loops+) (declare (type fixnum i)) (cache-send o 'inc 1) (cache-send o 'dec 1))))
The last real function to call and its class is cached in 2 lexical variables. For subsequent invoking, the cached function is used directly if the class is same as the cached one. A code walker is used to generate lexical variables definition and use them in each call site. Unfortunately, it is quite hard/complex to implement a portable code walker, so the code walker of SBCL is used. It appears that many common lisp implementations have code walker implementation by themselves, so similar stuff can be implemented in some other common lisp implementations easily as well.
How about the performance gain of inline caching? Result is as follow on a low voltage mobile CPU.
CL-USER> (time (cl-inline-cache::stress1)) Evaluation took: 0.183 seconds of real time 0.184000 seconds of total run time (0.184000 user, 0.000000 system) 100.55% CPU 436,898,331 processor cycles 0 bytes consed NIL CL-USER> (time (cl-inline-cache::stress2)) Evaluation took: 0.107 seconds of real time 0.108000 seconds of total run time (0.108000 user, 0.000000 system) 100.93% CPU 256,277,313 processor cycles 0 bytes consed NIL
The performance gain is about 70%.
I should have said that the meta programming capability of common lisp makes this much easier to be implemented or even possible. But I found that this can be implemented in C with static local variable even easier. Never the less, I still can say that common lisp provides greater flexibility.
The inline caching here convert hash table look up into indirect call. While in typical implementation, this can be further optimized into direct call. That could be implemented in common lisp via converting the code and call compile function at run time.