programming stuff
May. 16th, 2007 01:46 am
It's still funny how much the use of virtual methods kills speed. The prefetch queue is everything.
Say an algorithm processes large arrays in memory in a loop that does relatively little work per call but is invoked a great many times. The loop iterator is a virtual method of a helper class, which in turn calls a couple more virtual methods, and all of them do very little work. Changing nothing else, we make these methods non-virtual, and the algorithm runs five times faster.
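A minimal sketch of the kind of setup being described (the class names and the float-summing workload are invented for illustration; the post does not show the actual code):

struct cursor
{
  virtual ~cursor () {}
  virtual bool done () const = 0;  // tiny virtual helper
  virtual float get () const = 0;  // tiny virtual helper
  virtual void next () = 0;        // the virtual loop iterator
};

struct array_cursor: cursor
{
  const float* p;
  const float* e;
  array_cursor (const float* b, const float* end): p (b), e (end) {}
  virtual bool done () const { return p == e; }
  virtual float get () const { return *p; }
  virtual void next () { ++p; }
};

float sum (cursor& c)
{
  float s = 0;
  for (; !c.done (); c.next ()) // three virtual calls per element, each doing almost no work
    s += c.get ();
  return s;
}

int main ()
{
  float data[1000] = {};
  array_cursor c (data, data + 1000);
  return static_cast<int> (sum (c));
}

Making done/get/next non-virtual (or letting sum take array_cursor by its concrete type, so the calls can be resolved and inlined) is the kind of change that produces the speedup described above.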
no subject
Date: 2007-05-16 04:41 pm (UTC)
To elaborate a bit on my original statement, JVM can translate a virtual function call to a direct/inline call only if it can infer from the code that the dynamic type and the static type at the call site are always the same. It cannot assume that just because the previous 100 calls were to implementation "a()", the next call will also be to "a()". As a result, what JVM can do at runtime, a C++ compiler can do at compile time. Here is an example:
#include <vector>

using std::vector;

struct b { virtual void f () = 0; };
struct d1: b { virtual void f () { /* ... */ } };
struct d2: b { virtual void f () { /* ... */ } };

int main ()
{
  typedef vector<b*> bv;

  bv v;
  v.push_back (new d1);
  v.push_back (new d2);

  d1 d;

  for (bv::iterator i (v.begin ()); i != v.end (); ++i)
  {
    (*i)->f (); // Neither JVM nor C++ can optimize something like this.
    d.f ();     // Both JVM and C++ can optimize something like this.
  }
}

no subject
Date: 2007-05-16 06:24 pm (UTC)
my general point remains, however: it *is* possible for a dynamic compiler to do much better than this; some, in fact, do.
no subject
Date: 2007-05-17 10:16 am (UTC)
No, I wasn't.
> it *is* possible for a dynamic compiler to do much better than this; some, in fact, do.
Do you "just know", or can you show a situation where a run-time compiler can translate a virtual call to a direct/inline one while a traditional compiler cannot?
no subject
Date: 2007-05-17 10:51 am (UTC)
this is not rocket science; the only reason this is not done in "traditional" OO languages is because the users have been trained not to demand such stuff.
again, I don't see what you are trying to say here: that such optimizations are impossible (incorrect) or that the current state of the art in the C++/Java world is kind of pathetic (perhaps so, but so what? once a need is recognized widely enough, it tends to be fulfilled)?
no subject
Date: 2007-05-17 02:04 pm (UTC)
no subject
Date: 2007-05-17 02:20 pm (UTC)
this is nonsense. firstly, you don't need to look into the future; the past and present are quite enough. either you know that there is only one method that fits the constraints (resulting in a naked direct call), or you know which method was called last from your call site, so you cache it. pretty simple, really, especially for C++ or Java, where all you need to do in order to decide whether to use the cached method is to check the type of one object.
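As a hand-rolled C++ sketch of that caching scheme (the classes, the thunk functions, and the assumption that only d1 and d2 exist are all invented for illustration; a JIT would emit the equivalent directly in the machine code of the call site):

#include <typeinfo>
#include <vector>

struct b { virtual ~b () {} virtual void f () = 0; };
struct d1: b { virtual void f () { /* ... */ } };
struct d2: b { virtual void f () { /* ... */ } };

// Direct-call thunks for the known implementations; the qualified calls bypass virtual dispatch.
void call_d1 (b* p) { static_cast<d1*> (p)->d1::f (); }
void call_d2 (b* p) { static_cast<d2*> (p)->d2::f (); }

void run (std::vector<b*>& v)
{
  // Per-call-site cache: the last dynamic type seen and the matching direct call.
  const std::type_info* cached_type = 0;
  void (*cached_call) (b*) = 0;

  for (std::vector<b*>::iterator i (v.begin ()); i != v.end (); ++i)
  {
    const std::type_info& t = typeid (**i);

    if (cached_type != 0 && t == *cached_type)
      cached_call (*i);  // cache hit: direct call
    else
    {
      (*i)->f ();        // cache miss: ordinary virtual call
      cached_type = &t;  // refill the cache for the next iteration
      cached_call = (t == typeid (d1)) ? &call_d1 : &call_d2;
    }
  }
}

int main ()
{
  std::vector<b*> v;
  v.push_back (new d1);
  v.push_back (new d2);
  run (v);
}

Whether such a check is actually cheaper than the plain indirect call is exactly what the reply below disputes.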
> general hand-waving
in what way is what I said hand-waving? I can dig up some code and disassemble it for you easily enough, but do consider the situation: an anonymous poster wants me to spend effort illustrating something bloody obvious. now, why should I do that?
no subject
Date: 2007-05-17 02:42 pm (UTC)
Ok, I give up. The quoted text shows that you have no idea how the virtual function mechanism works. When you call a virtual function, it is translated to an indirect call, that is, a call to a function whose address is stored somewhere in memory:
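Roughly, the dispatch being described works like this hand-written model (the vtable/object layout below is only illustrative, not the actual ABI):

struct vtable { void (*f) (void* self); /* one slot per virtual function */ };
struct object { const vtable* vtbl;     /* data members follow */ };

void call_f (object* o)
{
  // load the vtable pointer from the object, load the slot for f, then call through it
  o->vtbl->f (o);
}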
So now you suggest that we cache the address of the last function called (plus some typeid of the type it belongs to, but you didn't mention that because you have no clue), then check on the next call whether the types are the same (that is, obtain the typeid of the object and compare it to the stored one), and finally call the cached function:

if (typeid (*i) == cached_typeid)
  cached_function (i);
else
  i->vtbl[0] (i);

Now guess which one will be faster? Hint: just obtaining and comparing typeids will be slower than the whole indirect call.
So instead of talking about stuff you have no clue about, why don't you just go to reddit or something and upvote all those lisp/java/haskell posts.
no subject
Date: 2007-05-17 02:49 pm (UTC)
that's the point: not slower in the situation under discussion here, because from the fact that such an enormous overhead is incurred when calling a virtual method it is obvious that the vtable is not in the processor cache (why this is so is another interesting question)! but the object whose typeid you need is there. thus, accessing the object's typeid is much cheaper than accessing the vtable.
no subject
Date: 2007-05-17 02:41 pm (UTC)
for C++, you are right, because there's no way to add classes at runtime anyway. so failing to cache virtual method calls is even less excusable than it would be otherwise.
now if you'll excuse me, I'll go look for those remedial reading comprehension classes. :/