Sender: eernst@nsu2.cs.auc.dk
Newsgroups: comp.lang.beta
Subject: Re: Am I missing something obvious
References: <20001024092406.22334.qmail@noatun.mjolner.dk> <39F593C3.EB309B74@cepsz.unizar.es>
From: Erik Ernst <eernst@cs.auc.dk>
Message-ID: <u8ok8abok1p.fsf@nsu2.cs.auc.dk>
Organization: Department of Computer Science, University of Aalborg, Denmark
Date: Fri, 10 Nov 2000 16:42:41 GMT

>>>>> "Alejandro" == Alejandro Villanueva <190921@cepsz.unizar.es> writes:

(Long-time-too-much-work-to-read-comp.lang.beta, and then there is
such an interesting discussion going on when I look..  :-)

    Alejandro> Well, having read section 5.10.2 of the BETA book, I'm
    Alejandro> told that the procedure will be inlined. 

The compiler can create one instance and reuse it, so "caching" might
be more precise than "inlining".  The BETA book does say 'similar to
an inline procedure call', but that could then be described as an
extra optimization opportunity in the cases where it is statically
known that the object will not be used in any way.

Actually, the semantics in the BETA book was never implemented (as far
as I know).  At the time, it seemed to be reasonable to let
programmers give the compiler a hint that certain method invocations
could be reused.  It was entirely up to the programmer to ensure that
this caching would not have ill effects (such as reusing the same
"activation record" for a recursive method, thus overwriting local
variables in other invocations of the same method).

The fact is that inserted items (syntactically they are actually
InsertedItem or AttributeDenotation, used as an Imperative, i.e., a
statement) are never cached; a new object is created every time, just
as if you had written the "&".  So you may in fact just forget all
about "&" in connection with method invocations.

There are good reasons why you would want to keep it that way (and
change the BETA book :-)

  - It is an optimization hint from programmers, and such hints should
    probably be kept out of the language; it would be better for the
    compiler to optimize provably safe cases (i.e., where it makes no
    difference except for better performance) and let the programmer
    concentrate on "real" design.

  - It is an unsafe optimization: If it is applied in a (directly or
    indirectly) recursive method it will cause very funny bugs,
    because of the shared-local-variables problem.  I just tried this:

    ORIGIN '~beta/basiclib/betaenv'
    -- program:descriptor --
    (#
       m: (# i: @integer enter i do (if i<10 then i+1->m if) exit i #);
       m2: @(# i: @integer enter i do (if i<10 then i+1->m2 if) exit i #)
    do
       0->m->putint;
       newline;
       0->m2->putint
    #)

    This program prints '0' and '10'.  In the first case, 'm' is an
    ordinary recursive procedure, and it behaves as expected (as if we
    had used "&" before every non-defining occurrence of 'm').  In the
    second case, using 'm2', I've simulated the caching, and as we can
    see it leads to clobbering of 'i'.  All the recursive invocations
    of 'm2' are sharing the same 'i'.

  - I strongly feel that such a semantics should not be the
    (syntactically easier) default case.  If we really want to reuse
    the activation record of a method invocation (which is effectively
    what the caching would give us) then we could just specify that
    explicitly, just by writing and using 'm2' in place of 'm'.  So
    it's not even hard to get the same thing when we want it.  But we
    shouldn't get it by accident!
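
For readers without a BETA compiler at hand, the clobbering in the
second bullet can be imitated in Python.  This is only a rough sketch:
the names Activation, call_fresh and call_shared are made up for the
illustration, and a Python object stands in for a BETA activation
record.

```python
# Rough Python model of 'm' and 'm2' from the BETA program above.
# Activation, call_fresh and call_shared are hypothetical names.

class Activation:
    """Stands in for one activation record holding the local 'i'."""
    def __init__(self):
        self.i = 0


def call_fresh(value):
    """Like 'm': every invocation gets its own activation record."""
    frame = Activation()
    frame.i = value                # enter i
    if frame.i < 10:
        call_fresh(frame.i + 1)    # i+1 -> m (result discarded)
    return frame.i                 # exit i


_shared = Activation()             # like 'm2: @(# ... #)': one object, reused


def call_shared(value):
    """Like 'm2': every invocation writes into the same record."""
    _shared.i = value              # enter i (overwrites the caller's 'i')
    if _shared.i < 10:
        call_shared(_shared.i + 1)
    return _shared.i               # exit i (sees the innermost value)


if __name__ == "__main__":
    print(call_fresh(0))
    print(call_shared(0))
```

The only difference between the two functions is where the record
lives, which is exactly the difference between 'm' and 'm2'.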

I've reinterpreted the symbol "&" slightly in the context of the
language gbeta.  There, it means "must be new".  So we could have this:

  (#
     m: (# do <<something>> #);
     m2: @(# do <<something>> #);
  do
     m; (* OK, create instance of 'm', execute it *)
     m2; (* OK, execute the existing object called 'm2' *)
     &m; (* OK, works like 'm', but also documents object creation *)
     &m2; (* ERROR! 'm2' _is_ an object, cannot create one *)
  #)

When "&" is taken to mean "must be new", we are effectively saying
"get hold of 'm', then create a new object and execute it".  This is
perfectly acceptable when 'm' is a pattern, and it works the same as
it always has.  A stand-alone 'm' would also cause the creation of a
new object, and that would also be the same behavior as today.  But it
would not be acceptable with 'm2', because 'm2' denotes an existing
object and not a new one.

The difference is that "&" is used to tell the reader of the program
that there _must_ be a new object involved.  This works as a call-site
mark (both as documentation and with automatic checking by the
compiler), to ensure that it is indeed possible to obtain a new object
at that point.  So it's a signal from one programmer to another that
"I really depend on this being a new object, every time".  Since sharing
of state is semantically significant, it makes sense that programmers
be able to specify this kind of constraint.

If you as a programmer do not specifically insist on having a new
object every time, then just leave out the "&".  In that case it will
be possible for other people to change the program in such a way that
the "invoked method" becomes a reused activation record.  Just edit
the declaration to make it look like 'm2' above.

With this approach, "&" is used in a backward compatible way, and it
enables programmers to require something semantically useful ("a new
object every time").  A compiler would then be allowed to do caching,
inlining and whatever _only_ in cases where it would provably not
change the visible behavior of the program.
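
The "must be new" rule can be modelled in a few lines of Python.  This
is a sketch under assumptions, not how gbeta is implemented: Pattern,
Instance, invoke and invoke_new are hypothetical names, and the check
that gbeta would make at compile time is done here at run time.

```python
# Toy model of the "must be new" reading of "&".
# All names are hypothetical; gbeta checks this statically, not at run time.

class Instance:
    """A declaration like 'm2: @(# ... #)': one existing object."""
    def __init__(self, body):
        self.body = body
        self.runs = 0

    def execute(self):
        self.runs += 1
        return self.body()


class Pattern:
    """A declaration like 'm: (# ... #)': a template for new objects."""
    def __init__(self, body):
        self.body = body

    def instantiate(self):
        return Instance(self.body)


def invoke(decl):
    """Plain 'm' or 'm2': new object for a pattern, else reuse the object."""
    obj = decl.instantiate() if isinstance(decl, Pattern) else decl
    obj.execute()
    return obj


def invoke_new(decl):
    """'&m': insist on a new object; reject a name that denotes an object."""
    if not isinstance(decl, Pattern):
        raise TypeError("'&' applied to an existing object")
    return invoke(decl)
```

With 'm = Pattern(...)' and 'm2 = Instance(...)', the calls invoke(m),
invoke(m2) and invoke_new(m) all succeed, while invoke_new(m2) is
rejected, matching the four statements in the example above.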

    Alejandro> This is ok for
    Alejandro> something like:

    Alejandro> P: (#
    Alejandro>   I, J: @Integer;
    Alejandro>   enter (I, J)
    Alejandro>   do I+J -> I
    Alejandro>   exit (J, I)
    Alejandro> #)

    Alejandro> TEST: @(#
    Alejandro>   N, M: @Integer;
    Alejandro>   do
    Alejandro>   (2, 3) -> P -> (N, M);
    Alejandro> #)

Yes.

    Alejandro> [..] But... what about this one:

    Alejandro> Q: (#
    Alejandro>   A, B: @Integer;
    Alejandro>   PP: (#
    Alejandro>     I, J: @Integer;
    Alejandro>     enter (I, J)
    Alejandro>     do A+I -> A; B+J -> B;
    Alejandro>     exit (A, B)
    Alejandro>   #)
    Alejandro> #)

    Alejandro> TEST: (#
    Alejandro>   Q1: @Q;
    Alejandro>   N, M: @Integer;
    Alejandro>   do
    Alejandro>     3 -> Q1.A;
    Alejandro>     5 -> Q2.B;
                         ^ 1, I presume?

    Alejandro>     (1, 2) -> Q1.PP -> (N, M);
    Alejandro> #)

This is actually just fine.  You are accessing the 'A' attribute of
the object 'Q1' (an instance of 'Q'), etc.  That's essentially the same as

  class Point { public int x,y; }
  Point p = new Point();
  p.x = 3;
  p.y = 5;

'Q1.PP' is an "inserted item", and that would allow the compiler to
create and reuse an instance of the pattern Q1.PP (with the old
interpretation of inserted item).  Such an instance would be nested
inside the object Q1, so it would work on the 'A' and 'B' attributes
of Q1.  This would be a case where the object caching makes no
difference, so even according to my (stricter but safer) semantics, it
could be cached.
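
To make that concrete, here is a small Python imitation of the
example, assuming '5 -> Q2.B' was meant to be '5 -> Q1.B'.  The class
and function names (Q, PP, new_pp, run_twice) are invented for this
sketch.  PP keeps a reference to its enclosing Q object, which plays
the role of BETA's nesting.

```python
# Python imitation of Q, PP and TEST; names are hypothetical.

class Q:
    def __init__(self):
        self.a = 0                     # A
        self.b = 0                     # B

    def new_pp(self):
        """Create an instance of PP nested in this particular Q object."""
        return PP(self)


class PP:
    def __init__(self, enclosing):
        self.enclosing = enclosing     # link to the enclosing Q object
        self.i = 0
        self.j = 0

    def run(self, i, j):
        self.i, self.j = i, j          # enter (I, J)
        q = self.enclosing
        q.a += self.i                  # A+I -> A
        q.b += self.j                  # B+J -> B
        return q.a, q.b                # exit (A, B)


def run_twice(reuse):
    """Do (1, 2) -> Q1.PP twice, with or without reusing one PP object."""
    q1 = Q()
    q1.a = 3                           # 3 -> Q1.A
    q1.b = 5                           # 5 -> Q1.B (assumed reading)
    cached = q1.new_pp()
    results = []
    for _ in range(2):
        pp = cached if reuse else q1.new_pp()
        results.append(pp.run(1, 2))
    return results
```

The first invocation yields (4, 7), so N = 4 and M = 7, and
run_twice(True) and run_twice(False) give the same list.  Reuse is
harmless here because 'I' and 'J' are overwritten on entry, and the
state that persists ('A' and 'B') lives in Q1 rather than in the PP
object.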

It could not be inlined as source code, because that would break the
link between name applications like 'A' and the associated
declarations in the enclosing instance of 'Q'.  In general, we cannot
move code around (such as by inlining or whatever) and expect it to
have the same meaning, because name applications may resolve to
entities declared in enclosing scopes.

But that is a general property of inlining---we cannot expect to be
able to inline code without modifying it in such a way that it works
the same way after being moved.  Whether this modification happens on
source code, abstract syntax trees, intermediate language, or
whatever, that is a matter of implementation.

    Alejandro> where Q1 is not a pattern, but a pattern instance? How
    Alejandro> to inline it? What's the final value of N and M? and
    Alejandro> why? What's the difference
    Alejandro> if I wrote (1, 2) -> &Q1.PP -> (N, M) instead?

That makes no difference, in this special case.  Well, you could not
use a pattern for Q1, because an expression like 'Q.PP' is an error
(as a statement, at least).  If you are "dotting into" anything, then
that anything had better be an object.  But since "&" would apply to
'PP' anyway in 'Q1.PP', it makes no difference.


  regards,

-- 
Erik Ernst                                    eernst@cs.auc.dk
Department of Computer Science, University of Aalborg, Denmark