From: Karl Waclawek
Newsgroups: comp.lang.beta
Subject: Re: Garbage Collector vs. cStruct, ExternalRecord, @@
Date: Thu, 11 Feb 1999 13:21:43 -0500
Organization: The Toronto Star
Message-ID: <36C31FB7.8F555F88@thestar.ca>
References: <19990211102731.4091.qmail@noatun.mjolner.dk>

Peter Andersen wrote:

> In your code below you ask why the code using cStruct is slower than
> the code using @@. The answer lies in the fact that
>
>    ... -> TestBuf.putByte
>
> currently is implemented as
>
>    ... -> &TestBuf.putByte
>
> i.e. NOT being inlined. This means that you will get one million
> objects allocated during the for-loop.
> If you instead use
>
>    pb: @TestBuf.putByte;
>    do (for ... repeat
>         ... -> pb;
>       for)
>
> you will get a significant speedup, since the same putByte object
> is reused for each iteration. I tried it with approximately
> 35% speed-up as a result.
>
> However, there is one flaw: Inside the implementation of cStruct.PutByte
> there is an invocation of a virtual named ChkBounds (which implements
> index check - notice that raw use of @@ does not do this).
> The invocation of this virtual will also instantiate one object per call
> of putByte. Unfortunately this is a bit harder to work around - actually
> you have to change the file basiclib/betaenv.bet (causing recompilation
> of everything thereafter). The change could be to declare one static instance
> of ChkBounds:
>
>    chk: @ChkBounds;
>
> and then using this in the implementation of the put- and get-operations
> in basiclib/external.bet.
> Feel free to try this in your local installation.
> It may require a recursive
> change of permissions to allow the compiler to generate new .ast(L) and
> .o files.
>
> I tried it in my local installation, and combined with the optimization
> mentioned above, I gained a 70% speedup.

That is excellent, of course, but there still remains a speed difference
of roughly a factor of 5!! (I didn't test the second fix, but I trust your
70% figure is correct.)
The real slowdown IMHO is the fact that putByte performs an index check
plus a function call every time.
For tasks like the ones in the code below one should have a "putBytes"
pattern which takes a Char repetition as input, or a control pattern
"scanRange", so that one could do the following:

   (1, 1000000) -> MyStruct.scanRange(# do Current -> MyGetValue -> Value #);

That way one would only have to perform the index checks for the boundaries,
and one could use a more efficient way to access a specific Byte position.

> >
> > Example with cStruct (buffer initialization is very slow):
> >
> > ORIGIN '~beta/basiclib/v1.6/betaenv';
> > INCLUDE '~beta/basiclib/v1.6/external'
> >         '~beta/basiclib/v1.6/numberio';
> > LIBFILE nti '$/crc.lib';
> > -- program: Descriptor --
> > (#
> >    BufType: cStruct (# byteSize::< (# do 1000000->Value #) #);
> >    IntegerStruct: cStruct
> >      (# byteSize::< (# do 4->Value #);
> >         val: Long (# pos::< (# do 0->Value #) #);
> >      enter val
> >      exit val
> >      #);
> >    GetCRC32: external
> >      (# BufP: ^BufType; ByteCount: @Integer; CRC: ^IntegerStruct;
> >      enter (BufP[],ByteCount,CRC[])
> >      do callStd
> >      #);
> >    TestBuf: @BufType;
> >    TestCRC: @IntegerStruct;
> > do (* initialize Buffer with some data - very slow *)
> >    (for I: TestBuf.byteSize repeat (I-1,I mod 256) -> TestBuf.putByte; for);
> >    16xFFFFFFFF->TestCRC; (* initialize CRC *)
> >    (TestBuf[],TestBuf.byteSize,TestCRC[])->GetCRC32;
> >    (16,TestCRC)->putBaseD; (* print result *)
> > #)
> >
> > Example with @@ (buffer initialization is fast):
> >
> > ORIGIN '~beta/basiclib/v1.6/betaenv';
> > INCLUDE
> >         '~beta/basiclib/v1.6/external'
> >         '~beta/basiclib/v1.6/numberio';
> > LIBFILE nti '$/crc.lib';
> > -- program: Descriptor --
> > (#
> >    BufType: (# R: [1000000] @Char #);
> >    GetCRC32: external
> >      (# BufPtr, ByteCount, CRCPtr: @Integer;
> >      enter (BufPtr,ByteCount,CRCPtr)
> >      do callStd
> >      #);
> >    TestBuf: @BufType;
> >    TestCRC: @Integer;
> >
> > do (* initialize Buffer - is a lot faster than above *)
> >    (for I: 1000000 repeat (I mod 256) -> TestBuf.R[I]; for);
> >    16xFFFFFFFF->TestCRC; (* initialize CRC *)
> >    (@@TestBuf.R[1],1000000,@@TestCRC)->GetCRC32;
> >    (16,TestCRC)->putBaseD;
> > #)
> >
> > This code looks simpler (and is faster), but could the garbage collector
> > potentially free the TestBuf pattern?
> > Especially in this case (BufPtr, CRCPtr declared as @Integer):
> >    ...
> >    @@TestBuf.R[1] -> BufPtr;
> >    @@TestCRC -> CRCPtr;
> >    16xFFFFFFFF->TestCRC;
> >    (BufPtr,1000000,@@TestCRC)->GetCRC32;
> >    ...
> > the GC might assume that TestBuf and TestCRC are not used anymore after
> > the first two lines in the code sample above, and might therefore mark
> > them for deletion, possibly causing access violations in the external call.
>
> The TestBuf is declared static in the Program SLOT. So as long as the
> Program object exists (i.e. the lifetime of your program), the TestBuf
> will exist.

Aha, so the compiler does not leave hints for the GC as to when
some variables will not be in use anymore.

> But as said, the use of @@ may easily cause access violations for other
> reasons. And since GC almost never happens at exactly the same spots in the
> program, the behaviour may actually turn out to be non-deterministic.

I can understand that it is dangerous to mix GC and "manual" (de)allocation.
I assume that I can rely on a cStruct NOT being deallocated as long as it
is in scope. If it goes out of scope then it may be GCed, even though some
external code may still be using it. Is that correct?

> > Can anybody shed some light on this?
> > Is the purpose of cStruct to prevent problems with the GC?
>
> Exactly. The idea is to encapsulate the "greasy" low-level stuff in a
> high-level construct. Ideally it should be (almost) as efficient as using
> the low-level approach directly, but in this case unfortunately the
> index check has not been properly implemented.
> It will be fixed in a later release.

And some "bulk" processing patterns would be nice too
(as mentioned above, e.g. putBytes, scanRange, or something else).

Karl

P.S.: In a message posted Jan. 15 I mentioned a potential bug in %getByte
(another undocumented pattern). Can you confirm that, or was I doing
something wrong in the sample code I posted?

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Karl Waclawek
KD Soft Inc.
* Phone: (905) 579-3443
* E-Mail: waclawek@idirect.com
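
As a rough illustration of the bulk-access idea raised in this post (check
the range boundaries once, then write without a separate check per byte),
here is a minimal sketch in Python rather than BETA. The names ByteBuf,
put_byte and scan_range are hypothetical and are not part of the BETA
libraries; the sketch only mirrors the 1-based range and the "I mod 256"
fill used in the examples, and says nothing about how cStruct is actually
implemented.

```python
class ByteBuf:
    """Hypothetical fixed-size byte buffer, standing in for a cStruct."""

    def __init__(self, size):
        self.data = bytearray(size)

    def put_byte(self, pos, value):
        """Per-byte access: one explicit index check per call (pos is 0-based)."""
        if not 0 <= pos < len(self.data):
            raise IndexError("position out of range")
        self.data[pos] = value

    def scan_range(self, first, last, get_value):
        """Bulk access: check the 1-based inclusive range once, then fill it."""
        if first < 1 or last > len(self.data) or first > last:
            raise IndexError("range out of bounds")
        # get_value(i) supplies the byte for 1-based index i.
        self.data[first - 1:last] = bytes(
            get_value(i) for i in range(first, last + 1)
        )


buf = ByteBuf(1000)
buf.scan_range(1, 1000, lambda i: i % 256)
```

With put_byte, filling the buffer costs one check and one call per byte;
with scan_range, the caller pays for a single range check up front.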