Leandro Lucarella [Tue, 19 Jul 2011 02:02:25 +0000]
Don't leak weak pointers
The free() call after the return is never executed, so we have to move it
inside the locked() block of code.
Leandro Lucarella [Thu, 21 Oct 2010 00:36:25 +0000]
Adapt conservative default to the compiler support
When compiling the GC with a compiler that doesn't provide PointerMap
information, the conservative option defaults to true, as most likely the
GC will be used with programs compiled with the same compiler and all data
will be scanned conservatively anyway.
Leandro Lucarella [Thu, 21 Oct 2010 00:30:49 +0000]
Avoid compile error for LDC
The variables used in scope(exit) must be declared before the scope
statement.
Leandro Lucarella [Mon, 20 Sep 2010 23:30:16 +0000]
Add (optional) early collection support
When early collection is enabled, the collection will be triggered before
the memory is actually exhausted (the min_free option is used to determine
how early the collection should be triggered). This could remove a little
pressure from the GC when eager allocation is not available (or when eager
allocation makes the heap grow too much).
The option is disabled by default and can be enabled with the
early_collect option.
Leandro Lucarella [Mon, 20 Sep 2010 23:25:58 +0000]
Change the min_free default to 5%
Empirical tests showed that the performance is a little better with a lower
memory footprint.
Leandro Lucarella [Mon, 20 Sep 2010 23:22:36 +0000]
Add (disabled) debug print of overridden options
Leandro Lucarella [Mon, 20 Sep 2010 00:04:50 +0000]
Make realloc() a little more readable
This is only an aesthetic change: some unneeded indentation is removed by
dropping "else" blocks where a return statement ensures that path
wouldn't be followed anyway.
Leandro Lucarella [Sun, 19 Sep 2010 23:53:19 +0000]
Abstract how we know if a collection is in progress
Leandro Lucarella [Thu, 16 Sep 2010 03:35:35 +0000]
Accommodate the heap size to the working set size
The GC can have a lot of pressure if a collection recovers very little
memory, but enough to fulfill the current memory request, causing a lot of
collections with too little gain (when the key to an efficient GC is to
recover as much memory as possible with as little work as possible).
This effect is greatly reduced when using eager allocation, because
eventually the GC will allocate more memory, but heap minimization can
trigger this effect again.
This patch adds an option, min_free, which specifies the minimum
percentage of heap that should be free after a collection. If the free
heap is less than min_free% of the total heap, the GC will allocate a new
pool, big enough to fulfill that requirement. If the free heap is bigger
than min_free% of the total heap, the GC will try to release some pools to
the OS to keep the heap occupancy near min_free%.
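The min_free rule above can be sketched as a small calculation. This is
only an illustration, not the GC's code: the function name and the
byte-granular result are assumptions (the real collector works with whole
pools), and it only shows how much memory would have to be added or could
be released to land on the min_free% target.

```python
def heap_adjustment(total, free, min_free_pct):
    """Return how many bytes to add (positive) or release (negative).

    Adding x bytes of fresh memory changes the free ratio to
    (free + x) / (total + x); solving for ratio == min_free_pct / 100
    gives x = (pct * total - 100 * free) / (100 - pct).
    """
    if not 0 <= min_free_pct < 100:
        raise ValueError("min_free_pct must be in [0, 100)")
    num = min_free_pct * total - 100 * free
    den = 100 - min_free_pct
    if num > 0:
        # Too little free memory: round up so the target is really met.
        return -(-num // den)
    # Enough free memory: round down so we never release too much.
    return -((-num) // den)
```

With a 100-byte heap, 2 bytes free and min_free=5, the sketch asks for 4
more bytes; with 50 bytes free it allows releasing 47.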
Leandro Lucarella [Wed, 15 Sep 2010 01:15:29 +0000]
Try to keep the memory usage low more aggressively
Memory usage is minimize()d only when a big allocation is done. This could
be problematic for applications that only perform small object
allocations, as their memory usage could grow a lot, especially if the eager
allocation option is used.
Trigger memory minimization even for small object allocations to avoid
that pathological case.
Leandro Lucarella [Wed, 15 Sep 2010 00:14:17 +0000]
Make bigAlloc() a little bit more readable
bigAlloc() is implemented as a weird state machine that even has some paths
that are not only unreadable, but useless.
The new implementation is still not ideal, but at least a human can read
it and even understand what it's doing.
Leandro Lucarella [Tue, 14 Sep 2010 23:28:35 +0000]
Use a function to round up sizes to PAGESIZE
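A rounding helper of this kind usually looks like the sketch below. The
name round_up_to_page and the 4096-byte PAGESIZE are assumptions for
illustration, not taken from the GC source.

```python
PAGESIZE = 4096  # assumed page size, for illustration only


def round_up_to_page(size):
    """Round size up to the next multiple of PAGESIZE."""
    return (size + PAGESIZE - 1) // PAGESIZE * PAGESIZE
```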
Leandro Lucarella [Tue, 14 Sep 2010 23:14:10 +0000]
Add pre_alloc configuration option
The new option is used to pre-allocate pools of memory at program start.
This could be useful if your program needs to load a starting working set
into memory before doing the actual work, as it might avoid some useless
initial collections while the program is just doing allocation but no
memory is yet freed.
The option takes a numeric parameter indicating the size of the initial
pool in MiB. It can optionally specify an arbitrary number of pools to
create by appending "x" and the numeric value of the number of pools. Each
pool will have the specified pool size.
For example, pre_alloc=50 will allocate an initial pool of 50 MiB, while
pre_alloc=5x10 will allocate 10 pools with 5 MiB each. Anything that can't
be parsed correctly (like "", "5x", "5a10", "5x10x", etc.) makes the
program start with no pre-allocated memory as usual.
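The accepted syntax can be sketched with a short parser. This is a
minimal illustration of the rules described in this message, not the
GC's own parser; the function name and the (size, count) return shape are
assumptions.

```python
import re


def parse_pre_alloc(value):
    """Parse "SIZE" or "SIZExCOUNT" into (pool_size_mib, n_pools).

    Returns None for anything malformed, which stands for "start with no
    pre-allocated memory".
    """
    match = re.fullmatch(r"(\d+)(?:x(\d+))?", value)
    if match is None:
        return None
    size = int(match.group(1))
    # A missing "xCOUNT" suffix means a single pool.
    count = int(match.group(2)) if match.group(2) is not None else 1
    return (size, count)
```

Here parse_pre_alloc("50") gives (50, 1) and parse_pre_alloc("5x10")
gives (5, 10), while "", "5x", "5a10" and "5x10x" all give None.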
Leandro Lucarella [Thu, 9 Sep 2010 23:05:55 +0000]
Don't segfault if stats files can't be created
If any stats files (used by collect_stats_file and malloc_stats_file)
can't be created, the program segfaults trying to use a null FILE*.
It's extremely impolite to cause a strange segfault because of this, and
since there is no sensible error reporting mechanism either, we just
ignore those options if the selected files are not writable, as we do
with unknown options or any other wrong option parameters.
Leandro Lucarella [Thu, 9 Sep 2010 21:33:56 +0000]
Change 'no_fork' option to 'fork'
Even though using fork is a little more verbose, since fork=0 must be
specified to disable forking, it's more consistent with other options.
Leandro Lucarella [Thu, 9 Sep 2010 03:17:16 +0000]
Add eager allocation support when fork()ing
Eager allocation consists of allocating a new pool when a collection is
triggered (because an allocation failed to find enough free space in the
current pools). This enables the mutator to keep running while the mark
phase runs in a fork()ed process in parallel. This completes the concurrent
GC, decreasing the maximum pause time (not only the stop-the-world time)
dramatically (almost by 2 orders of magnitude).
As a side effect, the total run-time is greatly reduced too because the GC
pressure is reduced due to the extra allocated pools. The number of
collections needed by a program can be reduced 3 to 9 times depending on
the load, which explains the total run-time reduction, even for
single-core environments.
To allow the mutator to run in parallel with the mark phase, the freebits
of free pages must be set for *all* the possible block sizes, not only for
the start of the page, because the freebits are used as a starting point
for mark bits and the bin size of a freed page can be changed *after* the
mark phase is started, resulting in an inconsistency between mark bits and
free bits. Pools allocated during a parallel mark set all the mark bits
to avoid the sweep phase freeing those pages after the mark is done.
Leandro Lucarella [Mon, 6 Sep 2010 02:16:26 +0000]
Remove unused code for buggy OSs
Leandro Lucarella [Sun, 5 Sep 2010 22:01:52 +0000]
Build the freebits bit set incrementally
As a side effect, we can no longer copy the freebits as the starting mark
bits because we don't keep the freebits updated so precisely; doing so
would need a bit more work.
Since the freebits can be constructed in the mark phase, which we can do
in parallel, this might not be as useful as thought at first.
Leandro Lucarella [Mon, 6 Sep 2010 02:02:45 +0000]
Avoid output duplication because of FILE* buffers
FILE* buffers are duplicated when fork()ing, and at program exit(), the C
library flushes the FILE* buffers, resulting in duplicated output.
To avoid this we flush all FILE* buffers before fork()ing.
Leandro Lucarella [Sat, 4 Sep 2010 01:13:06 +0000]
Avoid redundant checks for finals bits
In changeset b28fd72842fc9ce935bed74f7b2ba79f9cc59711 (Run the mark phase
in a fork()ed process) we inadvertently changed the lazy allocation of the
finals bit set to eager allocation.
This change left a lot of finals bit set initialization checks that are
not really needed when using eager allocation.
We remove these redundant checks as we decided to go with eager
allocation to trade some space for a little more speed: the extra
checks take time, and it is very rare that a whole pool doesn't have any
blocks that need finalization, making the space saving very rare too.
Lazy allocation can also hurt locality of reference, as it is more likely
that all the bit sets are allocated close together, except for the lazily
allocated one.
Leandro Lucarella [Sat, 4 Sep 2010 01:14:08 +0000]
Improve GCBits invariant
Leandro Lucarella [Sat, 4 Sep 2010 01:07:30 +0000]
Clean the cache in the sweep phase
Leandro Lucarella [Sat, 4 Sep 2010 01:04:46 +0000]
Ensure getInfo() gets a valid base pointer
Leandro Lucarella [Thu, 2 Sep 2010 02:50:28 +0000]
Sync the pool block size cache properly
There are some places where the block size changes, but the pool's block
size cache is not updated, which may cause wrong size reporting later.
This patch clears and updates the cache in the places where this
synchronization was missing.
Leandro Lucarella [Sun, 29 Aug 2010 05:29:59 +0000]
Use integer division to calculate the bit position
Any decent compiler can optimize the division by a power of 2 to a shift.
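The equivalence the message relies on can be checked with a small
sketch. The 32-bit word width and both function names are assumptions
for illustration; the point is only that dividing by a power of two and
shifting give the same word index and bit offset.

```python
BITS_PER_WORD = 32  # assumed word width; must be a power of two


def bit_position(i):
    """Word index and bit offset of bit i, using division and modulo."""
    return i // BITS_PER_WORD, i % BITS_PER_WORD


def bit_position_shift(i):
    """Same result with the shift and mask a compiler would emit."""
    return i >> 5, i & 31  # 2**5 == 32
```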
Leandro Lucarella [Sun, 29 Aug 2010 05:29:37 +0000]
Avoid double initialization of stack variable
Leandro Lucarella [Sun, 29 Aug 2010 05:28:45 +0000]
Use bool instead of uint for a boolean variable
Leandro Lucarella [Sun, 29 Aug 2010 02:33:13 +0000]
Cache B_FREE pages also when marking
The small page cache used when marking was only exploited by B_PAGE and
B_PAGEPLUS pages, while B_FREE pages can benefit from it too.
Leandro Lucarella [Sat, 28 Aug 2010 02:46:25 +0000]
Fix some style issues
No functional changes are done, only bogus casts and commented out code
get removed and style issues fixed.
Leandro Lucarella [Sat, 28 Aug 2010 02:45:02 +0000]
Store a pointer to the pool in the free_list
Since the smallest bin size is big enough to store 2 pointers, the free
list is constructed storing the pointer to the pool the bin belongs to,
so we don't have to find the pool when operating with the free bin.
To make this work, the pools have to be stored outside the DynArray (which
now stores only pointers) because a sorted insert moves pools, changing
their addresses and breaking the pool pointers stored in the free list.
Leandro Lucarella [Thu, 26 Aug 2010 21:53:19 +0000]
Check the sentinel invariant in release builds too
When building a release, the sentinel invariant isn't checked, even if the
"sentinel" option is used. This patch always checks the invariant when
the option is activated, abort()ing the program if the invariant fails.
Leandro Lucarella [Thu, 26 Aug 2010 01:21:59 +0000]
Improve opts unit tests
Leandro Lucarella [Thu, 26 Aug 2010 01:21:32 +0000]
Run the mark phase in a fork()ed process
This is the first big step towards a concurrent GC. The mark phase is run
in a fork()ed process and the world is only stopped to do the fork()
because we need each thread to dump the CPU registers into the stack to
be scanned.
Forking is controlled via the option "no_fork" (which is false by default).
If not enough support from the underlying OS is found (i.e. no fork() or
no shared memory) or if fork() fails, the mark phase falls back to running
in the same process as the mutator (as it was done before this patch).
The mark and freebits bitmaps are shared between the two processes to
communicate the results of the mark phase. The freebits need not be
shared, but in that case they would have to be set in the mutator process,
making pauses longer. Freebits should be revisited though.
Leandro Lucarella [Mon, 23 Aug 2010 00:40:17 +0000]
Move marking phase to a separate function
Leandro Lucarella [Mon, 23 Aug 2010 00:38:50 +0000]
Move sweeping phase to a separate function
Leandro Lucarella [Fri, 20 Aug 2010 01:48:57 +0000]
Allow testing for fork() availability
The concurrent GC will fork() to run the collection, so it needs to know if
the underlying OS supports it. This patch renames the alloc module to os
to group all needed OS abstractions in one module.
Leandro Lucarella [Fri, 20 Aug 2010 01:40:30 +0000]
Allow mapping shared memory to allocate bitsets
The concurrent GC will fork() to run the collection, so it needs to share
the mark bits to let the original process know the results of the mark
phase.
Leandro Lucarella [Thu, 19 Aug 2010 23:17:49 +0000]
alloc: Use tango to access OS-API
Leandro Lucarella [Mon, 16 Aug 2010 15:41:54 +0000]
Remove unneeded static attribute
Leandro Lucarella [Fri, 6 Aug 2010 02:46:55 +0000]
Revert "Skip non-scanneable words in chunks"
This reverts commit 5578146600d4ace17878e3b010aa09efdb202fb4, because
doing so many extra tests proved to be a "pessimization" in practice.
Avoiding the extra bit operations does help a little though, so that
change is not reverted.
Leandro Lucarella [Mon, 2 Aug 2010 03:03:43 +0000]
Add a one element cache to Pool.findSize()
Caching the last findSize() result for big objects gives a huge saving in
programs with a lot of array appending.
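A one-element cache of this kind can be sketched as follows. The class
layout, the dict standing in for the real page-table walk, and the
slow_lookups counter are all assumptions made for illustration.

```python
class Pool:
    """Toy pool that remembers the last find_size() answer."""

    def __init__(self, sizes):
        self._sizes = sizes        # addr -> size, stands in for the slow walk
        self._cached_ptr = None    # last pointer looked up
        self._cached_size = 0      # size found for that pointer
        self.slow_lookups = 0      # counts how often the slow path runs

    def _find_size_slow(self, ptr):
        self.slow_lookups += 1
        return self._sizes[ptr]

    def find_size(self, ptr):
        # Repeated queries for the same block (e.g. array appending)
        # are answered from the cache.
        if ptr == self._cached_ptr:
            return self._cached_size
        size = self._find_size_slow(ptr)
        self._cached_ptr, self._cached_size = ptr, size
        return size

    def clear_cache(self):
        # Must be called whenever a block's size changes.
        self._cached_ptr = None
        self._cached_size = 0
```

Calling find_size() twice on the same pointer runs the slow path only
once; any code that changes a block's size has to call clear_cache().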
Leandro Lucarella [Mon, 2 Aug 2010 02:22:46 +0000]
Do a binary search in findPool()
Since the pool list searched by findPool() is sorted, a binary search is
more appropriate than a linear search.
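The idea can be sketched like this, assuming the pools are kept as
non-overlapping, half-open (base, top) address ranges sorted by base.
The function name and the tuple representation are illustrative only.

```python
def find_pool(pools, addr):
    """Return the index of the pool containing addr, or None.

    pools must be a list of (base, top) ranges sorted by base, with
    base <= addr < top meaning the address belongs to that pool.
    """
    lo, hi = 0, len(pools)
    while lo < hi:
        mid = (lo + hi) // 2
        base, top = pools[mid]
        if addr < base:
            hi = mid          # look in the lower half
        elif addr >= top:
            lo = mid + 1      # look in the upper half
        else:
            return mid        # base <= addr < top
    return None
```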
Leandro Lucarella [Mon, 2 Aug 2010 01:01:50 +0000]
Skip non-scanneable words in chunks
Leandro Lucarella [Mon, 2 Aug 2010 00:59:26 +0000]
Return the real size that can be used in getInfo()
getInfo() was not aware of the sentinel, returning a size larger than the
size usable by the user. There are probably more places where this
happens.
Leandro Lucarella [Sun, 1 Aug 2010 18:14:10 +0000]
Remove duplicated code in getInfo()
getInfo() just groups the information obtained via findBase(), findSize()
and getAttr(), so use those functions instead of duplicating the code.
Leandro Lucarella [Sun, 1 Aug 2010 18:09:14 +0000]
Minimize the use of findPool()
findPool() is one of the most used functions in the GC, usually taking 15%
of the GC time. This patch minimizes its use by converting the functions
findBase() and findSize() to Pool methods, avoiding calling findPool()
twice (in most cases, when calling findBase() or findSize() we already
know the pool).
Leandro Lucarella [Sat, 31 Jul 2010 16:44:06 +0000]
Group extern (C) declarations
Leandro Lucarella [Sat, 31 Jul 2010 16:37:45 +0000]
Move the locking to the C interface
Leandro Lucarella [Sat, 31 Jul 2010 02:43:53 +0000]
Convert methods to free functions
Making the GC an object makes no sense, since you can't instantiate it
more than once; it just makes the code unnecessarily extra indented.
The GC struct now only has attributes (several renamed) and they are only
grouped for clarity (and to make it easier to calculate the GC memory
overhead). All methods are converted to free functions that use a global
instance of the GC struct.
Leandro Lucarella [Fri, 30 Jul 2010 00:15:36 +0000]
Merge iface.d in gc.d
There is no reason to be in different files, really, it just makes things
harder.
Leandro Lucarella [Thu, 29 Jul 2010 23:55:36 +0000]
Rename the global lock and remove staticness
It doesn't make any sense to make it static, since the GC struct can't
be instantiated more than once anyway.
Leandro Lucarella [Thu, 29 Jul 2010 03:59:49 +0000]
Unify GC class and Gcx struct
For some unknown reason, the GC implementation was divided in 2, a GC
class and a Gcx struct. This patch unifies them in a GC struct. Only code is
moved (and stats are adjusted).
Leandro Lucarella [Wed, 28 Jul 2010 22:43:55 +0000]
Remove obsolete unused variables
Leandro Lucarella [Wed, 28 Jul 2010 20:41:36 +0000]
Make heap precise scanning optional
Now D_GC_OPTS accepts a new boolean option: conservative. When true, the
heap is scanned conservatively, even when type information is available.
The option defaults to false.
Leandro Lucarella [Wed, 28 Jul 2010 20:39:30 +0000]
opts: Fix parsing a single boolean option without args
Leandro Lucarella [Wed, 28 Jul 2010 19:36:16 +0000]
stats: Refactor code to avoid duplication
Leandro Lucarella [Wed, 28 Jul 2010 04:20:45 +0000]
stats: Add more type information to malloc logging
Leandro Lucarella [Wed, 28 Jul 2010 03:12:10 +0000]
stats: Log the pointer to the allocated memory
Leandro Lucarella [Wed, 28 Jul 2010 02:40:46 +0000]
Make heap scanning precise
This patch[1] is based on the patch provided by Vincent Lang (AKA wm4),
which was based on a patch[2] by David Simcha, both published in the bug
3463[3]. The patch needs a patched Tango runtime (which is part of the
patch[1] for the GC) and a patched[4] DMD to work.
The patched DMD passes type information about pointer locations to
allocations, a PointerMap. The PointerMap has a member bits, which is an
array of size_t elements. The first element is T.sizeof / size_t.sizeof,
where T is the type being allocated. The next elements are a bitmask
indicating words that should be scanned. After that, the next elements
store another bitmask with the information about pointers. A moving
collector could change the value of words marked as pointers, but not
words that should only be scanned (those words are not guaranteed to be
pointers), so a block could only be moved if it's only reachable by
pointers. The pointers information is not used yet by this patch, only the
scan bits are used.
The precise scanning algorithm is extremely simple, and needs a lot of
optimization, this patch was kept simple on purpose. Optimizations will
follow in separate patches.
A pointer to the type information is stored at the end of the allocated
blocks (only if the blocks should be scanned, blocks marked with NO_SCAN
don't need the type information). This wastes some space, and space that
then has to be scanned, which tends to decrease the performance quite
a bit.
[1] http://d.puremagic.com/issues/attachment.cgi?id=696
[2] http://d.puremagic.com/issues/attachment.cgi?id=489
[3] http://d.puremagic.com/issues/show_bug.cgi?id=3463
[4] http://d.puremagic.com/issues/attachment.cgi?id=700
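The PointerMap layout described in this message can be sketched as a
flat list: the type's size in words first, then the scan bitmask, then
the pointer bitmask. The 64-bit word width, the least-significant-bit-
first ordering inside each mask word, and all function names are
assumptions for illustration, not taken from the patch.

```python
WORD_BITS = 64  # assumed number of bits in a size_t


def mask_words(n_words):
    """Number of bitmask elements needed to cover n_words words."""
    return (n_words + WORD_BITS - 1) // WORD_BITS


def _bit(bits, offset, i):
    n_words = bits[0]
    if not 0 <= i < n_words:
        raise IndexError("word index out of range")
    word = bits[offset + i // WORD_BITS]
    return bool((word >> (i % WORD_BITS)) & 1)


def must_scan(bits, i):
    """True if word i of the object has to be scanned."""
    return _bit(bits, 1, i)


def is_pointer(bits, i):
    """True if word i is known to be a pointer (unused by this patch)."""
    return _bit(bits, 1 + mask_words(bits[0]), i)
```

For a three-word type with bits = [3, 0b101, 0b001], words 0 and 2 are
scanned, and only word 0 is a known pointer.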
Leandro Lucarella [Wed, 21 Jul 2010 16:45:27 +0000]
Improve variable names for block attributes
Leandro Lucarella [Wed, 9 Jun 2010 22:47:05 +0000]
Add statistics collection
Statistics measure these metrics.
For each collection:
* Time spent in the malloc that triggered the collection.
* Time spent with the world stopped.
* Time spent doing the collection.
* Memory info before and after the collection: used, free, overhead and
wasted memory. Used is the memory used by the mutator, free is the
memory the mutator can request, overhead is the memory used by the
collector itself and wasted is memory that is not used by either the
mutator or collector and that can't be requested by the mutator either.
For each malloc() call:
* Time spent.
* Amount of memory requested.
* Attributes of the requested memory.
* A flag to tell if this call triggered a collection.
Statistics collection is controlled via the D_GC_OPTS environment
variable. To collect malloc statistics, use the option
malloc_stats_file, the value is the path to the file where to store the
malloc statistics (the contents will be replaced). To collect garbage
collection statistics, use the option collect_stats_file, the value is
the path to the file where to store the collection statistics (the contents
will be replaced). The generated files are in CSV format and have
headers that make them self explanatory.
Leandro Lucarella [Mon, 19 Jul 2010 16:24:30 +0000]
Make the GC configurable at runtime via env vars
The GC offers a couple of options to debug memory problems, but they are
selectable only at compile-time. Since the GC is part of the compiler
runtime, it is not very common for users to recompile the GC when they have
a memory problem, so making these options always available is very
desirable.
This patch allows configuring the GC via environment variables. 4 options
are available: sentinel, mem_stomp, verbose and log file. Only the first
2 are implemented right now.
For example, to check a program using memory stomping and a sentinel, you
can run it like this (using sh):
$ D_GC_OPTS=mem_stomp=1:sentinel
As you can see, the value is optional for boolean options.
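The option syntax shown above (options separated by ":", with an
optional "=value") can be sketched with a tiny parser. This is a guess
at the format based only on the example, not the GC's real parser; the
function name and the choice to map valueless options to True are
assumptions.

```python
def parse_gc_opts(raw):
    """Split a D_GC_OPTS-style string into a dict of options."""
    opts = {}
    for item in raw.split(":"):
        if not item:
            continue  # tolerate empty segments such as "a::b"
        name, sep, value = item.partition("=")
        # Boolean options may omit the value entirely.
        opts[name] = value if sep else True
    return opts
```

For instance, parse_gc_opts("mem_stomp=1:sentinel") yields
{"mem_stomp": "1", "sentinel": True}.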
Leandro Lucarella [Sat, 3 Jul 2010 02:31:57 +0000]
Call memset() only for large enough chunks of data
Calling memset() for small memory chunks can be expensive compared to
a simple loop.
Leandro Lucarella [Wed, 30 Jun 2010 13:58:46 +0000]
Use a DynArray to store the memory pools
Leandro Lucarella [Wed, 30 Jun 2010 13:58:20 +0000]
Use a few more initial elements by default
Moving from 4 to 16 can improve the performance a little for short lived
programs.
Leandro Lucarella [Wed, 30 Jun 2010 13:57:18 +0000]
Use a custom dynamic array to store roots and ranges
Leandro Lucarella [Tue, 29 Jun 2010 00:09:55 +0000]
Remove Gcx destructor
There is no point in freeing memory as the OS will do it for us.
Leandro Lucarella [Tue, 22 Jun 2010 03:39:50 +0000]
Comment why we avoid calling free with null
Even when free() can be called with a null pointer, the extra call might
be significant. On hard GC benchmarks making the test for null in the GC
code (i.e. avoiding the free() call) can reduce the GC time by almost 5%.
Leandro Lucarella [Mon, 21 Jun 2010 23:13:51 +0000]
Remove debug LOGGING code
This code will be superseded by the statistic collection code, and it was
unmaintained and very probably broken (for example, the file and line
number was never filled in).
Leandro Lucarella [Wed, 9 Jun 2010 22:48:51 +0000]
Add VIM modeline to avoid style errors
Leandro Lucarella [Wed, 9 Jun 2010 22:41:47 +0000]
Use tango bindings to C standard library functions
As we need to use more libraries it became less practical to maintain our
own set of bindings, and since the GC only works with Tango, it makes
sense to just use Tango bindings.
Leandro Lucarella [Sun, 30 May 2010 23:13:06 +0000]
Remove PRINTF debug statements
Leandro Lucarella [Sun, 30 May 2010 01:45:06 +0000]
Fix minor coding style issues
Leandro Lucarella [Sun, 30 May 2010 01:44:49 +0000]
Use more explicit imports
Leandro Lucarella [Sat, 29 May 2010 23:21:30 +0000]
Add missing import for DMD
Leandro Lucarella [Sat, 29 May 2010 23:11:07 +0000]
Move the modules to package rt.gc.cdgc
Tango 0.99.9 uses this package scheme, so we follow it for easier
integration.
Leandro Lucarella [Sat, 29 May 2010 23:02:24 +0000]
Minor formatting fixes
Leandro Lucarella [Sat, 29 May 2010 22:54:04 +0000]
Add weak reference support for Tango 0.99.9
Leandro Lucarella [Thu, 21 Jan 2010 02:38:38 +0000]
Remove the MULTI_THREADED version
This will be an inherently concurrent GC, so having a non-threaded version
of it makes no sense. Even more, I think the non-threaded version doesn't
even compile.
Leandro Lucarella [Sun, 17 Jan 2010 21:29:52 +0000]
Remove (un)committed pages distinction
This distinction is only made by Windows, and adds an extra complexity
that probably isn't worth it (especially for other OSs, where this adds
a little overhead too, in both space and time).
Other OSs (like Linux) even do all the committing automatically, under the
hood, see:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;\
f=Documentation/vm/overcommit-accounting;hb=HEAD
Leandro Lucarella [Sun, 17 Jan 2010 00:48:12 +0000]
Make allocation functions that can fail return bool
There is no point in returning int, since no error code is returned, just
failure or success.
Leandro Lucarella [Sun, 17 Jan 2010 00:14:07 +0000]
Fix spacing style
Leandro Lucarella [Sun, 17 Jan 2010 00:01:04 +0000]
Declare public allocation API
Declare the public API and move comments to the declaration, to avoid
duplication and let ddoc document the functions for all versions.
Leandro Lucarella [Sat, 16 Jan 2010 23:45:22 +0000]
Remove valloc() allocation method
Almost any system that supports valloc() supports mmap(), and being
a deprecated function, it doesn't make much sense to maintain it as an
allocation method.
Leandro Lucarella [Sat, 16 Jan 2010 23:43:54 +0000]
Make sure MAP_ANON exists when using mmap()
Leandro Lucarella [Sat, 16 Jan 2010 23:40:15 +0000]
Remove commented out code
Leandro Lucarella [Sat, 16 Jan 2010 00:29:31 +0000]
Remove Tango dependency
To avoid Tango dependency, we need to write our own C-API interface. This
is done in the new gc.libc module. In the future, maybe this module will
use Tango or Phobos accordingly, but for now we stay free of dependencies (at
the expense of some extra work).
Leandro Lucarella [Sat, 16 Jan 2010 00:34:25 +0000]
Remove debug version THREADINVARIANT
The code seemed to be broken, since the self thread ID was stored at
initialization and then asserted that the GC always runs from that thread,
which seems far from reality (the GC can be invoked by any thread).
The PRINTF version now doesn't print the current thread ID either.
Leandro Lucarella [Thu, 14 Jan 2010 02:29:33 +0000]
Rename module names to make more sense
Leandro Lucarella [Sun, 3 Jan 2010 18:20:21 +0000]
Make gc a package
Leandro Lucarella [Thu, 24 Dec 2009 23:22:19 +0000]
Add a "clean" target to the Makefile
Leandro Lucarella [Thu, 24 Dec 2009 23:20:57 +0000]
Put built stuff in a separated build directory
Leandro Lucarella [Thu, 24 Dec 2009 23:16:01 +0000]
Add a wrapper script to run programs using CDGC
Leandro Lucarella [Thu, 24 Dec 2009 23:10:28 +0000]
Remove redundant "private" from import statements
Since a long while ago, imports are "private" by default.
Leandro Lucarella [Thu, 24 Dec 2009 22:56:49 +0000]
Concurrent D Garbage Collector initial commit
The Concurrent D Garbage Collector (CDGC) is based on the "basic" garbage
collector from the Tango runtime. This first commit is a copy of this GC,
as it is in Tango 0.99.8.
The CDGC is designed only for Linux, at least for now.