Wednesday, June 24, 2015

Garbage In, Garbage In, Garbage In

Many projects in the Apache ecosystem run on Java.  One of the places developers spend time when chasing performance issues is the Java Virtual Machine's (JVM) garbage collection options.  When the heap fills up, garbage is collected.

In the past, I have seen .NET apps improve performance by explicitly calling the garbage collector, especially when dealing with black-box code that doesn't dispose of its objects nicely or bloats memory due to poor design.  I have also seen it destroy performance for every .NET application on the machine.
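The JVM has the same trap door.  A sketch of what the explicit call looks like in Java — note that, unlike .NET's GC.Collect(), System.gc() is documented as only a *suggestion*, and the JVM is free to ignore it entirely (for example when started with -XX:+DisableExplicitGC):

```java
public class ExplicitGc {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long before = rt.totalMemory() - rt.freeMemory();

        // Churn out some short-lived garbage.
        for (int i = 0; i < 100_000; i++) {
            String s = new String("garbage-" + i);
        }

        // Only a hint: the JVM may run a full collection right now,
        // or do nothing at all (e.g. with -XX:+DisableExplicitGC).
        System.gc();

        long after = rt.totalMemory() - rt.freeMemory();
        System.out.println("used before: " + before + ", after: " + after);
    }
}
```

Because it is only a hint, sprinkling System.gc() through a codebase buys unpredictable pauses far more often than it buys memory back.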

In .NET 4.6 RC,

So it seems people are still trying to trick the garbage truck into showing up on the wrong day to pick up that rusty mattress or old toilet, or to make sure the garbage truck doesn't pass by while they're running out the door with a million Glad bags.

http://stackoverflow.com/questions/118633/whats-so-wrong-about-using-gc-collect

At this point, suppose that performance plays a fundamental role and the slightest alteration in the program's flow could bring catastrophic consequences. Object creation is then reduced to the minimum possible by using object pools and the such but then, the GC chimes in unexpectedly and throws it all away, and someone dies.

Well that got dark really fast, stackoverflow.
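The object-pool trick that answer mentions can be sketched in a few lines: reuse instances instead of allocating fresh ones, so the nursery fills more slowly and the GC has less reason to chime in.  This is a toy, single-threaded sketch (real pools, e.g. Apache Commons Pool, handle thread safety, sizing, and eviction):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Minimal object pool: borrow() hands back a previously returned
// buffer instead of allocating a new one, keeping allocation (and
// therefore GC pressure) down in a hot loop.
public class BufferPool {
    private final Deque<byte[]> free = new ArrayDeque<>();
    private final int bufferSize;

    public BufferPool(int bufferSize) { this.bufferSize = bufferSize; }

    public byte[] borrow() {
        byte[] b = free.poll();
        return (b != null) ? b : new byte[bufferSize];
    }

    public void giveBack(byte[] b) { free.push(b); }

    public static void main(String[] args) {
        BufferPool pool = new BufferPool(4096);
        byte[] a = pool.borrow();   // first call allocates
        pool.giveBack(a);
        byte[] b = pool.borrow();   // second call reuses the same array
        System.out.println("reused: " + (a == b));   // prints "reused: true"
    }
}
```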

Oracle has a good document on the concepts of the heap and the nursery.  When the nursery fills up, the older ones move on to public school.  When public school fills up, the oldest are forced out into the real world.

https://docs.oracle.com/cd/E13150_01/jrockit_jvm/jrockit/geninfo/diagnos/garbage_collect.html
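That nursery-to-public-school promotion can be modeled in a toy simulation.  This is a deterministic sketch, not how any real collector works: here exactly every other object "dies young" each minor collection, and the promotion threshold of 3 stands in for what HotSpot tunes with -XX:MaxTenuringThreshold:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of generational collection: young objects that survive
// TENURING_THRESHOLD minor GCs get promoted to the old generation.
public class NurserySim {
    static final int TENURING_THRESHOLD = 3;
    static List<Integer> nursery = new ArrayList<>(); // ages of young objects
    static int oldGen = 0;                            // count of tenured objects

    static void minorGc() {
        List<Integer> survivors = new ArrayList<>();
        for (int i = 0; i < nursery.size(); i++) {
            if (i % 2 != 0) continue;                 // toy stand-in for "died young"
            int age = nursery.get(i) + 1;             // survivor ages one collection
            if (age >= TENURING_THRESHOLD) oldGen++;  // promoted: off to public school
            else survivors.add(age);
        }
        nursery = survivors;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) nursery.add(0); // allocate young objects
        for (int gc = 0; gc < 5; gc++) minorGc();
        System.out.println("young=" + nursery.size() + " tenured=" + oldGen);
        // prints "young=0 tenured=125"
    }
}
```

The point of the model: most objects never make it out of the nursery, which is exactly why minor collections are cheap and why promoting too much too early is what hurts.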

Databricks (the Spark folks) and Intel recently posted a great article about how GC works with Spark and how to tune Spark instances for optimized JVM garbage collection, which inspired (and augmented some content for) this post.

https://databricks.com/blog/2015/05/28/tuning-java-garbage-collection-for-spark-applications.html
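The short version of their advice is to try the G1 collector on the executors and turn on GC logging so you can see what the collector is actually doing.  A sketch of what that looks like on a spark-submit command line (the jar name is illustrative, and these flags are a starting point for measurement, not a recommendation):

```shell
# Illustrative: G1 collector plus GC logging for Spark executors.
spark-submit \
  --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCDateStamps" \
  my-app.jar
```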

$500 in Google Cloud Credit with Free MapR Hadoop Training

What does MapR get from Google?  $110 million in capital financing.

What do you get with a $500 free Google Compute Engine credit and MapR training?  Apparently quite a bit...

Compute Engine
5 x Servers

  • 434.524 total hours per month
  • VM class: Regular
  • Instance type: n1-highmem-16
  • Region: United States
  • Total Estimated Cost: $438.00
Persistent Disk
  • SSD storage: 0 GB
  • Storage: 100 GB
  • Snapshot storage: 0 GB
  • Subtotal: $4.00
GCE Network Bandwidth
  • Egress - Americas/EMEA: 200 GB
  • Egress - Asia/Pacific: 0 GB
  • Egress - Australia: 0 GB
  • Egress - China: 0 GB
  • Google Cloud Interconnect United States: 0 GB
  • Google Cloud Interconnect Europe: 0 GB
  • Google Cloud Interconnect Asia/Pacific: 0 GB
  • Egress to a different Zone in the same Region: 0 GB
  • Egress to a different Region within the US: 0 GB
  • Subtotal: $24.00

Monthly total: $466.00

If you don't want 104 GB of RAM per node and five servers in your cluster, you could be a peon and buy some preemptible instances to go the cheaper route.

Hadoop / HBase / Drill Training Link here...
https://www.mapr.com/company/press-releases/mapr-collaborates-google-cloud-platform-offer-500-credit-resources-mapr-fre-0

Sandbox VM download here
https://www.mapr.com/products/mapr-sandbox-hadoop/download-sandbox-drill