Wednesday, June 24, 2015

Garbage In, Garbage In, Garbage In

Many projects in the Apache ecosystem run on Java.  One of the places developers spend time when chasing performance issues is the Java Virtual Machine's (JVM) garbage collection options.  When the heap fills up, garbage is collected.

In the past, I have seen .NET apps improve performance by explicitly calling the garbage collector, especially when dealing with black-box code that doesn't dispose of its objects nicely or bloats memory due to poor design.  I have also seen it destroy performance for every .NET application on the machine.
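The JVM has the same trap door.  A sketch of what the explicit call looks like in Java — note that, unlike .NET's GC.Collect(), System.gc() is documented as only a *suggestion*, and the JVM is free to ignore it entirely (for example when started with -XX:+DisableExplicitGC):

```java
public class ExplicitGc {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long before = rt.totalMemory() - rt.freeMemory();

        // Churn out some short-lived garbage.
        for (int i = 0; i < 100_000; i++) {
            String s = new String("garbage-" + i);
        }

        // Only a hint: the JVM may run a full collection right now,
        // or do nothing at all (e.g. with -XX:+DisableExplicitGC).
        System.gc();

        long after = rt.totalMemory() - rt.freeMemory();
        System.out.println("used before: " + before + ", after: " + after);
    }
}
```

Because it is only a hint, sprinkling System.gc() through a codebase buys unpredictable pauses far more often than it buys memory back.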

In .NET 4.6 RC,

So it seems people are still trying to trick the garbage truck into showing up on the wrong day to pick up that rusty mattress or old toilet, or to make sure the garbage truck doesn't pass by while they're running out the door with a million Glad bags.

http://stackoverflow.com/questions/118633/whats-so-wrong-about-using-gc-collect

At this point, suppose that performance plays a fundamental role and the slightest alteration in the program's flow could bring catastrophic consequences. Object creation is then reduced to the minimum possible by using object pools and the such but then, the GC chimes in unexpectedly and throws it all away, and someone dies.

Well that got dark really fast, stackoverflow.
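The object-pool trick that answer mentions can be sketched in a few lines: reuse instances instead of allocating fresh ones, so the nursery fills more slowly and the GC has less reason to chime in.  This is a toy, single-threaded sketch (real pools, e.g. Apache Commons Pool, handle thread safety, sizing, and eviction):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Minimal object pool: borrow() hands back a previously returned
// buffer instead of allocating a new one, keeping allocation (and
// therefore GC pressure) down in a hot loop.
public class BufferPool {
    private final Deque<byte[]> free = new ArrayDeque<>();
    private final int bufferSize;

    public BufferPool(int bufferSize) { this.bufferSize = bufferSize; }

    public byte[] borrow() {
        byte[] b = free.poll();
        return (b != null) ? b : new byte[bufferSize];
    }

    public void giveBack(byte[] b) { free.push(b); }

    public static void main(String[] args) {
        BufferPool pool = new BufferPool(4096);
        byte[] a = pool.borrow();   // first call allocates
        pool.giveBack(a);
        byte[] b = pool.borrow();   // second call reuses the same array
        System.out.println("reused: " + (a == b));   // prints "reused: true"
    }
}
```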

Oracle has a good document on the concepts of the heap and the nursery.  When the nursery fills up, the older ones move on to public school.  When public school fills up, the oldest are forced out into the real world.

https://docs.oracle.com/cd/E13150_01/jrockit_jvm/jrockit/geninfo/diagnos/garbage_collect.html
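That nursery-to-public-school promotion can be modeled in a toy simulation.  This is a deterministic sketch, not how any real collector works: here exactly every other object "dies young" each minor collection, and the promotion threshold of 3 stands in for what HotSpot tunes with -XX:MaxTenuringThreshold:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of generational collection: young objects that survive
// TENURING_THRESHOLD minor GCs get promoted to the old generation.
public class NurserySim {
    static final int TENURING_THRESHOLD = 3;
    static List<Integer> nursery = new ArrayList<>(); // ages of young objects
    static int oldGen = 0;                            // count of tenured objects

    static void minorGc() {
        List<Integer> survivors = new ArrayList<>();
        for (int i = 0; i < nursery.size(); i++) {
            if (i % 2 != 0) continue;                 // toy stand-in for "died young"
            int age = nursery.get(i) + 1;             // survivor ages one collection
            if (age >= TENURING_THRESHOLD) oldGen++;  // promoted: off to public school
            else survivors.add(age);
        }
        nursery = survivors;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) nursery.add(0); // allocate young objects
        for (int gc = 0; gc < 5; gc++) minorGc();
        System.out.println("young=" + nursery.size() + " tenured=" + oldGen);
        // prints "young=0 tenured=125"
    }
}
```

The point of the model: most objects never make it out of the nursery, which is exactly why minor collections are cheap and why promoting too much too early is what hurts.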

Databricks (the Spark folks) and Intel recently posted a great article about how GC works with Spark and how to tune Spark instances for optimized JVM garbage collection, which inspired (and augmented some content for) this post.

https://databricks.com/blog/2015/05/28/tuning-java-garbage-collection-for-spark-applications.html
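The short version of their advice is to try the G1 collector on the executors and turn on GC logging so you can see what the collector is actually doing.  A sketch of what that looks like on a spark-submit command line (the jar name is illustrative, and these flags are a starting point for measurement, not a recommendation):

```shell
# Illustrative: G1 collector plus GC logging for Spark executors.
spark-submit \
  --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCDateStamps" \
  my-app.jar
```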

$500 in Google Cloud Credit with Free MapR Hadoop Training

What does MapR get from Google?  $110 million in capital financing.

What do you get with a $500 free Google Compute Engine credit and MapR training?  Apparently quite a bit...

Compute Engine
5 x Servers

  • 434.524 total hours per month
  • VM class: Regular
  • Instance type: n1-highmem-16
  • Region: United States
  • Total Estimated Cost: $438.00
Persistent Disk
  • SSD storage: 0 GB
  • Storage: 100 GB
  • Snapshot storage: 0 GB
  • Subtotal: $4.00
GCE Network Bandwidth
  • Egress - Americas/EMEA: 200 GB
  • Egress - Asia/Pacific: 0 GB
  • Egress - Australia: 0 GB
  • Egress - China: 0 GB
  • Google Cloud Interconnect United States: 0 GB
  • Google Cloud Interconnect Europe: 0 GB
  • Google Cloud Interconnect Asia/Pacific: 0 GB
  • Egress to a different Zone in the same Region: 0 GB
  • Egress to a different Region within the US: 0 GB
  • Subtotal: $24.00

Monthly total: $466.00

If you don't want 104 GB of RAM per node and five servers in your cluster, you could be a peon and buy some preemptible instances to go the cheaper route.

Hadoop / HBase / Drill Training Link here...
https://www.mapr.com/company/press-releases/mapr-collaborates-google-cloud-platform-offer-500-credit-resources-mapr-fre-0

Sandbox VM download here
https://www.mapr.com/products/mapr-sandbox-hadoop/download-sandbox-drill