Elasticsearch Best Practices

These are my notes from managing a 100+ node Elasticsearch (ES) cluster, reading through various resources, and handling many production incidents caused by an unhealthy cluster.

Memory

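  • Always set ES_HEAP_SIZE to 50% of the total available memory. Sorting and aggregations can both be memory hungry, so the heap needs enough space to accommodate them, while the other half is left to Lucene through the operating system's filesystem cache. This property is set inside the /etc/init.d/elasticsearch file.
  • A machine with 64 GB of RAM is ideal; however, 32 GB and 16 GB machines are also common. Less than 8 GB tends to be counterproductive (you end up needing many small machines), and more than 64 GB runs into problems with pointer compression: the heap should stay below roughly 32 GB so that the JVM can keep using compressed object pointers, so half of a larger machine can no longer go to the heap.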
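A minimal sketch of this sizing rule as a shell helper is shown below. The function name `es_heap_mb` is hypothetical, and the 31 GB cap is an assumption: a commonly used value that keeps the heap safely below the ~32 GB point where the JVM stops using compressed object pointers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: suggest an ES_HEAP_SIZE value in MB for a machine
# with the given amount of RAM (in MB).
# Rule from the notes: half of the RAM goes to the heap, the rest is left to
# the OS filesystem cache. Assumption: cap at 31 GB to keep compressed oops.
es_heap_mb() {
  local total_mb=$1
  local cap_mb=$((31 * 1024))   # 31744 MB
  local half_mb=$((total_mb / 2))
  if (( half_mb > cap_mb )); then
    echo "$cap_mb"
  else
    echo "$half_mb"
  fi
}

# Usage on Linux (MemTotal in /proc/meminfo is in kB):
#   total_mb=$(awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo)
#   echo "ES_HEAP_SIZE=$(es_heap_mb "$total_mb")m"
```

For a 64 GB machine this yields 31744 MB rather than a full 32 GB, which is the trade-off the assumed cap is meant to make.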

CPU

  • Choose a modern processor with multiple cores. If you need to choose between faster CPUs or more cores, choose more cores. The extra concurrency that multiple cores offer will far outweigh a slightly faster clock speed. The number of threads is dependent on the number of cores. The more cores you have, the more threads you get for indexing, searching, merging, bulk, or other operations.

Disks

  • If you can afford SSDs, they are far superior to any spinning media. SSD-backed nodes see boosts in both querying and indexing performance.
  • Avoid network-attached storage (NAS) to store data.

Network

  • The faster the network you have, the more performance you will get in a distributed system. Low latency helps to ensure that nodes communicate easily, while a high bandwidth helps in shard movement and recovery.
  • Avoid clusters that span multiple data centers even if the data centers are collocated in close proximity. Definitely avoid clusters that span large geographic distances.

General consideration

  • Prefer medium-to-large boxes. Avoid small machines: you don’t want to manage a cluster with a thousand nodes, and the overhead of simply running Elasticsearch is more apparent on small boxes.
  • Always use a Java version later than JDK 1.7 update 55 from Oracle, and avoid using OpenJDK.
  • A master node does not require many resources. In a cluster with 2 TB of data and hundreds of indexes, 2 GB of RAM, 1 CPU core, and 10 GB of disk space are good enough for the master nodes. In the same scenario, client nodes with 8 GB of RAM and 2 CPU cores each are a very good configuration for handling millions of requests. The configuration of data nodes depends entirely on the indexing rate, the type of queries, and the aggregations; however, they usually need much larger machines, such as 64 GB of RAM and 8 CPU cores.

Some other important configuration changes

  • Assign Names: Assign the cluster name and node name.
  • Assign Paths: Assign the log path and data path.
  • Recovery Settings: Avoid shard shuffles during recovery. The recovery throttling settings should be tweaked only in large clusters; otherwise, the defaults are very good.
  • Disable the deletion of all the indices by a single command:
    action.disable_delete_all_indices: true
  • Ensure that you do not run more than one Elasticsearch instance from a single installation by setting the following property:
    node.max_local_storage_nodes: 1
  • Disable HTTP requests on all the data and master nodes in the following way:
    http.enabled: false
  • Plugin installations: Always install the plugin version that is compatible with the Elasticsearch version you are using, and do not forget to restart the node after installing the plugin.
  • Avoid storing Marvel indexes in the production cluster.
  • If the heap fills up when a node starts and shards refuse to initialize after the cluster goes into the red state, clear the cache with one of the following commands:
    To clear the cache of the complete cluster:
    curl -XPOST 'http://localhost:9200/_cache/clear'
    To clear the cache of a single index:
    curl -XPOST 'http://localhost:9200/index_name/_cache/clear'
  • Use routing wherever beneficial for faster indexing and querying.
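The settings above can be collected into a single elasticsearch.yml. The sketch below writes one from a shell function. The function name, the package-default paths, and the recovery numbers (recover after 8 of 10 expected nodes, or after 5 minutes) are illustrative assumptions. The setting names follow the Elasticsearch 1.x era these notes were written in, so verify them against the documentation for your version. The gateway.* settings are one way to apply the "avoid shard shuffles during recovery" advice: they delay recovery until enough nodes have joined.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the settings discussed above to an
# elasticsearch.yml file. Paths and node counts are example values.
write_es_config() {
  local path=$1 cluster=$2 node=$3
  cat > "$path" <<EOF
cluster.name: ${cluster}
node.name: ${node}
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch

# Avoid shard shuffling while a full-cluster restart is in progress.
gateway.recover_after_nodes: 8
gateway.expected_nodes: 10
gateway.recover_after_time: 5m

# Refuse "delete all indices" requests.
action.disable_delete_all_indices: true

# Only one Elasticsearch instance per installation.
node.max_local_storage_nodes: 1

# Only on master and data nodes; client nodes keep HTTP enabled.
http.enabled: false
EOF
}

# Usage:
#   write_es_config /etc/elasticsearch/elasticsearch.yml prod-logs data-node-1
```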

[Performance] : What does CPU% usage tell us?

When you come across a misbehaving system, most of the time the first metric we look at is CPU usage. But do we really understand what the CPU usage of a system tells us? In this article, let's try to understand what X% usage of a system really means.

One easy way to check CPU usage is the “top” command.

The “%Cpu(s)” line in top’s output breaks CPU time down into the following components:

  • us – Time spent in user space
  • sy – Time spent in kernel space
  • ni – Time spent running niced user processes (User defined priority)
  • id – Time spent idle
  • wa – Time spent waiting for I/O to complete (e.g. disk)
  • hi – Time spent handling hardware interrupts. (Whenever a peripheral device wants attention from the CPU, it literally pulls a line to signal the CPU to service it.)
  • si – Time spent handling software interrupts. (The interrupt routine is invoked by a piece of code rather than by a hardware line.)
  • st – Time a virtual CPU spent waiting involuntarily while the hypervisor was servicing another virtual processor (time “stolen” from the virtual machine)

Out of all these components, we usually concentrate on user time (us), system time (sy), and I/O wait time (wa). User time is the percentage of time the CPU spends executing application code, and system time is the percentage of time it spends executing kernel code. Note that system time is related to the application's activity: if the application performs I/O, for example, the kernel executes the code that reads the file from disk, and any time spent waiting on that I/O shows up as I/O wait. So us%, sy%, and wa% are related.

Now let’s check whether we understand this as a whole.

My goal as a Performance Engineer would be to drive the CPU usage as high as possible for as short a time as possible. Does that sound at odds with “best practice”? Hold that thought.

The first thing to know is that the CPU usage reported by any command is always an average over an interval of time. If an application consumes 30% CPU for 10 minutes, the code can be tuned to make it consume 60% for 5 minutes. Do you see what I mean by “driving the CPU as high as possible for as short a time as possible”? The same work finishes in half the time, which doubles the performance. Did the CPU usage increase? Sure. But is that a bad thing? No. The CPU is sitting there waiting to be used: use it and improve the performance. High CPU usage is not always a bad thing; it may just mean that your system is being used to its full potential, which is a good ROI. However, if the run-queue length keeps increasing, meaning requests are waiting for the CPU, then it definitely needs your attention.
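Since these numbers are averages over an interval, they can be recomputed from two snapshots of the aggregate cpu line in /proc/stat, which holds cumulative tick counters. The helper below is a hypothetical sketch that assumes the Linux field order (user, nice, system, idle, iowait, irq, softirq, steal). It ignores the trailing guest columns because the kernel already counts guest time in user and nice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print us/sy/wa/id percentages between two "cpu" lines
# taken from /proc/stat at different times.
cpu_breakdown() {
  local -a a b d
  read -r -a a <<< "$1"
  read -r -a b <<< "$2"
  # Index 0 is the "cpu" label; 1..8 are user nice system idle iowait irq softirq steal.
  local total=0 i
  for i in 1 2 3 4 5 6 7 8; do
    d[i]=$(( b[i] - a[i] ))
    total=$(( total + d[i] ))
  done
  (( total > 0 )) || { echo "no elapsed ticks" >&2; return 1; }
  awk -v us="${d[1]}" -v sy="${d[3]}" -v id="${d[4]}" -v wa="${d[5]}" -v t="$total" \
    'BEGIN { printf "us=%.1f%% sy=%.1f%% wa=%.1f%% id=%.1f%%\n", 100*us/t, 100*sy/t, 100*wa/t, 100*id/t }'
}

# Usage: sample the aggregate line one second apart.
#   s1=$(grep '^cpu ' /proc/stat); sleep 1; s2=$(grep '^cpu ' /proc/stat)
#   cpu_breakdown "$s1" "$s2"
```

A longer sampling interval smooths out short bursts, which is exactly why the same workload can look like "30% for 10 minutes" or "60% for 5 minutes".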

On Linux systems, the number of threads that are able to run (i.e. not blocked on I/O, sleeping, etc.) is referred to as the run-queue length. You can check it by running the “vmstat 1” command: the first column (r) in each line is the run queue.

If the number of runnable threads in the vmstat output is greater than the number of available CPUs (count hyper-threads if hyper-threading is enabled), threads are waiting for the CPU and performance will be less than optimal. A higher number is acceptable for a brief period, but if the run-queue length stays high for a significant amount of time, it is an indication that the system is overloaded.
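This check can be scripted by comparing vmstat's r column with the number of logical CPUs reported by nproc. The helper below is a hypothetical sketch, not a standard tool. It skips the two header lines and also the first data line, because vmstat's first report is an average since boot.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count vmstat samples whose run queue ("r" column)
# exceeds the given CPU count. Reads vmstat output on stdin.
runq_over_cpus() {
  local cpus=$1
  # NR 1-2 are headers, NR 3 is the since-boot average.
  awk -v cpus="$cpus" 'NR > 3 && $1 ~ /^[0-9]+$/ {
      n++
      if ($1 + 0 > cpus + 0) over++
    }
    END { printf "%d of %d samples had r > %d\n", over, n, cpus }'
}

# Usage: 20 one-second samples against the logical CPU count
# (nproc includes hyper-threads).
#   vmstat 1 20 | runq_over_cpus "$(nproc)"
```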

Conclusion :

  • High CPU usage of a system is not a bad sign all the time. CPU is available to be used. Use it and improve the performance of the running application.
  • If the run-queue length is high for a significant amount of time, the system is overloaded and needs optimization.

Weekly Bullet #27 – Summary for the week

Here are a bunch of Technical / Non-Technical topics that I came across recently and found very useful.

Technical :

  • Different states of Java Threads and their transitions. – Link
  • A quick look into Sorting in python – RealPython site link (3mins)
  • DevOps in one picture:
  • A cheat sheet to “When to use which collection in java” – here
source: http://www.sergiy.ca
  • A great talk on internals of List and Tuple in Python – YouTube (28mins)

Non-Technical :

  • A crisp explanation on Manager vs Director vs VP – link
TL;DR – Summary from resource link
  • How to learn complex things quickly – Link
  • Bayes’ Theorem and its trap. An intriguing play of numbers – YouTube link (10mins)
  • Extract from a book (a rather long one) :

Imagine that you are having an out-of-body experience, observing yourself on an operating table while a surgeon performs open heart surgery on you. That surgeon is trying to save your life, but time is limited so he is operating under a deadline—a literal deadline.

How do you want that doctor to behave? Do you want him to appear calm and collected?

Or do you want him sweating and swearing?
Do you want him behaving like a professional, or like a “typical developer”?

The Clean Coder by Robert C. Martin

Weekly Bullet #26 – Summary for the week

Here are a bunch of Technical / Non-Technical topics that I came across recently and found very useful.

Technical :

  • “Performance checklist for SREs” – By Brendan Gregg at SREcon16 . YouTube link (1hr)
  • Resource list for Beginner to Pro in Python. Link
  • Navigation in IntelliJ IDEA – this could save a lot of time once you know all the shortcuts. YouTube link (8mins)
  • Monitoring SRE’s Golden Signals – The metrics that matter and the ones we absolutely need to monitor. Link
  • All The Important Features and Changes in Python 3.10. Link
  • “What makes a Great Software Engineer?” – An IEEE paper on non-technical qualities of a great Software Engineer. Link

Non-Technical :

  • [Highly Recommended] : Henry Rollins: The One Decision that Changed My Life Forever | Big Think – YouTube link (7mins)
  • Now that most of us are working from home, here is mynoise.net for creating Quiet Animated Atmospheres. How to use – here
  • A great site for some fun riddles – here
  • Extract from a book :

“A professional is someone who may not have all the answers, but thoroughly studies their craft and seeks to hone their skills. A professional will freely admit when they don’t know the answer, but you can count on a professional to find it.”

Soft Skills by John Sonmez

Weekly Bullet #25 – Summary for the week

Here are a bunch of Technical / Non-Technical topics that I came across recently and found very useful.

Technical :

  • Amazon S3 on its 15th Birthday – It is Still Day 1 after 5,475 Days & 100 Trillion Objects. An article here.
  • A detailed Performance comparison of different programming languages / command-lines. Link here. (If you can’t read full article, go through the conclusion for insight)
  • The Amazon VP & CTO, Werner Vogels sits with Tom Killalea to discuss designing for evolution at scale. Article here.
  • ShortcutFoo is a site for spaced repetition of helpful shortcuts across tech stacks. Check it here.

Non-Technical :

  • Flameshot, an amazing multi-functional screenshot capturing tool. Check it here. Download link.
  • (Highly Recommended) : The context of “Why’s!” by Richard Feynman. Youtube link [Length – 7min]
  • Tim Ferriss podcast with Jordan Peterson (Canadian professor of psychology) as a guest. You can definitely learn new things here – link. [YouTube. Length – 1hr 20mins]
  • Extract from a book:

“One lesson I’ve learned is that if the job I do were easy, I wouldn’t derive so much satisfaction from it. The thrill of winning is in direct proportion to the effort I put in before. I also know, from long experience, that if you make an effort in training when you don’t especially feel like making it, the payoff is that you will win games when you are not feeling your best. That is how you win championships, that is what separates the great player from the merely good player. The difference lies in how well you’ve prepared.”

Rafael Nadal in Rafa

Weekly Bullet #24 – Summary for the week

Here are a bunch of Technical / Non-Technical topics that I came across recently and found very useful.

Technical :

  • An overview of iftop – a great visual tool for network traffic – here, also the man page here
  • Rust is becoming one of the most loved languages. Here is an Illustrated Note about WTF is Rust – link
  • “How They SRE” – best practices, tools, techniques, and culture of SRE adopted by leading technology and tech-savvy organizations – link
  • A single stop to find all upcoming Tech conferences in 2021 – link
  • A wiki on Unix Toolbox with all the commands and tasks useful for daily work in the Linux world. – link
  • “Python Tricks I cannot live without” – link
  • I am sure most of you follow HackerNews. Here is a great tool built using FlameGraphs to navigate through big threads on HN. – Link1 , Link2

Non-Technical :

  • A cool site where you can select the part of the body and find the relevant stretches and exercises here
  • An extract from something I am reading:

“Almost universally, the kind of performance we give on social media is positive. It’s more “Let me tell you how well things are going. Look how great I am.” It’s rarely the truth: “I’m scared. I’m struggling. I don’t know.”

Ryan Holiday, Ego Is the Enemy

Have a good week ahead.

[Performance] : Using iperf3 tool for Network throughput test

In this world of microservices and distributed systems, a single request (generally) hops through multiple servers before being served. More often than not, these hops also cross network cards, which can make network performance the source of slowness in the application.
This makes measuring the network performance between servers/systems all the more critical for benchmarking or debugging.

Iperf3 is one of the open source tools which can be used for network throughput measurement. Below are some of its features.

  • Iperf3 can be used for testing maximum TCP and UDP throughput  between two servers.
  • Iperf3 tests can also be run in a controlled way: instead of testing the maximum limits, they can generate a constant, lower level of network traffic for testing.
  • Iperf3 has options for parallel mode (-P), where multiple parallel client streams are used, setting CPU affinity (-A), setting the interval between periodic bandwidth reports (-i), setting the length of the buffer to read or write (-l), setting the target bandwidth (-b), etc.
  • More important than anything is the fact that iperf3 runs as an independent tool outside your application code. Its results remove any ambiguity about whether the application code is causing the network problems.

Installation of iperf3 tool:

sudo apt-get install iperf3	

The iperf3 tool has to be installed on both servers between which you want to measure network performance. One machine acts as the client and the other as the server.

Command to run on the server:

The command below, when run on one of the two servers under test, makes that machine act as the server for the iperf3 test.

iperf3 -s -f K
  • -s — runs in server mode
  • -f K — signifies the format as KBytes.
    Note : If you do not want to use the default port (which is 5201) for the test, then specify the port with the option -p in the above command and use the same on client as well.

Command to run on the client:

The command below, when run on the other server under test, sends traffic to the server and reports the network throughput based on the options used.

iperf3 -c 192.XX.XX.XX -f K
==== output ====

Connecting to host 192.XX.XX.XX, port 5201
[  4] local 192.XX.XX.XX port 50880 connected to 192.XX.XX.XX port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   678 MBytes  693729 KBytes/sec   74   1.06 MBytes
[  4]   1.00-2.00   sec   750 MBytes  767998 KBytes/sec    0   1.48 MBytes       
[  4]   2.00-3.00   sec   606 MBytes  620723 KBytes/sec  143   1.22 MBytes       
[  4]   3.00-4.00   sec   661 MBytes  677201 KBytes/sec    0   1.57 MBytes       
[  4]   4.00-5.00   sec   620 MBytes  634523 KBytes/sec    0   1.83 MBytes       
[  4]   5.00-6.00   sec   609 MBytes  623718 KBytes/sec  1095   1.44 MBytes       
[  4]   6.00-7.00   sec   730 MBytes  747525 KBytes/sec    0   1.76 MBytes       
[  4]   7.00-8.00   sec   716 MBytes  733224 KBytes/sec    0   2.04 MBytes       
[  4]   8.00-9.00   sec   772 MBytes  791192 KBytes/sec    0   2.29 MBytes       
[  4]   9.00-10.00  sec   944 MBytes  966472 KBytes/sec  212   1.63 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  6.92 GBytes  725627 KBytes/sec  1524 sender
[  4]   0.00-10.00  sec  6.92 GBytes  725350 KBytes/sec       receiver
  • -c — run in client mode
  • The ip mentioned is the ip of the server.
  • From the output (last two lines), it can be seen that the measured bandwidth between the two servers was about 725,350 KBytes/sec, i.e. roughly 708 MBytes/sec.
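To turn the receiver line into other units, a small awk filter is enough. The function below is a hypothetical sketch that assumes text output produced with -f K. It treats one KByte as 1024 bytes, which is consistent with the figures above (725350 KBytes/sec is about 708 MBytes/sec), and reports network-style decimal Gbit/s.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read iperf3 text output produced with "-f K" and print
# the receiver-side average in MBytes/sec and Gbit/s.
iperf3_receiver_summary() {
  awk '$NF == "receiver" {
      kb = ""
      # The rate is the field just before the "KBytes/sec" unit.
      for (i = 2; i <= NF; i++) if ($i == "KBytes/sec") kb = $(i - 1)
      if (kb != "") printf "%.0f MBytes/sec (%.2f Gbit/s)\n", kb / 1024, kb * 1024 * 8 / 1e9
    }'
}

# Usage:
#   iperf3 -c 192.XX.XX.XX -f K | iperf3_receiver_summary
```

For the run above, this reports roughly 5.94 Gbit/s, which makes it easy to compare against the NIC's nominal link speed.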

There are also various other options available for the iperf3 tool. For example, the command below runs the test for 60 seconds instead of the default 10 (-t 60), sets a target bandwidth of 10 Mbit/s, applied to each stream separately (-b 10M), and uses 10 parallel client streams (-P 10).

iperf3 -c 192.XX.XX.XX -t 60 -b 10M -P 10 --get-server-output
  • --get-server-output: fetches the server's output and prints it on the client terminal
  • If you want to use UDP instead of TCP, use the option -u
  • More details available here on man page – link

Below are some of the cases where I have used iperf3 for debugging purpose:

  • The application's throughput didn't scale, but there was no obvious resource contention in CPU, memory, or disk. Running “sar -n DEV 2 20” showed that network usage didn't peak above 30 MB/sec. Benchmarking with iperf3 showed that 30 MB/sec was the maximum network capacity between the servers.
  • When we wanted to find the impact of a VPN on network throughput, we used iperf3 for a comparative analysis.

Hope this gave you a sneak peek into the iperf3 tool’s capabilities and uses.
Happy tuning!

[Performance] : Java Thread Dumps – Part2

In the previous article about Java Thread Dumps (link here), we looked into a few basics of thread dumps (When to take them? How to take them? A sneak peek, etc.)

In this write up, I wanted to mention a few tools which can ease the process of collecting and analyzing thread dumps.

Collecting multiple thread dumps:

I prefer command-line over any APM tools for taking thread dumps. The best way for analyzing threads is to collect a few thread dumps (5 to 10) and look through the transition in the state of threads.

As mentioned in the previous article (link), one way is to use jstack, which is built into the JDK. The command below collects 10 thread dumps at 10-second intervals and writes them all to a single file, ThreadDump.jstack.

a=1; while [ $a -le 10 ]; do jstack -l <pid> >> ThreadDump.jstack && sleep 10; a=`expr $a + 1`; done

You can further split the 10 thread dumps into individual files using the csplit command. The command below looks for the line “Full thread dump OpenJDK”, which is printed at the start of each thread dump, and splits the file into individual dumps.

csplit -z ThreadDump.jstack /Full\ thread\ dump\ OpenJDK\ 64-Bit\ Server/ '{*}' 
ThreadDumps split into individual files
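Looking at state transitions across the split files is easier with a per-state count for each dump. The sketch below is a hypothetical helper built only on grep, awk, sort, and uniq. It relies on the "java.lang.Thread.State: ..." line that jstack prints under each Java thread, and on csplit's default output names (xx00, xx01, ...).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "STATE count" for every thread state found in
# one jstack dump file.
thread_state_counts() {
  grep -oE 'java\.lang\.Thread\.State: [A-Z_]+' "$1" \
    | awk '{ print $2 }' | sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage: compare the counts across the files produced by csplit.
#   for f in xx*; do echo "== $f"; thread_state_counts "$f"; done
```

A BLOCKED count that stays high across consecutive dumps is usually more telling than any single snapshot.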

Tools for visualizing and analyzing ThreadDumps:

To start with, there are many tools, like VisualVM, JProfiler, YourKit, and many online aggregators, for visualizing and analyzing thread dumps. Each one has its own pros and cons.
Here is a small list of available tools – link

I generally use the “Java Thread Dump Analyser” (JTDA) tool from here.

This tool is available as a jar, to which you can feed the thread dump file via the command line to visualize threads grouped by their states. You will appreciate this grouping if you have ever analyzed thread dumps with hundreds and hundreds of lines in vim.

java -jar jtda.jar <ThreadDumpFilePath>

Here are some of the features which I like about JTDA tool:

  • light weight and doesn’t need any configuration.
  • you can upfront get the count of threads in all states
  • you can only look in to threads that are in a certain state (Runnable / Waiting / Blocked etc)
  • threads with same stack are grouped together for ease of reading
  • “show thread name”, if checked, gives the name of the thread pool for better context.

As a footnote, when looking into thread dumps, it is very important to know the code paths and the request flows. This helps in finding the root cause of the issue and in reading and understanding thread dumps.

Happy tuning.

Weekly Bullet #23 – Summary for the week

Here are a bunch of Technical / Non-Technical topics that I came across recently and found very useful.

Technical :

  • BPF (Berkeley Packet Filter) has come a long way from being just a packet capture tool to an advanced performance analysis tool (eBPF – extended Berkeley Packet Filter). Here (link) is an introduction to eBPF. Also, here (link) is a talk on how BPF is used at Netflix.
  • “Minimal safe Bash script template” – link . Because there is no such thing as “knowing enough of bash!”
  • Kelsey Hightower is an inspiration. A writeup on how he made it from McDonald’s to Google (link). [HIGHLY RECOMMENDED] –> : A talk he gave about his journey a few years back here (link)
  • [That time of the year!] : “Best talks of 2020” — link
  • [Late news!] If you didn’t hear it already, Github has Dark mode now. – link

Non-Technical :

  • [Another one] “Ask HN: What book changed your life in 2020?” – some great recommendations here – link . Personally for me, “Sapiens” widened my horizon about evolution of Human Beings.
  • “100 Tips for better life.” – link – I don’t agree with all of them, but most of these are thought provoking.
  • An extract from the book that I am reading.

At the core of all anger is a need that is not being fulfilled.

Marshall B. Rosenberg, Nonviolent Communication: A Language of Life

Happy learning, and Happy New Year 2021 in advance!

Weekly Bullet #22 – Summary for the week

Here are a bunch of Technical / Non-Technical topics that I came across recently and found very useful.

Technical :

  • [Talk-Velocity 2017] : Performance Analysis Superpowers with Linux eBPF (44mins)- link
  • Popular Java Podcasts to follow in 2020 – link
  • Since we are talking about podcasts, I have also heard good things about Barcode, and ACM-ByteCast is amazing!
  • Illustration: Much that we have gotten wrong about SRE – link
  • A list of popular java libraries. – link
  • The second edition of “System Performance: Enterprise and Cloud” – by Brendan Gregg releasing on 2nd December. – link . This is “the best” reference guide for Performance Engineering.

Non-Technical :

  • “Library of Scroll” – Here is a site with one great article every Monday. Since it is just one, generally I find them very good. – link
  • Great site with short explanations of over 24 cognitive biases. Co-authored by Gabriel Weinberg who is the CEO of DuckDuckGo. – link
  • Not sure why I liked this, but this “57 Years Apart – A Boy And a Man Talk About Life” short video was quite gripping. – link
  • Soft skills for Software Engineers. Short thread. – link
  • Hand picked remote jobs from “Hacker News Who is hiring” November – link
  • Extract from a book:

“Respect an old tradition path as it is well tested, but also be open to the new modern way of things as they open up your mind.”

The Daily Stoic

Have a great week ahead.