FATA #1 / Big Data - Elastic Stack

Nazar Khimin
4 min read · Nov 20, 2021


[FATA] - From test automation to architecture article series

Elasticsearch is a distributed, real-time search and analytics engine for all types of data.

Elasticsearch is a distributed document store. Instead of storing information as rows of columnar data, Elasticsearch stores complex data structures that have been serialized as JSON documents.
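To make "complex data structures serialized as JSON" concrete, here is a minimal sketch of what such a document could look like. The event and its field names are made up for illustration; nothing here talks to a real cluster.

```python
import json

# A hypothetical log event with nested structure; field names are illustrative only.
event = {
    "timestamp": "2021-11-20T10:15:00Z",
    "level": "ERROR",
    "service": {"name": "checkout", "version": "1.4.2"},
    "tags": ["payment", "timeout"],
}

# Serialize to the JSON text that would be sent as the document body.
doc_json = json.dumps(event, sort_keys=True)
print(doc_json)

# Round-tripping shows that the nested structure survives serialization.
restored = json.loads(doc_json)
```

Nested objects and arrays are kept as they are, which is the main difference from flattening the same data into table rows.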

Used for:

  1. Logging & Log analytics
  2. Complex search
  3. Security analysis
  4. Marketing & Operations
  5. Business analytics

Features:

  1. Distributed: runs on multiple nodes within a cluster and can scale to around 1k nodes, so search performance can scale roughly linearly with the number of nodes.
  2. Highly available and fault-tolerant: multiple copies of data are stored within the cluster, because the shards of an index are replicated across nodes.
  3. REST API: can be used for CRUD operations.
  4. Schema-less: documents can be indexed without explicitly providing a schema; an inverted index is used for lookups.
  5. Near real-time operations: newly indexed documents typically become searchable within about a second.
  6. Complementary tooling and plugins: Kibana, Logstash, Beats.
  7. Easy application development: Java, Python, PHP, JavaScript, Node.js, Ruby…
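The inverted index mentioned in the feature list can be sketched in a few lines. This is a simplified illustration of the concept, assuming whitespace tokenization and AND semantics; it is not how Elasticsearch implements it internally, and the sample documents are invented.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return IDs of documents that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical documents keyed by ID.
docs = {
    1: "Disk usage warning on node A",
    2: "Node B joined the cluster",
    3: "Disk failure on node B",
}
index = build_inverted_index(docs)
print(search(index, "disk node"))
```

A lookup goes from term to documents instead of scanning every document for the term, which is why full-text search stays fast as data grows.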

ELK Stack: Elasticsearch, Logstash, Kibana

  1. Elasticsearch is a search and analytics engine.
  2. Logstash is a server-side data processing pipeline that ingests data from multiple sources simultaneously, transforms it, and then sends it to a “stash” like Elasticsearch.
  3. Kibana lets users visualize data with charts and graphs in Elasticsearch.
  4. Beats are lightweight data shippers that send data from machines to Logstash (if you need transformation and parsing) or directly to Elasticsearch.
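The division of labor in the stack (ship, transform, stash) can be mimicked with three small Python functions. This is only a toy model of the data flow: the log line layout, the function names, and the `parse_failure` tag are assumptions made for this sketch and are not the configuration or API of Beats or Logstash.

```python
import json
import re

# Assumed line layout: "<timestamp> <LEVEL> <message>"
LINE_PATTERN = re.compile(r"^(?P<timestamp>\S+)\s+(?P<level>[A-Z]+)\s+(?P<message>.*)$")


def ship(lines):
    """Shipper stage (the Beats role): forward non-empty raw lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield line


def parse(raw_lines):
    """Processing stage (the Logstash role): turn raw lines into structured events."""
    for line in raw_lines:
        match = LINE_PATTERN.match(line)
        if match:
            yield match.groupdict()
        else:
            # Keep unparseable lines, but tag them (hypothetical tag name).
            yield {"message": line, "tags": ["parse_failure"]}


def stash(events):
    """Output stage: serialize events as JSON documents ready for indexing."""
    return [json.dumps(event, sort_keys=True) for event in events]


raw = ["2021-11-20T10:15:00Z ERROR payment timed out", "", "garbage"]
stashed = stash(parse(ship(raw)))
for doc in stashed:
    print(doc)
```

If no parsing is needed, the middle stage can be skipped, which mirrors Beats sending data straight to Elasticsearch.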

Cluster and nodes

Node: a single instance of Elasticsearch that stores data.

Cluster: a collection of related nodes that have the same cluster.name attribute. Clusters are completely independent of each other, and it's not common to perform cross-cluster searches.

Major components

  1. Indices: the largest unit of data in Elasticsearch. Indices are logical partitions of documents and can be compared to a database in the world of relational databases.
  2. Documents: JSON objects that are stored within an Elasticsearch index and are considered the base unit of storage. In the world of relational databases, a document can be compared to a row in a table. Data in documents is defined with fields comprised of keys and values.

Each document is also associated with metadata, the most important items being:

_index — The index where the document is stored

_id — The unique ID which identifies the document in the index

  3. Fields: the key-value pairs that make up a document.
  4. Mapping: defines the fields of the documents in an index, including each field's data type (such as keyword and integer) and how the field should be indexed and stored in Elasticsearch.
  5. Shards: an index is split into shards, each holding a part of the index's data, which lets the index scale across nodes. When you create an index you can define how many shards you want.
  6. Replicas: a fail-safe mechanism; replicas are copies of your index's shards.
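Mapping, shards, and replicas usually come together when an index is created. Below is a sketch of a create-index request body built as a Python dict, followed by a toy routing function. The field names are invented, and the routing function uses CRC32 only as a deterministic stand-in; Elasticsearch uses a different hash function, so real shard assignments will differ.

```python
import json
import zlib

# Body for a create-index request; field names are illustrative assumptions.
index_body = {
    "settings": {"number_of_shards": 3, "number_of_replicas": 1},
    "mappings": {
        "properties": {
            "level": {"type": "keyword"},
            "status_code": {"type": "integer"},
            "message": {"type": "text"},
        }
    },
}


def pick_shard(doc_id, number_of_shards):
    """Toy routing: hash the document ID and take it modulo the shard count.

    CRC32 is a stand-in hash chosen for this sketch, not the one Elasticsearch uses.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


settings = index_body["settings"]
primaries = settings["number_of_shards"]
# Every primary shard plus its replicas.
total_shard_copies = primaries * (1 + settings["number_of_replicas"])

print(json.dumps(index_body, indent=2))
print("total shard copies:", total_shard_copies)
for doc_id in ["a1", "b2", "c3"]:
    print(doc_id, "-> shard", pick_shard(doc_id, primaries))
```

Because a document's shard is derived from its ID and the number of primary shards, the same ID always lands on the same shard, which is why the primary shard count is chosen at index creation.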

Analysis and Analyzers

An analyzer contains three lower-level building blocks: character filters, a tokenizer, and token filters.
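The three building blocks run in sequence: character filters clean the raw text, the tokenizer splits it into tokens, and token filters modify the token stream. Here is a minimal sketch of that pipeline in plain Python. The specific choices (stripping HTML-like tags, splitting on word characters, lowercasing, a tiny stop-word list) are assumptions for illustration and only loosely resemble the built-in analyzers.

```python
import re

# A deliberately tiny stop-word list for the example.
STOP_WORDS = {"the", "a", "an", "and", "is"}


def char_filter(text):
    """Character filter: replace HTML-like tags with spaces before tokenizing."""
    return re.sub(r"<[^>]+>", " ", text)


def tokenizer(text):
    """Tokenizer: split the text into runs of word characters."""
    return re.findall(r"\w+", text)


def token_filters(tokens):
    """Token filters: lowercase every token, then drop stop words."""
    lowered = (token.lower() for token in tokens)
    return [token for token in lowered if token not in STOP_WORDS]


def analyze(text):
    """Run the three stages in order."""
    return token_filters(tokenizer(char_filter(text)))


print(analyze("<p>The Quick brown fox is FAST</p>"))
```

The resulting tokens are what ends up in the inverted index, so the same analysis has to be applied to search queries for terms to match.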

Manage Data in Elasticsearch

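  1. cat indices (GET _cat/indices): lists the indices in the cluster with high-level information about each.
  2. cat plugins (GET _cat/plugins): lists the plugins running on each node.
  3. cat templates (GET _cat/templates): lists the index templates.
  4. cat health (GET _cat/health): shows the health status of the cluster.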
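The cat APIs are plain HTTP GET endpoints, so the URLs are easy to build by hand. The sketch below only constructs them; it assumes a node listening on `http://localhost:9200/` (an assumption, adjust to your setup) and does not send any request. The `v` query parameter asks for column headers in the response.

```python
from urllib.parse import urlencode, urljoin

# Assumption: a local, unsecured node on the default port.
BASE_URL = "http://localhost:9200/"

CAT_ENDPOINTS = {
    "indices": "_cat/indices",
    "plugins": "_cat/plugins",
    "templates": "_cat/templates",
    "health": "_cat/health",
}


def cat_url(name, base_url=BASE_URL, verbose=True):
    """Build the URL for one of the cat endpoints listed above."""
    url = urljoin(base_url, CAT_ENDPOINTS[name])
    if verbose:
        # v=true adds a header row to the tabular output.
        url = f"{url}?{urlencode({'v': 'true'})}"
    return url


for name in CAT_ENDPOINTS:
    print(cat_url(name))
```

Any HTTP client, or the Kibana Dev Tools console, can then issue a GET against these URLs.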

Analyze & Query your data

  1. Histogram: a multi-bucket, value-source-based aggregation that can be applied to numeric values or numeric range values extracted from the documents. Values are grouped into fixed-size intervals.
  2. Terms: a multi-bucket, value-source-based aggregation where buckets are built dynamically, one per unique value.
  3. Range: a multi-bucket, value-source-based aggregation that lets the user define a set of ranges, each representing a bucket.

Metrics aggregation: Cardinality and Percentiles aggregation.
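To show what these aggregations compute, here is a local simulation in plain Python over in-memory values. It is a sketch of the bucketing logic only, with invented sample data: Elasticsearch computes cardinality approximately while this version counts exactly, empty histogram buckets are not emitted here, and percentiles are left out.

```python
import math
from collections import Counter


def terms_agg(values):
    """Terms: one bucket per unique value, with its document count."""
    return dict(Counter(values))


def histogram_agg(values, interval):
    """Histogram: bucket key is the value rounded down to a multiple of interval."""
    buckets = Counter(math.floor(v / interval) * interval for v in values)
    return dict(sorted(buckets.items()))


def range_agg(values, ranges):
    """Range: user-defined (from, to) buckets; from is inclusive, to is exclusive.

    None stands for an open-ended bound.
    """
    result = {}
    for low, high in ranges:
        key = f"{'*' if low is None else low}-{'*' if high is None else high}"
        result[key] = sum(
            1
            for v in values
            if (low is None or v >= low) and (high is None or v < high)
        )
    return result


def cardinality_agg(values):
    """Cardinality: number of distinct values (exact in this sketch)."""
    return len(set(values))


# Hypothetical field values taken from six documents.
prices = [5, 12, 18, 25, 25, 47]
levels = ["ERROR", "INFO", "ERROR"]

print(histogram_agg(prices, 10))
print(terms_agg(levels))
print(range_agg(prices, [(None, 10), (10, 30), (30, None)]))
print(cardinality_agg(prices))
```

The bucket aggregations return groups of documents with counts, while the metric aggregation returns a single number computed over the matching documents.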

Top interview question references:

  1. https://www.guru99.com/elasticsearch-interview-questions.html
  2. https://facingissuesonit.com/elasticsearch-interview-questions-and-answers/
  3. https://logit.io/blog/post/the-top-50-elk-stack-and-elasticsearch-interview-questions

Used references:

  1. https://logz.io/blog/10-elasticsearch-concepts/
  2. https://www.elastic.co/what-is/elk-stack
  3. https://medium.com/make-it-heady/what-and-why-ekl-stack-378e6c4765b9
