Sunday, 16 October 2016

Setting up a Multi-Node Cassandra Cluster on Ubuntu 16.04 Machines

What is Cassandra?

Cassandra is a distributed database for managing large amounts of structured data. It offers horizontal scalability and high availability; because of its decentralized design there is no single point of failure. Some key points about it:
  • Scalable: Cassandra supports horizontal scalability. Read and write throughput both increase linearly as new machines are added, with no downtime or interruption to applications.
  • Highly available: 
    • Decentralized: There is no single point of failure. Every node in the cluster is identical (there is no master/slave notion).
    • Fault tolerant: Data is automatically replicated to multiple nodes for fault tolerance. Failed nodes can be replaced without any downtime. Replication across multiple data centers is supported.

Set up a multi-node cluster on Ubuntu 16.04:

Prerequisites

  1. Three machines running Ubuntu 16.04.
  2. Every machine must be able to reach the others over the network.
NOTE: Repeat the steps below on each machine.

1. Installing the Oracle JVM:

  • sudo add-apt-repository ppa:webupd8team/java
  • sudo apt-get update
  • sudo apt-get install oracle-java8-set-default
  • java -version

2. Installing Cassandra:

  • echo "deb http://debian.datastax.com/community stable main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list
  • curl -L https://debian.datastax.com/debian/repo_key | sudo apt-key add -
  • sudo apt-get update
  • sudo apt-get install dsc30
  • sudo apt-get install cassandra-tools

3. Connecting to the cluster:

  • sudo nodetool status
  • cqlsh
You should be able to see the cqlsh prompt.

4. Create a ring -- Deleting default data

  • sudo service cassandra stop
  • sudo rm -rf /var/lib/cassandra/data/system/*

5. Create a ring -- Configuring the cluster

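    • Edit /etc/cassandra/cassandra.yaml so that it contains the settings below. Use the same cluster_name and seeds on every node; listen_address and rpc_address are the IP of the machine being configured.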
      cluster_name: 'cassan'
      seed_provider:
        - class_name: org.apache.cassandra.locator.SimpleSeedProvider
          parameters:
              - seeds:  "<server1 ip>,<server2 ip>"
      listen_address: <local server ip>
      rpc_address: <local server ip>
      auto_bootstrap: false
      data_file_directories:
        - /var/lib/cassandra/data
      commitlog_directory: /var/lib/cassandra/commitlog
      saved_caches_directory: /var/lib/cassandra/saved_caches
      commitlog_sync: periodic
      commitlog_sync_period_in_ms: 10000
      partitioner: org.apache.cassandra.dht.Murmur3Partitioner
      endpoint_snitch: SimpleSnitch
      start_native_transport: true
      native_transport_port: 9042
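
Since the same edit has to be repeated on every machine, it can help to script the per-node values. The following is only a sketch: configure_node is a hypothetical helper name, not part of Cassandra, and it assumes the file already contains cluster_name, listen_address, rpc_address and a "- seeds:" line, as the packaged cassandra.yaml normally does. It changes only those keys, plus auto_bootstrap, and leaves the rest of the file alone.

```shell
# Hypothetical helper (not part of Cassandra): rewrite the per-node settings.
# Usage: configure_node <yaml_file> <cluster_name> <seed_list> <local_ip>
configure_node() {
    local yaml="$1" cluster="$2" seeds="$3" ip="$4"
    local tmp
    tmp="$(mktemp)" || return 1

    # Replace the value of each key that already exists in the file.
    # The seeds line is nested, so its leading indentation is preserved.
    sed \
        -e "s|^cluster_name:.*|cluster_name: '${cluster}'|" \
        -e "s|^listen_address:.*|listen_address: ${ip}|" \
        -e "s|^rpc_address:.*|rpc_address: ${ip}|" \
        -e "s|^auto_bootstrap:.*|auto_bootstrap: false|" \
        -e "s|^\([[:space:]]*- seeds:\).*|\1 \"${seeds}\"|" \
        "$yaml" > "$tmp" || return 1

    # auto_bootstrap may be absent from the file; append it in that case.
    grep -q '^auto_bootstrap:' "$tmp" || echo 'auto_bootstrap: false' >> "$tmp"

    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp" > "$yaml" && rm -f "$tmp"
}
```

Run as root on each node, a call might look like: configure_node /etc/cassandra/cassandra.yaml cassan "<server1 ip>,<server2 ip>" <local server ip>. Keeping a backup copy of the file first is a sensible precaution.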

6. Create a ring -- Configuring the firewall

    To allow the nodes to communicate, open TCP ports 7000 (inter-node traffic) and 9042 (native transport for CQL clients) on each node.
    • sudo apt-get install -y iptables-persistent
    • Add the following rule to /etc/iptables/rules.v4, once for each of the other servers
      -A INPUT -p tcp -s <your_other_server_ip> -m multiport --dports 7000,9042 -m state --state NEW,ESTABLISHED -j ACCEPT
    • sudo service netfilter-persistent restart
    • sudo service cassandra start
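
With three machines, each node needs two such rules, one per peer. As a rough sketch, the rules can be generated rather than typed by hand. peer_rules is a hypothetical helper name; it only prints the rule text used above and does not touch the firewall itself.

```shell
# Hypothetical helper: print one ACCEPT rule for each peer IP given as an argument.
peer_rules() {
    local ip
    for ip in "$@"; do
        # Same rule as in the step above, with the peer's address filled in.
        printf '%s\n' "-A INPUT -p tcp -s ${ip} -m multiport --dports 7000,9042 -m state --state NEW,ESTABLISHED -j ACCEPT"
    done
}
```

For example, peer_rules 10.10.0.32 10.10.0.102 prints the two lines to paste into /etc/iptables/rules.v4 on the third node. The addresses here are the sample ones from the status output in this post.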

    Check the cluster status:

    • sudo nodetool status
    You should see something like the following:
    Datacenter: datacenter1
    =======================
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address      Load       Tokens       Owns (effective)  Host ID                               Rack
    UN  10.10.0.32   123.93 KB  1            74.7%             6fa993f1-07f7-4368-8ee5-c52cedae3843  rack1
    UN  10.10.0.102  152.18 KB  1            12.4%             64c4c449-3949-4c83-a0a7-86b084a58d5c  rack1
    UN  10.10.0.4    229.86 KB  1            12.9%             83cd40ec-3e64-43ea-87a9-65bc8a90bd1d  rack1
    • You should also be able to open a cqlsh prompt:
      cqlsh <serverip> 9042
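
If you would rather check the ring from a script than by eye, the status output can be filtered for the two-letter state codes. This is a minimal sketch, assuming the output layout shown above: all_nodes_up is a hypothetical helper name, and it reads the nodetool status text from standard input.

```shell
# Hypothetical helper: read `nodetool status` output on stdin and succeed only if
# at least <expected> nodes are listed and every listed node is in state UN.
all_nodes_up() {
    local expected="$1"
    awk -v expected="$expected" '
        BEGIN { total = 0; up = 0 }
        # Node lines begin with a status/state code such as UN (Up/Normal) or DN.
        $1 ~ /^[UD][NLJM]$/ { total++; if ($1 == "UN") up++ }
        END { exit !(total >= expected + 0 && up == total) }
    '
}
```

One possible use: sudo nodetool status | all_nodes_up 3 && echo "all nodes are up". The function returns a non-zero exit status if a node is down, still joining, or missing from the list.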

    Congratulations! You now have a multi-node Cassandra cluster running.


