The power of Commodity Hardware

In this new era of Big Data, the term commodity hardware has become very common, but it's up to you to define how to use it. Commodity hardware generally means computer hardware that is affordable and easy to obtain: typically a low-performance, PC-compatible system capable of running Microsoft Windows, Linux, or other operating systems without requiring any special devices or equipment.

Why do we need datacenters with special configurations (air conditioning, UPS, raised floors, networking, security, etc.) to run servers 24/7, while we used to have a desktop computer in every office? Imagine a company with 30 desktop computers, each with an 8-core i7, 4 GB of RAM and a 500 GB hard drive: that's the equivalent of a 240-core supercomputer with 120 GB of RAM and 15 TB of storage! But obviously we don't use them as servers, because these PCs fail frequently and are not designed to run 24/7. Read more

Opening Up X-Road for the Developers Community

Launched in 2001, X-Road is the data exchange layer powering the e-services infrastructure in Estonia. It is a technical and organisational environment that enables secure Internet-based data exchange between the state's information systems.


The key element of the Estonian eGovernment program is that all its databases are decentralized, which means there is no single owner or controller, and no lock-in to any database or software provider. Read more

MySQL Cluster NDB up and running (7.4 and 6.3) on Ubuntu Server Trusty 14.04

I previously talked about MySQL Master-Master clustering using the default MySQL packages. That setup is more of a proof of concept: it can work fine if your servers are not under heavy load, and especially if you don't have auto-increment fields in your tables.

A more appropriate clustering solution for MySQL is the MySQL Cluster package, which is available as a community edition; there are other solutions that we'll cover in another article. If you are looking to get MySQL Cluster up and running easily on Ubuntu servers, here is how to proceed. Read more

Shared OCFS2 partition on Ubuntu Server 10.04 x64

One of the applications that I'm working on uses archived documents. There is no NoSQL here, just plain TIFF files with indexes in an Oracle database. Everything related to document access, permissions, conversions, watermarking, security, encryption… is managed by the application itself, so I had to keep my cluster permanently connected to SAN storage via fiber HBA cards.

In the beginning I opted for NFS as the shared file system, then decided to go for OCFS2, which is open source, maintained by Oracle, and available under the GPLv2.

Some pros/cons of NFS, GFS2 and OCFS2 (from the Dublin OSS barcamp):

NFS

  • Pro: standard, cross-platform, easy to implement
  • Con: Poor performance, single point of failure (single locking manager, even in HA)

GFS2

  • Pro: Very responsive on large data files, works on physical and virtual, quota and SELinux support, faster than EXT3 when I/O operations are on the same node
  • Con: Only supported with Red Hat; performance issues when accessing small files in several subdirectories on different nodes

OCFS2

  • Pro: Very fast with large and small data files on different nodes, with two performance models (mail, datafile). Works on physical and virtual.
  • Con: Supported only through a contract with Oracle or SLES; no quota support, no online resize

First we need to install the OCFS2 tools:

sudo apt-get install ocfs2-tools

There is another package, ocfs2console, that you can install to configure the cluster via a GUI, but since I'm using Ubuntu Server I'm skipping it and configuring my cluster manually.

On every node attached to the storage, create /etc/ocfs2/cluster.conf:

sudo vi /etc/ocfs2/cluster.conf

With the content below; just replace node1 and node2 with the respective name and IP address of each node:

node:
    name = node1
    cluster = ocfs2
    number = 0
    ip_address = 10.10.0.0
    ip_port = 7777

node:
    name = node2
    cluster = ocfs2
    number = 1
    ip_address = 10.10.0.1
    ip_port = 7777

cluster:
    name = ocfs2
    node_count = 2

Now reconfigure ocfs2-tools, accepting the default values:

sudo dpkg-reconfigure ocfs2-tools
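
For reference, the values written to /etc/default/o2cb will look roughly like this (an assumption based on the packaged defaults; check the actual file on your system):

# /etc/default/o2cb (excerpt)
O2CB_ENABLED=true            # load the O2CB driver on boot
O2CB_BOOTCLUSTER=ocfs2       # cluster to bring up on boot, must match cluster.conf
O2CB_HEARTBEAT_THRESHOLD=31  # iterations before a node is considered dead
O2CB_IDLE_TIMEOUT_MS=30000   # idle time before a network connection is considered dead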

Then restart the services:

sudo /etc/init.d/o2cb restart
sudo /etc/init.d/ocfs2 restart

If your fiber card is connected to the host/storage, and the virtual disks have been created and presented, you should see them with fdisk:

$ sudo fdisk -l
Disk /dev/sda: 1073.7 GB, 1073741824000 bytes
255 heads, 63 sectors/track, 130541 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x02020202

The output has been truncated to show only one virtual disk; you might see several devices (/dev/sda, /dev/sdb, /dev/sdc…) depending on your configuration, in addition to your local hard disks. What I did was create a 1 TB partition that I will share between my two nodes:

$ sudo fdisk /dev/sda

In the fdisk menu, choose "n" for a new partition and set the partition size according to your requirements, then use "w" to write the changes and exit.

Finally, we format the device with an OCFS2 file system:

$ sudo mkfs.ocfs2 /dev/sda
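
If you want to tune the volume for one of the two performance models mentioned earlier, mkfs.ocfs2 also accepts a file system type, a node-slot count and a label. A sketch (the "archives" label and the two node slots are just example values):

$ sudo mkfs.ocfs2 -T datafiles -N 2 -L archives /dev/sda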

Mount your partition:

$ sudo mkdir /archives
$ sudo mount -t ocfs2 /dev/sda /archives

Or you can add it to /etc/fstab to mount it automatically on boot:

/dev/sda  /archives  ocfs2  _netdev  0  0

The _netdev option prevents the system from attempting to mount this file system until networking has been enabled.
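
You can check the new fstab entry without rebooting, for example:

$ sudo mount -a        # mounts anything listed in fstab that is not mounted yet
$ df -h /archives      # confirms the OCFS2 volume is mounted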

You can now test your new partition: every file or folder created on node1 is automatically visible on node2, and vice versa.
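
A minimal sanity check (the file name is just an example):

# on node1
$ sudo touch /archives/test-from-node1

# on node2
$ ls -l /archives/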

Enjoy!

Two-node load balancing and failover with keepalived on Ubuntu Server 10.04 x64

In an ideal system architecture, running the load balancers on separate nodes is preferred; however, it's also possible to have your load balancers on the same nodes as your applications. For this architecture I used the same hardware as the previous Master/Master MySQL cluster: Ubuntu Server 10.04 x64, Apache2 as the web server, and two HP DL380 G6 nodes, each with three 15K hard disks in RAID5, connected to SAN storage via fiber. For load balancing and failover I used keepalived and LVS; you could also use heartbeat to get your cluster running.

First you will need at least two IPs (10.10.0.1 and 10.10.0.2), one for each server, plus one virtual IP (10.10.0.3) shared between the two. On the first node, /etc/network/interfaces will look like this:

# The primary network interface
auto eth0
iface eth0 inet static
    address 10.10.0.1
    netmask 255.255.255.0
    network 10.10.0.0
    broadcast 10.10.0.255
    gateway 10.10.0.250

auto eth0:0
iface eth0:0 inet static
    address 10.10.0.3
    netmask 255.255.255.0
    network 10.10.0.0
    broadcast 10.10.0.255

and on the second node:

# The primary network interface
auto eth0
iface eth0 inet static
    address 10.10.0.2
    netmask 255.255.255.0
    network 10.10.0.0
    broadcast 10.10.0.255
    gateway 10.10.0.250

auto eth0:0
iface eth0:0 inet static
    address 10.10.0.3
    netmask 255.255.255.0
    network 10.10.0.0
    broadcast 10.10.0.255
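
Apply the new interface configuration on both nodes before going further (assuming a brief network restart is acceptable at this point):

$ sudo /etc/init.d/networking restart
$ ip addr show eth0    # the node IP and the 10.10.0.3 alias should both be listed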

Then we can install keepalived (v1.1.17 is available in the Ubuntu repositories):

sudo apt-get install keepalived

You will have to create a configuration file on each node: the first node 10.10.0.1 (master) and the second node 10.10.0.2 (backup). On the master node:

usr01@node1:~$ sudo nano /etc/keepalived/keepalived.conf

# Keepalived Configuration File
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 10
    priority 200
    virtual_ipaddress {
        10.10.0.3/24
    }
    notify_master "/etc/keepalived/notify.sh del 10.10.0.3"
    notify_backup "/etc/keepalived/notify.sh add 10.10.0.3"
    notify_fault "/etc/keepalived/notify.sh add 10.10.0.3"
}

virtual_server 10.10.0.3 80 {
    delay_loop 30
    lb_algo rr
    lb_kind DR
    persistence_timeout 50
    protocol TCP

    real_server 10.10.0.1 80 {
        weight 100
        HTTP_GET {
            url {
                path /check.txt
                digest d41d8cd98f00b204e9800998ecf8427e
            }
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 2
        }
    }

    real_server 10.10.0.2 80 {
        weight 100
        HTTP_GET {
            url {
                path /check.txt
                digest d41d8cd98f00b204e9800998ecf8427e
            }
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 2
        }
    }
}

And on the backup node:

usr01@node2:~$ cat /etc/keepalived/keepalived.conf

# Keepalived Configuration File
vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 10
    priority 100
    virtual_ipaddress {
        10.10.0.3/24
    }
    notify_master "/etc/keepalived/notify.sh del 10.10.0.3"
    notify_backup "/etc/keepalived/notify.sh add 10.10.0.3"
    notify_fault "/etc/keepalived/notify.sh add 10.10.0.3"
}

virtual_server 10.10.0.3 80 {
    delay_loop 30
    lb_algo rr
    lb_kind DR
    persistence_timeout 50
    protocol TCP

    real_server 10.10.0.1 80 {
        weight 100
        HTTP_GET {
            url {
                path /check.txt
                digest d41d8cd98f00b204e9800998ecf8427e
            }
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 2
        }
    }

    real_server 10.10.0.2 80 {
        weight 100
        HTTP_GET {
            url {
                path /check.txt
                digest d41d8cd98f00b204e9800998ecf8427e
            }
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 2
        }
    }
}

The digest is generated with genhash as shown below; note that you can add an exception so that Apache doesn't log the check.txt requests (see the snippet after the genhash output).

usr01@node1:~$ genhash -s 10.10.0.1 -p 80 -u /check.txt
MD5SUM = d41d8cd98f00b204e9800998ecf8427e
usr01@node2:~$ genhash -s 10.10.0.2 -p 80 -u /check.txt
MD5SUM = d41d8cd98f00b204e9800998ecf8427e
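
A minimal sketch of that Apache log exception, assuming the default /var/www document root and an empty check.txt (the dontlog variable name is arbitrary):

# create the empty health-check file on both nodes
$ sudo touch /var/www/check.txt

# in the Apache vhost: don't log the keepalived health checks
SetEnvIf Request_URI "^/check\.txt$" dontlog
CustomLog /var/log/apache2/access.log combined env=!dontlog

Note that d41d8cd98f00b204e9800998ecf8427e is simply the MD5 of an empty document, which is exactly what an empty check.txt returns.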

On both nodes we also have to add a small notification script (/etc/keepalived/notify.sh):

#!/bin/bash
# add or remove the iptables REDIRECT rule for the virtual IP
VIP="$2"

case "$1" in
  add)
    /sbin/iptables -A PREROUTING -t nat -d $VIP -p tcp -j REDIRECT
    ;;
  del)
    /sbin/iptables -D PREROUTING -t nat -d $VIP -p tcp -j REDIRECT
    ;;
  *)
    echo "Usage: $0 {add|del} ipaddress"
    exit 1
    ;;
esac

exit 0
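
Make the script executable on both nodes:

sudo chmod +x /etc/keepalived/notify.sh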

Launch keepalived on both nodes:

sudo /etc/init.d/keepalived start
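
You can confirm the VRRP state of each node from syslog (a quick check; the exact wording of the messages may vary with the keepalived version):

grep -i vrrp /var/log/syslog | tail -n 5

On the master you should see the VI_1 instance entering the MASTER state, and the BACKUP state on the other node.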

Now we need to enable ip_forward permanently on the two nodes, by adding the following line to /etc/sysctl.conf:

net.ipv4.ip_forward = 1
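
You can also apply it immediately without waiting for a reboot:

sudo sysctl -w net.ipv4.ip_forward=1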

Then restart networking on the two nodes:

sudo /etc/init.d/networking restart

And we can check that load balancing is configured correctly on the master:

usr01@node1:~$ sudo ipvsadm -L -n
[sudo] password for usr01:
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.10.0.3:80 rr persistent 50
  -> 10.10.0.1:80                 Local   100    0          0
  -> 10.10.0.2:80                 Route   100    0          0

And on the backup server:

usr01@node2:~$ sudo ipvsadm -L -n
[sudo] password for usr01:
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.10.0.3:80 rr persistent 50
  -> 10.10.0.1:80                 Route   100    0          0
  -> 10.10.0.2:80                 Local   100    0          0

We are almost done; we only need to add a PREROUTING rule manually on the backup node to get started:

usr01@node2:~$ sudo iptables -A PREROUTING -t nat -d 10.10.0.3 -p tcp -j REDIRECT
usr01@node2:~$ sudo iptables -t nat --list
Chain PREROUTING (policy ACCEPT)
target     prot opt source       destination
REDIRECT   tcp  --  anywhere     10.10.0.3

Chain POSTROUTING (policy ACCEPT)
target     prot opt source       destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source       destination

As mentioned in the update at the end of this post, the REDIRECT target eventually stopped working for me, so the equivalent DNAT rule can be used instead:

usr01@node2:~$ sudo iptables -t nat -A PREROUTING -p tcp --dport 80 -j DNAT --to-destination 10.10.0.3:80
usr01@node2:~$ sudo iptables -t nat --list
Chain PREROUTING (policy ACCEPT)
target     prot opt source       destination
DNAT       tcp  --  anywhere     anywhere       tcp dpt:www to:10.10.0.3:80

Chain INPUT (policy ACCEPT)
target prot opt source destination

Chain OUTPUT (policy ACCEPT)
target prot opt source destination

Chain POSTROUTING (policy ACCEPT)
target prot opt source destination
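
Note that these iptables rules are not persistent across reboots. One way to keep them (a sketch, not part of the original setup) is to save them and restore them when the interface comes up:

sudo sh -c "iptables-save > /etc/iptables.rules"

# then in /etc/network/interfaces, under "iface eth0 inet static":
#     pre-up iptables-restore < /etc/iptables.rules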

That’s all.

Now you can connect to http://10.10.0.3 and see the load being distributed between the two nodes. If one of the nodes fails, it takes a few seconds until the backup server notices the failure and updates its iptables PREROUTING rule. When the Apache service goes down on one node, requests on port 80 are automatically redirected to the second node.
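
A simple way to test the failover (assuming Apache is managed by its init script as above) is to stop Apache on one node and watch it being removed from the pool on the other:

usr01@node1:~$ sudo /etc/init.d/apache2 stop
usr01@node2:~$ sudo ipvsadm -L -n    # after the health-check retries, 10.10.0.1 disappears from the list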

As I mentioned in the beginning, failover cannot happen without a short downtime in such an architecture, but it is still a great way to distribute load if you are limited in hardware.

Finally, it would be much easier (and even faster) to load balance using round-robin DNS, from Active Directory for example, if you can manage to monitor failed services or nodes; however, this architecture remains better at failover, even with a short downtime.

Update 2017-11-14: I had an issue with the iptables REDIRECT target, which was no longer redirecting to the virtual IP; replacing it with DNAT fixed the issue.
