Welcome to another exciting edition of “way-too-detailed explanations of obscure data-related topics.” I’m your host, Robert J. Moore, and today we’ll be exploring the exciting world of SSH Tunneling with MySQL and PHP.
As you can imagine, a big part of our job here at RJMetrics is making sure that data gets from a client’s database at Point A to our systems at Point B securely, reliably, and automatically. Depending on your database platform and other system characteristics, there are a number of ways to approach to this challenge. For customers running Linux and MySQL, our preferred method is called SSH Tunneling.As anyone who has even brushed up against Linux knows, SSH is the “Secure Shell.” SSH uses public-key cryptography to securely exchange data between two networked devices over an insecure channel (such as the Internet). If you are in a local location and there is a Linux machine at a remote location containing your sensitive information, you can use SSH establish a secure connection between the two locations and interface with the remote machine via the local machine.
SSH has an exciting and often overlooked feature called Port Forwarding (also known as SSH Tunneling). SSH Tunneling allows you to send information to a specific TCP port on your local machine and have it be transferred to a port on the remote machine securely over SSH (and vice versa). This allows for the encrypted transfer of information from programs and services that might not otherwise communicate securely.
Just imagine the possibilities. You could create a secure tunnel to encrypt your FTP uploads. You could use an SSH tunnel to download your POP mail without worrying about your password or e-mails being sniffed along the way. When you consider the benefits, it should come as no surprise that SSH Tunneling is a key underlying methodology in many Virtual Private Networking (VPN) systems.
As an added bonus, SSH tunneling allows you to avoid the problems associated with penetrating your remote server’s firewall. When receiving information sent via an SSH Tunnel, a remote program is receiving data sent directly from a shell running on their own machine.
Here at RJMetrics, however, we care about things like ecommerce analytics. That means it’s all about the database. We use SSH Tunneling to port-forward MySQL queries from our servers to customer databases and receive the results securely via SSH. SSH tunneling is attractive for this purpose for a few key reasons:
- Data security is our top priority, and SSH Tunneling offers mature and secure solution to the data transfer problem. (We are also excited about the secure MySQL connections via SSL that now exist in nonstandard MySQL releases. This may prove to be a less complex solution as adoption picks up.)
- SSH tunneling is extremely easy for customers to configure and for us to automate (as explained below), and can often be set up in just a few minutes.
Configuration and Tunnel Creation
For our clients using MySQL and Linux, setting up SSH Tunneling couldn’t be easier. New customers receive detailed tutorials, of course, but here are the basics. It’s a three-step process:
- Create a Linux user that RJMetrics will use to SSH into a server that holds an instance of your database.
- Create a MySQL user that RJMetrics will use to run SELECT queries on that database.
- Use our secure dashboard portal (credentials are provided after signup) to provide us with the server’s address and the credentials for these two accounts.
Security can be further enhanced by restricting our Linux user’s rights within the shell (using methods like rbash, chroot, etc) and our MySQL user’s access to only the databases, tables, and columns you choose (using GRANT and REVOKE statements within MySQL).
Next, on our side, the real fun begins. For those of you who might be interested in creating your own SSH Tunnels for MySQL, what follows is a simplified version of what happens on our end.
It only takes one line of code to establish an SSH tunnel:
Let’s break this down:
- ssh: this is the call to the ssh command line program, which establishes the SSH connection.
- -f: this option “forks” the new SSH connection, sending it to the background and leaving control with the local shell. For port forwarding, this is necessary because we’ll be using local programs to communicate with the remote machine.
- -L local-port:remote-address:remote-port: The -L option is for “local forwarding.” It means we want SSH to tunnel commands sent via a port on our local machine to a specific port on the remote machine; the colon-separated parameters that follow are the ports and address in question. (There is a -R option that does the opposite — “remote forwarding” — we don’t use it here but can be quite useful in other applications.)
- login-name: this is the login name used to establish an SSH connection to the remote host.
- [command]: this optional parameter is a command you would like to execute remotely as soon as the connection is established.
- >> logfile: As we will explain in a moment, we want this tunnel to close itself after we are done using it. Redirecting standard output to a log file prevents the program from “hanging” while stdout awaits an end-of-stream that won’t arrive unless the process is manually killed.
To provide a bit more color on some of the nuances and complications involved, here is a version of the command with actual values included:
- The Port Numbers: notice that the local port is set to 3307, while the remote port is set to 3306. Port 3306 is the default port for MySQL, so that’s the port where we want our queries being forwarded on the remote system. On the local system, however, we already have our own instance of MySQL server occupying port 3306. As such, we must use a different port to disambiguate the local and remote MySQL servers.Using this technique (and more port numbers), it is actually possible for us to establish multiple MySQL connections to multiple clients at the same time.
- The Remote Address: You’ve probably noticed that 127.0.0.1 (the number between the port numbers) isn’t a public IP address, and as such the term “remote address” is a bit of a misnomer. This is a spot where many people get stuck when setting up MySQL tunneling. The value here is not necessarily the web server’s public IP address, but the IP address to which MySQL server is bound. The value we are looking for is stored in the file /etc/my.cnf (or wherever your MySQL configuration file is located). The line “bind-address=xxx.xx.xx.xx” binds the MySQL server to a specific IP address. Depending on whether or not you want to allow users to connect to MySQL server directly from remote locations (this should be avoided if you’re not sure), the bind-address can be set to the web server’s actual public IP address (which allows these outside connections) or to 127.0.0.1 (which only allows local connections). 127.0.0.1 is the loopback IP address, which is functionally equivalent to “localhost” but serves as a valid, necessary bind-address value for our tunneling commands.In some cases, the “bind-address” line may be commented out of my.cnf (since it’s not really needed if remote connections aren’t allowed). In these cases, however, you must uncomment the line and make sure that it reads “bind-address=127.0.0.1” — this is just as secure, but is necessary for tunneling to work. Also, if the line “skip-networking” appears, be sure that it is commented out.
- sleep 60: When it comes to tunnel connections, we basically have two options: leave the connection open all the time or open it and close it as needed. We prefer the latter, and as such we don’t specify the “-N” option when establishing a tunnel, which would leave it open until the process is manually killed (bad for automation). Since “-N” is not specified, our tunnel will close itself as soon as its SSH session isn’t being used for anything. This is ideal behavior, except for the few seconds between when we create the tunnel and when we get a MySQL connection up and running via the tunnel. To buy us some time during this period, we issue the harmless “sleep 60” command when the tunnel is created, which basically buys us 60 seconds to get something else going through the tunnel before it closes itself. As long as a MySQL connection is established in that timeframe, we are all set.
- Avoiding a password prompt: Normally, after issuing a command like the one above, you will receive a password prompt to validate the user logging into the remote machine. Again, this is bad for automation, since it’s never a good idea to have an application interacting with a command line prompt (or storing plain-text passwords). Public key encryption comes to the rescue again in this case. SSH won’t prompt for a password if the public certificate of remote user is stored in the ~/.ssh/authorized_keys file of the remote user account being logged into. We simply append our public key to this file and password prompts are no longer an issue.
Connecting via MySQL
It’s time to see all of our hard work pay off. From our local machine, we simply issue the following command:
Notice that the MySQL host is 127.0.0.1, which is the same as the bind-address value on the remote server’s my.cnf file. It’s important that we use this value and not localhost, since we are ultimately accessing a forwarded TCP port, and specifying localhost causes MySQL to ignore TCP altogether and simply connect to the local server via a local socket. Accordingly, notice that we have specified port 3307 to make the connection; this is the TCP port we are forwarding.
Integrating with PHP CLI
So, now we have all these interesting tools doing our bidding at the command line, but how do we use them to automate data replication and analysis? How do we go from a cool technique to a hosted business intelligence application? While we use a number of technologies and methodologies to conduct this process here at RJMetrics, I thought I would share a simple way one could automate this using PHP CLI (the often-overlooked PHP command line interface).
Here’s how to securely establish a remote database connection in just 2 lines of PHP code:
We use the shell_exec function to create the tunnel with a 60 second opening window, and then use the mysqli_connect function to open a database connection using the forwarded port. Note that we must use the “mysqli” library here because mysql_connect does not allow us to specify a port.
Of course, you should be careful using functions like shell_exec that make system level calls. They should never include user-specified data and should generally be avoided by code that serves webpages. We get away with it here because this particular PHP file is being used via CLI and can not be triggered via the web.
A final note on PHP: there is a very poorly documented PHP extension called SSH2 that provides an interface for making SSH connections via PHP functions (eliminating the need for shell_exec and its brethren). Unfortunately, however, the current version of this library does not allow you to specify a local port when opening an SSH tunnel. This renders the library rather useless for the MySQL application, since the port used by the remote application is almost always in use by the local server. For other applications of SSH tunneling, however, you may find it to be a more secure bet.
Thanks for reading and I hope you enjoyed this peek at just one of the many interesting ways we’re dealing with data here at RJMetrics.