My First Cloud Storage Failure

I count myself as a skeptic of the new cloud-based world. When I first learned of Dropbox I wasn’t willing to trust it with my data. When I read about the frustration of people whose data are still trapped on MegaUpload, I felt vindicated.

But the tide of progress moved forward, and now there are many cloud-based storage providers with enough of a track record for me to dip my toes in the water. In January 2016 I started using cloud-based storage for my personal needs, spreading my files across Microsoft OneDrive, Google Drive, and Amazon Drive.

That’s not to say I trust cloud-based storage yet. I still maintain my regimen of home offline backups on removable hard drives. And for the files that I feel are important, I duplicate storage across at least two of the cloud storage providers.

Almost a year and a half in, I admit I see a lot of benefit. I’ve become a lot less dependent on any specific computer, since most of my files are accessible from any machine with an internet connection. I can pick up work where I left off on another computer. I can refresh and reformat a computer with far less worry that I’ll destroy irreplaceable data.

And then there are the wacky outliers. When I wanted to obtain a library card, one accepted proof of local residency was a utility bill. I was able to pull out my phone and retrieve a recent electric bill, thanks to cloud storage.

But just as with physical spinning hard drives, a failure is only a matter of time. And tonight, after almost 18 months of flawless cloud storage performance, we have our first winner. Or more accurately, our first loser. Tonight I was not able to access my files on Amazon Drive; all I get is a “Loading” spinner that never goes away.

The underlying Amazon storage infrastructure seems ok. (AWS S3 status is green across the board.) This must be a failure in their consumer-level storage offering, which will probably get fixed soon.

In the meantime, they have one annoyed customer.

Static Web Site Hosting with Amazon S3 and Route 53

Web application frameworks have the current spotlight, which is why I started learning Ruby on Rails to get an idea of what the fuss was about. But a big framework isn’t always the right tool for the job. Sometimes it’s just a set of static files to be served upon request. No server-side smarts necessary.

This was where I found myself when I wanted to put up a little web site to document my #rxbb8 project. I just wanted to document the design & build process, and I already had registered the domain rxbb8.com. The HTML content was simple enough to create directly in a text editor and styled with CSS from the Materialize library.

After I got a basic 1.0 version of my hand-crafted site, I uploaded the HTML (and associated images) to an Amazon S3 bucket. It takes only a few clicks to make files in an S3 bucket web-accessible via a long, cumbersome URL on an Amazon AWS domain: http://rxbb8.com.s3-website-us-west-2.amazonaws.com. Since I wanted this content to be accessible via the rxbb8.com domain I had already registered, I started reading up on the AWS service named, in geek-humor style, Route 53.

Route 53 is designed to handle the challenges of huge web properties, distributing workload across many computers in many regions. (No single computer could handle all global traffic for, say, netflix.com.) The challenge for a novice like myself is figuring out how to pull just the one tool I need out of this huge, complex Swiss army knife.

Fortunately this use case is popular enough that Amazon has written a dedicated developer guide for it. Unfortunately, the page doesn’t have all the details. The writer helpfully points the reader to other reference articles, but those pages revert to talking about complex deployments, and again it takes effort to distill the simple basics out of the big feature list.

If you get distracted or lost, stay focused on this CliffsNotes version:

  1. Go into Route 53 dashboard, create a Hosted Zone for the domain.
  2. In that Hosted Zone, AWS has created two record sets by default. One of them is of the NS type; write down the name servers it lists.
  3. Go to your domain registrar and tell them to point name servers for the domain to the AWS name servers listed in step 2.
  4. Create S3 storage bucket for the site, enable static website hosting.
  5. Create a new Record Set in the Route 53 Hosted Zone. Set “Alias” to “Yes” and point the alias target to the S3 storage bucket from step 4.

Repeat steps 4 and 5 for each sub-domain that needs to be hosted. (The AWS documentation created example.com, then repeated steps 4-5 for www.example.com.)
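
For anyone who prefers scripting to clicking, the record set in step 5 can also be created with the AWS SDK for Ruby. This is only a sketch: the zone IDs below are placeholders you would look up yourself, and I did my own setup entirely through the Route 53 dashboard.

    require "aws-sdk-route53"   # v3 Ruby SDK gem, assumed to be installed

    route53 = Aws::Route53::Client.new

    # Create the alias record from step 5: point the apex domain at the
    # S3 website endpoint created in step 4. Both zone IDs are placeholders;
    # the first comes from step 1, the second is the fixed ID Amazon publishes
    # for S3 website endpoints in the bucket's region.
    route53.change_resource_record_sets(
      hosted_zone_id: "YOUR_HOSTED_ZONE_ID",
      change_batch: {
        changes: [{
          action: "CREATE",
          resource_record_set: {
            name: "rxbb8.com",
            type: "A",
            alias_target: {
              hosted_zone_id: "S3_WEBSITE_ENDPOINT_ZONE_ID",
              dns_name: "s3-website-us-west-2.amazonaws.com",
              evaluate_target_health: false
            }
          }
        }]
      }
    )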

And then… wait.

The update in step 3 needs time to propagate to name servers across the internet. My registrar said it may take up to 24 hours. In my case, I started getting intermittent results within 2 hours, but it took more than 12 hours before everything stabilized to the new settings.
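
One low-tech way to watch the propagation is to ask DNS directly which name servers the domain currently reports and compare them against the AWS name servers from step 2. A quick check using only Ruby’s standard library might look like this:

    require "resolv"

    # List the NS records currently visible for the domain.
    Resolv::DNS.open do |dns|
      dns.getresources("rxbb8.com", Resolv::DNS::Resource::IN::NS).each do |record|
        puts record.name
      end
    end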

But it was worth the effort to see version 1.0 of my created-from-scratch static web site up and running on my domain! And since it’s such a small and simple site with little traffic, it will cost me only a few pennies per month to host in this manner.

[Screenshot: rxbb8.com, version 1.0]

Let the App… Materialize!

After I got Google sign-in working well enough for my Rails practice web app, the first order of business was to build the basic skeleton. This was a great practice exercise: taking the pieces I learned in the Ruby on Rails Tutorial sample app and building something of my own design.

The initial pass implemented basic functionality, but it didn’t look very appealing. I had focused on the Rails server-side code and left the client side as plain HTML that would have been state-of-the-art in… maybe 1992?

Let’s make it look like something that belongs in 2017.

The Rails Tutorial sample app used Bootstrap to improve the appearance and functionality of the client-side interface. I decided to take this opportunity to learn something new instead of doing the same thing. Since I’m using Google Sign-In in this app, I decided to apply Google’s design concepts to my client-side appearance as well.

The web being the web, I knew I wouldn’t have to start from scratch. I knew about Google’s own Material Design Lite and thought it would be a good candidate, before I learned it had been retired in favor of its successor, Material Components for the Web. One of the successor’s touted advantages is improved integration with different web platforms. Sadly, Rails was not among the platforms with ready-to-go examples.

I looked around for an existing project to help Rails apps adopt Google’s design language, and that’s when I found Materialize: a library that shares many usage patterns with Bootstrap. The style sheets are even written in Sass, which default Rails apps handle natively, making for easy integration. Somebody has already done that work and published it as the Ruby gem materialize-sass, so all I had to do was add a single line to use Materialize in my app.
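
For the record, that single line is just an entry in the Gemfile; the Materialize style sheets then get pulled in from the app’s main Sass file. File names here follow Rails defaults and the gem’s README, so treat this as a sketch.

    # Gemfile
    gem 'materialize-sass'

    # app/assets/stylesheets/application.scss would then import it with:
    #   @import "materialize";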

Of course I still had to put in the effort to revise all the view files in my web app to pick up Materialize styling and features. That took a few days, and the reward for this effort is a practice web app that no longer looks so amateurish.

Protecting User Identity

Recently, web site security breaches have been a frequent topic of mainstream news. The technology is evolving, but this chapter of the story has quite a way to go yet. Learning web frameworks gives me an opportunity to understand the mechanics from a web site developer’s perspective.

For my project I decided to use the Google Identity Platform and let Google engineers take care of identification and authentication. By configuring the Google service to retrieve only basic information, my site never sees the vast majority of personally identifiable information: it never sees the password, name, e-mail address, and so on.

All my site ever receives is a string: the Google ID. My site uses it to identify a user account. Following the security adage of “what I don’t know, I can’t spill,” I thought this was a pretty good setup: the only thing I know is the Google ID, so that’s all I could ever spill.

Which led to the next question: what’s the worst that can happen with the ID?

I had thought the ID was something Google generated for my web site, or more specifically, for my site’s client ID. I no longer believe so. A little experimentation (aided by a change in client ID for the security issue previously documented) has led me to believe it’s possible the Google ID is global across all Google services.

This means that if a hacker manages to get a copy of my site’s database of Google IDs, they can cross-reference it against the databases of other compromised web sites, potentially assembling a larger picture out of small pieces of information.

While I can’t stop using the Google ID (I have to have something to identify a user), I can make it more difficult for a hacker to cross-reference my database. I’ve just committed a change to hash the ID, salted with a value unique to each deployed instance of the app, before it is stored in the database.
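
A minimal sketch of that idea, assuming the salt is supplied through a per-deployment environment variable (the variable and column names here are hypothetical, not my exact code):

    require "digest"

    # Hash the Google ID with a per-instance salt before it ever reaches the database.
    def hashed_google_id(google_id)
      Digest::SHA256.hexdigest(ENV.fetch("ID_HASH_SALT") + google_id)
    end

    # The lookup then uses only the hashed value; the raw Google ID is never stored.
    # user = User.find_by(google_id_hash: hashed_google_id(google_id))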

Now, for a hacker to uncover the identity of one of my users, they must do all of the following:

  1. Obtain a copy of the database.
  2. Obtain the hashing salt used by the specific instance of the app which generated that database.
  3. Already have the user’s Google ID, since they won’t get the original ID out of my database of hashed values.

None of these steps are impossible, but together they demand a lot more effort than it would otherwise have taken.

I think this is a worthwhile addition.

Limiting Google Client ID Exposure

Today’s educational topic: the varying levels of secrecy around cloud API access.

In the previous experiment with AWS, things were relatively straightforward: the bucket name is going to be public, all of the access credentials are secret, and none of them are ever exposed to the user. Nor are they checked into the source code; they are set directly on the Heroku server as environment variables.

Implementing a web site using Google Identity gets into a murky in-between area for the piece of information known as the client ID. Due to how the OAuth system is designed, the client ID has to be sent to the user’s web browser. Google’s primary example exposes it in an HTML <meta> tag.

The fact that the client ID is publicly visible led me to believe it was not something I needed to protect, so I had merrily hard-coded it into my source and checked it into GitHub.

Oops! According to this section of the Google Developer Terms of Service, that was bad. Note the final sentence in particular:

Developer credentials (such as passwords, keys, and client IDs) are intended to be used by you and identify your API Client. You will keep your credentials confidential and make reasonable efforts to prevent and discourage other API Clients from using your credentials. Developer credentials may not be embedded in open source projects.

Looks like we have a “secret but not secret” level going on: while the system architecture requires that the client ID be visible to a user logging on to my site, as a developer I am still expected to keep it secret from anybody just browsing code online.

How bad was this mistake? As far as security goofs go, this was thankfully benign. On the Google developer console, the client ID is restricted to a specific set of URIs. Another web site trying to use the same client ID will get an error:

[Screenshot: Google sign-in error, URI mismatch]

IP addresses can be spoofed, of course, but this mitigation makes abuse more difficult.

After this very instructional detour, I updated my project’s server-side and client-side code to retrieve the client ID from an environment variable. The app still ends up sending the client ID in clear text to the user’s web browser, but at least it isn’t sitting in plain sight, searchable on GitHub.
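
The change itself is small. A sketch, with hypothetical helper and variable names, of how the server side can hand the value to the view:

    # app/helpers/application_helper.rb (hypothetical location)
    module ApplicationHelper
      # The client ID still reaches the browser, as OAuth requires,
      # but it no longer lives anywhere in source control.
      def google_client_id
        ENV.fetch("GOOGLE_CLIENT_ID")
      end
    end

    # The layout's meta tag then references the helper instead of a literal string:
    #   <meta name="google-signin-client_id" content="<%= google_client_id %>">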

And to close everything out, I also went into the Google developer console to revoke the exposed client ID, so it can no longer be used by anybody.

Lesson learned, moving on…

Adventures in Server-Side Authentication

The latest chapter comes courtesy of the Google Identity Platform. For my next Rails app project, I decided to venture away from the user password authentication engine outlined in the Hartl Ruby on Rails Tutorial sample app. I had seen the “Sign in with Google” button on several web sites (like Codecademy) and decided to see it from the other side: users of my next Rails project will sign in with Google!

The client-side code was straightforward, following directions in the Google documentation. The HTML is literally copy-and-paste; the JavaScript needed some reworking to translate into CoffeeScript for the standard Rails asset pipeline, but it wasn’t terribly hard.

The server side was less straightforward.

I started with the guide Authenticate with a Backend Server, which links to the Google API client libraries for (almost) all of the server-side technologies, including Ruby. The guide page itself includes examples of using the client library to validate the ID token in Java, Node.js, PHP, and Python. The lack of a Ruby example would prove problematic, because each flavor of the client library follows different conventions and uses different names for the same functionality.

The Java library has a dedicated GoogleIdTokenVerifier class for the purpose. The Node.js library has a GoogleAuth.OAuth2 class with a verifyIdToken method. PHP has a Google_Client class with a verifyIdToken method. And to round out the set, the Python library has oauth2client.verify_id_token.

Different, but they’re all in a similar vein of “verify”, “id”, and “token” so I searched the Ruby Google API client library documentation for those keywords in the name. After a few fruitless hours I concluded what I wanted wasn’t there.

Where to next? I went to the library’s Github page for clues. I had to wade through a lot of material irrelevant to the immediate task because the large library covers the entire surface of Google services.

I thought I had hit the jackpot when I found reference to the Google Auth Library for Ruby. It’s intended to handle all authentication work for the big client library, with the target completion date of Q2 2015. (Hmm…) Surely it would be here!

It was not.

After too many wrong turns, I looked at Signet in detail. It has an OAuth2::Client class, which sounded very similar to the other libraries, but it had no “verify” method, so every time I saw a reference to Signet I decided to look elsewhere. Once I finally read into the details of Signet::OAuth2::Client, I figured out that it has a decoded_id_token method that can optionally verify the token.
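
For future reference, here is a rough sketch of how that call might look inside a Rails controller action. The exact constructor options and how Google’s public key is supplied vary by Signet version, so treat this as an outline rather than verified working code.

    require "signet/oauth_2/client"

    # The id_token arrives from the Sign-In button's browser-side callback.
    client = Signet::OAuth2::Client.new(client_id: ENV["GOOGLE_CLIENT_ID"])
    client.update_token!(id_token: params[:id_token])

    # decoded_id_token decodes the JWT payload; passing Google's public key
    # (or a keyfinder block) makes it verify the signature as well.
    payload = client.decoded_id_token
    google_user_id = payload["sub"]   # the stable Google ID for this user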

So it had the verification feature but the keyword “verify” itself wasn’t in the name, throwing off my search multiple times.

Gah.

Nothing to do now but to take some deep breaths, clear out the pent-up frustration, and keep on working…

Dipping toes in AWS via Rails Tutorial Sample App

Amazon Web Services is a big, big ball of yarn. For somebody just getting started, the list of AWS products is quite intimidating. It’s not any fault of Amazon’s; it’s just the nature of building out such a comprehensive system. Fortunately, Amazon is not blind to the fact that people can get overwhelmed, and it has put admirable effort into a gentle introduction via its Getting Started resources: a series of (relatively) simple guided tours through select parts of the AWS domain.

At the end of it all, though, a developer has to roll up their sleeves and dive in. The question then is: where? In previous times, I couldn’t make up my mind and got stuck. This time around, I have a starting point: Michael Hartl’s Ruby on Rails Tutorial sample app, which wants to store images on Amazon S3 (Simple Storage Service).

Let’s make it happen.

One option is to blaze the simplest, most direct path to get rolling, but I resisted. The example I found on stackoverflow.com granted the rails app full access to storage with my AWS root credentials. Functionally speaking that would work, but that is a very bad idea from a security practices standpoint.

So I took a detour through Amazon IAM (Identity and Access Management). I wanted to learn how to do a properly scoped security access scheme for the Rails sample app, rather than giving it the Golden Key to the entire kingdom. Unfortunately, since IAM is used to manage access to all AWS properties, it is a pretty big ball of yarn itself.

Eventually I found my on-ramp to AWS: a section in the S3 documentation that discusses access control via IAM. Since I control both the rails app and my own AWS account, I was able to skip a lot of the cross-account management on this first learning pass, boiling it down to the basics of what I can do myself to implement IAM best practices for S3 access from my Rails app.

After a few bumps in the exploration effort, here’s what I ended up with.


Root account: This has access to everything, so we want to use it as little as possible to minimize the risk of compromise. Log in to this account just long enough to activate multi-factor authentication and create an IAM user account with “AdministratorAccess” privileges. Log out of the management console as root, then log back in under the new admin account to do everything else.

Admin account: This account is still very powerful so it is still worth protecting. But if it should be compromised, it can at least be shut down without closing the whole Amazon account. (If root is compromised, all bets are off.) Use this account to set up the remaining items.

Storage bucket: While logged in as the admin account, go to the S3 service dashboard and create a new storage bucket.

Access policy: Go to the IAM dashboard and create a new S3 access policy. The “resource” of the policy is the storage bucket we just created. The “actions” we allow are the minimum set needed for the rails sample app and no more (a scripted sketch of the policy follows this list):

  1. PutObject – this permission allows the rails app to upload the image file itself.
  2. PutObjectAcl – this permission allows the rails app to change the access permission on the image object, making the image publicly visible to the world. This is required so the image can be used as the source of an HTML <img> tag in the rails app.
  3. DeleteObject – when a micropost is deleted, the app needs this permission so the corresponding image can be deleted as well.
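
For reference, the same policy expressed as a document and created through the AWS SDK for Ruby; the bucket and policy names are placeholders, and I actually performed this step in the IAM dashboard.

    require "aws-sdk-iam"   # v3 Ruby SDK gem, assumed to be installed
    require "json"

    policy_document = {
      Version: "2012-10-17",
      Statement: [{
        Effect:   "Allow",
        Action:   ["s3:PutObject", "s3:PutObjectAcl", "s3:DeleteObject"],
        Resource: "arn:aws:s3:::example-sample-app-bucket/*"   # placeholder bucket name
      }]
    }

    Aws::IAM::Client.new.create_policy(
      policy_name:     "sample-app-s3-access",                 # placeholder name
      policy_document: JSON.generate(policy_document)
    )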

Access group: From the IAM dashboard, create a new access group. Under the “Permissions” list of the group, attach the access policy we just created. Now any account that is a member of this group has just enough access to the storage bucket to run the rails sample app.

User: From the IAM dashboard, create a new user account to be used by the rails app. Add this new user to the access group we just created, so it can use the access policy we created (and no more).

This new user, granted only a low level of access, will not need a password since we’ll never log in to the Amazon management console with it. But we will need to generate an access key and secret key for the app.
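
The remaining IAM steps (group, user, and access key) could be scripted the same way; again the names below are placeholders, and doing it in the console works just as well.

    require "aws-sdk-iam"

    iam = Aws::IAM::Client.new

    # Group that carries the S3 policy from the previous step (ARN is a placeholder).
    iam.create_group(group_name: "sample-app-s3-group")
    iam.attach_group_policy(
      group_name: "sample-app-s3-group",
      policy_arn: "arn:aws:iam::123456789012:policy/sample-app-s3-access"
    )

    # App user: no console password, just an access key pair for the rails app.
    iam.create_user(user_name: "sample-app-user")
    iam.add_user_to_group(group_name: "sample-app-s3-group", user_name: "sample-app-user")

    key = iam.create_access_key(user_name: "sample-app-user").access_key
    puts key.access_key_id       # goes to Heroku as an environment variable
    puts key.secret_access_key   # shown only once; store it somewhere safe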

Once all of the above is done, we have everything we need to give Heroku for the Rails Tutorial sample app: an S3 storage bucket name, plus the access key and secret key of the low-privilege user account we created to access that bucket.
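
Those values reach the sample app through Heroku environment variables. Assuming the tutorial’s CarrierWave-plus-fog image upload setup, the initializer reads them roughly like this (the variable names are whatever you chose with heroku config:set):

    # config/initializers/carrierwave.rb -- a sketch, assuming CarrierWave + fog
    if Rails.env.production?
      CarrierWave.configure do |config|
        config.fog_credentials = {
          provider:              "AWS",
          aws_access_key_id:     ENV["S3_ACCESS_KEY"],
          aws_secret_access_key: ENV["S3_SECRET_KEY"],
          region:                ENV["S3_REGION"]
        }
        config.fog_directory = ENV["S3_BUCKET"]
      end
    end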


While this is far more complex than the stackoverflow.com answer, it is more secure. It was also a good exercise for learning the major bits and pieces of an AWS access control system.

The above steps ensure that, if the Rails sample app should be compromised in any way, the hacker gets only the permissions we granted to the app and no more. They could put new images in the S3 bucket and make them visible, or delete existing images, but they can’t do anything else in that S3 bucket.

And most importantly, the hacker has no access to any other part of my AWS account.

Rails Tutorial (Take 2)

With all the fun and excitement around 3D printing, I’ve let my Ruby on Rails education lapse. I want to dive back in, but it’s been long enough that I felt I needed a review. Also, during my time away, the Ruby on Rails team released version 5, and Michael Hartl’s Ruby on Rails Tutorial was updated accordingly.

Independent of the Rails 5 updates, it was well worth my time to go through the book again. On the second pass, I understood some things that didn’t make sense before. It was also good to revisit the first do-nothing “hello app” and the second automated-scaffold “toy app” with a little more Rails knowledge under my belt. The book is structured so that a beginner doesn’t have to understand the mechanics of the hello or toy apps, but a reader with a bit of experience will get something more out of them.

The release notes for the update mention that a few sections were rearranged for better pacing and structure, and that more exercises were added for readers to check their progress. Both are incremental improvements that I appreciated, but neither was especially earth-shattering.

Action Cable, one of the big signature features of Rails 5, was not rolled into the book. Hartl is handling that in a separate tutorial, Learn Enough Action Cable to Be Dangerous, which I will go through at some point in the near future.

Towards the end of the book, Hartl introduced an optional advanced concept: using Amazon Web Services to store user image uploads. I skipped that section the first time through, and decided to dive into it this time.

I quickly found myself in a deep rabbit hole. Amazon Web Services has many moving parts designed for a wide range of audiences and it’s a challenge to get started without being overwhelmed.

Which is where I am now. Lots more exploration ahead!