Internet Search: Google Always Knows Best

One advantage of having my own domain like https://newscrewdriver.com is that I can subscribe to search engine services that provide feedback on how they process a site. Microsoft has Bing Webmaster Tools, Google has Google Search Console, etc. These are products focused on helping websites improve their chances of showing up in internet search results. That is an important objective for many sites, but not this one, because I’m not trying to maximize revenue or anything here. I set up my dashboards because I was curious what information was available and what search giants expect site owners to do about it. Not that I can do much, for the most part, because the majority of site implementation details here are handled by WordPress and out of my direct control.

Every once in a while I receive a notification that some problem prevented a part of my site from being indexed for search. Most of these were caused by a change in WordPress and were quickly fixed with no action on my part. Sometimes a notification reflects a big change in how Google thinks about the web and wants sites to follow along. For example, a few years back Google decided a site’s treatment of mobile devices is more important than desktop, so sites are encouraged to have a good mobile experience or else their Google search ranking would take a hit. This was a change I agree with: making sites usable on small screens with slow processors and low bandwidth makes the web more accessible to all, and wielding Google search ranking as a stick to encourage adoption is one way to use Google power for good.

But then there are times that make me… less fond… of Google. I just received an error notification: “Duplicate, Google chose different canonical than user” with a link to this explanation. As I understand it, the error says that Google looked at the URL listed in my WordPress-generated site map and decided a different URL was better. I thought that was odd, but whatever. I went looking for a way to inform Google their decision was wrong, and I can’t. Google has decided they know my site better than I do and the “fix” is to change my site to use their chosen URL. Conform to Google. There is no alternative.

In this specific case, the URL was to one of my posts and Google’s chosen URL was to the comments section on that same page. Google is wrong, their link is not the superior link, but they will not hear arguments on their decision. Stalemate. I guess Google will keep using their wrong URL. Looking on the bright side, at least that URL actually points to my content, unlike some other search engines.

Lightweight Google AMP Gaining Weight

Today I received a notification from Google AMP that the images I use in my posts are smaller than their recommended size. This came as quite a surprise to me – all this time I thought I was helping AMP’s mission to keep things lightweight for mobile browsers. It keeps my blog posts from unnecessarily using up readers’ cell phone data plans, but maybe this is just a grumpy old man talking. It is clear that Google wants me to use up more bandwidth.

AMP stands for Accelerated Mobile Pages, an open source initiative that was launched by Google to make web pages that are quick to download and easy to render by cell phones. Cutting the fat also meant cutting web revenue for some publishers, because heavyweight interactive ads were forbidden. Speaking for myself, I am perfectly happy to leave those annoying “Shock the Monkey” ads behind.

As a WordPress.com blog writer I don’t usually worry about AMP, because WordPress.com automatically creates and serves an AMP-optimized version of my page to appropriate readers. And since I don’t run ads on my page there’s little loss on my side. As a statistics junkie, I do miss out on knowing my AMP viewership numbers, because people who read AMP cached versions of my posts don’t interact with WordPress.com servers and don’t show up in my statistics. But that’s a minor detail. And in theory, having an AMP alternate is supposed to help my Google search rankings so I get more desktop visitors than I would otherwise. This might matter to people whose income depends on their web content. I have the privilege that I’m just writing this blog for fun.

Anyway, back to the warning about my content. While I leave AMP optimization up to WordPress.com, I do control the images I upload. And apparently I’ve been scaling them down too far for Google.

[Image: AMP recommendation for 1200 pixel wide images]

I’m curious why they chose a 1200 pixel width. That seems awfully large for a supposedly small, lightweight experience. Most Chromebook screens are only around 1300 pixels wide, so a 1200 pixel wide image is almost full screen! Even normal desktop web browsers visiting this site retrieve only a 700 pixel wide version of my images. Because of that, I had been uploading images 1024 pixels wide and thought I had plenty of headroom. Now that I know Google’s not happy with 1024, I’ll increase to 1200 pixels wide going forward.

Let the App… Materialize!

After I got the Google sign-in working well enough for my Rails practice web app, the first order of business was to build the basic skeleton. This was a great practice exercise to take the pieces I learned in the Ruby on Rails Tutorial sample app and build something of my own design.

The initial pass implemented basic functionality but it didn’t look very appealing. I had focused on the Rails server-side code and left the client-side code as simple plain HTML that would have been state-of-the-art in… maybe 1992?

Let’s make it look like something that belongs in 2017.

The Rails tutorial sample app used Bootstrap to improve the appearance and functionality of the client-side interface. I decided to take this opportunity to learn something new instead of doing the same thing. Since I’m using Google Sign-In in this app, I decided to adopt Google’s design concepts for my client-side appearance as well.

Web being the web, I knew I wouldn’t have to start from scratch. I knew about Google’s own Material Design Lite and thought that would be a good candidate before I learned it had been retired in favor of its successor, Material Components for the web. One of the touted advantages was improved integration with different web platforms. Sadly, Rails was not among the platforms with examples ready to go.

I looked around for an existing project to help Rails projects adopt Google’s design language, and that’s when I found Materialize: a library that shares many usage patterns with Bootstrap. The style sheets are even written using SASS, native to default Rails apps, making for easy integration. Somebody has done that work and published it as the Ruby gem materialize-sass, so all I had to do was add a single line to use Materialize in my app.

Of course I still had to put in the effort to revise all the view files in my web app to pick up Materialize styling and features. That took a few days, and the reward for this effort is a practice web app which no longer looks so amateurish.

Protecting User Identity

Recently, web site security breaches have been a frequent topic of mainstream news. Security practices are evolving, but they have quite some way to go yet. Learning web frameworks gives me an opportunity to understand the mechanics from a web site developer’s perspective.

For my project I decided to use the Google Identity Platform and let Google engineers take care of identification and authentication. By configuring the Google service to retrieve only basic information, my site never sees the vast majority of personally identifiable information. It never sees the password, name, e-mail address, etc.

All my site ever receives is a string, the Google ID. My site uses it to identify a user account. With the security adage of “what I don’t know, I can’t spill” I thought this was a pretty good setup: the only thing I know is the Google ID, so I can’t spill anything else.

Which led to the next question: what’s the worst that can happen with the ID?

I had thought the ID was something Google generated specifically for my web site, or more precisely for my site’s Client ID. I no longer believe so. A little experimentation (aided by a change in Client ID for the security issue previously documented) led me to now believe it’s possible the Google ID is global across all Google services.

This means if a hacker manages to get a copy of my site’s database of Google IDs, they can cross-reference it against the databases of other compromised web sites, potentially assembling a larger picture out of small pieces of info.

While I can’t stop using the Google ID (I have to have something to identify a user) I can make it more difficult for a hacker to cross-reference my database. I’ve just committed a change to hash the ID before it is stored in the database, salted with a value that is unique per deployed instance of the app.
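Here is a minimal Ruby sketch of that idea, not the actual commit. It assumes HMAC-SHA256 as the salted hash and a per-instance salt that would come from an environment variable; the helper name hashed_google_id, the variable name GOOGLE_ID_SALT, and all the sample values are made up for illustration.

```ruby
require "openssl"

# Hypothetical helper: hash a Google ID with a per-instance salt before storing it.
# HMAC-SHA256 is an assumed choice; the real app may use a different salted hash.
def hashed_google_id(google_id, salt)
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("SHA256"), salt, google_id)
end

# In a deployed app the salt would come from the environment, for example
# ENV.fetch("GOOGLE_ID_SALT") (an assumed variable name). Made-up values here:
salt_a = "salt-for-instance-a"
salt_b = "salt-for-instance-b"
google_id = "123456789012345678901"

# Same ID, different instances: the stored values do not match each other.
puts hashed_google_id(google_id, salt_a)
puts hashed_google_id(google_id, salt_b)
```

Because the same ID and salt always produce the same digest, the app can still look up a returning user by hashing the ID it receives at sign-in. A different salt on another instance gives unrelated values, which is what makes direct cross-referencing harder.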

Now for a hacker to penetrate the identity of my user, they must do all of the following:

  1. Obtain a copy of the database.
  2. Obtain the hashing salt used by the specific instance of the app which generated that database.
  3. Already have the user’s Google ID, since they won’t get the original ID out of my database of hashed values.

None of which are impossible, but certainly a lot more effort than it would have otherwise taken.

I think this is a worthwhile addition.

Limiting Google Client ID Exposure

Today’s educational topic: the varying levels of secrecy around cloud API access.

In the previous experiment with AWS, things were relatively straightforward: the bucket name is going to be public, all the access information is secret, and none of it is ever exposed to the user. Nor is it checked into the source code. It is set directly on the Heroku server as environment variables.

Implementing a web site using Google Identity got into a murky in-between for the piece of information known as the client ID. Due to how the OAuth system is designed, the client ID has to be sent to the user’s web browser. Google’s primary example exposed it as an HTML <meta> tag.

The fact that the client ID is publicly visible led me to believe it is not something I needed to protect, so I had merrily hard-coded it into my source and checked it into GitHub.

Oops! According to this section of the Google Developer Terms of Service document, that was bad. Note that client IDs are named explicitly, and see the last sentence:

Developer credentials (such as passwords, keys, and client IDs) are intended to be used by you and identify your API Client. You will keep your credentials confidential and make reasonable efforts to prevent and discourage other API Clients from using your credentials. Developer credentials may not be embedded in open source projects.

Looks like we have a “secret but not secret” level going on: while the system architecture requires that the client ID be visible to a user logging on to my site, as a developer I am still expected to keep it secret from anybody just browsing code online.

How bad was this mistake? As far as security goofs go, this was thankfully benign. On the Google developer console, the client ID is restricted to a specific set of URIs. Another web site trying to use the same client ID will get an error:

[Image: Google error page reporting a URI mismatch]

IP addresses can be spoofed, of course, but this mitigation makes abuse more difficult.

After this very instructional detour, I updated my project’s server-side and client-side code to retrieve the client ID from an environment variable. The app will still end up sending the client ID in clear text to the user’s web browser, but at least it isn’t in plain sight and searchable on GitHub.
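A small Ruby sketch of what that can look like, under a few assumptions: the environment variable name GOOGLE_CLIENT_ID and both helper names are made up for illustration, and the meta tag name google-signin-client_id is the one used in Google’s Sign-In example. In a Rails app the tag would normally live in a layout template rather than a string.

```ruby
require "erb"

# Hypothetical helper: read the client ID from the environment instead of source.
# GOOGLE_CLIENT_ID is an assumed variable name; env is injectable for testing.
def google_client_id(env = ENV)
  env.fetch("GOOGLE_CLIENT_ID") do
    raise KeyError, "Set GOOGLE_CLIENT_ID in the server environment"
  end
end

# Hypothetical helper: build the <meta> tag that hands the client ID to the browser.
def client_id_meta_tag(client_id)
  escaped = ERB::Util.html_escape(client_id)
  %(<meta name="google-signin-client_id" content="#{escaped}">)
end

# Made-up value standing in for what Heroku would provide at runtime.
sample_env = { "GOOGLE_CLIENT_ID" => "example-id.apps.googleusercontent.com" }
puts client_id_meta_tag(google_client_id(sample_env))
```

Failing loudly when the variable is missing is a deliberate choice in this sketch: a sign-in page with an empty client ID fails in a much more confusing way in the browser.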

And to close everything out, I also went into the Google developer console to revoke the exposed client ID, so it can no longer be used by anybody.

Lesson learned, moving on…

Adventures in Server-Side Authentication

The latest chapter comes courtesy of the Google Identity Platform. For my next Rails app project, I decided to venture away from the user password authentication engine outlined in the Hartl Ruby on Rails Tutorial sample app. I had seen the “Sign in with Google” button on several web sites (like Codecademy) and decided to see it from the other side: Users for my next Rails project will sign in with Google!

The client-side code was straightforward following directions in the Google documentation. The HTML is literally copy-and-paste. The JavaScript needed some reworking to translate into CoffeeScript for the standard Rails asset pipeline, but that wasn’t terribly hard.

The server side was less straightforward.

I started with the guide Authenticate with a Backend Server, which had links to the Google API Client Library for (almost all of) the server-side technologies, including Ruby. The guide page itself included examples of using the client library to validate the ID token in Java, Node.js, PHP, and Python. The lack of a Ruby example would prove problematic because each flavor of the client library seems to have different conventions and use different names for the functionality.

The Java library has a dedicated GoogleIdTokenVerifier class for the purpose. The Node.js library has a GoogleAuth.OAuth2 class with a verifyIdToken method. PHP has a Google_Client class with a verifyIdToken method. And to round out the set, the Python library has oauth2client.verify_id_token.

Different, but they’re all in a similar vein of “verify”, “id”, and “token” so I searched the Ruby Google API client library documentation for those keywords in the name. After a few fruitless hours I concluded what I wanted wasn’t there.

Where to next? I went to the library’s GitHub page for clues. I had to wade through a lot of material irrelevant to the immediate task because the large library covers the entire surface of Google services.

I thought I had hit the jackpot when I found a reference to the Google Auth Library for Ruby. It’s intended to handle all authentication work for the big client library, with a target completion date of Q2 2015. (Hmm…) Surely it would be here!

It was not.

After too many wrong turns, I looked at Signet in detail. It has an OAuth2::Client class, which sounded very similar to the other libraries, but it had no “verify” method, so every time I saw a reference to Signet I kept deciding to look elsewhere. Once I decided to read into the details of Signet::OAuth2::Client, I finally figured out that it has a decoded_id_token method that can optionally verify the token.

So it had the verification feature but the keyword “verify” itself wasn’t in the name, throwing off my search multiple times.
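For anyone wondering what “verify the ID token” involves under the hood, here is a rough Ruby sketch using only the standard library. It is not Signet’s implementation or any Google API: verify_id_token and demo_token are hypothetical helpers, the token is signed with a throwaway key generated on the spot (real tokens are signed by Google, whose public keys have to be fetched separately), and the client ID and user ID are made up. The checks shown are the usual ones for a Google ID token: RS256 signature, audience equal to my client ID, a Google issuer, and an expiry in the future.

```ruby
require "base64"
require "json"
require "openssl"

# Issuer values Google uses for ID tokens.
GOOGLE_ISSUERS = ["accounts.google.com", "https://accounts.google.com"].freeze

# Hypothetical helper, not part of Signet or any Google library.
# Returns the claims hash when every check passes; raises otherwise.
def verify_id_token(token, public_key, client_id, now: Time.now)
  parts = token.split(".")
  raise ArgumentError, "malformed token" unless parts.length == 3
  header_b64, payload_b64, signature_b64 = parts

  header = JSON.parse(Base64.urlsafe_decode64(header_b64))
  raise ArgumentError, "unexpected algorithm" unless header["alg"] == "RS256"

  # The signature covers the first two segments exactly as transmitted.
  signature = Base64.urlsafe_decode64(signature_b64)
  signed_part = "#{header_b64}.#{payload_b64}"
  unless public_key.verify(OpenSSL::Digest.new("SHA256"), signature, signed_part)
    raise ArgumentError, "bad signature"
  end

  claims = JSON.parse(Base64.urlsafe_decode64(payload_b64))
  raise ArgumentError, "wrong audience" unless claims["aud"] == client_id
  raise ArgumentError, "wrong issuer" unless GOOGLE_ISSUERS.include?(claims["iss"])
  raise ArgumentError, "token expired" unless claims["exp"].to_i > now.to_i
  claims
end

# Demo only: build a token signed with a throwaway key.
def demo_token(private_key, claims)
  header = Base64.urlsafe_encode64({ alg: "RS256", typ: "JWT" }.to_json, padding: false)
  payload = Base64.urlsafe_encode64(claims.to_json, padding: false)
  signature = private_key.sign(OpenSSL::Digest.new("SHA256"), "#{header}.#{payload}")
  "#{header}.#{payload}.#{Base64.urlsafe_encode64(signature, padding: false)}"
end

key = OpenSSL::PKey::RSA.new(2048)
token = demo_token(key, "iss" => "accounts.google.com", "aud" => "my-client-id",
                        "sub" => "made-up-user-id", "exp" => Time.now.to_i + 60)
puts verify_id_token(token, key.public_key, "my-client-id")["sub"]
```

In a real Rails app I would still lean on the library method rather than hand-rolled checks like these; the sketch is only meant to show why a method named decoded_id_token can also be the one doing the verifying.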

Gah.

Nothing to do now but to take some deep breaths, clear out the pent-up frustration, and keep on working…