Notes on Codecademy “Design Databases with PostgreSQL”

After a frustrating experience with the node-sqlite course on Codecademy, I’ve concluded that their in-browser instruction environment is not well set up for teaching SQL. Or at least, this course is far weaker than others on Codecademy, which I had been generally satisfied up until this point. I considered switching to a different topic but decided there were still a few more things I wanted to learn. But from here on out, when I fail an exercise, I’m more likely to decide my answer was fine. (More willing to hit “View Solution” to see the answer and compare.) And more importantly, I’m going to review the course syllabus beforehand and skip those with frustrating “Code Challenge” sections.

Which led me to “Design Databases with PostgreSQL” which is technically a “Skill Path” offering rather than a course. Like my previous skill path in web development, I started this skill path and was immediately at 50% completion due to material that overlapped with courses I’ve already taken. One major difference is that all my Codecademy database courses used SQLite to illustrate general SQL concepts, but this skill path has some PostgreSQL-specific details amongst more general database concepts.

Initially, I was mildly disappointed that the material was shallower than I had expected based on course description. The sales pitch made it sound like we’re going to get into some real detailed nuts-and-bolts, but while the course did indeed to into further depth than other courses to date, there were still many topics that ended with “we’re not going into more depth, but at least now you are aware it exists.” An example concerns database keys. From my earlier courses I had known about primary, foreign, and composite keys. This course mentioned there are also super, candidate, and secondary keys then proceeded to say nothing more about them. As a starting pointer this is fine, I had just expected more.

Once I adjusted my expectations, this skill path was time well spent. We get more information on database schema and that proper design of a schema can make a difference in database performance. Both at development time (make queries easier and less prone to problems) and at runtime (easier updates and faster queries.) Part of this design process is database normalization, which was covered by starting with a poorly designed database. After covering how a poorly normalized database causes problems in use, we are walked through typical normalization techniques to solve those problems.

But those solutions usually have a tradeoff. This course has a recurring theme in the form of database tradeoffs. A competent database engineer has to be able to understand the problem domain and usage patterns to properly prioritize certain constraints versus others. A normalized database has a lot of space, performance, and consistency advantages. But it does tend to make updates and queries more complex by requiring database joins. Similarly, a database index can make queries faster, but maintenance means updates are slower. The index also takes up space on disk.

A light complaint I have about this course is that its illustrative examples were surprisingly poor. One example used an email address as a primary key for a list of people, but a person can have multiple email addresses making it a poor primary key. Another example separated ingredients from recipes, but the ingredient is associated with a fixed amount. It is unrealistic for a cookbook to use, say, the same amount of salt for every recipe.

One of the practice exercises was “Bytes of China”, setting up a database that would track information suitable for a restaurant menu. From the starting directions I had looked forward to an open-ended exercise, because it said to: “Create table with columns that make sense based on the description” This came to a screeching halt when we were given information to add to these tables in the form of SQL INSERT clauses that expect a specific database schema. I had to delete the database schema that made sense to me and rebuild a different schema to suit the exercise data. This was annoying. They could have told us to build to suit their sample data upfront instead of letting us waste time designing our own that wouldn’t work.

Gripes aside, I learned a lot of neat things that I expect to use in future projects that might need a database. I learned how it was possible to represent many-to-many relationships with a “join table” that has a composite primary key that consists of a combination of foreign keys. Beyond “PRIMARY KEY” I learned about constraints like UNIQUE, CHECK, and REFERENCES. In the earlier SQLite course I was dismayed to learn it doesn’t enforce the schema, calling it a flexible feature instead of a bug. PostgreSQL isn’t as free-wheeling but we still need to watch out for times when it becomes inadvertently unhelpful. If I make a mistake trying to put a floating-point number like 1.5 into an integer field, PostgreSQL would round it to 2 without error. Or if I make a mistake putting it into a text field, PostSQL would helpfully convert the number 1.5 to the string “1.5”.

Every bit of SQL instruction I’ve come across before only ever joined two tables, this course was the first to teach me how to join more than two. I can see how this would feel obvious to SQL veterans, but every beginner has to see it at least once:

SELECT table_one.column_one AS alias_one, table_two.column_two AS alias_two, table_three.column_three AS alias_three
FROM table_one
INNER JOIN table_two
ON table_one.primary_key = table_two.foreign_key
INNER JOIN table_three
ON table_two.primary_key = table_three.foreign_key;

I think all of the remaining nifty tricks are PostgreSQL-only, but I’m not sure. The course doesn’t make a lot of distinction between thing we can use in other databases versus PostgreSQL-only. So I don’t know if I can extract just the date from a timestamp (DATE_PART()) with other databases, or if I can make this query to examine constraints on record for a table:

SELECT
  constraint_name, table_name, column_name
FROM
  information_schema.key_column_usage
WHERE
  table_name = 'fill in table name';

Given the pg_ prefix, the following query is likely PostgreSQL specific way to list every index built for a table.

SELECT *
FROM pg_Indexes
WHERE tablename = 'fill in table name';

Also with the prefix is a query to show size of a table, which includes space consumed by storing data and all associated index.

SELECT pg_size_pretty (pg_total_relation_size('products'));

And finally, we get a few starting points for performance analysis. Like prepending EXPLAIN ANALYZE in front of a query to get information on how the database plans out its execution. Or SELECT NOW(); to print out a timestamp before and after an operation so we can see how long it took.

That’s a lot of information packed into a single Skill Path. I wished for more, but I can understand there’s a tradeoff against making the course too long. Maybe this is the perfect length after all, and just enough for me to learn more on my own later. I can spend years learning all the intricacies of relational databases, but right now I’m more curious to explore something a little different.

Notes on Codecademy “Learn Node-SQLite”

After my SQL fresher course, shortly after learning Node.js, I thought the natural progression was to put them together with Codecademy’s “Learn Node-SQLite” course. The name node-sqlite3 is not a mathematical subtraction but that of a specific JavaScript library bridging worlds of JavaScript and SQL. This course was a frustrating disappointment. (Details below) In hindsight, I think I would have been better off skipping this course and learn the library outside of Codecademy.

About the library: Our database instructions such as queries must be valid SQL commands stored as strings in JavaScript source code. We have the option of putting some parameters into those strings in JavaScript fashion, but the SQL commands are mostly string literals. Results of queries are returned to the caller using Node’s error-first asynchronous callback function convention, and query results are accessible as JavaScript objects. Most of library functionality are concentrated in just a few methods, with details available from API documentation.

This Codecademy course is fairly straightforward, covering the basics of usage so we can get started and explore further on our own. I was amused that some of the examples were simple to the point of duplicating SQL functionality. Specifically the example for db.each() shows how we can tally values from a query which meant we ended up writing a lot of code just to duplicate SQL’s SUM() function. But it’s just an example, so understandable.

The course is succinct to the point of occasionally missing critical information. Specifically, the section about db.run() say “Add a function callback with a single argument and leave it empty for now. Make sure that this function is not an arrow.” but didn’t say why our callback function must not use arrow syntax. This minor omission became a bigger problem when we roll into the after-class quiz, which asked why it must not use arrow syntax. Well, you didn’t tell me! A little independent research found the answer: arrow notation functions have a different behavior around the “this” object than other function notations. And for db.run(), our feedback is stored in properties like this.lastID which would not be accessible in an arrow syntax function. Despite such little problems, the instruction portion of the course were mostly fine. Which brings us to the bad news…

The Code Challenge section is a disaster.

It suffers from the same problem I had with Code Challenge section of the Learn Express course: lack of feedback on failures. Our code was executed using some behind-the-scenes mechanism, which meant we couldn’t see our console.log() output. And unlike the Learn Express course, I couldn’t workaround this limitation by throwing exceptions. No console logs, no exceptions, we don’t even get to see syntax errors! The only feedback we receive is always the same “You did it wrong” message no matter the actual cause.

Hall of Shame Runner-Up: No JavaScript feedback. When I make a JavaScript syntax error, the syntax error message was not shown. Instead, I was told “Did you execute the correct SQL query?” so I wasted time looking at the wrong thing.

Hall of Shame Bronze Medal: No SQL feedback. When I make a SQL command error, I want to see the error message given to our callback function. But console.log(error) output is not shown, so I was stabbing in the dark. For Code Challenge #13, my mistake was querying from “Bridges” table when the sample database table is actually singular “Bridge”. If I could log the error, I would have seen “No such table Bridges” which would have been much more helpful than the vague “Is your query correct?” feedback.

Hall of Shame Silver Medal: Incomplete Instructions. Challenge #14 asked us to build a query where “month is the current month”. I used “month=11” and got nothing. The database had months in words, so I actually needed to use “month=’November'”. I wasted time trying to diagnose this problem because I couldn’t run a “SELECT * FROM Table” to see what the data looked like.

Hall of Shame Gold Medal Grand Prize Winner: Challenge #12 asks us to write a function. My function was not accepted because I did not declare it using the same JavaScript function syntax used in the solution. Instructions said nothing about which function syntax to use. After I clicked “View Solution” and saw what the problem was (image above) I got so angry at the time it wasted, I had to step away for a few hours before I could resume. This was bullshit.


These Hall of Shame (dis)honorees almost turned me off of Codecademy entirely, but after a few days away to calm down, I returned to learn what Codecademy has to teach about PostgreSQL

Notes on Codecademy “Learn SQL”

I’m a little sad that hobbyist web app projects have lost the option of free hosting on Heroku, but that’s no reason to stop learning. Heroku is not irreplaceable, I’m sure I can figure out something if a project proceeds far enough to be worth the effort. So, back to learning: where should I go next? Looking at project ideas that involve Node.js and potentially Express, I decided the next area of focus is a backing datastore. It’s time for some database refresher work starting with Codecademy’s “Learn SQL“.

I’ve taken several database courses in the past, to varying levels of rigor and depth. I expected the introductory material of this course to be review so I’m better able to learn new concepts later in the course. As it turned out, this course was entirely review for me but to be fair, some concepts were fresher in my mind than others. I especially appreciated the cool animations illustrating various table joins.

This specific course could be more accurately titled “Learn SQLite” because that’s the database engine used in the course. Which is fine, it covers all the basics. The one thing I hadn’t known (or had forgotten) about SQLite is its… flexibility… in data types. It is standard operating procedure for SQL tables to be declared with a data schema. “Names are strings, IDs are numbers”, etc. While SQL was designed for the database engine to enforce this schema, SQLite does not. When the Codecademy course mentioned this, I said “What!?” but the assertion checks out, confirmed by SQLite’s own FAQ which declares type flexibility as a feature and not a bug. I come from a world of strictly typed programming languages like C, so flexible typing like JavaScript feels more like a problem waiting to happen than a feature. I feel the same with SQLite’s lack of schema enforcement.

Another reason to take a SQL refresher course now is to review all concepts from a new perspective. Now that I am thinking of using a database as backend storage for a web application. From this perspective, some of SQL features make less sense than in other contexts. For example, I’m not sure ORDER BY makes sense to do within the database engine, as a web app almost certainly needs to have sorting logic anyway. Think of the shopping sites that lets the user reorder by availability, by lowest price, etc. For small datasets I’d want to do that on the client end instead of round-tripping each new sort as a new query all the way to the database. But the story changes for large datasets. It’ll make sense to sort data on the database if we want things ordered and then LIMIT to the top X items. That reduces bandwidth consumption between server and client and would be a good tool to have.

In contrast, other features like CASE (to categorize values), AS (to rename columns), and ROUND (rounding numbers) are definitely tasks better performed on the client end. I can’t think of a scenario (yet) where it makes sense to do that work on the server-side database.

This course touches on the concepts of primary keys and foreign keys, but other than uniqueness we didn’t get any further details of relational database design. This course didn’t cover concerns of properly designing a database to suit the task, such as database normalization. As a result, this course is good for setting someone up to use an existing database, but not enough to help them set up a new database. Or at least, not an efficient or effective one. Maybe that’ll be part of another course.