Notes on “Using MongoDB with Node.js” from MongoDB University

Most instructional material (and experimentation) for MongoDB uses the MongoDB Shell (mongosh), which is “a fully functional JavaScript and Node.js 16.x REPL environment for interacting with MongoDB deploymentsaccording to mongosh documentation. Making mongosh the primary command line interface useful for exploration, experimentation, and education like on Codecademy or MongoDB University.

Given the JavaScript focus of MongoDB, I was not surprised there is a set of first-party driver libraries to translate to/from various programming languages. But I was surprised to find that Node.js (JavaScript) was among the list of drivers. If this was all JavaScript anyway, why do we need a driver? The answer is that we don’t use JavaScript to talk to the underlying database. That is the job of BSON, the binary data representation used by MongoDB for internal storage. Compact and machine-friendly, it is also how data is transmitted over the network. Which is why we need a Node.js library to convert from JSON to BSON for data transmission. I started the MongoDB University course “Using MongoDB with Node.js” to learn more about using this library.

It was a short course, as befitting the minimal translation required of this JavaScript-focused database. The first course covered how to connect to a MongoDB instance from our Node.js environment. I decided to do my exercises with a Node.js Docker container.

docker run -it --name node-mongo-lab -v C:\Users\roger\coding\MongoDB\node-mongo-lab:/node-mongo-lab node:lts /bin/sh

The exercise is “Hello World” level, connecting to a MongoDB instance and listing all available databases. Success means we’ve verified all libraries & their dependencies are install correctly, that our MongoDB authentication is set up correctly, and that our networking path is clear. I thought that was a great starting point for more exercises, and was disappointed that we actually didn’t use our own Node.js environment any further in this course. The rest of the course used the Instruqt in-browser environment.

We had a lightning-fast review of MongoDB CRUD Operations and how we would do them with the Node.js driver library. All the commands and parameters are basically identical to what we’ve been doing in mongosh. The difference is that we need an instance of the client library as the starting point, from which we could obtain object representing a database and a collection with it. (client.db([database name]).collection([collection name]) Once we have that reference, everything else looks exactly as they did in mongosh. Except now they are code to be executed by Node.js runtime instead of typed. One effect of running code instead of typing commands is that it’s much easier to ensure transaction sessions complete within 60 seconds.

For me, a great side effect of this course is seeing JavaScript async/await in action doing more sophisticated things than simple straightforward things. The best example came from this code snippet demonstrating MongoDB Aggregation:

    let result = await accountsCollection.aggregate(pipeline)
    for await (const doc of result) {
      console.log(doc)
    }

The first line is straightforward: we run our aggregation pipeline and await its result. That result is an instance of MongoDB cursor which is not the entire collection of results but merely access to a portion of that collection. Cursors allow us to start processing data without having to load everything. This saves memory, bandwidth, and processing overhead. And in order to access bits of that collection, we have this “for await” loop I’ve never seen before. Good to know!

Many Paths to MongoDB Shell (mongosh)

To promote their product, MongoDB has setup their own online learning resource MongoDB University. I was curious to learn more about a database that offers something different from a standard SQL relational database, so I started with their “Introduction to MongoDB” course. Since it was at least partially a marketing tool, I was not surprised the course wanted to take us through a grand tour through all MongoDB products from their cloud-hosted MongoDB Atlas data platform to all the tools we can download and install. I was ready to just ignore and skip over those sales pitches, but then the course would quiz me to make sure I’ve actually installed them on my computer. This annoyed me, especially for the MongoDB Shell (mongosh). It’s how Codecademy introduces us to MongoDB, and it’s what we use in MongoDB University’s Instruqt hands-on labs. There are so many ways to get mongosh I refuse to download and run a full blown application installer package just for a command line tool.

At a minimum, this duplicates work. MongoDB Compass is another item the course wants us to download and install. Compass is a separate application that offers GUI-based methods for interacting with a MongoDB database and/or Atlas cluster. GUI are nice but rarely cover 100% of all scenarios, so developers like the option of dropping to a command line. Which is why clicking at the bottom of MongoDB Compass would bring up an integrated mongosh.

I’ve been using Docker as a tool to avoid installing software directly on my computer. I couldn’t escape installing MongoDB Compass because of its graphics interface, but as a command line tool mongosh is easy to run through Docker. The easiest way is to use the official MongoDB Docker image, which includes mongosh alongside the core database engine.

docker run --rm -it mongo:latest mongosh [connection string]

Doing this means we’re pulling down the entire MongoDB database just for the little mongosh tool. That’s like ordering an entire seven-course meal to eat just the little cherry on top of ice cream dessert. Even in this age of broadband internet, that seems rather excessive. I thought it’d be neat to try setting up a container just for mongosh, see if that’s any smaller.

I found instructions for installing mongosh on an Ubuntu instance. Ubuntu Focal is one of the supported versions so I’ll start there.

> docker run --name mymongosh -it ubuntu:focal

I’ll need some tools not found in this basic Ubuntu container, so I need to populate the package index followed by their installation.

> apt update

> apt install wget gnupg

After that I could follow MongoDB instructions. Slightly modified by removing “sudo” as it was unnecessary: we are running as root in this little world.

> wget -qO - https://www.mongodb.org/static/pgp/server-6.0.asc | apt-key add -

> echo "deb [ arch=amd64,arm64 ] https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/6.0 multiverse" | tee /etc/apt/sources.list.d/mongodb-org-6.0.list

> apt update

> apt install mongodb-mongosh

And voila! I have a docker container “mymongosh” based on the Ubuntu Focal container. Let me see how big it is:

> docker ps --size
CONTAINER ID   IMAGE          COMMAND   CREATED          STATUS          PORTS     NAMES       SIZE
757519a645f8   ubuntu:focal   "bash"    19 minutes ago   Up 19 minutes             mymongosh   278MB (virtual 351MB)

Wow, getting this far meant pulling 278MB on top of 73MB of Ubuntu Focal. This was far larger than I had hoped. While this size still compared favorably with MongoDB image size of almost 700MB, I don’t think my little experiment was worth the effort.

PS C:\Users\roger\coding\PostgreSQL> docker image ls
REPOSITORY                                                        TAG              IMAGE ID       CREATED         SIZE
mongo                                                             latest           2dd27bb6d3e6   7 days ago      695MB

If I want small size, I’d have to learn how to build a minimal Docker image based on Alpine, which isn’t what I want to focus on right now. My next objective is to learn how to use MongoDB from Node.js code instead of mongosh.

Notes on “Introduction to MongoDB” on MongoDB University

I started learning about MongoDB on Codecademy, but there were technical difficulties blocking my progress. So I switched to MongoDB University, the free online learning platform hosted by MongoDB themselves. There was a bit of a learning curve on idiosyncrasies of MongoDB’s learning platform, but I got the hang of it soon enough and could focus on the information covered by “Introduction to MongoDB” course.

Each section of the course starts with a Wistia-hosted video. I prefer to learn by reading at my own pace over watching a video, so this wasn’t my favorite. The video sometimes has captions allowing us to read along with the spoken narrative, but the presence of captions was inconsistent. Sometimes the video is followed by a page of text covering the same information and I thought I could read that text and skip the video, only to find that sometimes the text is missing information covered by the video and vice versa. I end up having to do both video and repetitive text, which feels like a tremendously inefficient way to do things.

Another inefficiency are the hands-on labs powered by Instruqt. It takes several clicks and a few tens of seconds for the lab environment to spin up, which isn’t bad by itself except most of our exercises involve typing out a single mongosh (MongoDB Shell) command and exiting to proceed to the next lab. This implementation means we spend a lot of time twiddling our thumbs waiting for spin-up and tear-down overhead, I would have preferred that we do more in each Instruqt session, so we don’t spend so much time waiting.

As for the course material, it’s not technically just about the MongoDB database itself, the course is also a sales pitch for associated MongoDB products. Mainly MongoDB Atlas, the cloud-hosted (your choice of AWS, Azure or GCP) MongoDB service that cost money for us and generates revenue for the company. I don’t mind a company making sure we know how to give them money, but I don’t care for the fact that MongoDB Atlas features are intermingled with MongoDB core functionality in this course. When we run our own MongoDB instance (which I plan to do for my own projects) we won’t have Atlas functionality, but this course doesn’t always make clear what is core MongoDB versus functionality added by Atlas. I expect the company would say “Don’t need to bother, just use Atlas for your projects, we have a free tier!” and I appreciate that the offer exists. But I’m rather skeptical that their “Free Forever” tier will actually stay free given the discouraging historical record of free tiers.

Instructions in MongoDB mostly concentrate on what we can type into mongosh, which exposes a JavaScript style interface for interacting with a database. That is very different from storing SQL commands in strings as we did in node-sqlite. Compared to SQL syntax, I found it easier to remember and parse mongosh syntax because of its similarity to JavaScript. That said, I still come across things that feel inconsistent to me that catch me off guard. Some commands want us to specify an action first then where to store the results. {$count: "total_items"} Other commands want us to specify the location of results followed by the action to generate it. {"total_items": {$count {}}} Maybe there’s a system and I just haven’t learned them yet, certainly the course never tried to explain them.

At least I could still follow the logic for most of those inconsistencies. The one that completely confused me was an example using geographical location data in the form of { type: 'Point', coordinates: [ 40, -74 ] } I think this is their GeoJSON format but the course didn’t go into detail. The example then sorted by ‘latitude‘, and I don’t see how I could have looked at ‘type‘ and ‘coordinates‘ and decide ‘latitude‘ is available for sorting. To my eye ‘latitude‘ was not defined at all! If I figure this out later, I’ll add a URL here to the appropriate reference.

Here is my super-abbreviated variation of course syllabus:

  • Querying for data with db.[collection].findOne() and find(). Followed by insert with insertOne() and insertMany(). We start with a few basic query operators using comparison operators $gt, $gte, $lt, and $lte. Then logical operators like $and and $or. We can use $elemMatch if we want to peer into values in the form of an array.
  • Modifying data with replaceOne for direct replacement. updateOne() and updateMany() take operators like $set and $push. findAndModify() combines a common pattern of finding a single document, modifying it, and returning the results. Finally deleteOne() to remove documents.
  • Tailor query results with cursor.sort() and limit(). Projection parameter to find() lets us skip information we’re not interested in. And collection.countDocuments() is useful for data exploration.
  • Aggregation is roughly analogous to SQL table joins. We are introduced to operators $match, $group, $sort, $limit, $project, $count, $set, and $out.
  • Index helps us optimize lookup for specific query patterns. Recommend we list the fields in usage order of Equality, Sort, then Range. (But couse didn’t explain why.) Like SQL, there’s always some tradeoff involved for having a database index. The explain() command helps us see if an index is actually as helpful as expected.
  • There’s a whole arena of information on MongoDB anti-patterns. Like how we should be aware of BSON size limit of 16MB which means avoiding things like creating arrays that would grow unbounded.

This course gave me a solid start to using MongoDB in my projects. As I do not intend to rely on MongoDB Atlas staying free, I’m more likely to run the MongoDB Docker container on a test machine at home. Interactions with such a database would likely happen via mongosh and there are many ways to get it.

Notes on MongoDB University Learning Platform

I’ve been learning a lot of interesting things from Codecademy’s course catalog, including the fundamental concepts of the “NoSQL” database software design of MongoDB. However, when Codecademy’s hands-on learning mechanism got stuck, I couldn’t see how those fundamental concepts translated to MongoDB practice. But that’s OK, the Codecademy MongoDB course was built in conjunction with MongoDB themselves, so I can try their own MongoDB University platform for learning online.

The switch does introduce some friction, though. Starting with the online learning platform itself. Codecademy rarely uses video lectures. (Which I appreciate, and more comments about content will come later.) When they do, they embed a YouTube video and let Google figure out the rest. MongoDB University hosts their (more frequent) video lectures on Wistia. (“Where Video Meets marketing”.) Wistia’s video player component has a slightly different interface from YouTube. The most annoying aspect of the Wistia player is lack of memory across sessions. I prefer playing the videos slightly faster than standard speed, and its lack of memory means I have to click the settings button to speed up playback on Every. Single. Video. I also prefer to turn on English captions, which meant more clicking on every video. Not every MongoDB University video had captions available, but I don’t blame Wistia for that. I do wish they followed YouTube’s lead and offer at least the option of imperfect (but far better than nothing) auto-generated captions.

Since I loved Codecademy giving us hands-on interaction with the material, I was happy to see MongoDB University courses also has a hands-on interaction component powered by Instruqt. (“The #1 Hands-on Virtual IT Labs for Product-led Growth”) For these exercises, Instruqt provides a mongosh (MongoDB interactive shell) command line in our browser window. It feels like they’ve spun up at least two Docker instances for each exercise: one container running mongosh, connected to another container running MongoDB itself. This works well as we get a known starting state for each exercise and, after we’re done, undo anything we’ve changed so the next person gets the same starting state. The downside of a browser-based command line is that we don’t get the full set of command-line keyboard shortcuts. I had to use mouse right-click in order to copy and paste because my preferred keyboard shortcuts didn’t work.

Another similarity to Codecademy are quizzes sprinkled throughout the course to test our comprehension of course material. User interface for these quizzes leave something to be desired. The first problem is when I clicked “Show Results”, I saw items I didn’t select displayed as “Incorrect”. I was confused for a few minutes, reading the explanation trying to figure out where I went wrong. Eventually I figured it out: I wasn’t wrong. They’re just showing me all of the explanations. I did not select the incorrect answer and that was the correct thing to do. That was very confusing. The next problem is literally the “Next” button on these quizzes. After finishing one question, I clicked “Next” to proceed with the quiz, only to get disoriented when the course continued with new material. Eventually I figured out there is a “Next” button to continue with the quiz, and it is near a different “Next” button which will abandon the quiz and proceed with the course. I clicked the latter when I should have clicked the former. This was a pretty bad user interface design but once I figured out what was going on, I could deal with it and focus on the course material.

Notes on Codecademy “Learn MongoDB”

Codecademy’s course catalog on SQL databases is thinner than those on topics like web front-end development, and their in-browser learning infrastructure isn’t as polished for those courses either. This has caused me frustration, but I was still learning useful knowledge. Codecademy’s PostgreSQL skill path was packed with information I can use and links to where I could learn more. After that, though, I wasn’t interested in anything else under their SQL umbrella of courses, but that doesn’t mean I’m done with databases because SQL relational databases are no longer the only game in town: A few alternative “NoSQL” database designs have recently arisen, and I had been curious about their design tradeoffs. So, the next step is Codecademy’s Learn MongoDB course, which I learned was launched (or at least promoted) recently via a Codecademy mailing list. Unfortunately its technical immaturity caused problems, more on that later.

This course started well, with a review of databases for those who come into the course without a background in SQL relational databases. With that background established, it proceeded to describe how NoSQL databases (there are several subtypes) like MongoDB (representing the “document database” subtype) go about their business. I loved this section, because it answered my question about why these databases exist and when they might (or might not!) be the right tool for the job.

On the course syllabus it said “Built in partnership with MongoDB” which in practice meant many links to MongoDB’s established portfolio of guides and documentation. After Codecademy’s own explanation of SQL vs. NoSQL, we have a link to MongoDB’s own take. Related to that topic is a presentation that describes MongoDB structures in terms of close analogues in SQL, but also implored experienced database developers to free their mind and think beyond relational database conventions. It seems perfectly possible to set up a MongoDB database so it looks and act like a relational database, but not taking advantage of MongoDB strengths risks incurring all the disadvantages of NoSQL without any reward to balance them out.

The MongoDB advantage that really caught my attention is the ability to start working on a project before we know everything about data access patterns. In a SQL-backed project that is a recipe for disaster because incomplete information would lead to a suboptimal schema, and one that we’d be stuck with towards the end of the project. Over in MongoDB land, our data validation can be loose in early stages of the project and tightened as we go if desired. During the course of development, MongoDB can theoretically adapt to changes much more easily than SQL databases could handle schema updates. I wonder if this means it’s possible for an application to evolve their MongoDB usage and end up in a state where it’d make sense to migrate to a SQL relational database.

These features were all very promising, and I really looked forward to playing with MongoDB. Unfortunately, Codecademy’s hands-on lesson backend is broken today. On the first page of the first interactive lesson, I was told to show all databases in a MongoDB instance by typing in the “show dbs” command, which listed four databases as expected. After this list was shown, I was to click the “Check Work” button to verify my progress before proceeding. But when I clicked “Check Work”, nothing happened. I did not see a successful check, which would have allowed me to proceed. In the absence of success check, I expect to see a red X telling me my answer was wrong and need to try again. But I didn’t see that, either. Nor did I see any sort of an error message. No “Pass”, no “Fail”, and not even a “Oh no something is wrong”.

I’m stuck.

So Codecademy’s Learn MongoDB course was a bust, but as mentioned earlier, MongoDB has their own collection of learning resources. The reading material so far got me interested enough that I want to continue learning MongoDB. Instead of waiting for Codecademy to fix their backend, I will switch to MongoDB’s learning platform.

[UPDATE: Codecademy has fixed their MongoDB course and I could continue its lesson, which is now extremely easy and just a review as I’ve since gone through courses on MongoDB’s own learning platform.]