US Library of Congress Recommended Storage Format

The US Library of Congress recommends SQLite for the preservation of data — not other relational database systems. I am not surprised since SQLite reads and writes data to files without the need of a server. SQLite is low maintenance and has not changed the mechanism used to read and write data since released on 08/17/2000 following SQL-92 standards. The latter means that data written quarter of a century ago can still be read today without errors and it would most likely be read without errors decades from now.

Read more.

  1. https://sqlite.org/locrsf.html
  2. https://www.loc.gov/preservation/digital/formats/fdd/fdd000461.shtml#local
  3. https://www.loc.gov/preservation/resources/rfs/data.html
  4. https://news.ycombinator.com/item?id=48042434

SQL and its relation to AI

Artificial intelligence (AI) is a series of co-related queries to retrieve data from a series of databases – often times, guessing what the end user needs.

Structure Query Language (SQL) is the most basic form of retrieving data and managing the databases that AI needs.

Although AI could possibly write the query you need, there is a chance that you would need to edit it and/or improve it. After all, some AI models anticipate what the user asks verbally and try to answer right away. The latter is annoying since these AI models use data compiled by user group data – collected from people from a certain age range, geographical location and/or other factor. As such, when a person asks for information that is not in the expected dataset (for example, a question about music sung in Russian when the model expects everyone to listen to music sung in English, worse yet without the compensation for bad pronunciation), AI returns pathetic answers. AI fails to guess. This is where you (as the developer) come in and improve the queries that AI might generate for you.

Many vendors offer their own flavors of AI. For example, MySQL has its own libraries and facilities for AI – Automated Machine Language (AutoML) and Generative AI (GenAI).

MySQL AI provides integrated, automated, and secure machine learning (ML) and generative AI capabilities. AutoML simplifies ML processes, helping you build, train, and explain ML models without data movement or added costs. Similarly, in-database LLMs, a built-in vector store, and embedding models enable GenAI, semantic search, and retrieval-augmented generation (RAG) at reduced infrastructure costs and without data movement.
https://www.mysql.com/products/mysqlai/

As a developer, you need to clean what AI wrongfully believes is correct – often times going back to basic SQL. This is yet another reason why programming is fun.

So many databases, what to use and/or recommend?

As a purist, I would recommend PostgreSQL, which is the closest distribution to fully ANSI-complaint. Unfortunately the industry does not run on purists unless coding in Ruby on Rails (RoR) . The most common RoR installation is on the Heroku platform, which offers PostgreSQL as its default database. In any case, it is not heavily used as giants like Oracle or SQL Server (Microsoft) and this might make it more secure than the latter two – fewer hackers trying to break PostgreSQL or its security. Maybe this is a good reason to be open source as anyone can fix a bug before it gets maliciously exploited.

As a nostalgist, I would recommend Oracle, which is the system that I learned with. Unfortunately Oracle has somewhat of a bad reputation in the industry after taking over Sun. Some developers opted to quit rather than joining Oracle. The OpenOffice development group formed LibreOffice, while MySQL development group formed MariaDB. Oracle is proprietary (closed) source.

As a practicalist, I would recommend MariaDB and not MySQL to stay away from whatever toxic environment that has made many programmers leave Oracle. As a drop-in replacement of MySQL, MariaDB has taken over what had been the MySQL place in the LAMP (from Linux, Apache, MySQL/MariaDB, Perl/PHP/Python to Linux, Apache, MariaDB, Perl/PHP/Python) web stack maintaining the full open source mantra. Note that the web stack could also be referred to as WAMP when running on a Windows server.

LAMP is an Open Source Web development platform that uses Linux as the operating system, Apache as the Web server, MySQL as the relational database management system and PHP as the object-oriented scripting language. (Sometimes Perl or Python is used instead of PHP.)
https://www.techtarget.com/whatis/definition/LAMP-Linux-Apache-MySQL-PHP

As a realist, rather than teaching MariaDB, I would teach MySQL as it has a much stronger market share and more students might be interested in MySQL rather than its fork. In other words, new comers to the industry might have heard of MySQL, but not MariaDB. MySQL has a combination of open source (full open source for the Community Edition, which I am set to teach next) and proprietary (closed) sources.

As a minimalist, I would recommend SQLite (actually sqlite3) – no server, just data. SQLite is used in many systems including web browsers and mobile telephones. The only drawback is the lack of robust security or user authentication, but it is fully open source so its users clean up bugs right away.

Introduction to Structured Query Language for Data Analytics (SF25SQL6172025) — Day 8

After running into errors with the original database server for this class last Wednesday and building a new server on a virtual machine running a 90-day Windows 11 Enterprise 25H2 image provided by Microsoft with SQL Server Express 2025 and SSMS 22. It was a slow installation (~3 hours), but we were back last night.

We covered how to CREATE, DROP and ALTER data objects.

Download the class notes for day 8.