3.6. TODO

3.6.1. New Operations and Types

  • Add an undo operation to the log. This way, it is possible to keep all branches of history.
  • Needs a way to modify multiple columns atomically (allows unique_index to work + better trigger invocations). New operations:
    • start/end atomic record update
    • insert_and_start_atomic_update.
    • Also for vector insertions and updates.
  • Use diff for large-string update
  • Differentiate between “storage type” and “usage type”:
    • remove bool type and use int8 instead, with bool usage
    • usages: bool(int8), date(int64).
    • uint8, uint16, uint32, uint64
    • custom usage label: ip address(int32), URL(string), PNG file(string), UTF8(string) (use base64 instead for json output), …?

3.6.2. Blobs

  • network protocol extension to handle local blob cache without downloading everything
  • zero-copy access to blob data using memory-mapped file

3.6.3. On-disk Storage

  • In a directory
  • A checkpoint file (2 copies, valid if identical)
  • A subdirectory for each table
  • One file per column vector
  • One file for string data (string column = size + start_index)
  • Use memory-mapped files (is there a portable way?)

3.6.4. Compiler

  • Pass strings by value for new and update, and std::move them:
    • need for rvalue reference overload of Writable::update_string
    • plain reference version must be kept as well
    • using blobs or vectors of int8 can be a high-performance alternative
    • so for the moment, it is not worth the added complexity
  • allow reading dropped fields in custom functions that are invoked before the drop. Store data in a column vector, and clear the vector at the time of the drop. Make sure field id is not reused. (make access function private, and custom functions are friends)
  • check that vector range is OK in constructor of vector update
  • modularize code generation
    • Each module should have:
      • required include files
      • data structure for storing data
      • additional hidden table fields?
      • triggers (after/before insert/update/delete)
      • public methods
    • Possible to modularize:
      • indexes
      • sort functions
      • referential integrity
      • safety checks
      • incrementally updated group-by queries
  • use std::set and std::multiset for indexes? Might be better for strings.
  • Table options:
    • single_row: compiled to a simple struct, with simpler getters.
    • no_delete: allows more efficient indexing (+smaller code)
    • last N (for web access log) (last 0 = none)
  • Allow the user to write custom event-processing functions and store information in custom data structures (for instance: collect statistics from web access log without storing whole log in RAM).
  • Compiler utilities:
    • referential integrity
    • queries (SQL compiler?)
    • incrementally-updated group-by queries (OLAP, hypercube, …)
  • C wrapper. Catch all exceptions? Error codes?
  • jni wrapper

3.6.5. Better Freedom_Keeper

  • index returned by public methods of Freedom_Keeper should be record ids.
  • No need to maintain a linked list of individual records
  • A linked list of intervals instead, to unify everything
  • Let joedb_merge fuse intervals to remove holes (100% update_vector)
  • joedb_to_json can also become more efficient
  • Get ready for “last-N” storage, and no_delete option (force single interval).

3.6.6. Concurrency

  • content-matching option (full, fast, none)
  • better support for readonly connection (and client): separate types?
  • Pull-only connection (eg when serving a read-only file): -> joedb_client does not offer transaction and push -> reply with readonly flag during server handshake -> bool is_pullonly() const in connection (and client)
  • joedb_server:
    • fuzzer
    • use coroutines
    • support running on multiple threads (requires mutex?)
      • OK to keep one thread busy when waiting for a lock, or computing SHA 256, …
      • thread_count = max(core_count, 2 * server_count)
      • Requires synchronization. Mutex for global stuff (connection, disconnection, interrupt, …)
    • ipv6: https://raw.githubusercontent.com/boostcon/2011_presentations/master/wed/IPv6.pdf
    • get rid of signal. Make an interactive command-line interface to control the server. Maybe better: use asio’s (non-std::net) support for signal.
  • SHA-256: option for either none, fast or full.
  • reading and writing buffers: don’t use network_integers.h, but create a Buffer_File class, and use write<int64_t>
  • Connection_Multiplexer for multiple parallel backup servers
  • Notifications from server to client, in a second channel:
    • when another client makes a push
    • when the lock times out
    • when the server is interrupted
    • ping

3.6.7. Performance

  • use async_write_some and async_read_some during pull and push
  • vector of size 1: write ordinary insert and update to the journal instead
  • joedb::Database: use vector instead of map for tables and fields (with a bool indicating if deleted)
  • FILE_FLAG_SEQUENTIAL_SCAN or explicit asynchronous prefetch: https://devblogs.microsoft.com/oldnewthing/20221130-00/?p=107505

3.6.8. joedb_admin

  • serve with boost::beast.
  • work as a client to a joedb_server.
  • customizable GUI, similar to the icga database editor.

3.6.9. Other Ideas

  • One separate class for each exception, like joedb::exception::Out_Of_Date.
  • Is it possible to replace macros by templates?
  • ability to indicate minimum joedb version in joedbc (and joedbi?)
  • apply schema upgrade to readonly databases (custom functions)
  • only one file.check_write_buffer() call in write<T> and compact_write<T>: make code shorter and simpler.
  • make a package for vcpkg and conan. Maybe build2?
  • Null default initial values
  • better readable interface:
    • a separate table abstraction (that could be used for query output)
    • cursors on tables
  • compiled Readable
  • index and referential integrity: should be in the journal, and also implemented in the interpreted database?
  • Deal properly with inf and nan everywhere (logdump, joedb_admin, …)
  • Note that SQL does not support inf and nan. Use NULL instead.
  • Raw commands in interpreter?
  • import from SQL
  • rapidly undo-able history?
  • namespace for each subdir?