Skip to content

Database Layer

NexGraph uses PostgreSQL with three extensions: Apache AGE (property graphs), pg_trgm (trigram similarity), and pgvector (vector similarity for semantic search).

Relational Tables

projects

Top-level organizational unit.

ColumnTypeDescription
idUUIDPrimary key
nameTEXTProject name
settingsJSONBProject-level settings (include/exclude globs, architecture layers)
created_atTIMESTAMPCreation time
updated_atTIMESTAMPLast update time

api_keys

Project-scoped authentication keys.

ColumnTypeDescription
idUUIDPrimary key
project_idUUIDFK to projects
key_hashTEXTSHA-256 hash of the full key (UNIQUE)
key_prefixTEXTFirst 8 characters for identification
permissionsJSONBArray of permission strings
revokedBOOLEANSoft-delete flag
expires_atTIMESTAMPOptional expiry

repositories

Source code targets for indexing.

ColumnTypeDescription
idUUIDPrimary key
project_idUUIDFK to projects
nameTEXTRepository display name
urlTEXTSource URL or path
source_typeTEXTgit_url, zip_upload, or local_path
default_branchTEXTDefault git branch
graph_nameTEXTAGE graph name (UNIQUE)
last_indexed_commitTEXTSHA for incremental indexing
community_detected_atTIMESTAMPWhen community detection last ran
community_countINTEGERNumber of detected communities
process_countINTEGERNumber of detected processes
embeddings_generated_atTIMESTAMPWhen embedding generation last ran
embedding_countINTEGERNumber of generated symbol embeddings

indexed_files

Tracked files for change detection.

ColumnTypeDescription
idUUIDPrimary key
repository_idUUIDFK to repositories
file_pathTEXTRelative path
languageTEXTDetected programming language
content_hashTEXTSHA-256 hash of file content

indexing_jobs

Job queue tracking.

ColumnTypeDescription
idUUIDPrimary key
repository_idUUIDFK to repositories
statusTEXTqueued, running, completed, failed, cancelled
modeTEXTfull or incremental
phaseTEXTCurrent pipeline phase
progressREAL0.0–1.0 progress value
last_completed_phaseINTEGERFor resume support
files_totalINTEGERTotal files to process
files_doneINTEGERFiles processed so far
error_messageTEXTError details if failed
boss_job_idTEXTpg-boss job ID for cancellation

file_contents

Stored file content for full-text search (BM25) and regex grep.

ColumnTypeDescription
idUUIDPrimary key
repository_idUUIDFK to repositories
file_pathTEXTRelative path
contentTEXTRaw file content
search_vectorTSVECTORPostgreSQL FTS index

symbol_embeddings

Vector embeddings for semantic and hybrid search (requires pgvector extension).

ColumnTypeDescription
idUUIDPrimary key
repository_idUUIDFK to repositories
node_age_idBIGINTAGE graph node identifier
symbol_nameTEXTSymbol name
file_pathTEXTFile containing the symbol
labelTEXTNode label (Function, Class, etc.)
text_contentTEXTText used to generate the embedding
embeddingvector(384)384-dimensional embedding vector

Indexed with HNSW for fast approximate nearest-neighbor search:

sql
CREATE INDEX idx_symbol_embeddings_hnsw ON symbol_embeddings
    USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64);

repo_connections

Cross-repo connection rules that define which repositories should be linked and how.

ColumnTypeDescription
idUUIDPrimary key
project_idUUIDFK to projects
source_repo_idUUIDFK to repositories
target_repo_idUUIDFK to repositories
connection_typeTEXTCROSS_REPO_CALLS, CROSS_REPO_IMPORTS, CROSS_REPO_DEPENDS, or CROSS_REPO_MIRRORS
match_rulesJSONBAdditional resolver configuration
last_resolved_atTIMESTAMPWhen resolution last ran
created_atTIMESTAMPCreation time
updated_atTIMESTAMPLast update time

Unique constraint on (source_repo_id, target_repo_id, connection_type).

cross_repo_edges

Resolved edges discovered across repository boundaries. See Cross-Repo Resolution for details.

ColumnTypeDescription
idUUIDPrimary key
project_idUUIDFK to projects
source_repo_idUUIDFK to repositories
target_repo_idUUIDFK to repositories
source_nodeTEXTSource symbol identifier
target_nodeTEXTTarget symbol identifier
edge_typeTEXTMatches the connection_type that produced it
metadataJSONBResolution details (confidence, method, etc.)
manualBOOLEANtrue for user-created edges, false for auto-resolved
created_atTIMESTAMPCreation time

Apache AGE Graphs

Each repository gets its own named graph: proj_<uuid>_repo_<uuid> (hyphens replaced with underscores).

AGE is initialized on every connection:

sql
LOAD 'age';
SET search_path = ag_catalog, "$user", public;

Migrations

SQL migration files in src/db/migrations/ are run in sorted order. Each migration executes in a transaction and is tracked in the schema_migrations table to prevent re-running.

bash
npm run db:migrate

Current migrations:

  1. Initial schema (projects, api_keys, repositories, indexed_files, indexing_jobs)
  2. API key auth fields (key_hash UNIQUE)
  3. Repository source_type and name
  4. Indexing phase tracking
  5. Indexing queue fields (boss_job_id, mode, cancelled status)
  6. Connection match_rules
  7. Incremental indexing (last_indexed_commit, changed_files_count)
  8. File contents search (tsvector + GIN index)
  9. Cross-repo edges manual flag
  10. Connections last_resolved_at
  11. Community & process detection (community_detected_at, community_count, process_count)
  12. Vector search (pgvector extension, symbol_embeddings table, HNSW index)

Released under the MIT License.