StoryCraft AI - Infinite Possibilities in Every Story

Select a model, enter your seed text, and generate a creative story that inspires your imagination.

About

Welcome to StoryCraft AI – Infinite Possibilities in Every Story – a cutting-edge platform where boundless creativity meets advanced deep learning. Our website is the result of countless hours of research and development, training powerful deep learning models from scratch to generate compelling, creative stories. We leverage the power of GRU, LSTM, Bidirectional-GRU, and Bidirectional-LSTM architectures to bring you an interactive experience that transforms your seed text into imaginative narratives.

At StoryCraft AI, you can:

  • Choose Your Model: Select from a range of meticulously trained models, each offering its unique style of story generation.
  • Visualize the Journey: Dive into detailed training history graphs that showcase the evolution of our models.
  • Access the Source: Download model files and view the underlying code to understand the technology behind our creative engine.
  • Experience Innovation: Enjoy a premium, fully interactive interface designed to spark your creativity and inspire new ideas.

Our commitment to excellence and transparency means we not only deliver exceptional stories but also share the full depth of our development process. Join us on this journey where technology and imagination come together to unlock infinite possibilities in storytelling.

Story Generation
Model Training History Graphs: Loss and Accuracy

Available Models

Frequently Asked Questions

We form training sequences by sliding a window over the tokenized text. Each sequence contains 51 tokens: the first 50 tokens are used as input (X) and the 51st token is the target (y). Mathematically, if T = [t₁, t₂, ..., tₙ] is the tokenized text, then for each valid index i (i ≥ 51), the sequence Sᵢ = [tᵢ₋₅₀, ..., tᵢ₋₁, tᵢ]. This fixed window size is a trade-off: a longer context may capture more dependencies but increases model complexity and computational cost.
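A minimal sketch of this windowing step in Python (the helper name and toy token stream below are illustrative, not our exact code):

```python
import numpy as np

SEQ_LEN = 51  # 50 input tokens plus 1 target token

def build_sequences(token_ids, seq_len=SEQ_LEN):
    """Slide a window of length seq_len over the tokenized text."""
    windows = np.array([token_ids[i - seq_len:i]
                        for i in range(seq_len, len(token_ids) + 1)])
    X, y = windows[:, :-1], windows[:, -1]  # first 50 tokens -> input, 51st -> target
    return X, y

# Toy token stream of 100 integer IDs
X, y = build_sequences(list(range(1, 101)))
print(X.shape, y.shape)  # (50, 50) (50,)
```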
The vocabulary size is computed as the number of unique words in the training set plus one (i.e., vocabulary_size = len(tokenizer.word_index) + 1). This addition accounts for the fact that word indices typically start at 1 (with 0 reserved for padding), ensuring that every word has a unique index. This size defines the dimensions of both the embedding layer (input_dim) and the output layer of the network.
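As an illustration, assuming the Keras Tokenizer is used for tokenization (the toy corpus below is made up):

```python
from tensorflow.keras.preprocessing.text import Tokenizer

corpus = ["once upon a time there was a model that told stories"]

tokenizer = Tokenizer()
tokenizer.fit_on_texts(corpus)

# Word indices start at 1 and 0 is reserved for padding,
# so the effective vocabulary size is len(word_index) + 1.
vocabulary_size = len(tokenizer.word_index) + 1
print(vocabulary_size)  # 11 for this toy corpus
```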
One-hot encoding transforms each target word into a binary vector with length equal to the vocabulary size. For example, if a word has index i, its one-hot encoded vector is all zeros except for a 1 in the i-th position. Mathematically, this allows us to use the categorical crossentropy loss function, defined as L = -Σ (y_true * log(y_pred)), to measure the divergence between the predicted probability distribution (from the softmax layer) and the true distribution.
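A quick sketch using Keras' to_categorical, with toy target indices and a made-up vocabulary size:

```python
import numpy as np
from tensorflow.keras.utils import to_categorical

vocabulary_size = 8           # illustrative value
y = np.array([3, 1, 5])       # integer indices of the target words

# Each target becomes a binary vector of length vocabulary_size
# with a single 1 at the word's index.
y_one_hot = to_categorical(y, num_classes=vocabulary_size)
print(y_one_hot)
# [[0. 0. 0. 1. 0. 0. 0. 0.]
#  [0. 1. 0. 0. 0. 0. 0. 0.]
#  [0. 0. 0. 0. 0. 1. 0. 0.]]
```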
Categorical crossentropy is ideal for multi-class classification problems like next-word prediction. It computes the loss as L = -Σᵢ (y_trueᵢ * log(y_predᵢ)), where the sum runs over every word i in the vocabulary. This formulation penalizes predictions that diverge from the one-hot true label, effectively training the model to assign a higher probability to the correct next word.
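A sketch of how this loss is typically wired into such a model in Keras; the layer sizes and vocabulary size here are illustrative, not our exact training configuration:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense

vocabulary_size = 5000   # illustrative
seq_length = 50

model = Sequential([
    Input(shape=(seq_length,), dtype="int32"),
    Embedding(input_dim=vocabulary_size, output_dim=100),
    LSTM(128),
    Dense(vocabulary_size, activation="softmax"),  # one probability per word
])

# Categorical crossentropy compares the softmax output with the one-hot target.
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
```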
Dropout works by randomly setting a fraction of input units to zero during each training iteration. Mathematically, if p is the dropout rate, each neuron is kept with probability (1 − p). This randomization forces the network to learn redundant representations and prevents the co-adaptation of neurons. It can be seen as training an ensemble of sub-models, which collectively reduce overfitting.
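A NumPy sketch of the mechanism; the 1/(1 − p) rescaling is the common "inverted dropout" convention and is an extra detail beyond the description above:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p=0.2, training=True):
    """Zero each unit with probability p; scale survivors by 1/(1 - p)
    so the expected activation is unchanged (inverted dropout)."""
    if not training:
        return activations
    keep_mask = rng.random(activations.shape) >= p   # kept with probability 1 - p
    return activations * keep_mask / (1.0 - p)

h = np.ones((1, 10))
print(dropout(h, p=0.3))  # roughly 70% of units survive, scaled by 1/0.7
```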
GRU (Gated Recurrent Unit) uses update and reset gates to control the flow of information. Its core equations include:

zₜ = σ(Wzxₜ + Uzhₜ₋₁) (update gate) and rₜ = σ(Wrxₜ + Urhₜ₋₁) (reset gate). Then, the candidate activation is computed as h̃ₜ = tanh(Wxₜ + U(rₜ ⊙ hₜ₋₁)), and the new hidden state is hₜ = (1 - zₜ) ⊙ hₜ₋₁ + zₜ ⊙ h̃ₜ.
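A single GRU time step written directly from these equations; bias terms are omitted to match the formulas, and the toy shapes below are purely illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, Wz, Uz, Wr, Ur, W, U):
    """One GRU step: update gate, reset gate, candidate, new hidden state."""
    z_t = sigmoid(Wz @ x_t + Uz @ h_prev)              # update gate
    r_t = sigmoid(Wr @ x_t + Ur @ h_prev)              # reset gate
    h_tilde = np.tanh(W @ x_t + U @ (r_t * h_prev))    # candidate activation
    return (1.0 - z_t) * h_prev + z_t * h_tilde        # new hidden state

# Toy dimensions: input size 4, hidden size 3
rng = np.random.default_rng(0)
x_t, h_prev = rng.standard_normal(4), np.zeros(3)
Wz, Wr, W = (rng.standard_normal((3, 4)) for _ in range(3))
Uz, Ur, U = (rng.standard_normal((3, 3)) for _ in range(3))
print(gru_step(x_t, h_prev, Wz, Uz, Wr, Ur, W, U))
```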

In contrast, LSTM (Long Short-Term Memory) introduces three gates (input, forget, output) along with a cell state to manage information flow:

iₜ = σ(Wixₜ + Uihₜ₋₁), fₜ = σ(Wfxₜ + Ufhₜ₋₁), and oₜ = σ(Woxₜ + Uohₜ₋₁) control the input, forgetting, and output of information respectively. The cell state is updated as cₜ = fₜ ⊙ cₜ₋₁ + iₜ ⊙ tanh(Wcxₜ + Uchₜ₋₁), and the new hidden state is hₜ = oₜ ⊙ tanh(cₜ). These mathematical formulations allow the LSTM to better capture long-range dependencies.
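The corresponding LSTM step, again written directly from the equations with biases omitted and illustrative toy shapes:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, Wi, Ui, Wf, Uf, Wo, Uo, Wc, Uc):
    """One LSTM step: input, forget, and output gates plus the cell state."""
    i_t = sigmoid(Wi @ x_t + Ui @ h_prev)                        # input gate
    f_t = sigmoid(Wf @ x_t + Uf @ h_prev)                        # forget gate
    o_t = sigmoid(Wo @ x_t + Uo @ h_prev)                        # output gate
    c_t = f_t * c_prev + i_t * np.tanh(Wc @ x_t + Uc @ h_prev)   # cell state update
    h_t = o_t * np.tanh(c_t)                                     # new hidden state
    return h_t, c_t

# Toy dimensions: input size 4, hidden size 3
rng = np.random.default_rng(1)
x_t = rng.standard_normal(4)
h_prev, c_prev = np.zeros(3), np.zeros(3)
Wi, Wf, Wo, Wc = (rng.standard_normal((3, 4)) for _ in range(4))
Ui, Uf, Uo, Uc = (rng.standard_normal((3, 3)) for _ in range(4))
print(lstm_step(x_t, h_prev, c_prev, Wi, Ui, Wf, Uf, Wo, Uo, Wc, Uc))
```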
Bidirectional models process input sequences in both forward and backward directions. Mathematically, this means that for each time step, the hidden state is computed twice—once from past to future and once from future to past. The outputs are then concatenated (or summed), providing a richer representation that captures context from both directions. This dual perspective can improve prediction accuracy by incorporating more comprehensive contextual information.
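In Keras this amounts to wrapping a recurrent layer in Bidirectional; the snippet below only illustrates how the merged output doubles in size (batch size, sequence length, and layer widths are toy values):

```python
import tensorflow as tf
from tensorflow.keras.layers import GRU, Bidirectional

# A batch of 2 sequences, 50 time steps, embedding dimension 100 (illustrative).
x = tf.random.normal((2, 50, 100))

forward_only = GRU(128)(x)
both_ways = Bidirectional(GRU(128), merge_mode="concat")(x)

print(forward_only.shape)  # (2, 128)
print(both_ways.shape)     # (2, 256): forward and backward states concatenated
```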
Categorical crossentropy is used because it quantifies the difference between the predicted probability distribution and the true one-hot encoded vector. It is defined as:

L = -∑ (y_true * log(y_pred))

This loss function penalizes the model when the predicted probability for the correct word is low. By minimizing this loss during training, the model learns to assign higher probabilities to the correct next word.
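A toy NumPy calculation that makes the penalty concrete; the probability vectors are made up:

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-12):
    """L = -sum(y_true * log(y_pred)) for a single one-hot target."""
    return -np.sum(y_true * np.log(y_pred + eps))

y_true = np.array([0, 0, 1, 0, 0])                      # correct next word is index 2

confident = np.array([0.02, 0.03, 0.90, 0.03, 0.02])    # high probability on the right word
uncertain = np.array([0.30, 0.30, 0.10, 0.20, 0.10])    # low probability on the right word

print(categorical_crossentropy(y_true, confident))  # ~0.105
print(categorical_crossentropy(y_true, uncertain))  # ~2.303
```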
During training, the model uses backpropagation through time (BPTT) to compute gradients of the loss function with respect to its weights. Gradient descent (or a variant like Adam) then updates each weight by stepping in the direction of the negative gradient, which reduces the loss. Mathematically, for each weight w, the update rule is:

w ← w - η * ∂L/∂w

where η is the learning rate and ∂L/∂w is the gradient of the loss L with respect to w.
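A one-variable sketch of this update rule on a toy quadratic loss (the loss function and learning rate are illustrative):

```python
# Toy gradient descent for L(w) = (w - 3)^2, whose gradient is dL/dw = 2 * (w - 3).
learning_rate = 0.1   # eta
w = 0.0

for step in range(25):
    grad = 2.0 * (w - 3.0)         # dL/dw
    w = w - learning_rate * grad   # w <- w - eta * dL/dw

print(round(w, 4))  # approaches 3.0, the minimizer of the loss
```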