Components

Models

Component type	Component Version	Implementation	Configuration	Component Interface	Description
model	gpt2	GPT2LLM	GPT2LLMConfig	NNModel	GPT2 model for language modeling
model	huggingface_pretrained_model	HuggingFacePretrainedModel	HuggingFacePretrainedModelConfig	NNModel	HuggingFace pretrained model for language modeling
model	checkpointed	ModelFactory.get_checkpointed_model	CheckpointedModelConfig	nn.Module	Checkpointed Model instance
model	fsdp_wrapped	ModelFactory.get_fsdp_wrapped_model	FSDPWrappedModelConfig	NNModel	Model that has been sharded via FSDP
model	model_initialized	ModelFactory.get_weight_initialized_model	WeightInitializedModelConfig	nn.Module	Model with initialized weights
model	coca	CoCa	CoCaConfig	NNModel	CoCa Model (Contrastive Captioners)

Weight Initialization

Component type	Component Version	Implementation	Configuration	Component Interface	Description
model_initialization	composed	ComposedInitializationRoutines.get_composed_model_initializer	ComposedModelInitializationConfig	ModelInitializationIF	Component for initializing model weights in place

Losses

Component type	Component Version	Implementation	Configuration	Component Interface	Description
loss	clm_cross_entropy_loss	CLMCrossEntropyLoss	CLMCrossEntropyLossConfig	Loss	Cross-entropy loss function

Optimizers

Component type	Component Version	Implementation	Configuration	Component Interface	Description
optimizer	adam	OptimizerFactory.get_adam	AdamOptimizerConfig	Optimizer	Adam optimizer
optimizer	adam_w	OptimizerFactory.get_adam_w	AdamWOptimizerConfig	Optimizer	AdamW optimizer
optimizer	checkpointed	OptimizerFactory.get_checkpointed_optimizer	CheckpointedOptimizerConfig	Optimizer	Optimizer instantiated from checkpoint

LR Scheduling

Component type	Component Version	Implementation	Configuration	Component Interface	Description
scheduler	dummy_lr	DummyLRScheduler	DummyLRSchedulerConfig	LRScheduler	Dummy LR scheduler that leaves the learning rate unchanged
scheduler	step_lr	StepLR	StepLRSchedulerConfig	LRScheduler	Decays the learning rate of each parameter group by gamma every step_size steps
scheduler	constant_lr	ConstantLR	ConstantLRSchedulerConfig	LRScheduler	Multiplies the learning rate of each parameter group by a small constant factor until the number of steps reaches a pre-defined milestone
scheduler	onecycle_lr	OneCycleLR	OneCycleLRSchedulerConfig	LRScheduler	Sets the learning rate of each parameter group according to the 1cycle learning rate policy.
scheduler	cosine_annealing_lr	CosineAnnealingLR	CosineAnnealingLRSchedulerConfig	LRScheduler	Sets the learning rate of each parameter group using a cosine annealing schedule
scheduler	linear_warmup_cosine_annealing_lr	LinearWarmupCosineAnnealingLRScheduler	LinearWarmupCosineAnnealingLRSchedulerConfig	LRScheduler	Linearly warms up to the base learning rate, then decays with cosine annealing for the remaining training steps
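
The schedule described in the last row (linear warmup followed by cosine annealing) can be sketched as a plain function of the step index. This is a minimal illustration under stated assumptions, not the library's implementation: the function name `warmup_cosine_lr` and the parameters `base_lr`, `warmup_steps`, `total_steps` and `min_lr` are hypothetical.

```python
import math


def warmup_cosine_lr(
    step: int,
    base_lr: float,
    warmup_steps: int,
    total_steps: int,
    min_lr: float = 0.0,
) -> float:
    """Hypothetical helper: linear warmup to base_lr, then cosine decay to min_lr."""
    if warmup_steps > 0 and step < warmup_steps:
        # Linear ramp that reaches base_lr at step == warmup_steps.
        return base_lr * step / warmup_steps
    # Fraction of the post-warmup phase that has elapsed, clamped to [0, 1].
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    # Half a cosine period: 1.0 at progress 0, 0.0 at progress 1.
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (base_lr - min_lr) * cosine
```

With `warmup_steps=10` and `total_steps=110`, the rate rises linearly over the first 10 steps, peaks at `base_lr`, and falls back to `min_lr` by step 110.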

Tokenization

Component type	Component Version	Implementation	Configuration	Component Interface	Description
tokenizer	pretrained_hf_tokenizer	PreTrainedHFTokenizer	PreTrainedHFTokenizerConfig	TokenizerWrapper	Pretrained Huggingface tokenizer
tokenizer	pretrained_sp_tokenizer	PreTrainedSPTokenizer	PreTrainedSPTokenizerConfig	TokenizerWrapper	Pretrained SentencePiece tokenizer

Datasets

Component type	Component Version	Implementation	Configuration	Component Interface	Description
dataset	mem_map_dataset	DatasetFactory.get_mem_map_dataset	MemMapDatasetConfig	Dataset	MemMap Dataset
dataset	packed_mem_map_dataset_continuous	DatasetFactory.get_packed_mem_map_dataset_continuous	PackedMemMapDatasetContinuousConfig	Dataset	Packed Memory Mapped Dataset Continuous
dataset	dummy_dataset	DatasetFactory.get_dummy_dataset	DummyDatasetConfig	Dataset	Dummy dataset creating random samples of specified shape
dataset	combined	DatasetFactory.get_combined_dataset	CombinedDatasetConfig	Dataset	Dataset implementation combining multiple datasets into one.

Data sampling

Component type	Component Version	Implementation	Configuration	Component Interface	Description
sampler	distributed_sampler	DistributedSampler	DistributedSamplerConfig	Sampler	Sampler that restricts data loading to a subset of the dataset for distributed training
batch_sampler	default	BatchSampler	BatchSamplerConfig	Sampler	Wraps another sampler to yield a mini-batch of indices.
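
How the two samplers compose can be illustrated without any framework code. The sketch below only mirrors the descriptions in the table: each rank takes a strided subset of the dataset indices, and a batch sampler groups those indices into mini-batches. The helper names `rank_indices` and `batches` are hypothetical, and shuffling and padding (which real distributed samplers typically offer) are left out on purpose.

```python
from typing import Iterator, List


def rank_indices(dataset_size: int, num_ranks: int, rank: int) -> List[int]:
    """Hypothetical helper: strided subset of indices for one rank (no shuffle, no padding)."""
    return list(range(rank, dataset_size, num_ranks))


def batches(indices: List[int], batch_size: int, drop_last: bool = True) -> Iterator[List[int]]:
    """Hypothetical helper: group a rank's indices into mini-batches of indices."""
    for start in range(0, len(indices), batch_size):
        batch = indices[start:start + batch_size]
        # An incomplete final batch is only yielded when drop_last is False.
        if len(batch) == batch_size or not drop_last:
            yield batch


# Rank 1 of 2 on a dataset with 10 samples, batch size 2.
local = rank_indices(dataset_size=10, num_ranks=2, rank=1)
local_batches = list(batches(local, batch_size=2))
```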

Data collation

Component type	Component Version	Implementation	Configuration	Component Interface	Description
collate_fn	gpt_2_llm_collator	GPT2LLMCollateFn	GPT2LLMCollateFnConfig	CollateFnIF	Data collator for the GPT2 model
collate_fn	coca_collator	CoCaCollatorFn	CoCaCollateFnConfig	CollateFnIF	Data collator for the CoCa model

Data loaders

Component type	Component Version	Implementation	Configuration	Component Interface	Description
data_loader	default	DataloaderFactory.get_dataloader	LLMDataLoaderConfig	DataLoader	LLM Data loader extending pytorch data loader functionality

Checkpointing

Component type	Component Version	Implementation	Configuration	Component Interface	Description
checkpoint_saving	default	CheckpointSaving	CheckpointSavingConfig	--	Component for saving checkpoints based on a saving and execution strategy.
checkpoint_saving_strategy	save_every_k_steps_checkpointing_strategy	SaveEveryKStepsCheckpointingStrategy	SaveEveryKStepsCheckpointingStrategyConfig	CheckpointSavingStrategyIF	Checkpointing strategy saving a checkpoint every k steps
checkpoint_saving_strategy	save_k_most_recent_checkpoints_strategy	SaveKMostRecentCheckpointsStrategy	SaveKMostRecentCheckpointsStrategyConfig	CheckpointSavingStrategyIF	Checkpointing strategy saving only the last k checkpoints and deleting the previous ones
checkpoint_saving_execution	fsdp	FSDPCheckpointSaving	FSDPCheckpointSavingConfig	CheckpointSavingExecutionABC	FSDPCheckpointSaving class for saving checkpoints of FSDP models and optimizers.
checkpoint_loading	fsdp	FSDPCheckpointLoading	FSDPCheckpointLoadingConfig	CheckpointLoadingIF	Component for loading FSDP checkpoints
checkpoint_loading	torch	TorchCheckpointLoading	TorchCheckpointLoadingConfig	CheckpointLoadingIF	Component for loading PyTorch checkpoints
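
The two saving strategies listed above boil down to simple decisions about which steps to save and which earlier checkpoints to delete. The following sketch shows that logic in isolation. It assumes checkpoints are identified by their training step; the function names `should_save_every_k` and `keep_k_most_recent` are hypothetical and do not reflect the actual strategy classes.

```python
from typing import List, Tuple


def should_save_every_k(step: int, k: int) -> bool:
    """Hypothetical helper: save whenever the step is a positive multiple of k."""
    return k > 0 and step > 0 and step % k == 0


def keep_k_most_recent(saved_steps: List[int], new_step: int, k: int) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: after saving new_step, return (steps to keep, steps to delete)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    all_steps = saved_steps + [new_step]
    # The k newest checkpoints survive; everything older is scheduled for deletion.
    return all_steps[-k:], all_steps[:-k]
```

Separating the decision (strategy) from the actual writing and deleting of files (execution) is what allows the same strategy to be reused with different checkpoint backends.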

Logging

Component type	Component Version	Implementation	Configuration	Component Interface	Description
progress_subscriber	dummy	ProgressSubscriberFactory.get_dummy_progress_subscriber	DummyProgressSubscriberConfig	MessageSubscriberIF	Dummy Progress subscriber not consuming any messages
progress_subscriber	rich	ProgressSubscriberFactory.get_rich_progress_subscriber	RichProgressSubscriberConfig	MessageSubscriberIF	Subscriber that writes rich-formatted console output on training and evaluation progress
results_subscriber	wandb	ProgressSubscriberFactory.get_wandb_result_subscriber	WandBEvaluationResultSubscriberConfig	MessageSubscriberIF	Subscriber for logging evaluation results to Weights and Biases

Layer Norms

Component type	Component Version	Implementation	Configuration	Component Interface	Description
layer_norm	rms_norm	RMSLayerNorm	RMSLayerNormConfig	nn.Module	RMS Layer norm
layer_norm	layer_norm	nn.LayerNorm	LayerNormConfig	nn.Module	Layer norm

Gradient Clipping

Component type	Component Version	Implementation	Configuration	Component Interface	Description
gradient_clipper	fsdp	FSDPGradientClipper	FSDPGradientClipperConfig	GradientClipperIF	FSDP Gradient Clipper
gradient_clipper	fsdp_logging_only	FSDPLoggingOnlyGradientClipper	FSDPGradientClipperConfig	GradientClipperIF	Clipper that is responsible for logging the gradient norms without actually clipping the gradients
gradient_clipper	dummy	DummyGradientClipper	DummyGradientClipperConfig	GradientClipperIF	Dummy clipper that does not apply any gradient clipping.
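
The difference between clipping and logging-only clipping is easiest to see on a toy example. The sketch below applies global-norm clipping to a flat list of gradient values: it computes the L2 norm over all gradients and rescales them only if that norm exceeds `max_norm`. It is an illustration under stated assumptions, not the FSDP implementation; the function name `clip_by_global_norm` and the small `eps` constant are hypothetical choices. A logging-only clipper would compute and report `total_norm` in the same way but return the gradients unchanged.

```python
import math
from typing import List, Tuple


def clip_by_global_norm(grads: List[float], max_norm: float, eps: float = 1e-6) -> Tuple[List[float], float]:
    """Hypothetical helper: rescale grads so their global L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    # Never scale up: the factor is capped at 1.0 when the norm is already small enough.
    scale = min(1.0, max_norm / (total_norm + eps))
    return [g * scale for g in grads], total_norm
```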

Number conversions

Component type	Component Version	Implementation	Configuration	Component Interface	Description
number_conversion	local_num_batches_from_num_samples	NumberConversion.get_local_num_batches_from_num_samples	LocalNumBatchesFromNumSamplesConfig	--	Calculates the number of local batches for each rank, given the global number of samples and number of ranks.
number_conversion	local_num_batches_from_num_tokens	NumberConversion.get_local_num_batches_from_num_tokens	LocalNumBatchesFromNumTokensConfig	--	Calculates the number of local batches for each rank, given the global number of tokens and number of ranks.
number_conversion	num_samples_from_num_tokens	NumberConversion.get_num_samples_from_num_tokens	NumSamplesFromNumTokensConfig	--	Calculates the number of global samples, given the global number of tokens and sequence length
number_conversion	num_steps_from_num_samples	NumberConversion.get_num_steps_from_num_samples	NumStepsFromNumSamplesConfig	--	Calculates the number of steps given the global number of samples, local micro batch size and number of ranks.
number_conversion	num_steps_from_num_tokens	NumberConversion.get_num_steps_from_num_tokens	NumStepsFromNumTokensConfig	--	Calculates the number of steps given the global number of tokens, local micro batch size and number of ranks.
number_conversion	num_tokens_from_num_steps	NumberConversion.get_num_tokens_from_num_steps	NumTokensFromNumStepsConfig	--	Calculates the number of tokens from the number of steps, number of ranks, local micro batch size, global number of tokens, sequence length and gradient accumulation steps
number_conversion	last_step_from_checkpoint_path	NumberConversion.get_num_seen_steps_from_checkpoint_path	NumberConversionFromCheckpointPathConfig	--	Get the last step id from a model or checkpoint file path.
number_conversion	global_num_target_tokens_from_checkpoint_path	NumberConversion.get_global_num_target_tokens_from_checkpoint_path	NumberConversionFromCheckpointPathConfig	--	Get the number of target tokens from a model or checkpoint file path.
number_conversion	num_tokens_from_packed_mem_map_dataset_continuous	NumberConversion.get_num_tokens_from_packed_mem_map_dataset_continuous	NumTokensFromPackedMemMapDatasetContinuousConfig	--	Get the number of tokens stored in a packed mem map continuous dataset from the respective dataset file path.
number_conversion	num_steps_from_raw_dataset_index	NumberConversion.get_num_steps_from_raw_dataset_index	NumStepsFromRawDatasetIndexConfig	--	Get the number of steps partially from the raw index of a raw JSONL dataset. Requires the file path to the index, number of ranks, local micro batch size and gradient accumulation steps.
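
The arithmetic behind several of these conversions can be sketched in a few lines. The functions below are illustrative only: their names and signatures are hypothetical, and they assume floor division (incomplete samples and steps are dropped) and that one optimizer step consumes `num_ranks * local_micro_batch_size * gradient_accumulation_steps` samples. The actual `NumberConversion` methods may take different parameters or round differently.

```python
def num_samples_from_num_tokens(global_num_tokens: int, sequence_length: int) -> int:
    """Hypothetical helper: number of full sequences that fit into the token budget."""
    return global_num_tokens // sequence_length


def num_steps_from_num_samples(
    global_num_samples: int,
    num_ranks: int,
    local_micro_batch_size: int,
    gradient_accumulation_steps: int = 1,
) -> int:
    """Hypothetical helper: number of full optimizer steps for a given sample count."""
    samples_per_step = num_ranks * local_micro_batch_size * gradient_accumulation_steps
    return global_num_samples // samples_per_step


def num_tokens_from_num_steps(
    num_steps: int,
    num_ranks: int,
    local_micro_batch_size: int,
    sequence_length: int,
    gradient_accumulation_steps: int = 1,
) -> int:
    """Hypothetical helper: tokens consumed after num_steps optimizer steps."""
    samples_per_step = num_ranks * local_micro_batch_size * gradient_accumulation_steps
    return num_steps * samples_per_step * sequence_length
```

Because of the floor division, converting tokens to steps and back generally yields fewer tokens than the original budget: the remainder that does not fill a complete step is never seen during training.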