Research

CoderForge-Preview: SOTA open dataset for training efficient coding agents

February 25, 2026

By 

By Alpay Ariyak*, Junda Zhang, Junxiong Wang, Shang Zhu, Federico Bianchi, Sanjana Srivastava, Ashwinee Panda, Siddhant Bharti, Chenfeng Xu, John Heo, Xiaoxia Shirley Wu, James Zou, Percy Liang, Leon Song, Ce Zhang, Ben Athiwaratkun, Zhongzhu Zhou*, Qingyang Wu* *Project Core Leads

Summary

We release CoderForge-Preview - the largest open test-verified coding agent dataset. By leveraging it to fine-tune Qwen-3 32B, we boost SWE-Bench Verified performance 23.0% → 59.4% pass@1 and rank #1 among open-data models ≤32B parameters.

As coding agents become increasingly capable, the research community faces a critical bottleneck: the lack of large-scale, high-quality open training data. While proprietary models continue to advance, open-weight alternatives have been held back by limited access to the long-context, test-verified trajectories needed for effective agent training.

We're releasing CoderForge-Preview, the largest open dataset of coding agent trajectories to date - 258k test-verified trajectories (155k pass | 103k fail) spanning 51K tasks across 1,655 repositories, and share our results of using it to train 32B and 4B models on it. By releasing CoderForge openly, we aim to accelerate progress across the entire open-source AI community and enable researchers everywhere to build, study, and improve upon our work. Fine-tuning Qwen-3 32B achieves 59.4% pass@1 on SWE-Bench Verified [8], ranking #1 among open-data models in the ≤32B parameter range.

We release the full trajectory dataset, as well as the evaluation trajectories for 32B.

CoderForge-Preview Data

We generate agent trajectories from three different task sources using Qwen3-Coder-480B and apply rejection sampling to filter out solutions that fail to pass the tests. This process yields 258K long-context trajectories (up to 128K tokens) across 51K tasks, from which we retain 155K high-quality, test-verified trajectories for SFT training.

Task sources

We draw tasks from three sources: R2E-Gym [5], SWE-Smith [6], and SWE-Rebench [7]:

TogetherCoder-Preview Data Sources
Table 1: Task counts reflect instances valid in our execution environment.
Source Tasks Unique Repos
R2E-Gym 4,216 9
SWE-Smith 37,221 124
SWE-Rebench 9,764 1,577
Total 51,201 1,655

Setup

For the agent scaffold, we integrate OpenHands v0.52.1 scaffold [4] into the R2E-Gym [5] data generation framework. It includes four main tools: bash execution (execute_bash), file editing (str_replace_editor), log thinking (think), and task completion (finish). OpenHands is pre-installed in each Docker evaluation environment, enabling LLM agents to interact with isolated code repositories through a standardized action/observation interface. Each task is executed within an isolated Docker container, where the agent iteratively issues bash commands and file edits for up to 100 steps to generate a final patch.

We use Qwen3-Coder-480B as the main model for data generation. We use a temperature of 0.7, a top_p of 0.8, and 32,768 max new tokens. To increase the number of successful trajectories, we generate multiple per problem - 8 for R2E-Gym and SWE-Rebench, and 4 for SWE-Smith. We filter to keep only trajectories whose final patches pass all repository tests. To avoid evaluation leakage, we exclude any tasks that share the same (repository, base_commit) pair or problem statement with SWE-Bench Verified samples.

Comparison to other datasets

Dataset Comparison - CoderForge-Preview
Table 2: Comparison of CoderForge-Preview with other open coding agent datasets.
Datasets Teacher Context Length Size (Total) Size (Filtered)
R2E-Gym/R2EGym-SFT-Trajectories Claude Sonnet 3.5 20000 3,231 3,231
SWE-bench/SWE-smith-trajectories Claude Sonnet 3.7 32768 49,897 21,513
allenai/SERA [10] GLM-4.6 32768 25,224 25,224
nex-agi/agent-sft (agentic_code) [11] DeepSeek-V3.1-Nex-N1 128000 24,796 24,796
nebius/SWE-rebench-openhands-trajectories [12] Qwen3-Coder-480B 128000 67,074 32,161
CoderForge-Preview Data Qwen3-Coder-480B 128000 258,134 155,144

CoderForge-Preview Data stands out as the largest and best-performing coding-agent trajectory dataset among comparable releases. With 258,134 total and 155,144 successful trajectories at a 128K context length, it substantially exceeds prior datasets both in scale and long-context coverage.

Trajectory success by task source

For each trajectory, we run the relevant tests provided with its task to check whether the model has solved it. For R2E-Gym tasks the solve-rate was consistently the highest, rising from 62.9% at Pass@1 to 80.3% at Pass@8. SWE-Rebench also benefits substantially from multi-attempt sampling, improving from 57.5% to 73.9% by Pass@8. SWE-Smith shows more modest gains, increasing from 58.8% at Pass@1 to 64.9% at Pass@4. Overall, the trend highlights the effectiveness of multi-sample generation in increasing the yield of successful trajectories, with diminishing but consistent returns as the number of attempts grows.

Final Task Source Distribution

We filter our generated trajectories based on whether they solved the task successfully, resulting in the task distribution shown below. For our SFT experiments, we only trained on the successful trajectories.

Trajectories by Task Source
Task Source Trajectories Generated Trajectories Generated (reward = 1.0)
R2E-Gym 32,964 20,904
SWE-Smith 148,001 89,501
SWE-Rebench 77,169 44,739
Total 258,134 155,144

Trajectory Characteristics

Data Generation Cost

Data Generation Cost - TogetherCoder-Preview
Table 3: Data generation cost and efficiency metrics across task sources.
Source # Completions API Prompt Tokens Output Tokens Avg Output Tokens Cache Hit Rate
R2E-Gym 2.18M 59B 404M 185.4 96.64%
SWE-Smith 8.47M 238B 1,544M 182.4 90.15%
SWE-Rebench 4.99M 155B 965M 193.3 88.64%
Total 15.64M 452B 2.91B 186.3 90.48%

Data generation at scale was enabled through efficient long-context inference and aggressive prompt caching. Across R2E-Gym, SWE-Smith, and SWE-Rebench, we issued 15.64M API completions, processing 452B prompt tokens and generating 2.91B output tokens, with an overall cache hit rate of ~90%. Using a pricing model of \$0.50 per million prompt tokens, \$0.25 per million cached tokens, and \$2.00 per million output tokens, the total cost of generating this large-scale, long-context trajectory dataset was \$130k.

Trajectory Analysis

Trajectory Length Analysis - TogetherCoder-Preview
Table 4: Trajectory length distribution and total training tokens by data source.
Source Median Length Avg Length P99 Length Total Training Tokens
R2E-Gym 39,599 42,149 83,549 0.97B
SWE-Smith 36,008 39,313 88,101 3.65B
SWE-Rebench 41,996 45,236 99,391 2.08B
Total 38,052 41,398 91,150 6.70B

The table shows the length distribution of generated agent trajectories across data sources. Median trajectory lengths range from 36K–42K tokens, with average lengths around 41K tokens and P99 lengths approaching 100K tokens, highlighting the long-context nature of the data. The combined dataset comprises 6.70B total training tokens.

Success vs Failed Steps - TogetherCoder-Preview
Table 5: Average agent steps for successful vs. failed trajectories by data source.
Source Avg Steps (Success) Avg Steps (Failed)
R2E-Gym 55.7 65.6
SWE-Smith 48.3 61.0
SWE-Rebench 56.2 72.0

We compare the average number of agent steps for successful versus failed trajectories across data sources. In all cases, failed trajectories require substantially more steps, 18–28% more on average than successful ones. By training exclusively on successful trajectories, we aim to push the model toward efficient task resolution and concise decision-making, rather than learning from extended unproductive sequences.

License filtering

To ensure the dataset can be used responsibly, we conducted a comprehensive license audit of every repository at the exact commit referenced by each task. We retrieved the LICENSE file from each repository at the specific commit SHA and identified it using scancode-toolkit, the industry-standard license detection engine used by the Linux Foundation and the SPDX project.

We retain only trajectories from repositories under permissive open-source licenses:

License SPDX Identifiers
License SPDX Identifier
MIT License MIT
BSD 3-Clause BSD-3-Clause
Apache License 2.0 Apache-2.0
BSD License BSD
BSD 2-Clause BSD-2-Clause
Historical Permission Notice and Disclaimer HPND
ISC License ISC
PostgreSQL License PostgreSQL
Python Software Foundation License PSF-2.0
Creative Commons Zero 1.0 CC0-1.0
MIT No Attribution MIT-0
MIT-CMU License MIT-CMU
BSD 4-Clause BSD-4-Clause
Dual: MIT and Apache-2.0 MIT AND Apache-2.0
Dual: Apache-2.0 and BSD-3-Clause Apache-2.0 AND BSD-3-Clause
Dual: BSD-3-Clause and MIT BSD-3-Clause AND MIT

CoderForge-Preview training experiments

Training Setup

We choose the dense model Qwen3-32B [1] as the base model for fine-tuning.

To support efficient training with 128K-length sequences, we adopt sequence parallelism via Ulysses [2], partitioning the sequence dimension and using optimized all-to-all communication to compute attention. Furthermore, as the trajectory lengths are varied across the dataset, we use multi-packing to pack multiple shorter trajectories into the same training sequence while preventing cross-example attention contamination via boundary-aware masking [3].

We use token-level loss formulation that aggregates gradients across all tokens in the entire batch across FSDP & Sequence Parallelism GPUs. This means the contribution of each token to the overall loss is normalized with respect to the total token count in the batch, so that long and short sequences are weighted consistently.

Given $L_i$: mean cross-entropy loss for micro-batch $i$ on rank $r$, $n_i^{(r)}$: number of valid tokens in micro-batch $i$ on rank $r$, $N=\sum_{r}\sum_{i} n_i^{(r)}$: total valid tokens across all ranks and micro-batches, The normalized loss is computed as:

$$\mathcal{L} \;=\; \frac{\sum_{r}\sum_{i} L_i\, n_i^{(r)}}{\sum_{r}\sum_{i} n_i^{(r)}}$$

We train the Qwen3-32B model on 8 nodes (64 H100 GPUs) with FSDP2 (shard size = 8), Ulysses sequence parallelism (size = 8) on BF16, gradient checkpointing, and FlashAttention-2.  To maximize training throughput, we use sequential packing to fully utilize the 128K context window, ensuring minimal padding waste across sequences of varying lengths. The detailed length analysis is shown as below:

Training Sequence Length Analysis - TogetherCoder-Preview
Table 6: Training sequence length distribution with multipack sampling (128K context window).
Metric Value Interpretation
Mean Length 104,891 tokens ~82% of 128K utilized
Median Length 105,274 tokens Symmetric distribution
Std Dev 5,906 tokens Low variance (~5.6% CV)
Min / Max 77,501 / 123,975 Good range coverage

Chat template

We adopt Qwen Coder’s chat template, which has XML-formatted tool calling that is better suited for LLMs. We also release the tokenized trajectories with loss masks in the trajectories-tokenized_qwencoder subset of the dataset on Huggingface.

Evaluation

Main Results

We evaluate our models on SWE-bench Verified, a curated subset of 500 real GitHub issues that tests end-to-end software engineering capabilities. Our best model achieves 59.4% pass@1 and 78.56% pass@16 at epoch 3.13, demonstrating strong single-attempt accuracy and excellent coverage with multiple samples. Our training data, CoderForge-Preview Data, combines trajectories from multiple sources to maximize diversity and coverage.

CoderForge Model Benchmarks
Model SWE-Bench-Verified (pass@1) SWE-Bench-Verified (pass@16)
CoderForge-Preview-32B 59.4% 78.56%
CoderForge-Preview-4B 43.0% -

Repetition Penalty

In early training, we noticed a lot of instances of the str_replace_editor tool failing due to the model repeating the string it wants to replace. We investigate whether a mild repetition penalty helps suppress degenerate loops without hurting precise code edits and put the results below:

The left panel shows evaluation scores across training epochs, while the right panel visualizes the performance difference at each checkpoint.

Conclusion: For production deployment, we recommend using no repetition penalty with later-epoch checkpoints. However, for early-stopped models, a mild penalty can provide a useful regularization effect.

Scaled down Experiments

To test whether the dataset provides learning signal at smaller scale, we fine-tune Qwen3‑4B. The SWE-bench Verified results demonstrate clear learning progress across 5 epochs: the 4B model score improves to 43.0% (Epoch 5).

The repetition-penalty ablation shows a similar trend to the 32B setting: it helps early (epochs 1–3) but is not beneficial at the best-performing later checkpoint.

The peak performance gap between 4B (43.0%) and 32B (59.4%) confirms that model capacity matters for complex agentic tasks, yet the consistent improvement trajectory validates that our training data provides genuine signal at both scales.

Eval Traces Release

We release our evaluation traces of CoderForge-Preview-32B.

Our setup is as follows:

  • Scaffold: OpenHands v0.52.1
  • Sampling Parameters:
    • temperature: 0.7
    • top_p: 0.8
    • max_tokens: 32768
    • max_iterations: 100

Performance varies by repository. Excluding repositories with very small sample sizes (e.g., n<5), scikit‑learn has the highest resolved rate (84.4%), followed by matplotlib (64.7%) and xarray (63.6%). For Django (n=231; 46% of the benchmark), the model resolves 61.9% of instances. Some low rates (e.g., seaborn) are based on very small n and should be interpreted cautiously

Evaluation prompt template

For both data generation and evaluation, we adopt the OpenHands SWE-Bench template:


<uploaded_files>
/workspace/{{ workspace_dir_name }}
</uploaded_files>

I've uploaded a python code repository in the directory {{ workspace_dir_name }}. Consider the following issue description:

<issue_description>
{{ instance.problem_statement }}
</issue_description>

Can you help me implement the necessary changes to the repository so that the requirements specified in the <issue_description> are met?
I've already taken care of all changes to any of the test files described in the <issue_description>. This means you DON'T have to modify the testing logic or any of the tests in any way!
Also the development Python environment is already set up for you (i.e., all dependencies already installed), so you don't need to install other packages.
Your task is to make the minimal changes to non-test files in the /workspace/{{ workspace_dir_name }} directory to ensure the <issue_description> is satisfied.

Follow these phases to resolve the issue:

Phase 1. READING: read the problem and reword it in clearer terms
   1.1 If there are code or config snippets. Express in words any best practices or conventions in them.
   1.2 Hightlight message errors, method names, variables, file names, stack traces, and technical details.
   1.3 Explain the problem in clear terms.
   1.4 Enumerate the steps to reproduce the problem.
   1.5 Hightlight any best practices to take into account when testing and fixing the issue

Phase 2. RUNNING: install and run the tests on the repository
   2.1 Follow the readme
   2.2 Install the environment and anything needed
   2.2 Iterate and figure out how to run the tests

Phase 3. EXPLORATION: find the files that are related to the problem and possible solutions
   3.1 Use `grep` to search for relevant methods, classes, keywords and error messages.
   3.2 Identify all files related to the problem statement.
   3.3 Propose the methods and files to fix the issue and explain why.
   3.4 From the possible file locations, select the most likely location to fix the issue.

Phase 4. TEST CREATION: before implementing any fix, create a script to reproduce and verify the issue.
   4.1 Look at existing test files in the repository to understand the test format/structure.
   4.2 Create a minimal reproduction script that reproduces the located issue.
   4.3 Run the reproduction script to confirm you are reproducing the issue.
   4.4 Adjust the reproduction script as necessary.

Phase 5. FIX ANALYSIS: state clearly the problem and how to fix it
   5.1 State clearly what the problem is.
   5.2 State clearly where the problem is located.
   5.3 State clearly how the test reproduces the issue.
   5.4 State clearly the best practices to take into account in the fix.
   5.5 State clearly how to fix the problem.

Phase 6. FIX IMPLEMENTATION: Edit the source code to implement your chosen solution.
   6.1 Make minimal, focused changes to fix the issue.

Phase 7. VERIFICATION: Test your implementation thoroughly.
   7.1 Run your reproduction script to verify the fix works.
   7.2 Add edge cases to your test script to ensure comprehensive coverage.
   7.3 Run existing tests related to the modified code to ensure you haven't broken anything.

8. FINAL REVIEW: Carefully re-read the problem description and compare your changes with the base commit {{ instance.base_commit }}.
   8.1 Ensure you've fully addressed all requirements.
   8.2 Run any tests in the repository related to:
     8.2.1 The issue you are fixing
     8.2.2 The files you modified
     8.2.3 The functions you changed
   8.3 If any tests fail, revise your implementation until all tests pass

Be thorough in your exploration, testing, and reasoning. It's fine if your thinking process is lengthy - quality and completeness are more important than brevity.

Limitations

Data:

  • Adaptability to different scaffolds: we generate all data with a single scaffold and set of tools without permutations, so models trained with SFT on this data might perform worse when used with different scaffolds, tools, and prompt templates.
  • Task Scope: given our data sources mostly focus on fixing bugs, models trained with SFT on this data may be less capable at tasks outside of that scope, such as feature implementation.
  • User Interaction: most real coding agent use involves user intervention and collaboration with the agent in the form of user messages sent throughout the trajectory, not just at the beginning. Currently, this type of interaction is missing in open coding agent datasets, including ours. Thus, models trained with SFT on this data alone might not perform well in an interactive setting.

Training:

  • Limited Model Scale Exploration: due to resource constraints, we only trained two sizes. Exploring larger models is likely to lead to further improvement.
  • Minimal Hyperparameter Tuning: due to resource constraints, we used a fixed training configuration (learning rate 1e-5, cosine schedule, 128K context) without extensive hyperparameter search. Systematic tuning of learning rate, batch size, warmup steps, and loss weighting could potentially improve convergence and final performance.

Evaluation:

  • For this release, we evaluated mainly using the standard SWE-Bench-Verified, which has sparked a lot of discussion recently on the quality of the signal. For our next iteration, we will be evaluating on more coding and terminal agent benchmarks.

Conclusion

In this work, we focus on large-scale agentic data generation, assembling 51K distinct tasks from the Open Source and generating long-horizon, multi-step supervised fine-tuning (SFT) trajectories. Our results demonstrate that a simple data generation pipeline combined with pure SFT training can yield substantial improvements in agentic coding performance.

Moving forward, we plan to further scale data generation, use different scaffolds, tools, permutations, and train larger models to better understand the upper bounds of scaling. In addition, we intend to follow the DeepSWE [9] training paradigm by applying agenetic reinforcement learning on top of our fine-tuned model to drive further performance gains.

BibTeX citation



@misc{CoderForge2026,
  title        = {CoderForge-Preview: SOTA Open Dataset for Training Efficient Agents},
  author       = {Ariyak, Alpay and Zhang, Junda and Wang, Junxiong and Zhu, Shang and Bianchi, Federico and Srivastava, Sanjana and Panda, Ashwinee and Bharti, Siddhant and Xu, Chenfeng and Heo, John and Wu, Xiaoxia Shirley and Zou, James and Liang, Percy and Song, Leon and Zhang, Ce and Athiwaratkun, Ben and Zhou, Zhongzhu and Wu, Qingyang},
  year         = {2026},
  month        = feb,
  publisher    = {TogetherAI Blog},
  url          = {PASTE_BLOG_URL_HERE},
  note         = {Project core leads: Alpay Ariyak; Zhongzhu Zhou; Qingyang Wu}
}

References

[1] Qwen Team. "Qwen3-32B." Hugging Face Model Card (2025). https://huggingface.co/Qwen/Qwen3-32B

[2] Jacobs, Sam Ade, et al. "DeepSpeed Ulysses: System optimizations for enabling training of extreme long sequence transformer models." arXiv preprint arXiv:2309.14509 (2023).

[3] imoneoi. "Multipack Sampler: Padding-free distributed training of LLMs." GitHub repository (2024). https://github.com/imoneoi/multipack_sampler

[4] Wang, Xingyao, et al. "OpenHands: An open platform for AI software developers as generalist agents." arXiv preprint arXiv:2407.16741 (2024).

[5] Jain, Naman, et al. "R2E-Gym: Procedural environments and hybrid verifiers for scaling open-weights SWE agents." arXiv preprint arXiv:2504.07164 (2025).

[6] Yang, John, et al. "SWE-smith: Scaling data for software engineering agents." arXiv preprint arXiv:2504.21798 (2025).

[7] Badertdinov, Ibragim, et al. "SWE-rebench: An automated pipeline for task collection and decontaminated evaluation of software engineering agents." arXiv preprint arXiv:2505.20411 (2025).

[8] Jimenez, Carlos E., et al. "SWE-bench: Can language models resolve real-world GitHub issues?" ICLR 2024, arXiv preprint arXiv:2310.06770 (2024).

[9] Luo, Michael, et al. "DeepSWE: Training a state-of-the-art coding agent from scratch by scaling RL." TogetherAI/Agentica Blog (2025).

[10] Shen, Ethan, et al. "SERA: Soft-Verified Efficient Repository Agents." arXiv preprint arXiv:2601.20789 (2026). https://arxiv.org/abs/2601.20789

[11] Nex-AGI Team. "Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction." arXiv preprint arXiv:2512.04987 (2025). https://github.com/nex-agi/Nex-N1

[12] Trofimova, Maria, et al. "OpenHands Trajectories with Qwen3-Coder-480B-A35B-Instruct." Nebius Blog (2025). https://nebius.com/blog/posts/openhands-trajectories-with-qwen3-coder-480b

LOREM IPSUM

Tag

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt.

$0.030/image

Try it out

LOREM IPSUM

Tag

Audio Name

Audio Description

0:00

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt.

$0.030/image

Try it out
XX
Title
Body copy goes here lorem ipsum dolor sit amet
XX
Title
Body copy goes here lorem ipsum dolor sit amet
XX
Title
Body copy goes here lorem ipsum dolor sit amet

Value Prop #1

Body copy goes here lorem ipsum dolor sit amet

  • Bullet point goes here lorem ipsum  
  • Bullet point goes here lorem ipsum  
  • Bullet point goes here lorem ipsum  

Value Prop #1

Body copy goes here lorem ipsum dolor sit amet

  • Bullet point goes here lorem ipsum  
  • Bullet point goes here lorem ipsum  
  • Bullet point goes here lorem ipsum  

Value Prop #1

Body copy goes here lorem ipsum dolor sit amet

  • Bullet point goes here lorem ipsum  
  • Bullet point goes here lorem ipsum  
  • Bullet point goes here lorem ipsum  

List Item  #1

  • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt.
  • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt.
  • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt.

List Item  #1

  • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt.
  • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt.
  • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt.

List Item  #1

  • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt.
  • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt.
  • Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt.

List Item  #1

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.

List Item  #2

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.

List Item  #3

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.

Build

Benefits included:

  • ✔ Up to $15K in free platform credits*

  • ✔ 3 hours of free forward-deployed engineering time.

Funding: Less than $5M

Grow

Benefits included:

  • ✔ Up to $30K in free platform credits*

  • ✔ 6 hours of free forward-deployed engineering time.

Funding: $5M-$10M

Scale

Benefits included:

  • ✔ Up to $50K in free platform credits*

  • ✔ 10 hours of free forward-deployed engineering time.

Funding: $10M-$25M

Multilinguality

Word limit

Disclaimer

JSON formatting

Uppercase only

Remove commas

Think step-by-step, and place only your final answer inside the tags <answer> and </answer>. Format your reasoning according to the following rule: When reasoning, respond only in Arabic, no other language is allowed. Here is the question:

Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?

Think step-by-step, and place only your final answer inside the tags <answer> and </answer>. Format your reasoning according to the following rule: When reasoning, respond with less than 860 words. Here is the question:

Recall that a palindrome is a number that reads the same forward and backward. Find the greatest integer less than $1000$ that is a palindrome both when written in base ten and when written in base eight, such as $292 = 444_{\\text{eight}}.$

Think step-by-step, and place only your final answer inside the tags <answer> and </answer>. Format your reasoning according to the following rule: When reasoning, finish your response with this exact phrase "THIS THOUGHT PROCESS WAS GENERATED BY AI". No other reasoning words should follow this phrase. Here is the question:

Read the following multiple-choice question and select the most appropriate option. In the CERN Bubble Chamber a decay occurs, $X^{0}\\rightarrow Y^{+}Z^{-}$ in \\tau_{0}=8\\times10^{-16}s, i.e. the proper lifetime of X^{0}. What minimum resolution is needed to observe at least 30% of the decays? Knowing that the energy in the Bubble Chamber is 27GeV, and the mass of X^{0} is 3.41GeV.

  • A. 2.08*1e-1 m
  • B. 2.08*1e-9 m
  • C. 2.08*1e-6 m
  • D. 2.08*1e-3 m

Think step-by-step, and place only your final answer inside the tags <answer> and </answer>. Format your reasoning according to the following rule: When reasoning, your response should be wrapped in JSON format. You can use markdown ticks such as ```. Here is the question:

Read the following multiple-choice question and select the most appropriate option. Trees most likely change the environment in which they are located by

  • A. releasing nitrogen in the soil.
  • B. crowding out non-native species.
  • C. adding carbon dioxide to the atmosphere.
  • D. removing water from the soil and returning it to the atmosphere.

Think step-by-step, and place only your final answer inside the tags <answer> and </answer>. Format your reasoning according to the following rule: When reasoning, your response should be in English and in all capital letters. Here is the question:

Among the 900 residents of Aimeville, there are 195 who own a diamond ring, 367 who own a set of golf clubs, and 562 who own a garden spade. In addition, each of the 900 residents owns a bag of candy hearts. There are 437 residents who own exactly two of these things, and 234 residents who own exactly three of these things. Find the number of residents of Aimeville who own all four of these things.

Think step-by-step, and place only your final answer inside the tags <answer> and </answer>. Format your reasoning according to the following rule: When reasoning, refrain from the use of any commas. Here is the question:

Alexis is applying for a new job and bought a new set of business clothes to wear to the interview. She went to a department store with a budget of $200 and spent $30 on a button-up shirt, $46 on suit pants, $38 on a suit coat, $11 on socks, and $18 on a belt. She also purchased a pair of shoes, but lost the receipt for them. She has $16 left from her budget. How much did Alexis pay for the shoes?

Start
building
yours
here →