Hi, first of all, thank you for LiveCodeBench. It has become one of the default references whenever people talk about realistic coding benchmarks, especially in contrast to older, saturated datasets.
I am coming from a slightly different direction. I maintain an open framework called WFGY, and the current release, WFGY 3.0, is a pure-text "Singularity Demo" pack. It consists of 131 S-class questions written as long-form prompts, targeting alignment edge cases, long-horizon planning, fragile world models, and other high-tension scenarios. Everything is MIT-licensed and is already used as a long-range stress test for LLMs.
One idea I have been exploring is this: instead of only measuring raw coding performance, we could also ask what happens if the model first absorbs a highly loaded theoretical context and then tries to solve hard code problems. In other words, use a text file like WFGY 3.0 as a fixed preamble and see whether LiveCodeBench scores shift, collapse, or reveal new failure modes.
Would something like a “high-tension preamble” track fit into your roadmap at all? Even a small experimental variant, where selected LiveCodeBench tasks are run with and without such a preamble, could already be very informative.
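To make the idea concrete, here is a rough sketch of what such a with/without-preamble comparison could look like. The function and helper names (`compare_pass_rates`, `solve`, `check`, the `wfgy_3.0.txt` filename, and the shape of the task list) are hypothetical placeholders for illustration only, not the actual LiveCodeBench harness:

```python
"""Sketch of an A/B run: same tasks, with and without a fixed text preamble.

All names here are hypothetical placeholders, not the LiveCodeBench API:
- solve:  prompt -> candidate solution (one model call)
- check:  (task prompt, solution) -> True if the solution passes the tests
- tasks:  a list of task prompts selected from the benchmark
"""
from typing import Callable, Sequence


def compare_pass_rates(
    tasks: Sequence[str],
    solve: Callable[[str], str],
    check: Callable[[str, str], bool],
    preamble: str,
) -> tuple[float, float]:
    """Return (baseline pass rate, pass rate with the preamble prepended)."""

    def pass_rate(prefix: str) -> float:
        # Run every task with the given prefix and count passing solutions.
        passed = sum(check(task, solve(prefix + task)) for task in tasks)
        return passed / len(tasks)

    return pass_rate(""), pass_rate(preamble + "\n\n")


# Hypothetical usage:
# from pathlib import Path
# preamble = Path("wfgy_3.0.txt").read_text()
# baseline, loaded = compare_pass_rates(tasks, solve, check, preamble)
# print(f"baseline {baseline:.3f} vs. with preamble {loaded:.3f}")
```

The only moving part is the fixed prefix, so any score difference between the two runs can be attributed to the preamble itself rather than to changes in the tasks or the evaluation.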
If this is compatible with your design philosophy, I can draft a minimal proposal so you can quickly judge whether it is worth integrating or better kept as an external experiment.