Question: interest in a “high-tension preamble” track using an open S-class TXT pack? #142

@onestardao

Description

Hi, first of all, thank you for LiveCodeBench. It has become one of the default references whenever people talk about realistic coding benchmarks, especially in contrast to older, saturated datasets.

I am coming from a slightly different direction. I maintain an open framework called WFGY, whose current release, WFGY 3.0, is a pure-text “Singularity Demo” pack. It consists of 131 S-class questions written as long-form prompts, targeting alignment edge cases, long-horizon planning, fragile world models, and other high-tension scenarios. Everything is MIT-licensed and already used as a long-range stress test for LLMs.

One idea I have been exploring: instead of only measuring raw coding performance, we could also ask what happens if the model first absorbs a highly loaded theoretical context and then tries to solve hard coding problems. In other words, use a text pack like WFGY 3.0 as a fixed preamble and see whether LiveCodeBench scores shift, collapse, or reveal new failure modes.

Would something like a “high-tension preamble” track fit into your roadmap at all? Even a small experimental variant, where selected LiveCodeBench tasks are run with and without such a preamble, could already be very informative.

If this is compatible with your design philosophy, I can draft a minimal proposal so you can quickly judge whether it is worth integrating or better kept as an external experiment.
