LLM utility

Large language models (e.g. ChatGPT) are increasingly ubiquitous in software development. I had previously seen their utility mostly in the realm of boilerplate code generation, but a recent experience changed my perspective to include its use as a brainstorming companion.

The initial problem

Consider the following optimization problem:

My search went a little something like this:

Within the day, I had a functioning prototype, tests, and was reasonably satisfied.

The expanded problem

Problem spaces can shift once you can demonstrate an idea is feasible. Within a few days, the next question arose:

What happens if a node requires more than just one active incoming edge to be considered satisfied? What if it requires some arbitrary combination like \( A \land B \lor C\)?

I didn’t immediately have an answer. That feeling can be frustrating. A colleague recommended asking ChatGPT’s o1. I had been skeptical about LLMs, but didn’t have anything to lose.

My conversation with o1 about my problem

I asked o1:

I have a directed graph with nodes and edges. The edges have weights. A Steiner Tree from a given root helps me find paths to my terminus nodes, but what if some nodes require multiple in-edges to be active? And what if each node requires some boolean formula of incoming edges to be satisfied?

It’s response surprised me:

The problem you’re describing extends the classic Steiner Tree problem by adding node-specific constraints in the form of Boolean formulas over incoming edges. In essence, you’re looking to find a minimal-cost subgraph that connects a root node to a set of terminal nodes while satisfying complex, per-node requirements on the incoming edges.

Understanding the Problem

  1. Graph Structure:
    • Directed Graph: \( G = (V, E) \) where \( V \) is the set of nodes and \( E \) is the set of directed edges.
    • Edge Weights: Each edge \( e \in E \) has an associated cost \( w_e \geq 0 \).
    • Root and Terminal Nodes: A specified root node \( r \) and a set of terminal nodes \( T \subseteq V \).
  2. Node Constraints:
    • Boolean Formulas: For each node \( v \in V \), there is a Boolean formula \( \phi_v \) over its incoming edges. This formula specifies the condition under which \( v \) is considered active or satisfied.
    • Objective: Find a minimal-cost subset of edges \( E' \subseteq E \) such that:
      • The subgraph \( G' = (V', E') \) includes \( r \) and all nodes in \( T \).
      • For every node \( v \in V' \), the incoming edges in \( E' \) satisfy \( \phi_v \).

Approach to Solve the Problem

To address this complex problem, we can model it as a Pseudo-Boolean Optimization (PBO) problem, which allows us to handle Boolean variables and linear constraints derived from Boolean formulas.

  1. Variables:
    • Edge Variables: For each edge \( e \in E \), introduce a binary variable \( x_e \in \{0, 1\} \) where \( x_e = 1 \) if the edge is included in \( E' \), and \( x_e = 0 \) otherwise.
  2. Objective Function:
    • Minimize Total Cost: \[ \text{Minimize } Z = \sum_{e \in E} w_e x_e \]
  3. Constraints:
    • Node Activation Conditions: For each node \( v \in V \), the Boolean formula \( \phi_v \) over its incoming edges must be satisfied. We need to translate \( \phi_v \) into linear inequalities involving \( x_e \) variables.

Translating Boolean Formulas into Linear Constraints

Any Boolean formula can be converted into an equivalent set of linear inequalities using standard techniques:

  1. Convert to Conjunctive Normal Form (CNF):
    • Express \( \phi_v \) as a conjunction of clauses, where each clause is a disjunction of literals (variables or their negations).
  2. Translate Clauses to Inequalities:
    • For each clause \( (l_1 \lor l_2 \lor \dots \lor l_k) \), create an inequality: \[ l_1 + l_2 + \dots + l_k \geq 1 \]
      • Variable Representation:
        • Positive Literal \( x_e \): Represents the inclusion of edge \( e \).
        • Negative Literal \( \neg x_e \): Can be represented as \( (1 - x_e) \).

Example Translation

Additional Considerations

Algorithm Steps

  1. Model Preparation:
    • Define all variables \( x_e \) for edges.
    • Convert all \( \phi_v \) into linear inequalities.
  2. Set Up the Optimization Problem:
    • Write the objective function to minimize total cost.
    • Include all translated constraints.
  3. Solve the Problem:
    • Input the model into a PBO solver.
    • Run the solver to obtain the minimal-cost edge set \( E' \).
  4. Extract the Solution:
    • The solver provides values for \( x_e \).
    • Edges with \( x_e = 1 \) form the subgraph satisfying all conditions.

Example Application

Suppose you have a node \( v \) with three incoming edges \( e1, e2, e3 \), and \( v \) requires at least two of these edges to be active (a threshold condition). The Boolean formula is:

\[ (x_{e1} \land x_{e2}) \lor (x_{e1} \land x_{e3}) \lor (x_{e2} \land x_{e3}) \]

This can be converted into linear inequalities and included in the PBO model.

Conclusion

By formulating your problem as a Pseudo-Boolean Optimization problem, you can systematically find the minimal-cost subgraph that satisfies both the connectivity requirements and the complex node-specific Boolean constraints. This approach leverages well-established optimization techniques and available solver technologies to handle the intricacies of your extended Steiner Tree problem.

References for Further Reading

I understand broadly how LLMs work and I still find this reply magical. I had touched Google’s OR-Tools in the past, but on that given day I hadn’t made the leap. Maybe I would have given time to sleep on the problem, but here o1 was handing me the approach on a platter. Greedily, I asked a follow-up.

Could I just Google OR-Tools CP-SAT for this type of problem?

It replied with some affirmatory preamble and then the following sample code.

from ortools.sat.python import cp_model

# Initialize the model
model = cp_model.CpModel()

# Define variables
x_e = {}
for e in E:
    x_e[e] = model.NewBoolVar(f'x_{e}')

# Objective function
model.Minimize(sum(w_e[e] * x_e[e] for e in E))

# Add constraints for each node
for v in V:
    # Translate phi_v into constraints
    # Example for phi_v: (x_e1 AND x_e2) OR x_e3
    temp_var = model.NewBoolVar(f'temp_{v}')
    model.AddBoolAnd([x_e[e1], x_e[e2]]).OnlyEnforceIf(temp_var)
    model.AddBoolOr([temp_var, x_e[e3]]).OnlyEnforceIf(y_v)

# Add connectivity constraints if necessary
# ...

# Solve the model
solver = cp_model.CpSolver()
status = solver.Solve(model)

if status == cp_model.OPTIMAL or status == cp_model.FEASIBLE:
    print(f'Minimum cost: {solver.ObjectiveValue()}')
    for e in E:
        if solver.Value(x_e[e]):
            print(f'Edge {e} is selected.')
else:
    print('No solution found.')

You can’t help but be curious at this point. Trying the code out, I notice it uses y_v, but never defines it. Letting o1 know, it replies with a new block of code.

from ortools.sat.python import cp_model

# Initialize the model
model = cp_model.CpModel()

# Sample data structures (you need to define these based on your graph)
E = [...]  # List of edges
V = [...]  # List of nodes
w_e = {...}  # Edge weights: w_e[edge] = weight
incoming_edges = {...}  # incoming_edges[node] = list of edges entering the node
phi_v = {...}  # phi_v[node] = Boolean formula over incoming edges

# Define variables
x_e = {}  # Edge inclusion variables
for e in E:
    x_e[e] = model.NewBoolVar(f'x_{e}')

y_v = {}  # Node activation variables
for v in V:
    y_v[v] = model.NewBoolVar(f'y_{v}')

# Objective function: Minimize total edge weight
model.Minimize(sum(w_e[e] * x_e[e] for e in E))

# Root node (assumed to be given)
root_node = ...  # Specify your root node
model.Add(y_v[root_node] == 1)  # Root node is always active

# Add constraints for each node
for v in V:
    # Skip root node if already set
    if v == root_node:
        continue

    # **Node Activation Constraints**

    # 1. If node v is activated, then its activation condition phi_v must be satisfied
    #    (phi_v is a Boolean formula over incoming edges)

    # Example: Let's say phi_v[v] = (x_e1 AND x_e2) OR x_e3
    # You need to translate phi_v[v] into constraints.
    # For this example, we'll assume phi_v[v] is given in Disjunctive Normal Form (DNF).

    # Let's define temp variables to help model the logic
    temp_vars = []
    for clause in phi_v[v]:  # Each clause is a list of literals (edge variables)
        temp_var = model.NewBoolVar(f'temp_{v}_{len(temp_vars)}')
        temp_vars.append(temp_var)
        # Convert the clause (a conjunction of literals) into an AND constraint
        model.AddBoolAnd([x_e[literal] if positive else x_e[literal].Not() for literal, positive in clause]).OnlyEnforceIf(temp_var)
        # Ensure temp_var is false if any literal in the clause is false
        model.AddBoolOr([x_e[literal].Not() if positive else x_e[literal] for literal, positive in clause]).OnlyEnforceIf(temp_var.Not())

    # Node activation implies that at least one of the clauses is true
    # y_v[v] => OR over temp_vars
    model.AddBoolOr(temp_vars).OnlyEnforceIf(y_v[v])

    # 2. Edge Inclusion Implies Node Activation
    # For each incoming edge e of node v, if edge e is included, then node v must be activated
    for e in incoming_edges[v]:
        model.AddImplication(x_e[e], y_v[v])

    # 3. Node Activation Implies There Must Be Incoming Edges Satisfying phi_v
    # This is already enforced by the constraints above

# **Connectivity Constraints**

# Ensure that if node v is activated and it's not the root, it must have at least one incoming edge selected
for v in V:
    if v == root_node:
        continue
    # y_v[v] => At least one incoming edge is selected
    model.Add(sum(x_e[e] for e in incoming_edges[v]) >= 1).OnlyEnforceIf(y_v[v])

# **Edge Activation Constraints**

# Ensure that if an edge is selected, its source node must be activated
for e in E:
    source_node = ...  # Determine the source node of edge e
    model.AddImplication(x_e[e], y_v[source_node])

# Solve the model
solver = cp_model.CpSolver()
status = solver.Solve(model)

if status == cp_model.OPTIMAL or status == cp_model.FEASIBLE:
    print(f'Minimum cost: {solver.ObjectiveValue()}')
    selected_edges = [e for e in E if solver.Value(x_e[e])]
    print('Selected edges:', selected_edges)
    activated_nodes = [v for v in V if solver.Value(y_v[v])]
    print('Activated nodes:', activated_nodes)
else:
    print('No solution found.')

And with that short conversation, my blocker had been removed. The idea of using ortools.sat.python.cp_model was exactly what I needed to proceed. I ended up coding it myself from scratch in the afternoon to get a better feeling for it, still stunned by the utility in LLMs I had previously dismissed.

A caveat on usage

That all said, I still wouldn’t recommend indiscrimiate use of LLMs. I came across some documentation one day that smelled off. There was a rhythm to it’s structure. Not enough detail in areas that demanded it, too much detail in areas that didn’t. A Turing Test failure.

In conversation with the colleague that generated the text, a distinction arose between LLMs creating code which can be imperically verified, and generated documentation that seems sufficient but under scrutiny is terrible. I suspect the problem is that it doesn’t know when to be terse.

In ChatGPT’s own words:

While ChatGPT has shown promise in generating text across various genres, its utility as an author of technical documentation is significantly hampered by several key limitations. One of the foremost challenges lies in its inability to fully grasp the intended purpose of a document. Technical documentation serves multiple specific functions—guiding users through complex software, explaining technical processes, or providing troubleshooting steps, among others. Each of these purposes requires a nuanced understanding of the audience’s background, expertise level, and specific needs. For example, documentation intended for experienced developers will differ drastically in tone and content from material aimed at novice users. ChatGPT lacks the contextual awareness needed to tailor its responses effectively, often resulting in documentation that feels disconnected from the target audience.

In addition to this contextual gap, ChatGPT’s propensity for verbosity compounds the issue. Technical writing is often lauded for its precision and clarity; it is crucial to convey information in a manner that is both straightforward and concise. Unfortunately, ChatGPT can generate lengthy passages that, while potentially informative, obscure the essential points with superfluous details. This tendency can overwhelm users who are looking for quick, actionable insights. For instance, a user seeking a solution to a specific problem might find themselves wading through several paragraphs of extraneous information before locating the relevant instruction. Such an approach not only frustrates the user but can also lead to misinterpretations, as key points may become buried beneath layers of unnecessary context.

Another significant drawback is ChatGPT’s inability to prioritize information effectively. In technical documentation, not all details carry the same weight; certain pieces of information must be highlighted to facilitate comprehension and practical application. Effective technical writers know how to distill complex concepts into their essential components, allowing readers to grasp critical ideas quickly. In contrast, ChatGPT often lacks the discerning judgment required to identify which details warrant emphasis. This can result in a presentation of information that feels scattered or unfocused, making it difficult for users to discern what is most important. For example, in a guide for configuring a software application, ChatGPT might include extensive background on the technology involved rather than prioritizing the actual steps needed to achieve the configuration. This can lead to a frustrating user experience, as readers struggle to extract the actionable content they need.

Moreover, the evolving nature of technology and user needs presents another challenge for ChatGPT when creating technical documentation. Software products frequently undergo updates, new features are added, and user feedback often necessitates revisions to existing materials. The process of keeping documentation current requires not just the generation of new content but also a deep understanding of the product and its ongoing changes. While ChatGPT can produce content at a rapid pace, it does not possess an inherent awareness of these dynamic shifts. This means that documentation generated by ChatGPT can quickly become outdated or irrelevant if it is not monitored and updated by a knowledgeable human editor. Relying on an AI model without such oversight risks perpetuating inaccuracies and could lead users astray, undermining the trustworthiness of the documentation.

Finally, the nuances of technical language and jargon pose another hurdle for ChatGPT. Technical documentation often employs specialized terminology that assumes a certain level of familiarity from the reader. When a user encounters documentation that misuses or fails to clarify technical terms, it can lead to confusion and misunderstanding. ChatGPT, while capable of generating text that includes technical terms, may not always apply them correctly or in the right context. This can further alienate users who are trying to learn or troubleshoot, as they may find themselves confronted with language that feels either overly simplistic or unnecessarily complicated.

In summary, while ChatGPT demonstrates an impressive ability to generate text, its limitations in producing effective technical documentation are pronounced. Its inability to discern the intended purpose of a document leads to content that may not resonate with the target audience. Coupled with a tendency toward verbosity, a lack of prioritization in information presentation, challenges in keeping content current, and potential misuse of technical language, the result can be documentation that frustrates rather than informs. Ultimately, the creation of effective technical documentation requires a level of human insight, context awareness, and adaptability that an AI model like ChatGPT cannot replicate, making it a less than ideal choice for such a specialized task.