Codez Use Case Workflow

Posted on 2025-07-21

Lately, I’ve been obsessed with Agentic Coding and built a small project called Codez, which runs the Codex CLI directly inside GitHub Actions. It’s been working quite well. Here’s an example of a workflow it supports:

Discuss with AI first → break down the problem → tackle each part one by one

Take this code refactoring request as an example: Issue #304

Step 1: Raise the Requirement

In the issue, I clearly described what I wanted to do, including points to discuss and related background info. The title and content of the issue become part of the initial prompt for the agent.

Then, I triggered the agent directly in the comments:

/codex             # Keyword to wake up the agent
--no-pr            # Add this flag to prevent it from directly creating a PR. I want to clarify the problem first
--fetch            # This flag lets the agent fetch the content from the link for offline use
https://google.github.io/styleguide/tsguide.html  # Relevant documentation to help the agent make informed decisions

With that, the agent gets to work. Even though it doesn’t have network access during runtime, it can still give some solid initial ideas using the context (codebase, content cached from the provided links, and the issue itself).

Step 2: Automatically Break Down into Actionable Tickets

Next, I triggered the agent again. This time to generate a series of issues:

1
2
3

/codex
--create-issues    # The model generates a JSON of titles and descriptions, then uses GitHub API to create issues
--full-history     # Includes all previous comments in this thread into the context window

Note: Only flags in the current comment are recognized. The agent then organizes the discussion into multiple issues. Each issue can be worked on independently.

These newly created tickets can now be handled one by one. Like this:

Final Step: Let the Agent Do the Work

The entire workflow feels like genuine team collaboration: discuss first, break down the problem, then work in parallel. The difference is that humans are replaced by agents.

Some say Agentic Coding is like catnip for programmers, totally addictive. It’s true.

Codez 使用案例之工作流

Posted on 2025-07-20 Edited on 2025-07-21

最近沉迷于 Agentic Coding，搞了个小项目 Codez，直接嵌在 GitHub Actions 里运行 Codex CLI，效果还不错。可以做这样的工作流：

先和 AI 讨论，然后问题拆解，最后各个击破

比如这个代码重构的需求：Issue #304

第一步：先抛出需求

我在 issue 里写清楚了想做的事，包括要探讨的点、相关的背景资料。这个 issue 的标题和内容，agent 会当成初始 prompt 的一部分。

然后我直接在评论里触发 agent：

/codex             # 用关键词唤醒 agent
--no-pr            # 加个 flag，不要直接生成 PR，因为我想要把问题先探讨清楚
--fetch            # 这 flag 会让 agent 把链接里的内容先抓下来，方便离线使用
https://google.github.io/styleguide/tsguide.html  # 提供相关文档，有助于 agent 做判断

这样 agent 就开始工作啦。即便它在运行时没有网络权限，但通过 context（包括 codebase、给的链接、issue 内容），它还是给出了几条挺靠谱的初步思路。

第二步：自动拆解成可执行的 tickets

接着我又触发了一次 agent，这次是让它生成一系列 issues：

1
2
3

/codex
--create-issues    # 原理是让模型生成标题加内容的 JSON，然后用 GitHub API 创建 issue
--full-history     # 把当前对话前面的所有评论都加入 prompt 进入 context window

注意一下：所谓的 flag 只有当前评论内的 flag 才会生效。现在 agent 把讨论内容整合，分别创建了 issues。每一个 issue 都可以独立地推进。

这些新建好的 ticket 就可以一个一个单独的处理啦。比如下面这样。

最后一步：Agent 干活

整个工作流就像真正的团队协作一样：先讨论，再问题拆解，最后分头干活。不同的是，把人换成了 agent。

有人说，Agentic Coding 就像是程序员的猫薄荷。特别上头。

Why Prompt Injection Is Probably Unsolvable

Posted on 2025-07-04

Let me just say it straight: prompt injection isn’t a bug, it’s a feature.

If you really think there’s some perfect solution to stop all “injections” then in the end, all you’re really doing is trying to police the context window. Kinda like a censorship board, separating out anything they think is harmful.

Why is that?

Because these so-called “injected” prompts are just part of the context like everything else. There’s no real difference between text A and text B. When you put them together into AB, unless you filter the input, there’s not much else you can do. Sure, you can train the model to “self-censor”, that’s doable. But honestly, that makes me feel kind of sad.

As humans, when you read a line like “The bright moon shines between the pines, clear spring flows over the stones” you might picture a quiet forest, a gentle breeze, moonlight. Or when you hear Vivaldi’s Winter, maybe it reminds you of that delicate tension in Portrait of a Lady on Fire. Even under strict rules or social norms, those feelings still burn underneath.

Or maybe we’re just chatting and someone casually drops a phrase like “move mountains like Yu Gong”, or some internet meme, suddenly a flood of meaning and context rushes in.

Is that prompt injection? Technically, yeah.

But that’s also what makes language so powerful. If you try to restrict the context window too tightly, you’re basically cutting off the soul of language.

Now, when I say it’s “unsolvable,” I don’t mean there’s zero technical hope. You can definitely use regex to strip out what you think is harmful, or train a small model to monitor input.

But I want to ask you, why do we need to do that?

We’ve thrown everything, all the words in the world, into training these large models. And still, many people see them as nothing more than small kids. Just like how, to some parents, a kid is never fully grown. Maybe these models will always be “kids” Maybe we all are.

The human brain might be structurally ready at birth, but what really shapes it is training, experience. That’s why fine-tuning matters so much. It’s what decides whether a model can tell when a “dangerous” prompt is actually dangerous. Though honestly, the word “judgment” feels almost too subjective.

We’re so eager to make these models human-like, but we’ve never really stopped to think: what is a human?

We want these models to understand ethics, to follow the rules, and yet we forget that we humans are still struggling in the gray areas ourselves.

为什么提示注入理应无解

Posted on 2025-07-04 Edited on 2025-07-21

先说我的结论：提示词注入，不是 bug，是 feature。

如果你真以为能有一种万能的办法控制所谓“注入”的问题，那你最终能做的，也不过是对上下文窗口进行审查。就像广电总局那样，把被认为有害的内容隔离出来。

为什么呢？

因为“注入”的提示词本来就是上下文的一部分，文字 A 和文字 B 之间并没有本质区别。当它们组合成 AB 时，除了做输入过滤，你别无选择。当然，你也可以通过训练让模型学会“自我审查”，这当然是可以的。但说实话，这让我感到一种深深的悲哀。

作为人类，当你读到“明月松间照，清泉石上流”，脑海中浮现的是空谷幽林、清风明月；当你听维瓦尔第的《四季·冬》，会想起《燃烧女子的画像》中那种暧昧而克制的情愫。即使在礼教之下，那种情感依然燃烧不息。

或者当我们交谈时，随口提到一个四字成语，“愚公移山”，又或者随便提一个网络梗，大量信息就这样瞬间来到了我们的语境之中。

这些是提示词注入吗？是的。

而这正是语言的魅力。你要是禁锢语言的上下文窗口，就是在阉割思维。

我说这是“无解”，并不是说技术上完全无计可施。你当然可以用正则表达式清除那些所谓有害的信息，或者训练一个小模型监控输入内容。

但是，为什么我们还要这么做呢？我们倾尽所能，拿人世间所有的文字，去训练出的大模型，在很多人眼中依旧是个“不成熟的孩子”。就好像父母眼里永远长不大的孩子。也许大模型也永远会是一个孩子，就像全人类一样。

人类的大脑，虽然在出生时结构已经具备，但真正让它生出沟壑的，是后天的训练。所以微调才那么重要。它决定了模型在接收到某些“危险提示”时，是否具备判断力。虽然“判断”这个词听起来太主观了。

我们太急于让模型像人，却从未认真反省过，人究竟是什么。

我们希望大模型懂伦理，守规矩，却忘了人类自己尚且在灰暗中挣扎。

The Restructuring of Programming Paradigms

Posted on 2025-06-22

A “paradigm” is not just a form—it’s a mode of thinking.

As a branch of the writing profession, programmers interact with tens of thousands of lines of code, and naturally, they rely on paradigms for guidance. It’s like having a built-in “system prompt”—you don’t need to explain “what object-oriented programming is” to your colleagues every time, nor do you have to argue endlessly about the details of design patterns. There’s an unspoken understanding that you’re “playing by the rules” in a certain context.

After all, the human “mental window” is limited in size. Collaborating without a paradigm is like chickens talking to ducks—utter miscommunication. If you insist on piling all your functions into a single file with no abstraction or categorization, the consequence is writing documentation until you break down.

But now that artificial intelligence is part of the programmer’s workflow, paradigms are shifting.

First, the way we handle code has transformed from “manual carving” to “semantic understanding.” Those obedient little “coding assistants” can automatically modify your code based on a single instruction. So under this transformation, do traditional coding paradigms still matter?

It depends on your perspective.

They do matter—because large language models deal in language. The clearer the linguistic structure, the easier it is to extract and recreate information. It’s the difference between a pile of unclassified documents and a meticulously organized file cabinet. The information entropy is not the same, and naturally, neither is the efficiency.

But from another angle, maybe they’re no longer that important. After all, paradigms were originally for humans to read. Now, LLMs can develop their own paradigms—or even revive those long-abandoned “classical practices.” For instance, directly manipulating binary code: who needs layers of abstraction? From a model’s perspective, that quaint notion of “human readability” might be completely unnecessary.

Second, there’s the transformation of the programmer’s role. Whether we accept it or not, this wave of change has quietly pushed us into a new position. The days of hand-crafting code were like seasoned artisans refining woodwork—there was a kind of pride and dignity woven into every character.

But from a business standpoint, when websites and apps are the final product, the programmer is just one link in the industrial chain. Whether you code by hand or use AI to generate it, the user doesn’t care. Our mindset must shift accordingly. I am not just a code porter or a keystroke operator—I am an engineer solving problems. Writing code is a means to that end, not the end itself.

To me, the value of paradigms lies in helping us organize our thoughts, so that our future selves—or others—can quickly get into the zone when reading our code. This was once a form of self-rescue for programmers. But now, LLMs effortlessly surpass us. They have stronger memory, faster analysis, and even if your code is a mess, they can still make sense of it.

So, do we still need paradigms in the future?

Perhaps the real question is: are we willing to hand over our thinking to the model, or do we want to preserve a trace of human logic?

Paradigms are road signs in the world of programming.

But when the roads are no longer built by humans—does that mean we are no longer travelers, or just moving where the machine tells us to go?

编程范式的重构

Posted on 2025-06-22 Edited on 2025-07-21

所谓“范式”，是一种形式，更是一种思维模式。

程序员，作为文字工作者的一个分支，要与成千上万行代码打交道，自然离不开各种范式的指引。就像预设好了一个“系统提示”，你无需一遍遍向同事解释“什么是面向对象”，也无需在设计模式上争论太多细节。大家心照不宣，默认你在某个语境下是“按套路出牌”。

毕竟，人类的“思维窗口”大小有限，没有范式的合作无异于鸡同鸭讲。你要是硬把所有函数堆在一个文件里，不做任何抽象、分类，那后果就是写文档要写到崩溃。

可当人工智能加入程序员的工作流之后，范式正在发生变化。

首先，代码的处理方式从“手工雕刻”变成了“语义理解”。那些听话的“编程小助手”，听你一句指令，便能自动修改代码。那么，这种转变下，传统代码里的范式还重要吗？

这得看你怎么看。

重要，当然重要。因为大模型处理的是“语言”，语言的组织结构越清晰，越有利于信息的提取和再创造。这就像未经分类的文件堆和精心整理过的档案柜之间的差别，信息熵不一样，效率自然也不一样。

但换个角度，也可以说它已经不那么重要了。毕竟，以往的范式，是为了“人”阅读。而现在，大模型完全可以发展出自己的范式，甚至回收那些早被人类嫌弃的“古典做法”。比如说，直接操作二进制编码，何须讲究抽象层级？对于模型来说，那点“人类的可读性”，恐怕只是多余。

其次，是程序员身份的转变。不论内心是否接受，这股变化的洪流，已经悄然将人推到了新的位置。手搓代码的岁月，有点像老手艺人打磨木工制品，那种纯手工的尊严和骄傲，仿佛藏在每一个字符之间。

但若从商业角度看，当网页和 App 成为最终商品时，程序员不过是产业链条上的一环。你是用手搓，还是让 AI 帮你画，用户并不关心。思维也要随之转变。我不只是代码的搬运工、软件的敲字员，而是一个解决问题的工程师。写代码，是解决问题的手段，绝不是唯一答案。

在我看来，范式的意义，是帮助我们整理思绪，好让将来的“自己”或“他人”在回头阅读这些代码时，能更快地进入状态。这本来是程序员的一项自救之术，如今，却被大模型轻松超越。它记忆力更强，分析力更快，哪怕你写得一团糟，它也能看出门道。

那么，未来还需要范式吗？

也许真正的问题是：我们愿意把思维交给它，还是愿意保留一点属于人类自己的逻辑痕迹？

范式，是程序世界中的路标。

而当道路不再由我们铺设，是否意味着，我们也将不再是旅人，而是被路指引的方向？

Access Files on the Steam Deck via SSH using VS Code

Posted on 2024-07-21 Edited on 2025-04-11

Accessing the Steam Deck’s file system remotely can be incredibly useful.

Imagine using Raycast to quickly open a recent VS Code project, and one of them is a folder on the Steam Deck via an SSH connection. With just one click, you're connected and ready to go.

Prerequisites

Before getting started, make sure you have the following:

Steam Deck: Ensure your Steam Deck is powered on and connected to the same Wi-Fi network as your laptop.
Computer: A Mac (or possibly a PC) with Visual Studio Code installed.
SSH Enabled on Steam Deck: SSH is not enabled by default. You will need to enable it through the Steam Deck’s desktop mode.
VS Code Extensions: Install the “Remote - SSH” extension on your VS Code.

Step 1: Enable SSH on the Steam Deck

Press the Steam button, navigate to Power, and switch to Desktop mode. Once in desktop mode, open the KDE application launcher and search for Konsole (terminal).

Start the SSH server with:
1
sudo systemctl start sshd
Additionally, enable SSH to start on boot:
1
sudo systemctl enable sshd
Verify Your IP Address, in the terminal, type:
1
ip a
Note the IP address (e.g., 192.168.1.xxx or 10.0.0.xxx) that corresponds to your Wi-Fi connection.

Step 2: Create SSH Key Pairs (Recommended and not optional if you want to have a smooth flow with Raycast)

Creating SSH key pairs can enhance the security of your SSH connection by using public-key cryptography instead of a password.

Generate SSH Key Pair on the Steam Deck:
1
ssh-keygen -t rsa -b 4096 -f ~/.ssh/sd_rsa
This command will save the key pair in the specified folder (~/.ssh/sd_rsa).

This process will create two files:

“sd_rsa”: This is your private key. Keep this file secure and find a way to copy it to the Mac. Be creative; for example, you can use GoodReader to set up a quick WiFi Server.
“sd_rsa.pub”: This is your public key. This file can be shared and will stay on the Steam Deck.

For added security, you can:

Disable password authentication on the Steam Deck:
1
sudo vim /etc/ssh/sshd_config
Find the line that says “#PasswordAuthentication” and change it to:
1
2
PasswordAuthentication no
PubkeyAuthentication yes
Save the file (:wq) and restart the SSH service:
1
sudo systemctl restart sshd
Ensure your public key is added to the “authorized_keys” file:
1
cat ~/.ssh/sd_rsa.pub >> ~/.ssh/authorized_keys

Step 3: Configure SSH Access in VS Code

Ensure you have the Remote - SSH extension installed. If not, you can find it in the VS Code Marketplace. Copy the “sd_rsa” private key to your Mac and set the correct permissions (chmod 600 if necessary).

Press “Cmd+Shift+P” on Mac to open the command palette.
Type “Remote-SSH: Open SSH Configuration File…” and select it.
Choose the SSH configuration file you want to edit (usually “~/.ssh/config”).

Add a new entry to the configuration file in the following format:

Host steamdeck
    HostName 192.168.1.xxx
    User deck
    IdentityFile ~/.ssh/sd_rsa

Replace “192.168.1.xxx” with the actual IP address of your Steam Deck.
Save and close the configuration file.

Step 4: Connect to SSH in VS Code

Again, open the command palette (“Cmd+Shift+P”).
Type “Remote-SSH: Connect to Host…” and select the entry you just added.
The first time only, you may be prompted to accept the host’s fingerprint.
You’ll need to grant VS Code permission to access the local network. Go to System Settings > Privacy & Security > Local Network and ensure VS Code is listed and has access enabled.

By following the above steps, you can conveniently access and manage your Steam Deck’s file system using the powerful toolset provided by VS Code over an SSH connection.

Step 5: Shortcut Using Raycast and VS Code Extension

If you haven’t already, download and install Raycast from Raycast’s official website. Open Raycast and go to the “Extensions Store” and search for “Visual Studio Code” and install the extension.

Open Raycast “Option+Space”
Type “VS Code” and you will see the shortcut in VS Code Recent Projects.
Select “deck” or another name for your SSH connection from the list to quickly open it in the VS Code.

A big thanks to the developers who created these amazing tools!

References

SSH-KEYGEN General Commands Manual
https://man.openbsd.org/ssh-keygen

Remote SSH with Visual Studio Code
https://code.visualstudio.com/blogs/2019/07/25/remote-ssh

Remote SSH: Tips and Tricks
https://code.visualstudio.com/blogs/2019/10/03/remote-ssh-tips-and-tricks

HTTP Headers for Resumable Downloads

Posted on 2024-04-22

We’ve all experienced the frustration of a poor internet connection. You may recall the disappointment of a large file download failing after 24 hours of waiting. Even worse, discovering that the download is not resumable.

Responsibility for resumable downloads doesn’t solely rest on the client side with the correct setting of HTTP headers. It’s equally, if not more, important for the backend to correctly enable several headers and implement the associated logic.

While I won’t delve into the detailed implementation in a specific language, understanding the headers discussed below will equip you with the knowledge to easily implement this feature if you wish.

Client

The only aspect you need to focus on is the Range HTTP request header. This header specifies the portions of a resource that the server should return. That’s all there is to it.

1	Range: <unit>=<range-start>-

On the client side, the only requirement is to properly implement the Range HTTP request header. This involves using the correct unit and determining the starting point of the range. The server then knows which portion of the file to send. There’s no need to worry about specifying the range end, as the typical use case involves resuming and downloading the entire file.

Server

Now, things start to get more complicated.

The ETag (also known as entity tag) HTTP response header serves as an identifier for a specific version of a resource.

1	ETag: "<etag_value>"

If your target client includes a browser, then you need to set the ETag. Modern browsers expect to see this value; otherwise, the browser will simply retry downloading the entire file again.

The Content-Range response HTTP header signifies the position of a partial message within the full body message.

1	Content-Range: <unit> <range-start>-<range-end>/<size>

Imagine you are downloading a file of 500 bytes, but due to an unstable internet connection, the download is interrupted after only 100 bytes. In this scenario, you would expect the server to send the remaining 400 bytes of the file. Consequently, you would anticipate seeing the appropriate header in the server’s response.

1	Content-Range: bytes 100-499/500

Check out MDN for understanding those numbers, I won’t explain them here.

The Accept-Ranges HTTP response header acts as a signal from the server, indicating its capability to handle partial requests from the client for file downloads.

Essentially, this header communicates to the client, “Hey, I am capable of handling this, let’s proceed.”

Don’t ask me why, you just need it.

1	Accept-Ranges: <range-unit>

I suggest simply using bytes.

1	Accept-Ranges: bytes

The Content-Length header signifies the size of the message body, measured in bytes, that is transmitted to the recipient.

In layman’s terms, it represents the bytes of the remaining file.

1	Content-Length: <length>

Let’s continue the same example mentioned above, the server is going to send the remaining 400 bytes of the file.

1	Content-Length: 400

This is merely an introduction.

There are many complex considerations to take into account. For instance, when dealing with ETags, you must strategize on how to assign a unique ID to each resource. Additionally, you need to determine how to update the ETag when a resource is upgraded to a newer version.

Understanding those HTTPS headers is a good start.

Handle Login with Python Bindings for Selenium

Posted on 2024-03-23

Before everything else, you need to install the Selenium package, of course.

1	pip install selenium

Or, if you hate to deal with anti-bot measures, you can just use this.

1	pip install undetected-chromedriver

Then, add the user data directory to the ChromeOptions object. It is the path to your Chrome profile. For macOS, it is located at ‘~/Library/Application Support/Google/Chrome’.

import undetected_chromedriver as uc

options = uc.ChromeOptions()
options.add_argument(f"--user-data-dir={'Path_to_your_Chrome_profile'}")
driver = uc.Chrome(options=options)

driver.get('https://www.example.com')

The --user-data-dir argument is kind of cheating because it allows you to bypass the login process without actually logging in.

Cookie is your friend.

But sometimes, you need to handle the login process, for instance, you have to switch between multiple accounts.

First of all, take care of your credentials. Use an .env file.

import os
from dotenv import load_dotenv

load_dotenv()

USERNAME = os.getenv('USERNAME_ENV_VAR')
PASSWORD = os.getenv('PASSWORD_ENV_VAR')

Then, you can use the send_keys method to fill in the username and password fields. I add one while loop to wait for the element in case the script runs too fast.

while True:
    try:
        driver.find_element(by=By.ID, value="username").send_keys(USERNAME)
        break
    except:
        time.sleep(1)

driver.find_element(by=By.ID, value="password").send_keys(PASSWORD)
driver.find_element(by=By.ID, value="submit").click()

After logging in, the chrome usally pops up a dialog asking if you want to save the password. It is annoying.

You can try to disable it by adding the --disable-save-password-bubble or --disable-popup-blocking argument to the ChromeOptions object. I don’t think it works. But you can try.

In the end, I just used a hack, that is to open a new tab and immediately close it, the popup will appear.

# open a new tab
driver.execute_script("window.open('','_blank')")

time.sleep(1) # 1 second wait is enough I guess
driver.switch_to.window(driver.window_handles[1])

# say goodbye to the new tab
driver.close()

# now switch back to the original tab
driver.switch_to.window(driver.window_handles[0])

That’s it.

Oh, one more thing.

Add user-agent to the ChromeOptions object is also a good idea. And please do not forget to specify version_main for the driver to match your current chrome version.

Raycast Alternative on Windows 11: Microsoft PowerToys

Posted on 2024-03-16

Raycast is a productivity tool for macOS. It allows you to quickly access files, folders, and applications. It’s great, but only available on macOS. If you already use Raycast, you know how useful it is. If you don’t, you should give it a try if you have a Mac.

For daily work, I also use Windows, and I was trying to implement a similar workflow on Windows. The thing I missed the most was the ability to search and open previously used workspaces in VS Code or remote machines with a few keystrokes.

You can guess my excitement when I found out about Microsoft PowerToys.

OK.

Enable VS Code search in the settings for PowerToys Run utility.

Then, you can use the shortcut Alt + Space to search for your workspaces.

1	{ THE_WORKSPACE_NAME_YOU_WANT_TO_OPEN

Now I have to find the equivalent of zsh-autosuggestions on Windows. Wish me luck.

Links to the tools mentioned in this post: