Frank

Create joy for myself, create value for others, and walk toward freedom with courage and independent thinking

China Supermarket 2025: Walmart, Sam's Club, Hema, and Yonghui — Four Paths

China's supermarket sector in 2025 split into four very different trajectories: Walmart's hypermarkets closed 129 stores over five years while restructuring; Sam's Club broke 140 billion RMB in sales and kept opening stores; Hema hit profitability for the first time in a decade, with GMV near 100 billion RMB; and Yonghui, after being bought out by Miniso founder Ye Guofu for 6.27 billion RMB, is remaking 315 stores under a "Pang Donglai model" bailout. This post compares their scale, financials, business models, operating efficiency, and management styles using public data (annual reports, official disclosures, and third-party research).

From the 2026 Iran War to Iraq: Three Shapes of the Same U.S. Military Machine

Operation Epic Fury, launched in February 2026, is the most consequential high-intensity U.S. campaign in two decades: 52 days, 1,700+ targets, all six services in the fight, and a foreign head of state killed. This essay uses Epic Fury as the main case, dissecting the command chain, acquisition, logistics, the training-and-exercise pipeline, and strategic implications across five sections. It then looks back at the Midnight Hammer precision strike eight months earlier, and finally at the Iraq War 22 years before that as a historical reference. Three cases, ordered newest to oldest, show how the same organizational machine flexes across wildly different mission intensities.

A Private Recommendation System

Social media recommenders want me to stay as long as possible, and they'll do whatever it takes to poke at my emotions — which isn't pleasant. RSS readers don't push emotional buttons, but once the volume grows I get buried under things I don't care about. So I built a private recommender that balances quality and efficiency.

Book Notes: Freedom of Money

Finished CZ's autobiography Freedom of Money in two days. I don't follow crypto, but his story — going all-in on Bitcoin, becoming the richest Chinese person, and then serving time in a US prison — is genuinely fascinating.

Survival, Competition, and Freedom

People go through three stages when they work: survival is driven by instinct, competition is framed by the coordinate system your rivals set, and true freedom means standing in an open wilderness with no compass. Freedom without values is a painful cage; freedom with values is the most powerful engine.

When Hardcore Industry Goes Consumer-Grade

Shenzhen and the Yangtze River Delta have a breed of company that takes industrial-grade hardcore hardware and collapses it into consumer products, turning creation and manufacturing into entertainment. DJI collapsed $100,000+ professional drones to a $3,000 smart flying camera. Bambu Lab turned a $20,000 3D printer into a $2,000 out-of-the-box machine. xTool pushed $10,000 industrial laser cutters down to $1,500. Hypershell compressed the 30kg military exoskeleton into a 2kg / $799 consumer version. Unitree undercut Boston Dynamics' $75,000 Spot with a $1,600 Go2. Xmachine took the $1M+ factory five-axis CNC and put it on your desk for $8,000. Read together, they show how consumer-grade downsizing works in today's China: not through policy, but through supply chain density, engineering cost structure, iteration speed, and go-to-market execution in overseas markets.

Dissecting the Business Models of Content Creation

Content creation might be the biggest lever an ordinary person can pull today — low barrier, high ceiling, zero marginal cost of distribution. Strip it down to its business model, and there are really only three ways to make it work: sell the content itself, sell other people's products, or sell your own.

Compressed Boundaries

Knowledge splits in two: Techne, which can be serialized, and Metis, the embodied know-how that grows from context, trial, and error. AI is brilliant at compressing the first and can't touch the second — that's where its capability ends. As standardized work gets automated away, the knowledge that can't be compressed becomes scarcer, not less valuable.

Reading Notes: The Republic of Technology

Palantir CEO Alex Karp released a new book in 2025, The Republic of Technology, which was also published in mainland China by the end of the year. I bought it immediately and read it. The book's viewpoints represent the right-wing ideology of Silicon Valley, and their influence is evident in current American politics.

Data Suppliers Behind Large Models - Surge AI

I first learned about Surge AI from Edwin Chen's podcast interview around the time of their initial fundraising. His extremely pragmatic, efficiency-first views left a strong impression.

VL Model Behind Doubao AI Phone

According to public reports, the model behind the Doubao AI phone is a closed-source version of UI-TARS optimized for mobile. UI-TARS is derived from SFT on Alibaba's Qwen2 VL, with a 7B version open-sourced (Qwen2 VL has open-source models ranging from 3B to 72B). This post will not delve into Qwen itself (Qwen2 VL already includes UI-operation features), but will focus on UI-TARS's further improvements over Qwen2 VL, covering both data and training.

Using UTM Tags to Analyze Traffic Sources

When promoting, we typically use multiple channels: cold email, Google ads, Twitter promotion, SEO optimization, community content, etc. Understanding our traffic sources and conversion effectiveness is crucial, as it helps us further optimize our marketing strategy. Today, I'll share a simple way to differentiate traffic and analyze conversion effectiveness:
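
As a minimal sketch of the idea (the URL and tag values below are hypothetical, not from any real campaign), UTM-tagged links can be built and parsed with Python's standard library alone:

```python
from urllib.parse import urlencode, urlparse, parse_qs

def add_utm(url, source, medium, campaign):
    """Append UTM parameters to a landing-page URL."""
    params = urlencode({
        "utm_source": source,      # channel, e.g. "twitter"
        "utm_medium": medium,      # traffic type, e.g. "social"
        "utm_campaign": campaign,  # campaign name, e.g. "launch"
    })
    sep = "&" if urlparse(url).query else "?"
    return f"{url}{sep}{params}"

def utm_of(url):
    """Extract the UTM parameters back out of a URL, e.g. on the analytics side."""
    qs = parse_qs(urlparse(url).query)
    return {k: v[0] for k, v in qs.items() if k.startswith("utm_")}

tagged = add_utm("https://example.com/pricing", "twitter", "social", "launch")
print(utm_of(tagged))
```

Each channel gets its own `utm_source`/`utm_medium` pair, so conversions can later be grouped by these fields.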

Cold Start and Growth Strategy for jenni.ai

jenni.ai is a tool for assisting with essay writing and reading, currently generating $5M ARR with 2.5M users and still growing rapidly. The author expects it to reach $10M~$20M ARR. Their CEO, David Park, candidly shared their revenue and user growth strategies, offering many valuable insights.

Benefits Related to Startups

I've recently been planning to take advantage of some deals, so I compiled the benefits offered by major international companies, focusing mainly on cloud services and OpenAI tokens.

Quantitative Analysis of PyTorch Training Acceleration

This article starts with a baseline and gradually optimizes training speed through various software and hardware methods, ultimately reducing training time to 1/8.

Milestones in Neural Architecture Search (NAS)

Neural Architecture Search (NAS) has been extremely popular this year. This post briefly outlines some of the works I find particularly representative. Feel free to point out any errors or omissions.

Feeding the GPU in Deep Learning

Recently, I trained several models and found that more GPUs don't always lead to better results. Sometimes, there's no difference between using one V100 and two V100s. I later discovered the bottleneck was elsewhere. This article summarizes some tricks I've used.
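
One of the simplest tricks in this family, overlapping data preparation with GPU compute, can be sketched framework-agnostically with the standard library (a toy illustration under my own naming, not the post's actual code; real training would use a framework's multi-worker loader):

```python
import queue
import threading

class Prefetcher:
    """Wrap an iterable so items are prepared in a background thread,
    letting slow loading overlap with compute on the consumer side."""
    _DONE = object()  # sentinel marking end of iteration

    def __init__(self, iterable, buffer_size=4):
        self.q = queue.Queue(maxsize=buffer_size)
        self.thread = threading.Thread(
            target=self._worker, args=(iterable,), daemon=True)
        self.thread.start()

    def _worker(self, iterable):
        for item in iterable:
            self.q.put(item)   # blocks when the buffer is full
        self.q.put(self._DONE)

    def __iter__(self):
        while True:
            item = self.q.get()
            if item is self._DONE:
                return
            yield item

# Usage: wrap any expensive batch generator; consumption order is preserved.
batches = Prefetcher(range(5))
print(list(batches))
```

The bounded queue keeps memory in check while hiding loading latency behind the previous step's compute.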

Learning to Push by Grasping: Using Multiple Tasks for Effective Learning

End-to-end learning frameworks are becoming popular in robotic control: they take states/images as direct input and output predicted torques and action parameters. However, they have been criticized for their high data demands, sparking debate about their scalability. Specifically, does end-to-end learning require a separate model for each task? Intuitively, sharing between tasks should help, since tasks require some common understanding of the environment. This paper takes the next step for data-driven end-to-end learning frameworks, moving from task-specific models to a joint model for multiple robotic tasks, with a surprising result: multi-task learning outperforms single-task learning on the same amount of data. For example, on the grasping task, a model trained with 2.5k grasping samples and 2.5k pushing samples performs better than one trained with 5k grasping samples alone.

Playing Atari with Deep Reinforcement Learning

This paper by Volodymyr Mnih et al., presented at NIPS 2013, is the pioneering work on DQN, together with a follow-up paper published in Nature in 2015.

Cityscapes Dataset

Cityscapes is typically used for semantic segmentation. Its annotations are organized into 8 categories, one of which is named "void". Each category contains multiple classes, for a total of 30 classes in Cityscapes. The label definitions, however, enumerate 35 entries once numbered, including labels such as "unlabeled" that are not counted as classes.

Efficient Dense Modules of Asymmetric Convolution for Real-Time Semantic Segmentation

Previous segmentation networks were either slow or inaccurate. This paper designs the EDANet module, combining asymmetric convolution, dilated convolution, and dense connectivity. It outperforms FCN across the board and requires no decoder structure, context module, post-processing scheme, or pretrained model. Experiments were conducted on Cityscapes and CamVid.

Darts: Differentiable Architecture Search

This paper tackles architecture search by formulating the task in a differentiable form, rather than relying on traditional methods that apply reinforcement learning over a discrete, non-differentiable search space. The approach is based on a continuous relaxation of the architecture representation, allowing efficient gradient-based architecture search. Experiments demonstrate that the algorithm performs well at discovering high-performance CNN architectures for image recognition and RNN architectures for language modeling, and is much faster than existing state-of-the-art non-differentiable search methods.
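
The continuous relaxation at the heart of the paper replaces the hard choice of one operation per edge with a softmax-weighted mixture over the candidate operation set $\mathcal{O}$:

$$\bar{o}^{(i,j)}(x) = \sum_{o \in \mathcal{O}} \frac{\exp\big(\alpha_o^{(i,j)}\big)}{\sum_{o' \in \mathcal{O}} \exp\big(\alpha_{o'}^{(i,j)}\big)}\, o(x)$$

The architecture parameters $\alpha$ can then be optimized by gradient descent jointly with the network weights, and the final discrete architecture is read off from the largest $\alpha$ values.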

Compressing Neural Networks with the Hashing Trick

Deep networks are increasingly applied on mobile devices, highlighting a dilemma: while deep learning trends toward developing models that can absorb larger datasets, mobile devices have limited storage and cannot accommodate overly large models. HashedNets are introduced to reduce model size by minimizing inherent redundancy within neural networks. HashedNets use a low-cost hash function to randomly group connection weights into different hash buckets, where all connections in the same bucket share a single parameter value, adjusted during standard backpropagation. This hashing process does not incur additional memory overhead. Performance on various benchmark datasets demonstrates that HashedNets can significantly reduce storage requirements while maintaining generalization performance.
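
The core mechanism can be sketched in a few lines of plain Python (a toy illustration of the weight-sharing idea, not the paper's implementation; all names here are my own):

```python
import hashlib

def bucket(layer, i, j, num_buckets):
    """Deterministically hash a virtual weight position (i, j) of a layer
    into one of `num_buckets` shared parameter slots."""
    key = f"{layer}:{i}:{j}".encode()
    return int(hashlib.md5(key).hexdigest(), 16) % num_buckets

def virtual_weight_matrix(layer, rows, cols, params):
    """Materialize a (rows x cols) weight matrix from a small shared
    parameter vector `params`; many positions alias the same entry."""
    return [[params[bucket(layer, i, j, len(params))] for j in range(cols)]
            for i in range(rows)]

# A 4x4 virtual matrix backed by only 3 real parameters.
shared = [0.1, -0.2, 0.3]
W = virtual_weight_matrix("fc1", 4, 4, shared)
# Updating params[k] during backprop would update every position
# hashed to bucket k at once, so no extra memory is needed for W.
```

Because the hash is recomputed on the fly, the full matrix never has to be stored.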

ShuffleNetV2

Many network designs focus on indirect metrics of computational complexity such as FLOPs, but direct metrics such as speed are influenced by more than FLOPs, including MAC (memory access cost) and platform characteristics. This paper advocates measuring directly on target platforms, which is more meaningful than considering FLOPs alone. Through a series of controlled experiments, it proposes guidelines for efficient network design, leading to a new architecture, ShuffleNetV2. Comprehensive ablation experiments demonstrate that the model achieves a state-of-the-art balance of efficiency and accuracy.

ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices

This article introduces an efficient network, ShuffleNet, which primarily uses pointwise group convolution and channel shuffle operations. These techniques significantly reduce computational costs while maintaining accuracy, outperforming previous networks on ImageNet and COCO.

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

For mobile and embedded vision applications, this post introduces MobileNets, a family of lightweight neural networks built from depthwise separable convolutions. The model exposes two hyperparameters to trade off accuracy against latency, and extensive experiments on ImageNet demonstrate strong performance compared to other models. Experiments also showcase MobileNets' effectiveness in various applications, including object detection, fine-grained classification, facial attributes, and large-scale geolocation.
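
The efficiency comes from factoring a standard convolution into a depthwise step plus a 1x1 pointwise step. In the paper's notation (kernel size $D_K$, input channels $M$, output channels $N$, feature map size $D_F$), the cost ratio of the separable version to the standard one is:

$$\frac{D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F}{D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F} = \frac{1}{N} + \frac{1}{D_K^2}$$

With $3 \times 3$ kernels this is roughly an 8-9x reduction in computation.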

InceptionV4 Summary

In recent years, very deep convolutional neural networks have significantly advanced image recognition performance. The Inception architecture offers excellent performance at relatively low computational cost. Residual connections, combined with traditional architectures, achieved state-of-the-art results in the 2015 ILSVRC, with performance comparable to InceptionV3. Integrating Inception networks with residual connections has been shown to significantly accelerate their training. There is also some evidence that residual Inception networks slightly outperform non-residual ones of nearly the same computational cost. This article introduces several new Inception architectures, with and without residual connections, which noticeably improve single-frame classification performance on the ILSVRC 2012 benchmark. Lastly, it notes that appropriately scaling the residual activations makes training very wide residual Inception networks more stable.

Derivatives of Vectors and Matrices

In machine learning algorithms, you'll encounter numerous matrix-related differentiation and derivation tasks. Here, we introduce some common differentiation formulas related to matrices and vectors.
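
A few representative identities of the kind such posts cover, written in the denominator-layout convention:

$$\frac{\partial\, a^\top x}{\partial x} = a, \qquad \frac{\partial\, x^\top A x}{\partial x} = (A + A^\top)\, x, \qquad \frac{\partial\, \operatorname{tr}(AB)}{\partial A} = B^\top$$

When $A$ is symmetric, the middle identity simplifies to $2Ax$, which is the form that appears in least squares and many regularized objectives.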

A General Solution to Stock Problems in Dynamic Programming

There is a class of dynamic programming problems where you are given a sequence of stock prices and must compute the maximum profit from buying and selling. These problems come in many variations, such as allowing only one transaction, allowing multiple transactions, or charging a transaction fee. The maximum profit is determined by the timing of the trades and the maximum number of transactions allowed (each transaction being one buy paired with one sell).
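
One common generalized solution (a sketch in Python; function and variable names are my own, and this is one of several equivalent DP formulations) handles "at most k transactions" and degrades gracefully to the unlimited case:

```python
def max_profit(k, prices):
    """Max profit with at most k buy-sell transactions.

    buy[j]  = best balance so far after the j-th buy
    sell[j] = best balance so far after the j-th sell
    """
    if not prices or k == 0:
        return 0
    if k >= len(prices) // 2:          # effectively unlimited transactions
        return sum(max(b - a, 0) for a, b in zip(prices, prices[1:]))
    buy = [float("-inf")] * (k + 1)
    sell = [0] * (k + 1)
    for p in prices:
        for j in range(1, k + 1):
            buy[j] = max(buy[j], sell[j - 1] - p)   # open the j-th position
            sell[j] = max(sell[j], buy[j] + p)      # close the j-th position
    return sell[k]
```

Variations like a transaction fee fit the same state machine: subtract the fee inside the `sell` update.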

Definition of Convex Sets and Common Convex Sets

This post introduces the definition of a convex set and surveys some commonly encountered convex sets.

Derivation of SVM (3)

In the previous post, we introduced the derivation of hard-margin SVM. This article will continue with the mathematical derivation of soft-margin SVM, which allows for some misclassification when samples are not linearly separable.
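
In the usual formulation, the soft-margin primal problem introduces slack variables $\xi_i$ and a penalty coefficient $C$:

$$\min_{w,\, b,\, \xi}\ \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{N} \xi_i \quad \text{s.t.}\quad y_i \big(w^\top x_i + b\big) \ge 1 - \xi_i,\ \ \xi_i \ge 0$$

Larger $C$ penalizes misclassification more heavily, recovering hard-margin behavior in the limit.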

Derivation of SVM (2)

In the previous article (1), we discussed the derivation of hard-margin SVM and its dual form, which can be simplified into the following form.

Derivation of SVM (1)

SVM is a classic method in machine learning. Besides hard-margin SVM, it includes variants like soft-margin SVM and kernel tricks. This article mainly introduces the derivation of **hard-margin SVM**.
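
For reference, the hard-margin primal problem being derived is, in standard notation:

$$\min_{w,\, b}\ \frac{1}{2}\lVert w \rVert^2 \quad \text{s.t.}\quad y_i \big(w^\top x_i + b\big) \ge 1,\ \ i = 1, \dots, N$$

Minimizing $\lVert w \rVert$ under these constraints maximizes the geometric margin between the two classes.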

Solving Systems of Linear Equations (3)

The pseudoinverse discussed here is the **Moore-Penrose pseudoinverse**.

Solving Systems of Linear Equations (2)

In the previous blog post, we discussed one scenario of linear equations where the number of unknowns is less than the number of equations, introducing the least squares method. In this post, we will cover another scenario where the number of equations is less than the number of unknowns. In this case, the system has infinitely many solutions, but there is only one solution closest to the origin, known as the **minimum norm solution** of the linear equations.
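
In the standard notation, for an underdetermined system $Ax = b$ with $A$ of full row rank, the minimum norm solution has a closed form:

$$x^{*} = A^\top \big(A A^\top\big)^{-1} b$$

Among the infinitely many solutions, this is the unique one orthogonal to the null space of $A$, hence closest to the origin.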

207. Course Schedule

This problem uses DFS or BFS to determine whether a directed graph can be topologically sorted, i.e., whether it contains no cycle.
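
The BFS variant is Kahn's algorithm; a self-contained Python sketch (naming is mine) for the LeetCode 207 interface:

```python
from collections import deque

def can_finish(num_courses, prerequisites):
    """Return True iff the prerequisite graph has no cycle,
    i.e. a topological ordering of all courses exists."""
    graph = [[] for _ in range(num_courses)]
    indegree = [0] * num_courses
    for course, prereq in prerequisites:
        graph[prereq].append(course)
        indegree[course] += 1
    # Start from courses with no prerequisites.
    q = deque(i for i in range(num_courses) if indegree[i] == 0)
    taken = 0
    while q:
        node = q.popleft()
        taken += 1
        for nxt in graph[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                q.append(nxt)
    return taken == num_courses  # every course dequeued iff acyclic
```

If a cycle exists, its courses never reach indegree 0, so `taken` falls short of `num_courses`.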

Solving Linear Equations (1)

In this post, we discuss a specific case of linear systems: the overdetermined case, where there are more equations than unknowns, and solve it with the least squares method.

Numerical Computation in Machine Learning (1)

Machine learning algorithms often require extensive numerical computations, solving for approximations through iteration rather than analytical solutions. These algorithms typically involve optimization and solving linear equations. Since computers represent various floating-point numbers with limited precision, certain methods are needed to ensure computational accuracy.

Training a Simple Neural Network with TensorFlow

In this blog post, we use TensorFlow's Eager Execution to build models, eliminating the need to create Graphs and Sessions as before, making neural network training more convenient and faster. We will train a neural network using the Iris dataset as an example, with code from Google's tutorial.

Deep Learning on GeekCloud

Recently, I've been working on an image-related deep learning task assigned by my teacher. After debugging the code, I realized my laptop's memory (8GB) wasn't sufficient. Later, I discovered a very useful deep learning cloud service platform.

LiDAR + Camera Data Fusion in KITTI

The KITTI dataset offers a variety of data; here, we select the raw_data for integration.

Solving Optimization Problems with Inequality Constraints

Similar to solving optimization problems with only equality constraints discussed earlier, optimization problems with inequality constraints can also be solved using the Lagrange multiplier method.
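
In the usual statement, for $\min f(x)$ subject to $g_i(x) \le 0$ and $h_j(x) = 0$, the generalized Lagrange multiplier approach leads to the KKT conditions:

$$\nabla f(x^{*}) + \sum_i \mu_i \nabla g_i(x^{*}) + \sum_j \lambda_j \nabla h_j(x^{*}) = 0, \quad g_i(x^{*}) \le 0, \quad h_j(x^{*}) = 0, \quad \mu_i \ge 0, \quad \mu_i\, g_i(x^{*}) = 0$$

The last condition, complementary slackness, says each inequality constraint is either active or its multiplier vanishes.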

Constructors in C++

Each class defines how its objects are initialized through one or more special member functions called **constructors**. The constructor's task is to initialize the data members of the class object, and it is executed whenever a class object is created.

Associative Containers in C++

Associative containers support efficient keyword-based lookup and access. The two main associative containers are `set` and `map`. Elements in a `map` are key-value pairs, where the keyword acts as an index and the value is the data associated with that index. Elements in a `set` contain only a keyword, and `set` supports efficient keyword lookup. In the C++ standard library, `set` and `map` are ordered containers, typically implemented as red-black trees; their hash-based counterparts are `unordered_set` and `unordered_map`.

Derivation of Neural Network Backpropagation

In the training of neural networks, backpropagation is the core algorithm.
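
In the usual notation (loss $C$, pre-activations $z^l$, activations $a^l$, elementwise product $\odot$), the heart of the derivation is the layerwise error recursion:

$$\delta^{L} = \nabla_a C \odot \sigma'(z^{L}), \qquad \delta^{l} = \big((W^{l+1})^\top \delta^{l+1}\big) \odot \sigma'(z^{l}), \qquad \frac{\partial C}{\partial W^{l}} = \delta^{l}\, (a^{l-1})^\top$$

Errors propagate backward through the transposed weights, giving every layer's gradient in one backward pass.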

Sequential Containers in C++

A container is a collection of objects of a specific type. Sequence containers provide the ability to control the order of storage and access of elements.

Introduction to Decision Tree and Random Forest Algorithms

Decision trees are a method for classification and regression. This post focuses on decision trees used for classification. A decision tree has a tree-like structure and represents the process of classifying data based on features. It can be seen as a collection of if-then rules or as a conditional probability distribution defined over feature and class spaces. The main advantages are good model interpretability and fast classification speed. During training, a decision tree model is built using training data by minimizing a loss function. For prediction, new data is classified using the decision tree. Learning a decision tree typically involves three steps: feature selection, tree generation, and tree pruning. The concepts of decision trees mainly originate from Quinlan's ID3 algorithm (1986) and C4.5 algorithm (1993), as well as the CART algorithm proposed by Breiman et al. in 1984.

I/O Classes in C++

C++ does not handle input and output directly; instead, it uses a set of types defined in the standard library for IO operations. These types support reading from and writing to devices like files and console windows. Some types also allow memory IO, such as reading from and writing to strings.

Solving Optimization Problems with Equality Constraints

This post discusses optimization problems of this form: minimizing an objective function subject to equality constraints.
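
Written out, the problem and its standard treatment via the Lagrangian are:

$$\min_x\ f(x)\ \ \text{s.t.}\ \ h_j(x) = 0 \quad\Longrightarrow\quad L(x, \lambda) = f(x) + \sum_j \lambda_j h_j(x), \qquad \nabla_x L = 0,\ \ \nabla_\lambda L = 0$$

Setting both gradients to zero recovers the original constraints together with the stationarity condition.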

Dual Problems in Linear Programming

Every linear programming problem has a corresponding dual problem, which is also a linear programming problem. The dual of the dual problem is the original problem. The optimal solution of the original problem can be obtained from the dual problem. Sometimes, using dual theory to solve linear programming problems is simpler and provides a deeper understanding of the problem's nature. Inspired by dual theory, the performance of the simplex method has been improved, and some non-simplex methods for solving linear programming problems have emerged, which will not be detailed in this article.
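
In the standard symmetric form, the primal and its dual are:

$$\max_x\ c^\top x \ \ \text{s.t.}\ \ A x \le b,\ x \ge 0 \qquad\Longleftrightarrow\qquad \min_y\ b^\top y \ \ \text{s.t.}\ \ A^\top y \ge c,\ y \ge 0$$

Each dual variable corresponds to one primal constraint, which is why dual optimal values can be read as shadow prices.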

Parameter Passing in C++ Functions

In a C++ program, calling a function requires passing arguments to it (unless the parameter list is empty). Argument passing is divided into **pass by value** and **pass by reference**.

Simplex Algorithm for Solving Linear Programming Problems

In 1947, Dantzig introduced a method for solving linear programming problems, known today as the simplex method. This concise and efficient algorithm is hailed as one of the top ten algorithms of the 20th century with the greatest impact on scientific development and engineering practice.

Overview of Linear Programming

In optimization problems, there is a category known as linear programming problems, which are constrained optimization problems. Linear programming involves finding the extremum of a linear objective function under **linear constraints** (equalities or inequalities).
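
A linear program in standard form can be written as:

$$\min_x\ c^\top x \quad \text{s.t.}\quad A x = b,\ \ x \ge 0$$

Inequality constraints are brought into this form by adding slack variables, which is the starting point for the simplex method.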

The `const` Keyword in C++

When programming, we often need to define a variable whose value never changes, such as pi = 3.14, e = 2.72, or the elastic modulus of a material. In these cases, the `const` keyword is used.