Frank's Blog

Bringing Hardcore Industrial Hardware Down to the Consumer Level

February 27, 2026

In his book "Just for Fun," Linus Torvalds describes three stages that people's motivations pass through: first survival, then social status, and finally entertainment. Shenzhen has a class of companies that excel at turning hardcore industrial-grade products into consumer-grade ones, making creation and manufacturing a form of everyday entertainment. The best known is DJI, which turned drones from high-barrier professional equipment costing hundreds of thousands of yuan into smart flying cameras costing a few thousand. More recently, Bambu Lab (拓竹) has taken off by turning 3D printing from high-end machines that cost tens of thousands of yuan and needed extensive tuning into high-speed multi-color printers that cost two to three thousand yuan and work out of the box. And a company I came across recently, Xmachine, has turned five-axis CNC from heavy factory machining centers costing over a million yuan into a precision subtractive-manufacturing workshop that sits on a desktop for 30,000 to 50,000 yuan.

Read

Analyzing Self-Media Business Models

February 15, 2026

Self-media (independent content creation) may be one of the greatest sources of leverage available to ordinary people today: the barrier to entry is low, the ceiling is high, and distribution has no marginal cost. This post analyzes self-media from a business-model perspective. In terms of content, it falls into three types: monetizing the content directly, advertising for others, and advertising for yourself.

Read

The Boundaries of Compression

February 1, 2026

Some knowledge can be serialized, while some cannot. If you want AI to provide a recipe for making scrambled eggs with tomatoes, you'll get an extremely detailed recipe, with time down to the second and ingredients measured to the gram. However, from another perspective: if you're hosting guests at home and have prepared a dish of scrambled eggs with tomatoes, AI cannot tell you if there's too much or too little salt, because it doesn't know the guests' taste preferences or health conditions, nor does it know the exact amount of salt in the dish.

Read

Reading Notes: The Technological Republic

January 23, 2026

Palantir's CEO Alex Karp published a new book in 2025, "The Technological Republic," and a Chinese edition came out in mainland China by the end of the year. I bought it right away and read it. The book's views represent the right-wing current of thought in Silicon Valley, and its shadow can be seen everywhere in present-day American politics.

Read

Palantir and the Silicon Valley Right Wing

January 22, 2026

As a tech and military enthusiast, I've always been interested in Silicon Valley's culture and history. Palantir's stock has soared in recent years, and the company has drawn wide attention for its unusual combination of big data and military business. Alex Karp and Peter Thiel are central figures in Silicon Valley's fast-rising right wing, and their views largely reflect where the American tech industry, and even American politics, is heading.

Read

Investment Notes - Asset Classes and the Overlooked Debt Claims

January 10, 2026

I've been investing in stocks for about three years. In the first year, I was a clueless retail investor who lost heavily on US-listed Chinese stocks, buying purely on gut feeling without understanding the businesses at all. The second year went somewhat better: I benefited from Reddit and Tesla in the U.S. market and achieved an annualized return of over 40%. The third year the market was good, and with gains from Cloudflare and Google I reached an annualized return of over 50%. The biggest improvement over the last two years is that I can now understand how a company's business is made up, and I have a deeper understanding of business models and corporate culture. My professional knowledge is still lacking, though, so I plan to write some posts to record what I learn.

Read

Data Suppliers Behind Large Models - Surge AI

December 21, 2025

I first heard of Surge AI through a podcast interview with Edwin Chen, at a time when the company was raising funding for the first time. Edwin's extremely pragmatic, efficiency-minded views left a strong impression on me.

Read

VL Model Behind Doubao AI Phone

December 19, 2025

According to public reports, the model used by the Doubao AI phone is a closed-source version of UI-TARS optimized for phones. UI-TARS was obtained by supervised fine-tuning (SFT) of Alibaba's Qwen2 VL, and its 7b version is currently open-sourced (Qwen2 VL itself has open-sourced models from 3b to 72b). This post does not go into Qwen further (Qwen2 VL in fact already has UI Operation capabilities); it focuses on the further improvements UI-TARS makes on top of Qwen2 VL, in two parts: data and training.

Read

Using UTM Tags to Analyze Traffic Sources

April 4, 2024

When promoting a product, we typically use several channels: cold email, Google ads, Twitter promotion, SEO, community content, and so on. Knowing where our traffic comes from and how well each source converts is crucial, because it lets us further optimize our marketing strategy. Today I'll share a simple way to tell traffic sources apart and analyze conversion:
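
As a rough illustration of the idea, the sketch below tags an outgoing link with the conventional `utm_source`, `utm_medium`, and `utm_campaign` query parameters and reads the source back from a landing URL. The helper names `add_utm` and `traffic_source`, the example URL, and the `"direct"` fallback are assumptions made for this example, not something taken from the post.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def add_utm(url, source, medium, campaign):
    """Return `url` with UTM parameters appended (hypothetical helper)."""
    parts = urlparse(url)
    query = parse_qs(parts.query)  # keep any existing query parameters
    query.update({
        "utm_source": [source],      # which channel sent the visitor
        "utm_medium": [medium],      # type of channel, e.g. email or social
        "utm_campaign": [campaign],  # which campaign the link belongs to
    })
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))


def traffic_source(landing_url):
    """Read the channel back from a landing URL; untagged visits count as 'direct'."""
    query = parse_qs(urlparse(landing_url).query)
    return query.get("utm_source", ["direct"])[0]


link = add_utm("https://example.com/pricing", "twitter", "social", "launch")
print(link)
print(traffic_source(link))
```

Grouping sign-ups by the value `traffic_source` returns is then enough to compare conversion across channels.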

Read

jenni.ai's Cold Start and Growth Strategy

March 21, 2024

jenni.ai is a tool that assists with writing and reading academic papers. It has reached $5M ARR and 2.5M users and is still growing fast; the author says they expect to reach $10M~$20M ARR. Their CEO, David Park, has been very candid in sharing their revenue and user growth strategies, and there is a lot to learn from them.

Read

Perks for Startups

March 3, 2024

I've been planning to grab some freebies lately, so I compiled the perks that big overseas tech companies offer to startups, mainly cloud services and OpenAI tokens.

Read

Quantitative Analysis of PyTorch Training Acceleration

November 3, 2020

This article starts with a baseline and gradually optimizes training speed through various software and hardware methods, ultimately reducing training time to 1/8.

Read

Milestones in Neural Architecture Search (NAS)

December 1, 2019

Neural Architecture Search (NAS) has been extremely popular this year. This post briefly outlines some of the works I find particularly representative. Feel free to point out any errors or omissions. hhhh

Read

Feeding the GPU in Deep Learning

August 12, 2019

Recently, I trained several models and found that more GPUs don't always lead to better results. Sometimes, there's no difference between using one V100 and two V100s. I later discovered the bottleneck was elsewhere. This article summarizes some tricks I've used.

Read

Learning to Push by Grasping: Using Multiple Tasks for Effective Learning

November 22, 2018

Currently, end-to-end learning frameworks are becoming popular in the field of robotic control. These frameworks take states/images as direct input and output predicted torque and action parameters. However, they have been criticized for their high data demands, sparking discussions about their scalability. Specifically, does end-to-end learning require a separate model for each task? Intuitively, sharing between tasks is beneficial because they require some common understanding of the environment. This paper explores the next step in data-driven end-to-end learning frameworks, moving from task-specific models to joint models for multiple robotic tasks, yielding surprising results: multi-task learning outperforms single-task learning with the same amount of data. For example, in the grasp task, a model trained with 2.5k grasp data and 2.5k push data performs better than a model trained with 5k grasp data alone.

Read

Playing Atari with Deep Reinforcement Learning

November 17, 2018

This paper by Volodymyr Mnih, presented at NIPS 2013, is essentially the pioneering work on DQN, along with another paper published in Nature in 2015.

Read

Cityscapes Dataset

November 2, 2018

Cityscapes is typically used for semantic segmentation. Its data is organized into 8 categories, one of which is named "void," and each category contains multiple classes, for 30 classes in total. After numbering, however, there are 35 distinct labels, because labels such as "unlabeled" are included but not counted as classes.

Read

Efficient Dense Modules of Asymmetric Convolution for Real-Time Semantic Segmentation

October 30, 2018

Previous segmentation networks were either slow or had low accuracy. Here, an EDANet module is designed, combining asymmetric conv, dilated conv, and dense connectivity. It outperforms FCN in all aspects and does not require a decoder structure, context module, post-processing scheme, or pretrained model. Experiments were conducted on Cityscapes and CamVid.

Read

DARTS: Differentiable Architecture Search

October 22, 2018

This paper tackles architecture search by formulating the task in a differentiable way, instead of the conventional approach of applying reinforcement learning over a discrete, non-differentiable search space. The method is based on a continuous relaxation of the architecture representation, which allows efficient search with methods such as gradient descent. Experiments show that the algorithm discovers high-performance CNN architectures for image recognition and RNN architectures for language modeling, and that it is much faster than state-of-the-art non-differentiable search methods.
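
To make "continuous relaxation" concrete, here is a toy sketch, assuming the usual formulation in which each candidate operation on an edge gets a learnable score and the edge outputs a softmax-weighted sum of all candidates. The three scalar operations and the names `mixed_op` and `discretize` are stand-ins invented for this example; the real search space uses convolution and pooling operations on tensors.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Toy candidate operations on a scalar input (hypothetical stand-ins).
CANDIDATE_OPS = {
    "identity": lambda x: x,
    "double": lambda x: 2.0 * x,
    "zero": lambda x: 0.0,
}


def mixed_op(x, alphas):
    """Relaxed edge: softmax(alpha)-weighted sum of every candidate operation."""
    weights = softmax([alphas[name] for name in CANDIDATE_OPS])
    return sum(w * op(x) for w, op in zip(weights, CANDIDATE_OPS.values()))


def discretize(alphas):
    """After the search, keep only the operation with the largest score."""
    return max(alphas, key=alphas.get)


uniform = {"identity": 0.0, "double": 0.0, "zero": 0.0}
print(mixed_op(3.0, uniform))
print(discretize({"identity": 0.1, "double": 2.0, "zero": -1.0}))
```

Because `mixed_op` is a smooth function of the scores, they can be updated by gradient descent alongside the network weights, which is what removes the need for a discrete search.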

Read

Compressing Neural Networks with the Hashing Trick

October 15, 2018

Deep networks are increasingly applied on mobile devices, highlighting a dilemma: while deep learning trends toward developing models that can absorb larger datasets, mobile devices have limited storage and cannot accommodate overly large models. HashedNets are introduced to reduce model size by minimizing inherent redundancy within neural networks. HashedNets use a low-cost hash function to randomly group connection weights into different hash buckets, where all connections in the same bucket share a single parameter value, adjusted during standard backpropagation. This hashing process does not incur additional memory overhead. Performance on various benchmark datasets demonstrates that HashedNets can significantly reduce storage requirements while maintaining generalization performance.
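
The weight-sharing idea can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes a single fully connected layer, uses `zlib.crc32` as a stand-in for the hash function, and the names `bucket_of`, `forward`, and `backward` are made up for the example.

```python
import zlib


def bucket_of(i, j, num_buckets, seed=0):
    """Map connection (i, j) to a bucket; crc32 is a stand-in hash (assumption)."""
    return zlib.crc32(f"{seed}:{i}:{j}".encode()) % num_buckets


def forward(params, x, n_out):
    """Layer output using 'virtual' weights W[i][j] = params[bucket_of(i, j)]."""
    k = len(params)
    return [
        sum(params[bucket_of(i, j, k)] * x[j] for j in range(len(x)))
        for i in range(n_out)
    ]


def backward(params, x, grad_out):
    """Gradient per shared parameter: sum over every connection in its bucket."""
    k = len(params)
    grads = [0.0] * k
    for i, g in enumerate(grad_out):
        for j, xj in enumerate(x):
            grads[bucket_of(i, j, k)] += g * xj
    return grads


params = [0.5, -1.0, 2.0]   # only 3 stored values
x = [1.0, 2.0, 3.0, 4.0]
y = forward(params, x, n_out=5)  # behaves like a 5x4 weight matrix
grads = backward(params, x, [1.0] * 5)
print(y, grads)
```

The bucket index is recomputed from `(i, j)` on every access, so no index table is stored; only the `params` list takes memory.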

Read

ShuffleNetV2

October 11, 2018

Many current network designs consider only indirect metrics of computational complexity, such as FLOPs, but direct metrics such as speed are not determined by FLOPs alone: memory access cost (MAC) and platform characteristics also affect speed. The paper therefore evaluates the direct metric on the target platform rather than considering FLOPs alone, and derives practical guidelines for efficient network design from a series of controlled experiments. Following these guidelines, it proposes a new architecture, ShuffleNetV2, and comprehensive ablation experiments show that the model is state-of-the-art in the trade-off between speed and accuracy.

Read

ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices

October 10, 2018

This paper introduces ShuffleNet, a highly efficient network built around two operations, pointwise group convolution and channel shuffle, which greatly reduce computation cost while maintaining accuracy. It outperforms previous networks on ImageNet and COCO.
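
Channel shuffle is commonly described as a reshape, transpose, and flatten over the channel axis. The sketch below shows only that index permutation on a plain list of channel labels; the function name and the list representation are assumptions for illustration, whereas a real layer would apply the same permutation to a tensor.

```python
def channel_shuffle(channels, groups):
    """Interleave channels across groups: reshape to (groups, n), transpose, flatten."""
    total = len(channels)
    if total % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = total // groups
    return [
        channels[g * per_group + k]
        for k in range(per_group)   # position inside each group
        for g in range(groups)      # visit every group for that position
    ]


# Two groups of three channels: [a0 a1 a2 | b0 b1 b2]
print(channel_shuffle(["a0", "a1", "a2", "b0", "b1", "b2"], groups=2))
```

After the shuffle, each group in the next group convolution sees channels that came from every group of the previous one, which is how information flows across groups.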

Read

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

October 4, 2018

For mobile and embedded vision applications, the paper proposes an efficient class of models called MobileNets, lightweight neural networks built from depthwise separable convolutions. The model uses two hyperparameters to trade off accuracy against latency, and extensive experiments on ImageNet explore this trade-off and show strong performance compared with other models. Further experiments demonstrate the effectiveness of MobileNets across a range of applications, including object detection, fine-grained classification, face attributes, and large-scale geolocation.

Read

InceptionV4 Summary

September 28, 2018

In recent years, very deep convolutional neural networks have been the main driver of improvements in image recognition performance, and the Inception architecture delivers very good performance at relatively low computational cost. Recently, residual connections combined with a more traditional architecture achieved the best results at ILSVRC 2015, with performance similar to InceptionV3. This raises the question of combining Inception with residual connections. There is clear evidence that residual connections significantly accelerate the training of Inception networks, and some evidence that residual Inception networks slightly outperform non-residual Inception networks of nearly the same computational cost. The paper also proposes several new Inception networks, both with and without residual connections, and these changes noticeably improve single-frame classification performance on ILSVRC 2012. Finally, it shows that proper activation scaling stabilizes the training of very wide residual Inception networks.

Read

Derivatives of Vectors and Matrices

September 21, 2018

In machine learning algorithms, you'll encounter numerous matrix-related differentiation and derivation tasks. Here, we introduce some common differentiation formulas related to matrices and vectors.

Read

A General Solution to Stock Problems in Dynamic Programming

September 14, 2018

There is a type of dynamic programming problem where you are given a stock price sequence and need to calculate the maximum profit from buying and selling stocks. These problems often have many variations, such as allowing only one transaction, multiple transactions, or imposing a transaction tax. The maximum profit is usually determined by the timing of the trades and the allowed maximum number of transactions (each transaction being a combination of one buy and one sell).
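
One common way to cover these variants with a single routine is a two-state DP over "holding" and "not holding" for each transaction count. The sketch below is one such formulation, not necessarily the one used in the post; the function name and the `max_transactions` and `fee` parameters are assumptions chosen for the example.

```python
def max_profit(prices, max_transactions=None, fee=0):
    """Best profit with at most `max_transactions` buy/sell pairs and a per-sale fee."""
    n = len(prices)
    # More than n // 2 transactions can never help, so cap k there.
    k = n // 2 if max_transactions is None else min(max_transactions, n // 2)
    if k == 0:
        return 0
    cash = [0] * (k + 1)              # cash[t]: best profit after t completed trades
    hold = [float("-inf")] * (k + 1)  # hold[t]: best profit while holding in trade t
    for price in prices:
        for t in range(k, 0, -1):
            cash[t] = max(cash[t], hold[t] + price - fee)  # sell today
            hold[t] = max(hold[t], cash[t - 1] - price)    # buy today
    return max(cash)


prices = [7, 1, 5, 3, 6, 4]
print(max_profit(prices, max_transactions=1))  # single transaction
print(max_profit(prices))                      # unlimited transactions
print(max_profit([1, 3, 2, 8, 4, 9], fee=2))   # unlimited, with a fee
```

The single-transaction, multi-transaction, and transaction-fee variants then differ only in the arguments passed in.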

Read

Definition of Convex Sets and Common Convex Sets

August 31, 2018

Read

Derivation of SVM (3)

August 24, 2018

In the previous post, we introduced the derivation of hard-margin SVM. This article will continue with the mathematical derivation of soft-margin SVM, which allows for some misclassification when samples are not linearly separable.

Read

Derivation of SVM (2)

August 18, 2018

In the previous article (1), we discussed the derivation of hard-margin SVM and its dual form, which can be simplified into the following form.

Read

Derivation of SVM (1)

August 10, 2018

SVM is a classic method in machine learning. Besides hard-margin SVM, it includes variants like soft-margin SVM and kernel tricks. This article mainly introduces the derivation of **hard-margin SVM**.

Read

Solving Systems of Linear Equations (3)

July 26, 2018

The pseudoinverse discussed here is the **Moore-Penrose inverse matrix**.

Read

Solving Systems of Linear Equations (2)

July 21, 2018

In the previous blog post, we discussed one scenario of linear equations where the number of unknowns is less than the number of equations, introducing the least squares method. In this post, we will cover another scenario where the number of equations is less than the number of unknowns. In this case, the system has infinitely many solutions, but there is only one solution closest to the origin, known as the **minimum norm solution** of the linear equations.
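
For reference, the minimum norm solution has a standard closed form. The sketch below assumes the system is written $Ax = b$ with $A \in \mathbb{R}^{m \times n}$, $m < n$, and $A$ of full row rank; this notation is an assumption, since the excerpt does not fix any symbols.

```latex
\begin{aligned}
&\min_{x}\ \tfrac{1}{2}\|x\|_2^2 \quad \text{s.t.} \quad Ax = b \\
&L(x, \lambda) = \tfrac{1}{2} x^\top x + \lambda^\top (b - Ax) \\
&\nabla_x L = x - A^\top \lambda = 0 \;\Rightarrow\; x = A^\top \lambda \\
&A A^\top \lambda = b \;\Rightarrow\; \lambda = (A A^\top)^{-1} b \\
&x^\star = A^\top (A A^\top)^{-1} b
\end{aligned}
```

Substituting back gives $A x^\star = A A^\top (A A^\top)^{-1} b = b$, so $x^\star$ does satisfy the system.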

Read

207. Course Schedule

July 20, 2018

This problem uses DFS and BFS to determine whether a graph can be topologically sorted.
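
As a sketch of the BFS variant (Kahn's algorithm), assuming the usual input of a course count plus `[course, prerequisite]` pairs: the graph can be topologically sorted exactly when every node can eventually be removed with in-degree zero. The function name `can_finish` is an assumption for this example.

```python
from collections import deque


def can_finish(num_courses, prerequisites):
    """Return True if the prerequisite graph has a topological order (no cycle)."""
    graph = [[] for _ in range(num_courses)]
    indegree = [0] * num_courses
    for course, prereq in prerequisites:
        graph[prereq].append(course)  # edge prereq -> course
        indegree[course] += 1

    # Start from every course that has no prerequisites.
    queue = deque(c for c in range(num_courses) if indegree[c] == 0)
    taken = 0
    while queue:
        node = queue.popleft()
        taken += 1
        for nxt in graph[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)

    # Courses stuck on a cycle never reach in-degree zero.
    return taken == num_courses


print(can_finish(2, [[1, 0]]))
print(can_finish(2, [[1, 0], [0, 1]]))
```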

Read

Solving Systems of Linear Equations (1)

July 20, 2018

This post discusses how to solve one particular case of a system of linear equations.

Read

Numerical Computation in Machine Learning (1)

July 14, 2018

Machine learning algorithms often require a large amount of numerical computation, that is, obtaining approximate values by iteration rather than deriving analytical solutions. These algorithms typically involve optimization and solving systems of linear equations. Because a computer represents floating-point numbers with a finite number of bits, some error is unavoidable, and certain methods are needed to keep the computation accurate.

Read

Training a Simple Neural Network with TensorFlow

July 6, 2018

In this blog post, we use TensorFlow's Eager Execution to build models, eliminating the need to create Graphs and Sessions as before, making neural network training more convenient and faster. We will train a neural network using the Iris dataset as an example, with code from Google's tutorial.

Read

Deep Learning on Geek Cloud

June 29, 2018

Recently, I've been working on an image-related deep learning task assigned by my teacher. After debugging the code, I realized my laptop's memory (8GB) wasn't sufficient. Later, I discovered a very useful deep learning cloud service platform.

Read

LiDAR + Camera Data Fusion in KITTI

June 15, 2018

KITTI contains many datasets; here, we select raw_data (the raw data) for fusion.

Read

Solving Optimization Problems with Inequality Constraints

June 8, 2018

Similar to the equality-constrained optimization problems discussed in an earlier post, optimization problems with inequality constraints can also be solved with the method of Lagrange multipliers.

Read

Constructors in C++

June 2, 2018

Each class defines how its objects are initialized through one or more special member functions called **constructors**. The constructor's task is to initialize the data members of the class object, and it is executed whenever a class object is created.

Read

Derivation of Neural Network Backpropagation

June 1, 2018

Backpropagation is the core of the neural network training process.

Read

Associative Containers in C++

June 1, 2018

Associative containers support efficient lookup and access by key. The two main associative containers are `set` and `map`. Elements in a `map` are key-value pairs, where the key acts as an index and the value is the data associated with it. Elements in a `set` contain only a key. `set` supports efficient key lookup; the ordered `set` and `map` are typically implemented as balanced binary search trees (usually red-black trees), while the hash-table-based versions are `unordered_set` and `unordered_map`.

Read

Sequential Containers in C++

May 25, 2018

A container is a collection of objects of a specific type. Sequential containers let the programmer control the order in which elements are stored and accessed.

Read

Introduction to Decision Tree and Random Forest Algorithms

May 24, 2018

Decision trees are a method for classification and regression. This post focuses on decision trees used for classification. A decision tree has a tree-like structure and represents the process of classifying data based on features. It can be seen as a collection of if-then rules or as a conditional probability distribution defined over feature and class spaces. The main advantages are good model interpretability and fast classification speed. During training, a decision tree model is built using training data by minimizing a loss function. For prediction, new data is classified using the decision tree. Learning a decision tree typically involves three steps: feature selection, tree generation, and tree pruning. The concepts of decision trees mainly originate from Quinlan's ID3 algorithm (1986) and C4.5 algorithm (1993), as well as the CART algorithm proposed by Breiman et al. in 1984.

Read

I/O Classes in C++

May 18, 2018

C++ does not handle input and output directly; instead, it uses a set of types defined in the standard library for IO operations. These types support reading from and writing to devices like files and console windows. Some types also allow memory IO, such as reading from and writing to strings.

Read

Solving Optimization Problems with Equality Constraints

May 18, 2018

This post discusses optimization problems of the following form.

Read

Dual Problems in Linear Programming

May 11, 2018

Every linear programming problem has a corresponding dual problem, which is also a linear programming problem. The dual of the dual problem is the original problem. The optimal solution of the original problem can be obtained from the dual problem. Sometimes, using dual theory to solve linear programming problems is simpler and provides a deeper understanding of the problem's nature. Inspired by dual theory, the performance of the simplex method has been improved, and some non-simplex methods for solving linear programming problems have emerged, which will not be detailed in this article.

Read

Parameter Passing in C++ Functions

May 4, 2018

In a C++ program, calling a function usually means passing arguments to it. Apart from an empty parameter list (void), arguments are passed in one of two ways: **pass by reference** and **pass by value**.

Read

Simplex Algorithm for Solving Linear Programming Problems

May 4, 2018

In 1947, Dantzig introduced a method for solving linear programming problems, known today as the simplex method. This concise and efficient algorithm is hailed as one of the top ten algorithms of the 20th century with the greatest impact on scientific development and engineering practice.

Read

Overview of Linear Programming

April 27, 2018

In optimization problems, there is a category known as linear programming problems, which are constrained optimization problems. Linear programming involves finding the extremum of a linear objective function under **linear constraints** (equalities or inequalities).

Read

The `const` Keyword in C++

April 26, 2018

When programming, we often need to define a variable whose value doesn't change, such as pi=3.14, e=2.72, or the elastic modulus of a material. In these cases, the `const` keyword is used.

Read