Frank's Blog
Thoughts on technology, startups and life
The Boundaries of Compression
February 1, 2026
Some knowledge can be serialized, while some cannot. If you want AI to provide a recipe for making scrambled eggs with tomatoes, you'll get an extremely detailed recipe, with time down to the second and ingredients measured to the gram. However, from another perspective: if you're hosting guests at home and have prepared a dish of scrambled eggs with tomatoes, AI cannot tell you if there's too much or too little salt, because it doesn't know the guests' taste preferences or health conditions, nor does it know the exact amount of salt in the dish.
Read
Reading Notes: The Technological Republic
January 23, 2026
Palantir's CEO Alex Karp released a new book in 2025, "The Technological Republic," which was published in mainland China at the end of the year. I bought and read it right away. The book's viewpoints represent the right-wing ideology of Silicon Valley, and its imprint can be seen all over current American politics.
Read
Palantir and the Silicon Valley Right Wing
January 22, 2026
As a tech and military enthusiast, I've always been interested in Silicon Valley's culture and history. Palantir's stock has soared in recent years, and the company has drawn wide attention for its unusual combination of big data and defense business. As central figures of the ascendant Silicon Valley right, Alex Karp and Peter Thiel largely reflect the direction of the American tech industry, and even its political winds.
Read
Investment Insights - Asset Classification and Overlooked Debt Claims
January 10, 2026
I've been involved in stocks for about three years. In the first year, I was a clueless retail investor losing money on Chinese stocks, buying purely on intuition without understanding the business. The second year saw some improvement, benefiting from Reddit and Tesla gains in the U.S. market, achieving over 40% annualized returns. The third year was favorable, with gains from Cloudflare and Google, reaching over 50% annualized returns. The biggest improvement over these two years has been understanding company business structures and gaining a deeper insight into business models and corporate culture. However, I still lack professional knowledge, so I plan to write some articles to document my learning experiences.
Read
Data Suppliers Behind Large Models - Surge AI
December 21, 2025
I first learned about Surge AI through Edwin Chen's podcast interview, right as they were raising outside funding for the first time. Edwin's extremely pragmatic, efficiency-minded views left a strong impression.
Read
VL Model Behind Doubao AI Phone
December 19, 2025
According to public reports, the model behind the Doubao AI phone is a closed-source version of UI-TARS optimized for on-device use. UI-TARS is obtained by SFT on Alibaba's Qwen2 VL; a 7B version has been open-sourced (Qwen2 VL itself is open-sourced from 3B to 72B). This post won't introduce Qwen further (Qwen2 VL already offers UI-operation capabilities) and instead focuses on how UI-TARS improves on Qwen2 VL, in two parts: data and training.
Read
Using UTM Tags to Analyze Traffic Sources
April 4, 2024
When promoting, we typically use multiple channels: cold email, Google ads, Twitter promotion, SEO, community content, and so on. Understanding our traffic sources and conversion effectiveness is crucial, since it helps us further optimize our marketing strategy. Today, I'll share a simple way to separate traffic sources and analyze conversion effectiveness.
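As a minimal illustration (my own sketch, not code from the post), here is how such tagged links can be generated in Python. The UTM parameter names are the standard ones; the URL, source, and campaign values are made up:

```python
from urllib.parse import urlencode, urlsplit, urlunsplit

def with_utm(url: str, source: str, medium: str, campaign: str) -> str:
    """Append standard UTM parameters to a landing-page URL."""
    parts = urlsplit(url)
    utm = urlencode({
        "utm_source": source,      # where the visit comes from, e.g. twitter
        "utm_medium": medium,      # channel type, e.g. cpc, email, social
        "utm_campaign": campaign,  # which campaign the link belongs to
    })
    query = f"{parts.query}&{utm}" if parts.query else utm
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))

# Tag a link used in a cold-email campaign (hypothetical values):
print(with_utm("https://example.com/pricing", "newsletter", "email", "launch"))
```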
Read
Cold Start and Growth Strategy for jenni.ai
March 21, 2024
jenni.ai is a tool that assists with academic paper writing and reading. It has reached $5M ARR with 2.5M users and is still growing fast; the founder expects to reach $10M-$20M ARR. Their CEO, David Park, candidly shared their revenue and user growth strategies, with many lessons worth learning.
Read
Startup-Related Perks
March 3, 2024
I've recently been looking to grab some freebies, so I compiled the perks that major international companies offer startups, mainly cloud credits and OpenAI tokens.
Read
Quantitative Analysis of PyTorch Training Acceleration
November 3, 2020
This post starts from a baseline and progressively optimizes training speed through a series of software and hardware techniques, ultimately cutting training time to 1/8 of the original.
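As one concrete example of this kind of software-side optimization, here is automatic mixed precision in PyTorch (a sketch under my own assumptions; the post's actual steps may differ, and the model/data here are synthetic stand-ins):

```python
import torch

# Synthetic stand-ins so the sketch runs end to end (requires a GPU).
model = torch.nn.Linear(1024, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()
loader = [(torch.randn(64, 1024, device="cuda"),
           torch.randint(0, 10, (64,), device="cuda")) for _ in range(10)]

for inputs, targets in loader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():        # run the forward pass in mixed precision
        loss = torch.nn.functional.cross_entropy(model(inputs), targets)
    scaler.scale(loss).backward()          # scale the loss to avoid fp16 underflow
    scaler.step(optimizer)                 # unscale gradients, then step
    scaler.update()
```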
Read
Milestones in Neural Architecture Search (NAS)
December 1, 2019
Neural Architecture Search (NAS) has been all the rage this year. This post briefly reviews the works I find most representative; feel free to point out any errors or omissions.
Read
Feeding the GPU in Deep Learning
August 12, 2019
Recently, I trained several models and found that more GPUs don't always lead to better results. Sometimes, there's no difference between using one V100 and two V100s. I later discovered the bottleneck was elsewhere. This article summarizes some tricks I've used.
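A minimal sketch of the most common fix, overlapping data loading with GPU compute via PyTorch's DataLoader (my illustration, not necessarily one of the post's tricks; the dataset is synthetic):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(10_000, 3, 224, 224),
                        torch.randint(0, 10, (10_000,)))
loader = DataLoader(
    dataset,
    batch_size=64,
    shuffle=True,
    num_workers=4,        # load/augment batches in parallel worker processes
    pin_memory=True,      # page-locked host memory speeds up host-to-GPU copies
)

for images, labels in loader:
    images = images.cuda(non_blocking=True)   # async copy, overlaps with compute
    labels = labels.cuda(non_blocking=True)
    # ... forward/backward pass here ...
```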
Read
Learning to Push by Grasping: Using Multiple Tasks for Effective Learning
November 22, 2018
End-to-end learning frameworks are becoming popular in robotic control: they take states/images as input and directly output predicted torques and action parameters. However, they have been criticized for their heavy data demands, sparking debate about their scalability: does end-to-end learning require a separate model for every task? Intuitively, sharing across tasks should help, since the tasks require some common understanding of the environment. This paper takes the next step for data-driven end-to-end learning, moving from task-specific models to a joint model across multiple robot tasks, with a surprising result: given the same amount of data, multi-task learning outperforms single-task learning. For example, on the grasping task, a model trained on 2.5k grasp samples plus 2.5k push samples outperforms one trained on 5k grasp samples alone.
Read
Playing Atari with Deep Reinforcement Learning
November 17, 2018
This paper by Volodymyr Mnih et al., presented at NIPS 2013, is essentially the pioneering work on DQN; its companion is the 2015 Nature paper.
Read
Cityscapes Dataset
November 2, 2018
Cityscapes is typically used for semantic segmentation. Its data falls into 8 categories, including one named "void," and each category contains multiple classes, for 30 classes in total. After numbering, however, the labels come to 35, because they include entries such as "unlabeled" that are not counted as classes.
Read
Efficient Dense Modules of Asymmetric Convolution for Real-Time Semantic Segmentation
October 30, 2018
Previous segmentation networks were either slow or had low accuracy. Here, an EDANet module is designed, combining asymmetric conv, dilated conv, and dense connectivity. It outperforms FCN in all aspects and does not require a decoder structure, context module, post-processing scheme, or pretrained model. Experiments were conducted on Cityscapes and CamVid.
Read
DARTS: Differentiable Architecture Search
October 22, 2018
This paper tackles the scalability challenge of architecture search by formulating the task in a differentiable form, instead of the conventional approach of applying reinforcement learning over a discrete, non-differentiable search space. The method rests on a continuous relaxation of the architecture representation, which allows efficient search via gradient descent. Experiments show the algorithm finds high-performance CNN architectures for image recognition and RNN architectures for language modeling, while being far faster than state-of-the-art non-differentiable search methods.
Read
Compressing Neural Networks with the Hashing Trick
October 15, 2018
Deep networks are increasingly applied on mobile devices, highlighting a dilemma: while deep learning trends toward developing models that can absorb larger datasets, mobile devices have limited storage and cannot accommodate overly large models. HashedNets are introduced to reduce model size by minimizing inherent redundancy within neural networks. HashedNets use a low-cost hash function to randomly group connection weights into different hash buckets, where all connections in the same bucket share a single parameter value, adjusted during standard backpropagation. This hashing process does not incur additional memory overhead. Performance on various benchmark datasets demonstrates that HashedNets can significantly reduce storage requirements while maintaining generalization performance.
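A toy NumPy sketch of the weight-sharing idea (the sizes, names, and the stand-in hash are mine, not from the paper):

```python
import numpy as np

def hashed_weight(shape, params, seed=0):
    """Build a virtual weight matrix whose entries alias into `params`."""
    n_in, n_out = shape
    rng = np.random.default_rng(seed)          # stands in for a cheap hash function
    bucket = rng.integers(0, params.size, size=(n_in, n_out))
    return params[bucket], bucket              # same bucket -> same shared weight

params = np.random.randn(50)                   # only 50 real trainable parameters
W, bucket = hashed_weight((100, 100), params)  # behaves like a 100x100 matrix
print(W.shape, params.size)                    # (100, 100) 50

# Backprop would accumulate each dL/dW[i, j] into params[bucket[i, j]].
```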
Read
ShuffleNetV2
October 11, 2018
Many network designs measure computational complexity only by indirect metrics such as FLOPs, yet direct metrics such as speed depend on more than FLOPs alone, including MAC (memory access cost) and platform characteristics. This paper advocates measuring the direct metrics on the target platform rather than relying on FLOPs, and from a series of controlled experiments distills several practical guidelines for efficient network design. Following these guidelines, it proposes a new architecture, ShuffleNetV2; comprehensive ablation experiments show the model achieves a state-of-the-art speed-accuracy trade-off.
Read
ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
October 10, 2018
This article introduces an efficient network, ShuffleNet, which primarily uses pointwise group convolution and channel shuffle operations. These techniques significantly reduce computational costs while maintaining accuracy, outperforming previous networks on ImageNet and COCO.
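The channel shuffle operation itself is compact; a PyTorch sketch (my illustration, not code from the paper):

```python
import torch

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Interleave channels across groups: reshape, transpose, flatten back."""
    n, c, h, w = x.shape
    x = x.view(n, groups, c // groups, h, w)   # split channels into groups
    x = x.transpose(1, 2).contiguous()         # swap the group and channel dims
    return x.view(n, c, h, w)                  # flatten back: channels now mixed

x = torch.arange(8.0).view(1, 8, 1, 1)         # channels 0..7
print(channel_shuffle(x, groups=2).flatten())  # tensor([0., 4., 1., 5., 2., 6., 3., 7.])
```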
Read
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
October 4, 2018
For mobile and embedded vision applications, this paper proposes an efficient family of models called MobileNets, lightweight neural networks built from depthwise separable convolutions. Two hyperparameters trade off accuracy against latency, and extensive ImageNet experiments on this trade-off show strong performance compared with other models. Further experiments demonstrate MobileNets' effectiveness across a wide range of applications, including object detection, fine-grained classification, face attributes, and large-scale geo-localization.
Read
InceptionV4 Summary
September 28, 2018
In recent years, very deep convolutional neural networks have driven the largest gains in image recognition performance, and the Inception architecture delivers strong performance at relatively low computational cost. Recently, combining residual connections with more traditional architectures achieved the best results at the 2015 ILSVRC, close to InceptionV3's performance. This raises the question of combining Inception with residual connections: there is solid evidence that residual connections greatly accelerate the training of Inception networks, and some evidence that residual Inception networks slightly outperform non-residual ones of nearly equal computational cost. The paper also proposes several new residual and non-residual Inception architectures, which clearly improve single-frame classification performance on 2012 ILSVRC, and it notes that appropriate activation scaling stabilizes the training of very wide residual Inception networks.
Read
Derivatives of Vectors and Matrices
September 21, 2018
In machine learning algorithms, you'll encounter numerous matrix-related differentiation and derivation tasks. Here, we introduce some common differentiation formulas related to matrices and vectors.
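For flavor, two standard identities of the kind such posts collect (gradients taken with respect to $\mathbf{x}$):

$$\frac{\partial}{\partial \mathbf{x}}\left(\mathbf{a}^\top \mathbf{x}\right) = \mathbf{a}, \qquad \frac{\partial}{\partial \mathbf{x}}\left(\mathbf{x}^\top A\, \mathbf{x}\right) = \left(A + A^\top\right)\mathbf{x}.$$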
Read
A General Solution to Stock Problems in Dynamic Programming
September 14, 2018
There is a type of dynamic programming problem where you are given a stock price sequence and need to calculate the maximum profit from buying and selling stocks. These problems often have many variations, such as allowing only one transaction, multiple transactions, or imposing a transaction tax. The maximum profit is usually determined by the timing of the trades and the allowed maximum number of transactions (each transaction being a combination of one buy and one sell).
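A sketch of the generic state-machine formulation (my illustration; `max_profit` and the test values are hypothetical, not from the post):

```python
# hold[k] = best profit while holding a stock using at most k buys,
# free[k] = best profit while not holding. One transaction = one buy + one sell.
def max_profit(prices, max_trades):
    hold = [float("-inf")] * (max_trades + 1)
    free = [0.0] * (max_trades + 1)
    for p in prices:
        for k in range(1, max_trades + 1):
            hold[k] = max(hold[k], free[k - 1] - p)  # buy as part of the k-th trade
            free[k] = max(free[k], hold[k] + p)      # sell, completing the trade
    return free[max_trades]

print(max_profit([3, 2, 6, 5, 0, 3], max_trades=2))  # 7.0 (buy 2 sell 6, buy 0 sell 3)
```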
Read
Definition of Convex Sets and Common Convex Sets
August 31, 2018
Similar to solving optimization problems with only equality constraints as discussed earlier, optimization problems with inequality constraints can also be solved using the Lagrange multiplier method.
Read
Derivation of SVM (3)
August 24, 2018
In the previous post, we introduced the derivation of hard-margin SVM. This article will continue with the mathematical derivation of soft-margin SVM, which allows for some misclassification when samples are not linearly separable.
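For reference, the standard soft-margin primal with slack variables $\xi_i$ and penalty $C$ (the usual textbook form, not quoted from the post):

$$\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}} \ \frac{1}{2}\lVert \mathbf{w} \rVert^2 + C\sum_{i=1}^{n}\xi_i \quad \text{s.t.}\quad y_i\left(\mathbf{w}^\top \mathbf{x}_i + b\right) \ge 1 - \xi_i,\ \ \xi_i \ge 0.$$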
Read
Derivation of SVM (2)
August 18, 2018
In the previous article (1), we discussed the derivation of hard-margin SVM and its dual form, which can be simplified into the following form.
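For reference, the standard hard-margin dual (the usual textbook form; the post's exact notation may differ):

$$\max_{\boldsymbol{\alpha}} \ \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j\, \mathbf{x}_i^\top \mathbf{x}_j \quad \text{s.t.}\quad \alpha_i \ge 0,\ \ \sum_{i=1}^{n}\alpha_i y_i = 0.$$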
Read
Derivation of SVM (1)
August 10, 2018
SVM is a classic method in machine learning. Besides hard-margin SVM, it includes variants like soft-margin SVM and kernel tricks. This article mainly introduces the derivation of **hard-margin SVM**.
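The hard-margin primal being derived is the standard one (textbook form, for reference):

$$\min_{\mathbf{w},\,b} \ \frac{1}{2}\lVert \mathbf{w} \rVert^2 \quad \text{s.t.}\quad y_i\left(\mathbf{w}^\top \mathbf{x}_i + b\right) \ge 1,\quad i = 1,\dots,n.$$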
Read
Solving Systems of Linear Equations (3)
July 26, 2018
The pseudoinverse discussed here is the **Moore-Penrose inverse matrix**.
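For reference, the standard construction via the SVD $A = U\Sigma V^\top$, which reduces to the familiar least-squares form when $A$ has full column rank:

$$A^{+} = V\,\Sigma^{+} U^\top, \qquad A^{+} = \left(A^\top A\right)^{-1} A^\top \ \text{(full column rank)}.$$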
Read
Solving Systems of Linear Equations (2)
July 21, 2018
In the previous blog post, we discussed one scenario of linear equations where the number of unknowns is less than the number of equations, introducing the least squares method. In this post, we will cover another scenario where the number of equations is less than the number of unknowns. In this case, the system has infinitely many solutions, but there is only one solution closest to the origin, known as the **minimum norm solution** of the linear equations.
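For reference, when $A$ has full row rank the minimum norm solution has the standard closed form:

$$\mathbf{x}^{*} = A^\top\left(A A^\top\right)^{-1}\mathbf{b} \ =\ \arg\min_{A\mathbf{x} = \mathbf{b}} \lVert \mathbf{x} \rVert_2.$$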
Read
207. Course Schedule
July 20, 2018
This problem uses DFS or BFS to determine whether a graph admits a topological ordering.
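A BFS (Kahn's algorithm) sketch in Python (my illustration; `can_finish` and the test cases are hypothetical):

```python
from collections import deque

# Courses are nodes; a prerequisite pair (a, b) means "take b before a".
# The schedule is feasible iff every node can be peeled off at in-degree zero.
def can_finish(num_courses, prerequisites):
    graph = [[] for _ in range(num_courses)]
    indegree = [0] * num_courses
    for course, prereq in prerequisites:
        graph[prereq].append(course)
        indegree[course] += 1
    queue = deque(i for i in range(num_courses) if indegree[i] == 0)
    taken = 0
    while queue:
        node = queue.popleft()
        taken += 1
        for nxt in graph[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    return taken == num_courses          # all courses ordered -> no cycle

print(can_finish(2, [[1, 0]]))           # True
print(can_finish(2, [[1, 0], [0, 1]]))   # False (cyclic prerequisites)
```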
Read
Solving Linear Equations (1)
July 20, 2018
In this post, we discuss one case of solving a system of linear equations, considering a system of the following form.
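Based on the recap in part (2), this first case is the overdetermined one; for reference, the standard least-squares solution when $A$ has full column rank:

$$\mathbf{x}^{*} = \arg\min_{\mathbf{x}} \lVert A\mathbf{x} - \mathbf{b} \rVert_2^2 = \left(A^\top A\right)^{-1} A^\top \mathbf{b}.$$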
Read
Numerical Computation in Machine Learning (1)
July 14, 2018
Machine learning algorithms often require extensive numerical computations, solving for approximations through iteration rather than analytical solutions. These algorithms typically involve optimization and solving linear equations. Since computers represent various floating-point numbers with limited precision, certain methods are needed to ensure computational accuracy.
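The classic one-liner illustration of finite precision (my example, not from the post):

```python
import math

print(0.1 + 0.2)                      # 0.30000000000000004, not 0.3
print(0.1 + 0.2 == 0.3)               # False: exact comparison is unreliable
print(math.isclose(0.1 + 0.2, 0.3))   # True: compare within a tolerance instead
```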
Read
Training a Simple Neural Network with TensorFlow
July 6, 2018
In this blog post, we use TensorFlow's Eager Execution to build models, eliminating the need to create Graphs and Sessions as before, making neural network training more convenient and faster. We will train a neural network using the Iris dataset as an example, with code from Google's tutorial.
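A minimal sketch in the spirit of that tutorial, with Iris-like shapes (4 features, 3 classes) and random stand-in data; the actual tutorial code differs, and modern TF 2.x runs eagerly by default:

```python
import tensorflow as tf  # TF 2.x executes eagerly by default

model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(3),
])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

x = tf.random.normal([32, 4])                        # stand-in Iris batch
y = tf.random.uniform([32], maxval=3, dtype=tf.int32)

with tf.GradientTape() as tape:                      # ops run (and record) eagerly
    loss = loss_fn(y, model(x))
grads = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))
print(float(loss))
```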
Read
Deep Learning on Geek Cloud
June 29, 2018
Recently, I've been working on an image-related deep learning task assigned by my teacher. After debugging the code, I realized my laptop's memory (8GB) wasn't sufficient. Later, I discovered a very useful deep learning cloud service platform.
Read
LiDAR + Camera Data Fusion on KITTI
June 15, 2018
The KITTI dataset has many parts; here we select the raw_data (raw recordings) for fusion.
Read
Solving Optimization Problems with Inequality Constraints
June 8, 2018
Similar to solving optimization problems with only equality constraints discussed earlier, optimization problems with inequality constraints can also be solved using the Lagrange multiplier method.
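For reference, the resulting KKT conditions for $\min f(\mathbf{x})$ subject to $g_i(\mathbf{x}) \le 0$ (standard form, not quoted from the post):

$$\nabla f(\mathbf{x}^{*}) + \sum_i \mu_i \nabla g_i(\mathbf{x}^{*}) = 0, \qquad g_i(\mathbf{x}^{*}) \le 0, \qquad \mu_i \ge 0, \qquad \mu_i\, g_i(\mathbf{x}^{*}) = 0.$$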
Read
Constructors in C++
June 2, 2018
Each class defines how its objects are initialized through one or more special member functions called **constructors**. The constructor's task is to initialize the data members of the class object, and it is executed whenever a class object is created.
Read
Derivation of Neural Network Backpropagation
June 1, 2018
Backpropagation is the core of the neural network training process.
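For reference, the core recurrences for a feed-forward network with activations $a^{(l)} = \sigma(z^{(l)})$, cost $C$, and $\odot$ denoting the elementwise product (standard form; the post's notation may differ):

$$\delta^{(L)} = \nabla_{a^{(L)}} C \odot \sigma'\!\left(z^{(L)}\right), \qquad \delta^{(l)} = \left(W^{(l+1)\top}\delta^{(l+1)}\right) \odot \sigma'\!\left(z^{(l)}\right), \qquad \frac{\partial C}{\partial W^{(l)}} = \delta^{(l)}\, a^{(l-1)\top}.$$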
Read
Associative Containers in C++
June 1, 2018
Associative containers support efficient key lookup and access. The two main associative containers are `set` and `map`. Elements in a `map` are key-value pairs: the key serves as an index, and the value is the data associated with that index. Elements in a `set` contain only a key. `set` supports efficient key lookup; note that the standard `set` and `map` are ordered containers, typically implemented as red-black trees, while the hash-table-based counterparts are `unordered_set` and `unordered_map`.
Read
Sequential Containers in C++
May 25, 2018
A container is a collection of objects of a specific type. Sequence containers provide the ability to control the order of storage and access of elements.
Read
Introduction to Decision Tree and Random Forest Algorithms
May 24, 2018
Decision trees are a method for classification and regression. This post focuses on decision trees used for classification. A decision tree has a tree-like structure and represents the process of classifying data based on features. It can be seen as a collection of if-then rules or as a conditional probability distribution defined over feature and class spaces. The main advantages are good model interpretability and fast classification speed. During training, a decision tree model is built using training data by minimizing a loss function. For prediction, new data is classified using the decision tree. Learning a decision tree typically involves three steps: feature selection, tree generation, and tree pruning. The concepts of decision trees mainly originate from Quinlan's ID3 algorithm (1986) and C4.5 algorithm (1993), as well as the CART algorithm proposed by Breiman et al. in 1984.
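A tiny scikit-learn illustration of the train-then-predict workflow described above (my sketch, not from the post):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Fit a single decision tree and a random forest on Iris, compare held-out accuracy.
X_train, X_test, y_train, y_test = train_test_split(
    *load_iris(return_X_y=True), test_size=0.3, random_state=0)
for model in (DecisionTreeClassifier(random_state=0),
              RandomForestClassifier(n_estimators=100, random_state=0)):
    model.fit(X_train, y_train)
    print(type(model).__name__, model.score(X_test, y_test))
```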
Read
I/O Classes in C++
May 18, 2018
C++ does not handle input and output directly; instead, it uses a set of types defined in the standard library for IO operations. These types support reading from and writing to devices like files and console windows. Some types also allow memory IO, such as reading from and writing to strings.
Read
Solving Optimization Problems with Equality Constraints
May 18, 2018
This post discusses optimization problems of the following form.
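Presumably the standard equality-constrained problem and its Lagrangian; for reference (my assumption, since the excerpt cuts off before the formula):

$$\min_{\mathbf{x}} f(\mathbf{x}) \ \ \text{s.t.}\ \ h_i(\mathbf{x}) = 0, \qquad \mathcal{L}(\mathbf{x}, \boldsymbol{\lambda}) = f(\mathbf{x}) + \sum_i \lambda_i h_i(\mathbf{x}).$$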
Read
Dual Problems in Linear Programming
May 11, 2018
Every linear programming problem has a corresponding dual problem, which is also a linear programming problem. The dual of the dual problem is the original problem. The optimal solution of the original problem can be obtained from the dual problem. Sometimes, using dual theory to solve linear programming problems is simpler and provides a deeper understanding of the problem's nature. Inspired by dual theory, the performance of the simplex method has been improved, and some non-simplex methods for solving linear programming problems have emerged, which will not be detailed in this article.
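For reference, the standard symmetric primal-dual pair (textbook form, not quoted from the post):

$$\text{(P)}\ \min_{\mathbf{x} \ge 0} \ \mathbf{c}^\top\mathbf{x} \ \ \text{s.t.}\ A\mathbf{x} \ge \mathbf{b} \qquad\longleftrightarrow\qquad \text{(D)}\ \max_{\mathbf{y} \ge 0} \ \mathbf{b}^\top\mathbf{y} \ \ \text{s.t.}\ A^\top\mathbf{y} \le \mathbf{c}.$$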
Read
Parameter Passing in C++ Functions
May 4, 2018
In a C++ program, calling a function requires passing it arguments. Unless the parameter list is empty (void), parameter passing falls into two kinds: **pass by reference** and **pass by value**.
Read
Simplex Algorithm for Solving Linear Programming Problems
May 4, 2018
In 1947, Dantzig introduced a method for solving linear programming problems, known today as the simplex method. This concise and efficient algorithm is hailed as one of the top ten algorithms of the 20th century with the greatest impact on scientific development and engineering practice.
Read
Overview of Linear Programming
April 27, 2018
In optimization problems, there is a category known as linear programming problems, which are constrained optimization problems. Linear programming involves finding the extremum of a linear objective function under **linear constraints** (equalities or inequalities).
Read
The `const` Keyword in C++
April 26, 2018
When programming, we often need to define a variable whose value doesn't change, such as pi=3.14, e=2.72, or the elastic modulus of a material. In these cases, the const keyword is used.
Read